laws of structuring data in Excel

The 10 immutable laws of structuring data in Excel

There's a myriad of ways to create and manipulate data in spreadsheets - there's no one right way, but some are far better than others. Follow the 10 rules below and you'll make your work less error prone, easier to use, perform faster, and will save time when you come to analyse data.

1. Start your data in A1 and organise tables vertically

Excel is a little like a canvas - you can place data anywhere on a sheet, and for analysis or presentation this allows you to create rich, good looking spreadsheets. For data it’s a little different.

Excel functions best when data is organised like it would be in a database, that means starting in the top left of the sheet in cell A1, column headers in row 1 going from left to right and data flowing vertically down the page.

This method make sorting and filtering data easier. It also makes functions, charts and pivot tables less error prone because you can select entire columns as ranges instead of selecting the exact dimensions of your table, which can change over time.

2. One piece of information per cell

I can’t express this enough - don’t store multiple pieces of information per cell. For example, if you have a table of shirt sales, put the size and the colour in different columns. This will make analysis, functions and charts much easier and less error prone.

3. Don’t use colours to denote data characteristics

I know, it’s an easy trap to fall into, and I’ve done it myself. You’re searching data for a particular characteristic - items owned by an individual or past a certain date, and you highlight them (usually in that awful bright yellow colour).

This is really bad practice - you cannot perform functions, charts or pivot tables based on colour. There is functionality in the newer versions of excel to sort by colour, but that’s as far as you get. Colours can also be overwritten or lost by copy & pasting.

Instead, add a new column containing markers for the rows you would have highlighted. Try using Yes/No or 1/0 for easy analysis.

The one exception to this is when the cell also contains the information, but you are using the colour to denote priority. For example, red/amber/green status for a list of projects is OK if there is also a column which denotes that status using text.

4. Be consistent

Whatever you’re doing in Excel - creating tables, charts, functions, do it consistently. It will save you and other users of your spreadsheets trying to decipher the meaning and workings of your work.

Use consistent data categories. If for example, your table contains a 'status' field, make sure there’s only one spelling of the word 'Cancelled'. Only enter numbers into numerical fields and dates into date fields. Don’t sometimes write 'canceled'. Pick one and stick with it. Creating drop down lists using Data Validation can help with this.

Use consistent formats - make sure you’re using the same date and numerical formats throughout your work.

Use a consistent layout across your work. If you can, organise your spreadsheets in a consistent manner. For example, I always add a Glossary and Settings tab to my work.

Use consistent file names - this will help users find the most up to date version of a spreadsheet. Use the same naming convention with a date and version number.

5. Choose good naming conventions

This is not just a tip for data, but goes for almost any work on a computer. Firstly, choose good file names - that means including a date and version number along with a descriptive name. For example, calling a file update.xlsx is practically useless. 20180117 Web Design Project update v1.2.xlsx provides much more information.

You should avoid naming your file with the word FINAL or similar - usually you’ll end up with it not being the final version and calling it FINAL 2 or similar.

As a general rule, don’t use spaces, either in variable names or file names - use underscore instead. This makes functions and VBA easier to use and less error prone.

Similarly, in your spreadsheet use good sheet names and good column headers to ensure your data can be understood.

Avoid special characters, except for underscores and hyphens. Other symbols ($, @, %, #, &, *, (, ), !, /, etc.) are best avoided as they can make functions, pivot tables, and macros more error prone.

6. Choose what to do with empty cells

What you do with blank cells can seriously affect the accuracy of your work. For example, if you are trying to count or average a numerical field do you want to include blank cells in the calculations?

For text fields, you should make a distinction between cells that are missing and cells that are deliberately blank.

Populate them with a null value such as Unknown, NULL, N/A, 0.

7. Use only one row per entry

This might seem obvious, but if your data requires more than one row per entry, then you’ve created your columns incorrectly. If you have a list of staff on payroll for example, and you need two rows to show two bank accounts for each person - consider if this could be done by having two bank account columns.

Having one data entry across two rows means you cannot sort, filter, or analyse your data in a meaningful way.

8. Create a dictionary or glossary

It’s tedious, but if you’re creating a spreadsheet that others will use - or even one you’re using just by yourself, it is useful to create a sheet dedicated to explaining what the data is, where it originated, who the best contact is, and any special characteristics the user should be aware of.

9. Keep raw data and analysis separate

Often, Excel data originates elsewhere - an export from another system, or a spreadsheet populated elsewhere in your organisation.

It’s a sensible practice to keep raw data intact - without any functions or analysis. Keep any analysis, or data enrichment in a separate file.

Doing this maintains data integrity, so any data that gets removed or altered can be recovered if necessary. You can also use this method to help prove your work is correct.

10. Write dates in a recognised format

Finally, dates in Excel can sometimes be problematic. Excel sees a date as a number which is just formatted to look like a date.

This counting system has been in place since the early days of Excel - day number 1 is 1st January 1900. You can see this by changing the format of any date to a number - for example 1st January 2018 is 43,101.

However, dates exported from other systems are often not recognised by Excel due to their format (in the UK this often caused by dates being exported from other systems in US date format). Throughout this course we will cover a few methods to convert text dates into ones that Excel recognises, but check if you have the option to export in the right format to begin with as this can save on time and sometime complex formulas to correct the data. 

 

Get More Exclusive Content

Sent Directly to your inbox with Excel 3 Tip Friday. A weekly dose of Functions shortcuts and tips for getting more out of Excel.

We won't send you spam. Unsubscribe at any time. Powered by ConvertKit