Organising your data can feel like a chore – but that doesn’t have to be the case…
If you know me, you know that I’m the type of person who loves things to be organised. I thrive off deadlines, I live by my to-do lists and most importantly, I like to keep my Excel spreadsheets clean and tidy.
Keeping your data clean is essential when it comes to Excel. Why? Because having a clean set of data will increase your overall productivity and allow for your formulas and functions to produce the highest quality output.
It also leaves less room for errors (both human and computer), so your results are more reliable and can support better decision-making.
Sounds good, right?
What is data cleaning?
Simply put, data cleaning is the process of fixing or removing any data that might cause errors in your Excel calculations.
Examples of data that needs to be cleaned include:
- Incorrect or corrupted data sets
- Incorrectly formatted data
- Duplicated data
- Incomplete data
Cleaning your data is especially important when you’re combining multiple sources. Why? Because with multiple sources there’s more opportunity for things to be duplicated, or labelled differently between data sets, or a variety of other issues which will create unreliable outcomes (even though they might look correct).
So what are some of the most effective ways to clean your data?
Keep reading for my top tips on keeping your data sets flawless in Excel…
10 quick tips for data cleaning in Excel
1. Spell Check
Use the spell checker to find words that have been spelt incorrectly throughout your data. You can also use it to find values that haven’t been used consistently, such as product or company names: add the correct spellings to your custom dictionary so that the spell check only flags the variants that don’t match.
2. Remove duplicate rows
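Filter for unique values first to confirm that the results are what you expect (‘Data’ > ‘Sort & Filter’ > ‘Advanced’, then tick ‘Unique records only’). Filtering only hides the duplicates. To delete them, click ‘Data’ > ‘Data Tools’ > ‘Remove Duplicates’. This keeps the first occurrence of each row and permanently deletes the rest, so it’s worth working on a copy of your data.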
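If you also tidy exported data outside Excel, the same idea can be sketched in a few lines of Python. This is only an illustration, not what Excel does internally: the function name and the sample rows are made up, and it compares values exactly (including upper and lower case), which may differ from how Excel compares text.

```python
def remove_duplicate_rows(rows):
    """Return rows with exact duplicates dropped, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: the third row repeats the first
orders = [
    ["A-100", "Widget", 3],
    ["A-101", "Gadget", 1],
    ["A-100", "Widget", 3],
]
deduped = remove_duplicate_rows(orders)
```

As with the tip above, the first copy of each row is kept and the original order is preserved.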
3. Find and replace text
Use the ‘Find and Replace’ dialog box to search your data for values that you want to amend. For example, if you have labels followed by a colon, you can use ‘Find and Replace’ to find every instance and remove the colon from all of the labels in one go.
4. Change the case of your text
You can use the UPPER, LOWER and PROPER functions in Excel to change the case of your text and make sure that your data reads consistently. This is handy for keeping things like email addresses, product names and product codes uniform.
5. Remove spaces and nonprinting characters from your text
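If text values contain leading, trailing, or multiple embedded space characters, or nonprinting characters, this can cause errors when you sort, filter or use a VLOOKUP or MATCH formula. To make this data easier to analyse, use the TRIM, CLEAN and SUBSTITUTE functions. TRIM removes extra spaces, CLEAN removes nonprinting characters, and SUBSTITUTE can swap out characters that the other two miss, such as the non-breaking space (CHAR(160)) that often sneaks in from web pages.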
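To show how those three steps fit together, here is a rough Python sketch of the same logic. It is an approximation rather than a copy of Excel’s behaviour: the function name is my own, and it assumes CLEAN only needs to drop the control characters with codes 0 to 31.

```python
import re

NON_BREAKING_SPACE = "\u00a0"  # the character Excel calls CHAR(160)


def clean_text(value):
    """Approximate SUBSTITUTE, then CLEAN, then TRIM on a single text value."""
    # SUBSTITUTE step: turn non-breaking spaces into ordinary spaces
    value = value.replace(NON_BREAKING_SPACE, " ")
    # CLEAN step: drop control characters (codes 0 to 31), such as tabs and line breaks
    value = "".join(ch for ch in value if ord(ch) >= 32)
    # TRIM step: remove leading/trailing spaces and collapse runs of spaces to one
    return re.sub(r" {2,}", " ", value.strip(" "))


tidy = clean_text("  Acme\u00a0 Ltd\t\n ")
```

The order matters: swapping the non-breaking spaces first means the final trim can treat them like any other space.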
6. Fix numbers and number signs
Before you start to analyse your data, convert any numbers that are stored in cells as text (which may confuse calculations or muddle sort orders) to a number format using the ‘Text to Columns’ feature on the ‘Data’ tab.
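The kind of conversion involved can be sketched in Python as below. Treat it as an illustration under stated assumptions: the function name is hypothetical, commas are assumed to be thousands separators and the full stop the decimal point, and a trailing minus sign (as in ‘300-’) is assumed to mean a negative number.

```python
import re

# Optional sign, then digits with an optional decimal part
NUMBER_PATTERN = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def to_number(text):
    """Convert text such as ' 1,250.50 ' or '300-' to a float, or None if it is not a number."""
    cleaned = text.strip().replace(",", "")
    # Move a trailing minus sign to the front, e.g. '300-' becomes '-300'
    if cleaned.endswith("-"):
        cleaned = "-" + cleaned[:-1]
    if NUMBER_PATTERN.fullmatch(cleaned):
        return float(cleaned)
    return None


values = [to_number(v) for v in [" 1,250.50 ", "300-", "n/a"]]
```

Returning None for anything that isn’t a number makes the problem cells easy to spot instead of silently turning them into zero.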
7. Fix dates and times
As with numbers, dates and times that aren’t correctly formatted can wreak havoc with your data outputs. Use the DATE, DATEVALUE, TIME and TIMEVALUE functions to return sequential serial numbers that represent dates and times in a unified format throughout your spreadsheet.
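If serial numbers are new to you, this small Python sketch shows the idea behind them: a date is stored as a count of days. It assumes Excel’s default 1900 date system and only holds for dates from 1 March 1900 onwards; the function name and the default day/month/year format are my own choices for the example.

```python
from datetime import date, datetime

# Day zero for this sketch, chosen so the count lines up with the 1900 date system
EXCEL_EPOCH = date(1899, 12, 30)


def to_excel_serial(text, fmt="%d/%m/%Y"):
    """Parse a date written as text and return its day count since EXCEL_EPOCH."""
    parsed = datetime.strptime(text.strip(), fmt).date()
    return (parsed - EXCEL_EPOCH).days


# The same date written two different ways gives the same serial number
serial_uk = to_excel_serial("01/01/2024")
serial_iso = to_excel_serial("2024-01-01", fmt="%Y-%m-%d")
```

Because both spellings of the date collapse to one number, sorting and date arithmetic behave consistently once everything is converted.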
8. Merge and split columns
Merging columns (CONCATENATE) and splitting columns (Text to Columns) can help you organise your data by either bundling it together or breaking it down as required for your analysis. For example, if you’ve imported a list of names that have been formatted in a single column, you might want to split this into two columns for First Name and Last Name (or vice versa).
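The name example can be sketched in Python like this. It is deliberately simple and rests on one big assumption: the last word is the last name and everything before it is the first name, which won’t suit every name. Both function names are made up for the example.

```python
def split_full_name(full_name):
    """Split 'First Last' into (first, last), treating the final word as the last name."""
    parts = full_name.rsplit(None, 1)  # split once, from the right, on any whitespace
    if not parts:
        return "", ""
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[1]


def merge_name(first, last):
    """Join two name columns back into one, skipping any blank part."""
    return " ".join(part for part in (first, last) if part)


first, last = split_full_name("Mary Ann Smith")
rejoined = merge_name(first, last)
```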
9. Transform and rearrange rows and columns
Sometimes, you might want or need to switch your data around so that it appears in a different arrangement. With the TRANSPOSE function, you can switch a vertical range of cells into a horizontal range of cells (or vice versa). This is helpful for structuring your data quickly and effectively.
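To picture what a transpose does, here is a minimal Python sketch. It assumes every row has the same number of cells, and the sample figures are invented.

```python
def transpose(rows):
    """Turn rows into columns: cell (r, c) moves to position (c, r)."""
    return [list(column) for column in zip(*rows)]


# Two long rows become three short ones
wide = [
    ["Month", "Jan", "Feb"],
    ["Sales", 120, 95],
]
tall = transpose(wide)
```

Transposing twice gets you back to where you started, which is a handy sanity check.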
10. Reconcile table data by joining or matching
You can use Excel to find and correct matching errors when two or more tables are joined. There is a useful range of functions that can help you reconcile two tables from different worksheets, either to get an easier overview or to compare the tables and find rows that don’t match. These functions include LOOKUP, HLOOKUP, VLOOKUP, XLOOKUP, INDEX, MATCH and OFFSET.
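The ‘find rows that don’t match’ part boils down to looking up each key from one table in the other. Here is a hedged Python sketch of that check; the function name, the sample tables and the choice of the first column as the key are all assumptions for illustration, and the comparison is exact, so stray spaces or different cases count as a mismatch.

```python
def find_unmatched(left_rows, right_rows, key_index=0):
    """Return the rows of left_rows whose key does not appear in right_rows."""
    right_keys = {row[key_index] for row in right_rows}
    return [row for row in left_rows if row[key_index] not in right_keys]


# Hypothetical tables keyed on a product code in the first column
price_list = [["P-01", 9.99], ["P-02", 4.50], ["P-03", 12.00]]
stock_list = [["P-01", 40], ["P-03", 0]]

missing_from_stock = find_unmatched(price_list, stock_list)
```

This is also why the earlier tips on spaces and case matter: keys that look identical on screen can still fail to match.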
You can also use Power Query for this purpose…
Power Query for Cleaning Data
I’ve talked about the incredible power of Power Query in previous articles, but it really is the gift that keeps on giving.
Power Query excels at consolidating data from various sources. When you’re dealing with diverse datasets, merging and consolidating them in Power Query helps keep your data consistent and accurate, and it also lets you handle missing or mismatched data gracefully.
Going beyond simple transformations, Power Query also enables custom transformations using M code (the language of Power Query). This advanced feature allows you to create complex logic tailored to your dataset’s specific requirements, providing unparalleled flexibility in cleaning your data.
With M code you can also build robust error handling, for example with the try ... otherwise expression. By setting up custom error handling, you can make sure that your data cleaning process continues smoothly even when it runs into unexpected data issues, minimising the risk of losing valuable information.
Establishing Data Cleaning Protocols
Data cleaning sounds like a chore, but there are processes you can implement to make it much, much easier.
Excel macros (VBA) and Office Scripts allow you to automate recurring data-cleaning tasks. Automation not only saves time (and who doesn’t love a time saver?) but also reduces the risk of manual errors, especially when dealing with large datasets.
I also recommend establishing a robust documentation process. Detailed documentation of your data cleaning steps, along with version control, provides a clear roadmap of the cleaning process. It helps you and your team reproduce and validate the results whenever needed and reduces the discrepancies that might lead to errors.
Don’t forget that data is dynamic. Regularly monitoring your datasets ensures that any discrepancies or inconsistencies are promptly addressed. Implement alerts and monitoring mechanisms to stay ahead of potential data quality issues, ensuring your analyses are always based on accurate and clean information.
Summary
Data cleaning in Excel is such an important task. With these helpful tips on how to clean your data, and by mastering advanced features such as Power Query, you can easily tidy up and elevate your data analysis.
A flawless dataset not only enhances the credibility of your analyses but also empowers you to extract meaningful insights, enabling informed decision-making.
Remember, the journey to a flawless dataset is an ongoing process, requiring continuous learning and adaptation to the evolving landscape of data analysis.