Ever found yourself stuck with a messy pile of data that seems more like a labyrinth than a pathway to clean insights?
In this article, we invite you into the “world” of data cleaning in Qlik Sense™ to help you uncover the analytical potential behind your data.
The Importance of Data Cleaning
Imagine you’re baking a cake. Would you eyeball the measurements of your ingredients? Probably not, unless you fancy a disaster cake. Just like one poorly measured cup of flour can ruin your entire recipe, a small data error can throw off your whole analysis. That’s why, before you dive into the fun part—data analysis—you’ve got to make sure your key ingredient (data) is as clean and precise as possible.
Why Data Cleaning is More than Just a Chore
It’s not just about tidying up; it’s about quality control. Skipped steps or overlooked errors can lead to inaccurate results that could misinform your business decisions.
The accuracy of your analytics depends heavily on your data’s quality. Data cleaning helps to weed out errors and inconsistencies, ensuring your insights are both trustworthy and actionable. Tools like mapping tables or functions like SubField can be invaluable in this stage.
Inconsistent data formats or naming conventions can be a real roadblock. Qlik Sense™ offers features like the SubField function and mapping tables to help you standardize data for consistent reporting and visualization.
When you’re integrating data from various sources, alignment is crucial. Qlik Sense™ provides numerous functions that help in aligning these disparate datasets into a cohesive, unified form.
Enhanced Visualization and Performance
Clean data doesn’t just make your visualizations more meaningful; it also enhances the performance of your Qlik™ applications. Expect faster data retrieval and more efficient analysis when your data is in good shape.
Data Cleaning techniques in Qlik Sense™
Duplicate records can distort your analysis and reporting. Qlik™ offers built-in functions like Keep when loading tables or the DISTINCT keyword in your script to load only unique rows.
You can address missing values by removing records or filling in gaps based on specific criteria. Functions like IsNull, IsNullCount, and NullAsValue come in handy.
Using the numerous string functions available in Qlik Sense™, you can standardize data values to a consistent format. For example, the Upper, Lower, Date, and Num functions can be used to unify text or dates.
Sometimes the data you import into Qlik Sense™ doesn’t exactly fit your needs. Qlik™ offers ways to reshape your data accordingly.
For instance inconsistent field values can often occur when pulling data from multiple tables and this inconsistency can disrupt the connections between data sets. An efficient solution to this is to use Mapping tables.
These types of tables behave differently than other tables in that they are stored in a separate area of the memory and are strictly used as mapping tables when the script is run, they are then automatically dropped.
Let’s take a look at how to do this and the different statements and functions that can be used:
This function is used to create a mapping table.
Keep in mind that a mapping table must have two columns, the first containing the comparison values and the second contains the desired mapping values.
The ApplyMap function is used to replace data in a field based on a previously created Mapping Table.
The first parameter in ApplyMap is the Mapping Table name in quotes. The second parameter is the field containing the data that needs to be mapped.
You can add a third parameter to the ApplyMap function that serves as a default to handle cases when the value doesn’t match one in the Mapping Table.
ApplyMap(‘CountryMap’, Country, ‘Rest of the world’) As Country
The MapSubstring function is used to map parts of a field, this can be used as an alternative to Replace() or PurgeChar() functions.
For instance, let’s clean up these phone number values from unwanted characters:
MAP … USING
The Map…Using statement works differently than the ApplyMap() function in that ApplyMap does mapping every time the field name is encountered, whereas Map… Using does mapping when the values is stored under the field name in the internal table.
For instance, in the following load script, the Mapping will be applied to the Country field in Data1, however it will not be applied to Country2 field in Data2 table.
That’s because Map… USING statement is only applied to the field named Country. But in Data2, the field is stored as Country2 in the internal table.
Useful functions for data cleaning
Used to extract substrings from a string field that consists of two or more parts separated by a delimiter.
The arguments it takes are a Text (original string), a delimiter (character within the input text that divides the string into parts), and field_no that’s either 1 to return the first substring (left) or 2 to return the second substring (right)).
SubField(text, delimiter, field_no)
Returns the length of the input string
Returns a string of the first (left) characters of the input string, where the number of characters is determined by the second parameter.
Similar to Left, it returns a string of the last (rightmost) characters of the input string. The second parameter determines the number of characters to be returned.
The index function searches a string and returns the starting position of the nth occurrence of a provided substring.
Index(‘qwerty’, ‘ty’) will return 5
Index(‘qwertywy’, ‘w’, 2) will return the second occurrence of ‘w’, i.e: 7
Using a combination of the functions above to clean up a field. Let’s take a more complex field and try to extract the first name and last name.
Cleaning HTML in a field
For information about Qlik™, please visit this site: qlik.com.
For specific and specialized solutions from QQinfo, please visit this page: QQsolutions.
In order to be in touch with the latest news in the field, unique solutions explained, but also with our personal perspectives regarding the world of management, data and analytics, we recommend the QQblog !