Data Cleaning in Qlik Sense™

Page content

Ever found yourself stuck with a messy pile of data that seems more like a labyrinth than a pathway to clean insights?

In this article, we invite you into the “world” of data cleaning in Qlik Sense™ to help you uncover the analytical potential behind your data.

The Importance of Data Cleaning

Imagine you’re baking a cake. Would you eyeball the measurements of your ingredients? Probably not, unless you fancy a disaster cake. Just like one poorly measured cup of flour can ruin your entire recipe, a small data error can throw off your whole analysis. That’s why, before you dive into the fun part—data analysis—you’ve got to make sure your key ingredient (data) is as clean and precise as possible.

Why Data Cleaning is More than Just a Chore

It’s not just about tidying up; it’s about quality control. Skipped steps or overlooked errors can lead to inaccurate results that could misinform your business decisions.

Data Accuracy

The accuracy of your analytics depends heavily on your data’s quality. Data cleaning helps to weed out errors and inconsistencies, ensuring your insights are both trustworthy and actionable. Tools like mapping tables or functions like SubField can be invaluable in this stage.

Data Consistency

Inconsistent data formats or naming conventions can be a real roadblock. Qlik Sense™ offers features like the SubField function and mapping tables to help you standardize data for consistent reporting and visualization.

Data Integration

When you’re integrating data from various sources, alignment is crucial. Qlik Sense™ provides numerous functions that help in aligning these disparate datasets into a cohesive, unified form.

Enhanced Visualization and Performance

Clean data doesn’t just make your visualizations more meaningful; it also enhances the performance of your Qlik™ applications. Expect faster data retrieval and more efficient analysis when your data is in good shape.

Data Cleaning techniques in Qlik Sense™

Duplicates removal

Duplicate records can distort your analysis and reporting. Qlik offers built-in functions like Keep when loading tables or the DISTINCT keyword in your script to load only unique rows.

Missing values

You can address missing values by removing records or filling in gaps based on specific criteria. Functions like IsNull, IsNullCount, and NullAsValue come in handy.

Data formatting

Using the numerous string functions available in Qlik Sense, you can standardize data values to a consistent format. For example, the Upper, Lower, Date, and Num functions can be used to unify text or dates.

Data manipulation

Sometimes the data you import into Qlik Sense doesn’t exactly fit your needs. Qlik offers ways to reshape your data accordingly.

For instance inconsistent field values can often occur when pulling data from multiple tables and this inconsistency can disrupt the connections between data sets. An efficient solution to this is to use Mapping tables.

Mapping Tables

These types of tables behave differently than other tables in that they are stored in a separate area of the memory and are strictly used as mapping tables when the script is run, they are then automatically dropped.

Let’s take a look at how to do this and the different statements and functions that can be used:

MAPING prefix

This function is used to create a mapping table.
For instance:

Keep in mind that a mapping table must have two columns, the first containing the comparison values and the second contains the desired mapping values.

ApplyMap()

The ApplyMap function is used to replace data in a field based on a previously created Mapping Table.

The first parameter in ApplyMap is the Mapping Table name in quotes. The second parameter is the field containing the data that needs to be mapped.

You can add a third parameter to the ApplyMap function that serves as a default to handle cases when the value doesn’t match one in the Mapping Table.

For instance:
ApplyMap(‘CountryMap’, Country, ‘Rest of the world’) As Country

after mapping:

MapSubstring()

The MapSubstring function is used to map parts of a field, this can be used as an alternative to Replace() or PurgeChar() functions.

For instance, let’s clean up these phone number values from unwanted characters:

after cleaning:

MAP … USING

The Map…Using statement works differently than the ApplyMap() function in that ApplyMap does mapping every time the field name is encountered, whereas Map… Using does mapping when the values is stored under the field name in the internal table.

For instance, in the following load script, the Mapping will be applied to the Country field in Data1, however it will not be applied to Country2 field in Data2 table.
That’s because Map… USING statement is only applied to the field named Country. But in Data2, the field is stored as Country2 in the internal table.

Useful functions for data cleaning

SubField()

Used to extract substrings from a string field that consists of two or more parts separated by a delimiter.

The arguments it takes are a Text (original string), a delimiter (character within the input text that divides the string into parts), and field_no that’s either 1 to return the first substring (left) or 2 to return the second substring (right)).
SubField(text, delimiter, field_no)

For instance:

Len()

Returns the length of the input string

Left()

Returns a string of the first (left) characters of the input string, where the number of characters is determined by the second parameter.
Left(text, count)

Right()

Similar to Left, it returns a string of the last (rightmost) characters of the input string. The second parameter determines the number of characters to be returned.

Index()

The index function searches a string and returns the starting position of the nth occurrence of a provided substring.

For instance:
Index(‘qwerty’, ‘ty’)  will return 5
Index(‘qwertywy’, ‘w’, 2) will return the second occurrence of ‘w’, i.e: 7

Example 1:
Using a combination of the functions above to clean up a field. Let’s take a more complex field and try to extract the first name and last name.

after cleaning:

Example 2:
Cleaning HTML in a field

after cleaning:

For information about Qlik™, please visit this site: qlik.com.
For specific and specialized solutions from QQinfo, please visit this page: QQsolutions.
In order to be in touch with the latest news in the field, unique solutions explained, but also with our personal perspectives regarding the world of management, data and analytics, we recommend the QQblog !