Free Data Matching Online tool (2024)

Data matching, also known as record linkage or deduplication, is the process of identifying and linking related records across multiple datasets. It's the secret sauce that transforms raw, messy data into a valuable asset for marketing, sales, or data analysis.

Most available software tools are complex and costly. Datablist is an online application, compatible with Mac OS, Microsoft Windows, and Linux, to perform data matching fast. It implements exact match, phonetic, and advanced fuzzy matching algorithms.

In this guide, you will learn to demystify the complexities of data matching. Whether you're a novice looking to get started or an experienced data professional seeking to optimize your skills, this tutorial offers something for everyone.

Here is a quick summary of the Data Matching points you will find in this article:

  • Loading your datasets
  • Data Preparation with Cleaning and Normalization
  • Match records within or across several data collections
  • Deduplicate: remove or merge matching groups

Start using our online Data Matching tool in seconds. No sales calls or features on PowerPoints.

Step 1: Load your datasets

The first step is to load your datasets into Datablist. Datablist is an online tool for list management. You can view and edit CSV files and Excel files. It's the perfect tool to manage your leads list, clean customer data, or clean scraped data.

To start, create a collection to load your first dataset.

Free Data Matching Online tool (1)Free Data Matching Online tool (2)

Then, click the Import button.

Free Data Matching Online tool (3)Free Data Matching Online tool (4)

Once your first dataset is imported. You can either:

  • Import other datasets with common data structures into the same collection.
  • Import datasets with different data structures into new collections.

Datablist Duplicates Finder finds matching records within a single collection or across collections with different data structures.

Step 2: Clean your data if needed

The second step is Data Cleansing. Data cleaning is an essential prerequisite for data matching. It ensures the accuracy and reliability of the matching process. Dirty or unclean data can lead to incorrect matches and unreliable results. In essence, data cleaning paves the way for successful data matching.

When they appear in names (people names, company names), they add nothing but might prevent the deduplication algorithm from detecting duplicates.

Datablist comes with a list of built-in data-cleaning tools:

  • Remove symbols and punctuation - Scraped texts might have smileys and ASCII symbols. Or names with punctuation. Datablist matching algorithm ignore them for data matching, but they will prevent automatic merging during the deduplication process.
  • Remove extra spaces - An extra space between words is enough to make two strings different. Datablist matching algorithms pre-process text values to remove extra spaces, but they will also prevent automatic merging during the deduplication process.
  • Extract email addresses, URLs, etc. from texts - If your data has unstructured text containing email addresses, URLs, member mentions, tags, etc. Use our Data Extractor to get those entities and structure your data. Your Data Matching process will be easier with structured entities to match.
  • Remove HTML tags - Another cleaning feature is the ability to get plain text from strings with HTML tags. This lets you match scraped lists containing HTML tags with your other datasets.
  • Convert text to DateTime, Number, Boolean, etc. - Datablist offers real data structure with built-in DateTime, Number, Boolean, etc. format. An important step for data cleansing is to convert raw text to a native format. Native Datetime, number, etc. are crucial for advanced merging rules when you need to select a master item based on value comparison (for example, the most recent date).
  • Change text case to get consistent formatting - Text case transformation is a simple yet necessary step. Datablist has several built-in case transformation algorithms).
  • Split or merge properties - This feature is perfect for dealing with multi-value data. If a property contains several email addresses separated with commas/semicolons/spaces, Datablist Split Property tool will create several properties with a single email address.
  • Remove or replace empty values - Use Datablist Filtering features to filter empty values or empty rows.

Check our guide on data cleaning for more examples and instructions on data cleansing.

Normalize people names

Dealing with datasets of people is common in Data Deduplication. Customers, Leads, and prospects datasets are good examples of lists of people. In the best case, data matching on people datasets relies on unique identifiers such as email addresses or identity numbers. Without such identifiers, or to match people across datasets, you will need to use people's names for the data-matching process.

Pre-processing people's names ensures that all names follow a uniform format, reducing errors during deduplication.

Remove noises from names

People's names can be highly variable. Nicknames, abbreviations, alternate spellings, and use of special characters are common variations.

Use the powerful to remove prefixes, suffixes, stop-words, regional addendums, and other useless words.

For example, to remove name titles, you can use this regular expression:

^\s*(mr|mrs|dr|miss|ms|sir|madam|m).?\s

And replace it with an empty string.

Free Data Matching Online tool (5)Free Data Matching Online tool (6)

Free Data Matching Online tool (7)Free Data Matching Online tool (8)

Notes If you are not familiar with Regular Expressions, just contact us and we'll help you clean your data.

Parse full name into parts

Datablist is more than a data-cleaning tool and provides data enrichment. Examples of enrichment are lead enrichments, or csv translations using Deepl.

The Name Parser is a perfect enrichment for cleaning people's names. It takes a full name and returns its parts: first name, middle name, last name. And return the most common gender and country for the name.

It uses statistical data to split the full names.

To use it, open the "Enrich Menu" from the top buttons.

Free Data Matching Online tool (9)Free Data Matching Online tool (10)

Then select "Name Parser".

Free Data Matching Online tool (11)Free Data Matching Online tool (12)

Then, select the property with name values. And map or create new property to store the results of the parsing. The full name property won't be edited, only the output properties will be edited with the results.

Free Data Matching Online tool (13)Free Data Matching Online tool (14)

Normalize company names

You can remove noise from company names as well. If you have prefixes, suffixes, stop-words, regional addenda, or others that prevent good matching.

An example of cleaning is removing company name suffixes such as "Inc." or "GmbH".

Use this regular expression in the tool:

,?\s(llc|inc|incorporated|corporation|corp|co|gmbh|ltd).?$

And replace it with an empty string.

Free Data Matching Online tool (15)Free Data Matching Online tool (16)

Free Data Matching Online tool (17)Free Data Matching Online tool (18)

Normalizing company names and addresses across all datasets is important to bring them into a standard format.

Normalize street names

If you are doing Data Matching on postal addresses, street name normalization is crucial. Addresses can be written with abbreviations, directional prefixes, or numeral suffixes. Without normalization, the same street can appear multiple times under different representations, making data matching a challenging task.

For example: Main 9 St, Main 9TH St., and Main 9th Street refer to the same street name. Or Washington Blvd and Washington Boulevard.

Using fuzzy algorithms to deal with those differences is inefficient. Several letter changes are needed between Washington Blvd and Washington Boulevard. And the similarity distance calculated using fuzzy-matching algorithms would be high.

A better way to deal with variation in street names is to normalize your street names. A consistent format ensures consistency.

Datablist offers street name normalization for English street name formats. It normalizes abbreviations, street numbers, etc.

Notes
The street name normalization works for split addresses. The street name information must be in a distinct property. Full address values would not work.

Click on "Normalize Street Names" from the "Clean" menu.

Then, select the property with street names, and pick "Normalize english street names".

Free Data Matching Online tool (21)Free Data Matching Online tool (22)

Check the preview for the changes, and click "Run".

Free Data Matching Online tool (23)Free Data Matching Online tool (24)

Step 3: Match records within or across several data collections

Now that your data is clean and normalized, it's time to move to the data-matching step. In this step, we want to group similar records together.

Free Data Matching Online tool (25)Free Data Matching Online tool (26)

Datablist has two ways to compare records:

  • Selected properties comparison - This is the most used mode. You define the properties to be compared. This mode is compatible with multi-collection matching.
  • All Properties comparison - In this mode, Datablist Duplicates Finder identifies and removes records that are exactly the same. They must have the same data for the same properties. If a property has an empty value, records will not be matched.

Free Data Matching Online tool (27)Free Data Matching Online tool (28)

Select properties to match

For the rest of this guide, we'll be using the "Selected Properties & Multi Collections" mode.

The next step is to select the properties that will be used for data matching. If you have selected multiple collections in the previous step, you will be asked to select a mapping property for each collection.

Notes
Datablist will try to auto-map your properties across collections using their name.

Free Data Matching Online tool (29)Free Data Matching Online tool (30)

Select matching algorithm

In the following step, selected properties are listed and you need to configure comparison algorithms.

Datablist implements the following matching algorithms:

  • Exact - The exact algorithm is the recommended algorithm for non-text properties such as DateTime, Number, Boolean, etc. When used for text properties, an option lets you decide if the comparison should be text-sensitive. The exact algorithm removes leading and trailing spaces from texts.

  • Smart - The smart algorithm preprocesses your items to match data with slight variations. It will match URLs with different protocols. It also deals with word order and punctuation. "John-Doe" and "Doe John" will match.

  • Phonetic with Double Metaphone algorithm - Datablist implements the Double Metaphone algorithm for phonetic matching. This algorithm converts words to codes representing their pronunciation. Two similar-sounding words will get the same Double Metaphone code.

  • Fuzzy matching with distance algorithms - Datablist also implements fuzzy matching with the Jaro-Winkler and Levenshtein distances. When selected, you must set a similarity threshold. The higher the threshold, the less variation will be allowed.

Check our documentation for more details on the matching algorithms.

Free Data Matching Online tool (31)Free Data Matching Online tool (32)

Notes

  • Smart, Phonetic, and fuzzy algorithms only apply to text-based properties (including Email, Text, LongText.).
  • URL properties are only compatible with Exact and Smart algorithms.

Step 4: Remove or merge matching groups (deduplication)

Datablist Duplicates Finder returns a list matching groups within a few seconds.

Automatic merging for single collection deduplication

Datablist Data Matching tool provides an advanced algorithm to merge your duplicates. It provides two modes for dupes merging:

  • Non-conflicting items merging. (see below)
  • Merge conflicting items using text concatenation or value dropping (see below)

Notes:
Please note that this feature is only available on single collection deduplication. With multi-collection deduplications, the data structure might be different between your collections.

Auto-Merging without data conflicts

Datablist will automatically find all duplicates that can be merged without losing any information.

It works as follows:

  • If all the duplicate items have the same property values, only one item will be kept and the others will be deleted.
  • If the duplicate items are complementary, the item with the most information will be selected as the primary item and its property values will be filled using other item property values. Then all items except the primary item will be deleted.
  • If duplicate items have conflicted property values, items will be skipped for manual merging.

Auto-Merging with data conflict resolution

When running the Auto-Merging algorithm, Datablist Duplicates Finder automatically detects conflicting properties. A conflict is when two items have different values for a property. To be able to merge them, you must choose between two options:

  • Combine properties values - The combine option lets you concatenate values using a delimiter. For example, if two different "Phone" values exist for a similar record, you can concatenate the two using a semi-colon. This option is perfect for email addresses, phone numbers, notes, etc.
  • Drop properties values - For non-text properties, it's possible to keep only one value and drop the conflicting one. For example, if two Datetime values conflict, it's not possible to concatenate them. In this case, you must keep only one. This option is also useful to deal with external identifiers. When you are cleaning data for your CRM, the external account ID must identify a record on the CRM and it can't be a concatenated string.

Cleaning rules for multi-collection data matching

When you run Datablist Data Matching tool on several datasets, the auto-merging feature won't be available. Your collections might have different data structures with different properties.

Instead, Datablist offers a cleaning feature to remove duplicate items in all except one collection. Use this tool to ensure item uniqueness across your datasets.

Free Data Matching Online tool (33)Free Data Matching Online tool (34)

A preview of the changes to be done is displayed before running the cleaning algorithm.

Free Data Matching Online tool (35)Free Data Matching Online tool (36)

Manual merging with the Merging Assistant

For the remaining duplicate records, a manual merging assistant is available.

To merge duplicates, click on the "Manual Merging Assistant" button on the left of every duplicate group.

Free Data Matching Online tool (37)Free Data Matching Online tool (38)

It opens a merging tool. On the right, a "Primary Item" is shown and on the left, the remaining duplicate items are called "Secondary Items".Datablist elects the item with the most data as the "Primary item".

Free Data Matching Online tool (39)Free Data Matching Online tool (40)

When possible, property values from secondary items are auto-selected to be merged into the primary items. If several values conflict, you will have to make a decision and select which value to keep.

If the resulting "Primary item" suits you, click the Merge button to confirm the merge process. All the secondary items will be deleted to keep only one combined item.

NotesThe manual Merging Assistant is available in multi-collection deduplication when the collection has a similar data structure (same properties).

Download duplicate groups to merge with an external tool

Finally, Datablist Data Matching tool offers an export of the duplicate groups detected. It lets you export a CSV or Excel file with all duplicate items listed consecutively.

Free Data Matching Online tool (41)Free Data Matching Online tool (42)

Use the export file to clean your records with another tool (for example a spreadsheet tool), or to run more complex analysis.

FAQ

What is data matching?

Data matching, also known as record linkage or deduplication, is the process of identifying and linking related records within or across datasets. Its primary objective is to enhance data quality, accuracy, and consistency by recognizing and consolidating duplicate or similar entries representing the same entities, individuals, or objects.

This process is helpful for cleaning datasets that get duplicate records added over time. Or to combine several datasets with similar or overlapping fields.

Data matching can use discriminative fields such as email address, website URL, or identification strings/numbers. Or a combination of non-unique attributes (such as name, date of birth, company name, or location) to generate a similarity score between records.

How fast is Datablist data-matching tool?

Datablist Data Matching tool loads your datasets in memory to perform the matching analysis. It is suited for datasets under 1 million records. And perform most data-matching analyses in less than a few minutes.

Do I need technical skills to perform Data Matching?

No. Datablist is a no-code solution made for everyone. From data analysts to marketing or salespeople.

When to use data matching?

Data matching finds extensive applications across diverse fields, including finance, healthcare, marketing, and customer management, where reliable data is essential for informed decision-making or data integration with other tools.

This process facilitates tasks such as fraud detection, user profile consolidation, and data enrichment from multiple sources.

What's next?

If you're interested in data cleansing, you'll like the following guides:

  • How to clean data online
  • How to detect junk emails in a list
Free Data Matching Online tool (2024)

References

Top Articles
Ready to race? Q&As with rookies Kvyat and Magnussen
GDP up by 0.2% and employment up by 0.2% in the euro area
Spasa Parish
Rentals for rent in Maastricht
159R Bus Schedule Pdf
Sallisaw Bin Store
Espn Transfer Portal Basketball
Pollen Levels Richmond
11 Best Sites Like The Chive For Funny Pictures and Memes
Finger Lakes 1 Police Beat
Craigslist Pets Huntsville Alabama
Paulette Goddard | American Actress, Modern Times, Charlie Chaplin
Red Dead Redemption 2 Legendary Fish Locations Guide (“A Fisher of Fish”)
What's the Difference Between Halal and Haram Meat & Food?
R/Skinwalker
Rugged Gentleman Barber Shop Martinsburg Wv
Jennifer Lenzini Leaving Ktiv
Havasu Lake residents boiling over water quality as EPA assumes oversight
Justified - Streams, Episodenguide und News zur Serie
Epay. Medstarhealth.org
Olde Kegg Bar & Grill Portage Menu
Cubilabras
Half Inning In Which The Home Team Bats Crossword
Amazing Lash Bay Colony
Dirt Devil Ud70181 Parts Diagram
Truist Bank Open Saturday
Water Leaks in Your Car When It Rains? Common Causes & Fixes
What’s Closing at Disney World? A Complete Guide
New from Simply So Good - Cherry Apricot Slab Pie
Drys Pharmacy
Ohio State Football Wiki
Find Words Containing Specific Letters | WordFinder®
Abby's Caribbean Cafe
Joanna Gaines Reveals Who Bought the 'Fixer Upper' Lake House and Her Favorite Features of the Milestone Project
Pull And Pay Middletown Ohio
Tri-State Dog Racing Results
Navy Qrs Supervisor Answers
Trade Chart Dave Richard
Lincoln Financial Field Section 110
Free Stuff Craigslist Roanoke Va
Kino am Raschplatz - Vorschau
Wi Dept Of Regulation & Licensing
Pick N Pull Near Me [Locator Map + Guide + FAQ]
Ice Hockey Dboard
Wie blocke ich einen Bot aus Boardman/USA - sellerforum.de
Infinity Pool Showtimes Near Maya Cinemas Bakersfield
Dermpathdiagnostics Com Pay Invoice
A look back at the history of the Capital One Tower
How To Use Price Chopper Points At Quiktrip
Maria Butina Bikini
Busted Newspaper Zapata Tx
Latest Posts
Article information

Author: Greg O'Connell

Last Updated:

Views: 6014

Rating: 4.1 / 5 (42 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Greg O'Connell

Birthday: 1992-01-10

Address: Suite 517 2436 Jefferey Pass, Shanitaside, UT 27519

Phone: +2614651609714

Job: Education Developer

Hobby: Cooking, Gambling, Pottery, Shooting, Baseball, Singing, Snowboarding

Introduction: My name is Greg O'Connell, I am a delightful, colorful, talented, kind, lively, modern, tender person who loves writing and wants to share my knowledge and understanding with you.