One of the great features of marketing automation platforms (MAPs) is the ability to automate deduping. Most MAPs look at Email Address as the unique identifier. While this doesn’t stop all dupes, it does take care of 80% of new dupes from imports, form fill outs, and events.
But you will still have duplicates from Sales, CRM history, and people filling in different information. So how can you get closer to zero dupes?
How Marketo handles duplicates
Marketo automatically dedupes leads based on email address. The dedupe works only when you enter new leads into Marketo from:
- List Import
- Marketo Form (iFrame or Marketo Page is ok)
- Direct creation in the Marketo database
- Direct creation through the API (exceptions apply)
Thus you should do your best to create new leads using the above methods. I can’t speak for other systems, but they are probably similar in approach. There are some caveats to Marketo’s approach:
- Records from the CRM will never be deduped. Your Sales team can create dupes!
- Existing Records on the first sync will also not be deduped. This is why you should clean up your database before implementation.
- If there are Duplicates by Email Address, Marketo will choose the most recently updated Record to append to. This may not be the record you actually want.
In Marketo and your CRM, you can merge records manually, but not automatically.
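To make the email-match behavior above concrete, here is a minimal Python sketch. It only illustrates the logic described in this section and is not Marketo's implementation. The `Record` shape and the `upsert_by_email` function are hypothetical names, and the case-insensitive email comparison is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Record:
    # Hypothetical record shape; real MAP records carry many more fields.
    record_id: int
    email: str
    updated_at: datetime
    fields: dict = field(default_factory=dict)


def upsert_by_email(database, email, new_fields, now):
    """Append to the most recently updated match, or create a new record."""
    key = email.strip().lower()  # assumption: matching ignores case and padding
    matches = [r for r in database if r.email.strip().lower() == key]
    if matches:
        # With several matches, the most recently updated record wins,
        # which may not be the record you actually wanted.
        target = max(matches, key=lambda r: r.updated_at)
        target.fields.update(new_fields)
        target.updated_at = now
        return target
    new_id = max((r.record_id for r in database), default=0) + 1
    created = Record(new_id, key, now, dict(new_fields))
    database.append(created)
    return created


# Two existing dupes: a Contact and a more recently updated Lead.
database = [
    Record(1, "pat@example.com", datetime(2015, 1, 10), {"type": "Contact"}),
    Record(2, "pat@example.com", datetime(2015, 3, 2), {"type": "Lead"}),
]
hit = upsert_by_email(
    database, "Pat@Example.com", {"sent_billing_email": True}, datetime(2015, 4, 1)
)
print(hit.record_id)  # prints 2: the Lead was updated, not the Contact
```

In this made-up example the update lands on the Lead because it was touched more recently, which is exactly the situation the next section deals with.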
Choosing the Right Record in Marketo
The use case here is a Customer Record that was selected to receive an important billing message. You want that exact Record to be tagged as having been sent the email.
The usual approach is to upload a static list as a CSV with one column: email address. When duplicates exist in Marketo, however, Marketo cannot guarantee it will select the exact Record you saw in your CRM. So if it chooses a loose SFDC Lead instead, your system will record the Email Send, but it won’t be obvious to anyone looking at the Customer’s main CRM record.
Most of the time this is not a big deal, but it can be a big deal when you need a record that billing or legal messages have been sent. You can be sure that a Customer will call up Sales to complain about a price increase they never saw. When Sales looks at the Contact, no email is shown. But when you look across all records, you can prove the email was sent. As a MOPS pro, you can be sure this scenario will occur and that the salesperson won’t always check possible dupes.
To solve this, you would need to do one of the following:
- Option 1: use an SFDC Report to add the correct records to an SFDC Campaign, then point Marketo at that Campaign. This won’t always work if you use SFDC Campaigns as attribution tools.
- Option 2: use a CRM flag to identify the specific record to Marketo. It’s ok, but not scalable.
- Option 3: use Talend (or similar tool) to map the SFDC ID Record against the Marketo ID Record to Add to List.
- Option 4: paste a list of SFDC IDs into the SFDC ID filter in a Smart List. This is only possible if you have fewer than 2499 rows to place in the filter. I don’t like this option because it’s not scalable.
In an ideal world, Marketo would enable a List Import option to map against better unique database keys like SFDC ID. Until that happens, those are your options.
But this case is not unique to Marketo — all MAPs that rely on email address deduping will encounter this scenario with a CRM or another database.
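Options 3 and 4 both come down to matching on the CRM ID instead of the email address. The sketch below shows that matching step in plain Python, assuming you can export Marketo records with both their Marketo ID and SFDC ID. The column names `marketo_id` and `sfdc_id`, the sample IDs, and both function names are hypothetical. The batch size is left as a parameter so you can set it to whatever limit your Smart List filter accepts.

```python
def match_marketo_ids(sfdc_ids, marketo_export):
    """Return (matched Marketo IDs, SFDC IDs with no Marketo match)."""
    by_sfdc_id = {
        row["sfdc_id"]: row["marketo_id"]
        for row in marketo_export
        if row.get("sfdc_id")  # skip Marketo-only records with no CRM ID
    }
    matched, missing = [], []
    for sfdc_id in sfdc_ids:
        if sfdc_id in by_sfdc_id:
            matched.append(by_sfdc_id[sfdc_id])
        else:
            missing.append(sfdc_id)
    return matched, missing


def chunk(items, size):
    """Split a list into batches no longer than `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Made-up export: two records share an email but have different SFDC IDs.
marketo_export = [
    {"marketo_id": 101, "sfdc_id": "003A0001", "email": "pat@example.com"},
    {"marketo_id": 102, "sfdc_id": "00QA0001", "email": "pat@example.com"},
    {"marketo_id": 103, "sfdc_id": "", "email": "lee@example.com"},
]
billing_list = ["003A0001", "003Z9999"]  # IDs pulled from a CRM report

matched, missing = match_marketo_ids(billing_list, marketo_export)
print(matched)  # prints [101]
print(missing)  # prints ['003Z9999']
print(chunk(billing_list, 1))  # prints [['003A0001'], ['003Z9999']]
```

Matching on the ID picks the Contact (101) and ignores the Lead with the same email, and the `missing` list tells you which CRM records have not synced at all.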
Why Duplicates Matter to You
Every record in your database costs you money and time in some way. A single record may cost a fraction of a cent in some databases. In most MAPs that charge by the record, you are charged whether or not that record has value to you. Per-record fees may be $0.05 to $0.20! That matters because, in my experience, most B2B marketing databases have about 18% useful prospect records and perhaps 60% bad or inactive records that just sit there. If you have 100,000 records, that’s 60,000 records of deadweight; at $0.20 per record, that’s $12,000 per year. The rule of thumb bandied about is that 25% of your database goes bad every year.
Beyond the record fees, duplicates and bad data hurt you in several other ways:
- Clutter – not being able to get accurate counts for segmentations.
- Compounding Bad Data with Sales – salespeople update the wrong records and chaos ensues.
- Spam Law Compliance – issues with duplicates not having the right permissions could end up as a nasty letter or legal action.
- Vendor Costs – most vendors use the number of records as a pricing scale, even if you have duplicates. Managing this is your problem, not the vendor’s.
- Email Costs – if you use a vendor that prices per email, then you will definitely have an increased cost.
- Inaccurate reports – incorrect records invariably lead to questionable reports and bad decisions.
- Double Purchases – bad data quality and dupes often mean Salespeople (and you) will buy lists that have the same people.
There are tons of studies by data vendors that make the case even more clearly. A 2011 Gartner study suggested poor data quality lowered productivity by 20%. And SiriusDecisions (and data processing vendors) have shown how bad data compounds over time.
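The deadweight figure from earlier in this section is simple arithmetic, and it is worth running with your own numbers. Here is a small sketch using the 100,000-record example, the 60% bad-record share, and the $0.05 to $0.20 fee range quoted above. The function name is hypothetical, and the fees are treated as annual per-record fees.

```python
def deadweight_cost(total_records, bad_share, fee_per_record):
    """Yearly spend on records that are bad or inactive."""
    return total_records * bad_share * fee_per_record


low = deadweight_cost(100_000, 0.60, 0.05)
high = deadweight_cost(100_000, 0.60, 0.20)
print(f"${low:,.0f} to ${high:,.0f} per year")  # prints $3,000 to $12,000 per year
```

Swap in your own record count, your contract's per-record fee, and your best estimate of the bad-record share.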
Hunting Down Sources and Preventing Dupes
You can prevent duplicates. One way is to map out how data enters your systems and develop processes for deciding which systems can create records or find and append records. The steps I would take include the following:
Map of All Data Entry Methods:
- Create in CRM
- Import into CRM
- Product database
- List Import into your MAP
- Salespeople/Manual Creation
- Lead fills out a form
Then decide which sources are permitted and who is permitted to create records.
- Which system is the Source of Truth that trumps other sources?
- Which records will win during a merge?
Those people and systems then need a process to ensure they attempt to identify dupes and handle them. You can create a hierarchy of record scenarios to build automation rules.
If you are looking for specific rules for which records and fields should win, here are some options to think about:
- Prefer the older record over the newer one.
- Prefer the more complete record over the less complete one.
- Prefer the Source of Truth record over an incoming record.
- Choose the existing data rather than the purchased data (if not empty).
- Choose the Customer record over the Lead record.
- Choose the business email record over the matched Gmail record.
- Choose the data from Data Vendor 1 over Data Vendor 2.
- Choose the most recently updated field over the older field.
- Average the scores (although Marketo adds them).
- Use the score of the more recent lead.
- Choose the record that is furthest down the funnel.
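A few of the rules above can be combined into a simple survivorship routine. The sketch below is one possible ordering and not a recommendation: Customer beats Lead, then the more complete record wins, then the older one. Existing non-empty values are kept, and scores are added together. The record layout and function names are hypothetical.

```python
from datetime import date


def completeness(record):
    """Count the fields that actually hold a value."""
    return sum(1 for v in record["fields"].values() if v not in (None, ""))


def pick_winner(a, b):
    """Return (winner, loser) using an assumed rule order."""
    def rank(r):
        return (
            r["type"] == "Customer",    # Customer record beats Lead record
            completeness(r),            # then the more complete record
            -r["created"].toordinal(),  # then the older record
        )
    return (a, b) if rank(a) >= rank(b) else (b, a)


def merge(a, b):
    """Merge two dupes: keep the winner's data, fill its gaps from the loser."""
    winner, loser = pick_winner(a, b)
    fields = dict(winner["fields"])
    for name, value in loser["fields"].items():
        if fields.get(name) in (None, ""):
            fields[name] = value  # only fill empty fields, never overwrite
    # Scores are summed here; swap in an average or "most recent" rule if preferred.
    return {**winner, "fields": fields, "score": winner["score"] + loser["score"]}


lead = {
    "id": "L1", "type": "Lead", "created": date(2014, 1, 5), "score": 30,
    "fields": {"email": "pat@example.com", "phone": "555-0100", "title": "VP Marketing"},
}
customer = {
    "id": "C1", "type": "Customer", "created": date(2015, 2, 1), "score": 10,
    "fields": {"email": "pat@example.com", "phone": "", "title": "Director"},
}
merged = merge(lead, customer)
print(merged["id"], merged["fields"]["phone"], merged["score"])  # prints C1 555-0100 40
```

Here the Customer record survives and keeps its own title, picks up the phone number it was missing from the Lead, and ends up with the combined score.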
Blocking Dupe Creation
- Use a CRM tool like Dupeblocker to enforce a search of the database before creation.
- Use automation to match records on unique keys across systems before creation.
- Use Email Address deduplication in your MAP.
- Block Salespeople from record creation.
- End use of SFDC Leads and only use Account-Contacts. (Much harder!).
- Sync your CRM-MAP with all records. Not syncing all records ensures dupe creation.
At some point, you will have to accept that duplicates will exist and settle on a tolerable threshold. My personal rule of thumb is that you are doing very well if no more than 3% of your database is dupes by email address. The rate of duplicates by Name or other fuzzy criteria may be slightly higher, and you will need more powerful tools to identify that count. I have often come across duplicates where the company had multiple domains and Company Names, which would be very difficult to fully identify without knowing their particular case.
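To check yourself against the 3% rule of thumb, you need a consistent way to count dupes by email address. The sketch below uses one possible definition, which is an assumption on my part: surplus records (every record beyond the first for a given email) divided by all records that have an email. The function name and sample addresses are made up.

```python
from collections import Counter


def duplicate_rate(emails):
    """Share of records that are surplus copies of an email address."""
    # Normalize so "A@x.com " and "a@x.com" count as the same address.
    normalized = [e.strip().lower() for e in emails if e and e.strip()]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    surplus = sum(n - 1 for n in counts.values())
    return surplus / len(normalized)


emails = ["a@x.com", "A@x.com ", "b@x.com", "c@x.com", ""]
rate = duplicate_rate(emails)
print(f"{rate:.0%}")  # prints 25%
print(rate <= 0.03)   # prints False: above the 3% threshold
```

Run it against an export of your full database, not a single list, since dupes often sit in different segments.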
At the end of the day, your team must weigh how much bad data is costing you against the cost of setting up an automated process to clean the database. For smaller databases, some simple rules plus email address deduping will be enough. For databases over 1 million records, I personally recommend an automated solution. The earlier you can do this, the less frustration you will have in the future.
Image Credit: barbourians