Monthly Archives: June 2016

Duplicate Rule Woes

A long time ago, I implemented duplicate checking in Apex for custom object Foo__c. I decided it was time to use SFDC’s out-of-the-box Duplicate Rules so I could move the logic into point-and-click configuration and clean up the code.

Good idea eh?

Well, sort of. There are some considerations before you jump into this.

Starting condition:
My existing Apex logic checked for duplicates intrabatch as well as extrabatch. Meaning, if two Foo__c‘s with the same key appeared in the same trigger set, they were both flagged as duplicate errors. Similarly, if any Foo__c within the batch matched an existing Foo__c outside of the batch, it would be flagged as an error.

Consideration (coding)

  • Unfortunately, SFDC duplicate rules won’t block intrabatch duplicates. This is documented in the Help
  • Doubly unfortunate, once you insert in a batch duplicate Foos, if you edit one of the duplicate Foos without changing the matching key field, SFDC won’t check it against the other Foo with the same key. For example, if you bulk uploaded in one batch two Foos, each with key ‘Bar’, SFDC doesn’t detect duplicates. When you go to edit one of the Foos with key ‘Bar’ and change any field other than the matching key, SFDC won’t tell you that Foo i with key ‘Bar’ is the same as existing Foo j with key ‘Bar’.

That said, you do get to eliminate any Apex code that does SOQL to check for duplicates extrabatch.

Workaround
If you really want to block Foos with the same key from getting into the database, you have to implement in your domain layer (i.e. trigger), Apex verification. No SOQL is required because all the records that have to be checked will be in Trigger.new

Consideration (deployment)
As of V37.0 (Summer 16), there is no way to deploy Duplicate Rules through ant or Change Sets. You have to manually add the Duplicate Rules in your target orgs. You can deploy MatchingRules via ant or Change Sets but that won’t do you much good as they have to be bound to a Duplicate Rule. There is an Idea worth voting up.

Rapid compare of two lists in Excel

The business problem was to mass delete Leads from Salesforce to cause a corresponding mass delete from HubSpot. The business wanted to know how many of the planned SFDC deletions were in the HubSpot SFDC sync inclusion list. The lists were very large (> 100,000)

  1. Export SFDC leads to be deleted – with email column in export
  2. Export HubSpot contacts in the SFDC integration settings inclusion list
  3. Create a new Excel workbook and place the Hubspot emails into column A and the SFDC emails in column B
  4. In column C, Write a VLOOKUP of value in column B to see if in column A, if not, error
  5. Count the non error cells in column C

Not so fast pard’ner!

Literally, Excel VLOOKUP exact match (4th argument set to false) is really sloooooowwwwwww on large spreadsheets. So, instead, you have to use VLOOKUP twice but with approximate matching and sorted lists.

Step 1 – sort (only) Column A, then sort (only) Column B
Step 2 – in cell C2, the Excel formula is (remember – HubSpot data is in column A, SFDC data in column B):

=IF(VLOOKUP(B2,$A:$A,1,TRUE)=B2, VLOOKUP(B2,$A:$A,1,TRUE), NA()) and then copy the formula down for all rows in column B (SFDC leads).

This runs lightning fast (the alternative VLOOKUP exact match would be in the minutes on a 64 bit quad processor 32 GB Windows 7 machine).

Why does this work so well?
When VLOOKUP uses approximate matching on sorted lists, it stops once it finds the match (or the very next value). So, the IF condition tests to see if VLOOKUP returns the same value as the lookup key, if yes, the true condition simply returns the lookup results again – because we know it was an exact match. If false, return the #N/A value because we know it was not an equal match.

Now, I could have done this faster with the following:

=IF(VLOOKUP(B2,$A:$A,1,TRUE)=B2, B2, NA())

This is because the result column is the same as the lookup column. The second VLOOKUP can be used to return any column from the search array by varying the third argument. I leave the two VLOOKUPs in for future proofing the general fast VLOOKUP technique on other tables.