Category Archives: VLOOKUP

Rapid compare of two lists in Excel

The business problem was to mass delete Leads from Salesforce to cause a corresponding mass delete from HubSpot. The business wanted to know how many of the planned SFDC deletions were in the HubSpot SFDC sync inclusion list. The lists were very large (> 100,000)

  1. Export SFDC leads to be deleted – with email column in export
  2. Export HubSpot contacts in the SFDC integration settings inclusion list
  3. Create a new Excel workbook and place the Hubspot emails into column A and the SFDC emails in column B
  4. In column C, Write a VLOOKUP of value in column B to see if in column A, if not, error
  5. Count the non error cells in column C

Not so fast pard’ner!

Literally, Excel VLOOKUP exact match (4th argument set to false) is really sloooooowwwwwww on large spreadsheets. So, instead, you have to use VLOOKUP twice but with approximate matching and sorted lists.

Step 1 – sort (only) Column A, then sort (only) Column B
Step 2 – in cell C2, the Excel formula is (remember – HubSpot data is in column A, SFDC data in column B):

=IF(VLOOKUP(B2,$A:$A,1,TRUE)=B2, VLOOKUP(B2,$A:$A,1,TRUE), NA()) and then copy the formula down for all rows in column B (SFDC leads).

This runs lightning fast (the alternative VLOOKUP exact match would be in the minutes on a 64 bit quad processor 32 GB Windows 7 machine).

Why does this work so well?
When VLOOKUP uses approximate matching on sorted lists, it stops once it finds the match (or the very next value). So, the IF condition tests to see if VLOOKUP returns the same value as the lookup key, if yes, the true condition simply returns the lookup results again – because we know it was an exact match. If false, return the #N/A value because we know it was not an equal match.

Now, I could have done this faster with the following:

=IF(VLOOKUP(B2,$A:$A,1,TRUE)=B2, B2, NA())

This is because the result column is the same as the lookup column. The second VLOOKUP can be used to return any column from the search array by varying the third argument. I leave the two VLOOKUPs in for future proofing the general fast VLOOKUP technique on other tables.