Category Archives: Excel

Rapid compare of two lists in Excel

The business problem was to mass delete Leads from Salesforce to cause a corresponding mass delete from HubSpot. The business wanted to know how many of the planned SFDC deletions were in the HubSpot SFDC sync inclusion list. The lists were very large (> 100,000)

  1. Export SFDC leads to be deleted – with email column in export
  2. Export HubSpot contacts in the SFDC integration settings inclusion list
  3. Create a new Excel workbook and place the Hubspot emails into column A and the SFDC emails in column B
  4. In column C, Write a VLOOKUP of value in column B to see if in column A, if not, error
  5. Count the non error cells in column C

Not so fast pard’ner!

Literally, Excel VLOOKUP exact match (4th argument set to false) is really sloooooowwwwwww on large spreadsheets. So, instead, you have to use VLOOKUP twice but with approximate matching and sorted lists.

Step 1 – sort (only) Column A, then sort (only) Column B
Step 2 – in cell C2, the Excel formula is (remember – HubSpot data is in column A, SFDC data in column B):

=IF(VLOOKUP(B2,$A:$A,1,TRUE)=B2, VLOOKUP(B2,$A:$A,1,TRUE), NA()) and then copy the formula down for all rows in column B (SFDC leads).

This runs lightning fast (the alternative VLOOKUP exact match would be in the minutes on a 64 bit quad processor 32 GB Windows 7 machine).

Why does this work so well?
When VLOOKUP uses approximate matching on sorted lists, it stops once it finds the match (or the very next value). So, the IF condition tests to see if VLOOKUP returns the same value as the lookup key, if yes, the true condition simply returns the lookup results again – because we know it was an exact match. If false, return the #N/A value because we know it was not an equal match.

Now, I could have done this faster with the following:

=IF(VLOOKUP(B2,$A:$A,1,TRUE)=B2, B2, NA())

This is because the result column is the same as the lookup column. The second VLOOKUP can be used to return any column from the search array by varying the third argument. I leave the two VLOOKUPs in for future proofing the general fast VLOOKUP technique on other tables.

Conga Composer – Excel – Array Formulas – aargh

We’re a steady user of Conga Composer to generate Excel reports for our users where we can’t get SFDC to format a report the way the users want. This week, an interesting problem arose.

Business problem
Display a count of distinct values in a set of rows matching some condition; some rows may have null values

Without getting into too much detail, we needed to count unique accounts for various conditions in the Conga-generated, merged Excel template. I couldn’t coerce the well-known ‘Power of One’ solution to work for us in the actual SFDC report that serves as the Conga source data so the decision was made to do the counting in Excel using new cell ranges below the generated Conga output.

How do you count distinct values in Excel?
If you go to the Microsoft Excel Help, you get this page.

For example, one method is to use this excel formula: =SUM(IF(FREQUENCY(MATCH(B2:B10,B2:B10,0),MATCH(B2:B10,B2:B10,0))>0,1))

Now, if you coded up this formula in the Conga template, you couldn’t directly use the cell value B10 because Conga will generate a variable number of rows based on your source data. So, you would code this in Excel as:
=SUM(IF(FREQUENCY(MATCH(B2:B2,B2:B2,0),MATCH(B2:B2,B2:B2,0))>0,1)) and rely on Conga+Excel to expand the outer range of each cell range expression to be B13 or B24 or B65 or whatever, depending on how many rows of generated data (where column B represents Excel value to be counted and rows 2…n are what Conga inserts into the merged template. I’m presuming you understand Conga merge templates.

Unfortunately, this does not work as the formula =SUM(IF(FREQUENCY(MATCH(B2:B2,B2:B2,0),MATCH(B2:B2,B2:B2,0))>0,1)) has to be entered into Excel as an array formula and Conga won’t expand/Excel array formulas by adjusting the outer range and B2:B2 expression

So, now the problem is how to count distinct values without using array formulas.

The solution that worked for us (with limitations)
Some Google searching yielded this count distinct formula that does not rely on Excel array formulas:

Geez, how does this work? I’m used to the second argument of COUNTIF being some expression that is tested against the elements of the first argument but here we are comparing the list to itself. The MSFT documentation of COUNTIF doesn’t have such an example.

Well, I turned to the handy Excel formula expression evaluator in my Windows Excel 2010. Take a look at the picture below (click to see full size so you can read it):
Count distinct formula evaluation

As you can see from the picture and successive evaluations, the COUNTIF(list1,list1) looks at each element in the list and sees how many times it appears in itself, returning an array of the same size as List1. Then the reciprocal is taken of each resulting array element.

So, if a value X appears twice in the range, the resulting array will be {2,..,2,…} and thus the reciprocal array is {.5,..,.5,…}. If also a value Y appears in the range three times, the resulting array will be {2,3,3,…2,…}. Positions are of course arbitrary as there is no assumption made about sorting. And the reciprocal array would be {.5,.333,.333,…5,…}

Finally, SUMPRODUCT, because it has only one array, doesn’t multiply anything but does add up the values in the reciprocal array. Adding .5 twice is 1; Adding .33 three times is 1. Summing 1 and 1 is 2 – two unique elements

As you can see from the picture, the result ‘4’ is the number of unique Account Names. pretty cool and no array formula required.

But what happens if you have nulls in the range of Account Names?
The expression will return #DIV/0. Sigh.

The solution is to recognize that if the comparison array in COUNTIF is guaranteed to have a non-null value, then the counts will work. Look at this picture:

Count distinct with null formula evaluation

Note the concatenation operator ‘&’. This appends a zero length string to the second list in the COUNTIF. Thus your final result will include a count of distinct Account names plus 1 for the null cells. If you are guaranteed there will be null cells (and we were due to Conga subtotal lines inserted), you just subtract one from the final result of the =SUMPRODUCT(…) formula.