We’re a steady user of Conga Composer to generate Excel reports for our users where we can’t get SFDC to format a report the way the users want. This week, an interesting problem arose.
Business problem
Display a count of distinct values in a set of rows matching some condition; some rows may have null values
Background
Without getting into too much detail, we needed to count unique accounts for various conditions in the Conga-generated, merged Excel template. I couldn’t coerce the well-known ‘Power of One’ solution to work for us in the actual SFDC report that serves as the Conga source data so the decision was made to do the counting in Excel using new cell ranges below the generated Conga output.
How do you count distinct values in Excel?
If you go to the Microsoft Excel Help, you get this page.
For example, one method is to use this excel formula: =SUM(IF(FREQUENCY(MATCH(B2:B10,B2:B10,0),MATCH(B2:B10,B2:B10,0))>0,1))
Now, if you coded up this formula in the Conga template, you couldn’t directly use the cell value B10 because Conga will generate a variable number of rows based on your source data. So, you would code this in Excel as:
=SUM(IF(FREQUENCY(MATCH(B2:B2,B2:B2,0),MATCH(B2:B2,B2:B2,0))>0,1))
and rely on Conga+Excel to expand the outer range of each cell range expression to be B13 or B24 or B65 or whatever, depending on how many rows of generated data (where column B represents Excel value to be counted and rows 2…n are what Conga inserts into the merged template. I’m presuming you understand Conga merge templates.
Unfortunately, this does not work as the formula =SUM(IF(FREQUENCY(MATCH(B2:B2,B2:B2,0),MATCH(B2:B2,B2:B2,0))>0,1))
has to be entered into Excel as an array formula and Conga won’t expand/Excel array formulas by adjusting the outer range and B2:B2 expression
So, now the problem is how to count distinct values without using array formulas.
The solution that worked for us (with limitations)
Some Google searching yielded this count distinct formula that does not rely on Excel array formulas:
=SUMPRODUCT(1/COUNTIF(List1, List1))
Geez, how does this work? I’m used to the second argument of COUNTIF being some expression that is tested against the elements of the first argument but here we are comparing the list to itself. The MSFT documentation of COUNTIF doesn’t have such an example.
Well, I turned to the handy Excel formula expression evaluator in my Windows Excel 2010. Take a look at the picture below (click to see full size so you can read it):
As you can see from the picture and successive evaluations, the COUNTIF(list1,list1) looks at each element in the list and sees how many times it appears in itself, returning an array of the same size as List1. Then the reciprocal is taken of each resulting array element.
So, if a value X appears twice in the range, the resulting array will be {2,..,2,…} and thus the reciprocal array is {.5,..,.5,…}. If also a value Y appears in the range three times, the resulting array will be {2,3,3,…2,…}. Positions are of course arbitrary as there is no assumption made about sorting. And the reciprocal array would be {.5,.333,.333,…5,…}
Finally, SUMPRODUCT, because it has only one array, doesn’t multiply anything but does add up the values in the reciprocal array. Adding .5 twice is 1; Adding .33 three times is 1. Summing 1 and 1 is 2 – two unique elements
As you can see from the picture, the result ‘4’ is the number of unique Account Names. pretty cool and no array formula required.
But what happens if you have nulls in the range of Account Names?
The expression will return #DIV/0. Sigh.
The solution is to recognize that if the comparison array in COUNTIF is guaranteed to have a non-null value, then the counts will work. Look at this picture:
Note the concatenation operator ‘&’. This appends a zero length string to the second list in the COUNTIF. Thus your final result will include a count of distinct Account names plus 1 for the null cells. If you are guaranteed there will be null cells (and we were due to Conga subtotal lines inserted), you just subtract one from the final result of the =SUMPRODUCT(…) formula.