Identifying and ranking repeat values in an Excel data series
Use Excel to develop a running tally of how many times each particular value in a data series has been repeated.
I have an Excel spreadsheet where some of the number values are repeated at various points down the list. It’s easy to determine how many times a certain value is repeated within the data set via a COUNTIF
function, but how can I determine which repeat each particular data point represents (ie: give it a rank)?
Consider the following data set:
ID | Barcode | Date |
---|---|---|
1 | 3434324 | 2/05/2015 |
2 | 5654656 | 2/05/2015 |
3 | 4452367 | 2/05/2015 |
4 | 6789558 | 2/05/2015 |
5 | 9993291 | 5/05/2015 |
6 | 4452367 | 7/05/2015 |
7 | 5654656 | 8/05/2015 |
8 | 4452367 | 9/05/2015 |
9 | 4452367 | 10/05/2015 |
10 | 5654656 | 11/05/2015 |
11 | 4452367 | 12/05/2015 |
12 | 6789558 | 13/05/2015 |
13 | 9993291 | 14/05/2015 |
14 | 3434324 | 16/04/2015 |
As can be seen, some of the barcodes repeat over time. What I need to do is insert another column that indicates the repeat number (rank) of each barcode in the data set. So, for instance, barcode 3434324 at ID=1 would be 1 (first instance) but at ID=14 it would be 2 (second instance).
For this exercise, I shall use an array formula in a table that has been “formatted as table“.
Part 1: Tell Excel where to count
In order to create our tally, a count needs to be made of the number of times that a chosen value has occurred between the first row of the table and the current row. The current row is determined by ROW()
.
We are going to dynamically generate the cell ranges to search using the ADDRESS
and INDIRECT
functions. Since our table has headers in Row 1, we need to start counting our barcodes from Row 2:
=ADDRESS(2,COLUMN([Barcode]))
This returns the value $B$2
, which is where our counting will commence. Our counting should conclude on the current row, hence:
=ADDRESS(ROW(),COLUMN([Barcode]))
Using an INDIRECT
function, we can turn these into a cell range (separated by a colon):
INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))
As an example for Row 5, the dynamically-generated range would be $B$2:$B$5.
Part 2: Tell Excel what to count
We’ll use the COUNT
function to count the barcodes in our range. We need to compare the barcode in the current row with all other barcodes in the range, confirm matches and then only count against those. To do this, we’ll use an IF
statement. Create a new column called “Repeat Tally” and insert the following formula where “Barcode” is the name of the column of data that we’re searching:
=COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))))
Because this is an array formula, it won’t work simply by typing it into the formula bar and pressing enter. The array formula is activated by pressing Ctrl + Shift + Enter. You will then notice that the curly braces are applied to the formula, indicating that it’s an array:
={COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))))}
The repeat instance (rank) of each barcode will be displayed in our new “Repeat Tally” column.
Part 3: Fix the error
This works well, except for the very first row where the first instance of barcode 3434324 is 2. Wrong!
ID | Barcode | Date | Repeat Tally |
---|---|---|---|
1 | 3434324 | 2/05/2015 | 2 |
2 | 5654656 | 2/05/2015 | 1 |
3 | 4452367 | 2/05/2015 | 1 |
4 | 6789558 | 2/05/2015 | 1 |
5 | 9993291 | 5/05/2015 | 1 |
6 | 4452367 | 7/05/2015 | 2 |
7 | 5654656 | 8/05/2015 | 2 |
8 | 4452367 | 9/05/2015 | 3 |
9 | 4452367 | 10/05/2015 | 4 |
10 | 5654656 | 11/05/2015 | 3 |
11 | 4452367 | 12/05/2015 | 5 |
12 | 6789558 | 13/05/2015 | 2 |
13 | 9993291 | 14/05/2015 | 2 |
14 | 3434324 | 16/04/2015 | 2 |
To fix this, we add a simple conditional statement to our formula:
=IF(ROW()=2, 1, COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))
This formula says that if the row number is 2, make the value 1 (it cannot be anything else), otherwise calculate the formula as per normal. Don’t forget to press Ctrl + Shift + Enter to activate the array!
={IF(ROW()=2, 1, COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))}
Now everything is correct:
ID | Barcode | Date | Repeat Tally |
---|---|---|---|
1 | 3434324 | 2/05/2015 | 1 |
2 | 5654656 | 2/05/2015 | 1 |
3 | 4452367 | 2/05/2015 | 1 |
4 | 6789558 | 2/05/2015 | 1 |
5 | 9993291 | 5/05/2015 | 1 |
6 | 4452367 | 7/05/2015 | 2 |
7 | 5654656 | 8/05/2015 | 2 |
8 | 4452367 | 9/05/2015 | 3 |
9 | 4452367 | 10/05/2015 | 4 |
10 | 5654656 | 11/05/2015 | 3 |
11 | 4452367 | 12/05/2015 | 5 |
12 | 6789558 | 13/05/2015 | 2 |
13 | 9993291 | 14/05/2015 | 2 |
14 | 3434324 | 16/04/2015 | 2 |
Optional Part 4: Add Multiple Factors
It is possible to add additional factors to our equation. Let’s say that we added another parameter to our data set (customer’s name):
ID | Barcode | Name | Date |
---|---|---|---|
1 | 3434324 | Jane | 2/05/2015 |
2 | 5654656 | Paul | 2/05/2015 |
3 | 4452367 | Sarah | 2/05/2015 |
4 | 6789558 | John | 2/05/2015 |
5 | 9993291 | Emily | 5/05/2015 |
6 | 4452367 | Paul | 7/05/2015 |
7 | 5654656 | Paul | 8/05/2015 |
8 | 4452367 | Jane | 9/05/2015 |
9 | 4452367 | Paul | 10/05/2015 |
10 | 5654656 | Sarah | 11/05/2015 |
11 | 4452367 | Paul | 12/05/2015 |
12 | 6789558 | Peter | 13/05/2015 |
13 | 9993291 | Emily | 14/05/2015 |
14 | 3434324 | Emily | 16/04/2015 |
Now I want the repeat tally to tell me how many times a barcode has been repeated for each customer over time. To do this, we simply multiply the factors in our IF
statement:
=IF(ROW()=2, 1, COUNT(IF(([Barcode]=[@Barcode])*([Name]=[@Name]), INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))
The output of the repeat tally is amended to reflect the changes to our parameters:
ID | Barcode | Name | Date | Repeat Tally |
---|---|---|---|---|
1 | 3434324 | Jane | 2/05/2015 | 1 |
2 | 5654656 | Paul | 2/05/2015 | 1 |
3 | 4452367 | Sarah | 2/05/2015 | 1 |
4 | 6789558 | John | 2/05/2015 | 1 |
5 | 9993291 | Emily | 5/05/2015 | 1 |
6 | 4452367 | Paul | 7/05/2015 | 1 |
7 | 5654656 | Paul | 8/05/2015 | 2 |
8 | 4452367 | Jane | 9/05/2015 | 1 |
9 | 4452367 | Paul | 10/05/2015 | 2 |
10 | 5654656 | Sarah | 11/05/2015 | 1 |
11 | 4452367 | Paul | 12/05/2015 | 3 |
12 | 6789558 | Peter | 13/05/2015 | 1 |
13 | 9993291 | Emily | 14/05/2015 | 2 |
14 | 3434324 | Emily | 16/04/2015 | 1 |
In simple terms
There’s no doubt that this formula is complicated. As a means of summary, I have replaced the variables of this formula with square brackets and a letter inside:
={IF(ROW()=2, 1, COUNT(IF([A]=[B], INDIRECT(ADDRESS(2,COLUMN([C]))&":"&ADDRESS(ROW(),COLUMN([C]))))))}
- [A] = The individual value that is to be ranked
- [B] = The criterion that the value needs to meet in order to be matched and ranked
- [C] = The column that contains the range of values to be tallied
To add multiple factors, amend the equation:
={IF(ROW()=2, 1, COUNT(IF(([A]=[B])*([D]=[E]), INDIRECT(ADDRESS(2,COLUMN([C]))&":"&ADDRESS(ROW(),COLUMN([C]))))))}
- [A] = The individual value that is to be ranked
- [B] = The criterion that the value needs to meet in order to be matched and ranked
- [C] = The column that contains the range of values to be tallied
- [D] = The second parameter that is to be considered
- [E] = The series that the second parameter needs to matched against
Please note that only number sequences can be counted using this technique.
Comments
2 responses to “Identifying and ranking repeat values in an Excel data series”
I have following excel file
A B
1 25 Apr-2015
2 25 Apr-2015
3 25 Apr-2015
4 26 May-2015
5 26 May-2015
6 26 May-2015
7 27 Jun-2015
8 27 Jun-2015
I want find out by formula that first 25 and Apr-2015 is cell address A1 and last cell address a3
Thanks
I was looking for exactly this but found your formula too complex and then i stumbled upon such a simpler method found here:
Look for heading “Count The Order Of Occurrence Of Duplicates In Excel”
https://www.extendoffice.com/documents/excel/1499-count-duplicate-values-in-column.html
Please comment if your method has an advantage so i would rather study yours and save for future ref.
Thank you,
M. Mudassir