# Identifying and ranking repeat values in an Excel data series

I have an Excel spreadsheet where some of the number values are repeated at various points down the list. It’s easy to determine how many times a certain value is repeated within the data set via a `COUNTIF`

function, but how can I determine *which* repeat each particular data point represents (ie: give it a rank)?

Consider the following data set:

ID | Barcode | Date |
---|---|---|

1 | 3434324 | 2/05/2015 |

2 | 5654656 | 2/05/2015 |

3 | 4452367 | 2/05/2015 |

4 | 6789558 | 2/05/2015 |

5 | 9993291 | 5/05/2015 |

6 | 4452367 | 7/05/2015 |

7 | 5654656 | 8/05/2015 |

8 | 4452367 | 9/05/2015 |

9 | 4452367 | 10/05/2015 |

10 | 5654656 | 11/05/2015 |

11 | 4452367 | 12/05/2015 |

12 | 6789558 | 13/05/2015 |

13 | 9993291 | 14/05/2015 |

14 | 3434324 | 16/04/2015 |

As can be seen, some of the barcodes repeat over time. What I need to do is insert another column that indicates the repeat number (rank) of each barcode in the data set. So, for instance, barcode 3434324 at ID=1 would be 1 (first instance) but at ID=14 it would be 2 (second instance).

For this exercise, I shall use an array formula in a table that has been “formatted as table“.

## Part 1: Tell Excel *where* to count

In order to create our tally, a count needs to be made of the number of times that a chosen value has occurred between the *first* row of the table and the *current* row. The current row is determined by `ROW()`

.

We are going to dynamically generate the cell ranges to search using the `ADDRESS`

and `INDIRECT`

functions. Since our table has headers in Row 1, we need to start counting our barcodes from Row 2:

`=ADDRESS(2,COLUMN([Barcode]))`

This returns the value `$B$2`

, which is where our counting will commence. Our counting should conclude on the current row, hence:

`=ADDRESS(ROW(),COLUMN([Barcode]))`

Using an `INDIRECT`

function, we can turn these into a cell range (separated by a colon):

`INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))`

As an example for Row 5, the dynamically-generated range would be $B$2:$B$5.

## Part 2: Tell Excel *what* to count

We’ll use the `COUNT`

function to count the barcodes in our range. We need to compare the barcode in the current row with all other barcodes in the range, confirm matches and then only count against those. To do this, we’ll use an `IF`

statement. Create a new column called “Repeat Tally” and insert the following formula where “Barcode” is the name of the column of data that we’re searching:

`=COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))))`

Because this is an array formula, it won’t work simply by typing it into the formula bar and pressing enter. The array formula is activated by pressing Ctrl + Shift + Enter. You will then notice that the curly braces are applied to the formula, indicating that it’s an array:

`={COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))))}`

The repeat instance (rank) of each barcode will be displayed in our new “Repeat Tally” column.

## Part 3: Fix the error

This works well, except for the very first row where the first instance of barcode 3434324 is 2. *Wrong!*

ID | Barcode | Date | Repeat Tally |
---|---|---|---|

1 | 3434324 | 2/05/2015 | 2 |

2 | 5654656 | 2/05/2015 | 1 |

3 | 4452367 | 2/05/2015 | 1 |

4 | 6789558 | 2/05/2015 | 1 |

5 | 9993291 | 5/05/2015 | 1 |

6 | 4452367 | 7/05/2015 | 2 |

7 | 5654656 | 8/05/2015 | 2 |

8 | 4452367 | 9/05/2015 | 3 |

9 | 4452367 | 10/05/2015 | 4 |

10 | 5654656 | 11/05/2015 | 3 |

11 | 4452367 | 12/05/2015 | 5 |

12 | 6789558 | 13/05/2015 | 2 |

13 | 9993291 | 14/05/2015 | 2 |

14 | 3434324 | 16/04/2015 | 2 |

To fix this, we add a simple conditional statement to our formula:

`=IF(ROW()=2, 1, COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))`

This formula says that if the row number is 2, make the value 1 (it cannot be anything else), otherwise calculate the formula as per normal. Don’t forget to press Ctrl + Shift + Enter to activate the array!

`={IF(ROW()=2, 1, COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))}`

Now everything is correct:

ID | Barcode | Date | Repeat Tally |
---|---|---|---|

1 | 3434324 | 2/05/2015 | 1 |

2 | 5654656 | 2/05/2015 | 1 |

3 | 4452367 | 2/05/2015 | 1 |

4 | 6789558 | 2/05/2015 | 1 |

5 | 9993291 | 5/05/2015 | 1 |

6 | 4452367 | 7/05/2015 | 2 |

7 | 5654656 | 8/05/2015 | 2 |

8 | 4452367 | 9/05/2015 | 3 |

9 | 4452367 | 10/05/2015 | 4 |

10 | 5654656 | 11/05/2015 | 3 |

11 | 4452367 | 12/05/2015 | 5 |

12 | 6789558 | 13/05/2015 | 2 |

13 | 9993291 | 14/05/2015 | 2 |

14 | 3434324 | 16/04/2015 | 2 |

## Optional Part 4: Add Multiple Factors

It is possible to add *additional* factors to our equation. Let’s say that we added another parameter to our data set (customer’s name):

ID | Barcode | Name | Date |
---|---|---|---|

1 | 3434324 | Jane | 2/05/2015 |

2 | 5654656 | Paul | 2/05/2015 |

3 | 4452367 | Sarah | 2/05/2015 |

4 | 6789558 | John | 2/05/2015 |

5 | 9993291 | Emily | 5/05/2015 |

6 | 4452367 | Paul | 7/05/2015 |

7 | 5654656 | Paul | 8/05/2015 |

8 | 4452367 | Jane | 9/05/2015 |

9 | 4452367 | Paul | 10/05/2015 |

10 | 5654656 | Sarah | 11/05/2015 |

11 | 4452367 | Paul | 12/05/2015 |

12 | 6789558 | Peter | 13/05/2015 |

13 | 9993291 | Emily | 14/05/2015 |

14 | 3434324 | Emily | 16/04/2015 |

Now I want the repeat tally to tell me how many times a barcode has been repeated for each customer over time. To do this, we simply multiply the factors in our `IF`

statement:

`=IF(ROW()=2, 1, COUNT(IF(([Barcode]=[@Barcode])*([Name]=[@Name]), INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))`

The output of the repeat tally is amended to reflect the changes to our parameters:

ID | Barcode | Name | Date | Repeat Tally |
---|---|---|---|---|

1 | 3434324 | Jane | 2/05/2015 | 1 |

2 | 5654656 | Paul | 2/05/2015 | 1 |

3 | 4452367 | Sarah | 2/05/2015 | 1 |

4 | 6789558 | John | 2/05/2015 | 1 |

5 | 9993291 | Emily | 5/05/2015 | 1 |

6 | 4452367 | Paul | 7/05/2015 | 1 |

7 | 5654656 | Paul | 8/05/2015 | 2 |

8 | 4452367 | Jane | 9/05/2015 | 1 |

9 | 4452367 | Paul | 10/05/2015 | 2 |

10 | 5654656 | Sarah | 11/05/2015 | 1 |

11 | 4452367 | Paul | 12/05/2015 | 3 |

12 | 6789558 | Peter | 13/05/2015 | 1 |

13 | 9993291 | Emily | 14/05/2015 | 2 |

14 | 3434324 | Emily | 16/04/2015 | 1 |

## In simple terms

There’s no doubt that this formula is complicated. As a means of summary, I have replaced the variables of this formula with square brackets and a letter inside:

`={IF(ROW()=2, 1, COUNT(IF([A]=[B], INDIRECT(ADDRESS(2,COLUMN([C]))&":"&ADDRESS(ROW(),COLUMN([C]))))))}`

- [A] = The individual value that is to be ranked
- [B] = The
*criterion*that the value needs to meet in order to be matched and ranked - [C] = The
*column*that contains the range of values to be tallied

To add multiple factors, amend the equation:

`={IF(ROW()=2, 1, COUNT(IF(([A]=[B])*([D]=[E]), INDIRECT(ADDRESS(2,COLUMN([C]))&":"&ADDRESS(ROW(),COLUMN([C]))))))}`

- [A] = The individual value that is to be ranked
- [B] = The
*criterion*that the value needs to meet in order to be matched and ranked - [C] = The
*column*that contains the range of values to be tallied - [D] = The second parameter that is to be considered
- [E] = The
*series*that the second parameter needs to matched against

Please note that only number sequences can be counted using this technique.

## Comments:

2 responses to “Identifying and ranking repeat values in an Excel data series”

Written byNitin Patelon19 September 2015: +I have following excel file

A B

1 25 Apr-2015

2 25 Apr-2015

3 25 Apr-2015

4 26 May-2015

5 26 May-2015

6 26 May-2015

7 27 Jun-2015

8 27 Jun-2015

I want find out by formula that first 25 and Apr-2015 is cell address A1 and last cell address a3

Thanks

Written byMudassiron22 June 2018: +I was looking for exactly this but found your formula too complex and then i stumbled upon such a simpler method found here:

Look for heading “Count The Order Of Occurrence Of Duplicates In Excel”https://www.extendoffice.com/documents/excel/1499-count-duplicate-values-in-column.html

Please comment if your method has an advantage so i would rather study yours and save for future ref.

Thank you,

M. Mudassir