Skip to content

Identifying and ranking repeat values in an Excel data series

G9th July 2015

C2 Comments

TGeneral

Use Excel to develop a running tally of how many times each particular value in a data series has been repeated.

I have an Excel spreadsheet where some of the number values are repeated at various points down the list. It’s easy to determine how many times a certain value is repeated within the data set via a COUNTIF function, but how can I determine which repeat each particular data point represents (ie: give it a rank)?

Consider the following data set:

ID Barcode Date
1 3434324 2/05/2015
2 5654656 2/05/2015
3 4452367 2/05/2015
4 6789558 2/05/2015
5 9993291 5/05/2015
6 4452367 7/05/2015
7 5654656 8/05/2015
8 4452367 9/05/2015
9 4452367 10/05/2015
10 5654656 11/05/2015
11 4452367 12/05/2015
12 6789558 13/05/2015
13 9993291 14/05/2015
14 3434324 16/04/2015

As can be seen, some of the barcodes repeat over time. What I need to do is insert another column that indicates the repeat number (rank) of each barcode in the data set. So, for instance, barcode 3434324 at ID=1 would be 1 (first instance) but at ID=14 it would be 2 (second instance).

For this exercise, I shall use an array formula in a table that has been “formatted as table“.

Part 1: Tell Excel where to count

In order to create our tally, a count needs to be made of the number of times that a chosen value has occurred between the first row of the table and the current row. The current row is determined by ROW().

We are going to dynamically generate the cell ranges to search using the ADDRESS and INDIRECT functions. Since our table has headers in Row 1, we need to start counting our barcodes from Row 2:

=ADDRESS(2,COLUMN([Barcode]))

This returns the value $B$2, which is where our counting will commence. Our counting should conclude on the current row, hence:

=ADDRESS(ROW(),COLUMN([Barcode]))

Using an INDIRECT function, we can turn these into a cell range (separated by a colon):

INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))

As an example for Row 5, the dynamically-generated range would be $B$2:$B$5.

Part 2: Tell Excel what to count

We’ll use the COUNT function to count the barcodes in our range. We need to compare the barcode in the current row with all other barcodes in the range, confirm matches and then only count against those. To do this, we’ll use an IF statement. Create a new column called “Repeat Tally” and insert the following formula where “Barcode” is the name of the column of data that we’re searching:

=COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))))

Because this is an array formula, it won’t work simply by typing it into the formula bar and pressing enter. The array formula is activated by pressing Ctrl + Shift + Enter. You will then notice that the curly braces are applied to the formula, indicating that it’s an array:

={COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode])))))}

The repeat instance (rank) of each barcode will be displayed in our new “Repeat Tally” column.

Part 3: Fix the error

This works well, except for the very first row where the first instance of barcode 3434324 is 2. Wrong!

ID Barcode Date Repeat Tally
1 3434324 2/05/2015 2
2 5654656 2/05/2015 1
3 4452367 2/05/2015 1
4 6789558 2/05/2015 1
5 9993291 5/05/2015 1
6 4452367 7/05/2015 2
7 5654656 8/05/2015 2
8 4452367 9/05/2015 3
9 4452367 10/05/2015 4
10 5654656 11/05/2015 3
11 4452367 12/05/2015 5
12 6789558 13/05/2015 2
13 9993291 14/05/2015 2
14 3434324 16/04/2015 2

To fix this, we add a simple conditional statement to our formula:

=IF(ROW()=2, 1, COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))

This formula says that if the row number is 2, make the value 1 (it cannot be anything else), otherwise calculate the formula as per normal. Don’t forget to press Ctrl + Shift + Enter to activate the array!

={IF(ROW()=2, 1, COUNT(IF([Barcode]=[@Barcode], INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))}

Now everything is correct:

ID Barcode Date Repeat Tally
1 3434324 2/05/2015 1
2 5654656 2/05/2015 1
3 4452367 2/05/2015 1
4 6789558 2/05/2015 1
5 9993291 5/05/2015 1
6 4452367 7/05/2015 2
7 5654656 8/05/2015 2
8 4452367 9/05/2015 3
9 4452367 10/05/2015 4
10 5654656 11/05/2015 3
11 4452367 12/05/2015 5
12 6789558 13/05/2015 2
13 9993291 14/05/2015 2
14 3434324 16/04/2015 2

Optional Part 4: Add Multiple Factors

It is possible to add additional factors to our equation. Let’s say that we added another parameter to our data set (customer’s name):

ID Barcode Name Date
1 3434324 Jane 2/05/2015
2 5654656 Paul 2/05/2015
3 4452367 Sarah 2/05/2015
4 6789558 John 2/05/2015
5 9993291 Emily 5/05/2015
6 4452367 Paul 7/05/2015
7 5654656 Paul 8/05/2015
8 4452367 Jane 9/05/2015
9 4452367 Paul 10/05/2015
10 5654656 Sarah 11/05/2015
11 4452367 Paul 12/05/2015
12 6789558 Peter 13/05/2015
13 9993291 Emily 14/05/2015
14 3434324 Emily 16/04/2015

Now I want the repeat tally to tell me how many times a barcode has been repeated for each customer over time. To do this, we simply multiply the factors in our IFstatement:

=IF(ROW()=2, 1, COUNT(IF(([Barcode]=[@Barcode])*([Name]=[@Name]), INDIRECT(ADDRESS(2,COLUMN([Barcode]))&":"&ADDRESS(ROW(),COLUMN([Barcode]))))))

The output of the repeat tally is amended to reflect the changes to our parameters:

ID Barcode Name Date Repeat Tally
1 3434324 Jane 2/05/2015 1
2 5654656 Paul 2/05/2015 1
3 4452367 Sarah 2/05/2015 1
4 6789558 John 2/05/2015 1
5 9993291 Emily 5/05/2015 1
6 4452367 Paul 7/05/2015 1
7 5654656 Paul 8/05/2015 2
8 4452367 Jane 9/05/2015 1
9 4452367 Paul 10/05/2015 2
10 5654656 Sarah 11/05/2015 1
11 4452367 Paul 12/05/2015 3
12 6789558 Peter 13/05/2015 1
13 9993291 Emily 14/05/2015 2
14 3434324 Emily 16/04/2015 1

In simple terms

There’s no doubt that this formula is complicated. As a means of summary, I have replaced the variables of this formula with square brackets and a letter inside:

={IF(ROW()=2, 1, COUNT(IF([A]=[B], INDIRECT(ADDRESS(2,COLUMN([C]))&":"&ADDRESS(ROW(),COLUMN([C]))))))}
  • [A] = The individual value that is to be ranked
  • [B] = The criterion that the value needs to meet in order to be matched and ranked
  • [C] = The column that contains the range of values to be tallied

To add multiple factors, amend the equation:

={IF(ROW()=2, 1, COUNT(IF(([A]=[B])*([D]=[E]), INDIRECT(ADDRESS(2,COLUMN([C]))&":"&ADDRESS(ROW(),COLUMN([C]))))))}
  • [A] = The individual value that is to be ranked
  • [B] = The criterion that the value needs to meet in order to be matched and ranked
  • [C] = The column that contains the range of values to be tallied
  • [D] = The second parameter that is to be considered
  • [E] = The series that the second parameter needs to matched against

Please note that only number sequences can be counted using this technique.

   

Comments:

2 responses to “Identifying and ranking repeat values in an Excel data series”

  • Written by Nitin Patel on 19 September 2015:

    I have following excel file

    A B
    1 25 Apr-2015
    2 25 Apr-2015
    3 25 Apr-2015
    4 26 May-2015
    5 26 May-2015
    6 26 May-2015
    7 27 Jun-2015
    8 27 Jun-2015

    I want find out by formula that first 25 and Apr-2015 is cell address A1 and last cell address a3

    Thanks

  • Written by Mudassir on 22 June 2018:

    I was looking for exactly this but found your formula too complex and then i stumbled upon such a simpler method found here:

    Look for heading “Count The Order Of Occurrence Of Duplicates In Excel”
    https://www.extendoffice.com/documents/excel/1499-count-duplicate-values-in-column.html

    Please comment if your method has an advantage so i would rather study yours and save for future ref.

    Thank you,
    M. Mudassir

Have your say:

The following HTML is permitted:
<a href="" title=""> <b> <blockquote cite=""> <code> <em> <i> <q cite=""> <strike> <strong>

Comments will be published subject to the Editorial Policy.