Skip to content

Applying dates to normalised data in Excel

G19th July 2015

CNo Comments

TExcel, Microsoft Office

Use this formula to apply dates to all variables in a normalised linear list data set in Excel.

In order to create Pivot Tables in Excel (and conduct other analyses), I often restructure matrix data into a “linear list”. This is variously known as unpivoting, normalising or linearising data. (An excellent explanation of this can be found on ExcelCampus).

I recently linearised (normalised) a large data set, as per the example below. Unfortunately the script that linearised the data listed the date as another variable rather than attaching it to each instance. I needed a formula to assign the correct date to each variable.

Here’s an example of the linearised data output, which was converted from a matrix:

Barcode Variable Value
1000547 Time Stamp 12/02/2015 9:51
1000547 vissv1 1012.305
1000547 vissv2 1072.168
1000547 vissv3 1228.595
1000547 vistv 1384.012
1000548 Time Stamp 12/02/2015 9:52
1000548 vissv1 1172.659
1000548 vissv2 1209.106
1000548 vissv3 1003.14
1000548 vistv 1439.781
1000552 Time Stamp 12/02/2015 9:54
1000552 vissv1 1110.405
1000552 vissv2 1099.066
1000552 vissv3 1111.265
1000552 vistv 1505.061

As can be seen, in this format it’s not possible to determine which variable was measured on which date. It is inferred because of the order in which the data is entered, but this will be lost if the data is re-sorted.

This is what I needed to achieve: A date matched to each measured variable.

Barcode Variable Value Date
1000547 vissv1 1012.305 12/02/2015
1000547 vissv2 1072.168 12/02/2015
1000547 vissv3 1228.595 12/02/2015
1000547 vistv 1384.012 12/02/2015
1000548 vissv1 1172.659 12/02/2015
1000548 vissv2 1209.106 12/02/2015
1000548 vissv3 1003.14 12/02/2015
1000548 vistv 1439.781 12/02/2015
1000552 vissv1 1110.405 12/02/2015
1000552 vissv2 1099.066 12/02/2015
1000552 vissv3 1111.265 12/02/2015
1000552 vistv 1505.061 12/02/2015

With 5000+ rows of data, doing this manually is not an option.

Identifying dates from data

In order to extract the dates, I needed to format the list as a table.

Next, I added a column called “Date” (which is shown in the second table above).

Unfortunately the date data is not actually in date format. There are two spaces between the time and date and the cell format is “custom”.

In essence, the dates are text and do not actually translate to a date value.

Excel stores dates and times as a number representing the number of days since 1 January 1900, plus a fractional portion of a 24 hour day: ddddd.tttttt. This is called a serial date. Typically, machines that generate data sets (especially in CSV format) do not enter dates in serial format (as is the case here) and so Excel cannot read the date properly. To test whether a date is entered correctly, a DATEVALUE() can be used to test this. An error indicates that the date is stored as text. This is a problem.

The formula

To extract the dates for all the entries, I needed to apply the following logic for each line of data:

  1. Is the number in the “Value” column meant to be read as a date? (eg: 12/02/2015 9:51:26 AM)
  2. If so, what is the translation of that number into a serial date?
  3. If the number in the “Value” column is not meant to be read as a date, what date should be matched to that row?

Two tests will be applied to see whether a number is a date:

  • The serial date should start with a “4” (40000 = 6 July 2009 and all dates fall after this period)
  • There should be a decimal point at the sixth position (eg: 12/02/2015 9:51:26AM = 42047.4107175926)

The chances that a number meets these criteria and isn’t a date is slim in this data set. Beware: Other data sets may require additional tests. Think carefully about this step.

The following formula will apply the test:

=IF(AND(FIND("4",[@Value],1)=1, FIND(".",[@Value],1)=6), SUM(DATEVALUE(TEXT([@Value], "dd/mm/yyyy"))+TIMEVALUE(TEXT([@Value], "hh:mm"))))

This formula applies the aforementioned test (using a combination of IF and AND functions) and extracts both the date (DATEVALUE) and time (TIMEVALUE) from the text string. A SUM function is used to add them together to derive the full serial date.

Unfortunately, the formula is not complete, because:

  • If a number in the “Value” column contains a “4” anywhere in it but can’t be read as a date, the equation will return a FALSE value
  • If a number in the “Value” column does not contain a “4” anywhere and can’t be read as a date, the equation will return a #VALUE! error

To fix these, we need to expand the equation so that under both scenarios, the equation will obtain the date from the cell above.

Here’s the final equation:

=IFERROR(IF(AND(FIND("4",[@Value],1)=1, FIND(".",[@Value],1)=6), SUM(DATEVALUE(TEXT([@Value], "dd/mm/yyyy"))+TIMEVALUE(TEXT([@Value], "hh:mm"))), INDIRECT("R[-1]", FALSE)), INDIRECT("R[-1]", FALSE))

The combination of IFERROR and IF helps deal with both scenarios. The instructions about where to get the date from are given by INDIRECT("R[-1]", FALSE) which says “get the date from the cell above” (refer to a previous blog post about INDIRECT). The INDIRECT function must be repeated twice within the formula in order to deal with both the FALSE and #VALUE! answers.

Finishing the job

The next step is to copy all of the dates and paste as values. That way, the dates and times will permanently be attached to each variable measurement.

Finally, re-sort the data and delete all the entries where “Time Stamp” appears in the “Variable” column.

   

Comments:

No comments have yet been submitted. Be the first!

Have your say:

The following HTML is permitted:
<a href="" title=""> <b> <blockquote cite=""> <code> <em> <i> <q cite=""> <strike> <strong>

Comments will be published subject to the Editorial Policy.