Applying dates to normalised data in Excel
In order to create Pivot Tables in Excel (and conduct other analyses), I often restructure matrix data into a “linear list”. This is variously known as unpivoting, normalising or linearising data. (An excellent explanation of this can be found on ExcelCampus).
I recently linearised (normalised) a large data set, as per the example below. Unfortunately the script that linearised the data listed the date as another variable rather than attaching it to each instance. I needed a formula to assign the correct date to each variable.
Here’s an example of the linearised data output, which was converted from a matrix:
|1000547||Time Stamp||12/02/2015 9:51|
|1000548||Time Stamp||12/02/2015 9:52|
|1000552||Time Stamp||12/02/2015 9:54|
As can be seen, in this format it’s not possible to determine which variable was measured on which date. It is inferred because of the order in which the data is entered, but this will be lost if the data is re-sorted.
This is what I needed to achieve: A date matched to each measured variable.
With 5000+ rows of data, doing this manually is not an option.
Identifying dates from data
In order to extract the dates, I needed to format the list as a table.
Next, I added a column called “Date” (which is shown in the second table above).
Unfortunately the date data is not actually in date format. There are two spaces between the time and date and the cell format is “custom”.
In essence, the dates are text and do not actually translate to a date value.
Excel stores dates and times as a number representing the number of days since 1 January 1900, plus a fractional portion of a 24 hour day:
ddddd.tttttt. This is called a serial date. Typically, machines that generate data sets (especially in CSV format) do not enter dates in serial format (as is the case here) and so Excel cannot read the date properly. To test whether a date is entered correctly, a
DATEVALUE() can be used to test this. An error indicates that the date is stored as text. This is a problem.
To extract the dates for all the entries, I needed to apply the following logic for each line of data:
- Is the number in the “Value” column meant to be read as a date? (
eg: 12/02/2015 9:51:26 AM)
- If so, what is the translation of that number into a serial date?
- If the number in the “Value” column is not meant to be read as a date, what date should be matched to that row?
Two tests will be applied to see whether a number is a date:
- The serial date should start with a “4” (40000 = 6 July 2009 and all dates fall after this period)
- There should be a decimal point at the sixth position (eg: 12/02/2015 9:51:26AM = 42047.4107175926)
The chances that a number meets these criteria and isn’t a date is slim in this data set. Beware: Other data sets may require additional tests. Think carefully about this step.
The following formula will apply the test:
=IF(AND(FIND("4",[@Value],1)=1, FIND(".",[@Value],1)=6), SUM(DATEVALUE(TEXT([@Value], "dd/mm/yyyy"))+TIMEVALUE(TEXT([@Value], "hh:mm"))))
This formula applies the aforementioned test (using a combination of
AND functions) and extracts both the date (
DATEVALUE) and time (
TIMEVALUE) from the text string. A
SUM function is used to add them together to derive the full serial date.
Unfortunately, the formula is not complete, because:
- If a number in the “Value” column contains a “4” anywhere in it but can’t be read as a date, the equation will return a
- If a number in the “Value” column does not contain a “4” anywhere and can’t be read as a date, the equation will return a
To fix these, we need to expand the equation so that under both scenarios, the equation will obtain the date from the cell above.
Here’s the final equation:
=IFERROR(IF(AND(FIND("4",[@Value],1)=1, FIND(".",[@Value],1)=6), SUM(DATEVALUE(TEXT([@Value], "dd/mm/yyyy"))+TIMEVALUE(TEXT([@Value], "hh:mm"))), INDIRECT("R[-1]", FALSE)), INDIRECT("R[-1]", FALSE))
The combination of
IF helps deal with both scenarios. The instructions about where to get the date from are given by
INDIRECT("R[-1]", FALSE) which says “get the date from the cell above” (refer to a previous blog post about
INDIRECT function must be repeated twice within the formula in order to deal with both the
Finishing the job
The next step is to copy all of the dates and paste as values. That way, the dates and times will permanently be attached to each variable measurement.
Finally, re-sort the data and delete all the entries where “Time Stamp” appears in the “Variable” column.