Skip to content

Dear Internet Explorer user: Your browser is no longer supported

Please switch to a modern browser such as Microsoft Edge, Mozilla Firefox or Google Chrome to view this website's content.

Cumulative totals by group in R

Use Tidyverse’s dplyr package in R to add cumulative totals to a table based on group.

This week I needed to add some cumulative totals to a dataset in R by group. This isn’t too difficult with the dplyr package from Tidyverse. To demonstrate this concept, we’ll use some timely data from the State of Victoria showing daily COVID-19 vaccination rates. Here’s the data from 1 March 2021 to 4 June 2021:

date <- c("1-Mar-21", "2-Mar-21", "3-Mar-21", "4-Mar-21", "5-Mar-21", "6-Mar-21", "7-Mar-21", "8-Mar-21", "9-Mar-21", "10-Mar-21", "11-Mar-21", "12-Mar-21", "13-Mar-21", "14-Mar-21", "15-Mar-21", "16-Mar-21", "17-Mar-21", "18-Mar-21", "19-Mar-21", "20-Mar-21", "21-Mar-21", "22-Mar-21", "23-Mar-21", "24-Mar-21", "25-Mar-21", "26-Mar-21", "27-Mar-21", "28-Mar-21", "29-Mar-21", "30-Mar-21", "31-Mar-21", "1-Apr-21", "2-Apr-21", "3-Apr-21", "4-Apr-21", "5-Apr-21", "6-Apr-21", "7-Apr-21", "8-Apr-21", "9-Apr-21", "10-Apr-21", "11-Apr-21", "12-Apr-21", "13-Apr-21", "14-Apr-21", "15-Apr-21", "16-Apr-21", "17-Apr-21", "18-Apr-21", "19-Apr-21", "20-Apr-21", "21-Apr-21", "22-Apr-21", "23-Apr-21", "24-Apr-21", "25-Apr-21", "26-Apr-21", "27-Apr-21", "28-Apr-21", "29-Apr-21", "30-Apr-21", "1-May-21", "2-May-21", "3-May-21", "4-May-21", "5-May-21", "6-May-21", "7-May-21", "8-May-21", "9-May-21", "10-May-21", "11-May-21", "12-May-21", "13-May-21", "14-May-21", "15-May-21", "16-May-21", "17-May-21", "18-May-21", "19-May-21", "20-May-21", "21-May-21", "22-May-21", "23-May-21", "24-May-21", "25-May-21", "26-May-21", "27-May-21", "28-May-21", "29-May-21", "30-May-21", "31-May-21", "1-Jun-21", "2-Jun-21", "3-Jun-21", "4-Jun-21")
vaccinated <- c(100, 1158, 1656, 1969, 2262, 1944, 727, 844, 1072, 2831, 3992, 3631, 4350, 904, 606, 3474, 3666, 4041, 4742, 4552, 1398, 1342, 4486, 4385, 5698, 6034, 5072, 3512, 2072, 6483, 6300, 8457, 7555, 910, 247, 443, 1023, 6059, 5778, 131513, 13256, 7408, 3528, 13094, 14382, 14581, 14182, 13009, 5283, 18777, 16705, 15074, 16937, 19185, 13395, 6376, 1109, 11854, 17195, 20489, 16805, 12503, 4964, 2281, 19155, 23867, 22537, 24564, 18528, 7956, 3213, 22044, 19769, 23393, 21895, 19954, 7667, 4370, 24217, 25162, 24961, 26553, 21741, 9175, 4964, 11424, 30498, 30426, 41389, 43458, 27448, 20909, 40389, 51176, 50710, 21475)

We need to format the dates so R can read them:

date <- as.Date(date, "%d-%b-%y")

We also need to load the Tidyverse package (which contains dplyr):

library(tidyverse)

First, we make a dataframe combining the dates and the daily vaccination numbers:

df <- data.frame(date, vaccinated)

Cumulative totals are easy using the cumsum() function combined with dplyr’s mutate():

df <- df %>% mutate(cumulative=cumsum(vaccinated))

Here’s a little bit of output generated using head(df):

        date vaccinated cumulative
1 2021-03-01        100        100
2 2021-03-02       1158       1258
3 2021-03-03       1656       2914
4 2021-03-04       1969       4883
5 2021-03-05       2262       7145
6 2021-03-06       1944       9089

Looks good.

Now what if we wanted to subset the data by groups? For simplicity in this example, we’ll add some categorical information and break our data into groups based on week of the year. (Typical experimental data would already have categorical information). To add the week number to our data frame, we use the following commands:

week <- as.numeric(format(as.Date(date), "%W"))
df <- data.frame(df, week)

Now we have some group data. Let’s check it with print(df) (I’ll only show a sample of the output here):

         date vaccinated cumulative week
1  2021-03-01        100        100    9
2  2021-03-02       1158       1258    9
3  2021-03-03       1656       2914    9
4  2021-03-04       1969       4883    9
5  2021-03-05       2262       7145    9
6  2021-03-06       1944       9089    9
7  2021-03-07        727       9816    9
8  2021-03-08        844      10660   10
9  2021-03-09       1072      11732   10
10 2021-03-10       2831      14563   10
11 2021-03-11       3992      18555   10
12 2021-03-12       3631      22186   10
13 2021-03-13       4350      26536   10
14 2021-03-14        904      27440   10
15 2021-03-15        606      28046   11
16 2021-03-16       3474      31520   11
17 2021-03-17       3666      35186   11
18 2021-03-18       4041      39227   11
19 2021-03-19       4742      43969   11
20 2021-03-20       4552      48521   11
21 2021-03-21       1398      49919   11
22 2021-03-22       1342      51261   12
23 2021-03-23       4486      55747   12
24 2021-03-24       4385      60132   12
25 2021-03-25       5698      65830   12
26 2021-03-26       6034      71864   12
27 2021-03-27       5072      76936   12
28 2021-03-28       3512      80448   12
...

Now we need to summarise the data (to get a weekly total number of vaccinations) then mutate the output to calculate a cumulative total for the weekly statistics. Let’s do it in two steps:

summary <- df %>% group_by(week) %>% summarise(total_vaccinated = sum(vaccinated), .groups = 'drop')

This summarises the totals. If we head(summary), we get the following output:

# A tibble: 6 x 2
   week total_vaccinated
  <dbl>            <dbl>
1     9             9816
2    10            17624
3    11            22479
4    12            30529
5    13            32024
6    14           165480

Now we can add the mutate() function to get a weekly cumulative total:

summary <- df %>% group_by(week) %>% summarise(total_vaccinated = sum(vaccinated), .groups = 'drop') %>% mutate(cumulative_vaccinated = cumsum(total_vaccinated))

This yields the following output:

# A tibble: 14 x 3
    week total_vaccinated cumulative_vaccinated
   <dbl>            <dbl>                 <dbl>
 1     9             9816                  9816
 2    10            17624                 27440
 3    11            22479                 49919
 4    12            30529                 80448
 5    13            32024                112472
 6    14           165480                277952
 7    15            78059                356011
 8    16           106449                462460
 9    17            84919                547379
10    18           118888                666267
11    19           117935                784202
12    20           136179                920381
13    21           189607               1109988
14    22           184659               1294647

And there we have it: summarised data with cumulative totals by group (in this case week of vaccination). Don’t forget that this can be converted into a data frame via data.frame(summary).

   

Comments

No comments have yet been submitted. Be the first!

Have Your Say

The following HTML is permitted:
<a href="" title=""> <b> <blockquote cite=""> <code> <em> <i> <q cite=""> <strike> <strong>

Comments will be published subject to the Editorial Policy.