R, among other things, is a great tool for ad hoc data analysis. The data presented in this document was collected over 182 days and represents all application usage activity on my mobile device.
I collected the data with an application called Quality Time. I bought the paid version, so I’ll be able to keep two years’ worth of logs. This dataset has nearly 30,000 rows. Let’s see how creative we can get with it!
R Setup Chunk
In the setup chunk, I’ve specified which packages are used in the project and set a few other options, such as “message=FALSE”, which suppresses messages from package loading. The descriptions you see as comments beside each library call were generated with the annotater package, written by Luis Verde Arregoitia.
{r, setup, message=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse) # Easily Install and Load the 'Tidyverse'
library(lubridate) # Make Dealing with Dates a Little Easier
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data
library(ggplot2) # Create Elegant Data Visualisations Using the Grammar of Graphics
library(scales) # Scale Functions for Visualization
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax
Importing the Data
Using read_csv, let’s import the data into a tibble called “history” and define a couple of column types.
history <- read_csv("mob_histo_data_export.csv",
  col_types = cols(
    "Start Time" = col_character(),
    "End Time" = col_character()
  ))
## # A tibble: 29,633 x 6
## Application `Start Time` `Start Date` `End Time` `End Date` Usage
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Clock 7:30:03 01-Mar-19 7:30:11 01-Mar-19 8 sec
## 2 reddit is fun 7:30:11 01-Mar-19 7:30:23 01-Mar-19 12 sec
## 3 reddit is fun 7:35:01 01-Mar-19 7:35:03 01-Mar-19 2 sec
## 4 Clock 7:35:03 01-Mar-19 7:35:08 01-Mar-19 5 sec
## 5 reddit is fun 7:35:08 01-Mar-19 7:35:16 01-Mar-19 8 sec
## 6 reddit is fun 7:40:01 01-Mar-19 7:40:02 01-Mar-19 1 sec
## 7 Clock 7:40:02 01-Mar-19 7:40:06 01-Mar-19 4 sec
## 8 reddit is fun 7:40:06 01-Mar-19 7:40:14 01-Mar-19 8 sec
## 9 reddit is fun 7:45:01 01-Mar-19 7:45:02 01-Mar-19 1 sec
## 10 Clock 7:45:02 01-Mar-19 7:45:07 01-Mar-19 5 sec
## # … with 29,623 more rows
Stock tibbles are fine, but the kableExtra package does a great job of creating simple, elegant-looking tables:
kable(history[1:10, 1:6]) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Application | Start Time | Start Date | End Time | End Date | Usage |
---|---|---|---|---|---|
Clock | 7:30:03 | 01-Mar-19 | 7:30:11 | 01-Mar-19 | 8 sec |
reddit is fun | 7:30:11 | 01-Mar-19 | 7:30:23 | 01-Mar-19 | 12 sec |
reddit is fun | 7:35:01 | 01-Mar-19 | 7:35:03 | 01-Mar-19 | 2 sec |
Clock | 7:35:03 | 01-Mar-19 | 7:35:08 | 01-Mar-19 | 5 sec |
reddit is fun | 7:35:08 | 01-Mar-19 | 7:35:16 | 01-Mar-19 | 8 sec |
reddit is fun | 7:40:01 | 01-Mar-19 | 7:40:02 | 01-Mar-19 | 1 sec |
Clock | 7:40:02 | 01-Mar-19 | 7:40:06 | 01-Mar-19 | 4 sec |
reddit is fun | 7:40:06 | 01-Mar-19 | 7:40:14 | 01-Mar-19 | 8 sec |
reddit is fun | 7:45:01 | 01-Mar-19 | 7:45:02 | 01-Mar-19 | 1 sec |
Clock | 7:45:02 | 01-Mar-19 | 7:45:07 | 01-Mar-19 | 5 sec |
Wrapping kable in a function is very convenient, so let’s create one:
kable_car <- function(x, y = 6, ...){
  kable(x[1:10, 1:y]) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
}
We can now display the same table above by calling “kable_car(history)”. Convenient!
Not captured in the ten rows displayed here are usage values of minutes and hours (which exist in the raw data).
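One thing to watch with kable_car: it always asks for rows 1:10, so a table with fewer than ten rows may come back padded with NA rows or trigger a warning, depending on the input type. A hedged variant is sketched below. The names kable_head, n and cols are hypothetical (they are mine, not part of any package), and format = "html" is an assumption that matches the bootstrap styling used in this post.

```r
library(knitr)      # kable()
library(kableExtra) # kable_styling()

# Hypothetical variant of kable_car(): `n` rows and `cols` columns are illustrative arguments
kable_head <- function(x, n = 10, cols = ncol(x)) {
  # head() stops at the last row, so short tables are shown as-is
  shown <- head(x[, seq_len(cols), drop = FALSE], n)
  kable_styling(
    kable(shown, format = "html"),  # assumption: HTML output
    bootstrap_options = c("striped", "hover", "condensed", "responsive")
  )
}

# Made-up two-row table, only for illustration
small <- data.frame(application = c("Clock", "Maps"), usage_seconds = c(8L, 5L))
out <- kable_head(small)
```

Calling kable_head(history) should behave like kable_car(history) on the full dataset, since that table has far more than ten rows.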
Tidying and Transforming the Data
The janitor package is a headache reliever. As in other languages, case matters in R. Messy data often includes columns with inconsistent naming conventions, which is where janitor comes to the rescue. I’ll use it here to standardize the column names. This will save me some headache while writing the next chunks of code.
history_clean <- janitor::clean_names(history) %>%
  unite("start", start_date:start_time, sep = " ") %>%  # date first, then time
  unite("end", end_date:end_time, sep = " ") %>%
  mutate_at(vars(start, end), dmy_hms) %>%               # parse "01-Mar-19 7:30:03" as a date-time
  # Request seconds explicitly; a bare `end - start` lets R choose the units
  mutate(usage_seconds = as.integer(difftime(end, start, units = "secs"))) %>%
  select(-usage)
kable_car(history_clean, 4)
In the same chunk, I used unite and lubridate to bring the date and time together into a usable format, and mutate to convert the gap between the start and end times into seconds, allowing for simpler summarizations.
application | start | end | usage_seconds |
---|---|---|---|
Clock | 2019-03-01 07:30:03 | 2019-03-01 07:30:11 | 8 |
reddit is fun | 2019-03-01 07:30:11 | 2019-03-01 07:30:23 | 12 |
reddit is fun | 2019-03-01 07:35:01 | 2019-03-01 07:35:03 | 2 |
Clock | 2019-03-01 07:35:03 | 2019-03-01 07:35:08 | 5 |
reddit is fun | 2019-03-01 07:35:08 | 2019-03-01 07:35:16 | 8 |
reddit is fun | 2019-03-01 07:40:01 | 2019-03-01 07:40:02 | 1 |
Clock | 2019-03-01 07:40:02 | 2019-03-01 07:40:06 | 4 |
reddit is fun | 2019-03-01 07:40:06 | 2019-03-01 07:40:14 | 8 |
reddit is fun | 2019-03-01 07:45:01 | 2019-03-01 07:45:02 | 1 |
Clock | 2019-03-01 07:45:02 | 2019-03-01 07:45:07 | 5 |
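For anyone who wants to try the cleaning steps without the full export, here is a minimal sketch on two made-up rows shaped like the raw data. The values in toy are illustrative and not taken from my log, and the second row deliberately crosses midnight to show why the date and time need to be combined before subtracting. I have used mutate(across(...)) and an explicit difftime() here; both are my choices for the sketch.

```r
library(tibble)    # tibble()
library(dplyr)     # mutate(), across(), %>%
library(tidyr)     # unite()
library(janitor)   # clean_names()
library(lubridate) # dmy_hms()

# Two made-up rows (illustrative values only)
toy <- tibble(
  Application  = c("Clock", "Maps"),
  `Start Time` = c("7:30:03", "23:59:50"),
  `Start Date` = c("01-Mar-19", "01-Mar-19"),
  `End Time`   = c("7:30:11", "0:01:00"),
  `End Date`   = c("01-Mar-19", "02-Mar-19")
)

toy_clean <- toy %>%
  clean_names() %>%                                        # "Start Time" becomes start_time, and so on
  unite("start", start_date, start_time, sep = " ") %>%    # e.g. "01-Mar-19 7:30:03"
  unite("end", end_date, end_time, sep = " ") %>%
  mutate(across(c(start, end), dmy_hms)) %>%               # day-month-year plus h:m:s, parsed as UTC
  mutate(usage_seconds = as.integer(difftime(end, start, units = "secs")))

toy_clean
```

The Clock row comes out at 8 seconds and the Maps row at 70 seconds (10 seconds before midnight plus 60 after). Note that dmy_hms relies on an English locale to read month abbreviations such as "Mar".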
This is where it starts to get interesting. Mobile devices are extensions of the human body. Below, I’ll calculate the total time spent in each application in seconds, minutes, hours and then days. If you have a way to do this more elegantly, please let me know!
app_usage <- history_clean %>%
  group_by(application) %>%
  summarise(total_seconds = sum(usage_seconds)) %>%
  mutate(total_minutes = as.integer(total_seconds / 60)) %>%
  mutate(total_hours = as.integer(total_minutes / 60)) %>%
  mutate(total_days = as.integer(total_hours / 24)) %>%  # 24 hours per day
  arrange(desc(total_seconds))
kable_car(app_usage, y = 5)
application | total_seconds | total_minutes | total_hours | total_days |
---|---|---|---|---|
reddit is fun | 704015 | 11733 | 195 | 8 |
Silence | 192867 | 3214 | 53 | 2 |
| | 181452 | 3024 | 50 | 2 |
Chrome | 181297 | 3021 | 50 | 2 |
Used | 170544 | 2842 | 47 | 1 |
Call screen | 139251 | 2320 | 38 | 1 |
| | 132456 | 2207 | 36 | 1 |
| | 102208 | 1703 | 28 | 1 |
Maps | 74599 | 1243 | 20 | 0 |
Snapchat | 59019 | 983 | 16 | 0 |
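On the question of doing the unit conversion more elegantly, one possible sketch is to derive every column straight from total_seconds with integer division (%/%) inside a single mutate. For non-negative whole seconds this should give the same truncated values as chaining the conversions. The toy_usage tibble below simply copies the first two rows of the table above; toy_usage and toy_totals are names I made up for the example.

```r
library(tibble) # tibble()
library(dplyr)  # mutate(), %>%

# First two rows of the summary table, copied by hand
toy_usage <- tibble(
  application   = c("reddit is fun", "Silence"),
  total_seconds = c(704015L, 192867L)
)

toy_totals <- toy_usage %>%
  mutate(
    total_minutes = total_seconds %/% 60L,     # 60 seconds per minute
    total_hours   = total_seconds %/% 3600L,   # 60 * 60 seconds per hour
    total_days    = total_seconds %/% 86400L   # 24 * 3600 seconds per day
  )

toy_totals
```

This reproduces 11733 minutes, 195 hours and 8 days for reddit is fun, and 3214 minutes, 53 hours and 2 days for Silence. If a single human-readable breakdown is preferred over separate columns, lubridate::seconds_to_period() is another option worth a look.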
This post was originally published on Apr 14, 2020: Analyzing Six Months of Mobile Phone Usage in R.