Lane Clark


Data Enthusiast. Maker. {JOB_ID} Analyst.


Analyzing Six Months of Mobile Phone Usage in R

R, among other things, is a great tool for ad hoc data analysis. The data presented in this document was collected over 182 days and represents all application usage activity on my mobile device.

I collected the data with an application called Quality Time. I bought the paid version, so I’ll be able to keep two years’ worth of logs. This dataset has nearly 30,000 rows. Let’s see how creative we can get with it!

R Setup Chunk

In the setup chunk, I’ve specified which packages are used in the project and set a few other options, such as “message=FALSE”, which suppresses messages from package loading. To obtain the package descriptions you see as comments beside each library call, I used the annotater package written by Luis Verde Arregoitia.

{r, setup, message=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse) # Easily Install and Load the 'Tidyverse'
library(lubridate) # Make Dealing with Dates a Little Easier
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data
library(ggplot2) # Create Elegant Data Visualisations Using the Grammar of Graphics
library(scales) # Scale Functions for Visualization
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax

Importing the Data

Using read_csv, let’s import the data into a tibble called “history” and define a couple of column types.

history <- read_csv("mob_histo_data_export.csv",
                    col_types = cols(
                      "Start Time" = col_character(),
                      "End Time" = col_character()
                    ))
 
## # A tibble: 29,633 x 6
##    Application   `Start Time` `Start Date` `End Time` `End Date` Usage 
##    <chr>         <chr>        <chr>        <chr>      <chr>      <chr> 
##  1 Clock         7:30:03      01-Mar-19    7:30:11    01-Mar-19  8 sec 
##  2 reddit is fun 7:30:11      01-Mar-19    7:30:23    01-Mar-19  12 sec
##  3 reddit is fun 7:35:01      01-Mar-19    7:35:03    01-Mar-19  2 sec 
##  4 Clock         7:35:03      01-Mar-19    7:35:08    01-Mar-19  5 sec 
##  5 reddit is fun 7:35:08      01-Mar-19    7:35:16    01-Mar-19  8 sec 
##  6 reddit is fun 7:40:01      01-Mar-19    7:40:02    01-Mar-19  1 sec 
##  7 Clock         7:40:02      01-Mar-19    7:40:06    01-Mar-19  4 sec 
##  8 reddit is fun 7:40:06      01-Mar-19    7:40:14    01-Mar-19  8 sec 
##  9 reddit is fun 7:45:01      01-Mar-19    7:45:02    01-Mar-19  1 sec 
## 10 Clock         7:45:02      01-Mar-19    7:45:07    01-Mar-19  5 sec 
## # … with 29,623 more rows
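As a small illustration of what the col_types argument changes, the sketch below reads two rows copied from the output above as inline text (not the real export file), once with readr's default type guessing and once with the time column forced to character. The names sample_csv, guessed and forced are made up for this example. With default guessing, readr decides the column type on its own and will often treat a value such as 7:30:03 as a time; col_character() keeps the original text untouched.

```r
library(readr)

# Two rows copied from the output above, supplied as inline text instead of the real file
sample_csv <- paste(
  "Application,Start Time,Start Date",
  "Clock,7:30:03,01-Mar-19",
  "reddit is fun,7:30:11,01-Mar-19",
  sep = "\n"
)

# Default behaviour: readr guesses a type for every column
guessed <- suppressMessages(read_csv(sample_csv))

# Force the time column to stay as plain text; other columns are still guessed
forced <- suppressMessages(
  read_csv(sample_csv, col_types = cols("Start Time" = col_character()))
)

class(guessed$`Start Time`)  # whatever readr's guesser decided
class(forced$`Start Time`)   # "character"
```

Keeping the times as text makes it straightforward to paste them together with the date column later on.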

Stock tibbles are fine, but the kableExtra package does a great job of creating simple, elegant-looking tables:

kable(history[1:10, 1:6]) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Application    Start Time  Start Date  End Time  End Date   Usage
Clock          7:30:03     01-Mar-19   7:30:11   01-Mar-19  8 sec
reddit is fun  7:30:11     01-Mar-19   7:30:23   01-Mar-19  12 sec
reddit is fun  7:35:01     01-Mar-19   7:35:03   01-Mar-19  2 sec
Clock          7:35:03     01-Mar-19   7:35:08   01-Mar-19  5 sec
reddit is fun  7:35:08     01-Mar-19   7:35:16   01-Mar-19  8 sec
reddit is fun  7:40:01     01-Mar-19   7:40:02   01-Mar-19  1 sec
Clock          7:40:02     01-Mar-19   7:40:06   01-Mar-19  4 sec
reddit is fun  7:40:06     01-Mar-19   7:40:14   01-Mar-19  8 sec
reddit is fun  7:45:01     01-Mar-19   7:45:02   01-Mar-19  1 sec
Clock          7:45:02     01-Mar-19   7:45:07   01-Mar-19  5 sec

Wrapping kable in a function is very convenient, so let’s create one:

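# Show the first 10 rows and the first y columns of x as a styled table.
kable_car <- function(x, y = 6, ...) {
  # head() avoids NA rows when x has fewer than 10 rows; ... is passed on to kable()
  kable(head(x, 10)[, 1:y], ...) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
}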

We can now display the same table as above by calling “kable_car(history)”. Convenient!

The ten rows displayed here only show usage values in seconds; the raw data also contains usage values in minutes and hours.

Tidying and Transforming the Data

The janitor package is a headache reliever. As in other languages, case matters in R. Messy data often includes columns with inconsistent naming conventions, which is where janitor comes to the rescue. I’ll use it here to standardize the column names. This will save me some headache while writing the next chunks of code.

history_clean <- janitor::clean_names(history) %>%
  unite("start", start_date:start_time, sep = " ") %>%
  unite("end", end_date:end_time, sep = " ") %>%
  mutate_at(vars(start, end), dmy_hms) %>%
  # Request seconds explicitly; plain subtraction lets R choose the units
  mutate(usage_seconds = as.integer(difftime(end, start, units = "secs"))) %>%
  select(-usage)
kable_car(history_clean, 4)

In the same chunk, I used unite and lubridate to combine the date and time into a usable date-time format, and mutate to convert the difference between the start and end times into seconds, which makes summarizing simpler.

application    start                end                  usage_seconds
Clock          2019-03-01 07:30:03  2019-03-01 07:30:11  8
reddit is fun  2019-03-01 07:30:11  2019-03-01 07:30:23  12
reddit is fun  2019-03-01 07:35:01  2019-03-01 07:35:03  2
Clock          2019-03-01 07:35:03  2019-03-01 07:35:08  5
reddit is fun  2019-03-01 07:35:08  2019-03-01 07:35:16  8
reddit is fun  2019-03-01 07:40:01  2019-03-01 07:40:02  1
Clock          2019-03-01 07:40:02  2019-03-01 07:40:06  4
reddit is fun  2019-03-01 07:40:06  2019-03-01 07:40:14  8
reddit is fun  2019-03-01 07:45:01  2019-03-01 07:45:02  1
Clock          2019-03-01 07:45:02  2019-03-01 07:45:07  5
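When a time difference is converted to a plain number, the units matter. Subtracting two date-times in R returns a difftime whose units are chosen automatically from the smallest difference in the vector. The sketch below illustrates this with one pair of timestamps taken from the table and one invented pair that is five minutes apart; the variable names are made up for the example. It assumes an English locale so that lubridate can read the month abbreviation.

```r
library(lubridate)

# Timestamps in the same "dd-Mon-yy h:m:s" layout as the united columns.
# The second pair is invented for illustration and is five minutes apart.
start <- dmy_hms(c("01-Mar-19 7:30:03", "01-Mar-19 7:35:00"))
end   <- dmy_hms(c("01-Mar-19 7:30:11", "01-Mar-19 7:40:00"))

# Plain subtraction: units come from the smallest difference (8 seconds here)
auto_all <- end - start
units(auto_all)        # "secs"

# The five-minute pair on its own is reported in minutes instead
auto_long <- end[2] - start[2]
units(auto_long)       # "mins"
as.integer(auto_long)  # 5, not 300

# Asking for seconds explicitly gives the same scale regardless of the data
usage_seconds <- as.integer(difftime(end, start, units = "secs"))
usage_seconds          # 8 300
```

In a dataset with many one-second sessions the automatic choice lands on seconds anyway, but naming the units removes the dependence on what the data happens to contain.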

This is where it starts to get interesting. Mobile devices are extensions of the human body. Below, I’ll calculate the total time spent in each application in seconds, minutes, hours and then days. If you have a way to do this more elegantly, please let me know!

app_usage <- history_clean %>%
  group_by(application) %>%
  summarise(total_seconds = sum(usage_seconds)) %>%
  mutate(
    total_minutes = as.integer(total_seconds / 60),
    total_hours   = as.integer(total_minutes / 60),
    total_days    = as.integer(total_hours / 24)  # 24 hours per day
  ) %>%
  arrange(desc(total_seconds))
kable_car(app_usage, y = 5)

application    total_seconds  total_minutes  total_hours  total_days
reddit is fun  704015         11733          195          8
Silence        192867         3214           53           2
Instagram      181452         3024           50           2
Chrome         181297         3021           50           2
Used           170544         2842           47           1
Call screen    139251         2320           38           1
Facebook       132456         2207           36           1
Google         102208         1703           28           1
Maps           74599          1243           20           0
Snapchat       59019          983            16           0
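One possible way to shorten the unit conversions, as a minimal sketch: R's integer-division operator %/% drops the remainder, so each unit can be derived directly from total_seconds. The example uses three totals copied from the table above; the names totals and converted are made up for illustration.

```r
library(dplyr)

# Three totals copied from the table above; `totals` is an illustrative name
totals <- tibble(
  application   = c("reddit is fun", "Silence", "Maps"),
  total_seconds = c(704015, 192867, 74599)
)

converted <- totals %>%
  mutate(
    total_minutes = total_seconds %/% 60,     # 60 seconds per minute
    total_hours   = total_seconds %/% 3600,   # 60 * 60 seconds per hour
    total_days    = total_seconds %/% 86400   # 24 * 3600 seconds per day
  )

converted
```

For non-negative totals this gives the same whole numbers as truncating at every intermediate step, so the results line up with the table.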

This post was originally published on Apr 14, 2020: Analyzing Six Months of Mobile Phone Usage in R.