---
title: "Get started"
output: rmarkdown::html_vignette
description: >
  How to get started with mpathsenser? This vignette takes a look at
  some example data and provides a practical starting point for working
  with the package.
vignette: >
%\VignetteIndexEntry{Get started}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, echo = FALSE, include = FALSE, message = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
First, let's load some useful libraries (and of course `mpathsenser` itself).
```{r setup}
library(tidyr)
library(dplyr)
library(ggplot2)
library(mpathsenser)
```
# Importing files
The data for this vignette is contained in the `extdata` folder. However, on some systems this folder may be read-only, and it is generally good practice not to modify package folders anyway (to prevent changing or breaking the package). To this end, we first copy the data to a temporary directory (as defined by the environment variable `TMPDIR`, `TMP`, or `TEMP`), a directory that is freshly created at every R start-up and cleaned up when the session ends.
```{r copy data, results="hide"}
# Get the temp folder
tempdir <- tempdir()
tempdir <- file.path(tempdir, "vignette")
dir.create(tempdir)
# Get a handle to the data files
path <- system.file("extdata", "example", package = "mpathsenser")
# Get a list of all the files that are to be copied
copy_list <- list.files(path, "carp-data", full.names = TRUE)
# Copy all data
file.copy(
from = copy_list,
to = tempdir,
overwrite = TRUE,
copy.mode = FALSE
)
```
The `extdata` folder contains several `.zip` files as well as some `JSON` files. It is likely that the data for your study will look the same, only with many more files. Note that all of these data files come directly from m-Path Sense, i.e. no pre-processing has taken place yet.
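Since we just copied these files to the temporary directory, we can take a quick peek at what is there (purely a sanity check, using base R only):
```{r list copied files}
# A quick look at the copied files: a mix of zipped and plain JSON files
head(list.files(tempdir))
```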
The data from m-Path Sense originates in the following way: the application continuously collects all kinds of data in the background (e.g. accelerometer data). Once collected, the data goes through several stages where, for example, it is pre-processed (as already happens with data from the `light` sensor) or anonymised upon request. Finally, the data is written to a `JSON` file, which is really just a text file with a specific format. Whenever new data comes in (whether from the same sensor or not), the next line is written to the `JSON` file, and so on, until the file reaches a certain size (5MB by default). The `JSON` file is then zipped to reduce its size and subsequently transferred to a server. Once transferred, the data is deleted from the participant's phone, both to save space and to prevent data leakage.
Thus, a first step is to unzip these files to extract their `JSON` contents. If you feel more comfortable unzipping with your favourite zip program, you can do so; just make sure all files end up in the same directory (including the JSON files that were not zipped).
```{r unzip}
unzip_data(path = tempdir)
```
In m-Path Sense, data is written to JSON files as it comes in. In the JSON file format, every file starts with `[` and ends with `]`. If the app is killed, JSON files are not properly closed and hence cannot be read by JSON parsers. So, we must first test if all files are in a valid JSON format and fix those that are not.
You may first run `fix_jsons()` to fix all files in the directory that need fixing, or you can first run `test_jsons()` to get an estimate of how many files need fixing. Running `fix_jsons()` implicitly runs `test_jsons()` as well, so that only files that really need fixing are modified.
```{r fix and test JSONS}
# Note that test_jsons returns the full path names
to_fix <- test_jsons(tempdir)
print(to_fix)
fix_jsons(path = NULL, to_fix)
```
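To verify that everything went well, you can simply run `test_jsons()` again; if all files were repaired, it should no longer flag any files.
```{r verify fixed JSONs}
# Re-check the directory; ideally, no files are flagged this time
test_jsons(tempdir)
```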
Next, we can create an `mpathsenser` database using the `create_db()` function. This function takes as its first argument a path where the database will be created and as its second argument a `db_name` to name the database file. You may also leave `path` set to `NULL` and specify a full path name (including the file name) in the `db_name` argument. Always remember to save the output to a variable so you can use it later and avoid leaving an open but unused handle on the database connection.
```{r create db}
# Create a new database
db <- create_db(tempdir, "getstarted.db")
```
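As mentioned, you could equivalently leave `path` empty and pass the full file path (including the file name) to `db_name`. A sketch of this alternative (not evaluated here, as the database has already been created):
```{r create db alternative, eval=FALSE}
# Equivalent call: the full path, including the file name, goes into `db_name`
db <- create_db(path = NULL, db_name = file.path(tempdir, "getstarted.db"))
```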
Finally, we can import the data into the newly created database using the `import()` function. This function takes as its first argument the path to the directory where the data is located and as its second argument the database handle. You may also specify a `batch_size` to control how many files are imported at once. Files are processed sequentially (though parallelism is supported via the `future` package), but they are written to the database in batches for the sake of efficiency. A greater batch size thus means that larger batches of files are written to the database at once (which is more efficient), but risks failing the entire batch if a single file cannot be processed.
```{r import data}
# Import the data
import(
path = tempdir,
db = db,
sensors = NULL, # All sensors
batch_size = 12,
recursive = TRUE
)
```
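As noted above, parallel processing of the files is supported through the `future` framework. A minimal sketch of what that could look like (not evaluated here, and assuming the `future` package is installed):
```{r import in parallel, eval=FALSE}
library(future)

# Set up a parallel plan; `import()` will then process files in parallel workers
plan(multisession)

import(
  path = tempdir,
  db = db,
  batch_size = 12
)

# Revert to sequential processing when done
plan(sequential)
```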
# Creating a coverage chart
To get a better overview of the data, we can create a coverage chart that shows the average number of collected samples per hour for each sensor for a single participant. An important distinction must be made between a relative and an absolute coverage chart. The relative coverage chart shows the average number of samples per hour for each sensor as a percentage of the *expected* number of samples per hour for that sensor, whereas the absolute coverage chart shows the raw average number of samples per hour. The relative chart is therefore useful for judging how complete the data collection is for each sensor, while the absolute chart is useful for seeing how much data was collected overall.
```{r}
sensors <- c(
"Accelerometer", "Activity", "AppUsage", "Bluetooth", "Calendar",
"Connectivity", "Device", "Gyroscope", "InstalledApps", "Light",
"Location", "Memory", "Pedometer", "Screen", "Weather", "Wifi"
)
cov <- coverage(
db = db,
participant_id = "2784",
sensor = sensors,
relative = FALSE
)
print(cov)
```
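For comparison, the relative counterpart is obtained by setting `relative = TRUE`, in which case the counts are expressed as a proportion of the expected number of samples per hour (a sketch that assumes the package's default measurement frequencies are appropriate for your study):
```{r relative coverage}
# Relative coverage: samples per hour relative to the expected number of samples
cov_rel <- coverage(
  db = db,
  participant_id = "2784",
  sensor = sensors,
  relative = TRUE
)
print(cov_rel)
```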
Additionally, we can plot the coverage data using the `plot()` function.
```{r, fig.width=13, fig.height=8, fig.align='center', dpi=55}
plot(cov)
```
# Closing the database
Finally, remember to close the database once you are done working with it.
```{r}
close_db(db)
```
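Should you want to revisit the data later, the database does not need to be rebuilt; it can presumably be reopened with `open_db()` instead (a sketch, assuming the database file still exists at the same location):
```{r reopen db, eval=FALSE}
# Reopen an existing database instead of importing everything again
db <- open_db(tempdir, "getstarted.db")
```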