Package 'mpathsenser'

Title: Process and Analyse Data from m-Path Sense
Description: Overcomes one of the major challenges in mobile (passive) sensing, namely being able to pre-process the raw data that comes from a mobile sensing app, specifically 'm-Path Sense' <https://m-path.io>. The main task of 'mpathsenser' is therefore to read 'm-Path Sense' JSON files into a database and provide several convenience functions to aid in data processing.
Authors: Koen Niemeijer [aut, cre] , Kristof Meers [ctb] , KU Leuven [cph, fnd]
Maintainer: Koen Niemeijer <[email protected]>
License: GPL (>= 3)
Version: 1.2.3.9000
Built: 2025-02-24 16:37:54 UTC
Source: https://github.com/koenniem/mpathsenser

Help Index


Add gap periods to sensor data

Description

[Stable]

Since there may be many gaps in mobile sensing data, it is pivotal to pay attention to them in the analysis. This function adds known gaps to the data as "measurements", thereby allowing easier calculations of, for example, durations. For instance, consider a participant who spent 30 minutes walking. If it is known that there is a gap of 15 minutes in this interval, we should somehow account for it. add_gaps accounts for this by adding the gap data to the sensor data, splitting intervals where gaps occur.

Usage

add_gaps(data, gaps, by = NULL, continue = FALSE, fill = NULL)

Arguments

data

A data frame containing the data. See get_data() for retrieving data from an mpathsenser database.

gaps

A data frame (extension) containing the gap data. See identify_gaps() for retrieving gap data from an mpathsenser database. It should at least contain the columns from and to (both in a date-time format), as well as any specified columns in by.

by

A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, a natural join is performed, using all variables in common across data and gaps.

continue

Whether to continue the measurement(s) prior to the gap once the gap ends.

fill

A named list of the columns to fill with default values for the extra measurements that are added because of the gaps.

Details

In the example of 30 minutes of walking where a 15-minute gap occurred (say, after 5 minutes), add_gaps() adds two rows: one 5 minutes after the start of the interval indicating the start of the gap (if needed containing values from fill), and one 20 minutes after the start of the interval signalling the walking activity. Then, when calculating time differences between subsequent measurements, the gap period is appropriately accounted for. Note that if multiple measurements occurred before the gap, they will all be continued after the gap.

Value

A tibble containing the data and the added gaps.

Warning

Depending on the sensor that is used to identify the gaps (though this is typically the highest-frequency sensor, such as the accelerometer or gyroscope), there may be a small delay between the last measurement and the actual start of the gap. For example, if the accelerometer samples every 5 seconds, the app may have been killed 4.99 seconds after the last accelerometer measurement (so just before the next measurement). Within that time, other measurements may still have taken place, thereby technically occurring "within" the gap. This is especially important if you want to use these gaps in add_gaps, since this issue may lead to erroneous results.

An easy way to solve this problem is to take into account all sensors of interest when identifying the gaps, thereby ensuring there are no measurements of these sensors within the gap. One way to do so is to search for gaps that are, say, 5 seconds longer than desired and afterwards increase the start time of the gaps by 5 seconds.
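The second approach can be sketched as follows. This is a simplified sketch on hypothetical gap data; identify_gaps() documents that from and to are character vectors in UTC time, and the 5-second shift is an illustrative choice.

```r
library(dplyr)

# Hypothetical gap data as returned by identify_gaps() with min_gap = 65,
# i.e. 5 seconds longer than the desired 60 seconds
gaps <- data.frame(
  participant_id = "12345",
  from = "2022-05-10 10:00:00",
  to = "2022-05-10 10:20:05",
  gap = 1205
)

# Shift the start of each gap forward by 5 seconds so that measurements
# from other, slower sensors no longer fall inside the gap
gaps <- gaps |>
  mutate(across(c(from, to), ~ as.POSIXct(.x, tz = "UTC"))) |>
  mutate(from = from + 5, gap = gap - 5)
```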

See Also

identify_gaps() for finding gaps in the sampling; link_gaps() for linking gaps to ESM data, analogous to link().

Examples

# Define some data
dat <- data.frame(
  participant_id = "12345",
  time = as.POSIXct(c("2022-05-10 10:00:00", "2022-05-10 10:30:00", "2022-05-10 11:30:00")),
  type = c("WALKING", "STILL", "RUNNING"),
  confidence = c(80, 100, 20)
)

# Get the gaps from identify_gaps, but in this example define them ourselves
gaps <- data.frame(
  participant_id = "12345",
  from = as.POSIXct(c("2022-05-10 10:05:00", "2022-05-10 10:50:00")),
  to = as.POSIXct(c("2022-05-10 10:20:00", "2022-05-10 11:10:00"))
)

# Now add the gaps to the data
add_gaps(
  data = dat,
  gaps = gaps,
  by = "participant_id"
)

# You can use fill if you want to get rid of those pesky NA's
add_gaps(
  data = dat,
  gaps = gaps,
  by = "participant_id",
  fill = list(type = "GAP", confidence = 100)
)

Find the category of an app on the Google Play Store

Description

[Stable]

This function scrapes the Google Play Store by using name as the search term. From there it selects the first result in the list and its corresponding category and package name.

Usage

app_category(name, num = 1, rate_limit = 5, exact = TRUE)

Arguments

name

The name of the app to search for.

num

Which result should be selected in the list of search results. Defaults to one.

rate_limit

The time interval to keep between queries, in seconds. If the rate limit is too low, the Google Play Store may reject further requests or even ban you entirely.

exact

In m-Path Sense, the app names of the AppUsage sensor are the last part of the app's package names. When exact is TRUE, the function guarantees that name is exactly equal to the last part of the selected package from the search results. Note that when exact is TRUE, it interacts with num in the sense that it no longer selects the top search result but instead the top search result that matches the last part of the package name.

Value

A list containing the following fields:

package: the package name that was selected from the Google Play search
genre: the corresponding genre of this package

Warning

Do not abuse this function or you will be banned by the Google Play Store. The minimum delay between requests seems to be around 5 seconds, but this is untested. Also make sure not to do batch lookups, as many subsequent requests will get you blocked as well.

Examples

app_category("whatsapp")

# Example of a generic app name where we can't find a specific app
app_category("weather") # Weather forecast channel

# Get OnePlus weather
app_category("net.oneplus.weather")

Create bins in variable time series

Description

[Stable]

In time series with variable measurements, an often recurring task is calculating the total time spent (i.e. the duration) in fixed bins, for example per hour or day. However, this may be difficult when two subsequent measurements are in different bins or span over multiple bins.

Usage

bin_data(
  data,
  start_time,
  end_time,
  by = c("sec", "min", "hour", "day"),
  fixed = TRUE,
  .name = "bin"
)

Arguments

data

A data frame or tibble containing the time series.

start_time

The column name of the start time of the interval, a POSIXt.

end_time

The column name of the end time of the interval, a POSIXt.

by

A binning specification.

fixed

Whether to create fixed bins. If TRUE, bins will be rounded to, for example, whole hours or days (depending on by). If FALSE, bins will be created based on the first timestamp.

.name

The name of the column containing the nested data.

Value

A tibble containing the group columns (if any), date, hour (if by = "hour"), and the duration in seconds.

See Also

link_gaps() for linking gaps to data.

Examples

library(dplyr)

data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = "WALKING"
)

# get bins per hour, even if the interval is longer than one hour
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )

# Alternatively, you can give an integer value to by to create custom-sized
# bins, but only if fixed = FALSE. Note that these bins are not rounded
# (to, in this example, 30 minutes), but instead depend on the earliest
# time in the group.
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = 1800L,
    fixed = FALSE
  )

# More complicated data for showcasing grouping:
data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = c("STILL", "WALKING", "STILL", "WALKING")
)

# binned_intervals also takes into account the prior grouping structure
out <- data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  group_by(participant_id) |>
  mutate(lead = lead(datetime)) |>
  group_by(participant_id, type) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )
print(out)

# To get the duration for each bin (note to change the variable names in sum):
purrr::map_dbl(
  out$bin_data,
  ~ sum(as.double(.x$lead) - as.double(.x$datetime),
    na.rm = TRUE
  )
)

# Or:
out |>
  tidyr::unnest(bin_data, keep_empty = TRUE) |>
  mutate(duration = .data$lead - .data$datetime) |>
  group_by(bin, .add = TRUE) |>
  summarise(duration = sum(.data$duration, na.rm = TRUE), .groups = "drop")

Copy mpathsenser zip files to a new location

Description

[Stable]

Copy zip files from a source location to a target location, but only those files that do not yet exist in the target. That is, it only updates the target folder with new files from the source folder.

Usage

ccopy(from, to, recursive = TRUE)

Arguments

from

A path to copy files from.

to

A path to copy files to.

recursive

Should files from subdirectories be copied?

Value

A message indicating how many files were copied.

Examples

## Not run: 
ccopy("K:/data/myproject/", "~/myproject")

## End(Not run)

Close a database connection

Description

[Stable]

This is a convenience function that is simply a wrapper around DBI::dbDisconnect().

Usage

close_db(db)

Arguments

db

A database connection to an m-Path Sense database.

Value

Returns invisibly, regardless of whether the database connection is active, valid, or even exists.

See Also

open_db() for opening an mpathsenser database.

Examples

# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# Then close it
close_db(db)

# You can even try to close a database that is already closed. This will not trigger an error.
close_db(db)

# Cleanup
file.remove(file.path(tempdir(), "mydb.db"))

Copy (a subset of) a database to another database

Description

[Stable]

Usage

copy_db(source_db, target_db, sensor = "All")

Arguments

source_db

An mpathsenser database connection from which the data will be transferred.

target_db

An mpathsenser database connection to which the data will be transferred. See create_db() to create a new database.

sensor

A character vector containing one or multiple sensors. See sensors for a list of available sensors. Use "All" for all available sensors.

Value

Returns TRUE invisibly, called for side effects.

Examples

# First create two databases in a temporary directory
db1 <- create_db(tempdir(), "mydb1.db")
db2 <- create_db(tempdir(), "mydb2.db")

# Populate the first database with some data
DBI::dbExecute(db1, "INSERT INTO Study VALUES ('study_1', 'default')")
DBI::dbExecute(db1, "INSERT INTO Participant VALUES ('1', 'study_1')")
DBI::dbExecute(db1, "INSERT INTO Activity VALUES(
               '123', '1', '2024-01-01', '08:00:00', '100', 'WALKING')")

# Then copy the first database to the second database
copy_db(db1, db2)

# Check that the second database has the same data as the first database
get_data(db2, "Activity")

# Cleanup
close_db(db1)
close_db(db2)
file.remove(file.path(tempdir(), "mydb1.db"))
file.remove(file.path(tempdir(), "mydb2.db"))

Create a coverage chart of the sampling rate

Description

[Stable]

Only applicable to non-reactive sensors with 'continuous' sampling.

Usage

coverage(
  db,
  participant_id,
  sensor = NULL,
  frequency = mpathsenser::freq,
  relative = TRUE,
  offset = "None",
  start_date = NULL,
  end_date = NULL,
  plot = deprecated()
)

Arguments

db

A valid database connection. The schema must be as created by open_db().

participant_id

A character string of one participant ID.

sensor

A character vector containing one or multiple sensors. See sensors for a list of available sensors. Use NULL for all available sensors.

frequency

A named numeric vector with sensors as names and the number of expected samples per hour as values.

relative

A logical value indicating whether to show the number of measurements relative to the expected number (TRUE) or the absolute number of measurements (FALSE).

offset

Currently not used.

start_date

A date (or convertible to a date using base::as.Date()) indicating the earliest date to show. Leave empty for all data. Must be used with end_date.

end_date

A date (or convertible to a date using base::as.Date()) indicating the latest date to show. Leave empty for all data. Must be used with start_date.

plot

[Deprecated] Instead of built-in functionality, use plot.coverage() to plot the output.

Value

A tibble containing the hour, type of measure (i.e. sensor), and (relative) coverage, or a ggplot of the coverage results if the deprecated plot argument is TRUE.

Examples

## Not run: 
freq <- c(
  Accelerometer = 720, # Once per 5 seconds. Can have multiple measurements.
  AirQuality = 1,
  AppUsage = 2, # Once every 30 minutes
  Bluetooth = 60, # Once per minute. Can have multiple measurements.
  Gyroscope = 720, # Once per 5 seconds. Can have multiple measurements.
  Light = 360, # Once per 10 seconds
  Location = 60, # Once per 60 seconds
  Memory = 60, # Once per minute
  Noise = 120,
  Pedometer = 1,
  Weather = 1,
  Wifi = 60 # once per minute
)

coverage(
  db = db,
  participant_id = "12345",
  sensor = c("Accelerometer", "Gyroscope"),
  frequency = mpathsenser::freq,
  start_date = "2021-01-01",
  end_date = "2021-05-01"
)

## End(Not run)

Create a new mpathsenser database

Description

[Stable]

Usage

create_db(path = getwd(), db_name = "sense.db", overwrite = FALSE)

Arguments

path

The path to the database.

db_name

The name of the database.

overwrite

In case a database with db_name already exists, indicates whether it should be overwritten. If no such database exists, this option is ignored.

Value

A database connection using prepared database schemas.

Examples

# Create a new database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# You can also create an in-memory database
db2 <- create_db(path = NULL, ":memory:")

# Cleanup
close_db(db)
close_db(db2)
file.remove(file.path(tempdir(), "mydb.db"))

Decrypt GPS data from a curve25519 public key

Description

[Stable]

By default, the latitude and longitude of the GPS data collected by m-Path Sense are encrypted using an asymmetric curve25519 key to provide extra protection for these highly sensitive data. This function takes a character vector and decrypts its longitude and latitude columns using the provided key.

Usage

decrypt_gps(data, key, ignore = ":")

Arguments

data

A character vector containing hexadecimal (i.e. encrypted) data.

key

A curve25519 private key.

ignore

A string with characters to ignore from data. See sodium::hex2bin().

Value

A vector of doubles of the decrypted GPS coordinates.

Parallel

This function supports parallel processing in the sense that it is able to distribute its computational load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.

Examples

library(dplyr)
library(sodium)
# Create some GPS coordinates.
data <- data.frame(
  participant_id = "12345",
  time = as.POSIXct(c(
    "2022-12-02 12:00:00",
    "2022-12-02 12:00:01",
    "2022-12-02 12:00:02"
  )),
  longitude = c("50.12345", "50.23456", "50.34567"),
  latitude = c("4.12345", "4.23456", "4.345678")
)

# Generate keypair
key <- sodium::keygen()
pub <- sodium::pubkey(key)

# Encrypt coordinates with pubkey
# You do not need to do this for m-Path Sense
# as this is already encrypted
encrypt <- function(data, pub) {
  data <- lapply(data, charToRaw)
  data <- lapply(data, function(x) sodium::simple_encrypt(x, pub))
  data <- lapply(data, sodium::bin2hex)
  data <- unlist(data)
  data
}
data$longitude <- encrypt(data$longitude, pub)
data$latitude <- encrypt(data$latitude, pub)

# Once the data has been collected, decrypt it using decrypt_gps().
data |>
  mutate(longitude = decrypt_gps(longitude, key)) |>
  mutate(latitude = decrypt_gps(latitude, key))

Get the device info for one or more participants

Description

[Stable]

Usage

device_info(db, participant_id = NULL)

Arguments

db

A database connection to an m-Path Sense database.

participant_id

A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants.

Value

A tibble containing device info for each participant

Examples

## Not run: 
# Open the database
db <- open_db("path/to/db")

# Get device info for all participants
device_info(db)

# Get device info for a specific participant
device_info(db, participant_id = 1)

## End(Not run)

Extract the date of the first entry

Description

[Stable]

A helper function for extracting the date of the first entry (of one or all participants) for one sensor. Note that this function is specific to the first date of a given sensor. After all, it would not make sense to extract the first accelerometer date for a participant when the first device measurement occurred a day later.

Usage

first_date(db, sensor, participant_id = NULL)

Arguments

db

A database connection to an m-Path Sense database.

sensor

The name of a sensor. See sensors for a list of available sensors.

participant_id

A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants.

Value

A string in the format 'YYYY-mm-dd' of the first entry date.

Examples

## Not run: 
db <- open_db()
first_date(db, "Accelerometer", "12345")

## End(Not run)

Fix the end of JSON files

Description

[Experimental]

When copying data directly coming from m-Path Sense, JSON files are sometimes corrupted due to the app not properly closing them. This function attempts to fix the most common problems associated with improper file closure by m-Path Sense.

Usage

fix_jsons(path = getwd(), files = NULL, recursive = TRUE)

Arguments

path

The path name of the JSON files.

files

Alternatively, a character vector of input files.

recursive

Should the listing recurse into directories?

Details

There are two distinct problems this function tries to tackle. First of all, there are often bad file endings (e.g. a missing ]) because the app was closed before it could properly close the file. Several things may be wrong (possibly at the same time), so it is unclear what the precise problems are. As this function is experimental, it may even make things worse by accidentally inserting an incorrect file ending.

Secondly, in rare scenarios there are illegal ASCII characters in the JSON files. This does not happen often and is likely caused by an OS failure (such as a flush error), a disk failure, or data corruption during transmission. Nevertheless, these illegal characters make the file completely unreadable. Fortunately, they are detected correctly by test_jsons(), but they cannot be imported by import(). This function attempts to surgically remove lines with illegal characters by deleting that specific line as well as the next line, as the next line is often just a comma. It may therefore be too liberal in its approach (cutting away more data than necessary) or not liberal enough when the corruption has spread across multiple lines. Nevertheless, it is a first step in removing some straightforward corruption from files, so that only a small number may still need to be fixed by hand.
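The line-removal strategy can be illustrated with a simplified sketch. This is not the package's actual implementation; the file contents and the regular expression are purely illustrative.

```r
# Build a toy "file" where one line contains an illegal control byte
lines <- c('{"a": 1},', rawToChar(as.raw(c(0x7b, 0x01, 0x7d))), ",", '{"b": 2}')

# Flag lines with bytes outside printable ASCII (and tab), then drop each
# flagged line plus the line that follows it (which is often just a comma)
bad <- grep("[^\\x20-\\x7E\\t]", lines, perl = TRUE, useBytes = TRUE)
drop <- unique(c(bad, bad + 1))
cleaned <- lines[-drop]
```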

Value

A message indicating how many files were fixed, and the number of fixed files invisibly.

Parallel

This function supports parallel processing in the sense that it is able to distribute its computational load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.

Progress

You can be updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.

Examples

## Not run: 
future::plan("multisession")
files <- test_jsons()
fix_jsons(files = files)

## End(Not run)

Measurement frequencies per sensor

Description

A named numeric vector containing example measurement frequencies per sensor. Such input is needed for coverage().

Usage

freq

Format

An object of class numeric of length 11.

Value

This vector contains the following information:

Sensor Frequency (per hour) Full text
Accelerometer 720 Once per 5 seconds. Can have multiple instances.
AirQuality 1 Once per hour.
AppUsage 2 Once every 30 minutes. Can have multiple instances.
Bluetooth 12 Once every 5 minutes. Can have multiple instances.
Gyroscope 720 Once per 5 seconds. Can have multiple instances.
Light 360 Once per 10 seconds.
Location 60 Once every 60 seconds.
Memory 60 Once per minute
Noise 120 Once every 30 seconds. Microphone cannot be used in the background in Android 11.
Weather 1 Once per hour.
Wifi 60 Once per minute.

Examples

freq

Reverse geocoding with latitude and longitude

Description

[Experimental]

This function allows you to extract information about a place based on its latitude and longitude, using the OpenStreetMap Nominatim API.

Usage

geocode_rev(lat, lon, zoom = 18, email = "", rate_limit = 1, format = "jsonv2")

Arguments

lat

The latitude of the location (in degrees).

lon

The longitude of the location (in degrees).

zoom

The desired zoom level, from 1 to 18. The highest level, 18, corresponds to individual buildings.

email

If you are making large numbers of requests, please include an appropriate email address to identify your requests. See Nominatim's Usage Policy for more details.

rate_limit

The time interval to keep between queries, in seconds. If the rate limit is too low, OpenStreetMap may reject further requests or even ban you entirely.

format

The format of the response. Either "jsonv2", "geojson", or "geocodejson". See Nominatim's documentation for more details.

Value

A list of information about the location. See Nominatim's documentation for more details. The response may also be an error message in case of API errors, or NA if the client or API is offline.

Warning

Do not abuse this function or you will be banned by OpenStreetMap. The maximum number of requests is around 1 per second. Also make sure not to do too many batch lookups, as many subsequent requests will get you blocked as well.

Examples

# Frankfurt Airport
geocode_rev(50.037936, 8.5599631)

Extract data from an m-Path Sense database

Description

[Stable]

This is a convenience function to help extract data from an m-Path sense database.

Usage

get_data(db, sensor, participant_id = NULL, start_date = NULL, end_date = NULL)

Arguments

db

A database connection to an m-Path Sense database.

sensor

The name of a sensor. See sensors for a list of available sensors.

participant_id

A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants.

start_date

Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant.

end_date

Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant.

Details

Note that this function returns a lazy (also called remote) tibble. This means that the data is not actually in R until you call a function that pulls the data from the database. This is useful for various functions in this package that work with a lazy tibble, for example identify_gaps(). You may want to manually modify this lazy tibble using dplyr functions such as dplyr::filter() or dplyr::mutate() before pulling the data into R. These functions will be executed in-database and will therefore be much faster than first pulling all data into R and then possibly discarding a large part of it. Importantly, data can be pulled into R using dplyr::collect().
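As a sketch of this lazy workflow, consider a self-contained in-memory database. The inserted Activity values follow the column order used in the copy_db() example elsewhere in this documentation and are illustrative assumptions.

```r
library(dplyr)

# Create an in-memory database and populate it with one Activity row
db <- create_db(path = NULL, ":memory:")
DBI::dbExecute(db, "INSERT INTO Study VALUES ('study_1', 'default')")
DBI::dbExecute(db, "INSERT INTO Participant VALUES ('12345', 'study_1')")
DBI::dbExecute(db, "INSERT INTO Activity VALUES (
               '123', '12345', '2024-01-01', '08:00:00', '100', 'WALKING')")

# filter() is translated to SQL and executed in-database;
# only collect() pulls the remaining rows into R
walking <- get_data(db, "Activity", "12345") |>
  filter(type == "WALKING") |>
  collect()

close_db(db)
```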

Value

A lazy tbl containing the requested data.

Examples

## Not run: 
# Open a database
db <- open_db()

# Retrieve some data
get_data(db, "Accelerometer", "12345")

# Or within a specific window
get_data(db, "Accelerometer", "12345", "2021-01-01", "2021-01-05")

## End(Not run)

Get the number of rows per sensor in a mpathsenser database

Description

[Stable]

Usage

get_nrows(
  db,
  sensor = "All",
  participant_id = NULL,
  start_date = NULL,
  end_date = NULL
)

Arguments

db

A database connection, as created by create_db().

sensor

A character vector of one or multiple sensors. Use sensor = "All" for all sensors. See sensors for a list of all available sensors.

participant_id

A character string identifying a single participant. Use get_participants() to retrieve all participants from the database. Leave empty to get data for all participants.

start_date

Optional search window specifying date where to begin search. Must be convertible to date using base::as.Date(). Use first_date() to find the date of the first entry for a participant.

end_date

Optional search window specifying date where to end search. Must be convertible to date using base::as.Date(). Use last_date() to find the date of the last entry for a participant.

Value

A named vector containing the number of rows for each sensor.

Examples

## Not run: 
# Open a database connection
db <- open_db("path/to/db")

# Get the number of rows for all sensors (the default)
get_nrows(db)

# Get the number of rows for the Accelerometer and Gyroscope sensors
get_nrows(db, c("Accelerometer", "Gyroscope"))

# Remember to close the connection
close_db(db)

## End(Not run)

Get all participants

Description

[Stable]

Usage

get_participants(db, lazy = FALSE)

Arguments

db

A database connection, as created by create_db().

lazy

Whether to evaluate lazily using dbplyr.

Value

A data frame containing all participant_id and study_id.

Examples

# Create a database
db <- create_db(tempdir(), "mydb.db")

# Add some participants
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")
DBI::dbExecute(db, "INSERT INTO Participant VALUES('participant1', 'study1')")

# Get the participants
get_participants(db)

# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))

Get all processed files from a database

Description

[Stable]

Usage

get_processed_files(db)

Arguments

db

A database connection, as created by create_db().

Value

A data frame containing the file_name, participant_id, and study_id of the processed files.

Examples

# Create a database
db <- create_db(tempdir(), "mydb.db")

# Add some processed files
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")
DBI::dbExecute(db, "INSERT INTO Participant VALUES('participant1', 'study1')")
DBI::dbExecute(db, "INSERT INTO ProcessedFiles VALUES('file1', 'participant1', 'study1')")

# Get the processed files
get_processed_files(db)

# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))

Get all studies

Description

[Stable]

Usage

get_studies(db, lazy = FALSE)

Arguments

db

A database connection, as created by create_db().

lazy

Whether to evaluate lazily using dbplyr.

Value

A data frame containing all studies.

Examples

# Create a database
db <- create_db(tempdir(), "mydb.db")

# Add some studies
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")

# Get the studies
get_studies(db)

# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))

Calculate the Great-Circle Distance between two points in kilometers

Description

[Stable]

Calculate the great-circle distance between two points using the Haversine function.

Usage

haversine(lat1, lon1, lat2, lon2, r = 6371)

Arguments

lat1

The latitude of point 1 in degrees.

lon1

The longitude of point 1 in degrees.

lat2

The latitude of point 2 in degrees.

lon2

The longitude of point 2 in degrees.

r

The average earth radius, in kilometers. Defaults to 6371 km.

Value

A numeric value of the distance between point 1 and 2 in kilometers.

Examples

fra <- c(50.03333, 8.570556) # Frankfurt Airport
ord <- c(41.97861, -87.90472) # Chicago O'Hare International Airport
haversine(fra[1], fra[2], ord[1], ord[2]) # 6971.059 km
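The computation can be sketched in plain R with the standard haversine formula. This is a simplified re-implementation for illustration, not the package's exact code.

```r
# Great-circle distance between two points via the haversine formula;
# r is the mean earth radius in kilometers
haversine_sketch <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad / 2
  dlon <- (lon2 - lon1) * to_rad / 2
  a <- sin(dlat)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon)^2
  2 * r * asin(sqrt(a))
}

haversine_sketch(50.03333, 8.570556, 41.97861, -87.90472) # roughly 6971 km
```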

Identify gaps in mpathsenser mobile sensing data

Description

[Stable]

Oftentimes in mobile sensing, gaps appear in the data as a result of the participant accidentally closing the app or the operating system killing the app to save power. This can lead to issues later during data analysis, when it becomes unclear whether there are no measurements because no events occurred or because the app quit in that period. For example, if no screen on/off events occur in a 6-hour period, it can either mean the participant did not turn on their phone in that period or that the app simply quit and potential events were missed. In the latter case, the 6-hour missing period has to be compensated for by either removing this interval altogether or by subtracting the gap from the interval itself (see examples).

Usage

identify_gaps(
  db,
  participant_id = NULL,
  min_gap = 60,
  sensor = "Accelerometer"
)

Arguments

db

A database connection to an m-Path Sense database.

participant_id

A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants.

min_gap

The minimum time (in seconds) passed between two subsequent measurements for it to be considered a gap.

sensor

One or multiple sensors. See sensors for a list of available sensors.

Details

While any sensor can be used for identifying gaps, it is best to choose a sensor with a very high, near-continuous sample rate, such as the accelerometer or gyroscope. This function then calculates the time between two subsequent measurements and returns the periods in which this time was larger than min_gap.

Note that the from and to columns in the output are character vectors in UTC time.
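The underlying idea can be sketched on a toy set of timestamps. This is a simplified in-R illustration of the lead-and-filter logic, not the in-database implementation.

```r
library(dplyr)

# Three measurements with a 115-second silence between the last two;
# with min_gap = 60, that silence qualifies as a gap
x <- data.frame(
  time = as.POSIXct(c(
    "2022-01-01 10:00:00", "2022-01-01 10:00:05", "2022-01-01 10:02:00"
  ), tz = "UTC")
)

gaps <- x |>
  mutate(to = lead(time)) |>
  mutate(gap = as.numeric(to) - as.numeric(time)) |>
  filter(gap >= 60) |>
  rename(from = time)
```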

Value

A tibble containing the time period of the gaps. The structure of this tibble is as follows:

participant_id: the participant_id of where the gap occurred
from: the time of the last measurement before the gap
to: the time of the first measurement after the gap
gap: the time passed between from and to, in seconds

Warning

Depending on the sensor that is used to identify the gaps (though this is typically the highest-frequency sensor, such as the accelerometer or gyroscope), there may be a small delay between the last measurement and the actual start of the gap. For example, if the accelerometer samples every 5 seconds, the app may have been killed 4.99 seconds after the last accelerometer measurement (so just before the next measurement). Within that time, other measurements may still have taken place, thereby technically occurring "within" the gap. This is especially important if you want to use these gaps in add_gaps, since this issue may lead to erroneous results.

An easy way to solve this problem is to take into account all sensors of interest when identifying the gaps, thereby ensuring there are no measurements of these sensors within the gap. One way to do so is to search for gaps that are, say, 5 seconds longer than desired and afterwards increase the start time of the gaps by 5 seconds.

Examples

## Not run: 
# Find the gaps for a participant and convert to datetime
gaps <- identify_gaps(db, "12345", min_gap = 60) |>
  mutate(across(c(to, from), ymd_hms)) |>
  mutate(across(c(to, from), with_tz, "Europe/Brussels"))

# Get some sensor data and calculate a statistic, e.g. the time spent walking
# You can also do this with larger intervals, e.g. the time spent walking per hour
walking_time <- get_data(db, "Activity", "12345") |>
  collect() |>
  mutate(datetime = ymd_hms(paste(date, time))) |>
  mutate(datetime = with_tz(datetime, "Europe/Brussels")) |>
  arrange(datetime) |>
  mutate(prev_time = lag(datetime)) |>
  mutate(duration = datetime - prev_time) |>
  filter(type == "WALKING")

# Find out if a gap occurs in the time intervals
walking_time |>
  rowwise() |>
  mutate(gap = any(gaps$from >= prev_time & gaps$to <= datetime))

## End(Not run)

Import m-Path Sense files into a database

Description

[Stable]

Import JSON files from m-Path Sense into a structured database. This function is the bread and butter of this package, as it populates the database with data that most of the other functions in this package use. It is recommended to first run test_jsons() and, if necessary, fix_jsons() to repair JSON files with problematic syntax.

Usage

import(
  path = getwd(),
  db,
  sensors = NULL,
  batch_size = 24,
  backend = "RSQLite",
  recursive = TRUE
)

Arguments

path

The path to the directory containing the JSON files.

db

Valid database connection, typically created by create_db().

sensors

Select one or multiple sensors as in sensors. Leave NULL to extract all sensor data.

batch_size

The number of files that are to be processed in a single batch.

backend

Name of the database backend that is used. Currently, only RSQLite is supported.

recursive

Should the listing recurse into directories?

Details

import allows you to specify which sensors to import (even though there may be more in the files) and also supports batching for a speedier writing process. If parallel processing is active, it is recommended that batch_size be a scalar multiple of the number of CPU cores the parallel cluster can use. If a single JSON file in a batch causes an error, that batch is terminated (but not the function) and it is up to the user to fix the file. This means that if batch_size is large, many files will not be processed. Set batch_size to 1 for sequential (one-by-one) file processing.

Currently, only SQLite is supported as a backend. Due to its concurrency restrictions, parallel processing works for cleaning the raw data, but not for importing it into the database: SQLite does not allow multiple processes to write to the same database at the same time. This is a limitation of SQLite, not of this package. While files are processed individually (and in parallel, if specified), writing to the database happens for the entire batch specified by batch_size at once. Consequently, if a single file in the batch causes an error, the entire batch is skipped, ensuring the database is not left in an inconsistent state.
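The relationship between batch_size and the number of parallel workers can be illustrated with a small calculation (the worker and file counts here are made up):

```r
# Pick batch_size as a scalar multiple of the number of workers so that
# each batch divides evenly over the parallel cluster.
workers <- 4
batch_size <- workers * 6        # 24 files per batch, the default
n_files <- 103
n_batches <- ceiling(n_files / batch_size)
n_batches                        # 5 batches in total
```

With batch_size = 1, each file is written to the database as soon as it is processed, at the cost of slower imports.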

Value

A message indicating how many files were imported. If all files were imported successfully, this function returns an empty string invisibly. Otherwise, the file names of the files that were not imported are returned visibly.

Parallel

This function supports parallel processing in the sense that it can distribute its computational load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.

Progress

You can be kept updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.

See Also

create_db() for creating a database for import() to use, close_db() for closing this database; index_db() to create indices on the database for faster future processing, and vacuum_db() to shrink the database to its minimal size.

Examples

## Not run: 
path <- "some/path"
# Create a database
db <- create_db(path = path, db_name = "my_db")

# Import all JSON files in the current directory
import(path = path, db = db)

# Import all JSON files in the current directory, but do so sequentially
import(path = path, db = db, batch_size = 1)

# Import all JSON files in the current directory, but only the accelerometer data
import(path = path, db = db, sensors = "accelerometer")

# Import all JSON files in the current directory, but only the accelerometer and gyroscope data
import(path = path, db = db, sensors = c("accelerometer", "gyroscope"))

# Remember to close the database
close_db(db)

## End(Not run)

Create indexes for an mpathsenser database

Description

[Stable]

Create indexes for an mpathsenser database on participant_id, date, and a combination of these variables for all tables in the database. This speeds up queries that use these variables in the WHERE clause.
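What this amounts to for a single sensor table can be sketched with an in-memory SQLite database (the table layout and index names below are made up for illustration; the real function loops over all tables in the database):

```r
library(DBI)

# Illustrative sketch: the three kinds of indices index_db() creates,
# shown for one hypothetical sensor table.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE Accelerometer (participant_id TEXT, date TEXT, x REAL)")

dbExecute(con, "CREATE INDEX idx_acc_participant ON Accelerometer (participant_id)")
dbExecute(con, "CREATE INDEX idx_acc_date ON Accelerometer (date)")
dbExecute(con, "CREATE INDEX idx_acc_participant_date ON Accelerometer (participant_id, date)")

# Queries filtering on participant_id and/or date can now use these indices
dbDisconnect(con)
```

Indexing is best done once, after importing all data, since indices slow down subsequent writes.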

Usage

index_db(db)

Arguments

db

A database connection to an m-Path Sense database.

Value

Returns TRUE invisibly, called for side effects.

Examples

## Not run: 
# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# Import some files
import(path = "path/to/jsons", db = db)

# Then index it to speed up the database
index_db(db)

## End(Not run)

Get installed apps

Description

[Stable]

Extract installed apps for one or all participants. Unlike other get_* functions in this package, start and end dates are not used, since installed apps are assumed to be fixed throughout the study.

Usage

installed_apps(db, participant_id = NULL)

Arguments

db

A database connection to an mpathsenser database.

participant_id

A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants.

Value

A tibble containing app names.

Examples

## Not run: 
db <- open_db()

# Get installed apps for all participants
installed_apps(db)

# Get installed apps for a single participant
installed_apps(db, "12345")

## End(Not run)

Extract the date of the last entry

Description

[Stable]

A helper function for extracting the date of the last entry of one sensor, for one or all participants. Note that this function is specific to a single sensor: after all, it would not make sense to extract a participant's last date from the device info when the last accelerometer measurement occurred a day later.

Usage

last_date(db, sensor, participant_id = NULL)

Arguments

db

A database connection to an m-Path Sense database.

sensor

The name of a sensor. See sensors for a list of available sensors.

participant_id

A character string identifying a single participant. Use get_participants to retrieve all participants from the database. Leave empty to get data for all participants.

Value

A string in the format 'YYYY-mm-dd' of the last entry date.

Examples

## Not run: 
db <- open_db()
last_date(db, "Accelerometer", "12345")

## End(Not run)

Moving average for values in an mpathsenser database

Description

[Experimental]

Usage

moving_average(
  db,
  sensor,
  cols,
  n,
  participant_id = NULL,
  start_date = NULL,
  end_date = NULL
)

Arguments

db

A database connection to an m-Path Sense database.

sensor

The name of a sensor. See sensors for a list of available sensors.

cols

A character vector of the columns in the sensor table to average over.

n

The number of seconds to average over. The rolling window is centered on each index of the result.

participant_id

A character vector identifying one or multiple participants.

start_date

Optional search window specifying the date where to begin the search. Must be convertible to a date using as.Date. Use first_date to find the date of the first entry for a participant.

end_date

Optional search window specifying the date where to end the search. Must be convertible to a date using as.Date. Use last_date to find the date of the last entry for a participant.

Value

A tibble with the same columns as the input, modified to be a moving average.
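The centered window described for n can be sketched in base R on a toy series (assuming one value per second; stats::filter() with equal weights and sides = 2 implements a centered moving average):

```r
# Centered moving average over n = 5 observations; with one value per
# second this corresponds to averaging over 5 seconds.
x <- c(1, 2, 3, 4, 5, 6, 7)
n <- 5
as.numeric(stats::filter(x, rep(1 / n, n), sides = 2))
# The first and last floor(n / 2) positions have incomplete windows and
# come out as NA here.
```

For x above this yields NA, NA, 3, 4, 5, NA, NA: each interior value is the mean of itself and the two observations on either side.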

Examples

## Not run: 
path <- system.file("testdata", "test.db", package = "mpathsenser")
db <- open_db(NULL, path)
moving_average(
  db = db,
  sensor = "Light",
  cols = c("mean_lux", "max_lux"),
  n = 5, # seconds
  participant_id = "12345"
)
close_db(db)

## End(Not run)

Open an mpathsenser database.

Description

[Stable]

Usage

open_db(path = getwd(), db_name = "sense.db")

Arguments

path

The path to the database. Use NULL to use the full path name in db_name.

db_name

The name of the database.

Value

A connection to an mpathsenser database.

See Also

close_db() for closing a database; copy_db() for copying (part of) a database; index_db() for indexing a database; get_data() for extracting data from a database.

Examples

# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")
close_db(db)
DBI::dbIsValid(db) # db is closed

# Then re-open it
db2 <- open_db(tempdir(), "mydb.db")
DBI::dbIsValid(db2) # db is opened

# Cleanup
close_db(db2)
file.remove(file.path(tempdir(), "mydb.db"))

Plot a coverage overview

Description

Plot a coverage overview

Usage

## S3 method for class 'coverage'
plot(x, ...)

Arguments

x

A tibble with the coverage data coming from coverage().

...

Other arguments passed on to methods. Not currently used.

Value

A ggplot2::ggplot object.

See Also

coverage()

Examples

## Not run: 
freq <- c(
  Accelerometer = 720, # Once per 5 seconds. Can have multiple measurements.
  AirQuality = 1,
  AppUsage = 2, # Once every 30 minutes
  Bluetooth = 60, # Once per minute. Can have multiple measurements.
  Gyroscope = 720, # Once per 5 seconds. Can have multiple measurements.
  Light = 360, # Once per 10 seconds
  Location = 60, # Once per 60 seconds
  Memory = 60, # Once per minute
  Noise = 120,
  Pedometer = 1,
  Weather = 1,
  Wifi = 60 # once per minute
)

data <- coverage(
  db = db,
  participant_id = "12345",
  sensor = c("Accelerometer", "Gyroscope"),
  frequency = freq,
  start_date = "2021-01-01",
  end_date = "2021-05-01"
)

plot(data)

## End(Not run)

Available Sensors

Description

[Stable]

A character vector containing all available sensors in this package that you can work with. This variable was created so it is easier to use in your own functions, e.g. to loop over sensors.

Usage

sensors

Format

An object of class character of length 27.

Value

A character vector containing all sensor names supported by mpathsenser.

Examples

sensors

Test JSON files for being in the correct format.

Description

[Stable]

Usage

test_jsons(path = getwd(), files = NULL, db = NULL, recursive = TRUE)

Arguments

path

The path name of the JSON files.

files

Alternatively, a character vector of the input files.

db

An mpathsenser database connection (optional). If provided, it will be used to check which files are already in the database, and only those JSON files that are not will be tested.

recursive

Should the listing recurse into directories?

Value

A message indicating whether there were any issues and a character vector of the file names that need to be fixed. If there were no issues, an invisible empty string is returned.

Parallel

This function supports parallel processing in the sense that it can distribute its computational load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.

Progress

You can be kept updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.

Examples

## Not run: 
# Test all files in a directory
test_jsons(path = "path/to/jsons", recursive = FALSE)

# Test all files in a directory and its subdirectories
test_jsons(path = "path/to/jsons", recursive = TRUE)

# Test specific files
test_jsons(files = c("file1.json", "file2.json"))

# Test files in a directory, but skip those that are already in the database
test_jsons(path = "path/to/jsons", db = db)

## End(Not run)

Unzip m-Path Sense output

Description

[Stable]

Similar to unzip, but makes it easier to unzip all files in a given path with one function call.

Usage

unzip_data(path = getwd(), to = NULL, overwrite = FALSE, recursive = TRUE)

Arguments

path

The path to the directory containing the zip files.

to

The output path.

overwrite

Logical value indicating whether to overwrite files that have already been unzipped.

recursive

Logical value indicating whether to unzip files in subdirectories as well. These files will then be unzipped in their respective subdirectory.

Value

A message indicating how many files were unzipped.

Parallel

This function supports parallel processing in the sense that it can distribute its computational load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.

Progress

You can be kept updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.

Examples

## Not run: 
# Unzip all files in a directory
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", recursive = FALSE)

# Unzip all files in a directory and its subdirectories
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", recursive = TRUE)

# Unzip files in a directory, but skip those that are already unzipped
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", overwrite = FALSE)

## End(Not run)

Vacuum a database

Description

[Stable]

This is a convenience function that calls the VACUUM command on a database. This command will rebuild the database file, repacking it into a minimal amount of disk space.
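Conceptually, this amounts to issuing the VACUUM command on the connection yourself; a self-contained sketch with an in-memory SQLite database (table name and contents are made up):

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE t (x INTEGER)")
dbExecute(con, "INSERT INTO t VALUES (1), (2), (3)")
dbExecute(con, "DELETE FROM t")  # deleting rows leaves free pages behind
dbExecute(con, "VACUUM")         # rebuild the file into minimal disk space
dbDisconnect(con)
```

Vacuuming is most useful after deleting large amounts of data, e.g. after copying part of a database with copy_db().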

Usage

vacuum_db(db)

Arguments

db

A database connection to an m-Path Sense database.

Value

A scalar numeric that specifies the number of rows affected by the vacuum.

Examples

# Create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# Assuming that we have imported some data into the database, we can vacuum it
vacuum_db(db)

# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))