Title: Process and Analyse Data from m-Path Sense
Description: Overcomes one of the major challenges in mobile (passive) sensing, namely being able to pre-process the raw data that comes from a mobile sensing app, specifically 'm-Path Sense' <https://m-path.io>. The main task of 'mpathsenser' is therefore to read 'm-Path Sense' JSON files into a database and provide several convenience functions to aid in data processing.
Authors: Koen Niemeijer [aut, cre]
Maintainer: Koen Niemeijer <[email protected]>
License: GPL (>= 3)
Version: 1.2.3.9000
Built: 2025-02-24 16:37:54 UTC
Source: https://github.com/koenniem/mpathsenser
Since there may be many gaps in mobile sensing data, it is pivotal to pay attention to them in the analysis. This function adds known gaps to the data as "measurements", thereby allowing easier calculations, for example of durations. For instance, consider a participant who spent 30 minutes walking. If it is known that there is a gap of 15 minutes in this interval, we should somehow account for it. add_gaps accounts for this by adding the gap data to the sensor data, splitting intervals where gaps occur.
add_gaps(data, gaps, by = NULL, continue = FALSE, fill = NULL)
data | A data frame containing the data. See get_data() for retrieving data from an mpathsenser database. |
gaps | A data frame (extension) containing the gap data. See identify_gaps() for finding gaps in the sampling. |
by | A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, all variables in common across data and gaps are used. |
continue | Whether to continue the measurement(s) prior to the gap once the gap ends. |
fill | A named list of the columns to fill with default values for the extra measurements that are added because of the gaps. |
In the example of 30 minutes of walking where a 15-minute gap occurred (say, after 5 minutes), add_gaps() adds two rows: one 5 minutes after the start of the interval indicating the start of the gap (if needed containing values from fill), and one 20 minutes after the start of the interval signalling the continuation of the walking activity. Then, when calculating time differences between subsequent measurements, the gap period is appropriately accounted for. Note that if multiple measurements occurred before the gap, they will all be continued after the gap.
A tibble containing the data and the added gaps.
Depending on the sensor that is used to identify the gaps (typically the highest-frequency sensor, such as the accelerometer or gyroscope), there may be a small delay between the last measurement and the actual start of the gap. For example, if the accelerometer samples every 5 seconds, the app may have been killed 4.99 seconds after the last accelerometer measurement, i.e. just before the next measurement. Within that time, other measurements may still have taken place, thereby technically occurring "within" the gap. This is especially important if you want to use these gaps in add_gaps, since this issue may lead to erroneous results.
An easy way to solve this problem is to take all sensors of interest into account when identifying the gaps, thereby ensuring there are no measurements of these sensors within the gap. Another way is to search for gaps that are, say, 5 seconds longer than you want and afterwards increase the start time of the gaps by 5 seconds.
identify_gaps()
for finding gaps in the sampling; link_gaps()
for linking gaps to
ESM data, analogous to link()
.
# Define some data
dat <- data.frame(
  participant_id = "12345",
  time = as.POSIXct(c("2022-05-10 10:00:00", "2022-05-10 10:30:00", "2022-05-10 11:30:00")),
  type = c("WALKING", "STILL", "RUNNING"),
  confidence = c(80, 100, 20)
)

# Get the gaps from identify_gaps, but in this example define them ourselves
gaps <- data.frame(
  participant_id = "12345",
  from = as.POSIXct(c("2022-05-10 10:05:00", "2022-05-10 10:50:00")),
  to = as.POSIXct(c("2022-05-10 10:20:00", "2022-05-10 11:10:00"))
)

# Now add the gaps to the data
add_gaps(
  data = dat,
  gaps = gaps,
  by = "participant_id"
)

# You can use fill if you want to get rid of those pesky NA's
add_gaps(
  data = dat,
  gaps = gaps,
  by = "participant_id",
  fill = list(type = "GAP", confidence = 100)
)
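The following is a small, hedged sketch (not part of the package's documented examples) of how the added gap rows make duration calculations easier. It continues the example above and assumes the output of add_gaps() keeps the time and type columns in chronological order.

library(dplyr)

# Sum the time spent walking while excluding the gap periods
add_gaps(data = dat, gaps = gaps, by = "participant_id", fill = list(type = "GAP", confidence = 100)) |>
  group_by(participant_id) |>
  mutate(duration = lead(time) - time) |>
  filter(type == "WALKING") |>
  summarise(walking_time = sum(duration, na.rm = TRUE))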
This function scrapes the Google Play Store by using name
as the search term. From there
it selects the first result in the list and its corresponding category and package name.
app_category(name, num = 1, rate_limit = 5, exact = TRUE)
name | The name of the app to search for. |
num | Which result should be selected in the list of search results. Defaults to one. |
rate_limit | The time interval to keep between queries, in seconds. If the rate limit is too low, the Google Play Store may reject further requests or even ban you entirely. |
exact | In m-Path Sense, the app names of the AppUsage sensor are the last part of the app's package names. When TRUE, name is matched against the package name rather than used as a free search term. |
A list containing the following fields:
package | the package name that was selected from the Google Play search |
genre | the corresponding genre of this package |
Do not abuse this function or you will be banned by the Google Play Store. The minimum delay between requests seems to be around 5 seconds, but this is untested. Also make sure not to do batch lookups, as many subsequent requests will get you blocked as well.
app_category("whatsapp") # Example of a generic app name where we can't find a specific app app_category("weather") # Weather forecast channel # Get OnePlus weather app_category("net.oneplus.weather")
app_category("whatsapp") # Example of a generic app name where we can't find a specific app app_category("weather") # Weather forecast channel # Get OnePlus weather app_category("net.oneplus.weather")
In time series with variable measurements, an often recurring task is calculating the total time spent (i.e. the duration) in fixed bins, for example per hour or day. However, this may be difficult when two subsequent measurements are in different bins or span over multiple bins.
bin_data( data, start_time, end_time, by = c("sec", "min", "hour", "day"), fixed = TRUE, .name = "bin" )
data | A data frame or tibble containing the time series. |
start_time | The column name of the start time of the interval, a POSIXt. |
end_time | The column name of the end time of the interval, a POSIXt. |
by | A binning specification. |
fixed | Whether to create fixed bins. If TRUE, bins start at fixed points (e.g. every whole hour); if FALSE, bins are created relative to the earliest time in the group. |
.name | The name of the column containing the nested data. |
A tibble containing the group columns (if any), date, hour (if by = "hour"), and the duration in seconds.
link_gaps()
for linking gaps to data.
library(dplyr)

data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = "WALKING"
)

# get bins per hour, even if the interval is longer than one hour
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )

# Alternatively, you can give an integer value to by to create custom-sized
# bins, but only if fixed = FALSE. Note that these bins are not rounded to,
# as in this example, 30 minutes, but rather depend on the earliest time
# in the group.
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = 1800L,
    fixed = FALSE
  )

# More complicated data for showcasing grouping:
data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = c("STILL", "WALKING", "STILL", "WALKING")
)

# binned_intervals also takes into account the prior grouping structure
out <- data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  group_by(participant_id) |>
  mutate(lead = lead(datetime)) |>
  group_by(participant_id, type) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )
print(out)

# To get the duration for each bin (note to change the variable names in sum):
purrr::map_dbl(
  out$bin_data,
  ~ sum(as.double(.x$lead) - as.double(.x$datetime), na.rm = TRUE)
)

# Or:
out |>
  tidyr::unnest(bin_data, keep_empty = TRUE) |>
  mutate(duration = .data$lead - .data$datetime) |>
  group_by(bin, .add = TRUE) |>
  summarise(duration = sum(.data$duration, na.rm = TRUE), .groups = "drop")
Copy zip files from a source location to a target location, but only those files that do not yet exist in the target. That is, it only updates the target folder with new files from the source folder.
ccopy(from, to, recursive = TRUE)
from |
A path to copy files from. |
to |
A path to copy files to. |
recursive |
Should files from subdirectories be copied? |
A message indicating how many files were copied.
## Not run: ccopy("K:/data/myproject/", "~/myproject") ## End(Not run)
This is a convenience function that is simply a wrapper around DBI::dbDisconnect()
.
close_db(db)
db |
A database connection to an m-Path Sense database. |
Returns invisibly regardless of whether the database is active, valid, or even exists.
open_db()
for opening an mpathsenser database.
# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# Then close it
close_db(db)

# You can even try to close a database that is already closed. This will not trigger an error.
close_db(db)

# Cleanup
file.remove(file.path(tempdir(), "mydb.db"))
copy_db(source_db, target_db, sensor = "All")
source_db | An mpathsenser database connection from where the data will be transferred. |
target_db | An mpathsenser database connection where the data will be transferred to. |
sensor | A character vector containing one or multiple sensors. See sensors for a list of available sensors. |
Returns TRUE
invisibly, called for side effects.
# First create two databases in a temporary directory
db1 <- create_db(tempdir(), "mydb1.db")
db2 <- create_db(tempdir(), "mydb2.db")

# Populate the first database with some data
DBI::dbExecute(db1, "INSERT INTO Study VALUES ('study_1', 'default')")
DBI::dbExecute(db1, "INSERT INTO Participant VALUES ('1', 'study_1')")
DBI::dbExecute(db1, "INSERT INTO Activity VALUES( '123', '1', '2024-01-01', '08:00:00', '100', 'WALKING')")

# Then copy the first database to the second database
copy_db(db1, db2)

# Check that the second database has the same data as the first database
get_data(db2, "Activity")

# Cleanup
close_db(db1)
close_db(db2)
file.remove(file.path(tempdir(), "mydb1.db"))
file.remove(file.path(tempdir(), "mydb2.db"))
Calculate the coverage of sensor measurements per hour, either as the absolute number of measurements or relative to the expected number. Only applicable to non-reactive sensors with 'continuous' sampling.
coverage( db, participant_id, sensor = NULL, frequency = mpathsenser::freq, relative = TRUE, offset = "None", start_date = NULL, end_date = NULL, plot = deprecated() )
db | A valid database connection. The schema must be as created by open_db. |
participant_id | A character string of one participant ID. |
sensor | A character vector containing one or multiple sensors. See sensors for a list of available sensors. |
frequency | A named numeric vector with sensors as names and the number of expected samples per hour as values. |
relative | Show the absolute number of measurements or the number relative to the expected number? Logical value. |
offset | Currently not used. |
start_date | A date (or character string convertible to a date using as.Date) specifying where to begin the search window. |
end_date | A date (or character string convertible to a date using as.Date) specifying where to end the search window. |
plot | Deprecated. Previously, whether to return a ggplot of the coverage results instead of a tibble. |
A ggplot of the coverage results if plot
is TRUE
or a tibble containing the
hour, type of measure (i.e. sensor), and (relative) coverage.
## Not run:
freq <- c(
  Accelerometer = 720, # Once per 5 seconds. Can have multiple measurements.
  AirQuality = 1,
  AppUsage = 2, # Once every 30 minutes
  Bluetooth = 60, # Once per minute. Can have multiple measurements.
  Gyroscope = 720, # Once per 5 seconds. Can have multiple measurements.
  Light = 360, # Once per 10 seconds
  Location = 60, # Once per 60 seconds
  Memory = 60, # Once per minute
  Noise = 120,
  Pedometer = 1,
  Weather = 1,
  Wifi = 60 # once per minute
)

coverage(
  db = db,
  participant_id = "12345",
  sensor = c("Accelerometer", "Gyroscope"),
  frequency = mpathsenser::freq,
  start_date = "2021-01-01",
  end_date = "2021-05-01"
)
## End(Not run)
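Since the plot argument is deprecated, you may want to plot the returned tibble yourself. The following is only a sketch and assumes the output columns are named hour, measure, and coverage, as suggested by the Value section; check the actual column names with names().

## Not run:
library(ggplot2)

cov <- coverage(
  db = db,
  participant_id = "12345",
  sensor = c("Accelerometer", "Gyroscope"),
  frequency = mpathsenser::freq
)

# Hedged sketch: one bar per hour and sensor, assuming the column names above
ggplot(cov, aes(x = hour, y = coverage, fill = measure)) +
  geom_col(position = "dodge")
## End(Not run)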
create_db(path = getwd(), db_name = "sense.db", overwrite = FALSE)
path | The path to the database. |
db_name | The name of the database. |
overwrite | In case a database with the same name already exists at this path, should it be overwritten? |
A database connection using prepared database schemas.
# Create a new database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# You can also create an in-memory database
db2 <- create_db(path = NULL, ":memory:")

# Cleanup
close_db(db)
close_db(db2)
file.remove(file.path(tempdir(), "mydb.db"))
By default, the latitude and longitude of the GPS data collected by m-Path Sense are encrypted using an asymmetric curve25519 key to provide extra protection for these highly sensitive data. This function takes a character vector of encrypted values, such as a longitude or latitude column, and decrypts it using the provided key.
decrypt_gps(data, key, ignore = ":")
data | A character vector containing hexadecimal (i.e. encrypted) data. |
key | A curve25519 private key. |
ignore | A string with characters to ignore from data. |
A vector of doubles of the decrypted GPS coordinates.
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
library(dplyr)
library(sodium)

# Create some GPS coordinates.
data <- data.frame(
  participant_id = "12345",
  time = as.POSIXct(c(
    "2022-12-02 12:00:00",
    "2022-12-02 12:00:01",
    "2022-12-02 12:00:02"
  )),
  longitude = c("50.12345", "50.23456", "50.34567"),
  latitude = c("4.12345", "4.23456", "4.345678")
)

# Generate keypair
key <- sodium::keygen()
pub <- sodium::pubkey(key)

# Encrypt coordinates with pubkey
# You do not need to do this for m-Path Sense
# as this is already encrypted
encrypt <- function(data, pub) {
  data <- lapply(data, charToRaw)
  data <- lapply(data, function(x) sodium::simple_encrypt(x, pub))
  data <- lapply(data, sodium::bin2hex)
  data <- unlist(data)
  data
}
data$longitude <- encrypt(data$longitude, pub)
data$latitude <- encrypt(data$latitude, pub)

# Once the data has been collected, decrypt it using decrypt_gps().
data |>
  mutate(longitude = decrypt_gps(longitude, key)) |>
  mutate(latitude = decrypt_gps(latitude, key))
device_info(db, participant_id = NULL)
db | A database connection to an m-Path Sense database. |
participant_id | A character string identifying a single participant. Use get_participants() to retrieve all participants from the database, or leave empty to get data for all participants. |
A tibble containing device info for each participant
## Not run:
# Open the database
db <- open_db("path/to/db")

# Get device info for all participants
device_info(db)

# Get device info for a specific participant
device_info(db, participant_id = 1)
## End(Not run)
A helper function for extracting the first date of entry (of one or all participants) for one sensor. Note that this function is specific to the first date of a sensor. After all, it would not make sense to extract the first date of the accelerometer for a participant when, for example, the first device-info measurement only occurred a day later.
first_date(db, sensor, participant_id = NULL)
db | A database connection to an m-Path Sense database. |
sensor | The name of a sensor. See sensors for a list of available sensors. |
participant_id | A character string identifying a single participant. Use get_participants() to retrieve all participants from the database, or leave empty to get data for all participants. |
A string in the format 'YYYY-mm-dd' of the first entry date.
## Not run:
db <- open_db()
first_date(db, "Accelerometer", "12345")
## End(Not run)
When copying data directly coming from m-Path Sense, JSON files are sometimes corrupted due to the app not properly closing them. This function attempts to fix the most common problems associated with improper file closure by m-Path Sense.
fix_jsons(path = getwd(), files = NULL, recursive = TRUE)
path |
The path name of the JSON files. |
files |
Alternatively, a character list of the input files |
recursive |
Should the listing recurse into directories? |
There are two distinct problems this function tries to tackle. First of all, there are often bad file endings (e.g. no closing ]) because the app was closed before it could properly close the file. There are several cases that may be wrong (possibly even multiple at once), so it is unclear what the precise problems are. As this function is experimental, it may even make things worse by accidentally inserting an incorrect file ending.
Secondly, in rare scenarios there are illegal ASCII characters in the JSON files. This does not happen often and is likely caused by an OS failure (such as a flush error), a disk failure, or data corrupted during transmission. Nevertheless, these illegal characters make the file completely unreadable. Fortunately, they are detected correctly by test_jsons, but they cannot be imported by import. This function attempts to surgically remove lines with illegal characters, by removing that specific line as well as the next line, as the latter is often a comma. It may therefore be too liberal in its approach, cutting away more data than necessary, or not liberal enough when the corruption has spread throughout multiple lines. Nevertheless, it is a first step in removing some straightforward corruption from files so that only a small number may still need to be fixed by hand.
A message indicating how many files were fixed, and the number of fixed files invisibly.
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
You can be updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.
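A minimal sketch of subscribing to these progress updates, assuming the progressr package is installed; see its vignette for the full range of progress handlers.

## Not run:
library(progressr)

# Report progress for all functions that signal it, including fix_jsons()
handlers(global = TRUE)

future::plan("multisession")
fix_jsons(path = "path/to/jsons")
## End(Not run)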
## Not run:
future::plan("multisession")
files <- test_jsons()
fix_jsons(files = files)
## End(Not run)
A numeric vector containing an example of measurement frequencies per sensor. Such input is needed for coverage().
freq
An object of class numeric
of length 11.
This vector contains the following information:
Sensor | Frequency (per hour) | Full text |
Accelerometer | 720 | Once per 5 seconds. Can have multiple instances. |
AirQuality | 1 | Once per hour. |
AppUsage | 2 | Once every 30 minutes. Can have multiple instances. |
Bluetooth | 12 | Once every 5 minutes. Can have multiple instances. |
Gyroscope | 720 | Once per 5 seconds. Can have multiple instances. |
Light | 360 | Once per 10 seconds. |
Location | 60 | Once every 60 seconds. |
Memory | 60 | Once per minute |
Noise | 120 | Once every 30 seconds. Microphone cannot be used in the background in Android 11. |
Weather | 1 | Once per hour. |
Wifi | 60 | Once per minute. |
freq
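freq is only an example. If your study samples sensors at different rates, a hedged sketch of supplying your own expected frequencies to coverage() could look as follows; the sampling rates below are made up.

## Not run:
# Hypothetical study-specific frequencies (expected samples per hour)
my_freq <- c(
  Accelerometer = 3600, # once per second in this hypothetical study
  Gyroscope = 3600,
  Location = 60
)

coverage(db, participant_id = "12345", frequency = my_freq)
## End(Not run)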
This function allows you to extract information about a place based on its latitude and longitude, using the OpenStreetMap Nominatim API.
geocode_rev(lat, lon, zoom = 18, email = "", rate_limit = 1, format = "jsonv2")
lat | The latitude of the location (in degrees). |
lon | The longitude of the location (in degrees). |
zoom | The desired zoom level, from 1 to 18. The highest level, 18, corresponds to building level. |
email | If you are making a large number of requests, please include an appropriate email address to identify them. See Nominatim's Usage Policy for more details. |
rate_limit | The time interval to keep between queries, in seconds. If the rate limit is too low, OpenStreetMap may reject further requests or even ban you entirely. |
format | The format of the response. Either "jsonv2", "geojson", or "geocodejson". See Nominatim's documentation for more details. |
A list of information about the location. See Nominatim's documentation
for more details. The response may also be an error message in case of API errors, or NA
if
the client or API is offline.
Do not abuse this function or you will be banned by OpenStreetMap. The maximum number of requests is around 1 per second. Also make sure not to do too many batch lookups, as many subsequent requests will get you blocked as well.
# Frankfurt Airport
geocode_rev(50.037936, 8.5599631)
This is a convenience function to help extract data from an m-Path Sense database.
get_data(db, sensor, participant_id = NULL, start_date = NULL, end_date = NULL)
db | A database connection to an m-Path Sense database. |
sensor | The name of a sensor. See sensors for a list of available sensors. |
participant_id | A character string identifying a single participant. Use get_participants() to retrieve all participants from the database, or leave empty to get data for all participants. |
start_date | Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date | Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
Note that this function returns a lazy (also called remote) tibble. This means that the data is not actually in R until you call a function that pulls the data from the database. This is useful for various functions in this package that work with a lazy tibble, for example identify_gaps(). You may want to manually modify this lazy tibble by using dplyr functions such as dplyr::filter() or dplyr::mutate() before pulling the data into R. These functions will be executed in-database, and will therefore be much faster than having to first pull all data into R and then possibly removing a large part of it. Importantly, data can be pulled into R using dplyr::collect().
A lazy tbl
containing the requested data.
## Not run:
# Open a database
db <- open_db()

# Retrieve some data
get_data(db, "Accelerometer", "12345")

# Or within a specific window
get_data(db, "Accelerometer", "12345", "2021-01-01", "2021-01-05")
## End(Not run)
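As a hedged illustration of the in-database workflow described above (the confidence column is assumed from the Activity examples elsewhere in this documentation):

## Not run:
# Filter in-database, then pull only the remaining rows into R
get_data(db, "Activity", participant_id = "12345") |>
  dplyr::filter(confidence >= 80) |>
  dplyr::collect()
## End(Not run)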
get_nrows( db, sensor = "All", participant_id = NULL, start_date = NULL, end_date = NULL )
db | A database connection, as created by create_db() or open_db(). |
sensor | A character vector of one or multiple sensors. See sensors for a list of available sensors; defaults to "All". |
participant_id | A character string identifying a single participant. Use get_participants() to retrieve all participants from the database, or leave empty to get data for all participants. |
start_date | Optional search window specifying date where to begin search. Must be convertible to date using as.Date. |
end_date | Optional search window specifying date where to end search. Must be convertible to date using as.Date. |
A named vector containing the number of rows for each sensor.
## Not run:
# Open a database connection
db <- open_db("path/to/db")

# Get the number of rows for all sensors
get_nrows(db, sensor = NULL)

# Get the number of rows for the Accelerometer and Gyroscope sensors
get_nrows(db, c("Accelerometer", "Gyroscope"))

# Remember to close the connection
close_db(db)
## End(Not run)
get_participants(db, lazy = FALSE)
db | A database connection, as created by create_db() or open_db(). |
lazy | Whether to evaluate lazily using dbplyr. |
A data frame containing all participant_id
and study_id
.
# Create a database db <- create_db(tempdir(), "mydb.db") # Add some participants DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')") DBI::dbExecute(db, "INSERT INTO Participant VALUES('participant1', 'study1')") # Get the participants get_participants(db) # Cleanup close_db(db) file.remove(file.path(tempdir(), "mydb.db"))
get_processed_files(db)
db | A database connection, as created by create_db() or open_db(). |
A data frame containing the file_name
, participant_id
, and study_id
of the
processed files.
# Create a database
db <- create_db(tempdir(), "mydb.db")

# Add some processed files
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")
DBI::dbExecute(db, "INSERT INTO Participant VALUES('participant1', 'study1')")
DBI::dbExecute(db, "INSERT INTO ProcessedFiles VALUES('file1', 'participant1', 'study1')")

# Get the processed files
get_processed_files(db)

# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))
get_studies(db, lazy = FALSE)
db | A database connection, as created by create_db() or open_db(). |
lazy | Whether to evaluate lazily using dbplyr. |
A data frame containing all studies.
# Create a database
db <- create_db(tempdir(), "mydb.db")

# Add some studies
DBI::dbExecute(db, "INSERT INTO Study VALUES('study1', 'data_format1')")

# Get the studies
get_studies(db)

# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))
Calculate the great-circle distance between two points using the Haversine function.
haversine(lat1, lon1, lat2, lon2, r = 6371)
lat1 |
The latitude of point 1 in degrees. |
lon1 |
The longitude of point 1 in degrees. |
lat2 |
The latitude of point 2 in degrees. |
lon2 |
The longitude of point 2 in degrees. |
r |
The average earth radius. |
A numeric value of the distance between point 1 and 2 in kilometers.
fra <- c(50.03333, 8.570556) # Frankfurt Airport
ord <- c(41.97861, -87.90472) # Chicago O'Hare International Airport
haversine(fra[1], fra[2], ord[1], ord[2]) # 6971.059 km
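For reference, a minimal sketch of the underlying great-circle formula; this illustrates the mathematics and is not necessarily the package's exact implementation.

# Haversine formula: inputs in degrees, r in kilometres
haversine_sketch <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- function(deg) deg * pi / 180
  dlat <- to_rad(lat2 - lat1)
  dlon <- to_rad(lon2 - lon1)
  a <- sin(dlat / 2)^2 + cos(to_rad(lat1)) * cos(to_rad(lat2)) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a)) # distance in km
}

haversine_sketch(50.03333, 8.570556, 41.97861, -87.90472) # ~6971 km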
Oftentimes in mobile sensing, gaps appear in the data as a result of the participant accidentally closing the app or the operating system killing the app to save power. This can lead to issues later on during data analysis, when it becomes unclear whether there are no measurements because no events occurred or because the app quit in that period. For example, if no screen on/off events occur in a 6-hour period, it can either mean the participant did not turn on their phone in that period or that the app simply quit and potential events were missed. In the latter case, the 6-hour missing period has to be compensated for by either removing this interval altogether or by subtracting the gap from the interval itself (see examples).
identify_gaps( db, participant_id = NULL, min_gap = 60, sensor = "Accelerometer" )
db | A database connection to an m-Path Sense database. |
participant_id | A character string identifying a single participant. Use get_participants() to retrieve all participants from the database, or leave empty to get data for all participants. |
min_gap | The minimum time (in seconds) passed between two subsequent measurements for it to be considered a gap. |
sensor | One or multiple sensors. See sensors for a list of available sensors. |
While any sensor can be used for identifying gaps, it is best to choose a sensor with a very high, near-continuous sample rate, such as the accelerometer or gyroscope. This function then computes the time between two subsequent measurements and returns the periods in which this time was larger than min_gap.
Note that the from and to columns in the output are character vectors in UTC time.
A tibble containing the time period of the gaps. The structure of this tibble is as follows:
participant_id | the participant_id of where the gap occurred |
from | the time of the last measurement before the gap |
to | the time of the first measurement after the gap |
gap | the time passed between from and to, in seconds |
Depending on the sensor that is used to identify the gaps (typically the highest-frequency sensor, such as the accelerometer or gyroscope), there may be a small delay between the last measurement and the actual start of the gap. For example, if the accelerometer samples every 5 seconds, the app may have been killed 4.99 seconds after the last accelerometer measurement, i.e. just before the next measurement. Within that time, other measurements may still have taken place, thereby technically occurring "within" the gap. This is especially important if you want to use these gaps in add_gaps, since this issue may lead to erroneous results.
An easy way to solve this problem is to take all sensors of interest into account when identifying the gaps, thereby ensuring there are no measurements of these sensors within the gap. Another way is to search for gaps that are, say, 5 seconds longer than you want and afterwards increase the start time of the gaps by 5 seconds.
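A hedged sketch of this workaround, assuming the accelerometer samples every 5 seconds and that gaps of at least 60 seconds are of interest; note that from and to are returned as character vectors in UTC.

## Not run:
library(dplyr)
library(lubridate)

# Search for gaps 5 seconds longer than needed, then shift the start by 5 seconds
gaps <- identify_gaps(db, "12345", min_gap = 65, sensor = c("Accelerometer", "Gyroscope")) |>
  mutate(from = format(ymd_hms(from) + 5, "%Y-%m-%d %H:%M:%S"))
## End(Not run)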
## Not run:
# Find the gaps for a participant and convert to datetime
gaps <- identify_gaps(db, "12345", min_gap = 60) |>
  mutate(across(c(to, from), ymd_hms)) |>
  mutate(across(c(to, from), with_tz, "Europe/Brussels"))

# Get some sensor data and calculate a statistic, e.g. the time spent walking
# You can also do this with larger intervals, e.g. the time spent walking per hour
walking_time <- get_data(db, "Activity", "12345") |>
  collect() |>
  mutate(datetime = ymd_hms(paste(date, time))) |>
  mutate(datetime = with_tz(datetime, "Europe/Brussels")) |>
  arrange(datetime) |>
  mutate(prev_time = lag(datetime)) |>
  mutate(duration = datetime - prev_time) |>
  filter(type == "WALKING")

# Find out if a gap occurs in the time intervals
walking_time |>
  rowwise() |>
  mutate(gap = any(gaps$from >= prev_time & gaps$to <= datetime))
## End(Not run)
Import JSON files from m-Path Sense into a structured database. This function is the bread and butter of this package, as it populates the database with the data that most of the other functions in this package use. It is recommended to first run test_jsons() and, if necessary, fix_jsons() to repair JSON files with problematic syntax.
import( path = getwd(), db, sensors = NULL, batch_size = 24, backend = "RSQLite", recursive = TRUE )
path | The path to the file directory. |
db | Valid database connection, typically created by create_db(). |
sensors | Select one or multiple sensors as in sensors. |
batch_size | The number of files that are to be processed in a single batch. |
backend | Name of the database backend that is used. Currently, only RSQLite is supported. |
recursive | Should the listing recurse into directories? |
import allows you to specify which sensors to import (even though there may be more in the files) and it also allows batching for a speedier writing process. If parallel processing is active, it is recommended that batch_size be a scalar multiple of the number of CPU cores the parallel cluster can use. If a single JSON file in the batch causes an error, the batch is terminated (but not the function) and it is up to the user to fix the file. This means that if batch_size is large, many files will not be processed. Set batch_size to 1 for sequential (one-by-one) file processing.
Currently, only SQLite is supported as a backend. Due to its concurrency restriction, parallel processing works for cleaning the raw data, but not for importing it into the database. This is because SQLite does not allow multiple processes to write to the same database at the same time. This is a limitation of SQLite and not of this package. However, while files are processed individually (and in parallel if specified), writing to the database happens for the entire batch specified by batch_size at once. This means that if a single file in the batch causes an error, the entire batch is skipped. This is to ensure that the database is not left in an inconsistent state.
A message indicating how many files were imported. If all files were imported successfully, this function returns an empty string invisibly. Otherwise, the file names of the files that were not imported are returned visibly.
This function supports parallel processing in the sense that it is able to distribute its computation load among multiple workers. To make use of this functionality, run future::plan("multisession") before calling this function.
You can be updated on the progress of this function by using the progressr package. See progressr's vignette on how to subscribe to these updates.
create_db()
for creating a database for import()
to use, close_db()
for closing
this database; index_db()
to create indices on the database for faster future processing, and
vacuum_db()
to shrink the database to its minimal size.
## Not run: path <- "some/path" # Create a database db <- create_db(path = path, db_name = "my_db") # Import all JSON files in the current directory import(path = path, db = db) # Import all JSON files in the current directory, but do so sequentially import(path = path, db = db, batch_size = 1) # Import all JSON files in the current directory, but only the accelerometer data import(path = path, db = db, sensors = "accelerometer") # Import all JSON files in the current directory, but only the accelerometer and gyroscope data import(path = path, db = db, sensors = c("accelerometer", "gyroscope")) # Remember to close the database close_db(db) ## End(Not run)
## Not run: path <- "some/path" # Create a database db <- create_db(path = path, db_name = "my_db") # Import all JSON files in the current directory import(path = path, db = db) # Import all JSON files in the current directory, but do so sequentially import(path = path, db = db, batch_size = 1) # Import all JSON files in the current directory, but only the accelerometer data import(path = path, db = db, sensors = "accelerometer") # Import all JSON files in the current directory, but only the accelerometer and gyroscope data import(path = path, db = db, sensors = c("accelerometer", "gyroscope")) # Remember to close the database close_db(db) ## End(Not run)
Create indexes for an mpathsenser database on the participant_id, date, and a combination of these variables, for all tables in the database. This will speed up queries that use these variables in the WHERE clause.
index_db(db)
db |
A database connection to an m-Path Sense database. |
Returns TRUE
invisibly, called for side effects.
## Not run:
# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# Import some files
import(path = "path/to/jsons", db = db)

# Then index it to speed up the database
index_db(db)
## End(Not run)
Extract installed apps for one or all participants. Contrary to other get_* functions in this package, start and end dates are not used, since installed apps are assumed to be fixed throughout the study.
installed_apps(db, participant_id = NULL)
db | A database connection to an mpathsenser database. |
participant_id | A character string identifying a single participant. Use get_participants() to retrieve all participants from the database, or leave empty to get data for all participants. |
A tibble containing app names.
## Not run:
db <- open_db()

# Get installed apps for all participants
installed_apps(db)

# Get installed apps for a single participant
installed_apps(db, "12345")
## End(Not run)
A helper function for extracting the last date of entry (of one or all participants) for one sensor. Note that this function is specific to the last date of a sensor. After all, it would not make sense to extract the last date of the device info for a participant when, for example, the last accelerometer measurement occurred a day later.
last_date(db, sensor, participant_id = NULL)
db | A database connection to an m-Path Sense database. |
sensor | The name of a sensor. See sensors for a list of available sensors. |
participant_id | A character string identifying a single participant. Use get_participants() to retrieve all participants from the database, or leave empty to get data for all participants. |
A string in the format 'YYYY-mm-dd' of the last entry date.
## Not run:
db <- open_db()
last_date(db, "Accelerometer", "12345")
## End(Not run)
One of the key tasks in analysing mobile sensing data is being able to link it to other data.
For example, when analysing physical activity data, it could be of interest to know how much
time a participant spent exercising before or after an ESM beep to evaluate their stress level.
link()
allows you to map two data frames to each other that are on different time scales,
based on a pre-specified offset before and/or after. This function assumes that both x
and
y
have a column called time
containing DateTimeClasses.
link( x, y, by = NULL, time, end_time = NULL, y_time, offset_before = 0, offset_after = 0, add_before = FALSE, add_after = FALSE, name = "data", split = by )
x, y | A pair of data frames or data frame extensions (e.g. a tibble). Both x and y are assumed to have a time column containing DateTimeClasses. |
by | A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, all variables in common across x and y are used. To join by different variables on x and y, use a named vector, e.g. by = c("a" = "b") matches x$a to y$b. To join by multiple variables, use a vector with length greater than 1. To perform a cross-join (when x and y have no variables in common), use by = character(). |
time | The name of the column containing the timestamps in x. |
end_time | Optionally, the name of the column containing the end time in x, for defining variable intervals instead of fixed offsets. |
y_time | The name of the column containing the timestamps in y. |
offset_before | The time before each measurement in x that denotes the period in which y is matched, e.g. a number of seconds or a (character) period such as "30 minutes". |
offset_after | The time after each measurement in x that denotes the period in which y is matched, e.g. a number of seconds or a (character) period such as "15 minutes". |
add_before | Logical value. Do you want to add the last measurement before the start of each interval? |
add_after | Logical value. Do you want to add the first measurement after the end of each interval? |
name | The name of the column containing the nested data of y. Defaults to "data". |
split | An optional grouping variable to split the computation by. When working with large data sets, the computation can grow so large it no longer fits in your computer's working memory (after which it will probably fall back on the swap file, which is very slow). Splitting the computation trades some computational efficiency for a large decrease in RAM usage. This argument defaults to by. |
y is matched to the time scale of x by means of time windows. These time windows are defined as the period between x - offset_before and x + offset_after. Note that either offset_before or offset_after can be 0, but not both. The "interval" of the measurements is therefore the associated time window for each measurement of x and the data of y that also falls within this period. For example, an offset_before of minutes(30) means to match all data of y that occurred within 30 minutes before each measurement in x. An offset_after of 900 (i.e. 15 minutes) means to match all data of y that occurred within 15 minutes after each measurement in x. When both offset_before and offset_after are specified, all data of y is matched in an interval of 30 minutes before and 15 minutes after each measurement of x, thus combining the two arguments.
The arguments add_before and add_after let you decide whether you want to add the last measurement before the interval and/or the first measurement after the interval, respectively. This could be useful when you want to know which type of event occurred right before or after the interval of the measurement. For example, at offset_before = "30 minutes", the data may indicate that a participant was running 20 minutes before a measurement in x. However, with just that information there is no way of knowing what the participant was doing in the first 10 minutes of the interval. The same principle applies after the interval. When add_before is set to TRUE, the last measurement of y occurring before the interval of x is added to the output data as the first row, having the time of x - offset_before (i.e. the start of the interval). When add_after is set to TRUE, the first measurement of y occurring after the interval of x is added to the output data as the last row, having the time of x + offset_after (i.e. the end of the interval). This way, it is easier to calculate the difference to other measurements of y later (within the same interval). Additionally, an extra column (original_time) is added in the nested data column, which is the original time of the y measurement and NULL for every other observation. This may be useful to check whether the added measurement isn't too distant (in time) from the others. Note that multiple rows may be added if there were multiple measurements in y at exactly the same time. Also, if there already is a row with a timestamp exactly equal to the start of the interval (for add_before = TRUE) or to the end of the interval (add_after = TRUE), no extra row is added.
A tibble with the data of x
with a new column data
with the matched data of y
according to offset_before
and offset_after
.
Note that setting add_before and add_after each adds one row to each nested tibble of the data column. Thus, if you are only interested in the total count (e.g. the number of total screen changes), remember to set these arguments to FALSE or make sure to filter out the added rows (i.e. those with a non-missing original_time). Simply subtracting 1 or 2 does not work, as not all measurements in x may have a measurement in y before or after (and thus no row is added).
# Define some data
x <- data.frame(
  time = rep(seq.POSIXt(as.POSIXct("2021-11-14 13:00:00"), by = "1 hour", length.out = 3), 2),
  participant_id = c(rep("12345", 3), rep("23456", 3)),
  item_one = rep(c(40, 50, 60), 2)
)

# Define some data that we want to link to x
y <- data.frame(
  time = rep(seq.POSIXt(as.POSIXct("2021-11-14 12:50:00"), by = "5 min", length.out = 30), 2),
  participant_id = c(rep("12345", 30), rep("23456", 30)),
  x = rep(1:30, 2)
)

# Now link y within 30 minutes before each row in x
# until the measurement itself:
link(
  x = x,
  y = y,
  by = "participant_id",
  time = time,
  y_time = time,
  offset_before = "30 minutes"
)

# We can also link y to a period both before and after
# each measurement in x.
# Also note that time, end_time and y_time accept both
# quoted names as well as character names.
link(
  x = x,
  y = y,
  by = "participant_id",
  time = "time",
  y_time = "time",
  offset_before = "15 minutes",
  offset_after = "15 minutes"
)

# It can be important to also know the measurements
# just preceding the interval or just after the interval.
# This adds an extra column called 'original_time' in the
# nested data, containing the original time stamp. The
# actual timestamp is set to the start time of the interval.
link(
  x = x,
  y = y,
  by = "participant_id",
  time = time,
  y_time = time,
  offset_before = "15 minutes",
  offset_after = "15 minutes",
  add_before = TRUE,
  add_after = TRUE
)

# If the participant IDs are not important to you
# (i.e. the measurements are interchangeable),
# you can ignore them by leaving by empty.
# However, in this case we'll receive a warning
# since x and y have no other columns in common
# (except time, of course). Thus, we can perform
# a cross-join:
link(
  x = x,
  y = y,
  by = character(),
  time = time,
  y_time = time,
  offset_before = "30 minutes"
)

# Alternatively, we can specify custom intervals.
# That is, we can create variable intervals
# without using fixed offsets.
x <- data.frame(
  start_time = rep(
    x = as.POSIXct(c(
      "2021-11-14 12:40:00",
      "2021-11-14 13:30:00",
      "2021-11-14 15:00:00"
    )),
    times = 2
  ),
  end_time = rep(
    x = as.POSIXct(c(
      "2021-11-14 13:20:00",
      "2021-11-14 14:10:00",
      "2021-11-14 15:30:00"
    )),
    times = 2
  ),
  participant_id = c(rep("12345", 3), rep("23456", 3)),
  item_one = rep(c(40, 50, 60), 2)
)

link(
  x = x,
  y = y,
  by = "participant_id",
  time = start_time,
  end_time = end_time,
  y_time = time,
  add_before = TRUE,
  add_after = TRUE
)
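As a small, hedged follow-up (not part of the package's documented examples), the nested data column returned by link() can be summarised per row of x, for instance to count the matched measurements of y. This assumes the x and y data frames as first defined at the top of the examples above, before x is redefined for custom intervals.

# Count how many y measurements fall in each 30-minute window before x
linked <- link(
  x = x,
  y = y,
  by = "participant_id",
  time = time,
  y_time = time,
  offset_before = "30 minutes"
)

linked |>
  dplyr::mutate(n_matched = purrr::map_int(data, nrow))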
This function is specific to mpathsenser databases. It is a wrapper around link()
but
extracts data in the database for you. It is now soft deprecated as I feel this function's use
is limited in comparison to link()
.
link_db( db, sensor_one, sensor_two = NULL, external = NULL, external_time = "time", offset_before = 0, offset_after = 0, add_before = FALSE, add_after = FALSE, participant_id = NULL, start_date = NULL, end_date = NULL, reverse = FALSE, ignore_large = FALSE )
db |
A database connection to an m-Path Sense database. |
sensor_one |
The name of a primary sensor. See sensors for a list of available sensors. |
sensor_two |
The name of a secondary sensor. See sensors for a list of
available sensors. Cannot be used together with |
external |
Optionally, specify an external data frame. Cannot be used at the same time as a
second sensor. This data frame must have a column called |
external_time |
The name of the column containing the timestamps in |
offset_before |
The time before each measurement in |
offset_after |
The time after each measurement in |
add_before |
Logical value. Do you want to add the last measurement before the start of each interval? |
add_after |
Logical value. Do you want to add the first measurement after the end of each interval? |
participant_id |
A character string identifying a single participant. Use
|
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
reverse |
Switch |
ignore_large |
Safety override to prevent long wait times. Set to TRUE to proceed despite a large amount of data. |
A tibble with the data of sensor_one
with a new column data
with the matched data of
either sensor_two
or external
according to offset_before
or offset_after
. The other way
around when reverse = TRUE
.
## Not run:
# Open a database
db <- open_db("path/to/db")

# Link two sensors
link_db(db, "accelerometer", "gyroscope", offset_before = 300, offset_after = 300)

# Link a sensor with an external data frame
link_db(db, "accelerometer",
  external = my_external_data,
  external_time = "time",
  offset_before = 300,
  offset_after = 300
)
## End(Not run)
Gaps in mobile sensing data typically occur when the app is stopped by the operating system or
the user. While small gaps may not pose problems for analyses, larger gaps may bias or skew
your data. As a result, gaps should be taken into account so that their influence can be
inspected and limited. This function, analogous to link()
, allows you to connect gaps to other data
(usually ESM/EMA data) within a user-specified time range.
link_gaps(data, gaps, by = NULL, offset_before = 0, offset_after = 0, raw_data = FALSE)
data |
A data frame or an extension to a data frame (e.g. a tibble). While gap data can be linked to any other type of data, ESM data is most commonly used. |
gaps |
A data frame (extension) containing the gap data. See identify_gaps(). |
by |
A character vector indicating the variable(s) to match by, typically the participant
IDs. If NULL, the default, all variables with matching names across data and gaps are used. To join by different variables on data and gaps, use a named vector. For example, by = c("a" = "b") matches data$a to gaps$b. To join by multiple variables, use a vector with length greater than one. For example, by = c("a", "b") matches data$a to gaps$a and data$b to gaps$b. To perform a cross-join (when by = character()), all combinations of data and gaps are matched. |
offset_before |
The time before each measurement in data. |
offset_after |
The time after each measurement in data. |
raw_data |
Whether to include the raw data (i.e. the matched gap data) in the output as gap_data. |
The original data with an extra column duration indicating the gap duration within the
interval in seconds or, if raw_data = TRUE, an extra column called gap_data containing
the gaps within the interval. The function ensures all durations and gap time stamps are within
the range of the interval.
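As a sketch of how the duration column can be used (assuming x and gaps as defined in the
examples below, and a fixed 30-minute window after each beep), the gap duration can be turned
into an effective coverage per interval:

library(dplyr)

linked <- link_gaps(x, gaps, by = "participant_id", offset_after = 1800)

linked |>
  mutate(
    # Rows without a gap are assumed to have a duration of 0 (or NA)
    effective_seconds = 1800 - coalesce(duration, 0),
    coverage = effective_seconds / 1800
  )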
bin_data()
for linking two sets of intervals to each other; identify_gaps()
for
finding gaps in the sampling; add_gaps()
for adding gaps to sensor data.
# Create some data
x <- data.frame(
  time = rep(seq.POSIXt(as.POSIXct("2021-11-14 13:00:00"), by = "1 hour", length.out = 3), 2),
  participant_id = c(rep("12345", 3), rep("23456", 3)),
  item_one = rep(c(40, 50, 60), 2)
)

# Create some gaps
gaps <- data.frame(
  from = as.POSIXct(c("2021-11-14 13:00:00", "2021-11-14 14:00:00")),
  to = as.POSIXct(c("2021-11-14 13:30:00", "2021-11-14 14:30:00")),
  participant_id = c("12345", "23456")
)

# Link the gaps to the data
link_gaps(x, gaps, by = "participant_id", offset_before = 0, offset_after = 1800)

# Link the gaps to the data and include the raw data
link_gaps(
  x,
  gaps,
  by = "participant_id",
  offset_before = 0,
  offset_after = 1800,
  raw_data = TRUE
)
moving_average(
  db,
  sensor,
  cols,
  n,
  participant_id = NULL,
  start_date = NULL,
  end_date = NULL
)
db |
A database connection to an m-Path Sense database. |
sensor |
The name of a sensor. See sensors for a list of available sensors. |
cols |
Character vector of the columns in the sensor table to average over. |
n |
The number of seconds to average over. The index of the result will be centered compared to the rolling window of observations. |
participant_id |
A character vector identifying one or multiple participants. |
start_date |
Optional search window specifying date where to begin search. Must be convertible to date using as.Date. Use first_date to find the date of the first entry for a participant. |
end_date |
Optional search window specifying date where to end search. Must be convertible to date using as.Date. Use last_date to find the date of the last entry for a participant. |
A tibble with the same columns as the input, modified to be a moving average.
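To make the centering of n concrete, the sketch below computes a comparable centered rolling
mean on collected data using the slider package. This is not how moving_average() is
implemented internally (the computation happens in the database), and the date/time handling
is an assumption.

## Not run:
library(dplyr)
library(slider)

light <- get_data(db, "Light", participant_id = "12345") |>
  collect() |>
  mutate(datetime = as.POSIXct(paste(date, time))) |>
  arrange(datetime)

# Roughly n = 5 seconds: a window centered on each observation,
# reaching 2 seconds back and 2 seconds forward
light |>
  mutate(
    mean_lux_smooth = slide_index_dbl(
      mean_lux, datetime, mean,
      .before = lubridate::seconds(2),
      .after = lubridate::seconds(2)
    )
  )
## End(Not run)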
## Not run:
path <- system.file("testdata", "test.db", package = "mpathsenser")
db <- open_db(NULL, path)

moving_average(
  db = db,
  sensor = "Light",
  cols = c("mean_lux", "max_lux"),
  n = 5, # seconds
  participant_id = "12345"
)

close_db(db)
## End(Not run)
open_db(path = getwd(), db_name = "sense.db")
path |
The path to the database. Use NULL to use the full path name in db_name. |
db_name |
The name of the database. |
A connection to an mpathsenser database.
close_db()
for closing a database; copy_db()
for copying (part of) a database;
index_db()
for indexing a database; get_data()
for extracting data from a database.
# First create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")
close_db(db)
DBI::dbIsValid(db) # db is closed

# Then re-open it
db2 <- open_db(tempdir(), "mydb.db")
DBI::dbIsValid(db2) # db is opened

# Cleanup
close_db(db2)
file.remove(file.path(tempdir(), "mydb.db"))
Plot a coverage overview
## S3 method for class 'coverage'
plot(x, ...)
x |
A tibble with the coverage data coming from coverage(). |
... |
Other arguments passed on to methods. Not currently used. |
A ggplot2::ggplot object.
## Not run:
freq <- c(
  Accelerometer = 720, # Once per 5 seconds. Can have multiple measurements.
  AirQuality = 1,
  AppUsage = 2, # Once every 30 minutes
  Bluetooth = 60, # Once per minute. Can have multiple measurements.
  Gyroscope = 720, # Once per 5 seconds. Can have multiple measurements.
  Light = 360, # Once per 10 seconds
  Location = 60, # Once per 60 seconds
  Memory = 60, # Once per minute
  Noise = 120,
  Pedometer = 1,
  Weather = 1,
  Wifi = 60 # once per minute
)

data <- coverage(
  db = db,
  participant_id = "12345",
  sensor = c("Accelerometer", "Gyroscope"),
  frequency = freq,
  start_date = "2021-01-01",
  end_date = "2021-05-01"
)

plot(data)
## End(Not run)
A character vector containing all available sensors in this package. This variable was created to make it easier to use sensor names in your own functions, e.g. to loop over sensors.
sensors
An object of class character
of length 27.
A character vector containing all sensor names supported by mpathsenser
.
sensors
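For example, because sensors is a plain character vector, it can be used to iterate over all
sensor tables. The sketch below is illustrative and assumes an open database connection db.

## Not run:
# Count the number of rows stored per sensor for one participant
counts <- vapply(
  sensors,
  function(s) {
    get_data(db, sensor = s, participant_id = "12345") |>
      dplyr::collect() |>
      nrow()
  },
  FUN.VALUE = integer(1)
)
counts
## End(Not run)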
test_jsons(path = getwd(), files = NULL, db = NULL, recursive = TRUE)
path |
The path name of the JSON files. |
files |
Alternatively, a character vector of the input files. |
db |
An mpathsenser database connection (optional). If provided, it will be used to check which files are already in the database and test only those JSON files which are not. |
recursive |
Should the listing recurse into directories? |
A message indicating whether there were any issues and a character vector of the file names that need to be fixed. If there were no issues, an invisible empty string is returned.
This function supports parallel processing in the sense that it is able to
distribute its computation load among multiple workers. To make use of this functionality, run
future::plan("multisession")
before
calling this function.
You can be updated on the progress of this function by using the progressr package. See
progressr's vignette on how to subscribe to these updates.
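For example, parallel processing and progress reporting can be combined as follows. This is a
sketch; the number of workers is arbitrary.

## Not run:
library(future)
library(progressr)

plan("multisession", workers = 4)
handlers(global = TRUE)

test_jsons(path = "path/to/jsons", recursive = TRUE)

# Return to sequential processing when done
plan("sequential")
## End(Not run)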
## Not run:
# Test all files in a directory
test_jsons(path = "path/to/jsons", recursive = FALSE)

# Test all files in a directory and its subdirectories
test_jsons(path = "path/to/jsons", recursive = TRUE)

# Test specific files
test_jsons(files = c("file1.json", "file2.json"))

# Test files in a directory, but skip those that are already in the database
test_jsons(path = "path/to/jsons", db = db)
## End(Not run)
Similar to unzip, but makes it easier to unzip all files in a given path with one function call.
unzip_data(path = getwd(), to = NULL, overwrite = FALSE, recursive = TRUE)
path |
The path to the directory containing the zip files. |
to |
The output path. |
overwrite |
Logical value indicating whether you want to overwrite files that have already been unzipped. |
recursive |
Logical value indicating whether to unzip files in subdirectories as well. These files will then be unzipped in their respective subdirectory. |
A message indicating how many files were unzipped.
This function supports parallel processing in the sense that it is able to
distribute its computation load among multiple workers. To make use of this functionality, run
future::plan("multisession")
before
calling this function.
You can be updated on the progress of this function by using the progressr package. See
progressr's vignette on how to subscribe to these updates.
## Not run:
# Unzip all files in a directory
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", recursive = FALSE)

# Unzip all files in a directory and its subdirectories
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", recursive = TRUE)

# Unzip files in a directory, but skip those that are already unzipped
unzip_data(path = "path/to/zipfiles", to = "path/to/unzipped", overwrite = FALSE)
## End(Not run)
This is a convenience function that calls the VACUUM
command on a database. This command will
rebuild the database file, repacking it into a minimal amount of disk space.
vacuum_db(db)
db |
A database connection to an m-Path Sense database. |
A scalar numeric that specifies the number of rows affected by the vacuum.
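To see the effect on disk, the database file size can be compared before and after vacuuming.
This is a sketch; how much space is reclaimed depends on how much data was previously deleted.

## Not run:
db_path <- file.path("path", "to", "sense.db")
db <- open_db(NULL, db_path)

size_before <- file.size(db_path)
vacuum_db(db)
size_after <- file.size(db_path)

size_before - size_after # bytes reclaimed
close_db(db)
## End(Not run)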
# Create a database in a temporary directory
db <- create_db(tempdir(), "mydb.db")

# Assuming that we have imported some data into the database, we can vacuum it
vacuum_db(db)

# Cleanup
close_db(db)
file.remove(file.path(tempdir(), "mydb.db"))