Sisifo - Wearable physical activity

Introduction & purpose

After a couple of posts on the Android app for upstreaming data from a wearable with an accelerometer, we finally get to analyse the data. The very first objective is relatively simple: measuring the overall physical activity of the user. In fact, the simplest statistical measures already say a lot, in particular the standard deviation of the Signal Vector Magnitude.

The post includes code in R, and also a dataset with the accelerometer data, so that you can follow along as a kind of tutorial.

Dataset used

You can find the dataset in this R data file. Just download it and load it into your R session.
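A minimal sketch of the loading step, assuming the file was saved locally as post_data.RData (a hypothetical name, use whatever name the download has) and that it contains the post_data data frame used in the rest of the post:

```r
# Hypothetical local file name: replace with the path of the downloaded file
data_file <- "post_data.RData"

if (file.exists(data_file)) {
  # load() restores the saved objects and returns their names
  loaded_objects <- load(data_file)
  print(loaded_objects)
} else {
  message("File not found: ", data_file)
}
```

Printing the names returned by load() is a quick way to confirm which objects the file actually brought into the session.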

The dataset was generated with the Android app covered in previous posts (see this and that), but you don’t need to read them for context, since they cover the app itself.

References

There’s a lot of information on accelerometers on the web, but in particular this Stack Exchange question could be useful for understanding the range of values that such a device provides, and how normalization is usually done. For a more serious/academic source on calibration, take e.g. this paper.

A good summary on the basic measures of physical activity may be found here.

Measuring physical activity

Let’s first take a look at the dataset. It corresponds to 486 minutes of accelerometer data, i.e. around 8 hours of data. Some basic statistics:

nrow(post_data)

## [1] 364500

summary(post_data[, c('x', 'y', 'z')])

##        x                 y                  z
##  Min.   :-2.2580   Min.   :-2.48500   Min.   :-2.2660
##  1st Qu.:-0.4850   1st Qu.:-0.47700   1st Qu.:-0.6840
##  Median :-0.1250   Median :-0.20400   Median :-0.5160
##  Mean   :-0.1703   Mean   :-0.04411   Mean   :-0.2351
##  3rd Qu.: 0.1480   3rd Qu.: 0.60100   3rd Qu.: 0.0500
##  Max.   : 1.6050   Max.   : 1.47600   Max.   : 1.5110

There are around 365 thousand rows, each composed of an id (actually two fields, header_id and id; see below for details on the data model) and the 3 columns coming from the accelerometer, for the coordinates x, y and z.
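As a quick sanity check on these figures, the row count can be related to the duration of the experiment. The sketch below assumes 750 samples per minute, which is the per-minute window size used for the statistics in this post; the variable names are only illustrative.

```r
n_samples <- 364500          # rows reported by nrow(post_data)
samples_per_minute <- 750    # per-minute window size used in this post

minutes <- n_samples / samples_per_minute      # duration in minutes
hours <- minutes / 60                          # duration in hours
sampling_rate_hz <- samples_per_minute / 60    # samples per second

c(minutes = minutes, hours = hours, sampling_rate_hz = sampling_rate_hz)
```

This gives 486 minutes, i.e. about 8.1 hours, at a sampling rate of 12.5 Hz.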

The first operation should be to calibrate the accelerometer so that the data can be normalized, but in this case the device already provides a normalized output, as will become clear later. The second is to calculate the modulus (length) of the 3D vector and plot the results.

post_data$mod <- sqrt(post_data$x^2 + post_data$y^2 + post_data$z^2)

Note that, to avoid a crude plot with some 365 thousand values, the plot below reduces the data manually: it draws the maximum and the minimum of each block of 100 samples, and is only meant as a quick visualization of the data. An alternative would be to use a smoothScatter plot.

library(zoo)  # provides rollapply()
# upper envelope: maximum of each block of 100 samples
plot(rollapply(post_data$mod, width=100, FUN=max, by=100), xlab="Samples / 100", ylab="Modulus",
     main="Overall crude picture", type="l", ylim=c(min(post_data$mod), max(post_data$mod)))
# lower envelope: minimum of each block of 100 samples
lines(rollapply(post_data$mod, width=100, FUN=min, by=100), type="l")
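As for the smoothScatter alternative mentioned above, here is a minimal sketch. It uses synthetic data as a stand-in for post_data$mod, so the shape of the signal is an assumption and not the real dataset. smoothScatter() comes with base R graphics and relies on the KernSmooth package, which ships with standard R installations.

```r
# Synthetic stand-in for post_data$mod: values around 1G, noisier first half
set.seed(1)
n <- 20000
mod <- c(rnorm(n / 2, mean = 1, sd = 0.10),   # "active" part
         rnorm(n / 2, mean = 1, sd = 0.01))   # "still" part

# smoothScatter() draws a 2D density estimate instead of individual points
smoothScatter(seq_along(mod), mod, xlab = "Sample", ylab = "Modulus",
              main = "Density view of the modulus")
```

With the real data the call would be smoothScatter(seq_along(post_data$mod), post_data$mod), with no manual reduction needed.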

A few details on what the “subject” who was wearing the wristband was doing during the 8 hours in which the data was captured:

1st he arrived home and set up the experiment
2nd he attended quite a long phone call
then he prepared dinner and did some stuff at home (laundry, washing dishes, etc)
he spent some time in front of the computer and then in bed reading
and finally he went to sleep

The last 4 hours correspond to the sleep time, although the “subject” had to interrupt his sleep to go to the bathroom a couple of times, and in general was somewhat worried about the experiment.

From the data (even before the visualization) it is easy to see that the mean of the modulus is very close to 1. Let’s try to interpret that value: the typical state of the “subject”, in particular during the last 4 hours, was “being still”; and, for an accelerometer, being still corresponds to an acceleration of 1G, since the device only measures gravity when the subject is not moving. The objective of the normalization is precisely to scale the output so that the modulus reads 1 when the device is still and thus measuring 1G, so this step is already done.
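To illustrate what that normalization would look like for a device that does not do it, here is a minimal sketch on synthetic data. It assumes raw readings in m/s^2 and uses a single scale factor estimated while the device is still; this is the simplest possible form and not the calibration procedure of any particular device, and all names and values are illustrative.

```r
set.seed(42)
g_raw <- 9.81   # assumed raw reading for 1G, in m/s^2 (illustrative)

# Synthetic raw samples of a device lying still: gravity on the z axis plus noise
raw <- data.frame(x = rnorm(1000, 0, 0.05),
                  y = rnorm(1000, 0, 0.05),
                  z = rnorm(1000, -g_raw, 0.05))

raw_mod <- sqrt(raw$x^2 + raw$y^2 + raw$z^2)

# Scale factor: the mean modulus while still is taken as 1G
scale_1g <- mean(raw_mod)

normalized <- raw / scale_1g
norm_mod <- sqrt(normalized$x^2 + normalized$y^2 + normalized$z^2)

mean(norm_mod)   # 1 by construction, up to floating point error
```

After the scaling, the modulus of the still device averages 1, which is the situation the dataset of this post is already in.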

Another important point to keep in mind is the meaning of a modulus of 0: it corresponds to 0G, zero gravity, or free fall.

The most straightforward way to measure physical activity is the SVM (acceleration Signal Vector Magnitude), which here is just the average of the modulus. Below is the calculation in R, using a non-overlapping window of one minute of data (750 samples correspond to 1 minute, i.e. a sampling rate of 12.5 Hz).

post_svm <- rollapply(post_data$mod, width=750, FUN=mean, by=750)
summary(post_data$mod)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.05152 0.98970 0.99550 0.99840 1.00600 3.29900

summary(post_svm)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.9709  0.9923  0.9965  0.9984  1.0060  1.0350

plot(post_svm, xlab="Minutes", ylab="SVM", main="Mean of physical activity")

Note from this plot that:

the x axis now shows minutes
thus the SVM values correspond to the average activity per minute
due to the averaging, all values are now around 1 (or 1G); note that the y scale is narrow!
it is now a bit clearer that during the second part of the experiment the subject was mostly still (actually in bed)

However, even if the SVM gives a measure of the average physical activity, one could wonder why the average value in minute 320 is so different from that in minute 360, when the subject was sleeping in both, and why neither value differs much from the values of the first part of the experiment, in which the subject was active.

Also note that, after the averaging, the details of the movements are now lost completely, and it would be impossible to know whether the subject is walking or typing or anything else.
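The limitation just described can be reproduced with synthetic data. The sketch below assumes normally distributed noise around 1G, which is a simplification of real accelerometer data, and uses only base R. It builds 5 "active" minutes and 5 "still" minutes and compares the per-minute mean with the per-minute spread.

```r
set.seed(123)
window <- 750   # samples per minute, as in the post

# Synthetic modulus: 5 "active" minutes followed by 5 "still" minutes,
# both centred on 1G but with very different spread
mod <- c(rnorm(5 * window, mean = 1, sd = 0.100),
         rnorm(5 * window, mean = 1, sd = 0.002))

# One column per minute, so column statistics are per-minute statistics
per_minute <- matrix(mod, nrow = window)

minute_mean <- colMeans(per_minute)
minute_sd <- apply(per_minute, 2, sd)

round(minute_mean, 3)   # all close to 1, active or not
round(minute_sd, 3)     # clearly larger for the first 5 minutes
```

The means are nearly indistinguishable between the two halves, while the spread separates them at a glance.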

To distinguish the periods with different levels of activity, it makes much more sense to use the standard deviation of the physical activity, as follows.

post_svm_sd <- rollapply(post_data$mod, width=750, FUN=sd, by=750)
summary(post_svm_sd)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## 0.0008257 0.0016800 0.0034390 0.0204200 0.0222500 0.1764000

plot(post_svm_sd, xlab="Minutes", ylab="Standard deviation of physical activity", main="SD of physical activity")

In this case it is easy to distinguish the different periods of activity. Let’s add some labels to it.

plot(post_svm_sd, xlab="Minutes", ylab="Standard deviation of physical activity", main="SD of physical activity -labeled")
text(5, 0.10, "just home", col="red", srt=90)
text(28, 0.06, "phone call", col="red", srt=90)
text(100, 0.10, "dinner, dishes, laundry", col="red", srt=90)
text(200, 0.03, "computer & reading", col="red")
text(320, 0.02, "in bed...", col="red")
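The labels above were placed by hand. As a rough sketch of how active and still minutes could be separated automatically, one could threshold the per-minute standard deviation. The values below are synthetic stand-ins for post_svm_sd, and the cut-off of 0.01 is an assumption that would need tuning on real data.

```r
# Synthetic per-minute standard deviations standing in for post_svm_sd
sd_per_minute <- c(0.080, 0.120, 0.030, 0.002, 0.001, 0.050, 0.003)

# Assumed cut-off between "still" and "active"; it needs tuning on real data
threshold <- 0.01

activity <- ifelse(sd_per_minute > threshold, "active", "still")
table(activity)

# Runs of consecutive minutes with the same label
runs <- rle(activity)
data.frame(label = runs$values, minutes = runs$lengths)
```

With the real data, sd_per_minute would simply be replaced by post_svm_sd. The run lengths then give the duration of each stretch of activity or rest.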

Conclusion

This post hopefully illustrates that a few simple measures on the accelerometer data may give quite some info on the activity level of the subject. This may not be useful to detect what the user is really doing, but it can give a hint of the type of activity. Quite a few commercial wearables seem to be based mostly on simple measures like these.

On the other hand, the standard deviation of the physical activity may lead to interesting insights (and it is one of the targets of our future wearable lab). It may give an idea of how the activity is being done, i.e. how the movements are actually performed. For example, it is probably able to distinguish between Karate and Tai-chi. And the way the movements are done may give an indication of the psychological status of the subject as well. Depending on the results from the lab, we might talk further about it in future posts.

Getting the data from PostgreSQL

The dataset was generated with the Android app covered in previous posts (see this and that), which stores it in a PostgreSQL database. Just for completeness, find below the R code to load it from the database, using the RPostgreSQL package.

library(RPostgreSQL)

# connection (adjust database, credentials, host and port to your setup)
drv <- dbDriver("PostgreSQL")
#?dbConnect
con <- dbConnect(drv, dbname="fraa", user="fraa", password="<password here>",
                 host="localhost", port="15432")

# optional: inspect the driver and the connection
dbListConnections(drv)
dbGetInfo(drv)
summary(con)

# reference data for each experiment
rs <- dbSendQuery(con, "select * from header")
headers <- data.frame(fetch(rs, n=-1))
dbClearResult(rs)

# accelerometer samples for all experiments
rs <- dbSendQuery(con, "select * from acc_data")
acc_data <- data.frame(fetch(rs, n=-1))
dbClearResult(rs)

# keep only the headers that belong to the experiment of this post
post_data <- acc_data[acc_data$header_id >= 312 & acc_data$header_id <= 316,]

# close the connection when done
dbDisconnect(con)

The table header contains reference data for each of the experiments, and the data from the accelerometer linked to each of the headers goes into the table acc_data.
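Since only a few headers are needed, the filter could also be pushed into the SQL query instead of loading the whole acc_data table. This is a sketch under that assumption, not the way the dataset of this post was produced; the variable names are illustrative.

```r
# Range of header ids that make up the experiment (same values as above)
first_header <- 312L
last_header <- 316L

# BETWEEN is inclusive on both ends, matching the >= and <= filter in R
query <- sprintf(
  "select * from acc_data where header_id between %d and %d",
  first_header, last_header)
print(query)

# With an open connection `con` as in the code above:
# post_data <- dbGetQuery(con, query)
```

dbGetQuery() sends the query, fetches all rows and clears the result in one call, so it can replace the dbSendQuery / fetch / dbClearResult sequence for simple reads.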
