Introduction & purpose

After a couple of posts on the Android app for upstreaming data from a wearable with an accelerometer, we finally get to analyse the data. The very first objective is relatively simple: measuring the overall physical activity of the user. In fact, the simplest statistical measures already say a lot, in particular the standard deviation of the Signal Vector Magnitude.

The post includes R code and a dataset of accelerometer data, so that you can follow along as a kind of tutorial.

Dataset used

You can find the dataset in this R data file. Just download it and load it into your R session.

The dataset was generated with the Android app covered in previous posts (see this and that), but you don’t need to read them for context, since they cover the app itself.

References

There’s a lot of information on accelerometers on the web, but in particular this SE question could be useful for understanding the range of values that such a device provides, and how normalization is usually done. For a more serious/academic source on calibration, see e.g. this paper.

A good summary on the basic measures of physical activity may be found here.

Measuring physical activity

Let’s first take a look at the dataset. It corresponds to 486 minutes of accelerometer data, i.e. around 8 hours of data. Some basic statistics:

nrow(post_data)
## [1] 364500
summary(post_data[, c('x', 'y', 'z')])
##        x                 y                  z
##  Min.   :-2.2580   Min.   :-2.48500   Min.   :-2.2660
##  1st Qu.:-0.4850   1st Qu.:-0.47700   1st Qu.:-0.6840
##  Median :-0.1250   Median :-0.20400   Median :-0.5160
##  Mean   :-0.1703   Mean   :-0.04411   Mean   :-0.2351
##  3rd Qu.: 0.1480   3rd Qu.: 0.60100   3rd Qu.: 0.0500
##  Max.   : 1.6050   Max.   : 1.47600   Max.   : 1.5110

There are around 365 thousand rows, each composed of an id (actually two fields, header_id and id, see below for details on the data model) and the 3 columns coming from the accelerometer, for coordinates x, y and z.

The first operation should be to calibrate the accelerometer so that the data can be normalized, but in this case the device already provides a normalized output, as will become clear later. The second is to calculate the modulus (length) of the 3D vector and plot the results.

post_data$mod <- sqrt(post_data$x^2 + post_data$y^2 + post_data$z^2)
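As a quick sanity check of the formula, here is a minimal sketch on made-up data. The `still` data frame and its values are assumptions for illustration, not part of the dataset: they mimic a normalized sensor at rest in three different orientations, where the modulus should come out close to 1 whatever the orientation.

```r
# Hypothetical readings (in g) from a perfectly still, normalized sensor
# held in three different orientations; the values are invented.
still <- data.frame(
  x = c(0, 1, 0.6),
  y = c(0, 0, 0.0),
  z = c(-1, 0, 0.8)
)

# Same formula as above: Euclidean length of each (x, y, z) row.
still$mod <- sqrt(still$x^2 + still$y^2 + still$z^2)
print(still$mod)
```

The orientation changes how gravity is split across the three axes, but not the length of the resulting vector.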

Note that, in order to avoid a crude plot with around 365 thousand values, the plot below reduces the data manually, drawing only the maximum and minimum of each block of 100 samples; it is only useful for a quick visualization of the data. An alternative would be to use a smoothScatter plot.

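library(zoo)
# Upper envelope: maximum of each non-overlapping block of 100 samples
plot(rollapply(post_data$mod, width=100, FUN=max, by=100), xlab="Samples / 100", ylab="Modulus",
     main="Overall crude picture", type="l", ylim=c(min(post_data$mod), max(post_data$mod)))
# Lower envelope: minimum of each block, drawn on the same axes
lines(rollapply(post_data$mod, width=100, FUN=min, by=100), type="l")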
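For readers who prefer not to depend on zoo, the same block-wise maximum and minimum can be sketched in base R by reshaping the vector into a matrix with one column per block. The `mod` vector here is a synthetic stand-in for `post_data$mod` (an assumption), and the names `block`, `env_max` and `env_min` are only illustrative.

```r
# Synthetic stand-in for post_data$mod: 1 g plus small noise (assumption).
set.seed(1)
mod <- 1 + rnorm(1050, sd = 0.05)

block <- 100                         # samples per block, as in the plot above
n_blocks <- length(mod) %/% block    # an incomplete trailing block is dropped
m <- matrix(mod[seq_len(n_blocks * block)], nrow = block)

# One column per block, so column-wise max/min give the envelope.
env_max <- apply(m, 2, max)
env_min <- apply(m, 2, min)

length(env_max)  # one value per complete block
```

With non-overlapping windows (`by` equal to `width`) this should give the same values as the `rollapply` calls, which also skip an incomplete last window by default.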

A few details on what the “subject” who was wearing the wristband was doing during the 8 hours in which the data was captured:

  • 1st he arrived home and set up the experiment

  • 2nd he attended quite a long phone call

  • then he prepared dinner and did some stuff at home (laundry, washing dishes, etc.)

  • he spent some time in front of the computer and then in bed reading

  • and finally he went to sleep

The last 4 hours correspond to the sleep time, although the “subject” had to interrupt his sleep to go to the bathroom a couple of times, and in general was somewhat worried about the experiment.

From the data (and even before the visualization), it’s easy to see that the mean of the modulus is approximately equal to 1. Let’s try to interpret that value: the typical activity of the “subject”, in particular during the last 4 hours, was “being still”; and in the context of an accelerometer, being still corresponds to a 1G acceleration, since the accelerometer just measures gravity when the subject is still. The objective of the normalization is precisely to map to 1 whatever value the accelerometer outputs when the device is still and thus measuring 1G, so this step is already done.

Another important point to keep in mind is the meaning of a modulus of 0: it corresponds to 0G, zero gravity, or free fall.
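To make the idea of normalization concrete, here is a minimal sketch for a device that does not normalize its output. Everything in it is hypothetical: the `raw` readings are invented numbers in arbitrary device units, and the single scale factor `one_g` is the simplest possible approach. A real calibration would also estimate per-axis offsets and gains, which this sketch ignores.

```r
# Hypothetical raw readings in arbitrary device units, taken while the
# device lies still; the numbers are invented for illustration.
raw <- data.frame(
  x = c(12, 10, 11, 13),
  y = c(-5, -4, -6, -5),
  z = c(255, 257, 256, 254)
)

raw_mod <- sqrt(raw$x^2 + raw$y^2 + raw$z^2)

# Scale factor: the mean modulus at rest is taken to be 1 g.
one_g <- mean(raw_mod)

# Dividing every axis by the scale factor expresses the readings in g.
normalized <- raw / one_g
norm_mod <- sqrt(normalized$x^2 + normalized$y^2 + normalized$z^2)
mean(norm_mod)  # 1 up to rounding, by construction
```

The dataset used in this post needs no such step, since its mean modulus is already about 1.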

The straightforward way to measure physical activity is called SVM (acceleration Signal Vector Magnitude), which is just an average of the modulus. Below is the calculation in R, using a window of one minute of data (750 samples correspond to 1 minute) that advances one minute at a time.

post_svm <- rollapply(post_data$mod, width=750, FUN=mean, by=750)
summary(post_data$mod)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.05152 0.98970 0.99550 0.99840 1.00600 3.29900
summary(post_svm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.9709  0.9923  0.9965  0.9984  1.0060  1.0350
plot(post_svm, xlab="Minutes", ylab="SVM", main="Mean of physical activity")
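The window size can be checked against the figures quoted earlier, and the per-minute mean can also be sketched in base R with `tapply`. As before, `mod` is a synthetic stand-in for `post_data$mod` (an assumption), and `minute` and `svm_by_minute` are illustrative names.

```r
n_rows <- 364500          # nrow(post_data), from the output above
per_minute <- 750         # samples per minute, as stated in the post

n_rows / per_minute       # 486 minutes
per_minute / 60           # 12.5 samples per second

# Synthetic stand-in for post_data$mod (assumption): 3 minutes of data.
set.seed(42)
mod <- 1 + rnorm(3 * per_minute, sd = 0.02)

# Label each sample with its minute, then average within each minute.
minute <- (seq_along(mod) - 1) %/% per_minute + 1
svm_by_minute <- tapply(mod, minute, mean)
```

One difference to keep in mind: `tapply` would also average an incomplete final minute, whereas `rollapply` with its default settings drops it. With 364500 rows the data splits into exactly 486 full minutes, so both approaches should agree here.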