More Google Location Data

So I’ve been doing some more investigating of my Google Timeline Data here and there (as I started writing about here).

After my last post, a friend of mine pointed me towards the Haversine formula for calculating the distance between two pairs of lat/long coordinates. With that, I was able to calculate distances on a day-by-day basis that were consistently close to Google’s estimates for those days. Encouraged, I then moved on to calculating distances on a per-year basis, and that was fun. I got what seemed reasonable to me:

Between 5/22/15 and 5/22/16 I went 20,505 miles

Between 9/20/2013 and 5/22/16 I went 43,434 miles
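
For anyone curious, here is a minimal sketch of a Haversine distance calculation. It is not the exact script behind the numbers above; the Earth radius constant and the helper names `haversine_miles` and `path_miles` are my own assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def path_miles(points):
    """Sum the distances between consecutive (lat, lon) points."""
    return sum(haversine_miles(*p, *q) for p, q in zip(points, points[1:]))
```

Summing `path_miles` over one day's points (sorted by timestamp) gives a per-day total, and summing the days gives a per-year total. GPS jitter while standing still will inflate the sum a little, so the result is only an estimate.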

Recall that these totals cover every sort of movement, including airplanes. My average miles/day went way up recently, due to a few big recent airplane trips. So if the numbers seem high (~56 miles a day over the last year, ~45 miles a day overall), this is why.
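
The per-day averages are just each total divided by the number of days in its date range. A quick sketch of that arithmetic, using the dates and mileages from above (`miles_per_day` is a hypothetical helper name):

```python
from datetime import date


def miles_per_day(start, end, miles):
    """Average miles per day over the span from start to end."""
    days = (end - start).days
    return miles / days


last_year = miles_per_day(date(2015, 5, 22), date(2016, 5, 22), 20505)
all_time = miles_per_day(date(2013, 9, 20), date(2016, 5, 22), 43434)
print(f"{last_year:.1f} miles/day, {all_time:.1f} miles/day")
# prints "56.0 miles/day, 44.5 miles/day"
```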

But then I wanted to visualize some of the data. I decided to use matplotlib to plot the points because it is easy to use and because Python has an easy way to load JSON data.

So I ended up breaking down the points by the assigned “activity.” True to my word, I only considered the “highest likelihood activities,” and discarded all the other less likely ones. To keep it simple.
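
Picking the most likely activity for a point can be sketched roughly like this. It assumes each record is a dict shaped like the Takeout samples from my first post, with an "activitys" list holding "activities" entries; `top_activity` is a hypothetical helper name, and ties go to whichever entry is listed first.

```python
def top_activity(record):
    """Return the highest-confidence activity type for one record, or None."""
    best = None
    for group in record.get("activitys", []):  # key spelled as in the export
        for act in group.get("activities", []):
            # Strict ">" keeps the first entry when two confidences tie.
            if best is None or act["confidence"] > best["confidence"]:
                best = act
    return best["type"] if best else None
```

Records with no activity list at all come back as `None`, which makes it easy to count how many points have an activity assigned.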

You may recall that I had a total of 1,048,575 position data points. Well, only 288,922 had activities assigned to them. So just over a quarter. Still, it is enough data to have a bit of fun with.

Of those 288,922 data points with activity, it turns out that there were only a total of 7 different activities:

still
unknown
onBicycle
onFoot
tilting
inVehicle
exitingVehicle

The first obvious thing to do was to sort by activity type and then plot the coordinates segregated by activity type.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import csv

latitudeArray = []
longitudeArray = []

with open("bicycle.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        lat, lon = float(row[0]), float(row[1]) #column 0 is latitude, column 1 is longitude
        latitudeArray.append(lat) #latitudes and longitudes go in separate lists; a shared index identifies each pair
        longitudeArray.append(lon)

m = Basemap(llcrnrlon=min(longitudeArray)-10, #Set map's displayed max/min based on your set's max/min
    llcrnrlat=min(latitudeArray)-10,
    urcrnrlon=max(longitudeArray)+10,
    urcrnrlat=max(latitudeArray)+10,
    lat_ts=20, #"latitude of true scale": the latitude at which the Mercator map scale is exact (Basemap's default for 'merc' is 0)
    resolution='h', #resolution can be set to 'c','l','i','h', or 'f' - for crude, low, intermediate, high, or full
    projection='merc',
    lon_0=longitudeArray[0],
    lat_0=latitudeArray[0])

x1, y1 = m(longitudeArray, latitudeArray) #map the lat, long arrays to x, y coordinate pairs

m.drawmapboundary(fill_color="white")
m.drawcoastlines()
m.drawcountries()

m.scatter(x1, y1, s=25, c='r', marker="o") #Plot your markers and pick their size, color, and shape

plt.title("Steen's Bicycling Coordinates")

plt.show()

At the time, I had written a script that saved each activity type into its own CSV file, because I wanted to look at them and play with them individually. I then loaded those CSV files into my plotting script, simply because they were there. If I did it again, I’d probably skip the CSV intermediary and go straight from the geoJSON file.
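
Going straight from the export could look something like the sketch below. It assumes the records sit in a list under a top-level "locations" key, which is my assumption about the file layout and not something shown in the samples. `load_records` and `coords_by_activity` are hypothetical helper names.

```python
import json
from collections import defaultdict


def load_records(path):
    """Read the export and return its list of location records."""
    with open(path) as f:
        return json.load(f)["locations"]  # assumed top-level key


def coords_by_activity(records):
    """Group (lat, lon) pairs in decimal degrees by highest-confidence activity."""
    grouped = defaultdict(list)
    for record in records:
        candidates = [act
                      for group in record.get("activitys", [])
                      for act in group.get("activities", [])]
        if not candidates:
            continue  # skip points with no activity assigned
        top = max(candidates, key=lambda act: act["confidence"])
        grouped[top["type"]].append(
            (record["latitudeE7"] / 1e7, record["longitudeE7"] / 1e7))
    return grouped

# Example use (the file name is a placeholder):
# points = coords_by_activity(load_records("LocationHistory.json"))
# lats, lons = zip(*points["onBicycle"])
```

The two tuples from `zip(*...)` can then be handed to the plotting script in place of the lists read from the CSV.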

And in this way I was able to see the different activities plotted onto Basemap’s default map thing:

Points in San Francisco from my Timeline Data that Google tagged as “onBicycle.” I can see a lot of my favorite biking routes, as well as that time I biked the Golden Gate Bridge.

Points in San Francisco from my Timeline Data that Google tagged as “onFoot.” Streets are less clear, and it sort of looks like a big blob centered around the Mission. Which seems accurate, given that I just sort of meander about. The one straight line looks like Market, which I occasionally walk the whole length of, so that makes sense.

Points in San Francisco from my Timeline Data that Google tagged as “inVehicle.” All the points in the bay are most likely from when I take the ferry to work, as a ferry is, in fact, a vehicle. And I am not surprised to see that the vast majority of the points appear to be on Van Ness.

I also plotted the other categories – including the mysterious “tilting,” but I couldn’t really discern any sort of meaning from those. They just looked like all the points from everywhere I’ve ever been, and were therefore not very meaningful. Not like the dramatic differences between the points and obvious routes for biking, on any sort of vehicle, or walking. So, there’s no need for you to see those.

I’d say this was a success. So my next task is to figure out how much time I spend doing each activity. And how much time I spend in each place. I’ll have to think a bit about how to calculate the time. All those points with no activity associated with them have me concerned that it won’t be very straightforward to just subtract the timestamps to get ΔT. It could be the case that during the “no activity” points, I had ended up doing completely different things, and then returned to the original activity (in which case the calculated ΔT would be incorrect). But, then again, it is quite likely that all those “no activity” points actually belong to the same activity if they are bookended by it. Hmm.

So now I’m wondering if, given that a time point is uploaded every 60 seconds, I should just say 1 timepoint = 60 seconds? That doesn’t seem quite right to me, but I’ll work it out for a smaller data set and see if that even comes close to accurate. I’ll keep thinking about it, but if anybody has any suggestions on how to get around this problem, feel free to let me know!
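
One possible compromise between the two ideas, sketched below: look only at the points that have an activity, and credit the gap between two consecutive ones to that activity only when both carry the same label and the gap is short. The 120-second cutoff and the name `seconds_per_activity` are assumptions for illustration, not tested values.

```python
from collections import defaultdict

MAX_GAP_S = 120  # assumed cutoff; longer gaps are not credited to any activity


def seconds_per_activity(points, max_gap_s=MAX_GAP_S):
    """Estimate seconds spent per activity.

    points: list of (timestamp_ms, activity) for labeled points only,
    sorted by timestamp.
    """
    totals = defaultdict(float)
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        gap_s = (t1 - t0) / 1000.0
        # Count the interval only if the same activity bookends a short gap.
        if a0 == a1 and 0 < gap_s <= max_gap_s:
            totals[a0] += gap_s
    return dict(totals)
```

Unlabeled points that fall inside a short gap get absorbed into the bookending activity, while long stretches without labels are simply left uncounted. Comparing the result against Google’s own per-day numbers for a few days would show whether the cutoff is sensible.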

Google Timeline Data

I really love the data collected by Google Timeline, and I have fun going through it. But some of the most obvious data aggregates are not available! You cannot get how many miles you’ve biked this year, or how many hours you spent at your office, or how many hours you’ve spent walking, etc. You can see those stats on a day-by-day basis, which I’ll admit is interesting, but I usually already have an instinctual idea of what I’ve done on a day-by-day basis. I’d like to imagine that I’d be totally surprised by what my stats would be for a full year.

So I downloaded my raw data from Google Takeout to try and play with, and it turns out to be one humongous geoJSON file. But it seems that it is laid out in the format of each unique timestamp taken, with all the associated data for that point. In most cases, you just get timestamp, latitude/longitude (E7), and accuracy.

{
 "timestampMs" : "1463961031129",
 "latitudeE7" : 377567650,
 "longitudeE7" : -1224178356,
 "accuracy" : 20
 }
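
Turning one of these records into something readable is mostly unit conversion: the E7 fields are degrees multiplied by 10^7, and timestampMs is milliseconds since the Unix epoch stored as a string. A minimal sketch (`parse_point` is a hypothetical helper name, and the time is left in UTC rather than converted to local time):

```python
from datetime import datetime, timezone


def parse_point(record):
    """Return (UTC datetime, latitude, longitude) for one location record."""
    when = datetime.fromtimestamp(int(record["timestampMs"]) / 1000,
                                  tz=timezone.utc)
    lat = record["latitudeE7"] / 1e7   # E7 means degrees * 10**7
    lon = record["longitudeE7"] / 1e7
    return when, lat, lon


when, lat, lon = parse_point({
    "timestampMs": "1463961031129",
    "latitudeE7": 377567650,
    "longitudeE7": -1224178356,
    "accuracy": 20,
})
print(when.isoformat(timespec="seconds"), lat, lon)
```

For the record above this works out to 23:50:31 UTC on 2016-05-22, at roughly 37.7568, -122.4178.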

But sometimes there’s a list of “activities,” with Google’s “confidence” that the activity listed was the one… being done. They seem to always add up to 100%, if you assume that “on foot” is the same as “walking,” but it isn’t clear to me why both get listed in those cases.

{
 "timestampMs" : "1463951764885",
 "latitudeE7" : 377617843,
 "longitudeE7" : -1224204018,
 "accuracy" : 20,
 "activitys" : [ {
 "timestampMs" : "1463951764173",
 "activities" : [ {
 "type" : "onFoot",
 "confidence" : 92
 }, {
 "type" : "walking",
 "confidence" : 92
 }, {
 "type" : "onBicycle",
 "confidence" : 8
 } ]
 } ]
 }
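
To check the add-up-to-100% observation against a record like the one above, a rough sketch follows. Treating "walking" as a duplicate of "onFoot" is my assumption, and `confidence_total` is a hypothetical helper name.

```python
ALIASES = {"walking": "onFoot"}  # assumption: "walking" duplicates "onFoot"


def confidence_total(record):
    """Sum the confidences of one record, counting aliased labels once."""
    best = {}
    for group in record.get("activitys", []):
        for act in group.get("activities", []):
            label = ALIASES.get(act["type"], act["type"])
            best[label] = max(best.get(label, 0), act["confidence"])
    return sum(best.values())
```

For the onFoot/walking/onBicycle record above this gives 92 + 8 = 100. The inVehicle sample further down comes to 108 under the same rule, so the pattern may not hold for every record.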

This much is pretty obvious. But then you get activities like “tilting” and other weird things.

Sometimes you get an associated velocity, altitude, and heading. But not usually. Note: I haven’t figured out yet whether the velocity is in kmph or mph.

{
 "timestampMs" : "1463268526270",
 "latitudeE7" : 380429730,
 "longitudeE7" : -1227851373,
 "accuracy" : 10,
 "velocity" : 20,
 "heading" : 239,
 "altitude" : 21,
 "activitys" : [ {
 "timestampMs" : "1463268515541",
 "activities" : [ {
 "type" : "inVehicle",
 "confidence" : 100
 }, {
 "type" : "still",
 "confidence" : 8
 } ],
 "extras" : [ {
 "type" : "value",
 "name" : "vehicle_personal_confidence",
 "intVal" : 100
 } ]
 } ]
 }

So, obviously I’d have to do some calculations based on the latitude and longitude to get data on distances. Which I haven’t done. So I guess I’ll probably just consider the timestamps that have types associated, and for simplicity’s sake I’ll most likely end up only considering the one with the highest “confidence.” With that method, it would be pretty easy to calculate total times. Distances… not as easy, mostly because I haven’t worked with latitudes and longitudes very much. But I’m sure there’s some module out there or something for handling that, so I probably won’t have to learn too much about it 😉

All I’ve done with my data thus far was convert all the timepoints with associated lat/long into human dates with lat/long that are not… E7. So now I have 1,048,575 lines formatted like this

2016-05-22 T14:58:35-07:00    37.7570186, -122.4204018

All the way back to

2013-09-20 T08:18:54-07:00    37.7650828, -122.4172515

Which is the first timepoint I have in my Google Location History.

This format makes it super easy for me to quickly generate maps like

5/22/2016
5/15/2015

Or even overlays of multiple days, etc. Which, I realize, is… baaaasically exactly what I can already get from Google Location History through my Timeline.

But! My next step is to calculate how many hours per year were spent doing each activity (walking/biking/driving/etc). There will obviously be far fewer than 1,048,575 timepoints for which there is activity data associated, because most timepoints don’t have any activity.

I haven’t started thinking about the mileage counter yet, so if anybody has any suggestions for handling latitude/longitude points and calculating mileage from those, I guess feel free to let me know. Otherwise, I’ll just let you all know what stats I come up with.

Update 6/5/2016:

My ongoing saga of Google Location History data continues here

West Marin Weekend

This weekend, Doc and I spent almost the full weekend in West Marin.

We went to a dinner event at Heiðrún Meadery featuring the food of Fine & Rare paired with Heiðrún meads. It was pretty magical having such a meal in Heiðrún’s gardens, surrounded by bees floating around and flowers everywhere. At night we even got to roast marshmallows over a fire!

Cutie pie little bee gathering nectar
A little bee gathers nectar from a cornflower in the garden

Flowers!
More lovely flowers on Heiðrún’s grounds

Sipping mead in the top secret tree

Hammock

Beautiful California native wildflower garden

Dinner at Heiðrún Meadery

Roasting marshmallows at Heiðrún Meadery

Since we were anticipating drinking a lot (it was a mead event, after all), we had also booked a room at Ligonberry Farm, where they raise sheep and alpacas for wool and fiber. This way we didn’t have to drive back in such a state. The stay was also quite relaxing and magical.

Comics by the fire at Ligonberry Farm

Staying at Ligonberry Farm

Alpacas

Then on Sunday, we did a short ~5 mile hike in Point Reyes National Seashore. And we even got back to The City in time for Doc to give his photography talk!

Mt Wittenberg Trail

The View from Mt Wittenberg

Spiky Things

Mount Wittenberg Summit Marker