Google Location History Data Part IV

I’ve returned to my Google Location History data (the previous installment is here) to implement something I had been thinking about for a while: a maximum time threshold for how far apart (temporally) two coordinate pairs of the same activity type can be and still count as part of the “same trip.” Choosing this threshold turned out to be less obvious than I originally thought, but I still think I got something meaningful out of the exercise. Two points 10 minutes apart are almost certainly part of the same trip, but what about two points an hour apart? In many cases, likely not.

Recall that of the ~1,000,000 data points (covering about 3 years), only 288,922 had any activities associated with them. Three years is about 94.6 million seconds. Across all ~1,000,000 points, that works out to one point every ~95 seconds on average. For the 288,922 activity-associated points, it is one point every ~327 seconds (about 5.5 minutes). So the threshold will probably have to be at least ~327 seconds, and most likely higher.
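Here is a quick Ruby check of the average spacing. The `avg_spacing` helper is hypothetical, and it assumes 365-day years and points spread evenly over the period, which real location logs certainly are not. Dividing a single year's worth of seconds (31,536,000) by the point counts gives roughly 31.5 and 109 seconds. Spreading the same points over three years gives roughly 95 and 327 seconds.

```ruby
# Average seconds between points, assuming (hypothetically) that the points
# are spread evenly over the whole period. Uses 365-day years, no leap days.
SECONDS_PER_YEAR = 365 * 86_400 # 31,536,000 seconds

def avg_spacing(point_count, years)
  (years * SECONDS_PER_YEAR).to_f / point_count
end

all_points      = 1_000_000
activity_points = 288_922

[1, 3].each do |years|
  puts "#{years} year(s): one point every #{avg_spacing(all_points, years).round(1)} s overall, " \
       "every #{avg_spacing(activity_points, years).round(1)} s with an activity"
end
```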

So I futzed around with the threshold a lot to see how it changed the results. I think 15-20 minutes is a pretty good threshold that doesn’t let in a ton of noise. Even so, the average miles/day for all the activities still seems pretty low at that threshold. The true value might lie somewhere between 20 minutes and 1 hour. Or maybe I just don’t go as far as I imagine I do!
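Rather than editing `threshold` and rerunning by hand, you can run the same gap rule at several thresholds in one go. This is a minimal sketch under some assumptions: `trip_totals` is a hypothetical helper, timestamps are plain seconds instead of `Time` objects, and the four-point track is made up purely for illustration.

```ruby
# Great-circle distance in meters between two [lat, long] pairs in degrees
# (same Haversine math as the haversine method in the script below, written compactly).
def haversine(p1, p2)
  to_rad = Math::PI / 180
  dphi = (p2[0] - p1[0]) * to_rad
  dlambda = (p2[1] - p1[1]) * to_rad
  a = Math.sin(dphi / 2)**2 +
      Math.cos(p1[0] * to_rad) * Math.cos(p2[0] * to_rad) * Math.sin(dlambda / 2)**2
  2 * 6_371_000 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
end

# Hypothetical helper: sums distance (m) and time (s) over consecutive pairs
# whose time gap is within the threshold; longer gaps are skipped entirely.
def trip_totals(dated_coords, threshold)
  distance = 0.0
  total_time = 0.0
  dated_coords.each_cons(2) do |(t1, c1), (t2, c2)|
    gap = (t2 - t1).abs
    next if gap > threshold
    distance += haversine(c1, c2)
    total_time += gap
  end
  [distance, total_time]
end

# Made-up track: three points 5 minutes apart, then a 50-minute gap.
track = [
  [0,    [40.000, -105.000]],
  [300,  [40.005, -105.000]],
  [600,  [40.010, -105.000]],
  [3600, [40.050, -105.000]]
]

[600, 1000, 1200, 3600].each do |threshold|
  distance, total_time = trip_totals(track, threshold)
  puts "#{threshold} s: #{(distance / 1000).round(2)} km over #{(total_time / 60).round} min"
end
```

On real data, one possible heuristic (untested here) is to look for the range where the totals stop changing much as the threshold grows. Big jumps at large thresholds would suggest that separate trips are being stitched together.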

I wrote the threshold decider in Ruby because I had originally written the “sort-by-activities” script in Ruby (this was before I knew I was going to use Python for Basemap, etc.). I had also already written the Haversine script in Ruby and wanted to reuse it. Oh, well. Maybe one day I will port everything to Python, for consistency’s sake.
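For reference, this is the Haversine formula that the Ruby `haversine` method below implements. Latitudes φ and longitudes λ are converted from degrees to radians, Δφ = φ2 − φ1, Δλ = λ2 − λ1, and r = 6,371,000 m is the mean Earth radius used in the code:

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right) \\
d &= r\,c
\end{aligned}
```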

#Uses the Haversine formula to calculate the great-circle distance (in meters) between two [lat, long] pairs given in degrees
def haversine(old_lats_longs, new_lats_longs)
  lat1, lon1 = old_lats_longs
  lat2, lon2 = new_lats_longs

  r = 6371000 #Mean radius of the Earth, in meters
  phi1 = (lat1 * Math::PI) / 180
  phi2 = (lat2 * Math::PI) / 180
  delta_phi = ((lat2 - lat1) * Math::PI) / 180
  delta_lambda = ((lon2 - lon1) * Math::PI) / 180

  a = Math.sin(delta_phi / 2)**2 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(delta_lambda / 2)**2
  c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

  r * c #Distance in meters
end

#Decides whether to count a dated coordinate as part of the same trip or not, based on the time threshold
def threshold_decider(dated_coords) #dated_coords is an array of coordinate pairs with their time, format: [time,[lat,long]]
  threshold = 1000 #Maximum gap (in seconds, ~16.7 minutes) between consecutive points in the same trip
  distance = 0
  total_time = 0
  #.abs so this works whether the (sorted) data runs newest-first or oldest-first
  time_period = (dated_coords.first[0] - dated_coords.last[0]).abs

  previous_time = dated_coords.first[0]
  previous_lats_longs = dated_coords.first[1]

  dated_coords.each do |dated_coord|
    gap = (previous_time - dated_coord[0]).abs
    #Only count the distance and time if this point is close enough in time to the previous one
    if gap <= threshold
      distance += haversine(previous_lats_longs, dated_coord[1])
      total_time += gap
    end
    previous_time = dated_coord[0]
    previous_lats_longs = dated_coord[1]
  end

  [time_period, total_time, distance] #Return the totals so the caller can display them
end

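This is the meat of the threshold decider, and it’s pretty simple. It’s almost identical to the old Haversine distance calculator I wrote to get aggregate distances. The difference is that a pair of consecutive points only adds to the distance and time totals if the two points fall within the temporal threshold of each other. A longer gap is treated as a break between trips and skipped entirely.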
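The same gap rule could also split the stream into explicit trips, instead of just skipping long gaps. This is a minimal sketch, not what my script does. The `split_into_trips` name is hypothetical, the data is made up, and it assumes the points are already sorted by time (in either direction).

```ruby
# Hypothetical helper: starts a new trip whenever two consecutive points are
# more than `threshold` seconds apart. Assumes dated_coords is sorted by time
# and uses the [time, [lat, long]] format from the script.
def split_into_trips(dated_coords, threshold)
  dated_coords.slice_when { |a, b| (b[0] - a[0]).abs > threshold }.to_a
end

# Made-up data, timestamps in plain seconds: two points 5 minutes apart,
# a 1-hour gap, then two more points 5 minutes apart.
points = [
  [0,    [40.000, -105.000]],
  [300,  [40.005, -105.000]],
  [3900, [40.100, -105.000]],
  [4200, [40.105, -105.000]]
]

trips = split_into_trips(points, 1000)
puts "#{trips.size} trips"
trips.each_with_index do |trip, i|
  duration = (trip.last[0] - trip.first[0]).abs
  puts "Trip #{i + 1}: #{trip.size} points over #{duration} s"
end
```

With trips as explicit arrays, you could inspect per-trip distances and durations individually to see which gaps a given threshold is cutting.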

Here it is in the context I’m currently using it, along with the file reader and the human-friendly display function (et al.) that quickly print some of the stats I’m interested in:

require 'time'

#Uses the Haversine formula to calculate the great-circle distance (in meters) between two [lat, long] pairs given in degrees
def haversine(old_lats_longs, new_lats_longs)
  lat1, lon1 = old_lats_longs
  lat2, lon2 = new_lats_longs

  r = 6371000 #Mean radius of the Earth, in meters
  phi1 = (lat1 * Math::PI) / 180
  phi2 = (lat2 * Math::PI) / 180
  delta_phi = ((lat2 - lat1) * Math::PI) / 180
  delta_lambda = ((lon2 - lon1) * Math::PI) / 180

  a = Math.sin(delta_phi / 2)**2 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(delta_lambda / 2)**2
  c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

  r * c #Distance in meters
end

#Decides whether to count a dated coordinate as part of the same trip or not, based on the time threshold
def threshold_decider(dated_coords) #dated_coords is an array of coordinate pairs with their time, format: [time,[lat,long]]
  threshold = 1000 #Maximum gap (in seconds, ~16.7 minutes) between consecutive points in the same trip
  distance = 0
  total_time = 0
  #.abs so this works whether the (sorted) data runs newest-first or oldest-first
  time_period = (dated_coords.first[0] - dated_coords.last[0]).abs

  previous_time = dated_coords.first[0]
  previous_lats_longs = dated_coords.first[1]

  dated_coords.each do |dated_coord|
    gap = (previous_time - dated_coord[0]).abs
    #Only count the distance and time if this point is close enough in time to the previous one
    if gap <= threshold
      distance += haversine(previous_lats_longs, dated_coord[1])
      total_time += gap
    end
    previous_time = dated_coord[0]
    previous_lats_longs = dated_coord[1]
  end

  display(time_period, total_time, distance)
end

#Reads the file of sorted Google Location History data for one activity
#Each line is tab-separated: timestamp, latitude, longitude
def reader(filename = "inVehicle.txt")
  dated_coords = [] #An array of coordinate pairs with their timestamp, format: [timestamp,[lat,long]]

  File.open(filename, 'r') do |file|
    file.each_line do |line|
      columns = line.chomp.split("\t")
      dated_coords << [Time.parse(columns[0]), [columns[1].to_f, columns[2].to_f]]
    end
  end

  threshold_decider(dated_coords)
end

def sec_to_year(seconds)
  seconds/31536000
end

def sec_to_hour(seconds)
  seconds/3600
end

def sec_to_day(seconds)
  seconds/86400
end

def m_to_km(meters)
  meters/1000
end

def m_to_mi(meters)
  meters/1609 #Approximate; a mile is exactly 1609.344 meters
end

#Displays all the information in a way that humans like to read
def display(time_period, total_time, distance)
  puts "The time period was #{sec_to_year(time_period).round(2)} years!"
  puts "The total distance gone over that full time period was #{m_to_km(distance).round} kilometers, or #{m_to_mi(distance).round} miles!"
  puts "You spent #{sec_to_hour(total_time).round} hours doing it!"
  puts "That is an average of #{(m_to_mi(distance)/sec_to_day(time_period)).round(2)} miles per day!"
  puts "You've spent #{((total_time/time_period)*100).round(2)}% of your time doing this activity!"
  puts "You've averaged #{(m_to_mi(distance)/sec_to_hour(total_time)).round}mph!"
end

reader()
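Before trusting totals like these, it is worth running one quick sanity check on `haversine`. With r = 6,371,000 m, one degree of latitude (and one degree of longitude at the equator) should come out to about 111.2 km, and identical points should give zero. This small sketch repeats the `haversine` method so it runs on its own; the test points are arbitrary.

```ruby
# Same Haversine computation as the script, repeated so this snippet runs standalone.
def haversine(old_lats_longs, new_lats_longs)
  lat1, lon1 = old_lats_longs
  lat2, lon2 = new_lats_longs
  r = 6_371_000 # mean Earth radius, meters
  phi1 = lat1 * Math::PI / 180
  phi2 = lat2 * Math::PI / 180
  delta_phi = (lat2 - lat1) * Math::PI / 180
  delta_lambda = (lon2 - lon1) * Math::PI / 180
  a = Math.sin(delta_phi / 2)**2 +
      Math.cos(phi1) * Math.cos(phi2) * Math.sin(delta_lambda / 2)**2
  r * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
end

# One degree of latitude, then one degree of longitude along the equator:
# both should be close to r * pi / 180 meters.
puts haversine([0.0, 0.0], [1.0, 0.0]).round
puts haversine([0.0, 0.0], [0.0, 1.0]).round

# The same point twice should be exactly zero.
puts haversine([40.0, -105.0], [40.0, -105.0])
```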

So! Here are some of the results I got with the threshold set to 1,000 seconds (about 17 minutes).

Walking:

The time period was 2.8 years!

The total distance gone over that full time period was 5543 kilometers, or 3445 miles!

You spent 1864 hours doing it!

That is an average of 3.37 miles per day!

You’ve spent 7.6% of your time doing this activity!

You’ve averaged 2mph!

Bicycling:

The time period was 2.79 years!

The total distance gone over that full time period was 3047 kilometers, or 1894 miles!

You spent 297 hours doing it!

That is an average of 1.86 miles per day!

You’ve spent 1.22% of your time doing this activity!

You’ve averaged 6mph!

In a Vehicle:

The time period was 2.8 years!

The total distance gone over that full time period was 31394 kilometers, or 19511 miles!

You spent 1299 hours doing it!

That is an average of 19.12 miles per day!

You’ve spent 5.3% of your time doing this activity!

You’ve averaged 15mph!

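It all sounds pretty reasonable to me, except that the speeds for all three seem pretty low. That is likely because the totals include lots of time when I wasn’t actually moving: any gap under the threshold counts as time spent on the activity, even if I was standing still. The great-circle (straight-line) distance between sparse points also undercounts the winding path I actually traveled, which pushes the speeds down further. Still, the distances per day look like roughly what I would expect. Anyhow, there’s not much I can do with this now except futz with the threshold and see what seems most reasonable.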
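One way to test the “time spent not moving” explanation would be to also drop pairs whose implied speed is implausibly low, so that sitting still inside the threshold stops counting as travel time. This is a sketch only. The `moving_totals` helper is hypothetical, the `min_speed` cutoff (0.5 m/s) is an arbitrary assumption I haven’t tuned, and the three-point track, including its ~11 m of “GPS jitter,” is made up. A cutoff is used rather than a check for exactly zero distance because jitter while stationary can still produce small nonzero distances.

```ruby
# Compact Haversine: great-circle distance in meters between [lat, long] pairs (degrees).
def haversine(p1, p2)
  to_rad = Math::PI / 180
  dphi = (p2[0] - p1[0]) * to_rad
  dlambda = (p2[1] - p1[1]) * to_rad
  a = Math.sin(dphi / 2)**2 +
      Math.cos(p1[0] * to_rad) * Math.cos(p2[0] * to_rad) * Math.sin(dlambda / 2)**2
  2 * 6_371_000 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
end

# Hypothetical variant of the gap rule: a pair of consecutive points counts only
# if it is within the time threshold AND its implied speed is at least
# min_speed (m/s). Slower pairs are treated as standing still.
def moving_totals(dated_coords, threshold, min_speed)
  distance = 0.0
  total_time = 0.0
  dated_coords.each_cons(2) do |(t1, c1), (t2, c2)|
    gap = (t2 - t1).abs
    next if gap.zero? || gap > threshold
    step = haversine(c1, c2)
    next if step / gap < min_speed
    distance += step
    total_time += gap
  end
  [distance, total_time]
end

# Made-up track (timestamps in seconds): 10 minutes of ~11 m of jitter while
# stationary, then ~545 m of real movement in 5 minutes.
track = [
  [0,   [40.0000, -105.0]],
  [600, [40.0001, -105.0]],
  [900, [40.0050, -105.0]]
]

[0.0, 0.5].each do |min_speed|
  distance, total_time = moving_totals(track, 1000, min_speed)
  speed = distance / total_time
  puts "min_speed #{min_speed} m/s: #{distance.round} m in #{total_time.round} s (#{speed.round(2)} m/s)"
end
```

In this toy case, the cutoff removes the stationary stretch, and the average speed roughly triples. Whether a similar effect shows up in the real data is untested.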

Author: Steen

Steen is a nerdy biologist who spends a lot of time trying to cultivate Chloroflexi, who also likes to draw comics, play video games, and climb.
