# Google Location History Data Part IV

I’ve returned to my Google Location History data (the previous installment is here) to implement something I had been thinking about for a while: a maximum time threshold for how far apart (temporally) two coordinate pairs of the same activity type can be and still count as part of the “same trip.” Choosing this threshold isn’t as obvious as I originally thought, but I still think I got something meaningful out of the exercise. A gap of 10 minutes obviously falls within the same trip, but what about an hour? In many cases, likely not.

Recall that of ~1,000,000 data points (over about 3 years, or roughly 94.6 million seconds), only 288,922 had any activities associated with them. This means that, across all ~1,000,000 data points, I have on average one point every ~95 seconds. For the 288,922 activity-associated data points, I have on average one point every ~327 seconds (about 5.5 minutes). So the threshold will have to be at least that long, and most likely longer.
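The arithmetic behind these averages is just the span in seconds divided by the number of points. A quick sketch, assuming 365-day years (the same 31,536,000-second constant `sec_to_year` uses); `seconds_per_point` is a hypothetical helper. For comparison, spreading the same counts over a single year gives intervals about three times shorter, so the span matters a lot:

```ruby
# Average sampling interval = seconds spanned / number of points.
# Assumes 365-day years (31,536,000 s per year).
SECONDS_PER_YEAR = 31_536_000

def seconds_per_point(years, points)
  (years * SECONDS_PER_YEAR) / points.to_f
end

all_points      = 1_000_000
activity_points = 288_922

# Spread over ~3 years
puts seconds_per_point(3, all_points).round       # prints 95
puts seconds_per_point(3, activity_points).round  # prints 327

# Spread over a single year, for comparison
puts seconds_per_point(1, all_points).round       # prints 32
puts seconds_per_point(1, activity_points).round  # prints 109
```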

So I futzed around with the threshold a lot to see how it changed the results. I think 15-20 minutes is a pretty good threshold without letting in a ton of noise, but the average miles/day for all the activities still seems pretty low at that threshold. The true number might lie somewhere between 20 minutes and 1 hour. Or maybe I just don’t go as far as I imagine I do!
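One way to do this futzing systematically is to run the same points through several thresholds and compare how much distance gets counted. A minimal sketch, assuming the same `[time, [lat, long]]`, newest-first format that `threshold_decider` uses; `distance_within` and the sample coordinates are made up for illustration, not taken from my data:

```ruby
require 'time'

EARTH_RADIUS_M = 6_371_000.0

# Haversine distance in meters between two [lat, long] pairs given in degrees
def haversine(a, b)
  lat1, lon1 = a.map { |deg| deg * Math::PI / 180 }
  lat2, lon2 = b.map { |deg| deg * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h))
end

# Sums distance only over consecutive pairs whose time gap is <= threshold (s)
def distance_within(dated_coords, threshold)
  dated_coords.each_cons(2).sum { |(newer_time, newer_ll), (older_time, older_ll)|
    gap = newer_time - older_time
    gap <= threshold ? haversine(older_ll, newer_ll) : 0.0
  }
end

points = [
  [Time.parse("2016-05-01 18:00:00"), [47.6205, -122.3493]],
  [Time.parse("2016-05-01 17:40:00"), [47.6097, -122.3331]], # 20 min before the point above
  [Time.parse("2016-05-01 17:35:00"), [47.6062, -122.3321]], # 5 min before the point above
]

[300, 1000, 1800, 3600].each do |t|
  puts "threshold #{t} s: #{distance_within(points, t).round} m"
end
```

A longer threshold can only add legs, never remove them, so the counted distance never decreases as the threshold grows; the real question is where legitimate gaps within a trip end and gaps between separate trips begin.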

I wrote the threshold decider in Ruby because I had initially written the “sort-by-activities” script in Ruby (this was before I knew I was going to use Python for Basemap, etc.), and I had already written the Haversine script in Ruby and wanted to reuse it. Oh, well. Maybe one day I will port everything to Python, for consistency’s sake.

```ruby
# Uses the Haversine formula to calculate the distance (in meters) between two [lat, long] pairs
def haversine(old_lats_longs, new_lats_longs)
  lat1, lon1 = old_lats_longs
  lat2, lon2 = new_lats_longs

  r = 6371000 # mean Earth radius in meters
  phi1 = (lat1 * Math::PI) / 180
  phi2 = (lat2 * Math::PI) / 180
  delta_phi = ((lat2 - lat1) * Math::PI) / 180
  delta_lambda = ((lon2 - lon1) * Math::PI) / 180

  a = Math.sin(delta_phi / 2)**2 +
      Math.cos(phi1) * Math.cos(phi2) * Math.sin(delta_lambda / 2)**2
  c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

  r * c
end

# Decides whether to count a dated coordinate as part of the same trip or not, based on the time threshold.
# dated_coords is an array of coordinate pairs with their time, format: [time, [lat, long]],
# ordered newest first (so the previous time minus the current time is positive)
def threshold_decider(dated_coords)
  threshold = 1000 # maximum gap (in seconds) for a coordinate to be counted in the same trip
  distance = 0
  total_time = 0
  time_period = dated_coords.first[0] - dated_coords.last[0]

  previous_time, previous_lats_longs = dated_coords.first

  dated_coords.each do |time, lats_longs|
    gap = previous_time - time
    # Only count this leg (distance and elapsed time) if it falls within the threshold
    if gap <= threshold
      distance += haversine(previous_lats_longs, lats_longs)
      total_time += gap
    end
    previous_time = time
    previous_lats_longs = lats_longs
  end

  [time_period, total_time, distance]
end
```

This is the meat of the threshold decider. Pretty simple. It is almost identical to the old Haversine distance calculator I wrote to get aggregate distances, except that a leg’s distance (and its elapsed time) only gets added if the two points fall within the temporal threshold.
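The same gap test can also split the points into explicit trips, which makes it easier to check whether a given threshold is merging separate outings. A minimal sketch under the same `[time, [lat, long]]`, newest-first assumption; `split_into_trips` is a hypothetical helper built on Ruby’s `Enumerable#slice_when`, not part of my script, and the sample points are made up:

```ruby
require 'time'

# Splits dated coordinates ([time, [lat, long]], newest first) into trips:
# a new trip starts whenever the gap between neighbors exceeds the threshold (s).
def split_into_trips(dated_coords, threshold)
  dated_coords.slice_when { |newer, older| (newer[0] - older[0]) > threshold }.to_a
end

points = [
  [Time.parse("2016-05-01 18:00:00"), [47.6205, -122.3493]],
  [Time.parse("2016-05-01 17:55:00"), [47.6150, -122.3400]],
  [Time.parse("2016-05-01 12:10:00"), [47.6097, -122.3331]],
  [Time.parse("2016-05-01 12:00:00"), [47.6062, -122.3321]],
]

trips = split_into_trips(points, 1000)
trips.each_with_index do |trip, i|
  minutes = ((trip.first[0] - trip.last[0]) / 60).round
  puts "trip #{i + 1}: #{trip.size} points over #{minutes} min"
end
# prints "trip 1: 2 points over 5 min" then "trip 2: 2 points over 10 min"
```

With a 1,000-second threshold, the afternoon points and the evening points land in separate trips; raising the threshold above the ~5.75-hour gap would merge them into one.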

Here is the full script as it stands right now, with the file reader and the human-friendly display helpers, for quickly showing some of the stats I am interested in:

```ruby
require 'time'

# Uses the Haversine formula to calculate the distance (in meters) between two [lat, long] pairs
def haversine(old_lats_longs, new_lats_longs)
  lat1, lon1 = old_lats_longs
  lat2, lon2 = new_lats_longs

  r = 6371000 # mean Earth radius in meters
  phi1 = (lat1 * Math::PI) / 180
  phi2 = (lat2 * Math::PI) / 180
  delta_phi = ((lat2 - lat1) * Math::PI) / 180
  delta_lambda = ((lon2 - lon1) * Math::PI) / 180

  a = Math.sin(delta_phi / 2)**2 +
      Math.cos(phi1) * Math.cos(phi2) * Math.sin(delta_lambda / 2)**2
  c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

  r * c
end

# Decides whether to count a dated coordinate as part of the same trip or not, based on the time threshold.
# dated_coords format: [time, [lat, long]], ordered newest first
def threshold_decider(dated_coords)
  threshold = 1000 # maximum gap (in seconds) for a coordinate to be counted in the same trip
  distance = 0
  total_time = 0
  time_period = dated_coords.first[0] - dated_coords.last[0]

  previous_time, previous_lats_longs = dated_coords.first

  dated_coords.each do |time, lats_longs|
    gap = previous_time - time
    if gap <= threshold
      distance += haversine(previous_lats_longs, lats_longs)
      total_time += gap
    end
    previous_time = time
    previous_lats_longs = lats_longs
  end

  display(time_period, total_time, distance)
end

def sec_to_year(seconds)
  seconds / 31_536_000.0
end

def sec_to_hour(seconds)
  seconds / 3600.0
end

def sec_to_day(seconds)
  seconds / 86_400.0
end

def m_to_km(meters)
  meters / 1000.0
end

def m_to_mi(meters)
  meters / 1609.0
end

# Displays all the information in a way that humans like to read
def display(time_period, total_time, distance)
  puts "The time period was #{sec_to_year(time_period).round(2)} years!"
  puts "The total distance gone over that full time period was #{m_to_km(distance).round} kilometers, or #{m_to_mi(distance).round} miles!"
  puts "You spent #{sec_to_hour(total_time).round} hours doing it!"
  puts "That is an average of #{(m_to_mi(distance) / sec_to_day(time_period)).round(2)} miles per day!"
  puts "You've spent #{((total_time / time_period) * 100).round(2)}% of your time doing this activity!"
  puts "You've averaged #{(m_to_mi(distance) / sec_to_hour(total_time)).round}mph!"
end

# Main: read the tab-separated activity file, then run the numbers.
# Assumes each line is: timestamp<TAB>latitude<TAB>longitude, newest first.
dated_coords = [] # format: [timestamp, [lat, long]]

File.foreach("inVehicle.txt") do |line|
  next if line.strip.empty?
  columns = line.chomp.split("\t")
  dated_coords << [Time.parse(columns[0]), [columns[1].to_f, columns[2].to_f]]
end

threshold_decider(dated_coords)
```
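The script only produces positive time periods and gaps if the rows are in reverse chronological order: both the time period (first minus last) and each gap (previous minus current) assume every point is older than the one before it. If the input file might not be sorted that way, a loader that sorts explicitly is a cheap safeguard. A minimal sketch; `load_dated_coords` is hypothetical, and the timestamp/latitude/longitude column order is an assumption about the file layout:

```ruby
require 'time'
require 'stringio'

# Parses tab-separated "timestamp<TAB>lat<TAB>long" lines (assumed column order),
# skips blank lines, and returns [time, [lat, long]] pairs sorted newest first.
def load_dated_coords(io)
  coords = io.each_line.reject { |line| line.strip.empty? }.map do |line|
    time, lat, long = line.chomp.split("\t")
    [Time.parse(time), [lat.to_f, long.to_f]]
  end
  coords.sort_by { |time, _| time }.reverse
end

# Made-up, out-of-order sample input
sample = StringIO.new(<<~TSV)
  2016-05-01 12:00:00\t47.6062\t-122.3321
  2016-05-01 12:10:00\t47.6097\t-122.3331

  2016-05-01 12:05:00\t47.6080\t-122.3325
TSV

dated_coords = load_dated_coords(sample)
dated_coords.each { |time, (lat, long)| puts "#{time.strftime('%H:%M')} #{lat} #{long}" }
# prints the 12:10 point first, then 12:05, then 12:00
```

For the real file, passing `File.open("inVehicle.txt")` instead of the `StringIO` should work the same way, since both respond to `each_line`.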

So! Let’s look at some of the results I got with the threshold set to 1,000 seconds (about 17 minutes).

Walking:

The time period was 2.8 years!

The total distance gone over that full time period was 5543 kilometers, or 3445 miles!

You spent 1864 hours doing it!

That is an average of 3.37 miles per day!

You’ve spent 7.6% of your time doing this activity!

You’ve averaged 2mph!

Bicycling:

The time period was 2.79 years!

The total distance gone over that full time period was 3047 kilometers, or 1894 miles!

You spent 297 hours doing it!

That is an average of 1.86 miles per day!

You’ve spent 1.22% of your time doing this activity!

You’ve averaged 6mph!

In a Vehicle:

The time period was 2.8 years!

The total distance gone over that full time period was 31394 kilometers, or 19511 miles!

You spent 1299 hours doing it!

That is an average of 19.12 miles per day!

You’ve spent 5.3% of your time doing this activity!

You’ve averaged 15mph!
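These average speeds are total counted distance divided by total counted time, so any stretch where I am stationary but still logging points at short intervals adds time without adding distance. A toy illustration with made-up numbers (not from my data); `average_mph` is a hypothetical helper that applies the same threshold test and the same 1,609 m/mile conversion as the script:

```ruby
THRESHOLD_S = 1000
METERS_PER_MILE = 1609.0

# Each entry: [seconds since the previous point, meters moved since it].
# Only gaps within the threshold are counted, as in threshold_decider.
def average_mph(gaps)
  counted = gaps.select { |seconds, _| seconds <= THRESHOLD_S }
  miles = counted.sum { |_, meters| meters } / METERS_PER_MILE
  hours = counted.sum { |seconds, _| seconds } / 3600.0
  miles / hours
end

driving = Array.new(10) { [120, 1609.0] } # 1 mile every 2 minutes = 30 mph
parked  = Array.new(10) { [120, 0.0] }    # 20 minutes parked, points still logged

puts average_mph(driving).round           # prints 30
puts average_mph(driving + parked).round  # prints 15
```

Every parked gap is well under the threshold, so the parked time counts in full and halves the apparent speed.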

All of this sounds pretty reasonable to me! Except maybe the speeds, which seem pretty low for all three activities. That’s likely because I am still including lots of periods when I wasn’t moving. But the distances per day seem roughly like what I would expect. Anyhow, there isn’t much more I can do with this right now except futz with the threshold and see what seems most reasonable.

## Author: Steen

Steen is a nerdy biologist who spends a lot of time trying to cultivate Chloroflexi, who also likes to draw comics, play video games, and climb.
