I’ve returned to my Google Location History data (the previous installment of which is here) to implement something I had been thinking about for a while: setting a maximum time threshold for how far apart (temporally) two coordinate pairs of the same activity type can be and still be considered part of the “same trip.” Picking this threshold isn’t as obvious as I originally thought, but I still think I got something meaningful out of the exercise. A gap of 10 minutes almost certainly falls within the same trip, but what about an hour? In many cases, likely not.
Recall that of ~1,000,000 data points (over 3 years), only 288,922 had any activities associated with them. Three years is about 94.6 million seconds, so across all 1,000,000 data points I have (on average) one point roughly every 95 seconds. For the 288,922 activity-associated data points, I have (on average) one point roughly every 327 seconds, or about 5.5 minutes. So the threshold will have to be at least about 5.5 minutes, but most likely higher.
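An overall average like this hides how unevenly the points are spaced, so it can help to look at the actual gaps between consecutive points before settling on a threshold. Here is a minimal sketch of that check, assuming the same [time,[lat,long]] format used in the rest of this post. The helper names (gaps_between_points, median) and the sample data are made up for illustration.

```ruby
require 'time'

# Returns the gaps (in seconds) between consecutive points.
# dated_coords is assumed to be in [time, [lat, long]] format;
# abs makes the result independent of newest-first vs oldest-first ordering.
def gaps_between_points(dated_coords)
  dated_coords.each_cons(2).map { |a, b| (a[0] - b[0]).abs }
end

# Median of a non-empty array of numbers.
def median(values)
  sorted = values.sort
  mid = sorted.length / 2
  sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Made-up sample data (newest first), purely for illustration.
sample = [
  [Time.parse("2016-01-01 12:30:00 UTC"), [45.0, -122.0]],
  [Time.parse("2016-01-01 12:29:00 UTC"), [45.0, -122.0]],
  [Time.parse("2016-01-01 12:28:00 UTC"), [45.0, -122.0]],
  [Time.parse("2016-01-01 09:00:00 UTC"), [45.0, -122.0]]
]

gaps = gaps_between_points(sample)
mean_gap = gaps.sum / gaps.length
puts "mean gap: #{mean_gap.round} s, median gap: #{median(gaps).round} s"
```

In the sample, one long gap drags the mean far above the median, which is the kind of skew that makes the plain average a weak guide for the threshold.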
So I futzed around with the threshold a lot to see how that changed the results. I think 15-20 minutes is a pretty good threshold that doesn’t let a ton of noise in, but the average miles/day for all the activities still seems pretty low with that threshold. The best value might lie somewhere between 20 minutes and 1 hour. Or maybe I just don’t go as far as I imagine I do!
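Trying different thresholds is easier if the threshold is a parameter rather than a hard-coded value. What follows is only a sketch of that idea, not the script I actually ran: totals_for_threshold is a hypothetical helper, the sample points are made up, and the times are plain seconds instead of Time objects to keep it short.

```ruby
# Haversine distance in meters between two [lat, long] pairs.
def haversine_m(a, b)
  r = 6371000
  phi1 = a[0] * Math::PI / 180
  phi2 = b[0] * Math::PI / 180
  d_phi = (b[0] - a[0]) * Math::PI / 180
  d_lambda = (b[1] - a[1]) * Math::PI / 180
  h = Math.sin(d_phi / 2)**2 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(d_lambda / 2)**2
  2 * r * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h))
end

# Hypothetical helper: total distance (m) and time (s), counting only pairs of
# consecutive points that are no more than threshold seconds apart.
# Assumes dated_coords is ordered newest first.
def totals_for_threshold(dated_coords, threshold)
  distance = 0.0
  total_time = 0.0
  dated_coords.each_cons(2) do |newer, older|
    gap = newer[0] - older[0]
    next if gap > threshold
    distance += haversine_m(newer[1], older[1])
    total_time += gap
  end
  [distance, total_time]
end

# Made-up points (newest first); times are plain seconds for brevity.
sample = [
  [7200, [45.02, -122.0]],
  [7100, [45.01, -122.0]],
  [3600, [45.00, -122.0]],
  [0,    [44.90, -122.0]]
]

[300, 1000, 3600].each do |threshold|
  distance, total_time = totals_for_threshold(sample, threshold)
  puts "threshold #{threshold}s: #{(distance / 1609).round(2)} miles in #{(total_time / 3600).round(2)} hours"
end
```

Raising the threshold can only add distance and time, never remove it, so the interesting question is where the totals stop looking like real trips and start absorbing long idle gaps.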
I wrote the threshold decider stuff in Ruby, because I had initially written the “sort-by-activities” script in Ruby (this was before I knew I was going to use Python for Basemap etc). And I had already written the Haversine script in Ruby, and I wanted to reuse that. Oh, well. Maybe one day I will normalize everything to be in Python, for consistency’s sake.
    #Uses the Haversine formula to calculate the distance between two lat, long coordinate pairs
    def haversine(old_lats_longs, new_lats_longs)
      lat1 = old_lats_longs[0]
      lon1 = old_lats_longs[1]
      lat2 = new_lats_longs[0]
      lon2 = new_lats_longs[1]
      r = 6371000
      phi1 = (lat1*Math::PI)/180
      phi2 = (lat2*Math::PI)/180
      deltaPhi = ((lat2-lat1)*Math::PI)/180
      deltaLambda = ((lon2-lon1)*Math::PI)/180
      a = Math.sin(deltaPhi/2) * Math.sin(deltaPhi/2) + Math.cos(phi1) * Math.cos(phi2) * Math.sin(deltaLambda/2) * Math.sin(deltaLambda/2)
      c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a))
      distance = r * c
      return distance
    end

    #Decides whether to count a dated coordinate as part of the same trip or not, based on the time threshold
    def threshold_decider(dated_coords)
      #dated_coords is an array of coordinate pairs with their time, format: [time,[lat,long]]
      threshold = 1000 #Set a maximum threshold (in seconds) for a coordinate to be counted in the same trip
      distance = 0
      total_time = 0
      time_period = dated_coords.first[0]-dated_coords.last[0]
      previous_time = dated_coords.first[0]
      previous_lats_longs = dated_coords.first[1]
      dated_coords.each do |dated_coord|
        if previous_time-dated_coord[0] <= threshold
          distance += haversine(previous_lats_longs, dated_coord[1])
          total_time += previous_time-dated_coord[0]
        end
        previous_time = dated_coord[0]
        previous_lats_longs = dated_coord[1]
      end
    end
This is the meat of the threshold decider. Pretty simple. It is almost identical to the old Haversine distance calculator I wrote to get aggregate distances, except that a distance only gets calculated and added if the two points are within the temporal threshold of each other.
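Another way to picture what the threshold does is to split the points into separate trips wherever the gap is too large. Summing distances only between points inside the threshold amounts to summing within each of those trips. This is a small sketch of that view using Ruby's built-in slice_when. The split_into_trips name and the sample data are my own illustration, and the times are plain seconds for brevity.

```ruby
# Splits points into trips: a new trip starts whenever two consecutive
# points are more than threshold seconds apart.
# dated_coords is assumed to be in [time, [lat, long]] format.
def split_into_trips(dated_coords, threshold)
  dated_coords.slice_when { |a, b| (a[0] - b[0]).abs > threshold }.to_a
end

# Made-up points (newest first); times are plain seconds for brevity.
sample = [
  [5000, [45.03, -122.0]],
  [4900, [45.02, -122.0]],
  [1000, [45.01, -122.0]],
  [900,  [45.00, -122.0]],
  [850,  [44.99, -122.0]]
]

trips = split_into_trips(sample, 1000)
puts "#{trips.length} trips with #{trips.map(&:length).inspect} points each"
```

Having the trips as separate arrays would also make it possible to look at per-trip lengths and durations, which the running totals alone can't show.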
This is pretty much the context I have it in right now, with the file-reader and human-friendly-displayer (et al) to quickly display some of the stats that I am interested in seeing:
    require 'time'

    #Uses the Haversine formula to calculate the distance between two lat, long coordinate pairs
    def haversine(old_lats_longs, new_lats_longs)
      lat1 = old_lats_longs[0]
      lon1 = old_lats_longs[1]
      lat2 = new_lats_longs[0]
      lon2 = new_lats_longs[1]
      r = 6371000
      phi1 = (lat1*Math::PI)/180
      phi2 = (lat2*Math::PI)/180
      deltaPhi = ((lat2-lat1)*Math::PI)/180
      deltaLambda = ((lon2-lon1)*Math::PI)/180
      a = Math.sin(deltaPhi/2) * Math.sin(deltaPhi/2) + Math.cos(phi1) * Math.cos(phi2) * Math.sin(deltaLambda/2) * Math.sin(deltaLambda/2)
      c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a))
      distance = r * c
      return distance
    end

    #Decides whether to count a dated coordinate as part of the same trip or not, based on the time threshold
    def threshold_decider(dated_coords)
      #dated_coords is an array of coordinate pairs with their time, format: [time,[lat,long]]
      threshold = 1000 #Set a maximum threshold (in seconds) for a coordinate to be counted in the same trip
      distance = 0
      total_time = 0
      time_period = dated_coords.first[0]-dated_coords.last[0]
      previous_time = dated_coords.first[0]
      previous_lats_longs = dated_coords.first[1]
      dated_coords.each do |dated_coord|
        if previous_time-dated_coord[0] <= threshold
          distance += haversine(previous_lats_longs, dated_coord[1])
          total_time += previous_time-dated_coord[0]
        end
        previous_time = dated_coord[0]
        previous_lats_longs = dated_coord[1]
      end
      display(time_period, total_time, distance)
    end

    #Reads the file of sorted Google Location History data
    def reader()
      dated_coords = [] #An array of coordinate pairs with their timestamp, format: [timestamp,[lat,long]]
      File.open("inVehicle.txt", 'r') do |file|
        file.each_line do |line|
          columns = line.split("\t")
          dated_coords << [Time.parse(columns[0]),[columns[1].to_f,columns[2].to_f]]
        end
      end
      threshold_decider(dated_coords)
    end

    def sec_to_year(seconds)
      seconds/31536000
    end

    def sec_to_hour(seconds)
      seconds/3600
    end

    def sec_to_day(seconds)
      seconds/86400
    end

    def m_to_km(meters)
      meters/1000
    end

    def m_to_mi(meters)
      meters/1609
    end

    #Displays all the information in a way that humans like to read
    def display(time_period, total_time, distance)
      puts "The time period was #{sec_to_year(time_period).round(2)} years!"
      puts "The total distance gone over that full time period was #{m_to_km(distance).round} kilometers, or #{m_to_mi(distance).round} miles!"
      puts "You spent #{sec_to_hour(total_time).round} hours doing it!"
      puts "That is an average of #{(m_to_mi(distance)/sec_to_day(time_period)).round(2)} miles per day!"
      puts "You've spent #{((total_time/time_period)*100).round(2)}% of your time doing this activity!"
      puts "You've averaged #{(m_to_mi(distance)/sec_to_hour(total_time)).round}mph!"
    end

    reader()
So! Here are some of the results I got with the threshold set to 1,000 seconds.
Walking:
The time period was 2.8 years!
The total distance gone over that full time period was 5543 kilometers, or 3445 miles!
You spent 1864 hours doing it!
That is an average of 3.37 miles per day!
You’ve spent 7.6% of your time doing this activity!
You’ve averaged 2mph!
Bicycling:
The time period was 2.79 years!
The total distance gone over that full time period was 3047 kilometers, or 1894 miles!
You spent 297 hours doing it!
That is an average of 1.86 miles per day!
You’ve spent 1.22% of your time doing this activity!
You’ve averaged 6mph!
In a Vehicle:
The time period was 2.8 years!
The total distance gone over that full time period was 31394 kilometers, or 19511 miles!
You spent 1299 hours doing it!
That is an average of 19.12 miles per day!
You’ve spent 5.3% of your time doing this activity!
You’ve averaged 15mph!
It all sounds pretty reasonable to me! Except maybe that the speeds for all three seem pretty low, likely because I am still counting lots of time when I wasn’t actually moving. But the distances per day are roughly what I would expect to see. Anyhow, there’s not much I can do with this now except futz with the threshold and see what seems most reasonable.
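One way to test the not-moving hunch would be to stop counting time for segments where the two points are practically in the same place. I haven't run this on my data. It is only a sketch, and moving_totals, the 25-meter min_move cutoff, and the sample points are all assumptions made up for illustration. Times are plain seconds for brevity.

```ruby
# Haversine distance in meters between two [lat, long] pairs.
def haversine_m(a, b)
  r = 6371000
  phi1 = a[0] * Math::PI / 180
  phi2 = b[0] * Math::PI / 180
  d_phi = (b[0] - a[0]) * Math::PI / 180
  d_lambda = (b[1] - a[1]) * Math::PI / 180
  h = Math.sin(d_phi / 2)**2 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(d_lambda / 2)**2
  2 * r * Math.atan2(Math.sqrt(h), Math.sqrt(1 - h))
end

# Hypothetical variant of the threshold logic: a segment only counts if the
# points are within threshold seconds of each other AND at least min_move
# meters apart. Assumes dated_coords is ordered newest first.
def moving_totals(dated_coords, threshold, min_move)
  distance = 0.0
  moving_time = 0.0
  dated_coords.each_cons(2) do |newer, older|
    gap = newer[0] - older[0]
    next if gap > threshold
    step = haversine_m(newer[1], older[1])
    next if step < min_move
    distance += step
    moving_time += gap
  end
  [distance, moving_time]
end

# Made-up points (newest first): one stationary segment, one moving segment.
sample = [
  [1200, [45.01, -122.0]],
  [600,  [45.00, -122.0]],
  [0,    [45.00, -122.0]]
]

distance, moving_time = moving_totals(sample, 1000, 25)
mph = (distance / 1609) / (moving_time / 3600)
puts "#{(distance / 1609).round(2)} miles in #{(moving_time / 60).round} moving minutes, #{mph.round(1)} mph"
```

In the sample, dropping the stationary segment halves the counted time while leaving the distance unchanged, so the average speed doubles. Whether a fixed distance cutoff is the right filter for noisy GPS points is an open question.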