{"id":1077,"date":"2016-08-12T18:34:23","date_gmt":"2016-08-13T01:34:23","guid":{"rendered":"http:\/\/www.certainly-strange.com\/?p=1077"},"modified":"2016-08-12T18:45:25","modified_gmt":"2016-08-13T01:45:25","slug":"google-location-history-data-part-iv","status":"publish","type":"post","link":"http:\/\/www.certainly-strange.com\/?p=1077","title":{"rendered":"Google Location History Data Part IV"},"content":{"rendered":"<p>I&#8217;ve returned to my Google Location History Data (<a href=\"http:\/\/www.certainly-strange.com\/?p=989\">the previous installment of which is here<\/a>), to implement something I had been thinking about for a while: setting a maximum time threshold for how far apart (temporally) two coordinate pairs (of the same activity type) can be and still be considered part of the &#8220;same trip.&#8221; Figuring out this threshold isn&#8217;t as obvious as I originally thought, but I still think I got something meaningful out of this exercise. I think a spread of 10 minutes obviously encompasses the same trip, but what about an hour? In many cases, likely not.<\/p>\n<p>Recall that <a href=\"http:\/\/www.certainly-strange.com\/?p=977\">of ~1,000,000 data points (over 3 years), only 288,922 had any activities associated with them<\/a>. Three years is about 94.6 million seconds, so for the 1,000,000 data points, I have (on average) one point every 95 seconds or so. For the 288,922 activity-associated data points, I have (on average) one point every 327 seconds or so. So, the threshold will have to be at least 327 seconds (about five and a half minutes), but most likely higher.<\/p>\n<p>So I futzed around with the threshold a lot to see how that changed the results. I think 15-20 minutes is a pretty good threshold without allowing a ton of noise in, but the average miles\/day for all the activities still seems pretty low with that threshold. The right threshold might lie somewhere between 20 minutes and 1 hour. 
Or maybe I just don&#8217;t go as far as I imagine I do!<\/p>\n<p>I wrote the threshold decider stuff in Ruby, because I had initially written the &#8220;sort-by-activities&#8221; script in Ruby (this was before I knew I was going to use Python for Basemap etc). And I had already written the Haversine script in Ruby, and I wanted to reuse that. Oh, well. Maybe one day I will normalize everything to be in Python, for consistency&#8217;s sake.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-theme=\"twilight\" data-enlighter-language=\"ruby\">#Uses\u00a0the\u00a0Haversine\u00a0formula\u00a0to\u00a0calculate\u00a0the\u00a0distance\u00a0between\u00a0two\u00a0lat,\u00a0long\u00a0coordinate\u00a0pairs\r\ndef\u00a0haversine(old_lats_longs,\u00a0new_lats_longs)\r\n\u00a0\u00a0lat1\u00a0=\u00a0old_lats_longs[0]\r\n\u00a0\u00a0lon1\u00a0=\u00a0old_lats_longs[1]\r\n\u00a0\u00a0lat2\u00a0=\u00a0new_lats_longs[0]\r\n\u00a0\u00a0lon2\u00a0=\u00a0new_lats_longs[1]\r\n\r\n\u00a0\u00a0r\u00a0=\u00a06371000\r\n\u00a0\u00a0phi1\u00a0=\u00a0(lat1*Math::PI)\/180\r\n\u00a0\u00a0phi2\u00a0=\u00a0(lat2*Math::PI)\/180\r\n\r\n\u00a0\u00a0deltaPhi\u00a0=\u00a0((lat2-lat1)*Math::PI)\/180\r\n\r\n\u00a0\u00a0deltaLambda\u00a0=\u00a0((lon2-lon1)*Math::PI)\/180\r\n\r\n\u00a0\u00a0a\u00a0=\u00a0Math.sin(deltaPhi\/2)\u00a0*\u00a0Math.sin(deltaPhi\/2)\u00a0+\u00a0Math.cos(phi1)\u00a0*\u00a0Math.cos(phi2)\u00a0*\u00a0Math.sin(deltaLambda\/2)\u00a0*\u00a0Math.sin(deltaLambda\/2)\r\n\u00a0\u00a0c\u00a0=\u00a02\u00a0*\u00a0Math.atan2(Math.sqrt(a),\u00a0Math.sqrt(1-a))\r\n\r\n\u00a0\u00a0distance\u00a0=\u00a0r\u00a0*\u00a0c\r\n\r\n\u00a0\u00a0return\u00a0distance\r\nend\r\n\r\n#Decides\u00a0whether\u00a0to\u00a0count\u00a0a\u00a0dated\u00a0coordinate\u00a0as\u00a0part\u00a0of\u00a0the\u00a0same\u00a0trip\u00a0or\u00a0not,\u00a0based\u00a0on\u00a0the\u00a0time\u00a0threshold\r\ndef\u00a0threshold_decider(dated_coords)\u00a0#dated_coords\u00a0is\u00a0an\u00a0array\u00a0of\u00a0coordinate\u00a0pairs\u00a0w
ith\u00a0their\u00a0time,\u00a0format:\u00a0[time,[lat,long]]\r\n\u00a0\u00a0threshold\u00a0=\u00a01000\u00a0#Set\u00a0a\u00a0maximum\u00a0threshold\u00a0(in\u00a0seconds)\u00a0for\u00a0a\u00a0coordinate\u00a0to\u00a0be\u00a0counted\u00a0in\u00a0the\u00a0same\u00a0trip\r\n\u00a0\u00a0distance\u00a0=\u00a00\r\n\u00a0\u00a0total_time\u00a0=\u00a00\r\n\u00a0\u00a0time_period\u00a0=\u00a0dated_coords.first[0]-dated_coords.last[0]\r\n\r\n\u00a0\u00a0previous_time\u00a0=\u00a0dated_coords.first[0]\r\n\u00a0\u00a0previous_lats_longs\u00a0=\u00a0dated_coords.first[1]\r\n\r\n\u00a0\u00a0dated_coords.each\u00a0do\u00a0|dated_coord|\r\n\u00a0\u00a0\u00a0\u00a0(distance+=haversine(previous_lats_longs,\u00a0dated_coord[1]))\u00a0&amp;&amp;\u00a0(total_time+=(previous_time-dated_coord[0]))\u00a0if\u00a0previous_time-dated_coord[0]\u00a0&lt;=\u00a0threshold\r\n\u00a0\u00a0\u00a0\u00a0previous_time\u00a0=\u00a0dated_coord[0]\r\n\u00a0\u00a0\u00a0\u00a0previous_lats_longs\u00a0=\u00a0dated_coord[1]\r\n\u00a0\u00a0end\r\nend<\/pre>\n<p>This is the meat of the threshold decider. Pretty simple. 
Almost exactly identical to the old Haversine distance calculator I wrote to get me aggregate distances, except the distances only get calculated\/added if they are within the temporal threshold.<\/p>\n<p>This is pretty much the context I have it in right now, with the file-reader and human-friendly-displayer (et al) to quickly display some of the stats that I am interested in seeing:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-theme=\"twilight\" data-enlighter-language=\"ruby\">require\u00a0'time'\r\n\r\n#Uses\u00a0the\u00a0Haversine\u00a0formula\u00a0to\u00a0calculate\u00a0the\u00a0distance\u00a0between\u00a0two\u00a0lat,\u00a0long\u00a0coordinate\u00a0pairs\r\ndef\u00a0haversine(old_lats_longs,\u00a0new_lats_longs)\r\n\u00a0\u00a0lat1\u00a0=\u00a0old_lats_longs[0]\r\n\u00a0\u00a0lon1\u00a0=\u00a0old_lats_longs[1]\r\n\u00a0\u00a0lat2\u00a0=\u00a0new_lats_longs[0]\r\n\u00a0\u00a0lon2\u00a0=\u00a0new_lats_longs[1]\r\n\r\n\u00a0\u00a0r\u00a0=\u00a06371000\r\n\u00a0\u00a0phi1\u00a0=\u00a0(lat1*Math::PI)\/180\r\n\u00a0\u00a0phi2\u00a0=\u00a0(lat2*Math::PI)\/180\r\n\r\n\u00a0\u00a0deltaPhi\u00a0=\u00a0((lat2-lat1)*Math::PI)\/180\r\n\r\n\u00a0\u00a0deltaLambda\u00a0=\u00a0((lon2-lon1)*Math::PI)\/180\r\n\r\n\u00a0\u00a0a\u00a0=\u00a0Math.sin(deltaPhi\/2)\u00a0*\u00a0Math.sin(deltaPhi\/2)\u00a0+\u00a0Math.cos(phi1)\u00a0*\u00a0Math.cos(phi2)\u00a0*\u00a0Math.sin(deltaLambda\/2)\u00a0*\u00a0Math.sin(deltaLambda\/2)\r\n\u00a0\u00a0c\u00a0=\u00a02\u00a0*\u00a0Math.atan2(Math.sqrt(a),\u00a0Math.sqrt(1-a))\r\n\r\n\u00a0\u00a0distance\u00a0=\u00a0r\u00a0*\u00a0c\r\n\r\n\u00a0\u00a0return\u00a0distance\r\nend\r\n\r\n#Decides\u00a0whether\u00a0to\u00a0count\u00a0a\u00a0dated\u00a0coordinate\u00a0as\u00a0part\u00a0of\u00a0the\u00a0same\u00a0trip\u00a0or\u00a0not,\u00a0based\u00a0on\u00a0the\u00a0time\u00a0threshold\r\ndef\u00a0threshold_decider(dated_coords)\u00a0#dated_coords\u00a0is\u00a0an\u00a0array\u00a0of\u00a0coordinate\u00a0pairs\u00a0with\u00a0their\u00a0time,\u00a0
format:\u00a0[time,[lat,long]]\r\n\u00a0\u00a0threshold\u00a0=\u00a01000\u00a0#Set\u00a0a\u00a0maximum\u00a0threshold\u00a0(in\u00a0seconds)\u00a0for\u00a0a\u00a0coordinate\u00a0to\u00a0be\u00a0counted\u00a0in\u00a0the\u00a0same\u00a0trip\r\n\u00a0\u00a0distance\u00a0=\u00a00\r\n\u00a0\u00a0total_time\u00a0=\u00a00\r\n\u00a0\u00a0time_period\u00a0=\u00a0dated_coords.first[0]-dated_coords.last[0]\r\n\r\n\u00a0\u00a0previous_time\u00a0=\u00a0dated_coords.first[0]\r\n\u00a0\u00a0previous_lats_longs\u00a0=\u00a0dated_coords.first[1]\r\n\r\n\u00a0\u00a0dated_coords.each\u00a0do\u00a0|dated_coord|\r\n\u00a0\u00a0\u00a0\u00a0(distance+=haversine(previous_lats_longs,\u00a0dated_coord[1]))\u00a0&amp;&amp;\u00a0(total_time+=(previous_time-dated_coord[0]))\u00a0if\u00a0previous_time-dated_coord[0]\u00a0&lt;=\u00a0threshold\r\n\u00a0\u00a0\u00a0\u00a0previous_time\u00a0=\u00a0dated_coord[0]\r\n\u00a0\u00a0\u00a0\u00a0previous_lats_longs\u00a0=\u00a0dated_coord[1]\r\n\u00a0\u00a0end\r\n\r\n\u00a0\u00a0display(time_period,\u00a0total_time,\u00a0distance)\r\nend\r\n\r\n#Reads\u00a0the\u00a0file\u00a0of\u00a0sorted\u00a0Google\u00a0Location\u00a0History\u00a0data\r\ndef\u00a0reader()\r\n\u00a0\u00a0dated_coords\u00a0=\u00a0[]\u00a0#An\u00a0array\u00a0of\u00a0coordinate\u00a0pairs\u00a0with\u00a0their\u00a0timestamp,\u00a0format:\u00a0[timestamp,[lat,long]]\r\n\r\n\u00a0\u00a0File.open(\"inVehicle.txt\",\u00a0'r')\u00a0do\u00a0|file|\r\n\u00a0\u00a0\u00a0\u00a0file.each_line\u00a0do\u00a0|line|\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0columns\u00a0=\u00a0line.split(\"\\t\")\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dated_coords\u00a0&lt;&lt;\u00a0[Time.parse(columns[0]),[columns[1].to_f,columns[2].to_f]]\r\n\u00a0\u00a0\u00a0\u00a0end\r\n\u00a0\u00a0end\r\n\r\n\u00a0\u00a0threshold_decider(dated_coords)\r\nend\r\n\r\ndef\u00a0sec_to_year(seconds)\r\n\u00a0\u00a0seconds\/31536000\r\nend\r\n\r\ndef\u00a0sec_to_hour(seconds)\r\n\u00a0\u00a0seconds\/3600\r\nend\r\n\r\ndef\u00a0sec_to_day(sec
onds)\r\n\u00a0\u00a0seconds\/86400\r\nend\r\n\r\ndef\u00a0m_to_km(meters)\r\n\u00a0\u00a0meters\/1000\r\nend\r\n\r\ndef\u00a0m_to_mi(meters)\r\n\u00a0\u00a0meters\/1609\r\nend\r\n\r\n#Displays\u00a0all\u00a0the\u00a0information\u00a0in\u00a0a\u00a0way\u00a0that\u00a0humans\u00a0like\u00a0to\u00a0read\r\ndef\u00a0display(time_period,\u00a0total_time,\u00a0distance)\r\n\u00a0\u00a0puts\u00a0\"The\u00a0time\u00a0period\u00a0was\u00a0#{sec_to_year(time_period).round(2)}\u00a0years!\"\r\n\u00a0\u00a0puts\u00a0\"The\u00a0total\u00a0distance\u00a0gone\u00a0over\u00a0that\u00a0full\u00a0time\u00a0period\u00a0was\u00a0#{m_to_km(distance).round}\u00a0kilometers,\u00a0or\u00a0#{m_to_mi(distance).round}\u00a0miles!\"\r\n\u00a0\u00a0puts\u00a0\"You\u00a0spent\u00a0#{sec_to_hour(total_time).round}\u00a0hours\u00a0doing\u00a0it!\"\r\n\u00a0\u00a0puts\u00a0\"That\u00a0is\u00a0an\u00a0average\u00a0of\u00a0#{(m_to_mi(distance)\/sec_to_day(time_period)).round(2)}\u00a0miles\u00a0per\u00a0day!\"\r\n\u00a0\u00a0puts\u00a0\"You've\u00a0spent\u00a0#{((total_time\/time_period)*100).round(2)}%\u00a0of\u00a0your\u00a0time\u00a0doing\u00a0this\u00a0activity!\"\r\n\u00a0\u00a0puts\u00a0\"You've\u00a0averaged\u00a0#{(m_to_mi(distance)\/sec_to_hour(total_time)).round}mph!\"\r\nend\r\n\r\nreader()<\/pre>\n<p>So! 
Let us see some of the results I got with the threshold set to 1,000 seconds.<\/p>\n<p>Walking:<\/p>\n<blockquote><p>The time period was 2.8 years!<\/p>\n<p>The total distance gone over that full time period was 5543 kilometers, or 3445 miles!<\/p>\n<p>You spent 1864 hours doing it!<\/p>\n<p>That is an average of 3.37 miles per day!<\/p>\n<p>You&#8217;ve spent 7.6% of your time doing this activity!<\/p>\n<p>You&#8217;ve averaged 2mph!<\/p><\/blockquote>\n<p>Bicycling:<\/p>\n<blockquote><p>The time period was 2.79 years!<\/p>\n<p>The total distance gone over that full time period was 3047 kilometers, or 1894 miles!<\/p>\n<p>You spent 297 hours doing it!<\/p>\n<p>That is an average of 1.86 miles per day!<\/p>\n<p>You&#8217;ve spent 1.22% of your time doing this activity!<\/p>\n<p>You&#8217;ve averaged 6mph!<\/p><\/blockquote>\n<p>In a Vehicle:<\/p>\n<blockquote><p>The time period was 2.8 years!<\/p>\n<p>The total distance gone over that full time period was 31394 kilometers, or 19511 miles!<\/p>\n<p>You spent 1299 hours doing it!<\/p>\n<p>That is an average of 19.12 miles per day!<\/p>\n<p>You&#8217;ve spent 5.3% of your time doing this activity!<\/p>\n<p>You&#8217;ve averaged 15mph!<\/p><\/blockquote>\n<p>It all sounds pretty reasonable to me, except that the speeds for all three seem pretty low, likely because the totals already include lots of time when I wasn&#8217;t moving. But the distances\/day seem like roughly what I would expect to see. 
Anyhow, not much I can do with this now except futz with the threshold and see what seems most reasonable.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve returned to my Google Location History Data (the previous installment of which is here), to implement something I had been thinking about for a while: setting a maximum time threshold for how far apart (temporally) two coordinate pairs (of the same activity type) can be and still be considered part of the &#8220;same trip.&#8221; &hellip; <a href=\"http:\/\/www.certainly-strange.com\/?p=1077\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Google Location History Data Part IV&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"categories":[361,362],"tags":[376,375],"class_list":["post-1077","post","type-post","status-publish","format-standard","hentry","category-programming","category-ruby","tag-haversine","tag-ruby"],"_links":{"self":[{"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=\/wp\/v2\/posts\/1077","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1077"}],"version-history":[{"count":7,"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=\/wp\/v2\/posts\/1077\/revisions"}],"predecessor-version":[{"id":1085,"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=\/wp\/v2\/posts\/1077\/revisions\/1085"}],"wp:attachment":[{"href":"http:\/\/www.certainl
y-strange.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1077"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1077"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.certainly-strange.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1077"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}