Traversing a Hashie::Mash

While messing around with Ruby gems – in particular the one for Dark Sky’s weather forecast API, Forecast.io – I came across an object type that I hadn’t used before: the Mash.

It seemed like a deeply nested thing, and I couldn’t really see an obvious way to iterate over it, so I figured I’d share the process I used to unwrap it.

To figure out exactly what Forecast.io was returning, I used the class method:

forecast = ForecastIO.forecast(latitude, longitude)
puts forecast.class

which returned:

Hashie::Mash

This is how I figured out it was a Mash in the first place. I then converted the Mash to a hash:

forecast = forecast.to_hash

and was then able to see what this resulting hash was composed of, again using the class method:

forecast["daily"].each do |key, value|
  puts "Key name: #{key}\tKey type: #{key.class}\tValue type: #{value.class}"
end

which returned:

Key name: summary    Key type: String    Value type: String
Key name: icon       Key type: String    Value type: String
Key name: data       Key type: String    Value type: Array

And so I was able to discover that all the forecast data I wanted to access was kept in an array, as the value paired with the key called “data.” “Summary” contains today’s weather summary only, and “icon” contains today’s weather icon only. As we will see, those data are also stored under today’s date in the forecast, so we don’t even need them for the current day.

But what is each element in this array?

It turns out the data is an array of 8 hashes – today’s weather plus the seven-day forecast. Each hash represents a different day and contains everything from ozone to pressure to sunset times, so feel free to poke around in there. For now, all I’m interested in is the temperature maximum and minimum, along with the summary.

forecast = ForecastIO.forecast(latitude, longitude)
forecast = forecast.to_hash

forecast["daily"].each do |key, value|
  if key == "data"
    value.each do |n|
      date = Time.at(n['time'])
      puts "#{date.month}/#{date.day}\nHigh: #{n['temperatureMax']}F\tLow: #{n['temperatureMin']}F\n#{n['summary']}\n--------"
    end
  end
end

And that successfully pulled everything out and displayed it nice and pretty, so this was one way of traversing the Mash.

In the end, I was able to make a quick little script to pull forecast data for a location input by a user, using Forecast.io to get the forecast and the geocoder gem to look up the latitude/longitude of the location:

require 'forecast_io'
require 'date'
require 'time'
require 'geocoder'

date = Date.today

puts "Enter the city and/or state you would like to get a forecast for:"
location = gets.chomp
puts "--------"

city = Geocoder.search(location)
latitude = city[0].latitude
longitude = city[0].longitude

ForecastIO.api_key = 'YOUR API KEY'

forecast = ForecastIO.forecast(latitude, longitude)
forecast = forecast.to_hash

forecast["daily"].each do |key, value|
  if key == "data"
    value.each do |n|
      date = Time.at(n['time'])
      puts "#{date.month}/#{date.day}\nHigh: #{n['temperatureMax']}F\tLow: #{n['temperatureMin']}F\n#{n['summary']}\n--------"
    end
  end
end

For example, if you input “san francisco,” you get an output that looks like this:

Enter the city and/or state you would like to get a forecast for:
san francisco
--------
7/11
High: 70.66F Low: 57.11F
Partly cloudy starting in the evening.
--------
7/12
High: 69.74F Low: 54.98F
Partly cloudy in the morning.
--------
7/13
High: 72.35F Low: 56.02F
Partly cloudy in the morning.
--------
7/14
High: 75.86F Low: 55.67F
Partly cloudy in the morning.
--------
7/15
High: 75.49F Low: 57.89F
Clear throughout the day.
--------
7/16
High: 69.27F Low: 57.4F
Clear throughout the day.
--------
7/17
High: 66.31F Low: 56.43F
Mostly cloudy until afternoon.
--------
7/18
High: 66.72F Low: 55.71F
Mostly cloudy until afternoon.
--------

Hooray!

Although I figure there’s got to be a less roundabout way of getting data out of a Mash (without converting it into a hash first), because otherwise why would Mashes even exist? Thus, further research is called for on my part.
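One promising lead: Hashie::Mash supports method-style access to its keys, so a sketch along these lines ought to work with no conversion at all (I haven’t verified this one, so treat it as a sketch):

forecast = ForecastIO.forecast(latitude, longitude)

# Method-style access on the Mash itself - no .to_hash needed
forecast.daily.data.each do |day|
  date = Time.at(day.time)
  puts "#{date.month}/#{date.day}\nHigh: #{day.temperatureMax}F\tLow: #{day.temperatureMin}F\n#{day.summary}\n--------"
end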

Google Location History Part III

While I was researching more ways to visualize my Google Location History, I ran across Beneath the Data’s excellent post on exactly that. I thought I’d start by trying to replicate what Tyler had done (because his figures are way prettier than the ones I’ve made thus far) and then do my own thing from there. Little did I know that this would send me down a giant GIS/shapefile/geography rabbit hole.

I have to be honest: my main takeaway from the Beneath the Data post was that you could download shapefiles and use those to make your figures instead of the default world map data from Basemap that I had been using. I’ll also admit that, until this point, I thought Basemap’s drawmapboundary(), drawcoastlines(), drawcountries(), etc. were super cool. Not as cool as using a super detailed and specific shapefile, though!
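To sketch what using a shapefile with Basemap looks like (the path, the layer name, and the rough San Francisco bounding box here are placeholders of mine; readshapefile() wants the path without the .shp extension, and – as becomes important below – the shapefile’s vertices must be in lat/long coordinates):

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

m = Basemap(llcrnrlon=-122.55, llcrnrlat=37.70, #rough San Francisco bounding box,
    urcrnrlon=-122.35, urcrnrlat=37.85,         #purely for illustration
    resolution='h',
    projection='merc')

#draw the shapes from the shapefile instead of Basemap's default coastlines
m.readshapefile('sf_zipcodes', 'zipcodes', drawbounds=True)

plt.show()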

I immediately went to download a bunch of shapefiles of San Francisco, California, etc. I got obsessed with shapefiles. I saw a bunch used for ecological studies of the bay and, though I was sorely tempted, I did not download those. Unfortunately, this is where I hit my first snag. Apparently, Seattle uploads shapefiles with WGS84 coordinates. San Francisco, alas, does not. Anywhere. Ever.

There is a boundless sea of vector files available for SF, but no WGS84 coordinates to be had. I spent an embarrassing number of hours researching how to convert to WGS84, trying to figure out whether the conversions were calculations I could automate, and trying to understand how exactly the x,y coordinates for shapefile vectors were calculated. I even installed QGIS because somebody said that you could save a layer as WGS84 through there. Well, I couldn’t get that to work. Maybe it is due to my inexperience with GIS and shapefiles, or maybe it is a function that QGIS doesn’t have anymore. I don’t know.

Finally, I found out that GDAL has a super easy way to convert shapefiles to GeoJSON. It is literally just one command. PHEW. At last, I could download any shapefile I wanted and instantly convert it to get WGS84 coordinates if it didn’t already have them. I went with this San Francisco Zip Codes shapefile and converted that.
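For the record, the one command looks like this (the file names are placeholders; -t_srs EPSG:4326 is the part that reprojects the coordinates to WGS84):

ogr2ogr -f GeoJSON -t_srs EPSG:4326 sf_zipcodes.geojson sf_zipcodes.shp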

Of course, immediately after I figured out that GDAL has this functionality, I found a California County shapefile that was already in WGS84 coordinates, no conversion necessary. ISN’T THAT ALWAYS THE WAY.

Anyhow, I spent so much time learning about maps and shapefiles that I haven’t yet done anything beyond replicating what Tyler did, but with my own data. Now that I’ve gotten all that out of the way, though, making the visualizations that I want will be the next step.

My location density for San Francisco between 9/20/2013 and 5/22/2016

Excerpt from my location density in California Counties between 9/20/2013 and 5/22/2016

My older post Google Location History Part II can be read here.

More Google Location Data

So I’ve been doing some more investigating of my Google Timeline Data here and there (as I started writing about here).

After my last post, a friend of mine pointed me towards the Haversine formula for calculating the distance between two pairs of lat/long coordinates (see the sketch below), and with that I was able to calculate day-by-day distances that were consistently close to Google’s estimates for those days. Encouraged, I then moved on to calculating distances on a per-year basis, and that was fun. The numbers seemed reasonable to me:

Between 5/22/2015 and 5/22/2016 I went 20,505 miles

Between 9/20/2013 and 5/22/2016 I went 43,434 miles

Recall that these figures include every sort of movement, airplanes included. You can see that my average miles/day went way up recently, thanks to a few big airplane trips, so if the numbers seem high (~60 miles a day for the last year, ~45 miles a day overall), that is why.
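For reference, the Haversine calculation itself is only a few lines of Python (the function name is mine; 3,959 miles is the Earth’s mean radius):

from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    #great-circle distance in miles between two lat, long points
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * 3959 * asin(sqrt(a))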

But then I wanted to visualize some of the data. I decided to use matplotlib to plot the points, both because it is simple to use and because Python makes loading JSON data easy.

So I ended up breaking the points down by their assigned “activity.” True to my word, I only considered the highest-likelihood activities and discarded all the less likely candidates, just to keep things simple.

You may recall that I had a total of 1,048,575 position data points. Well, only 288,922 of those had activities assigned to them – just over a quarter. Still, it is enough data to have a bit of fun with.
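As a rough sketch of how one might keep only the highest-likelihood activity per point – with the caveat that the key names here are my best recollection of the Takeout JSON format, which has varied over the years (older exports spelled the keys “activitys”/“activities”), so adjust them to match your file:

import json

with open("LocationHistory.json") as f: #the file name from the Takeout export
    locations = json.load(f)["locations"]

pointsByActivity = {}
for loc in locations:
    if "activity" not in loc: #most points carry no activity at all
        continue
    #each activity record holds confidence-scored candidates; keep the top one
    candidates = loc["activity"][0]["activity"]
    best = max(candidates, key=lambda c: c["confidence"])["type"]
    lat = loc["latitudeE7"] / 1e7 #coordinates are stored as degrees * 10^7
    lon = loc["longitudeE7"] / 1e7
    pointsByActivity.setdefault(best, []).append((lat, lon))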

Of those 288,922 data points with an activity, it turns out there were only 7 distinct activity types:

still
unknown
onBicycle
onFoot
tilting
inVehicle
exitingVehicle

The first obvious thing to do was to sort the points by activity type and then plot the coordinates for each activity separately.

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import csv

latitudeArray = []
longitudeArray = []

with open("bicycle.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        x, y = float(row[0]), float(row[1])
        latitudeArray.append(x) #storing all the latitudes and longitudes in separate arrays, but the index is the same for each pair
        longitudeArray.append(y)

m = Basemap(llcrnrlon=min(longitudeArray)-10, #Set map's displayed max/min based on your set's max/min
    llcrnrlat=min(latitudeArray)-10,
    urcrnrlon=max(longitudeArray)+10,
    urcrnrlat=max(latitudeArray)+10,
    lat_ts=20, #"latitude of true scale," a parameter of the Mercator projection
    resolution='h', #resolution can be set to 'c','l','i','h', or 'f' - for crude, low, intermediate, high, or full
    projection='merc',
    lon_0=longitudeArray[0],
    lat_0=latitudeArray[0])

x1, y1 = m(longitudeArray, latitudeArray) #map the lat, long arrays to x, y coordinate pairs

m.drawmapboundary(fill_color="white")
m.drawcoastlines()
m.drawcountries()

m.scatter(x1, y1, s=25, c='r', marker="o") #Plot your markers and pick their size, color, and shape

plt.title("Steen's Bicycling Coordinates")

plt.show()

At the time, I wrote a script that saved each activity type into its own CSV, because I wanted to look at and play with them individually, and I then loaded those CSV files into my plotting script – because they were there. If I went back and did it again, though, I’d probably skip the CSV intermediary and go straight from the original JSON file.

And in this way I was able to see the different activities plotted onto Basemap’s default map thing:

Points in San Francisco from my Timeline Data that Google tagged as “onBicycle.” I can see a lot of my favorite biking routes, as well as that time I biked the Golden Gate Bridge.

Points in San Francisco from my Timeline Data that Google tagged as “onFoot.” Streets are less clear, and it sort of looks like a big blob centered around the Mission. Which seems accurate, given that I just sort of meander about. The one straight line looks like Market, which I occasionally will walk all the way down, so that makes sense.

Points in San Francisco from my Timeline Data that Google tagged as “inVehicle.” All the points in the bay are most likely due to when I take the ferry to work – as a ferry is, in fact, a vehicle. And I am not surprised to see that the vast majority of the points appear to be on Van Ness.

I also plotted the other categories, including the mysterious “tilting,” but I couldn’t really discern any meaning from those. They just looked like all the points from everywhere I’ve ever been – nothing like the dramatic differences and obvious routes for biking, riding in a vehicle, or walking. So there’s no need for you to see those.

I’d say this was a success. So my next question is how much time I spend doing each activity, and how much time I spend in each place. I’ll have to think a bit about how to calculate the time. All those points with no activity associated with them have me concerned that it won’t be straightforward to just subtract the timestamps to get ΔT: during a run of “no activity” points I could have been doing something completely different and then returned to the original activity, in which case the calculated ΔT would be wrong. But, then again, it is probably very likely that “no activity” points really are the same activity when they are bookended by it. Hmm.

So now I’m wondering if, given that a time point is uploaded every 60 seconds, I should just say 1 timepoint = 60 seconds? That doesn’t seem quite right to me, but I’ll work it out for a smaller data set and see if that even comes close to accurate. I’ll keep thinking about it, but if anybody has any suggestions on how to get around this problem, feel free to let me know!