I was looking for a way to compute characteristics of the best paths between Vélib' stations, such as elevation changes and distances; these variables are essential for understanding the variation of flow between origin-destination pairs. I ended up with OpenTripPlanner, a great piece of open-source software that plans trips from open data (thanks, OpenStreetMap). It is used in several cities, including Portland, for their official trip-planning service. Thanks to this tool and to the great work of OpenPlans on the web app, I was able to get a fully functional trip planner for Vélib' in a few days. It finds the best path, together with recommended start and destination stations, using real-time information on station states from the unofficial API at citybik.es. You may even try it by clicking on the screenshot (it may be slow since I host the tiles myself).
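For reference, here is a minimal sketch of how the station states can be pulled from citybik.es, assuming its v2 JSON API and jq installed; the network id used below is a guess, so list the available networks first if it does not match.

# List the available network ids and look for the Vélib' one (the exact id is an assumption)
curl -s "http://api.citybik.es/v2/networks" | jq -r '.networks[].id' | grep -i velib
# Dump the name, available bikes and empty slots of each station as CSV
curl -s "http://api.citybik.es/v2/networks/velib" | jq -r '.network.stations[] | [.name, .free_bikes, .empty_slots] | @csv'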

[Screenshot: the Vélib' trip planner web app]

To build such a service you must first construct a weighted graph between locations, so that OpenTripPlanner can search for the “shortest” path between the origin and destination supplied by the user. Such a graph may be constructed from various data sources. Here, I built the graph from OSM data for the streets (available here) and elevation data from http://earthexplorer.usgs.gov/. You may also take into account information from transit agencies to handle transit trips if a GTFS feed is available for your city. A lot more useful information is available on the OpenTripPlanner GitHub page.
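As a rough sketch of the graph-building step, and assuming an OpenTripPlanner 1.x one-jar build (the exact flags have changed across versions, and the paths below are placeholders), the graph can be built and served like this:

# Build the graph from a directory holding the .osm.pbf extract and the elevation rasters
java -Xmx4G -jar otp.jar --build ./graphs/paris
# Serve it through the OTP web API
java -Xmx4G -jar otp.jar --graphs ./graphs --router paris --server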

To start the year, I made a little dataviz to explore the usage of leboncoin, a French online marketplace between private individuals. For each category of goods and each region, the site gives the number of articles on sale. A cartogram can thus be used to distort the French boundaries so that region areas encode the number of articles on sale. This gives maps like this one:

[Cartogram: listings for recreational vehicles by region]

This particular map corresponds to the listings for recreational vehicles. To compute the distortion and draw the maps, a d3 cartogram plugin is available; it transforms a geography encoded in TopoJSON. To build such a viz you thus have to provide a TopoJSON file of the region boundaries and a CSV file with the feature values that you want to use for the cartograms. I will briefly describe the steps I followed to build them.

The CSV file is perhaps the simplest one. First, the data themselves must be scraped from the marketplace site. For this task I'm pretty old school and like to use bash scripts with wget, sed and grep to retrieve the different pages and extract the right numbers from them.
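To give an idea, a hypothetical version of such a script could look like the following; the URL pattern and the "nb_annonces" marker matched by grep are placeholders, not the real leboncoin structure.

#!/bin/bash
# Hypothetical scraping loop: the URL and the markup matched below are placeholders
for region in ile_de_france aquitaine bretagne
do
  count=$(wget -qO- "http://www.leboncoin.fr/caravaning/offres/${region}/" \
    | grep -o 'nb_annonces[^0-9]*[0-9]*' \
    | head -n 1 \
    | sed 's/[^0-9]//g')
  echo "${region},${count}"
done > counts.csv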

The TopoJSON file is a little trickier to produce. In my case I started from a map background extracted from OSM: the French region boundaries are available here as a shapefile. To get the TopoJSON file you first have to obtain a GeoJSON file; you may use the ogr2ogr command-line tool to perform this conversion.

ogr2ogr -f GeoJSON geo.json geo.shp

Then you may use node.js and the topojson package to convert it to the final TopoJSON file.

# Install topojson under node
npm install -g topojson
# Convert the data
topojson -o output_file.json input_file.json
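
If you prefer to attach the CSV values directly to the geometries as TopoJSON properties (rather than loading them separately in d3), the version 1 topojson command line could join an external file during conversion; the "code" and "count" column names below are assumptions about your CSV.

# Join counts.csv (columns: code,count) onto the regions while converting
topojson -o output_file.json --id-property=+code -e counts.csv -p count=+count -- input_file.json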

Since the number of listings certainly depends on the regions' populations, the final interface offers to scale the variables per inhabitant. This gives a more interesting picture of the data, since the cartogram of the population itself looks like this:

[Cartogram: population by region]

Sometimes you need to take weather data into account in an analysis, so the question of grabbing weather data from the web arises. This happened to me in the context of studying Vélib' usage, where weather conditions matter. I recently found that http://www.wunderground.com provides interesting historical data for a variety of weather stations, at 5-minute intervals! This was perfect for my use case. You may, for example, retrieve the data for the 5th of September 2011 recorded at the Montsouris station in Paris here. So using wget

wget -O 5092011.csv "http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=I75003PA1&graphspan=day&month=9&day=5&year=2011&format=1"

you get a text file for a given day. The beginning of the file looks something like this:

Time,TemperatureC,DewpointC,PressurehPa,WindDirection,WindDirectionDegrees,WindSpeedKMH,WindSpeedGustKMH,Humidity,HourlyPrecipMM,Conditions,Clouds,dailyrainMM,SoftwareType,DateUTC<br>
2011-09-05 00:06:00,17.3,11.9,1014.4,SSO,202,4.8,9.7,71,0.0,,,-2539.7,WeatherLink 5.4,2011-09-04 22:06:00,
<br>
2011-09-05 00:11:00,17.3,11.9,1014.4,SSO,202,4.8,9.7,71,0.0,,,-2539.7,WeatherLink 5.4,2011-09-04 22:11:00,
<br>
2011-09-05 00:16:00,17.3,11.8,1014.4,SO,225,6.4,16.1,70,0.0,,,-2539.7,WeatherLink 5.4,2011-09-04 22:16:00,
....

These data are almost ready to be processed with R or similar tools. If you want more days, for example a whole month (this was my case), you may use a few lines of bash to grab the data and clean them:

#!/bin/bash
for i in {1..30}
do
  echo "day ${i} download"
  wget -O "${i}092011.csv" "http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=I75003PA1&graphspan=day&month=9&day=$i&year=2011&format=1"
  # drop the trailing comma at the end of each record
  sed ':a;N;$!ba;s/,\n/\n/g' "${i}092011.csv" > "${i}092011N.csv"
  # remove the <br> tags, with and without their following line break
  sed ':a;N;$!ba;s/<br>\n//g' "${i}092011N.csv" > "${i}092011F.csv"
  sed ':a;N;$!ba;s/<br>//g' "${i}092011F.csv" > "${i}092011N.csv"
  # drop empty lines, then append the day without its header line to the monthly file
  sed '/^$/d' "${i}092011N.csv" > "${i}092011.csv"
  sed '1d' "${i}092011.csv" >> septembre2011.csv
  rm "${i}092011F.csv"
  rm "${i}092011N.csv"
done

The sed wizardry removes the stray line breaks and HTML markup, and you end up with a clean file for the whole month, ready to be imported into R.

septembre2011 <- read.csv("~/Projets/velib/data/meteo/databyhours/septembre2011.csv", header=F)

Finally, you can plot the first 3000 temperature values to check the data.

ggplot(data=septembre2011[1:3000,],aes(x=as.POSIXct(V1,format="%Y-%m-%d %H:%M:%S"),y=V2))+geom_point()+geom_line()+scale_x_datetime("time")+scale_y_continuous('temperature (°C)')

[Figure: temperature over the first 3000 records]

Or the associated rainfall:

ggplot(data=septembre2011[1:3000,],aes(x=as.POSIXct(V1,format="%Y-%m-%d %H:%M:%S"),y=V10))+geom_point()+geom_line()+scale_x_datetime("time")+scale_y_continuous('rainfall (mm / h)')

 
[Figure: rainfall over the same period]

Welcome to my new blog.

I intend to talk a little bit about my research and to provide some tutorial-oriented content in this new version.

I hope to publish on a weekly basis.

I will write in English for the moment in order to practice, but will certainly also write some posts in French.

So, have a good read …

Etienne