My previous experiment (velibme) with OpenTripPlanner made me want to use this trip-planning tool to project the Vélib' data (1) onto the Paris road network, in order to represent the number of Vélib' trips per month travelling along each street. A similar approach has already been applied to London by researchers at UCL.

I therefore started extending OTP's batch analysis tool with a function that turns an origin/destination matrix into an estimate of the load on the network, which in my case means the number of Vélib' trips along each street. It is then fairly easy to render the result as a map with R and ggplot.
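As a rough illustration of that last step, here is a minimal ggplot2 sketch for drawing per-segment loads, assuming a data frame of segment endpoints with a load column (the names and the toy data are hypothetical, not the actual code behind the map below):

library(ggplot2)

# Hypothetical input: one row per road segment, with the coordinates of
# its endpoints and the estimated number of trips along it.
links <- data.frame(x0 = c(0, 1), y0 = c(0, 0),
                    x1 = c(1, 2), y1 = c(0, 1),
                    load = c(120, 850))

ggplot(links) +
  geom_segment(aes(x = x0, y = y0, xend = x1, yend = y1,
                   linewidth = load, alpha = load),
               colour = "steelblue") +
  scale_linewidth(range = c(0.1, 2)) +  # thin side streets to thick arteries
  coord_equal() +
  theme_void()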

[figure: linkmaps4_medium]

This method has a few drawbacks, in particular the loss of loops (trips whose departure station is also the arrival station), but it is very readable and directly interpretable. Moreover, the same code can be used to analyse metro-like systems and produce interesting visualisations (see the prospective ridership study of the Paris Arc Express).


(1) The data were provided by JC-Decaux, Cyclocity and the City of Paris through a partnership with Ifsttar.

(2) A PDF version of the map is available here.

I was looking for a way to compute characteristics of the best paths between Vélib' stations, such as distance and change in elevation, since these variables are essential to understand the variation of flow between origin-destination pairs. I ended up with OpenTripPlanner, a great piece of open-source software that plans trips from open data (thanks, OpenStreetMap). It is used in several cities, such as Portland, for their official trip-planner service. Thanks to this tool and to the great work of OpenPlans on the web app, I was able to get a fully functional trip planner for Vélib' in a few days. It finds the best path, together with recommended start and destination stations, using real-time information on station states from the unofficial API (from citybik.es). You may even try it by clicking on the screenshot (it may be slow since I host the tiles myself).

[screenshot: velibme]
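For the curious, station states can be pulled in a few lines of R. The endpoint below follows the old citybik.es v1 scheme, and both the URL and the field names are assumptions on my part, so check the current API documentation before relying on them:

library(jsonlite)

# Assumed endpoint of the old citybik.es v1 API for the Vélib' network;
# the URL and the field names (name, bikes, free) are assumptions.
stations <- fromJSON("http://api.citybik.es/velib.json")
head(stations[, c("name", "bikes", "free")])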

To get such a service you must build a weighted graph between locations, so that OpenTripPlanner can search for "shortest" paths between the origin and destination supplied by the user. Such a graph can be constructed from various data sources. Here, I built the graph using OSM data for the streets (available here) and elevation data from http://earthexplorer.usgs.gov/. You may also take into account information from transport agencies to handle transit trips, if a GTFS file is available for your city. A lot more useful information is available on the OpenTripPlanner GitHub page.

I have finished polishing the visualisation I developed to explore the Vélib' trip data we obtained from JC-Decaux and the City of Paris. A full month of trips is used, more than 2,500,000 Vélib' journeys. It lets you, for instance, visualise the incoming and outgoing traffic of each station at different hours of the day. Click on the screenshot to try it.

[screenshot: velibnew]

The map shows the number of Vélib' departures from, or arrivals at, each station during the time range selected with the various menus. The base map provides geographical landmarks: the parks, the Seine, and the metro and RER lines.

Hovering over a station displays its origin or destination stations. Each flow is then drawn as a line whose width encodes the number of trips observed between the two stations.

Hovering also reveals the temporal usage profile of a station. This profile characterises the station by its average number of departures and returns during each hour of a typical weekday and a typical weekend day. The profiles are normalised by the station's average activity so that they are comparable from one station to another. Finally, the station colours correspond to clusters of similar stations, built with a dedicated clustering algorithm (see this publication). The last menu also lets you filter the stations by cluster.
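To make the idea concrete, here is a small R sketch of the normalisation step followed by a plain k-means, which is not the dedicated algorithm of the publication; the toy data and column names are entirely hypothetical:

# Toy example: three stations with simulated hourly departure counts.
# The real data has far more stations, and the publication relies on a
# dedicated model-based clustering algorithm rather than k-means.
set.seed(1)
trips <- data.frame(station = rep(c("A", "B", "C"), each = 24),
                    hour = rep(0:23, 3),
                    departures = rpois(72, lambda = 5))

# One profile per station: average departures for each hour of the day.
profiles <- tapply(trips$departures, list(trips$station, trips$hour), mean)
profiles[is.na(profiles)] <- 0

# Normalise each profile by the station's average activity so that
# stations with very different volumes become comparable.
profiles <- profiles / rowMeans(profiles)

clusters <- kmeans(profiles, centers = 2)$cluster  # k chosen arbitrarily here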

To start the year, I made a little dataviz to explore the usage of leboncoin, a French online marketplace between private individuals. For each category of goods and each region, this site gives the number of articles on sale. A cartogram can thus be used to distort the French boundaries so that the area of each region encodes the number of articles on sale. This gives maps like this one:

[figure: cara]

This particular map corresponds to the listings for recreational vehicles. To compute the distortion and draw the maps, a d3 plugin is available. It transforms a geography encoded in TopoJSON. To build such a viz you therefore have to provide a TopoJSON file of the region boundaries and a CSV file with the feature values that you want to use for the cartogram. I will briefly describe the steps I followed to build them.

The CSV file is perhaps the simplest one. First, the data themselves must be scraped from the marketplace site. For this task I'm pretty old school and like to use bash scripts with wget, sed and grep to retrieve the different pages and extract the right numbers from them.
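If you would rather do the same from R, a rough equivalent of such a wget/grep pipeline is sketched below; the URL and the pattern are purely hypothetical placeholders, not the real structure of the marketplace pages:

# Hypothetical sketch: fetch one listing page and extract the number of
# articles on sale. The URL and the pattern are placeholders.
page  <- readLines("http://www.example.com/annonces/caravaning/aquitaine/",
                   warn = FALSE)
line  <- grep("annonces", page, value = TRUE)[1]  # line mentioning the count
count <- as.numeric(gsub("[^0-9]", "", line))     # keep only the digits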

The TopoJSON file is a little trickier to produce. In my case I started from a map background extracted from OSM. The French region boundaries are available here, as a shapefile. To get the TopoJSON file you first have to obtain a GeoJSON file; you may use the ogr2ogr command-line tool to perform this conversion.

ogr2ogr -f GeoJSON geo.json geo.shp

Then you may use node.js and the topojson package to convert it into the final TopoJSON file.

# Install topojson under node
npm install -g topojson
# Convert the data
topojson -o output_file.json input_file.json

Since the number of listings certainly depends on the population of each region, the final interface lets you scale the variables per inhabitant. This gives a more informative picture of the data, since the cartogram of the population itself looks like this:

[figure: hab]
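As an aside, if you prefer to stay in R end to end, a similar continuous-area cartogram, including the per-inhabitant scaling, can be computed with the sf and cartogram packages. This is a sketch under assumed file and column names, not what the d3-based viz uses:

library(sf)         # read and project the boundaries
library(cartogram)  # continuous-area cartogram (Dougenik et al. algorithm)
library(ggplot2)

# Assumed input: region boundaries with columns `count` (number of
# listings) and `pop` (inhabitants); file and column names are assumptions.
regions <- st_read("regions.shp")
regions <- st_transform(regions, 2154)  # Lambert-93, a projected CRS

# Scale per 1000 inhabitants, as the interface proposes.
regions$per_capita <- 1000 * regions$count / regions$pop

carto <- cartogram_cont(regions, weight = "per_capita")
ggplot(carto) + geom_sf() + theme_void()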

Sometimes you need to take weather data into account in an analysis, and so the question of grabbing weather data from the web arises. This happened to me in the context of studying Vélib' usage, where weather conditions matter. I recently found that http://www.wunderground.com provides interesting historical data for a variety of weather stations, at 5-minute intervals! This was perfect for my use case. You may, for example, retrieve the data recorded on September 5th, 2011 at the Montsouris station in Paris here. So using wget

wget -O 05092011.csv "http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=I75003PA1&graphspan=day&month=9&day=5&year=2011&format=1"

you get a text file for a given day. The beginning of the file will look something like

Time,TemperatureC,DewpointC,PressurehPa,WindDirection,WindDirectionDegrees,WindSpeedKMH,WindSpeedGustKMH,Humidity,HourlyPrecipMM,Conditions,Clouds,dailyrainMM,SoftwareType,DateUTC<br>
2011-09-05 00:06:00,17.3,11.9,1014.4,SSO,202,4.8,9.7,71,0.0,,,-2539.7,WeatherLink 5.4,2011-09-04 22:06:00,
<br>
2011-09-05 00:11:00,17.3,11.9,1014.4,SSO,202,4.8,9.7,71,0.0,,,-2539.7,WeatherLink 5.4,2011-09-04 22:11:00,
<br>
2011-09-05 00:16:00,17.3,11.8,1014.4,SO,225,6.4,16.1,70,0.0,,,-2539.7,WeatherLink 5.4,2011-09-04 22:16:00,
....

These data are almost ready to be processed with R or similar tools. If you want more days, for example one month (which was my case study), you can use a few lines of bash to grab and clean the data:

#!/bin/bash
# Grab and clean one month (September 2011) of 5-minute weather records.
for i in {1..30}
do
echo "day ${i} download"
# Download the daily CSV from wunderground
wget -O "${i}092011.csv" "http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=I75003PA1&graphspan=day&month=9&day=$i&year=2011&format=1"
# Drop the trailing comma at the end of each record
sed ':a;N;$!ba;s/,\n/\n/g' "${i}092011.csv" > "${i}092011N.csv"
# Remove the <br> markup, with and without a following line break
sed ':a;N;$!ba;s/<br>\n//g' "${i}092011N.csv" > "${i}092011F.csv"
sed ':a;N;$!ba;s/<br>//g' "${i}092011F.csv" > "${i}092011N.csv"
# Drop empty lines, strip the header, and append to the monthly file
sed '/^$/d' "${i}092011N.csv" > "${i}092011.csv"
sed '1d' "${i}092011.csv" >> septembre2011.csv
rm "${i}092011F.csv"
rm "${i}092011N.csv"
done

The sed wizardry removes the line breaks and the HTML markup, and you get a cleaned file for the whole month, ready to be imported into R.

septembre2011 <- read.csv("~/Projets/velib/data/meteo/databyhours/septembre2011.csv", header=F)

Finally, you can plot the first 3000 temperature values to check the data.

ggplot(data=septembre2011[1:3000,],aes(x=as.POSIXct(V1,format="%Y-%m-%d %H:%M:%S"),y=V2))+geom_point()+geom_line()+scale_x_datetime("time")+scale_y_continuous('temperature (C)')


[figure: temperature]

Or the associated rainfall:

ggplot(data=septembre2011[1:3000,],aes(x=as.POSIXct(V1,format="%Y-%m-%d %H:%M:%S"),y=V10))+geom_point()+geom_line()+scale_x_datetime("time")+scale_y_continuous('rainfall (mm / h)')

[figure: rainfall]

Welcome to my new blog.

I intend to talk a little about my research and to provide some tutorial-oriented content in this new version.

I hope to publish on a weekly basis.

I will write in English for the moment, in order to practice, but will certainly also write some posts in French.

So, have a good read …

Etienne