Mobility




Leaves a lot of digital footprints

Sensors everywhere ...

for whom, for what ?

Sensors :

Utopia?

Open the black box ?

Transparency / Trust / Personal data

Utopia?

Better urban governance?
Antagonistic interests?

Dystopia?

Better urban governance ?
Observation limits ?

Quite voluminous !! Must be analysed

Communicate results

reappropriation

System, city perspective

Unsupervised learning, clustering, Data-Mining

Try to let the data speak for themselves

Digital Revolution and Mobility

Paradigm Shift From Transport to Mobility

Lifestyle transformation through digital

Digital Revolution and Mobility

Digital and mobility actors

Digital and citizens (or users)

Digital Revolution and Mobility

Data mining

Trends

Operational objectives

For urban stakeholders (operators, transport authorities, city managers)

For citizens

Replica

We believe this powerful data source can help do just that. Meet Replica: a user-friendly modeling tool that uses anonymized mobile location data to give planning agencies a comprehensive portrait of how, when, and why people travel in urban areas. Replica provides a full set of baseline travel measures that are very difficult to gather and maintain today, including the total number of people on a highway or local street network, what mode they’re using (car, transit, bike, or foot), and their trip purpose (commuting to work, going shopping, heading to school, etc). By updating these measures every three months, Replica also provides the ongoing ability to detect changes in these measures over time — helping planners answer questions about land use and transportation from a regional level all the way down to a city block.

Replica

Replica uses this anonymized data from about 5 percent of the population to learn about travel patterns and create a travel behavior model — basically, a set of rules to represent who’s moving where, when, why, and how. But models aren’t perfect. So we gut check these rules using on-the-ground data (such as manual traffic counts or transit boardings) to make sure Replica is consistent with real-world movement patterns.

Replica

We then match these models with what planners often call a “synthetic” population. That’s a very technical term, but the basic idea is that planners can use incomplete samples of census demographic data to create a broad new data set that is statistically representative of the full population.

Replica

Status :

Other company examples

Special data

Smart card data

Smart card data are designed for fare collection

Spatially and temporally precise, longitudinal, quite exhaustive

Secondary use of smart card data for urban mobility analysis

BUT

Bike-sharing systems analysis

Visualize and analyze Vélib' data

2 data sources derived from smart-card data:

Stocks

Available as open data (real-time apps)

Flows

Origins/Destinations: sometimes available as open data (London, New York, Boston, ...), frequently not

Animated bike stocks

Stock data: vlsstat

Clustering of stocks

The Discriminative Functional Mixture Model
for the Analysis of Bike Sharing Systems

Origin/Destination data clustering

http://www.comeetie.fr/galerie/velib/

Station clustering model


Simple generative model :

$$Z_s\sim\mathcal{Multinomial}(1,\pi)$$
$$X_{sdt}|\{Z_{sk}=1,W_{dl}=1\}\sim\mathcal{Poisson}(\alpha_s\lambda_{klt})$$
with the constraints $\sum_{l,t}D_l\lambda_{klt}=DT, \forall k \in\{1,...,K\}$, where $D_l$ is the number of days in $l$.
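The generative model above can be simulated in a few lines. This is a minimal sketch, not the authors' implementation. It assumes the following reading of the indices, which the slide does not spell out: $s$ is a station, $d$ a day, $t$ a time slot, $l$ a day type (for instance weekday or weekend) and $\alpha_s$ a station-specific activity scale. All numeric values and the helper names (`sample_poisson`, `normalise_profiles`, `simulate`) are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def normalise_profiles(lam, days_per_type):
    """Rescale lam[k][l][t] so that sum_{l,t} D_l * lam[k][l][t] = D * T for every k."""
    n_days = sum(days_per_type)      # D
    n_slots = len(lam[0][0])         # T
    out = []
    for lam_k in lam:
        total = sum(days_per_type[l] * v for l, row in enumerate(lam_k) for v in row)
        out.append([[v * n_days * n_slots / total for v in row] for row in lam_k])
    return out


def simulate(pi, alpha, lam, day_type, rng):
    """Draw Z_s ~ Multinomial(1, pi), then X_sdt ~ Poisson(alpha_s * lam[k][l][t])."""
    clusters, counts = [], []
    for alpha_s in alpha:
        k = rng.choices(range(len(pi)), weights=pi)[0]
        clusters.append(k)
        counts.append([[sample_poisson(alpha_s * lam[k][l][t], rng)
                        for t in range(len(lam[k][l]))]
                       for l in day_type])
    return clusters, counts


rng = random.Random(0)
days_per_type = [5, 2]              # D_l: assumed 5 weekdays and 2 weekend days
day_type = [0, 0, 0, 0, 0, 1, 1]    # day type l of each day d (the role of W_dl)
lam = normalise_profiles(
    [[[0.2, 2.0, 0.5, 1.5], [0.3, 0.4, 1.0, 0.8]],   # cluster 0: weekday, weekend
     [[1.5, 0.3, 0.4, 2.0], [0.5, 0.6, 0.9, 0.7]]],  # cluster 1: weekday, weekend
    days_per_type)
clusters, counts = simulate(pi=[0.6, 0.4], alpha=[3.0, 8.0, 1.5], lam=lam,
                            day_type=day_type, rng=rng)
```

The normalisation step makes the role of the constraint visible: because every cluster profile has the same total mass $DT$, the overall volume of a station is carried by $\alpha_s$ alone, and the clusters only describe the shape of the temporal profile.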

Urban dynamics through the observed flows (2 months of O/D data in Paris)

http://www.comeetie.fr/galerie/velib/

Crossing with contextual data: socio-economic indicators

Cluster       Inhabitants/ha   Jobs/ha   Services/ha   Shops/ha
*                        162       237           4.2        3.7
Leisure (1)              367       189           6.3        4.4
Leisure (2)              261       322           7.7        6.9
Parks                    172        90           2          1.7
Stations                 209       206           2.4        1.8
Housing                  375       108           3.8        2.7
Jobs (1)                 138       409           4.5        2.8
Jobs (2)                 157       456           5.7        5.6
Average                  301       163           3.8        2.8

Origin/Destination matrix analysis
with Latent Dirichlet Allocation

Latent Dirichlet Allocation

For dynamic origin-destination matrix analysis

Local stationarity of BSS behaviour / OD
Small bags of successive trips $\approx$ stationarity of OD
Documents (bags of words) = bags of successive trips (5000)
With :

Latent Dirichlet Allocation

For dynamic origin-destination matrix analysis


For each latent activity $a$, draw its template: $\Lambda_a\sim\mathcal{Dirichlet}(\beta)$
For each bag of trips $t$:
    Draw the proportions of activities: $\pi_t \sim \mathcal{Dirichlet}(\alpha)$
    For each trip in the bag:
        Draw its activity: $A \sim \mathcal{Multinomial}(1,\pi_t)$
        Draw an OD pair using the activity template: $D \sim \mathcal{Multinomial}(1,\Lambda_A)$
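The generative process above can be sketched as a sampler. This is an illustration under simplifying assumptions, not the code used for the results: the Dirichlet priors are taken symmetric (a single scalar `alpha` and `beta`), an OD pair is encoded as a `(origin, destination)` tuple, and the function names and sizes are hypothetical.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet distribution by normalising Gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [g / total for g in draws]


def generate_bags(n_bags, trips_per_bag, n_activities, n_stations, alpha, beta, rng):
    """Sample bags of trips from the LDA generative process."""
    od_pairs = [(u, v) for u in range(n_stations) for v in range(n_stations)]
    # One OD template per latent activity: Lambda_a ~ Dirichlet(beta)
    templates = [sample_dirichlet([beta] * len(od_pairs), rng)
                 for _ in range(n_activities)]
    bags = []
    for _ in range(n_bags):
        # Activity proportions of this bag: pi_t ~ Dirichlet(alpha)
        proportions = sample_dirichlet([alpha] * n_activities, rng)
        trips = []
        for _ in range(trips_per_bag):
            # Draw the activity of the trip, then an OD pair from its template
            a = rng.choices(range(n_activities), weights=proportions)[0]
            trips.append(rng.choices(od_pairs, weights=templates[a])[0])
        bags.append(trips)
    return templates, bags


rng = random.Random(1)
templates, bags = generate_bags(n_bags=3, trips_per_bag=50, n_activities=2,
                                n_stations=3, alpha=0.5, beta=0.5, rng=rng)
```

Inference goes the other way: given only the bags, it recovers the templates $\Lambda_a$ and the proportions $\pi_t$, which is what the decomposition results show.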

OD decomposition results

Model selection with perplexity analysis
(clear drop for K=5)
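For reference, perplexity is the exponential of the negative average log-likelihood per trip, so lower values mean that the model predicts the observed OD pairs better. The sketch below computes it for given parameters. It assumes that OD pairs are encoded as integer indices and that the proportions $\pi_t$ of each bag are already known; in practice they must be estimated for held-out bags. The function name and the toy values are hypothetical.

```python
import math


def perplexity(bags, proportions, templates):
    """exp(-average log-likelihood per trip) under a mixture of OD templates.

    bags[t] is a list of OD indices, proportions[t][a] is the weight of
    activity a in bag t, and templates[a][od] is the probability of an OD pair.
    """
    log_lik, n_trips = 0.0, 0
    for trips, pi_t in zip(bags, proportions):
        for od in trips:
            p = sum(w * template[od] for w, template in zip(pi_t, templates))
            log_lik += math.log(p)
            n_trips += 1
    return math.exp(-log_lik / n_trips)


# A uniform model over 4 OD pairs has a perplexity equal to the number of pairs.
uniform = perplexity(bags=[[0, 1, 2, 3]],
                     proportions=[[0.5, 0.5]],
                     templates=[[0.25] * 4, [0.25] * 4])

# A template concentrated on the pairs that actually occur scores lower.
peaked = perplexity(bags=[[0, 0, 0, 1]],
                    proportions=[[1.0]],
                    templates=[[0.7, 0.1, 0.1, 0.1]])
```

Model selection then amounts to computing this quantity for several values of $K$ and looking for the point where adding activities stops reducing it markedly.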

Station balances: home→work

Station balances: work→home

Station balances: evening

Gravity-LDA

LDA extension that takes station context into account

Replace the O/D matrix templates $\Lambda_k$ by a parametric form which depends on covariates of the departure station $\mathbf{x_{u}}$, of the arrival station $\mathbf{x_{v}}$ and of the station pair $\mathbf{x_{uv}^{da}}$:
\begin{equation}
\Lambda_{\,uv}(\Theta_k) \,=\, \frac{ \exp(\mathbf{\theta_{k}^{d}}^\top \mathbf{x_{u}} + \mathbf{\theta_{k}^{a}}^\top \mathbf{x_{v}} + \mathbf{\theta_{k}^{da}}^\top \mathbf{x_{uv}^{da}}) } {\sum\limits_{u',v'}\; \exp(\mathbf{\theta_{k}^{d}}^\top \mathbf{x_{u'}} + \mathbf{\theta_{k}^{a}}^\top \mathbf{x_{v'}} + \mathbf{\theta_{k}^{da}}^\top \mathbf{x_{u'v'}^{da}}) }
\end{equation}

Inspiration from gravity models for O/D matrices
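The parametric template is a softmax over all station pairs, which a short sketch makes concrete. The covariates used here (two standardised station features and a distance in km for each pair) and the coefficient values are hypothetical; the slides do not list the actual covariates.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gravity_template(theta_d, theta_a, theta_da, x_station, x_pair):
    """OD template Lambda[u][v]: a softmax of linear scores over all station pairs."""
    n = len(x_station)
    scores = [[dot(theta_d, x_station[u]) + dot(theta_a, x_station[v])
               + dot(theta_da, x_pair[u][v])
               for v in range(n)] for u in range(n)]
    top = max(max(row) for row in scores)   # subtract the max for numerical stability
    weights = [[math.exp(s - top) for s in row] for row in scores]
    norm = sum(sum(row) for row in weights)
    return [[w / norm for w in row] for row in weights]


# Hypothetical covariates: [housing density, job density] per station
x_station = [[1.0, 0.2], [0.3, 1.5], [0.8, 0.8]]
# Hypothetical pair covariate: [distance in km]
distance = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
x_pair = [[[d] for d in row] for row in distance]

# A "home to work"-like activity: departures from housing, arrivals near jobs,
# and a negative distance coefficient, as in classical gravity models.
template = gravity_template(theta_d=[0.5, 0.0], theta_a=[0.0, 0.8], theta_da=[-1.0],
                            x_station=x_station, x_pair=x_pair)
flat = gravity_template([0.0, 0.0], [0.0, 0.0], [0.0], x_station, x_pair)
```

With all coefficients at zero the template is uniform over station pairs; the coefficients $\Theta_k$ are what make one activity differ from another.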

Estimation by Collapsed CEM

Conclusion on OD analysis

Results

Extension

Bonus: city portraits at night (atNight)

Transit networks analysis

Smart-card data form


Open data? (e.g. "Ile de France Mobilité", in aggregated form)

Smart-card data form


A particular field : user id

A massive dataset


2 years of data

Outline

What to do without user ids ?

What to do with user ids ?

Analyzing in-flow volumes

Profiles of the demand with spatio-temporal variations

Between-day variations are clearly visible (hierarchical agglomerative clustering)

Which is mainly explained by calendar effects


Which can be used to detect outliers

#Rennes #metro #Star chairs thrown onto the elevated metro line at Villejean. Significant damage. Traffic interrupted for 2 hours?

Samuel Nohra (@SamuelNohra), 29 March 2016
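One simple way to turn calendar effects into an outlier detector is to build a baseline profile per day type and flag the time slots that deviate strongly from it. The sketch below is an assumption about how this could be done, not the method behind the slides: it uses a plain z-score with a hypothetical threshold of 3, and the counts are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def calendar_baseline(history):
    """Mean and standard deviation of the counts per (day type, time slot).

    history is a list of (day_type, profile), where profile holds one count per slot.
    """
    by_type = defaultdict(list)
    for day_type, profile in history:
        by_type[day_type].append(profile)
    return {day_type: [(mean(slot), stdev(slot)) for slot in zip(*profiles)]
            for day_type, profiles in by_type.items()}


def flag_outliers(day_type, profile, baseline, threshold=3.0):
    """Indices of the slots whose count is more than `threshold` deviations from the mean."""
    flagged = []
    for slot, (count, (mu, sigma)) in enumerate(zip(profile, baseline[day_type])):
        if sigma > 0 and abs(count - mu) / sigma > threshold:
            flagged.append(slot)
    return flagged


history = [("weekday", [100, 400, 300, 350]),
           ("weekday", [110, 390, 310, 340]),
           ("weekday", [90, 410, 290, 360]),
           ("weekday", [105, 405, 305, 345])]
baseline = calendar_baseline(history)

# A weekday whose third slot collapses, as during a service interruption
flagged = flag_outliers("weekday", [102, 398, 20, 352], baseline)
```

Conditioning on the day type matters: without it, an ordinary Sunday would look like an anomaly compared with weekday volumes.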

Or perform mid-term predictions


A variability which is also spatial

User IDs for

data enrichment

Enable the reconstruction of a significant portion of the destinations

Data enrichment


72% of reconstructed destinations


→ Aggregation and analysis per Origins/Destinations
→ Multimodal exchange hub analysis (C. Richer)
→ Dynamic OD matrices or Line graph of load
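The slides do not detail how destinations are reconstructed. A common approach for entry-only smart-card systems is trip chaining: the destination of a trip is assumed to be the boarding station of the same user's next trip, and the last trip of the day is assumed to return to the first boarding station. The sketch below implements that heuristic as an illustration only; the record layout and the function name are hypothetical.

```python
from collections import defaultdict


def infer_destinations(validations):
    """Trip-chaining heuristic on (user_id, day, time, boarding_station) records.

    Returns (user_id, day, time, origin, destination) tuples; destination is None
    when it cannot be inferred (single validation, or same station twice in a row).
    """
    by_user_day = defaultdict(list)
    for user, day, time, station in validations:
        by_user_day[(user, day)].append((time, station))

    trips = []
    for (user, day), taps in by_user_day.items():
        taps.sort()
        for i, (time, origin) in enumerate(taps):
            destination = None
            if len(taps) > 1:
                # Next boarding of the day; the last trip wraps around to the first
                candidate = taps[(i + 1) % len(taps)][1]
                if candidate != origin:
                    destination = candidate
            trips.append((user, day, time, origin, destination))
    return trips


validations = [("u1", 1, "08:05", "A"), ("u1", 1, "17:40", "B"),
               ("u2", 1, "09:00", "C"),
               ("u3", 1, "07:30", "A"), ("u3", 1, "12:10", "D"), ("u3", 1, "18:00", "B")]
trips = infer_destinations(validations)
share = sum(t[4] is not None for t in trips) / len(trips)
```

Users with a single validation in a day cannot be chained, which is one reason why only part of the destinations can be recovered.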

Short term OD prediction



Model                 Training   Validation   Test
Calendar                  11.6        12.35   8.86
VAR                       4.34         5.56   5.88
LSTM (subway)             4.01         4.94   5
LSTM (subway + bus)       2.73         4.52   4.71

Mean squared error (MSE) of each model on the OD prediction task
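To make the metric concrete, here is a sketch of how a calendar baseline could be fitted and scored with the MSE. It assumes that the calendar model predicts, for each (day type, time slot), the average OD flows observed in the training set for that key; the actual baseline behind the table may be defined differently, and the data are invented.

```python
from collections import defaultdict


def fit_calendar_model(samples):
    """Average OD flow vector per calendar key, e.g. (day type, time slot)."""
    sums, counts = {}, defaultdict(int)
    for key, flows in samples:
        previous = sums.get(key, [0.0] * len(flows))
        sums[key] = [s + f for s, f in zip(previous, flows)]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


def mean_squared_error(model, samples):
    """MSE between the predicted and observed flows, averaged over all OD entries."""
    squared, n = 0.0, 0
    for key, flows in samples:
        squared += sum((p - f) ** 2 for p, f in zip(model[key], flows))
        n += len(flows)
    return squared / n


train = [(("weekday", 8), [10, 20]), (("weekday", 8), [14, 24])]
test = [(("weekday", 8), [12, 25])]

model = fit_calendar_model(train)
train_mse = mean_squared_error(model, train)
test_mse = mean_squared_error(model, test)
```

Such a baseline ignores what happened in the previous time slots, which is the information that autoregressive and recurrent models exploit.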

User clustering

for user centric analysis

Objectives


Methodology

Commuter patterns

Mean profile of a cluster with 4.55% of users

Commuter patterns

Mean profile of a cluster with 12.54% of users

Commuter patterns

Mean profile of a cluster with 3.6% of users

But other forms emerge

Mean profile of a cluster with 15.13% of users

But other forms emerge

Mean profile of a cluster with 6.44% of users

But other forms emerge

Mean profile of a cluster with 8.64% of users

Methodology

Using a continuous-time description

Generative model for continuous-time user clustering

Some comments on the results


Thanks for your attention



and to all my colleagues

Latifa Oukhellou

Mohamed El Mahrsi

Anne Sarah Briand

Florian Toqué

Cyprien Richer

Nicolas Coulombel

Conclusion


Work in progress

Analysis of flows in a multimodal hub (RATP)

Time series decomposition for long term analysis
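The slides do not say which decomposition is used. As a point of reference, a classical additive decomposition splits a daily series into a trend, a weekly seasonal component and a residual. The sketch below is one such decomposition under simple assumptions: a centred moving average for the trend and an odd period (7 days). The function name and the data are hypothetical.

```python
def decompose(series, period=7):
    """Additive decomposition: series = trend + seasonal + residual.

    The trend is a centred moving average over one period (period must be odd),
    so it is undefined (None) for the first and last period // 2 points.
    """
    half = period // 2
    n = len(series)
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period

    # Average the detrended values by position in the period, then centre them
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period
    seasonal = [r - offset for r in raw]

    residual = [None if trend[i] is None else series[i] - trend[i] - seasonal[i % period]
                for i in range(n)]
    return trend, seasonal, residual


# Four weeks of synthetic daily counts: linear growth plus a weekly pattern
pattern = [30, 20, 10, 0, -10, -20, -30]
series = [100 + 2 * i + pattern[i % 7] for i in range(28)]
trend, seasonal, residual = decompose(series)
```

On real flows the residual is the interesting part: once the trend and the weekly cycle are removed, what remains points to events, disruptions or changes in usage.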

Analysis of flows in a multimodal hub (RATP)

Segmentation and prediction

Metro load prediction (RATP)

Load prediction from real time data
