Exploratory analysis of Smart-Card data. Some results on Bike-Sharing Systems and transit networks

System X Seminar,
6 June 2018

Etienne Côme COSYS/GRETTIA/IFSTTAR

Special data

Smart-card data

Mobility




Leaves a lot of digital footprints

Sensors everywhere ...

For whom, for what?

Quite voluminous! Must be analysed

Communicate results

reappropriation

System, city perspective

Unsupervised learning, clustering, LDA ...

Try to let the data speak for themselves

Generative models

Bike-sharing systems analysis

Visualize and analyze Vélib' data

2 data sources derived from smart-card data:

Stocks

in open-data (Real-Time Apps)

Flows

Origins/Destinations: sometimes available as open data (London, New York, Boston, ...), frequently not (France)

Animated bike stocks

Stocks data : vlsstat

Functional data clustering on stocks series

The Discriminative Functional Mixture Model for the Analysis of Bike Sharing Systems [preprint]

Origin/Destination data clustering

http://www.comeetie.fr/galerie/velib/

Station clustering model


Simple generative model:

$$Z_s\sim\mathcal{Multinomial}(1,\pi)$$
$$X_{sdt}|\{Z_{sk}=1,W_{dl}=1\}\sim\mathcal{Poisson}(\alpha_s\lambda_{klt})$$
with the constraints $\sum_{l,t}D_l\lambda_{klt}=DT, \forall k \in\{1,...,K\}$, where $D_l$ is the number of days in group $l$.
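To make the generative model above concrete, here is a minimal simulation sketch using only the Python standard library. The reading of the indices ($s$ station, $d$ day, $t$ time slot, $k$ station cluster, $l$ group of days such as weekdays or weekends) and every numerical value are assumptions made for illustration; `sample_poisson`, `normalise_profiles` and `simulate_counts` are hypothetical helper names.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def normalise_profiles(lam, days_per_type):
    """Rescale each lam[k] so that sum_{l,t} D_l * lam[k][l][t] = D * T."""
    total_days = sum(days_per_type)
    n_slots = len(lam[0][0])
    normalised = []
    for lam_k in lam:
        mass = sum(d_l * sum(row) for d_l, row in zip(days_per_type, lam_k))
        scale = total_days * n_slots / mass
        normalised.append([[scale * value for value in row] for row in lam_k])
    return normalised


def simulate_counts(pi, alpha, lam, day_types, rng):
    """Draw Z_s ~ Multinomial(1, pi), then X[s][d][t] ~ Poisson(alpha_s * lam[k][l][t])."""
    clusters, counts = [], []
    for alpha_s in alpha:
        k = rng.choices(range(len(pi)), weights=pi)[0]
        clusters.append(k)
        counts.append([
            [sample_poisson(alpha_s * rate, rng) for rate in lam[k][l]]
            for l in day_types  # l is the group of day d, i.e. W_dl = 1
        ])
    return clusters, counts


# Toy setting (all values are made up): K = 2 clusters, 2 day groups, T = 4 time slots.
rng = random.Random(0)
day_types = [0, 0, 0, 0, 0, 1, 1]   # five weekdays, two weekend days
days_per_type = [5, 2]              # D_l
lam = normalise_profiles(
    [[[4, 1, 1, 2], [1, 1, 2, 1]],  # cluster 0: morning peak on weekdays
     [[1, 2, 2, 4], [2, 2, 1, 1]]], # cluster 1: evening peak on weekdays
    days_per_type,
)
clusters, counts = simulate_counts([0.5, 0.5], [3.0, 10.0, 1.5], lam, day_types, rng)
print(clusters, counts[0][0])
```

The normalisation step enforces the constraint on $\lambda$, so that $\alpha_s$ can be read as the average activity level of station $s$ per time slot.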

Complete-data log-likelihood:

$$L_c(\mathbf{\Theta};\mathbf{X},\mathbf{Z},\mathbf{\alpha},\mathbf{W})=\sum_{s,k}Z_{sk}\log\left(\pi_{k}\prod_{d,t,l}po(X_{sdt};\alpha_s\lambda_{klt})^{W_{dl}}\right)$$
where $po(x;\mu)$ denotes the Poisson probability mass function.

Estimation by EM
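A minimal EM sketch for this mixture is given below, using only the Python standard library. It is an illustration under stated assumptions, not the implementation used for the talk: $\alpha_s$ is fixed to the empirical mean count of station $s$, and the update for $\lambda$ is the one obtained by maximising the expected complete-data log-likelihood with $\alpha$ held fixed. The toy counts and the names `e_step` and `m_step` are hypothetical.

```python
import math


def poisson_logpmf(x, rate):
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def e_step(counts, day_types, pi, alpha, lam):
    """Posterior t[s][k], proportional to pi_k * prod_{d,t} po(X_sdt; alpha_s * lam[k][l_d][t])."""
    resp = []
    for x_s, alpha_s in zip(counts, alpha):
        log_w = []
        for pi_k, lam_k in zip(pi, lam):
            ll = math.log(max(pi_k, 1e-300))
            for x_sd, l in zip(x_s, day_types):
                ll += sum(poisson_logpmf(x, alpha_s * lam_k[l][t]) for t, x in enumerate(x_sd))
            log_w.append(ll)
        top = max(log_w)  # log-sum-exp trick for numerical stability
        w = [math.exp(v - top) for v in log_w]
        resp.append([v / sum(w) for v in w])
    return resp


def m_step(counts, day_types, resp, alpha, n_types):
    """Update pi and lam; lam[k][l][t] = sum_s t_sk sum_{d in l} X_sdt / (D_l sum_s t_sk alpha_s)."""
    n_clusters, n_slots = len(resp[0]), len(counts[0][0])
    days_per_type = [day_types.count(l) for l in range(n_types)]
    pi = [sum(r[k] for r in resp) / len(resp) for k in range(n_clusters)]
    lam = []
    for k in range(n_clusters):
        weight = max(sum(r[k] * a for r, a in zip(resp, alpha)), 1e-300)
        totals = [[0.0] * n_slots for _ in range(n_types)]
        for r, x_s in zip(resp, counts):
            for x_sd, l in zip(x_s, day_types):
                for t, x in enumerate(x_sd):
                    totals[l][t] += r[k] * x
        # The small floor avoids log(0) in the E-step for slots with no traffic.
        lam.append([[max(v / (weight * days_per_type[l]), 1e-9) for v in totals[l]]
                    for l in range(n_types)])
    return pi, lam


# Toy data (made up): 4 stations, 2 days (one weekday, one weekend day), 3 time slots.
counts = [
    [[8, 2, 1], [3, 3, 3]],
    [[16, 4, 2], [6, 6, 6]],
    [[1, 2, 8], [3, 3, 3]],
    [[2, 4, 16], [6, 6, 6]],
]
day_types = [0, 1]
alpha = [sum(map(sum, x_s)) / (2 * 3) for x_s in counts]        # mean count per day and slot
resp = [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]]          # soft initialisation
for _ in range(20):
    pi, lam = m_step(counts, day_types, resp, alpha, n_types=2)
    resp = e_step(counts, day_types, pi, alpha, lam)
print([max(range(2), key=r.__getitem__) for r in resp])
```

With $\alpha_s$ set to the station mean, the $\lambda$ update satisfies the normalisation constraint automatically as long as the floor is not triggered, which is why no explicit renormalisation appears in the M-step.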

Urban dynamics through the observed flows (2 months of O/D data in Paris)

http://www.comeetie.fr/galerie/velib/

Cross-referencing with contextual (socio-economic) data

Cluster        inhabitants/ha   jobs/ha   services/ha   shops/ha
*                         162       237           4.2        3.7
Leisure (1)               367       189           6.3        4.4
Leisure (2)               261       322           7.7        6.9
Parks                     172        90             2        1.7
Stations                  209       206           2.4        1.8
Mixed                     375       108           3.8        2.7
Jobs (1)                  138       409           4.5        2.8
Jobs (2)                  157       456           5.7        5.6
Average                   301       163           3.8        2.8

Origin/Destination matrix analysis
with Latent Dirichlet Allocation

Latent Dirichlet Allocation

For dynamic Origin-Destination matrix analysis

Local stationarity of BSS behaviour / OD
Small bags of successive trips $\approx$ stationarity of OD
Documents (bags of words) = bags of successive trips (5000)
With words = origin/destination pairs and topics = latent activities.

For each latent activity $a$, draw its template: $\Lambda_a\sim\mathcal{Dirichlet}(\beta)$
For each bag of trips $t$:
    Draw the proportions of activities: $\pi_t \sim \mathcal{Dirichlet}(\alpha)$
    For each trip:
        Draw its activity: $A \sim \mathcal{Multinomial}(1,\pi_t)$
        Draw an OD pair using the activity template: $D \sim \mathcal{Multinomial}(1,\Lambda_A)$
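The generative process above can be simulated in a few lines. The sketch below uses only the Python standard library; the station names, the symmetric Dirichlet parameters and the tiny bag size (50 instead of 5000) are illustrative assumptions, and `simulate_trip_bags` is a hypothetical helper name.

```python
import random


def sample_dirichlet(concentration, rng):
    """Dirichlet draw obtained by normalising independent Gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_trip_bags(n_activities, od_pairs, n_bags, bag_size, alpha, beta, rng):
    # One OD template Lambda_a per latent activity.
    templates = [sample_dirichlet([beta] * len(od_pairs), rng) for _ in range(n_activities)]
    bags = []
    for _ in range(n_bags):
        # Activity proportions pi_t of this bag of successive trips.
        proportions = sample_dirichlet([alpha] * n_activities, rng)
        bag = []
        for _ in range(bag_size):
            activity = rng.choices(range(n_activities), weights=proportions)[0]
            od = rng.choices(od_pairs, weights=templates[activity])[0]
            bag.append(od)
        bags.append(bag)
    return templates, bags


rng = random.Random(42)
stations = ["A", "B", "C"]  # hypothetical station names
od_pairs = [(u, v) for u in stations for v in stations if u != v]
templates, bags = simulate_trip_bags(
    n_activities=2, od_pairs=od_pairs, n_bags=3, bag_size=50, alpha=0.5, beta=0.5, rng=rng
)
print(bags[0][:5])
```

Fitting the model goes the other way: the bags are observed and the templates and proportions are inferred, for instance by variational inference or Gibbs sampling.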

OD decomposition results

Model selection with perplexity analysis
(clear drop for K=5)
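As a reminder of what the perplexity criterion measures, the sketch below computes it for bags of trips given activity proportions and OD templates: the exponential of minus the average log-probability per trip. The data and parameter values are made up, and the proportions are assumed known here; in practice they have to be inferred, and the score is usually computed on held-out bags.

```python
import math


def perplexity(bags, proportions, templates):
    """exp(-average log-likelihood per trip) with p(od | bag t) = sum_a pi_t[a] * Lambda_a[od]."""
    log_lik, n_trips = 0.0, 0
    for bag, pi_t in zip(bags, proportions):
        for od in bag:
            p = sum(weight * template[od] for weight, template in zip(pi_t, templates))
            log_lik += math.log(p)
            n_trips += 1
    return math.exp(-log_lik / n_trips)


od_pairs = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")]
bags = [[("A", "B"), ("A", "B"), ("B", "A")], [("C", "A"), ("A", "C")]]

# Baseline: a single uniform template; its perplexity equals the number of OD pairs.
uniform = {od: 1.0 / len(od_pairs) for od in od_pairs}
baseline = perplexity(bags, [[1.0], [1.0]], [uniform])

# Two hypothetical activity templates that match the bags better.
two_templates = [
    {("A", "B"): 0.6, ("B", "A"): 0.3, ("A", "C"): 0.05, ("C", "A"): 0.05},
    {("A", "B"): 0.05, ("B", "A"): 0.05, ("A", "C"): 0.45, ("C", "A"): 0.45},
]
fitted = perplexity(bags, [[0.9, 0.1], [0.1, 0.9]], two_templates)
print(baseline, fitted)
```

Lower is better; plotting this score against the number of activities $K$ is what reveals the kind of drop mentioned above.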

Latent activity template analysis

Expected stations balances

Draw $N_{tr}$ trips using $\Lambda_a$: $$OD\sim\mathcal{Multinomial}(N_{tr},\Lambda_a)$$
Compute the balance (incoming bikes - leaving bikes) for a station $s$: $$B_s=\sum_jOD_{js}-\sum_jOD_{sj}$$
Compute the expectation for each station: $$\mathbb{E}[\mathbf{B}]=N_{tr}(\Lambda_a^\top-\Lambda_a)\mathbf{1}$$
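The expected balance is simply the column sums minus the row sums of the template, scaled by the number of trips. A minimal sketch with a made-up 3-station template (rows are origins, columns are destinations, entries sum to 1):

```python
def expected_balances(template, n_trips):
    """E[B] = N_tr * (Lambda^T - Lambda) 1, i.e. expected arrivals minus expected departures."""
    n = len(template)
    inflow = [sum(template[j][s] for j in range(n)) for s in range(n)]  # (Lambda^T 1)_s
    outflow = [sum(template[s]) for s in range(n)]                       # (Lambda 1)_s
    return [n_trips * (arrivals - departures) for arrivals, departures in zip(inflow, outflow)]


# Hypothetical activity template: most trips go from station 0 to station 1.
template = [
    [0.0, 0.5, 0.1],
    [0.1, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
balances = expected_balances(template, n_trips=1000)
print(balances)
```

Station 0 loses about 400 bikes, station 1 gains about 400 and station 2 stays balanced; the balances always sum to zero because every trip leaves one station and reaches another.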

Station balances: home→work

Station balances: work→home

Station balances: evening

Gravity-LDA

LDA extension that takes station context into account

Replace the O/D matrix templates $\Lambda_k$ by a parametric form which depends on features of the departure station $\mathbf{x_{u}}$, of the arrival station $\mathbf{x_{v}}$ and of the pair $\mathbf{x_{uv}^{da}}$: \begin{equation} \Lambda_{\,uv}(\Theta_k) \,=\, \frac{ \exp(\mathbf{\theta_{k}^{d}}^\top \mathbf{x_{u}} + \mathbf{\theta_{k}^{a}}^\top \mathbf{x_{v}} + \mathbf{\theta_{k}^{da}}^\top \mathbf{x_{uv}^{da}}) } {\sum\limits _{u',v'}\; \exp(\mathbf{\theta_{k}^{d}}^\top \mathbf{x_{u'}} + \mathbf{\theta_{k}^{a}}^\top \mathbf{x_{v'}} + \mathbf{\theta_{k}^{da}}^\top \mathbf{x_{u'v'}^{da}}) } \end{equation}

Inspiration from gravity models for O/D matrices
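The parametric template is a softmax over all station pairs of a linear score. The sketch below evaluates it for one activity $k$; the covariates (log residents and log jobs per station, distance per pair) and the parameter values are hypothetical choices made for illustration, not the features used in the talk.

```python
import math


def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def gravity_template(x_station, x_pair, theta_d, theta_a, theta_da):
    """Softmax over all (u, v) pairs of theta_d.x_u + theta_a.x_v + theta_da.x_uv."""
    n = len(x_station)
    scores = [
        [dot(theta_d, x_station[u]) + dot(theta_a, x_station[v]) + dot(theta_da, x_pair[u][v])
         for v in range(n)]
        for u in range(n)
    ]
    top = max(max(row) for row in scores)  # subtract the max for numerical stability
    weights = [[math.exp(s - top) for s in row] for row in scores]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


# Hypothetical covariates: per station [log residents, log jobs], per pair [distance in km].
x_station = [[2.0, 0.5], [0.5, 2.0], [1.0, 1.0]]
distance = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
x_pair = [[[d] for d in row] for row in distance]

# "Home to work"-like parameters: departures favour residents, arrivals favour jobs,
# and distance is penalised as in a gravity model.
template = gravity_template(x_station, x_pair, theta_d=[1.0, 0.0], theta_a=[0.0, 1.0], theta_da=[-0.5])
uniform = gravity_template(x_station, x_pair, [0.0, 0.0], [0.0, 0.0], [0.0])
print(template[0][1], uniform[0][1])
```

With all parameters at zero the template is uniform over pairs; each activity then only needs a handful of coefficients instead of one free probability per OD pair. Note that this sketch keeps the diagonal pairs $u=v$ in the normalisation.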

Estimation by Collapsed CEM

Conclusion on OD analysis

Results

Extension

Bonus: city portraits at night (atNight)

Transit networks analysis

Smart-card data form


Open data? (e.g. "Île-de-France Mobilités", in aggregated form)

A particular field: the user ID

A massive dataset


2 years of data

Outline

What can be done without user IDs?

What can be done with user IDs?

Analyzing in-flow volumes

Profiles of the demand with spatio-temporal variations

Between-day variations are clearly visible (CAH, hierarchical agglomerative clustering)

Which is mainly explained by calendar effects


Which can be used to detect outliers

#Rennes #metro #Star chairs thrown onto the elevated metro line at Villejean. Significant damage. Traffic interrupted for 2 hours?

Samuel Nohra (@SamuelNohra), 29 March 2016

Or perform mid-term predictions
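The between-day clustering step behind the slides above can be sketched as follows. The talk only names the method (CAH); the average linkage, the Euclidean distance and the daily profiles below are assumptions made for illustration, and the implementation is a deliberately naive standard-library version.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clustering(profiles, n_clusters):
    """Naive average-linkage hierarchical clustering; returns lists of profile indices."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of the two clusters.
        return sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        candidates = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        _, a, b = min(candidates)          # closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Made-up daily in-flow profiles (5 time slots per day), Monday to Sunday.
daily_profiles = [
    [10, 80, 30, 70, 15],
    [12, 82, 31, 69, 14],
    [11, 79, 29, 72, 16],
    [10, 81, 30, 71, 15],
    [13, 78, 33, 65, 20],
    [5, 20, 40, 30, 25],
    [4, 15, 35, 25, 20],
]
day_clusters = sorted(sorted(c) for c in agglomerative_clustering(daily_profiles, n_clusters=2))
print(day_clusters)
```

On this toy example the two groups are the weekdays and the weekend, a simple instance of a calendar effect. Days that sit far from the profile of their own group are natural outlier candidates, and the group profiles can serve as a baseline for mid-term prediction.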

User IDs for

data enrichment

Enable the reconstruction of a significant portion of the destinations

Data enrichment


72% of reconstructed destinations


→ Aggregation and analysis per Origin/Destination
→ Multimodal exchange hub analysis (C. Richer)
→ Dynamic OD matrices or Line graph of load
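The slides do not detail how destinations are reconstructed. A common heuristic for entry-only smart-card data is trip chaining, sketched below as an assumption: the destination of a boarding is taken to be the station of the same user's next boarding that day, and the last trip of the day returns to the first boarding station. The record layout `(user_id, day, time, station)` and all values are hypothetical.

```python
from collections import defaultdict


def infer_destinations(boardings):
    """Trip-chaining heuristic on (user_id, day, time, station) boarding records."""
    taps_by_user_day = defaultdict(list)
    for user, day, time, station in boardings:
        taps_by_user_day[(user, day)].append((time, station))

    trips = []
    for (user, day), taps in taps_by_user_day.items():
        taps.sort()  # chronological order ("HH:MM" strings sort correctly)
        for i, (time, origin) in enumerate(taps):
            # Next boarding of the day, wrapping around to the first one for the last trip.
            next_station = taps[(i + 1) % len(taps)][1]
            destination = next_station if next_station != origin else None
            trips.append((user, day, time, origin, destination))
    return trips


boardings = [
    ("u1", "d1", "08:05", "S1"),
    ("u1", "d1", "17:40", "S2"),
    ("u2", "d1", "09:10", "S3"),  # single boarding: destination cannot be inferred
    ("u1", "d2", "08:00", "S1"),
]
trips = infer_destinations(boardings)
share = sum(1 for trip in trips if trip[4] is not None) / len(trips)
print(trips, share)
```

Only part of the trips can be completed this way (here 2 out of 4), which is why a reconstruction rate is reported rather than full coverage. Real pipelines typically also check that the inferred destination lies on the boarded line.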

User clustering

for user centric analysis

Objectives


Methodology

Commuter patterns

Mean profile of a cluster with 4.55% of users

Mean profile of a cluster with 12.54% of users

Mean profile of a cluster with 3.6% of users

But other forms emerge

Mean profile of a cluster with 15.13% of users

Mean profile of a cluster with 6.44% of users

Mean profile of a cluster with 8.64% of users

Methodology

Using a continuous-time description
Generative model for continuous-time user clustering

Some comments on the results


Conclusion


Current works

Thanks for your attention



and to all my colleagues

Latifa Oukhellou

Mohamed El Mahrsi

Anne Sarah Briand

Florian Toqué

Cyprien Richer

Nicolas Coulombel

Bibliography

El Mahrsi, M., Briand, A. S., Côme, E., & Oukhellou, L. "Utilité des données billettiques pour l’analyse des mobilités urbaines : le cas rennais". Données urbaines, Economica, 2015, 11 p.
El Mahrsi, M., Côme, E., Oukhellou, L., & Verleysen, M. "Clustering Smart Card Data for Urban Mobility Analysis". IEEE Transactions on Intelligent Transportation Systems (Volume: PP, Issue: 99), pp. 1-17, 2016.
Briand, A. S., Côme, E., El Mahrsi, M., & Oukhellou, L. "A mixture model clustering approach for temporal passenger pattern characterization in public transport". International Journal of Data Science and Analytics, April 2016, Volume 1, Issue 1, pp. 37-50.
Briand, A. S., Côme, E., El Mahrsi, M., & Oukhellou, L. "Classification à base de modèle de mélange pour l’identification de profils temporels types d’usagers de transport public". AAFD & SFC’16: Conférence Internationale Francophone sur la Science des Données, Marrakech, 2016.
Toqué, F., Côme, E., El Mahrsi, M., & Oukhellou, L. "Forecasting Dynamic Public Transport Origin-Destination Matrices with Long-Short Term Memory Recurrent Neural Networks". In Proceedings of the IEEE 19th International Conference on Intelligent Transportation Systems, Rio de Janeiro, Brazil, 2016.

Work in progress :

Briand, A. S., Côme, E., Trépanier, M., & Oukhellou, L. "Analysing year-to-year changes in public transport passenger behaviour using smart card data". Transportation Research Part C: Special Issue on Smartcard data (accepted, in press).
Briand, A. S., Côme, E., & Oukhellou, L. "Anomaly detection and characterization in Smart-Card data" (being finalised).