Mixture models for counts, model selection, regularisation and estimation.

Séminaire UTC,
20 March 2018

Etienne Côme, COSYS/GRETTIA/IFSTTAR

Mixture of unigrams

$$ \begin{eqnarray} Z&\sim&\mathcal{M}(1,\mathbf{\nu})\\ X_i|Z_{ik}=1&\sim&\mathcal{M}(L^i,\mathbf{\mu}_k)\\ \end{eqnarray} $$
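As a point of reference, here is a minimal NumPy sketch of this generative process (the dimensions and the names `nu`, `mu`, `doc_lengths` are illustrative choices, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, N = 3, 20, 100                      # clusters, vocabulary size, documents
nu = rng.dirichlet(np.ones(K))            # mixing proportions nu
mu = rng.dirichlet(np.ones(V), size=K)    # one word distribution mu_k per cluster
doc_lengths = rng.poisson(50, size=N)     # document lengths L^i, drawn here at random

Z = rng.choice(K, size=N, p=nu)           # cluster of each document
X = np.stack([rng.multinomial(L, mu[z])   # word counts X_i ~ M(L^i, mu_k)
              for L, z in zip(doc_lengths, Z)])
# X is the (N, V) document-term count matrix used in the rest of the slides
```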

Mixture of unigrams, estimation

Frequentist vs. Bayesian: frequentist side

$BIC(\color{green}{\hat{\theta}},\color{red}{D_k})$, asymptotic approximation: $$ -2\ln(p_k(x,\color{green}{\hat{\theta}}))+\color{red}{D_k}\ln(\color{blue}{N})$$ $\Rightarrow$ Drawbacks: several runs of EM, asymptotic approximation

Mixture of unigrams, estimation

Frequentist vs. Bayesian: frequentist side

$AIC(\color{green}{\hat{\theta}},\color{red}{D_k})$, asymptotic approximation: $$ -2\ln(p_k(x,\color{green}{\hat{\theta}}))+2\color{red}{D_k}$$ $\Rightarrow$ Drawbacks: several runs of EM, asymptotic approximation

Mixture of unigrams, estimation

Frequentist vs. Bayesian: frequentist side

$ICL(\color{green}{\hat{\theta}},\color{purple}{\hat{z}},\color{red}{D_k})$, asymptotic approximation: $$ -2 \ln(p_k(x,\color{purple}{\hat{z}},\color{green}{\hat{\theta}}))+\color{red}{D_k}\ln(\color{blue}{N})$$ $\Rightarrow$ Drawbacks: several runs of EM, asymptotic approximation
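The three criteria share the same shape, a fit term plus a penalty; a hedged sketch (assuming `loglik` and `loglik_completed` come from an EM fit, which is not shown here):

```python
import numpy as np

def bic(loglik, D_k, N):
    # -2 ln p_k(x, theta_hat) + D_k ln N
    return -2.0 * loglik + D_k * np.log(N)

def aic(loglik, D_k):
    # -2 ln p_k(x, theta_hat) + 2 D_k
    return -2.0 * loglik + 2.0 * D_k

def icl(loglik_completed, D_k, N):
    # same penalty as BIC, but on the completed likelihood p_k(x, z_hat, theta_hat)
    return -2.0 * loglik_completed + D_k * np.log(N)
```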

Mixture of unigrams, estimation

Frequentist vs. Bayesian: frequentist side

$\Rightarrow$ Drawbacks: several runs of EM, ...

Mixture of unigrams, estimation

Frequentist vs. Bayesian: Bayesian side

Dirichlet distributions are conjugate priors for multinomials: $$ \begin{eqnarray} \mathbf{\nu}&\sim&\mathcal{D}(\mathbf{\alpha})\\ \mathbf{\mu}_k&\sim&\mathcal{D}(\mathbf{\beta}_k)\\ \end{eqnarray} $$

Mixture of unigrams, estimation

Frequentist vs. Bayesian: Bayesian side

Gibbs sampling [Rigouste et al. 2006]

$$\begin{eqnarray} Z &\sim& p(Z|\nu, \mu,X)\\ \nu &\sim& p(\nu|Z,X)\\ \mu &\sim& p(\mu|Z,X)\\ \end{eqnarray}$$ ! Mixing, label switching, ...
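A hedged NumPy sketch of one sweep of this sampler under the Dirichlet priors above (the function name and the log-space normalisation are implementation choices, not the authors' code):

```python
import numpy as np
from scipy.special import logsumexp

def gibbs_sweep(X, Z, nu, mu, alpha, beta, rng):
    """One sweep in the order of the slide: Z | nu, mu, X then nu | Z then mu | Z, X."""
    K = len(alpha)
    # Z_i | nu, mu, X_i  proportional to  nu_k * prod_v mu_kv^{X_iv}  (log space)
    logp = np.log(nu) + X @ np.log(mu).T              # shape (N, K)
    logp -= logsumexp(logp, axis=1, keepdims=True)
    Z = np.array([rng.choice(K, p=np.exp(lp)) for lp in logp])
    # nu | Z ~ Dirichlet(alpha + cluster sizes)
    nu = rng.dirichlet(alpha + np.bincount(Z, minlength=K))
    # mu_k | Z, X ~ Dirichlet(beta_k + word counts of cluster k)
    mu = np.stack([rng.dirichlet(beta[k] + X[Z == k].sum(axis=0)) for k in range(K)])
    return Z, nu, mu
```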

Mixture of unigrams, estimation

Frequentist vs. Bayesian: Bayesian side

Collapsed Gibbs sampling [Rigouste et al. 2006]

$$\begin{eqnarray} Z &\sim& p(Z|X)\\ \end{eqnarray}$$ ! Mixing, label switching, ...

Mixture of unigrams, estimation

Frequentist vs. Bayesian: Bayesian side, ICL [Biernacki et al. 2012]

Mixture of unigrams, estimation

Frequentist vs. Bayesian: Bayesian side, ICL

Parameter marginalization


Conjugacy $\Rightarrow$ analytical integration of the parameters: \begin{eqnarray} \log(p(\mathbf{x},\mathbf{z}))&=&\color{RoyalBlue}{\log(p(\mathbf{x}|\mathbf{z}))}+\color{SeaGreen}{\log(p(\mathbf{z}))}\\ &=&\color{RoyalBlue}{\sum_{k=1}^{K}\log\left(\frac{C(\mathbf{\epsilon}_k)}{C(\mathbf{\beta}_k)}\right)}+\color{SeaGreen}{\log\left(\frac{C(\mathbf{n})}{C(\mathbf{\alpha})}\right)}\\ \end{eqnarray} with $\left\{\tiny{\begin{eqnarray} \color{RoyalBlue}{\mathbf{\epsilon}_k}&=&(\beta_{k1}+\sum_{i=1}^NZ_{ik}X_{i1},...,\beta_{kV}+\sum_{i=1}^NZ_{ik}X_{iV})\\ \color{SeaGreen}{\mathbf{n}}&=&(\alpha_{1}+\sum_{i=1}^NZ_{i1},...,\alpha_{K}+\sum_{i=1}^NZ_{iK}) \end{eqnarray}}\right\}$ the pseudo-count vectors
and $C$ the normalisation term of the Dirichlet distribution $C(\mathbf{a})=\frac{\prod_{d=1}^D\Gamma(a_d)}{\Gamma(\sum_{d=1}^Da_d)}$
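This closed form is easy to evaluate numerically; a hedged sketch using only `gammaln` (assuming `X`, `alpha`, `beta` as above, with `Z` stored as integer labels):

```python
import numpy as np
from scipy.special import gammaln

def log_C(a):
    # normalisation term of the Dirichlet: prod_d Gamma(a_d) / Gamma(sum_d a_d)
    return gammaln(a).sum() - gammaln(a.sum())

def log_p_xz(X, Z, alpha, beta):
    """log p(x, z) with the parameters integrated out (formula above)."""
    K = len(alpha)
    eps = [beta[k] + X[Z == k].sum(axis=0) for k in range(K)]   # pseudo-counts eps_k
    n = alpha + np.bincount(Z, minlength=K)                     # pseudo-counts n
    return (sum(log_C(eps[k]) - log_C(beta[k]) for k in range(K))
            + log_C(n) - log_C(alpha))
```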

Models

Graph clustering, Stochastic Block Model

$$ \begin{eqnarray} Z&\sim&\mathcal{M}(1,\mathbf{\nu})\\ X_{ij}|Z_{ik}Z_{jl}=1&\sim&\mathcal{B}(\mathbf{\Pi}_{kl})\\ \end{eqnarray} $$

Models

Degree-corrected Stochastic Block Model

$$ \begin{eqnarray} Z&\sim&\mathcal{M}(1,\mathbf{\alpha})\\ X_{ij}|Z_{ik}Z_{jl}=1&\sim&\mathcal{P}(\beta_j\beta_i\Pi_{kl})\\ \end{eqnarray} $$
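A minimal sketch of simulation from this degree-corrected model (the block rates `Pi`, the degree parameters `beta` and the proportions are illustrative values, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 4
props = np.full(K, 1.0 / K)                     # block proportions
Pi = np.full((K, K), 0.02) + 0.3 * np.eye(K)    # assortative block rates
beta = rng.gamma(2.0, 0.5, size=N)              # per-node degree parameters beta_i

Z = rng.choice(K, size=N, p=props)              # block of each node
rate = np.outer(beta, beta) * Pi[np.ix_(Z, Z)]  # beta_i * beta_j * Pi_{z_i z_j}
X = rng.poisson(rate)                           # (N, N) weighted adjacency counts
# (self-loops and symmetry are ignored here for brevity)
```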

SBM estimation

Same algorithms and methods, except for some additional difficulties:


Greedy optimization of ICL

$$\hat{Z}=\arg\max_{Z}ICL_{ex}(X,Z)$$

Algorithm, greedy swap
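A hedged sketch of the swap loop (here `icl_ex` is assumed to be a function returning the exact ICL of a partition, e.g. built from the pseudo-count formulas above; a real implementation would update the criterion incrementally instead of recomputing it at every move):

```python
import numpy as np

def greedy_swaps(X, Z, K, icl_ex, rng):
    """Move single nodes between clusters as long as some move increases ICL."""
    best = icl_ex(X, Z)
    improved = True
    while improved:
        improved = False
        for i in rng.permutation(len(Z)):       # visit nodes in random order
            current = Z[i]
            for k in range(K):                  # try every other cluster for node i
                if k == current:
                    continue
                Z[i] = k
                score = icl_ex(X, Z)
                if score > best:                # keep the best move found so far
                    best, current, improved = score, k, True
            Z[i] = current                      # settle node i in its best cluster
    return Z, best
```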


Remarks:

$\Rightarrow$ No need for multiple runs for each $K$
$\Rightarrow$ ICL depends only on the pseudo-count matrix and vector
$\Rightarrow$ Competitive complexity, $\mathcal{O}(NK^2+L)$ vs $\mathcal{O}(K^3L)$, and running time
$\Rightarrow$ Local optimum! sensitive to initialization
$\Rightarrow$ Combine with greedy hierarchical moves

Simulation results (small-scale problems)


Simulation results (medium-scale problems)


Communities, 500 nodes, K=10

Simulation results (large-scale problem)


Complex structure, 10 000 nodes, K=50


Real data (drawings blogs)

Remarks

Hierarchical, regularisation

Main idea

$\log(\Gamma(x))$ around $0$

$$x\rightarrow 0 : \quad \log(\Gamma(x))= \log(\Gamma(x+1)) - \log(x) \approx - \log(x)$$
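A quick numerical check of this approximation with SciPy's `gammaln`:

```python
import numpy as np
from scipy.special import gammaln

for x in [1e-1, 1e-3, 1e-6]:
    print(f"x={x:g}  log(Gamma(x))={gammaln(x):.4f}  -log(x)={-np.log(x):.4f}")
# the two values agree better and better as x goes to 0
```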

ICL as a function of $\log(\alpha)$


$$ICL(Z,\alpha)=D(Z)+(K-1)\log(\alpha)$$ ! simple form, linear shape

Merge Cost with respect to $\alpha$:


$$ \begin{eqnarray} \Delta_{gh}(\alpha)&>&0\\ \Leftrightarrow \log(\alpha)&<&\log\left(\frac{K}{K-1}\right)+\\ &&D_{gh}(Z)+\log(\Gamma(n_h+n_g))-\log(\Gamma(n_h))-\log(\Gamma(n_g)) \end{eqnarray}$$
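In code, this bound gives the largest $\log(\alpha)$ for which merging clusters $g$ and $h$ still increases ICL; a hedged sketch (with `D_gh`, `n_g`, `n_h` and `K` as in the formula above):

```python
import numpy as np
from scipy.special import gammaln

def log_alpha_merge_threshold(D_gh, n_g, n_h, K):
    """Merging g and h increases ICL whenever log(alpha) is below this value."""
    return (np.log(K / (K - 1.0)) + D_gh
            + gammaln(n_g + n_h) - gammaln(n_g) - gammaln(n_h))
```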

ICL as a function of $\log(\alpha)$

Algorithm

Hierarchical, greedy merging

Remarks

$\Rightarrow$ complexity $\mathcal{O}(K^3)$
$\Rightarrow$ uses only the pseudo-count matrix and vector
$\Rightarrow$ heuristic to follow the Pareto front over (ICL, $\alpha$)

Dendrogram from greedy merge tree

ICL $\alpha=1$, drawings blogs


Dendrogram, drawings blogs


Dendrogram, drawings blogs


$\mathbf{\Pi}$, drawings blogs


$\mathbf{\Pi}$, drawings blogs


$\mathbf{\Pi}$, drawings blogs


$\mathbf{\Pi}$, drawings blogs


Overfitting, large-scale simulation


! Almost exact recovery with sufficient regularization.

Bibliography:

Baudry, Maugis, Michel 2010: Slope Heuristics: Overview and Implementation
Rigouste, Cappé, Yvon 2006: Inference and Evaluation of the Multinomial Mixture Model for Text Clustering
Biernacki, Celeux, Govaert 2011: Exact and Monte Carlo Calculations of Integrated Likelihoods for the Latent Class Model
Latouche, Birmelé, Ambroise 2012: Variational Bayesian inference and complexity control for stochastic block models
McDaid, Murphy, Hurley 2013: Improved Bayesian inference for the stochastic block model with application to large networks
Côme, Latouche 2015: Model selection and clustering in stochastic block models with the exact integrated complete data likelihood
Newman, Reinert 2016: Estimating the number of communities in a network
Ghasemian, Hosseinmardi, Clauset 2018: Evaluating Overfit and Underfit in Models of Network Community Structure

Work in progress:

Côme, Jouvin, Latouche, Bouveyron : Hierarchical clustering in stochastic block models with the integrated classification likelihood

Thanks for your attention!

Questions?









@comeetie