Featured and recent publications
In their 2023 paper in Neural Computing and Applications, Carlo Mari and Cristiano Baldassari present a fresh, fully unsupervised, graph-based “superframework” designed to make the Expectation–Maximization (EM) algorithm smarter when estimating Gaussian Mixture Models (GMM) on financial and energy market time series.
By blending ideas from network science, machine learning, and topological data analysis (TDA), the authors tackle one of EM’s biggest pain points — how to choose good starting parameters so the algorithm doesn’t get stuck in local optima. Their approach lets the model discover structure in the data on its own, leading to more stable and insightful results when analyzing complex market dynamics.
Mixture models are powerful tools for modeling complex distributions, but the EM algorithm used to estimate them is highly sensitive to initial conditions. Traditional approaches like K-means or random initialization require prior assumptions on the number of mixture components and often converge to suboptimal solutions. Mari and Baldassari propose a data-driven alternative: using graph representations of time series to automatically infer both the number of components and the initial parameters.
The framework begins by transforming time series into quantile-based complex networks, using a Markov Transition Field (MTF) as the adjacency matrix. This technique captures transition probabilities between quantized states, effectively embedding temporal dynamics into a graph structure. Alongside this, the framework also employs a graph embedding technique, which projects the resulting networks into a high-dimensional space where their structural patterns and similarities can be more easily analyzed. Together, these methods allow the model to uncover both temporal and topological features hidden in financial and energy market data.
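To make the encoding step concrete, here is a minimal NumPy sketch of a quantile-based MTF; the bin count and the simple first-order transition estimate are illustrative choices, not necessarily the paper's exact configuration.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Build a Markov Transition Field from a 1-D series.

    Observations are binned into quantile states; W[i, j] holds the
    empirical probability of moving from state i to state j, and the
    MTF spreads those probabilities over all pairs of time points.
    """
    # Assign each observation to a quantile bin (state) 0..n_bins-1
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)

    # First-order Markov transition matrix between quantile states
    W = np.zeros((n_bins, n_bins))
    for s, t in zip(states[:-1], states[1:]):
        W[s, t] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalize

    # MTF entry (i, j): transition probability from the state at
    # time i to the state at time j
    return W[np.ix_(states, states)]

rng = np.random.default_rng(0)
series = rng.standard_normal(500).cumsum()
mtf = markov_transition_field(series, n_bins=8)
print(mtf.shape)  # (500, 500), usable as a weighted adjacency matrix
```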
The resulting graph is then simplified through one of two superframework strategies:
Graph Coarsening – a model-driven dimensionality reduction used in the Graph Approximation (GA) framework (a minimal sketch follows this list).
Graph Embedding – a data-driven transformation into an embedding space used in the Graph Representation Learning (GRL) framework.
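The summary above does not pin down a specific coarsening algorithm; heavy-edge matching is one standard model-driven choice, and the sketch below (the function name and matching rule are ours, not the paper's) shows a single coarsening level with NetworkX.

```python
import networkx as nx

def heavy_edge_coarsen(G):
    """One level of heavy-edge-matching coarsening (illustrative).

    Pairs each unmatched node with its heaviest-weight unmatched
    neighbour, then contracts every pair into a supernode.
    """
    matched, mapping = set(), {}
    for u in G.nodes:
        if u in matched:
            continue
        nbrs = [v for v in G[u] if v not in matched and v != u]
        if nbrs:
            v = max(nbrs, key=lambda w: G[u][w].get("weight", 1.0))
            matched.update({u, v})
            mapping[u] = mapping[v] = (u, v)
        else:
            mapping[u] = (u,)
    # Group nodes into blocks and contract them via a quotient graph
    partition = {}
    for node, block in mapping.items():
        partition.setdefault(block, set()).add(node)
    return nx.quotient_graph(G, list(partition.values()), relabel=True)
```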
For embedding, the study leverages multiple unsupervised methods from the Karate Club Python library, including Diff2Vec, GraphWave, ASNE, and Network Embedding Update (NEU).
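Karate Club exposes these methods behind a uniform fit/get_embedding interface. A minimal sketch, assuming a NetworkX graph whose nodes are labelled 0..n-1 (which the library requires); the stand-in graph and embedding dimension are illustrative, and in the actual pipeline the graph would come from a thresholded MTF adjacency matrix via nx.from_numpy_array.

```python
import networkx as nx
from karateclub import Diff2Vec, GraphWave

G = nx.watts_strogatz_graph(200, k=6, p=0.1, seed=0)  # stand-in graph

model = Diff2Vec(dimensions=64)  # diffusion-based node embedding
model.fit(G)
X = model.get_embedding()        # array of shape (n_nodes, 64)

wave = GraphWave()               # spectral/structural alternative
wave.fit(G)
X_wave = wave.get_embedding()
```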
Once the graph is encoded and compressed, communities of nodes are identified using one of two methods (see the sketch after this list):
the Louvain method (for GA), based on modularity optimization, or
ToMATo clustering (for GRL), a TDA-based approach using persistent homology for robust unsupervised clustering.
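Both routes are available off the shelf: Louvain ships with NetworkX, and ToMATo with the GUDHI library. A minimal sketch, reusing G and X from the embedding example above:

```python
import networkx as nx
from gudhi.clustering.tomato import Tomato

# GA route: Louvain modularity communities on the (coarsened) graph
communities = nx.community.louvain_communities(G, seed=0)
print(len(communities))   # number of detected communities

# GRL route: ToMATo persistence-based clustering in embedding space
tomato = Tomato()         # default settings; thresholds are tunable
tomato.fit(X)             # X: node embeddings from the previous step
labels = tomato.labels_   # community label per node
print(tomato.n_clusters_)
```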
Each detected community corresponds to one Gaussian component in the mixture model. The mean, variance, and weight of each component are initialized using data points within the same community, producing a complete and data-driven initialization vector for EM.
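In scikit-learn terms, that initialization maps directly onto GaussianMixture's weights_init, means_init, and precisions_init arguments. A sketch, assuming one observation per graph node and a helper name of our own choosing:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def init_gmm_from_communities(returns, labels):
    """Turn node communities into an EM starting point (sketch).

    `returns` holds one observation per node of the time-series
    graph; `labels` gives each node's community.
    """
    comps = np.unique(labels)
    weights = np.array([np.mean(labels == c) for c in comps])
    means = np.array([[returns[labels == c].mean()] for c in comps])
    variances = np.array([returns[labels == c].var() for c in comps])
    precisions = (1.0 / variances).reshape(-1, 1, 1)

    gmm = GaussianMixture(
        n_components=len(comps),
        weights_init=weights,
        means_init=means,
        precisions_init=precisions,
    )
    return gmm.fit(returns.reshape(-1, 1))
```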
The superframework is tested on daily wholesale electricity prices from four major U.S. markets — Palo Verde (Southwest), PJM (Northeast), SP15 (Southern California), and NEPOOL (New England) — covering 2017–2021.
Before analysis, missing data were filled using the missForest imputation algorithm, and trends and seasonality were removed with LOWESS smoothing to focus on stochastic components. The log-return series of these markets exhibit heavy tails and volatility spikes, making them ideal candidates for mixture modeling.
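missForest itself is an R package; a rough Python analogue is scikit-learn's IterativeImputer driven by a random forest, paired with statsmodels' LOWESS for detrending. The synthetic price array and the smoothing fraction below are stand-ins for illustration:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic daily prices for four markets, with simulated gaps
rng = np.random.default_rng(0)
prices = np.exp(0.02 * rng.standard_normal((1250, 4))).cumprod(axis=0) * 30
prices[rng.random(prices.shape) < 0.01] = np.nan

# Random-forest iterative imputation, a Python analogue of missForest
imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=100))
filled = imputer.fit_transform(prices)

# LOWESS detrending: subtract the smooth component, keep the residual
t = np.arange(len(filled))
trend = lowess(filled[:, 0], t, frac=0.1, return_sorted=False)
residual = filled[:, 0] - trend
log_returns = np.diff(np.log(filled[:, 0]))  # heavy-tailed input to the GMM
```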
Both frameworks successfully estimate GMMs that reproduce the empirical distributions of log-returns, but the Graph Representation Learning (GRL) framework clearly outperforms the Graph Approximation (GA) framework.
Across all markets, the best-fitting models consistently feature three mixture components, as determined automatically by the community structure.
The GRL framework achieves lower Bayesian Information Criterion (BIC) values and more accurate reproduction of higher-order moments, particularly kurtosis — a key indicator of volatility clustering in electricity markets.
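Both diagnostics are one-liners once a model is fitted: scikit-learn exposes BIC directly, and excess kurtosis can be compared between the data and draws from the model. A sketch, reusing gmm and log_returns from the earlier examples:

```python
from scipy.stats import kurtosis

X = log_returns.reshape(-1, 1)
print("BIC:", gmm.bic(X))  # lower is better across candidate models

# How well do simulated draws reproduce the empirical excess kurtosis?
samples, _ = gmm.sample(len(X))
print("empirical kurtosis:", kurtosis(X.ravel()))
print("model kurtosis:    ", kurtosis(samples.ravel()))
```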
The study demonstrates that structural information encoded in complex graphs captures nonlinear dependencies in financial and energy time series that traditional methods miss. By turning time series into network objects and using unsupervised community detection, the authors provide a universal and flexible system for initializing EM without human intervention.
The framework’s modular design — with interchangeable encoding, compression, and partitioning blocks — allows it to adapt to diverse empirical settings, from financial assets to sensor data. It bridges graph theory and statistical inference, offering a new path toward robust, interpretable probabilistic modeling.
Mari and Baldassari’s superframework establishes a general-purpose, graph-driven approach for mixture model estimation. By integrating graph embedding and TDA clustering, it produces stable and high-quality EM solutions, outperforming standard initialization strategies.
Future research directions include:
using deep convolutional neural networks (e.g., VGG-16) for graph embedding via transfer learning,
combining coarsening and embedding for hybrid compression, and
extending the initialization process into a self-supervised learning system for dynamic regime discovery.
Ultimately, this research reveals how network-based representations can unlock new levels of precision and automation in statistical modeling — especially in markets where complexity and nonlinearity dominate.