In an era where energy markets are becoming increasingly volatile and data-rich, researchers Carlo Mari and Cristiano Baldassari have introduced a cutting-edge, network-based framework for improving the estimation of Gaussian Mixture Models (GMM) on financial time series. Their study, published in Computational Management Science, expands the frontier of graph machine learning by demonstrating how visibility graphs can provide a more accurate and efficient foundation for probabilistic modeling—particularly in complex domains such as the US electricity market.
Mixture models like GMM are fundamental tools in data analysis, allowing researchers to describe the intricate probability distributions that emerge in financial and market data. However, their parameters are usually estimated with the Expectation-Maximization (EM) algorithm, whose outcome is highly sensitive to its starting values: a poor initialization can trap EM in a local optimum and produce unreliable fits.
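The initialization problem can be made concrete with a minimal one-dimensional EM sketch for a Gaussian mixture. It is written from the textbook E- and M-step updates, not taken from the study, and the names `em_gmm_1d` and `gmm_loglik` are hypothetical. Starting both components at the same mean shows the failure mode in its starkest form: the responsibilities stay perfectly balanced, so EM never separates the components and stalls at a single-Gaussian fit.

```python
import numpy as np

def _log_weighted_dens(x, w, mu, var):
    """log(w_k * N(x_i | mu_k, var_k)) for every observation i and component k."""
    return (np.log(w) - 0.5 * np.log(2 * np.pi * var)
            - (x[:, None] - mu) ** 2 / (2 * var))

def _logsumexp_rows(a):
    """Numerically stable log(sum(exp(a), axis=1))."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def gmm_loglik(x, w, mu, var):
    """Total log-likelihood of 1-D data x under the mixture (w, mu, var)."""
    x = np.asarray(x, float)
    return float(_logsumexp_rows(_log_weighted_dens(x, w, mu, var)).sum())

def em_gmm_1d(x, w, mu, var, n_iter=200, tol=1e-8):
    """Run EM from the given starting values; return fitted params and log-likelihood trace."""
    x = np.asarray(x, float)
    w, mu, var = (np.array(a, float) for a in (w, mu, var))
    trace = [gmm_loglik(x, w, mu, var)]
    for _ in range(n_iter):
        # E-step: posterior responsibility of component k for observation i
        log_d = _log_weighted_dens(x, w, mu, var)
        resp = np.exp(log_d - _logsumexp_rows(log_d)[:, None])
        # M-step: closed-form updates for weights, means and variances
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        trace.append(gmm_loglik(x, w, mu, var))
        if trace[-1] - trace[-2] < tol:  # EM never decreases the log-likelihood
            break
    return w, mu, var, trace
```

On data drawn from two well-separated components, a start of `mu=[-1, 1]` recovers both modes, while the symmetric start `mu=[0, 0]` keeps the two means identical and ends at a lower log-likelihood.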
Mari and Baldassari tackle this issue head-on by proposing an unsupervised, network-based approach. Their method builds on earlier work (Mari & Baldassari, 2022), which introduced a "Graph Approximation" (GA) framework based on Markov Transition Fields. In their new study, they advance this idea further through the Graph Embedding (GE) framework, which uses visibility graphs to encode time series and graph embedding algorithms to preserve the structural information those graphs contain.
The result is a model that learns directly from the shape and connectivity of a time series, producing more stable, interpretable, and accurate outcomes.
The new GE-framework transforms raw financial or market time series into complex networks through Visibility Graph (VG) algorithms. In these graphs, each observation in the time series becomes a node, and edges connect data points that can “see” each other based on geometric visibility rules. This approach allows the inherent topology of a time series—its peaks, valleys, and fluctuations—to be captured in network form.
Two types of visibility algorithms are used:
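Natural Visibility Graphs (NVG) – connect two observations if the straight line between them passes above every observation in between.
Horizontal Visibility Graphs (HVG) – connect two observations if every observation in between is lower than both; this simpler criterion reduces computational cost.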
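Both visibility criteria are short enough to state as code. The sketch below follows the standard definitions, taking time as the observation index (an assumption for daily data). For the natural graph it uses the equivalent "running maximum slope" test, which runs in O(n²) instead of the naive O(n³) check. The function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def natural_visibility_edges(y):
    """Edges (a, b), a < b, of the natural visibility graph (time = index)."""
    y = np.asarray(y, float)
    edges = set()
    for a in range(len(y)):
        max_slope = -np.inf
        for b in range(a + 1, len(y)):
            slope = (y[b] - y[a]) / (b - a)
            # b is visible iff every point between a and b lies strictly below
            # the segment a-b, i.e. the slope a->b beats every earlier slope a->c
            if slope > max_slope:
                edges.add((a, b))
            max_slope = max(max_slope, slope)
    return edges

def horizontal_visibility_edges(y):
    """Edges (a, b), a < b, of the horizontal visibility graph."""
    y = np.asarray(y, float)
    edges = set()
    for a in range(len(y)):
        highest_between = -np.inf
        for b in range(a + 1, len(y)):
            # b is visible iff every point between a and b is lower than both ends
            if highest_between < min(y[a], y[b]):
                edges.add((a, b))
            highest_between = max(highest_between, y[b])
            if highest_between >= y[a]:
                break  # the view from a is blocked from here on
    return edges

series = [3.0, 1.0, 2.0, 4.0]
print(sorted(natural_visibility_edges(series)))     # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(sorted(horizontal_visibility_edges(series)))  # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```

Every HVG edge is also an NVG edge, so the horizontal graph is always a sparser subgraph of the natural one; in the toy series above it drops the link (1, 3), which the observation at index 2 blocks horizontally but not along the sloped line of sight.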
Once encoded, the graphs are compressed using embedding techniques. Methods such as Diff2Vec, GraphWave, and ASNE map the graph’s nodes into multidimensional Euclidean space, preserving neighborhood and structural relationships. The resulting embeddings are then clustered using Topological Data Analysis (TDA) via the ToMATo algorithm, which identifies stable communities corresponding to different behavioral regimes in the data.
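The embedding step depends on the specific library (Diff2Vec, GraphWave and ASNE each have their own implementations), so the sketch below assumes the embedding matrix is already available and shows only the clustering stage. It is a compact, from-scratch version of ToMATo's union-find procedure: estimate a density at each point, visit points from densest to sparsest over a neighborhood graph, and merge two clusters when the lower of their two density peaks sits less than `tau` above the point that connects them. The Gaussian kernel density, the symmetric k-nearest-neighbor graph, the name `tomato_cluster`, and the parameter values are illustrative assumptions, not the study's settings.

```python
import numpy as np

def tomato_cluster(X, k=10, bandwidth=1.0, tau=0.1):
    """Persistence-based (ToMATo-style) clustering of embedding vectors X (n x d)."""
    X = np.asarray(X, float)
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Density estimate at each point (Gaussian kernel, an illustrative choice)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    # Symmetric k-nearest-neighbour graph (column 0 of argsort is the point itself)
    knn = np.argsort(d2, axis=1)[:, 1:k + 1]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            adj[i].add(int(j))
            adj[int(j)].add(i)
    parent = list(range(n))  # union-find forest; each root is a density peak

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    done = np.zeros(n, dtype=bool)
    for i in map(int, np.argsort(-density)):  # densest points first
        higher = [j for j in adj[i] if done[j]]
        done[i] = True
        if not higher:
            continue  # local peak: i stays a root and seeds a new cluster
        # Attach i to the cluster of its densest already-processed neighbour
        parent[i] = find(max(higher, key=lambda j: density[j]))
        for j in higher:
            ri, rj = find(i), find(j)
            # Merge when the lower peak rises less than tau above density[i]
            if ri != rj and min(density[ri], density[rj]) < density[i] + tau:
                if density[ri] < density[rj]:
                    parent[ri] = rj
                else:
                    parent[rj] = ri
    roots = np.array([find(i) for i in range(n)])
    return np.unique(roots, return_inverse=True)[1]  # labels 0..K-1
```

With two well-separated synthetic point clouds standing in for embedded nodes, the procedure returns one label per cloud. In ToMATo, `tau` is typically chosen by inspecting the persistence diagram and keeping only the peaks whose prominence clearly stands out from the rest.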
Each detected community represents a GMM component, so the clustering supplies both the number of mixture components and the starting parameters for EM in a completely unsupervised fashion.
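Turning a partition into EM starting values can be as simple as taking each community's share of the observations together with the mean and variance of the log-returns it contains (each graph node is one observation). This is a plausible reading of the step, not the paper's documented formula; the helper name `init_from_communities` and the variance floor, which guards against single-node communities, are assumptions.

```python
import numpy as np

def init_from_communities(x, labels, var_floor=1e-8):
    """EM starting values from a partition: one mixture component per community."""
    x = np.asarray(x, float)
    labels = np.asarray(labels)
    comps = np.unique(labels)  # number of components = number of communities
    w = np.array([np.mean(labels == c) for c in comps])    # share of observations
    mu = np.array([x[labels == c].mean() for c in comps])  # community mean
    var = np.array([x[labels == c].var() for c in comps])  # community variance
    return w, mu, np.maximum(var, var_floor)  # floor guards single-node communities
```

The returned triple can be handed directly to an EM routine as its starting point, and the number of distinct labels fixes the number of components.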
To demonstrate the power of their new approach, Mari and Baldassari applied the Graph Embedding framework to a dataset of daily wholesale electricity prices in the United States, spanning the years 2017 to 2021. The analysis covered four of the country’s main regional hubs—each reflecting distinct market dynamics and geographical characteristics:
Palo Verde (Southwest) – representing the desert energy corridor, characterized by high solar input and peak-demand fluctuations.
PJM (Northeast) – one of the largest and most liquid power markets, marked by dense trading and strong seasonal effects.
SP15 (Southern California) – a volatile hub influenced by renewables integration and demand surges.
NEPOOL (New England) – a mature, weather-sensitive market driven by heating and winter price shocks.
Before modeling, the researchers carried out an extensive preprocessing phase to ensure statistical consistency. Missing or irregular daily observations (common in energy trading because of weekends, holidays, and market closures) were reconstructed with missForest, a random-forest-based machine learning algorithm for data imputation.
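missForest imputes by cycling through the variables: it starts from a mean fill, then repeatedly fits a random forest for each variable on the other variables and overwrites that variable's missing entries with the predictions. To keep the sketch dependency-free, the version below keeps that loop but swaps in ordinary least squares for the random forests, so it only captures linear relationships, and a simple convergence threshold replaces missForest's own stopping rule. The function name is hypothetical, and treating each column as a related price series observed on the same days is an illustrative layout, not the study's.

```python
import numpy as np

def iterative_impute(X, n_iter=10, tol=1e-10):
    """missForest-style loop with least squares swapped in for random forests."""
    X = np.array(X, dtype=float)  # copy; NaN marks a missing value
    miss = np.isnan(X)
    # Start from a column-mean fill
    X[miss] = np.nanmean(X, axis=0)[np.nonzero(miss)[1]]
    order = np.argsort(miss.sum(axis=0), kind="stable")  # least-missing column first
    for _ in range(n_iter):
        previous = X.copy()
        for j in order:
            rows = miss[:, j]
            if not rows.any():
                continue
            # Regress column j on an intercept plus all other (currently filled) columns
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A[~rows], X[~rows, j], rcond=None)
            X[rows, j] = A[rows] @ coef
        # Simplified stopping rule (missForest itself stops when the change first grows)
        if np.sum((X - previous) ** 2) < tol:
            break
    return X
```

Swapping the least-squares fit back to a random-forest regressor restores missForest's ability to model non-linear dependencies, which is the main reason it is preferred for messy market data.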
Next, long-term trends and seasonal components were removed using LOWESS (locally weighted scatterplot smoothing), a non-parametric regression technique; subtracting the fitted smooth isolates the stochastic core of the price signal. The final product was a set of detrended log-return time series: highly volatile, heavy-tailed, and well suited to testing probabilistic models under real-world market complexity.
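The detrending step can be sketched with a plain LOWESS smoother: for each time point, fit a straight line to the nearest `frac` share of the data using tricube weights and keep the fitted value at that point. This version omits the robustness reweighting iterations of full LOWESS. The order of operations in the hypothetical `detrended_log_returns` helper (smooth the log prices, subtract the trend, then take first differences) is an assumption about the pipeline, not the study's documented procedure.

```python
import numpy as np

def lowess(x, y, frac=0.3):
    """Locally weighted linear regression (no robustness iterations)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    r = min(n, max(int(np.ceil(frac * n)), 3))  # points per local window
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[r - 1]  # distance to the r-th nearest point (self included)
        u = np.clip(d / h, 0.0, 1.0)
        w = (1.0 - u ** 3) ** 3  # tricube weights; zero outside the window
        sw = np.sqrt(w)
        A = np.column_stack([np.ones(n), x - x[i]])  # local line centred at x[i]
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]  # intercept = fitted value at x[i]
    return fitted

def detrended_log_returns(prices, frac=0.3):
    """Assumed pipeline: log prices -> subtract LOWESS trend -> first differences."""
    log_p = np.log(np.asarray(prices, float))
    t = np.arange(len(log_p), dtype=float)
    residual = log_p - lowess(t, log_p, frac)
    return np.diff(residual)
```

Because each local fit is a weighted straight line, a purely linear trend is reproduced exactly, so a price path that grows at a constant exponential rate yields detrended returns of zero; only deviations from the smooth trend survive.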
The findings were decisive. The GE-framework outperformed both the GA-framework and traditional K-means initialization across all datasets. Using the Bayesian Information Criterion (BIC) and the kurtosis error as benchmarks, the researchers showed that:
The GE-based models consistently achieved lower BIC scores (a better trade-off between goodness of fit and model complexity).
The approach was up to 250 times more likely to produce the optimal model than GA-based methods.
Computational time dropped dramatically—from around 120 hours for the GA method to just 30 minutes for GE.
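Both benchmarks are easy to compute for a one-dimensional Gaussian mixture. BIC below uses the standard definition with 3K - 1 free parameters (K means, K variances and K - 1 independent weights), and the mixture kurtosis follows from the component moments. Defining the kurtosis error as the absolute gap between sample and model kurtosis is an assumption, since the study may normalize it differently; all function names are hypothetical.

```python
import numpy as np

def gmm_bic(loglik, n_obs, n_components):
    """BIC = p ln(n) - 2 ln(L) for a 1-D GMM with p = 3K - 1 free parameters."""
    p = 3 * n_components - 1
    return p * np.log(n_obs) - 2.0 * loglik

def mixture_kurtosis(w, mu, var):
    """Pearson (non-excess) kurtosis m4 / m2**2 of a 1-D Gaussian mixture."""
    w, mu, var = (np.asarray(a, float) for a in (w, mu, var))
    delta = mu - np.sum(w * mu)  # component offsets from the mixture mean
    m2 = np.sum(w * (var + delta ** 2))
    m4 = np.sum(w * (delta ** 4 + 6 * delta ** 2 * var + 3 * var ** 2))
    return m4 / m2 ** 2

def sample_kurtosis(x):
    """Pearson kurtosis of a data sample."""
    x = np.asarray(x, float)
    c = x - x.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2

def kurtosis_error(x, w, mu, var):
    """Assumed definition: absolute gap between sample and model kurtosis."""
    return abs(sample_kurtosis(x) - mixture_kurtosis(w, mu, var))

# A 90/10 mix of unit and triple volatility is strongly heavy-tailed
print(mixture_kurtosis([0.9, 0.1], [0.0, 0.0], [1.0, 9.0]))  # 27 / 3.24, about 8.33
```

A single Gaussian always has kurtosis exactly 3, so a fitted mixture must place weight on high-variance components to match the much larger kurtosis of spiky electricity returns; when comparing candidate fits on the same data, the lower BIC is preferred.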
In all cases, the visibility-graph-based initialization captured the heavy tails and volatility spikes that characterize electricity price movements, outperforming standard clustering-based methods in both efficiency and realism.
Among embedding techniques, Diff2Vec and ASNE (with additional log-return attributes) yielded the best results, offering a balance between accuracy and computational tractability.
The study’s broader significance lies in its unification of network science, topological analysis, and probabilistic modeling into a single, flexible pipeline. Mari and Baldassari propose that their GE-framework could be part of a larger “super framework”—a modular system capable of adapting to any empirical dataset by combining multiple graph-based encodings and compression methods.
This adaptability could prove transformative not only for financial modeling but also for fields like neuroscience, climatology, and energy forecasting, where data are inherently dynamic, nonlinear, and noisy.
By encoding time series as visibility graphs and embedding them into a machine-learning-ready space, Mari and Baldassari have reimagined how mixture models can be optimized. Their Graph Embedding framework stands out as a faster, more accurate, and more interpretable approach to modeling complex systems.
In the intricate and unpredictable world of electricity markets, this approach offers analysts a powerful new lens—one that transforms raw volatility into structured, understandable patterns, and paves the way for a new generation of network-driven statistical modeling.