Understanding accommodation pricing in a country as geographically and culturally diverse as Italy has always been a challenge. Traditional statistics often provide a generalized view that masks significant local variation. This project moves beyond averages, using a large dataset of online listings scraped from accommodation platforms and meta-aggregators such as CozyCozy, Booking.com, and Airbnb. Our goal is to create a high-resolution map of spatial price differences across Italy. By applying a suite of statistical and spatial analysis techniques, we uncover patterns in hotel and rental pricing, identifying pricing "hot spots," "cold spots," and the factors that determine costs from the Alps to Sicily. This research, first presented at the ICES 2025 conference in Naples, provides a novel, data-driven perspective on the Italian tourism market.
Our analysis is built on web-scraped data capturing thousands of daily listings for hotels, B&Bs, and short-term rentals. This granular data allows us to go beyond simple city-level averages and analyze prices at a much finer scale. The methodology combines three techniques:
Spatial Price Index (SPI): We developed a custom SPI to offer a more accurate comparison of price levels between different locations (e.g., provinces or municipalities). Unlike a simple average, the SPI accounts for differences in the quality and type of accommodation available, answering the question: "For a similar quality room, how much more expensive is Florence than Palermo?"
Spatial Clustering & LISA: To identify geographic patterns, we applied spatial clustering algorithms, which group areas with similar pricing structures. We then used Local Indicators of Spatial Association (LISA) to pinpoint statistically significant hot spots (high-priced areas surrounded by high-priced neighbours), cold spots (low-priced areas surrounded by low-priced neighbours), and spatial outliers (areas whose prices differ sharply from those of their neighbours).
Advanced Modelling (GAM & CPD): To understand the drivers behind these price differences, we used Generalized Additive Models (GAMs). Because GAMs can fit smooth, non-linear effects, they let us quantify the impact of variables such as location, property type, seasonality, and proximity to landmarks. The model was built within a Country-Product-Dummy (CPD) framework, a standard regression-based method for constructing spatial price indexes in which log prices are regressed on area and product characteristics.
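To make the hot spot and cold spot classification more concrete, here is a minimal sketch of local Moran's I, a widely used LISA statistic. It is an illustration under stated assumptions, not the project's actual pipeline: the function name local_moran, the six-area chain of neighbours, and the prices are all hypothetical, and the permutation test that would establish statistical significance is left out.

```python
import numpy as np


def local_moran(values, weights):
    """Return local Moran's I and a quadrant label for each area.

    values  : one price level per area
    weights : square matrix, weights[i][j] > 0 if area j neighbours area i
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Row-standardise so that each area's neighbour weights sum to 1
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

    z = x - x.mean()      # deviation of each area from the overall mean
    m2 = (z ** 2).mean()  # variance used to scale the statistic
    lag = w @ z           # average deviation among each area's neighbours
    local_i = z * lag / m2

    labels = []
    for zi, li in zip(z, lag):
        if zi > 0 and li > 0:
            labels.append("high-high")  # candidate hot spot
        elif zi < 0 and li < 0:
            labels.append("low-low")    # candidate cold spot
        elif zi > 0:
            labels.append("high-low")   # expensive area, cheaper neighbours
        else:
            labels.append("low-high")   # cheap area, pricier neighbours
    return local_i, labels


# Hypothetical average nightly prices (EUR) for six areas along a coastline,
# where each area neighbours only the areas immediately before and after it.
prices = [200, 190, 180, 80, 70, 60]
n = len(prices)
adjacency = np.eye(n, k=1) + np.eye(n, k=-1)

local_i, labels = local_moran(prices, adjacency)
for area, (i_val, label) in enumerate(zip(local_i, labels)):
    print(area, round(float(i_val), 2), label)
```

A positive local value means an area resembles its neighbours, and a negative value flags a spatial outlier. In practice a dedicated library such as PySAL's esda package, which also provides permutation-based significance tests, would likely be a better choice than this hand-rolled version.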
The analysis revealed several key insights that challenge common assumptions about accommodation pricing in Italy:
Beyond the North-South Divide: While a general North-South price gradient exists, our analysis reveals a far more complex reality. We identified distinct, high-priced clusters in specific southern coastal areas and, conversely, "pockets of affordability" in unexpected northern regions.
The Power of "Tourist Clusters": The analysis clearly delineates pricing zones around major tourist destinations such as the Amalfi Coast, Cinque Terre, and key art cities. The LISA results show that these high-price zones have a strong "spillover" effect, influencing prices in adjacent, less-famous municipalities.
Quantifying the Price Drivers: Our GAMs confirmed the expected impact of seasonality and location but also quantified the premium associated with specific features. For instance, we could measure the average price increase for accommodations with a sea view in Liguria versus Puglia, providing actionable insights for the hospitality industry.
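As a rough illustration of how such premiums and regional price levels can be read off a model, the sketch below fits a plain log-price regression in the spirit of the CPD approach. It stands in for the GAM, which would add smooth, non-linear terms on top, and it rests on assumptions: every listing and price is invented, and the names design_row, price_index, and sea_view_premium are hypothetical.

```python
import numpy as np

# Hypothetical listings: (region, sea-view flag, nightly price in EUR).
# All numbers are invented for illustration only.
listings = [
    ("Liguria", 0, 95.0), ("Liguria", 0, 105.0),
    ("Liguria", 1, 125.0), ("Liguria", 1, 135.0),
    ("Puglia", 0, 76.0), ("Puglia", 0, 84.0),
    ("Puglia", 1, 88.0), ("Puglia", 1, 96.0),
]
regions = ["Liguria", "Puglia"]  # the first entry is the reference region


def design_row(region, sea_view):
    """Intercept, region dummies (reference dropped), sea view by region."""
    region_dummies = [1.0 if region == r else 0.0 for r in regions[1:]]
    view_by_region = [float(sea_view) if region == r else 0.0 for r in regions]
    return [1.0] + region_dummies + view_by_region


X = np.array([design_row(r, v) for r, v, _ in listings])
y = np.log([p for _, _, p in listings])  # the regression is on log prices

# Ordinary least squares fit of log price on the dummies
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

n_reg = len(regions)

# Price level of each region relative to the reference region (= 100),
# comparing listings without a sea view
price_index = {regions[0]: 100.0}
for j, r in enumerate(regions[1:], start=1):
    price_index[r] = 100.0 * float(np.exp(beta[j]))

# Sea-view premium in each region, as a fraction: exp(coefficient) - 1
sea_view_premium = {
    r: float(np.exp(beta[n_reg + j]) - 1.0) for j, r in enumerate(regions)
}

print(price_index)
print(sea_view_premium)
```

Because the dependent variable is the log of price, exponentiating a region coefficient gives a price level relative to the reference region, and exponentiating a feature coefficient gives a multiplicative premium. Letting the sea-view effect vary by region is what allows a Liguria versus Puglia comparison.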
This study demonstrates the potential of combining web-scraped data with spatial statistics to analyze complex economic phenomena. Our findings provide a detailed and nuanced understanding of the Italian accommodation market, offering valuable insights for tourists, hotel managers, and policymakers. By moving from aggregated data to granular, location-specific analysis, we can build more effective tourism strategies, promote fair pricing, and better understand regional economic disparities. The same approach could be applied to other markets where prices are published online.