Last updated: 5 November 2024.
Highlights
Survey sampling is typically designed for large geographic units, such as countries or regions, and often fails to meet the detailed needs of smaller areas. This creates significant challenges in obtaining accurate and reliable estimates at the city level and for functional urban areas, where sample sizes are often too small for traditional estimation techniques to yield precise results.
Small area estimation (SAE) offers a robust solution, allowing statisticians to produce precise and reliable estimates for these smaller areas, even when direct data is limited or unavailable.
This chapter serves as a practical guide on small area estimation methods for city statistics, covering different estimation techniques and the steps required to implement them. It’s designed to assist national statistical offices in producing high-quality small area estimates that are consistent, comparable and reliable.
The content of this chapter is based on a Eurostat publication, Guidelines on small area estimation for city statistics and other functional geographies – 2019 edition.
This chapter forms part of Eurostat’s City statistics manual.
Small area estimation methods
SAE involves estimating parameters for subpopulations or geographies that weren’t specifically planned for in the original sampling design, often due to their small size. The term ‘small area’ doesn’t necessarily refer to a physically small geography, but rather to any area where the sample size is insufficient for direct, reliable estimation.
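As a rough illustration of what 'insufficient sample size' can mean in practice, the sketch below flags areas whose direct estimate of a proportion has a high coefficient of variation (CV). It assumes simple random sampling within each area and ignores design effects. The sample counts, the area names and the 20 % threshold are hypothetical values chosen for the example, not figures from the guidelines.

```python
import math


def direct_proportion_cv(successes: int, n: int) -> float:
    """CV of a sample proportion, assuming simple random sampling."""
    if n <= 1 or successes == 0:
        # No usable variance estimate: treat the area as unreliable.
        return math.inf
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / (n - 1))
    return standard_error / p


# Hypothetical sample: area -> (unemployed respondents, all respondents)
sample = {
    "Region R": (400, 5000),
    "City A": (48, 600),
    "City B": (3, 40),
}

CV_THRESHOLD = 0.20  # illustrative cut-off, not an official standard

# True means the direct estimate is too imprecise under this rule.
flags = {
    area: direct_proportion_cv(k, n) > CV_THRESHOLD
    for area, (k, n) in sample.items()
}

for area, is_small in flags.items():
    print(area, "needs SAE" if is_small else "direct estimate may suffice")
```

Under this rule an area counts as 'small' because of its sample size, not its population or physical extent. A real assessment would use the survey's design-based variance estimates rather than the simple random sampling formula.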
Various methods are available for SAE, each with its own strengths and weaknesses. The choice of method depends largely on the nature of the data and the specific characteristics of the small areas being analysed.
- Direct estimation – when the sample size within a small area is relatively large, direct estimation may be feasible. Direct estimators rely solely on data from the area in question. While direct estimates are approximately design-unbiased, they lack precision when sample sizes are small, so this method is typically used only when the sample is large enough to produce reliable estimates without additional data.
- Indirect estimation – when direct estimation isn’t feasible due to small sample sizes, indirect estimation methods may be used. These methods ‘borrow strength’ by incorporating data from related, similar areas. Indirect estimation improves the precision of estimates by leveraging correlations or relationships observed in these areas. For example, when estimating unemployment rates for a small city, data from similar cities or regional averages might be used to enhance reliability.
- Model-based estimation – this approach extends indirect estimation by integrating auxiliary information through statistical models. Model-based estimators assume a relationship between auxiliary variables and the variable of interest across different areas. These models are particularly powerful when significant auxiliary data is available. For instance, a model-based approach might utilise demographic data from a national census, administrative records like social security or fiscal data, or other relevant variables to estimate the unemployment rate in a city with a small sample size. In estimating health outcomes for small areas, a model might include variables such as age, income, education level and access to healthcare, all of which are known to influence health status. Model-based estimation can be further divided into:
  - unit-level models – these models are used when detailed unit-level data (for example, individual survey responses and individual auxiliary variables) are available. They require knowledge of the auxiliary variables for the entire population.
  - area-level models – these models are appropriate when unit-level data isn’t accessible. They use aggregated data at the area level and are computationally less intensive.
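To make the idea of an area-level model more concrete, the following is a minimal sketch of a composite estimator in the spirit of the Fay-Herriot model, with a single auxiliary variable. Each area's estimate is a weighted average of its direct estimate and a regression ('synthetic') prediction, and the weight on the direct estimate shrinks as its sampling variance grows. The between-area variance `sigma2_u` is treated as known here, which is a simplifying assumption, since it is normally estimated from the data. All area names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    direct: float    # direct survey estimate, e.g. unemployment rate
    variance: float  # sampling variance of the direct estimate (assumed known)
    x: float         # area-level auxiliary variable, e.g. registered jobseeker rate


def fay_herriot_composite(areas: list[Area], sigma2_u: float) -> dict[str, float]:
    """Composite estimates from an area-level model with one covariate.

    sigma2_u is the between-area model variance, assumed known in this sketch.
    """
    # Weighted least squares fit of direct ~ b0 + b1 * x,
    # with weights 1 / (model variance + sampling variance).
    w = [1.0 / (sigma2_u + a.variance) for a in areas]
    sw = sum(w)
    x_bar = sum(wi * a.x for wi, a in zip(w, areas)) / sw
    y_bar = sum(wi * a.direct for wi, a in zip(w, areas)) / sw
    sxx = sum(wi * (a.x - x_bar) ** 2 for wi, a in zip(w, areas))
    sxy = sum(wi * (a.x - x_bar) * (a.direct - y_bar) for wi, a in zip(w, areas))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    estimates = {}
    for a in areas:
        synthetic = b0 + b1 * a.x
        # gamma is the weight on the direct estimate: close to 1 when the
        # direct estimate is precise, close to 0 when it is very noisy.
        gamma = sigma2_u / (sigma2_u + a.variance)
        estimates[a.name] = gamma * a.direct + (1 - gamma) * synthetic
    return estimates


# Hypothetical inputs for four cities.
areas = [
    Area("City A", direct=0.080, variance=0.0001, x=0.070),
    Area("City B", direct=0.120, variance=0.0016, x=0.075),
    Area("City C", direct=0.060, variance=0.0004, x=0.050),
    Area("City D", direct=0.100, variance=0.0009, x=0.090),
]
estimates = fay_herriot_composite(areas, sigma2_u=0.0004)
for name, value in estimates.items():
    print(f"{name}: {value:.4f}")
```

In this sketch the city with the largest sampling variance (City B) is pulled most strongly towards the regression prediction, which is how the model 'borrows strength' from the other areas. Production work would normally rely on established SAE software that also estimates the model variance and the mean squared error of each estimate.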
Steps in the application of small area estimation methods
SAE involves several steps to ensure accurate and reliable results.
- Identification of target areas – the first step in applying SAE methods is to define clearly the small areas for which estimates are needed. Often, these areas – such as cities or functional urban areas (FUAs) – may not have been explicitly considered in the original survey design, which typically focuses on larger areas. The definition of these target areas should be guided by the specific needs of policymakers and stakeholders who require detailed data at this level.
- Data collection and initial assessment – before applying SAE methods, it’s crucial to conduct an initial assessment of the available data. This involves reviewing existing survey data to determine the sample sizes within each target area and identifying any auxiliary data that could enhance the precision of the estimates.
- Use of auxiliary data – the success of SAE methods often hinges on the effective use of auxiliary data. This additional information is vital for making reliable estimates in small areas where direct survey data may be sparse or non-existent. In practice, auxiliary data might include census information, administrative records or other relevant variables. For example, when estimating poverty rates in a small rural district, data on agricultural productivity, educational attainment and employment rates – all correlated with poverty – might be integrated into the estimation model to improve accuracy.
- Selection of the appropriate estimation method – SAE encompasses a variety of methodologies, each with its own strengths and specific applications. The choice of method depends on the nature of the available data and the unique characteristics of the small areas being analysed.
- Model construction and implementation – after selecting the appropriate estimation method(s), the next step is to construct the statistical model that will be used to generate the estimates. This process typically requires sophisticated statistical software and expertise in model selection and validation. The model must be meticulously designed to capture accurately the relationships between the auxiliary data and the variable of interest. Additionally, the model’s assumptions need to be tested and validated rigorously to ensure they hold true across different small areas.
- Validation and diagnostics – once the model has been applied and estimates generated, the next step is to validate the results. This involves conducting diagnostic tests to confirm that the model is performing as expected and that the estimates are accurate. Validation is particularly important in scenarios where direct data for the small area is limited or absent. This step might include comparing model-based estimates with any available direct estimates, checking for consistency with known trends, or testing the underlying assumptions of the model. The goal is to ensure that the estimates are not only statistically sound but also meaningful in the real-world context of the small area being studied.
- Ensuring coherence across geographies – coherence refers to the consistency of estimates across different geographical areas. For example, estimates for small areas, such as cities, should align with those for larger areas, such as regions, when aggregated. Coherence is crucial because policymakers and planners often use data across multiple geographical areas, and inconsistencies can undermine credibility. Ensuring coherence may involve adjusting estimates so that they sum correctly when aggregated, or comparing the results with independent estimates for different geographical areas.
- Communication and dissemination of the results – the final step involves effectively communicating the results to stakeholders, policymakers and other users. This requires presenting the estimates clearly and accessibly, often through reports, maps or dashboards so that users can explore the data at various levels of detail. Effective communication is vital for ensuring that the estimates are utilised effectively in decision-making processes. For example, a report might highlight key findings, such as cities or neighbourhoods with the highest unemployment rates, and offer recommendations for targeted interventions.
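The coherence step described above, in which small-area estimates are adjusted so that they sum correctly when aggregated, can be sketched with simple ratio benchmarking. This is only one of several possible adjustments and is shown here as an assumption rather than as the method prescribed by the guidelines. The function name, area names and totals are hypothetical.

```python
def ratio_benchmark(
    area_totals: dict[str, float], benchmark_total: float
) -> dict[str, float]:
    """Scale area-level totals so that they add up to a trusted larger-area total."""
    model_sum = sum(area_totals.values())
    if model_sum == 0:
        raise ValueError("Cannot benchmark: area totals sum to zero.")
    factor = benchmark_total / model_sum
    # Every area is multiplied by the same factor, so relative
    # differences between areas are preserved.
    return {area: total * factor for area, total in area_totals.items()}


# Hypothetical model-based estimates of unemployed persons within one region.
model_estimates = {
    "City A": 1200.0,
    "City B": 450.0,
    "Rest of region": 3100.0,
}

# Hypothetical direct regional estimate, assumed reliable at this level.
regional_direct_total = 5000.0

benchmarked = ratio_benchmark(model_estimates, regional_direct_total)
for area, total in benchmarked.items():
    print(f"{area}: {total:.1f}")
```

The areas passed to the function must partition the larger area, which is why a 'Rest of region' entry is included alongside the two cities.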
To illustrate the practical application of SAE methods, consider a scenario where a national statistical office is tasked with producing reliable estimates of employment and unemployment for all cities and FUAs within a country by combining information from the labour force survey and administrative data. This scenario is discussed in a study by D’Alò, M., Filipponi, D. & Loriga, S., Sae estimation of related labor market indicators for different overlapping areas, Statistical Methods & Applications (2024).
Conclusion
SAE provides a valuable set of techniques that enable the production of accurate and detailed statistics for cities and other small areas where traditional methods fall short. The effectiveness of these methods largely depends on several factors: the quality and availability of auxiliary data, the suitability of the chosen model and the expertise required to apply sophisticated statistical procedures. Moreover, it’s crucial to rigorously validate model assumptions to ensure their applicability across diverse small areas, which can present challenges when these areas differ significantly in their characteristics.
The Guidelines on small area estimation for city statistics and other functional geographies – 2019 edition offer a systematic framework for implementing SAE, ensuring that the resulting estimates are not only precise but also coherent and comparable across various geographical areas.
Explore further
Database
- City statistics (urb), see:
  - Cities and greater cities (urb_cgc)
  - Functional urban areas (urb_luz)
  - Perception survey results (urb_percep)