© Roman Mager / Unsplash

ILO Modelled Estimates (ILOEST database)

Impact of the pandemic on ILO modelled estimates and projections

The ILO maintains a series of econometric models that produce estimates of labour market indicators for the countries and years in which country-reported data are unavailable, and that produce forecasts (see descriptions below). The model inputs are historical time series data, so the unprecedented labour market shock created by the COVID-19 pandemic is difficult to assess by benchmarking against historical data. For indicators related to working hours, a nowcasting model was recently developed to provide timely estimates. The ILO is updating the methodology for the rest of the modelled estimates.

Given the exceptional situation, including the scarcity of relevant data, the estimates from 2020 onwards are subject to a substantial amount of uncertainty.

The ILO modelled estimates series provides a complete set of internationally comparable labour statistics, including both nationally reported observations and imputed data for countries with missing data. The imputations are produced through a series of econometric models maintained by the ILO. The purpose of estimating labour market indicators for countries with missing data is to obtain a balanced panel data set so that, every year, regional and global aggregates with consistent country coverage can be computed. These allow the ILO to analyse global and regional estimates of key labour market indicators and related trends. Moreover, the resulting country-level data, combining both reported and imputed observations, constitute a unique, internationally comparable data set on labour market indicators.

Estimates for countries with very limited labour market information have a high degree of uncertainty. Hence, estimates for countries with limited nationally reported data should not be considered as “observed” data, and great care needs to be applied when using these data for analysis, especially at the country level.

For more information on the ILO modelled estimates, refer to this methodological description.

Data collection and evaluation

The ILO modelled estimates are generally derived for 189 countries, disaggregated by sex and age as appropriate. For selected indicators, an additional disaggregation by rural/urban areas is performed. Before running the models to obtain the estimates, labour market information specialists from the ILO Department of Statistics, in cooperation with the Research Department, evaluate existing country‑reported data and select only those observations deemed sufficiently comparable across countries.

The recent efforts by the ILO to produce harmonized indicators from country-reported microdata have greatly increased the comparability of the observations. Nonetheless, it is still necessary to select the data based on the following four criteria: (1) type of data source; (2) geographical coverage; (3) age-group coverage; and (4) presence of methodological breaks or outliers.

Country groupings

The UN does not have a standardized set of regional groupings. Groupings in ILOSTAT are based on the regions used for administrative purposes by the ILO, which may differ from those of other organizations. These usually do not change over time.

ILOSTAT also presents income groupings based on the World Bank’s classification. The world’s economies are assigned to one of four income groups: low, lower-middle, upper-middle, and high-income countries. The classifications are updated each year on July 1 and are based on GNI per capita in current USD of the previous year. Aggregates in the ILO modelled estimates from the November edition reflect the World Bank’s income classifications from July that year. Hence, estimates from different editions will not reflect the same income groupings.

See the complete list of countries by region and income group.

F.A.Q.

Conducting labour force surveys is a complicated and costly task which some countries are unable to do on a systematic basis. Consequently, significant data gaps remain in most international labour statistics databases. To be able to produce reliable global and regional estimates of key labour indicators, the ILO has developed statistical models that produce estimates for countries in years for which no data have been reported. These models have been tested for statistical accuracy and allow the ILO to forecast changes in key labour market indicators as well as to produce global and regional aggregates. The end result of these models is a complete set of national labour statistics alongside the global and regional aggregates. In the interest of transparency, the ILO publishes the resulting country-level and global and regional estimates in the ILO modelled estimates series.

Not all countries submit statistically comparable data. Before running the models to obtain the estimates, ILO labour market information specialists evaluate country-reported data and select only those observations deemed sufficiently comparable across countries. The recent efforts by the ILO to produce harmonized indicators from country-reported microdata have greatly increased the comparability of the observations. Nonetheless, it is still necessary to select the data on the basis of the following four criteria: (a) type of data source; (b) geographical coverage; (c) age-group coverage; and (d) presence of methodological breaks or outliers.

Our models also include country-level data on population, economic growth, poverty and other economic indicators from the following sources:

  • United Nations World Population Prospects
  • IMF/World Bank data on macroeconomic indicators
  • World Bank poverty estimates from the PovcalNet database

The estimates are produced using a series of models, which establish statistical relationships between observed labour market indicators and explanatory variables. These relationships are used to impute missing observations and to make projections for the indicators.

There are many potential statistical relationships, also called “model specifications”, that could be used to predict labour market indicators. The key to obtaining accurate and unbiased estimates is to select the best model specification in each case. The ILO modelled estimates generally rely on a procedure called cross-validation, which is used to identify those models that minimize the expected error and variance of the estimation. This procedure involves repeatedly computing a number of candidate model specifications using random subsets of the data: the missing observations are predicted and the prediction error is calculated for each iteration. This makes it possible to identify the statistical relationship that provides the best estimate of a given labour market indicator.

The aim of the ILO modelled estimates is to provide a complete set (without missing observations) of internationally comparable labour statistics. To achieve complete and comparable estimates, several harmonization processes are carried out, which can result in estimates differing from nationally reported data. The following procedures are the most common sources of differences:

  • Benchmarking the working-age population to the estimates of the United Nations World Population Prospects.
  • Application of the standards adopted by the International Conference of Labour Statisticians to produce internationally comparable figures.
  • Adjustment of classifications for standardization purposes. For instance, adjusting data to account for differences in age coverage.
  • Internal consistency procedures. Each edition of the ILO modelled estimates is internally consistent by construction. This entails ensuring that components add up to the total in every observation across all related indicators. Such normalization procedures can produce differences with respect to national sources.
  • Mitigation of time series breaks. In order to produce statistics that are comparable over time, in certain instances it is necessary to adjust figures within a time series to account for changes in methodology, coverage or other relevant dimensions.
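The internal consistency step listed above can be illustrated with a minimal sketch. It assumes a simple proportional rescaling, which is one common way to make components add up to a total; this page does not specify the normalization the ILO actually applies, and the function name and figures are hypothetical.

```python
def normalize_to_total(components, total):
    """Rescale component values proportionally so that they sum to `total`.

    Assumption: proportional scaling. The actual ILO procedure may differ.
    """
    current_sum = sum(components.values())
    if current_sum <= 0:
        raise ValueError("components must have a positive sum")
    factor = total / current_sum
    return {name: value * factor for name, value in components.items()}


# Hypothetical employment by sector (thousands) that does not add up to the total.
by_sector = {"agriculture": 210.0, "industry": 330.0, "services": 480.0}
total_employment = 1000.0

adjusted = normalize_to_total(by_sector, total_employment)
```

After rescaling, the sectoral figures sum to the employment total while their relative shares are unchanged, which is why normalized estimates can differ slightly from the figures published by national sources.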

The ILO modelled estimates cover a wide variety of indicators. Hence, input updates and methodological improvements are implemented in a staggered manner. The time stamp indicates the production date of the estimates, also referred to as the edition. The production date is important because it indicates the approximate cut-off date for inclusion of nationally reported observations as input into the models. Additionally, estimates with the same production date have undergone normalization to ensure that they are internally consistent. For instance, the sum of employment across all economic sectors will equal the sum across all occupations. For estimates with different production dates, however, this will not necessarily be the case.

We are constantly improving the ILO modelled estimates. Revisions usually happen for one of three reasons:

  • Countries make new data available. The ILOSTAT database is constantly updated as new national labour statistics become available. In some cases, this may only happen after a significant delay, requiring the ILO to replace estimates for that year with the reported statistics.
  • Revisions are made to other databases used by our statistical models. The ILO’s econometric models use databases maintained by other international organizations such as the UN’s World Population Prospects and the IMF’s World Economic Outlook. These databases are periodically subject to their own revisions, which can lead to revisions in the ILO modelled estimates.
  • Historical data needs to be revised. Periodically, data from prior years needs to be revised as new information emerges. 

Please see the different options on our dissemination and analysis page.

Labour market indicators

Labour market indicators are estimated using a series of models that establish statistical relationships between observed labour market indicators and explanatory variables. These relationships are used to impute missing observations and to make projections for the indicators.

There are many potential statistical relationships, also called “model specifications”, that could be used to predict labour market indicators. The key to obtaining accurate and unbiased estimates is to select the best model specification in each case. The ILO modelled estimates generally rely on a procedure called “cross-validation”, which is used to identify those models that minimize the expected error and variance of the estimation. This procedure involves repeatedly computing a number of candidate model specifications using random subsets of the data: the missing observations are predicted and the prediction error is calculated for each iteration. Each candidate model is assessed based on the pseudo-out-of-sample root mean square error, although other metrics such as result stability are also assessed depending on the model. This makes it possible to identify the statistical relationship that provides the best estimate of a given labour market indicator. It is worth noting that the most appropriate statistical relationship for this purpose may differ according to country.
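The selection procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the ILO's implementation: the data are synthetic, the candidate specifications are plain linear regressions, and all names (`specifications`, `cross_validated_rmse`, `x1`, `x2`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the indicator depends on x1 only; x2 is irrelevant noise.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(scale=0.5, size=n)

# Candidate model specifications: each is a list of explanatory variables.
specifications = {
    "constant_only": [],
    "x1": [x1],
    "x2": [x2],
    "x1_and_x2": [x1, x2],
}


def design_matrix(regressors, idx):
    """Build a design matrix with an intercept for the observations in idx."""
    columns = [np.ones(len(idx))] + [r[idx] for r in regressors]
    return np.column_stack(columns)


def cross_validated_rmse(regressors, y, n_iterations=50, holdout_share=0.2):
    """Average pseudo-out-of-sample RMSE over random train/holdout splits."""
    n_obs = len(y)
    n_holdout = int(n_obs * holdout_share)
    errors = []
    for _ in range(n_iterations):
        shuffled = rng.permutation(n_obs)
        holdout, train = shuffled[:n_holdout], shuffled[n_holdout:]
        # Fit on the training subset, treating the holdout as "missing".
        coef, *_ = np.linalg.lstsq(
            design_matrix(regressors, train), y[train], rcond=None
        )
        prediction = design_matrix(regressors, holdout) @ coef
        errors.append(np.sqrt(np.mean((y[holdout] - prediction) ** 2)))
    return float(np.mean(errors))


scores = {name: cross_validated_rmse(regs, y) for name, regs in specifications.items()}
best_specification = min(scores, key=scores.get)
```

Specifications that include the relevant variable achieve a much lower held-out error than those that omit it. In practice, as noted above, other criteria such as result stability may also enter the choice.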

The extraordinary disruptions to the global labour market caused by the COVID-19 pandemic have rendered the series of models underlying the ILO modelled estimates less suitable for estimating and projecting the evolution of labour market indicators. For this reason, the methodology has been adapted, and explanatory variables that are specific to the COVID-19 crisis have been introduced into the modelling process.

The benchmark for the ILO modelled estimates is the 2019 Revision of the United Nations World Population Prospects, which provides estimates and projections of the total population broken down into five-year age groups. The working-age population comprises everyone who is at least 15 years of age.

Although the same basic approach is followed in the models used to estimate all the indicators, there are differences between the various models because of specific features of the underlying data. Further details are provided for each model in this methodological description, while an overview is provided below.

Labour force, employment structure and labour underutilization

To track the participation of the working-age population in the labour market, estimates of the labour force are produced, disaggregated by sex and age. The labour force measures active participation in the labour market: it is the sum of the employed and the unemployed.

To analyse the employment structure, the distribution of employment as a function of four different breakdowns is estimated: employment status, economic activity (sector), occupation, and economic class (working poverty).

To measure labour underutilization, there are numerous series available disaggregated by sex and age: unemployment rate, labour underutilization rates (LU2, LU3 and LU4), the NEET rate (youth not in employment, education or training), time-related underemployment rate, and all of the related underlying indicators.

For all the indicators described, except for economic class, a breakdown by rural/urban areas is produced.
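The relationships between the labour force and the underutilization rates mentioned in this section can be sketched as below. The formulas for LU2 to LU4 follow the definitions in the resolution of the 19th International Conference of Labour Statisticians rather than anything stated on this page; the function name and all figures are hypothetical.

```python
def underutilization_rates(
    working_age_population,
    employed,
    unemployed,
    time_related_underemployed,
    potential_labour_force,
):
    """Compute headline labour force and underutilization rates, in percent."""
    labour_force = employed + unemployed
    # The extended labour force adds the potential labour force.
    extended_labour_force = labour_force + potential_labour_force
    return {
        "labour_force": labour_force,
        "participation_rate": 100 * labour_force / working_age_population,
        "unemployment_rate": 100 * unemployed / labour_force,
        "LU2": 100 * (time_related_underemployed + unemployed) / labour_force,
        "LU3": 100 * (unemployed + potential_labour_force) / extended_labour_force,
        "LU4": 100
        * (time_related_underemployed + unemployed + potential_labour_force)
        / extended_labour_force,
    }


# Hypothetical figures, in thousands of persons.
rates = underutilization_rates(
    working_age_population=1600.0,
    employed=900.0,
    unemployed=100.0,
    time_related_underemployed=50.0,
    potential_labour_force=250.0,
)
```

With these illustrative inputs the labour force is 1,000, and each successive rate widens the group counted as underutilized.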

To produce estimates for 2020, a cross-validation approach is used to assess models that minimize prediction error in that specific year. The tested models include annual averages of high-frequency indicators related to the evolution of the COVID-19 pandemic. Two main approaches are combined to produce projections beyond 2020. The first is to use partial data for 2021 (for instance the first three quarters). The second is based on error correction models, in which the effect of the pandemic is modelled as a short run component whilst assuming a return to trend in the longer run. Projections are only available for selected indicators.
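The return-to-trend idea behind the second approach can be illustrated with a stylized sketch. It is not the ILO's estimated error correction model: it simply assumes that the gap between the indicator and its long-run trend closes by a fixed share each year. The function name, the adjustment speed and the figures are all hypothetical.

```python
def project_return_to_trend(last_observed, trend_path, adjustment_speed):
    """Project an indicator whose gap to trend shrinks by a fixed share per year.

    trend_path[0] is the trend value in the last observed year; the remaining
    entries are the trend values for the projection years.
    """
    if not 0 < adjustment_speed <= 1:
        raise ValueError("adjustment_speed must be in (0, 1]")
    gap = last_observed - trend_path[0]  # short-run deviation caused by the shock
    projections = []
    for trend_value in trend_path[1:]:
        gap *= 1 - adjustment_speed  # part of the deviation is corrected each year
        projections.append(trend_value + gap)
    return projections


# Hypothetical employment-to-population ratio: flat trend of 60%, 56% observed in 2020.
projected = project_return_to_trend(
    last_observed=56.0,
    trend_path=[60.0, 60.0, 60.0, 60.0],  # 2020 to 2023
    adjustment_speed=0.5,
)
```

In this example half of the remaining gap is closed every year, so the projected values approach the trend without reaching it within the projection window.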

Working-hour losses

The number of working hours lost is estimated using a nowcasting model. This method uses data that are available almost in real time to predict aggregate hours worked, which are published with a substantial delay. The nowcasting model makes it possible to produce the following indicators:

  • Percentage of hours lost due to the COVID-19 crisis, compared to the baseline (the latest pre-crisis quarter, i.e., the 4th quarter of 2019, seasonally adjusted), adjusting for population aged 15-64. The figures reported should not be interpreted as a quarterly or an inter-annual growth rate. The first year with available estimates is 2020.
  • Full-time equivalent employment losses (assuming a 40- or 48-hour working week). This measure is constructed by dividing the number of weekly hours lost due to COVID-19 by 40 or 48. Hence, it illustrates the magnitude of the hours lost by expressing them in full-time jobs. The first year with available estimates is 2020.
  • Total weekly hours worked by employed persons and weekly hours worked divided by population 15-64. These time series start in 2005, because the estimates combine nowcast results for 2020 with historical time series data on hours worked and population from ILOSTAT.
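The arithmetic behind the first two indicators can be sketched as follows. The population adjustment shown here, which scales baseline hours per person aged 15-64 to the current population, is an assumption about how the adjustment might work and is not taken from this page; the function name and the figures are hypothetical.

```python
def working_hour_losses(baseline_hours, baseline_population, current_hours, current_population):
    """Compare current weekly hours with a population-adjusted baseline.

    Hours are total weekly hours worked; populations refer to persons aged 15-64.
    Assumption: the baseline is carried forward at constant hours per person.
    """
    baseline_hours_per_person = baseline_hours / baseline_population
    counterfactual_hours = baseline_hours_per_person * current_population
    hours_lost = counterfactual_hours - current_hours
    return {
        "percent_hours_lost": 100 * hours_lost / counterfactual_hours,
        "fte_jobs_40h": hours_lost / 40,  # full-time equivalents at 40 hours per week
        "fte_jobs_48h": hours_lost / 48,  # full-time equivalents at 48 hours per week
    }


# Hypothetical figures: millions of weekly hours and millions of persons.
losses = working_hour_losses(
    baseline_hours=4000.0,       # 4th quarter of 2019, seasonally adjusted
    baseline_population=100.0,
    current_hours=3780.0,
    current_population=105.0,
)
```

Because the loss is measured against the pre-crisis baseline, the resulting percentage is a level comparison and, as stated above, should not be read as a quarterly or inter-annual growth rate.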

The data in the nowcasting model include a variety of indicators of economic activity and of the evolution of the labour market, such as:

  • labour force survey data 
  • administrative data on the labour market, such as registered unemployment
  • up-to-date mobile phone data from Google Mobility Reports
  • Oxford’s COVID‑19 Government Response Stringency Index
  • data on the incidence of COVID-19

Given the exceptional situation, including the scarcity of relevant data, the estimates are subject to a substantial amount of uncertainty. These estimates are subject to regular updates and revisions.

For more information on the modelling technique, refer to the annexes of ILO Monitor: COVID-19 and the world of work.

Labour income

The dataset covers 189 countries as well as global and regional aggregates. The data are based on the ILO Harmonized Microdata collection. To produce consistent time series for all countries, statistical models are used to extrapolate and impute missing data points. The dataset contains two key indicators: the labour income share and the labour income distribution, following the recommendation of the ILO Global Commission on the Future of Work to develop new distributional indicators. Furthermore, the new internationally comparable labour share data will be used to monitor progress towards the United Nations’ Sustainable Development Goals.

Wages

The methodology to estimate global and regional wage trends was developed by the ILO for the previous editions of the Global Wage Report (GWR) in collaboration between technical departments and the Department of Statistics, following four peer reviews conducted by five independent experts. The appendix of the GWR describes the methodology adopted as a result of this process. 

Global estimates on wages are not included in the ILOEST database or in any other database on ILOSTAT.

Labour migration

The third edition of the ILO Global Estimates on International Migrant Workers: Results and Methodology presents the most recent estimates on the stock of international migrant workers, disaggregated by age, sex, country-income group and region, and the estimation methodology. The reference year is 2019. The report predates the onset of the COVID-19 crisis, which has affected the magnitude and characteristics of international labour migration.

Global estimates on labour migration are not included in the ILOEST database or in any other database on ILOSTAT.
