Bulk download facility

Access to individual datasets or all of the ILOSTAT databases

Directories

Select a directory to access files.  Data files are in zipped csv (gz) format and dictionary files are in csv format. See guidelines below.

INDICATOR

Tables presented by indicator and frequency

REF_AREA

Tables presented by ref_area (e.g., countries and regions) and frequency

DIC

Dictionaries for the codes used (i.e., code lists)

Using the bulk download facility

Frequently asked questions

How can I open the compressed gz files?

All ‘gz’ files can be uncompressed using WinZip, 7zip, or statistical software such as R.

How can I read the csv files?

Many programs can read csv files, including standard statistical packages such as R and Stata. Spreadsheet applications such as Microsoft Excel can also open csv files, but opening large files in Excel may crash the application. Note that you may need to specify the comma as the field separator for the file to be read properly.

Where can I find information beyond labels?

The code lists provide only the label corresponding to each code; they do not include further information on concepts, definitions or classifications. These are available on the concepts and definitions page.

How can your data refer to future dates?

This will occur for indicators and reference areas for which projections are available. 

Overview

The bulk download facility contains data, metadata and documentation: datasets in zipped csv format, “dictionaries” for the codes used in the csv files, and a PDF version of these instructions. Datasets are organized in two directories: by indicator (for instance, the unemployment rate by sex and age) or by ref_area (short for reference area, the relevant geographical unit, such as a country). In addition to the data tables, each of these directories contains a table of contents that lists the tables available by indicator or reference area and the time period covered by the corresponding data. The following table summarizes the contents and provides a brief description of each item.

Table 1. Contents of the facility

Directory | Contents
[indicator] | All ILOSTAT tables presented by indicator and frequency
[ref_area] | All ILOSTAT tables presented by ref_area and frequency
[dic] | Dictionaries of all the codes used (code lists)
BulkDownload_Guidelines.pdf | Documentation, including guidelines and instructions

Data directories [indicator] and [ref_area]

There are two data directories, reflecting two ways of presenting the tables: by ‘indicator’ (and frequency) or by ‘ref_area’ (and frequency).

The indicator refers to the title of each specific table, including the variable represented and any breakdowns used for it (for instance, ‘labour force by sex and age’, ‘employment by sex and economic activity’ and ‘unemployment rate by sex, age and rural / urban areas’ are ILOSTAT indicators).

The ref_area (from reference area) refers to the geographic areas for which data are available. Since ILOSTAT includes country-level data as well as regional and global estimates, the reference area can be a country, a region (geographic regions such as Africa, Americas or Arab States, income groups such as low income countries, or other groups such as the BRICS or the G20) or the world as a whole. Note, however, that global and regional estimates are available only for some indicators, so most datasets include only country-level data.

The frequency indicates whether the data points are annual, quarterly or monthly.

The data files in both directories, whether by indicator or by ref_area, are csv files compressed in gzip format (‘gz’). All ‘gz’ files can be uncompressed using WinZip or 7zip. For further information on the csv files, see the following section. After choosing one of the two approaches (tables by indicator or by ref_area) by clicking on the name of the directory, you can access and download the data by clicking on the code name(s) of the table(s) you are looking for.

The [dic] directory provides dictionaries of all the code lists needed to identify the indicator or reference area that you are looking for. For reference, please note that codes all follow the same structure. The indicator code includes, in this order:

  • code of the topic
  • code to identify the indicator within that topic
  • breakdowns or ‘NOC’ for ‘no classification’ if there is no breakdown
  • unit of measure
    • ‘NB’ for absolute values or numbers
    • ‘RT’ for percentages or rates
  • frequency
    • ‘A’ for annual data
    • ‘Q’ for quarterly data
    • ‘M’ for monthly data
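
As an illustration of this structure, the following Python sketch splits an indicator file code into its parts. It assumes that the parts are separated by underscores, that the last two parts are always the unit and the frequency, and that everything between the indicator part and the unit is a breakdown; the function and class names are hypothetical and not part of any ILOSTAT tool.

```python
from typing import NamedTuple


class IndicatorCode(NamedTuple):
    topic: str
    indicator: str
    breakdowns: tuple  # empty when the code uses 'NOC' (no classification)
    unit: str
    frequency: str


def parse_indicator_code(code: str) -> IndicatorCode:
    """Split an indicator file code into the parts listed above."""
    parts = code.split("_")
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 underscore-separated parts: {code!r}")
    topic, indicator = parts[0], parts[1]
    unit, frequency = parts[-2], parts[-1]
    middle = parts[2:-2]
    breakdowns = () if middle == ["NOC"] else tuple(middle)
    return IndicatorCode(topic, indicator, breakdowns, unit, frequency)


print(parse_indicator_code("EAP_TEAP_SEX_AGE_NB_A"))
print(parse_indicator_code("EMP_DWAP_NOC_RT_A"))
```

Codes that do not follow this layout, for example codes without a frequency suffix, would need separate handling.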

Similarly, the code names of the files by reference area refer to:

  • country (ISO Alpha-3 country code) or the region (codes starting with X) and
  • frequency
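
A file code by reference area can be split in the same way. The sketch below assumes the frequency is the part after the last underscore and treats a leading ‘X’ as the marker of a region, as described above; the function name is hypothetical and ‘XYZ_Q’ is a made-up code used only to exercise the region branch.

```python
def parse_ref_area_code(code: str) -> dict:
    """Split a reference-area file code such as 'ABW_A' into its two parts."""
    ref_area, separator, frequency = code.rpartition("_")
    if not separator or not ref_area or frequency not in {"A", "Q", "M"}:
        raise ValueError(f"unexpected reference-area file code: {code!r}")
    return {
        "ref_area": ref_area,
        "frequency": frequency,
        # Heuristic taken from the text: region codes start with 'X'.
        "is_region": ref_area.startswith("X"),
    }


print(parse_ref_area_code("ABW_A"))
print(parse_ref_area_code("XYZ_Q"))  # 'XYZ' is a placeholder, not a real code
```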

The two tables presented next show the contents of the [indicator] and the [ref_area] directories, which contain approximately 500 and 700 datasets respectively.

Table 2. Contents of [indicator]

Files | Contents
table_of_contents_en | Table of contents in English
table_of_contents_fr | Table of contents in French
table_of_contents_sp | Table of contents in Spanish
EAP_TEAP_SEX_AGE_NB_A.csv.gz | Dataset containing all annual data available on the labour force by sex and age
EMP_DWAP_NOC_RT_A.csv.gz | Dataset containing all annual data available for the employment-to-population ratio

Table 3. Contents of [ref_area]

Files | Contents
table_of_contents_en | Table of contents in English
table_of_contents_fr | Table of contents in French
table_of_contents_sp | Table of contents in Spanish
ABW_A.csv.gz | Dataset containing all annual data available for Aruba
ABW_M.csv.gz | Dataset containing all monthly data available for Aruba

Format of CSV data files

Files in ‘csv’ format store tabular information (numbers or text) as plain text, in the form of comma-separated values. The columns (or fields) of the original table are separated by commas, and each row or line of the file corresponds to one data record, which may consist of one or more fields. Small files can be opened directly in Excel.

In ILOSTAT ‘csv’ files, the first row contains the headers of the fields or columns. The subsequent rows contain the data records. Each record consists of three parts: the key of the record (the ‘names’ of the dimensions used to identify it, including the reference area, the source of the data, the classifications used, etc., covering all fields from ‘ref_area’ to ‘time’), the observation value (‘obs_value’), and any other metadata available (such as the geographical coverage of the source or the specific definitions used for some concepts, covering all fields from ‘obs_status’ to ‘note_source’). The labels corresponding to the code names used as field headers in the csv files are given in the code list dictionaries ([dic] files; see the section on the dictionary directory for further information). The only code name not explained in the [dic] files is ‘obs_value’, which corresponds to the observation value.
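
The following Python sketch shows one way to read such a file using only the standard library. It builds a small compressed sample in memory so that it runs without a download. The column order, the UTF-8 encoding and all values in the sample are assumptions made for illustration; only the field names come from this guide.

```python
import csv
import gzip
import io

# Assumed column order, pieced together from the field names in this guide.
FIELDS = ["ref_area", "source", "indicator", "sex", "classif1", "classif2",
          "time", "obs_value", "obs_status", "note_classif",
          "note_indicator", "note_source"]

# Invented records standing in for a downloaded file (codes are placeholders).
SAMPLE_ROWS = [
    ["ABW", "SRC1", "EAP_TEAP_SEX_AGE_NB", "S1", "C1", "", "2010", "51.6", "", "", "", ""],
    ["ABW", "SRC1", "EAP_TEAP_SEX_AGE_NB", "S1", "C1", "", "2011", "", "", "", "", ""],
]


def make_sample_gz() -> bytes:
    """Build gzip-compressed csv bytes so the example runs without a download."""
    text = io.StringIO()
    writer = csv.writer(text, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(SAMPLE_ROWS)
    return gzip.compress(text.getvalue().encode("utf-8"))


def read_records(gz_bytes: bytes) -> list:
    """Decompress and parse the csv; the first row supplies the field names."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        records = list(csv.DictReader(handle))  # comma is the default separator
    for record in records:
        raw = record["obs_value"]
        # Values use a dot as the decimal symbol, so float() parses them directly.
        record["obs_value"] = float(raw) if raw else None
    return records


records = read_records(make_sample_gz())
print(records[0]["ref_area"], records[0]["time"], records[0]["obs_value"])
```

For a file saved on disk, the path can be passed to `gzip.open` in place of the in-memory buffer. Reading row by row instead of calling `list()` would keep memory use low for large files.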

There is no dictionary (or no ‘dic’ file) for the time dimension. The syntax of the codes used for this dimension is the following:

  • Annual data: YYYY where YYYY is the year.
  • Quarterly data: YYYYQ where YYYY is the year and Q is the quarter (the number corresponding to the quarter from 1 to 4).
  • Monthly data: YYYYMM where YYYY is the year and MM is the month (the number corresponding to the month from 01 to 12).
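
The syntax above can be turned into a small parser. This Python sketch follows the three patterns exactly as written here, with digits only: four characters for annual data, five for quarterly and six for monthly. The function name is hypothetical, and the codes in actual files should be checked against these patterns before relying on it.

```python
def parse_time(code: str) -> tuple:
    """Return (frequency, year, period) for a time code.

    period is None for annual data, 1-4 for quarters and 1-12 for months.
    """
    if not (code.isascii() and code.isdigit()):
        raise ValueError(f"time code must contain digits only: {code!r}")
    if len(code) == 4:
        return ("A", int(code), None)
    year, rest = int(code[:4]), int(code[4:])
    if len(code) == 5 and 1 <= rest <= 4:
        return ("Q", year, rest)
    if len(code) == 6 and 1 <= rest <= 12:
        return ("M", year, rest)
    raise ValueError(f"unrecognized time code: {code!r}")


for sample in ("2015", "20153", "201512"):
    print(sample, parse_time(sample))
```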

The number format applied in ILOSTAT files uses a dot as the decimal symbol (‘.’).

Dictionary directory [dic]

Code lists are predefined sets of terms from which coded statistical concepts (statistical characteristics of data) take their values. All of the code lists presented in ILOSTAT are available in three languages (‘en’ for English, ‘fr’ for French and ‘sp’ for Spanish). All ILOSTAT code list files have the same structure, consisting of three columns: the variable name or code (‘var_name’), the variable label or description of the code (‘var_label’) and a number used to sort the information in the file (‘var_sort’). The following table provides an example of an ILOSTAT code list.

Table 4. Extract of ‘indicator_en.csv’

Indicator | Indicator.label | Indicator.sort
GDP_211P_NOC_NB | Output per worker (GDP constant 2011 international $ in PPP) - ILO estimates and projections, Nov. 2016 (units) | 1
CPI_NCPI_COI_IN | National consumer price index (CPI) by COICOP (units) | 2

The various code lists available in English, French and Spanish in the [dic] directory correspond to the fields used in the downloaded csv files described in the previous section (except for the ‘obs_value’ field used for the observation value and not requiring a dictionary with labels). The following table enumerates the code lists included in the [dic] directory.
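
As a sketch of how a code list can be used to attach labels to codes, the following Python example loads a dictionary file into a lookup table. It reads the columns by position (code first, label second) rather than by header name, which is an assumption about the layout. The sample text is adapted from the extract of ‘indicator_en.csv’, and the function names are hypothetical.

```python
import csv
import io

# Sample adapted from the extract of 'indicator_en.csv'; labels containing
# commas are quoted, as the csv format requires.
SAMPLE_DIC = '''indicator,indicator.label,indicator.sort
GDP_211P_NOC_NB,"Output per worker (GDP constant 2011 international $ in PPP) - ILO estimates and projections, Nov. 2016 (units)",1
CPI_NCPI_COI_IN,National consumer price index (CPI) by COICOP (units),2
'''


def load_code_list(text: str) -> dict:
    """Map each code (first column) to its label (second column)."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0]: row[1] for row in reader if row}


def label_for(code: str, labels: dict) -> str:
    """Return the label for a code, or the code itself when it is unknown."""
    return labels.get(code, code)


labels = load_code_list(SAMPLE_DIC)
print(label_for("CPI_NCPI_COI_IN", labels))
print(label_for("NOT_A_CODE", labels))
```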

Table 5. Code lists included in [dic]

Variable name (also used as code list name) | Brief description
ref_area | Reference area: this can refer to countries, geographic regions, groups of countries (by income level or others) or the world
source | The specific source of the data, including information on the country or region for which it is used and the main type of source (population census, labour force survey, administrative records, etc.) as well as the precise name of the source.
indicator | The indicator, including information on the represented variables, the classifications used (if any) and the unit.
sex | The breakdown by sex and the items of this breakdown.
classif1 | All classifications used as the first breakdown in the various indicators available (excluding the breakdown by sex, which is treated separately) and the corresponding classification categories or items.
classif2 | All classifications used as the second breakdown in the various indicators available (excluding the breakdown by sex, which is treated separately) and the corresponding classification categories or items.
obs_status | The value status or flags on the values, such as breaks in series or provisional values.
note_classif | Metadata and/or footnotes related to the classifications used and the specific classification categories.
note_indicator | Metadata and/or footnotes related to the indicator.
note_source | Metadata and/or footnotes related to the data source.

Note that these code lists present only the label corresponding to each code. For further methodological information, including definitions of the main statistical terms used in ILOSTAT, detailed indicator descriptions and statistical standards, refer to the concepts and definitions page.

The two data directories [indicator] and [ref_area] include a table of contents, available in csv format and in three languages (‘en’ for English, ‘fr’ for French and ‘sp’ for Spanish). These tables of contents list all of the data files available for download in the corresponding directory, and provide summary information on each data file. 

The table of contents of the [indicator] directory lists all the indicators available, with the label of the indicator and the frequency of the data. 

The table of contents of the [ref_area] directory lists all the reference areas available (countries, regions, groups of countries), with the label of the reference area and the frequency of the data. 

Both tables indicate the size of each data file, the time period covered by the data in the file and the date when the data file was last updated. Since ILOSTAT’s datasets include projections of the main labour market indicators, the time period covered by some data files can go as far as 2050. The codes or identifiers used in the tables of contents for the indicators and reference areas in the first field or column (‘id’) are unique and allow for the unequivocal identification of the corresponding item. The two tables presented next show extracts of the tables of contents of the [indicator] and the [ref_area] directories.

Table 6. Extract of ‘table_of_contents_en.csv’ in [indicator]

Variable name (also used as code list name) | Brief description
id | File name of the dataset
indicator | Indicator code
indicator.label | Indicator name, including information on the represented variables, the classifications used (if any) and the unit.
freq | Frequency code (A, Q, M)
freq.label | Frequency label
size | Size of the .csv.gz file
data.start | First time period available in the dataset
data.end | Last time period available in the dataset
last.update | Last update of the dataset (Europe/Paris time zone)
n.records | Number of records in the dataset
collection | Collection code
collection.label | Data collection or compilation from which the data was derived, from all the various data compilations carried out by the ILO and disseminated in ILOSTAT
subject | Subject code
subject.label | How the indicator is displayed on the ILOSTAT website

Table 7. Extract of file ‘table_of_contents_en.csv’ in [ref_area]

Variable name (also used as code list name) | Brief description
id | File name of the dataset
ref_area | Reference area code
ref_area.label | Reference area name; this can refer to countries, geographic regions, groups of countries (by income level or others) or the world
freq | Frequency code (A, Q, M)
freq.label | Frequency label
size | Size of the .csv.gz file
data.start | First time period available in the dataset
data.end | Last time period available in the dataset
last.update | Last update of the dataset (Europe/Paris time zone)
n.records | Number of records in the dataset
group_geo | Geographical group code
group_geo.label | Geographical group name of the reference area
group_income | Income group code
group_income.label | Income group name of the reference area
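
A table of contents can be used to pick out datasets before downloading them. The Python sketch below selects the files of a given frequency whose time period includes a given year. It uses a cut-down, invented table of contents containing only the ‘id’, ‘freq’, ‘data.start’ and ‘data.end’ columns, and assumes that the first four characters of ‘data.start’ and ‘data.end’ are the year. The function name and the quarterly row are hypothetical.

```python
import csv
import io

# Invented extract of a table of contents, reduced to the columns used here.
SAMPLE_TOC = """id,freq,data.start,data.end
EAP_TEAP_SEX_AGE_NB_A,A,1990,2030
EMP_DWAP_NOC_RT_A,A,2000,2020
EAP_TEAP_SEX_AGE_NB_Q,Q,20101,20234
"""


def datasets_covering(rows: list, freq: str, year: int) -> list:
    """Return the ids of datasets with this frequency whose period includes year."""
    selected = []
    for row in rows:
        first_year = int(row["data.start"][:4])
        last_year = int(row["data.end"][:4])
        if row["freq"] == freq and first_year <= year <= last_year:
            selected.append(row["id"])
    return selected


toc_rows = list(csv.DictReader(io.StringIO(SAMPLE_TOC)))
print(datasets_covering(toc_rows, "A", 2025))
print(datasets_covering(toc_rows, "Q", 2015))
```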

Updates

All of the information stored in the facility is updated once a week, every Sunday at 10:00 pm (Europe/Paris time zone). The updating procedure only involves datasets for which there is new data or that have undergone a modification or a structural change.
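
Because only changed datasets are updated, a local copy can be kept current by comparing the ‘last.update’ column of a saved table of contents with that of a fresh one. The Python sketch below does this for two invented snapshots. The timestamp format shown is an assumption, so the values are compared only as text, and the function names are hypothetical.

```python
import csv
import io

# Two invented snapshots of a table of contents, reduced to two columns.
SAVED_TOC = """id,last.update
EAP_TEAP_SEX_AGE_NB_A,2020-01-05 22:10:00
EMP_DWAP_NOC_RT_A,2020-01-05 22:10:00
"""

FRESH_TOC = """id,last.update
EAP_TEAP_SEX_AGE_NB_A,2020-01-12 22:10:00
EMP_DWAP_NOC_RT_A,2020-01-05 22:10:00
ABW_A,2020-01-12 22:10:00
"""


def update_stamps(toc_text: str) -> dict:
    """Map each dataset id to its 'last.update' value."""
    return {row["id"]: row["last.update"] for row in csv.DictReader(io.StringIO(toc_text))}


def ids_to_refresh(saved: dict, fresh: dict) -> list:
    """Return ids that are new or whose 'last.update' value has changed."""
    return sorted(key for key, stamp in fresh.items() if saved.get(key) != stamp)


print(ids_to_refresh(update_stamps(SAVED_TOC), update_stamps(FRESH_TOC)))
```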

Rilostat – ILOSTAT’s R package

ILOSTAT’s bulk download facility is the basis for ILOSTAT’s R package (“Rilostat”), which was designed to let data users access the ILOSTAT databases, search for data, rearrange the information as needed, create data visualizations, and download data in the desired format, all in a programmatic and replicable manner, with the possibility of quickly re-running the queries as required. For more information, visit the R-ilostat webpage.