Data

Bulk download facility

The bulk download facility provides access to individual datasets or all of the ILOSTAT databases. The instructions on using the facility are below the list of directories.

Directories

Select a directory to access files. Data files are in zipped csv (gz) format and dictionary files are in csv format. See guidelines below.

INDICATOR

Tables presented by indicator and frequency

REF_AREA

Tables presented by ref_area (e.g., countries and regions) and frequency

DIC

Dictionaries for the codes used (i.e., code lists)

Rilostat – ILOSTAT’s R package

ILOSTAT’s bulk download facility is the basis for ILOSTAT’s R package (“Rilostat”), which was designed to let data users access the ILOSTAT databases, search for data, rearrange the information as needed, create data visualizations, and download data in the desired format. All of this is done programmatically and reproducibly, so queries can be re-run quickly as required. For more information, visit the R-ilostat webpage.

Frequently asked questions

How can I open the compressed gz files?

All ‘gz’ files can be uncompressed using WinZip, 7zip, or statistical software such as R.

How can I read the csv files?

Many programs can read csv files, including standard statistical packages such as R and Stata. Spreadsheet applications such as Microsoft Excel can also open csv files, but very large files may exceed Excel’s limit of 1,048,576 rows per worksheet, in which case the data may be truncated or the application may crash. Note that you may need to specify the comma as the field separator to read the file properly.
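
For readers working in Python rather than R or Stata, the two answers above can be sketched with the standard library alone. The example below builds a tiny gzip-compressed csv in memory and reads it back. The rows are made up for illustration (not real ILOSTAT values), and the UTF-8 encoding is an assumption. With a real download, you would pass the path of the ‘gz’ file to gzip.open instead of the in-memory buffer.

```python
import csv
import gzip
import io

# Hypothetical sample content standing in for a downloaded file.
sample_csv = "ref_area,time,obs_value\nABW,2010,10.5\nABW,2011,11.25\n"
compressed = gzip.compress(sample_csv.encode("utf-8"))


def read_gz_csv(source):
    """Decompress a gzip file (path or file object) and parse it as csv."""
    # UTF-8 is assumed here; adjust if the downloaded file says otherwise.
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        # The comma is stated explicitly as the field separator.
        return list(csv.DictReader(handle, delimiter=","))


rows = read_gz_csv(io.BytesIO(compressed))
for row in rows:
    print(row["ref_area"], row["time"], float(row["obs_value"]))
```

Reading through csv.DictReader keeps memory use modest compared with opening the file in a spreadsheet, since rows can also be processed one at a time instead of being collected into a list.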

Where can I find information beyond labels?

The code lists provide only the label corresponding to each code; they do not include further information on concepts, definitions or classifications. That information is available on the concepts and definitions page.

How can your data refer to future dates?

Future dates appear for indicators and reference areas for which projections are available.

Using the bulk download facility

Overview

The bulk download facility contains data, metadata and documentation: datasets in zipped csv format, “dictionaries” for the codes used in the csv files, and a PDF version of these instructions. The data directories are organized by indicator (for instance, the unemployment rate by sex and age) or by ref_area (short for reference area, the relevant geographical unit, such as a country). In addition to the data tables, each of these directories contains a table of contents listing the tables available by indicator or reference area and the time period covered by the corresponding data. The following table summarizes the contents and provides a brief description of each item.

Data directories [indicator] and [ref_area]

There are two data directories, reflecting two ways of presenting the tables: by ‘indicator’ (and frequency) or by ‘ref_area’ (and frequency). The indicator refers to the title of each specific table, including the variable represented and any breakdowns applied to it (for instance, ‘labour force by sex and age’, ‘employment by sex and economic activity’ and ‘unemployment rate by sex, age and rural / urban areas’ are ILOSTAT indicators). The ref_area (short for reference area) refers to the geographic areas for which data are available. Since ILOSTAT includes country-level data as well as regional and global estimates, the reference area can be a country, a region (geographic regions such as Africa, Americas or Arab States, income groups such as low income countries, or other groups such as the BRICS or the G20) or the world as a whole. Note, however, that global and regional estimates are available for only some indicators, so most datasets include only country-level data. The frequency indicates whether the data points are annual, quarterly or monthly.

The datasets in both data directories, whether by indicator or by ref_area, are csv files compressed in gzip format (‘gz’). All ‘gz’ files can be uncompressed using WinZip or 7zip. For further information on the csv files, see the section on the format of CSV data files. After choosing one of the two approaches (tables by indicator or by ref_area) by clicking on the name of the directory, you can download the desired data by clicking on the code name(s) of the table(s) you are looking for.

The [dic] directory provides dictionaries of all the code lists needed to identify the indicator or reference area that you are looking for. For reference, please note that codes all follow the same structure. The indicator code includes, in this order:

code of the topic
code to identify the indicator within that topic
breakdowns or ‘NOC’ for ‘no classification’ if there is no breakdown
unit of measure
- ‘NB’ for absolute values or numbers
- ‘RT’ for percentages or rates
frequency
- ‘A’ for annual data
- ‘Q’ for quarterly data
- ‘M’ for monthly data
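
The structure listed above can be illustrated with a small Python sketch that splits a dataset code by position. It assumes that the components are joined by underscores and that no single component contains an underscore itself; the function and class names are hypothetical. It expects a full dataset code that ends with the frequency, such as the file names in the [indicator] directory.

```python
from typing import NamedTuple, Tuple


class IndicatorCode(NamedTuple):
    topic: str
    indicator: str
    breakdowns: Tuple[str, ...]
    unit: str
    frequency: str


def parse_indicator_code(code: str) -> IndicatorCode:
    """Split a dataset code such as 'EAP_TEAP_SEX_AGE_NB_A' by position."""
    parts = code.split("_")
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 components, got {code!r}")
    topic, indicator = parts[0], parts[1]
    unit, frequency = parts[-2], parts[-1]
    middle = parts[2:-2]
    # 'NOC' means 'no classification', so report no breakdowns at all.
    breakdowns = () if middle == ["NOC"] else tuple(middle)
    return IndicatorCode(topic, indicator, breakdowns, unit, frequency)


print(parse_indicator_code("EAP_TEAP_SEX_AGE_NB_A"))
print(parse_indicator_code("EMP_DWAP_NOC_RT_A"))
```

Splitting from both ends keeps the sketch independent of how many breakdowns a code contains.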

Similarly, the code names of the files by reference area refer to:

country (ISO Alpha-3 country code) or the region (codes starting with X) and
frequency
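
File code names by reference area can be split in the same spirit. The sketch below assumes the name is given without a file extension and that the frequency is the part after the last underscore. The function name is hypothetical, and ‘X01’ is a made-up region code used only to show the ‘starts with X’ rule.

```python
FREQUENCIES = {"A": "annual", "Q": "quarterly", "M": "monthly"}


def parse_ref_area_file(name: str) -> dict:
    """Split a code name such as 'ABW_A' into reference area and frequency."""
    area, _, freq = name.rpartition("_")
    if not area or freq not in FREQUENCIES:
        raise ValueError(f"unrecognized code name: {name!r}")
    return {
        "ref_area": area,
        # Region codes start with 'X'; other codes are ISO Alpha-3 countries.
        "is_region": area.startswith("X"),
        "frequency": FREQUENCIES[freq],
    }


print(parse_ref_area_file("ABW_A"))
print(parse_ref_area_file("X01_Q"))  # 'X01' is a made-up region code
```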

The two tables presented next show the contents of the [indicator] and the [ref_area] directories, which contain approximately 500 and 700 datasets respectively.

Contents of [indicator]

Files	Contents
table_of_contents_en	Table of contents in English
table_of_contents_fr	Table of contents in French
table_of_contents_sp	Table of contents in Spanish
EAP_TEAP_SEX_AGE_NB_A.csv	Dataset containing all annual data available on the labour force by sex and age
EMP_DWAP_NOC_RT_A.csv	Dataset containing all annual data available for the employment-to-population ratio
…	…

Contents of [ref_area]

Files	Contents
table_of_contents_en	Table of contents in English
table_of_contents_fr	Table of contents in French
table_of_contents_sp	Table of contents in Spanish
ABW_A.csv	Dataset containing all annual data available for Aruba
ABW_M.csv	Dataset containing all monthly data available for Aruba
…	…

Format of CSV data files

Files in ‘csv’ format store tabular information (numbers or text) as plain text, in the form of comma-separated values. The columns (or fields) of the original table are separated by commas, and each row or line of the file corresponds to one data record, which may consist of one or more fields. These files can be opened directly in Excel.

In ILOSTAT ‘csv’ files, the first row contains the headers of the fields or columns. The subsequent rows present the data records, each consisting of three parts: the key of the record (the dimensions that identify it, such as the reference area, the source of the data and the classifications used, covering all fields from ‘ref_area’ to ‘time’); the observation value (‘obs_value’); and any other metadata available (such as the geographical coverage of the source or the specific definitions used for some concepts, covering all fields from ‘obs_status’ to ‘note_source’).

The labels corresponding to the code names used as field headers in the csv files are presented in the code lists’ dictionary ([dic] files; see the section on the dictionary directory for further information). The only code name not explained in the [dic] files is ‘obs_value’, which corresponds to the observation value.
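
The three-part record layout described above (key, observation value, metadata) can be sketched in Python as follows. The sample file is hypothetical: the column order is an assumption based on the field names mentioned in this guide, the codes and values are made up, and treating an empty ‘obs_value’ as missing is also an assumption.

```python
import csv
import io

# Hypothetical two-row file; column order and all values are illustrative.
sample = (
    "ref_area,source,indicator,sex,classif1,classif2,time,obs_value,"
    "obs_status,note_classif,note_indicator,note_source\n"
    "ABW,SRC1,EAP_TEAP_SEX_AGE_NB,SEX_T,AGE_X,,2010,51.6,,,,\n"
    "ABW,SRC1,EAP_TEAP_SEX_AGE_NB,SEX_T,AGE_X,,2011,,B,,,\n"
)


def split_record(header, row):
    """Return (key, value, metadata) for one data record."""
    value_at = header.index("obs_value")
    # Everything before 'obs_value' identifies the record.
    key = dict(zip(header[:value_at], row[:value_at]))
    raw = row[value_at]
    # Values use a dot as the decimal symbol, so float() parses them directly.
    value = float(raw) if raw else None
    # Everything after 'obs_value' is metadata.
    metadata = dict(zip(header[value_at + 1:], row[value_at + 1:]))
    return key, value, metadata


reader = csv.reader(io.StringIO(sample))
header = next(reader)
records = [split_record(header, row) for row in reader]
for key, value, metadata in records:
    print(key["ref_area"], key["time"], value, metadata["obs_status"])
```

Locating ‘obs_value’ by name rather than by a fixed position means the split still works if a file carries more or fewer classification columns.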

There is no dictionary (or no ‘dic’ file) for the time dimension. The syntax of the codes used for this dimension is the following:

Annual data: YYYY where YYYY is the year.
Quarterly data: YYYYQ where YYYY is the year and Q is the quarter (the number corresponding to the quarter from 1 to 4).
Monthly data: YYYYMM where YYYY is the year and MM is the month (the number corresponding to the month from 01 to 12).
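
As an illustration, the syntax above can be turned into a small Python parser. This sketch follows the three patterns exactly as they are written here (digits only, with the frequency inferred from the length of the code); the function name is hypothetical. If the files you download spell periods differently, the function raises an error rather than guessing.

```python
def parse_time(code: str) -> dict:
    """Interpret a time code written as YYYY, YYYYQ or YYYYMM."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (4, 5, 6):
        raise ValueError(f"unrecognized time code: {code!r}")
    year = int(code[:4])
    if len(code) == 4:
        return {"frequency": "A", "year": year}
    period = int(code[4:])
    if len(code) == 5:
        # One trailing digit: the quarter, from 1 to 4.
        if not 1 <= period <= 4:
            raise ValueError(f"quarter out of range in {code!r}")
        return {"frequency": "Q", "year": year, "quarter": period}
    # Two trailing digits: the month, from 01 to 12.
    if not 1 <= period <= 12:
        raise ValueError(f"month out of range in {code!r}")
    return {"frequency": "M", "year": year, "month": period}


for example in ("2015", "20153", "201512"):
    print(example, parse_time(example))
```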

The number format applied in ILOSTAT files uses a dot as the decimal symbol (‘.’).

Dictionary directory [dic]

Code lists are predefined sets of terms from which coded statistical concepts (statistical characteristics of data) take their values. All of the code lists presented in ILOSTAT are available in three languages (‘en’ for English, ‘fr’ for French and ‘sp’ for Spanish). All ILOSTAT code list files have the same structure, consisting of three columns: the variable name or code (‘var_name’), the variable label or description of the code (‘var_label’) and a number used to sort the information in the file (‘var_sort’). The following table provides an example of an ILOSTAT code list.

Extract of ‘indicator_en.csv’

Indicator	Indicator.label	Indicator.sort
GDP_211P_NOC_NB	Output per worker (GDP constant 2011 international $ in PPP) — ILO estimates and projections, Nov. 2016 (units)	1
CPI_NCPI_COI_IN	National consumer price index (CPI) by COICOP (units)	2
…	…	…
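
A code list with this three-column structure can be loaded into a lookup table from code to label. The Python sketch below reads the columns by position (code, label, sort number) so that it does not depend on the exact header names. The sample rows and codes are made up, and the function name is hypothetical.

```python
import csv
import io

# Hypothetical stand-in for a code list file; codes and labels are made up.
sample = (
    "Indicator,Indicator.label,Indicator.sort\n"
    "CODE_B,Second label,2\n"
    'CODE_A,"First label, with a comma",1\n'
)


def load_code_list(handle):
    """Return a {code: label} dict ordered by the sort column."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    entries = [(row[0], row[1], int(row[2])) for row in reader if row]
    entries.sort(key=lambda entry: entry[2])
    # Python dicts keep insertion order, so the sort order is preserved.
    return {code: label for code, label, _ in entries}


labels = load_code_list(io.StringIO(sample))
print(labels["CODE_A"])
```

Using the csv module rather than splitting on commas matters here, because labels can themselves contain commas inside quoted fields.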

The various code lists available in English, French and Spanish in the [dic] directory correspond to the fields used in the downloaded csv files described in the previous section (except for the ‘obs_value’ field used for the observation value and not requiring a dictionary with labels). The following table enumerates the code lists included in the [dic] directory.

Code lists included in [dic]

Variable name¹	Brief description
ref_area	Reference area – this can refer to countries, geographic regions, groups of countries (by income level or others) or the world
source	The specific source of the data, including information on the country or region for which it is used and the main type of source (population census, labour force survey, administrative records, etc.) as well as the precise name of the source.
indicator	The indicator, including information on the represented variables, the classifications used (if any) and the unit.
sex	The breakdown by sex and the items of this breakdown.
classif1	All classifications used as the first breakdown in the various indicators available (excluding the breakdown by sex, which is treated separately) and the corresponding classification categories or items.
classif2	All classifications used as the second breakdown in the various indicators available (excluding the breakdown by sex, which is treated separately) and the corresponding classification categories or items.
obs_status	The value status or flags on the values, such as breaks in series or provisional values.
note_classif	Metadata and/or footnotes related to the classifications used and the specific classification categories.
note_indicator	Metadata and/or footnotes related to the indicator.
note_source	Metadata and/or footnotes related to the data source.

Note that these code lists present only the label corresponding to each code. For further methodological information, including definitions of the main statistical terms used in ILOSTAT, detailed indicator descriptions and statistical standards, refer to the concepts and definitions page.

The two data directories [indicator] and [ref_area] include a table of contents, available in csv format and in three languages (‘en’ for English, ‘fr’ for French and ‘sp’ for Spanish). These tables of contents list all of the data files available for download in the corresponding directory, and provide summary information on each data file.

The table of contents of the [indicator] directory lists all the indicators available, with the label of the indicator and the frequency of the data.

The table of contents of the [ref_area] directory lists all the reference areas available (countries, regions, groups of countries), with the label of the reference area and the frequency of the data.

Both tables indicate the size of each data file, the time period covered by the data in the file and the date when the data file was last updated. Since ILOSTAT’s datasets include projections of the main labour market indicators, the time period covered by some data files can go as far as 2050. The codes or identifiers used in the tables of contents for the indicators and reference areas in the first field or column (‘id’) are unique and allow for the unequivocal identification of the corresponding item. The two tables presented next show extracts of the tables of contents of the [indicator] and the [ref_area] directories.

Extract of ‘table_of_contents_en.csv’ in [indicator]

Variable name²	Brief description
id	File name of the dataset
indicator	Indicator code
indicator.label	Indicator name, including information on the represented variables, the classifications used (if any) and the unit.
freq	Frequency code (A, Q, M)
freq.label	Frequency label
size	Size of the .csv.gz file
data.start	First time period available in the dataset
data.end	Last time period available in the dataset
last.update	Last update of the dataset (Europe/Paris time zone)
n.records	Number of records in the dataset
collection	Collection code
collection.label	Data collection or compilation from which the data was derived, from all the various data compilations carried out by the ILO and disseminated in ILOSTAT
subject	Subject code
subject.label	How the indicator is displayed on the ILOSTAT website

Extract of ‘table_of_contents_en.csv’ in [ref_area]

Variable name³	Brief description
id	File name of the dataset
ref_area	Reference area code
ref_area.label	Reference area name; this can refer to countries, geographic regions, groups of countries (by income level or others) or the world
freq	Frequency code (A, Q, M)
freq.label	Frequency label
size	Size of the file
data.start	First time period available in the dataset
data.end	Last time period available in the dataset
last.update	Last update of the dataset (Europe/Paris time zone)
n.records	Number of records in the dataset
group_geo	Geographical group code
group_geo.label	Geographical group name of the reference area
group_income	Income group code
group_income.label	Income group name of the reference area
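
The tables of contents can be used to pick out datasets before downloading anything. The Python sketch below filters a table of contents by frequency code and by number of records. The sample is a hypothetical extract with only some of the columns listed above, and its values (including the region code ‘X01’) are made up.

```python
import csv
import io

# Hypothetical extract of a table of contents; only some columns, made-up values.
sample = (
    "id,ref_area,freq,n.records\n"
    "ABW_A,ABW,A,1200\n"
    "ABW_M,ABW,M,300\n"
    "X01_A,X01,A,5000\n"
)


def select_datasets(handle, freq, min_records=0):
    """Return the ids of datasets with the given frequency code (A, Q or M)."""
    return [
        row["id"]
        for row in csv.DictReader(handle)
        if row["freq"] == freq and int(row["n.records"]) >= min_records
    ]


print(select_datasets(io.StringIO(sample), "A"))
print(select_datasets(io.StringIO(sample), "A", min_records=2000))
```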

Updates

All of the information stored in the facility is updated daily at 12:00 pm (Europe/Paris time zone). The update affects only datasets that have new data or that have undergone a modification or structural change.
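
Because only some datasets change in each update, a local copy can be kept current by comparing the ‘last.update’ values in the table of contents with those recorded at the previous download. The Python sketch below shows one possible approach. The function name is hypothetical and the timestamps are made up; they are compared as plain strings, so their exact format does not matter.

```python
def datasets_to_refresh(previous, current):
    """Compare two {id: last.update} mappings and list ids to download again."""
    return sorted(
        dataset_id
        for dataset_id, stamp in current.items()
        # New ids are absent from 'previous', so .get() returns None for them.
        if previous.get(dataset_id) != stamp
    )


# Made-up values standing in for two snapshots of a table of contents.
previous = {"ABW_A": "2024-01-01", "ABW_M": "2024-01-01"}
current = {"ABW_A": "2024-01-01", "ABW_M": "2024-02-15", "X01_A": "2024-02-15"}
print(datasets_to_refresh(previous, current))
```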