The bulk download facility provides access to individual datasets or all of the ILOSTAT databases. The instructions on using the facility are below the list of directories.
Directories
Select a directory to access files. Data files are in zipped csv (gz) format and dictionary files are in csv format. See guidelines below.
ILOSTAT’s bulk download facility is the basis for ILOSTAT’s R package (‘Rilostat’), which was designed to give data users the ability to access the ILOSTAT databases, search for data, rearrange the information as needed, create data visualizations, and download data in the desired format, all in a programmatic and replicable manner, with the possibility of quickly re-running the queries as required. For more information, visit the R-ilostat webpage.
Frequently asked questions
All ‘gz’ files can be uncompressed using WinZip, 7zip, or statistical software such as R.
Many programs can read csv files, including standard statistical packages such as R and STATA. Spreadsheet applications, such as Microsoft Excel, can also open csv files. However, opening large files in Excel may crash the application. Note that you may need to specify the comma as the field separator for the file to be read properly.
The code lists only provide the label corresponding to each code used, not including any further information on concepts, definitions or classifications. These are available on the concepts and definitions page.
Some data files cover time periods in the future. This occurs for indicators and reference areas for which projections are available.
Using the bulk download facility
Overview
The bulk download facility contains data, metadata and documentation: datasets in zipped csv format, “dictionaries” for the codes used in the csv files, and a PDF version of these instructions. The data directories organize the datasets by indicator (for instance, the unemployment rate by sex and age) or by ref_area (short for reference area, which is the relevant geographical unit, such as a country). In addition to all the data tables available, each of these directories contains a table of contents that lists the tables available by indicator or reference area and the time period covered by the corresponding data. The following table summarizes the contents and provides a brief description of each item.
Table 1. Contents of the facility
Directory | Contents |
[indicator] | All ILOSTAT tables presented by indicator and frequency |
[ref_area] | All ILOSTAT tables presented by ref_area and frequency |
[dic] | Dictionaries of all the codes used (code lists) |
BulkDownload_Guidelines.pdf | Documentation, including guidelines and instructions |
Data directories [indicator] and [ref_area]
There are two data directories, based on two ways of presenting the corresponding tables: by ‘indicator’ (and frequency) or by ‘ref_area’ (and frequency). The indicator refers to the title of each specific table, including the represented variable and any breakdowns used for it (for instance, ‘labour force by sex and age’, ‘employment by sex and economic activity’ and ‘unemployment rate by sex, age and rural / urban areas’ are ILOSTAT indicators). The ref_area (from reference area) refers to the geographic areas for which data are available. Since ILOSTAT includes country-level data as well as regional and global estimates, the reference area can refer to countries, to regions (geographic regions such as Africa, Americas or Arab States, income groups such as low income countries, or other groups such as the BRICS or the G20) or to the world as a whole. However, global and regional estimates are available only for some indicators, so most datasets include only country-level data. The frequency refers to whether the data points are annual, quarterly or monthly.
The data files in both directories, whether by indicator or by ref_area, are csv files compressed with gzip (‘gz’). All ‘gz’ files can be uncompressed using WinZip or 7zip. For further information on the csv files, see the section on the format of CSV data files below. After selecting one of the two approaches (tables by indicator or by ref_area) by clicking on the name of the directory, you can access and download the desired data by clicking on the code name(s) of the table(s) you are looking for.
The [dic] directory provides dictionaries of all the code lists needed to identify the indicator or reference area that you are looking for. For reference, please note that codes all follow the same structure. The indicator code includes, in this order:
- code of the topic
- code to identify the indicator within that topic
- breakdowns or ‘NOC’ for ‘no classification’ if there is no breakdown
- unit of measure
  - ‘NB’ for absolute values or numbers
  - ‘RT’ for percentages or rates
- frequency
  - ‘A’ for annual data
  - ‘Q’ for quarterly data
  - ‘M’ for monthly data
Similarly, the code names of the files by reference area refer to:
- country (ISO Alpha-3 country code) or the region (codes starting with X) and
- frequency
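The naming structure above can be illustrated with a short Python sketch that splits a file name into its parts. The function and class names are hypothetical, and the sketch assumes that the parts are always separated by underscores and that each token between the indicator code and the unit is one breakdown; neither assumption is confirmed by these guidelines.

```python
from dataclasses import dataclass

FREQUENCIES = {"A": "annual", "Q": "quarterly", "M": "monthly"}


@dataclass
class IndicatorFile:
    topic: str
    indicator: str
    breakdowns: list
    unit: str
    frequency: str


def strip_extension(file_name):
    """Drop a trailing '.csv.gz' if present."""
    suffix = ".csv.gz"
    return file_name[: -len(suffix)] if file_name.endswith(suffix) else file_name


def parse_indicator_file(file_name):
    """Split an [indicator] file name into topic, indicator, breakdowns, unit, frequency."""
    parts = strip_extension(file_name).split("_")
    if len(parts) < 5 or parts[-1] not in FREQUENCIES:
        raise ValueError(f"unexpected indicator file name: {file_name!r}")
    breakdowns = parts[2:-2]
    if breakdowns == ["NOC"]:  # 'NOC' means there is no breakdown
        breakdowns = []
    return IndicatorFile(parts[0], parts[1], breakdowns, parts[-2], parts[-1])


def parse_ref_area_file(file_name):
    """Split a [ref_area] file name into (reference area code, frequency)."""
    area, _, frequency = strip_extension(file_name).rpartition("_")
    if not area or frequency not in FREQUENCIES:
        raise ValueError(f"unexpected ref_area file name: {file_name!r}")
    return area, frequency
```

For example, `parse_indicator_file("EAP_TEAP_SEX_AGE_NB_A.csv.gz")` separates the topic `EAP`, the indicator `TEAP`, the breakdowns `SEX` and `AGE`, the unit `NB` and the frequency `A`.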
The two tables presented next show the contents of the [indicator] and the [ref_area] directories, which contain approximately 500 and 700 datasets respectively.
Table 2. Contents of [indicator]
Files | Contents |
table_of_contents_en | Table of contents in English |
table_of_contents_fr | Table of contents in French |
table_of_contents_sp | Table of contents in Spanish |
EAP_TEAP_SEX_AGE_NB_A.csv.gz | Dataset containing all annual data available on the labour force by sex and age |
EMP_DWAP_NOC_RT_A.csv.gz | Dataset containing all annual data available for the employment-to-population ratio |
… | … |
Table 3. Contents of [ref_area]
Files | Contents |
table_of_contents_en | Table of contents in English |
table_of_contents_fr | Table of contents in French |
table_of_contents_sp | Table of contents in Spanish |
ABW_A.csv.gz | Dataset containing all annual data available for Aruba |
ABW_M.csv.gz | Dataset containing all monthly data available for Aruba |
… | … |
Format of CSV data files
Files in ‘csv’ format store tabular information (numbers or text) as plain text, in the form of comma-separated values. The columns (or fields) of the original table are separated by commas, and each row or line of the file corresponds to one data record, which may consist of one or more fields. These files can be opened directly in Excel. In ILOSTAT ‘csv’ files, the first row contains the headers of the fields or columns. The subsequent rows present the data records, each consisting of the key of the record (the ‘names’ of the dimensions used to identify each record, including the reference area, the source of the data, the classifications used, etc., that is, all fields from ‘ref_area’ to ‘time’), the observation value (‘obs_value’) and any other metadata available (such as the geographical coverage of the source or the specific definitions used for some concepts, that is, all fields from ‘obs_status’ to ‘note_source’). All of the labels corresponding to the code names used as field headers in the csv files are presented in the code list dictionaries ([dic] files; see the section on the dictionary directory for further information). The only code name not explained in the [dic] files is ‘obs_value’, which corresponds to the observation value.
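As an illustration of this layout, the following Python sketch reads a compressed data file record by record using only the standard library. The function name is hypothetical, and UTF-8 encoding is an assumption that these guidelines do not state.

```python
import csv
import gzip


def read_ilostat_csv(path):
    """Yield one dict per data record from a gzip-compressed csv file."""
    # 'rt' decompresses and decodes in one step; UTF-8 is assumed here.
    # newline="" is the setting the csv module expects for its input.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # The first row supplies the field names (ref_area, ..., obs_value, ...).
        for record in csv.DictReader(handle):
            raw = record.get("obs_value") or ""
            # Convert the observation value to a number; an empty cell becomes None.
            record["obs_value"] = float(raw) if raw.strip() else None
            yield record
```

Because the function is a generator, a large file can be processed one record at a time without loading it fully into memory.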
There is no dictionary (or no ‘dic’ file) for the time dimension. The syntax of the codes used for this dimension is the following:
- Annual data: YYYY where YYYY is the year.
- Quarterly data: YYYYQ where YYYY is the year and Q is the quarter (the number corresponding to the quarter from 1 to 4).
- Monthly data: YYYYMM where YYYY is the year and MM is the month (the number corresponding to the month from 01 to 12).
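The time syntax above can be decoded with a small Python sketch. The function name is hypothetical. The sketch follows the three forms as listed and additionally tolerates a literal ‘Q’ or ‘M’ before the period number; that tolerance is an assumption added for robustness, not something these guidelines state.

```python
import re

# Each entry: frequency code and the pattern for its time codes.
_PATTERNS = [
    ("A", re.compile(r"(\d{4})")),
    ("Q", re.compile(r"(\d{4})Q?([1-4])")),
    ("M", re.compile(r"(\d{4})M?(0[1-9]|1[0-2])")),
]


def parse_time(code):
    """Return (frequency, year, period); period is None for annual data."""
    text = code.strip()
    for frequency, pattern in _PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            year = int(match.group(1))
            period = int(match.group(2)) if frequency != "A" else None
            return frequency, year, period
    raise ValueError(f"unrecognised time code: {code!r}")
```

The length of the code is enough to tell the three frequencies apart: four digits for a year, five for a quarter and six for a month.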
The number format applied in ILOSTAT files uses a dot as the decimal symbol (‘.’).
Dictionary directory [dic]
Code lists are predefined sets of terms from which statistical concepts (statistical characteristics of data) that have been coded take their values. All of the code lists presented in ILOSTAT are available in three languages (‘en’ for English, ‘fr’ for French and ‘sp’ for Spanish). All ILOSTAT code list files have the same structure, consisting of three columns: the variable name or code (‘var_name’), the variable label or description of the code (‘var_label’) and a number used to sort the information in the file (‘var_sort’). The following table provides an example of an ILOSTAT code list.
Table 4. Extract of ‘indicator_en.csv’
Indicator | Indicator.label | Indicator.sort |
GDP_211P_NOC_NB | Output per worker (GDP constant 2011 international $ in PPP) — ILO estimates and projections, Nov. 2016 (units) | 1 |
CPI_NCPI_COI_IN | National consumer price index (CPI) by COICOP (units) | 2 |
… | … | … |
The various code lists available in English, French and Spanish in the [dic] directory correspond to the fields used in the downloaded csv files described in the previous section (except for the ‘obs_value’ field used for the observation value and not requiring a dictionary with labels). The following table enumerates the code lists included in the [dic] directory.
Table 5. Code lists included in [dic]
Variable name (used also as code list name) | Brief description |
ref_area | Reference area – this can refer to countries, geographic regions, groups of countries (by income level or others) or the world |
source | The specific source of the data, including information on the country or region for which it is used and the main type of source (population census, labour force survey, administrative records, etc.) as well as the precise name of the source. |
indicator | The indicator, including information on the represented variables, the classifications used (if any) and the unit. |
sex | The breakdown by sex and the items of this breakdown. |
classif1 | All classifications used as the first breakdown in the various indicators available (excluding the breakdown by sex, which is treated separately) and the corresponding classification categories or items. |
classif2 | All classifications used as the second breakdown in the various indicators available (excluding the breakdown by sex, which is treated separately) and the corresponding classification categories or items. |
obs_status | The value status or flags on the values, such as breaks in series or provisional values. |
note_classif | Metadata and/or footnotes related to the classifications used and the specific classification categories. |
note_indicator | Metadata and/or footnotes related to the indicator. |
note_source | Metadata and/or footnotes related to the data source. |
It should be noted that these code lists present only the label corresponding to each code. For further methodological information, including definitions of the main statistical terms used in ILOSTAT, detailed indicator descriptions and statistical standards, refer to the concepts and definitions page.
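To show how a code list can be used to attach labels to data records, here is a minimal Python sketch. Both function names are hypothetical. The sketch relies only on the column order described above (code, label, sort number) and ignores the header names; UTF-8 encoding is an assumption.

```python
import csv


def load_code_list(path):
    """Return {code: label} from a [dic] csv file."""
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        # Column order as described: code, label, sort number.
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def add_labels(records, field, labels):
    """Attach '<field>.label' to each record dict; unknown codes get None."""
    for record in records:
        record[f"{field}.label"] = labels.get(record.get(field))
        yield record
```

For instance, loading the English ‘sex’ code list and calling `add_labels(records, "sex", labels)` would add a ‘sex.label’ entry to every record.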
The two data directories [indicator] and [ref_area] include a table of contents, available in csv format and in three languages (‘en’ for English, ‘fr’ for French and ‘sp’ for Spanish). These tables of contents list all of the data files available for download in the corresponding directory, and provide summary information on each data file.
The table of contents of the [indicator] directory lists all the indicators available, with the label of the indicator and the frequency of the data.
The table of contents of the [ref_area] directory lists all the reference areas available (countries, regions, groups of countries), with the label of the reference area and the frequency of the data.
Both tables indicate the size of each data file, the time period covered by the data in the file and the date when the data file was last updated. Since ILOSTAT’s datasets include projections of the main labour market indicators, the time period covered by some data files can go as far as 2050. The codes or identifiers used in the tables of contents for the indicators and reference areas in the first field or column (‘id’) are unique and allow for the unequivocal identification of the corresponding item. The two tables presented next show extracts of the tables of contents of the [indicator] and the [ref_area] directories.
Table 6. Extract of ‘table_of_contents_en.csv’ in [indicator]
Variable name (used also as code list name) | Brief description |
id | File name of the dataset |
indicator | Indicator code |
indicator.label | Indicator name, including information on the represented variables, the classifications used (if any) and the unit. |
freq | Frequency code (A, Q, M) |
freq.label | Frequency label |
size | Size of the .csv.gz file |
data.start | First time period available in the dataset |
data.end | Last time period available in the dataset |
last.update | Last update of the dataset (Europe/Paris time zone) |
n.records | Number of records in the dataset |
collection | Collection code |
collection.label | Data collection or compilation from which the data was derived, from all the various data compilations carried out by the ILO and disseminated in ILOSTAT |
subject | Subject code |
subject.label | How the indicator is displayed on the ILOSTAT website |
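The fields listed in Table 6 lend themselves to simple filtering. The following Python sketch selects dataset ids from table-of-contents rows that have already been read into dictionaries keyed by those field names (for example with `csv.DictReader`). The function name and its parameters are hypothetical, and the sketch assumes the indicator code starts with the topic code followed by an underscore.

```python
def select_datasets(toc_rows, freq=None, topic=None):
    """Return the 'id' of each table-of-contents row matching the filters."""
    selected = []
    for row in toc_rows:
        # Keep only the requested frequency code (A, Q or M), if one is given.
        if freq is not None and row["freq"] != freq:
            continue
        # Keep only indicators whose code begins with the given topic code.
        if topic is not None and not row["indicator"].startswith(topic + "_"):
            continue
        selected.append(row["id"])
    return selected
```

Calling `select_datasets(rows, freq="A", topic="EMP")` would return the ids of all annual datasets whose indicator code begins with `EMP_`.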
Table 7. Extract of file ‘table_of_contents_en.csv’ in [ref_area]
Variable name (used also as code list name) | Brief description |
id | File name of the dataset |
ref_area | Reference area code |
ref_area.label | Reference area name; this can refer to countries, geographic regions, groups of countries (by income level or others) or the world |
freq | Frequency code (A, Q, M) |
freq.label | Frequency label |
size | Size of the .csv.gz file |
data.start | First time period available in the dataset |
data.end | Last time period available in the dataset |
last.update | Last update of the dataset (Europe/Paris time zone) |
n.records | Number of records in the dataset |
group_geo | Geographical group code |
group_geo.label | Geographical group name of the reference area |
group_income | Income group code |
group_income.label | Income group name of the reference area |
Updates
All of the information stored in the facility is updated once a week, every Sunday at 10:00 pm (Europe/Paris time zone). The updating procedure only involves datasets for which there is new data or that have undergone a modification or a structural change.
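Since only modified datasets are refreshed, a local copy can be kept current by comparing two snapshots of a table of contents. The Python sketch below is one possible approach, not a documented procedure: the function name is hypothetical, and it compares the ‘last.update’ values as plain text so that no assumption about the date format is needed. Rows are expected as dictionaries keyed by the table-of-contents field names.

```python
def changed_datasets(previous_toc, current_toc):
    """Return ids that are new or whose 'last.update' value has changed."""
    seen = {row["id"]: row["last.update"] for row in previous_toc}
    return [
        row["id"]
        for row in current_toc
        # A missing id (new dataset) or a different value both count as changed.
        if seen.get(row["id"]) != row["last.update"]
    ]
```

Only the ids returned would need to be downloaded again after a weekly update.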