| Title: | Explore World Development Indicators Data |
|---|---|
| Description: | Provides a workflow for exploring World Development Indicators (WDI) country-level panel data. It downloads WDI data using the 'WDI' package and computes diagnostic indices that capture the temporal behaviour of the data by incorporating the grouping structure of the data. The set of diagnostic indices implemented includes variation features, trend and shape features, and sequential temporal features. This method is described in Akinfenwa, Cahill, and Hurley (2025) "wdiexplorer: An R package Designed for Exploratory Analysis of World Development Indicators (WDI) Data" <doi:10.48550/arXiv.2511.07027>. We adapt the clustering diagnostics and visualisation methodology described in Rousseeuw (1987) <doi:10.1016/0377-0427(87)90125-7> and selected time series features from Hyndman and Athanasopoulos (2021) "Forecasting: Principles and Practice" <https://otexts.com/fpp3/>. |
| Authors: | Oluwayomi Akinfenwa [aut, cre], Niamh Cahill [aut, ths], Catherine Hurley [aut, ths] |
| Maintainer: | Oluwayomi Akinfenwa <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-05-22 08:50:24 UTC |
| Source: | https://github.com/oluwayomi-olaitan/wdiexplorer |
Add grouping information of the WDI data to a metric summary
add_group_info(metric_summary, wdi_data)

| Argument | Description |
|---|---|
| metric_summary | A data frame containing the calculated diagnostic indices generated by any of the diagnostic index functions, such as compute_diagnostic_indices(). |
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |

A data frame containing the calculated diagnostic indices and the grouping variables in the WDI data set.
pm_diagnostic_metrics <- compute_diagnostic_indices(pm_data, group_var = "region")
pm_diagnostic_metrics_group <- add_group_info(metric_summary = pm_diagnostic_metrics, pm_data)
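Attaching grouping information can be pictured as a join on the country column. The following is a minimal base R sketch on toy data; the data frames, the values, and the choice of `country` as the join key are illustrative assumptions, not the package's implementation.

```r
# Toy metric summary: one row per country (values are made up for illustration).
metric_summary <- data.frame(
  country = c("A", "B", "C"),
  sil_width = c(0.4, -0.1, 0.7)
)

# Toy panel data: several rows per country, with grouping columns.
wdi_data <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  year = rep(2000:2001, times = 3),
  region = rep(c("North", "North", "South"), each = 2),
  income = rep(c("High", "Low", "Low"), each = 2)
)

# Keep one row of grouping information per country, then join on country.
group_info <- unique(wdi_data[, c("country", "region", "income")])
metric_with_groups <- merge(metric_summary, group_info, by = "country")
print(metric_with_groups)
```

The result keeps one row per country and adds the `region` and `income` columns next to the metric.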
Calculates the collection of diagnostic indices at once
compute_diagnostic_indices(wdi_data, index = NULL, group_var)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | An optional character string specifying the indicator code. Defaults to NULL. |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). |

A data frame with columns country, country_avg_dist, within_group_dist, sil_width,
trend_strength, linearity, curvature, smoothness, crossing_points, flat_spot, and acf.
pm_diagnostic_metrics <- compute_diagnostic_indices(pm_data, group_var = "region")
Compute dissimilarity between pairs of countries
Calculates pairwise dissimilarities and converts the output to a matrix.
compute_dissimilarity(wdi_data, index = NULL, metric = "euclidean")

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | An optional character string specifying the indicator code. Defaults to NULL. |
| metric | A character string specifying the dissimilarity metric to use. Defaults to "euclidean". |

A matrix of pairwise dissimilarities between countries.
pm_diss_mat <- compute_dissimilarity(pm_data)
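To illustrate what a pairwise dissimilarity matrix between country series looks like, here is a small base R sketch using `dist()`. The toy panel and the reshaping step are assumptions made for illustration; the package may prepare the data differently.

```r
# Toy panel in long format: three hypothetical countries, four years each.
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  year = rep(2000:2003, times = 3),
  value = c(1, 2, 3, 4,  1, 2, 3, 5,  10, 10, 10, 10)
)

# Reshape to one row per country and one column per year.
wide <- tapply(panel$value, list(panel$country, panel$year), mean)

# Pairwise Euclidean distances between the country series, as a full matrix.
diss_mat <- as.matrix(dist(wide, method = "euclidean"))
print(round(diss_mat, 2))
```

Countries A and B differ only in the final year, so their distance is small, while C is far from both.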
Calculates the number of crossing points and the longest flat spot using the feasts package, plus an additional time series feature, autocorrelation.
compute_temporal_features(wdi_data, index = NULL)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | An optional character string specifying the indicator code. Defaults to NULL. |

A data frame with columns country, crossing_points, flat_spot, and acf.
pm_temporal <- compute_temporal_features(pm_data)
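The three sequential features can be sketched in base R for a single series. The definitions below are assumptions based on how these features are commonly described (crossings of the median, longest run within one of ten equal-width bins, and lag-1 autocorrelation); the helper names are hypothetical and this is not the package's or feasts' code.

```r
# Number of times the series crosses its median.
crossing_points <- function(x) {
  above <- x > median(x)
  sum(above[-1] != above[-length(above)])
}

# Longest run of consecutive values falling in the same one of ten equal-width bins.
flat_spot <- function(x) {
  bins <- cut(x, breaks = 10, labels = FALSE)
  max(rle(bins)$lengths)
}

# Lag-1 autocorrelation (the choice of lag 1 is an assumption).
acf_lag1 <- function(x) {
  acf(x, lag.max = 1, plot = FALSE)$acf[2]
}

# An alternating toy series crosses its median at every step.
x <- c(1, 3, 1, 3, 1, 3, 1, 3)
crossing_points(x)
flat_spot(x)
acf_lag1(x)
```

A series that oscillates like `x` has many crossing points, a short flat spot, and negative lag-1 autocorrelation; a slowly drifting series shows the opposite pattern.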
Calculates trend strength, linearity, and curvature using functionality from the feasts and fabletools packages.
compute_trend_shape_features(wdi_data, index = NULL, verbose = TRUE)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | An optional character string specifying the indicator code. Defaults to NULL. |
| verbose | Logical. If TRUE, the message about the data download is printed; if FALSE, it is silenced. |

A data frame with columns country, trend_strength, linearity, curvature, and smoothness.
pm_trend_shape <- compute_trend_shape_features(pm_data, verbose = TRUE)
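The trend and shape features can be illustrated with a base R sketch for one series. It assumes a smoothed trend from `stats::supsmu()` as a stand-in smoother, measures trend strength as one minus the ratio of remainder variance to total variance (floored at zero), and takes linearity and curvature as the coefficients of an orthogonal quadratic regression on the trend. These choices are assumptions for illustration, not the package's code, and smoothness is not covered.

```r
# Toy annual series: an upward trend with a small deterministic wiggle.
year <- 1:30
y <- 0.5 * year + sin(year)

# Smooth trend estimate; supsmu() is an assumed stand-in smoother.
trend <- supsmu(year, y)$y
remainder <- y - trend

# Trend strength: share of variation not left in the remainder, floored at zero.
trend_strength <- max(0, 1 - var(remainder) / var(trend + remainder))

# Linearity and curvature: coefficients of an orthogonal quadratic fit to the trend.
fit <- lm(trend ~ poly(year, 2))
linearity <- unname(coef(fit)[2])
curvature <- unname(coef(fit)[3])

c(trend_strength = trend_strength, linearity = linearity, curvature = curvature)
```

For a steadily rising series such as this one, the linearity coefficient is positive and dominates the curvature coefficient.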
Calculates average dissimilarities between countries, group-wise country dissimilarities, and silhouette widths.
compute_variation(
  wdi_data,
  index = NULL,
  diss_matrix = compute_dissimilarity(wdi_data, index = index),
  group_var
)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | An optional character string specifying the indicator code. Defaults to NULL. |
| diss_matrix | An optional dissimilarity matrix generated by compute_dissimilarity(). |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). |

A data frame with columns country, group, country_avg_dist, within_group_dist, and sil_width.
pm_variation <- compute_variation(pm_data, group_var = "region")
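The silhouette width from Rousseeuw (1987) compares a country's average dissimilarity to its own group with its average dissimilarity to the nearest other group. The base R sketch below works on a toy dissimilarity matrix; the helper `silhouette_width()` is hypothetical, every group is assumed to have at least two members, and `country_avg_dist` is assumed here to mean the average distance to all other countries.

```r
# Toy dissimilarity matrix for four hypothetical countries in two groups.
series <- rbind(A = c(1, 2, 3), B = c(1, 2, 4), C = c(9, 9, 9), D = c(9, 8, 9))
diss <- as.matrix(dist(series))
group <- c(A = "g1", B = "g1", C = "g2", D = "g2")

silhouette_width <- function(i, diss, group) {
  own <- group == group[i]
  own[i] <- FALSE                      # exclude the country itself
  a <- mean(diss[i, own])              # average distance to its own group
  others <- setdiff(unique(group), group[i])
  # Smallest average distance to any other group.
  b <- min(sapply(others, function(g) mean(diss[i, group == g])))
  (b - a) / max(a, b)
}

# Average distance from each country to all other countries.
country_avg_dist <- rowSums(diss) / (nrow(diss) - 1)

sil_width <- sapply(seq_along(group), silhouette_width, diss = diss, group = group)
names(sil_width) <- names(group)
round(sil_width, 3)
```

Values close to 1 indicate a country that sits well inside its group; values near or below 0 indicate a country that is at least as close to another group.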
Extract valid data from the WDI data
Reports countries with no data points, countries with only one data point, and years for which no data are available.
get_valid_data(wdi_data, index = NULL, verbose = TRUE)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | An optional character string specifying the indicator code. Defaults to NULL. |
| verbose | Logical. If TRUE, the message about countries and years with one or no data point is printed; if FALSE, it is silenced. Defaults to TRUE. |

A tibble with the valid data for the provided WDI indicator data set and a detailed report of missing entries.
get_valid_data(pm_data, verbose = TRUE)
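The kind of check described above can be sketched in base R on a toy panel. The sketch assumes that "valid" means dropping countries with zero or one observation and years with no observations at all; the package's exact rules may differ.

```r
# Toy panel with missing values (hypothetical countries and values).
panel <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year = rep(2000:2002, times = 3),
  value = c(1, 2, NA,  NA, NA, NA,  5, NA, NA)
)

# Count non-missing observations per country and per year.
obs_per_country <- tapply(!is.na(panel$value), panel$country, sum)
obs_per_year <- tapply(!is.na(panel$value), panel$year, sum)

no_data_countries <- names(obs_per_country)[obs_per_country == 0]
one_point_countries <- names(obs_per_country)[obs_per_country == 1]
no_data_years <- names(obs_per_year)[obs_per_year == 0]

# Keep countries with at least two data points and years with some data.
valid <- panel[
  !(panel$country %in% c(no_data_countries, one_point_countries)) &
    !(panel$year %in% no_data_years),
]
print(valid)
```

Here country B has no data, country C has a single point, and the final year is empty, so only the first two years of country A remain.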
Creates and stores the data for the specified indicator code, downloaded using the WDI R package, in a folder called wdi_data.
get_wdi_data(indicator, verbose = TRUE)

| Argument | Description |
|---|---|
| indicator | A valid WDI indicator code. |
| verbose | Logical. If TRUE, the message about the data download is printed; if FALSE, it is silenced. Defaults to TRUE. |

An .rds file containing the data set for the specified indicator code.
pm_data <- get_wdi_data(indicator = "EN.ATM.PM25.MC.M3", verbose = TRUE)
The Programme for International Student Assessment (PISA) is a study conducted by the Organisation for Economic Co-operation and Development (OECD) that evaluates education systems by measuring 15-year-old students’ performance in reading, mathematics, and science every three years.
pisa_data
A data frame with 15,407 observations and 13 variables:
Country name (character)
2-letter ISO country code (character)
3-letter ISO country code (character)
Calendar year representing the time index of the observation (integer)
Observational values for the specified indicator code (numeric)
An empty variable meant to indicate the operational status of variables (character)
Timestamp that indicates the most recent update of the indicator data (character)
Geographical region variable (character)
Name of the capital city of each country (character)
Geographic coordinate that measures the longitude of the city (character)
Geographic coordinate that measures the latitude of the city (character)
World Bank income classification variable (character)
World Bank income classification variable (character)
World Development Indicator, using the WDI R package
data(pisa_data)
head(pisa_data)
Generates the trajectory of each country's data series and supports two plot modes: displaying all series uniformly, or highlighting countries with metric values within a specified percentile. Each mode can be rendered in two versions, ungrouped and grouped. Hovering over a highlighted line displays the corresponding country name and metric value.
plot_data_trajectories(
  wdi_data,
  index = NULL,
  group_var = NULL,
  metric_summary = NULL,
  metric_var = NULL,
  percentile = 0.95
)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | A character string specifying the indicator code. Defaults to NULL. |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). Defaults to NULL. |
| metric_summary | A data frame containing computed diagnostic metrics and the pre-defined grouping information, generated by passing the output of any diagnostic metrics function to add_group_info(). |
| metric_var | Character string specifying the metric variable name in metric_summary. |
| percentile | A percentile threshold (between 0 and 1) for highlighting countries based on their metric values. Defaults to 0.95. |

An ungrouped or grouped interactive plot object displaying the trajectory of country-level data series. It supports both displaying all series uniformly and a mode that highlights countries that fall within a specified percentile of a chosen diagnostic metric.
pm_diagnostic_metrics <- compute_diagnostic_indices(pm_data, group_var = "region")
pm_diagnostic_metrics_group <- add_group_info(metric_summary = pm_diagnostic_metrics, pm_data)
plot_data_trajectories(pm_data, group_var = "region",
  metric_summary = pm_diagnostic_metrics_group,
  metric_var = "within_group_avg_dist")
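The percentile-based highlighting can be pictured as a quantile threshold on the chosen metric. The sketch below assumes that countries at or above the threshold are the ones highlighted; the metric values are made up and the package's selection rule may differ.

```r
# Toy metric values for five hypothetical countries.
metric <- c(A = 0.1, B = 0.5, C = 0.9, D = 2.5, E = 0.3)
percentile <- 0.8

# Metric value at the requested percentile.
threshold <- quantile(metric, probs = percentile, names = FALSE)

# Countries whose metric value is at or above the threshold.
highlighted <- names(metric)[metric >= threshold]
highlighted
```

Raising `percentile` towards 1 narrows the highlighted set to the most extreme countries.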
Generates a faceted ggplot displaying the distribution of either the selected metric(s) or the full set of diagnostic indices.
By default, distribution(s) are ungrouped; if a group_var is specified, distributions are grouped by its levels within each panel.
If only one metric is specified in metric_var, a single panel is displayed.
plot_metric_distribution(
  metric_summary,
  colour_var,
  metric_var = NULL,
  group_var = NULL
)

| Argument | Description |
|---|---|
| metric_summary | A data frame containing computed diagnostic metrics and the pre-defined grouping information, generated by passing the output of any diagnostic metrics function to add_group_info(). |
| colour_var | A variable in metric_summary. |
| metric_var | Character string or vector of character strings specifying the metric variable name(s) in metric_summary. |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). Defaults to NULL. |

A ggplot object displaying either the ungrouped or grouped distribution of metric(s) in metric_summary.
Each metric is displayed in a separate facet panel; if one metric is specified, a single panel is shown.
pm_diagnostic_metrics <- compute_diagnostic_indices(pm_data, group_var = "region")
pm_diagnostic_metrics_group <- add_group_info(metric_summary = pm_diagnostic_metrics, pm_data)
plot_metric_distribution(pm_diagnostic_metrics_group, colour_var = "region", group_var = "region")
Creates an interactive plot linking the scatterplot of two selected metrics with the data trajectories. The scatterplot showing the relationship between the specified metrics is presented in one panel, and the data trajectories are presented in another. Hovering over a point in the scatterplot highlights the corresponding trajectory with the country name, and vice versa.
plot_metric_linkview(
  wdi_data,
  index = NULL,
  metric_summary,
  metric_var,
  group_var = NULL
)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | A character string specifying the indicator code. Defaults to NULL. |
| metric_summary | A data frame containing computed diagnostic metrics and the pre-defined grouping information, generated by passing the output of any diagnostic metrics function to add_group_info(). |
| metric_var | A vector of character strings specifying metric variable names in metric_summary. |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). Defaults to NULL. |

An ungrouped or grouped interactive girafe object displaying the two panels, one with the scatterplot of two specified metrics and the other with the data trajectories.
pm_diagnostic_metrics <- compute_diagnostic_indices(pm_data, group_var = "region")
pm_diagnostic_metrics_group <- add_group_info(metric_summary = pm_diagnostic_metrics, pm_data)
plot_metric_linkview(pm_data, metric_summary = pm_diagnostic_metrics,
  metric_var = c("linearity", "curvature"))
Generates bars representing the metric value of each country, with countries partitioned by the levels of a specified variable. The partition plot is restricted to group levels containing more than one country, because meaningful comparisons are not possible for single-country levels. The metric value of each country is represented by a coloured bar, with bars ordered in descending order, while a lighter-shaded rectangular bar beneath indicates the group-level average for the metric. Countries in the same group level share the same colour.
plot_metric_partition(metric_summary, metric_var, group_var, x_breaks = NULL)

| Argument | Description |
|---|---|
| metric_summary | A data frame containing computed diagnostic metrics and the pre-defined grouping information, generated by passing the output of any diagnostic metrics function to add_group_info(). |
| metric_var | Character string specifying the metric variable name in metric_summary. |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). |
| x_breaks | Numeric vector specifying the limits and breaks. Defaults to NULL, which sets the x-axis breaks automatically. |

A ggplot object displaying the metric value of each country by a coloured bar ordered in descending order.
A lighter-shaded rectangular bar is displayed beneath the bars indicating their respective group-level average.
pm_diagnostic_metrics <- compute_diagnostic_indices(pm_data, group_var = "region")
pm_diagnostic_metrics_group <- add_group_info(metric_summary = pm_diagnostic_metrics, pm_data)
plot_metric_partition(metric_summary = pm_diagnostic_metrics_group,
  metric_var = "sil_width", group_var = "region")
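The data preparation behind a partition plot of this kind can be sketched in base R: drop single-country group levels, compute the group-level average, and order countries by descending metric value within each group. The toy data frame is an assumption for illustration, not output from the package.

```r
# Toy metric summary with a grouping column (values are made up).
metric_summary <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  region = c("North", "North", "South", "South", "East"),
  sil_width = c(0.2, 0.6, -0.1, 0.5, 0.9)
)

# Drop group levels that contain a single country.
group_size <- table(metric_summary$region)
kept <- metric_summary[metric_summary$region %in% names(group_size)[group_size > 1], ]

# Group-level average of the metric, attached to each row.
kept$group_avg <- ave(kept$sil_width, kept$region, FUN = mean)

# Order countries by descending metric value within each group.
kept <- kept[order(kept$region, -kept$sil_width), ]
print(kept)
```

Country E is removed because its region has no other member to compare against.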
Missingness plot of the indicator data
plot_missing(wdi_data, index = NULL, group_var)

| Argument | Description |
|---|---|
| wdi_data | A data frame of the indicator data generated by get_wdi_data(). |
| index | An optional character string specifying the indicator code. Defaults to NULL. |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). |

A plot that provides a structured overview of missing data and shows its distribution over time, across countries, and by the specified grouping variable.
plot_missing(pm_data, group_var = "region")
Generates interactive parallel coordinate plots of all diagnostic indices. Hovering over a line at any point along the x-axis displays the country name, the corresponding metric, and its value.
plot_parallel_coords(diagnostic_summary, colour_var, group_var = NULL)

| Argument | Description |
|---|---|
| diagnostic_summary | A data frame containing the computed set of diagnostic indices generated by compute_diagnostic_indices(). |
| colour_var | A variable in diagnostic_summary. |
| group_var | A grouping variable in the WDI data set (e.g., "region" or "income"). Defaults to NULL. |

An ungrouped or grouped interactive parallel coordinate plot of all diagnostic metrics, with each metric represented as a vertical axis. Each country is shown as an interactive line that intersects all axes, with the position along the x-axis corresponding to the diagnostic indices.
pm_diagnostic_metrics <- compute_diagnostic_indices(pm_data, group_var = "region")
pm_diagnostic_metrics_group <- add_group_info(metric_summary = pm_diagnostic_metrics, pm_data)
plot_parallel_coords(pm_diagnostic_metrics_group, colour_var = "region", group_var = "region")
This data set contains the mean annual exposure levels to ambient PM2.5 air pollution across various countries, measured in micrograms per cubic meter.
pm_data
A data frame with 13,910 observations and 13 variables:
Country name (character)
2-letter ISO country code (character)
3-letter ISO country code (character)
Calendar year representing the time index of the observation (integer)
Observational values for the specified indicator code (numeric)
An empty variable meant to indicate the operational status of variables (character)
Timestamp that indicates the most recent update of the indicator data (character)
Geographical region variable (character)
Name of the capital city of each country (character)
Geographic coordinate that measures the longitude of the city (character)
Geographic coordinate that measures the latitude of the city (character)
World Bank income classification variable (character)
World Bank income classification variable (character)
World Development Indicator, using the WDI R package
data(pm_data)
head(pm_data)