Financial Health of Firms During COVID-19: Germany and Sweden

Author

Yusuf Kurnia Romadhon

Published

October 20, 2025

1 Executive Summary

This report compares how firms in Germany and Sweden adjusted profitability, leverage, liquidity, and financial structure during the COVID-19 period, using firm-level data from 2018 to 2021.

Key findings:

  • Median profitability indicators such as ROA, ROE, and EBITDA margin remained broadly stable in both countries, but firm-level dispersion increased during 2020 to 2021.
  • Liquidity buffers were preserved. Current ratios remained near pre-pandemic levels, indicating no widespread short-term solvency stress.
  • Leverage increased modestly in 2020, with German firms relying slightly more on debt financing while Swedish firms maintained more conservative balance-sheet positions.
  • Sectoral divergence intensified. Capital-intensive and cyclical industries experienced greater volatility, while technology, utilities, and selected service sectors remained comparatively stable.
  • The relationship between liquidity and profitability became more unstable during the pandemic, particularly in Sweden, where firm-level divergence and uneven recovery patterns were more pronounced.

Overall, both economies avoided systemic corporate instability during the pandemic period. However, financial adjustment patterns differed: Germany exhibited more coordinated and stable balance-sheet responses, while Sweden showed greater firm-level heterogeneity and more uneven recovery dynamics.

2 Introduction

This report extends Part 1 by conducting a comparative analysis of corporate financial performance in Germany and Sweden between 2018 and 2021. Using firm-level data from the OSIRIS database, the study examines key financial indicators including profitability, liquidity, leverage, and firm size across major industries. The central research question is: How did the COVID-19 pandemic affect corporate financial performance in Germany and Sweden, and how did differences in policy response and economic structure shape financial resilience?

The analysis distinguishes between two phases: the pre-pandemic period (2018 to 2019) and the pandemic period (2020 to 2021). Germany implemented relatively strict containment measures beginning in March 2020, while Sweden adopted a more voluntary and less restrictive approach. These contrasting policy strategies provide a natural comparative setting to examine whether differences in economic intervention translated into different financial outcomes across industries.

Building on the methods developed in Part 1, this report moves from a single-country perspective to a cross-country framework. By comparing industry-level trends, distributional changes, and structural indicators of resilience, the analysis identifies similarities and divergences in how firms responded to the pandemic shock. The goal is not only to document financial changes, but to evaluate how institutional context and sectoral composition influenced the stability and recovery of corporate performance.

Data limitations include the absence of 2022 financial statements and potential reporting distortions during crisis conditions. Nevertheless, the available panel provides sufficient coverage to assess immediate pandemic impacts and short-term recovery dynamics.

3 Data description

The dataset used in this analysis is sourced from the Bureau van Dijk OSIRIS database, a harmonised global financial database covering publicly listed and major private firms (Bureau van Dijk, 2025).

For this study, firm-level financial data were extracted for Germany and Sweden covering the 2018–2021 accounting years. This window captures two distinct economic regimes:

  • Pre-pandemic period (2018 – 2019)
  • Pandemic period (2020 – 2021)

Each observation represents a firm-year record, enabling both cross-sectional comparison between countries and longitudinal analysis over time.

Data Loading and Integration

The following R packages are used in this project.

Show code
library(dplyr)
library(stringr)
library(janitor)
library(skimr)
library(tsibble)
library(knitr)
library(naniar)
library(patchwork)
library(plotly)
library(tsibbletalk)
library(feasts)
library(broom)
library(purrr)
library(cassowaryr)
library(gganimate)
library(gifski)
library(kableExtra)
library(ggplot2)
library(scales)
library(tidyr)
library(ggrepel)
library(viridisLite)

Financial files were stored by country and year. The following code loads all relevant German and Swedish datasets for 2018 – 2021 and combines them into a unified panel.

Show code
#define the folder
data_directory <- "osiris"

#filter germany and sweden
germany_sweden_file_paths <- list.files(
  data_directory,
  pattern = "^osiris_?(Germany|Sweden)_(2018|2019|2020|2021)\\.rda$",
  full.names = TRUE,
  ignore.case = TRUE
)

#name the list by file stems
germany_sweden_file_stems <- tools::file_path_sans_ext(basename(germany_sweden_file_paths))
osiris_germany_sweden <- setNames(vector("list", length(germany_sweden_file_paths)), germany_sweden_file_stems)

#load each file
for (file_index in seq_along(germany_sweden_file_paths)) {
  temporary_environment <- new.env(parent = emptyenv())
  load(germany_sweden_file_paths[file_index], envir = temporary_environment)
  loaded_objects <- as.list(temporary_environment)

  #If the file holds one object, store that object, otherwise store the sub-list
  osiris_germany_sweden[[file_index]] <- if (length(loaded_objects) == 1) loaded_objects[[1]] else loaded_objects
}
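The file filter above can be sanity-checked without touching the disk. The sketch below applies the same regular expression to a handful of hypothetical file names (the names are invented for illustration; only the pattern comes from the loading step) and shows which ones would be picked up.

```r
# Same regular expression as in the loading step
file_pattern <- "^osiris_?(Germany|Sweden)_(2018|2019|2020|2021)\\.rda$"

# Hypothetical file names, used only to illustrate the filter
candidate_files <- c(
  "osiris_Germany_2018.rda",
  "osiris_sweden_2021.rda",   # lower-case country, still accepted
  "osiris_France_2019.rda",   # wrong country
  "osiris_Germany_2022.rda",  # outside the 2018-2021 window
  "osiris_Sweden_2020.csv"    # wrong extension
)

# ignore.case = TRUE mirrors the argument passed to list.files()
is_selected <- grepl(file_pattern, candidate_files, ignore.case = TRUE)
selected_files <- candidate_files[is_selected]
selected_files
```

Only the first two names pass, which is the behaviour the loading loop relies on.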

After loading, each dataset is appended with two derived identifiers:

  • year (extracted from file name)
  • source_country (Germany or Sweden)

The datasets are then combined into a single longitudinal panel and ordered by country and year.

This structure allows:

  • Cross-country comparisons
  • Industry-level aggregation
  • Firm-level time-series analysis
Show code
#build a single data frame with 'year' and 'source_country' taken from the filename
combined_germany_sweden <- {
  parts <- vector("list", length(osiris_germany_sweden))
  names(parts) <- names(osiris_germany_sweden)

  for (file_stem in names(osiris_germany_sweden)) {
    data_frame_from_file <- osiris_germany_sweden[[file_stem]]

    extracted_year     <- as.integer(str_extract(file_stem, "\\d{4}"))
    extracted_country  <- str_extract(file_stem, "(?i)Germany|Sweden") |> stringr::str_to_title()

    parts[[file_stem]] <- mutate(
      data_frame_from_file,
      year = extracted_year,
      source_country = extracted_country #avoid clashing with the raw 'country'
    )
  }

  #bind and sort
  bind_rows(parts) |>
    arrange(source_country, year)
}

#unique country–year pairs
combined_germany_sweden |>
  dplyr::distinct(source_country, year) |>
  dplyr::arrange(source_country, year) |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)
source_country year
Germany 2018
Germany 2019
Germany 2020
Germany 2021
Sweden 2018
Sweden 2019
Sweden 2020
Sweden 2021

Data Cleaning and Variable Standardisation

To ensure analytical clarity and reproducibility, raw OSIRIS vendor column names were standardised using clean_names() and mapped to clear, topic-based variable names.

The transformation process:

  • Normalises naming conventions.
  • Harmonises financial metrics.
  • Derives fiscal-year variables from year-end reporting dates.
  • Filters the dataset to retain only variables relevant to profitability, liquidity, leverage, and firm structure.
Show code
filtered_germany_sweden <- combined_germany_sweden |>
  clean_names() |>
  
  rename(
    # IDs / scope (used in all questions)
    company_name       = company_name_name,
    company_id         = name_id,
    acct_year          = year,
    country            = country_country,
    city               = city_city_city,
    consolidation_code = consolidation_code_consol_code,
    status             = status_status,
    
    # Industry codes (Q1, Q2, Q4)
    nace_code = nace_rev_1_core_code_cnacecd,
    icb_code  = industrial_classification_benchmark_icb,
    sic_code  = us_sic_core_code_csicuscde,
    
    # Scale / balance sheet (Q2, Q3)
    total_assets_eur      = total_assets_data13077,
    total_liabilities_eur = total_liabilities_and_debt_data14022,
    total_equity_eur      = total_shareholders_equity_data14041,
    
    # Revenue & profit (Q1, Q3)
    total_revenue_eur = total_revenues_data13004,
    net_sales_eur     = net_sales_data13002,
    gross_sales_eur   = gross_sales_data13000,
    net_income_eur    = net_income_starting_line_data15500,
    
    # Profitability ratios (Q1, Q4)
    ebit_margin_pct   = ebit_margin_percent_data31055,
    ebitda_margin_pct = ebitda_margin_percent_data31060,
    roa_pct           = return_on_total_assets_percent_data31015,
    roe_pct           = roe_percent_data31065,
    
    # Efficiency
    net_assets_turnover = net_assets_turnover_data31225,
    stock_turnover      = stock_turnover_data31220,
    
    # Leverage, solvency, and liquidity (Q2)
    solvency_pct        = solvency_ratio_percent_data31310,
    gearing_pct         = gearing_percent_data31315,
    current_ratio       = current_ratio_data31105,
    liquidity_ratio     = liquidity_ratio_data31110,
    shareholders_liq_pct= shareholders_liquidity_ratio_data31305,
    interest_cover      = interest_cover_data31115
  ) |>
  
  # Derive the fiscal year-end date and fiscal year from the closing date
  mutate(
    fy_end_date = lubridate::ymd(as.character(company_fiscal_year_end_date_closdate)),
    fy_year     = lubridate::year(fy_end_date)
  ) |>
  
  # Select only relevant columns
  select(
    company_id, company_name, fy_end_date, fy_year, acct_year, status, country, city, consolidation_code,
    nace_code, icb_code, sic_code,
    total_assets_eur, total_liabilities_eur, total_equity_eur,
    total_revenue_eur, net_sales_eur, gross_sales_eur, net_income_eur,
    ebit_margin_pct, ebitda_margin_pct, roa_pct, roe_pct,
    net_assets_turnover, stock_turnover,
    solvency_pct, gearing_pct, current_ratio, liquidity_ratio,
    shareholders_liq_pct, interest_cover
  )
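The fiscal-year derivation can be illustrated with base R alone. This is a minimal sketch that assumes closing dates arrive as YYYYMMDD values; the raw OSIRIS storage format is not shown in this report, so both the format and the toy values below are assumptions.

```r
# Hypothetical closing dates in YYYYMMDD form (assumed format)
closdate_raw <- c(20181231, 20190630, 20210331)

# Convert to Date, then pull out the calendar year of the fiscal year end
fy_end_date <- as.Date(as.character(closdate_raw), format = "%Y%m%d")
fy_year <- as.integer(format(fy_end_date, "%Y"))

data.frame(closdate_raw, fy_end_date, fy_year)
```

A firm closing its books in March 2021 is therefore assigned fy_year 2021, even though most of that reporting cycle falls in 2020.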

Variable Scope and Dimensions

| Column Name | Description | Data Type | Original Name |
| --- | --- | --- | --- |
| company_id | Unique company identifier (Bureau van Dijk ID). | character | BvD ID Number (os_id_number) |
| company_name | Registered company name (label). | character | Company Name (name) |
| fy_end_date | Company fiscal year-end date for the account. | date | Company Fiscal Year End Date (closdate) |
| fy_year | Year extracted from fiscal year-end date. | integer | derived from Company Fiscal Year End Date (closdate) |
| acct_year | Reporting year label carried from file name/field. | integer | year |
| country | Country of head office. | character | Country (country) |
| city | City of head office. | character | CITY - City (city) |
| consolidation_code | Consolidation scope of the accounts (e.g., C1/C2/U1/U2). | character | Consolidation Code (consol_code) |
| nace_code | NACE core code (EU industry classification). | character | NACE Rev 1, Core Code (cnacecd) |
| icb_code | ICB industry classification code. | character | Industrial Classification Benchmark (icb) |
| sic_code | US SIC core code. | character | US SIC, Core Code (csicuscde) |
| total_assets_eur | Total assets (EUR). | numeric | Total Assets (data13077) |
| total_liabilities_eur | Total liabilities and debt (EUR). | numeric | Total Liabilities and Debt (data14022) |
| total_equity_eur | Total shareholders’ equity (EUR). | numeric | Total Shareholders Equity (data14041) |
| total_revenue_eur | Total revenues / operating revenue (EUR). | numeric | Total revenues (data13004) |
| net_sales_eur | Net sales (EUR). | numeric | Net sales (data13002) |
| gross_sales_eur | Gross sales (EUR). | numeric | Gross sales (data13000) |
| net_income_eur | Net income (EUR). | numeric | Net Income / Starting Line (data15500) |
| ebit_margin_pct | EBIT margin (% of sales/revenue). | numeric | EBIT Margin (%) (data31055) |
| ebitda_margin_pct | EBITDA margin (% of sales/revenue). | numeric | EBITDA Margin (%) (data31060) |
| roa_pct | Return on total assets (%). | numeric | Return on Total Assets (%) (data31015) |
| roe_pct | Return on equity (%). | numeric | ROE (%) (data31065) |
| net_assets_turnover | Net assets turnover (times). | numeric | Net Assets Turnover (data31225) |
| stock_turnover | Stock / inventory turnover (times). | numeric | Stock Turnover (data31220) |
| solvency_pct | Solvency ratio (%). | numeric | Solvency ratio (%) (data31310) |
| gearing_pct | Gearing ratio (%). | numeric | Gearing (%) (data31315) |
| current_ratio | Current ratio (times). | numeric | Current ratio (data31105) |
| liquidity_ratio | Liquidity / quick ratio (times). | numeric | Liquidity ratio (data31110) |
| shareholders_liq_pct | Shareholders’ liquidity ratio (%). | numeric | Shareholders Liquidity ratio (data31305) |
| interest_cover | Interest cover (times). | numeric | Interest Cover (data31115) |

The selected variables encompass five key dimensions of corporate financial health:

  1. Identification and Location

    • company_id, company_name
    • country, city
    • consolidation_code

These variables define firm identity and reporting scope.

  2. Balance Sheet Structure

    • total_assets_eur
    • total_liabilities_eur
    • total_equity_eur

These measure firm size and capital structure.

  3. Income and Profitability

    • Revenue measures: total_revenue_eur, net_sales_eur, gross_sales_eur
    • Net income: net_income_eur
    • Profitability ratios: roa_pct, roe_pct, ebit_margin_pct, ebitda_margin_pct

These reflect operating performance and returns to capital.

  4. Liquidity and Solvency

    • current_ratio, liquidity_ratio
    • solvency_pct, gearing_pct
    • interest_cover, shareholders_liq_pct

These capture financial stability and short-term resilience.

  5. Industry Classification

    • nace_code
    • icb_code
    • sic_code

Industry identifiers enable sectoral comparison across countries.

Accounting Year Interpretation

The OSIRIS accounting year variable reflects financial reporting cycles that often span two calendar years. For example:

  • 2018 reflects financial results ending in 2018 (primarily covering 2017–2018 activity)
  • 2021 reflects financial performance ending in 2021 (capturing much of the 2020 – 2021 pandemic period)

Throughout this report, the accounting year is interpreted as the endpoint of the reporting cycle, meaning:

  • 2018–2019 represent pre-pandemic benchmarks
  • 2020–2021 represent pandemic-era financial outcomes

This interpretation ensures consistency when comparing financial conditions before and during the COVID-19 shock.
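The period convention can be written down as a small helper. The sketch below is illustrative only: the function name label_period and the label strings are hypothetical and are not taken from the report's code.

```r
# Hypothetical helper: map an accounting year to the period used in this report
label_period <- function(acct_year) {
  ifelse(
    acct_year %in% 2018:2019, "Pre-pandemic",
    ifelse(acct_year %in% 2020:2021, "Pandemic", NA_character_)
  )
}

# Years outside the 2018-2021 window are left unlabelled (NA)
period_labels <- label_period(c(2018, 2019, 2020, 2021, 2022))
period_labels
```

Returning NA for out-of-window years makes accidental inclusion of other years visible instead of silently assigning them to a period.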

3.1 Research Question

The following sub-questions guide the analysis of the primary question: How did the pandemic affect corporate financials in Germany and Sweden?

1. How did profitability change across industries?

This question investigates how profitability variables such as ROA, EBITDA margin, and EBIT margin shifted across industries between the pre-pandemic period (2018 to 2019) and the pandemic period (2020 to 2021) in Germany and Sweden. It aims to identify which sectors saw profit declines, remained stable, or grew during the pandemic. By comparing profitability trends across industries, the analysis highlights which sectors proved more resilient to the economic impacts of the pandemic. These insights help explain how the pandemic affected overall corporate performance and provide a clearer understanding of the differences in financial recovery among industries.

2. How did leverage (debt-to-assets) and liquidity (current ratio) evolve during the pandemic?

This question extends the Part 1 analysis by comparing how leverage (debt-to-assets) and liquidity (current ratio) evolved between Germany and Sweden from 2018 to 2021.
The goal is to determine whether firms in the two economies displayed similar balance-sheet responses to the COVID-19 shock.
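For concreteness, a debt-to-assets ratio can be computed from the balance-sheet columns described earlier. The sketch below assumes the ratio is defined as total liabilities divided by total assets; the exact definition used for this question is not shown in this section, and the toy values are invented.

```r
# Toy firm-level balance sheets (invented values)
toy_firms <- data.frame(
  company_id            = c("A", "B", "C"),
  total_liabilities_eur = c(60, 25, 90),
  total_assets_eur      = c(100, 50, 0)
)

# Assumed definition: leverage = total liabilities / total assets.
# Firms with non-positive assets get NA rather than Inf or NaN.
toy_firms$debt_to_assets <- ifelse(
  toy_firms$total_assets_eur > 0,
  toy_firms$total_liabilities_eur / toy_firms$total_assets_eur,
  NA_real_
)

toy_firms
```

The current ratio is already supplied as a vendor-computed column (current_ratio), so only the leverage side needs to be derived.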

3. Which Swedish industries showed unexpected financial resilience or vulnerability from 2019 to 2021, and how were these outcomes shaped by profitability, leverage, liquidity, and firm size?

Between 2019 and 2021, Swedish industries showed mixed financial resilience. Manufacturing, IT, and professional services remained strong, maintaining profitability and liquidity despite the pandemic, helped by Sweden’s lighter restrictions and export strength. In contrast, hospitality, transport, and retail were more vulnerable, with sharp declines in profitability and higher leverage as firms relied on debt to survive. Larger firms recovered faster due to stronger cash reserves and financing access, while smaller firms faced liquidity pressures. Overall, industries with high profitability and low leverage before 2020 proved most resilient through 2021.

4. How did the fundamental relationship between corporate liquidity and profitability evolve and fracture within industries during the pandemic?

This question compares firms in Germany and Sweden, two advanced European economies with different financial systems and policy responses. Return on assets (ROA) and the current ratio (CR) reflect how efficiently firms generate profit and how safely they can cover short-term obligations. The analysis explores how the relationship between ROA and CR evolved and fractured within industries during and after the pandemic, using time-series features and scagnostic metrics. The goal is to reveal which sectors were most resilient, which experienced unequal recovery, and how national environments influenced these patterns. Understanding the changing relationship between ROA and CR can provide insight into the financial resilience of different industries. It also helps identify whether recovery followed a stable path or diverged into unequal outcomes, such as K-shaped patterns, in which some firms recovered faster while others declined.

4 IDA

The Initial Data Analysis (IDA) evaluates data integrity, distributional behaviour, and comparability across Germany and Sweden before proceeding to substantive analysis.

Given that firm-level financial data are typically heavy-tailed, skewed, and heterogeneous across industries, this stage focuses on:

  • Structural integrity of firm-year observations.
  • Industry classification consistency.
  • Distribution shape and outlier behaviour.
  • Missingness patterns.
  • Justification for robust summaries.

1. Structural Integrity and Missingness Checks

The first step standardises numeric fields, cleans text variables, reports missingness, and performs sanity checks for duplicates and impossible values.

Show code
num_cols <- c(
  "total_assets_eur","total_liabilities_eur","total_equity_eur",
  "total_revenue_eur","net_sales_eur","gross_sales_eur","net_income_eur",
  "ebit_margin_pct","ebitda_margin_pct","roa_pct","roe_pct",
  "net_assets_turnover","stock_turnover",
  "solvency_pct","gearing_pct","current_ratio","liquidity_ratio",
  "shareholders_liq_pct","interest_cover"
)

#symbol formatting/coerce numerics
filtered_germany_sweden <- filtered_germany_sweden |>
  mutate(
    across(all_of(num_cols) & where(is.character), readr::parse_number),
    country = stringr::str_to_title(country),
    city    = stringr::str_to_title(city)
  )

#missingness report check (overall and by year)
missing_overall <- filtered_germany_sweden |>
  summarise(across(everything(), ~ mean(is.na(.)))) |>
  pivot_longer(everything(), names_to = "variable", values_to = "missing_rate")

missing_by_year <- filtered_germany_sweden |>
  group_by(acct_year) |>
  summarise(across(everything(), ~ mean(is.na(.)))) |>
  pivot_longer(-acct_year, names_to = "variable", values_to = "missing_rate")

#sanity checks
dup_firm_year <- filtered_germany_sweden |>
  count(company_id, fy_year, acct_year, country) |>
  filter(n > 1)

impossible_assets <- filtered_germany_sweden |>
  filter(!is.na(total_assets_eur) & total_assets_eur < 0)

consol_levels <- filtered_germany_sweden |>
  count(consolidation_code, sort = TRUE)

This block:

  • Parses numeric variables that were stored as text.
  • Standardises country and city labels.
  • Lists any duplicate firm-year records (dup_firm_year).
  • Flags impossible values such as negative total assets (impossible_assets).
  • Tabulates consolidation codes for review (consol_levels).

Ensuring structural integrity at this stage prevents distortion in later cross-country comparisons.
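The duplicate and impossible-value checks can be illustrated on a toy panel. This is a base-R sketch with invented data, not the report's own pipeline; it shows what a non-empty result from each check looks like.

```r
# Toy panel with one duplicated firm-year and one negative asset value
toy_panel <- data.frame(
  company_id       = c("DE001", "DE001", "DE001", "SE001", "SE001"),
  acct_year        = c(2018, 2019, 2019, 2018, 2019),
  total_assets_eur = c(500, 520, 520, -10, 75)
)

# Count rows per firm-year and keep combinations that occur more than once
firm_year_counts <- aggregate(
  n ~ company_id + acct_year,
  data = transform(toy_panel, n = 1),
  FUN  = sum
)
dup_firm_year <- firm_year_counts[firm_year_counts$n > 1, ]

# Number of records with negative total assets
n_negative_assets <- sum(toy_panel$total_assets_eur < 0, na.rm = TRUE)

dup_firm_year
n_negative_assets
```

In a real run, an empty dup_firm_year and a zero count would be the evidence that the panel is structurally clean.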

2. Industry Classification Mapping

To enable consistent sector-level analysis, 2-digit NACE codes are extracted and mapped to broad industry groups. ICB classification is used as a fallback where necessary. A unified industry_group variable is then created.

Show code
#clean 2-digit NACE as integer
#take the first two digits from the text form so leading zeros (e.g. "0111") are kept
filtered_germany_sweden <- filtered_germany_sweden |>
  mutate(
    nace_num2 = suppressWarnings(as.integer(str_extract(as.character(nace_code), "\\d{2}")))
  )

#map NACE divisions to broad section names
industry_from_nace <- function(x){
  case_when(
    x %in% 1:3 ~ "Agriculture, Forestry & Fishing",
    x %in% 5:9 ~ "Mining & Quarrying",
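    # NOTE: the second range (15:37) overlaps the 35 and 36:39 branches below;
    # case_when() uses the first match, so divisions 35-37 are labelled "Manufacturing"
    x %in% 10:33 | x %in% 15:37 ~ "Manufacturing",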
    x %in% 35 ~ "Electricity, Gas, Steam",
    x %in% 36:39 ~ "Water Supply & Waste",
    x %in% 41:43 ~ "Construction",
    x %in% 45:47 ~ "Wholesale & Retail Trade",
    x %in% 49:53 ~ "Transport & Storage",
    x %in% 55:56 ~ "Accommodation & Food",
    x %in% 58:63 ~ "Information & Communication",
    x %in% 64:66 ~ "Financial & Insurance",
    x %in% 68 ~ "Real Estate",
    x %in% 69:75 ~ "Professional, Scientific & Technical",
    x %in% 77:82 ~ "Administrative & Support",
    x %in% 84 ~ "Public Administration",
    x %in% 85 ~ "Education",
    x %in% 86:88 ~ "Human Health & Social Work",
    x %in% 90:93 ~ "Arts, Entertainment & Recreation",
    x %in% 94:96 ~ "Other Service Activities",
    x %in% 97:98 ~ "Household Activities",
    x %in% 99 ~ "Extraterritorial Organizations",
    TRUE ~ NA_character_
  )
}

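#ICB top-level sector labels, keyed by the bucket returned by normalize_icb()
icb_map <- c(
  "0000" = "Oil & Gas", #codes below 1000 (e.g. 0001, 0533) fall in bucket "0000"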
  "1000" = "Basic Materials",
  "2000" = "Industrials",
  "3000" = "Consumer Goods",
  "4000" = "Health Care",
  "5000" = "Consumer Services",
  "6000" = "Telecommunications",
  "7000" = "Utilities",
  "8000" = "Financials",
  "9000" = "Technology"
)

#normalize ICB to a 4-digit “bucket” (example: 2573 to 2000)
normalize_icb <- function(x){
  x_chr <- str_extract(as.character(x), "\\d+")
  ifelse(is.na(x_chr), NA_character_,
         sprintf("%04d", as.integer(floor(as.numeric(x_chr) / 1000) * 1000)))
}

filtered_germany_sweden <- filtered_germany_sweden |>
  mutate(
    industry_nace = industry_from_nace(nace_num2),
    icb_bucket    = normalize_icb(icb_code),
    industry_icb  = icb_map[icb_bucket],
    industry_group = coalesce(industry_nace, industry_icb, "Other / Unmapped")
  )

#check
unmapped_sample <- filtered_germany_sweden |>
  filter(industry_group == "Other / Unmapped") |>
  select(company_id, fy_year, acct_year, nace_code, icb_code) |>
  head(15)

#factor order for plots
ordered_levels <- c(
  "Agriculture, Forestry & Fishing","Mining & Quarrying","Manufacturing",
  "Electricity, Gas, Steam","Water Supply & Waste","Construction",
  "Wholesale & Retail Trade","Transport & Storage","Accommodation & Food",
  "Information & Communication","Financial & Insurance","Real Estate",
  "Professional, Scientific & Technical","Administrative & Support",
  "Public Administration","Education","Human Health & Social Work",
  "Arts, Entertainment & Recreation","Other Service Activities",
  "Household Activities","Extraterritorial Organizations",
  "Oil & Gas","Basic Materials","Industrials","Consumer Goods","Health Care",
  "Consumer Services","Telecommunications","Utilities","Financials","Technology",
  "Other / Unmapped"
)
filtered_germany_sweden <- filtered_germany_sweden |>
  mutate(industry_group = factor(industry_group, levels = ordered_levels))

This step ensures:

  • Comparable industry groupings across Germany and Sweden.
  • Reduced fragmentation from highly granular vendor codes.
  • Consistent sector ordering for visualisation.

The mapping prioritises NACE classifications and supplements them with ICB where missing.
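The two code-normalisation rules can be traced on a few example codes. The sketch below re-implements them in base R purely for illustration; the helper names icb_bucket_demo and nace_division_demo are hypothetical, and the example codes are invented.

```r
# Round an ICB code down to the nearest thousand and format as four digits
icb_bucket_demo <- function(icb_code) {
  code_num <- suppressWarnings(as.numeric(icb_code))
  ifelse(
    is.na(code_num), NA_character_,
    sprintf("%04d", as.integer(floor(code_num / 1000) * 1000))
  )
}

# First two characters of a NACE code stored as text, as an integer division
nace_division_demo <- function(nace_code) {
  as.integer(substr(as.character(nace_code), 1, 2))
}

icb_buckets    <- icb_bucket_demo(c("2573", "9537", "0533", NA))
nace_divisions <- nace_division_demo(c("0111", "2910", "6201"))

icb_buckets
nace_divisions
```

Two details are worth noting: ICB codes below 1000 land in bucket "0000", and a NACE code keeps its leading zero only while it is handled as text.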

3. Reshaping for Distribution Profiling

Numeric variables are reshaped into long format to support systematic profiling. Variables are separated into:

  • Scale variables (monetary totals).
  • Ratio variables (profitability, liquidity, leverage).
Show code
#using long data
ida_germany_sweden <- filtered_germany_sweden |>
  dplyr::select(acct_year, country, dplyr::all_of(num_cols)) |>
  tidyr::pivot_longer(dplyr::all_of(num_cols),
                      names_to = "variable_orig", values_to = "value") |>
  tidyr::drop_na()

#split between scale and ratio
scale_vars <- c("total_assets_eur","total_liabilities_eur","total_equity_eur",
                "total_revenue_eur","net_sales_eur","gross_sales_eur","net_income_eur")
ratio_vars <- setdiff(unique(ida_germany_sweden$variable_orig), scale_vars)

#label mapping for the charts
nice <- c(
  total_assets_eur   = "Total assets (EUR)",
  total_liabilities_eur="Total liabilities (EUR)",
  total_equity_eur   = "Equity (EUR)",
  total_revenue_eur  = "Total revenue (EUR)",
  net_sales_eur      = "Net sales (EUR)",
  gross_sales_eur    = "Gross sales (EUR)",
  net_income_eur     = "Net income (EUR)",
  ebit_margin_pct    = "EBIT margin (%)",
  ebitda_margin_pct  = "EBITDA margin (%)",
  roa_pct            = "ROA (%)",
  roe_pct            = "ROE (%)",
  net_assets_turnover= "Net assets turnover (x)",
  stock_turnover     = "Stock turnover (x)",
  solvency_pct       = "Solvency (%)",
  gearing_pct        = "Gearing (%)",
  current_ratio      = "Current ratio (x)",
  liquidity_ratio    = "Liquidity ratio (x)",
  shareholders_liq_pct = "Shareholders’ liquidity (%)",
  interest_cover     = "Interest cover (x)"
)

pretty_var <- function(x) ifelse(x %in% names(nice), nice[x], x)

ida_germany_sweden <- ida_germany_sweden |>
  mutate(variable = pretty_var(variable_orig),
         variable = factor(variable, levels = unique(variable)))

Reshaping facilitates uniform treatment across variables and supports consistent visual diagnostics. Separating scale and ratio variables allows appropriate transformation choices (log scale for monetary values, linear scale for ratios).
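The wide-to-long step can be shown on a toy table. This base-R sketch mirrors the idea of the reshape (one row per year, variable, and value, with missing values dropped) but is not the tidyr call used above; the data are invented.

```r
# Toy wide table: one row per firm-year, one column per metric
toy_wide <- data.frame(
  acct_year     = c(2018, 2019),
  current_ratio = c(1.5, NA),
  roa_pct       = c(4.2, 3.1)
)
metric_cols <- c("current_ratio", "roa_pct")

# Stack the metric columns into (acct_year, variable_orig, value) rows
toy_long <- data.frame(
  acct_year     = rep(toy_wide$acct_year, times = length(metric_cols)),
  variable_orig = rep(metric_cols, each = nrow(toy_wide)),
  value         = unlist(toy_wide[metric_cols], use.names = FALSE)
)

# Drop rows with missing values, as drop_na() does in the pipeline
toy_long <- toy_long[!is.na(toy_long$value), ]
toy_long
```

Because missing values are dropped per variable, each metric can end up with a different number of observations in later plots.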

Distributional Analysis

4. Scale Variables

Scale variables are examined using violin and jitter plots on a log scale, with medians highlighted.

4.1 Variable Group Definitions

Show code
#define variables sets
vars_balance <- c("total_assets_eur","total_liabilities_eur","total_equity_eur")
vars_flows   <- c("total_revenue_eur","net_sales_eur","gross_sales_eur","net_income_eur")

This code defines balance-sheet and flow variable groups for visualisation.

4.2 Balance Sheet Scale Variables

Show code
p_scale_balance <- ida_germany_sweden |>
  filter(variable_orig %in% vars_balance) |>
  mutate(
    y    = if_else(value > 0, value, NA_real_),
    year = factor(acct_year)
  ) |>
  ggplot(aes(x = year, y = y, colour = country,
             group = interaction(country, year))) +
  geom_violin(trim = FALSE, fill = NA, linewidth = 0.4, na.rm = TRUE,
              position = position_dodge(width = 0.75)) +
  geom_jitter(alpha = 0.12, size = 0.6, na.rm = TRUE,
              position = position_jitterdodge(jitter.width = 0.12,
                                              jitter.height = 0,
                                              dodge.width = 0.75)) +
  stat_summary(fun = median, geom = "point", shape = 95, size = 6, na.rm = TRUE,
               position = position_dodge(width = 0.75)) +
  scale_y_log10(labels = scales::label_number(scale_cut = scales::cut_short_scale())) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  labs(x = "Year", y = NULL, title = "Violin and Jitter by Year (log scale) for Balance Sheet Variables") +
  theme_minimal(base_size = 11) +
  theme(
    plot.title.position  = "plot",
    legend.position      = "top",
    legend.spacing.y     = unit(-0.6, "lines"),
    legend.justification = "center",
    legend.direction     = "horizontal",
    legend.box           = "horizontal",
    legend.margin        = margin(b = 4),
    axis.text.x          = element_text(size = 10),
    strip.text           = element_text(face = "bold"),
    panel.grid.minor     = element_blank()
  )

p_scale_balance

Balance-sheet variables are strongly right-skewed in both countries. Most firms cluster at lower asset levels, while a small number of very large firms create long upper tails.

Key findings:

  • Medians remain stable between 2018 and 2021.
  • Sweden shows slightly higher central levels in asset and liability measures.
  • Dispersion increases modestly during 2020 – 2021.

The heavy skewness justifies summarising these variables with the median and IQR rather than the mean.
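A small numeric example shows why the mean is a poor summary here. The values below are invented; they stand in for a sample in which one very large firm sits far above the rest.

```r
# Four small firms and one very large firm (invented values)
toy_assets <- c(1, 2, 3, 4, 1000)

asset_mean   <- mean(toy_assets)    # pulled up by the single large firm
asset_median <- median(toy_assets)  # stays with the typical firm
asset_iqr    <- IQR(toy_assets)     # spread of the middle half of firms

c(mean = asset_mean, median = asset_median, IQR = asset_iqr)
```

The mean (202) describes none of the five firms, while the median (3) and IQR (2) still describe the bulk of the sample.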

4.3 Flow Scale Variables

Show code
p_scale_flows <- ida_germany_sweden |>
  dplyr::filter(variable_orig %in% vars_flows) |>
  dplyr::mutate(
    y    = dplyr::if_else(value > 0, value, NA_real_),
    year = factor(acct_year)
  ) |>
  ggplot(aes(x = year, y = y, colour = country,
             group = interaction(country, year))) +
  geom_violin(trim = FALSE, fill = NA, linewidth = 0.4, na.rm = TRUE,
              position = position_dodge(width = 0.75)) +
  geom_jitter(alpha = 0.12, size = 0.6, na.rm = TRUE,
              position = position_jitterdodge(jitter.width = 0.12,
                                              jitter.height = 0,
                                              dodge.width = 0.75)) +
  stat_summary(fun = median, geom = "point", shape = 95, size = 6, na.rm = TRUE,
               position = position_dodge(width = 0.75)) +
  scale_y_log10(labels = scales::label_number(scale_cut = scales::cut_short_scale())) +
  facet_wrap(~ variable, scales = "free_y", ncol = 2) +
  labs(x = "Year", y = NULL, title = "Violin and Jitter by Year (log scale) for Flow Variables") +
  theme_minimal(base_size = 11) +
  theme(
    plot.title.position  = "plot",
    legend.position      = "top",
    legend.spacing.y     = unit(0.3, "lines"),
    legend.justification = "center",
    legend.direction     = "horizontal",
    legend.box           = "horizontal",
    legend.margin        = margin(t = 6, b = 2),
    axis.text.x          = element_text(size = 10),
    strip.text           = element_text(face = "bold"),
    panel.grid.minor     = element_blank()
  )

p_scale_flows

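Revenue and sales measures show similar right-skewed distributions. Net income exhibits a thicker lower tail in 2020 – 2021, indicating an increased incidence of low profitability during the pandemic. Because non-positive values are set to NA before plotting on the log scale, loss-making firm-years are not shown in this figure.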

Again, medians provide a more robust summary than means.
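Since a log axis cannot display zero or negative values, it helps to know how much of a variable is hidden by that choice. The sketch below uses invented net income values and the same rule as the plotting code (keep values greater than zero, otherwise NA).

```r
# Invented net income values, including a loss and a zero
toy_net_income <- c(-5, 0, 10, 100)

# Same rule as the plotting code: non-positive values become NA
plotted_values <- ifelse(toy_net_income > 0, toy_net_income, NA_real_)

# Share of observations that would not appear on the log-scale plot
share_excluded <- mean(is.na(plotted_values))
share_excluded
```

Reporting this share per country and year alongside the figure would show how many loss-making firm-years sit outside the plotted range.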

5. Ratio Variables

Ratio variables are visualised on their natural scale to assess dispersion and potential structural shifts.

5.1 Variable Group Definitions

Show code
#define variable sets
vars_profit_eff <- c(
  "ebit_margin_pct", "ebitda_margin_pct",
  "roa_pct", "roe_pct",
  "net_assets_turnover", "stock_turnover"
)

vars_liquidity_solvency <- c(
  "solvency_pct", "gearing_pct",
  "current_ratio", "liquidity_ratio",
  "shareholders_liq_pct", "interest_cover"
)

This block defines profitability/efficiency and liquidity/solvency variable groups.

5.2 Profitability and Efficiency Ratios

Show code
p_ratio_profit_eff <- ida_germany_sweden |>
  dplyr::filter(variable_orig %in% vars_profit_eff) |>
  dplyr::mutate(year = factor(acct_year)) |>
  ggplot(aes(x = year, y = value, colour = country,
             group = interaction(country, year))) +
  geom_violin(trim = FALSE, fill = NA, linewidth = 0.4, na.rm = TRUE,
              position = position_dodge(width = 0.75)) +
  geom_jitter(alpha = 0.12, size = 0.6, na.rm = TRUE,
              position = position_jitterdodge(jitter.width = 0.12,
                                              jitter.height = 0,
                                              dodge.width = 0.75)) +
  stat_summary(fun = median, geom = "point",
               shape = 95, size = 6, na.rm = TRUE,
               position = position_dodge(width = 0.75)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  labs(x = "Year", y = NULL,
       title = "Violin and Jitter by Year for Profitability & Efficiency Ratios") +
  theme_minimal(base_size = 11) +
  theme(
    axis.text.x = element_text(size = 10),
    strip.text  = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    legend.position = "top"
  )

p_ratio_profit_eff

Profitability ratios cluster near zero but widen during the pandemic years. ROE displays the greatest dispersion, reflecting sensitivity to equity base fluctuations.

Efficiency ratios remain relatively stable, suggesting operational turnover was less volatile than bottom-line profitability.

Heavy tails confirm that median-based comparisons are appropriate.
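
A minimal sketch of why medians are preferred here, using made-up ROE values rather than anything from the dataset: a single extreme observation drags the mean far from the bulk of the firms, while the median barely moves.

```r
# Hypothetical ROE values in percent; the last firm stands in for a heavy-tail case
roe_toy <- c(2, 3, 4, 5, -400)

mean_roe   <- mean(roe_toy)    # pulled towards the extreme value
median_roe <- median(roe_toy)  # stays with the bulk of the firms

print(c(mean = mean_roe, median = median_roe))
```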

5.3 Liquidity and Solvency Ratios

Show code
p_ratio_liq_solv <- ida_germany_sweden |>
  dplyr::filter(variable_orig %in% vars_liquidity_solvency) |>
  dplyr::mutate(year = factor(acct_year)) |>
  ggplot(aes(x = year, y = value, colour = country,
             group = interaction(country, year))) +
  geom_violin(trim = FALSE, fill = NA, linewidth = 0.4, na.rm = TRUE,
              position = position_dodge(width = 0.75)) +
  geom_jitter(alpha = 0.12, size = 0.6, na.rm = TRUE,
              position = position_jitterdodge(jitter.width = 0.12,
                                              jitter.height = 0,
                                              dodge.width = 0.75)) +
  stat_summary(fun = median, geom = "point",
               shape = 95, size = 6, na.rm = TRUE,
               position = position_dodge(width = 0.75)) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  labs(x = "Year", y = NULL,
       title = "Violin and Jitter by Year for Liquidity & Solvency Variables") +
  theme_minimal(base_size = 11) +
  theme(
    axis.text.x = element_text(size = 10),
    strip.text  = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    legend.position = "top"
  )

p_ratio_liq_solv

Liquidity measures are centred around operationally meaningful values, while gearing and interest coverage show long upper tails.

Sweden exhibits slightly greater dispersion across several ratios, consistent with earlier distributional findings.

Data Quality Diagnostics

6. Missingness Patterns

Missingness is examined by country and year.

Show code
key_vars <- c("total_assets_eur","total_liabilities_eur","total_equity_eur",
              "total_revenue_eur","net_income_eur",
              "ebit_margin_pct","ebitda_margin_pct","roa_pct","roe_pct",
              "current_ratio","liquidity_ratio","gearing_pct","solvency_pct")

# missingness by industry and country
missing_by_industry <- filtered_germany_sweden |>
  group_by(country, industry_group) |>
  summarise(across(all_of(key_vars), ~ mean(is.na(.)), .names = "{.col}"),
            .groups = "drop") |>
  pivot_longer(-c(country, industry_group),
               names_to = "variable", values_to = "missing_rate")

#heatmap by year for numeric variables
missing_heat <- filtered_germany_sweden |>
  group_by(country, acct_year) |>
  summarise(across(all_of(key_vars), ~ mean(is.na(.)), .names = "{.col}"),
            .groups = "drop") |>
  pivot_longer(-c(country, acct_year),
               names_to = "variable", values_to = "missing_rate")

ggplot(missing_heat,
       aes(variable, factor(acct_year), fill = missing_rate)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "white", high = "red",
                      labels = scales::percent_format(accuracy = 1)) +
  facet_wrap(~ country, nrow = 1) +
  labs(x = NULL, y = "Year",
       title = "Missingness by Year & Country (Numeric Variables)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Missingness is stable across 2018–2021 and does not spike during the pandemic. Ratio variables (particularly gearing and liquidity ratio) exhibit higher missing rates than core accounting totals.

Balance-sheet and revenue variables are nearly complete, ensuring reliable cross-country comparisons for scale measures.
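
The missing rates in the heatmap are simply the mean of an is.na() indicator within each country-year cell. The sketch below shows that calculation in base R on a small invented panel; the data values and the column is_missing are illustrative assumptions, not part of the report's pipeline.

```r
# Toy panel with hypothetical values; NA marks an unreported ratio
toy <- data.frame(
  country     = c("Germany", "Germany", "Sweden", "Sweden", "Sweden", "Germany"),
  acct_year   = c(2019, 2020, 2019, 2020, 2020, 2020),
  gearing_pct = c(60, NA, NA, 45, NA, 80)
)

# Share of missing values per country-year: the mean of a TRUE/FALSE indicator
toy$is_missing <- is.na(toy$gearing_pct)
missing_rate <- aggregate(is_missing ~ country + acct_year, data = toy, FUN = mean)

print(missing_rate)
```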

7. Outlier Detection

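Outliers are flagged using both the IQR rule (values more than 1.5 × IQR below the first quartile or above the third quartile) and a z-score threshold (|z| > 3). Both sets of thresholds are computed on the pooled two-country sample, and flagged observations are then counted by country.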

Show code
num_cols <- c("total_assets_eur","total_liabilities_eur","total_equity_eur",
              "total_revenue_eur","net_sales_eur","gross_sales_eur","net_income_eur",
              "ebit_margin_pct","ebitda_margin_pct","roa_pct","roe_pct",
              "net_assets_turnover","stock_turnover",
              "solvency_pct","gearing_pct","current_ratio","liquidity_ratio",
              "shareholders_liq_pct","interest_cover")

qfun <- function(x,p) quantile(x, probs=p, na.rm=TRUE, names=FALSE)

outliers_by_country <- purrr::map_dfr(num_cols, function(v){
  x  <- filtered_germany_sweden[[v]]
  ct <- filtered_germany_sweden$country
  
  q1 <- qfun(x,.25); q3 <- qfun(x,.75); iqr <- q3-q1
  lo <- q1 - 1.5*iqr; hi <- q3 + 1.5*iqr
  z  <- as.numeric(scale(x))
  
  tibble::tibble(
    variable = v,
    country  = ct,
    n        = !is.na(x),
    iqr_flag = (x < lo | x > hi),
    z_flag   = abs(z) > 3
  )
}) |>
  dplyr::group_by(variable, country) |>
  dplyr::summarise(
    n = sum(n),
    iqr_outliers = sum(iqr_flag, na.rm = TRUE),
    z_outliers   = sum(z_flag, na.rm = TRUE),
    share_iqr    = iqr_outliers/n,
    share_z      = z_outliers/n,
    .groups = "drop"
  ) |>
  dplyr::arrange(country, desc(share_iqr))

var_levels <- intersect(num_cols, unique(outliers_by_country$variable))

outliers_by_country_ord <- outliers_by_country |>
  dplyr::mutate(variable = factor(variable, levels = var_levels)) |>
  dplyr::arrange(variable, country)

outliers_by_country_ord |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)
variable country n iqr_outliers z_outliers share_iqr share_z
total_assets_eur Germany 8042 1117 57 0.14 0.01
total_assets_eur Sweden 7150 1451 145 0.20 0.02
total_liabilities_eur Germany 8038 1201 60 0.15 0.01
total_liabilities_eur Sweden 7150 1483 100 0.21 0.01
total_equity_eur Germany 8041 1070 38 0.13 0.00
total_equity_eur Sweden 7146 1482 145 0.21 0.02
total_revenue_eur Germany 8003 1045 79 0.13 0.01
total_revenue_eur Sweden 7094 1584 132 0.22 0.02
net_sales_eur Germany 7909 1028 79 0.13 0.01
net_sales_eur Sweden 7018 1581 132 0.23 0.02
gross_sales_eur Germany 7909 1033 77 0.13 0.01
gross_sales_eur Sweden 7013 1580 132 0.23 0.02
net_income_eur Germany 7643 1052 32 0.14 0.00
net_income_eur Sweden 6980 2404 131 0.34 0.02
ebit_margin_pct Germany 6554 1179 90 0.18 0.01
ebit_margin_pct Sweden 4920 1183 101 0.24 0.02
ebitda_margin_pct Germany 6736 1170 77 0.17 0.01
ebitda_margin_pct Sweden 5024 1079 92 0.21 0.02
roa_pct Germany 7938 391 83 0.05 0.01
roa_pct Sweden 6754 1344 270 0.20 0.04
roe_pct Germany 7717 468 81 0.06 0.01
roe_pct Sweden 6885 1317 226 0.19 0.03
net_assets_turnover Germany 7953 351 26 0.04 0.00
net_assets_turnover Sweden 6990 568 61 0.08 0.01
stock_turnover Germany 4859 962 171 0.20 0.04
stock_turnover Sweden 4178 617 86 0.15 0.02
solvency_pct Germany 8018 70 67 0.01 0.01
solvency_pct Sweden 7113 71 57 0.01 0.01
gearing_pct Germany 5891 517 183 0.09 0.03
gearing_pct Sweden 4457 213 75 0.05 0.02
current_ratio Germany 7786 989 237 0.13 0.03
current_ratio Sweden 7097 704 66 0.10 0.01
liquidity_ratio Germany 6217 794 178 0.13 0.03
liquidity_ratio Sweden 4394 356 10 0.08 0.00
shareholders_liq_pct Germany 7912 1166 150 0.15 0.02
shareholders_liq_pct Sweden 5632 1002 110 0.18 0.02
interest_cover Germany 7089 1014 108 0.14 0.02
interest_cover Sweden 5543 1394 128 0.25 0.02

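Outliers are common in scale variables due to heterogeneous firm size: for balance-sheet and sales totals, the IQR rule flags 13 – 15% of German and 20 – 23% of Swedish observations, rising to 34% for Swedish net income. Profitability ratios also show extreme values, particularly ROE.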

These patterns confirm:

  • Heavy-tailed distributions.
  • Need for robust summaries.
  • Potential benefit of log transformations for monetary variables.

No automatic winsorisation is applied at this stage; instead, robustness is addressed in later modelling decisions.
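
The gap between the IQR and z-score shares in the table is typical of heavy-tailed data. As a rough illustration on invented numbers (not drawn from the dataset), the sketch below applies both rules to one vector containing a single extreme value.

```r
# Hypothetical values: nine ordinary observations and one extreme one
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1000)

# IQR rule: flag values beyond 1.5 x IQR from the quartiles
q1  <- quantile(x, 0.25, names = FALSE)
q3  <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1
iqr_flag <- x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr

# z-score rule: flag |z| > 3, where z uses the sample mean and standard deviation
z      <- as.numeric(scale(x))
z_flag <- abs(z) > 3

print(data.frame(x, iqr_flag, z = round(z, 2), z_flag))
```

In this toy case the extreme value inflates the standard deviation enough that its own z-score stays below 3, whereas the quartile-based fences are unaffected. This is one reason the z-score rule tends to flag fewer observations than the IQR rule.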

8. Character Field Validation

Key identifiers and classification fields are cleaned and checked for missingness.

Show code
#character variables to check
char_vars <- c("company_id","company_name","country","city",
               "consolidation_code","nace_code","icb_code","sic_code")
char_vars <- intersect(char_vars, names(filtered_germany_sweden))

#cleaning
chars_clean <- filtered_germany_sweden |>
  mutate(
    across(all_of(char_vars),
           ~ .x |> as.character() |> str_squish() |> na_if("")),
    #casing
    country = str_to_title(country),
    city    = str_to_title(city),
    company_name = str_squish(company_name)
  )

#missingness check
char_missing <- chars_clean |>
  summarise(across(all_of(char_vars), ~ mean(is.na(.)))) |>
  tidyr::pivot_longer(everything(), names_to="variable", values_to="missing_rate")

char_missing |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)
variable missing_rate
company_id 0.00
company_name 0.00
country 0.00
city 0.00
consolidation_code 0.00
nace_code 0.02
icb_code 0.09
sic_code 0.00

Identifiers (company_id, company_name) are complete, enabling reliable panel tracking. Industry codes are largely complete, supporting consistent classification.
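
For readers without stringr, the cleaning step can be approximated in base R. This is a sketch under assumptions: clean_chr is a hypothetical helper, the input strings are invented, and trimws() plus gsub() only approximate what str_squish() does.

```r
# Hypothetical helper: trim, collapse internal whitespace, turn "" into NA
clean_chr <- function(x) {
  x <- gsub("\\s+", " ", trimws(as.character(x)))
  x[!is.na(x) & x == ""] <- NA_character_
  x
}

raw_names <- c("  Acme   AB ", "", NA, "Berlin")
cleaned   <- clean_chr(raw_names)

# Missing rate after cleaning: empty strings now count as missing
missing_rate_names <- mean(is.na(cleaned))

print(cleaned)
print(missing_rate_names)
```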

After completing the global IDA, targeted preparation steps were applied for each research question.

4.1 Question 1

I first added a period label (Pre-pandemic 2018 – 2019 vs. Pandemic 2020 – 2021). For visualisations, net_income_eur was transformed using asinh() so that gains and losses can be compared on a consistent scale. Where ROA or ROE were missing, I recomputed simple estimates from the accounting totals and recorded the source of each value (reported vs. recomputed).

Show code
stopifnot("net_income_eur" %in% names(filtered_germany_sweden))

ida_question1 <- filtered_germany_sweden |>
  mutate(
    # period flag
    period = if_else(acct_year >= 2020, "Pandemic (2020–2021)", "Pre-pandemic (2018–2019)"),
    # transform for visuals (handles negatives)
    net_income_eur_asinh = asinh(as.numeric(net_income_eur)),
    # recompute ROA / ROE (fallbacks)
    roa_calc = 100 * (net_income_eur / total_assets_eur),
    roe_calc = if_else(total_equity_eur > 0,
                       100 * (net_income_eur / total_equity_eur),
                       NA_real_),
    roa_use = coalesce(roa_pct, roa_calc),
    roe_use = coalesce(roe_pct, roe_calc),
    roa_src = case_when(
      !is.na(roa_pct) ~ "reported",
      !is.na(roa_calc) ~ "recomputed",
      TRUE ~ NA_character_
    ),
    roe_src = case_when(
      !is.na(roe_pct) ~ "reported",
      !is.na(roe_calc) ~ "recomputed",
      TRUE ~ NA_character_
    )
  )

This step:

  • Defines pre-pandemic vs pandemic periods
  • Transforms net income using asinh() to handle losses
  • Recomputes ROA and ROE when missing
  • Tracks provenance of values

This ensures transparency in profitability construction.
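
Two of these steps can be illustrated on invented numbers. The sketch below first shows that asinh() is symmetric around zero and close to sign(x) * log(2|x|) for large values, then reproduces the reported-else-recomputed fallback with a provenance label in base R. All values are hypothetical.

```r
# Hypothetical net income values in EUR, including losses and a zero
net_income_toy <- c(-1e6, -100, 0, 100, 1e6)

# asinh() is defined for negative values and zero, unlike log()
transformed <- asinh(net_income_toy)

# For large |x| it behaves like sign(x) * log(2 * |x|); zero is handled separately
approx_large <- ifelse(net_income_toy == 0, 0,
                       sign(net_income_toy) * log(2 * abs(net_income_toy)))

print(round(data.frame(net_income_toy, transformed, approx_large), 3))

# Fallback with provenance: prefer the reported ratio, else the recomputed one
roa_pct  <- c(4.0, NA, NA)
roa_calc <- c(3.9, 2.5, NA)
roa_use  <- ifelse(is.na(roa_pct), roa_calc, roa_pct)
roa_src  <- ifelse(!is.na(roa_pct), "reported",
                   ifelse(!is.na(roa_calc), "recomputed", NA_character_))

print(data.frame(roa_pct, roa_calc, roa_use, roa_src))
```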

Conservative Within-firm Gap Filling

Show code
#Firm one-gap fill within the same period only

fill_one_gap <- function(x){
  f <- dplyr::lag(x); b <- dplyr::lead(x)
  ifelse(is.na(x) & !is.na(f) & !is.na(b) & (f == b), f, x)
}

ida_question1 <- ida_question1 |>
  arrange(company_id, acct_year) |>
  group_by(company_id, period) |>  #period boundary blocks carryover
  mutate(
    ebit_step1   = fill_one_gap(ebit_margin_pct),
    ebitda_step1 = fill_one_gap(ebitda_margin_pct),
    roa_step1    = fill_one_gap(roa_use),
    roe_step1    = fill_one_gap(roe_use)
  ) |>
  ungroup() |>
  mutate(
    ebit_step1   = coalesce(ebit_margin_pct, ebit_step1),
    ebitda_step1 = coalesce(ebitda_margin_pct, ebitda_step1),
    roa_step1    = coalesce(roa_use, roa_step1),
    roe_step1    = coalesce(roe_use, roe_step1)
  )

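A one-gap rule fills an isolated missing value only when the firm's preceding and following observations within the same period are both present and equal, which prevents cross-period information leakage. Because each period spans only two accounting years, the rule applies very rarely.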
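
The behaviour of the rule is easiest to see on short vectors. The following is a base-R stand-in written for illustration only; fill_one_gap_base is a hypothetical name, and it mimics the lag and lead comparison without dplyr.

```r
# Base-R stand-in for the one-gap rule (hypothetical name); x holds one firm's
# values within a single period, ordered by year
fill_one_gap_base <- function(x) {
  prev_val <- c(NA, head(x, -1))   # value one step earlier
  next_val <- c(tail(x, -1), NA)   # value one step later
  fillable <- is.na(x) & !is.na(prev_val) & !is.na(next_val) & prev_val == next_val
  ifelse(fillable, prev_val, x)
}

filled_equal   <- fill_one_gap_base(c(5, NA, 5))  # neighbours agree: gap is filled
filled_unequal <- fill_one_gap_base(c(5, NA, 7))  # neighbours differ: gap stays NA
filled_short   <- fill_one_gap_base(c(5, NA))     # no later value: gap stays NA

print(list(filled_equal, filled_unequal, filled_short))
```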

Hierarchical Median Imputation

Show code
#median fills in strict order (no cross-period borrowing)
#country x industry × year medians
med_ixy <- ida_question1 |>
  group_by(country, industry_group, acct_year) |>
  summarise(
    med_ebit = median(ebit_step1,   na.rm=TRUE),
    med_ebitda = median(ebitda_step1, na.rm=TRUE),
    med_roa = median(roa_step1,    na.rm=TRUE),
    med_roe = median(roe_step1,    na.rm=TRUE),
    .groups="drop"
  )

#country x industry × period medians
med_ip <- ida_question1 |>
  group_by(country, industry_group, period) |>
  summarise(
    med_ebit_ip = median(ebit_step1,   na.rm=TRUE),
    med_ebitda_ip = median(ebitda_step1, na.rm=TRUE),
    med_roa_ip = median(roa_step1,    na.rm=TRUE),
    med_roe_ip = median(roe_step1,    na.rm=TRUE),
    .groups="drop"
  )

#country x period medians
med_p <- ida_question1 |>
  group_by(country, period) |>
  summarise(
    med_ebit_p = median(ebit_step1,   na.rm=TRUE),
    med_ebitda_p = median(ebitda_step1, na.rm=TRUE),
    med_roa_p = median(roa_step1,    na.rm=TRUE),
    med_roe_p = median(roe_step1,    na.rm=TRUE),
    .groups="drop"
  )

#join and impute with precedence, also explicit source flags
ida_question1 <- ida_question1 |>
  left_join(med_ixy, by = c("country","industry_group","acct_year")) |>
  left_join(med_ip,  by = c("country","industry_group","period")) |>
  left_join(med_p,   by = c("country", "period")) |>
  mutate(
    #final values: keep the observed or gap-filled value, otherwise fall back
    #in order: ind×year median, then ind×period median, then period median
    ebit_margin_q1   = if_else(is.na(ebit_step1),   coalesce(med_ebit,   med_ebit_ip,   med_ebit_p),   ebit_step1),
    ebitda_margin_q1 = if_else(is.na(ebitda_step1), coalesce(med_ebitda, med_ebitda_ip, med_ebitda_p), ebitda_step1),
    roa_q1           = if_else(is.na(roa_step1),    coalesce(med_roa,    med_roa_ip,    med_roa_p),    roa_step1),
    roe_q1           = if_else(is.na(roe_step1),    coalesce(med_roe,    med_roe_ip,    med_roe_p),    roe_step1),

    #provenance for EBIT/EBITDA margins
    ebit_src = dplyr::case_when(
      !is.na(ebit_margin_pct) ~ "reported",
      is.na(ebit_margin_pct) & !is.na(ebit_step1) ~ "firm-1gap(period)",
      is.na(ebit_step1) & !is.na(med_ebit) ~ "ind×year",
      is.na(ebit_step1) & is.na(med_ebit) & !is.na(med_ebit_ip) ~ "ind×period",
      TRUE ~ "period"
    ),
    ebitda_src = dplyr::case_when(
      !is.na(ebitda_margin_pct) ~ "reported",
      is.na(ebitda_margin_pct) & !is.na(ebitda_step1) ~ "firm-1gap(period)",
      is.na(ebitda_step1) & !is.na(med_ebitda) ~ "ind×year",
      is.na(ebitda_step1) & is.na(med_ebitda) & !is.na(med_ebitda_ip) ~ "ind×period",
      TRUE ~ "period"
    ),

    #ROA/ROE provenance
    roa_src = dplyr::case_when(
      !is.na(roa_src) ~ roa_src,
      is.na(roa_src) & !is.na(roa_step1) ~ "firm-1gap(period)",
      is.na(roa_step1) & !is.na(med_roa) ~ "ind×year",
      is.na(roa_step1) & is.na(med_roa) & !is.na(med_roa_ip) ~ "ind×period",
      TRUE ~ "period"
    ),
    roe_src = dplyr::case_when(
      !is.na(roe_src) ~ roe_src,
      is.na(roe_src) & !is.na(roe_step1) ~ "firm-1gap(period)",
      is.na(roe_step1) & !is.na(med_roe) ~ "ind×year",
      is.na(roe_step1) & is.na(med_roe) & !is.na(med_roe_ip) ~ "ind×period",
      TRUE ~ "period"
    )
  ) |>
  select(-ends_with("_step1"))

Remaining gaps are filled using a strict hierarchy:

  • Country × Industry × Year median.
  • Country × Industry × Period median.
  • Country × Period median.

All imputation sources are explicitly recorded.
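
The precedence logic can be sketched on four invented rows, one per fallback level. The helper first_non_na is a hypothetical base-R substitute for dplyr::coalesce(), and the source labels are written in plain ASCII here rather than in the report's exact wording.

```r
# Hypothetical helper: element-wise first non-missing value, like dplyr::coalesce()
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

# Invented values: each row runs out of information one level later
value      <- c(1.0, NA, NA, NA)     # observed or gap-filled value
med_year   <- c(9.0, 2.0, NA, NA)    # country x industry x year median
med_period <- c(9.0, 9.0, 3.0, NA)   # country x industry x period median
med_ctry   <- c(9.0, 9.0, 9.0, 4.0)  # country x period median

final <- first_non_na(value, med_year, med_period, med_ctry)

# Provenance label follows the same order of precedence
src <- ifelse(!is.na(value), "reported",
       ifelse(!is.na(med_year), "ind-year",
       ifelse(!is.na(med_period), "ind-period", "period")))

print(data.frame(final, src))
```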

Imputation Audit Tables

Show code
impute_summary_q1 <- ida_question1 |>
  summarise(
    .by = c(country, period),
    n_rows = n(),
    ebit_reported   = sum(ebit_src   == "reported",            na.rm=TRUE),
    ebit_firmgap    = sum(ebit_src   == "firm-1gap(period)",   na.rm=TRUE),
    ebit_ind_year   = sum(ebit_src   == "ind×year",            na.rm=TRUE),
    ebit_ind_period = sum(ebit_src   == "ind×period",          na.rm=TRUE),
    ebit_period     = sum(ebit_src   == "period",              na.rm=TRUE),

    ebitda_reported   = sum(ebitda_src   == "reported",          na.rm=TRUE),
    ebitda_firmgap    = sum(ebitda_src   == "firm-1gap(period)", na.rm=TRUE),
    ebitda_ind_year   = sum(ebitda_src   == "ind×year",          na.rm=TRUE),
    ebitda_ind_period = sum(ebitda_src   == "ind×period",        na.rm=TRUE),
    ebitda_period     = sum(ebitda_src   == "period",            na.rm=TRUE),

    roa_reported    = sum(roa_src == "reported",            na.rm=TRUE),
    roa_recomputed  = sum(roa_src == "recomputed",          na.rm=TRUE),
    roa_firmgap     = sum(roa_src == "firm-1gap(period)",   na.rm=TRUE),
    roa_ind_year    = sum(roa_src == "ind×year",            na.rm=TRUE),
    roa_ind_period  = sum(roa_src == "ind×period",          na.rm=TRUE),
    roa_period      = sum(roa_src == "period",              na.rm=TRUE),

    roe_reported    = sum(roe_src == "reported",            na.rm=TRUE),
    roe_recomputed  = sum(roe_src == "recomputed",          na.rm=TRUE),
    roe_firmgap     = sum(roe_src == "firm-1gap(period)",   na.rm=TRUE),
    roe_ind_year    = sum(roe_src == "ind×year",            na.rm=TRUE),
    roe_ind_period  = sum(roe_src == "ind×period",          na.rm=TRUE),
    roe_period      = sum(roe_src == "period",              na.rm=TRUE)
  )

Imputation is limited for ROA and ROE, which are largely reported in both countries. EBIT and EBITDA margins rely more heavily on median fills, and Germany shows higher direct reporting coverage than Sweden for these margins.

  1. EBIT & EBITDA Imputation
Show code
impute_margins <- impute_summary_q1 |>
  select(country, period, n_rows,
         ebit_reported, ebit_ind_year, ebit_period, ebit_firmgap,
         ebitda_reported, ebitda_ind_year, ebitda_period, ebitda_firmgap)

impute_margins |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)
country period n_rows ebit_reported ebit_ind_year ebit_period ebit_firmgap ebitda_reported ebitda_ind_year ebitda_period ebitda_firmgap
Germany Pre-pandemic (2018–2019) 4071 3326 745 0 0 3399 672 0 0
Germany Pandemic (2020–2021) 3975 3228 747 0 0 3337 638 0 0
Sweden Pre-pandemic (2018–2019) 3477 2384 1089 4 0 2427 1046 4 0
Sweden Pandemic (2020–2021) 3675 2536 1135 4 0 2597 1073 4 1

According to the table, Germany shows strong coverage of EBIT and EBITDA margins in both periods, with about 81 – 84% of values reported directly. The remaining 16 – 19% are filled using industry × year medians, and Germany does not need any firm-gap or period-level imputations. In comparison, Sweden has lower direct reporting, around 69 – 71% across both periods. Approximately 29 – 31% of Swedish records depend on industry × year medians, a higher dependence than in Germany. Only four rows in each period use a period-level fallback, and a single EBITDA value in the pandemic period is filled by the firm-gap rule, so the coarser fallbacks are rarely needed. Overall, imputations are present in both countries, but Sweden requires more support due to lower direct reporting.
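
The reporting shares quoted for EBIT can be reproduced directly from the counts in the table. The sketch below copies those counts by hand into a small data frame; the object and column names are illustrative.

```r
# Counts copied from the EBIT columns of the table above
ebit_counts <- data.frame(
  country  = c("Germany", "Germany", "Sweden", "Sweden"),
  period   = c("Pre-pandemic", "Pandemic", "Pre-pandemic", "Pandemic"),
  n_rows   = c(4071, 3975, 3477, 3675),
  reported = c(3326, 3228, 2384, 2536),
  ind_year = c(745, 747, 1089, 1135)
)

# Shares in percent of all rows in each country-period cell
ebit_counts$pct_reported <- round(100 * ebit_counts$reported / ebit_counts$n_rows, 1)
ebit_counts$pct_ind_year <- round(100 * ebit_counts$ind_year / ebit_counts$n_rows, 1)

print(ebit_counts)
```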

  2. ROA Imputation
Show code
impute_roa <- impute_summary_q1 |>
  select(country, period, n_rows,
         roa_reported, roa_recomputed, roa_ind_year, roa_ind_period, roa_period)

impute_roa |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)
country period n_rows roa_reported roa_recomputed roa_ind_year roa_ind_period roa_period
Germany Pre-pandemic (2018–2019) 4071 4008 52 11 0 0
Germany Pandemic (2020–2021) 3975 3930 43 2 0 0
Sweden Pre-pandemic (2018–2019) 3477 3269 201 7 0 0
Sweden Pandemic (2020–2021) 3675 3485 179 10 0 0

According to the table, ROA is nearly fully available in Germany, with 98 – 99% of rows reported directly in both periods. Only about 1% of observations depend on recomputation from accounting totals, and industry × year replacement is below 0.3%. Sweden also shows strong ROA coverage, with around 94 – 95% of rows reported directly. However, Sweden relies more on recomputation, approximately 5 – 6% in each period. Industry-level fallback is very limited in Sweden as well (below 0.3%). Overall, data quality remains stable in both countries, though Sweden requires slightly more reconstruction of ROA than Germany.

  3. ROE Imputation
Show code
impute_roe <- impute_summary_q1 |>
  select(country, period, n_rows,
         roe_reported, roe_recomputed, roe_ind_year, roe_ind_period, roe_period)

impute_roe |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)
country period n_rows roe_reported roe_recomputed roe_ind_year roe_ind_period roe_period
Germany Pre-pandemic (2018–2019) 4071 3902 16 153 0 0
Germany Pandemic (2020–2021) 3975 3815 18 142 0 0
Sweden Pre-pandemic (2018–2019) 3477 3339 37 101 0 0
Sweden Pandemic (2020–2021) 3675 3546 27 102 0 0

According to the table, ROE availability is strong in both countries, with about 96% of records reported directly in each period. In Germany, around 0.4 – 0.5% of values were recomputed and roughly 3.6 – 3.8% used industry × year medians. In Sweden, recomputation is slightly more common (about 0.7 – 1.1%), while industry × year medians account for roughly 2.8 – 2.9%, so the two fallbacks combined cover about 4% of observations in either country. No industry × period or period-level imputations occurred. Overall, both datasets provide reliable ROE values, with comparable reliance on fallbacks.

Direct reporting coverage is higher in Germany than in Sweden for EBIT and EBITDA margins and for ROA, which suggests stronger raw data completeness, although both datasets remain sufficiently robust for comparative analysis. Importantly, the imputation hierarchy preserves cross-country comparability while minimising distortion from extreme firm-level observations.

4.2 Question 2

The Initial Data Analysis (IDA) for this question focused on constructing and validating key indicators of leverage and liquidity for both Germany and Sweden during 2018 – 2021. Specifically, debt-to-assets and equity ratio were derived from total liabilities, equity, and assets to measure firms’ capital structure and solvency capacity. Data cleaning restricted the sample to active firms and dropped observations with missing debt-to-assets or current ratio values; the code shown does not additionally screen out implausible ratios (e.g., leverage > 150%) or non-finite values. These metrics were summarised using the median and interquartile range (IQR) to capture typical firm behaviour and variability by year and country, ensuring comparability between economies before conducting visual exploration.

Show code
# Create leverage and liquidity indicators for both countries
lev_liq_data <- filtered_germany_sweden |>
  filter(acct_year %in% 2018:2021, status == "Active") |>
  mutate(
    debt_to_assets = total_liabilities_eur / total_assets_eur,
    equity_ratio   = total_equity_eur / total_assets_eur
  ) |>
  select(acct_year, country, industry_group,
         debt_to_assets, equity_ratio, gearing_pct,
         solvency_pct, current_ratio) |>
  drop_na(debt_to_assets, current_ratio)

# Summarise by year and country
lev_liq_summary <- lev_liq_data |>
  group_by(country, acct_year) |>
  summarise(
    median_debt_assets = median(debt_to_assets, na.rm = TRUE),
    iqr_debt_assets    = IQR(debt_to_assets, na.rm = TRUE),
    median_liquidity   = median(current_ratio, na.rm = TRUE),
    iqr_liquidity      = IQR(current_ratio, na.rm = TRUE),
    .groups = "drop"
  )

lev_liq_summary |>
  knitr::kable(
    caption = "Median and IQR of Leverage and Liquidity by Country (2018–2021)",
    digits = 2
  ) |>
  kableExtra::kable_styling(full_width = FALSE, font_size = 11)
Median and IQR of Leverage and Liquidity by Country (2018–2021)
country acct_year median_debt_assets iqr_debt_assets median_liquidity iqr_liquidity
Germany 2018 0.50 0.37 1.9 2.6
Germany 2019 0.51 0.38 1.8 2.5
Germany 2020 0.51 0.38 1.8 2.5
Germany 2021 0.51 0.37 1.8 2.4
Sweden 2018 0.46 0.40 1.6 2.2
Sweden 2019 0.48 0.40 1.6 1.9
Sweden 2020 0.47 0.41 1.6 2.1
Sweden 2021 0.43 0.40 1.7 2.7

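Median liquidity was stable in both countries, at about 1.8 – 1.9× in Germany and 1.6 – 1.7× in Sweden. Median leverage hovered around 0.50 – 0.51 in Germany and ranged from 0.43 to 0.48 in Sweden. Liquidity dispersion was higher in Germany (IQR 2.4 – 2.6) than in Sweden (IQR 1.9 – 2.2) until 2021, when the Swedish IQR rose to 2.7. Leverage dispersion was slightly higher in Sweden (IQR 0.40 – 0.41 versus 0.37 – 0.38 in Germany), suggesting a broader mix of capital structures.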

The stability of median leverage during 2020 – 2021 indicates that firms did not materially increase debt exposure despite favourable credit conditions. This suggests that pandemic-related stress manifested more through profitability pressures than through widespread balance-sheet deterioration.
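
The two constructed ratios are linked by the balance-sheet identity: when assets equal liabilities plus equity, debt-to-assets and the equity ratio sum to one. The sketch below checks this on invented firms and computes the median and IQR by country in base R. All figures are hypothetical, and real records need not satisfy the identity exactly.

```r
# Invented balance sheets; equity is set so that assets = liabilities + equity
firms <- data.frame(
  country               = c("Germany", "Germany", "Germany", "Sweden", "Sweden", "Sweden"),
  total_assets_eur      = c(100, 200, 400, 100, 250, 500),
  total_liabilities_eur = c(50, 120, 180, 40, 100, 300)
)
firms$total_equity_eur <- firms$total_assets_eur - firms$total_liabilities_eur

firms$debt_to_assets <- firms$total_liabilities_eur / firms$total_assets_eur
firms$equity_ratio   <- firms$total_equity_eur / firms$total_assets_eur

# Robust summaries by country
med_lev <- tapply(firms$debt_to_assets, firms$country, median)
iqr_lev <- tapply(firms$debt_to_assets, firms$country, IQR)

print(rbind(median = med_lev, iqr = iqr_lev))
```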

4.3 Question 3

  1. Financial Indicators Summaries
Show code
industry_financial_summary <- filtered_germany_sweden |> 
  filter(country == "Germany", !is.na(industry_group), acct_year >= 2019, acct_year <= 2021) |> 
  mutate(period = ifelse(acct_year < 2020, "Pre-pandemic", "Pandemic")) |>
  group_by(industry_group, period) |> 
  summarise(
    n_firms = n(),
    median_profit = median(net_income_eur, na.rm = TRUE),
    median_roa = median(roa_pct, na.rm = TRUE),
    median_liq = median(current_ratio, na.rm = TRUE),
    median_lev = median(gearing_pct, na.rm = TRUE),
    .groups = "drop"     
  ) |> 
  arrange(industry_group, period)

# Display table
knitr::kable(
  industry_financial_summary,
  digits = 2,
  booktabs = TRUE,
  caption = "Industry-level summary as per IDA procedure"
)
Industry-level summary as per IDA procedure
industry_group period n_firms median_profit median_roa median_liq median_lev
Manufacturing Pandemic 1690 0 2.48 1.91 64.4
Manufacturing Pre-pandemic 893 0 3.84 1.94 64.4
Construction Pandemic 7 94700 5.33 0.95 144.1
Construction Pre-pandemic 4 54500 7.48 1.04 98.1
Wholesale & Retail Trade Pandemic 43 6200 5.19 1.43 81.2
Wholesale & Retail Trade Pre-pandemic 22 1400 4.91 1.60 81.2
Transport & Storage Pandemic 273 0 2.61 1.62 71.2
Transport & Storage Pre-pandemic 142 0 2.37 1.74 49.5
Accommodation & Food Pandemic 13 -18900 -3.18 1.79 34.9
Accommodation & Food Pre-pandemic 10 0 -4.78 1.79 54.2
Information & Communication Pandemic 120 0 2.58 1.28 99.4
Information & Communication Pre-pandemic 75 381 4.22 1.41 117.9
Financial & Insurance Pandemic 149 0 3.28 1.46 70.2
Financial & Insurance Pre-pandemic 79 0 1.60 1.43 74.2
Professional, Scientific & Technical Pandemic 873 0 3.65 1.84 72.4
Professional, Scientific & Technical Pre-pandemic 463 0 3.88 1.67 73.9
Administrative & Support Pandemic 8 0 -3.14 1.71 21.9
Administrative & Support Pre-pandemic 3 106 11.78 3.34 45.9
Education Pandemic 58 0 0.26 1.41 52.6
Education Pre-pandemic 31 0 3.34 1.97 46.4
Arts, Entertainment & Recreation Pandemic 163 0 0.00 0.97 38.3
Arts, Entertainment & Recreation Pre-pandemic 88 0 1.92 1.20 29.2
Basic Materials Pandemic 6 -596 -14.53 0.85 15.4
Basic Materials Pre-pandemic 4 -801 -16.47 0.72 29.2
Industrials Pandemic 53 0 -1.58 2.00 67.2
Industrials Pre-pandemic 25 0 -0.17 2.51 40.6
Consumer Goods Pandemic 8 -32 -6.03 0.86 55.9
Consumer Goods Pre-pandemic 4 -32 -6.22 NA NA
Health Care Pandemic 6 -638 -2.19 25.58 NA
Health Care Pre-pandemic 4 -638 -2.47 2.23 NA
Consumer Services Pandemic 10 0 -1.30 1.77 28.3
Consumer Services Pre-pandemic 8 0 -3.34 0.48 41.6
Utilities Pandemic 63 34573 2.29 1.07 176.8
Utilities Pre-pandemic 36 0 2.21 1.15 161.2
Financials Pandemic 354 0 1.96 2.42 42.8
Financials Pre-pandemic 192 0 2.10 2.21 34.0
Technology Pandemic 14 902 0.34 5.30 1.0
Technology Pre-pandemic 13 509 0.50 4.31 2.2
Other / Unmapped Pandemic 64 0 -2.09 1.45 29.3
Other / Unmapped Pre-pandemic 33 0 0.00 1.55 50.2

This code generates an industry-level summary of key financial indicators for German firms from 2019 – 2021, distinguishing between the pre-pandemic year (2019) and the pandemic period (2020 – 2021). It calculates the median net income, ROA, liquidity (current ratio), and leverage (gearing) for each industry group, which keeps the summary robust to outliers. Note that n_firms counts firm-year rows rather than distinct firms, so the two-year pandemic period has roughly twice as many rows as the single pre-pandemic year. The resulting summary enables comparison of financial performance across industries, highlighting shifts in profitability and stability during the pandemic.
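
The n_firms column in the table is produced by n(), which counts rows. A small sketch on an invented two-firm panel shows how a row count differs from a count of distinct firms when the periods cover different numbers of years; the identifiers and years are hypothetical.

```r
# Invented panel: two firms observed in each of three years
panel <- data.frame(
  company_id = c("A", "A", "A", "B", "B", "B"),
  acct_year  = c(2019, 2020, 2021, 2019, 2020, 2021)
)
panel$period <- ifelse(panel$acct_year < 2020, "Pre-pandemic", "Pandemic")

# Row count per period versus number of distinct firms per period
n_rows  <- tapply(panel$company_id, panel$period, length)
n_firms <- tapply(panel$company_id, panel$period, function(id) length(unique(id)))

print(rbind(n_rows, n_firms))
```

Here both periods contain the same two firms, yet the pandemic period shows twice as many rows because it spans two years.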

  2. Outlier Detection (IQR Rule)
Show code
outlier_summary <- function(x) {
  q1 <- quantile(x, 0.25, na.rm=TRUE)
  q3 <- quantile(x, 0.75, na.rm=TRUE)
  iqr <- q3 - q1
  lower <- q1 - 1.5 * iqr
  upper <- q3 + 1.5 * iqr
  sum(x < lower | x > upper, na.rm=TRUE)
}
financial_vars <- c("ebit_margin_pct", "gearing_pct", "current_ratio", "total_assets_eur")
outlier_counts_sweden <- filtered_germany_sweden  |> 
  filter(country == "Sweden", acct_year >= 2019, acct_year <= 2021)  |> 
  summarise(across(all_of(financial_vars), outlier_summary))

knitr::kable(outlier_counts_sweden, caption="Outlier counts for Sweden: Key financial variables (IQR rule)")
Outlier counts for Sweden: Key financial variables (IQR rule)
ebit_margin_pct gearing_pct current_ratio total_assets_eur
808 180 602 973

This code applies the interquartile range (IQR) rule to detect outliers among key financial variables for Swedish firms between 2019 and 2021. For each variable (EBIT margin, gearing ratio, current ratio, and total assets), the function calculates the first (Q1) and third (Q3) quartiles, determines the IQR, and counts values lying more than 1.5 × IQR below Q1 or above Q3 as outliers. The resulting table reports the count of extreme observations per variable. This non-parametric approach helps assess data quality and detect potential anomalies before subsequent statistical analysis.

  3. Data Completeness
Show code
# Prepare long-format missingness data by country, year, and variable
missing_long_compare <- filtered_germany_sweden |>  
  filter(country %in% c("Sweden", "Germany"), acct_year >= 2018, acct_year <= 2021) |>  
  select(country, acct_year, all_of(financial_vars)) |>  
  pivot_longer(-c(country, acct_year), names_to = "variable", values_to = "value") |>  
  mutate(missing = is.na(value)) |>  
  group_by(country, acct_year, variable) |>  
  summarise(prop_missing = mean(missing), .groups = "drop")

# Ribbon plot
ggplot(missing_long_compare, aes(x = acct_year, y = prop_missing, fill = country, group = country)) +
  geom_ribbon(aes(ymin = 0, ymax = prop_missing), alpha = 0.3) +
  geom_line(aes(color = country), linewidth = 1) +
  facet_wrap(~variable, scales = "free_y") +
  labs(
    title = "Missingness Trends by Variable for Sweden and Germany (2018–2021)",
    x = "Year",
    y = "Proportion Missing",
    fill = "Country",
    color = "Country"
  ) +
  scale_y_continuous(labels = function(x) sprintf("%.4f", x)) +
  theme_bw(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 11),
    axis.text.x = element_text(face = "bold"),
    legend.position = "right"
  )

This plot visualises data completeness across key financial variables for Sweden and Germany between 2018 and 2021. It shows the proportion of missing values for each variable and year, allowing a direct comparison of data quality between the two countries. The ribbon and line format highlights both the level and trend of missingness over time: consistently low proportions indicate reliable data coverage, while noticeable peaks suggest potential reporting or collection gaps. Because the y-axis is free across panels, levels should be compared within a panel rather than between panels.

  4. Comparative Summary Table: Germany vs Sweden (2019 – 2021)
Show code
combined_financial_summary <- filtered_germany_sweden  |> 
  filter(country %in% c("Germany", "Sweden"), !is.na(industry_group), acct_year >= 2019, acct_year <= 2021)  |> 
  group_by(country, industry_group)  |> 
  summarise(
    median_roa = median(roa_pct, na.rm = TRUE),
    median_liq = median(current_ratio, na.rm = TRUE),
    median_lev = median(gearing_pct, na.rm = TRUE),
    .groups = "drop"
  )  |> 
  pivot_wider(names_from = country, values_from = c(median_roa, median_liq, median_lev))

knitr::kable(combined_financial_summary, 
             digits = 2, 
             booktabs = TRUE, 
             caption = "Comparison of Key Financial Medians by Industry: Germany vs Sweden (2019-2021)")
Comparison of Key Financial Medians by Industry: Germany vs Sweden (2019-2021)
| industry_group | median_roa_Germany | median_roa_Sweden | median_liq_Germany | median_liq_Sweden | median_lev_Germany | median_lev_Sweden |
| --- | --- | --- | --- | --- | --- | --- |
| Manufacturing | 3.11 | -5.17 | 1.92 | 1.95 | 64.4 | 47.2 |
| Construction | 6.27 | -40.70 | 1.01 | 1.17 | 124.4 | 28.8 |
| Wholesale & Retail Trade | 5.19 | 5.82 | 1.49 | 1.35 | 81.2 | 79.6 |
| Transport & Storage | 2.52 | -2.34 | 1.62 | 1.71 | 62.6 | 52.5 |
| Accommodation & Food | -3.94 | 4.57 | 1.79 | 0.52 | 50.0 | 7.5 |
| Information & Communication | 3.08 | -6.38 | 1.31 | 0.79 | 102.9 | 218.9 |
| Financial & Insurance | 3.13 | 0.05 | 1.45 | 1.14 | 72.8 | 44.5 |
| Professional, Scientific & Technical | 3.76 | 2.91 | 1.79 | 1.30 | 73.1 | 65.5 |
| Administrative & Support | -2.07 | 5.83 | 1.98 | 0.90 | 21.9 | 176.7 |
| Education | 2.76 | -3.85 | 1.93 | 1.46 | 52.6 | 44.8 |
| Arts, Entertainment & Recreation | 0.16 | -4.85 | 1.07 | 1.00 | 35.5 | 62.0 |
| Basic Materials | -15.81 | NA | 0.85 | NA | 29.2 | NA |
| Industrials | -0.32 | -7.09 | 2.04 | 1.80 | 61.6 | 72.6 |
| Consumer Goods | -6.12 | NA | 0.86 | NA | 55.9 | NA |
| Health Care | -2.47 | 7.58 | 15.53 | 1.01 | NA | 63.4 |
| Consumer Services | -2.18 | -24.76 | 1.38 | 1.09 | 34.8 | 1.3 |
| Utilities | 2.29 | 4.60 | 1.08 | 1.18 | 168.3 | 93.8 |
| Financials | 2.05 | 3.54 | 2.31 | 3.45 | 40.1 | 20.4 |
| Technology | 0.34 | 5.41 | 4.31 | 2.98 | 1.6 | 19.0 |
| Other / Unmapped | -1.36 | -1.46 | 1.45 | 2.33 | 46.4 | 99.3 |

This table reports median return on assets (ROA), liquidity (current ratio), and leverage (gearing) by industry for Germany and Sweden over 2019–2021. Contrasting the medians highlights cross-country differences in financial performance and stability at the industry level. For instance, the median current ratio in Accommodation & Food is 1.79 in Germany against 0.52 in Sweden, which points to stronger short-term solvency among the German firms, while median gearing in Construction is 124.4% in Germany against 28.8% in Sweden, which points to a more conservative financing structure among the Swedish firms. NA entries mark industry and country combinations with no usable observations for that indicator. Overall, the table gives a concise overview of how industries in the two economies fared financially over these years and supports targeted comparisons of resilience and financial health.

These sectoral differences indicate that financial resilience was shaped primarily by pre-pandemic capital structure and profitability levels rather than by uniform macroeconomic shock effects. The pandemic therefore amplified existing structural strengths and weaknesses across industries.
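
To make the cross-country contrasts easier to scan, the medians can be turned into Germany-minus-Sweden gaps. The sketch below is illustrative only: it copies three rows from the summary table above, and the names `medians`, `roa_gap`, and `lev_gap` are assumptions introduced for the example.

```r
# Three rows copied from the summary table above (medians, 2019-2021)
medians <- data.frame(
  industry_group     = c("Construction", "Utilities", "Technology"),
  median_roa_Germany = c(6.27, 2.29, 0.34),
  median_roa_Sweden  = c(-40.70, 4.60, 5.41),
  median_lev_Germany = c(124.4, 168.3, 1.6),
  median_lev_Sweden  = c(28.8, 93.8, 19.0)
)

# Germany minus Sweden; positive values mean the German median is higher
medians$roa_gap <- medians$median_roa_Germany - medians$median_roa_Sweden
medians$lev_gap <- medians$median_lev_Germany - medians$median_lev_Sweden

# Largest absolute ROA gaps first
medians[order(-abs(medians$roa_gap)), c("industry_group", "roa_gap", "lev_gap")]
```

Sorting by the absolute gap puts the industries where the two countries differ most at the top, regardless of which country has the higher median.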

4.4 Question 4

  1. Data Cleaning on Consolidation Codes

The raw data from Orbis contained financial statements under different consolidation codes (C1, C2, C*, U1, U2). To ensure consistency and comparability, we prioritized consolidated accounts which represent the entire economic entity. For each firm-year observation, we retained the highest-priority available consolidated statement following the order: C1 > C2 > C*. Unconsolidated statements (U1, U2) were excluded from the analysis.

Show code
ger_swe_q4 <- filtered_germany_sweden |>
  mutate(
    consolidation_priority = case_when(
      consolidation_code == "C1" ~ 1,  # highest priority
      consolidation_code == "C2" ~ 2,
      consolidation_code == "C*" ~ 3,
      consolidation_code == "U1" ~ 99, # Very low priority, about to be filtered
      consolidation_code == "U2" ~ 99,
      TRUE ~ 99  # Handle any unexpected code
    )
  ) |>
  
  # Group by company and year, and keep only the highest priority records
  group_by(company_id, fy_year) |>
  arrange(consolidation_priority, .by_group = TRUE) |>
  slice(1) |> # Take the first row of each group, which has the highest priority
  ungroup() |>
  # Filter out the remaining non-consolidated records
  filter(consolidation_priority %in% c(1, 2, 3))
  1. Select the columns to be analyzed
Show code
ger_swe_q4 <- ger_swe_q4 |>
  select(company_id, fy_year, roa_pct, current_ratio, industry_group, country)

The analysis in this study is based on a dataset of company financial information. It is important to note that in this question we use the fiscal year, not the calendar year. A company’s fiscal year is its own 12-month reporting period, which may not end in December. Using the fiscal year is more accurate for this research because it matches the company’s true business cycle. This means that when we refer to “2020” in the data, we are talking about the fiscal year that was most affected by the pandemic, even if it does not exactly match the calendar year 2020.

  1. Convert data into a tsibble object using the tsibble package

Each company was defined as a key and the financial year as the index. Basic screening was performed to check for missing values, outliers, and duplicates.

Show code
ger_swe_ts <- ger_swe_q4 |>
  as_tsibble(key = company_id, index = fy_year) |>
  arrange(company_id, fy_year)
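
The duplicate screening mentioned above is not shown in the code. One possible way to do it, sketched here on made-up records, is to count rows per firm-year before building the tsibble; `toy_records`, `dup_keys`, and `n_rows` are hypothetical names used only for this illustration.

```r
library(dplyr)

# Toy firm-year records (made up); firm "A" appears twice in 2020
toy_records <- tibble(
  company_id = c("A", "A", "A", "B"),
  fy_year    = c(2019, 2020, 2020, 2020),
  roa_pct    = c(2.0, 1.5, 1.4, -0.3)
)

# Firm-years that occur more than once; a tsibble requires each
# key-index pair to be unique, so these would need resolving first
dup_keys <- toy_records |>
  count(company_id, fy_year, name = "n_rows") |>
  filter(n_rows > 1)

dup_keys
```

An empty result means every `company_id` and `fy_year` pair is unique, which is what the consolidation-priority step is intended to guarantee.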
  1. Missing value - Check for missing value

The initial examination of missing values revealed a relatively low proportion of incomplete data in the dataset. As shown in Figure 2, only 0.9% of all cells were missing, while 99.1% were complete. ROA contained between 3% and 3.5% missing values, while the current ratio had a lower proportion of missing data, ranging from 1% to 2% (Figure 1).

Show code
# Check for missing ROA and CR
miss_var <- 
  gg_miss_var(ger_swe_ts, show_pct = TRUE) + 
  labs(title = "Distribution of Missing Values", 
       subtitle = "For all variables")

# Check for missing values by industry group
# facet = draws one panel per industry group
miss_industry <- ger_swe_ts |> 
  gg_miss_var(facet = industry_group, show_pct = TRUE) + 
  labs(title = "Distribution of Missing Values", 
       subtitle = "By industry group")

  
miss_var / miss_industry
Figure 1: The plots of Missing Value Distribution
Show code
m3 <- vis_miss(ger_swe_ts, cluster = TRUE, sort_miss = TRUE) +
  theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1),
        axis.text.y = element_text(size = 8))

m3
Figure 2: The plot of clustering to visualize missing patterns

Address missingness issues

To address missing data, observations with missing values in either ROA or current ratio were removed from the dataset. Additionally, to ensure temporal consistency across the study period, only industry groups that contained data for all three years (2019 – 2021) were retained. The ‘Other / Unmapped’ category was removed from the study as it comprises companies without clear industry classification.

Show code
ger_swe_ts <- ger_swe_ts |>
  as_tibble() |>   # Convert to a normal tibble
  # Remove observations with missing ROA or CR
  filter(!is.na(roa_pct) & !is.na(current_ratio)) |>
   # Remove industries that don't have data in 2019-2021
  group_by(industry_group) |>
  filter(all(2019:2021 %in% fy_year)) |>
  filter(industry_group != "Other / Unmapped") |>
  ungroup() |>
  # Convert back to tsibble format
  as_tsibble(key = company_id, index = fy_year) |>
  arrange(company_id, fy_year)
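
The grouped `filter(all(2019:2021 %in% fy_year))` step keeps or drops whole industry groups at once. A minimal sketch on made-up data (the `toy` and `kept` objects are assumptions for illustration) shows the behaviour:

```r
library(dplyr)

# Toy data: Utilities covers 2019-2021, Education is missing 2020
toy <- tibble(
  industry_group = c("Utilities", "Utilities", "Utilities", "Education", "Education"),
  fy_year        = c(2019, 2020, 2021, 2019, 2021),
  roa_pct        = c(2.1, 1.9, 2.4, 3.0, 2.2)
)

# Within each group, all() is TRUE only if every year 2019-2021 appears,
# so the condition is the same for every row of the group
kept <- toy |>
  group_by(industry_group) |>
  filter(all(2019:2021 %in% fy_year)) |>
  ungroup()

unique(kept$industry_group)
```

Because the grouping is by `industry_group` only, coverage is assessed on the two countries pooled together. Adding `country` to the `group_by()` call would instead require each country to cover all three years on its own, which is a stricter criterion than the one used here.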
  1. Summary

The data preparation followed the same cleaning and preprocessing pipeline as described for Germany, ensuring comparability across the two countries. Consolidated financial statements were prioritized, and only industries with complete data for 2019 – 2021 were retained.

The comparison of mean ROA and mean current ratio from 2017 to 2021 (Table 1) reveals stark differences between Germany and Sweden. Mean ROA for German companies declined from 3.68% in 2017 to a low of 0.37% in 2020 and then rebounded to 5.00% in 2021, indicating a robust post-pandemic recovery. In contrast, mean ROA for Swedish firms stayed negative throughout, between -11.29% and -9.69%, which points to persistent profitability challenges that were largely unaffected by the pandemic. Liquidity was comparatively stable in both countries. The German mean current ratio was somewhat more volatile, dipping to 2.62 in 2019 before recovering to 3.11 in 2020, while Swedish firms held consistently higher ratios of 2.95 to 3.48, indicating stronger short-term liquidity positions.

This difference suggests that, while German profitability dipped in 2020 and then recovered strongly, Swedish companies faced deeper structural profitability issues. Both countries nevertheless maintained adequate short-term financial stability throughout the period.

Show code
kable(ger_swe_ts |>
  as_tibble() |>
  group_by(fy_year, country) |>
  summarise(mean_roa = mean(roa_pct, na.rm = TRUE),
            mean_cr = mean(current_ratio, na.rm = TRUE)) |>
  ungroup())
Table 1: Trends in Mean Return on Assets (ROA) and Current Ratio from Pre-Pandemic to Post-Pandemic Years (2017-2021) in Two Countries
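| fy_year | country | mean_roa | mean_cr |
| --- | --- | --- | --- |
| 2017 | Germany | 3.68 | 3.1 |
| 2017 | Sweden | -9.89 | 3.3 |
| 2018 | Germany | 1.77 | 2.7 |
| 2018 | Sweden | -11.29 | 3.0 |
| 2019 | Germany | 1.09 | 2.6 |
| 2019 | Sweden | -10.94 | 3.0 |
| 2020 | Germany | 0.37 | 3.1 |
| 2020 | Sweden | -9.86 | 3.5 |
| 2021 | Germany | 5.00 | 2.8 |
| 2021 | Sweden | -9.69 | 3.5 |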

The divergence between profitability and liquidity patterns suggests that firms prioritised short-term financial buffers even when profitability weakened. This decoupling highlights an adaptive balance-sheet response rather than a simultaneous collapse in operational and liquidity conditions.
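
Table 1 reports means, which a small number of extreme firm-year values can pull far away from the typical firm. A companion table of medians may therefore be worth computing alongside it. The sketch below uses made-up values only, and `toy_roa` and `roa_summary` are hypothetical names; it is not a re-analysis of the report's data.

```r
library(dplyr)

# Made-up ROA values: most firms are modestly profitable, one has an extreme loss
toy_roa <- tibble(
  country = rep(c("Germany", "Sweden"), each = 5),
  roa_pct = c(2, 3, 1, 4, 5,      # Germany: no extreme values
              2, 3, 1, 4, -60)    # Sweden: one large negative outlier
)

# Mean and median side by side for each country
roa_summary <- toy_roa |>
  group_by(country) |>
  summarise(
    mean_roa   = mean(roa_pct),
    median_roa = median(roa_pct),
    .groups = "drop"
  )

roa_summary
```

In this toy case a single outlier makes the Swedish mean strongly negative while the median stays positive, so a large gap between the two statistics is a signal to inspect the tails before reading the mean as the typical outcome.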

5 EDA

The following exploratory analysis evaluates how corporate profitability in Germany and Sweden evolved between the pre-pandemic period (2018 – 2019) and the pandemic period (2020 – 2021). The focus is on both central tendency and dispersion to determine whether the pandemic altered typical firm performance or primarily increased variability across firms.

5.1 Question 1

1. How did profitability change across industries?

5.1.1 Profitability Measures Analysis

This code recodes years into Pre (2018 – 2019) and Pandemic (2020 – 2021), reshapes five profitability measures (EBIT margin, EBITDA margin, ROA, ROE, and net income) into long format, and produces violin plots with jittered observations, median markers, and IQR overlays. Net income is transformed with the inverse hyperbolic sine (asinh) function, which is defined for zero and negative values, so losses remain visible while the influence of extreme magnitudes is compressed.

Show code
#recode period to short labels
ida_question1 <- ida_question1 |>
  mutate(
    period = if_else(acct_year >= 2020, "Pandemic (2020–21)", "Pre (2018–19)"),
    period = factor(period, levels = c("Pre (2018–19)", "Pandemic (2020–21)"))
  )

#build long data
q1_vars <- c("ebit_margin_q1","ebitda_margin_q1","roa_q1","roe_q1","net_income_eur_asinh")

plot_long <- ida_question1 |>
  select(company_id, country, period, all_of(q1_vars)) |>
  pivot_longer(-c(company_id, country, period), names_to = "metric", values_to = "value") |>
  mutate(
    metric = factor(
      metric,
      levels = q1_vars,
      labels = c("EBIT margin (%)","EBITDA margin (%)","ROA (%)","ROE (%)","Net income (asinh)")
    )
  ) |>
  drop_na(value)

#helper for IQR bar
iqr_bar <- function(y) data.frame(
  y    = median(y, na.rm=TRUE),
  ymin = quantile(y, 0.25, na.rm=TRUE),
  ymax = quantile(y, 0.75, na.rm=TRUE)
)

#helper to draw a panel
draw_q1_panel_country <- function(data_subset, ncol = 2, title_text = "") {
  ggplot(
    data_subset,
    aes(x = period, y = value, colour = country, group = interaction(country, period))
  ) +
    geom_violin(
      trim = FALSE, fill = NA, linewidth = 0.45, na.rm = TRUE,
      position = position_dodge(width = 0.75)
    ) +
    geom_jitter(
      alpha = 0.10, size = 0.55, na.rm = TRUE,
      position = position_jitterdodge(jitter.width = 0.08, jitter.height = 0, dodge.width = 0.75)
    ) +
    stat_summary(
      fun.data = iqr_bar, geom = "errorbar", width = 0.12, linewidth = 0.6, na.rm = TRUE,
      position = position_dodge(width = 0.75), show.legend = FALSE
    ) +
    stat_summary(
      fun = median, geom = "point", shape = 95, size = 8, na.rm = TRUE,
      position = position_dodge(width = 0.75), show.legend = FALSE
    ) +
    facet_wrap(~ metric, scales = "free_y", ncol = ncol) +
    scale_x_discrete(labels = c(
      `Pre (2018–19)`      = "Pre\n(2018 – 2019)",
      `Pandemic (2020–21)` = "Pandemic\n(2020 – 2021)"
    )) +
    labs(x = NULL, y = NULL, colour = "Country", title = title_text) +
    theme_minimal(base_size = 11) +
    theme(
      strip.text  = element_text(face = "bold"),
      axis.text.x = element_text(size = 10, margin = margin(t = 8), lineheight = 0.95),
      plot.margin = margin(10, 15, 24, 10),
      legend.position = "top"
    )
}
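
The asinh transform used for net income can be checked on a few values. The sketch below uses made-up EUR amounts (`net_income_eur` and `transformed` are illustrative objects, not the report's data) and relies only on base R.

```r
# Made-up net income values in EUR, spanning losses, zero, and large profits
net_income_eur <- c(-5e6, -1000, 0, 1000, 5e6)

transformed <- asinh(net_income_eur)
transformed

# asinh is symmetric around zero: asinh(-x) equals -asinh(x)
all.equal(transformed[1], -transformed[5])

# For large values it behaves like a log: asinh(x) is close to log(2 * x)
all.equal(asinh(5e6), log(2 * 5e6), tolerance = 1e-8)

# sinh() is the inverse, so transformed values can be mapped back to EUR
all.equal(sinh(transformed), net_income_eur)
```

Unlike a plain log, the transform needs no offset for zeros or negatives, which is why it suits a variable such as net income that mixes profits and losses.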

EBIT & EBITDA Analysis using Median and IQR

Show code
figA_metrics <- c("EBIT margin (%)", "EBITDA margin (%)")
p_q1_A <- draw_q1_panel_country(
  dplyr::filter(plot_long, metric %in% figA_metrics),
  ncol = 2,
  title_text = "Pre vs Pandemic: Germany vs Sweden\nEBIT & EBITDA Analysis using Median and IQR"
)

p_q1_A

According to the chart, EBIT and EBITDA margins in both Germany and Sweden remain centred close to zero in both periods. Median values change only marginally from the pre-pandemic to the pandemic years, indicating that typical operating profitability was broadly maintained.

However, dispersion increases during 2020 – 2021, particularly in Sweden. The wider violin shapes suggest greater heterogeneity in firm performance, with more firms experiencing either unusually strong margins or sharper declines. Germany shows a comparatively narrower distribution, indicating relatively more stable operating profitability.

Overall, the evidence suggests that the pandemic did not substantially shift median operating margins, but it widened the range of firm-level outcomes, especially in Sweden.

ROA & ROE Analysis Using Median and IQR

Show code
figB_metrics <- c("ROA (%)", "ROE (%)")
p_q1_B <- draw_q1_panel_country(
  dplyr::filter(plot_long, metric %in% figB_metrics),
  ncol = 2,
  title_text = "Pre vs Pandemic: Germany vs Sweden\nROA & ROE analysis using Median and IQR"
)

p_q1_B

The ROA and ROE plots show a similar pattern. Median values remain close to zero in both countries, indicating that typical returns on assets and equity did not collapse during the pandemic.

The key change lies in the tails of the distribution. During 2020 – 2021, dispersion increases noticeably, especially for ROE in Sweden, where several firms record extremely negative values. This suggests that while most firms preserved stable returns, a subset experienced substantial losses or equity erosion.

Germany also shows increased variability during the pandemic, but the magnitude of extreme outcomes is smaller compared to Sweden. The stability of the medians combined with wider tails indicates that the pandemic amplified downside risk for certain firms rather than uniformly reducing profitability.

Net income (asinh) Analysis Using Median and IQR

Show code
figC_metrics <- c("Net income (asinh)")
p_q1_C <- draw_q1_panel_country(
  dplyr::filter(plot_long, metric %in% figC_metrics),
  ncol = 1,
  title_text = "Pre vs Pandemic: Germany vs Sweden\nNet income (asinh) Analysis Using Median and IQR"
)

p_q1_C

The net income (asinh) distribution reinforces this pattern. Median values remain close to zero across both periods in Germany and Sweden, indicating that overall profit levels did not dramatically shift for the typical firm.

However, Sweden exhibits heavier tails during the pandemic period, with more firms recording both high profits and deeper losses. Germany’s distribution remains more concentrated around the median, suggesting comparatively steadier firm performance.

This pattern indicates that the pandemic increased volatility in firm outcomes, particularly in Sweden, even though the central tendency of profitability remained relatively stable.
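
The dispersion statements in this subsection rest on a visual reading of the violin widths. A numeric companion could report the median and IQR for each country, metric, and period. The sketch below assumes a long table shaped like `plot_long`; the `toy_long` values are made up and `dispersion` is a hypothetical name.

```r
library(dplyr)

# Made-up values shaped like plot_long (country, period, metric, value)
toy_long <- tibble(
  country = "Sweden",
  period  = rep(c("Pre", "Pandemic"), each = 5),
  metric  = "ROA (%)",
  value   = c(-2, -1, 0, 1, 2,      # Pre: narrow spread
              -8, -4, 0, 4, 8)      # Pandemic: same median, wider spread
)

# Median captures the typical firm, IQR captures the spread of the middle half
dispersion <- toy_long |>
  group_by(country, metric, period) |>
  summarise(
    median = median(value),
    iqr    = IQR(value),
    .groups = "drop"
  )

dispersion
```

A stable median combined with a larger pandemic-period IQR would be the tabular counterpart of the pattern described above, although the IQR does not capture changes confined to the extreme tails.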

5.1.2 Industry Median for EBITDA Margin Analysis

This code calculates median EBITDA margins by country, industry, and period, then constructs a dumbbell chart showing the change from Pre to Pandemic. The use of medians ensures that comparisons reflect typical industry performance rather than being driven by extreme firms.

Show code
#mapped industries and build period label
eb_src <- ida_question1 |>
  filter(
    !is.na(ebitda_margin_q1),
    !is.na(industry_group),
    industry_group != "Other / Unmapped"
  ) |>
  mutate(
    period = ifelse(acct_year >= 2020, "Pandemic", "Pre"),
    period = factor(period, levels = c("Pre", "Pandemic"))
  )

#median by country × industry × period, plus delta
eb_med_cs <- eb_src |>
  group_by(country, industry_group, period) |>
  summarise(med = median(ebitda_margin_q1, na.rm = TRUE), .groups = "drop") |>
  pivot_wider(names_from = period, values_from = med) |>
  drop_na(Pre, Pandemic) |>
  mutate(
    delta     = Pandemic - Pre,
    direction = factor(ifelse(delta >= 0, "Increase", "Decrease"),
                       levels = c("Increase", "Decrease"))
  )

#one common ordering for industries
order_levels <- eb_med_cs |>
  group_by(industry_group) |>
  summarise(delta_mean = mean(delta, na.rm = TRUE), .groups = "drop") |>
  arrange(delta_mean) |>
  pull(industry_group)

eb_med_cs <- eb_med_cs |>
  mutate(industry_group = factor(industry_group, levels = order_levels))

#facet by country
ggplot(eb_med_cs,
       aes(y = industry_group)) +
  geom_segment(aes(x = Pre, xend = Pandemic,
                   yend = industry_group, colour = direction),
               linewidth = 1) +
  geom_point(aes(x = Pre),      colour = "#4E79A7", size = 2) +  #pre (blue)
  geom_point(aes(x = Pandemic), colour = "#E15759", size = 2) +  #pandemic (red)
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey70") +
  scale_color_manual(values = c("Increase" = "#2CA02C", "Decrease" = "#D62728")) +
  guides(colour = guide_legend(title = NULL, override.aes = list(linewidth = 3))) +
  labs(
    title = "Change in Median EBITDA Margin: Pandemic vs Pre\nGermany and Sweden by Industry",
    x = "Median EBITDA Margin (%)",
    y = NULL
  ) +
  facet_grid(. ~ country) +
  theme_minimal(base_size = 11) +
  theme(
    legend.position      = "top", #legend below title
    legend.box           = "vertical",#stack under the title
    plot.title.position  = "plot",
    strip.text           = element_text(face = "bold"),
    panel.grid.minor     = element_blank(),
    plot.margin          = margin(10, 20, 10, 10)
  )

The chart compares median EBITDA margins across industries in Germany and Sweden.

In Germany, most industries show moderate shifts. Accommodation & Food records the most visible improvement, moving from a negative median margin to a positive one during the pandemic period. Technology and Utilities show mild increases, indicating resilience. In contrast, Health Care records the largest decline from a previously high level, marking a notable reversal.

In Sweden, changes follow a broadly similar direction but are generally smaller in magnitude. Accommodation & Food improves slightly, while Health Care, Consumer Services, and Construction show declines. The overall pattern suggests more muted median adjustments but continued dispersion within sectors.

Across both countries, the pandemic did not uniformly reduce industry profitability. Instead, effects were uneven: service-based and cyclical industries display greater volatility, while Technology, Utilities, and Education remain comparatively stable.

The industry-level analysis confirms the earlier firm-level findings: median profitability was largely preserved, but dispersion increased and sectoral divergence became more pronounced during the pandemic.

5.2 Question 2

The Exploratory Data Analysis (EDA) expands on the cleaned leverage–liquidity dataset to identify cross-country and temporal patterns in how firms managed their financial positions during the pandemic. Three complementary visual analyses were conducted.

5.2.1 Year-wise Comparison

This section tracks changes in median leverage and liquidity across 2018–2021 to reveal how firms in Germany and Sweden adjusted their balance sheets during the pandemic.

Show code
lev_liq_long <- lev_liq_summary |>
  pivot_longer(cols = c(median_debt_assets, median_liquidity),
               names_to = "indicator", values_to = "value")

ggplot(lev_liq_long,
       aes(x = acct_year, y = value, colour = country, group = country)) +
  geom_line(linewidth = 1.1) +
  geom_point(size = 2) +
  facet_wrap(~ indicator, scales = "free_y",
             labeller = as_labeller(c(
               median_debt_assets = "Median Debt-to-Assets",
               median_liquidity   = "Median Current Ratio"
             ))) +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
  scale_colour_brewer(palette = "Set1") +
  labs(
    title = "Leverage and Liquidity Trends (2018–2021)",
    subtitle = "Germany vs Sweden: Stability in liquidity, mild re-leveraging during pandemic",
    x = "Accounting Year", y = "Median Value", colour = "Country"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"),
        plot.subtitle = element_text(size = 11),
        panel.grid.minor = element_blank())

Key findings:

- Both economies experienced a slight leverage increase in 2020 (Germany +0.01, Sweden +0.02), indicating modest additional borrowing during the initial shock.

- Liquidity remained broadly stable, with median current ratios holding near 1.8 – 1.9× throughout the period.

- The pattern suggests precautionary balance-sheet adjustments rather than aggressive re-leveraging. German firms appear to have relied slightly more on credit support mechanisms, while Swedish firms maintained stronger liquidity buffers, consistent with differences in fiscal and policy responses.

Overall, the trends indicate controlled financial adaptation rather than structural deterioration.
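
Year-on-year changes such as those quoted in the key findings can be derived directly from a summary table. The sketch below assumes a table shaped like `lev_liq_summary`; the numbers in `toy_summary` are made up for illustration and `yoy` is a hypothetical name.

```r
library(dplyr)

# Made-up medians shaped like lev_liq_summary
toy_summary <- tibble(
  country            = rep(c("Germany", "Sweden"), each = 3),
  acct_year          = rep(2019:2021, times = 2),
  median_debt_assets = c(0.50, 0.51, 0.51, 0.46, 0.48, 0.47)
)

# Change relative to the previous year, computed separately for each country
yoy <- toy_summary |>
  group_by(country) |>
  arrange(acct_year, .by_group = TRUE) |>
  mutate(change = median_debt_assets - lag(median_debt_assets)) |>
  ungroup()

yoy
```

The first year of each country has no predecessor, so its `change` is `NA` by construction.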

5.2.2 Leverage-Solvency Relationship

This section examines how leverage (debt-to-assets) relates to solvency over time, highlighting whether higher debt ratios reduced financial stability differently across the two countries.

Show code
# build plotting data for the leverage–solvency faceted chart
lev_liq_plot <- lev_liq_data |>
  # keep sensible ranges and finite values
  filter(
    is.finite(debt_to_assets), is.finite(solvency_pct),
    between(debt_to_assets, 0, 1.5),   # 0–150%
    between(solvency_pct, -50, 100)    # -50% to 100%
  ) |>
  drop_na(country, acct_year)

ggplot(lev_liq_plot, aes(x = debt_to_assets, y = solvency_pct, colour = country)) +
  geom_point(alpha = 0.25, size = 1.2) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.1) +
  facet_grid(acct_year ~ country) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0, 1.5)) +
  scale_y_continuous(limits = c(-50, 100)) +
  scale_colour_manual(values = c("Germany" = "#D55", "Sweden" = "#0072B2")) +
  labs(
    title = "Leverage–Solvency Relationship by Country and Year",
    subtitle = "Stronger negative slope in Germany, especially during 2020",
    x = "Debt-to-Assets (%)",
    y = "Solvency (%)",
    colour = "Country"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    strip.text = element_text(face = "bold", size = 14),
    plot.title = element_text(face = "bold", size = 18, hjust = 0.5),
    plot.subtitle = element_text(size = 13, hjust = 0.5),
    legend.position = "none",  # each facet is already labelled
    axis.text = element_text(size = 12),
    panel.spacing.x = unit(1.2, "lines"),
    panel.spacing.y = unit(1.0, "lines"),
    panel.grid.minor = element_blank()
  )
Figure 3: Leverage–Solvency Relationship by Country and Year

Across both Germany and Sweden, leverage shows a strong negative relationship with solvency from 2018 to 2021. Firms with higher debt-to-asset ratios consistently report lower solvency levels, confirming the structural trade-off between debt financing and long-term financial stability.

The relationship remains stable over time, suggesting that the pandemic did not fundamentally alter the capital-structure dynamic.

However, the slope is notably steeper in Germany, especially in 2020, implying that incremental increases in debt had a stronger adverse effect on solvency during the peak disruption period. This may reflect Germany’s greater exposure to capital-intensive and manufacturing industries, where revenue shocks translate more directly into balance-sheet stress.

In contrast, Swedish firms show a milder slope and narrower solvency deterioration, consistent with stronger equity buffers and a more diversified sectoral structure.
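
The slope comparison can be made explicit by fitting one linear model per country and year and extracting the coefficient on leverage. The sketch below is a minimal illustration on simulated data: the data-generating slopes (-80 and -50) are arbitrary assumptions, and `toy`, `groups`, and `slopes` are hypothetical names.

```r
# Simulated data: solvency falls with leverage, more steeply in one group
set.seed(1)
toy <- data.frame(
  country        = rep(c("Germany", "Sweden"), each = 50),
  acct_year      = 2020,
  debt_to_assets = runif(100, 0, 1)
)
toy$solvency_pct <- ifelse(toy$country == "Germany",
                           90 - 80 * toy$debt_to_assets,
                           80 - 50 * toy$debt_to_assets) + rnorm(100, sd = 2)

# Fit one linear model per country-year and keep the slope on debt_to_assets
groups <- split(toy, list(toy$country, toy$acct_year), drop = TRUE)
slopes <- sapply(groups, function(d) {
  coef(lm(solvency_pct ~ debt_to_assets, data = d))[["debt_to_assets"]]
})

slopes
```

A more negative slope means that a given increase in debt-to-assets is associated with a larger drop in solvency, which is the sense in which one country's relationship is described as steeper.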

5.2.3 Industry-level Leverage Distributions

This section explores cross-industry variation in leverage to identify which sectors carried the most debt and how these patterns evolved through the pandemic period.

Show code
lev_liq_industry <- filtered_germany_sweden |>
  filter(acct_year %in% 2018:2021,
         total_assets_eur > 0,
         total_liabilities_eur >= 0,
         total_liabilities_eur / total_assets_eur <= 1.5) |>
  mutate(
    debt_to_assets = total_liabilities_eur / total_assets_eur
  ) |>
  drop_na(industry_group, country)

ggplot(lev_liq_industry,
       aes(x = debt_to_assets,
           y = reorder(industry_group, debt_to_assets),
           fill = country)) +
  geom_boxplot(outlier.shape = NA, alpha = 0.7, width = 0.6) +
  facet_wrap(~ acct_year, ncol = 2, scales = "free_x") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1),
                     limits = c(0, 1.5)) +
  scale_fill_manual(
    values = c("Germany" = "#D55", "Sweden" = "#0072B2"),
    name = "Country"
  ) +
  labs(
    title = "Leverage Distribution by Industry and Year (2018–2021)",
    subtitle = "Comparison between Germany and Sweden across industry groups and years",
    x = "Debt-to-Assets Ratio",
    y = NULL
  ) +
  theme_minimal(base_size = 16) +
  theme(
    strip.text = element_text(face = "bold", size = 15),
    plot.title = element_text(face = "bold", size = 20, hjust = 0.5),
    plot.subtitle = element_text(size = 14, hjust = 0.5),
    legend.position = "top",
    legend.text = element_text(size = 13),
    legend.title = element_text(size = 14),
    axis.text.y = element_text(size = 12),
    axis.text.x = element_text(size = 12),
    panel.grid.minor = element_blank(),
    panel.spacing.y = unit(1.2, "lines")
  )
Figure 4: Leverage Distribution by Industry and Year for Germany and Sweden (2018–2021)

The industry-level boxplots show that leverage remained broadly stable across sectors in both countries from 2018 to 2021. Capital-intensive industries such as Manufacturing, Construction, and Transport and Storage consistently maintain higher debt-to-assets ratios than service-oriented sectors including Information and Communication or Professional and Technical Services.

Median leverage levels rise slightly in 2020, consistent with temporary borrowing increases during the early pandemic period. However, the overall distribution does not shift dramatically, and both countries display similar industry-level leverage structures.

When considered alongside liquidity patterns, the findings suggest that firms preserved adequate short-term financial capacity. Median current ratios remain close to 1.8 to 2.0 throughout the period, indicating that most firms retained sufficient ability to meet short-term obligations despite moderate borrowing increases.

Taken together, the evidence indicates that corporate balance sheets were supported by prudent financial management and policy buffers. While leverage increased modestly in 2020, neither Germany nor Sweden experienced systemic liquidity deterioration or widespread solvency collapse during the pandemic.

Show code
# Sensitivity check: winsorise extreme ratios
lev_liq_summary_w <- lev_liq_data |>
  mutate(across(c(debt_to_assets, current_ratio),
                ~ scales::squish(., c(quantile(., 0.01, na.rm=TRUE),
                                      quantile(., 0.99, na.rm=TRUE))))) |>
  group_by(country, acct_year) |>
  # Anonymous function avoids the deprecated pattern of passing na.rm through across()
  summarise(across(c(debt_to_assets, current_ratio), \(x) median(x, na.rm = TRUE)), .groups = "drop")
knitr::kable(head(lev_liq_summary_w), caption="Winsorised medians confirm robustness.")
Winsorised medians confirm robustness.
| country | acct_year | debt_to_assets | current_ratio |
| --- | --- | --- | --- |
| Germany | 2018 | 0.50 | 1.9 |
| Germany | 2019 | 0.51 | 1.8 |
| Germany | 2020 | 0.51 | 1.8 |
| Germany | 2021 | 0.51 | 1.8 |
| Sweden | 2018 | 0.46 | 1.6 |
| Sweden | 2019 | 0.48 | 1.6 |
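
One caveat when reading the sensitivity check above: clipping the tails generally leaves a median unchanged, because the median lies well inside the clipping limits, so the check is likely to be more informative when applied to means. The sketch below illustrates this on made-up ratios. It uses base R `pmin()` and `pmax()` in place of `scales::squish()`, and wider 10th to 90th percentile limits so that the effect is visible in a ten-value example; `x`, `limits`, and `x_w` are hypothetical names.

```r
# Made-up ratios with one extreme value
x <- c(0.40, 0.45, 0.50, 0.55, 0.60, 0.48, 0.52, 0.47, 0.53, 9.00)

# Winsorise: clip values to the 10th-90th percentile range
limits <- quantile(x, c(0.10, 0.90))
x_w <- pmin(pmax(x, limits[[1]]), limits[[2]])

# Compare raw and winsorised medians and means
c(median_raw = median(x), median_wins = median(x_w),
  mean_raw   = mean(x),   mean_wins   = mean(x_w))
```

In this example the median is identical before and after clipping, while the mean falls once the extreme value is pulled in.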

Summary:

  • Leverage: Small increase in 2020, partial normalisation by 2021.
  • Liquidity: Stable across both countries, indicating effective cash-flow management.
  • Cross-country comparison: Sweden’s firms appear marginally more liquid and slightly less leveraged, suggesting somewhat greater resilience to credit stress.
  • Overall conclusion: The pandemic triggered modest balance-sheet adjustments but did not produce systemic liquidity deterioration. Germany’s higher industrial exposure explains the slightly stronger leverage–solvency sensitivity, while Sweden’s diversified structure provided a buffer.

5.3 Question 3

The purpose of this section is to examine the structure and main financial indicators of German and Swedish firms before and during the COVID-19 pandemic from 2018 to 2021. The analysis focuses on firm size, profitability, leverage, and liquidity across industries and over time to identify resilience and vulnerability patterns.

5.3.1 Industry-level Financial Summary for Sweden

Show code
industry_financial_summary_sweden <- filtered_germany_sweden |> 
  filter(country == "Sweden", !is.na(industry_group), acct_year >= 2019, acct_year <= 2021) |> 
  group_by(industry_group, acct_year) |> 
  summarise(
    n_firms = n(),
    median_profit = median(net_income_eur, na.rm = TRUE),
    median_roa = median(roa_pct, na.rm = TRUE),
    median_liq = median(current_ratio, na.rm = TRUE),
    median_lev = median(gearing_pct, na.rm = TRUE),
    .groups = "drop"
  ) |> 
  mutate(period = ifelse(acct_year < 2020, "Pre-pandemic", "Pandemic"))
Show code
combined_summary <- bind_rows(
  industry_financial_summary |> mutate(country="Germany"),
  industry_financial_summary_sweden |> mutate(country="Sweden")
)

combined_summary$period <- factor(combined_summary$period, levels = c("Pre-pandemic", "Pandemic"))

ggplot(combined_summary, aes(
    x = industry_group,
    y = median_profit,
    fill = country
  )) +
  geom_col(position = "dodge", width = 0.6) +
  facet_wrap(~period) +
  labs(
    title = "Median Profit Comparison by Industry",
    x = "Industry",
    y = "Median Net Income (EUR)",
    fill = "Country"
  ) +
  scale_fill_manual(values = c("Germany" = "#E69F00", "Sweden" = "#56B4E9")) +
  scale_y_continuous(
    breaks = seq(0, max(combined_summary$median_profit, na.rm = TRUE), by = 100000),
    labels = function(x) paste0(scales::comma(x / 1000), "k")
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
    axis.text.y = element_text(size = 10),
    axis.title.x = element_text(size = 10, face = "bold"),
    axis.title.y = element_text(size = 10, face = "bold"),
    legend.position = "top",
    plot.title = element_text(size = 10, face = "bold"),
    strip.text = element_text(size = 10, face = "bold")
  )

This analysis compares median net income across industries in Germany and Sweden during the pre-pandemic year (2019) and the pandemic period (2020 – 2021).

The results show clear sectoral divergence. In Sweden, industries such as Health Care, Utilities, and Information & Communication maintain relatively strong median profitability during the pandemic, suggesting structural resilience and stable demand. In contrast, more cyclical sectors such as Transport & Storage and Accommodation & Food display greater volatility.

Germany exhibits a similar pattern of sectoral dispersion but with more pronounced shifts in certain capital-intensive industries. Manufacturing and Construction show visible sensitivity to pandemic disruption, while defensive sectors remain comparatively stable.

Overall, the industry comparison indicates that resilience was uneven and strongly sector-dependent rather than country-wide. The pandemic amplified pre-existing structural differences between defensive and cyclical industries.

5.3.2 Trend of Median Financial Ratios Over Time

Show code
median_trends <- filtered_germany_sweden |> 
  filter(acct_year >= 2019, acct_year <= 2021) |> 
  group_by(country, acct_year) |> 
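  # Use an anonymous function: passing na.rm through across()'s `...` is deprecated in dplyr >= 1.1
  summarise(across(all_of(financial_vars), \(x) median(x, na.rm = TRUE)), .groups = "drop") |> 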
  pivot_longer(cols = all_of(financial_vars), names_to="variable", values_to="median_value")

ggplot(median_trends, aes(x=acct_year, y=median_value, color=country, group=country)) +
  geom_line(linewidth = 1) +
  geom_point(size=2) +
  facet_wrap(~ variable, scales="free_y") +
  labs(title="Median Financial Indicators Over Time", x="Year", y="Median Value", color="Country") +
  theme_minimal()

The median trend plots provide a macro-level view of how financial health evolved between 2019 and 2021.

Liquidity remains relatively stable in both countries, indicating that firms maintained adequate short-term buffers throughout the crisis. Sweden shows a gradual increase in total assets by 2021, suggesting faster balance sheet expansion during the recovery phase.

Profitability declines in 2020, particularly in Germany, before rebounding in 2021. This pattern reflects temporary operating disruption rather than long-term structural damage.

Leverage trends differ slightly between the two countries. Germany’s gearing ratio increases modestly, consistent with greater reliance on debt during the shock period. Sweden, by contrast, shows more controlled leverage dynamics, suggesting stronger internal financing capacity.

Taken together, the median trends indicate that neither economy experienced systemic financial collapse, but Sweden displays slightly stronger balance sheet expansion and leverage containment during recovery.

5.3.3 Boxplots of Key Financial Ratios

# Reshape data for plotting
eda_data <- filtered_germany_sweden |> 
  filter(acct_year >= 2019, acct_year <= 2021) |> 
  select(country, acct_year, all_of(financial_vars)) |> 
  pivot_longer(cols = all_of(financial_vars), names_to = "variable", values_to = "value")

ggplot(eda_data, aes(x=country, y=value, fill=country)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16) +
  facet_wrap(~ variable, scales = "free_y") +
  labs(title="Distribution of Key Financial Ratios (2019-2021)", 
       y="Value",
       x="Country") +
  theme_minimal()

The boxplots provide insight into dispersion and firm-level heterogeneity.

Both countries exhibit wide distributions in profitability and firm size, with substantial outliers. This confirms that average or aggregate figures would mask important variation across firms. The presence of extreme observations reinforces the decision to rely on medians rather than means.
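
To make the preference for medians concrete, the short sketch below uses made-up ROA values (hypothetical, not taken from the OSIRIS data) to show how a single extreme firm shifts the mean far more than the median.

```r
# Hypothetical ROA values (in %) for nine firms; illustrative only
roa <- c(-3, -1, 0, 1, 2, 2, 3, 4, 5)

# Add one extreme firm, similar to the outliers flagged in the boxplots
roa_outlier <- c(roa, -400)

# Compare how far each summary statistic moves
mean_shift   <- abs(mean(roa_outlier) - mean(roa))
median_shift <- abs(median(roa_outlier) - median(roa))

c(mean_shift = mean_shift, median_shift = median_shift)
```

In this toy case the median moves by half a percentage point while the mean moves by roughly forty, which is the behaviour the boxplots warn about.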

Germany shows slightly higher dispersion in gearing ratios, indicating greater heterogeneity in debt exposure. Sweden displays a somewhat tighter interquartile range for leverage, consistent with more moderate balance sheet risk.

Importantly, dispersion increases during the pandemic period, particularly in profitability measures. This suggests that the crisis did not affect firms uniformly; instead, it created divergence between stronger and weaker firms, consistent with a partial K-shaped recovery dynamic.
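
One simple way to quantify such a change in dispersion is to compare the interquartile range across periods while checking that the median stays put. The sketch below does this on hypothetical ROA values; the data frame `roa_df` and its numbers are invented for illustration and are not part of the report's dataset.

```r
# Hypothetical firm-level ROA (%) for two periods; illustrative only
roa_df <- data.frame(
  period = rep(c("pre", "pandemic"), each = 8),
  roa    = c(1, 2, 2, 3, 3, 4, 4, 5,       # pre: tightly clustered
             -6, -2, 0, 3, 3, 6, 9, 12)    # pandemic: same centre, wider spread
)

# Median and interquartile range per period
med_by_period <- tapply(roa_df$roa, roa_df$period, median)
iqr_by_period <- tapply(roa_df$roa, roa_df$period, IQR)

rbind(median = med_by_period, iqr = iqr_by_period)
```

A stable median combined with a wider interquartile range is the pattern described above: the typical firm is unchanged, but outcomes around it diverge.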

Overall Interpretation

Across industries and financial indicators, the pandemic’s impact was moderate but uneven. Defensive sectors maintained profitability and liquidity, while cyclical industries experienced sharper pressure. Median values suggest overall stability, but distributional evidence reveals increased dispersion and firm-level divergence.

Sweden appears slightly more resilient in terms of leverage containment and asset growth, whereas Germany shows greater sensitivity in capital-intensive sectors. However, neither country experienced widespread structural deterioration, indicating that corporate balance sheets were sufficiently robust to absorb the shock.

5.4 Question 4

4. How did the fundamental relationship between corporate liquidity and profitability evolve and fracture within industries during the pandemic?

5.4.1 Macro-level evidence: Industry recovery patterns (ROA vs CR)

This section evaluates whether the structural relationship between liquidity (measured by the Current Ratio) and profitability (measured by ROA) remained stable during the pandemic across industries in Germany and Sweden.

# Calculate the average ROA and current ratio for each industry each year
industry_ts <- ger_swe_ts |>
  group_by(industry_group, country) |>
  index_by(fy_year) |>
  summarise(
    industry_roa = mean(roa_pct, na.rm = TRUE),
    industry_cr = mean(current_ratio, na.rm = TRUE),
    n_companies = n()  
  )

# Calculating time series features
industry_features <- industry_ts |>
  group_by(industry_group, country) |>
  features(industry_roa, feat_stl) |>
  rename_with(~ paste0("roa_", .), -c(industry_group, country))  |>
  left_join(
    industry_ts |>
      features(industry_cr, features = feat_stl) |>
      rename_with(~ paste0("cr_", .), -c(industry_group, country)),
    by = c("industry_group", "country")
  )

# Focus on trend_strength and spikiness
focus_features <- industry_features |>
  select(industry_group, country,
         roa_trend_strength, roa_spikiness,
         cr_trend_strength, cr_spikiness)

Trend Strength Analysis

In Figure 5, colors represent industry groups and shapes denote countries. The left panel plots ROA trend strength against Current Ratio trend strength using a 0.3 threshold to identify structural disruption.

Industries below this threshold experienced weakened financial trajectories. Simultaneous reductions in both profitability and liquidity trend strength indicate that the internal financial structure of the industry did not hold during the pandemic period. Germany shows fewer industries with dual breakdown, while Sweden displays more sectors with reduced stability.

Volatility Analysis

The right panel of Figure 5 compares ROA spikiness and Current Ratio spikiness. Median reference lines identify industries with above-median volatility.

Industries positioned above both medians experienced strong fluctuations in profitability and liquidity at the same time. Swedish industries are more concentrated in this high-volatility region, suggesting greater instability in financial adjustment during the crisis.
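
For readers unfamiliar with the two features, the sketch below approximates them in base R. It assumes the trend of a non-seasonal annual series is estimated with a super smoother (`stats::supsmu`), defines trend strength as one minus the ratio of remainder variance to the variance of trend plus remainder, and defines spikiness as the variance of the leave-one-out variances of the remainder. The helper `stl_like_features()` and the two toy series are hypothetical; this is an illustration of the idea, not the `feat_stl` implementation.

```r
# Approximate trend strength and spikiness for a short annual series.
# Assumption: the trend comes from a super smoother with no seasonal part.
stl_like_features <- function(x) {
  trend     <- stats::supsmu(seq_along(x), x)$y
  remainder <- x - trend

  # Trend strength: 1 - Var(remainder) / Var(trend + remainder), floored at 0
  trend_strength <- max(0, 1 - var(remainder) / var(trend + remainder))

  # Spikiness: variance of the leave-one-out variances of the remainder
  loo_var <- vapply(seq_along(remainder),
                    function(i) var(remainder[-i]), numeric(1))
  spikiness <- var(loo_var)

  c(trend_strength = trend_strength, spikiness = spikiness)
}

set.seed(1)
steady  <- 1:20 + rnorm(20, sd = 0.1)   # hypothetical series with a clear trend
shocked <- steady
shocked[15] <- shocked[15] - 10         # one-off shock in a single year

features_steady  <- stl_like_features(steady)
features_shocked <- stl_like_features(shocked)
rbind(steady = features_steady, shocked = features_shocked)
```

A single large shock leaves a spike in the remainder, which raises spikiness; a trend strength near 1 means the smooth trend explains almost all of the variation.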

# Trend strength analysis to identify the hardest-hit sectors
trend_analysis <- focus_features |>
  group_by(country) |>
  mutate(
    # Identify sectors where trends are broken (low trend strength)
    roa_trend_disrupted = roa_trend_strength < 0.3,
    cr_trend_disrupted = cr_trend_strength < 0.3,
    overall_trend_impact = (roa_trend_strength + cr_trend_strength) / 2
  ) |>
  ungroup()

# Volatility analysis to identify the sectors with the greatest volatility during the pandemic
volatility_analysis <- focus_features |>
  group_by(country) |>
  mutate(
    high_roa_volatility = roa_spikiness > median(roa_spikiness),
    high_cr_volatility = cr_spikiness > median(cr_spikiness),
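    # scale() returns a one-column matrix; convert to plain numeric vectors before averaging
    overall_volatility = (as.numeric(scale(roa_spikiness)) + as.numeric(scale(cr_spikiness))) / 2)  |>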
  ungroup()

# Create trend strength plot
trend_plot <- focus_features |>
  ggplot(aes(x = roa_trend_strength, y = cr_trend_strength)) +
  geom_point(aes(size = roa_spikiness, 
                 color = industry_group, 
                 shape = country,  
                 text = paste("<b>Industry:</b>", industry_group, 
                             "<br><b>Country:</b>", country,  
                             "<br><b>ROA Trend:</b>", round(roa_trend_strength, 3),
                             "<br><b>CR Trend:</b>", round(cr_trend_strength, 3))), 
             alpha = 0.7) +
  geom_hline(yintercept = 0.3, linetype = "dashed", color = "red", alpha = 0.7) +
  geom_vline(xintercept = 0.3, linetype = "dashed", color = "red", alpha = 0.7) +
  labs(x = "ROA Trend Strength", y = "Current Ratio Trend Strength",
       size = "ROA Spikiness", shape = "Country") +
  theme_minimal() + 
  theme(legend.position = "none",
         axis.text = element_text(size = 9)) +
  scale_shape_manual(values = c(16, 17)) 

# Convert to interactive plotly
trend_in <- ggplotly(trend_plot, tooltip = "text") 

# Create volatility plot
volatility_plot <- focus_features |>
  ggplot(aes(x = roa_spikiness, y = cr_spikiness)) +
  geom_point(aes(color = industry_group, 
                 size = roa_trend_strength, 
                 shape = country,  
                 text = paste("<b>Industry:</b>", industry_group,
                             "<br><b>Country:</b>", country,  
                             "<br><b>ROA Spikiness:</b>", round(roa_spikiness, 3),
                             "<br><b>CR Spikiness:</b>", round(cr_spikiness, 3))), 
             alpha = 0.7) +
  geom_hline(yintercept = median(focus_features$cr_spikiness), 
             linetype = "dashed", color = "blue", alpha = 0.5) +
  geom_vline(xintercept = median(focus_features$roa_spikiness), 
             linetype = "dashed", color = "blue", alpha = 0.5) +
  coord_cartesian(xlim = c(0, quantile(focus_features$roa_spikiness, 0.90))) +
  labs(x = "ROA Spikiness", y = "Current Ratio Spikiness",
       size = "ROA Trend Strength", shape = "Country") +
  theme_minimal() +
  theme(legend.position = "none",
         axis.text = element_text(size = 9)) +
  scale_shape_manual(values = c(16, 17))  

volatility_in <- ggplotly(volatility_plot, tooltip = "text")

# Combined analysis
combined_analysis <- subplot(
  trend_in, volatility_in,
  nrows = 1,
  shareY = FALSE,
  titleX = TRUE,
  titleY = TRUE,
  margin = 0.05
) |>
  plotly::layout(
    title = list(
      text = "<b>Industry Financial Impact Analysis During Pandemic</b>",
      x = 0.5,
      xanchor = "center",
      y = 0.98,
      font = list(size = 16, color = "black"),
      automargin = TRUE,
      margin = list(l = 60, r = 60, b = 100, t = 150)
    ),
    hoverlabel = list(
      bgcolor = "white", 
      font = list(size = 11, color = "black"),
      bordercolor = "lightgray"
    ),
    margin = list(l = 60, r = 60, b = 80),
    hovermode = "closest",
    annotations = list(
      list(
        x = 0, y = -0.15,
        xref = "paper", yref = "paper",
        text = "Red lines: 0.3 threshold",
        showarrow = FALSE,
        font = list(size = 10, color = "red")
      ),
      list(
        x = 0.85, y = -0.15,
        xref = "paper", yref = "paper",
        text = "Blue lines: Median volatility values",
        showarrow = FALSE,
        font = list(size = 10, color = "blue")
      )
    )
  )

combined_analysis
Figure 5: Combined analysis of financial trend disruption and volatility during the pandemic. Shape denotes country (circles: Germany; triangles: Sweden).

Threshold-based Industry Classification

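Industries are classified on ROA trend strength and ROA spikiness. The trend-strength cut-offs are fixed (0.7 for high, 0.3 for low), while the spikiness cut-offs are country-specific (the within-country median and 75th percentile).

Industries with high trend strength (at least 0.7) and spikiness at or below the country median are classified as resilient. Industries with low trend strength (at most 0.3) and spikiness at or above the country's 75th percentile are classified as K-shaped. Industries with high trend strength but spikiness at or above the 75th percentile are classified as hidden crisis. All remaining industries are classified as moderate impact.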

industry_classes <- focus_features |>
  group_by(country) |>
  mutate(
    # Calculate threshold
    trend_threshold_high = 0.7,
    trend_threshold_low = 0.3,
    spikiness_threshold_high = quantile(roa_spikiness, 0.75),
    spikiness_median = median(roa_spikiness), 
    
    # Classification (reuses the country-specific thresholds defined above)
    recovery_pattern = case_when(
      roa_trend_strength >= trend_threshold_high & 
        roa_spikiness <= spikiness_median ~ "Resilient",
      roa_trend_strength <= trend_threshold_low & 
        roa_spikiness >= spikiness_threshold_high ~ "K-shaped",
      roa_trend_strength >= trend_threshold_high & 
        roa_spikiness >= spikiness_threshold_high ~ "Hidden Crisis",
      TRUE ~ "Moderate Impact"
    )
  ) |>
  ungroup()  # drop the country grouping before reshaping

# Comparing the classification differences between the two countries
classification_comparison <- industry_classes |>
  select(industry_group, country, recovery_pattern) |>
  pivot_wider(names_from = country, values_from = recovery_pattern) |>
  mutate(
    pattern_match = Germany == Sweden,
    cross_country_insight = case_when(
      Germany == "Resilient" & Sweden != "Resilient" ~ "Resilient only in Germany",
      Sweden == "Resilient" & Germany != "Resilient" ~ "Resilient only in Sweden",
      Germany == "K-shaped" & Sweden != "K-shaped" ~ "K-shaped only in Germany", 
      Sweden == "K-shaped" & Germany != "K-shaped" ~ "K-shaped only in Sweden",
      TRUE ~ "Similar pattern"
    )
  ) |>
  filter(!(Germany == "Moderate Impact" & Sweden == "Moderate Impact")) |>
  select(industry_group, Germany, Sweden, cross_country_insight)

kable(classification_comparison) |>
  kable_styling(font_size = 11)
Table 2: Industry Recovery Classification for Sweden and Germany based on Country-Specific Thresholds
| industry_group | Germany | Sweden | cross_country_insight |
|---|---|---|---|
| Manufacturing | Moderate Impact | Resilient | Resilient only in Sweden |
| Construction | Resilient | Moderate Impact | Resilient only in Germany |
| Wholesale & Retail Trade | K-shaped | Moderate Impact | K-shaped only in Germany |
| Accommodation & Food | K-shaped | Moderate Impact | K-shaped only in Germany |
| Information & Communication | K-shaped | K-shaped | Similar pattern |
| Financial & Insurance | Resilient | Moderate Impact | Resilient only in Germany |
| Administrative & Support | Hidden Crisis | Resilient | Resilient only in Sweden |
| Arts, Entertainment & Recreation | K-shaped | Moderate Impact | K-shaped only in Germany |
| Consumer Services | Moderate Impact | K-shaped | K-shaped only in Sweden |
| Technology | Resilient | K-shaped | Resilient only in Germany |

The classification reveals meaningful cross-country differences. Germany demonstrates resilience in Construction, Financial & Insurance, and Technology. Sweden shows stronger recovery in Manufacturing and Administrative & Support. Information & Communication displays K-shaped dynamics in both countries, indicating that sector characteristics can dominate national context.

5.4.2 Micro-level evidence of anomalous recovery patterns across industries (ROA and liquidity)

This section evaluates firm-level relationships between Current Ratio and ROA within industries that exhibited distinct macro-level patterns.

Selected industries include Information and Communication, Wholesale and Retail Trade, Consumer Services, Financial and Insurance, and Manufacturing.

selected_industries <- c("Information & Communication", "Wholesale & Retail Trade", "Consumer Services", "Financial & Insurance", "Manufacturing")

# Extract the companies time series data of these industries
selected_companies_ts <- ger_swe_ts |>
  filter(industry_group %in% selected_industries)
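# Nest the firm-level points by industry × country × year.
# Converting to a plain tibble and keeping only the needed columns gives
# one row per group and keeps `country` available for the later group_by() calls.
by_ind_year <- selected_companies_ts |>
  as_tibble() |>
  select(industry_group, country, fy_year, company_id, roa_pct, current_ratio) |>
  nest(data = c(company_id, roa_pct, current_ratio)) |>
  mutate(n_obs = map_int(data, nrow))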

# Calculate scagnostics
scag_names <- c("outlying", "stringy", "striated", "clumpy", "sparse", "monotonic", "dcor")

by_ind_year <- by_ind_year |>
  mutate(
    scags = map(data, ~ {
      df <- .x
      # When there are too few samples, return a row of tibble with the same column names and types.
      if (nrow(df) < 3) {
        return(as_tibble(set_names(as.list(rep(NA_real_, length(scag_names))), scag_names)))
      }
      res <- calc_scags(df$current_ratio, df$roa_pct, scags = scag_names)
      # Make sure to convert to atomic vectors and name them, then convert to tibble
      res_vec <- unlist(res)
      res_named <- set_names(as.numeric(res_vec[scag_names]), scag_names)
      as_tibble(as.list(res_named))
    })
  ) |>
  unnest(cols = scags)  # Expand into multiple columns

# Fit a linear regression for each industry × year and extract slope, intercept, and resid_sd
by_ind_year <- by_ind_year |>
  mutate(
    fit = map(data, ~ {
      df <- .x
      if (nrow(df) < 3 || all(is.na(df$current_ratio)) || all(is.na(df$roa_pct))) return(NULL)
      lm(roa_pct ~ current_ratio, data = df)
    }),
    slope = map_dbl(fit, ~ if (is.null(.x)) NA_real_ else coef(.x)[["current_ratio"]]),
    intercept = map_dbl(fit, ~ if (is.null(.x)) NA_real_ else coef(.x)[["(Intercept)"]]),
    resid_sd = map_dbl(fit, ~ if (is.null(.x)) NA_real_ else sd(residuals(.x), na.rm = TRUE))
  ) |>
  select(-fit)

# Industry time-series features, grouped by industry_group AND country
industry_tignostics <- by_ind_year |>
  arrange(industry_group, country, fy_year) |>
  group_by(industry_group, country) |>
  mutate(
    slope_lag = dplyr::lag(slope),
    slope_change = slope - slope_lag,   # year-to-year change in slope
    slope_pct_change = if_else(!is.na(slope_lag) & slope_lag != 0,
                               (slope - slope_lag) / abs(slope_lag) * 100,
                               NA_real_),
    resid_sd_lag = dplyr::lag(resid_sd),
    resid_sd_change = resid_sd - resid_sd_lag,
    big_slope_jump = if_else(!is.na(slope_change) & abs(slope_change) > 0.5, 1L, 0L),
    jumps_cumulative = cumsum(replace_na(big_slope_jump, 0L))
  ) |>
  ungroup()

# Calculate the volatility and slope shift (post minus pre) around the pandemic, treating 2020 as the COVID year
vol_shift <- by_ind_year |>
  mutate(period = case_when(
    fy_year < 2020 ~ "pre",
    fy_year == 2020 ~ "covid_year",
    fy_year > 2020 ~ "post"
  )) |>
  group_by(industry_group, country, period) |>
  summarise(mean_resid_sd = mean(resid_sd, na.rm = TRUE),
            mean_slope = mean(slope, na.rm = TRUE),
            .groups = "drop") |>
  pivot_wider(names_from = period, values_from = c(mean_resid_sd, mean_slope)) |>
  mutate(
    vol_pre_post_diff = mean_resid_sd_post - mean_resid_sd_pre,
    slope_pre_post_diff = mean_slope_post - mean_slope_pre
  )

# Merge the results to generate the final summary
industry_summary <- industry_tignostics |>
  group_by(industry_group, country) |>
  summarise(
    n_years = n_distinct(fy_year),
    avg_slope = mean(slope, na.rm = TRUE),
    max_abs_slope_change = if (all(is.na(slope_change))) NA_real_ else max(abs(slope_change), na.rm = TRUE),
    mean_resid_sd = mean(resid_sd, na.rm = TRUE),
    jumps = if (all(is.na(jumps_cumulative))) 0 else max(jumps_cumulative, na.rm = TRUE),
    .groups = "drop"
  ) |>
  left_join(vol_shift |> select(industry_group, country, vol_pre_post_diff, slope_pre_post_diff), 
            by = c("industry_group", "country"))

kable(industry_summary |> 
        filter( !is.na(max_abs_slope_change)) |>
        select(-n_years)) |>
  kable_styling(font_size = 12)
Table 3: Summary of the yearly ROA–Current Ratio regression slopes by industry and country, for industry–country pairs where a year-to-year slope change could be computed. Columns show the average slope, the maximum absolute year-to-year slope change, the mean residual standard deviation, the number of major slope jumps (absolute change above 0.5), and the post-minus-pre-2020 differences in residual volatility and slope. Four industries are covered: Manufacturing, Wholesale & Retail Trade, Information & Communication, and Financial & Insurance.
| industry_group | country | avg_slope | max_abs_slope_change | mean_resid_sd | jumps | vol_pre_post_diff | slope_pre_post_diff |
|---|---|---|---|---|---|---|---|
| Manufacturing | Germany | 0.00 | 0.54 | 14.3 | 1 | -4.24 | 0.59 |
| Manufacturing | Sweden | -0.60 | 0.43 | 28.1 | 0 | -2.61 | -0.37 |
| Wholesale & Retail Trade | Germany | 2.08 | 2.97 | 7.8 | 4 | -7.53 | -0.54 |
| Wholesale & Retail Trade | Sweden | -1.23 | 4.00 | 15.6 | 4 | 6.84 | -4.29 |
| Information & Communication | Germany | 2.50 | 4.67 | 8.2 | 3 | 2.81 | 6.91 |
| Information & Communication | Sweden | 3.64 | 2.90 | 24.6 | 3 | -7.20 | 2.72 |
| Financial & Insurance | Germany | 0.83 | 2.05 | 11.9 | 3 | -0.67 | 0.99 |
| Financial & Insurance | Sweden | -0.67 | 4.95 | 21.5 | 4 | -11.66 | -0.83 |

Table 3 summarises slope changes and volatility shifts.

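Wholesale & Retail Trade, Information & Communication, and Financial & Insurance each recorded three or four major slope jumps (absolute year-to-year change above 0.5) in both countries, whereas Manufacturing recorded at most one. Consumer Services is absent from Table 3 because no year-to-year slope change could be computed for it. The repeated jumps indicate that the profitability–liquidity relationship changed materially at firm level.

Swedish industries show higher residual dispersion than their German counterparts in every industry in Table 3 (mean_resid_sd is roughly two to three times as large), although the post-minus-pre change in dispersion is negative for most industry–country pairs. German slope evolution is more stable in Wholesale & Retail Trade and Financial & Insurance, where the maximum absolute slope change is smaller than in Sweden, but not in Manufacturing or Information & Communication.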
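
The jump rule behind the `jumps` column can be illustrated on simulated data. The sketch below fits ROA on the Current Ratio separately for each year and flags a year when the slope moves by more than 0.5 relative to the previous year. The data frame `toy`, the slopes in `true_slope`, and the noise level are hypothetical values chosen for illustration only.

```r
# Hypothetical firm-level data for one industry over three years; illustrative only
set.seed(42)
toy <- data.frame(
  fy_year       = rep(2019:2021, each = 30),
  current_ratio = runif(90, min = 0.5, max = 3)
)

# Assumed true slopes of ROA on the current ratio: 1.0, 1.2, then 2.5
true_slope  <- c(`2019` = 1.0, `2020` = 1.2, `2021` = 2.5)
toy$roa_pct <- unname(true_slope[as.character(toy$fy_year)]) * toy$current_ratio +
  rnorm(90, sd = 0.05)

# Fit ROA ~ current ratio separately for each year and keep the slope
slopes <- vapply(
  split(toy, toy$fy_year),
  function(d) coef(lm(roa_pct ~ current_ratio, data = d))[["current_ratio"]],
  numeric(1)
)

# Year-to-year change and the 0.5 jump rule
slope_change   <- c(NA, diff(slopes))
big_slope_jump <- unname(!is.na(slope_change) & abs(slope_change) > 0.5)

data.frame(fy_year = names(slopes),
           slope = unname(round(slopes, 2)),
           big_slope_jump = big_slope_jump)
```

Only the final year is flagged here, because the move from about 1.2 to about 2.5 exceeds the threshold while the earlier move of about 0.2 does not.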

scag_long <- by_ind_year |>
  select(industry_group, country, fy_year, clumpy, outlying, stringy, monotonic) |>
  pivot_longer(cols = c(clumpy, outlying, stringy, monotonic),
               names_to = "scag_metric", values_to = "value")

ggplot(scag_long, aes(x = fy_year, y = value, color = industry_group)) +
  geom_line(linewidth = 0.7) +
  facet_grid(country~ scag_metric, scales = "free_y") +
  geom_vline(xintercept = 2020, linetype = "dashed", color = "red") +
  labs(
    title = "Evolution of Scagnostic Metrics over Time",
    subtitle = "Dashed line = COVID year (2020)",
    x = "Fiscal Year", y = "Scagnostic Value",
    color = "Industry group"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 6),
    legend.key.width = unit(0.8, "cm"),
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 9),
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11)) +
  guides(color = guide_legend(
    nrow = 2, byrow = TRUE, override.aes = list(size = 3))
    )
Figure 6: Scagnostic metrics across industries and countries. The dashed line marks 2020 (COVID onset). The panels show how the clumpy (clustering), monotonic (strength of monotonic association), outlying (extreme values), and stringy (thin, string-like point patterns) metrics vary before and after the pandemic.

Figure 6 shows structural evolution over time. The vertical line marks 2020. Post-2020 increases in clustering and outlying behaviour are more pronounced in Sweden, indicating stronger firm-level divergence.
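
As a rough guide to reading the monotonic panel, the sketch below computes a monotonic score as the squared Spearman rank correlation, following the definition in Wilkinson et al. (2005). Whether the package used in this report computes exactly this quantity is an assumption, and the helper `monotonic_score()` and the data points are hypothetical.

```r
# Monotonic scagnostic following Wilkinson et al. (2005):
# squared Spearman rank correlation between the two variables.
monotonic_score <- function(x, y) {
  stats::cor(x, y, method = "spearman")^2
}

# Hypothetical firm-level points; illustrative only
current_ratio <- c(0.8, 1.1, 1.5, 1.9, 2.4, 3.0)
roa_aligned   <- c(-2, 0, 1, 3, 4, 7)    # ROA rises steadily with liquidity
roa_fractured <- c(3, -2, 7, 0, 4, 1)    # no consistent ordering

score_aligned   <- monotonic_score(current_ratio, roa_aligned)
score_fractured <- monotonic_score(current_ratio, roa_fractured)
c(aligned = score_aligned, fractured = score_fractured)
```

A score near 1 means firms with more liquidity consistently earn higher (or consistently lower) ROA; a score near 0 means the ordering has broken down.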

Summary

Germany demonstrates more stable post-2020 financial alignment across industries. Sweden exhibits stronger divergence and volatility within industries after the pandemic shock.

The evidence suggests that national financial environments influenced how firms adjusted liquidity and profitability during crisis conditions.

6 Summary per Question

6.1 Question 1

The analysis indicates that overall profitability remained broadly stable in both Germany and Sweden during the pandemic relative to the pre-pandemic period. Median EBIT, EBITDA, ROA, ROE, and net income stayed close to zero, suggesting that the typical firm did not experience a structural collapse in profitability.

However, dispersion increased during 2020–2021, particularly in Sweden, where firms experienced more extreme positive and negative outcomes. At the industry level, most sectors showed only modest changes in median EBITDA margins. Accommodation and Food recorded the strongest rebound, while Health Care experienced the largest decline. Technology, Utilities, and Education remained comparatively stable in both countries.

These findings suggest that the pandemic did not dramatically shift central profitability trends, but it amplified heterogeneity across firms and industries, especially within service-oriented sectors.

6.2 Question 2

Both Germany and Sweden displayed financial resilience during the COVID-19 period, though through different balance sheet adjustments.

Liquidity levels remained stable in both countries, with current ratios around 1.8 to 1.9. This indicates effective short-term financial management throughout the crisis. Sweden exhibited slightly greater dispersion, reflecting more variation in firm size and funding structures.

Leverage increased moderately in 2020 in both economies. German firms relied more heavily on external borrowing, consistent with strong credit support mechanisms. Swedish firms appear to have relied more on internal liquidity buffers.

The negative relationship between leverage and solvency is evident in both countries, though the steeper slope in Germany suggests greater sensitivity to debt accumulation. Industry patterns are broadly similar, with Manufacturing, Transport, and Construction remaining more leveraged, while Information, Health Care, and Professional Services maintain more conservative capital structures.

Overall, German firms’ solvency was more exposed to leverage increases, while Swedish firms’ diversified financial structures supported steadier post-pandemic adjustment.

6.3 Question 3

The comparative analysis of Germany and Sweden between 2019 and 2021 reveals distinct resilience patterns during the pandemic.

Both countries experienced profitability pressures and balance sheet adjustments. However, Swedish firms maintained more stable median profitability and liquidity, with fewer extreme outliers. German firms exhibited greater cross-industry variation, particularly in hospitality and transport, indicating more uneven financial stress.

These results suggest that Sweden’s corporate sector adapted with smoother financial adjustment, while Germany experienced sharper disruptions across certain industries, alongside recovery signals in capital-intensive sectors.

6.4 Question 4

The pandemic altered not only financial performance levels but also the structural relationship between liquidity and profitability.

Macro-level analysis shows that Germany maintained relatively stable trend strength and contained volatility across industries. Sweden displayed stronger fluctuations and more frequent structural breakdown in the profitability–liquidity relationship.

Micro-level scagnostic analysis reinforces this contrast. Swedish industries experienced larger increases in clustering and outlying behaviour and showed higher residual dispersion, indicating stronger firm-level divergence. Germany showed comparatively steadier slope evolution and more consistent financial alignment.

The evidence suggests that Germany’s institutional environment supported stability in liquidity management, while Sweden’s more flexible structure allowed greater dispersion in firm outcomes. In several sectors, particularly Information and Communication, this resulted in K-shaped recovery dynamics.

Overall, resilience during the pandemic depended not only on firm-level financial management but also on national institutional and policy environments shaping adjustment mechanisms.

7 Conclusion

The comparative analysis of Germany and Sweden from 2018 to 2021 demonstrates that both economies maintained corporate financial resilience throughout the COVID-19 period, although through different adjustment mechanisms.

Profitability indicators, including ROA, ROE, and EBITDA margins, remained broadly stable in both countries. The central tendency of firm performance did not collapse. However, dispersion increased during 2020 – 2021, particularly in service-oriented sectors such as accommodation and transport, indicating more uneven firm-level outcomes.

Liquidity positions remained sound in both economies, with current ratios consistently around 1.8 to 1.9. This suggests that firms successfully managed short-term obligations despite economic disruption. Differences emerge in leverage dynamics. German firms relied more heavily on debt financing, supported by government-backed credit programs, while Swedish firms maintained comparatively conservative leverage and relied more on internal liquidity buffers. As a result, German solvency outcomes were more sensitive to rising leverage, whereas Swedish firms exhibited greater balance-sheet stability.

Industry patterns further reinforce this contrast. Capital-intensive sectors such as manufacturing, transport, and construction operated with higher leverage but maintained adequate liquidity. Information technology, health care, and professional services showed lower debt exposure and relatively stable performance.

The relationship between liquidity and profitability reveals distinct recovery structures. German firms displayed steadier, more coordinated financial adjustment. Swedish firms exhibited greater internal divergence, with stronger volatility and evidence of K-shaped dynamics in several sectors.

Overall, neither country experienced a systemic liquidity crisis, and corporate stability was preserved. The findings indicate that financial resilience during crisis periods depends not only on internal balance-sheet management but also on institutional settings and policy frameworks. Germany’s coordinated fiscal interventions promoted structural stability, while Sweden’s more flexible corporate environment enabled adaptive but more heterogeneous recovery paths.

8 References

The materials used for this report are:

  • Bureau van Dijk. (2025). OSIRIS [Data set: Germany and Sweden subset]. Bureau van Dijk – A Moody’s Analytics Company. https://www.bvdinfo.com/en-gb/our-products/data/international/osiris

  • Tidyverse: Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.

  • Conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. doi:10.32614/CRAN.package.conflicted https://doi.org/10.32614/CRAN.package.conflicted, R package version 1.2.0, https://CRAN.R-project.org/package=conflicted.

  • Dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. doi:10.32614/CRAN.package.dplyr https://doi.org/10.32614/CRAN.package.dplyr, R package version 1.1.4, https://CRAN.R-project.org/package=dplyr.

  • Stringr: Wickham H (2025). stringr: Simple, Consistent Wrappers for Common String Operations. doi:10.32614/CRAN.package.stringr https://doi.org/10.32614/CRAN.package.stringr, R package version 1.5.2, https://CRAN.R-project.org/package=stringr.

  • Janitor: Firke S (2024). janitor: Simple Tools for Examining and Cleaning Dirty Data. doi:10.32614/CRAN.package.janitor https://doi.org/10.32614/CRAN.package.janitor, R package version 2.2.1, https://CRAN.R-project.org/package=janitor.

  • Skimr: Waring E, Quinn M, McNamara A, Arino de la Rubia E, Zhu H, Ellis S (2025). skimr: Compact and Flexible Summaries of Data. doi:10.32614/CRAN.package.skimr https://doi.org/10.32614/CRAN.package.skimr, R package version 2.2.1, https://CRAN.R-project.org/package=skimr.

  • Tsibble: Wang, E, D Cook, and RJ Hyndman (2020). A new tidy data structure to support exploration and modeling of temporal data, Journal of Computational and Graphical Statistics, 29:3, 466-478, doi:10.1080/10618600.2019.1695624.

  • Knitr: Xie Y (2025). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50, https://yihui.org/knitr/.

    Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963

    Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595

  • Naniar: Tierney N, Cook D (2023). “Expanding Tidy Data Principles to Facilitate Missing Data Exploration, Visualization and Assessment of Imputations.” Journal of Statistical Software, 105(7), 1-31. doi:10.18637/jss.v105.i07 https://doi.org/10.18637/jss.v105.i07.

  • Patchwork: Pedersen T (2025). patchwork: The Composer of Plots. doi:10.32614/CRAN.package.patchwork https://doi.org/10.32614/CRAN.package.patchwork, R package version 1.3.2, https://CRAN.R-project.org/package=patchwork.

  • Plotly: C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020.

  • tsibbletalk: Wang E, Cook D (2020). tsibbletalk: Interactive Graphics for Tsibble Objects. doi:10.32614/CRAN.package.tsibbletalk https://doi.org/10.32614/CRAN.package.tsibbletalk, R package version 0.1.0, https://CRAN.R-project.org/package=tsibbletalk.

  • Feasts: O’Hara-Wild M, Hyndman R, Wang E (2025). feasts: Feature Extraction and Statistics for Time Series. doi:10.32614/CRAN.package.feasts https://doi.org/10.32614/CRAN.package.feasts, R package version 0.4.2, https://CRAN.R-project.org/package=feasts.

  • Broom: Robinson D, Hayes A, Couch S (2025). broom: Convert Statistical Objects into Tidy Tibbles. doi:10.32614/CRAN.package.broom https://doi.org/10.32614/CRAN.package.broom, R package version 1.0.9, https://CRAN.R-project.org/package=broom.

  • Purrr: Wickham H, Henry L (2025). purrr: Functional Programming Tools. doi:10.32614/CRAN.package.purrr https://doi.org/10.32614/CRAN.package.purrr, R package version 1.1.0, https://CRAN.R-project.org/package=purrr.

  • Cassowaryr: L. Wilkinson, A. Anand and R. Grossman (2005) Graph-theoretic scagnostics, IEEE Symposium on Information Visualization (INFOVIS 2005), pp. 157-164, doi:10.1109/INFVIS.2005.1532142

    L. Wilkinson and G. Wills (2008) Scagnostics Distributions, Journal of Computational and Graphical Statistics, 17(2), pp 473-491, doi:10.1198/106186008X320465

    K. Grimm, Kennzahlenbasierte Grafikauswahl, doctoral thesis, Universität Augsburg, 2016.

    H. Mason, S. Lee, U. Laa and D. Cook. cassowaryr: Compute Scagnostics on Pairs of Numeric Variables in a Data Set. R package version 2.0.0. https://CRAN.R-project.org/package=cassowary

  • Gganimate: Pedersen T, Robinson D (2025). gganimate: A Grammar of Animated Graphics. doi:10.32614/CRAN.package.gganimate https://doi.org/10.32614/CRAN.package.gganimate, R package version 1.0.11, https://CRAN.R-project.org/package=gganimate.

  • Gifski: Ooms J, Kornel Lesiński, Authors of the dependency Rust crates (2025). gifski: Highest Quality GIF Encoder. doi:10.32614/CRAN.package.gifski https://doi.org/10.32614/CRAN.package.gifski, R package version 1.32.0-2, https://CRAN.R-project.org/package=gifski.

  • KableExtra: Zhu H (2024). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. doi:10.32614/CRAN.package.kableExtra https://doi.org/10.32614/CRAN.package.kableExtra, R package version 1.4.0, https://CRAN.R-project.org/package=kableExtra.

  • Ggplot2: H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

  • Scales: Wickham H, Pedersen T, Seidel D (2025). scales: Scale Functions for Visualization. doi:10.32614/CRAN.package.scales https://doi.org/10.32614/CRAN.package.scales, R package version 1.4.0, https://CRAN.R-project.org/package=scales.

  • Tidyr: Wickham H, Vaughan D, Girlich M (2024). tidyr: Tidy Messy Data. doi:10.32614/CRAN.package.tidyr https://doi.org/10.32614/CRAN.package.tidyr, R package version 1.3.1, https://CRAN.R-project.org/package=tidyr.

  • Ggrepel: Slowikowski K (2024). ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. doi:10.32614/CRAN.package.ggrepel https://doi.org/10.32614/CRAN.package.ggrepel, R package version 0.9.6, https://CRAN.R-project.org/package=ggrepel.

  • ViridisLite: Simon Garnier, Noam Ross, Robert Rudis, Antônio P. Camargo, Marco Sciaini, and Cédric Scherer (2023). viridis(Lite) - Colorblind-Friendly Color Maps for R. viridisLite package version 0.4.2.

  • The Associated Press. (2020). Europe braces for next wave of coronavirus pandemic in Berlin. AP News. https://apnews.com/article/coronavirus-pandemic-health-europe-epidemics-berlin-b61de99739774c1f52b4ba6860054d6d

  • Godin, M. (2020). Sweden’s relaxed approach to the coronavirus could already be back-firing. Time. https://time.com/5817412/sweden-coronavirus/