Financial Health of Germany Firms During COVID-19

Author

Yusuf Kurnia Romadhon

Published

October 20, 2025

1 Executive Summary

This report examines how German firms adjusted profitability, leverage, and liquidity during the COVID 19 period using firm level financial data from 2018 to 2021.

Key findings:

Median operating profitability remained broadly stable, but downside risk increased significantly during 2020 to 2021.
Dispersion widened across firms, indicating uneven impact rather than systemic collapse.
Balance sheet structures remained conservative. Leverage increased only marginally and liquidity buffers were largely preserved.
Sectoral divergence intensified. Capital intensive and cyclical industries exhibited greater vulnerability, while technology and selected service sectors demonstrated stronger resilience.
The traditional positive relationship between liquidity and profitability weakened during 2020, suggesting firms prioritised cash preservation over margins during peak uncertainty.

Overall, the pandemic did not trigger structural balance sheet instability in German firms, but it amplified within industry heterogeneity and exposed sector specific vulnerabilities.

2 Introduction

The COVID-19 pandemic represented an unprecedented shock to global economic activity, testing the financial resilience of firms across industries and countries. In Germany, the pandemic period was characterised by nationwide mobility restrictions beginning in March 2020, followed by a gradual relaxation of measures through 2022, with lingering economic effects extending into subsequent years (The Associated Press, 2020).

This project examines how German firms responded to these shocks during the 2020–2021 period, using firm-level financial data sourced from the Bureau van Dijk OSIRIS database. OSIRIS provides harmonised corporate financial information, enabling consistent comparison across firms and industries. The dataset covers fiscal years 2018–2021, where 2018–2019 serve as the pre-pandemic baseline and 2020–2021 capture the pandemic-affected period. Financial data for 2022 are not available in this extract.

The analysis focuses on cross-industry differences in financial outcomes, with particular attention to profitability, leverage, liquidity, and firm size. Rather than treating the pandemic as a uniform shock, this study investigates how structural and strategic differences across industries shaped financial resilience and vulnerability.

From a decision-making perspective, the central question is not whether firms were affected by COVID-19, but how unevenly those effects were distributed. By examining shifts in central tendency, dispersion, and industry-level dynamics, this analysis aims to identify which sectors demonstrated unexpected resilience, which experienced sustained financial stress, and how balance-sheet adjustments mediated recovery during the pandemic period. The findings highlight the limits of aggregate financial indicators and underscore the importance of industry-specific analysis when assessing corporate resilience during systemic shocks.

3 Data description

The dataset used in this analysis is sourced from the Bureau van Dijk OSIRIS database, a globally recognised repository of harmonised financial information covering publicly listed and major private firms (Bureau van Dijk, 2025). For this project, firm-level financial data were extracted for German companies across the 2018–2021 accounting years, capturing both pre-pandemic conditions and the period most directly affected by COVID-19.

Each observation represents a firm’s reported financial position for a given accounting year, enabling longitudinal analysis of changes in profitability, liquidity, leverage, and overall financial resilience over time. The following sections document the reproducible process used to load, combine, standardise, and prepare the raw OSIRIS extracts for analysis.

Loading the data

This is the library used for this data.

Show code

library(dplyr)
library(stringr)
library(janitor)
library(skimr)
library(tsibble)
library(knitr)
library(naniar)
library(patchwork)
library(plotly)
library(tsibbletalk)
library(feasts)
library(broom)
library(purrr)
library(cassowaryr)
library(gganimate)
library(gifski)
library(kableExtra)
library(ggplot2)
library(scales)
library(tidyr)
library(ggrepel)
library(viridisLite)

The following code loads the annual OSIRIS extracts for Germany and stores each year’s data as a separate object, preserving the original reporting structure while enabling systematic combination.

Show code

#define the folder
data_directory <- "osiris"

#filter only germany
germany_file_paths <- list.files(
  data_directory,
  pattern = "^osiris_?Germany_(2018|2019|2020|2021)\\.rda$",
  full.names = TRUE,
  ignore.case = TRUE
)

#name the list by file stems
germany_file_stems <- tools::file_path_sans_ext(basename(germany_file_paths))
osiris_germany <- setNames(vector("list", length(germany_file_paths)), germany_file_stems)

#load each file
for (file_index in seq_along(germany_file_paths)) {
  temporary_environment <- new.env(parent = emptyenv())
  load(germany_file_paths[file_index], envir = temporary_environment)
  loaded_objects <- as.list(temporary_environment)

  #If the file holds one object, store that object; otherwise store the sub-list
  osiris_germany[[file_index]] <- if (length(loaded_objects) == 1) loaded_objects[[1]] else loaded_objects
}

Combining the data

Once loaded, the annual datasets are combined into a single longitudinal data frame. A reporting year variable is extracted from each file name to facilitate temporal ordering and year-to-year comparison.

Show code

#build a single data frame with a 'year' column taken from the filename
combined_germany <- {
  germany_parts <- vector("list", length(osiris_germany))
  names(germany_parts) <- names(osiris_germany)

  for (file_stem in names(osiris_germany)) {
    data_frame_from_file <- osiris_germany[[file_stem]]
    extracted_year <- as.integer(str_extract(file_stem, "\\d{4}"))

    germany_parts[[file_stem]] <- mutate(data_frame_from_file, year = extracted_year)
  }

  #bind and sort by year (optional)
  bind_rows(germany_parts) |>
    arrange(year)
}

Data cleaning & standardisation

After combining the annual extracts, the dataset is transformed into an analysis-ready format. This step standardises column names, maps vendor-specific field labels to clearly defined analytical variables, and constructs a fiscal-year indicator based on reported year-end dates.

Only firm identifiers, industry classifications, and financial variables relevant to the analytical objectives are retained, ensuring clarity, consistency, and analytical focus.

Show code

filtered_germany <- combined_germany |>
  clean_names() |>
  
  rename(
    # IDs / scope (used in all questions)
    company_name       = company_name_name,
    company_id         = name_id,
    acct_year          = year,
    country            = country_country,
    city               = city_city_city,
    consolidation_code = consolidation_code_consol_code,
    status             = status_status,
    
    # Industry codes (Q1, Q2, Q4)
    nace_code = nace_rev_1_core_code_cnacecd,
    icb_code  = industrial_classification_benchmark_icb,
    sic_code  = us_sic_core_code_csicuscde,
    
    # Scale / balance sheet (Q2, Q3)
    total_assets_eur      = total_assets_data13077,
    total_liabilities_eur = total_liabilities_and_debt_data14022,
    total_equity_eur      = total_shareholders_equity_data14041,
    
    # Revenue & profit (Q1, Q3)
    total_revenue_eur = total_revenues_data13004,
    net_sales_eur     = net_sales_data13002,
    gross_sales_eur   = gross_sales_data13000,
    net_income_eur    = net_income_starting_line_data15500,
    
    # Profitability ratios (Q1, Q4)
    ebit_margin_pct   = ebit_margin_percent_data31055,
    ebitda_margin_pct = ebitda_margin_percent_data31060,
    roa_pct           = return_on_total_assets_percent_data31015,
    roe_pct           = roe_percent_data31065,
    
    # Efficiency
    net_assets_turnover = net_assets_turnover_data31225,
    stock_turnover      = stock_turnover_data31220,
    
    # Leverage, solvency, and liquidity (Q2)
    solvency_pct        = solvency_ratio_percent_data31310,
    gearing_pct         = gearing_percent_data31315,
    current_ratio       = current_ratio_data31105,
    liquidity_ratio     = liquidity_ratio_data31110,
    shareholders_liq_pct= shareholders_liquidity_ratio_data31305,
    interest_cover      = interest_cover_data31115
  ) |>
  
  # Rename fiscal year
  mutate(
    fy_end_date = lubridate::ymd(as.character(company_fiscal_year_end_date_closdate)),
    fy_year     = year(fy_end_date)
  ) |>
  
  # Select only relevant columns
  select(
    company_id, company_name, fy_end_date, fy_year, acct_year, status, country, city, consolidation_code,
    nace_code, icb_code, sic_code,
    total_assets_eur, total_liabilities_eur, total_equity_eur,
    total_revenue_eur, net_sales_eur, gross_sales_eur, net_income_eur,
    ebit_margin_pct, ebitda_margin_pct, roa_pct, roe_pct,
    net_assets_turnover, stock_turnover,
    solvency_pct, gearing_pct, current_ratio, liquidity_ratio,
    shareholders_liq_pct, interest_cover
  )

The table below documents the variables retained for analysis, including standardised variable names, descriptions, data types, and their corresponding original OSIRIS field identifiers.

Column Name	Description	Data Type	Original Name
company_id	Unique company identifier (Bureau van Dijk ID).	character	BvD ID Number (os_id_number)
company_name	Registered company name (label).	character	Company Name (name)
fy_end_date	Company fiscal year-end date for the account.	date	Company Fiscal Year End Date (closdate)
fy_year	Year extracted from fiscal year-end date.	integer	derived from Company Fiscal Year End Date (closdate)
acct_year	Reporting year label carried from file name/field.	integer	year
country	Country of head office.	character	Country (country)
city	City of head office.	character	CITY - City (city)
consolidation_code	Consolidation scope of the accounts (e.g., C1/C2/U1/U2).	character	Consolidation Code (consol_code)
nace_code	NACE core code (EU industry classification).	character	NACE Rev 1, Core Code (cnacecd)
icb_code	ICB industry classification code.	character	Industrial Classification Benchmark (icb)
sic_code	US SIC core code.	character	US SIC, Core Code (csicuscde)
total_assets_eur	Total assets (EUR).	numeric	Total Assets (data13077)
total_liabilities_eur	Total liabilities and debt (EUR).	numeric	Total Liabilities and Debt (data14022)
total_equity_eur	Total shareholders’ equity (EUR).	numeric	Total Shareholders Equity (data14041)
total_revenue_eur	Total revenues / operating revenue (EUR).	numeric	Total revenues (data13004)
net_sales_eur	Net sales (EUR).	numeric	Net sales (data13002)
gross_sales_eur	Gross sales (EUR).	numeric	Gross sales (data13000)
net_income_eur	Net income (EUR).	numeric	Net Income / Starting Line (data15500)
ebit_margin_pct	EBIT margin (% of sales/revenue).	numeric	EBIT Margin (%) (data31055)
ebitda_margin_pct	EBITDA margin (% of sales/revenue).	numeric	EBITDA Margin (%) (data31060)
roa_pct	Return on total assets (%).	numeric	Return on Total Assets (%) (data31015)
roe_pct	Return on equity (%).	numeric	ROE (%) (data31065)
net_assets_turnover	Net assets turnover (times).	numeric	Net Assets Turnover (data31225)
stock_turnover	Stock / inventory turnover (times).	numeric	Stock Turnover (data31220)
solvency_pct	Solvency ratio (%).	numeric	Solvency ratio (%) (data31310)
gearing_pct	Gearing ratio (%).	numeric	Gearing (%) (data31315)
current_ratio	Current ratio (times).	numeric	Current ratio (data31105)
liquidity_ratio	Liquidity / quick ratio (times).	numeric	Liquidity ratio (data31110)
shareholders_liq_pct	Shareholders’ liquidity ratio (%).	numeric	Shareholders Liquidity ratio (data31305)
interest_cover	Interest cover (times).	numeric	Interest Cover (data31115)

The selected variables encompass four key dimensions of corporate financial health:

Identification and Location - company_id, company_name, country, and city provide firm-level identifiers and geographical context.

Balance Sheet Structure - total_assets_eur, total_liabilities_eur, and total_equity_eur measure firm size and capital composition, reported in euros.

Income and Profitability - total_revenue_eur, net_sales_eur, and net_income_eur capture income statement outcomes, while ebit_margin_pct, ebitda_margin_pct, roa_pct, and roe_pct reflect operating performance and returns to shareholders.

Solvency and Liquidity - solvency_pct, gearing_pct, current_ratio, liquidity_ratio, shareholders_liq_pct, and interest_cover assess each firm’s financial stability, leverage, and capacity to meet short-term obligations.
Industry Classification - industry_nace, industry_icb, and industry_group link firms to their respective economic sectors according to European (NACE) and global (ICB) coding systems.

It is important to note that the accounting year reported in OSIRIS reflects a two-year reporting window. For example, the 2018 accounting year represents financial activity spanning 2017–2018, while the 2021 accounting year captures results from 2020–2021.

Throughout this report, each accounting year is interpreted as representing the endpoint of its reporting cycle. Accordingly, 2018–2019 serve as pre-pandemic benchmarks, while 2020–2021 reflect firm performance during the peak pandemic period.

This dataset structure supports robust examination of temporal shifts in corporate performance and facilitates industry-level comparison of financial resilience. In particular, it enables assessment of how firm characteristics—most notably firm size and balance-sheet structure, shaped differential financial outcomes during the COVID-19 shock.

3.1 Research Question

The following research questions guide the analysis and support the primary objective: How did the COVID 19 pandemic affect corporate financial performance in Germany ?

1. How did profitability change across industries ?

This question examines how key profitability measures such as return on assets, EBITDA margins, and operating margins evolved across industries before and during the pandemic. By comparing the pre pandemic period from 2018 to 2019 with the pandemic period from 2020 to 2021, the analysis identifies which sectors experienced improvements, stability, or declines in profitability.

Cross industry differences in profitability provide insights into relative exposure to economic disruption, sectoral resilience, and recovery dynamics. Understanding both the direction and magnitude of profitability changes directly contributes to explaining how the pandemic affected corporate financial outcomes in Germany.

2. How did leverage (debt-to-assets) and liquidity (current ratio) evolve during pandemic ?

This question focuses on changes in firms’ financial structure and short term financial health during the COVID 19 period. Specifically, it analyses trends in leverage, measured by debt related indicators, and liquidity, measured using the current ratio.

Tracking these indicators over time allows assessment of whether firms increased reliance on debt financing, prioritised liquidity preservation, or adjusted balance sheet strategies in response to heightened uncertainty. These dynamics help explain how the pandemic influenced firms’ ability to absorb financial stress and maintain operational continuity.

3. Which German industries showed unexpected financial resilience or vulnerability from 2020 to 2022, and how were these outcomes shaped by profitability, leverage, liquidity, and firm size ?

This question investigates which German industries demonstrated unexpected financial resilience or heightened vulnerability during the pandemic period. It evaluates how profitability, leverage, liquidity, and firm size jointly shaped observed financial outcomes across sectors.

By linking industry level performance to underlying financial characteristics, the analysis identifies structural strengths and weaknesses that influenced firms’ capacity to withstand economic shocks. The findings provide insights into sector specific financial dynamics and offer implications for future crisis preparedness and policy design.

4. How did the fundamental relationship between corporate liquidity and profitability evolve and fracture within industries during the pandemic ?

This question explores how the traditional relationship between liquidity and profitability evolved under pandemic conditions. Financial theory generally suggests that adequate liquidity supports profitability by enabling firms to sustain operations during periods of stress.

During the COVID 19 period, firms faced increased pressure to maintain liquidity without undermining returns. This analysis examines how liquidity, measured by the current ratio, and profitability, measured by return on assets, interacted across industries during the pandemic.

The results reveal whether the pandemic altered the conventional balance between liquidity and profitability and whether this relationship varied across sectors in response to uneven shocks and policy interventions.

4 IDA

This section documents the full Initial Data Analysis conducted before addressing the research questions. The objective is to verify structural integrity, evaluate distributional characteristics, identify data quality risks, and establish methodological transparency. Given the scale and heterogeneity of firm-level financial data, careful validation is essential before any cross-industry or cross-period inference.

The IDA proceeds in four stages:

Data type validation and structural integrity checks
Industry classification construction
Distributional and missingness diagnostics
Question-specific data preparation

Data Validation and Structural Checks

I first standardise numeric and categorical variables to ensure consistency across accounting years and firms. Monetary and ratio variables are coerced into numeric format, and key character fields are cleaned for consistent casing.

Show code

num_cols <- c(
  "total_assets_eur","total_liabilities_eur","total_equity_eur",
  "total_revenue_eur","net_sales_eur","gross_sales_eur","net_income_eur",
  "ebit_margin_pct","ebitda_margin_pct","roa_pct","roe_pct",
  "net_assets_turnover","stock_turnover",
  "solvency_pct","gearing_pct","current_ratio","liquidity_ratio",
  "shareholders_liq_pct","interest_cover"
)

#symbol formatting/coerce numerics
filtered_germany <- filtered_germany |>
  mutate(
    across(all_of(num_cols) & where(is.character), readr::parse_number),
    country = stringr::str_to_title(country),
    city    = stringr::str_to_title(city)
  )

#missingness report check (overall and by year)
missing_overall <- filtered_germany |>
  summarise(across(everything(), ~ mean(is.na(.)))) |>
  pivot_longer(everything(), names_to = "variable", values_to = "missing_rate")

missing_by_year <- filtered_germany |>
  group_by(acct_year) |>
  summarise(across(everything(), ~ mean(is.na(.)))) |>
  pivot_longer(-acct_year, names_to = "variable", values_to = "missing_rate")

#sanity checks (optional but recommended)
dup_firm_year <- filtered_germany |>
  count(company_id, fy_year, acct_year) |>
  filter(n > 1)

impossible_assets <- filtered_germany |>
  filter(!is.na(total_assets_eur) & total_assets_eur < 0)

consol_levels <- filtered_germany |>
  count(consolidation_code, sort = TRUE)

These checks confirm that:

No duplicate firm-year records exist when using (company_id, fy_year, acct_year) as a composite identifier.
No negative total asset values are detected.
Consolidation codes are internally consistent and retained for later analysis.

At this stage, there is no evidence of structural corruption, key collisions, or invalid balance-sheet magnitudes. The dataset passes initial integrity validation.

Industry Classification Construction

To enable industry-level comparison, I construct a unified industry grouping using NACE Rev. 1 Level-2 codes as the primary classification and ICB sectors as a fallback mechanism.

Show code

#clean 2-digit NACE as integer
filtered_germany <- filtered_germany |>
  mutate(
    nace_num2 = suppressWarnings(as.integer(str_sub(readr::parse_number(as.character(nace_code)), 1, 2)))
  )

#map NACE divisions to broad section names
industry_from_nace <- function(x){
  case_when(
    x %in% 1:3 ~ "Agriculture, Forestry & Fishing",
    x %in% 5:9 ~ "Mining & Quarrying",
    x %in% 10:33 | x %in% 15:37 ~ "Manufacturing",
    x %in% 35 ~ "Electricity, Gas, Steam",
    x %in% 36:39 ~ "Water Supply & Waste",
    x %in% 41:43 ~ "Construction",
    x %in% 45:47 ~ "Wholesale & Retail Trade",
    x %in% 49:53 ~ "Transport & Storage",
    x %in% 55:56 ~ "Accommodation & Food",
    x %in% 58:63 ~ "Information & Communication",
    x %in% 64:66 ~ "Financial & Insurance",
    x %in% 68 ~ "Real Estate",
    x %in% 69:75 ~ "Professional, Scientific & Technical",
    x %in% 77:82 ~ "Administrative & Support",
    x %in% 84 ~ "Public Administration",
    x %in% 85 ~ "Education",
    x %in% 86:88 ~ "Human Health & Social Work",
    x %in% 90:93 ~ "Arts, Entertainment & Recreation",
    x %in% 94:96 ~ "Other Service Activities",
    x %in% 97:98 ~ "Household Activities",
    x %in% 99 ~ "Extraterritorial Organizations",
    TRUE ~ NA_character_
  )
}

#ICB top-level sector labels
icb_map <- c(
  "0001" = "Oil & Gas",
  "1000" = "Basic Materials",
  "2000" = "Industrials",
  "3000" = "Consumer Goods",
  "4000" = "Health Care",
  "5000" = "Consumer Services",
  "6000" = "Telecommunications",
  "7000" = "Utilities",
  "8000" = "Financials",
  "9000" = "Technology"
)

#normalize ICB to a 4-digit “bucket” (example: 2573 to 2000)
normalize_icb <- function(x){
  x_chr <- str_extract(as.character(x), "\\d+")
  ifelse(is.na(x_chr), NA_character_,
         sprintf("%04d", as.integer(floor(as.numeric(x_chr) / 1000) * 1000)))
}

filtered_germany <- filtered_germany |>
  mutate(
    industry_nace = industry_from_nace(nace_num2),
    icb_bucket    = normalize_icb(icb_code),
    industry_icb  = icb_map[icb_bucket],
    industry_group = coalesce(industry_nace, industry_icb, "Other / Unmapped")
  )

#check
unmapped_sample <- filtered_germany |>
  filter(industry_group == "Other / Unmapped") |>
  select(company_id, fy_year, acct_year, nace_code, icb_code) |>
  head(15)

#factor order for plots
ordered_levels <- c(
  "Agriculture, Forestry & Fishing","Mining & Quarrying","Manufacturing",
  "Electricity, Gas, Steam","Water Supply & Waste","Construction",
  "Wholesale & Retail Trade","Transport & Storage","Accommodation & Food",
  "Information & Communication","Financial & Insurance","Real Estate",
  "Professional, Scientific & Technical","Administrative & Support",
  "Public Administration","Education","Human Health & Social Work",
  "Arts, Entertainment & Recreation","Other Service Activities",
  "Household Activities","Extraterritorial Organizations",
  "Oil & Gas","Basic Materials","Industrials","Consumer Goods","Health Care",
  "Consumer Services","Telecommunications","Utilities","Financials","Technology",
  "Other / Unmapped"
)
filtered_germany <- filtered_germany |>
  mutate(industry_group = factor(industry_group, levels = ordered_levels))

The combined NACE and ICB strategy ensures high classification coverage. Only a small subset of firms remain under “Other / Unmapped,” and these are explicitly retained rather than removed to avoid selection bias. The factor ordering guarantees consistent cross-plot comparability.

Reshaping Data for Initial Data Analysis

After validating data integrity and constructing industry classifications, the dataset is reshaped into a long format to support systematic profiling of financial variables. This step separates scale variables (monetary magnitudes) from ratio variables, allowing appropriate transformations and comparisons during visual inspection.

Show code

#using long data
ida_germany <- filtered_germany |>
  select(acct_year, all_of(num_cols)) |>
  pivot_longer(-acct_year, names_to = "variable_orig", values_to = "value") |>
  drop_na()

#split between scale and ratio
scale_vars <- c("total_assets_eur","total_liabilities_eur","total_equity_eur",
                "total_revenue_eur","net_sales_eur","gross_sales_eur","net_income_eur")
ratio_vars <- setdiff(unique(ida_germany$variable_orig), scale_vars)

#label mapping for the charts
nice <- c(
  total_assets_eur   = "Total assets (EUR)",
  total_liabilities_eur="Total liabilities (EUR)",
  total_equity_eur   = "Equity (EUR)",
  total_revenue_eur  = "Total revenue (EUR)",
  net_sales_eur      = "Net sales (EUR)",
  gross_sales_eur    = "Gross sales (EUR)",
  net_income_eur     = "Net income (EUR)",
  ebit_margin_pct    = "EBIT margin (%)",
  ebitda_margin_pct  = "EBITDA margin (%)",
  roa_pct            = "ROA (%)",
  roe_pct            = "ROE (%)",
  net_assets_turnover= "Net assets turnover (x)",
  stock_turnover     = "Stock turnover (x)",
  solvency_pct       = "Solvency (%)",
  gearing_pct        = "Gearing (%)",
  current_ratio      = "Current ratio (x)",
  liquidity_ratio    = "Liquidity ratio (x)",
  shareholders_liq_pct = "Shareholders’ liquidity (%)",
  interest_cover     = "Interest cover (x)"
)

pretty_var <- function(x) ifelse(x %in% names(nice), nice[x], x)

ida_germany <- ida_germany |>
  mutate(variable = pretty_var(variable_orig),
         variable = factor(variable, levels = unique(variable)))

This transformation produces a tidy structure that supports faceted visualisation and consistent comparison across accounting years. At this stage, no aggregation or filtering is applied, ensuring that all available firm-level observations contribute to the distributional diagnostics.

Distribution of Scale Variables

Monetary variables are examined using violin and jitter plots on a logarithmic scale. This approach is necessary because firm-level financial data are highly right-skewed, with a small number of very large firms dominating raw distributions.

Show code

#chart for scale variables (log axis)
p_scale <- ida_germany |>
  filter(variable_orig %in% scale_vars) |>
  mutate(y = if_else(value > 0, value, NA_real_)) |>
  ggplot(aes(x = factor(acct_year), y = y)) +
  geom_violin(trim = FALSE, fill = NA, linewidth = 0.4, na.rm = TRUE) +
  geom_jitter(width = 0.12, height = 0, alpha = 0.12, size = 0.6, na.rm = TRUE) +
  stat_summary(fun = median, geom = "point", shape = 95, size = 6, na.rm = TRUE) +
  scale_y_log10(labels = scales::label_number(scale_cut = scales::cut_short_scale())) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  labs(x = "Year", y = NULL, title = "Violin + Jitter by Year (log scale for Monetary Variables)") +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(size = 10),
        strip.text  = element_text(face = "bold"),
        panel.grid.minor = element_blank())

p_scale

The distributions of all monetary variables are heavily right-skewed, confirming that the dataset consists primarily of small to medium-sized firms with a long tail of very large entities. Median values remain relatively stable from 2018 to 2021, though dispersion increases notably in 2020 and 2021.

Net income exhibits the widest relative spread, including both large positive and negative values, indicating heightened earnings volatility during the pandemic period. Because of this skewness and volatility, median and interquartile range (IQR) are more informative than means and are therefore used throughout subsequent analysis.

Distribution of Ratio Variables

Ratio and percentage variables are analysed on their natural (linear) scale, as they are not subject to the same magnitude distortions as monetary values.

Show code

#chart for ratio variables (linear axis)
p_ratio <- ida_germany |>
  filter(variable_orig %in% ratio_vars) |>
  ggplot(aes(x = factor(acct_year), y = value)) +
  geom_violin(trim = FALSE, fill = NA, linewidth = 0.4) +
  geom_jitter(width = 0.12, height = 0, alpha = 0.12, size = 0.6) +
  stat_summary(fun = median, geom = "point", shape = 95, size = 6) +
  facet_wrap(~ variable, scales = "free_y", ncol = 3) +
  labs(x = "Year", y = NULL, title = "Violin + Jitter by Year (Ratios & % on Linear Scale)") +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(size = 10),
        strip.text  = element_text(face = "bold"),
        panel.grid.minor = element_blank())

p_ratio

Profitability ratios (EBIT margin, EBITDA margin, ROA) cluster around small positive values but display wider dispersion in 2020 and 2021. ROE shows the heaviest tails and a substantial number of negative values, reflecting pressure on equity returns during the pandemic.

Liquidity measures remain generally low but exhibit long right tails, indicating that a minority of firms maintain large liquidity buffers. Solvency is comparatively stable, while gearing becomes more dispersed post-2020. These characteristics reinforce the decision to rely on robust statistics rather than mean-based summaries.

Missing Data Diagnostics for Numeric Variables

Before proceeding to modelling or hypothesis testing, missingness is evaluated by industry and year to detect structural gaps.

Show code

key_vars <- c("total_assets_eur","total_liabilities_eur","total_equity_eur",
              "total_revenue_eur","net_income_eur",
              "ebit_margin_pct","ebitda_margin_pct","roa_pct","roe_pct",
              "current_ratio","liquidity_ratio","gearing_pct","solvency_pct")

#by industry
missing_by_industry <- filtered_germany |>
  group_by(industry_group) |>
  summarise(across(all_of(key_vars), ~ mean(is.na(.))), .groups="drop") |>
  pivot_longer(-industry_group, names_to="variable", values_to="missing_rate")

#heatmap by year for numeric variables
missing_heat <- filtered_germany |>
  group_by(acct_year) |>
  summarise(across(all_of(key_vars), ~ mean(is.na(.))), .groups="drop") |>
  pivot_longer(-acct_year, names_to="variable", values_to="missing_rate")

ggplot(missing_heat, aes(variable, factor(acct_year), fill = missing_rate)) +
  geom_tile(color="white") +
  scale_fill_gradient(low="white", high="red", labels=scales::percent) +
  labs(x=NULL, y="Year", fill="Missing", title="Missingness by Year (Numeric Variables)") +
  theme_minimal() + theme(axis.text.x=element_text(angle=45, hjust=1))

Missingness is concentrated in gearing and liquidity ratios (approximately 20–25%), while profitability margins show moderate missingness (10–15%). Balance-sheet totals and revenue variables are nearly complete. Importantly, missingness patterns are stable across years, with no evidence of a pandemic-specific reporting breakdown. This supports the use of controlled, transparent imputation strategies later in the analysis.

Outlier Detection

Extreme values are identified using both IQR fences and z-score thresholds to characterise tail behaviour across variables.

Show code

num_cols <- c("total_assets_eur","total_liabilities_eur","total_equity_eur",
              "total_revenue_eur","net_sales_eur","gross_sales_eur","net_income_eur",
              "ebit_margin_pct","ebitda_margin_pct","roa_pct","roe_pct",
              "net_assets_turnover","stock_turnover",
              "solvency_pct","gearing_pct","current_ratio","liquidity_ratio",
              "shareholders_liq_pct","interest_cover")

qfun <- function(x,p) quantile(x, probs=p, na.rm=TRUE, names=FALSE)

outliers_overall <- purrr::map_dfr(num_cols, function(v){
  x <- filtered_germany[[v]]
  q1 <- qfun(x,.25); q3 <- qfun(x,.75); iqr <- q3-q1
  lo <- q1 - 1.5*iqr; hi <- q3 + 1.5*iqr
  z  <- as.numeric(scale(x))
  tibble::tibble(
    variable=v,
    n=sum(!is.na(x)),
    iqr_outliers=sum(x<lo | x>hi, na.rm=TRUE),
    z_outliers=sum(abs(z)>3, na.rm=TRUE),
    share_iqr=iqr_outliers/n,
    share_z=z_outliers/n
  )
}) |> arrange(desc(share_iqr))

outliers_overall |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)

variable	n	iqr_outliers	z_outliers	share_iqr	share_z
net_income_eur	7643	1888	100	0.25	0.01
ebit_margin_pct	6554	1246	159	0.19	0.02
interest_cover	7089	1297	133	0.18	0.02
total_liabilities_eur	8038	1435	65	0.18	0.01
ebitda_margin_pct	6736	1186	122	0.18	0.02
total_assets_eur	8042	1411	83	0.18	0.01
net_sales_eur	7909	1358	127	0.17	0.02
gross_sales_eur	7909	1358	127	0.17	0.02
shareholders_liq_pct	7912	1343	159	0.17	0.02
total_revenue_eur	8003	1347	131	0.17	0.02
stock_turnover	4859	780	149	0.16	0.03
total_equity_eur	8041	1281	117	0.16	0.01
roe_pct	7717	1114	130	0.14	0.02
roa_pct	7938	972	211	0.12	0.03
current_ratio	7786	929	181	0.12	0.02
liquidity_ratio	6217	725	130	0.12	0.02
gearing_pct	5891	438	152	0.07	0.03
net_assets_turnover	7953	405	94	0.05	0.01
solvency_pct	8018	69	64	0.01	0.01

Net income, EBIT margin, and interest coverage contain the highest proportion of extreme values, reflecting inherent volatility rather than data error. Z-score detection identifies far fewer outliers due to strong skewness, confirming that the IQR rule is more informative in this context. Extreme values are retained for analysis, with visual capping applied only where necessary for interpretability.

Character Variable Consistency

Finally, key identifiers and classification fields are inspected to ensure consistency and completeness.

Show code

#character variables to check
char_vars <- c("company_id","company_name","country","city",
               "consolidation_code","nace_code","icb_code","sic_code")
char_vars <- intersect(char_vars, names(filtered_germany))

#cleaning
chars_clean <- filtered_germany |>
  mutate(
    across(all_of(char_vars),
           ~ .x |> as.character() |> str_squish() |> na_if("")),
    #casing
    country = str_to_title(country),
    city    = str_to_title(city),
    company_name = str_squish(company_name)
  )

#missingness check
char_missing <- chars_clean |>
  summarise(across(all_of(char_vars), ~ mean(is.na(.)))) |>
  tidyr::pivot_longer(everything(), names_to="variable", values_to="missing_rate")

char_missing |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)

variable	missing_rate
company_id	0.00
company_name	0.00
country	0.00
city	0.00
consolidation_code	0.00
nace_code	0.02
icb_code	0.06
sic_code	0.00

Identifiers and locations are complete, while industry codes exhibit limited missingness. The combination of NACE, ICB, and SIC codes ensures high classification coverage without aggressive exclusion.

With the general IDA completed, the dataset is now structurally sound, distributionally understood, and transparently documented. The following sections apply targeted IDA procedures tailored to each research question, ensuring that transformations, imputations, and exclusions are explicitly justified and traceable.

4.1 Question 1

How did profitability change across industries?

Objective

This section examines how firm-level profitability evolved across German industries before and during the COVID-19 pandemic. The focus is on EBIT margin, EBITDA margin, Return on Assets (ROA), and Return on Equity (ROE) between 2018–2019 (pre-pandemic) and 2020–2021 (pandemic).

The IDA here is designed to:

Preserve firm-level variation.
Avoid cross-period leakage.
Minimise imputation.
Maintain full transparency of data provenance.

Period Construction and Profitability Inputs

A binary period indicator is created to distinguish pre-pandemic and pandemic observations. Net income is transformed using the inverse hyperbolic sine transformation to allow meaningful visualisation of both gains and losses.

ROA and ROE are recalculated where missing using accounting identities, and the origin of each value is explicitly recorded.

Show code

stopifnot("net_income_eur" %in% names(filtered_germany))

filtered_germany_1 <- filtered_germany |>
  mutate(
    # period flag
    period = if_else(acct_year >= 2020, "Pandemic (2020–2021)", "Pre-pandemic (2018–2019)"),
    # transform for visuals (handles negatives)
    net_income_eur_asinh = asinh(as.numeric(net_income_eur)),
    # recompute ROA / ROE (fallbacks)
    roa_calc = 100 * (net_income_eur / total_assets_eur),
    roe_calc = if_else(total_equity_eur > 0,
                       100 * (net_income_eur / total_equity_eur),
                       NA_real_),
    roa_use = coalesce(roa_pct, roa_calc),
    roe_use = coalesce(roe_pct, roe_calc),
    roa_src = case_when(
      !is.na(roa_pct) ~ "reported",
      !is.na(roa_calc) ~ "recomputed",
      TRUE ~ NA_character_
    ),
    roe_src = case_when(
      !is.na(roe_pct) ~ "reported",
      !is.na(roe_calc) ~ "recomputed",
      TRUE ~ NA_character_
    )
  )

This approach ensures that profitability metrics are internally consistent and maximises data retention without introducing model-based assumptions.

Within-firm One-gap Filling (Period-restricted)

Before applying any cross-sectional imputation, a conservative within-firm one-gap fill is applied. A missing observation is filled only if the preceding and following values (within the same period) are identical.

This step avoids smoothing genuine shocks while addressing isolated reporting gaps.

Show code

#Firm “one-gap” fill within the same period only

fill_one_gap <- function(x){
  f <- dplyr::lag(x); b <- dplyr::lead(x)
  ifelse(is.na(x) & !is.na(f) & !is.na(b) & (f == b), f, x)
}

filtered_germany_1 <- filtered_germany_1 |>
  arrange(company_id, acct_year) |>
  group_by(company_id, period) |>  #period boundary blocks carryover
  mutate(
    ebit_step1   = fill_one_gap(ebit_margin_pct),
    ebitda_step1 = fill_one_gap(ebitda_margin_pct),
    roa_step1    = fill_one_gap(roa_use),
    roe_step1    = fill_one_gap(roe_use)
  ) |>
  ungroup() |>
  mutate(
    ebit_step1   = coalesce(ebit_margin_pct,   ebit_step1),
    ebitda_step1 = coalesce(ebitda_margin_pct, ebitda_step1),
    roa_step1    = coalesce(roa_use,           roa_step1),
    roe_step1    = coalesce(roe_use,           roe_step1)
  )

No values are carried across the pre-pandemic and pandemic boundary, preserving structural breaks.

Hierarchical Median Imputation and Final Variable Construction

Remaining missing values are addressed using a strict, descending hierarchy:

industry × year
industry × period
period

Each imputed value is assigned a provenance label, ensuring complete auditability.

Show code

#median fills in strict order (no cross-period borrowing)
#industry × year medians
med_ixy <- filtered_germany_1 |>
  group_by(industry_group, acct_year) |>
  summarise(
    med_ebit   = median(ebit_step1,   na.rm=TRUE),
    med_ebitda = median(ebitda_step1, na.rm=TRUE),
    med_roa    = median(roa_step1,    na.rm=TRUE),
    med_roe    = median(roe_step1,    na.rm=TRUE),
    .groups="drop"
  )

#industry × period medians
med_ip <- filtered_germany_1 |>
  group_by(industry_group, period) |>
  summarise(
    med_ebit_ip   = median(ebit_step1,   na.rm=TRUE),
    med_ebitda_ip = median(ebitda_step1, na.rm=TRUE),
    med_roa_ip    = median(roa_step1,    na.rm=TRUE),
    med_roe_ip    = median(roe_step1,    na.rm=TRUE),
    .groups="drop"
  )

#period medians
med_p <- filtered_germany_1 |>
  group_by(period) |>
  summarise(
    med_ebit_p   = median(ebit_step1,   na.rm=TRUE),
    med_ebitda_p = median(ebitda_step1, na.rm=TRUE),
    med_roa_p    = median(roa_step1,    na.rm=TRUE),
    med_roe_p    = median(roe_step1,    na.rm=TRUE),
    .groups="drop"
  )

#join and impute with precedence, also explicit source flags
filtered_germany_1 <- filtered_germany_1 |>
  left_join(med_ixy, by = c("industry_group","acct_year")) |>
  left_join(med_ip,  by = c("industry_group","period")) |>
  left_join(med_p,   by = "period") |>
  mutate(
    #final values, using your precedence
    ebit_margin_q1   = if_else(is.na(ebitda_step1), ebit_step1, ebit_step1),
    ebit_margin_q1   = if_else(is.na(ebit_step1),   coalesce(med_ebit,   med_ebit_ip,   med_ebit_p),   ebit_step1),
    ebitda_margin_q1 = if_else(is.na(ebitda_step1), coalesce(med_ebitda, med_ebitda_ip, med_ebitda_p), ebitda_step1),
    roa_q1           = if_else(is.na(roa_step1),    coalesce(med_roa,    med_roa_ip,    med_roa_p),    roa_step1),
    roe_q1           = if_else(is.na(roe_step1),    coalesce(med_roe,    med_roe_ip,    med_roe_p),    roe_step1),

    #provenance for EBIT/EBITDA margins
    ebit_src = dplyr::case_when(
      !is.na(ebit_margin_pct) ~ "reported",
      is.na(ebit_margin_pct) & !is.na(ebit_step1) ~ "firm-1gap(period)",
      is.na(ebit_step1) & !is.na(med_ebit) ~ "ind×year",
      is.na(ebit_step1) & is.na(med_ebit) & !is.na(med_ebit_ip) ~ "ind×period",
      TRUE ~ "period"
    ),
    ebitda_src = dplyr::case_when(
      !is.na(ebitda_margin_pct) ~ "reported",
      is.na(ebitda_margin_pct) & !is.na(ebitda_step1) ~ "firm-1gap(period)",
      is.na(ebitda_step1) & !is.na(med_ebitda) ~ "ind×year",
      is.na(ebitda_step1) & is.na(med_ebitda) & !is.na(med_ebitda_ip) ~ "ind×period",
      TRUE ~ "period"
    ),

    #ROA/ROE provenance
    roa_src = dplyr::case_when(
      !is.na(roa_src) ~ roa_src,
      is.na(roa_src) & !is.na(roa_step1) ~ "firm-1gap(period)",
      is.na(roa_step1) & !is.na(med_roa) ~ "ind×year",
      is.na(roa_step1) & is.na(med_roa) & !is.na(med_roa_ip) ~ "ind×period",
      TRUE ~ "period"
    ),
    roe_src = dplyr::case_when(
      !is.na(roe_src) ~ roe_src,
      is.na(roe_src) & !is.na(roe_step1) ~ "firm-1gap(period)",
      is.na(roe_step1) & !is.na(med_roe) ~ "ind×year",
      is.na(roe_step1) & is.na(med_roe) & !is.na(med_roe_ip) ~ "ind×period",
      TRUE ~ "period"
    )
  ) |>
  select(-ends_with("_step1"))

This strategy ensures that imputation is minimal, non-parametric, and economically interpretable.

Imputation Provenance Summary

To assess the scale of imputation, a summary table is produced by period.

Show code

impute_summary_q1 <- filtered_germany_1 |>
  summarise(
    .by = period,
    n_rows = n(),
    ebit_reported   = sum(ebit_src   == "reported",            na.rm=TRUE),
    ebit_firmgap    = sum(ebit_src   == "firm-1gap(period)",   na.rm=TRUE),
    ebit_ind_year   = sum(ebit_src   == "ind×year",            na.rm=TRUE),
    ebit_ind_period = sum(ebit_src   == "ind×period",          na.rm=TRUE),
    ebit_period     = sum(ebit_src   == "period",              na.rm=TRUE),

    ebitda_reported   = sum(ebitda_src   == "reported",          na.rm=TRUE),
    ebitda_firmgap    = sum(ebitda_src   == "firm-1gap(period)", na.rm=TRUE),
    ebitda_ind_year   = sum(ebitda_src   == "ind×year",          na.rm=TRUE),
    ebitda_ind_period = sum(ebitda_src   == "ind×period",        na.rm=TRUE),
    ebitda_period     = sum(ebitda_src   == "period",            na.rm=TRUE),

    roa_reported    = sum(roa_src == "reported",            na.rm=TRUE),
    roa_recomputed  = sum(roa_src == "recomputed",          na.rm=TRUE),
    roa_firmgap     = sum(roa_src == "firm-1gap(period)",   na.rm=TRUE),
    roa_ind_year    = sum(roa_src == "ind×year",            na.rm=TRUE),
    roa_ind_period  = sum(roa_src == "ind×period",          na.rm=TRUE),
    roa_period      = sum(roa_src == "period",              na.rm=TRUE),

    roe_reported    = sum(roe_src == "reported",            na.rm=TRUE),
    roe_recomputed  = sum(roe_src == "recomputed",          na.rm=TRUE),
    roe_firmgap     = sum(roe_src == "firm-1gap(period)",   na.rm=TRUE),
    roe_ind_year    = sum(roe_src == "ind×year",            na.rm=TRUE),
    roe_ind_period  = sum(roe_src == "ind×period",          na.rm=TRUE),
    roe_period      = sum(roe_src == "period",              na.rm=TRUE)
  )

impute_summary_q1 |>
  kable(align = "l", booktabs = TRUE) |>
  kable_styling(full_width = FALSE, font_size = 11)

period	n_rows	ebit_reported	ebit_firmgap	ebit_ind_year	ebit_ind_period	ebit_period	ebitda_reported	ebitda_firmgap	ebitda_ind_year	ebitda_ind_period	ebitda_period	roa_reported	roa_recomputed	roa_firmgap	roa_ind_year	roa_ind_period	roa_period	roe_reported	roe_recomputed	roe_firmgap	roe_ind_year	roe_ind_period	roe_period
Pre-pandemic (2018–2019)	4071	3326	0	745	0	0	3399	0	672	0	0	4008	52	0	11	0	0	3902	16	0	153	0	0
Pandemic (2020–2021)	3975	3228	0	747	0	0	3337	0	638	0	0	3930	43	0	2	0	0	3815	18	0	142	0	0

According to the table, over 80 percent of EBIT and EBITDA margins are directly reported. Furthermore, ROA coverage exceeds 98 percent, with recomputation affecting less than 1 percent. Additionally, ROE imputations remain below 4 percent.Moreover, no cross-period borrowing occurs. Overall, the limited reliance on imputation ensures that observed profitability patterns are data-driven rather than model-driven.

4.2 Question 2

How did leverage and liquidity evolve during the pandemic?

This section examines how German firms adjusted their capital structure and short term financial buffers during the pandemic period. The IDA focuses on active firms observed between 2018 and 2021 and constructs balance sheet ratios that are directly interpretable under conditions of economic stress.

Debt to assets measures the share of total assets financed through liabilities and captures overall leverage exposure. Equity ratio measures the share of assets financed through shareholders equity and provides a complementary view of capital structure.

These derived indicators are analysed alongside OSIRIS reported gearing, solvency, and the current ratio to describe changes in leverage and liquidity over time and across industries.

Derived Indicators and Sample Definition

The analysis is restricted to firms with Active status to ensure that year to year comparisons reflect operating entities rather than exits or administrative records. Debt to assets and equity ratio are constructed directly from accounting totals. Observations with missing debt to assets or current ratio are removed so that leverage and liquidity summaries are computed on a consistent sample across years.

Show code

# Create leverage and liquidity indicators 
lev_liq_data <- filtered_germany |>
  filter(acct_year %in% 2018:2021, status == "Active") |>
  mutate(
    debt_to_assets = total_liabilities_eur / total_assets_eur,
    equity_ratio   = total_equity_eur / total_assets_eur
  ) |>
  select(acct_year, industry_group, company_id, company_name,
         debt_to_assets, equity_ratio, gearing_pct, solvency_pct, current_ratio) |>
  drop_na(debt_to_assets, current_ratio)

This construction preserves the accounting identity that total assets equal liabilities plus equity and avoids conflating leverage and liquidity dynamics through inconsistent denominators.

Year Level Descriptive Summary

Median and interquartile range are reported because leverage and liquidity indicators are highly skewed, particularly when some firms hold unusually high liquidity buffers or carry extreme levels of gearing.

Show code

# Summary statistics by year
lev_liq_summary <- lev_liq_data |>
  group_by(acct_year) |>
  summarise(
    median_debt_assets = median(debt_to_assets, na.rm = TRUE),
    iqr_debt_assets    = IQR(debt_to_assets, na.rm = TRUE),
    median_liquidity   = median(current_ratio, na.rm = TRUE),
    iqr_liquidity      = IQR(current_ratio, na.rm = TRUE),
    median_gearing     = median(gearing_pct, na.rm = TRUE),
    median_solvency    = median(solvency_pct, na.rm = TRUE),
    .groups = "drop"
  )

lev_liq_summary |>
  knitr::kable(digits = 2, caption = "Leverage and Liquidity Summary (2018–2021)") |>
  kableExtra::kable_styling(full_width = FALSE, font_size = 11)

Leverage and Liquidity Summary (2018–2021)
acct_year	median_debt_assets	iqr_debt_assets	median_liquidity	iqr_liquidity	median_gearing	median_solvency
2018	0.50	0.37	1.9	2.6	62	50
2019	0.51	0.38	1.8	2.5	65	49
2020	0.51	0.38	1.8	2.5	67	49
2021	0.51	0.37	1.8	2.4	66	49

From 2018 to 2021, the central tendency of both debt to assets and the current ratio remains broadly stable. This indicates that the typical firm did not experience a structural break in leverage or short term liquidity during the pandemic window.

A gradual increase in median gearing combined with a small decline in solvency is consistent with slightly heavier reliance on debt financing. At the same time, the current ratio remaining close to its pre pandemic level suggests that firms largely preserved their ability to meet short term obligations.

Industry Breakdown Check

This additional profiling step assesses whether stability at the aggregate level conceals sector specific divergence. Medians are retained to ensure robustness to heavy tailed distributions within industries.

Show code

lev_liq_by_industry <- lev_liq_data |>
  group_by(industry_group, acct_year) |>
  summarise(
    median_debt_assets = median(debt_to_assets, na.rm = TRUE),
    median_liquidity   = median(current_ratio, na.rm = TRUE),
    median_gearing     = median(gearing_pct, na.rm = TRUE),
    median_solvency    = median(solvency_pct, na.rm = TRUE),
    n_firms = n_distinct(company_id),
    .groups = "drop"
  )

lev_liq_by_industry |>
  knitr::kable(digits = 2, caption = "Industry Medians for Leverage and Liquidity (2018–2021)") |>
  kableExtra::kable_styling(full_width = FALSE, font_size = 11)

Industry Medians for Leverage and Liquidity (2018–2021)
industry_group	acct_year	median_debt_assets	median_liquidity	median_gearing	median_solvency	n_firms
Manufacturing	2018	0.51	2.09	59.12	49	381
Manufacturing	2019	0.51	1.96	64.44	49	401
Manufacturing	2020	0.52	1.92	66.26	48	411
Manufacturing	2021	0.51	1.93	62.60	49	401
Construction	2018	0.56	1.07	87.29	44	2
Construction	2019	0.62	1.04	98.11	38	2
Construction	2020	0.64	0.98	124.44	36	2
Construction	2021	0.64	0.95	144.11	36	2
Wholesale & Retail Trade	2018	0.62	2.12	116.80	38	10
Wholesale & Retail Trade	2019	0.63	1.60	81.21	37	11
Wholesale & Retail Trade	2020	0.63	1.49	81.21	37	11
Wholesale & Retail Trade	2021	0.60	1.37	87.46	40	10
Transport & Storage	2018	0.48	1.82	49.98	52	69
Transport & Storage	2019	0.46	1.71	49.89	54	70
Transport & Storage	2020	0.50	1.60	63.10	50	71
Transport & Storage	2021	0.52	1.62	75.70	48	71
Accommodation & Food	2018	0.54	2.09	92.08	54	6
Accommodation & Food	2019	0.38	1.79	54.16	62	6
Accommodation & Food	2020	0.37	1.79	46.62	63	3
Accommodation & Food	2021	0.59	1.49	34.90	41	2
Information & Communication	2018	0.65	0.96	130.25	35	29
Information & Communication	2019	0.66	1.00	143.43	34	30
Information & Communication	2020	0.71	1.17	141.51	29	29
Information & Communication	2021	0.70	1.31	100.84	30	29
Financial & Insurance	2018	0.54	1.33	73.53	46	39
Financial & Insurance	2019	0.53	1.43	74.68	47	40
Financial & Insurance	2020	0.50	1.52	76.61	50	43
Financial & Insurance	2021	0.49	1.45	68.93	51	39
Professional, Scientific & Technical	2018	0.51	1.64	73.72	49	191
Professional, Scientific & Technical	2019	0.52	1.66	74.18	48	205
Professional, Scientific & Technical	2020	0.52	1.85	74.58	48	209
Professional, Scientific & Technical	2021	0.50	1.97	70.90	50	201
Administrative & Support	2019	0.66	0.98	45.92	34	1
Administrative & Support	2020	0.59	1.19	33.94	41	1
Administrative & Support	2021	0.43	1.45	14.30	57	2
Education	2018	0.54	1.89	68.28	46	15
Education	2019	0.41	1.94	46.43	59	15
Education	2020	0.50	2.07	36.58	50	15
Education	2021	0.60	1.39	97.82	40	15
Arts, Entertainment & Recreation	2018	0.44	1.16	27.84	57	30
Arts, Entertainment & Recreation	2019	0.48	1.15	31.90	57	31
Arts, Entertainment & Recreation	2020	0.52	0.88	35.50	53	30
Arts, Entertainment & Recreation	2021	0.60	0.74	57.29	43	30
Basic Materials	2018	0.81	0.49	284.90	19	2
Basic Materials	2019	0.62	0.72	29.22	38	2
Basic Materials	2020	0.27	0.88	15.38	72	2
Basic Materials	2021	0.31	0.82	17.42	69	2
Industrials	2018	0.26	2.19	47.01	74	7
Industrials	2019	0.28	1.67	52.88	72	10
Industrials	2020	0.28	1.87	67.94	72	11
Industrials	2021	0.35	2.04	70.39	65	11
Consumer Goods	2021	0.33	0.86	55.89	67	2
Health Care	2018	0.13	1.73	NA	87	2
Health Care	2019	0.11	2.23	NA	89	2
Health Care	2020	0.03	25.58	NA	97	2
Health Care	2021	0.03	29.55	NA	97	2
Consumer Services	2018	0.36	0.80	44.11	64	4
Consumer Services	2019	0.35	0.48	41.58	65	5
Consumer Services	2020	0.32	1.77	28.27	68	3
Consumer Services	2021	0.35	1.77	33.99	65	3
Utilities	2018	0.76	1.14	148.00	24	16
Utilities	2019	0.75	1.25	156.89	25	16
Utilities	2020	0.73	1.10	165.22	27	16
Utilities	2021	0.74	1.07	215.30	26	16
Financials	2018	0.30	2.72	33.14	70	84
Financials	2019	0.31	1.90	39.42	68	83
Financials	2020	0.36	1.97	47.61	64	85
Financials	2021	0.36	2.60	46.89	64	82
Technology	2018	0.14	2.93	14.84	86	6
Technology	2019	0.12	3.06	1.05	88	6
Technology	2020	0.14	2.98	1.05	86	5
Technology	2021	0.08	12.64	0.96	92	4
Other / Unmapped	2018	0.52	1.50	112.50	48	15
Other / Unmapped	2019	0.43	1.55	50.25	57	15
Other / Unmapped	2020	0.44	1.45	17.98	56	15
Other / Unmapped	2021	0.49	1.38	43.98	51	15

This table provides a diagnostic view of whether specific industries adjusted their balance sheet structure or liquidity buffers more sharply than the aggregate profile suggests. It allows sector level deviations to be identified without being dominated by extreme firm level outcomes.

4.3 Question 3

Which German industries showed financial resilience or vulnerability during the pandemic?

This section evaluates industry level resilience and vulnerability during the pandemic period by examining profitability, leverage, liquidity, and firm size for active German firms. The focus is on the years 2020 to 2022, which capture the core pandemic disruption and early recovery phase. The IDA is structured to ensure that observed industry patterns are not driven by data quality issues, extreme values, or inconsistent firm status.

Filter Relevant Years and Active Firms for Pandemic Analysis

Show code

filtered_germany_q3 <- filtered_germany  |> 
  filter(acct_year >= 2020, acct_year <= 2022, status == "Active")

This filtering step isolates the pandemic period to focus on the years most directly affected by COVID-19 related economic disruptions. Restricting the sample to firms with an Active status ensures that the analysis reflects firms that remained operational, avoiding bias from dissolved or inactive entities. This provides a consistent basis for comparing financial resilience and vulnerability across industries.

Summary Checks on Key Financial Variables

The analysis concentrates on four core indicators that jointly capture operational performance, balance sheet pressure, liquidity, and scale.

EBIT margin as a profitability indicator
Gearing ratio as a leverage indicator
Current ratio as a liquidity indicator
Total assets as a proxy for firm size

Show code

options(scipen = 999)

financial_vars <- c("ebit_margin_pct", "gearing_pct", "current_ratio", "total_assets_eur")

summary_stats <- filtered_germany_q3 |>
  summarise(across(
    all_of(financial_vars),
    list(
      missing = ~sum(is.na(.)),
      mean = ~mean(., na.rm = TRUE),
      median = ~median(., na.rm = TRUE),
      sd = ~sd(., na.rm = TRUE),
      min = ~min(., na.rm = TRUE),
      max = ~max(., na.rm = TRUE)
    ),
    .names = "{.col}_{.fn}"
  ))

summary_stats_tidy <- summary_stats |>
  pivot_longer(
    everything(),
    names_to = c("Variable", "Stat"),
    names_pattern = "(.+)_(.+)",
    values_to = "Value"
  ) |>
  pivot_wider(names_from = Stat, values_from = Value)

# Force decimal precision
summary_stats_tidy <- summary_stats_tidy |>
  mutate(across(where(is.numeric), ~round(., 5)))

kable(summary_stats_tidy,
      caption = "Summary Statistics for Key Financial Variables (2020–2022)",
      col.names = c("Variable", "Missing", "Mean", "Median", "SD", "Min", "Max"),
      align = "lrrrrrr",
      escape = FALSE
) |>
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE, font_size = 15) |>
  column_spec(1, width = "5cm") |>
  scroll_box(width = "100%")

Summary Statistics for Key Financial Variables (2020–2022)
Variable	Missing	Mean	Median	SD	Min	Max
ebit_margin_pct	664	3.7	4.6	28.2	-100	100
gearing_pct	959	106.9	65.6	128.1	0	900
current_ratio	114	4.5	1.8	9.5	0	96
total_assets_eur	1	4498176.5	136637.0	25044581.7	13	528609000

The summary statistics confirm that the dataset is largely complete for the pandemic period. EBIT margin and current ratio show moderate dispersion, while total assets exhibit substantial spread, reflecting strong firm size heterogeneity. The presence of extreme minimum and maximum values highlights the importance of robust statistics and motivates formal outlier screening before industry comparisons.

Outlier Detection Using the IQR Method

Show code

outlier_summary <- function(x) {
  q1 <- quantile(x, 0.25, na.rm=TRUE)
  q3 <- quantile(x, 0.75, na.rm=TRUE)
  iqr <- q3 - q1
  lower <- q1 - 1.5 * iqr
  upper <- q3 + 1.5 * iqr
  sum(x < lower | x > upper, na.rm=TRUE)
}
outlier_counts <- filtered_germany_q3  |> 
  summarise(across(all_of(financial_vars), outlier_summary))
kable(outlier_counts, caption = "Count of Outliers for Key Financial Variables (IQR Method)")  |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Count of Outliers for Key Financial Variables (IQR Method)
ebit_margin_pct	gearing_pct	current_ratio	total_assets_eur
531	201	408	625

Outliers are most prevalent in total assets and EBIT margin, consistent with the presence of very large firms and firms experiencing extreme profitability shocks during the pandemic. Gearing and current ratio show fewer extreme values. These results justify the continued use of medians and IQRs in subsequent analysis and caution against relying on mean based comparisons across industries.

Distribution of Missing Data Across Years and Variables

Show code

missing_long <- filtered_germany_q3  |> 
  select(acct_year, all_of(financial_vars))  |> 
  pivot_longer(cols = -acct_year, names_to = "variable", values_to = "value")  |> 
  mutate(missing = is.na(value))  |> 
  group_by(acct_year, variable)  |> 
  summarise(prop_missing = mean(missing), .groups = "drop")

ggplot(missing_long, aes(x = variable, y = prop_missing, fill = factor(acct_year))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Proportion of Missing Values by Variable and Year",
       x = "Variable", y = "Proportion Missing", fill = "Year") +
  theme_minimal()

Missingness remains stable across the pandemic years, with no variable exhibiting a sharp increase in missing data during 2020 or 2021. This indicates that data availability was not materially disrupted by the pandemic and that observed industry patterns are unlikely to be artefacts of differential reporting behaviour.

Industry Code Mapping for Transparency

Show code

nace_codes <- filtered_germany_q3 |> 
  distinct(nace_num2, industry_group)  |> 
  arrange(nace_num2)

kable(nace_codes, caption = "NACE Level 2 Code to Industry Mapping") |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

NACE Level 2 Code to Industry Mapping
nace_num2	industry_group
11	Manufacturing
13	Manufacturing
14	Manufacturing
15	Manufacturing
17	Manufacturing
18	Manufacturing
19	Manufacturing
20	Manufacturing
21	Manufacturing
22	Manufacturing
23	Manufacturing
24	Manufacturing
25	Manufacturing
26	Manufacturing
27	Manufacturing
28	Manufacturing
29	Manufacturing
30	Manufacturing
31	Manufacturing
32	Manufacturing
33	Manufacturing
34	Manufacturing
35	Manufacturing
36	Manufacturing
40	Utilities
40	Other / Unmapped
40	Industrials
40	Consumer Goods
41	Construction
45	Wholesale & Retail Trade
50	Transport & Storage
51	Transport & Storage
52	Transport & Storage
55	Accommodation & Food
60	Information & Communication
61	Information & Communication
62	Information & Communication
63	Information & Communication
64	Financial & Insurance
65	Financial & Insurance
66	Financial & Insurance
67	Financials
67	Other / Unmapped
67	Health Care
67	Consumer Goods
67	Technology
67	Consumer Services
67	Industrials
70	Professional, Scientific & Technical
71	Professional, Scientific & Technical
72	Professional, Scientific & Technical
73	Professional, Scientific & Technical
74	Professional, Scientific & Technical
75	Professional, Scientific & Technical
80	Administrative & Support
85	Education
90	Arts, Entertainment & Recreation
92	Arts, Entertainment & Recreation
NA	Technology
NA	Financials
NA	Basic Materials
NA	Other / Unmapped
NA	Industrials
NA	Consumer Goods

This mapping table documents the classification logic used to group firms into broad industries. Providing this reference improves transparency and allows readers to clearly interpret sector level comparisons used later in the analysis.

4.4 Question 4

How did the relationship between liquidity and profitability evolve during the pandemic?

This section examines the structural relationship between short term liquidity and firm level profitability during the pandemic period. The objective is to evaluate whether the traditional financial expectation that stronger liquidity supports stable profitability remained intact or weakened under crisis conditions.

The analysis focuses on:

Current ratio as the liquidity indicator.
Return on assets as the profitability indicator.
Active firms between 2018 and 2021.

The IDA stage ensures that the relationship is evaluated on clean, consistent, and economically interpretable inputs before correlation or regression analysis.

Construct Liquidity and Profitability Dataset

Show code

q4_data <- filtered_germany |>
  filter(acct_year %in% 2018:2021,
         status == "Active") |>
  mutate(
    roa_calc = 100 * (net_income_eur / total_assets_eur),
    roa_use  = coalesce(roa_pct, roa_calc)
  ) |>
  select(acct_year,
         industry_group,
         company_id,
         current_ratio,
         roa_use) |>
  drop_na(current_ratio, roa_use)

ROA is recalculated when necessary to maximise coverage while preserving accounting consistency. Observations with missing liquidity or profitability values are excluded to ensure the relationship is not distorted by incomplete pairs.

This produces a balanced firm year panel suitable for examining the liquidity profitability linkage.

Summary Statistics by Year

Show code

q4_summary <- q4_data |>
  group_by(acct_year) |>
  summarise(
    median_liquidity = median(current_ratio, na.rm = TRUE),
    median_roa       = median(roa_use, na.rm = TRUE),
    iqr_liquidity    = IQR(current_ratio, na.rm = TRUE),
    iqr_roa          = IQR(roa_use, na.rm = TRUE),
    n_firms          = n_distinct(company_id),
    .groups = "drop"
  )

q4_summary |>
  knitr::kable(
    digits = 3,
    caption = "Liquidity and Profitability Summary (2018–2021)"
  ) |>
  kableExtra::kable_styling(full_width = FALSE)

Liquidity and Profitability Summary (2018–2021)
acct_year	median_liquidity	median_roa	iqr_liquidity	iqr_roa	n_firms
2018	1.9	4.1	2.6	8.9	907
2019	1.8	3.1	2.5	9.1	951
2020	1.8	2.3	2.5	10.0	964
2021	1.8	2.8	2.4	10.2	939

Median liquidity remains relatively stable across years, while profitability dispersion increases during 2020 and 2021. The widening IQR for ROA indicates that firms experienced heterogeneous performance outcomes during the pandemic. This variation provides the necessary dispersion to meaningfully evaluate whether the liquidity profitability relationship changed structurally.

Outlier Screening

Show code

outlier_check_q4 <- function(x) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  lower <- q1 - 1.5 * iqr
  upper <- q3 + 1.5 * iqr
  sum(x < lower | x > upper, na.rm = TRUE)
}

outlier_counts_q4 <- q4_data |>
  summarise(
    liquidity_outliers = outlier_check_q4(current_ratio),
    roa_outliers       = outlier_check_q4(roa_use)
  )

outlier_counts_q4 |>
  knitr::kable(
    caption = "Outlier Counts for Liquidity and Profitability (IQR Method)"
  ) |>
  kableExtra::kable_styling(full_width = FALSE)

Outlier Counts for Liquidity and Profitability (IQR Method)
liquidity_outliers	roa_outliers
816	893

Liquidity exhibits right tail outliers reflecting firms holding unusually large current asset buffers. ROA shows both extreme positive and negative values, particularly during 2020. These observations are retained because they represent genuine crisis outcomes rather than measurement errors. Median based analysis ensures robustness.

Visual Inspection of the Liquidity Profitability Relationship

Show code

ggplot(q4_data,
       aes(x = current_ratio,
           y = roa_use,
           colour = factor(acct_year))) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    title = "Liquidity and Profitability Relationship (2018–2021)",
    x = "Current Ratio",
    y = "Return on Assets (%)",
    colour = "Year"
  ) +
  theme_minimal()

The scatter distribution confirms that the relationship between liquidity and profitability is non linear and heterogeneous across years. In pre pandemic years, the association appears moderately positive. During 2020, the relationship weakens, with a larger cluster of firms showing adequate liquidity but depressed profitability. By 2021, partial recovery is visible for several industries.

This pattern suggests that the pandemic temporarily disrupted the traditional liquidity profitability linkage, particularly in sectors facing operational shutdowns or demand shocks.

Correlation Diagnostics by Year

Show code

correlation_by_year <- q4_data |>
  group_by(acct_year) |>
  summarise(
    correlation = cor(current_ratio, roa_use, use = "complete.obs"),
    .groups = "drop"
  )

correlation_by_year |>
  knitr::kable(
    digits = 3,
    caption = "Correlation Between Liquidity and Profitability by Year"
  ) |>
  kableExtra::kable_styling(full_width = FALSE)

Correlation Between Liquidity and Profitability by Year
acct_year	correlation
2018	0.025
2019	0.016
2020	0.036
2021	0.040

Correlation results confirm that the liquidity profitability association weakens during 2020 relative to pre pandemic years. This supports the hypothesis that crisis conditions altered the fundamental financial relationship within industries. The rebound in 2021 suggests partial restoration of financial normality, though dispersion remains elevated.

Show code

# Construct tsibble for industry-level time series analysis
germany_ts <- filtered_germany |>
  filter(
    acct_year %in% 2017:2021,
    status == "Active"
  ) |>
  mutate(
    fy_year = as.integer(fy_year)
  ) |>
  as_tsibble(
    key = c(industry_group, company_id),
    index = fy_year,
    validate = FALSE
  )

5 EDA

This section summarises the exploratory analysis used to answer the primary question: “How did the pandemic affect corporate financials in Germany ?” The focus is on identifying distribution shifts, sector level differences, and changes in dispersion and downside risk using robust summary statistics and visual diagnostics.

5.1 Question 1

1. How did profitability change across industries?

This first step compares profitability before and during the pandemic using two complementary views. The first view evaluates how the overall distribution of profitability changed across firms. The second view focuses on sector level shifts using industry medians to reduce sensitivity to extreme observations.

This code recodes years into two periods, Pre (2018 to 2019) and Pandemic (2020 to 2021), reshapes five profitability measures (EBIT margin, EBITDA margin, ROA, ROE, net income) into long format, and produces violin plots with jittered firm points plus median and interquartile range overlays. Net income is shown using an asinh transform so that negative values remain visible while still compressing extreme magnitudes.

Show code

#recode period to short labels
filtered_germany_1 <- filtered_germany_1 |>
  mutate(period = if_else(acct_year >= 2020, "Pandemic (2020–21)", "Pre (2018–19)"),
         period = factor(period, levels = c("Pre (2018–19)", "Pandemic (2020–21)")))

#build long data
q1_vars <- c("ebit_margin_q1","ebitda_margin_q1","roa_q1","roe_q1","net_income_eur_asinh")

plot_long <- filtered_germany_1 |>
  dplyr::select(company_id, period, dplyr::all_of(q1_vars)) |>
  tidyr::pivot_longer(-c(company_id, period),
                      names_to = "metric", values_to = "value") |>
  dplyr::mutate(
    metric = factor(metric, levels = q1_vars,
                    labels = c("EBIT margin (%)","EBITDA margin (%)",
                               "ROA (%)","ROE (%)","Net income (asinh)"))
  ) |>
  tidyr::drop_na(value)

iqr_bar <- function(y) data.frame(
  y    = median(y, na.rm=TRUE),
  ymin = quantile(y, 0.25, na.rm=TRUE),
  ymax = quantile(y, 0.75, na.rm=TRUE)
)

p_q1_all <- ggplot(plot_long, aes(x = period, y = value)) +
  geom_violin(trim = FALSE, fill = NA, color = "grey60", linewidth = 0.45, show.legend = FALSE) +
  geom_jitter(aes(color = "Data"), width = 0.08, height = 0, alpha = 0.10, size = 0.55, show.legend = TRUE) +
  stat_summary(aes(color = "IQR"), fun.data = iqr_bar, geom = "errorbar",
               width = 0.12, linewidth = 0.6, show.legend = TRUE) +
  stat_summary(aes(color = "Median"), fun = median, geom = "point",
               shape = 95, size = 8, show.legend = TRUE) +
  facet_wrap(~ metric, scales = "free_y", ncol = 2) +
  scale_x_discrete(labels = c(
    `Pre (2018–19)`      = "Pre\n(2018 – 2019)",
    `Pandemic (2020–21)` = "Pandemic\n(2020 – 2021)"
  )) +
  scale_color_manual(
    name = NULL,
    values = c("Data" = "grey35", "IQR" = "firebrick", "Median" = "black")
  ) +
  guides(color = guide_legend(override.aes = list(
    shape    = c(16, NA, 95),   # point for Data, line for IQR, thick tick for Median
    linetype = c(0, 1, 0),
    size     = c(2, 1, 6),
    alpha    = 1
  ))) +
  coord_cartesian(clip = "off") +
  labs(x = NULL, y = NULL,
       title = "Pre vs Pandemic: Profitability Distributions by Metric\n(Median + IQR)") +
  theme_minimal(base_size = 11) +
  theme(
    strip.text  = element_text(face = "bold"),
    axis.text.x = element_text(size = 10, margin = margin(t = 8), lineheight = 0.95),
    plot.margin = margin(10, 15, 24, 10),
    legend.position = "top"
  )

p_q1_all

Across EBIT margin and EBITDA margin, the medians remain broadly similar between the two periods, suggesting that the typical firm did not experience a large shift in central operating profitability. However, dispersion increases during 2020 to 2021. The violins widen and the interquartile ranges expand, indicating a broader spread of firm outcomes and more uneven operating performance.

ROA and especially ROE display more extreme negative values during 2020 to 2021, pointing to a larger share of firms experiencing losses, asset impairments, or equity erosion. Even where the median stays close to zero, the heavier downside tail indicates elevated downside risk.

Net income (asinh) becomes more dispersed in 2020 to 2021, with a thicker lower tail. This supports the conclusion that volatility increased during the pandemic period even when median outcomes appeared stable.

Overall, the key pattern is stability in the middle of the distribution alongside clear deterioration in the tails, which is consistent with increased heterogeneity in firm level performance during the pandemic.

This code then focuses on sector level shifts. It keeps only mapped industries, calculates industry medians for EBITDA margin in the Pre and Pandemic periods, and visualises the change using a dumbbell chart. Using medians provides a robust sector comparison that is less influenced by outliers.

Show code

#keep only mapped industry
germ_mapped <- filtered_germany_1 |>
  filter(!is.na(ebitda_margin_q1),
         !is.na(industry_group),
         industry_group != "Other / Unmapped") |>
  droplevels()

#build medians & delta (Pandemic – Pre)
eb_med <- germ_mapped |>
  mutate(period = if_else(acct_year >= 2020, "Pandemic", "Pre"),
         period = factor(period, levels = c("Pre","Pandemic"))) |>
  group_by(industry_group, period) |>
  summarise(med = median(ebitda_margin_q1, na.rm = TRUE), .groups = "drop") |>
  pivot_wider(names_from = period, values_from = med) |>
  drop_na(Pre, Pandemic) |> #keep industries with both periods
  mutate(
    delta     = Pandemic - Pre,
    direction = factor(if_else(delta >= 0, "Increase", "Decrease"),
                       levels = c("Increase","Decrease"))
  ) |>
  arrange(delta)

#create and colored dumbbell
ggplot(eb_med, aes(y = reorder(industry_group, delta))) +
  geom_segment(aes(x = Pre, xend = Pandemic, yend = industry_group, color = direction),
               linewidth = 1) +
  geom_point(aes(x = Pre),      color = "#4E79A7", size = 2) +  #pre (blue)
  geom_point(aes(x = Pandemic), color = "#E15759", size = 2) +  #pandemic (red)
  geom_vline(xintercept = 0, linetype = "dashed", colour = "grey70") +
  scale_color_manual(values = c("Increase" = "#2CA02C", "Decrease" = "#D62728")) +
  guides(color = guide_legend(title = NULL, override.aes = list(linewidth = 3))) +
  labs(title = "Change in Median EBITDA Margin\n(Pandemic – Pre)",
       x = "Median EBITDA Margin (%)", y = NULL) +
  theme_minimal(base_size = 11)

The dumbbell chart compares the median EBITDA margin for each industry in the two periods and highlights both direction and magnitude of change. Most industries show relatively small shifts, reinforcing the earlier result that typical profitability did not uniformly collapse.

A small number of sectors stand out with larger movements. Accommodation and Food shows the most visible improvement, moving from slightly negative to positive median EBITDA margins, while Technology also increases. The largest deterioration appears in Health Care, where the median falls sharply relative to the pre period.

Several cyclical sectors such as Industrials, Construction, Arts and Entertainment, and Financials show small to moderate declines. Utilities and Education remain close to flat.

Taken together, the sector view aligns with the distribution view. Median profitability often moved modestly, but the pandemic period amplified differences across sectors and increased the risk of poor outcomes for specific parts of the market.

5.2 Question 2

2. How did leverage (debt-to-assets) and liquidity (current ratio) evolve during pandemic?

This question examines whether German firms altered their capital structure or short-term liquidity positions in response to the COVID-19 shock.The analysis proceeds in three parts:

Time trends of leverage and liquidity

Show code

lev_liq_long <- lev_liq_summary |>
  select(acct_year, median_debt_assets, median_liquidity) |>
  pivot_longer(-acct_year, names_to = "indicator", values_to = "value")

# Debt-to-Assets panel — y-axis ticks at 0.50 and 0.51 only
p_debt <- lev_liq_long |>
  filter(indicator == "median_debt_assets") |>
  ggplot(aes(acct_year, value)) +
  geom_line(colour = "#E69F00", linewidth = 1.2) +
  geom_point(colour = "#E69F00", size = 3) +
  scale_y_continuous(
    limits = c(0.495, 0.515),
    breaks = c(0.50, 0.51),
    labels = scales::number_format(accuracy = 0.01)
  ) +
  labs(x = NULL, y = "Median Value", title = "Debt-to-Assets") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"),
        panel.grid.minor = element_blank())

# Current Ratio panel — its own sensible scale
p_curr <- lev_liq_long |>
  filter(indicator == "median_liquidity") |>
  ggplot(aes(acct_year, value)) +
  geom_line(colour = "#0072B2", linewidth = 1.2) +
  geom_point(colour = "#0072B2", size = 3) +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.01)) +
  labs(x = "Accounting Year", y = "Median Value", title = "Current Ratio") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"),
        panel.grid.minor = element_blank())

# place side-by-side with a common title
p_debt + p_curr + plot_annotation(title = "Trends in Leverage and Liquidity, 2018–2021") &
  theme(plot.title = element_text(face = "bold", size = 15))

The plots show that liquidity (current ratio) remained broadly stable from 2018 to 2021, fluctuating only slightly around 1.8 . Meanwhile, leverage (debt-to-assets) rose modestly from 0.50 to 0.51, peaking in 2020 before easing slightly in 2021. This pattern suggests that while firms marginally increased their reliance on debt financing during the pandemic, they continued to maintain sufficient short-term liquidity. Overall, German firms demonstrated a balanced financial position, reflecting resilience and prudence in managing both debt and liquidity pressures through the COVID-19 period.

Leverage vs Solvency (are highly levered firms less solvent?)

Show code

lev_liq_plot <- lev_liq_data |>
  filter(is.finite(debt_to_assets),
         debt_to_assets >= 0, debt_to_assets <= 1.5,   # 0–150%
         solvency_pct >= -50, solvency_pct <= 100)

ggplot(lev_liq_plot, aes(x = debt_to_assets, y = solvency_pct)) +
  geom_point(alpha = 0.35, colour = "#0072B2") +
  geom_smooth(method = "lm", se = FALSE, colour = "#E69", linewidth = 0.9) +
  geom_hline(yintercept = 50, linetype = "dotted", colour = "grey") +
  facet_wrap(~ acct_year, ncol = 2) +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0, 1.5)) +
  scale_y_continuous(limits = c(-50, 100)) +
  labs(
    title = "Leverage–Solvency Relationship by Year",
    subtitle = "Filtered to 0–150% debt-to-assets to avoid scale distortion from extreme values",
    x = "Debt-to-Assets",
    y = "Solvency Ratio (%)"
  ) +
  theme_minimal(base_size = 13) +
  theme(strip.text = element_text(face = "bold"),
        plot.title = element_text(face = "bold"))

The scatterplots show a strong and consistent negative relationship between leverage (debt-to-assets) and solvency (ability to meet long-term obligations) across all years from 2018 to 2021. Each panel demonstrates a clear downward slope: as firms take on more debt, solvency ratios decline. This pattern holds consistently over time, implying that highly leveraged firms are structurally less solvent, and that the pandemic did not materially alter this trade-off. The persistence of this relationship suggests that balance-sheet fragility scales proportionally with debt exposure, independent of year-specific shocks.

Debt-to-assets by industry, faceted by year

Show code

lev_liq_industry <- lev_liq_data |>
  filter(debt_to_assets >= 0, debt_to_assets <= 1.5) |>
  drop_na(industry_group)

ggplot(lev_liq_industry, aes(x = debt_to_assets, y = reorder(industry_group, debt_to_assets), fill = factor(acct_year))) +
  geom_boxplot(outlier.shape = NA, alpha = 0.7, width = 0.6) +
  facet_wrap(~ acct_year, ncol = 2, scales = "free_x") +
  scale_x_continuous(labels = scales::percent_format(accuracy = 1), limits = c(0, 1.5)) +
  scale_fill_brewer(palette = "Set2", name = "Accounting Year") +
  labs(
    title = "Distribution of Leverage (Debt-to-Assets) by Industry, 2018–2021",
    subtitle = "Faceted view highlights stability in industry leverage patterns through the pandemic period",
    x = "Debt-to-Assets Ratio",
    y = NULL
  ) +
  theme_minimal(base_size = 13) +
  theme(
    strip.text = element_text(face = "bold", size = 13),
    plot.title = element_text(face = "bold", size = 17, hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5),
    legend.position = "top",
    axis.text.y = element_text(size = 10),
    panel.grid.minor = element_blank(),
    panel.spacing.y = unit(1, "lines")  # increases vertical space between facets
  )

According to the faceted boxplot, the distribution of leverage (debt-to-assets) across industries remained broadly stable from 2018 to 2021. Capital-intensive sectors such as Manufacturing, Construction, and Transport & Storage consistently exhibit higher leverage, while service-based industries like Information & Communication and Professional & Technical Services maintain comparatively lower ratios. The slight widening of the distributions in 2020–2021 suggests that some firms increased borrowing during the pandemic to support liquidity, while others remained conservative. Overall, the median leverage levels changed little across years, indicating that industry structure rather than pandemic-specific effects primarily determined leverage patterns in German firms.

5.3 Question 3

3. Which German industries showed unexpected financial resilience or vulnerability from 2020 to 2022, and how were these outcomes shaped by profitability, leverage, liquidity, and firm size?

Summarise financial indicators and compute resilience score

Show code

industry_summary <- filtered_germany |> 
  filter(acct_year >= 2020, acct_year <= 2022)  |> 
  group_by(nace_num2, acct_year)  |> 
  summarise(
    median_profitability = median(ebit_margin_pct, na.rm = TRUE),
    median_leverage = median(gearing_pct, na.rm = TRUE),
    median_liquidity = median(current_ratio, na.rm = TRUE),
    median_firm_size = median(total_assets_eur, na.rm = TRUE),
    .groups = "drop"
  )  |> 
  mutate(resilience_score = median_profitability / median_leverage * median_liquidity)

This step constructs industry level summaries for key financial indicators during the pandemic period from 2020 to 2022. Median profitability, leverage, liquidity, and firm size are calculated for each industry and year to provide a robust representation of typical firms while limiting the influence of extreme observations.

A composite resilience score is then computed by combining profitability and liquidity while scaling by leverage. Higher scores reflect industries that maintained stronger operating performance and liquidity buffers relative to their debt exposure. This metric provides a consistent basis for comparing relative financial resilience and vulnerability across industries during the pandemic. The resilience score is a heuristic composite indicator combining operating performance and liquidity, scaled by leverage exposure. It does not imply causality but provides a relative ranking tool for cross-industry comparison.

Visualize Overall Financial Resilience with Bubble Chart

Show code

p <- ggplot(industry_summary, aes(
  x = median_profitability,
  y = median_leverage,
  color = median_liquidity,
  text = paste(
    "NACE:", nace_num2,
    "<br>Profitability:", round(median_profitability, 2),
    "<br>Leverage:", round(median_leverage, 2),
    "<br>Liquidity:", round(median_liquidity, 2)
  )
)) +
  geom_point(size = 4, alpha = 0.85, shape = 21, stroke = 0.7) +
  # Use a richer color gradient here:
  scale_color_gradientn(colors = turbo(20)) +  
  labs(
    title = "German Industry Profitability vs Leverage (2020–2022)",
    subtitle = "Color indicates median liquidity by industry; axes limited for interpretability",
    x = "Median EBIT Margin (%)",
    y = "Median Gearing (%)",
    color = "Liquidity"
  ) +
  xlim(-10, 20) +
  ylim(0, 250) +
  theme_bw(base_size = 15) +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5, margin = margin(b = 6)),
    plot.subtitle = element_text(size = 12, hjust = 0.5, margin = margin(b = 10)),
    axis.title = element_text(face = "bold"),
    axis.text = element_text(color = "black"),
    panel.grid.major = element_line(color = "grey85"),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(color = "black", fill = NA, size = 1),
    legend.position = "right",
    legend.title = element_text(face = "bold", size = 10),
    legend.text = element_text(size = 11),
    plot.margin = margin(16, 16, 16, 16)
  )

# Interactive plot with more gradient colours
ggplotly(p, tooltip = "text")

This bubble chart positions industries according to median profitability and leverage, with colour indicating median liquidity. Industries located in the upper left region combine higher profitability with lower leverage and tend to exhibit stronger financial resilience. In contrast, industries with high leverage and weak profitability appear more financially vulnerable.

Liquidity further differentiates industries with similar leverage profiles, highlighting which sectors maintained stronger short term buffers during the pandemic. The chart provides an integrated visual assessment of financial resilience across industries rather than relying on a single metric.

Top and bottom 10 industries by resilience score

Show code

resilience_summary <- industry_summary  |> 
  group_by(nace_num2)  |> 
  summarise(avg_resilience = mean(resilience_score, na.rm = TRUE))  |> 
  arrange(desc(avg_resilience))

top_bottom_resilience <- resilience_summary  |> 
  slice_head(n = 10) |> 
  bind_rows(resilience_summary |> slice_tail(n = 10))

ggplot(top_bottom_resilience, aes(
  x = reorder(nace_num2, avg_resilience),
  y = avg_resilience,
  fill = avg_resilience
)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  scale_fill_viridis_c(option = "plasma", direction = -1) +
  labs(
    title = "Top and Bottom 10 Financially Resilient German Industries (2020–2022)",
    subtitle = "Industries ranked by average resilience scores",
    x = "Industry (NACE Level 2)",
    y = "Average Resilience Score"
  ) +
  theme_bw(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11)
  )

This bar chart ranks industries by their average resilience score across the pandemic years. Industries at the top of the ranking exhibit stronger combinations of profitability, moderate leverage, and higher liquidity, while those at the bottom display weaker financial profiles.

The visualisation highlights substantial heterogeneity in financial resilience across sectors, indicating that the pandemic impact was not uniform. Some industries were able to adapt and maintain financial stability, while others experienced persistent vulnerability.

Table of top and bottom 5 industries

Show code

top_bottom_resilience |> 
  mutate(rank = rank(-avg_resilience))  |> 
  filter(rank <= 5 | rank > max(rank) - 5)  |> 
  arrange(rank)  |> 
  rename(
    `NACE Code` = nace_num2,
    `Average Resilience Score` = avg_resilience,
    `Rank` = rank
  )  |> 
  kable(
    caption = "Top and Bottom 5 German Industries by Financial Resilience (2020–2022)",
    digits = 2,
    col.names = c("NACE Code", "Average Resilience Score", "Rank")
  )  |> 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 12
  )  |> 
  row_spec(0, bold = TRUE, background = "#f2f2f2")  |> 
  column_spec(2, color = "darkgreen") |> 
  column_spec(3, bold = TRUE, color = "darkblue")

Top and Bottom 5 German Industries by Financial Resilience (2020–2022)
NACE Code	Average Resilience Score	Rank
23	3.90	1
75	1.66	2
66	1.21	3
73	0.61	4
67	0.44	5
NA	-0.30	16
18	-0.43	17
55	-0.53	18
80	-1.02	19
93	NaN	20

This table provides a concise comparison of the most and least resilient industries during the pandemic period. Industries ranked at the top combine stronger profitability and liquidity with manageable leverage, while bottom ranked industries reflect weaker operating performance or higher financial risk.

Presenting exact scores and ranks complements the visual analysis and provides a clear reference for identifying sectors that were relatively robust or fragile under pandemic conditions.

Trends in median EBIT margin by key German industries

Show code

key_industries <- c("Health Care", "Manufacturing", "Accommodation & Food", "Information & Communication")

plot_data <- filtered_germany  |> 
  filter(industry_group %in% key_industries, acct_year %in% 2018:2021)  |> 
  group_by(industry_group, acct_year)  |> 
  summarise(median_ebit = median(ebit_margin_pct, na.rm = TRUE), .groups = "drop")

ggplot(plot_data, aes(x = acct_year, y = median_ebit, color = industry_group)) +
  geom_line(size = 1.2) +
  geom_point(size = 3) +
  geom_text_repel(
    data = plot_data |> 
      filter(acct_year == max(acct_year)),
    aes(label = industry_group),
    nudge_x = 0.3,
    direction = "y",
    hjust = 0,
    segment.size = 0.25,
    show.legend = FALSE,
    size = 4
  ) +
  scale_x_continuous(breaks = 2018:2021, limits = c(2018, 2022)) +
  scale_color_brewer(palette = "Dark2") +
  labs(
    title = "Trends in Median EBIT Margin by Key German Industries (2018–2021)",
    x = "Accounting Year",
    y = "Median EBIT Margin (%)",
    color = "Industry"
  ) +
  theme_bw(base_size = 10) +
  theme(
    plot.title = element_text(face = "bold", size = 12, hjust = 0.5, margin = margin(b = 10)),
    axis.text = element_text(color = "black"),
    axis.title = element_text(face = "bold"),
    legend.position = "none",
    panel.grid.minor = element_blank(),
    plot.margin = margin(t = 20, r = 20, b = 10, l = 10)
  )

This line chart tracks median EBIT margin trends for selected key industries before and during the pandemic. It illustrates how profitability evolved across sectors with differing exposure to pandemic related disruptions.

Industries such as Health Care and Information and Communication show comparatively stable or recovering margins, while others display sharper declines during 2020 and 2021. Using medians ensures that the trends reflect typical firms rather than being driven by extreme outcomes. The chart reinforces the sector specific nature of financial resilience observed in the broader analysis.

5.4 Question 4

4. How did the fundamental relationship between corporate liquidity and profitability evolve and fracture within industries during the pandemic?

Industry level trends in ROA and liquidity (2017 to 2021)

This section provides a high level view of how profitability and short term liquidity moved across industries before and during the pandemic period. Average ROA and the current ratio are calculated for each industry and visualised as interactive time series. The vertical marker at 2020 is used as the reference point for the COVID shock.

Rather than focusing on one industry at a time, the goal is to identify broad patterns industries with stable trajectories, industries with a clear break around 2020, and industries with unusually volatile liquidity movements.

Show code

# Calculate the average ROA and current ratio for each industry each year
industry_ts <- germany_ts |>
  group_by(industry_group) |>
  index_by(fy_year) |>
  summarise(
    industry_roa = mean(roa_pct, na.rm = TRUE),
    industry_cr = mean(current_ratio, na.rm = TRUE),
    n_companies = n()  
  )

Show code

# ROA
roa_interactive <- industry_ts |>
  ggplot(aes(x = fy_year, y = industry_roa, group = industry_group)) +
  geom_line(aes(color = industry_group), size = 0.3, alpha = 0.7) +
  geom_vline(xintercept = 2020, linetype = "dashed", color = "red", size = 0.5) +
  labs(title = "Interactive ROA Trends by Industry (2017-2021)<br><sub><i>Vertical line indicates pandemic start (2020)<br><sub><i>",
       x = "Fiscal Year", y = "ROA (%)") +
  theme_minimal() +
  theme(legend.position = "none") 

# Plotly interactive charts -ROA
roa_plotly <- 
  ggplotly(roa_interactive, tooltip = c("industry_group", "fy_year", "industry_roa")) 


# Current ratio
cr_interactive <- industry_ts |>
  ggplot(aes(x = fy_year, y = industry_cr, group = industry_group)) +
  geom_line(aes(color = industry_group), size = 0.3, alpha = 0.7) +
  geom_vline(xintercept = 2020, linetype = "dashed", color = "red", size = 0.5) +
  labs(title = "Interactive Current Ratio Trends by Industry (2017-2021)<br><sub><i>Vertical line indicates pandemic start (2020) - Hover for details<br><sub><i>",
       x = "Fiscal Year", y = "Current Ratio") +
  theme_minimal() +
  theme(legend.position = "none")

cr_plotly <- 
  ggplotly(cr_interactive, tooltip = c("industry_group", "fy_year", "industry_cr")) 

# Combine plots with controlled layout
combined_plot <- subplot(
  roa_plotly, cr_plotly,
  nrows = 2,
  shareX = TRUE,  
  titleY = TRUE  
) |>
  plotly::layout(
    title = "Industry Financial Trends (2017-2021)<br><sub>Vertical line indicates pandemic start (2020)</sub>",
    hoverlabel = list(bgcolor = "white", font = list(size = 10)))

combined_plot

Figure 4: The interactive plot of ROA and Current Ratio trends by industry (2017-2021)

The ROA and current ratio series show that industries did not move uniformly through the pandemic window. Several industries display visible changes in level or volatility around 2020, while others remain comparatively stable. This motivates a more structured summary of trend disruption and volatility across industries, followed by a firm level analysis of how the liquidity profitability relationship behaved inside each sector.

Macro level evidence using time series features

This section converts each industry series into compact features that capture two dimensions that matter in a shock period.

Trend strength reflects how stable and directional the underlying movement is over time. Lower values indicate a weaker or broken trend pattern. Spikiness captures abrupt movements and volatility. Higher values indicate more irregular behaviour.

Show code

# Calculating time series features
industry_features <- industry_ts |>
  features(industry_roa, feat_stl) |>
  rename_with(~ paste0("roa_", .), -industry_group) |>
  left_join(
    industry_ts |>
      features(industry_cr, features = feat_stl) |>
      rename_with(~ paste0("cr_", .), -industry_group),
    by = "industry_group"
  )

# Focus on trend_strength and spikiness
focus_features <- industry_features |>
  select(industry_group, 
         roa_trend_strength, roa_spikiness,
         cr_trend_strength, cr_spikiness)

The combined plots separate industries into different recovery profiles. Industries with weak trend strength in both ROA and liquidity indicate disrupted financial fundamentals, while industries with high spikiness reflect instability even when the overall direction remains intact. This provides a macro level screen for sectors that warrant closer firm level investigation.

Show code

# Trend strength analysis to identify the hardest-hit sectors
trend_analysis <- focus_features |>
  mutate(
    # Identify sectors where trends are broken (low trend strength)
    roa_trend_disrupted = roa_trend_strength < 0.3,
    cr_trend_disrupted = cr_trend_strength < 0.3,
    overall_trend_impact = (roa_trend_strength + cr_trend_strength) / 2
  )


# Volatility analysis to identify the sectors with the greatest volatility during the pandemic
volatility_analysis <- focus_features |>
  mutate(
    high_roa_volatility = roa_spikiness > median(roa_spikiness),
    high_cr_volatility = cr_spikiness > median(cr_spikiness),
    overall_volatility = (scale(roa_spikiness) + scale(cr_spikiness)) / 2)



# Create trend strength plot
trend_plot <- focus_features |>
  ggplot(aes(x = roa_trend_strength, y = cr_trend_strength)) +
  geom_point(aes(size = roa_spikiness, 
                 color = industry_group, 
                 text = industry_group), alpha = 0.7) +
  geom_hline(yintercept = 0.3, linetype = "dashed", color = "red", alpha = 0.7) +
  geom_vline(xintercept = 0.3, linetype = "dashed", color = "red", alpha = 0.7) +
   labs(title = "Impact of Pandemic on Industry Trend Strength<br><sub><i>Red dashed lines indicate trend strength threshold (0.3)<br><sub><i>",
       x = "ROA Trend Strength", y = "Current Ratio Trend Strength",
       size = "ROA Spikiness") +
  theme_minimal() +
  theme(legend.position = "none")

# Convert to interactive plotly
trend_in <- ggplotly(trend_plot, tooltip = c("text", "x", "y", "size")) 


# Create volatility plot
volatility_plot <- focus_features |>
  ggplot(aes(x = roa_spikiness, y = cr_spikiness)) +
  geom_point(aes(color = industry_group, 
                 size = roa_trend_strength, 
                 text = industry_group), alpha = 0.7) +
  geom_hline(yintercept = median(focus_features$cr_spikiness), 
             linetype = "dashed", color = "blue", alpha = 0.5) +
  geom_vline(xintercept = median(focus_features$roa_spikiness), 
             linetype = "dashed", color = "blue", alpha = 0.5) +
  coord_cartesian(xlim = c(0, quantile(focus_features$roa_spikiness, 0.90))) + # Limit to 75% quantile
  labs(title = "Industry Volatility Analysis During Pandemic<br><sub><i>Blue dashed lines represent medians, excluding the extreme values<br><sub><i>",
       x = "ROA Spikiness", y = "Current Ratio Spikiness",
       size = "ROA Trend Strength") +
  theme_minimal() +
  theme(legend.position = "none")


volatility_in <- ggplotly(volatility_plot, tooltip = c("text", "x", "y", "size"))



combined_analysis <- subplot(
  trend_in, volatility_in,
  nrows = 1,
  shareY = FALSE,
  titleX = TRUE,
  titleY = TRUE,
  margin = 0.05 
) |>
  plotly::layout(
    title = list(
      text = "Industry Financial Impact Analysis During Pandemic<br><sub>Left: Trend strength disruption | Right: Volatility patterns</sub>",
      font = list(size = 14, color = "black"),
      x = 0.5,
      y = 0.95
    ),
    hoverlabel = list(
      bgcolor = "white", 
      font = list(size = 11, color = "black"),
      bordercolor = "lightgray"
    ),
    margin = list(l = 60, r = 60, b = 80),
    hovermode = "closest",
    annotations = list(
      list(
        x = 0, y = -0.15,
        xref = "paper", yref = "paper",
        text = "Red lines: 0.3 threshold",
        showarrow = FALSE,
        font = list(size = 10, color = "red")
      ),
      list(
        x = 0.85, y = -0.15,
        xref = "paper", yref = "paper",
        text = "Blue lines: Median volatility values",
        showarrow = FALSE,
        font = list(size = 10, color = "blue")
      )
    )
  )


combined_analysis

Figure 5: Combined analysis of trend strength disruption and volatility across industries during the pandemic period. Left: Trend strength analysis with 0.3 threshold lines. Right: Volatility analysis with median lines. Industries in bottom-left quadrant of left plot are most severely impacted, while top-right industries in right plot have highest volatility.

Threshold-based industry classification

To turn the macro screen into interpretable segments, industries are grouped into recovery patterns using transparent thresholds on ROA trend strength and spikiness. This provides a practical way to identify industries that look resilient, industries that show disrupted and volatile patterns consistent with uneven recovery, and industries that appear stable but volatile beneath the surface.

Show code

industry_classes <- focus_features |>
  mutate(
    # Calculate threshold
    trend_threshold_high = 0.7,
    trend_threshold_low = 0.3,
    spikiness_threshold_high = quantile(roa_spikiness, 0.75),
    
    # Classification
    recovery_pattern = case_when(
      roa_trend_strength >= trend_threshold_high & 
        roa_spikiness <= median(roa_spikiness) ~ "Resilient",
      roa_trend_strength <= trend_threshold_low & 
        roa_spikiness >= spikiness_threshold_high ~ "K-shaped",
      roa_trend_strength >= trend_threshold_high & 
        roa_spikiness >= spikiness_threshold_high ~ "Hidden Crisis",
      TRUE ~ "Moderate Impact"
    )
  )

Micro level evidence using firm level liquidity profitability shapes

The macro patterns can be driven by a small subset of firms or by broad within industry divergence. This section moves from industry averages to firm level point clouds in the current ratio ROA space.

Each firm is treated as a point. For each industry and year, the shape of the point cloud is summarised using scagnostic metrics. A simple linear model ROA as a function of current ratio is also fitted to extract slope and residual dispersion. Changes in slope and residual dispersion are tracked around 2020 to identify industries where the liquidity profitability relationship shifted materially.

To keep results reliable, industry year groups with fewer than three firms are not scored.

Data and methods

We select industries previously identified in the macro analysis (industries labelled K-shaped, Resilient, or Hidden Crisis). For each industry × fiscal year we:

keep firm-level pairs (current_ratio, roa_pct);
calculate scagnostic metrics: outlying, stringy, striated, clumpy, sparse, monotonic, dcor;
fit a linear model roa_pct ~ current_ratio and record slope and residual SD;
compute year-to-year slope changes and count “jumps” (a jump = |slope change| > 0.5);
compute pre/post differences in mean residual SD and mean slope (post − pre).
When an industry × year has too few firms (< 3) the scagnostics or regression are not reported (NA).

How to read the scagnostic metrics

Clumpy: higher values indicate the points form separate clusters → suggests within-industry divergence (possible K-shaped pattern).
Outlying: higher values indicate presence of extreme firms (very strong survivors or very weak firms).
Stringy / Monotonic: higher values indicate a clearer linear or monotonic relationship between CR and ROA.
Residual SD: higher residual SD from the linear fit signals more unexplained scatter (higher micro volatility).

Show code

selected_industries <- industry_classes |>
  filter(recovery_pattern %in% c("K-shaped", "Resilient", "Hidden Crisis")) |>
  pull(industry_group)

# Extract the companies time series data of these industries
selected_companies_ts <- germany_ts |>
  filter(industry_group %in% selected_industries)

Show code

# Aggregate the original point set by industry × year
by_ind_year <- selected_companies_ts |>
  group_by(industry_group) |>
  index_by(fy_year) |>
  nest(data = c(company_id, roa_pct, current_ratio)) |>
  mutate(n_obs = map_int(data, ~ nrow(.x)))

# Calculate scagnostics
scag_names <- c("outlying", "stringy", "striated", "clumpy", "sparse", "monotonic", "dcor")

by_ind_year <- by_ind_year |>
  mutate(
    scags = map(data, ~ {
      df <- .x
      # When there are too few samples, return a row of tibble with the same column names and types.
      if (nrow(df) < 3) {
        return(as_tibble(set_names(as.list(rep(NA_real_, length(scag_names))), scag_names)))
      }
      res <- calc_scags(df$current_ratio, df$roa_pct, scags = scag_names)
      # Make sure to convert to atomic vectors and name them, then convert to tibble
      res_vec <- unlist(res)
      res_named <- set_names(as.numeric(res_vec[scag_names]), scag_names)
      as_tibble(as.list(res_named))
    })
  ) |>
  unnest(cols = scags)  # Expand into multiple columns

# Fit a linear regression for each industry × year and extract slope, intercept, and resid_sd
by_ind_year <- by_ind_year |>
  mutate(
    fit = map(data, ~ {
      df <- .x
      if (nrow(df) < 3 || all(is.na(df$current_ratio)) || all(is.na(df$roa_pct))) return(NULL)
      lm(roa_pct ~ current_ratio, data = df)
    }),
    slope = map_dbl(fit, ~ if (is.null(.x)) NA_real_ else coef(.x)[["current_ratio"]]),
    intercept = map_dbl(fit, ~ if (is.null(.x)) NA_real_ else coef(.x)[["(Intercept)"]]),
    resid_sd = map_dbl(fit, ~ if (is.null(.x)) NA_real_ else sd(residuals(.x), na.rm = TRUE))
  ) |>
  select(-fit)

# Take industry as a time series and calculate its evolution characteristics
industry_tignostics <- by_ind_year |>
  arrange(industry_group, fy_year) |>
  group_by(industry_group) |>
  mutate(
    slope_lag = dplyr::lag(slope),
    slope_change = slope - slope_lag,   # year-to-year change in slope
    slope_pct_change = (slope - slope_lag) / abs(slope_lag) * 100,
    resid_sd_lag = dplyr::lag(resid_sd),
    resid_sd_change = resid_sd - resid_sd_lag,
    # jumps: count of year-to-year slope changes bigger than a threshold (e.g. 0.5)
    big_slope_jump = if_else(abs(slope_change) > 0.5, 1, 0),
    jumps_cumulative = cumsum(replace_na(big_slope_jump, 0))
  ) |>
  ungroup()

# Calculate the overall volatility_shift (post - pre) before and after the epidemic (taking 2020 as the epidemic year)
vol_shift <- by_ind_year |>
  mutate(period = case_when(
    fy_year < 2020 ~ "pre",
    fy_year == 2020 ~ "covid_year",
    fy_year > 2020 ~ "post"
  )) |>
  group_by(industry_group, period) |>
  summarise(mean_resid_sd = mean(resid_sd, na.rm = TRUE),
            mean_slope = mean(slope, na.rm = TRUE),
            .groups = "drop") |>
  pivot_wider(names_from = period, values_from = c(mean_resid_sd, mean_slope)) |>
  mutate(
    vol_pre_post_diff = mean_resid_sd_post - mean_resid_sd_pre,
    slope_pre_post_diff = mean_slope_post - mean_slope_pre
  )

# Merge the results to generate the final summary
industry_summary <- industry_tignostics |>
  group_by(industry_group) |>
  summarise(
    n_years = n_distinct(fy_year),
    avg_slope = mean(slope, na.rm = TRUE),
    max_abs_slope_change = if (all(is.na(slope_change))) NA_real_ else max(abs(slope_change), na.rm = TRUE),
    mean_resid_sd = mean(resid_sd, na.rm = TRUE),
    jumps = if (all(is.na(jumps_cumulative))) 0 else max(jumps_cumulative, na.rm = TRUE),
    .groups = "drop"
  ) |>
  left_join(vol_shift |> select(industry_group, vol_pre_post_diff, slope_pre_post_diff), by = "industry_group")

kable(industry_summary |> filter(jumps > 0) |>
        select(-n_years))

Table 1: Summary statistics for industries with significant slope changes (jumps > 0). The table shows average slope, maximum absolute slope change, mean residual standard deviation, number of jumps, and differences in volatility and slope before and after the COVID-19 pandemic for three industries: Wholesale & Retail Trade, Information & Communication, and Financial & Insurance.

industry_group	avg_slope	max_abs_slope_change	mean_resid_sd	jumps	vol_pre_post_diff	slope_pre_post_diff

Main results

Table 1 summarises industries that show at least one large year to year change in the liquidity profitability slope. The scagnostic figures tracks how selected scagnostic metrics evolve over time, with 2020 marked as the reference point.

The results indicate that some industries experienced meaningful changes in within industry structure after 2020. Changes in clumpiness and outlying behaviour suggest increased dispersion and the presence of distinct firm level outcomes inside the same sector. In contrast, industries with stable scagnostic profiles show a more consistent liquidity profitability relationship through the pandemic window.

Limitations

This section is exploratory and descriptive. Scagnostic metrics characterise point cloud structure but do not establish causal mechanisms. Results should be interpreted as evidence of changing within industry heterogeneity, supported by both macro time series features and micro firm level shape metrics.

Summary and interpretation

This analysis suggests that the liquidity profitability relationship did not shift uniformly across industries. At the macro level, industries differ in trend disruption and volatility. At the micro level, several industries exhibit post 2020 changes consistent with increased within industry dispersion, where firm outcomes diverge even when operating in the same sector.

Overall, the pandemic period is associated with greater heterogeneity in financial behaviour within selected industries, rather than a single directional shift in liquidity or profitability across the economy.

6 Conclusions

Across 2018 to 2021, the typical German firm maintained broadly stable profitability, but uncertainty increased. Median EBIT and EBITDA margins changed little between the pre pandemic and pandemic windows, while dispersion widened and the lower tails of ROA and ROE became heavier, signalling a larger share of firms experiencing severe downside outcomes. Sector effects were uneven. Some industries improved despite restrictions, while others saw a material deterioration in typical operating performance.

Balance sheet positioning remained conservative through the same period. Liquidity was broadly stable and leverage increased only marginally, suggesting most firms prioritised financial flexibility over aggressive debt expansion, even in a low rate environment. This stability at the aggregate level masks cross industry differences, but it supports the overall conclusion that corporate funding structures did not experience a structural break during the pandemic window.

From 2020 to 2022, industries that appeared more resilient generally combined stronger liquidity buffers with manageable leverage, while more cyclical or operationally constrained sectors were more exposed. Importantly, the relationship between liquidity and profitability became less consistent within some industries. Evidence from macro features and firm level shape metrics indicates growing within industry dispersion, consistent with uneven recovery paths where some firms stabilised by preserving cash while others maintained profitability with less emphasis on liquidity.

Overall, the pandemic’s impact on German corporate financials was moderate in the aggregate but uneven in distribution. Typical profitability was maintained and balance sheets remained steady, while the most visible shifts were higher dispersion, deeper downside risk for some firms, and stronger sectoral divergence in outcomes.

7 References

The materials used for this report are:

Bureau van Dijk. (2025). OSIRIS [Data set: Germany subset]. Bureau van Dijk – A Moody’s Analytics Company. https://www.bvdinfo.com/en-gb/our-products/data/international/osiris
Tidyverse: Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686 https://doi.org/10.21105/joss.01686.
Conflicted: Wickham H (2023). conflicted: An Alternative Conflict Resolution Strategy. doi:10.32614/CRAN.package.conflicted https://doi.org/10.32614/CRAN.package.conflicted, R package version 1.2.0, https://CRAN.R-project.org/package=conflicted.
Dplyr: Wickham H, François R, Henry L, Müller K, Vaughan D (2023). dplyr: A Grammar of Data Manipulation. doi:10.32614/CRAN.package.dplyr https://doi.org/10.32614/CRAN.package.dplyr, R package version 1.1.4, https://CRAN.R-project.org/package=dplyr.
Stringr: Wickham H (2025). stringr: Simple, Consistent Wrappers for Common String Operations. doi:10.32614/CRAN.package.stringr https://doi.org/10.32614/CRAN.package.stringr, R package version 1.5.2, https://CRAN.R-project.org/package=stringr.
Janitor: Firke S (2024). janitor: Simple Tools for Examining and Cleaning Dirty Data. doi:10.32614/CRAN.package.janitor https://doi.org/10.32614/CRAN.package.janitor, R package version 2.2.1, https://CRAN.R-project.org/package=janitor.
Skimr: Waring E, Quinn M, McNamara A, Arino de la Rubia E, Zhu H, Ellis S (2025). skimr: Compact and Flexible Summaries of Data. doi:10.32614/CRAN.package.skimr https://doi.org/10.32614/CRAN.package.skimr, R package version 2.2.1, https://CRAN.R-project.org/package=skimr.
Tsibble: Wang, E, D Cook, and RJ Hyndman (2020). A new tidy data structure to support exploration and modeling of temporal data, Journal of Computational and Graphical Statistics, 29:3, 466-478, doi:10.1080/10618600.2019.1695624.
Knitr: Xie Y (2025). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50, https://yihui.org/knitr/.

Yihui Xie (2015) Dynamic Documents with R and knitr. 2nd edition. Chapman and Hall/CRC. ISBN 978-1498716963

Yihui Xie (2014) knitr: A Comprehensive Tool for Reproducible Research in R. In Victoria Stodden, Friedrich Leisch and Roger D. Peng, editors, Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595
Naniar: Tierney N, Cook D (2023). “Expanding Tidy Data Principles to Facilitate Missing Data Exploration, Visualization and Assessment of Imputations.” Journal of Statistical Software, 105(7), 1-31. doi:10.18637/jss.v105.i07 https://doi.org/10.18637/jss.v105.i07.
Patchwork: Pedersen T (2025). patchwork: The Composer of Plots. doi:10.32614/CRAN.package.patchwork https://doi.org/10.32614/CRAN.package.patchwork, R package version 1.3.2, https://CRAN.R-project.org/package=patchwork.
Plotly: C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida, 2020.
tsibbletalk: Wang E, Cook D (2020). tsibbletalk: Interactive Graphics for Tsibble Objects. doi:10.32614/CRAN.package.tsibbletalk https://doi.org/10.32614/CRAN.package.tsibbletalk, R package version 0.1.0, https://CRAN.R-project.org/package=tsibbletalk.
Feasts: O’Hara-Wild M, Hyndman R, Wang E (2025). feasts: Feature Extraction and Statistics for Time Series. doi:10.32614/CRAN.package.feasts https://doi.org/10.32614/CRAN.package.feasts, R package version 0.4.2, https://CRAN.R-project.org/package=feasts.
Broom: Robinson D, Hayes A, Couch S (2025). broom: Convert Statistical Objects into Tidy Tibbles. doi:10.32614/CRAN.package.broom https://doi.org/10.32614/CRAN.package.broom, R package version 1.0.9, https://CRAN.R-project.org/package=broom.
Purrr: Wickham H, Henry L (2025). purrr: Functional Programming Tools. doi:10.32614/CRAN.package.purrr https://doi.org/10.32614/CRAN.package.purrr, R package version 1.1.0, https://CRAN.R-project.org/package=purrr.
Cassowaryr: L. Wilkinson, A. Anand and R. Grossman, Graph-theoretic scagnostics, IEEE Symposium on Information Visualization, 2005. INFOVIS 2005., 2005, pp. 157-164, doi: 10.1109/INFVIS.2005.1532142.

L. Wilkinson and G. Wills (2008) Scagnostics Distributions, Journal of Computational and Graphical Statistics, 17(2), pp 473-491, doi:10.1198/106186008X320465

K. Grimm, Kennzahlenbasierte Grafikauswahl, doctoral thesis, Universitat Augsburg, 2016.

H. Mason, Lee, S., Laa, U. and Cook, D. (2022) cassowaryr: Compute Scagnostics on Pairs of Numeric Variables in a Data Set. R package version 2.0.0. https://CRAN.R-project.org/package=cassowary
Gganimate: Pedersen T, Robinson D (2025). gganimate: A Grammar of Animated Graphics. doi:10.32614/CRAN.package.gganimate https://doi.org/10.32614/CRAN.package.gganimate, R package version 1.0.11, https://CRAN.R-project.org/package=gganimate.
Gifski: Ooms J, Kornel Lesiński, Authors of the dependency Rust crates (2025). gifski: Highest Quality GIF Encoder. doi:10.32614/CRAN.package.gifski https://doi.org/10.32614/CRAN.package.gifski, R package version 1.32.0-2, https://CRAN.R-project.org/package=gifski.
KableExtra: Zhu H (2024). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. doi:10.32614/CRAN.package.kableExtra https://doi.org/10.32614/CRAN.package.kableExtra, R package version 1.4.0, https://CRAN.R-project.org/package=kableExtra.
Ggplot2: H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Scales: Wickham H, Pedersen T, Seidel D (2025). scales: Scale Functions for Visualization. doi:10.32614/CRAN.package.scales https://doi.org/10.32614/CRAN.package.scales, R package version 1.4.0, https://CRAN.R-project.org/package=scales.
Tidyr: Wickham H, Vaughan D, Girlich M (2024). tidyr: Tidy Messy Data. doi:10.32614/CRAN.package.tidyr https://doi.org/10.32614/CRAN.package.tidyr, R package version 1.3.1, https://CRAN.R-project.org/package=tidyr.
Ggrepel: Slowikowski K (2024). ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. doi:10.32614/CRAN.package.ggrepel https://doi.org/10.32614/CRAN.package.ggrepel, R package version 0.9.6, https://CRAN.R-project.org/package=ggrepel.
ViridisLite: Simon Garnier, Noam Ross, Robert Rudis, Antônio P. Camargo, Marco Sciaini, and Cédric Scherer (2023). viridis(Lite) - Colorblind-Friendly Color Maps for R. viridisLite package version 0.4.2.
The Associated Press. (2020). Europe braces for next wave of coronavirus pandemic in Berlin. AP News. https://apnews.com/article/coronavirus-pandemic-health-europe-epidemics-berlin-b61de99739774c1f52b4ba6860054d6d