Get started with yusufHAIGermany • yusufHAIGermany

Purpose

This vignette demonstrates how to load the package datasets and perform quick checks to confirm the structure and content. It uses concise, reproducible examples you can adapt for exploratory work and quality control.

What’s inside:

Data access. Load sim_monthly, sim_weekly, and sim_daily and verify they are present.
Review structure. Print concise schema for each dataset.
Quick visuals and summaries. A monthly trend line.
Tooling notes. Short explanations of the build and development libraries.

Data Access

#load the datasets from the installed package during vignette build
if (!requireNamespace("yusufHAIGermany", quietly = TRUE)) {
  stop("Package not available in vignette build process.")
}

data("sim_monthly", package = "yusufHAIGermany")
data("sim_weekly",  package = "yusufHAIGermany")
data("sim_daily",   package = "yusufHAIGermany")

#quick sanity check so the build fails early if something is missing
stopifnot(
  exists("sim_monthly"), exists("sim_weekly"), exists("sim_daily")
)

The package provides three simulation tables for HAIs in Germany (2011 – 2012) and was cleaned to get these formats:

sim_monthly: date, year, month, hai, region, sex, cases_month, deaths_month, dalys_month.
sim_weekly: date, year, iso_week, hai, region, sex, cases_week, deaths_week, dalys_week.
sim_daily: date, year, yday, wday, hai, region, sex, cases_day, deaths_day, dalys_day.

For details on the data source and variable definitions, see the Data Description, and the examples of common analysis steps are provided in the Examples.

Review Data

This glimpse() function helps understand the data type and its appearance, making the process easier for IDA and EDA.

dplyr::glimpse(sim_monthly)
dplyr::glimpse(sim_weekly)
dplyr::glimpse(sim_daily)

Quick Visualisation and Summaries

hese are the example tables and figures that can be used to start analysing the data:

Summary table by year, region, metric, and HAI

# HAI categories
hai_levels <- sort(unique(sim_monthly$hai))

# Build table for each metric
summary_region <- sim_monthly |>
  group_by(year, region, hai) |>
  summarise(
    cases  = sum(cases_month),
    deaths = sum(deaths_month),
    dalys  = sum(dalys_month),
    .groups = "drop"
  ) |>
  pivot_longer(cols = c(cases, deaths, dalys),
               names_to = "metric",
               values_to = "value") |>
  pivot_wider(
    names_from = hai,
    values_from = value,
    values_fill = 0
  ) |>
  arrange(year, region, metric)

knitr::kable(
  summary_region,
  caption = "Summary by year, region, and metric, with HAI categories as columns."
)

Summary by year, region, and metric, with HAI categories as columns.
year	region	metric	HAP	SSI	BSI	UTI	CDI
2011	EU/EEA	cases	119573.0	100961.0	29548.0	250172.0	39626.0
2011	EU/EEA	dalys	77966.6	31368.3	63903.3	77783.5	23021.2
2011	EU/EEA	deaths	4453.0	2532.0	4277.0	4274.0	2114.0
2011	Germany	cases	108704.0	91784.0	26861.0	227429.0	36030.0
2011	Germany	dalys	70878.6	28516.5	58093.8	70712.2	20928.6
2011	Germany	deaths	4047.0	2301.0	3887.0	3885.0	1922.0
2012	EU/EEA	cases	117743.0	98488.0	30212.0	244976.0	41294.0
2012	EU/EEA	dalys	76615.2	30402.3	65402.9	76522.7	23983.4
2012	EU/EEA	deaths	4371.0	2454.0	4378.0	4203.0	2202.0
2012	Germany	cases	107039.0	89535.0	27465.0	222706.0	37540.0
2012	Germany	dalys	69650.1	27638.5	59457.3	69566.0	21802.4
2012	Germany	deaths	3975.0	2230.0	3979.0	3821.0	1998.0

According to Table 1, Germany consistently reports higher case counts than the overall EU/EEA, with UTI and HAP making up the largest portions in both years. DALYs follow a similar pattern, showing that the burden is driven by both frequency and severity in these categories. Deaths remain significantly lower across all groups, but the distribution across HAI types reflects the same patterns seen in cases and DALYs.

summary_sex <- sim_monthly |>
  group_by(year, sex, hai) |>
  summarise(
    cases  = sum(cases_month),
    deaths = sum(deaths_month),
    dalys  = sum(dalys_month),
    .groups = "drop"
  ) |>
  pivot_longer(cols = c(cases, deaths, dalys),
               names_to = "metric",
               values_to = "value") |>
  pivot_wider(
    names_from = hai,
    values_from = value,
    values_fill = 0
  ) |>
  arrange(year, sex, metric)

knitr::kable(
  summary_sex,
  caption = "Summary by year, sex, and metric, with HAI categories as columns."
)

Summary by year, sex, and metric, with HAI categories as columns.
year	sex	metric	HAP	SSI	BSI	UTI	CDI
2011	Female	cases	102726.0	77099.0	23691.0	229247.0	37828.0
2011	Female	dalys	66980.3	23953.7	51238.7	71277.7	21974.9
2011	Female	deaths	3826.0	1932.0	3429.0	3915.0	2018.0
2011	Male	cases	125551.0	115646.0	32718.0	248354.0	37828.0
2011	Male	dalys	81864.9	35931.1	70758.4	77218.0	21974.9
2011	Male	deaths	4674.0	2901.0	4735.0	4244.0	2018.0
2012	Female	cases	101152.0	75210.0	24223.0	224487.0	39417.0
2012	Female	dalys	65819.3	23216.3	52441.0	70122.6	22892.9
2012	Female	deaths	3756.0	1875.0	3510.0	3854.0	2100.0
2012	Male	cases	123630.0	112813.0	33454.0	243195.0	39417.0
2012	Male	dalys	80446.0	34824.5	72419.2	75966.1	22892.9
2012	Male	deaths	4590.0	2809.0	4847.0	4170.0	2100.0

According to Table 2, male patients consistently show higher case counts and DALYs across all HAI categories, reflecting a greater infection burden compared with females in both years. The distribution across HAP, SSI, BSI, UTI, and CDI remains stable between sexes, indicating similar clinical patterns despite differences in overall magnitude. Deaths follow the same trend, with male totals slightly exceeding female totals across all HAI types.

Monthly Total HAI cases with Composition by Infection

monthly_totals <- sim_monthly |>
  dplyr::group_by(date) |>
  dplyr::summarise(total_cases = sum(cases_month), .groups = "drop")

monthly_mix <- sim_monthly |>
  dplyr::group_by(date, hai) |>
  dplyr::summarise(cases = sum(cases_month), .groups = "drop")

peak_pt <- monthly_totals |> dplyr::slice_max(total_cases, n = 1)

ggplot() +
  geom_area(data = monthly_mix,
            aes(x = as.Date(date), y = cases, fill = hai), alpha = 0.75) +
  geom_line(data = monthly_totals,
            aes(x = as.Date(date), y = total_cases), linewidth = 1.1) +
  geom_point(data = peak_pt,
             aes(x = as.Date(date), y = total_cases), size = 2) +
  geom_text(data = peak_pt,
            aes(x = as.Date(date), y = total_cases,
                label = scales::comma(total_cases)),
            vjust = -0.5, size = 3.2) +
  scale_y_continuous(labels = scales::label_comma()) +
  labs(x = "Date", y = "Cases (monthly)",
       fill = "Infection type",
       title = "Monthly HAI burden: Total and Composition") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "bottom")

Monthly total HAI cases with composition by infection type, 2011–2012

According to the Figure 1, the total HAI cases remained high throughout 2011 – 2012, with a noticeable peak in mid-2011. UTI consistently made up the largest share of the monthly burden, while CDI contributed the smallest proportion. The overall pattern remains stable, with seasonal increases and decreases that seem consistent across infection types.

For more information regarding the analysis, please check the live Shiny dashboard App and Examples.

Tooling Notes

This package is:

Built with:

shiny and bslib: UI components and theming for interactive apps.
ggplot2 and plotly: Static and interactive charts for EDA and review.
dplyr, tidyr, lubridate, scales, glue: Data wrangling, date handling, formatting, and text utilities.
yusufHAIGermany: Package datasets and helpers.

Developed with:

devtools, usethis: Package scaffolding, workflow, and checks.
roxygen2: In-source documentation and NAMESPACE generation.
pkgdown: Static documentation site for functions, articles, and examples.
knitr, rmarkdown: Reproducible vignettes and literate programming.
rsconnect: Deployment to shinyapps.io.