---
title: "Format of the crosstabs' data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Format of the crosstabs' data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In this vignette we show the format of the data underlying the crosstabs generated by crosstabser. It is essentially [tidy data](https://r4ds.had.co.nz/tidy-data.html) in long format, the kind you would need to plot a crosstab with [ggplot2](https://github.com/tidyverse/ggplot2/) or [Observable Plot](https://github.com/observablehq/plot) (as done in [table_charter](https://gitlab.com/urswilke/table_charter); see [this interactive demo](https://urswilke.github.io/datadaptor-crosstabser-table_charter-demo/)). To reduce redundancy and save space, however, the data is stored in several data.frames that can be joined together at the end.

First, let's load the required libraries:

```{r, setup, message=FALSE}
library(crosstabser)
library(dplyr)
library(ggplot2)
library(purrr)
library(tidyr)
library(haven)
```

and define a labelled data.frame:

```{r}
df <- tibble::tibble(
  q1 = c(1, 2, 1) |>
    haven::labelled(c(Yes = 1, No = 2), label = "Super important question"),
  age = c(2, 1, 1) |>
    haven::labelled(c("18-39" = 1, "40+" = 2), label = "age")
)
df
```

Next we define the syntax to generate a crosstab:

```{r}
Questions <- tibble::tibble(
  Type = "cat",
  RowVar = "q1",
  Title = "The crosstab's title"
)
```

We'll use the `age` variable for the columns (the x-axis) of the crosstab:

```{r}
ColVar <- "age"
```

Now we can construct an R6 object of class "Tabula":

```{r}
mapping_file <- list(Questions = Questions, Macro = list(ColVar = ColVar))
m <- Tabula$new(
  df,
  mapping_file
)
```

The `Tabula$get_crosstabs_data()` method returns a list of data.frames containing the crosstabs' underlying data:

```{r}
l <- m$get_crosstabs_data()
l
```

It contains 5 data.frames:

- `tab_table`: Contains one row per generated crosstab. `QuestNo` is the unique identifier taken from `Questions$Abbreviation`; if a single row of `Questions` generates multiple crosstabs, they are distinguished by `TabNo`.
- `val_table`: Contains the values of the crosstabs in long format in the column `Value`.
- `row_table`: Contains the labels of the crosstabs' rows and some information about the rows' formatting.
- `head_table`: Contains the labels of the crosstabs' headers.
- `col_table_all`: Contains the labels of the crosstabs' columns.

Now we're ready to merge all this data into one data.frame:

```{r}
res <- l[c(
  "tab_table",
  "val_table",
  "row_table",
  "col_table_all",
  "head_table"
)] |>
  reduce(left_join)
```
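To illustrate why splitting the data saves space and how `reduce(left_join)` reassembles it, here is a minimal sketch on hypothetical toy tables. The table and column names (`TabId`, `RowId`, `ColId`, `RowLabel`, `ColLabel`) are made up for illustration and are not crosstabser's actual columns. The long value table only stores keys, while each label is stored once in a small lookup table:

```r
library(dplyr)
library(purrr)

# Hypothetical toy tables (NOT crosstabser's real column names):
# the value table repeats only compact keys ...
val_toy <- tibble::tibble(
  TabId = c(1, 1, 1, 1),
  RowId = c(1, 1, 2, 2),
  ColId = c(1, 2, 1, 2),
  Value = c(50, 100, 50, 0)
)
# ... while each label is stored just once in a lookup table.
tab_toy <- tibble::tibble(TabId = 1, Title = "The crosstab's title")
row_toy <- tibble::tibble(RowId = c(1, 2), RowLabel = c("Yes", "No"))
col_toy <- tibble::tibble(ColId = c(1, 2), ColLabel = c("18-39", "40+"))

# Fold the list with left_join(), mirroring the merge above. Without `by`,
# left_join() joins on all shared column names and prints a message;
# passing the shared names explicitly gives the same result silently.
toy_res <- list(tab_toy, val_toy, row_toy, col_toy) |>
  reduce(\(x, y) left_join(x, y, by = intersect(names(x), names(y))))
toy_res
```

Each join adds the label columns of one lookup table to every matching value row, so the result has one row per value, just like `res`.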
Here is the full merged data.frame:

```{r}
knitr::kable(res)
```
Let's look at the crosstab itself:

```{r}
m
```

Say we wanted to generate a color-coded raster of its percent values. We could do it like this:

```{r}
res |>
  filter(
    # keep only the detail rows (drops the "TOTAL" and "VALID CASES" rows):
    RowContent == "Detail",
    # keep the percentages and drop the absolute values:
    RowAbsPercent == "Percent"
  ) |>
  ggplot() +
  geom_tile(aes(
    x = ColTitle2,
    y = RowTitle1,
    fill = Value
  )) +
  facet_grid(
    ~ as_factor(ColTitle1),
    scales = "free_x"
  ) +
  scale_x_discrete(position = "top") +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank()
  )
```

(*This plot isn't particularly interesting; it only serves to demonstrate the structure of the crosstabs' underlying data.*)
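The long format also maps directly back onto the wide layout of the printed crosstab. As a minimal sketch, here is `tidyr::pivot_wider()` applied to a hypothetical toy long table; the column names `RowLabel` and `ColLabel` are made up, and in the merged `res` the row and column title columns used in the plot above would presumably play analogous roles:

```r
library(tidyr)

# Hypothetical long-format percentages (made-up column names,
# not crosstabser's actual output): one row per cell of the crosstab.
long_toy <- tibble::tibble(
  RowLabel = c("Yes", "Yes", "No", "No"),
  ColLabel = c("18-39", "40+", "18-39", "40+"),
  Value    = c(50, 100, 50, 0)
)

# Spread the column labels out again to get the familiar crosstab grid:
# one row per row label, one column per column label.
wide_toy <- long_toy |>
  pivot_wider(names_from = ColLabel, values_from = Value)
wide_toy
```

Going the other way, from wide to long, is what makes the data convenient for plotting libraries that expect one observation per row.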