Transposes a Dataset — proc_transpose • procs

A function to pivot or transpose a data frame. In the default usage, the variables identified by the parameter var are transposed to become rows. The variable values in the parameter id become the new columns. The function has several more parameters to control how variables are named in the transposed data set. Parameters will accept quoted or unquoted values.

proc_transpose(
  data,
  by = NULL,
  var = NULL,
  id = NULL,
  idlabel = NULL,
  copy = NULL,
  name = "NAME",
  namelabel = NULL,
  prefix = NULL,
  delimiter = ".",
  suffix = NULL,
  where = NULL,
  options = NULL,
  log = TRUE
)

Arguments

data: The input data to transpose.
by: An optional by group. Parameter accepts a vector of one or more quoted variable names. If the by group is requested, the data will be subset by that variable and the transpose function will transpose each group and stack them together in a single table.
var: The variable or variables to transpose. Parameter accepts a vector of variable names. By default, all numeric variables will be transposed.
id: The variable or variables to use for the transposed column names.
idlabel: The variable to use for the transposed column labels.
copy: A vector of variables to retain in the output data without transposition. Values will be truncated or recycled to fit the number of output rows.
name: Specifies the name of the variable to be used for the var values.
namelabel: The label to use for the name variable.
prefix: Contains a prefix to be used in the construction of column names.
delimiter: Specifies a delimiter to be used in the construction of column names.
suffix: Contains a suffix to be used in the construction of column names.
where: An expression to filter the rows after the transform is complete. Use the expression function to define the where clause.
options: Optional keywords that affect the transpose. Default is NULL. Available option is "noname" which drops the name column from the output dataset.
log: Whether or not to log the procedure. Default is TRUE. This parameter is used internally.

Value

The transposed dataset. If a data frame is input, a data frame will be output. If a tibble is input, a tibble will be output.

Details

The proc_tranpose function takes an input data frame or tibble and transposes the columns and rows. If no parameters are specified, the function will assign all numeric variables to the var parameter. These variables will become rows, and generic column names ("COL1", "COL2", etc.) will be generated. Other variables will be dropped.

There are several parameters to control how new column names are constructed. If the desired column names already exist in your data, identify them on the id parameter. The function will then use those data values unaltered. The label for these new columns can also be constructed from data values using the idlabel parameter.

The name and namelabel parameter are used to control the name of the column created for the var values. If this parameter is not passed, the column will be called "NAME", and no label will be assigned.

You may group the transposed values using the by parameter. This parameter accepts one or more variable names to use for grouping. If this parameter is used, the function will first subset the data by the unique combination of by variables, transpose each subset, and then combine the result into a single output dataset. The by group variables will be named on the output dataset with a generic name ("BY1", "BY2", etc.).

The copy parameter is used to simply copy columns from the input dataset to the transposed dataset. If necessary, these values will be recycled or truncated to fit the number of output rows. Any input variables not included in the var, id, or copy parameter will be dropped.

Once the transpose is complete, you may wish to filter the output data. Filtering can be accomplished using the where parameter. This parameter takes an expression using the expression function. The expression is constructed using standard R logical operators. Variable names do not need to be quoted.

The prefix, delimiter, and suffix parameter are used to control how generic column names are constructed. These parameters are especially useful when there are multiple var variables.

Examples

# Prepare data
dat <- data.frame(CAT = rownames(USPersonalExpenditure),
                  USPersonalExpenditure, stringsAsFactors = FALSE,
                  row.names = NULL)[1:4, ]

# View data
dat
#                   CAT X1940 X1945 X1950 X1955 X1960
# 1    Food and Tobacco 22.20 44.50 59.60  73.2  86.8
# 2 Household Operation 10.50 15.50 29.00  36.5  46.2
# 3  Medical and Health  3.53  5.76  9.71  14.0  21.1
# 4       Personal Care  1.04  1.98  2.45   3.4   5.4

# Default transpose
tdat1 <- proc_transpose(dat)

# View results
tdat1
#    NAME COL1 COL2  COL3 COL4
# 1 X1940 22.2 10.5  3.53 1.04
# 2 X1945 44.5 15.5  5.76 1.98
# 3 X1950 59.6 29.0  9.71 2.45
# 4 X1955 73.2 36.5 14.00 3.40
# 5 X1960 86.8 46.2 21.10 5.40

# Transpose with ID and Name
tdat2 <-  proc_transpose(dat, id = CAT, name = Year)

# View results
tdat2
#   Year Food and Tobacco Household Operation Medical and Health Personal Care
# 1 X1940             22.2                10.5               3.53          1.04
# 2 X1945             44.5                15.5               5.76          1.98
# 3 X1950             59.6                29.0               9.71          2.45
# 4 X1955             73.2                36.5              14.00          3.40
# 5 X1960             86.8                46.2              21.10          5.40

# Transpose only some of the variables
tdat3 <- proc_transpose(dat, var = v(X1940, X1950, X1960), id = CAT, name = Year)

# View results
tdat3
#    Year Food and Tobacco Household Operation Medical and Health Personal Care
# 1 X1940             22.2                10.5               3.53          1.04
# 2 X1950             59.6                29.0               9.71          2.45
# 3 X1960             86.8                46.2              21.10          5.40

# By with a where clause
tdat4 <- proc_transpose(dat, by = CAT, name = Year,
                        where = expression(Year %in% c("X1940", "X1950", "X1960")))

# View Results
tdat4
#                    CAT  Year  COL1
# 1     Food and Tobacco X1940 22.20
# 2     Food and Tobacco X1950 59.60
# 3     Food and Tobacco X1960 86.80
# 4  Household Operation X1940 10.50
# 5  Household Operation X1950 29.00
# 6  Household Operation X1960 46.20
# 7   Medical and Health X1940  3.53
# 8   Medical and Health X1950  9.71
# 9   Medical and Health X1960 21.10
# 10       Personal Care X1940  1.04
# 11       Personal Care X1950  2.45
# 12       Personal Care X1960  5.40