A function to pivot or transpose a data frame. In the default
usage, the variables identified by the parameter var
are transposed
to become rows. The variable values in the parameter id
become
the new columns. The function has several more parameters to control
how variables are named in the transposed data set. Parameters will
accept quoted or unquoted values.
proc_transpose(
data,
by = NULL,
var = NULL,
id = NULL,
idlabel = NULL,
copy = NULL,
name = "NAME",
namelabel = NULL,
prefix = NULL,
delimiter = ".",
suffix = NULL,
where = NULL,
options = NULL,
log = TRUE
)
The input data to transpose.
An optional by group. Parameter accepts a vector of one or more quoted variable names. If the by group is requested, the data will be subset by that variable and the transpose function will transpose each group and stack them together in a single table.
The variable or variables to transpose. Parameter accepts a vector of variable names. By default, all numeric variables will be transposed.
The variable or variables to use for the transposed column names.
The variable to use for the transposed column labels.
A vector of variables to retain in the output data without transposition. Values will be truncated or recycled to fit the number of output rows.
Specifies the name of the variable to be used for the var values.
The label to use for the name variable.
Contains a prefix to be used in the construction of column names.
Specifies a delimiter to be used in the construction of column names.
Contains a suffix to be used in the construction of column names.
An expression to filter the rows after the transform
is complete. Use the expression
function to define
the where clause.
Optional keywords that affect the transpose. Default is NULL. Available option is "noname" which drops the name column from the output dataset.
Whether or not to log the procedure. Default is TRUE. This parameter is used internally.
The transposed dataset. If a data frame is input, a data frame will be output. If a tibble is input, a tibble will be output.
The proc_tranpose
function takes an input data frame or tibble
and transposes the columns and rows. If no parameters are specified,
the function will assign all numeric variables to the var
parameter.
These variables will become rows, and generic column names
("COL1", "COL2", etc.) will be generated.
Other variables will be dropped.
There are several parameters to control how new column names are constructed.
If the desired column names already exist in your data, identify them on the
id
parameter. The function will then use those data values
unaltered. The label for these new columns can also be constructed from
data values using the idlabel
parameter.
The name
and namelabel
parameter are used to control the
name of the column created for the var
values. If this parameter
is not passed, the column will be called "NAME", and no label will be assigned.
You may group the transposed values using the by
parameter. This
parameter accepts one or more variable names to use for grouping. If this
parameter is used, the function will first subset the data by the unique
combination of by
variables, transpose each subset, and then
combine the result into a single output dataset. The by group variables
will be named on the output dataset with a generic name ("BY1", "BY2", etc.).
The copy
parameter is used to simply copy columns from the input
dataset to the transposed dataset. If necessary, these values will be
recycled or truncated to fit the number of output rows. Any input
variables not included in the var
, id
, or copy
parameter
will be dropped.
Once the transpose is complete, you may wish to filter the output data.
Filtering can be accomplished using the where
parameter. This parameter
takes an expression using the expression
function. The expression
is constructed using standard R logical operators. Variable names do
not need to be quoted.
The prefix
, delimiter
, and suffix
parameter are used
to control how generic column names are constructed. These parameters
are especially useful when there are multiple var
variables.
# Prepare data
dat <- data.frame(CAT = rownames(USPersonalExpenditure),
USPersonalExpenditure, stringsAsFactors = FALSE,
row.names = NULL)[1:4, ]
# View data
dat
# CAT X1940 X1945 X1950 X1955 X1960
# 1 Food and Tobacco 22.20 44.50 59.60 73.2 86.8
# 2 Household Operation 10.50 15.50 29.00 36.5 46.2
# 3 Medical and Health 3.53 5.76 9.71 14.0 21.1
# 4 Personal Care 1.04 1.98 2.45 3.4 5.4
# Default transpose
tdat1 <- proc_transpose(dat)
# View results
tdat1
# NAME COL1 COL2 COL3 COL4
# 1 X1940 22.2 10.5 3.53 1.04
# 2 X1945 44.5 15.5 5.76 1.98
# 3 X1950 59.6 29.0 9.71 2.45
# 4 X1955 73.2 36.5 14.00 3.40
# 5 X1960 86.8 46.2 21.10 5.40
# Transpose with ID and Name
tdat2 <- proc_transpose(dat, id = CAT, name = Year)
# View results
tdat2
# Year Food and Tobacco Household Operation Medical and Health Personal Care
# 1 X1940 22.2 10.5 3.53 1.04
# 2 X1945 44.5 15.5 5.76 1.98
# 3 X1950 59.6 29.0 9.71 2.45
# 4 X1955 73.2 36.5 14.00 3.40
# 5 X1960 86.8 46.2 21.10 5.40
# Transpose only some of the variables
tdat3 <- proc_transpose(dat, var = v(X1940, X1950, X1960), id = CAT, name = Year)
# View results
tdat3
# Year Food and Tobacco Household Operation Medical and Health Personal Care
# 1 X1940 22.2 10.5 3.53 1.04
# 2 X1950 59.6 29.0 9.71 2.45
# 3 X1960 86.8 46.2 21.10 5.40
# By with a where clause
tdat4 <- proc_transpose(dat, by = CAT, name = Year,
where = expression(Year %in% c("X1940", "X1950", "X1960")))
# View Results
tdat4
# CAT Year COL1
# 1 Food and Tobacco X1940 22.20
# 2 Food and Tobacco X1950 59.60
# 3 Food and Tobacco X1960 86.80
# 4 Household Operation X1940 10.50
# 5 Household Operation X1950 29.00
# 6 Household Operation X1960 46.20
# 7 Medical and Health X1940 3.53
# 8 Medical and Health X1950 9.71
# 9 Medical and Health X1960 21.10
# 10 Personal Care X1940 1.04
# 11 Personal Care X1950 2.45
# 12 Personal Care X1960 5.40