The proc_sort function sorts a dataset according to the variables passed on the by parameter. If no variables are passed on the by parameter, it will sort by all variables. The direction of the sort is controlled with the order parameter.
Use the nodupkey option to eliminate duplicate rows from the dataset, and the keep parameter to subset columns.
The parameters will accept either quoted or unquoted values.
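As a minimal sketch of what this means in practice, the two calls below are expected to be interchangeable. The sketch assumes that proc_sort is loaded from the procs package, which the text above does not state, and it uses the same HairEyeColor subset as the examples on this page.

```r
library(procs)  # assumption: proc_sort is provided by the procs package

# Same data subset used in the examples on this page
dat <- data.frame(HairEyeColor, stringsAsFactors = FALSE)[1:32 %% 4 == 1, ]

# Unquoted variable name and order value
res_unquoted <- proc_sort(dat, by = Freq, order = desc)

# The same request with quoted values
res_quoted <- proc_sort(dat, by = "Freq", order = "desc")

# Per the statement above, both forms should give the same result
identical(res_unquoted, res_quoted)
```

Quoted values are the safer choice when the variable names are held in another object or built programmatically.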
proc_sort(
data,
by = NULL,
keep = NULL,
order = "ascending",
options = NULL,
as.character = FALSE
)
data: The input data to sort.
by: A vector of variables to sort by.
keep: A vector of variables on the output data to keep. All other variables will be dropped.
order: The sort order of the variables on the by parameter. Valid values are 'ascending' or 'descending'. These values may also be abbreviated to 'asc', 'desc', 'a', or 'd'. You may pass a vector of order values equal to the number of variables on the by parameter. Default is 'ascending' for all by variables.
options: Any options desired for the sort. Available options are 'dupkey' and 'nodupkey'. The 'nodupkey' option removes duplicate rows from the sorted dataset. The 'dupkey' option removes unique rows from the sorted dataset.
as.character: If TRUE, will cast any factors in the 'by' parameter to character. Default is FALSE. This parameter is included because it is common to use factors for sorting in R, but you may not want to keep the variable as a factor. This parameter therefore allows you to use the factor for the sort, but then convert back to a character once the sort is complete.
The sorted dataset. If a data frame was input, a data frame will be output. If a tibble was input, a tibble will be output.
Below are the available options for the proc_sort function:
dupkey: This option keeps duplicate rows and discards unique rows. Duplicate rows will be identified by the key variables listed on the by parameter if passed. This option is the opposite of 'nodupkey'.
nodupkey: Removes duplicate rows following the sort. Duplicate rows will be identified by the key variables listed on the by parameter if passed. Otherwise, the function will dedupe on all variables returned.
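The difference between the two options can be illustrated with a base R sketch that does not call proc_sort at all. It is not the function's implementation. The toy data frame and all object names are hypothetical, and the sketch assumes that 'dupkey' keeps every row whose key occurs more than once; the descriptions above do not say whether the first occurrence is retained.

```r
# Hypothetical toy data: only the (Brown, Male) key occurs twice
toy <- data.frame(
  Eye  = c("Brown", "Hazel", "Brown", "Blue"),
  Sex  = c("Male", "Female", "Male", "Male"),
  Freq = c(32, 5, 53, 11),
  stringsAsFactors = FALSE
)

by_vars <- c("Eye", "Sex")

# Sort ascending by the key variables
sorted <- toy[order(toy$Eye, toy$Sex), ]

# TRUE for the second and later occurrences of a key
later_dup <- duplicated(sorted[by_vars])

# TRUE for every row whose key occurs more than once
any_dup <- later_dup | duplicated(sorted[by_vars], fromLast = TRUE)

# nodupkey-like result: one row per key (first occurrence kept)
nodup_like <- sorted[!later_dup, ]

# dupkey-like result (assumed reading): unique-key rows are discarded
dup_like <- sorted[any_dup, ]
```

Under this reading, nodup_like holds one row for each of the three distinct keys, and dup_like holds only the two Brown/Male rows.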
# Load the package (assumed here to be procs, which provides proc_sort)
library(procs)

# Prepare data subset
dat <- data.frame(HairEyeColor, stringsAsFactors = FALSE)[1:32 %% 4 == 1, ]
# View data
dat
#     Hair   Eye    Sex Freq
# 1  Black Brown   Male   32
# 5  Black  Blue   Male   11
# 9  Black Hazel   Male   10
# 13 Black Green   Male    3
# 17 Black Brown Female   36
# 21 Black  Blue Female    9
# 25 Black Hazel Female    5
# 29 Black Green Female    2
# Sort by Frequency
res1 <- proc_sort(dat, by = Freq)
# View results
res1
#     Hair   Eye    Sex Freq
# 29 Black Green Female    2
# 13 Black Green   Male    3
# 25 Black Hazel Female    5
# 21 Black  Blue Female    9
# 9  Black Hazel   Male   10
# 5  Black  Blue   Male   11
# 1  Black Brown   Male   32
# 17 Black Brown Female   36
# Sort by Frequency descending
res2 <- proc_sort(dat, by = Freq, order = d)
# View results
res2
#     Hair   Eye    Sex Freq
# 17 Black Brown Female   36
# 1  Black Brown   Male   32
# 5  Black  Blue   Male   11
# 9  Black Hazel   Male   10
# 21 Black  Blue Female    9
# 25 Black Hazel Female    5
# 13 Black Green   Male    3
# 29 Black Green Female    2
# Get unique combinations of Eye and Sex
res3 <- proc_sort(dat, keep = v(Eye, Sex), options = nodupkey)
# View results
res3
#      Eye    Sex
# 1  Brown   Male
# 17 Brown Female
# 5   Blue   Male
# 21  Blue Female
# 9  Hazel   Male
# 25 Hazel Female
# 13 Green   Male
# 29 Green Female
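The examples above each sort on a single variable. Since the order parameter is described as accepting one value per by variable, a further sketch with two keys and mixed directions might look like the following. It assumes the function comes from the procs package and that the v() helper used in the previous example is available once that package is loaded. The output is not shown because it has not been run here.

```r
library(procs)  # assumption: proc_sort is provided by the procs package

# Same data subset as in the examples above
dat <- data.frame(HairEyeColor, stringsAsFactors = FALSE)[1:32 %% 4 == 1, ]

# Sort by Sex ascending, then by Freq descending within each Sex
res4 <- proc_sort(dat, by = v(Sex, Freq), order = v(a, d))

# View results
res4
```

The order vector is matched to the by variables by position, so its length should equal the number of by variables.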