The proc_sort function sorts a dataset according to
the variables passed on the by parameter. If no parameters are
passed on the by parameter, it will sort by all variables. The
direction of the sort is controlled with the order parameter.
Use the nodupkey option to eliminate duplicate rows from
the dataset, and the keep parameter to subset columns.
The parameters will accept either quoted or unquoted values.
proc_sort(
data,
by = NULL,
keep = NULL,
order = "ascending",
options = NULL,
as.character = FALSE,
na.sort = NULL,
where = NULL
)The input data to sort.
A vector of variables to sort by.
A vector of variables on the output data to keep. All other variables will be dropped.
The sort order of the variables on the by parameter. Valid values are 'ascending' or 'descending'. These values may also be abbreviated to 'asc', 'desc', 'a', or 'd'. You may pass a vector of order values equal to the number of variables on the by parameter. Default is 'ascending' for all by variables.
Any options desired for the sort. Available options are 'dupkey' and 'nodupkey'. The 'nodupkey' option removes duplicate rows from the sorted dataset. The 'dupkey' option removes unique rows from the sorted dataset.
If TRUE, will cast any factors in the 'by' parameter to character. Default is FALSE. This parameter is included because it is common to use factors for sorting in R, but you may not want to keep the variable as a factor. This parameter therefore allows you to use the factor for the sort, but then convert back to a character once the sort is complete.
An option that determines how to sort NA values. Valid
values are NULL, "first", "last", and "sas". Values may be quoted or
unquoted. The "sas"
value will sort NAs first for ascending keys, and last for descending
keys. The NA sort style may be controlled globally with
options("procs.na.sort" = <value>). Any values passed
on the local parameter will override the global setting. The default
value is NULL, which defers to the global setting. If the global setting
is NULL, the parameter will default to "last".
An expression to filter the rows before the sort takes place.
Use the expression function to define the where clause.
The sorted dataset. If a data frame was input, a data frame will be output. If a tibble was input, a tibble will be output.
Below are the available options for the proc_sort function:
dupkey: This option keeps duplicate rows and discards unique rows. Duplicate rows will be identified by the key variables listed on the by parameter if passed. This option is the opposite of 'nodupkey'.
nodupkey: Removes duplicate rows following the sort. Duplicate rows will be identified by the key variables listed on the by parameter if passed. Otherwise, the function will dedupe on all variables returned.
Missing values are handled in R differently than they are handled in SAS. In R, missing values (NA) are sorted last, and most sorting functions in R give you an option to sort it first if desired.
SAS, on the other hand, considers a missing as the smallest value. If the sort is ascending, it will sort the missing values first. If the sort is descending, it will sort the missing values last.
The na.sort parameter on proc_sort gives you a choice of
how NAs are sorted. By default, the function sorts in the standard R
manner. If you are trying to replicate a SAS sort, set the
value to "sas".
Note that this parameter only sorts NA values in a manner similar to SAS. It does not guarantee that the entire sort will match SAS. There can still be differences in the sort caused by formats, character set mismatch, or other reasons.
If you want to always use the SAS style sort, you can set it as a global
option, like this: options("procs.na.sort" = "sas").
# Prepare data subset
dat <- data.frame(HairEyeColor, stringsAsFactors = FALSE)[1:32 %% 4 == 1, ]
# View data
dat
# Hair Eye Sex Freq
# 1 Black Brown Male 32
# 5 Black Blue Male 11
# 9 Black Hazel Male 10
# 13 Black Green Male 3
# 17 Black Brown Female 36
# 21 Black Blue Female 9
# 25 Black Hazel Female 5
# 29 Black Green Female 2
# Sort by Frequency
res1 <- proc_sort(dat, by = Freq)
# View results
res1
# Hair Eye Sex Freq
# 29 Black Green Female 2
# 13 Black Green Male 3
# 25 Black Hazel Female 5
# 21 Black Blue Female 9
# 9 Black Hazel Male 10
# 5 Black Blue Male 11
# 1 Black Brown Male 32
# 17 Black Brown Female 36
# Sort by Frequency descending
res2 <- proc_sort(dat, by = Freq, order = d)
# View results
res2
# Hair Eye Sex Freq
# 17 Black Brown Female 36
# 1 Black Brown Male 32
# 5 Black Blue Male 11
# 9 Black Hazel Male 10
# 21 Black Blue Female 9
# 25 Black Hazel Female 5
# 13 Black Green Male 3
# 29 Black Green Female 2
# Get unique combinations of Eye and Sex
res3 <- proc_sort(dat, keep = v(Eye, Sex), options = nodupkey)
# View results
res3
# Eye Sex
# 1 Brown Male
# 17 Brown Female
# 5 Blue Male
# 21 Blue Female
# 9 Hazel Male
# 25 Hazel Female
# 13 Green Male
# 29 Green Female