The proc_sort function sorts a dataset according to the variables passed on the by parameter. If no parameters are passed on the by parameter, it will sort by all variables. The direction of the sort is controlled with the order parameter. Use the nodupkey option to eliminate duplicate rows from the dataset, and the keep parameter to subset columns. The parameters will accept either quoted or unquoted values.

proc_sort(
  data,
  by = NULL,
  keep = NULL,
  order = "ascending",
  options = NULL,
  as.character = FALSE
)

Arguments

data

The input data to sort.

by

A vector of variables to sort by.

keep

A vector of variables on the output data to keep. All other variables will be dropped.

order

The sort order of the variables on the by parameter. Valid values are 'ascending' or 'descending'. These values may also be abbreviated to 'asc', 'desc', 'a', or 'd'. You may pass a vector of order values equal to the number of variables on the by parameter. Default is 'ascending' for all by variables.

options

Any options desired for the sort. Available options are 'dupkey' and 'nodupkey'. The 'nodupkey' option removes duplicate rows from the sorted dataset. The 'dupkey' option removes unique rows from the sorted dataset.

as.character

If TRUE, will cast any factors in the 'by' parameter to character. Default is FALSE. This parameter is included because it is common to use factors for sorting in R, but you may not want to keep the variable as a factor. This parameter therefore allows you to use the factor for the sort, but then convert back to a character once the sort is complete.

Value

The sorted dataset. If a data frame was input, a data frame will be output. If a tibble was input, a tibble will be output.

Options

Below are the available options for the proc_sort function:

  • dupkey: This option keeps duplicate rows and discards unique rows. Duplicate rows will be identified by the key variables listed on the by parameter if passed. This option is the opposite of 'nodupkey'.

  • nodupkey: Removes duplicate rows following the sort. Duplicate rows will be identified by the key variables listed on the by parameter if passed. Otherwise, the function will dedupe on all variables returned.

Examples

# Prepare data subset
dat <- data.frame(HairEyeColor, stringsAsFactors = FALSE)[1:32 %% 4 == 1, ]

# View data
dat
#     Hair   Eye    Sex Freq
# 1  Black Brown   Male   32
# 5  Black  Blue   Male   11
# 9  Black Hazel   Male   10
# 13 Black Green   Male    3
# 17 Black Brown Female   36
# 21 Black  Blue Female    9
# 25 Black Hazel Female    5
# 29 Black Green Female    2

# Sort by Frequency
res1 <- proc_sort(dat, by = Freq)

# View results
res1
#    Hair   Eye    Sex Freq
# 29 Black Green Female    2
# 13 Black Green   Male    3
# 25 Black Hazel Female    5
# 21 Black  Blue Female    9
# 9  Black Hazel   Male   10
# 5  Black  Blue   Male   11
# 1  Black Brown   Male   32
# 17 Black Brown Female   36

# Sort by Frequency descending
res2 <- proc_sort(dat, by = Freq, order = d)

# View results
res2
#     Hair   Eye    Sex Freq
# 17 Black Brown Female   36
# 1  Black Brown   Male   32
# 5  Black  Blue   Male   11
# 9  Black Hazel   Male   10
# 21 Black  Blue Female    9
# 25 Black Hazel Female    5
# 13 Black Green   Male    3
# 29 Black Green Female    2

# Get unique combinations of Eye and Sex
res3 <- proc_sort(dat, keep = v(Eye, Sex), options = nodupkey)

# View results
res3
#      Eye    Sex
# 1  Brown   Male
# 17 Brown Female
# 5   Blue   Male
# 21  Blue Female
# 9  Hazel   Male
# 25 Hazel Female
# 13 Green   Male
# 29 Green Female