The proc_ttest function generates T-Test statistics for selected variables on the input dataset. The variables are identified on the var parameter or the paired parameter. The function will calculate a standard set of T-Test statistics. Results are displayed interactively in the viewer and returned from the function.
proc_ttest(
data,
var = NULL,
paired = NULL,
output = NULL,
by = NULL,
class = NULL,
options = NULL,
titles = NULL
)
data: The input data frame for which to calculate summary statistics. This parameter is required.
var: The variable or variables to be used for hypothesis testing. Pass the variable names in a quoted vector, or an unquoted vector using the v() function. If there is only one variable, it may be passed unquoted. If the class parameter is specified, the function will compare the two groups identified in the class variable. If the class parameter is not specified, enter the baseline hypothesis value on the "h0" option. The default "h0" value is zero (0).
paired: A vector of paired variables on which to perform a paired T-Test. Variables should be separated by a star (*). The entire string should be quoted, for example, paired = "var1 * var2". To test multiple pairs, place the pairs in a quoted vector: paired = c("var1 * var2", "var3 * var4"). The parameter does not accept parentheses, hyphens, or any other shortcut syntax.
output: Whether or not to return datasets from the function. Valid values are "out", "none", and "report". The default is "out", which produces dataset output specifically designed for programmatic use. The "none" option returns NULL instead of a dataset or list of datasets. The "report" keyword returns the datasets from the interactive report, which may differ from the standard output. The output parameter also accepts the data shaping keywords "long", "stacked", and "wide". These shaping keywords control the structure of the output data. See the Data Shaping section for additional details. Note that multiple output keywords may be passed in a character vector. For example, to produce both a report dataset and a "long" output dataset, use the parameter output = c("report", "out", "long").
by: An optional by group. If you specify a by group, the input data will be subset on the by variable(s) before any statistics are calculated.
class: The class parameter is used to perform an unpaired T-Test between two different groups of the same variable. For example, to test for a significant difference between a control group and a test group, where the rows of each group are identified by a variable "Group", pass class = "Group". Note that the class variable may contain only two distinct values. Also, the analysis is restricted to one class variable.
options: A vector of optional keywords. Valid values are: "alpha =", "h0 =", and "noprint". The "alpha = " option sets the alpha value for confidence limit statistics. The default is alpha = 0.05, which gives 95% confidence limits. The "h0 = " option sets the baseline hypothesis value for single-variable hypothesis testing. The "noprint" option turns off the interactive report.
titles: A vector of one or more titles to use for the report output.
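When there are many pairs to test, the strings for the paired parameter can be assembled programmatically instead of typed by hand. The following is a minimal sketch in base R. The column names are hypothetical and used only for illustration.

```r
# Hypothetical column names, used only for illustration
first  <- c("var1", "var3")
second <- c("var2", "var4")

# Build one "a * b" string per pair, in the format the paired parameter expects
pairs <- paste(first, second, sep = " * ")

pairs
# [1] "var1 * var2" "var3 * var4"
```

The resulting character vector has the same form as the quoted vector described for the paired parameter, so it should be usable as paired = pairs.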
Normally, the requested T-Test statistics are shown interactively in the viewer, and output results are returned as a list of data frames. You may then access individual datasets from the list using dollar sign ($) syntax. The interactive report can be turned off using the "noprint" option, and the output datasets can be turned off using the "none" keyword on the output parameter.
The proc_ttest function is for performing hypothesis testing. Data is passed in on the data parameter. The function can segregate data into groups using the by parameter. There are also options to determine whether and what results are returned.
The proc_ttest function allows for three types of analysis:
One Sample: The one sample test allows you to perform significance testing of a single variable against a known baseline value or null hypothesis. To perform this test, pass the variable name on the var parameter and the baseline on the h0= option. The one sample T-Test performs a classic Student's T-Test and assumes your data has a normal distribution.
Paired Comparison: The paired comparison is for tests of two variables with a natural pairing and the same number of observations for both measures, for instance, when checking for a change in blood pressure in the same group of patients at different time points. To perform a paired comparison, use the paired parameter with the two variables separated by a star (*). The paired T-Test performs a classic Student's T-Test and assumes your data has a normal distribution.
Two Independent Samples: The analysis of two independent samples is used when there is no natural pairing, and there may be a different number of observations in each group. This method is used, for example, if you are comparing the effectiveness of a treatment between two different groups of patients. The function assumes that there is a single variable that contains the analysis values for both groups, and another variable to identify the groups. To perform this analysis, pass the target variable name on the var parameter, and the grouping variable on the class parameter. The Two Sample T-Test provides both a Student's (pooled) T-Test and a Welch-Satterthwaite T-Test. Select the appropriate T-Test results based on whether the variances of the two groups can be assumed equal.
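The three analysis types can be cross-checked with the t.test function from base R. The sketch below is not how proc_ttest is implemented. It only runs the corresponding standard tests on the built-in sleep data so that the T, DF, and PROBT values can be compared against the "TTests" datasets in the examples.

```r
# Split the built-in 'sleep' data into its two groups
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# One sample: test the mean of g1 against a baseline of 0
one <- t.test(g1, mu = 0)

# Paired comparison: differences g2 - g1 for the same ten subjects
prd <- t.test(g2, g1, paired = TRUE)

# Two independent samples: pooled (Student) and Welch-Satterthwaite
pooled <- t.test(g1, g2, var.equal = TRUE)
welch  <- t.test(g1, g2, var.equal = FALSE)

# Collect the t statistic, degrees of freedom, and p-value of each test
tests <- list(one, prd, pooled, welch)
cmp <- data.frame(
  TEST  = c("One sample", "Paired", "Pooled", "Satterthwaite"),
  T     = sapply(tests, function(x) unname(x$statistic)),
  DF    = sapply(tests, function(x) unname(x$parameter)),
  PROBT = sapply(tests, function(x) x$p.value),
  stringsAsFactors = FALSE
)

cmp
```

The pooled and Satterthwaite rows share the same t statistic here because both groups have ten observations. They differ only in degrees of freedom and p-value.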
By default, proc_ttest results will be sent to the viewer as an HTML report. This functionality makes it easy to get a quick analysis of your data. To turn off the interactive report, pass the "noprint" keyword to the options parameter.
The titles parameter allows you to set one or more titles for your report. Pass these titles as a vector of strings.
The exact datasets used for the interactive report can be returned as a list. To return these datasets, pass the "report" keyword on the output parameter. This list may in turn be passed to proc_print to write the report to a file.
Dataset results are also returned from the function by default. proc_ttest typically returns multiple datasets in a list. Each dataset will be named according to the category of statistical results. There are three standard categories: "Statistics", "ConfLimits", and "TTests". For the class style analysis, the function also returns a dataset called "Equality" that shows the Folded F analysis.
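The Folded F values in the "Equality" dataset can be reproduced in base R. This sketch assumes the conventional definition of the folded F test (larger sample variance divided by the smaller, with a two-sided p-value). The source does not spell out how the function computes it, so treat this as a cross-check and not as the package's implementation.

```r
# Split the built-in 'sleep' data into its two groups
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

vars <- c(var(g1), var(g2))
n    <- c(length(g1), length(g2))

# Folded F: larger variance over smaller variance
big   <- which.max(vars)
small <- which.min(vars)
fval  <- vars[big] / vars[small]
ndf   <- n[big] - 1    # numerator degrees of freedom
ddf   <- n[small] - 1  # denominator degrees of freedom

# Two-sided p-value: twice the upper tail probability, capped at 1
probf <- min(1, 2 * pf(fval, ndf, ddf, lower.tail = FALSE))

data.frame(NDF = ndf, DDF = ddf, FVAL = fval, PROBF = probf)
```

A large PROBF, as in this data, gives no evidence against equal variances, which supports reading the pooled row of the "TTests" dataset.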
The output datasets generated are optimized for data manipulation. The column names have been standardized, and additional variables may be present to help with data manipulation. For example, the by variable will always be named "BY". In addition, data values in the output datasets are intentionally not rounded or formatted to give you the most accurate numeric results.
The proc_ttest function recognizes the following options. Options may be passed as a quoted vector of strings, or an unquoted vector using the v() function.
alpha = : The "alpha = " option will set the alpha value for confidence limit statistics. Set the alpha as a decimal value between 0 and 1. For example, you can set a 90% confidence limit as alpha = 0.1.
h0 = : The "h0 =" option is used to set the baseline mean value for testing a single variable. Pass the option as a name/value pair, such as h0 = 95.
noprint: Whether to print the interactive report to the viewer. By default, the report is printed to the viewer. The "noprint" option will inhibit printing. You may inhibit printing globally by setting the package print option to false: options("procs.print" = FALSE).
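To make the effect of the alpha option concrete, the following base R sketch computes 90% confidence limits (alpha = 0.1) for the first group of the sleep data. It assumes the standard t-based limits for the mean and equal-tailed chi-square limits for the standard deviation. The variable names mirror the LCLM, UCLM, LCLMSTD, and UCLMSTD columns of the "ConfLimits" dataset but are otherwise illustrative.

```r
# First group of the built-in 'sleep' data
x     <- sleep$extra[sleep$group == 1]
alpha <- 0.1

n  <- length(x)
m  <- mean(x)
s  <- sd(x)
se <- s / sqrt(n)

# Confidence limits for the mean, based on the t distribution
tcrit <- qt(1 - alpha / 2, df = n - 1)
lclm  <- m - tcrit * se
uclm  <- m + tcrit * se

# Equal-tailed confidence limits for the standard deviation,
# based on the chi-square distribution
lclmstd <- sqrt((n - 1) * s^2 / qchisq(1 - alpha / 2, df = n - 1))
uclmstd <- sqrt((n - 1) * s^2 / qchisq(alpha / 2, df = n - 1))

c(LCLM = lclm, UCLM = uclm, LCLMSTD = lclmstd, UCLMSTD = uclmstd)
```

A smaller alpha widens both intervals, and a larger alpha narrows them.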
The output datasets produced by the function can be shaped in different ways. These shaping options allow you to decide whether the data should be returned long and skinny, or short and wide. The shaping options can reduce the amount of data manipulation necessary to get the data into the desired form. The shaping options are as follows:
long: Transposes the output datasets so that statistics are in rows and variables are in columns.
stacked: Requests that output datasets be returned in "stacked" form, such that both statistics and variables are in rows.
wide: Requests that output datasets be returned in "wide" form, such that statistics are across the top in columns, and variables are in rows. This shaping option is the default.
These shaping options are passed on the output parameter. For example, to return the data in "long" form, use output = "long".
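The three shapes can be pictured with a small hand-built data frame. This base R sketch only illustrates the layouts described above and does not call proc_ttest. The STAT column follows the "long" output shown in the examples, while the VAR, STAT, and VALUE column names used for the stacked form are assumptions made for illustration and may not match the names the function actually returns.

```r
# Wide form: one row per variable, one column per statistic
wide <- data.frame(VAR = "extra", N = 20, MEAN = 1.54, STD = 2.0179197,
                   stringsAsFactors = FALSE)

stat_names  <- names(wide)[-1]
stat_values <- unlist(wide[1, -1], use.names = FALSE)

# Long form: one row per statistic, one column per variable
long <- data.frame(STAT = stat_names, extra = stat_values,
                   stringsAsFactors = FALSE)

# Stacked form: both variables and statistics in rows
# (column names here are hypothetical)
stacked <- data.frame(VAR = wide$VAR, STAT = stat_names, VALUE = stat_values,
                      stringsAsFactors = FALSE)

long
stacked
```

With several analysis variables, the long form gains one column per variable, while the stacked form gains rows.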
# Turn off printing for CRAN checks
options("procs.print" = FALSE)
# Prepare sample data
dat1 <- subset(sleep, group == 1, c("ID", "extra"))
dat2 <- subset(sleep, group == 2, c("ID", "extra"))
dat <- data.frame(ID = dat1$ID, group1 = dat1$extra, group2 = dat2$extra)
# View sample data
dat
# ID group1 group2
# 1 1 0.7 1.9
# 2 2 -1.6 0.8
# 3 3 -0.2 1.1
# 4 4 -1.2 0.1
# 5 5 -0.1 -0.1
# 6 6 3.4 4.4
# 7 7 3.7 5.5
# 8 8 0.8 1.6
# 9 9 0.0 4.6
# 10 10 2.0 3.4
# Example 1: T-Test using h0 option
res1 <- proc_ttest(dat, var = "group1", options = c("h0" = 0))
# View results
res1
# $Statistics
# VAR N MEAN STD STDERR MIN MAX
# 1 group1 10 0.75 1.78901 0.5657345 -1.6 3.7
#
# $ConfLimits
# VAR MEAN LCLM UCLM STD
# 1 group1 0.75 -0.5297804 2.02978 1.78901
#
# $TTests
# VAR DF T PROBT
# 1 group1 9 1.32571 0.2175978
# Example 2: T-Test using paired parameter
res2 <- proc_ttest(dat, paired = "group2 * group1")
# View results
res2
# $Statistics
# VAR1 VAR2 DIFF N MEAN STD STDERR MIN MAX
# 1 group2 group1 group2-group1 10 1.58 1.229995 0.3889587 0 4.6
#
# $ConfLimits
# VAR1 VAR2 DIFF MEAN LCLM UCLM STD LCLMSTD UCLMSTD
# 1 group2 group1 group2-group1 1.58 0.7001142 2.459886 1.229995 0.8460342 2.245492
#
# $TTests
# VAR1 VAR2 DIFF DF T PROBT
# 1 group2 group1 group2-group1 9 4.062128 0.00283289
# Example 3: T-Test using class parameter
res3 <- proc_ttest(sleep, var = "extra", class = "group")
# View results
res3
# $Statistics
# VAR CLASS METHOD N MEAN STD STDERR MIN MAX
# 1 extra 1 <NA> 10 0.75 1.789010 0.5657345 -1.6 3.7
# 2 extra 2 <NA> 10 2.33 2.002249 0.6331666 -0.1 5.5
# 3 extra Diff (1-2) Pooled NA -1.58 NA 0.8490910 NA NA
# 4 extra Diff (1-2) Satterthwaite NA -1.58 NA 0.8490910 NA NA
#
# $ConfLimits
# VAR CLASS METHOD MEAN LCLM UCLM STD LCLMSTD UCLMSTD
# 1 extra 1 <NA> 0.75 -0.5297804 2.0297804 1.789010 1.230544 3.266034
# 2 extra 2 <NA> 2.33 0.8976775 3.7623225 2.002249 1.377217 3.655326
# 3 extra Diff (1-2) Pooled -1.58 -3.3638740 0.2038740 NA NA NA
# 4 extra Diff (1-2) Satterthwaite -1.58 -3.3654832 0.2054832 NA NA NA
#
# $TTests
# VAR METHOD VARIANCES DF T PROBT
# 1 extra Pooled Equal 18.00000 -1.860813 0.07918671
# 2 extra Satterthwaite Unequal 17.77647 -1.860813 0.07939414
#
# $Equality
# VAR METHOD NDF DDF FVAL PROBF
# 1 extra Folded F 9 9 1.252595 0.7427199
# Example 4: T-Test using alpha option and by variable
res4 <- proc_ttest(sleep, var = "extra", by = "group", options = c(alpha = 0.1))
# View results
res4
# $Statistics
# BY VAR N MEAN STD STDERR MIN MAX
# 1 1 extra 10 0.75 1.789010 0.5657345 -1.6 3.7
# 2 2 extra 10 2.33 2.002249 0.6331666 -0.1 5.5
#
# $ConfLimits
# BY VAR MEAN LCLM UCLM STD LCLMSTD UCLMSTD
# 1 1 extra 0.75 -0.2870553 1.787055 1.789010 1.304809 2.943274
# 2 2 extra 2.33 1.1693340 3.490666 2.002249 1.460334 3.294095
#
# $TTests
# BY VAR DF T PROBT
# 1 1 extra 9 1.325710 0.217597780
# 2 2 extra 9 3.679916 0.005076133
# Example 5: Single variable T-Test using "long" shaping option
res5 <- proc_ttest(sleep, var = "extra", output = "long")
# View results
res5
# $Statistics
# STAT extra
# 1 N 20.0000000
# 2 MEAN 1.5400000
# 3 STD 2.0179197
# 4 STDERR 0.4512206
# 5 MIN -1.6000000
# 6 MAX 5.5000000
#
# $ConfLimits
# STAT extra
# 1 MEAN 1.5400000
# 2 LCLM 0.5955845
# 3 UCLM 2.4844155
# 4 STD 2.0179197
# 5 LCLMSTD 1.5346086
# 6 UCLMSTD 2.9473163
#
# $TTests
# STAT extra
# 1 DF 19.00000000
# 2 T 3.41296500
# 3 PROBT 0.00291762