Request Regression Plots

A function to request regression plots on a call to proc_reg. It lets you specify which types of regression plots to produce. By default, it produces a combined diagnostics panel, a residuals plot, and a fit plot. You may also request individual plots by passing a vector of plot names to the "type" parameter.

regplot(
  type = c("diagnostics", "residuals", "fitplot"),
  panel = TRUE,
  stats = "default",
  label = FALSE,
  id = NULL
)

Arguments

type: The type(s) of plot to create. Multiple types should be passed as a vector of strings. Valid values are "diagnostics", "cooksd", "dfbetas", "dffits", "fitplot", "observedbypredicted", "qqplot", "residuals", "residualboxplot", "residualbypredicted", "residualhistogram", "rfplot", "rstudentbyleverage", and "rstudentbypredicted". The default value is a vector with "diagnostics", "residuals", and "fitplot". The "diagnostics" keyword produces a single combined chart with 8 different plots and a selection of statistics in a small table. The statistics can be controlled by the stats parameter. You may also pass the "all" keyword to produce all charts available for the analysis.
panel: Whether or not to display the diagnostics plots combined into a single panel. Default is TRUE. A value of FALSE will create individual plots instead. Setting this parameter to FALSE is equivalent to the "unpack" keyword in SAS.
stats: The statistics to display on the diagnostics panel or fit plot. Valid values are: "adjrsq", "aic", "coeffvar", "depmean", "default", "edf", "mse", "nobs", "nparm", "rsquare", and "sse". The default value is "default", which produces the following statistics: "nobs", "nparm", "edf", "mse", "rsquare", and "adjrsq". You may also pass the value "none" if you do not want any statistics shown on the chart. In the case of the diagnostics panel, if the statistics table is removed, it will be replaced with a residual box plot.
label: Whether or not to label outlier values. Valid values are TRUE or FALSE. Default is FALSE. If TRUE, this option will assign labels to outlier values on some charts. Charts that support labels are as follows: "diagnostics", "cooksd", "dffits", "dfbetas", "rstudentbypredicted", and "rstudentbyleverage".
id: If the label parameter is TRUE, this parameter determines which value is assigned to the label. By default, the row number will be assigned. You may also assign a column name from the input dataset to this parameter to use as the label value.

Details

There are many types of regression plots, each with a different use. Some help you assess normality of the data, while others help you assess the quality of the model. You can select the types of plots you want by passing a vector of plot names to the type parameter.

Here is a list of possible plot types and a short description of each:

diagnostics: A fit diagnostics panel that contains 8 different types of plots and a table of statistics.
cooksd: Cook’s D statistic vs. Observation number.
dfbetas: Displays the influence of each observation for each coefficient in the model.
dffits: Displays the influence of each observation on fitted values.
fitplot: Produces a scatter plot of the dependent variable against the regressor, including the fitted line and confidence/prediction bands. This is only available for models with a single regressor.
observedbypredicted: Dependent variable (Observed) vs. Predicted values.
qqplot: Normal Quantile-Quantile (Q-Q) plot of residuals.
residuals: Produces a panel of residual plots against each independent variable in the model.
residualboxplot: A box plot of residual values.
residualbypredicted: Residuals vs. Predicted values.
residualhistogram: Histogram of residuals, with a normal and kernel curve overlay.
rfplot: Residual-Fit (RF) spread plot.
rstudentbyleverage: Externally Studentized Residuals vs. Leverage.
rstudentbypredicted: Externally Studentized Residuals (RStudent) vs. Predicted values.

If possible, the statistics from the report tabular output are used for the plots. Otherwise, additional statistics functions are called to produce the needed values.
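As a rough illustration of what several of the plot types above display, the following base R sketch computes the underlying quantities from an lm() fit. It does not call proc_reg. It is an assumption that these base R functions correspond to the values the package plots, and the variable names are purely illustrative.

```r
# Base R sketch (not the procs implementation): quantities that several
# of the plot types above are built from, computed from an lm() fit.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

diag_vals <- data.frame(
  obs       = seq_len(nrow(iris)),         # observation number
  observed  = iris$Sepal.Length,           # dependent variable
  predicted = unname(fitted(fit)),         # predicted values
  residual  = unname(residuals(fit)),      # raw residuals
  rstudent  = unname(rstudent(fit)),       # externally studentized residuals
  leverage  = unname(hatvalues(fit)),      # leverage (hat values)
  cooksd    = unname(cooks.distance(fit)), # Cook's D
  dffits    = unname(dffits(fit))          # influence on fitted values
)

# dfbetas has one column per coefficient (intercept and slope here)
dfb <- dfbetas(fit)

head(diag_vals)
```

Plotting cooksd against obs, or rstudent against leverage, gives charts comparable in content to the "cooksd" and "rstudentbyleverage" types.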

Plot Statistics

The diagnostics panel and the fit plot each contain a small table of statistics. This table is customizable. The following keywords may be used to customize the statistics table:

adjrsq: Adjusted R-square.
aic: Akaike's information criterion.
coeffvar: Coefficient of variation.
depmean: Mean of dependent.
default: A set of default statistics.
edf: Error degrees of freedom.
mse: Mean squared error.
nobs: Number of observations used.
nparm: Number of parameters in the model (including the intercept).
rsquare: The R-square statistic.
sse: Error sum of squares.

To use these keywords, pass them as a vector to the stats parameter on the regplot function.
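To make the keywords above concrete, here is a minimal base R sketch that computes most of them by hand for a single-regressor model. It is not the package's code, and the variable names are illustrative. The "aic" keyword is left out because implementations define it differently, and which definition the package uses is not stated here. The "coeffvar" formula (100 times root MSE divided by the dependent mean) is the conventional definition and is assumed to apply.

```r
# Base R sketch: hand-computed versions of the statistics keywords.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

y   <- iris$Sepal.Length
res <- residuals(fit)

n   <- length(res)           # nobs: observations used
p   <- length(coef(fit))     # nparm: parameters, including the intercept
edf <- n - p                 # edf: error degrees of freedom
sse <- sum(res^2)            # sse: error sum of squares
mse <- sse / edf             # mse: mean squared error
sst <- sum((y - mean(y))^2)  # corrected total sum of squares
rsq <- 1 - sse / sst         # rsquare

plot_stats <- c(
  nobs     = n,
  nparm    = p,
  edf      = edf,
  sse      = sse,
  mse      = mse,
  rsquare  = rsq,
  adjrsq   = 1 - (1 - rsq) * (n - 1) / edf,  # adjusted R-square
  depmean  = mean(y),                        # mean of dependent
  coeffvar = 100 * sqrt(mse) / mean(y)       # assumed definition
)

round(plot_stats, 4)
```

The same keyword names are what you would pass on the stats parameter, for example stats = c("nobs", "mse", "rsquare") as in Example 5 below.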

Labeling Outliers

Some types of plots can be used to identify outliers in the data. For instance, the Cook's D chart is excellent for such a task. When identifying outliers, it is helpful to have them labelled on the chart. The labels make it possible to trace the outliers back to the source data.

To get a basic row/observation label, set the label parameter to TRUE. If there is a column in the data that can be used to identify an individual record, pass that column name on the id parameter. The values from that column will then be used as outlier labels on those charts that support labels.
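The difference between a row-number label and an id-column label can be sketched in base R. The example below is an illustration only: it uses mtcars with a hypothetical "car" column built from the row names, and it flags points with the common 4/n rule of thumb for Cook's D. The cutoff the package actually uses to decide which points are outliers is not stated here, so treat 4/n as an assumption.

```r
# Base R sketch: tracing influential points back to the source data.
# The "car" column is a hypothetical identifier built for this example.
dat <- data.frame(car = rownames(mtcars), mtcars, row.names = NULL)

fit <- lm(mpg ~ wt, data = dat)
cd  <- cooks.distance(fit)

cutoff  <- 4 / nrow(dat)               # assumed rule-of-thumb cutoff
flagged <- unname(which(cd > cutoff))  # row positions of flagged points

row_labels <- as.character(flagged)    # analogous to label = TRUE alone
id_labels  <- dat$car[flagged]         # analogous to also supplying id

data.frame(row    = row_labels,
           car    = id_labels,
           cooksd = round(unname(cd[flagged]), 3))
```

A row number is enough when the data will not be re-sorted, while an identifying column stays meaningful after the data is filtered or reordered.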

Additional Information

Regression plots will be displayed on interactive reports only. Plots are created as jpeg files, and stored in a temp directory. Those temporary files are then referenced by the interactive report to display the graphic.

If desired, you may output the report objects and pass them to proc_print. To do this, set output = report on the call to proc_reg, and pass the entire resulting list to proc_print.

Discrepancies with SAS

The histogram binning algorithm in R is different from the binning algorithm in SAS®. R uses the "Sturges" algorithm, which more accurately represents the distribution of the data. As a result, the R chart may show a different number of bins and different bar heights than the corresponding SAS chart.
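For reference, the Sturges rule can be inspected directly in base R. The sketch below applies it to the residuals of a simple model. It assumes the package's histogram relies on the same default that hist() uses, which is not confirmed here. Note that hist() treats the Sturges count as a suggestion and rounds the breakpoints to "pretty" values, so the final number of bins can differ from the suggested count.

```r
# Base R sketch: the Sturges rule and the bins hist() actually produces.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
x   <- residuals(fit)

# Sturges' formula: ceiling(log2(n) + 1); with n = 150 this is 9
k <- nclass.Sturges(x)

# Compute the histogram without drawing it
h <- hist(x, breaks = "Sturges", plot = FALSE)

k                 # suggested number of bins
length(h$counts)  # number of bins actually used
h$breaks          # breakpoints after rounding to "pretty" values
```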

Plots generated by proc_reg may also have some spacing, color, and shape discrepancies with the corresponding SAS charts. The information conveyed is expected to be similar.

Examples

library(procs)

# Turn off printing for CRAN checks
# Set to TRUE to run in local environment
options("procs.print" = FALSE)


# Example 1: Regression statistics with default plots
res <- proc_reg(iris, model = "Sepal.Length = Petal.Length",
                 output = report,
                 plots = TRUE,
                 titles = "Iris Regression Statistics")

# View results
res

# Example 2: Regression statistics with custom plot request and by variable
res <- proc_reg(iris, model = "Sepal.Length = Petal.Length",
                 output = report,
                 by = Species,
                 plots = regplot(type = v(residualhistogram,
                                 rstudentbypredicted, rstudentbyleverage),
                                 label = TRUE),
                 titles = "Iris Regression Statistics")

# View results
res

# Example 3: Regression statistics with multiple models, same plot string
res <- proc_reg(iris, model = c("Sepal.Length = Petal.Length",
                                "Sepal.Length = Sepal.Width",
                                "Sepal.Length = Petal.Width"),
                 output = report,
                 plots = "diagnostics",
                 titles = "Iris Regression Statistics")

# View results
res

# Example 4: Regression statistics with multiple models, different plot strings
res <- proc_reg(iris, model = c("Sepal.Length = Petal.Length",
                                "Sepal.Length = Sepal.Width",
                                "Sepal.Length = Petal.Width"),
                 output = report,
                 plots = list("diagnostics",
                              "residualhistogram",
                              "fitplot"),
                 titles = "Iris Regression Statistics")

# View results
res

# Example 5: Regression statistics with multiple models, different plot functions
res <- proc_reg(iris, model = c("Sepal.Length = Petal.Length",
                                "Sepal.Length = Sepal.Width",
                                "Sepal.Length = Petal.Width"),
                 output = report,
                 plots = list(regplot(type = "diagnostics"),
                              regplot(type = "cooksd",
                                      label = TRUE),
                              regplot(type = "fitplot",
                                      stats = c("nobs", "mse", "rsquare"))),
                 titles = "Iris Regression Statistics")

# View results
res

# Example 6: Regression statistics with multiple models, influence charts
res <- proc_reg(iris, model = c("Sepal.Length = Petal.Length",
                                "Sepal.Length = Petal.Length Petal.Width",
                                "Sepal.Length = Petal.Length Petal.Width Sepal.Width"),
                 output = report,
                 plots = regplot(v(cooksd, dffits, dfbetas), label = TRUE),
                 titles = "Iris Regression Statistics")

# View results
res