Proper Standardization in Moderated Regression Using std_selected_boot()

R
regression
moderation
stdmod
standardized
bootstrapping
confidence-intervals
Author

Shu Fai Cheung

Published

July 1, 2023

This post shows one simple way to get correctly standardized regression coefficients in multiple linear regression with a moderator, with appropriate confidence intervals, using std_selected_boot() from the stdmod package.

(Note: This post and some related posts, e.g., this one on categorical variables, have sections that are identical or highly similar. This is intentional, to make each article self-contained. Readers do not need to refer to other articles to understand the points.)

“Betas”

It is common in my area, psychology, to report standardized coefficients, the so-called “betas,” when reporting the results of multiple regression or related methods. It is so common that some programs have the “betas” reported by default, alongside the unstandardized coefficients (the “Bs”).

However, when a regression model has a moderator and hence a product term, the common practice of standardizing all variables, including the product term, leads to “betas” that are difficult to interpret.

A Standardized Product Is Not A Product of Standardized Variables

That is, the standardized \(x \times w\) is not \(\textrm{standardized } x \times \textrm{standardized } w\). The value of the former is no longer a measure of the moderation effect.1
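A small simulation in base R illustrates the point (a minimal sketch; the sample size and distributions are arbitrary and separate from the example later in this post):

set.seed(1234)
x <- rnorm(100, 10, 4)
w <- rnorm(100, 10, 2)
z_of_product <- as.numeric(scale(x * w))                     # standardize the product
product_of_z <- as.numeric(scale(x)) * as.numeric(scale(w))  # product of the standardized variables
head(round(cbind(z_of_product, product_of_z), 3))            # the two columns differ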

This point is not new. Friedrich (1982) discussed it long ago, and Aiken & West (1991) also explored it in detail.

Unfortunately, reporting the “beta” of the product term computed this way is quite common (including in my own work).

The solution is simple:

Standardize The Components

For the product term \(x \times w\), standardize \(x\) (\(z_x\)) and \(w\) (\(z_w\)), and compute the product of them, \(z_x \times z_w\).

However, common statistical programs standardize all variables, including the product term. To do standardization right in moderated regression, we would need to standardize the component variables ourselves, form the product term from the standardized components, and then fit the model again.
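Done by hand, the workflow would look roughly like this (a sketch only, assuming a data frame dat with outcome y, predictor x, moderator w, and two control variables, as in the example below):

# Standardize the component variables (and any others to be standardized) first
dat_z <- dat
vars <- c("y", "control1", "control2", "x", "w")
dat_z[vars] <- scale(dat_z[vars])
# The product term in the formula is then formed from the standardized x and w
lm_std_manual <- lm(y ~ control1 + control2 + x*w, data = dat_z)

This yields the correct point estimates, but, as discussed below, the usual t-based confidence intervals from such a model are questionable.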

The function std_selected_boot() from the R package stdmod, which my collaborators and I developed for moderated regression (Cheung et al., 2022), can do the standardization as presented in Aiken & West (1991) and Friedrich (1982), without any drastic changes to the common workflow.

How to Use std_selected_boot()

Workflow
  1. Fit a regression model by lm() as usual.

  2. Pass the output to std_selected_boot() and select variables to be standardized.

It was designed to let users have full control over which variables to standardize (see this article on how). It has one additional feature:

How Product Terms Are Handled

If a model has product terms, they will be formed after the component terms are standardized.

Example

Do Regression As Usual

Suppose this is the data set:

Show the code for generating data
set.seed(342532)
n <- 200
control1 <- rnorm(n, 5, 1)
control2 <- rnorm(n, 20, 5)
x <- rnorm(n, 10, 4)
w <- rnorm(n, 10, 2)
y <- 10 + 1 * control1 + 2 * control2 +
     (3 + 1 * (w - 10)) * x + rnorm(n, 0, 30)
dat <- data.frame(control1, control2, x, w, y)
write.csv(dat, "dat.csv")

This is the data file.

head(dat)
  control1 control2         x         w         y
1 5.025877 22.32080  9.521779 11.049711 131.15905
2 6.200119 17.04644 11.912240 10.860813 102.01347
3 4.992346 27.28403  9.845949 10.515219  77.86161
4 4.979231 12.85510  6.512449 13.635488  89.54024
5 7.020257 15.48984  5.734798  9.846383  66.94897
6 3.969682 24.61428  3.329810 10.221247  38.20317

The variable x is the predictor, w the moderator. The dataset also has two control variables, control1 and control2. The outcome variable is y.

This is the regression model.

lm_out <- lm(y ~ control1 + control2 + x*w, dat)
summary(lm_out)

Call:
lm(formula = y ~ control1 + control2 + x * w, data = dat)

Residuals:
    Min      1Q  Median      3Q     Max 
-86.895 -18.374  -0.556  19.255  70.971 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  40.8781    29.5919   1.381 0.168746    
control1     -2.2750     2.1314  -1.067 0.287129    
control2      1.0053     0.3821   2.631 0.009191 ** 
x            -4.6922     2.3479  -1.998 0.047060 *  
w             0.3635     2.4452   0.149 0.881986    
x:w           0.7780     0.2287   3.402 0.000811 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 28.34 on 194 degrees of freedom
Multiple R-squared:  0.3587,    Adjusted R-squared:  0.3422 
F-statistic:  21.7 on 5 and 194 DF,  p-value: < 2.2e-16

The moderation effect is significant.

  • If w increases by one unit, the effect of x increases by 0.778.

  • If w is equal to zero, the effect of x is -4.692, which is significant.
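As a quick check (not part of the regression output), the conditional effect of x at any value of w can be computed from these two coefficients:

b <- coef(lm_out)
unname(b["x"] + b["x:w"] * 10)   # effect of x when w = 10: roughly -4.692 + 0.778 * 10 = 3.09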

Do Standardization Right

Load stdmod (install it from CRAN first if necessary):

library(stdmod)

If we want to standardize all variables except for the product term, and compute the product term as the product of the standardized variables, we just pass the output to std_selected_boot(), and set to_standardize to ~ .. The right-hand side of ~ denotes the variables to be standardized. If set to ., then all numeric variables, including the outcome variable (y), will be standardized.2
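For example, if only x and y were to be standardized, leaving w and the control variables in their original units (see also footnote 2), the call would look like this (a sketch, not run in this post; remember to call set.seed() first, as discussed below):

lm_out_std_xy <- std_selected_boot(lm_out,
                                   to_standardize = ~ x + y,
                                   nboot = 5000)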

Getting the point estimates right is not the only concern, though. There is another issue with standardization: confidence intervals.

Beware of the t-based SE and CI

If a variable is standardized, the usual t-based standard errors and confidence intervals of the coefficients that involve it may be biased.3

This is because (a) they do not take into account the sampling variation of the standard deviations used in standardization (Yuan & Chan, 2011), and (b) the coefficients after standardization, the “betas,” are not normally distributed (though they may be close to normal). Many statistical programs do not report confidence intervals for the “betas,” for a good reason.

The case of moderated regression is more complicated because, as shown in Cheung et al. (2022), the correctly standardized product term involves the standard deviations of three variables, not two.

This is why std_selected_boot() computes nonparametric bootstrap percentile confidence intervals by default, just in case the bias is large.

To have stable and reproducible confidence intervals, call set.seed() before calling std_selected_boot() and set nboot to the desired number of bootstrap samples (at least 2000 but 5000 or more is recommended):

set.seed(645321)
lm_out_std <- std_selected_boot(lm_out,
                                to_standardize = ~ .,
                                nboot = 5000)

Call summary() as usual:

summary(lm_out_std)

Call to std_selected_boot():
std_selected_boot(lm_out = lm_out, to_standardize = ~., nboot = 5000)

Selected variable(s) are centered by mean and/or scaled by SD
- Variable(s) centered: y control1 control2 x w
- Variable(s) scaled: y control1 control2 x w

         centered_by  scaled_by                            Note
y          83.477789 34.9370599 Standardized (mean = 0, SD = 1)
control1    5.060548  0.9562693 Standardized (mean = 0, SD = 1)
control2   19.625484  5.3780625 Standardized (mean = 0, SD = 1)
x           9.844108  4.3166239 Standardized (mean = 0, SD = 1)
w          10.132063  2.0474763 Standardized (mean = 0, SD = 1)

Note:
- Categorical variables will not be centered or scaled even if
  requested.
- Nonparametric bootstrapping 95% confidence intervals computed.
- The number of bootstrap samples is 5000.

Call:
lm(formula = y ~ control1 + control2 + x * w, data = dat_mod)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.48720 -0.52591 -0.01591  0.55113  2.03139 

Coefficients:
             Estimate  CI Lower  CI Upper Std. Error t value Pr(>|t|)    
(Intercept)  0.020131 -0.005886  0.052164   0.057654   0.349 0.727347    
control1    -0.062269 -0.152822  0.036064   0.058339  -1.067 0.287129    
control2     0.154750  0.045436  0.265029   0.058813   2.631 0.009191 ** 
x            0.394150  0.277605  0.502798   0.059581   6.615 3.52e-10 ***
w            0.470113  0.358552  0.573546   0.057982   8.108 5.71e-14 ***
x:w          0.196803  0.098112  0.289331   0.057843   3.402 0.000811 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.811 on 194 degrees of freedom
Multiple R-squared:  0.3587,    Adjusted R-squared:  0.3422 
F-statistic:  21.7 on 5 and 194 DF,  p-value: < 2.2e-16

Note:
- Estimates and their statistics are based on the data after
  mean-centering, scaling, or standardization.
- [CI Lower, CI Upper] are bootstrap percentile confidence intervals.
- Std. Error are not bootstrap SEs.

The output has one additional section:

  • Variables that are standardized4, and variables that are not transformed.

As shown in the first section, x and w are standardized (along with y and the control variables). The term x:w is not on the list because it is not standardized, and it should not be, as explained above.

The other sections are similar to those for a usual multiple regression. Note that the column Estimate is intentionally not labelled as Beta because it is possible that only some variables are standardized. Labelling it as Beta, though common, is misleading.

Interpret The Output

The coefficient of the product term x:w is 0.197. That is, for each one-SD increase in w, the standardized effect of x on y (the “beta” of x on y) increases by 0.197.

My collaborators and I proposed calling the moderation effect with x, w, and y all standardized (x:w in this example) the standardized moderation effect (Cheung et al., 2022). When variables are standardized as proposed by Friedrich (1982), the coefficients can be interpreted as usual in moderated regression, with all variables on the standardized metric.

The coefficient of x is 0.394. This is the standardized effect (the “beta”) of x when w equals zero in this model. Because w is standardized and the mean of a standardized variable is zero, this is also the standardized effect of x when w is at its mean.
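As a quick arithmetic check, the standardized effect of x when w is one SD above its mean is 0.394 + 0.197 = 0.591, which matches the conditional effect at the high level of w reported in the next section. Because the output of std_selected_boot() behaves like an lm object (as the calls to summary() and confint() in this post show), this can be computed directly:

b_std <- coef(lm_out_std)
unname(b_std["x"] + b_std["x:w"])   # about 0.591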

Conditional Effect

The function cond_effect_boot() from stdmod can be used to compute the conditional effects. Just pass the output of lm() or std_selected_boot() to it.

When the variables are standardized, cond_effect_boot() can be used to compute the standardized conditional effects, with nonparametric bootstrap confidence intervals. Just set nboot to the desired number of bootstrap samples. To ensure that the same nboot bootstrap samples are drawn, set the seed to the same number used when calling std_selected_boot().

set.seed(645321)
cond_std <- cond_effect_boot(lm_out_std,
                             x = "x",
                             w = "w",
                             nboot = 5000)
cond_std
The effects of x on y, conditional on w:

  Level      w x Effect CI Lower CI Upper  S.E.     t     p Sig
   High  1.000    0.591    0.429    0.737 0.085 6.982 0.000 ***
 Medium  0.000    0.394    0.278    0.503 0.060 6.615 0.000 ***
    Low -1.000    0.197    0.051    0.340 0.081 2.424 0.016 *  

[CI Lower, CI Upper] shows the 95% nonparametric bootstrap confidence
interval(s) (based on 5000 bootstrap samples).


The regression model:

    y ~ control1 + control2 + x * w

Interpreting the levels of w:

  Level      w % Below From Mean (in SD)
   High  1.000   81.50              1.00
 Medium  0.000   51.00              0.00
    Low -1.000   15.00             -1.00

- % Below: The percent of cases equal to or less than a level.
- From Mean (in SD): Distance of a level from the mean, in standard
  deviation (+ve above, -ve below).

Note:

- The variable(s) y, x, w is/are standardized.
- The conditional effects are the standardized effects of x on y.

As shown above, the effect of x is positive and significant when w ranges from one SD below the mean to one SD above the mean. The standardized effect of x on y when w is one SD above the mean is 0.591.

Why The Usual Practice Is Problematic

Standardizing the Product Term

Assume that we standardize all variables, including the product term, as some statistical programs do. To simulate this, let’s manually create the product term, standardize all variables (including the product term), and run the regression:

Show the code
dat$x_w <- dat$x * dat$w
dat_std <- as.data.frame(scale(dat))
head(dat_std)
     control1   control2             x           w          y        x_w
1 -0.03625650  0.5011678 -0.0746715806  0.44818493  1.3647761  0.1360233
2  1.19168346 -0.4795487  0.4791086834  0.35592570  0.5305451  0.6515372
3 -0.07132155  1.4240348  0.0004264965  0.18713570 -0.1607514  0.1001691
4 -0.08503578 -1.2588894 -0.7718205413  1.71109398  0.1735248 -0.2141245
5  2.04932709 -0.7689844 -0.9519731607 -0.13952791 -0.4731027 -0.9039326
6 -1.14075236  0.9276200 -1.5091188133  0.04355764 -1.2958909 -1.3825065
Show the code
psych::describe(dat_std, range = FALSE, skew = FALSE)
         vars   n mean sd   se
control1    1 200    0  1 0.07
control2    2 200    0  1 0.07
x           3 200    0  1 0.07
w           4 200    0  1 0.07
y           5 200    0  1 0.07
x_w         6 200    0  1 0.07
Show the code
lm_out_std_wrong <- lm(y ~ control1 + control2 + x + w + x_w, dat_std)

The following results are what would be obtained from common statistical programs that standardize all variables, including the product term, to yield the “betas”:

Show the code
printCoefmat(summary(lm_out_std_wrong)$coefficients,
             zap.ind = 1:4,
             P.values = TRUE)
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.000000   0.057350  0.0000  1.00000    
control1    -0.062270   0.058339 -1.0674  0.28713    
control2     0.154750   0.058813  2.6312  0.00919 ** 
x           -0.579740   0.290091 -1.9985  0.04706 *  
w            0.021300   0.143300  0.1486  0.88199    
x_w          1.043740   0.306771  3.4023  0.00081 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let us compare the results.

p-values

Some p-values are different, which is expected:

Show the code
coef_std <- summary(lm_out_std)$coefficients
coef_std_wrong <- summary(lm_out_std_wrong)$coefficients
round(cbind(`p-value (Std Correct)` = coef_std[, "Pr(>|t|)"],
            `p-value (Std All)` = coef_std_wrong[, "Pr(>|t|)"]), 5)
            p-value (Std Correct) p-value (Std All)
(Intercept)               0.72735           1.00000
control1                  0.28713           0.28713
control2                  0.00919           0.00919
x                         0.00000           0.04706
w                         0.00000           0.88199
x:w                       0.00081           0.00081

The p-values of x:w are the same, which is expected (see Aiken & West, 1991).

However, the p-values of x and w are very different. This is not just because they are effects of x and w conditional on other values. They are actually not conditional effects as usually understood, as explained in the next section.

Coefficient Estimates

Let us compare the coefficients:

Show the code
round(cbind(`Estimate (Std Correct)` = coef_std[, "Estimate"],
            `Estimate (Std All)` = coef_std_wrong[, "Estimate"]), 3)
            Estimate (Std Correct) Estimate (Std All)
(Intercept)                  0.020              0.000
control1                    -0.062             -0.062
control2                     0.155              0.155
x                            0.394             -0.580
w                            0.470              0.021
x:w                          0.197              1.044

The coefficients of x, w, and x:w have very different values in the correctly standardized solution and in the solution with the product term itself standardized.

Two points of note for the second column, the solution with the product term itself standardized:

  1. The term x:w, when it is itself standardized (labelled x_w here), is no longer a product term. Its coefficient, 1.044, is not the increase in the effect of x when w increases by one SD, nor by one unit.

  2. The coefficient of x is not the usual conditional effect. Mathematically, it is the standardized effect of x when w is equal to “some value.” But what is this value? It is not easy to determine, because x_w is not a product term. The coefficient of w has the same problem.

Standardizing the Product Term Makes Interpretation Difficult
  • The coefficient of the product term is no longer the moderation effect (except in some very special cases).
  • The coefficients of the component terms, the focal variable and the moderator, are no longer conditional effects at values that can be easily deduced from the results.

Using t-Based Confidence Intervals

Some programs give confidence intervals for the “betas” using t statistics. That is, they standardize the variables, run the regression, and form the confidence intervals as if the standardized variables were the original data.

Let us compare the bootstrap confidence intervals with the usual OLS confidence intervals based on the t-statistics.

For the output of std_selected_boot(), we can request the t-based confidence intervals by setting type to "lm" when calling confint(), the function commonly used to form confidence intervals:

# OLS confidence intervals
round(confint(lm_out_std, type = "lm"), 3)
             2.5 % 97.5 %
(Intercept) -0.094  0.134
control1    -0.177  0.053
control2     0.039  0.271
x            0.277  0.512
w            0.356  0.584
x:w          0.083  0.311

Without setting type, the nonparametric bootstrap confidence intervals are returned (if available):

# Bootstrap Confidence Intervals
round(confint(lm_out_std), 3)
             2.5 % 97.5 %
(Intercept) -0.006  0.052
control1    -0.153  0.036
control2     0.045  0.265
x            0.278  0.503
w            0.359  0.574
x:w          0.098  0.289

As shown above, the confidence intervals of x, w, and x:w from the two methods are close to each other. However, in this example the bootstrap confidence intervals tend to be narrower than the t-based confidence intervals.

We can compare the widths of the confidence intervals by examining their ratios (ratio = CI_Width_t / CI_Width_boot):

Show the code
ci_t <- confint(lm_out_std, type = "lm")
ci_b <- confint(lm_out_std)
width_t <- ci_t[, 2] - ci_t[, 1]
width_b <- ci_b[, 2] - ci_b[, 1]
ci_compare <- data.frame(CI_Width_t = width_t,
                         CI_Width_boot = width_b)
ci_compare$ratio <- ci_compare[, 1] / ci_compare[, 2]
round(ci_compare, 3)
            CI_Width_t CI_Width_boot ratio
(Intercept)      0.227         0.058 3.918
control1         0.230         0.189 1.218
control2         0.232         0.220 1.056
x                0.235         0.225 1.044
w                0.229         0.215 1.064
x:w              0.228         0.191 1.193

The t-based confidence interval of the product term, the standardized moderation effect, is about 19% wider than the bootstrap confidence interval (ratio = 1.193).

Final Remarks

What std_selected_boot() does can be implemented by researchers using base R only. The function was developed to make it easier to do standardization right in moderated regression, without adopting the all-or-none approach of standardizing either every variable or none.

More information on std_selected_boot() can be found on its help page.

Other Resources

There are programs that advocate the correct way to do standardization in moderated regression. For example, users may be instructed to standardize the variables before using them in the moderated regression. Done this way, the point estimates of the standardized solution are correct, as they are with std_selected_boot().

However, I am not aware of a program that does the bootstrapping correctly when forming the bootstrap confidence intervals for the correctly standardized solution, because both the standardization and the multiple regression need to be done in each bootstrap sample. If standardization is done only once, before bootstrapping, the point estimates are correct, but the bootstrap confidence intervals are not.
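To make this concrete, here is a conceptual sketch in base R (not the actual implementation in stdmod) in which both the standardization and the regression are redone inside every bootstrap sample:

# Percentile bootstrap CI for the correctly standardized moderation effect (x:w)
vars <- c("y", "control1", "control2", "x", "w")
boot_est <- replicate(5000, {
  i <- sample.int(nrow(dat), replace = TRUE)   # resample rows with replacement
  d <- dat[i, ]
  d[vars] <- scale(d[vars])                    # standardize within this bootstrap sample
  coef(lm(y ~ control1 + control2 + x*w, data = d))["x:w"]
})
quantile(boot_est, c(0.025, 0.975))            # percentile confidence interval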

Revision History and Issues

The revision history of this post can be found in the GitHub history of the source file.

For issues on this post, such as corrections and mistakes, please open an issue for the GitHub repository for this blog. Thanks.

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. SAGE Publications.
Cheung, S. F., Cheung, S.-H., Lau, E. Y. Y., Hui, C. H., & Vong, W. N. (2022). Improving an old way to measure moderation effect in standardized units. Health Psychology, 41(7), 502–505. https://doi.org/10.1037/hea0001188
Friedrich, R. J. (1982). In defense of multiplicative terms in multiple regression equations. American Journal of Political Science, 26(4), 797–833. https://doi.org/10.2307/2110973
Yuan, K.-H., & Chan, W. (2011). Biases and standard errors of standardized regression coefficients. Psychometrika, 76(4), 670–690. https://doi.org/10.1007/s11336-011-9224-6

Footnotes

  1. Standardized \(x \times w\) can be close to \(\textrm{standardized } x \times \textrm{standardized } w\), but only in some very specific situations, as shown in Cheung et al. (2022).↩︎

  2. Suppose we only want to standardize x and y because w is measured in a meaningful unit. We can then set to_standardize to ~ x + y. The order of the variables does not matter.↩︎

  3. Note that there are indeed cases in which they are still unbiased, and cases in which the biases are negligible. See Yuan & Chan (2011) for a detailed discussion.↩︎

  4. std_selected_boot() allows users to do only mean-centering or only scaling by standard deviation. Mean-centering and then scaling by standard deviation is equivalent to standardization.↩︎