1 Servo Example: Linear regression


In this example we look at how to apply a linear regression model to a highly non-linear data set. Working through several iterations and relying on analysis of patterns in the data, we are able to converge on model with a strong fit and good explanatory powers.

1.1 Data Set


The data set may be found on the UC Irvine machine learning repository:

Servo Data


The data set was created by Karl Ulrich, an MIT professor, and donated to the UCI data repository by Ross Quinlan. It contains two discrete numerical predictors dealing with gain settings:

  • pgain and vgain

and two categorical variables dealing mechanical components:

  • motor and screw

The output variable is class, a measure of some sort of reaction time. From Ulrich’s recollection, “the output value is almost certainly a rise time, or the time required for the system to respond to a step change in a position set point.” Hence, presumably lower class values correspond to faster/better performance.


We begin by loading the data set. Then we begin exploring by examining column names, checking for missing values, etc.

head(servo)
str(servo)
sum(is.na(servo))
##   motor screw pgain vgain     class
## 1     E     E     5     4 0.2812509
## 2     B     D     6     5 0.5062525
## 3     D     D     4     3 0.3562515
## 4     B     A     3     2 5.5000330
## 5     D     B     6     5 0.3562515
## 6     E     C     4     3 0.8062546
## 'data.frame':    167 obs. of  5 variables:
##  $ motor: chr  "E" "B" "D" "B" ...
##  $ screw: chr  "E" "D" "D" "A" ...
##  $ pgain: int  5 6 4 3 6 4 3 3 6 4 ...
##  $ vgain: int  4 5 3 2 5 3 2 2 5 1 ...
##  $ class: num  0.281 0.506 0.356 5.5 0.356 ...
## [1] 0
# servo |> group_by(motor) |> summarize(count=n())
# servo |> group_by(screw) |> summarize(count=n())
# servo |> group_by(pgain) |> summarize(count=n())
# servo |> group_by(vgain) |> summarize(count=n())

servo |> ggplot(aes(x = class))+
  geom_histogram(
    aes(y = ..density..),
    col = 'darkgray',
    fill = 'darkred',
    bins = 20
  )+
  ggtitle("Histogram of Class")+
  theme(plot.background = element_rect(fill = "darkgray"))

servo |> ggplot(aes(x = motor))+
  geom_bar(
    aes(y = ..count..),
    col = 'darkgray',
    fill = 'darkgreen'
  )+
  ggtitle("Counts of Motor Type")+
  theme(plot.background = element_rect(fill = "darkgray"))

servo |> ggplot(aes(x = screw))+
  geom_bar(
    aes(y = ..count..),
    col = 'darkgray',
    fill = 'darkgreen'
  )+
  ggtitle("Counts of Screw Type")+
  theme(plot.background = element_rect(fill = "darkgray"))

servo |> ggplot(aes(x = factor(pgain)))+
  geom_bar(
    aes(y = ..count..),
    col = 'darkgray',
    fill = 'darkgreen'
  )+
  ggtitle("Counts of pgain level")+
  theme(plot.background = element_rect(fill = "darkgray"))

servo |> ggplot(aes(x = factor(vgain)))+
  geom_bar(
    aes(y = ..count..),
    col = 'darkgray',
    fill = 'darkgreen'
  )+
  ggtitle("Counts of vgain level")+
  theme(plot.background = element_rect(fill = "darkgray"))


While there is some variation by category of screw/motor and level of pgain/vgain, we see all levels/categories have at least 20 observations. This is good for fitting individual predictor effects. Next we count the number of observations across every possible combination of the predictor variables. This gives us a sense of the combinations of predictors present in the data for evaluating possible interaction effects.


servo |> group_by(motor, screw, pgain, vgain) |> summarize(count=n())
## # A tibble: 167 x 5
## # Groups:   motor, screw, pgain [93]
##    motor screw pgain vgain count
##    <chr> <chr> <int> <int> <int>
##  1 A     A         3     1     1
##  2 A     A         3     2     1
##  3 A     A         4     1     1
##  4 A     A         4     2     1
##  5 A     A         4     3     1
##  6 A     A         5     3     1
##  7 A     A         5     4     1
##  8 A     A         6     5     1
##  9 A     B         3     1     1
## 10 A     B         3     2     1
## # ... with 157 more rows


In the resulting output only 10 observations are displayed, out of a total of 167. Now if every combination of categories/levels were present in the data, we would expect 5x5x4x5=500 observations. Hence running a blind regression including all possible combinations is not a good idea here. We will do it anyway for instructional purposes and to see the outcome.

1.2 Regression Models


We begin the first regression by regressing class on the full set of interactions between all predictor variables. There are too many predictors here, given the size of the data set, to be able to fit a parsimonious model. Nonetheless, this serves as a starting point.


reg0 = lm(class ~ factor(motor)*factor(screw)*pgain*vgain,data = servo)
summary(reg0)
## 
## Call:
## lm(formula = class ~ factor(motor) * factor(screw) * pgain * 
##     vgain, data = servo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5917 -0.2360  0.0000  0.2217  2.0431 
## 
## Coefficients: (4 not defined because of singularities)
##                                             Estimate Std. Error t value
## (Intercept)                                20.446836   2.771706   7.377
## factor(motor)B                             -0.704880   3.919784  -0.180
## factor(motor)C                            -11.376378   3.420310  -3.326
## factor(motor)D                            -10.071290   4.218559  -2.387
## factor(motor)E                             11.394076   4.218559   2.701
## factor(screw)B                             -0.044329   4.218559  -0.011
## factor(screw)C                             -1.507741   4.218559  -0.357
## factor(screw)D                             -0.686520   4.218559  -0.163
## factor(screw)E                             -1.755959   4.218559  -0.416
## pgain                                      -4.942712   0.744565  -6.638
## vgain                                      -2.757403   1.017472  -2.710
## factor(motor)B:factor(screw)B              -0.115237   5.965943  -0.019
## factor(motor)C:factor(screw)B               9.653595   5.650336   1.708
## factor(motor)D:factor(screw)B              -5.193946   6.166362  -0.842
## factor(motor)E:factor(screw)B             -27.093137   6.166362  -4.394
## factor(motor)B:factor(screw)C               0.583523   5.965943   0.098
## factor(motor)C:factor(screw)C               4.723777   5.650336   0.836
## factor(motor)D:factor(screw)C              -1.545853   4.790977  -0.323
## factor(motor)E:factor(screw)C             -25.547544   6.166362  -4.143
## factor(motor)B:factor(screw)D              -0.889781   5.965943  -0.149
## factor(motor)C:factor(screw)D              -1.834408   5.650336  -0.325
## factor(motor)D:factor(screw)D              -0.248130   4.790977  -0.052
## factor(motor)E:factor(screw)D             -27.456094   6.457955  -4.252
## factor(motor)B:factor(screw)E               0.694367   5.965943   0.116
## factor(motor)C:factor(screw)E               2.248261   5.650336   0.398
## factor(motor)D:factor(screw)E              -1.436751   2.163660  -0.664
## factor(motor)E:factor(screw)E             -27.640442   6.457955  -4.280
## factor(motor)B:pgain                        0.163145   1.052974   0.155
## factor(motor)C:pgain                        3.332524   0.842701   3.955
## factor(motor)D:pgain                        2.495256   1.150894   2.168
## factor(motor)E:pgain                       -2.999212   1.150894  -2.606
## factor(screw)B:pgain                       -0.135591   1.150894  -0.118
## factor(screw)C:pgain                        0.057937   1.150894   0.050
## factor(screw)D:pgain                       -0.185314   1.150894  -0.161
## factor(screw)E:pgain                        0.086511   1.150894   0.075
## factor(motor)B:vgain                        0.098498   1.438923   0.068
## factor(motor)C:vgain                        1.311632   1.429776   0.917
## factor(motor)D:vgain                        1.138405   1.468437   0.775
## factor(motor)E:vgain                       -2.084235   1.468437  -1.419
## factor(screw)B:vgain                       -0.250556   1.468437  -0.171
## factor(screw)C:vgain                        0.468394   1.468437   0.319
## factor(screw)D:vgain                        0.363169   1.468437   0.247
## factor(screw)E:vgain                        0.561242   1.468437   0.382
## pgain:vgain                                 0.801230   0.206289   3.884
## factor(motor)B:factor(screw)B:pgain         0.034295   1.627610   0.021
## factor(motor)C:factor(screw)B:pgain        -3.032635   1.500168  -2.022
## factor(motor)D:factor(screw)B:pgain         1.390248   1.692607   0.821
## factor(motor)E:factor(screw)B:pgain         7.118616   1.692607   4.206
## factor(motor)B:factor(screw)C:pgain        -0.136823   1.627610  -0.084
## factor(motor)C:factor(screw)C:pgain        -1.820263   1.500168  -1.213
## factor(motor)D:factor(screw)C:pgain         0.115537   1.845392   0.063
## factor(motor)E:factor(screw)C:pgain         6.628206   1.692607   3.916
## factor(motor)B:factor(screw)D:pgain         0.282844   1.627610   0.174
## factor(motor)C:factor(screw)D:pgain         0.060353   1.500168   0.040
## factor(motor)D:factor(screw)D:pgain        -0.547526   1.845392  -0.297
## factor(motor)E:factor(screw)D:pgain         7.086690   1.897715   3.734
## factor(motor)B:factor(screw)E:pgain        -0.214982   1.627610  -0.132
## factor(motor)C:factor(screw)E:pgain        -1.129062   1.500168  -0.753
## factor(motor)D:factor(screw)E:pgain               NA         NA      NA
## factor(motor)E:factor(screw)E:pgain         7.194708   1.897715   3.791
## factor(motor)B:factor(screw)B:vgain         0.024850   2.076683   0.012
## factor(motor)C:factor(screw)B:vgain        -0.749704   2.070356  -0.362
## factor(motor)D:factor(screw)B:vgain         1.265075   2.097241   0.603
## factor(motor)E:factor(screw)B:vgain         4.612245   2.097241   2.199
## factor(motor)B:factor(screw)C:vgain        -0.153361   2.076683  -0.074
## factor(motor)C:factor(screw)C:vgain         0.393133   2.070356   0.190
## factor(motor)D:factor(screw)C:vgain         0.432859   1.260390   0.343
## factor(motor)E:factor(screw)C:vgain         4.106601   2.097241   1.958
## factor(motor)B:factor(screw)D:vgain        -0.056767   2.076683  -0.027
## factor(motor)C:factor(screw)D:vgain         0.671127   2.070356   0.324
## factor(motor)D:factor(screw)D:vgain         0.819264   1.260390   0.650
## factor(motor)E:factor(screw)D:vgain         4.641183   2.113351   2.196
## factor(motor)B:factor(screw)E:vgain         0.050707   2.076683   0.024
## factor(motor)C:factor(screw)E:vgain         0.861588   2.070356   0.416
## factor(motor)D:factor(screw)E:vgain         0.377512   1.260390   0.300
## factor(motor)E:factor(screw)E:vgain         4.545147   2.113351   2.151
## factor(motor)B:pgain:vgain                 -0.025779   0.291736  -0.088
## factor(motor)C:pgain:vgain                 -0.516465   0.277091  -1.864
## factor(motor)D:pgain:vgain                 -0.372368   0.304728  -1.222
## factor(motor)E:pgain:vgain                  0.558232   0.304728   1.832
## factor(screw)B:pgain:vgain                  0.059194   0.304728   0.194
## factor(screw)C:pgain:vgain                 -0.056282   0.304728  -0.185
## factor(screw)D:pgain:vgain                 -0.016676   0.304728  -0.055
## factor(screw)E:pgain:vgain                 -0.068781   0.304728  -0.226
## factor(motor)B:factor(screw)B:pgain:vgain  -0.008320   0.430950  -0.019
## factor(motor)C:factor(screw)B:pgain:vgain   0.416892   0.421173   0.990
## factor(motor)D:factor(screw)B:pgain:vgain  -0.308634   0.439848  -0.702
## factor(motor)E:factor(screw)B:pgain:vgain  -1.291648   0.439848  -2.937
## factor(motor)B:factor(screw)C:pgain:vgain   0.033143   0.430950   0.077
## factor(motor)C:factor(screw)C:pgain:vgain   0.145554   0.421173   0.346
## factor(motor)D:factor(screw)C:pgain:vgain         NA         NA      NA
## factor(motor)E:factor(screw)C:pgain:vgain  -1.154697   0.439848  -2.625
## factor(motor)B:factor(screw)D:pgain:vgain  -0.016055   0.430950  -0.037
## factor(motor)C:factor(screw)D:pgain:vgain  -0.055313   0.421173  -0.131
## factor(motor)D:factor(screw)D:pgain:vgain         NA         NA      NA
## factor(motor)E:factor(screw)D:pgain:vgain  -1.276390   0.444710  -2.870
## factor(motor)B:factor(screw)E:pgain:vgain   0.009257   0.430950   0.021
## factor(motor)C:factor(screw)E:pgain:vgain   0.007621   0.421173   0.018
## factor(motor)D:factor(screw)E:pgain:vgain         NA         NA      NA
## factor(motor)E:factor(screw)E:pgain:vgain  -1.277234   0.444710  -2.872
##                                           Pr(>|t|)    
## (Intercept)                               2.38e-10 ***
## factor(motor)B                            0.857801    
## factor(motor)C                            0.001397 ** 
## factor(motor)D                            0.019634 *  
## factor(motor)E                            0.008640 ** 
## factor(screw)B                            0.991645    
## factor(screw)C                            0.721848    
## factor(screw)D                            0.871187    
## factor(screw)E                            0.678485    
## pgain                                     5.37e-09 ***
## vgain                                     0.008428 ** 
## factor(motor)B:factor(screw)B             0.984643    
## factor(motor)C:factor(screw)B             0.091912 .  
## factor(motor)D:factor(screw)B             0.402447    
## factor(motor)E:factor(screw)B             3.82e-05 ***
## factor(motor)B:factor(screw)C             0.922360    
## factor(motor)C:factor(screw)C             0.405950    
## factor(motor)D:factor(screw)C             0.747902    
## factor(motor)E:factor(screw)C             9.33e-05 ***
## factor(motor)B:factor(screw)D             0.881863    
## factor(motor)C:factor(screw)D             0.746397    
## factor(motor)D:factor(screw)D             0.958841    
## factor(motor)E:factor(screw)D             6.36e-05 ***
## factor(motor)B:factor(screw)E             0.907673    
## factor(motor)C:factor(screw)E             0.691899    
## factor(motor)D:factor(screw)E             0.508817    
## factor(motor)E:factor(screw)E             5.75e-05 ***
## factor(motor)B:pgain                      0.877310    
## factor(motor)C:pgain                      0.000179 ***
## factor(motor)D:pgain                      0.033504 *  
## factor(motor)E:pgain                      0.011155 *  
## factor(screw)B:pgain                      0.906548    
## factor(screw)C:pgain                      0.959992    
## factor(screw)D:pgain                      0.872537    
## factor(screw)E:pgain                      0.940292    
## factor(motor)B:vgain                      0.945618    
## factor(motor)C:vgain                      0.362055    
## factor(motor)D:vgain                      0.440767    
## factor(motor)E:vgain                      0.160171    
## factor(screw)B:vgain                      0.865002    
## factor(screw)C:vgain                      0.750682    
## factor(screw)D:vgain                      0.805377    
## factor(screw)E:vgain                      0.703452    
## pgain:vgain                               0.000228 ***
## factor(motor)B:factor(screw)B:pgain       0.983248    
## factor(motor)C:factor(screw)B:pgain       0.046995 *  
## factor(motor)D:factor(screw)B:pgain       0.414188    
## factor(motor)E:factor(screw)B:pgain       7.48e-05 ***
## factor(motor)B:factor(screw)C:pgain       0.933242    
## factor(motor)C:factor(screw)C:pgain       0.229009    
## factor(motor)D:factor(screw)C:pgain       0.950254    
## factor(motor)E:factor(screw)C:pgain       0.000205 ***
## factor(motor)B:factor(screw)D:pgain       0.862534    
## factor(motor)C:factor(screw)D:pgain       0.968022    
## factor(motor)D:factor(screw)D:pgain       0.767563    
## factor(motor)E:factor(screw)D:pgain       0.000377 ***
## factor(motor)B:factor(screw)E:pgain       0.895291    
## factor(motor)C:factor(screw)E:pgain       0.454164    
## factor(motor)D:factor(screw)E:pgain             NA    
## factor(motor)E:factor(screw)E:pgain       0.000312 ***
## factor(motor)B:factor(screw)B:vgain       0.990486    
## factor(motor)C:factor(screw)B:vgain       0.718343    
## factor(motor)D:factor(screw)B:vgain       0.548292    
## factor(motor)E:factor(screw)B:vgain       0.031122 *  
## factor(motor)B:factor(screw)C:vgain       0.941338    
## factor(motor)C:factor(screw)C:vgain       0.849940    
## factor(motor)D:factor(screw)C:vgain       0.732288    
## factor(motor)E:factor(screw)C:vgain       0.054149 .  
## factor(motor)B:factor(screw)D:vgain       0.978269    
## factor(motor)C:factor(screw)D:vgain       0.746770    
## factor(motor)D:factor(screw)D:vgain       0.517784    
## factor(motor)E:factor(screw)D:vgain       0.031351 *  
## factor(motor)B:factor(screw)E:vgain       0.980588    
## factor(motor)C:factor(screw)E:vgain       0.678552    
## factor(motor)D:factor(screw)E:vgain       0.765419    
## factor(motor)E:factor(screw)E:vgain       0.034907 *  
## factor(motor)B:pgain:vgain                0.929836    
## factor(motor)C:pgain:vgain                0.066473 .  
## factor(motor)D:pgain:vgain                0.225761    
## factor(motor)E:pgain:vgain                0.071160 .  
## factor(screw)B:pgain:vgain                0.846533    
## factor(screw)C:pgain:vgain                0.853994    
## factor(screw)D:pgain:vgain                0.956511    
## factor(screw)E:pgain:vgain                0.822073    
## factor(motor)B:factor(screw)B:pgain:vgain 0.984651    
## factor(motor)C:factor(screw)B:pgain:vgain 0.325616    
## factor(motor)D:factor(screw)B:pgain:vgain 0.485171    
## factor(motor)E:factor(screw)B:pgain:vgain 0.004470 ** 
## factor(motor)B:factor(screw)C:pgain:vgain 0.938914    
## factor(motor)C:factor(screw)C:pgain:vgain 0.730672    
## factor(motor)D:factor(screw)C:pgain:vgain       NA    
## factor(motor)E:factor(screw)C:pgain:vgain 0.010597 *  
## factor(motor)B:factor(screw)D:pgain:vgain 0.970387    
## factor(motor)C:factor(screw)D:pgain:vgain 0.895885    
## factor(motor)D:factor(screw)D:pgain:vgain       NA    
## factor(motor)E:factor(screw)D:pgain:vgain 0.005402 ** 
## factor(motor)B:factor(screw)E:pgain:vgain 0.982922    
## factor(motor)C:factor(screw)E:pgain:vgain 0.985614    
## factor(motor)D:factor(screw)E:pgain:vgain       NA    
## factor(motor)E:factor(screw)E:pgain:vgain 0.005373 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6647 on 71 degrees of freedom
## Multiple R-squared:  0.9223, Adjusted R-squared:  0.8184 
## F-statistic: 8.873 on 95 and 71 DF,  p-value: < 2.2e-16


Expectedly, the full regression with all interactions is a mess. There are too many variables for a coherent model with a dataset of this size. In fact, 4 of the interaction variables have too few datapoints for a coefficient to be fit. However we see that there are subsets of variables that are highly statistically significant. The \(R^2\) value is also decently high, but the adjusted \(R^2\) is significantly less. Adjusted \(R^2\) penalizes for adding predictors with no additional predictive value, hence this points to the inclusion of too many nuisance variables. While this is not a good model for explaining or predicting our dataset, it shows us that there may be a subset of predictors producing a decent regression.


One of the issues in visualizing the data is pgain and vgain appear to be discrete variables. Hence, many values overlap and will be plot on top of each other. The geom_jitter function allows us to avoid this problem. It adds a small amount of noise to the data for more effective visualization. For example all points with a pgain of 3 are adjusted to fall somewhere between 2.5 and 3.5. This allows us to visualize each point without them falling on top of each other.


servo |> ggplot(
  aes(
    x = pgain,
    y = vgain,
    color = class,
    shape = screw
  )
) +
  geom_point() +
  geom_jitter()+
  scale_color_gradient2(
    low = "darkblue",
    mid = "turquoise",
    high = "darkred",
    midpoint = 3
  )+
  annotate(geom="rect", alpha = .08, xmin = 2.5, xmax = 3.5, ymin = .5, ymax = 2.5, 
            fill = "red", color = "black")+
  theme(legend.position = "bottom",plot.background = element_rect(fill = "darkgray"))+
  ggtitle("Servo Class by vgain/pgain/screw")

servo |> ggplot(
  aes(
    x = pgain,
    y = vgain,
    color = class,
    shape = motor
  )
) +
  geom_point() +
  geom_jitter()+
  scale_color_gradient2(
    low = "darkblue",
    mid = "turquoise",
    high = "darkred",
    midpoint = 3
  )+
  annotate(geom="rect", alpha = .08, xmin = 2.5, xmax = 3.5, ymin = .5, ymax = 2.5, 
           fill = "red", color = "black")+
  theme(legend.position = "bottom",plot.background = element_rect(fill = "darkgray"))+
  ggtitle("Servo Class by vgain/pgain/motor")


We see that higher class values (i.e. class values > 2) generally occur within a box of values in the plot. This box corresponds to points with a pgain of 3 and a vgain or 2 or 1. On the basis of this we create a new variable: pvclass.


servo$pvclass = 0
servo$pvclass[servo$pgain==3 & servo$vgain<2.5] = 1

reg1 = lm(class ~ factor(pvclass) * factor(motor) * factor(screw), data = servo)

summary(reg1)
## 
## Call:
## lm(formula = class ~ factor(pvclass) * factor(motor) * factor(screw), 
##     data = servo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4000 -0.1638 -0.0525  0.2000  1.4000 
## 
## Coefficients: (1 not defined because of singularities)
##                                                 Estimate Std. Error t value
## (Intercept)                                     0.856254   0.149687   5.720
## factor(pvclass)1                                4.643779   0.299374  15.512
## factor(motor)B                                 -0.025000   0.211689  -0.118
## factor(motor)C                                 -0.177500   0.189341  -0.937
## factor(motor)D                                 -0.275001   0.222022  -1.239
## factor(motor)E                                 -0.005002   0.222022  -0.023
## factor(screw)B                                 -0.132500   0.222022  -0.597
## factor(screw)C                                 -0.267500   0.222022  -1.205
## factor(screw)D                                 -0.290002   0.222022  -1.306
## factor(screw)E                                 -0.282503   0.222022  -1.272
## factor(pvclass)1:factor(motor)B                -0.175010   0.423379  -0.413
## factor(pvclass)1:factor(motor)C                -0.622537   0.412659  -1.509
## factor(pvclass)1:factor(motor)D                -2.625051   0.428638  -6.124
## factor(pvclass)1:factor(motor)E                 1.505072   0.428638   3.511
## factor(pvclass)1:factor(screw)B                -0.767542   0.428638  -1.791
## factor(pvclass)1:factor(screw)C                -0.932553   0.428638  -2.176
## factor(pvclass)1:factor(screw)D                -0.810049   0.428638  -1.890
## factor(pvclass)1:factor(screw)E                -1.017555   0.428638  -2.374
## factor(motor)B:factor(screw)B                  -0.020000   0.313986  -0.064
## factor(motor)C:factor(screw)B                   0.035000   0.299374   0.117
## factor(motor)D:factor(screw)B                   0.027500   0.321043   0.086
## factor(motor)E:factor(screw)B                   0.005000   0.321043   0.016
## factor(motor)B:factor(screw)C                  -0.027501   0.313986  -0.088
## factor(motor)C:factor(screw)C                   0.094999   0.299374   0.317
## factor(motor)D:factor(screw)C                   0.230001   0.458932   0.501
## factor(motor)E:factor(screw)C                  -0.182499   0.321043  -0.568
## factor(motor)B:factor(screw)D                  -0.004999   0.313986  -0.016
## factor(motor)C:factor(screw)D                   0.087501   0.299374   0.292
## factor(motor)D:factor(screw)D                   0.065001   0.458932   0.142
## factor(motor)E:factor(screw)D                  -0.064372   0.331346  -0.194
## factor(motor)B:factor(screw)E                  -0.027500   0.313986  -0.088
## factor(motor)C:factor(screw)E                   0.155001   0.299374   0.518
## factor(motor)D:factor(screw)E                  -0.599922   0.518531  -1.157
## factor(motor)E:factor(screw)E                  -0.090621   0.331346  -0.273
## factor(pvclass)1:factor(motor)B:factor(screw)B  0.020000   0.606186   0.033
## factor(pvclass)1:factor(motor)C:factor(screw)B  0.365022   0.598748   0.610
## factor(pvclass)1:factor(motor)D:factor(screw)B -0.227444   0.609871  -0.373
## factor(pvclass)1:factor(motor)E:factor(screw)B -4.405068   0.609871  -7.223
## factor(pvclass)1:factor(motor)B:factor(screw)C  0.127505   0.606186   0.210
## factor(pvclass)1:factor(motor)C:factor(screw)C -0.494958   0.598748  -0.827
## factor(pvclass)1:factor(motor)D:factor(screw)C -0.529929   0.692454  -0.765
## factor(pvclass)1:factor(motor)E:factor(screw)C -4.317553   0.609871  -7.079
## factor(pvclass)1:factor(motor)B:factor(screw)D -0.195007   0.606186  -0.322
## factor(pvclass)1:factor(motor)C:factor(screw)D -1.987453   0.598748  -3.319
## factor(pvclass)1:factor(motor)D:factor(screw)D -0.464933   0.692454  -0.671
## factor(pvclass)1:factor(motor)E:factor(screw)D -4.735681   0.615357  -7.696
## factor(pvclass)1:factor(motor)B:factor(screw)E  0.227516   0.606186   0.375
## factor(pvclass)1:factor(motor)C:factor(screw)E -0.854956   0.598748  -1.428
## factor(pvclass)1:factor(motor)D:factor(screw)E        NA         NA      NA
## factor(pvclass)1:factor(motor)E:factor(screw)E -4.709423   0.615357  -7.653
##                                                Pr(>|t|)    
## (Intercept)                                    8.19e-08 ***
## factor(pvclass)1                                < 2e-16 ***
## factor(motor)B                                 0.906192    
## factor(motor)C                                 0.350435    
## factor(motor)D                                 0.217944    
## factor(motor)E                                 0.982063    
## factor(screw)B                                 0.551792    
## factor(screw)C                                 0.230676    
## factor(screw)D                                 0.194029    
## factor(screw)E                                 0.205729    
## factor(pvclass)1:factor(motor)B                0.680090    
## factor(pvclass)1:factor(motor)C                0.134073    
## factor(pvclass)1:factor(motor)D                1.23e-08 ***
## factor(pvclass)1:factor(motor)E                0.000633 ***
## factor(pvclass)1:factor(screw)B                0.075913 .  
## factor(pvclass)1:factor(screw)C                0.031578 *  
## factor(pvclass)1:factor(screw)D                0.061236 .  
## factor(pvclass)1:factor(screw)E                0.019214 *  
## factor(motor)B:factor(screw)B                  0.949319    
## factor(motor)C:factor(screw)B                  0.907131    
## factor(motor)D:factor(screw)B                  0.931884    
## factor(motor)E:factor(screw)B                  0.987601    
## factor(motor)B:factor(screw)C                  0.930354    
## factor(motor)C:factor(screw)C                  0.751556    
## factor(motor)D:factor(screw)C                  0.617188    
## factor(motor)E:factor(screw)C                  0.570805    
## factor(motor)B:factor(screw)D                  0.987323    
## factor(motor)C:factor(screw)D                  0.770586    
## factor(motor)D:factor(screw)D                  0.887609    
## factor(motor)E:factor(screw)D                  0.846296    
## factor(motor)B:factor(screw)E                  0.930356    
## factor(motor)C:factor(screw)E                  0.605601    
## factor(motor)D:factor(screw)E                  0.249623    
## factor(motor)E:factor(screw)E                  0.784952    
## factor(pvclass)1:factor(motor)B:factor(screw)B 0.973735    
## factor(pvclass)1:factor(motor)C:factor(screw)B 0.543271    
## factor(pvclass)1:factor(motor)D:factor(screw)B 0.709864    
## factor(pvclass)1:factor(motor)E:factor(screw)B 5.40e-11 ***
## factor(pvclass)1:factor(motor)B:factor(screw)C 0.833765    
## factor(pvclass)1:factor(motor)C:factor(screw)C 0.410101    
## factor(pvclass)1:factor(motor)D:factor(screw)C 0.445626    
## factor(pvclass)1:factor(motor)E:factor(screw)C 1.12e-10 ***
## factor(pvclass)1:factor(motor)B:factor(screw)D 0.748253    
## factor(pvclass)1:factor(motor)C:factor(screw)D 0.001201 ** 
## factor(pvclass)1:factor(motor)D:factor(screw)D 0.503260    
## factor(pvclass)1:factor(motor)E:factor(screw)D 4.72e-12 ***
## factor(pvclass)1:factor(motor)B:factor(screw)E 0.708094    
## factor(pvclass)1:factor(motor)C:factor(screw)E 0.155960    
## factor(pvclass)1:factor(motor)D:factor(screw)E       NA    
## factor(pvclass)1:factor(motor)E:factor(screw)E 5.90e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3667 on 118 degrees of freedom
## Multiple R-squared:  0.9607, Adjusted R-squared:  0.9447 
## F-statistic: 60.12 on 48 and 118 DF,  p-value: < 2.2e-16


While this regression is also a mess, it is cleaner than the last iteration. By collapsing pgain and vgain into 1 categorical variable, it is easier to see which variables appear to be significantly impact class outcomes. The \(R^2\) value is improved and adjusted \(R^2\) is much improved as we’ve reduced on variable in the set of interactions.


To build our final regression model, we use the previous messy model to identify the most relevant predictors. On the basis of this we introduce new dummy variables. These encode the significant categories (to prediction) as binary variables. These new variables are then used to build a reduced model that focuses on the most important predictive components.


servo$motorD = 0
servo$motorD[servo$motor=='D'] = 1

servo$motorE = 0
servo$motorE[servo$motor=='E'] = 1

servo$screwA = 0
servo$screwA[servo$screw=='A'] = 1

servo$motorC = 0
servo$motorC[servo$motor=='C'] = 1

servo$motorEscrewA = 0
servo$motorEscrewA[servo$motor=='E' & servo$screw=='A'] = 1

servo$motorCscrewD = 0
servo$motorCscrewD[servo$motor=='C' & servo$screw=='D'] = 1

servo$motorCscrewE = 0
servo$motorCscrewE[servo$motor=='C' & servo$screw=='E'] = 1

servo$motorCscrewC = 0
servo$motorCscrewC[servo$motor=='C' & servo$screw=='C'] = 1

reg2 = lm(class ~ pvclass+screwA+pvclass:(motorD+motorE+motorC+screwA+motorEscrewA+motorCscrewD+motorCscrewE+motorCscrewC),data=servo)

summary(reg2)
## 
## Call:
## lm(formula = class ~ pvclass + screwA + pvclass:(motorD + motorE + 
##     motorC + screwA + motorEscrewA + motorCscrewD + motorCscrewE + 
##     motorCscrewC), data = servo)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.39999 -0.19677 -0.04676  0.21085  1.39999 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.55302    0.03817  14.488  < 2e-16 ***
## pvclass               3.73421    0.09221  40.496  < 2e-16 ***
## screwA                0.19933    0.07299   2.731 0.007040 ** 
## pvclass:motorD       -3.10999    0.13629 -22.819  < 2e-16 ***
## pvclass:motorE       -3.03723    0.15009 -20.237  < 2e-16 ***
## pvclass:motorC       -0.39415    0.19767  -1.994 0.047896 *  
## pvclass:screwA        0.91449    0.16333   5.599 9.49e-08 ***
## pvclass:motorEscrewA  4.63629    0.31425  14.754  < 2e-16 ***
## pvclass:motorCscrewD -2.19308    0.31339  -6.998 7.23e-11 ***
## pvclass:motorCscrewE -1.19309    0.31339  -3.807 0.000202 ***
## pvclass:motorCscrewC -0.79309    0.31339  -2.531 0.012376 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3519 on 156 degrees of freedom
## Multiple R-squared:  0.9522, Adjusted R-squared:  0.9491 
## F-statistic: 310.5 on 10 and 156 DF,  p-value: < 2.2e-16


This model tells us the following:

  • when pvclass is 1, class tends to be significantly higher (\(\beta=3.73\))
  • when screw is ‘A’, class tends to be a bit higher (\(\beta=.2\))
  • when screw is ‘A’, the effect of pvclass being 1 is greater (\(\beta=.91\)) and when motor is also ‘E’ there is a massive effect (\(\beta=4.64\))
  • when motor is ‘C’, ‘D’, or ‘E’, the effect of pvclass being 1 is diminished (interaction effects)
  • when motor is ‘C’ and screw is ‘C’, ‘D’, or ‘E’, the effect of pvclass being 1 is further diminished (interaction effects)


As a final assessment we look at the residual plots

library(car)
par(
  mfrow=c(1,2),
  bg ="darkgray"
)
hist(
  resid(reg2),
  main = "Histogram of Residuals",
  col = "green",
  border = "darkgreen",
  freq = FALSE
)
qqPlot(
  resid(reg2),
  main = "qq Plot",
  ylab = "sample quantiles"
)

servo[c(127,130),]
servo[c(49,124),]
## [1]  49 124
##     motor screw pgain vgain    class pvclass motorD motorE screwA motorC
## 127     C     C     3     1 1.899990       1      0      0      0      1
## 130     C     C     3     2 4.299977       1      0      0      0      1
##     motorEscrewA motorCscrewD motorCscrewE motorCscrewC
## 127            0            0            0            1
## 130            0            0            0            1
##     motor screw pgain vgain    class pvclass motorD motorE screwA motorC
## 49      C     E     3     2 4.099967       1      0      0      0      1
## 124     C     E     3     1 1.299997       1      0      0      0      1
##     motorEscrewA motorCscrewD motorCscrewE motorCscrewC
## 49             0            0            1            0
## 124            0            0            1            0


The histogram and residual plots look decent overall, with a couple outlier points on either side. A little further exploration yields the indices of these points (49,124,127,130). Examining we see the issue. The first two points below to motor ‘C’ and screw ‘C’ with pvclass of 1 while the last two points belong to motor ‘C’ and screw ‘E’ with pvclass of 1. As there are only two points in the dataset with these combinations of features, the model does not have much data to fit here. Furthermore the two points present have significantly different class values, as seen from the output above.


This artifact could be avoided by noting that both points differ on the exact vgain value. We could incorporate this into the model. But then we would only have 1 data point per case, which would create a strong possibility for overfitting. In general nonlinear prediction paradigms, such as decision tree models, may have somewhat stronger predictive power. However with such few cases in certain categories in this dataset, these models run a higher risk of overfitting. The most satisfactory solution here, if feasible, is to collect more data.


Despite having a high non-linear dataset with many possible categories of predictors and many possible interaction effects, we have been able to construct a linear model on a subset of predictors that is parsimonious and has strong explanatory power.