# Stanford University Statistical Learning Quiz Answer | Linear Model Selection and Regularization


In this article I am going to share the answers to the Stanford University Statistical Learning quiz on Linear Model Selection and Regularization.

## Introduction and Best-Subset Selection

6.1.R1

**Which of the following modeling techniques performs Feature Selection?**

- Least Squares
- Linear Discriminant Analysis
- Linear Regression with Forward Selection
- Support Vector Machines

## Stepwise Selection Quiz

6.2.R1

**We perform best subset and forward stepwise selection on a single dataset. For both approaches, we obtain models containing k predictors.**

Which of the two models with k predictors is guaranteed to have training RSS no larger than the other model?

- Best Subset
- Forward Stepwise
- They always have the same training RSS
- Not enough information is given to know

6.2.R2

**Which of the two models with k predictors has the smallest test RSS?**

- Best Subset
- Forward Stepwise
- They always have the same test RSS
- Not enough information is given to know

## Backward Stepwise Selection Quiz

6.3.R1

**You are trying to fit a model and are given p=30 predictor variables to choose from. Ultimately, you want your model to be interpretable, so you decide to use Best Subset Selection.**

How many different models will you end up considering?

- 2^30

6.3.R2

**How many models would you fit using Forward Selection?**

- 1 + 30(30+1)/2 = 466
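The two counts above can be checked with a quick computation (a sketch of the standard model-counting formulas, not part of the quiz itself):

```python
# Count the candidate models each selection method fits when p = 30.
p = 30

# Best subset selection considers every possible subset of the p predictors.
best_subset_models = 2 ** p

# Forward stepwise fits the null model, then p candidates at step 1,
# p - 1 at step 2, and so on: 1 + p(p+1)/2 models in total.
forward_stepwise_models = 1 + p * (p + 1) // 2

print(best_subset_models)       # 1073741824
print(forward_stepwise_models)  # 466
```

This is why best subset becomes computationally infeasible long before forward stepwise does.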

## Estimating Test Error Quiz

6.4.R1

**You are fitting a linear model to data assumed to have Gaussian errors. The model has up to p = 5 predictors and n = 100 observations. Which of the following is most likely true of the relationship between C_p and AIC in terms of using the statistic to select a number of predictors to include?**

- C_p will select the same model as AIC
- C_p will select a model with more predictors than AIC
- C_p will select a model with fewer predictors than AIC
- Not enough information is given to decide
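One way to see the relationship: for a least-squares model with Gaussian errors, the two criteria can be written (with $d$ predictors and $\hat\sigma^2$ an estimate of the error variance) as

$$
C_p = \frac{1}{n}\left(\mathrm{RSS} + 2d\hat\sigma^2\right),
\qquad
\mathrm{AIC} = \frac{1}{n\hat\sigma^2}\left(\mathrm{RSS} + 2d\hat\sigma^2\right),
$$

so they differ only by the constant factor $\hat\sigma^2$ and are therefore minimized by the same model.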

## Validation and Cross-Validation

6.5.R1

**You are doing a simulation in order to compare the effect of using Cross-Validation or a Validation set. For each iteration of the simulation, you generate new data and then use both Cross-Validation and a Validation set in order to determine the optimal number of predictors. Which of the following is most likely?**

- The Cross-Validation method will result in a higher variance of optimal number of predictors

- The Validation set method will result in a higher variance of optimal number of predictors

- Both methods will produce results with the same variance of optimal number of predictors

- Not enough information is given to decide

## Shrinkage Methods and Ridge Regression Quiz

6.6.R1

**$\sqrt{\sum_{j=1}^p \beta_j^2}$ is equivalent to:**

- $X\hat\beta$
- $\hat\beta^R$
- $C_p$ statistic
- $\|\beta\|_2$

6.6.R2

**You perform ridge regression on a problem where your third predictor, $x_3$, is measured in dollars. You decide to refit the model after changing $x_3$ to be measured in cents. Which of the following is true?**

- $\hat\beta_3$ and $\hat y$ will remain the same.
- $\hat\beta_3$ will change but $\hat y$ will remain the same.
- $\hat\beta_3$ will remain the same but $\hat y$ will change.
- $\hat\beta_3$ and $\hat y$ will both change.
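A minimal numerical sketch of why scaling matters for ridge (my own construction using the closed-form ridge solution, without an intercept or standardization; not the course's code):

```python
import numpy as np

# Closed-form ridge without intercept or standardization:
# beta_hat = (X'X + lam*I)^{-1} X'y
def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, lam = 50, 1.0
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

beta_dollars = ridge(X, y, lam)

# Re-express x3 in cents: multiply the column by 100.
X_cents = X.copy()
X_cents[:, 2] *= 100
beta_cents = ridge(X_cents, y, lam)

# Unlike least squares, ridge does not rescale beta_3 by exactly 1/100,
# and the fitted values change as well.
print(beta_dollars[2], 100 * beta_cents[2])
print(np.allclose(X @ beta_dollars, X_cents @ beta_cents))
```

With $\lambda = 0$ (plain least squares), the rescaling would be absorbed exactly by $\hat\beta_3$ and $\hat y$ would not change; the penalty breaks that invariance, which is why predictors are usually standardized before ridge regression.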

## The Lasso Quiz

6.7.R1

**Which of the following is NOT a benefit of the sparsity imposed by the Lasso?**

- The Lasso does variable selection by default
- Sparse models are generally easier to interpret
- Using the Lasso penalty helps to decrease the bias of the fits
- Using the Lasso penalty helps to decrease the variance of the fits
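The variable-selection property can be illustrated with the orthonormal-design special case, where the lasso solution is soft-thresholding of the least-squares coefficients (a standard textbook sketch, not code from the course):

```python
import numpy as np

# Soft-thresholding: the lasso solution when the design matrix is
# orthonormal. Coefficients within lam of zero become exactly zero.
def soft_threshold(b, lam):
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

ols_coefs = np.array([3.0, -0.4, 0.1, -2.5])
lasso_coefs = soft_threshold(ols_coefs, 0.5)
print(lasso_coefs)  # the two small coefficients are set exactly to zero
```

Ridge in the same setting would shrink every coefficient by the factor $1/(1+\lambda)$ but never set any of them exactly to zero.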

## Tuning Parameter Selection Quiz

6.8.R1

**Which of the following would be the worst metric to use to select $\lambda$ in the Lasso?**

- RSS
- Cross-Validated error
- Validation set error

## Dimension Reduction Methods Quiz

6.9.R1

**We compute the principal components of our p predictor variables. The RSS in a simple linear regression of Y onto the largest principal component will always be no larger than the RSS in a simple regression of Y onto the second largest principal component. True or False? (You may want to watch 6.10 as well before answering – sorry!)**

- True
- False
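The question hinges on the fact that principal components are chosen to explain variance in X, not to predict Y. A small counterexample (my own construction, not from the course), where Y is aligned with the second component:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
z1 = rng.normal(scale=10.0, size=n)  # high-variance direction -> PC1
z2 = rng.normal(scale=1.0, size=n)   # low-variance direction  -> PC2
X = np.column_stack([z1, z2])
y = z2 + 0.1 * rng.normal(size=n)    # y depends on the PC2 direction

# Principal component scores via SVD of the centered X.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]
pc2 = Xc @ Vt[1]

def rss(x, y):
    # Simple linear regression of y onto x (with intercept).
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ beta) ** 2)

print(rss(pc1, y) > rss(pc2, y))  # True: PC1 gives the LARGER RSS here
```

Because PC1 captures the variance of $z_1$, which is nearly unrelated to $y$, regressing on PC1 fits worse than regressing on PC2.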

## Principal Components Regression and Partial Least Squares

6.10.R1

**You are working on a regression problem with many variables, so you decide to do Principal Components Analysis first and then fit the regression to the first 2 principal components. Which of the following would you expect to happen?**

- A subset of the features will be selected
- Model Bias will decrease relative to the full least squares model
- Variance of fitted values will decrease relative to the full least squares model
- Model interpretability will improve relative to the full least squares model

## Model Selection in R

6.R.R1

**One of the functions in the glmnet package is cv.glmnet(). This function, like many functions in R, will return a list object that contains various outputs of interest. What is the name of the component that contains a vector of the mean cross-validated errors?**

- cvm

## Chapter 6 Quiz

**Question 1)**

**Suppose we estimate the regression coefficients in a linear regression model by minimizing $\sum_{i=1}^n \left(y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij}\right)^2 + \lambda \sum_{j=1}^p \beta_j^2$ for a particular value of $\lambda$. For each of the following, select the correct answer:**

*As we increase $\lambda$ from 0, the training RSS will:*

- Steadily Increase
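This can be checked numerically with closed-form ridge fits at increasing $\lambda$ (a sketch on my own simulated data, not course code): since $\lambda = 0$ gives the least-squares fit, which minimizes training RSS by definition, any larger $\lambda$ can only fit the training data worse.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

def train_rss(lam):
    # Closed-form ridge (no intercept): beta = (X'X + lam*I)^{-1} X'y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return np.sum((y - X @ beta) ** 2)

rss_values = [train_rss(lam) for lam in [0.0, 1.0, 10.0, 100.0]]
print(all(a <= b for a, b in zip(rss_values, rss_values[1:])))  # True
```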

**Question 2)**

**As we increase $\lambda$ from 0, the test RSS will:**

- Decrease initially, and then eventually start increasing in a U shape

**Question 3)**

**As we increase $\lambda$ from 0, the variance will:**

- Steadily Decrease

**Question 4)**

**As we increase $\lambda$ from 0, the (squared) bias will:**

- Steadily Increase

**Question 5)**

**As we increase $\lambda$ from 0, the irreducible error will:**

- Remain constant