Coursera Answers

Stanford University Statistical Learning Quiz Answer | Linear Model Selection and Regularization


In this article, I am going to share the Stanford University Statistical Learning Quiz Answers | Linear Model Selection and Regularization with you.


Introduction and Best-Subset Selection

6.1.R1
Which of the following modeling techniques performs Feature Selection?
  • Least Squares
  • Linear Discriminant Analysis
  • Linear Regression with Forward Selection
  • Support Vector Machines

Stepwise Selection Quiz

6.2.R1
We perform best subset and forward stepwise selection on a single dataset. For both approaches, we obtain p + 1 models, containing 0, 1, 2, ..., p predictors.
Which of the two models with k predictors is guaranteed to have training RSS no larger than the other model?
  • Best Subset
  • Forward Stepwise
  • They always have the same training RSS
  • Not enough information is given to know
6.2.R2
Which of the two models with k predictors has the smallest test RSS?
  • Best Subset
  • Forward Stepwise
  • They always have the same test RSS
  • Not enough information is given to know

Backward Stepwise Selection Quiz

6.3.R1
You are trying to fit a model and are given p=30 predictor variables to choose from. Ultimately, you want your model to be interpretable, so you decide to use Best Subset Selection.
How many different models will you end up considering?
  • 2^30
6.3.R2
How many would you fit using Forward Selection?
  • 1 + 30(30+1)/2 = 466
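
These counts are easy to check with a quick illustrative calculation in R (the numbers follow directly from the two formulas above):

```r
p <- 30

# Best subset selection considers every possible subset of the p predictors
# (including the null model), i.e. 2^p candidate models.
2^p                   # 1,073,741,824

# Forward stepwise fits the null model plus p - k candidates at each step
# k = 0, 1, ..., p - 1, for a total of 1 + p(p + 1)/2 models.
1 + p * (p + 1) / 2   # 466
```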

Estimating Test Error Quiz

6.4.R1
You are fitting a linear model to data assumed to have Gaussian errors. The model has up to p = 5 predictors and n = 100 observations. Which of the following is most likely true of the relationship between C_p and AIC in terms of using the statistic to select a number of predictors to include?
  • C_p will select the same model as AIC
  • C_p will select a model with more predictors than AIC
  • C_p will select a model with fewer predictors than AIC
  • Not enough information is given to decide
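
One way to see why the two criteria agree here: for a least squares model with Gaussian errors, $C_p$ and AIC are proportional to each other. In one common formulation (following the ISLR presentation, with $d$ the number of predictors in the model and $\hat\sigma^2$ an estimate of the error variance):

$$C_p = \frac{1}{n}\bigl(\mathrm{RSS} + 2d\hat\sigma^2\bigr), \qquad \mathrm{AIC} = \frac{1}{n\hat\sigma^2}\bigl(\mathrm{RSS} + 2d\hat\sigma^2\bigr),$$

so $\mathrm{AIC} = C_p / \hat\sigma^2$, and whichever model minimizes one criterion also minimizes the other.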



Validation and Cross-Validation

6.5.R1
You are doing a simulation in order to compare the effect of using Cross-Validation or a Validation set. For each iteration of the simulation, you generate new data and then use both Cross-Validation and a Validation set in order to determine the optimal number of predictors. Which of the following is most likely?
  • The Cross-Validation method will result in a higher variance of optimal number of predictors
  • The Validation set method will result in a higher variance of optimal number of predictors
  • Both methods will produce results with the same variance of optimal number of predictors
  • Not enough information is given to decide
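
A small simulation along these lines makes the intuition concrete. The sketch below is only illustrative: the sample sizes, the use of the lasso (via glmnet) as the selection procedure, and all object names are my own choices, not part of the quiz. It compares how variable the selected model size is when the tuning parameter is chosen by 10-fold cross-validation versus a single validation split:

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 20
n_sim <- 50
size_cv  <- numeric(n_sim)   # model size chosen via 10-fold cross-validation
size_val <- numeric(n_sim)   # model size chosen via a single validation split

for (s in 1:n_sim) {
  x <- matrix(rnorm(n * p), n, p)
  y <- x[, 1] + 0.5 * x[, 2] + rnorm(n)

  # Cross-validation: pick the lambda that minimizes the mean CV error.
  cv_fit <- cv.glmnet(x, y, alpha = 1)
  size_cv[s] <- sum(coef(cv_fit, s = "lambda.min")[-1, 1] != 0)

  # Validation set: fit on half the data, pick the lambda that minimizes
  # the error on the held-out half.
  train <- sample(n, n / 2)
  fit   <- glmnet(x[train, ], y[train], alpha = 1)
  pred  <- predict(fit, newx = x[-train, ])
  best  <- which.min(colMeans((y[-train] - pred)^2))
  size_val[s] <- sum(coef(fit)[-1, best] != 0)
}

var(size_cv); var(size_val)   # the validation-set choice tends to vary more
```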

Shrinkage Methods and Ridge Regression Quiz

6.6.R1
$\sqrt{\sum_{j=1}^p \beta_j^2}$ is equivalent to:
  • $X\hat\beta$
  • $\hat\beta^R$
  • $C_p$ statistic
  • $\|\beta\|_2$
6.6.R2
You perform ridge regression on a problem where your third predictor, x3, is measured in dollars. You decide to refit the model after changing x3 to be measured in cents. Which of the following is true?
  • $\hat\beta_3$ and $\hat y$ will remain the same.
  • $\hat\beta_3$ will change but $\hat y$ will remain the same.
  • $\hat\beta_3$ will remain the same but $\hat y$ will change.
  • $\hat\beta_3$ and $\hat y$ will both change.
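
A small numerical illustration of this (made-up data; the object names are my own): refit a ridge regression after converting x3 from dollars to cents and compare. Note that glmnet standardizes predictors by default, which would mask the effect, so standardize = FALSE is set here to match the plain ridge criterion the question has in mind:

```r
library(glmnet)

set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), n, 5)
y <- drop(x %*% c(1, -1, 2, 0, 0.5)) + rnorm(n)

# Ridge fit with x3 measured in "dollars".
fit_dollars <- glmnet(x, y, alpha = 0, lambda = 1, standardize = FALSE)

# Same data, but x3 rescaled to "cents" (multiplied by 100).
x_cents <- x
x_cents[, 3] <- 100 * x[, 3]
fit_cents <- glmnet(x_cents, y, alpha = 0, lambda = 1, standardize = FALSE)

# Both the coefficient on x3 and the fitted values change, because the
# ridge penalty is not invariant to rescaling a single predictor.
c(coef(fit_dollars)[4, 1], coef(fit_cents)[4, 1])
head(cbind(predict(fit_dollars, newx = x), predict(fit_cents, newx = x_cents)))
```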

The Lasso Quiz

6.7.R1
Which of the following is NOT a benefit of the sparsity imposed by the Lasso?
  • The Lasso does variable selection by default
  • Sparse models are generally easier to interpret
  • Using the Lasso penalty helps to decrease the bias of the fits
  • Using the Lasso penalty helps to decrease the variance of the fits
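
As a quick reminder of what that sparsity looks like in practice (a minimal sketch with made-up data), a lasso fit at a moderate penalty sets many coefficients exactly to zero, which is what gives the automatic variable selection and easier interpretation:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- x[, 1] + 2 * x[, 2] + rnorm(100)   # only the first two predictors matter

fit <- glmnet(x, y, alpha = 1)          # alpha = 1 gives the lasso penalty
coef(fit, s = 0.1)                      # most coefficients are exactly zero
```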

Tuning Parameter Selection Quiz

6.8.R1
Which of the following would be the worst metric to use to select $\lambda$ in the Lasso?
  • RSS
  • Cross-Validated error
  • Validation set error

Dimension Reduction Methods Quiz

6.9.R1
We compute the principal components of our p predictor variables. The RSS in a simple linear regression of Y onto the largest principal component will always be no larger than the RSS in a simple regression of Y onto the second largest principal component. True or False? (You may want to watch 6.10 as well before answering – sorry!)
  • True
  • False

Principal Components Regression and Partial Least Squares

6.10.R1
You are working on a regression problem with many variables, so you decide to do Principal Components Analysis first and then fit the regression to the first 2 principal components. Which of the following would you expect to happen?
  • A subset of the features will be selected
  • Model Bias will decrease relative to the full least squares model
  • Variance of fitted values will decrease relative to the full least squares model
  • Model interpretability will improve relative to the full least squares model
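
A bare-bones version of this workflow in base R (a sketch with made-up data; the pls package's pcr() would be the more standard route) looks like this:

```r
set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rowSums(x) + rnorm(n)

# Principal components of the predictors, then a regression of y on the
# first two components only.
pc <- prcomp(x, scale. = TRUE)
z  <- pc$x[, 1:2]
pcr_fit <- lm(y ~ z)

# Every original predictor contributes to each component, so no subset of
# features is selected; the fit trades a little extra bias for lower
# variance relative to the full least squares model.
summary(pcr_fit)
```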

Model Selection in R

6.R.R1
One of the functions in the glmnet package is cv.glmnet(). This function, like many functions in R, will return a list object that contains various outputs of interest. What is the name of the component that contains a vector of the mean cross-validated errors?
  • cvm
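
For reference, a minimal usage sketch (the data here are made up) showing where the cvm component sits in the object returned by cv.glmnet():

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 10), 100, 10)
y <- rnorm(100)

cv_fit <- cv.glmnet(x, y)

cv_fit$cvm          # mean cross-validated error, one value per lambda
cv_fit$lambda.min   # lambda with the smallest mean CV error
plot(cv_fit)        # CV error curve with standard-error bars
```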

Chapter 6 Quiz

Question 1)
Suppose we estimate the regression coefficients in a linear regression model by minimizing $\displaystyle\sum_{i=1}^n \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^p \beta_j^2$ for a particular value of $\lambda$. For each of the following, select the correct answer:
As we increase $\lambda$ from 0, the training RSS will:
  • Steadily Increase
Question 2)
As we increase $\lambda$ from 0, the test RSS will:
  • Decrease initially, and then eventually start increasing in a U shape
Question 3)
As we increase $\lambda$ from 0, the variance will:
  • Steadily Decrease
Question 4)
As we increase $\lambda$ from 0, the (squared) bias will:
  • Steadily Increase
Question 5)
As we increase $\lambda$ from 0, the irreducible error will:
  • Remain constant
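
The training RSS answer above is easy to verify empirically. The sketch below (made-up data; glmnet with alpha = 0 for the ridge fits) computes the training RSS along a grid of $\lambda$ values and shows it increasing steadily as the penalty grows:

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% rnorm(p)) + rnorm(n)

lambdas <- 10^seq(-3, 3, length.out = 50)
fit <- glmnet(x, y, alpha = 0, lambda = lambdas)   # ridge regression path

# Training RSS at each lambda: larger penalties shrink the coefficients
# further away from the least squares solution toward zero, so the fit to
# the training data steadily worsens.
pred <- predict(fit, newx = x)
train_rss <- colSums((y - pred)^2)
plot(fit$lambda, train_rss, log = "x", type = "b",
     xlab = "lambda", ylab = "training RSS")
```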