# Stanford University Statistical Learning Quiz Answer | Resampling Methods


In this article I am going to share the Stanford University Statistical Learning quiz answers for the Resampling Methods chapter.

## Cross-Validation Quiz

5.1.R1
When we fit a model to data, which is typically larger?
• Test Error
• Training Error
5.1.R2
What are reasons why test error could be LESS than training error?
• By chance, the test set has easier cases than the training set.
• The model is highly complex, so training error systematically overestimates test error
• The model is not very complex, so training error systematically overestimates test error
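
The two questions above can be explored with a small simulation. The following is a minimal sketch under stated assumptions: the data-generating model (`y = 1 + 2x` plus standard normal noise), the sample size, and the helper names `fit_line`, `mse`, `draw`, and `simulate` are all made up for illustration and are not part of the quiz. It fits a straight line to a training set, scores it on an independent test set, and repeats this many times.

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(a, b, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def draw(rng, n):
    """Synthetic data (an assumption): y = 1 + 2x + standard normal noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def simulate(n=20, reps=2000, seed=0):
    """Average training MSE, average test MSE, and share of runs with test < training."""
    rng = random.Random(seed)
    train_sum = test_sum = 0.0
    test_lower = 0
    for _ in range(reps):
        x_tr, y_tr = draw(rng, n)
        x_te, y_te = draw(rng, n)
        a, b = fit_line(x_tr, y_tr)
        tr, te = mse(a, b, x_tr, y_tr), mse(a, b, x_te, y_te)
        train_sum += tr
        test_sum += te
        test_lower += te < tr  # True counts as 1
    return train_sum / reps, test_sum / reps, test_lower / reps

avg_train, avg_test, share = simulate()
print(f"avg training MSE {avg_train:.3f}, avg test MSE {avg_test:.3f}")
print(f"share of runs where test MSE < training MSE: {share:.2f}")
```

On average the test error should come out larger, yet in a noticeable share of individual runs the test set happens to be easier and the test error is the smaller of the two.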

## K-Fold Cross Validation Quiz

5.2.R1
Suppose we want to use cross-validation to estimate the error of the following procedure:
Step 1: Find the k variables most correlated with y
Step 2: Fit a linear regression using those variables as predictors
We will estimate the error for each k from 1 to p, and then choose the best k.
True or false: a correct cross-validation procedure will possibly choose a different set of k variables for every fold.
• TRUE
• FALSE
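
To make the idea concrete, here is a rough sketch of Step 1 being repeated inside every training fold. The synthetic data (30 predictors, of which only the first two affect y) and the helper names `correlation`, `top_k_variables`, and `selections_per_fold` are assumptions for illustration only. Because each fold sees a different subset of rows, the k selected variables can change from fold to fold.

```python
import random

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def top_k_variables(X, y, rows, k):
    """Indices of the k columns most correlated (in absolute value) with y, using only `rows`."""
    ys = [y[i] for i in rows]
    scores = []
    for j in range(len(X[0])):
        xs = [X[i][j] for i in rows]
        scores.append((abs(correlation(xs, ys)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])

def selections_per_fold(X, y, k, n_folds, seed=0):
    """Run the selection step separately on each training fold."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    result = []
    for held_out in folds:
        held = set(held_out)
        train_rows = [i for i in idx if i not in held]
        result.append(top_k_variables(X, y, train_rows, k))
    return result

# Synthetic data (an assumption): only columns 0 and 1 carry signal.
rng = random.Random(1)
n, p = 60, 30
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + 0.5 * row[1] + rng.gauss(0, 1) for row in X]

chosen = selections_per_fold(X, y, k=5, n_folds=5)
for f, cols in enumerate(chosen):
    print(f"fold {f}: selected variables {cols}")
```

The strongly related variable tends to be picked in every fold, while the remaining slots are filled by whichever noise variables happen to look best on that fold's training rows.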

## Cross-Validation: the wrong and right way Quiz

5.3.R1
Suppose that we perform forward stepwise regression and use cross-validation to choose the best model size.
Using the full data set to choose the sequence of models is the WRONG way to do cross-validation (we need to redo the model selection step within each training fold). If we do cross-validation the WRONG way, which of the following is true?
• The selected model will probably be too complex
• The selected model will probably be too simple
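
The optimism of the wrong way can be illustrated with a simplified stand-in for forward stepwise selection: picking the single predictor most correlated with y. This is a sketch under assumptions, not the course's procedure. The data are pure noise (y is unrelated to every predictor), and the names `best_column`, `cv_mse`, and `compare` are hypothetical. Selecting on the full data lets the held-out rows influence the choice, so the cross-validation error looks better than it should.

```python
import random

def fit_line(xs, ys):
    """Closed-form least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def abs_corr(xs, ys):
    """Absolute Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5

def best_column(X, y, rows):
    """Column most correlated with y, judged on `rows` only."""
    ys = [y[i] for i in rows]
    return max(range(len(X[0])), key=lambda j: abs_corr([X[i][j] for i in rows], ys))

def cv_mse(X, y, folds, select_inside_fold):
    """Cross-validated MSE; selection is redone per fold only if select_inside_fold."""
    all_rows = [i for fold in folds for i in fold]
    full_data_choice = best_column(X, y, all_rows)  # sees the held-out rows too
    total = 0.0
    for test_rows in folds:
        held = set(test_rows)
        train_rows = [i for i in all_rows if i not in held]
        col = best_column(X, y, train_rows) if select_inside_fold else full_data_choice
        a, b = fit_line([X[i][col] for i in train_rows], [y[i] for i in train_rows])
        total += sum((y[i] - (a + b * X[i][col])) ** 2 for i in test_rows)
    return total / len(all_rows)

def compare(reps=10, n=50, p=200, n_folds=5, seed=0):
    """Average CV error of the wrong and right way over several noise data sets."""
    rng = random.Random(seed)
    wrong = right = 0.0
    for _ in range(reps):
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # pure noise: no predictor helps
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::n_folds] for f in range(n_folds)]
        wrong += cv_mse(X, y, folds, select_inside_fold=False)
        right += cv_mse(X, y, folds, select_inside_fold=True)
    return wrong / reps, right / reps

wrong, right = compare()
print(f"CV error, selection on full data (wrong): {wrong:.3f}")
print(f"CV error, selection inside folds (right): {right:.3f}")
```

Since no predictor is truly useful here, an honest error estimate should be no better than predicting the mean. The wrong way typically reports a lower error than the right way, which is the kind of optimism that makes adding selected variables look more helpful than it is.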

## The Bootstrap Quiz

5.4.R1
One way of carrying out the bootstrap is to average equally over all possible bootstrap samples from the original data set (where two bootstrap data sets are different if they have the same data points but in different order). Unlike the usual implementation of the bootstrap, this method has the advantage of not introducing extra noise due to resampling randomly. (You can use “^” to denote power, as in “n^2”)
To carry out this implementation on a data set with n data points, how many bootstrap data sets would we need to average over?
• n^n
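
The count follows because each of the n positions in a bootstrap sample can be filled by any of the n data points, and order matters. For a tiny data set this can be checked by brute force. The sketch below uses made-up values and the hypothetical helper names `all_bootstrap_samples` and `exact_bootstrap_average`; it is only practical for very small n, since n^n grows extremely fast.

```python
from itertools import product

def all_bootstrap_samples(data):
    """Every ordered sample of size n drawn with replacement from `data`."""
    return list(product(data, repeat=len(data)))

def exact_bootstrap_average(statistic, data):
    """Average a statistic equally over all n^n ordered bootstrap samples."""
    samples = all_bootstrap_samples(data)
    return sum(statistic(s) for s in samples) / len(samples)

def mean(values):
    return sum(values) / len(values)

data = [2.0, 5.0, 11.0]  # illustrative values only
samples = all_bootstrap_samples(data)
print(len(samples))  # 27, which is 3**3
print(exact_bootstrap_average(mean, data))
```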

## More on the Bootstrap Quiz

5.5.R1
If we have n data points, what is the probability that a given data point does not appear in a bootstrap sample?
• (1-1/n)^n
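
The reasoning is that each of the n draws misses the given point with probability 1 - 1/n, and the draws are independent, so all n miss it with probability (1 - 1/n)^n. As n grows this approaches 1/e, roughly 0.368. A minimal sketch that evaluates the formula and checks it by simulation (the function names and the number of trials are arbitrary choices for illustration):

```python
import math
import random

def prob_not_in_sample(n):
    """Exact probability that a given point is absent from a bootstrap sample of size n."""
    return (1 - 1 / n) ** n

def simulated_prob(n, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability for the point at index 0."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # The point is missed when none of the n draws lands on index 0.
        if all(rng.randrange(n) != 0 for _ in range(n)):
            misses += 1
    return misses / trials

for n in (5, 10, 100, 1000):
    print(n, round(prob_not_in_sample(n), 4))
print("limit 1/e:", round(math.exp(-1), 4))
print("simulated, n=10:", simulated_prob(10))
```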

## Resampling in R Quiz

5.R.R1
Download the file 5.R.RData and load it into R using load("5.R.RData"). Consider the linear regression model of y on X1 and X2. What is the standard error for $\hat{\beta}_1$?
• 0.02593
5.R.R2
Next, plot the data using matplot(Xy, type="l"). Which of the following do you think is most likely given what you see?
• Our estimate of $\mathrm{s.e.}(\hat{\beta}_1)$ is too high.
• Our estimate of $\mathrm{s.e.}(\hat{\beta}_1)$ is too low.
• Our estimate of $\mathrm{s.e.}(\hat{\beta}_1)$ is about right.
5.R.R3
Now, use the (standard) bootstrap to estimate $\mathrm{s.e.}(\hat{\beta}_1)$. To within 10%, what do you get?
• 0.0274
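
The standard bootstrap resamples whole rows with replacement, refits the regression on each resample, and takes the standard deviation of the refitted coefficients. The sketch below shows that mechanism only. It does not read 5.R.RData, so it will not reproduce the quiz answer: the data are synthetic and independent (an assumption), and `ols_two_predictors` and `bootstrap_se_b1` are hypothetical helper names.

```python
import random
import statistics

def ols_two_predictors(rows):
    """Least-squares slopes (b1, b2) for y ~ X1 + X2 with intercept; rows are (x1, x2, y)."""
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    s11 = sum((r[0] - m1) ** 2 for r in rows)
    s22 = sum((r[1] - m2) ** 2 for r in rows)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    s1y = sum((r[0] - m1) * (r[2] - my) for r in rows)
    s2y = sum((r[1] - m2) * (r[2] - my) for r in rows)
    det = s11 * s22 - s12 ** 2
    # Solve the centered 2x2 normal equations.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def bootstrap_se_b1(rows, n_boot=500, seed=0):
    """Standard (row-wise) bootstrap estimate of the standard error of b1."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(ols_two_predictors(resample)[0])
    return statistics.stdev(estimates)

# Synthetic i.i.d. data (an assumption, not the quiz data set).
rng = random.Random(42)
rows = []
for _ in range(200):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append((x1, x2, 0.5 * x1 - 0.3 * x2 + rng.gauss(0, 1)))

print("bootstrap s.e. of b1:", bootstrap_se_b1(rows))
```

Resampling individual rows treats the observations as independent, which is reasonable for this synthetic data but not for a time series with strong autocorrelation.
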
5.R.R4
Finally, use the block bootstrap to estimate $\mathrm{s.e.}(\hat{\beta}_1)$. Use blocks of 100 contiguous observations, and resample ten whole blocks with replacement, then paste them together to construct each bootstrap time series. For example, one of your bootstrap resamples could be:
`new.rows = c(101:200, 401:500, 101:200, 901:1000, 301:400, 1:100, 1:100, 801:900, 201:300, 701:800)`
`new.Xy = Xy[new.rows, ]`
To within 10%, what do you get?
• 0.2
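
The block construction described above can be sketched as follows. This is an illustration under assumptions and will not reproduce the quiz answer: instead of 5.R.RData it uses a synthetic series in which both the predictor and the noise follow an AR(1) process with coefficient 0.95, it fits a single predictor for brevity, and all function names are hypothetical. Note that the indices are 0-based here, unlike the 1-based R example. Setting `block_size=1` recovers the standard bootstrap, which allows a direct comparison.

```python
import random
import statistics

def block_bootstrap_rows(n, block_size, rng):
    """Row indices built by sampling whole contiguous blocks with replacement."""
    n_blocks = n // block_size
    rows = []
    for _ in range(n_blocks):
        start = rng.randrange(n_blocks) * block_size
        rows.extend(range(start, start + block_size))
    return rows

def slope(xs, ys):
    """Least-squares slope of y on x (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def ar1(n, rho, rng):
    """Autocorrelated series: each value is rho times the previous one plus noise."""
    values, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0, 1)
        values.append(prev)
    return values

def bootstrap_se(xs, ys, block_size, n_boot=200, seed=0):
    """Bootstrap s.e. of the slope; block_size=1 is the standard bootstrap."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        rows = block_bootstrap_rows(len(xs), block_size, rng)
        estimates.append(slope([xs[i] for i in rows], [ys[i] for i in rows]))
    return statistics.stdev(estimates)

# Synthetic autocorrelated data (an assumption, not the quiz data set).
rng = random.Random(7)
x = ar1(1000, 0.95, rng)
noise = ar1(1000, 0.95, rng)
y = [0.5 * xi + ei for xi, ei in zip(x, noise)]

print("standard bootstrap s.e.:", bootstrap_se(x, y, block_size=1))
print("block bootstrap s.e.:   ", bootstrap_se(x, y, block_size=100))
```

With strongly autocorrelated data the block version should give a clearly larger standard error, because keeping contiguous blocks preserves the dependence between neighboring observations that row-wise resampling destroys.
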
_____________________________________________________________

## Chapter 5 Quiz

5.Q.1
If we use ten-fold cross-validation as a means of model selection, the cross-validation estimate of test error is:
• biased upward
• biased downward
• unbiased
• potentially any of the above
5.Q.2
Why can’t we use the standard bootstrap for some time series data?
• The data points in most time series aren’t i.i.d.
• Some points will be used twice in the same sample
• The standard bootstrap doesn’t accurately mimic the real-world data-generating mechanism