
Stanford University Statistical Learning Quiz Answer | Unsupervised Learning

In this article, I am going to share with you the answers to the Stanford University Statistical Learning quiz on Unsupervised Learning.

Principal Component Quiz


You are analyzing a dataset where each observation records the age, height, length, and width of a particular turtle. You want to know whether the data can be well described in fewer than four dimensions (maybe for plotting), so you decide to do Principal Component Analysis. Which of the following are most likely to be the loadings of the first principal component?
  • (1, 1, 1, 1)
  • (.5, .5, .5, .5)
  • (.71, -.71, 0, 0)
  • (1, -1, -1, -1)
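Why equal loadings are plausible: if all four measurements tend to grow together, the first principal component weights them roughly equally. Loading vectors are also scaled to unit length. The sketch below illustrates this under a made-up assumption that every pair of variables has correlation 0.8; real turtle data would not be this tidy. It finds the top eigenvector of that correlation matrix by power iteration rather than calling a library eigen-solver. The correlation matrix is the one PCA uses on scaled variables. The helper name `top_eigenvector` is hypothetical.

```python
import math

def top_eigenvector(matrix, iterations=200):
    """Hypothetical helper: power iteration, renormalizing to unit length each step."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)  # arbitrary nonzero starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up correlation matrix: age, height, length and width all correlated at 0.8.
rho = 0.8
corr = [[1.0 if i == j else rho for j in range(4)] for i in range(4)]
print([round(x, 4) for x in top_eigenvector(corr)])  # [0.5, 0.5, 0.5, 0.5]

# Loading vectors have unit length; compare the lengths of the answer options.
options = [(1, 1, 1, 1), (0.5, 0.5, 0.5, 0.5), (0.71, -0.71, 0, 0), (1, -1, -1, -1)]
print([round(math.sqrt(sum(x * x for x in o)), 3) for o in options])
```

Only (.5, .5, .5, .5) and (.71, -.71, 0, 0) have (approximately) unit length. The second one contrasts age with height, which is not what a "bigger turtle" direction would look like.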

Higher Order Principal Component Quiz


Suppose we have a data set where each data point represents a single student’s scores on a math test, a physics test, a reading comprehension test, and a vocabulary test.

We find the first two principal components, which capture 90% of the variability in the data, and interpret their loadings. We conclude that the first principal component represents overall academic ability, and the second represents a contrast between quantitative ability and verbal ability.

What loadings would be consistent with that interpretation? Choose all that apply.
  • (0.5, 0.5, 0.5, 0.5) and (0.71, 0.71, 0, 0)
  • (0.5, 0.5, 0.5, 0.5) and (0, 0, -0.71, -0.71)
  • (0.5, 0.5, 0.5, 0.5) and (0.5, 0.5, -0.5, -0.5)
  • (0.5, 0.5, 0.5, 0.5) and (-0.5, -0.5, 0.5, 0.5)
  • (0.71, 0.71, 0, 0) and (0, 0, 0.71, 0.71)
  • (0.71, 0, -0.71, 0) and (0, 0.71, 0, -0.71)
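One way to check the options is to encode the stated interpretation directly. The rules are:

- the two loading vectors must be orthogonal;
- "overall ability" means equal weights on all four tests;
- a quantitative-vs-verbal contrast means math and physics share one sign while reading and vocabulary share the opposite sign.

A loading vector is only defined up to sign, so a contrast and its negation describe the same component. The rules and helper names below are a hypothetical sketch of that reasoning, not an official grading rule.

```python
def dot(u, v):
    """Inner product of two loading vectors."""
    return sum(a * b for a, b in zip(u, v))

def is_quant_vs_verbal_contrast(pc):
    """Hypothetical rule: math/physics share one sign, reading/vocabulary the opposite sign."""
    quant, verbal = pc[:2], pc[2:]
    positive_quant = all(x > 0 for x in quant) and all(x < 0 for x in verbal)
    negative_quant = all(x < 0 for x in quant) and all(x > 0 for x in verbal)
    return positive_quant or negative_quant

def consistent(pc1, pc2):
    """Hypothetical check: orthogonal PCs, equal weights in PC1, quant-vs-verbal PC2."""
    orthogonal = abs(dot(pc1, pc2)) < 1e-9
    overall_ability = len(set(pc1)) == 1  # the same weight on all four tests
    return orthogonal and overall_ability and is_quant_vs_verbal_contrast(pc2)

# The six answer options, in the order listed above (tests: math, physics, reading, vocabulary).
options = [
    ((0.5, 0.5, 0.5, 0.5), (0.71, 0.71, 0, 0)),
    ((0.5, 0.5, 0.5, 0.5), (0, 0, -0.71, -0.71)),
    ((0.5, 0.5, 0.5, 0.5), (0.5, 0.5, -0.5, -0.5)),
    ((0.5, 0.5, 0.5, 0.5), (-0.5, -0.5, 0.5, 0.5)),
    ((0.71, 0.71, 0, 0), (0, 0, 0.71, 0.71)),
    ((0.71, 0, -0.71, 0), (0, 0.71, 0, -0.71)),
]

print([n for n, (pc1, pc2) in enumerate(options, start=1) if consistent(pc1, pc2)])  # [3, 4]
```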

K-Means Clustering Quiz


True or False: If we use k-means clustering, we will get the same cluster assignments for each point whether or not we standardize the variables.
  • True
  • False
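Standardizing changes the Euclidean distances that k-means works with, so it can change which assignment minimizes the within-cluster sum of squares. The sketch below keeps the demo deterministic by swapping Lloyd's iterative k-means for an exhaustive search over all 2^n assignments. That search optimizes the same objective but is only feasible for tiny n. The data and the helper names (`within_cluster_ss`, `best_partition`, `standardize`) are hypothetical. In the toy data, feature 1 has two tight groups while feature 2 has a much larger raw spread.

```python
from itertools import product
from statistics import mean, stdev

def within_cluster_ss(points, labels, k):
    """Hypothetical helper: sum over clusters of squared distances to the cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [mean(col) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members)
    return total

def best_partition(points, k):
    """Hypothetical helper: try all k**n label vectors and keep the best one."""
    best = min(product(range(k), repeat=len(points)),
               key=lambda labels: within_cluster_ss(points, labels, k))
    # Return a set of clusters (sets of point indices) so label switching doesn't matter.
    return {frozenset(i for i, lab in enumerate(best) if lab == c)
            for c in range(k)} - {frozenset()}

def standardize(points):
    """Hypothetical helper: center each column and divide by its sample standard deviation."""
    cols = list(zip(*points))
    scaled = [[(x - mean(col)) / stdev(col) for x in col] for col in cols]
    return [tuple(row) for row in zip(*scaled)]

# Made-up data: feature 1 splits into two tight groups; feature 2 has a large raw scale.
data = [(0, 0), (0, 50), (0, 100), (1, 0), (1, 50), (1, 100)]

raw = best_partition(data, 2)
scaled = best_partition(standardize(data), 2)
print(raw)
print(scaled)
print(raw == scaled)  # False
```

On the raw data, feature 2 dominates the distances and the best split is along feature 2. After standardizing, the best split separates the two feature-1 groups.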

Hierarchical Clustering Quiz


True or False: If we cut the dendrogram at a lower point, we will tend to get more clusters (and cannot get fewer clusters).
  • True
  • False
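When a dendrogram is cut at height h, every merge below h has already happened. The number of clusters is therefore n minus the number of merge heights at or below h, and lowering h can only remove merges from that count. The sketch below illustrates this with complete linkage, whose merge heights never decrease. It uses made-up one-dimensional data, and the helper names are hypothetical.

```python
def complete_linkage_heights(points):
    """Hypothetical helper: agglomerative clustering of 1-D points; returns merge heights."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Find the pair of clusters whose farthest members are closest (complete linkage).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return heights

def clusters_at_cut(heights, n, h):
    """Hypothetical helper: number of clusters after cutting the dendrogram at height h."""
    return n - sum(1 for m in heights if m <= h)

# Made-up data.
points = [1.0, 2.0, 4.0, 8.0, 9.0, 15.0]
heights = complete_linkage_heights(points)
for h in (20, 10, 5, 2, 0.5):
    print(h, clusters_at_cut(heights, len(points), h))  # counts 1, 2, 3, 4, 6
```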

Breast Cancer Example Quiz


In the heat map for the breast cancer data, which of the following depended on the output of hierarchical clustering?
  • The ordering of the rows
  • The ordering of the columns
  • The coloring of the cells as red or green

Unsupervised Learning in R Quiz


Suppose we want to fit a linear regression, but the number of variables is much larger than the number of observations. In some cases, we may improve the fit by reducing the dimension of the features before fitting the model.

In this problem, we use a data set with n = 300 and p = 200, so we have more observations than variables, but not by much. Load the data x, y, x.test, and y.test from 10.R.RData.

First, concatenate x and x.test using the rbind function and perform a principal components analysis on the concatenated data frame (use the “scale=TRUE” option). To within 10% relative error, what proportion of the variance is explained by the first five principal components?
  • 0.3498565
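The proportion of variance explained by the first m components is the sum of their variances divided by the total. With scale=TRUE every feature has variance 1, so the total is just the number of features (here 200). In R, `summary()` of a `prcomp` object reports this value in its "Cumulative Proportion" row. The sketch below shows the arithmetic on hypothetical standard deviations, not the ones from 10.R.RData. It also includes a hypothetical helper for the quiz's 10% relative-error tolerance.

```python
def cumulative_pve(sdev, m):
    """Hypothetical helper: proportion of variance explained by the first m components.

    sdev holds the standard deviations of all components (what prcomp calls `sdev`),
    so each component's variance is sdev**2.
    """
    variances = [s ** 2 for s in sdev]
    return sum(variances[:m]) / sum(variances)

def within_relative_error(estimate, truth, tol=0.10):
    """Hypothetical helper: True when |estimate - truth| / |truth| <= tol."""
    return abs(estimate - truth) <= tol * abs(truth)

# Made-up standard deviations for illustration only.
sdev = [3.0, 2.0, 1.0, 1.0, 1.0]
print(cumulative_pve(sdev, 2))  # (9 + 4) / 16 = 0.8125
print(within_relative_error(0.36, 0.3498565))
```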


The previous answer suggests that a relatively small number of “latent variables” account for a substantial fraction of the features’ variability. We might believe that these latent variables are more important than linear combinations of the features that have low variance.

We can try forgetting about the raw features and using the first five principal components (computed on rbind(x,x.test)) instead as low-dimensional derived features. What is the mean-squared test error if we regress y on the first five principal components, and use the resulting model to predict y.test?
  • 0.9923


Now, try an OLS linear regression of y on the matrix x. What is the mean squared prediction error if we use the fitted model to predict y.test from x.test?
  • 3.90714
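For reference, the quantity asked for can be written as follows. This assumes the regression includes an intercept, as R's `lm` does by default. Here $X$ is the training design matrix with a leading column of ones, and $\tilde{x}_i$ is the $i$-th row of x.test with a leading 1:

```latex
\hat{\beta} = (X^\top X)^{-1} X^\top y,
\qquad
\mathrm{MSE}_{\text{test}} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \bigl(y^{\text{test}}_i - \tilde{x}_i^\top \hat{\beta}\bigr)^2
```

With n = 300 and p = 200, the fit estimates 201 coefficients and leaves only 99 residual degrees of freedom, so the estimates are highly variable. That is a plausible reason its test error is much larger than that of the five-component regression.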

Chapter 10 Quiz


K-Means is a seemingly complicated clustering algorithm. Here is a simpler one:

Given k, the number of clusters, and n, the number of observations, try all possible assignments of the n observations into k clusters. Then, select one of the assignments that minimizes Within-Cluster Variation as defined on page 30.
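For reference, ISLR defines the within-cluster variation of a cluster $C_\ell$ as shown below. We assume this matches the definition on the slide cited. Here $p$ is the number of features and $\bar{x}_{\ell j}$ is the mean of feature $j$ in $C_\ell$. The brute-force algorithm looks for the assignment with the smallest total:

```latex
W(C_\ell) = \frac{1}{|C_\ell|} \sum_{i, i' \in C_\ell} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2
          = 2 \sum_{i \in C_\ell} \sum_{j=1}^{p} (x_{ij} - \bar{x}_{\ell j})^2,
\qquad
\min_{C_1, \dots, C_k} \sum_{\ell=1}^{k} W(C_\ell)
```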

Assume that you implemented the most naive version of the above algorithm. Here, by naive we mean that you try all possible assignments even though some of them might be redundant (for example, the algorithm tries assigning all of the observations to cluster 1 and it also tries to assign them all to cluster 2 even though those are effectively the same solution).

In terms of n and k, how many potential solutions will your algorithm try?
  • k^n
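The naive algorithm picks one of k labels for each of the n observations independently, which gives k × k × … × k = k^n label vectors. The sketch below confirms this by enumeration. It also counts how many of those vectors are genuinely different groupings once cluster labels are ignored; that number is the sum of the Stirling numbers of the second kind S(n, j) for j ≤ k. Both function names are hypothetical.

```python
from itertools import product

def count_assignments(n, k):
    """Hypothetical helper: count every label vector the naive algorithm tries."""
    return sum(1 for _ in product(range(k), repeat=n))

def count_distinct_partitions(n, k):
    """Hypothetical helper: count assignments that differ as groupings, ignoring label names."""
    seen = set()
    for labels in product(range(k), repeat=n):
        # Relabel clusters in order of first appearance, e.g. (1, 1, 0) -> (0, 0, 1).
        mapping = {}
        canonical = tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)
        seen.add(canonical)
    return len(seen)

for n, k in [(3, 2), (4, 3), (6, 2)]:
    print(n, k, count_assignments(n, k), k ** n, count_distinct_partitions(n, k))
```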