Practice exam covering Course 4: Perform data science with Azure Databricks Quiz Answers
In this article, I am going to share the Prepare for DP-100: Data Science on Microsoft Azure Exam | Week 5 | Practice exam covering Course 4: Perform data science with Azure Databricks quiz answers with you.
Practice exam covering Course 4: Perform data science with Azure Databricks Quiz Answers
Question 1)
You have an AirBnB housing dataframe which you preprocessed and filtered down to only the relevant columns.
The columns are: id, host_name, bedrooms, neighbourhood_cleansed, price.
You’ve written the function below, named firstInitialFunction, which returns the first initial from the host_name column:
def firstInitialFunction(name):
    return name[0]

firstInitialFunction("George")
You now want to create a UDF from this function using spark.udf.register so that the UDF is available in the SQL namespace.
How would you code that?
- airbnbDF.replaceTempView("airbnbDF")
spark.udf.register("sql_udf", firstInitialFunction)
- airbnbDF.createTempView("airbnbDF")
spark.udf.register(sql_udf = firstInitialFunction)
- airbnbDF.createAndReplaceTempView("airbnbDF")
spark.udf.register(sql_udf.firstInitialFunction)
- airbnbDF.createOrReplaceTempView("airbnbDF")
spark.udf.register("sql_udf", firstInitialFunction)
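For context, here is a minimal sketch of how the registered SQL UDF could then be used in a query (assuming a SparkSession named spark and the airbnbDF DataFrame from the question):
airbnbDF.createOrReplaceTempView("airbnbDF")
spark.udf.register("sql_udf", firstInitialFunction)

# The UDF is now callable from SQL against the temporary view
display(spark.sql("SELECT id, host_name, sql_udf(host_name) AS firstInitial FROM airbnbDF"))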
Question 2)
You have a Boston Housing dataset from which you want to train a model to predict the value of housing based on one or more input measures.
You are using the Spark ML framework to train the model on a single column that contains a vector of all the relevant features.
You must prepare the data by creating one column named features that has the average number of rooms, age and tax rate.
You want to use VectorAssembler for this task.
How would you code this?
- from pyspark.ml.feature import VectorAssembler
featureCols = ["rm", "age", "tax"]
assembler = VectorAssembler(inputCols=featureCols, outputCol="features")
bostonFeaturizedDF = assembler.transform(bostonDF)
display(bostonFeaturizedDF)
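As a quick sanity check, a sketch like the following (assuming bostonDF also contains the medv label column) shows the assembled vector next to the label:
# Inspect the new features vector alongside the label column
bostonFeaturizedDF.select("features", "medv").show(5, truncate=False)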
Question 3)
You are using MLflow to track the runs of a Linear Regression model on an AirBnB dataset.
You want to use all the features in the dataset.
You’ve created the pipeline, logged the pipeline, and logged the parameters.
Now you need to create predictions and metrics.
How should you code that?
- predDF = pipelineModel.transform(testDF)
regressionEvaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction")
rmse = regressionEvaluator.setMetricName("rmse").evaluate(predDF)
r2 = regressionEvaluator.setMetricName("r2").evaluate(predDF)
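To see where this fits, here is a minimal sketch of the full run with the metrics logged back to MLflow (assuming pipeline, trainDF, and testDF already exist; the run name is illustrative):
import mlflow
import mlflow.spark
from pyspark.ml.evaluation import RegressionEvaluator

with mlflow.start_run(run_name="lr-all-features") as run:
    # Fit and log the pipeline
    pipelineModel = pipeline.fit(trainDF)
    mlflow.spark.log_model(pipelineModel, "model")

    # Create predictions and metrics
    predDF = pipelineModel.transform(testDF)
    regressionEvaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction")
    rmse = regressionEvaluator.setMetricName("rmse").evaluate(predDF)
    r2 = regressionEvaluator.setMetricName("r2").evaluate(predDF)

    # Log the metrics to the run
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)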
Question 4)
You are running Python code interactively in a Conda environment. The environment includes all required Azure Machine Learning SDK and MLflow packages.
You must use MLflow to log metrics in an Azure Machine Learning experiment named mlflow-experiment.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
#1 Set the MLflow logging target
#2 Configure the experiment
with #3 Begin the experiment run
    #4 Log my_metric with value 1.00 ('my_metric', 1.00)
print("Finished!")
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.set_experiment('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.tracking.client = ws, #2 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #3 mlflow.active_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 run.log()
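Filling the skeleton with the first option above gives a complete snippet along these lines:
import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())   #1 Set the MLflow logging target
mlflow.set_experiment('mlflow-experiment')              #2 Configure the experiment
with mlflow.start_run():                                 #3 Begin the experiment run
    mlflow.log_metric('my_metric', 1.00)                 #4 Log my_metric with value 1.00
print("Finished!")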
Question 5)
You create machine learning models by using Azure Machine Learning. You plan to train and score models by using a variety of compute contexts.
You also plan to train models by using an Azure Databricks cluster.
Which compute type can you use for Azure Databricks?
- Compute cluster
- Attached compute
- Inference cluster
Question 6)
You deploy a deep learning model to Azure Container Instances.
You must use the Azure Machine Learning SDK to call the model API. You need to invoke the deployed model using native SDK classes and methods.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
from azureml.core import Workspace
#1st code option
import json
ws = Workspace.from_config()
service_name = "mlmodel1-service"
service = Webservice(name=service_name, workspace=ws)
x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({"data": x_new})
#2nd code option
- from azureml.core.webservice import LocalWebservice, predictions = service.run(input_json)
- from azureml.core.webservice import requests, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.deserialize(ws, input_json)
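Using the Webservice import together with service.run(), the completed snippet would look roughly like this:
from azureml.core import Workspace
from azureml.core.webservice import Webservice   # 1st code option
import json

ws = Workspace.from_config()
service_name = "mlmodel1-service"
service = Webservice(name=service_name, workspace=ws)

x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({"data": x_new})
predictions = service.run(input_json)             # 2nd code option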
Question 7)
You have an AirBnB housing dataframe which you preprocessed and filtered down to only the relevant columns.
The columns are: id, host_name, bedrooms, neighbourhood_cleansed, price.
You’ve written the function below, named firstInitialFunction, which returns the first initial from the host_name column:
def firstInitialFunction(name):
    return name[0]

firstInitialFunction("George")
Because Python UDFs are much slower than Scala UDFs, you now want to create a Vectorized UDF in Python to speed up the computation.
How would you code that?
- from pyspark.sql.functions import pandas_udf
@pandas_udf("string")
def vectorizedUDF(name):
    return name.str[0]
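A minimal usage sketch (assuming the same airbnbDF DataFrame as in Question 1):
from pyspark.sql.functions import col

# Apply the vectorized UDF column-wise; pandas handles each batch as a Series
display(airbnbDF.withColumn("firstInitial", vectorizedUDF(col("host_name"))))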
Question 8)
You are running a training experiment on remote compute in Azure Machine Learning.
The experiment is configured to use a Conda environment that includes the mlflow and azureml-contrib-run packages. You must use MLflow as the logging package for tracking metrics generated in the experiment.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
import numpy as np
#1 Import library to log metrics
#2 Start logging for this run
reg_rate = 0.01
#3 Log the reg_rate metric
#4 Stop logging for this run
- #1 from azureml.core import Run, #2 run = Run.get_context(), #3 logger.info('..'), #4 run.complete()
- #1 import mlflow, #2 mlflow.start_run(), #3 mlflow.log_metric('..'), #4 mlflow.end_run()
- #1 import mlflow, #2 mlflow.start_run(), #3 logger.info('..'), #4 mlflow.end_run()
- #1 import logging, #2 mlflow.start_run(), #3 mlflow.log_metric('..'), #4 run.complete()
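Filling the skeleton with the all-mlflow option (import mlflow / start_run / log_metric / end_run) gives a snippet along these lines:
import numpy as np
import mlflow                               #1 Import library to log metrics

mlflow.start_run()                          #2 Start logging for this run
reg_rate = 0.01
mlflow.log_metric('reg_rate', reg_rate)     #3 Log the reg_rate metric
mlflow.end_run()                            #4 Stop logging for this run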
Question 9)
You have a Boston Housing dataset that contains the median home value along with a number of variables, such as the number of rooms, per capita crime, and the economic status of residents.
You want to use Linear Regression to predict the median home value based on the average number of rooms.
You’ve imported the dataset and created a column named features that has a single input variable named rm by using VectorAssembler.
You now want to fit the Linear Regression model.
How should you code that?
- from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol="features", labelCol="medv")
lrModel = lr.fit(bostonFeaturizedDF)
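Once fitted, the model's learned parameters can be inspected, for example:
# Slope and intercept of the fitted single-feature model
print("Coefficient for rm: {}".format(lrModel.coefficients[0]))
print("Intercept: {}".format(lrModel.intercept))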
Question 10)
You use the following code to run a script as an experiment in Azure Machine Learning:
from azureml.core import Workspace, Experiment, Run
from azureml.core import RunConfiguration, ScriptRunConfig
ws = Workspace.from_config()
run_config = RunConfiguration()
run_config.target = 'local'
script_config = ScriptRunConfig(source_directory='./script',
                                script='experiment.py', run_config=run_config)
experiment = Experiment(workspace=ws, name='script experiment')
run = experiment.submit(config=script_config)
run.wait_for_completion()
You must identify the output files that are generated by the experiment run. You need to add code to retrieve the output file names. Which code segment should you add to the script?
- files = run.get_properties()
- files = run.get_metrics()
- run.get_details_with_logs()
- files = run.get_file_names()
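As a quick illustration of that call, a sketch of listing the run's outputs (appended after run.wait_for_completion()):
# Retrieve the names of the files generated by the run
files = run.get_file_names()
for f in files:
    print(f)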
Question 11)
You’re working with the Boston Housing dataset and you want to tune the hyperparameters of the Linear Regression algorithm you’re using.
You’ve performed a train/test split on the Boston dataset and built a pipeline for linear regression.
Now you want to use ParamGridBuilder() to test the maximum number of iterations, whether to fit an intercept with the y axis, and whether to standardize the features.
How should you code that?
- from pyspark.ml.tuning import ParamGridBuilder
paramGrid = (ParamGridBuilder()
.addGrid(lr.maxIter, [1, 10, 100])
.addGrid(lr.fitIntercept, [True, False])
.addGrid(lr.standardization, [True, False])
.build()
)
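For context, one way this grid might then be used (a sketch; it assumes the linear-regression pipeline, the trainDF split, and the medv label from the earlier questions):
from pyspark.ml.tuning import CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator

evaluator = RegressionEvaluator(labelCol="medv", predictionCol="prediction", metricName="rmse")

# Try every combination in paramGrid with 3-fold cross-validation
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=paramGrid,
                    evaluator=evaluator, numFolds=3)
cvModel = cv.fit(trainDF)
print(cvModel.bestModel)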
Question 12)
You are evaluating a Python NumPy array that contains six data points defined as follows: data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python scikit-learn machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
from numpy import array
from sklearn.model_selection import #1st option
data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits= #2nd option, shuffle=True, random_state=1)
for train, test in kfold.split( #3rd option ):
    print('train: %s, test: %s' % (data[train], data[test]))
- K-fold, 3, data
- K-fold, 3, array
- K-means, 6, array
- CrossValidation, 3, data
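Completing the skeleton with KFold, 3 splits, and data (the class name is scikit-learn's KFold) yields:
from numpy import array
from sklearn.model_selection import KFold                  # 1st option

data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=3, shuffle=True, random_state=1)    # 2nd option: 3
for train, test in kfold.split(data):                      # 3rd option: data
    print('train: %s, test: %s' % (data[train], data[test]))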
Also Visit: Full Practice Exam Quiz Answers