Coursera Answers

Perform data science with Azure Databricks Quiz Answers

Hello friends, in this article I am going to share the quiz answers for all weeks of the Coursera course Perform data science with Azure Databricks.

Enroll Link: Perform data science with Azure Databricks


 

WEEK 1 QUIZ ANSWERS

 

Knowledge check 1

Question 1)
Apache Spark is a unified processing engine that can analyze big data with which of the following features?
Select all that apply.

  • Machine Learning
  • Support for multiple Drivers running in parallel on a cluster
  • Graph Processing
  • Real-time stream analysis
  • SQL

Question 2)
Which of the following Databricks features are not Open-Source Spark?
Select all that apply.

  • Databricks Workflows
  • Databricks Workspace
  • MLFlow
  • Databricks Runtime

Question 3)
Apache Spark notebooks allow which of the following?
Select all that apply.

  • Create new Workspace
  • Execution of code
  • Display graphical visualizations
  • Rendering of formatted text

Question 4)
In Azure Databricks when creating a new Notebook, the default languages available to select from are?
Select all that apply.

  • R
  • SQL
  • Scala
  • Python
  • Java

Question 5)
If your notebook is attached to a cluster, you can carry out which of the following from within the notebook?
Select all that apply.

  • Detach your notebook from the cluster
  • Delete the cluster
  • Restart the cluster
  • Attach to another cluster

 

Knowledge check 2

Question 1)
You work with Big Data as a data engineer or a data scientist, and you must process data that is often described by the “3 Vs of Big Data”. What do the 3 Vs of Big Data stand for?
Select all that apply.

  • Variable
  • Volume
  • Variety
  • Velocity

Question 2)
Spark’s performance is based on parallelism. Which of the following Scalability methods is limited to a finite amount of RAM, Threads and CPU speeds?

  • Vertical Scaling
  • Diagonal Scaling
  • Horizontal Scaling

Question 3)
In an Apache Spark Cluster jobs are divided into which of the following?

  • Drivers
  • Executors
  • Tasks
  • Slots

Question 4)
When creating a new cluster in the Azure Databricks workspace, which of the following is a sequence of steps that happens in the background?

  • When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.
  • Azure Databricks provisions a dedicated VM (Virtual Machine) that processes all jobs, based on your VM type and size selection.
  • Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.

Question 5)
To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?

  • Jobs
  • Arrays
  • Stages

Question 6)
Spark clusters use two levels of parallelization. Which of the following are levels of parallelization?

  • Job
  • Partition
  • Executor
  • Slot
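
The relationship between a job, its partitions, and the slots that run tasks can be illustrated with a minimal pure-Python analogy. This is not Spark code: the thread pool stands in for the slots of an executor, and the helper names (split_into_partitions, run_job) are made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal, contiguous partitions."""
    size, remainder = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < remainder else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def run_job(data, num_partitions=4, num_slots=2):
    """Run one 'job' as one task per partition on a pool of num_slots workers."""
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_slots) as pool:
        # Each task sums the squares of its own partition.
        partial_sums = list(pool.map(lambda part: sum(x * x for x in part), partitions))
    # The driver-side step combines the per-task results.
    return sum(partial_sums)


total = run_job(list(range(10)))
```

With four partitions and two slots, at most two tasks run at the same time, and the remaining tasks wait for a free slot.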

 

Test prep Quiz Answers

Question 1)
Azure Databricks Runtime adds several key capabilities to Apache Spark workloads that can increase performance and reduce costs. Which of the following are features of Azure Databricks?
Select all that apply.

  • Indexing
  • Caching
  • Auto-scaling and auto-termination
  • Parallel Cluster Drivers
  • High-speed connectors to Azure storage services

Question 2)
Apache Spark supports which of the following languages?
Select all that apply.

  • Scala
  • Java
  • Python
  • ORC

Question 3)
Which of the following statements are true?
Select all that apply.

  • Once created a notebook can only be connected to the original cluster.
  • To use your Azure Databricks notebook to run code, you must attach it to a cluster
  • To use your Azure Databricks notebook to run code you do not require a cluster
  • You can detach a notebook from a cluster and attach it to another cluster.

Question 4)
Which of the following Databricks features are not Open-Source Spark?

  • Databricks Workflows
  • Databricks Workspace
  • Databricks Runtime
  • MLFlow

Question 5)
How many drivers does a Cluster have?

  • Configurable between one and eight
  • Only one
  • Two, running in parallel

Question 6)
What type of process are the driver and the executors?

  • Java processes
  • C++ processes
  • Python processes

Question 7)
You work with Big Data as a data engineer, and you must process real-time data. This is referred to as having which of the following characteristics?

  • High velocity
  • Variety
  • High volume

Question 8)
Spark’s performance is based on parallelism. Which of the following Scalability methods is limited to a finite amount of RAM, Threads and CPU speeds?

  • Vertical Scaling
  • Diagonal Scaling
  • Horizontal Scaling

Question 9)
Spark clusters use two levels of parallelization. Which of the following are levels of parallelization?

  • Job
  • Partition
  • Executor
  • Slot

Question 10)
In an Apache Spark Cluster jobs are divided into which of the following?

  • Drivers
  • Executors
  • Slots
  • Tasks

 

 

WEEK 2 QUIZ ANSWERS

Knowledge check 3

Question 1)
How do you list files in DBFS within a notebook?

  • %fs dir /my-file-path
  • ls /my-file-path
  • %fs ls /my-file-path

Question 2)
How do you infer the data types and column names when you read a JSON file?

  • spark.read.option("inferSchema", "true").json(jsonFile)
  • spark.read.option("inferData", "true").json(jsonFile)
  • spark.read.inferSchema("true").json(jsonFile)

Question 3)
Which of the following SparkSession functions returns a DataFrameReader?

  • readStream(..)
  • createDataFrame(..)
  • read(..)
  • emptyDataFrame(..)

Question 4)
When using a notebook and a Spark session, we can read a CSV file. Which of the following can be used to view the first couple of thousand characters of a file?

  • %fs head /mnt/training/wikipedia/pageviews/pageviews_by_second.tsv
  • %fs dir /mnt/training/wikipedia/pageviews/
  • %fs ls /mnt/training/wikipedia/pageviews/

 

Knowledge check 4

Question 1)
Which of the following SparkSession functions returns a DataFrameReader?

  • createDataFrame(..)
  • emptyDataFrame(..)
  • .read(..)
  • .readStream(..)

Question 2)
When using a notebook and a Spark session, we can read a CSV file.
Which of the following can be used to view the first couple of thousand characters of a file?

  • %fs dir /mnt/training/wikipedia/pageviews/
  • %fs head /mnt/training/wikipedia/pageviews/pageviews_by_second.tsv
  • %fs ls /mnt/training/wikipedia/pageviews/

Question 3)
Which DataFrame method do you use to create a temporary view?

  • createTempView()
  • createTempViewDF()
  • createOrReplaceTempView()

Question 4)
How do you define a DataFrame object?

  • Use the createDataFrame() function
  • Use the DF.create() syntax
  • Introduce a variable name and equate it to something like myDataFrameDF =

Question 5)
How do you cache data into the memory of the local executor for instant access?

  • .inMemory().save()
  • .cache()
  • .save().inMemory()

Question 6)
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?

  • IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
  • IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
  • IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")

 

Test prep Quiz Answers

Question 1)
How do you list files in DBFS within a notebook?

  • %fs dir /my-file-path
  • ls /my-file-path
  • %fs ls /my-file-path

Question 2)
How do you infer the data types and column names when you read a JSON file?

  • spark.read.inferSchema("true").json(jsonFile)
  • spark.read.option("inferData", "true").json(jsonFile)
  • spark.read.option("inferSchema", "true").json(jsonFile)

Question 3)
Which of the following SparkSession functions returns a DataFrameReader?

  • read(..)
  • emptyDataFrame(..)
  • readStream(..)
  • createDataFrame(..)

Question 4)
When using a notebook and a Spark session, we can read a CSV file. Which of the following can be used to view the first couple of thousand characters of a file?

  • %fs ls /mnt/training/wikipedia/pageviews/
  • %fs dir /mnt/training/wikipedia/pageviews/
  • %fs head /mnt/training/wikipedia/pageviews/pageviews_by_second.tsv

Question 5)
You have created an Azure Databricks cluster, and you have access to a source file.
fileName = "dbfs:/mnt/training/wikipedia/clickstream/2015_02_clickstream.tsv"
You need to determine the structure of the file. Which of the following commands will assist with determining what the column and data types are?

  • .option("inferSchema", "false")
  • .option("header", "false")
  • .option("inferSchema", "true")
  • .option("header", "true")

Question 6)
In an Azure Databricks workspace you run the following command:
%fs head /mnt/training/wikipedia/pageviews/pageviews_by_second.tsv
The partial output from this command is as follows:
[Truncated to first 65536 bytes]

"timestamp" "site" "requests"

"2015-03-16T00:09:55" "mobile" 1595
"2015-03-16T00:10:39" "mobile" 1544
"2015-03-16T00:19:39" "desktop" 2460
"2015-03-16T00:38:11" "desktop" 2237
"2015-03-16T00:42:40" "mobile" 1656
"2015-03-16T00:52:24" "desktop" 2452

Which of the following pieces of information can be inferred from the command and the output?
Select all that apply.

  • The columns are tab separated
  • The file has no header
  • All columns are strings
  • The file has a header
  • Two columns are strings, and one column is a number
  • The file is a comma-separated (CSV) file
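
The kind of inspection this question asks for can be sketched in plain Python with the csv module. The sample string below is an assumption: it reproduces the first lines of the output with explicit tab characters between fields, as the .tsv extension suggests, and the helper looks_numeric is made up for this sketch.

```python
import csv
import io

# Assumed sample: the first lines of the file, with tab characters between fields.
sample = (
    '"timestamp"\t"site"\t"requests"\n'
    '"2015-03-16T00:09:55"\t"mobile"\t1595\n'
    '"2015-03-16T00:10:39"\t"mobile"\t1544\n'
)

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
header, records = rows[0], rows[1:]


def looks_numeric(value):
    """Return True if the text can be parsed as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


# A column is treated as numeric only if every data value parses as an integer.
numeric_columns = [
    name for i, name in enumerate(header)
    if all(looks_numeric(record[i]) for record in records)
]
```

Here the first row supplies the column names, and only the requests column holds values that parse as numbers.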

Question 7)
In an Azure Databricks workspace, you wish to create a temporary view that will be accessible to multiple notebooks. Which of the following commands will provide this feature?

  • createOrReplaceTempView(set_scope "Global")
  • createOrReplaceGlobalTempView(..)
  • createOrReplaceTempView(..)

Question 8)
Which of the following is true in respect of Parquet Files?
Select all that apply.

  • Efficient data compression
  • Is a Row-Oriented data store
  • Is a Column-Oriented data store
  • Designed for performance on small data sets
  • Open Source
  • Is a splittable “file format”.

 


 

WEEK 3 QUIZ ANSWERS

 

Knowledge check 5

Question 1)
Delta Lake provides snapshots of data enabling developers to access and revert to earlier versions of data for audits, rollbacks or to reproduce experiments. This functionality is referred to as?

  • Schema Evolution
  • Time Travel
  • ACID Transactions
  • Schema Enforcement

Question 2)
One of the core features of Delta Lake is performing upserts. Which of the following statements is true in regard to Upsert?

  • Upsert is a new DML statement for SQL syntax
  • UpSert is literally TWO operations. Update / Insert
  • Upsert is supported in traditional data lakes

Question 3)
When discussing Delta Lake, there is often a reference to the concept of Bronze, Silver and Gold tables. These levels refer to the state of data refinement as data flows through a processing pipeline and are conceptual guidelines. Based on these table concepts the refinements in Silver tables generally relate to which of the following?

  • Data that is directly queryable and ready for insights
  • Raw data (or very little processing)
  • Highly refined views of the data

Question 4)
What is the Databricks Delta command to display metadata?

  • SHOW SCHEMA tablename
  • DESCRIBE DETAIL tableName
  • MSCK DETAIL tablename

Question 5)
How do you perform UPSERT in a Delta dataset?

  • Use MERGE INTO my-table USING data-to-upsert
  • Use UPSERT INTO my-table
  • Use UPSERT INTO my-table /MERGE
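
The update-or-insert semantics behind an upsert can be illustrated with a small pure-Python sketch. It only mimics the matched / not-matched logic on lists of dictionaries; it is not Delta Lake code, and the function name merge_upsert and the sample rows are invented for illustration.

```python
def merge_upsert(target, updates, key="id"):
    """Update rows whose key already exists in target; insert the rest."""
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: update the existing row
        else:
            merged[row[key]] = dict(row)   # not matched: insert a new row
    return list(merged.values())


target = [{"id": 1, "city": "London"}, {"id": 2, "city": "Paris"}]
updates = [{"id": 2, "city": "Berlin"}, {"id": 3, "city": "Madrid"}]
result = merge_upsert(target, updates)
```

Row 2 is updated in place and row 3 is appended, while row 1 is left unchanged.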

Question 6)
What optimization does the following command perform: OPTIMIZE Students ZORDER BY Grade?

  • Ensures that all data backing, for example, Grade=8 is colocated, then updates a graph that routes requests to the appropriate files.
  • Creates an order-based index on the Grade field to improve filters against that field.
  • Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files.

 

Knowledge check 6

Question 1)
You have a dataframe which you preprocessed and filtered down to only the relevant columns.
The columns are: id, host_name, bedrooms, neighbourhood_cleansed, price.
You want to retrieve the first initial from the host_name field.
How would you write that function in local Python/Scala?

  • def firstInitialFunction(name):
        return name[0]

    firstInitialFunction("Steven")

Question 2)
You’ve written a function that retrieves the first initial letter from the host_name column.
You now want to define it as a user-defined function named firstInitialUDF.
How do you define that using Python/Scala?

  • firstInitialUDF = udf(firstInitialFunction)
  • firstInitialUDF = firstInitialFunction()
  • firstInitial = udf(firstInitial)
  • firstInitial = udf(firstInitialFunction)

Question 3)
If you want to create the UDF in the SQL namespace, which method do you need to use?

  • spark.udf.register
  • spark.sql.read
  • spark.sql.create
  • spark.sql.register

Question 4)
Which is another syntax that you can use to define a UDF in Python?

  • Wrapper
  • Decorator
  • Capsulator
  • Designer
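
How decorator syntax relates to an explicit wrapping call can be shown with a minimal pure-Python sketch. The wrapper column_function is a hypothetical stand-in that applies a function to every value in a list; it is not PySpark's udf, and it is used here only to show that the two syntaxes produce equivalent results.

```python
def column_function(func):
    """Hypothetical stand-in for a UDF wrapper: applies func to every value in a list."""
    def apply_to_column(values):
        return [func(value) for value in values]
    return apply_to_column


# Plain call syntax: wrap an existing function explicitly.
def first_initial(name):
    return name[0]


first_initial_wrapped = column_function(first_initial)


# Decorator syntax: the same wrapping, applied at definition time.
@column_function
def first_initial_decorated(name):
    return name[0]


names = ["Steven", "Maria"]
```

Calling first_initial_wrapped(names) and first_initial_decorated(names) gives the same list of initials.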

Question 5)
True or false?
The Catalyst Optimizer can be used to optimize UDFs.

  • True
  • False

 

Test prep Quiz Answers

Question 1)
Delta Lake enables you to make changes to a table schema that can be applied automatically, without the need for DDL modifications. This functionality is referred to as?

  • ACID Transactions
  • Time Travel
  • Schema Evolution
  • Schema Enforcement

Question 2)
One of the core features of Delta Lake is performing upserts. Which of the following statements is true regarding Upsert?

  • Upsert is supported in traditional data lakes
  • Upsert is literally TWO operations. Update / Insert
  • Upsert is a new DML statement for SQL syntax

Question 3)
What is the Databricks Delta command to display metadata?

  • MSCK DETAIL tablename
  • SHOW SCHEMA tablename
  • DESCRIBE DETAIL tableName

Question 4)
What optimization does the following command perform: OPTIMIZE Customers ZORDER BY City?

  • Ensures that all data backing, for example, City='London' is colocated, then rewrites the sorted data into new Parquet files.
  • Creates an order-based index on the City field to improve filters against that field
  • Ensures that all data backing, for example, City="London" is colocated, then updates a graph that routes requests to the appropriate files.

Question 5)
You are planning on registering a user-defined function, g, as g_function in a SQL namespace. How would you achieve this programmatically?

  • spark.udf.register(g, "g_function")
  • spark.udf.register("g_function", g)
  • spark.register_udf("g_function", g)

Question 6)
True or False?
User-defined Functions cannot operate on DataFrames.

  • Yes
  • No

Question 7)
Suppose you already have a dataframe which only contains relevant columns.
The columns are: id, employee_name, age, gender.
You want to retrieve the first initial from the employee_name field by creating a local function in Python/Scala. Which of the following code snippets can be used to get the first initial from the employee_name column?

  • def firstInitialFunction(name):
        return name[0]

    firstInitialFunction("Steven")

 


 

WEEK 4 QUIZ ANSWERS

Knowledge check 7

Question 1)
Which are the three main building blocks that form the machine learning process in Spark from featurization to model training and deployment? Select all that apply.

  • Extractor
  • Transformer
  • Loader
  • Pipelines
  • Estimator

Question 2)
From Spark’s machine learning library MLlib, which one of the following abstractions takes a dataframe as an input and returns a new dataframe with one or more columns appended to it?

  • Transformer
  • Estimator
  • Pipeline

Question 3)
True or false?
Random forest models also need one-hot encoding.

  • True
  • False

Question 4)
When dealing with null values, which strategy can you implement if you want to see missing data later on without violating the schema?

  • Dropping the records
  • Basic imputing
  • Adding a placeholder
  • Advanced imputing

Question 5)
When working with regression models, if the p-value of your model coefficient is <0.05 between the input feature and the predicted output, what does that mean? Select all that apply.

  • There is a 95% probability of seeing the correlation by chance.
  • There is a 5% probability of seeing the correlation by chance.
  • There is more than 95% probability of seeing the correlation by chance.
  • There is less than 5% chance of seeing the correlation by chance.

 

Test prep Quiz Answers

Question 1)
What are qualitative variables also known as?
Select all that apply.

  • Numerical
  • Discrete
  • Continuous
  • Categorical

Question 2)
Which type of supervised learning problem tends to output quantitative values?

  • Regression
  • Clustering
  • Classification

Question 3)
In the process of exploratory data analysis, when we want to calculate the number of observations in the data set, which of the following will tell us if there are missing values in the dataset?

  • Standard deviation
  • Count
  • Mean

Question 4)
In terms of correlations, what does a negative correlation of -1 mean?

  • There is no association between the variables.
  • For each unit increase in one variable, the same increase is seen in the other.
  • For each unit increase in one variable, the same decrease is seen in the other.
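
A correlation of -1 can be checked numerically with a short pure-Python sketch of the Pearson coefficient. The data below is invented: y is an exact decreasing linear function of x, which is the situation a coefficient of -1 describes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [10 - 2 * x for x in xs]   # decreases steadily as x increases
r = pearson(xs, ys)
```

Because every point lies on one falling straight line, r comes out as -1 up to floating-point rounding.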

Question 5)
Regarding visualization tools, which of the following can help you visualize quantiles and outliers?

  • t-SNE
  • Heat maps
  • Box plots
  • Q-Q plots
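
The quantities a quantile-and-outlier plot is built from can be computed with the standard statistics module. This is a minimal sketch on made-up data, assuming the common rule of thumb that flags points more than 1.5 times the interquartile range beyond the quartiles.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # invented sample with one extreme value

# Quartiles: the three cut points that divide the sorted data into four parts.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Rule-of-thumb fences: points outside them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Only the value 100 falls outside the fences for this sample.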

Question 6)
You have an AirBnB dataset where one categorical variable is room type.
There are three types of rooms: private room, entire home/apt, and shared room.
You must first encode each unique string into a number so that the machine learning model knows how to handle these room types.
How should you code that?

  • from pyspark.ml.feature import StringIndexer
    uniqueTypesDF = airbnbDF.select("room_type").distinct()
    indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")
    indexerModel = indexer.fit(uniqueTypesDF)
    indexedDF = indexerModel.transform(uniqueTypesDF)
    display(indexedDF)

 

Question 7)
You have an AirBnB dataset where one categorical variable is room type.
There are three types of rooms: private room, entire home/apt, and shared room.
After you’ve encoded each unique string into a number, each room has a unique numerical value assigned.
Now you must one-hot encode each of those values to a location in an array, so that the machine learning algorithm can effect each category.
How should you code that?

  • from pyspark.ml.feature import OneHotEncoder
    encoder = OneHotEncoder(inputCols=["room_type_index"], outputCols=["encoded_room_type"])
    encoderModel = encoder.fit(indexedDF)
    encodedDF = encoderModel.transform(indexedDF)
    display(encodedDF)
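
What the two encoding steps do to the data can be sketched in plain Python. This is an illustration of the idea, not the Spark implementation: the helpers fit_string_index and one_hot are made up, the sample values are invented, and the output is a full dense list, whereas Spark's encoder returns sparse vectors and by default drops the last category.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to an integer, most frequent first (ties alphabetical)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    return {value: index for index, value in enumerate(ordered)}


def one_hot(index, size):
    """Return a list of zeros with a single 1 at the given position."""
    vector = [0] * size
    vector[index] = 1
    return vector


room_types = ["Private room", "Entire home/apt", "Shared room", "Entire home/apt"]
index_of = fit_string_index(room_types)
encoded = [one_hot(index_of[room], len(index_of)) for room in room_types]
```

Each room type ends up with its own position in the vector, so no artificial ordering between categories is implied.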

 


 

WEEK 5 QUIZ ANSWERS

Knowledge check

Question 1)
What are the three core issues MLflow seeks to address?
Select all that apply.

  • Code reproducing
  • Keeping track of experiments
  • Keeping track of identity
  • The standardization of model packaging and deployment

Question 2)
What is the MLflow Tracking tool?

  • An environment
  • A logging API
  • A class
  • A library

Question 3)
MLflow Tracking is organized around the concept of runs, which are basically executions of data science code.
Runs are aggregated into which of the following?

  • Workflows
  • Experiments
  • Datasets
  • Dataframe

Question 4)
What information can be recorded for each run? Select all that apply.

  • Source
  • Variables
  • Metrics
  • Parameters
  • Artifacts

Question 5)
Which of the following objects can be used to query past runs programmatically?

  • MlflowQuery
  • MlflowTracker
  • MlflowClient
  • MlflowFetcher

 

Knowledge check

Question 1)
What will happen to a model that has been trained and evaluated on the same data?

  • Overfitting
  • Well generalized
  • Underfitting

Question 2)
True or false?
A machine learning algorithm can learn hyperparameters from the data itself.

  • True
  • False

Question 3)
Which of the following best describes the process of Hyperparameter tuning?

  • The process of dropping the hyperparameters that do not perform well on the loss function of the model.
  • The process of modifying the hyperparameter until we get the best result on the loss function of the model.
  • The process of choosing the hyperparameter that performs the best on the loss function of the model.

Question 4)
When training different models with different hyperparameters and evaluating their performance, there is a risk of overfitting by choosing the hyperparameter that happens to perform best on the data found in the dataset.
Which cross-validation technique would be the best fit for solving this problem?

  • K-fold cross-validation
  • Time Series cross-validation
  • Holdout cross-validation
  • Repeated random subsampling validation
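
How k-fold cross-validation partitions a dataset can be sketched in a few lines of plain Python. The function name k_fold_indices is invented for this example, and it only produces index splits without training any model.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        end = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, validation
        start = end


folds = list(k_fold_indices(10, 3))
```

Every sample is used for validation exactly once and for training in the other folds, so the score no longer depends on a single lucky split.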

Question 5)
Which of the following hyperparameter optimization techniques is the process of exhaustively trying every combination of hyperparameters?

  • Grid Search
  • Random Search
  • Bayesian Search
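
Exhaustively trying every combination of hyperparameters can be sketched with itertools.product. The parameter names, candidate values, and toy_loss function below are all invented for illustration; a real search would train and evaluate a model for each combination instead.

```python
from itertools import product

# Hypothetical search space: every value of one parameter is paired with every value of the other.
param_grid = {"max_depth": [2, 5, 10], "num_trees": [10, 100]}


def toy_loss(max_depth, num_trees):
    """Made-up loss surface, used only so the example has something to minimize."""
    return abs(max_depth - 5) + abs(num_trees - 100) / 100


names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]
best = min(combinations, key=lambda params: toy_loss(**params))
```

The number of combinations is the product of the list lengths, which is why this approach becomes expensive as more hyperparameters are added.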

 

Test prep Quiz Answers

Question 1)
You can query previous runs programmatically by using the MlflowClient object as the pathway.
How would you code that in Python?

  • from mlflow.tracking import MlflowClient
    client = MlflowClient()
    client.list_experiments()

Question 2)
You can also use the search_runs method to find all runs for a given experiment.
How would you code that in Python?

  • experiment_id = run.info.experiment_id
    runs_df = mlflow.search_runs(experiment_id)
    display(runs_df)

Question 3)
You need to retrieve the last run from the list of experiments.
How would you code that in Python?

  • runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=1)
    runs[0].data.metrics

 

Question 4)
Knowing that each algorithm has different hyperparameters available for tuning, which method can you use to explore the hyperparameters on a model?

  • showParams()
  • explainParams()
  • exploreParams()
  • getParams()

Question 5)
Which method from the PySpark class can you use to string together all the different possible hyperparameters you want to test?

  • ParamGridBuilder()
  • ParamGridSearch()
  • ParamBuilder()
  • ParamSearch()

Question 6)
Which of the following belong to the exhaustive type of cross-validation techniques?

  • K-fold cross-validation
  • Holdout cross-validation
  • Leave-one-out cross-validation
  • Leave-p-out cross-validation

Question 7)
In which of the following non-exhaustive cross validation techniques do you randomly assign data points to the training set and the test set?

  • Holdout cross-validation
  • Repeated random sub-sampling validation
  • K-fold cross-validation
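
Randomly assigning data points to a training set and a test set can be sketched with the standard random module. This shows a single random split; the function name holdout_split, the 20% test fraction, and the seed are assumptions for the example. Repeating the split with different seeds gives several independent random partitions.

```python
import random


def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle the data with a fixed seed and cut it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = holdout_split(range(10))
```

The two sets never overlap, and together they contain every original data point.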

 


 

WEEK 6 QUIZ ANSWERS

Knowledge check

Question 1)
What is HorovodRunner?

  • A general API
  • A logging API
  • A Python class
  • A framework

Question 2)
What does HorovodRunner use to take a Python method that contains deep learning training code?

  • Paths
  • Hooks
  • URI
  • URL

Question 3)
Which are two methods supported by the HorovodRunner API?

  • init(self, main)
  • run(self, main, **kwargs)
  • init(self, np)
  • run(self, main, np, **kwargs)

Question 4)
Regarding the MPI concepts on which the Horovod core principles are based, which MPI concept would be the unique process ID?

  • Local Rank
  • Size
  • Density
  • Rank

Question 5)
True or false?
TensorFlow objects cannot be found or pickled using the HorovodRunner API.

  • True
  • False

 

Knowledge check

Question 1)
To deploy a model to Azure ML, you must create or obtain an Azure ML Workspace.
You can do that programmatically by using a function.
Which of the following functions can you use to create the workspace?

  • azureml.core.workspace.create()
  • azureml.core.model.create()
  • azureml.core.dataset.workspace()
  • azureml.core.environment.create()

Question 2)
You want to use Azure ML to train a Diabetes Model and build a container image for the trained model.
You will use the scikit-learn ElasticNet linear regression model.
You need to load the diabetes datasets. How should you code that?

  • diabetes = datasets.load_diabetes()
    X = diabetes.data
    y = diabetes.target

Question 3)
When working with Azure ML, you can use MLflow to build a container image for the trained model.
Which MLflow function can you use for that task?

  • azureml.mlflow.build_image()
  • mlflow.build_image()
  • mlflow.azureml.build_image()
  • mlflow.azureml.build.image()

Question 4)
Which kind of HTTP request can you send to the AKS webservice’s scoring endpoint to evaluate the sample data?

  • PUT
  • GET
  • PATCH
  • POST
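
Sending sample data to a scoring endpoint means building an HTTP request that carries a JSON body. The sketch below only constructs such a request with the standard library and does not send it. The URL, the key placeholder, the bearer-token header, and the payload shape are all assumptions; a real deployment supplies its own scoring URI, credentials, and input schema.

```python
import json
import urllib.request

scoring_uri = "http://example.invalid/score"   # hypothetical endpoint
service_key = "<service-key>"                  # placeholder, not a real credential

# Assumed payload shape: a list of feature rows under a "data" key.
sample = {"data": [[0.02, 0.05, 0.06]]}

request = urllib.request.Request(
    scoring_uri,
    data=json.dumps(sample).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {service_key}",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted because the endpoint is hypothetical.
```

The request body travels with the call, which is what lets the service receive the sample rows to score.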

Question 5)
Which Azure ML function can you use to replace the deployment’s existing model image with the new model image?

  • azureml.core.webservice.AksWebservice.serialize()
  • azureml.core.webservice.AksWebservice.deploy_configuration()
  • azureml.core.webservice.AksWebservice.add_properties()
  • azureml.core.webservice.AksWebservice.update()

 

Test prep Quiz Answers

Question 1)
When developing a distributed training program using HorovodRunner you would generally follow these steps:

  1. Create a HorovodRunner instance initialized with the number of nodes.
  2. Define a Horovod training method according to the methods described in Horovod usage, making sure to add any import statements inside the method.
  3. Pass the training method to the HorovodRunner instance.

How would you code that in Python?

  • hr = HorovodRunner(np=2)

    def train():
        import tensorflow as tf
        hvd.init()

    hr.run(train)

Question 2)
You’re using Horovod to train a distributed neural network using Parquet files and Petastorm.

You have a dataset of housing prices in California named cal_housing.

After loading the data, you want to create a Spark DataFrame from the Pandas DataFrame so that you can concatenate the features and labels of the model.
How would you code that in Python?

  • data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names), pd.DataFrame(y_train, columns=["label"])], axis=1)
    trainDF = spark.createDataFrame(data)
    display(trainDF)

Question 3)
You’re using Horovod to train a distributed neural network using Parquet files and Petastorm.
You have a dataset of housing prices in California named cal_housing.
After loading the data, you created a Spark DataFrame from the Pandas DataFrame so that you can concatenate the features and labels of the model.
Now you need to create Dense Vectors for the features.
How would you code that in Python?

  • from pyspark.ml.feature import VectorAssembler
    vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol="features")
    vecTrainDF = vecAssembler.transform(trainDF).select("features", "label")
    display(vecTrainDF)

Question 4)
True or false?
Petastorm requires a Vector as an input, not an Array.

  • True
  • False

Question 5)
You’re working with Azure Machine Learning and you want to train a Diabetes Model and build a container image for the trained model.

  • You will use the scikit-learn ElasticNet linear regression model.
  • You want to deploy the model to production using Azure Kubernetes Service (AKS).
  • You don’t have an active AKS cluster, so you need to create one using the Azure ML SDK.
  • You’ll be using the default configuration.

How would you code that?

  • aks_target = ComputeTarget.create(workspace = workspace,
                                      name = aks_cluster_name,
                                      provisioning_configuration = prov_config)

Question 6)
You’re working with Azure Machine Learning and you want to train a Diabetes Model and build a container image for the trained model.
You will use the scikit-learn ElasticNet linear regression model.
You want to deploy the model to production using Azure Kubernetes Service (AKS).
You’ve created an AKS cluster for model deployment.
You’ve deployed the model’s image to the specified AKS cluster.
After you’ve trained a new model with different hyperparameters, you need to deploy the new model’s image to the AKS cluster.
How would you code that?

  • prod_webservice.update(image=model_image_updated)
    prod_webservice.wait_for_deployment(show_output = True)

Question 7)
After working with Azure Machine Learning, you want to clean up the deployments and terminate the “dev” ACI webservice using the Azure ML SDK.
Which method should do the job?

  • dev_webservice.delete()
  • dev_webservice.flush()
  • dev_webservice.remove()
  • dev_webservice.terminate()