Full Practice Exam Quiz Answers
In this article, I am going to share the Prepare for DP-100: Data Science on Microsoft Azure Exam | Week 6 | Full Practice Exam Quiz Answers with you.
Prepare for DP-100: Data Science on Microsoft Azure Exam Full Practice Exam Quiz Answers
Also visit this link: Practice exam covering Course 4: Perform data science with Azure Databricks Quiz Answers
Full Practice Exam Quiz Answers
Question 1)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person does not suffer from a disease. Your model classifies the case as having a disease”.
- False negatives
- True negatives
- True positives
- False positives
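For context, these four error types are the cells of a confusion matrix. Below is a minimal sketch using scikit-learn, with made-up labels and predictions (0 = no disease, 1 = disease):

from sklearn.metrics import confusion_matrix
y_true = [0, 0, 1, 1, 0]  # actual conditions (made-up data)
y_pred = [0, 1, 1, 0, 0]  # model output; the second case is a false positive
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 1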
Question 2)
Your company is asking you to analyze a dataset that contains historical data obtained from a local car-sharing company. For this task, you decide to develop a regression model and you want to be able to foretell what price a trip will be. For the correct evaluation of the regression model, you have to use performance metrics.
In this scenario, what are the best two metrics?
- An R-Squared value close to 1
- A Root Mean Square Error value that is low
- An R-Squared value close to 0
- An F1 score that is low
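As an illustration of these two metrics, here is a small sketch using scikit-learn with invented values:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
y_true = np.array([10.0, 12.0, 15.0])  # made-up trip prices
y_pred = np.array([11.0, 12.5, 14.0])  # made-up predictions
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # lower is better
r2 = r2_score(y_true, y_pred)  # closer to 1 is better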
Question 3)
In order to foretell the price for a student’s craftwork, you have to rely on the following variables: the student’s length of education, degree type, and craft form. You decide to set up a linear regression model that you will have to evaluate. Solution: Apply the following metrics: Relative Squared Error, Coefficient of Determination, Accuracy, Precision, Recall, F1 score, and AUC:
Is this solution effective?
- Yes
- No
Question 4)
Your task is to create and evaluate a model. One of the metrics shows an absolute metric in the same unit as the label.
What is the metric described above?
- Coefficient of Determination (known as R-squared or R2)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
- Classification
Question 5)
How should the following sentence be completed?
One example of the machine learning […] type models is the Support Vector Machine algorithm.
- Clustering
- Regression
- Classification
Question 6)
Your NumPy array has the shape (2,35). Considering this, what information can you get about the elements?
- The array contains 35 elements, all with the value 2.
- The array is two dimensional, consisting of two arrays with 35 elements each.
- The array contains 2 elements with the values of 2 and 35.
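A quick sketch in NumPy confirms how the shape is read:

import numpy as np
arr = np.zeros((2, 35))
print(arr.ndim)   # 2: the array is two dimensional
print(arr.shape)  # (2, 35): two arrays of 35 elements each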
Question 7)
Choose from the list below the evaluation metric that provides you with an absolute metric in the same unit as the label.
- Coefficient of Determination (known as R-squared or R2)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
Question 8)
Which are two appropriate ways to approach a problem when using multiclass classification?
- One vs Rest
- Rest minus One
- One vs One
- One and Rest
Question 9)
Your deep neural network is in the process of training. You decided to set 30 epochs in the training process configuration.
In this scenario, what would happen to the model’s behavior?
- The training data is split into 30 subsets, and each subset is passed through the network
- The first 30 rows of data are used to train the model, and the remaining rows are used to validate it
- The entire training dataset is passed through the network 30 times
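As a hedged illustration, this minimal Keras sketch (with hypothetical random data and a trivial model) passes the entire training dataset through the network 30 times:

import numpy as np
import tensorflow as tf
X = np.random.rand(100, 4)             # hypothetical features
y = np.random.randint(0, 2, size=100)  # hypothetical labels
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=30)  # 30 full passes over the dataset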
Question 10)
The layer described below is used to reduce the number of feature values that are extracted from images, while still retaining the key differentiating features.
- Convolutional layer
- Flattening layer
- Pooling layer
Question 11)
The company that you work for decides to expand the use of machine learning. The company decides not to set up another compute environment in Azure. At the moment, you have at your disposal the compute environments below.
Considering the scenarios below, you must establish what is the most appropriate compute environment to:
1. Run an Azure Machine Learning Designer training pipeline
2. Deploy a web service from the Azure Machine Learning Designer
What are the best compute types for this goal?
- 1 mlc_cluster, 2 nb_server
- 1 mlc_cluster, 2 aks_cluster
- 1 nb_server, 2 aks_cluster
- 1 nb_server, 2 mlc_cluster
Question 12)
You have the role of lead data scientist in a project that keeps track of birds' health and migration. You decide to use a set of labeled bird photographs collected by experts for your multi-class image classification deep learning model.
The entire set of 200,000 bird photographs uses the JPG format and is kept in an Azure blob container within an Azure subscription. You have to ensure direct access from the Azure Machine Learning service workspace used for deep learning model training to the bird photograph files stored in the Azure blob container.
You have to keep data movement to a minimum. What action should you take?
- Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning service.
- Create an Azure Data Lake store and move the bird photographs to the store.
- Copy the bird photographs to the blob datastore that was created with your Azure Machine Learning service workspace.
- Create an Azure Cosmos DB database and attach the Azure Blob containing bird photographs storage to the database.
- Create and register a dataset by using TabularDataset class that references the Azure blob storage containing bird photographs.
Question 13)
One of the categorical variables of your AirBnB dataset is room type.
You have three room types, as follows: private room, entire home/apt, and shared room.
In order for the machine learning model to know how to handle the room types, you first have to encode every unique string into a number.
What code should you write to achieve this goal?
- from pyspark.ml.feature import StringIndexer
uniqueTypesDF = airbnbDF.select("room_type").distinct()
indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")
indexerModel = indexer.fit(uniqueTypesDF)
indexedDF = indexerModel.transform(uniqueTypesDF)
display(indexedDF)
Question 14)
You decide to register and train a model in your Azure Machine Learning workspace.
Your pipeline needs to ensure that the client applications are able to use the model for batch inferencing.
Your single ParallelRunStep step pipeline uses a Python inferencing script in order to obtain predictions from the input data.
Your task is to configure the inferencing script for the ParallelRunStep pipeline step.
Which are the most suitable two functions that you should use? Keep in mind that every correct answer presents a part of the solution.
- batch()
- run(mini_batch)
- init()
- score(mini_batch)
- main()
Question 15)
You decide to deploy a real-time inference service for a trained model.
Your model supports a business-critical application, and you have to monitor the data that is submitted to the web service, as well as the predictions it generates.
While keeping the administrative effort to a minimum, you have to be able to implement a monitoring solution for the model deployed. What action should you take?
- Enable Azure Application Insights for the service endpoint and view logged data in the Azure portal.
- Create an ML Flow tracking URI that references the endpoint, and view the data logged by ML Flow.
- View the explanations for the registered model in Azure ML studio.
- View the log files generated by the experiment used to train the model.
Question 16)
If you want to install the Azure Machine Learning SDK for Python, what are the most suitable package managers and CLI commands?
- nuget azureml-sdk
- pip install azureml-sdk
- npm install azureml-sdk
- yarn install azureml-sdk
Question 17)
What SDK commands should you choose if you want to extract a certain version of a data set?
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version(2))
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version=2)
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version='2')
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version_2)
Question 18)
Your task is to use the SDK in order to define a compute configuration for a managed compute target.
Which of the following commands will return you the expected result?
- compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
min_nodes=0, max_nodes=4,
vm_priority='dedicated')
Question 19)
You want to use a reference to the Run that was used to train the model in order to register it.
What are the most suitable SDK commands to achieve this goal?
- from azureml.core import Model
run.register_model(model_name='classification_model',
model_path='outputs/model.pkl',
description='A classification model')
Question 20)
If you want to set up a parallel run step, which of the SDK commands below should you choose?
- parallelrun_step = ParallelRunStep(
name='batch-score',
parallel_run_config=parallel_run_config,
inputs=[batch_data_set.as_named_input('batch_data')],
output=output_dir,
arguments=[],
allow_reuse=True
)
Question 21)
What code should you write for an instance of a MimicExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import DecisionTreeExplainableModel
mim_explainer = MimicExplainer(model=loan_model,
initialization_examples=X_test,
explainable_model=DecisionTreeExplainableModel,
features=['loan_amount','income','age','marital_status'],
classes=['reject', 'approve'])
Question 22)
What code should you write for a PFIExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import PFIExplainer
pfi_explainer = PFIExplainer(model = loan_model,
features=['loan_amount','income','age','marital_status'],
classes=['reject', 'approve'])
Question 23)
If you want to minimize disparity in the combined true positive rate and false positive rate across sensitive feature groups, what is the most suitable parity constraint that you should choose to use with any of the mitigation algorithms?
- Equalized odds
- True positive rate parity
- Error rate parity
- False-positive rate parity
Question 24)
Your task is to ensure that your data drift monitor, that you scheduled to run daily, is able to send an alert when the drift magnitude surpasses 0.2. What code should you write in Python to achieve this?
- alert_email = AlertConfiguration('[email protected]')
monitor = DataDriftDetector.create_from_datasets(ws, 'dataset-drift-detector',
baseline_data_set, target_data_set,
compute_target=cpu_cluster,
frequency='Day', latency=2,
drift_threshold=.2,
alert_configuration=alert_email)
Question 25)
Your goal is to train a model in the AirBnB Housing dataset you have, so that it will be able to predict the value of housing by analyzing one or several input measures.
In order to train the model on a single column that includes a vector of all the important features, you decide to use the Spark ml framework.
For the data to be prepared, you create the column entitled features that contains the average number of rooms, age and tax rate.
You decide to use VectorAssembler to obtain the result.
Considering this scenario, what code should you write?
- from pyspark.ml.feature import VectorAssembler
featureCols = ["rm", "age", "tax"]
assembler = VectorAssembler(inputCols=featureCols, outputCol="features")
bostonFeaturizedDF = assembler.transform(bostonDF)
display(bostonFeaturizedDF)
Question 26)
You decided to use Python code interactively in your Conda environment. You have all the required Azure Machine Learning SDK and MLflow packages in the environment.
In order to log metrics in your Azure Machine Learning experiment entitled mlflow-experiment, you have to use MLflow.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
#1 Set the MLflow logging target
#2 Configure the experiment
with #3 Begin the experiment run
#4 Log my_metric with value 1.00 ('my_metric', 1.00)
print("Finished!")
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.set_experiment('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 run.log()
- #1 mlflow.tracking.client = ws, #2 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #3 mlflow.active_run(), #4 mlflow.log_metric
Question 27)
Choose from the list below the supervised learning problem type that usually outputs quantitative values.
- Classification
- Clustering
- Regression
Question 30)
Choose from the list below the cross-validation technique that belongs to the exhaustive type.
- K-fold cross-validation
- Holdout cross-validation
- Leave-one-out cross-validation
- Leave-p-out cross-validation
Question 31)
Your task is to clean up the deployments and terminate the “dev” ACI webservice by making use of the Azure ML SDK after your work with Azure Machine Learning has ended.
What is the most suitable method in order to achieve this goal?
- dev_webservice.terminate()
- dev_webservice.flush()
- dev_webservice.delete()
- dev_webservice.remove()
Question 32)
True or False?
In order to differentiate multiple images, the feature extraction layers use convolutional filters and pooling to emphasize edges, corners, and other patterns.
This solution is supposed to work for any other group of images that have the same dimensions as the network input layer.
- True
- False
Question 33)
When you deploy a new real-time service, you can enable Application Insights while configuring the service deployment.
By using the SDK, what code should you write to achieve this goal?
- dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
memory_gb = 1,
enable_app_insights=True)
Question 34)
You decided to use Azure Machine Learning and your goal is to train a Diabetes Model and build a container image for it.
You choose to make use of the scikit-learn ElasticNet linear regression model.
You want to use Azure Kubernetes Service (AKS) for the model deployment to production.
For deploying the model, you configured an AKS cluster.
At this point, you have deployed the image of the model to the desired AKS cluster.
After training a new model with different hyperparameters, your goal is to deploy the new image of the model to the AKS cluster.
What code should you write for this task?
- prod_webservice.update(image=model_image_updated)
prod_webservice.wait_for_deployment(show_output = True)
Question 35)
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
- Box plot
- A violin plot
- Binary classification confusion matrix
- Gradient descent
Question 36)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person suffers from a disease. Your model classifies the case as having a disease”.
- True positives
- True negatives
- False positives
- False negatives
Question 37)
As a senior data scientist, you need to evaluate a binary classification machine learning model.
As evaluation metric, you have to use the precision. Considering this, which is the most appropriate visualization?
- Scatter plot
- Gradient descent
- Receiver Operating Characteristic (ROC) curve
- Violin plot
Question 38)
Your task is to create and evaluate a model. One of the metrics shows an absolute metric in the same unit as the label.
What is the metric described above?
- Mean Square Error (MSE)
- Coefficient of Determination (known as R-squared or R2)
- Root Mean Square Error (RMSE)
Question 39)
You have a Pandas DataFrame entitled df_sales that contains the sales data from each day. Your DataFrame contains these columns: year, month, day_of_month, sales_total. Which of the following code snippets should you choose if your goal is to return the average sales_total value?
- df_sales['sales_total'].mean()
- df_sales['sales_total'].avg()
- mean(df_sales['sales_total'])
Question 40)
In order to foretell the price for a student’s craftwork, you have to rely on the following variables: the student’s length of education, degree type, and art form. You decide to set up a linear regression model that you will have to evaluate. Solution: Apply the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC:
Is this solution effective?
- Yes
- No
Question 41)
If you use the sklearn.metrics classification report for evaluating how your model performs, what result do you get from the F1-Score metric?
- Out of all of the instances of this class in the test dataset, how many did the model identify
- An average metric that takes both precision and recall into account.
- How many instances of this class are there in the test dataset
- Of the predictions the model made for this class, what proportion were correct
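For illustration, a small sketch with made-up labels showing where the F1-Score appears; it is computed as 2 * (precision * recall) / (precision + recall):

from sklearn.metrics import classification_report
y_true = [0, 1, 1, 0, 1]  # made-up test labels
y_pred = [0, 1, 0, 0, 1]  # made-up predictions
print(classification_report(y_true, y_pred))  # includes per-class F1-Score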
Question 42)
In order to create clusters, Hierarchical clustering uses two methods.
What are the two methods used in this case?
- Divisive
- Distinctive
- Aggregational
- Agglomerative
Question 43)
What is the effect that you obtain if you increase the Learning Rate parameter for the deep neural network that you are creating?
- More hidden layers are added to the network
- More records are included in each batch passed through the network
- Larger adjustments are made to weight values during backpropagation
Question 44)
The layer described below is used to reduce the number of feature values that are extracted from images, while still retaining the key differentiating features.
- Flattening layer
- Pooling layer
- Convolutional layer
Question 45)
After installing the Azure Machine Learning Python SDK, you decide to use it to create a workspace entitled "aml-workspace" on your subscription.
What code should you write in Python for this task?
- from azureml.core import Workspace
ws = Workspace.create(name='aml-workspace',
subscription_id='123456-abc-123…',
resource_group='aml-resources',
create_resource_group=True,
location='eastus'
)
Question 46)
After installing the Azure Machine Learning CLI extension, you decide to use it to set up an ML workspace in your existing resource group.
What Azure CLI command should you choose for this task?
- az ml new workspace create -w 'aml-workspace' -g 'aml-resources'
- az ml workspace create -w 'aml-workspace' -g 'aml-resources'
- az ml ws create -w 'aml-workspace' -g 'aml-resources'
- new az ml workspace create -w 'aml-workspace' -g 'aml-resources'
Question 47)
What are the most appropriate SDK commands you should choose if you want to publish the pipeline that you created?
- published_pipeline = pipeline.publish(name='training_pipeline',
description='Model training pipeline',
version='1.0')
Question 48)
Choose from the options below the one that explains how values for hyperparameters are selected by random sampling.
- From a mix of discrete and continuous values
- It tries every possible combination of parameters in the search space
- It tries to select parameter combinations that will result in improved performance from the previous selection
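A minimal sketch of random sampling with HyperDrive, mixing a discrete choice with a continuous normal distribution (the parameter names here are illustrative):

from azureml.train.hyperdrive import RandomParameterSampling, choice, normal
param_sampling = RandomParameterSampling({
    '--batch_size': choice(16, 32, 64),  # discrete values
    '--learning_rate': normal(10, 3)     # continuous distribution
})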
Question 49)
What Python code should you write if your goal is to implement a median stopping policy?
- from azureml.train.hyperdrive import MedianStoppingPolicy
early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
delay_evaluation=5)
Question 50)
Your task is to enable the creation of an explanation in the experiment script. What packages should you install in the run environment in order to achieve this goal?
- azureml-contrib-interpret
- azureml-explainer
- azureml-interpret
- azureml-blackbox
Question 51)
You decided to preprocess and filter down only the relevant columns for your AirBnB housing dataframe.
The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.
In order to obtain the first initial from the host_name column, you have written the following function that you entitled firstInitialFunction:
def firstInitialFunction(name):
    return name[0]
firstInitialFunction("George")
Your goal is to use spark.udf.register in order to create a UDF from the function above, because you want to ensure that the UDF will be created in the SQL namespace.
Considering this scenario, what code should you write?
- airbnbDF.createOrReplaceTempView("airbnbDF")
spark.udf.register("sql_udf", firstInitialFunction)
Question 52)
In order to track the runs of a Linear Regression model of your AirBnB dataset, you decide to use MLflow.
You want to make use of all the features included in your dataset.
At this point, you have created and logged the pipeline and you have logged the parameters.
You now have to create some predictions and metrics.
Considering this scenario, what code should you write?
- predDF = pipelineModel.transform(testDF)
regressionEvaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction")
rmse = regressionEvaluator.setMetricName("rmse").evaluate(predDF)
r2 = regressionEvaluator.setMetricName("r2").evaluate(predDF)
Question 53)
You are using remote compute in Azure Machine Learning to run a training experiment.
The Conda environment used for the experiment includes both the mlflow and the azureml-contrib-run packages. In order to track the metrics that the experiment generates, you have to log them by using MLflow.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
import numpy as np
#1 Import library to log metrics
#2 Start logging for this run
reg_rate = 0.01
#3 Log the reg_rate metric
#4 Stop logging for this run
- #1 import logging, #2 mlflow.start_run(), #3 mlflow.log_metric(' ..'), #4 run.complete()
- #1 import mlflow, #2 mlflow.start_run(), #3 logger.info(' ..'), #4 mlflow.end_run()
- #1 from azureml.core import Run, #2 run = Run.get_context(), #3 logger.info(' ..'), #4 run.complete()
- #1 import mlflow, #2 mlflow.start_run(), #3 mlflow.log_metric(' ..'), #4 mlflow.end_run()
Question 54)
Choose from the list below all the options that show what qualitative variables are also called.
- Numerical
- Continuous
- Discrete
- Categorical
Question 55)
Which of the below visualization tools is able to help you visualize quantiles and outliers?
- t-SNE
- Heat maps
- Box plots
- Q-Q plots
Question 56)
Your task is to extract the most recent run from the experiment's list of runs.
What code should you write in Python to achieve this?
- runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=1)
runs[0].data.metrics
Question 57)
Your task is to store in the Azure ML workspace a model that you trained by running an experiment. You want to do this so that other experiments and services can use the model.
Considering this scenario, what action should you take to achieve the result?
- Save the model as a file in a compute instance
- Save the experiment script as a notebook
- Register the model in the workspace
- Save the model as a file in a Key Vault instance
Question 58)
If you want to explore the hyperparameters on a model while knowing that every algorithm uses different hyperparameters for tuning, what is the most appropriate method you should choose?
- showParams()
- exploreParams()
- getParams()
- explainParams()
Question 59)
You usually take the following steps when you use HorovodRunner in order to develop a distributed training program:
1. Configure a HorovodRunner instance that is initialized with the number of nodes.
2. While using the methods described in Horovod usage, define a Horovod training method for which you want to ensure that import statements are added inside the method.
What code should you write in Python to achieve this?
- hr = HorovodRunner(np=2)
def train():
    import tensorflow as tf
    import horovod.tensorflow as hvd  # assumed import, added so that hvd.init() resolves
    hvd.init()
hr.run(train)
Question 60)
You create a machine learning model by using the Azure Machine Learning designer. You publish the model as a real-time service on an Azure Kubernetes Service (AKS) inference compute cluster. You make no change to the deployed endpoint configuration.
You need to provide application developers with the information they need to consume the endpoint. Which two values should you provide to application developers? Each correct answer presents part of the solution.
- The run ID of the inference pipeline experiment for the endpoint
- The key for the endpoint
- The URL of the endpoint
- The name of the AKS cluster where the endpoint is hosted
- The name of the inference pipeline for the endpoint
Question 61)
You decide to use a two-class logistic regression model for a binary classification. If you have to evaluate the results for imbalance issues, what would be the best evaluation metric for the model?
- Relative Absolute Error
- AUC Curve
- Relative Squared Error
- Mean Absolute Error
Question 62)
You decide to use GPU-based training to develop a deep learning model on the Azure Machine Learning service that is able to recognize images.
The context where you have to configure the model needs to allow real-time GPU-based inferencing.
Considering that you have to set up compute resources for model inferencing, what is the most suitable compute type?
- Field Programmable Gate Array
- Azure Kubernetes Service
- Azure Container Instance
- Machine Learning Compute
Question 63)
Your task is to create and evaluate a model. You decide to use a specific metric whose value is directly proportional to how well the model fits.
What is the evaluation metric described above?
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
- Coefficient of Determination (known as R-squared or R2)
Question 64)
When you use the Support Vector Machine algorithm, what type of machine learning model can you train?
- Regression
- Clustering
- Classification
Question 65)
Your task is to set up an Azure Machine Learning workspace. You decide to use a laptop computer to create a local Python environment.
You want to ensure connection between the laptop and the workspace and you want to run experiments.
You start creating the config.json file below:
{ “workspace_name” : “ml-workspace” }
In order to interact with data and experiments in the workspace, you have to use the Azure Machine Learning SDK. Your config.json file has to be able to connect from the Python environment directly to the workspace. Which two additional parameters should you add to the config.json file to ensure connection to the workspace? Keep in mind that every correct answer presents a part of the solution.
- Region
- Subscription_id
- Login
- Resource_group
- Key
Question 66)
You want to set up a new Azure subscription. The subscription doesn’t contain any resources.
Your goal is to create an Azure Machine Learning workspace.
Considering this scenario, which are three possible ways to obtain this result? Keep in mind that every correct answer presents a complete solution.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name, subscription_id, and resource_group parameters.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with name, subscription_id, resource_group, and location parameters.
- Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az group create function with --name and --location parameters, and then the az ml workspace create function, specifying -w and -g parameters for the workspace name and resource group.
- Use an Azure Resource Management template that includes a Microsoft.MachineLearningServices/ workspaces resource and its dependencies.
- Navigate to Azure Machine Learning studio and create a workspace.
Question 67)
You are in the process of training a machine learning model. Your model has to be configured for testing as a real-time inference service. For the service you have to ensure low CPU utilization and less than 48 MB of RAM. While keeping cost and administrative overhead to a minimum, you have to make sure that the compute target for the deployed service is initialized in an automatic manner.
In this scenario, what is the most appropriate compute target?
- Azure Machine Learning compute cluster
- attached Azure Databricks cluster
- Azure Container Instance (ACI)
- Azure Kubernetes Service (AKS) inference cluster
Question 68)
Yes or No?
In order to explain the model’s predictions, you have to calculate the importance of all the features, taking into account the overall global relative importance value, but also the measure of local importance for a certain set of predictions.
You decide to obtain the global and local feature importance values that you need by using an explainer.
Solution: Configure a PFIExplainer. Is this solution effective?
- Yes
- No
Question 69)
Yes or No?
You use a logistic regression algorithm to train your classification model. In order to explain the model’s predictions, you have to calculate the importance of all the features, taking into account the overall global relative importance value, but also the measure of local importance for a certain set of predictions.
You decide to obtain the global and local feature importance values that you need by using an explainer.
Solution: Configure a TabularExplainer. Is this solution effective?
- Yes
- No
Question 70)
If your goal is to use a configuration file in order to ensure connection to your Azure ML workspace, what Python command would be the most appropriate?
- from azureml.core import Workspace
ws = Workspace.from_config()
Question 71)
As a data scientist, you are asked to build a deep convolutional neural network (CNN) in order to classify images. Your CNN model seems to present some overfitting signs. Your goal is to minimize overfitting and to give an optimal fit to the model.
Considering this, what are the most appropriate two actions that you should take?
- Reduce the amount of training data
- Use training data augmentation
- Add an additional dense layer with 64 input units
- Add an additional dense layer with 512 input units
- Add L1/L2 regularization
Question 72)
If you want to visualize the environments that you registered in your workspace, what are the most appropriate SDK commands that you should choose?
- from azureml.core import Environment
env_names = Environment.list(workspace=ws)
for env_name in env_names:
    print('Name:', env_name)
Question 73)
If you want to extract the parallel_run_step.txt file from the output of the step after the pipeline run has ended, what code should you choose?
- for root, dirs, files in os.walk('results'):
    for file in files:
        if file.endswith('parallel_run_step.txt'):
            result_file = os.path.join(root, file)
Question 74)
Your company uses a set of labeled photographs for the multi-class image classification deep learning model that it is creating.
During summer time, the software engineering team noticed a heavy inferencing load on the prediction web services. Although the production web service for the model is deployed on a fully-utilized compute cluster, it fails to meet demand.
While keeping the downtime and the administrative effort to a minimum, you have to be able to improve the image classification web service performance. Considering this, what actions do you recommend the IT Operations team to take?
- Increase the minimum node count of the compute cluster where the web service is deployed.
- Increase the node count of the compute cluster where the web service is deployed.
- Increase the VM size of nodes in the compute cluster where the web service is deployed.
- Create a new compute cluster by using larger VM sizes for the nodes, redeploy the web service to that cluster, and update the DNS registration for the service endpoint to point to the new cluster.
Question 75)
Your task is to train a binary classification model in order for it to be able to target the correct subjects in a marketing campaign.
What actions should you take if you want to ensure that your model is fair and will not be inclined to ethnic discrimination?
- Evaluate each trained model with a validation dataset, and use the model with the highest accuracy score. An accurate model is inherently fair.
- Compare disparity between selection rates and performance metrics across ethnicities.
- Remove the ethnicity feature from the training dataset.
Question 76)
You decided to preprocess and filter down only the relevant columns for your AirBnB housing dataframe.
The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.
In order to obtain the first initial from the host_name column, you have written the following function that you entitled firstInitialFunction:
def firstInitialFunction(name):
    return name[0]
firstInitialFunction("George")
Considering that Python UDFs are much slower than Scala UDFs, your goal is to create a Vectorized UDF in Python in order to speed up the computation.
In this scenario, what code should you write?
- from pyspark.sql.functions import pandas_udf
@pandas_udf("string")
def vectorizedUDF(name):
    return name.str[0]
Question 77)
You decided to use the AirBnB Housing dataset and the Linear Regression algorithm for which you want to tune the Hyperparameters.
At this point, you have executed a train/test split on the Boston dataset and built a pipeline for the linear regression.
You now want to test the maximum number of iterations by using ParamGridBuilder(), regardless of whether you want to use an intercept with the y axis or standardize the features.
Considering this scenario, what code should you write?
- from pyspark.ml.tuning import ParamGridBuilder
paramGrid = (ParamGridBuilder()
.addGrid(lr.maxIter, [1, 10, 100])
.addGrid(lr.fitIntercept, [True, False])
.addGrid(lr.standardization, [True, False])
.build()
)
Question 78)
You want to deploy a deep learning model in your Azure Container Instance.
In order to call the model API, you have to use the Azure Machine Learning SDK.
To invoke the deployed model, you have to use native SDK classes and methods.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
from azureml.core import Workspace
#1st code option
import json
ws = Workspace.from_config()
service_name = "mlmodel1-service"
service = Webservice(name=service_name, workspace=ws)
x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({"data": x_new})
#2nd code option
- from azureml.core.webservice import LocalWebservice, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.deserialize(ws, input_json)
- from azureml.core.webservice import Webservice, predictions = service.run(input_json)
- from azureml.core.webservice import requests, predictions = service.run(input_json)
Question 79)
Your hyperparameter tuning needs to have a search space defined. The values of the batch_size hyperparameter can be 128, 256, or 512 and the normal distribution values for the learning_rate hyperparameter can have a mean of 10 and a standard deviation of 3.
What Python code should you write in order to achieve this goal?
- from azureml.train.hyperdrive import choice, normal
param_space = {
    '--batch_size': choice(128, 256, 512),
    '--learning_rate': normal(10, 3)
}
Question 80)
You intend to use the Hyperdrive feature of Azure Machine Learning to determine the optimal hyperparameter values when training a model.
You need to use Hyperdrive to try combinations of the following hyperparameter values:
- learning_rate: any value between 0.001 and 0.1
- batch_size: 16, 32, or 64
You must configure the search space for the Hyperdrive experiment.
Which two parameter expressions should you use? Each correct answer presents part of the solution.
- A uniform expression for learning_rate
- A normal expression for batch_size
- A choice expression for learning_rate
- A choice expression for batch_size
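A minimal sketch of that search space, assuming the training script takes --learning_rate and --batch_size arguments:

from azureml.train.hyperdrive import choice, uniform
param_space = {
    '--learning_rate': uniform(0.001, 0.1),  # any value in the range
    '--batch_size': choice(16, 32, 64)       # one of the listed values
}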
Question 81)
You create a machine learning model by using the Azure Machine Learning designer. You publish the model as a real-time service on an Azure Kubernetes Service (AKS) inference compute cluster. You make no change to the deployed endpoint configuration.
You need to provide application developers with the information they need to consume the endpoint. Which two values should you provide to application developers? Each correct answer presents part of the solution.
- The key for the endpoint
- The name of the AKS cluster where the endpoint is hosted
- The run ID of the inference pipeline experiment for the endpoint
- The name of the inference pipeline for the endpoint
- The URL of the endpoint
Question 82)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person suffers from a disease. Your model classifies the case as having no disease”.
- True negatives
- True positives
- False negatives
- False positives
Question 83)
You have a dataset that can be used for multiclass classification tasks. The dataset provided contains a normalized numerical feature set with 20,000 data points and 300 features. For training purposes, you need 75 percent of the data points and for testing purposes you need 25 percent.
The data frames are named as follows:
- X_train: Training feature set
- Y_train: Training class labels
- x_train: Testing feature set
- y_train: Testing class labels
Your goal is to use the method of the Principal Component Analysis (PCA) in order to reduce the feature set dimensionality to 10 features for both the training and testing sets.
You decide to apply in Python the scikit-learn machine learning library.
You mark with X the feature set and with Y the class labels.
Your Python data frames include the below code segment:
from sklearn.decomposition import PCA
pca = […]
x_train = […].fit_transform(X_train)
x_test = pca.[…]
How would you complete the missing brackets for the code snippet presented?
- Box1: PCA(n_components=10);
Box2: pca;
Box3: transform(x_test)
Question 84)
What is the result of multiplying a NumPy array by 3?
- The new array will be 3 times longer, with the sequence repeated 3 times and also all the elements are multiplied by 3.
- The new array will be 3 times longer, with the sequence repeated 3 times.
- Array stays the same size, but each element is multiplied by 3.
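A quick NumPy sketch of the element-wise behavior:

import numpy as np
arr = np.array([1, 2, 3])
print(arr * 3)  # [3 6 9]: same size, each element multiplied by 3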
Question 85)
Which of the layer types described below is a principal one that retrieves important features in images and works by applying a filter to images?
- Pooling layer
- Flattening layer
- Convolutional layer
Question 86)
In order to register a datastore in a Machine Learning services workspace, one of your coworkers decides to use the code below:
Datastore.register_azure_blob_container(workspace=ws,
datastore_name='demo_datastore',
container_name='demo_datacontainer',
account_name='demo_account',
account_key='0A0A0A-0A00A0A-0A0A0A0A0A0',
create_if_not_exists=True)
You want to be able to access the datastore by using a notebook. If you want to achieve this goal, what code should you write for completing the following snippet segment?
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
datastore = <add answer here>.get(<add answer here>, '<add answer here>')
- Experiment, run, demo_account
- Run, experiment, demo_datastore
- DataStore, ws, demo_datastore
- Run, ws, demo_datastore
Question 87)
You decide to use the code below for the deployment of a model as an Azure Machine Learning real-time web service:
# ws, model, inference_config, and deployment_config defined previously
service = Model.deploy(ws, 'classification-service', [model], inference_config, deployment_config)
service.wait_for_deployment(True)
Your deployment does not succeed.
You have to troubleshoot the deployment failure in order to determine what actions were taken while deploying and to identify the one action that encountered a problem and didn’t succeed.
For this scenario, which of the following code snippets should you use?
- service.update_deployment_state()
- service.state
- service.serialize()
- service.get_logs()
Question 88)
You want to use your registered model in a batch inference pipeline.
For processing files in a file dataset, your batch inference pipeline has to use a ParallelRunStep step. The step runs your script, and every time the inferencing function is called, the run needs to be able to process six input files.
You have to set up the pipeline. What configuration setting needs to be specified in the ParallelRunConfig object for the ParallelRunStep step?
- process_count_per_node="6"
- mini_batch_size="6"
- error_threshold="6"
- node_count="6"
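For context, a hedged sketch of a ParallelRunConfig with that setting; the script name, environment, and compute target are assumptions:

from azureml.pipeline.steps import ParallelRunConfig
parallel_run_config = ParallelRunConfig(
    source_directory='scripts',     # hypothetical folder
    entry_script='batch_score.py',  # hypothetical script
    mini_batch_size='6',            # six files per call to run(mini_batch)
    error_threshold=10,
    output_action='append_row',
    environment=batch_env,          # assumed to be defined earlier
    compute_target=compute_cluster, # assumed to be defined earlier
    node_count=2)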
Question 89)
What object needs to be defined if your task is to create a schedule for your pipeline?
- ScheduleSync
- ScheduleConfig
- ScheduleTimer
- ScheduleRecurrence
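A minimal sketch of a daily schedule, assuming a published pipeline and illustrative names:

from azureml.pipeline.core import ScheduleRecurrence, Schedule
recurrence = ScheduleRecurrence(frequency='Day', interval=1)
schedule = Schedule.create(ws, name='daily-training',  # hypothetical name
                           pipeline_id=published_pipeline.id,
                           experiment_name='training-schedule',
                           recurrence=recurrence)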
Question 90)
You can combine the Bayesian sampling with an early-termination policy and you can use it only with these three parameter expressions: choice, uniform and quniform.
- False
- True
Question 91)
If you want to minimize disparity in the selection rate across sensitive feature groups, what is the most suitable parity constraint that you should choose to use with any of the mitigation algorithms?
- Demographic parity
- Equalized odds
- Error rate parity
- Bounded group loss
Question 92)
Your task is to backfill a dataset monitor for the previous 5 months, based on changes made in the data on a monthly basis.
What code should you write in the SDK to achieve this goal?
- import datetime as dt
backfill = monitor.backfill(dt.datetime.now() - dt.timedelta(days=150), dt.datetime.now())  # roughly 5 months; datetime.timedelta accepts days or weeks, not months
Question 93)
You discover the median value for a number of variables in your AirBnB Housing dataset, such as the number of rooms, per capita crime, and the economic status of residents.
Depending on the average number of rooms, you want to be able to predict the median home value by using Linear Regression.
You decided to use VectorAssembler to import the dataset and to create your column entitled features that includes a single input variable entitled rm.
At this moment you have to fit the Linear Regression model.
Considering this scenario, what code should you write?
- from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol="features", labelCol="medv")
lrModel = lr.fit(bostonFeaturizedDF)
Question 93)
You can use the MlflowClient object as the pathway to query previous runs in a programmatic manner.
What code should you write in Python to achieve this?
- from mlflow.tracking import MlflowClient
client = MlflowClient()
client.list_experiments()
Question 94)
You decided to use Azure Machine Learning to create machine learning models. You want to use multiple compute contexts to train and score models.
Moreover, you want to use Azure Databricks cluster to train models.
Considering this scenario, what compute type is the most suitable to use for Azure Databricks?
- Compute cluster
- Attached compute
- Inference cluster
Question 95)
For your experiment in Azure Machine Learning you decide to run the following code:
from azureml.core import Workspace, Experiment, Run
from azureml.core.runconfig import RunConfiguration
from azureml.core import ScriptRunConfig
ws = Workspace.from_config()
run_config = RunConfiguration()
run_config.target='local'
script_config = ScriptRunConfig(source_directory='./script', script='experiment.py', run_config=run_config)
experiment = Experiment(workspace=ws, name='script experiment')
run = experiment.submit(config=script_config)
run.wait_for_completion()
The experiment run generates several output files that need identification.
In order to retrieve the output file names, you must write some code. Which of the following code snippets should you choose to complete the script?
- files = run.get_details_with_logs()
- files = run.get_properties()
- files = run.get_file_names()
- files = run.get_metrics()
Question 96)
Choose from the descriptions below the one that explains what a negative correlation of -1 means.
- There is no association between the variables
- For each unit increase in one variable, the same decrease is seen in the other
- For each unit increase in one variable, the same increase is seen in the other
Question 97)
If you want to string together all the different possible hyperparameters that you need for testing, what is the most suitable PySpark class method you should choose?
- ParamGridSearch()
- ParamSearch()
- ParamGridBuilder()
- ParamBuilder()
Question 98)
What Python command should you choose in order to view the models previously registered in the Azure ML studio by using the Model object?
- from azureml.core import Model
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
Question 99)
The DataFrame you are currently working on contains data regarding the daily sales of ice cream. In order to compare the avg_temp and units_sold columns you decided to use the corr method which returned a result of 0.95.
What information can you read from this result?
- The units_sold value is, on average, 95% of the avg_temp value
- Days with high avg_temp values tend to coincide with days that have high units_sold values
- On the day with the maximum units_sold value, the avg_temp value was 0.95
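A short Pandas sketch with invented numbers showing a correlation close to 1:

import pandas as pd
df = pd.DataFrame({'avg_temp': [20, 25, 30, 35],
                   'units_sold': [100, 140, 190, 240]})  # made-up data
print(df['avg_temp'].corr(df['units_sold']))  # close to 1: the values rise together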
Question 100)
You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: You use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination.
Does the solution meet the goal?
- Yes
- No
Question 101)
What is the result of multiplying a list by 3?
- The new list remains the same size, but the elements are multiplied by 3.
- The new list created has the length 3 times the original length with the sequence repeated 3 times.
- The new list created has the length 3 times the original length with the sequence repeated 3 times and also all the elements are also multiplied by 3.
Question 102)
If you multiply by 2 a list and a NumPy array, what result would you get?
- Multiplying a NumPy array by 2 creates a new array 2 times the length with the original sequence repeated 2 times.
- Multiplying a list by 2 performs an element-wise calculation on the list, which sees the list stay the same size, but each element has been multiplied by 2.
- Multiplying a list by 2 creates a new list 2 times the length with the original sequence repeated 2 times.
- Multiplying a NumPy array by 2 performs an element-wise calculation on the array, which sees the array stay the same size, but each element has been multiplied by 2.
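A quick sketch contrasting the two behaviors:

import numpy as np
data = [1, 2, 3]
print(data * 2)            # [1, 2, 3, 1, 2, 3]: the list is repeated
print(np.array(data) * 2)  # [2 4 6]: element-wise multiplication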
Question 103)
You decided to use the LinearRegression class from the scikit-learn library to create your model object.
If you want to train the model, what should your next step be?
- Call the score() method of the model object, specifying the training feature and test feature arrays
- Call the fit() method of the model object, specifying the training feature and label arrays
- Call the predict() method of the model object, specifying the training feature and label arrays
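A minimal sketch with hypothetical training arrays:

import numpy as np
from sklearn.linear_model import LinearRegression
X_train = np.array([[1.0], [2.0], [3.0]])  # hypothetical features
y_train = np.array([2.0, 4.1, 6.2])        # hypothetical labels
model = LinearRegression()
model.fit(X_train, y_train)  # training happens in fit()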
Question 104)
In order to define a pipeline with multiple steps, you decide to use the Azure Machine Learning Python SDK. You notice that some steps of the pipeline do not run; instead, the pipeline uses a cached output from a previous run. Your task is to make sure that the pipeline runs every step, even when the parameters and contents of the source directory are the same as those from the previous run.
From the following list, which two ways are able to return the expected result? Keep in mind that every correct answer presents a complete solution.
- Restart the compute cluster where the pipeline experiment is configured to run.
- Set the outputs property of each step in the pipeline to True.
- Use a PipelineData object that references a datastore other than the default datastore.
- Set the regenerate_outputs property of the pipeline to True.
- Set the allow_reuse property of each step in the pipeline to False.
Question 105)
If you want to extract a dataset after its registration, what are the most suitable methods you should choose from the Dataset class?
- find_by_id
- get_by_id
- find_by_name
- get_by_name
Question 106)
The Precision and Recall metrics are derived from four possible prediction outcomes.
What is the outcome in the scenario where the predicted label is 1, but the actual label is 0?
- False Positive
- False Negative
- True Positive
- True Negative
Question 107)
Your task is to train a model entitled finance-data for the financial department, by using data in an Azure Storage blob container.
Your container has to be registered in an Azure Machine Learning workspace as a datastore and you have to make sure that an error will appear if the container does not exist.
Considering this scenario, what should be the continuation for the code below?
Datastore = Datastore.<add answer here>(workspace = ws,
datastore_name = 'finance_datastore',
container_name = 'finance-data',
account_name = 'fintrainingdatastorage',
account_key = 'FdhIWHDaiwh2…'
<add answer here>
- register_azure_blob_container, overwrite = True
- register_azure_blob_container, create_if_not_exists = False
- register_azure_data_lake, create_if_not_exists = False
- register_azure_data_lake, overwrite = False
Question 108)
True or False?
Before publishing, a pipeline needs to have its parameters defined.
- True
- False
Question 109)
During exploratory data analysis, when you want to find out the number of observations in the dataset, which of the options listed below is able to show whether you have missing values?
- Mean
- Standard deviation
- Count
Question 110)
True or False?
Petastorm uses a Vector as an input, and not an Array.
- True
- False
Question 111)
You decided to use Parquet files and Petastorm to train a distributed neural network by using Horovod.
Your housing prices dataset from California is entitled cal_housing.
In order to concatenate the features and labels of the model after you loaded the data, you created a Spark DataFrame from the Pandas DataFrame.
At this point, you want to set up Dense Vectors for the features.
What code should you write in Python to achieve this?
- from pyspark.ml.feature import VectorAssembler
vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol="features")
vecTrainDF = vecAssembler.transform(trainDF).select("features", "label")
display(vecTrainDF)
Question 112)
Python is commonly known for the extensive functionality provided by its powerful statistical and numerical libraries. What are the utilities of Scikit-learn?
- Providing attractive data visualizations
- Offering simple and effective predictive data analysis
- Supplying machine learning and deep learning capabilities
- Analyzing and manipulating data
Question 113)
Choose from the list below the evaluation metric that is described as a relative metric where the higher the value, the better the fit of the model.
- Coefficient of Determination (known as R-squared or R2)
- Root Mean Square Error (RMSE)
- Mean Square Error (MSE)
Question 114)
You are able to associate the K-Means clustering algorithm with the following machine learning type:
- Supervised machine learning
- Unsupervised machine learning
- Reinforcement learning
Question 115)
In order to train your K-Means clustering model that enables grouping observations into four clusters, you decide to use the scikit-learn library. Considering this scenario, what method should you choose to create the K-Means object?
- model = KMeans(n_clusters=4)
- model = Kmeans(n_init=4)
- model = Kmeans(max_iter=4)
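For illustration, a sketch with made-up observations:

import numpy as np
from sklearn.cluster import KMeans
X = np.random.rand(20, 2)  # made-up observations
model = KMeans(n_clusters=4)
clusters = model.fit_predict(X)  # each observation is assigned to one of four clusters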
Question 116)
Your task is to reduce the size of the feature maps that a convolutional layer generates when you create a convolutional neural network. What action should you take in this case?
- Increase the number of filters in the convolutional layer
- Reduce the size of the filter kernel used in the convolutional layer
- Add a pooling layer after the convolutional layer
Question 117)
You want to create a pipeline for which you defined three steps entitled as step1, step2, and step3.
Your goal is to run the pipeline as an experiment after the steps have been assigned to it.
Which of the following SDK command should you choose for this task?
- train_pipeline = Pipeline(workspace = ws, steps = [step1,step2,step3])
experiment = Experiment(workspace = ws, name = 'training-pipeline')
pipeline_run = experiment.submit(train_pipeline)
Question 118)
What code should you write for an instance of a TabularExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import TabularExplainer
tab_explainer = TabularExplainer(model=loan_model,
initialization_examples=X_test,
features=['loan_amount','income','age','marital_status'],
classes=['reject', 'approve'])
Question 119)
Which of the non-exhaustive cross validation techniques listed below enables you to assign data points in a random way to the training set and the test set?
- Holdout cross-validation
- Repeated random sub-sampling validation
- K-fold cross-validation
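A small sketch of such a random split using scikit-learn, with made-up data:

import numpy as np
from sklearn.model_selection import train_test_split
X = np.random.rand(100, 3)             # made-up features
y = np.random.randint(0, 2, size=100)  # made-up labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)  # random assignment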
Question 120)
Which of the following are the default authentication methods for ACI services and AKS services?
- Disabled for AKS services
- Token-based for ACI services
- Token-based for AKS services.
- Disabled for ACI services
- Key-based for AKS services
Question 121)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person does not suffer from a disease. Your model classifies the case as having no disease”.
- True negatives
- False negatives
- False positives
- True positives
Question 122)
Your task is to deploy your service on an AKS cluster that is set up as a compute target.
What SDK commands are able to return you the expected result?
- from azureml.core.compute import ComputeTarget, AksCompute
cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)
Question 123)
You configured a deep neural network for training a classification model that predicts, based on 8 numeric features, the class to which an observation belongs.
From the list below, which statement about the network architecture is true?
- The network layer should contain four hidden layers
- The input layer should contain four nodes
- The output layer should contain four nodes
Question 124)
You can update web services that are already deployed and enable Application Insights by using the Azure ML SDK.
What code should you write to achieve this?
- service = ws.webservices['my-svc']
service.update(enable_app_insights=True)
Question 125)
How should the following sentence be completed?
One example of the machine learning […] type models is the Decision tree algorithm.
- Regression
- Clustering
- Classification
Question 126)
What Python code should you write if your goal is to extract the primary metric for a regression task?
- from azureml.train.automl.utilities import get_primary_metrics
get_primary_metrics('regression')
Question 127)
Your task is to extract local feature importance from a TabularExplainer.
What code should you write in the SDK to achieve this goal?
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_ranked_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()
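As a short usage note: for a regression model the two returned lists are indexed by observation, so the names and values can be zipped together; a classification explainer adds an extra per-class dimension. A sketch under the regression assumption:
# Print each feature next to its importance, per explained observation
for i in range(len(local_tab_features)):
    print('Observation', i)
    for name, value in zip(local_tab_features[i], local_tab_importance[i]):
        print('  ', name, ':', value)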
Question 128)
If you want to list the generated files after your experiment run is completed, which method of the run object is the most suitable?
- download_files
- download_file
- get_file_names
- list_file_names
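get_file_names is the correct method. A minimal usage sketch on a completed run object:
# List the names of all files generated by the run
for file_name in run.get_file_names():
    print(file_name)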
Question 129)
In order to train models, you decide to use an Azure Machine Learning compute resource. You set up the compute resource as follows: Minimum nodes: 1; Maximum nodes: 5. You now have to decrease the minimum number of nodes and increase the maximum number of nodes to the following values: Minimum nodes: 0; Maximum nodes: 8.
Considering that you have to reconfigure the compute resource, which three ways are able to return the expected result? Keep in mind that every correct answer presents a complete solution.
- Run the refresh_state() method of the BatchCompute class in the Python SDK.
- Use the Azure Machine Learning studio.
- Use the Azure portal.
- Use the Azure Machine Learning designer.
- Run the update method of the AmlCompute class in the Python SDK.
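For the SDK route, a minimal sketch of the update method of the AmlCompute class; the cluster name below is an assumption:
from azureml.core.compute import ComputeTarget
# Retrieving an existing compute target returns the AmlCompute object
cluster = ComputeTarget(workspace=ws, name='aml-cluster')  # name assumed
cluster.update(min_nodes=0, max_nodes=8)  # apply the new node limits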
Question 130)
You are using an Azure Machine Learning service for your data science project. In order to deploy the project, you have to choose a compute target. For this scenario, which of the following Azure services is the most suitable?
- Apache Spark for HDInsight
- Azure Databricks
- Azure Data Lake Analytics
- Azure Container Instances
Question 131)
What code should you write using SDK if your goal is to extract the best run and its model?
- best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
metric = best_run_metrics[metric_name]
print(metric_name, metric)
Question 132)
You published a parametrized pipeline and you now want to be able to pass parameter values in the JSON payload for the REST interface.
What SDK commands are the most appropriate to achieve your goal?
- response = requests.post(rest_endpoint,
headers=auth_header,
json={"ExperimentName": "run_training_pipeline",
"ParameterAssignments": {"reg_rate": 0.1}})
Question 133)
In order to perform multi-class classification on an unbalanced training dataset, you have to apply C-Support Vector classification. You use the following Python code:
from sklearn.svm import SVC
import numpy as np
svc = SVC(kernel='linear', class_weight='balanced', C=1.0, random_state=0)
model1 = svc.fit(X_train, y)
Considering that your task is to evaluate the C-Support Vector classification code, what is the most appropriate evaluation statement?
- class_weight='balanced': automatically adjusts weights inversely proportional to class frequencies in the input data;
C parameter: penalty parameter
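For reference, scikit-learn computes the balanced weights with the heuristic n_samples / (n_classes * np.bincount(y)), which you can reproduce directly; the labels below are an illustrative assumption:
import numpy as np
y = np.array([0, 0, 0, 0, 1, 1])  # example unbalanced labels (assumed)
weights = len(y) / (len(np.unique(y)) * np.bincount(y))
print(weights)  # [0.75 1.5] -- the minority class gets the larger weight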
Question 134)
One of the categorical variables of your AirBnB dataset is room type.
You have three room types, as follows: private room, entire home/apt, and shared room.
Every room type is assigned a unique numerical value because you have encoded every unique string into a number.
In order for the machine learning algorithm to treat every category independently, rather than implying an ordering between the numerical values, you have to one-hot encode each of the values to a location in an array.
What code should you write to achieve this goal?
- from pyspark.ml.feature import OneHotEncoder
encoder = OneHotEncoder(inputCols=["room_type_index"], outputCols=["encoded_room_type"])
encoderModel = encoder.fit(indexedDF)
encodedDF = encoderModel.transform(indexedDF)
display(encodedDF)
Question 135)
You decided to use Parquet files and Petastorm to train a distributed neural network by using Horovod.
Your housing prices dataset from California is entitled cal_housing.
In order to concatenate the features and labels of the model after you load the data, you decide to create a Spark DataFrame from the pandas DataFrame.
What code should you write in Python to achieve this?
- data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names), pd.DataFrame(y_train, columns=["label"])], axis=1)
trainDF = spark.createDataFrame(data)
display(trainDF)
Question 136)
You decide to use Azure Machine Learning designer to create a real-time service endpoint. You have only one Azure Machine Learning service compute resource available.
You start training the model and preparing the real-time pipeline for deployment.
If you want to obtain a web service by publishing the inference pipeline, what is the most suitable compute type?
- a new Machine Learning Compute resource
- Azure Kubernetes Services
- the existing Machine Learning Compute resource
- Azure Databricks
- HDInsight
Question 137)
Python is commonly known for its extensive functionality and its powerful statistical and numerical libraries. What are the utilities of TensorFlow?
- Analyzing and manipulating data
- Offering simple and effective predictive data analysis
- Supplying machine learning and deep learning capabilities
- Providing attractive data visualizations
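TensorFlow's role is supplying machine learning and deep learning capabilities. A minimal sketch (layer sizes and shapes are arbitrary assumptions):
import tensorflow as tf
# A tiny fully connected network, compiled and ready for training
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')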
Question 138)
In order to find all the runs for a specific experiment, you can also use the search_runs method.
What code should you write in Python to achieve this?
- experiment_id = run.info.experiment_id
runs_df = mlflow.search_runs(experiment_id)
display(runs_df)
Question 139)
You’re using the Azure Machine Learning Python SDK to define a pipeline to train a model.
The data used to train the model is read from a folder in a datastore.
You need to ensure the pipeline runs automatically whenever the data in the folder changes.
What should you do?
- Create a ScheduleRecurrence object with a Frequency of auto. Use the object to create a schedule for the pipeline
- Create a Schedule for the pipeline. Specify the datastore in the datastore property, and the folder containing the training data in the path_on_datastore property
- Create a PipelineParameter with a default value that references the location where the training data is stored
- Set the regenerate_outputs property of the pipeline to True
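A hedged sketch of the correct option in the SDK, creating a schedule that polls the datastore folder for changes; the datastore name, folder path, and published pipeline object are assumptions:
from azureml.core import Datastore
from azureml.pipeline.core import Schedule
training_datastore = Datastore(workspace=ws, name='blob_data')  # name assumed
reactive_schedule = Schedule.create(ws, name='training-schedule',
                                    description='Run on data change',
                                    pipeline_id=published_pipeline.id,  # assumed published pipeline
                                    experiment_name='training-pipeline',
                                    datastore=training_datastore,
                                    path_on_datastore='data/training')  # folder assumed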
Question 140)
You have a set of CSV files that contain sales records. All of your CSV files follow an identical data schema.
The sales records for a given month are held in a single CSV file named sales.csv. Each file sits in a storage folder that indicates the month and year of the recorded data. A datastore has been set up in an Azure Machine Learning workspace for these folders, which are kept in an Azure blob container. The parent folder, entitled sales, contains the folders organized in the hierarchical structure below:
/sales
/01-2019
/sales.csv
/02-2019
/sales.csv
/03-2019
/sales.csv
…
A new folder with a given month's sales is added to the sales folder every time that month has ended. You want to train a machine learning model by using the sales data while complying with the requirements below:
– A dataset must load all of your sales data to date into a structure that enables easy conversion to a dataframe.
– Experiments must be able to use only the data created up to a specific previous month, disregarding any data added after the selected month.
– The number of registered datasets must be kept to the minimum possible.
Considering that the sales data have to be registered as a dataset in the Azure Machine Learning service workspace, what actions should you take?
- Create a tabular dataset that references the datastore and specifies the path ‘sales/*/sales.csv’, register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
- Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
- Create a new tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
- Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
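A sketch of how the correct option might look in the SDK each month; the datastore object and the month tag shown are illustrative assumptions:
from azureml.core import Dataset
paths = [(blob_ds, 'sales/01-2019/sales.csv'),
         (blob_ds, 'sales/02-2019/sales.csv')]  # extended explicitly each month
sales_ds = Dataset.Tabular.from_delimited_files(path=paths)
sales_ds = sales_ds.register(workspace=ws, name='sales_dataset',
                             create_new_version=True,
                             tags={'month': '02-2019'})  # hypothetical month tag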
Question 141)
Your task is to extract the most recent run from the list of runs for an experiment.
What code should you write in Python to achieve this?
- runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=1)
runs[0].data.metrics
Question 142)
You decided to use the from_files method of the Dataset.File class to configure a file dataset.
You then want to register the file dataset under the name img_files in a workspace.
What SDK commands should you choose for this task?
- from azureml.core import Dataset
blob_ds = ws.get_default_datastore()
file_ds = Dataset.File.from_files(path=(blob_ds, 'data/files/images/*.jpg'))
file_ds = file_ds.register(workspace=ws, name='img_files')
Question 143)
Your task is to extract local feature importance from a TabularExplainer.
What code should you write in the SDK to achieve this goal?
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_ranked_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()
Question 144)
If you want to use the from_delimited_files method of the Dataset.Tabular class to configure and register a tabular dataset, what are the most appropriate Python commands?
- from azureml.core import Dataset
blob_ds = ws.get_default_datastore()
csv_paths = [(blob_ds, 'data/files/current_data.csv'),
(blob_ds, 'data/files/archive/*.csv')]
tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
tab_ds = tab_ds.register(workspace=ws, name='csv_table')
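Once registered, the tabular dataset can be retrieved by name and converted straight to a pandas DataFrame, which is a common follow-up step:
# Retrieve the registered dataset and load it for analysis
tab_ds = Dataset.get_by_name(workspace=ws, name='csv_table')
df = tab_ds.to_pandas_dataframe()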
Question 145)
You want to evaluate a Python NumPy array that has six data points with the following definition: data = [10, 20, 30, 40, 50, 60]
Your task is to use the k-fold algorithm implementation in the Python Scikit-learn machine learning library to generate the output that follows:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
In order to generate the output, you have to implement a cross-validation.
To give the correct answer, you have to replace the bolded code comments with suitable code options from the answer area.
Considering this, what snippet should you choose to complete the code?
from numpy import array
from sklearn.model_selection import #1st option
data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=#2nd option, shuffle=True, random_state=1)
for train, test in kfold.split(#3rd option):
    print('train: %s, test: %s' % (data[train], data[test]))
- K-means, 6, array
- K-fold, 3, array
- CrossValidation, 3, data
- K-fold, 3, data
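Substituting the correct options (KFold, 3, data) yields a runnable snippet; note that the exact partitions printed depend on the random_state value:
from numpy import array
from sklearn.model_selection import KFold
data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=3, shuffle=True, random_state=1)
for train, test in kfold.split(data):
    print('train: %s, test: %s' % (data[train], data[test]))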