In this post, I’m sharing a quick review of the Prepare for DP-100: Data Science on Microsoft Azure Exam course, along with important insights to help you finalize your readiness for the DP-100 certification exam.
Just completed this fifth and final course in the Microsoft Azure Data Scientist Associate Professional Certificate? Then you’re right at the finish line. This course is all about review and exam preparation—helping you refresh your knowledge on building environments, running experiments, training models, and deploying solutions in Azure.
It also offers practice exams, study tips, and key resources to get you fully ready for the certification process. If you’re serious about passing the DP-100, this course gives you everything you need to approach exam day with confidence — and I’ve got the review to show you why it’s worth your time.
Prepare for DP-100: Data Science on Microsoft Azure Exam
Practice exam covering Course 1: Create machine learning models Quiz Answers
Question 1)
Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.
Which error type should you choose for the following description?
“A person has a disease. The model classifies the case as having a disease”.
- True negatives
- False positives
- False negatives
- True positives
Question 2)
Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.
Which error type should you choose for the following description?
“A person does not have a disease. The model classifies the case as having a disease”.
- False positives
- True positives
- False negatives
- True negatives
Question 3)
You are tasked to analyze a dataset containing historical data from a local taxi company. You are developing a regression model for this. Your goal is to predict the fare of a taxi trip. You need to select performance metrics to correctly evaluate the regression model.
Which two metrics can you use?
- A Root Mean Square Error value that is low
- An R-Squared value close to 0
- An F1 score that is low
- An R-Squared value close to 1
Question 4)
You are a data scientist at a company, tasked with building a deep convolutional neural network (CNN) for image classification. The CNN model you built shows signs of overfitting. You need to reduce overfitting and converge the model to an optimal fit.
Which two actions should you perform?
- Add an additional dense layer with 512 input units
- Reduce the amount of training data
- Use training data augmentation
- Add an additional dense layer with 64 input units
- Add L1/L2 regularization
Question 5)
Your manager has provided you a dataset created for multiclass classification tasks that contains a normalized numerical feature set with 10,000 data points and 150 features. You use 75 percent of the data points for training and 25 percent for testing.
- You need to apply the Principal Component Analysis (PCA) method to reduce the dimensionality of the feature set to 10 features in both training and testing sets.
- You are using the scikit-learn machine learning library in Python.
- You use X to denote the feature set and Y to denote class labels.
- You create the following Python data frames:
from sklearn.decomposition import PCA
pca = [...]
x_train = [...].fit_transform(X_train)
x_test = pca.[...]
How should you complete the code segment?
- Box1: PCA(n_components=10);
Box2: pca;
Box3: transform(x_test)
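For context, here is a minimal runnable sketch of the snippet with all three boxes filled in. The randomly generated arrays stand in for the original 10,000-point feature set and are an assumption for illustration only:
import numpy as np
from sklearn.decomposition import PCA
# 7,500 training points and 2,500 test points with 150 features each
X_train = np.random.rand(7500, 150)
X_test = np.random.rand(2500, 150)
pca = PCA(n_components=10)            # Box 1: reduce to 10 features
x_train = pca.fit_transform(X_train)  # Box 2: fit on the training set, then transform it
x_test = pca.transform(X_test)        # Box 3: reuse the fitted components on the test set
print(x_train.shape, x_test.shape)    # (7500, 10) (2500, 10)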
Question 6)
You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.
You start by creating a linear regression model. You need to evaluate the linear regression model.
Solution: Use the following metrics: Relative Squared Error, Coefficient of Determination, Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?
- Yes
- No
Question 7)
What happens when a NumPy array is multiplied by 5?
- The new array will be 5 times longer, with the sequence repeated 5 times and also all the elements are multiplied by 5.
- The new array will be 5 times longer, with the sequence repeated 5 times.
- Array stays the same size, but each element is multiplied by 5.
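You can confirm the behavior with a quick sketch: NumPy broadcasts the multiplication element-wise, so the array keeps its shape.
import numpy as np
arr = np.array([1, 2, 3])
print(arr * 5)          # [ 5 10 15]: same size, each element multiplied by 5
print((arr * 5).shape)  # (3,)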
Question 8)
You are creating a model and you want to evaluate it. One metric yields an absolute metric in the same unit as the label.
Which metric is described?
- Root Mean Square Error (RMSE)
- Mean Square Error (MSE)
- Coefficient of Determination (known as R-squared or R2)
Question 9)
Complete the sentence:
Decision tree algorithms are examples of the machine learning __________ model type.
- Regression
- Clustering
- Classification
Question 10)
It is well known that Python provides extensive functionality through powerful statistical and numerical libraries. What is Scikit-learn useful for?
- Providing attractive data visualizations
- Analyzing and manipulating data
- Supplying machine learning and deep learning capabilities
- Offering simple and effective predictive data analysis
Question 11)
You are asked to use C-Support Vector classification to do a multi-class classification with an unbalanced training dataset. The C-Support Vector classification uses the Python code shown below:
from sklearn.svm import SVC
import numpy as np
svc = SVC(kernel='linear', class_weight='balanced', C=1.0, random_state=0)
model1 = svc.fit(X_train, y)
You need to evaluate the C-Support Vector classification code. Which evaluation statement should you use?
- class_weight=balanced: Automatically adjust weights inversely proportional to class frequencies in the input data
C parameter: Penalty parameter
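As a runnable illustration of that evaluation, here is a cleaned-up version of the snippet with a small synthetic unbalanced dataset; the data itself is an assumption for demonstration:
import numpy as np
from sklearn.svm import SVC
X_train = np.random.rand(100, 4)
y = np.array([0] * 80 + [1] * 15 + [2] * 5)  # deliberately unbalanced classes
# class_weight='balanced' adjusts weights inversely proportional to class
# frequencies in the input data; C is the penalty parameter of the error term.
svc = SVC(kernel='linear', class_weight='balanced', C=1.0, random_state=0)
model1 = svc.fit(X_train, y)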
Question 12)
You are creating a model and you want to evaluate it. To do this, you look at a specific metric that is directly proportional to how well the model fits.
Which evaluation model is described?
- Mean Square Error (MSE)
- Coefficient of Determination (known as R-squared or R2)
- Root Mean Square Error (RMSE)
Question 13)
You are creating a binary classification model by using two-class logistic regression. You need to evaluate the model results for imbalance. Which evaluation metric should you use?
- Mean Absolute Error
- Relative Squared Error
- AUC Curve
- Relative Absolute Error
Question 14)
What happens when a list is multiplied by 5?
- The new list created has the length 5 times the original length with the sequence repeated 5 times.
- The new list created has the length 5 times the original length with the sequence repeated 5 times and also all the elements are also multiplied by 5.
- The new list remains the same size, but the elements are multiplied by 5.
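Contrast this with the NumPy behavior from Question 7: for a plain Python list, the * operator repeats the sequence instead of multiplying the elements.
nums = [1, 2, 3]
print(nums * 5)       # [1, 2, 3, 1, 2, 3, ...]: the sequence repeated 5 times
print(len(nums * 5))  # 15: five times the original length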
Question 15)
You are a senior data scientist in the company and you are tasked with evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric. Which visualization should you use?
- Gradient descent
- Receiver Operating Characteristic (ROC) curve
- Scatter plot
- Violin plot
Question 16)
It is well known that Python provides extensive functionality through powerful statistical and numerical libraries. What is TensorFlow useful for?
- Analyzing and manipulating data
- Providing attractive data visualizations
- Supplying machine learning and deep learning capabilities
- Offering simple and effective predictive data analysis
Question 17)
Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.
Which error type should you choose for the following description?
“A person does not have a disease. The model classifies the case as having no disease”.
- False positives
- True negatives
- True positives
- False negatives
Question 18)
Your manager has asked you to create a binary classification model to predict whether a person has a disease. You need to detect possible classification errors.
Which error type should you choose for the following description?
“A person has a disease. The model classifies the case as having no disease”.
- True positives
- False positives
- True negatives
- False negatives
Question 19)
Complete the sentence:
The Support Vector Machine algorithm is an example of the machine learning __________ model type.
- Classification
- Regression
- Clustering
Practice exam covering Course 2: Create no-code predictive models with Azure Machine Learning Quiz Answers
Question 1)
In a machine learning algorithm, what method should you use to split data for training and evaluation?
- Use labels for training and features for evaluation
- Randomly split the data into rows for training and columns for evaluation
- Use features for training and labels for evaluation
- Randomly split the data into rows for training and rows for evaluation
Question 2)
Predicting how many hours of overtime a delivery person will work based on the number of orders received is an example of which machine learning model?
- Regression
- Classification
- Clustering
Question 3)
Predicting how many minutes it will take someone to run a race based on past race times is a use case for?
- Classification
- Clustering
- Regression
Question 4)
Let’s suppose you are working on an AI application that should predict the weather. From the dataset you have, you want to pick temperature and pressure to train the model. Which machine learning task enables you to do that?
- Feature engineering
- Model training
- Feature selection
Question 5)
True or False?
Azure Machine Learning designer supports custom JavaScript functions.
- True
- False
Question 6)
You can use AI systems to predict whether a student will complete a university course. Which machine learning type enables you to do that?
- Classification
- Clustering
- Regression
Question 7)
True or False?
Accuracy is always the primary metric used to measure a model's performance.
- True
- False
Question 8)
True or False?
Automated machine learning is the process of automating the time-consuming, iterative tasks of machine learning model development.
- True
- False
Question 9)
Which module in the Azure Machine Learning designer should you use if you want to create a training dataset and a validation dataset from an existing dataset?
- Select columns in dataset
- Split data
- Join data
- Add rows
Question 10)
You want to create a CRM application that uses AI to segment customers into different groups to support a marketing department. Which machine learning type should you use?
- Clustering
- Regression
- Classification
Question 11)
What data values are influencing prediction models?
- Features
- Identifiers
- Dependent variables
- Labels
Question 12)
Imagine you work for a government institution that wants to predict the sea level in meters for the following 10 years. Which type of machine learning should you use?
- Regression
- Classification
- Clustering
Question 13)
True or False?
When working in Azure Machine Learning designer, it is possible to save your progress as a pipeline draft.
- True
- False
Question 14)
Predicting whether someone uses a bicycle to travel to work based on the distance from home to work is a use case for?
- Classification
- Clustering
- Regression
Question 15)
Which of the following metrics is used to evaluate a classification model?
- True positive rate
- Mean absolute error (MAE)
- Root mean squared error (RMSE)
- Coefficient of determination (R2)
Question 16)
Let’s suppose you want to create an AI system that can predict how many minutes late a flight will arrive based on the amount of snowfall at an airport. Which machine learning type should you use?
- Regression
- Classification
- Clustering
Question 17)
Azure Machine Learning designer lets you visually connect datasets and modules on an interactive canvas to create machine learning models. Which two components can be dragged and dropped onto the canvas? Select all options that apply.
- Compute
- Dataset
- Pipeline
- Module
Question 18)
True or False?
Automated machine learning can automatically infer the training data from the use case provided.
- True
- False
Question 19)
True or False?
Azure Machine Learning designer provides a drag-and-drop visual canvas to build, test, and deploy machine learning models.
- True
- False
Question 20)
Fill in the blank.
__________ is a form of machine learning that has the capability to group similar items based on their features.
- Classification
- Regression
- Clustering
Practice exam covering Course 3: Build and operate machine learning solutions with Azure Machine Learning Quiz Answers
Question 1)
You create a new Azure subscription. No resources are provisioned in the subscription. You need to create an Azure Machine Learning workspace.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with name, subscription_id, resource_group, and location parameters.
- Use an Azure Resource Manager template that includes a Microsoft.MachineLearningServices/workspaces resource and its dependencies.
- Navigate to Azure Machine Learning studio and create a workspace.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name, subscription_id, and resource_group parameters.
- Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az group create function with --name and --location parameters, and then the az ml workspace create function, specifying -w and -g parameters for the workspace name and resource group.
Question 2)
You are developing a data science workspace that uses an Azure Machine Learning service. You need to select a compute target to deploy the workspace. What should you use?
- Azure Databricks
- Azure Data Lake Analytics
- Azure Container Instances
- Apache Spark for HDInsight
Question 3)
The finance team asked you to train a model using data in an Azure Storage blob container named finance-data.
You need to register the container as a datastore in an Azure Machine Learning workspace and ensure that an error will be raised if the container does not exist.
How should you complete the code?
datastore = Datastore.<add answer here>(workspace = ws,
datastore_name = 'finance_datastore',
container_name = 'finance-data',
account_name = 'fintrainingdatastorage',
account_key = 'FdhIWHDaiwh2…',
<add answer here>)
- register_azure_blob_container, create_if_not_exists = False
- register_azure_data_lake, overwrite = False
- register_azure_data_lake, create_if_not_exists = False
- register_azure_blob_container, overwrite = True
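Putting the correct option into the snippet gives the sketch below; the workspace object ws and the truncated account key are taken from the question as given:
from azureml.core import Datastore
datastore = Datastore.register_azure_blob_container(workspace = ws,
    datastore_name = 'finance_datastore',
    container_name = 'finance-data',
    account_name = 'fintrainingdatastorage',
    account_key = 'FdhIWHDaiwh2…',
    create_if_not_exists = False)  # raises an error if the container does not exist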
Question 4)
You are a lead data scientist for a project that tracks the health and migration of birds. You create a multi-class image classification deep learning model that uses a set of labeled bird photographs collected by experts.
You have 100,000 photographs of birds. All photographs use the JPG format and are stored in an Azure blob container in an Azure subscription. You need to access the bird photograph files in the Azure blob container from the Azure Machine Learning service workspace that will be used for deep learning model training.
You must minimize data movement. What should you do?
- Create and register a dataset by using TabularDataset class that references the Azure blob storage containing bird photographs.
- Copy the bird photographs to the blob datastore that was created with your Azure Machine Learning service workspace.
- Create an Azure Cosmos DB database and attach the Azure Blob containing bird photographs storage to the database.
- Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning service.
- Create an Azure Data Lake store and move the bird photographs to the store.
Question 5)
You train a machine learning model. You must deploy the model as a real-time inference service for testing. The service requires low CPU utilization and less than 48 MB of RAM. The compute target for the deployed service must initialize automatically while minimizing cost and administrative overhead. Which compute target should you use?
- Azure Kubernetes Service (AKS) inference cluster
- attached Azure Databricks cluster
- Azure Machine Learning compute cluster
- Azure Container Instance (ACI)
Question 6)
You use Azure Machine Learning designer to create a real-time service endpoint. You have a single Azure Machine Learning service compute resource.
You train the model and prepare the real-time pipeline for deployment.
You need to publish the inference pipeline as a web service. Which compute type should you use?
- The existing Machine Learning Compute resource
- A new Machine Learning Compute resource
- HDInsight
- Azure Kubernetes Services
- Azure Databricks
Question 7)
You deploy a model as an Azure Machine Learning real-time web service using the following code.
# ws, model, inference_config, and deployment_config defined previously
service = Model.deploy(ws, 'classification-service', [model], inference_config, deployment_config)
service.wait_for_deployment(True)
The deployment fails.
You need to troubleshoot the deployment failure by determining the actions that were performed during deployment and identifying the specific action that failed.
Which code segment should you run?
- service.serialize()
- service.get_logs()
- service.state
- service.update_deployment_state()
Question 8)
You train and register a model in your Azure Machine Learning workspace.
You must publish a pipeline that enables client applications to use the model for batch inferencing.
You must use a pipeline with a single ParallelRunStep step that runs a Python inferencing script to get predictions from the input data.
You need to create the inferencing script for the ParallelRunStep pipeline step.
Which two functions should you include? Each correct answer presents part of the solution.
- run(mini_batch)
- init()
- main()
- batch()
- score(mini_batch)
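For reference, a minimal ParallelRunStep entry script built around those two functions might look like the sketch below; the model name and joblib serialization are assumptions for illustration:
import joblib
from azureml.core import Model

def init():
    # Called once per worker node before processing starts: load the model.
    global model
    model = joblib.load(Model.get_model_path('classification_model'))  # hypothetical name

def run(mini_batch):
    # Called once per mini-batch of inputs; must return one result per item.
    results = []
    for item in mini_batch:
        # For a file dataset, item is a path to an input file; real scoring
        # would load the file and call model.predict on its contents.
        results.append(f'{item}: processed')
    return results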
Question 9)
Yes or No?
You train a classification model by using a logistic regression algorithm. You must be able to explain the model’s predictions by calculating the importance of each feature, both as an overall global relative importance value and as a measure of local importance for a specific set of predictions.
You need to create an explainer that you can use to retrieve the required global and local feature importance values.
Solution: Create a TabularExplainer. Does the solution meet the goal?
- Yes
- No
Question 10)
You deploy a real-time inference service for a trained model.
The deployed model supports a business-critical application, and it is important to be able to monitor the data submitted to the web service and the predictions the data generates.
You need to implement a monitoring solution for the deployed model using minimal administrative effort. What should you do?
- View the log files generated by the experiment used to train the model.
- Create an ML Flow tracking URI that references the endpoint, and view the data logged by ML Flow.
- Enable Azure Application Insights for the service endpoint and view logged data in the Azure portal.
- View the explanations for the registered model in Azure ML studio.
Question 11)
You create an Azure Machine Learning workspace. You are preparing a local Python environment on a laptop computer.
You want to use the laptop to connect to the workspace and run experiments.
You create the following config.json file:
{ "workspace_name" : "ml-workspace" }
You must use the Azure Machine Learning SDK to interact with data and experiments in the workspace. You need to configure the config.json file to connect to the workspace from the Python environment. Which two additional parameters must you add to the config.json file in order to connect to the workspace? Each correct answer presents part of the solution.
- Key
- Login
- Subscription_id
- Resource_group
- Region
Question 12)
A coworker registers a datastore in a Machine Learning services workspace by using the following code:
Datastore.register_azure_blob_container(workspace=ws,
datastore_name='demo_datastore',
container_name='demo_datacontainer',
account_name='demo_account',
account_key='0A0A0A-0A00A0A-0A0A0A0A0A0',
create_if_not_exists=True)
You need to write code to access the datastore from a notebook. How should you complete the code segment?
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
datastore = <add answer here>.get(<add answer here>, '<add answer here>')
- DataStore, ws, demo_datastore
- Run, experiment, demo_datastore
- Experiment, run, demo_account
- Run, ws, demo_datastore
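With the correct completion filled in, the notebook code reads:
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
datastore = Datastore.get(ws, 'demo_datastore')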
Question 13)
You create a deep learning model for image recognition on Azure Machine Learning service using GPU-based training.
You must deploy the model to a context that allows for real-time GPU-based inferencing.
You need to configure compute resources for model inferencing. Which compute type should you use?
- Azure Container Instance
- Field Programmable Gate Array
- Machine Learning Compute
- Azure Kubernetes Service
Question 14)
An organization creates and deploys a multi-class image classification deep learning model that uses a set of labeled photographs.
The software engineering team reports there is a heavy inferencing load for the prediction web services during the summer. The production web service for the model fails to meet demand despite having a fully-utilized compute cluster where the web service is deployed.
You need to improve performance of the image classification web service with minimal downtime and minimal administrative effort. What should you advise the IT Operations team to do?
- Increase the node count of the compute cluster where the web service is deployed.
- Increase the minimum node count of the compute cluster where the web service is deployed.
- Increase the VM size of nodes in the compute cluster where the web service is deployed.
- Create a new compute cluster by using larger VM sizes for the nodes, redeploy the web service to that cluster, and update the DNS registration for the service endpoint to point to the new cluster.
Question 15)
You use the Azure Machine Learning Python SDK to define a pipeline that consists of multiple steps.
When you run the pipeline, you observe that some steps do not run. The cached output from a previous run is used instead. You need to ensure that every step in the pipeline is run, even if the parameters and contents of the source directory have not changed since the previous run.
What are two possible ways to achieve this goal? Each correct answer presents a complete solution.
- Restart the compute cluster where the pipeline experiment is configured to run.
- Set the outputs property of each step in the pipeline to True.
- Use a PipelineData object that references a datastore other than the default datastore.
- Set the regenerate_outputs property of the pipeline to True.
- Set the allow_reuse property of each step in the pipeline to False.
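The two correct options map to code along these lines; the train.py script, the 'cpu-cluster' compute target, and the ws and experiment objects are assumptions for the sketch:
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
# Option 1: disable output reuse on each step when defining it.
step = PythonScriptStep(name='train',
                        script_name='train.py',
                        compute_target='cpu-cluster',
                        allow_reuse=False)
# Option 2: force regeneration of all outputs when submitting the pipeline.
pipeline = Pipeline(workspace=ws, steps=[step])
run = experiment.submit(pipeline, regenerate_outputs=True)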
Question 16)
You register a model that you plan to use in a batch inference pipeline.
The batch inference pipeline must use a ParallelRunStep step to process files in a file dataset, and the pipeline runs must process six input files each time the inferencing function is called.
You need to configure the pipeline. Which configuration setting should you specify in the ParallelRunConfig object for the ParallelRunStep step?
- process_count_per_node= “6”
- node_count= “6”
- error_threshold= “6”
- mini_batch_size= “6”
Question 17)
You create an Azure Machine Learning compute resource to train models. The compute resource is configured as follows: Minimum nodes: 2; Maximum nodes: 4. You must decrease the minimum number of nodes and increase the maximum number of nodes to the following values: Minimum nodes: 0; Maximum nodes: 8.
You need to reconfigure the compute resource. What are three possible ways to achieve this goal? Each correct answer presents a complete solution.
- Run the refresh_state() method of the BatchCompute class in the Python SDK.
- Use the Azure Machine Learning designer.
- Use the Azure Machine Learning studio.
- Run the update method of the AmlCompute class in the Python SDK.
- Use the Azure portal.
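Of the three valid routes, the SDK one looks roughly like this sketch; the compute target name 'train-cluster' is an assumption:
from azureml.core.compute import ComputeTarget
compute_target = ComputeTarget(workspace=ws, name='train-cluster')  # hypothetical name
compute_target.update(min_nodes=0, max_nodes=8)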
Question 18)
An organization uses Azure Machine Learning service and wants to expand their use of machine learning. You have the following compute environments. The organization does not want to create another compute environment.
(The table of compute environments is not reproduced here; the options below refer to the environments nb_server, mlc_cluster, and aks_cluster.)
You need to determine which compute environment to use for the following scenarios:
1. Run an Azure Machine Learning Designer training pipeline.
2. Deploying a web service from the Azure Machine Learning Designer.
Which compute types should you use?
- 1 nb_server, 2 aks_cluster
- 1 mlc_cluster, 2 aks_cluster
- 1 nb_server, 2 mlc_cluster
- 1 mlc_cluster, 2 nb_server
Question 19)
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure:
/sales
/01-2019
/sales.csv
/02-2019
/sales.csv
/03-2019
/sales.csv
…
At the end of each month, a new folder with that month’s sales file is added to the sales folder. You plan to use the sales data to train a machine learning model based on the following requirements:
– You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.
– You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.
– You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace. What should you do?
- Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
- Create a tabular dataset that references the datastore and specifies the path ‘sales/*/sales.csv’, register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
- Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
- Create a new tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
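As a hedged sketch of how the wildcard path and monthly re-registration described in these options translate to SDK code (the datastore name and month tag are placeholders):
from azureml.core import Dataset, Datastore
datastore = Datastore.get(ws, 'sales_datastore')  # hypothetical datastore name
# The wildcard picks up every monthly folder under /sales.
sales_ds = Dataset.Tabular.from_delimited_files(path=(datastore, 'sales/*/sales.csv'))
sales_ds = sales_ds.register(workspace=ws,
                             name='sales_dataset',
                             tags={'month': '03-2019'},  # placeholder tag value
                             create_new_version=True)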
Practice exam covering Course 4: Perform data science with Azure Databricks Quiz Answers
Question 1)
You have an AirBnB housing dataframe which you preprocessed and filtered down to only the relevant columns.
The columns are: id, host_name, bedrooms, neighbourhood_cleansed, price.
You’ve written the function below, named firstInitialFunction, which returns the first initial from the host_name column:
def firstInitialFunction(name):
return name[0]
firstInitialFunction("George")
You now want to create a UDF from this function using spark.udf.register so that the UDF is created in the SQL namespace.
How would you code that?
- airbnbDF.replaceTempView("airbnbDF")
spark.udf.register("sql_udf", firstInitialFunction)
- airbnbDF.createTempView("airbnbDF")
spark.udf.register(sql_udf = firstInitialFunction)
- airbnbDF.createAndReplaceTempView("airbnbDF")
spark.udf.register(sql_udf.firstInitialFunction)
- airbnbDF.createOrReplaceTempView("airbnbDF")
spark.udf.register("sql_udf", firstInitialFunction)
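Once registered this way, the UDF can be called from the SQL namespace, for example:
spark.sql("SELECT id, host_name, sql_udf(host_name) AS first_initial FROM airbnbDF").show()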
Question 2)
You have a Boston Housing dataset from which you want to train a model to predict the value of housing based on one or more input measures.
You are using the Spark ml framework to train the model on a single column that contains a vector of all the relevant features.
You must prepare the data by creating one column named features that has the average number of rooms, age and tax rate.
You want to use VectorAssembler for this task.
How would you code this?
- from pyspark.ml.feature import VectorAssembler
featureCols = ["rm", "age", "tax"]
assembler = VectorAssembler(inputCols=featureCols, outputCol="features")
bostonFeaturizedDF = assembler.transform(bostonDF)
display(bostonFeaturizedDF)
Question 3)
You are using MLflow to track the runs of a Linear Regression model of an AirBnB dataset.
You want to use all the features in the dataset.
You’ve created the pipeline, logged the pipeline, and logged the parameters.
Now you need to create predictions and metrics.
How should you code that?
- predDF = pipelineModel.transform(testDF)
regressionEvaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction")
rmse = regressionEvaluator.setMetricName("rmse").evaluate(predDF)
r2 = regressionEvaluator.setMetricName("r2").evaluate(predDF)
Question 4)
You are running Python code interactively in a Conda environment. The environment includes all required Azure Machine Learning SDK and MLflow packages.
You must use MLflow to log metrics in an Azure Machine Learning experiment named mlflow-experiment.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
#1 Set the MLflow logging target
#2 Configure the experiment
with #3 Begin the experiment run
#4 Log my_metric with value 1.00 ('my_metric', 1.00)
print(“Finished!”)
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.set_experiment('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.tracking.client = ws, #2 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #3 mlflow.active_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 run.log()
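Assembled from the correct option, the completed snippet runs as follows, assuming a config.json file is available for Workspace.from_config():
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())  # 1: set the logging target
mlflow.set_experiment('mlflow-experiment')             # 2: configure the experiment
with mlflow.start_run():                               # 3: begin the experiment run
    mlflow.log_metric('my_metric', 1.00)               # 4: log the metric
print("Finished!")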
Question 5)
You create machine learning models by using Azure Machine Learning. You plan to train and score models by using a variety of compute contexts.
You also plan to train models by using an Azure Databricks cluster.
Which compute type can you use for Azure Databricks?
- Compute cluster
- Attached compute
- Inference cluster
Question 6)
You deploy a deep learning model in Azure Container Instance.
You must use the Azure Machine Learning SDK to call the model API. You need to invoke the deployed model using native SDK classes and methods.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
from azureml.core import Workspace
#1st code option
import json
ws = Workspace.from_config()
service_name = "mlmodel1-service"
service = Webservice(name=service_name, workspace=ws)
x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({"data": x_new})
#2nd code option
- from azureml.core.webservice import LocalWebservice, predictions = service.run(input_json)
- from azureml.core.webservice import requests, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.deserialize(ws, input_json)
Question 7)
You have an AirBnB housing dataframe which you preprocessed and filtered down to only the relevant columns.
The columns are: id, host_name, bedrooms, neighbourhood_cleansed, price.
You’ve written the function below, named firstInitialFunction, which returns the first initial from the host_name column:
def firstInitialFunction(name):
return name[0]
firstInitialFunction("George")
Because Python UDFs are much slower than Scala UDFs, you now want to create a Vectorized UDF in Python to speed up the computation.
How would you code that?
- from pyspark.sql.functions import pandas_udf
@pandas_udf("string")
def vectorizedUDF(name):
return name.str[0]
Question 8)
You are running a training experiment on remote compute in Azure Machine Learning.
The experiment is configured to use a Conda environment that includes the mlflow and azureml-contrib-run packages. You must use MLflow as the logging package for tracking metrics generated in the experiment.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
import numpy as np
#1 Import library to log metrics
#2 Start logging for this run
reg_rate = 0.01
#3 Log the reg_rate metric
#4 Stop logging for this run
- #1 from azureml.core import Run, #2 run = Run.get_context(), #3 logger.info(' ..'), #4 run.complete()
- #1 import mlflow, #2 mlflow.start_run(), #3 mlflow.log_metric(' ..'), #4 mlflow.end_run()
- #1 import mlflow, #2 mlflow.start_run(), #3 logger.info(' ..'), #4 mlflow.end_run()
- #1 import logging, #2 mlflow.start_run(), #3 mlflow.log_metric(' ..'), #4 run.complete()
Question 9)
You have a Boston Housing dataset in which you find median values for a number of variables, such as the number of rooms, per capita crime, and the economic status of residents.
You want to use Linear Regression to predict the median home value based on the average number of rooms.
You’ve imported the dataset and created a column named features that has a single input variable named rm by using VectorAssembler.
You now want to fit the Linear Regression model.
How should you code that?
- from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol="features", labelCol="medv")
lrModel = lr.fit(bostonFeaturizedDF)
Question 10)
You use the following code to run a script as an experiment in Azure Machine Learning:
from azureml.core import Workspace, Experiment, Run
from azureml.core import RunConfiguration, ScriptRunConfig
ws = Workspace.from_config()
run_config = RunConfiguration()
run_config.target='local'
script_config = ScriptRunConfig(source_directory='./script',
script='experiment.py', run_config=run_config)
experiment = Experiment(workspace=ws, name='script experiment')
run = experiment.submit(config=script_config)
run.wait_for_completion()
You must identify the output files that are generated by the experiment run. You need to add code to retrieve the output file names. Which code segment should you add to the script?
- files = run.get_properties()
- files = run.get_metrics()
- run.get_details_with_logs()
- files = run.get_file_names()
Question 11)
You’re working with the Boston Housing dataset and you want to tune the hyperparameters of the linear regression algorithm you’re using.
You’ve performed a train/test split on the Boston dataset and built a pipeline for linear regression.
Now you want to use ParamGridBuilder() to test combinations of the maximum number of iterations, whether to fit an intercept with the y axis, and whether to standardize the features.
How should you code that?
- from pyspark.ml.tuning import ParamGridBuilder
paramGrid = (ParamGridBuilder()
.addGrid(lr.maxIter, [1, 10, 100])
.addGrid(lr.fitIntercept, [True, False])
.addGrid(lr.standardization, [True, False])
.build()
)
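To show where the grid goes next, here is a hedged follow-up sketch that feeds it into a CrossValidator; the pipeline object, the trainDF split, and the medv label column are assumed from the scenario:
from pyspark.ml.tuning import CrossValidator
from pyspark.ml.evaluation import RegressionEvaluator
evaluator = RegressionEvaluator(labelCol="medv", predictionCol="prediction")
cv = CrossValidator(estimator=pipeline,
                    estimatorParamMaps=paramGrid,
                    evaluator=evaluator,
                    numFolds=3)
cvModel = cv.fit(trainDF)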
Question 12)
You are evaluating a Python NumPy array that contains six data points defined as follows: data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
To answer, replace the bolded comments in the code with the appropriate code options in the answer area.
How should you complete the code?
from numpy import array
from sklearn.model_selection import #1st option
data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=#2nd option, shuffle=True, random_state=1)
for train, test in kfold.split(#3rd option):
print('train: %s, test: %s' % (data[train], data[test]))
- KFold, 3, data
- KFold, 3, array
- KMeans, 6, array
- CrossValidation, 3, data
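Filling in the correct option yields this runnable reconstruction:
from numpy import array
from sklearn.model_selection import KFold  # 1st option
data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=3, shuffle=True, random_state=1)  # 2nd option: 3
for train, test in kfold.split(data):  # 3rd option: data
    print('train: %s, test: %s' % (data[train], data[test]))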
Full Practice Exam Quiz Answers
Question 1)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person does not suffer from a disease. Your model classifies the case as having a disease”.
- False negatives
- True negatives
- True positives
- False positives
Question 2)
Your company asks you to analyze a dataset that contains historical data obtained from a local car-sharing company. For this task, you decide to develop a regression model to predict the fare of a trip. To evaluate the regression model correctly, you have to use performance metrics.
In this scenario, what are the best two metrics?
- An R-Squared value close to 1
- A Root Mean Square Error value that is low
- An R-Squared value close to 0
- An F1 score that is low
Question 3)
In order to predict the price of a student’s craftwork, you have to rely on the following variables: the student’s length of education, degree type, and craft form. You decide to set up a linear regression model that you will have to evaluate. Solution: Apply the following metrics: Relative Squared Error, Coefficient of Determination, Accuracy, Precision, Recall, F1 score, and AUC.
Is this solution effective?
- Yes
- No
Question 4)
Your task is to create and evaluate a model. One of the metrics shows an absolute metric in the same unit as the label.
What is the metric described above?
- Coefficient of Determination (known as R-squared or R2)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
Question 5)
How should the following sentence be completed?
One example of the machine learning […] type models is the Support Vector Machine algorithm.
- Clustering
- Regression
- Classification
Question 6)
Your NumPy array has the shape (2,35). Considering this, what information can you get about the elements?
- The array contains 35 elements, all with the value 2.
- The array is two dimensional, consisting of two arrays with 35 elements each.
- The array contains 2 elements with the values of 2 and 35.
Question 7)
Choose from the list below the evaluation metric that provides you with an absolute metric in the same unit as the label.
- Coefficient of Determination (known as R-squared or R2)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
Question 8)
Which are two appropriate ways to approach a problem when using multiclass classification?
- One vs Rest
- Rest minus One
- One vs One
- One and Rest
Question 9)
Your deep neural network is in the process of training. You have set 30 epochs in the training configuration.
In this scenario, how will the model behave?
- The training data is split into 30 subsets, and each subset is passed through the network
- The first 30 rows of data are used to train the model, and the remaining rows are used to validate it
- The entire training dataset is passed through the network 30 times
Question 10)
The layer described below is used to reduce the number of feature values that are extracted from images, while still retaining the key differentiating features.
- Convolutional layer
- Flattening layer
- Pooling layer
Question 11)
The company that you work for decides to expand its use of machine learning without setting up another compute environment in Azure. At the moment, you have at your disposal the compute environments below.
(The table of compute environments is not reproduced here; the options below refer to the environments nb_server, mlc_cluster, and aks_cluster.)
Considering the scenarios below, you must establish what is the most appropriate compute environment to:
1. Run an Azure Machine Learning Designer training pipeline
2. Deploy a web service from the Azure Machine Learning Designer
What are the best compute types for this goal?
- 1 mlc_cluster, 2 nb_server
- 1 mlc_cluster, 2 aks_cluster
- 1 nb_server, 2 aks_cluster
- 1 nb_server, 2 mlc_cluster
Question 12)
You have the role of lead data scientist in a project that keeps record of birds’ health and migration. You decide to use a set of labeled bird photographs collected by experts for your multi-class image classification deep learning model.
The entire set of 200,000 bird photographs uses the JPG format and is kept in an Azure blob container in an Azure subscription. You must be able to access the bird photograph files stored in the Azure blob container directly from the Azure Machine Learning service workspace used for deep learning model training.
You have to keep data movement to a minimum. What action should you take?
- Register the Azure blob storage containing the bird photographs as a datastore in Azure Machine Learning service.
- Create an Azure Data Lake store and move the bird photographs to the store.
- Copy the bird photographs to the blob datastore that was created with your Azure Machine Learning service workspace.
- Create an Azure Cosmos DB database and attach the Azure Blob containing bird photographs storage to the database.
- Create and register a dataset by using TabularDataset class that references the Azure blob storage containing bird photographs.
Question 13)
One of the categorical variables of your AirBnB dataset is room type.
You have three room types, as follows: private room, entire home/apt, and shared room.
In order for the machine learning model to know how to handle the room types, you first have to encode every unique string as a number.
What code should you write to achieve this goal?
- from pyspark.ml.feature import StringIndexer
uniqueTypesDF = airbnbDF.select("room_type").distinct()
indexer = StringIndexer(inputCol="room_type", outputCol="room_type_index")
indexerModel = indexer.fit(uniqueTypesDF)
indexedDF = indexerModel.transform(uniqueTypesDF)
display(indexedDF)
Question 14)
You train and register a model in your Azure Machine Learning workspace.
Your pipeline needs to ensure that the client applications are able to use the model for batch inferencing.
Your single ParallelRunStep step pipeline uses a Python inferencing script in order to obtain predictions from the input data.
Your task is to configure the inferencing script for the ParallelRunStep pipeline step.
Which are the most suitable two functions that you should use? Keep in mind that every correct answer presents a part of the solution.
- batch()
- run(mini_batch)
- init()
- score(mini_batch)
- main()
Question 15)
You decide to deploy a real-time inference service for a trained model.
Your model supports a business-critical application, and you have to be able to monitor the data submitted to the web service, as well as the predictions that data generates.
While keeping the administrative effort to a minimum, you have to be able to implement a monitoring solution for the model deployed. What action should you take?
- Enable Azure Application Insights for the service endpoint and view logged data in the Azure portal.
- Create an ML Flow tracking URI that references the endpoint, and view the data logged by ML Flow.
- View the explanations for the registered model in Azure ML studio.
- View the log files generated by the experiment used to train the model.
Question 16)
If you want to install the Azure Machine Learning SDK for Python, what are the most suitable package managers and CLI commands?
- nuget azureml-sdk
- pip install azureml-sdk
- npm install azureml-sdk
- yarn install azureml-sdk
Question 17)
What SDK commands should you choose if you want to extract a certain version of a data set?
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version(2))
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version=2)
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version='2')
- img_ds = Dataset.get_by_name(workspace=ws, name='img_files', version_2)
Question 18)
Your task is to use the SDK in order to define a compute configuration for a managed compute target.
Which of the following commands will return you the expected result?
- compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS11_V2',
min_nodes=0, max_nodes=4,
vm_priority='dedicated')
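As a follow-up sketch, the configuration is then passed to ComputeTarget.create; the cluster name here is a placeholder:
from azureml.core.compute import ComputeTarget
compute = ComputeTarget.create(ws, 'aml-cluster', compute_config)  # hypothetical name
compute.wait_for_completion(show_output=True)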
Question 19)
You want to use a reference to the Run that was used to train the model in order to register it.
What are the most suitable SDK commands to achieve this goal?
- from azureml.core import Model
run.register_model( model_name='classification_model',
model_path='outputs/model.pkl',
description='A classification model')
Question 20)
If you want to set up a parallel run step, which of the SDK commands below should you choose?
- parallelrun_step = ParallelRunStep(
name='batch-score',
parallel_run_config=parallel_run_config,
inputs=[batch_data_set.as_named_input('batch_data')],
output=output_dir,
arguments=[],
allow_reuse=True
)
Question 21)
What code should you write for an instance of a MimicExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import DecisionTreeExplainableModel
mim_explainer = MimicExplainer(model=loan_model,
initialization_examples=X_test,
explainable_model = DecisionTreeExplainableModel,
features=['loan_amount','income','age','marital_status'],
classes=['reject', 'approve'])
Question 22)
What code should you write for a PFIExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import PFIExplainer
pfi_explainer = PFIExplainer(model = loan_model,
features=['loan_amount','income','age','marital_status'],
classes=['reject', 'approve'])
Question 23)
If you want to minimize the disparity in the combined true positive rate and false-positive rate across sensitive feature groups, what is the most suitable parity constraint to use with any of the mitigation algorithms?
- Equalized odds
- True positive rate parity
- Error rate parity
- False-positive rate parity
Question 24)
Your task is to ensure that your data drift monitor, which you scheduled to run daily, sends an alert when the drift magnitude surpasses 0.2. What code should you write in Python to achieve this?
- alert_email = AlertConfiguration('[email protected]')
monitor = DataDriftDetector.create_from_datasets(ws, 'dataset-drift-detector',
baseline_data_set, target_data_set,
compute_target=cpu_cluster,
frequency='Day', latency=2,
drift_threshold=.2,
alert_configuration=alert_email)
Question 25)
Your goal is to train a model on the Boston Housing dataset you have, so that it will be able to predict the value of housing by analyzing one or several input measures.
In order to train the model on a single column that includes a vector of all the important features, you decide to use the Spark ml framework.
For the data to be prepared, you create the column entitled features that contains the average number of rooms, age and tax rate.
You decide to use VectorAssembler to obtain the result.
Considering this scenario, what code should you write?
- from pyspark.ml.feature import VectorAssembler
featureCols = ["rm", "age", "tax"]
assembler = VectorAssembler(inputCols=featureCols, outputCol="features")
bostonFeaturizedDF = assembler.transform(bostonDF)
display(bostonFeaturizedDF)
Question 26)
You decided to use Python code interactively in your Conda environment. You have all the required Azure Machine Learning SDK and MLflow packages in the environment.
In order to log metrics in your Azure Machine Learning experiment entitled mlflow-experiment, you have to use MLflow.
To give the correct answer, you have to replace the code comments that are bolded with some suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
import mlflow
from azureml.core import Workspace
ws = Workspace.from_config()
#1 Set the MLflow logging target
#2 Configure the experiment
with #3 Begin the experiment run
#4 Log my_metric with value 1.00 ('my_metric', 1.00)
print(“Finished!”)
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.set_experiment('mlflow-experiment'), #3 mlflow.start_run(), #4 mlflow.log_metric
- #1 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #2 mlflow.get_run('mlflow-experiment'), #3 mlflow.start_run(), #4 run.log()
- #1 mlflow.tracking.client = ws, #2 mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri()), #3 mlflow.active_run(), #4 mlflow.log_metric
Question 27)
Choose from the list below the supervised learning problem type that usually outputs quantitative values.
- Classification
- Clustering
- Regression
Question 30)
Choose from the list below the cross-validation technique that belongs to the exhaustive type.
- K-fold cross-validation
- Holdout cross-validation
- Leave-one-out cross-validation
- Leave-p-out cross-validation
Question 31)
Your task is to clean up the deployments and terminate the “dev” ACI webservice by making use of the Azure ML SDK after your work with Azure Machine Learning has ended.
What is the most suitable method in order to achieve this goal?
- dev_webservice.terminate()
- dev_webservice.flush()
- dev_webservice.delete()
- dev_webservice.remove()
Question 32)
True or False?
The feature extraction layers use convolutional filters and pooling to emphasize the edges, corners, and other patterns that differentiate images.
This should work for any other set of images that share the same dimensions as the network input layer.
- True
- False
Question 33)
You want to enable Application Insights when configuring the service deployment at the moment you deploy a new real-time service.
By using the SDK, what code should you write to achieve this goal?
- dep_config = AciWebservice.deploy_configuration(cpu_cores = 1,
memory_gb = 1,
enable_app_insights=True)
Question 34)
You decided to use Azure Machine Learning and your goal is to train a Diabetes Model and build a container image for it.
You choose to make use of the scikit-learn ElasticNet linear regression model.
You want to use Azure Kubernetes Service (AKS) for the model deployment to production.
For deploying the model, you configured an AKS cluster.
At this point, you have deployed the image of the model to the desired AKS cluster.
After using different hyperparameters to train the new model, your goal is to deploy to the AKS cluster the new image of the model.
What code should you write for this task?
- prod_webservice.update(image=model_image_updated)
prod_webservice.wait_for_deployment(show_output = True)
Question 35)
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
- Box plot
- A violin plot
- Binary classification confusion matrix
- Gradient descent
Question 36)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person suffers from a disease. Your model classifies the case as having a disease”.
- True positives
- True negatives
- False positives
- False negatives
Question 37)
As a senior data scientist, you need to evaluate a binary classification machine learning model.
As evaluation metric, you have to use the precision. Considering this, which is the most appropriate visualization?
- Scatter plot
- Gradient descent
- Receiver Operating Characteristic (ROC) curve
- Violin plot
Question 38)
Your task is to create and evaluate a model. One of the metrics shows an absolute metric in the same unit as the label.
What is the metric described above?
- Mean Square Error (MSE)
- Coefficient of Determination (known as R-squared or R2)
- Root Mean Square Error (RMSE)
Question 39)
You have a Pandas DataFrame entitled df_sales that contains the sales data from each day. Your DataFrame contains these columns: year, month, day_of_month, sales_total. Which of the following code options should you choose if your goal is to return the average sales_total value?
- df_sales['sales_total'].mean()
- df_sales['sales_total'].avg()
- mean(df_sales['sales_total'])
Question 40)
In order to predict the price of a student’s craftwork, you have to rely on the following variables: the student’s length of education, degree type, and art form. You decide to set up a linear regression model that you will have to evaluate. Solution: Apply the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC.
Is this solution effective?
- Yes
- No
Question 41)
If you use the sklearn.metrics classification report for evaluating how your model performs, what result do you get from the F1-Score metric?
- Out of all of the instances of this class in the test dataset, how many did the model identify
- An average metric that takes both precision and recall into account.
- How many instances of this class are there in the test dataset
- Of the predictions the model made for this class, what proportion were correct
Question 42)
In order to create clusters, hierarchical clustering uses two methods.
What are the two methods used in this case?
- Divisive
- Distinctive
- Aggregational
- Agglomerative
Question 43)
What is the effect that you obtain if you increase the Learning Rate parameter for the deep neural network that you are creating?
- More hidden layers are added to the network
- More records are included in each batch passed through the network
- Larger adjustments are made to weight values during backpropagation
Question 44)
The layer described below is used to reduce the number of feature values that are extracted from images, while still retaining the key differentiating features.
- Flattening layer
- Pooling layer
- Convolutional layer
Question 45)
After installing the Azure Machine Learning Python SDK, you decide to use it to create a workspace entitled “aml-workspace” in your subscription.
What code should you write in Python for this task?
- from azureml.core import Workspace
ws = Workspace.create(name='aml-workspace',
subscription_id='123456-abc-123…',
resource_group='aml-resources',
create_resource_group=True,
location='eastus'
)
Question 46)
After installing the Azure Machine Learning CLI extension, you decide to use it to set up an ML workspace in your existing resource group.
What Azure CLI command should you choose for this task?
- az ml new workspace create -w 'aml-workspace' -g 'aml-resources'
- az ml workspace create -w 'aml-workspace' -g 'aml-resources'
- az ml ws create -w 'aml-workspace' -g 'aml-resources'
- new az ml workspace create -w 'aml-workspace' -g 'aml-resources'
Question 47)
What are the most appropriate SDK commands you should choose if you want to publish the pipeline that you created?
- published_pipeline = pipeline.publish(name='training_pipeline',
description='Model training pipeline',
version='1.0')
Question 48)
Choose from the options below the one that explains how values for hyperparameters are selected by random sampling.
- From a mix of discrete and continuous values
- It tries every possible combination of parameters in the search space
- It tries to select parameter combinations that will result in improved performance from the previous selection
Question 49)
What Python code should you write if your goal is to implement a median stopping policy?
- from azureml.train.hyperdrive import MedianStoppingPolicy
early_termination_policy = MedianStoppingPolicy(evaluation_interval=1,
delay_evaluation=5)
Question 50)
Your task is to enable the creation of an explanation in the experiment script. What packages should you install in the run environment in order to achieve this goal?
- azureml-contrib-interpret
- azureml-explainer
- azureml-interpret
- azureml-blackbox
Question 51)
You decided to preprocess and filter down only the relevant columns for your AirBnB housing dataframe.
The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.
In order to obtain the first initial from the host_name column, you have written the following function that you entitled firstInitialFunction:
def firstInitialFunction(name):
    return name[0]

firstInitialFunction("George")
Your goal is to use spark.udf.register to create a UDF from the function above, because you want to ensure that the UDF is created in the SQL namespace.
Considering this scenario, what code should you write?
- airbnbDF.createOrReplaceTempView("airbnbDF")
spark.udf.register("sql_udf", firstInitialFunction)
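Once registered this way, the UDF can be called from SQL. A minimal usage sketch, assuming a Databricks notebook where display is available:
display(spark.sql("SELECT id, host_name, sql_udf(host_name) AS first_initial FROM airbnbDF"))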
Question 52)
In order to track the runs of a Linear Regression model of your AirBnB dataset, you decide to use MLflow.
You want to make use of all the features included in your dataset.
At this point, you have created and logged the pipeline and you have logged the parameters.
You now have to create some predictions and metrics.
Considering this scenario, what code should you write?
- predDF = pipelineModel.transform(testDF)
regressionEvaluator = RegressionEvaluator(labelCol="price", predictionCol="prediction")
rmse = regressionEvaluator.setMetricName("rmse").evaluate(predDF)
r2 = regressionEvaluator.setMetricName("r2").evaluate(predDF)
Question 53)
You are using remote compute in Azure Machine Learning to run a training experiment.
The Conda environment used for the experiment includes both the mlflow and the azureml-contrib-run packages. In order to track the metrics that the experiment generates, you have to log them by using MLflow.
To give the correct answer, replace the bolded code comments with the suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
import numpy as np
#1 Import library to log metrics
#2 Start logging for this run
reg_rate = 0.01
#3 Log the reg_rate metric
#4 Stop logging for this run
- #1 import logging, #2 mlflow.start_run(), #3 mlflow.log_metric('..'), #4 run.complete()
- #1 import mlflow, #2 mlflow.start_run(), #3 logger.info('..'), #4 mlflow.end_run()
- #1 from azureml.core import Run, #2 run = Run.get_context(), #3 logger.info('..'), #4 run.complete()
- #1 import mlflow, #2 mlflow.start_run(), #3 mlflow.log_metric('..'), #4 mlflow.end_run()
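For reference, assembling the last (correct) option produces this runnable sketch; the metric name simply mirrors the variable above:
import numpy as np
import mlflow

mlflow.start_run()
reg_rate = 0.01
mlflow.log_metric('reg_rate', reg_rate)  # logged to the MLflow tracking store
mlflow.end_run()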
Question 54)
Choose from the list below all the options that are alternative names for qualitative variables.
- Numerical
- Continuous
- Discrete
- Categorical
Question 55)
Which of the visualization tools below can help you visualize quantiles and outliers?
- t-SNE
- Heat maps
- Box plots
- Q-Q plots
Question 56)
Your task is to extract the last run from the experiments list.
What code should you write in Python to achieve this?
- runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=1)
runs[0].data.metrics
Question 57)
Your task is to store in the Azure ML workspace a model that you trained in an experiment. You want to do this so that the model can be used by other experiments and services.
Considering this scenario, what action should you take to achieve the result?
- Save the model as a file in a compute instance
- Save the experiment script as a notebook
- Register the model in the workspace
- Save the model as a file in a Key Vault instance
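A minimal sketch of registering a trained model, with a hypothetical model name and output path:
from azureml.core import Model

# Registration makes the model available to other experiments and services.
model = Model.register(workspace=ws,
                       model_name='classification-model',  # hypothetical name
                       model_path='outputs/model.pkl')     # hypothetical path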
Question 58)
If you want to explore the hyperparameters of a model, knowing that every algorithm uses different hyperparameters for tuning, what is the most appropriate method you should choose?
- showParams()
- exploreParams()
- getParams()
- explainParams()
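A short usage sketch of the correct method, assuming a PySpark LinearRegression estimator:
from pyspark.ml.regression import LinearRegression

lr = LinearRegression()
# Prints every tunable hyperparameter with its description and default value.
print(lr.explainParams())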
Question 59)
You usually take the following steps when you use HorovodRunner to develop a distributed training program:
1. Configure a HorovodRunner instance that is initialized with the number of nodes.
2. Define a Horovod training method by using the methods described in Horovod usage, making sure that import statements are added inside the method.
What code should you write in Python to achieve this?
- hr = HorovodRunner(np=2)
def train():
    import tensorflow as tf
    hvd.init()
hr.run(train)
Question 60)
You create a machine learning model by using the Azure Machine Learning designer. You publish the model as a real-time service on an Azure Kubernetes Service (AKS) inference compute cluster. You make no change to the deployed endpoint configuration.
You need to provide application developers with the information they need to consume the endpoint. Which two values should you provide to application developers? Each correct answer presents part of the solution.
- The run ID of the inference pipeline experiment for the endpoint
- The key for the endpoint
- The URL of the endpoint
- The name of the AKS cluster where the endpoint is hosted
- The name of the inference pipeline for the endpoint
Question 61)
You decide to use a two-class logistic regression model for a binary classification. If you have to evaluate the results for imbalance issues, what would be the best evaluation metric for the model?
- Relative Absolute Error
- AUC Curve
- Relative Squared Error
- Mean Absolute Error
Question 62)
You decide to use GPU-based training to develop a deep learning model on the Azure Machine Learning service that is able to recognize images.
The context in which you have to configure the model needs to allow real-time GPU-based inferencing.
Considering that you have to set up compute resources for model inferencing, what is the most suitable compute type?
- Field Programmable Gate Array
- Azure Kubernetes Service
- Azure Container Instance
- Machine Learning Compute
Question 63)
Your task is to create and evaluate a model. You decide to use a specific metric whose value is directly proportional to how well the model fits.
Which evaluation metric is described above?
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
- Coefficient of Determination (known as R-squared or R2)
Question 64)
When you use the Support Vector Machine algorithm, what type of machine learning model can you train?
- Regression
- Clustering
- Classification
Question 65)
Your task is to set up an Azure Machine Learning workspace. You decide to use a laptop computer to create a local Python environment.
You want to ensure connection between the laptop and the workspace and you want to run experiments.
You start creating the config.json file below:
{ "workspace_name" : "ml-workspace" }
In order to interact with data and experiments in the workspace, you have to use the Azure Machine Learning SDK. Your config.json file has to be able to connect from the Python environment directly to the workspace. If you want to ensure the connection to the workspace, which two additional parameters should you add to the config.json file? Keep in mind that every correct answer presents a part of the solution.
- Region
- Subscription_id
- Login
- Resource_group
- Key
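For context, a completed config.json typically looks like the sketch below; the subscription id is a placeholder and the resource group name is hypothetical:
{
    "subscription_id": "<subscription-id>",
    "resource_group": "aml-resources",
    "workspace_name": "ml-workspace"
}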
Question 66)
You want to set up a new Azure subscription. The subscription doesn’t contain any resources.
Your goal is to create an Azure Machine Learning workspace.
Considering this scenario, which are three possible ways to obtain this result? Keep in mind that every correct answer presents a complete solution.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.get method with name, subscription_id, and resource_group parameters.
- Run Python code that uses the Azure ML SDK library and calls the Workspace.create method with name, subscription_id, resource_group, and location parameters.
- Use the Azure Command Line Interface (CLI) with the Azure Machine Learning extension to call the az group create function with --name and --location parameters, and then the az ml workspace create function, specifying -w and -g parameters for the workspace name and resource group.
- Use an Azure Resource Management template that includes a Microsoft.MachineLearningServices/ workspaces resource and its dependencies.
- Navigate to Azure Machine Learning studio and create a workspace.
Question 67)
You are in the process of training a machine learning model. Your model has to be configured for testing as a real-time inference service. The service must have low CPU utilization and use less than 48 MB of RAM. While keeping cost and administrative overhead to a minimum, you have to make sure that the compute target for the deployed service is initialized automatically.
In this scenario, what is the most appropriate compute target?
- Azure Machine Learning compute cluster
- attached Azure Databricks cluster
- Azure Container Instance (ACI)
- Azure Kubernetes Service (AKS) inference cluster
Question 68)
Yes or No?
In order to explain the model’s predictions, you have to calculate the importance of all the features, taking into account the overall global relative importance value, but also the measure of local importance for a certain set of predictions.
You decide to obtain the global and local feature importance values that you need by using an explainer.
Solution: Configure a PFIExplainer. Is this solution effective?
- Yes
- No
Question 69)
Yes or No?
You use a logistic regression algorithm to train your classification model. In order to explain the model’s predictions, you have to calculate the importance of all the features, taking into account the overall global relative importance value, but also the measure of local importance for a certain set of predictions.
You decide to obtain the global and local feature importance values that you need by using an explainer.
Solution: Configure a TabularExplainer. Is this solution effective?
- Yes
- No
Question 70)
If your goal is to use a configuration file in order to ensure connection to your Azure ML workspace, what Python command would be the most appropriate?
- from azureml.core import Workspace
ws = Workspace.from_config()
Question 71)
As a data scientist, you are asked to build a deep convolutional neural network (CNN) in order to classify images. Your CNN model seems to present some overfitting signs. Your goal is to minimize overfitting and to give an optimal fit to the model.
Considering this, what are the most appropriate two actions that you should take?
- Reduce the amount of training data
- Use training data augmentation
- Add an additional dense layer with 64 input units
- Add an additional dense layer with 512 input units
- Add L1/L2 regularization
Question 72)
If you want to visualize the environments that you registered in your workspace, what are the most appropriate SDK commands that you should choose?
- from azureml.core import Environment
env_names = Environment.list(workspace=ws)
for env_name in env_names:
print('Name:', env_name)
Question 73)
If you want to extract the parallel_run_step.txt file from the output of the step after the pipeline run has ended, what code should you choose?
- for root, dirs, files in os.walk('results'):
    for file in files:
        if file.endswith('parallel_run_step.txt'):
            result_file = os.path.join(root, file)
Question 74)
Your company uses a set of labeled photographs for the multi-class image classification deep learning model that it is creating.
During the summer, the software engineering team noticed a heavy inferencing load on the prediction web service. Although the production web service for the model has a fully utilized compute cluster for its deployment, it fails to meet demand.
While keeping the downtime and the administrative effort to a minimum, you have to improve the performance of the image classification web service. Considering this, what action do you recommend the IT Operations team to take?
- Increase the minimum node count of the compute cluster where the web service is deployed.
- Increase the node count of the compute cluster where the web service is deployed.
- Increase the VM size of nodes in the compute cluster where the web service is deployed.
- Create a new compute cluster by using larger VM sizes for the nodes, redeploy the web service to that cluster, and update the DNS registration for the service endpoint to point to the new cluster.
Question 75)
Your task is to train a binary classification model so that it is able to target the correct subjects in a marketing campaign.
What actions should you take if you want to ensure that your model is fair and will not be inclined to ethnic discrimination?
- Evaluate each trained model with a validation dataset, and use the model with the highest accuracy score. An accurate model is inherently fair.
- Compare disparity between selection rates and performance metrics across ethnicities.
- Remove the ethnicity feature from the training dataset.
Question 76)
You decided to preprocess and filter down only the relevant columns for your AirBnB housing dataframe.
The columns that you kept are: id, host_name, bedrooms, neighbourhood_cleansed, price.
In order to obtain the first initial from the host_name column, you have written the following function that you entitled firstInitialFunction:
def firstInitialFunction(name):
    return name[0]

firstInitialFunction("George")
Considering that Python UDFs are much slower than Scala UDFs, your goal is to create a Vectorized UDF in Python in order to speed up the computation.
In this scenario, what code should you write?
- from pyspark.sql.functions import pandas_udf
@pandas_udf("string")
def vectorizedUDF(name):
    return name.str[0]
Question 77)
You decided to use the AirBnB Housing dataset and the Linear Regression algorithm for which you want to tune the Hyperparameters.
At this point, you have executed a train/test split on the Boston dataset and built a pipeline for the linear regression.
You now want to test the maximum number of iterations by using ParamGridBuilder(), regardless of whether you fit an intercept with the y axis or whether you standardize the features.
Considering this scenario, what code should you write?
- from pyspark.ml.tuning import ParamGridBuilder
paramGrid = (ParamGridBuilder()
.addGrid(lr.maxIter, [1, 10, 100])
.addGrid(lr.fitIntercept, [True, False])
.addGrid(lr.standardization, [True, False])
.build()
)
Question 78)
You want to deploy a deep learning model in your Azure Container Instance.
In order to call the model API, you have to use the Azure Machine Learning SDK.
To invoke the deployed model, you have to use native SDK classes and methods.
To give the correct answer, replace the bolded code comments with the suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
from azureml.core import Workspace
#1st code option
import json
ws = Workspace.from_config()
service_name = "mlmodel1-service"
service = Webservice(name=service_name, workspace=ws)
x_new = [[2, 101.5, 1, 24, 21], [1, 89.7, 4, 41, 21]]
input_json = json.dumps({"data": x_new})
#2nd code option
- from azureml.core.webservice import LocalWebservice, predictions = service.run(input_json)
- from azureml.core.webservice import Webservice, predictions = service.deserialize(ws, input_json)
- from azureml.core.webservice import Webservice, predictions = service.run(input_json)
- from azureml.core.webservice import requests, predictions = service.run(input_json)
Question 79)
Your hyperparameter tuning needs to have a search space defined. The values of the batch_size hyperparameter can be 128, 256, or 512 and the normal distribution values for the learning_rate hyperparameter can have a mean of 10 and a standard deviation of 3.
What Python code should you write in order to achieve this goal?
- from azureml.train.hyperdrive import choice, normal
param_space = {
    '--batch_size': choice(128, 256, 512),
    '--learning_rate': normal(10, 3)
}
Question 80)
You intend to use the Hyperdrive feature of Azure Machine Learning to determine the optimal hyperparameter values when training a model.
You need to use Hyperdrive to try combinations of the following hyperparameter values:
— learning_rate: any value between 0.001 and 0.1
— batch_size: 16, 32, or 64
You must configure the search space for the Hyperdrive experiment.
Which two parameter expressions should you use? Each correct answer presents part of the solution.
- A uniform expression for learning_rate
- A normal expression for batch_size
- A choice expression for learning_rate
- A choice expression for batch_size
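Putting the two correct expressions together, a minimal search-space sketch might look like this (argument names are illustrative):
from azureml.train.hyperdrive import choice, uniform

param_space = {
    '--learning_rate': uniform(0.001, 0.1),  # continuous range -> uniform
    '--batch_size': choice(16, 32, 64)       # discrete values -> choice
}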
Question 81)
You create a machine learning model by using the Azure Machine Learning designer. You publish the model as a real-time service on an Azure Kubernetes Service (AKS) inference compute cluster. You make no change to the deployed endpoint configuration.
You need to provide application developers with the information they need to consume the endpoint. Which two values should you provide to application developers? Each correct answer presents part of the solution.
- The key for the endpoint
- The name of the AKS cluster where the endpoint is hosted
- The run ID of the inference pipeline experiment for the endpoint
- The name of the inference pipeline for the endpoint
- The URL of the endpoint
Question 82)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person suffers from a disease. Your model classifies the case as having no disease”.
- True negatives
- True positives
- False negatives
- False positives
Question 83)
You have a dataset that can be used for multiclass classification tasks. The dataset provided contains a normalized numerical feature set with 20,000 data points and 300 features. For training purposes, you need 75 percent of the data points and for testing purposes you need 25 percent.
You create the following Python data frames:
- X_train: training feature set
- Y_train: training class labels
- x_test: testing feature set
- y_test: testing class labels
Your goal is to use the Principal Component Analysis (PCA) method to reduce the dimensionality of the feature set to 20 features in both the training and testing sets.
You decide to apply in Python the scikit-learn machine learning library.
You mark with X the feature set and with Y the class labels.
Your Python data frames include the below code segment:
from sklearn.decomposition import PCA
pca = […]
x_train = […].fit_transform(X_train)
x_test = pca.[…]
How would you complete the missing brackets for the code snippet presented?
- Box1: PCA(n_components=20);
Box2: pca;
Box3: transform(x_test)
Question 84)
What is the result of multiplying a NumPy array by 3?
- The new array will be 3 times longer, with the sequence repeated 3 times and also all the elements are multiplied by 3.
- The new array will be 3 times longer, with the sequence repeated 3 times.
- Array stays the same size, but each element is multiplied by 3.
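A quick demonstration of the element-wise behavior:
import numpy as np

data = np.array([1, 2, 3])
print(data * 3)  # [3 6 9] -- same size, each element multiplied by 3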
Question 85)
Which of the layer types described below is a principal layer that extracts important features from images by applying a filter to them?
- Pooling layer
- Flattening layer
- Convolutional layer
Question 86)
In order to register a datastore in a Machine Learning services workspace, one of your coworkers decides to use the code below:
Datastore.register_azure_blob_container(workspace=ws,
    datastore_name='demo_datastore',
    container_name='demo_datacontainer',
    account_name='demo_account',
    account_key='0A0A0A-0A00A0A-0A0A0A0A0A0',
    create_if_not_exists=True)
You want to be able to access the datastore by using a notebook. If you want to achieve this goal, what code should you write for completing the following snippet segment?
import azureml.core
from azureml.core import Workspace, Datastore
ws = Workspace.from_config()
datastore = <add answer here>.get(<add answer here>, '<add answer here>')
- Experiment, run, demo_account
- Run, experiment, demo_datastore
- DataStore, ws, demo_datastore
- Run, ws, demo_datastore
Question 87)
You decide to use the code below for the deployment of a model as an Azure Machine Learning real-time web service:
# ws, model, inference_config, and deployment_config defined previously
service = Model.deploy(ws, 'classification-service', [model], inference_config, deployment_config)
service.wait_for_deployment(True)
Your deployment does not succeed.
You have to troubleshoot the deployment failure in order to determine what actions were taken while deploying and to identify the one action that encountered a problem and didn’t succeed.
For this scenario, which of the following code snippets should you use?
- service.update_deployment_state()
- service.state
- service.serialize()
- service.get_logs()
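A minimal usage sketch, assuming the service object from the scenario above; each log line reflects a deployment action, so the failing one can be identified:
print(service.get_logs())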
Question 88)
You want to use your registered model in a batch inference pipeline.
Your batch inference pipeline has to use a ParallelRunStep step to process files in a file dataset. The script runs in the ParallelRunStep step, and each time the inferencing function is called, the run needs to be able to process six input files.
You have to set up the pipeline. What configuration setting needs to be specified in the ParallelRunConfig object for the ParallelRunStep step?
- process_count_per_node= “6”
- mini_batch_size= “6”
- error_threshold= “6”
- node_count= “6”
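A hedged sketch of the configuration, assuming hypothetical script, environment, and compute names; for a file dataset, mini_batch_size controls how many files each run() call receives:
from azureml.pipeline.steps import ParallelRunConfig

parallel_run_config = ParallelRunConfig(
    source_directory='scripts',          # hypothetical folder
    entry_script='batch_scoring.py',     # hypothetical script
    mini_batch_size='6',                 # six input files per run() call
    error_threshold=10,
    output_action='append_row',
    environment=batch_env,               # assumed predefined Environment
    compute_target=compute_cluster,      # assumed predefined compute target
    node_count=2)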
Question 89)
What object needs to be defined if your task is to create a schedule for your pipeline?
- ScheduleSync
- ScheduleConfig
- ScheduleTimer
- ScheduleRecurrence
Question 90)
You can combine Bayesian sampling with an early-termination policy, and you can use it only with these three parameter expressions: choice, uniform, and quniform.
- False
- True
Question 91)
If you want to minimize disparity in the selection rate across sensitive feature groups, what is the most suitable parity constraint that you should choose to use with any of the mitigation algorithms?
- Demographic parity
- Equalized odds
- Error rate parity
- Bounded group loss
Question 92)
Your task is to backfill a dataset monitor for the previous 5 months, based on changes made to the data on a monthly basis.
What code should you write in the SDK to achieve this goal?
- import datetime as dt
# timedelta has no months argument, so 5 months is approximated as 150 days
backfill = monitor.backfill(dt.datetime.now() - dt.timedelta(days=150), dt.datetime.now())
Question 93)
You discover a median value for a number of variables in your Boston Housing dataset, variables like the number of rooms, per capita crime, and the economic status of residents.
Based on the average number of rooms, you want to be able to predict the median home value by using Linear Regression.
You decided to use VectorAssembler to import the dataset and to create a column entitled features that includes a single input variable entitled rm.
At this moment, you have to fit the Linear Regression model.
Considering this scenario, what code should you write?
- from pyspark.ml.regression import LinearRegression
lr = LinearRegression(featuresCol="features", labelCol="medv")
lrModel = lr.fit(bostonFeaturizedDF)
Question 93)
You can use the MlflowClient object as a pathway to query previous runs in a programmatic manner.
What code should you write in Python to achieve this?
- from mlflow.tracking import MlflowClient
client = MlflowClient()
client.list_experiments()
Question 94)
You decided to use Azure Machine Learning to create machine learning models. You want to use multiple compute contexts to train and score models.
Moreover, you want to use Azure Databricks cluster to train models.
Considering this scenario, what compute type is the most suitable to use for Azure Databricks?
- Compute cluster
- Attached compute
- Inference cluster
Question 95)
For your experiment in Azure Machine Learning you decide to run the following code:
from azureml.core import Workspace, Experiment, Run
from azureml.core import RunConfig, ScriptRunConfig
ws = Workspace.from_config()
run_config = RunConfiguration()
run_config.target='local'
script_config = ScriptRunConfig(source_directory='./script', script='experiment.py', run_config=run_config)
experiment = Experiment(workspace=ws, name='script experiment')
run = experiment.submit(config=script_config)
run.wait_for_completion()
The experiment run generates several output files that need identification.
In order to retrieve the output file names, you must write some code. Which of the following code snippets should you choose to complete the script?
- files = run.get_details_with_logs()
- files = run.get_properties()
- files = run.get_file_names()
- files = run.get_metrics()
Question 96)
Choose from the descriptions below the one that explains what a negative correlation of -1 means.
- There is no association between the variables
- For each unit increase in one variable, the same decrease is seen in the other
- For each unit increase in one variable, the same increase is seen in the other
Question 97)
If you want to string together all the different possible hyperparameters that you need for testing, what is the most suitable PySpark class method you should choose?
- ParamGridSearch()
- ParamSearch()
- ParamGridBuilder()
- ParamBuilder()
Question 98)
What Python command should you choose in order to view the models previously registered in the Azure ML studio by using the Model object?
- from azureml.core import Model
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
Question 99)
The DataFrame you are currently working on contains data regarding the daily sales of ice cream. In order to compare the avg_temp and units_sold columns you decided to use the corr method which returned a result of 0.95.
What information can you read from this result?
- The units_sold value is, on average, 95% of the avg_temp value
- Days with high avg_temp values tend to coincide with days that have high units_sold values
- On the day with the maximum units_sold value, the avg_temp value was 0.95
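A small sketch with hypothetical numbers showing a strong positive correlation:
import pandas as pd

df = pd.DataFrame({'avg_temp': [20, 25, 30, 35],
                   'units_sold': [100, 140, 180, 230]})
# A value near +1 means hot days tend to coincide with high sales.
print(df['avg_temp'].corr(df['units_sold']))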
Question 100)
You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: You use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Relative Squared Error, and the Coefficient of Determination.
Does the solution meet the goal?
- Yes
- No
Question 101)
What is the result of multiplying a list by 3?
- The new list remains the same size, but the elements are multiplied by 3.
- The new list created has the length 3 times the original length with the sequence repeated 3 times.
- The new list created has the length 3 times the original length with the sequence repeated 3 times and also all the elements are also multiplied by 3.
Question 102)
If you multiply a list and a NumPy array by 2, what results would you get?
- Multiplying a NumPy array by 2 creates a new array 2 times the length with the original sequence repeated 2 times.
- Multiplying a list by 2 performs an element-wise calculation on the list, which sees the list stay the same size, but each element has been multiplied by 2.
- Multiplying a list by 2 creates a new list 2 times the length with the original sequence repeated 2 times.
- Multiplying a NumPy array by 2 performs an element-wise calculation on the array, which sees the array stay the same size, but each element has been multiplied by 2.
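A side-by-side sketch of the two behaviors:
import numpy as np

my_list = [1, 2, 3]
print(my_list * 2)   # [1, 2, 3, 1, 2, 3] -- the sequence is repeated

my_array = np.array([1, 2, 3])
print(my_array * 2)  # [2 4 6] -- element-wise multiplication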
Question 103)
You decided to use the LinearRegression class from the scikit-learn library to create your model object.
If you want to train the model, what should your next step be?
- Call the score() method of the model object, specifying the training feature and test feature arrays
- Call the fit() method of the model object, specifying the training feature and label arrays
- Call the predict() method of the model object, specifying the training feature and label arrays
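A minimal sketch of the correct next step, using hypothetical training data:
from sklearn.linear_model import LinearRegression
import numpy as np

X_train = np.array([[1.0], [2.0], [3.0]])  # hypothetical feature array
y_train = np.array([2.0, 4.0, 6.0])        # hypothetical label array

model = LinearRegression()
model.fit(X_train, y_train)    # training happens here
print(model.predict([[4.0]]))  # approximately [8.]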
Question 104)
In order to define a pipeline with multiple steps, you decide to use the Azure Machine Learning Python SDK. You notice that some steps of the pipeline do not run. Instead of running the steps, the pipeline uses a cached output from a previous run. Your task is to make sure that the pipeline runs every step, even when the parameters and contents of the source directory are the same as the ones from the previous run.
From the following list, which two ways are able to return the expected result? Keep in mind that every correct answer presents a complete solution.
- Restart the compute cluster where the pipeline experiment is configured to run.
- Set the outputs property of each step in the pipeline to True.
- Use a PipelineData object that references a datastore other than the default datastore.
- Set the regenerate_outputs property of the pipeline to True.
- Set the allow_reuse property of each step in the pipeline to False.
Question 105)
If you want to extract a dataset after its registration, what are the most suitable methods you should choose from the Dataset class?
- find_by_id
- get_by_id
- find_by_name
- get_by_name
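A short usage sketch of the two correct methods, assuming a dataset registered under a hypothetical name:
from azureml.core import Dataset

ds = Dataset.get_by_name(workspace=ws, name='csv_table')  # hypothetical name
# Alternatively, retrieve it by its unique id:
# ds = Dataset.get_by_id(workspace=ws, id=ds.id)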
Question 106)
The Precision and Recall metrics are derived from four possible prediction outcomes.
What is the outcome in the scenario where the predicted label is 1, but the actual label is 0?
- False Positive
- False Negative
- True Positive
- True Negative
Question 107)
Your task is to train a model for the financial department by using data in an Azure Storage blob container entitled finance-data.
The container has to be registered in an Azure Machine Learning workspace as a datastore, and you have to make sure that an error will be raised if the container does not exist.
Considering this scenario, what should be the continuation for the code below?
datastore = Datastore.<add answer here>(workspace = ws,
    datastore_name = 'finance_datastore',
    container_name = 'finance-data',
    account_name = 'fintrainingdatastorage',
    account_key = 'FdhIWHDaiwh2…',
    <add answer here>)
- register_azure_blob_container, overwrite = True
- register_azure_blob_container, create_if_not_exists = False
- register_azure_data_lake, create_if_not_exists = False
- register_azure_data_lake, overwrite = False
Question 108)
True or False?
Before publishing, a pipeline needs to have its parameters defined.
- True
- False
Question 109)
In the process of exploratory data analysis, when you want to find out the number of observations in the dataset, which of the options listed below can show whether you have missing values?
- Mean
- Standard deviation
- Count
Question 110)
True or False?
Petastorm uses a Vector, and not an Array, as its input.
- True
- False
Question 111)
You decided to use Parquet files and Petastorm to train a distributed neural network by using Horovod.
Your housing prices dataset from California is entitled cal_housing.
In order to concatenate the features and labels of the model after you loaded the data, you configured a Spark DataFrame from the Pandas DataFrame.
At this point, you want to set up Dense Vectors for the features.
What code should you write in Python to achieve this?
- from pyspark.ml.feature import VectorAssembler
vecAssembler = VectorAssembler(inputCols=cal_housing.feature_names, outputCol="features")
vecTrainDF = vecAssembler.transform(trainDF).select("features", "label")
display(vecTrainDF)
Question 112)
Python is commonly known for providing extensive functionality through powerful statistical and numerical libraries. What is Scikit-learn used for?
- Providing attractive data visualizations
- Offering simple and effective predictive data analysis
- Supplying machine learning and deep learning capabilities
- Analyzing and manipulating data
Question 113)
Choose from the list below the evaluation metric that is described as a relative metric where the higher the value, the better the fit of the model.
- Coefficient of Determination (known as R-squared or R2)
- Root Mean Square Error (RMSE)
- Mean Square Error (MSE)
Question 114)
The K-Means clustering algorithm can be associated with which of the following machine learning types?
- Supervised machine learning
- Unsupervised machine learning
- Reinforcement learning
Question 115)
In order to train a K-Means clustering model that groups observations into four clusters, you decide to use the scikit-learn library. Considering this scenario, what method should you choose to create the K-Means object?
- model = KMeans(n_clusters=4)
- model = Kmeans(n_init=4)
- model = Kmeans(max_iter=4)
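A runnable sketch of the correct option, with hypothetical observations:
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 1], [1, 2], [5, 5], [5, 6], [9, 9], [9, 8], [3, 9], [2, 8]])

model = KMeans(n_clusters=4, n_init=10)
print(model.fit_predict(X))  # a cluster index (0-3) for each observation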
Question 116)
Your task is to reduce the size of the feature maps that a convolutional layer generates when you create a convolutional neural network. What action should you take in this case?
- Increase the number of filters in the convolutional layer
- Reduce the size of the filter kernel used in the convolutional layer
- Add a pooling layer after the convolutional layer
Question 117)
You want to create a pipeline for which you defined three steps named step1, step2, and step3.
Your goal is to run the pipeline as an experiment after the steps have been assigned to it.
Which of the following SDK commands should you choose for this task?
- train_pipeline = Pipeline(workspace = ws, steps = [step1,step2,step3])
experiment = Experiment(workspace = ws, name = 'training-pipeline')
pipeline_run = experiment.submit(train_pipeline)
Question 118)
What code should you write for an instance of a TabularExplainer if you have a model entitled loan_model?
- from interpret.ext.blackbox import TabularExplainer
tab_explainer = TabularExplainer(model=loan_model,
    initialization_examples=X_test,
    features=['loan_amount', 'income', 'age', 'marital_status'],
    classes=['reject', 'approve'])
Question 119)
Which of the non-exhaustive cross validation techniques listed below enables you to assign data points in a random way to the training set and the test set?
- Holdout cross-validation
- Repeated random sub-sampling validation
- K-fold cross-validation
Question 120)
Which of the following are the default authentication methods for ACI services and AKS services?
- Disabled for AKS services
- Token-based for ACI services
- Token-based for AKS services.
- Disabled for ACI services
- Key-based for AKS services
Question 121)
Your task is to predict if a person suffers from a disease by setting up a binary classification model. Your solution needs to be able to detect the classification errors that may appear.
Considering the below description, which of the following would be the best error type?
“A person does not suffer from a disease. Your model classifies the case as having no disease”.
- True negatives
- False negatives
- False positives
- True positives
Question 122)
Your task is to deploy your service on an AKS cluster that is set up as a compute target.
What SDK commands are able to return you the expected result?
- from azureml.core.compute import ComputeTarget, AksCompute
cluster_name = 'aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='eastus')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
production_cluster.wait_for_completion(show_output=True)
Question 123)
You configured a deep neural network to train a classification model that predicts to which of four classes an observation belongs, based on 8 numeric features.
From the list below, which statement about the network architecture is true?
- The network layer should contain four hidden layers
- The input layer should contain four nodes
- The output layer should contain four nodes
Question 124)
You can use the Azure ML SDK to update web services that are already deployed and to enable Application Insights.
What code should you write to achieve this?
- service = ws.webservices['my-svc']
service.update(enable_app_insights=True)
Question 125)
How should the following sentence be completed?
Decision tree algorithms are one example of the machine learning […] model type.
- Regression
- Clustering
- Classification
Question 126)
What Python code should you write if your goal is to extract the primary metric for a regression task?
- from azureml.train.automl.utilities import get_primary_metrics
get_primary_metrics('regression')
Question 127)
Your task is to extract local feature importance from a TabularExplainer.
What code should you write in the SDK to achieve this goal?
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_ranked_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()
Question 128)
If you want to list the generated files after your experiment run is completed, what is the most suitable Run object method you should choose?
- download_files
- download_file
- get_file_names
- list_file_names
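A brief usage sketch, assuming run is the completed Run object:
for file_name in run.get_file_names():
    print(file_name)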
Question 129)
In order to train models, you decide to use an Azure Machine Learning compute resource. You set up the compute resource in the following manner: – Minimum nodes: 1 – Maximum nodes: 5. You have to decrease the minimum number of nodes and to increase the maximum number of nodes to the following values: – Minimum nodes: 0 – Maximum nodes: 8
Considering that you have to reconfigure the compute resource, which three ways are able to return the expected result? Keep in mind that every correct answer presents a complete solution.
- Run the refresh_state() method of the BatchCompute class in the Python SDK.
- Use the Azure Machine Learning studio.
- Use the Azure portal.
- Use the Azure Machine Learning designer.
- Run the update method of the AmlCompute class in the Python SDK.
Question 130)
You are using an Azure Machine Learning service for your data science project. In order to deploy the project, you have to choose a compute target. For this scenario, which of the following Azure services is the most suitable?
- Apache Spark for HDInsight
- Azure Databricks
- Azure Data Lake Analytics
- Azure Container Instances
Question 131)
What code should you write using SDK if your goal is to extract the best run and its model?
- best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
for metric_name in best_run_metrics:
    metric = best_run_metrics[metric_name]
    print(metric_name, metric)
Question 132)
You published a parametrized pipeline and you now want to be able to pass parameter values in the JSON payload for the REST interface.
What SDK commands are the most appropriate to achieve your goal?
- response = requests.post(rest_endpoint,
    headers=auth_header,
    json={"ExperimentName": "run_training_pipeline",
          "ParameterAssignments": {"reg_rate": 0.1}})
Question 133)
In order to do a multi-class classification using an unbalanced training dataset, you have to apply C-Support Vector classification. You use the following Python code for the C-Support Vector classification:
from sklearn.svm import SVC
import numpy as np
svc = SVC(kernel='linear', class_weight='balanced', C=1.0, random_state=0)
model1 = svc.fit(X_train, y)
Considering that your task is to evaluate the C-Support Vector classification code, what is the most appropriate evaluation statement?
- class_weight=balanced: Automatically adjust weights inversely proportional to class frequencies in the input data
C parameter: Penalty parameter
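A self-contained sketch of the same call, with hypothetical training data, for anyone who wants to run it:
from sklearn.svm import SVC
import numpy as np

X_train = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [3.0, 2.0]])  # hypothetical
y = np.array([0, 1, 1, 2])                                            # hypothetical

# class_weight='balanced' weights classes inversely to their frequency;
# C is the penalty (regularization) parameter.
svc = SVC(kernel='linear', class_weight='balanced', C=1.0, random_state=0)
model1 = svc.fit(X_train, y)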
Question 134)
Room type is one of the categorical variables of your AirBnB dataset.
You have three room types, as follows: private room, entire home/apt, and shared room.
Because you have encoded every unique string into a number, every room type is assigned a unique numerical value.
In order for the machine learning algorithm to treat each category independently, you have to one-hot encode every one of the values to a location in an array.
What code should you write to achieve this goal?
- from pyspark.ml.feature import OneHotEncoder
encoder = OneHotEncoder(inputCols=["room_type_index"], outputCols=["encoded_room_type"])
encoderModel = encoder.fit(indexedDF)
encodedDF = encoderModel.transform(indexedDF)
display(encodedDF)
Question 135)
You decided to use Parquet files and Petastorm to train a distributed neural network by using Horovod.
Your housing prices dataset from California is entitled cal_housing.
In order to concatenate the features and labels of the model after you load the data, you decide to configure a Spark DataFrame from the Pandas DataFrame.
What code should you write in Python to achieve this?
- data = pd.concat([pd.DataFrame(X_train, columns=cal_housing.feature_names), pd.DataFrame(y_train, columns=["label"])], axis=1)
trainDF = spark.createDataFrame(data)
display(trainDF)
Question 136)
You decide to use Azure Machine Learning designer for your real-time service endpoint. You can make use of only one Azure Machine Learning service compute resource.
You start training the model and preparing the real-time pipeline for deployment.
If you want to obtain a web service by publishing the inference pipeline, what is the most suitable compute type?
- a new Machine Learning Compute resource
- Azure Kubernetes Services
- the existing Machine Learning Compute resource
- Azure Databricks
- HDInsight
Question 137)
Python is commonly known for providing extensive functionality through powerful statistical and numerical libraries. What is TensorFlow used for?
- Analyzing and manipulating data
- Offering simple and effective predictive data analysis
- Supplying machine learning and deep learning capabilities
- Providing attractive data visualizations
Question 138)
In order to find all the runs for a specific experiment, you can also use the search_runs method.
What code should you write in Python to achieve this?
- experiment_id = run.info.experiment_id
runs_df = mlflow.search_runs(experiment_id)
display(runs_df)
Question 139)
You’re using the Azure Machine Learning Python SDK to define a pipeline to train a model.
The data used to train the model is read from a folder in a datastore.
You need to ensure the pipeline runs automatically whenever the data in the folder changes.
What should you do?
- Create a ScheduleRecurrence object with a Frequency of auto. Use the object to create a schedule for the pipeline
- Create a Schedule for the pipeline. Specify the datastore in the datastore property, and the folder containing the training data in the path_on_datastore property
- Create a PipelineParameter with a default value that references the location where the training data is stored
- Set the regenerate_outputs property of the pipeline to True
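A sketch of the correct option, assuming a previously published pipeline and a registered datastore; the schedule fires whenever files under path_on_datastore change:
from azureml.pipeline.core import Schedule

reactive_schedule = Schedule.create(ws,
    name='training-datastore-schedule',   # hypothetical name
    description='Run on data changes',
    pipeline_id=published_pipeline.id,    # assumed published pipeline
    experiment_name='training-pipeline',
    datastore=training_datastore,         # assumed registered datastore
    path_on_datastore='data/training')    # hypothetical folder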
Question 140)
You have a set of CSV files that contain sales records. All of your CSV files follow an identical data schema.
The sales records for a certain month are held in one of the CSV files, and the filename is sales.csv. For every file there is a corresponding storage folder that shows the month and the year when the data was recorded. In an Azure Machine Learning workspace, a datastore has been set up for the folders kept in an Azure blob container. The parent folder entitled sales contains the folders organized to create the hierarchical structure below:
/sales
/01-2019
/sales.csv
/02-2019
/sales.csv
/03-2019
/sales.csv
…
Every time a month ends, a new folder with that month's sales is added to the sales folder. You want to train a machine learning model by using the sales data, while complying with the requirements below:
– All of your sales data to date have to be loaded by a dataset, into a structure that enables easy conversion to a dataframe.
– You have to ensure that experiments can be done by using only the data created until a specific previous month, disregarding any data added after the month selected.
– You have to keep the number of registered datasets to the minimum possible.
Considering that the sales data have to be registered as a dataset in the Azure Machine Learning service workspace, what actions should you take?
- Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
- Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
- Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
- Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
Question 141)
Your task is to extract the last run from the experiments list.
What code should you write in Python to achieve this?
- runs = client.search_runs(experiment_id, order_by=["attributes.start_time desc"], max_results=1)
runs[0].data.metrics
Question 142)
You decided to use the from_files method of the Dataset.File class to configure a file dataset.
You then want to register the file dataset with the name img_files in a workspace.
What SDK commands should you choose for this task?
- from azureml.core import Dataset
blob_ds = ws.get_default_datastore()
file_ds = Dataset.File.from_files(path=(blob_ds, 'data/files/images/*.jpg'))
file_ds = file_ds.register(workspace=ws, name='img_files')
Question 143)
Your task is to extract local feature importance from a TabularExplainer.
What code should you write in the SDK to achieve this goal?
- local_tab_explanation = tab_explainer.explain_local(X_test[0:5])
local_tab_features = local_tab_explanation.get_ranked_local_names()
local_tab_importance = local_tab_explanation.get_ranked_local_values()
Question 144)
If you want to use the from_delimited_files method of the Dataset.Tabular class to configure and register a tabular dataset, what are the most appropriate Python commands?
- from azureml.core import Dataset
blob_ds = ws.get_default_datastore()
csv_paths = [(blob_ds, 'data/files/current_data.csv'),
             (blob_ds, 'data/files/archive/*.csv')]
tab_ds = Dataset.Tabular.from_delimited_files(path=csv_paths)
tab_ds = tab_ds.register(workspace=ws, name='csv_table')
Question 145)
You want to evaluate a Python NumPy array that has six data points with the following definition: data = [10, 20, 30, 40, 50, 60]
Your task is to use the k-fold algorithm implementation in the Python Scikit-learn machine learning library to generate the output that follows:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
In order to generate the output, you have to implement a cross-validation.
To give the correct answer, replace the bolded code comments with the suitable code options that you find in the answer area.
Considering this, what snippet should you choose to complete the code?
from numpy import array
from sklearn.model_selection import #1st option
data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=#2nd option, shuffle=True, random_state=1)
for train, test in kfold.split(#3rd option):
    print('train: %s, test: %s' % (data[train], data[test]))
- K-means, 6, array
- K-fold, 3, array
- CrossValidation, 3, data
- K-fold, 3, data
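Filling in the correct options (KFold, 3, data) gives this runnable snippet, which reproduces the output shown in the question:
from numpy import array
from sklearn.model_selection import KFold

data = array([10, 20, 30, 40, 50, 60])
kfold = KFold(n_splits=3, shuffle=True, random_state=1)
for train, test in kfold.split(data):
    print('train: %s, test: %s' % (data[train], data[test]))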
Review
I recently completed the “Prepare for DP-100: Data Science on Microsoft Azure Exam” course on Coursera, and it’s the perfect capstone to the specialization. Covering six modules, the course revisits all the critical topics—like setting up data science environments, managing resources, running experiments, training predictive models, and deploying them into production.
What really stood out was the focus on practical exam strategies: the practice tests closely mirror the structure of the real DP-100 exam, and the tips provided for exam day preparation are extremely valuable. Beyond technical review, the course also offers guidance on career pathways and next steps after certification.
If you’ve gone through the full specialization, this final course ties everything together and ensures you’re not just ready to pass but to apply these skills professionally. It’s a must-take for anyone aiming to successfully earn their Azure Data Scientist Associate certification.