
Prepare for DP-203: Data Engineering on Microsoft Azure Exam Quiz Answers

In this article, I am going to share the Coursera course Prepare for DP-203: Data Engineering on Microsoft Azure Exam quiz answers with you.

Enrol Link:  DP-203: Data Engineering on Microsoft Azure

Full Practice Exam Quiz Answers

Question 1)
Scenario: While working on a project, the need arises to develop T-SQL scripts and notebooks in Azure Synapse Analytics.

Which of the following may be used to accomplish this?

  • Data Lake
  • Databricks
  • Azure Portal
  • Azure Synapse Studio
  • DevTest Labs

 

Question 2)
Which of the following can be accessed in Azure Synapse Studio, from the Monitor hub? Select all options that apply

  • Pipeline runs
  • Integration runtimes
  • SQL requests
  • Apache Spark jobs
  • Trigger runs
  • Data flow debug

 

Question 3)
Which component of Azure Synapse Analytics allows the different engines to share the databases and tables between Spark pools and the SQL on-demand engine?

  • Azure Data Explorer
  • Azure Synapse Studio
  • Azure Synapse Link
  • Azure Synapse Spark pools
  • None of the listed options
  • Azure Synapse Pipeline

 

Question 4)
Whilst Azure Synapse Analytics is used for the storage of data for analytical purposes, SQL Pools support the use of transactions and adhere to the ACID (Atomicity, Consistency, Isolation, and Durability) transaction principles associated with relational database management systems.

As such, locking and blocking mechanisms are put in place to maintain transactional integrity while providing adequate workload concurrency. These blocking aspects may significantly delay the completion of queries.

In this scenario you need to improve the response time while completing queries. You have the option to adjust the setting for the READ_COMMITTED_SNAPSHOT database option for a user database when connected to the master database. Which setting would you select?

  • READ_COMMITTED_SNAPSHOT is not the correct setting to adjust.
  • None of the listed options.
  • Turn OFF the READ_COMMITTED_SNAPSHOT database option.
  • Turn On the READ_COMMITTED_SNAPSHOT database option.

 

Question 5)
Scenario: You are working on an Azure Synapse Analytics Workspace as part of your project. One of the requirements is to have Azure Synapse Analytics Workspace access an Azure Data Lake Store using the benefits of the security provided by Azure Active Directory.

Which is the best authentication method to use?

  • Managed identities
  • SQL Authentication
  • Shared access signatures
  • Storage account keys

 

Question 6)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Synapse Analytics is an integrated analytics platform, which combines data warehousing, big data analytics, data integration, and visualization into a single environment. Azure Synapse Analytics empowers users of all abilities to gain access and quick insights across all their data, enabling a whole new level of performance and scale.
Descriptive analytics answers the question [?].

  • “What is happening in my business?”
  • “What is likely to happen in the future based on previous trends and patterns?”
  • “When will the modification made meet my goals?”
  • “Why is it happening?”

 

Question 7)
Which of the following are facets of Azure Databricks security? Select all options that apply

  • Compliance
  • Vault
  • IAM/Auth
  • Data Protection
  • Network
  • Load Balancing

 

Question 8)
Which of the following solutions matches the description in the statement below?

“A managed Spark as a Service proprietary solution that provides an end-to-end data engineering/data science platform as a solution. This is of interest for Data Engineers and Data Scientists working on big data projects daily, because it provides the whole platform in which you have the ability to create and manage the big data/data science pipelines/projects all on one platform.”

  • Spark Pools in Azure Synapse Analytics
  • Azure Databricks
  • HDI
  • Apache Spark

 

Question 9)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Because the Databricks API is declarative, several optimizations are available to us. Among the most powerful components of Spark is Spark SQL. At its core lies the Catalyst optimizer.

When you execute code, Spark SQL uses Catalyst’s general tree transformation framework in four phases, as shown below:

1. analyzing a logical plan to resolve references
2. logical plan optimization
3. physical planning
4. code generation to compile parts of the query to Java bytecode

In the physical planning phase, Catalyst may generate multiple plans and compare them based on [?].

  • Region
  • Rules
  • Permissions
  • Cost

 

Question 10)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Security and infrastructure configuration go together. When you set up your Azure Databricks workspace(s) and related services, you need to make sure that security considerations do not take a back seat during the architecture design.

When enabled, authentication automatically takes place in Azure Data Lake Storage (ADLS) from Azure Databricks clusters using the same Azure Active Directory (Azure AD) identity that one uses to log into Azure Databricks. Commands running on a configured cluster will be able to read and write data in ADLS without needing to configure service principal credentials. Any ACLs applied at the folder or file level in ADLS are enforced based on the user’s identity.

ADLS Passthrough is configured when you create a cluster in the Azure Databricks workspace. On a standard cluster when you enable the ADLS Passthrough setting… [?]

  • You must set two user accesses to one of the Azure Active Directory (AAD) users in the Azure Databricks workspace. The second is required as a backup or secondary user.
  • You must set single user access to one of the Azure Active Directory (AAD) users in the Azure Databricks workspace.
  • You may set multiple user accesses to one of the Azure Active Directory (AAD) users in the Azure Databricks workspace. The additional access is required as a backup or auxiliary users.
  • You inherit user access from the Azure Active Directory (AAD) users to the Azure Databricks workspace.

 

Question 11)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Databricks is a fully managed, cloud-based Big Data and Machine Learning platform, which empowers developers to accelerate AI and innovation by simplifying the process of building enterprise-grade production data applications. Built as a joint effort by Databricks and Microsoft, Azure Databricks provides data science and engineering teams with a single platform for Big Data processing and Machine Learning.

By combining the power of Databricks, an end-to-end, managed Apache Spark platform optimized for the cloud, with the enterprise scale and security of Microsoft’s Azure platform, Azure Databricks makes it simple to run large-scale Spark workloads.

Internally, [?] is used to run the Azure Databricks control plane and data planes via containers running on the latest generation of Azure hardware (Dv3 VMs), with NVMe SSDs capable of blazing 100-microsecond latency on IO.

  • Azure Kubernetes Service
  • Azure VNet Peering
  • Azure Database Services
  • Azure Machine Learning Studio

 

Question 12)
Scenario: You need to load data into a data store or compute resource using Azure Data Factory.

Which transformation in Mapping Data Flow can you use to perform this action?

  • Field mapping
  • Sink
  • Cache
  • Window
  • Source

 

Question 13)
Which Azure data platform is commonly used to process data in an ELT framework?

  • Azure Stream Analytics
  • Azure Data Catalog
  • Azure Data Lake Storage
  • Azure Databricks
  • Azure Data Factory

 

Question 14)
Scenario: Contoso has decided to implement on-premises Microsoft SQL Server pipelines using a custom solution. You meet with the IT team to discuss pulling data from SQL Server and migrating it to Azure Blob storage.

The team’s requirements are as follows:

• The process must orchestrate and manage the data lifecycle.
• The process must configure Azure Data Factory to connect to the on-premises SQL Server database.

The IT team have put together a list of actions they think need to be performed to meet the needs of the project. However, they are not sure of the order in which they should execute these actions.

Below is a list of their proposed actions:

a. Create an Azure Data Factory resource.
b. Configure a self-hosted integration runtime.
c. Create a virtual private network (VPN) connection from on-prem to MS Azure.
d. Create a database master key on SQL Server.
e. Backup the database and send it to Azure Blob storage.
f. Configure the on-prem SQL Server instance with an integration runtime.

As the Azure SME, you must advise the team on the correct items and order. Which sequence below contains the correct items and order to meet the team’s requirements?

  • d -> c -> e -> b
  • a -> b -> f
  • c -> a -> b -> f
  • c -> d -> a -> b -> f

 

Question 15)
Azure Data Factory provides a variety of methods for ingesting data, and a range of methods for performing transformations.

These methods are:

• Mapping Data Flows
• Compute Resources
• SSIS Packages

Mapping Data Flows provides several different transformation types that enable you to modify data. They are broken down into the following categories:

• Schema modifier transformations
• Row modifier transformations
• Multiple inputs/outputs transformations

Which transformation type is best described as “a Sort transformation that orders the data”?

  • Schema modifier transformations
  • None of the listed options.
  • Multiple inputs/outputs transformations
  • Row modifier transformations

 

Question 16)
Which Azure Data Factory component can run a data movement command or orchestrate a transformation job?

  • SSIS
  • Activities
  • Integration runtime
  • Datasets
  • Linked Services

 

Question 17)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Data Factory provides a variety of methods for ingesting data and performing transformations.

These methods are:

• Mapping Data Flows
• Compute Resources
• SSIS Packages

Mapping Data Flows provides several different transformation types that enable you to modify data. They are broken down into the following categories:

• Schema modifier transformations
• Row modifier transformations
• Multiple inputs/outputs transformations

Some of the transformations that you can define have a(n) [?] that enables you to customize the functionality of a transformation using columns, fields, variables, parameters, and functions from your data flow in these boxes. To build the expression, use the [?]. You can launch it by clicking in the expression text box inside the transformation. You’ll also sometimes see “Computed Column” options when selecting columns for transformation.

  • Wrangling Data Flow
  • Data Stream Expression Builder
  • Data Expression Orchestrator
  • Data Flow Expression Builder
  • Data Expression Script Builder
  • Mapping Data Flow

 

Question 18)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

[?] is a cloud integration service which orchestrates the movement of data between various data stores. [?] processes and transforms data by using compute services such as Azure HDInsight, Hadoop, Spark, and Azure Machine Learning, and publishes output data to data stores such as Azure SQL Data Warehouse so that business intelligence applications can consume it.

  • Azure Databricks
  • Azure Data Catalog
  • Azure Data Factory
  • Azure Cosmos DB
  • Azure Data Lake Storage
  • Azure Storage Explorer

 

Question 19)
When creating a notebook, you need to specify the pool that must be attached to the notebook (that is, a SQL or Spark pool). When it comes to languages, a notebook must be set with a primary language.

Which of the following are primary languages available within the notebook environment? (Select four)

  • JSspark (JavaScript)
  • Spark SQL
  • .NET Spark (C#)
  • PySpark (Python)
  • JVspark (Java)
  • Spark (Scala)

 

Question 20)
What is an Apache Spark notebook?

  • The logical Azure Databricks environment in which clusters are created, data is stored (via DBFS), and in which the server resources are housed.
  • A notebook is a collection of cells. These cells are run to execute code, render formatted text, or display graphical visualizations.
  • A cloud-based Big Data and Machine Learning platform, which empowers developers to accelerate AI and innovation by simplifying the process of building enterprise-grade production data applications.
  • The default Time to Live (TTL) property for records stored in an analytical store can manage the lifecycle of data and define how long it will be retained for.

 

Question 21)
Which of the following is an element of a Spark Pool in Azure Synapse Analytics?

  • Spark Instance
  • Databricks
  • HDI
  • Spark Console

 

Question 22)
Scenario: Contoso uses Azure Cosmos DB to store user profile data from their eCommerce site. The NoSQL document store provided by the Azure Cosmos DB SQL API offers Contoso the familiarity of managing their data using SQL syntax, while being able to read and write the files at a massive, global scale.

While Contoso is happy with the capabilities and performance of Azure Cosmos DB, they are concerned about the cost of executing a large volume of analytical queries over multiple partitions (cross-partition queries) from their data warehouse. They want to efficiently access all the data without needing to increase the Azure Cosmos DB request units (RUs).

They have looked at options for extracting data from their containers to the data lake as it changes, through the Azure Cosmos DB change feed mechanism. The problem with this approach is the extra service and code dependencies and long-term maintenance of the solution. They could perform bulk exports from a Synapse Pipeline, but then they won’t have the most up-to-date information at any given moment.

Required: Choose a solution which ensures all transactional data is automatically stored in a fully isolated column store without impacting the transactional workloads or incurring request unit (RU) costs.

  • Enable Azure Synapse Link for Cosmos DB and enable the analytical store on their Azure Cosmos DB containers.
  • Enable Azure Private Link for SQL Database and enable the analytical store on their SQL Database containers.
  • Enable Spark Pools for SQL Datawarehouse and enable the analytical store on their Azure Cosmos DB containers.
  • None of the listed options.

 

Question 23)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

In Data Lake Storage Gen1, data engineers query data by using the [?] language.

  • M-SQL
  • U-SQL
  • ADLS API
  • ABS API
  • T-SQL

 

Question 24)
Which transformation is used to load data into a data store or compute resource?

  • Source
  • Field
  • Cache
  • Sink
  • Window

 

Question 25)
What issue can cause a slower performance on join or shuffle jobs?

  • Bucketing
  • Enablement of autoscaling
  • Use of the cache option
  • Data skew

 

Question 26)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Synapse dedicated SQL Pools support storing JSON-format data using [?]. The JSON format enables representation of complex or hierarchical data structures in tables. It allows users to transform arrays of JSON objects into table format. The performance of JSON data can be optimized by using columnstore indexes and memory optimized tables.

  • Standard NCHAR table columns
  • Standard CHAR table columns
  • Standard NVARCHAR table columns
  • Standard VARCHAR table columns

 

Question 27)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

[A] data is typically tabular data that is represented by rows and columns in a database. Databases that hold tables in this form are called [B] databases.

  • [A] Unstructured, [B] Binary
  • [A] Structured, [B] Relational
  • [A] JSON, [B] Semi-Structured
  • [A] Relational, [B] Structured

 

Question 28)
What optimization does the following command perform: OPTIMIZE Students ZORDER BY Grade?

  • Creates an order-based index on the Grade field to improve filters against that field.
  • It creates an order-based index on the Grade field to improve filters against that field. It also ensures that all data backing (for example, Grade=8) is colocated, and then updates a graph that routes requests to the appropriate files.
  • Ensures that all data backing (for example, Grade=8) is colocated, then updates a graph that routes requests to the appropriate files.
  • Ensures that all data backing (for example, Grade=8) is colocated, then rewrites the sorted data into new Parquet files.
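
For reference, OPTIMIZE with ZORDER BY applies to Delta Lake tables and can be issued through Spark SQL. Below is a minimal PySpark sketch, assuming a Delta table named Students with a Grade column as in the question:

# Minimal sketch (PySpark on a Delta Lake table named "Students", per the question).
# OPTIMIZE compacts small files; ZORDER BY co-locates rows with similar Grade values
# so that filters on Grade can skip unrelated files.
spark.sql("OPTIMIZE Students ZORDER BY (Grade)")

# A typical follow-up query that benefits from the Z-ordering:
spark.sql("SELECT * FROM Students WHERE Grade = 8").show()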

 

Question 29)
Identify the missing word(s) in the following scenario within the context of Microsoft Azure.

With the Azure-SSIS integration runtime installed, and SQL Server Data Tools (SSDT), you have the capability to deploy and manage SSIS packages that you create in the cloud. For some packages, you may be able to rebuild them by redeploying them in the Azure-SSIS runtime. However, there may be some SSIS packages that already exist within your environment that may not be compatible.

You can use the [?] to perform an assessment of the SSIS packages that exist and identify any compatibility issues with them.

  • Azure Lab Services
  • Azure ARM templates
  • Azure Data Migration Assistant
  • Azure SQL Server Upgrade Advisor
  • Azure Advisor
  • Azure SQL Server Management Studio

 

Question 30)
Conditional access is a feature that enables you to define the conditions under which a user can connect to your Azure subscription and access services. Conditional access provides an additional layer of security that can be used in combination with authentication to strengthen the security access to your network.

Conditional Access policies at their simplest are [?].

  • If-else statements.
  • If-then statements.
  • While-if statements.
  • Where-having statements.

 

Question 31)
What does Azure Data Lake Storage (ADLS) Passthrough enable?

  • Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials.
  • Blocking ADLS resources through a mount point when credential passthrough is enabled.
  • User security groups that are added to ADLS are automatically created in the workspace as Databricks groups.
  • Automatically mounting ADLS accounts to the workspace that are added to the managed resource group.

 

Question 32)
Which of the following roles works with Azure Cognitive Services, Cognitive Search, and the Bot Framework?

  • A System Administrator
  • A Data Engineer
  • An RPA Developer
  • A Solution Architect
  • An AI Engineer
  • A Data Scientist

 

Question 33)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

As a Data Engineer, you can transfer and move data in several ways. The most common tool is [?], which provides robust resources and nearly 100 enterprise connectors. [?] also allows you to transform data by using a wide variety of languages.

  • Azure Data Lake Storage
  • Azure Data Factory
  • Azure Data Catalogue
  • Azure Databricks
  • Azure Stream Analytics

 

Question 34)
Scenario: You have been contracted by Contoso to advise their IT team on the proper type of storage to use for their files within an Azure Storage environment. Due to the various jurisdictions that Contoso operates in, there are many compliance regulations which must be followed.

Required:

• A single storage account must be used to store all operations (includes all reads, writes, and deletes).
• Retention policy dictates that an on-premises copy must exist for all historical operations.

As the contracted expert on Azure, the team look to you for direction. Which of the following actions should you recommend to them to meet the requirements?

  • Use the storage client to download log data from $logs/table.
  • Use the AzCopy tool to download log data from $logs/blob.
  • Configure the storage account to log read, write, and delete operations for service type queue.
  • Configure the storage account to log read, write, and delete operations for service type Blob.
  • Configure the storage account to log read, write, and delete operations for service-type table.

 

Question 35)
Microsoft Azure Storage is a managed service that provides durable, secure, and scalable storage in the cloud. You can create an Azure storage account using the Azure Portal, Azure PowerShell, or Azure CLI. Azure Storage provides three distinct account options with different pricing and features supported.

Which of these Azure Storage account options is best described by the following statement?
“A legacy account type which supports only block and append blobs.”

  • GPv1 storage accounts
  • GPv2 storage accounts
  • Page storage accounts
  • Blob storage accounts
  • Queue storage accounts
  • Block storage accounts

 


 

Question 36)
Azure Synapse Studio is the primary tool to use to interact with the many components that exist in the service. It organizes itself into hubs which allow you to perform a wide range of activities against your data.

Which of the following are the referenced hubs on Azure Synapse Studio? Select all options that apply.

  • Manage
  • Integrate
  • Home
  • Monitor

 

Question 37)
In Azure Synapse Studio, you can manage integration pipelines within the Integrate hub.

Which of the following will you see when you expand Pipelines? Select all options that apply

  • Notebooks
  • Provisioned SQL pool databases
  • Pipeline canvas
  • Master Pipeline
  • Data flows
  • Activities

 

Question 38)
Which component enables you to perform code free transformations in Azure Synapse Analytics?

  • Flow capabilities
  • Mapping data flow
  • Studio
  • Copy activity
  • Monitoring capabilities
  • Control capabilities

 

Question 39)
Azure Advisor provides you with personalized messages that provide information on best practices to optimize the setup of your Azure services. Azure Advisor recommendations are free, and the recommendations are based on telemetry data that is generated by Azure Synapse Analytics. Which of the following includes the telemetry data that is captured by Azure Synapse Analytics? Select all options that apply.

  • Data Skew and replicated table information
  • Encryption deficiencies
  • Column statistics data
  • Adaptive Cache
  • TempDB utilization data

 

Question 40)
Which is the default distribution used for a table in Synapse Analytics?

  • Clustered distribution
  • Replicated Table distribution
  • Round-Robin distribution
  • HASH distribution
  • Non-clustered distribution
  • B-tree distribution

 

Question 41)
Which of the following do you need to create to generate a Spark pool in Azure Synapse Analytics?

  • HDI
  • A Spark Instance
  • Synapse Analytics Workspace
  • Azure Databricks

 

Question 42)
Which of the following statements about the Azure Databricks Data Plane is true?

  • The Data Plane is hosted within a Microsoft-managed subscription.
  • The Data Plane contains the Cluster Manager and coordinates data processing jobs.
  • The Data Plane is hosted within the client subscription and is where all data is processed and stored.
  • The Data Plane is where you manage Key Vault itself and it is the interface used to create and delete vaults.

 

Question 43)
What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn’t running when the cluster is called by Data Factory?

  • Simply add a Databricks cluster start activity before the notebook, JAR, or Python Databricks activity.
  • If the target cluster is stopped, Databricks will start the cluster before attempting to execute.
  • The Databricks activity will fail in Azure Data Factory – you must always have the cluster running.
  • Whenever a cluster is paused or shut down, ADF will recover from the last operational PiT.

 

Question 44)
When planning and implementing your Azure Databricks deployments, there are several considerations with respect to compliance. In many industries, it is imperative to maintain compliance through a combination of following best practices in storing and handling data, and by using services that maintain compliance certifications and attestations.
Which of the following compliance certifications are available in Azure Databricks? Select all options that apply

  • PCI DSS
  • ISO 27018
  • HITRUST
  • ISO 27001
  • HIPAA
  • SOC2, Type 2

 

Question 45)
Azure Data Factory can accommodate organizations that are embarking on data integration projects from differing starting points. Typically, many data integration workflows must consider existing pipelines that have been created on previous projects, with different dependencies and using different technologies.

Which of the following are ingestion methods that can be used to extract data from a variety of sources?

  • Compute resources
  • Copy Activity
  • Activities
  • Linked Services
  • SSIS packages
  • Self-hosted

 

Question 46)
Continuous integration is the practice of testing each change made to your codebase automatically and as early as possible. Continuous delivery follows the testing that happens during continuous integration and pushes changes to a staging or production system.

Below is a sample overview of the CI/CD lifecycle in an Azure data factory that’s configured with Azure Repos Git.

The order of the activities has been shuffled.

a. A developer creates a feature branch to make a change. They debug their pipeline runs with their most recent changes.
b. A development data factory is created and configured with Azure Repos Git. All developers should have permission to author Data Factory resources like pipelines and datasets.
c. After a pull request is approved and changes are merged in the master branch, the changes are then published to the development factory.
d. After the changes have been verified in the test factory, deploy to the production factory by using the next task of the pipelines release.
e. When the team is ready to deploy the changes to a test or UAT (User Acceptance Testing) factory, they go to their Azure Pipelines release and deploy the desired version of the development factory to UAT. This deployment takes place as part of an Azure Pipelines task and uses Resource Manager template parameters to apply the appropriate configuration.
f. After a developer is satisfied with their changes, they create a pull request from their feature branch to the master or collaboration branch to have their changes reviewed by peers.

Select the correct sequence of events in the CI/CD lifecycle.

  • b -> a -> f -> d -> e -> c
  • b -> a -> f -> c -> e -> d
  • a -> f -> d -> b -> c -> e
  • a -> b -> c -> f -> d -> e

Question 47)
Scenario: Your team is moving data from an Azure Data Lake Gen2 store to Azure Synapse Analytics. The team is planning a data copy activity and is currently discussing which integration runtime to use.

Which Azure Data Factory integration runtime should be used in the data copy activity?

  • Azure
  • Linked Services
  • Activities
  • Self-hosted
  • Azure-SSIS
  • Datasets

 

Question 48)
Azure Data Factory provides a variety of methods for ingesting data and a range of methods for performing transformations.
Which of the following are valid options for transforming data within Azure Data Factory? Select all options that apply

  • SSIS Packages
  • Control Resources
  • Mapping Data Flows
  • Test Lab Packages
  • Compute Resources
  • Data Movement Flows

 

Question 49)
Consider: Continuous Integration/Continuous Delivery lifecycle
Which feature commits the changes of Azure Data Factory work in a custom branch created with the main branch in a Git repository?

  • DDL commands
  • DML commands
  • TCL commands
  • Pull request
  • Commit
  • Repo

 

Question 50)
Scenario: Your company has a Data Lake Storage Gen2 account. You have been tasked with moving files from Amazon S3 to Azure Data Lake Storage.

Which tool should you choose to complete this task?

  • Azure Portal
  • Azure Data Catalog
  • Azure Storage Explorer
  • Azure Data Factory
  • Azure Data Studio

 

Question 51)
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?

  • None of the listed options
  • IPGeocodeDF = read.spark.parquet(“dbfs:/mnt/training/ip-geocode.parquet”)
  • IPGeocodeDF = parquet.read(“dbfs:/mnt/training/ip-geocode.parquet”)
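
Neither listed snippet matches the standard API, which is why “None of the listed options” fits here; the usual PySpark pattern looks like the sketch below (the path is taken from the question):

# Standard PySpark syntax for reading a Parquet file from DBFS into a DataFrame.
# "spark" is the SparkSession that Databricks provides in every notebook.
IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
IPGeocodeDF.printSchema()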

 

Question 52)
Scenario: You are working on a new project and meet to discuss which Azure data platform technology is best for your company.
Requirement: A globally distributed, multi-model database that can perform queries in less than a second. Which of the following should you choose?

  • Azure SQL on VM
  • Azure Data Factory
  • Azure Cosmos DB
  • Azure SQL Data Warehouse
  • Azure Databricks
  • Azure SQL Database

 

Question 53)
Azure provides many ways to store your data. There are multiple database options like Azure SQL Database, Azure Cosmos DB, and Azure Table Storage. Azure offers multiple ways to store and send messages, such as Azure Queues and Event Hubs. You can even store loose files using services like Azure Files and Azure Blobs.
A storage account defines a policy that applies to all the storage services in the account.

Which settings are controlled by a storage account? Select all options that apply

  • Performance
  • Secure transfer required
  • Replication
  • Virtual networks
  • Subscription
  • Access tier

 

Question 54)
Azure Cosmos DB is a globally distributed, multi-model database. Which of the following APIs can be used to deploy it?

  • Cassandra API
  • MongoDB API
  • ADLS API
  • Table API
  • Gremlin API
  • SQL API

 

Question 55)
By default, when a table is created the data structure has no indexes and is called a heap. A well-designed indexing strategy can reduce disk I/O operations and consume fewer system resources, thereby improving query performance, especially when using filtering, scans, and joins in a query.
Dedicated SQL Pools have which of the following indexing options available?

  • Key indexes
  • Hash indexes
  • B-tree indexes
  • Clustered Rowstore Indexes
  • Non-clustered index
  • Clustered columnstore indexes

 

Question 56)
As great as Data Lakes are at inexpensively storing our raw data, they also bring with them several performance challenges:

• Too many small files, or too many large files, results in more time opening and closing files rather than reading contents (this is worse with streaming).
• Partitioning (also known as “poor man’s indexing”) breaks down if you choose the wrong fields, or if your data has many dimensions or high cardinality columns.
• No caching – cloud storage throughput is low (cloud object storage is 20-50MB/s/core vs 300MB/s/core for local SSDs).

As a solution to the challenges with Data Lakes noted above, Delta Lake is a file format that can help you build a data lake comprised of one or many tables in Delta Lake format. Delta Lake integrates tightly with Apache Spark and uses an open format that is based on Parquet.
Two of the core features of Delta Lake are performing UPSERTs and Time Travel operations.
What does the UPSERT command do?

  • The command will INSERT a table and if the table already exists, UPDATE the table.
  • The command will INSERT a row and if the row already exists, append a new row in the table with an update notation.
  • The command will INSERT a row and if the row already exists, UPDATE the row.
  • The command will INSERT a column and if the column already exists, add a new column in the table with an update notation.
  • The command will INSERT a column and if the column already exists, UPDATE the column.
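
As a hedged illustration of UPSERT semantics, Delta Lake exposes them through MERGE INTO; the table and column names below are hypothetical:

# Minimal sketch of an UPSERT with Delta Lake's MERGE INTO (hypothetical tables).
# Rows in "customer_updates" that match on id UPDATE the existing row; the rest are INSERTed.
spark.sql("""
    MERGE INTO customers AS target
    USING customer_updates AS source
    ON target.id = source.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")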

 

Question 57)
What is an Azure Key Vault-backed secret scope?

  • It is a method by which you create a secure connection to Azure Key Vault from a notebook and directly access its secrets within the Spark session.
  • It is the Key Vault Access Key used to securely connect to the vault and retrieve secrets.
  • A Databricks secret scope that is backed by Azure Key Vault instead of Databricks.
  • An Azure Key Vault-backed secret scope is a private key framework managed by Microsoft.
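
For context, once a Key Vault-backed secret scope exists, notebook code reads its secrets through dbutils; the scope and key names below are placeholders:

# Reading a secret from an Azure Key Vault-backed secret scope in a Databricks notebook.
# "kv-backed-scope" and "sql-password" are placeholder names for this sketch.
password = dbutils.secrets.get(scope="kv-backed-scope", key="sql-password")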

 

Question 58)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Data Lake Storage Gen2 provides a first-class data lake solution that enables enterprises to consolidate their data.
Along with role-based access control (RBAC), Azure Data Lake Storage Gen2 provides [?] that are POSIX-compliant, and restrict access to only authorized users, groups, or service principals. It applies restrictions in a way that’s flexible, fine-grained, and manageable.

  • Online Transaction Processing (OLTP)
  • Transmission Control Protocol (TCP)
  • Transport Layer Security (TLS)
  • Access Control Lists (ACLs)
  • Transparent Data Encryption (TDE)

 

Question 59)
Scenario: You are working as a consultant with a security firm and advising the IT team on how to design a hybrid solution to synchronize data from an on-premises Microsoft SQL Server database to Azure SQL Database.
Required: An assessment of databases must be performed to determine if data will move without compatibility issues.
The firm’s IT team has many different tools at their disposal, and it is your responsibility to advise them on which tool to use. Which of the following is the best tool for the application?

  • Data Migration Assistant (DMA)
  • SQL Server Migration Assistant (SSMA)
  • SQL Vulnerability Assessment (VA)
  • Microsoft Assessment and Planning Toolkit

 

Question 60)
Azure HDInsight provides technologies to help you ingest, process, and analyze big data. It supports batch processing, data warehousing, IoT, and data science.
Data processing within Hadoop uses which of the following languages to process big data? Select all options that apply

  • .NET
  • Python
  • R
  • Java
  • C#
  • JavaScript

 


 

Question 61)
At what stage of a typical project should you create your storage account(s)?

  • At the end, during resource cleanup.
  • At any stage of the project before data analysis.
  • After deployment when the project is running.
  • At the beginning, during project setup.

 

Question 62)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Microsoft Azure Storage is a managed service that provides durable, secure, and scalable storage in the cloud. The Azure Queue service is used to store and retrieve messages. Queue messages can be up to [A] KB in size, and a queue can contain millions of messages. Queues are used to store lists of messages to be processed [B].

  • [A] 50, [B] in a time bound manner
  • [A] 25, [B] sequentially
  • [A] 64, [B] asynchronously
  • [A] 32, [B] synchronously
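
As a small illustration of the Queue service described above, the azure-storage-queue SDK for Python can enqueue and dequeue messages; the connection string and queue name below are placeholders:

# Sketch using the azure-storage-queue SDK (placeholder connection string / queue name).
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<connection-string>", queue_name="tasks")
queue.send_message("process-order-42")        # each message can be up to 64 KB
for msg in queue.receive_messages():          # consumers process messages asynchronously
    print(msg.content)
    queue.delete_message(msg)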

 

Question 63)
How are notebooks saved in Synapse studio?

  • Notebooks are synced to the Synapse Studio cloud automatically upon changes being made to a file.
  • Using CTRL + S.
  • Select the Publish all button on the workspace command bar.
  • Select the Publish button on the notebook command bar.

 

Question 64)
In Azure Synapse Studio, where would you view the contents of the primary Data Lake store?

  • In the linked tab of the Data tab.
  • In the workspace tab of the Integrate hub.
  • In the Integration section of the Monitor hub.
  • None of the listed options.
  • In the workspace tab of the Data hub.

 

Question 65)
Once Azure Synapse Link is configured on Cosmos DB, what is the first step you need to perform to use Azure Synapse Analytics serverless SQL pools to query the Azure Cosmos DB data?

  • Use the OPENROWSET function
  • Use a SELECT clause
  • None of the listed options
  • CREATE database

 

Question 66)
How does splitting source files help maintain good performance when loading into Synapse Analytics?

  • Having well defined “zones” established for the data coming into the Data Lake and cleansing and transformation tasks that land the data you need in a curated and optimized state.
  • Reduced possibility of data corruptions.
  • Compute node to storage segment alignment.
  • Optimized processing of smaller file sizes.

 

Question 67)
There are two concepts within Apache Spark Pools in Azure Synapse Analytics, namely Spark Pools and Spark Instances.
Which of the following attributes belong to Spark Instances? Select all options that apply.

  • Reusable
  • Created when connected to Spark Pool, Session, or Job
  • Multiple users can have access
  • Permissions can be applied
  • Creates a Spark Instance
  • Exists as Metadata

 

Question 68)
You can natively perform data transformations with Azure Data Factory code free using the Mapping Data Flow task. Mapping Data Flows provide a fully visual experience with no coding required. Your data flows run on your own execution cluster for scaled-out data processing.

Clicking Debug provisions the Spark clusters required to interact with the Mapping Data Flow transformations. If you select AutoResolveIntegrationRuntime, what will be the result? Select all options that apply.

  • It typically takes 5-7 minutes for the cluster to spin up.
  • Data engineers can develop data transformation logic with or without writing code.
  • The number of rows that are returned within the data previewer are fixed by the AutoResolve Agent.
  • A cluster with eight cores will be available with a time to live value of 60 minutes.
  • None of the listed options.
  • All the listed options.

 

Question 69)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.
[?] is typically used to automate the process of extracting, transforming, and loading the data through a batch process against structured and unstructured data sources.

  • Azure Data Factory
  • Azure Designer
  • Azure Functions
  • Azure PowerShell
  • Azure Orchestrator
  • Azure Conductor

 

Question 70)
By default, the Azure Data Factory user interface experience (UX) authors directly against the data factory service.

Which of the following are the limitations of this experience? Select all options that apply.

  • The Data Factory service isn’t optimized for collaboration and version control.
  • Azure Resource Manager template required to deploy Data Factory itself is not included.
  • All the listed options.
  • Data Factory may be configured with GitHub to allow for easier change tracking and collaboration.
  • The Data Factory service doesn’t include a repository for storing the JSON entities for your changes. The only way to save changes is via the “Publish All” button and all changes are published directly to the data factory service.

 

Question 71)
Activities within Azure Data Factory define the actions that will be performed on the data. Which of the following are valid activity categories? Select all that apply.

  • Analytic activities
  • Control activities
  • Data transformation activities
  • Test lab activities
  • Data movement activities
  • Data storage activities

 

Question 72)
Which language can be used to define Spark job definitions?

  • Transact-SQL
  • PySpark
  • C#
  • Java
  • PowerShell

 

Question 73)
What is one possible way to optimize a Spark Job?

  • Use bucketing
  • Remove the Spark Pool
  • Use the local cache option
  • None of the listed options
  • Remove all nodes

 

Question 74)
Spark is a distributed computing environment. Therefore, work is parallelized across executors. At which two levels does this parallelization occur?

  • The Executor and the Slot
  • The Driver and the Executor
  • The Slot and the Task
  • The Executor and the Task

 

Question 75)
Data engineers use Azure Stream Analytics to process streaming data and respond to data anomalies in real time. You can use Stream Analytics for Internet of Things (IoT) monitoring, web logs, remote patient monitoring, and point of sale (POS) systems.

Stream Analytics can route job output to which of the following storage systems? Select all options that apply

  • Azure Blob Storage
  • Azure SQL Datawarehouse
  • Azure SQL Database
  • Azure Cosmos DB
  • Azure Table Storage
  • Azure Data Lake Storage

 

Question 76)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Many business application architectures separate transactional and analytical processing into separate systems with data stored and processed on separate infrastructures. [?] systems are optimized for the analytical processing, ingesting, synthesizing, and managing large sets of historical data.

  • OLAP
  • ETL
  • ADPS
  • OLTP
  • ELT

 

Question 77)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Storage provides a REST API to work with the containers and data stored in each account. The simplest way to handle access keys and endpoint URLs within applications is to use [?].

  • Storage account connection strings
  • The instance key
  • The private access key
  • A public access key
  • The REST API endpoint
  • The account subscription key
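
As an illustration, the Python storage SDK accepts a storage account connection string directly, which bundles the account name, access key, and endpoint suffix; the values below are placeholders:

# Sketch with azure-storage-blob: a connection string carries both the access key
# and the endpoint URLs, so the client needs nothing else (placeholder values).
from azure.storage.blob import BlobServiceClient

conn_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
service = BlobServiceClient.from_connection_string(conn_str)
for container in service.list_containers():
    print(container.name)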

 

Question 78)
Scenario: You are working at an organization that has two types of data:

1. Private and proprietary
2. For public consumption.

When considering Azure Storage Accounts, which option meets the data diversity requirement?

  • Locate the organization’s data in a data center in the required country or region, with one storage account for each location.
  • Enable virtual networks for the proprietary data and not for the public data. This will require separate storage accounts for the proprietary and public data.
  • Locate the organization’s data in a data center with the strictest data regulations to ensure that regulatory requirement thresholds are met. This way, only one storage account will be required for managing all data, which will reduce data storage costs.
  • None of the listed options.

 

Question 79)
What is meant by orchestration? Select the best description.

  • Orchestration typically contains the transformation logic or the analysis commands of the Azure Data Factory’s work.
  • Orchestration enables you to ingest the data from a data source to prepare it for transformation and/or analysis. In addition, Orchestration can fire up compute services on demand.
  • Orchestration helps make your business more efficient by reducing or replacing human interaction with IT systems and instead using software to perform tasks in order to reduce cost, complexity, and errors.
  • Orchestration is the automated configuration, management, and coordination of computer systems, applications, and services.
  • None of the listed options.

 

Question 80)
Scenario: You are working on a new project which involves creating storage accounts and blob containers for your application.

Which of the options below is a good strategy for this task?

  • All the listed options.
  • Create Azure Storage accounts in your application as needed. Create the containers before deploying the application.
  • None of the listed options.
  • Create Azure Storage accounts before deploying your app. Create containers in your application as needed.
  • Create both your Azure Storage accounts and containers before deploying your application.

 

Question 81)
Azure Data Lake Storage Gen2 plays a fundamental role in a wide range of big data architectures. There are stages for processing big data solutions that are common to all architectures.

What are these stages? Select all options that apply

  • Ingestion
  • Model and serve
  • Store
  • Relational
  • Streamed
  • Prep and train

 

Question 82)
Which Azure Synapse Analytics component enables you to perform Hybrid Transactional and Analytical Processing?

  • Azure Data Explorer
  • Azure Synapse Pipeline
  • Azure Synapse Studio
  • Azure Stream Analytics
  • Azure Synapse Spark pools
  • Azure Synapse Link

Question 83)
You can integrate your Azure Synapse Analytics workspace with a new Power BI workspace so that you can get your data from within Azure Synapse Analytics visualized in a Power BI report or dashboard.
Which icon should you click on in the home page of Azure Synapse Studio to begin the integration?

  • Connect BI
  • Explore and analyze
  • Import
  • None of the listed options
  • Ingest

Question 84)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Synapse Analytics can work by acting as the one stop shop to meet all your analytical needs in an integrated environment.
You can develop big data engineering and machine learning solutions using [?]. You can take advantage of the big data computation engine to deal with complex compute transformations that would take too long in a data warehouse.

  • Azure Synapse SQL
  • Azure Cosmos DB
  • Azure Synapse Link
  • Apache Spark for Azure Synapse
  • Azure Synapse Pipelines

Question 85)
Within the context of Azure Databricks, sharing data from one worker to another can be a costly operation.

Spark has optimized this operation by using a format called [?] which prevents the need for expensive serialization and de-serialization of objects to get data from one JVM to another.

  • Tungsten
  • Shuffles
  • Pipelining
  • Stage boundary
  • Lineage
  • Stages

Question 86)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

[?] is a fully managed, cloud-based Big Data and Machine Learning platform, which empowers developers to accelerate AI and innovation by simplifying the process of building enterprise-grade production data applications.

  • Azure Databricks
  • Azure Event Hub
  • Apache Kafka
  • Apache Spark

Question 87)
What is the Databricks Delta command to display metadata?

  • DESCRIBE DETAIL tableName
  • MSCK DETAIL tablename
  • METADATA SHOW tablename
  • SHOW SCHEMA tablename
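
For reference, the command can be run from a notebook cell through Spark SQL; the table name is a placeholder:

# Show Delta table metadata (location, format, size, partitioning) for a placeholder table.
spark.sql("DESCRIBE DETAIL tableName").show(truncate=False)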

Question 88)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.
A common use of [?] is to take shared data and provide it as a source for Azure Data Factory pipelines to use with your own internal data.

  • Azure Managed SQL Warehouse
  • Azure SQL Database
  • Azure Data Share
  • Azure Databricks
  • Azure Data Lake Storage

Question 89)
In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the bridge between the activity and linked services.
An Azure integration runtime performs which of the following actions?
Select all options that apply

  • All the listed options.
  • None of the listed options.
  • Run Copy Activity between cloud data stores.
  • Run Data Flows in Azure.
  • Dispatch transform activities in public network utilizing platforms such as Databricks Notebook/ Jar/ Python activity, HDInsight Hive activity and more.
  • Trigger batch movement of ETL data on a dynamic schedule for most analytics solutions.

Question 90)
Activities within Azure Data Factory define the actions that will be performed on the data. There are three categories of activity, including:

• Data movement activities
• Data transformation activities
• Control activities

When using JSON notation, the activities section can have one or more activities defined within it.
The activities have the following top-level structure:
JSON
{
    "name": "Execution Activity Name",
    "description": "description",
    "type": "",
    "typeProperties": {
    },
    "linkedServiceName": "MyLinkedService",
    "policy": {
    },
    "dependsOn": {
    }
}

Which of the JSON properties are required for HDInsight?
Select all options that apply

  • description
  • type
  • name
  • linkedServiceName
  • policy
  • typeProperties

Question 91)
A pipeline in Azure Data Factory represents a logical grouping of activities where the activities together perform a certain task.
Which of the following are valid dependency conditions?

Select all options that apply.

  • Succeeded
  • Queue
  • Skipped
  • Completed
  • Failed
  • Pending

Question 92)
Azure Synapse Pipelines is a cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale.
Azure Synapse Pipelines enables you to integrate data pipelines between which of the following?
Select all options that apply

  • Spark Pools
  • SQL Pools
  • Cosmos Pools
  • Cosmos Serverless
  • SQL Serverless
  • Hadoop Pools

Question 93)
The unit of distribution used to parallelize work is a Spark Cluster. Every Cluster has a Driver and one or more executors. What type of object is the work submitted to the Cluster split into?

  • Arrays
  • Stages
  • Chore
  • Jobs

Question 94)
Scenario: Data loads at your company have increased the processing time for on-premises data warehousing descriptive analytic solutions. You have been tasked with looking into a cloud-based alternative to reduce processing time and release business intelligence reports faster. Your boss wants you to first consider scaling up on-premises servers, but you discover this approach would quickly reach its physical limits.

The new solution must be on a petabyte scale that doesn’t involve complex installations and configurations.
Which of the following solutions would best suit the company’s needs?

  • Azure Stream Analytics
  • Azure Table Storage
  • Azure DataNow
  • Azure Synapse Analytics
  • Azure Cosmos DB
  • Azure On-prem Solution

Question 95)
Which ALTER DATABASE statement parameter allows a dedicated SQL pool to scale?

  • OVER
  • MODIFY
  • CHANGE
  • SCALE
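
A minimal sketch of the scaling statement, issued from Python with pyodbc while connected to the master database; the server, credentials, pool name, and target service objective are all placeholders:

# Hypothetical sketch: scaling a dedicated SQL pool with ALTER DATABASE ... MODIFY.
# Run against the master database; all identifiers and credentials are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=master;UID=<user>;PWD=<password>",
    autocommit=True,  # ALTER DATABASE cannot run inside a user transaction
)
conn.execute("ALTER DATABASE [mydedicatedpool] MODIFY (SERVICE_OBJECTIVE = 'DW300c')")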

Question 96)
Which of these is a step in flattening a nested schema?

  • COPY data
  • LOAD CSV file
  • CREATE parquet file
  • Explode Arrays

Question 97)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.
You can use account-level SAS to allow access to anything that a service-level SAS can allow, plus additional resources and abilities. For example, you could use this type of SAS… [?] (Select all that apply)

  • to allow an app to retrieve a list of files in a file system.
  • None of the listed options.
  • to create file systems.
  • to allow an app to download a file.
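
As an illustration, an account-level SAS is generated from the account key and can cover services, resource types, and permissions that a service-level SAS cannot; the values below are placeholders:

# Sketch: generating an account-level SAS with azure-storage-blob (placeholder values).
# Granting the "container" resource type with "create" permission is what lets an app
# create file systems/containers, which a service-level SAS cannot do.
from datetime import datetime, timedelta
from azure.storage.blob import generate_account_sas, ResourceTypes, AccountSasPermissions

sas_token = generate_account_sas(
    account_name="<account>",
    account_key="<key>",
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True, create=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)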

Question 98)
The Stream Analytics query language is a subset of which query language?

  • CQL
  • OPath
  • T-SQL
  • Gremlin
  • QUEL
  • MQL

Question 99)
The pace of change in both the capabilities of technologies, and the elastic nature of cloud services, means that new opportunities exist to evolve the data warehouse to handle modern workloads.
Which of the following are examples of these opportunities?
Select all options that apply

  • Insights through analytical dashboards
  • Increased flexibility for data volumes
  • Static data velocities
  • New varieties of data
  • Advanced analytics for all users

Question 100)
Scenario: You have been contracted by Contoso to advise their IT team on which API to use for the database model and type based on the following information.

Specifications:

• The application uses a NoSQL database to store data.
• The database uses the key-value and wide-column NoSQL database type.

Required: Developers need to access data in the database using an API.
Which of the following APIs should you recommend to the team?

  • Cassandra API
  • SQL API
  • MongoDB API
  • Gremlin API
  • Table API

 

Question 101)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.
Microsoft Azure Storage is a managed service that provides durable, secure, and scalable storage in the cloud. A single Azure subscription can host up to [A] storage accounts, each of which can hold [B] TB of data.

  • [A] 500, [B] 500
  • [A] 250, [B] 500
  • [A] 500, [B] 1000
  • [A] 200, [B] 500

 

Question 102)
Scenario: You are working in a department which requires preparation of data for ad hoc data exploration and analysis based on market fluctuations. The Department Head has tasked you with determining the most effective resource model to employ in Azure Synapse Analytics. Which one of the following resource models should you choose?

  • IoT Central
  • Pipelines
  • Databricks
  • Serverless
  • Dedicated

Question 103)
When data is loaded into Synapse Analytics dedicated SQL pools, the datasets are broken up and dispersed among the compute nodes for processing, and then written to a decoupled and scalable storage layer. This action is termed “sharding”.

The design decisions around how to split and disperse this data among the nodes, and then to the storage, is important for querying workloads. This is because the correct selection minimizes data movement, which is a primary cause of performance issues in an Azure Synapse dedicated SQL Pool environment.
Which of the following are valid table distribution types available in Synapse Analytics SQL Pools?

  • Merkle table distribution
  • Round robin distribution
  • Centralized table distribution
  • Hash distribution
  • Distributed table schema
  • Replicated tables

Question 104)
When loading data into Azure Synapse Analytics on a scheduled basis, it’s important to try to reduce the time taken to perform the data load, and to minimize the resources needed as much as possible, in order to maintain good performance cost-effectively.

Which of the following are valid Strategies for managing source data files?
Select all options that apply.

  • Consolidate source files.
  • Maintaining a well-engineered Data Lake structure.
  • When loading large datasets, it’s best to use the compression capabilities of the file format.
  • Having well defined “zones” established for the data coming into the Data Lake and cleansing and transformation tasks that land the data you need in a curated and optimized state.

Question 105)
Azure Synapse Analytics can work by acting as the one stop shop to meet all your analytical needs in an integrated environment. Which of the following leverages the capabilities of Azure Data Factory and is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale?

  • Azure Synapse Link
  • Azure Cosmos DB
  • Apache Spark for Azure Synapse
  • Azure Synapse Pipelines
  • Azure Synapse SQL

Question 106)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Synapse Analytics is a cloud-based data platform that brings together enterprise data warehousing and Big Data analytics. It can process massive amounts of data and answer complex business questions with limitless scale.
Azure Synapse Analytics uses the [?] approach for bulk data.

  • Extract, Transform, and Load (ETL)
  • Automated Data Processing Equipment (ADPE)
  • Extract, Load, and Transform (ELT)
  • Atomicity, Consistency, Isolation, and Durability (ACID)

Question 107)
Within the context of Azure Databricks, sharing data from one worker to another can be a costly operation.

Spark has optimized this operation by using a format called Tungsten which prevents the need for expensive serialization and de-serialization of objects to get data from one JVM to another.
The data that is “shuffled” is in a format known as UnsafeRow, or more commonly, the Tungsten Binary Format.
What is created when we shuffle data?

  • A Pipeline
  • A Stage
  • A Lineage
  • A Stage boundary

Question 108)
Which of the following steps do pipelines in Azure Data Factory typically perform?

Select all options that apply

  • Connect and collect
  • Monitor
  • Treemap
  • Publish
  • DataCopy
  • Transform and enrich

Question 109)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

Azure Monitor provides base-level infrastructure metrics and logs for most Azure services. Azure diagnostic logs are emitted by a resource and provide rich, frequent data about the operation of that resource. Azure Data Factory (ADF) can write diagnostic logs in Azure Monitor.
Data Factory stores pipeline-run data for [?] days.

  • 10
  • 21
  • 30
  • 15
  • 45

Question 110)
To create and manage Data Factory objects including datasets, linked services, pipelines, triggers, and integration runtimes, the user account that you use to sign into Azure must be a member of which of the following role groups? Select all that apply.

  • Network Manager role
  • Administrator role
  • Contributor role
  • Owner role
  • CDN Security Profile role
  • Custom role with required rights

Question 111)
Scenario: A teammate is working on a solution for transferring data between a dedicated SQL Pool and a serverless Apache Spark Pool using the Azure Synapse Apache Spark Pool to Synapse SQL connector.

When could SQL Auth be used for this connection?

  • Never, it is not necessary to use SQL Auth when transferring data between a SQL or Spark Pool.
  • None of the listed options.
  • When you need a token-based authentication to a dedicated SQL outside of the Synapse Analytics workspace.
  • Always, anytime you want to transfer data between the SQL and Spark Pool

Question 112)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

All data within an Azure Cosmos DB container is partitioned based on the [?] and applies to both the transactional store and the analytical store. Boundaries for parallelizing workloads are based on this [?].

  • Foreign key
  • Partition key
  • Primary key
  • Index key
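
For illustration, the sketch below uses the azure-cosmos Python SDK to create a container with a partition key; the account URI, key, database, container, and key path are hypothetical, and the analytical_storage_ttl argument is assumed to be available in your SDK version. The same key then governs partitioning in both the transactional and the analytical store.

Python
from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical endpoint and primary key.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists("retail")

# Every item lands in a logical partition based on /customerId; workloads
# (including Synapse Link queries over the analytical store) parallelize
# along those partition boundaries.
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    analytical_storage_ttl=-1,  # enable the analytical store with no TTL, where supported
)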

Question 113)
What does the APPROX_COUNT_DISTINCT Transact-SQL function do?

  • None of the listed options.
  • Calculates the approximate number of distinct records in a non-relational database.
  • Calculates the approximate number of distinct records in a relational database.
  • Approximate count on distinct executions within a specified time period on a specific endpoint.
  • Approximate execution using Hyperlog accuracy.
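
For a side-by-side feel, Spark exposes the same idea as approx_count_distinct. The sketch below uses made-up data; the T-SQL equivalent for a dedicated SQL Pool table is noted in the trailing comment.

Python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000).withColumn("customer_id", F.col("id") % 5000)

# HyperLogLog-based approximation: a small error margin in exchange for far
# less memory than an exact COUNT(DISTINCT ...).
df.select(F.approx_count_distinct("customer_id")).show()

# Roughly the same idea in T-SQL against a relational table:
#   SELECT APPROX_COUNT_DISTINCT(CustomerId) FROM dbo.FactSales;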

Question 114)
Azure Data Factory provides a variety of methods for ingesting data, and also provides a range of methods to perform transformations.

These methods are:

• Mapping Data Flows
• Compute Resources
• SSIS Packages

Mapping Data Flows provides a number of different transformation types that enable you to modify data. They are broken down into the following categories:

• Schema modifier transformations
• Row modifier transformations
• Multiple inputs/outputs transformations

Which of the following are valid transformations available in the Mapping Data Flow?
Select all options that apply

  • Union
  • Aggregate
  • Lookup
  • Filter
  • Create
  • Alter row

Question 115)
Scenario: You have been hired by Contoso to advise on the creation and implementation of a dimension table in Azure Data Warehouse.

Specifications:
• The dimension table will be less than 1 GB.

Required:
• Fastest available query time
• Minimize data movement

As the Azure expert, the IT team looks to you for direction. Which of the following should you advise them to utilize?

  • Round-robin
  • Heap
  • Replicated
  • Hash distributed
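
For reference, a small dimension table can be created as a replicated table so that a full copy is cached on every compute node and joins against it need no data movement. Below is a minimal T-SQL sketch with a hypothetical DimProduct table, held in a Python string that could be run through pyodbc (as in the earlier sketch) or any SQL client.

Python
# Hypothetical dimension table; DISTRIBUTION = REPLICATE keeps a full copy on
# each compute node, which suits small, frequently joined dimension tables.
create_dim_product = """
    CREATE TABLE dbo.DimProduct
    (
        ProductId   INT NOT NULL,
        ProductName NVARCHAR(100)
    )
    WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);
"""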

Question 116)
Which of the following statements describes a wide transformation?

  • A wide transformation can be applied per partition/worker with no need to share or shuffle data to other workers.
  • A wide transformation applies data transformation over several columns.
  • A wide transformation is where each input partition in the source data frame will contribute to sole output partition in the target data.
  • A wide transformation requires sharing data across workers. It does so by shuffling data.

Question 117)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

You can use a service-level SAS to allow access to specific resources in a storage account. For example, you could use this type of SAS… [?] (Select all that apply)

  • to allow an app to retrieve a list of files in a file system.
  • to create file systems.
  • All the listed options.
  • to allow an app to download a file.
  • None of the listed options.
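
To make the idea concrete, the sketch below generates a service-level SAS that only permits downloading (reading) a single blob for one hour; the account, container, blob, and key are hypothetical placeholders, and the azure-storage-blob package is assumed.

Python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Read-only, time-limited access to one specific blob in the account.
sas_token = generate_blob_sas(
    account_name="<storage-account>",
    container_name="raw",
    blob_name="sales/2023/01/sales.parquet",
    account_key="<account-key>",
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# An app holding this URL can download the file but do nothing else.
download_url = (
    "https://<storage-account>.blob.core.windows.net/raw/sales/2023/01/sales.parquet"
    f"?{sas_token}"
)
print(download_url)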

Question 118)
Scenario: You are working in an Azure Databricks workspace. You want to filter based on the end of a column value using the Column Class. Specifically, you are looking at a column named verb and filtering for words that end with “ing”.
Which command filters based on the end of a column value as required?

  • df.filter(col("verb").endswith("ing"))
  • df.filter("verb like '_ing'")
  • df.filter("verb like '%ing'")
  • df.filter().col("verb").like("%ing")
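
A tiny sketch with made-up data, showing the Column-class style of filter from the scenario in context:

Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("running",), ("ran",), ("walking",)], ["verb"])

# Keep only rows whose verb ends with "ing".
df.filter(col("verb").endswith("ing")).show()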

Question 119)
Scenario: While working on a project using Azure Data Factory, you route data rows to different streams based on matching conditions. Which transformation in Mapping Data Flow is used to perform this action?

  • Inspect
  • Select
  • Optimize
  • Lookup
  • Conditional Split

Question 120)
To provide a better authoring experience, Azure Data Factory allows you to configure version control software for easier change tracking and collaboration. Which of the below does Azure Data Factory integrate with? (Select all that apply)

  • Source Safe
  • BitBucket
  • Team Foundation Server
  • Launchpad
  • Git repositories
  • Google Cloud Source Repositories

Question 121)
Within Azure Synapse Link for Azure Cosmos DB, which Column-oriented store is optimized for queries?

  • Query store
  • Transactional store
  • Cosmos DB store
  • Analytical store

Question 122)
In the context of analytics, data streams are event data generated by sensors or other sources that can be analyzed by another technology. Analyzing a data stream is typically done to measure the state change of a component or to capture information on an area of interest.
Which of the following are valid approaches to processing data streams? Select all that apply.

  • All the listed options
  • On-demand
  • Near real time
  • Multiprocessing
  • Live

Question 123)
While Agile, CI/CD, and DevOps are different, they also support one another.

What does DevOps focus on?

  • Culture
  • Practices
  • Strategy
  • Development process

Question 124)
Where is the best place to monitor spark pools?

  • Monitor tab in Azure Synapse Studio within your Azure Synapse Workspace.
  • Azure Monitor from the Azure Portal linked to your Azure Synapse Workspace.
  • Monitor tab in Azure Advisor linked to your Azure Synapse Workspace.
  • Any of the listed options are equally proficient to monitor spark pools.

Question 125)
Scenario: You work in an organization where much of the transformation logic is currently held in existing SSIS packages that were created on SQL Server. Since your boss is not as familiar with Azure as you are, he tells you he has heard that Azure can lift and shift SSIS packages and execute them within Azure Data Factory to leverage existing work. He asks you, “What do we need to set up in order to do this?”

Which of the options below is the correct response?

  • None of the listed options.
  • You must set up an Azure-SSIS integration runtime.
  • You must set up an Azure Stored procedure to execute the lift and shift.
  • Your boss is mistaken. Azure does not have the ability to lift and shift SSIS package to execute them within Azure Data Factory, it must be converted to AZ format and then ingested via Azure Storage.
  • You must set up a Self-hosted solution and then upload the data.

Question 126)
Which type of transactional database system would work best for product data?

  • ELT
  • OLAP
  • OLTP
  • ADPS
  • ETL

Question 127)
What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace?

  • Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline.
  • In the production or staging Azure Databricks workspace, enable Git integration to Azure DevOps, then link to the Azure DevOps source code repo.
  • None of the listed options.
  • Create an Azure Active Directory application, copy the application ID, then use that as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline.

Question 128)
What is an example of a branching activity used in control flows in Azure Data Factory?

  • Lookup- condition
  • If-condition
  • Having-condition
  • Where-condition
  • Until-condition
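
For a rough sense of shape only: an If Condition activity branches a pipeline based on an expression that evaluates to true or false. The fragment below is a heavily abbreviated, hand-written sketch expressed as a Python dict (the activity names and the referenced Lookup activity are hypothetical), not a complete, deployable pipeline definition.

Python
# Abbreviated sketch of an If Condition activity; real Copy/Web activities need
# type-specific properties that are omitted here.
if_condition_activity = {
    "name": "IfNewRowsExist",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@greater(activity('LookupRowCount').output.firstRow.cnt, 0)",
            "type": "Expression",
        },
        "ifTrueActivities": [{"name": "CopyNewRows", "type": "Copy"}],
        "ifFalseActivities": [{"name": "NotifyNothingToDo", "type": "WebActivity"}],
    },
}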

Question 129)
Activities within Azure Data Factory define the actions that will be performed on the data. There are three categories including:

• Data movement activities
• Data transformation activities
• Control activities

A Control Activity in Data Factory is defined in JSON format as follows:
JSON
{
    "name": "Control Activity Name",
    "description": "description",
    "type": "<type of activity>",
    "typeProperties":
    {
    },
    "dependsOn":
    {
    }
}

Which of the JSON properties are required?
Select all options that apply

  • dependsOn
  • name
  • type
  • description
  • typeProperties

Question 130)
A Spark pool in Azure Synapse Analytics is one of Microsoft’s implementations of Apache Spark.

Which of the following statements is true about Spark pools in Azure Synapse Analytics?

Select all options that apply

  • The SparkContext connects to the Sparkle pool in Synapse Analytics. It is responsible for converting an application to an Excel file.
  • Once connected, Sparkle gets the executors on nodes in the pool. Those processes run computations and store data on your local machine.
  • The SparkContext can connect to the cluster manager, which allocates resources across applications. The cluster manager is Adobe Hadoop WOOL.
  • Spark applications act as independent sets of processes on a pool. It is coordinated by the SparkContext object in a main (driver) program.

Question 131)
Identify the missing word(s) in the following scenario within the context of Microsoft Azure.

A window function enables you to perform a calculation on a set of rows that is defined within a window. The calculation is typically an aggregate function. However, instead of applying the aggregate function to all the rows in a table, it is applied to each set of rows defined by the window, so every window of rows receives its own aggregate result.
One of the key components of window functions is the [?] clause. This clause determines the partitioning and ordering of a rowset before the associated window function is applied. That is, the [?] clause defines a window or user-specified set of rows within a query result set.

  • HAVING
  • WHERE
  • OVER
  • UNDER
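
To see the clause in action, here is a small Spark SQL sketch with made-up data, where the aggregate runs over the window defined by OVER rather than collapsing the whole table to a single row:

Python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame(
    [("East", 10), ("East", 30), ("West", 20)],
    ["region", "sales"],
).createOrReplaceTempView("sales")

# The OVER clause defines the partitioning and ordering of the rowset the
# SUM is applied to, giving a running total per region.
spark.sql("""
    SELECT region,
           sales,
           SUM(sales) OVER (PARTITION BY region ORDER BY sales) AS running_total
    FROM sales
""").show()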

Question 132)
Scenario: You are working in an Azure Databricks workspace, and you want to filter by a productType column where the value is equal to book.

Which one of the following commands meets the requirement by specifying a column value in a DataFrame’s filter?

  • df.filter("productType = 'book'")
  • df.filter("productType == 'book'")
  • df.filter(col("productType") == "book")
  • df.col("productType").filter("book")
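
A minimal sketch with made-up data, showing the Column-object comparison alongside the SQL-expression string that filter() also accepts:

Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("book", 12.0), ("video", 25.0), ("book", 7.5)],
    ["productType", "price"],
)

# Column-object comparison...
df.filter(col("productType") == "book").show()

# ...and the equivalent SQL-expression form.
df.filter("productType = 'book'").show()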

Question 133)
In Data Factory, an Activity defines the action to be performed. A Linked Service defines a target data store or a compute service. An Integration Runtime (IR) provides the bridge between the Activity and Linked Services.

To make use of the Azure-SSIS Integration Runtime, it is assumed that there is an SSIS Catalog (SSISDB) deployed on a SQL Server instance. With that prerequisite met, the Azure-SSIS Integration Runtime can lift and shift existing SSIS workloads.
During the provisioning of the Azure-SSIS Integration Runtime, which of the following options must be specified?
Select all options that apply

  • Existing instance of Azure SQL Database to host the SSIS Catalog
  • Database (SSISDB) along with the service tier for the database
  • Maximum parallel executions per node
  • Private Link parameters
  • Node size
  • All the listed options

Question 134)
Scenario: Determine the type of Azure service required to fit the following specifications and requirements:

Data classification: Structured
Operations: Read-only, complex analytical queries across multiple databases
Latency and throughput: Some latency in the results is expected based on the complex nature of the queries.
Transactional support: Not required

  • Azure Blob Storage
  • Azure Route Table
  • Azure Cosmos DB
  • Azure SQL Database
  • Azure Queue Storage

Question 135)
Identify the missing word(s) in the following statement within the context of Microsoft Azure.

From a high level, the Azure Databricks service launches and manages Apache Spark clusters within your Azure subscription. Apache Spark clusters are groups of computers that are treated as a single computer and handle the execution of commands issued from notebooks.

Internally, Azure Kubernetes Service (AKS) is used to … [?]

  • auto-scale as needed based on your usage and the setting used when configuring the cluster.
  • specify the types and sizes of the virtual machines.
  • pull data from a specified data source.
  • run the Azure Databricks control-plane and data-planes via containers running on the latest generation of Azure hardware.
  • provide the fastest virtualized network infrastructure in the cloud.

Question 136)
Identify the missing word(s) in the following scenario within the context of Microsoft Azure:

When working with large data sets, it can take a long time to run the sort of queries that clients need. These queries can’t be performed in real time, and often require algorithms such as MapReduce that operate in parallel across the entire data set. The results are then stored separately from the raw data and used for querying.

One drawback to this approach is that it introduces latency. If processing takes a few hours, a query may return results that are several hours old. Ideally, you would like to get some results in real time (perhaps with some loss of accuracy) and combine these results with the results from the batch analytics.

The [?] is a big data processing architecture that addresses this problem by combining both batch and real-time processing methods. It features an append-only, immutable data source that serves as a system of record. Timestamped events are appended to existing events (nothing is overwritten). Data is implicitly ordered by time of arrival.

  • Anaconda architecture
  • No-SQL architecture
  • Lambda architecture
  • Serverless architecture

Question 137)
What is a lambda architecture and what does it try to solve?

  • An architecture that defines a data processing pipeline whereby microservices act as compute resources for efficient large-scale data processing.
  • An architecture that splits incoming data into two paths – a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.
  • None of the listed options.
  • An architecture that employs the latest Scala runtimes in one or more Databricks clusters to provide the most efficient data processing platform available today.

Question 138)
Init scripts provide a way to configure a cluster’s nodes. It is recommended to favor Cluster Scoped init scripts over Global and Named scripts.

“By placing the init script in /Databricks/init folder, you force the script’s execution every time any cluster is created or restarted by users of the workspace.”

Which of the following is best described by the above statement?

  • Interactive
  • Global
  • Cluster Named
  • Cluster Scoped
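
As a sketch of the recommended cluster-scoped approach (the DBFS path and package name are hypothetical, and dbutils is only available inside a Databricks notebook): the script is uploaded once and then attached to one specific cluster under its Init Scripts settings, rather than being dropped into the global folder where every cluster in the workspace would run it.

Python
# Shell script that each node of one specific cluster should run at start-up.
init_script = """#!/bin/bash
/databricks/python/bin/pip install great-expectations
"""

# Store it in DBFS; the final True overwrites any existing file at that path.
dbutils.fs.put("dbfs:/databricks/scripts/install-libs.sh", init_script, True)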

Question 139)
Where is the best place to monitor spark pools?

  • Monitor tab in Azure Synapse Studio within your Azure Synapse Workspace.
  • Monitor tab in Azure Advisor linked to your Azure Synapse Workspace.
  • Azure Monitor from the Azure Portal linked to your Azure Synapse Workspace.
  • Any of the listed options are equally proficient to monitor spark pools.