
Data Engineering with MS Azure Synapse Apache Spark Pools Week 3 | Course Practice Exam Answers

In this article, I am going to share the Coursera course Data Engineering with MS Azure Synapse Apache Spark Pools Week 3 | Course Practice Exam Answers with you.

Enrol Link: Data Engineering with MS Azure Synapse Apache Spark Pools

Course Practice Exam Answers

Question 1)
Apache Spark pools in Azure Synapse Analytics benefit from which four of the following features?

  • Autoscale
  • Support for third party IDEs
  • REST APIs
  • Pre-Loaded Anaconda libraries
  • Real-time co-authoring

Question 2)
What needs to be created first when building an Apache Spark pool in Azure Synapse Analytics?

  • Workspace
  • Notebook
  • SQL Database

Question 3)
When creating an Apache Spark pool in Azure Synapse Analytics, the Spark Pool name must be unique within which of the following?

  • The Resource Group
  • The Workspace
  • Azure
  • The Subscription

Question 4)
Which of the following solutions can you utilize to create an embedded Apache Spark capability that can reside on the same platform as data warehouses and data integration capabilities, as well as integrate with other Azure services?

  • Azure HDInsight
  • Apache Spark for Azure Synapse
  • Apache Spark
  • Azure Databricks

Question 5)
Which two of the following features can you use to ingest data through Spark notebooks?

  • Primary Storage
  • Linked Service
  • Azure SQL
  • Azure Cosmos DB

Question 6)
You deploy the magic command %%spark. What type of query will this command execute against Spark context?

  • SparkSQL
  • Scala
  • Python
  • .NET for C#
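
In a Synapse Studio notebook, the first line of a cell can be a magic command that overrides the notebook's primary language for that cell only. As a rough sketch (the cell body below is illustrative, not from the course):

# Common Synapse notebook cell magics and the language each one runs:
#   %%spark    -> Scala
#   %%pyspark  -> Python
#   %%sql      -> SparkSQL
#   %%csharp   -> .NET for Spark (C#)
# A %%pyspark cell body might look like this:
df = spark.range(5)   # 'spark' is the session the pool creates for the notebook
df.show()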

Question 7)
Which two of the following languages does the IntelliSense support for Syntax Code Completion?

  • PySpark (Python)
  • .NET for Spark (C#)
  • SparkSQL
  • Spark (Scala)

Question 8)
DataFrames are used to perform which two of the following actions?

  • Extract large volumes of data from a wide variety of data sources.
  • Process data only in streaming data architecture.
  • Extract large volumes of data from an SQL Database only.
  • Process data only in batch data architecture.
  • Process data in either batch or streaming data architecture.
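
The same DataFrame API covers both cases. A minimal sketch, assuming placeholder storage paths (file-based streaming sources need an explicit schema):

# Batch: read a static Parquet dataset into a DataFrame
batch_df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/trips/")

# Streaming: the same DataFrame abstraction over files that keep arriving
stream_df = (spark.readStream
    .schema(batch_df.schema)
    .parquet("abfss://container@account.dfs.core.windows.net/incoming/"))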

Question 9)
You input the following Python snippet into your code:

new_rows = [('CA', 22, 45000), ('WA', 35, 65000), ('WA', 50, 85000)]
demo_df = spark.createDataFrame(new_rows, ['state', 'age', 'salary'])
demo_df.show()

The variable named demo_df performs which of the following actions?

  • It uses the spark.createDataFrame method and creates a variable named new_rows which creates the data in the code segment to store in the DataFrame.
  • It uses the spark.createDataFrame method referencing the new_rows variable in the first parameter. The second parameter defines the column heading names for the DataFrame as state, age, and salary.
  • It uses the spark.createDataFrame method to create a variable named new_rows which will store the values state, age, and salary.
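
To confirm that the second parameter supplies the column names, you could follow the snippet with a quick schema check (a small sketch, assuming the snippet above has run):

demo_df.printSchema()                      # state: string, age: long, salary: long
demo_df.select("state", "salary").show()   # columns are addressable by the names passed in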

Question 10)
You need to load data into an Apache Spark DataFrame from several different file types. Which three of the following storage services can you use to complete this action?

  • Serverless SQL Pool
  • Dedicated SQL pool
  • Azure Data Lake Store Generation 2
  • Primary Storage Account
  • Azure Storage Account
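
A minimal sketch of reading different file types from an Azure Data Lake Storage Gen2 / Azure Storage account into DataFrames (the account, container, and file names are placeholders):

base = "abfss://container@storageaccount.dfs.core.windows.net/raw"

csv_df     = spark.read.option("header", "true").csv(base + "/sales.csv")
json_df    = spark.read.json(base + "/events.json")
parquet_df = spark.read.parquet(base + "/trips.parquet")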

Question 11)
The Azure Synapse Apache Spark pool to Synapse SQL connector uses which of the following in SQL pools to efficiently transfer data between the Spark cluster and the Synapse SQL instance?

  • Azure Data Lake Storage Generation 2 and JSON.
  • Azure Data Lake Storage Generation 2 and PolyBase.
  • Azure Data Lake Storage Generation 2 and XML.
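
As a rough sketch of the connector from a notebook: the synapsesql read/write calls and the three-part table name below follow the Azure Synapse dedicated SQL pool connector, but treat the exact call shape as an assumption rather than a definitive reference. Behind the scenes the connector stages the data in ADLS Gen2 and moves it with PolyBase.

# Read a dedicated SQL pool table into a Spark DataFrame (table name is a placeholder)
sql_df = spark.read.synapsesql("SQLPool1.dbo.FactSales")

# Write the DataFrame back to a table in the dedicated SQL pool
sql_df.write.synapsesql("SQLPool1.dbo.FactSales_Copy")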

Question 12)
You need to use the Azure Synapse Studio notebook experience to develop and execute transformation pipelines. Which of the following languages can you use to execute this task?

  • Scala
  • JSON
  • SparkSQL
  • Python

Question 13)
Which of the following is used to load data into a table created within a dedicated SQL pool using Write API?

  • JSON
  • ORC
  • PolyBase
  • Parquet

Question 14)
What is the minimum number of nodes allowed when creating an Apache Spark pool with Autoscaling?

  • 3
  • 2
  • 4
  • 1

Question 15)
How can you optimize your Apache Spark job?

  • Remove the Apache Spark Pool.
  • Remove all nodes.
  • Use bucketing.
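
Relating to the bucketing answer: pre-hashing the data into a fixed number of buckets on a join or filter key lets later queries avoid a full shuffle. A minimal sketch with the standard Spark writer API (the DataFrame, table, and column names are made up):

(trips_df.write
    .bucketBy(50, "passenger_count")   # hash rows into 50 buckets on the key
    .sortBy("passenger_count")
    .mode("overwrite")
    .saveAsTable("trips_bucketed"))    # bucketing requires saveAsTable, not plain file output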


Question 16)
Spark pools in Azure Synapse Analytics are compatible with which two of the following storage types?

  • Azure Data Lake Generation 1 Storage
  • Azure Data Lake Generation 2 Storage
  • Azure Storage
  • SQL Storage

Question 17)
You need to flatten nested structures and explode arrays with Apache Spark. What series of steps should you perform to complete these tasks?

  • Define a function
    Flatten nested schema
    Explode Arrays
    Flatten child nested schema
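
A minimal sketch of those steps with PySpark built-ins (the input DataFrame and field names are placeholders): dot notation flattens struct fields into top-level columns, and explode turns each array element into its own row.

from pyspark.sql.functions import col, explode

# Flatten the nested schema: promote child struct fields to top-level columns
flat_df = nested_df.select(
    col("customer.id").alias("customer_id"),
    col("customer.name").alias("customer_name"),
    col("orders"))                                      # 'orders' is an array of structs

# Explode the array: one output row per array element
exploded_df = flat_df.select(
    "customer_id", "customer_name",
    explode(col("orders")).alias("order"))

# Flatten the child nested schema exposed by the explode
final_df = exploded_df.select(
    "customer_id", "customer_name",
    col("order.order_id"), col("order.amount"))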

Question 18)
Which of the following can benefit from query optimization through Catalyst?

  • Resilient Distributed DataSets (RDDs)
  • DataFrames
  • Notebooks

Question 19)
Can you identify three features of Apache Spark?

  • Parallel Processing Framework
  • Distributed execution engine
  • Disk-based processing
  • In-memory processing

Question 20)
What is the default language of a new cell in Azure Synapse Studio?

  • Scala
  • PySpark
  • SQL
  • .NET for Spark

Question 21)
What are DataFrames?

  • DataFrames are a collection of data organized into named columns.
  • DataFrames optimize execution plans on queries that will access the data held in the DataFrame.
  • DataFrames enable Apache Spark to understand the schema of the data.
  • DataFrames are a collection of data organized into named Rows.

Question 22)
You can use the Azure Synapse Apache Spark to Synapse SQL connector to transfer data between which of the following?

  • Serverless Apache Spark pools and Dedicated SQL pools in Azure Synapse.
  • Dedicated Apache Spark pools and Serverless SQL pools in Azure Synapse.
  • Serverless Apache Spark pools and Serverless SQL pools in Azure Synapse.

Question 23)
Which of the following role memberships are required to successfully authenticate between two systems in Azure Synapse Analytics?

  • The account used needs to be a member of the Storage Blob Data Contributor role on the default storage account.
  • The account used needs to be a member of the Storage Blob Data Contributor role in the database or SQL pool to or from which you want to transfer data.
  • The account used needs to be a member of the db_exporter role in the database or SQL pool to or from which you want to transfer data.

Question 24)
You have a requirement to transfer data to a dedicated SQL pool that is outside of the workspace of Synapse Analytics. Which form of Authentication can you use to complete this task?

  • None of the above.
  • SQL Authentication Only.
  • Azure AD and SQL Authentication.
  • Azure AD only.

Question 25)
What three actions occur within existing nodes in Azure Synapse Analytics when you scale down Apache Spark pools?

  • Pending jobs will be lost.
  • Nodes to be scaled down will be shut down immediately regardless of current state.
  • Jobs that are still running will continue to run and finish.
  • Nodes to be scaled down will be put in a decommissioned state.
  • Pending jobs will be in a waiting state and scheduled for execution on fewer nodes.

Question 7)
Which of the following actions do you need to perform in order to directly reference data or variables in Azure Synapse Studio notebook using different languages?

  • Use a magic command for that language.
  • Create a new Notebook.
  • Do Nothing. You can reference data or variables directly using different languages in an Azure Synapse Studio notebook.
  • Create a temporary table so that it can be referenced across different languages.
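
A minimal sketch of the temporary-table approach (the view and column names are made up): register the DataFrame as a temp view in one language, and any other language's cell can then reference it by name.

# PySpark cell: expose the DataFrame to other languages via a temporary view
demo_df.createOrReplaceTempView("demo_view")

# A later cell could switch languages with a magic command and reuse it, e.g.:
#   %%sql
#   SELECT state, AVG(salary) AS avg_salary FROM demo_view GROUP BY state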

Question 9)
You enter the following Python snippet into your code:

from azureml.opendatasets import NycTlcYellow
data = NycTlcYellow()
data_df = data.to_spark_dataframe()
display(data_df.limit(10))

What is the purpose of the display(data_df.limit(10)) method?

  • Return batches of 10 rows of data from the data_df variable until all records are returned.
  • Limit the DataFrame to only retrieve 10 rows of data from the NycTlcYellow data source.
  • Return 10 rows of data from the data_df variable.