Coursera Answers

Data Engineering with MS Azure Synapse Apache Spark Pools Coursera Quiz Answers

In this article, I am going to share the Data Engineering with MS Azure Synapse Apache Spark Pools Coursera quiz answers with you.

Enrol Link: Data Engineering with MS Azure Synapse Apache Spark Pools

WEEK 1 QUIZ ANSWERS

Knowledge check

Question 1)
Which three of the following are features of the Apache Spark application?

  • Distributed execution engine
  • Disk-based processing
  • Parallel Processing Framework
  • In-memory processing

Question 2)
Which one of the following objects is responsible for allocating resources across applications in an Apache Spark pool?

  • Executors
  • SparkContext
  • Nodes
  • Cluster Manager

Question 3)
Apache Spark pools in Azure Synapse Analytics are compatible with which of the following types of storage?

  • Azure Storage
  • Azure Data Lake Generation 1 Storage
  • SQL Storage
  • Azure Data Lake Generation 2 Storage

Question 4)
You need to manage an end-to-end big data project using one single platform. Which of the following data services is best suited to this task?

  • Apache Spark
  • Azure HDInsight
  • Apache Spark for Azure Synapse
  • Azure Databricks

Question 5)
Which of the following is an element of an Apache Spark Pool in Azure Synapse Analytics?

  • Azure HDInsight
  • Spark Instance
  • Apache Spark Console

Question 6)
You are tasked with creating an Apache Spark pool. Which three of the following parameters do you need to specify in the Create Apache Spark pool screen in the Azure portal?

  • Number of Nodes
  • Resource Group
  • Apache Spark pool name
  • Node size

 

Knowledge check

Question 1)
You need to ingest data through Apache Spark notebooks. Which two of the following features can you use to carry out this task?

  • Azure Cosmos DB
  • Azure SQL
  • Primary Storage
  • Linked Service

Question 2)
What is the default language of a new cell in Azure Synapse Studio?

  • PySpark
  • SQL
  • Scala
  • .NET for Spark

Question 3)
Which of the following languages are supported in the notebook environment within Azure Synapse Analytics Spark pools?

  • Spark SQL
  • Spark (Scala)
  • .NET Spark (C#)
  • JSON
  • PySpark (Python)
  • YAML

Question 4)
Your Azure Synapse Studio notebook needs to be able to reference data or variables directly using different languages. Which of the following actions do you need to perform to enable this?

  • Use a magic command for that language.
  • You don't need to do anything as you can reference data or variables directly using different languages in an Azure Synapse Studio notebook.
  • Create a temporary table so that it can be referenced across different languages.
  • Create a new Notebook.

Question 5)
Azure Synapse Studio notebooks are based on which one of the following?

  • Apache Spark pool
  • Dedicated SQL pool
  • Apache Spark

Question 6)
Which one of the following actions should you take to save all notebooks in Azure Synapse Studio?

  • Select the Publish button on the notebook command bar.
  • Select the Publish all button on the workspace command bar.
  • Press CTRL + S.

 

Knowledge check

Question 1)
Which three of the following are features of DataFrames?

  • DataFrames are a collection of data organized into named Rows.
  • DataFrames optimize execution plans on queries that will access the data held in the DataFrame.
  • DataFrames are a collection of data organized into named columns.
  • DataFrames enable Apache Spark to understand the schema of the data.

Question 2)
You input the following Python code snippet into your workspace:

new_rows = [("CA", 22, 45000), ("WA", 35, 65000), ("WA", 50, 85000)]
demo_df = spark.createDataFrame(new_rows, ["state", "age", "salary"])
demo_df.show()

The variable named demo_df above is used to do which of the following?

  • It uses the spark.createDataFrame method and creates a variable named new_rows which creates the data in the code segment to store in the DataFrame.
  • It uses the spark.createDataFrame method to create a variable named new_rows which will store the values ‘state’, ‘age’, and ‘salary’.
  • It uses the spark.createDataFrame method referencing the new_rows variable in the first parameter. The second parameter defines the column heading names for the DataFrame as state, age, and salary.

Question 3)
You input the following Python snippet into your code:

from azureml.opendatasets import NycTlcYellow
data = NycTlcYellow()
data_df = data.to_spark_dataframe()
display(data_df.limit(10))

What is the purpose of the display(data_df.limit(10)) method?

  • Return batches of 10 rows of data from the data_df variable until all records are returned.
  • Limit the DataFrame to retrieving 10 rows of data from the NycTlcYellow data source.
  • Return 10 rows of data from the data_df variable.
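
As a rough, Spark-free illustration of what limit(10) does in the snippet above, the following plain-Python sketch truncates a row source once instead of paging through it in batches. The limit helper and the sample rows are hypothetical stand-ins invented here; they are not part of PySpark or azureml.opendatasets, and a real DataFrame does not guarantee which rows are returned unless the data is ordered.

```python
from itertools import islice
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int]


def limit(rows: Iterable[Row], n: int) -> List[Row]:
    """Return at most n rows; the rest of the source is never consumed."""
    return list(islice(rows, n))


# Hypothetical row source with 25 records, invented for illustration.
all_rows = (("WA", age, 1000 * age) for age in range(20, 45))

# One truncated result, analogous to data_df.limit(10).
preview = limit(all_rows, 10)
print(len(preview))
```

The remaining 15 rows stay in the source untouched, which is the difference between a limit and a batched read.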

Question 4)
Select the correct series of steps to flatten nested structures and explode arrays with Apache Spark:

  • Define a function
    Flatten nested schema
    Explode Arrays
    Flatten child nested schema
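
The four steps listed above can be sketched in plain Python on a nested dictionary. This is only an analogy for the idea, not Spark code: the flatten and explode helpers, the underscore naming scheme, and the sample order record are all assumptions made for illustration.

```python
from typing import Any, Dict, List


# Step 1: define a function that flattens nested structures.
def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into a single level with underscore-joined names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value  # lists are kept as-is until they are exploded
    return flat


def explode(record: Dict[str, Any], column: str) -> List[Dict[str, Any]]:
    """Emit one output row per element of the list stored in `column`."""
    return [{**record, column: item} for item in record[column]]


# Hypothetical nested record, invented for illustration.
order = {
    "id": 1,
    "customer": {"name": "Ada", "address": {"city": "Seattle"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

step2 = flatten(order)                    # Step 2: flatten nested schema
step3 = explode(step2, "items")           # Step 3: explode arrays
rows = [flatten(row) for row in step3]    # Step 4: flatten child nested schema

for row in rows:
    print(row)
```

The second flatten pass is needed because each exploded array element is itself a nested structure that only becomes visible after the explode step.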

Question 5)
Which of these actions should you perform to flatten a nested schema?

  • Create a Parquet file.
  • Load a CSV file.
  • Explode Arrays.

Question 6)
Which two of the following actions do DataFrames perform?

  • Extract large volumes of data from a wide variety of data sources.
  • Process data only in streaming data architecture.
  • Process data in either batch or streaming data architecture.
  • Extract large volumes of data from an SQL Database only.
  • Process data only in batch data architecture.

 

Visit this link: Data Engineering with MS Azure Synapse Apache Spark Pools Week 1 | Test prep Quiz Answers

 


 

WEEK 2 QUIZ ANSWERS

Knowledge check

Question 1)
The interoperability between Apache Spark and SQL helps you to directly explore and analyse which three of the following types of files?

  • JSON
  • TSV
  • YAML
  • Parquet
  • CSV

Question 2)
Which of the following features in SQL pools is used to efficiently transfer data between the Spark cluster and the Synapse SQL instance?

  • Azure Data Lake Storage Generation 2 and XML
  • Azure Data Lake Storage Generation 2 and PolyBase
  • Azure Data Lake Storage Generation 2 and JSON

Question 3)
SQL and Apache Spark share the same underlying metadata store.

  • True
  • False

Question 4)
Which of the following features is used to load the data into a table created by the Write API in the dedicated SQL pool?

  • JSON
  • Parquet
  • PolyBase
  • ORC

Question 5)
You have a requirement to transfer data to a dedicated SQL pool that is outside of the Azure Synapse Analytics workspace. Which form of Authentication can be used to complete this task?

  • Azure AD and SQL Authentication
  • None of the above
  • Azure AD only
  • SQL Authentication Only

Question 6)
The Azure Synapse Apache Spark to Synapse SQL connector supports which one of the following languages?

  • Python
  • Scala
  • .NET
  • SQL

 

Knowledge check

Question 1)
The Apache Spark history server can be accessed directly from which of the following Synapse Studio tabs?

  • Develop
  • Monitor
  • Data
  • Manage

Question 2)
Which of the following features can benefit from query optimization through Catalyst when optimizing Apache Spark jobs in Azure Synapse Analytics?

  • Notebooks
  • DataFrames
  • Resilient Distributed Datasets (RDDs)

Question 3)
You need to specify the minimum number of nodes for an Apache Spark pool with autoscaling. What is the minimum number of nodes allowed?

  • 4
  • 1
  • 2
  • 3

Question 4)
You scale down Apache Spark pools in Azure Synapse Analytics. What happens to existing nodes?

  • Nodes to be scaled down will be shut down immediately regardless of current state.
  • Nodes to be scaled down will be put in a decommissioned state.
  • Jobs that are still running will continue to run until completion.
  • Pending jobs will be lost.
  • Pending jobs will be in a waiting state and scheduled for execution on fewer nodes.

Question 5)
Which of the following tasks can you perform to optimize an Apache Spark job?

  • Use bucketing.
  • Remove all nodes.
  • Remove the Apache Spark Pool.

Question 6)
You encounter a slow performing shuffle job. Which of the following is a possible cause?

  • Bucketing
  • Data skew
  • Enablement of autoscaling
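
To make the data skew option more concrete, here is a small plain-Python sketch (not Spark code) of how one dominant key can overload a single shuffle partition, and how salting the key spreads the load. The partition_of and partition_sizes helpers, the sample keys, the partition count of 4, and the salt count of 8 are all assumptions chosen for illustration; CRC32 simply stands in for whatever hash partitioner the engine uses.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys: Iterable[str], num_partitions: int) -> List[int]:
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical skewed data: one state dominates the key column.
keys = ["WA"] * 9_000 + ["CA"] * 500 + ["OR"] * 500

skewed = partition_sizes(keys, num_partitions=4)

# Salting: append a rotating suffix so the hot key maps to several partitions.
salted_keys = [f"{k}#{i % 8}" for i, k in enumerate(keys)]
salted = partition_sizes(salted_keys, num_partitions=4)

print("largest partition without salt:", max(skewed))
print("largest partition with salt:   ", max(salted))
```

Without the salt, every "WA" record hashes to the same partition, so one task does most of the work while the others sit idle. In a real join, the other side of the join would also need its keys expanded to match the salted values, which this sketch leaves out.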

 

Visit this link: Data Engineering with MS Azure Synapse Apache Spark Pools Week 2 | Test prep Quiz Answers

 


 

WEEK 3 QUIZ ANSWERS

Visit this link: Data Engineering with MS Azure Synapse Apache Spark Pools Week 3 | Course Practice Exam Answers