Microsoft Azure Databricks for Data Engineering Coursera Quiz Answers
In this article, I am going to share the Microsoft Azure Databricks for Data Engineering All Weeks Quiz Answers with you.
Enrol Link: Microsoft Azure Databricks for Data Engineering
WEEK 1 QUIZ ANSWERS
Knowledge check
Question 1)
Apache Spark is a unified processing engine that can analyze big data with which of the following features? Select all that apply.
- Support for multiple Drivers running in parallel on a cluster
- Graph Processing
- SQL
- Real-time stream analysis
- Machine Learning
Question 2)
Which of the following Databricks features are not part of open-source Spark? Select all that apply.
- Databricks Workspace
- Databricks Runtime
- MLFlow
- Databricks Workflows
Question 3)
Apache Spark notebooks allow which of the following? Select all that apply.
- Rendering of formatted text
- Display graphical visualizations
- Create new Workspace
- Execution of code
Question 4)
In Azure Databricks, when creating a new Notebook, which of the following are the default languages available to select from? Select all that apply.
- R
- Java
- Python
- Scala
- SQL
Question 5)
If your notebook is attached to a cluster, you can carry out which of the following from within the notebook? Select all that apply.
- Restart the cluster
- Delete the cluster
- Detach your notebook from the cluster
- Attach to another cluster
Visit this link: Microsoft Azure Databricks for Data Engineering Week 1 | Test prep Quiz Answers
WEEK 2 QUIZ ANSWERS
Knowledge check
Question 1)
How do you list files in DBFS within a notebook?
- %fs ls /my-file-path
- ls /my-file-path
- %fs dir /my-file-path
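For reference, here is a minimal Python sketch of the same listing, assuming a Databricks notebook where dbutils and display are available; the path is the hypothetical one from the options above.

display(dbutils.fs.ls("/my-file-path"))   # Python equivalent of the %fs ls magic command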
Question 2)
How do you infer the data types and column names when you read a JSON file?
- spark.read.inferSchema("true").json(jsonFile)
- spark.read.option("inferData", "true").json(jsonFile)
- spark.read.option("inferSchema", "true").json(jsonFile)
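As a quick illustration of the option syntax above, here is a hedged sketch; the JSON path is hypothetical and a ready-made spark session (as in a Databricks notebook) is assumed.

jsonFile = "dbfs:/mnt/training/sample.json"   # hypothetical path
jsonDF = (spark.read
    .option("inferSchema", "true")            # the option syntax asked about above
    .json(jsonFile))
jsonDF.printSchema()                          # shows the inferred column names and data types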
Question 3)
Which of the following SparkSession functions returns a DataFrameReader?
- readStream(..)
- read(..)
- createDataFrame(..)
- emptyDataFrame(..)
Question 4)
When using a notebook and a Spark session, we can read a CSV file. Which of the following can be used to view the first couple of thousand characters of a file?
- %fs ls /mnt/training/wikipedia/pageviews/
- %fs dir /mnt/training/wikipedia/pageviews/
- %fs head /mnt/training/wikipedia/pageviews/pageviews_by_second.tsv
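A Python alternative to the %fs head magic command, for reference; the byte count is an assumption and the path is the one from the option above.

print(dbutils.fs.head("/mnt/training/wikipedia/pageviews/pageviews_by_second.tsv", 2000))   # roughly the first couple of thousand characters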
Visit this link: Microsoft Azure Databricks for Data Engineering Week 2 | Test prep Quiz Answers
WEEK 3 QUIZ ANSWERS
Knowledge check
Question 1)
Which of the following SparkSession functions returns a DataFrameReader?
- createDataFrame(..)
- emptyDataFrame(..)
- .read(..)
- .readStream(..)
Question 2)
When using a notebook and a Spark session, we can read a CSV file.
Which of the following can be used to view the first couple of thousand characters of a file?
- %fs ls /mnt/training/wikipedia/pageviews/
- %fs head /mnt/training/wikipedia/pageviews/pageviews_by_second.tsv
- %fs dir /mnt/training/wikipedia/pageviews/
Question 3)
Which DataFrame method do you use to create a temporary view?
- createTempView()
- createTempViewDF()
- createOrReplaceTempView()
Question 4)
How do you define a DataFrame object?
- Use the createDataFrame() function
- Use the DF.create() syntax
- Introduce a variable name and equate it to something like myDataFrameDF =
Question 5)
How do you cache data into the memory of the local executor for instant access?
- .cache()
- .save().inMemory()
- .inMemory().save()
Question 6)
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?
- IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
- IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
- IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")
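Pulling the last few questions together, here is a minimal sketch that reads the Parquet file, caches it, and registers a temporary view; it assumes the file from the options above exists in DBFS.

ipGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")
ipGeocodeDF.cache()                                 # keeps the data in executor memory after the first action
ipGeocodeDF.createOrReplaceTempView("ip_geocode")   # exposes the DataFrame as a temporary view for SQL
spark.sql("SELECT COUNT(*) FROM ip_geocode").show()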
Knowledge check
Question 1)
Among the most powerful components of Spark is Spark SQL. At its core lies the Catalyst optimizer. When you execute code, Spark SQL uses Catalyst’s general tree transformation framework in four phases. In which order are these phases carried out?
- 1: Analyzing a logical plan to resolve references
  2: Logical plan optimization
  3: Physical planning
  4: Code generation to compile parts of the query to Java bytecode
Question 2)
Which of the following statements describes a wide transformation?
- A wide transformation requires sharing data across workers. It does so by shuffling data.
- A wide transformation can be applied per partition/worker with no need to share or shuffle data to other workers
- A wide transformation applies data transformation over a large number of columns
Question 3)
Which of the following statements describes a narrow transformation?
- Can be applied per partition/worker with no need to share or shuffle data to other workers
- Requires sharing data across workers and by shuffling data.
- Applies data transformation over a large number of columns
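To make the narrow/wide distinction concrete, here is a small hypothetical sketch, assuming a spark session as provided in a Databricks notebook.

from pyspark.sql import functions as F

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)

narrowDF = df.filter(F.col("id") > 100)   # narrow: each partition is processed independently, no shuffle
wideDF = df.groupBy("bucket").count()     # wide: rows with the same bucket must be shuffled to the same worker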
Question 4)
Which feature of Spark determines how your code is executed?
- Java Garbage Collection
- Tungsten Record Format
- Catalyst Optimizer
Question 5)
Which Spark optimization feature is used in shuffling operations during wide transformations?
- Lazy Execution
- Catalyst Optimizer
- Tungsten Record Format
Question 6)
If you create a DataFrame that reads data from Azure Blob Storage, and then you create another DataFrame by filtering the initial DataFrame, what feature of Spark causes these transformations to be analyzed?
- Java Garbage Collection
- Tungsten Record Format
- Lazy Execution
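A short, hedged sketch of lazy execution; the path and the country column are hypothetical placeholders.

rawDF = spark.read.parquet("dbfs:/mnt/training/some-data.parquet")   # only builds a plan; the full data is not processed yet
filteredDF = rawDF.filter("country = 'US'")                          # still lazy: just extends the logical plan
filteredDF.count()                                                   # the action triggers analysis, optimization, and execution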
Visit this link: Microsoft Azure Databricks for Data Engineering Week 3 | Test prep Quiz Answers
WEEK 4 QUIZ ANSWERS
Knowledge check
Question 1)
Which of the following formats are supported when importing files into an Azure Databricks notebook? Select all that apply.
- .ORC
- .html
- .scala
- .Zip
- .dbc
- .Yaml
Question 2)
Examine the following code. From the options below, select the correct syntax to complete line 3, which will return an instance of a DataFrame in a Spark notebook in Azure Databricks.
1: pagecountsEnAllDF = (spark
2: .read
3: ____________________________ # Returns an instance of DataFrame
4: .cache()
5: )
6: print(pagecountsEnAllDF)
- .cache(parquetFile)
- .DataFrame(parquetFile)
- .parquet(parquetFile)
- .read(parquetFile)
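For reference, the completed chain might look like the following sketch; parquetFile is a hypothetical path.

parquetFile = "dbfs:/mnt/training/sample_parquet/"   # hypothetical location

pagecountsEnAllDF = (spark
    .read
    .parquet(parquetFile)   # returns an instance of DataFrame
    .cache()                # marks the DataFrame for caching on the first action
)
print(pagecountsEnAllDF)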
Question 3)
Examine the following piece of code taken from a notebook in Azure Databricks.
Complete line 4 so that 15 rows of data will be displayed and the columns will not be truncated.
1: sortedDF = (pagecountsEnAllDF
2: .orderBy("requests")
3: )
4: sortedDF. __________
- sortedDF.show(15, False)
- sortedDF.print(15, False)
- sortedDF.print(15)
- sortedDF.show(15)
Question 4)
Which command will order by a column in descending order?
- df.orderBy("requests desc")
- df.orderBy(col("requests").desc())
- df.orderBy("requests").desc()
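Combining the two questions above, here is a hedged sketch; it assumes the pagecountsEnAllDF DataFrame from the earlier sketch and a requests column.

from pyspark.sql.functions import col

sortedDF = pagecountsEnAllDF.orderBy(col("requests").desc())   # order by a column in descending order
sortedDF.show(15, False)                                       # 15 rows, columns not truncated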
Question 5)
Which command specifies a column value in a DataFrame’s filter? Specifically, filter by a productType column where the value is equal to book?
- df.filter("productType = 'book'")
- df.filter(col("productType") == "book")
- df.col("productType").filter("book")
Question 6)
When using the Column Class, which command filters based on the end of a column value? For example, a column named verb and filtered by words ending with “ing”.
- df.filter(col("verb").endswith("ing"))
- df.filter().col("verb").like("%ing")
- df.filter("verb like '%ing'")
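A small self-contained sketch of both filters; the sample data is entirely hypothetical.

from pyspark.sql.functions import col

df = spark.createDataFrame(
    [("book", "reading"), ("toy", "playing"), ("book", "ship")],
    ["productType", "verb"],
)

df.filter(col("productType") == "book").show()   # equality filter on a column value
df.filter(col("verb").endswith("ing")).show()    # filter on the end of a column value
# SQL-expression equivalents: df.filter("productType = 'book'") and df.filter("verb like '%ing'")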
Knowledge check
Question 1)
Which of the listed methods for renaming a DataFrame’s column are correct? Select two options.
- df.toDF("dateCaptured")
- df.select(col("timestamp").alias("dateCaptured"))
- df.alias("timestamp", "dateCaptured")
Question 2)
You need to find the average of sales transactions by storefront. Which of the following aggregates would you use?
- df.groupBy(col("storefront")).avg("completedTransactions")
- df.groupBy(col("storefront")).avg(col("completedTransactions"))
- df.select(col("storefront")).avg("completedTransactions")
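To illustrate the alias rename and the grouped average together, here is a hedged sketch with hypothetical sample data.

from pyspark.sql.functions import col

salesDF = spark.createDataFrame(
    [("store_a", 10.0), ("store_a", 20.0), ("store_b", 5.0)],
    ["storefront", "completedTransactions"],
)

renamedDF = salesDF.select(col("completedTransactions").alias("transactions"))   # rename via alias in a select
salesDF.groupBy(col("storefront")).avg("completedTransactions").show()           # average per storefront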
Question 3)
In Azure Databricks you are about to do some ETL on a file you have received from a customer. The file contains data about people, including:
first, middle, and last names
gender
birth date
Social Security number
Salary
You discover that the file contains some duplicate records, and you have been instructed to remove any duplicates. The dropDuplicates() command will more than likely cause a shuffle. To help reduce the number of post-shuffle partitions, which of the following commands should you run?
- spark.conf.set("spark.sql.partitions", 8)
- spark.sql.conf.set("spark.shuffle.partitions", 8)
- spark.conf.set("spark.sql.shuffle.partitions", 8)
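A minimal sketch of the de-duplication scenario; the sample rows and column names are hypothetical stand-ins for the customer file.

spark.conf.set("spark.sql.shuffle.partitions", 8)   # fewer post-shuffle partitions for a small dataset

peopleDF = spark.createDataFrame(
    [("Ada", "Lovelace", "123-45-6789"), ("Ada", "Lovelace", "123-45-6789")],
    ["firstName", "lastName", "ssn"],
)
dedupedDF = peopleDF.dropDuplicates()   # the wide operation that triggers the shuffle
print(dedupedDF.count())                # 1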
Question 4)
Which of the following syntax will successfully display the year portion for a column named capturedAt and formatted as a Timestamp column?
- .select( year( col("capturedAt")) )
- .select( year ("capturedAt")
- .select(col("capturedAt")year)
Question 5)
You need to change a column name from "dob" to "DateOfBirth" on a Spark DataFrame. Which of the following syntax is valid?
- .ColumnRename("dob","DateOfBirth")
- .RenameColumn("dob","DateOfBirth")
- .withColumnRenamed("dob","DateOfBirth")
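A combined sketch of the year() extraction and the column rename; the sample row is hypothetical.

from pyspark.sql.functions import col, year, to_timestamp

df = spark.createDataFrame([("2021-03-01 12:00:00", "1990-07-15")], ["capturedAt", "dob"])
df = df.withColumn("capturedAt", to_timestamp(col("capturedAt")))

df.select(year(col("capturedAt"))).show()        # displays the year portion of the timestamp column
df = df.withColumnRenamed("dob", "DateOfBirth")  # renames the column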
Visit this link: Microsoft Azure Databricks for Data Engineering Week 4 | Test prep Quiz Answers
WEEK 5 QUIZ ANSWERS
Knowledge check
Question 1)
True or False?
ETL/ELT workflows including analytics workloads in Azure Databricks can be operationalized using Azure Data Factory pipelines.
- False
- True
Question 2)
When you create an Azure Databricks service, a “Databricks appliance” is deployed as an Azure resource in your subscription. When a Databricks appliance is deployed into Azure which of the following resources are created? Select all that apply.
- Virtual Network
- Azure SQL Database
- Network Security Group
Question 3)
In Azure Databricks, the Blob Storage account provides default file storage within the workspace, referred to as DBFS.
What does DBFS stand for?
- Database File system
- Databricks File System
- Data Block File System
Question 4)
In Azure Databricks when ADLS Passthrough is configured on a standard cluster you must set which of the following?
- Group Access
- Single User Access
- Multiple Users
Question 5)
By default, all users can create and modify clusters unless an administrator enables cluster access control. With cluster access control, permissions determine a user’s abilities. There are four permission levels for a cluster. Select the correct four permissions.
- Can Edit
- Can Read
- Can Attach To
- Can Manage
- No Permissions
- Can Restart
Question 6)
Users access the Azure Databricks workspace with an Azure AD account.
Is the following statement True or False?
The user’s Azure AD account has to be added to the Azure Databricks workspace before they can access it.
- True
- False
Visit this link: Microsoft Azure Databricks for Data Engineering Week 5 | Test prep Quiz Answers
WEEK 6 QUIZ ANSWERS
Knowledge check
Question 1)
Delta Lake provides snapshots of data, enabling developers to access and revert to earlier versions of data for audits, rollbacks, or to reproduce experiments. What is this functionality referred to as?
- Schema Enforcement
- Time Travel
- Schema Evolution
- ACID Transactions
Question 2)
One of the core features of Delta Lake is performing upserts. Which of the following statements is true in regard to Upsert?
- Upsert is a new DML statement for SQL syntax
- UpSert is literally TWO operations. Update / Insert
- Upsert is supported in traditional data lakes
Question 3)
When discussing Delta Lake, there is often a reference to the concept of Bronze, Silver and Gold tables. These levels refer to the state of data refinement as data flows through a processing pipeline and are conceptual guidelines. Based on these table concepts the refinements in Silver tables generally relate to which of the following?
- Raw data (or very little processing)
- Data that is directly queryable and ready for insights
- Highly refined views of the data
Question 4)
What is the Databricks Delta command to display metadata?
- SHOW SCHEMA tablename
- DESCRIBE DETAIL tableName
- MSCK DETAIL tablename
Question 5)
How do you perform UPSERT in a Delta dataset?
- Use UPSERT INTO my-table /MERGE
- Use UPSERT INTO my-table
- Use MERGE INTO my-table USING data-to-upsert
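A hedged sketch of the MERGE syntax issued from Python via spark.sql; the table students and the view student_updates are hypothetical and would need to exist already as a Delta table and a temporary view.

spark.sql("""
    MERGE INTO students AS target
    USING student_updates AS source
    ON target.student_id = source.student_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")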
Question 6)
What optimization does the following command perform: OPTIMIZE Students ZORDER BY Grade?
- Ensures that all data backing, for example, Grade=8 is colocated, then updates a graph that routes requests to the appropriate files.
- Ensures that all data backing, for example, Grade=8 is colocated, then rewrites the sorted data into new Parquet files.
- Creates an order-based index on the Grade field to improve filters against that field.
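For reference, the OPTIMIZE and metadata commands run from a notebook might look like this sketch, assuming Students is an existing Delta table with a Grade column (as in the question above).

spark.sql("OPTIMIZE Students ZORDER BY (Grade)")   # co-locates similar Grade values and rewrites the sorted data into new Parquet files
spark.sql("DESCRIBE DETAIL Students").show()       # displays the table metadata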
Knowledge check
Question 1)
The lambda architecture is a big data processing architecture that combines both batch and real-time processing methods and features an append-only, immutable data source. Which of the following are features of an append-only immutable data source? Select all that apply.
- Data is implicitly ordered by time of arrival
- Timestamped events are appended to existing events
- Timestamped events overwrite existing events
- Serves as the system of record
Question 2)
Delta Lake Architecture improves upon the traditional Lambda architecture through a unified pipeline that allows you to combine batch and streaming workflows through a shared filestore with ACID-compliant transactions. What do the letters ACID stand for? Select 4 options.
- Durability
- Isolation
- Agile
- Consistency
- Atomicity
- Concurrency
- Implicit
- Desirable
Question 3)
In the Delta Lake architecture, the refinement of the data is often referred to as Bronze, Silver, and Gold tables. Which of the following tables provide business-level aggregates, often used for reporting and dashboarding?
- Silver
- Bronze
- Gold
Question 4)
What is a lambda architecture and what does it try to solve?
- An architecture that employs the latest Scala runtimes in one or more Databricks clusters to provide the most efficient data processing platform available today.
- An architecture that defines a data processing pipeline whereby microservices act as compute resources for efficient large-scale data processing.
- An architecture that splits incoming data into two paths – a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.
Question 5)
What command should be issued to view the list of active streams?
- Invoke spark.view.active
- Invoke spark.streams.show
- Invoke spark.streams.active
Question 6)
What is required to specify the location of a checkpoint directory when defining a Delta Lake streaming query?
- .writeStream.format("delta").checkpoint("location", checkpointPath) …
- .writeStream.format("parquet").option("checkpointLocation", checkpointPath) …
- .writeStream.format("delta").option("checkpointLocation", checkpointPath) …
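A minimal sketch of a Delta streaming write with a checkpoint, plus the active-streams check from the previous question; every path here is hypothetical.

eventsStreamDF = spark.readStream.format("delta").load("dbfs:/delta/events_bronze")

query = (eventsStreamDF.writeStream
    .format("delta")
    .option("checkpointLocation", "dbfs:/delta/events_silver/_checkpoints")   # lets the query resume where it left off
    .outputMode("append")                                                     # append new records to the output sink
    .start("dbfs:/delta/events_silver"))

print(spark.streams.active)   # view the list of active streaming queries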
Visit this link: Microsoft Azure Databricks for Data Engineering Week 6 | Test prep Quiz Answers
WEEK 7 QUIZ ANSWERS
Knowledge check
Question 1)
Stream processing is where you continuously incorporate new data into Data Lake storage and compute results. Which of the following would be examples of Stream processing?
- Bank Card Processing
- Invoicing
- Game play events
- Monthly Payroll processing
- IoT Device Data
Question 2)
When creating a new event hub in the Azure Portal you are required to specify a Namespace name. The namespace name must be unique in which of the following?
- Tenant Only
- Azure
- Subscription only
- Resource group only
Question 3)
When doing a write stream command, what does the outputMode(“append”) option do?
- The append mode replaces existing records and updates aggregates
- The append mode allows records to be updated and changed in place
- The append outputMode allows records to be added to the output sink
Question 4)
In Spark Structured Streaming, what method should be used to read streaming data into a DataFrame?
- spark.stream.read
- spark.read
- spark.readStream
Question 5)
What happens if the command option("checkpointLocation", pointer-to-checkpoint-directory) is not specified?
- When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch.
- The streaming job will function as expected since the checkpointLocation option does not exist
- It will not be possible to create more than one streaming query that uses the same streaming source since they will conflict
Question 6)
Select the correct option to complete the statement below:
In Azure Databricks, every streaming DataFrame must have a schema, that is, the definition of column names and data types. For file-based streaming sources, the schema is __________.
- Both Defined for you and can be user defined if required
- Defined for you
- User Defined
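A hedged sketch of reading a file-based stream with a user-defined schema; the schema fields and the landing path are hypothetical.

from pyspark.sql.types import StructType, StructField, StringType, TimestampType

eventSchema = StructType([
    StructField("device_id", StringType(), True),
    StructField("event_time", TimestampType(), True),
    StructField("reading", StringType(), True),
])

iotStreamDF = (spark.readStream
    .schema(eventSchema)              # file-based streaming sources need a user-defined schema
    .json("dbfs:/mnt/iot/landing/"))  # picks up new files as they arrive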
Knowledge check
Question 1)
In Azure Databricks when creating a new user access token, the Lifetime setting of the access token can be manually set. What is the default Lifetime (Days) value when creating a new access token?
- 60 Days
- 120 Days
- 90 Days
- 30 Days
Question 2)
In Azure Databricks when creating a new user access token, the Lifetime setting of the access token can be manually set. If the Token Lifetime is unspecified, what will be the Lifetime (Days) of the token?
- 90 Days
- 120 Days
- 30 Days
- Indefinite
- 60 Days
Question 3)
True or False?
In Azure Databricks, personal access tokens can be used for secure authentication to the Databricks API instead of passwords. After a new token is generated, it can be viewed by going back to the user settings from where it was generated.
- False
- True
Question 4)
What’s the purpose of linked services in Azure Data Factory?
- To link data stores or compute resources together for the movement of data between resources
- To represent a processing step in a pipeline
- To represent a data store or a compute resource that can host execution of an activity
Question 5)
How can parameters be passed into an Azure Databricks notebook from Azure Data Factory?
- Use the new API endpoint option on a notebook in Databricks and provide the parameter name
- Deploy the notebook as a web service in Databricks, defining parameter names and types
- Use notebook widgets to define parameters that can be passed into the notebook
Question 6)
What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn’t running when the cluster is called by Data Factory?
- Simply add a Databricks cluster start activity before the notebook, JAR, or Python Databricks activity
- If the target cluster is stopped, Databricks will start the cluster before attempting to execute
- The Databricks activity will fail in Azure Data Factory – you must always have the cluster running
Visit this link: Microsoft Azure Databricks for Data Engineering Week 7 | Test prep Quiz Answers
WEEK 8 QUIZ ANSWERS
Knowledge check
Question 1)
Azure DevOps is a collection of services that provide an end-to-end solution for the five core practices of DevOps. What are the five core practices of DevOps as defined by Microsoft?
- Planning and Tracking
- Program Management
- Monitoring and Operations
- Build and Test
- Delivery
- Project Management
- Development
- Scoping
Question 2)
In an Azure DevOps project, creating a release pipeline provides which of the following portions of CI/CD?
- CD
- CI
Question 3)
What does the CD in CI/CD mean?
- Both are correct
- Continuous Delivery
- Continuous Deployment
Question 4)
What sort of pipeline is required in Azure DevOps for creating artifacts used in releases?
- A Build pipeline
- An Artifact pipeline
- A Release pipeline
Question 5)
What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace?
- Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline
- Create an Azure Active Directory application, copy the application ID, then use that as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline
- In the production or staging Azure Databricks workspace, enable Git integration to Azure DevOps, then link to the Azure DevOps source code repo
Question 6)
In an Azure DevOps project, creating a build pipeline provides which of the following portions of CI/CD?
- CD
- CI
Knowledge check
Question 1)
What are the two prerequisites for connecting Azure Databricks with Azure Synapse Analytics that apply to the Azure Synapse Analytics instance?
- Add the client IP address to the firewall’s allowed IP addresses list and use the correctly formatted ConnectionString
- Create a database master key and configure the firewall to enable Azure services to connect
- Use a correctly formatted ConnectionString and create a database master key
Question 2)
Which of the following is the correct syntax for overwriting data in Azure Synapse Analytics from a Databricks notebook?
- df.write.mode("overwrite").option("…").option("…").save()
- df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("…").option("…").save()
- df.write.format("com.databricks.spark.sqldw").overwrite().option("…").option("…").save()
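A hedged sketch of the overwrite pattern above; the connection string, tempDir, and table name are placeholder values, and df stands for any DataFrame already defined in the notebook.

(df.write
    .format("com.databricks.spark.sqldw")
    .mode("overwrite")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>;encrypt=true")
    .option("tempDir", "wasbs://<container>@<account>.blob.core.windows.net/tempdir")   # intermediary Blob Storage
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.SampleTable")
    .save())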
Question 3)
The Azure Synapse Connector uses Azure Blob Storage as intermediary storage and using PolyBase in Synapse enables MPP reads and writes to Synapse from Azure Databricks. However, the Synapse connector is more suited to ETL than to interactive queries. For interactive and ad-hoc queries, data should be extracted into which of the following?
- Azure Data Factory table
- Azure Databricks Delta table
- Azure SQL database Table
Question 4)
You can access Azure Synapse from Databricks using the Azure Synapse connector which uses three types of network connections.
Which of the following connections are used by Synapse Connector? Select all that apply.
- Spark driver and executors to Azure storage account
- Databricks connector to Spark driver
- Azure Storage account to Databricks connection
- Spark driver to Azure Synapse
- Azure Synapse to Azure storage account
Question 5)
You can access Azure Synapse from Databricks using the Azure Synapse connector, and it is recommended that the connection strings use Secure Sockets Layer (SSL) encryption for all data sent between the Spark driver and the Azure Synapse instance through the JDBC connection. To verify that SSL encryption is enabled, which of the following should be set in the connection string?
- encrypt=on
- encrypt=enabled
- encrypt=active
- encrypt=true
Knowledge check
Question 1)
Select two items from the following options to complete this statement correctly:
Azure Databricks uses Azure Active Directory (AAD) as the exclusive Identity Provider. Any AAD member assigned to the ________ or ________ role can deploy Databricks and is automatically added to the ADB members list upon first login.
- Reader
- Owner
- Contributor
- User Access Administrator
Question 2)
Azure Databricks is a multitenant service, and to provide fair resource sharing to all regional customers, it imposes limits on API calls. What is currently the restriction on the maximum number of notebooks or execution contexts that can be attached to a cluster?
- 100
- 200
- 150
- No Limit
Question 3)
Azure Databricks deployments are built on top of the Azure infrastructure and currently have default restrictions or Azure limits. Currently, what is the maximum number of storage accounts per region per subscription in Azure Databricks?
- 1000
- 150
- 500
- 250
Question 4)
Azure Databricks jobs use clusters, and different types of jobs demand different types of cluster resources. When training machine learning models, you should consider using which of the following?
- Autoscaling features
- Computing optimized VMs
- Memory optimized VMs
- General purpose VMs
Question 5)
What is SCIM?
- An open standard that enables users to bring their own auth key to the Databricks environment
- An open standard that enables organizations to import both groups and users from Azure Active Directory into Azure Databricks
- An optimization that removes orphaned data from a given dataset
Question 6)
If mounting an Azure Data Lake Storage (ADLS) account to a workspace, what cluster feature must be used to have ACLs within ADLS applied to the user executing commands in a notebook?
- Enable SCIM
- Set spark.config.adls.impersonateuser(true)
- Enable ADLS Passthrough on a cluster.
Visit this link: Microsoft Azure Databricks for Data Engineering Week 8 | Test prep Quiz Answers
WEEK 9 QUIZ ANSWERS
Visit this link: Microsoft Azure Databricks for Data Engineering Week 9 | Course Practice Exam Quiz Answers