Download the Latest Professional-Data-Engineer Dumps - 2021 Professional-Data-Engineer Exam Questions [Q142-Q158]

Latest Google Professional-Data-Engineer Certification Practice Test Questions

NEW QUESTION 142
Which Java SDK class can you use to run your Dataflow programs locally?

A. MachineRunner
B. LocalPipelineRunner
C. LocalRunner
D. DirectPipelineRunner

Answer: D

Explanation:
DirectPipelineRunner executes the operations in the pipeline directly on the local machine, without any optimization. It is useful for small local runs and for tests. In the Apache Beam SDK, which superseded the Dataflow SDK 1.x, the equivalent class is DirectRunner.
Reference: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner

NEW QUESTION 143
Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters and obtained an area under the curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do?

A. Perform hyperparameter tuning
B. Train a classifier with deep neural networks, because neural networks would always beat SVMs
C. Deploy the model and measure the real-world AUC; it's always higher because of generalization
D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC

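Answer: A

Explanation:
The SVM was trained with default parameters, so tuning hyperparameters such as the kernel and the regularization strength is the direct way to improve it. B and C rest on false "always" claims, and D cannot help because rescaling predictions does not change AUC.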
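
AUC depends only on how the model ranks positive examples relative to negative ones, so multiplying every score by a positive constant leaves it unchanged. The sketch below illustrates this with a hypothetical `auc` helper and made-up labels and scores; it is not taken from any particular library.

```python
def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly.

    Ties count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical validation labels and model scores (illustration only).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

base = auc(labels, scores)
# A positive scaling factor keeps the ranking, so the AUC is the same.
scaled = auc(labels, [3.0 * s for s in scores])
```

Only a change to the ranking itself, for example from retraining with different hyperparameters, can move the AUC.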

NEW QUESTION 144
You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm. To do this you need to add a synthetic feature. What should the value of that feature be?

A. cos(X)
B. X^2+Y^2
C. X^2
D. Y^2

Answer: B
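
The graphic is not reproduced here. Assuming it shows one class clustered around the origin and the other class in a ring around it, the squared distance from the origin, X^2+Y^2, turns the problem into a single threshold. The points, the threshold, and the function names below are hypothetical and only illustrate that assumption.

```python
import math

# Hypothetical data: class 0 near the origin, class 1 on a ring of radius 3.
angles = (0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
outer = [(3.0 * math.cos(t), 3.0 * math.sin(t)) for t in angles]


def radius_squared(x, y):
    """Synthetic feature X^2 + Y^2 (squared distance from the origin)."""
    return x * x + y * y


def classify(x, y, threshold=4.0):
    """A linear rule in the synthetic feature: one threshold separates the classes."""
    return 1 if radius_squared(x, y) > threshold else 0


inner_predictions = [classify(x, y) for x, y in inner]
outer_predictions = [classify(x, y) for x, y in outer]
```

No single threshold on X or Y alone separates such rings, because both classes span the same range of X and Y values.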

NEW QUESTION 145
After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.
What should you do?

A. Select random samples from the tables using the HASH() function and compare the samples.
B. Create stratified random samples using the OVER() function and compare equivalent samples from each table.
C. Select random samples from the tables using the RAND() function and compare the samples.
D. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.

Answer: D

Explanation:
Only this option compares the full contents of both tables. The other options compare samples, which cannot prove that every row matches.
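
The idea of hashing sorted rows while skipping timestamp columns can be sketched in plain Python. This is not the Dataproc or BigQuery connector code; the function name, column names, and sample rows are hypothetical, and a real job would read the rows from the two tables.

```python
import hashlib


def table_fingerprint(rows, ignore_columns=()):
    """Hash all rows after sorting, so the result does not depend on row order.

    Columns listed in ignore_columns (e.g. load timestamps) are left out.
    """
    normalized = []
    for row in rows:
        kept = sorted((k, v) for k, v in row.items() if k not in ignore_columns)
        normalized.append(repr(kept))
    digest = hashlib.sha256()
    for line in sorted(normalized):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical outputs: same data, different row order and load timestamps.
original = [
    {"customer": "a", "total": 10, "loaded_at": "2021-01-01T00:00:00"},
    {"customer": "b", "total": 25, "loaded_at": "2021-01-01T00:00:00"},
]
migrated = [
    {"customer": "b", "total": 25, "loaded_at": "2021-02-01T09:30:00"},
    {"customer": "a", "total": 10, "loaded_at": "2021-02-01T09:30:00"},
]

same = (table_fingerprint(original, ignore_columns=("loaded_at",))
        == table_fingerprint(migrated, ignore_columns=("loaded_at",)))
```

Sorting before hashing matters because neither table guarantees a row order, and no key exists to align the rows.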

NEW QUESTION 146
Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine.
The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks. What should you do?

A. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.
B. Host a visualization tool on a VM on Google Compute Engine.
C. Run a local version of Jupyter on the laptop.
D. Grant the user access to Google Cloud Shell.

Answer: A

Explanation:
Datalab provides Jupyter for this kind of work.

NEW QUESTION 147
Which of these is not a supported method of putting data into a partitioned table?

A. Create a partitioned table and stream new records to it every day.
B. Run a query to get the records for a specific day from an existing table and for the destination table, specify a partitioned table ending with the day in the format "$YYYYMMDD".
C. If you have existing data in a separate file for each day, then create a partitioned table and upload each file into the appropriate partition.
D. Use ORDER BY to put a table's rows into chronological order and then change the table's type to "Partitioned".

Answer: D

Explanation:
You cannot change an existing table into a partitioned table. You must create a partitioned table from scratch. Then you can either stream data into it every day and the data will automatically be put in the right partition, or you can load data into a specific partition by using "$YYYYMMDD" at the end of the table name.
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables
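
As a small illustration of the "$YYYYMMDD" suffix described above, the helper below builds the name of a single daily partition. The helper name and the table name `mydataset.events` are hypothetical.

```python
from datetime import date


def partition_decorator(table, day):
    """Return 'table$YYYYMMDD', the name that targets one daily partition."""
    return f"{table}${day.strftime('%Y%m%d')}"


# Hypothetical table name; the suffix selects the partition for 7 March 2021.
target = partition_decorator("mydataset.events", date(2021, 3, 7))
```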

NEW QUESTION 148
Which SQL keyword can be used to reduce the number of columns processed by BigQuery?

A. SELECT
B. LIMIT
C. WHERE
D. BETWEEN

Answer: A

Explanation:
SELECT allows you to query specific columns rather than the whole table. LIMIT, BETWEEN, and WHERE clauses will not reduce the number of columns processed by BigQuery.
Reference: https://cloud.google.com/bigquery/launch-checklist#architecture_design_and_development_checklist

NEW QUESTION 149
You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jobs involve executing shell scripts, running Hadoop jobs, and running queries in BigQuery. The jobs are expected to run for many minutes up to several hours. If the steps fail, they must be retried a fixed number of times. Which service should you use to manage the execution of these jobs?

A. Cloud Dataflow
B. Cloud Composer
C. Cloud Functions
D. Cloud Scheduler

Answer: B

NEW QUESTION 150
You're using Bigtable for a real-time application, and you have a heavy load that is a mix of reads and writes. You've recently identified an additional use case and need to run an hourly analytical job that calculates certain statistics across the whole database. You need to ensure the reliability of both your production application and the analytical workload.
What should you do?

A. Add a second cluster to an existing instance with single-cluster routing, use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.
B. Double the size of your existing cluster and execute your analytics workload on the resized cluster.
C. Export a Bigtable dump to GCS and run your analytical job on top of the exported files.
D. Add a second cluster to an existing instance with multi-cluster routing, use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.

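Answer: A

Explanation:
Single-cluster routing with a separate app profile per workload sends each workload to its own cluster, which isolates the batch job from live traffic. With multi-cluster routing, requests from either workload could land on either cluster.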

NEW QUESTION 151
You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.
Which Google database service should you use?

A. Cloud Datastore
B. BigQuery
C. Cloud Bigtable
D. Cloud SQL

Answer: A

NEW QUESTION 152
You have an Apache Kafka Cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins.
What should you do?

A. Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.
B. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Source connector. Use a Dataflow job to read from PubSub and write to GCS.
C. Deploy the PubSub Kafka connector to your on-prem Kafka cluster and configure PubSub as a Sink connector. Use a Dataflow job to read from PubSub and write to GCS.
D. Deploy a Kafka cluster on GCE VM Instances with the PubSub Kafka connector configured as a Sink connector. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.

Answer: A

NEW QUESTION 153
Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly.
How should you optimize the cluster for cost?

A. Use SSDs on the worker nodes so that the job can run faster
B. Migrate the workload to Google Cloud Dataflow
C. Use pre-emptible virtual machines (VMs) for the cluster
D. Use a higher-memory node so that the job runs faster

Answer: C

NEW QUESTION 154
Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

A. Issue a command to restart the database servers.
B. Reduce the query frequency to once every hour until the database comes back online.
C. Retry the query every second until it comes back online to minimize staleness of data.
D. Retry the query with exponential backoff, up to a cap of 15 minutes.

Answer: D

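Explanation:
Exponential backoff keeps millions of clients from flooding a recovering database with retries, and a 15-minute cap matches the app's normal query interval.
Reference: https://cloud.google.com/sql/docs/mysql/manage-connections#backoff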
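
A minimal sketch of capped exponential backoff follows. The function name, the 1-second base delay, and the optional jitter are assumptions for illustration; only the 15-minute cap comes from the question.

```python
import random


def backoff_delays(base_seconds=1.0, cap_seconds=15 * 60, attempts=15,
                   jitter=False, rng=random):
    """Yield wait times that double on each retry and never exceed the cap."""
    for attempt in range(attempts):
        delay = min(cap_seconds, base_seconds * (2 ** attempt))
        if jitter:
            # Full jitter: a random wait in [0, delay] spreads out clients
            # that would otherwise all retry at the same moment.
            delay = rng.uniform(0, delay)
        yield delay


# Delays for the first 12 retries without jitter.
delays = list(backoff_delays(attempts=12))
```

A caller would sleep for each delay in turn and stop as soon as the query succeeds.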

NEW QUESTION 155
Your company is selecting a system to centralize data ingestion and delivery. You are considering messaging and data integration systems to address the requirements. The key requirements are:
* The ability to seek to a particular offset in a topic, possibly back to the start of all data ever captured
* Support for publish/subscribe semantics on hundreds of topics
* Retain per-key ordering
Which system should you choose?

A. Firebase Cloud Messaging
B. Cloud Storage
C. Cloud Pub/Sub
D. Apache Kafka

Answer: D

Explanation:
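Kafka supports all three requirements: consumers can seek to any retained offset in a topic, including the very beginning, and messages with the same key keep their order within a partition. At the time this question was written, these capabilities were lacking or unavailable in Cloud Pub/Sub.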
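
The two properties in question, seeking to an offset and per-key ordering, can be illustrated with a toy in-memory log. This is not Kafka or its client API; the class, its methods, and the byte-sum partitioner are hypothetical stand-ins for the same ideas.

```python
class ToyPartitionedLog:
    """In-memory stand-in for a topic, split into append-only partitions."""

    def __init__(self, partitions=3):
        self.partitions = [[] for _ in range(partitions)]

    def _partition_for(self, key):
        # The same key always maps to the same partition, which is what
        # preserves ordering among messages that share a key.
        return sum(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        """Append a message and return its (partition, offset)."""
        index = self._partition_for(key)
        self.partitions[index].append((key, value))
        return index, len(self.partitions[index]) - 1

    def read_from(self, partition, offset=0):
        """Seek to any retained offset, including 0 (the very beginning)."""
        return self.partitions[partition][offset:]


log = ToyPartitionedLog()
p1, o1 = log.append("user-1", "login")
log.append("user-2", "login")
p2, o2 = log.append("user-1", "purchase")

# Replay everything for user-1 from the start of its partition.
replay = [value for key, value in log.read_from(p1, 0) if key == "user-1"]
```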

NEW QUESTION 156
Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?

A. Threading
B. Dimensionality Reduction
C. Serialization
D. Dropout Methods

Answer: D

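Explanation:
Dropout is a regularization technique that randomly disables units during training, which reduces the overfitting described in the question.
Reference: https://medium.com/mlreview/a-simple-deep-learning-model-for-stock-price-prediction-using-tensorflow-30505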
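
Dropout can be sketched without any framework. The version below is "inverted" dropout, which rescales the surviving activations during training so that nothing changes at inference time. It is an illustration written for this page, not TensorFlow's implementation; the function name and inputs are hypothetical.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Inference: pass activations through unchanged.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical layer output; a seeded generator makes the run repeatable.
hidden = [1.0] * 10
dropped = dropout(hidden, rate=0.5, rng=random.Random(0))
unchanged = dropout(hidden, rate=0.5, training=False)
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which tends to improve performance on new data.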

NEW QUESTION 157
You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine. Which learning algorithm should you use?

A. Recurrent neural network
B. Linear regression
C. Feedforward neural network
D. Logistic classification

Answer: B
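
Linear regression suits a constrained machine because a single-feature fit has a closed-form solution that needs only a few sums. The sketch below shows ordinary least squares for one feature; the function name and the area/price figures are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: floor area (square metres) and price (thousands),
# constructed so that price is exactly 3 times the area.
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]
slope, intercept = fit_line(areas, prices)
```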

NEW QUESTION 158
......

Understanding functional and technical aspects of Google Professional Data Engineer Exam Operationalizing machine learning models

The following will be discussed here:

Measuring, monitoring, and troubleshooting machine learning models
Hardware accelerators (e.g., GPU, TPU)
Deploying an ML pipeline
Continuous evaluation
Common sources of error (e.g., assumptions about data)
Ingesting appropriate data
Distributed vs. single machine
Impact of dependencies of machine learning models
Retraining of machine learning models (Cloud Machine Learning Engine, BigQuery ML, Kubeflow, Spark ML)
Customizing ML APIs (e.g., AutoML Vision, AutoML Text)
Operationalizing machine learning models
Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics)
ML APIs (e.g., Vision API, Speech API)
Leveraging pre-built ML models as a service
Conversational experiences (e.g., Dialogflow)
Use of edge compute
Choosing the appropriate training and serving infrastructure
