
TAKE Google Cloud Certified Professional-Data-Engineer PRACTICE QUESTIONS FOR AMAZING RESULTS [Q202-Q222]


The Google Professional-Data-Engineer certification is intended for data engineers who want to demonstrate their expertise in using Google Cloud technologies to develop and manage data pipelines. It also suits individuals who want to improve their career prospects in data engineering. By passing the exam, candidates show proficiency in designing, building, and maintaining data processing systems on Google Cloud services.

 

NEW QUESTION # 202
You need to compose a visualization for operations teams with the following requirements:
Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute).
The report must not be more than 3 hours delayed from live data.
The actionable report should only show suboptimal links.
Most suboptimal links should be sorted to the top.
Suboptimal links can be grouped and filtered by regional geography.
User response time to load the report must be <5 seconds.
You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?

  • A. Look through the current data and compose a small set of generalized charts and tables bound to criteria filters that allow value selection.
  • B. Load the data into relational database tables, write a Google App Engine application that queries all rows, summarizes the data across each criteria, and then renders results using the Google Charts and visualization API.
  • C. Look through the current data and compose a series of charts and tables, one for each possible combination of criteria.
  • D. Export the data to a spreadsheet, compose a series of charts and tables, one for each possible combination of criteria, and spread them across multiple tabs.

Answer: A


NEW QUESTION # 203
You are designing the architecture of your application to store data in Cloud Storage. Your application consists of pipelines that read data from a Cloud Storage bucket that contains raw data, and write the data to a second bucket after processing. You want to design an architecture with Cloud Storage resources that are capable of being resilient if a Google Cloud regional failure occurs. You want to minimize the recovery point objective (RPO) if a failure occurs, with no impact on applications that use the stored data. What should you do?

  • A. Adopt two regional Cloud Storage buckets, and update your application to write the output on both buckets.
  • B. Adopt two regional Cloud Storage buckets, and create a daily task to copy from one bucket to the other.
  • C. Adopt a dual-region Cloud Storage bucket, and enable turbo replication in your architecture.
  • D. Adopt multi-regional Cloud Storage buckets in your architecture.

Answer: C

Explanation:
To ensure resilience and minimize the recovery point objective (RPO) with no impact on applications, using a dual-region bucket with turbo replication is the best approach. Here's why option C is the best choice:
Dual-Region Buckets:
Dual-region buckets store data redundantly across two distinct geographic regions, providing high availability and durability.
This setup ensures that data remains available even if one region experiences a failure.
Turbo Replication:
Turbo replication ensures that data is replicated between the two regions within 15 minutes, aligning with the requirement to minimize the recovery point objective (RPO).
This feature provides near real-time replication, significantly reducing the risk of data loss.
No Impact on Applications:
Applications continue to access the dual-region bucket without any changes, ensuring seamless operation even during a regional failure.
The dual-region setup transparently handles failover, providing uninterrupted access to data.
Steps to Implement:
Create a Dual-Region Bucket:
Create a dual-region Cloud Storage bucket in the Google Cloud Console, selecting appropriate regions (e.g., us-central1 and us-east1).
Enable Turbo Replication:
Enable turbo replication to ensure rapid data replication between the selected regions.
Configure Applications:
Ensure that applications read and write to the dual-region bucket, benefiting from its high availability and durability.
Test Failover:
Simulate a regional failure to verify that the dual-region bucket and turbo replication meet the required RPO and ensure data resilience.
Reference:
Google Cloud Storage Dual-Region
Turbo Replication in Google Cloud Storage
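
To make the RPO argument concrete, the sketch below compares the worst-case data-loss window of two of the designs in the options. The 15-minute figure is the turbo replication target quoted in the explanation above; the 24-hour figure is an assumption that follows from running a copy job once a day. The names are illustrative only and do not correspond to any Cloud Storage API.

```python
from datetime import timedelta

# Worst-case data-loss window (RPO) for two designs from the options.
# 15 minutes: turbo replication target quoted in the explanation.
# 24 hours: assumed worst case for a copy job that runs once a day.
WORST_CASE_RPO = {
    "daily copy between two regional buckets": timedelta(hours=24),
    "dual-region bucket with turbo replication": timedelta(minutes=15),
}


def lowest_rpo(options: dict[str, timedelta]) -> str:
    """Return the design whose worst-case RPO is smallest."""
    return min(options, key=lambda name: options[name])


best = lowest_rpo(WORST_CASE_RPO)
print(best, WORST_CASE_RPO[best])
```

Under these assumptions the daily copy can lose up to 96 times as much data as the turbo-replicated dual-region bucket, and it also requires the application to know about two buckets.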


NEW QUESTION # 204
You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate messages. What is the most likely cause of these duplicate messages?

  • A. Your custom endpoint has an out-of-date SSL certificate.
  • B. The message body for the sensor event is too large.
  • C. Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
  • D. The Cloud Pub/Sub topic has too many messages published to it.

Answer: C
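
The acknowledgement-deadline behaviour described in option C can be illustrated with a toy model. This is a simplified simulation, not the Pub/Sub client API: `deliveries` and `max_attempts` are hypothetical names, and the attempt cap exists only so the loop terminates (the real service keeps retrying an unacknowledged message until it is acknowledged or its retention period expires).

```python
def deliveries(processing_seconds: float, ack_deadline_seconds: float,
               max_attempts: int = 5) -> int:
    """Count how often one message is pushed to the endpoint.

    Simplified model: an attempt counts as acknowledged only if the endpoint
    responds within the deadline; otherwise the same message is pushed again.
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if processing_seconds <= ack_deadline_seconds:
            break  # success response arrived in time, no redelivery
    return attempts


print(deliveries(processing_seconds=2, ack_deadline_seconds=10))
print(deliveries(processing_seconds=30, ack_deadline_seconds=10))
```

An endpoint that answers within the deadline sees the message once; a slow endpoint sees the same message on every retry, which is what shows up as duplicates.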


NEW QUESTION # 205
You are implementing workflow pipeline scheduling using open source-based tools and Google Kubernetes Engine (GKE). You want to use a Google managed service to simplify and automate the task. You also want to accommodate Shared VPC networking considerations. What should you do?

  • A. Use Cloud Composer in a Shared VPC configuration. Place the Cloud Composer resources in the service project.
  • B. Use Dataflow for your workflow pipelines. Use shell scripts to schedule workflows.
  • C. Use Dataflow for your workflow pipelines. Use Cloud Run triggers for scheduling.
  • D. Use Cloud Composer in a Shared VPC configuration. Place the Cloud Composer resources in the host project.

Answer: A

Explanation:
Shared VPC requires that you designate a host project, to which networks and subnetworks belong, and a service project, which is attached to the host project. When Cloud Composer participates in a Shared VPC, the Cloud Composer environment is in the service project.
Reference:
https://cloud.google.com/composer/docs/how-to/managing/configuring-shared-vpc


NEW QUESTION # 206
MJTelco Case Study
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost.
Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
Provide reliable and timely access to data for analysis from distributed research workers.
Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
Ensure secure and efficient transport and storage of telemetry data
Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day.
Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
MJTelco is building a custom interface to share data. They have these requirements:
1. They need to do aggregations over their petabyte-scale datasets.
2. They need to scan specific time range rows with a very fast response time (milliseconds).
Which combination of Google Cloud Platform products should you recommend?

  • A. Cloud Bigtable and Cloud SQL
  • B. Cloud Datastore and Cloud Bigtable
  • C. BigQuery and Cloud Bigtable
  • D. BigQuery and Cloud Storage

Answer: C
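
The second requirement (millisecond scans over a specific time range) maps to Cloud Bigtable because it keeps rows sorted by row key, so a time range becomes a contiguous key range. The sketch below imitates that with a sorted list and binary search. It is a toy model, not the Bigtable client API, and the key layout `<link_id>#<ISO timestamp>` and the sample values are hypothetical.

```python
from bisect import bisect_left

# Rows kept sorted by key, imitating how Bigtable orders rows.
# The key layout "<link_id>#<ISO timestamp>" is a hypothetical choice.
rows = sorted([
    ("link-001#2024-01-01T00:00", 0.91),
    ("link-001#2024-01-01T00:01", 0.88),
    ("link-001#2024-01-01T00:02", 0.93),
    ("link-002#2024-01-01T00:00", 0.40),
])
keys = [key for key, _ in rows]


def scan(start_key: str, end_key: str) -> list[tuple[str, float]]:
    """Return rows with start_key <= key < end_key using binary search."""
    lo = bisect_left(keys, start_key)
    hi = bisect_left(keys, end_key)
    return rows[lo:hi]


result = scan("link-001#2024-01-01T00:01", "link-001#2024-01-01T00:03")
print(result)
```

Only the rows inside the requested key range are touched, regardless of table size. Aggregations across the whole petabyte-scale dataset are a different access pattern, which is where BigQuery comes in.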


NEW QUESTION # 207
Your chemical company needs to manually check documentation for customer orders. You use a pull subscription in Pub/Sub so that sales agents get details from the order. You must ensure that you do not process orders twice with different sales agents and that you do not add more complexity to this workflow. What should you do?

  • A. Create a new Pub/Sub push subscription to monitor the orders processed in the agent's system.
  • B. Use Pub/Sub exactly-once delivery in your pull subscription.
  • C. Use a Deduplicate PTransform in Dataflow before sending the messages to the sales agents.
  • D. Create a transactional database that monitors the pending messages.

Answer: B

Explanation:
Pub/Sub exactly-once delivery is a feature that guarantees that subscriptions do not receive duplicate deliveries of messages based on a Pub/Sub-defined unique message ID. This feature is only supported by the pull subscription type, which is what you are using in this scenario. By enabling exactly-once delivery, you can ensure that each order is processed only once by a sales agent, and that no order is lost or duplicated. This also simplifies your workflow, as you do not need to create a separate database or subscription to monitor the pending or processed messages.
Reference:
Exactly-once delivery | Cloud Pub/Sub Documentation
Cloud Pub/Sub Exactly-once Delivery feature is now Generally Available (GA)


NEW QUESTION # 208
You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old.
What should you do?

  • A. Disable caching by editing the report settings.
  • B. Refresh your browser tab showing the visualizations.
  • C. Disable caching in BigQuery by editing table details.
  • D. Clear your browser history for the past hour then reload the tab showing the visualizations.

Answer: A

Explanation:
Reference: https://support.google.com/datastudio/answer/7020039?hl=en


NEW QUESTION # 209
Your United States-based company has created an application for assessing and responding to user actions.
The primary table's data volume grows by 250,000 records per second. Many third parties use your application's APIs to build the functionality into their own frontend applications. Your application's APIs should comply with the following requirements:
* Single global endpoint
* ANSI SQL support
* Consistent access to the most up-to-date data
What should you do?

  • A. Implement BigQuery with no region selected for storage or processing.
  • B. Implement Cloud Spanner with the leader in North America and read-only replicas in Asia and Europe.
  • C. Implement Cloud Bigtable with the primary cluster in North America and secondary clusters in Asia and Europe.
  • D. Implement Cloud SQL for PostgreSQL with the master in North America and read replicas in Asia and Europe.

Answer: B


NEW QUESTION # 210
You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filter on timestamp and ID selects a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

  • A. Create a separate table for each ID.
  • B. Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.
  • C. Use the LIMIT keyword to reduce the number of rows returned.
  • D. Recreate the table with a partitioning column and clustering column.

Answer: D
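
The effect of partitioning and clustering on bytes scanned can be shown with a toy model. This is a sketch under simplifying assumptions, not BigQuery itself: the table contents and sizes are made up, and real clustering prunes at the storage-block level rather than exactly per value.

```python
from collections import defaultdict
from datetime import date

# Toy table: (event_date, id, payload_bytes). Values are made up.
ROWS = [
    (date(2024, 1, 1), "a", 100),
    (date(2024, 1, 1), "b", 100),
    (date(2024, 1, 2), "a", 100),
    (date(2024, 1, 2), "b", 100),
    (date(2024, 1, 3), "a", 100),
    (date(2024, 1, 3), "b", 100),
]


def bytes_scanned_full(rows):
    """Unpartitioned table: every row is read before the filter applies."""
    return sum(size for _, _, size in rows)


def bytes_scanned_pruned(rows, day, row_id):
    """Partitioned by day and clustered by id: read only the matching block."""
    blocks = defaultdict(int)
    for event_date, rid, size in rows:
        blocks[(event_date, rid)] += size
    return blocks[(day, row_id)]


print(bytes_scanned_full(ROWS))
print(bytes_scanned_pruned(ROWS, date(2024, 1, 2), "a"))
```

The same WHERE clause reads the whole table in the first case and a single block in the second, which is why the existing SQL needs little or no change once the table is recreated.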


NEW QUESTION # 211
Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?

  • A. Use Google Stackdriver Audit Logs to review data access.
  • B. Get the identity and access management (IAM) policy of each table.
  • C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
  • D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.

Answer: A


NEW QUESTION # 212
You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes batch at the edge, come in all at once, and cause a spike in load on your Kafka cluster.
This is becoming difficult to manage and prohibitively expensive. What is the Google-recommended cloud native architecture for this scenario?

  • A. Cloud Dataflow connected to the Kafka cluster to scale the processing of incoming messages.
  • B. An IoT gateway connected to Cloud Pub/Sub, with Cloud Dataflow to read and process the messages from Cloud Pub/Sub.
  • C. A Kafka cluster virtualized on Compute Engine in us-east with Cloud Load Balancing to connect to the devices around the world.
  • D. Edge TPUs as sensor devices for storing and transmitting the messages.

Answer: B


NEW QUESTION # 213
You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm. To do this you need to add a synthetic feature. What should the value of that feature be?

  • A. X^2
  • B. X^2+Y^2
  • C. Y^2
  • D. cos(X)

Answer: D


NEW QUESTION # 214
Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine.
The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks. What should you do?

  • A. Grant the user access to Google Cloud Shell.
  • B. Run a local version of Jupyter on the laptop.
  • C. Host a visualization tool on a VM on Google Compute Engine.
  • D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.

Answer: D


NEW QUESTION # 215
Case Study: 2 - MJTelco
Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.
Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to account for the impact of dynamic regional politics on location availability and cost. Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.
Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.
Business Requirements
Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
Provide reliable and timely access to data for analysis from distributed research workers.
Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.
Technical Requirements
Ensure secure and efficient transport and storage of telemetry data.
Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day.
Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.
CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.
CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.
CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.
You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. It is company policy to ensure employees can view only the data associated with their region, so you create and populate a table for each region. You need to enforce the regional access policy to the data.
Which two actions should you take? (Choose two.)

  • A. Ensure each table is included in a dataset for a region.
  • B. Adjust the settings for each view to allow a related region-based security group view access.
  • C. Ensure all the tables are included in global dataset.
  • D. Adjust the settings for each table to allow a related region-based security group view access.
  • E. Adjust the settings for each dataset to allow a related region-based security group view access.

Answer: A,E


NEW QUESTION # 216
The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

  • A. Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.
  • B. Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.
  • C. Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.
  • D. Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.

Answer: C
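
The merge approach in option C replaces many row-level UPDATE statements with one load and one set-based job. The sketch below shows the merge semantics on plain dictionaries. It is only an illustration of the idea, not BigQuery code; the function name and the sample customer IDs are hypothetical.

```python
def merge_tables(existing: dict, updates: dict) -> dict:
    """Return a new table: updated rows replace matches, new rows are added.

    The inputs are left untouched, mirroring a job that writes its result
    to a new table instead of modifying rows in place.
    """
    merged = dict(existing)
    merged.update(updates)
    return merged


existing = {"cust-1": "bronze", "cust-2": "silver"}
updates = {"cust-2": "gold", "cust-3": "bronze"}
merged = merge_tables(existing, updates)
print(merged)
```

One pass over the loaded records produces the final table, so the number of DML statements no longer grows with the number of records in the CSV.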


NEW QUESTION # 217
Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during their campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all streaming inserts. What is the most likely cause of this problem?

  • A. They have not set the triggers to accommodate the data coming in late, which causes the job to fail
  • B. They have not assigned the timestamp, which causes the job to fail
  • C. They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
  • D. They have not applied a global windowing function, which causes the job to fail when the pipeline is created

Answer: C
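
An unbounded stream never finishes, so a grouping over the single global window never completes; a non-global window gives each group a defined end. The sketch below shows the arithmetic behind fixed windows in plain Python. It is not the Apache Beam API, and the function name and sample events are hypothetical.

```python
from collections import defaultdict


def fixed_windows(events, window_seconds):
    """Group (timestamp_seconds, value) pairs into fixed, non-global windows.

    Returns {window_start: [values]}; each window covers
    [window_start, window_start + window_seconds).
    """
    windows = defaultdict(list)
    for timestamp, value in events:
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start].append(value)
    return dict(windows)


events = [(3, "a"), (59, "b"), (61, "c"), (125, "d")]
grouped = fixed_windows(events, 60)
print(grouped)
```

With 60-second windows the four events fall into three windows starting at 0, 60, and 120 seconds, and each window can be emitted as soon as it closes.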


NEW QUESTION # 218
You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may erroneously acknowledge messages, leading to message loss. Your subscriber is not set up to retain acknowledged messages. What should you do to ensure that you can recover from errors after deployment?

  • A. Enable dead-lettering on the Pub/Sub topic to capture messages that aren't successfully acknowledged. If an error occurs after deployment, re-deliver any messages captured by the dead-letter queue.
  • B. Use Cloud Build for your deployment. If an error occurs after deployment, use a Seek operation to locate a timestamp logged by Cloud Build at the start of the deployment.
  • C. Set up the Pub/Sub emulator on your local machine. Validate the behavior of your new subscriber logic before deploying it to production.
  • D. Create a Pub/Sub snapshot before deploying new subscriber code. Use a Seek operation to re-deliver messages that became available after the snapshot was created.

Answer: D

Explanation:
Reference: https://cloud.google.com/pubsub/docs/replay-overview
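
The replay mechanism in the reference above can be pictured with a toy model of a subscription's acknowledgement state. This is a simplified sketch, not the Pub/Sub client API: the class and its methods are hypothetical, and it only models the idea that a snapshot captures which messages were acknowledged at a point in time and that seeking to it restores that state.

```python
class Subscription:
    """Toy model of a subscription's acknowledgement state."""

    def __init__(self):
        self.log = []       # every message published, in order
        self.acked = set()  # ids the subscriber has acknowledged

    def publish(self, message_id):
        self.log.append(message_id)

    def ack(self, message_id):
        self.acked.add(message_id)

    def snapshot(self):
        # Capture the acknowledgement state as of now.
        return frozenset(self.acked)

    def seek(self, snapshot):
        # Restore the captured state; anything acknowledged later,
        # including messages published later, becomes unacknowledged.
        self.acked = set(snapshot)

    def backlog(self):
        return [m for m in self.log if m not in self.acked]


sub = Subscription()
sub.publish("m1")
sub.publish("m2")
sub.ack("m1")
snap = sub.snapshot()      # taken before deploying the new subscriber
sub.publish("m3")
sub.ack("m2")              # the faulty subscriber acknowledges in error
sub.ack("m3")
lost = sub.backlog()
sub.seek(snap)
recovered = sub.backlog()
print(lost, recovered)
```

After the faulty acknowledgements the backlog is empty; seeking to the snapshot makes `m2` and `m3` deliverable again, while `m1`, which was acknowledged before the snapshot, stays acknowledged.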


NEW QUESTION # 219
You are creating the CI/CD cycle for the code of the directed acyclic graphs (DAGs) running in Cloud Composer. Your team has two Cloud Composer instances: one instance for development and another instance for production. Your team is using a Git repository to maintain and develop the code of the DAGs. You want to deploy the DAGs automatically to Cloud Composer when a certain tag is pushed to the Git repository. What should you do?

  • A. 1. Use Cloud Build to copy the code of the DAG to the Cloud Storage bucket of the development instance for DAG testing.
    2. If the tests pass, use Cloud Build to build a container with the code of the DAG and the KubernetesPodOperator to deploy the container to the Google Kubernetes Engine (GKE) cluster of the production instance.
  • B. 1. Use Cloud Build to build a container and the KubernetesPodOperator to deploy the code of the DAG to the Google Kubernetes Engine (GKE) cluster of the development instance for testing.
    2. If the tests pass, copy the code to the Cloud Storage bucket of the production instance.
  • C. 1. Use Cloud Build to copy the code of the DAG to the Cloud Storage bucket of the development instance for DAG testing.
    2. If the tests pass, use Cloud Build to copy the code to the bucket of the production instance.
  • D. 1. Use Cloud Build to build a container with the code of the DAG and the KubernetesPodOperator to deploy the code to the Google Kubernetes Engine (GKE) cluster of the development instance for testing.
    2. If the tests pass, use the KubernetesPodOperator to deploy the container to the GKE cluster of the production instance.

Answer: C


NEW QUESTION # 220
You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm. To do this you need to add a synthetic feature. What should the value of that feature be?

  • A. X^2
  • B. X^2+Y^2
  • C. Y^2
  • D. cos(X)

Answer: D


NEW QUESTION # 221
You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems.
Which solution should you choose?

  • A. Dialogflow Enterprise Edition
  • B. Cloud Natural Language API
  • C. Cloud AutoML Natural Language
  • D. Cloud Speech-to-Text API

Answer: A

