Design a Partition Strategy – Microsoft Certified: Azure Data Engineer Associate Study Guide

Design a Partition Strategy for Efficiency and Performance – Data Sources and Ingestion

Jas Moore. Updated on 08/03/2024. Published on 08/03/2024.

Both efficiency and performance were discussed in earlier sections pertaining to the design of a partition strategy. An efficient query is one in which the time required to execute it is well used. That means the query should not be waiting on data shuffling or querying irrelevant data. The most efficient query would be one […]

AVG, MAX, MIN, SUM, COUNT; Design a Partition Strategy; Microsoft DP-203
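The point about not querying irrelevant data can be illustrated with a small sketch. It assumes a hypothetical set of (year, reading) rows and uses a plain dictionary to stand in for partitions; the names `ROWS`, `partition_by_year`, `scan_all`, and `scan_partition` are illustrative only and are not part of any Azure API.

```python
from collections import defaultdict

# Hypothetical sample data: (year, reading) pairs.
ROWS = [(2021, 4.0), (2021, 6.0), (2022, 5.0), (2023, 7.0), (2023, 9.0)]


def partition_by_year(rows):
    """Group rows into one bucket per year, mimicking one partition per key."""
    partitions = defaultdict(list)
    for year, reading in rows:
        partitions[year].append(reading)
    return partitions


def scan_all(rows, year):
    """Unpartitioned query: every row is read, relevant or not."""
    rows_read = len(rows)
    matches = [reading for y, reading in rows if y == year]
    return matches, rows_read


def scan_partition(partitions, year):
    """Partitioned query: only the bucket for the requested year is read."""
    matches = list(partitions.get(year, []))
    return matches, len(matches)


PARTITIONS = partition_by_year(ROWS)
full_matches, full_read = scan_all(ROWS, 2023)
pruned_matches, pruned_read = scan_partition(PARTITIONS, 2023)

print(full_matches, full_read)      # [7.0, 9.0] 5
print(pruned_matches, pruned_read)  # [7.0, 9.0] 2
```

Both queries return the same readings, but the partitioned one touches two rows instead of five, which is the sense in which its execution time is well used.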

Recommended File Types for Storage – Data Sources and Ingestion

Jas Moore. Updated on 08/03/2024. Published on 04/07/2024.

Chapter 2 introduced the numerous file types and their use cases. If you need a refresher, go back to Chapter 2 to review. The following file formats are used most when working in the Big Data context: This code loads an existing session into a DataFrame and then creates a new DataFrame to contain the […]

Design a Partition Strategy; Microsoft DP-203

Design a Distribution Strategy – Data Sources and Ingestion

Jas Moore. Updated on 08/03/2024. Published on 02/11/2024.

When running your Big Data workloads using Azure Synapse Analytics dedicated SQL pools, how you distribute your data is worthy of meticulous consideration. To summarize, distribution is concerned with the way data is loaded onto the numerous nodes (aka compute machines) running your data analytics queries. When you execute a query, the platform chooses a […]

CARTESIAN JOIN; Design a Partition Strategy; Microsoft DP-203
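To make the idea of spreading rows across compute concrete, here is a minimal sketch comparing hash and round-robin placement. It relies on the fact that a dedicated SQL pool divides each table into 60 distributions; everything else is an assumption for illustration. In particular, `zlib.crc32` is only a stand-in for the platform's internal hash function, and the `customer-N` keys are hypothetical data.

```python
import zlib
from collections import Counter

# A dedicated SQL pool spreads each table over 60 distributions.
DISTRIBUTION_COUNT = 60


def hash_distribution(key: str) -> int:
    """Place a row by hashing its distribution column (crc32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % DISTRIBUTION_COUNT


def round_robin_distribution(row_index: int) -> int:
    """Place a row by its arrival order, ignoring its contents."""
    return row_index % DISTRIBUTION_COUNT


# Hypothetical table: 600 rows but only 10 distinct customer ids.
ROWS = [(f"customer-{i % 10}", i) for i in range(600)]

hash_placement = Counter(hash_distribution(cid) for cid, _ in ROWS)
rr_placement = Counter(round_robin_distribution(i) for i in range(len(ROWS)))

print("distributions used (hash):", len(hash_placement))
print("distributions used (round robin):", len(rr_placement))
print("largest distribution (hash):", max(hash_placement.values()))
print("largest distribution (round robin):", max(rr_placement.values()))
```

Round robin fills all 60 distributions evenly, while hashing a column with only 10 distinct values can use at most 10 of them. Hashing does guarantee that rows with the same key land together, which is what lets joins and aggregations on that key avoid data movement, so the choice of distribution column matters.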

Create an Azure Data Lake Storage Container – Data Sources and Ingestion-2

Jas Moore. Updated on 08/03/2024. Published on 11/02/2023.

The following options are available on the Advanced tab: Beginning with the selections you made during the provisioning of ADLS in Exercise 3.1, start with Enable Hierarchical Namespaces. If you do not select this, instead of getting an ADLS container, you get a general‐purpose v2‐based blob container. As discussed in Chapter 1, blob containers are […]

Design a Partition Strategy; Microsoft DP-203; Training and Enrichment

Where Does Data Come From? – Data Sources and Ingestion-2

Jas Moore. Updated on 08/03/2024. Published on 10/05/2023.

Data also can come from social media outlets in the form of comments or ratings. An Azure Cognitive Service called the Language Understanding Intelligent Service (LUIS) can help you understand the meaning of comments. LUIS converts a comment into a meaning using something called an intent. If the comments contain words such as “bad,” “angry,” […]

Design a Partition Strategy; Microsoft DP-203; Training and Enrichment

Understanding Big Data Processing – CREATE DATABASE dbName; GO

Jas Moore. Updated on 08/03/2024. Published on 06/29/2023.

The previous sections covered much of the mid‐level data theory required to be a great Azure data engineer and give you a good chance of passing the exam. Now it’s time to learn a bit about the processing of Big Data across the various data management stages. You will also learn about some different types […]

Design a Partition Strategy; Microsoft DP-203; Training and Enrichment; Where Does Data Come From?

Analytics Types – CREATE DATABASE dbName; GO

Jas Moore. Updated on 08/03/2024. Published on 04/28/2023.

There are numerous types of data analytics, all of which are supported by Azure products, specifically Azure Synapse Analytics. In most cases, the name of the analytics type is enough to determine its meaning and purpose. However, here are the most common analytics types and a brief summary of each. Descriptive The descriptive analytic type […]

Design a Partition Strategy; Microsoft DP-203; Where Does Data Come From?

Where Does Data Come From? – Data Sources and Ingestion-1

Jas Moore. Updated on 08/03/2024. Published on 03/22/2023.

Chapter 2, “CREATE DATABASE dbName; GO,” discussed the variety, velocity, and volume characteristics of data. You learned that the velocity and volume of data are increasing exponentially and that those two characteristics are the reason running data analytics in the cloud became necessary. Most companies cannot afford to purchase and maintain the compute and storage […]

CARTESIAN JOIN; Design a Partition Strategy; Microsoft DP-203; Training and Enrichment

AVG, MAX, MIN, SUM, COUNT – CREATE DATABASE dbName; GO

Jas Moore. Updated on 08/03/2024. Published on 07/18/2022.

These are some of the most common aggregate SQL functions. You saw them in the previous section. You can use these functions to calculate the average, maximum, minimum, and total of numeric column values on one or more tables. The COUNT function returns the number of rows that match the SQL statement criteria. Note that after […]

AVG, MAX, MIN, SUM, COUNT; Design a Partition Strategy; Microsoft DP-203; Training and Enrichment
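The five aggregate functions described in the excerpt above can be tried out with a minimal sketch. It uses Python's built-in `sqlite3` module rather than a dedicated SQL pool, so treat it as an illustration of the standard SQL behavior only; the `reading` table, its columns, and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database; the table and its data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (scenario TEXT, reading_value REAL)")
conn.executemany(
    "INSERT INTO reading (scenario, reading_value) VALUES (?, ?)",
    [("A", 2.0), ("A", 4.0), ("B", 6.0), ("B", 8.0)],
)

# Aggregate over every row in the table.
totals = conn.execute(
    "SELECT AVG(reading_value), MAX(reading_value), MIN(reading_value), "
    "SUM(reading_value), COUNT(*) FROM reading"
).fetchone()

# COUNT only counts the rows that match the WHERE criteria.
scenario_b_count = conn.execute(
    "SELECT COUNT(*) FROM reading WHERE scenario = ?", ("B",)
).fetchone()[0]

print(totals)            # (5.0, 8.0, 2.0, 20.0, 4)
print(scenario_b_count)  # 2
conn.close()
```

The same `SELECT` statements should carry over to other SQL dialects, since these five functions are part of standard SQL.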

CONVERT and CAST – CREATE DATABASE dbName; GO

Jas Moore. Updated on 08/03/2024. Published on 06/05/2022.

CONVERT and CAST are essentially the same: both change the data type of an expression, and there is no meaningful difference in their performance. CAST is part of the ANSI SQL standard, while CONVERT is specific to T-SQL and accepts an optional style argument for formatting values such as dates. As long as you understand that both of these SQL functions are used to change the data type of data stored in a table, you have this one […]
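A short sketch of a type conversion in practice follows. It runs against Python's built-in `sqlite3` module, which supports `CAST` but not `CONVERT`, so the T-SQL `CONVERT` form appears only in a comment for comparison; the literal values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CAST turns the text '42' into an integer, so arithmetic works on it.
# The T-SQL CONVERT equivalent would be: SELECT CONVERT(INT, '42') + 1
as_integer = conn.execute("SELECT CAST('42' AS INTEGER) + 1").fetchone()[0]

# SQLite's typeof() confirms the data type after the cast.
type_name = conn.execute("SELECT typeof(CAST('42' AS INTEGER))").fetchone()[0]

# Casting a fractional number to INTEGER drops the fraction.
truncated = conn.execute("SELECT CAST(3.9 AS INTEGER)").fetchone()[0]

print(as_integer)  # 43
print(type_name)   # integer
print(truncated)   # 3
conn.close()
```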