Both efficiency and performance were discussed in earlier sections pertaining to the design of a partition strategy. An efficient query is one in which the time required to execute it is well used. That means the query should not be waiting on data shuffling or querying irrelevant data. The most efficient query would be one […]
Recommended File Types for Storage – Data Sources and Ingestion
Chapter 2 introduced the numerous file types and their use cases. If you need a refresher, go back to Chapter 2 to review. The following file formats are used most when working in the Big Data context: This code loads an existing session into a DataFrame and then creates a new DataFrame to contain the […]
Design a Distribution Strategy – Data Sources and Ingestion
When running your Big Data workloads using Azure Synapse Analytics dedicated SQL pools, how you distribute your data is worthy of meticulous consideration. To summarize, distribution is concerned with the way data is loaded onto the numerous nodes (that is, the compute machines) running your data analytics queries. When you execute a query, the platform chooses a […]
Create an Azure Data Lake Storage Container – Data Sources and Ingestion-2
The following options are available on the Advanced tab: Beginning with the selections you made during the provisioning of ADLS in Exercise 3.1, start with Enable Hierarchical Namespaces. If you do not select this, instead of getting an ADLS container, you get a general‐purpose v2‐based blob container. As discussed in Chapter 1, blob containers are […]
Where Does Data Come From? – Data Sources and Ingestion-2
Data can also come from social media outlets in the form of comments or ratings. An Azure Cognitive Service called the Language Understanding Intelligent Service (LUIS) can help you understand the meaning of comments. LUIS maps a comment to a meaning using something called an intent. If the comments contain words such as “bad,” “angry,” […]
Understanding Big Data Processing – CREATE DATABASE dbName; GO
The previous sections covered much of the mid‐level data theory required to be a great Azure data engineer and give you a good chance of passing the exam. Now it’s time to learn a bit about the processing of Big Data across the various data management stages. You will also learn about some different types […]
Analytics Types – CREATE DATABASE dbName; GO
There are numerous types of data analytics, all of which are supported by Azure products, specifically Azure Synapse Analytics. In most cases, the name of the analytics type is enough to determine its meaning and purpose. However, here are the most common analytics types and a brief summary of each. Descriptive: The descriptive analytic type […]
Where Does Data Come From? – Data Sources and Ingestion-1
Chapter 2, “CREATE DATABASE dbName; GO,” discussed the variety, velocity, and volume characteristics of data. You learned that the velocity and volume of data are increasing exponentially and that those two characteristics are the reason running data analytics in the cloud became necessary. Most companies cannot afford to purchase and maintain the compute and storage […]
AVG, MAX, MIN, SUM, COUNT – CREATE DATABASE dbName; GO
These are some of the most common SQL aggregate functions. You saw them in the previous section. You can use these functions to calculate the average, maximum, minimum, and total of numeric column values in one or more tables. The COUNT function returns the number of rows that match the SQL statement criteria. Note that after […]
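As a minimal sketch of the five aggregate functions named above, the following Python snippet runs them against an in-memory SQLite database. SQLite is used here only so that the example is self-contained. It is not the Azure Synapse engine, and the `reading` table, its columns, and its values are hypothetical.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (session_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO reading (session_id, value) VALUES (?, ?)",
    [(1, 2.0), (1, 4.0), (2, 6.0), (2, 8.0)],
)

# Average, maximum, minimum, and total of a numeric column, plus the row count.
row = conn.execute(
    "SELECT AVG(value), MAX(value), MIN(value), SUM(value), COUNT(*) FROM reading"
).fetchone()
avg_value, max_value, min_value, sum_value, row_count = row

# COUNT only counts the rows that match the statement criteria (the WHERE clause).
matching = conn.execute(
    "SELECT COUNT(*) FROM reading WHERE value > 3"
).fetchone()[0]

print(avg_value, max_value, min_value, sum_value, row_count, matching)
```

The same SELECT statements should carry over to other SQL engines with little or no change, because these five functions are part of standard SQL.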
CONVERT and CAST – CREATE DATABASE dbName; GO
CONVERT and CAST do essentially the same job, and there is no difference in their performance. CAST is part of the ANSI SQL standard, whereas CONVERT is specific to T-SQL and accepts an optional style argument that controls formatting, for example of dates. As long as you understand that both of these SQL functions are used to change the data type of data stored in a table, you have this one […]