Design a Partition Strategy for Efficiency and Performance – Data Sources and Ingestion


Both efficiency and performance were discussed in earlier sections on designing a partition strategy. An efficient query is one that makes good use of the time required to execute it. That means the query should not spend time waiting on data shuffling or scanning irrelevant data. The most efficient query would be one […]
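
To make "not scanning irrelevant data" concrete, here is a minimal sketch of partition pruning, assuming a Hive-style `year=/month=` folder layout. The paths, the `DataFile` type, and the `prune` helper are hypothetical; engines such as Apache Spark apply this kind of pruning automatically when a filter references the partition columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    year: int
    month: int


def parse_partition(path: str) -> DataFile:
    """Extract year=/month= values from a hypothetical Hive-style file path."""
    parts = dict(
        segment.split("=", 1) for segment in path.split("/") if "=" in segment
    )
    return DataFile(path, int(parts["year"]), int(parts["month"]))


def prune(files: list[DataFile], year: int, month: int) -> list[DataFile]:
    """Keep only the files whose partition values match the query predicate."""
    return [f for f in files if f.year == year and f.month == month]


paths = [
    "readings/year=2022/month=03/part-0000.parquet",
    "readings/year=2022/month=04/part-0000.parquet",
    "readings/year=2022/month=04/part-0001.parquet",
    "readings/year=2023/month=04/part-0000.parquet",
]
files = [parse_partition(p) for p in paths]
selected = prune(files, 2022, 4)
print(f"scanned {len(selected)} of {len(files)} files")  # scanned 2 of 4 files
```

Because the predicate is answered from folder names alone, the two files outside April 2022 are never opened.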

Design a Folder Structure That Represents the Levels of Data Transformation – Data Sources and Ingestion


Once data is ingested and initially stored in what is commonly referred to as a data landing zone (DLZ), it flows through the other Big Data stages. More in‐depth detail about the Big Data transformation stage is covered in Part III, “Develop Data Processing.” For now, only the file structure to support data […]
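
One way to make a folder structure represent the levels of transformation is to name the top-level folder after the stage. The sketch below assumes three hypothetical stage names (`landing`, `cleansed`, `curated`) and a hypothetical container called `datalake`; substitute whatever stages your pipeline actually uses.

```python
from enum import Enum
from pathlib import PurePosixPath


class Stage(Enum):
    # Hypothetical stage names, listed in processing order.
    LANDING = "landing"    # raw files exactly as received (the DLZ)
    CLEANSED = "cleansed"  # validated, deduplicated, typed data
    CURATED = "curated"    # aggregated, business-ready data


def stage_path(container: str, stage: Stage, dataset: str) -> PurePosixPath:
    """Build a path whose first folder under the container names the stage."""
    return PurePosixPath("/", container, stage.value, dataset)


def promote(path: PurePosixPath) -> PurePosixPath:
    """Return the path the same dataset occupies in the next stage."""
    stages = list(Stage)
    current = Stage(path.parts[2])  # parts: ('/', container, stage, dataset...)
    index = stages.index(current)
    if index == len(stages) - 1:
        raise ValueError(f"{current.value} is already the final stage")
    return PurePosixPath("/", path.parts[1], stages[index + 1].value, *path.parts[3:])


raw = stage_path("datalake", Stage.LANDING, "sales")
print(raw)           # /datalake/landing/sales
print(promote(raw))  # /datalake/cleansed/sales
```

Keeping the dataset path identical across stages means a dataset's lineage can be read directly from its location.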

Design for Efficient Querying – Data Sources and Ingestion


You can take numerous steps to optimize the performance and manageability of the files stored in ADLS. The following actions can improve query efficiency: Use this information as a basis for the design of your storage structure.

File Size, Type, and Quantity

The more data contained within a file, the larger it is and the […]
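
File quantity matters because many small files add per-file overhead to every query, so a common remedy is compacting them into fewer, larger files. The greedy planner below is a minimal sketch of that idea; the 256 MB target is an assumption chosen for illustration, not a value stated in this section, and the file names are hypothetical.

```python
MB = 1024 * 1024
TARGET_BYTES = 256 * MB  # assumed target file size, for illustration only


def plan_compaction(
    file_sizes: dict[str, int], target_bytes: int = TARGET_BYTES
) -> list[list[str]]:
    """Greedily group small files into batches of at most target_bytes.

    Each returned batch is a set of files that could be rewritten as one file.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    # Visit the smallest files first so they are packed together.
    for name, size in sorted(file_sizes.items(), key=lambda item: item[1]):
        if size >= target_bytes:
            continue  # already at or above the target; leave it alone
        if current and current_size + size > target_bytes:
            batches.append(current)  # this batch is full; start a new one
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    # Rewriting a single file on its own gains nothing, so drop one-file batches.
    return [batch for batch in batches if len(batch) > 1]


sizes = {
    "part-00.csv": 10 * MB,
    "part-01.csv": 20 * MB,
    "part-02.csv": 30 * MB,
    "part-03.csv": 200 * MB,
    "part-04.csv": 300 * MB,
}
print(plan_compaction(sizes))  # [['part-00.csv', 'part-01.csv', 'part-02.csv']]
```

Here five files become three: the three small ones merge, while the 200 MB and 300 MB files are left as they are.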

Design a Distribution Strategy – Data Sources and Ingestion


When running your Big Data workloads on Azure Synapse Analytics dedicated SQL pools, how you distribute your data deserves careful consideration. In short, distribution is concerned with the way data is loaded onto the numerous nodes (aka compute machines) that run your data analytics queries. When you execute a query, the platform chooses a […]
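
A dedicated SQL pool spreads each table across 60 distributions, and with hash distribution the value of the distribution column decides where each row lands. The sketch below models that placement to show why a low-cardinality column causes data skew. MD5 is a stand-in chosen here; the hash function Synapse uses internally is not modeled, and `distribution_for` and `skew_ratio` are hypothetical helpers.

```python
import hashlib
from collections import Counter

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table across 60 distributions


def distribution_for(key: str) -> int:
    """Map a distribution-column value to a distribution index.

    MD5 is only a stand-in for the platform's internal hash function.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


def skew_ratio(keys: list[str]) -> float:
    """Rows on the busiest distribution divided by the average per distribution."""
    counts = Counter(distribution_for(k) for k in keys)
    average = len(keys) / DISTRIBUTIONS
    return max(counts.values()) / average


customer_ids = [f"customer-{i}" for i in range(60_000)]  # high cardinality
regions = ["EMEA", "AMER", "APAC"] * 20_000              # only 3 distinct values
print(f"customer_id skew: {skew_ratio(customer_ids):.2f}")
print(f"region skew:      {skew_ratio(regions):.2f}")
```

With only three distinct region values, at most three of the 60 distributions receive any rows, so a query's work is concentrated on a few nodes while the rest sit idle. The customer ID column spreads rows far more evenly.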

Design a Partition Strategy for Files – Data Sources and Ingestion


Having an intuitive directory structure for ingesting data is a prerequisite for implementing a partitioning strategy. You may not know how the received files will be formatted in all scenarios; therefore, analysis and preliminary transformation are often required before any major action, such as partitioning, takes place. A directory structure similar to the following is […]
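
The exact directory structure is elided in this excerpt, so the sketch below uses a hypothetical layout, `/<source>/in/<yyyy>/<mm>/<dd>/<hh>/`, to show how received files can be routed by arrival time into an intake folder before any partitioning decision is made. The source name and `ingestion_directory` helper are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def ingestion_directory(source: str, received: datetime) -> str:
    """Build a hypothetical landing path: /<source>/in/<yyyy>/<mm>/<dd>/<hh>/.

    Times are normalized to UTC so files from different time zones
    land in consistent folders.
    """
    if received.tzinfo is None:
        raise ValueError("received must be timezone-aware")
    utc = received.astimezone(timezone.utc)
    return f"/{source}/in/{utc:%Y/%m/%d/%H}/"


stamp = datetime(2022, 4, 1, 18, 30, tzinfo=timezone.utc)
local = datetime(2022, 4, 1, 20, 30, tzinfo=timezone(timedelta(hours=2)))
print(ingestion_directory("sensor-feed", stamp))  # /sensor-feed/in/2022/04/01/18/
print(ingestion_directory("sensor-feed", local))  # /sensor-feed/in/2022/04/01/18/
```

Keeping unprocessed files under an `in` folder leaves room to analyze and preliminarily transform them before they are moved into a partitioned layout.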

Create an Azure Data Lake Storage Container – Data Sources and Ingestion-3


Some Azure products generate costs even when not actively used, whereas others do not. An empty ADLS container does not incur any costs, but one that consumes space does. You should remove resources that are no longer being used. Make sure to perform due diligence when provisioning Azure products, as you will be required to […]

Create an Azure Data Lake Storage Container – Data Sources and Ingestion-2


The following options are available on the Advanced tab: Beginning with the selections you made during the provisioning of ADLS in Exercise 3.1, start with Enable Hierarchical Namespaces. If you do not select this, instead of getting an ADLS container, you get a general‐purpose v2‐based blob container. As discussed in Chapter 1, blob containers are […]
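
One practical consequence of the Enable Hierarchical Namespaces option shows up when a directory is renamed. In a flat blob namespace, folders are only shared name prefixes, so every blob under the folder has to be copied to a new name and deleted; with a hierarchical namespace, directories are real objects and a rename is a single atomic metadata operation. The in-memory model below illustrates that contrast; it does not call any Azure API, and both function names are hypothetical.

```python
def rename_flat(blobs: dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace: rewrite every blob whose name starts with 'old/'.

    Returns the number of per-blob operations (one copy plus one delete each).
    """
    operations = 0
    for name in [n for n in blobs if n.startswith(old + "/")]:
        blobs[new + name[len(old):]] = blobs.pop(name)  # copy to new name, delete old
        operations += 2
    return operations


def rename_hierarchical(root: dict, old: str, new: str) -> int:
    """Hierarchical namespace: the directory is one node, so one update suffices."""
    root[new] = root.pop(old)
    return 1


flat = {f"raw/2022/file-{i}.csv": b"" for i in range(1000)}
tree = {"raw": {"2022": {f"file-{i}.csv": b"" for i in range(1000)}}}
print(rename_flat(flat, "raw", "landing"))          # 2000
print(rename_hierarchical(tree, "raw", "landing"))  # 1
```

The flat rename grows with the number of files under the folder, whereas the hierarchical rename stays constant, which is one reason hierarchical namespaces suit analytics workloads that reorganize folders.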