Design a Partition Strategy for Efficiency and Performance – Data Sources and Ingestion

Both efficiency and performance were discussed in earlier sections on designing a partition strategy. An efficient query is one that makes good use of the time it takes to execute. That is, the query should not spend time waiting on data shuffling or reading irrelevant data. The most efficient query would be one […]
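Avoiding "querying irrelevant data" is what partition elimination (pruning) achieves. The following is a minimal Python sketch, assuming a hypothetical in-memory table whose rows are partitioned by month; a dict of lists stands in for the separate files or distributions a real engine would use. A query that filters on the partition key reads only the matching partition instead of scanning every row.

```python
from collections import defaultdict

# Hypothetical rows: (order_month, amount). The month is the partition key.
rows = [
    ("2023-01", 120.0),
    ("2023-01", 80.0),
    ("2023-02", 200.0),
    ("2023-03", 50.0),
    ("2023-03", 75.0),
]


def build_partitions(data):
    """Group rows into partitions keyed by month."""
    partitions = defaultdict(list)
    for month, amount in data:
        partitions[month].append(amount)
    return partitions


def full_scan_total(data, month):
    """Inefficient: inspects every row, including irrelevant ones.

    Returns (total, rows_scanned).
    """
    scanned = 0
    total = 0.0
    for m, amount in data:
        scanned += 1
        if m == month:
            total += amount
    return total, scanned


def pruned_total(partitions, month):
    """Efficient: reads only the partition that can hold matching rows.

    Returns (total, rows_scanned).
    """
    matching = partitions.get(month, [])
    return sum(matching), len(matching)


partitions = build_partitions(rows)
print(full_scan_total(rows, "2023-03"))     # (125.0, 5)
print(pruned_total(partitions, "2023-03"))  # (125.0, 2)
```

Both queries return the same total, but the pruned version touches two rows instead of five; on real data volumes that difference is where the wasted execution time goes.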

Partitioning – Data Sources and Ingestion

As discussed in Chapter 2, partitioning is a way to logically structure data. The closer together the queried data is physically stored, the faster the query returns results. In Chapter 2 you worked with PolyBase and CTAS, where you added a PARTITION argument to the WITH clause so that the data was allocated properly across […]
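To illustrate how a list of range boundaries allocates rows to partitions, here is a Python sketch that mimics the RANGE LEFT / RANGE RIGHT boundary semantics used by the PARTITION option on Azure Synapse dedicated SQL pool tables, for example `PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20230101, 20230201))`. The column name, boundary values, and the `partition_index` helper are hypothetical; the engine performs this assignment itself.

```python
from bisect import bisect_left, bisect_right


def partition_index(value, boundaries, range_right=True):
    """Return the 0-based partition number a value is assigned to.

    `boundaries` must be sorted ascending; N boundaries produce N + 1 partitions.
    RANGE RIGHT: a boundary value belongs to the partition on its right.
    RANGE LEFT:  a boundary value belongs to the partition on its left.
    """
    if range_right:
        return bisect_right(boundaries, value)
    return bisect_left(boundaries, value)


# Hypothetical date-key boundaries: two boundaries -> three partitions.
boundaries = [20230101, 20230201]

for key in (20221231, 20230101, 20230115, 20230201):
    right = partition_index(key, boundaries)
    left = partition_index(key, boundaries, range_right=False)
    print(f"{key}: RANGE RIGHT -> partition {right}, RANGE LEFT -> partition {left}")
```

The only rows whose placement differs between the two options are those equal to a boundary value, which is why date-based partitions are commonly declared with RANGE RIGHT on the first day of each period.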

Create an Azure Data Lake Storage Container – Data Sources and Ingestion

Some Azure products generate costs even when not actively used, whereas others do not. An empty ADLS container does not incur any costs, but one that consumes space does. You should remove resources that are no longer being used. Make sure to perform due diligence when provisioning Azure products, as you will be required to […]

Summary – CREATE DATABASE dbName; GO

This chapter covered a lot of ground, from a description of the first data storage device to real‐time data analysis and intelligence gathering. You learned about different file formats, such as JSON, CSV, and Parquet, and about the different ways in which data can be stored: structured, semi‐structured, and unstructured. […]

Training and Enrichment – CREATE DATABASE dbName; GO

Training and enrichment of data typically happen by improving data quality or by invoking Azure Machine Learning models, which can later be consumed by Azure Cognitive Services. The invocation can take place within a pipeline or manually. Azure Machine Learning models can be used to predict future outcomes based on historical trends […]

SQL Server Integration Services – CREATE DATABASE dbName; GO

Introduced in Chapter 1, SQL Server Integration Services (SSIS) is useful for pulling data from numerous datastores, transforming it, and storing it in a central datastore for analysis. Here, ingestion is initiated by pulling data from existing sources, as opposed to having data producers push data into the pipeline. Bulk Copy Program […]
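The pull model described above can be sketched in Python as a small extract-transform-load loop: the pipeline reaches out to each source on its own schedule, normalizes the records into one schema, and writes them to a central store. The source functions, field names, and the in-memory list standing in for the central datastore are all hypothetical; SSIS itself models these steps as control flow and data flow components, not Python functions.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


# Hypothetical sources. In a pull model the pipeline calls these;
# the sources never push data on their own.
def crm_source() -> Iterable[Record]:
    return [{"CustomerName": " Contoso ", "Total": "120.50"}]


def erp_source() -> Iterable[Record]:
    return [{"customer": "Fabrikam", "amount": 75}]


def transform(record: Record) -> Record:
    """Normalize the differing source schemas into one shape."""
    name = record.get("CustomerName", record.get("customer"))
    amount = record.get("Total", record.get("amount"))
    return {"customer": str(name).strip(), "amount": float(amount)}


def run_pull_pipeline(sources: List[Callable[[], Iterable[Record]]]) -> List[Record]:
    central_store: List[Record] = []  # stands in for the central analytics datastore
    for extract in sources:           # Extract: pull from each source in turn
        for raw in extract():
            central_store.append(transform(raw))  # Transform, then Load
    return central_store


print(run_pull_pipeline([crm_source, erp_source]))
```

In a push design the sources would instead call into the pipeline (for example, by sending events as they occur); here the pipeline controls when and how often data is collected.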

Analytics Types – CREATE DATABASE dbName; GO

There are numerous types of data analytics, all of which are supported by Azure products, in particular Azure Synapse Analytics. In most cases, the name of an analytics type is enough to determine its meaning and purpose. Even so, here are the most common analytics types, with a brief summary of each.

Descriptive

The descriptive analytics type […]
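Descriptive analytics is generally understood as summarizing what has already happened. As a minimal sketch of that idea, the following Python code computes basic summary statistics with the standard `statistics` module; the daily sales figures and the `describe` helper are hypothetical.

```python
import statistics

# Hypothetical daily sales figures for one week.
daily_sales = [120, 95, 130, 110, 150, 90, 105]


def describe(values):
    """Summarize historical values: count, total, central tendency, and spread."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


for name, value in describe(daily_sales).items():
    print(f"{name}: {value}")
```

Nothing here predicts or recommends anything; it only reports on past data, which is what separates descriptive analytics from the predictive and prescriptive types.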

Where Does Data Come From? – Data Sources and Ingestion

Chapter 2, “CREATE DATABASE dbName; GO,” discussed the variety, velocity, and volume characteristics of data. You learned that the velocity and volume of data are increasing exponentially and that those two characteristics are the reason running data analytics in the cloud became necessary. Most companies cannot afford to purchase and maintain the compute and storage […]

Design a Data Storage Structure – Data Sources and Ingestion

In this chapter you will provision numerous Azure data analytics products. By doing so, you will learn more about the products and their features, which can help you choose the best tool for your solution requirements. Choosing the proper service for each scenario is the foundation of a solid design. Table […]

CORR – CREATE DATABASE dbName; GO

This function returns the coefficient of correlation for a set of number pairs. CORR determines whether a relationship exists between the paired values it receives. The result ranges from −1 to 1, where a value of 1 or −1 means there is a perfect positive or negative correlation between the two sets of numbers, and 0 means there is no […]
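To make the −1 to 1 range concrete, here is a minimal Python sketch of the Pearson correlation coefficient, which is what SQL aggregates such as Spark SQL's `corr(expr1, expr2)` compute over a set of number pairs. That the CORR function discussed here uses exactly this formula is an assumption, and the sample data is invented. Raising `ValueError` for fewer than two pairs or a constant input is a choice of this sketch, not documented engine behavior.

```python
import math
from typing import Sequence


def corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of the pairs (xs[i], ys[i]).

    Returns a value in [-1, 1]. Raises ValueError if the inputs differ in
    length, contain fewer than two pairs, or either input is constant
    (the coefficient is undefined in that case).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance numerators (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


print(corr([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(corr([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
print(corr([1, 2, 3, 4], [1, 3, 3, 1]))  # 0.0  (no linear relationship)
```

Note that a single pair of numbers cannot produce a meaningful coefficient; correlation is only defined across multiple paired observations.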