
Create an Azure Data Lake Storage Container – Data Sources and Ingestion-1

  1. Log in to the Azure portal at https://portal.azure.com ➢ click the menu button on the top left of the browser ➢ click + Create a Resource ➢ click Storage ➢ click Storage Account; if not found, search for Storage Account ➢ select it ➢ and then click Create.
  2. Select the subscription and resource group where you want the storage account to reside ➢ enter a storage account name (I used csharpguitar) ➢ select a region (ideally the same as the resource group's location, although this is not required) ➢ and then select Locally Redundant Storage (LRS) from the Replication drop‐down. Leave all the remaining options as the defaults.
  3. Click the Next: Advanced > button ➢ check the Enable Hierarchical Namespace check box in the Data Lake Storage Gen2 section ➢ check the Enable Network File System v3 check box in the Blob Storage section (if it is grayed out, leave it as the default) ➢ feel free to navigate through the other tabs, but leave everything else as the default ➢ go to the Review + Create tab ➢ and then click Create.
  4. Once provisioning is complete, navigate to the Overview blade of the storage account. In the center of the blade, you will see something similar to Figure 3.2.

FIGURE 3.2 An Azure storage account Overview blade

  5. Click the Data Lake Storage link on the Overview blade or Containers from the navigation menu ➢ click + Container ➢ enter a Name (I used brainjammer) ➢ and then click Create.
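
The portal choices made in the steps above can also be written down as data. The following is a minimal sketch, not the exercise's own method: it checks the account and container names against Azure's naming rules and builds the URL and JSON body that an Azure Resource Manager request for such an account would carry. The helper names and the api-version value are assumptions made for illustration, and nothing is sent to Azure.

```python
import re

# Assumption: any recent Microsoft.Storage api-version should work here.
API_VERSION = "2023-01-01"


def validate_account_name(name: str) -> bool:
    """Storage account names are 3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None


def validate_container_name(name: str) -> bool:
    """Container names are 3-63 characters of lowercase letters, digits, and
    hyphens; every hyphen must sit between two letters or digits."""
    if not 3 <= len(name) <= 63:
        return False
    return re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) is not None


def build_account_request(subscription_id, resource_group, account_name, location):
    """Return the (url, body) pair describing the account from steps 2 and 3."""
    if not validate_account_name(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    url = (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}/providers/Microsoft.Storage"
        f"/storageAccounts/{account_name}?api-version={API_VERSION}"
    )
    body = {
        "sku": {"name": "Standard_LRS"},        # step 2: Locally Redundant Storage
        "kind": "StorageV2",                    # general-purpose v2 account
        "location": location,                   # step 2: region
        "properties": {"isHnsEnabled": True},   # step 3: hierarchical namespace
    }
    return url, body


if __name__ == "__main__":
    # The subscription ID, resource group, and region below are placeholders.
    url, body = build_account_request(
        "00000000-0000-0000-0000-000000000000", "my-rg", "csharpguitar", "westeurope"
    )
    print(url)
    print(body)
    print(validate_container_name("brainjammer"))
```

Validating names up front is worthwhile because the portal rejects uppercase letters and special characters in both fields, and the same rules apply to any scripted deployment.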

Exercise 3.1 walked you through provisioning an ADLS container. You encountered numerous options, beginning with the first items selected, the subscription and resource group. Remember that an Azure subscription is the location where billing happens. It is a grouping of all provisioned Azure products. You can have multiple subscriptions within what is called a management group. Similarly, you can have multiple resource groups. A resource group logically groups together resources within a subscription and is where you would typically create all the provisioned Azure resources for a given project, thereby providing better visibility of the costs on a project basis. Having all the related Azure products grouped together is also helpful. The hierarchy, management group ➢ subscription ➢ resource group, is also the place where you typically enforce role‐based access control (RBAC) restrictions. Granting access and privileges to groups of individuals (not to individuals themselves) is recommended at the resource group level. See Chapter 1, “Gaining the Azure Data Engineer Associate Certification,” for a review of RBAC.

When selecting the region (aka datacenter location), you need to consider two things. First, where is the data being produced? Second, where will the data be consumed? Ideally, you choose a location that is close to both. When that is not possible, favor the entity that is most affected by latency or that requires the greatest performance based on business need. Simply put, when choosing the region, consider where the producers and consumers are relative to the location where the data is stored.
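
One way to make this trade-off concrete is to score each candidate region by the latency its producers and consumers would see, weighted by how much each party matters to the business. The sketch below is only an illustration: the function name, the weights, and every latency figure are made up, and in practice you would measure latency yourself.

```python
def pick_region(latency_ms, weights):
    """Return the region with the lowest weighted latency.

    latency_ms: {region: {party: round-trip latency in ms}}
    weights:    {party: relative importance of that party}
    """
    def score(region):
        return sum(weights[party] * latency_ms[region][party] for party in weights)

    return min(latency_ms, key=score)


# Illustrative numbers only; these are not measurements.
latency_ms = {
    "westeurope": {"producer": 20, "consumer": 90},
    "eastus": {"producer": 100, "consumer": 15},
    "northeurope": {"producer": 35, "consumer": 80},
}

if __name__ == "__main__":
    # Producers and consumers matter equally.
    print(pick_region(latency_ms, {"producer": 1.0, "consumer": 1.0}))
    # Consumers are three times as latency sensitive as producers.
    print(pick_region(latency_ms, {"producer": 1.0, "consumer": 3.0}))
```

Raising the weight of the latency-critical party moves the choice toward the region closest to it, which is the reasoning described above.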

Next, the redundancy levels (see Chapter 1), LRS, ZRS, GRS, and GZRS, are available from the drop‐down list for the storage account that hosts the ADLS container. GRS is the default and means that your data is copied three times within the selected region and then again three times in a secondary region. Exercise 3.1 recommended LRS, which keeps three copies within a single datacenter, because it costs less. Because you are not running in production mode, LRS is a valid choice. ZRS means the data is replicated across multiple availability zones in the same region, and GZRS means ZRS in the primary region plus geo‐replication to a secondary region.
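
The four options can be summarized in a small lookup table. This is a sketch for study purposes, assuming the Standard performance tier; the class and function names are hypothetical, while the SKU strings are the ones Azure uses for these redundancy levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    sku: str                 # SKU name used when creating the storage account
    primary_placement: str   # where the three primary-region copies live
    secondary_region: bool   # True if data is also copied to a paired region


REDUNDANCY = {
    "LRS": Redundancy("Standard_LRS", "single datacenter", False),
    "ZRS": Redundancy("Standard_ZRS", "three availability zones", False),
    "GRS": Redundancy("Standard_GRS", "single datacenter", True),
    "GZRS": Redundancy("Standard_GZRS", "three availability zones", True),
}


def total_copies(option: str) -> int:
    """Three copies in the primary region, plus three more (stored with LRS)
    in the secondary region for the geo-redundant options."""
    return 3 + (3 if REDUNDANCY[option].secondary_region else 0)


if __name__ == "__main__":
    for name, r in REDUNDANCY.items():
        print(name, r.sku, r.primary_placement, total_copies(name))
```

Read this way, GRS is LRS with a secondary region added, and GZRS is ZRS with a secondary region added.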

Rounding out the Basics tab is the choice between the Standard and Premium performance tiers. This comes down to how fast you need your read and write operations on the files stored in the container to be. If your data analytics solution will be running at a company or enterprise level, seriously consider Premium, which provides the best performance. For testing, experimenting, and small data analytics projects, Standard will be sufficient.
