Create an Azure Data Lake Storage Container – Data Sources and Ingestion-3

Some Azure products generate costs even when they are not actively used, whereas others do not. An empty ADLS container incurs no storage costs, but one that holds data does. Remove resources that you are no longer using, and perform due diligence when provisioning Azure products, because you will pay for what you consume.

The next tab, Networking, provides options for restricting access to the Azure storage account at the network level. The Network Connectivity section includes an option named Connectivity Method, which has the following possible values:

  • Public Endpoint (All Networks)
  • Public Endpoint (Selected Networks)
  • Private Endpoint

The default is Public Endpoint (All Networks), which means the account's globally discoverable URL is reachable from the Internet. This does not mean anyone can access the content in the Azure storage account, because requests must still be authenticated and authorized. It simply means the endpoint can be reached. If you select Public Endpoint (Selected Networks), you are prompted to select an existing virtual network (VNet) or create a new one. Once the network is configured, only resources in that VNet (plus any IP address ranges you explicitly allow) can access the Azure storage account. This protection applies to inbound traffic only, and the global endpoint remains visible. The Private Endpoint option removes the global endpoint from Internet discoverability and requires binding the account to a VNet, through which it is reached on a private IP address.
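
To make the difference between the first two connectivity methods concrete, here is a minimal Python sketch of how a Selected Networks rule set could be evaluated for an incoming request. It is a simplified model, not the Azure implementation or SDK. The `NetworkRules` and `Request` classes, the mode strings, the subnet identifier, and the IP ranges (taken from documentation address blocks) are all hypothetical, and authentication is assumed to happen separately after the network check.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import List, Optional, Set


@dataclass
class Request:
    source_ip: str
    # Hypothetical: the VNet/subnet the caller runs in, or None for an Internet client.
    subnet_id: Optional[str] = None


@dataclass
class NetworkRules:
    # "all" models Public Endpoint (All Networks); "selected" models Selected Networks.
    mode: str = "all"
    allowed_subnets: Set[str] = field(default_factory=set)
    allowed_ip_ranges: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.mode not in ("all", "selected"):
            raise ValueError(f"unsupported mode: {self.mode!r}")

    def allows(self, req: Request) -> bool:
        """Network-level check only; authentication would happen afterwards."""
        if self.mode == "all":
            return True  # endpoint reachable from anywhere on the Internet
        if req.subnet_id is not None and req.subnet_id in self.allowed_subnets:
            return True  # caller is inside an allowed VNet subnet
        addr = ip_address(req.source_ip)
        return any(addr in ip_network(cidr) for cidr in self.allowed_ip_ranges)


rules = NetworkRules(
    mode="selected",
    allowed_subnets={"vnet-data/subnet-ingest"},
    allowed_ip_ranges=["203.0.113.0/24"],
)
print(rules.allows(Request("198.51.100.7")))                                  # False
print(rules.allows(Request("10.0.1.4", subnet_id="vnet-data/subnet-ingest")))  # True
print(rules.allows(Request("203.0.113.25")))                                  # True
```

Passing the network check is not the same as being authorized. In this model, as in the description above, the All Networks mode admits every request at the network layer and leaves access control to the authentication step.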

The Network Routing section includes an option named Routing Preferences with the following possible values:

  • Microsoft Network Routing
  • Internet Routing

When you select Microsoft Network Routing, traffic to your Azure storage account enters the Microsoft global network as close to the client as possible. The Internet Routing option keeps traffic on the public Internet until it reaches the Microsoft network edge closest to the region where the Azure storage account is hosted. For example, suppose your ADLS container is in the western United States and the client that needs access is in the eastern United States. With Microsoft Network Routing enabled, the traffic enters the Microsoft network at a point of presence in the eastern US, and all traffic between the client and the ADLS container then flows primarily within the Microsoft network. With Internet Routing selected, the traffic from the eastern US traverses the Internet to the western US and enters the Microsoft network there. As a result, most of the traffic between the client and the endpoint is transmitted over the Internet instead of within the Microsoft network.
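
The east/west example can be reduced to a toy calculation. The sketch below is a hypothetical model, not an Azure API. It treats the path from a client to a storage account as an ordered list of Microsoft edge locations (the city names are made up for illustration). It then splits the path into legs carried over the public Internet and legs carried inside the Microsoft network, depending on where each routing preference enters.

```python
from typing import List, Tuple

Leg = Tuple[str, str]


def split_route(waypoints: List[str], preference: str) -> Tuple[List[Leg], List[Leg]]:
    """Split a client-to-storage path into (Internet legs, Microsoft-network legs).

    waypoints: hypothetical Microsoft edge locations, ordered from the one
    nearest the client to the one nearest the storage account's region.
    """
    if not waypoints:
        raise ValueError("need at least one edge location")
    if preference == "microsoft":
        ingress = 0  # enter the Microsoft network at the edge nearest the client
    elif preference == "internet":
        ingress = len(waypoints) - 1  # stay on the Internet until the edge nearest the storage region
    else:
        raise ValueError(f"unknown routing preference: {preference!r}")
    hops = ["client"] + waypoints + ["storage account"]
    legs = list(zip(hops, hops[1:]))
    # legs[i] ends at waypoints[i], so legs[: ingress + 1] form the Internet portion.
    return legs[: ingress + 1], legs[ingress + 1 :]


# Client in the eastern US, ADLS container in the western US.
path = ["New York", "Chicago", "Denver", "Seattle"]
for pref in ("microsoft", "internet"):
    internet, microsoft = split_route(path, pref)
    print(pref, "Internet legs:", len(internet), "Microsoft legs:", len(microsoft))
# microsoft Internet legs: 1 Microsoft legs: 4
# internet Internet legs: 4 Microsoft legs: 1
```

Counting legs is only a crude stand-in for distance. The point of the sketch is where the ingress point moves: Microsoft routing puts it at the client's end of the path, and Internet routing puts it at the storage account's end.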

The last tab, Data Protection, lets you configure recovery, tracking, and access control options. Many of these options are disabled when ADLS Gen2 (hierarchical namespace) is the targeted storage container type, because many of them have a great impact on performance and can cause latency when enabled. Data protection options differ between blob and ADLS containers, and options that are not available are grayed out and disabled in the Azure Portal. The following list contains all available Azure Storage container options:

  • Recovery
    • Enable Point‐in‐time Restore for Containers.
    • Enable Soft Delete for Blobs.
    • Enable Soft Delete for Containers.
    • Enable Soft Delete for Shares.
  • Tracking
    • Enable Versioning for Blobs.
    • Enable Blob Change Feed.
  • Access Control
    • Enable Version‐level Immutability Support.

If the data in your container becomes corrupt or gets deleted, having the Enable Point‐in‐time Restore for Containers option configured means you would not lose everything. You could roll the container back to a specific time when you know the data was in a valid state. Point‐in‐time restore builds on blob versioning, the change feed, and soft delete rather than on separately scheduled backups. This option is not yet supported when you are using hierarchical namespaces. When the Enable Soft Delete for Blobs option is selected, deleted files are retained for 7 days by default so that you can recover them. The Enable Soft Delete for Containers and Enable Soft Delete for Shares options work the same way: deleted containers and shares are retained for a default of 7 days before being permanently deleted.

Tracking changes to the blobs in your container, and keeping versions of them, is important for monitoring what changed and when. Enable Versioning for Blobs and Enable Blob Change Feed provide these features: versioning keeps the previous version of a blob when it is overwritten, and the change feed keeps an ordered log of changes.

You might have a requirement to retain complete historical versions of any updated file. Instead of changing a file in place and updating its version metadata, a new version is created with the update, and the old one is preserved, protected from modification and deletion, for historical reference. To achieve that, select the Enable Version‐level Immutability Support check box.
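
The recovery and tracking options can be illustrated with a toy in-memory model. The `ToyContainer` class below is hypothetical, not the Azure SDK. It keeps an append-only version history per blob (versioning) and marks deletions instead of erasing data (soft delete with the 7-day default retention). It also answers "what did this blob contain at time T?", which is the idea behind point-in-time restore. The timestamps and the `eeg.csv` file name are made up for illustration, and writes are assumed to arrive in time order.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# Default soft-delete retention described in the text; configurable in the model.
DEFAULT_RETENTION = timedelta(days=7)


@dataclass
class ToyBlob:
    # Append-only history of (write time, content); old entries are never edited.
    versions: List[Tuple[datetime, bytes]] = field(default_factory=list)
    deleted_at: Optional[datetime] = None


class ToyContainer:
    """Hypothetical in-memory model of versioning, soft delete, and
    point-in-time lookups. Not the Azure SDK; many details are ignored."""

    def __init__(self, retention: timedelta = DEFAULT_RETENTION) -> None:
        self.retention = retention
        self.blobs: Dict[str, ToyBlob] = {}

    def write(self, name: str, data: bytes, at: datetime) -> None:
        blob = self.blobs.setdefault(name, ToyBlob())
        blob.versions.append((at, data))  # keep prior content as older versions
        blob.deleted_at = None

    def delete(self, name: str, at: datetime) -> None:
        self.blobs[name].deleted_at = at  # soft delete: mark it, keep the data

    def undelete(self, name: str) -> bool:
        blob = self.blobs.get(name)  # only possible until purge_expired removes it
        if blob is None or blob.deleted_at is None:
            return False
        blob.deleted_at = None
        return True

    def purge_expired(self, now: datetime) -> List[str]:
        expired = [n for n, b in self.blobs.items()
                   if b.deleted_at is not None and now - b.deleted_at >= self.retention]
        for n in expired:
            del self.blobs[n]  # permanent deletion once the retention period has passed
        return expired

    def content_at(self, name: str, when: datetime) -> Optional[bytes]:
        """Return the content that was current at `when` (a point-in-time view)."""
        blob = self.blobs.get(name)
        if blob is None:
            return None
        if blob.deleted_at is not None and blob.deleted_at <= when:
            return None
        current = None
        for written, data in blob.versions:
            if written <= when:
                current = data  # versions are assumed to be appended in time order
        return current


t0 = datetime(2024, 1, 1)
c = ToyContainer()
c.write("eeg.csv", b"v1", t0)
c.write("eeg.csv", b"v2-corrupt", t0 + timedelta(days=2))
print(c.content_at("eeg.csv", t0 + timedelta(days=1)))  # b'v1'
c.delete("eeg.csv", t0 + timedelta(days=3))
print(c.purge_expired(t0 + timedelta(days=5)))  # [] (still inside the 7-day window)
print(c.undelete("eeg.csv"))  # True
```

Keeping versions append-only is what makes the point-in-time lookup possible. Rolling back never has to reconstruct overwritten data, because the older version was never overwritten in the first place.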

While you are in the ADLS mindset, complete Exercise 3.2 to load some brain wave data to an ADLS container.
