
Create an Azure Data Lake Storage Container – Data Sources and Ingestion-2

The following options are available on the Advanced tab:

  • Security
    • Require Secure Transfer for REST API Operations.
    • Enable Infrastructure Encryption.
    • Enable Blob Public Access.
    • Enable Storage Account Key Access.
    • Default to Azure Active Directory Authorization in the Azure Portal.
    • Minimum TLS Version.
  • Data Lake Storage Gen2
    • Enable Hierarchical Namespace.
  • SSH File Transfer Protocol (SFTP)
    • Enable SFTP.
  • Blob Storage
    • Enable Network File System v3.
    • Allow Cross‐tenant Replication.
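If you later script the provisioning instead of using the portal, each option above corresponds to a property on the storage account resource. The following is a minimal sketch of that mapping. The property names are an assumption based on the Microsoft.Storage/storageAccounts resource schema and should be verified against the current template reference; `build_properties` is a hypothetical helper, not part of any Azure SDK.

```python
# Portal option -> resource property name. These names are assumptions taken
# from the Microsoft.Storage/storageAccounts schema; verify before relying on them.
ADVANCED_TAB_TO_ARM = {
    "Require Secure Transfer for REST API Operations": "supportsHttpsTrafficOnly",
    "Enable Infrastructure Encryption": "encryption.requireInfrastructureEncryption",
    "Enable Blob Public Access": "allowBlobPublicAccess",
    "Enable Storage Account Key Access": "allowSharedKeyAccess",
    "Default to Azure Active Directory Authorization in the Azure Portal":
        "defaultToOAuthAuthentication",
    "Minimum TLS Version": "minimumTlsVersion",
    "Enable Hierarchical Namespace": "isHnsEnabled",
    "Enable SFTP": "isSftpEnabled",
    "Enable Network File System v3": "isNfsV3Enabled",
    "Allow Cross-tenant Replication": "allowCrossTenantReplication",
}


def build_properties(choices):
    """Turn {portal option: value} into a nested properties dictionary.

    Hypothetical helper for illustration only: dotted names such as
    'encryption.requireInfrastructureEncryption' become nested dictionaries.
    """
    props = {}
    for option, value in choices.items():
        path = ADVANCED_TAB_TO_ARM[option].split(".")
        node = props
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return props


# The two selections made in Exercise 3.1.
exercise_3_1 = build_properties({
    "Enable Hierarchical Namespace": True,
    "Enable Network File System v3": True,
})
```

Keeping the mapping in one place makes it easier to review which portal selections a script reproduces.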

Beginning with the selections you made while provisioning ADLS in Exercise 3.1, start with Enable Hierarchical Namespace. If you do not select this option, you get a general‐purpose v2 blob container instead of an ADLS container. As discussed in Chapter 1, blob containers are flat and do not support wildcard syntax; they also do not perform well when files are renamed or deleted, because a directory in a flat namespace is only a shared name prefix and each affected blob must be processed individually. A hierarchical namespace performs better than a flat structure when storing and retrieving files. It also aligns well with the implementation of access control lists (ACLs), which are used to control access to files.
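The performance difference is easiest to see with a rename. The following is a toy, in‐memory sketch, not how the service is implemented: a flat namespace is modeled as a dictionary keyed by full blob name, and a hierarchical namespace as nested dictionaries. Both function names and the sample paths are hypothetical.

```python
def rename_dir_flat(blobs, old_prefix, new_prefix):
    """Flat namespace: a 'directory' is only a shared name prefix, so every
    blob under it is moved to a new key. Returns the number of operations."""
    ops = 0
    # Snapshot the matching names first so the dict is not mutated mid-iteration.
    for name in [n for n in blobs if n.startswith(old_prefix)]:
        blobs[new_prefix + name[len(old_prefix):]] = blobs.pop(name)
        ops += 1
    return ops


def rename_dir_hierarchical(tree, parent_path, old_name, new_name):
    """Hierarchical namespace: a directory is a real node, so renaming it is a
    single update on its parent, however many files it contains."""
    parent = tree
    for part in parent_path:
        parent = parent[part]
    parent[new_name] = parent.pop(old_name)
    return 1


flat = {"raw/2022/a.csv": b"1", "raw/2022/b.csv": b"2", "raw/2023/c.csv": b"3"}
flat_ops = rename_dir_flat(flat, "raw/2022/", "raw/archive/")

tree = {"raw": {"2022": {"a.csv": b"1", "b.csv": b"2"}, "2023": {"c.csv": b"3"}}}
tree_ops = rename_dir_hierarchical(tree, ["raw"], "2022", "archive")
```

In the flat model the cost grows with the number of files under the prefix; in the hierarchical model it stays constant.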

The other option selected during Exercise 3.1 was Enable Network File System v3. This option can be set only during provisioning and cannot be changed afterward, and it is required for many products to be configured as a Linked Service in Azure Synapse Analytics. It allows files to be shared across the network; if the Enable Network File System v3 option is not enabled, the data cannot be shared over NFS. To enable it later, you would have to provision a new Azure storage account, which may or may not have serious implications. For example, if you have already transferred a large amount of data to the Azure storage account and now must move it to a new account, the transfer can be both costly and time‐consuming.

You learned about endpoints in Chapter 1. Most Azure products have a globally discoverable address accessible over HTTP. Requiring secure transfer for REST API operations enforces HTTPS and rejects requests made over plain HTTP. By default, the data you place into an Azure storage account is encrypted. This is known as encryption at rest. When you enable infrastructure encryption, your data is encrypted twice: once at the default service level and then again at the infrastructure level. Enabling blob public access allows clients to access blobs over the public endpoint; both anonymous and authenticated access options are available. If blob public access is enabled, anonymous access to the files stored in a container is allowed but is not the default. If it is disabled, anonymous access is not permitted at all. You can still apply ACLs on the container to restrict access.
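Because a storage account that requires secure transfer rejects plain HTTP, it can be useful to catch a wrong scheme on the client before a request is sent. The following is a minimal sketch using only the standard library. The function name `require_secure_transfer` and the account name `contosolake` are hypothetical.

```python
from urllib.parse import urlsplit


def require_secure_transfer(url):
    """Return url unchanged if it uses HTTPS; otherwise raise ValueError.

    Client-side pre-check only; the storage account enforces the real rule.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"secure transfer required, got scheme {scheme!r}")
    return url


# 'contosolake' is a made-up account name used for illustration.
endpoint = require_secure_transfer(
    "https://contosolake.dfs.core.windows.net/raw/file.csv"
)
```

A check like this does not replace the server‐side setting; it only surfaces a misconfigured URL earlier.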

Clients typically connect to an Azure storage account over HTTPS. The Enable Storage Account Key Access option allows clients to authenticate with a storage account key, for example through a shared access signature appended to the end of the endpoint URL. This is a valid approach, but in most cases using a managed identity or Azure AD is the more secure, recommended, and long‐term approach. The storage account keys could become compromised or be re‐created. In both cases, the ability to connect to the Azure storage account would be impacted for some time. In mission‐critical scenarios, this could be a catastrophe.

It is possible to access the contents of your ADLS container via the Azure Portal. To protect the ADLS contents using Azure AD, enable the Default to Azure Active Directory Authorization in the Azure Portal option.

HTTPS has historically used Secure Sockets Layer (SSL), the latest version of which is 3.0. There is now Transport Layer Security (TLS), with the most widely supported and secure version being TLS 1.2. To enforce the use of only the most secure version of encryption over HTTPS, select Version 1.2 from the Minimum TLS Version drop‐down box. Be warned, however, that some older client machines do not support this version. If you enforce TLS 1.2, every client machine that does not support that version will fail when connecting.
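On the client side, you can confirm that a machine is able to negotiate TLS 1.2 before you enforce it on the storage account. The following is a minimal sketch using Python's standard `ssl` module. The helper name `make_tls12_context` is hypothetical.

```python
import ssl


def make_tls12_context():
    """Build a client TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname checks on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls12_context()
```

Passing this context to an HTTPS call, for example `urllib.request.urlopen(url, context=ctx)`, makes the handshake fail on the client if only an older protocol version can be negotiated, which mirrors what the Minimum TLS Version setting enforces on the server.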

If you have a business need to open access to an SFTP client, select the Enable SFTP option. Recall from Chapter 1 that Azure Active Directory is bound to a tenant. By default, you are not able to replicate content in your container to another Azure AD tenant. If this is necessary, enable the Allow Cross‐tenant Replication option. Azure Files, Tables, and Queues are not in scope here, so the final two options are not discussed; each option's name is enough to convey its purpose.

Finally, when creating the ADLS container in Exercise 3.1, you might have noticed the Public Access Level drop‐down, which contained Private (no anonymous access), Blob (anonymous read access for blobs only), and Container (anonymous read access for containers and blobs). If you select Private, the client must have an authorized credential to access the container, which means the client must be configured to send it along with each request for files. If you select Blob, all blobs in the container can be read from any client without authentication. If you select Container, the blobs are readable and the client can also list the container and so discover what it holds.
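The three levels can be summarized as a small decision table. The following is an illustrative model only, not an Azure API. The enum, the function `anonymous_allowed`, and the operation names `read_blob` and `list_container` are all hypothetical.

```python
from enum import Enum


class PublicAccessLevel(Enum):
    PRIVATE = "Private (no anonymous access)"
    BLOB = "Blob (anonymous read access for blobs only)"
    CONTAINER = "Container (anonymous read access for containers and blobs)"


def anonymous_allowed(level, operation):
    """Return True if an unauthenticated client may perform the operation.

    operation is 'read_blob' or 'list_container' (names made up for this sketch).
    """
    if operation not in ("read_blob", "list_container"):
        raise ValueError(f"unknown operation: {operation!r}")
    if level is PublicAccessLevel.PRIVATE:
        return False
    if level is PublicAccessLevel.BLOB:
        return operation == "read_blob"
    return True  # CONTAINER: blobs are readable and the container is listable
```

For example, `anonymous_allowed(PublicAccessLevel.BLOB, "list_container")` returns `False`: with the Blob level, a client needs to know a blob's name in advance because it cannot enumerate the container.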
