Chapter 2, “CREATE DATABASE dbName; GO,” discussed the variety, velocity, and volume characteristics of data. You learned that the velocity and volume of data are increasing exponentially and that those two characteristics are the reason running data analytics in the cloud became necessary. Most companies cannot afford to purchase and maintain the compute and storage resources required to handle data at that scale. The variety of data, however, is where your expertise as an Azure Data Engineer becomes most valuable.
After passing the DP-203 exam, you will be expected to provision and configure the necessary Azure products for your data analytics solution. Given a set of requirements, you would know whether you needed Azure Stream Analytics, Apache Kafka, Azure Data Factory, Azure Synapse Analytics Spark, or SQL pool (serverless or dedicated), and so on. This would be expected from all certified Azure Data Engineer Associates. What is not tested is your ability to know your data and to know the questions you need to answer with it. This is not tested because of the variety of scenarios in which data exists, not only in form and location but also in meaning. Consider the following scenarios of where data can come from, how it might look, and what you might be able to learn from it:
- Sales forecasting
- Stock trading
- Social media
- Application logs
- IoT devices like a Brain Computer Interface (BCI)
If you want to predict what your company will realize in annual sales for the current year and in the next quarter, what kind of data would you need? Two ideas come to mind: the sales trend over the last few years and quarterly sales comparisons. Consider the following dataset:
+------+----------+----------+----------+----------+
| YEAR | SALES Q1 | SALES Q2 | SALES Q3 | SALES Q4 |
+------+----------+----------+----------+----------+
| 2020 | 1000     | 1100     | 1650     | 2900     |
| 2021 | 3050     | 3355     | 5000     | 8750     |
| 2022 | 9200     | ??       |          |          |
+------+----------+----------+----------+----------+
Over the past two years you can see a consistent increase in sales. In 2020 total sales were 6,650, and in 2021 total sales were 20,155, which is a little over three times the 2020 total (roughly 200% growth year over year). Using that data, you can predict expected total sales for 2022 by multiplying the total sales for 2021 by 300%, which gives roughly 60,465. You might also notice that sales in Q2 have consistently been 10% more than Q1; therefore, it might be safe to predict a sales target of 10,120 in Q2 of 2022. This is a simple example that ignores many elements that can influence predictions. However, as your data analytics become more sophisticated, you can apply algorithms that assess the factors that influence the sales predictions. You can then use those assessments to make sales predictions more precise and reliable. The data itself may be hosted in a relational database, where you can perform a simple query, or it may be ingested into your pipeline as a CSV file.
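The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration rather than a forecasting method: the names `sales` and `annual_total` are hypothetical, and the projection simply assumes that last year's growth ratio and the average Q2-to-Q1 ratio from the table will repeat.

```python
# Quarterly sales copied from the table; None marks quarters not yet reported.
sales = {
    2020: [1000, 1100, 1650, 2900],
    2021: [3050, 3355, 5000, 8750],
    2022: [9200, None, None, None],
}


def annual_total(year):
    """Sum the reported quarters for a given year (hypothetical helper)."""
    return sum(q for q in sales[year] if q is not None)


total_2020 = annual_total(2020)  # 6650
total_2021 = annual_total(2021)  # 20155

# Ratio of one full year to the previous one (a little over 3).
yoy_ratio = total_2021 / total_2020

# Naive annual projection: assume the same ratio applies again.
projected_2022_total = total_2021 * yoy_ratio

# Average Q2-to-Q1 ratio across the complete years.
q2_over_q1 = [sales[year][1] / sales[year][0] for year in (2020, 2021)]
avg_q2_ratio = sum(q2_over_q1) / len(q2_over_q1)

# Naive Q2 projection for 2022 based on the reported Q1 figure.
projected_q2_2022 = sales[2022][0] * avg_q2_ratio

print(f"Year-over-year ratio: {yoy_ratio:.2f}")
print(f"Projected 2022 total: {projected_2022_total:,.0f}")
print(f"Projected Q2 2022:    {projected_q2_2022:,.0f}")
```

Because the projection uses the measured ratio instead of a rounded 300%, the annual figure it prints is slightly higher than a flat tripling of the 2021 total.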
People generally invest in the stock market to make money by buying low and selling high. Some investors try to predict future prices from historical prices instead of looking at a company's profit and loss statement or balance sheet. You can download historical stock prices from many places on the Internet. The data might look something like the following Microsoft stock history:
Date,Open,High,Low,Close,Adj Close,Volume
2021-12-21,323.290009,327.730011,319.799988,327.290009,327.290009,24740600
2021-12-22,328.299988,333.609985,325.750000,333.200012,333.200012,24831500
2021-12-23,332.750000,336.390015,332.730011,334.690002,334.690002,19617800
2021-12-27,335.459991,342.480011,335.429993,342.450012,342.450012,19947000
2021-12-28,343.149994,343.809998,340.320007,341.250000,341.250000,15661500
2021-12-29,341.299988,344.299988,339.679993,341.950012,341.950012,15042000
2021-12-30,341.910004,343.130005,338.820007,339.320007,339.320007,15994500
Using an approach similar to the sales prediction example, you can find the direction of the price trend by comparing the daily or quarterly average closing prices. If the price is trending upward, you might want to buy; if not, you might hold off. There are numerous properties and elements to employ when analyzing stock prices in hopes of finding the one that makes you wealthy. That is exactly the point. Although you have the data, the data alone is not enough to yield insights. Using the data in isolation or in combination with a massive number of other sources and data types can still produce unexpected results. The magic sauce typically comes from the individual (you) performing the data analytics, because in many cases that individual also has experience with that specific type or variety of data.
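As a rough illustration of comparing average closing prices, the following Python sketch parses the sample rows shown earlier and compares the mean close of the first few days with the mean close of the last few. The function names `closing_prices` and `trend_direction` and the three-day window are assumptions made for this example; a real analysis would use far more data and more careful methods.

```python
import csv
import io
from statistics import mean

# The sample Microsoft price history from this section, held in memory.
CSV_TEXT = """Date,Open,High,Low,Close,Adj Close,Volume
2021-12-21,323.290009,327.730011,319.799988,327.290009,327.290009,24740600
2021-12-22,328.299988,333.609985,325.750000,333.200012,333.200012,24831500
2021-12-23,332.750000,336.390015,332.730011,334.690002,334.690002,19617800
2021-12-27,335.459991,342.480011,335.429993,342.450012,342.450012,19947000
2021-12-28,343.149994,343.809998,340.320007,341.250000,341.250000,15661500
2021-12-29,341.299988,344.299988,339.679993,341.950012,341.950012,15042000
2021-12-30,341.910004,343.130005,338.820007,339.320007,339.320007,15994500
"""


def closing_prices(csv_text):
    """Return the Close column as a list of floats, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row["Close"]) for row in reader]


def trend_direction(closes, window=3):
    """Compare the mean of the first and last `window` closing prices."""
    if len(closes) < 2 * window:
        raise ValueError("not enough rows for two non-overlapping windows")
    earlier = mean(closes[:window])
    later = mean(closes[-window:])
    if later > earlier:
        return "up"
    if later < earlier:
        return "down"
    return "flat"


closes = closing_prices(CSV_TEXT)
direction = trend_direction(closes)
print(f"Trend over {len(closes)} trading days: {direction}")
```

A result of "up" or "down" here only describes the sample window; as the text notes, the direction of a short series is not enough on its own to support a buying decision.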