
Where Does Data Come From? – Data Sources and Ingestion-2

Data can also come from social media outlets in the form of comments or ratings. An Azure Cognitive Service called the Language Understanding Intelligent Service (LUIS) can help you determine the meaning of those comments. LUIS maps a comment to a meaning using a construct called an intent. If a comment contains words such as “bad,” “angry,” “hard,” or “upset,” then LUIS returns an intent equal to something like “negative.” If the comment contains words like “happy,” “love,” “friends,” or “good,” then the result would be “positive.” On platforms like Twitter, Instagram, and Facebook, where billions of messages are posted daily, you might be able to gauge the societal sentiment for a given day. With more focused analytics, you might be able to narrow down society’s opinion on a specific current issue. Data can be uploaded to LUIS in bulk in the following format:

Business is good
Can you recommend a good restaurant?
That smells bad
Have a good trip
That person is a very good student
That’s too bad
I feel good
There is a restaurant over there, but I don’t think it’s very good
I’m happy

Each row represents a comment, with a line break marking the end of the phrase. The response to each phrase is a rating in JSON format, similar to the following. The score measures how probable it is that the intent is correct; a score of 1 would represent 100 percent certainty that the intent of the sentence is positive.

{"query": "Business is good", "topScoringIntent": {"intent": "positive", "score": "0.856282966"}}
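A response like this is straightforward to consume in code. The following is a minimal sketch in Python that parses the sample response above and extracts the intent and score; the response string is the one shown in this section, not the full payload a live LUIS endpoint would return.

```python
import json

# Sample LUIS-style response from above; a live endpoint returns more fields
response = ('{"query": "Business is good", '
            '"topScoringIntent": {"intent": "positive", "score": "0.856282966"}}')

result = json.loads(response)
intent = result["topScoringIntent"]["intent"]
score = float(result["topScoringIntent"]["score"])  # score arrives as a string in this sample

print(f"{result['query']!r} -> {intent} ({score:.2%} confidence)")
```

Note that the score is quoted in the sample, so it must be converted to a float before any numeric comparison or aggregation.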

Monitoring the availability and performance of an application requires activity logs. The volume and velocity of the data generated are driven primarily by the frequency and utilization of the application, along with the verbosity level of logging configured in the monitored application. Verbosity determines whether you capture information-level logs or only critical errors. Information logs occur much more often than critical errors; at least you hope that is the case. When the application is coded to generate logs, you can store them and analyze them offline, or you can monitor them in real time and trigger alerts that immediately notify someone who can take action, depending on how critical the application is to the business. The format in which these logs are written is entirely up to the team that codes the logging logic into the application. Log files are typically text-based and may resemble the following:

Date time s-sitename cs-method sc-status sc-substatus sc-bytes time-taken
2022-05-02 12:01 CSHARPGUITAR POST 500 0 2395146 44373
2022-05-02 12:01 CSHARPGUITAR POST 404 14 11118 783
2022-05-02 12:02 CSHARPGUITAR POST 403 6 8640 1055
2022-05-02 12:04 CSHARPGUITAR POST 503 1 32911 104437
2022-05-02 12:04 CSHARPGUITAR POST 200 0 32911 95
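Because the format is application-defined, ingesting such a log usually starts with a simple parser. Here is a minimal sketch that splits the space-delimited lines above into dictionaries keyed by the header fields and counts 5xx server errors; the field names come from the sample header, and the error-counting rule is just one illustrative check.

```python
# Three lines from the sample log above, space-delimited
log_text = """\
2022-05-02 12:01 CSHARPGUITAR POST 500 0 2395146 44373
2022-05-02 12:01 CSHARPGUITAR POST 404 14 11118 783
2022-05-02 12:02 CSHARPGUITAR POST 403 6 8640 1055
"""

# Field names follow the header row of the sample log
fields = ["date", "time", "s-sitename", "cs-method",
          "sc-status", "sc-substatus", "sc-bytes", "time-taken"]

entries = [dict(zip(fields, line.split())) for line in log_text.splitlines()]

# Flag availability problems: count server-side (5xx) errors
server_errors = sum(1 for e in entries if e["sc-status"].startswith("5"))
print(f"{server_errors} server error(s) out of {len(entries)} requests")
```

In a real pipeline the same parsing logic would run over files pulled from the server, or over a streaming source, before the records land in your analytics store.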

Sometimes the log file contains a full error message; other times it contains only an error number, which requires additional analysis to interpret. Here is another example of an application log:

Date time cs-version cs-method sc-status s-siteid s-reason s-queuename
2022-11-09 08:15 HTTP/1.1 GET 503 2 Disabled csharpguitar
2022-11-09 08:16 HTTP/1.1 GET 403 1 Forbidden brainjammer
2022-11-09 08:16 HTTP/1.1 GET 400 1 BadRequest brainjammer
2022-11-09 08:19 HTTP/1.1 HEAD 400 2 Hostname csharpguitar
2022-11-09 08:20 HTTP/1.1 POST 411 1 LengthRequired brainjammer

On‐premises application logs are typically written to a file on the machine where the application runs. In that scenario you would need to pull them at scheduled intervals and ingest them into your data analytics pipeline. The Azure platform includes numerous products to provide this kind of analysis and alerting, such as Application Insights, Log Analytics, and Azure Monitor.
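A minimal sketch of that scheduled-pull pattern, assuming a local log file: remember the byte offset where the previous run stopped, and on each run read only the lines appended since then. The file path here is a temporary stand-in for a real application log, and shipping the lines onward to a pipeline is left out.

```python
import os
import tempfile

def read_new_lines(path, offset):
    """Return lines appended since byte `offset`, plus the new offset."""
    with open(path, "r") as f:
        f.seek(offset)
        return f.readlines(), f.tell()

# Demonstration: a temporary file stands in for the application log
path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(path, "w") as f:
    f.write("2022-05-02 12:01 CSHARPGUITAR POST 500 0 2395146 44373\n")

lines, offset = read_new_lines(path, 0)            # first scheduled pull
with open(path, "a") as f:                         # the application keeps logging
    f.write("2022-05-02 12:02 CSHARPGUITAR POST 200 0 32911 95\n")
new_lines, offset = read_new_lines(path, offset)   # next pull reads only the new line
print(len(lines), len(new_lines))
```

Products such as Application Insights and Azure Monitor agents handle this tracking for you, but the offset-based idea is the same.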

IoT devices are a relatively new source of data. Humidity trackers, light bulbs, automobiles, alarm clocks, and brain computer interfaces (BCIs) are all examples of IoT devices. The data analytics examples for the remainder of this book involve the BCI described in Chapter 2. You have already seen many examples of data generated from the BCI that was captured and stored on a local workstation, then uploaded ad hoc to the Azure platform. The same data could instead have been saved to an Azure Cosmos DB as a JSON document or into an Azure SQL database in real time. An objective of this analysis is to stream the brain data and determine in real time which activity (aka scenario) the person is performing. The following is an example of a brain wave reading in JSON format:

{"Session": {"Scenario": "ClassicalMusic", "POWReading": [{"ReadingDate": "2021-09-12T09:00:18.492", "Counter": 0, "AF3": [{"THETA": 15.585, "ALPHA": 5.892, "BETA_L": 3.415, "BETA_H": 1.195, "GAMMA": 0.836}]}]}}
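Before streaming this data, it helps to see how a single reading is navigated in code. The following sketch parses the same structure as the reading above (with straight quotes) and pulls out the scenario and one frequency band from the AF3 sensor.

```python
import json

# Same structure as the brain wave reading above
reading = json.loads("""
{"Session": {"Scenario": "ClassicalMusic",
 "POWReading": [{"ReadingDate": "2021-09-12T09:00:18.492", "Counter": 0,
   "AF3": [{"THETA": 15.585, "ALPHA": 5.892, "BETA_L": 3.415,
            "BETA_H": 1.195, "GAMMA": 0.836}]}]}}
""")

session = reading["Session"]
first = session["POWReading"][0]       # the first power reading in the session
theta = first["AF3"][0]["THETA"]       # THETA band from the AF3 sensor

print(session["Scenario"], theta)
```

In a streaming scenario, each incoming message would be deserialized this way before the frequency values are compared against the patterns that identify a scenario.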

And the following is a similar reading in CSV format:

Scenario,Counter,Sensor,THETA,ALPHA,GAMMA
TikTok,5,AF3,9.681,3.849,0.738
TikTok,6,Pz,8.392,4.142,1.106
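The CSV form is just as easy to work with. This sketch reads the two rows above with the standard csv module and averages the THETA values; the average is only an illustrative aggregation.

```python
import csv
import io

# The CSV reading from above
csv_text = """\
Scenario,Counter,Sensor,THETA,ALPHA,GAMMA
TikTok,5,AF3,9.681,3.849,0.738
TikTok,6,Pz,8.392,4.142,1.106
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
avg_theta = sum(float(r["THETA"]) for r in rows) / len(rows)
print(f"Average THETA across {len(rows)} readings: {avg_theta:.4f}")
```

At scale you would point a reader like this (or PySpark's CSV loader) at files landed in a data lake rather than at an in-memory string.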

Data comes in a wide variety of formats and from many different sources, and it can have very diverse interpretations and use cases. This section illustrated that by discussing a few scenarios where provisioning Azure data analytics products would be beneficial. Your greatest contribution and impact lie in your ability to ingest relevant data from numerous sources and formats, then run it through the pipeline and extract business insights from it. The “running it through the pipeline” part requires some coding (PySpark, DML, or C#) and an understanding of what the data means and what questions you are trying to answer.
