This function is used in combination with OPENROWSET and instructs the runtime about the format of the content being retrieved. In addition to OPENJSON, we can use OPENXML and OPENQUERY, which are used in the same context but on different file formats or to run SQL queries directly against a database. The following query illustrates how to implement the OPENJSON function:
SELECT TOP (5) Cntr, ReadingDate, AF3ALPHA, T7ALPHA, PzALPHA, T8ALPHA, AF4ALPHA,Scenario, details, ScenarioReadings[key]asScenarioReadingFROMOPENROWSET(‘CosmosDb’, ‘Account=csharpguitar;Database=brainjammer;Key=sMxCN…xyQ==’,sessions)
WITH (Scenario varchar(max) ‘$.Session.Scenario’,details varchar(max) ‘$.Session.POWReading’) AS SessionsCROSS APPLY OPENJSON(Sessions.details) AS ScenarioReadingsCROSS APPLY OPENJSON(ScenarioReadings.[value])
WITH ( Cntr int ‘$.Counter’, ReadingDate varchar(50) ‘$.ReadingDate’, AF3ALPHA decimal(7,3) ‘$.AF3[0].ALPHA’,T7ALPHA decimal(7,3) ‘$.T7[0].ALPHA’,PzALPHA decimal(7,3) ‘$.Pz[0].ALPHA’,T8ALPHA decimal(7,3) ‘$.T8[0].ALPHA’,AF4ALPHA decimal(7,3) ‘$.AF4[0].ALPHA’) AS ScenarioDetails
The first few lines you know from the previous subsection. There is a function named TOP that, when passed a number, returns only that number of rows. Notice as well the utilization of the CROSS APPLY command. Remember that the data in the Azure Cosmos DB are JSON files, which represent a brainwave session. Executing the previous query results in something similar to that shown in Figure 2.29.
A very powerful and useful implementation of this would be to use it as input to the creation of a view or table using the CREATE AS command. For example, you can place the following four words in front of the previous code snippet:
CREATE VIEW READINGSTMP AS
Once the code executed, the referenced JSON files stored on the Azure Cosmos DB will be parsed and the data within them is placed into a view for further transformation or analysis. That is a truly simple and amazing feature. I hope you agree.
OVER
The OVER clause is used in the context of SQL Windows functions. OVER is used to perform calculations on a group of records, similar to GROUP BY. The difference is that GROUP BYcollapses the dataset, making it impossible to then reference a single row in that same dataset. The following SparkSQL query is executed using a Spark pool on a table containing brainwaves collected in CSV format:
%%sql
SELECT DISTINCT from_unixtime(_c0, ‘yyyy-MM-dd HH:mm:ss’) AS TIMESTAMP,AVG(_c1) OVER(PARTITION BY from_unixtime(_c0, ‘yyyy-MM-dd HH:mm:ss’)) AS Average,MAX(_c1) OVER(PARTITION BY from_unixtime(_c0, ‘yyyy-MM-dd HH:mm:ss’)) AS Maximum,MIN(_c1) OVER(PARTITION BY from_unixtime(_c0, ‘yyyy-MM-dd HH:mm:ss’)) AS Minimum,SUM(_c1) OVER(PARTITION BY from_unixtime(_c0, ‘yyyy-MM-dd HH:mm:ss’)) AS TotalFROM MEDITATIONORDER BY TIMESTAMP
FIGURE 2.29OPENJSON query
As you can see, this kind of query can be executed on both types of pools. The OVER clause supports three arguments: PARTITION BY, ORDER BY, and ROWS/RANGE. You have seen the first one in both scenarios and how it breaks the result into partitions. In this case, the partition is a timestamp. Therefore, the output would look similar to the result of a GROUP BY on the same timestamp, which is what makes OVER and Windows frames so useful. ORDER BY will logically order the results within each partition. Here’s an example of how that looks:
SUM(_c1) OVER(PARTITION BY TimeStamp ORDER BY SCENARIO ) AS Total
The ROWS/RANGE arguments give you the ability to specify row number start and end points or a range of rows for displaying a result of data.