The UNION command combines the data from two or more tables without adding any additional rows. This is best understood visually, so consider the following SQL statement:
SELECT SESSION_ID AS ID, CONVERT(VARCHAR(50),SESSION_DATETIME, 127) AS DATE_SCENARIOFROM [SESSION]UNIONSELECT SCENARIO_ID, SCENARIOFROM SCENARIO
The result would be something similar to the following. Notice that the rendered output is a single table with the two columns combined.
+————-+———————+
| ID | DATE_SCENARIO |
+————-|———————+
| 1 | 2021-07-30T09:35:00 |
| 1 | ClassicalMusic |
| 2 | FlipChart |
| 3 | Meditation |
| 4 | 2021-07-31T10:15:00 |
| 4 | MetalMusic |
| 5 | 2021-07-31T18:25:00 |
| … | … |
+————-+———————+
It means that there exists a SCENARIO_ID on the SESSION table that equals 1 and a SCENARIO_ID on the SCENARIO table that also equals 1. That is why you see the first two rows in the previous table illustration. The same is the case when SCENARIO_ID is 4, which exists on both tables. A UNION is often bundled into the same category of commands as JOIN, EXCEPT, and INTERSECT in that they all provide the means or interface for working with more than a single table. If you add an ALL command directly after the UNION, the result will include duplicates from both tables. Without ALL, duplicates are excluded. Look at the previous table. Although there is a duplicate ID on both tables, there isn’t a duplicate for DATE_SCENARIO as well. Had there been, then without the use of ALL, that row would have been excluded.
VAR and STDEV
Both VAR and STDEV have been introduced already, so this section will provide only a brief summary of what these aggregate functions do, without going too deep into the field of statistics. VAR is used to show how varied the data being analyzed is, known as variance. You can capture the variance between two groups (scenarios)—for example, TikTok and MetalMusic. The assessment will help determine how similar or different the two scenarios are from each other. STDEV returns the standard deviation, a value that represents its distance from the mean, or the average. Data interpretation is not part of the DP‐203 exam.
WITH
You’ve seen the WITH clause already but let’s discuss it in greater depth. As a data engineer you will be confronted with complicated questions that will require you to query a database to answer. Sometimes these questions and the queries you come up with are either extremely complicated or not even possible. Extremely complicated queries are those that would contain numerous subqueries, which make them hard to understand. Take warning that some DBMSs have a maximum number of allowed subqueries, so in some cases a single query is not possible. An alternative to subqueries might be to use CTAS to create a table based on the query, then write another query to query that table. You can create the flow of table creations and querying those tables into as many steps as required to get the answer. Although these options are valid approaches, you might instead consider trying to get the answer using the WITH clause when you find yourself taking such actions. The following code snippet is long but legible to someone with a skilled eye. It illustrates the implementation of the WITH clause:
WITH GETSTATS AS(SELECT AVG([AVG]) AS AF3_BETA_L_AVGFROM READINGSTATSWHERE ELECTRODE = ‘AF3’ and FREQUENCY = ‘BETA_L’GROUP BY ELECTRODE, FREQUENCY)
SELECT r.MODE, r.SCENARIO, r.ELECTRODE, r.FREQUENCY, r.[AVG] AS AVERAGE_PROBLEM_SOLVINGFROM READINGSTATS r, GETSTATSWHERE GETSTATS.AF3_BETA_L_AVG> r.[AVG] AND ELECTRODE = ‘AF3’ and FREQUENCY = ‘BETA_L’GROUP BY r.MODE, r.SCENARIO, r.ELECTRODE, r.FREQUENCY, r[AVG]ORDER BY AVERAGE_PROBLEM_SOLVING DESC
Because a WHERE clause cannot include an aggregate function, a possible solution is to capture that value using WITH. The query captures the average of the averages of brainwave readings for a given electrode and frequency, for all scenarios. Then, that value is used to render scenarios that have a higher average reading value for the given electrode and frequency. Since the BETA_L frequency is linked to scenarios where problem‐solving stimulation is happening, you can partially assume that listening to ClassicalMusic, being the most above average, stimulates this kind of brainwave. Only a single electrode is used here; all five electrodes capture all five frequencies, so a more sophisticated query needs to be written. This is a step in the right direction, however.
+——+—————-+———–+———–+————————-+
| MODE | SCENARIO | ELECTRODE | FREQUENCY | AVERAGE_PROBLEM_SOLVING |
+——+—————-+———–+———–+————————-+
| POW | ClassicalMusic | AF3 | BETA_L | 2.854088 |
| POW | TikTok | AF3 | BETA_L | 2.629033 |
| POW | WorkMeeting | AF3 | BETA_L | 2.522477 |
| POW | MetalMusic | AF3 | BETA_L | 2.192918 |
+——+—————-+———–+———–+————————-+
It is interesting to see that TikTok resulted in such a high BETA_L. This is exactly the point of data analysis and analytics in that you run queries and analyze the output, which you can then relate to what you think should happen in the real world. The event of the data not representing the world as you perceive it either changes you and your mindset or makes you want to dig more deeply into the data source, the data analytics, and try to find out if there is a flaw and prove you are right after all.