Making sense of it all

As described in one of our previous posts, our many sensors provide a wide variety of data for us to store and analyze. For the Power Suite, we need our data to be easily accessible and easy to work with from a front-end perspective, which means that most of it is stored by default as JSON files.

Luckily for us, Azure Data Lake Analytics (ADLA) enables the user to write U-SQL scripts to query, process and transform the data even though it is not stored in a structured format. One of the many benefits of using ADLA with our architecture is that we do not have to worry about scaling as the amount of data increases. With ADLA it is super easy to set up data pipelines that do the necessary processing for you to get the reports and drill-downs that you need.
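To give a feel for what this looks like, here is a minimal U-SQL sketch, not our actual script: the paths and column names are made up for illustration, and it assumes the sample JSON extractor from the Microsoft.Analytics.Samples.Formats library has been registered in the ADLA account.

```usql
// Minimal sketch with hypothetical paths and column names.
// Assumes the Microsoft.Analytics.Samples.Formats sample library
// (and its Newtonsoft.Json dependency) is registered in the account.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// Read raw sweat readings from JSON files in the data lake.
@sweat =
    EXTRACT deviceId string,
            eventTime DateTime,
            sweatLevel int
    FROM "/sensors/sweat/{*}.json"
    USING new JsonExtractor();

// A simple transformation step: discard obviously invalid readings.
@valid =
    SELECT deviceId, eventTime, sweatLevel
    FROM @sweat
    WHERE sweatLevel >= 0;
```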

To demonstrate this, Visma ConsultIoTing wants to look at a specific business case: analyzing the temperature and sweat levels of the athlete in order to determine comfort and hydration levels. This is an important aspect of skiing, since dressing too warm will cause the athlete to sweat a lot and dehydrate. Dressing too cold will keep the athlete from sweating at all, but may leave the athlete cold, which is both uncomfortable and may cause sickness. To analyze this, Visma ConsultIoTing looked at some sample data from the Sweat Detection Sensor and the temperature sensor, and constructed a data pipeline to process this data. Note that the data used in this analysis was collected while the athlete was working on the Power Suite solution. Additionally, the temperature sensor measures the temperature of the surroundings rather than body temperature. Once the product is ready for production, one should have data from both the surroundings and the body of the athlete.

Preparing for the immense amount of data that will flow in when we reach hundreds of millions of active users, we partition the data files in the data lake by the hour. This keeps the individual jobs in the pipeline manageable. The following figure shows the high-level flow of our data analytics pipeline; after it, a sketch shows how a U-SQL job can address a single hourly partition.

Data analytics overview
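Here is that sketch, using U-SQL's file set syntax with virtual columns; the folder layout and all names in it are assumptions for illustration, not our exact layout.

```usql
// Hypothetical hour-partitioned layout:
//   /sensors/sweat/2018/03/14/09/readings.json
// The {year}/{month}/{day}/{hour} parts become virtual columns that can
// be filtered on, so a job only touches the files for one hour.
// (Assembly references and USING as in the previous sketch.)
@hourly =
    EXTRACT deviceId string,
            eventTime DateTime,
            sweatLevel int,
            year string,
            month string,
            day string,
            hour string
    FROM "/sensors/sweat/{year}/{month}/{day}/{hour}/{*}.json"
    USING new JsonExtractor();

@oneHour =
    SELECT deviceId, eventTime, sweatLevel
    FROM @hourly
    WHERE year == "2018" AND month == "03" AND day == "14" AND hour == "09";
```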

JSON data is sent from the sensors to the Data Lake Storage. ADLA enables us to write U-SQL scripts to process this data the way we would like and store the output at the desired destination: a data warehouse, a new database, or pretty much whatever you want. Since we are already working on a dashing dashboard of our own, in this demonstration we simply export the processed data as CSV files.
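The export itself is just an OUTPUT statement at the end of the script. A self-contained sketch, with a made-up in-line rowset so it can run on its own:

```usql
// Self-contained sketch: an in-line rowset with made-up numbers,
// written out as a CSV file with a header row.
@processed =
    SELECT * FROM
        (VALUES
            (1, 120),
            (2, 345),
            (3, 80)
        ) AS T(sweatLevel, readings);

OUTPUT @processed
TO "/output/sweat-report.csv"
USING Outputters.Csv(outputHeader : true);
```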

From our sweat data, we are interested in exploring how much time is spent within different sweating zones as the athlete pushes himself onward. We also want to keep a record of the temperature data, so that one can determine the correlation between the two. To do this, we aggregate the data and count the number of times the sensor reported sweat levels within each defined sweat zone. Similarly, we count the number of times the temperature sensor recorded a specific temperature. Although aggregating this way discards the time dimension, the results still give a clear indication of correlation. The result can be presented as a classic report directly from the CSV file, or you can use the tool of your choice to visualize the data. In our case, we constructed a simple Python script to plot the results.
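For reference, the counting step itself is a plain GROUP BY in U-SQL. A sketch using the hypothetical rowset from the earlier sketches (the temperature aggregation is analogous):

```usql
// Count how many readings fall within each sweat zone.
// @valid is the cleaned sweat rowset from the earlier sketch.
@sweatZones =
    SELECT sweatLevel,
           COUNT(*) AS readings
    FROM @valid
    GROUP BY sweatLevel;
```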

As you can see from the plot, the athlete spent most of his time in sweat level 2. The temperature of the surroundings was most of the time just below 25 degrees Celsius, which is slightly higher than a typical indoor temperature. Since the athlete was wearing the ski suit during the data collection period, he displayed moderate sweating, considering he was not engaged in physical activity. This suggests that he should dress more lightly, since the current suit will cause him to sweat more than necessary. If the athlete had been doing physical work, or had the temperature been measured outdoors, the conclusion would have been different. Thanks to our data analysis, the athlete is now better prepared than ever to face the challenges of a ski race.

ADLA also makes it possible to combine this data with, for instance, data from a heart rate monitor to see the physical strain on the athlete in a new way.
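A hedged sketch of what such a combination could look like, assuming a hypothetical heart rate rowset with readings at the same timestamps:

```usql
// Hypothetical sketch: relate sweat level to heart rate by joining
// the two rowsets on device and timestamp. @valid is the sweat rowset
// from earlier; @heartRate is an assumed rowset with a bpm column.
@strain =
    SELECT s.deviceId,
           s.eventTime,
           s.sweatLevel,
           h.bpm
    FROM @valid AS s
         INNER JOIN @heartRate AS h
         ON s.deviceId == h.deviceId AND s.eventTime == h.eventTime;
```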

We believe that the foundation and robustness of setting up a data pipeline like this to parse, process and produce data, in addition to the specific business case, qualify us for "The analyst" badge.