Building a real-time dashboard for your IoT data
Using Azure IoT Hub, IoT Edge, Machine Learning, and Bokeh
Companion repo available. Read the intro below for the link!
You may face a use case where your data needs to be updated and visualized at a very high frequency. On top of that, you may want a machine learning model to decide what to do with such high-frequency data, something a human cannot do at that pace. Well, with the help of the cloud and open-source tools, we can build a framework for that. In this post, I will introduce a workflow and the cloud resources needed to build a pipeline that collects, enriches, and routes telemetry to both a storage sink and a streaming endpoint. This endpoint is consumed by a service that updates a real-time dashboard.
This post is based on a personal project for predictive maintenance using IoT telemetry. There are links throughout the text that lead to the corresponding code for each section. Check out this link for the root of the repository. Note that the project is only a proof of concept and is in no way production-ready.
Index
- Overview of the process in action
- Setting up the infrastructure
- Training the model in the cloud
- Creating and submitting the IoT Edge modules
- Important considerations
Overview of the process in action
The flow of the pipeline at runtime is depicted below:
IoT Edge
It starts with a local server where an IoT Edge runtime is installed, turning the server into an Edge device. IoT Edge allows us to easily deploy, manage, and monitor Docker containers, or “modules”, and to define communication routes among them.
These devices have two modules deployed (click the links to see the code): one that generates telemetry from the IoT sensor and one that runs machine learning inference.
The sensor data is captured, sent to the ML module for inference, and streamed to the cloud using Azure IoT Hub. Once the code for each module is defined, it has to be packaged into a Docker container and deployed to the devices. The deployment will be covered in the setup stages.
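To make this more concrete, here is a minimal sketch of what the telemetry module could look like, assuming the azure-iot-device SDK; the read_sensor helper and the sensorOutput output name are illustrative placeholders, not the repository's actual code.

```python
# Illustrative sketch of a telemetry module using the azure-iot-device SDK.
# read_sensor() and the "sensorOutput" output name are hypothetical placeholders.
import json
import time

from azure.iot.device import IoTHubModuleClient, Message


def read_sensor():
    # Placeholder for the real sensor read; replace with actual hardware access.
    return {"vibration": 0.42, "temperature": 71.3}


def main():
    # The IoT Edge runtime injects the connection settings into the container.
    client = IoTHubModuleClient.create_from_edge_environment()
    client.connect()
    try:
        while True:
            payload = read_sensor()
            # Send to a named output; the IoT Edge routes decide whether the
            # message goes to the ML module or upstream to IoT Hub.
            client.send_message_to_output(Message(json.dumps(payload)), "sensorOutput")
            time.sleep(1)
    finally:
        client.disconnect()


if __name__ == "__main__":
    main()
```

The ML inference module would follow the same pattern: it listens on an input, scores the incoming message with the model, and forwards the prediction to its own output.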
IoT Hub
The IoT Hub resource enriches the data with a timestamp and the device ID. Then, it routes the stream both to a file-storage-based data lake and to the built-in, Event Hubs-compatible endpoint. This endpoint will be used for the real-time dashboard.
Real-time dashboard
From the dashboarding service, the stream is captured using a consumer client that updates a data structure feeding the Bokeh visualizations. In addition, there is a watchdog that closes the client when the application is shut down, for a graceful exit. Check out the code for the visualization server here.
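As an illustration of this pattern (not the repository's exact code), the sketch below shows how a Bokeh server app could consume the built-in Event Hubs-compatible endpoint and stream events into a ColumnDataSource; the connection string, event hub name, and payload field are assumptions.

```python
# Sketch of a Bokeh server app fed by the IoT Hub built-in endpoint.
# Connection details and the "vibration" payload field are assumptions.
import json
import threading

from azure.eventhub import EventHubConsumerClient
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

source = ColumnDataSource(data={"t": [], "value": []})
doc = curdoc()

def on_event(partition_context, event):
    body = json.loads(event.body_as_str())
    # Bokeh documents may only be modified from the server's thread,
    # hence the next-tick callback.
    doc.add_next_tick_callback(
        lambda: source.stream({"t": [event.enqueued_time], "value": [body["vibration"]]},
                              rollover=500))

consumer = EventHubConsumerClient.from_connection_string(
    "<built-in-endpoint-connection-string>",   # assumed placeholder
    consumer_group="$Default",
    eventhub_name="<iot-hub-name>",
)

# The consumer blocks, so it runs on a background thread; the watchdog can call
# consumer.close() when the session is destroyed, for a graceful exit.
threading.Thread(target=lambda: consumer.receive(on_event=on_event), daemon=True).start()

fig = figure(title="Live telemetry", x_axis_type="datetime")
fig.line(x="t", y="value", source=source)
doc.add_root(fig)
```

Running the consumer on a daemon thread keeps the Bokeh event loop free, while the next-tick callback hands the data back to the document safely.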
Setup stages
Before our pipeline is operational, we need to set up all the required infrastructure: the ML services, compute, container registries, message routes, message enrichments, and other resource interactions. This is done by three support DAGs powered by Apache Airflow (a minimal DAG skeleton is sketched right after this list):
- Infrastructure
- Machine Learning training
- IoT Edge Modules deployment to edge devices
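As a rough idea of how such a support DAG can be laid out, here is a minimal Airflow skeleton for the infrastructure DAG; the task names and no-op callables are illustrative, the real DAGs live in the repository.

```python
# Sketch of the infrastructure support DAG layout in Airflow 2.x.
# Task names and callables are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def _noop(**_):
    # Placeholder for the real provisioning logic (Azure SDK calls).
    pass


with DAG(
    dag_id="infrastructure_setup",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered manually by the operator
    catchup=False,
) as dag:
    create_ml_workspace = PythonOperator(task_id="create_ml_workspace", python_callable=_noop)
    create_compute = PythonOperator(task_id="create_compute_cluster", python_callable=_noop)
    create_registry = PythonOperator(task_id="create_container_registry", python_callable=_noop)

    create_iot_hub = PythonOperator(task_id="create_iot_hub", python_callable=_noop)
    create_storage = PythonOperator(task_id="create_storage_sink", python_callable=_noop)
    configure_routing = PythonOperator(task_id="configure_routes_and_enrichments", python_callable=_noop)

    # The ML and IoT branches run in parallel, as described in the next sections.
    create_ml_workspace >> create_compute >> create_registry
    create_iot_hub >> create_storage >> configure_routing
```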
The setup is explained in the following sections. Here is a recording of the live dashboard with a single sensor:
Setting up the infrastructure
In this section, we deploy all the necessary cloud resources and compute to support our framework. Check here to see the code for the infrastructure DAG. There are two main groups of infrastructure:
Machine Learning
Once this flow is triggered, we create an Azure Machine Learning workspace, a compute cluster to train our future models, and a container registry to build the inference microservice. If these resources already exist, we simply retrieve them for the next steps.
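A minimal sketch of this step with the Azure ML SDK (v1) might look as follows; the resource names, subscription ID, and region are placeholder assumptions.

```python
# Sketch of the ML infrastructure step with the Azure ML SDK (v1).
# Names, subscription id, and region are placeholder assumptions.
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.create(
    name="iot-ml-workspace",
    subscription_id="<subscription-id>",
    resource_group="iot-dashboard-rg",
    location="westeurope",
    exist_ok=True,          # reuse the workspace if it already exists
)

cluster_name = "training-cluster"
if cluster_name in ws.compute_targets:
    compute = ws.compute_targets[cluster_name]   # reuse existing compute
else:
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS3_V2",
                                                   min_nodes=0, max_nodes=2)
    compute = ComputeTarget.create(ws, cluster_name, config)
    compute.wait_for_completion(show_output=True)

# The container registry for the inference image can be created or reused in a
# similar get-or-create fashion.
```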
IoT resources
This process runs in parallel with the creation of the ML resources. First, an IoT Hub is created; it will receive, enrich, and route our sensor telemetry. We also create an Azure Storage account and a file storage service that will be used as a sink for our data. The sink and the message enrichments must be defined in IoT Hub, and our setup pipeline takes care of that in the final step of this stage.
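One way to script this step is to shell out to the Azure CLI from the orchestrator, roughly as sketched below; the resource names and the enrichment key/value are placeholders, and the exact flags should be double-checked against the az documentation for your CLI version.

```python
# Sketch of the IoT infrastructure step via the Azure CLI, called from Python.
# Resource names are placeholders; verify flags against your az CLI version.
import subprocess

RG, HUB, STORAGE = "iot-dashboard-rg", "iot-dashboard-hub", "iotdashboardsa"

def az(*args):
    # Run an az command and fail loudly on a non-zero exit code.
    subprocess.run(["az", *args], check=True)

# IoT Hub that receives, enriches, and routes the telemetry.
az("iot", "hub", "create", "--name", HUB, "--resource-group", RG, "--sku", "S1")

# Storage account backing the file-storage-based data lake sink.
az("storage", "account", "create", "--name", STORAGE, "--resource-group", RG,
   "--sku", "Standard_LRS")

# Stamp every routed message with an extra property (assumed example key/value).
az("iot", "hub", "message-enrichment", "create", "--name", HUB,
   "--key", "plant", "--value", "plant-01", "--endpoints", "events")

# The storage routing endpoint and the route to it are configured in the same
# way (az iot hub routing-endpoint create / az iot hub route create).
```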
Training the model in the cloud
At this stage, assuming a dataset is already prepared (see the important considerations section), it gets uploaded to a datastore. Then, we define the parameters for training in the cloud: mainly the directory of the training script, the compute resource object, and a conda environment object. All of this is implemented with the help of the Azure ML SDK for Python. Next, we submit the ML experiment run and register a new model version.
Finally, the orchestrator downloads the latest model version on the local server. Check here to see the training DAG. This model will be used to create the inference service in the next stage.
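Sketched with the Azure ML SDK (v1), this stage could look roughly like the snippet below; the script, dataset, environment, and model names are placeholder assumptions rather than the repository's actual values.

```python
# Sketch of the cloud training step: upload the data, submit the run,
# register the model, and pull the latest version to the local server.
from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
from azureml.core.model import Model

ws = Workspace.from_config()

# Upload the prepared dataset to the workspace's default datastore.
ws.get_default_datastore().upload(src_dir="data", target_path="sensor-dataset",
                                  overwrite=True)

# Training configuration: script directory, compute target, and conda environment.
env = Environment.from_conda_specification("train-env", "environment.yml")
src = ScriptRunConfig(source_directory="training", script="train.py",
                      compute_target="training-cluster", environment=env)

# Submit the experiment run and register a new model version from its outputs.
run = Experiment(ws, "predictive-maintenance").submit(src)
run.wait_for_completion(show_output=True)
run.register_model(model_name="predictive-maintenance", model_path="outputs/model.pkl")

# Without a version argument, Model resolves to the latest registered version.
Model(ws, name="predictive-maintenance").download(target_dir="artifacts/model",
                                                  exist_ok=True)
```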
Creating and submitting the IoT Edge modules
This stage focuses on deploying two IoT Edge modules: telemetry generation using the IoT sensor and machine learning inference with the model downloaded in the previous stage. The behavior of the two modules is already defined in Python scripts. Check here to see the edge DAG.
First, the orchestrator builds, tags, and pushes the images to the registry in parallel. Then, it takes an IoT Edge deployment template and replaces its placeholders with values passed as arguments to the orchestrator, namely the registry credentials and the image locations. Finally, the modules are deployed to the edge devices.
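A hedged sketch of this build-and-deploy step is shown below, using the Docker SDK for the images and the Azure CLI IoT extension to apply the manifest; the registry, hub, device, and placeholder names are assumptions, and docker is assumed to be already logged in to the registry.

```python
# Sketch: build and push the module images, fill in the deployment template,
# and apply the manifest to a device. Names and placeholders are assumptions.
import subprocess

import docker

REGISTRY = "myregistry.azurecr.io"
client = docker.from_env()  # assumes docker is logged in to the registry

for module, path in [("sensor", "modules/sensor"), ("inference", "modules/inference")]:
    repo = f"{REGISTRY}/{module}"
    client.images.build(path=path, tag=f"{repo}:latest")   # build the module image
    client.images.push(repo, tag="latest")                 # push it to the registry

# Replace the placeholders in the deployment template with concrete values
# (registry credentials would be substituted the same way).
with open("deployment.template.json") as f:
    manifest = f.read().replace("{{REGISTRY}}", REGISTRY)
with open("deployment.json", "w") as f:
    f.write(manifest)

# Apply the manifest to an edge device (requires the azure-iot CLI extension).
subprocess.run(["az", "iot", "edge", "set-modules", "--hub-name", "iot-dashboard-hub",
                "--device-id", "edge-device-01", "--content", "deployment.json"],
               check=True)
```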
Important considerations
Training dataset compilation
This step is not covered in the initial project. Ideally, you should use the telemetry held in the data lake to continuously build training datasets. This operation should be backed by proper machine learning model observability.
ML observability
Model deployment is not the end of the hustle. You should check for shifts in the distribution of the input features, the predictions, and the labels produced for each training iteration. This way, you get ahead of model failures, because observability enables root cause analysis and helps minimize model degradation.
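One simple way to check for such shifts is a two-sample statistical test between the training distribution and the live distribution of a feature; the sketch below uses scipy's KS test, with a made-up vibration feature and an arbitrary threshold as illustrative choices.

```python
# Minimal feature-drift check: compare live readings against the training
# distribution with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(train_values: np.ndarray, live_values: np.ndarray,
                 p_threshold: float = 0.01) -> bool:
    """Return True if the live distribution differs significantly from training."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < p_threshold


# Example: flag drift on a (synthetic) vibration feature before it degrades the model.
rng = np.random.default_rng(0)
train_vibration = rng.normal(0.4, 0.05, size=5000)
live_vibration = rng.normal(0.55, 0.05, size=500)     # shifted readings
print(detect_drift(train_vibration, live_vibration))  # -> True
```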
If my telemetry is massive, will the dashboard service break?
This part was not treated in depth, since the framework is currently suited to support a handful of devices. However, to handle streaming data in very large quantities, we could set up a key-value NoSQL database that is not ACID- but BASE-compliant, with both input and output streaming capabilities. These databases make the data available as soon as it arrives, meaning that replicas are eventually consistent, but they enable handling the data with very little latency.
A serverless resource could capture the data from the IoT Hub endpoint and send it to the database at scale. Moreover, we could filter and stream just the data we need to the dashboarding service. In this way, we prevent the latter from being overwhelmed by too much data.
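To give a flavor of that idea, the sketch below shows an Azure Functions handler triggered by Event Hub messages that upserts each event into Cosmos DB; the trigger binding configuration (function.json) is omitted, and the database names and environment variables are placeholder assumptions.

```python
# Sketch of a serverless capture handler: Event Hub-triggered Azure Function
# writing telemetry into Cosmos DB. Binding config omitted; names are assumed.
import json
import os
import uuid

import azure.functions as func
from azure.cosmos import CosmosClient

container = (
    CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    .get_database_client("telemetry")
    .get_container_client("readings")
)


def main(event: func.EventHubEvent) -> None:
    body = json.loads(event.get_body().decode("utf-8"))
    body["id"] = str(uuid.uuid4())          # Cosmos DB items require an "id" field
    container.upsert_item(body)             # eventually consistent, low-latency write
```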
Thank you for taking the time to read my post. Please leave a comment or clap for this article if it was useful to you, so I can improve my writing.
I will see you next time!