MQTT and Kafka both have their uses in IoT platforms
The MQTT protocol is closely linked with the Internet of Things (IoT). Almost every IoT platform uses MQTT in some form, be it in small personal home automation setups or in big commercial IoT platforms, whether product IoT (e.g. connecting end-user devices like washing machines to the cloud) or Industrial IoT (connecting devices and machines on a factory shop floor). As such, when I design an architecture for a customer as part of my job at MaibornWolff, an MQTT broker like HiveMQ almost always has a prominent place. At the same time, I also add a Kafka broker to the mix, which has prompted more than one customer to ask why we need both and whether we couldn't use just one of them for all use cases.
In this post I want to explain why MQTT and Kafka deal with different requirements and why both have their place in a performant and flexible IoT or Smart Factory architecture.
What is MQTT
MQTT is a lightweight machine-to-machine network protocol designed, among other things, to be used by small, resource-constrained devices (like sensors or embedded devices) and to facilitate asynchronous publish-subscribe communication. Clients connect to an MQTT broker and publish messages on topics. Other clients subscribe to topics and receive the messages sent to them. This allows for flexible communication between multiple parties without having to rely on point-to-point connections. Topics form a hierarchy and are dynamic in the sense that they don't need to be created or registered beforehand. Clients can use wildcards to subscribe to parts of that hierarchy. This makes the system very flexible: we can express structure in topics and use wildcards to finely control which messages we want. MQTT also supports different Quality of Service (QoS) levels. Level 0 (at-most-once) means a message can get lost, Level 1 (at-least-once) means a message is guaranteed to be delivered but could be received multiple times, and Level 2 (exactly-once) guarantees that a message is delivered exactly once. Technically, this is handled by the broker and client exchanging acknowledgements for messages on Levels 1 and 2.
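The duplicate delivery that QoS 1 allows can be illustrated with a small simulation. This is a simplified model under stated assumptions, not the MQTT wire protocol: `Receiver`, `send_qos1` and the `ack_lost_on_attempts` parameter are hypothetical names, and the retry loop stands in for the retransmission a real client performs when an acknowledgement (PUBACK) does not arrive.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical stand-in for the receiving side of a QoS 1 exchange."""
    delivered: list = field(default_factory=list)

    def on_publish(self, packet_id: int, payload: str) -> int:
        # At QoS 1 the message is handed to the application and then
        # acknowledged; the returned id stands in for the PUBACK packet.
        self.delivered.append(payload)
        return packet_id


def send_qos1(receiver, packet_id, payload, ack_lost_on_attempts=frozenset()):
    """Resend until an acknowledgement arrives; return the number of attempts.

    `ack_lost_on_attempts` lists the attempts (1-based) on which the
    acknowledgement is assumed to be lost on its way back to the sender.
    """
    attempt = 0
    while True:
        attempt += 1
        ack = receiver.on_publish(packet_id, payload)
        if attempt in ack_lost_on_attempts:
            continue  # the sender never saw the ack, so it sends again
        if ack == packet_id:
            return attempt


receiver = Receiver()
attempts = send_qos1(receiver, packet_id=1, payload="21.5",
                     ack_lost_on_attempts={1})
print(attempts, receiver.delivered)
```

In this model, one lost acknowledgement means the receiver sees the same payload twice, which is exactly the at-least-once behavior of Level 1. Level 2 avoids this with an additional handshake, at the cost of more round trips.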
MQTT brokers can easily handle millions of topics and clients, which makes the protocol very scalable. For example, HiveMQ is used to connect 10 million cars into an IoT platform.
MQTT also supports queuing, meaning a broker will buffer messages if a client is offline for a time or temporarily can't keep up with the influx of messages. This applies to clients that use a persistent session and subscribe with QoS 1 or 2.
In my experience, MQTT works best for clients that communicate in near real time and where each client publishes and receives only a limited number of messages.
Unified namespace for MQTT
The Unified Namespace (UNS) is a good example of how MQTT and its topic hierarchy are used in Smart Factory platforms. The structure of a manufacturing system is represented in the topic hierarchy, often using ISA-95 equipment levels: enterprise/site/area/line/cell/metric (e.g. bottlingcompany/munich/bottling/line1/machine1/conveyorspeed). This makes it easy to get the information I want. I can get all data from a certain work area (bottlingcompany/munich/bottling/#), or I might want a specific metric from a type of machine across all sites (bottlingcompany/+/bottling/+/+/conveyorspeed). Both can be expressed with wildcard subscriptions, which ensures that I only get the data I want and neither have to filter messages myself nor get overwhelmed with unwanted ones.
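To make the wildcard semantics concrete, here is a minimal sketch of how a subscription filter can be matched against a topic. The function name `topic_matches` is hypothetical, and the sketch deliberately ignores edge cases such as topics starting with `$`. In a real setup the broker does this matching for us. The site name `berlin` and the metric `temperature` are made up for illustration.

```python
def topic_matches(subscription: str, topic: str) -> bool:
    """Return True if `topic` is covered by the MQTT-style `subscription`.

    '+' matches exactly one level, '#' matches all remaining levels
    and is only valid as the last level of the subscription.
    """
    sub_levels = subscription.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(sub_levels):
        if level == "#":
            # Multi-level wildcard: matches the rest, but must come last.
            return i == len(sub_levels) - 1
        if i >= len(topic_levels):
            return False  # subscription is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    # Without '#', both must have the same number of levels.
    return len(sub_levels) == len(topic_levels)


area_filter = "bottlingcompany/munich/bottling/#"
metric_filter = "bottlingcompany/+/bottling/+/+/conveyorspeed"

munich = "bottlingcompany/munich/bottling/line1/machine1/conveyorspeed"
berlin = "bottlingcompany/berlin/bottling/line2/machine7/conveyorspeed"

print(topic_matches(area_filter, munich))    # area subscription, Munich topic
print(topic_matches(area_filter, berlin))    # area subscription, other site
print(topic_matches(metric_filter, berlin))  # metric across all sites
```

The first and third checks match, while the second does not, because the area subscription is pinned to the Munich site.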
UNS of course also deals with much more, such as unified message structures and schemas, but I don't want to go into more detail here.
What is Kafka
With MQTT out of the way, let's look at Kafka.
Technically speaking, Apache Kafka is a distributed log and event store, meaning messages are persisted in ordered form. In practice, Kafka can be, and is, treated as a message broker with topics, producers and consumers. But it is aimed primarily at high-throughput data streaming and processing and is often used with stream processing solutions like Apache Flink.
In contrast to MQTT, Kafka topics are additionally split into partitions, and the ordering of messages is only guaranteed within one partition. MQTT also behaves more like a queue: once a message has been processed, it is gone. In Kafka, on the other hand, messages are persisted into a log and can be retrieved again at any time (as long as the Kafka cluster has not deleted them, and that retention window can be configured). So a consumer can always go back and restart reading and processing messages from an older point in time.
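The partition and log behavior described above can be sketched with a small in-memory model. This is not Kafka's API: `PartitionedLog` and its methods are hypothetical, and the CRC32 hash is only a stand-in for the partitioner a real Kafka producer uses. The point is that messages with the same key end up in the same partition in order, and that reading does not remove anything.

```python
import zlib


class PartitionedLog:
    """Toy model of one Kafka topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple:
        """Append a message and return (partition, offset)."""
        # Same key -> same partition, so per-key ordering is preserved.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, from_offset: int = 0) -> list:
        """Return all messages of a partition starting at `from_offset`.

        Reading never deletes messages, so the same range can be read again.
        """
        return self.partitions[partition][from_offset:]


log = PartitionedLog(num_partitions=3)
partition, _ = log.append("machine1", "speed=1.2")
log.append("machine1", "speed=1.3")
log.append("machine2", "speed=0.9")

# Both reads return the same messages for machine1, in insertion order.
print(log.read(partition))
print(log.read(partition))
```

Across partitions there is no ordering guarantee in this model either, which is why the choice of key matters when the order of messages from one source must be preserved.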
This makes Kafka ideal for situations where data might need to be reprocessed later on. One such situation is grouping data into batches for processing and storing in a database system. When writing data into a system like PostgreSQL or OpenSearch, you always want to write batches of messages (easily thousands per batch) to be efficient. If the client crashes during that process, it can always restart from where it last left off. Kafka supports this using so-called manual commits, which means a consumer can explicitly set a commit offset up to which it has safely processed messages. Upon restart, the consumer resumes from that offset and receives all messages from that point on again.
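The batching and manual commit pattern can be sketched as follows. This is an in-memory model and not Kafka client code: `OffsetStore` stands in for the committed offset that Kafka keeps per consumer group, the Python list stands in for a single partition, and `consume_in_batches` and `flaky_write` are hypothetical names.

```python
class OffsetStore:
    """Stand-in for the committed offset Kafka stores for a consumer group."""

    def __init__(self) -> None:
        self.committed = 0  # offset of the next message to process


def consume_in_batches(log, offsets, write_batch, batch_size):
    """Process `log` from the committed offset onwards in batches.

    The offset is committed only after `write_batch` returned without
    raising, so a crash never skips messages.
    """
    position = offsets.committed
    while position < len(log):
        batch = log[position:position + batch_size]
        write_batch(batch)            # e.g. one bulk insert; may raise
        position += len(batch)
        offsets.committed = position  # manual commit after a successful write


log = list(range(10))    # ten messages in one partition
database = []
offsets = OffsetStore()
calls = {"count": 0}


def flaky_write(batch):
    """Simulates a crash while writing the second batch."""
    calls["count"] += 1
    if calls["count"] == 2:
        raise RuntimeError("simulated crash")
    database.extend(batch)


try:
    consume_in_batches(log, offsets, flaky_write, batch_size=4)
except RuntimeError:
    pass  # the service "crashed"; only the first batch was committed

# After the restart the consumer continues at the committed offset.
consume_in_batches(log, offsets, database.extend, batch_size=4)
print(offsets.committed, database)
```

With a real client library this roughly corresponds to disabling automatic offset commits and committing explicitly after the database write. Note that a crash between the write and the commit would cause the batch to be written again after the restart, so the target system should tolerate duplicates or updates.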
The other situation can be that data needs to be reprocessed due to changes in the logic of a processing service. Maybe a deployment introduced a bug leading to wrong results. If it is discovered within the retention window of the Kafka topic, a fixed version of the service can be deployed. Once started it can ask Kafka to replay messages from an older point in time, even ones that it already committed, and this time correctly process them. As long as the next system in the processing chain (like a database) can handle updates, the bug can be fixed without much hassle. In a past blog entry on safety nets in infrastructure development I have already described how we used such a system in a customer project.
Message brokers in a Smart Factory Architecture
Now that we know what MQTT and Kafka brokers are, let's look at how my colleagues and I use them in Smart Factory architectures. For most setups nowadays we use the Unified Namespace approach with a central MQTT broker, where data from all machines and other systems like Enterprise Resource Planning (ERP) or Manufacturing Execution Systems (MES) is collected. Often this is a tiered setup in which machine data is pushed to a broker per plant, which forwards the data to a central broker in the cloud.
Services can subscribe to parts of the data on the MQTT broker (using wildcard subscriptions) and publish their own results back to the UNS. All data is persisted into a time-series database like TimescaleDB. But the service responsible for that (called a persist service) is not directly connected to the MQTT broker. Instead, all data is forwarded from MQTT to a Kafka broker, and the persist service reads the data from there. The same goes for other services that deal with large parts of the data, as well as for any stream processing with Flink jobs or similar. That way, these services can take advantage of the manual commit and replay features.
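How the forwarding from MQTT to Kafka maps topics is a design decision that this post does not prescribe. As one possible sketch, the following function maps each MQTT message to a Kafka record, using one Kafka topic per site and the full MQTT topic as the record key. The `uns.<site>` naming scheme, the `KafkaRecord` type and the `to_kafka_record` function are assumptions made for illustration, not the mapping used in any particular project.

```python
from typing import NamedTuple


class KafkaRecord(NamedTuple):
    """Hypothetical container for what a bridge would hand to a producer."""
    topic: str
    key: bytes
    value: bytes


def to_kafka_record(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Map a UNS message (enterprise/site/...) to a Kafka record.

    Assumption: one Kafka topic per site keeps the number of Kafka topics
    small, while the MQTT topic as key keeps all messages of one metric
    in the same partition and therefore in order.
    """
    levels = mqtt_topic.split("/")
    if len(levels) < 2 or not levels[1]:
        raise ValueError(f"not a UNS topic with a site level: {mqtt_topic!r}")
    site = levels[1]
    return KafkaRecord(
        topic=f"uns.{site}",
        key=mqtt_topic.encode("utf-8"),
        value=payload,
    )


record = to_kafka_record(
    "bottlingcompany/munich/bottling/line1/machine1/conveyorspeed", b"1.2"
)
print(record.topic, record.key)
```

Because the original MQTT topic travels along as the key, a consumer such as the persist service can still tell which machine and metric a message belongs to, even though Kafka itself only sees a handful of coarse topics.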
Can we do one with the other
The big question now is: why use two different broker systems with all the complexity this entails? Why not just use one for both? Let's look at this in a bit more detail.
Could we just use MQTT for everything? So, specifically, could we connect our persist service to MQTT? In theory, yes. The MQTT protocol and its client libraries allow us to manually acknowledge messages with a QoS of 1 or 2, meaning we could buffer and batch messages to write to the database. But MQTT brokers are not good at queuing large numbers of messages for single clients, and replaying messages is impossible once they have been acknowledged. So we could only handle small batches. If our platform has a huge volume of messages and we need big batches for efficiency, MQTT is not feasible. Stream processing systems have the same problems, and most of them have only limited support for MQTT, whereas their Kafka integration is mostly excellent and feature-rich.
Could we go the other direction and use Kafka for everything? That is also a definite no. Implementing a Unified Namespace requires hierarchical, fine-grained topic structures, something Kafka is not designed for. Having a few hundred topics is fine, but Kafka is not built for the many thousands or even millions of fine-grained topics a UNS can produce. Things like hierarchical topics and wildcard subscriptions can be emulated in theory, but in practice they are clunky and not really usable in a production setup. Kafka client libraries are also often a lot more heavyweight and complex than MQTT libraries (because the Kafka protocol puts more complexity on the client side). This is another aspect that can be critical for embedded devices, but it would mostly not be an issue in manufacturing setups.
Conclusions
Both MQTT brokers and Kafka have their place in complex Smart Factory architectures with a high data volume. Replacing one with the other is not really feasible and would lead to more problems and complexity down the road than having the additional component in your architecture. Both may be called message brokers, but they fulfill different functions and cater to different use cases. So we should do what the legendary Montgomery Scott told his engineers in Star Trek V: The Final Frontier: "How many times do I have to tell you - the right tool for the right job!"