
Kafka vs RabbitMQ: when I reach for each

May 14, 2024 · 7 min read

Kafka and RabbitMQ both move messages between services. That shared description is where the similarity ends. They were designed for different problems, and understanding the difference is the key to choosing correctly.

The conceptual difference

RabbitMQ is a message broker. It receives messages, routes them to queues, and delivers them to consumers. Once a message is consumed and acknowledged, it's gone. RabbitMQ is designed for task distribution: give this work to a worker, any worker, once (strictly speaking, at least once, since an unacknowledged message is redelivered).

Kafka is a distributed log. It receives messages and appends them to an immutable, ordered log. Consumers read from the log at their own pace. Messages aren't deleted after consumption. They're retained based on a time or size policy. Kafka is designed for event streaming: record that this thing happened, and let any number of consumers read it.

This distinction determines everything else.

When RabbitMQ is the right choice

Job queues. You have tasks that need to be processed by workers. Send emails, resize images, process payments. The task goes to one worker, the worker processes it, the task is done.

Producer → Queue → Consumer (one of many)
                 → Consumer
                 → Consumer

Workers compete for messages. Each message goes to exactly one worker. If a worker crashes before acknowledging, the message is redelivered to another worker.
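The competing-consumers pattern can be sketched with a plain in-process queue. This is a toy model, not the pika client API: `queue.Queue` stands in for the broker queue, `task_done()` for the acknowledgement, and the worker names are made up.

```python
# Toy model of RabbitMQ-style competing consumers: each task is
# delivered to exactly one worker, and workers pull at their own pace.
import queue
import threading

task_queue = queue.Queue()
processed = {}            # task -> worker that handled it
lock = threading.Lock()

def worker(name: str) -> None:
    while True:
        try:
            task = task_queue.get(timeout=0.1)
        except queue.Empty:
            return                    # queue drained, worker exits
        with lock:
            processed[task] = name    # simulate doing the work
        task_queue.task_done()        # "ack": the task is gone for good

for t in ("send-email", "resize-image", "charge-card", "update-ledger"):
    task_queue.put(t)

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(len(processed))  # 4 -- every task handled by exactly one worker
```

Note that no task appears twice in `processed`: once a worker takes a task off the queue, no other worker can see it. That is the whole contract of a job queue.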

Request-reply patterns. Service A needs a response from Service B, but wants to do it asynchronously. RabbitMQ supports reply queues and correlation IDs natively.
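The correlation-ID bookkeeping is simple enough to show in a few lines. This is a sketch of the pattern only, with plain dicts standing in for the reply queue; the payloads and function names are invented for illustration.

```python
# Request-reply with correlation IDs: the requester tags each request
# with a unique ID, and matches incoming replies back to pending requests.
import uuid

pending = {}  # correlation_id -> original request

def send_request(payload):
    corr_id = str(uuid.uuid4())
    pending[corr_id] = payload
    # Service B would receive (corr_id, payload) and publish its answer,
    # tagged with the same corr_id, to the reply queue.
    return corr_id

def handle_reply(corr_id, response):
    request = pending.pop(corr_id)  # match the reply to its request
    return request, response

cid = send_request({"op": "price-check", "sku": "A-17"})
req, resp = handle_reply(cid, {"price": 42})
print(req["sku"], resp["price"])   # A-17 42
```

RabbitMQ does this for you: the `reply_to` and `correlation_id` message properties are part of the protocol, so Service B doesn't need to know anything about Service A beyond the queue named in `reply_to`.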

Routing logic. RabbitMQ supports topic exchanges, header exchanges, and fanout exchanges. You can route messages based on their content, headers, or routing keys. If you need "all messages about orders go to the order service, all messages about users go to the user service," RabbitMQ handles this at the broker level.
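To make the routing concrete, here is a toy version of topic-exchange matching. In RabbitMQ, routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more; this sketch implements just those two rules, nothing else from the exchange model.

```python
# Toy topic-exchange matcher: '*' matches exactly one word,
# '#' matches zero or more words, everything else matches literally.
def topic_match(pattern: str, key: str) -> bool:
    pw, kw = pattern.split("."), key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(pw):
            return j == len(kw)
        if pw[i] == "#":                      # zero or more words
            return any(match(i + 1, k) for k in range(j, len(kw) + 1))
        if j < len(kw) and (pw[i] == "*" or pw[i] == kw[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)

print(topic_match("orders.*", "orders.created"))     # True
print(topic_match("orders.#", "orders.eu.created"))  # True
print(topic_match("users.*", "orders.created"))      # False
```

The point is that this matching happens in the broker, per binding: the order service binds its queue with `orders.#`, the user service with `users.#`, and producers don't need to know either exists.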

Low latency per message. RabbitMQ delivers individual messages faster than Kafka because it doesn't wait for batch accumulation. For applications where a single message needs sub-millisecond broker latency, RabbitMQ wins.

When Kafka is the right choice

Event sourcing. You want a complete history of everything that happened. User signed up, order placed, payment processed, item shipped. The log is the source of truth. You can rebuild any derived state by replaying the log.

Multiple consumers. Multiple services need to react to the same event, independently. When a user places an order, the inventory service reduces stock, the notification service sends a confirmation, the analytics service records a conversion. Each consumer reads from the same topic at its own pace.

Producer → Topic → Consumer Group A (inventory)
                 → Consumer Group B (notifications)
                 → Consumer Group C (analytics)

Each consumer group gets every message. Within a group, messages are distributed across consumers for parallelism. This is fundamentally different from RabbitMQ's competing consumers.
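The fan-out semantics reduce to a simple model: the topic is an append-only list, and each consumer group keeps its own read offset. This sketch ignores partitions and brokers entirely; the group names mirror the diagram above.

```python
# Minimal model of Kafka fan-out: every group sees every message,
# because reading only advances that group's offset -- nothing is deleted.
topic = []                 # the log
offsets = {"inventory": 0, "notifications": 0, "analytics": 0}

def publish(event):
    topic.append(event)

def poll(group):
    """Return unread events for a group and advance its offset."""
    start = offsets[group]
    offsets[group] = len(topic)
    return topic[start:]

publish({"type": "order.placed", "order_id": 1})
publish({"type": "order.placed", "order_id": 2})

print(poll("inventory"))   # both events
print(poll("analytics"))   # both events again -- independent offset
offsets["analytics"] = 0   # "replay": rewind one group to the start
print(poll("analytics"))   # the full history, re-read
```

The last two lines are the replay capability in miniature: rewinding a group's offset costs nothing and affects no other group, because consumption is just a cursor over an immutable log.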

Replay and audit. Kafka retains messages. If you deploy a new analytics service next month, it can read the topic from the beginning and process all historical events. With RabbitMQ, consumed messages are gone.

High throughput. Kafka is designed for throughput. It batches writes to disk, uses sequential I/O, and can handle millions of messages per second per broker. For log aggregation, metrics collection, or any workload that produces a high volume of events, Kafka's architecture is more efficient.

Consumer groups in Kafka

Kafka's consumer group semantics are central to understanding it.

A topic has partitions (ordered, append-only logs). Each partition is assigned to exactly one consumer within a consumer group. If you have 6 partitions and 3 consumers in a group, each consumer reads 2 partitions.

If a consumer fails, its partitions are reassigned to surviving consumers. This is automatic rebalancing.
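Assignment and rebalancing can both be sketched with one function. This is round-robin in the spirit of Kafka's default assignors; the real protocol is richer (sticky assignment, rack awareness, cooperative rebalancing), so treat this as the shape of the idea, not the actual algorithm.

```python
# Round-robin partition assignment: partition p goes to consumer p mod n.
def assign(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    out = {c: [] for c in consumers}
    for p in range(partitions):
        out[consumers[p % len(consumers)]].append(p)
    return out

print(assign(6, ["c1", "c2", "c3"]))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]} -- two partitions each

# If c3 fails, the rebalance is just the same function over the survivors:
print(assign(6, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

This also shows the scaling ceiling: with 6 partitions, a seventh consumer in the group would sit idle, because no partition can be split between two consumers of the same group.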

Different consumer groups are independent. Group A can be at offset 3,000 in a partition while Group B is at offset 5,000 in the same partition. They don't interfere.

This is different from RabbitMQ's competing consumers. In RabbitMQ, consumers compete for individual messages. In Kafka, consumers within a group share partitions, but each partition is read by exactly one consumer. The parallelism is partition-level, not message-level.

Message retention

This is the feature that changes what you can build with Kafka.

RabbitMQ: message consumed, message gone. Kafka: message written, message stays for the configured retention period (default 7 days, configurable to forever).

Retention means:

  • You can replay events to rebuild state after a bug
  • New consumers can process historical data
  • You can audit what happened and when
  • You can run A/B tests on event processing logic against real historical data
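Time-based retention itself is a simple rule, sketched below. Note that real Kafka deletes whole log segments once their newest record passes the cutoff, not individual records as this toy version does.

```python
# Sketch of time-based retention: records older than the retention
# window are dropped, regardless of whether anyone has read them.
RETENTION_SECONDS = 7 * 24 * 3600   # the 7-day default

def prune(log, now):
    cutoff = now - RETENTION_SECONDS
    return [(ts, event) for ts, event in log if ts >= cutoff]

now = 10_000_000
log = [
    (now - 8 * 24 * 3600, "too old"),        # past the window, dropped
    (now - 3 * 24 * 3600, "still retained"),
    (now, "fresh"),
]
print(prune(log, now))   # the two events inside the window survive
```

The key property is that deletion is driven purely by the retention policy, never by consumption: a slow consumer and a fast consumer see exactly the same window of history.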

For the healthcare platform I work on, we use Kafka for session events. Every state change in an interpreter session is recorded as an event. If we discover a bug in session duration calculation, we can fix the bug and replay the events to correct all affected sessions. With RabbitMQ, the original events would be gone.

Operational complexity

Kafka is operationally more complex than RabbitMQ. A production Kafka cluster needs:

  • Multiple brokers (minimum 3 for fault tolerance)
  • ZooKeeper or KRaft for coordination (KRaft is newer and removes the ZooKeeper dependency)
  • Monitoring for partition lag, consumer group health, and broker disk usage
  • Capacity planning for retention (retention = disk space)

RabbitMQ is simpler to operate:

  • A single node works for many use cases
  • Clustering is straightforward for high availability
  • Transient messages are held in memory (durable queues persist to disk), so disk planning is less critical
  • The management UI provides monitoring out of the box

For small teams without dedicated infrastructure engineers, RabbitMQ's operational simplicity is a real advantage. Managed Kafka services (AWS MSK, Confluent Cloud) reduce the operational burden but add cost.

The decisions on two real projects

Project 1: payment processing system. Tasks: validate payment, charge card, send receipt, update ledger. Each task needed to happen exactly once, in order, by one worker. RabbitMQ with a single queue and three workers. Simple, reliable, done.

Project 2: healthcare session tracking. Events: session requested, interpreter matched, session started, message sent, session ended, feedback submitted. Multiple services consume each event independently. Events need to be auditable and replayable. Kafka with one topic and three consumer groups (session management, analytics, compliance logging).

The choice was clear in both cases because the requirements mapped directly to the strengths of each system. When the requirements are ambiguous, default to RabbitMQ. It's simpler to set up, simpler to operate, and if you discover later that you need Kafka's features, the migration path (publish the same events to Kafka while keeping RabbitMQ for task queues) is straightforward.
