Apache Kafka partitions give order and structure to your data streams. But how many partitions should you set up? It is a good question, and one you should answer before deploying Kafka to production. It is also the question this blog post addresses.
Why are Apache Kafka partitions needed at all?
Kafka partitions speed up the processing of large amounts of data by distributing message processing in parallel across many nodes in your Kafka cluster. Instead of writing all of the data for a topic to one partition (and thus one broker), you can split your topics into multiple partitions and distribute those partitions across brokers. Unlike in many other messaging systems, the producer is responsible for deciding which partition each message is written to.
If your messages have keys, then the producer distributes the data such that data with the same key ends up on the same partition. This way, you can guarantee that messages with the same key are always in order. If keys are not used, then messages are distributed to the partitions in a round-robin manner.
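This partitioning behavior can be sketched in a few lines of Python. This is an illustrative simplification, not Kafka's actual implementation: the real default partitioner hashes keys with murmur2, while `choose_partition` below uses `crc32` as a stand-in, so the concrete partition numbers will differ from a real cluster.

```python
import zlib

def choose_partition(key, num_partitions, counter):
    """Simplified sketch of producer-side partitioning.

    Keyed messages: hash(key) mod num_partitions, so equal keys always
    land on the same partition (Kafka really uses murmur2; crc32 here
    is just an illustrative stand-in).
    Unkeyed messages: round-robin via a mutable counter.
    """
    if key is not None:
        return zlib.crc32(key) % num_partitions
    counter[0] += 1
    return counter[0] % num_partitions

counter = [0]
# Same key -> same partition, every time.
p1 = choose_partition(b"user-42", 12, counter)
p2 = choose_partition(b"user-42", 12, counter)

# No key -> partitions are cycled through round-robin.
unkeyed = [choose_partition(None, 3, counter) for _ in range(6)]
```

The guarantee that matters in practice is the first one: as long as the partition count does not change, all messages for a given key stay on the same partition and therefore stay in order.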
Furthermore, you can have at most as many (useful) consumers in a consumer group as there are partitions in the topics they consume. The bottleneck in data processing is often not the broker or the producer, but the consumer, which must not only read the data, but also process it.
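Why surplus consumers are useless can be seen from how partitions are assigned within a consumer group. The sketch below distributes partitions round-robin; the real assignors (range, cooperative-sticky) differ in details, and `assign_partitions` is a hypothetical helper, not a Kafka API.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin assignment of partitions to the consumers of a group.

    With more consumers than partitions, the surplus consumers receive
    no partitions and sit idle (a sketch; real Kafka assignors such as
    range or cooperative-sticky differ in details).
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# 4 partitions, 6 consumers in the group: two consumers stay idle.
a = assign_partitions(4, [f"c{i}" for i in range(6)])
idle = [c for c, parts in a.items() if not parts]
```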
How partition count affects Kafka performance
In general, the more Kafka partitions you have, the
higher your data throughput is: Both the brokers and the producers can process different partitions completely independently, and thus in parallel. This allows these systems to better utilize the available resources and process more messages. Important: The number of partitions can also be significantly higher than the number of brokers. This is not a problem.
more consumers you can have in your Consumer Groups: This also potentially increases the data throughput because you can spread the work across more systems. But beware: The more individual systems you have, the more parts that can fail and cause issues.
more open file handles you have on the brokers: Kafka opens two files for each segment of a partition: the log and the index. This has little impact on performance, but you should definitely increase the allowed number of open files on the operating system side.
longer downtimes can occur: If a Kafka broker is shut down cleanly, it notifies the controller, and the controller can move the partition leaders to the other brokers without downtime. However, if a broker fails, there may be longer downtime because many partitions are suddenly left without a leader. Due to limitations in ZooKeeper, the controller can only move one leader at a time. This takes about 10 ms. With thousands of leaders to move, this can take minutes in some circumstances. If the controller itself fails, the new controller must first read in the leaders of all partitions; at roughly 2 ms per leader, the process takes even longer. With KRaft, this problem becomes much smaller.
more RAM is consumed by the clients: the clients create buffers per partition and if a client interacts with very many partitions, possibly spread over many topics (especially as a producer), then the RAM consumption adds up a lot.
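The file-handle point above lends itself to a quick back-of-envelope calculation. This is a sketch following the rule stated in the list (two open files per segment: log and index); `open_file_handles` is a hypothetical helper, and real brokers keep additional files (such as time indexes) open as well.

```python
def open_file_handles(partitions_per_broker, segments_per_partition):
    """Rough lower bound on broker file handles, following the rule
    above: each segment keeps its log file and its index file open."""
    return partitions_per_broker * segments_per_partition * 2

# 1000 partitions with 10 segments each already means ~20,000 handles,
# which is why the OS file-handle limit should be raised.
handles = open_file_handles(1000, 10)
```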
Kafka partition limits
There are no hard limits on the number of partitions in Kafka clusters. But here are a few general rules:
maximum 4000 partitions per broker (in total; distributed over many topics)
maximum 200,000 partitions per Kafka cluster (in total; distributed over many topics)
resulting in a maximum of 50 brokers per Kafka cluster
This reduces downtime in case something does go wrong. But beware: It should not be your goal to push these limits. In many “medium data” applications, you don’t need any of this.
Kafka partitioning rules of thumb
As already described at the beginning, there is no “right” answer to the question about the number of Kafka partitions. However, the following rules of thumb have become established over time:
No prime numbers: Even though many examples on the Internet (or in training courses) use three partitions, it is generally a bad idea to use a prime number of partitions, because prime numbers are hard to divide evenly among varying numbers of brokers and consumers. Prefer numbers that are divisible by many other numbers.
Monitor, monitor, monitor: Visibility into Kafka partition, topic, and consumer performance is critically important. A Kafka GUI client (here's a free one) can help you visualize cluster performance and optimize your partitions and consumer groups.
Multiples of consumers: This allows partitions to be distributed evenly among the consumers in a consumer group.
Multiples of the brokers: This way you can distribute the partitions (and leaders!) evenly among all brokers.
Consistency in the Kafka cluster: Especially if you want to use Kafka Streams, it makes sense to keep the number of partitions the same across your topics. For example, if you join two topics that do not have the same number of partitions, Kafka Streams must repartition one of them first. This is a costly operation that you should avoid if possible.
Based on performance measurements: If you know your target data throughput and have measured the per-partition throughput of your producers and consumers, you can calculate how many Kafka partitions you need. For example, if you want to move 100 MB/s of data and can achieve 10 MB/s per partition in the producer and 20 MB/s in the consumer, you need at least 10 partitions (and at least 5 consumers in the consumer group).
Do not overdo it: This is not a contest to set up the largest number of partitions possible. If you only process a few tens of thousands of messages per day, you don’t need hundreds of partitions.
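The throughput-based rule above can be turned into a small calculation. This is a sketch; `required_partitions` is a hypothetical helper, and real sizing should be validated with load tests.

```python
import math

def required_partitions(target_mb_s, producer_mb_s, consumer_mb_s):
    """Partitions needed so that both producers and consumers can keep
    up with the target throughput, plus the minimum consumer count."""
    partitions = max(math.ceil(target_mb_s / producer_mb_s),
                     math.ceil(target_mb_s / consumer_mb_s))
    consumers = math.ceil(target_mb_s / consumer_mb_s)
    return partitions, consumers

# The worked example above: 100 MB/s target, 10 MB/s per producer
# partition, 20 MB/s per consumer.
partitions, consumers = required_partitions(100, 10, 20)
```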
Practical Kafka partitioning examples
From my experience, 12 has become a good guideline for the number of partitions. For customers who process very little data with Kafka (or have to pay per partition), even smaller numbers can make sense (2, 4, 6).
As an IT trainer and frequent blogger for Xeotek, Anatoly Zelenin teaches Apache Kafka to hundreds of participants in interactive training sessions. For more than a decade, his clients from the DAX environment and German medium-sized businesses have appreciated his expertise and his inspiring manner, and in that capacity he delivers trainings for Xeotek clients as well. His book is available directly from him, from Hanser-Verlag, or on Amazon. You can reach Anatoly via e-mail. He is not only an IT consultant, trainer, and blogger, but also explores our planet as an adventurer.