Apache Kafka is our rocket, and the individual partitions provide order and structure to all work processes at every stage of the flight. But how many partitions should we set up? A good question, which should be answered before we deploy Kafka to production. And also a question that the following blog post addresses. 3,2,1 – launch.
This is a blog post from our Community Stream: by developers, for developers. Don’t miss to stop by our community to find similar articles or join the conversation.
Why do we need partitions in Apache Kafka at all?
In Kafka, we use partitions to speed up the processing of large amounts of data. So instead of writing all of our data from a topic to one partition (and thus one broker), we split our topics into different partitions and distribute those partitions to our brokers. Unlike other messaging systems, our producers are responsible for deciding which partition our messages are written to. If we use keys, then the producer distributes the data such that data with the same key ends up on the same partition. This way, we can guarantee that the order between messages with the same key is guaranteed. If we do not use keys, then messages are distributed to the partitions in a round-robin manner.
Furthermore, we can have at most as many (useful) consumers in a consumer group as we have partitions in the topics they consume. The bottleneck in data processing is often not the broker or the producer, but the consumer, which must not only read the data, but also process it.
In general, the more partitions, the
higher is our data throughput: Both the brokers and the producers can process different partitions completely independently – and thus in parallel. This allows these systems to better utilise the available resources and process more messages. Important: The number of partitions can also be significantly higher than the number of brokers. This is not a problem.
more consumers we can have in our Consumer Groups: This also potentially increases the data throughput because we can spread the work across more systems. But beware: The more individual systems we have, the more parts can fail and cause issues.
more open file handles we have on the brokers: Kafka opens two files for each segment of a partition: the log and the index. This has little impact on performance, but we should definitely increase the allowed number of open files on the operating system side.
longer downtimes occur: If a Kafka broker is shut down cleanly, then it notifies the controller and the controller can move the partition leaders to the other brokers without downtime. However, if a broker fails, there may be longer downtime because there is no leader for many partitions. Due to limitations in Zookeeper, the consumer can only move one leader at a time. This takes about 10 ms. With thousands of leaders to move, this can take minutes in some circumstances. If the controller fails, then it must read in the leaders of all partitions. If this takes about 2 ms per leader, then the process takes even longer. With KRaft this problem will become much smaller.
more RAM is consumed by the clients: the clients create buffers per partition and if a client interacts with very many partitions, possibly spread over many topics (especially as a producer), then the RAM consumption adds up a lot.
Limits on partitions:
There are no hard limits on the number of partitions in Kafka clusters. But here are a few general rules:
maximum 4000 partitions per broker (in total; distributed over many topics)
maximum 200,000 partitions per Kafka cluster (in total; distributed over many topics)
resulting in a maximum of 50 brokers per Kafka cluster
This reduces downtime in case something does go wrong. But beware: It should not be your goal to push these limits. In many “medium data” applications, you don’t need any of this.
Rules of thumb
As already described at the beginning, there is no “right” answer to the question about the number of partitions. However, the following rules of thumb have become established over time:
No prime numbers: Even though many examples on the Internet (or in training courses) use three partitions, in general, it is a bad idea to use prime numbers because prime numbers are very difficult to divide among different numbers of brokers and consumers.
Well divisible numbers: Therefore, numbers that can be divided by many other numbers should always be used.
Multiples of consumers: This allows partitions to be distributed evenly among the consumers in a consumer group.
Multiples of the brokers: This way we can distribute the partitions (and leaders!) evenly among all brokers this way
Consistency in Kafka Cluster: Especially when we want to use Kafka Streams, we realise that it makes sense to keep the number of partitions the same across our topics. For example, if we intend to do a join across two topics and these two topics do not have the same number of partitions, Kafka Streams needs to repartition a topic beforehand. This is a costly issue that we would like to avoid if possible.
Depending on the performance measurements: If we know our target data throughput and also know the measured throughput of our Consumer and Producer per partition, we can calculate from that how many partitions we need. For example, if we know we want to move 100 MB/s of data and can achieve 10 MB/s in the Producer per partition and 20 MB/s in the Consumer, we require at least 10 partitions (and at least 5 Consumers in the Consumer Group).
Do not overdo it: This is not a contest to set up the largest number of partitions possible. If you only process a few tens of thousands of messages per day, you don’t need hundreds of partitions.
In my consulting practice, 12 has become a good guideline for the number of partitions. For customers who process very little data with Kafka (or need to pay per partition), even smaller numbers can make sense (2,4,6). If you are processing large amounts of data, then Pere’s Excel spreadsheet that he has made available on GitHub will help: kafka-cluster-size-calculator.
As an IT trainer and a frequent Blogger for Xeotek, Anatoly Zelenin teaches Apache Kafka to hundreds of participants in interactive training sessions. For more than a decade, his clients from the DAX environment and German medium-sized businesses have appreciated his expertise and his inspiring manner. In that capacity, he is delivering trainings for Xeotek Clients as well. His book is available directly from him, from Hanser-Verlag, Amazon. You can reach Anatoly via his e-mail. In addition, he is not only an IT Consultant, Trainer and Blogger but also explores our planet as an adventurer.