Table of contents
No headings in the article.
There are several factors to consider when choosing the number of partitions:
What is the throughput you expect to achieve for the topic? For example, do you expect to write 100KB per second or 1 GB per second?
What is the maximum throughput you expect to achieve when consuming from a single partition? You will always have, at most, one the consumer reading from a partition, so if you know that slower consumer write to a database and this database never handlers more than 50MB per second from each thread writing to it, then you know you limited to 60 MB throughput when consuming from a partition.
You can go throughput the same exercise to estimate the maximum throughput per producer for a single partition, but since producer are typically much faster than consumers, it is usually safe to skip this.
Consider the number of partition you will place on each broker and available diskspace and network bandwidth per broker.
if you are sending message to partitions based on keys, adding partition later can be very challenging, so calculate throughput based on your expected future usage, not the current usage.
=> With all this in mind, it's clear that you want many partitions but not too many. If you have some estimate regarding the target throughput of the topic and the expected throughput of the consumer, you can divide the target throughput by the expected consumer throughput and derive the number of partitions this way. So if you want to be able to write and read 1 GB/sec from a topic, and I know each consumer can only process 50 MB/sec from a topic, then you know I need at least 20 partitions. This way, I can have 20 consumer reading from a topic and achieve 1GB/sec.