Hey Praveen, I also suspect that you can get away with far fewer threads. Here's the general starting point I recommend:

* start with just a little over 1 thread per hardware thread (accounting for
  cores and hyperthreading). For example, on my machine, I have 4 cores with
  2 threads of execution each, so I would configure the application with 8 or
  maybe 9 threads. Much more than that introduces a *lot* of CPU/memory
  overhead in exchange for not much gain (if any).
* choose a number of partitions that would allow you to scale up to a
  reasonable number of machines, with respect to the numbers you get above.

From there, take a close look at all your important machine metrics (cpu,
memory, disk, network) as well as processing metrics (task throughput (how
long your application code takes), end-to-end processing throughput (how long
the full processing lifecycle takes, including the broker roundtrips)).

If there's any resource not saturated, you can tweak various configurations
to try and saturate it. I would think that stuff like buffer size and batch
size would be more helpful with less overhead than number of threads. But
keep a close look at your throughputs each time you make a change, to be sure
you're not locally optimizing at the expense of global performance.

I hope this helps!
-John

On Thu, Sep 13, 2018 at 4:53 PM Svante Karlsson <svante.karls...@csi.se> wrote:

> You are doing something wrong if you need 10k threads to produce 800k
> messages per second. It feels you are a factor of 1000 off. What size are
> your messages?
>
> On Thu, Sep 13, 2018, 21:04 Praveen <praveev...@gmail.com> wrote:
>
> > Hi there,
> >
> > I have a kafka application that uses kafka consumer low-level api to help
> > us process data from a single partition concurrently. Our use case is to
> > send out 800k messages per sec. We are able to do that with 4 boxes using
> > 10k threads and each request taking 50ms in a thread. (1000/50*10000*4)
> >
> > I understand that kafka in general uses partitions as its parallelism
> > model. It is my understanding that if I want the exact same behavior with
> > kafka streams, I'd need to create 40k partitions for this topic. Is that
> > right?
> >
> > What is the overhead on creating thousands of partitions? If we end up
> > wanting to send out millions of messages per second, is increasing the
> > partitions the only way?
> >
> > Best,
> > Praveen
> >
>
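
For reference, the sizing advice and the arithmetic in this thread can be
sketched roughly as below. This is only an illustration under stated
assumptions, not anyone's actual application code: the class and method names
are made up, the "+1" margin and the 16-machine ceiling are hypothetical
values, and it assumes Runtime.availableProcessors() reports logical
processors (hyperthreads included), which is typical but can differ in
containers. The only Kafka-specific piece is the Kafka Streams setting
"num.stream.threads", written here as a plain string so the sketch needs no
Kafka dependency.

```java
import java.util.Properties;

public class ThreadSizing {

    // Starting point from the advice above: one thread per hardware thread,
    // plus an optional small margin (e.g. 8 hardware threads -> 8 or 9).
    public static int startingThreads(int hardwareThreads, int extra) {
        return hardwareThreads + extra;
    }

    // Kafka Streams cannot run more active tasks than there are input
    // partitions, so this is the minimum partition count that lets every
    // thread on every machine own at least one partition.
    public static int partitionsFor(int maxMachines, int threadsPerMachine) {
        return maxMachines * threadsPerMachine;
    }

    // Throughput when each thread handles one blocking request at a time,
    // i.e. the (1000/50*10000*4) calculation from the original question.
    public static long messagesPerSecond(int latencyMs, int threadsPerMachine, int machines) {
        return 1000L * threadsPerMachine * machines / latencyMs;
    }

    // Builds only the thread-count part of a Kafka Streams configuration.
    public static Properties threadConfig(int threads) {
        Properties props = new Properties();
        props.setProperty("num.stream.threads", Integer.toString(threads));
        return props;
    }

    public static void main(String[] args) {
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        int threads = startingThreads(hardwareThreads, 1); // hypothetical margin of 1
        int maxMachines = 16;                              // hypothetical scale-out ceiling

        System.out.println("num.stream.threads = "
                + threadConfig(threads).getProperty("num.stream.threads"));
        System.out.println("partitions to provision >= "
                + partitionsFor(maxMachines, threads));
        System.out.println("original setup, msgs/sec = "
                + messagesPerSecond(50, 10_000, 4));
    }
}
```

With 8 hardware threads and the hypothetical 16-machine ceiling, this suggests
provisioning at least 144 partitions, rather than the 40k implied by keeping
one partition per blocking thread.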