My use case requires total ordering of messages in Kafka, so I tested with a topic that has only 1 partition. Spout parallelism was set to 1 and bolt parallelism to 20. Each message is less than 1k bytes.
No matter how I tune the Kafka spout configs, including the fetch-related params and max spout pending, I only get about 10K tuples/s, with very low complete latency (<10ms). I even tried an empty bolt that acks tuples immediately without any extra processing; the throughput was still similar, though the complete latency was even lower. This makes me wonder whether I have hit some sort of performance wall. My boxes are quite powerful bare-metal machines (40 cores, lots of disk space, 96G memory, 1G network), and the worker JVM was tuned, so pauses there are negligible.

Any advice on what I can tune or look into? Thanks a lot!

Fang
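
For reference, here is a rough back-of-envelope check of the numbers above, written as a small Python sketch. It is not a measurement. It assumes "1k bytes" means 1000 bytes, "1G network" means 1 Gbit/s, and that the quoted size and latency can be used as upper bounds. The function names are made up for illustration. It estimates how much of the link 10K tuples/s would use and, via Little's law, how many tuples are outstanding on average.

```python
# Figures taken from the post. Message size and complete latency are upper
# bounds ("less than 1k bytes", "<10ms"), so both results are ceilings.
THROUGHPUT_TUPLES_PER_S = 10_000
MAX_MESSAGE_BYTES = 1_000         # assumption: "1k" read as 1000 bytes
MAX_COMPLETE_LATENCY_S = 0.010    # assumption: 10 ms used as the upper bound
LINK_BITS_PER_S = 1_000_000_000   # assumption: "1G network" read as 1 Gbit/s


def link_utilization(tuples_per_s, message_bytes, link_bits_per_s):
    """Fraction of the link consumed by message payloads alone."""
    return tuples_per_s * message_bytes * 8 / link_bits_per_s


def tuples_in_flight(tuples_per_s, latency_s):
    """Little's law: average items in flight = arrival rate * time in system."""
    return tuples_per_s * latency_s


utilization = link_utilization(
    THROUGHPUT_TUPLES_PER_S, MAX_MESSAGE_BYTES, LINK_BITS_PER_S
)
in_flight = tuples_in_flight(THROUGHPUT_TUPLES_PER_S, MAX_COMPLETE_LATENCY_S)

print(f"network utilization: at most {utilization:.0%}")
print(f"tuples in flight: at most about {in_flight:.0f}")
```

If those assumptions hold, payload traffic uses at most about 8% of the link, and only on the order of 100 tuples are outstanding at any moment. A max spout pending value well above that would therefore not be expected to be the limiting factor.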
