Hi

I am trying to use a plain Java consumer (over SSL) to consume a very large
amount of historic data (20+TB across 20+ partitions). Consumption
performance is very low when fully parallelized.

We are seeing about 200k rec/s with the Java consumer versus 950k rec/s
with librdkafka.
We are seeing about 1 gigabit/s with the Java consumer versus 5.3 gigabit/s
with librdkafka.

Both applications are doing no-ops (e.g., consume, deserialize as byte
arrays, print a line for every 100 events). Both applications are using
defaults (including the same fetch sizes, maximums, batch sizes, etc.). The
Java processes do not appear to be starved for resources (CPU, memory, etc.),
nor do the kafkacat instances. Everything is being run in exactly the same
environments with the same resources, but the Java Kafka client is just
incredibly slow.
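For reference, the fetch-related settings both clients are left at look
roughly like this (a sketch only — the broker address and truststore path
below are placeholders, and the commented values are the Java client's
documented 2.4 defaults, which we have not changed):

```properties
# consumer.properties — sketch of the configuration in play.
# bootstrap.servers and ssl.truststore.location are placeholders.
bootstrap.servers=broker1:9093
security.protocol=SSL
ssl.truststore.location=/path/to/truststore.jks
key.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer
# Java client defaults, left untouched in both tests:
fetch.min.bytes=1
fetch.max.bytes=52428800
max.partition.fetch.bytes=1048576
max.poll.records=500
receive.buffer.bytes=65536
```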

Java Kafka client version 2.4.x
JDK 11 (I believe there was an SSL performance issue in earlier JDKs that
required upgrading to at least JDK 11).

Am I doing something wrong here? The last time I tested the performance
difference between these two libraries was years ago, and it was something
like librdkafka being a bit faster in most cases, but certainly not 5x
faster in a no-op scenario. Is this in line with expectations?

Any thoughts or suggestions would be very much appreciated.

Thanks
Adam
