Hi, I am in the process of optimizing a job that we currently think is too slow.
We deploy the job on Kubernetes with one job manager (1 GB RAM, 1 CPU) and one task manager (4 GB RAM, 2 CPUs, i.e. 2 task slots and a parallelism of 2). The main problem is one Kafka source with 3.8 million events that we have to process.

As a test we built a simple job that reads from Kafka using a custom implementation of `KafkaDeserializationSchema`. In it we use an `ObjectMapper` that maps the input bytes to a POJO, e.g. `var event = objectMapper.readValue(consumerRecord.value(), MyClass.class);`. The event is then validated with Hibernate Validator, and the output of the source is printed to the console. Consuming all the events took about an hour and a half, which seems rather long.

Is there a way to speed this up? Would more CPU cores or memory help? Should we switch to an Avro deserialization schema?
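For context, here is a minimal sketch of what our deserialization schema roughly looks like (`MyClass`, the class name, and the exact validation handling are illustrative placeholders, not our real code):

```java
import java.util.Set;

import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.Validator;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

import com.fasterxml.jackson.databind.ObjectMapper;

// Simplified sketch of the custom schema; MyClass is a placeholder POJO.
public class MyClassDeserializationSchema implements KafkaDeserializationSchema<MyClass> {

    // Created lazily so the schema itself stays serializable; both instances
    // are reused across records rather than rebuilt per event.
    private transient ObjectMapper objectMapper;
    private transient Validator validator;

    @Override
    public MyClass deserialize(ConsumerRecord<byte[], byte[]> consumerRecord) throws Exception {
        if (objectMapper == null) {
            objectMapper = new ObjectMapper();
            validator = Validation.buildDefaultValidatorFactory().getValidator();
        }

        // JSON -> POJO mapping, as in the snippet above.
        MyClass event = objectMapper.readValue(consumerRecord.value(), MyClass.class);

        // Bean Validation (Hibernate Validator) on the mapped event.
        Set<ConstraintViolation<MyClass>> violations = validator.validate(event);
        if (!violations.isEmpty()) {
            throw new IllegalArgumentException("Invalid event: " + violations);
        }
        return event;
    }

    @Override
    public boolean isEndOfStream(MyClass nextElement) {
        return false; // unbounded Kafka source
    }

    @Override
    public TypeInformation<MyClass> getProducedType() {
        return TypeInformation.of(MyClass.class);
    }
}
```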