Re: Time needed to read from Kafka source

2021-05-27 Thread B.B.
OMG! Thank you! Thank you! I didn't think this could be a problem. When I removed the validation, the time needed to ingest all events dropped to 10 min.

BR, BB

On Thu, May 27, 2021 at 11:50 AM Arvid Heise wrote:
> Hi,
>
> The implementation looks good. I'd probably cache the
> ObjectValidator.of().

Re: Time needed to read from Kafka source

2021-05-27 Thread Arvid Heise
Hi,

The implementation looks good. I'd probably cache ObjectValidator.of().getValidator() in a field to make sure it isn't an expensive construction. Did you evaluate what happens, in terms of records/s, when you skip the validation entirely?

On Thu, May 27, 2021 at 11:18 AM B.B. wrote:
> I
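
The caching Arvid suggests can be sketched as follows, assuming the validator is safe to reuse across records within one schema instance. The real API behind ObjectValidator.of().getValidator() is not shown in the thread, so `Validator`, `ExpensiveValidatorFactory` and `CachedValidatorSketch` are hypothetical stand-ins. Because a Flink deserialization schema is serialized and shipped to the task managers, the sketch keeps the validator in a `transient` field and builds it lazily on first use.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for whatever ObjectValidator.of().getValidator() returns.
interface Validator<T> {
    boolean isValid(T value);
}

// Hypothetical factory standing in for a possibly expensive validator construction.
final class ExpensiveValidatorFactory {
    // Counts constructions so the effect of caching is visible.
    static final AtomicInteger BUILDS = new AtomicInteger();

    static Validator<String> getValidator() {
        BUILDS.incrementAndGet();
        // A real validator might load schemas or scan annotations here.
        return value -> value != null && !value.isEmpty();
    }
}

public class CachedValidatorSketch implements Serializable {
    // transient: not serialized with the schema; rebuilt lazily where it runs.
    private transient Validator<String> validator;

    // Builds the validator on first use, then returns the cached instance.
    private Validator<String> getOrCreateValidator() {
        if (validator == null) {
            validator = ExpensiveValidatorFactory.getValidator();
        }
        return validator;
    }

    public String deserialize(byte[] bytes) {
        String value = new String(bytes, StandardCharsets.UTF_8);
        if (!getOrCreateValidator().isValid(value)) {
            throw new IllegalArgumentException("invalid record: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        CachedValidatorSketch schema = new CachedValidatorSketch();
        for (int i = 0; i < 1_000; i++) {
            schema.deserialize(("event-" + i).getBytes(StandardCharsets.UTF_8));
        }
        // Prints "validators built: 1": one construction for 1,000 records.
        System.out.println("validators built: " + ExpensiveValidatorFactory.BUILDS.get());
    }
}
```

Calling getValidator() inside deserialize() instead would build one validator per record. Whether that matters depends on how expensive the real construction is, which is what skipping the validation entirely would reveal.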

Re: Time needed to read from Kafka source

2021-05-27 Thread B.B.
I am having problems sending code, so here it is again. Hope it looks OK now.

This is my main job (some parts of the code are abbreviated; this is the main part):

public class MyJob {
    private StreamExecutionEnvironment env;
    private static final Integer NUM_OF_PARALLEL_OPERATORS = 1;

Re: Time needed to read from Kafka source

2021-05-26 Thread B.B.
Hi, I forgot to mention that we are using Flink 1.12.0. The job has only the minimum components: reading from the source and printing. Profiling is my next step. Regarding memory, I didn't see any bottlenecks. I guess I will have to dig into Flink's metrics. BR

Re: Time needed to read from Kafka source

2021-05-26 Thread B.B.
Hi, I forgot to mention that we are running Flink 1.12.0. This is the main function (some parts of the code are abbreviated; this is the main part). As you can see, the job was simplified to the minimum: just reading from the source and printing. [image: Screenshot 2021-05-26 at 08.05.53.png] And this

Re: Time needed to read from Kafka source

2021-05-25 Thread Arvid Heise
Could you share your KafkaDeserializationSchema? We might be able to spot some optimization potential. You could also try out enableObjectReuse [1], which avoids copying data between tasks (not sure if you have any non-chained tasks). If you are on 1.13, you could check out the flame graph to see w

Re: Time needed to read from Kafka source

2021-05-25 Thread Piotr Nowojski
Hi, that's a throughput of 700 records/second, which should be well below the theoretical limits of any deserializer (from hundreds of thousands up to tens of millions of records/second per single operator), unless your records are huge or very complex. Long story short, I don't know of a magic bullet to h
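
Piotr's figure is simply records divided by elapsed seconds. One rough way to check how far a deserializer is from those limits is to time the per-record path in isolation, with and without validation. A minimal sketch, assuming sample payloads can be reproduced locally; `parse` and `validate` are hypothetical placeholders for the real schema logic, and a hand-rolled timing loop like this gives only a ballpark, not a substitute for a profiler or JMH.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ThroughputSketch {
    // Accumulates results so the timed work is not trivially optimized away.
    static long sink;

    // records/s = records / elapsed seconds
    public static double recordsPerSecond(long records, long elapsedNanos) {
        return records / (elapsedNanos / 1_000_000_000.0);
    }

    // Applies fn to every payload and returns the observed records/s.
    public static double measure(List<byte[]> payloads, Function<byte[], ?> fn) {
        long start = System.nanoTime();
        for (byte[] payload : payloads) {
            sink += fn.apply(payload).hashCode();
        }
        long elapsed = Math.max(System.nanoTime() - start, 1L);
        return recordsPerSecond(payloads.size(), elapsed);
    }

    // Hypothetical placeholder for the real deserialization step.
    public static String parse(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Hypothetical placeholder for the real validation step.
    public static boolean validate(String s) {
        return s.startsWith("event-");
    }

    public static void main(String[] args) {
        List<byte[]> payloads = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            payloads.add(("event-" + i).getBytes(StandardCharsets.UTF_8));
        }
        // Warm-up pass so the JIT has compiled the hot path before timing.
        measure(payloads, ThroughputSketch::parse);

        double parseOnly = measure(payloads, ThroughputSketch::parse);
        double withValidation = measure(payloads, b -> validate(parse(b)));
        System.out.printf("parse only:     %.0f records/s%n", parseOnly);
        System.out.printf("parse+validate: %.0f records/s%n", withValidation);
    }
}
```

If the isolated numbers are orders of magnitude above what the job achieves, the bottleneck is probably outside the deserializer; if adding validation alone drags the rate down sharply, the per-record work is the likely culprit.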

Time needed to read from Kafka source

2021-05-25 Thread B.B.
Hi, I am in the process of optimizing my job, which we currently consider too slow. We are deploying the job in Kubernetes with 1 job manager (1 GB RAM, 1 CPU) and 1 task manager (4 GB RAM, 2 CPUs, i.e. 2 task slots and a parallelism of two). The main problem is one Kafka source t