Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Sozonoff Serge
Hi, OutputAndTimeBoundedSplittableProcessElementInvoker, the  KafkaUnboundedReader.start() is called for every checkpoint and DirectRunner produces checkpoint every 100 records or every 1  This is not what I am seeing. The following few extracted log messages are generated in the KafkaUnbounded

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Boyuan Zhang
If you are using neither `withMaxNumRecords` nor `withMaxReadTime`, Beam 2.28.0 and 2.29.0 will wrap KafkaIO.read() with UnboundedSourceAsSDFWrapperFn and execute it as a Splittable DoFn via OutputAndTimeBoundedSplittableProcessElementInvoker. The KafkaConsumer is created every time KafkaUnboundedReade

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Alexey Romanenko
> On 24 May 2021, at 10:43, Sozonoff Serge wrote: > > OK thanks. Just to clarify, in my case the message throughput is zero when I > start the Beam pipeline up and it will still crash once all file handles are > consumed even if I dont send a single message to the kafka topic. This sounds li

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Sozonoff Serge
Hi, For all these connections, are they existing at the same time (connection is created without closing)? Yes. Serge On 24 May 2021 at 20:41:06, Boyuan Zhang (boyu...@google.com) wrote: One more question : ) For all these connections, are they existing at the same time (connection is create

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Serge Sozonoff
Hi. So far 2.28 and 2.29. Serge > On 24 May 2021, at 19:37, Boyuan Zhang wrote: > > Hi Serge, > > Which Beam version are you using when you notice the problem? > >> On Mon, May 24, 2021 at 6:36 AM Sozonoff Serge wrote: >> Hi, >> >> Tested with 2.28 that flag doe

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Boyuan Zhang
Hi Serge, Which Beam version are you using when you notice the problem? On Mon, May 24, 2021 at 6:36 AM Sozonoff Serge wrote: > Hi, > > Tested with 2.28 that flag does not seem to make any difference. > > BR, > Serge > > On 24 May 2021 at 15:06:11, Steve Niemitz (sniem...@apache.org) wrote: > >

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Sozonoff Serge
Hi, Tested with 2.28; that flag does not seem to make any difference. BR, Serge On 24 May 2021 at 15:06:11, Steve Niemitz (sniem...@apache.org) wrote: Out of curiosity, does adding the "--experiments=use_deprecated_read" argument fix things? (note, this flag was broken in Beam 2.29 on the direc

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Steve Niemitz
Out of curiosity, does adding the "--experiments=use_deprecated_read" argument fix things? (Note: this flag was broken in Beam 2.29 on the direct runner and didn't do anything, so you'd need to test on 2.28 or 2.30.) On Mon, May 24, 2021 at 4:44 AM Sozonoff Serge wrote: > Hi. > > OK thanks. Just

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Sozonoff Serge
Hi. OK, thanks. Just to clarify: in my case the message throughput is zero when I start the Beam pipeline, and it will still crash once all file handles are consumed, even if I don't send a single message to the Kafka topic. Thanks, Serge On 24 May 2021 at 10:14:33, Jan Lukavský (je...@seznam.

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Jan Lukavský
It is not 100 consumers; a checkpoint is created every 100 records. So, if your message throughput is high enough, consumers might be created really often. But most importantly, DirectRunner is really not intended for performance-sensitive applications. You should use a different runner
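
The difference between "consumers created" and "consumers open at the same time" can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it assumes one new consumer per checkpoint and a checkpoint every 100 records, as described in this thread. The names `simulate` and `ConsumerStats` are hypothetical, the code uses neither Beam nor Kafka, and it does not model any time-based checkpoint trigger.

```python
from dataclasses import dataclass

# Assumption taken from the thread: DirectRunner checkpoints every 100 records.
CHECKPOINT_EVERY_RECORDS = 100


@dataclass
class ConsumerStats:
    created: int = 0    # total consumers ever created
    open_now: int = 0   # consumers still holding a connection at the end
    peak_open: int = 0  # highest number of consumers open simultaneously


def simulate(total_records: int, close_on_checkpoint: bool) -> ConsumerStats:
    """Count consumers when a fresh one is started for each checkpointed bundle."""
    stats = ConsumerStats()
    remaining = total_records
    while remaining > 0:
        # A new reader (and therefore a new consumer) starts for this bundle.
        stats.created += 1
        stats.open_now += 1
        stats.peak_open = max(stats.peak_open, stats.open_now)
        # The bundle ends at the checkpoint or when the records run out.
        remaining -= min(remaining, CHECKPOINT_EVERY_RECORDS)
        if close_on_checkpoint:
            # Well-behaved case: the consumer is released at the checkpoint.
            stats.open_now -= 1
    return stats


if __name__ == "__main__":
    print(simulate(1000, close_on_checkpoint=True))
    print(simulate(1000, close_on_checkpoint=False))
```

In this model, closing on every checkpoint keeps the peak at one open consumer no matter how many are created, while never closing makes the open count grow with the number of checkpoints. That second pattern is the kind of behavior that would eventually exhaust file handles.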

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Sozonoff Serge
Hi Jan, So if I read your SO answer correctly, and looking at the GitHub link you provided, we are talking about ~100 consumers? Since I am developing locally with a dockerized minimal Kafka broker, it is possible that this is enough to hit the max open files limit. Depending on your definition
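
One way to check whether a process is approaching the open-files limit is to compare its open file descriptor count with the configured limit. The following is a minimal sketch, not something posted in the thread: it assumes a Linux host with `/proc` available, and the helper names `open_file_limit` and `count_open_fds` are hypothetical.

```python
import os
import resource
from typing import Tuple


def open_file_limit() -> Tuple[int, int]:
    """Return the (soft, hard) open-file-descriptor limit of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid: int) -> int:
    """Count the open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = open_file_limit()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # Assumption: replace os.getpid() with the PID of the pipeline's JVM
    # (or of the broker) to inspect that process instead of this script.
    print(f"open descriptors: {count_open_fds(os.getpid())}")
```

Sampling `count_open_fds` for the pipeline's PID a few times while the pipeline is idle would show whether the descriptor count keeps climbing, which would suggest that connections are not being released, or stays roughly flat.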

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-24 Thread Jan Lukavský
Hi Serge, I posted an answer to the SO question; I hope that helps. One question: frequent creation of consumers should be expected with DirectRunner, but there should be only a limited number of them at a time. Do you see many of them present simultaneously? Or are they correctly closed and rel