Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Sozonoff Serge
Hi, For all these connections, are they existing at the same time (connection is created without closing)? Yes. Serge On 24 May 2021 at 20:41:06, Boyuan Zhang (boyu...@google.com) wrote: One more question : ) For all these connections, are they existing at the same time (connection is create

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Alexey Romanenko
> On 24 May 2021, at 10:43, Sozonoff Serge wrote: > > OK thanks. Just to clarify, in my case the message throughput is zero when I > start the Beam pipeline up and it will still crash once all file handles are > consumed even if I don't send a single message to the Kafka topic. This sounds li

Re: File processing triggered from external source

2021-05-25 Thread Alexey Romanenko
You don’t need to use a windowing strategy or aggregation triggers for a pipeline with a bounded source to perform GbK-like transforms, but since you started using an unbounded source, your PCollections became unbounded and you need to do that. Otherwise, it’s unknown at which point of time your G

Re: JdbcIO parallel read on spark

2021-05-25 Thread Alexey Romanenko
Hi, Did you check the Spark DAG to see whether it forks branches after the "Generate Queries" transform? -- Alexey > On 24 May 2021, at 20:32, Thomas Fredriksen(External) > wrote: > > Hi there, > > We are struggling to get the JdbcIO-connector to read a large table on spark. > > In short - we wish to

Re: Oracle JDBC driver with expansion service

2021-05-25 Thread Alexey Romanenko
Hi, This looks more like a user-related question, so let's continue this conversation on user@ -- Alexey > On 25 May 2021, at 15:32, Rafael Ribeiro wrote: > > Hi, > > I'm trying to read and write on Oracle database using the JDBC driver of Beam > > but I'm having some problems, speci

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Boyuan Zhang
If you are using neither `withMaxNumRecords` nor `withMaxReadTime`, Beam 2.28.0 and 2.29.0 wrap KafkaIO.read() with UnboundedSourceAsSDFWrapperFn and execute it as a Splittable DoFn via OutputAndTimeBoundedSplittableProcessElementInvoker. The KafkaConsumer is created every time KafkaUnboundedReade

Re: File processing triggered from external source

2021-05-25 Thread Sozonoff Serge
Hi, Thanks for the clarification. What is an issue with applying windowing/triggering strategy for your case? The problem was actually not the trigger but the whole approach I took. I guess fundamentally the whole issue for me boils down to the fact that with bounded pipelines we have quite a fe

Re: File processing triggered from external source

2021-05-25 Thread Vincent Marquez
On Tue, May 25, 2021 at 11:14 AM Sozonoff Serge wrote: > Hi, > > Thanks for the clarification. > > What is an issue with applying windowing/triggering strategy for your case? > > > The problem was actually not the trigger but the whole approach I took. > > > I guess fundamentally the whole issue

Re: KafkaIO with DirectRunner is creating tons of connections to Kafka Brokers

2021-05-25 Thread Sozonoff Serge
Hi, OutputAndTimeBoundedSplittableProcessElementInvoker, the KafkaUnboundedReader.start() is called for every checkpoint and DirectRunner produces checkpoint every 100 records or every 1  This is not what I am seeing. The following few extracted log messages are generated in the KafkaUnbounded

Re: JdbcIO parallel read on spark

2021-05-25 Thread Thomas Fredriksen(External)
There is no forking after the "Generate Queries" transform. We noticed that the "Generate Queries" transform is in a different stage than the reading itself. This is likely due to the Reparallelize transform, and we also see this with JdbcIO.readAll. After reading up on Splittable DoFns, we deci