sync between two PCollections with different windows

2022-07-20 Thread Sigalit Eliazov
Hi all, I need some advice regarding window usage. I am sure this is a very basic question; any guidance will be much appreciated. I am using: - an unbounded pcollectionA with a FixedWindow of 1 minute, from which I eventually create state and use it as a side input. PCollection> pcollectionA = x.

Can we use KafkaIO SplittableDoFn ?

2022-07-20 Thread Jean Wisser
Hi all, I just spotted https://github.com/apache/beam/pull/22261, which completely disables Kafka SDF by default. I'm currently trying to make it work because the unbounded source version has performance limitations. What is your advice? What is currently the most efficient/working way of readi

Re: Can we use KafkaIO SplittableDoFn ?

2022-07-20 Thread John Casey via user
This is being temporarily disabled due to some substantial issues we've discovered with the SDF implementation. Ideally this is temporary, and will be resolved quickly. If the SDF issues do not affect you, you can avoid upgrading Beam versions temporarily. On Wed, Jul 20, 2022 at 9:00 AM Jean Wiss

Re: RedisIO Apache Beam JAVA Connector

2022-07-20 Thread Alexey Romanenko
I believe that Read and Write parts of RedisIO are well independent and I’m not aware of any issues with Write. — Alexey > On 20 Jul 2022, at 00:52, Shivam Singhal wrote: > > Hi Alexey! > > Thanks for replying. > I think we will only use RedisIO to write to redis. From your reply & github >

Re: [Dataflow][Python] Guidance on HTTP ingestion on Dataflow

2022-07-20 Thread Shree Tanna
Thank you! I will try this out. One more question on this: is it considered an anti-pattern to do HTTP ingestion on GCP Dataflow, for the reasons I mentioned in my original message? I ask because I am getting that indication from some of my co-workers and also from Google Cloud support. Not sure i

Re: [Dataflow][Python] Guidance on HTTP ingestion on Dataflow

2022-07-20 Thread Chamikara Jayalath via user
I don't think it's an antipattern per se. You can implement arbitrary operations in a DoFn or an SDF to read data. But if a single resource ID maps to a large amount of data, Beam runners (including Dataflow) will not be able to parallelize reading, hence your solution may have suboptimal performance

Re: [Dataflow][Python] Guidance on HTTP ingestion on Dataflow

2022-07-20 Thread Chamikara Jayalath via user
On Wed, Jul 20, 2022 at 12:57 PM Chamikara Jayalath wrote: > I don't think it's an antipattern per se. You can implement arbitrary > operations in a DoFn or an SDF to read data. > > But if a single resource ID maps to a large amount of data, Beam runners > (including Dataflow) will be able to par