[Dataflow][Python] Guidance on HTTP ingestion on Dataflow

2022-07-19 Thread Shree Tanna
Hi all, I'm planning to use Apache Beam for the extract and load parts of the ETL pipeline and run the jobs on Dataflow. I will have to do REST API ingestion on our platform. I can opt to make sync API calls from a DoFn, but with that the pipeline will stall while REST requests are made over the network. Is

Re: [Dataflow][Python] Guidance on HTTP ingestion on Dataflow

2022-07-19 Thread Damian Akpan
Provided you have all the resource ids ahead of fetching, Beam will spread the fetches across its workers. It will still fetch synchronously, but within each worker.

Re: [Dataflow][Python] Guidance on HTTP ingestion on Dataflow

2022-07-19 Thread Luke Cwik via user
Even if you don't have the resource ids ahead of time, you can have a pipeline like: Impulse -> ParDo(GenerateResourceIds) -> Reshuffle -> ParDo(ReadResourceIds) -> ... You could also compose these as splittable DoFns [1, 2, 3]: ParDo(SplittableGenerateResourceIds) -> ParDo(SplittableReadResourceI

Re: RedisIO Apache Beam JAVA Connector

2022-07-19 Thread Shivam Singhal
Hi Alexey! Thanks for replying. I think we will only use RedisIO to write to Redis. From your reply and GitHub issue 21825, it seems the SDF (splittable DoFn) implementation is causing some issues when reading from Redis. Do you know of any issues with Write? If I get a chance to test the reading in my staging environment, I will :) Th