Hi all,
I'm planning to use Apache Beam for the extract and load parts of our ETL
pipeline and run the jobs on Dataflow. I'll have to do REST API
ingestion on our platform. I can opt to make synchronous API calls from a
DoFn, but then the pipeline will stall while REST requests are made over the
network.
Is
Provided you have all the resource IDs ahead of time, Beam will spread
the fetches across its workers. Each worker will still fetch synchronously,
but the workers run in parallel.
On Tue, Jul 19, 2022 at 5:40 PM Shree Tanna wrote:
Even if you don't have the resource ids ahead of time, you can have a
pipeline like:
Impulse -> ParDo(GenerateResourceIds) -> Reshuffle ->
ParDo(ReadResourceIds) -> ...
You could also compose these as splittable DoFns [1, 2, 3]:
ParDo(SplittableGenerateResourceIds) -> ParDo(SplittableReadResourceIds) -> ...
Hi Alexey!
Thanks for replying.
I think we will only use RedisIO to write to Redis. From your reply and
GitHub issue 21825, it seems the SDF is causing some issues when reading
from Redis.
Do you know of any issues with Write?
If I get a chance to test the reading in my staging environment, I will :)
Thanks