Hi Folks,
I know this is not the optimal way to use Beam :-) But assume I only use
the Spark runner.
I have a Spark library (very complex) that emits a Spark DataFrame (or
RDD).
I also have an existing, complex Beam pipeline that can do post-processing
on the data inside the DataFrame.
To add a bit more to what Robert suggested: right, in general we can’t read
a Spark RDD directly with Beam (the Spark runner uses RDDs under the hood, but
that’s a different story), but you can write the results to any storage, in any
data format that Beam supports, and then read them back with a corresponding Beam
Thanks Robert and Brian.
As for "writing the RDD somewhere": I can certainly write a bunch of files to
disk/S3. Any other options?
-Yushu
> On 23 May 2022, at 20:40, Brian Hulette wrote:
>
> Yeah I'm not sure of any simple way to do this. I wonder if it's worth
> considering building some Spark runner-specific feature around this, or at
> least packaging up Robert's proposed solution?
I’m not sure that a runner-specific featu