Thanks Robert and Brian. As for "writing the RDD somewhere": I can certainly write a bunch of files to disk or S3. Are there any other options? -Yushu
On Mon, May 23, 2022 at 11:40 AM Brian Hulette <bhule...@google.com> wrote:

> Yeah I'm not sure of any simple way to do this. I wonder if it's worth
> considering building some Spark runner-specific feature around this, or at
> least packaging up Robert's proposed solution?
>
> There could be other interesting integrations in this space too, e.g.
> using Spark RDDs as a cache for Interactive Beam.
>
> Brian
>
> On Mon, May 23, 2022 at 11:35 AM Robert Bradshaw <rober...@google.com> wrote:
>
>> The easiest way to do this would be to write the RDD somewhere then
>> read it from Beam.
>>
>> On Mon, May 23, 2022 at 9:39 AM Yushu Yao <yao.yu...@gmail.com> wrote:
>> >
>> > Hi Folks,
>> >
>> > I know this is not the optimal way to use beam :-) But assume I only
>> > use the spark runner.
>> >
>> > I have a spark library (very complex) that emits a spark dataframe (or
>> > RDD).
>> > I also have an existing complex beam pipeline that can do post
>> > processing on the data inside the dataframe.
>> >
>> > However, the beam part needs a pcollection to start with. The question
>> > is, how can I convert a spark RDD into a pcollection?
>> >
>> > Thanks
>> > -Yushu
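
For the archives, Robert's suggested handoff (write the RDD somewhere, then read it from Beam) would, with the real libraries, be roughly `rdd.saveAsTextFile(path)` on the Spark side and `beam.io.ReadFromText(path + "/*")` on the Beam side. The sketch below is a stdlib-only stand-in for that file round-trip, so the paths and the two-partition split are illustrative assumptions, not Spark behavior:

```python
# Hand data from Spark to Beam through files: the Spark side writes
# part files (as rdd.saveAsTextFile does), the Beam side reads them
# back by glob pattern (as beam.io.ReadFromText does).
import glob
import os
import tempfile

def spark_style_save(records, out_dir):
    """Write records as numbered part files, mimicking RDD.saveAsTextFile."""
    os.makedirs(out_dir, exist_ok=True)
    # Pretend the RDD had two partitions.
    half = len(records) // 2
    for i, chunk in enumerate([records[:half], records[half:]]):
        with open(os.path.join(out_dir, f"part-{i:05d}"), "w") as f:
            f.writelines(line + "\n" for line in chunk)

def beam_style_read(pattern):
    """Read every line from files matching a glob, mimicking ReadFromText."""
    lines = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            lines.extend(line.rstrip("\n") for line in f)
    return lines

with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "handoff")
    spark_style_save(["a", "b", "c", "d"], out)
    result = beam_style_read(os.path.join(out, "part-*"))
    print(result)  # ['a', 'b', 'c', 'd']
```

The point is that the handoff format is just line-oriented files behind a shared path (local disk or S3), so the two systems never need to share an in-memory representation.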