Re: Pointers on Contributing to Structured Streaming Spark Runner

Xinyu Liu Fri, 13 Sep 2019 12:16:44 -0700

Hi, Etienne,

The slides are very informative! Thanks for sharing the details about how
the Beam APIs are mapped into Spark Structured Streaming. We (LinkedIn) are
also interested in trying the new SparkRunner to run Beam pipelines in
batch, and in contributing to it too. From my understanding, it seems the
functionality on the batch side is mostly complete and covers quite a large
percentage of the tests (with a few missing pieces, like state and timers
in ParDo, and SDF). If so, is it possible to merge the new runner into
master sooner, so it's much easier for us to pull it in (we have an
internal fork) and contribute back?


Also curious about the schema part of the runner. It seems we can leverage
the schema-aware work in PCollection and translate from Beam schema to
Spark, so it can be optimized in the planner layer. It would be great to
hear about your plans on that.

Congrats on this great work!
Thanks,
Xinyu

On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <ruw...@google.com> wrote:

> Hello Etienne,
>
> Your slides mentioned that streaming mode development is blocked because
> Spark lacks support for multiple aggregations in its streaming mode, but
> that the design is ongoing. Do you have a link or something else to their
> design discussion/doc?
>
>
> -Rui
>
> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <echauc...@apache.org>
> wrote:
>
>> Hi Rahul,
>> Sure, and great! Thanks for proposing!
>> If you want details, here is the presentation I gave 30 minutes ago at
>> ApacheCon. You will find the video on YouTube shortly, but in the
>> meantime, here are my presentation slides.
>>
>> And here is the structured streaming branch. I'll be happy to review your
>> PRs, thanks!
>>
>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>>
>> Best
>> Etienne
>>
>> On Wednesday, September 11, 2019 at 16:37 +0530, rahul patwari wrote:
>>
>> Hi Etienne,
>>
>> I came to know about the work going on in Structured Streaming Spark
>> Runner from Apache Beam Wiki - Works in Progress.
>> I have contributed to BeamSql earlier. And I am working on supporting
>> PCollectionView in BeamSql.
>>
>> I would love to understand the Runner's side of Apache Beam and
>> contribute to the Structured Streaming Spark Runner.
>>
>> Can you please point me in the right direction?
>>
>> Thanks,
>> Rahul
>>
>>
