Re: Pointers on Contributing to Structured Streaming Spark Runner

Alexey Romanenko Tue, 24 Sep 2019 06:07:26 -0700
I don’t see any updates on my calendar.  Does it work for others?

> On 19 Sep 2019, at 17:16, Ismaël Mejía <ieme...@gmail.com> wrote:
> 
> 25/09 looks ok. I just updated the meeting invitation to the new
> date. I will prepare a mini agenda in the shared minutes document in the
> meantime.
> I cannot see the old invitees; can someone please confirm that they see
> the updated date?
> Thanks,
> Ismaël
> 
> On Thu, Sep 19, 2019 at 2:13 PM Etienne Chauchot <echauc...@apache.org> wrote:
>> 
>> Hi Rahul and Xinyu,
>> I just added you to the list of guests in the meeting. Time is 5pm GMT +2.
>> That being said, for some reason the last scheduled meeting was 08/28. Ismael
>> initially created the meeting, so I do not have the rights to add a new date.
>> Ismael, can you add a date? I suggest 09/25. WDYT?
>> 
>> Best
>> Etienne
>> 
>> On Thursday, 19 September 2019 at 00:49 +0530, rahul patwari wrote:
>> 
>> Hi,
>> 
>> I would love to join the call.
>> Can you also share the meeting invitation with me?
>> 
>> Thanks,
>> Rahul
>> 
>> On Wed 18 Sep, 2019, 11:48 PM Xinyu Liu, <xinyuliu...@gmail.com> wrote:
>> 
>> Alexey and Etienne: I'm very happy to join the sync-up meeting. Please 
>> forward the meeting info to me. I am based in California, US and hopefully 
>> the time will work :).
>> 
>> Thanks,
>> Xinyu
>> 
>> On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot <echauc...@apache.org> 
>> wrote:
>> 
>> Hi Xinyu,
>> 
>> Thanks for offering help ! My comments are inline:
>> 
>> On Friday, 13 September 2019 at 12:16 -0700, Xinyu Liu wrote:
>> 
>> Hi, Etienne,
>> 
>> The slides are very informative! Thanks for sharing the details about how
>> the Beam APIs are mapped onto Spark Structured Streaming.
>> 
>> 
>> Thanks !
>> 
>> We (LinkedIn) are also interested in trying the new SparkRunner to run Beam
>> pipelines in batch, and in contributing to it too. From my understanding, the
>> functionality on the batch side seems mostly complete and covers quite a large
>> percentage of the tests (with a few missing pieces such as state and timers in
>> ParDo and SDF).
>> 
>> 
>> Correct, it passes 89% of the tests, but more than SDF, state and timers is
>> missing: there is also ongoing encoders work that I would like to
>> commit/push before merging.
>> 
>> If so, is it possible to merge the new runner sooner into master so it's 
>> much easier for us to pull it in (we have an internal fork) and contribute 
>> back?
>> 
>> 
>> Sure, see my other mail on this thread. As Alexey mentioned, please join the 
>> sync meeting we have, the more the merrier !
>> 
>> 
>> Also curious about the schema part in the runner. It seems we can leverage the
>> schema-aware work in PCollection and translate from Beam schema to Spark, so
>> it can be optimized in the planner layer. It would be great to hear your
>> plans on that.
>> 
>> 
>> Well, it is not designed yet but, if you remember my talk, we need to store
>> Beam windowing information with the data itself, so we end up with a
>> Dataset<WindowedValue>. One lead that was discussed is to store it as a
>> Spark schema such as this:
>> 
>> 1. field1: binary data for Beam windowing information (it cannot be mapped to
>> fields because Beam windowing info is a complex structure)
>> 
>> 2. the fields of the data as defined in the Beam schema, if there is one
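
A minimal sketch of the row layout described above, not the runner's actual code (the design is explicitly not done yet). It assumes the windowing information is serialized into a single opaque binary column placed first, followed by the data columns in Beam-schema order. The names `Field`, `WINDOWING_FIELD`, `build_row_schema`, `to_row` and `from_row` are hypothetical, and `pickle` only stands in for whatever Beam coder would really encode the windowing information.

```python
import pickle
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Field:
    """One column of the (hypothetical) row schema."""
    name: str
    type_name: str


# Assumption: the windowing info lives in one binary column, because it is
# too complex a structure to be flattened into ordinary typed columns.
WINDOWING_FIELD = Field("windowing_info", "binary")


def build_row_schema(data_fields: Sequence[Field]) -> List[Field]:
    """Put the binary windowing column first, then the data columns."""
    return [WINDOWING_FIELD, *data_fields]


def to_row(windowing_info: Any, values: Sequence[Any]) -> Tuple[Any, ...]:
    """Serialize the windowing info to bytes and prepend it to the values."""
    return (pickle.dumps(windowing_info), *values)


def from_row(row: Tuple[Any, ...]) -> Tuple[Any, Tuple[Any, ...]]:
    """Recover the windowing info and the plain data values from a row."""
    return pickle.loads(row[0]), tuple(row[1:])


# Illustrative data only: a two-column schema and one windowed element.
data_fields = [Field("user", "string"), Field("clicks", "long")]
schema = build_row_schema(data_fields)
windowing = {"timestamp": 1568800000000, "windows": [(0, 60000)], "pane": "ON_TIME"}
row = to_row(windowing, ("alice", 3))
restored_windowing, restored_values = from_row(row)
```

With this layout the data columns stay visible as typed fields, which is what would let a planner reason about them, while the windowing column is opaque to it.
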
>> 
>> 
>> Congrats on this great work!
>> 
>> Thanks !
>> 
>> Best,
>> 
>> Etienne
>> 
>> Thanks,
>> Xinyu
>> 
>> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <ruw...@google.com> wrote:
>> 
>> Hello Etienne,
>> 
>> Your slides mentioned that streaming mode development is blocked because
>> Spark lacks support for multiple aggregations in its streaming mode, but that
>> design is ongoing. Do you have a link or something else pointing to their design
>> discussion/doc?
>> 
>> 
>> -Rui
>> 
>> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <echauc...@apache.org> 
>> wrote:
>> 
>> Hi Rahul,
>> Sure, and great! Thanks for proposing!
>> If you want details, here is the presentation I gave 30 mins ago at
>> ApacheCon. You will find the video on YouTube shortly but in the meantime,
>> here are my presentation slides.
>> 
>> And here is the structured streaming branch. I'll be happy to review your 
>> PRs, thanks !
>> 
>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>> 
>> Best
>> Etienne
>> 
>> On Wednesday, 11 September 2019 at 16:37 +0530, rahul patwari wrote:
>> 
>> Hi Etienne,
>> 
>> I came to know about the work going on in the Structured Streaming Spark Runner
>> from the Apache Beam Wiki - Works in Progress.
>> I have contributed to BeamSql earlier, and I am working on supporting
>> PCollectionView in BeamSql.
>> 
>> I would love to understand the Runner's side of Apache Beam and contribute 
>> to the Structured Streaming Spark Runner.
>> 
>> Can you please point me in the right direction?
>> 
>> Thanks,
>> Rahul