Hi Ismael,
Can you also add me to this meeting? I would also like to contribute.
Regards,
Vishwas

On 2019/09/19 12:13:40, Etienne Chauchot <echauc...@apache.org> wrote:
> Hi Rahul and Xinyu,
> I just added you to the list of guests in the meeting. Time is 5pm GMT+2.
> That being said, for some reason the last meeting scheduled was 08/28.
> Ismael initially created the meeting and I do not have the rights to add a
> new date. Ismael, can you add a date? I suggest 09/25. WDYT?
> Best,
> Etienne
>
> On Thursday, 19 September 2019 at 00:49 +0530, rahul patwari wrote:
> > Hi,
> > I would love to join the call.
> > Can you also share the meeting invitation with me?
> >
> > Thanks,
> > Rahul
> >
> > On Wed 18 Sep, 2019, 11:48 PM Xinyu Liu, <xinyuliu...@gmail.com> wrote:
> > > Alexey and Etienne: I'm very happy to join the sync-up meeting. Please
> > > forward the meeting info to me. I am based in California, US and
> > > hopefully the time will work :).
> > > Thanks,
> > > Xinyu
> > >
> > > On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot <echauc...@apache.org> wrote:
> > > > Hi Xinyu,
> > > > Thanks for offering help! My comments are inline:
> > > >
> > > > On Friday, 13 September 2019 at 12:16 -0700, Xinyu Liu wrote:
> > > > > Hi, Etienne,
> > > > > The slides are very informative! Thanks for sharing the details about
> > > > > how the Beam API is mapped into Spark Structured Streaming.
> > > >
> > > > Thanks!
> > > >
> > > > > We (LinkedIn) are also interested in trying the new SparkRunner to run
> > > > > Beam pipelines in batch, and in contributing to it too. From my
> > > > > understanding, the functionality on the batch side seems mostly
> > > > > complete and covers quite a large percentage of the tests (with a few
> > > > > missing pieces like state and timer in ParDo and SDF).
> > > >
> > > > Correct, it passes 89% of the tests, but there is more than SDF, state
> > > > and timer missing: there is also ongoing encoders work that I would
> > > > like to commit/push before merging.
> > > > > If so, is it possible to merge the new runner sooner into master so
> > > > > it's much easier for us to pull it in (we have an internal fork) and
> > > > > contribute back?
> > > >
> > > > Sure, see my other mail on this thread. As Alexey mentioned, please
> > > > join the sync meeting we have, the more the merrier!
> > > >
> > > > > Also curious about the schema part in the runner. It seems we can
> > > > > leverage the schema-aware work in PCollection and translate from Beam
> > > > > schema to Spark, so it can be optimized in the planner layer. It would
> > > > > be great to hear back about your plans on that.
> > > >
> > > > Well, it is not designed yet but, if you remember my talk, we need to
> > > > store Beam windowing information with the data itself, so we end up
> > > > with a dataset<WindowedValue>. One lead that was discussed is to store
> > > > it as a Spark schema such as this:
> > > > 1. field1: binary data for Beam windowing information (cannot be mapped
> > > >    to fields because Beam windowing info is a complex structure)
> > > > 2. fields of data as defined in the Beam schema, if there is one
> > > >
> > > > > Congrats on this great work!
> > > >
> > > > Thanks!
> > > > Best,
> > > > Etienne
> > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <ruw...@google.com> wrote:
> > > > > > Hello Etienne,
> > > > > > Your slides mentioned that streaming mode development is blocked
> > > > > > because Spark lacks support for multiple aggregations in its
> > > > > > streaming mode, but design is ongoing. Do you have a link or
> > > > > > something else to their design discussion/doc?
> > > > > >
> > > > > > -Rui
> > > > > >
> > > > > > On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <echauc...@apache.org> wrote:
> > > > > > > Hi Rahul,
> > > > > > > Sure, and great! Thanks for proposing!
> > > > > > > If you want details, here is the presentation I did 30 mins ago at
> > > > > > > the ApacheCon.
> > > > > > > You will find the video on YouTube shortly, but in the meantime,
> > > > > > > here are my presentation slides.
> > > > > > > And here is the structured streaming branch. I'll be happy to
> > > > > > > review your PRs, thanks!
> > > > > > > https://github.com/apache/beam/tree/spark-runner_structured-streaming
> > > > > > > Best,
> > > > > > > Etienne
> > > > > > >
> > > > > > > On Wednesday, 11 September 2019 at 16:37 +0530, rahul patwari wrote:
> > > > > > > > Hi Etienne,
> > > > > > > >
> > > > > > > > I came to know about the work going on in the Structured Streaming
> > > > > > > > Spark Runner from the Apache Beam Wiki - Works in Progress.
> > > > > > > > I have contributed to BeamSql earlier, and I am working on
> > > > > > > > supporting PCollectionView in BeamSql.
> > > > > > > >
> > > > > > > > I would love to understand the Runner's side of Apache Beam and
> > > > > > > > contribute to the Structured Streaming Spark Runner.
> > > > > > > >
> > > > > > > > Can you please point me in the right direction?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Rahul