Hi Shaoxuan,

Yes, I've seen your message, and I am not saying the proposal already contradicts what we discussed. I agree that as long as we only define the DAG/pipeline/logical plan, it is a reasonable thing to do. No doubts about that. I have a feeling, though, that the document mentions at some points things that might be in Beam's area of responsibility, e.g. convenience methods like fromElements and Table#head (in the comments). Those methods require bidirectional communication between Java <> Python, not only one-way communication Python -> (logical representation) -> Java. Also, UDF support is, as far as I understand, something for which we might be able to leverage Beam (but at the same time I might be completely wrong).

The only thing I wanted to outline is that I would welcome at least some comparison of the proposed approach with Beam's multi-language support. A discussion of when we can leverage Beam, and when and why we should come up with our own solution, would also be beneficial, I think. Right now the design document does not mention Beam at all.

Sorry if I sounded too harsh; my intention isn't/wasn't to discard this effort.

Best,

Dawid

On 04/04/2019 09:41, Shaoxuan Wang wrote:
> Dawid,
> This proposal does not contradict what we have discussed.
> Please check my reply in
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> on 2019/02/21.
> "Beam Python API and Flink Python TableAPI describe the DAG/pipeline in
> different manners. We got a chance to communicate with Tyler Akidau (from
> Beam) offline, and explained why the Flink tableAPI needs a specific design
> for python, rather than purely leverage Beam portability layer.
>
> In our proposal, most of the Python code is just a DAG/pipeline builder for
> tableAPI. The majority of operators run purely in Java, while only python
> UDFs executed in Python environment during the runtime. This design does
> not affect the development and adoption of Beam language portability layer
> with Flink runner. Flink and Beam community will still collaborate, jointly
> develop and optimize on the JVM / Non-JVM (python,GO) bridge (data and
> control connections between different processes) to ensure the robustness
> and performance."
>
> When we talk about multi-language support, it involves two components: API
> and language, and they are orthogonal. TableAPI is a descriptive API, and
> will be a superset of SQL. I do not see that Beam has such a layer or any
> plan to cover the tableAPI semantics. We already have two languages
> supported for tableAPI (java/scala). I do not see a reason why we should
> not add support for another language (python) to tableAPI.
>
> Regards,
> Shaoxuan
>
> On Thu, Apr 4, 2019 at 3:13 PM Dawid Wysakowicz <dwysakow...@apache.org>
> wrote:
>
>> Hi all,
>>
>> Thank you very much Jincheng for the very thorough proposal. I was
>> following the discussion very briefly, but I have an impression that the
>> consensus in the previous discussion[1] was that we do not want to have
>> an independent, Flink-specific multi-language support but we want to
>> collaborate on that matter with the Beam community. I think this is also
>> the concern Thomas raised[2].
>>
>> Let's make sure we do not contradict what was said in [1]. Could you
>> elaborate more on how it fits into the Beam-Flink multi-language support?
>>
>> Best,
>>
>> Dawid
>>
>> [1]
>> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
>>
>> [2]
>> https://lists.apache.org/thread.html/da6cd815fa601d81be9f706aaa4d2c595db0b52c40a9040238b830c7@%3Cdev.flink.apache.org%3E
>>
>> On 04/04/2019 08:31, jincheng sun wrote:
>>> Hi Shuyi,
>>>
>>> Glad to see your feedback and port more requirements about
>>> multi-language!
>>> I think the Flink community is very much looking forward to more language
>>> support; of course, Golang should be on the future support list.
>>> Since the topic of supporting Python on Flink has been researched and
>>> discussed in the community for a long time, I want to support Python in
>>> the Table API as the first stage; support for other languages should be
>>> planned after that. But I have not yet thought in detail about how/when
>>> to support Golang. You are very welcome to share more ideas on how to
>>> support Golang if you have more thoughts. :)
>>>
>>> Regarding UDF, we do have some ideas and design attempts. The related
>>> attempts to show the performance of python UDF are not optimistic. And
>>> there are also some problems with Python environment management that
>>> should be considered.
>>> After we have done more investigations and experiments, I will share
>>> the discussion with you in time. Perhaps after the first stage (Python
>>> TableAPI support), we will then discuss UDF support in detail.
>>>
>>> I think the support of the DataStream API should be considered after
>>> supporting UDFs because DataStream is mostly supported by various
>>> functions.
>>>
>>> We plan to complete the first phase before the release of Flink-1.9, and
>>> start the UDF support after 1.9. Of course, I am very glad to hear that
>>> you want to contribute to the Flink multi-language support. I believe
>>> nothing is impossible: if more people are interested in the Python Table
>>> API with UDF support and more people want to contribute to the community,
>>> UDF support may be there when flink-1.9 is released. :)
>>>
>>> Best,
>>> Jincheng
>>>
>>> On Thu, Apr 4, 2019 at 3:35 AM Shuyi Chen <suez1...@gmail.com> wrote:
>>>
>>>> Thanks a lot for driving the FLIP, jincheng. The approach looks
>>>> good. Adding multi-lang support sounds like a promising direction to
>>>> expand the footprint of Flink. Do we have a plan for adding Golang
>>>> support? As many backend engineers nowadays are familiar with Go, but
>>>> probably not Java as much, adding Golang support would significantly
>>>> reduce their friction in using Flink. Also, do we have a design for
>>>> multi-lang UDF support, and what's the timeline for adding DataStream
>>>> API support? We would like to help and contribute as well, as we have
>>>> a similar need internally at our company.
>>>> Thanks a lot.
>>>>
>>>> Shuyi
>>>>
>>>> On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <sunjincheng...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>> As Xianda brought up in the previous email, there are a large number of
>>>>> data analysis users who want Flink to support Python. At the Flink API
>>>>> level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, and the Table API
>>>>> will become the first-class citizen.
>>>>> Table API is declarative and can be automatically optimized, which is
>>>>> mentioned in the Flink mid-term roadmap by Stephan. So we are first
>>>>> considering supporting Python at the Table level to cater to the
>>>>> current large number of analytics users. To further promote Python
>>>>> support at the Flink Table level, Dian, Wei and I discussed offline a
>>>>> bit and came up with an initial feature outline as follows:
>>>>>
>>>>> - Python TableAPI Interface
>>>>>   Introduce a set of Python Table API interfaces, including interface
>>>>>   definitions such as Table, TableEnvironment, TableConfig, etc.
>>>>>
>>>>> - Implementation Architecture
>>>>>   We will offer two alternative architecture options, one for pure
>>>>>   Python language support and one for an extended multi-language design.
>>>>>
>>>>> - Job Submission
>>>>>   Provide a way to submit (local/remote) Python Table API jobs.
>>>>>
>>>>> - Python Shell
>>>>>   Python Shell provides an interactive way for users to write and
>>>>>   execute Flink Python Table API jobs.
>>>>>
>>>>> The design document for FLIP-38 can be found here:
>>>>>
>>>>> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
>>>>>
>>>>> I am looking forward to your comments and feedback.
>>>>>
>>>>> Best,
>>>>> Jincheng