Hi Dawid,

Thanks for your feedback!

Yes, you are right that those methods need two-way communication, and that is not a problem for our proposal. Both of the proposed solutions already support two-way communication between Python and Java, so interfaces such as from_collection/from_elements can be handled well.

We do not mention Beam because Beam has no layer that covers the Table API semantics, while the current proposal is about the Table API interface and its implementation. So the design of the Python Table API does not necessarily depend on Beam. Furthermore, Beam uses protobuf to define its data structures and solve the multi-language problem. This is very similar to the proposed Approach 2, although the implementation details differ; please see the comments in the document. :)

Regards,
Jincheng

On Thu, Apr 4, 2019 at 4:28 PM Dawid Wysakowicz <dwysakow...@apache.org> wrote:

> Hi Shaoxuan,
>
> Yes, I've seen your message, and I am not saying the proposal already
> contradicts it. I agree that as long as we just define a
> DAG/pipeline/logical plan, it is a reasonable thing to do. No doubts about
> that. I have a feeling, though, that at some points it mentions things
> that might fall within Beam's area of responsibility, e.g. convenience
> methods like fromElements or Table#head (in the comments)... Those methods
> require bidirectional communication between Java <> Python, not only
> one-way communication Python -> (logical representation) -> Java. Also,
> as far as I understand, UDF support is something for which we might be
> able to leverage Beam (but I might be completely wrong).
>
> The only thing I wanted to point out is that I would welcome at least
> some comparison of the proposed approach with Beam's multi-language
> support. I think a discussion of when we could leverage Beam, when we
> should come up with our own solution, and why, would also be beneficial.
> Right now the design document does not mention Beam at all.
>
> Sorry if I sounded too harsh; my intention isn't/wasn't to dismiss this
> effort.
>
> Best,
>
> Dawid
>
> On 04/04/2019 09:41, Shaoxuan Wang wrote:
> > Dawid,
> > This proposal does not contradict what we have discussed.
> > Please check my reply from 2019/02/21 in
> > https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> > "Beam Python API and Flink Python Table API describe the DAG/pipeline in
> > different manners. We got a chance to communicate with Tyler Akidau (from
> > Beam) offline, and explained why the Flink Table API needs a specific
> > design for Python, rather than purely leveraging the Beam portability
> > layer.
> >
> > In our proposal, most of the Python code is just a DAG/pipeline builder
> > for the Table API. The majority of operators run purely in Java, while
> > only Python UDFs are executed in a Python environment at runtime. This
> > design does not affect the development and adoption of the Beam language
> > portability layer with the Flink runner. The Flink and Beam communities
> > will still collaborate to jointly develop and optimize the JVM / non-JVM
> > (Python, Go) bridge (data and control connections between different
> > processes) to ensure robustness and performance."
> >
> > When we talk about multi-language support, it involves two components:
> > API and language, and they are orthogonal. The Table API is a descriptive
> > API and will be a superset of SQL. I do not see that Beam has a layer, or
> > any plan, to cover the Table API semantics. We already support two
> > languages for the Table API (Java/Scala), and I do not see why we should
> > not add support for another language (Python).
> >
> > Regards,
> > Shaoxuan
> >
> > On Thu, Apr 4, 2019 at 3:13 PM Dawid Wysakowicz <dwysakow...@apache.org>
> > wrote:
> >
> >> Hi all,
> >>
> >> Thank you very much Jincheng for the very thorough proposal.
> >> I was only following the discussion briefly, but I have the impression
> >> that the consensus in the previous discussion [1] was that we do not
> >> want independent, Flink-specific multi-language support, but want to
> >> collaborate with the Beam community on that matter. I think this is
> >> also the concern Thomas raised [2].
> >>
> >> Let's make sure we do not contradict what was said in [1]. Could you
> >> elaborate on how this fits into the Beam-Flink multi-language support?
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> [1]
> >> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> >>
> >> [2]
> >> https://lists.apache.org/thread.html/da6cd815fa601d81be9f706aaa4d2c595db0b52c40a9040238b830c7@%3Cdev.flink.apache.org%3E
> >>
> >> On 04/04/2019 08:31, jincheng sun wrote:
> >>> Hi Shuyi,
> >>>
> >>> Glad to see your feedback and the additional multi-language
> >>> requirements you raised! I think the Flink community is very much
> >>> looking forward to supporting more languages, and Golang should of
> >>> course be on the future support list. Since supporting Python on Flink
> >>> has been researched and discussed in the community for a long time, I
> >>> want to support Python in the Table API as the first stage and plan
> >>> support for other languages afterwards. I have not yet thought through
> >>> the details of how/when to support Golang, so you are very welcome to
> >>> share your ideas on that. :)
> >>>
> >>> Regarding UDFs, we do have some ideas and design attempts. The related
> >>> experiments show that the performance of Python UDFs is not yet
> >>> encouraging, and there are also some problems with Python environment
> >>> management that need to be considered. After more investigation and
> >>> experiments, I will share the discussion with you in time.
> >>> Perhaps after the first stage (Python Table API support), we will
> >>> discuss UDF support in detail.
> >>>
> >>> I think support for the DataStream API should be considered after UDF
> >>> support, because the DataStream API relies mostly on various
> >>> user-defined functions.
> >>>
> >>> We plan to complete the first phase before the release of Flink 1.9
> >>> and start on UDF support after 1.9. Of course, I am very glad to hear
> >>> that you want to contribute to Flink's multi-language support. I
> >>> believe nothing is impossible: if more people are interested in the
> >>> Python Table API with UDF support and want to contribute to the
> >>> community, UDF support may be there when Flink 1.9 is released. :)
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> On Thu, Apr 4, 2019 at 3:35 AM Shuyi Chen <suez1...@gmail.com> wrote:
> >>>
> >>>> Thanks a lot for driving the FLIP, jincheng. The approach looks good.
> >>>> Adding multi-language support sounds like a promising direction to
> >>>> expand the footprint of Flink. Do we have a plan for adding Golang
> >>>> support? Many backend engineers nowadays are familiar with Go but
> >>>> probably not as much with Java, so adding Golang support would
> >>>> significantly reduce their friction in using Flink. Also, do we have a
> >>>> design for multi-language UDF support, and what is the timeline for
> >>>> adding DataStream API support? We would like to help and contribute
> >>>> as well, as we have a similar need internally at our company.
> >>>> Thanks a lot.
> >>>>
> >>>> Shuyi
> >>>>
> >>>> On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <sunjincheng...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi All,
> >>>>> As Xianda brought up in the previous email, there are a large number
> >>>>> of data analysis users who want Flink to support Python. At the Flink
> >>>>> API level, we have the DataStream API, the DataSet API, and Table API
> >>>>> & SQL, and the Table API will become the first-class citizen.
> >>>>> The Table API is declarative and can be automatically optimized, as
> >>>>> Stephan mentioned in the Flink mid-term roadmap. So we are first
> >>>>> considering supporting Python at the Table API level, to cater to the
> >>>>> current large number of analytics users. To further promote Python
> >>>>> support at the Flink Table API level, Dian, Wei and I discussed
> >>>>> offline a bit and came up with an initial feature outline as follows:
> >>>>>
> >>>>> - Python Table API Interface
> >>>>>   Introduce a set of Python Table API interfaces, including interface
> >>>>>   definitions such as Table, TableEnvironment, TableConfig, etc.
> >>>>>
> >>>>> - Implementation Architecture
> >>>>>   We will offer two alternative architecture options: one for pure
> >>>>>   Python language support and one for an extended multi-language
> >>>>>   design.
> >>>>>
> >>>>> - Job Submission
> >>>>>   Provide a way to submit Python Table API jobs (local/remote).
> >>>>>
> >>>>> - Python Shell
> >>>>>   The Python Shell provides an interactive way for users to write and
> >>>>>   execute Flink Python Table API jobs.
> >>>>>
> >>>>> The design document for FLIP-38 can be found here:
> >>>>>
> >>>>> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> >>>>>
> >>>>> I am looking forward to your comments and feedback.
> >>>>>
> >>>>> Best,
> >>>>> Jincheng