Hi Stephan & Jeff,

Thanks a lot for sharing your thoughts!
Regarding the bundled jars: currently only the jars from the Flink binary distribution are packaged in the pyflink package. It may be a good idea to also bundle the other jars such as flink-hadoop-compatibility. We may also need to consider whether to bundle the format jars such as flink-avro, flink-json and flink-csv, and the connector jars such as flink-connector-kafka, etc. If FLINK_HOME is set, the binary distribution specified by FLINK_HOME will be used instead of the bundled one (see the rough sketches below and at the end of this mail).
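To make that behavior concrete, here is a minimal sketch of how the jar resolution could work. The names below (find_flink_lib_dir, build_classpath, the lib/ layout) are only illustrative and not necessarily what flink-python ends up using:

```python
import glob
import os


def find_flink_lib_dir():
    # If the user has set FLINK_HOME, prefer that distribution's jars
    # over the ones bundled inside the pip-installed pyflink package.
    flink_home = os.environ.get("FLINK_HOME")
    if flink_home:
        return os.path.join(flink_home, "lib")
    # Otherwise fall back to the jars shipped inside the package itself
    # (assuming they live in a lib/ directory next to this module).
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), "lib")


def build_classpath():
    # Collect flink-dist plus any optional jars (flink-hadoop-compatibility,
    # flink-avro, flink-json, flink-csv, connectors, ...) for the JVM.
    return os.pathsep.join(glob.glob(os.path.join(find_flink_lib_dir(), "*.jar")))
```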
Regards,
Dian

> On Jul 24, 2019, at 9:47 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> +1 for publishing pyflink to pypi.
>
> Regarding the included jars, I just want to make sure which Flink binary distribution we would ship with pyflink, since we have multiple Flink binary distributions (with or without Hadoop). Personally, I prefer to use the Hadoop-included binary distribution.
>
> And I just want to confirm whether it is possible for users to use a different Flink binary distribution as long as they set the FLINK_HOME environment variable.
>
> Besides that, I hope that there will be bi-directional links between the Flink docs and the PyPI docs.
>
> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:07 AM:
>
>> Hi!
>>
>> Sorry for the late involvement. Here are some thoughts from my side:
>>
>> Definitely +1 to publishing to PyPI, even if it is a binary release. Community growth into other communities is great, and if this is the natural way to reach developers in the Python community, let's do it. This is not about our convenience, but reaching users.
>>
>> I think the way to look at this is that this is a convenience distribution channel, courtesy of the Flink community. It is not an Apache release, and we make this clear in the Readme. Of course, this doesn't mean we don't try to uphold similar standards as for our official releases (like proper license information).
>>
>> Concerning credentials sharing, I would be fine with whatever option. The PMC doesn't own it (it is an initiative by some community members), but the PMC needs to ensure trademark compliance, so slight preference for option #1 (the PMC would have means to correct problems).
>>
>> I believe there is no need to differentiate between Scala versions, because this is merely a convenience thing for pure Python users. Users that mix Python and Scala (and thus depend on specific Scala versions) can still download from Apache or build themselves.
>>
>> Best,
>> Stephan
>>
>> On Thu, Jul 4, 2019 at 9:51 AM jincheng sun <sunjincheng...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Thanks for the feedback @Chesnay Schepler <ches...@apache.org> @Dian!
>>>
>>> I think using `apache-flink` for the project name also makes sense to me, since we should always keep in mind that Flink is owned by Apache. (Beam also uses this pattern, `apache-beam`, for its Python API.)
>>>
>>> Regarding releasing the Python API together with the Java JARs, I think the guiding principle should be the convenience of the user. So, thanks for the explanation @Dian!
>>>
>>> And you're right @Chesnay Schepler <ches...@apache.org>, we can't make a hasty decision and we need more people's opinions!
>>>
>>> So, I would appreciate it if anyone can give us feedback and suggestions!
>>>
>>> Best,
>>> Jincheng
>>>
>>> Chesnay Schepler <ches...@apache.org> wrote on Wed, Jul 3, 2019 at 8:46 PM:
>>>
>>>> So this would not be a source release then, but a full-blown binary release.
>>>>
>>>> Maybe it is just me, but I find it a bit suspect to ship an entire Java application via PyPI, just because there's a Python API for it.
>>>>
>>>> We definitely need input from more people here.
>>>>
>>>> On 03/07/2019 14:09, Dian Fu wrote:
>>>>> Hi Chesnay,
>>>>>
>>>>> Thanks a lot for the suggestions.
>>>>>
>>>>> Regarding "distributing java/scala code to PyPI":
>>>>> The Python Table API is just a wrapper of the Java Table API, and without the java/scala code, two steps are needed to set up an environment to execute a Python Table API program:
>>>>> 1) Install pyflink using "pip install apache-flink"
>>>>> 2) Download the Flink distribution and point FLINK_HOME to it.
>>>>> Besides, users have to make sure that the manually installed Flink is compatible with the pip-installed pyflink.
>>>>>
>>>>> Bundling the java/scala code inside the Python package will eliminate step 2) and make it simpler for users to install pyflink. There was a short discussion <https://issues.apache.org/jira/browse/SPARK-1267> on this in the Spark community and they finally decided to package the java/scala code in the Python package. (BTW, PySpark only bundles the jars for Scala 2.11.)
>>>>>
>>>>>> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>
>>>>>> The existing artifact in the pyflink project was neither released by the Flink project / anyone affiliated with it nor approved by the Flink PMC.
>>>>>>
>>>>>> As such, if we were to use this account I believe we should delete it to not mislead users into thinking that this is in any way an Apache-provided distribution. Since this goes against the user's wishes, I would be in favor of creating a separate account, and giving back control over the pyflink account.
>>>>>>
>>>>>> My take on the raised points:
>>>>>> 1.1) "apache-flink"
>>>>>> 1.2) option 2
>>>>>> 2) Given that we only distribute Python code there should be no reason to differentiate between Scala versions. We should not be distributing any java/scala code and/or modules to PyPI. Currently, I'm a bit confused about this question and wonder what exactly we are trying to publish here.
>>>>>> 3) This should be treated as any other source release; i.e., it needs a LICENSE and NOTICE file, signatures and a PMC vote. My suggestion would be to make this part of our normal release process. There will be _one_ source release on dist.apache.org encompassing everything, and a separate Python-focused source release that we push to PyPI. The LICENSE and NOTICE contained in the Python source release must also be present in the source release of Flink; so basically the Python source release is just the contents of the flink-python module the maven pom.xml, with no other special sauce added during the release process.
>>>>>>
>>>>>> On 02/07/2019 05:42, jincheng sun wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> With the effort of FLIP-38 [1], the Python Table API (without UDF support for now) will be supported in the coming release-1.9. As described in "Build PyFlink" [2], if users want to use the Python Table API, they can manually install it using the command: "cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
>>>>>>>
>>>>>>> This is non-trivial for users, and it would be better if we could follow the Python way and publish PyFlink to PyPI, which is a repository of software for the Python programming language. Then users can use the standard Python package manager "pip" to install PyFlink: "pip install pyflink". So, there are some topics that need to be discussed, as follows:
>>>>>>>
>>>>>>> 1. How to publish PyFlink to PyPI
>>>>>>>
>>>>>>> 1.1 Project name
>>>>>>> We need to decide which PyPI project name to use, for example, apache-flink, pyflink, etc.
>>>>>>>
>>>>>>> Regarding the name "pyflink", it has already been registered by @ueqt and there is already a package '1.0' released under this project, which is based on flink-libraries/flink-python.
>>>>>>>
>>>>>>> @ueqt has kindly agreed to give this project back to the community. And he has requested that the released package '1.0' should not be removed, as it is already used in their company.
>>>>>>>
>>>>>>> So we need to decide whether to use the name 'pyflink'. If yes, we need to figure out how to deal with the package '1.0' under this project.
>>>>>>>
>>>>>>> From my point of view, "pyflink" is the better project name for us, and we can keep the 1.0 release; maybe more people want to use it.
>>>>>>>
>>>>>>> 1.2 PyPI account for release
>>>>>>> We also need to decide which account to use to publish packages to PyPI.
>>>>>>>
>>>>>>> There are two permission levels in PyPI: owner and maintainer:
>>>>>>> 1) The owner can upload releases, and delete files, releases or the entire project.
>>>>>>> 2) The maintainer can also upload releases. However, they cannot delete files, releases, or the project.
>>>>>>>
>>>>>>> So there are two options in my mind:
>>>>>>> 1) Create an account such as 'pyflink' as the owner, share it with all the release managers, and then release managers publish the package to PyPI using this account.
>>>>>>> 2) Create an account such as 'pyflink' as the owner (only the PMC can manage it) and add the release managers' accounts as maintainers of the project. Release managers publish the package to PyPI using their own accounts.
>>>>>>>
>>>>>>> As far as I know, PySpark takes option 1) and Apache Beam takes option 2).
>>>>>>>
>>>>>>> From my point of view, I prefer option 2) as it is safer: it eliminates the risk of accidentally deleting old releases and at the same time keeps a trace of who is operating.
>>>>>>>
>>>>>>> 2. How to handle Scala 2.11 and Scala 2.12
>>>>>>>
>>>>>>> The PyFlink package bundles the jars in the package. As we know, there are two versions of the jars for each module: one for Scala 2.11 and the other for Scala 2.12. So theoretically there will be two PyFlink packages. We need to decide which one to publish to PyPI, or both. If both packages are published to PyPI, we may need two separate projects, such as pyflink_211 and pyflink_212. Maybe more in the future, such as pyflink_213.
>>>>>>>
>>>>>>> (BTW, I think we should bring up a discussion about dropping Scala 2.11 in the Flink 1.10 release, since 2.13 became available in early June.)
>>>>>>>
>>>>>>> From my point of view, for now we can release only the Scala 2.11 version, since Scala 2.11 is our default version in Flink.
>>>>>>>
>>>>>>> 3. Legal questions around publishing to PyPI
>>>>>>>
>>>>>>> As @Chesnay Schepler <ches...@apache.org> pointed out in FLINK-13011 [3], publishing PyFlink to PyPI means that we will publish binaries to a distribution channel not owned by Apache. We need to figure out whether there are legal problems. From my point of view, there are none, as a few Apache projects such as Spark, Beam, etc. have already done it. Frankly speaking, I am not familiar with this topic; any feedback is welcome if somebody is more familiar with it.
>>>>>>>
>>>>>>> Great thanks to @ueqt for being willing to give the PyPI project name `pyflink` to the Apache Flink community!!!
>>>>>>> Great thanks to @Dian for the offline effort!!!
>>>>>>>
>>>>>>> Best,
>>>>>>> Jincheng
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
>>>>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-13011
>
> --
> Best Regards
>
> Jeff Zhang
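P.S. For anyone who wants a concrete picture of what "bundling the jars in the Python package" could look like, here is a minimal setup.py sketch. It is only an illustration, assuming the jars are copied into a lib/ (and opt/) directory inside the pyflink package; the actual flink-python packaging may well differ:

```python
from setuptools import setup, find_packages

setup(
    name="apache-flink",        # or "pyflink", depending on the naming decision
    version="1.9.0",            # illustrative version for the coming 1.9 release
    packages=find_packages(),
    # Ship the jars from the Flink binary distribution inside the package so
    # that "pip install" alone is enough to run Python Table API programs.
    package_data={
        "pyflink": ["lib/*.jar", "opt/*.jar"],
    },
    install_requires=["py4j"],  # the Python <-> JVM bridge used by pyflink
)
```

With something along these lines, "cd flink-python && python3 setup.py sdist" would produce a tarball that already contains the jars, which is essentially what the "pip install apache-flink" experience relies on.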