+1 for publishing pyflink to PyPI.

Regarding including the jar: I just want to make sure which Flink binary
distribution we would ship with pyflink, since we have multiple Flink
binary distributions (with and without Hadoop). Personally, I prefer to
use the Hadoop-included binary distribution.
I would also like to confirm whether it is possible for users to use a
different Flink binary distribution, as long as they set the FLINK_HOME
environment variable. Besides that, I hope there will be bidirectional
links between the Flink documentation and the PyPI documentation.

On Wed, Jul 24, 2019 at 12:07 AM Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> Sorry for the late involvement. Here are some thoughts from my side:
>
> Definitely +1 to publishing to PyPI, even if it is a binary release.
> Community growth into other communities is great, and if this is the
> natural way to reach developers in the Python community, let's do it.
> This is not about our convenience, but about reaching users.
>
> I think the way to look at this is that it is a convenience
> distribution channel, courtesy of the Flink community. It is not an
> Apache release, and we make this clear in the README. Of course, this
> doesn't mean we don't try to uphold similar standards as for our
> official releases (like proper license information).
>
> Concerning credentials sharing, I would be fine with either option.
> The PMC doesn't own it (it is an initiative by some community
> members), but the PMC needs to ensure trademark compliance, so I have
> a slight preference for option #1 (the PMC would have the means to
> correct problems).
>
> I believe there is no need to differentiate between Scala versions,
> because this is merely a convenience for pure Python users. Users that
> mix Python and Scala (and thus depend on specific Scala versions) can
> still download from Apache or build Flink themselves.
>
> Best,
> Stephan
>
> On Thu, Jul 4, 2019 at 9:51 AM jincheng sun
> <sunjincheng...@gmail.com> wrote:
>
> > Hi All,
> >
> > Thanks for the feedback @Chesnay Schepler <ches...@apache.org> @Dian!
> >
> > Using `apache-flink` for the project name also makes sense to me,
> > since we should always keep in mind that Flink is owned by Apache.
> > (Beam also uses this pattern, `apache-beam`, for its Python API.)
> >
> > Regarding releasing the Python API together with the Java JARs, I
> > think the guiding principle should be the convenience of the user.
> > So, thanks for the explanation @Dian!
> >
> > And you're right @Chesnay Schepler <ches...@apache.org>, we can't
> > make a hasty decision and we need more people's opinions!
> >
> > So, I would appreciate it if anyone could give us feedback and
> > suggestions!
> >
> > Best,
> > Jincheng
> >
> > On Wed, Jul 3, 2019 at 8:46 PM Chesnay Schepler
> > <ches...@apache.org> wrote:
> >
> > > So this would not be a source release then, but a full-blown
> > > binary release.
> > >
> > > Maybe it is just me, but I find it a bit suspect to ship an entire
> > > Java application via PyPI, just because there's a Python API for
> > > it.
> > >
> > > We definitely need input from more people here.
> > >
> > > On 03/07/2019 14:09, Dian Fu wrote:
> > > > Hi Chesnay,
> > > >
> > > > Thanks a lot for the suggestions.
> > > >
> > > > Regarding "distributing java/scala code to PyPI":
> > > > The Python Table API is just a wrapper of the Java Table API,
> > > > and without the java/scala code, two steps are needed to set up
> > > > an environment that can execute a Python Table API program:
> > > > 1) Install pyflink using "pip install apache-flink"
> > > > 2) Download the Flink distribution and point FLINK_HOME to it.
> > > > Besides, users have to make sure that the manually installed
> > > > Flink is compatible with the pip-installed pyflink.
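> > > >
> > > > For illustration, roughly how that lookup could behave (a
> > > > minimal sketch of the idea only, not the actual pyflink code;
> > > > the bundled directory name "deps" is a made-up example):
> > > >
> > > >     import os
> > > >
> > > >     def find_flink_home():
> > > >         # An explicitly set FLINK_HOME always wins, so users can
> > > >         # point pyflink at any compatible Flink distribution.
> > > >         flink_home = os.environ.get("FLINK_HOME")
> > > >         if flink_home is not None:
> > > >             return flink_home
> > > >         # Otherwise, fall back to a distribution bundled inside
> > > >         # the pip-installed package (hypothetical layout).
> > > >         bundled = os.path.join(os.path.dirname(__file__), "deps")
> > > >         if os.path.isdir(bundled):
> > > >             return bundled
> > > >         raise RuntimeError(
> > > >             "Unable to locate a Flink distribution; "
> > > >             "please set the FLINK_HOME environment variable.")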
> > > >
> > > > Bundling the java/scala code inside the Python package would
> > > > eliminate step 2) and make it simpler for users to install
> > > > pyflink. There was a short discussion
> > > > <https://issues.apache.org/jira/browse/SPARK-1267> on this in
> > > > the Spark community, and they finally decided to package the
> > > > java/scala code in the Python package. (BTW, PySpark only
> > > > bundles the jars for Scala 2.11.)
> > > >
> > > > Regards,
> > > > Dian
> > > >
> > > >> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler
> > > >> <ches...@apache.org> wrote:
> > > >>
> > > >> The existing artifact in the pyflink project was neither
> > > >> released by the Flink project / anyone affiliated with it, nor
> > > >> approved by the Flink PMC.
> > > >>
> > > >> As such, if we were to use this account, I believe we should
> > > >> delete it so as not to mislead users into thinking this is in
> > > >> any way an Apache-provided distribution. Since this goes
> > > >> against the user's wishes, I would be in favor of creating a
> > > >> separate account and giving back control over the pyflink
> > > >> account.
> > > >>
> > > >> My take on the raised points:
> > > >> 1.1) "apache-flink"
> > > >> 1.2) option 2
> > > >> 2) Given that we only distribute Python code, there should be
> > > >> no reason to differentiate between Scala versions. We should
> > > >> not be distributing any java/scala code and/or modules to PyPI.
> > > >> Currently, I'm a bit confused about this question and wonder
> > > >> what exactly we are trying to publish here.
> > > >> 3) This should be treated as any other source release; i.e., it
> > > >> needs a LICENSE and NOTICE file, signatures, and a PMC vote. My
> > > >> suggestion would be to make this part of our normal release
> > > >> process. There will be _one_ source release on dist.apache.org
> > > >> encompassing everything, and a separate Python-focused source
> > > >> release that we push to PyPI. The LICENSE and NOTICE contained
> > > >> in the Python source release must also be present in the source
> > > >> release of Flink; so basically the Python source release is
> > > >> just the contents of the flink-python module plus the maven
> > > >> pom.xml, with no other special sauce added during the release
> > > >> process.
> > > >>
> > > >> On 02/07/2019 05:42, jincheng sun wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> With the effort of FLIP-38 [1], the Python Table API (without
> > > >>> UDF support for now) will be supported in the coming
> > > >>> release-1.9. As described in "Build PyFlink" [2], if users
> > > >>> want to use the Python Table API, they can manually install it
> > > >>> using the command:
> > > >>> "cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz"
> > > >>>
> > > >>> This is non-trivial for users, and it would be better if we
> > > >>> could follow the Python way and publish PyFlink to PyPI, which
> > > >>> is a repository of software for the Python programming
> > > >>> language. Then users could use the standard Python package
> > > >>> manager "pip" to install PyFlink: "pip install pyflink". So,
> > > >>> there are some topics that need to be discussed, as follows:
> > > >>>
> > > >>> 1. How to publish PyFlink to PyPI
> > > >>>
> > > >>> 1.1 Project Name
> > > >>> We need to decide which PyPI project name to use, for example
> > > >>> apache-flink, pyflink, etc.
> > > >>>
> > > >>> Regarding the name "pyflink", it has already been registered
> > > >>> by @ueqt, and there is already a package '1.0' released under
> > > >>> this project which is based on flink-libraries/flink-python.
> > > >>>
> > > >>> @ueqt has kindly agreed to give this project back to the
> > > >>> community. And he has requested that the released package
> > > >>> '1.0' not be removed, as it is already being used in his
> > > >>> company.
> > > >>>
> > > >>> So we need to decide whether to use the name 'pyflink'. If
> > > >>> yes, we need to figure out how to handle the package '1.0'
> > > >>> under this project.
> > > >>>
> > > >>> From my point of view, "pyflink" is the better name for our
> > > >>> project, and we can keep the 1.0 release, as more people may
> > > >>> want to use it.
> > > >>>
> > > >>> 1.2 PyPI account for release
> > > >>> We also need to decide which account to use to publish
> > > >>> packages to PyPI.
> > > >>>
> > > >>> There are two permission levels in PyPI: owner and maintainer:
> > > >>>
> > > >>> 1) The owner can upload releases and delete files, releases,
> > > >>> or the entire project.
> > > >>> 2) The maintainer can also upload releases. However, they
> > > >>> cannot delete files, releases, or the project.
> > > >>>
> > > >>> So there are two options in my mind:
> > > >>>
> > > >>> 1) Create an account such as 'pyflink' as the owner, share it
> > > >>> with all the release managers, and then release managers
> > > >>> publish the package to PyPI using this account.
> > > >>> 2) Create an account such as 'pyflink' as the owner (only the
> > > >>> PMC can manage it) and add the release managers' accounts as
> > > >>> maintainers of the project. Release managers publish the
> > > >>> package to PyPI using their own accounts.
> > > >>>
> > > >>> As far as I know, PySpark takes option 1) and Apache Beam
> > > >>> takes option 2).
> > > >>>
> > > >>> From my point of view, I prefer option 2), as it is safer: it
> > > >>> eliminates the risk of accidentally deleting old releases and
> > > >>> at the same time keeps a trace of who is operating.
> > > >>>
> > > >>> 2. How to handle Scala 2.11 and Scala 2.12
> > > >>>
> > > >>> The PyFlink package bundles the jars in the package (a sketch
> > > >>> of what this could look like in setup.py follows after point 3
> > > >>> below). As we know, there are two versions of the jars for
> > > >>> each module: one for Scala 2.11 and the other for Scala 2.12.
> > > >>> So there would theoretically be two PyFlink packages. We need
> > > >>> to decide which one to publish to PyPI, or both. If both
> > > >>> packages are published to PyPI, we may need two projects, such
> > > >>> as pyflink_211 and pyflink_212, and maybe more in the future,
> > > >>> such as pyflink_213.
> > > >>>
> > > >>> (BTW, I think we should bring up a discussion about dropping
> > > >>> Scala 2.11 in the Flink 1.10 release, since Scala 2.13 has
> > > >>> been available since early June.)
> > > >>>
> > > >>> From my point of view, for now we can release only the
> > > >>> Scala 2.11 version, since Scala 2.11 is our default version in
> > > >>> Flink.
> > > >>>
> > > >>> 3. Legal problems of publishing to PyPI
> > > >>>
> > > >>> As @Chesnay Schepler <ches...@apache.org> pointed out in
> > > >>> FLINK-13011 [3], publishing PyFlink to PyPI means that we will
> > > >>> publish binaries to a distribution channel not owned by
> > > >>> Apache. We need to figure out whether there are legal
> > > >>> problems. From my point of view, there are none, as a few
> > > >>> Apache projects such as Spark, Beam, etc. have already done
> > > >>> it. Frankly speaking, I am not familiar with this issue, so I
> > > >>> welcome any feedback from anybody who is more familiar with
> > > >>> it.
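> > > >>>
> > > >>> To make the bundling in point 2 concrete, here is a minimal
> > > >>> sketch of how the jars could be shipped inside the sdist via
> > > >>> setup.py (the package layout and file patterns are
> > > >>> illustrative assumptions, not the actual flink-python
> > > >>> structure):
> > > >>>
> > > >>>     from setuptools import find_packages, setup
> > > >>>
> > > >>>     setup(
> > > >>>         name="apache-flink",  # or "pyflink", per item 1.1
> > > >>>         version="1.9.0",
> > > >>>         packages=find_packages(),
> > > >>>         # Ship the jars of exactly one Scala build (e.g. 2.11)
> > > >>>         # inside the Python package, so pip users need no
> > > >>>         # separate Flink download or FLINK_HOME.
> > > >>>         package_data={"pyflink": ["lib/*.jar", "opt/*.jar"]},
> > > >>>     )
> > > >>>
> > > >>> Whichever Scala version's jars are placed under pyflink/lib
> > > >>> determines whether the resulting package is a "Scala 2.11" or
> > > >>> a "Scala 2.12" package, hence the question above.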
> > > >>>
> > > >>> Great thanks to @ueqt for being willing to dedicate the PyPI
> > > >>> project name `pyflink` to the Apache Flink community!!!
> > > >>> Great thanks to @Dian for the offline effort!!!
> > > >>>
> > > >>> Best,
> > > >>> Jincheng
> > > >>>
> > > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
> > > >>> [2] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
> > > >>> [3] https://issues.apache.org/jira/browse/FLINK-13011


-- 
Best Regards

Jeff Zhang