Hi Chesnay,

Thanks a lot for the suggestions.
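(To make the bundling discussed below concrete, a minimal sketch of how the jars could be shipped inside the Python package via setuptools follows the explanation. It is an illustrative assumption, not the actual flink-python setup.py; the project name, version, and package_data paths are hypothetical.)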
Regarding “distributing java/scala code to PyPI”: The Python Table API is just a wrapper of the Java Table API, and without the java/scala code two steps are needed to set up an environment that can execute a Python Table API program:
1) Install pyflink using "pip install apache-flink".
2) Download the Flink distribution and point FLINK_HOME at it.

Besides, users have to make sure that the manually installed Flink is compatible with the pip-installed pyflink. Bundling the java/scala code inside the Python package eliminates step 2) and makes installing pyflink simpler for users. There was a short discussion <https://issues.apache.org/jira/browse/SPARK-1267> on this in the Spark community, and they finally decided to package the java/scala code in the Python package. (BTW, PySpark only bundles the Scala 2.11 jars.)

    # Hypothetical sketch: bundle the Flink distribution inside the Python
    # package so that "pip install" alone yields a runnable environment.
    # This is NOT the real flink-python setup.py; names and paths are
    # assumptions for illustration only.
    from setuptools import setup, find_packages

    setup(
        name="apache-flink",
        version="1.9.0",
        packages=find_packages(),
        include_package_data=True,
        # Ship jars, launch scripts, and configs with the package so users
        # no longer need a separately downloaded Flink or a FLINK_HOME setup.
        package_data={
            "pyflink": ["lib/*.jar", "bin/*", "conf/*"],
        },
    )

Regards,
Dian

> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
>
> The existing artifact in the pyflink project was neither released by the Flink project / anyone affiliated with it nor approved by the Flink PMC.
>
> As such, if we were to use this account, I believe we should delete that artifact so as not to mislead users into thinking it is in any way an Apache-provided distribution. Since this goes against the user's wishes, I would be in favor of creating a separate account and giving back control over the pyflink account.
>
> My take on the raised points:
> 1.1) "apache-flink"
> 1.2) option 2
> 2) Given that we only distribute Python code, there should be no reason to differentiate between Scala versions. We should not be distributing any java/scala code and/or modules to PyPI. Currently, I'm a bit confused about this question and wonder what exactly we are trying to publish here.
> 3) This should be treated as any other source release; i.e., it needs a LICENSE and NOTICE file, signatures, and a PMC vote. My suggestion would be to make this part of our normal release process. There will be _one_ source release on dist.apache.org encompassing everything, and a separate Python-focused source release that we push to PyPI. The LICENSE and NOTICE contained in the Python source release must also be present in the source release of Flink; so basically the Python source release is just the contents of the flink-python module plus the maven pom.xml, with no other special sauce added during the release process.
>
> On 02/07/2019 05:42, jincheng sun wrote:
>> Hi all,
>>
>> With the effort of FLIP-38 [1], the Python Table API (without UDF support for now) will be supported in the coming release-1.9. As described in "Build PyFlink" [2], if users want to use the Python Table API, they can manually install it using the command: "cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
>>
>> This is non-trivial for users, and it would be better if we could follow the Python way and publish PyFlink to PyPI, which is a repository of software for the Python programming language. Then users could use the standard Python package manager "pip" to install PyFlink: "pip install pyflink" (a sketch of this workflow appears at the end of this message). So, there are some topics that need to be discussed, as follows:
>>
>> 1. How to publish PyFlink to PyPI
>>
>> 1.1 Project Name
>> We need to decide on the PyPI project name to use, for example, apache-flink, pyflink, etc.
>>
>> Regarding the name "pyflink", it has already been registered by @ueqt, and there is already a package '1.0' released under this project which is based on flink-libraries/flink-python.
>>
>> @ueqt has kindly agreed to give this project back to the community, and he has requested that the released package '1.0' not be removed, as it is already used in his company.
>>
>> So we need to decide whether to use the name 'pyflink'. If yes, we need to figure out how to deal with the package '1.0' under this project.
>>
>> From my point of view, "pyflink" is the better name for our project, and we can keep the 1.0 release, as more people may want to use it.
>>
>> 1.2 PyPI account for release
>> We also need to decide which account to use to publish packages to PyPI.
>>
>> There are two permission levels in PyPI, owner and maintainer:
>>
>> 1) The owner can upload releases and delete files, releases, or the entire project.
>> 2) The maintainer can also upload releases. However, they cannot delete files, releases, or the project.
>>
>> So there are two options in my mind:
>>
>> 1) Create an account such as 'pyflink' as the owner, share it with all the release managers, and then release managers publish packages to PyPI using this account.
>> 2) Create an account such as 'pyflink' as the owner (only the PMC can manage it) and add the release managers' accounts as maintainers of the project. Release managers then publish packages to PyPI using their own accounts.
>>
>> As far as I know, PySpark takes option 1) and Apache Beam takes option 2).
>>
>> From my point of view, I prefer option 2), as it is safer: it eliminates the risk of accidentally deleting old releases and at the same time keeps a trace of who is operating.
>>
>> 2. How to handle Scala 2.11 and Scala 2.12
>>
>> The PyFlink package bundles the jars in the package. As we know, there are two versions of the jars for each module: one for Scala 2.11 and the other for Scala 2.12. So theoretically there will be two PyFlink packages. We need to decide which one to publish to PyPI, or whether to publish both. If both packages are published to PyPI, we may need two separate projects, such as pyflink_211 and pyflink_212, and maybe more in the future, such as pyflink_213.
>>
>> (BTW, I think we should bring up a discussion about dropping Scala 2.11 in the Flink 1.10 release, as Scala 2.13 became available in early June.)
>>
>> From my point of view, for now we can release only the Scala 2.11 version, since Scala 2.11 is the default version in Flink.
>>
>> 3. Legal problems of publishing to PyPI
>>
>> As @Chesnay Schepler <ches...@apache.org> pointed out in FLINK-13011 [3], publishing PyFlink to PyPI means that we will publish binaries to a distribution channel not owned by Apache. We need to figure out whether there are legal problems. From my point of view, there are none, as a few Apache projects such as Spark and Beam have already done it. Frankly speaking, I am not familiar with this area, so any feedback from somebody more familiar with it is welcome.
>>
>> Great thanks to @ueqt for being willing to donate the PyPI project name `pyflink` to the Apache Flink community!!!
>> Great thanks to @Dian for the offline effort!!!
>>
>> Best,
>> Jincheng
>>
>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
>> [2] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
>> [3] https://issues.apache.org/jira/browse/FLINK-13011
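As referenced above, here is a rough sketch of the end-user workflow under discussion: install via pip, then run a Python Table API program without a separate Flink download. The module paths follow the Flink 1.9 Python Table API documentation, but the details should be treated as assumptions rather than a verified program.

    # Install (project name still under discussion):
    #   pip install pyflink        # or: pip install apache-flink
    #
    # Minimal Python Table API program; module paths per the Flink 1.9 docs,
    # treat the exact API surface as an assumption.
    from pyflink.dataset import ExecutionEnvironment
    from pyflink.table import BatchTableEnvironment, TableConfig

    env = ExecutionEnvironment.get_execution_environment()
    t_env = BatchTableEnvironment.create(env, TableConfig())

    # Build a small in-memory table and apply a simple projection.
    t = t_env.from_elements([(1, "hello"), (2, "flink")], ["id", "word"])
    result = t.select("id + 1 as id, word")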