So this would not be a source release then, but a full-blown binary release.
Maybe it is just me, but I find it a bit suspect to ship an entire Java
application via PyPI, just because there's a Python API for it.
We definitely need input from more people here.
On 03/07/2019 14:09, Dian Fu wrote:
Hi Chesnay,
Thanks a lot for the suggestions.
Regarding “distributing java/scala code to PyPI”:
The Python Table API is just a wrapper around the Java Table API, and without the
Java/Scala code, two steps are needed to set up an environment to execute a
Python Table API program:
1) Install pyflink using "pip install apache-flink"
2) Download the flink distribution and set the FLINK_HOME to it.
Besides, users have to make sure that the manually installed Flink is
compatible with the pip-installed pyflink.
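For illustration, here is a minimal sketch of the kind of lookup the Python
wrapper has to perform without the bundled jars; the helper name and the exact
checks are hypothetical, not pyflink's actual code:

    import os

    def find_flink_home():
        # Hypothetical helper: resolve the Flink distribution that the
        # Python wrapper delegates to. Without bundled jars, this relies
        # entirely on the user having completed step 2) above.
        flink_home = os.environ.get("FLINK_HOME")
        if flink_home is None or not os.path.isdir(
                os.path.join(flink_home, "lib")):
            raise RuntimeError(
                "FLINK_HOME is not set or does not point to a Flink "
                "distribution; please download Flink and set FLINK_HOME.")
        return flink_home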
Bundling the Java/Scala code inside the Python package will eliminate step 2) and make
it simpler for users to install pyflink. There was a short discussion
<https://issues.apache.org/jira/browse/SPARK-1267> on this in the Spark community
and they finally decided to package the Java/Scala code in the Python package. (BTW,
PySpark only bundles the jars for Scala 2.11.)
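As a rough sketch of what the packaging side of such bundling could look like
(the project name, version, and jar layout below are assumptions for
illustration, not the final packaging):

    from setuptools import setup

    setup(
        name="apache-flink",  # project name still under discussion, see 1.1
        version="1.9.0",
        packages=["pyflink"],
        # Ship the Flink jars inside the sdist/wheel so that "pip install"
        # alone yields a runnable environment and step 2) disappears.
        package_data={"pyflink": ["lib/*.jar", "opt/*.jar"]},
    )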
Regards,
Dian
On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
The existing artifact in the pyflink project was neither released by the Flink
project / anyone affiliated with it nor approved by the Flink PMC.
As such, if we were to use this account, I believe we should delete it so as not to
mislead users into thinking that this is in any way an Apache-provided distribution. Since
this goes against the user's wishes, I would be in favor of creating a separate
account and giving back control over the pyflink account.
My take on the raised points:
1.1) "apache-flink"
1.2) option 2
2) Given that we only distribute Python code, there should be no reason to
differentiate between Scala versions. We should not be distributing any
Java/Scala code and/or modules to PyPI. Currently, I'm a bit confused about
this question and wonder what exactly we are trying to publish here.
3) This should be treated as any other source release; i.e., it needs a LICENSE
and NOTICE file, signatures and a PMC vote. My suggestion would be to make this
part of our normal release process. There will be _one_ source release on
dist.apache.org encompassing everything, and a separate Python-focused
source release that we push to PyPI. The LICENSE and NOTICE contained in the
Python source release must also be present in the source release of Flink; so
basically the Python source release is just the contents of the flink-python module
plus the maven pom.xml, with no other special sauce added during the release process.
On 02/07/2019 05:42, jincheng sun wrote:
Hi all,
With the effort of FLIP-38 [1], the Python Table API (without UDF support
for now) will be supported in the coming 1.9 release.
As described in "Build PyFlink"[2], if users want to use the Python Table
API, they can manually install it using the command:
"cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
This is non-trivial for users, and it would be better if we could follow the
Python way and publish PyFlink to PyPI,
which is a repository of software for the Python programming language. Then
users can use the standard Python package
manager "pip" to install PyFlink: "pip install pyflink". So, there are some
topics that need to be discussed, as follows:
1. How to publish PyFlink to PyPI
1.1 Project Name
We need to decide which PyPI project name to use, for example
apache-flink, pyflink, etc.
Regarding the name "pyflink", it has already been registered by
@ueqt and there is already a package '1.0' released under this project
which is based on flink-libraries/flink-python.
@ueqt has kindly agreed to give this project back to the community, and
he has requested that the released package '1.0' not be removed as
it is already used in their company.
So we need to decide whether to use the name 'pyflink'. If yes, we
need to figure out how to handle the package '1.0' under this project.
From my point of view, 'pyflink' is the better name for our project,
and we can keep the 1.0 release, as more people may want to use it.
1.2 PyPI account for release
We also need to decide which account to use to publish packages to PyPI.
There are two permission levels in PyPI: owner and maintainer:
1) The owner can upload releases and delete files, releases, or the entire
project.
2) The maintainer can also upload releases; however, they cannot delete
files, releases, or the project.
So there are two options in my mind:
1) Create an account such as 'pyflink' as the owner, share it with all
the release managers, and then release managers publish the package to
PyPI using this account.
2) Create an account such as 'pyflink' as the owner (only the PMC can manage it)
and add the release managers' accounts as maintainers of the project.
Release managers then publish the package to PyPI using their own accounts.
As far as I know, PySpark takes option 1) and Apache Beam takes option 2).
From my point of view, I prefer option 2) as it is safer: it
eliminates the risk of accidentally deleting old releases and at the same
time keeps a trace of who is operating.
2. How to handle Scala 2.11 and Scala 2.12
The PyFlink package bundles the jars in the package. As we know, there are
two versions of the jars for each module: one for Scala 2.11 and the other for
Scala 2.12. So theoretically there will be two PyFlink packages. We need to
decide whether to publish one of them to PyPI, or both. If both packages are
published to PyPI, we may need two separate projects, such as pyflink_211 and
pyflink_212, and maybe more in the future, such as pyflink_213.
(BTW, I think we should bring up a discussion about dropping Scala 2.11 in the
Flink 1.10 release, as 2.13 has been available since early June.)
From my point of view, for now we can release only the Scala 2.11
version, as Scala 2.11 is the default version in Flink.
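For illustration only, if both versions were published, the packaging could
parameterize the Scala version at build time; everything in this sketch
(the environment variable, naming scheme, and layout) is hypothetical:

    import os
    from setuptools import setup

    # Hypothetical: pick the Scala profile at build time, defaulting to
    # 2.11 (Flink's current default), and encode it in the project name.
    scala_version = os.environ.get("SCALA_VERSION", "2.11")

    setup(
        name="pyflink_" + scala_version.replace(".", ""),  # e.g. pyflink_211
        version="1.9.0",
        packages=["pyflink"],
        # Bundle only the jars built against the chosen Scala version.
        package_data={"pyflink": ["lib/*.jar"]},
    )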
3. Legal problems of publishing to PyPI
As @Chesnay Schepler <ches...@apache.org> pointed out in FLINK-13011[3],
publishing PyFlink to PyPI means that we will publish binaries to a
distribution channel not owned by Apache. We need to figure out if there
are legal problems. From my point of view, there are none, as a few
Apache projects such as Spark, Beam, etc. have already done it. Frankly
speaking, I am not familiar with this area, so any feedback from
somebody more familiar with it is welcome.
Great thanks to @ueqt for being willing to donate the PyPI project name `pyflink`
to the Apache Flink community!!!
Great thanks to @Dian for the offline effort!!!
Best,
Jincheng
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
[2]
https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
[3] https://issues.apache.org/jira/browse/FLINK-13011