Hi Chesnay,

Thanks a lot for the suggestions.

Regarding “distributing java/scala code to PyPI”:
The Python Table API is just a wrapper of the Java Table API, and without the 
java/scala code, two steps would be needed to set up an environment to execute 
a Python Table API program:
1) Install pyflink using "pip install apache-flink".
2) Download the Flink distribution and point FLINK_HOME to it.
Besides, users would have to make sure that the manually installed Flink is 
compatible with the pip-installed pyflink. 
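
To illustrate the coupling, here is a minimal sketch of the kind of lookup the 
wrapper has to perform when the jars are not bundled (the function name and the 
error message are illustrative, not the actual pyflink code):

import os

def find_flink_home():
    # Without bundled jars, the wrapper can only locate the Flink
    # distribution through an environment variable set by the user.
    flink_home = os.environ.get("FLINK_HOME")
    if flink_home is None:
        raise RuntimeError(
            "FLINK_HOME is not set; download a Flink distribution "
            "and point FLINK_HOME to it before running pyflink.")
    return flink_home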

Bundling the java/scala code inside the Python package would eliminate step 2) 
and make it simpler for users to install pyflink. There was a short discussion 
<https://issues.apache.org/jira/browse/SPARK-1267> on this in the Spark 
community and they finally decided to package the java/scala code in the 
python package. (BTW, PySpark only bundles the jars for Scala 2.11.)
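
As a rough sketch, such bundling could be expressed in setup.py along these 
lines (assuming the jars are copied into a deps/ directory under the pyflink 
package before packaging; the project name, version and paths below are 
illustrative):

from setuptools import setup

setup(
    name='apache-flink',
    version='1.9.0',
    packages=['pyflink'],
    # Ship the Flink jars and launch scripts inside the sdist so that
    # "pip install" alone yields a working environment.
    package_data={'pyflink': ['deps/lib/*.jar', 'deps/bin/*']},
)

With the jars shipped this way, "pip install apache-flink" is sufficient and 
the jar and python versions can never drift apart.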

Regards,
Dian 

> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
> 
> The existing artifact in the pyflink project was neither released by the 
> Flink project or anyone affiliated with it, nor approved by the Flink PMC.
> 
> As such, if we were to use this account I believe we should delete it, to not 
> mislead users into thinking that this is in any way an apache-provided 
> distribution. Since this goes against the user's wishes, I would be in favor 
> of creating a separate account and giving back control over the pyflink 
> account.
> 
> My take on the raised points:
> 1.1) "apache-flink"
> 1.2)  option 2
> 2) Given that we only distribute python code there should be no reason to 
> differentiate between scala versions. We should not be distributing any 
> java/scala code and/or modules to PyPI. Currently, I'm a bit confused about 
> this question and wonder what exactly we are trying to publish here.
> 3) This should be treated as any other source release; i.e., it needs a 
> LICENSE and NOTICE file, signatures and a PMC vote. My suggestion would be to 
> make this part of our normal release process. There will be _one_ source 
> release on dist.apache.org encompassing everything, and a separate 
> python-focused source release that we push to PyPI. The LICENSE and NOTICE 
> contained in the python source release must also be present in the source 
> release of Flink; so basically the python source release is just the contents 
> of the flink-python module plus the maven pom.xml, with no other special 
> sauce added during the release process.
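>
> Concretely, such a python-focused source release would then contain roughly
> the following (an illustrative layout, not a final file list):
>
>     apache-flink-<version>.tar.gz
>         LICENSE
>         NOTICE
>         pom.xml
>         setup.py
>         pyflink/        (the python sources from flink-python)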
> 
> On 02/07/2019 05:42, jincheng sun wrote:
>> Hi all,
>> 
>> With the effort of FLIP-38 [1], the Python Table API (without UDF support
>> for now) will be supported in the coming release-1.9.
>> As described in "Build PyFlink" [2], if users want to use the Python Table
>> API, they can manually install it using the command:
>> "cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
>> 
>> This is non-trivial for users, and it would be better if we could follow
>> the Python way and publish PyFlink to PyPI, which is a repository of
>> software for the Python programming language. Then users could use the
>> standard Python package manager "pip" to install PyFlink:
>> "pip install pyflink". So, there are a few topics that need to be discussed:
>> 
>> 1. How to publish PyFlink to PyPI
>> 
>> 1.1 Project Name
>>      We need to decide which project name to use on PyPI, for example,
>> apache-flink, pyflink, etc.
>> 
>>     Regarding the name "pyflink", it has already been registered by
>> @ueqt and there is already a package '1.0' released under this project
>> which is based on flink-libraries/flink-python.
>> 
>>    @ueqt has kindly agreed to give this project back to the community, and
>> he has requested that the released package '1.0' not be removed, as it is
>> already used in his company.
>> 
>>     So we need to decide whether to use the name 'pyflink'. If yes, we
>> need to figure out how to handle the package '1.0' under this project.
>> 
>>     From my point of view, "pyflink" is the better name for our project,
>> and we can keep the 1.0 release, as more people may want to use it.
>> 
>> 1.2 PyPI account for release
>>     We also need to decide which account to use to publish packages to PyPI.
>> 
>>     There are two permission roles on PyPI: owner and maintainer:
>>
>>     1) The owner can upload releases and delete files, releases, or the
>> entire project.
>>     2) The maintainer can also upload releases, but cannot delete files,
>> releases, or the project.
>> 
>>     So there are two options in my mind:
>> 
>>     1) Create an account such as 'pyflink' as the owner, share it with all
>> the release managers, and then release managers can publish the package to
>> PyPI using this account.
>>     2) Create an account such as 'pyflink' as the owner (only the PMC can
>> manage it) and add the release managers' accounts as maintainers of the
>> project. Release managers then publish the package to PyPI using their own
>> accounts.
>> 
>>     As far as I know, PySpark takes Option 1) and Apache Beam takes Option 2).
>> 
>>     From my point of view, I prefer option 2) as it is safer: it
>> eliminates the risk of accidentally deleting old releases and at the same
>> time keeps a trace of who is operating.
>> 
>> 2. How to handle Scala 2.11 and Scala 2.12
>> 
>> The PyFlink package bundles the Flink jars. As we know, there are two
>> versions of the jars for each module: one for Scala 2.11 and the other for
>> Scala 2.12. So theoretically there will be two PyFlink packages. We need to
>> decide whether to publish one of them to PyPI or both. If both packages are
>> published to PyPI, we may need two projects, such as pyflink_211 and
>> pyflink_212, and maybe more in the future, such as pyflink_213.
>> 
>>     (BTW, I think we should bring up a discussion about dropping Scala 2.11
>> in the Flink 1.10 release, as Scala 2.13 became available in early June.)
>> 
>>     From my point of view, for now we can release only the Scala 2.11
>> version, as Scala 2.11 is the default version in Flink.
>> 
>> 3. Legal problems of publishing to PyPI
>> 
>> As @Chesnay Schepler <ches...@apache.org> pointed out in FLINK-13011 [3],
>> publishing PyFlink to PyPI means that we will publish binaries to a
>> distribution channel not owned by Apache. We need to figure out whether
>> there are legal problems. From my point of view, there are none, as a few
>> Apache projects such as Spark, Beam, etc. have already done it. Frankly
>> speaking, I am not familiar with this area, so any feedback from somebody
>> more familiar with it is welcome.
>> 
>> Great thanks to @ueqt for being willing to donate the PyPI project name
>> `pyflink` to the Apache Flink community!!!
>> Great thanks to @Dian for the offline effort!!!
>> 
>> Best,
>> Jincheng
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
>> [2]
>> https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
>> [3] https://issues.apache.org/jira/browse/FLINK-13011
>> 
> 
