Hi Stephan & Jeff,

Thanks a lot for sharing your thoughts!

Regarding the bundled jars, currently only the jars in the Flink binary 
distribution are packaged in the pyflink package. It may be a good idea to also 
bundle other jars such as flink-hadoop-compatibility. We may also need to 
consider whether to bundle the format jars such as flink-avro, flink-json and 
flink-csv, and the connector jars such as flink-connector-kafka, etc.

If FLINK_HOME is set, the binary distribution specified by FLINK_HOME will be 
used instead.
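
Just to illustrate the intended workflow (the path and version below are only 
examples, and the exact lookup behavior may still change):

    pip install apache-flink            # installs pyflink with the bundled jars
    export FLINK_HOME=/opt/flink-1.9.0  # optional: point to a local distribution
    # with FLINK_HOME set, the distribution under $FLINK_HOME is used
    # instead of the jars bundled in the pip package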

Regards,
Dian

> On Jul 24, 2019, at 9:47 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> 
> +1 for publishing pyflink to pypi.
> 
> Regarding including the jars, I just want to make sure which Flink binary
> distribution we would ship with pyflink, since we have multiple Flink binary
> distributions (w/ and w/o Hadoop).
> Personally, I prefer to use the Hadoop-included binary distribution.
> 
> And I just want to confirm whether it is possible for users to use a
> different Flink binary distribution as long as they set the env variable FLINK_HOME.
> 
> Besides that, I hope that there will be bidirectional link references between
> the Flink docs and the PyPI docs.
> 
> 
> 
> Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:07 AM:
> 
>> Hi!
>> 
>> Sorry for the late involvement. Here are some thoughts from my side:
>> 
>> Definitely +1 to publishing to PyPI, even if it is a binary release.
>> Community growth into other communities is great, and if this is the
>> natural way to reach developers in the Python community, let's do it. This
>> is not about our convenience, but reaching users.
>> 
>> I think the way to look at this is that this is a convenience distribution
>> channel, courtesy of the Flink community. It is not an Apache release, and we
>> make this clear in the Readme.
>> Of course, this doesn't mean we don't try to uphold similar standards as
>> for our official release (like proper license information).
>> 
>> Concerning credentials sharing, I would be fine with whatever option. The
>> PMC doesn't own it (it is an initiative by some community members), but the
>> PMC needs to ensure trademark compliance, so slight preference for option
>> #1 (PMC would have means to correct problems).
>> 
>> I believe there is no need to differentiate between Scala versions, because
>> this is merely a convenience thing for pure Python users. Users that mix
>> python and scala (and thus depend on specific scala versions) can still
>> download from Apache or build themselves.
>> 
>> Best,
>> Stephan
>> 
>> 
>> 
>> On Thu, Jul 4, 2019 at 9:51 AM jincheng sun <sunjincheng...@gmail.com>
>> wrote:
>> 
>>> Hi All,
>>> 
>>> Thanks for the feedback @Chesnay Schepler <ches...@apache.org> @Dian!
>>> 
>>> Using `apache-flink` for the project name also makes sense to me, since we
>>> should always keep in mind that Flink is owned by Apache. (Beam also uses
>>> this pattern, `apache-beam`, for its Python API.)
>>> 
>>> Regarding releasing the Python API with the Java JARs, I think the guiding
>>> principle should be the convenience of the user. So, thanks for the
>>> explanation @Dian!
>>> 
>>> And you're right @Chesnay Schepler <ches...@apache.org>, we can't make a
>>> hasty decision and we need more people's opinions!
>>> 
>>> So, I would appreciate it if anyone could give us feedback and suggestions!
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> 
>>> 
>>> Chesnay Schepler <ches...@apache.org> wrote on Wed, Jul 3, 2019 at 8:46 PM:
>>> 
>>>> So this would not be a source release then, but a full-blown binary
>>>> release.
>>>> 
>>>> Maybe it is just me, but I find it a bit suspect to ship an entire java
>>>> application via PyPI, just because there's a Python API for it.
>>>> 
>>>> We definitely need input from more people here.
>>>> 
>>>> On 03/07/2019 14:09, Dian Fu wrote:
>>>>> Hi Chesnay,
>>>>> 
>>>>> Thanks a lot for the suggestions.
>>>>> 
>>>>> Regarding “distributing java/scala code to PyPI”:
>>>>> The Python Table API is just a wrapper of the Java Table API, and without
>>>>> the java/scala code, two steps are needed to set up an environment to
>>>>> execute a Python Table API program:
>>>>> 1) Install pyflink using "pip install apache-flink"
>>>>> 2) Download the Flink distribution and point FLINK_HOME to it.
>>>>> Besides, users have to make sure that the manually installed Flink is
>>>>> compatible with the pip-installed pyflink.
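>>>>> 
>>>>> As a rough sketch (the version number and path below are only
>>>>> illustrative), the difference for users would be:
>>>>> 
>>>>>     # without bundled jars: two steps
>>>>>     pip install apache-flink
>>>>>     export FLINK_HOME=/path/to/flink-1.9.0   # manually downloaded distribution
>>>>> 
>>>>>     # with bundled jars: a single step
>>>>>     pip install apache-flink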
>>>>> 
>>>>> Bundling the java/scala code inside the Python package will eliminate step
>>>>> 2) and make it simpler for users to install pyflink. There was a short
>>>>> discussion <https://issues.apache.org/jira/browse/SPARK-1267> on this in the
>>>>> Spark community and they finally decided to package the java/scala code in
>>>>> the python package. (BTW, PySpark only bundles the jars for Scala 2.11.)
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>>> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
>>>>>> 
>>>>>> The existing artifact in the pyflink project was neither released by the
>>>>>> Flink project / anyone affiliated with it nor approved by the Flink PMC.
>>>>>> 
>>>>>> As such, if we were to use this account I believe we should delete it
>>>>>> to not mislead users that this is in any way an apache-provided
>>>>>> distribution. Since this goes against the user's wishes, I would be in
>>>>>> favor of creating a separate account, and giving back control over the
>>>>>> pyflink account.
>>>>>> 
>>>>>> My take on the raised points:
>>>>>> 1.1) "apache-flink"
>>>>>> 1.2)  option 2
>>>>>> 2) Given that we only distribute python code, there should be no reason
>>>>>> to differentiate between scala versions. We should not be distributing any
>>>>>> java/scala code and/or modules to PyPI. Currently, I'm a bit confused about
>>>>>> this question and wonder what exactly we are trying to publish here.
>>>>>> 3) This should be treated as any other source release; i.e., it needs a
>>>>>> LICENSE and NOTICE file, signatures and a PMC vote. My suggestion would be
>>>>>> to make this part of our normal release process. There will be _one_ source
>>>>>> release on dist.apache.org encompassing everything, and a separate
>>>>>> python-focused source release that we push to PyPI. The LICENSE and NOTICE
>>>>>> contained in the python source release must also be present in the source
>>>>>> release of Flink; so basically the python source release is just the
>>>>>> contents of the flink-python module minus the maven pom.xml, with no other
>>>>>> special sauce added during the release process.
>>>>>> 
>>>>>> On 02/07/2019 05:42, jincheng sun wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> With the effort of FLIP-38 [1], the Python Table API (without UDF support
>>>>>>> for now) will be supported in the coming release-1.9.
>>>>>>> As described in "Build PyFlink" [2], if users want to use the Python Table
>>>>>>> API, they can manually install it using the command:
>>>>>>> "cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
>>>>>>> 
>>>>>>> This is non-trivial for users, and it would be better if we could follow
>>>>>>> the Python way and publish PyFlink to PyPI, which is a repository of
>>>>>>> software for the Python programming language. Then users can use the
>>>>>>> standard Python package manager "pip" to install PyFlink: "pip install
>>>>>>> pyflink". So, there are some topics that need to be discussed, as follows:
>>>>>>> 
>>>>>>> 1. How to publish PyFlink to PyPI
>>>>>>> 
>>>>>>> 1.1 Project Name
>>>>>>>      We need to decide which PyPI project name to use, for example
>>>>>>> apache-flink, pyflink, etc.
>>>>>>> 
>>>>>>>     Regarding the name "pyflink", it has already been registered by
>>>>>>> @ueqt and there is already a package '1.0' released under this project,
>>>>>>> which is based on flink-libraries/flink-python.
>>>>>>> 
>>>>>>>    @ueqt has kindly agreed to give this project back to the community.
>>>>>>> And he has requested that the released package '1.0' not be removed, as
>>>>>>> it is already being used in their company.
>>>>>>> 
>>>>>>>     So we need to decide whether to use the name 'pyflink'. If yes, we
>>>>>>> need to figure out how to handle the package '1.0' under this project.
>>>>>>> 
>>>>>>>     From my point of view, "pyflink" is the better project name and we
>>>>>>> can keep the 1.0 release, as more people may want to use it.
>>>>>>> 
>>>>>>> 1.2 PyPI account for release
>>>>>>>     We also need to decide which account to use to publish packages
>>>>>>> to PyPI.
>>>>>>> 
>>>>>>>     There are two permissions in PyPI: owner and maintainer:
>>>>>>> 
>>>>>>>     1) The owner can upload releases, delete files, releases or the
>>>>>>> entire project.
>>>>>>>     2) The maintainer can also upload releases. However, they cannot
>>>>>>> delete files, releases, or the project.
>>>>>>> 
>>>>>>>     So there are two options in my mind:
>>>>>>> 
>>>>>>>     1) Create an account such as 'pyflink' as the owner, share it with
>>>>>>> all the release managers, and then release managers can publish the
>>>>>>> package to PyPI using this account.
>>>>>>>     2) Create an account such as 'pyflink' as owner (only the PMC can
>>>>>>> manage it) and add the release managers' accounts as maintainers of the
>>>>>>> project. Release managers publish the package to PyPI using their own
>>>>>>> accounts.
>>>>>>> 
>>>>>>>     As far as I know, PySpark takes Option 1) and Apache Beam takes
>>>>>>> Option 2).
>>>>>>> 
>>>>>>>     From my point of view, I prefer option 2) as it is safer: it
>>>>>>> eliminates the risk of accidentally deleting old releases and at the same
>>>>>>> time keeps a record of who is operating.
>>>>>>> 
>>>>>>> 2. How to handle Scala_2.11 and Scala_2.12
>>>>>>> 
>>>>>>> The PyFlink package bundles the jars in the package. As we know, there
>>>>>>> are two versions of the jars for each module: one for Scala 2.11 and the
>>>>>>> other for Scala 2.12. So theoretically there will be two PyFlink packages.
>>>>>>> We need to decide whether to publish one of them to PyPI, or both. If both
>>>>>>> packages are published to PyPI, we may need two separate projects, such as
>>>>>>> pyflink_211 and pyflink_212. Maybe more in the future, such as pyflink_213.
>>>>>>> 
>>>>>>>     (BTW, I think we should bring up a discussion about dropping
>>>>>>> Scala 2.11 in the Flink 1.10 release, as Scala 2.13 became available in
>>>>>>> early June.)
>>>>>>> 
>>>>>>>     From my point of view, for now we can release only the Scala 2.11
>>>>>>> version, since Scala 2.11 is our default version in Flink.
>>>>>>> 
>>>>>>> 3. Legal problems of publishing to PyPI
>>>>>>> 
>>>>>>> As @Chesnay Schepler <ches...@apache.org> pointed out in FLINK-13011 [3],
>>>>>>> publishing PyFlink to PyPI means that we will publish binaries to a
>>>>>>> distribution channel not owned by Apache. We need to figure out whether
>>>>>>> there are legal problems. From my point of view, there are none, as a few
>>>>>>> Apache projects such as Spark, Beam, etc. have already done it. Frankly
>>>>>>> speaking, I am not familiar with this issue, so I welcome any feedback
>>>>>>> from anybody who is more familiar with it.
>>>>>>> 
>>>>>>> Great thanks to @ueqt for being willing to dedicate the PyPI project name
>>>>>>> `pyflink` to the Apache Flink community!!!
>>>>>>> Great thanks to @Dian for the offline effort!!!
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jincheng
>>>>>>> 
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
>>>>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-13011
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang
