+1 for publishing pyflink to pypi.

Regarding including the jars, I just want to make sure which flink binary
distribution we would ship with pyflink, since we have multiple flink binary
distributions (with/without hadoop).
Personally, I prefer to use the hadoop-included binary distribution.

And I just want to confirm whether it is possible for users to use a
different flink binary distribution as long as they set the FLINK_HOME
environment variable.
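
Something like the following lookup order is what I have in mind (a rough
sketch only, not the actual pyflink code; the function name and the bundled
"deps" location are assumptions):

    import os
    import pyflink

    def find_flink_home():
        # A user-provided distribution via FLINK_HOME takes precedence,
        # so any compatible Flink build can be swapped in.
        flink_home = os.environ.get("FLINK_HOME")
        if flink_home and os.path.isdir(flink_home):
            return flink_home
        # Otherwise fall back to whatever distribution is bundled in the
        # pip package ("deps" is a hypothetical location).
        package_dir = os.path.dirname(os.path.abspath(pyflink.__file__))
        return os.path.join(package_dir, "deps")

    os.environ["FLINK_HOME"] = find_flink_home()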

Besides that, I hope there will be bi-directional links between the
flink docs and the PyPI docs.



Stephan Ewen <se...@apache.org> wrote on Wed, Jul 24, 2019 at 12:07 AM:

> Hi!
>
> Sorry for the late involvement. Here are some thoughts from my side:
>
> Definitely +1 to publishing to PyPI, even if it is a binary release.
> Community growth into other communities is great, and if this is the
> natural way to reach developers in the Python community, let's do it. This
> is not about our convenience, but reaching users.
>
> I think the way to look at this is that this is a convenience distribution
> channel, courtesy of the Flink community. It is not an Apache release, and
> we make this clear in the Readme.
> Of course, this doesn't mean we don't try to uphold similar standards as
> for our official release (like proper license information).
>
> Concerning credentials sharing, I would be fine with whatever option. The
> PMC doesn't own it (it is an initiative by some community members), but the
> PMC needs to ensure trademark compliance, so slight preference for option
> #1 (PMC would have means to correct problems).
>
> I believe there is no need to differentiate between Scala versions, because
> this is merely a convenience thing for pure Python users. Users that mix
> python and scala (and thus depend on specific scala versions) can still
> download from Apache or build themselves.
>
> Best,
> Stephan
>
>
>
> On Thu, Jul 4, 2019 at 9:51 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > Thanks for the feedback @Chesnay Schepler <ches...@apache.org> @Dian!
> >
> > Using `apache-flink` for the project name also makes sense to me, since
> > we should always keep in mind that Flink is owned by Apache. (Beam also
> > uses this pattern, `apache-beam`, for its Python API.)
> >
> > Regarding releasing the Python API with the Java JARs, I think the
> > guiding principle should be the convenience of the user. So, thanks for
> > the explanation @Dian!
> >
> > And you're right @Chesnay Schepler <ches...@apache.org>, we can't make a
> > hasty decision and we need more people's opinions!
> >
> > So, I would appreciate it if anyone could give us feedback and suggestions!
> >
> > Best,
> > Jincheng
> >
> >
> >
> >
> > Chesnay Schepler <ches...@apache.org> wrote on Wed, Jul 3, 2019 at 8:46 PM:
> >
> > > So this would not be a source release then, but a full-blown binary
> > > release.
> > >
> > > Maybe it is just me, but I find it a bit suspect to ship an entire java
> > > application via PyPI, just because there's a Python API for it.
> > >
> > > We definitely need input from more people here.
> > >
> > > On 03/07/2019 14:09, Dian Fu wrote:
> > > > Hi Chesnay,
> > > >
> > > > Thanks a lot for the suggestions.
> > > >
> > > > Regarding “distributing java/scala code to PyPI”:
> > > > The Python Table API is just a wrapper of the Java Table API, and
> > > > without the java/scala code, two steps are needed to set up an
> > > > environment to execute a Python Table API program:
> > > > 1) Install pyflink using "pip install apache-flink"
> > > > 2) Download the flink distribution and set FLINK_HOME to it.
> > > > Besides, users have to make sure that the manually installed Flink is
> > > > compatible with the pip-installed pyflink.
> > > >
> > > > Bundling the java/scala code inside the Python package will eliminate
> > > > step 2) and make it simpler for users to install pyflink. There was a
> > > > short discussion <https://issues.apache.org/jira/browse/SPARK-1267> on
> > > > this in the Spark community, and they finally decided to package the
> > > > java/scala code in the python package. (BTW, PySpark only bundles the
> > > > jars for scala 2.11.)
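> > > >
> > > > As a rough illustration of the bundling (a sketch only, not the actual
> > > > flink-python setup.py; the package name, version and jar locations
> > > > below are assumptions):
> > > >
> > > >     from setuptools import setup, find_packages
> > > >
> > > >     setup(
> > > >         name="apache-flink",
> > > >         version="1.9.0",
> > > >         packages=find_packages(),
> > > >         # Ship the Flink jars inside the sdist/wheel so users no
> > > >         # longer need a separate Flink download (removes step 2).
> > > >         package_data={
> > > >             "pyflink": ["lib/*.jar", "opt/*.jar"],
> > > >         },
> > > >         # pyflink communicates with the JVM through py4j.
> > > >         install_requires=["py4j"],
> > > >     )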
> > > >
> > > > Regards,
> > > > Dian
> > > >
> > > >> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
> > > >>
> > > >> The existing artifact in the pyflink project was neither released by
> > > >> the Flink project / anyone affiliated with it, nor approved by the
> > > >> Flink PMC.
> > > >>
> > > >> As such, if we were to use this account, I believe we should delete
> > > >> it so as not to mislead users into thinking this is in any way an
> > > >> apache-provided distribution. Since this goes against the user's
> > > >> wishes, I would be in favor of creating a separate account and giving
> > > >> back control over the pyflink account.
> > > >>
> > > >> My take on the raised points:
> > > >> 1.1) "apache-flink"
> > > >> 1.2)  option 2
> > > >> 2) Given that we only distribute python code, there should be no
> > > >> reason to differentiate between scala versions. We should not be
> > > >> distributing any java/scala code and/or modules to PyPI. Currently,
> > > >> I'm a bit confused about this question and wonder what exactly we are
> > > >> trying to publish here.
> > > >> 3) This should be treated as any other source release; i.e., it needs
> > > >> a LICENSE and NOTICE file, signatures and a PMC vote. My suggestion
> > > >> would be to make this part of our normal release process. There will
> > > >> be _one_ source release on dist.apache.org encompassing everything,
> > > >> and a separate python-focused source release that we push to PyPI.
> > > >> The LICENSE and NOTICE contained in the python source release must
> > > >> also be present in the source release of Flink; so basically the
> > > >> python source release is just the contents of the flink-python module
> > > >> plus the maven pom.xml, with no other special sauce added during the
> > > >> release process.
> > > >>
> > > >> On 02/07/2019 05:42, jincheng sun wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> With the effort of FLIP-38 [1], the Python Table API (without UDF
> > > >>> support for now) will be supported in the coming release-1.9.
> > > >>> As described in "Build PyFlink"[2], if users want to use the Python
> > > >>> Table API, they can manually install it using the command:
> > > >>> "cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
> > > >>>
> > > >>> This is non-trivial for users, and it would be better if we could
> > > >>> follow the Python way and publish PyFlink to PyPI, which is a
> > > >>> repository of software for the Python programming language. Then
> > > >>> users could use the standard Python package manager "pip" to install
> > > >>> PyFlink: "pip install pyflink". So, there are some topics that need
> > > >>> to be discussed as follows:
> > > >>>
> > > >>> 1. How to publish PyFlink to PyPI
> > > >>>
> > > >>> 1.1 Project Name
> > > >>>      We need to decide which project name to use on PyPI, for
> > > >>> example apache-flink, pyflink, etc.
> > > >>>
> > > >>>      Regarding the name "pyflink", it has already been registered by
> > > >>> @ueqt, and there is already a package '1.0' released under this
> > > >>> project which is based on flink-libraries/flink-python.
> > > >>>
> > > >>>      @ueqt has kindly agreed to give this project back to the
> > > >>> community. And he has requested that the released package '1.0' not
> > > >>> be removed, as it is already used in his company.
> > > >>>
> > > >>>      So we need to decide whether to use the name 'pyflink'. If yes,
> > > >>> we need to figure out how to handle the package '1.0' under this
> > > >>> project.
> > > >>>
> > > >>>      From my point of view, "pyflink" is the better name for our
> > > >>> project, and we can keep the 1.0 release, as more people may want to
> > > >>> use it.
> > > >>>
> > > >>> 1.2 PyPI account for release
> > > >>>      We also need to decide which account to use to publish packages
> > > >>> to PyPI.
> > > >>>
> > > >>>      There are two permission levels in PyPI: owner and maintainer:
> > > >>>
> > > >>>      1) The owner can upload releases, and delete files, releases or
> > > >>> the entire project.
> > > >>>      2) The maintainer can also upload releases. However, they
> > > >>> cannot delete files, releases, or the project.
> > > >>>
> > > >>>      So there are two options in my mind:
> > > >>>
> > > >>>      1) Create an account such as 'pyflink' as the owner, share it
> > > >>> with all the release managers, and then release managers publish the
> > > >>> package to PyPI using this account.
> > > >>>      2) Create an account such as 'pyflink' as the owner (only the
> > > >>> PMC can manage it) and add the release managers' accounts as
> > > >>> maintainers of the project. Release managers publish the package to
> > > >>> PyPI using their own accounts.
> > > >>>
> > > >>>      As far as I know, PySpark takes Option 1) and Apache Beam takes
> > > >>> Option 2).
> > > >>>
> > > >>>      From my point of view, I prefer option 2) as it is safer: it
> > > >>> eliminates the risk of accidentally deleting old releases and at the
> > > >>> same time keeps a trace of who is operating.
> > > >>>
> > > >>> 2. How to handle Scala_2.11 and Scala_2.12
> > > >>>
> > > >>> The PyFlink package bundles the jars in the package. As we know,
> > > >>> there are two versions of the jars for each module: one for Scala
> > > >>> 2.11 and the other for Scala 2.12. So theoretically there will be
> > > >>> two PyFlink packages. We need to decide which one to publish to
> > > >>> PyPI, or both. If both packages are published to PyPI, we may need
> > > >>> two projects, such as pyflink_211 and pyflink_212, and maybe more in
> > > >>> the future, such as pyflink_213.
> > > >>>
> > > >>>      (BTW, I think we should bring up a discussion about dropping
> > > >>> Scala 2.11 in the Flink 1.10 release, since Scala 2.13 became
> > > >>> available in early June.)
> > > >>>
> > > >>>      From my point of view, for now we can release only the
> > > >>> scala 2.11 version, since scala 2.11 is the default version in Flink.
> > > >>>
> > > >>> 3. Legal problems of publishing to PyPI
> > > >>>
> > > >>> As @Chesnay Schepler <ches...@apache.org> pointed out in
> > > >>> FLINK-13011[3], publishing PyFlink to PyPI means that we will
> > > >>> publish binaries to a distribution channel not owned by Apache. We
> > > >>> need to figure out whether there are legal problems. From my point
> > > >>> of view, there are none, as a few Apache projects such as Spark,
> > > >>> Beam, etc. have already done it. Frankly speaking, I am not familiar
> > > >>> with this problem; feedback is welcome from anybody more familiar
> > > >>> with it.
> > > >>>
> > > >>> Great thanks to @ueqt for being willing to dedicate the PyPI
> > > >>> project name `pyflink` to the Apache Flink community!!!
> > > >>> Great thanks to @Dian for the offline effort!!!
> > > >>>
> > > >>> Best,
> > > >>> Jincheng
> > > >>>
> > > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
> > > >>> [2] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
> > > >>> [3] https://issues.apache.org/jira/browse/FLINK-13011
> > > >>>
> > > >
> > >
> > >
> >
>


-- 
Best Regards

Jeff Zhang
