If we ship a binary, we should ship the binary we usually ship, not some highly customized version.

On 24/07/2019 05:19, Dian Fu wrote:
Hi Stephan & Jeff,

Thanks a lot for sharing your thoughts!

Regarding the bundled jars, currently only the jars in the flink binary 
distribution are packaged in the pyflink package. It may be a good idea to also 
bundle other jars such as flink-hadoop-compatibility. We may also need to 
consider whether to bundle the format jars such as flink-avro, flink-json and 
flink-csv, and the connector jars such as flink-connector-kafka, etc.
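
To make the bundling concrete, here is a rough sketch of how such extra jars could be declared in flink-python/setup.py; the package layout, version and jar locations below are illustrative assumptions, not the actual build setup:

    from setuptools import setup

    # Hypothetical sketch (not the real flink-python/setup.py): ship the jars
    # copied from the Flink binary distribution inside the Python package.
    setup(
        name="apache-flink",
        version="1.9.0",
        packages=["pyflink"],
        package_data={
            # jars from the binary distribution, plus optional extras such as
            # flink-hadoop-compatibility or the format/connector jars discussed above
            "pyflink": ["lib/*.jar", "opt/*.jar"],
        },
    )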

If FLINK_HOME is set, the binary distribution specified by FLINK_HOME will be 
used instead.
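
A minimal Python sketch of that resolution order (the fallback path inside the installed pyflink package is an assumption for illustration):

    import os
    import pyflink  # the pip-installed package

    # Prefer an explicitly set FLINK_HOME; otherwise fall back to the
    # distribution bundled inside the pyflink package itself.
    flink_home = os.environ.get("FLINK_HOME")
    if flink_home is None:
        flink_home = os.path.dirname(os.path.abspath(pyflink.__file__))
    print("Using Flink distribution at:", flink_home)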

Regards,
Dian

On Jul 24, 2019, at 9:47 AM, Jeff Zhang <zjf...@gmail.com> wrote:

+1 for publishing pyflink to pypi.

Regarding including jars, I just want to make sure which flink binary
distribution we would ship with pyflink, since we have multiple flink binary
distributions (with/without hadoop).
Personally, I prefer to use the hadoop-included binary distribution.

And I just want to confirm whether it is possible for users to use a
different flink binary distribution as long as they set the env variable FLINK_HOME.

Besides that, I hope that there will be bi-directional link references between
the flink docs and the pypi docs.



On Wed, Jul 24, 2019 at 12:07 AM, Stephan Ewen <se...@apache.org> wrote:

Hi!

Sorry for the late involvement. Here are some thoughts from my side:

Definitely +1 to publishing to PyPI, even if it is a binary release.
Community growth into other communities is great, and if this is the
natural way to reach developers in the Python community, let's do it. This
is not about our convenience, but reaching users.

I think the way to look at this is that this is a convenience distribution
channel, courtesy of the Flink community. It is not an Apache release; we
make this clear in the Readme.
Of course, this doesn't mean we don't try to uphold similar standards as
for our official release (like proper license information).

Concerning credentials sharing, I would be fine with whatever option. The
PMC doesn't own it (it is an initiative by some community members), but the
PMC needs to ensure trademark compliance, so slight preference for option
#1 (PMC would have means to correct problems).

I believe there is no need to differentiate between Scala versions, because
this is merely a convenience thing for pure Python users. Users that mix
python and scala (and thus depend on specific scala versions) can still
download from Apache or build themselves.

Best,
Stephan



On Thu, Jul 4, 2019 at 9:51 AM jincheng sun <sunjincheng...@gmail.com>
wrote:

Hi All,

Thanks for the feedback @Chesnay Schepler <ches...@apache.org> @Dian!

I think using `apache-flink` for the project name also makes sense to me,
since we should always keep in mind that Flink is owned by Apache. (Beam is
also using this pattern, `apache-beam`, for its Python API.)

Regarding releasing the Python API with the Java JARs, I think the main
consideration should be the convenience of the user. So, thanks for the
explanation @Dian!

And you're right @Chesnay Schepler <ches...@apache.org>, we can't make a
hasty decision and we need more people's opinions!

So, I would appreciate it if anyone can give us feedback and suggestions!

Best,
Jincheng




On Wed, Jul 3, 2019 at 8:46 PM, Chesnay Schepler <ches...@apache.org> wrote:

So this would not be a source release then, but a full-blown binary
release.

Maybe it is just me, but I find it a bit suspect to ship an entire java
application via PyPI, just because there's a Python API for it.

We definitely need input from more people here.

On 03/07/2019 14:09, Dian Fu wrote:
Hi Chesnay,

Thanks a lot for the suggestions.

Regarding “distributing java/scala code to PyPI”:
The Python Table API is just a wrapper of the Java Table API, and without the
java/scala code, two steps will be needed to set up an environment to execute
a Python Table API program:
1) Install pyflink using "pip install apache-flink"
2) Download the flink distribution and set FLINK_HOME to it.
Besides, users have to make sure that the manually installed Flink is
compatible with the pip-installed pyflink.
Bundling the java/scala code inside the Python package will eliminate step 2)
and make it simpler for users to install pyflink. There was a short
discussion <https://issues.apache.org/jira/browse/SPARK-1267> on this in the
Spark community and they finally decided to package the java/scala code in
the python package. (BTW, PySpark only bundles the jars for Scala 2.11.)
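
For reference, a rough Python sketch of the compatibility check users currently have to do by hand in step 2); the jar naming pattern and the example path are assumptions for illustration:

    import glob
    import os

    # Check which Flink distribution FLINK_HOME points to, so it can be
    # matched against the pip-installed pyflink version.
    flink_home = os.environ.get("FLINK_HOME", "/opt/flink")  # example path
    dist_jars = glob.glob(os.path.join(flink_home, "lib", "flink-dist_*.jar"))
    print("flink-dist jars found:", [os.path.basename(j) for j in dist_jars])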
Regards,
Dian

On Jul 3, 2019, at 7:13 PM, Chesnay Schepler <ches...@apache.org> wrote:

The existing artifact in the pyflink project was neither released by
the Flink project / anyone affiliated with it nor approved by the Flink PMC.
As such, if we were to use this account I believe we should delete it
to not mislead users into thinking that this is in any way an apache-provided
distribution. Since this goes against the user's wishes, I would be in favor
of creating a separate account, and giving back control over the pyflink
account.
My take on the raised points:
1.1) "apache-flink"
1.2) option 2
2) Given that we only distribute python code, there should be no reason
to differentiate between scala versions. We should not be distributing any
java/scala code and/or modules to PyPI. Currently, I'm a bit confused about
this question and wonder what exactly we are trying to publish here.
3) This should be treated as any other source release; i.e., it needs a
LICENSE and NOTICE file, signatures and a PMC vote. My suggestion would be
to make this part of our normal release process. There will be _one_ source
release on dist.apache.org encompassing everything, and a separate
python-focused source release that we push to PyPI. The LICENSE and NOTICE
contained in the python source release must also be present in the source
release of Flink; so basically the python source release is just the
contents of the flink-python module plus the maven pom.xml, with no other
special sauce added during the release process.
On 02/07/2019 05:42, jincheng sun wrote:
Hi all,

With the effort of FLIP-38 [1], the Python Table API (without UDF support
for now) will be supported in the coming release-1.9.
As described in "Build PyFlink"[2], if users want to use the Python Table
API, they can manually install it using the command:
"cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
This is non-trivial for users, and it would be better if we could follow the
Python way and publish PyFlink to PyPI, which is a repository of software for
the Python programming language. Then users can use the standard Python
package manager "pip" to install PyFlink: "pip install pyflink". So, there
are some topics that need to be discussed, as follows:

1. How to publish PyFlink to PyPI

1.1 Project Name
     We need to decide which PyPI project name to use, for example,
apache-flink, pyflink, etc.

     Regarding the name "pyflink", it has already been registered by @ueqt
and there is already a package '1.0' released under this project which is
based on flink-libraries/flink-python.

     @ueqt has kindly agreed to give this project back to the community. And
he has requested that the released package '1.0' should not be removed, as it
has already been used in their company.

     So we need to decide whether to use the name 'pyflink'. If yes, we need
to figure out how to handle the package '1.0' under this project.
     From my point of view, "pyflink" is the better project name for us, and
we can keep the 1.0 release, as maybe more people want to use it.
1.2 PyPI account for release
     We also need to decide which account to use to publish packages to PyPI.
     There are two permission levels in PyPI: owner and maintainer:

     1) The owner can upload releases, and delete files, releases or the
entire project.
     2) The maintainer can also upload releases. However, they cannot delete
files, releases, or the project.

     So there are two options in my mind:

     1) Create an account such as 'pyflink' as the owner, share it with all
the release managers, and then the release managers can publish the package
to PyPI using this account.
     2) Create an account such as 'pyflink' as the owner (only the PMC can
manage it) and add the release managers' accounts as maintainers of the
project. Release managers publish the package to PyPI using their own
accounts.
     As far as I know, PySpark takes Option 1) and Apache Beam takes Option 2).
     From my point of view, I prefer option 2) as it's safer: it eliminates
the risk of accidentally deleting old releases, and at the same time it keeps
a trace of who is operating.

2. How to handle Scala_2.11 and Scala_2.12

The PyFlink package bundles the jars in the package. As we know, there are
two versions of jars for each module: one for Scala 2.11 and the other for
Scala 2.12. So theoretically there will be two PyFlink packages. We need to
decide which one to publish to PyPI, or both. If both packages are to be
published to PyPI, we may need two projects, such as pyflink_211 and
pyflink_212 separately. Maybe more in the future, such as pyflink_213.
     (BTW, I think we should bring up a discussion about dropping Scala 2.11
in the Flink 1.10 release, since Scala 2.13 has been available since early June.)

     From my point of view, for now we can release only the Scala 2.11
version, since Scala 2.11 is our default version in Flink.

3. Legal problems of publishing to PyPI

As @Chesnay Schepler <ches...@apache.org> pointed out in FLINK-13011[3],
publishing PyFlink to PyPI means that we will publish binaries to a
distribution channel not owned by Apache. We need to figure out if there are
legal problems. From my point of view, there are no problems, as a few Apache
projects such as Spark, Beam, etc. have already done it. Frankly speaking, I
am not familiar with this problem, so any feedback is welcome if somebody is
more familiar with it.

Great thanks to @ueqt for being willing to dedicate the PyPI project name
`pyflink` to the Apache Flink community!!!
Great thanks to @Dian for the offline effort!!!

Best,
Jincheng

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
[2]

https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
[3] https://issues.apache.org/jira/browse/FLINK-13011



--
Best Regards

Jeff Zhang

