Thanks for your confirmation, Till! Publishing PyFlink to PyPI is very important for our users, so I have initiated a voting thread.
Best,
Jincheng

On Mon, Jul 29, 2019 at 3:01 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Sounds good to me. Thanks for driving this discussion.
>
> Cheers,
> Till
>
> On Mon, Jul 29, 2019 at 9:24 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
> > Yes Till, I think you are correct that we should make sure that the
> > published Flink Python API cannot be arbitrarily deleted.
> >
> > So, it seems that our current consensus is:
> >
> > 1. Should we republish PyFlink to PyPI --> YES
> > 2. PyPI project name ---> apache-flink
> > 3. How to handle Scala 2.11 and Scala 2.12 ---> We only release one
> > binary with the default Scala version, the same as Flink's default
> > config.
> > 4. PyPI account for release --> Create an account such as 'pyflink'
> > as owner (only the PMC can manage it) and add the release managers'
> > accounts as maintainers of the project. Release managers publish the
> > package to PyPI using their own accounts but cannot delete releases.
> >
> > So, if there are no other comments, I think we should initiate a
> > voting thread.
> >
> > What do you think?
> >
> > Best,
> > Jincheng
> >
> > On Wed, Jul 24, 2019 at 1:17 PM, Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > Sorry for chiming in so late. I would be in favor of option #2.
> > >
> > > I guess that the PMC would need to give the credentials to the
> > > release manager for option #1. Hence, the PMC could also add the
> > > release manager as a maintainer, which makes sure that only the PMC
> > > can delete artifacts.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jul 24, 2019 at 12:33 PM jincheng sun
> > > <sunjincheng...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Thanks for all of your replies!
> > > >
> > > > Hi Stephan, thanks for the reply and for pointing out the details
> > > > we need to pay attention to, such as the README and trademark
> > > > compliance. Regarding the PyPI account for release, #1 carries
> > > > the risk that our release packages could be deleted by anyone who
> > > > knows the password of the account, and in that case the PMC would
> > > > have no means to correct problems. So I think #2 is pretty safe
> > > > for the Flink community.
> > > >
> > > > Hi Jeff & Dian, thanks for sharing your thoughts. The Python API
> > > > is just a language entry point. I think the choice of which
> > > > binaries are contained in the release should be consistent with
> > > > the Java release policy. So, currently we do not add the Hadoop
> > > > or connector JARs to the release package.
> > > >
> > > > Hi Chesnay, agreed that we should ship the binary we usually
> > > > ship, as the Java side has already made that decision.
> > > >
> > > > So, our current consensus is:
> > > > 1. Should we republish PyFlink to PyPI --> YES
> > > > 2. PyPI project name ---> apache-flink
> > > > 3. How to handle Scala 2.11 and Scala 2.12 ---> We only release
> > > > one binary with the default Scala version, the same as Flink's
> > > > default config.
> > > >
> > > > We still need to discuss how to manage the PyPI account for
> > > > release:
> > > > --------
> > > > 1) Create an account such as 'pyflink' as the owner, share it
> > > > with all the release managers, and then release managers can
> > > > publish the package to PyPI using this account.
> > > > 2) Create an account such as 'pyflink' as owner (only the PMC can
> > > > manage it) and add the release managers' accounts as maintainers
> > > > of the project. Release managers publish the package to PyPI
> > > > using their own accounts.
> > > > --------
> > > > Stephan likes #1 but wants the PMC to be able to correct problems
> > > > (which sounds like #2). Can you confirm that, @Stephan?
> > > > Chesnay and I prefer #2.
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > > On Wed, Jul 24, 2019 at 3:57 PM, Chesnay Schepler
> > > > <ches...@apache.org> wrote:
> > > >
> > > > > If we ship a binary, we should ship the binary we usually ship,
> > > > > not some highly customized version.
> > > > >
> > > > > On 24/07/2019 05:19, Dian Fu wrote:
> > > > > > Hi Stephan & Jeff,
> > > > > >
> > > > > > Thanks a lot for sharing your thoughts!
> > > > > >
> > > > > > Regarding the bundled jars, currently only the jars in the
> > > > > > Flink binary distribution are packaged in the pyflink
> > > > > > package. It may be a good idea to also bundle other jars such
> > > > > > as flink-hadoop-compatibility. We may also need to consider
> > > > > > whether to bundle the format jars such as flink-avro,
> > > > > > flink-json and flink-csv, and the connector jars such as
> > > > > > flink-connector-kafka, etc.
> > > > > >
> > > > > > If FLINK_HOME is set, the binary distribution specified by
> > > > > > FLINK_HOME will be used instead.
> > > > > >
> > > > > > Regards,
> > > > > > Dian
> > > > > >
> > > > > >> On Jul 24, 2019, at 9:47 AM, Jeff Zhang <zjf...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >> +1 for publishing pyflink to PyPI.
> > > > > >>
> > > > > >> Regarding the included jars, I just want to make sure which
> > > > > >> Flink binary distribution we would ship with pyflink, since
> > > > > >> we have multiple Flink binary distributions (with/without
> > > > > >> Hadoop). Personally, I prefer to use the Hadoop-included
> > > > > >> binary distribution.
> > > > > >>
> > > > > >> And I just want to confirm whether it is possible for users
> > > > > >> to use a different Flink binary distribution as long as they
> > > > > >> set the env variable FLINK_HOME.
> > > > > >>
> > > > > >> Besides that, I hope that there will be bi-directional link
> > > > > >> references between the Flink docs and the PyPI docs.
> > > > > >>
> > > > > >> On Wed, Jul 24, 2019 at 12:07 AM, Stephan Ewen
> > > > > >> <se...@apache.org> wrote:
> > > > > >>
> > > > > >>> Hi!
> > > > > >>>
> > > > > >>> Sorry for the late involvement. Here are some thoughts from
> > > > > >>> my side:
> > > > > >>>
> > > > > >>> Definitely +1 to publishing to PyPI, even if it is a binary
> > > > > >>> release. Community growth into other communities is great,
> > > > > >>> and if this is the natural way to reach developers in the
> > > > > >>> Python community, let's do it. This is not about our
> > > > > >>> convenience, but about reaching users.
> > > > > >>>
> > > > > >>> I think the way to look at this is that this is a
> > > > > >>> convenience distribution channel, courtesy of the Flink
> > > > > >>> community. It is not an Apache release; we make this clear
> > > > > >>> in the README. Of course, this doesn't mean we don't try to
> > > > > >>> uphold similar standards as for our official releases (like
> > > > > >>> proper license information).
> > > > > >>>
> > > > > >>> Concerning credentials sharing, I would be fine with either
> > > > > >>> option. The PMC doesn't own it (it is an initiative by some
> > > > > >>> community members), but the PMC needs to ensure trademark
> > > > > >>> compliance, so slight preference for option #1 (the PMC
> > > > > >>> would have means to correct problems).
> > > > > >>>
> > > > > >>> I believe there is no need to differentiate between Scala
> > > > > >>> versions, because this is merely a convenience thing for
> > > > > >>> pure Python users. Users that mix Python and Scala (and
> > > > > >>> thus depend on specific Scala versions) can still download
> > > > > >>> from Apache or build themselves.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Stephan
> > > > > >>>
> > > > > >>> On Thu, Jul 4, 2019 at 9:51 AM jincheng sun
> > > > > >>> <sunjincheng...@gmail.com> wrote:
> > > > > >>>
> > > > > >>>> Hi All,
> > > > > >>>>
> > > > > >>>> Thanks for the feedback @Chesnay Schepler
> > > > > >>>> <ches...@apache.org> @Dian!
> > > > > >>>>
> > > > > >>>> Using `apache-flink` for the project name also makes sense
> > > > > >>>> to me, since we should always keep in mind that Flink is
> > > > > >>>> owned by Apache. (Beam also uses this pattern,
> > > > > >>>> `apache-beam`, for its Python API.)
> > > > > >>>>
> > > > > >>>> Regarding releasing the Python API together with the Java
> > > > > >>>> JARs, I think the guiding principle should be the
> > > > > >>>> convenience of the user. So, thanks for the explanation
> > > > > >>>> @Dian!
> > > > > >>>>
> > > > > >>>> And you're right @Chesnay Schepler <ches...@apache.org>,
> > > > > >>>> we can't make a hasty decision and we need more people's
> > > > > >>>> opinions!
> > > > > >>>>
> > > > > >>>> So, I'd appreciate it if anyone can give us feedback and
> > > > > >>>> suggestions!
> > > > > >>>>
> > > > > >>>> Best,
> > > > > >>>> Jincheng
> > > > > >>>>
> > > > > >>>> On Wed, Jul 3, 2019 at 8:46 PM, Chesnay Schepler
> > > > > >>>> <ches...@apache.org> wrote:
> > > > > >>>>
> > > > > >>>>> So this would not be a source release then, but a
> > > > > >>>>> full-blown binary release.
> > > > > >>>>>
> > > > > >>>>> Maybe it is just me, but I find it a bit suspect to ship
> > > > > >>>>> an entire Java application via PyPI, just because there's
> > > > > >>>>> a Python API for it.
> > > > > >>>>>
> > > > > >>>>> We definitely need input from more people here.
> > > > > >>>>>
> > > > > >>>>> On 03/07/2019 14:09, Dian Fu wrote:
> > > > > >>>>>> Hi Chesnay,
> > > > > >>>>>>
> > > > > >>>>>> Thanks a lot for the suggestions.
> > > > > >>>>>>
> > > > > >>>>>> Regarding "distributing java/scala code to PyPI":
> > > > > >>>>>> The Python Table API is just a wrapper of the Java Table
> > > > > >>>>>> API, and without the java/scala code, two steps are
> > > > > >>>>>> needed to set up an environment to execute a Python
> > > > > >>>>>> Table API program:
> > > > > >>>>>> 1) Install pyflink using "pip install apache-flink"
> > > > > >>>>>> 2) Download the Flink distribution and set FLINK_HOME to
> > > > > >>>>>> it.
> > > > > >>>>>> Besides, users have to make sure that the manually
> > > > > >>>>>> installed Flink is compatible with the pip-installed
> > > > > >>>>>> pyflink.
> > > > > >>>>>>
> > > > > >>>>>> Bundling the java/scala code inside the Python package
> > > > > >>>>>> will eliminate step 2) and make it simpler for users to
> > > > > >>>>>> install pyflink. There was a short discussion
> > > > > >>>>>> <https://issues.apache.org/jira/browse/SPARK-1267> on
> > > > > >>>>>> this in the Spark community and they finally decided to
> > > > > >>>>>> package the java/scala code in the Python package.
> > > > > >>>>>> (BTW, PySpark only bundles the jars for Scala 2.11.)
> > > > > >>>>>>
> > > > > >>>>>> Regards,
> > > > > >>>>>> Dian
> > > > > >>>>>>
> > > > > >>>>>>> On Jul 3, 2019, at 7:13 PM, Chesnay Schepler
> > > > > >>>>>>> <ches...@apache.org> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>> The existing artifact in the pyflink project was
> > > > > >>>>>>> neither released by the Flink project / anyone
> > > > > >>>>>>> affiliated with it, nor approved by the Flink PMC.
> > > > > >>>>>>>
> > > > > >>>>>>> As such, if we were to use this account, I believe we
> > > > > >>>>>>> should delete it so as not to mislead users into
> > > > > >>>>>>> thinking that this is in any way an Apache-provided
> > > > > >>>>>>> distribution. Since this goes against the user's
> > > > > >>>>>>> wishes, I would be in favor of creating a separate
> > > > > >>>>>>> account and giving back control over the pyflink
> > > > > >>>>>>> account.
> > > > > >>>>>>>
> > > > > >>>>>>> My take on the raised points:
> > > > > >>>>>>> 1.1) "apache-flink"
> > > > > >>>>>>> 1.2) option 2
> > > > > >>>>>>> 2) Given that we only distribute Python code, there
> > > > > >>>>>>> should be no reason to differentiate between Scala
> > > > > >>>>>>> versions. We should not be distributing any java/scala
> > > > > >>>>>>> code and/or modules to PyPI. Currently, I'm a bit
> > > > > >>>>>>> confused about this question and wonder what exactly we
> > > > > >>>>>>> are trying to publish here.
> > > > > >>>>>>> 3) This should be treated as any other source release;
> > > > > >>>>>>> i.e., it needs a LICENSE and NOTICE file, signatures
> > > > > >>>>>>> and a PMC vote. My suggestion would be to make this
> > > > > >>>>>>> part of our normal release process. There will be _one_
> > > > > >>>>>>> source release on dist.apache.org encompassing
> > > > > >>>>>>> everything, and a separate Python-focused source
> > > > > >>>>>>> release that we push to PyPI. The LICENSE and NOTICE
> > > > > >>>>>>> contained in the Python source release must also be
> > > > > >>>>>>> present in the source release of Flink; so basically
> > > > > >>>>>>> the Python source release is just the contents of the
> > > > > >>>>>>> flink-python module and the Maven pom.xml, with no
> > > > > >>>>>>> other special sauce added during the release process.
> > > > > >>>>>>>
> > > > > >>>>>>> On 02/07/2019 05:42, jincheng sun wrote:
> > > > > >>>>>>>> Hi all,
> > > > > >>>>>>>>
> > > > > >>>>>>>> With the effort of FLIP-38 [1], the Python Table API
> > > > > >>>>>>>> (without UDF support for now) will be supported in the
> > > > > >>>>>>>> coming release 1.9.
> > > > > >>>>>>>> As described in "Build PyFlink" [2], if users want to
> > > > > >>>>>>>> use the Python Table API, they can manually install it
> > > > > >>>>>>>> using the command:
> > > > > >>>>>>>> "cd flink-python && python3 setup.py sdist && pip
> > > > > >>>>>>>> install dist/*.tar.gz".
> > > > > >>>>>>>> This is non-trivial for users, and it would be better
> > > > > >>>>>>>> if we could follow the Python way and publish PyFlink
> > > > > >>>>>>>> to PyPI, which is a repository of software for the
> > > > > >>>>>>>> Python programming language. Then users can use the
> > > > > >>>>>>>> standard Python package manager "pip" to install
> > > > > >>>>>>>> PyFlink: "pip install pyflink".
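(As an aside on the two install paths discussed above -- bundled jars
versus a manually installed Flink pointed to by FLINK_HOME -- here is a
minimal sketch of how a pip-installed package might resolve the
distribution at runtime. The helper name and the in-package "deps"
layout are illustrative assumptions, not the actual pyflink code.)

    import os

    def _find_flink_home():
        # A manually installed Flink takes precedence: honor FLINK_HOME
        # if the user has set it, as Dian describes above.
        flink_home = os.environ.get("FLINK_HOME")
        if flink_home is not None:
            return flink_home
        # Otherwise fall back to the jars bundled inside the
        # pip-installed package (hypothetical "deps" layout).
        bundled = os.path.join(
            os.path.dirname(os.path.abspath(__file__)), "deps")
        if os.path.isdir(os.path.join(bundled, "lib")):
            return bundled
        raise RuntimeError(
            "Could not find a Flink distribution: set FLINK_HOME or "
            "reinstall the apache-flink package.")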
> > > > > >>>>>>>> So, there are some topics that need to be discussed,
> > > > > >>>>>>>> as follows:
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1. How to publish PyFlink to PyPI
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1.1 Project name
> > > > > >>>>>>>> We need to decide which PyPI project name to use, for
> > > > > >>>>>>>> example apache-flink, pyflink, etc.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Regarding the name "pyflink", it has already been
> > > > > >>>>>>>> registered by @ueqt, and there is already a package
> > > > > >>>>>>>> '1.0' released under this project which is based on
> > > > > >>>>>>>> flink-libraries/flink-python.
> > > > > >>>>>>>>
> > > > > >>>>>>>> @ueqt has kindly agreed to give this project back to
> > > > > >>>>>>>> the community, and he has requested that the released
> > > > > >>>>>>>> package '1.0' not be removed, as it is already used in
> > > > > >>>>>>>> his company.
> > > > > >>>>>>>>
> > > > > >>>>>>>> So we need to decide whether to use the name
> > > > > >>>>>>>> 'pyflink'. If yes, we need to figure out how to deal
> > > > > >>>>>>>> with the package '1.0' under this project. From my
> > > > > >>>>>>>> point of view, "pyflink" is the better project name
> > > > > >>>>>>>> for us, and we can keep the 1.0 release; maybe more
> > > > > >>>>>>>> people will want to use it.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1.2 PyPI account for release
> > > > > >>>>>>>> We also need to decide which account to use to publish
> > > > > >>>>>>>> packages to PyPI. There are two roles in PyPI: owner
> > > > > >>>>>>>> and maintainer:
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1) The owner can upload releases, and delete files,
> > > > > >>>>>>>> releases or the entire project.
> > > > > >>>>>>>> 2) The maintainer can also upload releases. However,
> > > > > >>>>>>>> they cannot delete files, releases, or the project.
> > > > > >>>>>>>>
> > > > > >>>>>>>> So there are two options in my mind:
> > > > > >>>>>>>>
> > > > > >>>>>>>> 1) Create an account such as 'pyflink' as the owner,
> > > > > >>>>>>>> share it with all the release managers, and then
> > > > > >>>>>>>> release managers can publish the package to PyPI using
> > > > > >>>>>>>> this account.
> > > > > >>>>>>>> 2) Create an account such as 'pyflink' as owner (only
> > > > > >>>>>>>> the PMC can manage it) and add the release managers'
> > > > > >>>>>>>> accounts as maintainers of the project. Release
> > > > > >>>>>>>> managers publish the package to PyPI using their own
> > > > > >>>>>>>> accounts.
> > > > > >>>>>>>>
> > > > > >>>>>>>> As far as I know, PySpark takes option 1) and Apache
> > > > > >>>>>>>> Beam takes option 2). From my point of view, I prefer
> > > > > >>>>>>>> option 2) as it is safer: it eliminates the risk of
> > > > > >>>>>>>> accidentally deleting old releases and at the same
> > > > > >>>>>>>> time keeps a trace of who is operating.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 2. How to handle Scala 2.11 and Scala 2.12
> > > > > >>>>>>>>
> > > > > >>>>>>>> The PyFlink package bundles the jars in the package.
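(For concreteness, a rough sketch of what such jar bundling could look
like in flink-python's setup.py. The file patterns and the "deps"
layout below are illustrative assumptions, not the module's actual
build script.)

    from setuptools import setup

    # Illustrative only: ship the Flink jars as package data so that
    # "pip install apache-flink" brings the Java runtime along. This
    # assumes the jars were copied under pyflink/deps/ before building
    # the sdist.
    setup(
        name="apache-flink",
        version="1.9.0",
        packages=["pyflink"],
        include_package_data=True,
        package_data={"pyflink": ["deps/lib/*.jar", "deps/opt/*.jar"]},
    )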
> > > > > >>>>>>>> As we know, there are two versions of the jars for
> > > > > >>>>>>>> each module: one for Scala 2.11 and the other for
> > > > > >>>>>>>> Scala 2.12, so theoretically there will be two PyFlink
> > > > > >>>>>>>> packages. We need to decide whether to publish one of
> > > > > >>>>>>>> them to PyPI, or both. If both packages are published
> > > > > >>>>>>>> to PyPI, we may need two separate projects, such as
> > > > > >>>>>>>> pyflink_211 and pyflink_212, and maybe more in the
> > > > > >>>>>>>> future, such as pyflink_213.
> > > > > >>>>>>>> (BTW, I think we should bring up a discussion about
> > > > > >>>>>>>> dropping Scala 2.11 in the Flink 1.10 release, given
> > > > > >>>>>>>> that 2.13 has been available since early June.)
> > > > > >>>>>>>>
> > > > > >>>>>>>> From my point of view, for now we should only release
> > > > > >>>>>>>> the Scala 2.11 version, since Scala 2.11 is our
> > > > > >>>>>>>> default version in Flink.
> > > > > >>>>>>>>
> > > > > >>>>>>>> 3. Legal questions about publishing to PyPI
> > > > > >>>>>>>>
> > > > > >>>>>>>> As @Chesnay Schepler <ches...@apache.org> pointed out
> > > > > >>>>>>>> in FLINK-13011 [3], publishing PyFlink to PyPI means
> > > > > >>>>>>>> that we will publish binaries to a distribution
> > > > > >>>>>>>> channel not owned by Apache. We need to figure out
> > > > > >>>>>>>> whether there are legal problems. From my point of
> > > > > >>>>>>>> view, there are none, as a few Apache projects such as
> > > > > >>>>>>>> Spark, Beam, etc. have already done this. Frankly
> > > > > >>>>>>>> speaking, I am not familiar with this area, so
> > > > > >>>>>>>> feedback is welcome from anybody who is more familiar
> > > > > >>>>>>>> with it.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Great thanks to @ueqt for being willing to donate the
> > > > > >>>>>>>> PyPI project name `pyflink` to the Apache Flink
> > > > > >>>>>>>> community!!!
> > > > > >>>>>>>> Great thanks to @Dian for the offline effort!!!
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Jincheng
> > > > > >>>>>>>>
> > > > > >>>>>>>> [1]
> > > > > >>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API
> > > > > >>>>>>>> [2]
> > > > > >>>>>>>> https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#build-pyflink
> > > > > >>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-13011
> > > > > >>
> > > > > >> --
> > > > > >> Best Regards
> > > > > >>
> > > > > >> Jeff Zhang
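(Once published, a pip-installed package could be sanity-checked with a
few lines of Python. The snippet below assumes the 1.9-era Python Table
API module layout described in the FLIP-38 docs; names may differ in
the package that is eventually released.)

    # Assumes the Flink 1.9-era Python Table API module layout.
    from pyflink.dataset import ExecutionEnvironment
    from pyflink.table import BatchTableEnvironment, TableConfig

    # Creating a TableEnvironment starts the Py4J gateway to the JVM,
    # so this fails fast if the bundled jars (or the distribution under
    # FLINK_HOME) cannot be located.
    exec_env = ExecutionEnvironment.get_execution_environment()
    t_env = BatchTableEnvironment.create(exec_env, TableConfig())
    print("PyFlink is installed and can reach the JVM.")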