Hi Chesnay,

Yes, in most cases we can indeed download the required jars in `setup.py`,
which is also the solution I originally considered for reducing the size of
the wheel packages. However, I'm afraid it will not work in scenarios where
the external network is not accessible, which is very common in production
clusters.

Best,
Xingbo

Chesnay Schepler <ches...@apache.org> wrote on Tue, Mar 16, 2021 at 8:32 PM:

> This proposed apache-flink-libraries package would just contain the
> binary, right? And it would effectively be unusable to the Python audience
> on its own.
>
> Essentially we are just abusing PyPI for shipping a Java binary. Is
> there no way for us to download the jars when the Python package is
> being installed? (e.g., in setup.py)
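Chesnay's suggestion could look roughly like the following: a minimal sketch of helpers that a `setup.py` might call to fetch jars at install time. The Maven Central base URL, the `flink-dist_2.11` artifact name, and the helper names `jar_url` and `download_jar` are assumptions for illustration, and wiring them into a setuptools install command is not shown.

```python
import os
import urllib.error
import urllib.request

# Assumed repository base URL; a real setup.py might make this configurable.
MAVEN_BASE = "https://repo.maven.apache.org/maven2"


def jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build a Maven-layout URL: <base>/<group path>/<artifact>/<version>/<artifact>-<version>.jar."""
    group_path = group_id.replace(".", "/")
    return f"{MAVEN_BASE}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


def download_jar(url: str, target_dir: str) -> str:
    """Download one jar into target_dir and return its local path."""
    os.makedirs(target_dir, exist_ok=True)
    dest = os.path.join(target_dir, url.rsplit("/", 1)[-1])
    try:
        urllib.request.urlretrieve(url, dest)
    except urllib.error.URLError as exc:
        # Without external network access (common in production clusters),
        # installation fails here.
        raise RuntimeError(f"could not download {url}: {exc}") from exc
    return dest


if __name__ == "__main__":
    # Hypothetical artifact; only the URL is printed, nothing is downloaded.
    print(jar_url("org.apache.flink", "flink-dist_2.11", "1.12.2"))
```

Building the URL needs no network; `download_jar` is the step that cannot succeed on an isolated cluster.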
>
> On 3/16/2021 1:23 PM, Dian Fu wrote:
> > Yes, the size of each PyFlink .whl file will also be about 3 MB if we
> > split the package. Currently the package is large because we bundle the
> > jar files in it.
> >
> >> On Mar 16, 2021, at 8:13 PM, Chesnay Schepler <ches...@apache.org> wrote:
> >>
> >> The key difference being that the Beam .whl files are 3 MB, i.e. 60x
> >> smaller.
> >>
> >> On 3/16/2021 1:06 PM, Dian Fu wrote:
> >>> Hi Chesnay,
> >>>
> >>> We publish separate binary packages for:
> >>> 1) Python 3.5 / 3.6 / 3.7 / 3.8 (3.8 since 1.12)
> >>> 2) Linux / Mac
> >>>
> >>> Besides, there is also a source package, which is used when none of the
> >>> above binary packages is usable, e.g. for Windows users.
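The release matrix described above can be counted with a short Python sketch. The per-release Python version lists are inferred from this thread (3.8 added in 1.12), and the artifact labels are illustrative strings, not real wheel file names.

```python
from itertools import product

# Python versions per Flink release, inferred from this thread.
PYTHON_VERSIONS = {
    "1.11": ["3.5", "3.6", "3.7"],
    "1.12": ["3.5", "3.6", "3.7", "3.8"],  # Python 3.8 support added in 1.12
}
PLATFORMS = ["linux", "mac"]


def release_artifacts(flink_version):
    """One binary wheel per (Python version, platform) pair, plus one source package."""
    wheels = [
        f"wheel-py{py}-{plat}"  # illustrative label, not a real wheel file name
        for py, plat in product(PYTHON_VERSIONS[flink_version], PLATFORMS)
    ]
    return wheels + ["sdist"]


for version in ("1.11", "1.12"):
    print(version, len(release_artifacts(version)))
# 1.11 7
# 1.12 9
```

This is how the "7 packages" (1.11) and "9 packages" (1.12) figures in the thread line up.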
> >>>
> >>> PS: Publishing multiple binary packages is very common in the Python
> >>> world, e.g. Beam published 22 packages in 2.28 [1] and Pandas published
> >>> 16 packages in 1.2.3 [2]. We could also publish more packages if we
> >>> split the package, as the cost of adding another package would be very
> >>> small.
> >>>
> >>> Regards,
> >>> Dian
> >>>
> >>> [1] https://pypi.org/project/apache-beam/#files
> >>> [2] https://pypi.org/project/pandas/#files
> >>>
> >>>
> >>> Hi Xintong,
> >>>
> >>> Yes, you are right that there are 9 packages in 1.12, as we added Python
> >>> 3.8 support in 1.12.
> >>>
> >>> Regards,
> >>> Dian
> >>>
> >>>> On Mar 16, 2021, at 7:45 PM, Xintong Song <tonysong...@gmail.com> wrote:
> >>>>
> >>>> And they are not only uploaded to PyPI, but to the ASF mirrors as well.
> >>>>
> >>>> https://dist.apache.org/repos/dist/release/flink/flink-1.12.2/python/
> >>>>
> >>>> Thank you~
> >>>>
> >>>> Xintong Song
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Mar 16, 2021 at 7:41 PM Xintong Song <tonysong...@gmail.com> wrote:
> >>>>
> >>>>> Actually, I think it's 9 packages, not 7.
> >>>>>
> >>>>> Check here for the 1.12.2 packages.
> >>>>> https://pypi.org/project/apache-flink/#files
> >>>>>
> >>>>> Thank you~
> >>>>>
> >>>>> Xintong Song
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Mar 16, 2021 at 7:08 PM Chesnay Schepler <ches...@apache.org> wrote:
> >>>>>
> >>>>>> Am I reading this correctly that we publish 7 different artifacts just
> >>>>>> for Python?
> >>>>>> What does the release matrix look like?
> >>>>>>
> >>>>>> On 3/16/2021 3:45 AM, Dian Fu wrote:
> >>>>>>> Hi Xingbo,
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks a lot for bringing up this discussion. The size limit already
> >>>>>>> became an issue when releasing 1.11.3 and 1.12.1: it blocked us from
> >>>>>>> publishing the PyFlink packages to PyPI during the release, as there
> >>>>>>> was not enough space left (PS: the packages were published after the
> >>>>>>> size limit was increased).
> >>>>>>> Considering that the total package size is about 1.5 GB (220 MB * 7)
> >>>>>>> for each release, it makes sense to split the PyFlink package. This
> >>>>>>> could reduce the total package size to about 250 MB (3 MB * 7 + 220 MB)
> >>>>>>> per release. We would then not need to increase the size limit again
> >>>>>>> for the next few years, as we currently still have about 7.5 GB of
> >>>>>>> space left.
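Dian's arithmetic can be reproduced as a quick check, using the approximate sizes quoted in the thread. The headroom figures assume 7.5 GB = 7500 MB and that every future release has the same footprint; both are simplifying assumptions.

```python
# Approximate per-release PyPI footprint, using the sizes quoted in this thread.
BINARY_WHEELS = 7      # binary packages per release (1.11)
FAT_WHEEL_MB = 220     # current wheel size, jars bundled
SLIM_WHEEL_MB = 3      # expected wheel size without jars
LIBRARIES_MB = 220     # one jar-only package per release
REMAINING_MB = 7500    # "about 7.5 GB" left, assuming 1 GB = 1000 MB

before_mb = BINARY_WHEELS * FAT_WHEEL_MB                  # every wheel carries the jars
after_mb = BINARY_WHEELS * SLIM_WHEEL_MB + LIBRARIES_MB   # slim wheels + one jar package

# How many more releases fit into the remaining quota under each layout.
releases_before = REMAINING_MB // before_mb
releases_after = REMAINING_MB // after_mb

print(before_mb, after_mb)              # 1540 241
print(releases_before, releases_after)  # 4 31
```

Under these assumptions the remaining quota covers roughly 4 more releases with the current layout versus roughly 31 after the split.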
> >>>>>>> So +1 from my side.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Dian
> >>>>>>>
> >>>>>>>> On Mar 12, 2021, at 2:30 PM, Xingbo Huang <hxbks...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>>
> >>>>>>>> Since release-1.11, PyFlink has introduced Cython support, and we
> >>>>>>>> release 7 packages (for different platforms and Python versions) to
> >>>>>>>> PyPI for each release. Each package is more than 200 MB, as we need
> >>>>>>>> to bundle the jar files into it. As a result, the total project space
> >>>>>>>> used on PyPI grows very fast, and we frequently need to apply to PyPI
> >>>>>>>> for more project space. Please refer to
> >>>>>>>> [https://github.com/pypa/pypi-support/issues/831] for more details.
> >>>>>>>>
> >>>>>>>> The root cause of this problem is that we bundle the jar files in
> >>>>>>>> each package. This is unnecessary if we extract the jar files into a
> >>>>>>>> separate package dedicated to holding them.
> >>>>>>>>
> >>>>>>>> I'd like to propose splitting the PyFlink package into two packages:
> >>>>>>>> the original apache-flink and apache-flink-libraries (any suggestions
> >>>>>>>> for the name?). The apache-flink-libraries package only contains the
> >>>>>>>> jar files, and there is only one apache-flink-libraries package per
> >>>>>>>> release. The apache-flink package depends on apache-flink-libraries,
> >>>>>>>> so users still only need to install apache-flink, and nothing changes
> >>>>>>>> for them compared to before. We still need to release multiple wheel
> >>>>>>>> packages of apache-flink. However, they will be very small, as they
> >>>>>>>> no longer contain the jar files.
> >>>>>>>>
> >>>>>>>> Looking forward to your feedback.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Xingbo
> >
>
>
