+1

We have recently been struggling to build a Pulsar image in-house (lots of AppSec constraints, etc.). A much smaller, minimal image would certainly help there.
Are there any estimates of how much smaller the base Pulsar image would be once the Python-related content is removed? Is there scope to slim down the base Pulsar image further by removing anything that isn't essential for running a broker (or a bookie or ZooKeeper)?

Regards

On Thu, Mar 7, 2024 at 11:19 PM Neng Lu <freen...@gmail.com> wrote:

> +1
>
> This can reduce the image size significantly, thus improving efficiency
> and reducing cost.
>
> On Tue, Mar 5, 2024 at 11:25 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > +1
> >
> > Great idea
> >
> > Enrico
> >
> > On Wed, Mar 6, 2024, 08:23 Zixuan Liu <node...@gmail.com> wrote:
> >
> > > +1
> > >
> > > This is a good idea, but we must then provide documentation on how
> > > users can build their own connector images and Python functions
> > > runtime image.
> > >
> > > Thanks,
> > > Zixuan
> > >
> > > On Wed, Mar 6, 2024 at 07:04, Matteo Merli <matteo.me...@gmail.com> wrote:
> > >
> > > > The Docker image `pulsar-all` is a convenience image built on top of
> > > > the base `pulsar` image. It includes all the Pulsar IO connectors as
> > > > well as the tiered storage offloaders.
> > > >
> > > > The Dockerfile for `pulsar-all` can be found here:
> > > >
> > > > https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> > > >
> > > > The resulting image is very big:
> > > >
> > > > ```
> > > > apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > > > ```
> > > >
> > > > This poses a challenge in many ways:
> > > > 1. Our CI pipeline needs to build these images and cache them across
> > > >    different stages of the pipeline.
> > > > 2. It takes a lot of time for release managers to build and push these
> > > >    images to Docker Hub.
> > > > 3. Users running this image in production see very long download
> > > >    times, which can affect the availability of the system (e.g., a
> > > >    higher chance of a second broker crashing if a restart takes a very
> > > >    long time).
> > > > 4. It's very unlikely that a single user will require all the
> > > >    connectors. Most likely, they would use just 2-3 of them.
> > > >
> > > > The problem is that `pulsar-all` was introduced at a time when there
> > > > were ~3 Pulsar IO connectors. Right now we have 35 connectors, with a
> > > > total size of 1.9 GB.
> > > >
> > > > The proposal here is to drop this image altogether. Users will be able
> > > > to construct their own targeted images in a very simple way:
> > > >
> > > > ```
> > > > # Start from the base image and add only the connectors you need.
> > > > # Pin the base image to the same Pulsar version as the connector NAR.
> > > > FROM apachepulsar/pulsar:3.2.0
> > > > RUN mkdir -p connectors && \
> > > >     cd connectors && \
> > > >     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > > > ```
> > > >
> > > > ### Pulsar Functions Python Runtime
> > > >
> > > > To support the Python functions runtime, we have been including quite
> > > > a few dependencies in the Pulsar base image. They range from the
> > > > `pulsar-client` Python SDK to gRPC, which is quite a heavy package
> > > > with many transitive dependencies.
> > > >
> > > > Given that the vast majority of users would be using the `pulsar` base
> > > > image to run brokers and not Python functions, it would make sense to
> > > > split the Python support into a separate image, such as
> > > > `pulsar-functions-python`. That image would extend the base image and
> > > > add all the needed Python dependencies.
> > > >
> > > > This way it will be very easy for users to select the appropriate
> > > > image, and we wouldn't be shipping a large amount of Python
> > > > dependencies to users who don't need them.
> > > >
> > > > What are people's opinions on this?
> > > >
> > > > Matteo
> > > >
> > > > --
> > > > Matteo Merli
> > > > <matteo.me...@gmail.com>
>
> --
> Best Regards,
> Neng

--
Girish Sharma