Hi Alex,

The situation for UBI9 doesn't look much different from Ubuntu:

registry.access.redhat.com/ubi9/ubi (redhat 9.3)
Total: 166 (UNKNOWN: 0, LOW: 138, MEDIUM: 28, HIGH: 0, CRITICAL: 0)

Full list: https://gist.github.com/merlimat/ba96b91ea49709bb218ddc3906bb9e95
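
For anyone who wants to compare the scan summaries side by side, here is a
minimal sketch that parses the three "Total: ..." lines quoted in this thread.
The function names and dictionary labels are only illustrative, and it assumes
the summary line format shown above; it is not tied to any particular scanner's
API.

```python
import re

# Summary lines copied verbatim from the scan results quoted in this thread.
# The dictionary keys are just labels chosen for this sketch.
SUMMARIES = {
    "ubi9/ubi (redhat 9.3)":
        "Total: 166 (UNKNOWN: 0, LOW: 138, MEDIUM: 28, HIGH: 0, CRITICAL: 0)",
    "pulsar:3.2.0 (ubuntu 22.04)":
        "Total: 243 (UNKNOWN: 0, LOW: 146, MEDIUM: 93, HIGH: 4, CRITICAL: 0)",
    "pulsar:3.3.0-SNAPSHOT (alpine 3.19.1)":
        "Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)",
}


def parse_summary(line):
    """Return (total, {severity: count}) for one 'Total: N (...)' line."""
    match = re.fullmatch(r"Total: (\d+) \((.*)\)", line.strip())
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    total = int(match.group(1))
    counts = {}
    for part in match.group(2).split(","):
        severity, count = part.split(":")
        counts[severity.strip()] = int(count)
    return total, counts


def report(summaries):
    """Return a list of (image, total, high_plus_critical) tuples."""
    rows = []
    for image, line in summaries.items():
        total, counts = parse_summary(line)
        # Sanity check: the per-severity counts should add up to the total.
        if sum(counts.values()) != total:
            raise ValueError(f"counts do not add up for {image}")
        rows.append((image, total, counts["HIGH"] + counts["CRITICAL"]))
    return rows


if __name__ == "__main__":
    for image, total, severe in report(SUMMARIES):
        print(f"{image:40s} total={total:4d} high+critical={severe}")
```

The per-severity check is there only to catch copy/paste mistakes when adding
more summary lines.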


--
Matteo Merli
<matteo.me...@gmail.com>


On Thu, Feb 15, 2024 at 9:10 AM Alexander Hall
<ah...@teknoluxion.com.invalid> wrote:

> Reviving a previous tangent from this discussion. Using UBI9 as a base is
> also a great option. Some end-users use that as a base and copy the files
> from the pulsar and pulsar-all containers as an upstream source.
>
> -Alex H
>
> -----Original Message-----
> From: Matteo Merli <matteo.me...@gmail.com>
> Sent: Wednesday, February 14, 2024 2:01 PM
> To: david.chris...@discordapp.com.invalid
> Cc: dev@pulsar.apache.org
> Subject: Re: Re: [DISCUSS] PIP-324: Alpine Docker images
>
> Reviving the discussion thread.
>
>
> > For Netty, I think netty-transport-native-epoll is only built against
> > glibc (
> > https://netty.io/wiki/native-transports.html#using-the-linux-native-transport
> > ).
> > Is there a workaround?
>
> Yes, there is a workaround for Netty. It works perfectly fine by including
> the glibc compatibility library. Same for the Kinesis producer (side note:
> the Kinesis SDK is the worst train wreck I've seen in many, many years: it's a
> C++ binary that is spawned from Java and communicates through a pipe...
> anyway, it works fine with the glibc compatibility lib).
>
> > Other than that, there is the DNS caching issue Lari mentioned.
>
> I think the DNS issue was already solved a few releases ago. In any case,
> it wouldn't affect Pulsar/BK since we use the Netty DNS client. In the same
> way, I believe that the JDK also doesn't use the glibc-provided DNS client:
> that's why we configure the DNS cache directly in the JVM configuration.
>
> >> - Using a smaller base image like Alpine can save space. The relative
> >> size of the JRE image for Alpine is about 45% smaller than the equivalent
> >> Ubuntu slim image.
> >> - The Ubuntu image has a few tens of CVEs in it, as reported by an
> >> automated container CVE scan tool, compared to 0 in Alpine.
> > These seem reasonable, but the true magnitude of benefit is likely
> > lower in practice. The pulsar-all images are 2.7GB in size, so saving
> > 166MB on the base + JRE install translates to just a 6% smaller image.
> > Unless we expect other installed packages part of pulsar-all to gain
> > additional space savings on Alpine, this difference seems very marginal
> > in practice.
>
> `pulsar-all` is a topic for a separate discussion (I actually think we
> should discontinue that image).
>
> For `pulsar` image:
>  * apache/pulsar:3.2.0 (which no longer includes Presto): 919 MB
>  * Alpine image (WIP): 505 MB
>
> There are additional ways we should explore to further reduce the image
> size (e.g., removing unused JDK modules, Python packages, etc.)
>
> > Security-wise, I took a cursory look at the CVEs, and many of them are
> > in libraries that aren’t used in a Pulsar deployment/are difficult to
> > envision a practical exploit scenario. Automated scanning tool results
> > should be taken with a grain of salt - they generate a lot of alerts, and
> > many public container images throw off these CVE alerts nowadays. The
> > counterargument is that only a fraction of the libraries indicated are even
> > loaded at runtime, only some fraction of those end up potentially being
> > exploitable, and only a smaller fraction have no fix/workaround. This isn’t
> > to say reducing the vulnerability surface by using an image with less cruft
> > in it is not a worthwhile endeavor -- I do think we should try to tackle it
> > -- but I’m simply trying to be realistic about what our actual gains will
> > be from switching to Alpine.
>
> Even though the CVEs might not be a "real" security issue, or not be
> exploitable in the context of Pulsar, it is really not how any security
> team would look at it. From their perspective, it becomes unmanageable to
> check and understand every single CVE to assess the potential specific
> threat.
>
> This is a real problem: it makes it much harder to have the Pulsar
> distribution taken seriously from a security-posture perspective.
>
> Just have a glance at the security CVE issues in our last Pulsar release,
> released just a few days ago:
>
> apachepulsar/pulsar:3.2.0 (ubuntu 22.04)
> Total: 243 (UNKNOWN: 0, LOW: 146, MEDIUM: 93, HIGH: 4, CRITICAL: 0)
>
> Compare with Pulsar image based on Alpine:
>
> merlimat/pulsar:3.3.0-SNAPSHOT-f2a91a1 (alpine 3.19.1)
> Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)
>
> Full list here:
> https://gist.github.com/merlimat/ee7534992b21cae0b04c8c63f64456ff
> The above are all issues coming from Ubuntu base image.
>
> > It’s also worth mentioning we’d be moving away from other large
> > open-source big data projects in a way. Spark [2], Flink [3], Kafka [4],
> > Elasticsearch [5], and Trino [6] are based on Temurin/Ubuntu/ubi. In my
> > brief search, I didn’t find familiar names of tools in the big data
> > ecosystem with official images based on Alpine.
> >
> > Distroless would also remove almost everything from our base images,
> > minimizing space, reducing the vulnerability surface, and by extension,
> > reducing the CVE alerts from automated tooling. Apache Druid [7] has used
> > Distroless for a while in their official images. We could achieve the same
> > aims without any risk from musl/glibc, DNS quirks, or other hiccups that
> > Alpine may have.
>
>
> Regarding the OpenJDK distribution, the Amazon Corretto team
> publishes well-tested and supported Alpine packages. See
> https://aws.amazon.com/corretto
>
> I have created a WIP/draft PR to show the potential changes:
> https://github.com/apache/pulsar/pull/22054
>
> The image already passes all the integration tests and has been tested for
> a few weeks in a test cluster.
>
> I have pushed a Docker image for preview purposes:
> merlimat/pulsar:3.3.0-SNAPSHOT-f2a91a1
>
>
> https://hub.docker.com/layers/merlimat/pulsar/3.3.0-SNAPSHOT-f2a91a1/images/sha256-2d94832618bf30c02baa269bdf943c8f37aa5430258b7b4018f37ed120abb17a?context=explore
>
> Thanks,
> Matteo
>
> --
> Matteo Merli
> <matteo.me...@gmail.com>
>
>
> On Wed, Dec 20, 2023 at 12:49 PM David Christle
> <david.chris...@discordapp.com.invalid> wrote:
>
> > Are we sure the move to Alpine is worth the extensive performance
> > testing and the risk of issues? Sticking with a popular glibc image
> > like Temurin, Ubuntu/Debian, or ubi-minimal (mentioned also in this
> > discussion) seems like a better path to me, without the risk of glibc
> > vs musl issues. Using Distroless seems like another good potential
> > option, as it would achieve the same aims as the Alpine move, with less
> > potential risk.
> >
> > The DNS issues seen with Alpine are worth paying strong attention to.
> > Someone running a Pulsar deployment using the images could have a very
> > difficult time debugging library/glibc vs musl/DNS issues, due to
> > their low-level nature. A fix for the DNS issue only landed less than
> > a year ago [1]. Unless we have a compelling reason for Alpine, it may
> > be safer to wait for more adoption/testing before choosing it for the
> > official Pulsar images.
> >
> > The two main arguments in the PIP are:
> >
> > - Using a smaller base image like Alpine can save space. The relative
> > size of the JRE image for Alpine is about 45% smaller than the
> > equivalent Ubuntu slim image.
> >
> > - The Ubuntu image has a few tens of CVEs in it, as reported by an
> > automated container CVE scan tool, compared to 0 in Alpine.
> >
> >
> > These seem reasonable, but the true magnitude of benefit is likely
> > lower in practice. The pulsar-all images are 2.7GB in size, so saving
> > 166MB on the base + JRE install translates to just a 6% smaller image.
> > Unless we expect other installed packages part of pulsar-all to gain
> > additional space savings on Alpine, this difference seems very marginal
> > in practice.
> >
> > Security-wise, I took a cursory look at the CVEs, and many of them are
> > in libraries that aren’t used in a Pulsar deployment/are difficult to
> > envision a practical exploit scenario. Automated scanning tool results
> > should be taken with a grain of salt - they generate a lot of alerts,
> > and many public container images throw off these CVE alerts nowadays.
> > The counterargument is that only a fraction of the libraries indicated
> > are even loaded at runtime, only some fraction of those end up
> > potentially being exploitable, and only a smaller fraction have no
> > fix/workaround. This isn’t to say reducing the vulnerability surface
> > by using an image with less cruft in it is not a worthwhile endeavor —
> > I do think we should try to tackle it -- but I’m simply trying to be
> > realistic about what our actual gains will be from switching to Alpine.
> >
> > It’s also worth mentioning we’d be moving away from other large
> > open-source big data projects in a way. Spark [2], Flink [3], Kafka
> > [4], Elasticsearch [5], and Trino [6] are based on Temurin/Ubuntu/ubi.
> > In my brief search, I didn’t find familiar names of tools in the big
> > data ecosystem with official images based on Alpine.
> >
> > Distroless would also remove almost everything from our base images,
> > minimizing space, reducing the vulnerability surface, and by
> > extension, reducing the CVE alerts from automated tooling. Apache
> > Druid [7] has used Distroless for a while in their official images. We
> > could achieve the same aims without any risk from musl/glibc, DNS
> > quirks, or other hiccups that Alpine may have.
> >
> > Regards,
> > David
> >
> >
> > [1]
> > https://gitlab.alpinelinux.org/alpine/tsc/-/issues/43#note_295556
> > [2] Apache Spark - Temurin -
> > https://github.com/apache/flink-docker/tree/master/1.18
> > [3] Apache Flink - Temurin -
> > https://github.com/apache/flink-docker/tree/master/1.18
> > [4] KIP-975: Docker Image for Apache Kafka - Temurin -
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka
> > [5] Elasticsearch - Ubuntu & ubi-minimal -
> > https://github.com/elastic/elasticsearch/blob/bdde29720a9e37224a90e5f186abbcbc73ff9351/distribution/docker/README.md
> > [6] Trino - ubi, after moving from Ubuntu -
> > https://hub.docker.com/layers/trinodb/trino/435/images/sha256-9540a785c31c4ba9ad099ad99ae06ccd5ccca506e39b7d557effe1482309e05d
> > [7] Apache Druid - Distroless -
> > https://github.com/apache/druid/blob/e373f6269251655f5be93ce895aee8dee8cc67dd/distribution/docker/Dockerfile#L4
> >
> >
> > On 2023/12/13 17:06:12 Matteo Merli wrote:
> > > I don't think the compatibility for downstream users is going to be
> > > a big problem:
> > >  1. Most users don't need to modify the Pulsar image in a significant
> > >     way
> > >  2. If they do, they won't be using the "latest" tag, but rather a
> > >     specific version
> > >  3. Users who are dependent on the Ubuntu base image can stay on the
> > >     3.0 LTS release branch for the entire LTS lifespan
> > >
> > > I would avoid supporting 2 images at the same time because it would
> > > make it very hard to properly test them both.
> > >
> > >
> > > --
> > > Matteo Merli
> > > <mm...@apache.org>
> > >
> > >
> > > On Tue, Dec 12, 2023 at 8:57 PM Zixuan Liu <zi...@apache.org> wrote:
> > >
> > > > +1.
> > > >
> > > > > It is a good idea to use the Alpine image to run Pulsar, as it
> > > > > is more secure.
> > > > >
> > > > > However, switching images may affect downstream users, and I am
> > > > > wondering if it is possible to provide multiple docker tags:
> > > >   - latest: using the Ubuntu image
> > > >   - alpine: using the Alpine image
> > > >
> > > > Thanks,
> > > > Zixuan
> > > >
> > > > > On Wed, Dec 13, 2023 at 12:24 PM Yunze Xu <xy...@apache.org> wrote:
> > > >
> > > > > > +1 from me. Alpine Linux is much more lightweight than Ubuntu.
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > > On Wed, Dec 13, 2023 at 3:00 AM Matteo Merli <mm...@apache.org> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > > I've created a new proposal to switch Pulsar base docker
> > > > > > > images from Ubuntu to Alpine Linux.
> > > > > >
> > > > > > Details and motivation in the PIP:
> > > > > > https://github.com/apache/pulsar/pull/21716
> > > > > >
> > > > > > Matteo
> > > > > >
> > > > > > --
> > > > > > Matteo Merli
> > > > > > <mm...@apache.org>
> > > > >
> > > >
> > >
>
