According to Red Hat, their latest tagged release for UBI9.3, 9.3-1552, has four moderate CVEs (https://catalog.redhat.com/software/containers/ubi9/ubi/615bcf606feffc5384e8452e). There is also the option of basing the Pulsar image on the UBI9-minimal image (https://catalog.redhat.com/software/containers/ubi9/ubi-minimal/615bd9b4075b022acc111bf5), which may have a better security footprint.

Thank You,
Alex Hall <ah...@teknoluxion.com>

-----Original Message-----
From: Matteo Merli <matteo.me...@gmail.com>
Sent: Thursday, February 15, 2024 12:55 PM
To: dev@pulsar.apache.org
Subject: Re: Re: [DISCUSS] PIP-324: Alpine Docker images

Hi Alex,

the situation for UBI9 doesn't look much different from Ubuntu:

registry.access.redhat.com/ubi9/ubi (redhat 9.3)
Total: 166 (UNKNOWN: 0, LOW: 138, MEDIUM: 28, HIGH: 0, CRITICAL: 0)

Full list: https://gist.github.com/merlimat/ba96b91ea49709bb218ddc3906bb9e95

--
Matteo Merli
<matteo.me...@gmail.com>

On Thu, Feb 15, 2024 at 9:10 AM Alexander Hall <ah...@teknoluxion.com.invalid> wrote:

> Reviving a previous tangent from this discussion. Using UBI9 as a base is also a great option. Some end-users use that as a base and copy the files from the pulsar and pulsar-all containers as an upstream source.
>
> -Alex H
>
> -----Original Message-----
> From: Matteo Merli <matteo.me...@gmail.com>
> Sent: Wednesday, February 14, 2024 2:01 PM
> To: david.chris...@discordapp.com.invalid
> Cc: dev@pulsar.apache.org
> Subject: Re: Re: [DISCUSS] PIP-324: Alpine Docker images
>
> Reviving the discussion thread.
>
> > For Netty, I think netty-transport-native-epoll is only built against glibc (https://netty.io/wiki/native-transports.html#using-the-linux-native-transport). Is there a workaround?
>
> Yes, there is a workaround for Netty. It works perfectly fine by including the GLibc compatibility library. Same for Kinesis producer (side note: Kinesis SDK is the worst train wreck I've seen in many many years: it's a C++ binary that is spawned from Java and communicates through a pipe... anyway it works fine with the GLibc compatibility lib).
>
> > Other than that, there is the DNS caching issue Lari mentioned.
>
> I think the DNS issue was already solved a few releases ago. In any case, it wouldn't affect Pulsar/BK since we use the Netty DNS client. In the same way, I believe that JDK also doesn't use the glibc provided DNS client: that's why we configure the DNS cache directly in the JVM configuration.
>
> > - Using a smaller base image like Alpine can save space. The relative size of the JRE image for Alpine is about 45% smaller than the equivalent Ubuntu slim image.
> > - The Ubuntu image has a few tens of CVEs in it, as reported by an automated container CVE scan tool, compared to 0 in Alpine.
> >
> > These seem reasonable, but the true magnitude of benefit is likely lower in practice. The pulsar-all images are 2.7GB in size, so saving 166MB on the base + JRE install translates to just a 6% smaller image. Unless we expect other installed packages part of pulsar-all to gain additional space savings on Alpine, this difference seems very marginal in practice.
>
> `pulsar-all` is ready for separate discussion (I actually think we should discontinue that image).
>
> For `pulsar` image:
> * apache/pulsar:3.2.0 (which already does not include Presto anymore): 919 MB
> * alpine image wip: 505 MB
>
> There are additional ways we should explore to further reduce the image size (eg: removing unused JDK modules, Python packages, etc...)
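
As a quick sanity check on the size figures quoted above, the relative savings can be worked out directly. This is only an illustrative sketch: the variable names are made up for the example, and it assumes 2.7GB is taken as 2700 MB.

```python
def percent_saved(before_mb: float, after_mb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    return (before_mb - after_mb) / before_mb * 100.0


# Figures quoted in the thread, in MB.
pulsar_ubuntu_mb = 919     # apache/pulsar:3.2.0
pulsar_alpine_mb = 505     # Alpine work-in-progress image
pulsar_all_mb = 2700       # pulsar-all, quoted as 2.7GB (assumed 1 GB = 1000 MB)
base_jre_saving_mb = 166   # quoted saving on the base + JRE install

# Saving for the plain `pulsar` image when moving to the Alpine build.
pulsar_saving = percent_saved(pulsar_ubuntu_mb, pulsar_alpine_mb)

# Saving for `pulsar-all` if only the base + JRE layers shrink.
pulsar_all_saving = percent_saved(pulsar_all_mb, pulsar_all_mb - base_jre_saving_mb)

print(f"pulsar: {pulsar_saving:.1f}% smaller")
print(f"pulsar-all: {pulsar_all_saving:.1f}% smaller")
```

With these inputs the `pulsar` image comes out roughly 45% smaller, while the `pulsar-all` saving stays near the 6% mentioned earlier, which is why the two sides of the discussion read the same numbers so differently.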
>
> > Security-wise, I took a cursory look at the CVEs, and many of them are in libraries that aren’t used in a Pulsar deployment/are difficult to envision a practical exploit scenario. Automated scanning tool results should be taken with a grain of salt - they generate a lot of alerts, and many public container images throw off these CVE alerts nowadays. The counterargument is that only a fraction of the libraries indicated are even loaded at runtime, only some fraction of those end up potentially being exploitable, and only a smaller fraction have no fix/workaround. This isn’t to say reducing the vulnerability surface by using an image with less cruft in it is not a worthwhile endeavor -- I do think we should try to tackle it -- but I’m simply trying to be realistic about what our actual gains will be from switching to Alpine.
>
> Even though the CVEs might not be a "real" security issue, or not be exploitable in the context of Pulsar, it is really not how any security team would look at it. From their perspective, it becomes unmanageable to check and understand every single CVE to assess the potential specific threat.
>
> This is a real problem that is causing a lot of headaches to have Pulsar distribution taken seriously from a security posture perspective.
>
> Just have a glance at the security CVE issues in our last Pulsar release, released just a few days ago:
>
> apachepulsar/pulsar:3.2.0 (ubuntu 22.04)
> Total: 243 (UNKNOWN: 0, LOW: 146, MEDIUM: 93, HIGH: 4, CRITICAL: 0)
>
> Compare with Pulsar image based on Alpine:
>
> merlimat/pulsar:3.3.0-SNAPSHOT-f2a91a1 (alpine 3.19.1)
> Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)
>
> Full list here: https://gist.github.com/merlimat/ee7534992b21cae0b04c8c63f64456ff
>
> The above are all issues coming from Ubuntu base image.
>
> > It’s also worth mentioning we’d be moving away from other large open-source big data projects in a way. Spark [2], Flink [3], Kafka [4], Elasticsearch [5], and Trino [6] are based on Temurin/Ubuntu/ubi. In my brief search, I didn’t find familiar names of tools in the big data ecosystem with official images based on Alpine.
> >
> > Distroless would also remove almost everything from our base images, minimizing space, reducing the vulnerability surface, and by extension, reducing the CVE alerts from automated tooling. Apache Druid [7] has used Distroless for a while in their official images. We could achieve the same aims without any risk from musl/glibc, DNS quirks, or other hiccups that Alpine may have.
>
> Regarding the OpenJDK distribution, the team from Amazon Corretto publishes well tested and supported Alpine packages. See https://aws.amazon.com/corretto
>
> I have created a WIP/draft PR to show the potential changes: https://github.com/apache/pulsar/pull/22054
>
> The image already passes all the integration tests and has been tested for a few weeks in a test cluster.
>
> I have pushed a Docker image for preview purposes: merlimat/pulsar:3.3.0-SNAPSHOT-f2a91a1
>
> https://hub.docker.com/layers/merlimat/pulsar/3.3.0-SNAPSHOT-f2a91a1/images/sha256-2d94832618bf30c02baa269bdf943c8f37aa5430258b7b4018f37ed120abb17a?context=explore
>
> Thanks,
> Matteo
>
> --
> Matteo Merli
> <matteo.me...@gmail.com>
>
> On Wed, Dec 20, 2023 at 12:49 PM David Christle <david.chris...@discordapp.com.invalid> wrote:
>
> > Are we sure the move to Alpine is worth the extensive performance testing and the risk of issues? Sticking with a popular glibc image like Temurin, Ubuntu/Debian, or ubi-minimal (mentioned also in this discussion) seems like a better path to me, without the risk of glibc vs musl issues.
> > Using Distroless seems like another good potential option, as it would achieve the same aims as the Alpine move, with less potential risk.
> >
> > The DNS issues seen with Alpine are worth paying strong attention to. Someone running a Pulsar deployment using the images could have a very difficult time debugging library/glibc vs musl/DNS issues, due to their low-level nature. A fix for the DNS issue only landed less than a year ago [1]. Unless we have a compelling reason for Alpine, it may be safer to wait for more adoption/testing before choosing it for the official Pulsar images.
> >
> > The two main arguments in the PIP are:
> >
> > - Using a smaller base image like Alpine can save space. The relative size of the JRE image for Alpine is about 45% smaller than the equivalent Ubuntu slim image.
> >
> > - The Ubuntu image has a few tens of CVEs in it, as reported by an automated container CVE scan tool, compared to 0 in Alpine.
> >
> > These seem reasonable, but the true magnitude of benefit is likely lower in practice. The pulsar-all images are 2.7GB in size, so saving 166MB on the base + JRE install translates to just a 6% smaller image. Unless we expect other installed packages part of pulsar-all to gain additional space savings on Alpine, this difference seems very marginal in practice.
> >
> > Security-wise, I took a cursory look at the CVEs, and many of them are in libraries that aren’t used in a Pulsar deployment/are difficult to envision a practical exploit scenario. Automated scanning tool results should be taken with a grain of salt - they generate a lot of alerts, and many public container images throw off these CVE alerts nowadays.
> > The counterargument is that only a fraction of the libraries indicated are even loaded at runtime, only some fraction of those end up potentially being exploitable, and only a smaller fraction have no fix/workaround. This isn’t to say reducing the vulnerability surface by using an image with less cruft in it is not a worthwhile endeavor -- I do think we should try to tackle it -- but I’m simply trying to be realistic about what our actual gains will be from switching to Alpine.
> >
> > It’s also worth mentioning we’d be moving away from other large open-source big data projects in a way. Spark [2], Flink [3], Kafka [4], Elasticsearch [5], and Trino [6] are based on Temurin/Ubuntu/ubi. In my brief search, I didn’t find familiar names of tools in the big data ecosystem with official images based on Alpine.
> >
> > Distroless would also remove almost everything from our base images, minimizing space, reducing the vulnerability surface, and by extension, reducing the CVE alerts from automated tooling. Apache Druid [7] has used Distroless for a while in their official images. We could achieve the same aims without any risk from musl/glibc, DNS quirks, or other hiccups that Alpine may have.
> >
> > Regards,
> > David
> >
> > [1] https://gitlab.alpinelinux.org/alpine/tsc/-/issues/43#note_295556
> > [2] Apache Spark - Temurin - https://github.com/apache/flink-docker/tree/master/1.18
> > [3] Apache Flink - Temurin - https://github.com/apache/flink-docker/tree/master/1.18
> > [4] KIP-975: Docker Image for Apache Kafka - Temurin - https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka
> > [5] Elasticsearch - Ubuntu & ubi-minimal - https://github.com/elastic/elasticsearch/blob/bdde29720a9e37224a90e5f186abbcbc73ff9351/distribution/docker/README.md
> > [6] Trino - ubi, after moving from Ubuntu - https://hub.docker.com/layers/trinodb/trino/435/images/sha256-9540a785c31c4ba9ad099ad99ae06ccd5ccca506e39b7d557effe1482309e05d
> > [7] Apache Druid - Distroless - https://github.com/apache/druid/blob/e373f6269251655f5be93ce895aee8dee8cc67dd/distribution/docker/Dockerfile#L4
> >
> > On 2023/12/13 17:06:12 Matteo Merli wrote:
> > > I don't think the compatibility for downstream users is going to be a big problem:
> > > 1. Most users don't need to modify the Pulsar image in a significant way
> > > 2. If they do, they won't be using the "latest" tag, but rather a specific version
> > > 3. Users who are dependent on the Ubuntu base image can stay on the 3.0 LTS release branch for the entire LTS lifespan
> > >
> > > I would avoid supporting 2 images at the same time because it would make it very hard to properly test them both.
> > >
> > > --
> > > Matteo Merli
> > > <mm...@apache.org>
> > >
> > > On Tue, Dec 12, 2023 at 8:57 PM Zixuan Liu <zi...@apache.org> wrote:
> > >
> > > > +1.
> > > >
> > > > It is a good idea to use the Alpine image to run the Pulsar, as it is more secure.
> > > >
> > > > However, switching images may affect downstream users, and I am wondering if it is possible to provide multiple docker tags:
> > > > - latest: using the Ubuntu image
> > > > - alpine: using the Alpine image
> > > >
> > > > Thanks,
> > > > Zixuan
> > > >
> > > > On Wed, Dec 13, 2023 at 12:24, Yunze Xu <xy...@apache.org> wrote:
> > > >
> > > > > +1 to me. The Alpine Linux is much more light-weight than Ubuntu.
> > > > >
> > > > > Thanks,
> > > > > Yunze
> > > > >
> > > > > On Wed, Dec 13, 2023 at 3:00 AM Matteo Merli <mm...@apache.org> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I've created a new proposal to switch Pulsar base docker images from Ubuntu to Alpine Linux.
> > > > > >
> > > > > > Details and motivation in the PIP: https://github.com/apache/pulsar/pull/21716
> > > > > >
> > > > > > Matteo
> > > > > >
> > > > > > --
> > > > > > Matteo Merli
> > > > > > <mm...@apache.org>
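
For anyone who wants to compare the scan results quoted in this thread side by side, the summary lines can be parsed with a few lines of Python. This is a rough sketch only: `parse_summary` is a hypothetical helper written for this example, and it assumes the `Total: N (SEVERITY: n, ...)` layout exactly as pasted above rather than any particular scanner's documented output format.

```python
import re
from typing import Dict

# Matches lines shaped like "Total: 166 (UNKNOWN: 0, LOW: 138, ...)".
SUMMARY_RE = re.compile(r"Total:\s*(\d+)\s*\(([^)]*)\)")


def parse_summary(line: str) -> Dict[str, int]:
    """Parse one scan summary line into a mapping of severity -> count."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    counts = {"TOTAL": int(match.group(1))}
    for part in match.group(2).split(","):
        name, value = part.split(":")
        counts[name.strip()] = int(value)
    return counts


# Summary lines copied from the messages in this thread.
summaries = {
    "registry.access.redhat.com/ubi9/ubi (redhat 9.3)":
        "Total: 166 (UNKNOWN: 0, LOW: 138, MEDIUM: 28, HIGH: 0, CRITICAL: 0)",
    "apachepulsar/pulsar:3.2.0 (ubuntu 22.04)":
        "Total: 243 (UNKNOWN: 0, LOW: 146, MEDIUM: 93, HIGH: 4, CRITICAL: 0)",
    "merlimat/pulsar:3.3.0-SNAPSHOT-f2a91a1 (alpine 3.19.1)":
        "Total: 0 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 0, CRITICAL: 0)",
}

parsed = {image: parse_summary(line) for image, line in summaries.items()}

# Print images from most to fewest reported findings.
for image, counts in sorted(parsed.items(), key=lambda kv: -kv[1]["TOTAL"]):
    print(f"{image}: total={counts['TOTAL']} high={counts['HIGH']} medium={counts['MEDIUM']}")
```

The per-severity counts in each quoted line add up to the stated total, so the parsed values can also be used to check a pasted summary for copy errors.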