Thanks for working on this, Robin! It looks like the complexity here is in publishing the Docker image. What do you think about isolating that part (just moving the publish script out of #15124)? We can start with the Dockerfile definition, which allows us to build the image locally. This should unblock us from merging the getting-started docs in #15062. Thoughts?

Best,
Kevin Liu

On Wed, Jan 28, 2026 at 5:57 AM Robin Moffatt via dev <[email protected]> wrote:

> Hi,
>
> Thanks for the discussion and input.
> It sounds like there are no major blockers. Could someone please review
> https://github.com/apache/iceberg/pull/15124 ?
>
> thanks,
>
> Robin.
>
> On Mon, 26 Jan 2026 at 16:36, Kevin Liu <[email protected]> wrote:
>
>> Hey folks,
>>
>> We have a Dockerfile defined in pyiceberg [1] that uses the Spark base
>> image and installs all the necessary jars. This is used for our integration
>> test setup [2] and is inspired by databricks/docker-spark-iceberg [3].
>> We've made many improvements such as upgrading to Spark 4, supporting Spark
>> Connect, and better image build caching.
>>
>> This is already self-contained and can be reused by other subprojects. In
>> fact, iceberg-rust already uses it [4] and I try to keep them in sync.
>> I think it would be beneficial for the project to publish this image and
>> something similar for Flink.
>>
>> Let me know what you think.
>>
>> Best,
>> Kevin Liu
>>
>> [1] https://github.com/apache/iceberg-python/blob/6de6d6acad440885788fb1a24c04ed647b92af0e/dev/spark/Dockerfile
>> [2] https://github.com/apache/iceberg-python/blob/6de6d6acad440885788fb1a24c04ed647b92af0e/dev/docker-compose-integration.yml#L20-L21
>> [3] https://github.com/databricks/docker-spark-iceberg/blob/cf617dc29e8672792e76b9bcf6017af52f570020/spark/Dockerfile
>> [4] https://github.com/apache/iceberg-rust/blob/330f21da894948fc10b57d541cb2d6f32c8bdbb8/crates/integration_tests/testdata/spark/Dockerfile
>>
>> On Mon, Jan 26, 2026 at 10:27 AM Steven Wu <[email protected]> wrote:
>>
>>> > Since the integration code for both Spark and Flink lives in our
>>> > repository, it might make sense to also store the Docker images and the
>>> > corresponding scripts there.
>>>
>>> I agree with Peter here.
>>>
>>> The previous thread has some concerns if the Iceberg project should host
>>> those docker images.
>>> Not sure if the opinions have changed.
>>>
>>> On Mon, Jan 26, 2026 at 2:43 AM Robin Moffatt via dev <[email protected]> wrote:
>>>
>>>> Thanks Ajantha, I'd not seen that thread.
>>>> Having looked at it, am I understanding the view to be that ideally
>>>> Flink would publish a Docker image that included the Iceberg dependencies?
>>>>
>>>> However we do this, I feel that the user coming to run the Flink
>>>> quickstart should not have to build their own Docker image; this adds
>>>> unnecessary friction that is easily alleviated.
>>>>
>>>> If I've understood the situation correctly, then I'm happy to discuss
>>>> this idea with the Flink community; please let me know before I do so.
>>>>
>>>> thanks, Robin.
>>>>
>>>> On Fri, 23 Jan 2026 at 16:50, Ajantha Bhat <[email protected]> wrote:
>>>>
>>>>> Hi Robin and Peter,
>>>>>
>>>>> I discussed community-maintained Docker images previously:
>>>>> https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>>>>>
>>>>> The consensus was to publish only the REST fixture Docker image
>>>>> <https://hub.docker.com/r/apache/iceberg-rest-fixture> (now at 100K+
>>>>> total downloads) and use Docker images published by the main engines in
>>>>> the quickstart, instead of maintaining these images ourselves.
>>>>> See the thread above for more details.
>>>>>
>>>>> With respect to adding a Flink quickstart page, I’m in favor of adding
>>>>> it and relying on the Docker images provided by Flink rather than
>>>>> maintaining our own images.
>>>>> - Ajantha
>>>>>
>>>>> On Fri, Jan 23, 2026 at 9:43 PM Péter Váry <[email protected]> wrote:
>>>>>
>>>>>> Hi Robin,
>>>>>> It would be nice to separate them. I expect that we will have some
>>>>>> extra stuff to do with the docker image. For example make sure that we
>>>>>> have ci in place to build it.
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>> On Fri, Jan 23, 2026, 16:55 Robin Moffatt via dev <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for the positive reception of this idea.
>>>>>>> I've drafted a PR [1] and would appreciate input :)
>>>>>>>
>>>>>>> Also, should I keep this and the quickstart PR [2] as separate PRs,
>>>>>>> or combine them?
>>>>>>>
>>>>>>> thanks, Robin.
>>>>>>>
>>>>>>> [1] https://github.com/apache/iceberg/pull/15124
>>>>>>> [2] https://github.com/apache/iceberg/pull/15062
>>>>>>>
>>>>>>> On Fri, 23 Jan 2026 at 13:58, Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This is a great idea.
>>>>>>>>
>>>>>>>> If we are moving forward with an "official" Docker image published
>>>>>>>> by the project, we must ensure it is fully compliant with ASF
>>>>>>>> requirements regarding LICENSE/NOTICE files, etc. While this may seem
>>>>>>>> straightforward, it is a detail that is often overlooked.
>>>>>>>>
>>>>>>>> I would be happy to help with this process.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On Fri, Jan 23, 2026 at 1:52 PM Maximilian Michels <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey Robin,
>>>>>>>>>
>>>>>>>>> +1 That's a great idea. It's often a bit painful for new users to
>>>>>>>>> get all the dependencies in the right place.
>>>>>>>>>
>>>>>>>>> +1 for building upon the official Flink Docker images:
>>>>>>>>> https://hub.docker.com/r/apache/flink
>>>>>>>>>
>>>>>>>>> -Max
>>>>>>>>>
>>>>>>>>> On Fri, Jan 23, 2026 at 12:27 PM Péter Váry <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi Robin,
>>>>>>>>> >
>>>>>>>>> > I would love to see the Flink quickstart image in the Iceberg repo.
>>>>>>>>> >
>>>>>>>>> > Ajantha was working on the Spark side:
>>>>>>>>> > https://github.com/apache/iceberg/issues/13519
>>>>>>>>> > The conclusion was:
>>>>>>>>> >>
>>>>>>>>> >> we should both remove the vendor reference and bring this back
>>>>>>>>> >> up to date. My preference would be to rely on the Spark image
>>>>>>>>> >> <https://hub.docker.com/r/apache/spark> provided by the Apache
>>>>>>>>> >> Spark project, similar to what we do for the Hive
>>>>>>>>> >> <https://iceberg.apache.org/hive-quickstart/> quickstart. We
>>>>>>>>> >> should be able to load all the Iceberg-specific JARs through the
>>>>>>>>> >> spark.jars.packages configuration
>>>>>>>>> >> <https://spark.apache.org/docs/3.5.1/configuration.html>.
>>>>>>>>> >
>>>>>>>>> > Ajantha also added the link to the old dev list thread:
>>>>>>>>> > https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4
>>>>>>>>> >
>>>>>>>>> > Thanks for working on this,
>>>>>>>>> > Peter
>>>>>>>>> >
>>>>>>>>> > Robin Moffatt via dev <[email protected]> wrote (on Thu, Jan 22, 2026, 19:23):
>>>>>>>>> >>
>>>>>>>>> >> Hi,
>>>>>>>>> >>
>>>>>>>>> >> Following discussion on the Flink quickstart PR [1], what do
>>>>>>>>> >> people think about adding an official quickstart Docker image for
>>>>>>>>> >> Flink to the project?
>>>>>>>>> >> At the moment the Spark quickstart uses tabulario/spark-iceberg
>>>>>>>>> >> so perhaps that could be brought into the project too.
>>>>>>>>> >>
>>>>>>>>> >> thanks, Robin.
>>>>>>>>> >>
>>>>>>>>> >> 1: https://github.com/apache/iceberg/pull/15062
