Hey folks,

We have a Dockerfile defined in pyiceberg [1] that uses the Spark base image and installs all the necessary jars. This is used for our integration test setup [2] and is inspired by databricks/docker-spark-iceberg [3]. We've made many improvements such as upgrading to Spark 4, supporting Spark Connect, and better image build caching.
This is already self-contained and can be reused by other subprojects. In fact, iceberg-rust already uses it [4] and I try to keep them in sync.

I think it would be beneficial for the project to publish this image and something similar for Flink. Let me know what you think.

Best,
Kevin Liu

[1] https://github.com/apache/iceberg-python/blob/6de6d6acad440885788fb1a24c04ed647b92af0e/dev/spark/Dockerfile
[2] https://github.com/apache/iceberg-python/blob/6de6d6acad440885788fb1a24c04ed647b92af0e/dev/docker-compose-integration.yml#L20-L21
[3] https://github.com/databricks/docker-spark-iceberg/blob/cf617dc29e8672792e76b9bcf6017af52f570020/spark/Dockerfile
[4] https://github.com/apache/iceberg-rust/blob/330f21da894948fc10b57d541cb2d6f32c8bdbb8/crates/integration_tests/testdata/spark/Dockerfile

On Mon, Jan 26, 2026 at 10:27 AM Steven Wu <[email protected]> wrote:

>
> Since the integration code for both Spark and Flink lives in our
> repository, it might make sense to also store the Docker images and the
> corresponding scripts there.
>
> I agree with Peter here.
>
> The previous thread has some concerns if the Iceberg project should host
> those docker images. Not sure if the opinions have changed.
>
> On Mon, Jan 26, 2026 at 2:43 AM Robin Moffatt via dev <[email protected]> wrote:
>
>> Thanks Ajantha, I'd not seen that thread.
>> Having looked at it, am I understanding the view to be that ideally Flink
>> would publish a Docker image that included the Iceberg dependencies?
>>
>> However we do this, I feel that the user coming to run the Flink
>> quickstart should not have to build their own Docker image; this adds
>> unnecessary friction that is easily alleviated.
>>
>> If I've understood the situation correctly, then I'm happy to discuss
>> this idea with the Flink community; please let me know before I do so.
>>
>> thanks, Robin.
>>
>> On Fri, 23 Jan 2026 at 16:50, Ajantha Bhat <[email protected]> wrote:
>>
>>> Hi Robin and Peter,
>>>
>>> I discussed community-maintained Docker images previously:
>>> https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>>>
>>> The consensus was to publish only the REST fixture Docker image
>>> <https://hub.docker.com/r/apache/iceberg-rest-fixture> (now at 100K+
>>> total downloads) and use Docker images published by the main engines in the
>>> quickstart, instead of maintaining these images ourselves.
>>> See the thread above for more details.
>>>
>>> With respect to adding a Flink quickstart page, I’m in favor of adding
>>> it and relying on the Docker images provided by Flink rather than
>>> maintaining our own images.
>>> - Ajantha
>>>
>>> On Fri, Jan 23, 2026 at 9:43 PM Péter Váry <[email protected]> wrote:
>>>
>>>> Hi Robin,
>>>> It would be nice to separate them. I expect that we will have some
>>>> extra stuff to do with the docker image. For example make sure that we have
>>>> ci in place to build it.
>>>> Thanks,
>>>> Peter
>>>>
>>>>
>>>> On Fri, Jan 23, 2026, 16:55 Robin Moffatt via dev <[email protected]> wrote:
>>>>
>>>>> Thanks for the positive reception of this idea.
>>>>> I've drafted a PR [1] and would appreciate input :)
>>>>>
>>>>> Also, should I keep this and the quickstart PR [2] as separate PRs, or
>>>>> combine them?
>>>>>
>>>>> thanks, Robin.
>>>>>
>>>>>
>>>>> [1] https://github.com/apache/iceberg/pull/15124
>>>>> [2] https://github.com/apache/iceberg/pull/15062
>>>>>
>>>>> On Fri, 23 Jan 2026 at 13:58, Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is a great idea.
>>>>>>
>>>>>> If we are moving forward with an "official" Docker image published by
>>>>>> the project, we must ensure it is fully compliant with ASF requirements
>>>>>> regarding LICENSE/NOTICE files, etc. While this may seem straightforward,
>>>>>> it is a detail that is often overlooked.
>>>>>>
>>>>>> I would be happy to help with this process.
>>>>>>
>>>>>> Regards,
>>>>>> JB
>>>>>>
>>>>>> On Fri, Jan 23, 2026 at 1:52 PM Maximilian Michels <[email protected]> wrote:
>>>>>>
>>>>>>> Hey Robin,
>>>>>>>
>>>>>>> +1 That's a great idea. It's often a bit painful for new users to get
>>>>>>> all the dependencies in the right place.
>>>>>>>
>>>>>>> +1 for building upon the official Flink Docker images:
>>>>>>> https://hub.docker.com/r/apache/flink
>>>>>>>
>>>>>>> -Max
>>>>>>>
>>>>>>> On Fri, Jan 23, 2026 at 12:27 PM Péter Váry <[email protected]> wrote:
>>>>>>> >
>>>>>>> > Hi Robin,
>>>>>>> >
>>>>>>> > I would love to see the Flink quickstart image in the Iceberg repo.
>>>>>>> >
>>>>>>> > Ajantha was working on the Spark side:
>>>>>>> > https://github.com/apache/iceberg/issues/13519
>>>>>>> > The conclusion was:
>>>>>>> >>
>>>>>>> >> we should both remove the vendor reference and bring this back up
>>>>>>> >> to date. My preference would be to rely on the Spark image
>>>>>>> >> <https://hub.docker.com/r/apache/spark> provided by the Apache Spark
>>>>>>> >> project, similar to what we do for the Hive
>>>>>>> >> <https://iceberg.apache.org/hive-quickstart/> quickstart. We should
>>>>>>> >> be able to load all the Iceberg-specific JARs through the
>>>>>>> >> spark.jars.packages configuration
>>>>>>> >> <https://spark.apache.org/docs/3.5.1/configuration.html>.
>>>>>>> >
>>>>>>> >
>>>>>>> > Ajantha also added the link to the old dev list thread:
>>>>>>> > https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4
>>>>>>> >
>>>>>>> > Thanks for working on this,
>>>>>>> > Peter
>>>>>>> >
>>>>>>> > Robin Moffatt via dev <[email protected]> wrote (on Thu, Jan 22, 2026, 19:23):
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> Following discussion on the Flink quickstart PR [1], what do
>>>>>>> >> people think about adding an official quickstart Docker image for
>>>>>>> >> Flink to the project?
>>>>>>> >> At the moment the Spark quickstart uses tabulario/spark-iceberg
>>>>>>> >> so perhaps that could be brought into the project too.
>>>>>>> >>
>>>>>>> >> thanks, Robin.
>>>>>>> >>
>>>>>>> >> 1: https://github.com/apache/iceberg/pull/15062
