Hey folks,

We have a Dockerfile defined in pyiceberg [1] that uses the Spark base
image and installs all the necessary JARs. It is used for our integration
test setup [2] and is inspired by databricks/docker-spark-iceberg [3].
We've made many improvements, such as upgrading to Spark 4, supporting Spark
Connect, and improving image build caching.

This image is already self-contained and can be reused by other subprojects.
In fact, iceberg-rust already uses it [4], and I try to keep the two in sync.
I think it would be beneficial for the project to publish this image, along
with a similar one for Flink.

Let me know what you think.

Best,
Kevin Liu

[1] https://github.com/apache/iceberg-python/blob/6de6d6acad440885788fb1a24c04ed647b92af0e/dev/spark/Dockerfile
[2] https://github.com/apache/iceberg-python/blob/6de6d6acad440885788fb1a24c04ed647b92af0e/dev/docker-compose-integration.yml#L20-L21
[3] https://github.com/databricks/docker-spark-iceberg/blob/cf617dc29e8672792e76b9bcf6017af52f570020/spark/Dockerfile
[4] https://github.com/apache/iceberg-rust/blob/330f21da894948fc10b57d541cb2d6f32c8bdbb8/crates/integration_tests/testdata/spark/Dockerfile

On Mon, Jan 26, 2026 at 10:27 AM Steven Wu <[email protected]> wrote:

> > Since the integration code for both Spark and Flink lives in our
> > repository, it might make sense to also store the Docker images and the
> > corresponding scripts there.
>
> I agree with Peter here.
>
> The previous thread raised some concerns about whether the Iceberg project
> should host those Docker images. I'm not sure whether opinions have changed.
>
> On Mon, Jan 26, 2026 at 2:43 AM Robin Moffatt via dev <
> [email protected]> wrote:
>
>> Thanks Ajantha, I'd not seen that thread.
>> Having looked at it, am I understanding the view to be that ideally Flink
>> would publish a Docker image that included the Iceberg dependencies?
>>
>> However we do this, I feel that the user coming to run the Flink
>> quickstart should not have to build their own Docker image; this adds
>> unnecessary friction that is easily alleviated.
>>
>> If I've understood the situation correctly, then I'm happy to discuss
>> this idea with the Flink community; please let me know before I do so.
>>
>> thanks, Robin.
>>
>> On Fri, 23 Jan 2026 at 16:50, Ajantha Bhat <[email protected]> wrote:
>>
>>> Hi Robin and Peter,
>>>
>>> I discussed community-maintained Docker images previously:
>>> https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>>>
>>> The consensus was to publish only the REST fixture Docker image
>>> <https://hub.docker.com/r/apache/iceberg-rest-fixture> (now at 100K+
>>> total downloads) and use Docker images published by the main engines in the
>>> quickstart, instead of maintaining these images ourselves.
>>> See the thread above for more details.
>>>
>>> With respect to adding a Flink quickstart page, I’m in favor of adding
>>> it and relying on the Docker images provided by Flink rather than
>>> maintaining our own images.
>>> - Ajantha
>>>
>>> On Fri, Jan 23, 2026 at 9:43 PM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Hi Robin,
>>>> It would be nice to keep them separate. I expect there will be some
>>>> extra work to do for the Docker image, for example making sure we have
>>>> CI in place to build it.
>>>> Thanks,
>>>> Peter
>>>>
>>>>
>>>> On Fri, Jan 23, 2026, 16:55 Robin Moffatt via dev <
>>>> [email protected]> wrote:
>>>>
>>>>> Thanks for the positive reception of this idea.
>>>>> I've drafted a PR [1] and would appreciate input :)
>>>>>
>>>>> Also, should I keep this and the quickstart PR [2] as separate PRs, or
>>>>> combine them?
>>>>>
>>>>> thanks, Robin.
>>>>>
>>>>>
>>>>> [1] https://github.com/apache/iceberg/pull/15124
>>>>> [2] https://github.com/apache/iceberg/pull/15062
>>>>>
>>>>> On Fri, 23 Jan 2026 at 13:58, Jean-Baptiste Onofré <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is a great idea.
>>>>>>
>>>>>> If we are moving forward with an "official" Docker image published by
>>>>>> the project, we must ensure it is fully compliant with ASF requirements
>>>>>> regarding LICENSE/NOTICE files, etc. While this may seem straightforward,
>>>>>> it is a detail that is often overlooked.
>>>>>>
>>>>>> I would be happy to help with this process.
>>>>>>
>>>>>> Regards,
>>>>>> JB
>>>>>>
>>>>>> On Fri, Jan 23, 2026 at 1:52 PM Maximilian Michels <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Robin,
>>>>>>>
>>>>>>> +1 That's a great idea. It's often a bit painful for new users to get
>>>>>>> all the dependencies in the right place.
>>>>>>>
>>>>>>> +1 for building upon the official Flink Docker images:
>>>>>>> https://hub.docker.com/r/apache/flink
>>>>>>>
>>>>>>> -Max
>>>>>>>
>>>>>>> On Fri, Jan 23, 2026 at 12:27 PM Péter Váry <
>>>>>>> [email protected]> wrote:
>>>>>>> >
>>>>>>> > Hi Robin,
>>>>>>> >
>>>>>>> > I would love to see the Flink quickstart image in the Iceberg repo.
>>>>>>> >
>>>>>>> > Ajantha was working on the Spark side:
>>>>>>> > https://github.com/apache/iceberg/issues/13519
>>>>>>> > The conclusion was:
>>>>>>> >>
>>>>>>> >> we should both remove the vendor reference and bring this back up to
>>>>>>> >> date. My preference would be to rely on the Spark image
>>>>>>> >> <https://hub.docker.com/r/apache/spark> provided by the Apache Spark
>>>>>>> >> project, similar to what we do for the Hive
>>>>>>> >> <https://iceberg.apache.org/hive-quickstart/> quickstart. We should be
>>>>>>> >> able to load all the Iceberg-specific JARs through the
>>>>>>> >> spark.jars.packages configuration
>>>>>>> >> <https://spark.apache.org/docs/3.5.1/configuration.html>.
>>>>>>> >
>>>>>>> >
>>>>>>> > Ajantha also added the link to the old dev list thread:
>>>>>>> > https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4
>>>>>>> >
>>>>>>> > Thanks for working on this,
>>>>>>> > Peter
>>>>>>> >
>>>>>>> > On Thu, Jan 22, 2026 at 7:23 PM Robin Moffatt via dev <
>>>>>>> > [email protected]> wrote:
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> Following discussion on the Flink quickstart PR [1], what do people
>>>>>>> >> think about adding an official quickstart Docker image for Flink to
>>>>>>> >> the project?
>>>>>>> >> At the moment the Spark quickstart uses tabulario/spark-iceberg, so
>>>>>>> >> perhaps that could be brought into the project too.
>>>>>>> >>
>>>>>>> >> thanks, Robin.
>>>>>>> >>
>>>>>>> >> 1: https://github.com/apache/iceberg/pull/15062
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>
