So this will cut down on the number of manifest files read, won't it? That
should in turn speed up query planning.

Does anyone have an estimate of what benefit this is likely to have in
production deployments?
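
To frame the question, here is a rough back-of-the-envelope sketch of the
write side only, going by the description quoted below (at least one manifest
plus a new manifest list per commit today, versus one file in the common case
under the proposal). The function names and the commit count are my own
illustrative assumptions, not Iceberg APIs or measured numbers, and it says
nothing about read-side planning cost:

```python
# Illustrative model only: names and numbers are assumptions, not Iceberg APIs.

def files_written_current(commits: int, manifests_per_commit: int = 1) -> int:
    """Current layout: each commit writes its manifests plus a new manifest list."""
    return commits * (manifests_per_commit + 1)


def files_written_one_file(commits: int) -> int:
    """Proposed layout: in the common case, each commit writes one metadata file."""
    return commits


commits = 1000  # hypothetical number of small commits
print(files_written_current(commits), files_written_one_file(commits))
```

Real numbers from production tables would be far more useful than this, which
is why I'm asking.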

On Thu, 29 May 2025 at 21:25, Ryan Blue <rdb...@gmail.com> wrote:

> Hi everyone,
>
> Like Russell’s recent note, I’m starting a thread to connect those of us
> that are interested in the idea of changing Iceberg’s metadata in v4 so
> that in most cases committing a change only requires writing one additional
> metadata file.
>
> *Idea: One-file commits*
>
> The current Iceberg metadata structure requires writing at least one
> manifest and a new manifest list to produce a new snapshot. The goal of
> this work is to allow more flexibility by allowing the manifest list layer
> to store data and delete files. As a result, only one file write would be
> needed before committing the new snapshot. In addition, this work will also
> try to explore:
>
>    - Avoiding small manifests that must be read in parallel and later
>    compacted (metadata maintenance changes)
>    - Extending metadata skipping to use aggregated column ranges that are
>    compatible with geospatial data (manifest metadata)
>    - Using soft deletes to avoid rewriting existing manifests (metadata
>    DVs)
>
> If you’re interested in these problems, please reply!
>
> Ryan
>
