Hey Jan,

Thanks for the heads-up, that's an interesting proposal. I've shared my
thoughts on the thread itself.

Keep in mind that this would be a spec change as well, as the spec now
explicitly states
<https://github.com/apache/iceberg/blob/main/format/spec.md#snapshots>
that they cannot both be populated: manifests must be omitted if
manifest-list is present. This is also how it is implemented in Java today;
populating it next to the manifest list would lead to data loss.

Kind regards,
Fokko

On Fri, 22 Nov 2024 at 15:36, Jan Kaul <jank...@mailbox.org.invalid> wrote:

> Hi all,
>
> I've been thinking about how we could make Iceberg tables more performant
> for streaming inserts. And I thought about using the manifests field as a
> buffer for manifest files before they are written to the manifest-list.
> This reduces the write amplification and simplifies the conflict resolution
> of concurrent writes.
>
> I've written down my proposal here:
> https://lists.apache.org/thread/4cm9kc6pkmx5ol218z5yjk41gh9t28qg
>
> I thought I'd share it with you before you decide to deprecate the
> manifests field.
>
> Kind regards,
>
> Jan
> On 22.11.24 11:55, Fokko Driesprong wrote:
>
> Hey Ryan,
>
> The goal of the deprecation is to keep other implementations from
> producing it. PyIceberg, for example, does not support this, and I think it
> would be good to avoid having others (Rust, Go, etc.) support it. Regarding
> the removal, Amogh expressed the same concern on the PR
> <https://github.com/apache/iceberg/pull/11586#discussion_r1848789823>.
>
> In my quest to make the Java implementation follow the spec as closely as
> possible, I noticed that we use a DummyFileIO to mimic a ManifestList. I
> ran into this when turning 503: added_snapshot_id into a required field
> <https://github.com/apache/iceberg/pull/11626/files#r1853683623>. So the
> value is in removing paths, as Szehon pointed out. When removing support
> for the embedded manifest list, we can remove all that logic and keep the
> codebase nice and tidy.
>
> It would be good to start the discussion of deprecating support for older
> formats at some point; however, for a V2 reader it is fairly easy to
> project V1 metadata as V2, except when embedded manifests are being used.
> Marking these kinds of oddities as deprecated will, I think, enable readers
> to support reading older versions for a longer time. My suggestion would be
> to mark the field as deprecated and revisit the actual removal later. I've
> marked it for removal in Java 2.0 for now to give it enough time.
>
> Kind regards,
> Fokko
>
>
>
> On Thu, 21 Nov 2024 at 20:52, rdb...@gmail.com <rdb...@gmail.com> wrote:
>
>> Can we safely deprecate and remove this? The manifest list is required in
>> v2, but the spec has stated for a long time that v1 tables can use
>> manifests rather than a manifest list. It’s unlikely, but it would be
>> valid for other implementations to produce it.
>>
>> I would understand if other implementations chose to fail tables that
>> don’t have a manifest list to avoid adding code to handle manifests, but
>> I don’t think that there’s much value in removing support from the Java
>> implementation.
>>
>> Instead, what about discussing how to deprecate support for older format
>> versions? That seems like the main issue here. Once the majority of
>> implementations move to newer versions, we would like to deprecate the old
>> ones.
>>
>> On Thu, Nov 21, 2024 at 11:01 AM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> +1, great to have fewer possible paths.
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Thu, Nov 21, 2024 at 10:33 AM Steve Zhang
>>> <hongyue_zh...@apple.com.invalid>
>>> wrote:
>>>
>>>> +1 to deprecate
>>>>
>>>> Thanks,
>>>> Steve Zhang
>>>>
>>>>
>>>>
>>>> On Nov 19, 2024, at 3:32 AM, Fokko Driesprong <fo...@apache.org> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I would like to propose deprecating embedded manifests
>>>> <https://github.com/apache/iceberg/pull/11586>. These were used
>>>> before the manifest-list was introduced, but I don't think they have been
>>>> used since the project was open-sourced, and it would be good to
>>>> officially deprecate them in the spec. They are only supported by Iceberg
>>>> Java today, and I haven't seen any requests for PyIceberg to add support
>>>> for them.
>>>>
>>>> Any questions or concerns about deprecating the embedded manifests?
>>>>
>>>> Kind regards,
>>>> Fokko Driesprong
>>>>
>>>>
>>>>
