Re: [DISCUSS] Proposal to buffer manifest files before updating manifest-list

2024-11-22 Thread Péter Váry
Currently we have a 'static' two-level manifest structure. If we introduce the 'everything is a manifest' concept, then we will remove the limit on the levels. This would prevent concurrent reading of the embedded manifests (if the table has 5 levels of embedded manifests the reader needs to read thos

Re: [DISCUSS] Proposal to buffer manifest files before updating manifest-list

2024-11-22 Thread Micah Kornfield
Would adding the ability to have a list of manifest lists solve this problem? This might be an incremental step toward "everything is a manifest". For now I wanted to reuse the existing manifest-list and manifests fields. Regardless of the outcome, please let's not re-use a field in a w

Re: [DISCUSS] Proposal to buffer manifest files before updating manifest-list

2024-11-22 Thread Jan Kaul
Thanks for your feedback. About your concerns, Fokko: 1. Generally the number of manifest files in the manifests field shouldn't get too large. But I think you can already improve the write amplification and conflict resolution by using up to 10 manifest files. The fact that the manifests fi

Re: [DISCUSS] Proposal to buffer manifest files before updating manifest-list

2024-11-22 Thread Russell Spitzer
I would much rather we switch to the "everything is a manifest" approach. Instead of manifest lists we only ever have manifests. A manifest can then link to data files or additional manifests. In the case of streaming, you then only ever have to read and write a single manifest. If we couple this wit
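
[Editor's illustration] A minimal sketch of the "everything is a manifest" idea discussed in this thread, assuming a manifest is simply a list of entries that are either data files or further manifests. The names `DataFile`, `Manifest`, and `plan_reads` are hypothetical and are not part of the Iceberg spec or any Iceberg library. The sketch only shows why nesting depth matters: manifests on the same level can be fetched concurrently, but each additional level adds one more sequential round of reads.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class DataFile:
    # Hypothetical stand-in for a data file entry.
    path: str


@dataclass
class Manifest:
    # Hypothetical manifest: entries are data files or nested manifests.
    path: str
    entries: List[Union["Manifest", DataFile]] = field(default_factory=list)


def plan_reads(root: Manifest) -> Tuple[List[str], int]:
    """Walk the manifest tree level by level.

    Returns the data file paths found and the number of sequential
    read rounds needed (one round per nesting level).
    """
    data_files: List[str] = []
    rounds = 0
    frontier = [root]
    while frontier:
        # Every manifest in the current frontier could be read concurrently,
        # but the next level is only known after this round completes.
        rounds += 1
        next_frontier: List[Manifest] = []
        for manifest in frontier:
            for entry in manifest.entries:
                if isinstance(entry, Manifest):
                    next_frontier.append(entry)
                else:
                    data_files.append(entry.path)
        frontier = next_frontier
    return data_files, rounds


# Usage: three nested levels require three sequential rounds.
nested = Manifest("m0", [
    DataFile("a.parquet"),
    Manifest("m1", [
        DataFile("b.parquet"),
        Manifest("m2", [DataFile("c.parquet")]),
    ]),
])
files, rounds = plan_reads(nested)
```

In this toy model a streaming writer that only appends to a single flat manifest keeps `rounds` at 1, while deeper nesting trades fewer rewrites for more sequential reads.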

Re: [DISCUSS] Proposal to buffer manifest files before updating manifest-list

2024-11-22 Thread Fokko Driesprong
Hi Jan, Thanks for sending out this proposal. While reading through it, two questions pop up: - You mentioned repurposing the manifests field. Currently, this field contains a list of paths that point to the manifest data. Would this also be your suggestion? This way, when committing the