Hi Szehon,

Thanks. My apologies; I was too loose in my wording. I'll try to use the
terms from the spec.

I was asking about the total number of manifest files, specifically the
number of `manifest_file` structs in the manifest-list file.

It sounds like "commit.manifest.target-size-bytes" controls the target
size when small manifest files are merged. That is great to know, since
it will clearly affect the number of `manifest_file` structs.
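
To check my understanding, here is a rough back-of-envelope sketch of how
I picture that setting bounding the count. The 8 MB figure is the default
I see documented for "commit.manifest.target-size-bytes"; the per-entry
size and the function name are purely my own assumptions for
illustration, and I'm assuming every manifest is merged up to the target
size, which real tables won't match exactly.

```python
import math

# Documented default for commit.manifest.target-size-bytes (8 MB).
TARGET_MANIFEST_SIZE_BYTES = 8 * 1024 * 1024


def estimate_manifest_count(num_files, avg_entry_bytes,
                            target_bytes=TARGET_MANIFEST_SIZE_BYTES):
    """Estimate the number of manifest_file structs in a manifest list.

    Hypothetical helper: assumes one manifest entry per data/delete file
    and that all manifests are packed up to the target size.
    avg_entry_bytes is a guess, not a value taken from the spec.
    """
    if num_files <= 0:
        return 0
    total_entry_bytes = num_files * avg_entry_bytes
    # At least one manifest is needed once the table has any files.
    return max(1, math.ceil(total_entry_bytes / target_bytes))


# Example: one million files at an assumed 200 bytes per entry.
print(estimate_manifest_count(1_000_000, 200))
```

If that picture is roughly right, the count would scale with the number
of data/delete files divided by however many entries fit in one
target-sized manifest.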

Is there a general order-of-magnitude target for the number of
`manifest_file` structs? Presumably that would dictate when one would
want to merge manifest files and/or data files.

Thanks again!
ggg


On Fri, Jan 7, 2022 at 11:41 AM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Hi,
>
> There is one manifest entry per data file or delete file, so it depends on
> how many data files/delete files your table has.  The number of files is
> controlled mostly by the parallelism of the job that writes the table,
> though there are Iceberg RewriteDataFiles utilities that can compact them
> as well (as in your link).
>
> The number of manifest files is a separate topic, controlled by
> "commit.manifest.target-size-bytes" (this should not affect the total
> number of manifest entries).
>
> Hope that helps,
> Szehon
>
> On Fri, Jan 7, 2022 at 9:39 AM g. g. grey <g.g.g...@gmail.com> wrote:
>
>> Hi folks,
>>
>> I am just getting started with Iceberg and I'm trying to build up some
>> intuition for how large the metadata will become for large, active tables.
>> Specifically, what is the order of magnitude of manifest entries that I
>> should reasonably expect in a manifest-list file? Is there a particular
>> range that is ideal and aimed for when cleaning up/maintaining a table?
>>
>> I found the maintenance page <https://iceberg.apache.org/#maintenance/>,
>> but I'm hoping to find rules of thumb based on people's experience with
>> using Iceberg.
>>
>> Thanks! If I've missed the info somewhere, a simple pointer would be
>> great.
>> ggg
>>
>
