Hi Szehon,

Thanks. My apologies; I was too loose in my wording. I'll try to use the terms from the spec.

I was asking about the total number of manifest files, specifically the number of `manifest_file` structs found in the manifest-list file. It sounds like "commit.manifest.target-size-bytes" controls the target size when small manifest files are merged. It is great to know that this is configurable, as it will clearly have an impact on the number of `manifest_file` structs.

Is there a general order-of-magnitude target number of `manifest_file` structs? Presumably that would dictate when one would want to merge manifest files and/or data files.

Thanks again!
ggg

On Fri, Jan 7, 2022 at 11:41 AM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Hi,
>
> The manifest entries are one per data file or delete file, so depends how
> many data files/delete files your table has. Number of files is controlled
> mostly by the parallelism of the job that writes the table, though there
> are Iceberg RewriteDataFile utilities that can compact as well (as in your
> link).
>
> The number of manifest files is another topic, controlled by
> "commit.manifest.target-size-bytes"
> (but should not affect the number of total manifest entries).
>
> Hope that helps,
> Szehon
>
> On Fri, Jan 7, 2022 at 9:39 AM g. g. grey <g.g.g...@gmail.com> wrote:
>
>> Hi folks,
>>
>> I am just getting started with Iceberg and I'm trying to build up some
>> intuition for how large the metadata will become for large, active tables.
>> Specifically, what is the order of magnitude of manifest entries that I
>> should reasonably expect in a manifest-list file? Is there a particular
>> range that is ideal and aimed for when cleaning up/maintaining a table?
>>
>> I found the maintenance page <https://iceberg.apache.org/#maintenance/>,
>> but I'm hoping to find rules-of-thumb based on peoples' experience with
>> using iceberg.
>>
>> Thanks! If I've missed the info somewhere, a simple pointer would be
>> great.
>> ggg
>>
>