Sure, I guessed you were asking about the number of manifest files rather
than entries.  There's always a tradeoff; some aspects are:

   - More manifest files => better predicate pushdown (skip more manifest
   files during a query), and less chance of a concurrency conflict (two
   transactions trying to modify the same manifest file, which leads to a
   retry).
   - Fewer manifest files => metadata queries (like SHOW PARTITIONS) can be
   faster.
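As a rough illustration of the first bullet: each `manifest_file` struct in the manifest list carries partition summaries (lower/upper bounds), so a planner can skip manifests whose bounds don't overlap the query filter. The Python below is a simplified model, not Iceberg's planner; `ManifestSummary`, `manifests_to_scan`, `entries_scanned` and the single integer "day" partition are hypothetical, and it assumes manifests are clustered by partition, which is what lets more, smaller manifests be pruned.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ManifestSummary:
    # Hypothetical stand-in for a manifest_file's partition summary:
    # bounds of one integer partition value (e.g. a day number).
    lower: int
    upper: int
    entry_count: int


def manifests_to_scan(manifests: List[ManifestSummary], lo: int, hi: int) -> List[ManifestSummary]:
    """Keep only manifests whose [lower, upper] range overlaps the filter [lo, hi]."""
    return [m for m in manifests if m.upper >= lo and m.lower <= hi]


def entries_scanned(manifests: List[ManifestSummary], lo: int, hi: int) -> int:
    """Total manifest entries read after pruning."""
    return sum(m.entry_count for m in manifests_to_scan(manifests, lo, hi))


# 400 days of data, 40,000 entries, laid out as 4 manifests or as 1 manifest.
fine = [ManifestSummary(i * 100, i * 100 + 99, 10_000) for i in range(4)]
coarse = [ManifestSummary(0, 399, 40_000)]

print(entries_scanned(fine, 120, 150))    # 10000
print(entries_scanned(coarse, 120, 150))  # 40000
```

With the same data, the finer layout reads a quarter of the entries for this filter; if manifests were not clustered by partition, their bounds would overlap and the gap would shrink.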

Each of these is a large topic in itself, probably too big to go into here
:)

For us, the benefit of more manifest files is not as important as making
metadata queries fast for our users.  So we have tuned
commit.manifest.target-size-bytes to a few times the default.  We try to
keep the manifest file count in the tens or hundreds for any table; we find
that if there are thousands, a 'show partitions' query takes a long time.
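A back-of-envelope way to see the effect of that property: if manifests are merged up to roughly the target size, the manifest count is about the total manifest bytes divided by the target. The Python below assumes that simplification and a hypothetical table with 2 GiB of manifest metadata; 8 MiB is Iceberg's default for commit.manifest.target-size-bytes, and real merged manifests will only approximate the target. `estimated_manifest_count` is a hypothetical helper.

```python
import math

# Iceberg's default for commit.manifest.target-size-bytes (8 MiB).
DEFAULT_TARGET_SIZE_BYTES = 8 * 1024 * 1024


def estimated_manifest_count(total_manifest_bytes: int, target_size_bytes: int) -> int:
    """Rough estimate, assuming every manifest is merged up to about the target size."""
    return math.ceil(total_manifest_bytes / target_size_bytes)


# Hypothetical table with 2 GiB of manifest metadata in total.
total = 2 * 1024 ** 3
print(estimated_manifest_count(total, DEFAULT_TARGET_SIZE_BYTES))      # 256
print(estimated_manifest_count(total, 4 * DEFAULT_TARGET_SIZE_BYTES))  # 64
```

Raising the target fourfold cuts the estimated count fourfold, at the cost of coarser manifests that prune less (the first bullet above).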

We do need to run RewriteManifests periodically to keep the table in this
shape (as we have a lot of commits), and we also use
'commit.manifest.min-count-to-merge' and 'commit.manifest-merge.enabled' to
merge manifests on commit, which also helps keep the table in this shape.
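To make the merge-on-commit settings concrete, here is a simplified model in Python. It is not Iceberg's actual merge logic (which, among other things, groups manifests by partition spec); it only assumes that merging is skipped when disabled or when fewer than min-count-to-merge manifests exist, and otherwise packs manifests greedily, in commit order, up to the target size. The defaults shown (8 MiB target, min count 100) are Iceberg's; `merge_on_commit` is a hypothetical name.

```python
from typing import List

# Iceberg defaults for commit.manifest.target-size-bytes and
# commit.manifest.min-count-to-merge.
DEFAULT_TARGET_SIZE_BYTES = 8 * 1024 * 1024
DEFAULT_MIN_COUNT_TO_MERGE = 100


def merge_on_commit(manifest_sizes: List[int],
                    merge_enabled: bool = True,
                    target_size_bytes: int = DEFAULT_TARGET_SIZE_BYTES,
                    min_count_to_merge: int = DEFAULT_MIN_COUNT_TO_MERGE) -> List[int]:
    """Simplified model: return manifest sizes after one merge-on-commit pass."""
    # Merging disabled, or not enough manifests yet: leave them as-is.
    if not merge_enabled or len(manifest_sizes) < min_count_to_merge:
        return list(manifest_sizes)
    merged: List[int] = []
    current = 0
    # Greedy packing in commit order: close a bin when the next
    # manifest would push it past the target size.
    for size in manifest_sizes:
        if current and current + size > target_size_bytes:
            merged.append(current)
            current = 0
        current += size
    if current:
        merged.append(current)
    return merged


mib = 1024 * 1024
print(len(merge_on_commit([mib] * 150)))  # 19
print(len(merge_on_commit([mib] * 50)))   # 50
```

In this model, 150 small 1 MiB manifests collapse into 19 at commit time, while 50 stay untouched until the count threshold is reached; that is why periodic RewriteManifests is still needed to compact what commit-time merging leaves behind.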

Hope that helps,
Szehon

On Fri, Jan 7, 2022 at 1:10 PM g. g. grey <g.g.g...@gmail.com> wrote:

> Hi Szehon,
>
> Thanks. My apologies; I was too loose in my wording. I'll try to use the
> terms from the spec.
>
> I was asking about the number of total manifest files, specifically the
> number of `manifest_file` structs that are found in the manifest-list file.
>
> It sounds like the "commit.manifest.target-size-bytes" controls the target
> size when we merge small manifest files, which is great to know we can
> configure, as it will clearly have an impact on the number of
> `manifest_file` structs.
>
> Is there a general order-of-magnitude target number of `manifest_file`
> structs? Presumably that would dictate when one would want to merge
> manifest files and/or data files.
>
> Thanks again!
> ggg
>
>
> On Fri, Jan 7, 2022 at 11:41 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi,
>>
>> The manifest entries are one per data file or delete file, so it depends on
>> how many data files/delete files your table has.  The number of files is
>> controlled mostly by the parallelism of the job that writes the table,
>> though there are Iceberg RewriteDataFiles utilities that can compact as
>> well (as in your link).
>>
>> The number of manifest files is another topic, controlled by
>> "commit.manifest.target-size-bytes"
>> (but it should not affect the total number of manifest entries).
>>
>> Hope that helps,
>> Szehon
>>
>> On Fri, Jan 7, 2022 at 9:39 AM g. g. grey <g.g.g...@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> I am just getting started with Iceberg and I'm trying to build up some
>>> intuition for how large the metadata will become for large, active tables.
>>> Specifically, what is the order of magnitude of manifest entries that I
>>> should reasonably expect in a manifest-list file? Is there a particular
>>> range that is ideal and aimed for when cleaning up/maintaining a table?
>>>
>>> I found the maintenance page <https://iceberg.apache.org/#maintenance/>,
>>> but I'm hoping to find rules of thumb based on people's experience with
>>> using Iceberg.
>>>
>>> Thanks! If I've missed the info somewhere, a simple pointer would be
>>> great.
>>> ggg
>>>
>>
