Number of entries in manifest-list

2022-01-07 Thread g. g. grey
Hi folks, I am just getting started with Iceberg and I'm trying to build up some intuition for how large the metadata will become for large, active tables. Specifically, what is the order of magnitude of manifest entries that I should reasonably expect in a manifest-list file? Is there a particula

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
Hi, There is one manifest entry per data file or delete file, so the count depends on how many data files and delete files your table has. The number of files is controlled mostly by the parallelism of the job that writes the table, though there are Iceberg RewriteDataFile utilities that can compact them as well (as in y

Re: Number of entries in manifest-list

2022-01-07 Thread g. g. grey
Hi Szehon, Thanks. My apologies; I was too loose in my wording. I'll try to use the terms from the spec. I was asking about the number of total manifest files, specifically the number of `manifest_file` structs that are found in the manifest-list file. It sounds like the "commit.manifest.target-

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
Sure, I guessed you were asking about the number of manifest files rather than entries. There's always a tradeoff, some aspects being: - More manifest files => better predicate pushdown (skip more manifest files during query), and less chance for concurrency conflict (which is two transa
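
As a rough illustration of the order-of-magnitude question in this thread, here is a minimal back-of-envelope sketch. It assumes every manifest is packed up to a target size and that each entry takes a fixed number of bytes. The function name, the parameters, and all the numbers are hypothetical and do not come from the Iceberg API. Real tables will differ because of per-commit manifests, partitioning, and compression.

```python
import math


def estimate_manifest_count(num_data_files: int,
                            avg_entry_bytes: int,
                            target_manifest_bytes: int) -> int:
    """Estimate how many manifest files (manifest_file structs in the
    manifest list) a table would have if manifests were fully packed.

    All inputs are illustrative assumptions, not values read from Iceberg.
    """
    if num_data_files <= 0:
        return 0
    # How many entries fit in one manifest of the target size (at least one).
    entries_per_manifest = max(1, target_manifest_bytes // avg_entry_bytes)
    # Round up: a partially filled manifest still counts as one file.
    return math.ceil(num_data_files / entries_per_manifest)


if __name__ == "__main__":
    # Hypothetical table: one million data files, ~1 KiB per entry, 8 MiB target.
    print(estimate_manifest_count(1_000_000, 1024, 8 * 1024 * 1024))
```

Under these assumptions the manifest list grows linearly with the number of data files and shrinks as the target manifest size grows, which is the tradeoff described above.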

Re: Time-sliced incremental scan

2022-01-07 Thread Ryan Blue
Walaa, Supporting syntax for VERSIONS BETWEEN SYSTEM TIME ... AND ... seems reasonable to me. I think it’s often really nice to be able to select the changes between two points in time for debugging. It would also be nice to be able to do the same for snapshot IDs, so you could reliably use sim

Re: Iceberg engine version maintenance lifecycle

2022-01-07 Thread Ryan Blue
Sorry for the late reply here! These look reasonable to me. I think that this will help us reason about trade-offs next time we have a release issue like the current one. We should simply mark the 3.2 support as beta and get the release out next time. I also think that we should not create situati

Re: [DISCUSS] Table updates in a REST catalog

2022-01-07 Thread Ryan Blue
Hi Yufei, I’ll reply to your questions inline: On Tue, Jan 4, 2022 at 3:12 PM Yufei Gu wrote: > 1. Are we going to open for other ways to store the table metadata instead of the metadata.json files? For example, a relational database or a key-value database. This will be a big change if that’s