Hi Szehon, I agree with you there.
I think it's better to move forward step by step, so Eduard's proposal is a good idea. However, I think it's worth keeping the discussion going, at least to shape a good proposal.

Regards
JB

On Wed, Nov 6, 2024 at 3:23 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
> There seem to be many opinions here, but one of the main objections seems to be the complexity added to the REST spec impeding newer catalogs.
>
> Looking through the actual REST API change proposal, some of these are indeed a bit advanced to implement, like metadata property filtering or time-range filtering, for potentially small gain, so I can understand that argument.
>
> There is definitely value in trimming TableMetadata wire traffic, though, and I would love to see this work proceed. TableMetadata maintenance only works to a point: if a user wants to keep data of many different schemas, partition specs, etc., maintenance cannot fix the problem alone. Going back to the previous discussion thread, I think Eduard's proposal in https://lists.apache.org/thread/r9fgq4yz1oy5bow09zhhmcm66t6kgbh7 of extending refs to the other table-metadata array fields, beyond snapshots, is a good compromise to at least get the ball rolling without too much change to the API.
>
> Thanks
> Szehon
>
> On Fri, Nov 1, 2024 at 9:04 AM Dmitri Bourlatchkov <dmitri.bourlatch...@dremio.com.invalid> wrote:
>>
>> Hello All,
>>
>> This is an interesting discussion and I'd like to offer my perspective.
>>
>> When a REST Catalog is involved, the metadata is loaded and modified via the catalog API. So control over the metadata is delegated to the catalog.
>>
>> I'd argue that in this situation, catalogs should have the flexibility to optimize metadata operations internally. In other words, if a particular use case does not require access to some pieces of metadata, the catalog should not have to provide them. For example, querying a particular snapshot does not require knowledge of other snapshots.
>>
>> I understand that the current metadata representation evolved to support certain use cases. Still, as far as API v2 is concerned, would it have to match what was happening in API v1? I think this is an opportunity to design API v2 in a more flexible and extensible manner.
>>
>> On the point of complexity (and I think adoption concerns are but a consequence of complexity): I believe that if the API is modelled to supply the information required for particular use cases, as opposed to representing a particular state of the table as a whole, the complexity can be reduced.
>>
>> In other words, I propose to make API v2 such that it focuses on what clients (engines) require for operation, as opposed to what the table metadata has in its totality at any moment in time. In a way, API v2 outputs do not have to be exact chunks of metadata carved out of physical files, but may be defined differently, linking to server-side metadata only conceptually.
>>
>> More specifically, if the client queries a table, it declares this intent in the API and receives the information required for the query. The client should be prepared to receive more information than it needs (in case the server does not support metadata slicing), but that should not add complexity, as discarding unused data should not be hard if the data structures allow for slicing. In effect, actual runtime efficiencies will be defined by the combined efforts of the client (engine) and catalog. At the same time, neither the client nor the catalog is required to implement advanced use cases.
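>>
>> To sketch what I mean (all names here are hypothetical, not a concrete spec proposal), a client could declare the pieces it needs and simply discard anything extra that a non-slicing server returns:
>>
>>   import java.util.HashMap;
>>   import java.util.List;
>>   import java.util.Map;
>>
>>   // Hypothetical catalog call: the field list is a hint, and a server
>>   // that does not support slicing may return the full metadata instead.
>>   interface SlicingCatalog {
>>     Map<String, Object> loadTable(String table, List<String> fields);
>>   }
>>
>>   class PartialLoadSketch {
>>     static Map<String, Object> loadForRead(SlicingCatalog catalog, String table) {
>>       // Declare intent: a plain read only needs these pieces.
>>       List<String> wanted = List.of("current-schema", "current-snapshot");
>>       Map<String, Object> response = catalog.loadTable(table, wanted);
>>       // If the server returned a superset, dropping the unused fields
>>       // is cheap as long as the data structures slice cleanly.
>>       Map<String, Object> sliced = new HashMap<>(response);
>>       sliced.keySet().retainAll(wanted);
>>       return sliced;
>>     }
>>   }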
>>
>> Similarly, if the client is only interested in knowing whether a table changed since point X (a time or a snapshot), that is also expressed in the API request. It may be a separate endpoint, or it may be possible to implement it as, for example, returning the latest snapshot ID.
>>
>> I understand there are use cases where engines want to operate directly on metadata files in storage. That is fine too, IMO; I am not proposing to change the Iceberg file format spec. At the same time, catalogs do not have to be limited to fetching data for the REST API from those files. Catalogs may choose to have additional storage partitioned and indexed differently than plain files.
>>
>> This is all very high level, of course, and it requires a lot of additional thinking about how to design API v2, but I believe we could achieve a more supportable and adoptable API v2 this way.
>>
>> Cheers,
>> Dmitri.
>>
>> On Thu, Oct 31, 2024 at 2:41 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>
>>> Eric,
>>>
>>> With respect to the credential endpoint, I believe there is important context missing that probably should have been captured in the doc. The credential endpoint is unlike other use cases because the fundamental issue is that refresh is an operation that happens across distributed workers. Workers in Spark/Flink/Trino/etc. all need to refresh credentials for long-running operations, which results in orders of magnitude higher request rates than a table load. We originally expected to use the table load even for this, but the concern was that it would effectively DDoS the catalog.
>>>
>>> If there are specific cases that have solid justification like the above, I think we should add specific endpoints, but those should be used sparingly.
>>>
>>> > In other words -- if it's true that "partial metadata doesn't align with primary use cases", it seems true that "full metadata doesn't align with almost all use cases".
>>>
>>> I don't find this argument compelling. Are you saying that in any case where everything from a response isn't fully used, we should optimize that request so that a client can request only the specific information it will use? Generally, we want a surface area that can address most use cases, and as a consequence, not every request is going to perfectly match the specific needs of the client.
>>>
>>> -Dan
>>>
>>>
>>> On Thu, Oct 31, 2024 at 11:03 AM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
>>>>
>>>> Thanks for this breakdown, Dan.
>>>>
>>>> I share your concerns about the complexity this might impose on the client. On some of your other notes, I have some thoughts below:
>>>>
>>>> Several Apache Polaris (Incubating) committers were in the recent sync on this proposal, so I want to share one perspective related to the last point re: Partial metadata impedes adoption.
>>>>
>>>> Personally, I feel better about the prospect of Polaris supporting a flexible loadTableV2-type API, as opposed to having to keep adding more endpoints to support new use cases that really just boil down to partial metadata. Gabor gives the example of isLatest above, and a recent proposal described an endpoint for credentials. I can't speak for every REST catalog implementation, but I am worried that Polaris will have to keep adding more APIs that really just expose various different slices of the loadTable response.
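>>>>
>>>> To make that concrete (every name below is hypothetical, only meant to show the shape of the two roads):
>>>>
>>>>   import java.util.Map;
>>>>   import java.util.Set;
>>>>
>>>>   // The road I'm worried about: one endpoint per slice of loadTable.
>>>>   interface PerSliceCatalog {
>>>>     boolean isLatest(String table, String metadataLocation);
>>>>     String metadataLocation(String table);
>>>>     Map<String, String> credentials(String table);
>>>>     // ...plus another endpoint for every future slice.
>>>>   }
>>>>
>>>>   // The alternative: one parameterized load. A minimal implementation
>>>>   // may ignore the filter and return everything, like loadTable "V1".
>>>>   interface LoadTableV2Catalog {
>>>>     Map<String, Object> loadTableV2(String table, Set<String> fields);
>>>>   }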
>>>>
>>>> I also like that loadTableV2 gives us the option to "partially implement" the partial metadata response, like you noted. Compared to something like a credential endpoint that either works or doesn't work, the loadTableV2 endpoint can be trivially implemented to just return all metadata like loadTable "V1" does. In my view, this makes the road to adoption easier.
>>>>
>>>> With respect to your section titled Partial metadata doesn't align with primary use cases:
>>>>
>>>> It's certainly true that many use cases do require a significant amount of the metadata returned by loadTable today. However, I would guess that very few truly require 100% of the metadata. If we are evaluating endpoints based on how consistently useful the response will be, I feel like this argument turns into a stronger one against loadTableV1 than loadTableV2.
>>>>
>>>> In other words -- if it's true that "partial metadata doesn't align with primary use cases", it seems true that "full metadata doesn't align with almost all use cases".
>>>>
>>>> Even if most use cases do need 90% of the metadata, it seems like a useful optimization for the client to not have to request whatever it doesn't need. This also gives us the flexibility to make table metadata richer in the future without having to worry about the cost a heavier metadata payload might incur for existing use cases.
>>>>
>>>> Eric M.
>>>>
>>>>
>>>> On Thu, Oct 31, 2024 at 10:37 AM Daniel Weeks <dwe...@apache.org> wrote:
>>>>>
>>>>> I'd like to clarify my concerns here because I think there are more aspects to this than we've captured.
>>>>>
>>>>> Partial metadata loads add significant complexity to the protocol
>>>>> Iceberg metadata is a complicated structure, and finding a way to represent how and what we want to piece apart is non-trivial. There are nested structures and references between different fields that would all need custom ways to return through a response. This also makes it difficult for clients to process and for services to implement. Adding this (even with an option to return full metadata with requirements that reflect the table spec) necessitates a v2 endpoint. If catalogs are required to support all partial load semantics, then the catalog becomes complicated. If the catalog can opt to always return the full metadata, it makes the client more complicated, since it may have to handle two very different-looking response objects for any load request.
>>>>>
>>>>> Partial metadata doesn't address the underlying issue, but pushes it somewhere else
>>>>> From a client perspective, I can see that this feels like an optimization, because I can just grab what I want from the metadata (e.g. schema, or properties). However, all we've done is push that complexity to the server, which either has to parse the metadata and return a subset of it, or needs to have a more complicated way of representing and storing independent pieces of metadata (all while still being required to produce new json metadata). All we've done here is make the service more complicated, and the underlying issue of maintenance of the metadata still needs to be addressed.
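>>>>>
>>>>> As a rough sketch of that first option (hypothetical names, just to show where the work lands), the service still has to materialize the full metadata before it can slice it:
>>>>>
>>>>>   import java.util.HashMap;
>>>>>   import java.util.Map;
>>>>>   import java.util.Set;
>>>>>
>>>>>   class PartialLoadService {
>>>>>     // The server parses the complete metadata document either way;
>>>>>     // filtering trims the response, not the server-side work.
>>>>>     static Map<String, Object> loadTable(String table, Set<String> fields) {
>>>>>       Map<String, Object> full = parseFullMetadataJson(table);
>>>>>       if (fields.isEmpty()) {
>>>>>         return full; // no filter requested: behave like the v1 endpoint
>>>>>       }
>>>>>       Map<String, Object> subset = new HashMap<>(full);
>>>>>       subset.keySet().retainAll(fields);
>>>>>       return subset;
>>>>>     }
>>>>>
>>>>>     static Map<String, Object> parseFullMetadataJson(String table) {
>>>>>       return new HashMap<>(); // stand-in for reading metadata.json
>>>>>     }
>>>>>   }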
>>>>>
>>>>> Partial metadata doesn't align with primary use cases
>>>>> The vast majority of use cases require a significant amount of the metadata returned in the load table response. While some pieces may be discarded, much of the information is necessary to read or update a table. The ref loading was an effort to limit the overall size of the response while including the vast majority of relevant information for read-only use cases, but even our most complete implementations still need the full metadata to properly construct a new commit and resolve conflicts.
>>>>>
>>>>> Even the example of Impala trying to load the location to determine if the table has changed is less than ideal, because to accurately answer that question, you need to load the metadata. For example, if there was a background compaction that resulted in a rewrite operation, or a property change that doesn't affect the underlying data, it may not be necessary to invalidate the cache. This approach becomes even more problematic if the community decides to remove the location requirement, because the location would then not be available to signify the state of the table.
>>>>>
>>>>> Partial metadata impedes adoption
>>>>> My biggest concern is that the added complexity here impedes adoption of the REST specification. There are a large number of engines and catalog implementations that are still in the early stages of the adoption curve. Partial metadata loads split these groups into the catalogs willing to implement them and the engines that start requiring them in order to function. While I think partial metadata loads are an interesting technical challenge, I don't believe they are necessary, and our effort should go into producing good solutions for metadata management and implementations of catalogs that can return the table metadata quickly to clients.
>>>>>
>>>>> I feel like focusing on table metadata maintenance addresses all of the issues except the most extreme edge cases, and good catalog implementations can return a metadata payload faster than most object stores can even load the metadata json file (in practice, single-digit millisecond responses are achievable here), so performance is not the tradeoff.
>>>>>
>>>>> - Dan
>>>>>
>>>>>
>>>>> On Tue, Oct 29, 2024 at 1:31 AM Gabor Kaszab <gaborkas...@apache.org> wrote:
>>>>>>
>>>>>> Hi Iceberg Community,
>>>>>>
>>>>>> I just wanted to mention that I was also going to start a discussion about getting partial information from LoadTableResponse through the REST API. My motivation is a bit different here, though:
>>>>>> Impala currently has strong integration with HMS and, in turn, with the HiveCatalog. Nowadays there are efforts in the project to make it work with the REST catalog for Iceberg tables, and there is one piece that we miss with the REST API. Impala caches table metadata, and we need a way to decide whether we have to reload the metadata for a particular table or not. Currently, with HMS we have a push-based solution where every change to the table is pushed to Impala from HMS as notifications/events; with the REST catalog we were thinking of a pull-based approach where Impala occasionally asks the REST catalog whether a particular table is up-to-date or not.
>>>>>>
>>>>>> Use-case: So in Impala's case what would be important is to have a REST Catalog API to answer a question like:
>>>>>> "I cached this version of this particular table, is it up-to-date or do I have to reload it?"
>>>>>>
>>>>>> Possible solutions:
>>>>>> 1) This could be achieved by an API like this:
>>>>>> boolean isLatest(TableIdentifier ident, String metadataLocation);
>>>>>> 2) Another approach could be to get the latest metadata location and let the engine compare it to the one it holds (see the sketch after this list):
>>>>>> String metadataLocation(TableIdentifier ident);
>>>>>> 3) Similarly to 2), querying the metadata location could also be achieved via the current proposal of partial metadata, like this (I just made up some types here):
>>>>>> Table loadTable(TableIdentifier ident, SomeFilterClass.MetadataLocation);
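>>>>>>
>>>>>> For illustration, with option 2) the engine-side check would reduce to a comparison like this (a sketch only; I use String for the identifier to keep it self-contained):
>>>>>>
>>>>>>   // Stand-in for the hypothetical metadataLocation() API from option 2).
>>>>>>   interface StalenessCheckCatalog {
>>>>>>     String metadataLocation(String ident);
>>>>>>   }
>>>>>>
>>>>>>   class CachedTable {
>>>>>>     final String ident;
>>>>>>     String cachedLocation;
>>>>>>
>>>>>>     CachedTable(String ident, String cachedLocation) {
>>>>>>       this.ident = ident;
>>>>>>       this.cachedLocation = cachedLocation;
>>>>>>     }
>>>>>>
>>>>>>     // Pull-based check: reload only when the latest metadata
>>>>>>     // location differs from the one we cached.
>>>>>>     boolean refreshIfStale(StalenessCheckCatalog catalog) {
>>>>>>       String latest = catalog.metadataLocation(ident);
>>>>>>       if (latest.equals(cachedLocation)) {
>>>>>>         return false; // cache is still up-to-date
>>>>>>       }
>>>>>>       cachedLocation = latest; // ...and reload the table metadata here
>>>>>>       return true;
>>>>>>     }
>>>>>>   }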
>>>>>>
>>>>>> Either way is fine for Impala I think, I just wanted to share our use-case that could also leverage getting partial metadata. Now that I have written this mail, it seems to hijack the original conversation a bit. Let me know if I should raise this in a separate [discuss] thread.
>>>>>>
>>>>>> Regards,
>>>>>> Gabor
>>>>>>
>>>>>> On Tue, Oct 29, 2024 at 2:16 AM Haizhou Zhao <zhaohaizhou940...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello Dev list,
>>>>>>>
>>>>>>> I want to update the community on the current thread for the proposal "Partially Loading Metadata - LoadTable V2" after hearing more perspectives from the community. In general, there is still some distance to go to reach a general consensus, so I hope to foster more conversations and hear new inputs.
>>>>>>>
>>>>>>> Previous Discussions (https://docs.google.com/document/d/1Nv7_9XqS8EyR30_mrrqkwbZx9pw34i3HYIwuDDXnOY4/edit?tab=t.0)
>>>>>>>
>>>>>>>
>>>>>>> 10/28/2024, quick google meet discussion
>>>>>>>
>>>>>>> Thanks, Christian, Dmitri, Eric, JB, Szehon, Yufei for your time and for voicing your opinions this morning. Here's a quick summary of what we discussed (detailed meeting notes are also included in the link above):
>>>>>>>
>>>>>>> Folks agreed that having a REST endpoint allowing clients to filter for what they need from LoadTableResult is a useful feature. The preliminary use cases that were brought up:
>>>>>>> 1. Load only the current snapshot and current schema
>>>>>>> 2. Load only the metadata file location
>>>>>>> 3. Load only the credentials to access the table
>>>>>>> 4. Query the historical state of the table when time traveling
>>>>>>> Meanwhile, it is also important for this endpoint to be extensible enough that it can accommodate similar future use cases that only require a portion of LoadTableResult (metadata included).
>>>>>>>
>>>>>>> What the group has no strong preference on, or needs further input on:
>>>>>>> 1. Whether to modify the existing loadTable endpoint for partial loading or to create a new endpoint. The possible concern here is backward compatibility.
>>>>>>> 2. Whether to add bulk support for cases like loading the current schema of all tables belonging to the same namespace (see the sketch below).
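>>>>>>>
>>>>>>> For 2., the bulk shape would roughly be the following (a hypothetical sketch, not a settled design):
>>>>>>>
>>>>>>>   import java.util.Map;
>>>>>>>
>>>>>>>   // One request per namespace instead of one loadTable call per
>>>>>>>   // table: returns the requested slice (here the current schema,
>>>>>>>   // e.g. as its JSON representation) keyed by table name.
>>>>>>>   interface BulkCatalog {
>>>>>>>     Map<String, String> currentSchemas(String namespace);
>>>>>>>   }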
>>>>>>>
>>>>>>> 10/23/2024, Iceberg community sync
>>>>>>>
>>>>>>> Thanks, Ryan, Dan, Yufei, JB, Russel and Szehon for your inputs here.
>>>>>>>
>>>>>>> Folks are divided on two aspects:
>>>>>>> 1. Can we use table maintenance work to keep metadata size in check, thus preventing the necessity to slice metadata at all?
>>>>>>> 2. Is it the same use case to bulk load part of the information for many tables and to load part of the information for one table?
>>>>>>>
>>>>>>>
>>>>>>> 10/09/2024, Dev list
>>>>>>>
>>>>>>> Thanks, Dan, Eduard for your inputs here.
>>>>>>>
>>>>>>> Folks are aligned here on extending the existing "refs" mode to other fields (i.e. metadata-log, snapshot-log, schemas), so that we can lazily load those fields if they are not needed.
>>>>>>>
>>>>>>>
>>>>>>> There are other parties from the community I had discussions on this topic with. I appreciate your input; I failed to mention those discussions here because I forgot to keep a written record of their context. In case you fall into this category, I do apologize.
>>>>>>>
>>>>>>>
>>>>>>> Summary of perspectives
>>>>>>>
>>>>>>> The original proposal was aimed at tackling the growing metadata problem, and proposed a loadTable V2 endpoint. As the last thread mentioned, the conclusion at the time was that extending the existing "refs" loading mode to more fields is preferable, as it introduces less complexity and is more feasible to implement.
>>>>>>>
>>>>>>> The later threads were where the community divided. On the one side, there's a general scepticism about the concept of partial metadata (i.e. unioning results from different requests has been a problem, even for "refs" lazy loading in the past); on the other side, there's a push to generalize the partial metadata concept to "LoadTableResult" as a whole (e.g. to only return the metadata file location, or only return table access creds, based on a client filter).
>>>>>>>
>>>>>>> Related is the concept of a bulk API. The community has raised this use case more than once, typically in connection with data warehouse management features, such as: 1) querying the current schemas of all the tables belonging to a namespace; 2) querying certain table properties of many tables to see if any maintenance (downstream) jobs should be triggered; 3) querying ownership information of all tables to check the security compliance of all the tables in the data warehouse, etc.
>>>>>>>
>>>>>>> I want to lay everything down and foster more discussion toward a good direction:
>>>>>>> 1. extend the current "refs" lazy loading mechanism into a more generic solution
>>>>>>> 2. prevent partial metadata at all cost, and try to contain metadata size so that it can always (or most of the time) be loaded in full
>>>>>>> 3. generalize the partial loading concept to the entire "LoadTableResult" (e.g. a generic loadTable V2 endpoint), so that users can use the same endpoint whether they want part of the metadata or another part of the "LoadTableResult" (e.g. metadata file location; table creds)
>>>>>>> 4. repurpose the last direction into a bulk API for the REST spec, where loading pieces of information from many tables is permitted
>>>>>>> Or let me know if there are other directions I failed to account for here.
>>>>>>>
>>>>>>> Looking forward to feedback/discussion from the community, thanks!
>>>>>>> Haizhou