Thank you for the proposal, Steven! I did an initial read of the doc, and this is something that would be very valuable for us too. Let me know if there is any way I can help out here!
Looking at the recent conversation here, just sharing some ideas: I believe there could be multiple use cases for this batch load endpoint. The caller is either a federating catalog that sends batch load requests, or a query engine loading the tables relevant to a particular query. In the former case I think it's fine to skip the missing tables and continue processing the rest, while in the latter I think there is no point in partial results. Maybe adding a header to toggle between the two server-side behaviors makes sense? (I admit, in the latter case the list of requested tables is probably much shorter, so it might not make much difference.)

Best Regards,
Gabor

Yufei Gu <[email protected]> wrote (on Wed, Feb 4, 2026, 7:19):

> My understanding is that most batch loading use cases are not
> transactional. Continuing to load the remaining tables and returning the
> status per table feels more consistent than failing fast.
>
> Continuing to load also aligns with how other catalogs (Glue, UC, HMS)
> handle batch metadata fetches: the catalog focuses on returning as much
> information as possible, and engines or clients decide whether partial
> results are acceptable.
>
> Yufei
>
>
> On Tue, Feb 3, 2026 at 6:12 PM Guotao Yu <[email protected]> wrote:
>
>> Perhaps I have missed some details, please help me correct them. What I
>> want to express is this requirement: when encountering the first
>> non-existent table, the behavior of this interface may be one of the
>> following:
>>
>> 1. Continue loading the remaining tables and return the status of each
>> table.
>>
>> 2. Immediately terminate and add the remaining tables to the
>> unprocessed-tables list.
>>
>> The behavior I see in the proposal is the first one.
>>
>>
>> On Feb 4, 2026 at 10:00:25, Steven Wu <[email protected]> wrote:
>>
>>> > 1. Can the batch loading interface support quick failure? If the
>>> engine hopes to fail immediately upon encountering a single table during
>>> the planning process, there is no need to load subsequent tables.
>>>
>>> This behavior aligns with the current wording in the design doc, as the
>>> status code for an individual table in the response payload only covers "Ok",
>>> "NotFound", "NotModified". Any other failure (like authorization) will
>>> cause the whole request to fail. This section
>>> <https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0>
>>> talked about partial failure for authorization issues.
>>>
>>> This is probably worth discussing. Not sure what others think.
>>>
>>> > 2. Is it necessary to support time travel configuration at the table
>>> level? In time travel queries, the engine may need to load specific
>>> versions of table metadata.
>>>
>>> Time travel is handled entirely on the client side. Catalog just returns
>>> the snapshot history in the table.
>>>
>>> But if you are talking about the multi-table multi-statement
>>> transaction, that is mentioned in the non-goal of this proposal. Whatever
>>> outcome results from the transaction discussion can apply to both single
>>> and batch table load endpoints.
>>>
>>> On Tue, Feb 3, 2026 at 5:48 PM Guotao Yu <[email protected]> wrote:
>>>
>>>> Hi Steven,
>>>>
>>>> On Feb 3, 2026 at 02:04:13, Steven Wu <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to discuss a proposal to add batch load endpoints for
>>>>> tables and views to the REST spec.
>>>>>
>>>>> https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0
>>>>>
>>>>> It can help with the use cases
>>>>> * Catalog federation: improve refresh throughput and reduce load on
>>>>> the source catalog
>>>>> * Improve planning performance by loading multiple referenced tables
>>>>> in one request.
>>>>> * Improve MV freshness evaluation if referencing multiple source tables
>>>>>
>>>>> Thanks,
>>>>> Steven
>>>>>
>>>>
>>>> I strongly agree with the role of batch loading interfaces in
>>>> optimizing the planning performance of the engine. In specific
>>>> implementations, concurrent loading can also be used to improve scenarios
>>>> involving many tables in queries. After looking at the batch loading API, I
>>>> have the following questions:
>>>>
>>>> 1. Can the batch loading interface support quick failure? If the engine
>>>> hopes to fail immediately upon encountering a single table during the
>>>> planning process, there is no need to load subsequent tables.
>>>>
>>>> 2. Is it necessary to support time travel configuration at the table
>>>> level? In time travel queries, the engine may need to load specific
>>>> versions of table metadata.
>>>>
>>>> --
>>>> Regards
>>>> Guotao Yu
>>>>
>>>
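
To make the two behaviors discussed in this thread concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the proposal's API: `batch_load`, `BatchLoadResult`, and the `fail_fast` flag are hypothetical names (the flag stands in for the header-style toggle suggested above), and the catalog is modeled as a plain dict. Only the "Ok"/"NotFound" statuses and the idea of an unprocessed-tables list come from the discussion.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BatchLoadResult:
    # Per-table status; "Ok" and "NotFound" are statuses named in the thread.
    statuses: dict[str, str] = field(default_factory=dict)
    # Tables that were never attempted (only populated in fail-fast mode).
    unprocessed: list[str] = field(default_factory=list)


def batch_load(catalog: dict[str, dict], names: list[str],
               fail_fast: bool = False) -> BatchLoadResult:
    """Hypothetical batch load over an in-memory catalog (assumption)."""
    result = BatchLoadResult()
    for i, name in enumerate(names):
        if name in catalog:
            result.statuses[name] = "Ok"
            continue
        result.statuses[name] = "NotFound"
        if fail_fast:
            # Option 2: stop at the first missing table and report the rest
            # as unprocessed instead of loading them.
            result.unprocessed = list(names[i + 1:])
            break
        # Option 1 (default): keep going and report a status per table.
    return result


catalog = {"db.orders": {}, "db.customers": {}}
requested = ["db.orders", "db.missing", "db.customers"]

print(batch_load(catalog, requested))
print(batch_load(catalog, requested, fail_fast=True))
```

With the default, a federating catalog gets a status for every requested table; with `fail_fast=True`, a query engine learns about the first missing table without the server doing the remaining work.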
