Re: [DISCUSS] REST: batch load endpoints

Gábor Kaszab Wed, 04 Feb 2026 01:57:28 -0800

Thank you for the proposal Steven!
I made an initial read of the doc and this is something that would be very
valuable for us too. Let me know if there is any way I can help out here!


Having read the recent conversation here, I'd like to share some ideas:
I believe there could be multiple use cases for this batch load endpoint.
It's either a federating catalog that sends batch load requests or it could
be a query engine loading the tables relevant for a particular query. In
the former I think it's fine to skip the missing tables and continue
processing the rest, while in the latter case I think there is no point in
partial results. Maybe adding a header to toggle between the two server-side
behaviors makes sense? (I admit, in the latter case the list of requested
tables is probably much shorter, so it might not make much of a difference.)
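
To make the two behaviors concrete, here is a rough sketch in Python. It is
only an illustration under my own assumptions: `batch_load`, the `lookup`
callback and the `fail_fast` flag are hypothetical names, not part of the
proposal, and how a client would actually request either mode (header or
otherwise) is left open. The status names and the unprocessed list follow the
wording used in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Sequence


class Status(Enum):
    # Per-table statuses named in the thread ("NotModified" omitted for brevity).
    OK = "Ok"
    NOT_FOUND = "NotFound"


@dataclass
class BatchResult:
    statuses: Dict[str, Status]   # status for every table that was processed
    unprocessed: List[str]        # tables skipped after an early stop


def batch_load(
    names: Sequence[str],
    lookup: Callable[[str], Optional[dict]],  # hypothetical: returns None if missing
    fail_fast: bool = False,                  # hypothetical toggle
) -> BatchResult:
    statuses: Dict[str, Status] = {}
    for i, name in enumerate(names):
        if lookup(name) is None:
            statuses[name] = Status.NOT_FOUND
            if fail_fast:
                # Stop at the first missing table; report the rest as unprocessed.
                return BatchResult(statuses, list(names[i + 1:]))
        else:
            statuses[name] = Status.OK
    # Default mode: every table was attempted, nothing is left unprocessed.
    return BatchResult(statuses, [])
```

With the default mode a federating catalog gets a status for every table,
while `fail_fast=True` lets an engine stop at the first missing table.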

Best Regards,
Gabor



Yufei Gu <[email protected]> wrote (on Wed, Feb 4, 2026,
7:19):

> My understanding is that most batch loading use cases are not
> transactional. Continuing to load the remaining tables and returning the
> status per table feels more consistent than failing fast.
>
> Continuing to load also aligns with how other catalogs (Glue, UC, HMS)
> handle batch metadata fetches: the catalog focuses on returning as much
> information as possible, and engines or clients decide whether partial
> results are acceptable.
>
> Yufei
>
>
> On Tue, Feb 3, 2026 at 6:12 PM Guotao Yu <[email protected]> wrote:
>
>> Perhaps I have missed some details, please help me correct them. What I
>> want to express is this requirement: when encountering the first
>> non-existent table, the behavior of this interface may be one of the
>> following:
>>
>> 1. Continue loading the remaining tables and return the status of each
>> table.
>>
>> 2. Immediately terminate and add the remaining tables to the
>> unprocessed-tables list.
>>
>> The behavior I see in the proposal is the first one.
>>
>>
>> On Feb 4, 2026, at 10:00:25, Steven Wu <[email protected]> wrote:
>>
>>> > 1. Can the batch loading interface support quick failure? If the
>>> engine hopes to fail immediately upon encountering a single table during
>>> the planning process, there is no need to load subsequent tables.
>>>
>>> This behavior aligns with the current wording in the design doc, as the
>>> status code for an individual table in the response payload only covers "Ok",
>>> "NotFound", "NotModified". Any other failure (like authorization) will
>>> cause the whole request to fail. This section
>>> <https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0>
>>> talked about partial failure for authorization issues.
>>>
>>> This is probably worth discussing. Not sure what others think.
>>>
>>> > 2. Is it necessary to support time travel configuration at the table
>>> level? In time travel queries, the engine may need to load specific
>>> versions of table metadata.
>>>
>>> Time travel is handled entirely on the client side. Catalog just returns
>>> the snapshot history in the table.
>>>
>>> But if you are talking about multi-table, multi-statement
>>> transactions, those are listed as a non-goal of this proposal. Whatever
>>> outcome results from the transaction discussion can apply to both single
>>> and batch table load endpoints.
>>>
>>> On Tue, Feb 3, 2026 at 5:48 PM Guotao Yu <[email protected]> wrote:
>>>
>>>> Hi Steven,
>>>>
>>>> On Feb 3, 2026, at 02:04:13, Steven Wu <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to discuss a proposal to add batch load endpoints for
>>>>> tables and views to the REST spec.
>>>>>
>>>>> https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0
>>>>>
>>>>> It can help with the use cases
>>>>> * Catalog federation: improve refresh throughput and reduce load on
>>>>> the source catalog
>>>>> * Improve planning performance by loading multiple referenced tables
>>>>> in one request.
>>>>> * Improve MV freshness evaluation if referencing multiple source tables
>>>>>
>>>>> Thanks,
>>>>> Steven
>>>>>
>>>>
>>>> I strongly agree with the role of batch loading interfaces in
>>>> optimizing the planning performance of the engine. In specific
>>>> implementations, concurrent loading can also be used to improve scenarios
>>>> involving many tables in queries. After looking at the batch loading API, I
>>>> have the following questions:
>>>>
>>>> 1. Can the batch loading interface support quick failure? If the engine
>>>> hopes to fail immediately upon encountering a single table during the
>>>> planning process, there is no need to load subsequent tables.
>>>>
>>>> 2. Is it necessary to support time travel configuration at the table
>>>> level? In time travel queries, the engine may need to load specific
>>>> versions of table metadata.
>>>>
>>>> —
>>>> Regards
>>>> Guotao Yu
>>>>
>>>
