Re: [DISCUSS] REST: batch load endpoints

Yufei Gu Tue, 03 Feb 2026 22:19:09 -0800

My understanding is that most batch loading use cases are not
transactional. Continuing to load the remaining tables and returning the
status per table feels more consistent than failing fast.


Continuing to load also aligns with how other catalogs (Glue, UC, HMS)
handle batch metadata fetches: the catalog focuses on returning as much
information as possible, and engines or clients decide whether partial
results are acceptable.
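
To make the two behaviors under discussion concrete, here is a minimal
Python sketch. It is not taken from the design doc: `batch_load`,
`load_one`, `BatchResult`, `TableNotFound`, and the `fail_fast` flag are
hypothetical names used only for illustration. The per-table status values
and the unprocessed-tables list are the ones mentioned in this thread; any
other error (for example authorization) is assumed to propagate and fail
the whole request, and "NotModified" handling is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class LoadStatus(Enum):
    # Per-table status values named in this thread.
    OK = "Ok"
    NOT_FOUND = "NotFound"
    NOT_MODIFIED = "NotModified"  # listed for completeness, unused below


class TableNotFound(Exception):
    """Hypothetical error raised by the loader for a missing table."""


@dataclass
class BatchResult:
    statuses: Dict[str, LoadStatus]
    metadata: Dict[str, dict]
    unprocessed: List[str]


def batch_load(
    names: List[str],
    load_one: Callable[[str], dict],
    fail_fast: bool = False,
) -> BatchResult:
    """Load tables one by one and record a status per table.

    fail_fast=False: keep going after a missing table (behavior 1).
    fail_fast=True: stop at the first missing table and report the
    rest as unprocessed (behavior 2).
    Any exception other than TableNotFound propagates to the caller.
    """
    statuses: Dict[str, LoadStatus] = {}
    metadata: Dict[str, dict] = {}
    unprocessed: List[str] = []
    for i, name in enumerate(names):
        try:
            metadata[name] = load_one(name)
            statuses[name] = LoadStatus.OK
        except TableNotFound:
            statuses[name] = LoadStatus.NOT_FOUND
            if fail_fast:
                unprocessed = list(names[i + 1:])
                break
    return BatchResult(statuses, metadata, unprocessed)


# Toy in-memory "catalog" standing in for a real backend.
catalog = {"db.a": {"format-version": 2}, "db.c": {"format-version": 2}}


def load_one(name: str) -> dict:
    if name not in catalog:
        raise TableNotFound(name)
    return catalog[name]


result = batch_load(["db.a", "db.b", "db.c"], load_one)
```

With the default, `db.c` is still loaded after `db.b` is reported as
missing, and the client decides whether the partial result is acceptable.
With `fail_fast=True`, `db.c` would instead end up in the unprocessed list.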

Yufei


On Tue, Feb 3, 2026 at 6:12 PM Guotao Yu <[email protected]> wrote:

> Perhaps I have missed some details, please help me correct them. What I
> want to express is this requirement: when encountering the first
> non-existent table, the behavior of this interface may be one of the
> following:
>
> 1. Continue loading the remaining tables and return the status of each
> table.
>
> 2. Immediately terminate and add the remaining tables to the
> unprocessed-tables list.
>
> The behavior I see in the proposal is the first one.
>
>
> On Feb 4, 2026 at 10:00:25, Steven Wu <[email protected]> wrote:
>
>> > 1. Can the batch loading interface support quick failure? If the engine
>> hopes to fail immediately upon encountering a single table during the
>> planning process, there is no need to load subsequent tables.
>>
>> This behavior aligns with the current wording in the design doc, as the
>> status code for an individual table in the response payload only covers "Ok",
>> "NotFound", and "NotModified". Any other failure (like authorization) will
>> cause the whole request to fail. This section
>> <https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0>
>> talked about partial failure for authorization issues.
>>
>> This is probably worth discussing. Not sure what others think.
>>
>> > 2. Is it necessary to support time travel configuration at the table
>> level? In time travel queries, the engine may need to load specific
>> versions of table metadata.
>>
>> Time travel is handled entirely on the client side. Catalog just returns
>> the snapshot history in the table.
>>
>> But if you are talking about multi-table, multi-statement transactions,
>> that is listed as a non-goal of this proposal. Whatever outcome
>> results from the transaction discussion can apply to both single and batch
>> table load endpoints.
>>
>> On Tue, Feb 3, 2026 at 5:48 PM Guotao Yu <[email protected]> wrote:
>>
>>> Hi Steven,
>>>
>>> On Feb 3, 2026 at 02:04:13, Steven Wu <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I would like to discuss a proposal to add batch load endpoints for
>>>> tables and views to the REST spec.
>>>>
>>>> https://docs.google.com/document/d/1VW5hgaaajRWtp5KbOU3s83YyoyPi5WOSvHtoJ_yXzJs/edit?tab=t.0
>>>>
>>>> It can help with the following use cases:
>>>> * Catalog federation: improve refresh throughput and reduce load on the
>>>> source catalog
>>>> * Improve planning performance by loading multiple referenced tables in
>>>> one request.
>>>> * Improve MV freshness evaluation if referencing multiple source tables
>>>>
>>>> Thanks,
>>>> Steven
>>>>
>>>
>>> I strongly agree with the role of batch loading interfaces in optimizing
>>> the planning performance of the engine. In specific implementations,
>>> concurrent loading can also be used to improve scenarios involving many
>>> tables in queries. After looking at the batch loading API, I have the
>>> following questions:
>>>
>>> 1. Can the batch loading interface support quick failure? If the engine
>>> hopes to fail immediately upon encountering a single table during the
>>> planning process, there is no need to load subsequent tables.
>>>
>>> 2. Is it necessary to support time travel configuration at the table
>>> level? In time travel queries, the engine may need to load specific
>>> versions of table metadata.
>>>
>>> —
>>> Regards
>>> Guotao Yu
>>>
>>
