Re: Optimize object lookup in REST catalog

Vladimir Ozerov Tue, 17 Dec 2024 22:22:05 -0800

Hi Piotr, Yufei,

Thanks for the feedback.


In addition to a single object lookup and namespace listing, is there
anything else that can potentially help query engines reduce latency during
semantic analysis?

As an example, maybe a bulk object lookup? Say you have 10 objects in a
query. Usually, one would traverse the AST, resolving objects one by one.
With bulk lookup, an engine can collect all object references first and
call the REST catalog once instead of 10 times. This could be useful for
convoluted BI semantic models or ODS layer querying.
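
To make the round-trip difference concrete, here is a rough sketch. It
uses an in-memory stand-in rather than a real REST catalog, and every name
in it (StubCatalog, load_objects, resolve_bulk) is hypothetical and not
part of any existing Iceberg API. It only counts simulated calls.

```python
from dataclasses import dataclass


@dataclass
class StubCatalog:
    """In-memory stand-in for a REST catalog; counts simulated round trips."""
    tables: dict
    views: dict
    calls: int = 0

    def load_table(self, name):
        self.calls += 1
        return self.tables.get(name)

    def load_view(self, name):
        self.calls += 1
        return self.views.get(name)

    def load_objects(self, names):
        # Hypothetical bulk endpoint: one round trip for all requested names.
        self.calls += 1
        merged = {**self.tables, **self.views}
        return {name: merged.get(name) for name in names}


def resolve_one_by_one(catalog, names):
    """Resolve each reference separately, guessing table first, then view."""
    resolved = {}
    for name in names:
        obj = catalog.load_table(name)
        if obj is None:
            obj = catalog.load_view(name)
        resolved[name] = obj
    return resolved


def resolve_bulk(catalog, names):
    """Collect all references first, then ask the catalog once."""
    return catalog.load_objects(sorted(set(names)))
```

With two tables and one view referenced in a query, the one-by-one path
costs one call per table and two per view, while the bulk path costs a
single call regardless of how many objects the query touches.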

For object listing, do you think that additional filters (like in JDBC or
Arrow Flight SQL) or sorting might be useful here? It would be nice to
have several examples of real metadata queries generated by BI tools for
better understanding.

Trying to collect more pain points to wrap my head around the potential
proposal.

*Vladimir Ozerov*

On Wed, Dec 18, 2024 at 05:12, Yufei Gu <flyrain...@gmail.com> wrote:

> Seems like a nice optimization. I also echo Piotr's point about the list
> endpoints. Either a `relation` or a `table-like` endpoint would be good to
> have. Looking forward to a formal proposal!
>
> Yufei
>
>
> On Thu, Dec 5, 2024 at 5:37 AM Piotr Findeisen <piotr.findei...@gmail.com>
> wrote:
>
>> Hi
>>
>> I like the idea of a single "get relation" call that fetches the relation
>> in one shot. The same applies to listing relations. This is obviously a
>> less common operation, but not uncommon, and also more expensive.
>> BI tools query information_schema.tables (and other information_schema
>> information).
>> To complete an information_schema.tables query we need to call "list
>> tables" and "list views" separately, whereas it could be sufficient to
>> "list relations".
>>
>> Given that tables, views, and materialized views all share a single
>> namespace, such a unified API should be possible to add and would
>> definitely be beneficial for users.
>>
>> Best,
>> Piotr
>>
>>
>>
>>
>> On Thu, 5 Dec 2024 at 08:53, Vladimir Ozerov <voze...@querifylabs.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Consider the query “SELECT * FROM t”.
>>>
>>> The query engine needs to resolve the object “t” during semantic
>>> analysis. In Iceberg, this could be a table, a view, or (soon) a
>>> materialized view.
>>>
>>> Currently, the engine has to guess the object type via multiple REST
>>> calls, e.g. loadTable -> loadView. This increases latency and REST server
>>> load, bloats audit records, etc.
>>>
>>> Caching may help to some extent, but is not a general solution: it
>>> doesn’t work for one-shot executions (Spark), breaks security and audit,
>>> is inefficient in the case of near real-time ingestion, etc.
>>>
>>> Do you think it is worth adding an endpoint that returns an object's
>>> metadata in one hop? Like “loadObject: oneof(Table, View)”. A careful
>>> analysis may reveal more use cases, potentially prompting an even more
>>> generic endpoint.
>>>
>>> If this was discussed earlier, could you kindly point me to the
>>> discussion (couldn’t find one)?
>>>
>>> Regards,
>>> Vladimir
>>>
>>
