A motivational example: Trino recently had to implement parallel table metadata fetching (https://github.com/trinodb/trino/pull/23909) because metadata queries (e.g., against INFORMATION_SCHEMA) were otherwise slow. Parallel metadata retrieval boosted metadata query performance significantly, but this solution is far from ideal:
1. Catalogs will now experience request bursts whenever a user or a tool attempts to list Iceberg objects in Trino. This may induce unpredictable latency spikes, especially for large schemas.
2. Each such request imposes a constant catalog overhead for request dispatching, serde, security checks, etc., which could easily be avoided with bulk metadata lookup.
3. The aforementioned fix addresses only parallel table retrieval. The engine will then have to support the same thing for views and materialized views, producing even more request bursts, with a considerable number of requests returning error responses because we cannot get an object's type and its metadata in one shot.

On Tue, Dec 24, 2024 at 10:29 PM Vladimir Ozerov <voze...@querifylabs.com> wrote:

> Hi,
>
> Following the discussion [1] I'd like to formally propose an extension to
> REST catalog API that allows efficient lookup of multiple catalog objects
> without knowing their types in advance.
>
> When a query is submitted, the engine needs to resolve referenced objects.
> The current REST API requires multiple catalog calls per query, because it
> (1) assumes prior knowledge of the object type (not the case for
> virtually all query engines), and (2) lacks a bulk object lookup operation.
> This leads to increased query latency and increased REST catalog load.
>
> The proposal aims to solve the problem by introducing an optional endpoint
> that returns information about several catalog objects, including their
> type (table, view) and metadata.
>
> Note that the proposal attempts to solve two distinct issues via a single
> endpoint:
>
> 1. Inability to look up an object without knowing its type
> 2. Inability to look up multiple objects in a single request
>
> If the community finds the proposal too complicated, we can minimize the
> scope to point 1, and introduce an endpoint for object lookup without
> knowing its type. Even without bulk lookup this can help engine developers
> minimize SQL query planning latency.
>
> Proposal:
> https://docs.google.com/document/d/1KfzdQT8Q2xiV_yPNvICROCepz-Qqpm0npob7hmb40Fc/edit?usp=sharing
>
> [1] https://lists.apache.org/thread/g44czzpjqqhdvronqfyckw4mnxvlpn3s
>
> Regards,
> --
> *Vladimir Ozerov*
>

--
*Vladimir Ozerov*
Founder
querifylabs.com
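
For illustration only, here is a minimal sketch of the request-count difference discussed in this thread. It is not the proposed API and does not call any real catalog: `FakeCatalog`, `load_table`, `load_view`, `load_objects` and the object names are all hypothetical, in-memory stand-ins. It assumes an engine that, not knowing an object's type, tries a table lookup first and falls back to a view lookup on a "not found" response.

```python
from dataclasses import dataclass


class NotFound(Exception):
    """Stands in for a 'not found' error response from the catalog."""


@dataclass
class FakeCatalog:
    """Hypothetical in-memory catalog that counts requests and error responses."""
    tables: dict
    views: dict
    requests: int = 0
    errors: int = 0

    def _load(self, store, name):
        # Every typed lookup costs one request, whether or not it succeeds.
        self.requests += 1
        if name not in store:
            self.errors += 1
            raise NotFound(name)
        return store[name]

    def load_table(self, name):
        return self._load(self.tables, name)

    def load_view(self, name):
        return self._load(self.views, name)

    def load_objects(self, names):
        """Hypothetical bulk lookup: one request, returns (type, metadata) per object."""
        self.requests += 1
        found = {}
        for name in names:
            if name in self.tables:
                found[name] = ("table", self.tables[name])
            elif name in self.views:
                found[name] = ("view", self.views[name])
        return found


def resolve_one_by_one(catalog, names):
    """Resolve objects of unknown type: try table first, then fall back to view."""
    resolved = {}
    for name in names:
        try:
            resolved[name] = ("table", catalog.load_table(name))
        except NotFound:
            try:
                resolved[name] = ("view", catalog.load_view(name))
            except NotFound:
                pass  # object does not exist under either type
    return resolved


def make_catalog():
    return FakeCatalog(
        tables={"orders": {"format-version": 2}, "customers": {"format-version": 2}},
        views={"daily_sales": {"view-version": 1}},
    )


names = ["orders", "customers", "daily_sales"]

per_object_catalog = make_catalog()
per_object = resolve_one_by_one(per_object_catalog, names)

bulk_catalog = make_catalog()
bulk = bulk_catalog.load_objects(names)

print(per_object_catalog.requests, per_object_catalog.errors)  # 4 1
print(bulk_catalog.requests, bulk_catalog.errors)              # 1 0
```

Both paths resolve the same three objects, but the per-object path issues four requests, one of which is a wasted error response for the view, while the bulk path issues one. The gap grows with the number of referenced objects and with the share of views among them.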