Hi Fokko

I like the idea, but I think it's more of a workaround and could be
confusing for users :)

Regards
JB

On Tue, Nov 12, 2024 at 2:53 PM Fokko Driesprong <fo...@apache.org> wrote:
>
> Hey Gabor,
>
> Thanks for raising this. While reading this, my first thought is to leverage 
> the `tableExists` operation:
> https://github.com/apache/iceberg/blob/e3f39972863f891481ad9f5a559ffef093976bd7/open-api/rest-catalog-open-api.yaml#L1129-L1160
>
> This doesn't return anything today, but we could return a payload pointing
> to the latest metadata.json.
>
> Looking forward to what others think.
>
> Kind regards,
> Fokko
>
> On Tue, Nov 12, 2024 at 2:33 PM Shani Elharrar <sh...@upsolver.com.invalid> wrote:
>>
>> I recommend option (b), provided there is no partial metadata loading. We 
>> implemented option (b) internally to facilitate partial metadata loading, as 
>> we have tables with hundreds of thousands of snapshots. This results in 
>> metadata that occupies approximately 500 MB in memory (excluding the 
>> JsonNodes), which is a significant load for some of our services.
>>
>> Shani.
>>
>> On 12 Nov 2024, at 14:12, Gabor Kaszab <gaborkas...@apache.org> wrote:
>>
>> Hey Iceberg Community,
>>
>> Background:
>> Impala is designed to cache Iceberg table metadata (BaseTable objects in
>> practice) for faster access. Currently, Impala is tightly coupled with HMS
>> and, in turn, with HiveCatalog. To keep the cached table objects up to date,
>> HMS drives a notification mechanism that tells Impala about any changes in
>> the table metadata.
>> The Impala community is actively looking for ways to decouple Impala from
>> HMS, so that Impala can be used without HMS and can get Iceberg table
>> metadata from other catalog implementations, focusing mainly on REST
>> catalogs for now.
>>
>> Problem to solve:
>> We identified a missing piece of functionality in the current REST spec:
>> engines that cache table metadata currently have no way to check whether
>> that metadata is up to date, and therefore whether they should reload it,
>> without fetching the whole table object from the catalog. I think the REST
>> catalog (and in fact any other catalog) should be able to answer a
>> question like:
>> "Hi Catalog, I have this version of this table, is it up-to-date?"
>>
>> Proposal:
>> I've been following the discussion about partial metadata loading, which
>> could also be used to answer the above question, but my impression is
>> that the conversation has stalled.
>> So instead of waiting for partial metadata loading, I propose adding
>> something to the REST spec now to answer the question I raised above:
>>
>> a) boolean isLatest(TableIdentifier ident, String metadataLocation);
>> b) String metadataLocation(TableIdentifier ident);
>>
>> Either of the two approaches above could help engines decide whether they
>> have to invalidate/reload particular table metadata in the cache. I
>> personally would go for option a) but am open to other opinions.
>>
>> I'd like to know whether the community would support me in extending the
>> REST spec with either of the two options.
>>
>> Regards,
>> Gabor
>>
>>
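
For illustration only, here is a minimal Java sketch of how an engine-side cache might use either option from the quoted proposal. Everything in it is an assumption: neither `isLatest` nor `metadataLocation` exists in the REST spec today, the `FreshnessCatalog`, `InMemoryCatalog` and `MetadataCache` names are hypothetical, and a plain `String` stands in for `TableIdentifier` so the sketch has no Iceberg dependency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface mirroring options a) and b) from the proposal.
// Neither method is part of the Iceberg REST spec today.
interface FreshnessCatalog {
    // Option a): is the given metadata location the current one for the table?
    boolean isLatest(String tableIdent, String metadataLocation);

    // Option b): return the current metadata location for the table.
    String metadataLocation(String tableIdent);
}

// In-memory stand-in for a catalog, used only to make the sketch runnable.
class InMemoryCatalog implements FreshnessCatalog {
    private final Map<String, String> current = new HashMap<>();

    // Simulates a commit that moves the table to a new metadata file.
    void commit(String tableIdent, String metadataLocation) {
        current.put(tableIdent, metadataLocation);
    }

    @Override
    public boolean isLatest(String tableIdent, String metadataLocation) {
        return metadataLocation != null
                && metadataLocation.equals(current.get(tableIdent));
    }

    @Override
    public String metadataLocation(String tableIdent) {
        return current.get(tableIdent);
    }
}

// Engine-side cache that remembers which metadata location it loaded.
class MetadataCache {
    private final FreshnessCatalog catalog;
    private final Map<String, String> cachedLocations = new HashMap<>();

    MetadataCache(FreshnessCatalog catalog) {
        this.catalog = catalog;
    }

    // Records the metadata location the engine has loaded for a table.
    void remember(String tableIdent, String metadataLocation) {
        cachedLocations.put(tableIdent, metadataLocation);
    }

    // Option a): the catalog does the comparison and returns a boolean.
    boolean isStaleViaIsLatest(String tableIdent) {
        String cached = cachedLocations.get(tableIdent);
        return cached == null || !catalog.isLatest(tableIdent, cached);
    }

    // Option b): the engine fetches the location and compares it locally.
    boolean isStaleViaLocation(String tableIdent) {
        String cached = cachedLocations.get(tableIdent);
        return cached == null
                || !cached.equals(catalog.metadataLocation(tableIdent));
    }
}
```

In this sketch both options lead to the same reload decision. The difference is where the comparison happens: with a) the catalog compares and returns only a boolean, while with b) the engine also learns the new location.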
