Chetas, Iceberg already implements what you're describing. There's a
caching layer implemented as a catalog wrapper, `CachingCatalog`. It is
turned on by default in the Flink catalog, but the default expiration
interval is 30s. Maybe you need to extend that interval by setting
`cache.expiration-interval-ms` in your catalog config?
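
As a rough sketch of what that could look like, the snippet below builds a catalog property map with a longer expiration interval. The property keys `cache-enabled` and `cache.expiration-interval-ms` are Iceberg catalog properties; the class name, the helper method, the warehouse path, and the 5-minute value are made up for illustration, and how the map reaches your Flink catalog depends on your setup.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: only assembles the property map, it does not create a catalog.
class CatalogCacheConfig {
    // Iceberg catalog property keys that control the caching layer.
    static final String CACHE_ENABLED = "cache-enabled";
    static final String CACHE_EXPIRATION_INTERVAL_MS = "cache.expiration-interval-ms";

    static Map<String, String> catalogProperties(Duration cacheExpiration) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
        // Assumption: placeholder warehouse location, replace with your own.
        props.put("warehouse", "s3://my-bucket/warehouse");
        props.put(CACHE_ENABLED, "true");
        // The interval is expressed in milliseconds, passed as a string value.
        props.put(CACHE_EXPIRATION_INTERVAL_MS, Long.toString(cacheExpiration.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        // Print the properties for a 5-minute expiration interval.
        catalogProperties(Duration.ofMinutes(5))
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The trade-off is that a longer interval means a cached table can lag behind new commits for up to that long, so pick a value that matches how fresh your reads need to be.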

On Wed, Mar 6, 2024 at 11:52 AM Chetas Joshi <chetas.jo...@gmail.com> wrote:

> Hi Community,
>
> I am working on loading Iceberg data from S3 using Flink. I am using
> GlueCatalog to store the Iceberg table metadata. I found that
> GlueCatalog’s loadTable call (implemented
> <https://github.com/apache/iceberg/blob/apache-iceberg-1.4.0/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L46>
> in the abstract class BaseMetastoreCatalog) creates a new instance of
> GlueTableOperations every time it is called for a Glue table identifier.
> This instance is initialized with shouldRefresh = true, so it refreshes
> the table metadata every time loadTable is called for that table
> identifier, even if it was called in the recent past. I am wondering why
> these TableOperations instances are not cached in the catalog. I suggest
> the following change to the newTableOps method
> <https://github.com/apache/iceberg/blob/apache-iceberg-1.4.0/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L205>
> in GlueCatalog (and other catalog impls) and would really appreciate
> the community's feedback on this.
>
> protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
>     // tableCache is a Cache with key=tableIdentifier and
>     // value=GlueTableOperations object
>     if (tableCache.containsKey(tableIdentifier)) {
>        return tableCache.get(tableIdentifier);
>     } else {
>        return new GlueTableOperations(....);
>     }
> }
>
> If you like the approach, I am happy to contribute to open source. Let me
> know.
>
> Thank you
> Chetas
>
>


-- 
Ryan Blue
Tabular
