Chetas, Iceberg already has an implementation of what you're describing: a caching layer implemented as a catalog, `CachingCatalog`. It is turned on by default in the Flink catalog, but the default expiration interval is 30s. Maybe you need to extend that interval by setting `cache.expiration-interval-ms` in your catalog config?
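As a rough illustration, here is a minimal sketch of the catalog properties involved. `cache-enabled` and `cache.expiration-interval-ms` are the Iceberg catalog property keys; the class and method names, the warehouse path, and the 10-minute value are assumptions made up for this example, not settings taken from your setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CatalogCacheConfig {

    // Builds the key/value properties for an Iceberg catalog backed by Glue,
    // with the catalog-level table cache kept on and a custom expiration interval.
    static Map<String, String> catalogProperties(long cacheExpirationMs) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("type", "iceberg");
        props.put("catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog");
        // Hypothetical warehouse location, for illustration only.
        props.put("warehouse", "s3://example-bucket/warehouse");
        // Caching is on by default in the Flink catalog; set explicitly here for clarity.
        props.put("cache-enabled", "true");
        // The default is 30000 (30s); a larger value keeps cached tables longer.
        props.put("cache.expiration-interval-ms", Long.toString(cacheExpirationMs));
        return props;
    }

    public static void main(String[] args) {
        // Assumed value: 10 minutes.
        catalogProperties(10 * 60 * 1000L)
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same key/value pairs can be supplied wherever you define the catalog, for example as options when creating the catalog in Flink SQL. The trade-off is staleness: with a longer interval, a cached table may not reflect commits made by other writers until the entry expires.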
On Wed, Mar 6, 2024 at 11:52 AM Chetas Joshi <chetas.jo...@gmail.com> wrote:

> Hi Community,
>
> I am working on loading iceberg data from S3 using Flink. I am using
> GlueCatalog for storing the iceberg table metadata. I found that the
> GlueCatalog’s loadTable call (implemented
> <https://github.com/apache/iceberg/blob/apache-iceberg-1.4.0/core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java#L46>
> in the abstract class BaseMetastoreCatalog) creates a new instance of
> GlueTableOperations every time for a Glue table identifier. This instance
> is initialized with shouldRefresh = true and hence it refreshes the
> tableMetadata for a given table identifier every time the loadTable is
> called for that tableIdentifier even though it was called in the recent
> past. I am wondering why these tableOperation instances are not cached in
> the catalog. I suggest the following changes in the newTableOps method
> <https://github.com/apache/iceberg/blob/apache-iceberg-1.4.0/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L205>
> in the GlueCatalog (and other catalog impls) and would really appreciate
> the community's feedback on this.
>
> protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
>   // tableCache is a Cache with key=tableIdentifier and value=GlueTableOperations object
>   if (tableCache.containsKey(tableIdentifier)) {
>     return tableCache.get(tableIdentifier)
>   } else {
>     return new GlueTableOperations(....)
>   }
> }
>
> If you like the approach, I am happy to contribute to open source. Let me
> know.
>
> Thank you
> Chetas

--
Ryan Blue
Tabular