Catalog Questions

Taher Koitawala Tue, 30 Jul 2024 22:43:32 -0700

Hi All,
I have a question about which catalog to use with Iceberg for our use
case.


We run Spark on Kubernetes with MinIO for storage. Spark writes data to
MinIO through the S3 API, and we use a third-party data catalog to record
table locations and build data lineage.

In our next phase we want to move to Iceberg. However, I see that Iceberg
relies on a catalog to make commits atomic. Which catalog should we use for
our use case?

   1. Do we have to provision a Hive Metastore and point the SparkCatalog
   at the metastore URI so that concurrent transactions are isolated?
   2. Would a simple SparkCatalog with a warehouse directory on MinIO
   suffice?
      1. In that case, would all Spark jobs have to refer to the same
      warehouse directory? I assume not.
   3. What about a SparkCatalog backed by JDBC? Would that be enough, and
   an easy way to get isolation and atomic reads and writes?
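
To make the three options concrete, here is a rough sketch of the Spark
properties each one would likely need, written as a small Python helper that
only builds the property maps. The catalog name "lake", the bucket path, the
metastore host, and the Postgres URI and credentials are all hypothetical
placeholders, and the MinIO endpoint and credential settings are left out.

```python
# Sketch of Spark properties for the three catalog options in the list above.
# All hosts, paths, and credentials below are hypothetical placeholders.

CATALOG = "lake"  # hypothetical catalog name
PREFIX = f"spark.sql.catalog.{CATALOG}"
WAREHOUSE = "s3a://warehouse/iceberg"  # hypothetical MinIO bucket path


def catalog_conf(kind: str) -> dict:
    """Return Spark properties for one catalog option: hive, hadoop, or jdbc."""
    conf = {
        # Every option registers an Iceberg SparkCatalog under the same name.
        PREFIX: "org.apache.iceberg.spark.SparkCatalog",
        f"{PREFIX}.warehouse": WAREHOUSE,
    }
    if kind == "hive":
        # Option 1: table pointers are kept in a Hive Metastore.
        conf[f"{PREFIX}.type"] = "hive"
        conf[f"{PREFIX}.uri"] = "thrift://hive-metastore:9083"
    elif kind == "hadoop":
        # Option 2: directory-based catalog, only the warehouse path is needed.
        conf[f"{PREFIX}.type"] = "hadoop"
    elif kind == "jdbc":
        # Option 3: table pointers are kept in a relational database.
        conf[f"{PREFIX}.catalog-impl"] = "org.apache.iceberg.jdbc.JdbcCatalog"
        conf[f"{PREFIX}.uri"] = "jdbc:postgresql://postgres:5432/iceberg"
        conf[f"{PREFIX}.jdbc.user"] = "iceberg"
        conf[f"{PREFIX}.jdbc.password"] = "change-me"
    else:
        raise ValueError(f"unknown catalog kind: {kind}")
    return conf


if __name__ == "__main__":
    for kind in ("hive", "hadoop", "jdbc"):
        print(f"# {kind}")
        for key, value in sorted(catalog_conf(kind).items()):
            print(f"--conf {key}={value}")
```

Running the script prints one set of `--conf` flags per option, which could
be passed to spark-submit once the placeholders are replaced.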

Thanks,
Taher Koitawala
