Does it make sense to keep a Hive installation when your Parquet files come with a transactional metadata layer like Delta Lake / Apache Iceberg?
My understanding from this issue: https://github.com/delta-io/delta/issues/85 is that Hive is no longer necessary beyond discovering where a table is stored. Hence, we can simply do something like:

```
# Read the Delta table directly by its path, no metastore involved
df = spark.read.format("delta").load(location)  # location = path to the Delta table
df.createOrReplaceTempView("myTable")
res = spark.sql("SELECT * FROM myTable")
```

and this approach still gets all the benefits of the metadata for partition discovery / SQL optimization? With Delta, the Hive metastore should only store a pointer from the table name to the path of the table, and all other metadata comes from the Delta log, which is processed in Spark (see the sketch at the end of this mail for what I mean by the metastore-as-pointer setup).

One reason I can think of for keeping Hive is to keep track of other data sources that don't have a Delta / Iceberg transactional metadata layer. But I'm not sure that alone makes it worth it. Are there any use cases I might have missed for keeping a Hive installation after migrating to Delta / Iceberg?

Please correct me if I've used any terms incorrectly.
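For reference, a minimal sketch of the metastore-as-pointer setup I'm describing, assuming Delta Lake 0.7.0+ on Spark 3.x with a Hive metastore configured; the table name `events` and the path `/data/events` are just placeholders:

```
from pyspark.sql import SparkSession

# Assumes the Delta Lake jars are on the classpath and a Hive metastore is reachable.
spark = (
    SparkSession.builder
    .appName("delta-metastore-pointer")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .enableHiveSupport()
    .getOrCreate()
)

# The metastore entry only records the name -> location mapping;
# schema, partitions, and file listings come from the Delta log at that path.
spark.sql("CREATE TABLE IF NOT EXISTS events USING DELTA LOCATION '/data/events'")

# Querying by name resolves the location via the metastore, then reads the Delta log.
spark.sql("SELECT count(*) FROM events").show()
```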