Does it make sense to keep a Hive installation when your Parquet files come with a transactional metadata layer like Delta Lake / Apache Iceberg?
My understanding from https://github.com/delta-io/delta/issues/85 is that Hive is no longer necessary in a Spark cluster except for discovering where a table is stored. Hence, we could simply do something like:

```
# Placeholder: replace with the storage path of the Delta table
location = "$LOCATION"
df = spark.read.format("delta").load(location)
df.createOrReplaceTempView("myTable")
res = spark.sql("select * from myTable")
```

Does this approach still get all the benefits of having the metadata for partition discovery / SQL optimization? With Delta, the Hive metastore should only store a pointer from the table name to the path of the table, and all other metadata will come from the Delta log, which will be processed in Spark.

One reason I can think of for keeping Hive is to keep track of other data sources that don't necessarily have a Delta / Iceberg transactional metadata layer. But I'm not sure it's still worth it. Are there any use cases for keeping a Hive installation after migrating to Delta / Iceberg that I might have missed?

Please correct me if I've used any terms wrongly.

(Originally posted Sun, Apr 25, 2021 at 5:42 PM by chia kang ren <kangren.c...@gmail.com>.)
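The split described in the question (the metastore only holds a name-to-path pointer, while partition information comes from the Delta log) can be illustrated with a toy Python model. This is a simplified sketch, not how Spark or Delta Lake actually implement it: `CATALOG`, `replay_delta_log`, `files_for_partition` and `write_commit` are hypothetical names; only plain numbered JSON commit files are replayed (checkpoints, protocol checks and most action fields are ignored); and the demo writes abbreviated `metaData` / `add` / `remove` actions rather than complete ones.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical catalog standing in for the metastore's only job with Delta:
# mapping a table name to the table's storage path.
CATALOG = {}


def replay_delta_log(table_path):
    """Replay the JSON commit files under <table_path>/_delta_log in order.

    Simplified sketch: ignores checkpoint files, protocol versions and most
    action fields. Returns (partition_columns, active_files), where
    active_files maps each live data file to its partition values.
    """
    partition_columns = []
    active_files = {}
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are named by zero-padded version number, so sorting by
    # name replays them in version order. Non-numeric .json names are skipped.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "metaData" in action:
                partition_columns = action["metaData"].get("partitionColumns", [])
            elif "add" in action:
                add = action["add"]
                active_files[add["path"]] = add.get("partitionValues", {})
            elif "remove" in action:
                active_files.pop(action["remove"]["path"], None)
    return partition_columns, active_files


def files_for_partition(table_name, column, value):
    """Prune to one partition using only the log (partition values are strings)."""
    _, active = replay_delta_log(CATALOG[table_name])
    return sorted(path for path, parts in active.items() if parts.get(column) == value)


def write_commit(log_dir, version, actions):
    """Write one newline-delimited JSON commit with abbreviated actions (demo only)."""
    name = f"{version:020d}.json"
    (log_dir / name).write_text("\n".join(json.dumps(a) for a in actions) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "_delta_log"
        log_dir.mkdir()
        write_commit(log_dir, 0, [
            {"metaData": {"partitionColumns": ["date"]}},
            {"add": {"path": "date=2021-04-24/part-0.parquet",
                     "partitionValues": {"date": "2021-04-24"}}},
            {"add": {"path": "date=2021-04-25/part-0.parquet",
                     "partitionValues": {"date": "2021-04-25"}}},
        ])
        # Version 1 rewrites the 2021-04-25 partition.
        write_commit(log_dir, 1, [
            {"remove": {"path": "date=2021-04-25/part-0.parquet"}},
            {"add": {"path": "date=2021-04-25/part-1.parquet",
                     "partitionValues": {"date": "2021-04-25"}}},
        ])
        CATALOG["myTable"] = tmp  # the name-to-path "pointer"
        print(files_for_partition("myTable", "date", "2021-04-25"))
        # prints ['date=2021-04-25/part-1.parquet']
```

Replaying the commits in version order is what lets the reader skip the superseded `part-0` file and prune to a single partition without asking a metastore for a partition listing; the catalog lookup is reduced to resolving `myTable` to a path.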