Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore

2019-03-18 Thread Sandeep Nayak
To Xabriel's point, it would be good to have a Store abstraction so that one could plug in an implementation, be it HMS or something else.
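The "Store abstraction" idea above could take many shapes; the following is a minimal illustrative sketch, not an actual Iceberg API. All names (`PartitionStore`, `InMemoryPartitionStore`, `partitionLocations`) are hypothetical: the point is only that callers ask a store for partition metadata, while the backing implementation can be HMS, the file system, or anything else.

```java
import java.util.List;
import java.util.Map;

// Hypothetical store abstraction: callers are decoupled from where
// partition metadata actually lives (HMS or otherwise).
interface PartitionStore {
  /** Returns the data locations of the given table's partitions. */
  List<String> partitionLocations(String table);
}

// Trivial in-memory implementation, standing in for an HMS-backed or
// file-system-backed one.
class InMemoryPartitionStore implements PartitionStore {
  private final Map<String, List<String>> partitions;

  InMemoryPartitionStore(Map<String, List<String>> partitions) {
    this.partitions = partitions;
  }

  @Override
  public List<String> partitionLocations(String table) {
    return partitions.getOrDefault(table, List.of());
  }
}
```

A migration tool written against such an interface would not need to change when a non-HMS store is plugged in.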

Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore

2019-03-18 Thread Xabriel Collazo Mojica
+1 for having a tool/API to migrate tables from HMS into Iceberg. We do not use HMS in my current project, but since HMS is the de facto catalog in most companies doing Hadoop, I think such a tool would be vital for incentivizing Iceberg adoption and/or PoCs.

Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore

2019-03-18 Thread Anton Okolnychyi
I definitely support this idea. Having a clean and reliable API to migrate existing Spark tables into Iceberg will be helpful. I propose to collect all requirements for the new API in this thread. Then I can come up with a doc that we will discuss within the community. From the feature perspective …

Re: Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore

2019-03-18 Thread Ryan Blue
I think that would be fine, but I want to throw out a quick warning: SparkTableUtil was initially written as a few handy helpers, so it wasn't well designed as an API. It's really useful, so I can understand wanting to extend it. But should we come up with a real API for these conversion tasks instead?
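One way to read the "real API" suggestion is a dedicated, fluent entry point for conversions rather than growing the SparkTableUtil helpers. The sketch below is purely hypothetical; none of these names (`TableMigration`, `from`, `withMetastore`, `describe`) exist in Iceberg, and the shape is only one possible design under that assumption.

```java
// Hypothetical builder-style conversion API: the source table and the kind of
// metadata store are explicit, so new store types can be added without
// changing the entry point. All names are illustrative.
class TableMigration {
  private final String source;
  private String metastore = "hive"; // default, matching today's HMS assumption

  private TableMigration(String source) {
    this.source = source;
  }

  static TableMigration from(String sourceTable) {
    return new TableMigration(sourceTable);
  }

  TableMigration withMetastore(String type) {
    this.metastore = type;
    return this;
  }

  /** Summarizes the configured migration; a real API would execute it. */
  String describe() {
    return "migrate " + source + " using " + metastore + " metadata";
  }
}
```

Usage would read as a single chained expression, e.g. `TableMigration.from("db.logs").withMetastore("filesystem").describe()`.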

Extend SparkTableUtil to Handle Tables Not Tracked in Hive Metastore

2019-03-18 Thread Anton Okolnychyi
Hi, SparkTableUtil can be helpful for migrating existing Spark tables into Iceberg. Right now, SparkTableUtil assumes that the partition information is always tracked in the Hive metastore. What about extending SparkTableUtil to handle Spark tables that don’t rely on the Hive metastore? I have a local …
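For tables not tracked in HMS, partition information can often be recovered from Hive-style `key=value` directory layouts on the file system. The helper below is an illustrative sketch of that idea only; `PartitionPathParser` is a hypothetical name, not part of SparkTableUtil.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: derive partition values from a Hive-style path such as
// ".../year=2019/month=03" instead of asking the Hive metastore.
class PartitionPathParser {
  /** Parses "key=value" path segments into an ordered map of partition values. */
  static Map<String, String> parse(String path) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      int eq = segment.indexOf('=');
      if (eq > 0) { // only segments of the form key=value are partition dirs
        values.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return values;
  }
}
```

A real implementation would also need the table's partition spec to validate column names and types, and URL-decoding of escaped values, which this sketch omits.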