Thanks for the interest in Hive integration! I haven't heard about progress here lately, so it's good that you bring it up. Hopefully the other people who are interested can jump in with their current status.

I think you're right that the MR input and output formats are a good place to start, but if I remember correctly, Hive ignores the output format's committer. That means we will need to plug in at the catalog level at some point. Owen O'Malley has pointed us to the `RawStore` API, which backs metastore interaction.

On Wed, Jan 8, 2020 at 6:28 AM Elliot West <tea...@gmail.com> wrote:

> Hello,
>
> We're considering working on an integration of Iceberg with Apache Hive,
> initially so that the latest snapshot of Iceberg tables can be queried via
> Hive, but later to allow the writing of data using the Iceberg table format.
>
> I wanted to first check for the existence and status of any similar
> efforts so that we do not find ourselves duplicating work unnecessarily.
> I've checked both the Iceberg and Hive projects and can find no issues that
> suggest that such an integration is underway or planned (only HIVE-19457
> <https://issues.apache.org/jira/browse/HIVE-19457> which was raised by
> myself and remains open).
>
> If one or more efforts is underway we'd certainly be open to contributing.
> If not, we'd be keen to capture any thoughts from the community on
> preferred or recommended technical approaches.
>
> I see that some work occurred on MR In/Out formats
> <https://github.com/guilload/incubator-iceberg/pull/1> which might serve
> as a foundation, so we'll certainly be investigating those further.
>
> Thanks,
>
> Elliot.
>

--
Ryan Blue
Software Engineer
Netflix