Re: Apache Hive integration

Ryan Blue Wed, 08 Jan 2020 10:43:31 -0800

Thanks for your interest in Hive integration! I haven't heard about progress
here lately, so I'm glad you brought it up. Hopefully others who are
interested can jump in with their current status.


I think you're right that the MR input and output formats are a good place
to start. But if I remember correctly, Hive ignores the output format's
committer, which means we will need to plug in at the catalog level at some
point. Owen O'Malley has pointed us to the `RawStore` API, which backs
metastore interaction, as the place to do that.
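To make the committer point concrete, here is a toy, self-contained Java model. It assumes, as in Iceberg's design, that new data files only become visible once the table's current state is swapped atomically in the catalog. Every class and method below (`CatalogEntry`, `writeFiles`, `commitJob`) is hypothetical and is not part of the Iceberg, Hive, or `RawStore` APIs; it only shows why skipping the committer leaves written files invisible, and why the commit has to happen somewhere Hive actually calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical toy model; none of these types are Iceberg or Hive APIs.
public class CatalogCommitSketch {

    // An immutable "snapshot": the data files a reader will see.
    record Snapshot(List<String> dataFiles) {}

    // Stand-in for a catalog entry (e.g. a metastore row) holding the table's current snapshot.
    static final class CatalogEntry {
        private final AtomicReference<Snapshot> current =
                new AtomicReference<>(new Snapshot(List.of()));

        Snapshot current() {
            return current.get();
        }

        // Atomic swap: succeeds only if nobody committed since `base` was read.
        boolean commit(Snapshot base, Snapshot next) {
            return current.compareAndSet(base, next);
        }
    }

    // Stand-in for an output format's record writer: it only produces files.
    static List<String> writeFiles(int count) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            files.add("data-" + i + ".parquet");
        }
        return files;
    }

    // Stand-in for a job committer: builds the next snapshot and commits it via the catalog.
    static boolean commitJob(CatalogEntry table, List<String> newFiles) {
        Snapshot base = table.current();
        List<String> merged = new ArrayList<>(base.dataFiles());
        merged.addAll(newFiles);
        return table.commit(base, new Snapshot(List.copyOf(merged)));
    }

    public static void main(String[] args) {
        CatalogEntry table = new CatalogEntry();

        // Files are written but the committer is skipped: readers see nothing new.
        writeFiles(2);
        System.out.println("after write only: " + table.current().dataFiles().size());

        // Files are written and committed at the catalog level: readers see them.
        boolean ok = commitJob(table, writeFiles(2));
        System.out.println("committed=" + ok + ", visible=" + table.current().dataFiles().size());
    }
}
```

Running `main` prints `after write only: 0` and then `committed=true, visible=2`. The compare-and-swap also means a commit built on a stale snapshot is rejected rather than silently overwriting a concurrent one, which is the kind of guarantee a catalog-level hook would need to preserve.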

On Wed, Jan 8, 2020 at 6:28 AM Elliot West wrote:

> Hello,
>
> We're considering working on an integration of Iceberg with Apache Hive,
> initially so that the latest snapshot of Iceberg tables can be queried via
> Hive, but later to allow the writing of data using the Iceberg table format.
>
> I wanted to first check for the existence and status of any similar
> efforts so that we do not find ourselves duplicating work unnecessarily.
> I've checked both the Iceberg and Hive projects and can find no issues that
> suggest that such an integration is underway or planned (only HIVE-19457
> <https://issues.apache.org/jira/browse/HIVE-19457>, which I raised myself
> and which remains open).
>
> If one or more efforts are underway, we'd certainly be open to contributing.
> If not, we'd be keen to capture any thoughts from the community on
> preferred or recommended technical approaches.
>
> I see that some work has been done on MR input/output formats
> <https://github.com/guilload/incubator-iceberg/pull/1>, which might serve
> as a foundation, so we'll certainly be investigating that further.
>
> Thanks,
>
> Elliot.
>


-- 
Ryan Blue
Software Engineer
Netflix
