Re: Hive Metastore integration future

Kristopher Kane Wed, 29 Jan 2020 09:56:14 -0800

Adrian, "I'd imagine that keeping binary compatibility across Hive, Spark
and Iceberg will be quite a challenge."  Yeah, this is what I'm afraid of
over time.  Iceberg's big draw for me is only having to maintain
compatibility between a processing engine (Spark), Iceberg, and cloud
storage. Any potential Iceberg use wouldn't even involve the rest of the
Hive ecosystem. It would simply be to gain the full functionality of Hive
via a ready-to-use metastore which, right now, defaults to Hive.  Hive 3,
with Atlas and Ranger-based security, takes things even further away from
Spark, as it does not allow interaction with Hive-intrinsic services like
the metastore anyway.  It might be that you can run the Hive 3 metastore
for now, but the paths forward don't suggest that it will remain accessible
much into the future.


Ryan, when you said, "I'd really love to see a new metastore project," did
you mean internal to the Iceberg project?

Kris

On Wed, Jan 29, 2020 at 12:17 PM Mass Dosage <[email protected]> wrote:

> On the topic of Hive versions - we've definitely experienced some issues
> trying to programmatically use the iceberg-spark-runtime artifact in unit
> tests (it uses Hive 1.2 as mentioned above). We then tried to also use some
> other common Hive testing libraries like HiveRunner
> <https://github.com/klarna/HiveRunner/> and BeeJU
> <https://github.com/HotelsDotCom/beeju> which in turn use Hive 2.3. We
> then ended up with exceptions (e.g. "Method not found") due to
> incompatibilities between the Hive library classes and had to abandon the
> testing libraries. I can share these exceptions if that would be useful but
> I'd imagine that keeping binary compatibility across Hive, Spark and
> Iceberg will be quite a challenge. I'd prefer Iceberg defaulting to Hive
> 2.3.x over 1.2, as 1.2 is pretty old. I don't think any of the commercial
> Hadoop vendors officially support it any more, and I think it's used a lot
> less now than 2.x, but I could be wrong. Alternatively, a way to pick and
> choose a Hive version would be great, but probably quite a bit of work to
> pull off...
>
> Adrian
>
> On Wed, 29 Jan 2020 at 16:59, Ryan Blue <[email protected]> wrote:
>
>> Hi Kris,
>>
>> We use version 1.2.1 because the part that we're using hasn't changed
>> much and we want to ensure compatibility with old metastore versions.
>> Iceberg should work with newer metastores, and feel free to open a bug if
>> you find a problem with one. We'll make sure to fix it to be compatible
>> with a range of versions.
>>
>> I'm not sure what people are going to want eventually. Right now, we know
>> that many people use the Hive metastore to track tables, so it makes sense
>> to support it as an option. Iceberg allows you to plug in your own
>> metastore easily because we know that lots of places (Netflix included)
>> have their own metastore implementations. I'd really love to see a new
>> metastore project, but I don't think that Iceberg should be opinionated
>> about which one you use.
>>
>> rb
>>
>> On Wed, Jan 29, 2020 at 7:32 AM Kristopher Kane <[email protected]>
>> wrote:
>>
>>> Hi Iceberg.
>>>
>>> It looks like for most cases where non-atomic rename is required, using
>>> the Hive metastore is the baseline, with the ability to implement a
>>> custom one.
>>>
>>> I couldn't find mailing list history or GitHub issue that suggests that
>>> Iceberg will implement its own. Is that intended for the future?
>>>
>>> I ask because Iceberg's metastore version pin is 1.2.1, which is very
>>> old.  Someone using Iceberg with a Hive metastore might find it difficult
>>> to keep pace with Hive upgrades.
>>>
>>> Related:  Is the intention here that existing Hive users would use the
>>> store that they have and new Iceberg users would implement a custom one?
>>>
>>> Appreciate help in understanding,
>>>
>>> Kris
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
