Re: Hive table compatibility for Iceberg readers

Ryan Blue Wed, 02 Feb 2022 14:40:39 -0800

Walaa, thanks for this list. I think most of these are definitely useful. I
think the best one to focus on first is the default values, since those
will make Iceberg tables behave more like standard SQL tables, which is the
goal.

I'm really curious to learn more about #1, but I don't think that I have
enough detail to know whether it is something that fits in the Iceberg
project. At Netflix, we had an alternative implementation of Hive and Spark
tables (Spark tables are slightly different) that we similarly used. But we
didn't write to both at the same time.

For the others, I'm interested in hearing what other people in the
community find valuable. I don't think I would use #2 or #3, for example.
That's because we already support a flag for case insensitive column
resolution that is well supported throughout Iceberg. If you wanted to use
alternative names, then I'd probably recommend just turning that on...
although that may not be an option depending on how you're working with a
table. It would work in Spark, though. This may be a better feature for
your system that is built on Iceberg.
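
To illustrate what case-insensitive column resolution means here, a minimal
sketch in plain Python follows. It does not use Iceberg's or Spark's API; the
function name `resolve_column` and the sample column names are hypothetical,
and raising an error on ambiguous matches is an assumption of this sketch.

```python
def resolve_column(schema_names, requested, case_sensitive=True):
    """Return the schema's own spelling of `requested`, or None if absent.

    Hypothetical helper for illustration only; not an Iceberg function.
    """
    if case_sensitive:
        # Exact match only: "memberid" does not find "memberId".
        return requested if requested in schema_names else None

    # Case-insensitive: compare lowercased names, but return the
    # properly cased name stored in the schema.
    matches = [n for n in schema_names if n.lower() == requested.lower()]
    if len(matches) > 1:
        # Assumption: two columns differing only by case are an error.
        raise ValueError(f"ambiguous column {requested!r}: {matches}")
    return matches[0] if matches else None


schema = ["memberId", "eventTime"]  # hypothetical proper-cased schema
lowercase_hit = resolve_column(schema, "memberid", case_sensitive=False)
exact_miss = resolve_column(schema, "memberid", case_sensitive=True)
```

With the flag off, a script written against the lowercase Hive names stops
resolving; with it on, the same script finds the properly cased column.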

Reading unions as structs has come up a couple of times, so it seems like
people will want it. I think someone attempted to add this support in the
past, but ran into issues because the spec is clear that these are NOT
Iceberg files. There is no guarantee that other implementations will read
them and Iceberg cannot write them in this form. I'm fairly confident that
not allowing unions to be written is a good choice, but I would support
being able to read them.
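
As a rough sketch of what "reading a union as a struct" could look like, the
plain-Python example below projects a union value onto a struct with one field
per branch. The layout (a `tag` field plus `field0`, `field1`, ...) and the
name `union_to_struct` are assumptions for illustration, not something the
Iceberg spec defines.

```python
def union_to_struct(branch_index, value, num_branches):
    """Project a union value onto a struct with one nullable field per branch.

    Hypothetical layout: `tag` records which branch was set, and only the
    matching `field<N>` carries the value; all other fields are None.
    """
    if not 0 <= branch_index < num_branches:
        raise ValueError("branch_index out of range")
    struct = {"tag": branch_index}
    for i in range(num_branches):
        struct[f"field{i}"] = value if i == branch_index else None
    return struct


# A union of (int, string) holding the string branch.
row = union_to_struct(1, "abc", 2)
```

This is a read-side projection only, which matches the position above: the
union never has to be written back in this form.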

Ryan

On Mon, Jan 31, 2022 at 4:32 PM Owen O'Malley <[email protected]>
wrote:

>
>
> On Thu, Jan 27, 2022 at 10:26 PM Walaa Eldin Moustafa <
> [email protected]> wrote:
>
>> *2. Iceberg schema lower casing:* Before Iceberg, when users read Hive
>> tables from Spark, the returned schema is lowercase since Hive stores all
>> metadata in lowercase mode. If users move to Iceberg, such readers could
>> break once Iceberg returns proper case schema. This feature is to add
>> lowercasing for backward compatibility with existing scripts. This feature
>> is added as an option and is not enabled by default.
>>
>
> This isn't quite correct. Hive lowercases top-level columns. It does not
> lowercase field names inside structs.
>
>
>> *3. Hive table proper casing:* conversely, we leverage the Avro schema
>> to supplement the lower case Hive schema when reading Hive tables. This is
>> useful if someone wants to still get proper cased schemas while still in
>> the Hive mode (to be forward-compatible with Iceberg). The same flag used
>> in (2) is used here.
>>
>
> Are there users of Avro schemas in Hive outside of LinkedIn? I've never
> seen it used. I don't think you should tie #2 and #3 together.
>
> Supporting default values and union types are useful extensions.
>
> .. Owen
>

-- 
Ryan Blue
Tabular
