Edgar is correct. Name mapping is used if a data file has no field ids.
When you import data with a name mapping, you should leave it configured on
the table so that you can read the data files that you imported.

There's no need for a different mapping because we assume that the files
you add to the table all use a consistent naming scheme. You can add more
than one alias to a mapping if you need to handle a rename, but in most
cases names don't change and are consistent across files, especially if
you were already reading the files as a table using name-based column
resolution.
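For reference, a name mapping is stored as JSON in the table property Xiang mentioned. A minimal sketch of building one, assuming a hypothetical two-column schema where the old Hive ORC files use _col0/_col1 and the current column names are "id" and "data" (each entry can list several names, which is how an alias for a rename is expressed):

```python
import json

# Hypothetical schema: field-id 1 -> "id", field-id 2 -> "data".
# The old Hive-written ORC files name the columns _col0 and _col1,
# so each mapping entry lists the legacy name as an additional alias.
name_mapping = [
    {"field-id": 1, "names": ["id", "_col0"]},
    {"field-id": 2, "names": ["data", "_col1"]},
]

# The serialized mapping goes into the table property from the thread.
table_property = {
    "schema.name-mapping.default": json.dumps(name_mapping)
}

print(table_property["schema.name-mapping.default"])
```

This is only illustrative of the mapping's shape; in practice you would set the property through your engine (e.g. an ALTER TABLE ... SET TBLPROPERTIES statement in Spark) rather than building the JSON by hand.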

On Thu, Nov 5, 2020 at 8:21 AM Edgar Rodriguez
<edgar.rodrig...@airbnb.com.invalid> wrote:

> Hi Xiang,
>
> On Thu, Nov 5, 2020 at 11:07 AM 李响 <wate...@gmail.com> wrote:
>
>> Dear community:
>>
>> I am using SparkTableUtil to import an existing Hive table to an Iceberg
>> table.
>> The ORC files of the Hive table were written with an old version of ORC,
>> so I set a name mapping (like: id 1 mapped to _col0 and id 2 mapped to
>> _col1...) on the Iceberg table by using "schema.name-mapping.default" so
>> that the metrics of the ORC files could be built correctly during the
>> import process.
>>
>> After that, I plan to write new data into the Iceberg table (using the
>> ORC version 1.6.5 in the Iceberg package). How should I deal with the name
>> mapping used for importing? Should I remove it? Does that name mapping
>> do any harm when reading/writing from/to the new ORC files?
>>
>
> If I understand correctly, the name mapping would only apply if there were
> no Iceberg IDs found in the ORC file as type attributes, which is the case
> for the imported data. All new data you write with Iceberg/ORC will have
> the Iceberg field IDs stored as type attributes, so when reading those new
> files the name mapping should have no effect, since the read path will
> detect the Iceberg field IDs.
>
> Cheers,
> --
> Edgar R
>


-- 
Ryan Blue
Software Engineer
Netflix
