Re: About importing Hive tables and name mapping

Edgar Rodriguez Thu, 05 Nov 2020 08:22:41 -0800

Hi Xiang,

On Thu, Nov 5, 2020 at 11:07 AM 李响 <wate...@gmail.com> wrote:


> Dear community:
>
> I am using SparkTableUtil to import an existing Hive table to an Iceberg
> table.
> The ORC files of the Hive table were written by an old version of ORC, so I
> set a name mapping (e.g. id 1 mapped to _col0, id 2 mapped to _col1, ...) on
> the Iceberg table via "schema.name-mapping.default" so that the metrics of
> the ORC files could be built correctly during the import.
>
> After that, I plan to write new data into the Iceberg table (using ORC
> version 1.6.5 from the Iceberg package). How should I deal with the name
> mapping used for the import? Should I remove it? Does it do any harm
> when reading from or writing to the new ORC files?
>

If I understand correctly, the name-mapping only applies when no Iceberg IDs
are found in the ORC file as type attributes, which is the case for the
imported data. All new data you write with Iceberg/ORC will have the Iceberg
field-id stored as a type attribute, so when reading those new files the
name-mapping should have no effect, since the read path will detect the
Iceberg field-ids. That would also mean you should keep the mapping rather
than remove it: the imported files still have no IDs and need it to be read.
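
To make the fallback concrete, here is a rough Python sketch of the behaviour
described above. It is only an illustration, not Iceberg's actual read path:
resolve_field_ids, the dict layout for columns, and the column names "id" and
"data" are all made up for the example, and I am assuming the ORC type
attribute is keyed "iceberg.id". The JSON is the kind of value you would set
in "schema.name-mapping.default".

```python
import json

# Assumed value of the "schema.name-mapping.default" table property,
# mapping the positional names of old Hive ORC files to Iceberg field ids.
NAME_MAPPING_JSON = json.dumps([
    {"field-id": 1, "names": ["_col0"]},
    {"field-id": 2, "names": ["_col1"]},
])


def resolve_field_ids(file_columns, name_mapping_json):
    """Hypothetical helper: return {column name: Iceberg field id} for one file.

    Ids stored in the file as type attributes win. The name mapping is only
    consulted when the file carries no ids at all.
    """
    ids_in_file = {
        col["name"]: col["attributes"]["iceberg.id"]
        for col in file_columns
        if "iceberg.id" in col.get("attributes", {})
    }
    if ids_in_file:
        # File written by Iceberg: the name mapping is ignored.
        return ids_in_file

    # File without ids (e.g. imported from Hive): fall back to the mapping.
    mapping = json.loads(name_mapping_json) if name_mapping_json else []
    by_name = {name: entry["field-id"] for entry in mapping for name in entry["names"]}
    return {
        col["name"]: by_name[col["name"]]
        for col in file_columns
        if col["name"] in by_name
    }


# An imported Hive file: positional column names, no Iceberg attributes.
imported_file = [{"name": "_col0"}, {"name": "_col1"}]

# A file written through Iceberg: ids are stored as type attributes.
new_file = [
    {"name": "id", "attributes": {"iceberg.id": 1}},
    {"name": "data", "attributes": {"iceberg.id": 2}},
]

imported_ids = resolve_field_ids(imported_file, NAME_MAPPING_JSON)
new_ids = resolve_field_ids(new_file, NAME_MAPPING_JSON)
```

In this sketch the imported file resolves its columns through the mapping,
while the new file resolves through its own attributes and never touches the
mapping. Dropping the mapping (passing None) leaves the imported file with no
resolvable columns, which is why it seems safer to leave the property set.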

Cheers,
-- 
Edgar R
