Edgar is correct. Name mapping is used if a data file has no field ids. When you import data with a name mapping, you should leave it configured on the table so that you can read the data files that you imported.
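The resolution rule described above can be sketched in a few lines of Python. This is an illustrative model only, not Iceberg's actual implementation: `resolve_field_id` and its arguments are hypothetical names, but the logic matches the behavior discussed here, where an ID stored in the file always wins and the name mapping is consulted only when a column carries no ID.

```python
# Sketch (not Iceberg's real code) of read-path field-id resolution:
# ids stored in the data file win; the table's name mapping is only
# consulted for columns that have no id (e.g. imported Hive ORC files).

def resolve_field_id(column_name, file_field_id, name_mapping):
    """Return the Iceberg field id for a file column.

    file_field_id: id stored as a type attribute in the file, or None
    name_mapping:  dict of column name -> field id (the table's mapping)
    """
    if file_field_id is not None:
        return file_field_id              # new files written by Iceberg
    return name_mapping.get(column_name)  # imported files without ids

# Mapping configured on the table for old ORC column names
mapping = {"_col0": 1, "_col1": 2}

# Imported file: no ids, so "_col0" is resolved through the mapping
print(resolve_field_id("_col0", None, mapping))  # -> 1
# New Iceberg-written file: id is present, so the mapping is ignored
print(resolve_field_id("data", 2, mapping))      # -> 2
```

This is why leaving the mapping configured is harmless for new files: the mapping branch is simply never reached once the file carries field IDs.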
There's no need for a different mapping because we assume that the files you add to the table all use a consistent naming scheme. You can add more than one alias to a mapping if you need to handle a rename, but most of the time names don't change and are consistent across files if you have already been reading the files as a table using name-based column resolution.

On Thu, Nov 5, 2020 at 8:21 AM Edgar Rodriguez <edgar.rodrig...@airbnb.com.invalid> wrote:

> Hi Xiang,
>
> On Thu, Nov 5, 2020 at 11:07 AM 李响 <wate...@gmail.com> wrote:
>
>> Dear community:
>>
>> I am using SparkTableUtil to import an existing Hive table into an
>> Iceberg table.
>> The ORC files of the Hive table use an old version of ORC, so I set a
>> name mapping (e.g., id 1 mapped to _col0 and id 2 mapped to _col1...)
>> on the Iceberg table using "schema.name-mapping.default" so that the
>> metrics of the ORC files could be built correctly during the import
>> process.
>>
>> After that, I plan to write new data into the Iceberg table (using ORC
>> version 1.6.5 from the Iceberg package). How should I deal with the
>> name mapping used for importing? Should I remove it? Does that name
>> mapping do any harm when reading from or writing to the new ORC files?
>>
>
> If I understand correctly, the name mapping would only apply if there
> were no Iceberg IDs found in the ORC file as type attributes, which is
> the case for the imported data. All new data you write with Iceberg/ORC
> will have the Iceberg field id stored as a type attribute, so when
> reading those new files the name mapping should have no effect, since
> the read path will detect the Iceberg field ids.
>
> Cheers,
> --
> Edgar R

--
Ryan Blue
Software Engineer
Netflix
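The alias handling mentioned above, where one field id can carry several names so that files written before and after a rename both resolve to the same Iceberg field, can be sketched like this. The helper `build_lookup` is a hypothetical name for illustration, not Iceberg's API:

```python
# Sketch (not Iceberg's real code) of a name mapping that allows
# multiple aliases per field id, so a rename does not break resolution
# for files written under the old column name.

def build_lookup(mapping):
    """mapping: field id -> list of name aliases; returns name -> id."""
    return {alias: fid
            for fid, aliases in mapping.items()
            for alias in aliases}

# Field 1 was "_col0" in the old ORC files and "user_id" after a rename
mapping = {1: ["_col0", "user_id"], 2: ["_col1"]}
lookup = build_lookup(mapping)

print(lookup["_col0"])    # -> 1 (old files, pre-rename name)
print(lookup["user_id"])  # -> 1 (new name, same field)
```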