[ https://issues.apache.org/jira/browse/HIVE-26657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Pintér reassigned HIVE-26657:
------------------------------------

> [Iceberg] Filter out the metadata.json file when migrating
> -----------------------------------------------------------
>
>                 Key: HIVE-26657
>                 URL: https://issues.apache.org/jira/browse/HIVE-26657
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>
> When migrating a Hive table to Iceberg, a RuntimeException is raised in
> certain cases:
> {code:java}
> ERROR : Failed
> java.lang.RuntimeException: s3a://dev-nfqe-base/cc-cdw-nfqe-q7wj9a/archive/env-8pt556/parquet/bakeoff/large/pli/metadata/00000-94fffe5c-c307-4341-9ea3-f5fa4863d301.metadata.json is not a Parquet file. Expected magic number at tail, but found [32, 93, 10, 125]
> {code}
> The Hive-to-Iceberg table migration has the following logic:
> 1. To walk through all the data files, we request a file iterator from the
> filesystem. This iterator provides all the references needed to scan the
> data files.
> 2. The new Iceberg table is created: a new entry is added to the Hive
> catalog and, at the filesystem level, the metadata directory is created
> together with the first metadata file (*.metadata.json).
> 3. All the data files are scanned and the manifests are created.
>
> The issue occurs when there are so many data files that the file listing
> does not fit into memory in one go. In step 3, while we walk through the
> list of data files, the iterator has to run another round of file listing,
> and that round also picks up the content of the metadata directory created
> in step 2. The *.metadata.json file is then scanned as if it were a data
> file, which fails with the exception above.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
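
The filtering suggested by the issue title can be pictured roughly as follows. This is only a minimal sketch and not the actual Hive change: the class and method names (MetadataFilterSketch, isDataFile, dataFilesOnly) are hypothetical, the paths are made-up examples, and plain strings stand in for whatever file references the real filesystem iterator returns. The idea is to skip any entry whose file name ends in .metadata.json while consuming the iterator, so that a late round of listing cannot hand the newly created metadata file to the Parquet reader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hive or Iceberg.
final class MetadataFilterSketch {

    // Returns true when the path should be treated as a data file.
    // Assumption: Iceberg metadata files can be recognised by the
    // ".metadata.json" suffix of their file name, as in the error above.
    static boolean isDataFile(String path) {
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        return !fileName.endsWith(".metadata.json");
    }

    // Consumes a (possibly lazily paged) listing and keeps only data files.
    // Filtering happens per element, so it still applies to entries that
    // only show up in a later round of listing.
    static List<String> dataFilesOnly(Iterator<String> listing) {
        List<String> dataFiles = new ArrayList<>();
        while (listing.hasNext()) {
            String path = listing.next();
            if (isDataFile(path)) {
                dataFiles.add(path);
            }
        }
        return dataFiles;
    }

    public static void main(String[] args) {
        // Made-up paths; the metadata file mimics one created in step 2.
        List<String> listing = List.of(
            "s3a://example-bucket/tbl/part-00000.parquet",
            "s3a://example-bucket/tbl/metadata/00000-abc.metadata.json",
            "s3a://example-bucket/tbl/part-00001.parquet");
        System.out.println(dataFilesOnly(listing.iterator()));
    }
}
```

Filtering inside the iteration loop, rather than on a snapshot taken before the table is created, is the point of the sketch: the problem described above only appears when the listing is fetched in several rounds.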