I had a short proposal here[1] suggesting the same as Russell. I think this
is probably a more broadly useful operation but I don't really know the
best place for it to live. I'm happy to finish the proposal if there are
some opinions on where in Iceberg it is appropriate to add such
functionality.

Best,
Ryan

[1] https://github.com/apache/iceberg/issues/2288

On Thu, Jul 1, 2021 at 3:34 PM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> I think you could probably also do this by just creating a Hive table and
> then changing the location to point to the most recent Hadoop metadata.json
> file.
>
> On Jul 1, 2021, at 1:42 AM, Huadong Liu <huadong...@gmail.com> wrote:
>
> FYI, I was able to do the migration by casting ManifestFile
> to GenericManifestFile, resetting sequence number and snapshot id and
> adding them to AppendFiles.
>
> On Mon, Jun 28, 2021 at 3:49 PM Huadong Liu <huadong...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to migrate an Iceberg Hadoop table to a table using the Hive
>> catalog. Luckily the table is append-only, so there are no delete files.
>> It is not clear which APIs were used in a previous post
>> <https://lists.apache.org/thread.html/r39f2c773bc06889cb19d7de3729d868fccbafbafcfab1922332a4dc6%40%3Cdev.iceberg.apache.org%3E>
>> .
>>
>> The list of ManifestFiles in the current snapshot can be obtained with
>> the Snapshot allManifests
>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Snapshot.html#allManifests-->
>> API. However, they cannot be added to the new table's AppendFiles
>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/AppendFiles.html>
>>  for
>> committing because the snapshot id needs to be blank
>> <https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergeAppend.java#L55>
>> .
>>
>> Alternatively, the table snapshots
>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Table.html#snapshots-->
>>  API
>> can be used to get all snapshots of the table. From there, the data files for
>> each snapshot can be obtained with the addedFiles
>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Snapshot.html#addedFiles-->
>> API and then added to the AppendFiles of the new table in the Hive catalog.
>>
>> I am not sure the latter is correct for the migration. Any input is
>> appreciated.
>>
>> --
>> Huadong
>>
>
>
