Re: migrating Hadoop tables to tables with hive catalog

Huadong Liu Thu, 01 Jul 2021 13:38:38 -0700

Thank you all. That saves rewriting all the manifest files, which is a lot.
I did the following and it seems to be working fine.


1. Create an Iceberg table using the Hive catalog with the same table
schema, partition spec, etc.
2. Copy the latest Hadoop vddddd.metadata.json over the Hive table's
metadata JSON.
3. Change table-uuid back to the UUID from the original Hive table
metadata JSON.
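Steps 2 and 3 above are plain file edits. A minimal Python sketch, assuming both metadata files sit on a local or mounted filesystem (on HDFS or S3 you would go through the corresponding client instead); `adopt_hadoop_metadata` and its arguments are hypothetical, not an Iceberg API.

```python
import json
from pathlib import Path


def adopt_hadoop_metadata(hadoop_metadata: Path, hive_metadata: Path) -> str:
    """Hypothetical helper: overwrite the Hive table's metadata JSON with the
    Hadoop table's latest metadata, keeping the Hive table's table-uuid.

    Returns the preserved uuid.
    """
    # Remember the uuid written when the Hive-catalog table was created (step 1).
    hive_uuid = json.loads(hive_metadata.read_text())["table-uuid"]

    # Step 2: start from the full contents of the latest Hadoop metadata file.
    metadata = json.loads(hadoop_metadata.read_text())

    # Step 3: put the Hive table's original uuid back.
    metadata["table-uuid"] = hive_uuid

    hive_metadata.write_text(json.dumps(metadata, indent=2))
    return hive_uuid
```

Pass the Hadoop file first and the Hive table's current metadata file (the one created in step 1) second. Only table-uuid is swapped; snapshots and manifest-list paths stay exactly as the Hadoop table wrote them, which is why no manifest needs rewriting.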


On Thu, Jul 1, 2021 at 7:00 AM Ryan Murray wrote:

> I had a short proposal here [1] suggesting the same as Russell. I
> think this is probably a more broadly useful operation, but I don't really
> know the best place for it to live. I'm happy to finish the proposal if
> there are some opinions on where in Iceberg it is appropriate to add such
> functionality.
>
> Best,
> Ryan
>
> [1] https://github.com/apache/iceberg/issues/2288
>
> On Thu, Jul 1, 2021 at 3:34 PM Russell Spitzer wrote:
>
>> I think you could probably also do this by just creating a Hive table and
>> then changing its metadata location to point to the most recent Hadoop
>> metadata.json file.
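Russell's suggestion needs the path of the Hadoop table's most recent metadata file; in the Hive catalog, the pointer to the current metadata file is the metastore table property `metadata_location`. A minimal Python sketch of locating that file, assuming the Hadoop table's default layout of `metadata/version-hint.text` plus uncompressed `v<N>.metadata.json` files on a local or mounted filesystem; `latest_hadoop_metadata` is a hypothetical helper, not an Iceberg API.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming for uncompressed Hadoop-table metadata files: v<N>.metadata.json
_METADATA_FILE = re.compile(r"^v(\d+)\.metadata\.json$")


def latest_hadoop_metadata(metadata_dir: Path) -> Optional[Path]:
    """Hypothetical helper: return the newest v<N>.metadata.json in a Hadoop
    table's metadata directory, or None if there is none.

    Prefers version-hint.text and falls back to scanning file names.
    """
    hint = metadata_dir / "version-hint.text"
    if hint.is_file():
        try:
            version = int(hint.read_text().strip())
        except ValueError:
            version = None  # unreadable hint: fall back to scanning
        if version is not None:
            candidate = metadata_dir / f"v{version}.metadata.json"
            if candidate.is_file():
                return candidate

    versions = []
    for path in metadata_dir.iterdir():
        match = _METADATA_FILE.match(path.name)
        if match:
            versions.append((int(match.group(1)), path))
    if not versions:
        return None
    # Compare versions as integers so v10 sorts after v9.
    return max(versions)[1]
```

The returned path is what the new Hive table's metadata location would point at. The scan is only a fallback for a missing or unreadable hint file; comparing numerically rather than by file name avoids picking `v9` over `v10`.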
>>
>> On Jul 1, 2021, at 1:42 AM, Huadong Liu wrote:
>>
>> FYI, I was able to do the migration by casting the ManifestFiles
>> to GenericManifestFile, resetting their sequence numbers and snapshot ids,
>> and adding them to AppendFiles.
>>
>> On Mon, Jun 28, 2021 at 3:49 PM Huadong Liu wrote:
>>
>>> Hi,
>>>
>>> I am trying to migrate an Iceberg Hadoop table to a table using the Hive
>>> catalog. Luckily the table is append-only, so there are no delete files.
>>> It is not clear which APIs were used in a previous post
>>> <https://lists.apache.org/thread.html/r39f2c773bc06889cb19d7de3729d868fccbafbafcfab1922332a4dc6%40%3Cdev.iceberg.apache.org%3E>.
>>>
>>> The list of ManifestFiles in the current snapshot can be obtained with
>>> the Snapshot allManifests
>>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Snapshot.html#allManifests-->
>>> API. However, they cannot be added to the new table's AppendFiles
>>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/AppendFiles.html>
>>> for committing because their snapshot id must be unset
>>> <https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergeAppend.java#L55>.
>>>
>>> Alternatively, the Table snapshots
>>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Table.html#snapshots-->
>>> API can be used to get all snapshots of the table. From there, the data
>>> files added by each snapshot can be obtained with the addedFiles
>>> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Snapshot.html#addedFiles-->
>>> API and then added to the AppendFiles of the new Hive-catalog table.
>>>
>>> I am not sure whether the latter is a correct way to do the migration.
>>> Any input is appreciated.
>>>
>>> --
>>> Huadong
>>>
>>
>>
