Hi everyone,

At my org we’ve spun up a few Iceberg tables on top of S3 without a
metastore (conscious of the consequences) and we’ve arrived at the point
where we need to support concurrent writes. :) I was hoping to get some
advice as to what the best way to integrate an existing Iceberg table into
a Hive Metastore or an alternative might be. We’re still relatively early
in our adoption of Iceberg and have no real prior experience with Hive so I
don’t know what I don’t know.

Some options we’re weighing:

  - Existing tables aren’t so big that the moral equivalent of "CREATE
TABLE hive.db.table … AS SELECT * FROM hadoop.table" is out of the
question, but we’d prefer to not have to read + rewrite everything. We also
have stateful readers (tracking which snapshots they have previously read)
and preserving table history would make life easier.

  - Doing something along the lines of the following and importing the
tables into Hive as external tables looks like it should work given my
understanding of how Iceberg uses HMS, but I don’t know if it’s
encouraged and I haven’t done the due diligence to understand the
potential consequences:

```
hive> CREATE EXTERNAL TABLE `existing_table` (...)
LOCATION
  's3://existing-table/'
-- serde, input/output formats omitted
TBLPROPERTIES (
  -- Assuming latest metadata file for Hadoop table is v99.metadata.json,
  -- rename it to 00099-uuid.metadata.json so that
  -- BaseMetastoreTableOperations can correctly parse the version number.
  'metadata_location'='s3://existing-table/metadata/00099-uuid.metadata.json',
  'table_type'='ICEBERG'
)
```

  - Others seem to have had success implementing + maintaining a custom
catalog (https://iceberg.apache.org/custom-catalog/) backed by e.g.
DynamoDB for atomic metadata updates, which could appeal to us. Seems like
migration in this case consists of implementing the catalog and plopping
the latest metadata into the backing store. Are custom catalogs more of an
escape hatch when HMS can’t be used, or would that maybe be a reasonable
way forward if we find we don’t want to maintain + operate on top of HMS?
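
To make the last two options a bit more concrete, here is a rough Python sketch of the two mechanical pieces as I understand them: renaming the Hadoop-style metadata file to the versioned name, and an atomic compare-and-swap on the metadata pointer. Everything named here is hypothetical: `hms_style_metadata_name` and `InMemoryPointerStore` are made up for illustration, the in-memory dict only stands in for a real backing store such as DynamoDB, and the five-digit zero padding is inferred from the `00099-uuid.metadata.json` example above rather than checked against the Iceberg source. It also assumes uncompressed `vN.metadata.json` file names.

```python
import re
import threading
import uuid
from typing import Dict, Optional


def hms_style_metadata_name(hadoop_name: str, file_uuid: Optional[str] = None) -> str:
    """Map a Hadoop-table name like 'v99.metadata.json' to '00099-<uuid>.metadata.json'.

    Assumption: the version is zero-padded to five digits, as in the example above.
    """
    match = re.fullmatch(r"v(\d+)(\.metadata\.json)", hadoop_name)
    if match is None:
        raise ValueError(f"not a Hadoop-table metadata file name: {hadoop_name!r}")
    version = int(match.group(1))
    suffix = match.group(2)
    # Generate a random UUID unless the caller supplies one (useful in tests).
    file_uuid = file_uuid or str(uuid.uuid4())
    return f"{version:05d}-{file_uuid}{suffix}"


class InMemoryPointerStore:
    """Hypothetical stand-in for a backing store that supports conditional writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pointers: Dict[str, str] = {}

    def current(self, table: str) -> Optional[str]:
        """Return the metadata location currently registered for `table`, if any."""
        with self._lock:
            return self._pointers.get(table)

    def commit(self, table: str, expected: Optional[str], new: str) -> bool:
        """Point `table` at `new` only if it still points at `expected`.

        Returns False when another writer got there first, so the caller
        can refresh and retry instead of silently overwriting a commit.
        """
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new
            return True


if __name__ == "__main__":
    store = InMemoryPointerStore()
    name = hms_style_metadata_name("v99.metadata.json")
    location = "s3://existing-table/metadata/" + name
    # Migration step: register the table only if nothing is registered yet.
    print(store.commit("existing_table", None, location))
    # A second writer with a stale view of the pointer is rejected.
    print(store.commit("existing_table", None, "s3://existing-table/metadata/other"))
```

With a real store, the lock would presumably be replaced by whatever conditional-write primitive the store offers; the point is only that a commit succeeds when the pointer is unchanged since it was read.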

Apologies if this was discussed or documented somewhere else and I’ve
missed it.

Thanks!

Marko
