Hi everyone,

I want to pull this specific topic out of the materialized view (MV)
discussion thread. I noticed it during the MV discussion, and I think it is
important to clarify, not just for the MV topic but also for the ongoing
discussion about consolidating the different catalogs.

*How the table/view spec defines Iceberg table/view*

If we look into the table/view spec, the optimistic concurrency section
<https://iceberg.apache.org/spec/#optimistic-concurrency> requires the
existence of a metadata file, and the atomic swap of the pointer to that
metadata file ensures serializable isolation. This implies 2 things:
1. there is a metadata file in storage that holds the information described
in the rest of the spec.
2. there is an object in a catalog that holds the pointer to the metadata
file. Which object and which catalog is implementation dependent, but these
general concepts always hold.
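To make the pointer-swap model concrete, here is a minimal Python sketch under stated assumptions. The `InMemoryCatalog` class, the `commit` helper, the `version` key, and the file naming are all hypothetical, and an in-process lock stands in for whatever atomic primitive a real catalog provides. A writer reads the current pointer, writes a brand-new JSON metadata file, and only wins if the pointer still equals the one it started from; otherwise it re-reads and retries.

```python
import json
import os
import tempfile
import threading
import uuid


class InMemoryCatalog:
    """Hypothetical catalog: maps a table name to its current metadata file location."""

    def __init__(self):
        self._pointers = {}
        # Stand-in for the catalog's atomic primitive (row lock, conditional update, ...).
        self._lock = threading.Lock()

    def current_pointer(self, name):
        return self._pointers.get(name)

    def swap(self, name, expected, new):
        """Replace the pointer only if it still equals `expected` (compare-and-swap)."""
        with self._lock:
            if self._pointers.get(name) != expected:
                return False
            self._pointers[name] = new
            return True


def write_metadata(directory, metadata):
    """Write a new, uniquely named JSON metadata file; existing files are never modified."""
    path = os.path.join(directory, f"{uuid.uuid4()}.metadata.json")
    with open(path, "w") as f:
        json.dump(metadata, f)
    return path


def commit(catalog, directory, name, update, max_retries=3):
    """Optimistic commit: read base, write a new file, then atomically swap the pointer."""
    for _ in range(max_retries):
        base = catalog.current_pointer(name)
        current = {}
        if base is not None:
            with open(base) as f:
                current = json.load(f)
        new_path = write_metadata(directory, update(current))
        if catalog.swap(name, base, new_path):
            return new_path
        # Another writer swapped first. The losing file is left behind here (a real
        # implementation would clean it up); loop to re-read the latest metadata.
    raise RuntimeError(f"commit to {name} failed after {max_retries} attempts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cat = InMemoryCatalog()
        bump = lambda m: {**m, "version": m.get("version", 0) + 1}
        commit(cat, d, "db.t", bump)
        latest = commit(cat, d, "db.t", bump)
        with open(latest) as f:
            print(json.load(f))  # {'version': 2}
```

The key property is that metadata files are immutable and the only mutable state is the pointer, so isolation reduces to a single compare-and-swap.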

The JSON serialization parts of the spec, plus the reader requirements, also
imply that the metadata file is in JSON format.

So an Iceberg table/view that is compliant with the spec must satisfy all 5
of these requirements:
1. there is an object in the catalog representing this table/view
2. there is a pointer to a JSON metadata file in the object
3. the JSON metadata file exists in storage and contains the table/view
metadata content
4. the metadata content is compliant with the standard described in the spec
5. serializable isolation is achieved by atomic swap of the object pointer

*How non-REST catalogs are compliant with the table/view spec*

An implementation of the Iceberg table/view essentially specifies:
1. which catalog implementation is used, e.g. JDBC, Hive metastore (HMS),
Glue, etc.
2. what object represents a table, e.g. a row in the "iceberg_tables" table
in JDBC, a Table object in HMS/Glue, etc.
3. how the JSON metadata file pointer is stored, e.g. a column in the
table's row in JDBC, the metadata_location key in the Table's parameter map
in HMS/Glue, etc.
4. how the atomic swap is implemented, e.g. an atomic SQL update in JDBC, a
conditional parameter update in HMS, a conditional version update in Glue,
etc.
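For the JDBC case, the atomic swap can be sketched with Python's built-in `sqlite3` module. The schema below is a simplified, assumed version of an `iceberg_tables` table, not necessarily the exact one Iceberg's JDBC catalog creates. The idea it shows is that the UPDATE's WHERE clause includes the expected current `metadata_location`, so it matches zero rows if another writer has already committed.

```python
import sqlite3

# Simplified, hypothetical schema loosely modeled on the "iceberg_tables" table.
SCHEMA = """
CREATE TABLE iceberg_tables (
    catalog_name TEXT NOT NULL,
    table_namespace TEXT NOT NULL,
    table_name TEXT NOT NULL,
    metadata_location TEXT,
    previous_metadata_location TEXT,
    PRIMARY KEY (catalog_name, table_namespace, table_name)
)
"""


def swap_pointer(conn, catalog, namespace, table, expected, new):
    """Conditional UPDATE: succeeds only if the row still points at `expected`."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            """
            UPDATE iceberg_tables
               SET metadata_location = ?, previous_metadata_location = ?
             WHERE catalog_name = ? AND table_namespace = ? AND table_name = ?
               AND metadata_location = ?
            """,
            (new, expected, catalog, namespace, table, expected),
        )
    # rowcount == 1: we won the race; 0: another writer committed first.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with conn:
        conn.execute(
            "INSERT INTO iceberg_tables VALUES (?, ?, ?, ?, ?)",
            ("prod", "db", "t", "v1.metadata.json", None),
        )
    print(swap_pointer(conn, "prod", "db", "t", "v1.metadata.json", "v2.metadata.json"))  # True
    print(swap_pointer(conn, "prod", "db", "t", "v1.metadata.json", "v3.metadata.json"))  # False
```

HMS and Glue follow the same shape with their own conditional-update primitives in place of the SQL WHERE clause.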

*How the REST spec is NOT compliant with the table/view spec*

The REST spec technically does not match the following table/view spec
requirements:
2. there is a pointer to a JSON metadata file in the object
3. the JSON metadata file exists in storage and contains the table/view
metadata content
5. serializable isolation is achieved by atomic swap of the object pointer

The key parts of the REST spec that are not compliant are:
1. metadata-location field is optional in LoadTableResponse
<https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L2721-L2728>
2. pointer swap is not enforced in the UpdateTable
<https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L658>
operation
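By contrast, a REST UpdateTable request carries a list of `requirements` and `updates`, and the server decides how to enforce them. The sketch below is a hypothetical in-memory backend (not any real server) that keeps metadata as a plain document rather than a file. It handles only two requirement types (`assert-table-uuid`, `assert-ref-snapshot-id`) and two update actions (`set-properties`, `set-snapshot-ref`), in simplified form. Nothing in it writes a metadata file or swaps a pointer, yet commits are still serialized, here by an in-process lock.

```python
import copy
import threading


class CommitFailed(Exception):
    """Stand-in for the conflict error a REST server reports when a requirement fails."""


class InMemoryRestBackend:
    """Hypothetical REST-catalog backend that keeps each table's metadata as an
    in-memory document, so there is no metadata file and no metadata-location."""

    def __init__(self):
        self._tables = {}
        self._lock = threading.Lock()  # serializes commits instead of a pointer swap

    def create(self, name, table_uuid):
        self._tables[name] = {"table-uuid": table_uuid, "refs": {}, "properties": {}}

    def load(self, name):
        # The response omits "metadata-location", which the REST spec allows.
        return {"metadata": copy.deepcopy(self._tables[name])}

    def update_table(self, name, requirements, updates):
        with self._lock:
            current = self._tables[name]
            for req in requirements:
                self._check(current, req)  # raises CommitFailed on any mismatch
            new = copy.deepcopy(current)
            for upd in updates:
                self._apply(new, upd)
            self._tables[name] = new  # only reached if every check and update succeeded
            return {"metadata": copy.deepcopy(new)}

    @staticmethod
    def _check(meta, req):
        kind = req["type"]
        if kind == "assert-table-uuid":
            ok = meta["table-uuid"] == req["uuid"]
        elif kind == "assert-ref-snapshot-id":
            ref = meta["refs"].get(req["ref"])
            # A null snapshot-id asserts that the ref does not exist yet.
            ok = (ref["snapshot-id"] if ref else None) == req["snapshot-id"]
        else:
            raise ValueError(f"unsupported requirement: {kind}")
        if not ok:
            raise CommitFailed(f"requirement failed: {req}")

    @staticmethod
    def _apply(meta, upd):
        action = upd["action"]
        if action == "set-properties":
            meta["properties"].update(upd["updates"])
        elif action == "set-snapshot-ref":
            meta["refs"][upd["ref-name"]] = {"snapshot-id": upd["snapshot-id"], "type": upd["type"]}
        else:
            raise ValueError(f"unsupported update: {action}")


if __name__ == "__main__":
    backend = InMemoryRestBackend()
    backend.create("db.t", "u-1")
    create_main = [{"type": "assert-ref-snapshot-id", "ref": "main", "snapshot-id": None}]
    backend.update_table(
        "db.t",
        create_main,
        [{"action": "set-snapshot-ref", "ref-name": "main", "type": "branch", "snapshot-id": 1}],
    )
    try:
        # Re-sending the same requirement fails: "main" now points at snapshot 1.
        backend.update_table("db.t", create_main, [])
    except CommitFailed as e:
        print("rejected:", e)
```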

This opens the door for a REST service to not depend on a JSON metadata
file at all, to store the Iceberg table/view metadata in some form other
than a file, and to achieve much better performance characteristics than
other catalogs. Technically, this gives REST catalog adopters a unique
advantage that non-REST catalogs like HMS and Glue do not have.

*How can we fix this?*

I suggest the following:

First, I think it is good that we are trying to remove the requirements of
a JSON metadata file pointer and an atomic pointer swap. We know from
production usage that these requirements have performance limitations,
especially when the metadata file is large. If that is the direction, we
should make it official by changing the table/view spec to say that these
are catalog-level implementation details that are no longer required.

Second, once we do that, we should declare the REST spec as the official
catalog spec for interacting with Iceberg tables. Otherwise, I at least will
be very tempted to just break the atomic pointer swap pattern and store the
entire metadata in the Glue Table object, to get much better performance as
well as integration with native Glue features, and I think other players
will be equally motivated to do something similar. That would lead to even
more chaos in the Iceberg catalog space.

With the REST spec as the official catalog spec, we can actually support
non-REST catalogs by using the HTTP execution chain handler. Dan has
already done a prototype here
<https://github.com/apache/iceberg/commit/619127ff69f89e43a1edef2ea94c3dd439396a8d#diff-869264a83ba9ca657e7defefaa16ad196b0de9fce6c87f97533db77f29e44762>
based on an earlier discussion
<https://github.com/apache/iceberg/pull/8091#issuecomment-1647189146>
about using AWS Lambda as an alternative HTTP client for the REST catalog.
The same approach can be used to talk to HMS/Glue/JDBC/... while users
interact only with the RESTCatalog as the entry point.
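The idea of keeping RESTCatalog as the single entry point while talking to HMS/Glue/JDBC behind it can be sketched as a client with a pluggable transport. This is not Dan's prototype: `RestCatalogClient`, `PointerCatalogAdapter`, the `Request` shape, and the `mem://` locations are hypothetical, and only a single-level namespace, one requirement type, and one update action are handled. The client always emits REST-shaped requests; the adapter answers them in-process by reading and swapping a metadata pointer, the way a pointer-based catalog would.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Request:
    method: str
    path: str
    body: Optional[Dict[str, Any]] = None


# A transport turns a REST-shaped request into a response; an HTTP client would be one.
Transport = Callable[[Request], Dict[str, Any]]


class RestCatalogClient:
    """Hypothetical client: callers only ever see REST operations."""

    def __init__(self, transport: Transport):
        self._send = transport

    def load_table(self, namespace: str, table: str) -> Dict[str, Any]:
        return self._send(Request("GET", f"/v1/namespaces/{namespace}/tables/{table}"))

    def update_table(self, namespace: str, table: str,
                     requirements: List[Dict[str, Any]],
                     updates: List[Dict[str, Any]]) -> Dict[str, Any]:
        body = {"requirements": requirements, "updates": updates}
        return self._send(Request("POST", f"/v1/namespaces/{namespace}/tables/{table}", body))


@dataclass
class PointerCatalogAdapter:
    """Hypothetical in-process transport backed by a pointer-based catalog."""

    pointers: Dict[str, str] = field(default_factory=dict)             # table -> metadata location
    storage: Dict[str, Dict[str, Any]] = field(default_factory=dict)  # location -> metadata JSON

    def __call__(self, req: Request) -> Dict[str, Any]:
        _, _, namespace, _, table = req.path.strip("/").split("/")
        key = f"{namespace}.{table}"
        base = self.pointers[key]
        metadata = self.storage[base]
        if req.method == "GET":
            return {"metadata-location": base, "metadata": copy.deepcopy(metadata)}
        if req.method == "POST":
            for r in req.body["requirements"]:
                if r["type"] != "assert-table-uuid":
                    raise ValueError(f"unsupported requirement: {r['type']}")
                if r["uuid"] != metadata["table-uuid"]:
                    raise RuntimeError("commit conflict: table UUID changed")
            new_metadata = copy.deepcopy(metadata)
            for u in req.body["updates"]:
                if u["action"] != "set-properties":
                    raise ValueError(f"unsupported update: {u['action']}")
                new_metadata["properties"].update(u["updates"])
            new_location = f"mem://{namespace}/{table}/{uuid.uuid4()}.metadata.json"
            self.storage[new_location] = new_metadata
            # Single-threaded stand-in for the backend's conditional swap
            # (e.g. a conditional parameter update in HMS or a SQL UPDATE in JDBC).
            if self.pointers[key] != base:
                raise RuntimeError("commit conflict: pointer moved")
            self.pointers[key] = new_location
            return {"metadata-location": new_location, "metadata": copy.deepcopy(new_metadata)}
        raise NotImplementedError(req.method)


if __name__ == "__main__":
    v1 = "mem://db/t/v1.metadata.json"
    adapter = PointerCatalogAdapter(
        pointers={"db.t": v1},
        storage={v1: {"table-uuid": "u-1", "properties": {}}},
    )
    client = RestCatalogClient(adapter)
    client.update_table("db", "t",
                        [{"type": "assert-table-uuid", "uuid": "u-1"}],
                        [{"action": "set-properties", "updates": {"owner": "team-a"}}])
    print(client.load_table("db", "t")["metadata"]["properties"])  # {'owner': 'team-a'}
```

Swapping the adapter for an HTTP-backed transport would leave `RestCatalogClient` unchanged, which is the property this approach relies on.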

I think this could provide a good overall path forward for the catalog
consolidation story. I'm interested to know what others think.

Best,
Jack Ye
