[DISCUSS] What do we plan for Iceberg 2.0.0 ?

Jean-Baptiste Onofré Fri, 16 Feb 2024 07:52:40 -0800

Hi guys,

During the last community meeting, we started to quickly discuss Iceberg 2.0.
I was quite surprised it came up during the community meeting because I
don't remember having a previous discussion (on the mailing list)
about that.


So, I would like to start an open discussion about our
community-driven roadmap.

I see the following topics that should be discussed (maybe, as proposed
by Brian, we can have corresponding GitHub issues tagged with a
"discussion" flag). These are open questions, feel free to add points I
missed:

* Spec v3
    We already have a discussion about ts_nanosecond and other enhancements
to the spec. Do we plan to have Iceberg 2.0 with Spec v3 ? What do we
plan to include in spec v3 as a target ?
* Catalogs
    We have a consensus that we have too many catalogs, especially
given their different capabilities/issues. Jack already started the
discussion to deprecate DynamoDBCatalog. The open questions are:
     - Where do we want the catalogs to live (repository) ?
     - What catalogs do we want to deprecate (HadoopCatalog for instance :)) ?
     - Do we want to have the REST Catalog as a kind of façade for
other catalogs/backends ?
* REST Catalog
   If we want to use the REST Catalog as a façade, what are the
requirements to make it even more pluggable, both for the backend (other
catalogs) and for the REST layer itself (authentication/authorization,
runtime, etc.) ? Jack also started a discussion about permissions on the
REST catalog.
* Engines
   Which engines (and versions) do we plan to keep supporting ? Which new
engines do we plan to add (for instance, I can work on an Apache Beam and
an Apache Karaf powered engine) ?
* Data file formats / Table formats
   Do we plan to add/remove/update data file formats for 2.0 (Parquet,
ORC, ...) ?
   Same question about table formats. Do we plan a kind of "tool" to
move data from other table formats to Iceberg ?
* Data Ingestion (e.g. Kafka Connect sink)
   Iceberg 1.5.0 will include the first building blocks of Kafka Connect,
new ones will come with 1.6+.
   What do we plan for Iceberg 2.0 on this front ? Do we plan an
additional layer next to Kafka Connect (for instance, why not provide
an Apache Camel component to read/write data to Iceberg) ?
* Rough date: depending on all previous points (and maybe others :)),
when do we target 2.0.0 ?

This is only a rough starting point. I propose to create a GitHub
"Discussion" issue (flagged with the 2.0.0 milestone) for each topic where
we have consensus.

Thoughts ?

Regards
JB
