Hey everyone,

Building on the previous thread
<https://lists.apache.org/thread/tv3405nx6zpbm6cxbo71yygf8s9sbj6m>
regarding Catalogs in Beam, @Talat Uyarer <tal...@google.com> and I noticed
several areas where Beam SQL's usability could be significantly improved,
particularly concerning its interaction with existing tables and its
metadata management.

Some gaps we see currently:
   - Lack of a DATABASE concept (analogous to BigQuery datasets or Iceberg
   namespaces)
   - Users are required to execute a redundant CREATE TABLE statement when
   reading from a table that already exists
   - Beam requires the table name/path to be specified in the LOCATION
   property, even when it could be inferred from the reference name in CREATE
   TABLE <name>. For example, a user would need to write something like CREATE
   TABLE foo.bar(...) LOCATION 'foo.bar'. LOCATION may be necessary for
   some IOs like Kafka or Pubsub, but is redundant for others.
   - Missing support for SHOW statements, which are crucial for
   discoverability. e.g.:
      - SHOW CATALOGS
      - SHOW CURRENT CATALOG
      - SHOW DATABASES FROM catalog_name LIKE 'pay*'
      - SHOW CURRENT DATABASE
      - SHOW TABLES FROM catalog_name.database_name NOT LIKE '*foo'
   - Missing support for ALTER statements, which are important for table
   schema manipulation or catalog modification. e.g.:
      - ALTER CATALOG my_catalog SET ('foo_property' = 'bar')
      - ALTER TABLE my_table ADD (col1 INTEGER, col2 TIMESTAMP)

I've created a GitHub issue to track these points: #35637
<https://github.com/apache/beam/issues/35637>. Our initial focus is on
enhancing the experience for Iceberg users within Beam SQL, but this should
benefit broader Beam SQL usage as well. Please take a look, and if you
identify any other crucial gaps or have suggestions, feel free to comment
there or reply to this thread.

Thanks,
Ahmed
