Hey everyone,

Building on the previous thread <https://lists.apache.org/thread/tv3405nx6zpbm6cxbo71yygf8s9sbj6m> regarding Catalogs in Beam, @Talat Uyarer <tal...@google.com> and I noticed several areas where Beam SQL's usability could be significantly improved, particularly concerning its interaction with existing tables and its metadata management.

Some gaps we see currently:

- Lack of a DATABASE concept (analogous to BigQuery datasets or Iceberg namespaces)
- Users are required to execute a redundant CREATE TABLE statement when reading from a table that already exists
- Beam requires the table name/path to be specified in the LOCATION property, when it could be inferred from the reference name in CREATE TABLE <name>. For example, a user would need to do something like CREATE TABLE foo.bar(...) LOCATION 'foo.bar'. LOCATION may be necessary for some IOs like Kafka or Pubsub, but is redundant for others.
- Missing support for SHOW statements, which are crucial for discoverability. e.g.:
  - SHOW CATALOGS
  - SHOW CURRENT CATALOG
  - SHOW DATABASES FROM catalog_name LIKE 'pay*'
  - SHOW CURRENT DATABASE
  - SHOW TABLES FROM catalog_name.database_name NOT LIKE '*foo'
- Missing support for ALTER statements, which are important for table schema manipulation or catalog modification. e.g.:
  - ALTER CATALOG my_catalog SET ('foo_property' = 'bar')
  - ALTER TABLE my_table ADD (col1 INTEGER, col2 TIMESTAMP)

I've created a GitHub issue to track these points: #35637 <https://github.com/apache/beam/issues/35637>. Our initial focus is on enhancing the experience for Iceberg users within Beam SQL, but this should benefit broader Beam SQL usage as well.

Please take a look, and if you identify any other crucial gaps or have suggestions, feel free to comment there or reply to this thread.

Thanks,
Ahmed