Hey everyone! I wanted to follow up on the points above with two things:
1. I've submitted a PR to add a Database concept (#35641
   <https://github.com/apache/beam/pull/35641>).
2. I've created a PR that implements a more concrete schema hierarchy for
   Catalogs and Databases (#35787
   <https://github.com/apache/beam/pull/35787>).

The second PR addresses points (2) and (3) above, and does the following:

- Introduces CatalogManagerSchema and CatalogSchema that model Catalogs and
  Databases, respectively, moving away from a flat hierarchy of
  BeamCalciteSchemas.
- Enables cross-database and cross-catalog queries using standard SQL syntax
  (this is the core benefit). This was already possible using a
  Beam-specific LOCATION property. Now it is possible using standard syntax
  (e.g. INSERT INTO db_1.table_1 SELECT * FROM catalog_2.db_2.table_2).
- Enables interaction with existing external tables and databases without
  needing a redundant CREATE ... statement. This greatly improves usability
  for external sources.

This is a sizable change that lays the groundwork for more advanced SQL
features like SHOW and ALTER statements in the future. You can find a more
detailed description in the PR.

Please take a look and provide any feedback!

Best,
Ahmed

On Tue, Jul 22, 2025 at 8:42 AM Ahmed Abualsaud <ahmedabuals...@google.com>
wrote:

> Hey everyone,
>
> Building on the previous thread
> <https://lists.apache.org/thread/tv3405nx6zpbm6cxbo71yygf8s9sbj6m>
> regarding Catalogs in Beam, @Talat Uyarer <tal...@google.com> and
> I noticed several areas where Beam SQL's usability could be significantly
> improved, particularly concerning its interaction with existing tables and
> its metadata management.
>
> Some gaps we see currently:
>
>    - Lack of a DATABASE concept (analogous to BigQuery datasets or
>    Iceberg namespaces)
>    - Users are required to execute a redundant CREATE TABLE statement
>    when reading from a table that already exists
>    - Beam requires the table name/path to be specified in the LOCATION
>    property, when it could be inferred from the reference name in CREATE
>    TABLE <name>. For example, a user would need to do something like CREATE
>    TABLE foo.bar(...) LOCATION 'foo.bar'. LOCATION may be necessary for
>    some IOs like Kafka or Pubsub, but is redundant for others.
>    - Missing support for SHOW statements, which are crucial for
>    discoverability, e.g.:
>       - SHOW CATALOGS
>       - SHOW CURRENT CATALOG
>       - SHOW DATABASES FROM catalog_name LIKE 'pay*'
>       - SHOW CURRENT DATABASE
>       - SHOW TABLES FROM catalog_name.database_name NOT LIKE '*foo'
>    - Missing support for ALTER statements, which are important for
>    table schema manipulation or catalog modification, e.g.:
>       - ALTER CATALOG my_catalog SET ('foo_property' = 'bar')
>       - ALTER TABLE my_table ADD (col1 INTEGER, col2 TIMESTAMP)
>
> I've created a GitHub issue to track these points: #35637
> <https://github.com/apache/beam/issues/35637>. Our initial focus is on
> enhancing the experience for Iceberg users within Beam SQL, but this should
> benefit broader Beam SQL usage as well. Please take a look, and if you
> identify any other crucial gaps or have suggestions, feel free to comment
> there or reply to this thread.
>
> Thanks,
> Ahmed
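
To make the cross-database and cross-catalog naming in this thread concrete
(e.g. INSERT INTO db_1.table_1 SELECT * FROM catalog_2.db_2.table_2), here is
a rough Python sketch of how one-, two-, and three-part table names could
resolve against a catalog/database hierarchy. This is only an illustration:
the Session class, its fields, and the sample catalog contents are
hypothetical, and none of it is taken from the Beam PRs, which are
implemented in Java.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class Session:
    """Hypothetical model: catalog name -> database name -> table names."""

    catalogs: Dict[str, Dict[str, Set[str]]]
    current_catalog: str
    current_database: str

    def resolve(self, name: str) -> Tuple[str, str, str]:
        """Expand a dotted table reference to (catalog, database, table).

        Missing leading parts are filled from the session's current catalog
        and database. Quoted identifiers containing dots are not handled.
        """
        parts = name.split(".")
        if len(parts) == 1:
            catalog = self.current_catalog
            database = self.current_database
            table = parts[0]
        elif len(parts) == 2:
            catalog = self.current_catalog
            database, table = parts
        elif len(parts) == 3:
            catalog, database, table = parts
        else:
            raise ValueError(f"expected at most 3 name parts, got: {name!r}")

        # Walk the hierarchy top-down so the error names the missing level.
        if catalog not in self.catalogs:
            raise KeyError(f"unknown catalog: {catalog}")
        if database not in self.catalogs[catalog]:
            raise KeyError(f"unknown database: {catalog}.{database}")
        if table not in self.catalogs[catalog][database]:
            raise KeyError(f"unknown table: {catalog}.{database}.{table}")
        return catalog, database, table


# Sample hierarchy mirroring the names used in the example query.
session = Session(
    catalogs={
        "catalog_1": {"db_1": {"table_1"}},
        "catalog_2": {"db_2": {"table_2"}},
    },
    current_catalog="catalog_1",
    current_database="db_1",
)

target = session.resolve("db_1.table_1")
source = session.resolve("catalog_2.db_2.table_2")
print(target, source)
```

With catalog_1 as the current catalog, the two-part target name stays in that
catalog while the three-part source name reaches into catalog_2, which is the
behavior the standard-syntax query relies on.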