Re: [Beam SQL] Addressing gaps in metadata management to improve usability

Ahmed Abualsaud via dev Tue, 05 Aug 2025 11:59:57 -0700

Hey everyone!

I wanted to follow up on the points from my earlier message (quoted below) with two things:


   1. I've submitted a PR to add a Database concept (#35641
   <https://github.com/apache/beam/pull/35641>).
   2. I've created a PR that implements a more concrete schema hierarchy
   for Catalogs and Databases (#35787
   <https://github.com/apache/beam/pull/35787>).

The second PR addresses points (2) and (3) from my earlier message and does the following:

   - Introduces CatalogManagerSchema and CatalogSchema that model Catalogs
   and Databases, respectively, moving away from a flat hierarchy of
   BeamCalciteSchemas.
   - Enables cross-database and cross-catalog queries using standard SQL
   syntax, e.g. INSERT INTO db_1.table_1 SELECT * FROM catalog_2.db_2.table_2
   (this is the core benefit). Previously, such queries were only possible
   through the Beam-specific LOCATION property.
   - Enables interaction with existing external tables and databases
   without needing a redundant CREATE ... statement. This greatly improves
   usability for external sources.
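To make the hierarchy concrete, here is a rough Python sketch of how a three-level catalog -> database -> table structure lets standard qualified names resolve. It is only an illustration under stated assumptions, not the code in the PR: the classes `CatalogManager`, `Catalog`, and `Database` and the `resolve` method are hypothetical. It assumes a one-part name refers to a table in the current database, a two-part name to `db.table` in the current catalog, and a three-part name to `catalog.db.table`. Tables are looked up in metadata the catalog already holds, which is why an existing table needs no CREATE TABLE statement.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    # Hypothetical stand-in for tables that already exist in an external
    # system (e.g. an Iceberg namespace), keyed by table name.
    tables: dict = field(default_factory=dict)


@dataclass
class Catalog:
    # Hypothetical: databases in this catalog, keyed by database name.
    databases: dict = field(default_factory=dict)


class CatalogManager:
    """Hypothetical top level of a catalog -> database -> table hierarchy."""

    def __init__(self, catalogs, current_catalog, current_database):
        self.catalogs = catalogs
        self.current_catalog = current_catalog
        self.current_database = current_database

    def resolve(self, qualified_name):
        """Resolve 'table', 'db.table', or 'catalog.db.table' to a table entry."""
        parts = qualified_name.split(".")
        if len(parts) == 1:
            # Unqualified: current catalog and current database.
            catalog, database, table = (
                self.current_catalog, self.current_database, parts[0])
        elif len(parts) == 2:
            # db.table: database inside the current catalog.
            catalog = self.current_catalog
            database, table = parts
        elif len(parts) == 3:
            # Fully qualified catalog.db.table.
            catalog, database, table = parts
        else:
            raise ValueError(f"Too many name parts: {qualified_name!r}")
        try:
            return self.catalogs[catalog].databases[database].tables[table]
        except KeyError:
            raise LookupError(
                f"Table not found: {catalog}.{database}.{table}") from None


manager = CatalogManager(
    catalogs={
        "catalog_1": Catalog({"db_1": Database({"table_1": "catalog_1.db_1.table_1"})}),
        "catalog_2": Catalog({"db_2": Database({"table_2": "catalog_2.db_2.table_2"})}),
    },
    current_catalog="catalog_1",
    current_database="db_1",
)
print(manager.resolve("table_1"))                 # catalog_1.db_1.table_1
print(manager.resolve("db_1.table_1"))            # catalog_1.db_1.table_1
print(manager.resolve("catalog_2.db_2.table_2"))  # catalog_2.db_2.table_2
```

Splitting on "." ignores quoted identifiers that contain dots; a real implementation would rely on the SQL parser (Calcite, in Beam SQL's case) for identifier handling.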

This is a sizable change that lays the groundwork for more advanced SQL
features like SHOW and ALTER statements in the future.
You can find a more detailed description in the PR. Please take a look and
provide any feedback!
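The SHOW statements with LIKE filters discussed in the quoted message need to match catalog, database, or table names against a pattern. Below is a minimal Python sketch, assuming standard SQL LIKE semantics ('%' matches any sequence of characters, '_' matches exactly one) and case-sensitive matching. `like_to_regex` and `show_names` are hypothetical helpers, not Beam APIs, and the LIKE ... ESCAPE clause is not handled.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex (hypothetical helper).

    '%' becomes '.*', '_' becomes '.', and every other character is escaped
    so it matches literally. ESCAPE clauses are not supported in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def show_names(names, like=None, negate=False):
    """Filter names the way SHOW ... [NOT] LIKE '<pattern>' might (hypothetical)."""
    if like is None:
        return sorted(names)
    regex = like_to_regex(like)
    # LIKE must match the whole name, hence fullmatch; negate implements NOT LIKE.
    return sorted(n for n in names if bool(regex.fullmatch(n)) != negate)


databases = ["payments", "payroll", "orders", "pay"]
print(show_names(databases, like="pay%"))  # ['pay', 'payments', 'payroll']

tables = ["users_foo", "users", "orders_foo"]
print(show_names(tables, like="%foo", negate=True))  # ['users']
```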

Best,
Ahmed

On Tue, Jul 22, 2025 at 8:42 AM Ahmed Abualsaud wrote:

> Hey everyone,
>
> Building on the previous thread
> <https://lists.apache.org/thread/tv3405nx6zpbm6cxbo71yygf8s9sbj6m>
> regarding Catalogs in Beam, @Talat Uyarer and
> I noticed several areas where Beam SQL's usability could be significantly
> improved, particularly concerning its interaction with existing tables and
> its metadata management.
>
> Some gaps we see currently:
>
>
>    - Lack of a DATABASE concept (analogous to BigQuery datasets or
>    Iceberg namespaces)
>    - Users are required to execute a redundant CREATE TABLE statement
>    when reading from a table that already exists
>    - Beam requires the table name/path to be specified in the LOCATION
>    property, even when it could be inferred from the reference name in
>    CREATE TABLE <name>. For example, a user would need to write something
>    like CREATE TABLE foo.bar(...) LOCATION 'foo.bar'. LOCATION may be
>    necessary for some IOs like Kafka or Pubsub, but is redundant for others.
>    - Missing support for SHOW statements, which are crucial for
>    discoverability, e.g.:
>       - SHOW CATALOGS
>       - SHOW CURRENT CATALOG
>       - SHOW DATABASES FROM catalog_name LIKE 'pay%'
>       - SHOW CURRENT DATABASE
>       - SHOW TABLES FROM catalog_name.database_name NOT LIKE '%foo'
>    - Missing support for ALTER statements, which are important for
>    table schema manipulation or catalog modification, e.g.:
>       - ALTER CATALOG my_catalog SET ('foo_property' = 'bar')
>       - ALTER TABLE my_table ADD (col1 INTEGER, col2 TIMESTAMP)
>
> I've created a GitHub issue to track these points: #35637
> <https://github.com/apache/beam/issues/35637>. Our initial focus is on
> enhancing the experience for Iceberg users within Beam SQL, but this should
> benefit broader Beam SQL usage as well. Please take a look, and if you
> identify any other crucial gaps or have suggestions, feel free to comment
> there or reply to this thread.
>
> Thanks,
> Ahmed
>
