Hi everyone,

It’s been over six months since Iceberg views were introduced (in version
1.5.0), and while we’ve seen some benefits—such as versioning and
cross-engine recognition—there’s still a critical gap in terms of true
interoperability. Although views created by one engine are recognized by
another, they currently cannot be queried across engines as one engine
cannot understand the SQL dialect of another engine.

To address this, I propose a two-step approach:


*Step 1: A temporary solution to manually add SQL dialects.*It is already
possible to do this using some java code but most of the view users
interact via SQL, there’s currently no SQL-native method to manually add
dialects from other engines into Iceberg views. I suggest adding a call
procedure to enable this. I've recently worked on this and have submitted
the following PRs for review:

   - Spark-3.5: Refactor BaseProcedure to support views [#11326
   <https://github.com/apache/iceberg/pull/11326>]
   - Spark-3.5: Procedure to add view dialect [#11327
   <https://github.com/apache/iceberg/pull/11327>]

With these changes, Iceberg views could be queried across different engines
using their respective dialects.
Please review and provide feedback.


*Step 2: A long-term solution for full view interoperability.*For a
sustainable solution, we need to consider both open-source and commercial
engines (e.g., Snowflake, Dremio, BigQuery). A framework that can
automatically translate SQL dialects from one engine to another seems ideal.

I’ve explored Coral*[1]*, which works well with open-source engines like
Hive, Trino, and Spark, but doesn’t fully cover commercial engines. In
contrast, SQLGlot*[2]* seems to be a better fit, as it supports 23 SQL
dialects and would provide a more robust solution for interoperability. But
SQLGlot is a python project. So, we need REST or Py4J to work with SQLGlot.

Additionally, I looked into Substrait*[3]* as a potential intermediate
representation for views, but as per my understanding, integration with
commercial engines isn’t quite ready yet. This technology has broader
applications beyond views, including table planning, and may be worth
revisiting in the future.

Given this, I believe that incorporating a translation layer with SQLGlot
is the right direction. If no one is currently working on view
interoperability, I’d be happy to explore this further and take the lead. I
welcome any thoughts or feedback from those involved in related projects.

Looking forward to hearing your opinions.

[1] https://github.com/linkedin/coral
[2] https://sqlglot.com/sqlglot.html
[3] https://substrait.io/

- Ajantha

Reply via email to