Thanks, everybody, for taking a look at the doc. FYI, I've updated it. I'd like to share some intermediate thoughts.
1. It seems beneficial to follow the stored-procedures approach for small actions like rolling back or expiring snapshots. Presto already allows connectors to define stored procedures, and since CALL is standard SQL, it will be much easier to add this syntax to other query engines. If we go that route, optional and named arguments can make the syntax very reasonable for straightforward operations.

2. There are still cases where separate commands *may* make sense. For example, it may be more natural to have SNAPSHOT or MIGRATE as separate commands, so that we can use well-known clauses like TBLPROPERTIES. Later, we may build a VACUUM command with different modes that combines 3-4 actions. We have SNAPSHOT and MIGRATE internally and they are used frequently (especially SNAPSHOT).

3. If we decide to build SNAPSHOT and MIGRATE as separate commands, it is unlikely we can get them into query engines, even though the commands are generic. So we may need to maintain them in Iceberg in the form of SQL extensions (e.g. an extended parser via SQL extensions in Spark). That may not be possible in every query engine.

4. We need to align the syntax, including argument names, across query engines. Otherwise, it will be a mess if there are cosmetic differences between engines.

5. Spark does not have a plugin for stored procedures. There is a proposal from Ryan to add a function catalog API. I think it is a bit different from a stored-procedure catalog, as functions are used in SELECT while procedures are used in CALL. While we can explore how to add such support to Spark, we most likely need to start with SQL extensions in Iceberg. Otherwise, we will be blocked for a long time.

6. Wherever possible, SQL calls should return output summarizing what was done. For example, if we expire snapshots, return the number of expired snapshots, the number of removed data and metadata files, the number of scanned manifests, etc.
If we import a table, output the number of imported files, etc.

7. SQL calls must be smart. For example, we should not simply rewrite all metadata or data; commands should analyze what actually needs to be rewritten. I've tried to outline that for metadata and will submit a doc for data compaction.

- Anton

> On 23 Jul 2020, at 12:40, Anton Okolnychyi <aokolnyc...@apple.com.INVALID> wrote:
>
> Hi devs,
>
> I want to start a discussion on whether we want to have some SQL extensions in Iceberg that should help data engineers invoke Iceberg-specific functionality through SQL. I know companies have this internally, but I would like to unify this starting from Spark 3 and share the same syntax across query engines to have consistent behavior.
>
> I've put together a short doc:
>
> https://docs.google.com/document/d/1Nf8c16R2hj4lSc-4sQg4oiUUV_F4XqZKth1woEo6TN8
>
> I'd appreciate everyone's feedback. Please feel free to comment and add alternatives.
>
> Thanks,
> Anton
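To make the points above concrete, here is a rough sketch of what the stored-procedure syntax (point 1) and a separate-command alternative (point 2) might look like. The catalog name, `system` namespace, procedure names, and argument names below are illustrative assumptions, not settled syntax:

```sql
-- Hypothetical syntax sketch; all names here are assumptions, not final.

-- Positional arguments for a simple action:
CALL prod.system.rollback_to_snapshot('db.tbl', 5781947118336215154);

-- Named arguments, with optional ones omitted to take their defaults:
CALL prod.system.expire_snapshots(
  "table"     => 'db.tbl',
  older_than  => TIMESTAMP '2020-07-01 00:00:00',
  retain_last => 5
);

-- A separate-command alternative, reusing a well-known clause:
SNAPSHOT TABLE src.db.tbl AS iceberg.db.tbl TBLPROPERTIES ('key' = 'value');
```

Per point 6, a call like expire_snapshots could then return a result set (e.g. a row with the counts of expired snapshots and removed data/metadata files) rather than succeeding silently.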