During the last sync we discussed a blocker for this work raised by Carl. It 
was unclear how role-based access control would work in the proposed approach. 
Specifically, how do we ensure that user `X` not only has permission to call a 
stored procedure but is also allowed to perform that operation on table `T`, 
where the table name `T` is provided as an argument?
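
To make the concern concrete, here is an illustrative call (the procedure and 
argument names below are made up for this example and are not part of the 
proposal):

  CALL iceberg.system.expire_snapshots(
    table => 'db.T',
    older_than => TIMESTAMP '2020-07-01 00:00:00')

The engine can verify that the caller is allowed to execute the procedure 
itself, but 'db.T' is only a string argument, so the engine has no built-in 
way to verify the caller's privileges on table `T`.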

The Presto community can expose an API for performing security checks within 
stored procedures to address this. It is not ideal, as it is up to the stored 
procedure to perform all checks correctly, but it solves the problem.

- Anton 

> On 29 Jul 2020, at 13:46, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> That looks like a good plan to me. Initially using stored procedures and 
> adding custom syntax where possible sounds like a good way to start.
> 
> For Spark, I agree that we can start exploring a plugin that can extend 
> Spark's syntax. Having that done will make development faster and make it 
> easier to get this upstream, I think.
> 
> On Mon, Jul 27, 2020 at 11:14 PM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> Thanks everybody for taking a look at the doc. FYI, I’ve updated it.
> 
> I would like to share some intermediate thoughts.
> 
> 1. It seems beneficial to follow the stored procedures approach for small 
> actions like rolling back or expiring snapshots. Presto already allows 
> connectors to define stored procedures, and it will be much easier to add 
> such syntax to other query engines as it is standard SQL. If we go that 
> route, optional arguments and named arguments can make the syntax very 
> reasonable for straightforward operations (see the illustrative calls after 
> this list).
> 
> 2. There are still some cases where separate commands *may* make sense. For 
> example, it may be more natural to have SNAPSHOT or MIGRATE as separate 
> commands. That way, we can use well-known clauses like TBLPROPERTIES (also 
> sketched after this list). Later, we may build a VACUUM command with 
> different modes to combine 3-4 actions. We have SNAPSHOT and MIGRATE 
> internally and they are frequently used (especially SNAPSHOT).
> 
> 3. If we decide to build SNAPSHOT and MIGRATE as separate commands, it is 
> unlikely we can get them into query engines even though the commands are 
> generic. So, we may need to maintain them in Iceberg in the form of SQL 
> extensions (e.g. an extended parser via SQL extensions in Spark). That may 
> not be possible in all query engines.
> 
> 4. We need to align the syntax, including argument names, across query 
> engines. Otherwise, it will be a mess if each query engine has its own 
> cosmetic differences.
> 
> 5. Spark does not have a plugin for stored procedures. There is a proposal 
> from Ryan to add a function catalog API. I think it is a bit different from 
> the stored procedure catalog, as functions are used in SELECT and procedures 
> are used in CALL. While we can explore how to add such support to Spark, we 
> most likely need to start with SQL extensions in Iceberg. Otherwise, we will 
> be blocked for a long time.
> 
> 6. Wherever possible, SQL calls should return output summarizing what was 
> done. For example, if we expire snapshots, return the number of expired 
> snapshots, the number of removed data and metadata files, the number of 
> scanned manifests, etc. If we import a table, output the number of imported 
> files, etc. (This is also noted in the sketch after this list.)
> 
> 7. SQL calls must be smart. For example, we should not simply rewrite all 
> metadata or data. Commands should analyze what needs to be rewritten. I’ve 
> tried to outline that for metadata and will submit a doc for data compaction.
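> 
> To make points 1, 2, and 6 a bit more concrete, here are a few illustrative 
> sketches (all procedure, command, and argument names are made up and not 
> final):
> 
>   -- 1: stored procedure calls with named and optional arguments
>   CALL iceberg.system.rollback_to_snapshot(table => 'db.tbl', snapshot_id => 123)
>   CALL iceberg.system.expire_snapshots(
>     table => 'db.tbl', older_than => TIMESTAMP '2020-07-01 00:00:00')
>   -- 6: each call could return a summary row, e.g. the number of expired
>   --    snapshots and the number of removed data/metadata files
> 
>   -- 2: SNAPSHOT/MIGRATE as separate commands with well-known clauses
>   SNAPSHOT TABLE db.src AS db.src_iceberg TBLPROPERTIES ('key' = 'value')
>   MIGRATE TABLE db.src TBLPROPERTIES ('key' = 'value')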
> 
> - Anton
> 
> 
>> On 23 Jul 2020, at 12:40, Anton Okolnychyi <aokolnyc...@apple.com.INVALID> wrote:
>> 
>> Hi devs,
>> 
>> I want to start a discussion on whether we want to have some SQL extensions 
>> in Iceberg that would help data engineers invoke Iceberg-specific 
>> functionality through SQL. I know companies have this internally, but I 
>> would like to unify this starting with Spark 3 and share the same syntax 
>> across query engines for consistent behavior.
>> 
>> I’ve put together a short doc: 
>> 
>> https://docs.google.com/document/d/1Nf8c16R2hj4lSc-4sQg4oiUUV_F4XqZKth1woEo6TN8
>> 
>> I’d appreciate everyone’s feedback. Please, feel free to comment and add 
>> alternatives.
>> 
>> Thanks,
>> Anton 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
