dusantism-db opened a new pull request, #49445: URL: https://github.com/apache/spark/pull/49445
### What changes were proposed in this pull request?

This pull request introduces support for local variables in SQL scripting.

#### Behavior:

Local variables are declared in the headers of compound bodies and are bound to their scope. Variables of the same name are allowed in nested scopes, in which case the innermost variable is resolved. Optionally, a local variable can be qualified with the label of the compound body in which it was declared, which allows accessing variables that are not the innermost in the current scope.

Local variables have resolution priority over session variables: session variable resolution is attempted only after local variable resolution fails. The exception is fully qualified session variables, in the format `system.session.<varName>` or `session.<varName>`. `system` and `session` are forbidden for use as compound body labels.

Local variables must not be qualified on declaration, can be set using `SET VAR`, and cannot be `DROPPED`. They also should not be allowed to be declared with `DECLARE OR REPLACE`; however, this is not implemented in this PR because the `FOR` statement relies on that behavior. The `FOR` statement must be updated in a separate PR to use proper local variables, as the current implementation simulates them using session variables.

#### Implementation notes:

Since core depends on catalyst, it is impossible to import code from core (where most of the SQL scripting implementation is located) into catalyst. To solve this, a trait `VariableManager` is introduced, which is then implemented in core and injected into catalyst. This `VariableManager` is essentially a wrapper around `SqlScriptingExecutionContext` and provides methods for getting/setting/creating variables. This injection is tricky because we want to have one `ScriptingVariableManager` **per script**. Options considered to achieve this are:

- Pass the manager/context to the analyzer using function calls.
  If possible, this solution would be ideal, because it would allow every run of the analyzer to have its own scripting context which is automatically cleaned up (`AnalysisContext`). It would also allow more control over variable resolution; for example, for `EXECUTE IMMEDIATE` we could simply not pass in the script context, and it would behave as if outside of a script, which is the intended behavior for `EXECUTE IMMEDIATE`. The problem with this approach is that it seems hard to implement. The call stack would be as follows: `Analyzer.executeAndCheck` -> `HybridAnalyzer.apply` -> `RuleExecutor.executeAndTrack` -> `Analyzer.execute` (overridden from `RuleExecutor`) -> `Analyzer.withNewAnalysisContext`. Implementing this context propagation would require changing the signatures of all of these methods, including superclass methods like `execute` and `executeAndTrack`.

- Store the context in `CatalogManager`. `CatalogManager`'s lifetime is tied to the session, so to allow multiple scripts to execute at the same time we would need, e.g., a map `scriptUUID -> VariableManager`, with the `scriptUUID` kept as a `ThreadLocal` variable in the `CatalogManager`. The drawbacks of this approach are that the script has to clean up its resources after execution, and that it is more complicated to, e.g., forbid `EXECUTE IMMEDIATE` from accessing local variables.

I currently favor the second option; however, I am open to suggestions on how to approach this.

### Why are the changes needed?

Currently, local variables are simulated using session variables in SQL scripting, which is a temporary solution with many drawbacks.

### Does this PR introduce _any_ user-facing change?

Yes, this change introduces multiple new types of errors.

### How was this patch tested?

Tests were added to `SqlScriptingExecutionSuite` and `SqlScriptingParserSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.
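The resolution order described under "Behavior" above (innermost local scope first, then enclosing scopes, then session variables, with optional label qualification) can be sketched in Python. This is illustrative only; `Scope`, `resolve`, and the labels are invented names, not code from this PR:

```python
# Hypothetical sketch of the described resolution order: innermost local
# scope wins, an optional label selects a specific enclosing scope, and
# session variables are consulted only when no local variable matches.

FORBIDDEN_LABELS = {"system", "session"}  # reserved, per the PR

class Scope:
    def __init__(self, label=None):
        if label in FORBIDDEN_LABELS:
            raise ValueError(f"label '{label}' is reserved")
        self.label = label
        self.vars = {}

def resolve(scopes, session_vars, name, label=None):
    """scopes is ordered outermost -> innermost."""
    if label is None:
        # Unqualified: innermost local wins, then session fallback.
        for scope in reversed(scopes):
            if name in scope.vars:
                return scope.vars[name]
        return session_vars.get(name)
    if label == "session":
        # Fully qualified session variable bypasses locals entirely.
        return session_vars.get(name)
    # Label-qualified: look only in the scope with that label.
    for scope in reversed(scopes):
        if scope.label == label and name in scope.vars:
            return scope.vars[name]
    return None

outer, inner = Scope("lbl_outer"), Scope("lbl_inner")
outer.vars["x"] = 1
inner.vars["x"] = 2          # shadows outer x
session = {"x": 0, "y": 42}
scopes = [outer, inner]

print(resolve(scopes, session, "x"))               # innermost local: 2
print(resolve(scopes, session, "x", "lbl_outer"))  # qualified: 1
print(resolve(scopes, session, "y"))               # session fallback: 42
```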
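The trait-injection pattern from the implementation notes can be sketched as follows. This is a hypothetical Python analogue of the Scala design: catalyst sees only the abstract `VariableManager`, while core implements it on top of the script execution context (represented here by a plain dict standing in for `SqlScriptingExecutionContext`):

```python
# Hypothetical sketch: catalyst depends only on the abstract trait;
# core provides the concrete implementation and injects it.

from abc import ABC, abstractmethod

class VariableManager(ABC):  # defined in catalyst
    @abstractmethod
    def get(self, name): ...
    @abstractmethod
    def set(self, name, value): ...

class ScriptingVariableManager(VariableManager):  # implemented in core
    def __init__(self, script_context):
        # Wraps the script's execution context (a dict in this sketch).
        self._ctx = script_context

    def get(self, name):
        return self._ctx.get(name)

    def set(self, name, value):
        self._ctx[name] = value

class Analyzer:  # catalyst-side consumer; sees only the trait
    def __init__(self, variable_manager=None):
        self.vm = variable_manager

    def resolve_variable(self, name):
        if self.vm is not None:
            local = self.vm.get(name)
            if local is not None:
                return local
        return None  # fall through to session-variable resolution

ctx = {}
analyzer = Analyzer(ScriptingVariableManager(ctx))
analyzer.vm.set("x", 10)
print(analyzer.resolve_variable("x"))  # 10
```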
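The second option (a per-script map in `CatalogManager` keyed by a `ThreadLocal` script UUID) might look roughly like this. All names and structure here are assumptions for illustration, not Spark's actual `CatalogManager`:

```python
# Hypothetical sketch of option 2: a session-lifetime CatalogManager
# holding a scriptUUID -> manager map, with the current script UUID
# kept in thread-local storage so concurrent scripts don't collide.

import threading
import uuid

class CatalogManager:
    def __init__(self):
        self._managers = {}              # scriptUUID -> variable manager
        self._local = threading.local()  # per-thread current script UUID

    def begin_script(self):
        script_id = str(uuid.uuid4())
        self._managers[script_id] = {}   # manager is a dict in this sketch
        self._local.script_id = script_id
        return script_id

    def current_manager(self):
        # Returns None outside a script, so e.g. EXECUTE IMMEDIATE would
        # need an explicit check to be forbidden from local variables.
        script_id = getattr(self._local, "script_id", None)
        return self._managers.get(script_id)

    def end_script(self, script_id):
        # Cleanup is the script's responsibility (the drawback noted above).
        self._managers.pop(script_id, None)
        self._local.script_id = None

cm = CatalogManager()
sid = cm.begin_script()
cm.current_manager()["x"] = 1
print(cm.current_manager()["x"])  # 1
cm.end_script(sid)
print(cm.current_manager())       # None
```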
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org