dusantism-db opened a new pull request, #49445: URL: https://github.com/apache/spark/pull/49445
### What changes were proposed in this pull request?

This pull request introduces support for local variables in SQL scripting.

#### Behavior:

Local variables are declared in the headers of compound bodies and are bound to their scope. Variables of the same name are allowed in nested scopes, in which case the innermost variable is resolved. Optionally, a local variable can be qualified with the label of the compound body in which it was declared, which allows accessing variables that are not the innermost in the current scope.

Local variables have resolution priority over session variables: session variable resolution is attempted only after local variable resolution fails. The exception is fully qualified session variables, in the format `system.session.<varName>` or `session.<varName>`. `system` and `session` are forbidden for use as compound body labels.

Local variables must not be qualified on declaration, can be set using `SET VAR`, and cannot be `DROPPED`. They also should not be allowed to be declared with `DECLARE OR REPLACE`; however, this is not implemented in this PR because the `FOR` statement relies on that behavior. The `FOR` statement must be updated in a separate PR to use proper local variables, as the current implementation simulates them using session variables.

#### Implementation notes:

Since core depends on catalyst, it is impossible to import code from core (where most of the SQL scripting implementation is located) into catalyst. To solve this, a trait `VariableManager` is introduced, which is then implemented in core and injected into catalyst. This `VariableManager` is essentially a wrapper around `SqlScriptingExecutionContext` and provides methods for getting/setting/creating variables. This injection is tricky because we want to have one `ScriptingVariableManager` **per script**. Options considered to achieve this are:

- Pass the manager/context to the analyzer using function calls.
  If possible, this solution would be ideal, because it would allow every run of the analyzer to have its own scripting context which is automatically cleaned up (`AnalysisContext`). It would also allow more control over variable resolution; for example, for `EXECUTE IMMEDIATE` we could simply not pass in the script context, and it would behave as if outside of a script, which is the intended behavior for `EXECUTE IMMEDIATE`. The problem with this approach is that it seems hard to implement. The call stack would be as follows: `Analyzer.executeAndCheck` -> `HybridAnalyzer.apply` -> `RuleExecutor.executeAndTrack` -> `Analyzer.execute` (overridden from `RuleExecutor`) -> `Analyzer.withNewAnalysisContext`. Implementing this context propagation would require changing the signatures of all of these methods, including superclass methods like `execute` and `executeAndTrack`.

- Store the context in `CatalogManager`. `CatalogManager`'s lifetime is tied to the session, so to allow multiple scripts to execute at the same time we would need, e.g., a map `scriptUUID -> VariableManager`, with the `scriptUUID` kept as a `ThreadLocal` variable in the `CatalogManager`. The drawbacks of this approach are that the script has to clean up its resources after execution, and that it is more complicated to, e.g., forbid `EXECUTE IMMEDIATE` from accessing local variables.

I currently favor the second option; however, I am open to suggestions on how to approach this.

### Why are the changes needed?

Currently, local variables are simulated using session variables in SQL scripting, which is a temporary solution with many drawbacks.

### Does this PR introduce _any_ user-facing change?

Yes, this change introduces multiple new types of errors.

### How was this patch tested?

Tests were added to `SqlScriptingExecutionSuite` and `SqlScriptingParserSuite`.

### Was this patch authored or co-authored using generative AI tooling?

No.
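The resolution order described under "Behavior" above (innermost local scope first, then enclosing scopes, then session variables, with optional label qualification) can be sketched in Python. This is illustrative only; `Scope`, `resolve`, and the labels are invented names, not code from this PR:

```python
# Hypothetical sketch of the described resolution order: innermost local
# scope wins, an optional label selects a specific enclosing scope, and
# session variables are consulted only when no local variable matches.

FORBIDDEN_LABELS = {"system", "session"}  # reserved, per the PR

class Scope:
    def __init__(self, label=None):
        if label in FORBIDDEN_LABELS:
            raise ValueError(f"label '{label}' is reserved")
        self.label = label
        self.vars = {}

def resolve(scopes, session_vars, name, label=None):
    """scopes is ordered outermost -> innermost."""
    if label is None:
        # Unqualified: innermost local wins, then session fallback.
        for scope in reversed(scopes):
            if name in scope.vars:
                return scope.vars[name]
        return session_vars.get(name)
    if label == "session":
        # Fully qualified session variable bypasses locals entirely.
        return session_vars.get(name)
    # Label-qualified: look only in the scope with that label.
    for scope in reversed(scopes):
        if scope.label == label and name in scope.vars:
            return scope.vars[name]
    return None

outer, inner = Scope("lbl_outer"), Scope("lbl_inner")
outer.vars["x"] = 1
inner.vars["x"] = 2          # shadows outer x
session = {"x": 0, "y": 42}
scopes = [outer, inner]

print(resolve(scopes, session, "x"))               # innermost local: 2
print(resolve(scopes, session, "x", "lbl_outer"))  # qualified: 1
print(resolve(scopes, session, "y"))               # session fallback: 42
```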
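The trait-injection pattern from the implementation notes can be sketched as follows. This is a hypothetical Python analogue of the Scala design: catalyst sees only the abstract `VariableManager`, while core implements it on top of the script execution context (represented here by a plain dict standing in for `SqlScriptingExecutionContext`):

```python
# Hypothetical sketch: catalyst depends only on the abstract trait;
# core provides the concrete implementation and injects it.

from abc import ABC, abstractmethod

class VariableManager(ABC):  # defined in catalyst
    @abstractmethod
    def get(self, name): ...
    @abstractmethod
    def set(self, name, value): ...

class ScriptingVariableManager(VariableManager):  # implemented in core
    def __init__(self, script_context):
        # Wraps the script's execution context (a dict in this sketch).
        self._ctx = script_context

    def get(self, name):
        return self._ctx.get(name)

    def set(self, name, value):
        self._ctx[name] = value

class Analyzer:  # catalyst-side consumer; sees only the trait
    def __init__(self, variable_manager=None):
        self.vm = variable_manager

    def resolve_variable(self, name):
        if self.vm is not None:
            local = self.vm.get(name)
            if local is not None:
                return local
        return None  # fall through to session-variable resolution

ctx = {}
analyzer = Analyzer(ScriptingVariableManager(ctx))
analyzer.vm.set("x", 10)
print(analyzer.resolve_variable("x"))  # 10
```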
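The second option (a per-script map in `CatalogManager` keyed by a `ThreadLocal` script UUID) might look roughly like this. All names and structure here are assumptions for illustration, not Spark's actual `CatalogManager`:

```python
# Hypothetical sketch of option 2: a session-lifetime CatalogManager
# holding a scriptUUID -> manager map, with the current script UUID
# kept in thread-local storage so concurrent scripts don't collide.

import threading
import uuid

class CatalogManager:
    def __init__(self):
        self._managers = {}              # scriptUUID -> variable manager
        self._local = threading.local()  # per-thread current script UUID

    def begin_script(self):
        script_id = str(uuid.uuid4())
        self._managers[script_id] = {}   # manager is a dict in this sketch
        self._local.script_id = script_id
        return script_id

    def current_manager(self):
        # Returns None outside a script, so e.g. EXECUTE IMMEDIATE would
        # need an explicit check to be forbidden from local variables.
        script_id = getattr(self._local, "script_id", None)
        return self._managers.get(script_id)

    def end_script(self, script_id):
        # Cleanup is the script's responsibility (the drawback noted above).
        self._managers.pop(script_id, None)
        self._local.script_id = None

cm = CatalogManager()
sid = cm.begin_script()
cm.current_manager()["x"] = 1
print(cm.current_manager()["x"])  # 1
cm.end_script(sid)
print(cm.current_manager())       # None
```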
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org