Hi,

I have been working on Spark SQL Macros
https://github.com/hbutani/spark-sql-macros

Spark SQL Macros provide the ability to register custom functions in a
Spark session, similar to UDF registration. The difference is that the
SQL Macros registration mechanism attempts to translate the function
body into an equivalent Spark Catalyst expression with holes (MacroArg
Catalyst expressions). Under the covers, SQLMacro is a set of Scala
blackbox macros that attempt to rewrite the Scala function body AST
into an equivalent Catalyst expression.
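
For illustration, registration might look like this (a minimal sketch,
assuming the spark.registerMacro/spark.udm entry points described in
the README; the macro name intM is made up):

    import org.apache.spark.sql.defineMacros._

    // The body (i + 2) is rewritten at macro-expansion time into a
    // Catalyst expression with a MacroArg hole standing in for the
    // parameter i.
    spark.registerMacro("intM", spark.udm((i: Int) => i + 2))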

The generated expression is encapsulated in a FunctionBuilder that is
registered in Spark's FunctionRegistry. Any subsequent invocation of
the function is then replaced by the equivalent Catalyst expression,
with the holes replaced by the call-site arguments.
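
Continuing the sketch above, an invocation of intM in SQL would then
be expanded directly in the plan (the exact plan shape shown in the
comment is illustrative):

    // The analyzed plan contains the inlined expression (id + 2)
    // instead of an opaque ScalaUDF call.
    spark.sql("select intM(id) from range(5)").explain(true)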

There are two potential performance benefits to replacing function
calls with native Catalyst expressions:
- Evaluation performance, since we avoid the SerDe cost at the
function boundary.
- More importantly, since the plan contains native Catalyst
expressions, more optimizations become possible.
  - For example, see the taxRate example, where the discount
calculation is eliminated (a hedged sketch follows this list).
  - Pushdown of operations to DataSources has a huge impact. For
example, see the Oracle SQL that is generated and pushed when a macro
is defined instead of a UDF.
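
To make the second point concrete, here is a hedged sketch in the
spirit of the wiki's taxRate example (the macro name, rates, and
column types are illustrative, not the exact code from the wiki):

    spark.registerMacro("taxAndDiscount",
      spark.udm((prodCat: String, amt: Double) => {
        // Both conditionals become Catalyst If expressions, visible
        // to the optimizer.
        val taxRate = if (prodCat == "grocery") 0.0 else 9.5
        val discount = if (amt >= 100.0) 0.05 * amt else 0.0
        (amt - discount) * (1.0 + taxRate / 100.0)
      }))

For an invocation with a literal argument, such as
taxAndDiscount('grocery', amt), Catalyst can fold the taxRate branch
away at planning time; the wiki's example eliminates the discount
calculation in the same way. A UDF with the same body would remain a
black box to the optimizer.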

So this has potential benefits for developers of custom functions and
DataSources. I am looking for feedback from the community on potential use
cases and features to develop.

To use this functionality:
- Build the jar by issuing sbt sql/assembly, or download it from the
releases page.
- The jar can be added to any Spark environment.
- It was developed against Spark 3.1.0.

The README provides more details and examples. This page
(https://github.com/hbutani/spark-sql-macros/wiki/Spark_SQL_Macro_examples)
provides even more examples.

regards,
Harish Butani.
