chris-twiner commented on PR #34558:
URL: https://github.com/apache/spark/pull/34558#issuecomment-2722268943

   > > > @Kimahriman just out of curiosity, how much did the performance 
improve?
   > > 
   > > 
   > > I just wanted to add to the above response that I've implemented a 
compilation scheme 
[here](https://sparkutils.github.io/quality/latest/advanced/userFunctions/#controlling-compilation-tweaking-the-quality-optimisations)
 as part of Quality, and we saw perf boosts of up to 40%. Beyond that point, 
adding further lambdas made the cost of code generation higher than the 
saving. It's definitely usage dependent though: the more work done in the 
function, the higher the cost (and therefore the potential saving from 
compilation). A small boost is also noticeable on removal of the atomic under 
similar ideal circumstances.
   > > edit - [the 
source](https://github.com/sparkutils/quality/blob/main/src/main/scala/org/apache/spark/sql/qualityFunctions/LambdaCompilation.scala)
   > 
   > Not sure how I missed this comment. We haven't done extensive performance 
comparisons with and without this; we've just been using it for a few years 
now. It's hard to quantify the impact since it's completely dependent on the 
expressions run inside the functions. But that's also the whole point: by 
enabling codegen for HOFs you enable codegen for the expressions inside the 
lambda functions, which are expected to perform better because that is what 
codegen is for.
   > 
   > Additionally, this enables a follow-on I'm currently working on: 
enabling subexpression elimination inside of lambda functions. We've 
recently identified the lack of it as a major performance killer for us, as 
it's very easy to generate a lot of duplicate expression evaluations in 
certain cases.
   
   The subexpression elimination option is huge! Very exciting.
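
As a rough illustration of why duplicate evaluations inside a lambda hurt, here is a toy model in plain Python. It is not Spark code and says nothing about how the PR or the follow-on is implemented; `expensive`, `transform`, `naive`, and `eliminated` are hypothetical stand-ins for a costly expression, a higher-order function over an array, and a lambda body before and after the common subexpression is evaluated once.

```python
# Toy model only: plain Python, not Spark's codegen or optimizer.
calls = {"expensive": 0}


def expensive(x):
    """Hypothetical stand-in for a costly expression inside a lambda body."""
    calls["expensive"] += 1
    return x * x + 1


def transform(xs, f):
    """Rough analogue of a higher-order function applied to each array element."""
    return [f(x) for x in xs]


def naive(x):
    # The same subexpression appears twice, so it is evaluated twice per element.
    return expensive(x) + expensive(x) * 2


def eliminated(x):
    # After subexpression elimination: evaluate once, reuse the result.
    e = expensive(x)
    return e + e * 2


def count_calls(body, xs):
    """Run `body` over `xs` and report how often `expensive` was evaluated."""
    calls["expensive"] = 0
    result = transform(xs, body)
    return result, calls["expensive"]


xs = [1, 2, 3, 4]
naive_result, naive_calls = count_calls(naive, xs)
elim_result, elim_calls = count_calls(eliminated, xs)
print(naive_calls, elim_calls)  # prints "8 4"
```

Both bodies return the same values, but the naive one does twice the work per element, and the gap grows with every additional repeat of the subexpression.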


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

