chris-twiner commented on PR #34558: URL: https://github.com/apache/spark/pull/34558#issuecomment-2722268943
> > > @Kimahriman just out of curiosity, how much did the performance improve?
> > >
> > > I just wanted to add to the above response that I've implemented a compilation scheme [here](https://sparkutils.github.io/quality/latest/advanced/userFunctions/#controlling-compilation-tweaking-the-quality-optimisations), as part of Quality, and we saw perf boosts of up to 40%; beyond that, adding further lambdas made the cost of code generation higher than the saving. It's definitely usage dependent though: the more work done in the function, the higher the cost (and therefore the potential saving from compilation). A small boost is also noticeable on removal of the atomic under similar ideal circumstances.
>
> edit - [the source](https://github.com/sparkutils/quality/blob/main/src/main/scala/org/apache/spark/sql/qualityFunctions/LambdaCompilation.scala)
>
> Not sure how I missed this comment. We haven't done extensive performance comparisons with and without this; we've just been using it for a few years now. It's hard to quantify the impact since it's completely dependent on the expressions run inside the functions. But that's also the whole point: by enabling codegen for HOFs you enable codegen for the expressions inside the lambda functions, which are assumed to be more performant since that's the whole point of codegen.
>
> Additionally, this enables a follow-on I'm currently working on, which is enabling subexpression elimination inside of lambda functions, which we've recently identified as a major performance killer for us, as it's very easy to generate a lot of duplicate expression evaluations in certain cases.

The subexpression elimination option is huge! Very exciting

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org