blaginin commented on issue #14563:
URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2654717456

   I really like that idea, Bruce! I tried to break your branch, but everything 
seems to work 🙂 I think the issue was that on every rename we recursively 
normalized _every_ column in the query, which is very expensive. 
You could also normalize only the newly added columns and leave the rest 
untouched; if you avoid `normalize_col`, it'll be even faster.
   
   I think we can use this issue to make several nice optimizations that will 
complement each other:
   
   - The simplest one: do not call expensive operations when a column is _already 
normalized_. The most obvious example is 
[this](https://github.com/apache/datafusion/compare/main...blaginin:datafusion:early-exit-on-normalization)
 (gives a 30% improvement in your benchmarks; maybe I'll find other places as 
well).
   - Do not normalize columns (your PoC), which will improve the benchmarks 
further. Normalization will still be required in some cases (`(..., true)` in 
your current 
[PR](https://github.com/apache/datafusion/compare/main...Omega359:arrow-datafusion:with_column_updates#diff-997707d7dfcac94032b84a25bc0010c62209bf767e3abc6580a55a0a97c19de2R1727)).
   - For the cases that do need normalization, we can improve things by reusing 
the existing projection (my PoC from yesterday). I think it's a good idea 
overall to keep the plan simple: we'll spend less time simplifying and 
executing it later.
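   
   To illustrate the first bullet, here is a minimal sketch of the early-exit 
idea. It is not DataFusion code: `Col`, `Schema`, `resolve` and `normalize` are 
hypothetical, simplified stand-ins, and the schema scan only plays the role of 
the expensive lookup. The point is that an already-qualified column is returned 
borrowed, with no lookup and no clone.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified column reference (NOT DataFusion's `Column`).
#[derive(Debug, Clone, PartialEq)]
pub struct Col {
    /// Table qualifier; `None` means the column is not normalized yet.
    pub relation: Option<String>,
    pub name: String,
}

/// Hypothetical schema: the fully qualified columns currently in scope.
pub struct Schema {
    pub fields: Vec<Col>,
}

impl Schema {
    /// Stand-in for the expensive step: scan every field to find the
    /// qualified column matching `name`.
    fn resolve(&self, name: &str) -> Option<Col> {
        self.fields.iter().find(|f| f.name == name).cloned()
    }
}

/// Normalize `col` against `schema`.
///
/// Returns `Cow::Borrowed` when the column already carries a qualifier,
/// `Cow::Owned` when it had to be resolved, and `None` when the name is
/// not present in the schema.
pub fn normalize<'a>(col: &'a Col, schema: &Schema) -> Option<Cow<'a, Col>> {
    if col.relation.is_some() {
        // Early exit: already normalized, so skip the lookup and the clone.
        return Some(Cow::Borrowed(col));
    }
    schema.resolve(&col.name).map(Cow::Owned)
}
```

   With this shape, renaming one column in a wide query only pays the lookup 
cost for columns that are actually unqualified; everything else passes through 
unchanged.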
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

