Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-04-04 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2767224780 This should now be resolved with the changes from https://github.com/apache/datafusion/pull/14653 -- This is an automated message from the Apache Git Service. To respond to t

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-03-31 Thread via GitHub
Omega359 closed issue #14563: Perf: Dataframe with_column and with_column_renamed are slow URL: https://github.com/apache/datafusion/issues/14563

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-12 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2654798363 I'll be honest - I'm pretty out of my element with these changes. I don't know what is 'correct behaviour' and what isn't here. My thinking for the changes in my current branch

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-12 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2654717456 I really like that idea, Bruce! I tried to break your branch, but everything seems to work 🙂 I think the issue was that on every rename, we tried to recursively normalize _ever

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2652215840 Interesting. I tried a somewhat different approach - https://github.com/apache/datafusion/compare/main...Omega359:arrow-datafusion:with_column_updates It is much much fas

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2652180854 Okay, so I think the issue is that with every `.with_column_renamed` / `.with_column` we add a new projection - that creates a lot of layers and each time adding a new one is m
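
To make the layering described above concrete, here is a minimal toy model in Rust. It is only a sketch: the `Plan` enum, `with_column_renamed`, and `rename_all` below are hypothetical names written for illustration and are not DataFusion's types or API. It also assumes that each call has to walk the whole plan built so far, which is a simplification of what the comment describes.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.
#[derive(Debug)]
enum Plan {
    Scan { columns: Vec<String> },
    Projection { columns: Vec<String>, input: Box<Plan> },
}

impl Plan {
    /// Output column names of this plan node.
    fn columns(&self) -> &[String] {
        match self {
            Plan::Scan { columns } | Plan::Projection { columns, .. } => columns,
        }
    }

    /// Number of nested layers, counting the scan itself.
    fn depth(&self) -> usize {
        match self {
            Plan::Scan { .. } => 1,
            Plan::Projection { input, .. } => 1 + input.depth(),
        }
    }

    /// Wrap the current plan in a new projection that renames one column.
    fn with_column_renamed(self, old: &str, new: &str) -> Plan {
        let columns: Vec<String> = self
            .columns()
            .iter()
            .map(|c| if c == old { new.to_string() } else { c.clone() })
            .collect();
        Plan::Projection { columns, input: Box::new(self) }
    }
}

/// Rename each of `n` columns once. Returns (final depth, total layers walked),
/// under the assumption that every call walks the whole plan built so far.
fn rename_all(n: usize) -> (usize, usize) {
    let mut plan = Plan::Scan {
        columns: (0..n).map(|i| format!("c{i}")).collect(),
    };
    let mut layers_walked = 0;
    for i in 0..n {
        layers_walked += plan.depth();
        plan = plan.with_column_renamed(&format!("c{i}"), &format!("r{i}"));
    }
    (plan.depth(), layers_walked)
}

fn main() {
    for n in [10, 100, 1000] {
        let (depth, walked) = rename_all(n);
        println!("columns={n} depth={depth} layers_walked={walked}");
    }
}
```

In this model the plan depth grows by one per call, so the total number of layers walked grows quadratically with the number of renamed columns. Folding the rename into an existing projection instead of stacking a new one would keep the depth constant, though whether that matches the eventual fix is not stated in this excerpt.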

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2652057779 Stacktrace may also help: https://github.com/user-attachments/assets/83ea287f-5312-4624-bc70-3824fb55c203

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
blaginin commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2652036656 A lot of `TreeNodeRecursion::visit_sibling`... may be related to https://github.com/apache/datafusion/issues/13748 ? ![Image](https://github.com/user-attachments/ass

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-11 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2651322116 In looking into this issue I have a question for the db experts that happen to be following this issue. The with_column code builds a `Vec` called fields in the dataframe

Re: [I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-10 Thread via GitHub
Omega359 commented on issue #14563: URL: https://github.com/apache/datafusion/issues/14563#issuecomment-2649003746 If someone would be so kind as to [generate a flamegraph](https://datafusion.apache.org/library-user-guide/profiling.html#example-flamegraph-for-a-benchmark) for the benchmark

[I] Perf: Dataframe with_column and with_column_renamed are slow [datafusion]

2025-02-09 Thread via GitHub
Omega359 opened a new issue, #14563: URL: https://github.com/apache/datafusion/issues/14563 ### Describe the bug Dataframe functions `.with_column` and `.with_column_renamed` (and possibly others) are slow. One can really see this in dataframes with many columns where a .with_c