tobixdev opened a new issue, #17488: URL: https://github.com/apache/datafusion/issues/17488
### Describe the bug

As part of trying to update [RDF Fusion](https://github.com/tobixdev/rdf-fusion) to DataFusion 50, we observed a significant performance regression for a query that uses a Nested Loop Join. The [original comment](https://github.com/apache/datafusion/issues/16799#issuecomment-3270869325) was in the release issue for DataFusion 50.

I think the regression stems from two points:

- Apparently `build_row_join_batch` calls `ScalarValue::to_array_of_size` and creates a `UnionArray`, which seems to be slow.
- Furthermore, much more time seems to be spent evaluating expressions.

This could be related to https://github.com/apache/datafusion/pull/16996. @2010YOUY01, do you have a take on this?

### To Reproduce

I don't have a reproducer with the DataFusion CLI. Below is a part of our execution plan that causes the problem. I know it's tough to read without knowing the system, and the filters are rather messy. If we cannot triage the regression with this information, I can try to come up with a custom program. I think we need two ingredients: Union values and complex filter expressions.
```
NestedLoopJoinExec: join_type=Inner, filter=coalesce(coalesce(EBV(LT(join_proj_push_down_6@0, join_proj_push_down_8@2)), false) AND coalesce(EBV(GT(join_proj_push_down_7@1, join_proj_push_down_9@3)), false), false), projection=[product@0, productLabel@1]
```

Column types:

- `join_proj_push_down_6`: Large Union type
- `join_proj_push_down_8`: Large Union type
- `join_proj_push_down_7`: Large Union type
- `join_proj_push_down_9`: Large Union type
- `product`: `UInt32`
- `productLabel`: `UInt32`

### Expected behavior

Similar performance to DataFusion 49.

### Additional context

Here is a flamegraph of the query sub-plan on DataFusion 49 (total time: 4.2 ms):

<img width="1633" height="646" alt="Image" src="https://github.com/user-attachments/assets/9233d33b-9dbf-4663-b93c-e703aa2e1efe" />

Here is a flamegraph of the query sub-plan on DataFusion 50 (total time: 190.2 ms):

<img width="1633" height="646" alt="Image" src="https://github.com/user-attachments/assets/d485d6f6-68a8-4b6d-ad06-e6b24ae5910b" />

There is also an interactive view on [CodSpeed](https://codspeed.io/tobixdev/rdf-fusion/branches/feature%2Fupdate-df-50?uri=bench%2Fbenches%2Fbsbm_explore.rs%3A%3Absbm_explore%3A%3Absbm_explore_10000_1_partition%3A%3ABSBM%2520Explore%252010000%2520%28target_partitions%3D1%29%2520-%2520Q5). You can switch between Base (DF 49) and Head (DF 50).
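To make the suspected hot path easier to discuss, here is a minimal, stdlib-only sketch of the pattern described above: for each build-side row, the row's values are replicated to the length of the probe batch (similar in spirit to what `ScalarValue::to_array_of_size` does, per the report) and then the join filter is evaluated column-wise. This is a hypothetical, simplified model, not DataFusion's implementation. The helper names `broadcast`, `compare`, `coalesce_false` and `nested_loop_join` are made up for illustration. Values are modelled as `Option<i64>` (with `None` as NULL) instead of the Large Union columns in the real plan, and `EBV` is treated as the identity on booleans. Columns `a`/`b` stand for `join_proj_push_down_6`/`join_proj_push_down_7` (left), and `c`/`d` for `join_proj_push_down_8`/`join_proj_push_down_9` (right).

```rust
/// Hypothetical helper: replicate one build-side value `n` times,
/// analogous in spirit to materializing a scalar as an array of size `n`.
fn broadcast(value: Option<i64>, n: usize) -> Vec<Option<i64>> {
    vec![value; n]
}

/// SQL-style element-wise comparison: the result is NULL (None)
/// if either input is NULL.
fn compare(l: &[Option<i64>], r: &[Option<i64>], op: fn(i64, i64) -> bool) -> Vec<Option<bool>> {
    l.iter()
        .zip(r)
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(op(*x, *y)),
            _ => None,
        })
        .collect()
}

/// Models `coalesce(x, false)`: NULL is treated as false.
fn coalesce_false(v: &[Option<bool>]) -> Vec<bool> {
    v.iter().map(|x| x.unwrap_or(false)).collect()
}

/// Simplified nested loop join with the filter
/// `coalesce(a < c, false) AND coalesce(b > d, false)`.
/// The outer `coalesce(..., false)` from the plan is a no-op here because
/// both operands are already non-NULL. Returns matching (left, right) index pairs.
fn nested_loop_join(
    left: &[(Option<i64>, Option<i64>)],
    right_c: &[Option<i64>],
    right_d: &[Option<i64>],
) -> Vec<(usize, usize)> {
    let n = right_c.len();
    let mut out = Vec::new();
    for (i, &(a, b)) in left.iter().enumerate() {
        // Per build row: allocate two full-length columns before filtering.
        let a_col = broadcast(a, n);
        let b_col = broadcast(b, n);
        let lt = coalesce_false(&compare(&a_col, right_c, |x, y| x < y));
        let gt = coalesce_false(&compare(&b_col, right_d, |x, y| x > y));
        for j in 0..n {
            if lt[j] && gt[j] {
                out.push((i, j));
            }
        }
    }
    out
}

fn main() {
    let left = vec![(Some(1), Some(10)), (None, Some(5)), (Some(3), Some(0))];
    let right_c = vec![Some(2), Some(5), None];
    let right_d = vec![Some(8), Some(20), Some(1)];
    // Only left row 0 with right row 0 satisfies 1 < 2 AND 10 > 8.
    println!("{:?}", nested_loop_join(&left, &right_c, &right_d)); // prints [(0, 0)]
}
```

In this model, every build-side row allocates two probe-length columns before the filter runs. With primitive values that is cheap, but if the real code path builds a `UnionArray` per row, as the report suggests, that per-row materialization may be a plausible contributor to the slowdown seen with Union-typed join columns.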