Tushar7012 commented on issue #20052:
URL: https://github.com/apache/datafusion/issues/20052#issuecomment-3813920291

   Hi @alamb, I'd like to work on this issue!
   
   ## Root Cause Analysis
   The regression appears after PR #19674, which introduced name-based struct field matching. The overhead most likely comes from:
   1. **`cast_struct_column()` in `nested_struct.rs`** - performs field-by-name matching with recursive struct handling
   2. **`validate_struct_compatibility()`** - runs comprehensive compatibility checks on every struct cast
   3. **Additional validation in `ColumnarValue::cast_to`** - routes all struct casts through the new logic
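
To make the suspected cost concrete, here is a minimal sketch of name-based field matching. It is not DataFusion's code: `Field` and `match_fields_by_name` are hypothetical, simplified stand-ins, and the real implementation works on Arrow types and also recurses into nested structs. The point is only that each cast has to build or scan a name lookup, which positional matching does not.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a struct field (name only).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

/// For each target field, find the index of the source field with the
/// same name. Target fields missing from the source map to `None`.
fn match_fields_by_name(source: &[Field], target: &[Field]) -> Vec<Option<usize>> {
    // Building this lookup on every cast is the kind of per-call work
    // that the analysis above suspects (an assumption until profiled).
    let by_name: HashMap<&str, usize> = source
        .iter()
        .enumerate()
        .map(|(i, f)| (f.name.as_str(), i))
        .collect();

    target
        .iter()
        .map(|f| by_name.get(f.name.as_str()).copied())
        .collect()
}

fn main() {
    let f = |n: &str| Field { name: n.to_string() };
    let source = vec![f("a"), f("b"), f("c")];
    let target = vec![f("c"), f("a"), f("d")];

    // Target order differs from source order, and "d" has no source field.
    let mapping = match_fields_by_name(&source, &target);
    println!("{:?}", mapping); // [Some(2), Some(0), None]
}
```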
   
   ## Proposed Approach
   1. **Profile extended tests** to identify which tests are most affected
   2. **Optimize hot paths**:
      - Add a fast path when the source and target schemas are identical
      - Skip re-validation when compatibility was already verified at planning time
      - Consider `#[inline]` hints for frequently called casting functions
   3. **Reduce overhead**:
      - Early bailout in `validate_struct_compatibility()` for identical types
      - Lazy evaluation for expensive field matching operations
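
The early-bailout idea could look roughly like the sketch below. Everything in it is an assumption for illustration: `Ty` is a hypothetical, simplified type model (DataFusion uses Arrow's `DataType`), `validate_compat` is not the real `validate_struct_compatibility()` signature, and leaf types are treated as compatible only when identical. The function returns `Ok(true)` when the fast path was taken so the behavior is easy to observe.

```rust
/// Hypothetical, simplified type model used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    Utf8,
    Struct(Vec<(String, Ty)>),
}

/// Returns `Ok(true)` if the identical-type fast path was taken,
/// `Ok(false)` if full name-based validation ran, and `Err` on mismatch.
fn validate_compat(source: &Ty, target: &Ty) -> Result<bool, String> {
    // Fast path: identical types need no per-field matching.
    if source == target {
        return Ok(true);
    }

    match (source, target) {
        (Ty::Struct(src), Ty::Struct(tgt)) => {
            for (name, tgt_ty) in tgt {
                // Name-based lookup. In this sketch a target field that is
                // missing from the source is tolerated (assumed null-filled).
                if let Some((_, src_ty)) = src.iter().find(|(n, _)| n == name) {
                    validate_compat(src_ty, tgt_ty)?;
                }
            }
            Ok(false)
        }
        // Simplification: differing non-struct types are rejected outright.
        _ => Err(format!("cannot cast {:?} to {:?}", source, target)),
    }
}

fn main() {
    let a = Ty::Struct(vec![
        ("x".to_string(), Ty::Int64),
        ("y".to_string(), Ty::Utf8),
    ]);
    // Same fields as `a`, in a different order.
    let b = Ty::Struct(vec![
        ("y".to_string(), Ty::Utf8),
        ("x".to_string(), Ty::Int64),
    ]);

    println!("{:?}", validate_compat(&a, &a)); // Ok(true)
    println!("{:?}", validate_compat(&a, &b)); // Ok(false)
}
```

The equality check is itself a recursive comparison, so it is not free, but it avoids the per-field name lookups in the common case where nothing needs to be reordered.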
     
   ## Next Steps
   1. Set up local profiling to identify exact bottlenecks
   2. Compare test durations before/after PR #19674
   3. Submit targeted optimization PR
   
   Could I please be assigned to this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

