findepi commented on issue #11413:
URL: https://github.com/apache/datafusion/issues/11413#issuecomment-2348895869

   @wangrunji0408 thanks, that's valuable!
   
   I spend quite some time today benchmarking various hypothetical 
implementations where concatenation logic is maximally separated from columnar 
(scalar/array) handling.
   
   Here couple conclusions
   
   - hand-written[^2] `concat(a, b)` can easily be 18%[^3] faster than current 
variadic implementation
   - result array pre-sizing has very big impact on performance (~35%)
     - currently concat does this internally, together with doing the actual 
concatenation. 
       However, if we go the "simple functions" path, the sizing can also be 
adaptive, especially if function implementation is given some scratch space for 
doing the math for adaptivity 
https://github.com/apache/datafusion/issues/8051#issuecomment-2348333254
   - extracting logic to `fn concat(a: &str, b: &str) -> String` has x3-6 
performance hit (compared to writing to `MutableBuffer` directly). I still 
struggle to accept  that the compiler cannot inline that!
   - adapting `&MutableBuffer` to `&std::fmt::Write` with another throw-away 
struct is optimized away by the compiler. (and we also can rustly do `impl 
std::fmt::Write for MutableBuffer`). This means we should be able to provide 
zero-cost[^1] abstraction where function logic is implemented as
     ```rust
     concat(lhs: &str, rhs: &str, output: &mut impl std::fmt::Write)
     ```
     and the `output` writes directly back to `MutableBuffer` of the 
`StringArray` being constructed.
   - ```rust
     write!(output, "{}{}", a, b).unwrap()
     ```
     is significantly slower than
     ```rust
     output.write_str(a).unwrap();
     output.write_str(b).unwrap();
     ```
   - and
     ```rust
     format!("{}{}", a, b)
     ```
     is significantly slower than
     ```rust
     a.to_string() + b
     ``` 
   
   [^2]: for hand-written i was using DF's `StringArrayBuilder`, but 
interacting with it's `valueBuffer MutableBuffer` directly. So bare-arrow in 
practice. https://gist.github.com/findepi/e7b53f8e06cde083205125d83f7ec615
   [^3]: i was running benchmarks with `time cargo bench --bench concat` with 
slight modifications of the benchmark: two inputs arrays, no nulls since DF's 
`concat` has non-typical null handling which i didn't want to focus on
   [^1]: pre-sizing aside
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to