Re: [PR] feat: optimise copying in `left` for Utf8 and LargeUtf8 [datafusion]

via GitHub Sun, 25 Jan 2026 05:39:17 -0800


theirix commented on PR #19980:
URL: https://github.com/apache/datafusion/pull/19980#issuecomment-3796678559


   Thank you for the review!
    
   > Could you help me understand which changes here make it O(1)?
   
   The O(1) refers to memory complexity. We avoid an extra copy of the string into 
`chars_buf` and then collecting it back via `collect`, as suggested in the 
original PR. Now we just byte-slice the original string.
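
   To illustrate the byte-slicing idea, here is a minimal sketch, not the code 
from the PR. The helper name `left_slice` is hypothetical, and the handling of 
negative `n` (drop the last `|n|` characters) is an assumption based on the 
usual SQL `left` semantics. It locates the byte offset of the cut point with 
`char_indices` and returns a borrowed slice, so no intermediate buffer is 
allocated:

```rust
/// Hypothetical helper: returns the first `n` characters of `s` as a borrowed
/// slice. A negative `n` is assumed to mean "all but the last |n| characters".
fn left_slice(s: &str, n: i64) -> &str {
    if n >= 0 {
        let take = usize::try_from(n).unwrap_or(usize::MAX);
        // Byte offset where character number `take` starts, or the end of the
        // string when it has fewer than `take + 1` characters.
        let end = s
            .char_indices()
            .nth(take)
            .map(|(idx, _)| idx)
            .unwrap_or(s.len());
        &s[..end]
    } else {
        let drop = usize::try_from(n.unsigned_abs()).unwrap_or(usize::MAX);
        // Walk backwards: the start byte of the `drop`-th character from the
        // end is where the result stops. Fewer characters means empty output.
        let end = s
            .char_indices()
            .rev()
            .nth(drop - 1)
            .map(|(idx, _)| idx)
            .unwrap_or(0);
        &s[..end]
    }
}
```

   The scan is still linear in the number of characters inspected, so this only 
removes the extra allocation and copy, not the time cost of finding the 
character boundary.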
   
   I cannot say the same for time complexity: it is improved, but not for all 
queries (`QQuery 1`). Since I cannot invoke benchmarks from the PR for the 
updated version, I'll try to set them up locally.
   
   > 
   > > For LargeUtf8 (`StringViewArray`), implement a zero-copy slice operation 
reusing the same Arrow buffers. It is possible for both views since the string 
only shrinks. We only need to tune a German prefix.
   > 
   > `LargeUtf8` and `Utf8View` are different types, so it is confusing to see 
them used interchangeably.
   
   My bad, corrected it in the description.
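
   For reference, a rough sketch of what shrinking a `Utf8View` view involves. 
This is not the PR's code: it assumes the Arrow view layout (little-endian, 
4-byte length, then either 12 inline bytes or a 4-byte prefix, buffer index, 
and offset), and `make_view` / `shrink_view` are hypothetical names. The caller 
is assumed to pass a `new_len` that falls on a UTF-8 character boundary:

```rust
/// Longest string (in bytes) stored inline in a 16-byte view.
const MAX_INLINE: usize = 12;

/// Hypothetical helper: builds a view for `data`, assumed to live at
/// `buffers[buffer_index][offset..]` when it is longer than 12 bytes.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= MAX_INLINE {
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        bytes[4..8].copy_from_slice(&data[..4]); // 4-byte prefix
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: shrinks `view` to its first `new_len` bytes.
/// The data buffers themselves are never modified or copied.
fn shrink_view(view: u128, new_len: u32, buffers: &[Vec<u8>]) -> u128 {
    let len = view as u32; // low 32 bits hold the length
    if new_len >= len {
        return view;
    }
    if len as usize <= MAX_INLINE {
        // Inline stays inline: update the length and zero the dropped bytes.
        let mut bytes = view.to_le_bytes();
        bytes[..4].copy_from_slice(&new_len.to_le_bytes());
        for b in &mut bytes[4 + new_len as usize..] {
            *b = 0;
        }
        return u128::from_le_bytes(bytes);
    }
    if new_len as usize > MAX_INLINE {
        // Long stays long: prefix, buffer index, and offset are unchanged.
        return (view & !(u32::MAX as u128)) | new_len as u128;
    }
    // Long becomes short: the prefix area is replaced by inlined data.
    let buffer_index = (view >> 64) as u32 as usize;
    let offset = (view >> 96) as u32 as usize;
    let data = &buffers[buffer_index][offset..offset + new_len as usize];
    make_view(data, 0, 0)
}
```

   Only the long-to-short case reads from the data buffer, and it copies at 
most 12 bytes into the view itself.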
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


