theirix commented on PR #19980:
URL: https://github.com/apache/datafusion/pull/19980#issuecomment-3796678559
Thank you for the review!
> Could you help me understand which changes here make it O(1)?
The O(1) refers to memory complexity. We avoid copying the string into
`chars_buf` and then collecting it back via `collect`, as suggested in the
original PR. Now we just take a byte slice of the original string.
I can't speak to time complexity yet: it is improved, but not for all
queries (see `QQuery 1`). Since I cannot trigger benchmarks from the PR for
the updated version, I'll try to set them up locally.
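To illustrate the memory point, here is a minimal sketch of the two approaches. It is not the PR's code: `left_by_slice` and `left_by_collect` are hypothetical names, and the sketch assumes the operation is "take the first `n` characters".

```rust
/// Hypothetical helper: returns the first `n` characters of `s` as a
/// borrowed slice of the original string, with no allocation.
fn left_by_slice(s: &str, n: usize) -> &str {
    // `char_indices` yields the byte offset of every character, so the
    // offset of the character at index `n` is where the prefix ends.
    match s.char_indices().nth(n) {
        Some((byte_end, _)) => &s[..byte_end],
        // Fewer than `n` characters: the whole string is the answer.
        None => s,
    }
}

/// Hypothetical copying variant for comparison: walks the characters and
/// collects them into a freshly allocated `String`.
fn left_by_collect(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
```

Both return the same text, e.g. `left_by_slice("héllo wörld", 5)` is `"héllo"`, but only the second one allocates a new buffer proportional to the result.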
>
> > For LargeUtf8 (`StringViewArray`), implement a zero-copy slice operation
> > reusing the same Arrow buffers. It is possible for both views since the
> > string only shrinks. We only need to tune a German prefix.
>
> `LargeUtf8` and `Utf8View` are different types, so it is confusing to see
> them used interchangeably.
My bad, corrected it in the description.
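For context on the zero-copy part of the quoted description, here is a rough, std-only sketch of how a `Utf8View` view could be recomputed when the string shrinks. It assumes the 16-byte little-endian view layout from the Arrow format as I understand it (length, then either 12 inline bytes or a 4-byte prefix, buffer index, and offset). The function names are hypothetical and this is not the code from the PR.

```rust
/// Packs an out-of-line view: length | prefix | buffer index | offset,
/// each 32 bits wide, lowest bits first (assumed layout).
fn make_long_view(len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32) -> u128 {
    (len as u128)
        | ((u32::from_le_bytes(prefix) as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}

/// Packs an inline view: 4-byte length followed by up to 12 data bytes.
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12);
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: builds the view for the sub-string that starts
/// `start` bytes into the original value and is `new_len` bytes long.
/// The data buffer itself is never copied or modified.
fn shrink_view(buffer: &[u8], buffer_index: u32, offset: u32, start: u32, new_len: u32) -> u128 {
    let begin = (offset + start) as usize;
    let data = &buffer[begin..begin + new_len as usize];
    if new_len <= 12 {
        // Short results are stored inline in the view.
        make_inline_view(data)
    } else {
        // Long results keep pointing into the same buffer; only the
        // length, offset, and 4-byte prefix are recomputed.
        let prefix = [data[0], data[1], data[2], data[3]];
        make_long_view(new_len, prefix, buffer_index, offset + start)
    }
}
```

With the buffer `b"The quick brown fox jumps"`, shrinking to bytes 4..19 yields a view with length 15, prefix `quic`, and offset 4, while the buffer stays shared.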
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]