alamb commented on code in PR #15348: URL: https://github.com/apache/datafusion/pull/15348#discussion_r2008785345
########## datafusion/physical-plan/src/sorts/cursor.rs: ########## @@ -281,6 +281,33 @@ impl<T: ByteArrayType> CursorArray for GenericByteArray<T> { } } +impl CursorArray for StringViewArray { + type Values = StringViewArray; + fn values(&self) -> Self { + self.clone() + } +} + +impl CursorValues for StringViewArray { + fn len(&self) -> usize { + self.views().len() + } + + fn eq(l: &Self, l_idx: usize, r: &Self, r_idx: usize) -> bool { + unsafe { GenericByteViewArray::compare_unchecked(l, l_idx, r, r_idx).is_eq() } Review Comment: I agree it would be good to justify the use of unchecked (which I think is ok here) The docs say https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.compare_unchecked SO maybe the safety argument is mostly "The left/right_idx must within range of each array" It also seems like we need to be comparing the Null masks too 🤔 like checking if the values are null before comparing Given that this comparison is typically *the* hottest part of a merge operation maybe we should try using unchecked comparisions elswhere -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org