zhuqi-lucas commented on code in PR #16630:
URL: https://github.com/apache/datafusion/pull/16630#discussion_r2179270018


##########
datafusion/physical-plan/src/sorts/cursor.rs:
##########
@@ -288,6 +288,64 @@ impl CursorArray for StringViewArray {
     }
 }
 
+/// Todo use arrow-rs side api after: 
<https://github.com/apache/arrow-rs/pull/7748> released
+/// Builds a 128-bit composite key for an inline value:
+///
+/// - High 96 bits: the inline data in big-endian byte order (for correct 
lexicographical sorting).
+/// - Low  32 bits: the length in big-endian byte order, acting as a 
tiebreaker so shorter strings
+///   (or those with fewer meaningful bytes) always numerically sort before 
longer ones.
+///
+/// This function extracts the length and the 12-byte inline string data from 
the raw
+/// little-endian `u128` representation, converts them to big-endian ordering, 
and packs them
+/// into a single `u128` value suitable for fast, branchless comparisons.
+///
+/// ### Why include length?
+///
+/// A pure 96-bit content comparison can’t distinguish between two values 
whose inline bytes
+/// compare equal—either because one is a true prefix of the other or because 
zero-padding
+/// hides extra bytes. By tucking the 32-bit length into the lower bits, a 
single `u128` compare
+/// handles both content and length in one go.
+///
+/// Example: comparing "bar" (3 bytes) vs "bar\0" (4 bytes)
+///
+/// | String     | Bytes 0–4 (length LE) | Bytes 4–16 (data + padding)    |
+/// |------------|-----------------------|---------------------------------|
+/// | `"bar"`   | `03 00 00 00`         | `62 61 72` + 9 × `00`           |
+/// | `"bar\0"`| `04 00 00 00`         | `62 61 72 00` + 8 × `00`        |
+///
+/// Both inline parts become `62 61 72 00…00`, so they tie on content. The 
length field
+/// then differentiates:
+///
+/// ```text
+/// key("bar")   = 0x0000000000000000000062617200000003
+/// key("bar\0") = 0x0000000000000000000062617200000004
+/// ⇒ key("bar") < key("bar\0")
+/// ```
+#[inline(always)]
+pub fn inline_key_fast(raw: u128) -> u128 {

Review Comment:
   Thank you @alamb for good point, addressed in latest PR!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to