martin-g commented on code in PR #18981:
URL: https://github.com/apache/datafusion/pull/18981#discussion_r2574795222


##########
datafusion/common/src/hash_utils.rs:
##########
@@ -484,6 +484,107 @@ fn hash_fixed_list_array(
     Ok(())
 }
 
+#[cfg(not(feature = "force_hash_collisions"))]
+fn hash_run_array<R: RunEndIndexType>(
+    array: &RunArray<R>,
+    random_state: &RandomState,
+    hashes_buffer: &mut [u64],
+    rehash: bool,
+) -> Result<()> {
+    // We find the relevant runs that cover potentially sliced arrays, so we can only hash those
+    // values. Then we find the runs refer to the original runs and ensure that we apply hashes
+    // correctly to the sliced, whether sliced at the start, end, or both.
+    let array_offset = array.offset();
+    let array_len = array.len();
+
+    if array_len == 0 {
+        return Ok(());
+    }
+
+    let run_ends = array.run_ends();
+    let run_ends_values = run_ends.values();
+    let values = array.values();
+
+    let mut start_physical_index = 0;
+    let mut end_physical_index = run_ends_values.len();
+
+    for (physical_index, &run_end) in run_ends_values.iter().enumerate() {
+        if run_end.as_usize() > array_offset {
+            start_physical_index = physical_index;

Review Comment:
   You could also take the start and end indices with `array.get_start_physical_index()` and `array.get_end_physical_index()`, no?
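
   To make the suggestion concrete, here is a minimal std-only sketch of the lookup the loop in the diff performs by hand: mapping a sliced logical range onto the physical runs that cover it. The helper name `physical_range`, the `i32` run-end type, and the `Option` return are assumptions made for illustration; this is not the arrow-rs implementation of the two methods named above, and it assumes `offset + len` does not exceed the last run end.

```rust
/// Hypothetical helper (not part of arrow-rs or DataFusion).
///
/// `run_ends` holds cumulative, strictly increasing run ends, so run `i`
/// covers the logical indices `run_ends[i - 1]..run_ends[i]`.
/// Returns the inclusive physical indices of the first and last run that
/// overlap the logical slice `offset..offset + len`, or `None` if it is empty.
fn physical_range(run_ends: &[i32], offset: usize, len: usize) -> Option<(usize, usize)> {
    if len == 0 {
        return None;
    }
    // The first run whose end is strictly greater than `offset` contains
    // the first logical element of the slice.
    let start = run_ends.partition_point(|&end| (end as usize) <= offset);
    // The last logical element of the slice sits at `offset + len - 1`.
    let last_logical = offset + len - 1;
    let end = run_ends.partition_point(|&end| (end as usize) <= last_logical);
    Some((start, end))
}
```

   For run ends `[2, 5, 9]`, a slice with offset 3 and length 4 covers logical indices 3 to 6, so `physical_range(&[2, 5, 9], 3, 4)` returns `Some((1, 2))`. Both lookups are binary searches via `partition_point`, whereas the loop in the diff scans the run ends linearly.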



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

