andygrove opened a new pull request, #3670:
URL: https://github.com/apache/datafusion-comet/pull/3670

   ## Which issue does this PR close?
   
   Performance optimization for native shuffle row-to-columnar conversion.
   
   ## Rationale for this change
   
   The `SparkUnsafeObject` trait previously used a one-size-fits-all approach 
for reading primitive values: creating a byte slice via `from_raw_parts`, then 
calling `from_le_bytes(slice.try_into().unwrap())`. This incurs unnecessary 
overhead from slice creation, `try_into`, and `unwrap` on every field access.
   
   The two implementors of `SparkUnsafeObject` have fundamentally different 
alignment guarantees:
   
   - **SparkUnsafeRow**: All field offsets are always 8-byte aligned. The JVM 
guarantees 8-byte alignment on the base address, `bitset_width` is a multiple 
of 8, and each field slot is 8 bytes. This means aligned `ptr::read()` is safe 
and optimal.
   - **SparkUnsafeArray**: The array base address may be unaligned when nested 
within a row's variable-length region (accessed via arbitrary byte offset), so 
`ptr::read_unaligned()` is required for correctness.
   
   ## What changes are included in this PR?
   
   - Move primitive accessor method implementations (`get_int`, `get_long`, 
`get_float`, `get_double`, etc.) out of the trait defaults and into each 
concrete `impl` block via a `impl_primitive_accessors!` macro parameterized on 
the read method (`read` vs `read_unaligned`).
   - `SparkUnsafeRow` uses `ptr::read()` (aligned) — avoids the `from_le_bytes` 
+ slice overhead.
   - `SparkUnsafeArray` uses `ptr::read_unaligned()` — correct for potentially 
unaligned data.
   - Switch `is_null_at` and `set_not_null_at` in `SparkUnsafeRow` from 
`read_unaligned`/`write_unaligned` to aligned `read`/`write`, since the null 
bitset words are always at 8-byte aligned offsets within the row.
   
   ## How are these changes tested?
   
   Existing tests cover these code paths. The change is purely an optimization 
of pointer read methods — no behavioral change. `cargo clippy` and `cargo 
check` pass cleanly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to