andygrove opened a new pull request, #3670: URL: https://github.com/apache/datafusion-comet/pull/3670
## Which issue does this PR close? Performance optimization for native shuffle row-to-columnar conversion. ## Rationale for this change The `SparkUnsafeObject` trait previously used a one-size-fits-all approach for reading primitive values: creating a byte slice via `from_raw_parts`, then calling `from_le_bytes(slice.try_into().unwrap())`. This incurs unnecessary overhead from slice creation, `try_into`, and `unwrap` on every field access. The two implementors of `SparkUnsafeObject` have fundamentally different alignment guarantees: - **SparkUnsafeRow**: All field offsets are always 8-byte aligned. The JVM guarantees 8-byte alignment on the base address, `bitset_width` is a multiple of 8, and each field slot is 8 bytes. This means aligned `ptr::read()` is safe and optimal. - **SparkUnsafeArray**: The array base address may be unaligned when nested within a row's variable-length region (accessed via arbitrary byte offset), so `ptr::read_unaligned()` is required for correctness. ## What changes are included in this PR? - Move primitive accessor method implementations (`get_int`, `get_long`, `get_float`, `get_double`, etc.) out of the trait defaults and into each concrete `impl` block via a `impl_primitive_accessors!` macro parameterized on the read method (`read` vs `read_unaligned`). - `SparkUnsafeRow` uses `ptr::read()` (aligned) — avoids the `from_le_bytes` + slice overhead. - `SparkUnsafeArray` uses `ptr::read_unaligned()` — correct for potentially unaligned data. - Switch `is_null_at` and `set_not_null_at` in `SparkUnsafeRow` from `read_unaligned`/`write_unaligned` to aligned `read`/`write`, since the null bitset words are always at 8-byte aligned offsets within the row. ## How are these changes tested? Existing tests cover these code paths. The change is purely an optimization of pointer read methods — no behavioral change. `cargo clippy` and `cargo check` pass cleanly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
