lwz23 opened a new issue, #1141:
URL: https://github.com/apache/datafusion-comet/issues/1141
### Describe the bug
The current implementation of StringView::as_utf8_str is unsound and can
cause undefined behavior (UB). The function is safe to call, yet it builds a
slice from the raw address stored in ptr without any check and skips UTF-8
validation by calling from_utf8_unchecked. This results in UB whenever ptr does
not point to len readable bytes or the bytes at ptr are not valid UTF-8.
### Steps to reproduce
```
pub struct StringView {
    pub len: u32,
    //pub prefix: [u8; STRING_VIEW_PREFIX_LEN],
    pub ptr: usize,
}

impl StringView {
    pub fn as_utf8_str(&self) -> &str {
        unsafe {
            // Neither the address nor the bytes behind it are checked.
            let slice = std::slice::from_raw_parts(self.ptr as *const u8, self.len as usize);
            std::str::from_utf8_unchecked(slice)
        }
    }
}

fn main() {
    // Arbitrary address that does not point to a valid allocation.
    let invalid_ptr = 0xdeadbeef_usize;
    let string_view = StringView {
        len: 10,
        ptr: invalid_ptr,
    };
    // UB: reads 10 bytes from the invalid address, using safe code only.
    let result = string_view.as_utf8_str();
    println!("Result: {}", result);
}
```
### Expected behavior
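Safe code should not be able to trigger UB through this function. It should
either guarantee that ptr points to len bytes of valid memory or be marked
unsafe with that requirement documented. Additionally, it should verify that
the data is valid UTF-8 before returning a &str.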
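
An address stored as a usize generally cannot be checked for validity at run time, so one possible way to approach the expectation above is to move the pointer requirement into an unsafe contract and replace from_utf8_unchecked with the checked std::str::from_utf8. The following is only a minimal sketch under that assumption. The method name try_as_utf8_str is hypothetical and is not part of the Comet codebase, and the struct mirrors the simplified reproduction rather than the real definition.

```rust
use std::str::Utf8Error;

// Simplified copy of the struct from the reproduction (prefix field omitted).
pub struct StringView {
    pub len: u32,
    pub ptr: usize,
}

impl StringView {
    /// Hypothetical checked accessor (an assumption, not the Comet API).
    ///
    /// # Safety
    /// `ptr` must point to `len` readable, initialized bytes that stay valid
    /// and unmodified for as long as the returned `&str` is in use.
    pub unsafe fn try_as_utf8_str(&self) -> Result<&str, Utf8Error> {
        // SAFETY: the caller upholds the contract documented above.
        let slice =
            unsafe { std::slice::from_raw_parts(self.ptr as *const u8, self.len as usize) };
        // Checked conversion: invalid UTF-8 is reported as an error instead of UB.
        std::str::from_utf8(slice)
    }
}

fn main() {
    let good = b"hello";
    let view = StringView { len: good.len() as u32, ptr: good.as_ptr() as usize };
    // SAFETY: `good` outlives `view` and is not mutated.
    assert_eq!(unsafe { view.try_as_utf8_str() }, Ok("hello"));

    let bad = [0xffu8, 0xfe, 0xfd];
    let view = StringView { len: bad.len() as u32, ptr: bad.as_ptr() as usize };
    // SAFETY: `bad` outlives `view` and is not mutated.
    assert!(unsafe { view.try_as_utf8_str() }.is_err());
}
```

With this shape, an invalid address can only be reached through an unsafe call, and invalid bytes produce a Utf8Error. The trade-off is that validation costs a pass over the data on every call, which may or may not be acceptable on a hot path.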
### Additional context
Note that the code above only emulates the implementation of this function.
Since the crate does not appear to be published on crates.io, I copied the
relevant definition and commented out one field to make the UB easier to
trigger. Because the data_type module is not a pub mod, I am reporting this
only as a possible security risk; if the function is in fact safe in this
context, please disregard this report.
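
Whether safe code can reach the function with arbitrary field values depends on visibility, so another option is to make the fields private and only build views from data that is already known to be valid. The sketch below illustrates that idea. SafeStringView and its constructor are hypothetical names chosen for illustration, and the real type may have layout or FFI constraints that rule this approach out.

```rust
use std::marker::PhantomData;

/// Hypothetical variant with private fields (not the Comet definition).
pub struct SafeStringView<'a> {
    len: u32,
    ptr: *const u8,
    // Ties the view to the lifetime of the borrowed string data.
    _marker: PhantomData<&'a str>,
}

impl<'a> SafeStringView<'a> {
    /// Returns `None` if the string is longer than `u32::MAX` bytes.
    pub fn new(s: &'a str) -> Option<Self> {
        if s.len() > u32::MAX as usize {
            return None;
        }
        Some(SafeStringView {
            len: s.len() as u32,
            ptr: s.as_ptr(),
            _marker: PhantomData,
        })
    }

    pub fn as_utf8_str(&self) -> &'a str {
        // SAFETY: `ptr` and `len` were taken from a valid `&'a str` in `new`,
        // and the private fields cannot be changed from outside this module.
        unsafe {
            let slice = std::slice::from_raw_parts(self.ptr, self.len as usize);
            std::str::from_utf8_unchecked(slice)
        }
    }
}

fn main() {
    let owned = String::from("héllo");
    let view = SafeStringView::new(&owned).expect("string fits in u32");
    assert_eq!(view.as_utf8_str(), "héllo");
}
```

Here the unchecked conversion stays, but its preconditions are established once in the constructor and protected by field privacy and the lifetime parameter, so callers outside the module cannot construct a view over invalid memory.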