lwz23 opened a new issue, #1141:
URL: https://github.com/apache/datafusion-comet/issues/1141

   ### Describe the bug
   
   The current implementation of `StringView::as_utf8_str` is unsound: it is a
   safe function, yet calling it can cause undefined behavior (UB). It builds a
   slice from the unchecked raw pointer field `ptr` and skips UTF-8 validation
   by calling `from_utf8_unchecked`. The call is UB whenever `ptr` does not
   point to `len` readable bytes or the bytes at `ptr` are not valid UTF-8.
   
   ### Steps to reproduce
   
   ```rust
   pub struct StringView {
       pub len: u32,
       //pub prefix: [u8; STRING_VIEW_PREFIX_LEN],
       pub ptr: usize,
   }
   
   impl StringView {
       pub fn as_utf8_str(&self) -> &str {
           unsafe {
               let slice = std::slice::from_raw_parts(self.ptr as *const u8, self.len as usize);
               std::str::from_utf8_unchecked(slice)
           }
       }
   }
   
   fn main() {
       let invalid_ptr = 0xdeadbeef_usize; 
   
       let string_view = StringView {
           len: 10,
           ptr: invalid_ptr,
       };
   
       let result = string_view.as_utf8_str();
       println!("Result: {}", result);
   }
   ```
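   
   The reproduction above covers only the invalid-pointer case. The second
   failure mode, a readable pointer to bytes that are not valid UTF-8, can be
   illustrated without executing any UB. The sketch below is not code from the
   project: `view_bytes_are_utf8` is a hypothetical helper that runs the checked
   `std::str::from_utf8` on the bytes a `StringView` would point at. When it
   returns `false`, passing the same bytes to `from_utf8_unchecked` would
   violate that function's safety contract.
   
   ```rust
   /// Hypothetical helper (not part of the project): reports whether the bytes
   /// a view would point at satisfy the contract of `from_utf8_unchecked`.
   pub fn view_bytes_are_utf8(bytes: &[u8]) -> bool {
       // The checked conversion walks the bytes and rejects malformed sequences.
       std::str::from_utf8(bytes).is_ok()
   }
   
   /// Example input: 0xC3 starts a two-byte sequence, but 0x28 is not a
   /// continuation byte, so this pair is not valid UTF-8.
   pub const INVALID_UTF8: [u8; 2] = [0xC3, 0x28];
   ```
   
   For instance, `view_bytes_are_utf8(&INVALID_UTF8)` is `false`, while
   `view_bytes_are_utf8("abc".as_bytes())` is `true`.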
   
   ### Expected behavior
   
   The function should validate the pointer and ensure it points to valid 
memory. Additionally, it should verify that the data is valid UTF-8 before 
returning a &str.
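   
   One possible shape for such an API is sketched here. It is an illustration
   under assumptions, not the project's code: `CheckedStringView`,
   `from_borrowed`, `from_raw_parts`, and `try_as_utf8_str` are hypothetical
   names. A raw address generally cannot be validated at run time, so the
   sketch keeps the fields private and moves the pointer requirement into an
   `unsafe` constructor, while the accessor validates UTF-8 with the checked
   `std::str::from_utf8`.
   
   ```rust
   use std::marker::PhantomData;
   use std::str::Utf8Error;
   
   /// Hypothetical view type. The fields are private, so safe code cannot
   /// forge a pointer the way the reproduction does.
   pub struct CheckedStringView<'a> {
       len: u32,
       ptr: *const u8,
       _data: PhantomData<&'a [u8]>,
   }
   
   impl<'a> CheckedStringView<'a> {
       /// Safe constructor: borrows from an existing `&str`.
       /// Returns `None` if the string is longer than `u32::MAX` bytes.
       pub fn from_borrowed(s: &'a str) -> Option<Self> {
           if s.len() > u32::MAX as usize {
               return None;
           }
           Some(Self { len: s.len() as u32, ptr: s.as_ptr(), _data: PhantomData })
       }
   
       /// # Safety
       /// `ptr` must be non-null and valid for reads of `len` bytes for the
       /// whole lifetime `'a`, and the bytes must not be mutated during `'a`.
       pub unsafe fn from_raw_parts(ptr: *const u8, len: u32) -> Self {
           Self { len, ptr, _data: PhantomData }
       }
   
       /// Checked accessor: validates UTF-8 instead of assuming it.
       pub fn try_as_utf8_str(&self) -> Result<&'a str, Utf8Error> {
           // SAFETY: both constructors guarantee that `ptr` is readable for
           // `len` bytes during `'a`.
           let bytes: &'a [u8] =
               unsafe { std::slice::from_raw_parts(self.ptr, self.len as usize) };
           std::str::from_utf8(bytes)
       }
   }
   ```
   
   With this shape, the only way to supply an arbitrary address is through the
   `unsafe` constructor, so the obligation to provide a valid pointer is
   visible at the call site. The UTF-8 check costs a pass over the bytes on
   each call; whether that is acceptable on a hot path is a design decision
   for the maintainers.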
   
   ### Additional context
   
   Note that the code above only emulates the implementation of this function.
   The crate does not appear to be published on crates.io, so I reproduced the
   function locally and commented out one field to make the UB easier to
   trigger. Because the `data_type` module is not a `pub` module, I am
   reporting this only as a potential security risk. If the function is in
   fact safe in this context, please disregard this report.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

