alamb opened a new issue, #13796:
URL: https://github.com/apache/datafusion/issues/13796

   ### Is your feature request related to a problem or challenge?
   
   Similarly to https://github.com/apache/datafusion/issues/13759, we found 
another potential code vulnerability from a security audit performed by 
InfluxData
   
   > The public method `with_capacity` of `StringArrayBuilder` accepts a 
parameter `item_capacity: usize`. It is used to create a `MutableBuffer` with a 
capacity calculated as `(item_capacity + 1)*size_of::<i32>()`, which can 
overflow undetected, leaving the buffer
   > too small. The subsequent unsafe call to push a value into the buffer can 
lead to an out-of-bounds memory access.
   >
   > In datafusion/functions/src/strings.rs, lines 125ff:
   ```rust
   impl StringArrayBuilder {
       pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self 
{
           let mut offsets_buffer = MutableBuffer::with_capacity((item_capacity 
+ 1) * size_of::<i32>());
           // SAFETY: the first offset value is definitely not going to exceed 
the bounds.
           unsafe { offsets_buffer.push_unchecked(0_i32) };
           Self {
               offsets_buffer,
               value_buffer:
               MutableBuffer::with_capacity(data_capacity),
           }
       }
   ```
   
   
   I analyzed the potential risk, and I agree there is a risk of memory 
unsafety but I do not think it is exploitable via DataFusion APIs. 
Specifically, the only callsites are:
   
   
https://github.com/apache/datafusion/blob/3ee9b3dfb6d9c4e95a93d694b6aaf5c21ab61354/datafusion/functions/src/string/concat_ws.rs#L228-L229
   
   The argument is taken from 
   
https://github.com/apache/datafusion/blob/3ee9b3dfb6d9c4e95a93d694b6aaf5c21ab61354/datafusion/functions/src/string/concat_ws.rs#L149-L150
   
   So to trigger this code you would have to provide an input record batch with 
`usize::MAX` (aka the maximum value of a 64bit unsigned integer)  rows (the 
default is 8k) which would require an allocation of 4 * usize::MAX bytes in the 
source array.  I don't think such an allocation is possible (it would be 
rejected by the OS)
   
   
   ### Reproducer showing segfault:
   Here is a test case I wrote:
   
   ```diff
   andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ git diff
   diff --git a/datafusion/functions/src/strings.rs 
b/datafusion/functions/src/strings.rs
   index d2fb5d585..8e7c05fa1 100644
   --- a/datafusion/functions/src/strings.rs
   +++ b/datafusion/functions/src/strings.rs
   @@ -422,3 +422,13 @@ impl ColumnarValueRef<'_> {
            }
        }
    }
   +
   +
   +#[cfg(test)]
   +mod test {
   +    use super::*;
   +    #[test]
   +    fn test_overflow() {
   +        let builder = StringArrayBuilder::with_capacity(usize::MAX, 
usize::MAX);
   +    }
   +}
   ```
   
   It must be run in release mode
   ```shell
   cargo test --release  -p datafusion-functions --lib -- overflow
   ```
   
   ```
   error: test failed, to rerun pass `-p datafusion-functions --lib`
   
   Caused by:
     process didn't exit successfully: 
`/Users/andrewlamb/Software/datafusion/target/release/deps/datafusion_functions-e3de98ad3d4dc27c
 overflow` (signal: 11, SIGSEGV: invalid memory reference)
   andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$
   ```
   
    
   
   
   ### Describe the solution you'd like
   
   > Recommendations: Check for integer overflow when calculating the capacity.
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to