Hi Paddy, Arrow Developers, I've given this some thought, and I preliminarily think that perhaps we can audit our use of unsafe and evaluate where we can remove it, propagate it upwards (and provide safe alternatives) or provide some safety to callers.
Looking at the 3 options that Paul Kernfeld raised in the linked JIRA: 1. Add in bounds checking so that we don't need to deal with unsafe at all. 2. Propagate the unsafes up through the code. 3. Maintain a safe and unsafe version of each function that is currently unsafe. I think bounds checking would hurt performance, an example being the changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In ARROW-4670, I believe we were able to get the compiler to auto-vectorise due to Array::value() avoiding bounds checks. In the case of compute, we are in control of the array length, and so we know that it's safe to skip bounds checking. I presume this would largely be the case in tabular-data use-cases (because we assert that arrows in a record batch meet certain criteria). >From a cursory glance, if we do find that we don't need explicit SIMD (still immature in Rust, I've found it difficult to implement in some cases), we could potentially reduce our unsafe count by around 20%. The flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd need to maybe adopt option 2 from Paul on IPC once we're done with the basics. We'd then mainly be left with bit manipulation and `Buffer` (which as as much unsafe as the fbs generated files). I think the API around buffer would depend on whether we're expecting (based on what can be done with buffers) this to be exposed to users beyond those using Arrow as a development platform. The above are some of my thoughts, but important's that I don't have a lot of experience with Rust, especially `unsafe` and the other dark corners of the language. Regards Neville On Fri, 10 Jan 2020 at 04:13, paddy horan <[email protected]> wrote: > Hi All, > > This time last year there was a brief discussion on the usage of unsafe in > Rust (a user on github raised the issue and I created the JIRA). [1] > > So far we mostly avoid unsafe in the public API's. The thinking here is > that Arrow is a "development platform", i.e. lower level that most > libraries, and library builders will want to avoid any performance hit of > bounds checking, etc. > > This is not typical in the Rust community where unsafe is a clear signal > that care is needed. Although it might clutter the API a little more I > would be in favor of having safe and unsafe variants of methods as needed. > For instance, "value" for array access would be changed to "value" and > "value_unchecked" where the latter is unsafe and does not perform bounds > checks. > > We don't have a huge number of libraries building on top of Arrow in Rust > at the moment so it seems like a good time, before 1.0, to decide on this > to avoid breaking changes to the public API in post 1.0. > > Thoughts? > > Paddy > > [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557 > >
