Hi Paddy, Arrow Developers,

I've given this some thought, and I preliminarily think that perhaps we can
audit our use of unsafe and evaluate where we can remove it, propagate it
upwards (and provide safe alternatives) or provide some safety to callers.

Looking at the 3 options that Paul Kernfeld raised in the linked JIRA:


   1. Add in bounds checking so that we don't need to deal with unsafe at
   all.
   2. Propagate the unsafes up through the code.
   3. Maintain a safe and unsafe version of each function that is currently
   unsafe.


I think bounds checking would hurt performance, an example being the
changes introduced in https://issues.apache.org/jira/browse/ARROW-4670. In
ARROW-4670, I believe we were able to get the compiler to auto-vectorise
due to Array::value() avoiding bounds checks. In the case of compute, we
are in control of the array length, and so we know that it's safe to skip
bounds checking. I presume this would largely be the case in tabular-data
use-cases (because we assert that arrows in a record batch meet certain
criteria).

>From a cursory glance, if we do find that we don't need explicit SIMD
(still immature in Rust, I've found it difficult to implement in some
cases), we could potentially reduce our unsafe count by around 20%. The
flatbuffers generated files also introduce a lot of unsafe (~26%), so we'd
need to maybe adopt option 2 from Paul on IPC once we're done with the
basics.

We'd then mainly be left with bit manipulation and `Buffer` (which as as
much unsafe as the fbs generated files). I think the API around buffer
would depend on whether we're expecting (based on what can be done with
buffers) this to be exposed to users beyond those using Arrow as a
development platform.

The above are some of my thoughts, but important's that I don't have a lot
of experience with Rust, especially `unsafe` and the other dark corners of
the language.

Regards
Neville

On Fri, 10 Jan 2020 at 04:13, paddy horan <[email protected]> wrote:

> Hi All,
>
> This time last year there was a brief discussion on the usage of unsafe in
> Rust (a user on github raised the issue and I created the JIRA). [1]
>
> So far we mostly avoid unsafe in the public API's.  The thinking here is
> that Arrow is a "development platform", i.e. lower level that most
> libraries, and library builders will want to avoid any performance hit of
> bounds checking, etc.
>
> This is not typical in the Rust community where unsafe is a clear signal
> that care is needed.  Although it might clutter the API a little more I
> would be in favor of having safe and unsafe variants of methods as needed.
> For instance, "value" for array access would be changed to "value" and
> "value_unchecked" where the latter is unsafe and does not perform bounds
> checks.
>
> We don't have a huge number of libraries building on top of Arrow in Rust
> at the moment so it seems like a good time, before 1.0, to decide on this
> to avoid breaking changes to the public API in post 1.0.
>
> Thoughts?
>
> Paddy
>
> [1] https://issues.apache.org/jira/browse/ARROW-3776?filter=12343557
>
>

Reply via email to