Thanks for the response, Mahmut.

I don't think I have a lot more to add.


On Sat, Oct 10, 2020 at 8:18 AM Vertexclique <vertexcli...@gmail.com> wrote:

> Hi Andrew,
>
> > I wonder if you can describe at a higher level what you are doing that
> requires so many allocations or rebuildings. The example you provide of
> modifying the underlying offset pointer seems a little strange to me as I
> thought one of the architectural goals of those structures was to be
> immutable.
>
> Sure. The vectorized processing kernels I am using need to rebuild buffers
> continuously. Various intermediate arrays are destroyed once I am done with
> them, so they don't need to be built as full immutable arrays. In my case,
> immutability shouldn't come with extra cost, and in most database systems
> that is exactly how it works.
>
> The architectural goal is immutable structs, that is for sure. But in some
> cases you don't need immutability for intermediate results. Moreover,
> allocating the data once and working on it afterwards is the right approach
> in some cases.
>
>
> > It might also help to show/explain some examples of what types of
> performance improvements would be enabled.
>
> It is hard to show in a single email, but I will try my best to explain.
> Doing CAS on processed intermediate results, hoisting operations for
> vectors of the same type, keeping hot and cold arrays, migrations, and
> scratch pads are a couple of the optimizations this would enable.
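As a rough illustration of the scratch-pad idea only (a plain `Vec<i32>` stands in for an Arrow buffer, and `add_into` is a hypothetical kernel, not Arrow API), a kernel can write into a caller-owned buffer and reuse its allocation across batches instead of building a fresh immutable array for every intermediate result:

```rust
/// Hypothetical kernel: element-wise add of two slices, writing into a
/// caller-provided scratch buffer instead of allocating a new output.
fn add_into(a: &[i32], b: &[i32], out: &mut Vec<i32>) {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    // `clear` drops the old contents but keeps the allocated capacity.
    out.clear();
    out.extend(a.iter().zip(b).map(|(x, y)| x + y));
}

fn main() {
    let batches = vec![
        (vec![1, 2, 3], vec![10, 20, 30]),
        (vec![4, 5, 6], vec![40, 50, 60]),
    ];

    // One scratch buffer, allocated once and reused for every batch.
    let mut scratch: Vec<i32> = Vec::with_capacity(3);
    let initial_ptr = scratch.as_ptr();

    for (a, b) in &batches {
        add_into(a, b, &mut scratch);
        println!("{:?}", scratch);
    }

    // Capacity was always sufficient, so the allocation was never replaced.
    assert_eq!(scratch.as_ptr(), initial_ptr);
}
```

The trade-off is that the scratch buffer is mutable and owned by the caller, which is exactly the kind of intermediate state the thread argues does not need immutability.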
>
> > Depending on what exactly you are doing, I wonder if you could use the
> Rust unsafe API for your advanced use cases rather than having to extend
> arrow itself.
>
> To use unsafe, you need to be able to access the pointer. In the example I
> sent before, nothing enables that. That is what the exposed API is all
> about: a feature gate that makes buffer fields public on demand.
>
> If we don't make them public, people need to copy the codebase, make things
> public (or transmute) themselves, and create new types to add the methods
> needed for what I have described. This is much more cumbersome than
> exposing the pointers behind feature gates.
>
> Since the exposure will always be feature-gated and disabled by default,
> there is no harm to the current immutability and encapsulation.
>
> Best,
> Mahmut
>
>
> On Oct 10, 2020, 13:32, at 13:32, Andrew Lamb <al...@influxdata.com>
> wrote:
> >Hi Mahmut,
> >
> >I wonder if you can describe at a higher level what you are doing that
> >requires so many allocations or rebuildings. The example you provide of
> >modifying the underlying offset pointer seems a little strange to me as
> >I thought one of the architectural goals of those structures was to be
> >immutable.
> >
> >It might also help to show/explain some examples of what types of
> >performance improvements would be enabled.
> >
> >Depending on what exactly you are doing, I wonder if you could use the
> >Rust unsafe API for your advanced use cases rather than having to extend
> >arrow itself.
> >
> >Andrew
> >
> >On Thu, Oct 8, 2020 at 9:10 AM vertexclique vertexclique <
> >vertexcli...@gmail.com> wrote:
> >
> >> Hi;
> >>
> >> Let me start with my aim and how things have evolved in my mind.
> >> Through extensive use of the Arrow API, I've realized that we are doing
> >> many unnecessary allocations and rebuilds for simple things like offset
> >> changes (at least, that's what I am doing).
> >>
> >> That said, it is tough to accept the tradeoff of iterator overhead during
> >> reconstruction, along with the other extra bits that come with ArrayData
> >> and Array construction. The tests are also very long because of the
> >> reconstruction of intermediate results.
> >>
> >> Use case 1: the code below has no effect:
> >>
> >>         std::mem::swap(&mut child_data.offset(), &mut 40);
> >>
> >> Because the fields are private, simple operations like the one above,
> >> which would enable developers to handle advanced cases, are blocked.
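To spell out why the first snippet has no effect: a getter like `offset()` returns a copy of the `usize`, so `&mut child_data.offset()` borrows a temporary and the swap changes only that temporary. A minimal sketch using a hypothetical stand-in struct (`ArrayDataLike` is not the real `ArrayData`) shows that, without field access, the remaining route is rebuilding the value:

```rust
mod arrow_like {
    /// Hypothetical stand-in for an Arrow structure with private fields.
    #[derive(Debug)]
    pub struct ArrayDataLike {
        len: usize,
        offset: usize,
    }

    impl ArrayDataLike {
        pub fn new(len: usize, offset: usize) -> Self {
            Self { len, offset }
        }

        pub fn len(&self) -> usize {
            self.len
        }

        /// Returns a copy of the offset, not a reference to the field.
        pub fn offset(&self) -> usize {
            self.offset
        }
    }
}

use arrow_like::ArrayDataLike;

fn main() {
    let child_data = ArrayDataLike::new(100, 0);

    // Compiles, but swaps two temporaries: the field itself is untouched.
    std::mem::swap(&mut child_data.offset(), &mut 40);
    assert_eq!(child_data.offset(), 0);

    // Without field access, changing the offset means building a new value.
    let rebuilt = ArrayDataLike::new(child_data.len(), 40);
    assert_eq!(rebuilt.offset(), 40);
    println!("{:?} -> {:?}", child_data, rebuilt);
}
```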
> >>
> >> I propose the following:
> >>
> >> A feature-gated macro that exposes fields, enabling this:
> >>
> >>         std::mem::swap(&mut child_data.offset, &mut 40);
> >>
> >> The macro will check a feature called `*exposed*` to enable conditional
> >> compilation of fields. This can apply to anything. We would also put a
> >> disclaimer in the README stating that the exposed API shouldn't be used
> >> unless you know what you are doing.
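A minimal sketch of what such a macro could look like, assuming a Cargo feature named `exposed`; the macro name `exposable_struct!` and the struct `ArrayDataSketch` are hypothetical, not existing Arrow items. The macro emits the struct twice under complementary `#[cfg]` attributes, so the fields are `pub` only when the feature is enabled and private otherwise:

```rust
/// Hypothetical macro: fields become `pub` only when the `exposed` feature is on.
macro_rules! exposable_struct {
    (
        $(#[$meta:meta])*
        pub struct $name:ident {
            $($field:ident : $ty:ty),* $(,)?
        }
    ) => {
        #[cfg(feature = "exposed")]
        $(#[$meta])*
        pub struct $name {
            $(pub $field: $ty),*
        }

        #[cfg(not(feature = "exposed"))]
        $(#[$meta])*
        pub struct $name {
            $($field: $ty),*
        }
    };
}

mod arrow_like {
    exposable_struct! {
        #[derive(Debug)]
        pub struct ArrayDataSketch {
            len: usize,
            offset: usize,
        }
    }

    impl ArrayDataSketch {
        pub fn new(len: usize, offset: usize) -> Self {
            Self { len, offset }
        }

        pub fn len(&self) -> usize {
            self.len
        }

        pub fn offset(&self) -> usize {
            self.offset
        }
    }
}

use arrow_like::ArrayDataSketch;

fn main() {
    #[allow(unused_mut)]
    let mut child_data = ArrayDataSketch::new(100, 0);

    // Only compiled with the `exposed` feature: swap the field in place.
    #[cfg(feature = "exposed")]
    std::mem::swap(&mut child_data.offset, &mut 40);

    // Default build: fields stay private, so only the getters are usable here.
    println!("len = {}, offset = {}", child_data.len(), child_data.offset());
}
```

With Cargo, the feature would be declared in `Cargo.toml` as `[features] exposed = []` and enabled with `cargo build --features exposed`. A default build compiles the private variant, so existing encapsulation is unchanged unless a user opts in.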
> >>
> >> Importantly, this will enable many things from a performance
> >> perspective, which we can also use internally when the exposed feature
> >> is enabled.
> >>
> >> What do you think? If you feel good about it, I would like to
> >> incorporate this into the codebase ASAP.
> >>
> >> Best,
> >> Mahmut Bulut
> >>
>
