Thanks for the response, Mahmut. I don't think I have a lot more to add.
On Sat, Oct 10, 2020 at 8:18 AM Vertexclique <vertexcli...@gmail.com> wrote:

> Hi Andrew,
>
> > I wonder if you can describe at a higher level what you are doing that
> > requires so many allocations or rebuildings. The example you provide of
> > modifying the underlying offset pointer seems a little strange to me as I
> > thought one of the architectural goals of those structures was to be
> > immutable.
>
> Sure. Vectorized processing kernels that I am using need a rebuild of
> buffers continuously. Various intermediate arrays are destroyed after I am
> done with it, which doesn't need to build intermediate arrays. For my case
> immutability shouldn't come with extra cost. And for most of the database
> systems, that is exactly how it is.
>
> Architecture's goal is immutable structs. That is for sure. But, for some
> cases, you don't need immutability for intermediate results. Moreover,
> allocating immutable data once to working on it later is the right approach
> for some cases.
>
> > It might also help to show/explain some examples of what types of
> > performance improvements would be enabled.
>
> It is hard to show in a single email here. But I will try my best to
> explain; doing CAS for processed intermediate results and hoist operations
> for the same type of vectors, having hot and cold arrays, migrations,
> scratch pads are a couple of optimizations for it.
>
> > Depending on what exactly you are doing, I wonder if you could use the
> > Rust unsafe API for your advanced use-cases rather than having to extend
> > arrow itself.
>
> For using unsafe, you need to be able to access to the pointer. In the
> example I have sent before, there is nothing that enables it. That is what
> exposed API is all about. With a feature gate making buffer fields public
> at will.
>
> If we don't make it public, people need to copy codebase and make things
> public (or transmute) at their will and create new types to add their
> methods to do what I have explained. This is much more cumbersome than
> exposing pointers to outside with gates.
>
> Since encapsulation exposure will always be feature gated and disabled by
> default, there is no harm to current immutability and encapsulation.
>
> Best,
> Mahmut
>
> On Oct 10, 2020, 13:32, at 13:32, Andrew Lamb <al...@influxdata.com> wrote:
> > Hi Mahmut,
> >
> > I wonder if you can describe at a higher level what you are doing that
> > requires so many allocations or rebuildings. The example you provide of
> > modifying the underlying offset pointer seems a little strange to me as I
> > thought one of the architectural goals of those structures was to be
> > immutable.
> >
> > It might also help to show/explain some examples of what types of
> > performance improvements would be enabled.
> >
> > Depending on what exactly you are doing, I wonder if you could use the
> > Rust unsafe API for your advanced usecases rather than having to extend
> > arrow itself.
> >
> > Andrew
> >
> > On Thu, Oct 8, 2020 at 9:10 AM vertexclique vertexclique <vertexcli...@gmail.com> wrote:
> >
> > > Hi;
> > >
> > > Let me start with my aim and how things are evolved in my mind.
> > > Through extensive usage of Arrow API, I've realized that we are doing so
> > > many unnecessary allocations and rebuilding for simple things like offset
> > > changes. (At least that's what I am doing).
> > >
> > > That said, it is tough to make the tradeoff of iterator overhead in
> > > reconstruction, and other extra bits come with the ArrayData and Array
> > > construction. I see that tests are also so long because of the
> > > reconstruction of the intermediate results.
> > >
> > > Use case 1, below code won't do something:
> > >
> > >     std::mem::swap(&mut child_data.offset(), &mut 40);
> > >
> > > Due to private fields, such as the simple operation mentioned above, that
> > > will enable the developer for advanced cases, is blocked.
> > >
> > > I propose the following:
> > >
> > > There is a feature gate macro that exposes fields to enable doing this:
> > >
> > >     std::mem::swap(&mut child_data.offset, &mut 40);
> > >
> > > Macro will check the feature called `*exposed*` to enable conditional
> > > compilation for fields.
> > > This can be for anything. That said, we put a disclaimer in the README
> > > about the exposed API that it shouldn't be used unless you know what you
> > > are doing.
> > >
> > > An important part of this, that it will enable so many things from the
> > > performance perspective. Which we can also internally use when the exposed
> > > feature is enabled.
> > >
> > > What do you think of it? If you feel good about it, I want to incorporate
> > > this into the codebase asap.
> > >
> > > Best,
> > > Mahmut Bulut
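
For anyone following the thread, here is a minimal sketch of how the proposed gate could look. It is not code from the arrow crate: `ChildData`, `exposed_struct!`, `swap_via_accessor` and `swap_via_field` are hypothetical names, and the only thing taken from the proposal is the Cargo feature name `exposed`. It also shows why "use case 1" has no effect: the accessor returns a copy, so the swap only touches a temporary.

```rust
// Hypothetical sketch; none of these names come from the arrow crate.
pub mod array {
    /// Declares a struct whose fields are `pub` only when the assumed
    /// Cargo feature `exposed` is enabled, and private otherwise.
    macro_rules! exposed_struct {
        ($(#[$meta:meta])* pub struct $name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
            $(#[$meta])*
            pub struct $name {
                $(
                    #[cfg(feature = "exposed")]
                    pub $field: $ty,
                    #[cfg(not(feature = "exposed"))]
                    $field: $ty,
                )*
            }
        };
    }

    exposed_struct! {
        #[derive(Debug)]
        pub struct ChildData {
            offset: usize,
            len: usize,
        }
    }

    impl ChildData {
        pub fn new(offset: usize, len: usize) -> Self {
            ChildData { offset, len }
        }

        /// Returns a copy of the offset, not a reference to the field.
        pub fn offset(&self) -> usize {
            self.offset
        }

        pub fn len(&self) -> usize {
            self.len
        }
    }
}

/// "Use case 1": the swap operates on a temporary copy returned by the
/// accessor, so the stored offset is left unchanged.
pub fn swap_via_accessor(data: &array::ChildData) -> usize {
    std::mem::swap(&mut data.offset(), &mut 40);
    data.offset()
}

/// Only compiled with `--features exposed`: the field is public, so the
/// swap writes through to the struct itself.
#[cfg(feature = "exposed")]
pub fn swap_via_field(data: &mut array::ChildData) -> usize {
    std::mem::swap(&mut data.offset, &mut 40);
    data.offset()
}
```

With the feature disabled (the default), code outside the `array` module cannot name `data.offset` at all, so existing encapsulation is untouched. Enabling `exposed` is an explicit opt-in by the downstream crate.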