Hi Andrew,

> I wonder if you can describe at a higher level what you are doing that
> requires so many allocations or rebuildings. The example you provide of
> modifying the underlying offset pointer seems a little strange to me as I
> thought one of the architectural goals of those structures was to be
> immutable.

Sure. The vectorized processing kernels I use need to rebuild buffers
continuously. Many intermediate arrays are destroyed as soon as I am done with
them, so there is no need to construct them as full arrays in the first place.
For my use case, immutability shouldn't come with extra cost, and that is
exactly how it works in most database systems.

The architecture's goal is immutable structs, that is for sure. But in some
cases you don't need immutability for intermediate results. Moreover,
allocating the data once and working on it later is the right approach in
some cases.
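
A minimal sketch of the "allocate once, work on it later" idea, assuming a
hypothetical kernel named `add_scalar_into` that writes into a caller-owned
scratch buffer. Nothing here is arrow's API; it only illustrates reusing one
allocation for intermediate results instead of building a new array per call:

```rust
/// Hypothetical kernel (not part of arrow): adds `rhs` to every element of
/// `input` and writes the result into `scratch`, reusing its allocation.
pub fn add_scalar_into(input: &[i64], rhs: i64, scratch: &mut Vec<i64>) {
    // `clear` drops the old contents but keeps the capacity, so no new
    // allocation happens as long as the input fits in the existing buffer.
    scratch.clear();
    scratch.extend(input.iter().map(|v| v + rhs));
}
```

The caller allocates `scratch` once (for example with `Vec::with_capacity`)
and passes it to every kernel invocation; the intermediate result is simply
overwritten on the next call.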


> It might also help to show/explain some examples of what types of
> performance improvements would be enabled.

It is hard to show in a single email, but I will try my best to explain. A
few of the optimizations this enables: doing CAS on processed intermediate
results, hoisting operations for vectors of the same type, keeping hot and
cold arrays, migrations, and scratch pads.

> Depending on what exactly you are doing, I wonder if you could use the Rust
> unsafe API for your advanced use-cases rather than having to extend arrow
> itself.

To use unsafe, you need to be able to access the pointer. In the example I
sent before, nothing enables that. That is what the exposed API is all about:
a feature gate that makes buffer fields public on demand.
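
To make the difference concrete, here is a small sketch of the
`std::mem::swap` example from the earlier mail. The two `ChildData` types are
stand-ins invented for illustration (they are not arrow's types): one hides
`offset` behind a getter, the other has the field public, as it would be with
the proposed gate enabled.

```rust
mod sealed {
    /// Stand-in with a private offset, reachable only through a getter.
    pub struct ChildData {
        offset: usize,
    }

    impl ChildData {
        pub fn new(offset: usize) -> Self {
            Self { offset }
        }

        /// Returns a copy of the offset, not a reference to the field.
        pub fn offset(&self) -> usize {
            self.offset
        }
    }
}

mod exposed {
    /// Stand-in with the field made public (the "exposed" variant).
    pub struct ChildData {
        pub offset: usize,
    }
}

/// Swaps against the getter's return value. This compiles, but only a
/// temporary copy is modified, so the stored offset stays the same.
pub fn swap_through_getter(data: &sealed::ChildData, mut new_offset: usize) -> usize {
    std::mem::swap(&mut data.offset(), &mut new_offset);
    data.offset()
}

/// Swaps against the field itself, which really updates the offset.
pub fn swap_through_field(data: &mut exposed::ChildData, mut new_offset: usize) -> usize {
    std::mem::swap(&mut data.offset, &mut new_offset);
    data.offset
}
```

With only the getter available, there is no place from which safe or unsafe
code could obtain a mutable pointer to the field.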

If we don't make them public, people need to copy the codebase and make things
public (or transmute) themselves, then create new types to add methods that do
what I have explained. This is much more cumbersome than exposing pointers to
the outside behind feature gates.

Since exposing the encapsulated fields will always be feature gated and
disabled by default, there is no harm to the current immutability and
encapsulation.
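
One possible shape for such a gate, as a sketch only: the macro name
`exposed_struct!` and the `ChildData` type below are hypothetical, and the
feature name `exposed` is taken from the proposal. The macro emits the struct
twice, with public fields when the feature is on and private fields otherwise:

```rust
// Hypothetical macro (not in arrow): generates the same struct under two
// cfg branches so that field visibility depends on the `exposed` feature.
macro_rules! exposed_struct {
    (
        $(#[$meta:meta])*
        pub struct $name:ident { $($field:ident : $ty:ty),* $(,)? }
    ) => {
        // Feature enabled: every field becomes `pub`.
        #[cfg(feature = "exposed")]
        $(#[$meta])*
        pub struct $name { $(pub $field: $ty),* }

        // Default build: fields stay private and encapsulated.
        #[cfg(not(feature = "exposed"))]
        $(#[$meta])*
        pub struct $name { $($field: $ty),* }
    };
}

exposed_struct! {
    #[derive(Debug, Clone, PartialEq)]
    pub struct ChildData {
        offset: usize,
        len: usize,
    }
}

impl ChildData {
    pub fn new(offset: usize, len: usize) -> Self {
        Self { offset, len }
    }

    /// Read-only accessor, available in both configurations.
    pub fn offset(&self) -> usize {
        self.offset
    }

    /// Read-only accessor, available in both configurations.
    pub fn len(&self) -> usize {
        self.len
    }
}
```

In a default build nothing changes for users: only the accessors are
reachable from outside the defining module.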

Best,
Mahmut


On Oct 10, 2020, at 13:32, Andrew Lamb <al...@influxdata.com> wrote:
>Hi Mahmut,
>
>I wonder if you can describe at a higher level what you are doing that
>requires so many allocations or rebuildings. The example you provide of
>modifying the underlying offset pointer seems a little strange to me as I
>thought one of the architectural goals of those structures was to be
>immutable.
>
>It might also help to show/explain some examples of what types of
>performance improvements would be enabled.
>
>Depending on what exactly you are doing, I wonder if you could use the Rust
>unsafe API for your advanced usecases rather than having to extend arrow
>itself.
>
>Andrew
>
>On Thu, Oct 8, 2020 at 9:10 AM vertexclique vertexclique <
>vertexcli...@gmail.com> wrote:
>
>> Hi;
>>
>> Let me start with my aim and how things have evolved in my mind.
>> Through extensive usage of the Arrow API, I've realized that we are doing
>> so many unnecessary allocations and rebuilds for simple things like
>> offset changes. (At least that's what I am doing.)
>>
>> That said, it is tough to make the tradeoff between iterator overhead in
>> reconstruction and the other extra bits that come with ArrayData and
>> Array construction. I see that tests are also very long because of the
>> reconstruction of the intermediate results.
>>
>> Use case 1: the code below has no effect:
>>
>>         std::mem::swap(&mut child_data.offset(), &mut 40);
>>
>> Because the fields are private, a simple operation like the one above,
>> which would enable advanced use cases for the developer, is blocked.
>>
>> I propose the following:
>>
>> A feature gate macro exposes the fields so that this becomes possible:
>>
>>         std::mem::swap(&mut child_data.offset, &mut 40);
>>
>> The macro will check a feature called `exposed` to enable conditional
>> compilation for fields.
>> This can be for anything. That said, we put a disclaimer in the README
>> about the exposed API saying that it shouldn't be used unless you know
>> what you are doing.
>>
>> An important part of this is that it will enable many things from the
>> performance perspective, which we can also use internally when the
>> exposed feature is enabled.
>>
>> What do you think of it? If you feel good about it, I want to incorporate
>> this into the codebase asap.
>>
>> Best,
>> Mahmut Bulut
>>
