Yes,
I am set for what I need at the moment, but since I took a fairly deep dive into
the current API, and this has been a recurring use case over the year, I would
like to put forward a few proposals for expanding Take:
1. Add support for packed indices - three avenues:
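a) expand Datum.Kind to allow for PackedIndex: a sequence of individual
indices and half-open ranges, as in 3, 7, 1, [4-10), 2, [30-100), which can be
represented as an Array<int>: {3, 7, 1, 4, -10, 2, 30, -100}, where a negative
value is the exclusive end of a range that starts at the preceding value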
b) use an additional flag signaling that the index argument (of any type
fungible to an int sequence) is in fact a packed index represented as above
c) have an explicit convention where, if the type of the index is signed,
it is expected to be a packed index
2. Add explicit control for result chunk size: since the result of Take is
typically (always?) allocated inside the kernel, we can add an argument that
specifies the size of each allocated chunk (in bytes or in rows; I lean toward
rows), and that can be applied to any Datum type of the values, not only
ChunkedArray.
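
To make the two proposals a bit more concrete, here is a rough sketch in plain
Python (not Arrow code) of how a packed index could be decoded and how a
row-based chunk size could shape the result. The names expand_packed_index,
take_in_chunks and chunk_rows are hypothetical and only for illustration, and
the decoding rule (a negative entry closes a range opened by the entry before
it) is my reading of the encoding in 1a, not an existing API.

```python
from typing import Iterator, List, Sequence


def expand_packed_index(packed: Sequence[int]) -> Iterator[int]:
    """Yield the individual row indices encoded by a packed index.

    Assumed encoding: a non-negative entry is a single index, unless the
    entry after it is negative, in which case the pair (start, -end)
    stands for the half-open range [start, end).
    """
    i = 0
    while i < len(packed):
        start = packed[i]
        if start < 0:
            # A negative entry is only valid right after a range start.
            raise ValueError(f"range end {start} at position {i} has no start")
        if i + 1 < len(packed) and packed[i + 1] < 0:
            yield from range(start, -packed[i + 1])
            i += 2
        else:
            yield start
            i += 1


def take_in_chunks(values: Sequence, packed: Sequence[int],
                   chunk_rows: int) -> List[list]:
    """Gather the selected rows into chunks of at most chunk_rows rows."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    chunks: List[list] = []
    current: list = []
    for idx in expand_packed_index(packed):
        current.append(values[idx])
        if len(current) == chunk_rows:
            # The current chunk is full, so start a new one.
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks
```

With the example from 1a, {3, 7, 1, 4, -10, 2, 30, -100} expands to 80 row
indices, so take_in_chunks with chunk_rows=32 would return three chunks of 32,
32 and 16 rows. One limitation of this encoding is that a range ending at 0
cannot be written, which is harmless since such a range would be empty anyway.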
What’s the best way to push this forward? Free discussion, votes, tickets? I am
happy to work on the actual solution once we agree on one (all of the above
should be fairly straightforward).
Cheers
Radu
> On Jun 27, 2020, at 6:23 PM, Wes McKinney <[email protected]> wrote:
>
> Efficiently assembling a selection from multiple arrays will require
> some care -- our current implementation of Take involving ChunkedArray
> arguments is not too efficient, and they will need some rewriting for
> efficiency at some point in the future. Using some combination of
> Concatenate and Take may yield a working solution but probably not a
> computationally optimal one
>
> On Fri, Jun 26, 2020 at 3:07 PM Antoine Pitrou <[email protected]> wrote:
>>
>> On Fri, 26 Jun 2020 13:56:26 -0400
>> Radu Teodorescu <[email protected]> wrote:
>>> Looks like Concatenate is my best bet if I am looking at putting together
>>> ranges, certainly doesn’t look as neatly packaged as Take, but this might
>>> be the right tool for this job.
>>
>> Yes, you could Slice the array and then Concatenate the slices.
>> Note that slicing will keep the entire buffers alive, not only the
>> range that's being sliced, so it might be suboptimal if you only
>> keep a small part of the original values.
>>
>> Regards
>>
>> Antoine.
>>
>>