On Mon, Jun 29, 2020 at 9:33 AM Radu Teodorescu
<radukay...@yahoo.com.invalid> wrote:
>
> Yes,
> I am set for what I need at the moment, but since I went for a deepish dive
> into the current API, and this has been a recurring use case over the years, I
> would extend a few proposals for expanding Take:
> 1. Add support for packed indices - three avenues:
>    a) expand Datum.Kind to allow for PackedIndex: a sequence of
>       individual indices and ranges, as in 3,7,1,[4-10),2,[30-100), which can be
>       represented as an Array<int>: { 3,7,1,4,-10,2,30,-100 }
>    b) use an additional flag signaling that the index argument (of any type
>       fungible to an int sequence) is in fact a packed index represented as above
>    c) have an explicit convention where, if the type of the index is
>       signed, it is expected to be a packed index
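[Editorial sketch of the packed representation in proposal 1a. This is not Arrow code: the function name ExpandPackedIndex is hypothetical, and the decoding rule is an assumption read off the example above, namely that a negative value -e closes the half-open range [s, e) whose start s is the value immediately before it.]

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of Arrow: expand a packed index such as
// {3, 7, 1, 4, -10, 2, 30, -100} into plain indices.
// Assumed convention: a negative value -e closes the half-open range [s, e)
// whose start s is the non-negative value immediately before it.
std::vector<int64_t> ExpandPackedIndex(const std::vector<int64_t>& packed) {
  std::vector<int64_t> out;
  for (std::size_t i = 0; i < packed.size(); ++i) {
    const int64_t start = packed[i];
    if (start < 0) {
      // A negative value is only valid directly after a range start.
      throw std::invalid_argument("range end without a preceding start");
    }
    if (i + 1 < packed.size() && packed[i + 1] < 0) {
      const int64_t end = -packed[i + 1];  // exclusive upper bound
      for (int64_t v = start; v < end; ++v) out.push_back(v);
      ++i;  // skip the range-end marker just consumed
    } else {
      out.push_back(start);  // a lone index
    }
  }
  return out;
}
```

[Under that assumed convention, the eight packed values in the example expand to 80 plain indices: 3, 7, 1, then 4 through 9, then 2, then 30 through 99.]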

This sounds like a new function altogether, not an expansion of the
existing one. I don't think we should overload the existing
compute::Take function nor add new types to Datum. I am not sure it
needs to fit within the algebra of the kernels framework at the
moment, but this can always be changed later if it makes sense. Much
easier to add things than take them away.

> 2. Add explicit control for result chunk size: Since the result of Take is
> typically (always?) allocated inside the kernel, we can add an argument that
> specifies the size of each allocated chunk (in bytes or in rows - I lean
> toward rows), and that can be applied to any Datum type of the values, not only
> ChunkedArray.

This can be handled by options to the new function.

> What’s the best way to push this forward? Free discussion, votes, tickets? I
> am happy to work on the actual solution once we agree on one (all of the
> above should be fairly straightforward).
> Cheers
> Radu
>
>
> > On Jun 27, 2020, at 6:23 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > Efficiently assembling a selection from multiple arrays will require
> > some care -- our current implementation of Take involving ChunkedArray
> > arguments is not too efficient, and they will need some rewriting for
> > efficiency at some point in the future. Using some combination of
> > Concatenate and Take may yield a working solution but probably not a
> > computationally optimal one
> >
> > On Fri, Jun 26, 2020 at 3:07 PM Antoine Pitrou <solip...@pitrou.net> wrote:
> >>
> >> On Fri, 26 Jun 2020 13:56:26 -0400
> >> Radu Teodorescu <radukay...@yahoo.com.INVALID> wrote:
> >>> Looks like Concatenate is my best bet if I am looking at putting together
> >>> ranges, certainly doesn’t look as neatly packaged as Take, but this might
> >>> be the right tool for this job.
> >>
> >> Yes, you could Slice the array and then Concatenate the slices.
> >> Note that slicing will keep the entire buffers alive, not only the
> >> range that's being sliced, so it might be suboptimal if you only
> >> keep a small part of the original values.
> >>
> >> Regards
> >>
> >> Antoine.
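
[Editorial sketch of the Slice-then-Concatenate approach and the buffer-lifetime caveat above. This is a toy model on std::vector, not Arrow's API: SliceView, MakeSlice and ConcatenateSlices are hypothetical names, and std::shared_ptr stands in for Arrow's reference-counted buffers.]

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<int64_t>;

// Toy stand-in for a sliced array: a zero-copy view that shares ownership
// of the whole parent buffer, however small the viewed range is.
struct SliceView {
  std::shared_ptr<const Buffer> buffer;
  std::size_t offset;
  std::size_t length;
};

// Hypothetical helper. The caller must ensure offset + length <= buffer->size().
SliceView MakeSlice(std::shared_ptr<const Buffer> buffer, std::size_t offset,
                    std::size_t length) {
  return SliceView{std::move(buffer), offset, length};
}

// Copies only the viewed ranges into a freshly allocated buffer, so the
// result no longer depends on the parent buffer.
Buffer ConcatenateSlices(const std::vector<SliceView>& slices) {
  Buffer out;
  for (const SliceView& s : slices) {
    const auto first = s.buffer->begin() + static_cast<std::ptrdiff_t>(s.offset);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(s.length));
  }
  return out;
}
```

[In this model, every SliceView holds a reference to the full parent buffer, so the parent cannot be freed while any slice exists. Only the concatenated copy is independent of it, which is the trade-off described in the note above.]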