I wonder if Memory itself could be that layer?

On Wed, Feb 5, 2020 at 10:03 AM Jad Naous <jad.na...@imply.io> wrote:
> We could build an abstraction layer on top of the memory interface provided
> by DataSketches. When the JDK gets the new stuff, we can just change the
> implementation of the abstraction.
>
> On Wed, Feb 5, 2020 at 9:43 AM Gian Merlino <g...@apache.org> wrote:
>
> > The thing that worries me about JEP 370 is that if historical Java user
> > migration patterns hold up, we will need to support Java 11 for a while
> > (probably another 2–3 years), and we would therefore need to wait that
> > long to use JEP 370. It seems like a long time and until then we would be
> > stuck with a pretty inferior API.
> >
> > I also would prefer not having to rewrite code a bunch of times, but
> > that's why I suggested starting by using Memory for the VectorAggregator
> > interface and stuff that interacts with it. There isn't that much code
> > there yet (only a few aggregators implement VectorAggregator). So we will
> > need to write most of it for the first time, and since it is fresh code,
> > I think it'd be nice to use the best API currently available in Java 8 /
> > 11. From what I see that is Memory.
> >
> > On Wed, Feb 5, 2020 at 9:21 AM Slim Bouguerra <bs...@apache.org> wrote:
> >
> > > Hi Gian,
> > > Thanks for bringing this up.
> > > IMO for the long run, and looking at how much code will have to change,
> > > it makes more sense to rely on the JDK-based API (JEP 370) and have this
> > > work done ONCE as opposed to multiple iterations. FYI I do not think it
> > > is far away; there seems to be good momentum around it.
> > > This does not mean we should not use the Memory API for other stuff
> > > like sketches et al. In fact, I think even for a project like Sketches
> > > it makes more sense to move to the newer API offered by the JDK rather
> > > than do it yourself.
> > >
> > > On Tue, Feb 4, 2020 at 10:12 PM Gian Merlino <g...@apache.org> wrote:
> > >
> > > > Hey Druids,
> > > >
> > > > There has generally been a lot of talk about moving away from
> > > > ByteBuffer and towards the DataSketches Memory package (
> > > > https://datasketches.apache.org/docs/Memory/MemoryPackage.html) or
> > > > even using Unsafe directly. Much of that discussion happened on
> > > > https://github.com/apache/druid/issues/3892.
> > > >
> > > > Recently a patch was merged that added datasketches-memory as a
> > > > dependency of druid-processing:
> > > > https://github.com/apache/druid/pull/9308. The reason was partially
> > > > due to better performance and partially due to nicer API (both
> > > > reasons mentioned in #3892 as well).
> > > >
> > > > JEP 370 is a potential long term solution but it seems a while away
> > > > from being ready: https://openjdk.java.net/jeps/370
> > > >
> > > > I wanted to bring the larger discussion back up and see what people
> > > > think is a good path forward.
> > > >
> > > > My suggestion is that we migrate the VectorAggregator interface to
> > > > use Memory, but keep BufferAggregator the way it is. That way, as we
> > > > build out support for vectorization (right now, only
> > > > timeseries/groupby support it, and only a few aggregators, but we
> > > > should be building this out) we'll be doing it with a nicer and
> > > > potentially faster API. But we won't need to go back and redo a bunch
> > > > of old code, since we'll keep the non-vectorized code paths the way
> > > > they are. (And hopefully, one day, delete them all outright.)
> > > >
> > > > Gian
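
For illustration only, the abstraction-layer idea raised in this thread could look roughly like the sketch below. Every name in it (AggregationRegion, ByteBufferRegion, LongSumExample) is hypothetical and does not exist in Druid or DataSketches. The backing store here is a plain java.nio.ByteBuffer so that the example runs on the JDK alone. The assumption is that a Memory-backed class, or later a JEP 370-backed class, could implement the same interface.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical abstraction; these names are not part of Druid or DataSketches.
interface AggregationRegion {
    long getLong(long offsetBytes);
    void putLong(long offsetBytes, long value);
    long capacity();
}

// One possible backing implementation, using only the JDK.
// A Memory-backed or JEP 370-backed class could implement the same interface.
final class ByteBufferRegion implements AggregationRegion {
    private final ByteBuffer buffer;

    ByteBufferRegion(int capacityBytes) {
        // Heap buffers from allocate() start zero-filled.
        this.buffer = ByteBuffer.allocate(capacityBytes).order(ByteOrder.nativeOrder());
    }

    @Override
    public long getLong(long offsetBytes) {
        // Absolute read; ByteBuffer indexes with int, so narrow with a range check.
        return buffer.getLong(Math.toIntExact(offsetBytes));
    }

    @Override
    public void putLong(long offsetBytes, long value) {
        buffer.putLong(Math.toIntExact(offsetBytes), value);
    }

    @Override
    public long capacity() {
        return buffer.capacity();
    }
}

// Caller code depends only on the interface, so swapping the backing
// implementation would not require touching it.
final class LongSumExample {
    static void add(AggregationRegion region, long position, long value) {
        region.putLong(position, region.getLong(position) + value);
    }
}
```

With this shape, aggregator code would be written once against the interface, and only the implementing class would change when a better memory API becomes available. Whether the extra indirection is acceptable on hot paths is a separate question that this sketch does not address.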