I have been studying JEP370 and indeed it looks promising. However, it is only just completed JEP stage and has not been reviewed by the JCP for possible creation of a JSR. Even assuming it flows smoothly through the JCP and JSR it is at least 2-3 years away if not longer.
Another concern I have is that this current proposal appears quite cumbersome. Creating a Memory Segment is easy enough, but to read and write into that segment appears to require declarations of every different object type and their offsets using VarHandles. Suppose you need to read and write a C-like "struct" consisting of tightly packed ints, longs, bytes, shorts, doubles, etc. Just the declarations of that structure would take pages of boiler-plate code. Now, suppose you have a dynamic struct that changes over time and evolves (This is exactly the situation we have with sketches)! Yikes! It is possible that the proposed "MemoryLayout" API may address this, but the current JEP is rather vague about how flexible it can be. I think that there is also a real risk that even if we wait for JEP370 to become reality, it still may not provide the flexibility that we need. We just don't know yet what form it will actually take. If JEP370 does actually appear, and is workable, we will be strongly motivated to replace our current use of Unsafe inside Memory with the newer API, and all of that could be behind the current Memory API. On 2020/02/06 20:33:18, Gian Merlino <g...@apache.org> wrote: > By the way, I just did a quick test of using Memory instead of ByteBuffer > in VectorAggregator with groupBy (hash grouping, 8-byte key, two > aggregators) and measured a 14% speedup. I think the speedup would be > higher with more aggregators or with array-based rather than hash-based > grouping (since in those cases, relatively more time is spent doing > aggregations). > > On Thu, Feb 6, 2020 at 11:32 AM Gian Merlino <g...@apache.org> wrote: > > > We could make an interface that is a subset of Memory and use that. But I > > think once the new JEP 370 stuff becomes mainstream it would be best to use > > it directly, since it offers a richer API and will probably become > > considered idiomatic Java over time. So we may still want to drop the > > abstraction later, once we drop support for Java pre-14. > > > > Separately, I think if we do build an abstraction layer here, we need to > > make sure the performance overhead is zero — it's important that the jvm be > > able to inline the underlying calls. > > > > > @Gian Merlino I think i am not 100% sure about the scope you are talking > > about, thought you are talking about > > https://github.com/apache/druid/issues/3892 and doing the work from > > bottom up (storage to processing ...) > > > > For now I'm only proposing using a non-BB API (whether it's Memory or our > > own abstraction) for the VectorAggregator interface and the query engines > > that use it. I don't see a strong motivation to tackle the storage layer at > > this time. The reason is that when I profile Druid queries I usually see > > the lions share of time being spend in query engines and aggregators, so I > > think that's what we need to tackle first from a performance perspective. > > > > Btw, a side note, this talk is great: > > https://www.youtube.com/watch?v=iwSCtxMbBLI > > > > On Wed, Feb 5, 2020 at 6:44 PM Slim Bouguerra <slim.bougue...@gmail.com> > > wrote: > > > >> @Jad Sounds good approach was thinking the same as well how can we expose > >> those using higher level API. > >> @Gian Merlino <g...@apache.org> I think i am not 100% sure about the > >> scope you are talking about, thought you are talking about > >> https://github.com/apache/druid/issues/3892 and doing the work from > >> bottom up (storage to processing ...) > >> Can you please highlight the scope and would love to help if you think > >> adding such abstraction will add an extra work overhead that you do want to > >> avoid. > >> > >> > >> On Wed, Feb 5, 2020 at 5:52 PM Jad Naous <jad.na...@imply.io> wrote: > >> > >>> But then you could never get rid of it... Ideally abstraction layers > >>> should > >>> be constrained to the actual uses within the system using them instead of > >>> making something that may be too general and would be hard to replace. > >>> > >>> On Wed, Feb 5, 2020, 4:03 PM Gian Merlino <g...@apache.org> wrote: > >>> > >>> > I wonder if Memory itself could be that layer? > >>> > > >>> > On Wed, Feb 5, 2020 at 10:03 AM Jad Naous <jad.na...@imply.io> wrote: > >>> > > >>> > > We could build an abstraction layer on top of the memory interface > >>> > provided > >>> > > by DataSketches. When the JDK gets the new stuff, we can just change > >>> the > >>> > > implementation of the abstraction. > >>> > > > >>> > > On Wed, Feb 5, 2020 at 9:43 AM Gian Merlino <g...@apache.org> wrote: > >>> > > > >>> > > > The thing that worries me about JEP 370 is that if historical Java > >>> user > >>> > > > migration patterns hold up, we will need to support Java 11 for a > >>> while > >>> > > > (probably another 2–3 years), and we would therefore need to wait > >>> that > >>> > > long > >>> > > > to use JEP 370. It seems like a long time and until then we would > >>> be > >>> > > stuck > >>> > > > with a pretty inferior API. > >>> > > > > >>> > > > I also would prefer not having to rewrite code a bunch of times, > >>> but > >>> > > that's > >>> > > > why I suggested starting by using Memory for the VectorAggregator > >>> > > interface > >>> > > > and stuff that interacts with it. There isn't that much code there > >>> yet > >>> > > > (only a few aggregators implement VectorAggregator). So we will > >>> need to > >>> > > > write most of it for the first time, and since it is fresh code, I > >>> > think > >>> > > > it'd be nice to use the best API currently available in Java 8 / > >>> 11. > >>> > From > >>> > > > what I see that is Memory. > >>> > > > > >>> > > > On Wed, Feb 5, 2020 at 9:21 AM Slim Bouguerra <bs...@apache.org> > >>> > wrote: > >>> > > > > >>> > > > > Hi Gian, > >>> > > > > Thanks for bringing this up. > >>> > > > > IMO for the long run and looking at how much code will have to > >>> > change, > >>> > > it > >>> > > > > makes more sense to rely on JDK based API JEP 370 and have this > >>> work > >>> > > done > >>> > > > > ONCE as oppose to multiple iteration. FYI i do not think it is > >>> far > >>> > > away, > >>> > > > > seems like there is a good momentum around it. > >>> > > > > This does not exclude or means we should not use Memory API for > >>> other > >>> > > > stuff > >>> > > > > like sketches et al, in fact i think even for project like > >>> Sketches > >>> > it > >>> > > > > makes more sense to move to newer API offered by the JDK rather > >>> that > >>> > do > >>> > > > it > >>> > > > > your self. > >>> > > > > > >>> > > > > > >>> > > > > On Tue, Feb 4, 2020 at 10:12 PM Gian Merlino <g...@apache.org> > >>> > wrote: > >>> > > > > > >>> > > > > > Hey Druids, > >>> > > > > > > >>> > > > > > There has generally been a lot of talk about moving away from > >>> > > > ByteBuffer > >>> > > > > > and towards the DataSketches Memory package ( > >>> > > > > > https://datasketches.apache.org/docs/Memory/MemoryPackage.html) > >>> or > >>> > > > even > >>> > > > > > using Unsafe directly. Much of that discussion happened on > >>> > > > > > https://github.com/apache/druid/issues/3892. > >>> > > > > > > >>> > > > > > Recently a patch was merged that added datasketches-memory as a > >>> > > > > dependency > >>> > > > > > of druid-processing: https://github.com/apache/druid/pull/9308 > >>> . > >>> > The > >>> > > > > reason > >>> > > > > > was partially due to better performance and partially due to > >>> nicer > >>> > > API > >>> > > > > > (both reasons mentioned in #3892 as well). > >>> > > > > > > >>> > > > > > JEP 370 is a potential long term solution but it seems a while > >>> away > >>> > > > from > >>> > > > > > being ready: https://openjdk.java.net/jeps/370 > >>> > > > > > > >>> > > > > > I wanted to bring the larger discussion back up and see what > >>> people > >>> > > > think > >>> > > > > > is a good path forward. > >>> > > > > > > >>> > > > > > My suggestion is that we migrate the VectorAggregator > >>> interface to > >>> > > use > >>> > > > > > Memory, but keep BufferAggregator the way it is. That way, as > >>> we > >>> > > build > >>> > > > > out > >>> > > > > > support for vectorization (right now, only timeseries/groupby > >>> > support > >>> > > > it, > >>> > > > > > and only a few aggregators, but we should be building this out) > >>> > we'll > >>> > > > be > >>> > > > > > doing it with a nicer and potentially faster API. But we won't > >>> need > >>> > > to > >>> > > > go > >>> > > > > > back and redo a bunch of old code, since we'll keep the > >>> > > non-vectorized > >>> > > > > code > >>> > > > > > paths the way they are. (And hopefully, one day, delete them > >>> all > >>> > > > > outright.) > >>> > > > > > > >>> > > > > > Gian > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >> > >> > >> -- > >> > >> B-Slim > >> _______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______/\/\/\_______ > >> > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org For additional commands, e-mail: dev-h...@druid.apache.org