Re: Computing multiple different aggregations over a match-set in one pass

2023-09-09 Thread Stefan Vodita
Hi everyone, I ended up using the idea of doing multiple aggregations in one go and it was a nice improvement. Maybe we can reconsider introducing this? I've opened an issue [1] and published a PR [2] based on the code I had previously shared, with some extra tests and a few improvements. Stefan

Re: Computing multiple different aggregations over a match-set in one pass

2023-03-06 Thread Greg Miller
Hi Stefan- I opened https://github.com/apache/lucene/issues/12190 where we can discuss this further. Thanks for raising the idea! Cheers, -Greg

Re: Computing multiple different aggregations over a match-set in one pass

2023-03-06 Thread Stefan Vodita
Hi Greg, The PR looks great. I think it's a useful feature to have and it helps with the use-case we were discussing. I left a comment with some other ideas that I'd like to explore. Thank you for coding this up, Stefan

Re: Computing multiple different aggregations over a match-set in one pass

2023-03-05 Thread Greg Miller
Hi Stefan- I cobbled together a draft PR that I _think_ is what you're looking for so we can have something to talk about. Please let me know if this misses the mark, or is what you had in mind. If so, we could open an issue to propose the idea of adding something like this. I'm not totally convin

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-24 Thread Stefan Vodita
Hi everyone, Greg and I discussed a bit offline. His assessment was right - I’m not looking to compute multiple values per ordinal as an end in itself. That is only a means to compute a single value which depends on other facet results. This section from the previous email explains it really well:

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-23 Thread Greg Miller
Thanks for the detailed benchmarking, Stefan! I think you've demonstrated here that looping over the collected hits multiple times does in fact add meaningful overhead. That's interesting to see! As for whether or not to add functionality to the facets module that supports this, I'm not convinced a

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-17 Thread Stefan Vodita
After benchmarking my implementation against the existing one, I think there is some meaningful overhead. I built a small driver [1] that runs the two solutions over a geo data [2] index (thank you Greg for providing the indexing code!). The table below lists faceting times in milliseconds. I’ve n

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-17 Thread Greg Miller
Thanks for the follow-up, Stefan. If you find significant overhead associated with the multiple iterations, please keep challenging the current approach and suggest improvements. It's always good to revisit these things! Cheers, -Greg

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-16 Thread Stefan Vodita
Hi Greg, To better understand how much work gets duplicated, I went ahead and modified FloatTaxonomyFacets as an example [1]. It doesn't look too pretty, but it illustrates how I think multiple aggregations in one iteration could work. Overall, you're right, there's not as much wasted work as I h
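As a rough illustration of the idea discussed in this message (several aggregations computed in one iteration over the match-set), here is a minimal sketch in plain Java. It does not use the Lucene facets API and is not the code from the linked example: the class `OnePassAggregation`, the method `aggregateOnce`, and the flat `docOrdinals` / `docValues` arrays are all hypothetical stand-ins for a matched-document iterator, a facet ordinal per document, and a DocValues number per document.

```java
import java.util.Arrays;

// Hypothetical sketch: not Lucene code, just the shape of a one-pass aggregation.
public class OnePassAggregation {

    /** Per-ordinal results for two aggregations computed together. */
    static final class Result {
        final float[] sum;
        final float[] max;

        Result(int ordinalCount) {
            sum = new float[ordinalCount];
            max = new float[ordinalCount];
            // Start max at -infinity so any real value replaces it.
            Arrays.fill(max, Float.NEGATIVE_INFINITY);
        }
    }

    /**
     * Visits each matching document once and updates both aggregations.
     * docOrdinals[i] is the (assumed) facet ordinal of the i-th matching doc,
     * docValues[i] is the (assumed) numeric value being aggregated for that doc.
     */
    static Result aggregateOnce(int[] docOrdinals, float[] docValues, int ordinalCount) {
        Result result = new Result(ordinalCount);
        for (int doc = 0; doc < docOrdinals.length; doc++) {
            int ord = docOrdinals[doc];
            float value = docValues[doc];
            result.sum[ord] += value;                           // aggregation 1: sum
            result.max[ord] = Math.max(result.max[ord], value); // aggregation 2: max
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ordinals = {0, 1, 0, 2, 1};
        float[] values = {10f, 5f, 2.5f, 7f, 1f};
        Result result = aggregateOnce(ordinals, values, 3);
        System.out.println("sum per ordinal: " + Arrays.toString(result.sum));
        System.out.println("max per ordinal: " + Arrays.toString(result.max));
    }
}
```

The alternative under discussion would run the loop twice, once per aggregation, repeating the ordinal and value lookups for every matching document; whether that repetition is costly in practice is what the benchmarking later in the thread looks at.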

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-15 Thread Greg Miller
Hi Stefan- > In that case, iterating twice duplicates most of the work, correct? I'm not sure I'd agree that it duplicates "most" of the work. This is an association faceting example, which is a little bit of a special case in some ways. But, to your question, there is duplicated work here of re-

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-14 Thread Stefan Vodita
Hi Greg, I see now where my example didn’t give enough info. In my mind, `Genre / Author nationality / Author name` is stored in one hierarchical facet field. The data we’re aggregating over, like publish date or price, are stored in DocValues. The demo package shows something similar [1], where

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-13 Thread Greg Miller
Hi Stefan- That helps, thanks. I'm a bit confused about where you're concerned with iterating over the match set multiple times. Is this a situation where the ordinals you want to facet over are stored in different index fields, so you have to create multiple Facets instances (one per field) to co

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-11 Thread Stefan Vodita
Hi Greg, I’m assuming we have one match-set which was not constrained by any of the categories we want to aggregate over, so it may have books by Mark Twain, books by American authors, and sci-fi books. Maybe we can imagine we obtained it by searching for a keyword, say “Washington”, which is pre

Re: Computing multiple different aggregations over a match-set in one pass

2023-02-10 Thread Greg Miller
Hi Stefan- Can you clarify your example a little bit? It sounds like you want to facet over three different match sets (one constrained by "Mark Twain" as the author, one constrained by "American authors" and one constrained by the "sci-fi" genre). Is that correct? Cheers, -Greg