Re: Optimizing term-occurrence counting (code included)

2020-09-21 Thread Michael McCandless
I left a comment on the issue. Mike McCandless http://blog.mikemccandless.com On Sun, Sep 20, 2020 at 1:08 PM Alex K wrote: > Hi all, I'm still a bit stuck on this particular issue. I posted an issue on > the Elastiknn repo outlining some measurements and thoughts on potential > solutions: htt

Re: Optimizing term-occurrence counting (code included)

2020-09-20 Thread Alex K
Hi all, I'm still a bit stuck on this particular issue. I posted an issue on the Elastiknn repo outlining some measurements and thoughts on potential solutions: https://github.com/alexklibisz/elastiknn/issues/160 To restate the question: Is there a known optimal way to find and count docs matching

Re: Optimizing term-occurrence counting (code included)

2020-07-24 Thread Alex K
Thanks Ali. I don't think that will work in this case, since the data I'm counting is managed by Lucene, but that looks like an interesting project. -Alex On Fri, Jul 24, 2020, 00:15 Ali Akhtar wrote: > I'm new to Lucene so I'm not sure what the best way of speeding this up in > Lucene is, but I

Re: Optimizing term-occurrence counting (code included)

2020-07-23 Thread Ali Akhtar
I'm new to Lucene so I'm not sure what the best way of speeding this up in Lucene is, but I've previously used https://github.com/npgall/cqengine for similar stuff. It provided really good performance, especially if you're just counting things. On Fri, Jul 24, 2020 at 6:55 AM Alex K wrote: > Hi

Optimizing term-occurrence counting (code included)

2020-07-23 Thread Alex K
Hi all, I am working on a query that takes a set of terms, finds all documents containing at least one of those terms, computes a subset of candidate docs with the most matching terms, and applies a user-provided scoring function to each of the candidate docs. Simple example of the query: - query
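
For readers following the thread, the query described above can be sketched roughly as follows. This is only an illustration in plain Python over an in-memory inverted index, not the Elastiknn or Lucene implementation. The names `build_index`, `match_count_query`, `candidates`, and `score_fn` are hypothetical and chosen for this example, and the tie-breaking rule (lower doc id first) is an assumption.

```python
from collections import Counter
import heapq
from typing import Callable, Dict, Iterable, List, Set, Tuple


def build_index(docs: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """Build a toy inverted index: term -> sorted list of doc ids (hypothetical helper)."""
    index: Dict[str, List[int]] = {}
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            index.setdefault(term, []).append(doc_id)
    return index


def match_count_query(
    index: Dict[str, List[int]],
    query_terms: Iterable[str],
    candidates: int,
    score_fn: Callable[[int], float],
) -> List[Tuple[int, float]]:
    """Count matching terms per doc, keep the top `candidates`, then score them."""
    # Step 1: walk the postings of every query term and count hits per doc.
    counts: Counter = Counter()
    for term in set(query_terms):
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1

    # Step 2: keep the docs with the most matching terms.
    # Ties are broken in favor of the lower doc id (an assumption of this sketch).
    top = heapq.nlargest(candidates, counts.items(), key=lambda kv: (kv[1], -kv[0]))

    # Step 3: apply the user-provided scoring function to each candidate.
    scored = [(doc_id, score_fn(doc_id)) for doc_id, _ in top]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Usage with made-up data: four docs, three query terms, two candidates.
docs = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"a", "b", "c"}, 3: {"d"}}
index = build_index(docs)
result = match_count_query(index, ["a", "b", "c"], candidates=2, score_fn=float)
```

In this sketch the counting step touches every posting of every query term, which is the part the thread is asking how to speed up.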