Re: NRT segment replication in AWS

2025-03-02 Thread Marc Davenport
…it could manage its own dense sequence numbers. 3. A searcher is "sticky" to a writer, and periodically issues an S3 GetObject for the next metadata object's full URL (i.e. the URL using the next dense sequence number). Until the next checkpoint…

NRT segment replication in AWS

2025-02-26 Thread Marc Davenport
Hello, Our current search solution is a pretty big monolith running on pretty beefy EC2 instances. Every node is responsible for indexing and serving queries. We want to start decomposing our service and are starting with separating the indexing and query handling responsibilities. I'm in the re…

Re: Learning resources for Lucene Development

2024-10-15 Thread Marc Davenport
On Tue, Oct 8, 2024 at 7:46 PM Navneet Verma wrote: +1 on the question. On Tue, Oct 8, 2024 at 6:35 PM Marc Davenport wrote: Hello, I had this question buried in a previous email. I feel like I…

Learning resources for Lucene Development

2024-10-08 Thread Marc Davenport
Hello, I had this question buried in a previous email. I feel like I have a very loose grasp on the Lucene API and how to properly implement with it. I'm working on code that I didn't write myself from the ground up. Since I'm learning as I'm reading it, I can only assume things were done right.

Re: Facet Count strategies and common errors

2024-10-08 Thread Marc Davenport
…field. Not sure if that fits your use case, but it is a typical user interaction when searching and filtering by facets. On Tue, Oct 8, 2024, 17:29 Marc Davenport wrote: Thanks Stefan, I will look into the…

Re: Facet Count strategies and common errors

2024-10-08 Thread Marc Davenport
It facets at match-time and is generally faster than the faceting we had before 9.12. Stefan [1] https://github.com/apache/lucene/tree/main/lucene/demo/src/java/org/apache/lucene/demo/facet [2] https://github.com/apache/lucene/pull/13568 …

Facet Count strategies and common errors

2024-09-30 Thread Marc Davenport
I've been looking at the way our code gets the facet counts from Lucene to see if there are some obvious inefficiencies. We have about 60 normal flat facets, some of which are multi-valued, and 5 or so hierarchical and multi-valued facets. I'm seeing cases where the call to create a FastTaxonomyFacetCounts…
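For reference, the usual taxonomy facet-counting pattern collects hits once and builds a single FastTaxonomyFacetCounts that can then serve every dimension. A minimal sketch; the searcher/reader/config wiring and the top-10 limits are assumptions, not details from the thread:

```java
import java.io.IOException;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

class FacetCountSketch {
  // Collect matching docs once, build one Facets instance, and ask it for
  // the top children of a dimension. With many dimensions, the same
  // FacetsCollector and Facets instance should be reused across
  // getTopChildren calls rather than rebuilt per dimension.
  static FacetResult topChildren(IndexSearcher searcher, TaxonomyReader taxoReader,
                                 FacetsConfig config, Query query, String dim)
      throws IOException {
    FacetsCollector fc = new FacetsCollector();
    FacetsCollector.search(searcher, query, 10, fc);
    Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
    return facets.getTopChildren(10, dim);
  }
}
```

With ~65 dimensions, constructing FastTaxonomyFacetCounts once per search (not once per dimension) is where the pattern saves work.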

Re: KnnQueries and result discrepancy between indexes with the same data

2024-09-12 Thread Marc Davenport
…ly and in the same order, then I believe that you would get the same results. But we consider this an implementation detail rather than a guarantee that Lucene should have. On Thu, Sep 12, 2024 at 7:03 PM Marc Davenport wrote: …

KnnQueries and result discrepancy between indexes with the same data

2024-09-12 Thread Marc Davenport
Hello, I've been working on this personalization project using KNN queries, and I have a couple of questions, but one is more pressing for me than the others. 1) Inconsistency between index instances: All of the same documents are loaded into different indexes. They may be loaded in different order, but…

Re: Converting docid to uid

2024-09-11 Thread Marc Davenport
…pretty efficient and a cache wouldn't likely win you very much and just lead to trouble. On Mon, Aug 5, 2024 at 12:08 PM Marc Davenport wrote: Hello, Right now our implementation retrieves our UID for our records from the topdocs by calling IndexSearcher…

Converting docid to uid

2024-08-05 Thread Marc Davenport
Hello, Right now our implementation retrieves our UID for our records from the topdocs by calling IndexSearcher.doc(docid, fieldToLoad) (Deprecated) with the UID as the only field. I'm looking to replace this with the appropriate call to IndexSearcher.storedFields(). This feels a little inefficient…
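A sketch of what the non-deprecated replacement looks like, assuming the UID lives in a stored field named "uid" (the field name and the helper are illustrative, not from the thread):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

class UidLookupSketch {
  // Resolve each hit's ephemeral docid to its stored UID. StoredFields is
  // obtained once per page of results (not once per hit), and only the
  // UID field is loaded from the stored-fields file.
  static List<String> uidsForHits(IndexSearcher searcher, TopDocs topDocs)
      throws IOException {
    StoredFields storedFields = searcher.storedFields();
    List<String> uids = new ArrayList<>(topDocs.scoreDocs.length);
    for (ScoreDoc sd : topDocs.scoreDocs) {
      Document doc = storedFields.document(sd.doc, Set.of("uid"));
      uids.add(doc.get("uid"));
    }
    return uids;
  }
}
```

As the reply in this thread notes, loading a single stored field this way is already cheap enough that caching on top of it is unlikely to pay off.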

KnnFloatVectorQuery: filtering query & rewrite

2024-05-15 Thread Marc Davenport
Hello, I'm exploring some personalization of our sort orders. If I have an original query q, which is mostly just a set of term filters, and I want to sort those results by the distance between some float vector on the document and a supplied user vector, I only see one way to do this. I would create a new bool…
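One alternative to wrapping everything in a boolean query: KnnFloatVectorQuery accepts an optional filter query directly, so the term filters can pre-filter the vector search. A sketch; the field name "embedding", the example term filter, and k=100 are all assumptions:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class PersonalizedKnnSketch {
  // The term filters go in as the kNN pre-filter: only documents matching
  // the filter are candidates during graph search, and the query returns
  // the k closest of those, scored by vector similarity.
  static Query personalized(float[] userVector) {
    Query filter = new BooleanQuery.Builder()
        .add(new TermQuery(new Term("category", "shoes")), Occur.FILTER)
        .build();
    return new KnnFloatVectorQuery("embedding", userVector, 100, filter);
  }
}
```

Because the filter is applied during the search rather than afterwards, the query can still return a full k results even when the filter is selective.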

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-26 Thread Marc Davenport
…seems like you can use your own cache implementation, similar to what you can see in tests - TestDirectoryTaxonomyWriter.java or TestConcurrentFacetedIndexing.java. This would allow you to plug in the previous implementation (or something even more fine-tuned)…
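Plugging a custom cache into DirectoryTaxonomyWriter is a constructor argument; a minimal sketch, where LruTaxonomyWriterCache stands in for whatever implementation gets plugged in and the size is purely illustrative:

```java
import java.io.IOException;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter;
import org.apache.lucene.facet.taxonomy.writercache.LruTaxonomyWriterCache;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;

class TaxoWriterSketch {
  // The third constructor argument replaces the default label-to-ordinal
  // cache; any TaxonomyWriterCache implementation can be supplied here.
  static DirectoryTaxonomyWriter open(Directory taxoDir) throws IOException {
    return new DirectoryTaxonomyWriter(
        taxoDir, OpenMode.CREATE_OR_APPEND, new LruTaxonomyWriterCache(4_000));
  }
}
```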

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-22 Thread Marc Davenport
…items and match the previous behavior. Marc. On Fri, Apr 19, 2024 at 4:39 PM Marc Davenport wrote: Hello, Thanks for the leads. I haven't yet gone as far as doing a git bisect, but I have found that the big jump in time is in the call to facetsConfig.build(taxonomyWriter…

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-19 Thread Marc Davenport
…It'll take some time to build but it's a logarithmic bisection and you'd know for sure where the problem is. D. On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport wrote: Hi Adrien et al, I've been doing some investigation today and it…

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Marc Davenport
…cause for this 2x regression. It would be interesting to look at a profile. On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport wrote: Hello, I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall build can now support Java 11…

Indexing time increase moving from Lucene 8 to 9

2024-04-17 Thread Marc Davenport
Hello, I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall build can now support Java 11. The quick first step of renaming packages and importing the new libraries has gone well. I'm even seeing a nice performance bump in our average query time. I am however seeing a dramatic increase…

Performance changes within the Lucene 8 branch

2023-12-12 Thread Marc Davenport
…returned? Thank you, Marc Davenport