Lucene Index Cloud Replication

2019-07-03 Thread Michael Froh
and 3, based on S3 and DynamoDB, but I'd like to do it with interfaces that lend themselves to other implementations for blob and metadata storage. Is it worth opening a Jira issue for this? Is this something that would benefit the Lucene community? Thanks, Michael Froh

Re: PhraseQuery

2020-01-24 Thread Michael Froh
Did you check the Javadoc for PhraseQuery.Builder? https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/search/PhraseQuery.Builder.html Checking the source code, I see that the add method that takes a position argument will throw an IllegalArgumentException if you try to add a Term in a lo

Re: Scoring Across Multiple Fields

2020-01-27 Thread Michael Froh
Hi John, A TermQuery produces a scorer that can compute similarity for a given term value against a given field, in the context of the index, so as you say, it produces a score for one field. If you want to match a given term value across multiple fields, indeed you could use a BooleanQuery with

Re: SingleTerm vs MultiTerm in PhraseWildCardQuery class in the sandbox Lucene

2020-02-17 Thread Michael Froh
Hi Baris, The idea with PhraseWildcardQuery is that you can mix literal "exact" terms with "MultiTerms" (i.e. any subclass of MultiTermQuery). Using addTerm is for exact terms, while addMultiTerm is for things that may match a number of possible terms in the given position. If you want to search

Re: SingleTerm vs MultiTerm in PhraseWildCardQuery class in the sandbox Lucene

2020-02-18 Thread Michael Froh
-\\U0010"] } On Tue, 18 Feb 2020 at 13:52, wrote: > Michael and Forum,- > Thanks for the great explanations. > > one question please: > > why is PrefixQuery used instead of WildCardQuery in the below snippet? > > Best regards > > > On Feb 17, 2020, at

Re: What is the Lucene 8.4.1 equivalent for StandardAnalyzer.STOP_WORDS_SET

2020-02-24 Thread Michael Froh
Those words ( https://github.com/apache/lucene-solr/blob/releases/lucene-solr/7.3.1/lucene/core/src/java/org/apache/lucene/analysis/standard/StandardAnalyzer.java#L44-L49) have been moved to EnglishAnalyzer ( https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/lucene/analysis/comm

Re: How can I boost score of a document if two consecutive terms match

2020-11-02 Thread Michael Froh
Hi John, What you're looking for sounds like Solr's pf2 parameter (see https://lucene.apache.org/solr/guide/8_6/the-extended-dismax-query-parser.html#extended-dismax-parameters and https://lucene.apache.org/solr/guide/8_6/the-dismax-query-parser.html#pf-phrase-fields-parameter for details). Basic
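
As a rough illustration of the word-bigram idea behind pf2, the sketch below lists the adjacent term pairs that could each be turned into a phrase clause and boosted. It is plain Java, not Solr or Lucene code; the class and method names (WordBigrams, adjacentPairs) are hypothetical, and how the resulting phrases are weighted is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
final class WordBigrams {
    /** Returns every pair of adjacent terms, in query order, as a two-word phrase. */
    static List<String> adjacentPairs(List<String> terms) {
        List<String> phrases = new ArrayList<>();
        // Stop one short of the end so terms.get(i + 1) is always valid.
        for (int i = 0; i + 1 < terms.size(); i++) {
            phrases.add(terms.get(i) + " " + terms.get(i + 1));
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Each printed phrase is a candidate for an optional, boosted phrase clause.
        System.out.println(adjacentPairs(List.of("new", "york", "city")));
    }
}
```

A document that contains any of these pairs as consecutive terms would pick up the extra boost, while documents matching the terms only in isolation would not.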

Re: DisjunctionMinQuery

2023-11-08 Thread Michael Froh
Hi Marc, Can you clarify what the semantics of a DisjunctionMinQuery would be? Would you keep the score for the *lowest* scoring disjunct (plus some tiebreaker applied to the other matching disjuncts)? I'm trying to imagine how that would work compared to the classic DisMax use-case. Say I'm sear
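
To make the question concrete, here is a minimal sketch comparing classic DisMax combination (highest disjunct score plus a tiebreaker times the rest) with one possible reading of a "min" variant (lowest disjunct score plus a tiebreaker times the rest). The min semantics are an assumption taken from the question above, not an existing Lucene query, and the class and method names are hypothetical.

```java
// Hypothetical sketch in plain Java; no Lucene classes are used.
final class DisjunctScores {
    /** Classic DisMax: best disjunct score plus tieBreaker times the sum of the others. */
    static float disMax(float[] scores, float tieBreaker) {
        float max = Float.NEGATIVE_INFINITY;
        float sum = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    /** Assumed "min" variant: worst disjunct score plus tieBreaker times the sum of the others. */
    static float disMin(float[] scores, float tieBreaker) {
        float min = Float.POSITIVE_INFINITY;
        float sum = 0f;
        for (float s : scores) {
            min = Math.min(min, s);
            sum += s;
        }
        return min + tieBreaker * (sum - min);
    }

    public static void main(String[] args) {
        // Scores of the disjuncts that matched one document (must be non-empty).
        float[] matching = {3.0f, 1.0f, 0.5f};
        System.out.println("disMax = " + disMax(matching, 0.1f));
        System.out.println("disMin = " + disMin(matching, 0.1f));
    }
}
```

Both methods only see the disjuncts that actually matched, so under this reading a document matching a single field gets the same score from either formula.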

Re: get distinct values from indexreader for given field

2023-11-28 Thread Michael Froh
Hello! Instead of MultiFields.getFields(), you can use MultiTerms.getTerms(reader, fieldname) to get the Terms instance. To decode your long / int values, you should be able to use LongPoint/IntPoint.unpack to write the values into an array: long[] val = new long[1]; // Assuming 1-D values LongP
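
For background on what the decoding step undoes, the sketch below shows the byte layout that, as far as I understand it, Lucene uses for one long dimension of a point value: 8 big-endian bytes with the sign bit flipped, so that unsigned byte order matches numeric order. That layout is an assumption not stated in the message above, and the class here is a hypothetical stand-in written in plain Java, not the Lucene API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the encoding; real code should call the Lucene helpers.
final class SortableLongBytes {
    private static final long SIGN_BIT = 0x8000000000000000L;

    /** Encodes a long as 8 big-endian bytes with the sign bit flipped. */
    static byte[] encode(long value) {
        byte[] out = new byte[Long.BYTES];
        // ByteBuffer is big-endian by default.
        ByteBuffer.wrap(out).putLong(value ^ SIGN_BIT);
        return out;
    }

    /** Decodes 8 bytes starting at offset back into the original long. */
    static long decode(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, Long.BYTES).getLong() ^ SIGN_BIT;
    }

    public static void main(String[] args) {
        byte[] packed = encode(-42L);
        System.out.println(decode(packed, 0));
        // Negative values sort before positive ones when compared as unsigned bytes.
        System.out.println(Arrays.compareUnsigned(encode(-1L), encode(1L)) < 0);
    }
}
```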

Re: get distinct values from indexreader for given field

2023-11-28 Thread Michael Froh
}; lr.getPointValues(fieldname).intersect(collectingVisitor); } On Tue, Nov 28, 2023 at 1:42 PM Michael Froh wrote: > Hello! > > Instead of MultiFields.getFields(), you can use > MultiTerms.getTerms(reader, fieldname) to get the Terms instance. > > To decode y

Re: Updating document with IndexWriter#updateDocument doesn't seem to take effect

2024-08-09 Thread Michael Froh
Hi Wojtek, Thank you for linking to your test code! When you open an IndexReader, it is locked to the view of the Lucene directory at the time that it's opened. If you make changes, you'll need to open a new IndexReader before those changes are visible. I see that you tried creating a new IndexS

Re: Learning resources for Lucene Development

2024-10-09 Thread Michael Froh
Hi Marc, In some shameless self-promotion, I've written up some worked Lucene examples (maybe a little more focused on Lucene internals than best practices) over at https://github.com/msfroh/lucene-university. If you have anything you'd like to understand better, feel free to open issues there and

Re: Understanding Document ID (Lucene 10.0.0)

2024-10-25 Thread Michael Froh
Hi Prashant, For your particular use-case, you probably don't need to join across multiple indices. Lucene is able to maintain multiple data structures per field, with the selection of data structures coming from attributes of the field's type. If you have a field that you want to return, but doe

Re: NRT segment replication in AWS

2025-03-03 Thread Michael Froh
On Sun, Mar 2, 2025 at 7:21 AM Marc Davenport wrote: > > @Michael - That second simpler architecture is very similar to what we are > considering; With the exception of a queue for announcing new > segments rather than a polling process. It is good to know that it's a > reasonable outline. You

Re: NRT segment replication in AWS

2025-02-26 Thread Michael Froh
Hi there, I'm happy to share some details about how Amazon Product Search does its segment replication. I haven't worked on Product Search in over three years, so anything that I remember is not particularly novel. Also, it's not really secret sauce -- I would have happily talked about it more in

Re: Synonym graph and multiple values

2025-03-25 Thread Michael Froh
This relates to the "position increment gap" for your analyzer and is configurable. If you check the JavaDoc for Analyzer#getPositionIncrementGap, it says: * Invoked before indexing a IndexableField instance if terms have already been added to that * field. This allows custom analyzers to p
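
A small sketch of how the gap affects token positions in a multi-valued field may help. It is a simplified model in plain Java, not Lucene's indexing code: it assumes every token has a position increment of 1 (no stop words or synonyms) and that the gap is added only once earlier values have contributed terms, as the Javadoc quoted above describes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of position assignment; not Lucene's actual indexing chain.
final class PositionGapSketch {
    /** Returns the position assigned to each token, walking the field's values in order. */
    static List<Integer> positions(List<List<String>> values, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = -1; // the first token lands on position 0
        for (List<String> value : values) {
            // The gap applies only if earlier values already added terms.
            if (!out.isEmpty()) {
                position += gap;
            }
            for (int i = 0; i < value.size(); i++) {
                position += 1; // assumed increment of 1 per token
                out.add(position);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> values = List.of(List.of("a", "b"), List.of("c", "d"));
        System.out.println(positions(values, 0));
        System.out.println(positions(values, 100));
    }
}
```

With a gap of 0 the last token of one value and the first token of the next sit on adjacent positions, so a phrase query can match across the value boundary; a large gap pushes them far enough apart that it cannot (short of a very large slop).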

Re: Sub-Graphs in Hnsw

2025-06-05 Thread Michael Froh
I'm wondering if this is the same idea that Kaival is proposing in https://github.com/apache/lucene/issues/14758 (Support multiple HNSW graphs backed by the same vectors). On Thu, Jun 5, 2025 at 11:32 AM Michael Sokolov wrote: > I do think there could be many interesting use cases for building >

Re: need help with JoinUtil.createJoinQuery() method

2025-07-23 Thread Michael Froh
It looks like your pk_p and pk_c fields aren't indexed -- they just have doc values. If you try making them KeywordFields instead (so they're indexed and have doc values), does it work? Also, the join module may be overkill for what you're trying to do, since it looks like you're indexing parent/