Re: Lucene Slack Channel

2024-11-13 Thread Michael Wechner
I think you have to be a committer on at least one Apache project https://infra.apache.org/committer-email.html HTH Michael On 13.11.24 at 22:12, ashwini singh wrote: Thanks. How can I get the apache.org email address? Is there a policy for that? On Mon, 4 Nov 2024 at 15:06, Michael

Re: Lucene Slack Channel

2024-11-04 Thread Michael Wechner
I think one can only join with an apache.org email address https://infra.apache.org/slack.html but maybe I misunderstand the access policy? Thanks Michael On 04.11.24 at 23:56, ashwini singh wrote: Hi How can I get added to the Lucene Slack channel? I am working on Lucene to build a

Re: Lucas - Luke toolbox integration for IntelliJ

2024-06-06 Thread Michael Wechner
cool! Thanks, Michael On 06.06.24 at 08:56, Balog Tamás wrote: Dear Lucene Community, Since Tuesday, the IntelliJ plugin called [Lucas](https://plugins.jetbrains.com/plugin/24567-lucas) is available on the JetBrains Marketplace. It integrates / ports the Luke toolbox to the IntelliJ Platform

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-29 Thread Michael Wechner
he/lucene/pull/13197> support which takes the number of bits to use for quantizing as input. Since this change allows passing 1 for bits to be used for quantization, it looks to me like an enabler for binary quantization. - Shubham On Sun, Mar 24, 2024 at 4:34 AM Michael Wechner wrote: btw
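
The snippet above mentions passing "the number of bits to use for quantizing". As a rough illustration of what that parameter controls, here is a minimal sketch of linear scalar quantization in plain Java. This is not Lucene's implementation: the class name ScalarQuantizationSketch, the fixed [min, max] range and the 1 to 7 bit limit (chosen so every level fits into a signed Java byte) are assumptions made only for this example.

```java
final class ScalarQuantizationSketch {

    /**
     * Maps each float linearly from [min, max] onto the integers 0 .. 2^bits - 1.
     * Values outside the range are clamped first.
     * Hypothetical helper for illustration only, not part of Lucene.
     */
    static byte[] quantize(float[] vector, float min, float max, int bits) {
        if (bits < 1 || bits > 7) {
            throw new IllegalArgumentException("bits must be between 1 and 7");
        }
        if (!(max > min)) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        int levels = (1 << bits) - 1;        // bits = 7 -> 127, bits = 1 -> 1
        float scale = levels / (max - min);  // width of the range per level
        byte[] quantized = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.max(min, Math.min(max, vector[i]));
            quantized[i] = (byte) Math.round((clamped - min) * scale);
        }
        return quantized;
    }
}
```

With bits set to 1, every component collapses to 0 or 1, which is the sense in which a one-bit setting behaves like binary quantization.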

Re: Support of RRF (Reciprocal Rank Fusion) by Lucene?

2024-03-26 Thread Michael Wechner
thanks, will try to get started asap :-) On 26.03.24 at 15:37, Adrien Grand wrote: GitHub issue or PR directly, whatever works best for you is going to work for us. On Tue, Mar 26, 2024 at 3:12 PM Michael Wechner wrote: Hi Adrien Cool, thanks for your quick feedback! Yes, IIUC it should

Re: Support of RRF (Reciprocal Rank Fusion) by Lucene?

2024-03-26 Thread Michael Wechner
addition. Plus it should be pretty easy to implement. This sounds like a good fit for a utility method on the TopDocs class? On Tue, Mar 26, 2024 at 2:54 PM Michael Wechner wrote: Hi IIUC Lucene does not contain a RRF implementation, for example to merge keyword/BM25 and vector search results

Support of RRF (Reciprocal Rank Fusion) by Lucene?

2024-03-26 Thread Michael Wechner
Hi IIUC Lucene does not contain a RRF implementation, for example to merge keyword/BM25 and vector search results, right? I think it would be nice to have within Lucene, WDYT? Thanks Michael - To unsubscribe, e-mail: java-u
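
For readers unfamiliar with RRF, the idea in the question above can be sketched in a few lines of plain Java: each document receives the sum of 1 / (k + rank) over all result lists it appears in, with rank counted from 1. This is only a sketch under stated assumptions, not a Lucene API: the class name RrfSketch, the use of String document ids and the constant k = 60 (a commonly used value) are choices made for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RrfSketch {

    /** Sums 1 / (k + rank) per document id over all rankings; rank is 1-based. */
    static Map<String, Double> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores;
    }

    /** Returns the ids ordered by descending fused score, ties broken by id. */
    static List<String> ranked(Map<String, Double> scores) {
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> scores.get(id))
                .reversed()
                .thenComparing(Comparator.naturalOrder()));
        return ids;
    }
}
```

A keyword/BM25 result list and a vector search result list would each be passed as one ranking, best hit first. Only the positions matter, so the two score scales never need to be made comparable.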

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-23 Thread Michael Wechner
rQuantizedVectorsReader.html - https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner wrote: Hi Cohere recently announced their "compressed" embeddings https://twitter.com/N

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-22 Thread Michael Wechner
-L224C41 and it works very well so far; I have tested it with the Cohere int8 embeddings. Thanks Michael On 20.03.24 at 06:56, Michael Wechner wrote: Hi Shubham Thanks very much for your feedback! I will try it asap :-) Michael On 19.03.24 at 21:57, Shubham Chaudhary wrote: Hi

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-19 Thread Michael Wechner
ache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsReader.html - https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.html On Wed, Mar 20, 2024 at 1:54 AM Michael Wechner wrote: Hi Cohere

Does Lucene Vector Search support int8 and / or even binary?

2024-03-19 Thread Michael Wechner
Hi Cohere recently announced their "compressed" embeddings https://twitter.com/Nils_Reimers/status/1769809006762037368 https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share&utm_medium=member_desktop Does Lucene Vector Search support th

Re: Right Way to Read vectors from Index

2024-02-12 Thread Michael Wechner
make sure the best way to read vectors for our case. @Michael - Yes Michael that’s the case here. Regards, Uthra On 12-Feb-2024, at 1:23 PM, Michael Wechner wrote: thanks for explaining, Uthra! IIUC the text / data for which the vector was originally generated was not changed, only some

Re: Right Way to Read vectors from Index

2024-02-11 Thread Michael Wechner
h its vector field without resending the vector in change data every time. The change data will consist of only “updated_field(s):value(s)” wherein I will read the vector value from Index to update the document. Thanks, Uthra On 09-Feb-2024, at 7:13 PM, Michael Wechner wrote: Can you describe

Re: Right Way to Read vectors from Index

2024-02-09 Thread Michael Wechner
Can you describe your use case in more detail (beyond having to read the vectors)? Thanks Michael On 09.02.24 at 12:28, Uthra wrote: Hi, Our project uses Lucene 9_7_0 and we have a requirement of frequent vector read operation from the Index for a set of documents. We tried two app

Re: hnsw parameters for vector search

2024-01-30 Thread Michael Wechner
Re your "second" question about suboptimal results, I think Nils Reimers explains quite nicely why this might happen, see for example https://www.youtube.com/watch?v=Abh3YCahyqU HTH Michael On 30.01.24 at 15:48, Moll, Dr. Andreas wrote: Hi, the hnsw documentation for the Lucene HnswGraph

Katie released as Open Source under the Apache License 2.0 using Lucene for full text and vector search by default

2024-01-24 Thread Michael Wechner
Hi Together Yesterday, Katie got released as Open Source under the Apache License 2.0 using Lucene for full text and vector search by default. You can find the code on GitHub https://github.com/wyona/katie-backend A very big thank you to everyone working on Lucene, to make this great search

Re: Azure AI Search uses Apache Lucene for full text search

2024-01-24 Thread Michael Wechner
time on it during the next couple of days and keep you posted once I have gained more experience. Thanks Michael On 22.01.24 at 09:06, Ali Akhtar wrote: Sure, please share On Mon, Jan 22, 2024 at 1:33 AM Michael Wechner wrote: Hi I recently noticed that Azure AI Search uses

Azure AI Search uses Apache Lucene for full text search

2024-01-21 Thread Michael Wechner
Hi I recently noticed that Azure AI Search uses Apache Lucene for full text search https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture which I did not know until now, but I think it is very cool that Microsoft is using Lucene. The doc

Re: Old codecs may only be used for reading

2024-01-11 Thread Michael Wechner
Hi Adrien, thank you very much for confirming quickly! I switched to Lucene99Codec and all looks good again :-) Thanks Michael On 11.01.24 at 10:47, Adrien Grand wrote: Hey Michael. Your understanding is correct. On Thu, Jan 11, 2024 at 10:46 AM Michael Wechner wrote: Hi I recently

Old codecs may only be used for reading

2024-01-11 Thread Michael Wechner
Hi I recently upgraded from Lucene 9.8.0 to Lucene 9.9.1 and noticed that Lucene95Codec got moved to org.apache.lucene.backward_codecs.lucene95.Lucene95Codec When testing my code I received the following error message: "Old codecs may only be used for reading" Do I understand correctly, that

Re: Proof of concept for a Luke IntelliJ plugin

2023-11-16 Thread Michael Wechner
of contract) and potential longer-term maintenance? Best regards, Tamás ---Tamás Balog Freelance JetBrains IDE Plugin Developer Find me on: GitHub / JetBrains Marketplace / LinkedIn / Website Sent with Proton Mail secure email. On Monday, 13 November 2023 at 10:33, Michael Wechner

Re: Proof of concept for a Luke IntelliJ plugin

2023-11-16 Thread Michael Wechner
l compensation for this work (maybe under some kind of contract) and potential longer-term maintenance? Best regards, Tamás ---Tamás Balog Freelance JetBrains IDE Plugin Developer Find me on: GitHub / JetBrains Marketplace / LinkedIn / Website Sent with Proton Mail secure email. 2023. nov

Re: Proof of concept for a Luke IntelliJ plugin

2023-11-15 Thread Michael Wechner
Tamás ---Tamás Balog Freelance JetBrains IDE Plugin Developer Find me on: GitHub / JetBrains Marketplace / LinkedIn / Website Sent with Proton Mail secure email. On Monday, 13 November 2023 at 10:33, Michael Wechner wrote: Hi Tamas Can one download your plugin somewhere to test it? T

Re: Proof of concept for a Luke IntelliJ plugin

2023-11-13 Thread Michael Wechner
Hi Tamas Can one download your plugin somewhere to test it? Thanks Michael On 13.11.23 at 10:07, Balog Tamás wrote: Hello everyone! I've been working on a proof of concept of creating an IntelliJ plugin from the Luke application and it reached a demoable state. If anyone of the Lucene c

Re: How to get terms of a particular field of a particular document

2023-11-13 Thread Michael Wechner
/core/9_8_0/core/org/apache/lucene/analysis/TokenStream.html correctly, then one should add it. Thanks Michael On 12.11.23 at 23:36, Michael Wechner wrote: Thanks again, I think I have now found what I wanted (without needing the Highlighter): IndexReader reader = DirectoryReader.o

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
here > https://github.com/apache/lucene/blob/4e2ce76b3e131ba92b7327a52460e6c4d92c5e33/lucene/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java#L159 > > > On Sun, Nov 12, 2023 at 11:42 PM Michael Wechner > wrote: > >> Hi Mikhail >> >> Thank you

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
alyse it again, get all terms. Good Luck On Sun, Nov 12, 2023 at 7:47 PM Michael Wechner wrote: Hi IIUC I can get all terms of a particular field of an index with IndexReader reader = DirectoryReader.open("index_directory"); List<LeafReaderContext> list = reader.leaves(); for (LeafReaderContext

How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
Hi IIUC I can get all terms of a particular field of an index with IndexReader reader = DirectoryReader.open("index_directory"); List<LeafReaderContext> list = reader.leaves(); for (LeafReaderContext lrc : list) { Terms terms = lrc.reader().terms("field_name"); if (terms != null) { TermsEnum termsEn

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-08 Thread Michael Wechner
 * implement getKnnVectorsFormatForField() and return the wrapper with other max dimension Reading indexes still works with unmodified default codec, you only need to set it for IndexWriter. When reading the actual codec is looked up by name. Uwe On 07.11.2023 at 17:03, Michael Wechner wrote: Hi Uwe

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-07 Thread Michael Wechner
nt vectors format like a delegator as described before. The responsibility was shifted to the codec, because there may be better alternatives to HNSW that have different limits especially with regard to performance during merging and query response times, e.g. BKD trees. Uwe On 19.10.2023 at

Re: When to use StringField and when to use FacetField for categorization?

2023-10-23 Thread Michael Wechner
probably have too many ways to do the same thing in the faceting module, and maybe our documentation could be a bit more helpful. Cheers, -Greg On Fri, Oct 20, 2023 at 2:54 PM Michael Wechner wrote: thanks very much for this additional information, Marc! On 20.10.23 at 20:30, Marc D

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
hich case, yes, you need to create a TaxonomyWriter). It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner <

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
omyWriter). It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner wrote: Hi Mike Thanks for your feedback! II

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
et by different points/levels of your hierarchy. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner wrote: > Hi > > I have found the following simple Facet Example > > > https://github

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
less http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner wrote: Hi I have found the following simple Facet Example https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java whereas for a simple categorization of d

When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Hi I have found the following simple Facet Example https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java whereas for a simple categorization of documents I currently use StringField, e.g. doc1.add(new StringField("category", "bo

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
trees. Uwe On 19.10.2023 at 10:53, Michael Wechner wrote: I forgot to mention that using the custom FieldType with vector dimension 1536 does work with Lucene 9.7.0 Thanks Michael On 19.10.23 at 10:39, Michael Wechner wrote: Hi I recently upgraded Lucene to 9.8.0 and was r

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
I forgot to mention that using the custom FieldType with vector dimension 1536 does work with Lucene 9.7.0 Thanks Michael On 19.10.23 at 10:39, Michael Wechner wrote: Hi I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's embedding model, which has the v

Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
Hi I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's embedding model, which has the vector dimension 1536 and received the following error Field[vector]vector's dimensions must be <= [1024]; got 1536 whereas this worked previously with the hack to override the vector di

Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner
that gives the missing information in 9.x Javadocs, too. Uwe On 25.09.2023 at 11:02, Michael Wechner wrote: you mean once per search request? I mean for example GET https://localhost:8080/search?q=Lucene and the following would be executed IndexReader reader = DirectoryReader

Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner
);    }    ```    Note that these StoredFields and TermVectors instances should only    be consumed in the thread where    they were acquired. For instance, it is illegal to share them across    threads. Uwe On 25.09.2023 at 07:53, Michael Wechner wrote: Hi Shubham Great, thank you very much

Re: How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner
) - Shubham On Mon, Sep 25, 2023 at 1:59 AM Michael Wechner wrote: Hi I recently noticed that IndexReader.document(int) is deprecated, whereas my code is currently TopDocs topDocs = searcher.search(query, k); for (ScoreDoc scoreDoc : topDocs.scoreDocs) { Document doc

How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner
Hi I recently noticed that IndexReader.document(int) is deprecated, whereas my code is currently TopDocs topDocs = searcher.search(query, k); for (ScoreDoc scoreDoc : topDocs.scoreDocs) {     Document doc = indexReader.document(scoreDoc.doc); } How do I best replace document(int)? Thanks M

Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner
Hi Together You might be interested in this paper / article https://arxiv.org/abs/2308.14963 Thanks Michael - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@luce

Re: Top docs depend on value of K nearest neighbour

2023-08-04 Thread Michael Wechner
" KNN and can get caught in local minima (maxima?). Increasing K has, indirectly, the effect of expanding the search space because the minimum score in the priority score (score of the Kth item) is used as a threshold for deciding when to terminate the search On Wed, Aug 2, 2023 at 5:19 PM Michael We

Top docs depend on value of K nearest neighbour

2023-08-02 Thread Michael Wechner
Hi I use Lucene 9.7.0 but experienced the same behaviour with Lucene 9.6.0 when doing vector search as follows: I have indexed about 200 vectors (dimension 768) I build the query as follows  Query query = new KnnFloatVectorQuery("vector-field-name", queryVector, k); and do the search as f

Re: [ANNOUNCE] Apache Lucene 9.6.0 released

2023-05-11 Thread Michael Wechner
Thank you very much for the release! Works very well so far :-) All the best Michael On 10.05.23 at 09:49, Alan Woodward wrote: The Lucene PMC is pleased to announce the release of Apache Lucene 9.6.0. Apache Lucene is a high-performance, full-featured search engine library written entirely

Re: Vector Search on Lucene

2023-03-02 Thread Michael Wechner
Hi Marcos The indexing looks something like this: Document doc = new Document(); float[] vector = getEmbedding(text); FieldType vectorFieldType = KnnVectorField.createFieldType(vector.length, VectorSimilarityFunction.COSINE); KnnVectorField vectorField = new KnnVectorField("my_vector_field", vector, vectorFi

Re: Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-11 Thread Michael Wechner
em with your vectors, very fast, only 500 calculations required, no HNSW or anything needed. Of course you could use a vector search instead of a BM25 search as the initial search to pull the top 500 hits too. So it could meet both use-cases and provide a really performant option for users that want

Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-10 Thread Michael Wechner
Hi I use the vector search of Lucene, whereas the embeddings I get from SentenceBERT for example. According to https://www.sbert.net/examples/applications/retrieve_rerank/README.html a re-ranking with a cross-encoder after the vector search (bi-encoding) can improve the ranking. Would it

Re: Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-15 Thread Michael Wechner
aybe it is easier to just contribute another metric as part of the source, than make it configurable dynamically with a custom implementation. Thanks Michael On Sat, Jan 14, 2023 at 6:04 PM Michael Wechner wrote: Hi Adrien Thanks for your feedback! Whereas I am not sure I fully understand wha

Re: Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-14 Thread Michael Wechner
vectors format that ignores the vector similarity configured on the field and uses its own. On Sat, 14 Jan 2023 at 21:33, Michael Wechner wrote: Hi IIUC Lucene currently supports VectorSimilarityFunction.COSINE VectorSimilarityFunction.DOT_PRODUCT VectorSimilarityFunction.EUCLIDEAN whereas s

Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-14 Thread Michael Wechner
Hi IIUC Lucene currently supports VectorSimilarityFunction.COSINE VectorSimilarityFunction.DOT_PRODUCT VectorSimilarityFunction.EUCLIDEAN whereas some embedding models have been trained with other metrics. Also see https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdi
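
To make the three metrics named above concrete, here are the plain textbook formulas in Java, plus Manhattan distance as one example of a metric outside that list. This is a sketch for illustration only: the class VectorSimilaritySketch and its method names are hypothetical, and Lucene's own VectorSimilarityFunction implementations may scale or transform these raw values into scores differently.

```java
final class VectorSimilaritySketch {

    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
    }

    /** Raw dot product: sum of a[i] * b[i]. */
    static double dotProduct(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Cosine of the angle between a and b; undefined for zero vectors. */
    static double cosine(float[] a, float[] b) {
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        return dotProduct(a, b) / norms;
    }

    /** Squared Euclidean (L2) distance: sum of (a[i] - b[i])^2. */
    static double squaredEuclidean(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Manhattan (L1) distance, an example of a metric not in the list above. */
    static double manhattan(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs((double) a[i] - b[i]);
        }
        return sum;
    }
}
```

Note that dot product and cosine grow with similarity while the two distances shrink, so a distance has to be inverted in some way before it can serve as a ranking score.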

Re: Question for SynonymQuery

2023-01-02 Thread Michael Wechner
r SynonymQuery; I have just used the standard QueryParser. Instead the synonym processing occurs in the indexing phase, which is not only simpler (one search pattern, one query), but also I think you would also find it gives you superior performance (because the synonym processing occurs once at indexing

Re: Question for SynonymQuery

2022-12-28 Thread Michael Wechner
Hi Anh The following Stack Overflow link might help https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene The following thread seems to confirm that escaping the space with a backslash does not help https://lists.apache.org/list?java-u

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Michael Wechner
is high. Uwe On 08.12.2022 at 11:44, Michael Wechner wrote: My code at the moment is as follows: Directory dir = FSDirectory.open(Paths.get(vectorIndexPath)); IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(vectorIndexPath))); int numberOfDocsBeforeDeleting

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Michael Wechner
reader to see deletes from the indexwriter. On 08.12.2022 at 10:32, Hrvoje Lončar wrote: Did you call this method before or after commit method? My wild guess would be that you can count deleted documents inside transaction only. On Thu, Dec 8, 2022 at 12:10 AM Michael Wechner wrote: Hi I

What exactly returns IndexReader.numDeletedDocs()

2022-12-07 Thread Michael Wechner
Hi I am using Lucene 9.4.2 vector search and everything seems to work fine, except that when I delete some documents from the index, then the method https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexReader.html#numDeletedDocs() always returns 0, whereas I would have expect

Re: The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner
I have enhanced the FAQ https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-Whatisthedefaultrelevance/similarityimplementationofLucene? Hope it is ok like this :-) Thanks Michael On 23.11.22 at 10:58, Michael Wechner wrote: cool, thanks! On 23.11.22 at 10:55,

Re: The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner
cool, thanks! On 23.11.22 at 10:55, Adrien Grand wrote: This is correct. See IndexSearcher#getDefaultSimilarity(). On Wed, Nov 23, 2022 at 10:53 AM Michael Wechner wrote: Hi The Lucene FAQ does not mention TF-IDF or BM25 and I would like to add some notes, but to be sure I

The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner
Hi The Lucene FAQ does not mention TF-IDF or BM25 and I would like to add some notes, but to be sure I don't write anything wrong, I would like to ask whether the current default similarity implementation of Lucene is really BM25, as described at https://opensourceconnec

Will ApacheCon North America 2022 sessions also be published on YouTube?

2022-10-16 Thread Michael Wechner
Hi I just noticed that the ApacheCon Asia 2022 sessions have been published on YouTube https://apachecon.com/ https://www.youtube.com/c/TheApacheFoundation/playlists Will this also happen for ApacheCon North America 2022? Thanks Michael

Re: Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner
l comparison, but every choice is a compromise. We've known for centuries that "Odyous of olde been comparisonis, And of comparisonis engendyrd is haterede." On Sat, Oct 1, 2022 at 7:18 AM Michael Wechner wrote: Hi Together I just read the following article, where the author compares L

Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner
Hi Together I just read the following article, where the author compares Lucene and Vespa re HNSW https://bergum.medium.com/will-new-vector-databases-dislodge-traditional-search-engines-b4fdb398fb43 What is your take on "comparing Lucene and Vespa re HNSW latency and recall"? Thanks Micha

Re: Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner
HNSW parameters? If so, there is no better way than what you are doing. On Sat, 1 Oct 2022 at 12:31, Michael Wechner wrote: Hi Adrien Thank you very much for your help! That was it :-) I completely forgot that I set this somewhere hidden inside my code. I made a note in the pom file, such that I s

Re: Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner
? Thanks Michael On 01.10.22 at 08:06, Adrien Grand wrote: I would guess that you are configuring your IndexWriterConfig with a "Lucene91Codec" instance. You need to replace it with a "Lucene94Codec" instance. On Sat, 1 Oct 2022 at 06:12, Michael Wechner wrote: Hi I hav

Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-09-30 Thread Michael Wechner
Hi I have just upgraded from 9.1.0 to 9.4.0 and compiling works fine, but when I run and re-index my data using KnnVectorField, then I receive the following exception: java.lang.UnsupportedOperationException: Old codecs may only be used for reading     at org.apache.lucene.backward_codecs.l

Re: [ANNOUNCE] Apache Lucene 9.4.0 released

2022-09-30 Thread Michael Wechner
great, thank you very much! Just in time for ApacheCon :-) On 01.10.22 at 00:09, Michael Sokolov wrote: The Lucene PMC is pleased to announce the release of Apache Lucene 9.4.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technolo

Re: How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
u can also pass a BooleanQuery with multiple terms or a combination of other queries, a numeric range,... or a fulltext query out of Lucene's query parsers. Uwe On 31.08.2022 at 22:19, Michael Wechner wrote: Hi Matt Thanks very much for your feedback! According to your links I will try C

Re: How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
BooleanQuery.Builder. As noted in TermInSetQuery ( https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java#L62) multiple terms could be represented as a boolean query with Occur.SHOULD. ~Matt On Wed, Aug 31, 2022 at 11:15 AM Michael Wechner wrote

How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
Hi I am currently filtering a KnnVectorQuery as follows Query filter = new TermQuery(new Term(CLASSIFICATION_FIELD, classification)); query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter); but it is not clear to me how I can filter for multiple terms. Should I subclass MultiTermQuery

Re: Multi-Value query test

2022-06-23 Thread Michael Wechner
Maybe I misunderstand the problem, but why don't you decouple showing the results from the results of the query? On 23.06.22 at 14:03, Patrick Bernardina wrote: How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to return all

Re: Auto-complete in Lucene

2022-05-25 Thread Michael Wechner
we are using AnalyzingInfixSuggester but I would also be curious to know whether this is the best way :-) Thanks Michael On 25.05.22 at 14:39, Anastasiya Tarasenko wrote: Hi All, I have a question regarding auto-complete functionality in Lucene. On the StackOverflow the suggestion regardin

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-23 Thread Michael Wechner
AM Michael Wechner Hi Julie I got it running and it seems to work fine so far :-) Re an example for the demo package, I guess this would go here https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html and I thought of something like

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-20 Thread Michael Wechner
rch(query, k); Does that make sense to you? Thanks Michael On 11.05.22 at 07:59, Michael Wechner wrote: Hi Julie Cool, thanks! I will try to apply it and, if it works, could create an example for the demo package. Will keep you posted :-) Thanks Michael On 11.05.22 at 02:13, Julie Tibshi

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-10 Thread Michael Wechner
et us know if you run into any questions/ issues while trying it out! Julie On Mon, May 9, 2022 at 8:08 AM Michael Wechner wrote: sorry for the URLs below. I have tested Twilio SendGrid as outgoing server and it just rewrote the URLs https://issues.apache.org/jira/browse/SOLR-15947 https://issues

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
-summary.html which I was not aware of, but disabled the tracking now and hope it will be ok now. Thanks Michael On 09.05.22 at 15:12, Michael Wechner wrote: Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found http://url7093

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2C

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great I have found https://issues.apache.org/jira/browse/SOLR-15947 https://issues.apache.org/jira/browse/LUCENE-10382 and https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/packa

Re: New user questions about demo, downloads, and IRC

2022-04-26 Thread Michael Wechner
great, thanks! On 26.04.22 at 21:48, Michael Sokolov wrote: thanks, I fixed the doc! On Tue, Apr 26, 2022 at 9:13 AM Bridger Dyson-Smith wrote: Hi Michael - On Mon, Apr 25, 2022 at 5:38 PM Michael Wechner wrote: Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0

Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner
PR Thanks Michael On 25.04.22 at 23:37, Michael Wechner wrote: Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz you should find modules/lucene-core-9.1.0.jar modules/lucene-queryparser-9.1.0.jar modules/lucene-analysis-common-9.1.0.jar modules/lucene-demo-9.

Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner
Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz you should find modules/lucene-core-9.1.0.jar modules/lucene-queryparser-9.1.0.jar modules/lucene-analysis-common-9.1.0.jar modules/lucene-demo-9.1.0.jar I guess the documentation is not quite right. Re your second

Re: Need help on defining custom scorer in Lucene 9

2022-04-03 Thread Michael Wechner
Hi Lokesh IIUC each document (like for example a shop description) has a longitude and a latitude associated with it. The user's search input consists of some keywords and the user's geo location. The keywords you use to search for the documents and the user's geo location you would like to use for

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-31 Thread Michael Wechner
for helping spread the word about Lucene's new vector search capabilities! On Thu, Mar 31, 2022 at 7:36 AM Michael Wechner wrote: ok :-) thanks! Anyway, if somebody would like to join re a "vector search" proposal, please let me know Michael On 30.03.22 at 20:13, An

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner
uld be reviewed independently and if there is another proposal that clashes, the abstract would help the program committee pick the one (or both) that's best suited for the audience. Good luck! -Anshum On Wed, Mar 30, 2022 at 5:47 AM Michael Wechner wrote: Hi Together I would be interested

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner
Hi Together I would be interested to submit a proposal/presentation re Lucene's vector search, but would like to ask first whether somebody else wants to do this as well or might be interested to do this together? Thanks Michael Am 30.03.22 um 14:16 schrieb Rich Bowen: [You are receiving

Re: Question about using Lucene to search source code

2021-12-20 Thread Michael Wechner
Hi Yuxin Can you provide a concrete example of a query and a document/code snippet? Thanks Michael Am 20.12.21 um 03:06 schrieb Yuxin Liu: Dear development community of Lucene: Hi from student research assistant Yuxin Liu. I'm using Lucene to build an index search for source code indexes usi

Re: Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
("contract search","",asList("a84581a3-302f-4b73-80d9-0e60da5238f9"),1)); entities.add(new Item("claims management system","",asList("a84581a3-302f-4b73-80d9-0e60da5238f9"),1)); suggester.build(new ItemIterator(entities.iterator())); ) I was

Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
Hi I recently started to use the Autosuggest/Autocomplete package as suggested by Robert https://www.mail-archive.com/java-user@lucene.apache.org/msg51403.html which works very well, thanks again for your help :-) But it is not clear to me what the best practices are for building a suggester us

Re: Search while typing (incremental search)

2021-10-27 Thread Michael Wechner
according to Lucene 8.10.1 suggest API If you know any simple, recent examples, please let me know Thanks Michael Am 08.10.21 um 21:40 schrieb Michael Wechner: Am 08.10.21 um 18:49 schrieb Michael Sokolov: Thank you for offering to add to the FAQ! Indeed it should mention the suggester

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Wechner
Michael On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner wrote: Thanks very much for your feedback! I will try it :-) As I wrote I would like to add a summary to the Lucene FAQ (https://cwiki.apache.org/confluence/display/lucene/lucenefaq) Would the following questions make sense

Re: Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
really want to just search on prefixes and jumble up the results (perhaps because you are gonna just sort by some custom document feature instead of relevance), then you can do that if you really want. You can use the n-gram/edge-ngram/shingle filters in the analysis package for that. On Wed, Oct 6, 2

Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
Hi I am trying to implement a search with Lucene similar to what for example various "Note Apps" (e.g. "Google Keep" or "Samsung Notes") are offering, that with every new letter typed a new search is being executed. For example when I type "tes", then all documents are being returned contain

Re: hello~~i have a question

2021-08-02 Thread Michael Wechner
I don't know either, though I searched a little and found various good explanations of what segments are, e.g. https://www.alibabacloud.com/blog/analysis-of-lucene---basic-concepts_594672 but not in which order the segments are being read. I am not sure where in the code the segments are being

Re: Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner
cool, thanks very much for your quick response and updating the FAQ! Am 17.06.21 um 10:28 schrieb Adrien Grand: Good catch Michael, removing from IndexReader has actually been removed a long time ago. I just edited the FAQ to correct this. On Thu, Jun 17, 2021 at 10:08 AM Michael Wechner

Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner
Hi According to the FAQ one can delete documents using the IndexReader https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIdeletedocumentsfromtheindex? but when I look at the javadoc of Lucene version 8_8_2 https://lucene.apache.org/core/8_8_2/core/org/apache/lucene/in

Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner
possible you *should* update because the 8.x index may not be able to be read by the eventual 10 release. On Thu, May 27, 2021 at 7:52 AM Michael Wechner wrote: I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0

Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner
I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0,doIhavetoreindex? Hope that makes sense, otherwise let me know and I can correct/update :-) Am 26.05.21 um 23:56 schrieb Michael Wechner: using lucene

Re: Lucene/Solr and BERT

2021-05-27 Thread Michael Wechner
t indexing, and searching, performance, you should generally index as large a number of documents as possible before flushing. -Mike On Wed, May 26, 2021 at 9:43 AM Michael Wechner wrote: Hi Alex Thank you very much for your feedback and the various insights! Am 26.05.21 um 04:41 schrieb Alex K:
