How to use setWriteLockTimeout(long) when write.lock already exists

2011-12-13 Thread Michael Wechner
Hi According to http://www.gossamer-threads.com/lists/lucene/java-dev/37421 one cannot override the default write lock timeout of 1000ms once a write.lock already exists (for example inside a multi-threaded web-application), because in order to use the method setWriteLockTimeout(long) one

Re: How to use setWriteLockTimeout(long) when write.lock already exists

2011-12-14 Thread Michael Wechner
On 13.12.11 19:36, Michael Wechner wrote: Hi According to http://www.gossamer-threads.com/lists/lucene/java-dev/37421 one cannot override the default write lock timeout of 1000ms once a write.lock already exists (for example inside a multi-threaded web-application), because in order to
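The excerpt above stops before the answer. In older Lucene versions, the write-lock timeout worked by polling for the lock until the timeout expired. Application code can approximate the same behaviour by retrying the call that fails while write.lock is held. The sketch below does that. `LockRetry` and `openWithTimeout` are hypothetical names. In recent Lucene versions the exception to retry on would be `LockObtainFailedException`, which the `IndexWriter` constructor throws when the lock is taken. In a multi-threaded web application, sharing a single `IndexWriter` (it is thread-safe) usually avoids the contention altogether.

```java
import java.util.concurrent.Callable;

final class LockRetry {
    /**
     * Calls {@code opener} until it succeeds or {@code timeoutMillis} has elapsed,
     * sleeping {@code pollMillis} between attempts. Exceptions of type {@code retryOn}
     * are treated as "lock still held"; any other exception is rethrown immediately.
     */
    static <T> T openWithTimeout(Callable<T> opener,
                                 Class<? extends Exception> retryOn,
                                 long timeoutMillis,
                                 long pollMillis) throws Exception {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                return opener.call();
            } catch (Exception e) {
                boolean timedOut = System.nanoTime() - deadline >= 0;
                if (!retryOn.isInstance(e) || timedOut) {
                    throw e;  // not a lock problem, or we have waited long enough
                }
                Thread.sleep(pollMillis);
            }
        }
    }
}
```

Usage would look like `LockRetry.openWithTimeout(() -> new IndexWriter(dir, new IndexWriterConfig(analyzer)), LockObtainFailedException.class, 5000, 100)`. The config is created inside the lambda because an `IndexWriterConfig` instance must not be shared by several writers.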

Re: is it possible to index wiki markup files?

2012-01-11 Thread Michael Wechner
Maybe Tika is also of help to you http://tika.apache.org/ HTH Michael On 11.01.12 20:13, Reyna Melara wrote: Hi, my name is Reyna Melara I'm a PhD student from Mexico, and I have a set of 11,051,447 files with txt extension but the content of each file is in fact in wiki format, I want and

Re: Storing Documents in Lucene

2013-03-29 Thread Michael Wechner
You might also like to consider Jackrabbit: http://jackrabbit.apache.org/ or Yarep: https://github.com/wyona/yarep which both use Lucene for indexing, but hide the actual data storage behind an abstraction layer that is configurable/customizable. HTH Michael On 29.03.13 02:24,

Re: Is it possible to update only selected fields in a document ?

2011-03-22 Thread Michael Wechner
On 3/22/11 8:40 AM, shrinath.m wrote: On Tue, Mar 22, 2011 at 12:39 PM, Anshum-2 [via Lucene]< ml-node+2713899-1210341880-376...@n3.nabble.com> wrote: No, as of now there's no way to do so. Thank you Anshum-2, how do you propose I do this? I have thought of a way like this: - first get the

Re: Is it possible to update only selected fields in a document ?

2011-03-22 Thread Michael Wechner
On 3/22/11 10:09 AM, shrinath.m wrote: On Tue, Mar 22, 2011 at 1:37 PM, Michael Wechner [via Lucene]< ml-node+2714008-984126374-376...@n3.nabble.com> wrote: are you looking for something like http://hrycan.com/2009/11/26/updating-document-fields-in-lucene/ ? Precisely that. I am O
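Lucene cannot modify an indexed text field in place. The usual workaround is to read the stored fields of the existing document, change one of them, and re-index the whole document with `IndexWriter.updateDocument`. Below is a minimal sketch for Lucene 9.5+, which is needed for `IndexReader.storedFields()`. It assumes a unique `id` StringField and that every other field is a stored full-text field. `FieldUpdate` and `replaceField` are hypothetical names. Fields that were not stored cannot be recovered this way. For doc-values fields, `IndexWriter.updateNumericDocValue` and `updateBinaryDocValue` can update in place.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

final class FieldUpdate {
    /**
     * Replaces one field of the document whose "id" equals {@code id}.
     * Assumes all fields were stored and all non-id fields are full-text fields.
     * Returns false if no document with that id exists.
     */
    static boolean replaceField(IndexWriter writer, String id, String field, String newValue) throws Exception {
        Term idTerm = new Term("id", id);
        Document old;
        try (DirectoryReader reader = DirectoryReader.open(writer)) {  // NRT reader also sees uncommitted docs
            TopDocs hits = new IndexSearcher(reader).search(new TermQuery(idTerm), 1);
            if (hits.scoreDocs.length == 0) {
                return false;
            }
            old = reader.storedFields().document(hits.scoreDocs[0].doc);
        }
        Document updated = new Document();
        updated.add(new StringField("id", id, Field.Store.YES));
        for (IndexableField f : old.getFields()) {
            if (!f.name().equals("id") && !f.name().equals(field)) {
                updated.add(new TextField(f.name(), f.stringValue(), Field.Store.YES));  // copy unchanged fields
            }
        }
        updated.add(new TextField(field, newValue, Field.Store.YES));
        writer.updateDocument(idTerm, updated);  // delete + add, atomic as seen by readers
        return true;
    }
}
```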

Re: Lucene Simple Project

2011-06-19 Thread Michael Wechner
On 18.06.11 19:05, Steven A Rowe wrote: Hi Hamada, Do you know about the Lucene demo?: http://lucene.apache.org/java/3_2_0/demo.html Also, you might want to use http://code.google.com/p/luke/ to view your search index and check which fields it actually contains HTH Michael Ste

Compiling and running Lucene/Solr based on github does not seem to work

2014-12-04 Thread Michael Wechner
Hi Yesterday I cloned the GitHub version of Lucene/Solr https://github.com/apache/lucene-solr and ran ant compile and ant test successfully. Jetty also seems to start up fine, but when I access http://localhost:8983/solr/ I receive HTTP ERROR: 503 Problem accessing /solr

Re: Compiling and running Lucene/Solr based on github does not seem to work

2014-12-05 Thread Michael Wechner
Thanks very much for your help. I will use the Solr mailing list for future Solr-related questions. After running ant example and ant run-example inside the solr folder, I was able to access http://localhost:8983/solr without a problem. I think it would make sense to change the main README and th

Lucene FAQ as CSV to train DeepPavlov

2019-12-26 Thread Michael Wechner
Hi I would like to train "DeepPavlov FAQ" http://docs.deeppavlov.ai/en/master/features/skills/faq.html https://colab.research.google.com/github/deepmipt/dp_notebooks/blob/master/DP_autoFAQ.ipynb https://medium.com/deeppavlov/simple-intent-recognition-and-question-answering-with-deeppavlov-c54ccf

Re: Use Case clarification

2021-04-05 Thread Michael Wechner
Hi The following FAQ might be a bit outdated, but nevertheless you should find some answers there as well https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ For example to answer your question 4) see https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ#LuceneFAQ-CanIuseLuce

Re: Use Case clarification

2021-04-05 Thread Michael Wechner
ngine as a personal project . On Mon, 5 Apr 2021, 10:57 Michael Wechner, wrote: Hi The following FAQ might be a bit outdated, but nevertheless you should find some answers there as well https://cwiki.apache.org/confluence/display/lucene/LuceneFAQ For example to answer your question 4) s

Lucene/Solr and BERT

2021-04-21 Thread Michael Wechner
Hi I recently found the following articles re Lucene/Solr and BERT https://dmitry-kan.medium.com/neural-search-with-bert-and-solr-ea5ead060b28 https://medium.com/swlh/fun-with-apache-lucene-and-bert-embeddings-c2c496baa559 and would like to ask whether there might be more recent developments w

Re: Lucene/Solr and BERT

2021-04-21 Thread Michael Wechner
e are some test suites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: Hi I recently found the following articles re Lucen

Re: Negation search help

2021-04-28 Thread Michael Wechner
Hi Amitesh I don't have statistical proof, but I think that on mailing lists run by volunteers it doesn't help to write "I badly need some help", because it seems to me the opposite will happen, namely that people will not help at all. I think there are various reasons for this behaviour, which is inter

Re: Negation search help

2021-04-28 Thread Michael Wechner
Hi Amitesh Thanks for the more concrete examples. Unfortunately I do not know how to solve this better with Lucene itself in a more general context, but did you ever consider using BERT in combination with Lucene/Solr https://blog.google/products/search/search-language-understanding-bert/ ht

Re: Negation search help

2021-04-29 Thread Michael Wechner
Yes, it would be great if you could share code snippets. Maybe it will help others or maybe someone will have a suggestion to improve or an alternative. All the best Michael On 29.04.21 at 14:35, amitesh116 wrote: Thank you Michael! I solved this requirement by setting the tokenStream at t

Re: Lucene/Solr and BERT

2021-05-19 Thread Michael Wechner
uites that index and search Glove vectors. My first impression was that indexing seems surprisingly slow, but it's entirely possible I'm doing something wrong. On Wed, Apr 21, 2021 at 9:31 AM Michael Wechner wrote: Hi I recently found the following articles re Lucene/Solr and BERT

Re: Lucene/Solr and BERT

2021-05-23 Thread Michael Wechner
sure the VectorFormat API (might still get renamed due to confusion with other kinds of vectors existing in Lucene) can support alternative KNN implementations. On Wed, May 19, 2021 at 12:22 PM Michael Wechner wrote: Hi Alex Just to make sure I understand better what the additions are about Am

Re: Lucene/Solr and BERT

2021-05-24 Thread Michael Wechner
ds of vectors existing in Lucene) can support alternative KNN implementations. On Wed, May 19, 2021 at 12:22 PM Michael Wechner wrote: Hi Alex Just to make sure I understand better what the additions are about On 21.04.21 at 17:21, Alex K wrote: There were a couple additions recently merge

Index backwards compatibility

2021-05-26 Thread Michael Wechner
Hi I am using Lucene 8.8.2 in production and I am currently doing some tests using 9.0.0-SNAPSHOT, for which I have included lucene-backward-codecs, because the log files asked whether I had forgotten to include lucene-backward-codecs.jar org.apache.lucene

Re: Lucene/Solr and BERT

2021-05-26 Thread Michael Wechner
Hi Alex Thank you very much for your feedback and the various insights! On 26.05.21 at 04:41, Alex K wrote: Hi Michael and others, Sorry just now getting back to you. For your three original questions: - Yes, I was referring to the Lucene90Hnsw* classes. Michael S. had a thorough response. -

Re: Index backwards compatibility

2021-05-26 Thread Michael Wechner
: I think you need backward-codecs-9.0.0-SNAPSHOT there. It enables 9.0 to read 8.x indexes. On Wed, May 26, 2021 at 9:27 AM Michael Wechner wrote: Hi I am using Lucene 8.8.2 in production and I am currently doing some tests using 9.0.0-SNAPSHOT, whereas I have included lucene-backward-codecs

Re: Lucene/Solr and BERT

2021-05-27 Thread Michael Wechner
t indexing, and searching, performance, you should generally index as large a number of documents as possible before flushing. -Mike On Wed, May 26, 2021 at 9:43 AM Michael Wechner wrote: Hi Alex Thank you very much for your feedback and the various insights! On 26.05.21 at 04:41, Alex K wrote:

Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner
I have added a Q&A https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0,doIhavetoreindex? Hope that makes sense, otherwise let me know and I can correct/update :-) On 26.05.21 at 23:56, Michael Wechner wrote: using lucene

Re: Index backwards compatibility

2021-05-27 Thread Michael Wechner
possible you *should* update because the 8.x index may not be able to be read by the eventual 10 release. On Thu, May 27, 2021 at 7:52 AM Michael Wechner wrote: I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-WhenIupradeLucene,forexamplefrom8.8.2to9.0.0

Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner
Hi According to the FAQ one can delete documents using the IndexReader https://cwiki.apache.org/confluence/display/lucene/lucenefaq#LuceneFAQ-HowdoIdeletedocumentsfromtheindex? but when I look at the javadoc of Lucene version 8_8_2 https://lucene.apache.org/core/8_8_2/core/org/apache/lucene/in

Re: Is deleting with IndexReader still possible?

2021-06-17 Thread Michael Wechner
Cool, thanks very much for your quick response and for updating the FAQ! On 17.06.21 at 10:28, Adrien Grand wrote: Good catch Michael, removing from IndexReader has actually been removed a long time ago. I just edited the FAQ to correct this. On Thu, Jun 17, 2021 at 10:08 AM Michael Wechner

Re: hello~~i have a question

2021-08-02 Thread Michael Wechner
I don't know either, but I searched a little and found various good explanations of what segments are, e.g. https://www.alibabacloud.com/blog/analysis-of-lucene---basic-concepts_594672 but nothing on the order in which the segments are read. I am not sure where in the code the segments are being

Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
Hi I am trying to implement a search with Lucene similar to what various "Note Apps" (e.g. "Google Keep" or "Samsung Notes") offer, where a new search is executed with every new letter typed. For example when I type "tes", then all documents are being returned contain

Re: Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
really want to just search on prefixes and jumble up the results (perhaps because you are gonna just sort by some custom document feature instead of relevance), then you can do that if you really want. You can use the n-gram/edge-ngram/shingle filters in the analysis package for that. On Wed, Oct 6, 2
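The edge n-gram approach mentioned above works like this. At index time every token is also indexed under its leading prefixes, which is roughly what Lucene's EdgeNGramTokenFilter does inside an analyzer. Each keystroke then becomes a plain term lookup instead of a prefix query. The class below is a hypothetical pure-Java toy, not Lucene code. It lowercases text, splits on non-word characters, and keeps every prefix of at least `minGram` characters in an in-memory map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class EdgeNGramIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final int minGram;

    EdgeNGramIndex(int minGram) {
        this.minGram = minGram;
    }

    /** All leading prefixes of {@code token} with length >= minGram, shortest first. */
    static List<String> edgeNGrams(String token, int minGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= token.length(); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    /** Index time: store the document id under every edge n-gram of every token. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            for (String gram : edgeNGrams(token, minGram)) {  // empty tokens yield no grams
                postings.computeIfAbsent(gram, g -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Query time: the partially typed word is looked up as an exact term. */
    Set<Integer> search(String typed) {
        return postings.getOrDefault(typed.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}
```

The trade-off is a larger index. With Lucene one would typically cap the gram length (the filter's maxGram) and use a non-n-gram analyzer at query time.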

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Wechner
Michael On Thu, Oct 7, 2021 at 2:30 AM Michael Wechner wrote: Thanks very much for your feedback! I will try it :-) As I wrote I would like to add a summary to the Lucene FAQ (https://cwiki.apache.org/confluence/display/lucene/lucenefaq) Would the following questions make sense

Re: Search while typing (incremental search)

2021-10-27 Thread Michael Wechner
according to Lucene 8.10.1 suggest API If you know any simple, recent examples, please let me know Thanks Michael On 08.10.21 at 21:40, Michael Wechner wrote: On 08.10.21 at 18:49, Michael Sokolov wrote: Thank you for offering to add to the FAQ! Indeed it should mention the suggester

Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
Hi I recently started to use the Autosuggest/Autocomplete package as suggested by Robert https://www.mail-archive.com/java-user@lucene.apache.org/msg51403.html which works very well, thanks again for your help :-) But it is not clear to me what the best practices are for building a suggester us

Re: Autosuggest/Autocomplete: What are the best practices to build Suggester?

2021-11-18 Thread Michael Wechner
("contract search","",asList("a84581a3-302f-4b73-80d9-0e60da5238f9"),1)); entities.add(new Item("claims management system","",asList("a84581a3-302f-4b73-80d9-0e60da5238f9"),1)); suggester.build(new ItemIterator(entities.iterator())); ) I was

Re: Question about using Lucene to search source code

2021-12-20 Thread Michael Wechner
Hi Yuxin Can you provide a concrete example of a query and a document/code snippet? Thanks Michael On 20.12.21 at 03:06, Yuxin Liu wrote: Dear development community of Lucene: Hi from student research assistant Yuxin Liu. I'm using Lucene to build an index search for source code indexes usi

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner
Hi all I would be interested in submitting a proposal/presentation on Lucene's vector search, but would like to ask first whether somebody else wants to do this as well or might be interested in doing it together? Thanks Michael On 30.03.22 at 14:16, Rich Bowen wrote: [You are receiving

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Michael Wechner
uld be reviewed independently and if there is another proposal that clashes, the abstract would help the program committee pick the one (or both) that's best suited for the audience. Good luck! -Anshum On Wed, Mar 30, 2022 at 5:47 AM Michael Wechner wrote: Hi all I would be interested

Re: Call for Presentations now open, ApacheCon North America 2022

2022-03-31 Thread Michael Wechner
for helping spread the word about Lucene's new vector search capabilities! On Thu, Mar 31, 2022 at 7:36 AM Michael Wechner wrote: ok :-) thanks! Anyway, if somebody would like to join a "vector search" proposal, please let me know Michael On 30.03.22 at 20:13, An

Re: Need help on defining custom scorer in Lucene 9

2022-04-03 Thread Michael Wechner
Hi Lokesh IIUC each document (like for example a shop description) has a longitude and a latitude associated with it. The user's search input consists of some keywords and the user's geo location. You use the keywords to search for the documents, and the user's geo location you would like to use for
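The idea described above combines a keyword score with a boost that decays with distance from the user. It can be sketched without Lucene. Lucene ships `LatLonPoint.newDistanceFeatureQuery(field, weight, originLat, originLon, pivotDistanceMeters)`. As far as I know, it scores documents by `weight * pivot / (pivot + distance)` and can be added as a SHOULD clause next to the keyword query. The helper below (`GeoBoost`, a hypothetical name) reproduces that shape with a haversine distance, so the numbers can be checked by hand.

```java
final class GeoBoost {
    private static final double EARTH_MEAN_RADIUS_METERS = 6_371_008.7714;

    /** Great-circle distance between two lat/lon points given in degrees (haversine formula). */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_METERS * Math.asin(Math.sqrt(h));
    }

    /** Equals weight at distance 0, weight / 2 at the pivot distance, and tends to 0 far away. */
    static double distanceBoost(double distanceMeters, double weight, double pivotMeters) {
        return weight * pivotMeters / (pivotMeters + distanceMeters);
    }

    /** Keyword relevance (e.g. BM25) plus the distance boost, as two SHOULD clauses would sum. */
    static double combinedScore(double keywordScore, double distanceMeters, double weight, double pivotMeters) {
        return keywordScore + distanceBoost(distanceMeters, weight, pivotMeters);
    }
}
```

The pivot controls how fast the boost decays. The weight controls how much location can outweigh keyword relevance.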

Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner
Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz you should find modules/lucene-core-9.1.0.jar modules/lucene-queryparser-9.1.0.jar modules/lucene-analysis-common-9.1.0.jar modules/lucene-demo-9.1.0.jar I guess the documentation is not quite right. Re your second

Re: New user questions about demo, downloads, and IRC

2022-04-25 Thread Michael Wechner
PR Thanks Michael On 25.04.22 at 23:37, Michael Wechner wrote: Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0.tgz you should find modules/lucene-core-9.1.0.jar modules/lucene-queryparser-9.1.0.jar modules/lucene-analysis-common-9.1.0.jar modules/lucene-demo-9.

Re: New user questions about demo, downloads, and IRC

2022-04-26 Thread Michael Wechner
Great, thanks! On 26.04.22 at 21:48, Michael Sokolov wrote: thanks, I fixed the doc! On Tue, Apr 26, 2022 at 9:13 AM Bridger Dyson-Smith wrote: Hi Michael - On Mon, Apr 25, 2022 at 5:38 PM Michael Wechner wrote: Hi Bridger Inside https://dlcdn.apache.org/lucene/java/9.1.0/lucene-9.1.0

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great I have found https://issues.apache.org/jira/browse/SOLR-15947 https://issues.apache.org/jira/browse/LUCENE-10382 and https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/packa

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA-2BOn91WS4mEQPCWI9gZNzEZlJPmWPGP2C

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
-summary.html which I was not aware of, but I have disabled the tracking now and hope it will be ok now. Thanks Michael On 09.05.22 at 15:12, Michael Wechner wrote: Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found http://url7093

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-10 Thread Michael Wechner
et us know if you run into any questions/ issues while trying it out! Julie On Mon, May 9, 2022 at 8:08 AM Michael Wechner wrote: sorry for the URLs below. I have tested Twilio SendGrid as outgoing server and it just rewrote the URLs https://issues.apache.org/jira/browse/SOLR-15947 https://issues

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-20 Thread Michael Wechner
rch(query, k); Does that make sense to you? Thanks Michael On 11.05.22 at 07:59, Michael Wechner wrote: Hi Julie Cool, thanks! I will try to apply it and, if it works, could add an example to the demo package. Will keep you posted :-) Thanks Michael On 11.05.22 at 02:13, Julie Tibshi

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-23 Thread Michael Wechner
AM Michael Wechner Hi Julie I got it running and it seems to work fine so far :-) Re an example for the demo package, I guess this would go here https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn/package-summary.html and I thought of something like

Re: Auto-complete in Lucene

2022-05-25 Thread Michael Wechner
We are using AnalyzingInfixSuggester, but I would also be curious to know whether this is the best way :-) Thanks Michael On 25.05.22 at 14:39, Anastasiya Tarasenko wrote: Hi All, I have a question regarding auto-complete functionality in Lucene. On the StackOverflow the suggestion regardin

Re: Multi-Value query test

2022-06-23 Thread Michael Wechner
Maybe I misunderstand the problem, but why don't you decouple showing the results from the results of the query? On 23.06.22 at 14:03, Patrick Bernardina wrote: How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to return all

How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
Hi I am currently filtering a KnnVectorQuery as follows Query filter = new TermQuery(new Term(CLASSIFICATION_FIELD, classification)); query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, filter); but it is not clear to me how I can filter for multiple terms. Should I subclass MultiTermQuery

Re: How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
BooleanQuery.Builder. As noted in TermInSetQuery ( https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TermInSetQuery.java#L62) multiple terms could be represented as a boolean query with Occur.SHOULD. ~Matt On Wed, Aug 31, 2022 at 11:15 AM Michael Wechner wrote

Re: How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Michael Wechner
u can also pass a BooleanQuery with multiple terms or a combination of other queries, a numeric range,... or a fulltext query out of Lucene's query parsers. Uwe On 31.08.2022 at 22:19, Michael Wechner wrote: Hi Matt Thanks very much for your feedback! According to your links I will try C
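Following the suggestions above, a filter that accepts any of several classification values can be built as a BooleanQuery of SHOULD TermQuery clauses. For long value lists, a `TermInSetQuery` is an alternative. The filter is then passed as the last argument of the KNN query. Below is a minimal sketch for Lucene 9.6+, where `KnnFloatVectorQuery` replaced `KnnVectorQuery`. The class, method and field names are placeholders.

```java
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

final class FilteredKnn {
    /** Matches documents whose {@code field} equals ANY of {@code values} (OR semantics). */
    static Query anyOf(String field, List<String> values) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String value : values) {
            builder.add(new TermQuery(new Term(field, value)), BooleanClause.Occur.SHOULD);
        }
        return builder.build();
    }

    /** k nearest neighbours of {@code queryVector}, restricted to documents matching the filter. */
    static Query knnWithClassifications(String vectorField, float[] queryVector, int k,
                                        String classificationField, List<String> classifications) {
        Query filter = anyOf(classificationField, classifications);
        return new KnnFloatVectorQuery(vectorField, queryVector, k, filter);
    }
}
```

A BooleanQuery made only of SHOULD clauses matches documents that satisfy at least one clause. The number of clauses is limited (1024 by default), which is one reason to prefer TermInSetQuery for long lists.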

Re: [ANNOUNCE] Apache Lucene 9.4.0 released

2022-09-30 Thread Michael Wechner
Great, thank you very much! Just in time for ApacheCon :-) On 01.10.22 at 00:09, Michael Sokolov wrote: The Lucene PMC is pleased to announce the release of Apache Lucene 9.4.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technolo

Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-09-30 Thread Michael Wechner
Hi I have just upgraded from 9.1.0 to 9.4.0 and compiling works fine, but when I run and re-index my data using KnnVectorField, then I receive the following exception: java.lang.UnsupportedOperationException: Old codecs may only be used for reading     at org.apache.lucene.backward_codecs.l

Re: Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner
? Thanks Michael On 01.10.22 at 08:06, Adrien Grand wrote: I would guess that you are configuring your IndexWriterConfig with a "Lucene91Codec" instance. You need to replace it with a "Lucene94Codec" instance. On Sat, 1 Oct 2022, 06:12, Michael Wechner wrote: Hi I hav

Re: Upgrading from 9.1.0. to 9.4.0: Old codecs may only be used for reading Lucene91HnswVectorsFormat.java

2022-10-01 Thread Michael Wechner
HNSW parameters? If so, there is no better way than what you are doing. On Sat, 1 Oct 2022, 12:31, Michael Wechner wrote: Hi Adrien Thank you very much for your help! That was it :-) I completely forgot that I set this somewhere hidden inside my code. I made a note in the pom file, such that I s

Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner
Hi all I just read the following article, where the author compares Lucene and Vespa re HNSW https://bergum.medium.com/will-new-vector-databases-dislodge-traditional-search-engines-b4fdb398fb43 What is your take on "comparing Lucene and Vespa re HNSW latency and recall"? Thanks Micha

Re: Latency and recall re HNSW: Lucene versus Vespa

2022-10-01 Thread Michael Wechner
l comparison, but every choice is a compromise. We've known for centuries that "Odyous of olde been comparisonis, And of comparisonis engendyrd is haterede." On Sat, Oct 1, 2022 at 7:18 AM Michael Wechner wrote: Hi Together I just read the following article, where the author compares L

Will ApacheCon North America 2022 sessions also be published on YouTube?

2022-10-16 Thread Michael Wechner
Hi I just noticed that the ApacheCon Asia 2022 have been published on YouTube https://apachecon.com/ https://www.youtube.com/c/TheApacheFoundation/playlists Will this also happen for ApacheCon North America 2022? Thanks Michael

The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner
Hi The Lucene FAQ does not mention tf-idf or BM25 and I would like to add some notes, but to be sure I don't write anything wrong I would like to confirm that the current default similarity implementation of Lucene is BM25, as described at https://opensourceconnec

Re: The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner
Cool, thanks! On 23.11.22 at 10:55, Adrien Grand wrote: This is correct. See IndexSearcher#getDefaultSimilarity(). On Wed, Nov 23, 2022 at 10:53 AM Michael Wechner wrote: Hi On the Lucene FAQ there is no mentioning re tf-idf or bm25 and I would like to add some notes, but to be sure I

Re: The current default similarity implementation of Lucene is BM25, right?

2022-11-23 Thread Michael Wechner
I have enhanced the FAQ https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-Whatisthedefaultrelevance/similarityimplementationofLucene? Hope it is ok like this :-) Thanks Michael On 23.11.22 at 10:58, Michael Wechner wrote: cool, thanks! On 23.11.22 at 10:55,
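Since the FAQ entry above is about BM25, a short illustration follows. Lucene's `BM25Similarity` defaults to k1 = 1.2 and b = 0.75. The function below writes out the textbook per-term BM25 formula with Lucene's idf, so the effect of term frequency and document length is easy to see. Treat it as an illustration, not Lucene's exact scorer. As far as I know, Lucene (since 8.0) drops the constant (k1 + 1) factor, which does not change the ranking, and it stores document lengths in a lossy encoding.

```java
final class Bm25 {
    static final double K1 = 1.2;  // term-frequency saturation
    static final double B = 0.75;  // strength of document-length normalization

    /** idf = ln(1 + (N - n + 0.5) / (n + 0.5)), N = docs with the field, n = docs containing the term. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Contribution of one query term to one document's score (textbook BM25). */
    static double termScore(double termFreq, double docLength, double avgDocLength,
                            long docCount, long docFreq) {
        double lengthNorm = K1 * (1 - B + B * docLength / avgDocLength);
        return idf(docCount, docFreq) * termFreq * (K1 + 1) / (termFreq + lengthNorm);
    }
}
```

Rare terms get a higher idf. Repeated occurrences raise the score with diminishing returns. Documents longer than average are penalized in proportion to b.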

What exactly returns IndexReader.numDeletedDocs()

2022-12-07 Thread Michael Wechner
Hi I am using Lucene 9.4.2 vector search and everything seems to work fine, except that when I delete some documents from the index, then the method https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexReader.html#numDeletedDocs() always returns 0, whereas I would have expect

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Michael Wechner
reader to see deletes from the indexwriter. On 08.12.2022 at 10:32, Hrvoje Lončar wrote: Did you call this method before or after commit method? My wild guess would be that you can count deleted documents inside transaction only. On Thu, Dec 8, 2022 at 12:10 AM Michael Wechner wrote: Hi I

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Michael Wechner
is high. Uwe On 08.12.2022 at 11:44, Michael Wechner wrote: My code at the moment is as follows: Directory dir = FSDirectory.open(Paths.get(vectorIndexPath)); IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(vectorIndexPath))); int numberOfDocsBeforeDeleting
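The reply above points at the usual cause. An `IndexReader` is a point-in-time snapshot, so a reader opened before the deletes keeps reporting 0. It only sees them once it is reopened, or when an NRT reader is obtained from the writer with `DirectoryReader.open(writer)`. Below is a minimal sketch using an in-memory `ByteBuffersDirectory`. `DeletedDocsDemo` is a hypothetical name. `NoMergePolicy` keeps the example deterministic: a merge can also reclaim deleted documents, in which case `numDeletedDocs()` legitimately drops back to 0.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

final class DeletedDocsDemo {
    /** Returns {deleted docs seen by the old reader, deleted docs seen after reopening}. */
    static int[] run(int numDocs) throws Exception {
        IndexWriterConfig config = new IndexWriterConfig().setMergePolicy(NoMergePolicy.INSTANCE);
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, config)) {
            for (int i = 0; i < numDocs; i++) {
                Document doc = new Document();
                doc.add(new StringField("id", Integer.toString(i), Field.Store.NO));
                writer.addDocument(doc);
            }
            writer.commit();

            try (DirectoryReader oldReader = DirectoryReader.open(dir)) {  // snapshot taken before deleting
                writer.deleteDocuments(new Term("id", "0"));
                writer.commit();

                int seenByOldReader = oldReader.numDeletedDocs();  // still 0: the snapshot is frozen
                // openIfChanged returns a new reader (non-null here, since the commit changed the index)
                try (DirectoryReader newReader = DirectoryReader.openIfChanged(oldReader)) {
                    return new int[] {seenByOldReader, newReader.numDeletedDocs()};
                }
            }
        }
    }
}
```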

Re: Question for SynonymQuery

2022-12-28 Thread Michael Wechner
Hi Anh The following Stackoverflow link might help https://stackoverflow.com/questions/73240494/can-someone-assist-me-with-a-multi-word-synonym-problem-in-lucene The following thread seems to confirm, that escaping the space with a backslash does not help https://lists.apache.org/list?java-u

Re: Question for SynonymQuery

2023-01-02 Thread Michael Wechner
r SynonymQuery; I have just used the standard QueryParser. Instead the synonym processing occurs in the indexing phase, which is not only simpler (one search pattern, one query), but also I think you would also find it gives you superior performance (because the synonym processing occurs once at indexing

Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-14 Thread Michael Wechner
Hi IIUC Lucene currently supports VectorSimilarityFunction.COSINE VectorSimilarityFunction.DOT_PRODUCT VectorSimilarityFunction.EUCLIDEAN whereas some embedding models have been trained with other metrics. Also see https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdi

Re: Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-14 Thread Michael Wechner
vectors format that ignores the vector similarity configured on the field and uses its own. On Sat, 14 Jan 2023, 21:33, Michael Wechner wrote: Hi IIUC Lucene currently supports VectorSimilarityFunction.COSINE VectorSimilarityFunction.DOT_PRODUCT VectorSimilarityFunction.EUCLIDEAN whereas s

Re: Other vector similarity metric than provided by VectorSimilarityFunction

2023-01-15 Thread Michael Wechner
aybe it is easier to just contribute another metric as part of the source, than make it configurable dynamically with a custom implementation. Thanks Michael On Sat, Jan 14, 2023 at 6:04 PM Michael Wechner wrote: Hi Adrien Thanks for your feedback! Whereas I am not sure I fully understand wha
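To make the discussion above concrete, the sketch below computes the raw quantities behind COSINE, DOT_PRODUCT and EUCLIDEAN. It adds one metric with no built-in equivalent, the L1 ("cityblock" in scipy's cdist) distance, as an example of what a custom vectors format would have to compute. As far as I know, Lucene additionally maps these raw values to non-negative scores, for example 1 / (1 + squared distance) for EUCLIDEAN. Lucene's DOT_PRODUCT expects unit-length vectors, and for those the dot product equals the cosine, which `normalize` demonstrates. `VectorMetrics` is a hypothetical helper, not Lucene code.

```java
final class VectorMetrics {
    private static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    static double dot(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (double) a[i] * b[i];
        return sum;
    }

    static double cosine(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    static double squaredEuclidean(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** L1 / Manhattan distance: an example of a metric not covered by VectorSimilarityFunction. */
    static double manhattan(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    /** Scales a non-zero vector to unit length, as DOT_PRODUCT expects. */
    static float[] normalize(float[] v) {
        double norm = Math.sqrt(dot(v, v));
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) out[i] = (float) (v[i] / norm);
        return out;
    }
}
```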

Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-10 Thread Michael Wechner
Hi I use the vector search of Lucene, with embeddings from SentenceBERT, for example. According to https://www.sbert.net/examples/applications/retrieve_rerank/README.html re-ranking with a cross-encoder after the vector search (bi-encoding) can improve the ranking. Would it

Re: Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-11 Thread Michael Wechner
em with your vectors, very fast, only 500 calculations required, no HNSW or anything needed. Of course you could use a vector search instead of a BM25 search as the initial search to pull the top 500 hits too. So it could meet both use-cases and provide a really performant option for users that want
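The retrieve-then-rerank pattern discussed above does not depend on how the first stage runs (BM25 or vector search). Take the top candidates, rescore each (query, text) pair with a cross-encoder, and sort by the new score. In the sketch below, the cross-encoder is modelled as a hypothetical `ToDoubleBiFunction<String, String>`. In practice it would be a call to a model server or an ONNX runtime, outside Lucene. `Rerank` and `Hit` are illustrative names, and the record requires Java 16+. Lucene's own `Rescorer` / `QueryRescorer` follow the same two-pass idea when the second pass is another Lucene query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class Rerank {
    /** A first-stage hit: document id, the text the cross-encoder will read, and a score. */
    record Hit(String id, String text, double score) {}

    /**
     * Takes the best {@code candidates} hits of the first stage, rescores each
     * (query, text) pair with {@code crossEncoder} and returns the top {@code k}.
     */
    static List<Hit> rerank(String query, List<Hit> firstStage, int candidates, int k,
                            ToDoubleBiFunction<String, String> crossEncoder) {
        List<Hit> pool = new ArrayList<>(firstStage);
        pool.sort(Comparator.comparingDouble(Hit::score).reversed());

        List<Hit> rescored = new ArrayList<>();
        for (Hit hit : pool.subList(0, Math.min(candidates, pool.size()))) {
            double crossScore = crossEncoder.applyAsDouble(query, hit.text());
            rescored.add(new Hit(hit.id(), hit.text(), crossScore));
        }
        rescored.sort(Comparator.comparingDouble(Hit::score).reversed());
        return new ArrayList<>(rescored.subList(0, Math.min(k, rescored.size())));
    }
}
```

The cross-encoder is usually far more expensive per document than the first stage, which is why only a few hundred candidates are rescored.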

Re: Vector Search on Lucene

2023-03-02 Thread Michael Wechner
Hi Marcos The indexing looks something like this: Document doc = new Document(); float[] vector = getEmbedding(text); FieldType vectorFieldType = KnnVectorField.createFieldType(vector.length, VectorSimilarityFunction.COSINE); KnnVectorField vectorField = new KnnVectorField("my_vector_field", vector, vectorFi

Re: [ANNOUNCE] Apache Lucene 9.6.0 released

2023-05-11 Thread Michael Wechner
Thank you very much for the release! Works very well so far :-) All the best Michael On 10.05.23 at 09:49, Alan Woodward wrote: The Lucene PMC is pleased to announce the release of Apache Lucene 9.6.0. Apache Lucene is a high-performance, full-featured search engine library written entirely

Top docs depend on value of K nearest neighbour

2023-08-02 Thread Michael Wechner
Hi I use Lucene 9.7.0 but experienced the same behaviour with Lucene 9.6.0 when doing vector search as follows: I have indexed about 200 vectors (dimension 768) I build the query as follows  Query query = new KnnFloatVectorQuery("vector-field-name", queryVector, k); and do the search as f

Re: Top docs depend on value of K nearest neighbour

2023-08-04 Thread Michael Wechner
" KNN and can get caught in local minima (maxima?). Increasing K has, indirectly, the effect of expanding the search space because the minimum score in the priority score (score of the Kth item) is used as a threshold for deciding when to terminate the search On Wed, Aug 2, 2023 at 5:19 PM Michael We

Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner
Hi all You might be interested in this paper / article https://arxiv.org/abs/2308.14963 Thanks Michael

How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner
Hi I recently noticed that IndexReader.document(int) is deprecated, and my code is currently TopDocs topDocs = searcher.search(query, k); for (ScoreDoc scoreDoc : topDocs.scoreDocs) { Document doc = indexReader.document(scoreDoc.doc); } How do I best replace document(int)? Thanks M

Re: How to replace deprecated document(i)

2023-09-24 Thread Michael Wechner
) - Shubham On Mon, Sep 25, 2023 at 1:59 AM Michael Wechner wrote: Hi I recently noticed that IndexReader.document(int) is deprecated, and my code is currently TopDocs topDocs = searcher.search(query, k); for (ScoreDoc scoreDoc : topDocs.scoreDocs) { Document doc

Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner
); } Note that these StoredFields and TermVectors instances should only be consumed in the thread where they were acquired. For instance, it is illegal to share them across threads. Uwe On 25.09.2023 at 07:53, Michael Wechner wrote: Hi Shubham Great, thank you very much

Re: How to replace deprecated document(i)

2023-09-25 Thread Michael Wechner
that gives the missing information in 9.x Javadocs, too. Uwe On 25.09.2023 at 11:02, Michael Wechner wrote: you mean once per search request? I mean for example GET https://localhost:8080/search?q=Lucene and the following would be executed IndexReader reader = DirectoryReader
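For the deprecation question above: since Lucene 9.5, stored fields are loaded through a `StoredFields` instance obtained from the reader, as in the migration notes Uwe quotes. Below is a minimal sketch in which `SearchHits` and `topDocuments` are hypothetical names. A `StoredFields` instance must not be shared across threads, so one per search request is a natural scope. The reader and searcher, by contrast, are typically opened once, shared, and refreshed (e.g. with a `SearcherManager`) rather than reopened for every request.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

final class SearchHits {
    /** Runs {@code query} and loads the stored fields of the top {@code k} hits. */
    static List<Document> topDocuments(IndexSearcher searcher, Query query, int k) throws IOException {
        TopDocs topDocs = searcher.search(query, k);
        // Replacement for the deprecated IndexReader.document(int); use it only in this thread.
        StoredFields storedFields = searcher.getIndexReader().storedFields();
        List<Document> docs = new ArrayList<>();
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            docs.add(storedFields.document(scoreDoc.doc));
        }
        return docs;
    }
}
```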

Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
Hi I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's embedding model, which has vector dimension 1536, and received the following error: Field[vector]vector's dimensions must be <= [1024]; got 1536, whereas this worked previously with the hack to override the vector di

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
I forgot to mention that using the custom FieldType with vector dimension 1536 does work with Lucene 9.7.0 Thanks Michael On 19.10.23 at 10:39, Michael Wechner wrote: Hi I recently upgraded Lucene to 9.8.0 and was running tests with OpenAI's embedding model, which has the v

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Michael Wechner
trees. Uwe On 19.10.2023 at 10:53, Michael Wechner wrote: I forgot to mention that using the custom FieldType with vector dimension 1536 does work with Lucene 9.7.0 Thanks Michael On 19.10.23 at 10:39, Michael Wechner wrote: Hi I recently upgraded Lucene to 9.8.0 and was r

When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
Hi I have found the following simple Facet Example https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java whereas for a simple categorization of documents I currently use StringField, e.g. doc1.add(new StringField("category", "bo

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
less http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner wrote: Hi I have found the following simple Facet Example https://github.com/apache/lucene/blob/main/lucene/demo/src/java/org/apache/lucene/demo/facet/SimpleFacetsExample.java whereas for a simple categorization of d

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
et by different points/levels of your hierarchy. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 5:43 AM Michael Wechner wrote: > Hi > > I have found the following simple Facet Example > > > https://github

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
omyWriter). It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner wrote: Hi Mike Thanks for your feedback! II
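A rough illustration of the difference discussed in this thread: a StringField makes the category searchable for exact-match filtering, while a facet field lets you count documents per category. This minimal sketch uses the doc-values based SortedSetDocValuesFacetField (so no TaxonomyWriter is needed for flat categories). It assumes Lucene 9.x with both lucene-core and lucene-facet on the classpath; the field name "category" and the sample values are hypothetical.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.LabelAndValue;
import org.apache.lucene.facet.sortedset.DefaultSortedSetDocValuesReaderState;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetField;
import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class CategoryFacetsExample {

  // Indexes hypothetical documents and returns the number of documents per category.
  public static Map<String, Integer> countCategories() throws IOException {
    FacetsConfig config = new FacetsConfig();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (String category : new String[] {"books", "books", "articles"}) {
          Document doc = new Document();
          // StringField: exact-match filtering, e.g. a TermQuery on "category".
          doc.add(new StringField("category", category, Field.Store.YES));
          // Facet field: per-category counting based on sorted-set doc values.
          doc.add(new SortedSetDocValuesFacetField("category", category));
          // FacetsConfig.build() turns facet fields into the doc values that are indexed.
          writer.addDocument(config.build(doc));
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        SortedSetDocValuesReaderState state =
            new DefaultSortedSetDocValuesReaderState(reader, config);
        FacetsCollector fc = new FacetsCollector();
        searcher.search(new MatchAllDocsQuery(), fc); // collect all hits for counting
        Facets facets = new SortedSetDocValuesFacetCounts(state, fc);
        FacetResult result = facets.getTopChildren(10, "category");
        Map<String, Integer> counts = new HashMap<>();
        if (result != null) {
          for (LabelAndValue lv : result.labelValues) {
            counts.put(lv.label, lv.value.intValue());
          }
        }
        return counts;
      }
    }
  }
}
```

For deeper hierarchies, the thread also mentions the taxonomy-based FacetField, which requires a TaxonomyWriter.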

Re: When to use StringField and when to use FacetField for categorization?

2023-10-20 Thread Michael Wechner
hich case, yes, you need to create a TaxonomyWriter). It used to be that the "doc values" based faceting did not support arbitrary hierarchy, but I think that was fixed at some point. Mike McCandless http://blog.mikemccandless.com On Fri, Oct 20, 2023 at 9:03 AM Michael Wechner <

Re: When to use StringField and when to use FacetField for categorization?

2023-10-23 Thread Michael Wechner
probably have too many ways to do the same thing in the faceting module, and maybe our documentation could be a bit more helpful. Cheers, -Greg On Fri, Oct 20, 2023 at 2:54 PM Michael Wechner wrote: Thanks very much for this additional information, Marc! On 20.10.23 at 20:30, Marc D

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-07 Thread Michael Wechner
nt vectors format like a delegator as described before. The responsibility was shifted to the codec, because there may be better alternatives to HNSW that have different limits, especially with regard to performance during merging and query response times, e.g. BKD trees. Uwe On 19.10.2023 at

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-08 Thread Michael Wechner
* implement getKnnVectorsFormatForField() and return the wrapper with a different max dimension. Reading indexes still works with the unmodified default codec; you only need to set it for the IndexWriter. When reading, the actual codec is looked up by name. Uwe On 07.11.2023 at 17:03, Michael Wechner wrote: Hi Uwe

How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
Hi, IIUC I can get all terms of a particular field of an index with IndexReader reader = DirectoryReader.open("index_directory"); List<LeafReaderContext> list = reader.leaves(); for (LeafReaderContext lrc : list) { Terms terms = lrc.reader().terms("field_name"); if (terms != null) { TermsEnum termsEn
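The approach in the question (iterate over the index's leaves and enumerate each segment's Terms) written out in full. A minimal sketch, assuming Lucene 9.x with lucene-core on the classpath; the field name "body" and the sample documents are hypothetical. Note that this lists the terms of the whole index, not of a single document, and that terms of deleted documents may still show up until their segments are merged.

```java
import java.io.IOException;
import java.util.SortedSet;
import java.util.TreeSet;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class FieldTermsExample {

  // Collects the distinct terms of one field across all segments (leaves) of the index.
  public static SortedSet<String> allTerms(IndexReader reader, String field) throws IOException {
    SortedSet<String> result = new TreeSet<>();
    for (LeafReaderContext lrc : reader.leaves()) {
      Terms terms = lrc.reader().terms(field);
      if (terms == null) {
        continue; // this segment has no postings for the field
      }
      TermsEnum termsEnum = terms.iterator();
      BytesRef term;
      while ((term = termsEnum.next()) != null) {
        result.add(term.utf8ToString());
      }
    }
    return result;
  }

  // Indexes two hypothetical documents in memory and lists the terms of "body".
  public static SortedSet<String> runDemo() throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      // The no-arg IndexWriterConfig uses StandardAnalyzer (lowercasing, no stop words).
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        for (String text : new String[] {"Lucene in Action", "Apache Lucene"}) {
          Document doc = new Document();
          doc.add(new TextField("body", text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return allTerms(reader, "body");
      }
    }
  }
}
```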

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
alyse it again, get all terms. Good Luck On Sun, Nov 12, 2023 at 7:47 PM Michael Wechner wrote: Hi, IIUC I can get all terms of a particular field of an index with IndexReader reader = DirectoryReader.open("index_directory"); List<LeafReaderContext> list = reader.leaves(); for (LeafReaderContext
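The suggestion in this reply (re-analyze a document's text to get its terms) could look roughly like this. A minimal sketch, assuming Lucene 9.5 or later (for IndexReader.storedFields()), that the field's text was stored, and that it is re-analyzed with the same analyzer used at index time (StandardAnalyzer here); the field name "body" and the sample text are hypothetical. The reset()/incrementToken()/end()/close() sequence follows the consumer workflow described in the TokenStream Javadoc.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ReanalyzeExample {

  // Runs text through the analyzer and returns the terms in token order.
  public static List<String> analyze(Analyzer analyzer, String field, String text)
      throws IOException {
    List<String> terms = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream(field, text)) {
      CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
      ts.reset(); // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(termAtt.toString());
      }
      ts.end(); // mandatory after the last token; close() comes from try-with-resources
    }
    return terms;
  }

  // Indexes and stores one hypothetical document, then re-analyzes its stored "body".
  public static List<String> runDemo() throws IOException {
    try (Directory dir = new ByteBuffersDirectory();
        Analyzer analyzer = new StandardAnalyzer()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("body", "Lucene in Action", Field.Store.YES));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        // Doc id 0 is only valid here because the demo index holds a single document.
        String text = reader.storedFields().document(0).get("body");
        return analyze(analyzer, "body", text);
      }
    }
  }
}
```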

Re: How to get terms of a particular field of a particular document

2023-11-12 Thread Michael Wechner
here > https://github.com/apache/lucene/blob/4e2ce76b3e131ba92b7327a52460e6c4d92c5e33/lucene/highlighter/src/java/org/apache/lucene/search/highlight/Highlighter.java#L159 > > > On Sun, Nov 12, 2023 at 11:42 PM Michael Wechner > wrote: > >> Hi Mikhail >> >> Thank you

Re: How to get terms of a particular field of a particular document

2023-11-13 Thread Michael Wechner
/core/9_8_0/core/org/apache/lucene/analysis/TokenStream.html correctly, then one should add it. Thanks Michael On 12.11.23 at 23:36, Michael Wechner wrote: Thanks again, although I think I have now found what I wanted (without needing the Highlighter): IndexReader reader = DirectoryReader.o
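Another way to get the terms of one field of one document, without re-analyzing and not necessarily what was used in this thread, is to index the field with term vectors and read them back per document. A minimal sketch, assuming Lucene 9.5 or later (for IndexReader.termVectors()); the field name "body" and the sample text are hypothetical. Term vectors cost extra index space, so this trades disk for not having to re-run the analyzer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TermVectors;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

public class TermVectorsExample {

  // Returns the terms of one field of one document, read from its term vector.
  public static List<String> docTerms(IndexReader reader, int docId, String field)
      throws IOException {
    List<String> result = new ArrayList<>();
    TermVectors termVectors = reader.termVectors(); // per request, same thread only
    Fields fields = termVectors.get(docId); // null if the document has no term vectors
    Terms terms = fields == null ? null : fields.terms(field);
    if (terms == null) {
      return result;
    }
    TermsEnum termsEnum = terms.iterator();
    BytesRef term;
    while ((term = termsEnum.next()) != null) {
      result.add(term.utf8ToString()); // terms are enumerated in sorted order
    }
    return result;
  }

  // Indexes one hypothetical document with term vectors and reads them back.
  public static List<String> runDemo() throws IOException {
    FieldType withVectors = new FieldType(TextField.TYPE_NOT_STORED);
    withVectors.setStoreTermVectors(true); // keep a per-document term vector
    withVectors.freeze();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        Document doc = new Document();
        doc.add(new Field("body", "Lucene in Action", withVectors));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return docTerms(reader, 0, "body"); // doc id 0: single-document demo index
      }
    }
  }
}
```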

Re: Proof of concept for a Luke IntelliJ plugin

2023-11-13 Thread Michael Wechner
Hi Tamas, can one download your plugin somewhere to test it? Thanks Michael On 13.11.23 at 10:07, Balog Tamás wrote: Hello everyone! I've been working on a proof of concept of creating an IntelliJ plugin from the Luke application and it has reached a demoable state. If anyone of the Lucene c
