FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer)

2025-07-08 Thread Singh, Divya
From: Singh, Divya Sent: 04 July 2025 14:40 To: d...@lucene.apache.org Cc: Birajdar, Sharad (DI SW PLM LCS APPS ALM R&D7) Subject: FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer) From: Thakare, Monika (ext) (DI SW PLM LCS APPS A

[ANNOUNCE] Apache Lucene 10.2.2 released

2025-06-20 Thread Chris Hegarty
The Lucene PMC is pleased to announce the release of Apache Lucene 10.2.2. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting

[ANNOUNCE] Apache Lucene 9.12.2 released

2025-06-20 Thread Chris Hegarty
The Lucene PMC is pleased to announce the release of Apache Lucene 9.12.2. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-05-30 Thread Michael Sokolov
> Regards > Rajib > > -Original Message- > From: Saha, Rajib > Sent: 27 May 2025 11:52 > To: java-user@lucene.apache.org > Subject: RE: Suggestion needed for a case of Lucene Migration with TokenStream > > Hi Uwe, > > Thanks for your suggestions till now. We have be

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-05-29 Thread Saha, Rajib
needed for a case of Lucene Migration with TokenStream Hi Uwe, Thanks for your suggestions so far. We have been able to make good progress. We are now stuck at a point where we need your expert suggestion. As per our design, on full-content indexing, - in the first step, there will be small Lucene index

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-05-26 Thread Saha, Rajib
Hi Uwe, Thanks for your suggestions so far. We have been able to make good progress. We are now stuck at a point where we need your expert suggestion. As per our design, on full-content indexing, - in the first step, small Lucene index files get created with 5-6 documents. We called

Re: Regarding Clustering Support in Lucene

2025-05-14 Thread Arun Kumar Kalakanti
Dear all, My bad, KMeans is in 10.2 too. Are there any other clustering algos like DBSCAN (or HDBSCAN) or Agglomerative planned in future? Regards, Arun Kumar K On Tue, 6 May 2025 at 17:11, Arun Kumar Kalakanti wrote: > Dear all, > > Lucene 10.1 introduced "experimental"

Regarding Clustering Support in Lucene

2025-05-06 Thread Arun Kumar Kalakanti
Dear all, Lucene 10.1 introduced "experimental" KMeans clustering of vectors. However, I couldn't find it in the 10.2 version. Ref: https://lucene.apache.org/core/10_1_0/sandbox/org/apache/lucene/sandbox/codecs/quantization/KMeans.html Could you please share the plans, if any, or

[ANNOUNCE] Apache Lucene 10.2.1 released

2025-05-01 Thread Chris Hegarty
The Lucene PMC is pleased to announce the release of Apache Lucene 10.2.1. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-29 Thread Uwe Schindler
different levels of indexing, such as MetaData/FullContent information of the Reports. So, Rebuild indexing deletes the existing Lucene index files and does a fresh indexing of all the documents. When we physically go to the directory and delete the Lucene index files, the Rebuild indexing is working

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-29 Thread Saha, Rajib
Hi Uwe, In our product we have different levels of indexing, such as MetaData/FullContent information of the Reports. So, Rebuild indexing deletes the existing Lucene index files and does a fresh indexing of all the documents. When we physically go to the directory and delete the Lucene index files

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-28 Thread Uwe Schindler
ut data for the new indexer and sends it to the API (or whatever you have for indexing in your new system). If you just have incomplete Lucene Document instances from the older Lucene index, I think you're lost. When you call IndexReader/IndexSearcher.document(), you only get stored fields

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-28 Thread Saha, Rajib
Hi Uwe, Thank you for your detailed input and valuable advice. I fully understand and agree that upgrading from such an old version of Lucene involves much more than just resolving compilation issues. Based on the latest Lucene version, we have redesigned our platform accordingly going through

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-25 Thread Uwe Schindler
Hi, I'd like to mention the following: You are trying to upgrade Lucene from a really ancient version. Of course, basic concepts are still the same, but the search engine and its APIs have changed dramatically, so just trying to "compile code and fix random stuff until it compiles

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-24 Thread Mikhail Khludnev
ludnev > Sent: 24 April 2025 12:10 > To: java-user@lucene.apache.org > Subject: Re: Suggestion needed for a case of Lucene Migration with > TokenStream > > Hi > Use TextField.TYPE_STORED as the third argument in new Field() > see > > https://github.com/apache/lucene-solr/blo

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-24 Thread Saha, Rajib
) = Can you please suggest here too? Regards Rajib -Original Message- From: Mikhail Khludnev Sent: 24 April 2025 12:10 To: java-user@lucene.apache.org Subject: Re: Suggestion needed for a case of Lucene Migration with TokenStream Hi Use TextField.TYPE_STORED as the third

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-23 Thread Mikhail Khludnev
Hi Use TextField.TYPE_STORED as the third argument in new Field() see https://github.com/apache/lucene-solr/blob/e27f44e3d78dfcec230c97e0a1240e3751daeff9/lucene/core/src/java/org/apache/lucene/document/TextField.java#L35C33-L35C44 On Thu, Apr 24, 2025 at 8:37 AM Saha, Rajib wrote: > Hi Expe
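A minimal sketch of the kind of fix discussed in this thread, assuming Lucene 8.11.x or later on the classpath; the class, the field names `content` and `content_raw`, and the in-memory directory are illustrative. The message `TokenStream fields must be indexed and tokenized` comes from the `Field(String, TokenStream, IndexableFieldType)` constructor, so the type passed as the third argument must be both indexed and tokenized, which the `TextField` types are. Because a `Field` built from a `TokenStream` carries no value that could be stored, this sketch uses `TextField.TYPE_NOT_STORED` and keeps the raw text in a separate `StoredField`; whether that suits the original application is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

class TokenStreamFieldExample {
  /** Indexes `text` through a pre-built TokenStream and counts documents containing `term`. */
  static int indexAndCount(String text, String term) {
    try (Directory dir = new ByteBuffersDirectory();
         Analyzer analyzer = new StandardAnalyzer()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        // Pre-built stream; IndexWriter consumes and closes it while adding the document.
        TokenStream stream = analyzer.tokenStream("content", text);
        // TYPE_NOT_STORED is indexed and tokenized, so this constructor accepts it.
        doc.add(new Field("content", stream, TextField.TYPE_NOT_STORED));
        // The stream has no stored value, so the raw text goes into its own stored field.
        doc.add(new StoredField("content_raw", text));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).count(new TermQuery(new Term("content", term)));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With `StandardAnalyzer` the indexed terms are lower-cased, so the query term must be lower-case too.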

Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-23 Thread Saha, Rajib
Hi Experts, We are migrating Lucene from 2.4.1 to 8.11.2. During migration of one part of the code, we get the exception below with the 8.11.2-based changes (the lines marked in red): java.lang.IllegalArgumentException: TokenStream fields must be indexed and tokenized at

Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread Uwe Schindler
long VarHandles to get 64 dimensions in one go (<https://github.com/apache/lucene/pull/13288/files#diff-1faf01efbf448c751b357e758254b2e623de1145b07bd8afcfe8a49b7dbde9cc>). https://lucene.apache.org/core/10_2_0/codecs/org/apache/lucene/codecs/bitvectors/HnswBitVectorsFormat.html But you h

Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread John Dale (DB2DOM)
unsubscribe On Tue, Mar 19, 2024 at 2:59 PM Shubham Chaudhary wrote: > Hi Michael, > > Lucene already had int8 vector support since 9.5 (#1054 > <https://github.com/apache/lucene/pull/1054>) but it was left to the user > to get those quantized vectors and index usi

[ANNOUNCE] Apache Lucene 10.2.0 released

2025-04-10 Thread Ignacio Vera
The Lucene PMC is pleased to announce the release of Apache Lucene 10.2.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: How can I know the lucene index version from files

2025-03-02 Thread Mikhail Khludnev
I suppose it depends on the version. On Sun, Mar 2, 2025 at 10:55 AM Ralf Heyde wrote: > Hey, > > You might use 'Luke' to figure it out. > > Luke is part of the lucene project and a tool to look into indexes. > > Cheers Ralf > > Sent from my phone, I can't rule out

Re: How can I know the lucene index version from files

2025-03-02 Thread Daniel Cerqueira
> On Sun, Mar 2, 2025 at 12:21 AM Daniel Cerqueira >>> wrote: >>> >>> I have this lucene index files, in a directory: >>> >>> ``` >>> $ ls >>> _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos >>> _1p_Lucene

Re: How can I know the lucene index version from files

2025-03-02 Thread Daniel Cerqueira
> On Sun, Mar 2, 2025 at 12:21 AM Daniel Cerqueira > wrote: > >> I have this lucene index files, in a directory: >> >> ``` >> $ ls >> _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos >> _1p_Lucene41_0.tim _1p_Lucene41_0.tip _1p.nvd

Re: How can I know the lucene index version from files

2025-03-01 Thread Ralf Heyde
Hey, You might use 'Luke' to figure it out. Luke is part of the Lucene project and a tool to look into indexes. Cheers Ralf Sent from my phone, I can't rule out typos > On 02.03.2025 at 08:18, Mikhail Khludnev wrote: > > Hi Daniel.

Re: How can I know the lucene index version from files

2025-03-01 Thread Mikhail Khludnev
print it to console that should answer your questions. On Sun, Mar 2, 2025 at 12:21 AM Daniel Cerqueira wrote: > I have this lucene index files, in a directory: > > ``` > $ ls > _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos > _1p_Lucene41_0.tim _1p_Lucene41_0.

How can I know the lucene index version from files

2025-03-01 Thread Daniel Cerqueira
I have these Lucene index files in a directory: ``` $ ls _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos _1p_Lucene41_0.tim _1p_Lucene41_0.tip _1p.nvd _1p.nvm _1p.si segments_1 segments.gen write.lock ``` - How can I know which is the version of this lucene index

Re: apache-lucene blowing up with large file

2025-03-01 Thread Dawid Weiss
index your document(s) and how you can then query those documents. You can even start with the source of IndexFiles (the demo class). > That's a school example of integer overflow. Perhaps Lucene is not designed to work with such a large single files Correct. Token offsets and positions with

Re: apache-lucene blowing up with large file

2025-02-28 Thread Daniel Cerqueira
> On Fri, Feb 28, 2025 at 10:30 AM Daniel Cerqueira > wrote: > >> Hi. I have apache-lucene version 10.1.0: >> ``` >> $ pacman -Qs apache-lucene >> local/apache-lucene 10.1.0-1 >> Apache Lucene is a high-performance, full-featured text search eng

Re: apache-lucene blowing up with large file

2025-02-28 Thread Hrvoje Lončar
That's a school example of integer overflow. Perhaps Lucene is not designed to work with such a large single files. On Fri, 28 Feb 2025, 10:50 Dawid Weiss, wrote: > Split your large file into smaller fragments and index each fragment as a > document. > > D. > > On Fri, F

Re: apache-lucene blowing up with large file

2025-02-28 Thread Dawid Weiss
Split your large file into smaller fragments and index each fragment as a document. D. On Fri, Feb 28, 2025 at 10:30 AM Daniel Cerqueira wrote: > Hi. I have apache-lucene version 10.1.0: > ``` > $ pacman -Qs apache-lucene > local/apache-lucene 10.1.0-1 > Apache Lucene is a h
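Dawid's suggestion can be sketched with the JDK alone. This assumes plain text that can be broken at line boundaries; the class and method names are hypothetical. Lucene token offsets are Java `int` values, so a single field cannot address more than `Integer.MAX_VALUE` characters, and splitting keeps every fragment far below that limit. Each fragment handed to `sink` would then be indexed as its own Lucene `Document` (for example together with a hypothetical fragment-number field so hits can be mapped back to the file).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

class FragmentSplitter {
  /**
   * Reads text line by line and passes fragments of at most maxChars characters to sink,
   * breaking at line ends where possible. Returns the number of fragments emitted.
   */
  static int split(BufferedReader in, int maxChars, Consumer<String> sink) {
    if (maxChars <= 0) {
      throw new IllegalArgumentException("maxChars must be positive");
    }
    StringBuilder buf = new StringBuilder();
    int count = 0;
    try {
      String line;
      while ((line = in.readLine()) != null) {
        String piece = line + "\n";
        // Emit the current fragment if appending this line would exceed the limit.
        if (buf.length() > 0 && buf.length() + piece.length() > maxChars) {
          sink.accept(buf.toString());
          count++;
          buf.setLength(0);
        }
        // A line longer than maxChars is cut into fixed-size slices
        // (this simple version may split a surrogate pair).
        while (piece.length() > maxChars) {
          sink.accept(piece.substring(0, maxChars));
          count++;
          piece = piece.substring(maxChars);
        }
        buf.append(piece);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    if (buf.length() > 0) {
      sink.accept(buf.toString());
      count++;
    }
    return count;
  }
}
```

For a multi-gigabyte file, pass `Files.newBufferedReader(path)` and a fragment size of a few megabytes; the input is streamed line by line, although a single enormous line would still be read into memory by `readLine()`.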

apache-lucene blowing up with large file

2025-02-28 Thread Daniel Cerqueira
Hi. I have apache-lucene version 10.1.0: ``` $ pacman -Qs apache-lucene local/apache-lucene 10.1.0-1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. ``` I am trying to build a lucene index for a large file. ``` $ ll total 2,3G -rw

lucene-replicator: how to correctly reset NRT version

2025-02-21 Thread Steven Schlansker
Hi Lucene friends, We use the replicator module to implement log-shipping replication for our Lucene cluster. We have an offline "rebuild everything" process for use when indexing or data formats change. We have a single primary node that only serves the IndexWriter and replicator

RE: Re: Sentence classification with Lucene

2025-02-19 Thread Dmitri Geller
Yes, something like lucene-classification [1]. But there are multiple classifiers in this package. Which one is better suited? (Imagine I collect more samples per class, about 30-40 samples per class.) Any good Java examples using these classifiers? Another question: in case I want my

Re: Sentence classification with Lucene

2025-02-19 Thread Tommaso Teofili
Hi, if you have 30 classes with 10 samples per class, I'd say that's not an optimal distribution. Apart from that, you may use one of the text classifiers from lucene-classification [1], is anything like this what you had in mind? Alternatively you can also do things outside of Luce

Sentence classification with Lucene

2025-02-17 Thread Dmitri Geller
: example1 example2 ... exampleN ... ``` There are about 25-30 classes, with about 10-30 examples per class. One sentence can be assigned one or two classes. As far as I understand, this can be done with Lucene Core and should be fairly standard functionality. Can you point me to a Java example

Re: Reg Migration to 10.0.0 lucene core jar

2025-01-03 Thread Uwe Schindler
Hi, Which vulnerability are you talking about?!? We opened a CVE a while ago, but this was not about Lucene Core. Some checkers have false positives due to name mismatch. On 13.12.2024 at 10:41, lavanya ponnapoolu wrote: Hi Team, We are upgrading lucene-core jar from 4.7.0 to 10.0.0

[ANNOUNCE] Apache Lucene 10.1.0 released

2024-12-20 Thread Luca Cavanna
The Lucene PMC is pleased to announce the release of Apache Lucene 10.1.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: Reg Migration to 10.0.0 lucene core jar

2024-12-14 Thread Mikhail Khludnev
Hello, org.apache.lucene.document.Field is there https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/document/Field.html or I don't understand what you are referring to. Please elaborate. I think you need org.apache.lucene.store.FSDirectory#open(java.nio.file.Path) All jars should be the

Reg Migration to 10.0.0 lucene core jar

2024-12-13 Thread lavanya ponnapoolu
Hi Team, We are upgrading the lucene-core jar from 4.7.0 to 10.0.0 because of a vulnerability. For org.apache.lucene.document.Field.Index I cannot find any alternative in https://lucene.apache.org/core/6_0_0/MIGRATE.html. From lucene-core 6.0.0 these class files are removed. Same with
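For readers hitting the same removals, a minimal sketch of how the Lucene 4.x `Field.Store`/`Field.Index` combinations map onto field classes available in current releases, plus the `Path`-based `FSDirectory.open` mentioned in the reply above. It assumes Lucene 9.x or 10.x on the classpath; the class and field names are hypothetical. Variants such as `Field.Index.ANALYZED_NO_NORMS` would need a custom `FieldType` (for example with `setOmitNorms(true)`) rather than one of these ready-made classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class MigratedFields {
  static Document buildDoc(String id, String title, String body) {
    Document doc = new Document();
    // 4.x Store.YES + Index.NOT_ANALYZED -> StringField (one untokenized term).
    doc.add(new StringField("id", id, Field.Store.YES));
    // 4.x Store.YES + Index.ANALYZED -> TextField (runs through the analyzer).
    doc.add(new TextField("title", title, Field.Store.YES));
    // 4.x Store.NO + Index.ANALYZED -> TextField with Store.NO.
    doc.add(new TextField("body", body, Field.Store.NO));
    // 4.x Store.YES + Index.NO -> StoredField (retrievable, not searchable).
    doc.add(new StoredField("source", "import"));
    return doc;
  }

  /** Writes one document under indexDir via FSDirectory.open(Path) and returns the live doc count. */
  static int writeAndCount(Path indexDir) {
    try (Directory dir = FSDirectory.open(indexDir);
         Analyzer analyzer = new StandardAnalyzer();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      writer.addDocument(buildDoc("1", "Lucene migration", "Field.Index was removed"));
      writer.commit();
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return reader.numDocs();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

An index written by 4.7 cannot be opened directly by 10.0 this way; the documents have to be re-indexed from the original source data.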

[ANNOUNCE] Apache Lucene 9.12.1 released

2024-12-13 Thread Chris Hegarty
The Lucene PMC is pleased to announce the release of Apache Lucene 9.12.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform

Re: Lucene Query Metrics

2024-12-04 Thread Mikhail Khludnev
Hello, There's nothing like that. Off the top of my head, there is a profile collector in Elasticsearch. On Wed, Dec 4, 2024 at 11:46 PM ashwini singh wrote: > Does lucene provide extensions (utilities)to extract metrics from Lucene > during the request execution? Or applications can only trac

Re: Lucene Query Metrics

2024-12-04 Thread ashwini singh
Does Lucene provide extensions (utilities) to extract metrics from Lucene during request execution? Or can applications only track execution stats on top of Lucene? On Tue, 3 Dec 2024 at 23:20, Adrien Grand wrote: > Lucene doesn't expose query metrics, it's up to the app

Re: Lucene Query Metrics

2024-12-03 Thread Adrien Grand
Lucene doesn't expose query metrics; it's up to the application that integrates Lucene to compute and expose metrics that are relevant to it. On Wed, 4 Dec 2024, 00:31, ashwini singh wrote: > Hey everyone, > > Does lucene provide any query metrics (perf) ? I am lo

Lucene Query Metrics

2024-12-03 Thread ashwini singh
Hey everyone, Does Lucene provide any query metrics (perf)? I am looking for something very similar to MongoDB explain() output or execution metrics for Cosmos DB. Thanks and Regards, Ashwini Singh

Re: Lucene Slack Channel

2024-12-03 Thread ashwini singh
Thanks!! On Wed, 13 Nov 2024 at 13:31, Gus Heck wrote: > The slack channel (named 'lucene-dev') is generally for people building > lucene itself, and not generally for people looking for help providing > solutions using lucene. Typically one gets an apache.org address by

Re: Lucene Slack Channel

2024-11-13 Thread Gus Heck
The slack channel (named 'lucene-dev') is generally for people building lucene itself, and not generally for people looking for help providing solutions using lucene. Typically one gets an apache.org address by contributing enough to an apache project to get invited as a committer. Alt

Re: Lucene Slack Channel

2024-11-13 Thread Michael Wechner
Wechner wrote: I think one can only join when you have an apache.org email address https://infra.apache.org/slack.html but maybe I misunderstand the access policy? Thanks Michael On 04.11.24 at 23:56, ashwini singh wrote: Hi How can I get added to lucene slack channel? I am working on Lucene

Re: Lucene Slack Channel

2024-11-13 Thread ashwini singh
e access policy? > > Thanks > > Michael > > On 04.11.24 at 23:56, ashwini singh wrote: > > Hi > > > > How can I get added to lucene slack channel? I am working on Lucene to > > build a customer search technology. I

Re: Lucene Slack Channel

2024-11-04 Thread Michael Wechner
I think one can only join when you have an apache.org email address https://infra.apache.org/slack.html but maybe I misunderstand the access policy? Thanks Michael On 04.11.24 at 23:56, ashwini singh wrote: Hi How can I get added to lucene slack channel? I am working on Lucene to build a

Lucene Slack Channel

2024-11-04 Thread ashwini singh
Hi, How can I get added to the Lucene Slack channel? I am working on Lucene to build a customer search technology. I want to discuss Lucene further with the community. -- Thanks and Regards, Ashwini Singh

RE: Any plans to patch Lucene 8.11.x for CVE-2024-45772 ?

2024-10-29 Thread Renaud SAINT-GRATIEN
CONFIDENTIAL Hello, Indeed, I double-checked, and our app does not use lucene-replicator. I silenced my dumb security scanner. Thank you for your help. -Original Message- From: Michael Sokolov Sent: Monday, October 28, 2024 3:06 PM To: java-user@lucene.apache.org Subject: Re: Any

Re: Any plans to patch Lucene 8.11.x for CVE-2024-45772 ?

2024-10-28 Thread Michael Sokolov
Do you actually use org.apache.lucene.replicator.http ? If not then this wouldn't have any material impact on your application. On Mon, Oct 28, 2024 at 4:25 AM Renaud SAINT-GRATIEN wrote: > > CONFIDENTIAL > > Hello, > > Is there any plan to patch Lucene 8.11 for CVE-2024-4

Any plans to patch Lucene 8.11.x for CVE-2024-45772 ?

2024-10-28 Thread Renaud SAINT-GRATIEN
CONFIDENTIAL Hello, Is there any plan to patch Lucene 8.11 for CVE-2024-45772 ? I need to stay on 8.11 branch because my application still runs on Java 8. We plan to migrate to Java 17 but this cannot be done sooner than mid 2025... (this is a huge application). Thank you for this amazing

Re: Understanding Document ID (Lucene 10.0.0)

2024-10-25 Thread Michael Froh
Hi Prashant, For your particular use-case, you probably don't need to join across multiple indices. Lucene is able to maintain multiple data structures per field, with the selection of data structures coming from attributes of the field's type. If you have a field that you want to r
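A minimal sketch of the idea in Michael's reply (not code from the thread, whose text is truncated), assuming Lucene 9.x or 10.x. A hypothetical `id` field is given postings through `StringField` for exact lookups and doc values through `SortedDocValuesField` for sorting, and `IndexWriter.updateDocument(Term, ...)` provides primary-key style upserts by deleting any document with that term before adding the new one. Lucene itself does not enforce uniqueness; it only holds as long as every write for a given id goes through `updateDocument`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.BytesRef;

class PrimaryKeyExample {
  static Document doc(String id, String body) {
    Document d = new Document();
    // Postings for "id": exact-match lookups and updateDocument/deleteDocuments by Term.
    d.add(new StringField("id", id, Field.Store.YES));
    // Doc values under the same field name: sorting on "id".
    d.add(new SortedDocValuesField("id", new BytesRef(id)));
    d.add(new TextField("body", body, Field.Store.NO));
    return d;
  }

  /** Upserts each {id, body} row and returns the surviving ids in sorted order. */
  static List<String> upsertAll(String[][] rows) {
    // The no-arg IndexWriterConfig uses StandardAnalyzer.
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (String[] row : rows) {
        // Deletes any document whose "id" term equals row[0], then adds the new one.
        writer.updateDocument(new Term("id", row[0]), doc(row[0], row[1]));
      }
      // Near-real-time reader that sees the writer's uncommitted changes, deletes applied.
      try (DirectoryReader reader = DirectoryReader.open(writer)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Sort byId = new Sort(new SortField("id", SortField.Type.STRING));
        TopFieldDocs hits = searcher.search(new MatchAllDocsQuery(), 100, byId);
        List<String> ids = new ArrayList<>();
        for (ScoreDoc sd : hits.scoreDocs) {
          // For a STRING sort, the sort value is the BytesRef taken from the doc values.
          ids.add(((BytesRef) ((FieldDoc) sd).fields[0]).utf8ToString());
        }
        return ids;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Lucene's internal doc IDs are not stable across merges, which is why an application-level key like this is used instead of them.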

Understanding Document ID (Lucene 10.0.0)

2024-10-25 Thread Prashant Saxena
I'm new to Lucene and trying to understand the concept of a unique document ID, something like a primary key in databases such as SQL or SQLite. While searching, I came across this article: https://blog.mikemccandless.com/2014/05/choosing-which actually fast-unique-identifier-uuid.html

Re: lucene build failure on Windows using pylucene 9.7.0

2024-10-21 Thread Gautam Worah
scratch as > I am new to javascript and lucene. It will help me learn. > > 1. downloading and extracting pylucene > 2. cd lucene-java-9.7.0 > 3. gradlew.bat assemble > > Downloading https://services.gradle.org/distributions/gradle-7.6-bin.zip > > ...10%...

lucene build failure on Windows using pylucene 9.7.0

2024-10-21 Thread Prashant Saxena
Hello, OS : Windows 10 PyLucene : 9.7.0 JDK : 23.0 Although I can download the binary distribution of version 9.7.0, I have decided to build it from scratch as I am new to javascript and lucene. It will help me learn. 1. downloading and extracting pylucene 2. cd lucene-java-9.7.0 3. gradlew.bat

Re: Learning resources for Lucene Development

2024-10-15 Thread Marc Davenport
; > In some shameless self-promotion, I've written up some worked Lucene > examples (maybe a little more focused on Lucene internals than best > practices) over at https://github.com/msfroh/lucene-university. If you > have > anything you'd like to understand better, feel free to

[ANNOUNCE] Apache Lucene 10.0.0 released

2024-10-14 Thread Luca Cavanna
The Lucene PMC is pleased to announce the release of Apache Lucene 10.0.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: Learning resources for Lucene Development

2024-10-09 Thread Michael Froh
Hi Marc, In some shameless self-promotion, I've written up some worked Lucene examples (maybe a little more focused on Lucene internals than best practices) over at https://github.com/msfroh/lucene-university. If you have anything you'd like to understand better, feel free to open is

Re: Learning resources for Lucene Development

2024-10-08 Thread Navneet Verma
+1 on the question. On Tue, Oct 8, 2024 at 6:35 PM Marc Davenport wrote: > Hello, > I had this question buried in a previous email. I feel like I have a very > loose grasp on the Lucene API and how to properly implement with it. I'm > working on code that I didn't write

Learning resources for Lucene Development

2024-10-08 Thread Marc Davenport
Hello, I had this question buried in a previous email. I feel like I have a very loose grasp on the Lucene API and how to properly implement with it. I'm working on code that I didn't write myself from the ground up. Since I'm learning as I'm reading it, I can only assume th

[ANNOUNCE] Apache Lucene 9.12.0 released

2024-09-28 Thread Chris Hegarty
The Lucene PMC is pleased to announce the release of Apache Lucene 9.12.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting

Re: Current command line tools for Lucene?

2024-09-25 Thread Uwe Schindler
Hi, One addition to Dawid's comment: Please make sure to use the "Luke" version shipped with the Lucene distribution. The versions available separately on GitHub are outdated, that's correct. Uwe On 25.09.2024 at 08:15, Dawid Weiss wrote: I spent some time with ChatGPT and

Re: Current command line tools for Lucene?

2024-09-24 Thread Dawid Weiss
> I spent some time with ChatGPT and Google, looking for a simple CLI method > to explore the content. I see mention of Luke, but it seems very dated. Luke is your best bet. There is no command-line tool to "explore the content" because Lucene indexes are fairly low level. I

Re: Current command line tools for Lucene?

2024-09-24 Thread Dwaipayan Roy
> I am exploring the Maltego link analysis package file format and I see > mention of Lucene in the folders. I'm a little bit familiar from having > used ArangoDB and Elasticsearch in the past. > > I spent some time with ChatGPT and Google, looking for a simple CLI method >

Current command line tools for Lucene?

2024-09-22 Thread neal rauhauser
Hello, I am exploring the Maltego link analysis package file format and I see mention of Lucene in the folders. I'm a little bit familiar from having used ArangoDB and Elasticsearch in the past. I spent some time with ChatGPT and Google, looking for a simple CLI method to explore the conte

Re: Get knowledge about apache lucene index migrate

2024-09-12 Thread Rui Wu
g that would lead to grief > for users and/or hamper development of Lucene, so now you can only > upgrade one major version. If you need to do so, the best supported > option is to write a program that reads your data from one index (old > version) and writes it to a new one. There h

Re: Excessive reads while doing commit in lucene

2024-09-04 Thread Michael McCandless
It's odd to have a ~500X difference in writes versus reads. Are you sure? Is it possible you are also opening IndexReaders and searching the commit points? Lucene does re-read previously written (already indexed) documents during segment merges. But at default settings (as long as you di

Re: Excessive reads while doing commit in lucene

2024-09-04 Thread Robert Muir
On Wed, Sep 4, 2024 at 7:07 AM Gopal Sharma wrote: > > Hi Team, > > I am using aws efs to store a lucene index. That's the issue, don't use NFS! - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apa

Excessive reads while doing commit in lucene

2024-09-04 Thread Gopal Sharma
Hi Team, I am using AWS EFS to store a Lucene index. While indexing around 200 million records, the EFS read I/O went up to 20,097 GB and the EFS cost became too high, whereas the write I/O was only 56 GB. In my use case I am committing every 100k records (because in my test scen

Re: Get knowledge about apache lucene index migrate

2024-08-06 Thread Michael Sokolov
Yes, there is no support for upgrading a pre-8.x index to 9 or later. At some point it was decided that supporting that would lead to grief for users and/or hamper development of Lucene, so now you can only upgrade one major version. If you need to do so, the best supported option is to write a

Get knowledge about apache lucene index migrate

2024-08-05 Thread Jayamal Jayamaha
Hello, I am currently working on a project that uses Apache Lucene 4.1.0. Now I need to upgrade it to 9.11.1, so I configured the imports and updated the codebase according to the new Lucene version. Now I need to upgrade existing indexes which have been created using lucene

Re: Lucene LRUQueryCache question

2024-07-16 Thread Yixun Xu
This post explains why Lucene doesn't cache all queries: https://www.mail-archive.com/java-user@lucene.apache.org/msg51649.html Your queries could be skipping the cache because of the LRUQueryCache constructor parameters, or because of the QueryCachingPolicy.shouldCache predicate. They pro
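A sketch for the benchmarking question, assuming Lucene 9.x. `IndexSearcher` only routes a query through its cache when the weight is created without scores (for example a `FILTER` clause or a constant-score query), and even then the `QueryCachingPolicy` and the cache's leaf predicate decide whether anything is stored. Here a hypothetical always-cache policy and a cache that accepts every segment are installed on the searcher; the sizes, the `color` field, and the class are illustrative, not recommended settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.LRUQueryCache;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

class QueryCacheExample {
  /** Hypothetical policy that allows caching of every query (benchmarking only). */
  static final QueryCachingPolicy ALWAYS = new QueryCachingPolicy() {
    @Override public void onUse(Query query) {}
    @Override public boolean shouldCache(Query query) { return true; }
  };

  static LRUQueryCache runFilteredSearch(int numDocs) {
    // Up to 1000 queries / 64 MB; every segment is eligible (leaf -> true);
    // an infinite skipCacheFactor means caching is never skipped as too costly.
    LRUQueryCache cache = new LRUQueryCache(1000, 64L * 1024 * 1024, leaf -> true, Float.POSITIVE_INFINITY);
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < numDocs; i++) {
        Document doc = new Document();
        doc.add(new StringField("color", i % 2 == 0 ? "red" : "blue", Field.Store.NO));
        writer.addDocument(doc);
      }
      try (DirectoryReader reader = DirectoryReader.open(writer)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        searcher.setQueryCache(cache);
        searcher.setQueryCachingPolicy(ALWAYS);
        // The FILTER clause is evaluated without scores, which makes it eligible for caching.
        Query q = new BooleanQuery.Builder()
            .add(new MatchAllDocsQuery(), BooleanClause.Occur.MUST)
            .add(new TermQuery(new Term("color", "red")), BooleanClause.Occur.FILTER)
            .build();
        searcher.search(q, 10);
        searcher.search(q, 10);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return cache;
  }
}
```

`getCacheCount()`, `getHitCount()` and `getMissCount()` on the returned cache show whether the benchmark queries were actually cached.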

Lucene LRUQueryCache question

2024-07-02 Thread Δημήτρης Κλειναυτάκης
Hi all, I am using the Lucene 9.6 version and I am trying to add queries into LRUQueryCache from my benchmarks that evaluate the queries and create the LRUQueryCache. First, I believed that Lucene puts the queries by default into queryCache but that was never the case. So, I read the

[ANNOUNCE] Apache Lucene 9.11.1 released

2024-06-27 Thread Ignacio Vera
The Lucene PMC is pleased to announce the release of Apache Lucene 9.11.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform

Replacing DuplicateFilter with DiversifiedTopDocsCollector in Lucene 7.0.0

2024-06-19 Thread elegant . car3901
Hi there, I am currently updating an old project that was based on Apache Lucene 4.6.0. The project used a DuplicateFilter to filter search results with the following code: TopDocs docs = searcher.search(query, new DuplicateFilter(field), Integer.MAX_VALUE, new Sort(new SortField(sortField

[ANNOUNCE] Apache Lucene 9.11.0 released

2024-06-06 Thread Benjamin Trent
The Lucene PMC is pleased to announce the release of Apache Lucene 9.11.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-15 Thread sanjay dutt
I opened an issue for this one ( https://github.com/apache/lucene/issues/13373). Please feel free to edit or add more info to it. Regards, Sanjay On Wed, May 15, 2024 at 8:07 PM Michael McCandless < luc...@mikemccandless.com> wrote: > Thanks Jeven, more response inlined below: > &

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-15 Thread Michael McCandless
itional" (nice token btw) as many times as you like into a Lucene index, even force merging down to a single segment, is perfectly allowed, and it certainly should not throw an exception, let alone a cryptic one like this! That's a valid use-case. So we really need to understand

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-14 Thread Jerven Tjalling Bolleman
check in of our code 18 years ago! Since then our data has grown a bit ;) The code was using Lucene 1.4.3 at that time. Users would search using this as what would now be a facet, `type:positional`. I changed this to a DOCS-only field (IndexOptions.DOCS) called 'positional' and searc

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-14 Thread Michael McCandless
> > > > Your response is very helpful already and I very much appreciate it as > > > it cuts down the search space significantly. > > > > > > Regards, > > > Jerven > > > > > > > > > On 5/7/24 14:03, Michael Sokolov wrote:

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov
I very much appreciate it as > > it cuts down the search space significantly. > > > > Regards, > > Jerven > > > > > > On 5/7/24 14:03, Michael Sokolov wrote: > >> It seems as if the term frequency for some term exceeded the maximum. > >>

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
Regards, Jerven On 5/7/24 14:03, Michael Sokolov wrote: It seems as if the term frequency for some term exceeded the maximum. This can happen if you supplied custom term frequencies eg with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttrib

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
term frequencies eg with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true . The behavior didn't change since 8.x but it's possible that the merging brought together some very "high frequency" terms th

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov
It seems as if the term frequency for some term exceeded the maximum. This can happen if you supplied custom term frequencies eg with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true . The behavior didn't c

ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman
Dear Lucene community, This morning I found this exception in our logs. This was the first time we indexed this data with Lucene 9.10; before that we were still on the Lucene 8.x branch. Between the last indexing with 8 and this one with 9.10 we have a bit more data, so it could be something else

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Mikhail Khludnev
> > > '-' and 'NOT' in query string stands for same reason theoretically. > > > > > > But, in practical, is there any difference? > > > > Why I am asking the question. In our product, we have got an incident > related to different resul

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht
or below two queries. 1. Lucene Query String : report -kind:"AAD.AnalysisApplication_Bookmark" -kind:BIWidgets -kind:Discussions -kind:"DSL.MetaDataFile" -kind:"DSL.Universe" -kind:Event -kind:LCMJob -kind:ObjectPackage -kind:Profile -kind:Program -kind:Publica

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht
But, in practical, is there any difference? Why I am asking the question. In our product, we have got an incident related to different result set for below two queries. 1. Lucene Query String : report -kind:"AAD.Ana

Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Saha, Rajib
ifference? Why I am asking the question. In our product, we have got an incident related to different result set for below two queries. 1. Lucene Query String : report -kind:"AAD.AnalysisApplication_Bookmark" -kind:BIWidgets -kind:Discussions -kind:"DSL.MetaDataFile

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-26 Thread Marc Davenport
on we actually saw an improvement in our overall indexing time and some performance improvements across the board. Thanks for all the feedback. Marc On Wed, Apr 24, 2024 at 9:47 AM Matt Davis wrote: > Marc, > > We also ran into this problem on updating to Lucene 9.5. We found it > suff

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-24 Thread Matt Davis
Marc, We also ran into this problem on updating to Lucene 9.5. We found it sufficient in our use case to just bump up LRU cache in the constructor to a high enough value to not pose a performance problem. The default value of 4k was way too low for our use case with millions of unique facet
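
Matt's workaround was to pass a larger LRU cache to the taxonomy writer's constructor. In Lucene 9.x that presumably means the `DirectoryTaxonomyWriter(Directory, OpenMode, TaxonomyWriterCache)` constructor with `new LruTaxonomyWriterCache(size)`; check the javadoc for your exact version. The sketch below does not use Lucene at all: it is a hypothetical `LinkedHashMap`-based LRU that only shows why a cache much smaller than the number of unique facet labels can miss on nearly every lookup when labels keep recurring.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java analogy, NOT Lucene's LruTaxonomyWriterCache implementation.
// It counts how often a bounded LRU map has already evicted a label that is requested again.
public class LruMissDemo {

    /** A LinkedHashMap in access order that evicts its eldest entry once it exceeds maxSize. */
    static <K, V> Map<K, V> lru(int maxSize) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Looks up uniqueLabels distinct labels, repeated for the given number of passes; returns misses. */
    public static int countMisses(int cacheSize, int uniqueLabels, int passes) {
        Map<String, Integer> cache = lru(cacheSize);
        int misses = 0;
        for (int p = 0; p < passes; p++) {
            for (int i = 0; i < uniqueLabels; i++) {
                String label = "label-" + i;
                if (cache.get(label) == null) {
                    // In a taxonomy writer a miss would mean a slower lookup elsewhere.
                    misses++;
                    cache.put(label, i); // i stands in for the label's ordinal
                }
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        System.out.println(countMisses(4, 10, 3));  // prints 30: every lookup misses
        System.out.println(countMisses(10, 10, 3)); // prints 10: only the first pass misses
    }
}
```

This cyclic access pattern is a worst case for LRU; real facet label streams are rarely this regular, but the effect of an undersized cache is the same in kind.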

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-23 Thread Dawid Weiss
tation (or something even more fine-tuned to your needs)? Dawid On Mon, Apr 22, 2024 at 10:29 PM Marc Davenport wrote: > Hello, > I've done bisect between 9.4.2 and 9.5 and found the PR affecting my > particular set up : https://github.com/apache/lucene/pull/12093 &

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-22 Thread Marc Davenport
Hello, I've done bisect between 9.4.2 and 9.5 and found the PR affecting my particular set up : https://github.com/apache/lucene/pull/12093 This is the switch from UTF8TaxonomyWriterCache to an LruTaxonomyWriterCache. I don't see a way to control the size of this cache to never expel

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-19 Thread Marc Davenport
r.add(FacetLabel) are significantly slower for me. https://github.com/apache/lucene/blob/releases/lucene/9.5.0/lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java#L383 I don't know what is special about my documents that I would be seeing this change. I'm going to start droppin

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Dawid Weiss
Hi Marc, You could try git bisect lucene repository to pinpoint the commit that caused what you're observing. It'll take some time to build but it's a logarithmic bisection and you'd know for sure where the problem is. D. On Thu, Apr 18, 2024 at 11:16 PM Marc Davenport wr
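
Dawid's point that bisection is logarithmic can be seen in a small sketch of the search `git bisect` performs. This is not git's implementation: the commit list and the `isBad` predicate are hypothetical stand-ins for "build this commit and check whether indexing is slow", and the sketch assumes that once the regression is introduced it persists in every later commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Sketch of the binary search behind `git bisect`; not git's implementation.
// Assumptions: commits are ordered oldest to newest, the first is known good,
// the last is known bad, and every commit after the culprit is also bad.
public class BisectDemo {

    /** Returns the index of the first commit for which isBad is true. */
    public static int firstBad(List<String> commits, Predicate<String> isBad) {
        int good = 0;                 // invariant: commits[good] is good
        int bad = commits.size() - 1; // invariant: commits[bad] is bad
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            // One "build and measure" step; the range of suspects halves each time.
            if (isBad.test(commits.get(mid))) {
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> commits = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            commits.add("c" + i);
        }
        int culprit = 637; // hypothetical commit that introduced the slowdown
        AtomicInteger builds = new AtomicInteger();
        int found = firstBad(commits, c -> {
            builds.incrementAndGet();
            return Integer.parseInt(c.substring(1)) >= culprit;
        });
        System.out.println("first bad: " + commits.get(found) + " after " + builds.get() + " builds");
    }
}
```

For 1,000 commits the loop needs at most 10 builds (roughly log2 of 1,000), which is what makes bisecting a range such as 9.4.2 to 9.5 practical despite the build time.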

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Gautam Worah
Adrien Grand wrote: > > > Hi Marc, > > > > Nothing jumps to mind as a potential cause for this 2x regression. It > would > > be interesting to look at a profile. > > > > On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport > > wrote: > > > > >

Re: Indexing time increase moving from Lucene 8 to 9

2024-04-18 Thread Marc Davenport
se for this 2x regression. It would > be interesting to look at a profile. > > On Wed, Apr 17, 2024 at 9:32 PM Marc Davenport > wrote: > > > Hello, > > I'm finally migrating Lucene from 8.11.2 to 9.10.0 as our overall build > can > > now support Java 1
