subject:"Lucene"

[ANNOUNCE] Apache Lucene 9.12.3 released

2025-09-27 Thread Ankit Jain

The Lucene PMC is pleased to announce the release of Apache Lucene 9.12.3. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting

Re: Can't open Lucene 10 index using Luke

2025-09-27 Thread Dwaipayan Roy

Thanks for the support. I didn't understand what was wrong. But I solved the problem by building the entire Lucene 10 from the source, and then, Luke is working. I will also try this solution. Thanks, Doi. On Thu, Sep 25, 2025 at 1:12 PM Uwe Schindler wrote: > Hi, > > are you

Re: Can't open Lucene 10 index using Luke

2025-09-25 Thread Uwe Schindler

Hi, are you starting Luke with the provided startup script? To me it looks like there's a service provider file in the classpath that instructs Lucene to load a codec (Lucene53) which should not be there. Lucene 10 does not ship with a Lucene 5.3 codec, so it looks like you have some se

Can't open Lucene 10 index using Luke

2025-09-24 Thread Dwaipayan Roy

Dear folks, I am trying to shift from Lucene 8.8 to 10.0. To start with, I have made a small index with 10. But I can't open it using Luke 10 (which comes with Lucene 10). I am using Java 21.0.8. This is the error I am getting: SEVERE: Error opening index or dire

[ANNOUNCE] Apache Lucene 10.3.0 released

2025-09-13 Thread Vigya Sharma

The Lucene PMC is pleased to announce the release of Apache Lucene 10.3.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: Regarding PolishAnalyzer in Lucene 9.12.2

2025-09-03 Thread Dawid Weiss

 AM Saha, Rajib wrote: > Hi Team, > > In Lucene 9.12.2, we are trying to consume different locals specific > analyzer. > > When, we are trying to import PolishAnalyzer [import > org.apache.lucene.analysis.pl.PolishAnalyzer]. It is not able to resolve. > But, as per the doc

Re: Regarding PolishAnalyzer in Lucene 9.12.2

2025-09-03 Thread Marko Bekhta

Hey Rajib, I think you should include an extra dependency to org.apache.lucene:lucene-analysis-stempel to get to the PolishAnalyzer. Have a nice day, Marko On Wed, 3 Sept 2025 at 11:38, Saha, Rajib wrote: > Hi Team, > > In Lucene 9.12.2, we are trying to consume different locals

Regarding PolishAnalyzer in Lucene 9.12.2

2025-09-03 Thread Saha, Rajib

Hi Team, In Lucene 9.12.2, we are trying to use different locale-specific analyzers. When we try to import PolishAnalyzer [import org.apache.lucene.analysis.pl.PolishAnalyzer], it cannot be resolved. But, as per the documentation[https://lucene.apache.org/core/9_12_2/analysis

Re: Queries on Lucene Replication Approach

2025-08-25 Thread Viliam Ďurina

()` in regular intervals, sooner or later they will crash. So you should coordinate reader refresh with the synchronization. Viliam On Mon, Aug 25, 2025 at 6:05 PM Steven Schlansker < stevenschlans...@gmail.com> wrote: > Hi, we use Lucene NRT replication in production. > > For consi

Re: Queries on Lucene Replication Approach

2025-08-25 Thread Steven Schlansker

Hi, we use Lucene NRT replication in production. For consistent snapshots, we use SnapshotDeletionPolicy to open a snapshot, and then copy the snapshot'ed files with your tool of choice like rsync. Without a snapshot, I don't think such tools work reliably - you can copy commit metada

Re: Queries on Lucene Replication Approach

2025-08-25 Thread Adrien Grand

e are concurrent updates to the index. What is NRTLuceneReplication? I cannot find references to it. Lucene's replicator module does have NRT (near-realtime) support through: https://lucene.apache.org/core/10_2_0/replicator/org/apache/lucene/replicator/nrt/package-summary.html . On Mon, Aug 25, 2025

Queries on Lucene Replication Approach

2025-08-25 Thread sandy A

Hi Lucene Community Team, I have a couple of queries related to Lucene replication and would appreciate your guidance: *Query 1:* Is it safe to use tools like *rsync* (on Linux) or *robocopy* (on Windows) for copying Lucene segment files from one server to another? I want to understand if there

FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer)

2025-07-08 Thread Singh, Divya

From: Singh, Divya Sent: 04 July 2025 14:40 To: d...@lucene.apache.org Cc: Birajdar, Sharad (DI SW PLM LCS APPS ALM R&D7) Subject: FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer) From: Thakare, Monika (ext) (DI SW PLM LCS APPS A

[ANNOUNCE] Apache Lucene 10.2.2 released

2025-06-20 Thread Chris Hegarty

The Lucene PMC is pleased to announce the release of Apache Lucene 10.2.2. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting

[ANNOUNCE] Apache Lucene 9.12.2 released

2025-06-20 Thread Chris Hegarty

The Lucene PMC is pleased to announce the release of Apache Lucene 9.12.2. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-05-30 Thread Michael Sokolov

> Regards > Rajib > > -Original Message- > From: Saha, Rajib > Sent: 27 May 2025 11:52 > To: java-user@lucene.apache.org > Subject: RE: Suggestion needed for a case of Lucene Migration with TokenStream > > Hi Uwe, > > Thanks for your suggestions till now. We have be

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-05-29 Thread Saha, Rajib

needed for a case of Lucene Migration with TokenStream Hi Uwe, Thanks for your suggestions so far. We have been able to make good progress. We are now stuck at a point where we need your expert advice. As per our design, on full content indexing, - in the first step, small Lucene index

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-05-26 Thread Saha, Rajib

Hi Uwe, Thanks for your suggestions so far. We have been able to make good progress. We are now stuck at a point where we need your expert advice. As per our design, on full content indexing, - in the first step, small Lucene index files get created with 5-6 documents. We called

Re: Regarding Clustering Support in Lucene

2025-05-14 Thread Arun Kumar Kalakanti

Dear all, My bad, KMeans is in 10.2 too. Are there any other clustering algos like DBSCAN (or HDBSCAN) or Agglomerative planned in future? Regards, Arun Kumar K On Tue, 6 May 2025 at 17:11, Arun Kumar Kalakanti wrote: > Dear all, > > Lucene 10.1 introduced "experimental"

Regarding Clustering Support in Lucene

2025-05-06 Thread Arun Kumar Kalakanti

Dear all, Lucene 10.1 introduced "experimental" KMeans clustering of vectors. However, I couldn't find it in the 10.2 version. Ref: https://lucene.apache.org/core/10_1_0/sandbox/org/apache/lucene/sandbox/codecs/quantization/KMeans.html Could you please share the plans, if any, or

[ANNOUNCE] Apache Lucene 10.2.1 released

2025-05-01 Thread Chris Hegarty

The Lucene PMC is pleased to announce the release of Apache Lucene 10.2.1. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-29 Thread Uwe Schindler

different levels of indexing, such as MetaData/FullContent information of the Reports. So, Rebuild indexing deletes the existing Lucene index files and does a fresh indexing of all the documents. When we physically go to the directory and delete the Lucene index files. The Rebuild indexing is working

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-29 Thread Saha, Rajib

Hi Uwe, In our product we have different levels of indexing, such as MetaData/FullContent information of the Reports. So, Rebuild indexing deletes the existing Lucene index files and does a fresh indexing of all the documents. When we physically go to the directory and delete the Lucene index files

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-28 Thread Uwe Schindler

ut data for the new indexer and sends it to the API (or whatever you have for indexing in your new system). If you just have incomplete Lucene Document instances from the older Lucene index, I think you're lost. When you call IndexReader/IndexSearcher.document(), you only get stored fields

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-28 Thread Saha, Rajib

Hi Uwe, Thank you for your detailed input and valuable advice. I fully understand and agree that upgrading from such an old version of Lucene involves much more than just resolving compilation issues. Based on the latest Lucene version, we have redesigned our platform accordingly going through

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-25 Thread Uwe Schindler

Hi, I'd like to mention the following: You are trying to upgrade Lucene from a really ancient version. Of course, basic concepts are still the same, but the search engine and its APIs have changed dramatically, so just trying to "compile code and fix random stuff until it compiles

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-24 Thread Mikhail Khludnev

ludnev > Sent: 24 April 2025 12:10 > To: java-user@lucene.apache.org > Subject: Re: Suggestion needed for a case of Lucene Migration with > TokenStream > > Hi > Use TextField.TYPE_STORED as the third argument in new Field() > see > > https://github.com/apache/lucene-solr/blo

RE: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-24 Thread Saha, Rajib

) = Can you please suggest here too? Regards Rajib -Original Message- From: Mikhail Khludnev Sent: 24 April 2025 12:10 To: java-user@lucene.apache.org Subject: Re: Suggestion needed for a case of Lucene Migration with TokenStream Hi Use TextField.TYPE_STORED as the third

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-23 Thread Mikhail Khludnev

Hi Use TextField.TYPE_STORED as the third argument in new Field() see https://github.com/apache/lucene-solr/blob/e27f44e3d78dfcec230c97e0a1240e3751daeff9/lucene/core/src/java/org/apache/lucene/document/TextField.java#L35C33-L35C44 On Thu, Apr 24, 2025 at 8:37 AM Saha, Rajib wrote: > Hi Expe

Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-23 Thread Saha, Rajib

Hi Experts, We are migrating Lucene from 2.4.1 to 8.11.2. During Migration for a part of code, we are getting below exception in 8.11.2 based changes from Red line colored. = java.lang.IllegalArgumentException: TokenStream fields must be indexed and tokenized at

Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread Uwe Schindler

long VarHandles to get 64 dimensions in one go (<https://github.com/apache/lucene/pull/13288/files#diff-1faf01efbf448c751b357e758254b2e623de1145b07bd8afcfe8a49b7dbde9cc>). https://lucene.apache.org/core/10_2_0/codecs/org/apache/lucene/codecs/bitvectors/HnswBitVectorsFormat.html But you h

Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread John Dale (DB2DOM)

unsubscribe On Tue, Mar 19, 2024 at 2:59 PM Shubham Chaudhary wrote: > Hi Michael, > > Lucene already had int8 vector support since 9.5 (#1054 > <https://github.com/apache/lucene/pull/1054>) but it was left to the user > to get those quantized vectors and index usi

[ANNOUNCE] Apache Lucene 10.2.0 released

2025-04-10 Thread Ignacio Vera

The Lucene PMC is pleased to announce the release of Apache Lucene 10.2.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: How can I know the lucene index version from files

2025-03-02 Thread Mikhail Khludnev

I suppose it depends on the version. On Sun, Mar 2, 2025 at 10:55 AM Ralf Heyde wrote: > Hey, > > You might use 'luke' to figure it out. > > Luke is part of the lucene project and a tool to look into indexes. > > Cheers Ralf > > Sent from my phone; any

Re: How can I know the lucene index version from files

2025-03-02 Thread Daniel Cerqueira

> On Sun, Mar 2, 2025 at 12:21 AM Daniel Cerqueira >>> wrote: >>> >>> I have this lucene index files, in a directory: >>> >>> ``` >>> $ ls >>> _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos >>> _1p_Lucene

Re: How can I know the lucene index version from files

2025-03-02 Thread Daniel Cerqueira

> On Sun, Mar 2, 2025 at 12:21 AM Daniel Cerqueira > wrote: > >> I have this lucene index files, in a directory: >> >> ``` >> $ ls >> _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos >> _1p_Lucene41_0.tim _1p_Lucene41_0.tip _1p.nvd

Re: How can I know the lucene index version from files

2025-03-01 Thread Ralf Heyde

Hey, You might use 'luke' to figure it out. Luke is part of the Lucene project and is a tool for looking into indexes. Cheers Ralf Sent from my phone; I cannot rule out spelling mistakes > On 02.03.2025 at 08:18, Mikhail Khludnev wrote: > > Hi Daniel.

Re: How can I know the lucene index version from files

2025-03-01 Thread Mikhail Khludnev

print it to console that should answer your questions. On Sun, Mar 2, 2025 at 12:21 AM Daniel Cerqueira wrote: > I have this lucene index files, in a directory: > > ``` > $ ls > _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos > _1p_Lucene41_0.tim _1p_Lucene41_0.

How can I know the lucene index version from files

2025-03-01 Thread Daniel Cerqueira

I have these Lucene index files in a directory: ``` $ ls _1p.fdt _1p.fdx _1p.fnm _1p_Lucene41_0.doc _1p_Lucene41_0.pos _1p_Lucene41_0.tim _1p_Lucene41_0.tip _1p.nvd _1p.nvm _1p.si segments_1 segments.gen write.lock ``` - How can I find out the version of this Lucene index
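One hedged observation on listings like this: per-field postings files are typically named `<segment>_<format>_<suffix>.<ext>`, so a name such as `_1p_Lucene41_0.doc` hints at the postings format (`Lucene41`) that wrote it. The sketch below uses only the Java standard library to pull such format names out of file names. `CodecHints`, `formatNames`, and the regular expression are illustrative assumptions, not Lucene APIs, and a format name narrows down the writing release without proving it.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodecHints {
    // Assumed naming shape: _<segment>_<FormatName><digits>_<suffix>.<ext>,
    // e.g. "_1p_Lucene41_0.doc". Files such as "_1p.fdt" do not match.
    private static final Pattern PER_FIELD_FILE =
        Pattern.compile("_[0-9a-z]+_([A-Za-z]+[0-9]+)_[0-9]+\\.[a-z]+");

    /** Collects the distinct format names embedded in the given file names. */
    public static Set<String> formatNames(List<String> fileNames) {
        Set<String> names = new TreeSet<>();
        for (String fileName : fileNames) {
            Matcher m = PER_FIELD_FILE.matcher(fileName);
            if (m.matches()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
            "_1p.fdt", "_1p.fdx", "_1p.fnm",
            "_1p_Lucene41_0.doc", "_1p_Lucene41_0.pos",
            "_1p_Lucene41_0.tim", "_1p_Lucene41_0.tip",
            "_1p.si", "segments_1", "segments.gen", "write.lock");
        System.out.println(formatNames(files));
    }
}
```

This is only a first hint from the file system; opening the index with a matching Luke or Lucene version remains the reliable way to confirm what wrote it.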

Re: apache-lucene blowing up with large file

2025-03-01 Thread Dawid Weiss

index your document(s) and how you can then query those documents. You can even start with the source of IndexFiles (the demo class). > That's a school example of integer overflow. Perhaps Lucene is not designed to work with such a large single files Correct. Token offsets and positions with

Re: apache-lucene blowing up with large file

2025-02-28 Thread Daniel Cerqueira

> On Fri, Feb 28, 2025 at 10:30 AM Daniel Cerqueira > wrote: > >> Hi. I have apache-lucene version 10.1.0: >> ``` >> $ pacman -Qs apache-lucene >> local/apache-lucene 10.1.0-1 >> Apache Lucene is a high-performance, full-featured text search eng

Re: apache-lucene blowing up with large file

2025-02-28 Thread Hrvoje Lončar

That's a textbook example of integer overflow. Perhaps Lucene is not designed to work with such large single files. On Fri, 28 Feb 2025, 10:50 Dawid Weiss, wrote: > Split your large file into smaller fragments and index each fragment as a > document. > > D. > > On Fri, F
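To make the overflow concrete: a Java `int` tops out at 2,147,483,647, so an offset past roughly 2 GiB cannot be represented in one. The following is a minimal sketch with a hypothetical offset value (not taken from the thread); `OffsetOverflowDemo` and `fitsInInt` are illustrative names, not Lucene APIs.

```java
public class OffsetOverflowDemo {
    /** Returns true if the 64-bit offset can be stored in a Java int without loss. */
    public static boolean fitsInInt(long offset) {
        return offset >= Integer.MIN_VALUE && offset <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Hypothetical position inside a file larger than 2 GiB.
        long offset = 2_400_000_000L;

        System.out.println(fitsInInt(offset)); // the value exceeds Integer.MAX_VALUE

        // A plain narrowing cast silently wraps around to a negative number.
        System.out.println((int) offset);

        // Math.toIntExact refuses to wrap and throws instead.
        try {
            Math.toIntExact(offset);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

The checked variant is usually preferable in code that tracks positions, because a silent wrap produces a negative offset that only fails later and far from the cause.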

Re: apache-lucene blowing up with large file

2025-02-28 Thread Dawid Weiss

Split your large file into smaller fragments and index each fragment as a document. D. On Fri, Feb 28, 2025 at 10:30 AM Daniel Cerqueira wrote: > Hi. I have apache-lucene version 10.1.0: > ``` > $ pacman -Qs apache-lucene > local/apache-lucene 10.1.0-1 > Apache Lucene is a h
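The splitting advice above can be sketched with the Java standard library alone. This is a minimal sketch, assuming line-oriented text and a hypothetical `maxChars` budget per fragment; `FragmentSplitter` and `split` are illustrative names, not Lucene APIs. Each returned fragment would then be indexed as its own document.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentSplitter {
    /**
     * Groups consecutive lines into fragments of at most maxChars characters,
     * counting the newline separators. A single line longer than maxChars
     * becomes a fragment of its own.
     */
    public static List<String> split(List<String> lines, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean started = false;
        for (String line : lines) {
            // Close the current fragment if adding this line would exceed the budget.
            if (started && current.length() + 1 + line.length() > maxChars) {
                fragments.add(current.toString());
                current.setLength(0);
                started = false;
            }
            if (started) {
                current.append('\n');
            }
            current.append(line);
            started = true;
        }
        if (started) {
            fragments.add(current.toString());
        }
        return fragments;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("alpha", "beta", "gamma", "delta");
        for (String fragment : split(lines, 11)) {
            System.out.println("[" + fragment.replace("\n", "|") + "]");
        }
    }
}
```

Splitting on line boundaries keeps each fragment readable on its own; for a real corpus the budget would be chosen well below the 2 GiB range so that per-document offsets stay small.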

apache-lucene blowing up with large file

2025-02-28 Thread Daniel Cerqueira

Hi. I have apache-lucene version 10.1.0: ``` $ pacman -Qs apache-lucene local/apache-lucene 10.1.0-1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. ``` I am trying to build a lucene index for a large file. ``` $ ll total 2,3G -rw

lucene-replicator: how to correctly reset NRT version

2025-02-21 Thread Steven Schlansker

Hi Lucene friends, We use the replicator module to implement log-shipping replication for our Lucene cluster. We have an offline "rebuild everything" process for use when indexing or data formats change. We have a single primary node that only serves the IndexWriter and replicator

RE: Re: Sentence classification with Lucene

2025-02-19 Thread Dmitri Geller

Yes, something like lucene-classification [1]. But, there are multiple classifiers in this package. Which one is better suited ? (Imagine I collect more samples per class... about... 30-40 samples per class) Any good Java examples using these classifiers? Another question: in case I want my

Re: Sentence classification with Lucene

2025-02-19 Thread Tommaso Teofili

Hi, if you have 30 classes with 10 samples per class, I'd say that's not an optimal distribution. Apart from that, you may use one of the text classifiers from lucene-classification [1], is anything like this what you had in mind? Alternatively you can also do things outside of Luce

Sentence classification with Lucene

2025-02-17 Thread Dmitri Geller

: example1 example2 ... exampleN ... ``` There are about 25-30 classes. About 10-30 examples per class. One sentence can get one or two classes assigned As far as I understand: this can be done with Lucene Core, should be quite a standard functionality. Can you point me to a Java example

Re: Reg Migration to 10.0.0 lucene core jar

2025-01-03 Thread Uwe Schindler

Hi, Which vulnerability are you talking about?!? We opened a CVE a while ago, but this was not about Lucene Core. Some checkers have false positives due to name mismatch. On 13.12.2024 at 10:41, lavanya ponnapoolu wrote: Hi Team, We are upgrading lucene-core jar from 4.7.0 to 10.0.0

[ANNOUNCE] Apache Lucene 10.1.0 released

2024-12-20 Thread Luca Cavanna

The Lucene PMC is pleased to announce the release of Apache Lucene 10.1.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: Reg Migration to 10.0.0 lucene core jar

2024-12-14 Thread Mikhail Khludnev

Hello, org.apache.lucene.document.Field is there https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/document/Field.html or I don't understand what you refer to. Please elaborate. I think you need org.apache.lucene.store.FSDirectory#open(java.nio.file.Path) All jars should be the

Reg Migration to 10.0.0 lucene core jar

2024-12-13 Thread lavanya ponnapoolu

Hi Team, We are upgrading the lucene-core jar from 4.7.0 to 10.0.0 because of a vulnerability. org.apache.lucene.document.Field.*Index* but I am not finding any alternative in https://lucene.apache.org/core/6_0_0/MIGRATE.html. As of lucene-core 6.0.0 these class files are removed. Same with

[ANNOUNCE] Apache Lucene 9.12.1 released

2024-12-13 Thread Chris Hegarty

The Lucene PMC is pleased to announce the release of Apache Lucene 9.12.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform

Re: Lucene Query Metrics

2024-12-04 Thread Mikhail Khludnev

Hello, There's nothing like that. Off the top of my head, there is a profile collector in Elasticsearch. On Wed, Dec 4, 2024 at 11:46 PM ashwini singh wrote: > Does lucene provide extensions (utilities)to extract metrics from Lucene > during the request execution? Or applications can only trac

Re: Lucene Query Metrics

2024-12-04 Thread ashwini singh

Does Lucene provide extensions (utilities) to extract metrics from Lucene during the request execution? Or can applications only track execution stats on top of Lucene? On Tue, 3 Dec 2024 at 23:20, Adrien Grand wrote: > Lucene doesn't expose query metrics, it's up to the app

Re: Lucene Query Metrics

2024-12-03 Thread Adrien Grand

Lucene doesn't expose query metrics, it's up to the application that integrates Lucene to compute and expose metrics that are relevant to them. On Wed, 4 Dec 2024, 00:31, ashwini singh wrote: > Hey everyone, > > Does lucene provide any query metrics (perf) ? I am lo

Lucene Query Metrics

2024-12-03 Thread ashwini singh

Hey everyone, Does Lucene provide any query metrics (perf)? I am looking for something very similar to MongoDB explain() output or Execution metrics for Cosmos DB? *Thanks and Regards,* *Ashwini Singh*

Re: Lucene Slack Channel

2024-12-03 Thread ashwini singh

Thanks !! On Wed, 13 Nov 2024 at 13:31, Gus Heck wrote: > The slack channel (named 'lucene-dev') is generally for people building > lucene itself, and not generally for people looking for help providing > solutions using lucene. Typically one gets an apache.org address by >

Re: Lucene Slack Channel

2024-11-13 Thread Gus Heck

The slack channel (named 'lucene-dev') is generally for people building lucene itself, and not generally for people looking for help providing solutions using lucene. Typically one gets an apache.org address by contributing enough to an apache project to get invited as a committer. Alt

Re: Lucene Slack Channel

2024-11-13 Thread Michael Wechner

Wechner wrote: I think one can only join when you have an apache.org email address https://infra.apache.org/slack.html but maybe I misunderstand the access policy? Thanks Michael On 04.11.24 at 23:56, ashwini singh wrote: Hi How can I get added to lucene slack channel? I am working on Lucene

Re: Lucene Slack Channel

2024-11-13 Thread ashwini singh

e access policy? > > Thanks > > Michael > > On 04.11.24 at 23:56, ashwini singh wrote: > > Hi > > > > How can I get added to lucene slack channel? I am working on Lucene to > > build a customer search technology. I

Re: Lucene Slack Channel

2024-11-04 Thread Michael Wechner

I think one can only join when you have an apache.org email address https://infra.apache.org/slack.html but maybe I misunderstand the access policy? Thanks Michael On 04.11.24 at 23:56, ashwini singh wrote: Hi How can I get added to lucene slack channel? I am working on Lucene to build a

Lucene Slack Channel

2024-11-04 Thread ashwini singh

Hi How can I get added to lucene slack channel? I am working on Lucene to build a customer search technology. I want to discuss more about lucene in the community -- *Thanks and Regards,* *Ashwini Singh*

RE: Any plans to patch Lucene 8.11.x for CVE-2024-45772 ?

2024-10-29 Thread Renaud SAINT-GRATIEN

CONFIDENTIAL Hello, Indeed, I double-checked, and our app does not use lucene-replicator. I silenced my dumb security scanner. Thank you for your help. -Original Message- From: Michael Sokolov Sent: Monday, October 28, 2024 3:06 PM To: java-user@lucene.apache.org Subject: Re: Any

Re: Any plans to patch Lucene 8.11.x for CVE-2024-45772 ?

2024-10-28 Thread Michael Sokolov

Do you actually use org.apache.lucene.replicator.http ? If not then this wouldn't have any material impact on your application. On Mon, Oct 28, 2024 at 4:25 AM Renaud SAINT-GRATIEN wrote: > > CONFIDENTIAL > > Hello, > > Is there any plan to patch Lucene 8.11 for CVE-2024-4

Any plans to patch Lucene 8.11.x for CVE-2024-45772 ?

2024-10-28 Thread Renaud SAINT-GRATIEN

CONFIDENTIAL Hello, Is there any plan to patch Lucene 8.11 for CVE-2024-45772 ? I need to stay on 8.11 branch because my application still runs on Java 8. We plan to migrate to Java 17 but this cannot be done sooner than mid 2025... (this is a huge application). Thank you for this amazing

Re: Understanding Document ID (Lucene 10.0.0)

2024-10-25 Thread Michael Froh

Hi Prashant, For your particular use-case, you probably don't need to join across multiple indices. Lucene is able to maintain multiple data structures per field, with the selection of data structures coming from attributes of the field's type. If you have a field that you want to r

Understanding Document ID (Lucene 10.0.0)

2024-10-25 Thread Prashant Saxena

I'm new to Lucene and trying to understand the concept of unique document id, something like a primary key in databases like sql or sqlite etc. While searching, I came across this article: https://blog.mikemccandless.com/2014/05/choosing-which actually fast-unique-identifier-uuid.html

Re: lucene build failure on Windows using pylucene 9.7.0

2024-10-21 Thread Gautam Worah

scratch as > I am new to javascript and lucene. It will help me learn. > > 1. downloading and extracting pylucene > 2. cd lucene-java-9.7.0 > 3. gradlew.bat assemble > > Downloading https://services.gradle.org/distributions/gradle-7.6-bin.zip > > ...10%...

lucene build failure on Windows using pylucene 9.7.0

2024-10-21 Thread Prashant Saxena

Hello, OS : Windows 10 PyLucene : 9.7.0 JDK : 23.0 Although I can download the binary distribution of version 9.7.0, I have decided to build it from scratch as I am new to javascript and lucene. It will help me learn. 1. downloading and extracting pylucene 2. cd lucene-java-9.7.0 3. gradlew.bat

Re: Learning resources for Lucene Development

2024-10-15 Thread Marc Davenport

> > In some shameless self-promotion, I've written up some worked Lucene > examples (maybe a little more focused on Lucene internals than best > practices) over at https://github.com/msfroh/lucene-university. If you > have > anything you'd like to understand better, feel free to

[ANNOUNCE] Apache Lucene 10.0.0 released

2024-10-14 Thread Luca Cavanna

The Lucene PMC is pleased to announce the release of Apache Lucene 10.0.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: Learning resources for Lucene Development

2024-10-09 Thread Michael Froh

Hi Marc, In some shameless self-promotion, I've written up some worked Lucene examples (maybe a little more focused on Lucene internals than best practices) over at https://github.com/msfroh/lucene-university. If you have anything you'd like to understand better, feel free to open is

Re: Learning resources for Lucene Development

2024-10-08 Thread Navneet Verma

+1 on the question. On Tue, Oct 8, 2024 at 6:35 PM Marc Davenport wrote: > Hello, > I had this question buried in a previous email. I feel like I have a very > loose grasp on the Lucene API and how to properly implement with it. I'm > working on code that I didn't write

Learning resources for Lucene Development

2024-10-08 Thread Marc Davenport

Hello, I had this question buried in a previous email. I feel like I have a very loose grasp on the Lucene API and how to properly implement with it. I'm working on code that I didn't write myself from the ground up. Since I'm learning as I'm reading it, I can only assume th

[ANNOUNCE] Apache Lucene 9.12.0 released

2024-09-28 Thread Chris Hegarty

The Lucene PMC is pleased to announce the release of Apache Lucene 9.12.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting

Re: Current command line tools for Lucene?

2024-09-25 Thread Uwe Schindler

Hi, One addition to Dawid's comment: Please make sure to use the "Luke" version shipped with the Lucene distribution. The versions available separately on GitHub are outdated, that's correct. Uwe On 25.09.2024 at 08:15, Dawid Weiss wrote: I spent some time with ChatGPT and

Re: Current command line tools for Lucene?

2024-09-24 Thread Dawid Weiss

> I spent some time with ChatGPT and Google, looking for a simple CLI method > to explore the content. I see mention of Luke, but it seems very dated. Luke is your best bet. There is no command-line tool to "explore the content" because Lucene indexes are fairly low level. I

Re: Current command line tools for Lucene?

2024-09-24 Thread Dwaipayan Roy

> > I am exploring the Maltego link analysis package file format and I see > mention of Lucene in the folders. I'm a little bit familiar from having > used ArangoDB and Elasticsearch in the past. > > I spent some time with ChatGPT and Google, looking for a simple CLI method >

Current command line tools for Lucene?

2024-09-22 Thread neal rauhauser

Hello, I am exploring the Maltego link analysis package file format and I see mention of Lucene in the folders. I'm a little bit familiar from having used ArangoDB and Elasticsearch in the past. I spent some time with ChatGPT and Google, looking for a simple CLI method to explore the conte

Re: Get knowledge about apache lucene index migrate

2024-09-12 Thread Rui Wu

g that would lead to grief > for users and/or hamper development of Lucene, so now you can only > upgrade one major version. If you need to do so, the best supported > option is to write a program that reads your data from one index (old > version) and writes it to a new one. There h

Re: Excessive reads while doing commit in lucene

2024-09-04 Thread Michael McCandless

It's odd to have a ~500X difference in writes versus reads. Are you sure? Is it possible you are also opening IndexReaders and searching the commit points? Lucene does re-read previously written (already indexed) documents during segment merges. But at default settings (as long as you di

Re: Excessive reads while doing commit in lucene

2024-09-04 Thread Robert Muir

On Wed, Sep 4, 2024 at 7:07 AM Gopal Sharma wrote: > > Hi Team, > > I am using aws efs to store a lucene index. That's the issue, don't use NFS!

Excessive reads while doing commit in lucene

2024-09-04 Thread Gopal Sharma

Hi Team, I am using AWS EFS to store a Lucene index. While indexing around 200 million records, the EFS read IOPS went up to 20,097 GB and the cost for EFS became too high, whereas the write IOPS was only 56 GB. In my use case I am committing every 100k records (because in my test scen

Re: Get knowledge about apache lucene index migrate

2024-08-06 Thread Michael Sokolov

Yes, there is no support for upgrading a pre-8.x index to 9 or later. At some point it was decided that supporting that would lead to grief for users and/or hamper development of Lucene, so now you can only upgrade one major version. If you need to do so, the best supported option is to write a

Get knowledge about apache lucene index migrate

2024-08-05 Thread Jayamal Jayamaha

Hello I am currently working on a project that is using apache lucene 4.1.0 version. Now I need to upgrade that version to 9.11.1. So I configure the imports and configure the codebase according to the new lucene version. Now I need to upgrade existing indexes which have been created using lucene

Re: Lucene LRUQueryCache question

2024-07-16 Thread Yixun Xu

This post explains why Lucene doesn't cache all queries: https://www.mail-archive.com/java-user@lucene.apache.org/msg51649.html Your queries could be skipping the cache because of the LRUQueryCache constructor parameters, or because of the QueryCachingPolicy.shouldCache predicate. They pro

Lucene LRUQueryCache question

2024-07-02 Thread Δημήτρης Κλειναυτάκης

Hi all, I am using the Lucene 9.6 version and I am trying to add queries into LRUQueryCache from my benchmarks that evaluate the queries and create the LRUQueryCache. First, I believed that Lucene puts the queries by default into queryCache but that was never the case. So, I read the

[ANNOUNCE] Apache Lucene 9.11.1 released

2024-06-27 Thread Ignacio Vera

The Lucene PMC is pleased to announce the release of Apache Lucene 9.11.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform

Replacing DuplicateFilter with DiversifiedTopDocsCollector in Lucene 7.0.0

2024-06-19 Thread elegant . car3901

Hi there, I am currently updating an old project that was based on Apache Lucene 4.6.0. The project used a DuplicateFilter to filter search results with the following code: TopDocs docs = searcher.search(query, new DuplicateFilter(field), Integer.MAX_VALUE, new Sort(new SortField(sortField

[ANNOUNCE] Apache Lucene 9.11.0 released

2024-06-06 Thread Benjamin Trent

The Lucene PMC is pleased to announce the release of Apache Lucene 9.11.0. Apache Lucene is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-15 Thread sanjay dutt

I opened an issue for this one (https://github.com/apache/lucene/issues/13373). Please feel free to edit or add more info to it. Regards, Sanjay On Wed, May 15, 2024 at 8:07 PM Michael McCandless < luc...@mikemccandless.com> wrote: > Thanks Jeven, more response inlined below: >

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-15 Thread Michael McCandless

itional" (nice token btw) as many times as you like into a Lucene index, even force merging down to a single segment, is perfectly allowed, and it certainly should not throw an exception, let alone a cryptic one like this! That's a valid use-case. So we really need to understand

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-14 Thread Jerven Tjalling Bolleman

check in of our code 18 years ago! since then our data has grown a bit ;) The code was using Lucene 1.4.3 at that time. Users would search using this as what now would be a facet `type:positional`. I changed this to a field only IndexOptions.DOCS which is called 'positional' and searc

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-14 Thread Michael McCandless

> > > > Your response is very helpful already and I very much appreciate it as > > > it cuts down the search space significantly. > > > > > > Regards, > > > Jerven > > > > > > > > > On 5/7/24 14:03, Michael Sokolov wrote:

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov

I very much appreciate it as > > it cuts down the search space significantly. > > > > Regards, > > Jerven > > > > > > On 5/7/24 14:03, Michael Sokolov wrote: > >> It seems as if the term frequency for some term exceeded the maximum. > >>

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman

Regards, Jerven On 5/7/24 14:03, Michael Sokolov wrote: It seems as if the term frequency for some term exceeded the maximum. This can happen if you supplied custom term frequencies eg with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttrib

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman

term frequencies eg with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true . The behavior didn't change since 8.x but it's possible that the merging brought together some very "high frequency" terms th

Re: ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Michael Sokolov

It seems as if the term frequency for some term exceeded the maximum. This can happen if you supplied custom term frequencies eg with https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true . The behavior didn't c
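
To make the failure mode described in this reply concrete, here is a minimal stand-alone sketch in plain Java. It is not Lucene's merge code: the class TermFreqOverflowSketch and the helper mergeFrequencies are hypothetical, and the only assumption is that per-segment counts which each fit in an int are summed into an int when segments are combined.

```java
// Hypothetical illustration, NOT Lucene code: counts that fit in an int on
// their own can exceed Integer.MAX_VALUE once they are added together.
public class TermFreqOverflowSketch {

    // Sums the per-segment frequencies of one term. Math.addExact throws
    // ArithmeticException instead of silently wrapping to a negative value.
    public static int mergeFrequencies(int[] perSegmentFreqs) {
        int total = 0;
        for (int freq : perSegmentFreqs) {
            total = Math.addExact(total, freq);
        }
        return total;
    }

    public static void main(String[] args) {
        // Small counts add up without any problem.
        System.out.println(mergeFrequencies(new int[] {1_000, 2_000}));

        // Two large counts are individually valid but overflow when summed.
        try {
            mergeFrequencies(new int[] {Integer.MAX_VALUE, 1});
        } catch (ArithmeticException e) {
            System.out.println("merge failed: " + e.getMessage());
        }
    }
}
```

This is consistent with an error that only appears once the data has grown or once previously separate segments are merged, since each input can look valid in isolation.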

ArithmeticException: due to integer overflow during lucene merging

2024-05-07 Thread Jerven Tjalling Bolleman

Dear Lucene community, This morning I found this exception in our logs. This was the first time we indexed this data with Lucene 9.10. Before, we were still on the Lucene 8.x branch. Between the last indexing with 8 and this one with 9.10 we have a bit more data so it could be something else
