Re: RAM-per-thread hard limit

2025-05-23 Thread Uwe Schindler
t addressing, such as the `ByteBlockPool.byteOffset` above, and perhaps others, but Lucene is including the entire RAM usage into the limit, and therefore builds unnecessarily small segments. Viliam -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.d

Re: Crashes caused by high deleted .dvd file mmap counts

2025-05-13 Thread Uwe Schindler
ppings aren't being "released" after being deleted? Justin On Fri, May 9, 2025 at 7:03 AM Uwe Schindler wrote: Hi, did sharedArenaMaxPermits=64 help? Sorry for my earlier answer; I did not realize that you were talking about doc values updates. I only saw "deleted". Bu

Re: Crashes caused by high deleted .dvd file mmap counts

2025-05-09 Thread Uwe Schindler
146 /usr/share/opensearch/data/nodes/0/indices/Ci3MyIbNTceUmC67d1IlwQ/126/index/_9buk_4e_Lucene90_0.dvd (deleted) ``` Justin On Wed, May 7, 2025 at 9:50 AM Uwe Schindler wrote: Hi, this could be related to a bug or limitation of the following change: 1. GITHUB#13570 <https://github.

Re: Expressions module, support of Strings

2025-05-07 Thread Uwe Schindler
that? I can see that Javascript.g4 (Antlr grammar) references a "STRING" but can't make out if/how it's used in JavaScriptCompiler. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail:u...@thetaphi.de

Re: Crashes caused by high deleted .dvd file mmap counts

2025-05-07 Thread Uwe Schindler
file to a single shared arena. (Chris Hegarty, Michael Gibney, Uwe Schindler) Actually it looks like there are many deletes on the same index segment, so the segment itself is not closed but the deletes are updated over and over. As the whole segment uses the same shared memory arena and it won

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-29 Thread Uwe Schindler
Maybe there is some lock on the Lucene index files, and because of it, deleting the Lucene index files does not work when stopping the service. But this is a guess. Investigation is ongoing. Do you suspect anything? Regards Rajib -Original Message- From: Uwe Schindler Sent: 28 April 2025 17

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-28 Thread Uwe Schindler
art now. Do you have any suggestion on the problem ? Regards Rajib -Original Message- From: Uwe Schindler Sent: 25 April 2025 18:19 To: java-user@lucene.apache.org Subject: Re: Suggestion needed for a case of Lucene Migration with TokenStream Hi, I'd like to mention the following: Yo

Re: Suggestion needed for a case of Lucene Migration with TokenStream

2025-04-25 Thread Uwe Schindler
index/PayloadTokenStream.java#uid 24); PayloadAttributeImpl attributeImpl = new PayloadAttributeImpl(new BytesRef(buffer)); addAttributeImpl(attributeImpl); returnToken = true; } public boolean incrementT

Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread Uwe Schindler

Re: Synonyms and searching

2025-03-10 Thread Uwe Schindler
word for word synonyms output (e.g. if the content contains "license", the emerging tokens include both "licence" and "license"), but the phrase substitutions are not. "http", "proxy" and "server " are there, but none of the conj

Re: Reg Migration to 10.0.0 lucene core jar

2025-01-03 Thread Uwe Schindler
changed to NIOFSDirectory? Also we are using lucene-analyzers-common-4.7.0.jar, lucene-queries-4.7.0.jar, lucene-queryparser-4.7.0.jar, lucene-sandbox-4.7.0.jar. When lucene core is upgraded is it recommended to upgrade all these jars. Regards, Lavanya -- Uwe Schindler Achterdiek 19, D-28357

Re: Support for static analysis annotations

2025-01-03 Thread Uwe Schindler
ies. I would like to propose annotating Lucene methods with "org.apache.lucene.annotations.Nullable" where applicable, and perhaps similar for @MustBeClosed. -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.theta

Re: Custom Query Implementation

2025-01-03 Thread Uwe Schindler
, ... termN -> (doc_id1, score), (doc_idN, score), ... Where resulting score will be calculated as: sum(scores) by doc_id for terms in some query Thank you in advance! Best Regards, Viacheslav Dobrynin! -- Sincerely yours Mikhail Khludnev -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://ww

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-15 Thread Uwe Schindler
oughts here. Thanks Navneet On Tue, Oct 1, 2024 at 2:55 AM Uwe Schindler wrote: Hi, thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possib

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
forward to your feedback. Thanks Navneet On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler wrote: Hi, this seems to be a special case in FlatVectors, because normally there's a separate method to open an IndexInput for checksumming: https://github.com/apache/lucene/blob/524ea208c870861a71

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
nks Navneet On Tue, Oct 1, 2024 at 12:19 AM Uwe Schindler wrote: Hi, this seems to be a special case in FlatVectors, because normally there's a separate method to open an IndexInput for checksumming: https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
ting a GH issue. If you believe I should create a GH issue first I can do that. As it might take me some time to build reproducible benchmarks. Thanks Navneet On Mon, Sep 30, 2024 at 3:08 AM Uwe Schindler wrote: Hi, please also note: In Lucene 10 the checksum IndexInput will always be opened w

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-30 Thread Uwe Schindler
seeing is close to 50%. Hence the performance question is coming up, I wanted to understand is this understanding correct? Thanks Navneet -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de --

Re: Current command line tools for Lucene?

2024-09-25 Thread Uwe Schindler
ontent of stored fields (and not indexes or position/offset data). If this is the case then I'm sure ChatGPT can spit out a snippet of code to read an index and dump stored fields to stdout. For anything more advanced, you'll have to write Java code and traverse the data structures of in

Re: fuzzy search and distance tilde

2024-08-20 Thread Uwe Schindler
lly that's how fulltext search works. Take the user-entered text and tokenize/analyze it in the same way as you do at indexing time, and then find token matches in the index for the query tokens. Uwe On 19/08/2024 12:32, Uwe Schindler wrote: Hi, Basically, my only recommendation is to NOT use the

Re: fuzzy search and distance tilde

2024-08-19 Thread Uwe Schindler
.SearchOperation.doRun(SearchOperation.java:202) [classes/:?] at org.events.business.search.operations.ReadFromIndexOperation.run(ReadFromIndexOperation.java:29) [classes/:?] at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:5

Re: Slow HNSW creation times.

2024-04-29 Thread Uwe Schindler
g or continuous Garbage Collection pauses. Greatly appreciate any pointers or thoughts on how to further debug this issue or improve the performance. Thanks Kannan Krishnamurthy. -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u

Re: Right Way to Read vectors from Index

2024-02-12 Thread Uwe Schindler

Re: Need suggestion for a Lucene upgrade scenario

2024-01-30 Thread Uwe Schindler
ther information is required from my side. Regards Rajib -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additi

Re: NumericRangeQuery in Lucene 5.5.5: replacing the deprecated setBoost while keeping the NumericRange type?

2023-11-26 Thread Uwe Schindler
Lucene 5.5.5 where setBoost is deprecated for all Query types. How to set the boost of a NumericRangeQuery while preserving the NumericRangeQuery type? BoostQuery doesn't allow this and I haven't found a way. Thanks for your help. Claude Lepère -- Sincerely yours Mikhail Khludnev -- Uwe

Re: StandardQueryParser and numeric fields

2023-11-14 Thread Uwe Schindler
but I can't seem to track down what I'm missing. The analyzer is the exact same analyzer I'm using during indexing. It's a PerFieldAnalyzerWrapper. The specific analyzer for the numeric fields is the one I mentioned above (StandardAnalyzer). The query used is: index

Re: DisjunctionMinQuery

2023-11-09 Thread Uwe Schindler
a common query to use? Thanks! Marc -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-11-08 Thread Uwe Schindler
eader fieldsReader(SegmentReadState state)throws IOException {     return delegate.fieldsReader(state);     }     @Override public int getMaxDimensions(String fieldName) {     log.info("Maximum vector dimension: " +maxDimensions);     return maxDimensions;     } } Am

Re: Preventing field data from being loaded into page cache

2023-10-21 Thread Uwe Schindler
d into the page cache. Does Lucene have any mechanisms to explicitly prevent them from being cached? Is it even possible with Java? Thanks, Justin Borromeo -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@t

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536

2023-10-19 Thread Uwe Schindler

Re: How to replace deprecated document(i)

2023-09-25 Thread Uwe Schindler
Like this? Thanks Michael On 25.09.23 at 10:28, Uwe Schindler wrote: Background: For performance, it is advisable to get the storedFields() *once* to process all documents in the search result. The reason for the change was the problem of accessing stored fields would otherwise need t

Re: How to replace deprecated document(i)

2023-09-25 Thread Uwe Schindler

Re: forceMerge(1) leads to ~10% perf gains

2023-09-22 Thread Uwe Schindler
o query and still maintain accuracy than simply word tokenizing a sentence and joining with OR text: ? -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail

Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Uwe Schindler
implementation front, I am using a set of filters like LowerCaseFilter, EnglishPossessiveFilter etc. in addition to the base tokenizer, StandardTokenizer. Per my analysis, StandardTokenizer strips off the % sign, hence the behavior. Has someone faced a similar requirement? Any help/guidance is highly appreciated.

Re: Reindexing leaving behind 0 live doc segments

2023-09-13 Thread Uwe Schindler
process there are no more 7.x segments as referenced by the segments_x file. But for some reason the physical 7.x segment files continue to stay behind until I restart Solr. Thanks, Rahul On Mon, Sep 4, 2023 at 7:18 AM Uwe Schindler wrote: Hi, in Solr the empty segment stays open as long as there

Re: Reindexing leaving behind 0 live doc segments

2023-09-04 Thread Uwe Schindler
ase(rld); }finally{ if (iwRef != null) { iwRef.decref(); } } Help would be much appreciated! Thanks, Rahul -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsu

Re: Disjunctively scoring non-matching conjunctive clauses

2023-07-21 Thread Uwe Schindler
arate this into a matching query that is wrapped by a ConstantScore query so it has no score and a scoring query that will provide a disjunctive score. My approach feels a bit convoluted, so I was wondering if there were any cleaner ways to do this? And if not, are there any drawback

Re: Getting LinkageError due to Panama APIs

2023-06-30 Thread Uwe Schindler
y.java:448) : : Thanks, Shubham -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: j

Re: Question about index segment search order

2023-05-13 Thread Uwe Schindler
test generated segments are searched first? Thanks, Wei - To unsubscribe, e-mail:java-user-unsubscr...@lucene.apache.org For additional commands, e-mail:java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail:u...@thetaphi.de

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-02-10 Thread Uwe Schindler
Exception e) { e.printStackTrace(); } == Regards Rajib -Original Message- From: Uwe Schindler Sent: 06 February 2023 16:46 To: java-user@lucene.apache.org Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2 Hi, Since around Lucene 4 (maybe alre

Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

2023-02-06 Thread Uwe Schindler
DFwKIBRNEPHczjND9Wa%2FdPzJAYByUqnbAs%3D&reserved=0 A caveat: Cyrillic! - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Acht

Re: Question about current situation of good first issues in GitHub

2023-01-10 Thread Uwe Schindler

Re: Need your perspective on Garbage Collection

2023-01-03 Thread Uwe Schindler
help me in managing it and share your insight on which steps or configuration I should prefer to optimize it. My index size is 700 GB. What configuration do you suggest for it, e.g. JVM, RAM, CPU cores, heap size, young and old generation? I hope to hear from you soon - -- Uwe

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
ion. It is unfortunate that there seem to be problems with this solution. Microsoft does not seem interested in extending the volume mapping options for ACIs, and K8s is overkill for our use case. Thank you for your help so far, you have been very kind :) Cheers, Seb On 2 Jan 2023, at 19:09, Uwe

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
ter using MMapDirectory and no enable-preview, as you suggested. Let’s see what happens. Cheers, Seb On 2 Jan 2023, at 17:51, Uwe Schindler wrote: Hi, in recent versions it works like that: https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html#set-jvm-op

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
st how, and I cannot find anything on the ES website. Many thanks. Seb On 2 Jan 2023, at 11:48, Uwe Schindler wrote: Hi, in general you can still use MMapDirectory. There is no requirement to set vm.max_map_count for smaller clusters. The information in Elastics documentation is not mandatory an

Re: Recurring index corruption

2023-01-02 Thread Uwe Schindler
ucene-core-9.3.0.jar:?] Many thanks. Seb -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For addit

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Uwe Schindler
produces the correct result. Should I open the reader before closing the writer? Thanks Michael On 08.12.22 at 11:36, Uwe Schindler wrote: You have to reopen the index reader to see deletes from the IndexWriter. On 08.12.2022 at 10:32, Hrvoje Lončar wrote: Did you call this method before or

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread Uwe Schindler
in in more detail what this method is doing? Thanks Michael -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apac

Re: Sort by numeric field, order missing values before anything else

2022-11-21 Thread Uwe Schindler
e the same problem, just for a different long value. Besides writing a custom comparator, is there any simpler and still performant way to achieve this sort? --Petko -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de

Re: Migrating WhitespaceTokenizerFactory from 8.2 to 9.4

2022-10-29 Thread Uwe Schindler
sis.TokenizerFactory containing: org.apache.lucene.analysis.core.WhitespaceTokenizerFactory What am I missing? Any help would be appreciated. Thanks, David Shifflett -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMa

Re: java 17 and older lucene (4.x)

2022-09-26 Thread Uwe Schindler
esults let me know as well. Thanks - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thet

Re: Lucene 9.2.0 build fails on Windows

2022-09-14 Thread Uwe Schindler
hat fails. Something is definitely wrong because I'm on Windows and it works for me like a charm. Dawid - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.

Re: [External] Re: Can lucene be used in Android ?

2022-09-12 Thread Uwe Schindler
'? Thanks, David Shifflett Senior Lead Technologist Enterprise Cross Domain Solutions (ECDS) Booz Allen Hamilton On 9/10/22, 5:30 AM, "Uwe Schindler" wrote: Hi Jie, actually the Lucene 9.x series requires JDK 11 to run, previous versions also work with Java 8. The ma

Re: Can lucene be used in Android ?

2022-09-10 Thread Uwe Schindler

Re: How to filter KnnVectorQuery with multiple terms?

2022-08-31 Thread Uwe Schindler
queryVector, k, filter); but it is not clear to me how I can filter for multiple terms. Should I subclass MultiTermQuery and use as filter, just as I use TermQuery as filter above? Thanks Michael -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u

Re: [ANNOUNCE] Issue migration Jira to GitHub starts on Monday, August 22

2022-08-24 Thread Uwe Schindler
- Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://w

Re: Can I integrate Apache Lucene with Dovecot POP3/IMAP incoming mail server to perform indexing and fast searching of email messages?

2022-08-13 Thread Uwe Schindler

Re: Lucene Disable scoring

2022-07-11 Thread Uwe Schindler
ction calls can cause delay. As a result I'm looking for a trick to ignore the function call and have all no scoring on my whole query Is it possible to ignore this step? thanks a million -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@t

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
ene/core/src/java/org/apache/lucene/search/FuzzyTermsEnum.java#L248-L256 So in short the exact term gets a boost factor of 1 in the resulting term query, all other terms a lower one. Uwe -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail:u...@thetaphi.de

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
docCount, total number of documents with field 1.0 = tf(freq=1.0), with freq of: 1.0 = freq, occurrences of term within document 0.70710677 = fieldNorm - To unsubscribe, e-mail: java-user-unsubscr...@luc

Re: Fwd: Finding out which fields matched the query

2022-06-27 Thread Uwe Schindler
h time. I wonder what is the efficient way to get the matched fields. Would you please offer some help? Thank you so much! Best regards, Yichen Sun -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@theta

Re: Regarding field cache

2022-06-08 Thread Uwe Schindler
field cache is getting cleared. Can you please help to clarify this. On 2022/06/08 17:46:50 Uwe Schindler wrote: Hi, You do not necessarily need a commit. If you use SearcherManager in combination with NRTCachingDirectory you can also refresh your searcher every few seconds, so in-memory cached

Re: Regarding field cache

2022-06-08 Thread Uwe Schindler
Murali: Thanks Uwe! A new searcher opens when we do a commit. Apart from this, are there other scenarios where a searcher would be refreshed? On 2022/06/08 16:43:07 Uwe Schindler wrote: Hi, They get evicted when the segment of that index is closed. After that there's no reference to them an

Re: Regarding field cache

2022-06-08 Thread Uwe Schindler
or if there is any other scenario which could evict the unused entries from fieldcache. Please help to clarify the same. Thanks Poorna -- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de ---

Re: Index corruption and repair

2022-05-05 Thread Uwe Schindler
*Using*:

- Python 3.8.10
- Pylucene 6.5.0
- Java 8 (1.8.0_181)
- Runs on Linux and Windows (error seen on Windows)

We suddenly get the following *error*:

    2022-02-10 09:58:09.253215: ERROR : writer | Failed to get index (D:\i\202202) writer, Exception: org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(MMapIndexInput(path="D:\i\202202\segments_fo")))

After this, no further indexing happens - trying to open the index for writing throws the above error - and the index writer does not open.

FYI, our code contains the following *settings*:

    index_path = "D:\i\202202"
    index_directory = FSDirectory.open(Paths.get(index_path))
    iconfig = IndexWriterConfig(wrapper_analyzer)
    iconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND)
    iconfig.setRAMBufferSizeMB(16.0)
    writer = IndexWriter(index_directory, iconfig)

*Repairing*
We tried 'repairing' the index with the following command / tool:

    java -cp lucene-core-6.5.0.jar:lucene-backward-codecs-6.5.0.jar org.apache.lucene.index.CheckIndex "D:\i\202202" -exorcise

This however returns saying "No problems found with the index."

*Work around*
We have to manually delete the problematic segment file: D:\i\202202\segments_fo after which the application starts again... until the next corruption. We can't spot a specific pattern.

*Two questions:*

1. Can we handle this situation programmatically, so that no manual intervention is needed?
2. Any reason why we are facing the corruption issue in the first place?

Before this we were using Pylucene 4.10 and we didn't face this problem - the application logic is the same.

Also, while the application runs on both Linux and Windows, so far we have observed this situation only on various Windows platforms.

Would really appreciate some assistance. Thanks in advance.

Regards,
Antony

-- Adrien

-- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Returning large resultset is slow and resource intensive

2022-03-08 Thread Uwe Schindler
Hi, > For our use case, we need to run queries which return the full > matched result set. In some cases, this result set can be large (50k+ > results out of 4 million total documents). > Perf test showed that just 4 threads running random queries returning 50k > results make Lucene utilize 100% C

RE: Migration from Lucene 5.5 to 8.11.1

2022-01-17 Thread Uwe Schindler
"*initially* created with 6.x". - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: András Péteri > Sent: Thursday, January 13, 2022 9:59 AM > To: java-user@lucene.apache.org > Subject:

RE: migration from lucene 5 to 8

2022-01-17 Thread Uwe Schindler
Hi, no that's expected. See my other post as response to another question a minute ago. Uwe ----- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Sascha Janz > Sent: Wednesday, January 12, 202

RE: Moving from lucene 6.x to 8.x

2022-01-17 Thread Uwe Schindler
By the way > Hi, one thing that always works to "forcefully" upgrade without reindexing. > You just merge the old index into a completely new index not by copying files, but by sending their SegmentReaders to addIndex, stripping all metadata from them with some trick: > https://lucene.apac

RE: Moving from lucene 6.x to 8.x

2022-01-17 Thread Uwe Schindler
ately. This may be a bit slower as the whole index needs to be processed, but it is still faster than reindexing. If you have incorrect offsets, the process will fail, so there's no risk. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaph

Re: Log4j

2021-12-15 Thread Uwe Schindler
schrieb Ali Akhtar : >Does Lucene not have any internal logging at all, e.g for debugging? > >On Thu, Dec 16, 2021 at 2:49 AM Uwe Schindler wrote: > >> Hi, >> >> Lucene is an API and does not log with log4j. >> >> Only the user interface Luke uses log4j, but

Re: Log4j

2021-12-15 Thread Uwe Schindler
is not affected by the latest bug, right? >I saw on Solr News page there are some fixes already made to Solr. >Best regards -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Java 17 and Lucene

2021-10-26 Thread Uwe Schindler
> > don't hang. It does not happen all the time (about 1/4th of all builds), and due to the fact that the JVM is unresponsive it is not possible to get a stack trace with "jstack". If you know

RE: Java 17 and Lucene

2021-10-19 Thread Uwe Schindler
Hi, > > On a side note, the Lucene codebase still uses the deprecated (as of > > JDK17) AccessController > > in the RamUsageEstimator class. > > We suppressed the warning for now (based on recommendations > > > > dev/202106.mbox/%3CJIRA.1336944

RE: Java 17 and Lucene

2021-10-19 Thread Uwe Schindler
Hi, > Hey, > > Our team at Amazon Product Search recently ran our internal benchmarks with > JDK 17. > We saw a ~5% increase in throughput and are in the process of > experimenting/enabling it in production. > We also plan to test the new Corretto Generational Shenandoah GC. I would a bit carefu

RE: IntervalQuery replacement for SpanFirstQuery? Closest replacement for slops?

2021-10-08 Thread Uwe Schindler
as sibling should clauses? Other suggestions? Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alan Woodward > Sent: Monday, September 21, 2020 7:56 PM > To: Dawid Weiss > Cc: Lucene Users

Re: Question about readVint & writeVint from DataOutput and DataInput

2021-09-03 Thread Uwe Schindler
be supported but should >be avoided? Should I submit a PR to prevent negative integers? -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de
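For readers unfamiliar with the encoding being discussed, the following is a minimal sketch of a 7-bit variable-length integer scheme of the kind `writeVInt`/`readVInt` are understood to use. The names `VIntSketch`, `encode` and `decode` are hypothetical and this is not Lucene's own code; it only illustrates why a negative `int` always costs five bytes under such a scheme, which is presumably the reason negative values are discouraged.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of Lucene.
class VIntSketch {

    // Emit 7 payload bits per byte, low-order bits first; the high bit of
    // each byte signals that another byte follows.
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift, so negative inputs terminate
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reassemble the int from the 7-bit groups until a byte without the
    // continuation bit is seen.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int v : new int[] {0, 127, 128, 16384, Integer.MAX_VALUE, -1}) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decoded " + decode(enc));
        }
    }
}
```

In this sketch small non-negative values fit in one or two bytes, while any negative value has its sign bit set and therefore needs all five bytes, one more than a fixed-width `int`.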

Re: Range query with Lucene7.7.1 on old indexes.

2021-09-01 Thread Uwe Schindler
"), long("20190101115959")) > >No results. > >query = LongPoint.newRangeQuery("xdate", long("2019010100"), >long("20190101115959")) > >No results. > >How to get the results on my old indexes using date range query? > >Can anyone help? > >Thanks -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: lucene 4.10.4 punctuation

2021-08-25 Thread Uwe Schindler
Hi, you should explain to us what exactly you want to do: How do you want to search, and what do your documents look like? Why is it important to match on punctuation, and what should this matching look like? Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u

RE: Failed to execute Ant run-task command

2021-08-19 Thread Uwe Schindler
Could you please open an issue? Can you also check if it still happens on main branch with Lucene 9.0 and Gradle as build system? - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: xiaoshi > Sent:

RE: NRT readers and overall indexing/querying throughput

2021-08-08 Thread Uwe Schindler
ormance and also search performance go down depending on refresh rate. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alexander Lukyanchikov > Sent: Wednesday, August 4, 2021 4:43 AM > To: jav

RE: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Uwe Schindler
. The posting list of each term can only store internal, numeric lucene doc ids. Those have then to be used to lookup the actual contents from e.g. stored fields (possibility A) or DocValues (possibility B). We can't store UUIDs in the highly compressed posting list. Uwe ----- Uwe Schi

RE: Control the number of segments without using forceMerge.

2021-07-05 Thread Uwe Schindler
exes. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alex K > Sent: Monday, July 5, 2021 4:04 AM > To: java-user@lucene.apache.org > Subject: Control the number of segments without using forceMer

RE: Does Lucene have anything like a covering index as an alternative to DocValues?

2021-07-05 Thread Uwe Schindler
x. If you still need to store it as DocValues field, just add it with both types. Uwe ----- Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alex K > Sent: Monday, July 5, 2021 2:30 AM > To: java-user@

RE: Changing Term Vectors for Query

2021-06-07 Thread Uwe Schindler
applicable. If you want to have "per document" scoring factors (not per term), you can also use additional DocValues fields with per-document factors and you can use a function query (e.g. using expressions module) to modify the score. Uwe - Uwe Schindler Achterdiek 19,
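
The idea of a per-document scoring factor can be sketched without Lucene as follows. This is a toy model under stated assumptions: the class `PerDocFactorSketch`, the `factorColumn` array (standing in for a numeric DocValues field indexed by doc ID), and the choice of plain multiplication are all illustrative and are not taken from the thread. In Lucene itself the combination would be expressed through a function query, as the message says.

```java
import java.util.Arrays;
import java.util.Comparator;

// Toy model only: arrays indexed by doc ID stand in for query scores
// and for a per-document DocValues column. This is NOT Lucene's API.
public class PerDocFactorSketch {

    /** Combines the query score with the document's own factor (assumed: multiplication). */
    public static double adjustedScore(double queryScore, double factor) {
        return queryScore * factor;
    }

    /** Returns doc IDs ordered by adjusted score, highest first. */
    public static int[] rank(double[] queryScores, double[] factorColumn) {
        Integer[] ids = new Integer[queryScores.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Sort descending by negating the adjusted score.
        Arrays.sort(ids, Comparator.comparingDouble(
                (Integer d) -> -adjustedScore(queryScores[d], factorColumn[d])));
        return Arrays.stream(ids).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        double[] queryScores = {2.0, 1.5, 1.0};  // per-document score from the query
        double[] factorColumn = {0.5, 1.0, 2.0}; // per-document factor
        System.out.println(Arrays.toString(rank(queryScores, factorColumn)));
    }
}
```

Because the factor is looked up per document rather than per term, it changes the ranking without touching the term statistics that produced the original score.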

RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Uwe Schindler
ted from FSDirectory > if (dir.getPreload() == false) > dir.setPreload(Constants.PRELOAD_YES); // In-Memory Lucene Index > enabled-> *here setPreload cannot be used* > IndexReader reader = DirectoryReader.open(dir); > IndexSearcher is = new IndexSearcher(reader); >

RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

2020-12-14 Thread Uwe Schindler
l live in/off heap and are part of usual paging. They are just no longer backed by a file. Lucene does most of the stuff outside heap, live with it! Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: baris

RE: Lucene Migration query

2020-11-20 Thread Uwe Schindler
Hi, > Currently I am using Lucene 7.3, I want to upgrade to lucene 8.5.1. Should > I do reindexing in this case ? No, you don't need that. > Can I make use of backward codec jar without a reindex? Yes, just add the JAR file to your classpath and it can read the indexes. Updates written to the

Re: best way (performance wise) to search for field without value?

2020-11-13 Thread Uwe Schindler
o leave the groups_allowed field empty when the document >> should >> >> able to be retrieved by all users, so we need to also select a >document >> if >> >> the 'groups_allowed' is empty. >> >> >> >> What would be the faster Query construction to do so? >> >> >> >> >> >> Currently I use a TermRangeQuery that basically matches all values >and >> put >> >> that in a MUST_NOT combined with a MatchAllDocumentQuery(), but >that >> gets >> >> rather slow then the number of groups is high. >> >> >> >> Thanks! >> >> >> > >> -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

Re: BooleanQuery: BooleanClause.Occur.MUST_NOT seems to require at least one BooleanClause.Occur.MUST

2020-11-06 Thread Uwe Schindler
tion with a BooleanQuery with just >>> a BooleanClause.Occur.MUST (i.e. results will return fine if they >match). >>> >>> Is this by design or is this an issue? >>> >>> Thanks You, >>> Nissim Shiman >> >> >> >> -- >> Adrien > > >- >To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >For additional commands, e-mail: java-user-h...@lucene.apache.org > -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: stucked indexing process

2020-10-14 Thread Uwe Schindler
ConcurrentMergeScheduler to enable SSD or spinning disk default settings in your solrconfig.xml: true Use "true" for spinning disks and "false" for SSDs. This prevents the auto-detection from running. Uwe - Uwe Schindler Achterdiek 19, D-28357

Re: Links to classes missing for BMW

2020-10-12 Thread Uwe Schindler
020 4:22:43 PM UTC, baris.ka...@oracle.com wrote: >Hi Uwe,- > >Could You please point me to the class documentation please? > >Best regards > > >On 10/12/20 12:16 PM, Uwe Schindler wrote: >> BMW support is in Lucene since version 8.0. >> >> Uwe >> &

Re: Links to classes missing for BMW

2020-10-12 Thread Uwe Schindler
" so it implies support for Lucene, too, right? > >Best regards > > > >- >To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >For additional commands, e-mail: java-user-h...@lucene.apache.

Re: Fuzzy Search Scoring Adjustment

2020-09-23 Thread Uwe Schindler
ing logic but >otherwise function exactly the same would also work, but all of those >are >either final classes or have no public constructor, effectively making >it >impossible to reuse their logic directly, as near as I can tell. > >If anyone has any ideas of how to approach this, it would be very >helpful. > >Thanks, >Kainoa -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

IntervalQuery replacement for SpanFirstQuery? Closest replacement for slops?

2020-09-21 Thread Uwe Schindler
estion: What's the "closest" replacement for a PhraseQuery with slop? Should I use maxwidth(slop + 1) or maxgaps(slop-1) or maxgaps(slop). I know SpanQuery slops cannot be fully replaced with intervals, but I don't care about those SpanQuery bugs. Uwe - Uwe Schindler Achterdiek

RE: [VOTE] Lucene logo contest, third time's a charm

2020-09-06 Thread Uwe Schindler
Hi, My votes (binding): A1, D Reason: I want to keep the original Lucene colors, so A1 is the only alternative. I still really like the old one, if it were better vectorized, so my second choice is D. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https

Re: Tessellate exception in Elasticsearch

2020-06-04 Thread Uwe Schindler
CRS features in recent ES development. My fault. Uwe On June 4, 2020 1:40:51 PM UTC, Uwe Schindler wrote: >Hi, > >Yes. With different projections there is one issue: Elasticsearch only >converts the polygon points to WGS84. But depending on the projection, >the lines between

Re: Tessellate exception in Elasticsearch

2020-06-04 Thread Uwe Schindler
> > [41.9057321381, 44.2310018589] [9.3213479767, >> > -3.20048586995] ]. Possible malformed shape detected. >> > at >> > org.apache.lucene.geo.Tessellator.tessellate(Tessellator.java:114) >> > ~[lucene-sandbox-7.7.3.jar:7.7.3 >> 1a0d2a901dfec93676b0fe8be425101ceb754b85 - >> > noble - 2020-04-21 10:31:55] >> > at >> > >> >org.apache.lucene.document.LatLonShape.createIndexableFields(LatLonShape.java:73) >> > ~[lucene-sandbox-7.7.3.jar:7.7.3 >> 1a0d2a901dfec93676b0fe8be425101ceb754b85 - >> > noble - 2020-04-21 10:31:55] >> > at >> > >> >org.elasticsearch.index.mapper.GeoShapeFieldMapper.indexShape(GeoShapeFieldMapper.java:146) >> > ~[elasticsearch-6.8.9.jar:6.8.9] >> > >> > This is a very basic geometry. Could someone please explain why >this >> shape >> > is invalid? >> > >> > >> > >> > >> > Thanks in advance, >> > >> > Wouter Claeys >> > >> -- Uwe Schindler Achterdiek 19, 28357 Bremen https://www.thetaphi.de

RE: Need suggetion in replacing forcemerge(1) with alternative which consumes less space

2020-04-14 Thread Uwe Schindler
is not feasible for > our > use case , because it takes 3X memory. We are creating indexes for huge data. Don't use forceMerge, especially not to work around some issue that comes from wrong multi-threading code and basic misunderstanding on IndexReaders and their relationship to IndexWr
