from:"András Péteri"

Re: What exactly returns IndexReader.numDeletedDocs()

2022-12-08 Thread András Péteri

IIRC, it's the number of documents marked with a "deleted" bit. These documents are obliterated during merges, as segments written by the merge operation no longer include deleted content. So, e.g., if you call forceMerge(1), no previous segment is preserved and, once a reader is reopened, the deleted count drops to 0.
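To make the "deleted bit" idea concrete, here is a toy model in plain Java. It is a hypothetical sketch, not Lucene's implementation: `ToySegment`, `ToyIndex` and `forceMergeToOne` are made-up names, the last one standing in for `IndexWriter.forceMerge(1)` followed by reopening the reader.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical toy model, not Lucene code: a segment keeps all of its documents
// plus one "deleted" bit per document.
final class ToySegment {
    final List<String> docs;
    final BitSet deleted = new BitSet();

    ToySegment(List<String> docs) { this.docs = new ArrayList<>(docs); }

    int maxDoc() { return docs.size(); }
    int numDeletedDocs() { return deleted.cardinality(); }
}

final class ToyIndex {
    final List<ToySegment> segments = new ArrayList<>();

    int numDeletedDocs() {
        int total = 0;
        for (ToySegment s : segments) total += s.numDeletedDocs();
        return total;
    }

    // Deleting only flips the bit; the document itself stays in the segment.
    void delete(String id) {
        for (ToySegment s : segments)
            for (int i = 0; i < s.maxDoc(); i++)
                if (s.docs.get(i).equals(id)) s.deleted.set(i);
    }

    // Stand-in for forceMerge(1): only live documents are copied into the
    // single new segment, so no deleted bits survive the merge.
    void forceMergeToOne() {
        List<String> live = new ArrayList<>();
        for (ToySegment s : segments)
            for (int i = 0; i < s.maxDoc(); i++)
                if (!s.deleted.get(i)) live.add(s.docs.get(i));
        segments.clear();
        segments.add(new ToySegment(live));
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.segments.add(new ToySegment(List.of("a", "b", "c")));
        index.segments.add(new ToySegment(List.of("d", "e")));
        index.delete("b");
        index.delete("e");
        System.out.println(index.numDeletedDocs()); // 2
        index.forceMergeToOne();
        System.out.println(index.numDeletedDocs()); // 0
    }
}
```

In real Lucene, an ordinary background merge has the same effect, but only for the segments it happens to rewrite.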

Re: Migration from Lucene 5.5 to 8.11.1

2022-01-13 Thread András Péteri

It looks like Sascha runs IndexUpgrader for all major versions, i.e., 6.6.6, 7.7.3 and 8.11.1. File "segments_91" is written by the 7.7.3 run immediately before the error.
On Wed, Jan 12, 2022 at 3:44 PM Adrien Grand wrote:
> The log says what the problem is: version 8.11.1 cannot read indices c…

Re: how to find out each score contribution from booleanquery components

2019-06-27 Thread András Péteri

Hi Baris, Explanation's output is hierarchical, and the leading "0.0" values you are seeing are the individual contributions of each boolean clause or any other nested query. Going from bottom to top: Term query on countryDFLT = 'states', but no term matched this value --> score is 0.0 for the t
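Reading such output bottom-up amounts to walking a tree in which each node prints its value first and a boolean "sum of" node adds up its clauses. The sketch below models that with a made-up `ExplNode` class (not Lucene's `Explanation`); the `title:lucene` clause is invented for illustration.

```java
import java.util.List;

// Hypothetical stand-in for an explanation tree; not Lucene's Explanation class.
final class ExplNode {
    final float value;
    final String description;
    final List<ExplNode> details;

    private ExplNode(float value, String description, List<ExplNode> details) {
        this.value = value;
        this.description = description;
        this.details = details;
    }

    static ExplNode leaf(float value, String description) {
        return new ExplNode(value, description, List.of());
    }

    // A boolean "sum of" node: its value is the total of its clauses' values.
    static ExplNode sumOf(List<ExplNode> clauses) {
        float total = 0f;
        for (ExplNode c : clauses) total += c.value;
        return new ExplNode(total, "sum of:", clauses);
    }

    // One line per node, indented by nesting depth, value first.
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        sb.append("  ".repeat(depth)).append(value).append(" = ").append(description).append('\n');
        for (ExplNode d : details) d.render(sb, depth + 1);
    }

    public static void main(String[] args) {
        ExplNode root = sumOf(List.of(
                leaf(0.0f, "countryDFLT:states (no matching term)"),
                leaf(1.5f, "title:lucene (hypothetical matching clause)")));
        System.out.print(root.render());
    }
}
```

A clause whose term never matched shows up as a 0.0 line and simply adds nothing to its parent's sum.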

Re: SQL OR in lucene : where ((term1=a and term2=b) OR (term3=a and term4=b)) and context in (2,3,4,5.....200)

2018-08-24 Thread András Péteri

> > But it can be workable, if I manage to apply context condition separately.
> > More probably using custom filtering through Collector interface https://lucene.apache.org/core/7_3_1/core/org/apache/lucene/search/Collector.html.
> > Any idea please.
> > Regards,
> > Khurram
> --
> Tomoko Uchida
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
-- András Péteri

Re: Lucene same search result for worlds with and without spaces

2018-06-20 Thread András Péteri

An n-gram tokenizer/filter might also work for you: http://lucene.apache.org/core/7_3_1/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html Regards, András
On Wed, Jun 20, 2018 at 11:53 AM, Markus Jelsma wrote:
> Hi Egorlex,
> Set the tokenSeparator to "" and ShingleFilter w…
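The reason n-grams can help here is that a spaced and an unspaced spelling share most of their character n-grams. Below is a plain-Java sketch of that effect, assuming simple character bigrams and a made-up overlap measure. `CharNGrams` is a hypothetical helper, not `NGramTokenizer`, and it does not reproduce the tokenizer's exact output order or options.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper illustrating character n-grams; not Lucene's NGramTokenizer.
final class CharNGrams {
    // All distinct substrings of length minGram..maxGram, in order of first occurrence.
    static Set<String> grams(String text, int minGram, int maxGram) {
        Set<String> out = new LinkedHashSet<>();
        for (int start = 0; start < text.length(); start++)
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++)
                out.add(text.substring(start, start + len));
        return out;
    }

    // Fraction of the query's grams that also occur among the document's grams.
    static double overlap(Set<String> query, Set<String> doc) {
        if (query.isEmpty()) return 0.0;
        long shared = query.stream().filter(doc::contains).count();
        return (double) shared / query.size();
    }

    public static void main(String[] args) {
        Set<String> query = grams("newyork", 2, 2);
        Set<String> doc = grams("new york", 2, 2);
        System.out.println(query);               // [ne, ew, wy, yo, or, rk]
        System.out.println(overlap(query, doc)); // 5 of the 6 query bigrams are shared
    }
}
```

Only the bigrams spanning the word boundary ("wy" versus "w " and " y") differ, so both spellings still match on most terms.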

Re: Encryption At Rest - Using CustomAnalyzer

2018-02-06 Thread András Péteri

Hi Avarinth, There is an open issue to encrypt index files using AES; I don't know if that would fit your requirements: https://issues.apache.org/jira/browse/LUCENE-2228 Regards, András
On Tue, Feb 6, 2018 at 8:32 AM, Michael Wilkowski wrote:
> Hi,
> sorry to say that, but your encryption is not…

Re: Maintaining sorting order (stored fields vs DocValue fields) while upgrading Lucene version

2017-07-02 Thread András Péteri

Hi, Note that if you are using Lucene directly, 5.x introduced LUCENE-6064 [1] [2], which adds checks to ensure that the sort field has a corresponding DocValue of the expected type. Indexed fields can only be used for sorting via an UninvertingReader, at a cost of increased heap usage [3]. Solr h

Re: Non-index files under the search directory

2016-11-24 Thread András Péteri

> …ess the solution should be explicitly use getCommitData for each sub-index, then set it into new consolidated search database, right?
> Best,
> --Xiaolong
> On Tue, Nov 22, 2016 at 12:10 PM, András Péteri wrote:
>> Hi Xiaolong,
>> A Map o…

Re: Non-index files under the search directory

2016-11-22 Thread András Péteri

> I am wondering does indexwriter can also merge this non-index file while it merging multiple search index?
> And if I am stepping back a little bit, what's is the best way t…

Re: Are "position" and "position increment" actually the exact same concept?

2016-02-08 Thread András Péteri

> …ter all?
> TX
-- András Péteri

Re: Quiz question: Which Character.isSpaceChar but not isWhitespace?

2015-11-01 Thread András Péteri

> …rch. It’s caused all sorts of head-scratching till we discovered what’s going on.
> Craziness.
> ~ David
> --
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
-- András Péteri
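For the quiz in the subject line, a brute-force scan over the Basic Multilingual Plane answers it directly on a given JDK. On current JDKs we expect it to report U+00A0, U+2007 and U+202F, the non-breaking spaces that `Character.isWhitespace` explicitly excludes. The exact result depends on the JDK's Unicode tables, and `SpaceCharQuiz` is just a name chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class SpaceCharQuiz {
    // Every BMP code unit for which isSpaceChar is true but isWhitespace is false.
    static List<Integer> spaceButNotWhitespace() {
        List<Integer> result = new ArrayList<>();
        for (int c = 0; c <= 0xFFFF; c++) {
            char ch = (char) c;
            if (Character.isSpaceChar(ch) && !Character.isWhitespace(ch)) result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int c : spaceButNotWhitespace()) {
            System.out.printf("U+%04X %s%n", c, Character.getName(c));
        }
    }
}
```

Going the other way, control characters such as '\t' and '\n' are whitespace but not space characters, which is another common source of head-scratching.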

Re: ConjunctionScorer access

2015-10-22 Thread András Péteri

> …l.com]
> Sent: Wednesday, October 21, 2015 7:03 PM
> To: java-user@lucene.apache.org
> Subject: ConjunctionScorer access
> It's a bummer Lucene makes the constructor of ConjunctionScorer non-public. I wanted to extend from this class in order to tweak its behavior for my use case. Is it possible to change it to protected in future releases?
-- András Péteri

Re: IndexWriter is not closing the FDs (deleted files)

2015-09-01 Thread András Péteri

Hi Napoli, You could also create an instance of SearcherManager [1], and let it take care of tracking IndexSearchers; it can also be used to reopen the underlying readers, and close them when they are no longer in use. Calling maybeRefresh() or maybeRefreshBlocking() on the manager ensures that a r
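The reference counting that makes this safe can be sketched in plain Java. Everything here (`Snapshot`, `ToySearcherManager`, `refresh`) is hypothetical and much simpler than Lucene's `SearcherManager`; with the real class, the usual pattern is to pair `acquire()` with `release(...)` in a `finally` block. The point is only that a refresh swaps in a new snapshot while the old one stays open until its last user releases it, at which point its files can be closed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical point-in-time view, standing in for an IndexSearcher over a reader.
final class Snapshot {
    final int generation;
    private final AtomicInteger refCount = new AtomicInteger(1); // the manager's own reference
    volatile boolean closed;

    Snapshot(int generation) { this.generation = generation; }

    void incRef() { refCount.incrementAndGet(); }

    // When the last reference goes away, the underlying resources would be closed here.
    void decRef() {
        if (refCount.decrementAndGet() == 0) closed = true;
    }
}

// Toy manager illustrating acquire/release/refresh; not Lucene's implementation.
final class ToySearcherManager {
    private Snapshot current = new Snapshot(0);

    synchronized Snapshot acquire() {
        current.incRef();
        return current;
    }

    void release(Snapshot s) { s.decRef(); }

    // Publish a newer snapshot and drop the manager's reference to the old one;
    // searches still holding the old snapshot keep it open until they release it.
    synchronized void refresh() {
        Snapshot old = current;
        current = new Snapshot(old.generation + 1);
        old.decRef();
    }

    public static void main(String[] args) {
        ToySearcherManager manager = new ToySearcherManager();
        Snapshot s = manager.acquire();
        try {
            manager.refresh();             // e.g. after new documents were indexed
            System.out.println(s.closed);  // false: still in use by this "search"
        } finally {
            manager.release(s);
        }
        System.out.println(s.closed);      // true: last reference released
    }
}
```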

Re: Mapping doc values back to doc ID (in decent time)

2015-08-09 Thread András Péteri

If I understand it correctly, the Zoie library [1][2] implements the "sledgehammer" approach by collecting docValues for all documents when a segment reader is opened. If you have some RAM to throw at the problem, this could indeed bring you an acceptable level of performance. [1] http://senseidb.
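The "sledgehammer" approach boils down to inverting the per-document values once per segment, trading RAM for constant-time lookups. Below is a plain-Java sketch, assuming one numeric value per document that is unique within the segment. `ValueToDocIndex` is a hypothetical name, and this is not Zoie's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-segment lookup table, built once when the segment is opened.
final class ValueToDocIndex {
    private final Map<Long, Integer> docByValue = new HashMap<>();

    // docValues[docId] holds the value of that document within this segment.
    // Building costs O(maxDoc) time and one map entry per document in memory.
    ValueToDocIndex(long[] docValues) {
        for (int doc = 0; doc < docValues.length; doc++) {
            docByValue.put(docValues[doc], doc); // assumes values are unique per segment
        }
    }

    // Segment-local doc ID holding `value`, or -1 if no document has it.
    int docFor(long value) {
        return docByValue.getOrDefault(value, -1);
    }

    public static void main(String[] args) {
        ValueToDocIndex index = new ValueToDocIndex(new long[] {42L, 7L, 1001L});
        System.out.println(index.docFor(7L)); // 1
        System.out.println(index.docFor(5L)); // -1
    }
}
```

Because segments are immutable, such a table never needs updating. It only has to be rebuilt for new segments, which is what makes the "build on open" strategy workable.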

Re: ignore score and weight in lucene search

2015-07-30 Thread András Péteri

Collector's javadoc in Lucene 4.x includes a bare minimum example which only registers matching documents in a bitset: https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_4/lucene/core/src/java/org/apache/lucene/search/Collector.java#L85 You'll have to adapt this if you want to use it in L
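Stripped of the Lucene types, that minimal collector does two things: it remembers the current segment's doc base, and it sets one bit per matching document. The sketch below models this in plain Java. `BitSetCollector` and its method names are hypothetical and only loosely modelled on the 4.x `setNextReader`/`collect` callbacks, with scoring left out entirely.

```java
import java.util.BitSet;

// Hypothetical model of a collector that ignores scores and records matches in a bitset.
final class BitSetCollector {
    final BitSet bits = new BitSet();
    private int docBase;

    // Called when collection moves to the next segment; docBase is the
    // index-wide ID of that segment's first document.
    void setNextReader(int docBase) { this.docBase = docBase; }

    // Called for each matching segment-local doc ID.
    void collect(int segmentDoc) { bits.set(docBase + segmentDoc); }

    public static void main(String[] args) {
        BitSetCollector c = new BitSetCollector();
        c.setNextReader(0);  // first segment holds docs 0..9
        c.collect(1);
        c.collect(3);
        c.setNextReader(10); // second segment starts at doc 10
        c.collect(0);
        System.out.println(c.bits); // {1, 3, 10}
    }
}
```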

Re: Lucene 5: Wrapping Collector

2015-06-29 Thread András Péteri

Hi, IndexSearcher.search(Query, Collector) will iterate through all segments of the index, call getLeafCollector, and use the returned LeafCollector to collect result documents from that segment [1]. As LeafCollector's javadoc describes [2], there are cases when you want to take into account prec
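A wrapping collector therefore has to wrap at two levels: it asks the inner collector for its per-segment leaf collector and returns its own leaf collector that delegates to it. Here is a plain-Java model of that shape. `ToyCollector`, `ToyLeafCollector`, `ListCollector`, `CountingCollector` and `ToySearch` are hypothetical simplifications, not Lucene's interfaces (which also deal with scorers and `IOException`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mirrors of the Collector/LeafCollector split.
interface ToyLeafCollector { void collect(int doc); }
interface ToyCollector { ToyLeafCollector getLeafCollector(int docBase); }

// Inner collector: records index-wide doc IDs.
final class ListCollector implements ToyCollector {
    final List<Integer> docs = new ArrayList<>();
    public ToyLeafCollector getLeafCollector(int docBase) {
        return doc -> docs.add(docBase + doc);
    }
}

// Wrapper: per segment, gets the inner leaf collector and returns one that
// counts each hit before delegating it.
final class CountingCollector implements ToyCollector {
    private final ToyCollector in;
    int totalHits;
    CountingCollector(ToyCollector in) { this.in = in; }
    public ToyLeafCollector getLeafCollector(int docBase) {
        ToyLeafCollector leaf = in.getLeafCollector(docBase);
        return doc -> { totalHits++; leaf.collect(doc); };
    }
}

final class ToySearch {
    // Stand-in for the search loop: one leaf collector per segment.
    static void search(int[][] matchesPerSegment, int[] docBases, ToyCollector collector) {
        for (int i = 0; i < matchesPerSegment.length; i++) {
            ToyLeafCollector leaf = collector.getLeafCollector(docBases[i]);
            for (int doc : matchesPerSegment[i]) leaf.collect(doc);
        }
    }

    public static void main(String[] args) {
        ListCollector inner = new ListCollector();
        CountingCollector counting = new CountingCollector(inner);
        // Two segments: local matches {0, 2} with docBase 0 and {1} with docBase 5.
        search(new int[][] {{0, 2}, {1}}, new int[] {0, 5}, counting);
        System.out.println(counting.totalHits + " " + inner.docs); // 3 [0, 2, 6]
    }
}
```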

Re: BytesRef violates the principle of least astonishment

2015-05-20 Thread András Péteri

As Olivier wrote, multiple BytesRef instances can share the underlying byte array when representing slices of existing data, for performance reasons. BytesRef#clone()'s javadoc comment says that the result will be a shallow clone, sharing the backing array with the original instance, and points to
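The difference between a shallow clone and a deep copy is easy to demonstrate with a minimal (bytes, offset, length) slice. `ByteSlice` is a hypothetical stand-in, not Lucene's `BytesRef`; its `shallowClone` and `deepCopyOf` only mimic the semantics described for `BytesRef#clone()` and `BytesRef.deepCopyOf`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical minimal slice over a byte array, mirroring BytesRef's three fields.
final class ByteSlice {
    final byte[] bytes;
    final int offset;
    final int length;

    ByteSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Shallow: a new slice object that shares the same backing array.
    ByteSlice shallowClone() { return new ByteSlice(bytes, offset, length); }

    // Deep: copies only the referenced range into a fresh array.
    static ByteSlice deepCopyOf(ByteSlice other) {
        byte[] copy = Arrays.copyOfRange(other.bytes, other.offset, other.offset + other.length);
        return new ByteSlice(copy, 0, copy.length);
    }

    String utf8ToString() { return new String(bytes, offset, length, StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        byte[] buffer = "hello world".getBytes(StandardCharsets.UTF_8);
        ByteSlice word = new ByteSlice(buffer, 6, 5); // "world"
        ByteSlice shallow = word.shallowClone();
        ByteSlice deep = ByteSlice.deepCopyOf(word);
        buffer[6] = (byte) 'W';                       // e.g. the buffer gets reused
        System.out.println(shallow.utf8ToString());   // World
        System.out.println(deep.utf8ToString());      // world
    }
}
```

This is why a BytesRef handed out by an iterator that reuses its buffer has to be deep-copied if it needs to outlive the next call.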

Re: understanding the norm encode and decode

2015-03-05 Thread András Péteri

Sorry, I also got it wrong in the previous message. :) It goes 0.89f -> 123 -> 0.875f.
On Thu, Mar 5, 2015 at 10:08 AM, András Péteri wrote:
> Hi Andrew,
> If you are using Lucene 3.6.1, you can take a look at the method which creates a single byte value out of the receiv…

Re: understanding the norm encode and decode

2015-03-05 Thread András Péteri

Hi Andrew, If you are using Lucene 3.6.1, you can take a look at the method which creates a single byte value out of the received float using bit manipulation at [1]. There is also a 256-element decoder table in Similarity, where each byte corresponds to a decoded float value computed by [2]. The
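The lossy round trip from the follow-up message (0.89f -> 123 -> 0.875f) can be reproduced in plain Java. This is our transcription of the `floatToByte315`/`byte315ToFloat` logic as we understand it from Lucene 3.x's `SmallFloat`, wrapped in a hypothetical `Norm315` class together with a 256-entry decoder table. It is meant as an illustration to compare against the actual sources, not a replacement for them.

```java
// Hypothetical class; the two methods transcribe the SmallFloat "315" encoding logic.
final class Norm315 {
    private static final int ZERO = (63 - 15) << 3; // offset subtracted from the shifted bits (384)

    // Keeps the sign, the exponent and the two highest stored mantissa bits of the
    // float, then shifts the result into the range of a single byte.
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, underflow -> 1
        }
        if (smallfloat >= ZERO + 0x100) {
            return -1;                                 // overflow -> largest value (0xFF)
        }
        return (byte) (smallfloat - ZERO);
    }

    // Inverse mapping; the discarded low mantissa bits come back as zeros.
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Decoder table: one precomputed float for each of the 256 byte values.
    static final float[] NORM_TABLE = new float[256];
    static {
        for (int i = 0; i < 256; i++) NORM_TABLE[i] = byte315ToFloat((byte) i);
    }

    public static void main(String[] args) {
        byte encoded = floatToByte315(0.89f);
        System.out.println(encoded);                    // 123
        System.out.println(NORM_TABLE[encoded & 0xff]); // 0.875
    }
}
```

0.89 is 1.78 × 2^-1; only the top two stored mantissa bits survive, so it decodes as 1.75 × 2^-1 = 0.875.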

Throwing CollectionTerminatedException from Collector.getLeafCollector

2015-03-02 Thread András Péteri

Hi, According to IndexSearcher's code [1], if a Collector implementation is not interested in collecting document hits from a particular leaf reader, it can also throw CollectionTerminatedException from Collector.getLeafCollector(LeafReaderContext). This option is, however, not described in Collecto
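The control flow in question can be modelled in plain Java. An exception from `getLeafCollector` makes the search loop skip the whole leaf, while one thrown during collection only cuts that leaf short. All names here (`ToyCollectionTerminatedException`, `LeafSink`, `LeafSinkFactory`, `ToySearcher`) are hypothetical, and the loop only mirrors the structure of `IndexSearcher`'s per-leaf search as we read it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's CollectionTerminatedException.
final class ToyCollectionTerminatedException extends RuntimeException {}

interface LeafSink { void collect(int doc); }
interface LeafSinkFactory { LeafSink getLeafCollector(int leafOrd, int docBase); }

final class ToySearcher {
    // Per-leaf loop: throwing from getLeafCollector skips the leaf entirely;
    // throwing from collect stops collecting the current leaf early.
    static void search(int[][] leaves, int[] docBases, LeafSinkFactory collector) {
        for (int ord = 0; ord < leaves.length; ord++) {
            LeafSink leaf;
            try {
                leaf = collector.getLeafCollector(ord, docBases[ord]);
            } catch (ToyCollectionTerminatedException e) {
                continue; // no documents of interest in this leaf
            }
            try {
                for (int doc : leaves[ord]) leaf.collect(doc);
            } catch (ToyCollectionTerminatedException e) {
                // collection of this leaf was terminated prematurely; go on to the next
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        // Skip leaf 1 entirely by throwing from getLeafCollector.
        LeafSinkFactory collector = (ord, docBase) -> {
            if (ord == 1) throw new ToyCollectionTerminatedException();
            return doc -> hits.add(docBase + doc);
        };
        search(new int[][] {{0, 1}, {0, 2}, {3}}, new int[] {0, 10, 20}, collector);
        System.out.println(hits); // [0, 1, 23]
    }
}
```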

Re: Lucene 4.x -> 5 : IllegalStateException while sorting

2015-02-23 Thread András Péteri

Hi Clemens, I think this part of the release notes [1] applies to your case: * FieldCache is gone (moved to a dedicated UninvertingReader in the misc module). This means when you intend to sort on a field, you should index that field using doc values, which is much faster and less heap consuming

Re: Query nested document

2014-10-20 Thread András Péteri

Hello Aurélien, I believe the approach you described is what Elasticsearch is taking with nested documents, in addition to indexing parent and child documents in a single block. See the "sidebar" at the bottom of [1] and the sections labeled "nested" of [2] for more details. Michael's blog post o

Merge policy for branching data model

2014-01-05 Thread András Péteri

Hello, Our application uses Lucene to index documents received from a back-end that supports storage of temporal data with branches, similar to revision control systems like SVN: when looking at a single object, one can choose to either retrieve the current state, go back to a previous point in ti


23 matches
