Re: Question about ImpactsDISI for boolean queries

2025-04-21 Thread Alfonsi, Peter
Hi Adrien, Thanks for the quick reply! This makes sense. I think BlockMaxConjunctionBulkScorer actually never calls setMinCompetitiveScore() at all, so there's no hope of skipping, while ConjunctionScorer does in the case that there's only one scorer (which happens when we move the range query

Re: Question about ImpactsDISI for boolean queries

2025-04-21 Thread Adrien Grand
You are on the right track. It's easier to skip by score when there is a single scoring clause than when the score is the sum of the scores of two clauses. Well, actually in this case two clauses are not much harder since one of the clauses gives the same score to all documents, but the conjunctio

Re: Question about the performance of Lucene99PostingsFormat

2024-09-16 Thread Rui Wu
Dear Adrien, We found that the regression of match-all is not caused by the PostingList format, and instead it's caused by the MaxScoreBulkScorer class. Let me create a new email thread about it since the title of this email thread is no longer applicable. On Wed, Sep 11, 2024 at 6:24 PM Rui Wu wrote: > Than

Re: Question about the performance of Lucene99PostingsFormat

2024-09-11 Thread Rui Wu
Thanks for your prompt reply! On Tue, Sep 10, 2024 at 1:38 PM Adrien Grand wrote: > Can you clarify what you refer to by match-all and match-many queries? > Lucene's MatchAllDocsQuery should not be impacted since it doesn't use > postings for evaluation. > match-all refers to a query that hits a

Re: Question about the performance of Lucene99PostingsFormat

2024-09-10 Thread Adrien Grand
Can you clarify what you refer to by match-all and match-many queries? Lucene's MatchAllDocsQuery should not be impacted since it doesn't use postings for evaluation. Since FOR is a bit less space-efficient than PFOR, I guess it could be a bit slower if your Directory abstraction was a bit slow at

Re: Question about index segment search order

2023-05-13 Thread Uwe Schindler
Hi, in reference to previous code references and discussions from other Lucene committers I have to clarify: * If you run the query multithreaded (per segment), this means when you add an Executor to IndexSearcher, the order is not predictable, plain and simple * If you use Solr, a single

Re: Question about index segment search order

2023-05-11 Thread Wei
Hi Michael, Yes the collector counts hits across all segments. Thanks for the suggestion, I'm also asking the question on solr-dev. Wei On Thu, May 11, 2023 at 11:57 AM Michael Sokolov wrote: > Maybe ask this issue on solr-dev then? I'm not familiar with how that > collector works. Does it cou

Re: Question about index segment search order

2023-05-11 Thread Michael Sokolov
Maybe ask this issue on solr-dev then? I'm not familiar with how that collector works. Does it count hits across all segments? only within a single segment? On Tue, May 9, 2023 at 1:36 PM Wei wrote: > > Hi Michael, > > I am applying early termination with Solr's EarlyTerminatingCollector > https:

Re: Question about index segment search order

2023-05-09 Thread Wei
Hi Michael, I am applying early termination with Solr's EarlyTerminatingCollector https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java , which triggers EarlyTerminatingCollectorException in SolrIndexSe

Re: Question about index segment search order

2023-05-04 Thread Michael Sokolov
Yes, sorry I didn't mean to imply you couldn't control this if you want to. I guess in the typical setup it is not predictable. How are you applying early termination? Are you using a standard Lucene Collector or do you have your own? On Thu, May 4, 2023 at 2:03 PM Patrick Zhai wrote: > > Hi Mike

Re: Question about index segment search order

2023-05-04 Thread Patrick Zhai
Hi Mike, Just want to mention that if the user chooses to index with a single thread and uses LogXXMergePolicy, then the document order will be preserved as index order. On Thu, May 4, 2023 at 10:04 AM Wei wrote: > Hi Michael, > > We are interested in the segment sequence for early termination. In ou

Re: Question about index segment search order

2023-05-04 Thread Wei
Hi Michael, We are interested in the segment sequence for early termination. In our case there is always a large dominant segment after index rebuild, then many small segments are generated with continuous updates as time goes by. When early termination is applied, the limit could be reached just

Re: Question about index segment search order

2023-05-04 Thread Michael Sokolov
There is no meaning to the sequence. The segments are created concurrently by many threads and the merge process will merge them without regard to any ordering. On Wed, May 3, 2023, 1:09 PM Patrick Zhai wrote: > For that part I'm not entirely sure, if other folks know it please chime in > :)

Re: Question about index segment search order

2023-05-03 Thread Patrick Zhai
For that part I'm not entirely sure, if other folks know it please chime in :) On Wed, May 3, 2023 at 8:48 AM Wei wrote: > Thanks Patrick! In the default case when no LeafSorter is provided, are the > segments traversed in the order of creation time, i.e. the oldest segment > is always visited f

Re: Question about index segment search order

2023-05-03 Thread Wei
Thanks Patrick! In the default case when no LeafSorter is provided, are the segments traversed in the order of creation time, i.e. the oldest segment is always visited first? Wei On Tue, May 2, 2023 at 7:22 PM Patrick Zhai wrote: > Hi Wei, > Lucene in general iterate through the index in the or

Re: Question about index segment search order

2023-05-02 Thread Patrick Zhai
Hi Wei, Lucene in general iterates through the index in the order recorded in the SegmentInfos. At search time, you can specify the order using LeafSorter

Re: Question about current situation of good first issues in GitHub

2023-03-11 Thread Shunya Ueta
And sorry for the very late response. I completely missed your kind response. Thank you again! I will try to contribute to Apache Lucene. : ) Regards! On Sat, Mar 11, 2023 at 16:03 Shunya Ueta wrote: > Oh, Thank you very much. > I don't know those beginner-friendly labels. > I will try to find the good first iss

Re: Question about current situation of good first issues in GitHub

2023-03-10 Thread Shunya Ueta
Oh, Thank you very much. I don't know those beginner-friendly labels. I will try to find the good first issue. On Sat, Jan 14, 2023 at 5:51 Michael Sokolov wrote: > That label seems to be something GitHub created automatically? > > You might have better luck browsing the full list of labels. I found these: > >

Re: Question about searcherManager applyAllDeletes parameter and maybeRefresh method

2023-03-03 Thread Patrick Zhai
Note that the javadoc says "If false, the deletes may or may not be applied", which means it will not force applying all the deletes and it's up to IndexWriter to decide whether to apply them at refresh time or not. I'm not 100% sure how IndexWriter decides that and maybe someone who knows more can chime

Re: Question about searcherManager applyAllDeletes parameter and maybeRefresh method

2023-03-02 Thread Ningshan Li
Hi Patrick, Thanks for the quick response and the explanation and sources are helpful! But there is still a point we couldn't quite understand: why did the test I mentioned earlier pass (applyAllDeletes false and do maybeRefresh())? If the delete is not applied, we should see the deleted doc in th

Re: Question about searcherManager applyAllDeletes parameter and maybeRefresh method

2023-03-02 Thread Patrick Zhai
Hi Ningshan, If you want to make sure the deletes are applied after you call maybeRefresh() then you need to set the applyAllDeletes to be true. A bit more details: The constructor of SearcherManager actually internally passes the applyAllDeletes to the IndexWriter, which then will pass it to the

Re: Question about current situation of good first issues in GitHub

2023-01-13 Thread Michael Sokolov
That label seems to be something GitHub created automatically? You might have better luck browsing the full list of labels. I found these: https://github.com/apache/lucene/labels/legacy-jira-label%3Anewbie https://github.com/apache/lucene/labels/legacy-jira-label%3Anewdev https://github.com/apach

Re: Question about current situation of good first issues in GitHub

2023-01-10 Thread Uwe Schindler
Hi, The old JIRA labels are also in GitHub. See tags named "legacy-jira-label:*". The equivalent search would be this: https://github.com/apache/lucene/labels/legacy-jira-label%3Anewdev Uwe On 10.01.2023 at 12:41, Stefan Vodita wrote: Hello Shunya, As far as I know, GitHub issues are not m

Re: Question about Benchmark

2022-05-17 Thread Michael Sokolov
OK I replied on the issue. This ann-benchmarks is a separate project, and I think you are asking about how to change it. Probably should take it up with erikbern or whatever community is supporting that actively. I just created a "plugin" so we could use it to test Lucene's KNN implementation, but

Re: Question about Benchmark

2022-05-17 Thread balmukund mandal
Hi All, My apologies for not mentioning the benchmark which I was using. Also, I realized that I had not subscribed to this group, hence duplicating this mail. The queries below are for ANN-Benchmark https://issues.apache.org/jira/browse/LUCENE-9625 Indexing takes a long time, so is there a way

Re: Question about Benchmark

2022-05-16 Thread Mikhail Khludnev
Hi, Balmukund. Assuming you are asking about the Lucene benchmark module: 1) Once the index has been built, it's possible to start the benchmark with ResetSystemSoft, which keeps the index files intact and allows benchmarking search again and again without waiting for a reindex. 2) Check indexing-multithreaded.alg

Re: Question about Benchmark

2022-05-16 Thread Adrien Grand
Hi Balmukund, What benchmark are you talking about? On Mon, May 16, 2022 at 4:35 PM balmukund mandal wrote: > > Hi All, > I was trying to run the benchmark and had a couple of questions. Indexing > takes a long time, so is there a way to configure the benchmark to use an > already existing index

Re: Question about using Lucene to search source code

2021-12-20 Thread Michael Wechner
Hi Yuxin, Can you provide a concrete example of a query and a document/code snippet? Thanks Michael On 20.12.21 at 03:06, Yuxin Liu wrote: Dear development community of Lucene: Hi from student research assistant Yuxin Liu. I'm using Lucene to build an index search for source code indexes usi

Re: Question about readVint & writeVint from DataOutput and DataInput

2021-09-03 Thread Aaron Cohen
Thank you for the clarification. > On Sep 3, 2021, at 10:46 AM, Uwe Schindler wrote: > > They are fully supported, so you can write and read them. > > The problem with negative numbers is that they need lot of (disk) space, > because in two's complement they have almost all bits set. The large

Re: Question about readVint & writeVint from DataOutput and DataInput

2021-09-03 Thread Uwe Schindler
They are fully supported, so you can write and read them. The problem with negative numbers is that they need a lot of (disk) space, because in two's complement they have almost all bits set. The largest number in terms of disk space is -1. Negative numbers appear in older index formats, so they
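
A minimal sketch of why negative values are costly, assuming the usual variable-length scheme of seven payload bits per byte with the high bit marking that more bytes follow. The class and method names are hypothetical and the code does not use Lucene's DataOutput or DataInput; it only illustrates the byte counts.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: a stand-alone variable-length int encoder,
// not Lucene's DataOutput/DataInput.
public class VIntSketch {

    // Encodes i using 7 payload bits per byte; the high bit marks "more bytes follow".
    static byte[] encode(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7; // unsigned shift, so negative values also terminate
        }
        out.write(i);
        return out.toByteArray();
    }

    // Decodes a value produced by encode().
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        // Small non-negative values fit in one byte; any negative value needs five.
        for (int v : new int[] {0, 127, 128, 16384, Integer.MAX_VALUE, -1}) {
            System.out.println(v + " -> " + encode(v).length + " byte(s)");
        }
    }
}
```

Because a negative int has its top bit set, the loop always runs four times before the final byte is written, so under this scheme every negative value occupies five bytes while small non-negative values occupy one.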

Re: Question about PhraseQuery's capacity...

2020-01-12 Thread 小鱼儿
Hi, I have filed an issue against lucene-core: https://issues.apache.org/jira/browse/LUCENE-9130 I just wrote a test case and found that BooleanQuery with MUST filter mode is OK, but PhraseQuery fails. On Fri, Jan 10, 2020 at 7:14 PM 小鱼儿 wrote: > explain api helps! thanks for hint~! > I have found out that one case fa

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread 小鱼儿
The explain API helps! Thanks for the hint! I have found out that one case failed because I carelessly added another filter condition, but the other case (which is analyzed into 30 terms) still fails, and I don't know why. I guess I need to write a unit test case that uses the MultiTerms.getTerms API to find out if th

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread Mikhail Khludnev
Hello, Sometimes IndexSearcher.explain(Query, int) allows to analyse mismatches. On Fri, Jan 10, 2020 at 1:13 PM 小鱼儿 wrote: > After i directly call Analyzer.tokenStream() method to extract terms from > query, i still cannot get results. Doesn't know the why... > > Code when build index: >

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread 小鱼儿
After I directly call the Analyzer.tokenStream() method to extract terms from the query, I still cannot get results. I don't know why... Code when building the index: IndexWriterConfig iwc = new IndexWriterConfig(analyzer); //new SmartChineseAnalyzer(); Code for the query: (1) extract terms from query

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread 小鱼儿
Hi Adrien, I think I may have made a mistake: there are two levels of processing in an Analyzer class: one is the Tokenizer, which is HMMChineseTokenizer, and the other is the Analyzer, which may apply some filtering... I'm using Lucene's default interface to set an Analyzer instance to do the indexing, b

Re: Question about PhraseQuery's capacity...

2020-01-10 Thread Adrien Grand
It should match. My guess is that you might not be reusing the same positions as set by the analysis chain when creating the phrase query? Can you show us how you build the phrase query? On Fri, Jan 10, 2020 at 9:24 AM 小鱼儿 wrote: > I use SmartChineseAnalyzer to do the indexing, and add a document w

Re: Question about the light and minimal French stemmers

2019-07-28 Thread Adrien Gallou
Hi Tomoko, Thanks for your answers. Following them, I have opened an issue with a patch attached: https://issues.apache.org/jira/browse/LUCENE-8937 Adrien On Sun, Jul 28, 2019 at 13:51, Michael Sokolov wrote: > Oh sorry for jumping in with my irrelevant comment, you are right, of > course!

Re: Question about the light and minimal French stemmers

2019-07-28 Thread Michael Sokolov
Oh sorry for jumping in with my irrelevant comment, you are right, of course! On Sat, Jul 27, 2019, 10:36 PM Tomoko Uchida wrote: > Let me just make things a bit clear... > I think the concern here is that FrenchMinimalStemmer would remove the > last "digit" from a token because of it does not c

Re: Question about the light and minimal French stemmers

2019-07-27 Thread Tomoko Uchida
Let me just make things a bit clearer... I think the concern here is that FrenchMinimalStemmer would remove the last "digit" from a token because it does not check whether the character is a letter or not. E.g., "123455" is trimmed to "12345" by FrenchMinimalStemmer. To me, this behaviour is beyond stem
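
A minimal sketch of the behaviour described above, assuming the relevant step of the stemmer drops the final character whenever it equals the one before it. The class and method names are hypothetical and this is not the actual FrenchMinimalStemmer source; it only contrasts that single step with and without a Character.isLetter() guard.

```java
// Illustrative only: models a single "drop a doubled final character" step,
// not the full FrenchMinimalStemmer.
public class DoubledSuffixSketch {

    // Unguarded variant: trims any doubled final character, digits included.
    static int trimUnguarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2]) {
            len--;
        }
        return len;
    }

    // Guarded variant: trims only when the doubled character is a letter.
    static int trimGuarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2] && Character.isLetter(s[len - 1])) {
            len--;
        }
        return len;
    }

    // Applies one of the two variants to a token and returns the resulting string.
    static String apply(String token, boolean guarded) {
        char[] s = token.toCharArray();
        int len = guarded ? trimGuarded(s, s.length) : trimUnguarded(s, s.length);
        return new String(s, 0, len);
    }

    public static void main(String[] args) {
        System.out.println(apply("123455", false)); // 12345
        System.out.println(apply("123455", true));  // 123455
        System.out.println(apply("mess", true));    // mes
    }
}
```

With the guard in place, numeric tokens pass through unchanged while doubled letters are still trimmed.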

Re: Question about the light and minimal French stemmers

2019-07-27 Thread Michael Sokolov
I'm not so sure. I think the whole idea of having both stemmers is that the minimal one does less than the light one. Removing the final character of a double letter suffix is going to sacrifice some precision. For example mes/mess, ne/née, I'm sure there are others. So having both options is hel

Re: Question about the light and minimal French stemmers

2019-07-27 Thread Tomoko Uchida
I found an issue which adds the isLetter() check on FrenchLightStemmer. https://issues.apache.org/jira/browse/LUCENE-4063 Seems the same change has not been applied to FrenchMinimalStemmer, would it be a good idea that we add the same check to it to avoid too aggressive stemming? Tomoko 2019年7月2

Re: Question about the light and minimal French stemmers

2019-07-27 Thread Tomoko Uchida
Hi Adrien, To me, it simply sounds like a bug. Can you please open a JIRA (with a patch if possible)? Tomoko On Tue, Jul 23, 2019 at 22:05 Adrien Gallou wrote: > > Hi, > > I'm using both light and minimal French stemmers and encountered an issue > when using the minimal stemmer. > > The light stemmer removes the la

Re: Question about Lucene in my project ..

2019-05-28 Thread Adrien Grand
Hi John, I heard of many users who used Lucene for this use-case, it's definitely a valid one. Indexes are stored mostly on disk, with a tiny part of them being held in memory to guarantee good access speed. Lucene supports both inverted indexes and KD trees up to 8 dimensions. Lookup, sorting an

Re: Question about Indexsearcher.search()

2019-01-25 Thread Tomoko Uchida
Hi, Tokenization is usually performed by a query parser before searching and the result documents may include all terms or some of the terms or only one term in the query string (it depends on your query configuration). > I'm trying to make sample search application with Lucene. Have you checked

Re: Question about upgrading lucene 4.4.0 to 7.5.0

2018-11-06 Thread Tomoko Uchida
Hi, I think changing the analyzer for each document when indexing will lead to inconsistent or unstable search results. I would break down the reason why this is needed. > While adding a document we are adding a different analyzer. If a field needs to be analyzed by multiple analyzers, I would split up

Re: Question about upgrading lucene 4.4.0 to 7.5.0

2018-11-05 Thread Arpit Mittal
Could you please help us with it? This is urgent for us. On Sun, Nov 4, 2018 at 10:04 PM Arpit Mittal wrote: > Hi All, > > We are working on upgrading lucene version from 4.4.0 to 7.5.0. > > We have a few questions. Could you please help us by giving us suggestions > to fix it? > > Remove IndexWri

Re: Question About FST, multiple-column index

2018-09-22 Thread Michael McCandless
You might want to index the name field normally (as StringField, for example), then index the age as a NumericDocValuesField, and then make a BooleanQuery with two required clauses, one clause TermQuery on the name, the other a NumericDocValuesField.newSlowExactQuery. Even though its name is "slow

Re: Question About FST, multiple-column index

2018-09-21 Thread Mikhail Khludnev
No way. And this is the point. To have a combined index you need to combine fields by concatenating terms. It will be faster but it brings many other hurdles. Do you think that this is the real problem? What's the search time now and how do you search exactly? On Thu, Sep 20, 2018 at 5:57 PM ly铖 <5204

Re: Question about usage of LuceneTestCase

2018-08-27 Thread Tomoko Uchida
> i haven't looked closely into what exactly that "useFactory(null)" call > does, but it's probably worth getting to the bottom of the failures and > *IF* it's tied to some specific dir type or codec, using annotations to > supress them -- rather then just eliminating all directory randomization.

Re: Question about usage of LuceneTestCase

2018-08-27 Thread Chris Hostetter
: Current version of Luke supports FS based directory implementations only. : (I think it will be better if future versions support non-FS based custom : implementations, such as HdfsDirectoryFactory for users who need it.) : Disabling the randomization, at least for now, sounds reasonable to me

Re: Question about BytesRef and BinaryDocValues

2018-08-24 Thread Vadim Gindin
Kevin, the sequence is the following: get terms for the field, get postings for a term, and then get the payload from the postings. Read a little about the inverted index structure and it will become clearer to you. Your Query creates a Weight, which must create a scorer in the method scorer(context).

Re: Question about BytesRef and BinaryDocValues

2018-08-23 Thread Kevin Manuel
Hi Vadim, Thank you so much for your reply. I think you were right. So if a field is 'analyzed' how can I get both terms 'hey' and 'tom'? Thanks, Kevin On Thu, Aug 23, 2018, 20:26 Vadim Gindin wrote: > Hi Kevin! > > I think that your field is "analyzed" and so your field value is divided to >

Re: Question about BytesRef and BinaryDocValues

2018-08-23 Thread Vadim Gindin
Hi Kevin! I think that your field is "analyzed" and so your field value is divided into 2 terms, "hey" and "tom". So a doc value is written for each of them. Regards Vadim Gindin On Fri, Aug 24, 2018, 5:19 Kevin Manuel wrote: > Hi, > > I'm using lucene version 4.3.1 and I've implemented a custom score query.

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Tomoko Uchida
> You don't really have to figure out exactly what the combinations are, > just execute the test with the "reproduce with" flags set, cut/paste > the error message at the root of your local Solr source tree in a > command prompt. > ant test -Dtestcase=CommitsImplTest > -Dtests.method=testGetSegme

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Michael Sokolov
It looks to me as if this test is asserting that the segment in an index it just created has some attributes, but in fact it does not. Perhaps there is a codec that does not store any attributes with its segments, and Luke does not expect this, and maybe the codec is being selected randomly by the

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Michael Sokolov
Here's a seed that fails for me consistently in IntelliJ: "FEF692F43FE50191:656E22441676701C" running CommitsImplTest. Warning: I have a bunch of local changes that might have perturbed the randomness so possibly it might not reproduce for others. I just run the tests, open the "Edit Configuration

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Erick Erickson
bq. My understanding at this point is (though it may be a repeat of your words,) first we should find out the combinations behind the failures. If there are any particular patterns, there could be bugs, so we should fix it. You don't really have to figure out exactly what the combinations are, jus

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Tomoko Uchida
Can I ask one more question? 4> If Mike's intuition that it's one of the file system randomizations that occasionally gets hit _and_ you determine that that's an invalid test case (and for Luke requiring that the FS-based tests are all that are necessary may be fine) I'm pretty sure you can d

Re: Question about usage of LuceneTestCase

2018-08-22 Thread Tomoko Uchida
Thanks for your kind explanations, sorry of course I know what is the randomization seed, but your description and instruction is exactly what I wanted. > The randomization can cause different > combinations of "stuff" to happen. Say the locale is randomized to > Turkish and a token is also rando

Re: Question about usage of LuceneTestCase

2018-08-21 Thread Erick Erickson
The pseudo-random generator in the Lucene test framework is used to randomize lots of test conditions, we're talking about the file system implementation here, but there are lots of others. Whenever you see a call to random().whatever, that's the call to the framework's method. But here's the thin

Re: Question about usage of LuceneTestCase

2018-08-21 Thread Tomoko Uchida
Thanks a lot for your information & insights, I will try to reproduce the errors and investigate the results. And maybe I should learn more about the internals of the test framework; I'm not familiar with it and still do not understand what "seed" means exactly in this context. Regards, Tomoko

Re: Question about usage of LuceneTestCase

2018-08-21 Thread Erick Erickson
Couple of things (and I know you've been around for a while, so pardon me if it's all old hat to you): 1> if you run the entire "reproduce with" line and can get a consistent failure, then you are half way there, nothing is as frustrating as not getting failures reliably. The critical bit is often

Re: Question about usage of LuceneTestCase

2018-08-21 Thread Tomoko Uchida
Hi, Mike Thanks for sharing your experiments. > CommitsImplTest.testListCommits > CommitsImplTest.testGetCommit_generation_notfound > CommitsImplTest.testGetSegments > DocumentsImplTest.testGetDocumentFIelds I also found CommitsImplTest and DocumentsImplTest fail frequently, especially CommitsIm

Re: Question about usage of LuceneTestCase

2018-08-21 Thread Michael Sokolov
I was running these luke tests a bunch and found the following tests fail intermittently; pretty frequently. Once I @Ignore them I can get a consistent pass: CommitsImplTest.testListCommits CommitsImplTest.testGetCommit_generation_notfound CommitsImplTest.testGetSegments DocumentsImplTest.testGet

Re: Question about threading in search

2018-08-17 Thread Erick Erickson
Please don't optimize to 1 segment unless you can afford to do it quite regularly, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ (NOTE: this is changing as of 7.5, see: https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/). bq. It

Re: Question about threading in search

2018-08-17 Thread Toke Eskildsen
On Sat, 2017-09-02 at 18:33 -0700, Peilin Yang wrote: > we're comparing two different indexes on the same collection - one > with lots of different segments (default settings), and one with a > force merged into one segment. It seems that search is sometimes > faster with multiple segments. If you

Re: Question about a documentation note in CompressingStoredFieldsIndexWriter

2017-11-24 Thread Adrien Grand
That makes sense to me, I'll push that change. Thanks! On Fri, Nov 24, 2017 at 10:40, Roman Margolis wrote: > Sorry about that. > In my original message, I highlighted the relevant parts which probably > didn't make it to the mail archive. > > I would expect the note to state the following (un

Re: Question about a documentation note in CompressingStoredFieldsIndexWriter

2017-11-24 Thread Roman Margolis
Sorry about that. In my original message, I highlighted the relevant parts which probably didn't make it to the mail archive. I would expect the note to state the following (unless I misunderstood some of the details): "Once data is loaded into memory, you can lookup the start pointer of any docum

Re: Question about a documentation note in CompressingStoredFieldsIndexWriter

2017-11-24 Thread Adrien Grand
Hi Roman, It's unclear to me what modification you are suggesting, could you please share what the updated comment would look like? On Wed, Nov 22, 2017 at 14:17, Roman Margolis wrote: > Hi, > > I was reading some internal info about Lucene, and was confused by a note > on this page: > > http

Re: question about spatial module in lucene 5

2015-03-30 Thread david.w.smi...@gmail.com
Hi Anton. I think you’re right. PointVectorStrategy has been overlooked. The work-around is pretty simple though. In addition to calling createIndexableFields, also create two DoubleDocValuesField instances, one for each dimension that uses the identical names the strategy generates. Lucene will

Re: Question about JoinUtil

2014-12-17 Thread Glen Newton
Hi Gregory, Thanks for your reply. In reading it, I realized that one side of my relational join wasn't that large, and I could bring it in as a couple of fields to the main document without any penalty, so my need to join two different document types then goes away. Thanks! :-) Glen On Tue,

Re: Question about JoinUtil

2014-12-16 Thread Gregory Dearing
Glen, Lucene isn't relational at heart and may not be the right tool for what you're trying to accomplish. Note that JoinQuery doesn't join 'left' and 'right' answers; rather it transforms a 'left' answerset into a 'right' answerset. JoinQuery is able to perform this transformation with a single

Re: Question about JoinUtil

2014-12-16 Thread Glen Newton
Anyone? On Thu, Dec 11, 2014 at 2:53 PM, Glen Newton wrote: > Is there any reason JoinUtil (below) does not have a 'Query toQuery' > available? I was wanting to filter on the 'to' side as well. I feel I > am missing something here. > > To make sure this is not an XY problem, here is my use case:

Re: Question about multi-valued fields

2014-05-21 Thread Chris Bamford
How come span queries are heading for extinction? Thanks - Chris -Original Message- From: Allison, Timothy B. To: java-user@lucene.apache.org Sent: Tue, 20 May 2014 16:59 Subject: RE: Question about multi-valued fields Chris, Good to see you over here. There's probably an

RE: Question about multi-valued fields

2014-05-20 Thread Allison, Timothy B.
Chris, Good to see you over here. There's probably an easier way... I ran into this with geo queries, and the answer there is to test every value in the multi field for the document that is a hit. For the text search question, though, you could use analysis and then run a SpanQuery against y

Re: Question about Payloads in Lucene 4.5

2014-03-27 Thread Rohit Banga
Awesome works well for me! Thanks Rohit Banga http://iamrohitbanga.com/ On Sun, Mar 23, 2014 at 10:06 PM, Manuel Le Normand < manuel.lenorm...@gmail.com> wrote: > Hello Rohit, > We had a similar query time bottleneck when attempting to map lucene's > internal id's to the uniqueKey, especially a

Re: Question about Payloads in Lucene 4.5

2014-03-23 Thread Manuel Le Normand
Hello Rohit, We had a similar query time bottleneck when attempting to map lucene's internal id's to the uniqueKey, especially as we generally return only the uniqueKey to the user we had no other use of the stored field. As you noted, every internal id --> uniqueKey id requires a disk seek and as

Re: Question about Payloads in Lucene 4.5

2014-03-22 Thread Michael McCandless
On Sat, Mar 22, 2014 at 5:18 AM, Rohit Banga wrote: > Awesome BinaryDocValues sounds nice! > I saw that NumericDocValues did not inherit from a base class hence I > thought there is no StringDocValues :). > > Can I expect that a searcher manager will invoke > searcherfactory.newSearcher at most o

Re: Question about Payloads in Lucene 4.5

2014-03-22 Thread Rohit Banga
Awesome BinaryDocValues sounds nice! I saw that NumericDocValues did not inherit from a base class hence I thought there is no StringDocValues :). Can I expect that a searcher manager will invoke searcherfactory.newSearcher at most once between searcher manager refreshes? I believe IndexSearcher i

Re: Question about Payloads in Lucene 4.5

2014-03-22 Thread Michael McCandless
On Fri, Mar 21, 2014 at 10:25 PM, Rohit Banga wrote: > Thanks Michael for your response. You're welcome! > Few questions: > > 1. Can I expect better performance when retrieving a single NumericDocValue > for all hits vs when I retrieve documents for all hits to fetch the field > value? As far as

Re: Question about Payloads in Lucene 4.5

2014-03-21 Thread Rohit Banga
Just saw the implementation of MultiDocValues.getNumericValues(). It returns an anonymous inner class that gets the doc value from the appropriate index reader. Very cool implementation! I guess that answers my question on how to get doc values from multiple atomic readers. It would be

Re: Question about Payloads in Lucene 4.5

2014-03-21 Thread Rohit Banga
Thanks Michael for your response. Few questions: 1. Can I expect better performance when retrieving a single NumericDocValue for all hits vs when I retrieve documents for all hits to fetch the field value? As far as I understand retrieving n documents from the index requires n disk reads. How ma

Re: Question about Payloads in Lucene 4.5

2014-03-21 Thread Michael McCandless
DocValues are better than payloads. E.g. index a NumericDocValuesField with each doc, holding your id. Then at search time you can use MultiDocValues.getNumericValues. Mike McCandless http://blog.mikemccandless.com On Fri, Mar 21, 2014 at 4:35 PM, Rohit Banga wrote: > Hi everyone > > When I

Re: question about using lucene on large documents

2014-02-05 Thread Michael Sokolov
No, not really. What would you do if you had a match contained entirely within the overlapping region? You'd probably need a way to distinguish that from a term that matched in two adjacent chunks, but *not* in the overlap. Sounds very tricky to me. -Mike On 2/5/2014 2:21 AM, mrodent wrote:

Re: question about using lucene on large documents

2014-02-04 Thread mrodent
Thanks, gives me food for thought. So no { N, N+1 } ideas specifically...

Re: question about using lucene on large documents

2014-02-04 Thread Michael Sokolov
Ideally you would chunk a document at logical boundaries that will make sense as units of both search and presentation. For some content, these boundaries don't align; for example you might want to search for matches within a paragraph scope, or within a section, chapter, or part of a book, bu

Re: Question about SearcherManager.maybeReopen() method.

2013-11-07 Thread Michael McCandless
The picture didn't come through to the list. If you are really fully re-indexing and replacing the index every time then you should just open a new IndexReader instead of trying to .maybeReopen? Ie, the newly opened reader cannot share any segments with the old one, so you get no benefit from it.

Re: Question about SearcherManager.maybeReopen() method.

2013-11-07 Thread Alexei Morgado
We are not copying index files from one index to another. Will try to explain: 1 - We have a unix script that removes the old physical index and creates a new one several times a day from the database. 2 - The SearcherManager calls maybeReopen in a separate thread from the main application every f

Re: Question about SearcherManager.maybeReopen() method.

2013-11-07 Thread Michael McCandless
It sounds like you are somehow copying over index files from one index to another? You shouldn't do that; use IW.addIndexes instead. Or maybe give a bigger picture of how your application works with Lucene? Mike McCandless http://blog.mikemccandless.com On Wed, Nov 6, 2013 at 6:46 PM, Alexei

Re: Question about the CompoundWordTokenFilterBase

2013-09-18 Thread Jack Krupansky
Out of curiosity, what is your use case? I mean, the normal use of this filter is to permit a "shorthand" reference to a long term, but why would you necessarily want to preclude direct reference to the full term? -- Jack Krupansky -Original Message- From: Alex Parvulescu Sent: Wedne

Re: question about document-frequency in score

2013-03-22 Thread Simon Willnauer
All statistics in Lucene are per field, and so is document frequency. simon On Fri, Mar 22, 2013 at 10:48 AM, Nicole Lacoste wrote: > Hi > > I am trying to figure out if the document-frequency of a term is used in > calculating the score. Is it per field? Or is it independent of the field? > > Thanks > >

Re: Question about ordering rule of SpanNearQuery

2012-11-21 Thread Chris Hostetter
: I am confused by the ordering rule of SpanNearQuery. For example, I : indicate the slop in SpanNearQuery is 10. And the results are all the : qualified documents. Is it true that any document with shorter distance ... : it still uses tf-idf algorithm to rank the docs. Or there is

Re: Question about ordering rule of SpanNearQuery

2012-11-21 Thread Jack Krupansky
#explain(org.apache.lucene.search.Query, int) -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, November 21, 2012 11:44 AM To: java-user@lucene.apache.org Subject: Re: Question about ordering rule of SpanNearQuery Add &debugQuery=true to your query and look at

Re: Question about ordering rule of SpanNearQuery

2012-11-21 Thread Jack Krupansky
Add &debugQuery=true to your query and look at the "explain" section to see how the scoring is calculated for each document. Sometimes it is counter-intuitive and some factors may differ but those differences can be overwhelmed by other, unrelated factors. -- Jack Krupansky -Original Mess

Re: Question about ordering rule of SpanNearQuery

2012-11-19 Thread Jack Krupansky
Unfortunately, there doesn't appear to be any Javadoc that discusses what factors are used to score spans. For example, how to relate the number of times a span matches in a document vs. the exactness of each span match. -- Jack Krupansky -Original Message- From: 杨光 Sent: Monday, Nov

Re: Question about BooleanQuery

2012-08-23 Thread Jack Krupansky
August 23, 2012 9:13 AM To: java-user@lucene.apache.org Subject: Re: Question about BooleanQuery OK, it's not the idea that the nested NOT query has got anything to do with booleanField_1, so I'll try to phrase very clearly what I want : the query should return docs where ( someField

Re: Question about BooleanQuery

2012-08-23 Thread heikki
OK, it's not the idea that the nested NOT query has got anything to do with booleanField_1, so I'll try to phrase very clearly what I want : the query should return docs where ( someField_1 = 0 OR someField_2 = 0) AND ( booleanField_1 = false ) AND ( NOT ( ( someField_1 = 0 OR someField_2 = 0 )

Re: Question about BooleanQuery

2012-08-23 Thread Jack Krupansky
cene.apache.org Subject: Re: Question about BooleanQuery thanks Jack for your answer, however I'm not quite sure what to do with it: the query is like +( someField_1:0 someField_2:0 ) +booleanField_1:false -( +( someField_1:0 someField_2:0 ) +booleanField_2:true ) (I put

Re: Question about BooleanQuery

2012-08-23 Thread heikki
thanks Jack for your answer, however I'm not quite sure what to do with it: the query is like +( someField_1:0 someField_2:0 ) +booleanField_1:false -( +( someField_1:0 someField_2:0 ) +booleanField_2:true ) (I put this in 'raw' before, think it might not have shown up in
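The + / - / unprefixed clauses in the query above correspond to MUST, MUST_NOT and SHOULD. A minimal sketch of how those clause types combine, written as plain Java predicates over a map of field values rather than with Lucene itself: OccurSketch and all of its helpers are hypothetical names, and the sketch models clause semantics only (no analysis, no scoring, no indexing details), so it says nothing about why the original poster saw zero results.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model (not Lucene code) of MUST / SHOULD / MUST_NOT clause semantics. */
final class OccurSketch {

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final Predicate<Map<String, String>> query;

        Clause(Occur occur, Predicate<Map<String, String>> query) {
            this.occur = occur;
            this.query = query;
        }
    }

    static Clause must(Predicate<Map<String, String>> q) { return new Clause(Occur.MUST, q); }
    static Clause should(Predicate<Map<String, String>> q) { return new Clause(Occur.SHOULD, q); }
    static Clause mustNot(Predicate<Map<String, String>> q) { return new Clause(Occur.MUST_NOT, q); }

    /** field:value, matched by exact string equality in this toy model. */
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    /** Every MUST matches, no MUST_NOT matches, and without any MUST at least one SHOULD matches. */
    static Predicate<Map<String, String>> bool(Clause... clauses) {
        return doc -> {
            boolean hasMust = false;
            boolean anyShouldHit = false;
            for (Clause c : clauses) {
                boolean hit = c.query.test(doc);
                switch (c.occur) {
                    case MUST:
                        hasMust = true;
                        if (!hit) return false;
                        break;
                    case MUST_NOT:
                        if (hit) return false;
                        break;
                    default: // SHOULD
                        anyShouldHit |= hit;
                }
            }
            // A query made only of MUST_NOT clauses selects nothing on its own.
            return hasMust || anyShouldHit;
        };
    }

    /** +(someField_1:0 someField_2:0) +booleanField_1:false -(+(someField_1:0 someField_2:0) +booleanField_2:true) */
    static Predicate<Map<String, String>> threadQuery() {
        Predicate<Map<String, String>> either =
            bool(should(term("someField_1", "0")), should(term("someField_2", "0")));
        return bool(
            must(either),
            must(term("booleanField_1", "false")),
            mustNot(bool(must(either), must(term("booleanField_2", "true")))));
    }

    /** Builds a document from alternating field/value strings. */
    static Map<String, String> doc(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }
}
```

In this model, OccurSketch.threadQuery().test(OccurSketch.doc("someField_1", "0", "booleanField_1", "false", "booleanField_2", "false")) accepts the document, while the same document with booleanField_2 set to "true" is rejected by the MUST_NOT clause.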

Re: Question about BooleanQuery

2012-08-23 Thread Jack Krupansky
Step 1, fully parenthesize your boolean to show your desired order of execution. The Lucene BooleanQuery does not do a pure Boolean evaluation. You have the same sub-expression in your NOT clause - that's probably what guarantees zero results. And you have an unmatched right parenthesis at the
