Do deleted documents affect scores?

2010-02-10 Thread Yuval Feinstein
I want to focus my previous question. Say we have two Lucene indexes, A and B. Index A contains documents a and b. Index B used to contain documents a, b, and c, but c was deleted. All documents share some vocabulary. If we search using terms common to documents b and c, can we get a different score

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Marvin Humphrey
On Wed, Feb 10, 2010 at 12:33:27PM -0500, Michael McCandless wrote: > In Lucene, skipping is done through the aggregator. I had a look at MultiDocsEnum in the flex branch. It doesn't know when a sub-enum is reading skip data. > > I suppose another possibility would have been to have the aggregato

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Michael McCandless
On Wed, Feb 10, 2010 at 8:27 AM, Marvin Humphrey wrote: >> But why didn't you have the Multi*Enums layer add the offset (so >> that the codec need not know who's consuming it)? Performance? > > That would have involved something like this within the aggregator: > >posting.setDocID(posting.ge

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Michael McCandless
On Wed, Feb 10, 2010 at 9:47 AM, Renaud Delbru wrote: > On 10/02/10 13:15, Uwe Schindler wrote: >>> >>> Could you provide pointers to search code that uses the segment-level >>> enum ? >>> As I explained in my last answer to Michael, the TermScorer is using >>> the >>> DocsEnum interface, and ther

Re: problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread Michael McCandless
OK I opened this issue: https://issues.apache.org/jira/browse/LUCENE-2259 and put a patch up. If you can try the patch, that'd be great :) You should be able to apply the patch, build a new jar, then run your test again unmodified, and 0.cfs and 1.cfs should then be removed. Mike 2010/2/10 Mic

Re: problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread Michael McCandless
From this test, I would expect all 3 files to be left, because IndexWriter never gets another chance to remove the files. IndexWriter only attempts to remove unreferenced files in roughly 3 places: * On open * On flushing a new segment * On finishing a merge So, the moment your optimize f
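The three cleanup points Mike lists can be sketched in plain Java. This is an illustrative mock, not Lucene's actual IndexFileDeleter: the point is only that unreferenced files linger on disk until one of the writer's cleanup events (open, flush, merge finish) runs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Sketch of deferred file deletion: files orphaned after the last cleanup
// point stay on disk until the next one (e.g. reopening the writer).
public class DeferredDeleter {
    final TreeSet<String> referenced = new TreeSet<>();
    final TreeSet<String> onDisk = new TreeSet<>();

    void writeFile(String name)   { onDisk.add(name); referenced.add(name); }
    void dereference(String name) { referenced.remove(name); }

    // Runs only at a cleanup point; returns what was actually deleted.
    List<String> cleanupPoint() {
        List<String> deleted = new ArrayList<>();
        for (Iterator<String> it = onDisk.iterator(); it.hasNext(); ) {
            String f = it.next();
            if (!referenced.contains(f)) { it.remove(); deleted.add(f); }
        }
        return deleted;
    }

    public static void main(String[] args) {
        DeferredDeleter d = new DeferredDeleter();
        d.writeFile("_0.cfs");
        d.writeFile("_1.cfs");
        d.writeFile("_2.cfs");                // result of optimize()
        d.dereference("_0.cfs");              // old segments no longer referenced
        d.dereference("_1.cfs");
        System.out.println(d.onDisk);         // all three files still on disk
        System.out.println(d.cleanupPoint()); // next cleanup event deletes the old two
        System.out.println(d.onDisk);         // only _2.cfs remains
    }
}
```

This mirrors the thread's symptom: if the program exits right after optimize(), no further cleanup point ever runs, so the old .cfs files survive until the next IndexWriter open.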

RE: problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread luocanrao
Here is a small test case. I watched there are three compound files: 0.cfs 6786 KB, 1.cfs 2044 KB, 2.cfs 8790 KB (the optimized file). I think in this test case only 2.cfs is left (the optimized file). Is that right? import java.io.File; import java.io.IOException; import org.apache.lucene.analysis.

RE: here a small test case problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread luocan19826164
I watched there are three compound files: 0.cfs 6786 KB, 1.cfs 2044 KB, 2.cfs 8790 KB (the optimized file). I think in this test case only 2.cfs is left (the optimized file). Is that right? import java.io.File; import java.io.IOException; import org.apache.lucene.analysis.standard.StandardAnalyzer;

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Renaud Delbru
On 10/02/10 13:15, Uwe Schindler wrote: Could you provide pointers to search code that uses the segment-level enum? As I explained in my last answer to Michael, the TermScorer is using the DocsEnum interface, and therefore does not know if it manipulates a segment-level enum or a Multi*Enums. What s

Re: read more tokens during analysis

2010-02-10 Thread Grant Ingersoll
On Feb 10, 2010, at 8:33 AM, Rohit Banga wrote: > basically i want to use my own filter wrapping around a standard analyzer. > > the kind explained on page 166 of Lucene in Action, uses input.next() which > is perhaps not available in lucene 3.0 > > what is the substitute method. captureState(
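The substitute Grant points at is the attribute-based API: in Lucene 3.0 the old TokenStream.next() is gone, and a filter instead overrides incrementToken(), pulling tokens from its wrapped input and reading or writing attributes. A minimal plain-Java mock of that pull pattern (class and method names are borrowed from Lucene for illustration; this is not the real API):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;

// Stands in for Lucene 3.0's TokenStream: advance with incrementToken(),
// read the current token from an attribute-like field.
abstract class MockTokenStream {
    String term;                       // stands in for TermAttribute
    abstract boolean incrementToken(); // returns false at end of stream
}

// A trivial tokenizer over a fixed token list.
class ListTokenizer extends MockTokenStream {
    private final Iterator<String> it;
    ListTokenizer(String... tokens) { it = Arrays.asList(tokens).iterator(); }
    boolean incrementToken() {
        if (!it.hasNext()) return false;
        term = it.next();
        return true;
    }
}

// A filter wraps another stream: pull from input, then modify the term.
class MockLowerCaseFilter extends MockTokenStream {
    private final MockTokenStream input;
    MockLowerCaseFilter(MockTokenStream input) { this.input = input; }
    boolean incrementToken() {
        if (!input.incrementToken()) return false; // advance wrapped stream
        term = input.term.toLowerCase(Locale.ROOT);
        return true;
    }
}

public class FilterDemo {
    public static void main(String[] args) {
        MockTokenStream s = new MockLowerCaseFilter(new ListTokenizer("Arun", "Kumar"));
        while (s.incrementToken()) System.out.println(s.term);
        // arun
        // kumar
    }
}
```

In real Lucene 3.0 the filter and its input share attribute instances rather than copying a field, and captureState()/restoreState() save and restore a token's full attribute state when a filter needs to buffer or re-emit tokens.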

Re: TREC Data and Topic-Specific Index

2010-02-10 Thread Robert Muir
Hi, so you mean around 15% and 24% respectively? I think you could fairly say either of these is an improvement over your baseline of 0.141. What I mean by large difference is: while I think it's safe to say that using either of these methods improves over your baseline, I am not sure you can conclu
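The 15% and 24% figures follow from the absolute MAP scores reported in the thread (0.163 for Sweet Spot Similarity and 0.175 for LnbLtcSimilarity, against the 0.141 baseline), taken as relative gains. A quick check:

```java
public class MapGain {
    // Relative improvement of a MAP score over a baseline, in percent.
    static double relativeGain(double baseline, double score) {
        return (score - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 0.141;
        System.out.printf("Sweet Spot Similarity: %.1f%%%n", relativeGain(baseline, 0.163)); // ~15.6%
        System.out.printf("LnbLtcSimilarity:      %.1f%%%n", relativeGain(baseline, 0.175)); // ~24.1%
    }
}
```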

Re: TREC Data and Topic-Specific Index

2010-02-10 Thread Ivan Provalov
Robert, Thank you for your reply. What would be considered a large difference? We started applying the Sweet Spot Similarity. It gives us an improvement of 0.163-0.141=0.022 MAP so far. LnbLtcSimilarity gets us more improvement: 0.175-0.141=0.034. Thanks, Ivan --- On Sun, 2/7/10, Robert

Re: read more tokens during analysis

2010-02-10 Thread Rohit Banga
Basically I want to use my own filter wrapping around a standard analyzer. The kind explained on page 166 of Lucene in Action uses input.next(), which is perhaps not available in Lucene 3.0. What is the substitute method? Rohit Banga On Wed, Feb 10, 2010 at 6:46 PM, Rohit Banga wrote: > i want

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Marvin Humphrey
On Wed, Feb 10, 2010 at 06:58:01AM -0500, Michael McCandless wrote: > But why didn't you have the Multi*Enums layer add the offset (so that > the codec need not know who's consuming it)? Performance? That would have involved something like this within the aggregator: posting.setDocID(pos

read more tokens during analysis

2010-02-10 Thread Rohit Banga
I want to consider the current word & the next as a single term. When analyzing "Arun Kumar" I want my analyzer to consider "Arun" and "Arun Kumar" as synonyms. In the tokenStream method, how do we read the next token, "Kumar"? I am going through the setPositionIncrement method for considering them
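The look-ahead logic being asked about can be sketched independently of Lucene: emit each word, then emit the word-pair at the same position by giving it a position increment of 0, which is how Lucene marks a token as a synonym of the previous one. This is plain Java illustrating the idea, not Lucene's TokenFilter API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: expand a token list into (token, positionIncrement) pairs, with
// each two-word shingle emitted at increment 0 (same position = synonym).
public class PairSynonyms {
    static List<Map.Entry<String, Integer>> expand(List<String> tokens) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(Map.entry(tokens.get(i), 1));          // the word itself advances a position
            if (i + 1 < tokens.size()) {                   // look ahead one token
                out.add(Map.entry(tokens.get(i) + " " + tokens.get(i + 1), 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("Arun", "Kumar")));
        // [Arun=1, Arun Kumar=0, Kumar=1]
    }
}
```

Inside a real Lucene 3.0 TokenFilter, the equivalent buffering of the next token is done with incrementToken() plus captureState()/restoreState(), and the increment is set via PositionIncrementAttribute.setPositionIncrement(0).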

RE: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Uwe Schindler
> Could you provide pointers to search code that uses the segment-level > enum ? > As I explained in my last answer to Michael, the TermScorer is using > the > DocsEnum interface, and therefore do not know if it manipulates > segment-level enum or a Multi*Enums. What search (or query operators) > i

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Renaud Delbru
On 10/02/10 09:47, Uwe Schindler wrote: Positions as attributes would be good. For positions we need a new Attribute (not PositionIncrement), but e.g. for offsets and payloads we can use the standard attributes from the analysis, which is really cool. This would also make it possible to add al

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Renaud Delbru
Hi Michael, On 09/02/10 20:47, Michael McCandless wrote: But, then, it's very convenient when you need it and don't care about performance. EG in Renaud's usage, a test case that is trying to assert that all indexed docs look right, why should you be forced to operate per segment? He shouldn't

Re: Contrib Lucene Analyzers & Stemming

2010-02-10 Thread Robert Muir
hi, what does your test code look like? The Russian stemmer still stems as of 3.0: assertAnalyzesToReuse(a, "Но знание это хранилось в тайне", new String[] { "знан", "хран", "тайн" }); On Wed, Feb 10, 2010 at 4:16 AM, Jamie wrote: > Hi There > > We are having problems with some of the

Re: Problems with IndexWriter#commit() on Linux

2010-02-10 Thread Michael McCandless
Yes. Mike On Wed, Feb 10, 2010 at 6:36 AM, Naama Kraus wrote: > Do you mean by calling > > IndexWriter#*setInfoStream*(PrintStream > > infoStream) > > ? > > Naama > > > On Mon, Feb 8, 2010 at 3:22 PM, Michael McCandless < > luc...@

Re: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Michael McCandless
On Tue, Feb 9, 2010 at 4:44 PM, Marvin Humphrey wrote: >> Interesting... and segment merging just does its own private >> concatenation/mapping-around-deletes of the doc/positions? > > I think the answer is yes, but I'm not sure I understand the > question completely since I'm not sure why you'd

Re: Problems with IndexWriter#commit() on Linux

2010-02-10 Thread Naama Kraus
Do you mean by calling IndexWriter#*setInfoStream*(PrintStream infoStream) ? Naama On Mon, Feb 8, 2010 at 3:22 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Hmmm... I think that means you're using the default data

Re: problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread Michael McCandless
My guess is there is accidentally still a reader open, at the time that IW tries to delete these unreferenced files. Eg if you close & reopen your reader, always, then there is always a reader open on the index. Try closing all readers, then close IW, then open & close a new IW, and see if the fi

Re: problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread luocan19826164
Thanks for your reply! But I don't think there is an IndexReader still reading those files, because I close and reopen the IndexReader every 1 minute. IW also deletes unreferenced files, but why does it delete the optimized file and not the old index files? The merged file is what I wanted. ((aft

Re: problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread Michael McCandless
This happens, on Windows, when there is an IndexReader still reading those files. IndexWriter will periodically (after a merge completes or a new segment is flushed) retry deleting those files, but it won't succeed until no reader has a given file open anymore. IW also deletes unreferenced files

RE: Flex & Docs/AndPositionsEnum

2010-02-10 Thread Uwe Schindler
> > And we don't return "objects or aggregates" with Multi*Enum now... > > Yeah, this is different. In KS right now, we use a generic > PostingList, which > conveys different information depending on what class of Posting it > contains. > > > In flex right now the codec is unaware that it's being

problem:lucene did not delete old index file after optimize method called

2010-02-10 Thread luocan19826164
Lucene did not delete old index files after the optimize method was called. PS: I call IndexWriter.getReader() and then call the old IndexReader.close() every 1 minute. A long time passed, and I watched: the old index files did not disappear. After I restart my program, the optimized index file disappears, but the old index file

Contrib Lucene Analyzers & Stemming

2010-02-10 Thread Jamie
Hi There We are having problems with some of the Lucene analyzers in the contributions package. For instance, it appears that the Russian analyzer supports stemming, although when we test it, it does not. Is there a specific switch that we must set to enable the stemming of words? When we