Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
Sorry Peter - I introduced this problem with some kind of typo-type issue - I somehow changed an includeSpans variable to excludeSpans - but I certainly didn't mean to - it makes no sense. So I'm not sure how it happened, and I'm surprised the tests that passed still passed! We could probably use even

RE: Reusing a CachingWrapperFilter

2011-07-25 Thread Uwe Schindler
This collector uses no resources; I would recreate it inside the loop. It's just a thin class in the young-gen heap (like an autoboxed number). If you really want to reuse it, you have to log the last count and simply measure the difference, as you cannot reset it. - Uwe Schindler H.-H.-Meier-Allee 63, D-2

Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
Thanks Peter - if you supply the unit tests, I'm happy to work on the fixes. I can likely look at this later today. - Mark Miller lucidimagination.com On Jul 25, 2011, at 10:14 AM, Peter Keegan wrote: > Hi Mark, > > Sorry to bug you again, but there's another case that fails the unit test > (s

RE: Reusing a CachingWrapperFilter

2011-07-25 Thread Konstantyn Smirnov
Uwe Schindler wrote: > > To just count the results use TotalHitCountCollector (since Lucene Core > 3.1) > with IndexSearcher.search(). > ok, thanks for that! so the code should look like: CachingWrapperFilter cwf = new CachingWrapperFilter( filter ) searcher.search( query, cwf ... ) // search

Re: Search within a sentence (revisited)

2011-07-25 Thread Peter Keegan
Hi Mark, Sorry to bug you again, but there's another case that fails the unit test (search within the second sentence), as shown here in the last test: package org.apache.lucene.search.spans; import java.io.Reader; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.To

Re: 4.0-SNAPSHOT in maven repo via Jenkins?

2011-07-25 Thread Eric Charles
Hi Steven, Thx for your answers. Seems like I missed the wiki page and SOLR-2634 where much is already said. Cheers. On 25/07/11 15:16, Steven A Rowe wrote: Hi Eric, On 7/24/2011 at 3:07 AM, Eric Charles wrote: 0112233445566778

RE: 4.0-SNAPSHOT in maven repo via Jenkins?

2011-07-25 Thread Steven A Rowe
Hi Eric, On 7/24/2011 at 3:07 AM, Eric Charles wrote: 0112233445566778 12345678901234567890123456789012345678901234567890123456789012345678901234567890 > Jenkins jobs builds lucene trunk with 'mvn --batch-mode > --non-recursive -Pboot

RE: Reusing a CachingWrapperFilter

2011-07-25 Thread Uwe Schindler
The problem with your code is that the IndexSearcher.search() method works per segment, but in your cardinality code you are using top-level readers, so it will cache twice (once per segment and once for the top-level MultiReader/DirectoryReader). Also not all DocIdSet implementations extend

Re: Index one huge text file

2011-07-25 Thread Konstantyn Smirnov
If you read your file as a stream, i.e. line-by-line without buffering it in RAM, you should have no problems with performance, as 60k lines is a piece of cake :). You can try using LineNumberReader: Reader lnr = new LineNumberReader( new FileReader( new File( '/path/to/your/file' ) ) ) String lin
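A minimal sketch of the line-by-line streaming idea in plain Java, with no Lucene calls. The names LineStreamDemo, LineHandler and streamLines are hypothetical and only for illustration; in an indexing loop, the handler is presumably where each line would be turned into a document. Only one line is held in memory at a time.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

public class LineStreamDemo {

    /** Hypothetical callback invoked once per line (not a Lucene type). */
    public interface LineHandler {
        void handle(int lineNumber, String line);
    }

    /**
     * Reads the source one line at a time and passes each line to the handler.
     * Returns the number of lines read. The reader is closed when done.
     */
    public static int streamLines(Reader source, LineHandler handler) throws IOException {
        try (LineNumberReader lnr = new LineNumberReader(source)) {
            String line;
            while ((line = lnr.readLine()) != null) {
                // getLineNumber() is 1-based here: it has already been
                // incremented for the line that readLine() just returned.
                handler.handle(lnr.getLineNumber(), line);
            }
            return lnr.getLineNumber();
        }
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader on the real file.
        int total = streamLines(new StringReader("first\nsecond\nthird\n"),
                (no, text) -> System.out.println(no + ": " + text));
        System.out.println(total + " lines read");
    }
}
```

For a real file, a FileReader (or a BufferedReader over one) could be passed as the source in place of the StringReader.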

Reusing a CachingWrapperFilter

2011-07-25 Thread Konstantyn Smirnov
Hi all! Are there any limitations or implications on reusing a CWF? In my app I'm doing the following: Filter filter = new BooleanFilter(...) // initialized with a couple of Term-, Range-, Boolean- and PrefixFilter CachingWrapperFilter cwf = new CachingWrapperFilter( filter ) searcher.search(

boolean score calculation

2011-07-25 Thread Pavel Goncharik
Hi, as far as I can see, boolean scorers always sum up scores of their sub-scorers. It works, but in case of my application it's required to multiply sub-scores. Is there a simple/efficient way to do this (apart from modifying lucene's source code)? It seems to me that standard tricks (e.g. Custom