need a better way of caching the value of a field ???

2008-04-16 Thread Shailendra Mudgal
Hi folks, I am using a MultiSearcher object that spans four months of indexes. I need to cache one field for documents that are less than one month old. To do that, I first create a date query (for the last month) and use HitCollector.collect() for collecting Docume

Any ideas?

2008-04-16 Thread Bhavin Pandya
Hi, I sometimes get this exception while searching. Any idea what the problem could be? java.lang.NullPointerException at org.apache.lucene.search.MultiSearcherThread.hits(ParallelMultiSearcher.java:286) at org.apache.lucene.search.ParallelMultiSearcher.search(ParallelMul

Re: QueryWrapperFilter question...

2008-04-16 Thread Michael Stoppelman
Actually, I screwed up the timing info. I wasn't including the time for the QueryWrapperFilter#bits(IndexReader) call. Sadly, it actually takes longer than the original query that had both terms included. Bummer. I had really convinced myself till the thought came to me at lunch :). -M On Wed, A

Re: QueryWrapperFilter question...

2008-04-16 Thread Karl Wettin
Michael Stoppelman wrote: Hi all, I've been doing some performance testing and found that using QueryWrapperFilter for a location field restriction I have to do allows my search results to approach 5-10ms. This was surprising. Before the performance was between 50ms-100ms. The queries from befor

Re: Using Lucene to find duplicate/similar names

2008-04-16 Thread eks dev
NGrams will do ok, depends a lot on what you are up to, if there is a person looking at result lists making decision, it will work fine as default TF/IDF similarity will give you ok order of hits, but if you need to set some cutoff value to decide automatically if this is a match or not, then y
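
As a rough illustration of the n-gram idea discussed in this thread, here is a minimal plain-Java sketch that scores two names by the overlap of their character n-grams. It is not Lucene's TF/IDF scoring; the class name NameNGrams, the underscore padding, and the use of Jaccard overlap are all assumptions made for the example. A fixed cutoff on this kind of score is exactly the sort of automatic decision the message cautions about, so treat any threshold as something to tune on real data.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: compares names by shared character n-grams.
class NameNGrams {

    // Returns the set of character n-grams of a name, lower-cased and padded
    // with n-1 underscores on each side so that prefixes and suffixes count.
    static Set<String> ngrams(String name, int n) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            pad.append('_');
        }
        String padded = pad + name.toLowerCase().trim() + pad;
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard overlap of the two n-gram sets: |A ∩ B| / |A ∪ B|, in [0, 1].
    static double jaccard(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(ga);
        intersection.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return (double) intersection.size() / union.size();
    }
}
```

Calling NameNGrams.jaccard("Acme Corp", "Acme Corp.", 3) yields a much higher score than comparing "Acme Corp" with an unrelated name, which is enough to rank candidates for a person reviewing the list.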

Re: Pooled searcher (was: Solid State Drives vs. RAMDirectory)

2008-04-16 Thread Jake Mannix
We started doing the same thing (pooling 1 searcher per core) at my work when profiling showed a lot of time hitting synchronized blocks deep inside the SegmentTermReader (? Might be messing the class up) under high load, due to file read()'s using instance variables for seeking. I could dig up the
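
The pooling approach described above (a fixed number of searchers, for example one per core, handed out to request threads) can be sketched in plain Java roughly as follows. The class SearcherPool and its methods are hypothetical, and the type parameter S stands in for whatever searcher object is being pooled; nothing here is Lucene API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; S is a stand-in for the pooled searcher type.
class SearcherPool<S> {
    private final BlockingQueue<S> idle;

    // Creates `size` searchers up front using the supplied factory.
    SearcherPool(int size, Supplier<S> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows a searcher (blocking until one is free), runs the work,
    // and always returns the searcher to the pool afterwards.
    <R> R withSearcher(Function<S, R> work) {
        S searcher;
        try {
            searcher = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a searcher", e);
        }
        try {
            return work.apply(searcher);
        } finally {
            idle.add(searcher); // cannot fail: a slot was freed by take()
        }
    }

    // Number of searchers currently not in use.
    int idleCount() {
        return idle.size();
    }
}
```

Sizing the pool to Runtime.getRuntime().availableProcessors() would match the "one searcher per core" setup mentioned in the message; whether that helps depends on where the contention actually is.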

Re: Analyzer for WikipediaTokenizer

2008-04-16 Thread Yonik Seeley
On Wed, Apr 16, 2008 at 3:13 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > LOL. That would probably be useful, eh? :-). Not sure why it completely > slipped my mind other than I use it in Solr. I suppose it would make sense > to create a variation of the StandardAnalyzer that uses the > Wiki

Re: Analyzer for WikipediaTokenizer

2008-04-16 Thread Grant Ingersoll
LOL. That would probably be useful, eh? :-). Not sure why it completely slipped my mind other than I use it in Solr. I suppose it would make sense to create a variation of the StandardAnalyzer that uses the WikipediaTokenizer instead. Care to crank out a patch? -Grant On Apr 16, 2008,

Re: How to improve performance of large numbers of successive searches?

2008-04-16 Thread Chris McGee
Hi Erick, Thanks for the information. I changed over my code to use a reader and get a term enumeration. Once I find a value that matches an element in my set, I use a TermDocs object to seek to that term and open all of the matching documents. This has sped up my searches by a large amount. So
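
The pattern described in this message, walking the terms of a field once and pulling the documents only for terms found in a set of wanted values, can be illustrated without Lucene. In the sketch below a SortedMap stands in for the field's term dictionary and an int[] for each term's document list; the class TermWalk and the method docsForTerms are hypothetical names, and the real TermEnum and TermDocs calls are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeSet;

// Hypothetical stand-in for one pass over a field's term dictionary.
class TermWalk {

    // postings: term -> ids of documents containing that term (assumed layout).
    // Returns the distinct ids of documents matching any wanted term, ascending.
    static List<Integer> docsForTerms(SortedMap<String, int[]> postings, Set<String> wanted) {
        TreeSet<Integer> docs = new TreeSet<>();
        // Single ordered pass over the terms instead of one search per value.
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            if (wanted.contains(entry.getKey())) {
                for (int doc : entry.getValue()) {
                    docs.add(doc);
                }
            }
        }
        return new ArrayList<>(docs);
    }
}
```

The point of the design is that the cost is one scan of the term list plus the matching document lists, rather than one full query per value in the set.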

QueryWrapperFilter question...

2008-04-16 Thread Michael Stoppelman
Hi all, I've been doing some performance testing and found that using QueryWrapperFilter for a location field restriction I have to do allows my search results to approach 5-10ms. This was surprising. Before the performance was between 50ms-100ms. The queries from before the optimization look like

Re: Lucene performance: benchmarktemplate.xml

2008-04-16 Thread Glen Newton
On 16/04/2008, Michael McCandless <[EMAIL PROTECTED]> wrote: > These are great results! Thanks for posting. Thanks! > > I'd be curious if you'd get better indexing throughput by using a single > IndexWriter, fed by all 8 indexing threads, with an 8X bigger RAM buffer, > instead of 8 IndexWriter

Pooled searcher (was: Solid State Drives vs. RAMDirectory)

2008-04-16 Thread Karl Wettin
Toke Eskildsen wrote: In the log names, t2 signifies 2 threads with a shared searcher, t2u signifies 2 threads with separate searchers. metis_RAM_24GB_i14_v23_t1_l23.log 530.0 q/sec metis_RAM_24GB_i14_v23_t2_l23.log 888.2 q/sec Did someone end up investigating this thing with pool

Analyzer for WikipediaTokenizer

2008-04-16 Thread David Etter
Is there an Analyzer for the WikipediaTokenizer?

Re: Using Lucene to find duplicate/similar names

2008-04-16 Thread Andy DePue
Thanks for the pointer. I found the thread, and there is certainly some interesting information there. I'd like to stick to what Lucene has available today, mainly because I lack the time to implement anything more than that. I originally thought Levenshtein, but then realized that Lucene wo

Re: Using Lucene to find duplicate/similar names

2008-04-16 Thread Grant Ingersoll
I believe there were some posts on this about a year ago. Try searching in the archives for duplicate names, as well as "record linkage" or any other various synonyms that you can think of. The short answer is Lucene is reasonable to attempt this with, but you may need some help. The lon

Using Lucene to find duplicate/similar names

2008-04-16 Thread Andy DePue
I'm new to Lucene, and would like to use it to find duplicate (or similar) names in a contact list. Is Lucene a good fit? We have a form where a user enters a company or person's name, and we want the system to warn them if there is already a company or person entered with the same or similar n

Re: Lucene performance: benchmarktemplate.xml

2008-04-16 Thread Michael McCandless
These are great results! Thanks for posting. I'd be curious if you'd get better indexing throughput by using a single IndexWriter, fed by all 8 indexing threads, with an 8X bigger RAM buffer, instead of 8 IndexWriters that merge in the end. How long does that final merge take now? Also, 6

Re: Lucene performance: benchmarktemplate.xml

2008-04-16 Thread Glen Newton
Cass, Thanks for converting it. I've posted it to my blog: http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html Sorry for the XML tags: I guess I followed the instructions on the Lucene performance benchmarks page too literally ("Post these figures to the lucene-user maili