Hi folks,
I am using a MultiSearcher that searches across four monthly indexes. I have a
requirement to cache one field for documents that are less
than one month old. To do that, I am first creating a date query (for the last
month) and using HitCollector.collect() for collecting Docume
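The approach described above — running a date-range query and gathering matching documents through a collector callback — can be sketched in plain Java. This is a hedged illustration only: `RecentDocCollector`, the timestamp array, and the 30-day window are hypothetical stand-ins, not Lucene classes; the loop body plays the role Lucene's `HitCollector.collect(doc, score)` would play.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

// Sketch (not Lucene code): collect the ids of documents whose timestamp
// falls within the last month, the way a HitCollector would be invoked
// once per matching document.
public class RecentDocCollector {

    // Collect ids of docs whose timestamp falls within the last 30 days.
    public static List<Integer> collectRecent(Instant[] docTimestamps, Instant now) {
        Instant cutoff = now.minus(30, ChronoUnit.DAYS);
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < docTimestamps.length; docId++) {
            // This plays the role of HitCollector.collect(doc, score).
            if (!docTimestamps[docId].isBefore(cutoff)) {
                hits.add(docId);  // cache whatever field you need, keyed by docId
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2008-04-16T00:00:00Z");
        Instant[] stamps = {
            now.minus(5, ChronoUnit.DAYS),    // doc 0: recent
            now.minus(90, ChronoUnit.DAYS),   // doc 1: too old
            now.minus(29, ChronoUnit.DAYS),   // doc 2: just inside the window
        };
        System.out.println(collectRecent(stamps, now));  // [0, 2]
    }
}
```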
Hi,
I am getting this exception sometimes while searching. Any idea what the
problem could be?
java.lang.NullPointerException
at
org.apache.lucene.search.MultiSearcherThread.hits(ParallelMultiSearcher.java:286)
at
org.apache.lucene.search.ParallelMultiSearcher.search(ParallelMul
Actually, I screwed up the timing info: I wasn't including the time for the
QueryWrapperFilter#bits(IndexReader) call. Sadly,
it actually takes longer than the original query that had both terms
included. Bummer. I had really convinced myself, until the
thought came to me at lunch :).
-M
On Wed, A
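The tradeoff the message above is measuring can be sketched with `java.util.BitSet` (plain Java, not Lucene's API): a QueryWrapperFilter-style filter must first materialize a bit set over the whole index — the `bits(IndexReader)` call — which can cost as much as running the restricting term inside the main query. The win only appears if that bit set is cached and reused across many queries, where the per-query work shrinks to a word-wise AND.

```java
import java.util.BitSet;

// Illustration of the filter-caching tradeoff (not Lucene code).
public class FilterBitsSketch {

    // Analogous to Filter.bits(reader): one full pass over all docs.
    public static BitSet buildFilter(int[] docLocations, int wantedLocation) {
        BitSet bits = new BitSet(docLocations.length);
        for (int doc = 0; doc < docLocations.length; doc++) {
            if (docLocations[doc] == wantedLocation) bits.set(doc);
        }
        return bits;
    }

    // Per-query work once the filter is cached: intersect hits with the bit set.
    public static BitSet applyFilter(BitSet cachedBits, BitSet queryHits) {
        BitSet out = (BitSet) queryHits.clone();
        out.and(cachedBits);  // cheap word-wise AND, no index scan
        return out;
    }

    public static void main(String[] args) {
        int[] locations = {1, 2, 1, 3, 1};          // location id per doc
        BitSet cached = buildFilter(locations, 1);  // pay this scan cost once
        BitSet hits = new BitSet();
        hits.set(0); hits.set(1); hits.set(3);      // docs matched by the query
        System.out.println(applyFilter(cached, hits)); // {0}
    }
}
```

If each query rebuilds the bit set, the filter is pure overhead — which matches the timing correction above.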
Michael Stoppelman skrev:
Hi all,
I've been doing some performance testing and found that using a
QueryWrapperFilter for a location-field
restriction I have to apply brings my search times down to around 5-10ms. This
was surprising.
Before, the times were between 50ms and 100ms.
The queries from befor
N-grams will do OK;
it depends a lot on what you are up to. If there is a person looking at the result
lists and making decisions, it will work fine, as the default TF/IDF similarity will give
you a reasonable ordering of hits. But if you need to set some cutoff value to decide
automatically whether this is a match or not, then y
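The cutoff problem described above can be made concrete with a small sketch: score two names by character-bigram overlap (the Dice coefficient here is one common choice, not necessarily what the poster had in mind). A ranked list is useful to a human even with middling scores, but an automatic yes/no needs a threshold, and the 0.6 cutoff below is an illustrative assumption you would have to tune on your own data.

```java
import java.util.HashSet;
import java.util.Set;

// Bigram (character 2-gram) similarity with an automatic match cutoff.
public class NGramMatch {

    static Set<String> bigrams(String s) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 2 <= s.length(); i++) grams.add(s.substring(i, i + 2));
        return grams;
    }

    // Dice coefficient over bigram sets: 2*|A∩B| / (|A|+|B|), in [0, 1].
    public static double dice(String a, String b) {
        Set<String> ga = bigrams(a.toLowerCase()), gb = bigrams(b.toLowerCase());
        if (ga.isEmpty() || gb.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    public static boolean isMatch(String a, String b, double cutoff) {
        return dice(a, b) >= cutoff;
    }

    public static void main(String[] args) {
        System.out.println(dice("Jonathan", "Jonathon")); // high overlap
        System.out.println(dice("Jonathan", "Margaret")); // no shared bigrams
        System.out.println(isMatch("Jonathan", "Jonathon", 0.6)); // true
    }
}
```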
We started doing the same thing (pooling 1 searcher per core) at my
work, when profiling under high load showed a lot of time spent in synchronized blocks
deep inside SegmentTermReader (? might be mixing the class up),
due to file read()s using instance variables for
seeking. I could dig up the
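The pooling idea above (one searcher per core) can be sketched with a plain `BlockingQueue`. This is a hedged illustration: `SearcherPool` and its string "searchers" are hypothetical stand-ins, not Lucene's IndexSearcher. The point is that each thread checks out its own instance for the duration of a query, so threads never contend on one searcher's synchronized internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal checkout/return pool, one pooled object per core.
public class SearcherPool<S> {
    private final BlockingQueue<S> pool;

    public SearcherPool(List<S> searchers) {
        pool = new ArrayBlockingQueue<>(searchers.size(), false, searchers);
    }

    // Blocks until a searcher is free; each thread gets exclusive use.
    public S acquire() {
        try {
            return pool.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public void release(S searcher) {
        pool.offer(searcher);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        List<String> searchers = new ArrayList<>();
        for (int i = 0; i < cores; i++) searchers.add("searcher-" + i);
        SearcherPool<String> p = new SearcherPool<>(searchers);

        String s = p.acquire();   // borrow for the duration of one query
        try {
            System.out.println("searching with " + s);
        } finally {
            p.release(s);         // always return it, even on failure
        }
    }
}
```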
On Wed, Apr 16, 2008 at 3:13 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> LOL. That would probably be useful, eh? :-). Not sure why it completely
> slipped my mind other than I use it in Solr. I suppose it would make sense
> to create a variation of the StandardAnalyzer that uses the
> Wiki
LOL. That would probably be useful, eh? :-). Not sure why it
completely slipped my mind other than I use it in Solr. I suppose it
would make sense to create a variation of the StandardAnalyzer that
uses the WikipediaTokenizer instead. Care to crank out a patch?
-Grant
On Apr 16, 2008,
Hi Erick,
Thanks for the information. I changed my code over to use a reader and get
a term enumeration. Once I find a value that matches an element in my set,
I use a TermDocs object to seek to that term and open all of the matching
documents. This has sped up my searches by a large amount. So
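The seek-instead-of-scan pattern described above can be modeled with a sorted map (plain Java, not the Lucene API): terms are stored sorted, so you can jump directly to each wanted value rather than examining every term, the way TermEnum/TermDocs lets you seek to a term and walk only its matching documents. `docsFor` and the toy index are hypothetical names for illustration.

```java
import java.util.List;
import java.util.TreeMap;

// Analogy for seeking in a sorted term dictionary.
public class TermSeekSketch {

    // term -> posting list of doc ids, kept sorted by term like Lucene's
    // term dictionary.
    public static List<Integer> docsFor(TreeMap<String, List<Integer>> index, String term) {
        // ceilingKey models seeking the enumeration to the term's position.
        String hit = index.ceilingKey(term);
        return term.equals(hit) ? index.get(hit) : List.of();
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> index = new TreeMap<>();
        index.put("apple", List.of(1, 4));
        index.put("banana", List.of(2));
        index.put("cherry", List.of(3, 4));
        // Seek straight to each wanted term instead of scanning all terms.
        for (String term : List.of("banana", "cherry")) {
            System.out.println(term + " -> " + docsFor(index, term));
        }
    }
}
```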
Hi all,
I've been doing some performance testing and found that using a
QueryWrapperFilter for a location-field
restriction I have to apply brings my search times down to around 5-10ms. This
was surprising.
Before, the times were between 50ms and 100ms.
The queries from before the optimization look like
On 16/04/2008, Michael McCandless <[EMAIL PROTECTED]> wrote:
> These are great results! Thanks for posting.
Thanks!
>
> I'd be curious if you'd get better indexing throughput by using a single
> IndexWriter, fed by all 8 indexing threads, with an 8X bigger RAM buffer,
> instead of 8 IndexWriter
Toke Eskildsen skrev:
In the log names, t2 signifies 2 threads with a shared
searcher, t2u signifies 2 threads with separate searchers.
metis_RAM_24GB_i14_v23_t1_l23.log 530.0 q/sec
metis_RAM_24GB_i14_v23_t2_l23.log 888.2 q/sec
Did someone end up investigating this thing with pool
Is there an Analyzer for the WikipediaTokenizer?
Thanks for the pointer. I found the thread, and there is certainly some
interesting information there. I'd like to stick to what Lucene has
available today, mainly because I lack the time to implement anything
more than that. I originally thought Levenshtein, but then realized
that Lucene wo
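The thread above mentions Levenshtein distance for catching near-duplicate names. As a point of reference, a minimal dynamic-programming implementation looks like the following; the length-normalized similarity and the 0.8 cutoff in `main` are illustrative choices, not recommended values.

```java
// Minimal edit-distance sketch for near-duplicate name detection.
public class NameDistance {

    // Classic two-row Levenshtein: minimum number of single-character
    // insertions, deletions, or substitutions to turn a into b.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Normalize to 0..1 so one cutoff works across different name lengths.
    public static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Acme Inc", "Acme Inc.")); // 1
        System.out.println(similarity("Jonathan", "Jonathon") >= 0.8); // true
    }
}
```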
I believe there were some posts on this about a year ago. Try
searching in the archives for duplicate names, as well as "record
linkage" or any other various synonyms that you can think of. The
short answer is Lucene is reasonable to attempt this with, but you may
need some help. The lon
I'm new to Lucene, and would like to use it to find duplicate (or
similar) names in a contact list. Is Lucene a good fit?
We have a form where a user enters a company or person's name, and we
want the system to warn them if there is already a company or person
entered with the same or similar n
These are great results! Thanks for posting.
I'd be curious if you'd get better indexing throughput by using a
single IndexWriter, fed by all 8 indexing threads, with an 8X bigger
RAM buffer, instead of 8 IndexWriters that merge in the end.
How long does that final merge take now?
Also, 6
Cass,
Thanks for converting it. I've posted it to my blog:
http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
Sorry for the XML tags: I guess I followed the instructions on the
Lucene performance benchmarks page too literally ("Post these figures
to the lucene-user maili