Re: Search Performance Problem 16 sec for 250K docs

2006-08-19 Thread Chris Hostetter
: This is because the index is updated every 5 mins or so, due to the incoming : feed of stories .. : : When you say iteration, i take it you mean, search request, well for each : search that is conducted I create a new one .. search reader that is .. yeah ... i meant iteration of your test. don'

Re: Overriding Similarity

2006-08-19 Thread MH H
Ah, I see, I should of course use the same similarity during indexing and searching. Many thanks! On 20/08/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : And then I made this subclass the default similarity. It worked well : for tf but not for lengthNorm. The reason appears to be that the : Te

Re: Search Performance Problem 16 sec for 250K docs

2006-08-19 Thread M A
Yes, there is a new searcher opened each time a search is conducted. This is because the index is updated every 5 mins or so, due to the incoming feed of stories .. When you say iteration, I take it you mean search request; well, for each search that is conducted I create a new one .. search read

Re: Search Performance Problem 16 sec for 250K docs

2006-08-19 Thread Chris Hostetter
: hits = searcher.search(query, new Sort("sid", true)); you don't show where searcher is initialized, and you don't clarify how you are timing your multiple iterations -- i'm going to guess that you are opening a new searcher every iteration right? sorting on a field requires pre-computing a
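
The visible part of this reply points at opening a new searcher for every search. A minimal sketch of the alternative, reusing one searcher until the index actually changes, might look like the following. It is plain Java with no Lucene dependency: `CachedSearcher`, `indexVersion` and `opener` are hypothetical names introduced for illustration, and how the index version is obtained is an assumption left to the caller.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Reuses one expensive "searcher" until the reported index version changes. */
final class CachedSearcher<S> {
    private final LongSupplier indexVersion; // hypothetical: reports the current index version
    private final Supplier<S> opener;        // hypothetical: opens a fresh searcher (expensive)
    private S current;
    private long openedAtVersion;
    private int openCount;

    CachedSearcher(LongSupplier indexVersion, Supplier<S> opener) {
        this.indexVersion = indexVersion;
        this.opener = opener;
    }

    /** Returns the cached searcher, reopening only if the index version moved. */
    synchronized S get() {
        long version = indexVersion.getAsLong();
        if (current == null || version != openedAtVersion) {
            current = opener.get();
            openedAtVersion = version;
            openCount++;
        }
        return current;
    }

    /** Number of times a searcher was actually opened. */
    synchronized int openCount() {
        return openCount;
    }
}

class CachedSearcherDemo {
    public static void main(String[] args) {
        long[] version = {1};
        CachedSearcher<Object> cache =
                new CachedSearcher<>(() -> version[0], Object::new);

        Object first = cache.get();
        Object second = cache.get();   // same version: no reopen
        version[0] = 2;                // simulate an index update
        Object third = cache.get();    // version changed: reopen

        System.out.println("reused: " + (first == second));
        System.out.println("reopened after update: " + (first != third));
        System.out.println("opens: " + cache.openCount());
    }
}
```

With an index that is updated every 5 minutes or so, searches between updates would share one searcher and pay any per-searcher setup cost once per update instead of once per request.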

Re: Overriding Similarity

2006-08-19 Thread Chris Hostetter
: And then I made this subclass the default similarity. It worked well : for tf but not for lengthNorm. The reason appears to be that the : TermScorer class does not call lengthNorm, but instead uses a cache Actually, the lengthNorm method is used by the IndexWriter; it compresses the float retur
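
To illustrate why a length norm applied at index time cannot be changed by swapping the similarity at search time, here is a toy sketch in plain Java. It does not use Lucene: `LengthNorm` and `ToyIndex` are hypothetical stand-ins, the byte compression mentioned in the reply is left out, and the scoring formula is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the length-normalisation hook of a similarity. */
interface LengthNorm {
    float lengthNorm(int numTerms);
}

/** Toy index: the norm is computed once, when the document is added. */
final class ToyIndex {
    private final List<Float> storedNorms = new ArrayList<>();

    /** Stores the norm produced by the similarity in effect at index time. */
    int addDocument(int numTerms, LengthNorm atIndexTime) {
        storedNorms.add(atIndexTime.lengthNorm(numTerms));
        return storedNorms.size() - 1;
    }

    /** Scoring only reads the stored norm; no similarity is consulted here. */
    float score(int docId, float tf) {
        return tf * storedNorms.get(docId);
    }
}

class ToyNormDemo {
    public static void main(String[] args) {
        LengthNorm penalising = n -> (float) (1.0 / Math.sqrt(n)); // shorter fields score higher
        LengthNorm flat = n -> 1.0f;                               // no length penalty

        ToyIndex index = new ToyIndex();
        int indexedWithPenalty = index.addDocument(100, penalising);
        int indexedFlat = index.addDocument(100, flat);

        // Same document length, same tf: only the index-time choice differs.
        System.out.println(index.score(indexedWithPenalty, 1.0f));
        System.out.println(index.score(indexedFlat, 1.0f));
    }
}
```

In this toy model, a document indexed with the penalising norm keeps that penalty until it is re-indexed, which matches the conclusion in the follow-up message that the same similarity should be used for indexing and searching.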

Re: Search Performance Problem 16 sec for 250K docs

2006-08-19 Thread M A
what i am measuring is this Analyzer analyzer = new StandardAnalyzer(new String[]{}); if(fldArray.length > 1) { BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD}; query = MultiFieldQueryP

Re: Search Performance Problem 16 sec for 250K docs

2006-08-19 Thread Erick Erickson
This is a long time, I think you're right, it's excessive. What are you timing? The time to complete the search (i.e. get a Hits object back) or the total time to assemble the response? The reason I ask is that the Hits object is designed to return the first 100 or so docs efficiently. Every 10

Search Performance Problem 16 sec for 250K docs

2006-08-19 Thread M A
Hi there, I have an index with about 250K documents, to be indexed full text. There are 2 types of searches carried out: one using 1 field, the other using 4 .. for a query string ... given the nature of the queries required, all stop words are maintained in the index, thereby allowing for phrasa

Re: Overriding Similarity

2006-08-19 Thread MH H
I had a situation where I was only interested in whether the term was there or not (not how many times), and I didn't want to penalize long fields. So I wrote a Similarity subclass where I overrode the following methods like this: public float lengthNorm(String fieldName, int numTerms) { ret

Re: Indexing Documents which has Attachments and are Refered many times!!

2006-08-19 Thread Jason Polites
I think you can still achieve your desired outcome, but I'm not sure I fully understand the use case. Can you describe more clearly a specific example of what you need to achieve? You are correct that "joins" in lucene aren't really a strong point, but this is often a by-product of thinking abou