Re: Need some Advice on Searching

2006-05-21 Thread David Ahlschläger
On 19/05/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: I assume when you say this... : 1. I need to temporarily index sets of documents on the Fly say 100 at a : Time. you mean that you'll have lots of temporary indexes of a few hundred documents and then you'll do a bunch of queries and th

should I avoid create many Fields for a Document?

2006-05-21 Thread Paulo Silveira
Hello, what is the best way to search? Should I separate all the fields, or create a big one that has all fields? Does this impact the performance dramatically? Creating a big field, I would not need to create a BooleanQuery... Last time I did not get any clues, let's see if this time will be bet

indexing in lucene 1.9.1

2006-05-21 Thread Harini Raghavan
Hi All, We have recently upgraded from Lucene 1.4.3 to Lucene 1.9.1. After the upgrade, we are facing some issues: 1. Indexing seems to be behaving differently. There were more than 300 segment files (.cfs) in the index and the IndexSearcher is taking forever to refresh the index. Have t

Re: does anybody have the experience to do some pooling upon lucene?

2006-05-21 Thread Zhenjian YU
Hi, Erik, Thanks for your prompt response. I didn't dig into the source code of Lucene deeply enough, but I noticed that the IndexSearcher uses an IndexReader, while the cost of initializing an IndexReader is a bit high. My application is a webapp, so I think it may be good if I cache some instances of

Re: Matching at least N terms of subqueries

2006-05-21 Thread Paul Elschot
On Sunday 21 May 2006 20:01, Chris Hostetter wrote: > : "wrapping" it with a SpanNearQuery. Unless there is a way to make > : Span(Near)Query take a BooleanQuery as its clause. Is there a way to > > ope .. span queries can only contain other span queries -- they need the > sub queries to propagat

Re: SV: Date first best practises

2006-05-21 Thread Chris Hostetter
: If I use a sort on the datefield and perform a query (with that sort) : will it always rebuild the whole cache or just the cache for the actual : hits? The FieldCache is built for all documents so that it's completely reusable for any search that sorts on that field -- as long as you keep you

Re: Matching at least N terms of subqueries

2006-05-21 Thread Chris Hostetter
: "wrapping" it with a SpanNearQuery. Unless there is a way to make : Span(Near)Query take a BooleanQuery as its clause. Is there a way to ope .. span queries can only contain other span queries -- they need the sub queries to propagate up the span information which normal queries don't know abou

SV: Date first best practises

2006-05-21 Thread Marcus Falck
Thanks. I prefer sorting. But I'm afraid that it won't be enough. How long do you think it will take to rebuild the caches? If I use a sort on the datefield and perform a query (with that sort) will it always rebuild the whole cache or just the cache for the actual hits? / Marcus ___

Re: Date first best practises

2006-05-21 Thread Erik Hatcher
On May 21, 2006, at 11:31 AM, Marcus Falck wrote: I will use Lucene to index 200 million documents (doc size 2 kb -> 20 kb), with the following requirements: an IndexSearcher needs to be created at least every 5 minutes. The ranking/scoring/sorting will need to return the hits ordered by date desc.

Date first best practises

2006-05-21 Thread Marcus Falck
Hi, I will use Lucene to index 200 million documents (doc size 2 kb -> 20 kb), with the following requirements: an IndexSearcher needs to be created at least every 5 minutes. The ranking/scoring/sorting will need to return the hits ordered by date desc. Will the sorting be good enough on a machine wit

Re: Matching at least N terms of subqueries

2006-05-21 Thread Michael Chan
Come to think of it... I can only use SpanOrQuery because I'm "wrapping" it with a SpanNearQuery. Unless there is a way to make Span(Near)Query take a BooleanQuery as its clause. Is there a way to set the min. number of terms to be matched in an OR subquery inside a SpanNearQuery? Thanks. Micha

SpanScorer Out Of Bounds

2006-05-21 Thread Michael Chan
Hi, Somehow, after running many searches using instances of SpanQuery (mostly SpanNearQuery), I get an ArrayIndexOutOfBoundsException: "bash-2.03$ java.lang.ArrayIndexOutOfBoundsException: 2147483647 at org.apache.lucene.search.spans.SpanScorer.score(SpanScorer.java:72) at org.a

RE: Scoring purely on term frequencies

2006-05-21 Thread Ziv Gome
Hi Wouter, My thought would be to go for plan (b) (I have not tested it though). This would simply produce the sum of the frequencies of the different terms (I'm referring to a real multi-term query, not a phrase as you mentioned - "the man" - which should work). The problem I see is that it you lose