One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
Hi, I have the following problem. I'm indexing documents that belong to some collection (i.e. the dataset is divided into collections, which are divided into documents). These documents become my Lucene documents, with some relatively small string that becomes the field I want to search. Howev

Re: One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
". Otherwise, could you describe what behavior you're after and maybe there'd be more ideas Best Erick On 5/19/07, Peter Bloem <[EMAIL PROTECTED]> wrote: Hi, I have the following problem. I'm indexing documents that belong to some collection (ie. the dataset

Re: One (large) field shared by many documents

2007-05-19 Thread Peter Bloem
sideration is "how many collections do you have?" The reason I ask is that in the worst case scenario, you'll have an OR clause for every collection ID you have. Lucene can easily handle many thousands of terms in an OR, but your search time will suffer. And you'll have to take

Re: One (large) field shared by many documents

2007-05-20 Thread Peter Bloem
ndred thousand. On the other hand, any reasonable query should return only as many collections as it would from a set of medium-sized documents. I guess the only way to find out how bad the performance will be is to implement it. Regards, Peter Paul Elschot wrote: On Sunday 20 May 2007 02:49

Re: One (large) field shared by many documents

2007-05-20 Thread Peter Bloem
he formula in the Similarity docs correctly). Thank you for your comments so far, Peter Erick Erickson wrote: See Paul's e-mail, he's talking about a place I haven't been in Lucene yet. Other than that, see below. On 5/19/07, Peter Bloem <[EMAIL PROTECTED]> wrote: Ah, now w

Optional terms in BooleanQuery

2007-05-20 Thread Peter Bloem
I'm constructing a search with some required terms and some optional terms in the query. According to some earlier posts that looks like "+(A B) C D E" in query syntax for required terms A and B and optional terms C, D and E. In other words, Lucene considers all documents that have both A and