Re: Searching by bit masks

2006-11-28 Thread Biggy
OK, here's what I've come up with after reading your suggestions: - bit set from DB stays untouched - only one field shall be used to store interest field bits in the document: "interest". Saves disk space. - The bits shall not be converted to readable string but added as values separated by spa
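A minimal sketch of the plan above, assuming "values" means the numeric value of each set bit (1, 2, 4, ...) rather than its position. The class name `InterestBits` and the method `toFieldValue` are hypothetical, not part of Lucene or the poster's code. The resulting string is what would be put into the single "interest" field.

```java
import java.util.StringJoiner;

// Hypothetical helper: turns a bit set read from the DB into a
// space-separated list of bit values suitable for one indexed field.
class InterestBits {

    // Returns the value of every set bit in mask, lowest first,
    // separated by single spaces. An empty mask yields an empty string.
    static String toFieldValue(int mask) {
        StringJoiner joiner = new StringJoiner(" ");
        int remaining = mask;
        while (remaining != 0) {
            int lowest = Integer.lowestOneBit(remaining);
            // Unsigned rendering keeps the top bit (1 << 31) positive.
            joiner.add(Integer.toUnsignedString(lowest));
            remaining &= ~lowest; // clear the bit just emitted
        }
        return joiner.toString();
    }
}
```

For example, a mask of 5 (bits 1 and 4 set) becomes the two tokens "1" and "4", so a search for either token alone matches the document.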

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-28 Thread Stanislav Jordanov
Paul, we are using a slightly modified version of Lucene, so in order to run the performance tests on a nightly build, I need Lucene's sources, not the compiled classes. Is there a nice and easy way to get them? Stanislav Stanislav Jordanov wrote: Paul, We are working on delivering the next

Re: Searching by bit masks

2006-11-28 Thread Erick Erickson
Lucene will automatically separate tokens during index and search if you use the right analyzer. See the various classes that implement Analyzer. I don't know if you really wanted to use the numeric literals, but I wouldn't. The analyzers that do the most for you (automatically break up on spaces,

Re: Searching by bit masks

2006-11-28 Thread Biggy
The background of this is also separating content according to domains. Example: - pictureA (marked as a "joke" #flag: 1) - pictureB (marked as an "adult picture" #flag: 2) Site1: Users allowed to view everything (pictureA, pictureB) Site2: Users allowed to view everything except pictureB (no adu
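The example above can be sketched as a plain bit-mask check, independent of Lucene. This is only an illustration under assumptions: the class `SiteFilter`, the constants, and the idea of a per-site "blocked" mask are hypothetical names, and only the flag values 1 (joke) and 2 (adult picture) come from the message.

```java
// Hypothetical illustration of per-site content filtering by flag bits.
class SiteFilter {
    static final int JOKE = 1;   // flag value from the example
    static final int ADULT = 2;  // flag value from the example

    // Content is visible when none of its flags are blocked for the site.
    static boolean isVisible(int contentFlags, int blockedMask) {
        return (contentFlags & blockedMask) == 0;
    }
}
```

With this sketch, Site1 would use a blocked mask of 0 (everything visible) and Site2 a blocked mask of `ADULT`, which hides pictureB but not pictureA.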

Re: Searching by bit masks

2006-11-28 Thread Erick Erickson
You could store a value for each flag then be careful about what analyzers you use. For instance, using WhitespaceAnalyzer (index AND search) and doing your own casing. That is, make sure you lowercase as necessary (NOTE: operators AND, OR, NOT must not be lowercased if you send them through queryp
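A rough sketch of "doing your own casing" on a query string before it is parsed, assuming terms are separated by whitespace. The class `QueryCasing` and the method `lowercaseTerms` are hypothetical; this is plain string handling, not a Lucene API. It lowercases every token except the operators AND, OR, and NOT.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-processing step: lowercase terms, keep operators intact.
class QueryCasing {
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("AND", "OR", "NOT"));

    static String lowercaseTerms(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for an empty or blank query
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            // Uppercase operators pass through; everything else is lowercased.
            out.append(OPERATORS.contains(token)
                    ? token
                    : token.toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }
}
```

This is deliberately naive: it also lowercases field-name prefixes and any other syntax attached to a token, so the same casing would need to be applied to the values at index time.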

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-28 Thread Paul Elschot
On Tuesday 28 November 2006 12:12, Stanislav Jordanov wrote: > Paul, > we are using a slightly modified version of Lucene, > so in order to run the performance tests on a nightly build, I need > Lucene's sources, not the compiled classes. > Is there a nice and easy way to get them? The sources ar

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Suman Ghosh
Mike, Below is the pseudo code of the application. A few implementation points to understand the pseudo-code: - We have a home grown threadpool class that allows us to index multiple documents in parallel. We usually submit 200 jobs to the pool (2-3 worker threads usually for the pool). O

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Yonik Seeley
On 11/27/06, Michael McCandless <[EMAIL PROTECTED]> wrote: Suman Ghosh wrote: > On 11/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> On 11/27/06, Suman Ghosh <[EMAIL PROTECTED]> wrote: >> > Here are the values: >> > >> > mergeFactor=10 >> > maxMergeDocs=10 >> > minMergeDocs=100 >> > >> >

RE: Hits length with no sorting or scoring

2006-11-28 Thread Hirsch Laurence
The code works very well, Thanks, Laurie -Original Message- From: Paul Elschot [mailto:[EMAIL PROTECTED] Sent: 27 November 2006 18:52 To: java-user@lucene.apache.org Subject: Re: Hits length with no sorting or scoring On Monday 27 November 2006 14:30, Hirsch Laurence wrote: > Hello, >

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
Yonik Seeley wrote: Actually, in previous versions of Lucene, it *was* possible to get way too many first level segments because of the wonky logic when the IndexWriter was closed. That has been fixed in the trunk with the new merge policy, and you will never see more than mergeFactor first lev

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
This looks correct to me. It's good you are doing the deletes "in bulk" up front for each batch of documents. So I guess you hit the error (& 5000 segments files) while processing batches of 200 docs (because you then optimize in the end)? Do you search this index while it's building, or, only

Re: Syns2Index utility: version of Lucene and Java

2006-11-28 Thread Chris Hostetter
1) I don't really know anything about Syns2Index - but the errors you cited don't seem to have anything to do with Lucene ... your compiler appears to be complaining about assert statements within the core java system classes ... which is a little strange. you said you are past the HelloWorld

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Suman Ghosh
The search functionality must be available during the index build. Since a relatively small number of documents are being affected (and also we plan to perform the build during a period of time we know to be relatively quiet from last 2 years site access data) during the build process, we hope tha

multiple keyword fields vs. multiple-token field

2006-11-28 Thread Michael Rusch
I have documents that can be referred to by multiple identifiers (and I want to store the identifiers separate from the main indexed content). I'm wondering if I should put each identifier in its own keyword field, or have one tokenized field with all of the identifiers in it. What I'm talking a

Re: multiple keyword fields vs. multiple-token field

2006-11-28 Thread Erik Hatcher
On Nov 28, 2006, at 4:31 PM, Michael Rusch wrote: I have documents that can be referred to by multiple identifiers (and I want to store the identifiers separate from the main indexed content). I'm wondering if I should put each identifier in its own keyword field, or have one tokenized fi

BUG ? - lucene multisearcher / sorting

2006-11-28 Thread Kai R. Emde
Hello, we have one problem with the sort routine. We use the multisearcher function over several indexes. The result will be sorted by the book number, but the produced list isn't sorted correctly. There are 300 hits from book a, then 150 from book b, 95 hits book 3, but then there are 1,2,3 hits of

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
Suman Ghosh wrote: The search functionality must be available during the index build. Since a relatively small number of documents are being affected (and also we plan to perform the build during a period of time we know to be relatively quiet from last 2 years site access data) during the buil