Re: How to handle more than Integer.MAX_VALUE documents?

2010-11-01 Thread Lance Norskog
2 billion is a hard limit. Usually people split an index into multiple indexes long before reaching this, and use the parallel multi reader (I think) to read from all of the sub-indexes. On Mon, Nov 1, 2010 at 2:16 PM, Zhang, Lisheng wrote: > Hi, > Now lucene uses integer as document id, so it means we ca
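The splitting Lance describes can be modeled in plain Java. This is a simplified sketch, not Lucene's API, and the names `DocBases`, `maxDoc` and `locate` are hypothetical. It shows how a composite reader over several sub-indexes can give each sub-index a starting offset ("doc base") and map a global id back to a (sub-index, local id) pair. Because the global id is still an `int`, the combined size of all sub-indexes must itself stay at or below `Integer.MAX_VALUE`.

```java
/** Hypothetical model of doc-base bookkeeping for a reader composed of sub-indexes. */
final class DocBases {
    // starts[i] = global id of the first doc in sub-index i; starts[n] = total maxDoc.
    private final int[] starts;

    DocBases(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        long total = 0; // accumulate in a long so overflow can be detected
        for (int i = 0; i < maxDocs.length; i++) {
            if (maxDocs[i] < 0) throw new IllegalArgumentException("negative maxDoc at " + i);
            starts[i] = (int) total;
            total += maxDocs[i];
            // The global id space is still an int, so the combined size is capped.
            if (total > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("combined maxDoc exceeds Integer.MAX_VALUE: " + total);
            }
        }
        starts[maxDocs.length] = (int) total;
    }

    int maxDoc() {
        return starts[starts.length - 1];
    }

    /** Returns {subIndex, localDoc} for a global document id. */
    int[] locate(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= maxDoc()) {
            throw new IndexOutOfBoundsException("doc " + globalDoc);
        }
        // Binary search for the last sub-index whose start is <= globalDoc;
        // this skips empty sub-indexes that share the same start.
        int lo = 0, hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // upper middle so lo always advances
            if (starts[mid] <= globalDoc) lo = mid; else hi = mid - 1;
        }
        return new int[] { lo, globalDoc - starts[lo] };
    }
}
```

For example, sub-indexes of sizes 5, 0 and 3 give a total of 8 documents, and global doc 5 maps to local doc 0 of the third sub-index. Sub-indexes of sizes `Integer.MAX_VALUE` and 1 are rejected, which is the same ceiling the original question runs into.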

IndexWriter.close() performance issue

2010-11-01 Thread Mark Kristensson
Hello, One of our Lucene indexes has started misbehaving on indexWriter.close and I'm searching for ideas about what may have happened and how to fix it. Here's our scenario: - We have seven Lucene indexes that contain different sets of data from a web application that are indexed for searching by

Re: Best way to create a Lucene Index with fields that can be updated frequently, and filtering the results by this field.

2010-11-01 Thread Iam Jabour
Thanks, guys! I estimate "frequently" as updating 20% of the 800k documents weekly. If I optimize just once a week, the main cost will be this optimization, correct? Erick, I will use your approach, and if it doesn't work well I'll try option 3. What do you think? Nilesh, your approach looks good,
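For scale, that estimate works out as follows (simple averages; real update traffic is probably burstier):

```latex
0.20 \times 800{,}000 = 160{,}000 \ \text{updates/week}
\approx \frac{160{,}000}{7} \approx 22{,}857 \ \text{updates/day}
\approx \frac{22{,}857}{86{,}400} \approx 0.26 \ \text{updates/s}
```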

How to handle more than Integer.MAX_VALUE documents?

2010-11-01 Thread Zhang, Lisheng
Hi, Lucene currently uses an integer as the document id, so does that mean we cannot have more than 2^31-1 documents within one collection? Even if we use MultiSearcher, the document id is still an integer, so it seems this is still a problem? We have been using Lucene for some time and our document count is growing ra
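One way around a single `int` id space can be sketched in plain Java under stated assumptions: search each sub-index (shard) independently, identify each hit by a (shard, local doc id) pair instead of one global `int`, and merge the per-shard hits by score. This is roughly the idea behind distributed search setups. The `ShardMerge`, `ShardHit` and `mergeTopN` names are hypothetical and not part of Lucene's API (records need Java 16+).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ShardMerge {
    /** Hypothetical hit type: a shard number plus the doc id local to that shard. */
    record ShardHit(int shard, int localDoc, float score) {}

    // Highest score first; ties broken by shard, then local doc, for a stable order.
    private static final Comparator<ShardHit> BY_SCORE_DESC =
        Comparator.comparingDouble((ShardHit h) -> h.score()).reversed()
            .thenComparingInt(ShardHit::shard)
            .thenComparingInt(ShardHit::localDoc);

    /** Collects every shard's hits and returns the global top n by score. */
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int n) {
        List<ShardHit> all = new ArrayList<>();
        for (List<ShardHit> hits : perShard) all.addAll(hits);
        all.sort(BY_SCORE_DESC);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

Each shard can hold up to 2^31-1 documents on its own, and the pair never has to fit into a single `int`. The trade-off is that anything addressed by global doc id (for example, a single MultiSearcher over all shards) no longer applies.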

Re: Best way to create a Lucene Index with fields that can be updated frequently, and filtering the results by this field.

2010-11-01 Thread Erick Erickson
How often is "frequently"? How many updates do you expect to do in a day? And how quickly must those updates be reflected in the search results? 800K documents isn't all that many. I'd go with the simple approach first and monitor the results, #then# go to a more complex solution if you see a prob

Re: MemoryIndex or RAMDirectory, but score using term statistics from a corpus given during preprocessing?

2010-11-01 Thread Joseph Turian
Does this question make sense? What I want is to compute term statistics over a corpus, and then use these statistics when doing scoring + retrieval using a MemoryIndex or RAMDirectory. How can I do that? Thanks, Joseph On Thu, Oct 28, 2010 at 8:06 PM, Joseph Turian wrote: > How do I use
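A minimal sketch of the idea in plain Java, independent of Lucene. It collects document frequencies once from the preprocessing corpus, then scores a new document (the kind you would put in a MemoryIndex) against a query using those fixed corpus statistics instead of the single document's own. The `CorpusStats` class and its methods are hypothetical. The idf follows the classic Lucene `DefaultSimilarity` form 1 + ln(numDocs / (docFreq + 1)), and the score keeps only the sqrt(tf) · idf² core of the classic formula, leaving out norms, coord and boosts. In Lucene itself this would presumably be wired in through a custom `Similarity`, which is not shown here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for term statistics gathered from a preprocessing corpus. */
final class CorpusStats {
    private final Map<String, Integer> docFreq = new HashMap<>();
    private int numDocs;

    /** Records one tokenized corpus document; each distinct term counts once toward docFreq. */
    void addDocument(List<String> tokens) {
        numDocs++;
        for (String term : new HashSet<>(tokens)) docFreq.merge(term, 1, Integer::sum);
    }

    /** Classic Lucene-style idf from corpus counts; assumes at least one corpus document was added. */
    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return 1.0 + Math.log(numDocs / (double) (df + 1));
    }

    /** Scores one document: sum over matching query terms of sqrt(tf) * idf^2, idf taken from the corpus. */
    double score(List<String> docTokens, List<String> queryTerms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : docTokens) tf.merge(t, 1, Integer::sum);
        double total = 0.0;
        for (String q : queryTerms) {
            Integer f = tf.get(q);
            if (f != null) {
                double idf = idf(q);
                total += Math.sqrt(f) * idf * idf;
            }
        }
        return total;
    }
}
```

The design choice is simply that `idf` never looks at the document being scored, so a one-document in-memory index gets the same term weighting as the full corpus would have given it.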

Re: Best way to create a Lucene Index with fields that can be updated frequently, and filtering the results by this field.

2010-11-01 Thread Nilesh Vijaywargiay
Hey Iam, I have worked on approach number 3 recently. It suits our requirement, although it's not the best way to do incremental updates. Please find the details here Nilesh On Mon, Nov 1, 2010 at 12:25 PM, Iam Jabour wrote:

Best way to create a Lucene Index with fields that can be updated frequently, and filtering the results by this field.

2010-11-01 Thread Iam Jabour
Hi, I use Lucene to index and search my documents. Currently I have 800k documents indexed in Lucene. Those documents have some fields: Id: a numeric field to index the documents; Name: a textual field to be stored and analyzed; Description: like Name; Availability: a numeric field to filt

Re: filtering results per field?

2010-11-01 Thread Erick Erickson
I'm not quite following here. You can construct filters on any field you want, and combine them as you choose, then apply the resulting filter to your query. See TermsFilter for instance. Or, your filter could #be# your query. If this is gibberish, could you give an example or two showing what you
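Conceptually, a filter is a set of allowed document ids, so combining filters built on different fields is set algebra. The sketch below models that with `java.util.BitSet` rather than Lucene classes. The `FieldFilters` class, its methods, and the per-field "doc id to value" maps are hypothetical stand-ins. In Lucene you would build each per-field set with something like TermsFilter and apply the combined filter to the query.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of per-field filters as bitsets of allowed doc ids. */
final class FieldFilters {
    /** Docs whose value for one field is among the accepted values (similar in spirit to a terms filter). */
    static BitSet termsFilter(Map<Integer, String> fieldValueByDoc, Set<String> accepted, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<Integer, String> e : fieldValueByDoc.entrySet()) {
            if (accepted.contains(e.getValue())) bits.set(e.getKey());
        }
        return bits;
    }

    /** Intersection: a doc must pass both filters. Inputs are left unchanged. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.and(b);
        return r;
    }

    /** Union: a doc may pass either filter. Inputs are left unchanged. */
    static BitSet or(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.or(b);
        return r;
    }

    /** Keeps only the hits allowed by the (combined) filter, preserving hit order. */
    static List<Integer> apply(List<Integer> hits, BitSet filter) {
        return hits.stream().filter(filter::get).toList();
    }
}
```

For example, with a "color" field (doc 0 red, doc 1 blue, doc 2 red) and a "size" field (docs 0 and 1 small), `and(red, small)` keeps only doc 0, while `or(red, small)` keeps all three.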

filtering results per field?

2010-11-01 Thread Francisco Borges
Hello, I would like to search several fields while applying different Filters to the results of different fields. Is it possible to (efficiently) filter out results according to which fields they are coming from? I've been navigating the code and Javadocs, and haven't found any way to do it. On

Re: Indexing with foreign key

2010-11-01 Thread Paulo Levi
Just self-imposed.