Thanks guys! I can estimate frequently equal to update 20% of 800k weekly. If I just optimize one time a week the main cost will be in this optimization, correctly?
Erick, I will use your approach, and if it don't work wheel I try option 3, what do you think? Nilesh, your approach looks good, but looks complicated too. I will try a simple approach first. ______________ Iam Jabour On Mon, Nov 1, 2010 at 6:54 PM, Erick Erickson <erickerick...@gmail.com> wrote: > How often is "frequently"? How many updates do you expect to do in > a day? And how quickly must those updates be reflected in the search > results? > > 800K documents isn't all that many. I'd go with the simple approach first > and monitor the results, #then# go to a more complex solution if you > see a problem arising. Just update (delete/add) the documents when > the value changes. > > Well, the cost to reindex is just about what the cost to index it orignally > is. The old version of the document is marked deleted and the new one > is added. It's essentially the same cost as to index a new document. > This leaves some gaps in your index, that is the deleted docs are still in > there, but the next optimize will compact them. > > From which you may infer that optimizing is the expensive part. I'd do that, > say > once daily (or even weekly). > > HTH > Erick > > On Mon, Nov 1, 2010 at 3:25 PM, Iam Jabour <iamjab...@gmail.com> wrote: > >> Hi, I use Lucene to index my documents and search. Actually I have >> 800k documents indexed in Lucene. Those documents have some fields: >> >> Id: is a Numeric field to index the documents >> >> Name: is a textual field to be stored and analyzed >> >> Description: like name >> >> Availability: is a numeric field to filter results. This field can be >> updated frequently, every day. >> >> My question is: What's the better way to create a filter for availability? >> >> 1 - add this information to index and make a lucene filter. With this >> approach I have to update document (remove and add, because lucene >> 3.0.2 not have update support) every time the "availability" changes. >> What the cost of reindex? >> >> 2 - don't add this information to index, and filter the results with a >> DB select. This approach will do a lot of selects, because I need >> select every id from database to check availability. >> >> 3 - Create a separated index with id and availability. I don't know if >> it is a good solution, but I can create a index with static >> information and other with information can be frequently updated. I >> think it is better then update all document, just because some fields >> were updated. >> >> ______________ >> Iam Jabour >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org