RE: Sorting a Lucene index

2010-08-24 Thread Shelly_Singh
I have 1 billion documents to sort, so that would mean 8 billion bytes (8 GB of RAM). All I have is 8 GB on my machine, so I do not think this approach would work. Any other options? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, August 19, 2010 7:18
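
The arithmetic in the message above can be checked with a short sketch. It assumes 8 bytes of sort data per document, which is the per-document cost implied by the 8 GB figure. The class and method names are hypothetical and are not part of the Lucene API.

```java
public class SortMemoryEstimate {

    /** Bytes needed to hold one fixed-size sort value per document. */
    static long bytesNeeded(long docCount, int bytesPerDoc) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently
        return Math.multiplyExact(docCount, (long) bytesPerDoc);
    }

    /**
     * True only if the sort values fit with room to spare.
     * The comparison is strict because the JVM and OS need memory too.
     */
    static boolean fitsInRam(long docCount, int bytesPerDoc, long ramBytes) {
        return bytesNeeded(docCount, bytesPerDoc) < ramBytes;
    }

    public static void main(String[] args) {
        long docs = 1_000_000_000L;   // 1 billion documents, from the message
        int bytesPerDoc = 8;          // assumption: one 8-byte value per document
        long ram = 8_000_000_000L;    // 8 GB of RAM, from the message

        System.out.println("bytes needed: " + bytesNeeded(docs, bytesPerDoc));
        System.out.println("fits in RAM:  " + fitsInRam(docs, bytesPerDoc, ram));
    }
}
```

With these inputs the sort data alone would consume all of the machine's RAM, which matches the poster's conclusion that the approach would not work on an 8 GB machine.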

Re: Calculate Term Co-occurrence Matrix

2010-08-24 Thread Ivan Provalov
Aida, Right now it will do two-term collocation only. Ivan --- On Mon, 8/23/10, Aida Hota wrote: > From: Aida Hota > Subject: Re: Calculate Term Co-occurrence Matrix > To: java-user@lucene.apache.org > Date: Monday, August 23, 2010, 1:36 PM > Hi Ivan thanx a lot for this. I just > caught tim

RE: Wanting batch update to avoid high disk usage

2010-08-24 Thread Beard, Brian
At any given time, you need to have at least twice as much disk space available as the total index size, for the use case you mention, but also in the case of optimization. It is possible for an optimize to double the index size right before the commit. You could try to dynamically call expungeDel
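
The rule of thumb in the message above can be sketched as a simple capacity check. This is a minimal sketch that reads "twice as much disk space as the total index size" as a requirement on total disk capacity. That reading is an assumption, and the class and method names are hypothetical, not Lucene API. The example values (32 GB disk, 20 GB index) are the ones quoted in the follow-up message in this thread.

```java
public class DiskHeadroom {

    static final long GB = 1L << 30;

    /** Disk space the rule of thumb asks for: twice the current index size. */
    static long requiredBytes(long indexSizeBytes) {
        return Math.multiplyExact(indexSizeBytes, 2L);
    }

    /** True if the disk is large enough for an optimize that temporarily doubles the index. */
    static boolean hasHeadroom(long diskCapacityBytes, long indexSizeBytes) {
        return diskCapacityBytes >= requiredBytes(indexSizeBytes);
    }

    public static void main(String[] args) {
        long capacity = 32 * GB;   // disk capacity from the follow-up message
        long indexSize = 20 * GB;  // index size from the follow-up message

        System.out.println("required GB:  " + requiredBytes(indexSize) / GB);
        System.out.println("has headroom: " + hasHeadroom(capacity, indexSize));
    }
}
```

Under this reading, a 20 GB index calls for 40 GB of disk, so a 32 GB disk falls short of the rule even though 12 GB is free.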

Re: Wanting batch update to avoid high disk usage

2010-08-24 Thread Justin
> reclamation may take longer ... for segments ... less activity At the present time, I'm concerned about adding a field to every document in an existing index. The activity is delete followed by add many times. So if my disk capacity is 32GB and my index size is 20GB, there may be plenty of sp

RE: Wanting batch update to avoid high disk usage

2010-08-24 Thread Beard, Brian
We had a situation where our index size was inflated to roughly double. It took a couple of months, but the size eventually dropped back down, so it does seem to get rid of the deleted documents eventually. With that said, in the future expungeDeletes will get called once a day to better man

Re: slow search threads during a disk copy

2010-08-24 Thread Toke Eskildsen
On Mon, 2010-08-23 at 11:43 +0200, gag...@graffiti.net wrote: > Interestingly, the copy is quite fast (around 30s) when there are no > searches in progress. I agree with Anshum: This looks very much like IO contention. However, it might not just be a case of seek-time trouble: We've had similar