I have 1 bln documents to sort, so that would mean about 8 bln bytes (8 GB) of RAM.
All I have is 8 GB on my machine, so I do not think this approach would work.
Any other options?
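For context, the memory figure above is simple arithmetic: one fixed-size value per document. A minimal sketch, assuming the poster means an 8-byte value (e.g. a long sort key) per document; the per-entry size is an assumption, not stated in the thread:

```java
public class SortMemoryEstimate {
    public static void main(String[] args) {
        long docs = 1_000_000_000L;   // 1 bln documents
        long bytesPerEntry = 8L;      // assumed: one long per document
        long totalBytes = docs * bytesPerEntry;
        // Using decimal GB to match the "8 bln bytes == 8GB" figure above
        System.out.printf("~%.1f GB of heap just for the sort keys%n",
                totalBytes / 1_000_000_000.0);
    }
}
```

With only 8 GB of physical RAM, holding all keys in heap leaves no headroom for the JVM or the OS, which is why the in-memory approach does not fit.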
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, August 19, 2010 7:18
Aida,
Right now it will do two-term collocation only.
Ivan
--- On Mon, 8/23/10, Aida Hota wrote:
> From: Aida Hota
> Subject: Re: Calculate Term Co-occurrence Matrix
> To: java-user@lucene.apache.org
> Date: Monday, August 23, 2010, 1:36 PM
> Hi Ivan, thanks a lot for this. I just
> caught tim
At any given time, you need at least twice as much free disk space as the
total index size, both for the use case you mention and for optimization:
it is possible for an optimize to double the index size right before the
commit.
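The 2x rule above is worth spelling out with numbers. A minimal sketch (the 20 GB index size below is just an illustrative figure, not from the thread):

```java
public class OptimizeHeadroom {
    public static void main(String[] args) {
        long indexSizeGb = 20;  // assumed example index size
        // An optimize rewrites all segments into new files before the
        // old ones are deleted, so the index can transiently occupy ~2x
        // its steady-state size on disk.
        long peakDuringOptimizeGb = indexSizeGb * 2;
        System.out.println("Peak disk usage during optimize: ~"
                + peakDuringOptimizeGb + " GB");
    }
}
```

So a 20 GB index needs roughly 40 GB of disk available to it at the moment the optimize commits.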
You could try to dynamically call expungeDel
> reclamation may take longer ... for segments ... less activity
At the present time, I'm concerned about adding a field to every document in an
existing index. The activity is a delete followed by an add, many times over.
So if my disk capacity is 32GB and my index size is 20GB, there may be plenty of sp
We had a situation where our index size was inflated to roughly double.
It took a couple of months, but the size eventually dropped back down,
so it does seem to get rid of the deleted documents eventually.
That said, going forward expungeDeletes will be called once a day
to better man
On Mon, 2010-08-23 at 11:43 +0200, gag...@graffiti.net wrote:
> Interestingly, the copy is quite fast (around 30s) when there are no
> searches in progress.
I agree with Anshum: this looks very much like IO contention.
However, it might not just be a case of seek-time trouble: we've had
similar