I wonder if you might not get better performance in a case like this if you
were OK with taking your index offline, disabling merges, performing the
deletions, and only then re-enabling merges. This could be done on a copy of
the index if updates can be turned off or held in a queue, so that queries
could still be served in the meantime.
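A minimal sketch of that two-pass idea, assuming a recent Lucene (the index
path and the "expired" field are hypothetical): apply the deletes with
merging disabled, then reopen with the default merge policy so merges can
reclaim the space.

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.NoMergePolicy;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;

    public class OfflinePurge {
      public static void main(String[] args) throws Exception {
        // Pass 1: apply all deletes with merging disabled, so no merge
        // I/O competes with the delete pass itself.
        IndexWriterConfig noMerges = new IndexWriterConfig(new StandardAnalyzer())
            .setMergePolicy(NoMergePolicy.INSTANCE);
        try (IndexWriter w = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index-copy")), noMerges)) {
          w.deleteDocuments(new Term("expired", "true")); // hypothetical field
          w.commit();
        }

        // Pass 2: reopen with the default TieredMergePolicy and let
        // merging expunge the deleted documents.
        try (IndexWriter w = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index-copy")),
            new IndexWriterConfig(new StandardAnalyzer()))) {
          w.forceMergeDeletes();
          w.commit();
        }
      }
    }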
Thanks so much. I actually found that my purging routine finished after
about 35 minutes, which is really acceptable given that this routine is
supposed to run during the overnight period.
On Feb 28, 2018 8:34 PM, "Adrien Grand" wrote:
> Thanks. Deleting lots of documents can indeed trigger a lot of work on
> the Lucene side. First, Lucene likely needs to rewrite the live docs of
> all your segments, and then this may trigger significant merging
> activity, because Lucene tries to keep the number of deleted docs
> reasonable.
I call deleteDocuments.
On Feb 28, 2018 8:16 PM, "Adrien Grand" wrote:
> What do you mean by purging? What methods do you call?
I have a huge Lucene index. On disk it's about 24 GB.

I have a purging routine that is supposed to run and purge old docs.

There are about 650 million docs in there, and through testing I have
determined that about 1/3 of these need to be purged.

During the purge, every so often it's apparent…
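A purge like this is usually expressed as a delete-by-query. A minimal
sketch, assuming the documents carry an indexed numeric "timestamp" field
(a hypothetical name; any point field with a range query works):

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.FSDirectory;

    public class PurgeOldDocs {
      public static void main(String[] args) throws Exception {
        // Cutoff: everything older than 90 days (made-up retention).
        long cutoff = System.currentTimeMillis() - 90L * 24 * 3600 * 1000;
        Query old = LongPoint.newRangeQuery("timestamp", Long.MIN_VALUE, cutoff);
        try (IndexWriter w = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index")),
            new IndexWriterConfig(new StandardAnalyzer()))) {
          w.deleteDocuments(old); // only marks docs as deleted
          w.commit();
        }
      }
    }

The deletes only mark documents; the disk space comes back as merges
rewrite the affected segments, which is the merging activity Adrien
describes above.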
I have an application with frequent document updates and it uses an NRT
reader. I also have a background thread which calls commit on the
IndexWriter at a regular interval. There is one instance of IndexWriter,
which is created once and never closed.

The problem is that Lucene is creating a lot of files and a huge index
which is much larger than the size of the documents. This size is not
reduced despite the Lucene merges running in the background. (I am using
the default TieredMergePolicy and ConcurrentMergeScheduler.)

I also tried running a background thread…
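For reference, a minimal sketch of the setup being described, against a
recent Lucene API (the path and the commit interval are made up): one
long-lived IndexWriter with the default merge policy and scheduler, a
SearcherManager for NRT reads, and a scheduled commit thread.

    import java.nio.file.Paths;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.store.FSDirectory;

    public class NrtSetup {
      public static void main(String[] args) throws Exception {
        // One long-lived writer; TieredMergePolicy and
        // ConcurrentMergeScheduler are the defaults in IndexWriterConfig.
        IndexWriter writer = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index")),
            new IndexWriterConfig(new StandardAnalyzer()));

        // NRT reads: refreshes see uncommitted changes cheaply.
        SearcherManager searchers = new SearcherManager(writer, null);

        // Periodic commit for durability; also lets Lucene delete
        // files that are no longer referenced by any commit point.
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
          try {
            writer.commit();
            searchers.maybeRefresh();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }, 30, 30, TimeUnit.SECONDS);
      }
    }

One common cause of the symptom described: files of merged-away or deleted
segments can only be removed once no open reader references them, so an
NRT reader that is never refreshed (or searchers that are never released
back to the manager) will pin old files and the on-disk size keeps growing.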
To: java-user@lucene.apache.org
Subject: RE: howto run CheckIndex on huge index size

I hope the problem is fixed now; this mail is just to check! It was hard
to unsubscribe because of the strange eMail. Have no idea at all...

Uwe
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, August 15, 2012 3:13 PM
To: java-user@lucene.apache.org
Subject: RE: howto run CheckIndex on huge index size

I got it, too. As a moderator of this list, I will look into finding the
root cause and forcefully…
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Wednesday, August 15, 2012 3:04 PM
To: java-user@lucene.apache.org
Subject: Re: howto run CheckIndex on huge index size

I guess that ulimit could be a default setting of XenServer when it was
first set up.
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Wednesday, August 15, 2012 2:07 PM
To: java-user@lucene.apache.org
Subject: Re: howto run CheckIndex on huge index size

Hi Uwe,

index size is:
-rw-r--r-- 1 solr users  82G 15. Aug 07:50 _2rhe.fdt
-rw-r--r-- 1 solr users 303M 15. Aug 07:50 _2rhe.fdx
-rw-r--r-- 1 solr users 1,2k 15. Aug 07:36 _2rhe.fnm
-rw-r--r-- 1 solr users  39G …
From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
Sent: Wednesday, August 15, 2012 1:25 PM
To: java-user@lucene.apache.org
Subject: howto run CheckIndex on huge index size

I'm trying to run CheckIndex as a separate tool on a large index to get
nice infos about number of terms, number of tokens, ... but always get an
OOM exception.
Already have JAVA_OPTS -d64 -Xmx25g -Xms25g -Xmn6g
Any idea how to use CheckIndex on huge index size?

Opening index @ /srv/www…
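CheckIndex can also be driven programmatically, which makes it easy to
control the JVM settings it runs under (on the command line it is invoked
as: java -cp lucene-core.jar org.apache.lucene.index.CheckIndex <indexDir>).
A minimal sketch against a recent Lucene API; the path is hypothetical:

    import java.nio.file.Paths;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.FSDirectory;

    public class CheckBigIndex {
      public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get("/srv/index"));
             CheckIndex checker = new CheckIndex(dir)) {
          checker.setInfoStream(System.out); // per-segment term/token stats
          CheckIndex.Status status = checker.checkIndex();
          System.out.println(status.clean ? "Index is clean"
                                          : "Index has problems");
        }
      }
    }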
Subject: Re: How to avoid huge index files

Is it possible to upload an already existing index to GAE? My index is
data I have been collecting for a long time, and I prefer not to give it
up.
Another alternative is storing the indexes in the Google Datastore; I
think Compass already supports that (though I have not used it).

Also, I have successfully run Lucene on GAE using GaeVFS
(http://code.google.com/p/gaevfs/) to store the index in the Datastore.
(I developed a Lucene Directory implementation…)
I'm intending to deploy the application in Google App Engine
(http://code.google.com/appengine/), which limits file lengths to be
smaller than 10MB. I've read about the various policies supported by
Lucene to limit the file sizes, but no matter which policy I used and
which parameters, the index files still grew to a lot more than 10MB.
Looking at the code, I've managed to limit the cfs files (predicting the
file size in CompoundFileWriter before closing the file) - I guess that
will degrade performance, but it's OK for now. But now the FDT files are
becoming huge (about 60MB) and I can't identify a way to limit those
files.

Is there some built-in and correct way to limit these files' length? If
not, can someone please direct me on how I should tweak the source code to
achieve that?

Thanks for any help.
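There is no hard per-file size cap in stock Lucene, but the merge policy
can keep segments (and thus their .fdt stored-fields files) small. A
hedged sketch against a recent API; in the 2.9-era Lucene of this thread
the closest knob is LogByteSizeMergePolicy.setMaxMergeMB, and the 8 MB /
4 MB numbers below are made up for the GAE limit:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.TieredMergePolicy;
    import org.apache.lucene.store.FSDirectory;

    public class SmallSegments {
      public static void main(String[] args) throws Exception {
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setMaxMergedSegmentMB(8);   // stop merging segments past ~8 MB
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
            .setMergePolicy(mp)
            .setRAMBufferSizeMB(4);    // flush small segments to begin with
        cfg.setUseCompoundFile(true);  // one .cfs per segment, fewer files
        try (IndexWriter w = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index")), cfg)) {
          // index documents here...
        }
      }
    }

Note these settings only bound sizes approximately; a single very large
stored document can still push one segment's .fdt past the target.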
To: java-user@lucene.apache.org
Sent: Thursday, January 11, 2007 12:36:52 PM
Subject: RE: Huge Index

Given the RAM directory inside of the IndexWriter in 2.0, is there still
any reason to stage this manually?
To: java-user@lucene.apache.org
Sent: Thursday, January 11, 2007 12:16:51 PM
Subject: RE: Huge Index

I never got to index all the data, but it is too slow. I got 3 million
documents in 2.5 hours.
As suggested in Lucene in Action, I use a ramDir, and after I write 5000
documents I merge them to the fsDir.
The merge factor is now 100; I tried…
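For readers following along, a minimal sketch of that Lucene in Action
staging pattern in modern terms (the document source is a stand-in, and
RAMDirectory is deprecated in recent Lucene, so ByteBuffersDirectory
plays its role):

    import java.nio.file.Paths;
    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class StagedIndexing {
      static final int BATCH = 5000;

      public static void main(String[] args) throws Exception {
        try (IndexWriter fsWriter = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index")),
            new IndexWriterConfig(new StandardAnalyzer()))) {
          Directory ramDir = new ByteBuffersDirectory();
          IndexWriter ramWriter = newRamWriter(ramDir);
          int pending = 0;
          for (Document doc : loadDocuments()) {  // hypothetical source
            ramWriter.addDocument(doc);
            if (++pending == BATCH) {
              ramWriter.close();                  // source writer must be closed
              fsWriter.addIndexes(ramDir);        // fold the RAM batch into FS
              ramDir = new ByteBuffersDirectory();
              ramWriter = newRamWriter(ramDir);
              pending = 0;
            }
          }
          ramWriter.close();
          if (pending > 0) fsWriter.addIndexes(ramDir);
        }
      }

      static IndexWriter newRamWriter(Directory dir) throws Exception {
        return new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
      }

      static List<Document> loadDocuments() {
        return List.of(); // stand-in for the real document source
      }
    }

As the previous message notes, though, IndexWriter already buffers added
documents in RAM internally (controlled in later versions via
IndexWriterConfig.setRAMBufferSizeMB), so this manual staging rarely pays
off anymore.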
-----Original Message-----
From: Ruslan Sivak [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 11, 2007 15:12
To: java-user@lucene.apache.org
Subject: Re: Huge Index

Alice,

If you have a computer that crashes once you put a lot of load on it, I'd
say you have bigger problems than…
Hi Alice,

Can you define slow (hours, days, months, and on what hardware)? Have you
done any profiling, etc. to see where the bottlenecks are? What size
documents are you talking about here? What are your merge factors, etc.?

Thanks,
Grant

On Jan 11, 2007, at 10:47 AM, Alice wrote:
Unfortunately I can't use multiple machines.

And I cannot start lots of threads because the server crashes.

-----Original Message-----
From: Russ [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 11, 2007 14:33
To: java-user@lucene.apache.org
Subject: Re: Huge Index
Can you use multiple threads/machines to index the data into separate
indexes, and then combine them?

Russ

Sent wirelessly via BlackBerry from T-Mobile.

-----Original Message-----
From: "Alice" <[EMAIL PROTECTED]>
Date: Thu, 11 Jan 2007 13:47:36
To: java-user@lucene.apache.org
Subject: Huge Index

Hello…
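A closing note for modern readers: combining per-thread indexes with
addIndexes still works, but IndexWriter is thread-safe, so the simpler
pattern today is several feeder threads sharing one writer. A minimal
sketch (the four-way split and the "id" field are made up):

    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class ParallelIndexing {
      public static void main(String[] args) throws Exception {
        try (IndexWriter writer = new IndexWriter(
            FSDirectory.open(Paths.get("/path/to/index")),
            new IndexWriterConfig(new StandardAnalyzer()))) {
          ExecutorService pool = Executors.newFixedThreadPool(4);
          for (int t = 0; t < 4; t++) {
            final int shard = t;
            pool.submit(() -> {
              try {
                for (int i = shard; i < 1_000_000; i += 4) {
                  Document doc = new Document();
                  doc.add(new StringField("id", Integer.toString(i),
                                          Field.Store.YES));
                  writer.addDocument(doc); // writer handles concurrency
                }
              } catch (Exception e) {
                throw new RuntimeException(e);
              }
            });
          }
          pool.shutdown();
          pool.awaitTermination(1, TimeUnit.HOURS);
          writer.commit();
        }
      }
    }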