Can RAMDirectory work for gigabyte data which needs refreshing of the index all the time?

2014-05-14 Thread Cheng
Hi, I have an index of multiple gigabytes which serves 5-10 threads and needs refreshing very often. I wonder if RAMDirectory is a good candidate for this purpose. If not, what kind of directory is better? Thanks, Cheng

Merger performance degradation on 3.6.1

2014-05-14 Thread danielv
Hi, We have an index of about 550M records (~800GB) and we merge thousands of mini indexes once a week using Hadoop, with 45 mappers on 2 Hadoop nodes. After upgrading to Lucene 3.6.1 we noticed that the merge process was continuously slowing down. After we tested a couple of options, it looks like we found the so

AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
Not closing an IndexReader most probably (to say the least) results in a memory leak -> OOM. > But if I close it, given that it is shared by multiple threads, I will need to check each time before doing the search if IndexReader is still open, correct? You can make use of IndexReader#incRef/#decRef ,

Re: ConcurrentModificationException in ICU analyzer

2014-05-14 Thread Robert Muir
I opened https://issues.apache.org/jira/browse/LUCENE-5671 for now. If you are able to use the latest release of ICU, it should prevent the bug. On Wed, May 14, 2014 at 11:47 AM, Robert Muir wrote: > fyi: this bug was already found and fixed in ICU's trunk: > http://bugs.icu-project.org/trac/tic

AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
> But if I close it, given that it is shared by multiple threads, I will need to check each time before doing the search if IndexReader is still open, correct? You can make use of IndexReader#incRef/#decRef, i.e. ir.incRef(); try { Or maybe SearcherManager http://blog.mikemccandless.com/2011/09
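The excerpt above is cut off before the body of the try block. As a rough illustration of the acquire/use/release pattern it points at, here is a minimal, stdlib-only sketch. SharedReader and RefCountDemo are hypothetical stand-ins written for this example; they are not Lucene's IndexReader and only mimic the incRef/decRef contract, under the assumption that the creator holds one reference and the last release closes the resource.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared, reference-counted reader.
// NOT Lucene's IndexReader; it only mimics the incRef/decRef contract.
final class SharedReader {
    // The creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** Try to acquire a reference; fails once the count has dropped to zero. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Release a reference; the last release closes the underlying resource. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // real code would release files or buffers here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called too many times");
        }
    }

    boolean isClosed() {
        return closed;
    }

    int getRefCount() {
        return refCount.get();
    }

    String search(String query) {
        if (closed) {
            throw new IllegalStateException("reader is closed");
        }
        return "results for " + query;
    }
}

class RefCountDemo {
    /** Search thread: acquire, use, and always release in finally. */
    static String searchSafely(SharedReader reader, String query) {
        if (!reader.tryIncRef()) {
            return null; // reader already closed; caller should fetch the current one
        }
        try {
            return reader.search(query);
        } finally {
            reader.decRef();
        }
    }

    public static void main(String[] args) {
        SharedReader reader = new SharedReader();
        System.out.println(searchSafely(reader, "lucene"));
        reader.decRef(); // owner drops its reference, e.g. after reopening the index
        System.out.println(reader.isClosed());
        System.out.println(searchSafely(reader, "lucene"));
    }
}
```

With this shape, a search thread never has to ask "is the reader still open?" separately: acquiring the reference either succeeds, in which case the reader stays open until the matching decRef, or fails, in which case the thread should obtain the current reader instead.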

Re: ConcurrentModificationException in ICU analyzer

2014-05-14 Thread Robert Muir
fyi: this bug was already found and fixed in ICU's trunk: http://bugs.icu-project.org/trac/ticket/10767 On Wed, May 14, 2014 at 4:32 AM, Robert Muir wrote: > This looks like a bug in ICU? I'll try to reproduce it. We are also a > little out of date, maybe they've already fixed it. > > Thank you

Re: best choice for ramBufferSizeMB

2014-05-14 Thread Michael McCandless
Generally larger is better, as long as the JVM's heap is big enough to allow IW to use that RAM. Large flushed segments mean less merging later ... Mike McCandless http://blog.mikemccandless.com On Wed, May 14, 2014 at 9:33 AM, Gudrun Siedersleben wrote: > Hi all, > > we want to speed up buildin

Re: best choice for ramBufferSizeMB

2014-05-14 Thread Shai Erera
Well, first make sure that you set ramBufferSizeMB to well below the max Java heap size, otherwise you could run into OOMs. While a larger RAM buffer may speed up indexing (since it flushes less often to disk), it's not the only factor that affects indexing speed. For instance, if a big portion o
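To make the "well below the max Java heap size" advice concrete, here is a small stdlib-only sketch that caps a requested buffer size at a fraction of the JVM's maximum heap. The helper name clampRamBufferMB and the 25% fraction are assumptions chosen for illustration, not values recommended in the thread; in a Lucene application the result would presumably be handed to IndexWriterConfig.setRAMBufferSizeMB.

```java
class RamBufferSizing {
    /**
     * Clamp a requested RAM buffer size (in MB) to a fraction of the max heap.
     * The fraction is an illustrative assumption, not a Lucene recommendation.
     */
    static double clampRamBufferMB(double requestedMB, long maxHeapBytes, double maxHeapFraction) {
        if (requestedMB <= 0 || maxHeapBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (maxHeapFraction <= 0 || maxHeapFraction > 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        double maxHeapMB = maxHeapBytes / (1024.0 * 1024.0);
        return Math.min(requestedMB, maxHeapMB * maxHeapFraction);
    }

    public static void main(String[] args) {
        // Max heap as seen by this JVM (controlled by -Xmx).
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        double bufferMB = clampRamBufferMB(512.0, maxHeapBytes, 0.25);
        System.out.printf("max heap: %.0f MB, RAM buffer: %.0f MB%n",
                maxHeapBytes / (1024.0 * 1024.0), bufferMB);
    }
}
```

The right fraction depends on what else lives in the heap (searchers, caches, application data), so treat the number as a starting point to measure against rather than a rule.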

Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Liviu Matei
Hi, I am encountering the following issue with Lucene 3.6.1. If you could let me know whether I am doing something wrong, it would be great. In order to improve the performance of the application that I am working on, I took the approach of reusing the IndexReader and

Re: Lucene: Index Writer to write in multiple file instead make one heavy file

2014-05-14 Thread Yogesh patel
Thanks for the reply! Can you please provide sample code for it? I got the idea but I don't know how to implement it. Thanks On Tue, May 13, 2014 at 7:02 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > You can tell the MergePolicy to limit the maximum size of segments it >

Re: writer.updateDocument() not working (possible bug?)

2014-05-14 Thread Michael McCandless
How did you produce the document that you are sending to updateDocument? Are you loading it from IndexReader.document() or IndexSearcher.doc(), changing it, then passing that to IW.updateDocument? If so, that's probably your bug: a loaded document is not identical to the original Document you ind