RE: RAMDirectory unexpectedly slows

2012-06-04 Thread Uwe Schindler
This is managed by your operating system. In general, OS kernels such as Linux or Windows use all free memory to cache disk accesses.

Re: Deferring merging of index segments

2012-06-04 Thread Michael McCandless
Awesome, thanks for bringing closure, Vitaly.

Re: Deferring merging of index segments

2012-06-04 Thread Vitaly Funstein
Thanks for the tip, Mike. After replacing the three calls IndexWriter.commit(), IndexWriter.maybeMerge() and IndexWriter.waitForMerges() with a single call to IndexWriter.close(true), the disk size and run time are now very close to the case of parallel segment merges.

Re: forcing an IndexWriter to close

2012-06-04 Thread Geoff Cooney
Hi Shai, writer.rollback() looks like exactly what I need. Not sure how I overlooked that. Thanks for the help!
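The close-then-rollback approach can be sketched as a small pattern: attempt a normal close(), and if it throws an IOException, call rollback() so the write lock is not left behind. To keep the sketch runnable without Lucene on the classpath, SketchWriter below is a hypothetical stand-in for IndexWriter whose close() can be told to fail; only the control flow carries over. Whether rollback() releases the lock in every failure mode is worth confirming in the IndexWriter javadoc for the Lucene version in use.

```java
import java.io.IOException;

public class CloseOrRollback {

    /** Hypothetical stand-in for an index writer that holds a write lock. */
    static class SketchWriter {
        private final boolean failOnClose;
        boolean lockHeld = true;

        SketchWriter(boolean failOnClose) {
            this.failOnClose = failOnClose;
        }

        /** Flushes and commits; on failure the lock is deliberately kept, as in the reported issue. */
        void close() throws IOException {
            if (failOnClose) {
                throw new IOException("simulated failure while flushing");
            }
            lockHeld = false;
        }

        /** Discards uncommitted changes and releases the lock. */
        void rollback() {
            lockHeld = false;
        }
    }

    /** Tries a normal close and falls back to rollback so the lock is not leaked. Returns true if close succeeded. */
    static boolean closeOrRollback(SketchWriter writer) {
        try {
            writer.close();
            return true;
        } catch (IOException e) {
            System.err.println("close failed (" + e.getMessage() + "), rolling back");
            writer.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        SketchWriter healthy = new SketchWriter(false);
        SketchWriter failing = new SketchWriter(true);
        System.out.println("healthy closed=" + closeOrRollback(healthy) + " lockHeld=" + healthy.lockHeld);
        System.out.println("failing closed=" + closeOrRollback(failing) + " lockHeld=" + failing.lockHeld);
    }
}
```

The trade-off is that rollback() gives up changes made since the last successful commit in exchange for a released lock.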

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Cheng
Can I control the size of RAM given to either MMapDirectory or ByteBufferDirectory?

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Uwe Schindler
Hi, if you are using MMapDirectory or this ByteBufferDirectory (which is similar to the former), the RAM used is outside the JVM heap; it is in the FS cache of the OS kernel. Giving too much memory to the JVM penalizes the OS cache, so give only as much as the app needs. Lucene and the OS kernel will
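The MMapDirectory point can be illustrated with plain java.nio, without Lucene: a file mapped with FileChannel.map is read through pages that the OS kernel caches, and those bytes are not allocated on the Java heap, so a larger -Xmx does not give the mapping more memory. MmapSketch and countNewlines are illustrative names, not Lucene APIs.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {

    /** Counts newline bytes by scanning a read-only memory mapping of the file. */
    static long countNewlines(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapped pages are served from the OS page cache, not from the Java heap.
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long count = 0;
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".txt");
        tmp.toFile().deleteOnExit(); // best-effort cleanup
        Files.write(tmp, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("newlines: " + countNewlines(tmp));
    }
}
```

How much of the file stays resident is decided by the OS page cache, which matches the answer that this memory is managed by the operating system rather than by a Lucene setting.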

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Jason Rutherglen
> What about the ByteBufferDirectory? Can this specific directory utilize the 2GB memory I grant to the app?
BBD places the byte objects outside of the heap, so increasing the heap size is only going to rob the system IO cache. With Lucene, the heap is only used for field caches and the terms di
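What "outside of the heap" means can be seen with the JDK alone: ByteBuffer.allocateDirect reserves memory outside the Java heap, while ByteBuffer.allocate is backed by an ordinary byte[] on the heap. This only illustrates the distinction Jason draws; it is not the ByteBufferDirectory implementation, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class DirectBufferSketch {

    /** Allocates a buffer whose backing memory lives outside the Java heap. */
    static ByteBuffer offHeap(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    /** Allocates an ordinary buffer backed by a byte[] on the Java heap. */
    static ByteBuffer onHeap(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    public static void main(String[] args) {
        ByteBuffer direct = offHeap(1 << 20); // 1 MiB off-heap
        ByteBuffer heap = onHeap(1 << 20);    // 1 MiB on-heap
        System.out.println("direct: isDirect=" + direct.isDirect() + ", hasArray=" + direct.hasArray());
        System.out.println("heap:   isDirect=" + heap.isDirect() + ", hasArray=" + heap.hasArray());
    }
}
```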

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Cheng
Please shed more light on the difference between the JVM heap size and the memory used by Lucene. What I am getting at is that no matter how much RAM I give my apps, Lucene can't utilize it. Is that right? What about the ByteBufferDirectory? Can this specific directory utilize the 2GB me

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Jason Rutherglen
If you want the index to be stored completely in RAM, there is the ByteBuffer directory [1]. That said, I do not see the point in putting an index in RAM: it will be cached in RAM regardless, in the OS IO cache. 1. https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/ap

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Cheng
My indexes are 500 MB+, so it seems RAMDirectory is not suited to indexes that big. My challenge, on the other hand, is that I need to update the indexes very frequently. So, do you think MMapDirectory is the solution? Thanks.

Re: RAMDirectory unexpectedly slows

2012-06-04 Thread Jack Krupansky
From the javadoc for RAMDirectory: "Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimi
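To put the javadoc warning next to the 500 MB+ indexes mentioned in this thread, a quick count of how many byte[1024] arrays an index of a given size needs. This assumes MB = 2^20 bytes and ignores the partially filled last buffer of each file and per-object overhead; the class name is illustrative.

```java
public class RamDirectoryBuffers {

    /** Buffer size quoted in the RAMDirectory javadoc. */
    static final int BUFFER_SIZE = 1024;

    /** Number of byte[1024] arrays needed to hold the given number of bytes, rounded up. */
    static long bufferCount(long indexBytes) {
        return (indexBytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("500 MB -> " + bufferCount(500 * mb) + " arrays");  // 512000
        System.out.println("2 GB   -> " + bufferCount(2048 * mb) + " arrays"); // 2097152
    }
}
```

A 500 MB index already means roughly half a million small heap objects, and a 2 GB one over two million, each of which the garbage collector has to trace; that is a plausible reading of where the wasted GC cycles in the warning come from.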

Re: forcing an IndexWriter to close

2012-06-04 Thread Shai Erera
Hi, you have several ways to do it: 1) Use NativeFSLockFactory, which obtains native locks that are released automatically when the process dies, as well as after a successful IndexWriter.close(). If your writer.close() is called just before the process terminates, then this might be a good soluti
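Option 1 works because native locks belong to the OS process, and the OS drops them when the process exits. The JDK exposes this kind of lock through FileChannel.tryLock. The sketch below is not Lucene's NativeFSLockFactory code; it only shows, with an illustrative write.lock file, that a second attempt is refused while the first lock is held and succeeds once it is released. Within a single JVM an overlapping request throws OverlappingFileLockException instead of returning null, so the helper maps both cases to null.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NativeLockSketch {

    /** Tries to take an exclusive OS-level lock; returns null if it is already held. */
    static FileLock tryAcquire(FileChannel channel) throws IOException {
        try {
            return channel.tryLock(); // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            return null; // this JVM already holds an overlapping lock
        }
    }

    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempDirectory("lock-sketch").resolve("write.lock");
        try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = tryAcquire(first);
            System.out.println("first acquired: " + (held != null));
            System.out.println("second acquired while held: " + (tryAcquire(second) != null));
            if (held != null) {
                held.release(); // what a successful close() would do
            }
            FileLock again = tryAcquire(second);
            System.out.println("second acquired after release: " + (again != null));
            if (again != null) {
                again.release();
            }
        }
    }
}
```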

forcing an IndexWriter to close

2012-06-04 Thread Geoff Cooney
Hi, is there a safe way to forcefully close an IndexWriter that is unable to flush to disk? We're seeing occasional issues where an IndexWriter encounters an IOException on close and does not release the write lock. The IndexWriter documentation lists this as desired behavior so that clients can

Re: pruning package- pruneAllPositions

2012-06-04 Thread Zeynep P.
Hi, thanks for your fix. I used it, but I think something is wrong with it: I am using the LATimes collection, and with epsilon = 0.1 and k = 10 I got a 97% pruned index, meaning only 3% of the index was left after pruning. In the original paper, "Static index pruning for IR systems
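For readers following the epsilon/k discussion, a sketch of the per-term top-k rule commonly attributed to the Carmel et al. paper: for each term, take z, the k-th highest posting score, and drop every posting whose score is below epsilon * z; terms with k or fewer postings are kept whole. The scores are assumed to be precomputed (the paper's scoring function is not reproduced), and this is not the pruning package's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TopKTermPruning {

    /**
     * Returns the scores that survive pruning for one term's posting list.
     * z = k-th highest score; postings scoring below epsilon * z are dropped.
     * Lists with k or fewer postings are kept whole.
     */
    static List<Double> prunePostings(double[] scores, int k, double epsilon) {
        List<Double> kept = new ArrayList<>();
        if (scores.length <= k) {
            for (double s : scores) {
                kept.add(s);
            }
            return kept;
        }
        double[] sorted = scores.clone();
        Arrays.sort(sorted);                  // ascending order
        double z = sorted[sorted.length - k]; // k-th highest score
        double threshold = epsilon * z;
        for (double s : scores) {             // preserve original posting order
            if (s >= threshold) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] scores = {9.0, 0.5, 4.0, 0.05, 2.0, 0.1};
        // k = 3 gives z = 2.0, so postings below 0.2 are dropped.
        System.out.println(prunePostings(scores, 3, 0.1));
    }
}
```

With epsilon = 0.1 and k = 10, a term keeps every posting that scores at least a tenth of its 10th-best score. How much of a collection that removes depends on how skewed the per-term score distributions are, so the sketch alone cannot say whether a 97% reduction indicates a bug.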

RAMDirectory with FSDirectory merging Versus large mergeFactor and RAMBufferSizeMB

2012-06-04 Thread Maxim Terletsky
Hi guys, there are two approaches I see in Lucene in Action for speeding up the indexing process: 1) simply increase mergeFactor and RAMBufferSizeMB; 2) use RAMDirectory as a buffer (perhaps even several in parallel) and later merge it into an FSDirectory using addIndexes. So my question