[jira] [Comment Edited] (LUCENE-5161) review FSDirectory chunking defaults and test the chunking

Uwe Schindler (JIRA) Sat, 10 Aug 2013 05:10:50 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735851#comment-13735851
 ]


Uwe Schindler edited comment on LUCENE-5161 at 8/10/13 12:09 PM:
-----------------------------------------------------------------

Hi,

I will now explain the problems of SimpleFSDirectory and NIOFSDirectory and why 
the OOM occurs:

NIOFSDir uses a FileChannel to read from disk. This is generally a good idea 
because it supports lockless transfers (unfortunately not on Windows). The 
issue here is a limitation in the internal JVM implementation, and the root 
cause is the garbage collector. Native code cannot read from a file descriptor 
and let the results go directly into a Java heap byte[] (e.g. the array behind 
a ByteBuffer.allocate() buffer or a byte[] passed to RandomAccessFile), because 
those are interruptible operations and are not synchronized. It may happen that 
the JDK invokes the kernel read() call and gives it the native pointer of the 
byte[], and suddenly the garbage collector jumps in (in another thread) and 
moves the byte[] to defragment the heap. As the code is running in the kernel, 
nothing can be done to prevent it from writing outside the byte[] once it has 
been moved. Theoretically the JVM could lock the byte[] somehow to prevent the 
GC from moving it, but that is not how it is done.

Because of this problem, FileChannel (and also RandomAccessFile) allocates a 
temporary DirectBuffer if the buffer passed to read or write is a heap 
ByteBuffer (see 
[http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/IOUtil.java#211]
 and 
[http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/Util.java#60]).
 Those direct byte buffers are held through SoftReferences, so they get garbage 
collected once memory gets low. But as you can see from the code, the direct 
buffer is chosen to be at least the size of the requested transfer (if none of 
these buffers is available, a new one is allocated with the transfer size). And 
this is the big problem leading to the OOM. The maximum amount of direct memory 
allocated outside of the Java heap is limited (by default roughly to the 
maximum heap size, unless -XX:MaxDirectMemorySize is set).
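
To illustrate the mechanism described above, here is a minimal sketch of a 
chunked read through a FileChannel into a heap byte[]. The class name 
ChunkedChannelReader and its readFully method are hypothetical and not taken 
from Lucene. The sketch assumes that the JDK sizes its temporary direct buffer 
to the remaining bytes of the heap buffer it is given, so capping the buffer 
limit per call should also cap that temporary allocation.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper (not Lucene code): positional reads with a capped transfer size. */
final class ChunkedChannelReader {

  /** Reads exactly len bytes starting at file position pos into dst[offset, offset + len). */
  static void readFully(FileChannel channel, long pos, byte[] dst, int offset, int len,
                        int chunkSize) throws IOException {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be > 0");
    }
    // Wrap once; position starts at offset and advances with every read.
    ByteBuffer heap = ByteBuffer.wrap(dst, offset, len);
    int total = 0;
    while (total < len) {
      int toRead = Math.min(chunkSize, len - total);
      // Expose at most toRead bytes to the channel for this call.
      heap.limit(offset + total + toRead);
      int n = channel.read(heap, pos + total);
      if (n < 0) {
        throw new EOFException("read past EOF at position " + (pos + total));
      }
      total += n; // short reads are fine, the loop simply continues
    }
  }
}
```

With a chunkSize of a few megabytes, a 500 MB transfer is split into many 
small read calls instead of one huge one.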

The current chunk sizes are horrible: with 2 gigabytes on 64 bit and many 
megabytes on 32 bit, you allocate huge direct buffers outside of the Java heap 
that consume memory and are unlikely to be freed again. So we should really 
limit the maximum size of those chunks to reasonable values. The chunking code 
is working (and is tested), so nothing prevents us from doing that.

E.g., for Windows everything greater than 64 MB is useless (see some references 
for transferTo). The only thing we change with the chunk size is the number of 
syscalls, but for reading 500 MB of index norms it makes no difference whether 
you have 2 syscalls or 500 syscalls; the hard disk is the limiting factor.

For RandomAccessFile, the same is done: it internally allocates direct memory 
(in fact the current JDKs implement RandomAccessFile mostly through NIO using 
ByteBuffer.wrap()).

The above also explains why distinguishing between 32 bit and 64 bit is 
useless. The OOM occurs not because of the bit size, but rather because direct 
memory, like the Java heap, is limited by the underlying JDK. So we should not 
waste all this memory. Also note: to transfer 500 MB you need not only the 
500 MB byte[] on heap as target for the transfer (using ByteBuffer.wrap as we 
do in NIOFSDir), but also 500 MB in direct memory, so the transfer needs 
1 gigabyte in total! This is horribly inefficient.
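
The syscall and memory figures above can be reproduced with a little 
arithmetic. The following is only a back-of-the-envelope sketch: the class 
ChunkMath and its method names are made up for illustration, and it assumes 
(as described above) that the temporary direct buffer is as large as a single 
read request and that no short reads occur.

```java
/** Hypothetical helper for rough estimates; not part of Lucene. */
final class ChunkMath {
  static final long MB = 1024L * 1024L;

  /** Minimum number of read calls needed to transfer totalBytes in chunks of chunkBytes. */
  static long minReadCalls(long totalBytes, long chunkBytes) {
    return (totalBytes + chunkBytes - 1) / chunkBytes; // ceiling division
  }

  /** Heap target array plus one temporary direct buffer sized to a single read request. */
  static long peakBytes(long totalBytes, long chunkBytes) {
    return totalBytes + Math.min(totalBytes, chunkBytes);
  }

  public static void main(String[] args) {
    long total = 500 * MB;
    for (long chunk : new long[] {2048 * MB, 256 * MB, 8 * MB, 1 * MB}) {
      System.out.printf("chunk=%d MB: at least %d read calls, about %d MB heap + direct%n",
          chunk / MB, minReadCalls(total, chunk), peakBytes(total, chunk) / MB);
    }
  }
}
```

Under these assumptions, shrinking the chunk size trades a few more read calls 
for a much smaller temporary direct buffer, while the heap target stays the 
same.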

Also note that NIOFSDir always has to copy the direct buffer to the heap 
buffer, so this is an overhead. It might be a good idea to implement a second, 
*optimized* NIOFSDir that uses DirectBuffers and does not copy everything to 
the heap. For the direct buffer chunks we can use code similar to that in 
ByteBufferIndexInput (which is very effective).
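
As a rough illustration of that idea, the following sketch reads through one 
small, reusable direct buffer and serves values straight out of it, so no 
large heap array and no transfer-sized temporary buffer are needed. The class 
DirectChunkInput is hypothetical; it is neither NIOFSDirectory nor 
ByteBufferIndexInput, and a real implementation would need seeking, cloning 
and bulk reads on top of this.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch: sequential input backed by one reusable direct buffer. */
final class DirectChunkInput {
  private final FileChannel channel;
  private final ByteBuffer chunk; // allocated once, reused for every refill
  private long nextFilePos;       // file position of the next refill

  DirectChunkInput(FileChannel channel, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be > 0");
    }
    this.channel = channel;
    this.chunk = ByteBuffer.allocateDirect(chunkSize);
    this.chunk.limit(0); // start empty so the first read triggers a refill
  }

  /** Reads the next byte directly from the direct buffer. */
  byte readByte() throws IOException {
    while (!chunk.hasRemaining()) {
      refill();
    }
    return chunk.get();
  }

  /** Reads a big-endian int; the four bytes may span two chunks. */
  int readInt() throws IOException {
    return ((readByte() & 0xFF) << 24) | ((readByte() & 0xFF) << 16)
        | ((readByte() & 0xFF) << 8) | (readByte() & 0xFF);
  }

  /** Loads the next chunk; the channel can fill a direct buffer without an extra copy. */
  private void refill() throws IOException {
    chunk.clear();
    int n = channel.read(chunk, nextFilePos);
    if (n < 0) {
      chunk.limit(0); // keep the buffer empty after EOF
      throw new EOFException("read past EOF at position " + nextFilePos);
    }
    nextFilePos += n;
    chunk.flip();
  }
}
```

The direct memory used here is bounded by chunkSize, independent of how much 
data is read in total.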

I would default the chunk size in NIOFSDir to something between 1 and 32 
megabytes, e.g. 2 megabytes on 32 bit and 8 megabytes on 64 bit. The current 
chunk sizes are definitely way too large and waste physical memory we could use 
for something else!

Maybe [~mikemccand] can do some perf tests with NIOFSDir with radically lowered 
buffer sizes. I think it will not show any difference!
                
                  
> review FSDirectory chunking defaults and test the chunking
> ----------------------------------------------------------
>
>                 Key: LUCENE-5161
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5161
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-5161.patch
>
>
> Today there is a loop in SimpleFS/NIOFS:
> {code}
> try {
>           do {
>             final int readLength;
>             if (total + chunkSize > len) {
>               readLength = len - total;
>             } else {
>               // LUCENE-1566 - work around JVM Bug by breaking very large reads into chunks
>               readLength = chunkSize;
>             }
>             final int i = file.read(b, offset + total, readLength);
>             total += i;
>           } while (total < len);
>         } catch (OutOfMemoryError e) {
> {code}
> I bet if you look at the clover report it's untested, because it's fixed at 
> 100MB for 32-bit users and 2GB for 64-bit users (are these defaults even 
> good?!).
> Also, if you call the setter on a 64-bit machine to change the size, it just 
> totally ignores it. We should remove that; the setter should always work.
> And we should set it to small values in tests so this loop is actually 
> executed.

