All,

Running HBase 0.98.12 on Hadoop 2.6.0 and Java 7.
TL;DR: If your RS is crashing for no apparent reason, check the stderr/stdout log, and if you have wide rows, upgrade to 0.98.13, OR set a batch size AND a max result size on your scans. I'm reporting this here because the failure was tough to track down, and there appear to be active changes still happening around scan batch and result size (I like the direction it is heading; most of these settings could be considered "transport layer" concerns, or just hints to stop a scan early on the server side).

Long story: We have a daily backup that runs full scans over our table. The scan sets a batch size but no max result size. The RS started crashing on this backup job every day. It took a while to track down, but it turned out to be a large row with many columns causing the failure. In our case, when the backup scan hits a row that is over 2GB (yes, this is excessive, and we'll be working to address that as well), the RS dies with nothing in the RS log indicating why (i.e., no OOME exception there). Every daily backup kills a region server when it hits that row. It actually results in two crashes, because the region is picked up again (sometimes by the same recovering region server), the scan continues, and it kills whichever RS picked up the region. After that the scan times out and fails out to the caller.

There was nothing in the RS log, but I checked the stdout/stderr logs and found "OOME: Requested array size exceeds VM limit" there, with no stack trace. Note this isn't a normal heap-space OOM, but the more unusual "Requested array size exceeds VM limit" variety. After researching a bit: Java 7 changed how array limits work. In short, the maximum array size the JVM will allow is Integer.MAX_VALUE - 2; anything over that throws the OOM exception shown. See:

https://bugs.openjdk.java.net/browse/JDK-8059914
https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit

We changed -XX:OnOutOfMemoryError to do a kill -3, and found the following in the thread dump of every failure:

java.lang.Thread.State: RUNNABLE
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.hadoop.hbase.io.ByteBufferOutputStream.allocate(ByteBufferOutputStream.java:69)
    at org.apache.hadoop.hbase.io.ByteBufferOutputStream.checkSizeAndGrow(ByteBufferOutputStream.java:88)
    at org.apache.hadoop.hbase.io.ByteBufferOutputStream.write(ByteBufferOutputStream.java:126)
    at org.apache.hadoop.hbase.KeyValue.oswrite(KeyValue.java:2938)
    at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueEncoder.write(KeyValueCodec.java:60)
    at org.apache.hadoop.hbase.ipc.IPCUtil.buildCellBlock(IPCUtil.java:147)
    at org.apache.hadoop.hbase.ipc.RpcServer$Call.setResponse(RpcServer.java:440)
    - locked <0x00000004a836d968> (a org.apache.hadoop.hbase.ipc.RpcServer$Call)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:121)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

Looking at org.apache.hadoop.hbase.io.ByteBufferOutputStream.checkSizeAndGrow, it has logic to grow the byte buffer up to Integer.MAX_VALUE. If that growth ever actually happens, the "Requested array size exceeds VM limit" OOME is triggered. It seems this should be changed to cap at Integer.MAX_VALUE - 2? Although, even if that is changed, hitting the cap is likely a symptom of an issue higher up the stack.

We created a test to reproduce the crash, and then tried changing the buffer max to be lower.
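To make the failure mode concrete, here is a rough sketch (not the actual HBase source; names, sizes, and structure are illustrative only) of buffer-growth logic in the spirit of checkSizeAndGrow, with a comment marking where a ceiling of Integer.MAX_VALUE runs into the JDK 7 array limit:

import java.nio.ByteBuffer;

/**
 * Illustrative sketch only, NOT the real ByteBufferOutputStream code.
 * On JDK 7+ HotSpot, asking for an array near Integer.MAX_VALUE fails
 * with "OutOfMemoryError: Requested array size exceeds VM limit";
 * the real ceiling is roughly Integer.MAX_VALUE - 2.
 */
public class BufferGrowSketch {
    // Largest array size JDK 7 will actually allocate.
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 2;

    private ByteBuffer buf = ByteBuffer.allocate(4 * 1024);

    void checkSizeAndGrow(int extra) {
        long needed = (long) buf.position() + extra;
        if (needed <= buf.capacity()) {
            return; // still fits, nothing to do
        }
        // Double until big enough, but never ask the VM for more than it allows.
        long newSize = Math.max((long) buf.capacity() * 2, needed);
        if (newSize > MAX_ARRAY_SIZE) {
            // A cap of Integer.MAX_VALUE here (instead of MAX_ARRAY_SIZE) is what
            // produces the "array size exceeds VM limit" OOME in the dump above.
            // Capping lower trades that for a later BufferOverflowException.
            newSize = MAX_ARRAY_SIZE;
        }
        ByteBuffer grown = ByteBuffer.allocate((int) newSize);
        buf.flip();
        grown.put(buf);
        buf = grown;
    }
}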
Lowering the buffer max caused a BufferOverflowException on the RS side instead, but it didn't kill the RS; the client scan fails with an OutOfOrderScannerNextException. So it still fails, but less catastrophically.

We then ran the test against HBase 0.98.13 and it works, even though checkSizeAndGrow hasn't changed. Looking further, our test works whenever the client is 0.98.13, against either a 0.98.12 or a 0.98.13 server, so something in the client changed to reduce how much data the scan asks for. A git bisect got us down to this commit: fa14b028bdeae72f492da2054ceb79985f93adc1 "Backport of HBASE-13527 https://issues.apache.org/jira/browse/HBASE-13527" (which is marked with fixVersion 0.98.12 but is actually in 0.98.13; is that marked correctly?). That commit makes the client send along its client-side default config for the max result size if none is set, which the server then honors, stopping the scan well before it gets too large.

So the RS crash was fixed by that commit, and explicitly setting the max result size on a 0.98.12 client works as well (as long as the client has enough heap to avoid a GC pause, which causes an OutOfOrderScannerNextException). We hadn't set the max result size because the batch of 100 cells we requested should never be all that large, but it seems the server-side scanner doesn't consider the batch size when filling the request if no result size is set. And while the client was changed to pass along a max by default, it may still be possible to drive the buffer into attempting a resize to Integer.MAX_VALUE; I'm not sure how many places in HBase might use the same upper bound for byte array allocation.

We wanted to test a variety of client/server versions and scan settings (results below, after the repro sketch). The server was always configured with a 10g max heap. The test adds enough 9MB cells to the same row and column family to exceed 1GB, then tries to scan them back. This may just be a report that things work as expected, but note that the test can still crash the RS with 0.98.13 when no batch size is set; that appears to be fixed in 1.1.0.
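For concreteness, the repro looks roughly like this (a sketch, not our exact test code; the table and column names are made up, the 0.98-era client API is assumed, and the scan settings shown are the batch-plus-max-size combination that worked for us):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowRepro {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "wide_row_test"); // hypothetical table name

        // Write 9MB cells into a single row/family until the row exceeds 1GB.
        byte[] row = Bytes.toBytes("big-row");
        byte[] family = Bytes.toBytes("f");
        byte[] value = new byte[9 * 1024 * 1024];
        long total = 0;
        int i = 0;
        while (total <= 1024L * 1024 * 1024) {
            Put put = new Put(row);
            put.add(family, Bytes.toBytes("q" + i), value);
            table.put(put);
            total += value.length;
            i++;
        }
        table.flushCommits();

        // Scan it back, setting BOTH a batch size and a max result size
        // (the workaround that avoids the RS crash on 0.98.12).
        Scan scan = new Scan();
        scan.setBatch(100);                     // max cells per Result
        scan.setMaxResultSize(2L * 1024 * 1024); // 2MB per RPC
        ResultScanner scanner = table.getScanner(scan);
        int results = 0;
        for (Result r : scanner) {
            results++;
        }
        scanner.close();
        table.close();
        System.out.println("Got " + results + " results");
    }
}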
0.98.12 client & Server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2m: SUCCESS* batch 100, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 0.98.12 client & 0.98.13 Server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2m: SUCCESS* batch 100, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 0.98.13 client and server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2m: SUCCESS* batch 100, no max size: SUCCESS* no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 0.98.13 client and 0.98.12 server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2mb: SUCCESS* batch 100, no max size: SUCCESS* no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 1.1.0 client & server no batch, no max size, allow partials false: SUCCESS**, and a single Result row. batch 100, max size 1mb, allow partials true: SUCCESS, 327 results (same row) NO client heap issues batch 100, max size 1mb, allow partials false: SUCCESS, 327 results (same row) NO client heap issues no batch, max size 1mb, allow partials false: SUCCESS**, and a single Result row. no batch, max size 1mb, allow partials true: SUCCESS**, and a single Result row batch 100, no max size, allow partials true: SUCCESS, 327 results (same row) NO client heap issues batch 100, no max size, allow partials false: SUCCESS, 327 results (same row) NO client heap issues * if client side heap is too small, then NO rs crash, but scan fails on client side with OutOfOrderScannerNextException, and scan failed on server side with ClosedChannelException. upping client heap to 8g fixes this. ** if client side heap is too small, then NO rs crash, but scan fails on client side with OOME heap (much clearer than OOOSNE). upping client heap to 8g fixes this. So, we have a workaround and are actively upgrading to 1.1. I'm not sure anything can be done about the 0.98.12 using the scan batch size to stop the scan early, or the 0.98.13 RS crashes (when no batch is set) aside from suggesting upgrade to 1.1 to get the new allow-partials support. I'm also not sure anything can/should be done about the array allocation limit jdk 7 change. It could delay a (potentially) premature OOME on array size, but if it does happen, there are likely things to fix in the call path (like has been done with the handling of the max result size). Thanks, James