All,

Running HBase 0.98.12 on Hadoop 2.6.0 and Java 7.
TL;DR: If your RS is crashing for no apparent reason, check the stderr/stdout log, and if you have wide rows, upgrade to 0.98.13, OR set a batch size AND a max result size on your scans. I'm reporting this here because the failure was tough to track down, and there appear to be active changes still happening around scan batch and result size (I like the direction it is heading; most of these settings could be considered "transport layer" concerns, or just hints to stop a scan early on the server side).

Long story: We have a daily backup that runs full scans over our table. The scan sets a batch size but no max result size. The RS started crashing on this backup job every day. It took a while to track down, but it turned out to be a large row with many columns causing the failure. In our case, when the backup scan hits a row that is over 2GB (yes, this is excessive, and we'll be working to address that as well), the RS dies with nothing in the RS log indicating why (i.e., no OOME exception there). Every daily backup kills a region server when it hits that row. It actually results in two crashes, because the region is picked up again (sometimes by the same recovering region server), the scan continues, and it kills whichever RS picked up the region. After that the scan times out and fails out to the caller.

There was nothing in the RS log, but I checked the stdout/stderr logs and found "OOME: Requested array size exceeds VM limit" there, with no stack trace. Note this isn't a normal heap-space OOM, but the more unusual "Requested array size exceeds VM limit" variety. After researching a bit: Java 7 changed how array limits work. In short, the maximum array size the JVM will allow is Integer.MAX_VALUE - 2; anything over that throws the OOM exception shown. See:

https://bugs.openjdk.java.net/browse/JDK-8059914
https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit

We changed -XX:OnOutOfMemoryError to do a kill -3, and found the following in the thread dump of every failure:

java.lang.Thread.State: RUNNABLE
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.hadoop.hbase.io.ByteBufferOutputStream.allocate(ByteBufferOutputStream.java:69)
    at org.apache.hadoop.hbase.io.ByteBufferOutputStream.checkSizeAndGrow(ByteBufferOutputStream.java:88)
    at org.apache.hadoop.hbase.io.ByteBufferOutputStream.write(ByteBufferOutputStream.java:126)
    at org.apache.hadoop.hbase.KeyValue.oswrite(KeyValue.java:2938)
    at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueEncoder.write(KeyValueCodec.java:60)
    at org.apache.hadoop.hbase.ipc.IPCUtil.buildCellBlock(IPCUtil.java:147)
    at org.apache.hadoop.hbase.ipc.RpcServer$Call.setResponse(RpcServer.java:440)
    - locked <0x00000004a836d968> (a org.apache.hadoop.hbase.ipc.RpcServer$Call)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:121)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

Looking at org.apache.hadoop.hbase.io.ByteBufferOutputStream.checkSizeAndGrow, it has logic to grow the byte buffer up to Integer.MAX_VALUE. If that growth ever actually happens, the "Requested array size exceeds VM limit" OOME is triggered. It seems this should be changed to cap at Integer.MAX_VALUE - 2? Although, even if that is changed, hitting the cap is likely a symptom of an issue higher up the stack.

We created a test to reproduce the crash, and then tried changing the buffer max to be lower.
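To make the failure mode concrete, here is a rough sketch (not the actual HBase source; names, sizes, and structure are illustrative only) of buffer-growth logic in the spirit of checkSizeAndGrow, with a comment marking where a ceiling of Integer.MAX_VALUE runs into the JDK 7 array limit:

import java.nio.ByteBuffer;

/**
 * Illustrative sketch only, NOT the real ByteBufferOutputStream code.
 * On JDK 7+ HotSpot, asking for an array near Integer.MAX_VALUE fails
 * with "OutOfMemoryError: Requested array size exceeds VM limit";
 * the real ceiling is roughly Integer.MAX_VALUE - 2.
 */
public class BufferGrowSketch {
    // Largest array size JDK 7 will actually allocate.
    private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 2;

    private ByteBuffer buf = ByteBuffer.allocate(4 * 1024);

    void checkSizeAndGrow(int extra) {
        long needed = (long) buf.position() + extra;
        if (needed <= buf.capacity()) {
            return; // still fits, nothing to do
        }
        // Double until big enough, but never ask the VM for more than it allows.
        long newSize = Math.max((long) buf.capacity() * 2, needed);
        if (newSize > MAX_ARRAY_SIZE) {
            // A cap of Integer.MAX_VALUE here (instead of MAX_ARRAY_SIZE) is what
            // produces the "array size exceeds VM limit" OOME in the dump above.
            // Capping lower trades that for a later BufferOverflowException.
            newSize = MAX_ARRAY_SIZE;
        }
        ByteBuffer grown = ByteBuffer.allocate((int) newSize);
        buf.flip();
        grown.put(buf);
        buf = grown;
    }
}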
Lowering the buffer max caused a BufferOverflowException on the RS side instead, but it didn't kill the RS; the client scan fails with an OutOfOrderScannerNextException. So it still fails, but less catastrophically.

We then ran the test against HBase 0.98.13 and it works, even though checkSizeAndGrow hasn't changed. Looking further, our test works whenever the client is 0.98.13, against either a 0.98.12 or a 0.98.13 server, so something in the client changed to reduce how much data the scan asks for. A git bisect got us down to this commit: fa14b028bdeae72f492da2054ceb79985f93adc1 "Backport of HBASE-13527 https://issues.apache.org/jira/browse/HBASE-13527" (which is marked with fixVersion 0.98.12 but is actually in 0.98.13; is that marked correctly?). That commit makes the client send along its client-side default config for the max result size if none is set, which the server then honors, stopping the scan well before it gets too large.

So the RS crash was fixed by that commit, and explicitly setting the max result size on a 0.98.12 client works as well (as long as the client has enough heap to avoid a GC pause, which causes an OutOfOrderScannerNextException). We hadn't set the max result size because the batch of 100 cells we requested should never be all that large, but it seems the server-side scanner doesn't consider the batch size when filling the request if no result size is set. And while the client was changed to pass along a max by default, it may still be possible to drive the buffer into attempting a resize to Integer.MAX_VALUE; I'm not sure how many places in HBase might use the same upper bound for byte array allocation.

We wanted to test a variety of client/server versions and scan settings (results below, after the repro sketch). The server was always configured with a 10g max heap. The test adds enough 9MB cells to the same row and column family to exceed 1GB, then tries to scan them back. This may just be a report that things work as expected, but note that the test can still crash the RS with 0.98.13 when no batch size is set; that appears to be fixed in 1.1.0.
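For concreteness, the repro looks roughly like this (a sketch, not our exact test code; the table and column names are made up, the 0.98-era client API is assumed, and the scan settings shown are the batch-plus-max-size combination that worked for us):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowRepro {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "wide_row_test"); // hypothetical table name

        // Write 9MB cells into a single row/family until the row exceeds 1GB.
        byte[] row = Bytes.toBytes("big-row");
        byte[] family = Bytes.toBytes("f");
        byte[] value = new byte[9 * 1024 * 1024];
        long total = 0;
        int i = 0;
        while (total <= 1024L * 1024 * 1024) {
            Put put = new Put(row);
            put.add(family, Bytes.toBytes("q" + i), value);
            table.put(put);
            total += value.length;
            i++;
        }
        table.flushCommits();

        // Scan it back, setting BOTH a batch size and a max result size
        // (the workaround that avoids the RS crash on 0.98.12).
        Scan scan = new Scan();
        scan.setBatch(100);                     // max cells per Result
        scan.setMaxResultSize(2L * 1024 * 1024); // 2MB per RPC
        ResultScanner scanner = table.getScanner(scan);
        int results = 0;
        for (Result r : scanner) {
            results++;
        }
        scanner.close();
        table.close();
        System.out.println("Got " + results + " results");
    }
}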
0.98.12 client & Server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2m: SUCCESS* batch 100, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 0.98.12 client & 0.98.13 Server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2m: SUCCESS* batch 100, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 0.98.13 client and server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2m: SUCCESS* batch 100, no max size: SUCCESS* no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 0.98.13 client and 0.98.12 server no batch, no max size: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log batch 100, max size 2mb: SUCCESS* batch 100, no max size: SUCCESS* no batch, max size 2mb: rs crash with "java.lang.OutOfMemoryError: Requested array size exceeds VM limit" in stderr/out log 1.1.0 client & server no batch, no max size, allow partials false: SUCCESS**, and a single Result row. batch 100, max size 1mb, allow partials true: SUCCESS, 327 results (same row) NO client heap issues batch 100, max size 1mb, allow partials false: SUCCESS, 327 results (same row) NO client heap issues no batch, max size 1mb, allow partials false: SUCCESS**, and a single Result row. no batch, max size 1mb, allow partials true: SUCCESS**, and a single Result row batch 100, no max size, allow partials true: SUCCESS, 327 results (same row) NO client heap issues batch 100, no max size, allow partials false: SUCCESS, 327 results (same row) NO client heap issues * if client side heap is too small, then NO rs crash, but scan fails on client side with OutOfOrderScannerNextException, and scan failed on server side with ClosedChannelException. upping client heap to 8g fixes this. ** if client side heap is too small, then NO rs crash, but scan fails on client side with OOME heap (much clearer than OOOSNE). upping client heap to 8g fixes this. So, we have a workaround and are actively upgrading to 1.1. I'm not sure anything can be done about the 0.98.12 using the scan batch size to stop the scan early, or the 0.98.13 RS crashes (when no batch is set) aside from suggesting upgrade to 1.1 to get the new allow-partials support. I'm also not sure anything can/should be done about the array allocation limit jdk 7 change. It could delay a (potentially) premature OOME on array size, but if it does happen, there are likely things to fix in the call path (like has been done with the handling of the max result size). Thanks, James