All,

If a full GC happens on the client while a scan is in progress, the scan can miss rows. I have a test that reproduces this almost every time.
The test runs against a local standalone server with a 10g heap, using jdk1.7.0_45.

The test:
- run with -Xmx1900m to restrict the client heap
- run with -verbose:gc to see the GCs
- connect and create a new table with one CF
- add 99 cells, 9mb each, to that CF in the same row (individual PUTs in a loop)
- full-scan the table, setting only maxResultSize to 2mb (no batch size)
- if no data, sleep 5s and try the scan again

Running this test, the first scan fails. There is no exception, just no results returned (results.hasNext is false). The test then sleeps 5s and tries the scan again, and it usually succeeds on the 2nd or 3rd attempt.

Looking at the logs, we see several full GCs during the scan (but no OOME stacks before the first failure). Then a curious message:

2015-07-30 10:42:10,815 [main] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation - Removed 192.168.1.131:53244 as a location of big_row_1438274455440,\x00\x80,1438274455540.b213fc048745241f236bc6e2291092d1. for tableName=big_row_1438274455440 from cache

As if the client has somehow decided the region location is bad/gone? After that, the scan completes with no results.
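For a sense of scale in the test setup above, here is a small back-of-the-envelope calculation. It is only a sketch: it counts cell values alone (no KeyValue or RPC overhead), and it assumes that with no batch size set the whole row has to be held in the client at once, which the report does not confirm. The class and method names are made up for illustration.

```java
public class RowSizeEstimate {
    static final long MB = 1024L * 1024L;

    // Value bytes for `cells` cells of `cellMb` MB each.
    // Ignores per-cell overhead, so the real footprint is larger.
    static long payloadBytes(int cells, int cellMb) {
        return cells * cellMb * MB;
    }

    public static void main(String[] args) {
        long heap = 1900 * MB;                  // client started with -Xmx1900m
        long wholeRow = payloadBytes(99, 9);    // assumption: all 99 cells arrive together
        long batchOfTen = payloadBytes(10, 9);  // batch size 10, the workaround from the report

        System.out.printf("whole row:   %d MB, %.1f%% of heap%n",
                wholeRow / MB, 100.0 * wholeRow / heap);
        System.out.printf("batch of 10: %d MB, %.1f%% of heap%n",
                batchOfTen / MB, 100.0 * batchOfTen / heap);
    }
}
```

Under those assumptions the single row is 891 MB of values against a 1900 MB heap, so even one extra in-flight copy of the response would come close to filling the heap, while a batch of 10 cells is 90 MB.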
After a sleep, it tries again, and it usually passes, but oddly there are also actual OOMEs in the client log just before the scan finishes successfully:

2015-07-30 10:42:36,459 [IPC Client (1790044085) connection to /192.168.1.131:53244 from james] WARN org.apache.hadoop.ipc.RpcClient - IPC Client (1790044085) connection to /192.168.1.131:53244 from james: unexpected exception receiving call responses
java.lang.OutOfMemoryError: Java heap space
2015-07-30 10:42:36,459 [IPC Client (1790044085) connection to /192.168.1.131:53244 from james] DEBUG org.apache.hadoop.ipc.RpcClient - IPC Client (1790044085) connection to /192.168.1.131:53244 from james: closing ipc connection to /192.168.1.131:53244: Unexpected exception receiving call responses
java.io.IOException: Unexpected exception receiving call responses
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:731)
Caused by: java.lang.OutOfMemoryError: Java heap space

It seems like the rpc winds up retrying after catching Throwable.

This test is single threaded, and the single row is large, causing several full GCs while receiving data. I suspect the same thing could happen when multiple threads are scanning: memory pressure elsewhere leads to a full GC, which may cause partial results (but I've not proven that). I can make the test pass by setting the batch size to 10, which reduces the memory pressure from this one row, but I'm not sure that a scan wouldn't still miss data the same way if a full GC were triggered by other activity in the JVM.

I tested the following combinations of client/server versions:

Repro'ed in:
- 0.98.12 client/server
- 0.98.13 client, 0.98.12 server
- 0.98.13 client/server
- 1.1.0 client, 0.98.13 server
- 0.98.13 client, 1.1.0 server
- 0.98.12 client, 1.1.0 server

NOT repro'ed in:
- 1.1.0 client/server

I'm not sure why a 1.1.0 client would fail the same way against a 0.98.13 server, but not against a 1.1.0 server.
But, more reason for my team to get up to 1.1 fully :)

I have not yet run the test against a full cluster. I can provide the test and logs from my testing if requested.

Thanks,
James
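
P.S. To illustrate the "retrying after catching Throwable" suspicion from the OOME logs, here is a minimal, purely hypothetical sketch. It is not the HBase RpcClient or scanner code, and every name in it is invented. It only shows how a retry wrapper that catches Throwable can turn an OutOfMemoryError into what looks like an empty result with no exception, whereas a wrapper that catches only Exception lets the Error reach the caller.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

public class SwallowedErrorSketch {

    // Hypothetical retry wrapper that catches Throwable: an OutOfMemoryError
    // is handled like any other failed attempt and never reaches the caller.
    static List<String> callCatchingThrowable(Callable<List<String>> call, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Throwable t) {
                System.err.println("attempt " + attempt + " failed: " + t);
            }
        }
        // After the last attempt the caller just sees "no rows".
        return Collections.emptyList();
    }

    // Variant that retries only on Exception, so Errors propagate immediately.
    static List<String> callCatchingException(Callable<List<String>> call, int maxAttempts)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
            }
        }
        throw last != null ? last : new IllegalArgumentException("maxAttempts must be >= 1");
    }

    // Simulated scan call that always fails with an OutOfMemoryError.
    // The error is thrown directly; no memory is actually exhausted.
    static Callable<List<String>> failingScan() {
        return new Callable<List<String>>() {
            @Override
            public List<String> call() {
                throw new OutOfMemoryError("simulated");
            }
        };
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = callCatchingThrowable(failingScan(), 3);
        System.out.println("catch Throwable: rows returned = " + rows.size());
        try {
            callCatchingException(failingScan(), 3);
        } catch (OutOfMemoryError e) {
            System.out.println("catch Exception: error reached the caller: " + e);
        }
    }
}
```

Whether the client actually behaves like the first variant is exactly what has not been established here; the sketch only shows why such a pattern would match the "no exception, no results" symptom.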