On Fri, Oct 10, 2014 at 10:59 AM, Khaled Elmeleegy <[email protected]> wrote:
> Yes, I can reproduce it with some work.
> The workload is basically as follows:
> There are writers streaming writes to a table. Then, there is a reader
> (invoked via a web interface). The reader does a 1000 parallel reverse
> scans, all end up hitting the same region in my case. The scans are
> effectively "gets" as I just need to get one record off of each of them. I
> just need to do a "reverse" get, which is not supported (would be great to
> have :)), so I do it via reverse scan. After few tries, the reader
> consistently hits this bug.
>
> This happens with these config changes:
> hbase-env:HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G
> -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -Xloggc:/tmp/hbase-regionserver-gc.log
> hbase-site:
> hbase.bucketcache.ioengine=offheap
> hbase.bucketcache.size=4196
> hbase.rs.cacheblocksonwrite=true
> hfile.block.index.cacheonwrite=true
> hfile.block.bloom.cacheonwrite=true
>
> Interestingly, without these config changes, I can't reproduce the problem.

How hard is it to play w/ combinations? Could you eliminate the cacheonwrites on one server and see if that cures the issue? Could you turn off the block cache on another to see if that is the problem? Anything in your .out files related?

St.Ack
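
For reference, the first experiment (no cacheonwrites on one server) might look roughly like the sketch below, written in the same key=value shorthand as the quoted hbase-site settings. The three cacheonwrite keys are the ones from the quoted config; flipping them to false is the only change, and leaving the bucket cache settings untouched is an assumption made here to isolate one variable at a time:

```
# hbase-site on the one test region server (sketch, not a verified config)
# bucket cache left exactly as in the quoted settings
hbase.bucketcache.ioengine=offheap
hbase.bucketcache.size=4196
# cache-on-write switched off for data, index, and bloom blocks
hbase.rs.cacheblocksonwrite=false
hfile.block.index.cacheonwrite=false
hfile.block.bloom.cacheonwrite=false
```

If the bug no longer reproduces on that server while the others still hit it, the cacheonwrite path is the likelier suspect; if it still reproduces, the bucket cache itself is worth looking at next.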
