[
https://issues.apache.org/jira/browse/HBASE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yu Sun updated HBASE-16287:
---------------------------
Attachment: HBASE-16287-v2.patch
Attaching patch v2 to fix the unit test failure.
This patch also contains a fix for
org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testStoreFileCacheOnWrite(),
which fails once this patch is applied. The fix simply sets
LruBlockCache.LRU_HARD_CAPACITY_LIMIT_FACTOR_CONFIG_NAME to 2.0f; without this
change, TestCacheOnWrite.testStoreFileCacheOnWrite() fails with the following
log output:
{quote}
2016-07-28 23:02:49,801 INFO [main] hfile.CacheConfig(285): blockCache=LruBlockCache{blockCount=0, currentSize=159452224, freeSize=-25234496, maxSize=134217728, heapSize=159452224, minSize=127506840, minFactor=0.95, multiSize=63753420, multiFactor=0.5, singleSize=31876710, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=true, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=true, prefetchOnOpen=false
2016-07-28 23:02:49,807 DEBUG [main] hfile.HFile$WriterFactory(345): Unable to set drop behind on /home/hongxi.sy/hbase/hbase-server/target/test-data/b1c99d85-27e3-4796-a66b-324feb06c620/test_cache_on_write/9174b12e141143acb9d4be7b6e7165a9
{quote}
From the log above we can see that currentSize > 1.2f * maxSize *
DEFAULT_ACCEPTABLE_FACTOR, that is, 159452224 > 159450660.864, so the block
being read is not put into the LruBlockCache and the assertion fails. Here I
simply increase the hard limit factor to make the LRU cache large enough for
all the blocks of the file being read.
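For reference, the hard-limit arithmetic above can be sketched as follows (a minimal standalone sketch; the class and method names here are illustrative assumptions, not the actual LruBlockCache code):

```java
// Sketch of the proposed hard-capacity check, using the default
// acceptable factor (0.99) and a hypothetical hard limit factor of 1.2.
public class HardLimitSketch {
    static final float ACCEPTABLE_FACTOR = 0.99f;        // LruBlockCache default
    static final float HARD_CAPACITY_LIMIT_FACTOR = 1.2f; // proposed new knob

    // Returns true if the cache is past the hard limit and the
    // incoming block should be rejected instead of cached.
    static boolean exceedsHardLimit(long currentSize, long maxSize) {
        float acceptableSize = maxSize * ACCEPTABLE_FACTOR;
        return currentSize > (long) (HARD_CAPACITY_LIMIT_FACTOR * acceptableSize);
    }

    public static void main(String[] args) {
        // Values taken from the TestCacheOnWrite log above:
        // 159452224 > 1.2 * 0.99 * 134217728 ≈ 159450660.864
        System.out.println(exceedsHardLimit(159452224L, 134217728L)); // prints true
    }
}
```

With the test's maxSize of 128 MB the margin is only about 1.5 KB, which is why bumping the hard limit factor to 2.0f in the test keeps all blocks cacheable.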
> LruBlockCache size should not exceed acceptableSize too many
> ------------------------------------------------------------
>
> Key: HBASE-16287
> URL: https://issues.apache.org/jira/browse/HBASE-16287
> Project: HBase
> Issue Type: Improvement
> Components: BlockCache
> Reporter: Yu Sun
> Assignee: Yu Sun
> Attachments: HBASE-16287-v1.patch, HBASE-16287-v2.patch
>
>
> Our regionserver has the following configuration:
> -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC
> We only use the block cache, and set hfile.block.cache.size = 0.3 in
> hbase-site.xml, so under this configuration the LRU block cache size will be
> (32g-1g)*0.3 = 9.3g. But in some scenarios, some of the regionservers run
> into continuous full GCs for hours and, most importantly, after a full GC
> most of the objects in the old generation are not collected. So we dumped
> the heap, analysed it with MAT, and observed an obvious memory leak in
> LruBlockCache, which occupied about 16 GB of memory. We then set the
> LruBlockCache log level to TRACE and observed this in the log:
> {quote}
> 2016-07-22 12:17:58,158 INFO [LruBlockCacheStatsExecutor]
> hfile.LruBlockCache: totalSize=15.29 GB, freeSize=-5.99 GB, max=9.30 GB,
> blockCount=628182, accesses=101799469125, hits=93517800259, hitRatio=91.86%,
> , cachingAccesses=99462650031, cachingHits=93468334621,
> cachingHitsRatio=93.97%, evictions=238199, evicted=4776350518,
> evictedPerRun=20051.93359375{quote}
> We can see the block cache size has exceeded acceptableSize by a large
> margin, which makes the full GCs even worse.
> After some investigation, I found that in this function:
> {code:borderStyle=solid}
> public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
>     final boolean cacheDataInL1) {
> {code}
> no matter how full the block cache already is, the block is simply put into
> it. But if the eviction thread is not fast enough, the block cache size will
> grow significantly.
> So here I think we should add a check: for example, if the block cache size
> > 1.2 * acceptableSize(), just return and don't put the block in until the
> block cache size is back under the watermark. If this sounds reasonable, I
> can prepare a small patch for it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)