[ 
https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072467#comment-16072467
 ] 

Eshcar Hillel commented on HBASE-18294:
---------------------------------------

The per-cell metadata overhead with basic compaction is about 60B. This means 
that, with small values, metadata can easily double the heap occupancy of the 
memstore.
bq.   When the data size of the memstore is high (choosing one to flush), the 
heap occupancy of it also will be on higher side no?
Not necessarily. It depends on the size of the values: one memstore can hold 
50MB of data with 100MB total heap size, while another holds 70MB of data with 
only 80MB total heap size.
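
To make the arithmetic behind these two cases concrete, here is a rough sketch. It assumes a fixed overhead of about 60B per cell (the figure quoted above) and a uniform cell size within each memstore; the function name and the average cell sizes of 60B and 420B are hypothetical, chosen only to reproduce the numbers in the example.

```python
MB = 1024 * 1024
PER_CELL_OVERHEAD = 60  # bytes; the approximate figure quoted above


def estimated_heap_size(data_bytes, avg_cell_bytes, overhead=PER_CELL_OVERHEAD):
    """Estimate heap occupancy, assuming every cell has the same data size."""
    cells = data_bytes / avg_cell_bytes
    return data_bytes + cells * overhead


# Hypothetical memstores: small cells (60B each) versus larger cells (420B each).
small = estimated_heap_size(50 * MB, avg_cell_bytes=60)
large = estimated_heap_size(70 * MB, avg_cell_bytes=420)
print(f"{small / MB:.0f}MB, {large / MB:.0f}MB")
```

Under these assumptions the memstore holding less data ends up with the larger heap footprint, because its overhead scales with the number of cells rather than with the bytes stored.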
bq.  considering data size only for the per region flush decision is more 
inline with a normal user thinking. 
Not necessarily. Here are two posts that treat 128MB as the total memstore 
occupancy:
https://www.quora.com/HBase-Region-Server-guidelines-give-a-size-range-of-about-1TB-whereas-data-nodes-are-configured-20-times-bigger-Why
 by [~larsgeorge]
_"we can address 4GB of heap for writes, allowing us up to have 32 regions that 
are written to while flushing at 128MB"_
http://hadoop-hbase.blogspot.co.uk/2013/01/hbase-region-server-memory-sizing.html
 by [~lhofhansl]
_"... you'd need ~338 regions. @128MB that's about 43GB"_
Users who came to understand HBase through posts like these expect a region 
(memstore) not to exceed the 128MB threshold.

In the current implementation, every relevant check considers data size rather 
than heap size: the regular and blocking thresholds at the region level, and 
both selection policies, at the RS-region level and at the region-store level.
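
The difference between the two checks can be illustrated with a small sketch. This is not HBase's code: the `MemstoreSize` class, both function names, and the 100MB/200MB figures are hypothetical, and only the 128MB default threshold comes from the issue description.

```python
from dataclasses import dataclass

MB = 1024 * 1024
FLUSH_THRESHOLD = 128 * MB  # default region flush threshold cited in the issue


@dataclass
class MemstoreSize:
    data_size: int  # key-value bytes only
    heap_size: int  # data plus index and metadata


def should_flush_by_data(size, threshold=FLUSH_THRESHOLD):
    """Behaviour described as current: compare data size to the threshold."""
    return size.data_size > threshold


def should_flush_by_heap(size, threshold=FLUSH_THRESHOLD):
    """Behaviour the issue asks for: compare heap size to the threshold."""
    return size.heap_size > threshold


# Hypothetical memstore with small values, where metadata doubles the footprint.
small_values = MemstoreSize(data_size=100 * MB, heap_size=200 * MB)
print(should_flush_by_data(small_values), should_flush_by_heap(small_values))
```

In this illustration the data-size check leaves the memstore unflushed even though its heap occupancy is well past 128MB, which is the gap the issue describes.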

> Flush is based on data size instead of heap size
> ------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>
> A region is flushed if its memory component exceeds a threshold (default size 
> is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the 
> store to another threshold (that can be configured with 
> hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size 
> (key-value only) to the threshold, whereas it should compare the heap size 
> (which includes index size and metadata).



