[ 
https://issues.apache.org/jira/browse/HBASE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886946#comment-15886946
 ] 

stack commented on HBASE-17338:
-------------------------------

Trying to follow along...

bq. We track dataSize (irrespective of whether the cell data is in the on heap 
or off heap area). This dataSize is used at Segment level for the in memory 
flush decision, at Region level for on disk flushes, and globally to force 
flush some regions.

Dumb question. dataSize is KV infrastructure + key content + value + trailing 
tags and sequenceid if any? i.e. the whole KV? And CellSize is infrastructure 
only or rather key+infrastructure?

bq. At the 1st 2 levels, there is no doubt that we have to track all the cell 
data size together. Now the point Ram makes is that when we have off heap 
configured and the max off heap global size is, say, 12 GB, once the data size 
globally reaches this level, we will force flush some regions. So his point is 
that for this tracking we have to consider only off heap Cells, and an on heap 
Cell's data size should not be accounted in the data size but only in the 
heapSize. (At global level. But at region and segment level it has to get 
applied). 2 reasons why I am not in favor of this

I'm listening (smile).

bq. 1. This makes the impl so complex. We need to add an isOffheap check down 
the layers. Also, at 2 levels we have to count this on heap cell data size and 
at one level not.

You can probably guess what I think on the above.

....


bq. So let's consider the cell data size globally also (how we do now) and make 
global flushes.

There is one global threshold whether data is onheap or offheap (I probably got 
this wrong?)

bq. We should be able to turn MSLAB usage ON/OFF per table also. Now this is 
possible? Am not sure. 

We probably could, but the direct memory would remain allocated until we restart.

Thanks. Let's figure it out and update your 
https://docs.google.com/document/d/1fj5P8JeutQ-Uadb29ChDscMuMaJqaMNRI86C4k5S1rQ/edit#heading=h.x14v1a3zw2q9

> Treat Cell data size under global memstore heap size only when that Cell can 
> not be copied to MSLAB
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17338
>                 URL: https://issues.apache.org/jira/browse/HBASE-17338
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver
>    Affects Versions: 2.0.0
>            Reporter: Anoop Sam John
>            Assignee: Anoop Sam John
>             Fix For: 2.0.0
>
>         Attachments: HBASE-17338.patch, HBASE-17338_V2.patch, 
> HBASE-17338_V2.patch, HBASE-17338_V4.patch, HBASE-17338_V5.patch
>
>
> We have only data size and heap overhead being tracked globally.  Off heap 
> memstore works with an off heap backed MSLAB pool.  But a cell, when added 
> to the memstore, is not always copied to MSLAB.  Append/Increment ops, which 
> do an upsert, don't use MSLAB.  Also, based on the Cell size, we sometimes 
> avoid the MSLAB copy.  But now we track the data size of these cells also 
> under the global memstore data size, which indicates the off heap size in 
> case of off heap memstore.  For global checks for flushes (against 
> lower/upper watermark levels), we check this size against the max off heap 
> memstore size.  We do check heap overhead against the global heap memstore 
> size (defaults to 40% of xmx).  But for such cells the data size also should 
> be accounted under the heap overhead.
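
To make the accounting in the issue description concrete, here is a rough 
sketch of what it seems to ask for. This is not HBase code: the class 
GlobalMemstoreAccounting, its fields, and its methods are all hypothetical 
names made up for illustration. The only idea taken from the description is 
that a cell's data counts toward the off heap data size when it was copied to 
MSLAB, and toward the heap side otherwise.

```java
// Hypothetical sketch only; none of these names are actual HBase APIs.
public class GlobalMemstoreAccounting {
    // Bytes of cell data that were copied into (off heap) MSLAB chunks.
    private long offHeapDataSize;
    // Bytes on the Java heap: per-cell overhead, plus the data of any cell
    // that was NOT copied to MSLAB (e.g. upserts, or cells skipped by size).
    private long heapSize;

    private final long maxOffHeapBytes; // assumed global off heap memstore limit
    private final long maxHeapBytes;    // assumed global on heap memstore limit

    public GlobalMemstoreAccounting(long maxOffHeapBytes, long maxHeapBytes) {
        this.maxOffHeapBytes = maxOffHeapBytes;
        this.maxHeapBytes = maxHeapBytes;
    }

    // Record one cell. The overhead always lands on heap; the data lands
    // off heap only when the cell was actually copied to MSLAB.
    public void addCell(long dataBytes, long overheadBytes, boolean copiedToMslab) {
        if (copiedToMslab) {
            offHeapDataSize += dataBytes;
            heapSize += overheadBytes;
        } else {
            heapSize += dataBytes + overheadBytes;
        }
    }

    // Global check: force a flush when either limit is reached.
    public boolean shouldForceFlush() {
        return offHeapDataSize >= maxOffHeapBytes || heapSize >= maxHeapBytes;
    }

    public long getOffHeapDataSize() {
        return offHeapDataSize;
    }

    public long getHeapSize() {
        return heapSize;
    }
}
```

With limits of, say, 1000 bytes off heap and 500 bytes on heap, a cell that 
skips MSLAB moves only the heap counter, so many such cells would trip the 
heap limit without touching the off heap one.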



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
