Dear HBase community,

Our HBase cluster (version 2.1.10) is generating very small HFiles, often
below 10 MB and sometimes as small as 2 KB. This happens even though we
have set hbase.hregion.memstore.flush.size to a significant value and tuned
parameters such as hbase.hregion.percolumnfamilyflush.size.lower.bound.min
to match the flush size. We have also enabled BASIC in-memory compaction.

The root cause appears to be the region-wide scope of MemStoreSizing in
HRegion:
https://github1s.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L344
This counter is shared by all column families within a region, so a flush
is triggered for the whole region as soon as the total memstore size
exceeds the threshold. Consequently, even if only one column family is
actively accumulating data, the entire region is flushed, and the column
families that hold little data can end up written out as small HFiles.
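
To make our understanding concrete, here is a minimal sketch in plain Java.
It is not HBase code: the class FlushSketch, its methods, the column family
names, and the 128 MB / 16 MB values are all hypothetical and chosen only
for illustration. It contrasts a region-wide flush of every column family
with a selection that only picks families above a per-family lower bound.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; names and thresholds are assumptions, not HBase APIs.
public class FlushSketch {

    /** Region-wide trigger: the sum over all families is compared to the flush size. */
    static boolean shouldFlush(Map<String, Long> storeSizes, long flushSize) {
        long total = 0;
        for (long size : storeSizes.values()) {
            total += size;
        }
        return total > flushSize;
    }

    /** Flush-everything selection: every family is written out, however small. */
    static List<String> selectAll(Map<String, Long> storeSizes) {
        return new ArrayList<>(storeSizes.keySet());
    }

    /**
     * Lower-bound selection: only families larger than lowerBound are picked.
     * If no family qualifies, this sketch falls back to flushing all of them.
     */
    static List<String> selectLarge(Map<String, Long> storeSizes, long lowerBound) {
        List<String> picked = new ArrayList<>();
        for (Map.Entry<String, Long> entry : storeSizes.entrySet()) {
            if (entry.getValue() > lowerBound) {
                picked.add(entry.getKey());
            }
        }
        return picked.isEmpty() ? selectAll(storeSizes) : picked;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("cf_hot", 127 * MB);   // the only family receiving real traffic
        sizes.put("cf_cold_a", 1 * MB);  // nearly idle
        sizes.put("cf_cold_b", 2048L);   // 2 KB, would become a tiny HFile

        long flushSize = 128 * MB;       // assumed region flush threshold
        long lowerBound = 16 * MB;       // assumed per-family lower bound

        System.out.println("flush triggered:       " + shouldFlush(sizes, flushSize));
        System.out.println("flush-all selection:   " + selectAll(sizes));
        System.out.println("lower-bound selection: " + selectLarge(sizes, lowerBound));
    }
}
```

In this model the region crosses the threshold almost entirely because of
one family, yet the flush-everything selection also writes out the 1 MB and
2 KB families, which matches the small files we observe. The lower-bound
selection is the behaviour we were hoping to get from our configuration.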

We would appreciate guidance on how to prevent the generation of small
HFiles and how to enable per-column-family flushing in HBase tables with
multiple column families.
-- 
Best Wishes!
Mohammad Abdollahzade Arani
Computer Engineering @ SBU
