Dear HBase community,

Our HBase cluster (version 2.1.10) is generating excessively small HFiles, often below 10 MB and sometimes as small as 2 KB. This happens even though we have set hbase.hregion.memstore.flush.size to a large value and raised hbase.hregion.percolumnfamilyflush.size.lower.bound.min to match the flush size. We have also enabled 'BASIC' in-memory compaction.

The root cause appears to be the region-level scope of MemStoreSizing ( https://github1s.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L344 ). This counter is shared across all column families within a region, and a region-wide flush is triggered when the total memstore size exceeds the threshold. Consequently, even if only one column family is actively accumulating data, the entire region is flushed, so the families holding little data are written out as small HFiles.

We would appreciate guidance on how to prevent the generation of small HFiles and how to enable per-column-family flushing in tables with multiple column families.

--
Best Wishes!
Mohammad Abdollahzade Arani
Computer Engineering @ SBU
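
P.S. To make the behaviour described above concrete, here is a toy Java model of how we understand it. It is only a sketch: RegionFlushSketch, write and the family names "busy" and "idle" are hypothetical and are not HBase classes or APIs, the 128 MB threshold is an illustrative value rather than our actual setting, and the model deliberately ignores flush policies and in-memory compaction. It shows only the effect of a single size counter shared by all column families of a region.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: these names are hypothetical and are not HBase classes. */
public class RegionFlushSketch {
    private final long flushThresholdBytes;
    // Per-family memstore sizes, kept in insertion order.
    private final Map<String, Long> memstoreBytesByFamily = new LinkedHashMap<>();
    // Single counter shared by every column family of the region.
    private long regionMemstoreBytes = 0;

    public RegionFlushSketch(long flushThresholdBytes, String... families) {
        this.flushThresholdBytes = flushThresholdBytes;
        for (String family : families) {
            memstoreBytesByFamily.put(family, 0L);
        }
    }

    /**
     * Records a write. If the shared counter now exceeds the threshold, every
     * family is flushed and the size of each resulting file is returned;
     * otherwise an empty map is returned.
     */
    public Map<String, Long> write(String family, long bytes) {
        memstoreBytesByFamily.merge(family, bytes, Long::sum);
        regionMemstoreBytes += bytes;

        Map<String, Long> flushedFileBytes = new LinkedHashMap<>();
        if (regionMemstoreBytes > flushThresholdBytes) {
            for (Map.Entry<String, Long> entry : memstoreBytesByFamily.entrySet()) {
                if (entry.getValue() > 0) {
                    flushedFileBytes.put(entry.getKey(), entry.getValue());
                }
                entry.setValue(0L); // the family's memstore is emptied by the flush
            }
            regionMemstoreBytes = 0;
        }
        return flushedFileBytes;
    }

    public static void main(String[] args) {
        long threshold = 128L * 1024 * 1024; // illustrative value
        RegionFlushSketch region = new RegionFlushSketch(threshold, "busy", "idle");

        region.write("idle", 2048L);                         // tiny write, no flush yet
        Map<String, Long> files = region.write("busy", threshold); // pushes the total over

        // The idle family is flushed together with the busy one.
        System.out.println(files);
    }
}
```

In this model the "idle" family ends up with a 2 KB file purely because the "busy" family pushed the shared total over the threshold, which matches the file sizes we observe.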