Hello folks,
I would like to share an RFC that explores an architecture for
improving cache efficiency and operational stability for HBase
deployments running on object storage systems such as S3 or GCS. The
motivation comes from several issues commonly observed in
object-store-based deployments:

- cache loss during rolling upgrades and region reassignment
- block cache scalability limits due to metadata overhead
- compaction churn invalidating large portions of cached blocks
- slow cache refill when reads fall back to object storage

The proposed design introduces a layered caching model consisting of:
- a scalable distributed block cache implementation
- an optional local block cache
- SidecarFS, a lightweight Hadoop-compatible filesystem layer used as
a temporary refill substrate for newly created HFiles

In this architecture, SidecarFS retains newly generated HFiles for a
limited period of time and allows block caches to refill from a
lower-latency filesystem instead of object storage after compactions
or region movement.
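To illustrate the refill behavior, here is a standalone toy model of the idea, not HBase or SidecarFS code: the class name, the two in-memory maps standing in for the object store and the sidecar tier, and the fixed retention window are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the SidecarFS refill idea (illustrative only): newly written
// HFile blocks are retained in a fast temporary tier for a bounded time
// window, so cache refills after compaction or region movement can avoid a
// round trip to object storage.
class SidecarSketch {
    private final Map<String, byte[]> objectStore = new HashMap<>(); // slow, durable tier
    private final Map<String, byte[]> sidecar = new HashMap<>();     // fast, temporary tier
    private final Map<String, Long> writeTime = new HashMap<>();
    private final long retentionMillis;

    SidecarSketch(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    // Write-through: a newly created HFile lands in both tiers.
    void write(String path, byte[] data, long now) {
        objectStore.put(path, data);
        sidecar.put(path, data);
        writeTime.put(path, now);
    }

    // Reads prefer the sidecar tier while the file is within its
    // retention window; otherwise they fall back to object storage.
    byte[] read(String path, long now) {
        evictExpired(now);
        byte[] local = sidecar.get(path);
        return local != null ? local : objectStore.get(path);
    }

    boolean servedLocally(String path, long now) {
        evictExpired(now);
        return sidecar.containsKey(path);
    }

    // Expiry keeps the temporary tier bounded: after the retention window,
    // the file exists only in object storage.
    private void evictExpired(long now) {
        sidecar.keySet().removeIf(p -> now - writeTime.get(p) > retentionMillis);
    }
}
```

The point of the sketch is the lifecycle: the sidecar tier is a bounded cache of recent writes, never a second source of truth, which matches the intent that no permanent secondary storage layer is introduced.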

The design is intentionally cache-based and does not introduce a
permanent secondary storage layer. The document also discusses
operational considerations such as:
- cache behavior during region reassignment
- compaction-induced cache churn
- wave-based scheduling of major compactions to keep the caching
filesystem bounded

In particular, I would appreciate feedback on whether the proposed
approach fits reasonably well within the existing HBase architecture
and whether there are major concerns with introducing a caching
filesystem layer such as SidecarFS.

RFC document: 
https://docs.google.com/document/d/1JBRUwI90iDXX8yN9FDa4fWy2C0t3bBEhzLhubE4XePs/edit?usp=sharing

Comments and suggestions are very welcome.

Best regards,
- Vladimir Rodionov
