Hello folks,

I would like to share an RFC that explores an architecture for
improving cache efficiency and operational stability for HBase
deployments running on object storage systems such as S3 or GCS.
The motivation comes from several issues commonly observed in
object-store-based deployments:

- cache loss during rolling upgrades and region reassignment
- block cache scalability limits due to metadata overhead
- compaction churn invalidating large portions of cached blocks
- slow cache refill when reads fall back to object storage

The proposed design introduces a layered caching model consisting of:

- a scalable distributed block cache implementation
- an optional local block cache
- SidecarFS, a lightweight Hadoop-compatible filesystem layer used as
a temporary refill substrate for newly created HFiles

In this architecture, SidecarFS retains newly generated HFiles for a
bounded period, so that after compactions or region movement the block
caches can refill from a lower-latency filesystem rather than from
object storage. The design is intentionally cache-based and does not
introduce a permanent secondary storage layer.
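To make the intended read path concrete, here is a minimal sketch of
the tiered lookup described above: local cache, then distributed
cache, then SidecarFS (only while the HFile is inside its retention
window), with object storage as the slow fallback that also refills
the caches. All names, the retention constant, and the dict-backed
stores are hypothetical illustrations, not the RFC's actual
interfaces:

```python
import time

RETENTION_SECONDS = 6 * 3600  # hypothetical SidecarFS retention window


class TieredBlockReader:
    """Illustrative lookup order: local cache -> distributed cache ->
    SidecarFS (if still within retention) -> object storage."""

    def __init__(self, local_cache, distributed_cache, sidecar_fs, object_store):
        self.local = local_cache            # {path: block}
        self.distributed = distributed_cache  # {path: block}
        self.sidecar = sidecar_fs           # {path: (created_at, block)}
        self.object_store = object_store    # {path: block}

    def read_block(self, path):
        # Fast paths: in-process cache, then the distributed cache tier.
        if path in self.local:
            return self.local[path]
        if path in self.distributed:
            block = self.distributed[path]
            self.local[path] = block
            return block
        # Refill from the low-latency substrate while the file is retained.
        entry = self.sidecar.get(path)
        if entry is not None:
            created_at, block = entry
            if time.time() - created_at < RETENTION_SECONDS:
                self.distributed[path] = block
                self.local[path] = block
                return block
        # Slow path: object storage, refilling both cache tiers.
        block = self.object_store[path]
        self.distributed[path] = block
        self.local[path] = block
        return block
```

The point of the sketch is the ordering: SidecarFS sits between the
caches and object storage purely as a refill source, so evicting or
expiring it never loses data.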

The document also discusses operational considerations such as:
- cache behavior during region reassignment
- compaction-induced cache churn
- wave-based scheduling of major compactions to keep the caching
  filesystem's footprint bounded
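The wave-based scheduling idea can be sketched in a few lines:
partition the regions into consecutive waves and run each wave's
major compactions to completion before starting the next, so at most
`wave_size` regions are generating fresh HFiles (and thus consuming
SidecarFS space) at any moment. The function name and signature are
hypothetical, not from the RFC:

```python
def schedule_in_waves(regions, wave_size):
    """Split regions into consecutive waves of at most wave_size.

    Each wave's major compactions run together; the next wave starts
    only after the previous one finishes, bounding how many regions
    hold freshly written HFiles in the caching filesystem at once.
    """
    return [regions[i:i + wave_size] for i in range(0, len(regions), wave_size)]
```

For example, 10 regions with a wave size of 4 yield three waves of
sizes 4, 4, and 2, so the caching filesystem only ever has to absorb
the output of 4 concurrent major compactions.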

In particular, I would appreciate feedback on whether the proposed
approach fits reasonably well within the existing HBase architecture
and whether there are major concerns with introducing a caching
filesystem layer such as SidecarFS.


RFC document: 
https://docs.google.com/document/d/1JBRUwI90iDXX8yN9FDa4fWy2C0t3bBEhzLhubE4XePs/edit?usp=sharing

Comments and suggestions are very welcome.

Best regards,
- Vladimir Rodionov
