Hello folks, I would like to share an RFC that explores an architecture for improving cache efficiency and operational stability for HBase deployments running on object storage systems such as S3 or GCS. The motivation comes from several issues commonly observed in object-store-backed deployments:
- cache loss during rolling upgrades and region reassignment
- block cache scalability limits due to metadata overhead
- compaction churn invalidating large portions of cached blocks
- slow cache refill when reads fall back to object storage

The proposed design introduces a layered caching model consisting of:

- a scalable distributed block cache implementation
- an optional local block cache
- SidecarFS, a lightweight Hadoop-compatible filesystem layer used as a temporary refill substrate for newly created HFiles

In this architecture, SidecarFS retains newly generated HFiles for a limited period of time, allowing block caches to refill from a lower-latency filesystem instead of object storage after compactions or region movement. The design is intentionally cache-based and does not introduce a permanent secondary storage layer.

The document also discusses operational considerations such as:

- cache behavior during region reassignment
- compaction-induced cache churn
- wave-based scheduling of major compactions to keep the caching filesystem bounded

In particular, I would appreciate feedback on whether the proposed approach fits reasonably well within the existing HBase architecture, and whether there are major concerns with introducing a caching filesystem layer such as SidecarFS.

RFC document: https://docs.google.com/document/d/1JBRUwI90iDXX8yN9FDa4fWy2C0t3bBEhzLhubE4XePs/edit?usp=sharing

Comments and suggestions are very welcome.

Best regards,
- Vladimir Rodionov
