Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks)

Manish Rai Jain Sun, 11 May 2025 11:06:30 -0700

Hi hackers,

I’ve been exploring the idea of integrating an LSM tree–based storage engine
into PostgreSQL, similar in spirit to MyRocks for MySQL, by replacing the
underlying storage while preserving PostgreSQL’s upper layers (planner,
executor, MVCC, etc.).


The motivation stems from the well‑known write‑amplification issues with
B‑trees under high write throughput. An LSM‑based engine could offer:

   - Significant improvements in write performance and space efficiency,
   especially under heavy ingestion workloads.
   - Better scalability with larger datasets, particularly when compression
   is applied.
   - Comparable read performance (with trade‑offs depending on workload),
   and opportunities to optimize through Bloom filters, tiered compaction
   strategies, etc.
   - Reduced reliance on manual VACUUM: obsolete versions would be purged
   naturally during LSM compactions, potentially eliminating routine heap
   vacuuming (transaction‑ID wrap‑around handling and stats collection would
   still need careful design).
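
To make the last point concrete, here is a toy sketch of how a compaction
pass might purge obsolete row versions. It is not PostgreSQL, RocksDB, or
BadgerDB code: Version, xmin, compact, and oldest_active_xid are hypothetical
names, and it assumes every transaction has committed, that visibility
follows transaction-ID order alone, and that all runs for a key are merged
at once (so dropping a tombstone is safe).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Version:
    key: str
    xmin: int              # hypothetical: ID of the transaction that wrote this version
    value: Optional[str]   # None marks a delete (tombstone)


def compact(runs: Iterable[List[Version]], oldest_active_xid: int) -> List[Version]:
    """Merge runs and drop versions that no active snapshot can still see."""
    # Order by key, newest version first within each key.
    merged = sorted((v for run in runs for v in run),
                    key=lambda v: (v.key, -v.xmin))
    out: List[Version] = []
    current_key = None
    covered = False  # True once a version visible to every snapshot was seen
    for v in merged:
        if v.key != current_key:
            current_key, covered = v.key, False
        if covered:
            continue  # shadowed by a newer version that everyone can see
        if v.xmin < oldest_active_xid:
            covered = True
            if v.value is None:
                continue  # tombstone visible to all: nothing left to keep
        out.append(v)
    return out


runs = [
    [Version("a", 10, "a1"), Version("b", 12, None)],
    [Version("a", 20, "a2"), Version("a", 30, "a3")],
    [Version("b", 5, "b0")],
]
survivors = compact(runs, oldest_active_xid=25)
```

With these inputs, key "a" keeps its versions from transactions 30 and 20
(a snapshot older than 30 may still need 20), while the version from
transaction 10 and everything under key "b" is dropped. A real design would
also have to consult commit status and handle aborted transactions, which
this sketch ignores.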

The hoped‑for outcome is a faster, more scalable PostgreSQL for >1 TB
workloads, while maintaining the rich feature set and ecosystem
compatibility users expect from Postgres.

Unlike Neon, this approach is not targeting cloud‑native object storage or
remote WAL streaming, but instead optimizing for maximum performance on
local disks or high‑performance block volumes, where write throughput and
compaction efficiency matter most.

This would likely involve implementing a new Table Access Method (TAM),
possibly backed by a forked engine such as BadgerDB or RocksDB, adapted to
support PostgreSQL’s MVCC and WAL semantics.

I’d love to hear your thoughts:


   1. Does this direction make sense for experimentation within the Postgres
   ecosystem?
   2. Are there known architectural blockers or prior discussions/attempts in
   this space worth revisiting?
   3. Would such a project be best developed entirely as a fork, or is there
   openness to evolving TAM to better support pluggable storage with LSM‑like
   semantics?

Looking forward to your feedback.
- Manish

https://github.com/manishrjain
