Re: Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks)

Matthias van de Meent Mon, 12 May 2025 12:20:54 -0700

On Sun, 11 May 2025 at 14:06, Manish Rai Jain <manishrj...@gmail.com> wrote:
>
> Hi hackers,
>
> I’ve been exploring the idea of integrating an LSM tree–based storage engine 
> into PostgreSQL — similar in spirit to MyRocks for MySQL — by replacing the 
> underlying storage while preserving PostgreSQL’s upper layers (planner, 
> executor, MVCC, etc.).
>
> The motivation stems from the well‑known write‑amplification issues with 
> B‑trees under high write throughput. An LSM‑based engine could offer:
>
> Significant improvements in write performance and space efficiency, 
> especially under heavy ingestion workloads.
> Better scalability with larger datasets, particularly when compression is 
> applied.
> Comparable read performance (with trade‑offs depending on workload), and 
> opportunities to optimize through Bloom filters, tiered compaction 
> strategies, etc.
> Reduced reliance on manual VACUUM: obsolete versions would be purged 
> naturally during LSM compactions, potentially eliminating routine heap 
> vacuuming (transaction‑ID wrap‑around handling and stats collection would 
> still need careful design).
>
> The hoped‑for outcome is a faster, more scalable PostgreSQL for >1 TB 
> workloads, while maintaining the rich feature set and ecosystem compatibility 
> users expect from Postgres.
>
> Unlike Neon, this approach is not targeting cloud‑native object storage or 
> remote WAL streaming, but instead optimizing for maximum performance on local 
> disks or high‑performance block volumes, where write throughput and 
> compaction efficiency matter most.


Note: Neon does not exactly structure anything PostgreSQL-visible in
an LSM structure. From Postgres' point of view, it's much more similar
to an LSM-based file system than an LSM-based TableAM or IndexAM.

> This would likely involve implementing a new Table Access Method (TAM), 
> possibly backed by a forked engine such as BadgerDB or RocksDB, adapted to 
> support PostgreSQL’s MVCC and WAL semantics.
>
> I’d love to hear your thoughts:
>
> Does this direction make sense for experimentation within the Postgres 
> ecosystem?

IMV, PostgreSQL is uniquely positioned for experimentation with
different index data structures, while it has significant restrictions
on Table AM implementations through the Table AM API and our shared
buffers. Given the significant differences between the usual LSM
(arbitrarily large ranges of values) and block-based storage I'm not
sure how well Postgres would accomodate LSM-based storage. See also
below.

> Are there known architectural blockers or prior discussions/attempts in this 
> space worth revisiting?

It depends on what you're planning to do. LSM-based TableAMs usually
are implemented with Index-organized data in mind. PostgreSQL,
however, expects its tables to have heap-like semantics for stored
tuples, with TID-addressible tuples that identify an unchanging set of
non-summarizingly indexed column values.

Additionally, PostgreSQL has its own buffer pool that is
block/page-based. You might be able to convince an LSM engine to use
block IO and page storage, but if that's not in your plans, you'll
probably also have to figure out how to do buffering and
checkpointing, and integrate with restartpoints.

> Would such a project be best developed entirely as a fork, or is there 
> openness to evolving TAM to better support pluggable storage with LSM‑like 
> semantics?

I think the project is open to extensions of the TableAM to better
support new TableAMs, but only if that benefits the project as a
whole. E.g. if there are changes that impact the way things are going
to be indexed, you'll need to have a compelling story about why that's
needed and how indexes need to be adapted to support the new system.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks)

Reply via email to