On 05.08.2020 02:59, Alexander Korotkov wrote:
The things you're writing make me uneasy. I initially understood
lsm3 as a quick and dirty prototype, while you're probably keeping
some design in your mind (for instance, the original design of LSM).
However, your message makes me think you're trying to defend the
approach currently implemented in the lsm3 extension. Therefore, I
have to criticise this approach.
1) The base index can degrade. First, merges can cause page
splits, so the logical ordering of pages becomes less correlated
with their physical ordering with each merge.
2) If your workload includes updates and/or deletes, page
utilization may also degrade.
3) While the base index degrades, merge performance degrades too.
Traversing the base index in logical order will require more and more
random reads (at some point almost every page read will be random).
As the base index becomes large and/or bloated, you push fewer top
index tuples to a single base index page (at some point you will push
one tuple per page).
The original LSM design implies strict guarantees on average resources
spent per index operation. Your design doesn't. Moreover, I bet lsm3
will degrade significantly even on an insert-only workload. It should
degrade to the performance level of B-tree once you insert enough
data. Try something like number_of_merges =
number_of_tuples_per_index_page * 2 and you should see this
degradation. A real LSM doesn't degrade that way.
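The degradation described in point 3 can be illustrated with a toy model (all numbers below are hypothetical, not taken from lsm3): with a fixed-size top index flushed into an ever-growing base index, the average number of top-index tuples landing on each base leaf page shrinks as the base grows, until a merge touches roughly one page per tuple.

```python
# Toy model: average top-index tuples merged into each base leaf page.
# TUPLES_PER_PAGE and TOP_INDEX_TUPLES are assumed, illustrative values.
TUPLES_PER_PAGE = 200        # assumed nbtree leaf page capacity
TOP_INDEX_TUPLES = 10_000    # top index is flushed at this size

def tuples_per_base_page(base_tuples):
    """Merged tuples per base leaf page, assuming uniformly random keys."""
    base_pages = max(1, base_tuples // TUPLES_PER_PAGE)
    return TOP_INDEX_TUPLES / base_pages

for base in (10_000, 1_000_000, 100_000_000):
    # As the base grows, the ratio drops below 1: almost every merged
    # tuple then costs its own random page read/write.
    print(base, tuples_per_base_page(base))
```

With these numbers the ratio falls from 200 tuples per page to 0.02, which is the point where merge cost per tuple approaches that of a plain B-tree insert.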
I mostly agree with your criticism.
My Lsm3 is not a true LSM, but from my point of view it preserves the basic
principles of LSM: a fast and small top index and bulk updates of the main index.
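The principle can be sketched in a few lines (this is a simplified model, not the actual Lsm3 C code; the threshold and in-memory lists are assumptions for illustration): inserts land in a small sorted "top" level, and once it reaches a threshold it is merged into the large "base" level in a single ordered pass.

```python
import bisect

class TwoLevelIndex:
    """Minimal sketch of the two-level idea behind Lsm3."""

    def __init__(self, top_limit=4):
        self.top = []            # small, fast level absorbing new inserts
        self.base = []           # large level, updated only in bulk
        self.top_limit = top_limit

    def insert(self, key):
        bisect.insort(self.top, key)
        if len(self.top) >= self.top_limit:
            self._merge()

    def _merge(self):
        # Bulk update: one ordered pass over both levels, then the
        # top level is emptied (analogous to truncating the top index).
        merged, i, j = [], 0, 0
        while i < len(self.top) and j < len(self.base):
            if self.top[i] <= self.base[j]:
                merged.append(self.top[i]); i += 1
            else:
                merged.append(self.base[j]); j += 1
        merged.extend(self.top[i:])
        merged.extend(self.base[j:])
        self.base, self.top = merged, []

    def lookup(self, key):
        # A search must consult both levels.
        if key in self.top:
            return True
        i = bisect.bisect_left(self.base, key)
        return i < len(self.base) and self.base[i] == key
```

A lookup checks the small top level first and then binary-searches the base level, mirroring how a query against Lsm3 has to probe both indexes.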
My experiments with RocksDB show that degradation also takes place in
that case. More experiments are needed to compare the two approaches.
Concerning degradation of the base index: B-Tree is itself a balanced tree.
Yes, insertion of random keys can cause splits of B-Tree pages.
In the worst case half of each B-Tree page will be empty, so the B-Tree will
be two times larger than the ideal tree.
This may cause a slowdown of up to two times, but that is all: there should
not be unbounded degradation of speed tending to zero.
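A back-of-envelope check of that bound (page capacity is a hypothetical number): if a split leaves each leaf at least half full, the page count, and hence scan cost, is at most a constant factor of two above the fully packed tree.

```python
# Worst-case nbtree bloat bound: each leaf stays >= half full after a
# split, so the tree is at most 2x the size of a fully packed one.
PAGE_CAPACITY = 200            # assumed tuples per fully packed leaf
MIN_FILL_AFTER_SPLIT = 0.5     # each page keeps at least half its tuples

def page_counts(n_tuples):
    """Return (ideal_pages, worst_case_pages) for n_tuples keys."""
    ideal = -(-n_tuples // PAGE_CAPACITY)   # ceiling division
    worst = -(-n_tuples // int(PAGE_CAPACITY * MIN_FILL_AFTER_SPLIT))
    return ideal, worst

ideal, worst = page_counts(1_000_000)
print(worst / ideal)   # bounded constant, does not tend to infinity
```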
Right now vacuum processes Lsm3 indexes in the usual way.
Removing records from the top indexes may not be needed at all (just because
these indexes will be truncated in any case).
But as the size of the top index is expected to be small enough,
vacuuming it should not take a long time,
so I didn't try to avoid it (although it should not be difficult - just
disable ambulkdelete for the corresponding nbtree wrappers).
It doesn't seem important, but I don't get your point here. Postgres
expects ambulkdelete to delete TIDs from the index. If you don't delete
them from the top index, those TIDs will be merged into the base index. And
that could lead to wrong query answers unless you eliminate those TIDs in
a different way (during the merge stage or something).
Yes, you are right. It is not possible to avoid deleting TIDs from the top
indexes.
Concerning deletes from the main index - I do not understand how they can be
optimized.
This is a trick you can learn from almost every LSM implementation.
For instance, check the leveldb docs [1] about the "delete marker". For
sure, that requires some redesign of vacuum and can't be done in an
extension (at least not in a reasonable way). But, frankly speaking, I
think core modifications are inevitable to utilize the power of LSM in
PostgreSQL.
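The delete-marker idea can be sketched like this (a simplified model of the leveldb-style tombstone, not PostgreSQL code; the operation names are assumptions): a delete is recorded as a tombstone in the small top level, and the expensive removal from the big base level happens only later, during the bulk merge, when the tombstone cancels the matching key.

```python
# Sketch of LSM "delete markers" (tombstones): deletes are cheap records
# in the top level; the base level is only cleaned during a merge.
PUT, DEL = "put", "del"

def merge(base, top_ops):
    """base: sorted list of keys; top_ops: {key: PUT or DEL} (latest op wins)."""
    result = []
    for key in sorted(set(base) | set(top_ops)):
        op = top_ops.get(key)
        if op == DEL:
            continue            # tombstone: key is dropped during the merge
        result.append(key)      # untouched base key, or a new PUT
    return result

print(merge([1, 2, 3], {2: DEL, 4: PUT}))  # → [1, 3, 4]
```

Applied to the thread's question: instead of vacuum deleting each TID from the main index one by one, deletions would ride along with the next bulk merge, which is why it would need vacuum redesign in core.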
The main idea of Lsm3 was to investigate whether it is possible to
achieve the same result as with a "classical" LSM
using standard Postgres nbtree indexes. Right now it seems to me that the
answer is positive, but I have not performed
exhaustive measurements. For example, I have not measured vacuum overhead
(it was enabled, so vacuuming took place
in my benchmark, but I have not tried to separate its overhead and
influence on performance), index search speed, ...
Links
1. https://github.com/google/leveldb/blob/master/doc/impl.md
------
Regards,
Alexander Korotkov