On 09.08.2020 04:53, Alexander Korotkov wrote:

I realize that it is not true LSM.
But still I want to note that it is able to provide a ~10 times increase
in insert speed when the size of the index is comparable with the RAM size.
And "true LSM" from RocksDB shows similar results.
It's very far from being shown.  All you have shown so far is a
naive benchmark.  I don't object that your design can work out in some
cases.  And it's great that we have the lsm3 extension now.  But I
think for PostgreSQL core we should think about a better design.

Sorry, I meant that on this particular benchmark and hardware Lsm3 and RocksDB show similar performance.
It definitely doesn't mean that this will be true in all other cases.
This is one of the reasons why I have published the Lsm3 and RocksDB FDW extensions:
anybody can try to test them on their own workload.
It will be very interesting to me to learn the results, because I certainly understand that measuring random insert performance on a dummy table is not enough to draw any conclusions.
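
To illustrate what I mean by such a dummy benchmark, here is a minimal sketch of it (assuming the lsm3 extension is installed and registers an index access method named "lsm3"; the table layout and row counts are arbitrary):

CREATE EXTENSION IF NOT EXISTS lsm3;

-- dummy table with a single indexed key column
CREATE TABLE t(k bigint, v bigint);
CREATE INDEX t_k_lsm3_idx ON t USING lsm3(k);

-- insert keys in random order, so that index pages are touched randomly;
-- the interesting case is when the index grows comparable with or larger than RAM
INSERT INTO t
SELECT (random()*1000000000)::bigint, generate_series(1, 100000000);

The same table with a plain nbtree index (or the equivalent load through the RocksDB FDW) gives a baseline to compare against; concurrent clients and a mixed read/write load would of course be needed for any serious comparison.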

And I certainly do not want to say that we do not need a "right" LSM implementation inside the Postgres core.
It just requires an order of magnitude more effort.
And there are many questions and challenges. For example, the Postgres buffer size (8kB) seems to be too small for LSM. Should an LSM implementation bypass the Postgres buffer cache? There are pros and cons...

Another issue is logging. Should we just log all operations with LSM in WAL in the usual way (as it is done for nbtree and Lsm3)? It seems to me that for LSM alternative and more efficient solutions may be proposed. For example, we may not log inserts into the top index at all and just replay them during recovery, assuming that this operation on a small index is fast enough. And the merge of the top index with the base index can be done atomically, so it also doesn't require WAL.
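
Just to show where the logging overhead comes from, here is a simple psql sketch (using the dummy table from the example above) that measures how much WAL a batch of inserts produces with the usual logging; it does not implement the alternative scheme, it only measures the cost it is supposed to reduce:

-- remember current WAL position
SELECT pg_current_wal_lsn() AS lsn_before \gset

-- batch of random inserts into the indexed table
INSERT INTO t
SELECT (random()*1000000000)::bigint, generate_series(1, 1000000);

-- amount of WAL (in bytes) produced by this batch
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), :'lsn_before') AS wal_bytes;

Comparing this number for a table with an nbtree index, with an Lsm3 index, and with no index at all gives a rough idea of how much WAL traffic the top index inserts are responsible for.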

As far as I know, Anastasia Lubennikova implemented an LSM for Postgres several years ago.
There were some performance issues (with concurrent access?).
This is why the first thing I want to clarify for myself is what the bottlenecks of the LSM architecture are, and whether they are caused by LSM itself or by its integration into the Postgres infrastructure.

In any case, before thinking about the details of an in-core LSM implementation for Postgres, I think it is necessary to demonstrate workloads at which RocksDB (or any other existing DBMS with LSM) shows significant performance advantages compared with Postgres with nbtree/Lsm3.

Maybe if the size of the index is 100 times larger than
the size of RAM, RocksDB will be significantly faster than Lsm3. But modern
servers have 0.5-1Tb of RAM.
I can't believe that there are databases with 100Tb indexes.
Comparison of the whole RAM size to a single index size looks plain wrong
to me.  I think we can roughly compare the whole RAM size to the whole
database size.  But also, not the whole RAM is always available
for caching data.  Let's assume half of RAM is used for caching data.
So, a modern server with 0.5-1Tb of RAM, which suffers from random
B-tree insertions and badly needs an LSM-like data structure, runs a
database of 25-50Tb.  Frankly speaking, there is nothing
counterintuitive in that for me.

There is actually nothing counterintuitive here.
I just mean that there are not so many 25-50Tb OLTP databases.


