On 09.08.2020 04:53, Alexander Korotkov wrote:
I realize that it is not a true LSM.
But I still want to note that it can provide a ~10 times increase
in insert speed when the size of the index is comparable with RAM size.
And a "true LSM" from RocksDB shows similar results.
It's very far from being shown. All you've shown is a
naive benchmark. I don't object that your design can work for some
cases. And it's great that we have the lsm3 extension now. But I
think for PostgreSQL core we should think about a better design.
Sorry, I meant that on this particular benchmark and hardware Lsm3 and
RocksDB show similar performance.
It definitely doesn't mean that this will be true in all other cases.
This is one of the reasons why I have published the Lsm3 and RocksDB FDW
extensions:
anybody can test them on their own workload.
I would be very interested to know the results, because I certainly
understand that measuring random insert performance on a dummy table is
not enough to draw any conclusions.
And I certainly do not want to say that we do not need a "right" LSM
implementation inside Postgres core.
It just requires an order of magnitude more effort.
And there are many questions and challenges. For example, the Postgres
buffer size (8kb) seems to be too small for LSM.
Should an LSM implementation bypass the Postgres buffer cache? There are
pros and cons...
Another issue is logging. Should we just log all LSM operations in
WAL in the usual way (as is done for nbtree and Lsm3)?
It seems to me that alternative and more efficient solutions may
be proposed for LSM.
For example, we may not log inserts into the top index at all and just
replay them during recovery, assuming that this operation on a
small index is fast enough. And the merge of the top index with the base
index can be done atomically, so it also doesn't require WAL.
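To make the top index / base index idea concrete, here is a toy sketch in Python. It is only an illustration of the general scheme, not the Lsm3 code: the class name, the top_limit parameter and the use of sorted lists in place of B-trees are all assumptions made for the example, and it models neither WAL nor recovery.

```python
import bisect
import heapq


class TwoLevelIndex:
    """Toy model: a small 'top' level absorbs inserts and is
    periodically merged into a large sorted 'base' level."""

    def __init__(self, top_limit=4):
        self.top_limit = top_limit  # hypothetical size threshold for the top level
        self.top = []               # small sorted list (stand-in for the top index)
        self.base = []              # large sorted list (stand-in for the base index)
        self.merges = 0

    def insert(self, key):
        # Inserts only ever touch the small top level.
        bisect.insort(self.top, key)
        if len(self.top) >= self.top_limit:
            self.merge()

    def merge(self):
        # One sequential pass over both sorted levels; the result replaces
        # the base in a single assignment (stand-in for an atomic switch).
        self.base = list(heapq.merge(self.base, self.top))
        self.top = []
        self.merges += 1

    def contains(self, key):
        # A lookup has to check both levels.
        for level in (self.top, self.base):
            pos = bisect.bisect_left(level, key)
            if pos < len(level) and level[pos] == key:
                return True
        return False
```

The point of the sketch is that random inserts stay inside a structure small enough to be cheap, while the large structure is only rewritten sequentially during a merge.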
As far as I know, Anastasia Lubennikova implemented LSM for Postgres
several years ago.
There were some performance issues (with concurrent access?).
This is why the first thing I want to clarify for myself is what the
bottlenecks of the LSM architecture are,
and whether they are caused by LSM itself or by its integration into the
Postgres infrastructure.
In any case, before thinking about the details of an in-core LSM
implementation for Postgres, I think
it is necessary to demonstrate workloads on which RocksDB (or any other
existing DBMS with LSM)
shows significant performance advantages compared with Postgres with
nbtree/Lsm3.
Maybe if the size of the index is 100 times larger than the
size of RAM, RocksDB will be significantly faster than Lsm3. But modern
servers have 0.5-1Tb of RAM.
I can't believe that there are databases with 100Tb indexes.
Comparing the whole RAM size to a single index size looks plain wrong
to me. I think we can roughly compare the whole RAM size to the whole
database size. But the whole RAM is also not always available
for caching data. Let's assume half of RAM is used for caching data.
So, a modern server with 0.5-1Tb of RAM, which suffers from random
B-tree insertions and badly needs an LSM-like data structure, runs a
database of 25-50Tb. Frankly speaking, there is nothing
counterintuitive in that for me.
There is actually nothing counterintuitive.
I just mean that there are not so many 25-50Tb OLTP databases.
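
For reference, the 25-50Tb figure can be reproduced with a few lines of Python. This is just the arithmetic from the estimate above, assuming that half of RAM is used as cache and that the "100 times larger" ratio is applied to the cache size; the function and parameter names are made up for the example.

```python
def database_size_tb(ram_tb, cache_fraction=0.5, data_to_cache_ratio=100):
    """Database size (Tb) at which data exceeds the cache by the given ratio."""
    cache_tb = ram_tb * cache_fraction      # part of RAM assumed to cache data
    return cache_tb * data_to_cache_ratio   # data is assumed 100x the cache


for ram_tb in (0.5, 1.0):
    print(ram_tb, "Tb RAM ->", database_size_tb(ram_tb), "Tb database")
```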