Ivan Bessonov created IGNITE-18113:
--------------------------------------
Summary: Implement a rows queue for GC in MV storages
Key: IGNITE-18113
URL: https://issues.apache.org/jira/browse/IGNITE-18113
Project: Ignite
Issue Type: Improvement
Reporter: Ivan Bessonov
Please refer to the Epic for the prerequisites. Please also refer to
IGNITE-18031 and IGNITE-18033 for some thoughts on the topic.
IGNITE-18031 is proposed for implementation on the assumption that it's
easy to implement, so that tests can then run without fear of disk overflow.
But this may not be the case, and there's a chance that it should actually be
closed in favor of the current issue. It depends on how much time we're willing
to spend on the implementation right now and how much simpler the
implementation of the naive approach actually is. We'll evaluate it later.
h3. Algorithm description
First of all, why is the naive approach bad? The reason is simple: it requires
constant reads from the device. This leads to both unnecessary disk operations
and unnecessary page buffer cache invalidation, regardless of the engine.
All engines have a limited amount of memory to work with. Generally speaking,
it's significantly less than the amount of data on disk.
Another approach is to have a queue. As already stated, it's very similar to
the way TTL is implemented in Ignite 2.x. Every time a row is committed, we
add a {{(commitTs, rowId)}} pair to the queue. Every time we want to evict
something, we just iterate over all pairs up to a certain timestamp and delete
these entries from both the queue and the row storage.
* Pros: faster deletion, better utilization of the resources
* Cons: slower insertions
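The queue-based approach above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the actual storage API: the class {{GcQueueSketch}} and its methods are hypothetical, timestamps are plain {{long}} values, row ids are strings, the row storage is an in-memory map, the eviction bound is treated as inclusive, and everything runs in a single thread.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical single-threaded sketch of a GC queue; not the real Ignite storage API. */
class GcQueueSketch {
    /** A (commitTs, rowId) pair as described in the issue. */
    static final class Entry {
        final long commitTs;
        final String rowId;

        Entry(long commitTs, String rowId) {
            this.commitTs = commitTs;
            this.rowId = rowId;
        }
    }

    /** Queue ordered by commit timestamp; row id breaks ties between equal timestamps. */
    private final TreeSet<Entry> queue = new TreeSet<>(
            Comparator.comparingLong((Entry e) -> e.commitTs)
                    .thenComparing((Entry e) -> e.rowId));

    /** Stand-in for the row storage: rowId -> (commitTs -> value). */
    private final Map<String, Map<Long, String>> rows = new HashMap<>();

    /** On commit: write the row version and add one entry to the queue. */
    void onCommit(long commitTs, String rowId, String value) {
        rows.computeIfAbsent(rowId, k -> new HashMap<>()).put(commitTs, value);
        // With monotonically growing commit timestamps this lands at the tail of the queue.
        queue.add(new Entry(commitTs, rowId));
    }

    /** Removes all entries with commitTs <= upTo from the queue and the row storage. */
    int evictUpTo(long upTo) {
        int evicted = 0;
        while (!queue.isEmpty() && queue.first().commitTs <= upTo) {
            Entry e = queue.pollFirst();
            Map<Long, String> versions = rows.get(e.rowId);
            if (versions != null) {
                versions.remove(e.commitTs);
                if (versions.isEmpty()) {
                    rows.remove(e.rowId);
                }
            }
            evicted++;
        }
        return evicted;
    }

    /** Number of entries still waiting in the queue. */
    int queueSize() {
        return queue.size();
    }

    /** Number of stored versions for the given row. */
    int versionCount(String rowId) {
        Map<Long, String> versions = rows.get(rowId);
        return versions == null ? 0 : versions.size();
    }
}
```

Eviction only touches the head of the ordered queue, so it never scans rows that are not yet eligible for removal. The price is one extra ordered insert per commit.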
Now, why aren't slow insertions a bad thing? For me personally this is a tricky
question to answer. The way I see it, writes are inherently slow and there's not
much we can do about it.
Meta information must be updated, indexes must be updated, current outgoing
rebalance snapshots must be taken into account. There's even a chance of write
intent resolution taking place, which is weird but we allow it. It's not
necessarily instantaneous.
Having a single additional insert, which is almost always an append (which may
in theory decrease its run time, but that's kinda hard to measure), won't harm.
Especially considering that we're going to perform partial query cleanup in the
same thread later (but after completing raft done closure, I guess, we
shouldn't block the leader from returning the result to the client).