Hello, Igniters!

Currently, for page replacement (page rotation between page-memory and
disk) we use the Random-LRU algorithm. It has a low maintenance cost and a
relatively simple implementation, but it has many disadvantages and
degrades performance significantly once replacement starts. We even log a
warning when page replacement starts and have a special event for it. I
know of Ignite deployments where administrators are forced to restart
cluster nodes periodically to avoid page replacement.

I have a couple of proposals to improve page replacement in Ignite:

*Batch page replacement.*

Main idea: in some cases, start a background task to evict cold pages from
page-memory (for example, pages last touched more than 12 hours ago).

The task can be started:
- Automatically, triggered by some events, for example, when all of the
following hold: we expect Random-LRU page replacement to start soon (more
than 90% of page-memory is allocated), we have enough cold pages (we need
some metric to calculate the number of cold pages), and some time has
passed since the last batch page replacement (to avoid excessive resource
consumption by background batch replacement).
- Manually (JMX or control.sh), if an administrator wants to control the
time of batch replacement more precisely (for example, to avoid starting
this task during peak time).
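
The automatic trigger described above could be sketched roughly as follows.
This is only an illustration of the three conditions: the class name, the
method, and all parameters are hypothetical and do not exist in Ignite, and
the cold-page count is assumed to come from a metric that still has to be
designed.

```java
import java.time.Duration;

/** Hypothetical trigger check for a background batch page replacement task. */
final class BatchReplacementTrigger {
    // All thresholds are illustrative assumptions, not Ignite configuration properties.
    private final double fillThreshold; // fraction of page-memory allocated, e.g. 0.90
    private final long minColdPages;    // minimum number of cold pages worth evicting
    private final Duration minInterval; // minimum pause between two batch runs

    BatchReplacementTrigger(double fillThreshold, long minColdPages, Duration minInterval) {
        this.fillThreshold = fillThreshold;
        this.minColdPages = minColdPages;
        this.minInterval = minInterval;
    }

    /** Returns true only if all three conditions from the proposal hold. */
    boolean shouldStart(long allocatedPages, long maxPages, long coldPages, Duration sinceLastRun) {
        // Random-LRU replacement is expected to start soon.
        boolean nearlyFull = allocatedPages > maxPages * fillThreshold;
        // There is enough cold data to make a batch run worthwhile.
        boolean enoughCold = coldPages >= minColdPages;
        // The previous batch run was long enough ago.
        boolean cooledDown = sinceLastRun.compareTo(minInterval) >= 0;

        return nearlyFull && enoughCold && cooledDown;
    }
}
```

A manual start (JMX or control.sh) would simply bypass this check.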

Batch page replacement will be helpful in some workloads (when some data is
much colder than the rest). It can prevent Random-LRU page replacement from
starting, or, if Random-LRU has already started, it can create the
conditions to stop it.

*Change the page replacement algorithm.*

A good page replacement algorithm should satisfy the following
requirements:
- low page-fault rate for typical workloads
- low maintenance cost (low resource consumption to maintain additional
structures required for page replacement)
- fast search for the next page to replace
- sequential scan resistance (one sequential scan should not evict all
relatively hot pages from page-memory)

Our Random-LRU has a low maintenance cost and is sequential scan resistant,
but to find the next page for replacement we scan 5 pages in the best case,
and in the worst case we can scan the whole data region segment. Also, due
to its random nature, it is not very effective at predicting the right page
for replacement to minimize the page-fault rate. It also takes a long time
to fully evict old cold data.
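
To make the search cost concrete, here is a much simplified sketch of the
sampling idea. It is not the actual Ignite code: the class, the arrays of
timestamps and pin flags, and the fallback to a full scan are assumptions
made only to illustrate the best case (5 probes) and the worst case (a scan
of the whole segment).

```java
import java.util.Random;

/** Simplified illustration of sampling-based victim selection (not Ignite's implementation). */
final class RandomLruSketch {
    /** Number of random probes in the best case, as mentioned above. */
    static final int SAMPLE_SIZE = 5;

    /**
     * Picks a page to replace. Returns the index of the chosen page,
     * or -1 if every page is pinned (hypothetical "cannot be evicted" flag).
     */
    static int pickVictim(long[] lastAccessTs, boolean[] pinned, Random rnd) {
        int victim = -1;

        // Best case: probe SAMPLE_SIZE random pages and keep the oldest evictable one.
        for (int i = 0; i < SAMPLE_SIZE; i++) {
            int idx = rnd.nextInt(lastAccessTs.length);

            if (!pinned[idx] && (victim < 0 || lastAccessTs[idx] < lastAccessTs[victim]))
                victim = idx;
        }

        if (victim >= 0)
            return victim;

        // Worst case: no sampled page was evictable, so scan the whole segment.
        for (int idx = 0; idx < lastAccessTs.length; idx++) {
            if (!pinned[idx] && (victim < 0 || lastAccessTs[idx] < lastAccessTs[victim]))
                victim = idx;
        }

        return victim;
    }
}
```

Note that the page chosen from a random sample is the oldest of the sample,
not necessarily the oldest in the segment, which is why the prediction
quality is limited.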

Usually, database management systems and operating systems use
modifications of the LRU algorithm. These algorithms have higher
maintenance costs (the page list has to be modified on each page access),
but they are often effective from a "page-fault rate" point of view and
have O(1) complexity for finding a page to replace. Simple LRU is not
sequential scan resistant, but modifications that take page access
frequency into account are resistant to sequential scans.

We can try one of the modifications of LRU as well (for example, "segmented
LRU" seems suitable for Ignite).
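
For reference, a minimal sketch of the segmented LRU idea is shown below. It
is a generic textbook-style illustration over page ids, not a proposal for
the actual Ignite data structures; the class and method names are
hypothetical. New pages enter a probationary segment, a second access
promotes a page to a protected segment, and victims are always taken from
the cold end of the probationary segment.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Generic segmented LRU sketch over page ids (illustrative only). */
final class SegmentedLruSketch {
    // Both sets keep insertion order: the first element is the coldest one.
    private final LinkedHashSet<Long> probationary = new LinkedHashSet<>();
    private final LinkedHashSet<Long> protectedSeg = new LinkedHashSet<>();
    private final int capacity;          // total number of pages that fit in memory
    private final int protectedCapacity; // maximum size of the protected segment

    SegmentedLruSketch(int capacity, int protectedCapacity) {
        if (protectedCapacity <= 0 || protectedCapacity >= capacity)
            throw new IllegalArgumentException("Expected 0 < protectedCapacity < capacity");

        this.capacity = capacity;
        this.protectedCapacity = protectedCapacity;
    }

    /** Records a page access. Returns the id of the evicted page, or null if nothing was evicted. */
    Long access(long pageId) {
        Long id = pageId;

        if (protectedSeg.remove(id)) {
            protectedSeg.add(id); // Hit in the protected segment: move to the hot end.
            return null;
        }

        if (probationary.remove(id)) {
            protectedSeg.add(id); // Second access: promote to the protected segment.

            if (protectedSeg.size() > protectedCapacity)
                probationary.add(removeColdest(protectedSeg)); // Demote the coldest protected page.

            return null;
        }

        // Page fault: evict from the probationary segment if memory is full.
        Long evicted = null;

        if (probationary.size() + protectedSeg.size() >= capacity)
            evicted = removeColdest(probationary);

        probationary.add(id);

        return evicted;
    }

    boolean contains(long pageId) {
        Long id = pageId;
        return probationary.contains(id) || protectedSeg.contains(id);
    }

    /** Removes and returns the first (coldest) element, an O(1) operation. */
    private static Long removeColdest(LinkedHashSet<Long> seg) {
        Iterator<Long> it = seg.iterator();
        Long coldest = it.next();
        it.remove();
        return coldest;
    }
}
```

In this sketch a sequential scan touches each page once, so scanned pages
only cycle through the probationary segment and pages accessed at least
twice stay in the protected segment. The price is the list update on every
access, which is exactly the maintenance cost that needs benchmarking.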

Ignite is a memory-centric product, so "low maintenance cost" is critical.
There is also a risk that a page replacement algorithm can affect workloads
where page replacement is not used (there is enough RAM to store all
data). Of course, any page replacement solution should be carefully
benchmarked.


Igniters, WDYT? If any of these proposals looks reasonable to you, I will
create an IEP and start the implementation.

Also, I have a draft implementation of a system view to determine how hot
the pages in page-memory are [1]. I think it will be useful for any of
these approaches (and even if we decide to leave page replacement as is).

[1]: https://issues.apache.org/jira/browse/IGNITE-13726
