Re: Why WAL archives enabled by default?

Ivan Daschinsky Fri, 06 Nov 2020 02:21:49 -0800

Kirill and I discussed the proposed approach privately. As far as I understand,
Kirill suggests implementing a heuristic that forces a checkpoint in some
cases when a user has misconfigured the cluster by mistake, in order to keep
the WAL archive within the requested size.
For now, I find this approach questionable because it can cause
performance problems. But it can be offered as an option,
and it should be switchable.


Fri, Nov 6, 2020 at 12:36, Ivan Daschinsky <[email protected]>:

> Kirill, how will your approach help if the user has tuned the cluster to do
> checkpoints rarely under load?
> It won't.
>
> Fri, Nov 6, 2020 at 12:19, ткаленко кирилл <[email protected]>:
>
>> Ivan, I agree with you that the archive is primarily about optimization.
>>
>> If the size of the archive is critical for the user, we have no
>> protection here: we can always go beyond the limit.
>> Thus, the user needs to remember this and configure around it in some way.
>>
>> I suggest never exceeding this limit, which gives the user the expected
>> behavior. At the same time, the segments needed for recovery will remain and
>> there will be no data loss.
>>
>> 06.11.2020, 11:29, "Ivan Daschinsky" <[email protected]>:
>> > Guys, first of all, archiving is not for PITR at all; it is an optimization.
>> > If we disable archiving, we need to create a new file on every rollover. If we
>> > enable archiving, we reserve 10 (by default) segments filled with zeroes.
>> > We use mmap by default, so with the no-archiver approach:
>> > 1. We first create a new empty file.
>> > 2. We call sun.nio.ch.FileChannelImpl#map on it, which under the hood:
>> > a. If the file is shorter than the WAL segment size,
>> > calls sun.nio.ch.FileDispatcherImpl#truncate0, which under the hood is just
>> > the truncate system call [1].
>> > b. Then calls the mmap system call on this
>> > file via sun.nio.ch.FileChannelImpl#map0; see [2].
>> > These operations are neither free nor cheap, so rollover will be much
>> > slower.
>> > If archiving is enabled, 10 segments are already preallocated when the
>> > node starts.
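
The create-then-map path described above can be reproduced with the public NIO API. This is only a minimal sketch, not Ignite's actual rollover code: the class name, method name, and file names are made up for illustration, and it relies on the behavior described above, namely that mapping a writable channel beyond the current end of the file extends the file first.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of Ignite.
final class MmapRolloverSketch {
    /**
     * Creates the segment file if it does not exist and maps it. When the file
     * is shorter than segmentSize, the map call has to extend it before mapping,
     * which is the extra work a rollover pays when nothing was preallocated.
     *
     * @return file size in bytes after the mapping was created.
     */
    static long mapSegment(Path file, int segmentSize) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, segmentSize);
            buf.put(0, (byte) 1); // touch the mapping so at least one page is really used
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wal-sketch");
        Path fresh = dir.resolve("fresh.wal"); // starts out as an empty file
        System.out.println("fresh file size after map: " + mapSegment(fresh, 1 << 20));
    }
}
```

With a preallocated segment the file already has the full length, so the extension step is skipped and only the mapping itself remains.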
>> >
>> > When archiving is enabled, the archiver just copies the previous preallocated
>> > segment and moves it to the archive directory.
>> > This archived segment is crucial for recovery. When a new checkpoint
>> > finishes, all segments eligible for truncation are simply removed.
>> >
>> > If archiving is disabled, we still write WAL segments to the wal directory, and
>> > disabling archiving doesn't prevent segments from being stored if they are
>> > required for recovery.
>> >
>> >>> Before increasing the size of WAL archive (transferring to archive
>> >>> /rollOver, compression, decompression), we can make sure that there will be
>> >>> enough space in the archive and if there is no such, then we will try to
>> >>> clean it. We cannot delete those segments that are required for recovery
>> >>> (between the last two checkpoints) and reserved for example for historical
>> >>> rebalancing.
>> >
>> > First of all, compression/decompression is off-topic here.
>> > Secondly, only WAL segments with an index higher than the LAST
>> > checkpoint marker are required.
>> > Thirdly, archiving and rollover can happen during a checkpoint, and we can
>> > break everything accidentally.
>> > Fourthly, I see no benefit in overcomplicating already complicated logic.
>> > This is basically a problem of misunderstanding and tuning.
>> > There are a lot of similar topics for almost every DB. [3]
>> >
>> > [1] -- https://man7.org/linux/man-pages/man2/ftruncate.2.html
>> > [2] -- https://man7.org/linux/man-pages/man2/mmap.2.html
>> > [3] -- https://www.google.com/search?q=pg_wal%2Fxlogtemp+no+space+left+on+device&oq=pg+wal+no
>> >
>> > Fri, Nov 6, 2020 at 10:42, ткаленко кирилл <[email protected]>:
>> >
>> >>  Hi, Ivan!
>> >>
>> >>  I have only described ideas. But here are a few more details.
>> >>
>> >>  We can take care not to go beyond
>> >>  DataStorageConfiguration#maxWalArchiveSize.
>> >>
>> >>  Before increasing the size of the WAL archive (moving a segment to the
>> >>  archive/rollover, compression, decompression), we can make sure that there
>> >>  will be enough space in the archive, and if there is not, we will try to
>> >>  clean it. We cannot delete segments that are required for recovery
>> >>  (between the last two checkpoints) or reserved, for example, for historical
>> >>  rebalancing.
>> >>
>> >>  We can receive a notification about checkpoint changes and about the
>> >>  reservation / release of segments, so we can know how many segments we
>> >>  can delete right now.
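
The space check proposed above could look roughly like the sketch below. Everything in it is hypothetical and is not Ignite code: the class, the method, and the parameters are invented for illustration. The archive is modelled as a map from segment index to size in bytes, maxArchiveBytes stands in for DataStorageConfiguration#maxWalArchiveSize, and firstRequiredIdx is assumed to be supplied by whatever tracks checkpoints and reservations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical model of the proposal; names and data layout are assumptions.
final class ArchiveSpaceSketch {
    /**
     * Picks archived segments to delete, oldest first, so that a new segment of
     * incomingBytes fits under maxArchiveBytes. Segments with an index greater
     * than or equal to firstRequiredIdx are never selected.
     *
     * @return indexes to delete, or an empty Optional if the limit cannot be
     *         met without touching required segments.
     */
    static Optional<List<Long>> segmentsToDelete(TreeMap<Long, Long> archive, long maxArchiveBytes,
                                                 long incomingBytes, long firstRequiredIdx) {
        long used = 0;
        for (long size : archive.values())
            used += size;

        List<Long> toDelete = new ArrayList<>();
        // TreeMap iterates in ascending index order, i.e. oldest segment first.
        for (Map.Entry<Long, Long> e : archive.entrySet()) {
            if (used + incomingBytes <= maxArchiveBytes)
                break; // enough room already
            if (e.getKey() >= firstRequiredIdx)
                break; // everything from here on must be kept
            toDelete.add(e.getKey());
            used -= e.getValue();
        }
        return used + incomingBytes <= maxArchiveBytes ? Optional.of(toDelete) : Optional.empty();
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> archive = new TreeMap<>();
        for (long idx = 0; idx < 4; idx++)
            archive.put(idx, 64L); // four archived segments of 64 bytes each
        System.out.println(segmentsToDelete(archive, 256, 64, 2)); // room can be made
        System.out.println(segmentsToDelete(archive, 256, 64, 0)); // nothing may be deleted
    }
}
```

The empty result is the case argued about in this thread: when every archived segment is still required, the limit can only be kept by waiting for, or forcing, a checkpoint.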
>> >>
>> >>  06.11.2020, 09:53, "Ivan Daschinsky" <[email protected]>:
>> >>  >>> For example, when trying to move a segment to the archive.
>> >>  >
>> >>  > We cannot do this; we will lose data. We can truncate an archived segment if
>> >>  > and only if it is not required for recovery. If the last checkpoint marker
>> >>  > points to a segment with a lower index, we cannot delete any segment with a
>> >>  > higher index. So the only moment when we can remove (truncate) segments is
>> >>  > the end of a checkpoint.
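
The rule above can be written down as a small predicate. This is a sketch under stated assumptions, not Ignite's implementation: the names are invented, the segment that holds the checkpoint marker is conservatively treated as required, and the optional reservation (for example for historical rebalancing, as mentioned elsewhere in this thread) is modelled as a single lowest reserved index.

```java
import java.util.OptionalLong;

// Hypothetical helper for illustration only; not part of Ignite.
final class TruncationBoundarySketch {
    /**
     * Returns the lowest segment index that must be kept: the segment the last
     * checkpoint marker points to, or the lowest reserved segment if that is older.
     */
    static long firstRequiredIdx(long lastCheckpointSegIdx, OptionalLong minReservedIdx) {
        return minReservedIdx.isPresent()
            ? Math.min(lastCheckpointSegIdx, minReservedIdx.getAsLong())
            : lastCheckpointSegIdx;
    }

    /** A segment may be removed only if it lies strictly below the boundary. */
    static boolean canDelete(long segIdx, long lastCheckpointSegIdx, OptionalLong minReservedIdx) {
        return segIdx < firstRequiredIdx(lastCheckpointSegIdx, minReservedIdx);
    }

    public static void main(String[] args) {
        // Marker in segment 5, nothing reserved: segment 3 is removable, segment 5 is not.
        System.out.println(canDelete(3, 5, OptionalLong.empty()));
        System.out.println(canDelete(5, 5, OptionalLong.empty()));
        // Segment 2 is reserved, so segment 3 has to stay as well.
        System.out.println(canDelete(3, 5, OptionalLong.of(2)));
    }
}
```

The boundary only moves forward when a checkpoint finishes or a reservation is released, which is why cleanup is tied to those events.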
>> >>  >
>> >>  > Fri, Nov 6, 2020 at 09:46, ткаленко кирилл <[email protected]>:
>> >>  >
>> >>  >> Hello, everybody!
>> >>  >>
>> >>  >> As far as I know, the WAL archive is used for PITR (a GridGain feature) and
>> >>  >> historical rebalancing.
>> >>  >>
>> >>  >> Facundo seems to have a problem with the directory
>> >>  >> (/opt/work/walarchive) running out of space.
>> >>  >> Currently, the WAL archive is cleared at the end of a checkpoint. A long
>> >>  >> transaction may potentially prevent a checkpoint from starting, so the WAL
>> >>  >> archive is not cleaned, which leads to such an error.
>> >>  >> At the moment, the workaround I see is to increase the size of the directory
>> >>  >> (/opt/work/walarchive) in k8s and to avoid long transactions or anything
>> >>  >> similar that modifies data and runs for a long time.
>> >>  >>
>> >>  >> It would be best to fix the logic of working with the WAL archive. I think we
>> >>  >> should remove WAL archive cleanup from the end of the checkpoint and do it
>> >>  >> on demand, for example, when trying to move a segment to the archive.
>> >>  >>
>> >>  >> 06.11.2020, 01:58, "Denis Magda" <[email protected]>:
>> >>  >> > Folks,
>> >>  >> >
>> >>  >> > In my understanding, you need the archives only for features such as
>> >>  >> > PITR. Considering that the PITR functionality is not provided in Ignite,
>> >>  >> > why do we have the archives enabled by default?
>> >>  >> >
>> >>  >> > How about having this feature disabled by default to prevent the
>> >>  >> > following issues experienced by our users:
>> >>  >> >
>> >>  >> > http://apache-ignite-users.70518.x6.nabble.com/WAL-and-WAL-Archive-volume-size-recommendation-td34458.html
>> >>  >> >
>> >>  >> > -
>> >>  >> > Denis
>> >>  >
>> >>  > --
>> >>  > Sincerely yours, Ivan Daschinskiy
>> >
>> > --
>> > Sincerely yours, Ivan Daschinskiy
>>
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


-- 
Sincerely yours, Ivan Daschinskiy
