Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Dmitry Pavlov Fri, 23 Mar 2018 02:49:53 -0700

Hi Ivan,

IMO we have to add extra FSYNCS for BACKGROUND WAL. Agree?


Sincerely,
Dmitriy Pavlov

пт, 23 мар. 2018 г. в 12:23, Ivan Rakov <ivan.glu...@gmail.com>:

> Igniters, there's another important question about this matter.
> Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that we
> have to do it: it will cause similar performance drop, but if we
> consider LOG_ONLY broken without these fixes, BACKGROUND is broken as well.
>
> Best Regards,
> Ivan Rakov
>
> On 23.03.2018 10:27, Ivan Rakov wrote:
> > Fixes are quite simple.
> > I expect them to be merged in master in a week in worst case.
> >
> > Best Regards,
> > Ivan Rakov
> >
> > On 22.03.2018 17:49, Denis Magda wrote:
> >> Ivan,
> >>
> >> How quick are you going to merge the fix into the master? Many
> >> persistence
> >> related optimizations have already stacked up. Probably, we can release
> >> them sooner if the community agrees.
> >>
> >> --
> >> Denis
> >>
> >> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glu...@gmail.com>
> >> wrote:
> >>
> >>> Thanks all!
> >>> We seem to have reached a consensus on this issue. I'll just add
> >>> necessary
> >>> fsyncs under IGNITE-7754.
> >>>
> >>> Best Regards,
> >>> Ivan Rakov
> >>>
> >>>
> >>> On 22.03.2018 15:13, Ilya Lantukh wrote:
> >>>
> >>>> +1 for fixing LOG_ONLY. If current implementation doesn't protect from
> >>>> data
> >>>> corruption, it doesn't make sence.
> >>>>
> >>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dma...@apache.org>
> >>>> wrote:
> >>>>
> >>>> +1 for the fix of LOG_ONLY
> >>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <
> >>>>> alexey.goncha...@gmail.com> wrote:
> >>>>>
> >>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the
> >>>>> provided
> >>>>>> performance results.
> >>>>>>
> >>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >>>>>>
> >>>>>> +1 for accepting drop in LOG_ONLY. 7% is not that much and not a
> >>>>>> drop
> >>>>>> at
> >>>>>> all, provided that we fixing a bug. I.e. should we implement it
> >>>>>> correctly
> >>>>>> in the first place we would never notice any "drop".
> >>>>>>> I do not understand why someone would like to use current broken
> >>>>>>> mode.
> >>>>>>>
> >>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov
> >>>>>>> <dpavlov....@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi, I think option 1 is better. As Val said any mode that allows
> >>>>>>> corruption
> >>>>>>>
> >>>>>>>> does not make much sense.
> >>>>>>>>
> >>>>>>>> What Ivan mentioned here as drop, in relation to old mode DEFAULT
> >>>>>>>>
> >>>>>>> (FSYNC
> >>>>>>> now), is still significant perfromance boost.
> >>>>>>>> Sincerely,
> >>>>>>>> Dmitriy Pavlov
> >>>>>>>>
> >>>>>>>> ср, 21 мар. 2018 г. в 17:56, Ivan Rakov <ivan.glu...@gmail.com>:
> >>>>>>>>
> >>>>>>>> I've attached benchmark results to the JIRA ticket.
> >>>>>>>>> We observe ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of
> >>>>>>>>>
> >>>>>>>> WAL
> >>>>>> compaction enabled flag. It's pretty significant drop: WAL
> >>>>>>>> compaction
> >>>>>> itself gives only ~3% drop.
> >>>>>>>>> I see two options here:
> >>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
> >>>>>>>>>
> >>>>>>>> release
> >>>>>>>> AI 2.5 with 7% drop.
> >>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it default, add release note
> >>>>>>>>> to AI
> >>>>>>>>>
> >>>>>>>> 2.5
> >>>>>>> that we added power loss durability in default mode, but user may
> >>>>>>>>> fallback to previous LOG_ONLY in order to retain performance.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> Ivan Rakov
> >>>>>>>>>
> >>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
> >>>>>>>>>
> >>>>>>>>>> Val,
> >>>>>>>>>>
> >>>>>>>>>> If a storage is in
> >>>>>>>>>>> corrupted state, does it mean that it needs to be completely
> >>>>>>>>>>>
> >>>>>>>>>> removed
> >>>>>>> and
> >>>>>>>>> cluster needs to be restarted without data?
> >>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be
> >>>>>>>>>>
> >>>>>>>>> lost,
> >>>>>> but only in *power loss**/ OS crash* case.
> >>>>>>>>>> kill -9, JVM crash, death of critical system thread and all
> >>>>>>>>>> other
> >>>>>>>>>> cases that usually take place are variations of *process crash*.
> >>>>>>>>>>
> >>>>>>>>> All
> >>>>>>> WAL modes (except NONE, of course) ensure corruption-safety in
> >>>>>>>>> case
> >>>>>> of
> >>>>>>>> process crash.
> >>>>>>>>>> If so, I'm not sure any mode
> >>>>>>>>>>> that allows corruption makes much sense to me.
> >>>>>>>>>>>
> >>>>>>>>>> It depends on performance impact of enforcing power-loss
> >>>>>>>>>>
> >>>>>>>>> corruption
> >>>>>> safety. Price of full protection from power loss is high - FSYNC
> >>>>>>>>> is
> >>>>>> way slower (2-10 times) than other WAL modes. The question is
> >>>>>>>>> whether
> >>>>>>> ensuring weaker guarantees (corruption can't happen, but loss of
> >>>>>>>>> last
> >>>>>>> updates can) will affect performance as badly as strong
> >>>>>>>>> guarantees.
> >>>>>> I'll share benchmark results soon.
> >>>>>>>>>> Best Regards,
> >>>>>>>>>> Ivan Rakov
> >>>>>>>>>>
> >>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Guys,
> >>>>>>>>>>>
> >>>>>>>>>>> What do we understand under "data corruption" here? If a
> >>>>>>>>>>> storage
> >>>>>>>>>>>
> >>>>>>>>>> is
> >>>>>>> in
> >>>>>>>
> >>>>>>>> corrupted state, does it mean that it needs to be completely
> >>>>>>>>>> removed
> >>>>>>> and
> >>>>>>>>> cluster needs to be restarted without data? If so, I'm not sure
> >>>>>>>>>> any
> >>>>>>> mode
> >>>>>>>>> that allows corruption makes much sense to me. How am I supposed
> >>>>>>>>>> to
> >>>>>>> use a
> >>>>>>>>>>> database, if virtually any failure can end with complete
> >>>>>>>>>>> loss of
> >>>>>>>>>>>
> >>>>>>>>>> data?
> >>>>>>>> In any case, this definitely should not be a default behavior.
> >>>>>>>>>> If
> >>>>>> user ever
> >>>>>>>>>>> switches to corruption-unsafe mode, there should be a clear
> >>>>>>>>>>>
> >>>>>>>>>> warning
> >>>>>>> about
> >>>>>>>>>>> this.
> >>>>>>>>>>>
> >>>>>>>>>>> -Val
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <
> >>>>>>>>>>>
> >>>>>>>>>> ivan.glu...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>>> Ticket to track changes:
> >>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best Regards,
> >>>>>>>>>>>> Ivan Rakov
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <
> >>>>>>>>>>>> ivan.glu...@gmail.com
> >>>>>>>> wrote:
> >>>>>>>>>>>>> Vladimir,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees
> >>>>>>>>>>>>>> unless power
> >>>>>>>>>>>>>> loss has happened.
> >>>>>>>>>>>>>> Seems like we need to measure performance difference to
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> decide
> >>>>>> whether do
> >>>>>>>>>>>>>> we need separate WAL mode. If it will be invisible, we'll
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> just
> >>>>>> fix
> >>>>>>>> these
> >>>>>>>>>>>>>> bugs without introducing new mode; if it will be
> >>>>>>>>>>>>>> perceptible,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> we'll
> >>>>>>>> continue the discussion about introducing LOG_ONLY_SAFE.
> >>>>>>>>>>>>>> Makes sense?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Yes, this sounds like the right approach.
> >>>>>>>>>>>>>>
> >>>>
> >
>
>

Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC

Reply via email to