Dmitry,

Thanks for the clarification. So it sounds like if we fix all the other modes as discussed here, NONE would be the only one that allows corruption. I also don't see much sense in it, and I think we should state this clearly in the docs, as well as print a warning if NONE mode is used. Eventually, if it's confirmed that there are no reasonable use cases for it, we can deprecate it.
-Val

On Fri, Mar 23, 2018 at 3:26 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:

> Hi Val,
>
> NONE means that the WAL is disabled and not written at all. Use of this
> mode is at your own risk. It is possible that restoring state after a crash
> in the middle of a checkpoint will not succeed. I do not see much sense in
> it, especially in production.
>
> BACKGROUND is a fully functional WAL mode, but it allows some delay before
> the flush to disk.
>
> Sincerely,
> Dmitriy Pavlov
>
> Sat, Mar 24, 2018 at 1:07, Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>
> > I agree. In my view, any possibility of getting a corrupted storage is a
> > bug that needs to be fixed.
> >
> > BTW, can someone explain the semantics of NONE mode? What is the difference
> > from BACKGROUND from the user's perspective? Is there any particular use
> > case where it can be used?
> >
> > -Val
> >
> > On Fri, Mar 23, 2018 at 2:49 AM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
> >
> > > Hi Ivan,
> > >
> > > IMO we have to add extra FSYNCS for BACKGROUND WAL. Agree?
> > >
> > > Sincerely,
> > > Dmitriy Pavlov
> > >
> > > Fri, Mar 23, 2018 at 12:23, Ivan Rakov <ivan.glu...@gmail.com>:
> > >
> > > > Igniters, there's another important question about this matter.
> > > > Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that
> > > > we have to do it: it will cause a similar performance drop, but if we
> > > > consider LOG_ONLY broken without these fixes, BACKGROUND is broken as
> > > > well.
> > > >
> > > > Best Regards,
> > > > Ivan Rakov
> > > >
> > > > On 23.03.2018 10:27, Ivan Rakov wrote:
> > > > > Fixes are quite simple.
> > > > > I expect them to be merged into master within a week in the worst case.
> > > > >
> > > > > Best Regards,
> > > > > Ivan Rakov
> > > > >
> > > > > On 22.03.2018 17:49, Denis Magda wrote:
> > > > >> Ivan,
> > > > >>
> > > > >> How quickly are you going to merge the fix into master?
> > > > >> Many persistence-related optimizations have already stacked up.
> > > > >> Probably, we can release them sooner if the community agrees.
> > > > >>
> > > > >> --
> > > > >> Denis
> > > > >>
> > > > >> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > > > >>
> > > > >>> Thanks all!
> > > > >>> We seem to have reached a consensus on this issue. I'll just add the
> > > > >>> necessary fsyncs under IGNITE-7754.
> > > > >>>
> > > > >>> Best Regards,
> > > > >>> Ivan Rakov
> > > > >>>
> > > > >>> On 22.03.2018 15:13, Ilya Lantukh wrote:
> > > > >>>
> > > > >>>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect
> > > > >>>> from data corruption, it doesn't make sense.
> > > > >>>>
> > > > >>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dma...@apache.org> wrote:
> > > > >>>>
> > > > >>>>> +1 for the fix of LOG_ONLY
> > > > >>>>>
> > > > >>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the
> > > > >>>>>> provided performance results.
> > > > >>>>>>
> > > > >>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> > > > >>>>>>
> > > > >>>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much, and not
> > > > >>>>>>> a drop at all, given that we are fixing a bug. I.e., had we
> > > > >>>>>>> implemented it correctly in the first place, we would never have
> > > > >>>>>>> noticed any "drop".
> > > > >>>>>>> I do not understand why someone would want to use the current
> > > > >>>>>>> broken mode.
> > > > >>>>>>>
> > > > >>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi, I think option 1 is better.
> > > > >>>>>>>> As Val said, any mode that allows corruption does not make much
> > > > >>>>>>>> sense.
> > > > >>>>>>>>
> > > > >>>>>>>> What Ivan describes here as a drop is still a significant
> > > > >>>>>>>> performance boost relative to the old DEFAULT mode (now FSYNC).
> > > > >>>>>>>>
> > > > >>>>>>>> Sincerely,
> > > > >>>>>>>> Dmitriy Pavlov
> > > > >>>>>>>>
> > > > >>>>>>>> Wed, Mar 21, 2018 at 17:56, Ivan Rakov <ivan.glu...@gmail.com>:
> > > > >>>>>>>>
> > > > >>>>>>>>> I've attached benchmark results to the JIRA ticket.
> > > > >>>>>>>>> We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, independent of
> > > > >>>>>>>>> the WAL compaction enabled flag. It's a pretty significant drop:
> > > > >>>>>>>>> WAL compaction itself gives only a ~3% drop.
> > > > >>>>>>>>> I see two options here:
> > > > >>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
> > > > >>>>>>>>> release AI 2.5 with a 7% drop.
> > > > >>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it the default, and add a release
> > > > >>>>>>>>> note to AI 2.5 that we added power loss durability in the default
> > > > >>>>>>>>> mode, but the user may fall back to the previous LOG_ONLY in order
> > > > >>>>>>>>> to retain performance.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thoughts?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Best Regards,
> > > > >>>>>>>>> Ivan Rakov
> > > > >>>>>>>>>
> > > > >>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Val,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> If a storage is in a corrupted state, does it mean that it needs
> > > > >>>>>>>>>>> to be completely removed and the cluster needs to be restarted
> > > > >>>>>>>>>>> without data?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be
> > > > >>>>>>>>>> lost, but only in the *power loss / OS crash* case.
> > > > >>>>>>>>>> kill -9, a JVM crash, the death of a critical system thread, and
> > > > >>>>>>>>>> all other cases that usually take place are variations of
> > > > >>>>>>>>>> *process crash*. All WAL modes (except NONE, of course) ensure
> > > > >>>>>>>>>> corruption safety in case of a process crash.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> If so, I'm not sure any mode
> > > > >>>>>>>>>>> that allows corruption makes much sense to me.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> It depends on the performance impact of enforcing power-loss
> > > > >>>>>>>>>> corruption safety. The price of full protection from power loss
> > > > >>>>>>>>>> is high: FSYNC is way slower (2-10 times) than the other WAL
> > > > >>>>>>>>>> modes. The question is whether ensuring weaker guarantees
> > > > >>>>>>>>>> (corruption can't happen, but loss of the last updates can) will
> > > > >>>>>>>>>> affect performance as badly as strong guarantees.
> > > > >>>>>>>>>> I'll share benchmark results soon.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best Regards,
> > > > >>>>>>>>>> Ivan Rakov
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Guys,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> What do we understand by "data corruption" here? If a storage is
> > > > >>>>>>>>>>> in a corrupted state, does it mean that it needs to be
> > > > >>>>>>>>>>> completely removed and the cluster needs to be restarted without
> > > > >>>>>>>>>>> data?
> > > > >>>>>>>>>>> If so, I'm not sure any mode that allows corruption makes much
> > > > >>>>>>>>>>> sense to me. How am I supposed to use a database if virtually
> > > > >>>>>>>>>>> any failure can end with complete loss of data?
> > > > >>>>>>>>>>> In any case, this definitely should not be the default behavior.
> > > > >>>>>>>>>>> If a user ever switches to a corruption-unsafe mode, there
> > > > >>>>>>>>>>> should be a clear warning about this.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> -Val
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Ticket to track changes:
> > > > >>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Best Regards,
> > > > >>>>>>>>>>>> Ivan Rakov
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Vladimir,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write guarantees
> > > > >>>>>>>>>>>>>> unless power loss has happened.
> > > > >>>>>>>>>>>>>> Seems like we need to measure the performance difference to
> > > > >>>>>>>>>>>>>> decide whether we need a separate WAL mode.
> > > > >>>>>>>>>>>>>> If it is invisible, we'll just fix these bugs without
> > > > >>>>>>>>>>>>>> introducing a new mode; if it is perceptible, we'll continue
> > > > >>>>>>>>>>>>>> the discussion about introducing LOG_ONLY_SAFE.
> > > > >>>>>>>>>>>>>> Makes sense?
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Yes, this sounds like the right approach.
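
For anyone skimming the thread: the difference between the modes discussed here comes down to how far a record has been pushed by the time the write returns. Below is a toy Python sketch of that idea, not Ignite code. `WalMode`, `ToyWal` and the warning text are hypothetical names made up for illustration, and BACKGROUND is simplified to "leave the record in the process buffer until some later flush".

```python
import os
import warnings
from enum import Enum


class WalMode(Enum):
    """Hypothetical stand-ins for the mode names used in this thread."""
    FSYNC = "FSYNC"            # hand each record to the OS, then fsync
    LOG_ONLY = "LOG_ONLY"      # hand each record to the OS, no fsync
    BACKGROUND = "BACKGROUND"  # keep records in the process buffer for a while
    NONE = "NONE"              # no log at all


class ToyWal:
    """Append-only log showing where each mode leaves a record."""

    def __init__(self, path, mode):
        self.mode = mode
        self.file = None
        if mode is WalMode.NONE:
            # The kind of warning suggested in the thread (wording is made up).
            warnings.warn("WAL mode NONE: storage may not be recoverable after a crash")
        else:
            # Large explicit buffer so small records stay in process memory.
            self.file = open(path, "ab", buffering=64 * 1024)

    def append(self, record):
        if self.file is None:
            return                        # NONE: nothing is logged
        self.file.write(record + b"\n")   # record is now in the process buffer
        if self.mode in (WalMode.FSYNC, WalMode.LOG_ONLY):
            self.file.flush()             # record is now in the OS page cache
        if self.mode is WalMode.FSYNC:
            os.fsync(self.file.fileno())  # ask the OS to persist it to the device

    def close(self):
        if self.file is not None:
            self.file.close()             # closing flushes the process buffer
```

A record that has only reached the process buffer is gone after kill -9 or a JVM-style process crash. A record in the OS page cache survives a process crash, but it is not guaranteed to survive power loss or an OS crash. Only the fsync'ed record is expected to survive both, which is why the per-record fsync is the expensive part.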