Hi Ivan,

IMO we have to add extra fsyncs for the BACKGROUND WAL mode. Agree?

Sincerely,
Dmitriy Pavlov

Fri, Mar 23, 2018 at 12:23, Ivan Rakov <ivan.glu...@gmail.com>:

> Igniters, there's another important question about this matter.
> Do we want to add extra FSYNCS for BACKGROUND WAL mode? I think that we
> have to do it: it will cause a similar performance drop, but if we
> consider LOG_ONLY broken without these fixes, BACKGROUND is broken as well.
>
> Best Regards,
> Ivan Rakov
>
> On 23.03.2018 10:27, Ivan Rakov wrote:
>> Fixes are quite simple.
>> I expect them to be merged into master within a week in the worst case.
>>
>> Best Regards,
>> Ivan Rakov
>>
>> On 22.03.2018 17:49, Denis Magda wrote:
>>> Ivan,
>>>
>>> How quickly are you going to merge the fix into master? Many
>>> persistence-related optimizations have already stacked up. Probably,
>>> we can release them sooner if the community agrees.
>>>
>>> --
>>> Denis
>>>
>>> On Thu, Mar 22, 2018 at 5:22 AM, Ivan Rakov <ivan.glu...@gmail.com>
>>> wrote:
>>>> Thanks all!
>>>> We seem to have reached a consensus on this issue. I'll just add the
>>>> necessary fsyncs under IGNITE-7754.
>>>>
>>>> Best Regards,
>>>> Ivan Rakov
>>>>
>>>> On 22.03.2018 15:13, Ilya Lantukh wrote:
>>>>> +1 for fixing LOG_ONLY. If the current implementation doesn't protect
>>>>> from data corruption, it doesn't make sense.
>>>>>
>>>>> On Wed, Mar 21, 2018 at 10:38 PM, Denis Magda <dma...@apache.org>
>>>>> wrote:
>>>>>> +1 for the fix of LOG_ONLY
>>>>>>
>>>>>> On Wed, Mar 21, 2018 at 11:23 AM, Alexey Goncharuk
>>>>>> <alexey.goncha...@gmail.com> wrote:
>>>>>>> +1 for fixing LOG_ONLY to enforce corruption safety given the
>>>>>>> provided performance results.
>>>>>>>
>>>>>>> 2018-03-21 18:20 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>>>>>>> +1 for accepting the drop in LOG_ONLY. 7% is not that much and not
>>>>>>>> a drop at all, provided that we are fixing a bug. I.e. had we
>>>>>>>> implemented it correctly in the first place, we would never have
>>>>>>>> noticed any "drop".
>>>>>>>> I do not understand why someone would like to use the current
>>>>>>>> broken mode.
>>>>>>>>
>>>>>>>> On Wed, Mar 21, 2018 at 6:11 PM, Dmitry Pavlov
>>>>>>>> <dpavlov....@gmail.com> wrote:
>>>>>>>>> Hi, I think option 1 is better. As Val said, any mode that allows
>>>>>>>>> corruption does not make much sense.
>>>>>>>>>
>>>>>>>>> What Ivan mentioned here as a drop is, in relation to the old
>>>>>>>>> mode DEFAULT (FSYNC now), still a significant performance boost.
>>>>>>>>>
>>>>>>>>> Sincerely,
>>>>>>>>> Dmitriy Pavlov
>>>>>>>>>
>>>>>>>>> Wed, Mar 21, 2018 at 17:56, Ivan Rakov <ivan.glu...@gmail.com>:
>>>>>>>>>> I've attached benchmark results to the JIRA ticket.
>>>>>>>>>> We observe a ~7% drop in "fair" LOG_ONLY_SAFE mode, independent
>>>>>>>>>> of the WAL compaction enabled flag. It's a pretty significant
>>>>>>>>>> drop: WAL compaction itself gives only a ~3% drop.
>>>>>>>>>>
>>>>>>>>>> I see two options here:
>>>>>>>>>> 1) Change LOG_ONLY behavior. That implies that we'll be ready to
>>>>>>>>>> release AI 2.5 with a 7% drop.
>>>>>>>>>> 2) Introduce LOG_ONLY_SAFE, make it default, add a release note
>>>>>>>>>> to AI 2.5 that we added power loss durability in default mode,
>>>>>>>>>> but the user may fall back to the previous LOG_ONLY in order to
>>>>>>>>>> retain performance.
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Ivan Rakov
>>>>>>>>>>
>>>>>>>>>> On 20.03.2018 16:00, Ivan Rakov wrote:
>>>>>>>>>>> Val,
>>>>>>>>>>>
>>>>>>>>>>>> If a storage is in corrupted state, does it mean that it
>>>>>>>>>>>> needs to be completely removed and cluster needs to be
>>>>>>>>>>>> restarted without data?
>>>>>>>>>>>
>>>>>>>>>>> Yes, there's a chance that in LOG_ONLY all local data will be
>>>>>>>>>>> lost, but only in the *power loss / OS crash* case.
>>>>>>>>>>> kill -9, JVM crash, death of a critical system thread and all
>>>>>>>>>>> other cases that usually take place are variations of *process
>>>>>>>>>>> crash*. All WAL modes (except NONE, of course) ensure
>>>>>>>>>>> corruption-safety in case of process crash.
>>>>>>>>>>>
>>>>>>>>>>>> If so, I'm not sure any mode that allows corruption makes
>>>>>>>>>>>> much sense to me.
>>>>>>>>>>>
>>>>>>>>>>> It depends on the performance impact of enforcing power-loss
>>>>>>>>>>> corruption safety. The price of full protection from power loss
>>>>>>>>>>> is high: FSYNC is way slower (2-10 times) than other WAL modes.
>>>>>>>>>>> The question is whether ensuring weaker guarantees (corruption
>>>>>>>>>>> can't happen, but loss of last updates can) will affect
>>>>>>>>>>> performance as badly as strong guarantees.
>>>>>>>>>>> I'll share benchmark results soon.
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>
>>>>>>>>>>> On 20.03.2018 5:09, Valentin Kulichenko wrote:
>>>>>>>>>>>> Guys,
>>>>>>>>>>>>
>>>>>>>>>>>> What do we understand under "data corruption" here? If a
>>>>>>>>>>>> storage is in corrupted state, does it mean that it needs to
>>>>>>>>>>>> be completely removed and cluster needs to be restarted
>>>>>>>>>>>> without data? If so, I'm not sure any mode that allows
>>>>>>>>>>>> corruption makes much sense to me. How am I supposed to use a
>>>>>>>>>>>> database, if virtually any failure can end with complete loss
>>>>>>>>>>>> of data?
>>>>>>>>>>>>
>>>>>>>>>>>> In any case, this definitely should not be a default
>>>>>>>>>>>> behavior. If user ever switches to corruption-unsafe mode,
>>>>>>>>>>>> there should be a clear warning about this.
>>>>>>>>>>>>
>>>>>>>>>>>> -Val
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 16, 2018 at 1:06 AM, Ivan Rakov
>>>>>>>>>>>> <ivan.glu...@gmail.com> wrote:
>>>>>>>>>>>>> Ticket to track changes:
>>>>>>>>>>>>> https://issues.apache.org/jira/browse/IGNITE-7754
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>> Ivan Rakov
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16.03.2018 10:58, Dmitriy Setrakyan wrote:
>>>>>>>>>>>>>> On Fri, Mar 16, 2018 at 12:55 AM, Ivan Rakov
>>>>>>>>>>>>>> <ivan.glu...@gmail.com> wrote:
>>>>>>>>>>>>>>> Vladimir,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unlike BACKGROUND, LOG_ONLY provides strict write
>>>>>>>>>>>>>>> guarantees unless power loss has happened.
>>>>>>>>>>>>>>> Seems like we need to measure the performance difference
>>>>>>>>>>>>>>> to decide whether we need a separate WAL mode. If it is
>>>>>>>>>>>>>>> invisible, we'll just fix these bugs without introducing
>>>>>>>>>>>>>>> a new mode; if it is perceptible, we'll continue the
>>>>>>>>>>>>>>> discussion about introducing LOG_ONLY_SAFE.
>>>>>>>>>>>>>>> Makes sense?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, this sounds like the right approach.
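
P.S. For readers less familiar with the distinction the thread keeps returning to (process crash vs. power loss / OS crash), here is a minimal sketch using only plain java.nio. It is not Ignite's WAL implementation; the class name WalSyncSketch, the SyncPolicy enum and the append method are hypothetical names made up for illustration. A write without force() hands the bytes to the OS page cache, which survives kill -9 or a JVM crash but not a power loss; force(true) is the extra fsync that asks the OS to push the data to the storage device before returning.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WalSyncSketch {
    /** Hypothetical policies for illustration; these are NOT Ignite's WALMode values. */
    public enum SyncPolicy { NO_FSYNC, FSYNC_EACH_RECORD }

    /**
     * Appends one record to the log file and returns the file size afterwards.
     * With NO_FSYNC the bytes may still sit in the OS page cache on return.
     * With FSYNC_EACH_RECORD the channel is forced before returning.
     */
    public static long append(Path file, byte[] record, SyncPolicy policy) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            // A single write() call may be partial, so loop until the buffer is drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            if (policy == SyncPolicy.FSYNC_EACH_RECORD) {
                // true = also flush file metadata (e.g. length), not only the content.
                ch.force(true);
            }
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path wal = Files.createTempFile("wal-sketch", ".bin");
        byte[] rec = "update#1".getBytes(StandardCharsets.UTF_8);

        long afterFast = append(wal, rec, SyncPolicy.NO_FSYNC);
        long afterSafe = append(wal, rec, SyncPolicy.FSYNC_EACH_RECORD);

        System.out.println("size after unsynced append: " + afterFast);
        System.out.println("size after synced append:   " + afterSafe);
        Files.delete(wal);
    }
}
```

Both calls leave the same bytes visible to the process; the difference only shows up after a power loss, which is why the cost of the extra fsyncs has to be measured by benchmark rather than observed functionally.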