Laurynas,

We cannot recover from a torn page using only the redo log. But wouldn't the undo log record enough information for recovery in the case of a torn page? The undo log should have the old values of the affected rows, so shouldn't that be enough to recover a torn page?

Xiaofei

On Sat, May 9, 2015 at 12:07 AM, Laurynas Biveinis <laurynas.bivei...@gmail.com> wrote:

> Xiaofei -
>
> We can indeed detect the torn page write without the doublewrite
> buffer (and WebScaleSQL has a patch utilising this observation). But
> we need not only to detect, but to recover the page as well. And
> without the doublewrite, if we discard the page, we have nothing: a
> half-old half-new page on the disk and the redo log records for that
> page are not enough to recover it.
>
> 2015-05-09 8:44 GMT+03:00 Xiaofei Du <xiaofei.du...@gmail.com>:
>> Justin,
>>
>> I think the fsync I was concerned about and the torn page problem are
>> two different things. But now I have a question about the double write
>> buffer. If we can detect a torn page by checking the top and bottom of
>> a page, why would we still need the double write buffer? If the page is
>> consistent, then we use it, otherwise, we just discard it. Maybe this
>> is a naive question. But please let me know. Thanks.
>>
>> Xiaofei
>>
>> On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenl...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> The log does not have whole pages. Pages must not be torn for the
>>> recovery process to work. A fsync is required when a page is written
>>> to disk. During recovery all changes since the last checkpoint are
>>> replayed, then transactions that do not have a commit marker are
>>> rolled back. This is called roll forward/roll back recovery.
>>>
>>> --Justin
>>>
>>> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du...@gmail.com> wrote:
>>>>
>>>> Justin,
>>>>
>>>> I was wondering whether fsync is needed each time after a write. The
>>>> operations are already in the log, so recovery can always be done
>>>> from the log. The difference is that during recovery, we need to go
>>>> back further in the log and it will take longer. But in that way, I
>>>> guess it would be hard to coordinate with the kernel flush thread.
>>>>
>>>> Xiaofei
>>>>
>>>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenl...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> InnoDB recovery can not handle torn pages. An fsync is required to
>>>>> ensure that the page is fully written to disk. This is also why the
>>>>> doublewrite buffer is used. Before pages are written down to disk,
>>>>> they are first written sequentially into the doublewrite buffer.
>>>>> This buffer is synced, then async page writing can proceed. If the
>>>>> database crashes, the pages in flight will be rewritten by the
>>>>> doublewrite buffer. The detection mechanism for torn pages comes
>>>>> from an LSN, which is written into the top and the bottom of the
>>>>> page. If the LSN at the top and bottom do not match the page is
>>>>> torn.
>>>>>
>>>>> Regards,
>>>>>
>>>>> --Justin
>>>>>
>>>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du...@gmail.com> wrote:
>>>>>>
>>>>>> Laurynas,
>>>>>>
>>>>>> This is exactly what I was looking for. I went through these
>>>>>> functions before. I disabled double write buffer, so I didn't pay
>>>>>> attention to code under buf_dblwr... The reason I asked this
>>>>>> question is because I didn't know how the recovery process works,
>>>>>> so I was wondering if it's necessary to fsync after each write.
>>>>>> It's a performance concern. Anyway, thank you very much!
>>>>>>
>>>>>> Jan -- Thank you for your answer too!
>>>>>>
>>>>>> Xiaofei
>>>>>>
>>>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis <laurynas.bivei...@gmail.com> wrote:
>>>>>>>
>>>>>>> Xiaofei -
>>>>>>>
>>>>>>> fsync is performed for all the flush types (LRU, flush, single page)
>>>>>>> if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC). The
>>>>>>> apparent difference in sync and async is not because of the sync
>>>>>>> difference itself, but because of the flush type difference. The
>>>>>>> single page flush flushes one page, and requests a fsync for its
>>>>>>> file. Other flushes flush in batches, don't have to fsync for each
>>>>>>> written page individually but rather sync once at the end. Then
>>>>>>> doublewrite complicates this further. If it is disabled, fsync will
>>>>>>> happen in buf_dblwr_sync_datafiles called from
>>>>>>> buf_dblwr_flush_buffered_writes called from buf_flush_common called
>>>>>>> at the end of either LRU or flush list flush. If doublewrite is
>>>>>>> enabled, fsync will happen in buf_dblwr_update called from
>>>>>>> buf_flush_write_complete.
>>>>>>>
>>>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du...@gmail.com>:
>>>>>>>> Hi Laurynas,
>>>>>>>>
>>>>>>>> On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis <laurynas.bivei...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Xiaofei -
>>>>>>>>>
>>>>>>>>>> Does InnoDB maintain a dirty page table?
>>>>>>>>>
>>>>>>>>> You must be referring to the buffer pool flush_list.
>>>>>>>>
>>>>>>>> You are right. The flush_list can be used for recovery and
>>>>>>>> checkpoint.
>>>>>>>>
>>>>>>>>>> Is fsync called to guarantee the page to be on persistent
>>>>>>>>>> storage so that the dirty page table can be updated? If this is
>>>>>>>>>> the case, when is the dirty page table updated for asynchronous
>>>>>>>>>> IOs?
>>>>>>>>>
>>>>>>>>> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>>>>>>>>> called from buf_page_io_complete in buf0buf.cc.
>>>>>>>>
>>>>>>>> You are right that this is the place it updates the dirty page
>>>>>>>> information. But I still don't understand why the fsync is needed
>>>>>>>> for synchronous IOs, but not for the AIOs. Jan Lindstrom said
>>>>>>>> fsync is also called for other AIO operations. But I could only
>>>>>>>> find it true in one of many AIO operations. Or maybe I am missing
>>>>>>>> something still?
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Laurynas
>>>>>>>
>>>>>>> --
>>>>>>> Laurynas
>>>>>>
>>>>>> _______________________________________________
>>>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>>>> Post to : maria-discuss@lists.launchpad.net
>>>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>>>> More help : https://help.launchpad.net/ListHelp
>
> --
> Laurynas
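
For readers following the thread, the detection and recovery steps Justin describes (an LSN stamped at the top and the bottom of each page, plus a synced doublewrite copy written before the in-place write) can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not InnoDB code: the page layout is simplified to an 8-byte stamp at each end, and `make_page`, `is_torn`, `torn_write`, and `recover` are hypothetical names invented for this sketch.

```python
import struct

PAGE_SIZE = 16 * 1024   # InnoDB's default page size
LSN_BYTES = 8           # assumption: toy layout, one 8-byte stamp at each end


def make_page(lsn: int, fill: bytes) -> bytes:
    """Build a toy page: LSN stamp, payload, then the same LSN stamp again."""
    stamp = struct.pack(">Q", lsn)
    payload = fill * (PAGE_SIZE - 2 * LSN_BYTES)   # `fill` must be one byte
    return stamp + payload + stamp


def is_torn(page: bytes) -> bool:
    """A page is torn when the stamps at its top and bottom disagree."""
    return page[:LSN_BYTES] != page[-LSN_BYTES:]


def torn_write(old: bytes, new: bytes, cut: int) -> bytes:
    """Simulate a crash after only the first `cut` bytes of `new` hit disk."""
    return new[:cut] + old[cut:]


def recover(datafile: dict, doublewrite: dict) -> list:
    """Overwrite every torn page with its doublewrite copy; return repaired ids."""
    repaired = []
    for page_no, page in datafile.items():
        if is_torn(page):
            copy = doublewrite.get(page_no)
            if copy is None or is_torn(copy):
                # Detection alone is not enough: without an intact copy
                # there is no consistent page image to fall back on.
                raise RuntimeError(f"page {page_no} is torn and has no intact copy")
            datafile[page_no] = copy
            repaired.append(page_no)
    return repaired


old = make_page(lsn=100, fill=b"a")
new = make_page(lsn=200, fill=b"b")

doublewrite = {7: new}                          # step 1: copy written and synced
datafile = {7: torn_write(old, new, cut=4096)}  # step 2: crash mid-write in place

print(is_torn(datafile[7]))             # prints True
print(recover(datafile, doublewrite))   # prints [7]
```

Dropping the entry from `doublewrite` makes `recover` raise, which mirrors Laurynas's point: the mismatch can still be detected, but what is left on disk is a half-old, half-new page with no whole image to restore.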