Justin,

I think the fsync I was concerned about and the torn page problem are two different things. But now I have a question about the doublewrite buffer. If we can detect a torn page by checking the top and bottom of a page, why do we still need the doublewrite buffer? If the page is consistent, we use it; otherwise, we just discard it. Maybe this is a naive question, but please let me know. Thanks.
Xiaofei

On Fri, May 8, 2015 at 9:24 PM, Justin Swanhart <greenl...@gmail.com> wrote:

> Hi,
>
> The log does not have whole pages. Pages must not be torn for the
> recovery process to work. An fsync is required when a page is written to
> disk. During recovery all changes since the last checkpoint are replayed,
> then transactions that do not have a commit marker are rolled back. This
> is called roll forward/roll back recovery.
>
> --Justin
>
> On Fri, May 8, 2015 at 6:09 PM, Xiaofei Du <xiaofei.du...@gmail.com>
> wrote:
>
>> Justin,
>>
>> I was wondering whether fsync is needed each time after a write. The
>> operations are already in the log. So recovery can always be done from the
>> log. The difference is that during recovery, we need to go back further in
>> the log and it will take longer. But in that way, I guess it would be hard
>> to coordinate with the kernel flush thread.
>>
>> Xiaofei
>>
>> On Fri, May 8, 2015 at 2:06 PM, Justin Swanhart <greenl...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> InnoDB recovery cannot handle torn pages. An fsync is required to
>>> ensure that the page is fully written to disk. This is also why the
>>> doublewrite buffer is used. Before pages are written down to disk, they
>>> are first written sequentially into the doublewrite buffer. This buffer is
>>> synced, then async page writing can proceed. If the database crashes, the
>>> pages in flight will be rewritten by the doublewrite buffer. The detection
>>> mechanism for torn pages comes from an LSN, which is written into the top
>>> and the bottom of the page. If the LSN at the top and bottom do not match,
>>> the page is torn.
>>>
>>> Regards,
>>>
>>> --Justin
>>>
>>> On Fri, May 8, 2015 at 12:43 PM, Xiaofei Du <xiaofei.du...@gmail.com>
>>> wrote:
>>>
>>>> Laurynas,
>>>>
>>>> This is exactly what I was looking for. I went through these functions
>>>> before. I disabled the doublewrite buffer, so I didn't pay attention to code
>>>> under buf_dblwr... The reason I asked this question is that I didn't
>>>> know how the recovery process works, so I was wondering if it's necessary
>>>> to fsync after each write. It's a performance concern. Anyway, thank you
>>>> very much!
>>>>
>>>> Jan -- Thank you for your answer too!
>>>>
>>>> Xiaofei
>>>>
>>>> On Thu, May 7, 2015 at 9:59 PM, Laurynas Biveinis <
>>>> laurynas.bivei...@gmail.com> wrote:
>>>>
>>>>> Xiaofei -
>>>>>
>>>>> fsync is performed for all the flush types (LRU, flush, single page)
>>>>> if it is asked for (innodb_flush_method != O_DIRECT_NO_FSYNC). The
>>>>> apparent difference in sync and async is not because of the sync
>>>>> difference itself, but because of the flush type difference. The
>>>>> single page flush flushes one page, and requests an fsync for its file.
>>>>> Other flushes flush in batches, don't have to fsync for each written
>>>>> page individually but rather sync once at the end. Then doublewrite
>>>>> complicates this further. If it is disabled, fsync will happen in
>>>>> buf_dblwr_sync_datafiles called from buf_dblwr_flush_buffered_writes
>>>>> called from buf_flush_common called at the end of either LRU or flush
>>>>> list flush. If doublewrite is enabled, fsync will happen in
>>>>> buf_dblwr_update called from buf_flush_write_complete.
>>>>>
>>>>>
>>>>> 2015-05-07 9:01 GMT+03:00 Xiaofei Du <xiaofei.du...@gmail.com>:
>>>>> > Hi Laurynas,
>>>>> >
>>>>> > On Wed, May 6, 2015 at 9:14 PM, Laurynas Biveinis
>>>>> > <laurynas.bivei...@gmail.com> wrote:
>>>>> >>
>>>>> >> Xiaofei -
>>>>> >>
>>>>> >> > Does InnoDB maintain a dirty
>>>>> >> > page table?
>>>>> >>
>>>>> >> You must be referring to the buffer pool flush_list.
>>>>> >
>>>>> > You are right. The flush_list can be used for recovery and
>>>>> > checkpoint.
>>>>> >
>>>>> >> > Is fsync called to guarantee the page to be on persistent
>>>>> >> > storage so that the dirty page table can be updated? If this is
>>>>> >> > the case, when is the dirty page table updated for asynchronous
>>>>> >> > IOs?
>>>>> >>
>>>>> >> Check buf_flush_write_complete in buf0flu.cc. For async IO it is
>>>>> >> called from buf_page_io_complete in buf0buf.cc.
>>>>> >
>>>>> > You are right that this is the place it updates the dirty page
>>>>> > information. But I still don't understand why the fsync is needed
>>>>> > for synchronous IOs, but not for the AIOs. Jan Lindstrom said fsync
>>>>> > is also called for other AIO operations. But I could only find it
>>>>> > true in one of many AIO operations. Or maybe I am missing something
>>>>> > still?
>>>>> >
>>>>> >> --
>>>>> >> Laurynas
>>>>>
>>>>> --
>>>>> Laurynas
>>>>
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~maria-discuss
>>>> Post to : maria-discuss@lists.launchpad.net
>>>> Unsubscribe : https://launchpad.net/~maria-discuss
>>>> More help : https://help.launchpad.net/ListHelp
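
For anyone following the thread, here is a toy Python sketch of the two mechanisms discussed above: the top/bottom LSN check and the doublewrite restore. It is not InnoDB code. The 64-byte page, the 8-byte stamp layout, and the names make_page, is_torn, tear and recover are all assumptions made for illustration. What it tries to show is that detection only tells you a page is unusable. After a torn write, neither the old nor the new page image exists in the data file, and since the log does not hold whole pages, discarding the page would leave nothing for the replayed changes to apply to. The doublewrite copy, synced before the in-place write, supplies the intact image.

```python
import struct

PAGE_SIZE = 64                 # toy size; an assumption, not the real page size
LSN_FMT = ">Q"                 # assumed layout: 8-byte big-endian stamp
LSN_LEN = struct.calcsize(LSN_FMT)


def make_page(lsn: int, payload: bytes) -> bytes:
    """Build a page with the same LSN stamped at the top and the bottom."""
    body_len = PAGE_SIZE - 2 * LSN_LEN
    if len(payload) > body_len:
        raise ValueError("payload does not fit in one page")
    stamp = struct.pack(LSN_FMT, lsn)
    return stamp + payload.ljust(body_len, b"\x00") + stamp


def is_torn(page: bytes) -> bool:
    """A page is torn if it is incomplete or its two stamps differ."""
    if len(page) != PAGE_SIZE:
        return True
    return page[:LSN_LEN] != page[-LSN_LEN:]


def tear(old: bytes, new: bytes, cut: int) -> bytes:
    """Simulate a crash mid-write: the first `cut` bytes are new, the rest old."""
    return new[:cut] + old[cut:]


def recover(datafile: dict, doublewrite: dict) -> list:
    """Replace torn data-file pages with intact doublewrite copies.

    Returns the page numbers that were restored.
    """
    restored = []
    for page_no, copy in doublewrite.items():
        if is_torn(datafile[page_no]) and not is_torn(copy):
            datafile[page_no] = copy
            restored.append(page_no)
    return restored


old = make_page(100, b"balance=10")
new = make_page(200, b"balance=25")

datafile = {7: old}
doublewrite = {7: new}                          # written and synced first
datafile[7] = tear(old, new, PAGE_SIZE // 2)    # crash halfway through the write

print(is_torn(datafile[7]))                     # the check flags the page
print(recover(datafile, doublewrite))           # pages restored from the copy
print(datafile[7] == new)                       # data file holds a whole page again
```

Without the doublewrite dictionary in this sketch, the only thing recover could do with page 7 is throw it away, which is exactly the case the question asks about.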