Hi,

I see, this could be the problem. I'm still working on a file system implementation that reorders writes (I was on vacation for two weeks). I think I can test a fix soon.

Regards,
Thomas

On Monday, July 6, 2015, Nicolas Barbier <[email protected]> wrote:

> Hello,
>
> I suspect that fsyncs are not issued at the right times when using
> PageLog, causing a risk of corruption when the OS reorders writes and
> crashes (e.g., power failure) in the middle of writing.
>
> To investigate this I have read the code a bit, which seemed to
> confirm my suspicion. As I am not very well acquainted with the H2
> code, I have also performed a test using strace, to check whether
> fsyncs are really performed in the wrong order w.r.t. writes (which
> indeed seems to be the case). I describe the results here:
>
> I put some primitive logging at the following points:
>
> At the beginning of FileDisk.force (FilePathDisk.java:409):
>
>     System.out.println("nicolas: FileDisk.force");
>     new Exception().printStackTrace();
>
> At the beginning of PageLog.flushOut (PageLog.java:855):
>
>     System.out.println("nicolas: flushing log");
>
> In PageStore.writePage (PageStore.java:1334) right before file.write is
> called:
>
>     System.out.println("nicolas: writing page");
>     new Exception().printStackTrace();
>
> I created a test program (attached) that just opens a database,
> inserts one row, waits two seconds, and then closes the database.
>
> Then, I rebuilt H2 and ran the test program under strace as follows:
>
>     strace -f java -cp /home/itsme/bla/h2database-read-only/h2/bin/h2-1.4.187.jar:. PageLogTest > strace.log 2>&1
>
> I removed everything from this file that comes before “nicolas:
> connected” (the result is attached).
>
> Then, I searched for “nicolas: flushing log” (found at line 67). The
> flushing is performed by the WriterThread (as expected) during the
> 2-second sleep. As part of the flushing, H2 performs a write
> (“write(16, ” at line 107; the stacktrace that confirms that this is
> part of the log is right before it at line 74). This write is therefore
> in the “log” part of the database file.
>
> The first fsync is located in the strace file at line 437. It is part
> of the code that performs a checkpoint at shutdown.
>
> The problem now follows: A write is performed to the non-log part of
> the database file, *before* the fsync mentioned above. At line 394 in
> the strace file, a write is performed that is part of a call to
> PageDataLeaf.write (see the logged stacktrace right before, which
> starts at line 358).
>
> When the OS reorders these two writes so that the non-log (“data”)
> write is issued to the disk before the write to the log, and the OS
> crashes after the first write, the database becomes corrupt.
>
> What do you think? Am I missing something here?
>
> Nicolas
>
> --
> A. Because it breaks the logical sequence of discussion.
> Q. Why is top posting bad?
>
> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/h2-database.
> For more options, visit https://groups.google.com/d/optout.
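
For illustration, the ordering that the quoted report expects (log write, then fsync, and only then the data write) can be sketched in plain Java NIO. This is a minimal sketch and not H2 code: the class name, the methods, and the file layout (data pages from offset 0, a log region starting at LOG_START) are all assumptions made up for the example. FileChannel.force is the standard Java call that typically maps to fsync/fdatasync on Linux.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WriteOrderingSketch {

    // Hypothetical layout, for illustration only (not H2's real format):
    // data pages start at offset 0, the log region starts at LOG_START.
    static final int PAGE_SIZE = 4096;
    static final long LOG_START = 16L * PAGE_SIZE;

    /** Writes the whole buffer at the given position, looping on short writes. */
    static void writeFully(FileChannel ch, ByteBuffer buf, long pos) throws IOException {
        while (buf.hasRemaining()) {
            pos += ch.write(buf, pos);
        }
    }

    /**
     * Writes a log record, forces it to storage, and only then overwrites
     * the data page. The force between the two writes is the barrier that
     * keeps the OS from persisting the data page before the log record.
     */
    static void logThenWritePage(FileChannel ch, byte[] logRecord, byte[] page, int pageId)
            throws IOException {
        writeFully(ch, ByteBuffer.wrap(logRecord), LOG_START);
        ch.force(false); // log record must be durable before the data page changes
        writeFully(ch, ByteBuffer.wrap(page), (long) pageId * PAGE_SIZE);
        ch.force(false); // stands in for the later checkpoint fsync
    }

    /** Runs the sequence once on a temporary file and returns the resulting file size. */
    static long demo() {
        try {
            Path file = Files.createTempFile("ordering-sketch", ".db");
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                logThenWritePage(ch, new byte[64], new byte[PAGE_SIZE], 3);
                return ch.size();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file size after ordered writes: " + demo());
    }
}
```

Running this under strace in the same way as the test program above should show the log write, then a sync call, then the page write. The trace in the report shows the page write at line 394 coming before the first fsync at line 437, which is the opposite order.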
