Hey there,

One of the links recently provided by Daniel Klima pointed to a way to enable write caching even on USB devices. So, I could now use my Windows installation for experiments without the risk of bricking 2 grand worth of disks by pulling the plug tens of times.
-- Stefan^2.

TL;DR
=====

FlushFileBuffers operates on whole files, not just the parts written through the respective handle. Not calling it after rename results in potential data loss. Calling it after rename eliminates the problem, at least in most cases.

Setup:
=====

I used the attached program to conduct 3 different experiments, each trying a specific modification / fsync sequence. All would write to a USB stick which had the OS write cache enabled for it in Windows 7.

All tests run an unlimited number of iterations, until there is an I/O error (e.g. caused by disconnecting the drive). For each run, separate files with different contents are written ("run number xyz", repeated many times). So, we can determine which file contents are complete and correct and whether all files are present. Each successful iteration is logged to the console. We expect the data for all of these to be complete.

The stick got yanked out at a random point in time, reconnected after about a minute, chkdsk /f run on it, and then the program output would be compared with the USB stick's content.

Experiment 1: fsync a file written through a different handle.
==============================================

Open two files and write the same contents 100x to each, alternating between the two files. Both files are the same size (>1MB) and should be similarly "important" to the OS. Close both files. Re-open the one written last and fsync it. This re-open scenario is similar to what we do with the protorev file.

Results:
* 10 runs were made, between 17 and 84 iterations each.
* 10x, the fsync'ed file existed and its contents were complete
* 10x, the non-synced files were present and showed the correct file size. The contents of the last few of them were NUL bytes.

Interpretation:
Re-opening a file and fsync'ing it flushes *all* content changes for that file - at least on Windows. The way we handle the protorev file is correct.

Experiment 2: fsync before but not after rename
=======================================

This mimics the core of our "move-in-place" logic: Write a small-ish file (here: 10 .. 20k to not get folded into the MFT) with some temporary name, fsync and close it. Rename to its final name in the same folder.

Results:
* 5 runs were made, between 182 and 435 iterations each.
* 1x the final file existed with the correct contents
* 3x the .temp file existed for the last completed iteration.
* 1x even the final file for the previous iteration contained NULs. After that run, chkdsk reported and fixed a large number of issues.

Interpretation:
Not fsync'ing after rename leads to data loss even with NTFS (4 of the 5 runs here). IOW, we don't have transactional guarantees for "commit" on Windows servers at the moment. The last case with the more severe corruption may be due to the storage device not handling its buffers correctly. The only thing we can do here is tell people to use battery-backed storage.

Experiment 3: fsync before *and* after rename
=======================================

Same as above but re-open the file after rename and fsync it.

Results:
* 10 runs were made, between 127 and 1984 iterations each.
* 7x the final file existed with the correct contents
* 1x the next temp already existed with size 0 (this is also a correct state; the last complete iteration's final file existed with the correct contents)
* 1x the next temp already existed with correct contents (correct, same as before)
* 1x the last final file was missing, there was no temp file and the previous final file contained invalid data. After that run, there were various issues fixed by chkdsk. It was also the run with the most iterations.

Interpretation:
In 90% of the runs, fsync'ing after rename resulted in correct disk contents. This is much better than the results in Experiment 2. The remaining failure may be due to limitations of the storage device; the same kind of corruption has been observed in Exp. 2 as well.

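The comparison step from the setup (program output vs. what is actually on the stick) could be automated along these lines. This is only a sketch and not part of the attached program: expected_contents() rebuilds the payload the way make_buffer() does, and the names FileState, classify() and read_all() are made up for illustration. It uses only standard C++, so it can run on any machine the stick gets plugged into.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Rebuild the payload written for one run: "run number <run>\r\n",
// doubled ten times (1024 lines), then repeated `copies` times.
// copies = 1 matches the rename tests, copies = 100 matches Experiment 1.
std::string expected_contents(const std::string& run, int copies)
{
  assert(copies > 0);
  std::string block = "run number " + run + "\r\n";
  for (int i = 0; i < 10; ++i)
    block += block;

  std::string all;
  all.reserve(block.size() * copies);
  for (int i = 0; i < copies; ++i)
    all += block;
  return all;
}

// Hypothetical classification of what was read back from the stick.
enum class FileState { Complete, WrongSize, ContainsNuls, Corrupt };

FileState classify(const std::string& actual, const std::string& expected)
{
  if (actual.size() != expected.size())
    return FileState::WrongSize;
  if (actual == expected)
    return FileState::Complete;
  // Right size but wrong bytes: the typical pattern in the runs above
  // was unflushed data showing up as NUL bytes.
  if (actual.find('\0') != std::string::npos)
    return FileState::ContainsNuls;
  return FileState::Corrupt;
}

// Read a whole file in binary mode. A missing or unreadable file yields
// an empty string, which classify() reports as WrongSize.
std::string read_all(const std::string& path)
{
  std::ifstream in(path, std::ios::binary);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

For the last run number N printed by the program, one would call classify(read_all(path), expected_contents(N, 1)) on the matching ".final" file (or with 100 copies on the ".sync" / ".nosync" files of Experiment 1). For run "17" the payload is 15 bytes * 1024 = 15360 bytes per copy, which is where the "10 .. 20k" and ">1MB" sizes come from.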
// FSyncExperiment.cpp : Defines the entry point for the console application.
//

#include <windows.h>
#include <stdio.h>
#include <tchar.h>

#include <string>
#include <iostream>

// Open (or create) PATH for reading and writing, without sharing.
// The ANSI variants of the Win32 calls are used explicitly because
// all paths are std::string.
static HANDLE open_file(const std::string& path)
{
  return CreateFileA(path.c_str(), GENERIC_READ | GENERIC_WRITE, 0, NULL,
                     OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
}

// Write all of DATA to FILE; a short write counts as failure.
static bool write_file(HANDLE file, const std::string &data)
{
  const void *buffer = data.c_str();
  DWORD size = static_cast<DWORD>(data.length());
  DWORD written;

  return WriteFile(file, buffer, size, &written, NULL) && written == size;
}

// Return "run number <RUN>\r\n" repeated 1024 times (10 doublings).
static std::string make_buffer(const std::string& run)
{
  std::string buffer = "run number " + run + "\r\n";
  for (int i = 0; i < 10; ++i)
    buffer += buffer;

  return buffer;
}

// Re-open PATH through a fresh handle and flush it to disk.
static bool fsync(const std::string& path)
{
  HANDLE file = open_file(path);
  if (file == INVALID_HANDLE_VALUE)
    return false;

  // Close the handle even if the flush fails.
  bool flushed = FlushFileBuffers(file) != 0;
  bool closed = CloseHandle(file) != 0;

  return flushed && closed;
}

// Experiment 1: write two files alternately, close both, then re-open
// and fsync only one of them.
static bool reopen_fsync_run(const std::string & parent_path,
                             const std::string& run)
{
  std::string common_path = parent_path + "\\reopen_fsync_" + run;
  std::string fsync_path = common_path + ".sync";
  std::string nosync_path = common_path + ".nosync";
  std::string buffer = make_buffer(run);

  HANDLE nosync_file = open_file(nosync_path);
  if (nosync_file == INVALID_HANDLE_VALUE)
    return false;

  HANDLE fsync_file = open_file(fsync_path);
  if (fsync_file == INVALID_HANDLE_VALUE)
    return false;

  for (int i = 0; i < 100; ++i)
    {
      if (!write_file(nosync_file, buffer))
        return false;
      if (!write_file(fsync_file, buffer))
        return false;
    }

  if (!CloseHandle(nosync_file))
    return false;
  if (!CloseHandle(fsync_file))
    return false;

  return fsync(fsync_path);
}

// Experiments 2 and 3: write a temp file, fsync and close it, rename it
// to its final name and optionally fsync the renamed file.
static bool rename_run(const std::string & parent_path,
                       const std::string& run,
                       bool fsync_after_rename)
{
  std::string common_path = parent_path + "\\rename_" + run;
  std::string temp_path = common_path + ".temp";
  std::string final_path = common_path + ".final";
  std::string buffer = make_buffer(run);

  HANDLE temp_file = open_file(temp_path);
  if (temp_file == INVALID_HANDLE_VALUE)
    return false;

  if (!write_file(temp_file, buffer))
    return false;
  if (!FlushFileBuffers(temp_file))
    return false;
  if (!CloseHandle(temp_file))
    return false;

  if (!MoveFileA(temp_path.c_str(), final_path.c_str()))
    return false;

  if (fsync_after_rename)
    return fsync(final_path);

  return true;
}

static void test_reopen_fsync(const std::string& path)
{
  for (int run = 1; reopen_fsync_run(path, std::to_string(run)); ++run)
    std::cout << "Last fsync'ed run: " << run << std::endl;
}

static void test_rename(const std::string& path)
{
  for (int run = 1; rename_run(path, std::to_string(run), false); ++run)
    std::cout << "Last complete rename run: " << run << std::endl;
}

static void test_rename_fsync(const std::string& path)
{
  for (int run = 1; rename_run(path, std::to_string(run), true); ++run)
    std::cout << "Last fsync'ed rename run: " << run << std::endl;
}

int _tmain(int argc, _TCHAR* argv[])
{
  const std::string base_path = "E:";

  // Enable exactly one experiment per run. The target folder
  // must already exist on the stick.
  // test_reopen_fsync(base_path + "\\reopen_fsync");
  // test_rename(base_path + "\\rename");
  test_rename_fsync(base_path + "\\rename_fsync");

  return 0;
}