Data races in C/C++ code are a class of bugs that can severely impact the stability of the product while being hard to reproduce and debug. Furthermore, data races are undefined behavior and can lead to unforeseeable code behavior once compilers exploit this fact for better optimizations. We have evidence that data races can cause intermittent crashes and use-after-free memory safety violations that are hard to detect with the existing sanitizers (e.g. AddressSanitizer) because they only occur intermittently.
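To make the term concrete, here is a minimal sketch of a data race and one possible fix. It is not taken from our codebase; the function names `count_racy` and `count_atomic` are hypothetical and only serve as illustration. The first function lets several threads modify a plain `int` without synchronization, which is the kind of access TSan is designed to report when the code is built with clang's `-fsanitize=thread`. The second function uses `std::atomic` instead, so the increments no longer race.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// RACY (illustration only): several threads perform an unsynchronized
// read-modify-write on the same plain int. This is a data race and
// therefore undefined behavior; increments may be lost.
int count_racy(int num_threads, int iterations) {
  assert(num_threads > 0 && iterations >= 0);
  int counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&counter, iterations] {
      for (int i = 0; i < iterations; ++i) {
        ++counter;  // unsynchronized access from multiple threads
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter;
}

// FIXED: std::atomic makes every increment an indivisible operation, so
// concurrent updates are well defined and none are lost.
int count_atomic(int num_threads, int iterations) {
  assert(num_threads > 0 && iterations >= 0);
  std::atomic<int> counter{0};
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&counter, iterations] {
      for (int i = 0; i < iterations; ++i) {
        // Relaxed ordering is enough here because only the final total
        // matters and join() below synchronizes with each thread.
        counter.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& th : threads) th.join();
  return counter.load();
}
```

Calling `count_atomic(4, 10000)` always yields 40000, whereas the result of `count_racy` with the same arguments is not guaranteed. An atomic is just one way to fix such a race; depending on the code, a mutex or restructuring ownership of the data may be the better choice.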
ThreadSanitizer <https://clang.llvm.org/docs/ThreadSanitizer.html> (TSan) is another sanitizer, specifically aimed at detecting data races and related problems (e.g. mutex ordering issues, potential deadlock situations, etc.).

One of the problems with deploying ThreadSanitizer in CI is that we have a fair number of existing data races that turn pretty much every test we have orange. To resolve this situation, we are currently working on the following strategy:

1. Add a Linux TSan build as Tier1 to avoid build regressions (done in bug 1590162 <https://bugzilla.mozilla.org/show_bug.cgi?id=1590162>).
2. Run a set of tests and generate a runtime suppression list <https://searchfox.org/mozilla-central/source/mozglue/build/TsanOptions.cpp> for all of the existing issues.
3. File the existing issues so we can track them (tracking bug is bug 929478 <https://bugzilla.mozilla.org/show_bug.cgi?id=929478>).
4. Enable now-green tests to avoid further regressions (tracked in bug 1612711 <https://bugzilla.mozilla.org/show_bug.cgi?id=1612711>).

Since part of this process is filing the existing race reports, you might already have seen related bug reports in your component. There is no need to react to these reports immediately, but we would of course very much appreciate it if they could eventually be triaged and fixed (many of you have done so already, thank you!).

Keep in mind that some of these reports might point to potential sources of instability and other intermittent misbehavior, so there might be potential to eliminate some nasty bugs. In fact, we have already identified several major issues in our codebase just from running tests. If you identify such a case, we would also ask you to note this in the bug, as we track such bugs separately to assess the value of the tool.

It is also likely that you will see benign race reports (or at least reports that look benign).
Unfortunately, it is incredibly hard to tell if a race is really benign or not [1][2][3][4], so if an issue is easy to fix, we suggest just fixing it rather than spending too much time on the analysis. There might be cases where fixing a confirmed-benign race is not worth the investment. In this case, we can add a permanent suppression. Since every suppression costs some performance, we should use them sparingly, though.

Overall we hope that this tool will make it easier for all of us to produce more stable and secure code, debug existing issues more effectively and maybe even move the needle when it comes to inexplicable crashes in crash-stats.

[1] https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] https://blog.mozilla.org/nfroyd/2015/02/20/finding-races-in-firefox-with-threadsanitizer/
[3] https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/
[4] https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform