Data races in C/C++ code are a class of bugs that can severely impact the
stability of the product while being hard to reproduce and debug.
Furthermore, data races are undefined behavior and can lead to
unforeseeable code behavior once compilers exploit this fact for better
optimizations. We have evidence that data races can cause intermittent
crashes and use-after-free memory safety violations that are hard to detect
by the existing sanitizers (e.g. AddressSanitizer) due to their
intermittent behavior.
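
To make this concrete, here is a minimal sketch of the kind of code that is
affected and one common fix. All names (CountHits, kThreads, kIterations) are
made up for illustration and are not taken from our tree. If `hits` were a
plain `int`, the two threads would update it without synchronization, which
is a data race and therefore undefined behavior; making it a
`std::atomic<int>` removes the race.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example, not taken from the tree.
constexpr int kThreads = 2;
constexpr int kIterations = 10000;

int CountHits() {
  // With a plain `int hits = 0;` the increments below would race.
  std::atomic<int> hits{0};

  std::vector<std::thread> threads;
  for (int t = 0; t < kThreads; ++t) {
    threads.emplace_back([&hits] {
      for (int i = 0; i < kIterations; ++i) {
        // Atomic read-modify-write. Relaxed ordering is enough here because
        // the counter is only read after all threads have been joined.
        hits.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& thread : threads) {
    thread.join();
  }
  return hits.load(std::memory_order_relaxed);
}
```

With the atomic counter, CountHits() always returns kThreads * kIterations.
With a plain `int`, the result could be anything, and the compiler is allowed
to assume the race never happens.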

ThreadSanitizer <https://clang.llvm.org/docs/ThreadSanitizer.html> (TSan)
is another sanitizer, specifically aimed at detecting data races and
related problems (e.g. mutex ordering issues and potential deadlock
situations).
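
As an illustration of a mutex ordering issue, consider the sketch below. The
Account type and the Transfer/RunTransfers functions are hypothetical. If
Transfer locked `from.lock` and then `to.lock` one after the other, the calls
Transfer(a, b) and Transfer(b, a) would take the two mutexes in opposite
orders, which is the pattern TSan reports as a lock-order inversion
(potential deadlock). One possible fix, assuming C++17 is available, is to
acquire both mutexes together with std::scoped_lock.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type for illustration only.
struct Account {
  std::mutex lock;
  int balance = 0;
};

void Transfer(Account& from, Account& to, int amount) {
  assert(&from != &to);  // locking the same mutex twice would be undefined
  // Acquires both mutexes using a deadlock-avoidance algorithm, so the
  // argument order does not matter.
  std::scoped_lock guard(from.lock, to.lock);
  from.balance -= amount;
  to.balance += amount;
}

int RunTransfers() {
  Account a, b;
  a.balance = 1000;
  b.balance = 1000;
  // Two threads transfer in opposite directions at the same time.
  std::thread t1([&] { for (int i = 0; i < 1000; ++i) Transfer(a, b, 1); });
  std::thread t2([&] { for (int i = 0; i < 1000; ++i) Transfer(b, a, 1); });
  t1.join();
  t2.join();
  // Safe to read without the locks: both threads have been joined.
  return a.balance + b.balance;
}
```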

One of the problems with deploying ThreadSanitizer in CI is that we have a
fair number of existing data races that turn pretty much every test we have
orange. To resolve this, we are currently working on the following
strategy:


   1. Add a Linux TSan build as Tier1 to avoid build regressions (done in
      bug 1590162 <https://bugzilla.mozilla.org/show_bug.cgi?id=1590162>).
   2. Run a set of tests and generate a runtime suppression list
      <https://searchfox.org/mozilla-central/source/mozglue/build/TsanOptions.cpp>
      for all of the existing issues.
   3. File the existing issues so we can track them (tracking bug is bug
      929478 <https://bugzilla.mozilla.org/show_bug.cgi?id=929478>).
   4. Enable now-green tests to avoid further regressions (tracked in bug
      1612711 <https://bugzilla.mozilla.org/show_bug.cgi?id=1612711>).


Since part of this process is to file existing race reports, you might
already have seen related bug reports in your component. There is no need to
react to these reports immediately, but we would of course very much
appreciate it if they could eventually be triaged and fixed (many of you
have done so already, thank you!). Keep in mind that some of these reports
might point to sources of instability and other intermittent misbehavior,
so there is a chance to eliminate some nasty bugs. In fact, we have already
identified several major issues in our codebase just from running tests. If
you identify such a case, please indicate this in the bug, as we track such
bugs separately to assess the value of the tool.

It is also likely that you will see benign race reports (or at least
reports that look benign). Unfortunately, it is incredibly hard to tell
whether a race is really benign or not [1][2][3][4], so if an issue is easy
to fix, we suggest just fixing it rather than spending too much time on the
analysis. There might be cases where fixing a confirmed-benign race is not
worth the investment. In that case, we can add a permanent suppression.
Since every suppression costs some performance, though, we should use them
sparingly.
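
As an example of a race that looks benign but is usually cheap to fix,
consider a "ready" flag that one thread sets and another polls. The Message
type and PublishAndConsume function below are hypothetical. With a plain
`bool`, the compiler may hoist the read out of the wait loop, and nothing
orders the payload write before the flag write. A sketch of one fix is to
make the flag atomic with release/acquire ordering.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example, not taken from the tree.
struct Message {
  int payload = 0;
  // A plain `bool ready` here would be a data race.
  std::atomic<bool> ready{false};
};

int PublishAndConsume() {
  Message msg;
  int seen = -1;
  std::thread consumer([&] {
    // The acquire load pairs with the release store below, so the payload
    // write is visible once the flag is observed as true.
    while (!msg.ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
    seen = msg.payload;
  });
  msg.payload = 42;
  msg.ready.store(true, std::memory_order_release);
  consumer.join();
  return seen;  // read after join, so no further synchronization is needed
}
```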

Overall we hope that this tool will make it easier for all of us to produce
more stable and secure code, debug existing issues more effectively and
maybe even move the needle when it comes to inexplicable crashes in
crash-stats.

[1] https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] https://blog.mozilla.org/nfroyd/2015/02/20/finding-races-in-firefox-with-threadsanitizer/
[3] https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/
[4] https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf