Evgeny Kotkov wrote on Tue, Feb 17, 2015 at 20:44:29 +0300:
> - We attempt to solve "B) Taking down other threads" in 1.10 by carefully
>   examining the calling sites, how the caching behaves, etc., and aim towards
>   a guarantee that nothing will break with a non-abortive malfunction handler.
>   I am actually interested in making this happen.
What can we do if an assertion fails inside the cache implementation?
I see three options: log it and continue; continue with the cache disabled;
abort.  However, none of these options is necessarily safe:

- The first might cause data loss if the assertion was ultimately caused
  by, say, faulty hardware.  (If we have faulty hardware, aborting could
  actually be the best option, since it prevents non-mod_dav_svn threads
  from experiencing data losses.)

- The second might lead to unacceptable performance degradation.

- The third will take down non-mod_dav_svn threads too, resulting in
  denial of service for the requests those threads were handling.

How do we choose among these three risks?  Is this perhaps a place where
it's justified to create a knob for the admin to set according to her own
risk analysis?

---

In general, we treat assertions as an all-or-nothing deal.  The malfunction
handler is process-global, and it either always raises or always aborts.

I keep wondering if we shouldn't give malfunction handlers more information
about the failure mode that was observed (e.g., where the assertion was
invoked, why it was invoked, and how risky the assertion site deems it to
continue), so as to let the malfunction handler make more informed
abort-or-raise decisions.

For example, assertions that verify preconditions often mean there is a bug
in the caller but continued execution would be safe, whereas assertions in
the bowels of svn_cache_* are more worrisome.  But we use the same
SVN_ERR_ASSERT() for both,¹ and the malfunction handler can't tell the
difference.

(Or maybe we just shouldn't be using SVN_ERR_ASSERT() for preconditions,
but that's another way of saying that, yes, we should be making
a distinction between failed preconditions and failed cache bowel
invariants.)

Daniel

¹ Examples: svn_fs_create_access(username=NULL) and
  svn_repos_authz_check_access(path="").
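
To make the "more information for the malfunction handler" idea concrete,
here is a minimal sketch in plain C.  Nothing in it is existing Subversion
API: risk_t, failure_info_t, ASSERT_WITH_RISK() and every other name are
hypothetical, and only stand in for what SVN_ERR_ASSERT() and the
process-global malfunction handler might look like if each assertion site
declared how risky it is to continue.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: how risky the assertion site deems continued execution. */
typedef enum risk_t {
  risk_precondition,  /* bug in the caller; continuing is likely safe */
  risk_invariant      /* internal state (e.g. a cache) may be corrupt */
} risk_t;

/* Hypothetical: what the handler tells the assertion site to do. */
typedef enum action_t {
  action_raise,       /* return an error to the caller */
  action_abort        /* terminate the whole process */
} action_t;

/* Hypothetical: everything the assertion site knows about the failure. */
typedef struct failure_info_t {
  const char *file;
  int line;
  const char *expr;
  risk_t risk;
} failure_info_t;

typedef action_t (*handler_t)(const failure_info_t *info);

/* One possible policy: raise on failed preconditions, abort otherwise. */
static action_t
risk_aware_handler(const failure_info_t *info)
{
  fprintf(stderr, "%s:%d: assertion failed (%s)\n",
          info->file, info->line, info->expr);
  return info->risk == risk_precondition ? action_raise : action_abort;
}

/* Process-global handler, as today; an admin knob could swap it. */
static handler_t current_handler = risk_aware_handler;

/* Returns 0 if CONDITION holds, -1 if the handler chose to raise.
   Does not return if the handler chose to abort. */
static int
check(int condition, const failure_info_t *info)
{
  if (condition)
    return 0;
  if (current_handler(info) == action_abort)
    abort();
  return -1;
}

/* Only usable in functions that return int, with -1 meaning "error". */
#define ASSERT_WITH_RISK(expr, risk)                                \
  do {                                                              \
    failure_info_t info_ = { __FILE__, __LINE__, #expr, (risk) };   \
    if (check(!!(expr), &info_) != 0)                               \
      return -1;                                                    \
  } while (0)

/* Stand-in for a public function that checks a precondition. */
static int
create_access(const char *username)
{
  ASSERT_WITH_RISK(username != NULL, risk_precondition);
  return 0;
}
```

Under this policy, create_access(NULL) hands an error back to its caller
instead of taking the process down, while an assertion tagged
risk_invariant would still abort.  Whether two risk levels are enough, and
who gets to choose the policy, is exactly the open question above.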