Re: fs_fs core dumps in checksum code

Blair Zajac Fri, 13 Apr 2012 13:08:45 -0700

On 04/13/2012 12:45 AM, Julian Foad wrote:

Blair Zajac wrote:

Since we discussed this, we moved the Subversion server to a new box and
from RAID to FusionIO storage and we're still getting the core dumps
with the same stack trace, so I don't think it's memory corruption.


I meant I suspect corruption of this process's state by any mechanism, which could be
buffer overflows, bad multi-threading, and so on.  You wrote before, "I'll run
our backend servers on the dev cluster in valgrind and see if we pick up anything
there."  Were you able to try that?  Or just load the core dump files into GDB and
see what you can see?

We didn't run valgrind; the box is in production and it would be too slow. Maybe I can set up a dev process on the production server to test.

To do valgrind well, do I need to recompile APR with specific flags to enable pool debugging?

Yesterday, we got two core dumps within 30 minutes of each other.

Would looking at the txn files in progress tell us anything?

[...]

Having the empty files, such as changes, is that odd?  Could that be a hint?


No, that's not interesting, that's just the result of crashing out at the point 
where it did -- in the middle of doing a commit.

So the 'changes' file is created during the commit process and not while building the transaction? If so, then having an empty changes file is odd and probably only possible through the RPCS API we wrote that wraps svn_fs.h and svn_repos.h, in which case, could there be a bug with trying to commit empty transactions in a multithreaded environment?


Blair
