On 04/13/2012 12:45 AM, Julian Foad wrote:
Blair Zajac wrote:

Since we discussed this, we moved the Subversion server to a new box and
from RAID to FusionIO storage and we're still getting the core dumps
with the same stack trace, so I don't think it's memory corruption.

I meant I suspect corruption of this process's state by any mechanism, which could be 
buffer overflows, bad multi-threading, and so on.  You wrote before, "I'll run our 
backend servers on the dev cluster in valgrind and see if we pick up anything 
there."  Were you able to try that?  Or just load the core dump files into GDB and 
see what you can see?

We didn't run valgrind; the box is in production and it would be too slow. Maybe I can set up a dev process on the production server to test.

To do valgrind well, do I need to recompile APR with specific flags to enable pool debugging?

Yesterday, we got two core dumps within 30 minutes of each other.

Would looking at the txn files in progress tell us anything?
[...]
Is having empty files, such as changes, odd?  Could that be a hint?

No, that's not interesting, that's just the result of crashing out at the point 
where it did -- in the middle of doing a commit.

So the 'changes' file is created during the commit process, not while building the transaction? If so, then having an empty changes file is odd and probably only possible through the RPCS API we wrote that wraps svn_fs.h and svn_repos.h. In that case, could there be a bug in committing empty transactions in a multithreaded environment?
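
To see how often this happens, something like the following could list the leftover transactions whose 'changes' file is empty. It is only a rough sketch: it assumes the usual FSFS on-disk layout, where each in-progress transaction is a directory named <id>.txn under db/transactions, and the function name and repository path are made up for illustration.

```python
import os


def find_empty_changes(repos_path):
    """Return names of in-progress txn dirs whose 'changes' file is zero bytes.

    Assumes the FSFS layout <repos>/db/transactions/<id>.txn/changes.
    """
    txns_dir = os.path.join(repos_path, "db", "transactions")
    empty = []
    if not os.path.isdir(txns_dir):
        return empty
    for name in sorted(os.listdir(txns_dir)):
        # Only transaction directories are of interest here.
        if not name.endswith(".txn"):
            continue
        changes = os.path.join(txns_dir, name, "changes")
        if os.path.isfile(changes) and os.path.getsize(changes) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Hypothetical repository path; substitute the real one.
    for txn in find_empty_changes("/path/to/repos"):
        print(txn)
```

This only reads directory entries and file sizes, so it should be safe to run against the live repository.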

Blair
