Philip Beevers wrote:
David McDaniel wrote:

Not sure if this is really the right board for this, but here goes. In a performance-critical, highly available application, when a misbehaving process core dumps, the creation of the core file (gigabytes in size) puts a lot of pressure on the ability to restart and recover in a timely fashion.
In my experience, this is down to memory pressure or simply the additional IO load of dumping out such a large core. I've seen particularly slow core dumping when the system has to swap pages back in simply to write them out to the core file! Worse still, there's a reasonable chance such a large core file will run you out of disk space.

Our application deals with this by stopping the dumping of core files entirely - we do this through ulimit (setting the max core file size to 0), but it can also be done with coreadm. We then have application code which catches the signals that cause core dumps, prints out the stack (using printstack(3C)) and exits; obviously you have to be very careful to use only functions which are async-signal-safe in such a handler. This removes the ability to poke around in the entrails of the core file, but does give you the key piece of information - where the process was when it crashed.
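In outline, the handler looks something like this - a minimal
sketch rather than our production code (the function names and the
exact signal list are illustrative, and real code needs more care
around threads and error handling):

#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <ucontext.h>	/* printstack(3C) is declared here on Solaris */
#include <unistd.h>

static void
crash_handler(int sig)
{
	/*
	 * printstack(3C) and _exit(2) are async-signal-safe; don't be
	 * tempted to call printf(3C), malloc(3C) or the like in here.
	 */
	(void) printstack(STDERR_FILENO);
	_exit(128 + sig);
}

static void
install_handlers(void)
{
	static const int sigs[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT };
	struct rlimit rl = { 0, 0 };	/* rlim_cur = rlim_max = 0 */
	struct sigaction sa;
	unsigned int i;

	/* The equivalent of "ulimit -c 0": no core file is written. */
	(void) setrlimit(RLIMIT_CORE, &rl);

	(void) memset(&sa, 0, sizeof (sa));
	sa.sa_handler = crash_handler;
	(void) sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESETHAND;	/* don't recurse if the handler faults */

	for (i = 0; i < sizeof (sigs) / sizeof (sigs[0]); i++)
		(void) sigaction(sigs[i], &sa, NULL);
}

int
main(void)
{
	install_handlers();
	abort();	/* demo: trigger the handler via SIGABRT */
	return (0);
}

SA_RESETHAND is there so that if the handler itself faults, the
second fault just kills the process with the default disposition
instead of looping forever.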

This isn't perfect - it would also be worth looking at what coreadm can give you. For example, I think you can simulate what we do - much more simply and reliably - by using coreadm to specify that just the stack (or perhaps the stack and heap, to give you the option of poking around in the entrails after the crash) should be dumped to a file. Our current approach evolved before coreadm was around, and I haven't got round to revisiting it.
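If memory serves, the content-selection options arrived with
Solaris 10; something like this from the shell that launches the
application should do it (untested on my part - the tokens are
from coreadm(1M), and the per-process settings are inherited by
child processes):

coreadm -P stack $$          # dump only the stack
coreadm -P stack+heap $$     # or stack plus heap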



I would think the perceptible performance problem is due to
flooding the disk w/ write requests; the read requests for your
next page fault end up stuck behind this flood of writes.

Besides running ZFS, which implements a rather clever IO
scheduler in the filesystem to avoid exactly this sort of
read starvation, the use of coreadm to put core files onto
disks or NFS servers that will cope w/ a flood of IO is
a good idea.  This would also help diagnose the
actual cause of the problem.
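For example, something along these lines as root (where /cores
stands in for whatever dedicated filesystem or NFS mount you've
set aside for the purpose):

coreadm -g /cores/core.%f.%p -e global

%f expands to the executable name and %p to the pid, so all cores
land in one predictable, well-provisioned place rather than the
crashing process's current directory. You can then use
"coreadm -d process" if you also want to turn off the per-process
core in the cwd.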

In general, we strongly encourage ISVs not to disable core
dumping, as it makes finding that once-every-six-months crash
very difficult indeed.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
[EMAIL PROTECTED]               http://blogs.sun.com/barts