Hi,

Andrew Morton wrote:
> On Thu, 29 Mar 2007 20:16:59 +0100 David Howells <[EMAIL PROTECTED]> wrote:
> 
>> Pavel Machek <[EMAIL PROTECTED]> wrote:
>> 
>>>> Userland core dumper is useful because it is relatively easy to
>>>> customize, but its reliability depends heavily on the application
>>>> programs.
>>> 
>>> Fix userland core dumper to be reliable, then.
>> 
>> I don't think it's that easy.  The userland core dumper, as I
>> understand it, has to work *within* an application program (it's a
>> library), thus the application program may scotch the core dumper in
>> a couple of ways:
> 
> That's no longer necessarily true with the recently-added
> dump-to-an-application feature:
> 
> core_pattern:
> ...
> . If the first character of the pattern is a '|', the kernel will treat
>   the rest of the pattern as a command to run.  The core dump will be
>   written to the standard input of that program instead of to a file.

I think dumping core over a pipe is almost good enough.  Filtering and
writing out a core in a separate userspace program can be reliable
because that program is independent of the failed process.

But I have one concern: data transfer over a pipe is slow.  It took
7 seconds to transfer 2GB of anonymous shared memory (details at the
end of this mail).  When dumping hundreds of processes which share
gigabytes of memory, this will take several minutes even if the huge
shared memory is never actually written to disk.  If a user wants to
restart the failed application as soon as possible to keep downtime to
a minimum, this extra transfer time will be a barrier.  So I think
in-kernel filtering is still needed.

I performed a simple experiment to confirm the transfer speed:

  1. run a program which allocates 2GB of anonymous shared memory
     (a sketch of such a program is appended at the end of this mail)
  2. the program receives SIGSEGV
  3. a core image is generated and passed over a pipe to tp_pipe(*)
  4. tp_pipe reads the pipe and discards the received data as it
     arrives
  5. tp_pipe logs the time at which the data transfer completes
  6. repeat 1 to 5 ten times and calculate the average and standard
     deviation

(*) a test program which I prepared for this experiment
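In essence tp_pipe just drains its standard input and records when the
stream ends.  A minimal sketch of such a reader, for illustration (this
is not the exact test program; error handling is trimmed, and the log
path is arbitrary):

/*
 * Sketch of a core_pattern pipe reader along the lines of tp_pipe.
 * Registered with something like:
 *
 *   echo '|/path/to/tp_pipe' > /proc/sys/kernel/core_pattern
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
	char buf[65536];
	unsigned long long total = 0;
	struct timeval start, end;
	ssize_t n;
	FILE *log;

	gettimeofday(&start, NULL);

	/* Read the core image from stdin and throw it away. */
	while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
		total += n;

	gettimeofday(&end, NULL);

	/* The helper has no tty, so log the transfer time to a file. */
	log = fopen("/tmp/tp_pipe.log", "a");
	if (log) {
		fprintf(log, "received %llu bytes in %.3f sec\n", total,
			(end.tv_sec - start.tv_sec) +
			(end.tv_usec - start.tv_usec) / 1e6);
		fclose(log);
	}
	return 0;
}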
Here is the result:

  Average:             7.041 sec
  Standard deviation:  0.903

And the cpuinfo and meminfo of my box:

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	: Intel(R) Xeon(TM) CPU 3.20GHz
stepping	: 1
cpu MHz		: 3200.343
cache size	: 1024 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips	: 6404.82
clflush size	: 64

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 15
model		: 4
model name	: Intel(R) Xeon(TM) CPU 3.20GHz
stepping	: 1
cpu MHz		: 3200.343
cache size	: 1024 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr
bogomips	: 6400.75
clflush size	: 64

$ cat /proc/meminfo
MemTotal:      8309240 kB
MemFree:       8216036 kB
Buffers:         11888 kB
Cached:          45200 kB
SwapCached:          0 kB
Active:          42324 kB
Inactive:        26008 kB
HighTotal:     7470896 kB
HighFree:      7405504 kB
LowTotal:       838344 kB
LowFree:        810532 kB
SwapTotal:    16386260 kB
SwapFree:     16386260 kB
Dirty:               4 kB
Writeback:           0 kB
AnonPages:       11232 kB
Mapped:           5828 kB
Slab:            11916 kB
SReclaimable:     5604 kB
SUnreclaim:       6312 kB
PageTables:        896 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  20540880 kB
Committed_AS:    33540 kB
VmallocTotal:   118776 kB
VmallocUsed:      8324 kB
VmallocChunk:   110408 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

Thanks,
-- 
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory
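P.S.  For reference, the allocator in step 1 amounts to something like
the following (an illustrative sketch, not the exact test program; run
it with "ulimit -c unlimited"):

/*
 * Allocate 2GB of anonymous shared memory, fault it all in, then die
 * with SIGSEGV so the kernel feeds the core image down the pipe.
 */
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

#define SIZE	(2UL * 1024 * 1024 * 1024)	/* 2GB */

int main(void)
{
	char *p;

	p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Touch every page so there really are 2GB to transfer. */
	memset(p, 0x5a, SIZE);

	/* Step 2: the program receives SIGSEGV and dumps core. */
	raise(SIGSEGV);
	return 0;
}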