Hi,

If it is not the processes, maybe it is the hard disk or the filesystem?
Log files being written by multiple users to different files can badly hurt the performance of the whole operating system if the hard disk isn't up to it. As I recall from my university sessions :) Cadence tends to write hundreds of small files and then bind them all into one big simulation file. I'm not sure whether that is tweakable; I wasn't the admin back then :)

On Tuesday 10 June 2008 20:14:27 Ira Abramov wrote:
> still at the client with the VLSI tools. Some of the users here are
> running heavy simulations (all userspace, almost 0 kernel time), at
> times a single process can hog the entire system. I have no idea how
> that happens, as this is a fairly modern kernel (the slightly older
> scheduler of RHEL4's 2.6.9) and the Cadence tools are not using
> lightweight procs, so all the load is on a single core (on a quad Xeon)
> and yet once it starts the whole machine is choked, and I can only hit
> the reset.
>
> step 1: I asked them all to nice down the jobs, but they are not very
> happy to. I'm trying to educate them and make them use wrappers (I'm
> introducing condor here anyway)
>
> step2: I have set up the root's .bashrc to renice me up to -4 and so I
> can keep a session active for the next time this happens and at least be
> able to run "top" and "kill"
>
> step3: I need a monitor to alert and maybe kill or renice such processes
> when they pop up and drag the machine down to a halt. till I find out
> who the culprit is, I don't have a procname and so "monit" is not a good
> choice. any other good ideas?
>
> step4: how do I log this without overlogging? some sort of a smart
> process auditing daemon? I don't want to improvise with shell scripts
> and cron, grepping from PS, because when the excrement impacts the venta
> it may not be able to run (unless I hike the crond's priority to a
> negative nice). I need a small reliable C proggy to do the right thing.
>
> the obvious is maybe to set some ulimits on the users, but I don't want
> to limit heavy processes that do NOT choke the system.

--
Noam Rathaus
CTO
[EMAIL PROTECTED]
http://www.beyondsecurity.com

"Know that you are safe."

Beyond Security
Finalist for the "Red Herring 100 Global" Awards 2007
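
One rough way to test the disk theory is to see how much CPU time the box spends waiting on I/O while a simulation runs. The sketch below is only an illustration, not something from the thread: it assumes a Linux /proc/stat whose aggregate "cpu" line lists user, nice, system, idle and iowait in that order (as described in proc(5)), and a Python recent enough to have the "with" statement. The function names and the 5 second default interval are made up for the example.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The aggregate line is labelled exactly "cpu"; per-core lines are cpu0, cpu1, ...
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples, in percent.

    Assumed field order (proc(5)): user nice system idle iowait irq softirq steal.
    Only the first eight fields are summed, so guest time is not counted twice
    on newer kernels; older kernels simply report fewer fields.
    """
    deltas = [new - old for old, new in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval_seconds=5.0, path="/proc/stat"):
    """Read /proc/stat twice, interval_seconds apart, and return the iowait share."""
    with open(path) as handle:
        first = parse_cpu_line(handle.read())
    time.sleep(interval_seconds)
    with open(path) as handle:
        second = parse_cpu_line(handle.read())
    return iowait_percent(first, second)
```

Calling sample_iowait() from a root shell while the machine is bogging down gives a single number. A consistently high iowait share would point at the disk or filesystem; a value near zero would point back at the processes themselves.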