On Thu, 17 Aug 2006, Petr Stehlik wrote:
> Finn Thain wrote:
> > > > difficult to reproduce the bug?
> > > It's kinda random.
> > 
> > In that case, it might be necessary to make the scheduler behave in 
> > a more deterministic way (maybe realtime priority?). Single-user 
> > mode would help.
> 
> I could try upgrading the sarge to etch in single-user mode to see if 
> it changes something.

Yes, but that won't really help to isolate a workload that fails every 
time. The upgrade will operate differently a second time. I guess you 
could back up the hard disk image first.

Single-user mode was just a way to try to eliminate non-deterministic 
scheduler behaviour in the interests of repeatability, by making sure 
that there were no other runnable processes in the system.

> > I'd create a script, say /root/crash.sh, make it executable, and 
> > boot the kernel with "init=/root/crash.sh". In crash.sh I'd run 
> > some single-threaded stress tests.
> > 
> > http://samba.org/ftp/tridge/dbench/README
> > http://weather.ou.edu/~apw/projects/stress/
> > http://www.bitmover.com/lmbench/
> 
> FYI, I have just finished the following test:
> # stress -c 4 -i 16 -m 3 --vm-bytes 32M -d 4 --hdd-bytes 128M
> 
> It's been running for almost 5 hours. No problem detected. On another 
> console I ran "while true; do uptime; sleep 300; done" and saw a 
> consistent load of 28-29.

That is a long run queue. If you did find a problem that way, it could 
be very hard to reproduce because of the interactions of all the tasks.

> So the machine was busy stressing CPU, memory and disk but it didn't 
> detect anything wrong.

Well, maybe we need to concentrate on I/O. I'd try continuous tripwire 
checks, or a similar intrusion detection system.

> > If you can't reproduce the problem that way, I'd try introducing 
> > more context switching into the workload.
> 
> like stress -c 1k instead of -c 4?

To get a single-threaded test, I'd be trying -c 0 -i 0 -m 0, but maybe 
one fork is the minimum (?)

> > > #!/usr/bin/perl
> > 
> > Are you sure the problem was not confined to the buffer cache?
> 
> I am not sure at all.

If we are going to test disk I/O, we must find a way to disable the 
buffer cache completely. Does anyone know how to do this?

-f

> > Re-reading the same file after an unmount/remount would determine 
> > that.
> 
> will try the next time.
> 
> Petr
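
P.S. To make the crash.sh idea concrete, something along these lines 
might do. It's an untested sketch: the mount commands and the stress 
arguments are only examples, to be adjusted for the machine at hand.

  #!/bin/sh
  # Running as init (PID 1), so mount /proc and make root writable first.
  mount -t proc proc /proc
  mount -o remount,rw /
  # One single-threaded disk stressor; the size is only an example.
  stress -d 1 --hdd-bytes 128M
  # Keep PID 1 alive so the kernel doesn't panic when the test finishes.
  exec /bin/sh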
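
P.P.S. One way to do the unmount/remount comparison might be the 
following (the device name, mount point and file name are just 
placeholders):

  md5sum /mnt/test/bigfile > /tmp/sum.before
  umount /mnt/test
  mount /dev/hda1 /mnt/test
  md5sum /mnt/test/bigfile > /tmp/sum.after
  cmp /tmp/sum.before /tmp/sum.after && echo OK || echo MISMATCH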