On Saturday 5 May 2007, Jan Ploski wrote:
> Jeff Dike wrote:
> > On Fri, May 04, 2007 at 07:30:36PM +0200, Jan Ploski wrote:
> >>I am experimenting with UML in an HPC cluster. What I do is basically
> >> start up 60 instances all at once, a bunch of instances on each hardware
> >> node, using the resource manager TORQUE. Each instance gets a different
> >> umid. The instances are configured to boot up, execute a job and halt
> >> after that. Most of the time it works very well. However, every now and
> >> then some instance of the 60 will get stuck with the infamous "INIT: Id
> >> 0 respawning too fast" message at boot and consequently neither run the
> >> job nor terminate.
> >>
> >>So far I have found mentions of two possible causes for this problem: 1)
> >>wrong name of the tty device in inittab 2) /lib/tls problem. Neither
> >>applies in my case (/dev/tty0 is correct, and I have already renamed
> >>/lib/tls, just in case).
> >
> > These would cause problems all the time, not sporadically as you're
> > seeing.
> >
> >>As I can reproduce the problem "statistically" (quite reliably in the
> >>cluster context) but not at will when running a single instance from the
> >>command line, my question is: how should I proceed about troubleshooting
> >>it? Are there any locations in the UML kernel code where I could insert
> >>some debug statements (or maybe delays? maybe the problem is
> >>timing-related somehow?) to gather useful diagnostic information?
> >
> > Is it possible that it is caused by confusion about how quickly real
> > time is progressing compared to how much computation is happening in
> > that time?  By default, UML will match its time to the host, with the
> > effect that, on a busy system, it will see time progressing quickly
> > compared to the work it's doing.
> >
> > If so, then disable CONFIG_UML_REAL_TIME_CLOCK, and use
> > 2.6.21-rc7-mm2, which has a fix in this area, and see if that makes
> > any difference.
>
> Jeff,
>
> I'm having trouble applying the 2.6.21-rc7-mm2 patch against 2.6.21
> sources - lots of rejected hunks (but not all) when I run patch -p1 <
> 2.6.21-rc7-mm2, and the kernel does not compile after that. I have never
> used mm kernels before and Google did not help identify my mistake. Can
> you give me a hint about how/against which target to apply this patch?

The -mm patch is made against 2.6.21-rc7, not 2.6.21 final, so it will
apply cleanly on top of 2.6.21-rc7, which you can find here:

http://www.kernel.org/pub/linux/kernel/v2.6/testing/

Patch (on top of 2.6.20):
http://www.kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.21-rc7.bz2
or full (40M) tarball:
http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.21-rc7.tar.bz2
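
In case it helps, here is a rough sketch of the order of operations. It
only prints the commands instead of running them. It assumes the -mm patch
file is named 2.6.21-rc7-mm2 (as in your command) and sits one directory
above the unpacked tree; mm_base and mm_steps are names I made up for
illustration, so adjust paths and names to your setup.

```sh
#!/bin/sh
# Dry-run sketch: prints the commands, does not execute them.
# Assumption: the -mm patch file is named after its version
# (e.g. 2.6.21-rc7-mm2) and lives one directory above the source tree.

# Strip the "-mmN" suffix to get the release the -mm patch is based on.
mm_base() {
    printf '%s\n' "${1%-mm*}"
}

# Print the steps for applying an -mm patch to a freshly unpacked base tree.
mm_steps() {
    base=$(mm_base "$1")
    echo "tar xjf linux-$base.tar.bz2"
    echo "cd linux-$base"
    # Check that every hunk applies before touching the tree.
    echo "patch -p1 --dry-run < ../$1"
    echo "patch -p1 < ../$1"
}

mm_steps 2.6.21-rc7-mm2
```

If the --dry-run step reports rejected hunks, the tree is most likely not
a pristine 2.6.21-rc7.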
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade

_______________________________________________
User-mode-linux-user mailing list
User-mode-linux-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user