On Mon, Oct 21, 2019 at 12:06:32PM +0100, Ian Jackson wrote:
> Jürgen Groß writes ("Re: [Xen-devel] [xen-unstable test] 142973: regressions - FAIL"):
> > On 21.10.19 10:23, osstest service owner wrote:
> > > flight 142973 xen-unstable real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/142973/
> > > 
> > > Regressions :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >   test-amd64-amd64-xl-pvshim   18 guest-localmigrate/x10   fail REGR. vs. 142750
> > 
> > Roger, I believe you have looked into that one?
> > 
> > I guess the conversation via IRC with Ian regarding the race between
> > blkback and OSStest was related to the issue?
> I think this failure is something else.

I agree.

> What happens here is this:
> 2019-10-21 02:58:32 Z executing ssh ... -v root@ date 
> [bunch of output from ssh]
> status (timed out) at Osstest/TestSupport.pm line 550.
> 2019-10-21 02:58:42 Z exit status 4
> is the guest here.  Ie, `ssh date guest' took longer
> than 10s.
> We can see that the guest networking is working soon after the
> migration because we got most of the way through the ssh protocol
> exchange.  On the previous repetition the next message from ssh was
>    debug1: SSH2_MSG_SERVICE_ACCEPT received
> Looking at
> http://logs.test-lab.xenproject.org/osstest/logs/142973/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest--incoming.log
> which is, I think, the log of the "new" instance of guest, after
> migration, there are messages about killing various services.  Eg
>   [1918064738.820550] systemd[1]: systemd-udevd.service: Main process
>   exited, code=killed, status=6/ABRT
> They don't seem to be normal.  For example:
> http://logs.test-lab.xenproject.org/osstest/logs/142865/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest--incoming.log
> is the previous xen-unstable flight and it doesn't have them.  I
> looked in
> http://logs.test-lab.xenproject.org/osstest/logs/142865/test-amd64-amd64-xl-pvshim/rimava1---var-log-xen-console-guest-debian.guest.osstest.log.gz
> too and that has some alarming messages from the kernel like
>  [  686.692660] rcu_sched kthread starved for 1918092123128 jiffies!
>  g18446744073709551359 c18446744073709551358 f0x0 RCU_GP_WAIT_FQS(3)
>  ->state=0x0 ->cpu=0
> and accompanying stack traces.  But the test passed there.  I think
> that is probably something else ?

AFAICT there's corruption when migrating and also some kind of
lockup; I'm not yet sure whether the two are related.

> ABRT suggests guest memory corruption.
> > If this is the case, could you, Ian, please add the workaround you were
> > thinking of to OSStest (unconditional by now, maybe make it conditional
> > later)?
> I can add the block race workaround but I don't think it will help
> with migration anyway.  The case where things go wrong is destroy.
> Roger, am I right that a normal guest shutdown is race-free ?  I think
> we tear things down in a slower manner and will therefore end up
> waiting for blkback ?  Or is that not true ?

It doesn't really matter whether shutdown or destroy is used. The
issue is that blkback switches to state 6 (Closed) before the disk is
closed, and hence there's no way for the toolstack to detect when the
disk has actually been released.
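
To make the state numbers concrete, here is a minimal sketch of the
xenbus state values together with a check of the kind a toolstack
might apply to the backend state. The helper name looks_released() is
hypothetical and is not actual libxl or osstest code; it only shows
why seeing state 6 is not sufficient proof that the disk was released.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    """Numeric xenbus device states as written to the xenstore 'state' node."""
    UNKNOWN = 0
    INITIALISING = 1
    INIT_WAIT = 2
    INITIALISED = 3
    CONNECTED = 4
    CLOSING = 5
    CLOSED = 6
    RECONFIGURING = 7
    RECONFIGURED = 8


def looks_released(backend_state):
    """Hypothetical toolstack-side check: treat Closed as 'device released'.

    xenstore values are strings, so accept either "6" or 6.
    This is exactly the check that is racy if the backend writes
    Closed before it has really closed the underlying disk.
    """
    return XenbusState(int(backend_state)) is XenbusState.CLOSED
```

With this check, a backend that reports "6" early is indistinguishable
from one that has actually let go of the disk.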

> Maybe the right workaround is to disable the code in osstest which
> tries to clean up a previous failed run.  I think the kernel doesn't
> mind multiple blkfronts (or indeed multiple other tasks) using the
> same device at once.

Since the action when the disk is found to be in use is to try to
unmount it, maybe osstest should make sure the disk is actually
mounted first by parsing the output of mount? (or maybe there's a
better way to do it)
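
As a rough sketch of what I mean (in Python for illustration only;
osstest itself is Perl, and the function name, sample device paths and
sample output below are all made up), parsing the usual
"DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" lines printed by mount
could look like this:

```python
import re

# Matches one line of `mount` output, e.g.
#   /dev/sda1 on /boot type ext4 (rw,relatime)
MOUNT_LINE = re.compile(r"^(\S+) on (.+) type (\S+) \((.*)\)$")


def mount_points(mount_output, device):
    """Return the mount points on which `device` appears in `mount` output."""
    points = []
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m and m.group(1) == device:
            points.append(m.group(2))
    return points


# Made-up sample output, for illustration only.
SAMPLE = """\
/dev/sda1 on / type ext4 (rw,relatime)
/dev/mapper/vg-guest--disk on /mnt/guest type ext3 (rw)
tmpfs on /run type tmpfs (rw,nosuid)
"""

if __name__ == "__main__":
    # Only attempt the unmount if the device is really mounted.
    if mount_points(SAMPLE, "/dev/mapper/vg-guest--disk"):
        print("mounted, would unmount")
    else:
        print("not mounted, nothing to unmount")
```

Note the comparison is on the literal device string, so a disk that is
mounted under a different name (for example via a symlink) would not
be matched by this sketch.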

Thanks, Roger.
