On Feb 9, 2015, at 18:51, James Gritton <ja...@freebsd.org> wrote:

> On 2015-02-06 22:23, Garrett Cooper wrote:
>> On Feb 6, 2015, at 18:38, James Gritton <ja...@freebsd.org> wrote:
>>> On 2015-02-06 19:23, Garrett Cooper wrote:
>>>> I think you broke the Jenkins test runs, and potentially jail support
>>>> in some edge cases:
>>>> https://jenkins.freebsd.org/job/FreeBSD_HEAD-tests2/651/
>>> Where do I go from here?  The error you refer to certainly seems
>>> jail-related, which leads me to guess at something disconnected between the 
>>> matching rc.d/jail and jail(8) change (i.e. using the new rc file with the 
>>> old jail program).  But that's really just a wild guess.  Is there
>>> somewhere I can look for more information?  For example, where does Jenkins
>>> actually do its thing?
>>> Sorry for being so stupid in this - Jenkins has only been on the very edge 
>>> of my awareness until now.
>> I honestly don’t think it’s Jenkins because Jenkins runs in bhyve. I
>> think you accidentally broke option handling in the jail configuration
>> (please see my other reply about added “break;” statements).
>> ...
>> You can verify your changes by doing:
>> % (cd /usr/tests/bin/pkill; sudo kyua test)
> 
> After some testing and looking around, I've decided the problem definitely 
> isn't in rc.d where I thought it might be.  I've also decided it's probably 
> not in my patch either.
> 
> I've run this kyua test on a 10 system (don't have current handy for such 
> things at the moment), and sometimes I would see a failure and sometimes I 
> wouldn't.  This was whether I was using the new or old jail code.  Later in 
> the day, when the box was less loaded, it seemed to always pass.  Looking at 
> the pkill-j_test script, I see jails being created with sleep commands both 
> inside and outside the jail around its creation.  I'm guessing this script is 
> very sensitive to timing issues that could be caused by (among other things)
> system load.  The jail commands in this script were also very simple, with 
> the only parameters used being: path, name, ip4.addr, and command.  This 
> isn't some kind of esoteric exercising of the jail(8) options, and I would 
> expect if it works at one time it would work at another.  I've "hand-run" 
> these particular jail commands and couldn't get them to fail (and the actual
> content of the jail(8) changes was tested already).
> 
> I looked at the freebsd-current (I think) list where the Jenkins errors are 
> posted, and it's true it started failing the pkill-j test at the time I made 
> my change.  But it's also true that it had failed that test once the day 
> before my change, and then started passing it again.  This particular test 
> just seems to be fragile.
> 
> So I don't have anywhere else to go with this.  I'm going to assume jail(8) 
> isn't the problem here.

The tests are racy and make some interesting assumptions. It appears that 
WITNESS plays a part in it, and I bet VIMAGE (something that I don’t have in my 
kernel config) plays a part in it too. I say this because I just ran into the 
issue when running the tests in a tight loop on my VMware Workstation 7
instance with code from r278636.

That doesn’t surprise me, because before r272305 it was failing consistently
on head, so what Craig did in that commit helped, but it didn’t fully fix the
raciness of the tests.

I’m going to recompile my system with VIMAGE and see if that impacts 
performance of the tests, and if so, I’ll adjust the sleep between setting up 
the jailed instances, and waiting for them to be fully formed.
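
One alternative to tuning a fixed sleep is to poll for readiness with a bounded number of retries. The following is only a minimal sketch, not what the test scripts currently do: the helper name wait_for and the variable names are hypothetical, and it assumes readiness can be detected by some command that exits 0 once the jailed processes are up.

```shell
#!/bin/sh
# wait_for TRIES DELAY CMD [ARGS...]
# Hypothetical helper (not part of the existing test scripts):
# run CMD until it succeeds, at most TRIES times, sleeping DELAY
# seconds after each failed attempt.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_for() {
    wf_tries=$1
    wf_delay=$2
    shift 2
    while [ "$wf_tries" -gt 0 ]; do
        "$@" && return 0
        wf_tries=$((wf_tries - 1))
        sleep "$wf_delay"
    done
    return 1
}
```

For example, wait_for 50 0.1 test -s "$pidfile" would wait up to roughly five seconds for a (hypothetical) pidfile to become non-empty. Fractional arguments work with FreeBSD's sleep(1) but are not guaranteed by POSIX. A loaded box then only costs extra retries instead of failing the test outright.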

Thanks!

$ while : ; do sudo prove -rv pgrep-j_test.sh || break; done
pgrep-j_test.sh .. 
1..3
usage: pgrep [-LSfilnoqvx] [-d delim] [-F pidfile] [-G gid] [-M core] [-N system]
             [-P ppid] [-U uid] [-c class] [-g pgrp] [-j jid]
             [-s sid] [-t tty] [-u euid] pattern ...
not ok 1 - pgrep -j <jid> # pgrep output: '', pidfile output: '74275 74278'
ok 2 - pgrep -j any
ok 3 - pgrep -j none
Failed 1/3 subtests 

Test Summary Report
-------------------
pgrep-j_test.sh (Wstat: 0 Tests: 3 Failed: 1)
  Failed test:  1
Files=1, Tests=3,  5 wallclock secs ( 0.04 usr  0.02 sys +  0.02 cusr  0.55 csys =  0.63 CPU)
Result: FAIL
