Re: Rule #2 should die

John Meinel Thu, 04 Jun 2015 20:41:56 -0700

Could it just be a lower-priority catch-all rule, with the other rules
evaluated first?
It does seem useful to have a backstop against bad behavior when we
introduce a new failure mode that no specific rule catches yet.
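A minimal Python sketch of the ordering suggested above: specific rules are tried first, and the timeout rule is kept as a lowest-priority backstop. The thread does not say how juju-reports stores or evaluates its rules, so the `Rule` type, the priority numbers, the rule names, and `classify` are all hypothetical. The specific patterns are lifted from the log excerpts quoted in Martin's message below, and the catch-all's `Build timed out` marker is an assumed stand-in for whatever the CI harness actually logs when it interrupts a test.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical convention: a lower number is evaluated earlier.
    priority: int
    name: str
    pattern: "re.Pattern[str]"


# Hypothetical rules. Patterns come from the log excerpts in this thread,
# not from juju-reports' real rule definitions.
RULES = [
    Rule(10, "gce-instance-request-failed",
         re.compile(r"sending new instance request: GCE operation\s.*?\sfailed",
                    re.S)),
    Rule(10, "tools-download-connect-failed",
         re.compile(r"curl: \(7\) couldn't connect to host")),
    Rule(10, "bootstrap-ssh-timeout",
         re.compile(r"waited for \S+ without being able to connect")),
    # Catch-all backstop: the harness gave up on the test. The marker text
    # is an assumption about what the CI harness writes on timeout.
    Rule(1000, "ci-timeout-catch-all", re.compile(r"Build timed out")),
]


def classify(log_text: str, rules: Iterable[Rule] = RULES) -> Optional[str]:
    """Return the name of the first matching rule, most specific first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.pattern.search(log_text):
            return rule.name
    return None  # no rule matched: a genuinely new, unclassified failure
```

Because `sorted` is stable, rules sharing a priority keep their list order. A log containing both a specific symptom and the harness timeout is attributed to the specific rule, while a log showing only the timeout still lands in the catch-all instead of going unclassified.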


John
=:->


On Fri, Jun 5, 2015 at 1:42 AM, Martin Packman <[email protected]> wrote:

> Currently juju-reports has a rule matching on failures where our CI
> harness interrupted the test because it took too long:
>
> <http://reports.vapour.ws/releases/rules/2>
>
> This seems too generic a symptom. Generally, if a test does not
> complete within the time we've allocated for it, there's another
> indication in the log, often the final `juju status` output, that
> makes it clearer why juju never finished its work.
>
> Checking over the recent matches for rule #2:
>
> <http://reports.vapour.ws/releases/2733/job/gce-deploy-trusty-amd64/attempt/265>
>
>   "2":
>     agent-state-info: 'sending new instance request: GCE operation
>       "operation-1433442378645-517b54fc8fe09-e1a9dd29-092d6cff"
>       failed'
>     instance-id: pending
>
> GCE failed to give us an instance.
>
>
> <http://reports.vapour.ws/releases/2732/job/canonistack-deploy-trusty-amd64/attempt/3092>
>
>   "1":
>     agent-state: pending
>     dns-name: 10.55.32.175
>     instance-id: ee22b864-47e9-4931-8bcb-92bbbe08f05e
>     instance-state: ACTIVE
>
> <http://data.vapour.ws/juju-ci/products/version-2732/canonistack-deploy-trusty-amd64/build-3092/machine-1/cloud-init.log.gz>
>
>     Jun  4 14:57:24 juju-canonistack-deploy-trusty-amd64-machine-1
>     [CLOUDINIT] util.py[DEBUG]: Running command ['eatmydata', 'apt-get',
>     '--option=Dpkg::Options::=--force-confold',
>     '--option=Dpkg::options::=--force-unsafe-io', '--assume-yes',
>     '--quiet', 'install', 'curl', 'cpu-checker', 'bridge-utils',
>     'rsyslog-gnutls', 'cloud-utils', 'cloud-image-utils', 'tmux'] with
>     allowed return codes [0] (shell=False, capture=False)
>
> Super slow canonistack machine, still crawling along installing
> packages when we gave up.
>
>
> <http://reports.vapour.ws/releases/2732/job/functional-backup-restore/attempt/2702>
>
>     error: cannot re-bootstrap environment: cannot bootstrap new instance:
>     waited for 10m0s without being able to connect: ssh: connect to host
>     10.0.0.247 port 22: Connection timed out
>
> Not the best log, but it seems clear we never got a usable bootstrap
> machine to restore into.
>
>
> <http://reports.vapour.ws/releases/2732/job/joyent-deploy-precise-amd64/attempt/2145>
>
>   "1":
>     agent-state: pending
>     dns-name: 165.225.128.214
>     instance-id: fc67a2b4-00ab-4571-e947-ebd68fd54f9b
>     instance-state: running
>
> <http://data.vapour.ws/juju-ci/products/version-2732/joyent-deploy-precise-amd64/build-2145/machine-1/cloud-init-output.log.gz>
>
>     Attempt 1 to download tools from
>     https://10.112.2.15:17070/tools/1.25-alpha1-precise-amd64...
>     + curl -sSfw tools from %{url_effective} downloaded: HTTP
>     %{http_code}; time %{time_total}s; size %{size_download} bytes; speed
>     %{speed_download} bytes/s  --noproxy * --insecure -o
>     /var/lib/juju/tools/1.25-alpha1-precise-amd64/tools.tar.gz
>     https://10.112.2.15:17070/tools/1.25-alpha1-precise-amd64
>     curl: (7) couldn't connect to host
>
> Joyent network issue: <https://bugs.launchpad.net/juju-core/+bug/1451104>
>
>
> That all the recent matches for the timeout rule have more useful and
> specific explanations (some, unfortunately, requiring a look at other
> log files for the full details) suggests we want those as rules rather
> than this one.
>
> Martin
>
> --
> Juju-dev mailing list
> [email protected]
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
