Is it possible for us to close more gracefully most of the time?

John
=:->
On Feb 24, 2017 08:27, "Tim Penhey" <tim.pen...@canonical.com> wrote:

> OK, I think I've got it now...
>
> This is all a bit crazy, and it comes down to the gorilla/websocket
> change.
>
> So... what happens on a successful restore is that the server side
> calls os.Exit(...), which has pid 1 restart the agent. From the API
> client's perspective, however, this is an abnormal closure.
>
> In the rpc layer I capture a number of websocket close errors as
> "normal", but I missed the abnormal closure case, which is code 1006.
>
> I'll update and re-propose to develop.
>
> Huzzah.
>
> Tim
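For illustration, here is a minimal sketch of what "capturing 1006 as a normal close" could look like with gorilla/websocket. This is not Juju's actual rpc code; isNormalClose and readLoop are hypothetical names, but IsCloseError and the close-code constants are the real gorilla/websocket API.

```go
// Package example sketches how an rpc client might treat close code 1006
// as an expected shutdown. Illustrative only, not Juju's code.
package example

import (
	"log"

	"github.com/gorilla/websocket"
)

// isNormalClose reports whether err is a websocket close that the client
// should treat as orderly. Code 1006 (abnormal closure) is included here
// because the server deliberately exits after a successful restore, so
// the dropped connection is expected rather than an error.
func isNormalClose(err error) bool {
	return websocket.IsCloseError(err,
		websocket.CloseNormalClosure,    // 1000
		websocket.CloseGoingAway,        // 1001
		websocket.CloseNoStatusReceived, // 1005
		websocket.CloseAbnormalClosure,  // 1006
	)
}

// readLoop reads until the connection closes, distinguishing expected
// closes from genuine read errors.
func readLoop(conn *websocket.Conn) {
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			if isNormalClose(err) {
				log.Println("connection closed normally:", err)
			} else {
				log.Println("read error:", err)
			}
			return
		}
		log.Printf("received: %s", msg)
	}
}
```

The point is simply that a connection dropped by a deliberate server exit should be reported as an orderly shutdown, rather than surfacing as the "codec.ReadHeader error" seen in the CI logs quoted below.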
>
> On 24/02/17 16:17, Tim Penhey wrote:
>> Hi Curtis (also expanding to juju-dev),
>>
>> I have been looking into this issue, and the good news is that it
>> doesn't appear to be a real problem with gorilla/websocket at all;
>> instead, a change in timing exposed an existing issue that hadn't
>> surfaced before.
>>
>> I'll be looking into that issue - the restore command, after
>> bootstrapping, doesn't appear to retry if it gets an error like
>> "denied: upgrade in progress".
>>
>> Secondly, I tried to reproduce on lxd, only to find that there is an
>> issue with the rebootstrap and lxd - it just doesn't work.
>>
>> Then I tried with AWS, to mirror the CI test as closely as possible.
>> I didn't hit the same timing issue as before, but instead got a
>> different failure with the mongo restore:
>>
>> http://pastebin.ubuntu.com/24056766/
>>
>> I have no idea why juju.txns.stash failed but juju.txns and
>> juju.txns.logs succeeded.
>>
>> Also, a CI run of a develop revision just before the gorilla/websocket
>> reversion hit this:
>>
>> http://reports.vapour.ws/releases/4922/job/functional-ha-backup-restore/attempt/5045#highlight
>>
>> cannot create collection "txns": unauthorized mongo access: not
>> authorized on juju to execute command { create: "txns" }
>> (unauthorized access)
>>
>> I'm not sure why that is happening either. It seems that the mongo
>> restore is incredibly fragile.
>>
>> Again, this shows errors in the restore code, but luckily it has
>> nothing to do with gorilla/websocket.
>>
>> Tim
>>
>> On 23/02/17 04:02, Curtis Hovey-Canonical wrote:
>>> Hi Tim, et al.
>>>
>>> All the restore-backup tests on all the substrates failed with your
>>> recent gorilla/websocket commit. The restore-backup command often
>>> fails when bootstrap or connection behaviours change. This new bug
>>> is definitely a connection failure while the client is driving a
>>> restore.
>>>
>>> We need the develop branch fixed. As the previous commit was blessed,
>>> we are certain 2.2-alpha1 was in very good shape before the gorilla
>>> change.
>>>
>>> Restore backup failed websocket: close 1006
>>> https://bugs.launchpad.net/juju/+bug/1666898
>>>
>>> As seen at
>>> http://reports.vapour.ws/releases/issue/5550dda7749a561097cf3d44
>>>
>>> All the restore-backup tests failed when testing commit
>>> https://github.com/juju/juju/commit/f06c3e96f4e438dc24a28d8ebf7d22c76fff47e2
>>>
>>> We see:
>>>
>>> Initial model "default" added.
>>> 04:54:39 INFO juju.juju api.go:72 connecting to API addresses: [52.201.105.25:17070 172.31.15.167:17070]
>>> 04:54:39 INFO juju.api apiclient.go:569 connection established to "wss://52.201.105.25:17070/model/89bcc17c-9af9-4113-8417-71847838f61a/api"
>>>
>>> ...
>>>
>>> 04:55:20 ERROR juju.api.backups restore.go:136 could not clean up after failed restore attempt: <nil>
>>> 04:55:20 ERROR cmd supercommand.go:458 cannot perform restore: <nil>: codec.ReadHeader error: error receiving message: websocket: close 1006 (abnormal closure): unexpected EOF
>>>
>>> This is seen in aws, prodstack, and gce.
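Tim notes above that the restore command doesn't retry when it gets an error like "denied: upgrade in progress". As a hedged sketch of that missing behaviour (retryWhileUpgrading is a hypothetical helper, and real code would match a typed API error rather than the message text):

```go
// Package example sketches the missing retry behaviour: keep retrying a
// restore call while the controller reports an upgrade in progress.
// Hypothetical, not Juju's actual restore client.
package example

import (
	"fmt"
	"strings"
	"time"
)

// retryWhileUpgrading retries fn with exponential backoff for up to
// maxWait, as long as the failure looks like a transient
// "upgrade in progress" denial.
func retryWhileUpgrading(fn func() error, maxWait time.Duration) error {
	deadline := time.Now().Add(maxWait)
	delay := time.Second
	for {
		err := fn()
		if err == nil {
			return nil
		}
		// Matching on the message text is a stand-in for illustration;
		// real code would inspect a typed API error.
		if !strings.Contains(err.Error(), "upgrade in progress") {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("gave up waiting for upgrade: %v", err)
		}
		time.Sleep(delay)
		if delay < 30*time.Second {
			delay *= 2
		}
	}
}
```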