Hello,
With the following (simplified) configuration, all the servers go into
maintenance mode, and stay there, about 30s after startup or a config
reload. I guess that delay is the default "hold timeout" of the resolvers
section. These log lines get emitted:

2018-01-08T15:38:10.209195+00:00: Proxy dockercloud_hello-world started.
2018-01-08T15:38:10.209200+00:00: Proxy tutum_hello-world started.
[...]
2018-01-08T15:38:41.565222+00:00: Server tutum_hello-world/tutum1 is going DOWN for maintenance (DNS timeout status). 3 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565233+00:00: Server tutum_hello-world/tutum2 is going DOWN for maintenance (DNS timeout status). 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565238+00:00: Server tutum_hello-world/tutum3 is going DOWN for maintenance (DNS timeout status). 1 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565244+00:00: Server tutum_hello-world/tutum4 is going DOWN for maintenance (DNS timeout status). 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565250+00:00: Server dockercloud_hello-world/dockercloud1 is going DOWN for maintenance (DNS timeout status). 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.

resolvers rancher
    nameserver dnsmasq 169.254.169.250:53

backend dockercloud_hello-world
    default-server inter 2000 rise 2 fall 3 port 80
    server sorry 127.0.0.1:8082 backup
    server-template dockercloud 1 hello-world.dockercloud.rancher.internal:80 resolvers rancher check

backend tutum_hello-world
    default-server inter 2000 rise 2 fall 3 port 80
    server sorry 127.0.0.1:8082 backup
    server-template tutum 4 hello-world.tutum.rancher.internal:80 resolvers rancher check

I captured the DNS traffic with tcpdump and noticed that the answers to
the AAAA queries were empty.
(full output of Wireshark's "text export" here:
https://gist.github.com/mfournier/0642a32df759ee0b1fbbd505a862f191)
Simply adding "resolve-prefer ipv4" makes the symptom go away, so no big
deal. But I wanted to point this out, as it might bite others, and I'm
pretty sure 1.7.x didn't have this issue.
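
A minimal sketch of the workaround, assuming the option is placed directly
on the server-template line of the tutum backend shown above (the placement
is illustrative and may differ from where it was added in the real config):

```haproxy
backend tutum_hello-world
    default-server inter 2000 rise 2 fall 3 port 80
    server sorry 127.0.0.1:8082 backup
    # "resolve-prefer ipv4" asks the resolver to prefer A records for these
    # servers (placement on this line is an assumption for illustration)
    server-template tutum 4 hello-world.tutum.rancher.internal:80 resolvers rancher resolve-prefer ipv4 check
```

With this in place the empty AAAA answers should no longer decide the
servers' state, which matches the symptom going away.
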
Also, it doesn't seem right to me that a whole backend can get knocked
out by an incomplete DNS config (i.e. the setup works well for the first
30 seconds, as long as only the IPv4 A records get considered).
Thanks!
Marc