Hello,

With the following (simplified) configuration, all the servers go into
(and stay in) maintenance mode about 30s after startup or a config
reload. I guess that is the default "hold timeout" of the resolvers
section. These log lines get emitted:

2018-01-08T15:38:10.209195+00:00:  Proxy dockercloud_hello-world started.
2018-01-08T15:38:10.209200+00:00:  Proxy tutum_hello-world started.
[...]
2018-01-08T15:38:41.565222+00:00:  Server tutum_hello-world/tutum1 is going DOWN for maintenance (DNS timeout status). 3 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565233+00:00:  Server tutum_hello-world/tutum2 is going DOWN for maintenance (DNS timeout status). 2 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565238+00:00:  Server tutum_hello-world/tutum3 is going DOWN for maintenance (DNS timeout status). 1 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565244+00:00:  Server tutum_hello-world/tutum4 is going DOWN for maintenance (DNS timeout status). 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.
2018-01-08T15:38:41.565250+00:00:  Server dockercloud_hello-world/dockercloud1 is going DOWN for maintenance (DNS timeout status). 0 active and 1 backup servers left. Running on backup. 0 sessions active, 0 requeued, 0 remaining in queue.


resolvers rancher
    nameserver dnsmasq 169.254.169.250:53

backend dockercloud_hello-world
    default-server inter 2000 rise 2 fall 3 port 80
    server sorry 127.0.0.1:8082 backup
    server-template dockercloud 1 hello-world.dockercloud.rancher.internal:80 resolvers rancher check

backend tutum_hello-world
    default-server inter 2000 rise 2 fall 3 port 80
    server sorry 127.0.0.1:8082 backup
    server-template tutum 4 hello-world.tutum.rancher.internal:80 resolvers rancher check
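
To check the "hold timeout" guess, the resolvers section could set that
value explicitly; if the guess is right, the delay before the servers go
into maintenance should follow it. This is an untested sketch, and 60s is
an arbitrary value picked only to differ from the observed 30s:

resolvers rancher
    nameserver dnsmasq 169.254.169.250:53
    hold timeout 60s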


I tcpdumped the DNS traffic and noticed the AAAA answers were empty.
(full output of wireshark's "text export" here:
https://gist.github.com/mfournier/0642a32df759ee0b1fbbd505a862f191)

Simply adding "resolve-prefer ipv4" makes the symptom go away, so no big
deal. But I wanted to point this out, as it might bite others, and I'm
pretty sure 1.7.x didn't have this issue.
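
For reference, the workaround looks roughly like this. It is shown for one
backend only, and it assumes the keyword goes on the server-template line
itself, since "resolve-prefer" is a per-server setting:

backend tutum_hello-world
    default-server inter 2000 rise 2 fall 3 port 80
    server sorry 127.0.0.1:8082 backup
    server-template tutum 4 hello-world.tutum.rancher.internal:80 resolvers rancher resolve-prefer ipv4 check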

Also, it doesn't seem right to me that a whole backend can get knocked
out by an incomplete DNS config (i.e. the setup works well for the first
30 seconds, as long as only the IPv4 A records get considered).

Thanks!

Marc
