So I'm trying to test ansible 2.2.1-rc3 on puppetmaster. If as root you try
# /root/ianw/ansible/bin/activate # ansible 2.2.1-rc3 venv # /root/ianw/run_all.sh # run_all but with --check to dry-run You should see the problem. ansible-playbook just stops; when you take a look at the wait channel, you can see it's all blocked on a signal. --- root@puppetmaster:/root/ianw/ansible# ps -aefl | grep ansible 0 S root 4202 1010 0 80 0 - 2927 wait 03:06 pts/15 00:00:00 timeout -k 2m 120m ansible-playbook -vvv 0 T root 4203 4202 0 80 0 - 44381 signal 03:06 pts/15 00:00:02 /root/ianw/ansible/bin/python /root/ianw/ 1 T root 4331 4203 0 80 0 - 46533 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 1 T root 4360 4203 0 80 0 - 46596 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 4 T root 4393 4331 0 80 0 - 11005 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 4 T root 4398 4360 0 80 0 - 11005 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 1 T root 4400 4203 0 80 0 - 46661 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 4 T root 4410 4400 0 80 0 - 11005 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 1 T root 4412 4203 0 80 0 - 46661 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 1 T root 4421 4203 0 80 0 - 46725 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 4 T root 4422 4412 0 80 0 - 11005 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 1 T root 4428 4203 0 80 0 - 46982 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 1 T root 4431 4203 0 80 0 - 46725 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 4 T root 4436 4421 0 80 0 - 11005 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 1 T root 4438 4203 0 80 0 - 46789 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 4 T root 4446 4438 0 80 0 - 11551 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 1 T root 4447 4203 0 80 0 - 46792 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 4 T root 4448 4447 0 80 0 - 11109 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 1 T root 4452 4203 0 80 0 - 46856 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ 4 T root 4455 4452 0 80 0 - 11037 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 4 T root 4461 4431 0 80 0 - 11005 signal 03:06 pts/15 00:00:00 ssh -C -o ControlMaster=auto -o ControlPe 1 T root 4462 4428 0 80 0 - 46725 signal 03:06 pts/15 00:00:00 /root/ianw/ansible/bin/python /root/ianw/ --- stracing any of those gives --- root@puppetmaster:/root/ianw/ansible# strace -p 4412 Process 4412 attached --- stopped by SIGTTOU --- --- I know from experience that SIGTTOU is just bad news, it means crappy tricky terminal stuff ahead. I started looking through an strace to see where this comes up --- (ansible)root@puppetmaster:/tmp# cat /tmp/output.log | grep TTOU | grep kill 21029 kill(21029, SIGTTOU <unfinished ...> --- Having a look at pid 21029 it's an ssh process, that ansible is launching via some pipe/fifo related system, but it's going to --- execve("/usr/bin/ssh" 21029 connect(3, {sa_family=AF_INET6, sin6_port=htons(22), inet_pton(AF_INET6, "2001:4800:7819:105:be76:4eff:fe04:a5b2", ...) --- Well guess what; this host obviously doesn't have keys deployed and is throwing up a password prompt. --- # ssh 2001:4800:7819:105:be76:4eff:fe04:a5b2 root@2001:4800:7819:105:be76:4eff:fe04:a5b2's password: --- (The other thing is -- what is this host and why is shade picking it up? but we have to handle this. also maybe at the time it was a host key unknown message, but same same) It's at this point we can see ssh opening /dev/tty and then, bang, in comes our SIGTTOU --- 21029 open("/dev/tty", O_RDWR <unfinished ...> 21029 <... open resumed> ) = 4 21029 ioctl(4, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS <unfinished ...> 21029 <... ioctl resumed> , {B38400 opost isig icanon echo ...}) = 0 21029 ioctl(4, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 21029 ioctl(4, SNDCTL_TMR_CONTINUE or SNDRV_TIMER_IOCTL_GPARAMS or TCSETSF <unfinished ...> 21029 <... ioctl resumed> , {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) 21029 --- SIGTTOU {si_signo=SIGTTOU, si_code=SI_KERNEL} --- --- Which is all great EXCEPT it seems there is an, AFAICT, unresolved bug in "timeout" that means it has incorrectly reset the SIGTTOU handlers to default (?) so everything just completely stops. [1] I believe this is going to happen when run_all.sh tries to run [2], although *maybe* under cron without a terminal it will work? But anyway, we'll hit it if we ever need to run run_all.sh by hand. The thing is, ansible 2.0.2.0 seems to do all the ssh stuff very differently so this doesn't appear to happen. At this point, thoughts welcome, -i [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=15779 [2] https://git.openstack.org/cgit/openstack-infra/system-config/tree/run_all.sh#n41 _______________________________________________ OpenStack-Infra mailing list OpenStack-Infra@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra