I tested it again this evening, with Dustin monitoring. We again used lucid-amd64-topo2, and based the installs on the daily server/UEC images (releases.ubuntu.com is not accessible from tamarind, so I could not use Beta2).
Installation was uneventful. I then ran the config_single.yaml test. There were no problems starting instances, but the script (and I, manually) still could not ssh into them; every attempt failed with a timeout. Just for the sake of it (I do not know what is, or is not, blocked by the firewall(s)), I ran a traceroute against one of the instances from cempedak. It reached marula (the CC), and every hop after that came back as asterisks. I then logged in to marula and ssh-ed to an instance I had manually started. I *could* reach it (but failed, correctly, on public key -- I had not added a new key for this run, and the ones used by uec_test.py had already been revoked).

This is the log of the IRC chat between Dustin and myself:

2010-04-13 18:25:32 hggdh kirkland: nodes registered, running a single-instance test now
2010-04-13 18:33:02 hggdh kirkland: test running, log is being written to ~/uec-testing-scripts/resutls/single*
2010-04-13 18:33:09 hggdh kirkland: on cempedak
2010-04-13 18:33:20 kirkland hggdh: cool, and you can ssh in?
2010-04-13 18:35:08 hggdh kirkland: negative
2010-04-13 18:35:19 kirkland hggdh: cannot ssh in
2010-04-13 18:35:25 hggdh kirkland: ssh fails on timeout
2010-04-13 18:35:31 hggdh really sounds like routing
2010-04-13 18:36:18 kirkland hggdh: interesting
2010-04-13 18:36:25 kirkland hggdh: okay, put the log somewhere for me to check out
2010-04-13 18:38:27 hggdh kirkland: k. I just ran one instance by hand, and then tried to ssh into it -- fails with a timeout
2010-04-13 18:39:25 kirkland hggdh: okay, that's easy to reproduce
2010-04-13 18:39:27 kirkland hggdh: log?
2010-04-13 18:42:29 hggdh kirkland: people.c.c/~cerdea/single_test.log.2010-04-13_193218
2010-04-13 18:46:15 kirkland hggdh: rsync -aP people.canonical.com:~cerdea/single_test.log.2010-04-13_193218 .
2010-04-13 18:46:20 kirkland hggdh: file not found
2010-04-13 18:47:04 kirkland hggdh: found it, public_html
2010-04-13 18:47:27 hggdh heh. one wants it on public_html, another on the root ;-)
2010-04-13 18:49:35 kirkland hggdh: ls -alF users/admin/uectest-k0.priv
2010-04-13 18:50:07 kirkland hggdh: and cat that file, make sure it matches -----BEGIN RSA PRIVATE KEY-----
2010-04-13 18:50:33 kirkland hggdh: is that instance still running?
2010-04-13 18:50:43 kirkland hggdh: can you telnet to its port 22 ?
2010-04-13 18:51:03 hggdh kirkland: yes, the instance is still running
2010-04-13 18:52:00 hggdh kirkland: the priv key seems kosher
2010-04-13 18:52:27 kirkland hggdh: and telnet ?
2010-04-13 18:53:50 hggdh kirkland: timeout. Also, a traceroute (FWIW) reaches marula (the CC) and stops there
2010-04-13 18:54:07 kirkland hggdh: oh, interesting
2010-04-13 18:54:22 kirkland hggdh: that's got to be it
2010-04-13 18:54:25 hggdh kirkland: let me try to ssh from marula
2010-04-13 18:54:38 kirkland hggdh: yeah
2010-04-13 18:54:43 kirkland hggdh: scp the priv key over
2010-04-13 18:54:47 kirkland hggdh: and try from there
2010-04-13 18:55:15 hggdh kirkland: first test -- reachability -- successful
2010-04-13 18:55:21 hggdh will move the priv key there now
2010-04-13 18:55:21 kirkland hggdh: ack
2010-04-13 19:00:03 kirkland hggdh: and?
2010-04-13 19:00:13 hggdh kirkland: getting permission denied (pub key)
2010-04-13 19:00:30 hggdh kirkland: but the important piece is that I am *reaching* the instance
2010-04-13 19:00:34 kirkland hggdh: hrm, odd
2010-04-13 19:00:38 kirkland hggdh: agreed on that point
2010-04-13 19:00:49 kirkland hggdh: and you're doing ssh -i ./whatever.priv ubu...@ip ?
2010-04-13 19:00:58 kirkland hggdh: and whatever.priv is perm'd 600
2010-04-13 19:01:17 hggdh kirkland: yes indeed, and will check again
2010-04-13 19:01:26 hggdh but on wrong permission ssh would bail out
2010-04-13 19:03:41 hggdh kirkland: and the full command is ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ./uectest-k0.priv ubu...@10.55.55.100
2010-04-13 19:04:07 hggdh although sort of overworked, I admit
2010-04-13 19:04:24 kirkland hggdh: hmm, okay
2010-04-13 19:04:35 kirkland hggdh: it may be that the guest is having trouble getting out
2010-04-13 19:04:48 kirkland hggdh: or at least to have the key injected
2010-04-13 19:04:58 kirkland hggdh: okay, add your traceroute findings to that bug
2010-04-13 19:05:11 kirkland hggdh: and email mathias (cc me) the link to that log
2010-04-13 19:05:33 kirkland hggdh: i'm reassured that this appears to be a networking issue, but we'll need to get to the bottom of it
2010-04-13 19:05:38 kirkland hggdh: i gotta run for the night
2010-04-13 19:05:41 kirkland hggdh: thanks dude!
2010-04-13 19:05:55 hggdh kirkland: will do, and g'night

--
multi-machine topology, cannot reach an instance from the CLC
https://bugs.launchpad.net/bugs/559230
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs