I tested it again this evening, with Dustin monitoring. We again used
lucid-amd64-topo2, and based the installs on the daily server/UEC images
(releases.ubuntu.com is not accessible from tamarind, so I could not use
Beta2).

Installation was uneventful.

I then ran the config_single.yaml test. There were no problems starting
instances, but neither the script nor I (manually) could ssh into them;
every attempt failed with a timeout.

Just for the sake of it (I do not know what is, or is not, blocked by
the firewall(s)), I ran a traceroute against one of the instances from
cempedak. It reached marula (the CC), and every hop after that showed
only asterisks.

I then logged in to marula and ssh-ed to an instance I had manually
started. I *could* reach it (although authentication failed, correctly,
on the public key -- I had not added a new key for this run, and the
ones used by uec_test.py had already been revoked).
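
For anyone reproducing this, the difference between "cannot reach the
instance" (timeout) and "reached it, but was rejected" can be checked
without ssh or keys at all. The following is only a minimal sketch,
assuming Python 3 is available on both cempedak and marula; probe_tcp is
a hypothetical helper written for illustration and is not part of
uec_test.py.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Try one TCP connection and classify the outcome.

    Returns one of:
      "open"    - the handshake completed (host reachable, port listening)
      "refused" - the host answered but nothing listens on the port
      "timeout" - no answer at all (typical of a routing/firewall problem)
      "error"   - any other socket-level failure
    """
    try:
        # create_connection performs the full TCP handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "error"
```

Calling probe_tcp("10.55.55.100") from both machines would be the
equivalent of the telnet-to-port-22 check discussed in the log below. If
the symptoms described here hold, I would expect "timeout" from cempedak
and "open" from marula, but I have not run this sketch on that setup.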

This is the log of the IRC chat between Dustin and myself:

2010-04-13 18:25:32     hggdh   kirkland: nodes registered, running a 
single-instance test now
2010-04-13 18:33:02     hggdh   kirkland: test running, log is being written to 
~/uec-testing-scripts/resutls/single*
2010-04-13 18:33:09     hggdh   kirkland: on cempedak
2010-04-13 18:33:20     kirkland        hggdh: cool, and you can ssh in?
2010-04-13 18:35:08     hggdh   kirkland: negative
2010-04-13 18:35:19     kirkland        hggdh: cannot ssh in
2010-04-13 18:35:25     hggdh   kirkland: ssh fails on timeout
2010-04-13 18:35:31     hggdh   really sounds like routing
2010-04-13 18:36:18     kirkland        hggdh: interesting
2010-04-13 18:36:25     kirkland        hggdh: okay, put the log somewhere for 
me to check out
2010-04-13 18:38:27     hggdh   kirkland: k. I just ran one instance by hand, 
and then tried to ssh into it -- fails with a timeout
2010-04-13 18:39:25     kirkland        hggdh: okay, that's easy to reproduce
2010-04-13 18:39:27     kirkland        hggdh: log?
2010-04-13 18:42:29     hggdh   kirkland: 
people.c.c/~cerdea/single_test.log.2010-04-13_193218
2010-04-13 18:46:15     kirkland        hggdh: rsync -aP 
people.canonical.com:~cerdea/single_test.log.2010-04-13_193218 .
2010-04-13 18:46:20     kirkland        hggdh: file not found
2010-04-13 18:47:04     kirkland        hggdh: found it, public_html
2010-04-13 18:47:27     hggdh   heh. one wants it on public_html, another on 
the root ;-)
2010-04-13 18:49:35     kirkland        hggdh: ls -alF 
users/admin/uectest-k0.priv
2010-04-13 18:50:07     kirkland        hggdh: and cat that file, make sure it 
matches -----BEGIN RSA PRIVATE KEY-----
2010-04-13 18:50:33     kirkland        hggdh: is that instance still running?
2010-04-13 18:50:43     kirkland        hggdh: can you telnet to its port 22 ?
2010-04-13 18:51:03     hggdh   kirkland: yes, the instance is still running
2010-04-13 18:52:00     hggdh   kirkland: the priv key seems kosher
2010-04-13 18:52:27     kirkland        hggdh: and telnet ?
2010-04-13 18:53:50     hggdh   kirkland: timeout. Also, a traceroute (FWIW) 
reaches marula (the CC) and stops there
2010-04-13 18:54:07     kirkland        hggdh: oh, interesting
2010-04-13 18:54:22     kirkland        hggdh: that's got to be it
2010-04-13 18:54:25     hggdh   kirkland: let me try to ssh from marula
2010-04-13 18:54:38     kirkland        hggdh: yeah
2010-04-13 18:54:43     kirkland        hggdh: scp the priv key over
2010-04-13 18:54:47     kirkland        hggdh: and try from there
2010-04-13 18:55:15     hggdh   kirkland: first test -- reachability -- 
successful
2010-04-13 18:55:21     hggdh   will move the priv key there now
2010-04-13 18:55:21     kirkland        hggdh: ack
2010-04-13 19:00:03     kirkland        hggdh: and?
2010-04-13 19:00:13     hggdh   kirkland: getting permission denied (pub key)
2010-04-13 19:00:30     hggdh   kirkland: but the important piece is that I am 
*reaching* the instance
2010-04-13 19:00:34     kirkland        hggdh: hrm, odd
2010-04-13 19:00:38     kirkland        hggdh: agreed on that point
2010-04-13 19:00:49     kirkland        hggdh: and you're doing ssh -i 
./whatever.priv ubu...@ip ?
2010-04-13 19:00:58     kirkland        hggdh: and whatever.priv is perm'd 600
2010-04-13 19:01:17     hggdh   kirkland: yes indeed, and will check again
2010-04-13 19:01:26     hggdh   but on wrong permission ssh would bail out
2010-04-13 19:03:41     hggdh   kirkland: and the full command is ssh -o 
UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i ./uectest-k0.priv  
ubu...@10.55.55.100
2010-04-13 19:04:07     hggdh   although sort of overworked, I admit
2010-04-13 19:04:24     kirkland        hggdh: hmm, okay
2010-04-13 19:04:35     kirkland        hggdh: it may be that the guest is 
having trouble getting out
2010-04-13 19:04:48     kirkland        hggdh: or at least to have the key 
injected
2010-04-13 19:04:58     kirkland        hggdh: okay, add your traceroute 
findings to that bug
2010-04-13 19:05:11     kirkland        hggdh: and email mathias (cc me) the 
link to that log
2010-04-13 19:05:33     kirkland        hggdh: i'm reassured that this appears 
to be a networking issue, but we'll need to get to the bottom of it
2010-04-13 19:05:38     kirkland        hggdh: i gotta run for the night
2010-04-13 19:05:41     kirkland        hggdh: thanks dude!
2010-04-13 19:05:55     hggdh   kirkland: will do, and g'night

-- 
multi-machine topology, cannot reach an instance from the CLC
https://bugs.launchpad.net/bugs/559230
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
