Excerpts from Beber's message of Sat Oct 22 17:42:55 UTC 2011:
> Public bug reported:
> 
> I'm using Juju rev.409 inside a VirtualBox machine running Oneiric.
> I only use the local provider.
> 
> I'm able to bootstrap the environment and create a first unit.
> But, I never managed to create a second unit (neither another unit from the 
> same service nor a unit from a different service).
> 
> The machine-agent.log file showed lots of errors related to the connection 
> with zookeeper :
> 2011-10-22 13:46:25,280:5297(0xb6f8bb70):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.122.1:55580] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 24ms)
> 2011-10-22 13:46:26,173:5297(0xb6f8bb70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 1719ms
> 2011-10-22 13:46:29,512:5297(0xb6f8bb70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 5056ms
> 2011-10-22 13:46:30,486:5297(0xb6f8bb70):ZOO_INFO@check_events@1585: initiated connection to server [192.168.122.1:55580]
> 2011-10-22 13:46:36,240:5297(0xb6f8bb70):ZOO_ERROR@handle_socket_error_msg@1528: Socket [192.168.122.1:55580] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 1ms)
> 2011-10-22 13:46:36,431:5297(0xb6f8bb70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 253ms
> 2011-10-22 13:46:39,986:5297(0xb6f8bb70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 3768ms
> 2011-10-22 13:46:40,084:5297(0xb6f8bb70):ZOO_INFO@check_events@1585: initiated connection to server [192.168.122.1:55580]
> 2011-10-22 13:46:41,146:5297(0xb6f8bb70):ZOO_ERROR@handle_socket_error_msg@1621: Socket [192.168.122.1:55580] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x1332b4783a40001 has expired.
> 
> These errors occurred during the instantiation of the first unit. After
> the first unit was created, no new lines were appended to
> machine-agent.log.
> 
> I changed the tickTime value from 2000 to 20000 in 
> /usr/share/pyshared/juju/lib/zk.py and the problem went away:
>   - no more error messages 
>   - I can now create another unit after the first one
> 
> The behaviour I observed should not occur on real hardware, but it may
> be useful to be able to set this parameter in the environments.yaml
> file.
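
For reference, a rough sketch of why the tickTime change helps. By default a
ZooKeeper server clamps the session timeout a client asks for to the range
[2 * tickTime, 20 * tickTime], so raising tickTime to 20000 forces a minimum
session timeout of 40 seconds. The function name and the 10000 ms requested
timeout are illustrative assumptions, not values taken from Juju.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms):
    """Clamp a requested session timeout to ZooKeeper's default bounds.

    Assumes minSessionTimeout and maxSessionTimeout are left at their
    defaults of 2 * tickTime and 20 * tickTime.
    """
    min_timeout = 2 * tick_time_ms
    max_timeout = 20 * tick_time_ms
    return max(min_timeout, min(requested_ms, max_timeout))


# Illustrative requested timeout (an assumption, not Juju's actual value).
REQUESTED_MS = 10000

for tick in (2000, 20000):
    print(tick, negotiated_session_timeout(REQUESTED_MS, tick))
```

With these assumed numbers the session would survive a stall of up to 40
seconds instead of 10, which is consistent with the deadline overruns of a
few seconds shown in the log no longer expiring the session.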
> 
> Also, it would be nice if the machine agent could reconnect to zookeeper
> in case the connection is lost for a moment.
> 
> ** Affects: juju (Ubuntu)
>      Importance: Undecided
>          Status: New
> 
> ** Project changed: juju => juju (Ubuntu)
> 

There is work in progress to properly handle session expiration and 
reconnection in the agents. We can increase the default session timeout as well.
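
As a rough illustration only, and not the actual agent code, reconnecting
after an expired session could look something like the sketch below. The
names SessionExpired, run_with_reconnect, connect and work are hypothetical
placeholders; the real agents go through Juju's own ZooKeeper client layer.

```python
import time


class SessionExpired(Exception):
    """Hypothetical stand-in for the client's session-expired error."""


def run_with_reconnect(connect, work, max_attempts=5, base_delay=0.5,
                       sleep=time.sleep):
    """Run work(client), opening a fresh session whenever one expires.

    connect() must return a new client each time: an expired session
    cannot be resumed, so the old handle is simply dropped.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        client = connect()
        try:
            return work(client)
        except SessionExpired:
            if attempt == max_attempts:
                raise
            # Back off exponentially so a loaded host is not hammered.
            sleep(delay)
            delay *= 2
```

Since ephemeral nodes and watches disappear along with an expired session,
work() would also have to re-create whatever state the agent had registered.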

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/880023

Title:
  machine agent disconnects from zookeeper on heavy loads

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/juju/+bug/880023/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
