We just ran into an issue that shows up when you run multiple units on the same machine and one of them is particularly busy.
The specific case is deploying OpenStack and colocating things like "monitoring" charms with the "keystone" charm. Keystone itself has *lots* of things that relate to it, so it wants to fire something like 50 relation-joined+changed hooks. The symptom is that unit-keystone keeps acquiring and re-acquiring the uniter hook lock for approximately 50 minutes and starves all other units, which never come up because they can't run any of their hooks.

From what I can tell, on Linux we are using net.Listen("abstract-unix-socket") and then polling at a 250ms interval to see if we can grab that socket. That means every process that *doesn't* have the lock takes an average of 125ms to wake up and notice that the lock isn't held. A process that had the lock but has more hooks to fire, on the other hand, just releases the lock, does a bit of logic, and is ready to acquire it again, most likely much faster than 125ms.

We *could* introduce some sort of sleep there, to give other processes a chance, and/or use a range of poll intervals instead of a fixed 250ms (so that sometimes you sleep for only 50ms, etc.).

However, if we were using something like 'flock', it has a blocking mode where it can give you the lock as soon as someone else releases it. AIUI the only reason we liked abstract-unix-sockets was to not have a file on disk, but we had a whole directory on disk, and flock seems like it still gives us better sharing primitives than net.Listen.

Thoughts?

John =:->