We just ran into an issue that shows up when multiple units run on the
same machine and one of them is particularly busy.

The specific case is when deploying OpenStack and colocating things like
"monitoring" charms with the "keystone" charm. Keystone itself has *lots* of
things that relate to it, so it wants to fire something like 50
relation-joined+changed hooks.

The symptom is that unit-keystone keeps acquiring and re-acquiring the
uniter hook lock for approximately 50 minutes, starving all other units on
the machine: they can't come up because they can't run any of their hooks.

From what I can tell, on Linux we are using
net.Listen("abstract-unix-socket") and then polling at a 250ms interval to
see if we can grab that socket.

That means every process that *doesn't* have the lock takes, on average,
125ms to wake up and notice that the lock is free. Meanwhile, a process
that had the lock but has more hooks to fire just releases the lock, does
a bit of logic, and is then ready to acquire the lock again, most likely
in much less than 125ms.

We *could* introduce some sort of sleep there, to give other processes a
chance, and/or use a range of poll intervals instead of a fixed 250ms
(e.g. sometimes sleeping for only 50ms).

However, if we were using something like 'flock' then it has a blocking
mode, where it can give you the lock as soon as someone else releases it.

AIUI the only reason we liked abstract-unix-sockets was to not have a file
on disk, but we had a whole directory on disk, and flock seems like it
still gives us better sharing primitives than net.Listen.

Thoughts?
John
=:->
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev