Unsure what you mean here.

Tooz is already in oslo.

Were you thinking of something else?

D'Angelo, Scott wrote:
Could the work for the tooz variant be leveraged to add a truly distributed 
solution (with the proper tooz distributed backend)? If so, then +1 to this 
idea. Cinder will be implementing a version of tooz-based distributed locks, so 
having it in Oslo someday is a goal, I'd think.

________________________________________
From: Joshua Harlow [harlo...@fastmail.com]
Sent: Wednesday, December 09, 2015 6:13 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [oslo][all] The lock files saga (and where we can 
go from here)

Sooooo,

To try to reach some kind of conclusion here I am wondering if it would
be acceptable to folks (would people even adopt such a change?) if we
(oslo folks/others) provided a new function in say lockutils.py (in
oslo.concurrency) that would let users of oslo.concurrency pick which
kind of lock they would want to use...

The two types would be:

1. A PID-based lock, which would *not* be resistant to crashing
processes; it would perhaps use
https://github.com/openstack/pylockfile/blob/master/lockfile/pidlockfile.py
internally. It would be more easily breakable and more easily
introspect-able (by either deleting the file or running `cat` on the file
to see the PID inside of it).
2. The existing lock that is resistant to crashing processes (it
automatically releases on owner process crash) but is not easily
introspect-able (to know who is using the lock) and is not easily
breakable (aka to forcefully break the lock and release waiters and the
current lock holder).

Would people use these two variants if (oslo) provided them, or would
the status quo remain and nothing much change?

A third possibility is to spend energy using/integrating tooz
distributed locks and treating different processes on the same system as
distributed instances (even though they really are not distributed in
the classical sense). These locks that tooz supports are already
introspect-able (via various means) and can be broken if needed (work is
in progress to make this breaking process more usable via API).

Thoughts?

-Josh

Clint Byrum wrote:
Excerpts from Joshua Harlow's message of 2015-12-01 09:28:18 -0800:
Sean Dague wrote:
On 12/01/2015 08:08 AM, Duncan Thomas wrote:
On 1 December 2015 at 13:40, Sean Dague <s...@dague.net> wrote:


       The current approach means locks block on their own and are processed in
       the order they come in, but deletes aren't possible. The busy lock would
       mean deletes were normal. Some extra CPU would be spent on waiting, and lock
       order processing would be nondeterministic. There are trade-offs, but I
       don't know anywhere that we are using locks as queues, so order
       shouldn't matter. Spending some CPU on busy waiting in exchange for lock
       file cleanliness might be worth it. It would also let you actually see
       what's locked from the outside pretty easily.
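A minimal sketch of the busy-wait creation lock described above, assuming an O_CREAT|O_EXCL file that is deleted on release. acquire_busy and release_busy are hypothetical names, and the timeout/delay values are arbitrary:

```python
import os
import time


def acquire_busy(path, timeout=10.0, delay=0.1):
    """Spin until `path` can be created exclusively, or until `timeout` expires.

    Returns True on success, False on timeout. Waiters are not queued:
    whichever waiter happens to retry first after the file is deleted wins.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(delay)  # the extra CPU/latency spent on waiting
            continue
        os.close(fd)
        return True


def release_busy(path):
    """Release by deleting the file, so no lock files are left behind."""
    os.unlink(path)
```

Since release deletes the file, nothing accumulates on disk and listing the lock directory shows what is currently locked; the cost is the polling loop and no ordering among waiters.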


The cinder locks are very much used as queues in places, e.g. making
delete wait until after an image operation finishes. Given that cinder
can already bring a node into resource issues while doing lots of image
operations concurrently (such as creating lots of bootable volumes at
once) I'd be resistant to anything that makes it worse to solve a
cosmetic issue.
Is that really a queue? "Don't do X while Y" is a lock. "Do X, Y, Z, in
order after W is done" is a queue. And what you've explained above about
"Don't DELETE while DOING OTHER ACTION" is really just the queue model.

What I mean by treating locks as queues was depending on X, Y, Z
happening in that order after W. With a busy wait approach they might
happen as Y, Z, X or X, Z, B, Y. They will all happen after W is done.
But relative to each other, or to new ops coming in, no real order is
enforced.

So ummm, just so people know, the fasteners lock code (and the code that
has existed for file locks in oslo.concurrency, and prior to that in
oslo-incubator...) has never guaranteed the above sequencing.

How it works (and has always worked) is the following:

1. A lock object is created
(https://github.com/harlowja/fasteners/blob/master/fasteners/process_lock.py#L85)
2. That lock object acquire is performed
(https://github.com/harlowja/fasteners/blob/master/fasteners/process_lock.py#L125)
3. At that point do_open is called to ensure the file exists (if it
already exists it is opened in append mode, so nothing is overwritten) and
the lock object holds a reference to the file descriptor of that file
(https://github.com/harlowja/fasteners/blob/master/fasteners/process_lock.py#L112)
4. A retry loop starts that repeats until either a provided timeout has
elapsed or the lock is acquired; you can skip over the retry logic, but the
code that the retry loop calls is
https://github.com/harlowja/fasteners/blob/master/fasteners/process_lock.py#L92
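To make steps 1-4 concrete, here is a simplified stand-alone sketch of the same shape (POSIX only). It is not the fasteners code; SimpleProcessLock, its parameters, and the use of fcntl.lockf for the non-blocking attempt are assumptions for illustration:

```python
import errno
import fcntl
import os
import time


class SimpleProcessLock:
    """Simplified illustration of the acquire path above (not fasteners)."""

    def __init__(self, path):
        self.path = path  # step 1: creating the object only records the path
        self.lockfile = None

    def _do_open(self):
        # Step 3: make sure the file exists; append mode never truncates it.
        basedir = os.path.dirname(self.path)
        if basedir:
            os.makedirs(basedir, exist_ok=True)
        if self.lockfile is None:
            self.lockfile = open(self.path, "a")

    def _trylock(self):
        # One non-blocking attempt; raises OSError if another process holds it.
        fcntl.lockf(self.lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)

    def acquire(self, timeout=None, delay=0.01):
        # Step 2: acquire opens the file, then step 4: retry until locked or timed out.
        self._do_open()
        deadline = None if timeout is None else time.monotonic() + timeout
        while True:
            try:
                self._trylock()
                return True
            except OSError as e:
                if e.errno not in (errno.EACCES, errno.EAGAIN):
                    raise
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(delay)  # idle between attempts; nothing orders the waiters

    def release(self):
        fcntl.lockf(self.lockfile, fcntl.LOCK_UN)
        self.lockfile.close()
        self.lockfile = None
```

Note that release() unlocks and closes the file but never deletes it, which is where the leftover lock files come from.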

The retry loop (really this loop @
https://github.com/harlowja/fasteners/blob/master/fasteners/_utils.py#L87)
will sleep for a given delay before the next attempt to lock the file,
so there is no queue-like sequencing. For example, if entity A (which
created its lock object at t0) sleeps for 50 seconds between attempts
and entity B (which created its lock object at t1) sleeps for 5 seconds
between attempts, entity B would be favored (since entity B has the
smaller retry delay).
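The A/B example can be checked with a small idealized model of polling (no jitter, attempts take no time). The helpers next_attempt and winner are hypothetical and only model the timing, not real locking:

```python
import math


def next_attempt(start, delay, release_time):
    """First retry time >= release_time for a waiter that started polling
    at `start` and retries every `delay` seconds (idealized model)."""
    if release_time <= start:
        return start
    k = math.ceil((release_time - start) / delay)
    return start + k * delay


def winner(waiters, release_time):
    """Return the name of the waiter whose next attempt comes first."""
    return min(waiters, key=lambda name: next_attempt(*waiters[name], release_time))


waiters = {"A": (0, 50), "B": (1, 5)}  # name -> (start time, retry delay), seconds
print(winner(waiters, release_time=10))  # prints B: A retries at t=50, B at t=11
```

A started waiting first, yet B gets the lock 39 seconds earlier; arrival order plays no part in who wins.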

So just fyi, I wouldn't be depending on these for queuing/ordering as is...

Agreed, this form of fcntl locking is basically equivalent to
O_CREAT|O_EXCL locks as Sean described, since we never use the blocking
form. I'm not sure why though. The main reason one uses fcntl/flock is
to go ahead and block so waiters queue up efficiently. I'd tend to agree
with Sean that if we're going to busy wait, just using creation locks
will be simpler.
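For contrast, the blocking form looks roughly like this: the waiter sleeps in the kernel instead of polling. This sketch uses fcntl.flock rather than lockf because flock locks belong to the open file, so two open() calls in one process contend, which makes it easy to try in a single process. The helper names are hypothetical, and the sketch does not assume any particular wake-up order among waiters:

```python
import fcntl


def lock_blocking(path):
    """Open `path` and block until an exclusive flock is granted (no polling).

    Hypothetical helper; POSIX only. Each call does its own open(), so each
    returned file is a separate lock owner, even within one process.
    """
    f = open(path, "a")
    fcntl.flock(f, fcntl.LOCK_EX)  # no LOCK_NB: the kernel parks us until free
    return f


def unlock(f):
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()
```

A second lock_blocking(path) call made while the first is held simply sleeps until unlock() runs, without burning CPU between attempts.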

That said, I think what is missing is the metadata for efficiently
cleaning up stale locks. That can be done with fcntl or creation locks,
but with fcntl you have the kernel telling you for sure if the locking
process is still alive when you want to clean up and take the lock. With
creation, you need to write that information into the lock, and remove
it, and then have a way to make sure the process is alive and knows it
has the lock, and that is not exactly simple. For this reason only, I
suggest staying with fcntl.
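To show why the creation-lock route is fiddly, here is a sketch of the liveness check it needs, assuming the lock file contains only the holder's PID. creation_lock_is_stale and pid_alive are hypothetical helpers:

```python
import os


def pid_alive(pid):
    """Best-effort check that `pid` exists, using signal 0 (POSIX)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def creation_lock_is_stale(path):
    """Hypothetical helper: a creation lock is stale if its recorded PID is gone.

    Caveats: the PID may have been reused by an unrelated process, the holder
    may have crashed before writing its PID, and another cleaner may delete or
    recreate the file between this check and any unlink.
    """
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no lock at all
    except ValueError:
        return False  # empty/partial write: cannot tell, so don't break it
    return not pid_alive(pid)
```

Even with this check, deciding to break the lock and then unlinking it is a separate step that can race with another process doing the same; with fcntl the kernel drops the lock when the holder dies, so none of this is needed just to know whether the lock is held.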

Beyond that, perhaps what is needed is a tool in oslo_concurrency or
fasteners which one can use to prune stale locks based on said metadata.
Once that exists, a cron job running it is the simplest answer. Or if
need be, let the daemons spawn processes periodically to do that (you
can't use a greenthread, since you may be cleaning up your own locks and
fcntl will gladly let a process re-lock something it already has locked).
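A rough sketch of such a pruning tool, assuming fcntl (lockf) locks and a flat directory of lock files. prune_stale_locks is a hypothetical name, not an existing oslo.concurrency or fasteners function. The test below it also demonstrates the caveat above: run inside the process that holds a lock, it happily removes that holder's file:

```python
import errno
import fcntl
import os


def prune_stale_locks(lock_dir):
    """Delete lock files in `lock_dir` that no *other* process currently holds.

    Run this from a separate process (e.g. cron): POSIX fcntl locks are
    per-process, so a lock the caller already holds does not block lockf()
    here. Also, closing any descriptor for a file drops all of this
    process's fcntl locks on that file.
    """
    removed = []
    for name in os.listdir(lock_dir):
        path = os.path.join(lock_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "a") as f:
            try:
                fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as e:
                if e.errno in (errno.EACCES, errno.EAGAIN):
                    continue  # held by another live process: keep it
                raise
            # We hold the lock, so no other process does; unlink while holding it.
            os.unlink(path)
            removed.append(name)
    return sorted(removed)
```

One race remains even in a separate process: a process that opened the file just before the unlink can still lock the now-orphaned inode while a newcomer creates a fresh file at the same path, so a real tool would need more care (or run only while the services are quiescent).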

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
