On 2013-12-04 12:51, Sean Dague wrote:
On 12/04/2013 11:56 AM, Ben Nemec wrote:
On 2013-12-04 06:07, Sean Dague wrote:
On 12/03/2013 11:21 PM, Clint Byrum wrote:
Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800:
On 12/03/2013 06:13 PM, Ben Nemec wrote:
On 2013-12-03 17:09, Sean Dague wrote:
On 12/03/2013 05:50 PM, Mark McLoughlin wrote:
On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote:
On 2013-12-03 15:56, Sean Dague wrote:
This cinder patch - https://review.openstack.org/#/c/48935/

Is blocked on failing upgrade because the updated oslo
lockutils won't
function until there is a specific configuration variable added
to the
cinder.conf.

That work around is proposed here -
https://review.openstack.org/#/c/52070/3

However I think this is exactly the kind of forward breaks that we
want
to prevent with grenade, as cinder failing to function after a
rolling
upgrade because a config item wasn't added is exactly the kind
of pain
we are trying to prevent happening to ops.

So the question is, how is this done correctly so that a default
can be
set in the cinder code for this value, and it not require a config
change to work?

You're absolutely correct, in principle - if the default value for lock_path worked for users before, we should absolutely continue to
support it.

I don't know that I have a good answer on how to handle this,
but for
context this change is the result of a nasty bug in lockutils that
meant
external locks were doing nothing if lock_path wasn't set.
Basically
it's something we should never have allowed in the first place.

As far as setting this in code, it's important that all of the
processes
for a service are using the same value to avoid the same bad
situation
we were in before.  For tests, we have a lockutils wrapper
(https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282)

that sets an environment variable to address this, but that only works if all of the processes are going to be spawned from within
the same wrapper, and I'm not sure how secure that is for
production
deployments since it puts all of the lock files in a temporary
directory.

Right, I don't think the previous default really "worked" - if
you used
the default, then external locking was broken.

I suspect most distros do set a default - I see RDO has this in its
default nova.conf:

  lock_path = /var/lib/nova/tmp

So, yes - this is all terrible.

IMHO, rather than raise an exception we should log a big fat warning about relying on the default and perhaps just treat the lock as an in-process lock in that case ... since that's essentially what it
was
before, right?

So a default of lock_path = /tmp will work (FHS says that path has
to be
there), even if not optimal. Could we make it a default value like
that
instead of the current default which is null (and hence the problem).

IIRC, my initial fix was something similar to that, but it got shot
down
because putting the lock files in a known world writeable location
was a
security issue.

Although maybe if we put them in a subdirectory of /tmp and ensured
that
the permissions were such that only the user running the service could
use that directory, it might be acceptable?  We could still log a
warning if we wanted.

This seems like it would have implications for people running services on Windows too, but we can probably find a way to make that work if we
decide on a solution.

How is that a security issue? Are the lock files being written with
some
sensitive data in them and have g or o permissions on? The sticky bit
(+t) on /tmp will prevent other users from deleting the file.


Right, but it won't prevent users from creating a symlink with the same
name.

ln -s /var/lib/nova/instances/xxxxx/image.raw /tmp/well.known.location

Now when you do

with open('/tmp/well.known.location', 'w') as lockfile:
  lockfile.write('Stuff')

Nova has just truncated the image file and written 'Stuff' to it.

So that's the generic case (and the way people often write this). But
the oslo lockutils code doesn't work that way. While it does open the
file for write, it does not actually write, it's using fcntl to hold
locks. That's taking a data structure on the fd in kernel memory (IIRC),
so it correctly gives it up if the process crashes.

I'm not saying there isn't some other possible security vulnerability
here as well, but it's not jumping out at me. So I'd love to understand
that, because if we can close that exposure we can provide a working
default, plus a strong recommendation for how to do that *right*. I'd be totally happy with printing WARNING level at startup if lock_path = /tmp
that this should be adjusted.

Full disclosure: I don't claim to be a security expert, so take my
thoughts on this with a grain of salt.

tldr: I still don't see a way to do this without breaking _something_.

Unfortunately, while we don't actually write to the file, just opening
it for write access truncates it.  So there remains the issue Clint
raised if someone can make that file a link.  I suppose we could check
for the bad things people might do, but I'm pretty sure that way lies
madness.

Fair point, more coffee before I make statements like that I guess :).

Follow up question: could we change it to open in "a" mode instead.
Still requires write access, but won't truncate.

As a followup to my suggestion to have lockutils create something like
/tmp/oslo-lock-path and make it accessible only to the owner, I think
there's still a problem because a malicious user could create that
directory ahead of time and when OpenStack is run for the first time it
would happily use the existing directory and potentially overwrite any
linked files in it.  There might be ways to avoid that, but I suspect
they would be racy (which I believe is why you're not supposed to use
tmp paths that are known ahead of time).  Granted, I'm not sure you
should be installing to a system that already has malicious users on it, but I suspect security people would be less than thrilled with that answer.

So the workable options that I see right now are to have it break on
upgrade if lock_path is not set, or continue to break external locking
by having it fall back to process locks when lock_path is not set, as
Mark suggested. I'm not wild about either so I'd love to hear if anyone
has a third option.

Honestly, I'd love us to be clever and figure out a not dangerous way
through this, even if unwise (where we can yell at the user in the LOGs
loudly, and fail them in J if lock_dir=/tmp) that lets us progress
through this while gracefully bringing configs into line.

Copying openstack-security since they should probably sign off on this.

-Ben

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to