Excerpts from Sean Dague's message of 2013-12-04 10:51:16 -0800: > On 12/04/2013 11:56 AM, Ben Nemec wrote: > > On 2013-12-04 06:07, Sean Dague wrote: > >> On 12/03/2013 11:21 PM, Clint Byrum wrote: > >>> Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800: > >>>> On 12/03/2013 06:13 PM, Ben Nemec wrote: > >>>>> On 2013-12-03 17:09, Sean Dague wrote: > >>>>>> On 12/03/2013 05:50 PM, Mark McLoughlin wrote: > >>>>>>> On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote: > >>>>>>>> On 2013-12-03 15:56, Sean Dague wrote: > >>>>>>>>> This cinder patch - https://review.openstack.org/#/c/48935/ > >>>>>>>>> > >>>>>>>>> Is blocked on failing upgrade because the updated oslo > >>>>>>>>> lockutils won't > >>>>>>>>> function until there is a specific configuration variable added > >>>>>>>>> to the > >>>>>>>>> cinder.conf. > >>>>>>>>> > >>>>>>>>> That work around is proposed here - > >>>>>>>>> https://review.openstack.org/#/c/52070/3 > >>>>>>>>> > >>>>>>>>> However I think this is exactly the kind of forward breaks that we > >>>>>>>>> want > >>>>>>>>> to prevent with grenade, as cinder failing to function after a > >>>>>>>>> rolling > >>>>>>>>> upgrade because a config item wasn't added is exactly the kind > >>>>>>>>> of pain > >>>>>>>>> we are trying to prevent happening to ops. > >>>>>>>>> > >>>>>>>>> So the question is, how is this done correctly so that a default > >>>>>>>>> can be > >>>>>>>>> set in the cinder code for this value, and it not require a config > >>>>>>>>> change to work? > >>>>>>> > >>>>>>> You're absolutely correct, in principle - if the default value for > >>>>>>> lock_path worked for users before, we should absolutely continue to > >>>>>>> support it. > >>>>>>> > >>>>>>>> I don't know that I have a good answer on how to handle this, > >>>>>>>> but for > >>>>>>>> context this change is the result of a nasty bug in lockutils that > >>>>>>>> meant > >>>>>>>> external locks were doing nothing if lock_path wasn't set. > >>>>>>>> Basically > >>>>>>>> it's something we should never have allowed in the first place. > >>>>>>>> > >>>>>>>> As far as setting this in code, it's important that all of the > >>>>>>>> processes > >>>>>>>> for a service are using the same value to avoid the same bad > >>>>>>>> situation > >>>>>>>> we were in before. For tests, we have a lockutils wrapper > >>>>>>>> (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282) > >>>>>>>> > >>>>>>>> that sets an environment variable to address this, but that only > >>>>>>>> works if all of the processes are going to be spawned from within > >>>>>>>> the same wrapper, and I'm not sure how secure that is for > >>>>>>>> production > >>>>>>>> deployments since it puts all of the lock files in a temporary > >>>>>>>> directory. > >>>>>>> > >>>>>>> Right, I don't think the previous default really "worked" - if > >>>>>>> you used > >>>>>>> the default, then external locking was broken. > >>>>>>> > >>>>>>> I suspect most distros do set a default - I see RDO has this in its > >>>>>>> default nova.conf: > >>>>>>> > >>>>>>> lock_path = /var/lib/nova/tmp > >>>>>>> > >>>>>>> So, yes - this is all terrible. > >>>>>>> > >>>>>>> IMHO, rather than raise an exception we should log a big fat warning > >>>>>>> about relying on the default and perhaps just treat the lock as an > >>>>>>> in-process lock in that case ... since that's essentially what it > >>>>>>> was > >>>>>>> before, right? > >>>>>> > >>>>>> So a default of lock_path = /tmp will work (FHS says that path has > >>>>>> to be > >>>>>> there), even if not optimal. Could we make it a default value like > >>>>>> that > >>>>>> instead of the current default which is null (and hence the problem). > >>>>> > >>>>> IIRC, my initial fix was something similar to that, but it got shot > >>>>> down > >>>>> because putting the lock files in a known world writeable location > >>>>> was a > >>>>> security issue. > >>>>> > >>>>> Although maybe if we put them in a subdirectory of /tmp and ensured > >>>>> that > >>>>> the permissions were such that only the user running the service could > >>>>> use that directory, it might be acceptable? We could still log a > >>>>> warning if we wanted. > >>>>> > >>>>> This seems like it would have implications for people running services > >>>>> on Windows too, but we can probably find a way to make that work if we > >>>>> decide on a solution. > >>>> > >>>> How is that a security issue? Are the lock files being written with > >>>> some > >>>> sensitive data in them and have g or o permissions on? The sticky bit > >>>> (+t) on /tmp will prevent other users from deleting the file. > >>>> > >>> > >>> Right, but it won't prevent users from creating a symlink with the same > >>> name. > >>> > >>> ln -s /var/lib/nova/instances/xxxxx/image.raw /tmp/well.known.location > >>> > >>> Now when you do > >>> > >>> with open('/tmp/well.known.location', 'w') as lockfile: > >>> lockfile.write('Stuff') > >>> > >>> Nova has just truncated the image file and written 'Stuff' to it. > >> > >> So that's the generic case (and the way people often write this). But > >> the oslo lockutils code doesn't work that way. While it does open the > >> file for write, it does not actually write, it's using fcntl to hold > >> locks. That's taking a data structure on the fd in kernel memory (IIRC), > >> so it correctly gives it up if the process crashes. > >> > >> I'm not saying there isn't some other possible security vulnerability > >> here as well, but it's not jumping out at me. So I'd love to understand > >> that, because if we can close that exposure we can provide a working > >> default, plus a strong recommendation for how to do that *right*. I'd be > >> totally happy with printing WARNING level at startup if lock_path = /tmp > >> that this should be adjusted. > > > > Full disclosure: I don't claim to be a security expert, so take my > > thoughts on this with a grain of salt. > > > > tldr: I still don't see a way to do this without breaking _something_. > > > > Unfortunately, while we don't actually write to the file, just opening > > it for write access truncates it. So there remains the issue Clint > > raised if someone can make that file a link. I suppose we could check > > for the bad things people might do, but I'm pretty sure that way lies > > madness. > > Fair point, more coffee before I make statements like that I guess :). > > Follow up question: could we change it to open in "a" mode instead. > Still requires write access, but won't truncate. >
That would still allow them to DOS you by holding the locks indefinitely. > > As a followup to my suggestion to have lockutils create something like > > /tmp/oslo-lock-path and make it accessible only to the owner, I think > > there's still a problem because a malicious user could create that > > directory ahead of time and when OpenStack is run for the first time it > > would happily use the existing directory and potentially overwrite any > > linked files in it. There might be ways to avoid that, but I suspect > > they would be racy (which I believe is why you're not supposed to use > > tmp paths that are known ahead of time). Granted, I'm not sure you > > should be installing to a system that already has malicious users on it, > > but I suspect security people would be less than thrilled with that answer. > > > > So the workable options that I see right now are to have it break on > > upgrade if lock_path is not set, or continue to break external locking > > by having it fall back to process locks when lock_path is not set, as > > Mark suggested. I'm not wild about either so I'd love to hear if anyone > > has a third option. > > Honestly, I'd love us to be clever and figure out a not dangerous way > through this, even if unwise (where we can yell at the user in the LOGs > loudly, and fail them in J if lock_dir=/tmp) that lets us progress > through this while gracefully bringing configs into line. > I'm not very clever, so I tend to be afraid of clever solutions. :-P _______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev