On Fri, 05/04 15:45, Max Reitz wrote: > On 2018-05-03 08:45, Fam Zheng wrote: > > On Sat, 04/28 13:03, Max Reitz wrote: > >> On 2018-04-27 08:22, Fam Zheng wrote: > >>> On Sat, 04/21 00:09, Max Reitz wrote: > >>>> When creating a file, we should take the WRITE and RESIZE permissions. > >>>> We do not need either for the creation itself, but we do need them for > >>>> clearing and resizing it. So we can take the proper permissions by > >>>> replacing O_TRUNC with an explicit truncation to 0, and by taking the > >>>> appropriate file locks between those two steps. > >>>> > >>>> Signed-off-by: Max Reitz <mre...@redhat.com> > >>>> --- > >>>> block/file-posix.c | 41 +++++++++++++++++++++++++++++++++++++++-- > >>>> 1 file changed, 39 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/block/file-posix.c b/block/file-posix.c > >>>> index c98a4a1556..ed7932d6e8 100644 > >>>> --- a/block/file-posix.c > >>>> +++ b/block/file-posix.c > >>>> @@ -1992,6 +1992,7 @@ static int raw_co_create(BlockdevCreateOptions > >>>> *options, Error **errp) > >>>> { > >>>> BlockdevCreateOptionsFile *file_opts; > >>>> int fd; > >>>> + int perm, shared; > >>>> int result = 0; > >>>> > >>>> /* Validate options and set default values */ > >>>> @@ -2006,14 +2007,48 @@ static int raw_co_create(BlockdevCreateOptions > >>>> *options, Error **errp) > >>>> } > >>>> > >>>> /* Create file */ > >>>> - fd = qemu_open(file_opts->filename, O_RDWR | O_CREAT | O_TRUNC | > >>>> O_BINARY, > >>>> - 0644); > >>>> + fd = qemu_open(file_opts->filename, O_RDWR | O_CREAT | O_BINARY, > >>>> 0644); > >>>> if (fd < 0) { > >>>> result = -errno; > >>>> error_setg_errno(errp, -result, "Could not create file"); > >>>> goto out; > >>>> } > >>>> > >>>> + /* Take permissions: We want to discard everything, so we need > >>>> + * BLK_PERM_WRITE; and truncation to the desired size requires > >>>> + * BLK_PERM_RESIZE. > >>>> + * On the other hand, we cannot share the RESIZE permission > >>>> + * because we promise that after this function, the file has the > >>>> + * size given in the options. If someone else were to resize it > >>>> + * concurrently, we could not guarantee that. */ > >>> > >>> Strictly speaking, we do close(fd) before this function returns so the > >>> lock is > >>> lost and race can happen. > >> > >> Right, but then creation from the perspective of file-posix is over. We > >> are going to reopen the file for formatting, but that is just a normal > >> bdrv_open() so it will automatically be locked, no? > > > > After this function close() but before the following bdrv_open(), another > > process can sneak in and steal the lock. So we cannot guarantee the RESIZE > > lock > > does what we really want. > > Right, but I'd argue that is not the purpose of the lock. If someone > wants the file (or then rather the node) to be a certain size, they'd > have to call bdrv_getlength() on it to check. > > But indeed you're right in that not sharing the RESIZE is hypocritical > considering it actually doesn't promise anything. > > Anyway, let's consider the issues. For formatting the file, this > behavior is OK because since Kevin's x-blockdev-create series format > drivers always truncate the file during formatting (that is, once they > have opened the file), and not during creation, so that part is secured. > > But we do have issues with e.g. drive-mirror, where we create an image > and then open it. The opened image may have a different size than what > we wanted to create. Now in this case I wouldn't consider this a big > problem because drive-mirror is basically deprecated anyway, so there is > no reason to waste resources here; and this applies to all of the QMP > commands that create images and then open them automatically. > > In the future, we want users to use blockdev-create and then > blockdev-add in two separate steps. Then it's up to the user to realize > that the blockdev-add'ed node may have a different length then what they > wanted to achieve in blockdev-create. Actually, it may just differ > altogether, I don't think qemu is going to make any promises on the > state of the image in the interim. > > So... I suppose there aren't any real issues with not being able to > promise that the image has the intended length immediately after > creating it. So I guess we can indeed share RESIZE. > > But OTOH, it definitely does not feel right to me to share RESIZE. We > definitely do not want other parties to resize the image while we create it. > > So I guess what I would like to do is keep RESIZE not shared and add a > note after the comment: > > > Note that after this function, we can no longer guarantee that the > > file is not touched by a third party, so it may be resized then. > > Ideally, we'd want the lock to stay active until blockdev-create or > qemu-img create returns, but I don't think that is really worth it. If > there is a race condition between raw_co_create() returning and those > commands returning (and we don't have a format layer to correct things), > then I can't imagine that it couldn't bite you after those commands have > returned. (And after those commands, it isn't our problem anymore anyway.)
Yes, what you said makes total sense to me. Having a comment and _not_ sharing RESIZE seems the most reasonable. Technically we could extend the API to be able to hold the fd after blockdev-create returns by introducing an explicit blockdev-create-cleanup command, but like you said it is probably not worth it. Fam