Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling fd handles after restart

mdroth Wed, 20 Mar 2013 12:41:04 -0700

On Wed, Mar 20, 2013 at 02:38:35PM -0400, Luiz Capitulino wrote:
> On Wed, 20 Mar 2013 13:14:21 -0500
> mdroth <mdr...@linux.vnet.ibm.com> wrote:
> 
> > > > > > > > +    handle = s->pstate.fd_counter++;
> > > > > > > > +    if (s->pstate.fd_counter < 0) {
> > > > > > > > +        s->pstate.fd_counter = 0;
> > > > > > > > +    }
> > > > > > > 
> > > > > > > Is this handling the overflow case? Can't fd 0 be in use already?
> > > > > > 
> > > > > > It could, but it's very unlikely that an overflow/counter reset 
> > > > > > would
> > > > > > result in issuing still-in-use handle, since guest-file-open would 
> > > > > > need
> > > > > > to be called 2^63 times in the meantime.
> > > > > 
> > > > > Agreed, but as you do check for that case and as the right fix is 
> > > > > simple
> > > > > and I think it's worth it. I can send a patch myself.
> > > > > 
> > > > 
> > > > A patch to switch to tracking a list of issued handles in the keystore,
> > > > or to return an error on overflow?
> > > 
> > > To return an error on overflow. Minor, but if we do handle it let's do it
> > > right. Or, we could just add an assert like:
> > > 
> > > assert(s->pstate.fd_counter >= 0);
> > 
> > Ah, well, I'm not sure I understand then. You mean dropping:
> > 
> >     if (s->pstate.fd_counter < 0) {
> >         s->pstate.fd_counter = 0;
> >     }
> > 
> > And replacing it with an error or assertion?
> 
> Yes, because I had understood you meant that this is very unlikely to be
> triggered because we'd need guest-file-open to be called 2^63. This is
> what I agreed above, although thinking more about it there's also the
> possibility of a malicious client doing this on purpose.
> 
> But now I see that what you really meant is that it's unlikely for fd 0
> to be in use after 2^63 guest-file-open calls. Am I right? If yes, then


Yup, that's the scenario I was referring to.

> I disagree because there's no way to guarantee when a certain fd will be
> in use or not, unless we allow fds to be returned.

Yes, it's a real scenario that can indeed occur under "normal" usage, but in
my head it's similar to assumptions we make about clocks not overflowing within
a timeframe that anyone cares about in terms of severity. To quantify it
more:

Given RPC latency there's really no way to run up the fd counter faster than
once per nanosecond (more like millisecond atm), so you'd have to keep a handle
open and constantly called guest-file-open for a period of 292 years
before a duplicate handle gets issued.

> 
> > If so, the overflow is actually expected: once we dish out handle MAX_INT64,
> > we should restart at 0. I initially made fd_counter a uint64_t so
> > overflow/reset would happen more naturally, but since we issue handles as
> > int64_t this would've caused other complications.
> > 
> > Something like this might be more clear about the intent though:
> > 
> >     handle = s->pstate.fd_counter;
> >     if (s->pstate.fd_counter == MAX_INT64) {
> >         s->pstate.fd_counter = 0;
> >     } else {
> >         s->pstate.fd_counter++;
> >     }
> 
> I disagree about restarting to zero as I have explained above. You seem to
> not like returning an error, is it because we'll make guest-file-open
> useless after the limit is reached?

Yes. But, on second thought, given the above I guess I don't mind returning an
error as a failsafe for now. Although I think it should be an assert()
with a FIXME:, since it really should be fixed before anyone ever manages
to trigger it.

> 
> Let's review our options:
> 
>  1. When fd_count reaches MAX_INT64 we reset it to zero
> 
>     Pros: simple and guest-file-open always work
>     Cons: fd 0 might be in use by a client
> 
>  2. When fd_count reaches MAX_INT64 we return an error
> 
>     Pros: simple and we fix 'cons' from item 1
>     Cons: guest-file-open will have a usage count limit
> 
>  3. Allow fds to be returned by clients on guest-file-close and do 2 on top
> 
>     Pros: solve problems discussed in items 1 and 2
>     Cons: not trivial and the usage limit problem from item 2 can still
>           happen if the client ends up not calling guest-file-close
>           (although I do think we'll reach the OS limit here)
> 
> Do you see other options? Am I overcomplicating?
> 

No, I think that about sums it up. Doing 3, paired with a limit on the
number of outstanding FDs as in 2 is the full solution. The only
complication that is that the higher the limit we impose, the more I/O
and look-up overhead will be incured for every
guest-file-open/guest-file-close, because those operations will become
O(fd_limit).

So if we do it we'll need to impose a reasonable limit like 16k outstanding
FDs at a time or something (maybe even that's too much, but I'm thinking an
extra 64k read/write with every fopen()/fclose() isn't that big a deal).

But to complicate things somewhat more:

This whole reason for this keystore thing is to patch up the existing
interfaces so they continue functioning without clients needing to
update. So if we're considering something that requires imposing a fairly
small limit on the number of outstanding handles, they'd need to update
their implementations eventually anyway to be correct.

So I wonder if, rather than pursuing option 3, we just introduce an
interface that does what we really want and returns handles as UUIDs,
then mark the existing interfaces as deprecated (and then remove them
within the next 300 years so our assert never gets hit :)

Re: [Qemu-devel] [PATCH] qemu-ga: use key-value store to avoid recycling fd handles after restart

Reply via email to