mdroth <mdr...@linux.vnet.ibm.com> writes: > On Wed, Mar 20, 2013 at 02:38:35PM -0400, Luiz Capitulino wrote: >> On Wed, 20 Mar 2013 13:14:21 -0500 >> mdroth <mdr...@linux.vnet.ibm.com> wrote: >> >> > > > > > > > + handle = s->pstate.fd_counter++; >> > > > > > > > + if (s->pstate.fd_counter < 0) { >> > > > > > > > + s->pstate.fd_counter = 0; >> > > > > > > > + } >> > > > > > > >> > > > > > > Is this handling the overflow case? Can't fd 0 be in use already? >> > > > > > >> > > > > > It could, but it's very unlikely that an overflow/counter >> > > > > > reset would >> > > > > > result in issuing still-in-use handle, since >> > > > > > guest-file-open would need >> > > > > > to be called 2^63 times in the meantime. >> > > > > >> > > > > Agreed, but as you do check for that case and as the right >> > > > > fix is simple >> > > > > and I think it's worth it. I can send a patch myself. >> > > > > >> > > > >> > > > A patch to switch to tracking a list of issued handles in the keystore, >> > > > or to return an error on overflow? >> > > >> > > To return an error on overflow. Minor, but if we do handle it let's do it >> > > right. Or, we could just add an assert like: >> > > >> > > assert(s->pstate.fd_counter >= 0); >> > >> > Ah, well, I'm not sure I understand then. You mean dropping: >> > >> > if (s->pstate.fd_counter < 0) { >> > s->pstate.fd_counter = 0; >> > } >> > >> > And replacing it with an error or assertion? >> >> Yes, because I had understood you meant that this is very unlikely to be >> triggered because we'd need guest-file-open to be called 2^63. This is >> what I agreed above, although thinking more about it there's also the >> possibility of a malicious client doing this on purpose. >> >> But now I see that what you really meant is that it's unlikely for fd 0 >> to be in use after 2^63 guest-file-open calls. Am I right? If yes, then > > Yup, that's the scenario I was referring to. > >> I disagree because there's no way to guarantee when a certain fd will be >> in use or not, unless we allow fds to be returned. > > Yes, it's a real scenario that can indeed occur under "normal" usage, but in > my head it's similar to assumptions we make about clocks not overflowing > within > a timeframe that anyone cares about in terms of severity. To quantify it > more: > > Given RPC latency there's really no way to run up the fd counter faster than > once per nanosecond (more like millisecond atm), so you'd have to keep a > handle > open and constantly called guest-file-open for a period of 292 years > before a duplicate handle gets issued.
In other words, it's a complete non-issue :) >> > If so, the overflow is actually expected: once we dish out handle >> > MAX_INT64, >> > we should restart at 0. I initially made fd_counter a uint64_t so >> > overflow/reset would happen more naturally, but since we issue handles as >> > int64_t this would've caused other complications. >> > >> > Something like this might be more clear about the intent though: >> > >> > handle = s->pstate.fd_counter; >> > if (s->pstate.fd_counter == MAX_INT64) { >> > s->pstate.fd_counter = 0; >> > } else { >> > s->pstate.fd_counter++; >> > } >> >> I disagree about restarting to zero as I have explained above. You seem to >> not like returning an error, is it because we'll make guest-file-open >> useless after the limit is reached? > > Yes. But, on second thought, given the above I guess I don't mind returning an > error as a failsafe for now. Although I think it should be an assert() > with a FIXME:, since it really should be fixed before anyone ever manages > to trigger it. > >> >> Let's review our options: >> >> 1. When fd_count reaches MAX_INT64 we reset it to zero >> >> Pros: simple and guest-file-open always work >> Cons: fd 0 might be in use by a client >> >> 2. When fd_count reaches MAX_INT64 we return an error >> >> Pros: simple and we fix 'cons' from item 1 >> Cons: guest-file-open will have a usage count limit error path unreachable in practice Such errors require special hackery to test. The test needs to include the error's consumer. Waste of time if you ask me. >> 3. Allow fds to be returned by clients on guest-file-close and do 2 on top >> >> Pros: solve problems discussed in items 1 and 2 >> Cons: not trivial and the usage limit problem from item 2 can still >> happen if the client ends up not calling guest-file-close >> (although I do think we'll reach the OS limit here) The OS limit on file descriptors is many orders of magnitude lower than 2^63. >> Do you see other options? Am I overcomplicating? >> > > No, I think that about sums it up. Doing 3, paired with a limit on the > number of outstanding FDs as in 2 is the full solution. The only > complication that is that the higher the limit we impose, the more I/O > and look-up overhead will be incured for every > guest-file-open/guest-file-close, because those operations will become > O(fd_limit). Cure much worse than the disease. > So if we do it we'll need to impose a reasonable limit like 16k outstanding > FDs at a time or something (maybe even that's too much, but I'm thinking an > extra 64k read/write with every fopen()/fclose() isn't that big a deal). I guess the limit should be in the order of the OS's limit on open fds. Instead of a purely theoretical limit, we get a real limit. Stupid. > But to complicate things somewhat more: > > This whole reason for this keystore thing is to patch up the existing > interfaces so they continue functioning without clients needing to > update. So if we're considering something that requires imposing a fairly > small limit on the number of outstanding handles, they'd need to update > their implementations eventually anyway to be correct. > > So I wonder if, rather than pursuing option 3, we just introduce an > interface that does what we really want and returns handles as UUIDs, > then mark the existing interfaces as deprecated (and then remove them > within the next 300 years so our assert never gets hit :) Looks like you guys have no *practical* problems to solve. Congrats! Take a vacation! Please report back no later than 275 years from now, to make sure this 64 bit fd counter overflow problem gets taken care of in time. ;-P