please correct me if I am missing sth here. I am not understanding how would throttle work without cooperation/back-off from producer. new Java producer supports non-blocking API. why would delayed response be able to slow down producer? producer will continue to fire async sends.
On Mon, Mar 16, 2015 at 10:58 PM, Guozhang Wang <wangg...@gmail.com> wrote: > I think we are really discussing two separate issues here: > > 1. Whether we should a) append-then-block-then-returnOKButThrottled or b) > block-then-returnFailDuetoThrottled for quota actions on produce requests. > > Both these approaches assume some kind of well-behaveness of the clients: > option a) assumes the client sets an proper timeout value while can just > ignore "OKButThrottled" response, while option b) assumes the client > handles the "FailDuetoThrottled" appropriately. For any malicious clients > that, for example, just keep retrying either intentionally or not, neither > of these approaches are actually effective. > > 2. For "OKButThrottled" and "FailDuetoThrottled" responses, shall we encode > them as error codes or augment the protocol to use a separate field > indicating "status codes". > > Today we have already incorporated some status code as error codes in the > responses, e.g. ReplicaNotAvailable in MetadataResponse, the pros of this > is of course using a single field for response status like the HTTP status > codes, while the cons is that it requires clients to handle the error codes > carefully. > > I think maybe we can actually extend the single-code approach to overcome > its drawbacks, that is, wrap the error codes semantics to the users so that > users do not need to handle the codes one-by-one. More concretely, > following Jay's example the client could write sth. like this: > > > ----------------- > > if(error.isOK()) > // status code is good or the code can be simply ignored for this > request type, process the request > else if(error.needsRetry()) > // throttled, transient error, etc: retry > else if(error.isFatal()) > // non-retriable errors, etc: notify / terminate / other handling > > ----------------- > > Only when the clients really want to handle, for example FailDuetoThrottled > status code specifically, it needs to: > > if(error.isOK()) > // status code is good or the code can be simply ignored for this > request type, process the request > else if(error == FailDuetoThrottled ) > // throttled: log it > else if(error.needsRetry()) > // transient error, etc: retry > else if(error.isFatal()) > // non-retriable errors, etc: notify / terminate / other handling > > ----------------- > > And for implementation we can probably group the codes accordingly like > HTTP status code such that we can do: > > boolean Error.isOK() { > return code < 300 && code >= 200; > } > > Guozhang > > On Mon, Mar 16, 2015 at 10:24 PM, Ewen Cheslack-Postava <e...@confluent.io > > > wrote: > > > Agreed that trying to shoehorn non-error codes into the error field is a > > bad idea. It makes it *way* too easy to write code that looks (and should > > be) correct but is actually incorrect. If necessary, I think it's much > > better to to spend a couple of extra bytes to encode that information > > separately (a "status" or "warning" section of the response). An > indication > > that throttling is occurring is something I'd expect to be indicated by a > > bit flag in the response rather than as an error code. > > > > Gwen - I think an error code makes sense when the request actually > failed. > > Option B, which Jun was advocating, would have appended the messages > > successfully. If the rate-limiting case you're talking about had > > successfully committed the messages, I would say that's also a bad use of > > error codes. > > > > > > On Mon, Mar 16, 2015 at 10:16 PM, Gwen Shapira <gshap...@cloudera.com> > > wrote: > > > > > We discussed an error code for rate-limiting (which I think made > > > sense), isn't it a similar case? > > > > > > On Mon, Mar 16, 2015 at 10:10 PM, Jay Kreps <jay.kr...@gmail.com> > wrote: > > > > My concern is that as soon as you start encoding non-error response > > > > information into error codes the next question is what to do if two > > such > > > > codes apply (i.e. you have a replica down and the response is > > quota'd). I > > > > think I am trying to argue that error should mean "why we failed your > > > > request", for which there will really only be one reason, and any > other > > > > useful information we want to send back is just another field in the > > > > response. > > > > > > > > -Jay > > > > > > > > On Mon, Mar 16, 2015 at 9:51 PM, Gwen Shapira <gshap...@cloudera.com > > > > > wrote: > > > > > > > >> I think its not too late to reserve a set of error codes (200-299?) > > > >> for "non-error" codes. > > > >> > > > >> It won't be backward compatible (i.e. clients that currently do > "else > > > >> throw" will throw on non-errors), but perhaps its worthwhile. > > > >> > > > >> On Mon, Mar 16, 2015 at 9:42 PM, Jay Kreps <jay.kr...@gmail.com> > > wrote: > > > >> > Hey Jun, > > > >> > > > > >> > I'd really really really like to avoid that. Having just spent a > > > bunch of > > > >> > time on the clients, using the error codes to encode other > > information > > > >> > about the response is super dangerous. The error handling is one > of > > > the > > > >> > hardest parts of the client (Guozhang chime in here). > > > >> > > > > >> > Generally the error handling looks like > > > >> > if(error == none) > > > >> > // good, process the request > > > >> > else if(error == KNOWN_ERROR_1) > > > >> > // handle known error 1 > > > >> > else if(error == KNOWN_ERROR_2) > > > >> > // handle known error 2 > > > >> > else > > > >> > throw Errors.forCode(error).exception(); // or some other > > default > > > >> > behavior > > > >> > > > > >> > This works because we have a convention that and error is > something > > > that > > > >> > prevented your getting the response so the default handling case > is > > > sane > > > >> > and forward compatible. It is tempting to use the error code to > > convey > > > >> > information in the success case. For example we could use error > > codes > > > to > > > >> > encode whether quotas were enforced, whether the request was > served > > > out > > > >> of > > > >> > cache, whether the stock market is up today, or whatever. The > > problem > > > is > > > >> > that since these are not errors as far as the client is concerned > it > > > >> should > > > >> > not throw an exception but process the response, but now we > created > > an > > > >> > explicit requirement that that error be handled explicitly since > it > > is > > > >> > different. I really think that this kind of information is not an > > > error, > > > >> it > > > >> > is just information, and if we want it in the response we should > do > > > the > > > >> > right thing and add a new field to the response. > > > >> > > > > >> > I think you saw the Samza bug that was literally an example of > this > > > >> > happening and leading to an infinite retry loop. > > > >> > > > > >> > Further more I really want to emphasize that hitting your quota in > > the > > > >> > design that Adi has proposed is actually not an error condition at > > > all. > > > >> It > > > >> > is totally reasonable in any bootstrap situation to intentionally > > > want to > > > >> > run at the limit the system imposes on you. > > > >> > > > > >> > -Jay > > > >> > > > > >> > > > > >> > > > > >> > On Mon, Mar 16, 2015 at 4:27 PM, Jun Rao <j...@confluent.io> > wrote: > > > >> > > > > >> >> It's probably useful for a client to know whether its requests > are > > > >> >> throttled or not (e.g., for monitoring and alerting). From that > > > >> >> perspective, option B (delay the requests and return an error) > > seems > > > >> >> better. > > > >> >> > > > >> >> Thanks, > > > >> >> > > > >> >> Jun > > > >> >> > > > >> >> On Wed, Mar 4, 2015 at 3:51 PM, Aditya Auradkar < > > > >> >> aaurad...@linkedin.com.invalid> wrote: > > > >> >> > > > >> >> > Posted a KIP for quotas in kafka. > > > >> >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas > > > >> >> > > > > >> >> > Appreciate any feedback. > > > >> >> > > > > >> >> > Aditya > > > >> >> > > > > >> >> > > > >> > > > > > > > > > > > -- > > Thanks, > > Ewen > > > > > > -- > -- Guozhang >