Thanks Guozhang. It wouldn’t be as thoroughly considered without discussing with you :)
Jiangjie (Becket) Qin On 3/16/15, 1:07 PM, "Guozhang Wang" <wangg...@gmail.com> wrote: >Thanks Jiangjie, > >After talking to you offline on this, I have been convinced and changed my >preference to blocking. The immediate shutdown approach does have some >unsafeness in some cases. > >Guozhang > >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin <j...@linkedin.com.invalid> >wrote: > >> It looks that the problem we want to solve and the purpose we want to >> achieve is: >> If user uses close() in callback, we want to let user be aware that they >> should use close(0) instead of close() in the callback. >> >> We have agreed that we will have an error log to inform user about this >> mis-usage. The options differ in the way how we can force user to take a >> look at that error log. >> There are two scenarios: >> 1. User does not expect the program to exit. >> 2. User expect the program to exit. >> >> For scenario 1), blocking will probably delay the discovery of the >> problem. Calling close(0) exposes the problem quicker. In this scenario >> producer just encounter a send failure when running normally. >> For scenario 2), blocking will expose the problem quick. Calling >>close(-1) >> might hide the problem. This scenario might include: a) Unit test for a >> send failure. b) Message sending during a close() call from a user >>thread. >> >> So as a summary table: >> >> Scenario 1) Scenario 2) >> >> Blocking Delay problem discovery Guaranteed problem >>discovery >> >> Close(-1) Immediate problem discovery Problem might be hidden >> >> >> Personally I prefer blocking because it seems providing more guarantees >> and safer. >> >> Thanks. >> >> Jiangjie (Becket) Qin >> >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wangg...@gmail.com> wrote: >> >> >HI Jiangjie, >> > >> >As far as I understand calling close() in the ioThread is not common, >>as >> >it >> >may only trigger when we saw some non-retriable error. Hence when user >>run >> >their program it is unlikely that close() will be triggered and problem >> >will be detected. So it seems to me that from the error detection >>aspect >> >these two options seems to be the same as people will usually detect it >> >from the producer metrics all dropping to 0. >> > >> >Guozhang >> > >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin >><j...@linkedin.com.invalid> >> >wrote: >> > >> >> It seems there are two options we can choose from when close() is >>called >> >> from sender thread (callback): >> >> 1. Log an error and close the producer using close(-1) >> >> 2. Log an error and block. >> >> (Throwing an exception will not work because we catch all the >>exception >> >> thrown from user callback. It will just lead to an error log.) >> >> >> >> My concern for the first option is that the producer will be closed >>even >> >> if we logged and error. I am wondering if some user would not even >>take >> >>a >> >> look at the log if producer is closed normally. Because from the >> >>programs >> >> behavior, everything looks good. If that is the case, the error >>message >> >>we >> >> logged probably will just be ignored until some day when people check >> >>the >> >> log and see it. >> >> >> >> As for the second option, because producer does not close but blocks. >> >>User >> >> will notice this the first time they run the program. They probably >>will >> >> look at the log to see why producer could not be closed and they will >> >>see >> >> the error log we put there. So they will get informed about this >> >>mis-usage >> >> of close() in sender thread the first time they run the code instead >>of >> >> some time later. >> >> >> >> Personally I prefer the second one because it is more obvious that >> >> something was wrong. >> >> >> >> Jiangjie (Becket) Qin >> >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wangg...@gmail.com> wrote: >> >> >> >> >Yeah I agree we should not silently change the behavior of the >>function >> >> >with the given parameters; and I would prefer >> >>error-logging-and-shutdown >> >> >over blocking when close(>0) is used, since as Neha suggested >>blocking >> >> >would also not proceed with sending any data, bu will just let >>users to >> >> >realize the issue later than sooner. >> >> > >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <n...@confluent.io> >> >>wrote: >> >> > >> >> >> > >> >> >> > And I also agree it is better if we can make producer block when >> >> >> > close() is called from sender thread so user will notice >>something >> >> >>went >> >> >> > wrong. >> >> >> >> >> >> >> >> >> This isn't a great experience either. Why can't we just throw an >> >> >>exception >> >> >> for a behavior we know is incorrect and we'd like the user to >>know. >> >> >> Blocking as a means of doing that seems wrong and annoying. >> >> >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <jay.kr...@gmail.com> >> >> wrote: >> >> >> >> >> >> > Cool. >> >> >> > >> >> >> > I think blocking is good or alternately throwing an exception >> >>directly >> >> >> from >> >> >> > close(). Basically I would just worry about subtly doing >>something >> >> >> slightly >> >> >> > different from what the user asked for as it will be hard to >>notice >> >> >>that >> >> >> > behavior difference. >> >> >> > >> >> >> > -Jay >> >> >> > >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin >> >> >><j...@linkedin.com.invalid >> >> >> > >> >> >> > wrote: >> >> >> > >> >> >> > > Hi Jay, >> >> >> > > >> >> >> > > I have modified the KIP as you suggested. I thinks as long as >>we >> >> >>have >> >> >> > > consistent define for timeout across Kafka interface, there >>would >> >> >>be no >> >> >> > > problem. And I also agree it is better if we can make producer >> >>block >> >> >> when >> >> >> > > close() is called from sender thread so user will notice >> >>something >> >> >>went >> >> >> > > wrong. >> >> >> > > >> >> >> > > Thanks. >> >> >> > > >> >> >> > > Jiangjie (Becket) Qin >> >> >> > > >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote: >> >> >> > > >> >> >> > > >Hey Jiangjie, >> >> >> > > > >> >> >> > > >I think this is going to be very confusing that >> >> >> > > > close(0) waits indefinitely and >> >> >> > > > close(-1) waits for 0. >> >> >> > > >I understand this appears in other apis, but it is a constant >> >> >>cause of >> >> >> > > >bugs. Let's not repeat that mistake. >> >> >> > > > >> >> >> > > >Let's make close(0) wait for 0. We don't need a way to wait >> >> >> indefinitely >> >> >> > > >as >> >> >> > > >we already have close() so having a magical constant for >>that is >> >> >> > > >redundant. >> >> >> > > > >> >> >> > > >Calling close() from the I/O thread was already possible and >> >>would >> >> >> block >> >> >> > > >indefinitely. I think trying to silently change the behavior >>is >> >> >> probably >> >> >> > > >not right. I.e. if the user calls close() in the callback >>there >> >>is >> >> >> > > >actually >> >> >> > > >some misunderstanding and they need to think more, silently >> >>making >> >> >> this >> >> >> > > >not >> >> >> > > >block will hide the problem from them which is the opposite >>of >> >> >>what we >> >> >> > > >want. >> >> >> > > > >> >> >> > > >-Jay >> >> >> > > > >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin >> >> >> <j...@linkedin.com.invalid >> >> >> > > >> >> >> > > >wrote: >> >> >> > > > >> >> >> > > >> Hey Joe & Jay, >> >> >> > > >> >> >> >> > > >> Thanks for the comments on the voting thread. Since it >>seems >> >>we >> >> >> > probably >> >> >> > > >> will have more discussion on this, I am just replying from >>the >> >> >> > > >>discussion >> >> >> > > >> thread here. >> >> >> > > >> I’ve updated the KIP page to make it less like half-baked, >> >> >>apologize >> >> >> > for >> >> >> > > >> the rush... >> >> >> > > >> >> >> >> > > >> The contract in current KIP is: >> >> >> > > >> 1. close() - wait until all requests either are sent or >> >>reach >> >> >> > request >> >> >> > > >> timeout. >> >> >> > > >> 2. close(-1, TimeUnit.MILLISECONDS) - close immediately >> >> >> > > >> 3. close(0, TimeUnit.MILLISECONDS) - equivalent to >>close(), >> >> >>i.e. >> >> >> > Wait >> >> >> > > >> until all requests are sent or reach request timeout >> >> >> > > >> 4. close(5, TimeUnit.MILLISECONDS) - try the best to >>finish >> >> >> sending >> >> >> > > >>in 5 >> >> >> > > >> milliseconds, if something went wrong, just shutdown the >> >>producer >> >> >> > > >>anyway, >> >> >> > > >> my callback will handle the failures. >> >> >> > > >> >> >> >> > > >> About how we define what timeout value stands for, I >>actually >> >> >> > struggled >> >> >> > > >>a >> >> >> > > >> little bit when wrote the patch. Intuitively, close(0) >>should >> >> >>mean >> >> >> > > >> immediately, however it seems that all the existing java >>class >> >> >>have >> >> >> > this >> >> >> > > >> convention of timeout=0 means no timeout or never timeout >> >> >> > > >>(Thread.join(0), >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we >>follow >> >>the >> >> >> > > >> intuition or we follow the convention. What I chose is to >> >>follow >> >> >>the >> >> >> > > >> convention but document the interface to let user be aware >>of >> >>the >> >> >> > usage. >> >> >> > > >> The reason is that I think producer.close() is a public >> >> >>interface so >> >> >> > it >> >> >> > > >> might be better to follow java convention. Whereas >>selector is >> >> >>not a >> >> >> > > >> public interface that used by user, so as long as it makes >> >>sense >> >> >>to >> >> >> > us, >> >> >> > > >>it >> >> >> > > >> is less a problem to be different from java convention. >>That >> >>said >> >> >> > since >> >> >> > > >> consumer.poll(timeout) is also a public interface, I think >>it >> >> >>also >> >> >> > makes >> >> >> > > >> sense to make producer.close() to have the same definition >>of >> >> >> > > >> consumer.poll(timeout). >> >> >> > > >> >> >> >> > > >> The main argument for keeping a timeout in close would be >> >> >>separating >> >> >> > the >> >> >> > > >> close timeout from request timeout, which probably makes >> >>sense. I >> >> >> > would >> >> >> > > >> guess typically the request timeout would be long (e.g. 60 >> >> >>seconds) >> >> >> > > >> because we might want to consider retries with back off >>time. >> >>If >> >> >>we >> >> >> > have >> >> >> > > >> multiple batches in accumulator, in worst case that could >>take >> >> >>up to >> >> >> > > >> several minutes to complete all the requests. But when we >> >>close a >> >> >> > > >> producer, we might not want to wait for that long as it >>might >> >> >>cause >> >> >> > some >> >> >> > > >> other problem like deployment tool timeout. >> >> >> > > >> >> >> >> > > >> There is also a subtle difference between close(timeout) >>and >> >> >> > > >> flush(timeout). The only purpose for flush() is to write >>data >> >>to >> >> >>the >> >> >> > > >> broker, so it makes perfect sense to wait until request >> >>timeout. >> >> >>I >> >> >> > think >> >> >> > > >> that is why flush(timeout) looks strange. On the other >>hand, >> >>the >> >> >>top >> >> >> > > >> priority for close() is to close the producer rather than >> >>flush() >> >> >> > data, >> >> >> > > >>so >> >> >> > > >> close(timeout) gives guarantee on bounded waiting for its >>main >> >> >>job. >> >> >> > > >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It is not a >> >>public >> >> >> > > >> interface. I mentioned it in Proposed Changes section >>which I >> >> >> thought >> >> >> > > >>was >> >> >> > > >> supposed to provide implementation details. >> >> >> > > >> >> >> >> > > >> Thanks again for all the comments and suggestions! >> >> >> > > >> >> >> >> > > >> Jiangjie (Becket) Qin >> >> >> > > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <j...@linkedin.com> >> wrote: >> >> >> > > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments. >> >> >> > > >> >I¹d like to initiate the voting process if no further >> >>comments >> >> >>are >> >> >> > > >> >received by tomorrow. >> >> >> > > >> > >> >> >> > > >> >Jiangjie (Becket) Qin >> >> >> > > >> > >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <jay.kr...@gmail.com> >>wrote: >> >> >> > > >> > >> >> >> > > >> >>Hey Jiangjie, >> >> >> > > >> >> >> >> >> > > >> >>Can you capture the full motivation and use cases for the >> >> >>feature? >> >> >> > > >>This >> >> >> > > >> >>mentions your interest in having a way of aborting from >> >>inside >> >> >>the >> >> >> > > >> >>Callback. But it doesn't really explain that usage or why >> >>other >> >> >> > people >> >> >> > > >> >>would want to do that. It also doesn't list the primary >>use >> >> >>case >> >> >> for >> >> >> > > >> >>having >> >> >> > > >> >>close with a bounded timeout which was to avoid blocking >>too >> >> >>long >> >> >> on >> >> >> > > >> >>shutdown. >> >> >> > > >> >> >> >> >> > > >> >>-Jay >> >> >> > > >> >> >> >> >> > > >> >> >> >> >> > > >> >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin >> >> >> > > >><j...@linkedin.com.invalid >> >> >> > > >> > >> >> >> > > >> >>wrote: >> >> >> > > >> >> >> >> >> > > >> >>> Hi, >> >> >> > > >> >>> >> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to new >> >> >> producer. >> >> >> > > >>Most >> >> >> > > >> >>>of >> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth >> >> >> Brahmbhatt >> >> >> > > >>has >> >> >> > > >> >>> already done a lot of work. >> >> >> > > >> >>> Since this is an interface change so we are going >>through >> >>the >> >> >> KIP >> >> >> > > >> >>>process. >> >> >> > > >> >>> Here is the KIP link: >> >> >> > > >> >>> >> >> >> > > >> >>> >> >> >> > > >> >> >> >> > > >> >> >> >> >> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978 >> >> >> > > >> >>>2 >> >> >> > > >> >>> >> >> >> > > >> >>> Thanks. >> >> >> > > >> >>> >> >> >> > > >> >>> Jiangjie (Becket) Qin >> >> >> > > >> >>> >> >> >> > > >> > >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> > > >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> Thanks, >> >> >> Neha >> >> >> >> >> > >> >> > >> >> > >> >> >-- >> >> >-- Guozhang >> >> >> >> >> > >> > >> >-- >> >-- Guozhang >> >> > > >-- >-- Guozhang