Thanks Jiangjie, After talking to you offline on this, I have been convinced and changed my preference to blocking. The immediate shutdown approach does have some unsafeness in some cases.
Guozhang On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin <j...@linkedin.com.invalid> wrote: > It looks that the problem we want to solve and the purpose we want to > achieve is: > If user uses close() in callback, we want to let user be aware that they > should use close(0) instead of close() in the callback. > > We have agreed that we will have an error log to inform user about this > mis-usage. The options differ in the way how we can force user to take a > look at that error log. > There are two scenarios: > 1. User does not expect the program to exit. > 2. User expect the program to exit. > > For scenario 1), blocking will probably delay the discovery of the > problem. Calling close(0) exposes the problem quicker. In this scenario > producer just encounter a send failure when running normally. > For scenario 2), blocking will expose the problem quick. Calling close(-1) > might hide the problem. This scenario might include: a) Unit test for a > send failure. b) Message sending during a close() call from a user thread. > > So as a summary table: > > Scenario 1) Scenario 2) > > Blocking Delay problem discovery Guaranteed problem discovery > > Close(-1) Immediate problem discovery Problem might be hidden > > > Personally I prefer blocking because it seems providing more guarantees > and safer. > > Thanks. > > Jiangjie (Becket) Qin > > > On 3/16/15, 10:11 AM, "Guozhang Wang" <wangg...@gmail.com> wrote: > > >HI Jiangjie, > > > >As far as I understand calling close() in the ioThread is not common, as > >it > >may only trigger when we saw some non-retriable error. Hence when user run > >their program it is unlikely that close() will be triggered and problem > >will be detected. So it seems to me that from the error detection aspect > >these two options seems to be the same as people will usually detect it > >from the producer metrics all dropping to 0. > > > >Guozhang > > > >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin <j...@linkedin.com.invalid> > >wrote: > > > >> It seems there are two options we can choose from when close() is called > >> from sender thread (callback): > >> 1. Log an error and close the producer using close(-1) > >> 2. Log an error and block. > >> (Throwing an exception will not work because we catch all the exception > >> thrown from user callback. It will just lead to an error log.) > >> > >> My concern for the first option is that the producer will be closed even > >> if we logged and error. I am wondering if some user would not even take > >>a > >> look at the log if producer is closed normally. Because from the > >>programs > >> behavior, everything looks good. If that is the case, the error message > >>we > >> logged probably will just be ignored until some day when people check > >>the > >> log and see it. > >> > >> As for the second option, because producer does not close but blocks. > >>User > >> will notice this the first time they run the program. They probably will > >> look at the log to see why producer could not be closed and they will > >>see > >> the error log we put there. So they will get informed about this > >>mis-usage > >> of close() in sender thread the first time they run the code instead of > >> some time later. > >> > >> Personally I prefer the second one because it is more obvious that > >> something was wrong. > >> > >> Jiangjie (Becket) Qin > >> > >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wangg...@gmail.com> wrote: > >> > >> >Yeah I agree we should not silently change the behavior of the function > >> >with the given parameters; and I would prefer > >>error-logging-and-shutdown > >> >over blocking when close(>0) is used, since as Neha suggested blocking > >> >would also not proceed with sending any data, bu will just let users to > >> >realize the issue later than sooner. > >> > > >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <n...@confluent.io> > >>wrote: > >> > > >> >> > > >> >> > And I also agree it is better if we can make producer block when > >> >> > close() is called from sender thread so user will notice something > >> >>went > >> >> > wrong. > >> >> > >> >> > >> >> This isn't a great experience either. Why can't we just throw an > >> >>exception > >> >> for a behavior we know is incorrect and we'd like the user to know. > >> >> Blocking as a means of doing that seems wrong and annoying. > >> >> > >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <jay.kr...@gmail.com> > >> wrote: > >> >> > >> >> > Cool. > >> >> > > >> >> > I think blocking is good or alternately throwing an exception > >>directly > >> >> from > >> >> > close(). Basically I would just worry about subtly doing something > >> >> slightly > >> >> > different from what the user asked for as it will be hard to notice > >> >>that > >> >> > behavior difference. > >> >> > > >> >> > -Jay > >> >> > > >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin > >> >><j...@linkedin.com.invalid > >> >> > > >> >> > wrote: > >> >> > > >> >> > > Hi Jay, > >> >> > > > >> >> > > I have modified the KIP as you suggested. I thinks as long as we > >> >>have > >> >> > > consistent define for timeout across Kafka interface, there would > >> >>be no > >> >> > > problem. And I also agree it is better if we can make producer > >>block > >> >> when > >> >> > > close() is called from sender thread so user will notice > >>something > >> >>went > >> >> > > wrong. > >> >> > > > >> >> > > Thanks. > >> >> > > > >> >> > > Jiangjie (Becket) Qin > >> >> > > > >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote: > >> >> > > > >> >> > > >Hey Jiangjie, > >> >> > > > > >> >> > > >I think this is going to be very confusing that > >> >> > > > close(0) waits indefinitely and > >> >> > > > close(-1) waits for 0. > >> >> > > >I understand this appears in other apis, but it is a constant > >> >>cause of > >> >> > > >bugs. Let's not repeat that mistake. > >> >> > > > > >> >> > > >Let's make close(0) wait for 0. We don't need a way to wait > >> >> indefinitely > >> >> > > >as > >> >> > > >we already have close() so having a magical constant for that is > >> >> > > >redundant. > >> >> > > > > >> >> > > >Calling close() from the I/O thread was already possible and > >>would > >> >> block > >> >> > > >indefinitely. I think trying to silently change the behavior is > >> >> probably > >> >> > > >not right. I.e. if the user calls close() in the callback there > >>is > >> >> > > >actually > >> >> > > >some misunderstanding and they need to think more, silently > >>making > >> >> this > >> >> > > >not > >> >> > > >block will hide the problem from them which is the opposite of > >> >>what we > >> >> > > >want. > >> >> > > > > >> >> > > >-Jay > >> >> > > > > >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin > >> >> <j...@linkedin.com.invalid > >> >> > > > >> >> > > >wrote: > >> >> > > > > >> >> > > >> Hey Joe & Jay, > >> >> > > >> > >> >> > > >> Thanks for the comments on the voting thread. Since it seems > >>we > >> >> > probably > >> >> > > >> will have more discussion on this, I am just replying from the > >> >> > > >>discussion > >> >> > > >> thread here. > >> >> > > >> I’ve updated the KIP page to make it less like half-baked, > >> >>apologize > >> >> > for > >> >> > > >> the rush... > >> >> > > >> > >> >> > > >> The contract in current KIP is: > >> >> > > >> 1. close() - wait until all requests either are sent or > >>reach > >> >> > request > >> >> > > >> timeout. > >> >> > > >> 2. close(-1, TimeUnit.MILLISECONDS) - close immediately > >> >> > > >> 3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), > >> >>i.e. > >> >> > Wait > >> >> > > >> until all requests are sent or reach request timeout > >> >> > > >> 4. close(5, TimeUnit.MILLISECONDS) - try the best to finish > >> >> sending > >> >> > > >>in 5 > >> >> > > >> milliseconds, if something went wrong, just shutdown the > >>producer > >> >> > > >>anyway, > >> >> > > >> my callback will handle the failures. > >> >> > > >> > >> >> > > >> About how we define what timeout value stands for, I actually > >> >> > struggled > >> >> > > >>a > >> >> > > >> little bit when wrote the patch. Intuitively, close(0) should > >> >>mean > >> >> > > >> immediately, however it seems that all the existing java class > >> >>have > >> >> > this > >> >> > > >> convention of timeout=0 means no timeout or never timeout > >> >> > > >>(Thread.join(0), > >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we follow > >>the > >> >> > > >> intuition or we follow the convention. What I chose is to > >>follow > >> >>the > >> >> > > >> convention but document the interface to let user be aware of > >>the > >> >> > usage. > >> >> > > >> The reason is that I think producer.close() is a public > >> >>interface so > >> >> > it > >> >> > > >> might be better to follow java convention. Whereas selector is > >> >>not a > >> >> > > >> public interface that used by user, so as long as it makes > >>sense > >> >>to > >> >> > us, > >> >> > > >>it > >> >> > > >> is less a problem to be different from java convention. That > >>said > >> >> > since > >> >> > > >> consumer.poll(timeout) is also a public interface, I think it > >> >>also > >> >> > makes > >> >> > > >> sense to make producer.close() to have the same definition of > >> >> > > >> consumer.poll(timeout). > >> >> > > >> > >> >> > > >> The main argument for keeping a timeout in close would be > >> >>separating > >> >> > the > >> >> > > >> close timeout from request timeout, which probably makes > >>sense. I > >> >> > would > >> >> > > >> guess typically the request timeout would be long (e.g. 60 > >> >>seconds) > >> >> > > >> because we might want to consider retries with back off time. > >>If > >> >>we > >> >> > have > >> >> > > >> multiple batches in accumulator, in worst case that could take > >> >>up to > >> >> > > >> several minutes to complete all the requests. But when we > >>close a > >> >> > > >> producer, we might not want to wait for that long as it might > >> >>cause > >> >> > some > >> >> > > >> other problem like deployment tool timeout. > >> >> > > >> > >> >> > > >> There is also a subtle difference between close(timeout) and > >> >> > > >> flush(timeout). The only purpose for flush() is to write data > >>to > >> >>the > >> >> > > >> broker, so it makes perfect sense to wait until request > >>timeout. > >> >>I > >> >> > think > >> >> > > >> that is why flush(timeout) looks strange. On the other hand, > >>the > >> >>top > >> >> > > >> priority for close() is to close the producer rather than > >>flush() > >> >> > data, > >> >> > > >>so > >> >> > > >> close(timeout) gives guarantee on bounded waiting for its main > >> >>job. > >> >> > > >> > >> >> > > >> Sorry for the confusion about forceClose flag. It is not a > >>public > >> >> > > >> interface. I mentioned it in Proposed Changes section which I > >> >> thought > >> >> > > >>was > >> >> > > >> supposed to provide implementation details. > >> >> > > >> > >> >> > > >> Thanks again for all the comments and suggestions! > >> >> > > >> > >> >> > > >> Jiangjie (Becket) Qin > >> >> > > >> > >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <j...@linkedin.com> > wrote: > >> >> > > >> > >> >> > > >> >The KIP page has been updated per Jay¹s comments. > >> >> > > >> >I¹d like to initiate the voting process if no further > >>comments > >> >>are > >> >> > > >> >received by tomorrow. > >> >> > > >> > > >> >> > > >> >Jiangjie (Becket) Qin > >> >> > > >> > > >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote: > >> >> > > >> > > >> >> > > >> >>Hey Jiangjie, > >> >> > > >> >> > >> >> > > >> >>Can you capture the full motivation and use cases for the > >> >>feature? > >> >> > > >>This > >> >> > > >> >>mentions your interest in having a way of aborting from > >>inside > >> >>the > >> >> > > >> >>Callback. But it doesn't really explain that usage or why > >>other > >> >> > people > >> >> > > >> >>would want to do that. It also doesn't list the primary use > >> >>case > >> >> for > >> >> > > >> >>having > >> >> > > >> >>close with a bounded timeout which was to avoid blocking too > >> >>long > >> >> on > >> >> > > >> >>shutdown. > >> >> > > >> >> > >> >> > > >> >>-Jay > >> >> > > >> >> > >> >> > > >> >> > >> >> > > >> >> > >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin > >> >> > > >><j...@linkedin.com.invalid > >> >> > > >> > > >> >> > > >> >>wrote: > >> >> > > >> >> > >> >> > > >> >>> Hi, > >> >> > > >> >>> > >> >> > > >> >>> I just created a KIP for adding a close(timeout) to new > >> >> producer. > >> >> > > >>Most > >> >> > > >> >>>of > >> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth > >> >> Brahmbhatt > >> >> > > >>has > >> >> > > >> >>> already done a lot of work. > >> >> > > >> >>> Since this is an interface change so we are going through > >>the > >> >> KIP > >> >> > > >> >>>process. > >> >> > > >> >>> Here is the KIP link: > >> >> > > >> >>> > >> >> > > >> >>> > >> >> > > >> > >> >> > > > >> >> > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978 > >> >> > > >> >>>2 > >> >> > > >> >>> > >> >> > > >> >>> Thanks. > >> >> > > >> >>> > >> >> > > >> >>> Jiangjie (Becket) Qin > >> >> > > >> >>> > >> >> > > >> > > >> >> > > >> > >> >> > > >> > >> >> > > > >> >> > > > >> >> > > >> >> > >> >> > >> >> > >> >> -- > >> >> Thanks, > >> >> Neha > >> >> > >> > > >> > > >> > > >> >-- > >> >-- Guozhang > >> > >> > > > > > >-- > >-- Guozhang > > -- -- Guozhang