HI Jiangjie, As far as I understand calling close() in the ioThread is not common, as it may only trigger when we saw some non-retriable error. Hence when user run their program it is unlikely that close() will be triggered and problem will be detected. So it seems to me that from the error detection aspect these two options seems to be the same as people will usually detect it from the producer metrics all dropping to 0.
Guozhang On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin <j...@linkedin.com.invalid> wrote: > It seems there are two options we can choose from when close() is called > from sender thread (callback): > 1. Log an error and close the producer using close(-1) > 2. Log an error and block. > (Throwing an exception will not work because we catch all the exception > thrown from user callback. It will just lead to an error log.) > > My concern for the first option is that the producer will be closed even > if we logged and error. I am wondering if some user would not even take a > look at the log if producer is closed normally. Because from the programs > behavior, everything looks good. If that is the case, the error message we > logged probably will just be ignored until some day when people check the > log and see it. > > As for the second option, because producer does not close but blocks. User > will notice this the first time they run the program. They probably will > look at the log to see why producer could not be closed and they will see > the error log we put there. So they will get informed about this mis-usage > of close() in sender thread the first time they run the code instead of > some time later. > > Personally I prefer the second one because it is more obvious that > something was wrong. > > Jiangjie (Becket) Qin > > On 3/15/15, 4:27 PM, "Guozhang Wang" <wangg...@gmail.com> wrote: > > >Yeah I agree we should not silently change the behavior of the function > >with the given parameters; and I would prefer error-logging-and-shutdown > >over blocking when close(>0) is used, since as Neha suggested blocking > >would also not proceed with sending any data, bu will just let users to > >realize the issue later than sooner. > > > >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <n...@confluent.io> wrote: > > > >> > > >> > And I also agree it is better if we can make producer block when > >> > close() is called from sender thread so user will notice something > >>went > >> > wrong. > >> > >> > >> This isn't a great experience either. Why can't we just throw an > >>exception > >> for a behavior we know is incorrect and we'd like the user to know. > >> Blocking as a means of doing that seems wrong and annoying. > >> > >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <jay.kr...@gmail.com> > wrote: > >> > >> > Cool. > >> > > >> > I think blocking is good or alternately throwing an exception directly > >> from > >> > close(). Basically I would just worry about subtly doing something > >> slightly > >> > different from what the user asked for as it will be hard to notice > >>that > >> > behavior difference. > >> > > >> > -Jay > >> > > >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin > >><j...@linkedin.com.invalid > >> > > >> > wrote: > >> > > >> > > Hi Jay, > >> > > > >> > > I have modified the KIP as you suggested. I thinks as long as we > >>have > >> > > consistent define for timeout across Kafka interface, there would > >>be no > >> > > problem. And I also agree it is better if we can make producer block > >> when > >> > > close() is called from sender thread so user will notice something > >>went > >> > > wrong. > >> > > > >> > > Thanks. > >> > > > >> > > Jiangjie (Becket) Qin > >> > > > >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote: > >> > > > >> > > >Hey Jiangjie, > >> > > > > >> > > >I think this is going to be very confusing that > >> > > > close(0) waits indefinitely and > >> > > > close(-1) waits for 0. > >> > > >I understand this appears in other apis, but it is a constant > >>cause of > >> > > >bugs. Let's not repeat that mistake. > >> > > > > >> > > >Let's make close(0) wait for 0. We don't need a way to wait > >> indefinitely > >> > > >as > >> > > >we already have close() so having a magical constant for that is > >> > > >redundant. > >> > > > > >> > > >Calling close() from the I/O thread was already possible and would > >> block > >> > > >indefinitely. I think trying to silently change the behavior is > >> probably > >> > > >not right. I.e. if the user calls close() in the callback there is > >> > > >actually > >> > > >some misunderstanding and they need to think more, silently making > >> this > >> > > >not > >> > > >block will hide the problem from them which is the opposite of > >>what we > >> > > >want. > >> > > > > >> > > >-Jay > >> > > > > >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin > >> <j...@linkedin.com.invalid > >> > > > >> > > >wrote: > >> > > > > >> > > >> Hey Joe & Jay, > >> > > >> > >> > > >> Thanks for the comments on the voting thread. Since it seems we > >> > probably > >> > > >> will have more discussion on this, I am just replying from the > >> > > >>discussion > >> > > >> thread here. > >> > > >> I’ve updated the KIP page to make it less like half-baked, > >>apologize > >> > for > >> > > >> the rush... > >> > > >> > >> > > >> The contract in current KIP is: > >> > > >> 1. close() - wait until all requests either are sent or reach > >> > request > >> > > >> timeout. > >> > > >> 2. close(-1, TimeUnit.MILLISECONDS) - close immediately > >> > > >> 3. close(0, TimeUnit.MILLISECONDS) - equivalent to close(), > >>i.e. > >> > Wait > >> > > >> until all requests are sent or reach request timeout > >> > > >> 4. close(5, TimeUnit.MILLISECONDS) - try the best to finish > >> sending > >> > > >>in 5 > >> > > >> milliseconds, if something went wrong, just shutdown the producer > >> > > >>anyway, > >> > > >> my callback will handle the failures. > >> > > >> > >> > > >> About how we define what timeout value stands for, I actually > >> > struggled > >> > > >>a > >> > > >> little bit when wrote the patch. Intuitively, close(0) should > >>mean > >> > > >> immediately, however it seems that all the existing java class > >>have > >> > this > >> > > >> convention of timeout=0 means no timeout or never timeout > >> > > >>(Thread.join(0), > >> > > >> Object.wait(0), etc.) So here the dilemma is either we follow the > >> > > >> intuition or we follow the convention. What I chose is to follow > >>the > >> > > >> convention but document the interface to let user be aware of the > >> > usage. > >> > > >> The reason is that I think producer.close() is a public > >>interface so > >> > it > >> > > >> might be better to follow java convention. Whereas selector is > >>not a > >> > > >> public interface that used by user, so as long as it makes sense > >>to > >> > us, > >> > > >>it > >> > > >> is less a problem to be different from java convention. That said > >> > since > >> > > >> consumer.poll(timeout) is also a public interface, I think it > >>also > >> > makes > >> > > >> sense to make producer.close() to have the same definition of > >> > > >> consumer.poll(timeout). > >> > > >> > >> > > >> The main argument for keeping a timeout in close would be > >>separating > >> > the > >> > > >> close timeout from request timeout, which probably makes sense. I > >> > would > >> > > >> guess typically the request timeout would be long (e.g. 60 > >>seconds) > >> > > >> because we might want to consider retries with back off time. If > >>we > >> > have > >> > > >> multiple batches in accumulator, in worst case that could take > >>up to > >> > > >> several minutes to complete all the requests. But when we close a > >> > > >> producer, we might not want to wait for that long as it might > >>cause > >> > some > >> > > >> other problem like deployment tool timeout. > >> > > >> > >> > > >> There is also a subtle difference between close(timeout) and > >> > > >> flush(timeout). The only purpose for flush() is to write data to > >>the > >> > > >> broker, so it makes perfect sense to wait until request timeout. > >>I > >> > think > >> > > >> that is why flush(timeout) looks strange. On the other hand, the > >>top > >> > > >> priority for close() is to close the producer rather than flush() > >> > data, > >> > > >>so > >> > > >> close(timeout) gives guarantee on bounded waiting for its main > >>job. > >> > > >> > >> > > >> Sorry for the confusion about forceClose flag. It is not a public > >> > > >> interface. I mentioned it in Proposed Changes section which I > >> thought > >> > > >>was > >> > > >> supposed to provide implementation details. > >> > > >> > >> > > >> Thanks again for all the comments and suggestions! > >> > > >> > >> > > >> Jiangjie (Becket) Qin > >> > > >> > >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <j...@linkedin.com> wrote: > >> > > >> > >> > > >> >The KIP page has been updated per Jay¹s comments. > >> > > >> >I¹d like to initiate the voting process if no further comments > >>are > >> > > >> >received by tomorrow. > >> > > >> > > >> > > >> >Jiangjie (Becket) Qin > >> > > >> > > >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <jay.kr...@gmail.com> wrote: > >> > > >> > > >> > > >> >>Hey Jiangjie, > >> > > >> >> > >> > > >> >>Can you capture the full motivation and use cases for the > >>feature? > >> > > >>This > >> > > >> >>mentions your interest in having a way of aborting from inside > >>the > >> > > >> >>Callback. But it doesn't really explain that usage or why other > >> > people > >> > > >> >>would want to do that. It also doesn't list the primary use > >>case > >> for > >> > > >> >>having > >> > > >> >>close with a bounded timeout which was to avoid blocking too > >>long > >> on > >> > > >> >>shutdown. > >> > > >> >> > >> > > >> >>-Jay > >> > > >> >> > >> > > >> >> > >> > > >> >> > >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin > >> > > >><j...@linkedin.com.invalid > >> > > >> > > >> > > >> >>wrote: > >> > > >> >> > >> > > >> >>> Hi, > >> > > >> >>> > >> > > >> >>> I just created a KIP for adding a close(timeout) to new > >> producer. > >> > > >>Most > >> > > >> >>>of > >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth > >> Brahmbhatt > >> > > >>has > >> > > >> >>> already done a lot of work. > >> > > >> >>> Since this is an interface change so we are going through the > >> KIP > >> > > >> >>>process. > >> > > >> >>> Here is the KIP link: > >> > > >> >>> > >> > > >> >>> > >> > > >> > >> > > > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978 > >> > > >> >>>2 > >> > > >> >>> > >> > > >> >>> Thanks. > >> > > >> >>> > >> > > >> >>> Jiangjie (Becket) Qin > >> > > >> >>> > >> > > >> > > >> > > >> > >> > > >> > >> > > > >> > > > >> > > >> > >> > >> > >> -- > >> Thanks, > >> Neha > >> > > > > > > > >-- > >-- Guozhang > > -- -- Guozhang