How does close(0) work if it's called from the sender thread? If close(0) needs to wait for the sender thread to join, wouldn't this cause a deadlock?
Thanks, Jun On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote: > Thanks Guozhang. It wouldn’t be as thoroughly considered without > discussing with you :) > > Jiangjie (Becket) Qin > > On 3/16/15, 1:07 PM, "Guozhang Wang" <wangg...@gmail.com> wrote: > > >Thanks Jiangjie, > > > >After talking to you offline on this, I have been convinced and changed my > >preference to blocking. The immediate shutdown approach does have some > >unsafeness in some cases. > > > >Guozhang > > > >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin <j...@linkedin.com.invalid > > > >wrote: > > > >> It looks that the problem we want to solve and the purpose we want to > >> achieve is: > >> If user uses close() in callback, we want to let user be aware that they > >> should use close(0) instead of close() in the callback. > >> > >> We have agreed that we will have an error log to inform user about this > >> mis-usage. The options differ in the way how we can force user to take a > >> look at that error log. > >> There are two scenarios: > >> 1. User does not expect the program to exit. > >> 2. User expect the program to exit. > >> > >> For scenario 1), blocking will probably delay the discovery of the > >> problem. Calling close(0) exposes the problem quicker. In this scenario > >> producer just encounter a send failure when running normally. > >> For scenario 2), blocking will expose the problem quick. Calling > >>close(-1) > >> might hide the problem. This scenario might include: a) Unit test for a > >> send failure. b) Message sending during a close() call from a user > >>thread. > >> > >> So as a summary table: > >> > >> Scenario 1) Scenario 2) > >> > >> Blocking Delay problem discovery Guaranteed problem > >>discovery > >> > >> Close(-1) Immediate problem discovery Problem might be hidden > >> > >> > >> Personally I prefer blocking because it seems providing more guarantees > >> and safer. > >> > >> Thanks. > >> > >> Jiangjie (Becket) Qin > >> > >> > >> On 3/16/15, 10:11 AM, "Guozhang Wang" <wangg...@gmail.com> wrote: > >> > >> >HI Jiangjie, > >> > > >> >As far as I understand calling close() in the ioThread is not common, > >>as > >> >it > >> >may only trigger when we saw some non-retriable error. Hence when user > >>run > >> >their program it is unlikely that close() will be triggered and problem > >> >will be detected. So it seems to me that from the error detection > >>aspect > >> >these two options seems to be the same as people will usually detect it > >> >from the producer metrics all dropping to 0. > >> > > >> >Guozhang > >> > > >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin > >><j...@linkedin.com.invalid> > >> >wrote: > >> > > >> >> It seems there are two options we can choose from when close() is > >>called > >> >> from sender thread (callback): > >> >> 1. Log an error and close the producer using close(-1) > >> >> 2. Log an error and block. > >> >> (Throwing an exception will not work because we catch all the > >>exception > >> >> thrown from user callback. It will just lead to an error log.) > >> >> > >> >> My concern for the first option is that the producer will be closed > >>even > >> >> if we logged and error. I am wondering if some user would not even > >>take > >> >>a > >> >> look at the log if producer is closed normally. Because from the > >> >>programs > >> >> behavior, everything looks good. If that is the case, the error > >>message > >> >>we > >> >> logged probably will just be ignored until some day when people check > >> >>the > >> >> log and see it. > >> >> > >> >> As for the second option, because producer does not close but blocks. > >> >>User > >> >> will notice this the first time they run the program. They probably > >>will > >> >> look at the log to see why producer could not be closed and they will > >> >>see > >> >> the error log we put there. So they will get informed about this > >> >>mis-usage > >> >> of close() in sender thread the first time they run the code instead > >>of > >> >> some time later. > >> >> > >> >> Personally I prefer the second one because it is more obvious that > >> >> something was wrong. > >> >> > >> >> Jiangjie (Becket) Qin > >> >> > >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <wangg...@gmail.com> wrote: > >> >> > >> >> >Yeah I agree we should not silently change the behavior of the > >>function > >> >> >with the given parameters; and I would prefer > >> >>error-logging-and-shutdown > >> >> >over blocking when close(>0) is used, since as Neha suggested > >>blocking > >> >> >would also not proceed with sending any data, bu will just let > >>users to > >> >> >realize the issue later than sooner. > >> >> > > >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede <n...@confluent.io> > >> >>wrote: > >> >> > > >> >> >> > > >> >> >> > And I also agree it is better if we can make producer block when > >> >> >> > close() is called from sender thread so user will notice > >>something > >> >> >>went > >> >> >> > wrong. > >> >> >> > >> >> >> > >> >> >> This isn't a great experience either. Why can't we just throw an > >> >> >>exception > >> >> >> for a behavior we know is incorrect and we'd like the user to > >>know. > >> >> >> Blocking as a means of doing that seems wrong and annoying. > >> >> >> > >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps <jay.kr...@gmail.com> > >> >> wrote: > >> >> >> > >> >> >> > Cool. > >> >> >> > > >> >> >> > I think blocking is good or alternately throwing an exception > >> >>directly > >> >> >> from > >> >> >> > close(). Basically I would just worry about subtly doing > >>something > >> >> >> slightly > >> >> >> > different from what the user asked for as it will be hard to > >>notice > >> >> >>that > >> >> >> > behavior difference. > >> >> >> > > >> >> >> > -Jay > >> >> >> > > >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin > >> >> >><j...@linkedin.com.invalid > >> >> >> > > >> >> >> > wrote: > >> >> >> > > >> >> >> > > Hi Jay, > >> >> >> > > > >> >> >> > > I have modified the KIP as you suggested. I thinks as long as > >>we > >> >> >>have > >> >> >> > > consistent define for timeout across Kafka interface, there > >>would > >> >> >>be no > >> >> >> > > problem. And I also agree it is better if we can make producer > >> >>block > >> >> >> when > >> >> >> > > close() is called from sender thread so user will notice > >> >>something > >> >> >>went > >> >> >> > > wrong. > >> >> >> > > > >> >> >> > > Thanks. > >> >> >> > > > >> >> >> > > Jiangjie (Becket) Qin > >> >> >> > > > >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <jay.kr...@gmail.com> > wrote: > >> >> >> > > > >> >> >> > > >Hey Jiangjie, > >> >> >> > > > > >> >> >> > > >I think this is going to be very confusing that > >> >> >> > > > close(0) waits indefinitely and > >> >> >> > > > close(-1) waits for 0. > >> >> >> > > >I understand this appears in other apis, but it is a constant > >> >> >>cause of > >> >> >> > > >bugs. Let's not repeat that mistake. > >> >> >> > > > > >> >> >> > > >Let's make close(0) wait for 0. We don't need a way to wait > >> >> >> indefinitely > >> >> >> > > >as > >> >> >> > > >we already have close() so having a magical constant for > >>that is > >> >> >> > > >redundant. > >> >> >> > > > > >> >> >> > > >Calling close() from the I/O thread was already possible and > >> >>would > >> >> >> block > >> >> >> > > >indefinitely. I think trying to silently change the behavior > >>is > >> >> >> probably > >> >> >> > > >not right. I.e. if the user calls close() in the callback > >>there > >> >>is > >> >> >> > > >actually > >> >> >> > > >some misunderstanding and they need to think more, silently > >> >>making > >> >> >> this > >> >> >> > > >not > >> >> >> > > >block will hide the problem from them which is the opposite > >>of > >> >> >>what we > >> >> >> > > >want. > >> >> >> > > > > >> >> >> > > >-Jay > >> >> >> > > > > >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin > >> >> >> <j...@linkedin.com.invalid > >> >> >> > > > >> >> >> > > >wrote: > >> >> >> > > > > >> >> >> > > >> Hey Joe & Jay, > >> >> >> > > >> > >> >> >> > > >> Thanks for the comments on the voting thread. Since it > >>seems > >> >>we > >> >> >> > probably > >> >> >> > > >> will have more discussion on this, I am just replying from > >>the > >> >> >> > > >>discussion > >> >> >> > > >> thread here. > >> >> >> > > >> I’ve updated the KIP page to make it less like half-baked, > >> >> >>apologize > >> >> >> > for > >> >> >> > > >> the rush... > >> >> >> > > >> > >> >> >> > > >> The contract in current KIP is: > >> >> >> > > >> 1. close() - wait until all requests either are sent or > >> >>reach > >> >> >> > request > >> >> >> > > >> timeout. > >> >> >> > > >> 2. close(-1, TimeUnit.MILLISECONDS) - close immediately > >> >> >> > > >> 3. close(0, TimeUnit.MILLISECONDS) - equivalent to > >>close(), > >> >> >>i.e. > >> >> >> > Wait > >> >> >> > > >> until all requests are sent or reach request timeout > >> >> >> > > >> 4. close(5, TimeUnit.MILLISECONDS) - try the best to > >>finish > >> >> >> sending > >> >> >> > > >>in 5 > >> >> >> > > >> milliseconds, if something went wrong, just shutdown the > >> >>producer > >> >> >> > > >>anyway, > >> >> >> > > >> my callback will handle the failures. > >> >> >> > > >> > >> >> >> > > >> About how we define what timeout value stands for, I > >>actually > >> >> >> > struggled > >> >> >> > > >>a > >> >> >> > > >> little bit when wrote the patch. Intuitively, close(0) > >>should > >> >> >>mean > >> >> >> > > >> immediately, however it seems that all the existing java > >>class > >> >> >>have > >> >> >> > this > >> >> >> > > >> convention of timeout=0 means no timeout or never timeout > >> >> >> > > >>(Thread.join(0), > >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we > >>follow > >> >>the > >> >> >> > > >> intuition or we follow the convention. What I chose is to > >> >>follow > >> >> >>the > >> >> >> > > >> convention but document the interface to let user be aware > >>of > >> >>the > >> >> >> > usage. > >> >> >> > > >> The reason is that I think producer.close() is a public > >> >> >>interface so > >> >> >> > it > >> >> >> > > >> might be better to follow java convention. Whereas > >>selector is > >> >> >>not a > >> >> >> > > >> public interface that used by user, so as long as it makes > >> >>sense > >> >> >>to > >> >> >> > us, > >> >> >> > > >>it > >> >> >> > > >> is less a problem to be different from java convention. > >>That > >> >>said > >> >> >> > since > >> >> >> > > >> consumer.poll(timeout) is also a public interface, I think > >>it > >> >> >>also > >> >> >> > makes > >> >> >> > > >> sense to make producer.close() to have the same definition > >>of > >> >> >> > > >> consumer.poll(timeout). > >> >> >> > > >> > >> >> >> > > >> The main argument for keeping a timeout in close would be > >> >> >>separating > >> >> >> > the > >> >> >> > > >> close timeout from request timeout, which probably makes > >> >>sense. I > >> >> >> > would > >> >> >> > > >> guess typically the request timeout would be long (e.g. 60 > >> >> >>seconds) > >> >> >> > > >> because we might want to consider retries with back off > >>time. > >> >>If > >> >> >>we > >> >> >> > have > >> >> >> > > >> multiple batches in accumulator, in worst case that could > >>take > >> >> >>up to > >> >> >> > > >> several minutes to complete all the requests. But when we > >> >>close a > >> >> >> > > >> producer, we might not want to wait for that long as it > >>might > >> >> >>cause > >> >> >> > some > >> >> >> > > >> other problem like deployment tool timeout. > >> >> >> > > >> > >> >> >> > > >> There is also a subtle difference between close(timeout) > >>and > >> >> >> > > >> flush(timeout). The only purpose for flush() is to write > >>data > >> >>to > >> >> >>the > >> >> >> > > >> broker, so it makes perfect sense to wait until request > >> >>timeout. > >> >> >>I > >> >> >> > think > >> >> >> > > >> that is why flush(timeout) looks strange. On the other > >>hand, > >> >>the > >> >> >>top > >> >> >> > > >> priority for close() is to close the producer rather than > >> >>flush() > >> >> >> > data, > >> >> >> > > >>so > >> >> >> > > >> close(timeout) gives guarantee on bounded waiting for its > >>main > >> >> >>job. > >> >> >> > > >> > >> >> >> > > >> Sorry for the confusion about forceClose flag. It is not a > >> >>public > >> >> >> > > >> interface. I mentioned it in Proposed Changes section > >>which I > >> >> >> thought > >> >> >> > > >>was > >> >> >> > > >> supposed to provide implementation details. > >> >> >> > > >> > >> >> >> > > >> Thanks again for all the comments and suggestions! > >> >> >> > > >> > >> >> >> > > >> Jiangjie (Becket) Qin > >> >> >> > > >> > >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <j...@linkedin.com> > >> wrote: > >> >> >> > > >> > >> >> >> > > >> >The KIP page has been updated per Jay¹s comments. > >> >> >> > > >> >I¹d like to initiate the voting process if no further > >> >>comments > >> >> >>are > >> >> >> > > >> >received by tomorrow. > >> >> >> > > >> > > >> >> >> > > >> >Jiangjie (Becket) Qin > >> >> >> > > >> > > >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <jay.kr...@gmail.com> > >>wrote: > >> >> >> > > >> > > >> >> >> > > >> >>Hey Jiangjie, > >> >> >> > > >> >> > >> >> >> > > >> >>Can you capture the full motivation and use cases for the > >> >> >>feature? > >> >> >> > > >>This > >> >> >> > > >> >>mentions your interest in having a way of aborting from > >> >>inside > >> >> >>the > >> >> >> > > >> >>Callback. But it doesn't really explain that usage or why > >> >>other > >> >> >> > people > >> >> >> > > >> >>would want to do that. It also doesn't list the primary > >>use > >> >> >>case > >> >> >> for > >> >> >> > > >> >>having > >> >> >> > > >> >>close with a bounded timeout which was to avoid blocking > >>too > >> >> >>long > >> >> >> on > >> >> >> > > >> >>shutdown. > >> >> >> > > >> >> > >> >> >> > > >> >>-Jay > >> >> >> > > >> >> > >> >> >> > > >> >> > >> >> >> > > >> >> > >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin > >> >> >> > > >><j...@linkedin.com.invalid > >> >> >> > > >> > > >> >> >> > > >> >>wrote: > >> >> >> > > >> >> > >> >> >> > > >> >>> Hi, > >> >> >> > > >> >>> > >> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to new > >> >> >> producer. > >> >> >> > > >>Most > >> >> >> > > >> >>>of > >> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where Parth > >> >> >> Brahmbhatt > >> >> >> > > >>has > >> >> >> > > >> >>> already done a lot of work. > >> >> >> > > >> >>> Since this is an interface change so we are going > >>through > >> >>the > >> >> >> KIP > >> >> >> > > >> >>>process. > >> >> >> > > >> >>> Here is the KIP link: > >> >> >> > > >> >>> > >> >> >> > > >> >>> > >> >> >> > > >> > >> >> >> > > > >> >> >> > >> >> > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978 > >> >> >> > > >> >>>2 > >> >> >> > > >> >>> > >> >> >> > > >> >>> Thanks. > >> >> >> > > >> >>> > >> >> >> > > >> >>> Jiangjie (Becket) Qin > >> >> >> > > >> >>> > >> >> >> > > >> > > >> >> >> > > >> > >> >> >> > > >> > >> >> >> > > > >> >> >> > > > >> >> >> > > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> Thanks, > >> >> >> Neha > >> >> >> > >> >> > > >> >> > > >> >> > > >> >> >-- > >> >> >-- Guozhang > >> >> > >> >> > >> > > >> > > >> >-- > >> >-- Guozhang > >> > >> > > > > > >-- > >-- Guozhang > >