Hey Jae,

> If so, what's the best way to shut down the container without using the
> command topic?
YARN does send a SIGTERM before the SIGKILL. The delay between the two is
controlled by this YARN setting:

  yarn.nodemanager.sleep-delay-before-sigkill.ms

The default is 250ms.

Samza does *not* currently handle the SIGTERM gracefully (it doesn't shut
itself down). The ticket for this is:

  https://issues.apache.org/jira/browse/SAMZA-506

If you'd like to work on that patch, it should make this work. If not, yes,
you'll have to use some form of shutdown command.

Zach (the guy who opened the JIRA) was able to hack around this himself by
adding a shutdown hook. You could do something similar if you want: add a
shutdown hook that sets a variable, have window() check the variable every N
ms, and call coordinator.shutdown if it's set to true. You'd probably also
have to raise the delay in YARN to more than 250ms.

Options:

1. Use a topic like samza_command.
2. Fix SAMZA-506.
3. Write a custom shutdown hook with a static variable.

> Does it hurt overall processing performance? I don't think so, but I want
> to confirm.

Nope, it shouldn't. It only sleeps during "idle" time (when no messages are
available). When there are messages available, you shouldn't get
null_envelopes (unless you have a custom MessageChooser that withholds
available messages, which I doubt you do).

Cheers,
Chris

On Fri, Feb 6, 2015 at 12:30 PM, Bae, Jae Hyeon <metac...@gmail.com> wrote:

> What I am doing is consuming two topics, samza_input and samza_command.
> samza_command will carry control commands, something like "shutdown,all",
> because kill-yarn-job.sh does not gracefully shut down SamzaContainer. Am I
> correct? If so, what's the best way to shut down the container without using
> the command topic?
>
> 10ms explains why 50 null envelopes were consumed per second. Does it hurt
> overall processing performance? I don't think so, but I want to confirm.
>
> Thank you
> Best, Jae
>
> On Fri, Feb 6, 2015 at 12:16 PM, Chris Riccomini <criccom...@apache.org>
> wrote:
>
> > Hey Jae,
> >
> > SamzaContainer polls for new messages by calling
> > consumerMultiplexer.choose. When no messages are available, choose
> > returns null. The next time choose is called, it is invoked with a
> > timeout (the default is 10ms). This time, the poll call blocks until
> > either 1) the timeout is hit, or 2) a new message is available to
> > process. This prevents a tight loop.
> >
> > > its frequency is too high, in my testing environment, it's more than 50
> > > per second.
> >
> > Why do you think this is too high? It either has to do this, or sleep for
> > longer. The longer the container sleeps, the more latency is introduced
> > when there *is* a message available. 10ms is what we use by default.
> >
> > Cheers,
> > Chris
> >
> > On Fri, Feb 6, 2015 at 11:11 AM, Bae, Jae Hyeon <metac...@gmail.com>
> > wrote:
> >
> > > Could you explain why consumerMultiplexer.choose returns null?
> > >
> > > Can it happen when there's no message in the Kafka topic?
> > >
> > > If my theory is correct, its frequency is too high: in my testing
> > > environment, it's more than 50 per second.
> > >
> > > Thank you
> > > Best, Jae
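Option 3 from Chris's reply (a shutdown hook that sets a static flag, checked from window()) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not Samza code: `Coordinator` is a hypothetical stand-in for Samza's `TaskCoordinator` (only the shutdown request is modeled; the real signature may differ between Samza versions), and `FlagCheckingTask`, `installHook`, `SHUTDOWN_REQUESTED`, and `SHUTDOWN_DONE` are made-up names. One JVM detail matters here: once every shutdown hook has returned, the JVM halts, so a hook that only sets the flag may let the process exit before window() ever sees it. The sketch therefore has the hook wait on a latch, with a timeout that should stay below the YARN SIGTERM-to-SIGKILL delay, until the task's close() runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownFlagSketch {

    /** Hypothetical stand-in for Samza's TaskCoordinator; only the shutdown request is modeled. */
    interface Coordinator {
        void shutdown();
    }

    /** Hypothetical task whose window() checks a flag set by a JVM shutdown hook. */
    static class FlagCheckingTask {
        // Static, as in option 3: shared by every task instance in this JVM.
        static final AtomicBoolean SHUTDOWN_REQUESTED = new AtomicBoolean(false);
        // Released by close(), i.e. once the task has actually been shut down.
        static final CountDownLatch SHUTDOWN_DONE = new CountDownLatch(1);

        /** Registers the hook; call once per JVM. maxWaitMs is an assumed value. */
        static void installHook(long maxWaitMs) {
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                SHUTDOWN_REQUESTED.set(true);
                try {
                    // The JVM halts as soon as all hooks return, so block (bounded)
                    // until close() runs. Keep maxWaitMs below the YARN kill delay.
                    SHUTDOWN_DONE.await(maxWaitMs, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        /** In a real job the container calls this every N ms; here it is called directly. */
        void window(Coordinator coordinator) {
            if (SHUTDOWN_REQUESTED.get()) {
                coordinator.shutdown(); // ask the container to stop
            }
        }

        /** Hypothetical stand-in for ClosableTask.close(): lets the hook return. */
        void close() {
            SHUTDOWN_DONE.countDown();
        }
    }

    public static void main(String[] args) {
        FlagCheckingTask.installHook(200); // example value only
        FlagCheckingTask task = new FlagCheckingTask();
        Coordinator coordinator = () -> System.out.println("coordinator.shutdown() requested");

        task.window(coordinator);                      // flag not set yet: no-op
        FlagCheckingTask.SHUTDOWN_REQUESTED.set(true); // simulate the hook firing on SIGTERM
        task.window(coordinator);                      // requests shutdown
        task.close();                                  // container done; hook may return
    }
}
```

In a real job, the "every N ms" interval would come from the `task.window.ms` setting for a `WindowableTask`, and the latch release would belong in the task's `ClosableTask.close()`.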
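The polling behavior in Chris's earlier reply (choose returns null when nothing is available, and the next call blocks for up to 10ms) can be illustrated with a plain `BlockingQueue`. This is a hypothetical stand-in, not Samza's `consumerMultiplexer` implementation: `TimedChooser` and `nullCount` are invented names, and a single `LinkedBlockingQueue` replaces Samza's input consumers. It also makes the arithmetic in the thread concrete: with a 10ms timeout, an idle loop can return at most roughly 1000 / 10 = 100 null envelopes per second, so the ~50 per second Jae observed is within that bound.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical "choose, then block briefly after a miss" loop; not Samza's code. */
public class TimedChooser<T> {
    private final BlockingQueue<T> queue;
    private final long timeoutMs;
    // Set after choose() returns null, so the next call blocks instead of spinning.
    private boolean lastWasNull = false;
    private long nullCount = 0;

    public TimedChooser(BlockingQueue<T> queue, long timeoutMs) {
        this.queue = queue;
        this.timeoutMs = timeoutMs;
    }

    /** Returns the next message, or null if none arrived in time. */
    public T choose() throws InterruptedException {
        T msg = lastWasNull
                ? queue.poll(timeoutMs, TimeUnit.MILLISECONDS) // wait up to timeoutMs
                : queue.poll();                                // non-blocking
        lastWasNull = (msg == null);
        if (msg == null) {
            nullCount++;
        }
        return msg;
    }

    public long nullCount() {
        return nullCount;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        TimedChooser<String> chooser = new TimedChooser<>(q, 10);

        // Idle for about one second: every call after the first blocks ~10ms.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
        while (System.nanoTime() < deadline) {
            chooser.choose();
        }
        System.out.println("null envelopes in ~1s of idle time: " + chooser.nullCount());

        // A message that is already queued is returned without waiting out the timeout.
        q.offer("hello");
        System.out.println("chosen: " + chooser.choose());
    }
}
```

In this single-queue sketch, a message that arrives mid-wait ends the poll immediately; the latency cost of a longer timeout that Chris mentions comes from the real container's behavior, which the sketch does not model.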