Re: Zombie writers protection

Yi Pan Wed, 10 Feb 2016 17:23:14 -0800

Hi, Rick and John,

Thanks for the great discussion! As Jacob said, we also realized the possible
drawbacks of relying solely on YARN for process liveness detection, and
that's why SAMZA-871 was opened. Please comment on the JIRA so that
we can track the discussion and move the design process forward.
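
For readers following the thread, here is a minimal sketch of the general
idea behind heartbeat-based liveness detection. It is not the SAMZA-871
design. The Coordinator and Container classes, their methods, and the
timeout value are all hypothetical, and the clock is injected so the
behavior can be checked without real waiting. The container stops itself
either when the coordinator no longer lists it or when it has been unable
to reach the coordinator for longer than the timeout.

```python
import time


class Coordinator:
    """Hypothetical stand-in for whatever tracks the containers 'on the books'."""

    def __init__(self):
        self.live = set()

    def launch(self, container_id):
        self.live.add(container_id)

    def revoke(self, container_id):
        self.live.discard(container_id)

    def is_valid(self, container_id):
        return container_id in self.live


class Container:
    """Hypothetical container that stops itself when it looks orphaned."""

    def __init__(self, container_id, coordinator, timeout_s, clock=time.monotonic):
        self.container_id = container_id
        self.coordinator = coordinator
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_ok = clock()  # time of the last confirmed heartbeat
        self.running = True

    def tick(self, reachable=True):
        """Make one heartbeat attempt; return True if the container may keep running."""
        if not self.running:
            return False
        now = self.clock()
        if reachable:
            if self.coordinator.is_valid(self.container_id):
                self.last_ok = now
            else:
                # The coordinator no longer lists this container: it is orphaned.
                self.running = False
        elif now - self.last_ok > self.timeout_s:
            # Cut off for too long; assume a replacement has been started.
            self.running = False
        return self.running
```

A timeout like this only bounds how long an orphan keeps running. It does
not by itself reject a write that is already in flight.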


Thanks a lot!

-Yi

On Wed, Feb 10, 2016 at 2:10 PM, Rick Mangi <r...@chartbeat.com> wrote:

> Jake, Not my question, I was just adding my 2 cents :)
>
> John, it’s not that yarn is responsible for maintaining 1 instance of each
> container. Samza has an abstract management layer that defers this to yarn,
> but some people bypass yarn altogether and manage their containers
> themselves or run on things like mesos.
>
> For your purposes though, if you are using yarn, then yes this is yarn’s
> job.
>
> The case I ran into was with cloudera’s distro of yarn with an older
> version of ubuntu and yarn. I haven’t seen zombies since moving to the
> latest yarn distro.
>
>
>
> > On Feb 10, 2016, at 4:44 PM, Jacob Maes <jacob.m...@gmail.com> wrote:
> >
> > Hey Rick,
> >
> > If I understand your question, the goal is really to make sure there are no
> > orphaned containers that continue to run "off the books".
> >
> > The newly added SAMZA-871 describes a heartbeat mechanism to make sure
> > orphaned containers actually get killed.
> >
> > Also, the YARN Node Manager Restart capability might help. We're in the
> > process of testing this at LinkedIn:
> >
> > https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html
> >
> > -Jake
> >
> > On Wed, Feb 10, 2016 at 1:42 PM, John Dennison <dennison.j...@gmail.com>
> > wrote:
> >
> >> To second Rick's point: it's less about malicious actors and more about
> >> containers thought to be lost due to a network partition popping up later
> >> and starting to write to the changelog. I assume from Rick's response that
> >> yarn is responsible for ensuring only one version of each container is
> >> running, and samza has nothing internal to deal with this.
> >>
> >> I guess you could hijack kafka's auth framework to block old zombie
> >> containers from writing: use some global lock's incrementing token as the
> >> password. A zombie process would auth with an old token and be denied. I
> >> haven't looked, but I imagine that the 0.9.0 auth framework isn't done at
> >> the partition level.
> >>
> >> On Wed, Feb 10, 2016 at 2:27 PM, Rick Mangi <r...@chartbeat.com> wrote:
> >>
> >>> Security wouldn’t stop zombie processes from writing to kafka. I had this
> >>> problem with yarn before where the container thought it was killing jobs
> >>> but they never actually died, and in fact continued to write to kafka.
> >>>
> >>>
> >>>> On Feb 10, 2016, at 4:23 PM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:
> >>>>
> >>>> Hi John
> >>>>
> >>>> Currently there is no authorization on who writes to Kafka. There is a
> >>>> Kafka security proposal that the Kafka community is working on:
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
> >>>>
> >>>> Building this into Samza may entail expensive coordination (to prevent
> >>>> other jobs from writing). Since jobs are usually run in a trusted
> >>>> environment, I've not seen people requesting this use case. Even if we
> >>>> did build this into Samza, nothing stops people from writing to that
> >>>> Kafka topic by bypassing Samza completely (through the Kafka producer or
> >>>> an external library).
> >>>>
> >>>> I'd think Kafka would build support for authorization, principals, roles,
> >>>> etc. in the future, and Samza can leverage it once it's done.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> On Wednesday, February 10, 2016, John Dennison <dennison.j...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Greetings,
> >>>>>
> >>>>> I have a general design question I did not see addressed in the docs.
> >>>>> Basically, how does Samza guarantee a single writer for each changelog
> >>>>> partition? Because of the strong ordering assumption of these
> >>>>> changelogs, how do you protect against zombie processes writing to the
> >>>>> changelog with out-of-date values?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> John
> >>>>>
> >>>
> >>>
> >>
>
>
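
As an illustration of the incrementing-token idea John raises in the thread
above, here is a minimal in-memory sketch. It does not use Kafka or any
Kafka API. TokenIssuer stands in for the global lock and FencedLog for a
changelog partition that checks tokens on write; both names are
hypothetical. Nothing in the thread says Kafka offers such a check.

```python
class TokenIssuer:
    """Hypothetical global lock that hands out strictly increasing tokens."""

    def __init__(self):
        self._token = 0

    def acquire(self):
        self._token += 1
        return self._token


class FencedLog:
    """Hypothetical log that rejects writers holding a token older than one it has seen."""

    def __init__(self):
        self.highest_seen = 0
        self.entries = []

    def append(self, token, key, value):
        if token < self.highest_seen:
            # A newer writer has already written; this writer is a zombie.
            return False
        self.highest_seen = token
        self.entries.append((key, value))
        return True


issuer = TokenIssuer()
log = FencedLog()

old_token = issuer.acquire()   # held by the container later presumed lost
log.append(old_token, "k", "v1")
new_token = issuer.acquire()   # held by its replacement
log.append(new_token, "k", "v2")

# The presumed-lost container reappears and tries to write a stale value.
zombie_accepted = log.append(old_token, "k", "stale")
```

In this sketch the zombie is only rejected once the replacement has written
at least once with its newer token. A real design would need the log to
learn the new token at the moment the lock is acquired.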
