Hi, Rick and John,

Thanks for the great discussion! As Jacob said, we have also recognized the
possible drawbacks of relying solely on YARN for process liveness detection,
which is why SAMZA-871 was opened. Please comment on the JIRA so that we can
track the discussion and move the design process forward.
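
For anyone following along, here is a rough sketch of what heartbeat-based
liveness detection could look like. It is not the SAMZA-871 design. The class
names (HeartbeatMonitor, ContainerLease), the timeout parameter, and the
injectable clock are all assumptions made up for illustration.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Coordinator side (hypothetical): remembers the last heartbeat per container."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._last_seen: Dict[str, float] = {}

    def beat(self, container_id: str) -> None:
        # Record that the container reported in just now.
        self._last_seen[container_id] = self.clock()

    def expired(self) -> List[str]:
        # Containers whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(
            cid for cid, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


class ContainerLease:
    """Container side (hypothetical): stop working if acks stop arriving."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._last_ack = clock()

    def ack(self) -> None:
        # Called whenever the coordinator acknowledges a heartbeat.
        self._last_ack = self.clock()

    def must_shut_down(self) -> bool:
        # An orphaned container sees no acks and should terminate itself.
        return self.clock() - self._last_ack > self.timeout_s
```

The point of the container-side lease is that a container cut off from its
coordinator can stop itself instead of waiting for YARN to kill it.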

Thanks a lot!

-Yi

On Wed, Feb 10, 2016 at 2:10 PM, Rick Mangi <r...@chartbeat.com> wrote:

> Jake, not my question; I was just adding my 2 cents :)
>
> John, it’s not strictly YARN that is responsible for maintaining one
> instance of each container. Samza has an abstract management layer that
> defers this to YARN, but some people bypass YARN altogether and manage their
> containers themselves or run on things like Mesos.
>
> For your purposes though, if you are using yarn, then yes this is yarn’s
> job.
>
> The case I ran into was with cloudera’s distro of yarn with an older
> version of ubuntu and yarn. I haven’t seen zombies since moving to the
> latest yarn distro.
>
>
>
> > On Feb 10, 2016, at 4:44 PM, Jacob Maes <jacob.m...@gmail.com> wrote:
> >
> > Hey Rick,
> >
> > If I understand your question, the goal is really to make sure there are
> > no orphaned containers that continue to run "off the books".
> >
> > The newly added SAMZA-871 describes a heartbeat mechanism to make sure
> > orphaned containers actually get killed.
> >
> > Also, the YARN Node Manager Restart capability might help. We're in the
> > process of testing this at LinkedIn:
> >
> > https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html
> >
> > -Jake
> >
> > On Wed, Feb 10, 2016 at 1:42 PM, John Dennison <dennison.j...@gmail.com>
> > wrote:
> >
> >> To second Rick's point: it's less about malicious actors and more about
> >> containers thought to be lost due to a network partition popping up later
> >> and starting to write to the changelog. I assume from Rick's response
> >> that YARN is responsible for ensuring only one version of each container
> >> is running and Samza has nothing internal to deal with this.
> >>
> >> I guess you could hijack Kafka's auth framework to block old zombie
> >> containers from writing. Use some global lock's incrementing token as
> >> the password. A zombie process would auth with an old token and be
> >> denied. I haven't looked, but I imagine that the 0.9.0 auth framework
> >> isn't done on a partition level.
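
The incrementing-token idea above can be sketched roughly as follows. This is
an in-memory stand-in, not Kafka: nothing in this thread says Kafka performs
such a check, and TokenIssuer and FencedLog are hypothetical names used only
to show the fencing logic.

```python
from typing import Any, List, Tuple


class TokenIssuer:
    """Stand-in for the global lock: hands out strictly increasing tokens."""

    def __init__(self) -> None:
        self._last = 0

    def acquire(self) -> int:
        # Each new owner of the partition gets a larger token than the last.
        self._last += 1
        return self._last


class FencedLog:
    """Stand-in for one changelog partition that rejects stale writers."""

    def __init__(self) -> None:
        self._highest_token = 0
        self.records: List[Tuple[Any, Any]] = []

    def append(self, token: int, key: Any, value: Any) -> bool:
        # A writer holding a token older than the newest one seen is a zombie.
        if token < self._highest_token:
            return False
        self._highest_token = token
        self.records.append((key, value))
        return True
```

A zombie that wakes up after a partition still holds its old token, so its
writes are refused once the replacement container has written with a newer one.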
> >>
> >> On Wed, Feb 10, 2016 at 2:27 PM, Rick Mangi <r...@chartbeat.com> wrote:
> >>
> >>> Security wouldn’t stop zombie processes from writing to kafka. I had
> >>> this problem with yarn before where the container thought it was
> >>> killing jobs but they never actually died, and in fact continued to
> >>> write to kafka.
> >>>
> >>>
> >>>> On Feb 10, 2016, at 4:23 PM, Jagadish Venkatraman <jagadish1...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> Hi John
> >>>>
> >>>> Currently there is no authorization on who writes to Kafka. There is a
> >>>> Kafka security proposal that the kafka community is working on.
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/Security
> >>>>
> >>>> Building this into Samza may entail expensive coordination (to prevent
> >>>> other jobs from writing). Since jobs are usually run in a trusted
> >>>> environment, I've not seen people requesting this use case. Even if we
> >>>> did build this into Samza, nothing stops people from writing to that
> >>>> Kafka topic by bypassing Samza completely (through the Kafka producer
> >>>> or an external library).
> >>>>
> >>>> I'd think Kafka would build support for authorization, principals,
> >>>> roles etc. in the future and Samza can leverage it once it's done.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> On Wednesday, February 10, 2016, John Dennison <dennison.j...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Greetings,
> >>>>>
> >>>>> I have a general design question I did not see addressed in the docs.
> >>>>> Basically, how does Samza guarantee a single writer for each changelog
> >>>>> partition? Because of the strong ordering assumption of these
> >>>>> changelogs, how do you protect against zombie processes writing to the
> >>>>> changelog with out-of-date values?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> John
> >>>>>
> >>>
> >>>
> >>
>
>
