Re: [DISCUSS] Tracing in the Hadoop ecosystem

Stack Tue, 21 Aug 2018 07:50:54 -0700

On Tue, Aug 21, 2018 at 3:44 AM Tsuyoshi Ozawa <[email protected]> wrote:


> Thanks for starting discussion, Stack.
>
> Zipkin seems to be coming to the Apache Incubator. As Andrew
> Purtell said on HADOOP-15566, it would be a good option since there
> are no licensing problems.
> https://wiki.apache.org/incubator/ZipkinProposal
>
>
Yes. This is nice to see.



> Stack, do you know of any differences between Zipkin and
> HTrace? Might measurable performance overhead still be observed in
> Zipkin?
>
>
I've not measured to see if disabled trace points are friction-free.
Perhaps someone else has?



> To decrease the overhead, we need to do additional work, like ftrace,
> the well-known function tracer in the Linux kernel. If I understand
> correctly, ftrace replaces its function calls with NOP CPU
> instructions when it is disabled. This keeps the tracer's overhead
> low. By replacing the tracing function calls with the JVM's
> NOP operation, can we achieve minimal overhead?
>
>
That'd be ideal. It makes sense inside the kernel. But up in our sloppy Java
context, we should be able to get away with something less exotic.
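
As one illustration of "less exotic": a minimal sketch, assuming a
hypothetical Tracer interface (this is not the HTrace or Zipkin API). Trace
points are guarded by an isEnabled() check, and the disabled case is a no-op
singleton, so no event string is built when tracing is off. Whether the
disabled path is actually friction-free would still need measuring.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tracing interface for this sketch; not the HTrace or Zipkin API. */
interface Tracer {
    boolean isEnabled();
    void record(String event);
}

/** Disabled tracer: every call does nothing. */
final class NoopTracer implements Tracer {
    static final NoopTracer INSTANCE = new NoopTracer();
    private NoopTracer() {}
    @Override public boolean isEnabled() { return false; }
    @Override public void record(String event) { /* intentionally empty */ }
}

/** Enabled tracer that keeps events in memory, for illustration only. */
final class RecordingTracer implements Tracer {
    private final List<String> events = new ArrayList<>();
    @Override public boolean isEnabled() { return true; }
    @Override public void record(String event) { events.add(event); }
    List<String> events() { return events; }
}

/** Hypothetical instrumented class with a single guarded trace point. */
final class BlockReader {
    private final Tracer tracer;
    BlockReader(Tracer tracer) { this.tracer = tracer; }

    int read(int blockId) {
        // Check first so the event string is never built when tracing is off.
        if (tracer.isEnabled()) {
            tracer.record("read block " + blockId);
        }
        return blockId * 2; // stand-in for the real work
    }
}
```

Passing NoopTracer.INSTANCE to BlockReader gives the disabled behaviour;
passing a RecordingTracer collects one event per read() call.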

Thanks Tsuyoshi,
S




> Regards
> - Tsuyoshi
> On Tue, Jul 31, 2018 at 9:59 AM Eric Yang <[email protected]> wrote:
> >
> > Most code coverage tools can instrument Java classes without making any
> > source code changes, but tracing a distributed system is more involved
> > because code execution across network interactions is not easy to match up.
> > All interactions between sender and receiver have some form of session id
> > or sequence id.  Hadoop had some logic to assist in stitching distributed
> > interactions together in the clienttrace log.  This information seems to
> > have been lost in the last 5-6 years of Hadoop's evolution.  HTrace was
> > invented to fill the void left behind by clienttrace, as a programmable API
> > for sending out useful tracing data that downstream analytical programs
> > can use to visualize the interactions.
> >
> > Large companies commonly enforce logging of the session id and
> > write homebrew tools to stitch together debugging logic for a specific
> > piece of software.  There is also a growing set of analytical tools from
> > Splunk and similar companies that stitch the views together.  Hadoop does
> > not seem to be at the top of the list for those companies to implement
> > tracing, because the Hadoop networking layer is complex and changes more
> > frequently than desired.
> >
> > If we go back to the logging approach, instead of the API approach, it will
> > help someone write the analytical program someday.  The danger of the
> > logging approach is that it is boring to write LOG.debug() everywhere, we
> > often forget to, and log entries get removed.
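
To illustrate the logging approach described above: a minimal sketch,
assuming every log line carries a "session=<id>" token. That line layout and
the SessionLogStitcher class are hypothetical, not Hadoop's clienttrace
format. Lines from different hosts are grouped by session id so one
interaction can be read end to end.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that stitches log lines together by session id. */
final class SessionLogStitcher {
    private static final String KEY = "session=";

    /** Formats one log line; the "session=<id>" layout is an assumption of this sketch. */
    static String format(String host, String sessionId, String message) {
        return host + " " + KEY + sessionId + " " + message;
    }

    /** Groups lines by session id, keeping input order within each session. */
    static Map<String, List<String>> stitch(List<String> lines) {
        Map<String, List<String>> bySession = new LinkedHashMap<>();
        for (String line : lines) {
            String id = sessionIdOf(line);
            if (id == null) {
                continue; // lines without a session id cannot be stitched
            }
            bySession.computeIfAbsent(id, k -> new ArrayList<>()).add(line);
        }
        return bySession;
    }

    /** Returns the session id found in the line, or null if there is none. */
    private static String sessionIdOf(String line) {
        for (String token : line.split(" ")) {
            if (token.startsWith(KEY)) {
                return token.substring(KEY.length());
            }
        }
        return null;
    }
}
```

The grouping only works if every component logs the same id for the same
interaction, which is the discipline the paragraph above says is easy to lose.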
> >
> > The API approach can work if real-time interactive tracing can be done.
> > However, this is hard to realize in Hadoop because a massive amount of
> > parallel data is difficult to aggregate in real time without hitting
> > timeouts.  It is also more likely to require changes to the network
> > protocol, which might cause more headache than it's worth.  I am in favor
> > of removing HTrace support and redoing distributed tracing using the
> > logging approach.
> >
> > Regards,
> > Eric
> >
> > On 7/30/18, 3:06 PM, "Stack" <[email protected]> wrote:
> >
> >     There is a healthy discussion going on over in HADOOP-15566 on tracing
> >     in the Hadoop ecosystem. It would sit better on a mailing list than in
> >     comments up on JIRA so here's an attempt at porting the chat here.
> >
> >     Background/Context: Bits of Hadoop and HBase had Apache HTrace trace
> >     points added. HTrace was formerly "incubating" at Apache but has since
> >     been retired, moved to Apache Attic. HTrace and the efforts at
> >     instrumenting Hadoop wilted for want of attention/resourcing. Our Todd
> >     Lipcon noticed that the HTrace instrumentation can add friction on
> >     some code paths so can actually be harmful even when disabled.  The
> >     natural follow-on is that we should rip out the trace points of a "dead"
> >     project. This then raises the question, should something replace it
> >     and if so what? This is where HADOOP-15566 is at currently.
> >
> >     HTrace took two or three runs, led by various heroes, at building a
> >     trace lib for Hadoop (first). It was trying to build the trace lib, a
> >     store, and a visualizer. Always, it had a mechanism for dumping the
> >     traces out to external systems for storage and viewing (e.g. Zipkin).
> >     HTrace started when there was little else but the, you guessed it,
> >     Google paper that described the Dapper system they had internally.
> >     Since then, the world of tracing has come on in leaps and bounds with
> >     healthy alternatives, communities, and even commercialization.
> >
> >     If interested, take a read over HADOOP-15566. Will try and encourage
> >     participants to move the chat here.
> >
> >     Thanks,
> >     St.Ack
> >
> >     ---------------------------------------------------------------------
> >     To unsubscribe, e-mail: [email protected]
> >     For additional commands, e-mail: [email protected]
> >
> >
> >
> >
>
