On Tue, Aug 21, 2018 at 3:44 AM Tsuyoshi Ozawa <oz...@apache.org> wrote:

> Thanks for starting discussion, Stack.
>
> Zipkin seems to be coming to the Apache Incubator. As Andrew
> Purtell said on HADOOP-15566, it would be a good option since there
> is no licensing problem.
> https://wiki.apache.org/incubator/ZipkinProposal
>

Yes. This is nice to see.

> Stack, do you have any knowledge about differences between Zipkin and
> HTrace? Might measurable performance overhead still be observed in
> Zipkin?
>

I've not measured to see if disabled trace points are friction-free.
Perhaps someone else has?

> To decrease the overhead, we need to do additional work like ftrace,
> the well-known function tracer built into the Linux kernel. If I
> understand correctly, ftrace replaces its tracing calls with CPU NOP
> instructions when it is disabled. This ensures lower overhead from
> the tracer. By replacing the tracing function calls with the JVM's
> NOP operation, can we achieve minimal overhead?
>

That'd be ideal. Makes sense inside the kernel. But up in our sloppy
Java context, we should be able to get away with something less exotic.

Thanks Tsuyoshi,
S

> Regards
> - Tsuyoshi
> On Tue, Jul 31, 2018 at 9:59 AM Eric Yang <ey...@hortonworks.com> wrote:
> >
> > Most code coverage tools can instrument Java classes without making
> > any source code changes, but tracing a distributed system is more
> > involved because code execution across network interactions is not
> > easy to match up. All interactions between sender and receiver have
> > some form of session id or sequence id. Hadoop had some logic to
> > assist the stitching of distributed interactions together in the
> > clienttrace log. This information seems to have been lost in the
> > last 5-6 years of Hadoop evolution. HTrace was invented to fill the
> > void left behind by clienttrace as a programmable API to send out
> > useful tracing data for downstream analytical programs to visualize
> > the interaction.
> >
> > Large companies commonly enforce logging of the session id and
> > write homebrew tools to stitch together debugging logic for a
> > specific piece of software. There is also a growing set of tools
> > from Splunk and similar companies for writing analytical tools that
> > stitch the views together. Hadoop does not seem to be at the top of
> > the list for those companies to implement tracing because the
> > Hadoop networking layer is complex and changes more frequently than
> > desired.
> >
> > If we go back to the logging approach, instead of the API approach,
> > it will help someone to write the analytical program someday. The
> > danger of the logging approach is that it is boring to write
> > LOG.debug() everywhere, we often forget about it, and log entries
> > get removed.
> >
> > The API approach can work if real-time interactive tracing can be
> > done. However, this is hard to realize in Hadoop because a massive
> > amount of parallel data is difficult to aggregate in real time
> > without hitting a timeout. It has a higher chance of requiring
> > changes to the network protocol that might cause more headache than
> > it's worth. I am in favor of removing HTrace support and redoing
> > distributed tracing using the logging approach.
> >
> > Regards,
> > Eric
> >
> > On 7/30/18, 3:06 PM, "Stack" <st...@duboce.net> wrote:
> >
> >     There is a healthy discussion going on over in HADOOP-15566 on
> >     tracing in the Hadoop ecosystem. It would sit better on a
> >     mailing list than in comments up on JIRA so here's an attempt
> >     at porting the chat here.
> >
> >     Background/Context: Bits of Hadoop and HBase had Apache HTrace
> >     trace points added. HTrace was formerly "incubating" at Apache
> >     but has since been retired, moved to Apache Attic. HTrace and
> >     the efforts at instrumenting Hadoop wilted for want of
> >     attention/resourcing. Our Todd Lipcon noticed that the HTrace
> >     instrumentation can add friction on some code paths so can
> >     actually be harmful even when disabled.
> >     The natural follow-on is that we should rip out the tracing of
> >     a "dead" project. This then begs the question: should something
> >     replace it, and if so, what? This is where HADOOP-15566 is at
> >     currently.
> >
> >     HTrace took two or three runs, led by various Heroes, at
> >     building a trace lib for Hadoop (first). It was trying to build
> >     the trace lib, a store, and a visualizer. Always, it had a
> >     mechanism for dumping the traces out to external systems for
> >     storage and viewing (e.g. Zipkin). HTrace started when there
> >     was little else but the, you guessed it, Google paper that
> >     described the Dapper system they had internally. Since then,
> >     the world of tracing has come on in leaps and bounds with
> >     healthy alternatives, communities, and even commercialization.
> >
> >     If interested, take a read over HADOOP-15566. Will try and
> >     encourage participants to move the chat here.
> >
> >     Thanks,
> >     St.Ack
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> > For additional commands, e-mail: common-dev-h...@hadoop.apache.org
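
For illustration only, here is a minimal sketch of what a "less exotic"
cheap disabled trace point could look like in plain Java, as opposed to
ftrace-style NOP patching. Nothing here is HTrace or Zipkin API: the
class TracePointSketch, the Scope interface, the startSpan() helper and
the sketch.trace.enabled property are all hypothetical names made up
for this example. The idea is that the on/off switch lives in a static
final field, which HotSpot's JIT can generally treat as a constant, and
that the disabled path returns a shared no-op object instead of
allocating or reading the clock. Whether this is actually
friction-free on a given code path would still need to be measured.

```java
import java.util.ArrayList;
import java.util.List;

public class TracePointSketch {
    // Hypothetical switch, read once at class initialization. Because the
    // field is static final, the JIT may fold it and drop the dead branch.
    public static final boolean TRACE_ENABLED =
        Boolean.getBoolean("sketch.trace.enabled");

    // Hypothetical scope type; close() is redeclared without a checked
    // exception so it can be used in try-with-resources without a catch.
    public interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    // Single shared instance handed out whenever tracing is disabled.
    public static final Scope NOOP_SCOPE = () -> { };

    // Stand-in for a real span receiver (assumption for this sketch).
    public static final List<String> SPANS = new ArrayList<>();

    public static Scope startSpan(String name) {
        if (!TRACE_ENABLED) {
            return NOOP_SCOPE; // no allocation, no timestamp
        }
        final long start = System.nanoTime();
        return () -> {
            long elapsed = System.nanoTime() - start;
            synchronized (SPANS) {
                SPANS.add(name + " " + elapsed + "ns");
            }
        };
    }

    // Example of an instrumented code path.
    public static int readBlock(int blockId) {
        try (Scope ignored = startSpan("readBlock")) {
            return blockId * 2; // placeholder for real work
        }
    }

    public static void main(String[] args) {
        int result = readBlock(21);
        System.out.println("result=" + result + " spans=" + SPANS.size());
    }
}
```

Running with -Dsketch.trace.enabled=true records one span per call to
readBlock(); without the property the same call goes through the no-op
path and records nothing.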