Thanks for starting the discussion, Stack. Zipkin seems to be coming to the Apache Incubator. As Andrew Purtell said on HADOOP-15566, it would be a good option since there would be no licensing problems. https://wiki.apache.org/incubator/ZipkinProposal

Stack, do you know of any differences between Zipkin and HTrace? Could measurable performance overhead still be observed with Zipkin? To decrease the overhead, we would need additional work along the lines of ftrace, the well-known tracing framework in the Linux kernel. If I understand correctly, ftrace replaces the calls to its tracing functions with CPU NOP instructions when it is disabled, which keeps the tracer's overhead low. By replacing the tracing function calls with the JVM's NOP operation, could we achieve minimal overhead?

Regards
- Tsuyoshi

On Tue, Jul 31, 2018 at 9:59 AM Eric Yang <ey...@hortonworks.com> wrote:
>
> Most code coverage tools can instrument Java classes without making any
> source code changes, but tracing a distributed system is more involved
> because code execution across network interactions is not easy to match up.
> All interactions between sender and receiver have some form of session id
> or sequence id. Hadoop had some logic to assist the stitching of distributed
> interactions together in the clienttrace log. This information seems to have
> been lost in the last 5-6 years of Hadoop evolution. HTrace was invented to
> fill the void left behind by clienttrace, as a programmable API to send out
> useful tracing data for downstream analytical programs to visualize the
> interactions.
>
> Large companies commonly enforce logging of the session id and write
> homebrew tools to stitch together debugging logic for a specific piece of
> software. There is also a growing set of tools from Splunk and similar
> companies for writing analytical tools that stitch the views together.
> Hadoop does not seem to be at the top of the list for those companies to
> implement tracing, because the Hadoop networking layer is complex and
> changes more frequently than desired.
>
> If we go back to the logging approach instead of the API approach, it will
> help someone write the analytical program someday.
> The danger of the logging approach is that it is boring to write
> LOG.debug() everywhere, we often forget about it, and log entries get
> removed.
>
> The API approach can work if real-time interactive tracing can be done.
> However, this is hard to realize in Hadoop because a massive amount of
> parallel data is difficult to aggregate in real time without hitting
> timeouts. It is also more likely to require changes to the network protocol,
> which might cause more headache than it's worth. I am in favor of removing
> HTrace support and redoing distributed tracing using the logging approach.
>
> Regards,
> Eric
>
> On 7/30/18, 3:06 PM, "Stack" <st...@duboce.net> wrote:
>
> There is a healthy discussion going on over in HADOOP-15566 on tracing
> in the Hadoop ecosystem. It would sit better on a mailing list than in
> comments up on JIRA so here's an attempt at porting the chat here.
>
> Background/Context: Bits of Hadoop and HBase had Apache HTrace trace
> points added. HTrace was formerly "incubating" at Apache but has since
> been retired, moved to Apache Attic. HTrace and the efforts at
> instrumenting Hadoop wilted for want of attention/resourcing. Our Todd
> Lipcon noticed that the HTrace instrumentation can add friction on
> some code paths so can actually be harmful even when disabled. The
> natural follow-on is that we should rip out tracings of a "dead"
> project. This then begs the question, should something replace it
> and if so what? This is where HADOOP-15566 is at currently.
>
> HTrace took two or three runs, led by various Heroes, at building a
> trace lib for Hadoop (first). It was trying to build the trace lib, a
> store, and a visualizer. Always, it had a mechanism for dumping the
> traces out to external systems for storage and viewing (e.g. Zipkin).
> HTrace started when there was little else but the, you guessed it,
> Google paper that described the Dapper system they had internally.
> Since then, the world of tracing has come on in leaps and bounds with
> healthy alternatives, communities, and even commercialization.
>
> If interested, take a read over HADOOP-15566. Will try and encourage
> participants to move the chat here.
>
> Thanks,
> St.Ack
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org