Most code coverage tools can instrument Java classes without any source code changes, but tracing a distributed system is more involved because code execution that spans network interactions is not easy to match up. Interactions between sender and receiver carry some form of session id or sequence id. Hadoop used to have logic that assisted with stitching distributed interactions together via the clienttrace log, but that capability seems to have been lost over the last 5-6 years of Hadoop's evolution. HTrace was invented to fill the void left behind by clienttrace: a programmable API that emits useful tracing data for a downstream analytical program to visualize the interactions.
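To make the stitching idea concrete, here is a minimal sketch of grouping log lines by a shared client id, in the spirit of the old clienttrace log. The log format and the cliID= field layout here are illustrative assumptions, not the exact clienttrace format:

```java
import java.util.*;
import java.util.stream.*;

public class TraceStitcher {
    // Hypothetical log lines from different machines; real clienttrace
    // entries carry more fields, but the stitching key is the same idea.
    static final List<String> LOGS = List.of(
        "namenode  clienttrace: op=OPEN      cliID=DFSClient_42",
        "datanode1 clienttrace: op=HDFS_READ cliID=DFSClient_42",
        "datanode2 clienttrace: op=HDFS_WRITE cliID=DFSClient_99"
    );

    // Group log entries by client/session id so that one request's hops
    // across machines can be viewed together as a single interaction.
    static Map<String, List<String>> stitch(List<String> lines) {
        return lines.stream().collect(Collectors.groupingBy(line -> {
            int i = line.indexOf("cliID=");
            return line.substring(i + "cliID=".length()).trim();
        }));
    }

    public static void main(String[] args) {
        stitch(LOGS).forEach((id, hops) ->
            System.out.println(id + " -> " + hops.size() + " hop(s)"));
    }
}
```

An analytical program downstream only needs the id to be logged consistently on every hop; the grouping itself is trivial.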
Large companies commonly enforce logging of the session id and write homebrew tools to stitch together debugging views for a specific piece of software. There is also a growing set of tools from Splunk and similar companies for writing analytics that stitch the views together. Hadoop does not seem to be at the top of the list for those companies to implement tracing, because the Hadoop networking layer is complex and changes more frequently than desired.

If we go back to the logging approach instead of the API approach, it will help someone write the analytical program someday. The danger of the logging approach is that it is boring to write LOG.debug() everywhere; we often forget about it, and log entries get removed. The API approach can work if real-time interactive tracing can be done. However, this is hard to realize in Hadoop because the massive amount of parallel data is difficult to aggregate in real time without hitting timeouts, and it has a higher chance of requiring changes to the network protocol that might cause more headache than it's worth. I am in favor of removing HTrace support and redoing distributed tracing using the logging approach.

Regards,
Eric

On 7/30/18, 3:06 PM, "Stack" <st...@duboce.net> wrote:

There is a healthy discussion going on over in HADOOP-15566 on tracing in the Hadoop ecosystem. It would sit better on a mailing list than in comments up on JIRA, so here's an attempt at porting the chat here.

Background/Context: Bits of Hadoop and HBase had Apache HTrace trace points added. HTrace was formerly "incubating" at Apache but has since been retired and moved to the Apache Attic. HTrace and the efforts at instrumenting Hadoop wilted for want of attention/resourcing. Our Todd Lipcon noticed that the HTrace instrumentation can add friction on some code paths, so it can actually be harmful even when disabled. The natural follow-on is that we should rip out the tracings of a "dead" project. This then raises the question: should something replace it, and if so, what?
This is where HADOOP-15566 is at currently. HTrace took two or three runs, led by various heroes, at building a trace lib for Hadoop (first). It tried to build the trace lib, a store, and a visualizer. It always had a mechanism for dumping the traces out to external systems for storage and viewing (e.g. Zipkin). HTrace started when there was little else but the, you guessed it, Google paper that described the Dapper system they had internally. Since then, the world of tracing has come on in leaps and bounds, with healthy alternatives, communities, and even commercialization. If interested, take a read over HADOOP-15566. Will try and encourage participants to move the chat here.

Thanks,
St.Ack

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org