Most code coverage tools can instrument Java classes without any source code
changes, but tracing a distributed system is more involved because code
executions connected through network interactions are not easy to match up.
All interactions between a sender and a receiver carry some form of session id
or sequence id.  Hadoop used to have logic that helped stitch distributed
interactions together in the clienttrace log.  That information seems to have
been lost over the last 5-6 years of Hadoop's evolution.  HTrace was invented
to fill the void left behind by clienttrace: a programmable API for sending
out useful tracing data to a downstream analytical program that visualizes
the interactions.
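
To make the stitching idea concrete, here is a rough sketch of what I mean.
This is not Hadoop's actual clienttrace format; the "callId" field and the
log wording below are made up.  Both ends of an interaction log the same
correlation id, and a downstream analytical job joins the two sides on that
key.

// Hedged sketch: both ends of an RPC log the same correlation id so a
// downstream job can join sender and receiver events.  The "callId" field
// and message layout here are invented for illustration.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StitchableLogExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(StitchableLogExample.class);

  // Client side: log just before the request goes on the wire.
  void sendRequest(long callId, String op) {
    LOG.info("clienttrace callId={} side=sender op={} ts={}",
        callId, op, System.currentTimeMillis());
    // ... write the request to the socket ...
  }

  // Server side: log with the same callId taken from the request header.
  void handleRequest(long callId, String op) {
    LOG.info("clienttrace callId={} side=receiver op={} ts={}",
        callId, op, System.currentTimeMillis());
    // ... process the request ...
  }
}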

Large companies commonly enforce logging of the session id and write homebrew
tools to stitch the debugging views together for a specific piece of software.
There is also a growing set of tools from Splunk and similar companies for
building analytical views that stitch the interactions together.  Hadoop does
not seem to be near the top of the list for those companies to implement
tracing against, because Hadoop's networking layer is complex and changes more
frequently than desired.

If we go back to a logging approach, instead of an API approach, it will help
someone write that analytical program someday.  The danger of the logging
approach is that it is boring to write LOG.debug() everywhere, we often forget
to do it, and log entries get removed over time.
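
One way to make it less boring, sketched below under the assumption that we
log through SLF4J: put the session id into the MDC once per request and have
the log pattern stamp it on every entry, so the stitching key rides along
without writing it by hand in each LOG.debug() call.  The "sessionId" key
name and the handler shape here are hypothetical.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class SessionIdLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(SessionIdLogging.class);

  void handle(String sessionId) {
    // Picked up by a %X{sessionId} conversion in the log pattern,
    // so every entry from this request carries the stitching key.
    MDC.put("sessionId", sessionId);
    try {
      LOG.debug("starting block transfer");   // stitched later by sessionId
      // ... actual work ...
      LOG.debug("finished block transfer");
    } finally {
      // Do not leak the id to the next request handled on this thread.
      MDC.remove("sessionId");
    }
  }
}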

The API approach can work if real-time interactive tracing can be done.
However, this is hard to realize in Hadoop because the massive amount of
parallel data is difficult to aggregate in real time without hitting timeouts,
and it is more likely to require changes to the network protocol, which might
cause more headache than it is worth.  I am in favor of removing HTrace
support and redoing distributed tracing using a logging approach.
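
For comparison, the API approach looks roughly like the sketch below: a span
is opened around the remote call and reported to a collector when it closes.
The Tracer and Span interfaces here are invented for illustration and are not
HTrace's real classes; the point is only that the reporting path, rather than
a log file, becomes the thing that has to keep up in real time.

public class ApiTracingSketch {

  // Hypothetical span: closing it reports the timing to a collector.
  interface Span extends AutoCloseable {
    void addAnnotation(String msg);
    @Override void close();
  }

  // Hypothetical tracer: starts a timed span tied to the current request.
  interface Tracer {
    Span newSpan(String description);
  }

  private final Tracer tracer;

  ApiTracingSketch(Tracer tracer) {
    this.tracer = tracer;
  }

  void readBlock(long blockId) {
    try (Span span = tracer.newSpan("readBlock")) {
      span.addAnnotation("blockId=" + blockId);
      // ... perform the RPC; the span is shipped to the tracing backend
      // when the try block exits, which is where the real-time
      // aggregation cost mentioned above shows up ...
    }
  }
}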

Regards,
Eric

On 7/30/18, 3:06 PM, "Stack" <st...@duboce.net> wrote:

    There is a healthy discussion going on over in HADOOP-15566 on tracing
    in the Hadoop ecosystem. It would sit better on a mailing list than in
    comments up on JIRA so here's an attempt at porting the chat here.
    
    Background/Context: Bits of Hadoop and HBase had Apache HTrace trace
    points added. HTrace was formerly "incubating" at Apache but has since
    been retired, moved to Apache Attic. HTrace and the efforts at
    instrumenting Hadoop wilted for want of attention/resourcing. Our Todd
    Lipcon noticed that the HTrace instrumentation can add friction on
    some code paths so can actually be harmful even when disabled.  The
    natural follow-on is that we should rip out tracings of a "dead"
    project. This then begs the question, should something replace it
    and if so what? This is where HADOOP-15566 is at currently.
    
    HTrace took two or three runs, led by various Heroes, at building a
    trace lib for Hadoop (first). It was trying to build the trace lib, a
    store, and a visualizer. Always, it had a mechanism for dumping the
    traces out to external systems for storage and viewing (e.g. Zipkin).
    HTrace started when there was little else but the, you guessed it,
    Google paper that described the Dapper system they had internally.
    Since then, the world of tracing has come on in leaps and bounds with
    healthy alternatives, communities, and even commercialization.
    
    If interested, take a read over HADOOP-15566. Will try and encourage
    participants to move the chat here.
    
    Thanks,
    St.Ack
    

