Re: [DISCUSS] Tracing in the Hadoop ecosystem

Andrew Purtell Tue, 21 Aug 2018 11:42:57 -0700

I was assuming that taking the HTrace API implementation, removing all code
from the methods, and reimplementing them with Brave wouldn't face
insurmountable challenges, especially given the result is only meant for
near-term use, but I can't say I've tried it or looked into it in detail. I was
thinking that would be the first step of an effort in this regard. If it is not
possible, we will have to do this major lift where everyone reimplements
tracing across the stack right away, rather than pivot on a transitional facade.
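
A minimal sketch of the facade idea, in Java. Everything in it is
hypothetical: LegacyTracer, LegacyScope, TraceBackend, and RecordingBackend are
stand-ins invented for illustration, not the real HTrace or Brave classes, and
nothing here has been checked against either library. It only shows the shape
of the approach: the static entry points keep their names and signatures, while
their bodies delegate to whatever backend is plugged in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replacement tracer. A real facade would adapt this to the
// accepted replacement's API (for example Brave); this is NOT Brave's API.
interface TraceBackend {
    // Starts a span and returns the action that finishes it.
    Runnable startSpan(String description);
}

// Hypothetical stand-in for the scope type that existing call sites close.
final class LegacyScope implements AutoCloseable {
    private final Runnable finisher;

    LegacyScope(Runnable finisher) {
        this.finisher = finisher;
    }

    @Override
    public void close() {
        finisher.run();
    }
}

// Hypothetical stand-in for the old, statically dispatched entry point.
// Callers keep calling the same static method; only the body changes.
final class LegacyTracer {
    private static final Runnable NOTHING = () -> { };
    // Default backend does nothing until a real one is installed.
    private static volatile TraceBackend backend = description -> NOTHING;

    private LegacyTracer() { }

    static void setBackend(TraceBackend newBackend) {
        backend = newBackend;
    }

    static LegacyScope newScope(String description) {
        return new LegacyScope(backend.startSpan(description));
    }
}

// Test backend that records what the facade forwarded to it.
final class RecordingBackend implements TraceBackend {
    final List<String> events = new ArrayList<>();

    @Override
    public Runnable startSpan(String description) {
        events.add("start " + description);
        return () -> events.add("finish " + description);
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        RecordingBackend recorder = new RecordingBackend();
        LegacyTracer.setBackend(recorder);
        // An unchanged, already-instrumented call site.
        try (LegacyScope scope = LegacyTracer.newScope("readBlock")) {
            // instrumented work would run here
        }
        System.out.println(recorder.events); // [start readBlock, finish readBlock]
    }
}
```

Whether the real HTrace classes can be hollowed out this way is exactly the
open question above; the sketch assumes the static methods are the only thing
callers depend on.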


> On Aug 21, 2018, at 11:18 AM, Stack <[email protected]> wrote:
> 
>> On Tue, Aug 21, 2018 at 10:09 AM Andrew Purtell <[email protected]> wrote:
>> 
>> What if someone built a HTrace facade for Zipkin / Brave?
> 
> 
> I like the idea but taking a look, HTrace does static dispatch. I was
> thinking that precludes our being able to do a facade. I would love to hear
> otherwise.
> Thanks,
> S
> 
> 
>> Hadoop, HBase,
>> Phoenix, and other HTrace API users would still need to move away from
>> embedding HTrace instrumentation points to whatever is the normal API of
>> the accepted replacement, but such a facade would give you a drop in
>> replacement requiring no code changes to currently shipping code lines, and
>> some time to do a hopefully coordinated replacement involving all upstreams
>> and downstreams. Just a thought. Zipkin / Brave has widespread adoption;
>> that, plus the impending incubation here at the ASF, will make it quite
>> attractive, I think.
>> 
>> 
>>> On Tue, Aug 21, 2018 at 7:50 AM Stack <[email protected]> wrote:
>>> 
>>>> On Tue, Aug 21, 2018 at 3:44 AM Tsuyoshi Ozawa <[email protected]> wrote:
>>>> 
>>>> Thanks for starting discussion, Stack.
>>>> 
>>>> Zipkin seems to be coming to the Apache Incubator. As Andrew
>>>> Purtell said on HADOOP-15566, it would be a good option since there
>>>> are no licensing problems.
>>>> https://wiki.apache.org/incubator/ZipkinProposal
>>>> 
>>>> 
>>> Yes. This is nice to see.
>>> 
>>> 
>>> 
>>>> Stack, do you know of any differences between Zipkin and HTrace?
>>>> Could measurable performance overhead still be observed with
>>>> Zipkin?
>>>> 
>>>> 
>>> I've not measured to see if disabled trace points are friction-free.
>>> Perhaps someone else has?
>>> 
>>> 
>>> 
>>>> To decrease the overhead, we need to do additional work, as ftrace,
>>>> the well-known tracer in the Linux kernel, does. If I understand
>>>> correctly, ftrace replaces its tracing calls with CPU NOP
>>>> instructions when it is disabled. This keeps the tracer's overhead
>>>> low. By replacing the tracing function calls with the JVM's nop
>>>> operation, can we achieve minimal overhead?
>>>> 
>>>> 
>>> That'd be ideal. Makes sense inside the kernel. But up in our sloppy java
>>> context, we should be able to get away with something less exotic.
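
One "less exotic" option can be sketched in Java as follows. This is only an
illustration under stated assumptions: TracePoints and TraceScope are
hypothetical names, not HTrace or Zipkin / Brave APIs. When tracing is
disabled, the trace point returns a shared no-op scope, so the disabled path
costs one volatile read and no allocation. Whether that is cheap enough on a
hot path would still have to be measured.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical scope type; close() is narrowed to throw no checked exception.
interface TraceScope extends AutoCloseable {
    @Override
    void close();
}

// Hypothetical trace-point helper with a cheap disabled path.
final class TracePoints {
    // Shared no-op scope handed out whenever tracing is off.
    private static final TraceScope NOOP = () -> { };
    private static volatile boolean enabled = false;

    // Names of spans that were traced and then closed.
    static final List<String> finished = new CopyOnWriteArrayList<>();

    private TracePoints() { }

    static void setEnabled(boolean on) {
        enabled = on;
    }

    static TraceScope enter(String name) {
        if (!enabled) {
            return NOOP; // disabled: no allocation, nothing recorded
        }
        return () -> finished.add(name); // enabled: record on close
    }
}

class TracePointDemo {
    public static void main(String[] args) {
        TraceScope a = TracePoints.enter("get");
        TraceScope b = TracePoints.enter("put");
        System.out.println(a == b); // true: same shared no-op instance
        a.close();
        b.close();

        TracePoints.setEnabled(true);
        try (TraceScope scope = TracePoints.enter("scan")) {
            // traced work would run here
        }
        System.out.println(TracePoints.finished); // [scan]
    }
}
```

This does not remove the call itself the way ftrace's NOP patching does; it
relies on the disabled branch being short enough for the JIT to handle well.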
>>> 
>>> Thanks Tsuyoshi,
>>> S
>>> 
>>> 
>>> 
>>> 
>>>> Regards
>>>> - Tsuyoshi
>>>> On Tue, Jul 31, 2018 at 9:59 AM Eric Yang <[email protected]> wrote:
>>>>> 
>>>>> Most code coverage tools can instrument Java classes without making
>>>>> any source code changes, but tracing a distributed system is more
>>>>> involved because code execution across network interactions is not
>>>>> easy to match up. All interactions between sender and receiver have
>>>>> some form of session id or sequence id. Hadoop had some logic to
>>>>> assist in stitching distributed interactions together in the
>>>>> clienttrace log. This information seems to have been lost over the
>>>>> last 5-6 years of Hadoop's evolution. HTrace was invented to fill the
>>>>> void left behind by clienttrace, as a programmable API that sends out
>>>>> useful tracing data for downstream analytical programs to visualize
>>>>> the interactions.
>>>>> 
>>>>> Large companies commonly enforce logging of the session id and write
>>>>> homebrew tools to stitch together debugging logic for a specific
>>>>> piece of software. There is also a growing set of tools from Splunk
>>>>> and similar companies for writing analytical tools that stitch the
>>>>> views together. Hadoop does not seem to be at the top of the list for
>>>>> those companies to implement tracing, because the Hadoop networking
>>>>> layer is complex and changes more frequently than desired.
>>>>> 
>>>>> If we go back to the logging approach instead of the API approach, it
>>>>> will help someone write the analytical program someday. The danger of
>>>>> the logging approach is that it is boring to write LOG.debug()
>>>>> everywhere, we often forget to do it, and log entries get removed.
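
The stitching step that the logging approach depends on can be sketched in
Java as below. This is an illustration only: LogEntry and LogStitcher are
hypothetical names, and the record fields (session id, host, timestamp,
message) are assumed, not taken from Hadoop's clienttrace format. It groups
entries from several hosts by session id and orders each group by timestamp.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical parsed log line; the field set is an assumption.
final class LogEntry {
    final String sessionId;
    final String host;
    final long timestampMillis;
    final String message;

    LogEntry(String sessionId, String host, long timestampMillis, String message) {
        this.sessionId = sessionId;
        this.host = host;
        this.timestampMillis = timestampMillis;
        this.message = message;
    }

    @Override
    public String toString() {
        return timestampMillis + " " + host + " " + message;
    }
}

final class LogStitcher {
    private LogStitcher() { }

    // Groups entries by session id; each group is ordered by timestamp.
    static Map<String, List<LogEntry>> stitch(List<LogEntry> entries) {
        return entries.stream()
                .sorted(Comparator.comparingLong((LogEntry e) -> e.timestampMillis))
                .collect(Collectors.groupingBy(
                        e -> e.sessionId, TreeMap::new, Collectors.toList()));
    }
}

class LogStitchDemo {
    public static void main(String[] args) {
        List<LogEntry> entries = Arrays.asList(
                new LogEntry("s2", "datanode-1", 30L, "write block"),
                new LogEntry("s1", "datanode-1", 20L, "read block"),
                new LogEntry("s1", "client", 10L, "open file"));
        Map<String, List<LogEntry>> bySession = LogStitcher.stitch(entries);
        // s1 holds the client and datanode entries in time order; s2 holds one.
        bySession.forEach((id, group) -> System.out.println(id + " -> " + group));
    }
}
```

Ordering by timestamp assumes the hosts' clocks agree closely enough; with
clock skew, a sequence id carried in the log line would be a safer sort key.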
>>>>> 
>>>>> The API approach can work if real-time interactive tracing can be
>>>>> done. However, this is hard to realize in Hadoop because a massive
>>>>> amount of parallel data is difficult to aggregate in real time
>>>>> without hitting timeouts. It is also more likely to require changes
>>>>> to the network protocol, which might cause more headache than it's
>>>>> worth. I am in favor of removing HTrace support and redoing
>>>>> distributed tracing using the logging approach.
>>>>> 
>>>>> Regards,
>>>>> Eric
>>>>> 
>>>>> On 7/30/18, 3:06 PM, "Stack" <[email protected]> wrote:
>>>>> 
>>>>>    There is a healthy discussion going on over in HADOOP-15566 on
>>>>>    tracing in the Hadoop ecosystem. It would sit better on a mailing
>>>>>    list than in comments up on JIRA so here's an attempt at porting
>>>>>    the chat here.
>>>>> 
>>>>>    Background/Context: Bits of Hadoop and HBase had Apache HTrace
>>>>>    trace points added. HTrace was formerly "incubating" at Apache but
>>>>>    has since been retired, moved to Apache Attic. HTrace and the
>>>>>    efforts at instrumenting Hadoop wilted for want of
>>>>>    attention/resourcing. Our Todd Lipcon noticed that the HTrace
>>>>>    instrumentation can add friction on some code paths so can
>>>>>    actually be harmful even when disabled. The natural follow-on is
>>>>>    that we should rip out tracings of a "dead" project. This then
>>>>>    begs the question, should something replace it and if so what?
>>>>>    This is where HADOOP-15566 is at currently.
>>>>> 
>>>>>    HTrace took two or three runs, led by various Heroes, at building
>>>>>    a trace lib for Hadoop (first). It was trying to build the trace
>>>>>    lib, a store, and a visualizer. Always, it had a mechanism for
>>>>>    dumping the traces out to external systems for storage and viewing
>>>>>    (e.g. Zipkin). HTrace started when there was little else but the,
>>>>>    you guessed it, Google paper that described the Dapper system they
>>>>>    had internally. Since then, the world of tracing has come on in
>>>>>    leaps and bounds with healthy alternatives, communities, and even
>>>>>    commercialization.
>>>>> 
>>>>>    If interested, take a read over HADOOP-15566. Will try and
>>>>>    encourage participants to move the chat here.
>>>>> 
>>>>>    Thanks,
>>>>>    St.Ack
>>>>> 
>>>>> 
>>>>>    ---------------------------------------------------------------------
>>>>>    To unsubscribe, e-mail: [email protected]
>>>>>    For additional commands, e-mail: [email protected]
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> --
>> Best regards,
>> Andrew
>> 
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>>   - A23, Crosstalk
>> 

