I got some crude performance numbers for TSDebug() and TSDEBUG()
when proxy.config.diags.debug.enabled is 0.  The good news is that
TSDebug() looks to take about 8 nanoseconds (the raw data is below).  So,
if we take 1000 requests per core per seconds as a ballpark estimate of TS
performance, it would take a lot of them to have a significant impact.  But
TSDebug() takes about 30 times as long as TSDEBUG().  So it seems
worthwhile to make the tiny effort to change TSDEBUG() so it won't swallow
a following 'else' clause.  I also think we should rename it to TSFastDbg().

We could rename TSDebug() to TSDebug_(), and TSDEBUG() to TSDebug().  But I
think the very small benefit is outweighed by the potential risk.  TS seems
to be rife with race conditions, so a change that will alter the length of
lots of code paths could create yet more instability.

Here is the raw data (6 tests, 3 repetitions).  Done with
https://github.com/apache/trafficserver/pull/5638 .  I don't understand why
it's not more consistent.  If anyone knows a better way to time this,
please let me know.

Performance for TSDebug() with no parameters, 1000000000 repetitions,
microseconds user=7670000 system=20000
Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
microseconds user=260000 system=0
Performance for TSDebug() with parameters, 1000000000 repetitions,
microseconds user=8030000 system=50000
Performance for TSDEBUG() with parameters, 1000000000 repetitions,
microseconds user=280000 system=0
Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
microseconds user=8830000 system=30000
Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
microseconds user=270000 system=0

Performance for TSDebug() with no parameters, 1000000000 repetitions,
microseconds user=7520000 system=50000
Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
microseconds user=260000 system=0
Performance for TSDebug() with parameters, 1000000000 repetitions,
microseconds user=8040000 system=30000
Performance for TSDEBUG() with parameters, 1000000000 repetitions,
microseconds user=260000 system=0
Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
microseconds user=8450000 system=50000
Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
microseconds user=260000 system=0

Performance for TSDebug() with no parameters, 1000000000 repetitions,
microseconds user=7470000 system=80000
Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
microseconds user=260000 system=0
Performance for TSDebug() with parameters, 1000000000 repetitions,
microseconds user=8120000 system=50000
Performance for TSDEBUG() with parameters, 1000000000 repetitions,
microseconds user=270000 system=0
Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
microseconds user=8500000 system=60000
Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
microseconds user=270000 system=0

On Wed, Jun 12, 2019 at 12:58 PM Walt Karas <wka...@verizonmedia.com> wrote:

> Since I'm still relatively new to this application, there are limits to
> what I can take on on my own initiative.  I don't see how anything I've
> proposed would interfere with any higher-priority work.
>
> On Wed, Jun 12, 2019 at 12:41 PM Leif Hedstrom <zw...@apache.org> wrote:
>
>> Also, I’d like to reiterate my call to arms on v9.0.0:
>>
>>    Please, everyone lets focus on the things that are critical to ATS
>> v9.0.0 readiness!
>>
>>
>> This includes:
>>
>>         1) Fixing all the critical bugs (and there’s a lot of them)
>>
>>         2) Testing “master"
>>
>>         3) Doing the incompatible changes that needs to be done before we
>> can release
>>
>>         4) Removing features that we do not want to support for the next
>> 3+ years
>>
>>         5) Make sure that v9.0.0 has the functionality that you
>> *absolutely* need to be able
>>         to use it
>>
>>
>> I kinda feel that several of these recent proposal do not fit into this,
>> and in fact, are a fair amount of distraction. I urge everyone to not go
>> down this route until we’ve got v9.0.0 stable and functional, and instead
>> focus on the v9.0.0 tasks at hand.  I don’t think this one for example fits
>> into any of those 5 points above.
>>
>> My $.01.
>>
>> — Leif
>>
>> P.s
>>    https://github.com/apache/trafficserver/issues/5457
>>
>>
>> > On Jun 12, 2019, at 11:30 AM, Walt Karas <wka...@verizonmedia.com.INVALID>
>> wrote:
>> >
>> > I would guess the performance increase would be well under 1%.  But
>> it's a
>> > fairly simple change.  The biggest performance increase I think would
>> come
>> > from avoiding a PCRE match, to see if the specific debug tag was
>> enabled.
>> > For some TSDebug calls, there can be many arguments to push, and
>> perhaps a
>> > fair amount of calculation to generate the correct arguments.
>> >
>> > On Wed, Jun 12, 2019 at 12:21 PM Bryan Call <bc...@apache.org> wrote:
>> >
>> >> I am surprised you are seeing this as a performance issue.  Looking at
>> the
>> >> code it looks like 1 function call and a couple pointer dereferences.
>> >>
>> >> If you wanted to speed up TSDebug and make the code harder to maintain
>> you
>> >> could create a structure that looks like like DiagsConfig in ts.h and
>> then
>> >> set a pointer used in the API to look at the gobal DiagsConfig.  Then
>> you
>> >> could have a macro that would do basically what on() does.  This would
>> >> bypass a function call and would have 1 more pointer dereference than
>> your
>> >> example.
>> >>
>> >> -Bryan
>> >>
>> >>
>> >>> On Jun 12, 2019, at 10:00 AM, Walt Karas <wka...@verizonmedia.com
>> .INVALID>
>> >> wrote:
>> >>>
>> >>> What do you mean by "exposing diags"?
>> >>>
>> >>> On Wed, Jun 12, 2019 at 11:50 AM Bryan Call <bc...@apache.org> wrote:
>> >>>
>> >>>> I would rather see TSDebug be optimized and exposing diags instead of
>> >>>> creating a new type of debug.
>> >>>>
>> >>>> -Bryan
>> >>>>
>> >>>>
>> >>>>> On Jun 12, 2019, at 9:44 AM, Walt Karas <wka...@verizonmedia.com
>> >> .INVALID>
>> >>>> wrote:
>> >>>>>
>> >>>>> I don't believe so.  The issue I'm trying to address is that there
>> is
>> >>>> still
>> >>>>> significant execution time overhead for debug output calls when
>> >>>>> proxy.config.diags.debug.enabled is 0.  This change would make the
>> >>>>> execution time overhead negligible (if TSFastDbg is used or TSDebug
>> is
>> >>>>> changed to be like the proposed TSFastDbg).  Another way of putting
>> it
>> >>>> is,
>> >>>>> I want to clean up the existing TSDEBUG alternative to TSDebug.
>> >>>>>
>> >>>>> On Wed, Jun 12, 2019 at 11:31 AM Bryan Call <bc...@apache.org>
>> wrote:
>> >>>>>
>> >>>>>> How would this be controlled on the command line when running
>> >>>>>> trafic_server -T?
>> >>>>>>
>> >>>>>> It looks like it is a departure from how debug tags and controlled
>> >> now.
>> >>>>>>
>> >>>>>> -Bryan
>> >>>>>>
>> >>>>>>
>> >>>>>>> On Jun 11, 2019, at 1:21 PM, Walt Karas <wka...@verizonmedia.com
>> >>>> .INVALID>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> Any +1's for simply replacing TSDebug with the proposed TSFastDbg
>> ?
>> >>>>>>>
>> >>>>>>> On Mon, Apr 8, 2019 at 6:20 PM Leif Hedstrom <zw...@apache.org>
>> >> wrote:
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Apr 4, 2019, at 8:05 AM, Walt Karas <wka...@verizonmedia.com
>> >>>>>> .INVALID>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> Because senility. probably "on" should be a non-pointer of type
>> >> const
>> >>>>>>>> volatile char, and InkAPI in the core can just keep a pointer to
>> the
>> >>>>>> whole
>> >>>>>>>> structure for updating.
>> >>>>>>>>
>> >>>>>>>> But I'm leaning more to fixing up the capability that already
>> kinda
>> >>>>>> there.
>> >>>>>>>> Which is optimizimg only for the case when all debug output is
>> >>>> disabled.
>> >>>>>>>> Especially since maybe zwoop would be on my side for once :o) .
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> My bad, whatever Walt said, I take the opposite stance.
>> >>>>>>>>
>> >>>>>>>> — leif
>> >>>>>>>>
>> >>>>>>>> P.s
>> >>>>>>>> J/K!
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Thu, Apr 4, 2019 at 8:51 AM Alan Carroll
>> >>>>>>>> <solidwallofc...@verizonmedia.com.invalid> wrote:
>> >>>>>>>>
>> >>>>>>>> This could enable a significant speed up for debug tags. One
>> point
>> >> is
>> >>>>>> that
>> >>>>>>>> when the debug tag string is set, at that point the debug objects
>> >>>> could
>> >>>>>> be
>> >>>>>>>> updated to the correct state according to the debug tag string,
>> >> rather
>> >>>>>> than
>> >>>>>>>> checking it every time a debug message is logged.
>> >>>>>>>>
>> >>>>>>>> I have to ask, why is there a pointer to the enable flag,
>> instead of
>> >>>>>> just
>> >>>>>>>> the enable flag? Your logic doesn't even check that flag, it
>> checks
>> >>>>>> whether
>> >>>>>>>> the pointer is null.
>> >>>>>>>>
>> >>>>>>>> On Wed, Apr 3, 2019 at 11:52 AM Leif Hedstrom <zw...@apache.org>
>> >>>> wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Apr 3, 2019, at 10:47 AM, Walt Karas <wka...@verizonmedia.com
>> >>>>>>>>
>> >>>>>>>> .INVALID>
>> >>>>>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> It looks like the TS API already has less granular version of
>> this:
>> >>>>>>>>
>> >>>>>>>> extern int diags_on_for_plugins;
>> >>>>>>>>
>> >>>>>>>> #define TSDEBUG             \
>> >>>>>>>>
>> >>>>>>>> if (diags_on_for_plugins) \
>> >>>>>>>>
>> >>>>>>>> TSDebug
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> A problem with this is if you write:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> if (flag)
>> >>>>>>>>
>> >>>>>>>> TSDEBUG(yadayada);
>> >>>>>>>>
>> >>>>>>>> else
>> >>>>>>>>
>> >>>>>>>> do_something();
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> then the else will be associated with the if hidden in the macro.
>> >> I'd
>> >>>>>>>> prefer to change to something like:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> extern const volatile char TS_detail_dbg_output_enabled;
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> #define TSFastDbg(_TAG, ...) \
>> >>>>>>>> do { \
>> >>>>>>>> if (TS_detail_dbg_output_enabled) \
>> >>>>>>>> TSDebug(_TAG, __VA_ARGS__); \
>> >>>>>>>>
>> >>>>>>>> } while(0)
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Yeh, that kinda sounds like a bug, doesn’t it?
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> The negative is that we'd only see the benefit when all debug
>> output
>> >>>>>>>>
>> >>>>>>>> was
>> >>>>>>>>
>> >>>>>>>> disabled.  But, in any case, that is the main benefit, to avoid
>> >>>> impacts
>> >>>>>>>> during normal operation.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Right, we should always optimize for the common cases, obviously.
>> >> When
>> >>>>>>>> there’s a decision to make, where you can either make it
>> “equally”
>> >>>>>>>> expensive always, or more expensive in a rare case, but cheaper
>> in
>> >> the
>> >>>>>>>> common case, we should likely chose the latter.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Looking in src/tscore/Diags.cc, it looks like the debug enable
>> flag
>> >> is
>> >>>>>>>>
>> >>>>>>>> only
>> >>>>>>>>
>> >>>>>>>> set at startup time.  So we wouldn't be able to enable TSFastDbg
>> >> using
>> >>>>>>>> traffic_cntl without adding complexity.
>> >>>>>>>>
>> >>>>>>>> On Tue, Apr 2, 2019 at 5:48 PM Walt Karas <
>> wka...@verizonmedia.com>
>> >>>>>>>>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> Add this new structure to apidefs.h:
>> >>>>>>>>
>> >>>>>>>> typedef struct TSFastDbgCntl_
>> >>>>>>>> {
>> >>>>>>>> const char * const tag; // nul-terminated string
>> >>>>>>>> const volatile char * const on; // pointer to 1-byte flag
>> >>>>>>>> }
>> >>>>>>>> TSFastDbgCntl;
>> >>>>>>>>
>> >>>>>>>> Add this new API function, that returns a pointer to an instance
>> of
>> >>>>>>>>
>> >>>>>>>> the
>> >>>>>>>>
>> >>>>>>>> above structure:
>> >>>>>>>>
>> >>>>>>>> TSFastDbgCntl * TSCreateFastDbgCntl(const char *tag);
>> >>>>>>>>
>> >>>>>>>> Add this macro, which would be used in place of direct use of
>> >>>>>>>>
>> >>>>>>>> TSDebug().
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> #define TSFastDbg(_FD_CNTL, ...) \
>> >>>>>>>> do { \
>> >>>>>>>> if ((_FD_CNTL)->on) \
>> >>>>>>>> TSDebug((_FD_CNTL)->tag, __VA_ARGS__); \
>> >>>>>>>> } while(0)
>> >>>>>>>>
>> >>>>>>>> The first parameter to TSFastDbg() is a pointer that was
>> returned by
>> >>>>>>>> TSCreateFastDbgCntl().  The remaining parameters are printf
>> >> parameters
>> >>>>>>>> (format string and values matching format specifiers).  The core
>> >> would
>> >>>>>>>>
>> >>>>>>>> be
>> >>>>>>>>
>> >>>>>>>> responsible for changing the value of the 'on' fields in the
>> >>>>>>>>
>> >>>>>>>> TSFastDbgCntl
>> >>>>>>>>
>> >>>>>>>> instances.  For direct calls to TSDebug(), even if the tag is
>> >>>>>>>>
>> >>>>>>>> disabled,
>> >>>>>>>>
>> >>>>>>>> all
>> >>>>>>>>
>> >>>>>>>> the parameters have to be prepared and pushed onto the stack.  As
>> >> was
>> >>>>>>>> discussed in IRC, you have to feel guilty about doing converting
>> a
>> >>>>>>>> string_view to a string and outputting it with c_str().  Because
>> the
>> >>>>>>>>
>> >>>>>>>> code
>> >>>>>>>>
>> >>>>>>>> to do this will be executed even if the debug tag is disabled.
>> >>>>>>>> Furthermore, for each call to TSDebug, it's necessary to do an
>> >>>>>>>>
>> >>>>>>>> associative
>> >>>>>>>>
>> >>>>>>>> lookup on the tag string to determine that output for the tag is
>> >>>>>>>>
>> >>>>>>>> disabled.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> We would want to put equivalent capabilities in Diags.h in the
>> core
>> >>>>>>>>
>> >>>>>>>> for
>> >>>>>>>>
>> >>>>>>>> debug output.  The implementation is non-trivial, but I think it
>> >> would
>> >>>>>>>>
>> >>>>>>>> take
>> >>>>>>>>
>> >>>>>>>> at most a week including the Au test additions.
>> >>>>>>>>
>> >>>>>>>> I looked at making TSFastDbg() an inline function rather than a
>> >> macro:
>> >>>>>>>> https://godbolt.org/z/IfVbBk
>> >>>>>>>> The opitimization works well if you compile the code with the
>> inline
>> >>>>>>>> function as C, but the optimization is poor if you compile it as
>> >> C++.
>> >>>>>>>>
>> >>>>>>>> This
>> >>>>>>>>
>> >>>>>>>> difference exists for both gcc and clang.  It's weird.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>
>> >>
>>
>>

Reply via email to