> On Jun 14, 2019, at 5:18 PM, Walt Karas <wka...@verizonmedia.com.INVALID> 
> wrote:
> 
> I got some crude performance numbers for TSDebug() and TSDEBUG()
> when proxy.config.diags.debug.enabled is 0.  The good news is that
> TSDebug() looks to take about 8 nanoseconds (the raw data is below).  So,
> if we take 1000 requests per core per seconds as a ballpark estimate of TS
> performance, it would take a lot of them to have a significant impact.  But
> TSDebug() takes about 30 times as long as TSDEBUG().  So it seems
> worthwhile to make the tiny effort to change TSDEBUG() so it won't swallow
> a following 'else' clause.  I also think we should rename it to TSFastDbg().

The numbers sounds good, but why on earth keep the old Debug? Why would I chose 
to use a slow API over a fast API? That just makes no sense.

if we do this, just change TSDebug().  If it’s incompatible, we document it 
with appropriate warnings.

Does any of this affect / improve the core Debug() statements? Those are 
probably 10x - 100x more critical to get fast, because they are *everywhere*.

— leif

> 
> We could rename TSDebug() to TSDebug_(), and TSDEBUG() to TSDebug().  But I
> think the very small benefit is outweighed by the potential risk.  TS seems
> to be rife with race conditions, so a change that will alter the length of
> lots of code paths could create yet more instability.
> 
> Here is the raw data (6 tests, 3 repetitions).  Done with
> https://github.com/apache/trafficserver/pull/5638 .  I don't understand why
> it's not more consistent.  If anyone knows a better way to time this,
> please let me know.
> 
> Performance for TSDebug() with no parameters, 1000000000 repetitions,
> microseconds user=7670000 system=20000
> Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with parameters, 1000000000 repetitions,
> microseconds user=8030000 system=50000
> Performance for TSDEBUG() with parameters, 1000000000 repetitions,
> microseconds user=280000 system=0
> Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
> microseconds user=8830000 system=30000
> Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
> microseconds user=270000 system=0
> 
> Performance for TSDebug() with no parameters, 1000000000 repetitions,
> microseconds user=7520000 system=50000
> Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with parameters, 1000000000 repetitions,
> microseconds user=8040000 system=30000
> Performance for TSDEBUG() with parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
> microseconds user=8450000 system=50000
> Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> 
> Performance for TSDebug() with no parameters, 1000000000 repetitions,
> microseconds user=7470000 system=80000
> Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with parameters, 1000000000 repetitions,
> microseconds user=8120000 system=50000
> Performance for TSDEBUG() with parameters, 1000000000 repetitions,
> microseconds user=270000 system=0
> Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
> microseconds user=8500000 system=60000
> Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
> microseconds user=270000 system=0
> 
> On Wed, Jun 12, 2019 at 12:58 PM Walt Karas <wka...@verizonmedia.com> wrote:
> 
>> Since I'm still relatively new to this application, there are limits to
>> what I can take on on my own initiative.  I don't see how anything I've
>> proposed would interfere with any higher-priority work.
>> 
>> On Wed, Jun 12, 2019 at 12:41 PM Leif Hedstrom <zw...@apache.org> wrote:
>> 
>>> Also, I’d like to reiterate my call to arms on v9.0.0:
>>> 
>>>   Please, everyone lets focus on the things that are critical to ATS
>>> v9.0.0 readiness!
>>> 
>>> 
>>> This includes:
>>> 
>>>        1) Fixing all the critical bugs (and there’s a lot of them)
>>> 
>>>        2) Testing “master"
>>> 
>>>        3) Doing the incompatible changes that needs to be done before we
>>> can release
>>> 
>>>        4) Removing features that we do not want to support for the next
>>> 3+ years
>>> 
>>>        5) Make sure that v9.0.0 has the functionality that you
>>> *absolutely* need to be able
>>>        to use it
>>> 
>>> 
>>> I kinda feel that several of these recent proposal do not fit into this,
>>> and in fact, are a fair amount of distraction. I urge everyone to not go
>>> down this route until we’ve got v9.0.0 stable and functional, and instead
>>> focus on the v9.0.0 tasks at hand.  I don’t think this one for example fits
>>> into any of those 5 points above.
>>> 
>>> My $.01.
>>> 
>>> — Leif
>>> 
>>> P.s
>>>   https://github.com/apache/trafficserver/issues/5457
>>> 
>>> 
>>>> On Jun 12, 2019, at 11:30 AM, Walt Karas <wka...@verizonmedia.com.INVALID>
>>> wrote:
>>>> 
>>>> I would guess the performance increase would be well under 1%.  But
>>> it's a
>>>> fairly simple change.  The biggest performance increase I think would
>>> come
>>>> from avoiding a PCRE match, to see if the specific debug tag was
>>> enabled.
>>>> For some TSDebug calls, there can be many arguments to push, and
>>> perhaps a
>>>> fair amount of calculation to generate the correct arguments.
>>>> 
>>>> On Wed, Jun 12, 2019 at 12:21 PM Bryan Call <bc...@apache.org> wrote:
>>>> 
>>>>> I am surprised you are seeing this as a performance issue.  Looking at
>>> the
>>>>> code it looks like 1 function call and a couple pointer dereferences.
>>>>> 
>>>>> If you wanted to speed up TSDebug and make the code harder to maintain
>>> you
>>>>> could create a structure that looks like like DiagsConfig in ts.h and
>>> then
>>>>> set a pointer used in the API to look at the gobal DiagsConfig.  Then
>>> you
>>>>> could have a macro that would do basically what on() does.  This would
>>>>> bypass a function call and would have 1 more pointer dereference than
>>> your
>>>>> example.
>>>>> 
>>>>> -Bryan
>>>>> 
>>>>> 
>>>>>> On Jun 12, 2019, at 10:00 AM, Walt Karas <wka...@verizonmedia.com
>>> .INVALID>
>>>>> wrote:
>>>>>> 
>>>>>> What do you mean by "exposing diags"?
>>>>>> 
>>>>>> On Wed, Jun 12, 2019 at 11:50 AM Bryan Call <bc...@apache.org> wrote:
>>>>>> 
>>>>>>> I would rather see TSDebug be optimized and exposing diags instead of
>>>>>>> creating a new type of debug.
>>>>>>> 
>>>>>>> -Bryan
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jun 12, 2019, at 9:44 AM, Walt Karas <wka...@verizonmedia.com
>>>>> .INVALID>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I don't believe so.  The issue I'm trying to address is that there
>>> is
>>>>>>> still
>>>>>>>> significant execution time overhead for debug output calls when
>>>>>>>> proxy.config.diags.debug.enabled is 0.  This change would make the
>>>>>>>> execution time overhead negligible (if TSFastDbg is used or TSDebug
>>> is
>>>>>>>> changed to be like the proposed TSFastDbg).  Another way of putting
>>> it
>>>>>>> is,
>>>>>>>> I want to clean up the existing TSDEBUG alternative to TSDebug.
>>>>>>>> 
>>>>>>>> On Wed, Jun 12, 2019 at 11:31 AM Bryan Call <bc...@apache.org>
>>> wrote:
>>>>>>>> 
>>>>>>>>> How would this be controlled on the command line when running
>>>>>>>>> trafic_server -T?
>>>>>>>>> 
>>>>>>>>> It looks like it is a departure from how debug tags and controlled
>>>>> now.
>>>>>>>>> 
>>>>>>>>> -Bryan
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jun 11, 2019, at 1:21 PM, Walt Karas <wka...@verizonmedia.com
>>>>>>> .INVALID>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Any +1's for simply replacing TSDebug with the proposed TSFastDbg
>>> ?
>>>>>>>>>> 
>>>>>>>>>> On Mon, Apr 8, 2019 at 6:20 PM Leif Hedstrom <zw...@apache.org>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Apr 4, 2019, at 8:05 AM, Walt Karas <wka...@verizonmedia.com
>>>>>>>>> .INVALID>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Because senility. probably "on" should be a non-pointer of type
>>>>> const
>>>>>>>>>>> volatile char, and InkAPI in the core can just keep a pointer to
>>> the
>>>>>>>>> whole
>>>>>>>>>>> structure for updating.
>>>>>>>>>>> 
>>>>>>>>>>> But I'm leaning more to fixing up the capability that already
>>> kinda
>>>>>>>>> there.
>>>>>>>>>>> Which is optimizimg only for the case when all debug output is
>>>>>>> disabled.
>>>>>>>>>>> Especially since maybe zwoop would be on my side for once :o) .
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> My bad, whatever Walt said, I take the opposite stance.
>>>>>>>>>>> 
>>>>>>>>>>> — leif
>>>>>>>>>>> 
>>>>>>>>>>> P.s
>>>>>>>>>>> J/K!
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Apr 4, 2019 at 8:51 AM Alan Carroll
>>>>>>>>>>> <solidwallofc...@verizonmedia.com.invalid> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> This could enable a significant speed up for debug tags. One
>>> point
>>>>> is
>>>>>>>>> that
>>>>>>>>>>> when the debug tag string is set, at that point the debug objects
>>>>>>> could
>>>>>>>>> be
>>>>>>>>>>> updated to the correct state according to the debug tag string,
>>>>> rather
>>>>>>>>> than
>>>>>>>>>>> checking it every time a debug message is logged.
>>>>>>>>>>> 
>>>>>>>>>>> I have to ask, why is there a pointer to the enable flag,
>>> instead of
>>>>>>>>> just
>>>>>>>>>>> the enable flag? Your logic doesn't even check that flag, it
>>> checks
>>>>>>>>> whether
>>>>>>>>>>> the pointer is null.
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Apr 3, 2019 at 11:52 AM Leif Hedstrom <zw...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Apr 3, 2019, at 10:47 AM, Walt Karas <wka...@verizonmedia.com
>>>>>>>>>>> 
>>>>>>>>>>> .INVALID>
>>>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> It looks like the TS API already has less granular version of
>>> this:
>>>>>>>>>>> 
>>>>>>>>>>> extern int diags_on_for_plugins;
>>>>>>>>>>> 
>>>>>>>>>>> #define TSDEBUG             \
>>>>>>>>>>> 
>>>>>>>>>>> if (diags_on_for_plugins) \
>>>>>>>>>>> 
>>>>>>>>>>> TSDebug
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> A problem with this is if you write:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> if (flag)
>>>>>>>>>>> 
>>>>>>>>>>> TSDEBUG(yadayada);
>>>>>>>>>>> 
>>>>>>>>>>> else
>>>>>>>>>>> 
>>>>>>>>>>> do_something();
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> then the else will be associated with the if hidden in the macro.
>>>>> I'd
>>>>>>>>>>> prefer to change to something like:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> extern const volatile char TS_detail_dbg_output_enabled;
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> #define TSFastDbg(_TAG, ...) \
>>>>>>>>>>> do { \
>>>>>>>>>>> if (TS_detail_dbg_output_enabled) \
>>>>>>>>>>> TSDebug(_TAG, __VA_ARGS__); \
>>>>>>>>>>> 
>>>>>>>>>>> } while(0)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Yeh, that kinda sounds like a bug, doesn’t it?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> The negative is that we'd only see the benefit when all debug
>>> output
>>>>>>>>>>> 
>>>>>>>>>>> was
>>>>>>>>>>> 
>>>>>>>>>>> disabled.  But, in any case, that is the main benefit, to avoid
>>>>>>> impacts
>>>>>>>>>>> during normal operation.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Right, we should always optimize for the common cases, obviously.
>>>>> When
>>>>>>>>>>> there’s a decision to make, where you can either make it
>>> “equally”
>>>>>>>>>>> expensive always, or more expensive in a rare case, but cheaper
>>> in
>>>>> the
>>>>>>>>>>> common case, we should likely chose the latter.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Looking in src/tscore/Diags.cc, it looks like the debug enable
>>> flag
>>>>> is
>>>>>>>>>>> 
>>>>>>>>>>> only
>>>>>>>>>>> 
>>>>>>>>>>> set at startup time.  So we wouldn't be able to enable TSFastDbg
>>>>> using
>>>>>>>>>>> traffic_cntl without adding complexity.
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Apr 2, 2019 at 5:48 PM Walt Karas <
>>> wka...@verizonmedia.com>
>>>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Add this new structure to apidefs.h:
>>>>>>>>>>> 
>>>>>>>>>>> typedef struct TSFastDbgCntl_
>>>>>>>>>>> {
>>>>>>>>>>> const char * const tag; // nul-terminated string
>>>>>>>>>>> const volatile char * const on; // pointer to 1-byte flag
>>>>>>>>>>> }
>>>>>>>>>>> TSFastDbgCntl;
>>>>>>>>>>> 
>>>>>>>>>>> Add this new API function, that returns a pointer to an instance
>>> of
>>>>>>>>>>> 
>>>>>>>>>>> the
>>>>>>>>>>> 
>>>>>>>>>>> above structure:
>>>>>>>>>>> 
>>>>>>>>>>> TSFastDbgCntl * TSCreateFastDbgCntl(const char *tag);
>>>>>>>>>>> 
>>>>>>>>>>> Add this macro, which would be used in place of direct use of
>>>>>>>>>>> 
>>>>>>>>>>> TSDebug().
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> #define TSFastDbg(_FD_CNTL, ...) \
>>>>>>>>>>> do { \
>>>>>>>>>>> if ((_FD_CNTL)->on) \
>>>>>>>>>>> TSDebug((_FD_CNTL)->tag, __VA_ARGS__); \
>>>>>>>>>>> } while(0)
>>>>>>>>>>> 
>>>>>>>>>>> The first parameter to TSFastDbg() is a pointer that was
>>> returned by
>>>>>>>>>>> TSCreateFastDbgCntl().  The remaining parameters are printf
>>>>> parameters
>>>>>>>>>>> (format string and values matching format specifiers).  The core
>>>>> would
>>>>>>>>>>> 
>>>>>>>>>>> be
>>>>>>>>>>> 
>>>>>>>>>>> responsible for changing the value of the 'on' fields in the
>>>>>>>>>>> 
>>>>>>>>>>> TSFastDbgCntl
>>>>>>>>>>> 
>>>>>>>>>>> instances.  For direct calls to TSDebug(), even if the tag is
>>>>>>>>>>> 
>>>>>>>>>>> disabled,
>>>>>>>>>>> 
>>>>>>>>>>> all
>>>>>>>>>>> 
>>>>>>>>>>> the parameters have to be prepared and pushed onto the stack.  As
>>>>> was
>>>>>>>>>>> discussed in IRC, you have to feel guilty about doing converting
>>> a
>>>>>>>>>>> string_view to a string and outputting it with c_str().  Because
>>> the
>>>>>>>>>>> 
>>>>>>>>>>> code
>>>>>>>>>>> 
>>>>>>>>>>> to do this will be executed even if the debug tag is disabled.
>>>>>>>>>>> Furthermore, for each call to TSDebug, it's necessary to do an
>>>>>>>>>>> 
>>>>>>>>>>> associative
>>>>>>>>>>> 
>>>>>>>>>>> lookup on the tag string to determine that output for the tag is
>>>>>>>>>>> 
>>>>>>>>>>> disabled.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> We would want to put equivalent capabilities in Diags.h in the
>>> core
>>>>>>>>>>> 
>>>>>>>>>>> for
>>>>>>>>>>> 
>>>>>>>>>>> debug output.  The implementation is non-trivial, but I think it
>>>>> would
>>>>>>>>>>> 
>>>>>>>>>>> take
>>>>>>>>>>> 
>>>>>>>>>>> at most a week including the Au test additions.
>>>>>>>>>>> 
>>>>>>>>>>> I looked at making TSFastDbg() an inline function rather than a
>>>>> macro:
>>>>>>>>>>> https://godbolt.org/z/IfVbBk
>>>>>>>>>>> The opitimization works well if you compile the code with the
>>> inline
>>>>>>>>>>> function as C, but the optimization is poor if you compile it as
>>>>> C++.
>>>>>>>>>>> 
>>>>>>>>>>> This
>>>>>>>>>>> 
>>>>>>>>>>> difference exists for both gcc and clang.  It's weird.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 

Reply via email to