> On Jun 14, 2019, at 5:18 PM, Walt Karas <wka...@verizonmedia.com.INVALID>
> wrote:
>
> I got some crude performance numbers for TSDebug() and TSDEBUG()
> when proxy.config.diags.debug.enabled is 0. The good news is that
> TSDebug() looks to take about 8 nanoseconds (the raw data is below). So,
> if we take 1000 requests per core per seconds as a ballpark estimate of TS
> performance, it would take a lot of them to have a significant impact. But
> TSDebug() takes about 30 times as long as TSDEBUG(). So it seems
> worthwhile to make the tiny effort to change TSDEBUG() so it won't swallow
> a following 'else' clause. I also think we should rename it to TSFastDbg().
The numbers sounds good, but why on earth keep the old Debug? Why would I chose
to use a slow API over a fast API? That just makes no sense.
if we do this, just change TSDebug(). If it’s incompatible, we document it
with appropriate warnings.
Does any of this affect / improve the core Debug() statements? Those are
probably 10x - 100x more critical to get fast, because they are *everywhere*.
— leif
>
> We could rename TSDebug() to TSDebug_(), and TSDEBUG() to TSDebug(). But I
> think the very small benefit is outweighed by the potential risk. TS seems
> to be rife with race conditions, so a change that will alter the length of
> lots of code paths could create yet more instability.
>
> Here is the raw data (6 tests, 3 repetitions). Done with
> https://github.com/apache/trafficserver/pull/5638 . I don't understand why
> it's not more consistent. If anyone knows a better way to time this,
> please let me know.
>
> Performance for TSDebug() with no parameters, 1000000000 repetitions,
> microseconds user=7670000 system=20000
> Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with parameters, 1000000000 repetitions,
> microseconds user=8030000 system=50000
> Performance for TSDEBUG() with parameters, 1000000000 repetitions,
> microseconds user=280000 system=0
> Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
> microseconds user=8830000 system=30000
> Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
> microseconds user=270000 system=0
>
> Performance for TSDebug() with no parameters, 1000000000 repetitions,
> microseconds user=7520000 system=50000
> Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with parameters, 1000000000 repetitions,
> microseconds user=8040000 system=30000
> Performance for TSDEBUG() with parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
> microseconds user=8450000 system=50000
> Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
>
> Performance for TSDebug() with no parameters, 1000000000 repetitions,
> microseconds user=7470000 system=80000
> Performance for TSDEBUG() with no parameters, 1000000000 repetitions,
> microseconds user=260000 system=0
> Performance for TSDebug() with parameters, 1000000000 repetitions,
> microseconds user=8120000 system=50000
> Performance for TSDEBUG() with parameters, 1000000000 repetitions,
> microseconds user=270000 system=0
> Performance for TSDebug() with volatile parameters, 1000000000 repetitions,
> microseconds user=8500000 system=60000
> Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions,
> microseconds user=270000 system=0
>
> On Wed, Jun 12, 2019 at 12:58 PM Walt Karas <wka...@verizonmedia.com> wrote:
>
>> Since I'm still relatively new to this application, there are limits to
>> what I can take on on my own initiative. I don't see how anything I've
>> proposed would interfere with any higher-priority work.
>>
>> On Wed, Jun 12, 2019 at 12:41 PM Leif Hedstrom <zw...@apache.org> wrote:
>>
>>> Also, I’d like to reiterate my call to arms on v9.0.0:
>>>
>>> Please, everyone lets focus on the things that are critical to ATS
>>> v9.0.0 readiness!
>>>
>>>
>>> This includes:
>>>
>>> 1) Fixing all the critical bugs (and there’s a lot of them)
>>>
>>> 2) Testing “master"
>>>
>>> 3) Doing the incompatible changes that needs to be done before we
>>> can release
>>>
>>> 4) Removing features that we do not want to support for the next
>>> 3+ years
>>>
>>> 5) Make sure that v9.0.0 has the functionality that you
>>> *absolutely* need to be able
>>> to use it
>>>
>>>
>>> I kinda feel that several of these recent proposal do not fit into this,
>>> and in fact, are a fair amount of distraction. I urge everyone to not go
>>> down this route until we’ve got v9.0.0 stable and functional, and instead
>>> focus on the v9.0.0 tasks at hand. I don’t think this one for example fits
>>> into any of those 5 points above.
>>>
>>> My $.01.
>>>
>>> — Leif
>>>
>>> P.s
>>> https://github.com/apache/trafficserver/issues/5457
>>>
>>>
>>>> On Jun 12, 2019, at 11:30 AM, Walt Karas <wka...@verizonmedia.com.INVALID>
>>> wrote:
>>>>
>>>> I would guess the performance increase would be well under 1%. But
>>> it's a
>>>> fairly simple change. The biggest performance increase I think would
>>> come
>>>> from avoiding a PCRE match, to see if the specific debug tag was
>>> enabled.
>>>> For some TSDebug calls, there can be many arguments to push, and
>>> perhaps a
>>>> fair amount of calculation to generate the correct arguments.
>>>>
>>>> On Wed, Jun 12, 2019 at 12:21 PM Bryan Call <bc...@apache.org> wrote:
>>>>
>>>>> I am surprised you are seeing this as a performance issue. Looking at
>>> the
>>>>> code it looks like 1 function call and a couple pointer dereferences.
>>>>>
>>>>> If you wanted to speed up TSDebug and make the code harder to maintain
>>> you
>>>>> could create a structure that looks like like DiagsConfig in ts.h and
>>> then
>>>>> set a pointer used in the API to look at the gobal DiagsConfig. Then
>>> you
>>>>> could have a macro that would do basically what on() does. This would
>>>>> bypass a function call and would have 1 more pointer dereference than
>>> your
>>>>> example.
>>>>>
>>>>> -Bryan
>>>>>
>>>>>
>>>>>> On Jun 12, 2019, at 10:00 AM, Walt Karas <wka...@verizonmedia.com
>>> .INVALID>
>>>>> wrote:
>>>>>>
>>>>>> What do you mean by "exposing diags"?
>>>>>>
>>>>>> On Wed, Jun 12, 2019 at 11:50 AM Bryan Call <bc...@apache.org> wrote:
>>>>>>
>>>>>>> I would rather see TSDebug be optimized and exposing diags instead of
>>>>>>> creating a new type of debug.
>>>>>>>
>>>>>>> -Bryan
>>>>>>>
>>>>>>>
>>>>>>>> On Jun 12, 2019, at 9:44 AM, Walt Karas <wka...@verizonmedia.com
>>>>> .INVALID>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I don't believe so. The issue I'm trying to address is that there
>>> is
>>>>>>> still
>>>>>>>> significant execution time overhead for debug output calls when
>>>>>>>> proxy.config.diags.debug.enabled is 0. This change would make the
>>>>>>>> execution time overhead negligible (if TSFastDbg is used or TSDebug
>>> is
>>>>>>>> changed to be like the proposed TSFastDbg). Another way of putting
>>> it
>>>>>>> is,
>>>>>>>> I want to clean up the existing TSDEBUG alternative to TSDebug.
>>>>>>>>
>>>>>>>> On Wed, Jun 12, 2019 at 11:31 AM Bryan Call <bc...@apache.org>
>>> wrote:
>>>>>>>>
>>>>>>>>> How would this be controlled on the command line when running
>>>>>>>>> trafic_server -T?
>>>>>>>>>
>>>>>>>>> It looks like it is a departure from how debug tags and controlled
>>>>> now.
>>>>>>>>>
>>>>>>>>> -Bryan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Jun 11, 2019, at 1:21 PM, Walt Karas <wka...@verizonmedia.com
>>>>>>> .INVALID>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Any +1's for simply replacing TSDebug with the proposed TSFastDbg
>>> ?
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 8, 2019 at 6:20 PM Leif Hedstrom <zw...@apache.org>
>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Apr 4, 2019, at 8:05 AM, Walt Karas <wka...@verizonmedia.com
>>>>>>>>> .INVALID>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Because senility. probably "on" should be a non-pointer of type
>>>>> const
>>>>>>>>>>> volatile char, and InkAPI in the core can just keep a pointer to
>>> the
>>>>>>>>> whole
>>>>>>>>>>> structure for updating.
>>>>>>>>>>>
>>>>>>>>>>> But I'm leaning more to fixing up the capability that already
>>> kinda
>>>>>>>>> there.
>>>>>>>>>>> Which is optimizimg only for the case when all debug output is
>>>>>>> disabled.
>>>>>>>>>>> Especially since maybe zwoop would be on my side for once :o) .
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> My bad, whatever Walt said, I take the opposite stance.
>>>>>>>>>>>
>>>>>>>>>>> — leif
>>>>>>>>>>>
>>>>>>>>>>> P.s
>>>>>>>>>>> J/K!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 4, 2019 at 8:51 AM Alan Carroll
>>>>>>>>>>> <solidwallofc...@verizonmedia.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>> This could enable a significant speed up for debug tags. One
>>> point
>>>>> is
>>>>>>>>> that
>>>>>>>>>>> when the debug tag string is set, at that point the debug objects
>>>>>>> could
>>>>>>>>> be
>>>>>>>>>>> updated to the correct state according to the debug tag string,
>>>>> rather
>>>>>>>>> than
>>>>>>>>>>> checking it every time a debug message is logged.
>>>>>>>>>>>
>>>>>>>>>>> I have to ask, why is there a pointer to the enable flag,
>>> instead of
>>>>>>>>> just
>>>>>>>>>>> the enable flag? Your logic doesn't even check that flag, it
>>> checks
>>>>>>>>> whether
>>>>>>>>>>> the pointer is null.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 3, 2019 at 11:52 AM Leif Hedstrom <zw...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Apr 3, 2019, at 10:47 AM, Walt Karas <wka...@verizonmedia.com
>>>>>>>>>>>
>>>>>>>>>>> .INVALID>
>>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It looks like the TS API already has less granular version of
>>> this:
>>>>>>>>>>>
>>>>>>>>>>> extern int diags_on_for_plugins;
>>>>>>>>>>>
>>>>>>>>>>> #define TSDEBUG \
>>>>>>>>>>>
>>>>>>>>>>> if (diags_on_for_plugins) \
>>>>>>>>>>>
>>>>>>>>>>> TSDebug
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> A problem with this is if you write:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> if (flag)
>>>>>>>>>>>
>>>>>>>>>>> TSDEBUG(yadayada);
>>>>>>>>>>>
>>>>>>>>>>> else
>>>>>>>>>>>
>>>>>>>>>>> do_something();
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> then the else will be associated with the if hidden in the macro.
>>>>> I'd
>>>>>>>>>>> prefer to change to something like:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> extern const volatile char TS_detail_dbg_output_enabled;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> #define TSFastDbg(_TAG, ...) \
>>>>>>>>>>> do { \
>>>>>>>>>>> if (TS_detail_dbg_output_enabled) \
>>>>>>>>>>> TSDebug(_TAG, __VA_ARGS__); \
>>>>>>>>>>>
>>>>>>>>>>> } while(0)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yeh, that kinda sounds like a bug, doesn’t it?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The negative is that we'd only see the benefit when all debug
>>> output
>>>>>>>>>>>
>>>>>>>>>>> was
>>>>>>>>>>>
>>>>>>>>>>> disabled. But, in any case, that is the main benefit, to avoid
>>>>>>> impacts
>>>>>>>>>>> during normal operation.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Right, we should always optimize for the common cases, obviously.
>>>>> When
>>>>>>>>>>> there’s a decision to make, where you can either make it
>>> “equally”
>>>>>>>>>>> expensive always, or more expensive in a rare case, but cheaper
>>> in
>>>>> the
>>>>>>>>>>> common case, we should likely chose the latter.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Looking in src/tscore/Diags.cc, it looks like the debug enable
>>> flag
>>>>> is
>>>>>>>>>>>
>>>>>>>>>>> only
>>>>>>>>>>>
>>>>>>>>>>> set at startup time. So we wouldn't be able to enable TSFastDbg
>>>>> using
>>>>>>>>>>> traffic_cntl without adding complexity.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 2, 2019 at 5:48 PM Walt Karas <
>>> wka...@verizonmedia.com>
>>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Add this new structure to apidefs.h:
>>>>>>>>>>>
>>>>>>>>>>> typedef struct TSFastDbgCntl_
>>>>>>>>>>> {
>>>>>>>>>>> const char * const tag; // nul-terminated string
>>>>>>>>>>> const volatile char * const on; // pointer to 1-byte flag
>>>>>>>>>>> }
>>>>>>>>>>> TSFastDbgCntl;
>>>>>>>>>>>
>>>>>>>>>>> Add this new API function, that returns a pointer to an instance
>>> of
>>>>>>>>>>>
>>>>>>>>>>> the
>>>>>>>>>>>
>>>>>>>>>>> above structure:
>>>>>>>>>>>
>>>>>>>>>>> TSFastDbgCntl * TSCreateFastDbgCntl(const char *tag);
>>>>>>>>>>>
>>>>>>>>>>> Add this macro, which would be used in place of direct use of
>>>>>>>>>>>
>>>>>>>>>>> TSDebug().
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> #define TSFastDbg(_FD_CNTL, ...) \
>>>>>>>>>>> do { \
>>>>>>>>>>> if ((_FD_CNTL)->on) \
>>>>>>>>>>> TSDebug((_FD_CNTL)->tag, __VA_ARGS__); \
>>>>>>>>>>> } while(0)
>>>>>>>>>>>
>>>>>>>>>>> The first parameter to TSFastDbg() is a pointer that was
>>> returned by
>>>>>>>>>>> TSCreateFastDbgCntl(). The remaining parameters are printf
>>>>> parameters
>>>>>>>>>>> (format string and values matching format specifiers). The core
>>>>> would
>>>>>>>>>>>
>>>>>>>>>>> be
>>>>>>>>>>>
>>>>>>>>>>> responsible for changing the value of the 'on' fields in the
>>>>>>>>>>>
>>>>>>>>>>> TSFastDbgCntl
>>>>>>>>>>>
>>>>>>>>>>> instances. For direct calls to TSDebug(), even if the tag is
>>>>>>>>>>>
>>>>>>>>>>> disabled,
>>>>>>>>>>>
>>>>>>>>>>> all
>>>>>>>>>>>
>>>>>>>>>>> the parameters have to be prepared and pushed onto the stack. As
>>>>> was
>>>>>>>>>>> discussed in IRC, you have to feel guilty about doing converting
>>> a
>>>>>>>>>>> string_view to a string and outputting it with c_str(). Because
>>> the
>>>>>>>>>>>
>>>>>>>>>>> code
>>>>>>>>>>>
>>>>>>>>>>> to do this will be executed even if the debug tag is disabled.
>>>>>>>>>>> Furthermore, for each call to TSDebug, it's necessary to do an
>>>>>>>>>>>
>>>>>>>>>>> associative
>>>>>>>>>>>
>>>>>>>>>>> lookup on the tag string to determine that output for the tag is
>>>>>>>>>>>
>>>>>>>>>>> disabled.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We would want to put equivalent capabilities in Diags.h in the
>>> core
>>>>>>>>>>>
>>>>>>>>>>> for
>>>>>>>>>>>
>>>>>>>>>>> debug output. The implementation is non-trivial, but I think it
>>>>> would
>>>>>>>>>>>
>>>>>>>>>>> take
>>>>>>>>>>>
>>>>>>>>>>> at most a week including the Au test additions.
>>>>>>>>>>>
>>>>>>>>>>> I looked at making TSFastDbg() an inline function rather than a
>>>>> macro:
>>>>>>>>>>> https://godbolt.org/z/IfVbBk
>>>>>>>>>>> The opitimization works well if you compile the code with the
>>> inline
>>>>>>>>>>> function as C, but the optimization is poor if you compile it as
>>>>> C++.
>>>>>>>>>>>
>>>>>>>>>>> This
>>>>>>>>>>>
>>>>>>>>>>> difference exists for both gcc and clang. It's weird.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>