I got some crude performance numbers for TSDebug() and TSDEBUG() when proxy.config.diags.debug.enabled is 0. The good news is that TSDebug() looks to take about 8 nanoseconds (the raw data is below). So, if we take 1000 requests per core per seconds as a ballpark estimate of TS performance, it would take a lot of them to have a significant impact. But TSDebug() takes about 30 times as long as TSDEBUG(). So it seems worthwhile to make the tiny effort to change TSDEBUG() so it won't swallow a following 'else' clause. I also think we should rename it to TSFastDbg().
We could rename TSDebug() to TSDebug_(), and TSDEBUG() to TSDebug(). But I think the very small benefit is outweighed by the potential risk. TS seems to be rife with race conditions, so a change that will alter the length of lots of code paths could create yet more instability. Here is the raw data (6 tests, 3 repetitions). Done with https://github.com/apache/trafficserver/pull/5638 . I don't understand why it's not more consistent. If anyone knows a better way to time this, please let me know. Performance for TSDebug() with no parameters, 1000000000 repetitions, microseconds user=7670000 system=20000 Performance for TSDEBUG() with no parameters, 1000000000 repetitions, microseconds user=260000 system=0 Performance for TSDebug() with parameters, 1000000000 repetitions, microseconds user=8030000 system=50000 Performance for TSDEBUG() with parameters, 1000000000 repetitions, microseconds user=280000 system=0 Performance for TSDebug() with volatile parameters, 1000000000 repetitions, microseconds user=8830000 system=30000 Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions, microseconds user=270000 system=0 Performance for TSDebug() with no parameters, 1000000000 repetitions, microseconds user=7520000 system=50000 Performance for TSDEBUG() with no parameters, 1000000000 repetitions, microseconds user=260000 system=0 Performance for TSDebug() with parameters, 1000000000 repetitions, microseconds user=8040000 system=30000 Performance for TSDEBUG() with parameters, 1000000000 repetitions, microseconds user=260000 system=0 Performance for TSDebug() with volatile parameters, 1000000000 repetitions, microseconds user=8450000 system=50000 Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions, microseconds user=260000 system=0 Performance for TSDebug() with no parameters, 1000000000 repetitions, microseconds user=7470000 system=80000 Performance for TSDEBUG() with no parameters, 1000000000 repetitions, microseconds user=260000 system=0 Performance for TSDebug() with parameters, 1000000000 repetitions, microseconds user=8120000 system=50000 Performance for TSDEBUG() with parameters, 1000000000 repetitions, microseconds user=270000 system=0 Performance for TSDebug() with volatile parameters, 1000000000 repetitions, microseconds user=8500000 system=60000 Performance for TSDEBUG() with volatile parameters, 1000000000 repetitions, microseconds user=270000 system=0 On Wed, Jun 12, 2019 at 12:58 PM Walt Karas <wka...@verizonmedia.com> wrote: > Since I'm still relatively new to this application, there are limits to > what I can take on on my own initiative. I don't see how anything I've > proposed would interfere with any higher-priority work. > > On Wed, Jun 12, 2019 at 12:41 PM Leif Hedstrom <zw...@apache.org> wrote: > >> Also, I’d like to reiterate my call to arms on v9.0.0: >> >> Please, everyone lets focus on the things that are critical to ATS >> v9.0.0 readiness! >> >> >> This includes: >> >> 1) Fixing all the critical bugs (and there’s a lot of them) >> >> 2) Testing “master" >> >> 3) Doing the incompatible changes that needs to be done before we >> can release >> >> 4) Removing features that we do not want to support for the next >> 3+ years >> >> 5) Make sure that v9.0.0 has the functionality that you >> *absolutely* need to be able >> to use it >> >> >> I kinda feel that several of these recent proposal do not fit into this, >> and in fact, are a fair amount of distraction. I urge everyone to not go >> down this route until we’ve got v9.0.0 stable and functional, and instead >> focus on the v9.0.0 tasks at hand. I don’t think this one for example fits >> into any of those 5 points above. >> >> My $.01. >> >> — Leif >> >> P.s >> https://github.com/apache/trafficserver/issues/5457 >> >> >> > On Jun 12, 2019, at 11:30 AM, Walt Karas <wka...@verizonmedia.com.INVALID> >> wrote: >> > >> > I would guess the performance increase would be well under 1%. But >> it's a >> > fairly simple change. The biggest performance increase I think would >> come >> > from avoiding a PCRE match, to see if the specific debug tag was >> enabled. >> > For some TSDebug calls, there can be many arguments to push, and >> perhaps a >> > fair amount of calculation to generate the correct arguments. >> > >> > On Wed, Jun 12, 2019 at 12:21 PM Bryan Call <bc...@apache.org> wrote: >> > >> >> I am surprised you are seeing this as a performance issue. Looking at >> the >> >> code it looks like 1 function call and a couple pointer dereferences. >> >> >> >> If you wanted to speed up TSDebug and make the code harder to maintain >> you >> >> could create a structure that looks like like DiagsConfig in ts.h and >> then >> >> set a pointer used in the API to look at the gobal DiagsConfig. Then >> you >> >> could have a macro that would do basically what on() does. This would >> >> bypass a function call and would have 1 more pointer dereference than >> your >> >> example. >> >> >> >> -Bryan >> >> >> >> >> >>> On Jun 12, 2019, at 10:00 AM, Walt Karas <wka...@verizonmedia.com >> .INVALID> >> >> wrote: >> >>> >> >>> What do you mean by "exposing diags"? >> >>> >> >>> On Wed, Jun 12, 2019 at 11:50 AM Bryan Call <bc...@apache.org> wrote: >> >>> >> >>>> I would rather see TSDebug be optimized and exposing diags instead of >> >>>> creating a new type of debug. >> >>>> >> >>>> -Bryan >> >>>> >> >>>> >> >>>>> On Jun 12, 2019, at 9:44 AM, Walt Karas <wka...@verizonmedia.com >> >> .INVALID> >> >>>> wrote: >> >>>>> >> >>>>> I don't believe so. The issue I'm trying to address is that there >> is >> >>>> still >> >>>>> significant execution time overhead for debug output calls when >> >>>>> proxy.config.diags.debug.enabled is 0. This change would make the >> >>>>> execution time overhead negligible (if TSFastDbg is used or TSDebug >> is >> >>>>> changed to be like the proposed TSFastDbg). Another way of putting >> it >> >>>> is, >> >>>>> I want to clean up the existing TSDEBUG alternative to TSDebug. >> >>>>> >> >>>>> On Wed, Jun 12, 2019 at 11:31 AM Bryan Call <bc...@apache.org> >> wrote: >> >>>>> >> >>>>>> How would this be controlled on the command line when running >> >>>>>> trafic_server -T? >> >>>>>> >> >>>>>> It looks like it is a departure from how debug tags and controlled >> >> now. >> >>>>>> >> >>>>>> -Bryan >> >>>>>> >> >>>>>> >> >>>>>>> On Jun 11, 2019, at 1:21 PM, Walt Karas <wka...@verizonmedia.com >> >>>> .INVALID> >> >>>>>> wrote: >> >>>>>>> >> >>>>>>> Any +1's for simply replacing TSDebug with the proposed TSFastDbg >> ? >> >>>>>>> >> >>>>>>> On Mon, Apr 8, 2019 at 6:20 PM Leif Hedstrom <zw...@apache.org> >> >> wrote: >> >>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Apr 4, 2019, at 8:05 AM, Walt Karas <wka...@verizonmedia.com >> >>>>>> .INVALID> >> >>>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> Because senility. probably "on" should be a non-pointer of type >> >> const >> >>>>>>>> volatile char, and InkAPI in the core can just keep a pointer to >> the >> >>>>>> whole >> >>>>>>>> structure for updating. >> >>>>>>>> >> >>>>>>>> But I'm leaning more to fixing up the capability that already >> kinda >> >>>>>> there. >> >>>>>>>> Which is optimizimg only for the case when all debug output is >> >>>> disabled. >> >>>>>>>> Especially since maybe zwoop would be on my side for once :o) . >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> My bad, whatever Walt said, I take the opposite stance. >> >>>>>>>> >> >>>>>>>> — leif >> >>>>>>>> >> >>>>>>>> P.s >> >>>>>>>> J/K! >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Thu, Apr 4, 2019 at 8:51 AM Alan Carroll >> >>>>>>>> <solidwallofc...@verizonmedia.com.invalid> wrote: >> >>>>>>>> >> >>>>>>>> This could enable a significant speed up for debug tags. One >> point >> >> is >> >>>>>> that >> >>>>>>>> when the debug tag string is set, at that point the debug objects >> >>>> could >> >>>>>> be >> >>>>>>>> updated to the correct state according to the debug tag string, >> >> rather >> >>>>>> than >> >>>>>>>> checking it every time a debug message is logged. >> >>>>>>>> >> >>>>>>>> I have to ask, why is there a pointer to the enable flag, >> instead of >> >>>>>> just >> >>>>>>>> the enable flag? Your logic doesn't even check that flag, it >> checks >> >>>>>> whether >> >>>>>>>> the pointer is null. >> >>>>>>>> >> >>>>>>>> On Wed, Apr 3, 2019 at 11:52 AM Leif Hedstrom <zw...@apache.org> >> >>>> wrote: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Apr 3, 2019, at 10:47 AM, Walt Karas <wka...@verizonmedia.com >> >>>>>>>> >> >>>>>>>> .INVALID> >> >>>>>>>> >> >>>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> It looks like the TS API already has less granular version of >> this: >> >>>>>>>> >> >>>>>>>> extern int diags_on_for_plugins; >> >>>>>>>> >> >>>>>>>> #define TSDEBUG \ >> >>>>>>>> >> >>>>>>>> if (diags_on_for_plugins) \ >> >>>>>>>> >> >>>>>>>> TSDebug >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> A problem with this is if you write: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> if (flag) >> >>>>>>>> >> >>>>>>>> TSDEBUG(yadayada); >> >>>>>>>> >> >>>>>>>> else >> >>>>>>>> >> >>>>>>>> do_something(); >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> then the else will be associated with the if hidden in the macro. >> >> I'd >> >>>>>>>> prefer to change to something like: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> extern const volatile char TS_detail_dbg_output_enabled; >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> #define TSFastDbg(_TAG, ...) \ >> >>>>>>>> do { \ >> >>>>>>>> if (TS_detail_dbg_output_enabled) \ >> >>>>>>>> TSDebug(_TAG, __VA_ARGS__); \ >> >>>>>>>> >> >>>>>>>> } while(0) >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Yeh, that kinda sounds like a bug, doesn’t it? >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> The negative is that we'd only see the benefit when all debug >> output >> >>>>>>>> >> >>>>>>>> was >> >>>>>>>> >> >>>>>>>> disabled. But, in any case, that is the main benefit, to avoid >> >>>> impacts >> >>>>>>>> during normal operation. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Right, we should always optimize for the common cases, obviously. >> >> When >> >>>>>>>> there’s a decision to make, where you can either make it >> “equally” >> >>>>>>>> expensive always, or more expensive in a rare case, but cheaper >> in >> >> the >> >>>>>>>> common case, we should likely chose the latter. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Looking in src/tscore/Diags.cc, it looks like the debug enable >> flag >> >> is >> >>>>>>>> >> >>>>>>>> only >> >>>>>>>> >> >>>>>>>> set at startup time. So we wouldn't be able to enable TSFastDbg >> >> using >> >>>>>>>> traffic_cntl without adding complexity. >> >>>>>>>> >> >>>>>>>> On Tue, Apr 2, 2019 at 5:48 PM Walt Karas < >> wka...@verizonmedia.com> >> >>>>>>>> >> >>>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> Add this new structure to apidefs.h: >> >>>>>>>> >> >>>>>>>> typedef struct TSFastDbgCntl_ >> >>>>>>>> { >> >>>>>>>> const char * const tag; // nul-terminated string >> >>>>>>>> const volatile char * const on; // pointer to 1-byte flag >> >>>>>>>> } >> >>>>>>>> TSFastDbgCntl; >> >>>>>>>> >> >>>>>>>> Add this new API function, that returns a pointer to an instance >> of >> >>>>>>>> >> >>>>>>>> the >> >>>>>>>> >> >>>>>>>> above structure: >> >>>>>>>> >> >>>>>>>> TSFastDbgCntl * TSCreateFastDbgCntl(const char *tag); >> >>>>>>>> >> >>>>>>>> Add this macro, which would be used in place of direct use of >> >>>>>>>> >> >>>>>>>> TSDebug(). >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> #define TSFastDbg(_FD_CNTL, ...) \ >> >>>>>>>> do { \ >> >>>>>>>> if ((_FD_CNTL)->on) \ >> >>>>>>>> TSDebug((_FD_CNTL)->tag, __VA_ARGS__); \ >> >>>>>>>> } while(0) >> >>>>>>>> >> >>>>>>>> The first parameter to TSFastDbg() is a pointer that was >> returned by >> >>>>>>>> TSCreateFastDbgCntl(). The remaining parameters are printf >> >> parameters >> >>>>>>>> (format string and values matching format specifiers). The core >> >> would >> >>>>>>>> >> >>>>>>>> be >> >>>>>>>> >> >>>>>>>> responsible for changing the value of the 'on' fields in the >> >>>>>>>> >> >>>>>>>> TSFastDbgCntl >> >>>>>>>> >> >>>>>>>> instances. For direct calls to TSDebug(), even if the tag is >> >>>>>>>> >> >>>>>>>> disabled, >> >>>>>>>> >> >>>>>>>> all >> >>>>>>>> >> >>>>>>>> the parameters have to be prepared and pushed onto the stack. As >> >> was >> >>>>>>>> discussed in IRC, you have to feel guilty about doing converting >> a >> >>>>>>>> string_view to a string and outputting it with c_str(). Because >> the >> >>>>>>>> >> >>>>>>>> code >> >>>>>>>> >> >>>>>>>> to do this will be executed even if the debug tag is disabled. >> >>>>>>>> Furthermore, for each call to TSDebug, it's necessary to do an >> >>>>>>>> >> >>>>>>>> associative >> >>>>>>>> >> >>>>>>>> lookup on the tag string to determine that output for the tag is >> >>>>>>>> >> >>>>>>>> disabled. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> We would want to put equivalent capabilities in Diags.h in the >> core >> >>>>>>>> >> >>>>>>>> for >> >>>>>>>> >> >>>>>>>> debug output. The implementation is non-trivial, but I think it >> >> would >> >>>>>>>> >> >>>>>>>> take >> >>>>>>>> >> >>>>>>>> at most a week including the Au test additions. >> >>>>>>>> >> >>>>>>>> I looked at making TSFastDbg() an inline function rather than a >> >> macro: >> >>>>>>>> https://godbolt.org/z/IfVbBk >> >>>>>>>> The opitimization works well if you compile the code with the >> inline >> >>>>>>>> function as C, but the optimization is poor if you compile it as >> >> C++. >> >>>>>>>> >> >>>>>>>> This >> >>>>>>>> >> >>>>>>>> difference exists for both gcc and clang. It's weird. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>> >> >>>>>> >> >>>> >> >>>> >> >> >> >> >> >>