It looked like a simple fast change. Are you saying you can see that TSDEBUG would not start printing debug messages if you enabled the global flag with traffic_cntl ?
An alternative would be to have the core functions that print the debug messages check the global flag first before they call PCRE to check if the specific tag is enabled. That would improvement would have effect for both core and plugin debug messages. If wouldn't eliminate the parameter push and function call overhead, but the testing indicates that is much less significant. On Mon, Jun 17, 2019 at 2:01 PM Bryan Call <bc...@apache.org> wrote: > Walt, > > TSDEBUG uses diags_on_for_plugins and that looks to only be set on startup > and then just calls TSDebug(). It doesn’t make sense to replace TSDebug > with TSDEBUG. > > I don’t see why you are trying to optimize just the plugin debug. When > debug is enabled it has a performance impact on the entire system, not just > for plugins. > > -Bryan > > > > On Jun 14, 2019, at 4:51 PM, Leif Hedstrom <zw...@apache.org> wrote: > > > > > > > >> On Jun 14, 2019, at 5:18 PM, Walt Karas <wka...@verizonmedia.com.INVALID> > wrote: > >> > >> I got some crude performance numbers for TSDebug() and TSDEBUG() > >> when proxy.config.diags.debug.enabled is 0. The good news is that > >> TSDebug() looks to take about 8 nanoseconds (the raw data is below). > So, > >> if we take 1000 requests per core per seconds as a ballpark estimate of > TS > >> performance, it would take a lot of them to have a significant impact. > But > >> TSDebug() takes about 30 times as long as TSDEBUG(). So it seems > >> worthwhile to make the tiny effort to change TSDEBUG() so it won't > swallow > >> a following 'else' clause. I also think we should rename it to > TSFastDbg(). > > > > The numbers sounds good, but why on earth keep the old Debug? Why would > I chose to use a slow API over a fast API? That just makes no sense. > > > > if we do this, just change TSDebug(). If it’s incompatible, we document > it with appropriate warnings. > > > > Does any of this affect / improve the core Debug() statements? Those are > probably 10x - 100x more critical to get fast, because they are > *everywhere*. > > > > — leif > > > >> > >> We could rename TSDebug() to TSDebug_(), and TSDEBUG() to TSDebug(). > But I > >> think the very small benefit is outweighed by the potential risk. TS > seems > >> to be rife with race conditions, so a change that will alter the length > of > >> lots of code paths could create yet more instability. > >> > >> Here is the raw data (6 tests, 3 repetitions). Done with > >> https://github.com/apache/trafficserver/pull/5638 . I don't > understand why > >> it's not more consistent. If anyone knows a better way to time this, > >> please let me know. > >> > >> Performance for TSDebug() with no parameters, 1000000000 repetitions, > >> microseconds user=7670000 system=20000 > >> Performance for TSDEBUG() with no parameters, 1000000000 repetitions, > >> microseconds user=260000 system=0 > >> Performance for TSDebug() with parameters, 1000000000 repetitions, > >> microseconds user=8030000 system=50000 > >> Performance for TSDEBUG() with parameters, 1000000000 repetitions, > >> microseconds user=280000 system=0 > >> Performance for TSDebug() with volatile parameters, 1000000000 > repetitions, > >> microseconds user=8830000 system=30000 > >> Performance for TSDEBUG() with volatile parameters, 1000000000 > repetitions, > >> microseconds user=270000 system=0 > >> > >> Performance for TSDebug() with no parameters, 1000000000 repetitions, > >> microseconds user=7520000 system=50000 > >> Performance for TSDEBUG() with no parameters, 1000000000 repetitions, > >> microseconds user=260000 system=0 > >> Performance for TSDebug() with parameters, 1000000000 repetitions, > >> microseconds user=8040000 system=30000 > >> Performance for TSDEBUG() with parameters, 1000000000 repetitions, > >> microseconds user=260000 system=0 > >> Performance for TSDebug() with volatile parameters, 1000000000 > repetitions, > >> microseconds user=8450000 system=50000 > >> Performance for TSDEBUG() with volatile parameters, 1000000000 > repetitions, > >> microseconds user=260000 system=0 > >> > >> Performance for TSDebug() with no parameters, 1000000000 repetitions, > >> microseconds user=7470000 system=80000 > >> Performance for TSDEBUG() with no parameters, 1000000000 repetitions, > >> microseconds user=260000 system=0 > >> Performance for TSDebug() with parameters, 1000000000 repetitions, > >> microseconds user=8120000 system=50000 > >> Performance for TSDEBUG() with parameters, 1000000000 repetitions, > >> microseconds user=270000 system=0 > >> Performance for TSDebug() with volatile parameters, 1000000000 > repetitions, > >> microseconds user=8500000 system=60000 > >> Performance for TSDEBUG() with volatile parameters, 1000000000 > repetitions, > >> microseconds user=270000 system=0 > >> > >> On Wed, Jun 12, 2019 at 12:58 PM Walt Karas <wka...@verizonmedia.com> > wrote: > >> > >>> Since I'm still relatively new to this application, there are limits to > >>> what I can take on on my own initiative. I don't see how anything I've > >>> proposed would interfere with any higher-priority work. > >>> > >>> On Wed, Jun 12, 2019 at 12:41 PM Leif Hedstrom <zw...@apache.org> > wrote: > >>> > >>>> Also, I’d like to reiterate my call to arms on v9.0.0: > >>>> > >>>> Please, everyone lets focus on the things that are critical to ATS > >>>> v9.0.0 readiness! > >>>> > >>>> > >>>> This includes: > >>>> > >>>> 1) Fixing all the critical bugs (and there’s a lot of them) > >>>> > >>>> 2) Testing “master" > >>>> > >>>> 3) Doing the incompatible changes that needs to be done before > we > >>>> can release > >>>> > >>>> 4) Removing features that we do not want to support for the next > >>>> 3+ years > >>>> > >>>> 5) Make sure that v9.0.0 has the functionality that you > >>>> *absolutely* need to be able > >>>> to use it > >>>> > >>>> > >>>> I kinda feel that several of these recent proposal do not fit into > this, > >>>> and in fact, are a fair amount of distraction. I urge everyone to not > go > >>>> down this route until we’ve got v9.0.0 stable and functional, and > instead > >>>> focus on the v9.0.0 tasks at hand. I don’t think this one for > example fits > >>>> into any of those 5 points above. > >>>> > >>>> My $.01. > >>>> > >>>> — Leif > >>>> > >>>> P.s > >>>> https://github.com/apache/trafficserver/issues/5457 > >>>> > >>>> > >>>>> On Jun 12, 2019, at 11:30 AM, Walt Karas <wka...@verizonmedia.com > .INVALID> > >>>> wrote: > >>>>> > >>>>> I would guess the performance increase would be well under 1%. But > >>>> it's a > >>>>> fairly simple change. The biggest performance increase I think would > >>>> come > >>>>> from avoiding a PCRE match, to see if the specific debug tag was > >>>> enabled. > >>>>> For some TSDebug calls, there can be many arguments to push, and > >>>> perhaps a > >>>>> fair amount of calculation to generate the correct arguments. > >>>>> > >>>>> On Wed, Jun 12, 2019 at 12:21 PM Bryan Call <bc...@apache.org> > wrote: > >>>>> > >>>>>> I am surprised you are seeing this as a performance issue. Looking > at > >>>> the > >>>>>> code it looks like 1 function call and a couple pointer > dereferences. > >>>>>> > >>>>>> If you wanted to speed up TSDebug and make the code harder to > maintain > >>>> you > >>>>>> could create a structure that looks like like DiagsConfig in ts.h > and > >>>> then > >>>>>> set a pointer used in the API to look at the gobal DiagsConfig. > Then > >>>> you > >>>>>> could have a macro that would do basically what on() does. This > would > >>>>>> bypass a function call and would have 1 more pointer dereference > than > >>>> your > >>>>>> example. > >>>>>> > >>>>>> -Bryan > >>>>>> > >>>>>> > >>>>>>> On Jun 12, 2019, at 10:00 AM, Walt Karas <wka...@verizonmedia.com > >>>> .INVALID> > >>>>>> wrote: > >>>>>>> > >>>>>>> What do you mean by "exposing diags"? > >>>>>>> > >>>>>>> On Wed, Jun 12, 2019 at 11:50 AM Bryan Call <bc...@apache.org> > wrote: > >>>>>>> > >>>>>>>> I would rather see TSDebug be optimized and exposing diags > instead of > >>>>>>>> creating a new type of debug. > >>>>>>>> > >>>>>>>> -Bryan > >>>>>>>> > >>>>>>>> > >>>>>>>>> On Jun 12, 2019, at 9:44 AM, Walt Karas <wka...@verizonmedia.com > >>>>>> .INVALID> > >>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>> I don't believe so. The issue I'm trying to address is that > there > >>>> is > >>>>>>>> still > >>>>>>>>> significant execution time overhead for debug output calls when > >>>>>>>>> proxy.config.diags.debug.enabled is 0. This change would make > the > >>>>>>>>> execution time overhead negligible (if TSFastDbg is used or > TSDebug > >>>> is > >>>>>>>>> changed to be like the proposed TSFastDbg). Another way of > putting > >>>> it > >>>>>>>> is, > >>>>>>>>> I want to clean up the existing TSDEBUG alternative to TSDebug. > >>>>>>>>> > >>>>>>>>> On Wed, Jun 12, 2019 at 11:31 AM Bryan Call <bc...@apache.org> > >>>> wrote: > >>>>>>>>> > >>>>>>>>>> How would this be controlled on the command line when running > >>>>>>>>>> trafic_server -T? > >>>>>>>>>> > >>>>>>>>>> It looks like it is a departure from how debug tags and > controlled > >>>>>> now. > >>>>>>>>>> > >>>>>>>>>> -Bryan > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On Jun 11, 2019, at 1:21 PM, Walt Karas < > wka...@verizonmedia.com > >>>>>>>> .INVALID> > >>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Any +1's for simply replacing TSDebug with the proposed > TSFastDbg > >>>> ? > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Apr 8, 2019 at 6:20 PM Leif Hedstrom <zw...@apache.org > > > >>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Apr 4, 2019, at 8:05 AM, Walt Karas < > wka...@verizonmedia.com > >>>>>>>>>> .INVALID> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Because senility. probably "on" should be a non-pointer of > type > >>>>>> const > >>>>>>>>>>>> volatile char, and InkAPI in the core can just keep a pointer > to > >>>> the > >>>>>>>>>> whole > >>>>>>>>>>>> structure for updating. > >>>>>>>>>>>> > >>>>>>>>>>>> But I'm leaning more to fixing up the capability that already > >>>> kinda > >>>>>>>>>> there. > >>>>>>>>>>>> Which is optimizimg only for the case when all debug output is > >>>>>>>> disabled. > >>>>>>>>>>>> Especially since maybe zwoop would be on my side for once :o) > . > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> My bad, whatever Walt said, I take the opposite stance. > >>>>>>>>>>>> > >>>>>>>>>>>> — leif > >>>>>>>>>>>> > >>>>>>>>>>>> P.s > >>>>>>>>>>>> J/K! > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, Apr 4, 2019 at 8:51 AM Alan Carroll > >>>>>>>>>>>> <solidwallofc...@verizonmedia.com.invalid> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> This could enable a significant speed up for debug tags. One > >>>> point > >>>>>> is > >>>>>>>>>> that > >>>>>>>>>>>> when the debug tag string is set, at that point the debug > objects > >>>>>>>> could > >>>>>>>>>> be > >>>>>>>>>>>> updated to the correct state according to the debug tag > string, > >>>>>> rather > >>>>>>>>>> than > >>>>>>>>>>>> checking it every time a debug message is logged. > >>>>>>>>>>>> > >>>>>>>>>>>> I have to ask, why is there a pointer to the enable flag, > >>>> instead of > >>>>>>>>>> just > >>>>>>>>>>>> the enable flag? Your logic doesn't even check that flag, it > >>>> checks > >>>>>>>>>> whether > >>>>>>>>>>>> the pointer is null. > >>>>>>>>>>>> > >>>>>>>>>>>> On Wed, Apr 3, 2019 at 11:52 AM Leif Hedstrom < > zw...@apache.org> > >>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Apr 3, 2019, at 10:47 AM, Walt Karas < > wka...@verizonmedia.com > >>>>>>>>>>>> > >>>>>>>>>>>> .INVALID> > >>>>>>>>>>>> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> It looks like the TS API already has less granular version of > >>>> this: > >>>>>>>>>>>> > >>>>>>>>>>>> extern int diags_on_for_plugins; > >>>>>>>>>>>> > >>>>>>>>>>>> #define TSDEBUG \ > >>>>>>>>>>>> > >>>>>>>>>>>> if (diags_on_for_plugins) \ > >>>>>>>>>>>> > >>>>>>>>>>>> TSDebug > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> A problem with this is if you write: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> if (flag) > >>>>>>>>>>>> > >>>>>>>>>>>> TSDEBUG(yadayada); > >>>>>>>>>>>> > >>>>>>>>>>>> else > >>>>>>>>>>>> > >>>>>>>>>>>> do_something(); > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> then the else will be associated with the if hidden in the > macro. > >>>>>> I'd > >>>>>>>>>>>> prefer to change to something like: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> extern const volatile char TS_detail_dbg_output_enabled; > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> #define TSFastDbg(_TAG, ...) \ > >>>>>>>>>>>> do { \ > >>>>>>>>>>>> if (TS_detail_dbg_output_enabled) \ > >>>>>>>>>>>> TSDebug(_TAG, __VA_ARGS__); \ > >>>>>>>>>>>> > >>>>>>>>>>>> } while(0) > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Yeh, that kinda sounds like a bug, doesn’t it? > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> The negative is that we'd only see the benefit when all debug > >>>> output > >>>>>>>>>>>> > >>>>>>>>>>>> was > >>>>>>>>>>>> > >>>>>>>>>>>> disabled. But, in any case, that is the main benefit, to > avoid > >>>>>>>> impacts > >>>>>>>>>>>> during normal operation. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Right, we should always optimize for the common cases, > obviously. > >>>>>> When > >>>>>>>>>>>> there’s a decision to make, where you can either make it > >>>> “equally” > >>>>>>>>>>>> expensive always, or more expensive in a rare case, but > cheaper > >>>> in > >>>>>> the > >>>>>>>>>>>> common case, we should likely chose the latter. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Looking in src/tscore/Diags.cc, it looks like the debug enable > >>>> flag > >>>>>> is > >>>>>>>>>>>> > >>>>>>>>>>>> only > >>>>>>>>>>>> > >>>>>>>>>>>> set at startup time. So we wouldn't be able to enable > TSFastDbg > >>>>>> using > >>>>>>>>>>>> traffic_cntl without adding complexity. > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, Apr 2, 2019 at 5:48 PM Walt Karas < > >>>> wka...@verizonmedia.com> > >>>>>>>>>>>> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Add this new structure to apidefs.h: > >>>>>>>>>>>> > >>>>>>>>>>>> typedef struct TSFastDbgCntl_ > >>>>>>>>>>>> { > >>>>>>>>>>>> const char * const tag; // nul-terminated string > >>>>>>>>>>>> const volatile char * const on; // pointer to 1-byte flag > >>>>>>>>>>>> } > >>>>>>>>>>>> TSFastDbgCntl; > >>>>>>>>>>>> > >>>>>>>>>>>> Add this new API function, that returns a pointer to an > instance > >>>> of > >>>>>>>>>>>> > >>>>>>>>>>>> the > >>>>>>>>>>>> > >>>>>>>>>>>> above structure: > >>>>>>>>>>>> > >>>>>>>>>>>> TSFastDbgCntl * TSCreateFastDbgCntl(const char *tag); > >>>>>>>>>>>> > >>>>>>>>>>>> Add this macro, which would be used in place of direct use of > >>>>>>>>>>>> > >>>>>>>>>>>> TSDebug(). > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> #define TSFastDbg(_FD_CNTL, ...) \ > >>>>>>>>>>>> do { \ > >>>>>>>>>>>> if ((_FD_CNTL)->on) \ > >>>>>>>>>>>> TSDebug((_FD_CNTL)->tag, __VA_ARGS__); \ > >>>>>>>>>>>> } while(0) > >>>>>>>>>>>> > >>>>>>>>>>>> The first parameter to TSFastDbg() is a pointer that was > >>>> returned by > >>>>>>>>>>>> TSCreateFastDbgCntl(). The remaining parameters are printf > >>>>>> parameters > >>>>>>>>>>>> (format string and values matching format specifiers). The > core > >>>>>> would > >>>>>>>>>>>> > >>>>>>>>>>>> be > >>>>>>>>>>>> > >>>>>>>>>>>> responsible for changing the value of the 'on' fields in the > >>>>>>>>>>>> > >>>>>>>>>>>> TSFastDbgCntl > >>>>>>>>>>>> > >>>>>>>>>>>> instances. For direct calls to TSDebug(), even if the tag is > >>>>>>>>>>>> > >>>>>>>>>>>> disabled, > >>>>>>>>>>>> > >>>>>>>>>>>> all > >>>>>>>>>>>> > >>>>>>>>>>>> the parameters have to be prepared and pushed onto the > stack. As > >>>>>> was > >>>>>>>>>>>> discussed in IRC, you have to feel guilty about doing > converting > >>>> a > >>>>>>>>>>>> string_view to a string and outputting it with c_str(). > Because > >>>> the > >>>>>>>>>>>> > >>>>>>>>>>>> code > >>>>>>>>>>>> > >>>>>>>>>>>> to do this will be executed even if the debug tag is disabled. > >>>>>>>>>>>> Furthermore, for each call to TSDebug, it's necessary to do an > >>>>>>>>>>>> > >>>>>>>>>>>> associative > >>>>>>>>>>>> > >>>>>>>>>>>> lookup on the tag string to determine that output for the tag > is > >>>>>>>>>>>> > >>>>>>>>>>>> disabled. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> We would want to put equivalent capabilities in Diags.h in the > >>>> core > >>>>>>>>>>>> > >>>>>>>>>>>> for > >>>>>>>>>>>> > >>>>>>>>>>>> debug output. The implementation is non-trivial, but I think > it > >>>>>> would > >>>>>>>>>>>> > >>>>>>>>>>>> take > >>>>>>>>>>>> > >>>>>>>>>>>> at most a week including the Au test additions. > >>>>>>>>>>>> > >>>>>>>>>>>> I looked at making TSFastDbg() an inline function rather than > a > >>>>>> macro: > >>>>>>>>>>>> https://godbolt.org/z/IfVbBk > >>>>>>>>>>>> The opitimization works well if you compile the code with the > >>>> inline > >>>>>>>>>>>> function as C, but the optimization is poor if you compile it > as > >>>>>> C++. > >>>>>>>>>>>> > >>>>>>>>>>>> This > >>>>>>>>>>>> > >>>>>>>>>>>> difference exists for both gcc and clang. It's weird. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>>> > >>>> > > > >