Re: [openstack-dev] Oslo logging eats system level tracebacks by default

Joshua Harlow Wed, 28 May 2014 18:38:30 -0700

An idea, and one that I've tried to apply in taskflow.

Since Python 3.0 added exception chaining with PEP 3134 
(http://legacy.python.org/dev/peps/pep-3134), if we can *simulate* exception 
chaining in our projects, this would likely help even more with traceability 
and debugging. I have a common exception that I've been using to approximate 
this (since chaining doesn't exist in 2.7 and 2.6), and it might be useful for 
others to have similar types of exceptions (and to try to print out as much of 
the chain as possible when errors occur).


https://github.com/openstack/taskflow/blob/master/taskflow/exceptions.py#L22 
(see pformat() method that dumps a large string containing all connected 
causes).
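
To illustrate the idea, here is a rough sketch only. It is not the taskflow 
code linked above; the ChainedError name, its cause argument, and the 
load_config helper are made up for this example. The exception carries its 
cause explicitly and can format the whole chain as one string:

```python
class ChainedError(Exception):
    """Hypothetical exception that remembers the exception that caused it."""

    def __init__(self, message, cause=None):
        super(ChainedError, self).__init__(message)
        # Stored by hand, since Python 2 has no implicit __cause__.
        self.cause = cause

    def pformat(self, indent=2):
        """Return one string listing this error and every linked cause."""
        lines = []
        seen = set()  # guards against accidental cycles in the chain
        depth = 0
        current = self
        while current is not None and id(current) not in seen:
            seen.add(id(current))
            prefix = " " * (indent * depth)
            lines.append("%s%s: %s" % (prefix, type(current).__name__, current))
            # Plain exceptions have no 'cause' attribute, which ends the walk.
            current = getattr(current, "cause", None)
            depth += 1
        return "\n".join(lines)


def load_config(path):
    """Hypothetical caller that wraps a low-level error with more context."""
    try:
        with open(path) as handle:
            return handle.read()
    except IOError as e:
        raise ChainedError("could not load config %r" % path, cause=e)
```

Catching ChainedError at the top level and logging e.pformat() would then 
show both the high-level failure and the underlying IOError.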

Might be useful for others (if there are better approaches/libraries that do 
similar things let me know),

-Josh

From: Morgan Fainberg <[email protected]>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<[email protected]>
Date: Wednesday, May 28, 2014 at 8:53 AM
To: Jay Pipes <[email protected]>, "OpenStack Development Mailing List (not 
for usage questions)" <[email protected]>
Subject: Re: [openstack-dev] Oslo logging eats system level tracebacks by 
default

+1 Providing service crashing information is very valuable. In general we need 
to provide as much information about why the service exited 
(critically/traceback/unexpectedly) for our operators.

—Morgan

—
Morgan Fainberg

From: Jay Pipes <[email protected]>
Reply: OpenStack Development Mailing List (not for usage questions) 
<[email protected]>
Date: May 28, 2014 at 08:50:25
To: [email protected]
Subject: Re: [openstack-dev] Oslo logging eats system level tracebacks by 
default

On 05/28/2014 11:39 AM, Doug Hellmann wrote:
> On Wed, May 28, 2014 at 10:38 AM, Sean Dague <[email protected]> wrote:
>> When attempting to build a new tool for Tempest, I found that my Python
>> syntax errors were being completely eaten. After 2 days of debugging I
>> found that oslo log.py does the following *very unexpected* thing.
>>
>> - replaces the sys.excepthook with its own function
>> - eats the exception traceback unless debug or verbose are set to True
>> - sets debug and verbose to False by default
>> - prints out a completely useless summary log message at Critical
>> ([CRITICAL] [-] 'id' was my favorite of these)
>>
>> This is basically for an exit level event. Something so breaking that
>> your program just crashed.
>>
>> Note this has nothing to do with preventing stack traces that are
>> currently littering up the logs that happen at many logging levels, it's
>> only about removing the stack trace of a CRITICAL level event that's
>> going to very possibly result in a crashed daemon with no information as
>> to why.
>>
>> So simply including oslo log makes the code immediately undebuggable
>> unless you change your config file away from the defaults.
>>
>> Whether or not there was justification for this before, one of the
>> things we heard loud and clear from the operator's meetup was:
>>
>> - Most operators are running at DEBUG level for all their OpenStack
>> services because you can't actually do problem determination in
>> OpenStack at any level below that.
>> - Operators reacted negatively to the idea of removing stack traces
>> from logs, as that's typically the only way to figure out what's going
>> on. It took a while of back and forth to explain that our initiative to
>> do that wasn't about removing them per se, but about having the code
>> correctly recover.
>>
>> So the current oslo logging behavior seems inconsistent (we spew
>> exceptions at INFO and WARN levels, and hide all the important stuff
>> with a legitimately uncaught system level crash), undebuggable, and
>> completely against the prevailing wishes of the operator community.
>>
>> I'd like to change that here - https://review.openstack.org/#/c/95860/
>>
>> -Sean
>
> I agree, we should dump as much detail as we can when we encounter an
> unhandled exception that causes an app to die.

+1

-jay
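
For reference, the behavior being asked for in Sean's list can be sketched 
roughly as follows. This is a minimal illustration under assumptions, not the 
oslo log.py code and not the change under review; the logger name and the 
function names are hypothetical. The hook always passes the traceback to the 
logger, regardless of any debug or verbose setting:

```python
import logging
import sys

LOG = logging.getLogger("example-service")  # hypothetical logger name


def log_uncaught(exc_type, exc_value, exc_tb):
    """Log an uncaught exception at CRITICAL level, traceback included."""
    # exc_info makes the logging module append the formatted traceback
    # to the message instead of emitting only a one-line summary.
    LOG.critical("Unhandled exception, exiting",
                 exc_info=(exc_type, exc_value, exc_tb))


def install_excepthook():
    """Replace the default hook so crashes end up in the service log."""
    sys.excepthook = log_uncaught
```

A service would call install_excepthook() once at startup, after logging has 
been configured.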

_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
