On Jun 14, 2013, at 11:18 PM, Christopher Armstrong <ra...@twistedmatrix.com> 
wrote:

> This conversation has gotten pretty sprawling; time to reel it in with
> some code.
> 
> What do you think of this for an API that meets in the middle?
> 
> https://gist.github.com/radeex/5787124
> 
> This example implementation only concerns itself with the points under
> debate right now; obviously it's completely unusable in general. But
> anyway:
> 
> 1. it supports English logs
> 2. it doesn't require you to specify a formatting if you want to just
> log a bunch of data
> 3. it makes it easy to specify a system (both manually and based on
> the module your code is in)


First off, thanks again for taking the time to correspond about this, I'm very 
happy for this conversation to be attracting attention and making progress :).

Moving right along...

This code is relying on the semantics of the existing log.msg() call (which are 
bad, for the reasons I enumerated in the start of this thread), right down to 
the dependence on %-formatting.  The main thing I'm interested in at the moment 
is actually fixing those semantics to be more useful; the specific thing we're 
talking about here, persistent log identifiers, is interesting to me at the 
moment mostly as it relates to that.

I realize you might just be relying on it for familiarity, you might not have 
had time to read the whole new thing, so I'm not trying to tear down your idea 
based on that alone :).

But Wilfredo already implemented a thing which addresses the 
namespacing-of-log-messaages issue, and it does about 2/3 of what your proposal 
already does (except it has, like, tests and stuff).  Right down to the 
_getframe() hack (although  
<http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py#L312>
 uses inspect.currentframe() which has a public-sounding name at least :)).  As 
I mentioned in my immediately-previous reply, the logger's 'namespace' does not 
need to be the same as your module name, so it can be a "more persistent" 
identifier if you have one.

More significantly, you're actually going to the trouble of constructing a 
format string.  This produces output in an ad-hoc format: '='-separated str()s 
of your log keys (although only the ones you pass directly in your message; 
those annotating or augmenting the message at subsequent points by other 
publishers / observers are ignored), which you are presumably relying upon 
being smooshed into a plain-text log file, right along side many other messages 
which do not fit this format.  Presumably this could lead to issues like 
traceback lines with '=' assignments in them showing up in whatever kind of 
ad-hoc log-analysis you want to do later, and ambiguous unparseable results if 
any fields in the log message have a '=' and a space in their repr.

Let's not make it the responsibility of the emitter, or the format string, to 
get those values into a text log file that can be scanned with a regex (albeit 
in the case of your proposal, a much-simplified regex).  Let's push that 
responsibility down into the log observer, and actually save it as structured 
data and *parse* it later.

As all the hip young kids will tell you, this type of data should be persisted 
with JSON, but we should probably have code to support multiple formats.  
Another advantage of serializing and parsing the logs as structured data would 
allow tools written to work on a "live" log observer to also analyze persistent 
logs in at least a similar way, if not entirely the same way.

(An object that has survived the grim ordeal of translation to and from JSON 
will surely be made less by the process, and can't be treated quite the same 
way by a log observer; but, as I'm sure you'll agree, this is probably better 
than the alternative of Pickle.  An unpickled object will be brought fully back 
to life, but, in the words of the recent NIN release, it will have "came back 
haunted".)

Maybe we could even do a SQLite observer.  Honestly I'm not sure which format 
is best.

Since it's also important to be able to *filter* logs, and avoid the work of 
formatting or serializing them at all, it would be best to push this work as 
far down the line as possible.  The construction of the format string is itself 
a whole ton of string-copying, the sort of thing I'd really like to avoid.  
(Last I measured, dictionary operations that don't cause a reallocation - and 
most don't, unless you're inserting in a long loop - are cheaper than even 
apparently small string copies.)

In fact, it strikes me that perhaps *this* is the feature we should be talking 
about implementing: writing a structured log observer and parser, firming up 
the contract between emitters and observers so it's clear what to do with data 
that can't be serialized - more than exactly which fields we use to identify 
messages that are interesting to us.

Relatedly, we could make log analysis a first-class task; instead of assuming 
someone will grep the log file later, actually provide a simple log-query tool 
that users can plug their code into, presuming they're using a log format which 
is cleanly parseable.  Maybe have a little web resource that serves up nicely 
formatted views of your logs.

I can implement the UUID thing as a completely separate ticket and we can 
discuss it on its own merits; meanwhile, you can happily add log IDs to 
everything in your applications.  I'd be quite happy to table this part of the 
conversation for a separate ticket; I just thought it would be a nice thing 
that could fall out of some of the other parts of a new logging system.


-glyph
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python

Reply via email to