Re: [Twisted-Python] logging

2013-06-15 Thread Hynek Schlawack

Am 15.06.2013 um 08:18 schrieb Christopher Armstrong :

>> As far as actual *proposals* go, I have these ones, that are all independent:
>> 
>> 1. include all keyword arguments in log output without requiring
>> specifying the formatting in the string
>> 2. make it really easy to specify the "system"
>> 3. stop affecting the "system" of application code based on the
>> framework code that's running the application code (i.e., don't use
>> callWithContext to specify the system any more)
>> 
>> Of these, I think #2 and #3 have the most benefit; I can do #1 with my
>> own logging statements just fine, and while IMO it'd be nice if the
>> whole world adopted a nice identifier-based system, the lion's share
>> of the benefit comes from my use of it consistently in the
>> application's codebase.
> 
> This conversation has gotten pretty sprawling; time to reel it in with
> some code.
> 
> What do you think of this for an API that meets in the middle?
> 
> https://gist.github.com/radeex/5787124
> 
> This example implementation only concerns itself with the points under
> debate right now; obviously it's completely unusable in general. But
> anyway:
> 
> 1. it supports English logs
> 2. it doesn't require you to specify a formatting if you want to just
> log a bunch of data
> 3. it makes it easy to specify a system (both manually and based on
> the module your code is in)

I’ve held back from this discussion so far because it seemed to me that I 
always missed some part of the discussion to fully understand what you’re all 
talking about. I would like to comment on this concrete proposal though before 
I hold my peace forever. (NB I’m not replying just to Christopher but try to 
address everything I saw on the thread so far – I like most of his proposal.)

I find that there’s some kind of false dichotomy brought up in this discussion 
and API and output are somewhat muddled together a bit (maybe I’m just 
misunderstanding though – that’s why I didn’t comment until now).

I personally like my logs 100% structured (except for Exceptions) and still be 
able to “comment” on events in plain English if I need to.

And I don’t see why comments/events should be special case on output (square 
brackets in this example). If you have an event called user_error, you can 
always add a key called error for another “symbol” or just an error_msg if you 
insist on English. When looking for a certain type of user_error, you simply 
write an AND clause in your logging software (Kibana or whatever). It’s pretty 
easy to keep *that* consistent across applications.

For example, *my* log would look like this:

event=user_error peer=127.0.0.1 error=pebkac

If the programmer in question hadn’t enough logging pain in their life to see 
that’s reasonable, they can always do:

event=user_error peer=127.0.0.1 error_msg='Problem exists between keyboard and 
chair.'

Still perfectly parseable, perfectly readable. And with {!r} easy to achieve. A 
nice API I would like to have be:

log('user_error', peer=self.transport.getPeer().host, error_msg='Problem exists 
between keyboard and chair.') – and log figures out itself if it needs to 
quote. I could also live with them all quoted, i.e.:

event='user_error' peer='127.0.0.1' error='pebkac'

to have less special cases.


I hope that makes some sense, the point I’m trying to make that events don’t 
need to be distinct by themselves. If you enforce that, you’re forcing 
structure on them which you could spread over multiple fields that are *much* 
more pleasant to parse.

Regards,
Hynek

P.S. For convenience, I usually write a log method in Twisted protocols that 
prepends the messages with state data, peer IP etc, but that JFTR.
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


[Twisted-Python] 13.1.0 and #6499

2013-06-15 Thread Hynek Schlawack
Hi,

#6499 (Teach chain certificates to SSL server endpoints string syntax) has been 
pronounced ready for merge but since I’m no committer, I can’t do that myself.

It would be *great* if it could be merged before Ashwini cuts the release 
branch!

Thanks & sorry for the noise,
—h
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 14, 2013, at 10:43 PM, Christopher Armstrong  
wrote:

> I don't think it's worth coming up with some kind of GUID-based
> system, because I don't think anyone's going to go to the trouble to
> use it, and I think it basically offers no practical benefit over
> simple event names.

Here are the benefits over simple event names:

It's opaque, so as code moves around, it doesn't look dated or wrong.  Your 
'system' argument looks a whole lot like a FQPN, but as you keep saying, code 
moves around a lot.  Having a *second* dotted identifier that looks a lot like 
the module name is going to look a lot like cruft to someone when it stops 
matching a module name, they're going to be motivated to "clean it up", and 
then the "cleaned up" version is going to break everyone's logging 
instrumentation.
Since UUIDs can be derived from other information, you can start off without 
one and add one later.  If you didn't feel like specifying one in the first 
place with your log messages, then you're stuck until the monitoring and 
application code that comes to some agreement.

My view of our present disagreement seems to come down to this: I believe that 
nobody is going to bother to do logging right in the first place (particularly, 
they are not going to take the time to specify or enumerate the types of events 
that are interesting that their system emits), and will only realize they need 
to extract interesting stuff from their logs later.  I think we need to provide 
the best possible support for the developer "later" (the opaque UUID they can 
associate with an API symbol) and the monitoring folks "now" (the derived UUID 
that they can use in place of a gross and broken-in-the-future regex).

This doesn't mean that the UUID magically makes monitoring always work in the 
future; developers can still change their format strings willy-nilly.  But, it 
at least provides a mechanism that they *could* use to avoid breakage if 
everyone believes it should be avoided.

I understand your point to be that you think that people should, and that they 
therefore will, go to the trouble to categorize every thing that gets logged as 
they're writing the logging stuff.

In support of my argument, I offer as evidence the unfortunate mess that is 
Twisted's current ad-hoc usage of logging, plus the existence of Splunk, a 
publicly-traded company whose whole reason for existing is that they can run 
regexes over poorly-structured logs to extract useful information and 
statistics from them :-).

To be fair, one element in support of your argument is that you managed to 
write a system that used this idiom and it worked well in practice.  I'm sure 
that it did, and I think it's a good thing for people to adopt.

My concern is that lots of useful log-worthy events are coming out of an 
open-source library that won't be exposed to the discipline that a particular 
team adopts for logging.

Finally, it's worth noting that GUID-based identifiers and textual, 
hierarchical identifiers are not mutually exclusive.  You can have 
log.info(log_guid="98185C94-57D7-439A-8206-1D4A2ACBD983", 
log_system="spaceships.combat.explosion") in a single event.  We could provide 
both tools to the developer, and they're not obliged to use either.

(I see you've written some other messages, so more forthcoming...)

-glyph

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 14, 2013, at 10:48 PM, Christopher Armstrong  
wrote:

> There's... a lot here. I'm overwhelmed by how complex this system
> would be, and I think that end users would be as well. I don't really
> want to put UUIDs into my source code, whether I type them or paste
> them, and I don't want to have to invoke command line tools to figure
> out what those UUIDs in order to be a good developer.

The system isn't really that complex at all.  It's calling uuid.uuid5 once, and 
it's even doing that internally in a utility function, so you don't need to 
understand what it's doing.

> I think there's basically no practical benefit over the hierarchical
> "system" + event-identifier system, where:

I tried to lay out the benefits in my other recent reply.

> 1. it's trivial to specify a hierarchical, easy-to-read "system" key that has 
> a small-ish scope

The fact that you have to say "trivial" here suggests it's not actually trivial 
:).  I think that the amount of energy people initially put into logging is so 
low that they will often forget to do something like this.  But, even if they 
remember in the future, we have to deal with the plethora of messages already 
within Twisted, and every Twisted library, that *don't* do this.

> 2. you think of a unique event name in that small-ish scope and type it out.

How do you know that your event name is unique though, other than running 
'grep'?

> I'm beginning to think #1 is the most important of all, but I think
> I'll continue to use event-names to describe all my log statements.

This might be a good practice anyway.

> As I said in my response to Wilfredo, I think it's really important to
> stop specifying the "system" for application code in Twisted. The
> application's code should not be considered a part of the HTTP
> protocol's system. There are also other problems with the way we use
> system, like putting extra random data like request ID in there.


The "system" that Twisted currently specifies is supposed to be the reactor 
event which caused a particular chunk of code to run.  This is not related to 
the subsystem within your application where the functionality that is logging 
is implemented (which, really, still sounds to me like "module name").  I think 
this could be quite useful, especially in systems where the same functionality 
is exposed over multiple protocols and disparate events could cause the same 
subsystem to eventually get called.

However, there are a ton of problems with the way that's implemented; I can't 
really argue with you here about the current implementation, because it's so 
poorly done that I think it's next to useless.  With Failure's extended 
"exception caught here" stacks, it doesn't provide any information other than 
the traceback.

However, I'd like to bring it back at some point, especially if we can figure 
out how to have the causality of an event.  Wouldn't it be fantastic to see a 
trace like this when something went wrong?

.dataReceived
  called callLater(x)
  called callInThread(y)
  which called callFromThread(z)
  and that's why your code is running now


___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] 13.1.0 and #6499

2013-06-15 Thread Laurens Van Houtven
This was resolved :)

cheers
lvh
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Phil Mayers

On 06/15/2013 06:48 AM, Christopher Armstrong wrote:


There's... a lot here. I'm overwhelmed by how complex this system
would be, and I think that end users would be as well. I don't really
want to put UUIDs into my source code, whether I type them or paste
them, and I don't want to have to invoke command line tools to figure
out what those UUIDs in order to be a good developer.


+1. I like the general idea of better logging via keyword args and an 
"id", but I strongly dislike the idea of using UUIDs as log message 
identifiers.


In fact, I'll go so far as to say I defintely wouldn't use such a 
system, and for Twisted to migrate to it would actually reduce the 
utility of the in-built logging, because I'd have to build and maintain 
something to translate Twisted log messages I cared about into whatever 
system I built to do my own logging :o(



I think there's basically no practical benefit over the hierarchical
"system" + event-identifier system, where:


+1. Hierarchial IDs also permit prefix-based wildcarding:

log.observe("twisted.factory.*")
log.exclude("twisted.factory.http.*")
log.observe("myapp.thing")
log.observe("txSomeProject.event.*")

Obviously this is just an example; I haven't put any real thought into it.



1. it's trivial to specify a hierarchical, easy-to-read "system" key
that has a small-ish scope
2. you think of a unique event name in that small-ish scope and type it out.

I'm beginning to think #1 is the most important of all, but I think
I'll continue to use event-names to describe all my log statements.


I pretty much agree with all of this.

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 15, 2013, at 3:55 AM, Phil Mayers  wrote:

> On 06/15/2013 06:48 AM, Christopher Armstrong wrote:
> 
>> There's... a lot here. I'm overwhelmed by how complex this system
>> would be, and I think that end users would be as well. I don't really
>> want to put UUIDs into my source code, whether I type them or paste
>> them, and I don't want to have to invoke command line tools to figure
>> out what those UUIDs in order to be a good developer.
> 
> +1. I like the general idea of better logging via keyword args and an "id", 
> but I strongly dislike the idea of using UUIDs as log message identifiers.
> 
> In fact, I'll go so far as to say I defintely wouldn't use such a system, and 
> for Twisted to migrate to it would actually reduce the utility of the 
> in-built logging, because I'd have to build and maintain something to 
> translate Twisted log messages I cared about into whatever system I built to 
> do my own logging :o(

I am really, really puzzled by this reaction.  I am wondering if you read my 
message carefully, or if I didn't express myself well.

If I were to implement the system that I have proposed, you could completely 
ignore it.  You could never deal with a UUID ever, and process logs based on 
whatever other attributes you like.  There's nothing to translate.  Unless when 
you say "translate" you mean translating the UUID of a currently ad-hoc message 
(one which doesn't specify or document its own unique attributes) which is 
something that would just be impossible without this feature.

The only overhead that you'd ever incur would be a single call per log event - 
probably per-process-lifetime, since the result could be cached - to 
uuid.uuid5, which is not exactly the most expensive function.  It would be 
called internally though; you wouldn't have to call it yourself or be aware 
that it was being called.

If you had the use-case that I believe many people do - which is to evolve a 
stable identifier for a previously ad-hoc log message - then you could do that. 
 Otherwise this wouldn't affect you.

What I've proposed with UUIDs is to identify *specific semantic events* that 
you might want to do monitoring/alerting on, which may move between systems.

Does this explanation make you feel less worried about the inclusion of such a 
feature?

>> I think there's basically no practical benefit over the hierarchical
>> "system" + event-identifier system, where:
> 
> +1. Hierarchial IDs also permit prefix-based wildcarding:
> 
> log.observe("twisted.factory.*")
> log.exclude("twisted.factory.http.*")
> log.observe("myapp.thing")
> log.observe("txSomeProject.event.*")

This is already implemented in 
;
 see for example setLogLevelForNamespace() 
.

Namespaces are just python module names by default, but nothing requires that; 
the only requirement is that they're strings separated with dots.  You can 
instantiate a Logger object with whatever 'namespace' argument you want.

So: again, you can get what you want exactly and ignore the other stuff.

>> 1. it's trivial to specify a hierarchical, easy-to-read "system" key
>> that has a small-ish scope
>> 2. you think of a unique event name in that small-ish scope and type it out.
>> 
>> I'm beginning to think #1 is the most important of all, but I think
>> I'll continue to use event-names to describe all my log statements.
> 
> I pretty much agree with all of this.

I am still not seeing the utility of one of these identifiers beyond 
class/method name, but you can always just pass the namespace explicitly.

-glyph___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 14, 2013, at 11:18 PM, Christopher Armstrong  
wrote:

> This conversation has gotten pretty sprawling; time to reel it in with
> some code.
> 
> What do you think of this for an API that meets in the middle?
> 
> https://gist.github.com/radeex/5787124
> 
> This example implementation only concerns itself with the points under
> debate right now; obviously it's completely unusable in general. But
> anyway:
> 
> 1. it supports English logs
> 2. it doesn't require you to specify a formatting if you want to just
> log a bunch of data
> 3. it makes it easy to specify a system (both manually and based on
> the module your code is in)


First off, thanks again for taking the time to correspond about this, I'm very 
happy for this conversation to be attracting attention and making progress :).

Moving right along...

This code is relying on the semantics of the existing log.msg() call (which are 
bad, for the reasons I enumerated in the start of this thread), right down to 
the dependence on %-formatting.  The main thing I'm interested in at the moment 
is actually fixing those semantics to be more useful; the specific thing we're 
talking about here, persistent log identifiers, is interesting to me at the 
moment mostly as it relates to that.

I realize you might just be relying on it for familiarity, you might not have 
had time to read the whole new thing, so I'm not trying to tear down your idea 
based on that alone :).

But Wilfredo already implemented a thing which addresses the 
namespacing-of-log-messaages issue, and it does about 2/3 of what your proposal 
already does (except it has, like, tests and stuff).  Right down to the 
_getframe() hack (although  

 uses inspect.currentframe() which has a public-sounding name at least :)).  As 
I mentioned in my immediately-previous reply, the logger's 'namespace' does not 
need to be the same as your module name, so it can be a "more persistent" 
identifier if you have one.

More significantly, you're actually going to the trouble of constructing a 
format string.  This produces output in an ad-hoc format: '='-separated str()s 
of your log keys (although only the ones you pass directly in your message; 
those annotating or augmenting the message at subsequent points by other 
publishers / observers are ignored), which you are presumably relying upon 
being smooshed into a plain-text log file, right along side many other messages 
which do not fit this format.  Presumably this could lead to issues like 
traceback lines with '=' assignments in them showing up in whatever kind of 
ad-hoc log-analysis you want to do later, and ambiguous unparseable results if 
any fields in the log message have a '=' and a space in their repr.

Let's not make it the responsibility of the emitter, or the format string, to 
get those values into a text log file that can be scanned with a regex (albeit 
in the case of your proposal, a much-simplified regex).  Let's push that 
responsibility down into the log observer, and actually save it as structured 
data and *parse* it later.

As all the hip young kids will tell you, this type of data should be persisted 
with JSON, but we should probably have code to support multiple formats.  
Another advantage of serializing and parsing the logs as structured data would 
allow tools written to work on a "live" log observer to also analyze persistent 
logs in at least a similar way, if not entirely the same way.

(An object that has survived the grim ordeal of translation to and from JSON 
will surely be made less by the process, and can't be treated quite the same 
way by a log observer; but, as I'm sure you'll agree, this is probably better 
than the alternative of Pickle.  An unpickled object will be brought fully back 
to life, but, in the words of the recent NIN release, it will have "came back 
haunted".)

Maybe we could even do a SQLite observer.  Honestly I'm not sure which format 
is best.

Since it's also important to be able to *filter* logs, and avoid the work of 
formatting or serializing them at all, it would be best to push this work as 
far down the line as possible.  The construction of the format string is itself 
a whole ton of string-copying, the sort of thing I'd really like to avoid.  
(Last I measured, dictionary operations that don't cause a reallocation - and 
most don't, unless you're inserting in a long loop - are cheaper than even 
apparently small string copies.)

In fact, it strikes me that perhaps *this* is the feature we should be talking 
about implementing: writing a structured log observer and parser, firming up 
the contract between emitters and observers so it's clear what to do with data 
that can't be serialized - more than exactly which fields we use to identify 
messages that are interesting to us.

Relatedly, we could make log analysis a first-class task; instead of assuming 
someone will grep the log file later

Re: [Twisted-Python] logging

2013-06-15 Thread Phil Mayers

On 06/15/2013 12:13 PM, Glyph wrote:


I am really, really puzzled by this reaction.  I am wondering if you
read my message carefully, or if I didn't express myself well.


Careful re-reading of the very last bit of your message suggests I may 
have misunderstood.


I think I understand the "final" stage, and in that situation the UUID 
is invisible, correct? It's hidden behind the declaration of a "log 
event" object which can be called to emit or observe said events. That 
seems fine, though I'm not sure what the UUID *does* in that situation - 
route/match is via python object access, no?


I *think* I now understand the intermediate stage, where the log events 
are emitted by old code, and observed by UUID. You're suggesting 
calculating the UUID from the module name and static data (format 
string). I guess that's no worse than any other solution - until the log 
emitter is converted to a newer/better API, there's no great way to 
observe it.


Before we proceed, can you confirm I've understood your proposal correctly?

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Sat, Jun 15, 2013 at 2:18 AM, Hynek Schlawack  wrote:
>
> Am 15.06.2013 um 08:18 schrieb Christopher Armstrong 
> :
>> What do you think of this for an API that meets in the middle?
>>
>> https://gist.github.com/radeex/5787124
>>
>> This example implementation only concerns itself with the points under
>> debate right now; obviously it's completely unusable in general. But
>> anyway:
>>
>> 1. it supports English logs
>> 2. it doesn't require you to specify a formatting if you want to just
>> log a bunch of data
>> 3. it makes it easy to specify a system (both manually and based on
>> the module your code is in)
>
> I’ve held back from this discussion so far because it seemed to me that I 
> always missed some part of the discussion to fully understand what you’re all 
> talking about. I would like to comment on this concrete proposal though 
> before I hold my peace forever. (NB I’m not replying just to Christopher but 
> try to address everything I saw on the thread so far – I like most of his 
> proposal.)
>
> I find that there’s some kind of false dichotomy brought up in this 
> discussion and API and output are somewhat muddled together a bit (maybe I’m 
> just misunderstanding though – that’s why I didn’t comment until now).

You're right. I didn't make a strong enough point about the fact that
the output formatting isn't important, but I assumed everyone already
knew that. I should have been more clear.


> I personally like my logs 100% structured (except for Exceptions) and still 
> be able to “comment” on events in plain English if I need to.
>
> And I don’t see why comments/events should be special case on output (square 
> brackets in this example). If you have an event called user_error, you can 
> always add a key called error for another “symbol” or just an error_msg if 
> you insist on English. When looking for a certain type of user_error, you 
> simply write an AND clause in your logging software (Kibana or whatever). 
> It’s pretty easy to keep *that* consistent across applications.
>
> For example, *my* log would look like this:
>
> event=user_error peer=127.0.0.1 error=pebkac

The special formatting in the example I gave was only intended for the
dumb file-based format. I thought it was just a nice touch on the spur
of the moment. In practice, I would use this for local logging, but
then set up a log observer that passed raw key/value data to my real
log aggregation/storage/filtering system. I'm making a distinction
between what we show our users (in a twistd.log file) and what we give
to our automated systems (sent over network protocols to logging
systems).

> If the programmer in question hadn’t enough logging pain in their life to see 
> that’s reasonable, they can always do:
>
> event=user_error peer=127.0.0.1 error_msg='Problem exists between keyboard 
> and chair.'
>
> Still perfectly parseable, perfectly readable. And with {!r} easy to achieve. 
> A nice API I would like to have be:
>
> log('user_error', peer=self.transport.getPeer().host, error_msg='Problem 
> exists between keyboard and chair.') – and log figures out itself if it needs 
> to quote. I could also live with them all quoted, i.e.:
>
> event='user_error' peer='127.0.0.1' error='pebkac'
>
> to have less special cases.

It sounds like you're arguing that the human-readable *.log format
should be closer to the simple key/value representation that we use
underneath.

Would you also argue that instead of having a log line that looks like:

2013-06-15 10:24:21-0500 [-] Server Shut Down.

We should actually format them (in twistd.log) like this?

time=1371331461.0 system='-' msg='Server Shut Down.'

> I hope that makes some sense, the point I’m trying to make that events don’t 
> need to be distinct by themselves. If you enforce that, you’re forcing 
> structure on them which you could spread over multiple fields that are *much* 
> more pleasant to parse.

So, if I had my own way, "event" would be a required argument to the
log() function in my example, so that the only thing I'm forcing on
the structure of a log message is that you *have* an event argument.

But I'm pretty sure that's not going to fly. :-) So the only reason I
wanted to support "event" in a special way in my example is to
emphasize and encourage its use.

> Hynek


--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Sat, Jun 15, 2013 at 6:37 AM, Glyph  wrote:
>
> On Jun 14, 2013, at 11:18 PM, Christopher Armstrong
>  wrote:
>
> This conversation has gotten pretty sprawling; time to reel it in with
> some code.
>
> What do you think of this for an API that meets in the middle?
>
> https://gist.github.com/radeex/5787124
>
> This example implementation only concerns itself with the points under
> debate right now; obviously it's completely unusable in general. But
> anyway:
>
> 1. it supports English logs
> 2. it doesn't require you to specify a formatting if you want to just
> log a bunch of data
> 3. it makes it easy to specify a system (both manually and based on
> the module your code is in)
>
>
> First off, thanks again for taking the time to correspond about this, I'm
> very happy for this conversation to be attracting attention and making
> progress :).
>
> Moving right along...
>
> This code is relying on the semantics of the existing log.msg() call (which
> are bad, for the reasons I enumerated in the start of this thread), right
> down to the dependence on %-formatting.  The main thing I'm interested in at
> the moment is actually fixing those semantics to be more useful; the
> specific thing we're talking about here, persistent log identifiers, is
> interesting to me at the moment mostly as it relates to that.
>
> I realize you might just be relying on it for familiarity, you might not
> have had time to read the whole new thing, so I'm not trying to tear down
> your idea based on that alone :).

Yeah... I feel like a lot of the things you say in this email aren't
related to the point I was trying to get across, which was the
interface and effect, not the implementation of it.


>
> But Wilfredo already implemented a thing which addresses the
> namespacing-of-log-messaages issue, and it does about 2/3 of what your
> proposal already does (except it has, like, tests and stuff).  Right down to
> the _getframe() hack (although
> 
> uses inspect.currentframe() which has a public-sounding name at least :)).
> As I mentioned in my immediately-previous reply, the logger's 'namespace'
> does not need to be the same as your module name, so it can be a "more
> persistent" identifier if you have one.
>
> More significantly, you're actually going to the trouble of constructing a
> format string.  This produces output in an ad-hoc format: '='-separated
> str()s of your log keys (although only the ones you pass directly in your
> message; those annotating or augmenting the message at subsequent points by
> other publishers / observers are ignored), which you are presumably relying
> upon being smooshed into a plain-text log file, right along side many other
> messages which do not fit this format.  Presumably this could lead to issues
> like traceback lines with '=' assignments in them showing up in whatever
> kind of ad-hoc log-analysis you want to do later, and ambiguous unparseable
> results if any fields in the log message have a '=' and a space in their
> repr.
>
> Let's not make it the responsibility of the emitter, or the format string,
> to get those values into a text log file that can be scanned with a regex
> (albeit in the case of your proposal, a much-simplified regex).  Let's push
> that responsibility down into the log observer, and actually save it as
> structured data and *parse* it later.
>
> As all the hip young kids will tell you, this type of data should be
> persisted with JSON, but we should probably have code to support multiple
> formats.  Another advantage of serializing and parsing the logs as
> structured data would allow tools written to work on a "live" log observer
> to also analyze persistent logs in at least a similar way, if not entirely
> the same way.
>
> (An object that has survived the grim ordeal of translation to and from JSON
> will surely be made less by the process, and can't be treated quite the same
> way by a log observer; but, as I'm sure you'll agree, this is probably
> better than the alternative of Pickle.  An unpickled object will be brought
> fully back to life, but, in the words of the recent NIN release, it will
> have "came back haunted".)
>
> Maybe we could even do a SQLite observer.  Honestly I'm not sure which
> format is best.

So, I responded to Hynek about this confusion already, so I'll let you
read that. I have always assumed that twistd.log would continue to be
vaguely human-readable, with special formatting for several of the
keys. I also assume that developers will be able to hook up additional
observers to save out .concatenated-json files, or stream to some
network protocol, or whatever. The little bit of formatting my example
does is just a minor convenience for the people still logging at
twistd.log files. The arguments you're making now sounds ilke you want
to et rid of that ad-hoc formatting; is that true? So, instead of


2013-06-15 10:24:21-0500 [-] Server Shut Do

Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Thu, Jun 13, 2013 at 5:14 PM, Glyph  wrote:
> Hello everybody. Today I'd like to talk about everyone's favorite subsystem
> within Twisted: logging.
>
> There are a number of problems with Twisted's logging system.
>
> The general theme of these problems is that Twisted log system was designed
> with some intended use-cases in mind, but its implementation was stopped
> halfway through and none of those use-cases were really well satisfied.
>
> At various conferences, and in person at various times, we've gotten
> together to express frustration at the logging system, but I don't think
> that the problems with it have ever been really thoroughly written down.
>
> Wilfredo has been working on logging here:
> http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py.
> This began as an attempt to just address some issues with our own logging
> wrappers, but has been growing into a design that is more fully capable of
> replacing logging within Twisted wholesale. I hope one of us can contribute
> it to Twisted soon, but in the meanwhile, please feel free to provide
> feedback - hopefully if some people can provide a bit of informal feedback
> now, the formal review process can be shorter and more pleasant :).

I like it. Let's merge it.


--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Sat, Jun 15, 2013 at 11:09 AM, Christopher Armstrong
 wrote:
> On Thu, Jun 13, 2013 at 5:14 PM, Glyph  wrote:
>> Hello everybody. Today I'd like to talk about everyone's favorite subsystem
>> within Twisted: logging.
>>
>> There are a number of problems with Twisted's logging system.
>>
>> The general theme of these problems is that Twisted log system was designed
>> with some intended use-cases in mind, but its implementation was stopped
>> halfway through and none of those use-cases were really well satisfied.
>>
>> At various conferences, and in person at various times, we've gotten
>> together to express frustration at the logging system, but I don't think
>> that the problems with it have ever been really thoroughly written down.
>>
>> Wilfredo has been working on logging here:
>> http://trac.calendarserver.org/browser/CalendarServer/trunk/twext/python/log.py.
>> This began as an attempt to just address some issues with our own logging
>> wrappers, but has been growing into a design that is more fully capable of
>> replacing logging within Twisted wholesale. I hope one of us can contribute
>> it to Twisted soon, but in the meanwhile, please feel free to provide
>> feedback - hopefully if some people can provide a bit of informal feedback
>> now, the formal review process can be shorter and more pleasant :).
>
> I like it. Let's merge it.


I think I would just want one more feature:

>>> Logger().emit(dudelog.LogLevel.warn, a="hi")

Right now, this doesn't spit out anything into my log file, with
default observers. I want it to emit some simple representation of the
keys and values.

On the other hand, the same is true of log.msg(a="hi"), but maybe this
should be implemented (somehow) in Logger() codepath to avoid breaking
compatibility?


--
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 15, 2013, at 5:28 AM, Phil Mayers  wrote:

> On 06/15/2013 12:13 PM, Glyph wrote:
> 
>> I am really, really puzzled by this reaction.  I am wondering if you
>> read my message carefully, or if I didn't express myself well.
> 
> Careful re-reading of the very last bit of your message suggests I may have 
> misunderstood.

OK, whew :).

> I think I understand the "final" stage, and in that situation the UUID is 
> invisible, correct? It's hidden behind the declaration of a "log event" 
> object which can be called to emit or observe said events.

That's right.

> That seems fine, though I'm not sure what the UUID *does* in that situation - 
> route/match is via python object access, no?

What it does is it allows *older* monitoring scripts to work.  It also holds on 
to the UUID internally so that if, for example, the module's name is changed in 
the future, the API name can be re-named and code can be pointed at it via the 
normal deprecate/redirect mechanism.  (Of course, _any_ sort of explicit / 
unique identifier would work for this latter use-case; it's just that this one 
has the benefit of not visibly containing the string that has now been changed, 
and so there's no long-term impulse to "clean it up" further and thus break 
stuff.)

> I *think* I now understand the intermediate stage, where the log events are 
> emitted by old code, and observed by UUID. You're suggesting calculating the 
> UUID from the module name and static data (format string). I guess that's no 
> worse than any other solution - until the log emitter is converted to a 
> newer/better API, there's no great way to observe it.

> Before we proceed, can you confirm I've understood your proposal correctly?

Sounds like you've got it just about right now.

-g___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 15, 2013, at 8:33 AM, Christopher Armstrong  
wrote:

> It sounds like you're arguing that the human-readable *.log format
> should be closer to the simple key/value representation that we use
> underneath.
> 
> Would you also argue that instead of having a log line that looks like:
> 
> 2013-06-15 10:24:21-0500 [-] Server Shut Down.
> 
> We should actually format them (in twistd.log) like this?
> 
> time=1371331461.0 system='-' msg='Server Shut Down.'


I'm actually arguing that, in the log *file* it should look like this:

{"log_format": "{service} service shut down.", "log_system": "-", "log_time": 
1371331461.0, "service": "dns"}

When a human actually reads it (with, as a straw man, 'python -m 
twisted.python.log read twistd.log.json'), I'd rather it still read somewhat 
like the above starting '2013...': except, for example, with the time 
translated into their local timezone (or the timezone of the computer where 
they're reading the logs, at least).

But I'm not saying that we should change the default twistd.log format for now; 
I'm sure there's tons of tooling built up around the ad-hoc text-based nature 
of it as it is and I wouldn't want to break that.  I'm just saying that we 
should add additional options for logging to things other than plain text 
files, and those are the things that we should build tools to get information 
back out of.

-glyph___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 15, 2013, at 8:46 AM, Christopher Armstrong  
wrote:

> Yeah... I feel like a lot of the things you say in this email aren't
> related to the point I was trying to get across, which was the
> interface and effect, not the implementation of it.

Even if it's not really what you'd intended, I think it's pretty good to keep 
discussing these issues (avoiding unnecessary formatting, putting the right 
responsibilities in the right place).

> I also assume that developers will be able to hook up additional
> observers to save out .concatenated-json files, or stream to some
> network protocol, or whatever. 

To reiterate a little from my previous reply, I think we should be, as much as 
possible, moving to make that the default.  Now, actually making it the default 
is a pretty disruptive change, but I think we could get as far as "implement 
the feature at all" before anyone would start complaining ;).

In other words, every developer should not have to realize they should write a 
JSON(ish) emitter and figure out how to get that to work on their own; we 
should just provide one that does the right thing.

To dial this back to what we actually need to talk about in the proposed log 
system, the issue is mostly to do with 

>> I can implement the UUID thing as a completely separate ticket and we can
>> discuss it on its own merits; meanwhile, you can happily add log IDs to
>> everything in your applications.  I'd be quite happy to table this part of
>> the conversation for a separate ticket; I just thought it would be a nice
>> thing that could fall out of some of the other parts of a new logging
>> system.
> 
> Yeah, I'm sorry that this has gotten so much scope-creep.

No problem.  This is all good stuff to talk about; we don't discuss operational 
concerns of managing Twisted services (either server *or* client side) nearly 
often enough on this list.  I wish we had more discussions like this!

> I'll reiterate that I think redoing what I've been speaking about as the
> "system" key is most important (but maybe it should be called
> something other than "system", since you've clarified what its intent
> was in another email).


Well, "system" was a pretty horrible name for the thing satisfying that intent. 
 We have "log_namespace" in the new logging system, and I think that (in 
combination with an idiomatic "log_id", that only needs to be unique within 
that namespace) you could get more or less what you want.

I think we're going to drop the "system" key entirely (except in the contexts 
where it's required for compatibility, of course).  Jean-Paul's recent comment 
in the quote file is apropos.

-glyph

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Glyph

On Jun 15, 2013, at 9:17 AM, Christopher Armstrong  
wrote:

> I think I would just want one more feature:
> 
 Logger().emit(dudelog.LogLevel.warn, a="hi")
> 
> Right now, this doesn't spit out anything into my log file, with
> default observers. I want it to emit some simple representation of the
> keys and values.
> 
> On the other hand, the same is true of log.msg(a="hi"), but maybe this
> should be implemented (somehow) in Logger() codepath to avoid breaking
> compatibility?

I keep going back and forth on this.

On the one hand, the main thing I want to do is to just say "if you want your 
message to appear in the log file, let's just implement a structured log file 
and you can read it from there.  If you have no human-readable format string to 
explain your message, clearly you didn't intend for a human to read it."

On the other hand, we already implement log filtering, so filtering the message 
out of the log based on the absence of the format string is a little 
overloaded.  And we should really be encouraging people to populate messages 
with useful structured data.

(I was going to say that without a format string I couldn't have my stand-in 
UUID feature, but then I realized that namespace + set of keys is probably good 
enough to generate that too, so never mind.  Also it seems like I'm the only 
one who likes that feature so maybe it doesn't matter!)

How would you feel about a special token that you have to pass explicitly?

Logger().warn(ALL_KEYS, a="hi")

-glyph

___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python


Re: [Twisted-Python] logging

2013-06-15 Thread Christopher Armstrong
On Sat, Jun 15, 2013 at 9:38 PM, Glyph  wrote:

>
> On Jun 15, 2013, at 9:17 AM, Christopher Armstrong <
> ra...@twistedmatrix.com> wrote:
>
> I think I would just want one more feature:
>
> Logger().emit(dudelog.LogLevel.warn, a="hi")
>
>
> Right now, this doesn't spit out anything into my log file, with
> default observers. I want it to emit some simple representation of the
> keys and values.
>
> On the other hand, the same is true of log.msg(a="hi"), but maybe this
> should be implemented (somehow) in Logger() codepath to avoid breaking
> compatibility?
>
>
> I keep going back and forth on this.
>
> On the one hand, the main thing I want to do is to just say "if you want
> your message to appear in the log file, let's just implement a structured
> log file and you can read it from there.  If you have no human-readable
> format string to explain your message, clearly you didn't intend for a
> human to read it."
>
> On the other hand, we already implement log filtering, so filtering the
> message out of the log based on the absence of the format string is a
> little overloaded.  And we should really be encouraging people to populate
> messages with useful structured data.
>
> (I was going to say that without a format string I couldn't have my
> stand-in UUID feature, but then I realized that namespace + set of keys is
> probably good enough to generate that too, so never mind.  Also it seems
> like I'm the only one who likes that feature so maybe it doesn't matter!)
>
> How would you feel about a special token that you have to pass explicitly?
>
> Logger().warn(ALL_KEYS, a="hi")
>
>
I'm not wild about the need for an extra constant I'd have to import, but
I'd like to understand why you think the arbitrary keys shouldn't be
included in the message. Can you elaborate on why?


-- 
Christopher Armstrong
http://radix.twistedmatrix.com/
http://planet-if.com/
___
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python