On Sun, Feb 13, 2011 at 01:37:29PM -0500, Todd Willey wrote:
> On Wed, Feb 9, 2011 at 1:38 PM, Eric Day <e...@oddments.org> wrote:
> > * Firehose - So far we've not discussed this too much, but I
> > think when we did there was agreement that we need it. As more
> > services come into the picture, we want the ability to combine and
> > multiplex our logs, events, billing information, etc. so we can
> > report per account, per service, and so forth. For example, as
> > a user, I want to be able to see the logs or billing events with
> > all entries from all my services (or filter by service), but as a
> > sysadmin I may want to view per service, or per zone. We may have
> > registered handlers to grab certain events for PuSH notifications
> > too. To maintain maximum flexibility across deployments we need to
> > keep the interface generic, the payload can be a JSON object or some
> > more efficient serialized message (this can be pluggable). The only
> > required fields are probably:
> >
> > <timestamp> <service> <account_id> <blob>
> >
> > Where <blob> is a list of key/value pairs that handlers can
> > perform routing and processing on. For a logging event, blob
> > may be "priority=ERROR, message=oops!" or "priority=information,
> > message=instance X launched". We can keep things really simple and
> > flexible, relying on a set of documented common attributes that
> > common event producers, routers, and handlers can key in on.
>
> Regarding the Firehose, I'd suggest we look over
> http://wiki.openstack.org/AuditLogging to see what we want to change.
> That page doesn't talk about centralization or aggregation, so those
> still need to be thought out.

Thanks, I wasn't aware of this blueprint. As you mention, it's more
about nova-specific logging than aggregation. While many of its ideas
can be used for other services, I'm thinking more about the layer
above it. The AuditLogging ideas help provide requirements though.

> I like the idea of using JSON and
> including some tools to work with it. I'd suggest that we'd add a bit
> more info into the log messages, so that we have more fields to filter
> by. This should include the logger name, level, deep context, and the
> "extra" kwarg of the call to the log function (which we pack with the
> environment when handling exceptions). All of these fields are
> available to each call to a logging method (though context and extra
> may be None). I don't see any reason for making messages look stupid
> (header + blob) when we're going to be parsing them as JSON anyway and
> can build better filters without having to parse out the "blob" field.

When I mentioned "blob", what I really meant was optional or
service-specific fields. I want to be clear about which fields are
required vs. optional across all services, so we don't want to
require something Nova-specific. There can certainly be contextual
requirements, such as 'if service==nova, fields X,Y,Z are also
required'.

> A resulting message may look like this:
>
> { "timestamp": "2011-02-13T17:50:11Z",
>   "service": "nova-compute",
>   "logger": "nova.virt.libvirt_conn",
>   "level": "DEBUG",
>   "message": "instance i-00000001: rebooted",
>   "context": { "request": "XXXXXX", "user": "u", "project": "p",
>                "admin": "0", "elevated": "1" },
>   "extra": "" }

Looks good, but perhaps we should split "service" into two, such as
service=nova and nova-service=compute. I suppose we could also do
prefix matching for the top-level service, but I want to keep routing
of messages very simple.

> I've changed context to note the original admin state vs. the current
> state if elevated, which I think is a good idea, but only tangential
> to this conversation.

As long as context is project agnostic. If not, I would also include
the non-project-specific context (the account ID I mentioned above).
This may overlap with context, but context looks nova-specific right
now with request. We want a common ID that a deployment could use to
match up between swift, nova, and so on to allow for easy routing and
aggregation per account. I guess we first need to resolve how
accounts/users/projects/whatever will look and interact with a common
auth service. :)

> I also vote for an "ACCOUNTING" log level that functions similarly to
> how we added an "AUDIT" level that only carries information relevant
> to billing.

Perhaps the top-level envelope should require a 'type', which can be
log/audit/accounting/...

> As far as aggregation of logs, I think the best thing would be to have a
> sweeper running on each machine that is generating logs, and at a
> specified interval (hourly?) move all messages generated for that
> interval into swift. We can make a container for each interval, and
> populate it with files named by hostname. Then you can just grab all
> the files out of a container and process them however you want (maybe
> even combining them, decorating the json with internal account
> identifiers, and adding the combined log file back into the swift
> container).

Instead of a sweeper, what I was proposing was real-time pushing of
messages into the firehose service (essentially a queue). Workers can
then pull from this queue to route, aggregate, and do as they wish.
For example, one worker listening for nova events could collect
messages and push them to a swift container every hour, as you
suggest. We want the ability to access these events in real time,
allowing both internal and external consumers to tap into this as
they see fit. For example, we can expose PuSH, HTTP long-poll, or
batched short-poll interfaces for the firehose, and users can
subscribe with their account ID, optionally filtering by service,
type, etc.
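
The push/pull flow described above can be sketched roughly in Python.
This is only an illustration under assumptions, not a proposed
implementation: the names Firehose, Subscription, make_event, and
route_pending are hypothetical, the 'type' field is the one floated
above rather than anything agreed on, and an in-process queue.Queue
stands in for whatever real queue service ends up being used.

```python
import json
import queue
from datetime import datetime, timezone

# Assumed required envelope fields; everything else is optional or
# service-specific (the "blob" from earlier in the thread).
REQUIRED_FIELDS = ("timestamp", "type", "service", "account_id")


def make_event(event_type, service, account_id, **extra):
    """Build an envelope: optional fields first, required fields win."""
    event = dict(extra)
    event.update({
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "type": event_type,
        "service": service,
        "account_id": account_id,
    })
    return event


class Subscription:
    """A consumer keyed on account ID, optionally filtered by service/type."""

    def __init__(self, account_id, service=None, event_type=None):
        self.account_id = account_id
        self.service = service
        self.event_type = event_type
        self.inbox = queue.Queue()

    def matches(self, event):
        if event["account_id"] != self.account_id:
            return False
        if self.service is not None and event["service"] != self.service:
            return False
        if self.event_type is not None and event["type"] != self.event_type:
            return False
        return True


class Firehose:
    """Producers push serialized events; a worker routes them to subscribers."""

    def __init__(self):
        self._queue = queue.Queue()
        self._subscriptions = []

    def push(self, event):
        missing = [f for f in REQUIRED_FIELDS if f not in event]
        if missing:
            raise ValueError("missing required fields: %s" % ", ".join(missing))
        # JSON here, but the serialization could be pluggable.
        self._queue.put(json.dumps(event))

    def subscribe(self, account_id, service=None, event_type=None):
        sub = Subscription(account_id, service, event_type)
        self._subscriptions.append(sub)
        return sub

    def route_pending(self):
        """One worker pass: drain the queue, return the delivery count."""
        delivered = 0
        while True:
            try:
                raw = self._queue.get_nowait()
            except queue.Empty:
                return delivered
            event = json.loads(raw)
            for sub in self._subscriptions:
                if sub.matches(event):
                    sub.inbox.put(event)
                    delivered += 1
```

A worker that archives to swift every hour would just be another
subscriber that batches whatever lands in its inbox; routing only ever
looks at the required envelope fields, which keeps it simple.
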
Imagine having a real-time 'tail -f' tool showing messages for your
entire public cloud account. :)

> I think we still need to have the ability to log to syslog as well.
> We should probably just keep the same formatter we have for stderr and
> --syslog, and have a json formatter for the handler installed by the
> --logfile flag. It is easy enough to just add a new handler with a
> different formatter in a way that doesn't break what we have already.

Agreed, and this is where logging plugins come into play for each
service. You should be able to enable both syslog and a firehose
plugin in nova, so every message goes to both.

> At least that is how I think of tackling that problem, but feedback is
> always appreciated (especially since we haven't really talked about
> log aggregation yet). I'm willing to step up and implement lots of
> these features, since I've already got a pretty good handle on the
> logging.

I'm really interested in working on the aggregation services that
nova and others can leverage, so let's continue to get the API/message
format defined and the appropriate consumer interfaces exposed.

-Eric

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp