On Wed, Nov 23, 2011 at 7:19 AM, Peter Geoghegan <pe...@2ndquadrant.com> wrote: > On 23 November 2011 02:49, Robert Haas <robertmh...@gmail.com> wrote: >> There is no sort of systematic labeling of error messages in the log >> to enable the DBA to figure out that the first error message is likely >> nothing more serious than an integrity constraint doing its bit to >> preserve data integrity, while the second is likely a sign of >> impending disaster. > > +1 > > I suggested that there be an INTERNAL_ERROR severity level before on > this list, in response to an opaque internal error that was raised in > the planner due to a bug in master (it was a simple elog() call that > raised the error), and the idea was not well received. Tom said that > "Well, the SQLSTATE for this sort of thing is already > ERRCODE_INTERNAL_ERROR". A quick search of that shows that it only > appears in the following places:
I mostly agree, but I don't think "internal error" is aiming at quite the right target. For one thing, one big cause of concern is when the database observes difficulty accessing the underlying storage. Those errors are not "internal" to the database at all - they are coming from the operating system. IME, users often don't realize what those messages mean. So even aside from the difficulty of systematic log filtering, it's easy for an inexperienced DBA to read a message complaining about inability to write a block or flush XLOG and think "oh, what a buggy piece of software PostgreSQL is" or perhaps "I wonder why it's having so much trouble performing that operation?". What we want them to think is "oh crap! my disk is dying". So I would propose to steer clear of the word "internal", because the really scary errors typically are not internal to PostgreSQL at all. What I think we want to distinguish between is things that are PEBKAC/GIGO, and everything else. In other words, if a particular error message can be caused by typing something stupid, unexpected, erroneous, or whatever into psql, it's just an error. But if no input, however misguided, should ever cause that symptom, then it's, I don't know what the terminology should be, say, a "severe error". So, for example, these would all be severe errors: cannot commit a transaction that deleted files but has no xid StartTransactionCommand: unexpected state %s could not create file "%s": %m xlog flush request %X/%X is not satisfied --- flushed only to %X/%X All of these are situations that no SQL command should ever be able to manufacture. Either PostgreSQL has a bug, or the hardware is dying, or at a minimum there is a disk full, something that the DBA will certainly want to know about sooner rather than later. The failure of the those transaction is not the user's "fault"; some external circumstance has intervened. Now, in some environments, it may be that things like unique key violations are worrisome, because they may indicate *application* bugs. So it would still be up to the DBA to provide monitoring for those conditions as needed. But at least grepping the logs for severe errors would provide an easy way for DBAs to know whether the database believes itself to be sick. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs