[BUGS] Strange AFTER UPDATE trigger behavior

2004-07-29 Thread Mason


POSTGRESQL BUG REPORT TEMPLATE



Your name : mason
Your email address : mason (at) vanten.com

System Configuration
-
Architecture (example: Intel Pentium) : AMD, multi proc, not sure exactly

Operating System (example: Linux 2.0.26 ELF) : NetBSD

PostgreSQL version (example: PostgreSQL-7.3.4): PostgreSQL-7.3.4


Please enter a FULL description of your problem:
 

The following scenario is the expected behavior:

A function performs two operations:

1) updates a column on a table, causing a trigger to fire which inserts a
row into a second table.
2) The original function then updates that newly inserted row with some
additional data.

Actual behavior:

If the trigger is placed BEFORE UPDATE on the table with the changing
column, this works as planned.
If the trigger is instead placed AFTER UPDATE on the table, then the function
cannot find the newly inserted row to update after the trigger has fired,
even though the row is very clearly in the table.


Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
-- 

To see the two types of behavior, just swap the word AFTER for BEFORE and
run again.

CREATE TABLE foo (
  id    serial PRIMARY KEY
) WITHOUT OIDS;

CREATE TABLE bar (
  id    serial PRIMARY KEY REFERENCES foo,
  data  text
) WITHOUT OIDS;

CREATE OR REPLACE FUNCTION insert_rec()
RETURNS TRIGGER
AS '
BEGIN
INSERT INTO bar
 VALUES (NEW.id, ''text'');
RETURN NEW;
END;
' LANGUAGE 'plpgsql' SECURITY DEFINER;

CREATE TRIGGER foo_trigger
AFTER UPDATE ON foo FOR EACH ROW
EXECUTE PROCEDURE insert_rec();

INSERT INTO foo VALUES (1);
INSERT INTO foo VALUES (2);
INSERT INTO foo VALUES (3);
INSERT INTO foo VALUES (4);

CREATE OR REPLACE FUNCTION test()
RETURNS boolean
AS '
BEGIN
UPDATE foo SET id = 10 WHERE id = 1;
UPDATE bar SET data = ''changed'' WHERE id = 10;
UPDATE foo SET id = 20 WHERE id = 2;
UPDATE bar SET data = ''changed'' WHERE id = 20;
RETURN true;
END;
' LANGUAGE 'plpgsql' SECURITY DEFINER;

SELECT test();

SELECT * FROM foo;
SELECT * FROM bar;

DROP TABLE foo CASCADE;
DROP TABLE bar CASCADE;


If you know how this problem might be fixed, list the solution below:
- 

??

-- 
Mason Glaves, Senior Programmer, Vanten K.K.
[EMAIL PROTECTED] Tel: 03-5919-0266
http://www.vanten.com   Fax: 03-5919-0267






Re: [BUGS] BUG #5015: MySQL migration wizard does not start

2009-08-26 Thread Sam Mason
On Wed, Aug 26, 2009 at 02:55:37PM +, Ken Smith wrote:
> I also noted
> that if I try to run from the commandline that you use '/' in some of the
> program paths causing the OS to say it cannot find the file - "C:/Program
> Files/Java/jdk1.5.0_16/jre/bin/java.exe".

I'm pretty sure that Windows has known about forward slashes being path
delimiters since Windows 95 and the start of Win32.  I know I normally
use forward slashes in XP when I'm at work!

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] inconsistent composite type null handling in plpgsql out variable

2009-08-31 Thread Sam Mason
On Fri, Aug 28, 2009 at 02:06:02PM -0400, Merlin Moncure wrote:
> 3) If we decide the sql standard is correct, so that (null, null) is
> null == true, then we should observe rule 1 and make things work in
> consistent way.  This means, for example, that null::foo and (null,
> null)::foo should not be distinct.

The more awkward case (to me anyway) is that the standard says (1,NULL)
IS NULL should evaluate to TRUE.

I'd never noticed the ROW / RECORD dichotomy before; could one of these
be made SQL compatible and the other use more sane semantics?

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] inconsistent composite type null handling in plpgsql out variable

2009-09-01 Thread Sam Mason
On Mon, Aug 31, 2009 at 07:26:59PM +0200, Pavel Stehule wrote:
> 2009/8/31 Sam Mason :
> > The more awkward case (to me anyway) is that the standard says (1,NULL)
> > IS NULL should evaluate to TRUE.
> 
> what?
> 
> only (NULL, NULL) IS NULL is true

Bah, sorry, you're right!  I was rattling my favorite tin and getting
mixed up with the behavior with IS NOT NULL, the negation of which
would say this row is null.  I.e:

  SELECT NOT (1,NULL) IS NOT NULL;

evaluates to TRUE.  I think the consensus is that we should continue to
follow the spec on this, but I was getting confused as to which operator
hides the EXISTS and which the FORALL.  I.e. a value "v" IS NULL iff all
elements of "v" are 'the null value', whereas "v" IS NOT NULL iff all
elements of "v" are not 'the null value' (so its negation holds iff some
element is 'the null value').
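
For illustration, a quick check of those rules (results as I read the spec,
and as far as I know what PG does today):

  SELECT (NULL, NULL)  IS NULL,        -- true:  every field is null
         (1, NULL)     IS NULL,        -- false: not every field is null
         (1, NULL)     IS NOT NULL,    -- false: not every field is non-null
         NOT (1, NULL) IS NOT NULL;    -- true:  at least one field is null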

> p.s. what isn't consistent (maybe - there are more possible
> interpretations) is
> 
> (NULL, NULL) IS DISTINCT FROM NULL is true

Yup, I'd agree with Merlin that a ROW consisting entirely of 'null
values' should itself be 'the null value' (to use the terminology from
the copy of the SQL spec I'm reading).  I think this should also work
recursively:

  SELECT ROW(ROW(NULL)) IS DISTINCT FROM NULL;

should return FALSE, in my understanding.

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is "char"

2009-09-01 Thread Sam Mason
On Tue, Sep 01, 2009 at 04:36:25PM +, Joseph Shraibman wrote:
> Description:CASE returns ELSE value always when type is "char"

I think it's just silently truncating the literal to a single character.

> [local]:playpen=> select c.relkind, CASE c.relkind WHEN 'r' THEN 'table'
> WHEN 'v' THEN 'view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's'
> THEN 'special' WHEN 'c' THEN 'composite type' WHEN 't' THEN 'toast' ELSE
> c.relkind
> playpen->  END from (select 'r'::"char" AS relkind) c;
>  relkind | relkind 
> -+-
>  r   | t

Here, 'r' maps to the "char" literal 'table', which PG interprets as the
value 't'--i.e. PG silently chops off the 'able'.  The bug would seem to
be in your code, but PG could maybe throw an error to tell you this is
what is happening?

A possible fix would be to have your ELSE clause as:

  c.relkind::text

That way the other branches would be interpreted as text and
wouldn't get chopped off along the way.
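
A cut-down sketch of the query with that change (only a couple of the WHEN
branches kept, so this is illustrative rather than a drop-in replacement):

  SELECT c.relkind,
         CASE c.relkind
           WHEN 'r' THEN 'table'
           WHEN 'v' THEN 'view'
           ELSE c.relkind::text
         END
  FROM (SELECT 'r'::"char" AS relkind) c;
  -- the ELSE branch is text, so the whole CASE resolves to text and
  -- 'table' comes back untruncated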

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is "char"

2009-09-02 Thread Sam Mason
On Tue, Sep 01, 2009 at 05:49:25PM +0100, Sam Mason wrote:
> PG could maybe throw an error to tell you this is
> what is happening?

Would something like the included patch be accepted?

-- 
  Sam  http://samason.me.uk/
*** src/backend/utils/adt/char.c~	2009-01-01 17:23:49.0 +
--- src/backend/utils/adt/char.c	2009-09-02 10:11:13.0 +0100
***************
*** 34,39 ****
--- 34,45 ----
  {
  	char	   *ch = PG_GETARG_CSTRING(0);
  
+ 	if (ch[1])
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+  errmsg("only a single character is supported in \"char\" literals: \"%s\"",
+ 		ch)));
+ 
  	PG_RETURN_CHAR(ch[0]);
  }
  



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 03:50:05PM +0100, Greg Stark wrote:
> Perhaps we should stop thinking of "unknown" as, er, "unknown" and
> think of it as "text literal". A text literal has implicit casts to
> every data type but a normal text string has to be explicitly cast.

How does that help things?  You seem to be keeping the semantics and
only changing the name, when it's the semantics that you seem to be
complaining about.

I'm pretty sure it's correct to treat it as "unknown"; other
type-systems do this and it all works well.  The thing that makes things
ambiguous is the ad-hoc polymorphism that exists in function calls
and operators.  With PG's type system you know almost nothing about
the types involved in an arbitrary expression.  Operators are better
than functions (given the types of the arguments you know the return
type; with default parameters even that knowledge doesn't exist for
functions), but they still leave things far too open to have any rigor
without spelling out types in full everywhere.  For example:

  CREATE FUNCTION add(int,int) RETURNS int LANGUAGE sql
 AS $$ SELECT $1 + $2; $$;

  CREATE FUNCTION add(int,int,int DEFAULT NULL) RETURNS text LANGUAGE sql
AS $$ SELECT ($1 + $2)::text; $$;

What type should it attribute to the result of:

  SELECT add(1,2);

In fact it doesn't seem to want to play ball at all.  Even given the
apparently unambiguous:

  SELECT 1+add(1,2);
or
  SELECT 'hi'||add(1,2);

It doesn't get anywhere.  No need for "text 'hi'" in the second one
because || isn't defined for values of integer type.

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 12:36:00PM -0400, Robert Haas wrote:
> On Wed, Sep 2, 2009 at 11:55 AM, Sam Mason wrote:
> > In fact it doesn't seem to want to play ball at all.  Even given the
> > apparently unambiguous:
> >
> >   SELECT 1+add(1,2);
> > or
> >   SELECT 'hi'||add(1,2);
> >
> > It doesn't get anywhere.  No need for "text 'hi'" in the second one
> > because || isn't defined for values of integer type.
> 
> Right.  This is exactly the sort of thing that languages with real
> type inference have no problem handling.  Of course, ML for example
> doesn't allow overloading precisely because (AIUI) it makes type
> inference difficult.  It would be awesome if we could make this work
> though.

Difficult, but not intractable.  Haskell has done this sort of thing for
quite a while, although it handles ad-hoc polymorphism differently than
PG does.  You basically end up saying how much polymorphism you want
to allow for each function; for example, in PG the return type of an
operator is determined exactly by the types of its arguments.  In Haskell
you would have the power to say, if you so wanted, that the type of an
operator's RHS is determined exactly by the type of its LHS and return
type, or even, in the most general case, that it's parametrized over all
three types.  Obviously the more types you leave free, the more typing
you have to do specifying all the types, as the type inference has less
leverage to work with.


I've been trying to think about how to apply a more modern type system
to PG for a while, and hence my comments about how things like NULL
rows should be handled are based on this and may come across as rather
dogmatic sometimes.  It's about the only way I can get things to hold
together without introducing much more complexity than seems strictly
necessary.

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 01:19:07PM -0400, Tom Lane wrote:
> Sam Mason  writes:
> > ... For example:
> 
> >   CREATE FUNCTION add(int,int) RETURNS int LANGUAGE sql
> >  AS $$ SELECT $1 + $2; $$;
> 
> >   CREATE FUNCTION add(int,int,int DEFAULT NULL) RETURNS text LANGUAGE sql
> > AS $$ SELECT ($1 + $2)::text; $$;
> 
> > What type should it attribute to the result of:
> 
> >   SELECT add(1,2);
> 
> > In fact it doesn't seem to want to play ball at all.  Even given the
> > apparently unambiguous:
> 
> >   SELECT 1+add(1,2);
> > or
> >   SELECT 'hi'||add(1,2);
> 
> > It doesn't get anywhere.
> 
> Well, no, because our type resolution is bottom-up; it does not consider
> context when trying to resolve the overloaded "add()" function
> reference.  "Unknown" is the only part of the system that allows for any
> delay at all in identifying the type of a construct, and even that is
> limited to a literal and its first-level surrounding context.

OK, I got distracted and my example was bad.

> It's interesting that you want to go in 100% the opposite direction from
> Kevin, who seems to want to eliminate type inference altogether.  Maybe
> our current compromise isn't too bad, if it makes everybody unhappy in
> opposite directions ;-)

The current compromise has worked for a while, so there's no immediate
reason to change it.  I wasn't interpreting Kevin's request directly,
as I don't think he means quite what he's saying; what he really wants
is an unambiguous way of writing literals in queries.  I think it's
easier to get that without specifying types everywhere and introducing
constraints elsewhere to maintain what compatibility we've got with the
SQL standard.

If we did follow Kevin's request directly, should we also be specifying
the type of NULL?

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 12:54:03PM -0500, Kevin Grittner wrote:
> Sam Mason  wrote:
> > If we did follow Kevin's request directly, should we also be
> > specifying the type of NULL?
>  
> I don't *think* the SQL standard requires that, and barring that I
> don't see any compelling reason to type NULL.

The SQL standard certainly doesn't require it.  It's just that you were
requiring literals that happen to be enclosed in quotes to have their
types ascribed, so why not the NULL literal?

> One problem I do see
> with the current scheme, however, is that NULL *does* get typed to
> text when it makes no sense.  In my view, a CASE expression which has
> only NULL for its return values, or an abbreviated form of CASE, such
> as COALESCE or NULLIF, should be evaluated exactly the same as if they
> were replaced by NULL itself.  For example, COALESCE(NULL, NULL)
> currently yields NULL::text.  In my view that's wrong.  I view it as a
> bug, but that seems to be a hard sell here.

Yes, that's because PG does a bottom-up solve of the type constraints.
I think it should really result in an unknown type as well.

> Likewise, I think that in the query which started this thread, the
> cast to "char" is not sensible.  I'm not sure how that could be
> resolved, but it doesn't seem correct to me.

All branches unify with the "char" type (i.e. they're all "char" or
unknown) and hence the result of the expression is determined to be of
type "char".

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 01:27:35PM -0500, Kevin Grittner wrote:
> Sam Mason  wrote: 
> > you were requiring the types of literals that happened to be
> > enclosed in quotes to have their type ascribed, so why not the NULL
> > literal?
>  
> Well, unless things have changed in recent versions of the standard
> and I've missed the change, a series of characters enclosed in
> apostrophes is what the standard calls a "character string literal"
> and defines it to be be related to character based types such as
> varchar.  As far as I'm aware, considering it to be undefined is a
> PostgreSQL extension.  If you can point to something in the standard
> to show where I'm mistaken, I'll look it over.  I'll go looking for
> something to back my memories on the topic, too, since my memory seems
> to be less reliable than it once was.

Sorry, I was referring to your explicit naming of types as in the
following:

  
http://archives.postgresql.org/message-id/4a9e337802250002a...@gw.wicourts.gov

Reading it back again, I'm not sure if that's what you meant now.  Email
is hard work, isn't it!

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 01:55:28PM -0500, Kevin Grittner wrote:
> So the behavior of the "char" type is anomalous in this regard?  Other
> character-based types behave like varchar (which has the behavior I
> would expect here)?  That is encouraging.  Why isn't the behavior of
> "char" in this regard considered a bug to be fixed?

I think there are lots of implicit casts going on that muddy the water
with respect to what's going on at a type level.  Things get promoted to
TEXT easily in PG.

I posted a patch in this thread:

  http://archives.postgresql.org/message-id/20090902091654.gl5...@samason.me.uk

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 01:37:20PM -0500, Kevin Grittner wrote:
> That still seems to be the case in the draft of the 2003 standard I
> have:
>  
> <general literal> ::=
>     <character string literal>
>   | <national character string literal>
>   | <Unicode character string literal>
>   | <binary string literal>
>   | <datetime literal>
>   | <interval literal>
>   | <boolean literal>
> <character string literal> ::=
>   [ <introducer> <character set specification> ]
>   <quote> [ <character representation> ... ] <quote>
>   [ { <separator> <quote> [ <character representation> ... ] <quote> }... ]
>  
> The ball's in your court to show something in the standard to say that
> a character string literal is ever *not* to be taken as a character
> string.

Huh, you're right.  I'd always thought '2001-01-01' was a valid date
literal; it seems the standard has required it to be prefixed by DATE at
least as far back as SQL92.
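
I.e. the standard-blessed spelling versus the cast most of us actually
write (both are accepted by PG):

  SELECT DATE '2001-01-01';     -- standard <datetime literal>
  SELECT '2001-01-01'::date;    -- the PostgreSQL-style cast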

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 12:54:03PM -0500, Kevin Grittner wrote:
> Sam Mason  wrote:
> > If we did follow Kevin's request directly, should we also be
> > specifying the type of NULL?
>  
> I don't *think* the SQL standard requires that, and barring that I
> don't see any compelling reason to type NULL.

I've just realized that either I'm missing your point entirely (it's
happened before :) or this ignores the point entirely.  PG wants to
assign types to every expression; whether the expression will evaluate
to a NULL value at run-time or not is immaterial in this regard.  I
think SQL wants to do the same, but I don't have as much conviction as
Tom here.  Once we're ascribing types to expressions, then whether one
happens to contain the literal "1", "'txt'" or "NULL", we're committed to
giving it some type---the only question is which one.  We thus need to
type expressions consisting of just NULL constants.

A fun puzzle to base any inductive solution on is what type to ascribe
to the following:

  CREATE VIEW v (c) AS
SELECT NULL;

PG allows it, but the resulting view seems somewhat unusable.
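
To see which type it picked, something like this works (from memory the
column comes back as "unknown" here, which is what makes the view awkward
to use):

  -- assuming the view v from above has been created
  SELECT atttypid::regtype
  FROM pg_attribute
  WHERE attrelid = 'v'::regclass AND attname = 'c';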

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 02:41:32PM -0500, Kevin Grittner wrote:
> Sam Mason  wrote: 
>  
> > I'd always thought '2001-01-01' was a valid date literal, seems the
> > standard has required it to be prefixed by DATE at least back to
> > SQL92.
>  
> Yep.  I don't know if it would be remotely feasible, but the
> implementation which seems like it would be "standard-safe" but still
> give reasonable concessions to those wanting to skip the extra
> keystrokes of declaring the type of literals which are not character
> based would be to go with the suggestion of having a character string
> literal type, and change the semantics such that if there is a valid
> interpretation of the statement with the character string literal
> taken as text, it should be used; if not, resolve by current "unknown"
> rules. Probably not feasible, but it seems likely it would make
> everyone reasonably happy if it could be done.

Sounds as though that'll introduce more ambiguity into the system than
there is already.

> That leaves the issue of NULL being forced to type text in the absence
> of any type info in CASE, COALESCE, and NULLIF.  If there were a way
> to say that these could return unknown type, that would be solved.
> That doesn't seem as though it would be likely to be massively
> difficult, although I could be wrong about that.

Would be nice and I'd love it to work like this, but it's quite a big
change I think.  Currently, once PG has decided on a type it sticks with
it against all further evidence.  Another example:

  SELECT NULL AS c UNION SELECT '1' UNION SELECT 2;

Once you're doing the above you're into the world of full parametric
polymorphism and you're having to do much more complicated things at the
type level.  When my free time becomes infinite I'll have a chance!
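
To make the "sticks with it" point concrete (the exact error wording may
differ between versions):

  SELECT NULL AS c UNION SELECT '1';                  -- fine: c resolves to text
  SELECT NULL AS c UNION SELECT '1' UNION SELECT 2;   -- fails: the left arm has
                                                      -- already been fixed as
                                                      -- text, which can't then
                                                      -- be matched with integer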

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is"char"

2009-09-02 Thread Sam Mason
On Wed, Sep 02, 2009 at 02:59:54PM -0500, Kevin Grittner wrote:
> Sam Mason  wrote:
> >   CREATE VIEW v (c) AS
> > SELECT NULL;
> > 
> > PG allows it, but the resulting view seems somewhat unusable.
>  
> I'm not sure whether the only place the standard doesn't require a
> cast is on assignment, but this is one place that the standard clearly
> does require a cast, and I'm all for that.

I'm probably missing something obvious again, but where does it say
that?  Bear in mind that my simple NULL could be an arbitrarily complex
expression; I've just chosen a simple NULL for pedagogic reasons.

I can only see a requirement that the chosen type must be compatible.
That seems to leave it open to arbitrarily choosing any type in this
case.

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is "char"

2009-09-04 Thread Sam Mason
On Fri, Sep 04, 2009 at 10:59:48AM -0500, Kevin Grittner wrote:
> Tom Lane  wrote: 
> > I certainly don't want to have "char" emulate the misbegotten
> > decision to have explicit and implicit coercions behave differently.
> > So it looks to me like the argument to make "char" work like char(1)
> > doesn't actually help us much to decide if an error should be thrown
> > here or not.  On the whole, throwing an error seems better from a
> > usability perspective.

I'm all for the error being thrown; no particular feelings about whether
it only happens during "explicit" casts or everywhere.  There's always
the substring() function if the user really wants truncation.

> I feel that the behavior of "char" in at least this case should match
> char(1) (or just plain char):

Hum, I'm not sure if that's useful behavior.  As far as I can tell, you
seem to want the in-memory representations of "string-like types"
to all use the same representation and only use the actual types when
saving to/from disk.  This would give behavior that is consistent with
what you're asking for below.

> test=# select case when true then 'xxx' else 'a'::"char" end from t;
>  case
> --
>  x
> (1 row)

With the patch I gave, or something like it, this would throw an error
because 'xxx' is being used to initialize a value of "char" type.

> test=# select case when true then 'xxx' else 'a'::char(1) end from t;
>  case
> --
>  xxx
> (1 row)

This gives back 'xxx' because the types character and bpchar have an
implicit cast defined between them.  The result is thus of type bpchar
which places no restrictions on the length (that would apply here
anyway), with 'a' being cast from character to bpchar implicitly.

> test=# select case when true then 'xxx' else 'a'::char end from t;
>  case
> --
>  xxx
> (1 row)

This does the same as above.

> Much as the reason for the behavior of "char" may seem clear when
> inside the code looking out, it is astonishing for someone writing
> application code.

I think things would be clearer if an error was thrown in the
constructor of "char" types when the length wasn't correct.  Implicit
casts are muddying the waters as well, which doesn't aid understanding
of what's going on here.

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is "char"

2009-09-04 Thread Sam Mason
On Fri, Sep 04, 2009 at 12:26:19PM -0500, Kevin Grittner wrote:
> Sam Mason  wrote:
> > Kevin Grittner wrote:
> >> test=# select case when true then 'xxx' else 'a'::"char" end from t;
> > 
> > 'xxx' is being used to initialize a value of "char" type.
>  
> As I read the semantics of the CASE predicate, it returns one of the
> given values.  'x' is not one of the given values, regardless of type.

You seem to be confused about the difference between literals and
values.  Maybe a different example:

  SELECT '1'::int;

I get the integer 1 back from that, not the literal '1' I typed in.  This is
because '1' is the literal that is parsed into a value of type integer; the
query is then run and this same value is asked to convert itself back into a
literal to be written out to the screen.

Back to your example; you're asking PG to interpret the literal 'xxx'
as a "char" and it does that (but doesn't give any error back when it
chucks data away).  This behavior may be confusing because for text
types the literal is exactly the same as the value itself, but this is
only a very specific property of text types.  For example, '{"1"}', '{1}'
and even '{"+001"}' are all literal representations of identical integer
arrays.
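
A quick way to convince yourself of that last claim (assuming I've
remembered integer input correctly; all three literals should parse to the
same array):

  SELECT '{"1"}'::int[] = '{1}'::int[]      AS first_pair,
         '{1}'::int[]   = '{"+001"}'::int[] AS second_pair;
  -- both columns come back true: three different literals, one value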

> I don't think an error is the right thing, I think returning the
> specified value is the right thing.  I don't think it's a good thing
> that the type system decides that the result type for this case
> predicate is "char" and that 'xxx' needs to be coerced to that type.

I fail to see how an error isn't the right thing; if we try with some
other types, let's see if you think any of these should succeed.

  SELECT CASE WHEN TRUE THEN text 'xxx' ELSE 0 END;
  SELECT CASE WHEN TRUE THEN text 'xxx' ELSE TRUE END;
  SELECT CASE WHEN TRUE THEN text 'xxx' ELSE '{1}'::INT[] END;
  SELECT CASE WHEN TRUE THEN text 'xxx' ELSE array [1] END;

"char" is no different other than, by default, it happens to look a lot
like any value of text type.  It's a different type (that happens to
have some implicit casts to confuse things) and hence I can't see why
invalid literals should not be thrown out.

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5028: CASE returns ELSE value always when type is "char"

2009-09-04 Thread Sam Mason
On Fri, Sep 04, 2009 at 02:01:07PM -0500, Kevin Grittner wrote:
> > "char" is no different other than, by default, it happens to look a
> > lot like any value of text type.
>  
> So much so that it has the same name as a text type (wrapped in
> quotes) and behaves a lot like one:

You're getting bitten by implicit casts.  Try creating a view of any
of your examples and seeing what the resulting types are; in 8.4 an
EXPLAIN VERBOSE also seems to show how it has typed the expression.
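
For example, a sketch with the first of your CASE expressions (the dummy
table t dropped for brevity):

  CREATE TEMP VIEW chk AS
    SELECT CASE WHEN true THEN 'xxx' ELSE 'a'::"char" END AS c;
  \d chk
  -- the column c is reported as "char", which is why the 'xxx' gets truncated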

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5053: domain constraints still leak

2009-09-14 Thread Sam Mason
On Mon, Sep 14, 2009 at 11:16:23AM -0400, Robert Haas wrote:
> I haven't read the code in this area, but for what it's worth, I guess
> I lean toward the view that treating a row of NULLs as being the same
> thing as an undecorated NULL does not make very much sense.

I agree; when compared to most languages it doesn't.  When compared
to the semantics of the other operators in SQL it gets better.  I
personally think PG should strive to be internally consistent rather
than consistent with other (non-SQL based) languages.

> If I have
> a table row which contains (1, NULL, NULL) and I update the first
> column to be NULL, I feel like I now have (NULL, NULL, NULL), not just
> NULL.  Every other programming language I'm aware of makes this
> distinction - for good reasons - and I don't really see any reason why
> SQL should do anything different.

I'm not aware of any other language that does the automatic "lifting"
(to borrow nomenclature from Haskell) that SQL does, allowing NULL to
appear in *every* type.  Java, for example, has null references,
but these are very different creatures from nulls in databases--the
programmer has to explicitly deal with them all the time and they
only apply to references.  Taken another way, each object in a normal
imperative language has its own identity, but in a database two rows
that "look" the same are the same.  Thirdly, IS NULL is defined to look
"inside" composite values to see if they're "really" null.  It's these
differences in semantics that seem to make it all OK.

> Under that view, null::test is not itself a test, but denotes the
> absence of one.

OK, but how can you distinguish NULL from ROW(NULL,NULL)?

  SELECT v IS NULL, v.a, v.b
  FROM (SELECT NULL, NULL) v(a,b);

This would appear to return the same thing whether ROW(NULL,NULL) evaluates
to NULL or not.  The only time the difference would show up is when you're
trying to save the value into a table, and I think this would tend to do the
right thing more often.  For example:

  INSERT INTO t (id,rv)
SELECT f.id, b
FROM foo f
  LEFT JOIN bar b ON (f.id = b.id);

This would fail if any bars didn't exist, whereas the current behavior is
to insert a row with rv containing all null values.  You can't test for
this case because IS NULL would return the "wrong" thing, as it looks
inside composites.
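
I.e., with the same hypothetical foo/bar tables:

  SELECT f.id, b IS NULL AS b_is_null
  FROM foo f
    LEFT JOIN bar b ON (f.id = b.id);
  -- b_is_null is true both when no bar row matched and when a matching bar
  -- row happens to contain only null fields, so the two cases can't be told
  -- apart this way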

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5053: domain constraints still leak

2009-09-14 Thread Sam Mason
On Mon, Sep 14, 2009 at 10:54:07AM -0500, Kevin Grittner wrote:
> Sam Mason  wrote:
> > the deeper problem seems to be that the table was created as:
> > 
> >   create table test (a tstdom);
> > 
> > and not as:
> > 
> >   create table test (a tstdom not null);
>  
> Given that tstdom is declared as NOT NULL, is this difference
> considered a *feature* or is it an implementation quirk?

That's why I pointed it out!

Based on my reading of the SQL spec (and reading about Codd's
descriptions of domains) I'd say it was a bug/implementation quirk.

-- 
  Sam  http://samason.me.uk/



Re: [BUGS] BUG #5053: domain constraints still leak

2009-09-14 Thread Sam Mason
On Mon, Sep 14, 2009 at 10:22:34AM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > It seems like regardless of this discussion you oughtn't to be able to
> > store a NULL into that table column.  But maybe I'm just confused.
> 
> The system is relying on the not-unreasonable assumption that if it
> extracts a column of type X from someplace, what it has is a valid value
> of type X.

Yup; the deeper problem seems to be that the table was created as:

  create table test (a tstdom);

and not as:

  create table test (a tstdom not null);

which is how you seem to be treating it.

> Depending on what we decide about the whole composite-null
> mess, maybe we will be forced to abandon that assumption ... but I'm
> sure not going to do so until my back is to the wall.

There seems to be a little space yet!


This whole issue seems only distantly related to the treatment of null
rows to me.  I think PG has got its semantics confused along the way
somewhere and things need tweaking.  The only way I can get it all to
work nicely in my head is if ROW(NULL) evaluates to a NULL value (and
not a row containing a NULL value, as it does at the moment) and the
NULL/NOT NULL constraint on the CREATE DOMAIN is used somehow for the
nullness constraint of any columns using this domain.  It's the second
part that seems to be more critical, but there are various ways of
interpreting the meaning.

I'm tempted to say that the nullness specified in the domain puts a
bound on the amount of nullness available--i.e. it would be impossible
to create a nullable column from a domain that specified NOT NULL.  The
reason it's only a "limit" is that it seems useful to be able to say
that a normally nullable domain can't be null for this column in this
table.  Not sure if this is what people want though.

You then get into fun cases like:

  create domain tstdom as integer;
  create domain tstdom2 as tstdom;

-- 
  Sam  http://samason.me.uk/



[BUGS] unable to fail over to warm standby server

2010-01-27 Thread Mason Hale
by hand using `pg_ctl -D
/data/pgdata-8.3/ start`  -- this command returned a "server starting"
message,
but the server never appeared in the process list.


7.) The contents of the newly created postgresql.log were:

tail postgresql-2010-01-18_210946.log
2010-01-18 21:09:46 UTC ()LOG:  database system was interrupted while
in recovery at log time 2010-01-18 20:10:59 UTC
2010-01-18 21:09:46 UTC ()HINT:  If this has occurred more than once
some data might be corrupted and you might need to choose an earlier
recovery target.
2010-01-18 21:09:46 UTC ()LOG:  starting archive recovery
2010-01-18 21:09:46 UTC ()LOG:  restore_command = 'pg_standby -t
/tmp/pgsql.trigger.5432 /home/postgres/wal_archive %f %p %r
2>>pg_standby.log'
2010-01-18 21:09:46 UTC ()FATAL:  could not restore file
"0002.history" from archive: return code 65280
2010-01-18 21:09:46 UTC ()LOG:  startup process (PID 9284) exited with
exit code 1
2010-01-18 21:09:46 UTC ()LOG:  aborting startup due to startup process failure

8.) Sysadmin noticed the 'recovery.conf' file was still in the db root
directory (it had not been renamed recovery.done).
The recovery.conf file was renamed to recovery.conf.old and running
`pg_ctl -D /data/pgdata-8.3/ start` was attempted again.

9.) The server did not come up (again). This time the contents of the
new postgresql.log file were:

[postg...@prod-db-2 pg_log]$ tail -n 100 postgresql-2010-01-18_211132.log
2010-01-18 21:11:32 UTC ()LOG:  database system was interrupted while
in recovery at log time 2010-01-18 20:10:59 UTC
2010-01-18 21:11:32 UTC ()HINT:  If this has occurred more than once
some data might be corrupted and you might need to choose an earlier
recovery target.
2010-01-18 21:11:32 UTC ()LOG:  could not open file
"pg_xlog/00023C8200A3" (log file 15490, segment 163): No
such file or directory
2010-01-18 21:11:32 UTC ()LOG:  invalid primary checkpoint record
2010-01-18 21:11:32 UTC ()LOG:  could not open file
"pg_xlog/00023C820049" (log file 15490, segment 73): No
such file or directory
2010-01-18 21:11:32 UTC ()LOG:  invalid secondary checkpoint record
2010-01-18 21:11:32 UTC ()PANIC:  could not locate a valid checkpoint record
2010-01-18 21:11:32 UTC ()LOG:  startup process (PID 9328) was
terminated by signal 6: Aborted
2010-01-18 21:11:32 UTC ()LOG:  aborting startup due to startup process failure

10.) At this point a lot of time was spent on the #postgresql IRC
channel looking for help diagnosing this problem.
Before the issue was resolved, the primary server was brought back
online. Soon after we decided to create a new
base backup on the standby server and aborted trying to recover.

FYI: The last few lines of pg_standby.log were:

[postg...@prod-db-2 pgdata-8.3]$ tail -n 300 pg_standby.log
trigger file found
cp: cannot stat `/home/postgres/wal_archive/0002.history': No such
file or directory
cp: cannot stat `/home/postgres/wal_archive/0002.history': No such
file or directory
cp: cannot stat `/home/postgres/wal_archive/0002.history': No such
file or directory
cp: cannot stat `/home/postgres/wal_archive/0002.history': No such
file or directory
cp: cannot stat `/home/postgres/wal_archive/0002.history': No such
file or directory
cp: cannot stat `/home/postgres/wal_archive/0002.history': No such
file or directory
cp: cannot stat `/home/postgres/wal_archive/0002.history': No such
file or directory
trigger file found

 ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not
permittedtrigger file found

 ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not permitted

This file was not looked at until after the attempt to recover was
aborted. Clearly the permissions on /tmp/pgsql.trigger.5432 were a
problem,
but we don't see how that would explain the error messages, which seem
to indicate that data on the standby server was corrupted.

If you need additional information we have saved console logs and log
files from both the primary and secondary servers.

Thanks for reading this far and any help in tracking down the cause of
this unexpected failure.

cheers,

- Mason



Re: [BUGS] unable to fail over to warm standby server

2010-01-28 Thread Mason Hale
Hello Heikki --

Thank you for investigating this issue and clearing up this mystery.
I do not believe it is obvious that the postgres process needs to be able to
remove the trigger file.

My naive assumption was that the trigger file was merely a flag to signal
that recovery mode needed to be stopped. If I were to guess what steps the
server takes when it finds that flag, I would assume the following:

   - detect the presence of the trigger file
   - stop the postgres process safely (e.g pg_ctl ... stop)
   - rename recovery.conf to recovery.done
   - restart the postgres process (e.g. pg_ctl ... start)

It is not obvious that the trigger file needs to be removed.
And if permissions prevent it from being removed, the last thing that should
happen is for the database to become corrupted.

At minimum the pg_standby documentation should make this requirement clear.
I suggest language to the effect of the following:

Note it is critical the trigger file be created with permissions that allow
> the postgres process to remove the file. Generally this is best done by
> creating the file from the postgres user account. Data corruption may result
> if the trigger file permissions prevent deletion of the trigger file.


Of course the best solution is to avoid this issue entirely. Something as
easy to miss as file permissions should not cause data corruption,
especially in the process meant to fail over from a crashing primary
database.

thanks,

Mason Hale
http://www.onespot.com
direct +1 800.618.0768 ext 701



On Thu, Jan 28, 2010 at 3:49 AM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:

> Mason Hale wrote:
> >  ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not
> > permittedtrigger file found
> >
> >  ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not
> permitted
> >
> > This file was not looked at until after the attempt to recover was
> > aborted. Clearly the permissions on /tmp/pgsql.trigger.5432 were a
> > problem,
> > but we don't see how that would explain the error messages, which seem
> > to indicate that data on the standby server was corrupted.
>
> Yes, that permission problem seems to be the root cause of the troubles.
> If pg_standby fails to remove the trigger file, it exit()s with whatever
> return code the unlink() call returned:
>
> >   /*
> >* If trigger file found, we *must* delete it. Here's why:
> When
> >* recovery completes, we will be asked again for the same
> file from
> >* the archive using pg_standby so must remove trigger file
> so we can
> >* reload file again and come up correctly.
> >*/
> >   rc = unlink(triggerPath);
> >   if (rc != 0)
> >   {
> >   fprintf(stderr, "\n ERROR: could not remove \"%s\":
> %s", triggerPath, strerror(errno));
> >   fflush(stderr);
> >   exit(rc);
> >   }
>
> unlink() returns -1 on error, so pg_standby calls exit(-1). -1 is out of
> the range of normal return codes, and apparently gets mangled into the
> mysterious 65280 code you saw in the logs. The server treats that as a
> fatal error, and dies.
>
> That seems like a bug in pg_standby, but I'm not sure what it should do
> if the unlink() fails. It could exit with some other exit code, so that
> the server wouldn't die, but the lingering trigger file could cause
> problems, as the comment explains. If it should indeed cause FATAL, it
> should do so in a more robust way than the exit(rc) call above.
>
> BTW, this changed in PostgreSQL 8.4; pg_standby no longer tries to
> delete the trigger file (so that problematic block of code is gone), but
> there's a new recovery_end_command option in recovery.conf instead, where
> you're supposed to put the 'rm' command for the trigger file.  I think in
> that configuration, the standby would've started up, even though removal of
> the trigger file would've still failed.
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>


Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Mason Hale
> On Fri, Jan 29, 2010 at 12:03 AM, Mason Hale  wrote:
> > Of course the best solution is to avoid this issue entirely. Something as
> > easy to miss as file permissions should not cause data corruption,
> > especially in the process meant to fail over from a crashing primary
> > database.
>
> I believe that such a file permission problem does nothing but
> shut down the standby by a FATAL error, and wouldn't cause data
> corruption. So if you remove the trigger file with a wrong
> permission after the shutdown, you can restart a recovery well
> by just starting the standby postgres.
>
>
Perhaps my wording of "data corruption" was too harsh?

While I did not remove the trigger file, I did rename recovery.conf to
recovery.conf.old.

That file contained the restore_command configuration that identified the
trigger file. So that rename should have eliminated the problem. But it
didn't. Even after making this change and taking the trigger file out of the
equation, my database failed to come online.

Maybe that wasn't data corruption. Maybe the issue was repairable. I just
know that with my 3+ years of experience working with Postgres and the help
of the #postgresql IRC channel, I was not able to revive the database at a
time when I desperately needed it to work. The failover process failed for
me at the worst possible time.

I will surely be careful about trigger file permissions in the future. I
just shared my experience so that future DBAs who might make the same
mistake in a similar situation don't have to deal with the same unexpected
results.

- Mason


Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Mason Hale
Hello Fujii --

Thanks for the clarification. It's clear my understanding of the recovery
process is lacking.

My naive assumption was that Postgres would recover using whatever files
were available, and if it ran out of files it would stop there and come up.
And that if recovery.conf were renamed it would stop copying files from the
wal_archive into pg_xlog. Thus without the recovery.conf file, the database
would just come up, without expecting or waiting on additional files. I see
my assumption was wrong, but I think you can agree that it is not surprising
someone could expect things to work this way if they aren't directly
familiar with the code.

I think you can also see how seeing the message "If this has occurred more
than once some data might be corrupted and you might need to choose an
earlier recovery target" in the log would lead me to believe my database was
corrupted.

It is good to know that if I had left recovery.conf in place and just
removed the trigger file the issue would have resolved itself.

I'm happy to hear the database was not, in fact, corrupted by this error.

Perhaps it's best to chalk this up to a scenario that creates a confusing,
hard-to-diagnose issue -- one that easily looks like corruption, but
thankfully is not.

Hopefully if anyone tuning into this thread experiences or hears of similar
fail-over problems in the future (say on IRC), they'll remember to check the
permissions on the trigger file.

Thanks again,
Mason


On Fri, Jan 29, 2010 at 10:02 AM, Fujii Masao  wrote:

> On Fri, Jan 29, 2010 at 11:49 PM, Mason Hale  wrote:
> > While I did not remove the trigger file, I did rename recovery.conf to
> > recovery.conf.old.
> > That file contained the recovery_command configuration that identified
> the
> > trigger file. So that rename should have eliminated the problem. But it
> > didn't. Even after making this change and taking the trigger file out of
> the
> > equation my database failed to come online.
>
> Renaming of the recovery.conf doesn't resolve the problem at all. Instead,
> the sysadmin had to remove only the trigger file with a wrong permission
> and just restart postgres.
>
> >> 9.) The server did not come up (again). This time the contents of the
> >> new postgresql.log file were:
> >>
> >> [postg...@prod-db-2 pg_log]$ tail -n 100
> postgresql-2010-01-18_211132.log
> >> 2010-01-18 21:11:32 UTC ()LOG:  database system was interrupted while in
> recovery at log time 2010-01-18 20:10:59 UTC
> >> 2010-01-18 21:11:32 UTC ()HINT:  If this has occurred more than once
> some data might be corrupted and you might need to choose an earlier
> recovery target.
> >> 2010-01-18 21:11:32 UTC ()LOG:  could not open file
> "pg_xlog/00023C8200A3" (log file 15490, segment 163): No such
> file or directory
> >> 2010-01-18 21:11:32 UTC ()LOG:  invalid primary checkpoint record
> >> 2010-01-18 21:11:32 UTC ()LOG:  could not open file
> "pg_xlog/00023C820049" (log file 15490, segment 73): No such
> file or directory
> >> 2010-01-18 21:11:32 UTC ()LOG:  invalid secondary checkpoint record
> >> 2010-01-18 21:11:32 UTC ()PANIC:  could not locate a valid checkpoint
> record
> >> 2010-01-18 21:11:32 UTC ()LOG:  startup process (PID 9328) was
> terminated by signal 6: Aborted
> >> 2010-01-18 21:11:32 UTC ()LOG:  aborting startup due to startup process
> failure
>
> You seem to focus on the above trouble. I think that this happened because
> recovery.conf was deleted and restore_command was not given. In fact, the
> WAL file (e.g., pg_xlog/00023C8200A3) required for recovery
> was unable to be restored from the archive because restore_command was
> not supplied. Then recovery failed.
>
> If the sysadmin had left the recovery.conf and removed the trigger file,
> pg_standby in restore_command would have restored all WAL files required
> for recovery, and recovery would advance well.
>
> Hope this helps.
>
> Regards,
>
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>


Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Mason Hale
>
>
> >> If the sysadmin had left the recovery.conf and removed the trigger file,
> >> pg_standby in restore_command would have restored all WAL files required
> >> for recovery, and recovery would advance well.
> >
> > That may be true, but it's certainly seems unfortunate that we don't
> > handle this case a bit more gracefully.
>
> Yeah. I don't think we can do much better in versions <= 8.3, though we
> should change that unlink() call to cause the FATAL error in a more
> explicit way; it seems accidental and possibly non-portable as it is.
>
> Adding the note to the docs that Mason suggested is a good idea.
>
>
Given that this situation did NOT actually cause corruption, but rather the
error message was mangled such that it suggested corruption, I offer this
revised suggestion for updating the documentation:

Important note: It is critical the trigger file be created with permissions
> allowing the postgres process to remove the file. Generally this is best
> done by creating the file from the postgres user account. Failure to do so
> will prevent completion of WAL file recovery and the server from coming back
> online successfully.


Best regards,
Mason


[BUGS] BUG #5459: Unable to cancel query while in send()

2010-05-11 Thread Mason Hale

The following bug has been logged online:

Bug reference:  5459
Logged by:  Mason Hale
Email address:  ma...@onespot.com
PostgreSQL version: 8.3.8
Operating system:   Redhat EL 5.1-64 bit
Description:Unable to cancel query while in send()
Details: 

ISSUE: unable to cancel queries using pg_cancel_backend() when they are in
a send() call, waiting on the client to receive data.

EXPECTED RESULT: expect to be able to cancel most/all queries using
pg_cancel_backend() as superuser, perhaps with some wait time, but not an
hour or more.

= SYMPTOM =

A SELECT query had been running for over 18 hours on our PostgreSQL 8.3.8 server.
Verified that it was not waiting on any locks via pg_stat_activity.
Attempted to cancel the query using pg_cancel_backend(), which returned 't'.
However more than an hour later the process was still active, using about 6%
of CPU and 5% of RAM.
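
(For anyone following along, a cancel on 8.3 looks roughly like the
following; the WHERE clause here is a made-up example, and 8.3's
pg_stat_activity exposes the backend pid as procpid.)

  SELECT pg_cancel_backend(procpid)
  FROM pg_stat_activity
  WHERE now() - query_start > interval '1 hour';   -- hypothetical filter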

Terminating the client process that was running the query (from another
server) did not cause the query process on the pgsql server to stop. In this
case the client was connecting via an ssh tunnel through an intermediate
'gateway' server.

Connection path was:

   CLIENT -->  SSH GATEWAY --> DB SERVER

= DIAGNOSIS =

Diagnosed this issue with help from 'andres' in #postgresql IRC. Per his
request, attached to the 'stuck' process using gdb, generating the following
outputs:

  - Initial backtrace: http://pgsql.privatepaste.com/6f15c7e363
  - ('c', then ctrl+c, then 'bt full') x 4:
http://pgsql.privatepaste.com/3d3261659a
  - Stepping several times with 'n':
http://pgsql.privatepaste.com/0f302125a8

'andres' reported that interrupts were not checked in send() and probably
should be, and suggested opening this bug report. 

Additional investigation of the ssh tunnel connection revealed the
connection on the intermediate gateway server was stuck in a FIN_WAIT2 state
(as reported by netstat). The other end of the connection on the pgsql
server was reported as CLOSE_WAIT by netstat. 

Killing the ssh tunnel process on the gateway server cleared the connection,
and the long-running query process on the db server terminated very soon after.



[BUGS] BUG #2739: INTERSECT ALL not working

2006-11-06 Thread Mason Hale

The following bug has been logged online:

Bug reference:  2739
Logged by:  Mason Hale
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.1.5
Operating system:   GNU/Linux 2.6.9-42.0.3.ELsmp
Description:INTERSECT ALL not working
Details: 

'INTERSECT ALL' does not return duplicate rows in a query.

The query below should return 10 rows, but it returns 5 rows on my system:

(
   SELECT tablename
   FROM pg_tables
   LIMIT 5
)
INTERSECT ALL
( 
   (
 SELECT tablename
 FROM pg_tables
 LIMIT 5
   )
   UNION ALL
   (
 SELECT tablename
 FROM pg_tables
 LIMIT 5
   )
)

Note, the above is a simplified query meant to demonstrate the problem. This
same behavior occurs (and was discovered) in real-world situations with
user-defined tables.

This is nearly a deal-stopper for our application. Please reply to let me
know the status of this report.

Thanks,
Mason Hale



Re: [BUGS] BUG #2739: INTERSECT ALL not working

2006-11-06 Thread Mason Hale
Tom -

Many thanks for the quick reply. I feel honored to receive email from you
after seeing your name so many times in my web searches on Postgres topics.

That's not how I understood INTERSECT ALL to work. But it's clear the spec
is right and my understanding is wrong. This is not a bug.

Unfortunately, INTERSECT ALL as spec'd and implemented doesn't quite give me
what I need. So back to the drawing board for me...

best regards,
Mason

On 11/6/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Mason Hale" <[EMAIL PROTECTED]> writes:
> > The query below should return 10 rows,
>
> Not by my reading of the spec.  SQL92 7.10 saith:
>
> b) If a set operator is specified, then the result of applying
>    the set operator is a table containing the following rows:
>
>   i) Let R be a row that is a duplicate of some row in T1 or of
>      some row in T2 or both. Let m be the number of duplicates of R
>      in T1 and let n be the number of duplicates of R in T2, where
>      m >= 0 and n >= 0
>
>   iii) If ALL is specified, then ...
>
>     3) If INTERSECT is specified, then the number of duplicates
>        of R that T contains is the minimum of m and n.
>
> You have m = 1, n = 2 for each distinct row at the INTERSECT step,
> ergo you get one copy out.
>
> regards, tom lane
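
To make the min(m, n) arithmetic concrete, a minimal illustration (using
VALUES lists, which need 8.2 or later):

  SELECT x FROM (VALUES (1)) AS t(x)         -- m = 1 copy of the row
  INTERSECT ALL
  SELECT x FROM (VALUES (1), (1)) AS u(x);   -- n = 2 copies
  -- returns a single row: min(1, 2) = 1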



[BUGS] BUG #3723: dropping an index that doesn't refer to table's columns

2007-11-06 Thread Sam Mason

The following bug has been logged online:

Bug reference:  3723
Logged by:  Sam Mason
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.2.5
Operating system:   Linux
Description:dropping an index that doesn't refer to table's columns
Details: 

Hi,

I've just discovered that an index that doesn't refer to any of its table's
columns isn't automatically dropped when its table is.  The test case for me
is:

  CREATE TABLE foo (
id INTEGER NOT NULL
  );

  CREATE UNIQUE INDEX foo_id  ON foo (id);
  CREATE UNIQUE INDEX foo_exp ON foo ((1));

  DROP TABLE foo;

  -- foo_id will have gone, but foo_exp will still be there
  \di foo_id
  \di foo_exp


AndrewSN suggested the following query to show indexes that have missing
tables:

  SELECT indexrelid::regclass
  FROM pg_index i LEFT JOIN pg_class c
ON i.indrelid=c.oid
  WHERE c.oid IS NULL;

He also showed me which system catalogs to change in order to delete these
indexes, which I'm happy with for the moment.


Thanks,
  Sam

p.s. the reason for creating this strange index was to ensure that a maximum
of one row was inserted into the table---I can do this in different ways for
now.



Re: [BUGS] BUG #3723: dropping an index that doesn't refer to table's columns

2007-11-06 Thread Sam Mason
On Tue, Nov 06, 2007 at 10:00:43AM -0500, Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> > Not sure that's enough of a use case to justify not banning it...
> 
> Yeah, it probably is.

It's reasonably easy to do this instead:

  CREATE TABLE foo (
one INTEGER NOT NULL UNIQUE CHECK (one = 1)
  );

The bug (for me) was that it's possible to get the database into an
inconsistent state, with no warning or obvious way to back out.


  Sam



[BUGS] BUG #3724: Duplicate values added to table despite unique index

2007-11-06 Thread Mason Hale

The following bug has been logged online:

Bug reference:  3724
Logged by:  Mason Hale
Email address:  [EMAIL PROTECTED]
PostgreSQL version: 8.2.5
Operating system:   Redhat Linux (kernel: Linux 2.6.18-8.1.15.el5PAE)
Description:Duplicate values added to table despite unique index
Details: 

I discovered this issue when an update statement was resulting in a
duplicate key violation error, even though the update did not change any of
the columns in the unique index (and neither did the one trigger on this
table).

Here is the table description:

prod_2=> \d topic_version_page
              Table "bdu.topic_version_page"
       Column        |            Type             |   Modifiers
---------------------+-----------------------------+---------------
 topic_version_id    | integer                     | not null
 page_id             | integer                     | not null
 link_score          | double precision            |
 created_at          | timestamp without time zone | default now()
 updated_at          | timestamp without time zone | default now()
 is_entry_page       | boolean                     | default false
 linking_entry_count | integer                     | default 0
Indexes:
    "index_topic_version_page_on_topic_version_id_and_page_id" UNIQUE, btree
        (topic_version_id, page_id)
    "index_topic_version_page_on_link_score" btree (link_score)
    "index_topic_version_page_on_topic_version_id_and_created_at" btree
        (topic_version_id, created_at)
Foreign-key constraints:
    "topic_version_page_topic_version_id_fkey" FOREIGN KEY
        (topic_version_id) REFERENCES topic_version(id) ON DELETE CASCADE
Triggers:
    topic_version_page_updated_at_trigger BEFORE UPDATE ON
        topic_version_page FOR EACH ROW EXECUTE PROCEDURE
        update_updated_at_timestamp()


Note that there is a unique index on (topic_version_id, page_id).

Now look at the result of this query:

prod_2=> select page_id, count(*) from topic_version_page where
topic_version_id = 263 group by 1 having count(*) > 1;
  page_id  | count 
---+---
 161335682 | 2
 194359108 | 2
(2 rows)

Here we have two rows that violate the unique index constraint.

Looking at the rows in more detail:

prod_2=> select * from topic_version_page where topic_version_id = 263 and
page_id in (161335682, 194359108);
 topic_version_id |  page_id  | link_score |         created_at         |         updated_at         | is_entry_page | linking_entry_count
------------------+-----------+------------+----------------------------+----------------------------+---------------+---------------------
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 | 2007-11-01 15:58:57.268593 | f             |                   5
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 | 2007-10-25 13:34:20.654336 | f             |                   1
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 | 2007-11-04 13:08:03.011292 | f             |                   2
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 | 2007-11-04 13:08:03.011292 | f             |                   6
(4 rows)

We can see that each duplicate within a pair was created at the same time
(or at least within the same transaction), but that each pair was created at
a different time.

My expectation is that with the unique index in place this should not be
able to happen.

Other info that may be useful:

This database was created by taking a dump from a 8.2.4 database on 11/3 and
restoring into a new 8.2.5 database (on different hardware). The created_at
timestamps seem to indicate that these duplicates were created prior to the
dump being created.

However running the same query on the original 8.2.4 database returns zero
rows:

prod_1=> select page_id, count(*) from topic_version_page where
topic_version_id = 263 group by 1 having count(*) > 1;
 page_id | count 
-+---
(0 rows)

prod_1=> 

Finally, I restored the *same* dump file in another 8.2.5 database instance
and that database does not have the duplicate rows either.

From this it seems to me that the following occurred:

Some subsequent update operation resulted in two identical rows being
created in the same table. This likely happened at '2007-11-04
13:08:03.011292' which is the updated_at timestamp on two of the duplicate
rows (and is after the restore).

This looks like a bug related to update operations and unique indexes, but I
could of course be wrong.

Thanks for looking into it.

Mason



Re: [BUGS] BUG #3724: Duplicate values added to table despite unique index

2007-11-06 Thread Mason Hale
On Nov 6, 2007 11:16 AM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Mason Hale" <[EMAIL PROTECTED]> writes:
> > However running the same query on the original 8.2.4 database returns
> zero
> > rows:
>
> > prod_1=> select page_id, count(*) from topic_version_page where
> > topic_version_id = 263 group by 1 having count(*) > 1;
> >  page_id | count
> > -+---
> > (0 rows)
>
> Is that still true if you do the query with enable_indexscan = off
> and enable_bitmapscan = off?  If you see matching rows then, I'd
> suspect a corrupt index in the 8.2.4 installation --- does REINDEXing
> it succeed?
>

With enable_indexscan = off and enable_bitmapscan = off on the
8.2.4 instance, it still returns zero rows.

prod_1=> set enable_indexscan = off;
SET
prod_1=> set enable_bitmapscan = off;
SET
prod_1=> select page_id, count(*) from topic_version_page where
topic_version_id = 263 group by 1 having count(*) > 1;
 page_id | count
-+---
(0 rows)


> For that matter, do you still see dups if you prevent use of the index
> in the 8.2.5 query?  Maybe it's that index that is corrupt.

Unfortunately, I'm not able to test that at this point.
To get our production db (the 8.2.5 instance) back in operation I deleted
the extra duplicate rows, so that the update statement would complete.

If I need to reindex the table in the 8.2.5 database I'll need to wait until
this evening, barring any emergencies. (It is a 20 GB table.)

Mason


Re: [BUGS] BUG #3724: Duplicate values added to table despite unique index

2007-11-06 Thread Mason Hale
On Nov 6, 2007 1:06 PM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Mason Hale" <[EMAIL PROTECTED]> writes:
> >> For that matter, do you still see dups if you prevent use of the index
> >> in the 8.2.5 query?  Maybe it's that index that is corrupt.
>
> > Unfortunately, I'm not able to test that at this point.
> > To get our production db (the 8.2.5 instance) back in operation I
> deleted
> > the extra duplicate rows, so that the update statement would complete.
>
> Mph.  I'm afraid the evidence is mostly gone then, and we probably won't
> be able to figure out what happened.


Sorry about that. But I had to get things back up and running.


> However, it would be worth
> checking two things:
>
> 1. Can you confirm that the rows that got duplicated were in fact
> present (in only one copy) in the 8.2.4 DB?


Yes, they are present:

prod_1=> select * from topic_version_page where topic_version_id = 263 and
page_id in (161335682, 194359108);
 topic_version_id |  page_id  | link_score |         created_at         |         updated_at         | is_entry_page | linking_entry_count
------------------+-----------+------------+----------------------------+----------------------------+---------------+---------------------
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 | 2007-11-01 15:58:57.268593 | f             |                   5
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 | 2007-11-04 01:01:50.512446 | f             |                   2
(2 rows)

prod_1=>


>
> 2. Can you check that there are still 1 (rather than 0) copies of the
> rows in the 8.2.5 DB?  One possible theory about this is that what you
> had was (two instances of) two index entries pointing at the same heap
> row, in which case a DELETE that you thought removed only one copy would
> have deleted both.
>

Yes, we have 1 of each row (I kept the most recently updated version of
each):

prod_2=> select * from topic_version_page where topic_version_id = 263 and
page_id in (161335682, 194359108);
 topic_version_id |  page_id  | link_score |         created_at         |         updated_at         | is_entry_page | linking_entry_count
------------------+-----------+------------+----------------------------+----------------------------+---------------+---------------------
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 | 2007-11-04 13:08:03.011292 | f             |                   2
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 | 2007-11-06 16:01:36.393953 | f             |                   7
(2 rows)

prod_2=>



Mason


Re: [BUGS] BUG #3724: Duplicate values added to table despite unique index

2007-11-12 Thread Mason Hale
> >> 2. Can you check that there are still 1 (rather than 0) copies of the
> >> rows in the 8.2.5 DB?
>
> > Yes, we have 1 of each row (I kept the most recently updated version of
> > each):
>
> Ah, I forgot that the rows were obviously not identical because of the
> differing updated_at values.
>
> Tell us more about the updating process --- is it likely that there
> could be conflicting concurrent updates on a row?  What does your
> app do in such cases?
>

We do some application-layer locking to prevent the same 'topic-version'
from being updated at the same time.
But there could be a bug somewhere that lets that happen, in which case
concurrent updates of the same row could occur. So I guess I would say it is
unlikely, but possible.

If there is a concurrent update of the same row, I guess we're relying on
Postgres to handle that. If that results in a deadlock or any other error,
then we catch and log the error, mark the update job as failed (in another
db table) and try again. In most every case, the same request will complete
successfully on a second try. In this case, every update to the rows in
question was resulting in a unique index violation, which resulted in a
near-constant stream of errors being logged.


[BUGS] Duplicate values found when reindexing unique index

2007-12-30 Thread Mason Hale
Hello --

I noticed recently some errors in my postgres log like the following:

ERROR:  could not open segment 9 of relation 1663/16830/1384385 (target
block 776929292): No such file or directory

These are occurring at a rate of 1 every 2-3 days. But that rate has been
increasing.

After Googling around, I found this error could indicate index corruption.

I found that the relation with oid 1384385 was the following
index: "index_entry_on_guid_and_feed_id" UNIQUE, btree (guid, feed_id) -- on
a table named entry.
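
For reference, the 1663/16830/1384385 in that error is tablespace OID /
database OID / relfilenode, so a lookup along these lines is what identifies
the relation:

  SELECT oid, relname, relkind
  FROM pg_class
  WHERE relfilenode = 1384385;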

When I tried to rebuild that index, I got the following error:

prod_2=# reindex index index_entry_on_guid_and_feed_id;
ERROR:  could not create unique index
DETAIL:  Table contains duplicated values.

Now, if that index was in fact corrupted, then perhaps it is not such a
surprise that the duplicate values are there. The thing that makes this more
mysterious is that we have another unique index on entry(feed_id, guid) --
and I have no reason to believe it is corrupted (although I guess it still
could be).

Please note: I have previously reported another issue with duplicate values
making it into a table despite having a unique index in place (bug #3724).
This issue involves a completely different table.

We can stay running despite this error but I would very much like to track
down the cause as soon as possible. To avoid destroying any evidence, I'm
going to leave things intact and await further instructions.

Given my experience, the reliability of unique indexes is becoming somewhat
suspect. Please help. ;-)

thanks in advance,
Mason


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
On Dec 30, 2007 12:09 PM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Mason Hale" <[EMAIL PROTECTED]> writes:
> > Given my experience, the reliability of unique indexes is becoming
> somewhat
> > suspect. Please help. ;-)
>
> Well, as in the previous report, there is not enough information here to
> offer much chance of understanding what's going wrong.
>

Please just tell me what information you need and I will provide what I can.


> Have you tried reindexing that other index with the same columns in the
> other order?  My guess is that there really are duplicate entries in the
> table; if so the other one should fail too.  If so, please try to
> identify the duplicated values, along the lines of
>
>select guid, feed_id from entry group by 1,2 having count(*) > 1
>
> and show us the system columns (ctid,xmin,xmax,cmin,cmax) from the
> tuples having duplicate value(s).  Note that you should probably disable
> indexscan and bitmapscan while doing this probing, so as not to have the
> queries use the suspect indexes.


I found a single pair of rows that were duplicated. Interestingly it was not
just the guid and feed_id that were duplicated: all columns were
identical, including the primary key, except an updated_at column which is
automatically populated via a trigger (BEFORE UPDATE on entry FOR EACH ROW).

The duplicate data included a created_at column which defaults to now() --
that the two duplicate rows have exactly the same values strongly hints to
me that the duplicates were created during the same transaction.

Here's the system column data you requested.

    id     |     ctid     | xmin | xmax | cmin | cmax
-----------+--------------+------+------+------+------
 151341072 | (1508573,11) |    2 |    0 |   19 |    0
 151341072 | (1818219,11) |    2 |    0 |   19 |    0
(2 rows)
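
For the record, a probe along the lines suggested above would look roughly
like this (a sketch reconstructed from those instructions, not necessarily
the exact query that was run):

  SET enable_indexscan = off;
  SET enable_bitmapscan = off;

  SELECT id, ctid, xmin, xmax, cmin, cmax
  FROM entry
  WHERE (guid, feed_id) IN (SELECT guid, feed_id
                            FROM entry
                            GROUP BY 1, 2
                            HAVING count(*) > 1);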



> This is 8.2.5 right?  Was the DB loaded fresh into 8.2.5, or was it
> inherited from previous 8.2.x release(s)?
>

It is 8.2.5. It was loaded from a pg_dump from an 8.2.3 database into a
fresh 8.2.5 database on new hardware.


> BTW, what are the datatypes of the index columns?
>

id integer not null (serial)
guid character varying not null   ( no size limit defined )
feed_id integer not null


>
>regards, tom lane
>

thanks for the help Tom.

I do want to clear out one of the offending duplicates and reindex. But I'll
wait to do that until I get the okay from you.


Mason


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
Trolling through my server log I found this error:
2007-12-30 20:02:08 CST (10.11.199.136) PANIC:  right sibling's left-link
doesn't match
2007-12-30 20:02:08 CST (10.11.199.136) STATEMENT:  update bdu.entry set
title=$1, author=$2, description_type=$3, description_length=$4,
description=$5, published_at=$6, republished_at=$7, link=$8,
link_page_id=$9, link_count=$10, enclosure=$11, enclosure_page_id=$12,
enclosure_count=$13 where id=$14
2007-12-30 20:02:08 CST () LOG:  server process (PID 30004) was terminated
by signal 6
2007-12-30 20:02:08 CST () LOG:  terminating any other active server
processes

This seems related to the entry table -- so I wonder if it is related to
this problem?



On Dec 30, 2007 8:23 PM, Mason Hale <[EMAIL PROTECTED]> wrote:

>
>
> On Dec 30, 2007 12:09 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> > "Mason Hale" <[EMAIL PROTECTED]> writes:
> > > Given my experience, the reliability of unique indexes is becoming
> > somewhat
> > > suspect. Please help. ;-)
> >
> > Well, as in the previous report, there is not enough information here to
> > offer much chance of understanding what's going wrong.
> >
>
> Please just tell me what information you need and I will provide what I
> can.
>
>
> > Have you tried reindexing that other index with the same columns in the
> > other order?  My guess is that there really are duplicate entries in the
> >
> > table; if so the other one should fail too.  If so, please try to
> > identify the duplicated values, along the lines of
> >
> >select guid, feed_id from entry group by 1,2 having count(*) > 1
> >
> > and show us the system columns (ctid,xmin,xmax,cmin,cmax) from the
> > tuples having duplicate value(s).  Note that you should probably disable
> > indexscan and bitmapscan while doing this probing, so as not to have the
> > queries use the suspect indexes.
>
>
> I found a single pair of rows that were duplicated. Interestingly it was
> not just the guid and feed_id that were duplicated but all columns were
> indentical, including the primary key, except an update_at column which is
> automatically populated via a trigger (BEFORE UPDATE on entry FOR EACH ROW).
>
>
> The duplicate data included a created_at column which defaults to now() --
> that the two duplicate rows have exactly the same values strongly hints to
> me that the duplicates were created during the same transaction.
>
> Here's the system column data you requested.
>
> id | ctid | xmin | xmax | cmin | cmax
> ---+--+--+--+--+--
>  151341072 | (1508573,11) |2 |0 |   19 |0
>  151341072 | (1818219,11) |2 |0 |   19 |0
> (2 rows)
>
>
>
> > This is 8.2.5 right?  Was the DB loaded fresh into 8.2.5, or was it
> > inherited from previous 8.2.x release(s)?
> >
>
> It is 8.2.5. It was loaded from a pg_dump from an 8.2.3 database into a
> fresh 8.2.5 database on new hardware.
>
>
> > BTW, what are the datatypes of the index columns?
> >
>
> id integer not null (serial)
> guid character varying not null   ( no size limit defined )
> feed_id integer not null
>
>
> >
> >regards, tom lane
> >
>
> thanks for the help Tom.
>
> I do want to clear out one of the offending duplicates and reindex. But
> I'll wait to do that until I get the okay from you.
>
>
> Mason
>


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
>
>
> Can you show us all the triggers on this table?


Here they are:

Triggers:
entry_correct_published_at_trigger BEFORE INSERT OR UPDATE ON entry FOR
EACH ROW EXECUTE PROCEDURE correct_published_at()
entry_feed_page_trigger BEFORE INSERT OR UPDATE ON entry FOR EACH ROW
EXECUTE PROCEDURE entry_feed_page_trigger()
entry_updated_at_trigger BEFORE UPDATE ON entry FOR EACH ROW EXECUTE
PROCEDURE update_updated_at_timestamp()
feed_entry_count_trigger AFTER INSERT ON entry FOR EACH ROW EXECUTE
PROCEDURE update_feed_entry_count()


>  Also, it would be real
> interesting to see "pg_filedump -i -f" output for the two blocks in
> question (1508573 and 1818219) --- see http://sources.redhat.com/rhdb/
> to get a copy of pg_filedump.
>

I have downloaded, compiled and installed pg_filedump -- but I am not sure
how to determine which file I should have it dump. I am not very familiar
with the postgres file structure. Can you please provide some guidance? How
do I determine the correct file?

I've determined the relation "entry" has an oid = 16838 -- but the
/data/base/16830 directory contains 92 1GB files named 16838.[1-92]

I've tried:

pg_filedump -i -f -R 1508573 16838

and got the following error:

[EMAIL PROTECTED] 16830]$ pg_filedump -i -f -R 1508573 16838

***
* PostgreSQL File/Block Formatted Dump Utility - Version 8.2.0
*
* File: 16838
* Options used: -i -f -R 1508573
*
* Dump created on: Sun Dec 30 23:18:01 2007
***
Error: Seek error encountered before requested start block <1508573>.

I tried a few other files in the 1-92 range with the same results.
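
For what it's worth, with the default 8 kB block size each 1 GB segment holds
131072 blocks, so block 1508573 should correspond to relative block 66781
(1508573 - 11 * 131072) of segment file 16838.11.  Assuming pg_filedump
numbers blocks relative to the file it is given, the invocation would then be
roughly:

  pg_filedump -i -f -R 66781 16838.11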


Mason


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
>
>
> I think you misread Mason's post of 20:23 GMT-6 where he says the
> created_at values are the *same*, not different. Mason's previous bug
> report 3724 also had duplicate rows with matching created_at values.
>

Yes, to confirm, the created_at values are the same. The *only* values that
are different are updated_at and that value is set via trigger.

Another data point that may or may not be relevant: I've been working to set
up a warm standby server and have run into some weird behavior there as
well. The symptom is that the server encounters an error during the file
restore, then prematurely exits recovery mode. I'm using the pg_standby
program for the restore_command in my recovery.conf on the standby server.

The error message from the standby server was:

Last week I had turned my attention away from the database restore, and
didn't notice that our standby server had exited recovery mode on 12/20.
Here are the last few lines from the log file (full log attached).

2007-12-20 04:11:43 CST () LOG:  restored log file
"000104220057" from archive
2007-12-20 04:13:09 CST () LOG:  restored log file
"000104220058" from archive
2007-12-20 04:14:40 CST () LOG:  restored log file
"000104220059" from archive
2007-12-20 04:14:40 CST () LOG:  invalid info bits 0001 in log file 1058,
segment 89, offset 0
2007-12-20 04:14:40 CST () LOG:  redo done at 422/58FFEE38
2007-12-20 04:14:40 CST () LOG:  restored log
file "000104220058" from archive
2007-12-20 04:14:40 CST () LOG:  archive recovery complete
2007-12-20 04:24:57 CST () LOG:  database system is ready

thanks,
Mason


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
On Dec 31, 2007 9:48 AM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Mason Hale" <[EMAIL PROTECTED]> writes:
> >> I think you misread Mason's post of 20:23 GMT-6 where he says the
> >> created_at values are the *same*, not different. Mason's previous bug
> >> report 3724 also had duplicate rows with matching created_at values.
>
> > Yes, to confirm, the created_at values are the same. The *only* values
> that
> > are different are updated_at and that value is set via trigger.
>
> Which still demonstrates that they aren't exact images.  I'd still like
> to see the pg_filedump output though ...


I sent it in a previous email. You saw it, yes?


> > 2007-12-20 04:11:43 CST () LOG:  restored log file
> > "000104220057" from archive
> > 2007-12-20 04:13:09 CST () LOG:  restored log file
> > "000104220058" from archive
> > 2007-12-20 04:14:40 CST () LOG:  restored log file
> > "000104220059" from archive
> > 2007-12-20 04:14:40 CST () LOG:  invalid info bits 0001 in log file
> 1058,
> > segment 89, offset 0
>
> Do you by any chance still have 000104220058 and
> 000104220059 archived?  If so it would be useful to
> look at the first dozen lines of "od -x" dump of each of them.
>


Yes, I do. Here's the output:

[EMAIL PROTECTED] wal_archive]$ od -x 000104220058 | head -n15
000 d05e 0002 0001  0423   c100
020 f7df 472e e701 4728  0100 2000 
040 a1db 81e6 0423  0068 c000  
060 0048  002c    0423 
100 0020 c100     0001 
120   780b 2ede 9f68 00f5 7834 
140 2f4d 0001 1f35 4774    
160        
*
002 d05e  0001  0422  2000 5800
0020020 53c2 48bc 0422  1fbc 5800 ce6f 2edd
0020040 003a  001e  0b00  067f 
0020060 41be  1ff8 0015  0186 0007 
0020100 4337 000a 000c 008f 0122  db84 d429
0020120 0422  2010 5800 ce6f 2edd 003a 
[EMAIL PROTECTED] wal_archive]$ od -x 000104220059 | head -n15
000 d05e 0001 0001  006b  6000 69dc
020 12ae  6380 0024 0010 375a 21cd 1174
040 4001 0001 637c 0058 0010 375a 21cd 1174
060 4001 0001 6355 0010 0010 375a 21cd 1174
100 4001 0001 631d 005a 0010 375a 21cd 1174
120 4001 0001 62e8 001e 0010 375a 21cd 1174
140 4001 0001 629a 0019 0010 375a 21cd 1174
160 4001 0001 604a 0013 0010 375a 21cd 1174
200 4001 0001 6045 0025 0010 375a 21cd 1174
220 4001 0001 6044 0029 0010 375a 21cd 1174
240 4001 0001 603a 0005 0010 375a 21cd 1174
260 4001 0001 6020 0035 0010 375a 21cd 1174
300 4001 0001 600e 0050 0010 375a 21cd 1174
320 4001 0001 600d 0018 0010 375a 21cd 1174
340 4001 0001 600b 0033 0010 375a 21cd 1174
[EMAIL PROTECTED] wal_archive]$


Mason


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
Tom, I'll send these to you privately.
Mason

On Dec 31, 2007 10:22 AM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Mason Hale" <[EMAIL PROTECTED]> writes:
> >> Can you show us all the triggers on this table?
>
> > Here they are:
>
> > Triggers:
> > entry_correct_published_at_trigger BEFORE INSERT OR UPDATE ON entry
> FOR
> > EACH ROW EXECUTE PROCEDURE correct_published_at()
> > entry_feed_page_trigger BEFORE INSERT OR UPDATE ON entry FOR EACH
> ROW
> > EXECUTE PROCEDURE entry_feed_page_trigger()
> > entry_updated_at_trigger BEFORE UPDATE ON entry FOR EACH ROW EXECUTE
> > PROCEDURE update_updated_at_timestamp()
> > feed_entry_count_trigger AFTER INSERT ON entry FOR EACH ROW EXECUTE
> > PROCEDURE update_feed_entry_count()
>
> Actually I wanted to see the function bodies ...
>
>regards, tom lane
>


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
>
> Something else interesting about that: the apparent interloper page
> contains just a single WAL record, which appears to be a shutdown
> checkpoint bearing the timestamp "Thu Dec 27 2007, 16:55:01 EST".
> Not sure if Mason can correlate that with any recent activity...
>

Nothing jumps to mind, but I'll crawl through the logs looking for anything
interesting.

> I'm starting to think that Occam's razor says you've got hardware
> problems.  Or maybe a kernel-level bug that is causing writes to get
> discarded.
>

For what it's worth, we do closely monitor the server for RAID and kernel
errors and haven't seen either variety reported in quite some time.

If it is some mysterious hardware or kernel error -- any suggestions for how
I should go about narrowing that down?

Finally, if it would help to see the full wal segment files, let me know and
I will email individually. (I assume the list doesn't want 32MB of
attachments).

Mason


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
On Dec 31, 2007 12:51 PM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Mason Hale" <[EMAIL PROTECTED]> writes:
> > If it is some mysterious hardware or kernel error -- any suggestions for
> how
> > I should go about narrowing that down?
>
> The first thing to ask is whether you're running an up-to-date kernel
> (and whose is it).
>

The kernel is not up to date. What we have is:

Kernel: Linux 2.6.18-8.1.15.el5PAE
Distribution: RedHat EL 5.0-32

We can/will update the kernel at our next opportunity.


>
> > Finally, if it would help to see the full wal segment files, let me know
> and
> > I will email individually. (I assume the list doesn't want 32MB of
> > attachments).
>
> If you'd send them to me privately, I'd be interested.
>

 I will send these to you right away.

Mason


Re: [BUGS] Duplicate values found when reindexing unique index

2007-12-31 Thread Mason Hale
Hello Tom --
a thousand thanks for taking the time to walk through those files.

> This could be the kernel's fault, but I'm wondering whether the
> RAID controller is going south.  Offhand it seems like the controller
> would be the only place that would be likely to explain both the
> bogus-data and good-data-in-wrong-place behaviors.
>

To clarify a bit further -- on the production server, the data is written to
a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a separate,
dedicated SATA II disk.

There is a similar setup on the standby server, except that in addition to
the RAID for the data, and a separate SATA II disk for the pg_xlog, there is
another disk (also SATA II) dedicated for the archive of wal files copied
over from the production server.

I mention all this because if the problem is the wal files in the pg_xlog
directory becoming corrupted, and those exist on a dedicated disk (not in a
RAID), then doesn't that indicate that the RAID controller is not involved?

Mason


Re: [BUGS] BUG #3848: function pg_catalog.substring(date, integer, integer) does not exist

2008-01-01 Thread Sam Mason
On Tue, Jan 01, 2008 at 08:11:01PM +0000, Ion wrote:
> Email address:  [EMAIL PROTECTED]

let's hope Ion is subscribed! :)

> I tested PostgreSQL 8.3 beta4 with tinyerp and I have this problem:
> pg_catalog.substring(date, integer, integer) does not exist

This is most likely a bug in tinyerp. substring has only ever been
defined for string data types and isn't valid for date types at all.

> I have not this problem with postgresql 8.2.5.

Prior to PG 8.3, all types would be silently coerced into TEXT types in
certain cases, leading to all sorts of strangeness.

I've just downloaded the source of tinyerp and had a look through and
this practice seems pretty endemic.  They should really be using

  to_char(date_exp, 'YYYY-MM')

instead of

  substring(date_exp FOR 7)

in all their queries.
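
For example (a sketch showing both the failure and the fixes):

  SELECT substring(current_date FOR 7);         -- fails in 8.3: no substring(date, ...)
  SELECT substring(current_date::text FOR 7);   -- works, but depends on the datestyle being ISO
  SELECT to_char(current_date, 'YYYY-MM');      -- works regardless of datestyle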


  Sam



Re: [BUGS] BUG #3848: function pg_catalog.substring(date, integer, integer) does not exist

2008-01-01 Thread Sam Mason
On Tue, Jan 01, 2008 at 04:29:47PM -0500, Tom Lane wrote:
> Sam Mason <[EMAIL PROTECTED]> writes:
> > I've just downloaded the source of tinyerp and had a look though and
> > this practice seems pretty endemic.  They should really be using
> >   to_char(date_exp, 'YYYY-MM')
> > instead of
> >   substring(date_exp FOR 7)
> > in all their queries.
> 
> Or at least explicitly casting the date to text.  But your way is
> better, since it won't break if the datestyle is something other
> than ISO.

It also improves code readability, which is always a good thing.

I've just gone through the code and sent a patch to the tinyerp-devel
list with these fixes in.  Let's see if it gets incorporated.


  Sam



Re: [BUGS] Duplicate values found when reindexing unique index

2008-01-03 Thread Mason Hale
Hi everyone --

Sorry to revisit a dead horse, but I wanted to clear up some misinformation
--

On Dec 31, 2007 5:35 PM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Mason Hale" <[EMAIL PROTECTED]> writes:
> >> This could be the kernel's fault, but I'm wondering whether the
> >> RAID controller is going south.
>
> > To clarify a bit further -- on the production server, the data is
> written to
> > a 10-disk RAID 1+0, but the pg_xlog directory is symlinked to a
> separate,
> > dedicated SATA II disk.
>
> > There is a similar setup on the standby server, except that in addition
> to
> > the RAID for the data, and a separate SATA II disk for the pg_xlog,
> there is
> > another disk (also SATA II) dedicated for the archive of wal files
> copied
> > over from the production server.
>

It turns out that the separate SATA II disk was configured as a single-disk
JBOD on the same controller as the 10-disk RAID 1+0.

Since we've seen corruption in the data directory (on the RAID) and in the
pg_xlog directory (on the SATA II disk) the RAID controller is one of the
few common elements between those two partitions and hence is highly
suspect, and may dispel some of the mystery with our situation.

We will be replacing the RAID controller in short order. For what it is
worth it is an Adaptec 31605 with a battery backup module.


>
> Oh.  Maybe it's one of those disks' fault then.  Although WAL corruption
> would not lead to corruption of the primary DB as long as there were no
> crash/replay events.  Maybe there is more than one issue here, or maybe
> it's the kernel's fault after all.
>
>
Given the new information that the RAID controller is (after all) managing
all the disks in question -- if the RAID controller is going south, then
there would be no need for a crash/replay event for that corruption to make
it into the primary DB. That seems to be pretty damning evidence against the
RAID controller, agreed?

Mason


Re: [BUGS] BUG #4035: sql table aliases do not work

2008-03-15 Thread Sam Mason
On Fri, Mar 14, 2008 at 08:53:08PM +0000, RGF wrote:
> PostgreSQL version: latest

do you mean 8.2.6, 8.3.0, or the latest version of some other series?
Please try a little harder next time!!

> Description:sql table aliases do not work

they do whenever I use them! by the looks of your SQL you're not even
using them where you should be:

> Context: SQL statement "update tonodes set tonodes.cost = case when
> tonodes.cost is NULL then fromnodes.cost + paths.cost when fromnodes.cost +
> paths.cost < tonodes.cost then fromnodes.cost + paths.cost else tonodes.cost
> end, tonodes.pathid = paths.pathid from nodes as fromnodes inner join paths
> on paths.fromnodeid = fromnodes.nodeid inner join tonodes on tonodes.nodeid
> = paths.tonodeid where fromnodes.nodeid =  $1  and (tonodes.cost is NULL or
> fromnodes.cost + paths.cost < tonodes.cost) and tonodes.calculated = 0"
> PL/pgSQL function "dijkstra_resolve" line 53 at SQL statement

If we rewrite this to be somewhat readable:

  update tonodes set
      tonodes.cost = case
          when tonodes.cost is NULL
              then fromnodes.cost + paths.cost
          when fromnodes.cost + paths.cost < tonodes.cost
              then fromnodes.cost + paths.cost
          else tonodes.cost
      end,
      tonodes.pathid = paths.pathid
  from nodes as fromnodes
      inner join paths   on paths.fromnodeid = fromnodes.nodeid
      inner join tonodes on tonodes.nodeid   = paths.tonodeid
  where fromnodes.nodeid = $1
    and (tonodes.cost is NULL or fromnodes.cost + paths.cost < tonodes.cost)
    and tonodes.calculated = 0;

You refer to "tonodes" but never actually say that it's an alias for
nodes (I assume, you've not actually said this anywhere).

> The tables referenced (nodes and paths) exist and have data. The SQL works
> in MS SQL Server 2000

The PG manual[1] has this to say:

  Some other database systems offer a FROM option in which the target
  table is supposed to be listed again within FROM. That is not how
  PostgreSQL interprets FROM. Be careful when porting applications that
  use this extension.

I'd guess this is what MS SQL does.  That said, it's easy to rewrite
your query to use PG syntax.  I've also noticed that your CASE statement
is somewhat redundant so I've removed it (its cases are exactly the
same as the WHERE clause).

  UPDATE nodes t SET cost = f.cost + p.cost, pathid = p.pathid
  FROM nodes f, paths p
  WHERE (p.fromnodeid, p.tonodeid) = (f.nodeid, t.nodeid)
    AND (t.cost IS NULL OR f.cost + p.cost < t.cost)
    AND t.calculated = 0
    AND f.nodeid = $1;

Which, to me, is even more readable.  For future reference, the
pgsql-general mailing list[2] is more appropriate for questions like
this.

As a side note, do you have exactly one path from each node to another
node, or do you run this code several times until it converges on the
minimum?  In the latter case you'd probably be better off using an
aggregation to find the shortest path in a single pass.


  Sam

 [1] http://www.postgresql.org/docs/8.3/static/sql-update.html#AEN61013
 [2] http://archives.postgresql.org/pgsql-general/



Re: [BUGS] Redundant explicit field name/types description while select from function with return type of record

2008-03-28 Thread Sam Mason
On Fri, Mar 28, 2008 at 01:43:25PM -0300, Euler Taveira de Oliveira wrote:
> [EMAIL PROTECTED] wrote:
> >a column definition list is required for functions returning "record"
> >It seems a BUG
> >
> I don't think so. We can say it is a missing feature. As stated in [1], 
> record types don't have a predefined structure -- they're placeholders. 

I was having a similar discussion with Gregory Stark about this and
hadn't realised that such a small amount of state was recorded with each
row.

> How do you know the row structure before hand? Its structure can be 
> changed on-the-fly.

Sorry, I don't understand this comment.  Could you elaborate?


  Sam



Re: [BUGS] BUG #4085: No implicit cast after coalesce

2008-04-03 Thread Sam Mason
On Thu, Apr 03, 2008 at 12:24:17AM +0200, Peter Eisentraut wrote:
> Jeff Dwyer wrote:
> > This works fine:
> > select 1 where current_date between '1900-3-3' and '1900-2-2';
> > This doesn't:
> > select 1 where current_date between coalesce(null,current_date)  and
> > coalesce(null, '1900-1-2');
> >
> > This fix works:
> > select 1 where current_date between coalesce(null,current_date)  and
> > coalesce(null, date('1900-1-2'));
> >
> > This seems like a bug to me. Why should an explicit cast be necessary after
> > a coalesce?
> 
> Because coalesce(null, '1900-1-2') has no other type information attached, so 
> it would have picked text by default as result type, and that then clashes 
> with the result type of coalesce(null,current_date), which can be derived to 
> be date.  This is a robustness improvement: 8.2 and earlier would silently 
> accept coalesce(null, 'abc') and apply text-semantics comparison.

The types look as though they could be interpreted unambiguously and
correctly very easily.  Parametric polymorphism and some basic type
inference would easily be able to resolve this.  BETWEEN would have
the following type (very informally presented; lower case characters
stand for type variables, Titlecase for type names, UPPERCASE for
identifiers!):

  Boolean (t BETWEEN t AND t)

i.e. when BETWEEN is called all the types must be the same.  COALESCE is
also parametrised over a single type:

  t COALESCE(t,t)

NULLs could be encoded in the type system in many ways as long it had a
polymorphic type.  The type system should realise that "current_date"
is of type Date and because NULL is polymorphic the COALESCEs would
unify, both returning values of type Date, which would in turn unify
with the BETWEEN operator resulting in a value of BOOLEAN type, which is
exactly what the WHERE clause expects.
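
Until something like that exists, the ambiguity has to be resolved by hand
by pinning a type somewhere in the expression, much as the report's own
workaround does; for example (a sketch):

  SELECT 1 WHERE current_date BETWEEN coalesce(null, current_date)
                                  AND coalesce(null, '1900-1-2'::date);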

This sort of type inference has been known (and extensively studied)
for about 50 years now; it always surprises me how little it's known
outside the functional programming community (ML and Haskell being the
old guard).  Apparently, according to the fountain of wisdom that is
Wikipedia, it's finally starting to break into very mainstream languages
like VB9 and C# 3.0.

Moving an existing implementation over to a new type system is an
entirely non-trivial matter though!


  Sam



Re: [BUGS] Inconsistent shift operator

2008-04-20 Thread Sam Mason
On Sat, Apr 19, 2008 at 11:26:57AM -0400, Tom Lane wrote:
> Roman Kononov <[EMAIL PROTECTED]> writes:
> > The below test cases show the obvious inconsistency between different 
> > integer types.
> 
> [ shrug... ]  The << and >> operators just expose the behavior of the
> local C compiler's shift operators, and it's clearly stated in the C
> spec that shifting by more than the word width produces unspecified
> results.

I thought that getting the correct answer was more important than
getting the "wrong" answer quickly.  The current code also introduces
its own set of strangeness in the 16 bit case on my x86_64 box.  It does
something closer to:

  int16 shl (int val, int n) {
      n %= 32;
      return n < 16 ? val << n : 0;
  }

This is exactly what the code says; it's just that the resulting semantics
aren't very nice for the user.
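
A few probes make the inconsistency concrete; the commented results are what
the semantics above predict on such a box, so they are platform specific:

  SELECT (1::int2) << 17;   -- 0: the shifted value no longer fits in int2
  SELECT (1::int2) << 33;   -- 2: the shift count wrapped around to 1
  SELECT (1::int4) << 33;   -- 2: same wrap for int4
  SELECT (1::int8) << 33;   -- 8589934592: int8 doesn't wrap until 64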

Wouldn't it be easy to put some code like this in:

  if (arg2 < 16)
      return PG_RETURN_INT16(arg1 << arg2);
  return PG_RETURN_INT16(0);

People would have one less platform specific weirdo to worry about.

As an aside, I thought it would be interesting to see what MySQL did and
it seems to check for and handle this case--albeit only the 64bit case,
but as far as I can tell it only really knows about "long long" ints.


  Sam



Re: [BUGS] Inconsistent shift operator

2008-04-20 Thread Sam Mason
On Sun, Apr 20, 2008 at 12:27:38PM -0400, Tom Lane wrote:
> Sam Mason <[EMAIL PROTECTED]> writes:
> > Wouldn't it be easy to put some code like this in:
> >   if (arg2 < 16)
> > return PG_RETURN_INT16(arg1 << arg2);
> >   return PG_RETURN_INT16(0);
> 
> This is a straw man.  It covers *one* of the behaviors left undefined
> by the C standard.  I quote from C99:

 [...]

wow, I never realised how little semantics C actually defines.  I'm a
fan of formally defined semantics and the above just seems like a big
cop-out.  The case of E1 being negative seems completely implementation
defined!

> We are shifting signed types so we are exposed to every one of these
> unspecified behaviors.  In particular, since we don't know whether the
> behavior of >> will be zero-fill or sign-fill, it's not going to be
> exactly trivial to make a consistent extension of it to shift values
> greater than the word width.

About the only reasonable thing I can think of that would remain within
C's spec (at least with respect to shifting) would be to always treat
E1 as an unsigned value.  This would allow it to be used consistently
with the other bit-wise operators, but would cause any non-bitwise
interpretation of the result to become implementation defined.  Not very
nice.

> By the time you get done replicating all this for the int2, int4,
> and int8 shift operators, it's not looking like such a small patch
> anymore.  And I still find the premise entirely unconvincing.

After an afternoon of getting utterly bogged down looking into what
other languages do and getting very distracted with ring theory I'm
tempted to reluctantly agree.  Maybe the warning about floats could be
extended to cover the shift operators as well.  Maybe:

  The bit shift operators only return consistent results when both the
  RHS is within the bit-size of the arguments' data type and the LHS is
  positive, in all other cases the behaviour is platform specific.

I think that's technically correct, but seems to come across as very
pessimistic.

> Maybe the user *wants* to see the local behavior of shift, whatever
> it might be.  It's certainly not impossible that we'd break applications
> that worked fine before (at least on the hardware they were being
> used on).

Yes.  It always surprises me how hard getting formal semantics into an
existing language seems to be.


  Sam



Re: [BUGS] BUG #4114: Inconsistent shift operator

2008-04-20 Thread Sam Mason
On Sun, Apr 20, 2008 at 08:17:50PM +0200, Zdenek Kotala wrote:
> Roman Kononov napsal(a):
> >The below test cases show the obvious inconsistency between different
> >integer types.
> 
> It seems to be OK regarding how C shift operator works. Try

Yes, but I interpret this behaviour as not being very useful for people
writing SQL code that uses the shift operators.  C is a very low level
language and you almost always end up writing platform-specific code;
SQL is supposed to be much higher level and should abstract away
system-specific differences.

I've not been able to think of a nice way of doing this though.


  Sam



Re: [BUGS] PB with postgresql 7.4

2008-04-28 Thread Sam Mason
On Mon, Apr 28, 2008 at 02:14:17PM +0200, Nicolas ZABOURI wrote:
> My server is down and none of my dumps were made.
> How can I re-install the data on a new install of the server?

If you can get another copy of 7.4 installed that's built for the same
processor the database was originally running on, and then point it
at your old data directory, it should be able to use it.  I.e. something
like:

  postmaster-7.4 -D /path/to/old/data

It is probably worth working with a copy of the data just in case
something strange happens.

If you want to stay with 7.4 then you're done; if you want to upgrade
to a newer release (e.g. 8.2 or 8.3) of PG then you should take the dump
with the newer version's pg_dump program.

> I can't find a forum that covers this!

The [EMAIL PROTECTED] mailing list is probably the best place
to ask these sorts of questions.  They are described here:

  http://www.postgresql.org/community/lists/


  Sam



Re: [BUGS] BUG #4207: EXTRACT function

2008-05-29 Thread Sam Mason
On Thu, May 29, 2008 at 07:01:46AM +0000, Jeferson Kasper wrote:
> the EXTRACT(field FROM source) function allows the use of various 'fields',
> and I need to know whether a time (like '07:00') is after midday (12:00) or
> not.
> I know I can resolve it with something like "select ('07:00' < '12:00')";
> I just want to know whether it would be difficult to implement a new
> field called 'am', to be used like:
> 
> select extract(AM from '07:00');

What's wrong with just doing:

  SELECT extract(hour FROM date) < 12;
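
With the literal from the report, for example:

  SELECT extract(hour FROM time '07:00') < 12;   -- true for any time before midday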


  Sam
