Aggregate functions with FROM clause and ROW_COUNT diagnostics

2018-05-21 Thread Alexey Dokuchaev
Hi,

I'm seeing somewhat confusing results here with 9.6.8, and cannot find
the answer in the docs or google.

I'm returning a JSON array (or any array, it makes no difference)
from my plpgsql function like this:

OUT retcode int,
OUT result json)
. . .
result := json_agg(_) FROM (
  SELECT foo, bar, baz ...
  FROM t1, t2, t3 WHERE ...) AS _;  -- this works fine

GET DIAGNOSTICS retcode = ROW_COUNT;  -- always returns 1

I'd expected `retcode' to contain the number of SELECT'ed rows, but it
is something else (always 1).  Apparently, aggregation functions like
json_agg()/array_agg() mangle the ROW_COUNT from the inner SELECT (the
one I'm interested in).

Is this expected and correct behavior?  Is it possible to obtain the
inner SELECT's ROW_COUNT without running the query twice?  Thanks,

./danfe



Re: Aggregate functions with FROM clause and ROW_COUNT diagnostics

2018-05-21 Thread Alexey Dokuchaev
On Mon, May 21, 2018 at 10:45:27AM -0400, Tom Lane wrote:
> OP could do something like
> 
> declare r record;
> ...
> select json_agg(_) as j, count(*) as c INTO r FROM (
>   SELECT foo, bar, baz ...
>   FROM t1, t2, t3 WHERE ...) AS _;
> 
> This would be slightly more expensive than doing only the one aggregate,
> but it should beat anything involving a temp table.

Thanks, I've arrived at the same solution (using a local RECORD) eventually.
It works as intended, but I still have to assign the OUT parameters by hand:

retcode := tmp.c;
result := tmp.j;

I'd love to get rid of OUT parameters and return RECORD directly (it also
would make the code more "functional" in style), but then I'd be getting
this annoying ``a column definition list is required for functions
returning "record"'' error which I don't know how to avoid.  Using OUT's
simplifies things for callers (they don't have to annotate the types by
hand) while still preserving type safety.
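
To illustrate the difference on the caller's side (my_func is just a
hypothetical name here):

    -- with OUT parameters, callers get typed columns for free:
    SELECT retcode, result FROM my_func();

    -- with plain RETURNS record, callers must supply the column
    -- definition list themselves, or they get the error quoted above:
    SELECT * FROM my_func() AS t(retcode int, result json);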

On Mon, May 21, 2018 at 07:30:44AM -0700, David G. Johnston wrote:
> > ...
> > Is this expected and correct behavior?
>
> Yes, the query returned only one row, with a single json column.  You
> wrote the equivalent of:
>
> SELECT json_agg(...) FROM ... INTO result;
>
> And you are getting the count of the top-most select (which is implied
> in the syntax that you used).

I see, thanks; I had overlooked the fact that there is still another,
implicit SELECT involved.

./danfe



Re: Aggregate functions with FROM clause and ROW_COUNT diagnostics

2018-05-21 Thread Alexey Dokuchaev
On Mon, May 21, 2018 at 12:20:52PM -0400, Tom Lane wrote:
> Alexey Dokuchaev  writes:
> > On Mon, May 21, 2018 at 10:45:27AM -0400, Tom Lane wrote:
> >> OP could do something like
> >>select json_agg(_) as j, count(*) as c INTO r FROM (
> 
> > Thanks, I've arrived at the same solution (using local RECORD) eventually.
> > It works as intended, but I still need to assign OUT parameters by hand:
> 
> > retcode := tmp.c;
> > result := tmp.j;
> 
> Actually, it should work to do
> 
>   select json_agg(_), count(*) INTO result, retcode FROM ...

Bingo!  Thanks Tom, you're the best (as always).  Quick reality-check
question: are count(*) and count(_) equivalent above?  I vaguely recall
that count(1) became somewhat slower than count(*) after 8.2, but I cannot
back it up right now.
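
For the archives, a minimal sketch of how the whole function ends up
looking (t1/t2/t3 and the column names are stand-ins for the real query):

    CREATE OR REPLACE FUNCTION get_stuff(
        OUT retcode int,
        OUT result  json)
    AS $$
    BEGIN
        -- one pass over the data: json_agg() fills "result",
        -- count(*) fills "retcode" with the number of inner rows
        SELECT json_agg(_), count(*) INTO result, retcode FROM (
            SELECT foo, bar, baz
            FROM t1, t2, t3
            WHERE true              -- stand-in for the real conditions
        ) AS _;
    END;
    $$ LANGUAGE plpgsql;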

./danfe



Re: Aggregate functions with FROM clause and ROW_COUNT diagnostics

2018-05-21 Thread Alexey Dokuchaev
On Mon, May 21, 2018 at 12:54:56PM -0400, Tom Lane wrote:
> Alexey Dokuchaev  writes:
> > Quick reality check question: are count(*) vs. count(_) equivalent
> > above?
> 
> Only if _ is guaranteed non-null ... which, as a rowtype result, it
> probably is.  But I'd use count(*) if you're concerned about speed.

Understood.  Yes, my primary concern (after correctness) is speed.  Thank
you all again for the very helpful and prompt replies, guys.
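
P.S.  The NULL distinction Tom mentions is easy to see (made-up values):

    select count(*), count(x) from (values (1), (null), (3)) as _(x);
    -- count(*) = 3 (all rows), count(x) = 2 (NULLs are skipped)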

./danfe



(2^63 - 1)::bigint => out of range? (because of the double precision)

2018-06-08 Thread Alexey Dokuchaev
Hi there,

I've decided to run some tests to see how my tables' ids would survive
once their backing sequences start hitting their MAXVALUEs, by doing
"SELECT setval('foo_id_seq', ~maxbigint)".  As I don't like to hardcode
numbers (esp. huge ones, because sequences are always[*] bigints), I
tried to use (2^63 - 1)::bigint as "maxbigint", to no avail, in contrast
to (2^31 - 1)::int (below, -> is shorthand for "returns"):

  select (2^31 - 1)::int                -> 2147483647 (correct)

  select (2^63 - 1)::bigint             -> bigint out of range (???)
  select (9223372036854775807)::bigint  -> 9223372036854775807 (correct)

Apparently, this is because 2^63 is computed in double precision, which
is inexact: the nearest double to 2^63 - 1 is 2^63 itself, one past the
bigint maximum, hence the error.  If I explicitly cast either 2 or 63 to
::numeric, it behaves as expected:

  select (2::numeric^63 - 1)::bigint-> 9223372036854775807 (ok)
  select (2^63::numeric - 1)::bigint-> 9223372036854775807 (ditto)

What is the rationale for (int ^ int) returning double precision rather
than numeric?  Am I missing something obvious here?
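
For the record, pg_typeof() confirms which flavour of "^" gets picked:

    select pg_typeof(2^63);            -- double precision
    select pg_typeof(2::numeric^63);   -- numeric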

./danfe

P.S.  On a tangentially related note, why is "NO CYCLE" the default
for sequences?

[*] Per documentation, "The [SQL] standard's AS <data type> expression
is not supported."  Another "why is it so?" question, btw. ;-)



Re: (2^63 - 1)::bigint => out of range? (because of the double precision)

2018-06-09 Thread Alexey Dokuchaev
On Fri, Jun 08, 2018 at 10:30:45AM -0700, Adrian Klaver wrote:
> On 06/08/2018 10:23 AM, Alexey Dokuchaev wrote:
> >I've decided to run some tests to see how my tables' ids would survive
> >when their yielding sequences would start hitting their MAXVALUE's, by
> >doing some "SELECT setval('foo_id_seq', ~maxbigint)".  As I don't like
> >to hardcode numbers (esp. huge numbers, because sequences are always[*]
> >bigint's)
> 
> Not always, bigints are just the default. All those cases where folks
> use the serial 'type' are getting an int sequence:
> 
> \d+ serial_test2_id_seq
> Sequence "public.serial_test2_id_seq"
>   Type   | Start | Minimum |  Maximum   | Increment | Cycles? | Cache
> ---------+-------+---------+------------+-----------+---------+-------
>  integer |     1 |       1 | 2147483647 |         1 | no      |     1

Apparently this is only in 10+, while I'm kind of confined to 9.x ATM.

> >What is the rationale for (int ^ int) to return double precision rather
> >than numeric?  I am missing something obvious here?
> 
> Not sure, someone else will have to explain.

Tom already did, thanks Tom!

> >On a tangentally related note, why is "NO CYCLE" is the default
> >for sequences?
> 
> My guess is because sequences are often used to provide numbers for a
> PRIMARY KEY and NO CYCLE is a heads up for key duplication before the
> PK code kicks in.

OK, but what about highly volatile tables for come-and-go type of things?
Think of a session pool, or task queue.  I want to use NO CYCLE for this
kind of tables as it would allow me to never worry about hitting "nextval:
reached maximum value of sequence" error, recycle ids (because they come
and go), and still be safe because PK constraint protects me.  Any flaws
in this vision of mine?

./danfe



Re: (2^63 - 1)::bigint => out of range? (because of the double precision)

2018-06-09 Thread Alexey Dokuchaev
On Sat, Jun 09, 2018 at 07:20:17AM -0700, Adrian Klaver wrote:
> On 06/09/2018 05:24 AM, Alexey Dokuchaev wrote:
> >OK, but what about highly volatile tables for come-and-go type of things?
> >Think of a session pool, or task queue.  I want to use NO CYCLE for this
> >kind of tables as it would allow me to never worry about hitting "nextval:
> >reached maximum value of sequence" error, recycle ids (because they come
> >and go), and still be safe because PK constraint protects me.  Any flaws
> >in this vision of mine?
> 
> Assuming you meant CYCLE not NO CYCLE, I see no issue.

Oh, mea culpa, I meant CYCLE of course (in the quoted paragraph above).
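
Something along these lines is what I had in mind (names are made up):

    -- the sequence wraps around instead of erroring out; the PK still
    -- refuses an id that happens to be in use at that moment
    CREATE SEQUENCE session_id_seq MAXVALUE 1000000 CYCLE;

    CREATE TABLE session_pool (
        id      bigint PRIMARY KEY DEFAULT nextval('session_id_seq'),
        payload text
    );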

> If you do use a sequence with NO CYCLE you can use ALTER SEQUENCE some_seq
> RESTART to reset it:
> https://www.postgresql.org/docs/10/static/sql-altersequence.html

I understand that I can reset it; the idea was to minimize table and
sequence maintenance while letting it work, well, forever.  Hence the
CYCLE idea.  Anyway, I hear you; thanks, Adrian.

./danfe



Catching unique_violation exception on specific column/index

2018-06-11 Thread Alexey Dokuchaev
Hi there,

I have a table with several UNIQUE and CHECK constraints.  One of these
UNIQUE constraints actually *can* be violated -- not on the table level,
of course, but on the application level -- meaning, if an entry with a
particular foo_key is already in there, do not throw an exception, just
silently do nothing.

The usual approach ("EXCEPTION WHEN unique_violation THEN ... END") does
not really cut it because I want to catch unique_violation only when it
happens on "foo_key", and still rightfully complain on others.  However,
there is no "EXCEPTION WHEN unique_violation ON (foo_key)" or something.
Is there a way to do this without using triggers and in a less ugly way
than the code below?

IF SQLERRM = 'duplicate key value violates unique constraint' ||
             ' "foo_key"' THEN
    RAISE NOTICE '%: %', SQLSTATE, SQLERRM;
ELSE
    RAISE EXCEPTION '%: %', SQLSTATE, SQLERRM;
END IF;

./danfe



Re: Catching unique_violation exception on specific column/index

2018-06-11 Thread Alexey Dokuchaev
On Mon, Jun 11, 2018 at 05:10:33PM +0700, Alexey Dokuchaev wrote:
> The usual approach ("EXCEPTION WHEN unique_violation THEN ... END") does
> not really cut it because I want to catch unique_violation only when it
> happens on "foo_key", and still rightfully complain on others.  However,
> there is no "EXCEPTION WHEN unique_violation ON (foo_key)" or something.
> Is there a way to do this without using triggers and in a less ugly way
> than the code below?

Oh, I completely forgot about rules:

CREATE OR REPLACE RULE ignore_duplicate_foo AS
  ON INSERT TO table_bar WHERE EXISTS (
SELECT 1 FROM table_bar WHERE foo = NEW.foo)
  DO INSTEAD NOTHING;

Sorry for the noise.

./danfe



Re: Catching unique_violation exception on specific column/index

2018-06-11 Thread Alexey Dokuchaev
On Mon, Jun 11, 2018 at 12:30:13PM +0200, Thomas Kellerer wrote:
> Alexey Dokuchaev schrieb am 11.06.2018 um 12:10:
> > I have a table with several UNIQUE and CHECK constraints.  One of these
> > UNIQUE constraints actually *can* be violated -- not on the table level,
> > of course, but on the application level -- meaning, if the entry with
> > particular foo_key is already in there, do not throw an exception, just
> > silently do nothing.
>  
> What's wrong with:
> 
>   INSERT ...
>   ON CONFLICT (foo_key) DO NOTHING

Nothing I guess, except that it is only available since 9.5 (right?), and I
try to stay compatible with 9.3.  Sorry for not saying so in the first place.

./danfe



Re: Catching unique_violation exception on specific column/index

2018-06-11 Thread Alexey Dokuchaev
On Mon, Jun 11, 2018 at 01:26:16PM +0200, Thomas Kellerer wrote:
> If that functionality is an important part of your code, you should
> consider upgrading to 10 (or 9.6 if you are really conservative)
> sooner rather than later.

Oh well, fair enough.  As much as I'd love to stick to the lowest
supported (and sometimes even unsupported) versions, ON CONFLICT is
indeed very handy, esp. since I already have a few UPSERTs implemented
the old way (via catching the "unique_violation" exception).
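
By the "old way" I mean essentially the merge_db() example from the
PL/pgSQL "Trapping Errors" documentation:

    CREATE FUNCTION merge_db(key int, data text) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update the key
            UPDATE db SET b = data WHERE a = key;
            IF found THEN
                RETURN;
            END IF;
            -- not there, so try to insert; a concurrent insert of the
            -- same key raises unique_violation, which we simply retry
            BEGIN
                INSERT INTO db (a, b) VALUES (key, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                NULL;  -- loop back and try the UPDATE again
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;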

In case I do update to 9.6/10, I have a slightly off-topic (w.r.t. the
original subject) question: right now, when I need to get the length of
an array (never a multidimensional one), I do this:

coalesce(array_length(foo, 1), 0);

In 9.4+, I can call cardinality() instead.  I'm a bit hesitant, though:
is doing so semantically correct, or should I still do coalesce(..., 0)
in this case?  This is not about the outcome; it's about whether
cardinality() is the semantically right way to obtain the number of
array items, or whether it was introduced for other purposes.
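
A quick comparison of the two on an empty array:

    select array_length('{}'::int[], 1);  -- NULL (hence the coalesce above)
    select cardinality('{}'::int[]);      -- 0
    select cardinality(ARRAY[1,2,3]);     -- 3 (total number of elements)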

./danfe