Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Barry Kimelman
On Fri, Sep 23, 2022 at 4:42 PM Steve Baldwin 
wrote:

> You need to prefix the rhs column(s) with 'excluded.'. For example:
>
> on conflict (company_name) do update set company_name =
> concat(excluded.company_name,'++',excluded.string_company_id);
>
> On Sat, Sep 24, 2022 at 7:28 AM Barry Kimelman 
> wrote:
>
>>
>> I have not been able to get the "ON CONFLICT" clause to work
>> I am using psql 14.4 on ubuntu
>>
>> Given the following table definition
>>
>> CREATE TABLE my_companies (
>>   company_id SERIAL NOT NULL PRIMARY KEY,
>>   second_id INTEGER NOT NULL REFERENCES big_list(second_id),
>>   string_company_id TEXT NOT NULL,
>>   company_name TEXT NOT NULL,
>>   person_name TEXT NOT NULL
>> )
>>   INHERITS ( _audit_table, _delete_table );
>>
>> and the following constraints
>>
>> CREATE UNIQUE INDEX my_companies_id_unique ON
>> my_companies(string_company_id) WHERE delete_timestamp IS NULL;
>> CREATE UNIQUE INDEX my_companies_company_name_unique ON
>> my_companies(company_name) WHERE delete_timestamp IS NULL;
>>
>> I issued the following sql insert in an attempt to use "on conflict" to
>> update the
>> duplicate column name
>>
>> insert into my_companies
>> (second_id,string_company_id,company_name,person_name)
>> values (1,'66','widgets','seller-toto')
>> on conflict (company_name) do update set company_name =
>> concat(company_name,'++',string_company_id);
>>
>> In this example a record with a company name of 'widgets' already exists
>> and will
>> result in an constraint violation
>>
>> when I ran my sql statement I received the following error message
>>
>> bkimelman=# \i insert-companies-1c.sql
>> psql:insert-companies-1c.sql:3: ERROR:  column reference "company_name"
>> is ambiguous
>> LINE 3: ...company_name) do update set company_name = concat(company_na...
>>
>> I tried fully qualifying the column names in the concat() function call,
>> but all that did was get me a different error message
>>
>> What would be the proper format for the "on conflict" clause ?
>>
>

Thanks for the response. But when I "prefixed" the columns with "excluded"
I got a different error message

ERROR: there is no unique or exclusion constraint matching the ON CONFLICT
specification


Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Christophe Pettus



> On Sep 24, 2022, at 07:29, Barry Kimelman  wrote:
> 
> CREATE UNIQUE INDEX my_companies_company_name_unique ON 
> my_companies(company_name) WHERE delete_timestamp IS NULL;

The issue here is that the unique index is partial (it has a WHERE clause).  In 
order to use that as an arbiter, you need include a WHERE clause in the ON 
CONFLICT clause that matches the one on the index.

I believe that something like:

insert into my_companies (second_id,string_company_id,company_name,person_name)
values (1,'66','widgets','seller-toto')
on conflict (company_name) where delete_timestamp IS NULL do update set 
company_name = concat(company_name,'++',string_company_id)

... will work.  Note that if you do an insert with a duplicate "company_name", 
but "delete_timestamp" not null, it *won't* treat that as a conflict and won't 
run the ON CONFLICT action; it'll just insert the row.



Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Barry Kimelman
On Sat, Sep 24, 2022 at 9:47 AM Christophe Pettus  wrote:

>
>
> > On Sep 24, 2022, at 07:29, Barry Kimelman  wrote:
> >
> > CREATE UNIQUE INDEX my_companies_company_name_unique ON
> my_companies(company_name) WHERE delete_timestamp IS NULL;
>
> The issue here is that the unique index is partial (it has a WHERE
> clause).  In order to use that as an arbiter, you need include a WHERE
> clause in the ON CONFLICT clause that matches the one on the index.
>
> I believe that something like:
>
> insert into my_companies
> (second_id,string_company_id,company_name,person_name)
> values (1,'66','widgets','seller-toto')
> on conflict (company_name) where delete_timestamp IS NULL do update set
> company_name = concat(company_name,'++',string_company_id)
>
> ... will work.  Note that if you do an insert with a duplicate
> "company_name", but "delete_timestamp" not null, it *won't* treat that as a
> conflict and won't run the ON CONFLICT action; it'll just insert the row.



Thanks for the response. When I ran the INSERT with your suggested change I
got an error message telling me
"column reference 'company_name' is ambiguous"


Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Christophe Pettus



> On Sep 24, 2022, at 08:29, Barry Kimelman  wrote:
> Thanks for the response. When I ran the INSERT with your suggested change I 
> got an error message telling me
> "column reference 'company_name' is ambiguous"

As previously noted, you'll need to do both: add "excluded." to qualify the 
column names in the UPDATE.  Here's a contrived example:

xof=# create table t(i integer, j integer, k integer);
CREATE TABLE
xof=# create unique index on t(i) where j != 0;
CREATE INDEX
xof=# insert into t(i, j, k) values(1, 2, 3) on conflict (i) do update set 
k=k+1;
ERROR:  column reference "k" is ambiguous
LINE 1: ..., j, k) values(1, 2, 3) on conflict (i) do update set k=k+1;
xof=# insert into t(i, j, k) values(1, 2, 3) on conflict (i) do update set 
k=excluded.k+1;
ERROR:  there is no unique or exclusion constraint matching the ON CONFLICT 
specification
xof=# insert into t(i, j, k) values(1, 2, 3) on conflict (i) where j != 0 do 
update set k=excluded.k+1;
INSERT 0 1





Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Barry Kimelman
On Sat, Sep 24, 2022 at 10:44 AM Christophe Pettus  wrote:

>
>
> > On Sep 24, 2022, at 08:29, Barry Kimelman  wrote:
> > Thanks for the response. When I ran the INSERT with your suggested
> change I got an error message telling me
> > "column reference 'company_name' is ambiguous"
>
> As previously noted, you'll need to do both: add "excluded." to qualify
> the column names in the UPDATE.  Here's a contrived example:
>
> xof=# create table t(i integer, j integer, k integer);
> CREATE TABLE
> xof=# create unique index on t(i) where j != 0;
> CREATE INDEX
> xof=# insert into t(i, j, k) values(1, 2, 3) on conflict (i) do update set
> k=k+1;
> ERROR:  column reference "k" is ambiguous
> LINE 1: ..., j, k) values(1, 2, 3) on conflict (i) do update set k=k+1;
> xof=# insert into t(i, j, k) values(1, 2, 3) on conflict (i) do update set
> k=excluded.k+1;
> ERROR:  there is no unique or exclusion constraint matching the ON
> CONFLICT specification
> xof=# insert into t(i, j, k) values(1, 2, 3) on conflict (i) where j != 0
> do update set k=excluded.k+1;
> INSERT 0 1
>
>
I added the excluded reference as you stated. I ran the INSERT and I did
indeed get back a response of "INSERT  0 1"
However, when I ran a select to look at the table, nothing had been
inserted.
I thought the whole point of ON CONFLICT DO UPDATE was so that you could
modify the data so that it would be inserted


Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Christophe Pettus



> On Sep 24, 2022, at 08:49, Barry Kimelman  wrote:
> I thought the whole point of ON CONFLICT DO UPDATE was so that you could 
> modify the data so that it would be inserted

ON CONFLICT DO UPDATE allows you to modify the existing row that conflicted 
with the row being inserted, but it won't allow you to do change a row to 
resolve a conflict, and then insert it.



Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Barry Kimelman
On Sat, Sep 24, 2022 at 10:56 AM Christophe Pettus  wrote:

>
>
> > On Sep 24, 2022, at 08:49, Barry Kimelman  wrote:
> > I thought the whole point of ON CONFLICT DO UPDATE was so that you could
> modify the data so that it would be inserted
>
> ON CONFLICT DO UPDATE allows you to modify the existing row that
> conflicted with the row being inserted, but it won't allow you to do change
> a row to resolve a conflict, and then insert it.


Unfortunately, that is exactly what I need to do, I need to modify the data
on the 2nd request so that it also gets inserted.
This is a business requirement. I need to insert all of the data, including
the ones with conflicts


Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Christophe Pettus



> On Sep 24, 2022, at 09:04, Barry Kimelman  wrote:
> Unfortunately, that is exactly what I need to do, I need to modify the data 
> on the 2nd request so that it also gets inserted.

The best approach is to optimistically insert the row, catch the error if 
there's a conflict, and modify the data and try again.  If a row with the same 
conflict can come from multiple sessions, you'll need to decide how to handle 
the case that some other session "wins" and inserts the row, and you get a 
conflict again.





Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread David G. Johnston
On Sat, Sep 24, 2022 at 8:49 AM Barry Kimelman  wrote:

>
> I thought the whole point of ON CONFLICT DO UPDATE was so that you could
> modify the data so that it would be inserted
>

Nope, the words "DO UPDATE" mean "DO an UPDATE command instead of failing
for the CONFLICTing INSERT command".  As mentioned, if you want to ensure
you perform an insert of a new record the data you are inserting must not
conflict with existing data.  You must do that prior to executing the
command.

Or, in the words of the documentation:

The optional ON CONFLICT clause specifies an alternative action to raising
a unique violation or exclusion constraint violation error.
...
ON CONFLICT DO UPDATE updates the existing row that conflicts with the row
proposed for insertion as its alternative action.

https://www.postgresql.org/docs/current/sql-insert.html

Which seems sufficiently clearly written.

David J.


Re: problem with on conflict / do update using psql 14.4

2022-09-24 Thread Adrian Klaver

On 9/24/22 09:04, Barry Kimelman wrote:


On Sat, Sep 24, 2022 at 10:56 AM Christophe Pettus > wrote:




 > On Sep 24, 2022, at 08:49, Barry Kimelman mailto:blkimel...@gmail.com>> wrote:
 > I thought the whole point of ON CONFLICT DO UPDATE was so that
you could modify the data so that it would be inserted

ON CONFLICT DO UPDATE allows you to modify the existing row that
conflicted with the row being inserted, but it won't allow you to do
change a row to resolve a conflict, and then insert it.


Unfortunately, that is exactly what I need to do, I need to modify the 
data on the 2nd request so that it also gets inserted.
This is a business requirement. I need to insert all of the data, 
including the ones with conflicts


Do the conflicting inserts need to be in the same table?

If not you could a trigger function that inserts the conflicts to 
another table and then UNION that table to the primary for query purposes.


--
Adrian Klaver
adrian.kla...@aklaver.com





Order by in a sub query when aggregating the main query

2022-09-24 Thread Federico
Hi all,

I have a question related to the order by clause used in a subquery of
a main query that uses one or more aggregation functions with a group
by.
A basic example of the type of query in question is the following (see
below for the actual query):

  select w, array_agg(x)
  from (
select v, v / 10 as w
from pg_catalog.generate_series(25, 0, -1) as t(v)
order by v
  ) as t(x)
  group by w

This query will return an ordered array as specified by the order by
clause.in the subquery.
Can this behaviour be relied upon?
>From what I could find from searching in SQL the order by in a
subquery could be ignored by the engines, but I've found that
postgresql will always respect it.

The context of the question is the updated reflection logic that will
be introduced in version 2 of SQLAlchemy, that makes use of orderby in
subqueries to, for example, match column index of a constraint with
the column name of a table. This query and other similar one return
the correct result, and they seem stable in their output (ie the CI is
not randomly failing because the order has changed). For more
information this potential issue with the current query is traket in
the issue https://github.com/sqlalchemy/sqlalchemy/issues/8561
Below is the full query that will be used in sqlalchemy to reflect
constraints given the constraint type and on a list of table oids:

  select
attr.conrelid,
array_agg(attr.attname) as cols,
attr.conname,
min(attr.description) as description
  from (
select
  con.conrelid as conrelid,
  con.conname as conname,
  con.description as description,
  pg_catalog.pg_attribute.attname as attname
from pg_catalog.pg_attribute
join (
  select
pg_catalog.pg_constraint.conrelid as conrelid,
pg_catalog.pg_constraint.conname as conname,
unnest(pg_catalog.pg_constraint.conkey) as attnum,
generate_subscripts(pg_catalog.pg_constraint.conkey,
%(generate_subscripts_1)s) as ord,
pg_catalog.pg_description.description as description
  from pg_catalog.pg_constraint
  left outer join pg_catalog.pg_description on
pg_catalog.pg_description.objoid = pg_catalog.pg_constraint.oid
  where
pg_catalog.pg_constraint.contype = :contype
and pg_catalog.pg_constraint.conrelid in (:oids)
  ) as con on
  pg_catalog.pg_attribute.attnum = con.attnum
  and pg_catalog.pg_attribute.attrelid = con.conrelid
order by con.conname, con.ord
  ) as attr
  group by attr.conrelid, attr.conname
  order by attr.conrelid, attr.conname

The other reflection queries that use order by in subqueries are
similar to the above, I can post them here if they may prove useful.

Thank you
Federico




Re: Order by in a sub query when aggregating the main query

2022-09-24 Thread Tom Lane
Federico  writes:
> A basic example of the type of query in question is the following (see
> below for the actual query):

>   select w, array_agg(x)
>   from (
> select v, v / 10 as w
> from pg_catalog.generate_series(25, 0, -1) as t(v)
> order by v
>   ) as t(x)
>   group by w

> This query will return an ordered array as specified by the order by
> clause.in the subquery.
> Can this behaviour be relied upon?

No, not really.  It might always work given a particular set of
circumstances.  As long as the planner chooses to do the outer
query's grouped aggregation as a HashAgg, there'd be no reason
for it to reshuffle the subquery output before feeding that to
array_agg.  However, if it decided that sort-group-and-aggregate
was better, it'd insert a sort by w above the subquery, and then
you'd lose any certainty of the ordering by v continuing to hold.
(Maybe the sort by w would be stable for equal keys, but that's
not guaranteed.)

What you really ought to do is write

  select w, array_agg(x order by x)
  from ...

to be in the clear per SQL standard.  I think that right now that'd
incur additional sorting overhead, which is annoying.  But work is
ongoing to recognize when the input is already correctly sorted
for an aggregate, so it should get better in PG 16 or so.

regards, tom lane




Re: Order by in a sub query when aggregating the main query

2022-09-24 Thread Federico
Understood, thanks for the explanation.
I'll work on updating the queries used by sqlalchemy to do array_agg(x
order by x) instead of the order by in the subquery.

> I think that right now that'd
> incur additional sorting overhead, which is annoying.  But work is
> ongoing to recognize when the input is already correctly sorted
> for an aggregate, so it should get better in PG 16 or so.

Nice to know, hopefully it's too bad for this use case

Thanks, Federico Caselli

On Sun, 25 Sept 2022 at 00:20, Tom Lane  wrote:
>
> Federico  writes:
> > A basic example of the type of query in question is the following (see
> > below for the actual query):
>
> >   select w, array_agg(x)
> >   from (
> > select v, v / 10 as w
> > from pg_catalog.generate_series(25, 0, -1) as t(v)
> > order by v
> >   ) as t(x)
> >   group by w
>
> > This query will return an ordered array as specified by the order by
> > clause.in the subquery.
> > Can this behaviour be relied upon?
>
> No, not really.  It might always work given a particular set of
> circumstances.  As long as the planner chooses to do the outer
> query's grouped aggregation as a HashAgg, there'd be no reason
> for it to reshuffle the subquery output before feeding that to
> array_agg.  However, if it decided that sort-group-and-aggregate
> was better, it'd insert a sort by w above the subquery, and then
> you'd lose any certainty of the ordering by v continuing to hold.
> (Maybe the sort by w would be stable for equal keys, but that's
> not guaranteed.)
>
> What you really ought to do is write
>
>   select w, array_agg(x order by x)
>   from ...
>
> to be in the clear per SQL standard.  I think that right now that'd
> incur additional sorting overhead, which is annoying.  But work is
> ongoing to recognize when the input is already correctly sorted
> for an aggregate, so it should get better in PG 16 or so.
>
> regards, tom lane