Re: survey: psql syntax errors abort my transactions

2020-07-06 Thread Laurenz Albe
On Fri, 2020-07-03 at 12:46 -0500, Ron wrote:
> > This is my favorite example why I like the way PostgreSQL does things:
> > 
> > /* poor man's VACUUM (FULL) */
> > BEGIN;
> > CREATTE TABLE t2 AS SELECT * FROM t1;
> > DROP TABLE t1;
> > ALTER TABLE t2 RENAME TO t1;
> > COMMIT;
> 
> How so, since it does not carry over indexes, foreign keys, triggers, 
> partition references, etc?

It is an example of what a transaction could look like that
would suffer from statement-level rollback.

I am not claiming that that code as such is very useful.
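To spell out the hazard the example illustrates: under statement-level rollback (as some other databases do it), a failed statement would not abort the whole transaction. A hypothetical session sketch, assuming such a mode were in effect:

```sql
BEGIN;
CREATTE TABLE t2 AS SELECT * FROM t1;  -- syntax error: with statement-level
                                       -- rollback only this statement fails
DROP TABLE t1;                         -- ...but this still runs
ALTER TABLE t2 RENAME TO t1;           -- fails: t2 was never created
COMMIT;                                -- t1 is gone, data lost

-- In stock PostgreSQL the syntax error aborts the whole transaction,
-- so the DROP is rejected and COMMIT effectively rolls back.
```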

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





PostgreSQL server does not increment a SERIAL internally

2020-07-06 Thread Matthias Apitz


Hello,

My team and I spent a full weekend hunting a bug in our Perl-written
software where rows were inserted with the same id 'acq_haushalt.hnr',
which should not have been the case because any budget year in that
table has a single internal number 'hnr'.

The table in the 11.4 server is created as:

create table acq_haushalt (
  hnr  serial     not NULL ,   /*  internal budget year number primary key   */
  hjahr smallint   not NULL , /*  budget year  */
  stufe smallint   not NULL , /*  level  0,1,2,3*/
  kurzname char (16)  ,   /*  short name for ... */
  ...
  );

We update the serial 'acq_haushalt_hnr_seq' with this statement after loading:

/* table: acq_haushalt */
DO $$
DECLARE
  max_id int;
BEGIN
 if to_regclass('acq_haushalt') is not null then
  SELECT INTO max_id GREATEST(COALESCE(max(hnr), 0),0) + 1 FROM acq_haushalt;
  RAISE NOTICE '% % %', 'acq_haushalt', 'hnr', max_id ;
  EXECUTE 'ALTER SEQUENCE acq_haushalt_hnr_seq RESTART ' || max_id::text;
 end if;
END $$ LANGUAGE plpgsql;
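An equivalent, shorter alternative (a sketch, not from the original post) is setval() with is_called = false, which makes the next nextval() return the given value, matching the ALTER SEQUENCE ... RESTART semantics above:

```sql
-- make nextval() return max(hnr) + 1, like ALTER SEQUENCE ... RESTART
SELECT setval('acq_haushalt_hnr_seq', COALESCE(max(hnr), 0) + 1, false)
FROM acq_haushalt;
```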


Usage in Perl DBI to get the next value for acq_haushalt.hnr:

 if ( &getDBDriverName eq 'Pg') {
 $erg = &selectData("SELECT NEXTVAL('acq_haushalt_hnr_seq')",[]);
 if ($erg->{'CountData'} == 0) {
   $newhnr=1;
 }else{
   $newhnr=$erg->{'Data'}->[0]->{$erg->{'Fields'}->[0]->{'Name'}};
 }
 } else {   # code block for Sybase ...

 }

But the serial was not incremented internally, as we could see with
'psql', and so more than one row was built and inserted with the same
number in $newhnr.

What helped was using:

 $erg = &selectData("SELECT max(hnr) FROM acq_haushalt",[]);
 if ($erg->{'CountData'} == 0) {
  $newhnr=1;
 }else{
  $newhnr=$erg->{'Data'}->[0]->{$erg->{'Fields'}->[0]->{'Name'}}+1;
 }
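A pattern that sidesteps handing out ids in the application entirely (a sketch with made-up column values, assuming hnr keeps its serial DEFAULT drawing from acq_haushalt_hnr_seq):

```sql
-- hnr is filled in by the serial DEFAULT from acq_haushalt_hnr_seq;
-- RETURNING hands the generated value back to the application,
-- so there is no race between reading a value and inserting it
INSERT INTO acq_haushalt (hjahr, stufe, kurzname)
VALUES (2020, 0, 'budget2020')
RETURNING hnr;
```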

What are we doing wrong?

Thanks

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!




Using Postgres jdbc driver with Oracle SQL Developer

2020-07-06 Thread Niels Jespersen
Hello all

Oracle SQL Developer allows you to add what they call a Third Party JDBC 
Driver. With that you can connect and use SQL Developer against Postgres. So 
far, so good. But when you connect, there is a button "Choose database" that 
fills a dropdown box with the database names available on the cluster. This 
works great if you connect with the user named postgres, as it will connect to 
the database named postgres and check for available databases. However, if you 
try to connect using, for instance, username "user1", it will try to connect 
to a database named "user1". This mostly fails, since you most often do not 
have a database named exactly like your username.

I think this is a bug in Oracle's call of the Postgres JDBC driver, because it 
does not send a database name, and the driver defaults to a database named 
after the username.

I may file a service request with Oracle to have this fixed down the road 
sometime.

In the meantime, is there a way to provide a default database name to Postgres 
JDBC out-of-band from the actual calls from Oracle to Postgres JDBC?

In the documentation 
https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters
 it says "If a property is specified both in URL and in Properties object, the 
value from Properties object is ignored." How can I inject a property object 
into the gap between Oracle and Postgres JDBC?
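For reference, the driver takes the database name from the URL; if SQL Developer lets you edit the full custom JDBC URL, the database can be named explicitly there, which sidesteps the defaulting problem. Host, port, and names below are placeholders, not values from this thread:

```text
# PostgreSQL JDBC URL format: jdbc:postgresql://host[:port]/database
jdbc:postgresql://dbhost:5432/mydb
jdbc:postgresql://dbhost:5432/mydb?user=user1&password=secret
```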

Regards Niels Jespersen


SV: Using Postgres jdbc driver with Oracle SQL Developer

2020-07-06 Thread Niels Jespersen
Sorry, "way to provide a default username" should have been "way to provide a 
default databasename"



Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread Robins Tharakan
Hi,

When a SQL query needs to UNION constants on either side, it should be
possible to implicitly apply a LIMIT 1 and get a good speed-up. Is this an
incorrect understanding, or something already discussed but rejected for
some reason?

This need came up while reviewing generated SQL, where the need was to
return true when at least one of two lists had a row. A simplified version
is given below:

(SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
vs.
(select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); --
Faster




postgres=# explain analyse (select 1 from generate_series(1,1)) UNION (select 1 from generate_series(1,1));
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=550.00..750.00 rows=2 width=4) (actual time=54.847..54.866 rows=1 loops=1)
   Group Key: (1)
   ->  Append  (cost=0.00..500.00 rows=2 width=4) (actual time=0.782..40.215 rows=2 loops=1)
         ->  Function Scan on generate_series  (cost=0.00..100.00 rows=1 width=4) (actual time=0.780..7.542 rows=1 loops=1)
         ->  Function Scan on generate_series generate_series_1  (cost=0.00..100.00 rows=1 width=4) (actual time=0.929..7.706 rows=1 loops=1)
 Planning Time: 0.055 ms
 Execution Time: 55.535 ms
(7 rows)

postgres=# explain analyse (select 1 from generate_series(1,1) limit 1) UNION (select 1 from generate_series(1,1) limit 1);
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.07..0.08 rows=2 width=4) (actual time=1.292..1.298 rows=1 loops=1)
   ->  Sort  (cost=0.07..0.07 rows=2 width=4) (actual time=1.290..1.292 rows=2 loops=1)
         Sort Key: (1)
         Sort Method: quicksort  Memory: 25kB
         ->  Append  (cost=0.00..0.06 rows=2 width=4) (actual time=0.554..1.266 rows=2 loops=1)
               ->  Limit  (cost=0.00..0.01 rows=1 width=4) (actual time=0.552..0.554 rows=1 loops=1)
                     ->  Function Scan on generate_series  (cost=0.00..100.00 rows=1 width=4) (actual time=0.550..0.551 rows=1 loops=1)
               ->  Limit  (cost=0.00..0.01 rows=1 width=4) (actual time=0.706..0.707 rows=1 loops=1)
                     ->  Function Scan on generate_series generate_series_1  (cost=0.00..100.00 rows=1 width=4) (actual time=0.704..0.705 rows=1 loops=1)
 Planning Time: 0.096 ms
 Execution Time: 1.847 ms
(11 rows)


postgres=# select version();
                                                  version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), 64-bit
(1 row)

-
robins


Re: PostgreSQL server does not increment a SERIAL internally

2020-07-06 Thread Sándor Daku
Hi Matthias,

On Mon, 6 Jul 2020 at 11:43, Matthias Apitz  wrote:

>
> Hello,
>
> My team and I spent a full weekend hunting a bug in our Perl-written
> software where rows were inserted with the same id 'acq_haushalt.hnr',
> which should not have been the case because any budget year in that
> table has a single internal number 'hnr'.
>
> The table in the 11.4 server is created as:
>
> create table acq_haushalt (
>   hnr  serial     not NULL ,   /*  internal budget year number primary key   */
>   hjahr smallint   not NULL , /*  budget year  */
>   stufe smallint   not NULL , /*  level  0,1,2,3*/
>   kurzname char (16)  ,   /*  short name for ... */
>   ...
>   );
>
> We update the serial 'acq_haushalt_hnr_seq' with this statement after
> loading:
>

What does "loading" mean, and why do you reset the sequence after loading?
(And as far as I can see, you are setting it to the value it most likely
already has.) My guess is that your problem lurks somewhere here, as in
certain circumstances you reset it to an incorrect (previous) value.


> /* table: acq_haushalt */
> DO $$
> DECLARE
>   max_id int;
> BEGIN
>  if to_regclass('acq_haushalt') is not null then
>   SELECT INTO max_id GREATEST(COALESCE(max(hnr), 0),0) + 1 FROM acq_haushalt;
>   RAISE NOTICE '% % %', 'acq_haushalt', 'hnr', max_id ;
>   EXECUTE 'ALTER SEQUENCE acq_haushalt_hnr_seq RESTART ' || max_id::text;
>  end if;
> END $$ LANGUAGE plpgsql;
>
>
> Usage in Perl DBI to get the next value for acq_haushalt.hnr:
>
>  if ( &getDBDriverName eq 'Pg') {
>  $erg = &selectData("SELECT NEXTVAL('acq_haushalt_hnr_seq')",[]);
>  if ($erg->{'CountData'} == 0) {
>    $newhnr=1;
>  }else{
>    $newhnr=$erg->{'Data'}->[0]->{$erg->{'Fields'}->[0]->{'Name'}};
>  }
>  } else {   # code block for Sybase ...
>
>  }
>
> But the serial was not incremented internally, as we could see with
> 'psql', and so more than one row was built and inserted with the same
> number in $newhnr.
>
> What helped was using:
>
>  $erg = &selectData("SELECT max(hnr) FROM acq_haushalt",[]);
>  if ($erg->{'CountData'} == 0) {
>   $newhnr=1;
>  }else{
>   $newhnr=$erg->{'Data'}->[0]->{$erg->{'Fields'}->[0]->{'Name'}}+1;
>  }
>

Where do you want to insert this $newhnr?


> What are we doing wrong?
>
> Thanks
>
> matthias


Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread Francisco Olarte
Robins:

On Mon, Jul 6, 2020 at 1:37 PM Robins Tharakan  wrote:

> When an SQL needs to UNION constants on either side, it should be possible to
> implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect 
> understanding,
> or something already discussed but rejected for some reason?

Maybe the optimization does not pull its weight. I mean, the increased
complexity in the optimizer, the bigger memory footprint, and the extra
testing and developer effort may not be worth it.

> This need came up while reviewing generated SQL, where the need was to return 
> true when
> at least one of two lists had a row. A simplified version is given below:

I.e., I do not think this is a "need", especially in generated SQL;
it seems more like a deficiency in the generator (especially since
generators are able, and some do it, to massively overcondition the
generated code to ensure the optimizer does not miss anything), and
wrapping things in a LIMIT 1 when just testing for row existence seems
easy to do while generating.

> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
> vs.
> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); -- 
> Faster

As an aside, isn't it easier, more correct (in my opinion) and
clearer to just use EXISTS for a row-existence test? Because you have to
at least see if there is a result above, probably using exists, and
you can do...

select exists(SELECT 1 FROM pg_class) or exists(SELECT 1 FROM pg_class);

to get a direct boolean and benefit from short-circuiting, by putting
the most likely one first, and from whatever internal knowledge the
optimizer may have to avoid fully evaluating the queries, which may be
more than it can deduce from the union/limit case.

Francisco Olarte.




Re: PostgreSQL server does not increment a SERIAL internally

2020-07-06 Thread Matthias Apitz
El día Montag, Juli 06, 2020 a las 01:58:04 +0200, Sándor Daku escribió:

> > We update the serial 'acq_haushalt_hnr_seq' with this statement after
> > loading:
> >
> 
> What does "loading" mean, and why do you reset the sequence after loading?
> (And as I can see you setting it to the value it most likely already has.)
> My guess is that your problem lurks somewhere here as in certain
> circumstances you reset it to an incorrect(previous) value.

Hello Sándor,

All the tables (~400) are loaded from an export in CSV-like format done
from the same Sybase ASE database with these commands for each table:

08:08:00 TRUNCATE TABLE acq_haushalt ;
08:08:00 TRUNCATE TABLE
08:08:01 \COPY acq_haushalt FROM '/home/sisis/guru/sisisDBsrap14/CC-acq_haushalt.load' WITH ( NULL '' , DELIMITER '|' )
08:08:01 COPY 2862

and afterwards we have to adjust the serials to the highest used value
with the shown command:
> 
> 
> > /* table: acq_haushalt */
> > DO $$
> > DECLARE
> >   max_id int;
> > BEGIN
> >  if to_regclass('acq_haushalt') is not null then
> >   SELECT INTO max_id GREATEST(COALESCE(max(hnr), 0),0) + 1 FROM acq_haushalt;
> >   RAISE NOTICE '% % %', 'acq_haushalt', 'hnr', max_id ;
> >   EXECUTE 'ALTER SEQUENCE acq_haushalt_hnr_seq RESTART ' || max_id::text;
> >  end if;
> > END $$ LANGUAGE plpgsql;

Its output (for this table) was:

NOTICE:  acq_haushalt hnr 3183

which is correct because it matches the highest value +1 of 'acq_haushalt.hnr'. 

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!




Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread Simon Riggs
On Mon, 6 Jul 2020 at 12:37, Robins Tharakan  wrote:


> When an SQL needs to UNION constants on either side, it should be possible
> to
> implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect
> understanding,
> or something already discussed but rejected for some reason?
>
> This need came up while reviewing generated SQL, where the need was to
> return true when
> at least one of two lists had a row. A simplified version is given below:
>
> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
> vs.
> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1);
> -- Faster
>

Those two queries aren't logically equivalent, so you can't apply the LIMIT
1 as an optimization.

First query returns lots of random rows, the second query returns just one
random row.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

Mission Critical Databases


Re: SV: Using Postgres jdbc driver with Oracle SQL Developer

2020-07-06 Thread Tom Lane
Niels Jespersen  writes:
> Sorry, "way to provide a default username" should have been "way to provide a 
> default databasename"

Not sure, but you'd be more likely to find a knowledgeable answer
on the pgsql-jdbc list.

regards, tom lane




Re: SV: Using Postgres jdbc driver with Oracle SQL Developer

2020-07-06 Thread Dave Cramer
On Mon, 6 Jul 2020 at 09:33, Tom Lane  wrote:

> Niels Jespersen  writes:
> > Sorry, "way to provide a default username" should have been "way to
> provide a default databasename"
>
> Not sure, but you'd be more likely to find a knowledgeable answer
> on the pgsql-jdbc list.
>
>
Same way, in the properties object. But I'm totally unfamiliar with Oracle
SQL Developer. Can you provide properties at all?

Dave


Re: PostgreSQL server does not increment a SERIAL internally

2020-07-06 Thread Adrian Klaver

On 7/6/20 2:43 AM, Matthias Apitz wrote:

> Hello,
>
> My team and I spent a full weekend hunting a bug in our Perl-written
> software where rows were inserted with the same id 'acq_haushalt.hnr',
> which should not have been the case because any budget year in that
> table has a single internal number 'hnr'.
>
> The table in the 11.4 server is created as:
>
> create table acq_haushalt (
>    hnr  serial     not NULL ,   /*  internal budget year number primary key   */

Is this the complete definition? I'm not seeing PRIMARY KEY.

>    hjahr smallint   not NULL , /*  budget year  */
>    stufe smallint   not NULL , /*  level  0,1,2,3*/
>    kurzname char (16)  ,   /*  short name for ... */
>    ...
>    );
>
> We update the serial 'acq_haushalt_hnr_seq' with this statement after loading:
>
> /* table: acq_haushalt */
> DO $$
> DECLARE
>    max_id int;
> BEGIN
>   if to_regclass('acq_haushalt') is not null then
>    SELECT INTO max_id GREATEST(COALESCE(max(hnr), 0),0) + 1 FROM acq_haushalt;

The GREATEST() is redundant; the COALESCE is going to yield either 0 or
a number > 0.

>    RAISE NOTICE '% % %', 'acq_haushalt', 'hnr', max_id ;
>    EXECUTE 'ALTER SEQUENCE acq_haushalt_hnr_seq RESTART ' || max_id::text;

You don't need to cast max_id.

>   end if;
> END $$ LANGUAGE plpgsql;

So what are you trying to do with the code below, create a new row or
something else?

> Usage in Perl DBI to get the next value for acq_haushalt.hnr:
>
>   if ( &getDBDriverName eq 'Pg') {
>   $erg = &selectData("SELECT NEXTVAL('acq_haushalt_hnr_seq')",[]);
>   if ($erg->{'CountData'} == 0) {
>     $newhnr=1;
>   }else{
>     $newhnr=$erg->{'Data'}->[0]->{$erg->{'Fields'}->[0]->{'Name'}};
>   }
>   } else {   # code block for Sybase ...
>
>   }
>
> But the serial was not incremented internally, as we could see with
> 'psql', and so more than one row was built and inserted with the same
> number in $newhnr.
>
> What helped was using:
>
>   $erg = &selectData("SELECT max(hnr) FROM acq_haushalt",[]);
>   if ($erg->{'CountData'} == 0) {
>    $newhnr=1;
>   }else{
>    $newhnr=$erg->{'Data'}->[0]->{$erg->{'Fields'}->[0]->{'Name'}}+1;
>   }
>
> What are we doing wrong?
>
> Thanks
>
> matthias




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: SV: Using Postgres jdbc driver with Oracle SQL Developer

2020-07-06 Thread legrand legrand
Try Replacing hostname by hostname/Database?
Don’t Forget ?

Regards
PAscal








Pgbench : vacuum default behaviour history

2020-07-06 Thread Christophe Courtois
Hi,

We're wondering why pgbench behaves this way by default:
"With neither -n nor -v,
pgbench will vacuum the pgbench_tellers and pgbench_branches tables, and
will truncate pgbench_history."

Why is pgbench_accounts not vacuumed by default?

I've dug through the history of pgbench back to 1999. The vacuum of
tellers|branches was added when the pgbench contrib was created in Jan
2000, but I've not found any explanation or discussion.

My hypothesis is that vacuuming the accounts table was too slow in 1999,
so it could be skipped for quick tests. And it was never changed.

Did I miss something?

-- 
Christophe Courtois
Consultant Dalibo
https://dalibo.com/




Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread Sebastien Arod
Hi all,

I'm trying to optimize the following query on postgres 11.6 (running on
Aurora)
select distinct
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by c3) AS c3,
  first_value(c4) OVER (PARTITION BY c1 order by c4) AS c4
from
  t;

From the explain plan (attached at the end of the email) I see that
postgresql is doing several sorts, one per window function and one for the
distinct, which seems ok.
However, all the window functions being on the same partition, I would have
expected postgresql to "share" a preliminary sort on c1 that would then
reduce the work for all window functions, but it doesn't.
I even created an index on c1, hoping that postgresql would be able to use
it to minimize the cost of the sorts, but I couldn't make it use it.

Is there something I am missing?

You can find below a script to set up a table and data to reproduce as well
as the explain plan.


*Setup Script*
create table t(
pk varchar(200) PRIMARY key,
c1 varchar(200),
c2 varchar(200),
c3 varchar(200),
c4 varchar(200)
);
create index i1 on t (c1);

insert into t
(pk, c1, c2, c3, c4 )
select
 generate_series::text pk,
 'Grp' ||(generate_series / 4)::text c1,
 generate_series::text c2,
 generate_series::text c3,
 generate_series::text c4
from generate_series(0, 100);

*Explain Plan*
Unique  (cost=808480.87..820980.88 rows=101 width=123) (actual time=7131.675..7781.082 rows=250001 loops=1)
  ->  Sort  (cost=808480.87..810980.87 rows=101 width=123) (actual time=7131.673..7603.926 rows=101 loops=1)
        Sort Key: c1, (first_value(c2) OVER (?)), (first_value(c3) OVER (?)), (first_value(c4) OVER (?))
        Sort Method: external merge  Disk: 59640kB
        ->  WindowAgg  (cost=558937.90..578937.92 rows=101 width=123) (actual time=5179.374..6268.937 rows=101 loops=1)
              ->  Sort  (cost=558937.90..561437.90 rows=101 width=91) (actual time=5179.355..5679.136 rows=101 loops=1)
                    Sort Key: c1, c4
                    Sort Method: external merge  Disk: 52912kB
                    ->  WindowAgg  (cost=336736.93..356736.95 rows=101 width=91) (actual time=3260.950..4389.116 rows=101 loops=1)
                          ->  Sort  (cost=336736.93..339236.93 rows=101 width=59) (actual time=3260.934..3778.385 rows=101 loops=1)
                                Sort Key: c1, c3
                                Sort Method: external merge  Disk: 46176kB
                                ->  WindowAgg  (cost=141877.96..161877.98 rows=101 width=59) (actual time=1444.692..2477.284 rows=101 loops=1)
                                      ->  Sort  (cost=141877.96..144377.96 rows=101 width=27) (actual time=1444.669..1906.993 rows=101 loops=1)
                                            Sort Key: c1, c2
                                            Sort Method: external merge  Disk: 39424kB
                                            ->  Seq Scan on t  (cost=0.00..18294.01 rows=101 width=27) (actual time=0.011..177.815 rows=101 loops=1)
Planning Time: 0.214 ms
Execution Time: 7839.646 ms


Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread Michael Lewis
On Mon, Jul 6, 2020 at 5:37 AM Robins Tharakan  wrote:

> This need came up while reviewing generated SQL, where the need was to
> return true when
> at least one of two lists had a row.
>

Generated SQL... yep. That will happen. Manual SQL may be more work, but
often has significant reward.

If you can change how the SQL is generated, I would expect that EXISTS
would likely be more performant in your case, even if you need to do UNION
ALL between the two current queries that potentially return true.


Re: Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread Michael Lewis
Does this give the same result and do the optimization you want?

select
  c1,
  min(c2) AS c2,
  min(c3) AS c3,
  min(c4) AS c4
from
  t
group by
  c1;

>


Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread David Rowley
On Tue, 7 Jul 2020 at 00:43, Simon Riggs  wrote:
>
> On Mon, 6 Jul 2020 at 12:37, Robins Tharakan  wrote:
>
>>
>> When an SQL needs to UNION constants on either side, it should be possible to
>> implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect 
>> understanding,
>> or something already discussed but rejected for some reason?
>>
>> This need came up while reviewing generated SQL, where the need was to 
>> return true when
>> at least one of two lists had a row. A simplified version is given below:
>>
>> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
>> vs.
>> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); -- 
>> Faster
>
>
> Those two queries aren't logically equivalent, so you can't apply the LIMIT 1 
> as an optimization.
>
> First query returns lots of random rows, the second query returns just one 
> random row.

I think the idea here is that, because the target list contains only
constants, pulling additional rows from the query after the first
one will just produce a duplicate row and never add any rows after the
UNION is processed.
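A quick way to convince yourself of that (a hypothetical psql session): since every branch emits only the constant 1, the UNION's duplicate elimination can never produce more than one row, so any rows past the first per branch are wasted work.

```sql
-- both branches produce only the constant 1, so the UNION result
-- is a single row no matter how many rows each scan returns
(SELECT 1 FROM generate_series(1, 5))
UNION
(SELECT 1 FROM generate_series(1, 3));
-- returns a single row: 1
```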

David




Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread Simon Riggs
On Mon, 6 Jul 2020 at 21:49, David Rowley  wrote:

> On Tue, 7 Jul 2020 at 00:43, Simon Riggs  wrote:
> >
> > On Mon, 6 Jul 2020 at 12:37, Robins Tharakan  wrote:
> >
> >>
> >> When an SQL needs to UNION constants on either side, it should be
> possible to
> >> implicitly apply a LIMIT 1 and get good speed up. Is this an incorrect
> understanding,
> >> or something already discussed but rejected for some reason?
> >>
> >> This need came up while reviewing generated SQL, where the need was to
> return true when
> >> at least one of two lists had a row. A simplified version is given
> below:
> >>
> >> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
> >> vs.
> >> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit
> 1); -- Faster
> >
> >
> > Those two queries aren't logically equivalent, so you can't apply the
> LIMIT 1 as an optimization.
> >
> > First query returns lots of random rows, the second query returns just
> one random row.
>
> I think the idea here is that because the target list contains only
> constants that pulling additional rows from the query after the first
> one will just be a duplicate row and never add any rows after the
> UNION is processed.
>

OK, I see. Are you saying you think it's a worthwhile optimization to
autodetect?

-- 
Simon Riggshttp://www.2ndQuadrant.com/

Mission Critical Databases


Re: Apply LIMIT when computation is logically irrelevant

2020-07-06 Thread David Rowley
On Tue, 7 Jul 2020 at 09:03, Simon Riggs  wrote:
>
> On Mon, 6 Jul 2020 at 21:49, David Rowley  wrote:
>>
>> On Tue, 7 Jul 2020 at 00:43, Simon Riggs  wrote:
>> >
>> > On Mon, 6 Jul 2020 at 12:37, Robins Tharakan  wrote:
>> >> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
>> >> vs.
>> >> (select 1 FROM pg_class limit 1) UNION (SELECT 1 FROM pg_class limit 1); 
>> >> -- Faster
>> >
>> >
>> > Those two queries aren't logically equivalent, so you can't apply the 
>> > LIMIT 1 as an optimization.
>> >
>> > First query returns lots of random rows, the second query returns just one 
>> > random row.
>>
>> I think the idea here is that because the target list contains only
>> constants that pulling additional rows from the query after the first
>> one will just be a duplicate row and never add any rows after the
>> UNION is processed.
>
>
> OK, I see. Are you saying you think it's a worthwhile optimization to 
> autodetect?

I guess it's all about how much effort would be involved to detect
such cases vs how likely it is that we're going to speed up someone's
query.  I imagine it's not much effort to detect this, but also, this
is the first time I recall seeing this mentioned, so perhaps that
means not many people would be rewarded by making such a change. (It
does seem like quite a strange way to express the query.)

There is currently a patch floating around that implements UniqueKeys
which allows RelOptInfos to be tagged with the properties that they're
unique on.  With the current design of that patch, there is no way to
say "this relation *only* has duplicate rows".  Perhaps some design
tweaks there can make detecting this case cheaper in terms of CPU
effort during planning, and perhaps also in terms of how much code it
would take to make it work.  I'll try to keep this in mind when I
review that work soon.   If we were to start adding optimisations for
cases such as this, then I'd rather they were done in some
general-purpose way that just makes them "just work" rather than
adding special cases around the codebase that trigger some special
behaviour.  Likely in this case, it would still take at least some
code when planning setops.

Additionally, the setops code is badly in need of a rewrite, so the
bar is likely pretty high on adding too many smarts in there that we
need to migrate forwards after a rewrite. The setops planning code is
still not making use of the upper planner pathification work that Tom
did years ago. In many cases UNION would be far more optimal if it
were implemented as an Index Scan -> MergeAppend -> Unique.  I think
right now we only consider Append -> Sort -> Unique and Append -> Hash
Aggregate.


David




Re: Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread Michael Lewis
Distinct is a great way to get quick results when writing quick &
dirty queries, but I rarely have them perform better than a rewrite that
avoids the need. It collects a ton of results, orders them, and throws away
duplicates in the process. I don't love the idea of that extra work. Did
you say you have an index on c1?

select
  c1,
  sub1.c2,
  sub2.c3
from
  t
  join lateral (select c2 from t as t1 where t1.c1 = t.c1 order by c2, c4 limit 1) as sub1 on true
  join lateral (select c3 from t as t1 where t1.c1 = t.c1 order by coalesce(c4, '000'), c3 limit 1) as sub2 on true;

select
  c1,
  (select c2 from t as t1 where t1.c1 = t.c1 order by c2, c4 limit 1) AS c2,
  (select c3 from t as t1 where t1.c1 = t.c1 order by coalesce(c4, '000'), c3 limit 1) AS c3
from
  t;

I don't know the data, but I assume there may be many rows with the same c1
value, so you would likely benefit from getting that distinct set first,
like below, as your FROM table.

(select c1 from t group by c1) AS t


Re: Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread David G. Johnston
On Monday, July 6, 2020, Michael Lewis  wrote:

> Did you say you have an index on c1?
> [...]
> I don't know the data, but I assume there may be many rows with the same
> c1 value, so then you would likely benefit from getting that distinct set
> first like below as your FROM table.
>

Re-reading the original email I see both the answer to your question and
the data being queried.

David J.


Re: Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread David G. Johnston
On Monday, July 6, 2020, Sebastien Arod  wrote:

> I would have expected postgresql to "share" a preliminary sort on c1 that
> would then be useful to reduce the work on all window functions but it
> doesn't.
>

The plan shown does share - the output of one sort goes into another.
Subsequent sorts still have to happen but they should be faster as the
first field is already grouped.  Doesn’t change the plan though.


> I even created an index on c1 hoping that postgresql would be able to use
> it in order to minimize the cost of the sorts but I couldn't make it use it.
>

Use it how?  You are still evaluating 250k groups so doing anything
piece-wise seems like an expected loss compared to sequential scan.

David J.


Re: Is postgres able to share sorts required by common partition window functions?

2020-07-06 Thread Sebastien Arod
Hi Michael,

I simplified the real query before posting it here, and I now realize that
I oversimplified things.

Unfortunately the real query cannot be rewritten with a group by.

Some of the window functions are more complex, with order by clauses using
complex expressions involving multiple columns.

The following example would be more realistic:

select distinct
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2, c4) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3)
AS c3
from
  t;



On Mon, Jul 6, 2020 at 9:55 PM Michael Lewis  wrote:

> Does this give the same result and do the optimization you want?
>
> select
>   c1,
>   min(c2) AS c2,
>   min(c3) AS c3,
>   min(c4) AS c4
> from
>   t
> group by
>   c1;
>


Efficiently advancing a sequence without risking it going backwards.

2020-07-06 Thread Paul McGarry
I have two sequences in different dbs which I want to keep roughly in sync
(they don't have to be exactly in sync, I am just keeping them in the same
ballpark).

Currently I have a process which periodically checks the sequences and does:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) while (nextval('DB2sequence')<=1234);

which works fine, but is pretty inefficient if the discrepancy is large
(i.e. calling nextval a hundred thousand times).

I don't think I can use setval(), because it risks making sequences go
backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are two nextval('DB2sequence') calls in
another process, (2) would take the sequence back from 1235 to 1234 and I
would end up trying to create a duplicate key ID from the sequence.

So what I really want is something equivalent to setval(), but with
"where DB2sequence < 1234" logic so it doesn't overwrite the value if it is
already larger.

Is there such a mechanism?
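PostgreSQL offers no conditional setval(), but the forward-only update being asked for can be sketched as a compare-and-set. The class below is a toy in-process model, not the server's sequence code:

```python
import threading

class ToySequence:
    """In-process model of a sequence, illustrating a forward-only advance.
    This is the behaviour the wished-for 'where DB2sequence < 1234' logic
    would have; it is not something PostgreSQL provides."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def nextval(self):
        with self._lock:
            self._value += 1
            return self._value

    def advance_to(self, target):
        # Forward-only: never moves the sequence backwards.
        with self._lock:
            if self._value < target:
                self._value = target
            return self._value

seq = ToySequence(1233)
seq.nextval(); seq.nextval()          # two racing nextval() calls -> 1235
assert seq.advance_to(1234) == 1235   # no regression, unlike bare setval()
```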

Thanks for any help.

Paul


Re: Efficiently advancing a sequence without risking it going backwards.

2020-07-06 Thread Adrian Klaver

On 7/6/20 7:06 PM, Paul McGarry wrote:
> I have two sequences in different dbs which I want to keep roughly in
> sync (they don't have to be exactly in sync, I am just keeping them in
> the same ballpark).
>
> Currently I have a process which periodically checks the sequences and does:
>
> 1) Check values
> DB1sequence: 1234
> DB2sequence: 1233 (1 behind)
> 2) while (nextval('DB2sequence')<=1234);
>
> which works fine, but is pretty inefficient if the discrepancy is large
> (ie calling nextval a hundred thousand times).
>
> I don't think I can use setval(), because it risks making sequences go
> backwards, eg:
>
> 1) Check values
> DB1sequence: 1234
> DB2sequence: 1233 (1 behind)
> 2) setval('DB2sequence',1234);
>
> but if between (1) and (2) there are 2 nextval(DB2sequence) calls on
> another process, (2) would take the sequence back from 1235 to 1234 and
> I would end up trying to create a duplicate key ID from the sequence.
>
> So what I really want is something equivalent to the setval, but with
> "where DB2sequence <1234" logic so it doesn't overwrite the value if it
> is already large.
>
> Is there such a mechanism?


Well, sequences are designed to be operated on independently of each
session, so there is not much you can do about locking in a value. The
best you can do is use setval() to increment the number by enough to get
past any potential sequence advances in other sessions. Say advance by
10, 50 or 100, depending on what you think is a reasonable number of
other sessions also hitting the sequence.
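That safety margin can be modeled with assumed numbers: if at most K nextval() calls can race with the setval(), a margin of at least K keeps the sequence from ever moving backwards.

```python
# Model of the overshoot approach (all values hypothetical): advance to the
# peer database's value plus a margin large enough to cover any nextval()
# calls that race with the setval().
def overshoot_target(peer_value, margin):
    return peer_value + margin

db2 = 1233
db2 += 2                                  # two concurrent nextval() calls
target = overshoot_target(1234, margin=100)
assert target > db2                       # setval(target) cannot move db2 back
```

The cost is gaps in the sequence, which is usually acceptable since sequences are not gap-free anyway.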





> Thanks for any help.
>
> Paul



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [HELP] Regarding how to install libraries

2020-07-06 Thread Praveen Kumar K S
Thank you. That solved the problem.

On Fri, Jul 3, 2020 at 8:24 PM Adrian Klaver 
wrote:

> On 7/3/20 7:21 AM, Adrian Klaver wrote:
> > On 7/3/20 4:37 AM, Praveen Kumar K S wrote:
> >> Hello,
> >>
> >> PostgreSQL libraries are required while compiling pgpool. I need to
> >> install postgresql-libs and postgresql-devel on Ubuntu Server 16.04
> >
> > Why not install pgpool from package?
> >
> > Assuming using PGDG repo:
> >
> > sudo apt install postgresql-12-pgpool2
>
> This was assuming you have Postgres 12 installed. If necessary modify to
> fit version actually installed.
>
> >
> >
> >>
> >> How can I install only these packages without having to install the
> >> entire postgres server ? TIA.
> >>
> >> --
> >> Regards,
> >> K S Praveen Kumar
> >
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


-- 
Regards,
K S Praveen Kumar
M: +91-9986855625


Re: libpq pipelineing

2020-07-06 Thread Samuel Williams
Hi,

Here are some initial numbers.

DB::Client
Warming up --------------------------------------
 db-postgres   281.000  i/100ms
  db-mariadb   399.000  i/100ms
  mysql2   533.000  i/100ms
  pg   591.000  i/100ms
Calculating -------------------------------------
 db-postgres  2.725k (± 1.8%) i/s - 13.769k in   5.053750s
  db-mariadb  3.990k (± 2.4%) i/s - 19.950k in   5.002453s
  mysql2  5.153k (± 4.7%) i/s - 26.117k in   5.079570s
  pg  5.772k (± 4.4%) i/s - 28.959k in   5.027423s

Comparison:
  pg: 5771.7 i/s
  mysql2: 5152.8 i/s - 1.12x  (± 0.00) slower
  db-mariadb: 3990.3 i/s - 1.45x  (± 0.00) slower
 db-postgres: 2725.5 i/s - 2.12x  (± 0.00) slower

The db-* gems are event driven. However, that is of less interest right now.

This benchmark creates a table, inserts 1000 rows (or some other fixed
number), and then selects them all back out.

What I noticed is that `PQgetvalue` and `PQgetisnull` are not
particularly efficient, at least via FFI: reading a result requires
rows * columns * 2 FFI calls.

libmariadb provides `mysql_fetch_row`, which returns a `char **` per
row, so only one FFI call per row is required.

Does a similar method exist for libpq? e.g. `PGgetrow(index) ->
char**` (array of strings, one for each column, may be nil to indicate
null).
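The call-count arithmetic behind that comparison, sketched in Python (the row-wise accessor is the hypothetical API from the question, not a real libpq function):

```python
# Foreign-call counts for reading an R x C result set:
# cell-by-cell needs one PQgetisnull plus one PQgetvalue per cell,
# while a hypothetical row-at-a-time accessor needs one call per row.
def cellwise_calls(rows, cols):
    return rows * cols * 2   # PQgetisnull + PQgetvalue per cell

def rowwise_calls(rows):
    return rows              # one fetch-row call per row

assert cellwise_calls(1000, 10) == 20000
assert rowwise_calls(1000) == 1000
```

For the 1000-row benchmark above with 10 columns, that is 20,000 FFI calls versus 1,000, which is where the per-call overhead adds up.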

Kind regards,
Samuel

On Tue, 30 Jun 2020 at 12:50, Samuel Williams
 wrote:
>
> Tom, I'm implementing a small abstraction layer for event-driven
> result streaming on PostgreSQL and MariaDB for Ruby, and I'll endeavor
> to report back with some numbers once I have enough of it working to
> benchmark something meaningful.
>
> Thanks for your patience and help.
>
> Kind regards,
> Samuel
>
> On Tue, 30 Jun 2020 at 02:06, Tom Lane  wrote:
> >
> > Samuel Williams  writes:
> > > Those methods don't seem to have an equivalent in libpq - you can use
> > > PQgetResult but it buffers all the rows. Using single row mode results
> > > in many results for each query (seems like a big overhead).
> >
> > Have you got any actual evidence for that?  Sure, the overhead is
> > more than zero, but does it mean anything in comparison to the other
> > costs of data transmission?
> >
> > > Maybe the statement about efficiency is incorrect, but it would be
> > > nice if you could incrementally stream a single result set more
> > > easily.
> >
> > More easily than what?  If we did not construct a PGresult then we would
> > need some other abstraction for access to the returned row, dealing with
> > error cases, etc etc.  That would mean a lot of very duplicative API code
> > in libpq, and a painful bunch of adjustments in client code.
> >
> > regards, tom lane