n_distinct off by a factor of 1000

2020-06-23 Thread Klaudie Willis
Friends,

I run Postgresql 12.3, on Windows. I have just discovered a pretty significant 
problem with Postgresql and my data. I have a large table, 500M rows, 50 
columns. It is split in 3 partitions by Year. In addition to the primary key, 
one of the columns is indexed, and I do lookups on this.

Select * from bigtable b where b.instrument_ref in (x,y,z,...)
limit 1000

It responded well with sub-second response, and it uses the index of the 
column. However, when I changed it to:

Select * from bigtable b where b.instrument_ref in (x,y,z,...)
limit 10000 -- (notice 10K now)

The planner decided to do a full table scan on the entire 500M row table! And 
that did not work very well. First I had no clue as to why it did so, and when 
I disabled sequential scan the query immediately returned. But I should not 
have to do so.
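
For reference, that experiment amounts to something like the following (a sketch; the IN list is a placeholder):

-- diagnostic only: disable seq scans for this session, then inspect the plan
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM bigtable b WHERE b.instrument_ref IN (x, y, z) LIMIT 10000;
RESET enable_seqscan;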

I got my first hint of why this problem occurs when I looked at the statistics. 
For the column in question, "instrument_ref" the statistics claimed it to be:

The default_statistics_target=500, and analyze has been run.
select * from pg_stats where attname like 'instr%_ref'; -- Result: 40.000
select count(distinct instrumentid_ref) from bigtable -- Result: 33 385 922 (!!)

That is an astonishing difference of almost a 1000X.

When the planner only thinks there are 40K different values, it makes sense 
to switch to a table scan in order to fill the LIMIT of 10,000. But it is 
wrong, very wrong, and the query returns in hundreds of seconds instead of a few.

I have tried to increase the statistics target to 5000, and it helps, but it 
reduces the error to 100X. Still crazy high.

I understand that this is a known problem. I have read previous posts about it, 
still I have never seen anyone reach such a high difference factor.

I have considered these fixes (see the sketch below):
- hard-code the column statistics to a particular ratio of the total number of rows
- randomize the rows more, so that the sample does not suffer from page 
clustering; however, this probably has other implications
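
For the first option, the per-column override looks roughly like this (a sketch; the -0.05 ratio is only an example, and negative n_distinct values are interpreted as a fraction of the row count):

-- pin the estimate to 5% of the row count; takes effect at the next ANALYZE
ALTER TABLE bigtable ALTER COLUMN instrument_ref SET (n_distinct = -0.05);
-- (n_distinct_inherited is the corresponding option for partition-wide statistics)
ANALYZE bigtable;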

Feel free to comment :)

K

Re: n_distinct off by a factor of 1000

2020-06-23 Thread Ron

Maybe I missed it, but did you run "ANALYZE VERBOSE bigtable;"?

On 6/23/20 7:42 AM, Klaudie Willis wrote:

Friends,

I run Postgresql 12.3, on Windows. I have just discovered a pretty 
significant problem with Postgresql and my data.  I have a large table, 
500M rows, 50 columns. It is split in 3 partitions by Year.  In addition 
to the primary key, one of the columns is indexed, and I do lookups on this.


Select * from bigtable b where b.instrument_ref in (x,y,z,...)
limit 1000

It responded well with sub-second response, and it uses the index of the 
column.  However, when I changed it to:


Select * from bigtable b where b.instrument_ref in (x,y,z,)
limit 1 -- (notice 10K now)

The planner decided to do a full table scan on the entire 500M row table! 
And that did not work very well.  First I had no clue as to why it did so, 
and when I disabled sequential scan the query immediately returned.  But I 
should not have to do so.


I got my first hint of why this problem occurs when I looked at the 
statistics.  For the column in question, "instrument_ref" the statistics 
claimed it to be:


The default_statistics_target=500, and analyze has been run.
select * from pg_stats where attname like 'instr%_ref'; -- Result: *40.000*
select count(distinct instrumentid_ref) from bigtable -- Result: *33 385 
922 (!!)*


That is an astonishing difference of almost a 1000X.

When the planner only thinks there are 40K different values, then it makes 
sense to switch to table scan in order to fill the limit=10.000.  But it 
is wrong, very wrong, an the query returns in 100s of seconds instead of a 
few.


I have tried to increase the statistics target to 5000, and it helps, but 
it reduces the error to 100X.  Still crazy high.


I understand that this is a known problem.  I have read previous posts 
about it, still I have never seen anyone reach such a high difference factor.


I have considered these fixes:
- hardcode the statistics to a particular ratio of the total number of rows
- randomize the rows more, so that it does not suffer from page 
clustering.  However, this has probably other implications


Feel free to comment :)


K



--
Angular momentum makes the world go 'round.


Re: n_distinct off by a factor of 1000

2020-06-23 Thread Klaudie Willis
I didn't run it with "verbose", but otherwise, yes, several times. I can do it 
again with verbose if you are interested in the output. Just give me some time; 
500M rows and 50 columns is no small job :)

K

‐‐‐ Original Message ‐‐‐
On Tuesday, June 23, 2020 2:51 PM, Ron  wrote:

> Maybe I missed it, but did you run "ANALYZE VERBOSE bigtable;"?
>
> On 6/23/20 7:42 AM, Klaudie Willis wrote:
>
>> Friends,
>>
>> I run Postgresql 12.3, on Windows. I have just discovered a pretty 
>> significant problem with Postgresql and my data. I have a large table, 500M 
>> rows, 50 columns. It is split in 3 partitions by Year. In addition to the 
>> primary key, one of the columns is indexed, and I do lookups on this.
>>
>> Select * from bigtable b where b.instrument_ref in (x,y,z,...)
>> limit 1000
>>
>> It responded well with sub-second response, and it uses the index of the 
>> column. However, when I changed it to:
>>
>> Select * from bigtable b where b.instrument_ref in (x,y,z,)
>> limit 1 -- (notice 10K now)
>>
>> The planner decided to do a full table scan on the entire 500M row table! 
>> And that did not work very well. First I had no clue as to why it did so, 
>> and when I disabled sequential scan the query immediately returned. But I 
>> should not have to do so.
>>
>> I got my first hint of why this problem occurs when I looked at the 
>> statistics. For the column in question, "instrument_ref" the statistics 
>> claimed it to be:
>>
>> The default_statistics_target=500, and analyze has been run.
>> select * from pg_stats where attname like 'instr%_ref'; -- Result: 40.000
>> select count(distinct instrumentid_ref) from bigtable -- Result: 33 385 922 
>> (!!)
>>
>> That is an astonishing difference of almost a 1000X.
>>
>> When the planner only thinks there are 40K different values, then it makes 
>> sense to switch to table scan in order to fill the limit=10.000. But it is 
>> wrong, very wrong, an the query returns in 100s of seconds instead of a few.
>>
>> I have tried to increase the statistics target to 5000, and it helps, but it 
>> reduces the error to 100X. Still crazy high.
>>
>> I understand that this is a known problem. I have read previous posts about 
>> it, still I have never seen anyone reach such a high difference factor.
>>
>> I have considered these fixes:
>> - hardcode the statistics to a particular ratio of the total number of rows
>> - randomize the rows more, so that it does not suffer from page clustering. 
>> However, this has probably other implications
>>
>> Feel free to comment :)
>>
>> K
>
> --
> Angular momentum makes the world go 'round.

pg_dump empty tables

2020-06-23 Thread Edu Gargiulo
Hi all,

We are using postgresql 11.7 on Debian.
I noticed that pg_dump is not including records for tables with fields
defined as array types (integer[] and real[]). The table structure is
restored normally, but those tables have 0 records after restoring.

I'm wondering whether this is normal behaviour for pg_dump and how I should
run it to include the data in those tables.

Any hint would be appreciated. Thanks in advance, and sorry for my English.

--
edugarg


UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Jason Ralph
Hello List,
PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit

I am planning an update on a table with 20Million records, I have been 
researching the best practices.  I will remove all indexes and foreign keys 
prior to the update, however I am not sure if I should use a transaction or not.
My thought process is that a transaction would be easier to recover if 
something fails, however it would take more time to write to the WAL log in a 
transaction.

Would it make sense to make a back up of the table then execute update without 
a transaction? How would you guys do it?

Thanks,

Jason Ralph



Re: pg_dump empty tables

2020-06-23 Thread Adrian Klaver

On 6/23/20 6:30 AM, Edu Gargiulo wrote:

Hi all,

We are using postgresql 11.7 on Debian.
I noticed that pg_dump is not including records on tables with fields 
defined as array type (integer[] and real[]). The table structure is 
normally restored but they have 0 records on restoring.


What is the complete command you are using when running pg_dump?

What is the schema for one of the tables? e.g \dt table_name

What does a SELECT on those fields show?



i'm wondering if is it a normal behaviour of pg_dump and how should I 
execute it to include data on that tables.


Any hint would be appreciated. Thanks in advance and sorry for my english

--
edugarg



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Adrian Klaver

On 6/23/20 6:32 AM, Jason Ralph wrote:

Hello List,

PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 
20120313 (R


ed Hat 4.4.7-23), 64-bit

I am planning an update on a table with 20Million records, I have been 
researching the best practices.  I will remove all indexes and foreign 
keys prior to the update, however I am not sure if I should use a 
transaction or not.


My thought process is that a transaction would be easier to recover if 
something fails, however it would take more time to write to the WAL log 
in a transaction.


Unless this is an UNLOGGED table WALs will be written.



Would it make sense to make a back up of the table then execute update 
without a transaction? How would you guys do it?


You could break it down into multiple transactions if there is way to 
specify ranges of records.




Thanks,

*Jason Ralph*




--
Adrian Klaver
adrian.kla...@aklaver.com




RE: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Jason Ralph
Thanks Adrian,
> You could break it down into multiple transactions if there is way to specify 
> ranges of records.
Say I couldn't break it up, would it be faster in or out of the transaction?


Jason Ralph

-Original Message-
From: Adrian Klaver 
Sent: Tuesday, June 23, 2020 9:38 AM
To: Jason Ralph ; 
pgsql-general@lists.postgresql.org
Subject: Re: UPDATE on 20 Million Records Transaction or not?

On 6/23/20 6:32 AM, Jason Ralph wrote:
> Hello List,
>
> PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (R
>
> ed Hat 4.4.7-23), 64-bit
>
> I am planning an update on a table with 20Million records, I have been
> researching the best practices.  I will remove all indexes and foreign
> keys prior to the update, however I am not sure if I should use a
> transaction or not.
>
> My thought process is that a transaction would be easier to recover if
> something fails, however it would take more time to write to the WAL
> log in a transaction.

Unless this is an UNLOGGED table WALs will be written.

>
> Would it make sense to make a back up of the table then execute update
> without a transaction? How would you guys do it?

You could break it down into multiple transactions if there is way to specify 
ranges of records.

>
> Thanks,
>
> *Jason Ralph*
>
> This message contains confidential information and is intended only
> for the individual named. If you are not the named addressee you
> should not disseminate, distribute or copy this e-mail. Please notify
> the sender immediately by e-mail if you have received this e-mail by
> mistake and delete this e-mail from your system. E-mail transmission
> cannot be guaranteed to be secure or error-free as information could
> be intercepted, corrupted, lost, destroyed, arrive late or incomplete,
> or contain viruses. The sender therefore does not accept liability for
> any errors or omissions in the contents of this message, which arise
> as a result of e-mail transmission. If verification is required please
> request a hard-copy version.


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: pg_dump empty tables

2020-06-23 Thread Edu Gargiulo
On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver 
wrote:

> On 6/23/20 6:30 AM, Edu Gargiulo wrote:
> > Hi all,
> >
> > We are using postgresql 11.7 on Debian.
> > I noticed that pg_dump is not including records on tables with fields
> > defined as array type (integer[] and real[]). The table structure is
> > normally restored but they have 0 records on restoring.
>
> What is the complete command you are using when running pg_dump?
>

/usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp


>
> What is the schema for one of the tables? e.g \dt table_name
>

historic=# \dt well.surface_card
          List of relations
 Schema |     Name     | Type  |  Owner
--------+--------------+-------+----------
 well   | surface_card | table | historic
(1 row)

historic=# \d well.surface_card
   Table "well.surface_card"
  Column   |   Type   | Collation | Nullable |
Default
---+--+---+--+-
 id| bigint   |   | not null |
 tstamp| timestamp with time zone |   | not null |
 card_tstamp   | timestamp with time zone |   | not null |
 shutdown_event_id | smallint |   | not null |
 quality   | boolean  |   | not null |
 load_min  | integer  |   | not null |
 load_max  | integer  |   | not null |
 stroke_length | real |   | not null |
 stroke_period | real |   | not null |
 positions | real[]   |   | not null |
 loads | integer[]|   | not null |


>
> What is does a SELECT on those fields show?
>

historic=# select positions,loads from well.surface_card limit 1;
-
 
{0.05,0.11,0.22,0.33,0.41,0.76,1.1,1.45,1.79,2.3,2.89,3.47,4.05,4.78,5.61,6.44,7.28,8.21,9.31,10.4,11.5,12.66,14.03,15.4,16.77,18.16,19.82,21.48,23.13,24.77,26.72,28.68,30.63,32.59,34.77,37.02,39.27,41.52,43.95,46.49,49.03,51.56,54.22,57.03,59.85,62.66,65.54,68.57,71.61,74.65,77.69,80.87,84.05,87.22,90.38,93.59,96.81,100.03,103.25,106.42,109.58,112.76,115.97,118.98,121.99,125.02,128.03,130.83,133.53,136.25,138.96,141.44,143.75,146.08,148.39,150.52,152.39,154.25,156.13,157.85,159.23,160.6,161.97,163.25,164.1,164.95,165.8,166.63,166.96,167.3,167.63,167.97,167.85,167.7,167.54,167.49,166.88,166.27,165.66,165.05,164.2,163.17,162.17,161.14,159.98,158.61,157.24,155.87,154.42,152.74,151.07,149.39,147.7,145.77,143.84,141.91,139.98,137.87,135.75,133.62,131.49,129.25,126.98,124.7,122.41,120.08,117.69,115.3,112.91,110.49,108.04,105.59,103.14,100.69,98.22,95.75,93.28,90.8,88.35,85.9,83.44,80.98,78.57,76.17,73.76,71.36,69.01,66.69,64.37,62.05,59.79,57.59,55.38,53.17,51.02,48.95,46.88,44.8,42.76,40.84,38.91,37,35.09,33.34,31.6,29.86,28.11,26.54,24.99,23.43,21.85,20.48,19.11,17.74,16.37,15.15,13.98,12.81,11.64,10.59,9.62,8.64,7.69,6.82,6.06,5.3,4.54,3.86,3.32,2.77,2.22,1.73,1.4,1.07,0.74,0.44,0.33,0.22,0.11,0.05}
|
{5716,6021,6524,6586,6422,6121,5953,5672,5095,6110,7770,7796,7785,7712,7533,7447,7317,7150,7004,6891,6839,6808,6846,7019,7195,7283,7386,7522,7565,7592,7612,7426,7169,6858,6762,6536,6442,6417,6386,6421,6501,6651,6949,7277,7435,7470,7427,7408,7217,7117,7008,6544,6097,5658,5508,5520,5717,5880,5923,6126,6568,7069,7606,7979,8033,7712,7116,6799,6233,5766,5143,4538,4202,4178,4605,5236,5997,6801,7494,8095,8571,9014,9429,9859,10187,10550,11153,11330,11440,11519,11590,11691,11790,11834,11837,11868,11948,12038,12133,12205,12399,12539,12581,12620,12638,12642,12630,12625,12620,12601,12583,12590,12606,12620,12746,12898,12915,12991,13026,13043,13044,13068,13039,12970,12927,12909,12889,12885,12914,13020,13005,13273,13463,13517,13576,13633,13618,13534,13502,13391,13175,13085,13049,12996,12980,13063,13466,13467,13501,13696,13808,13880,13932,13956,13654,13266,13108,12964,12776,12704,12666,12679,12795,13329,13695,14046,14452,14823,15033,15059,14948,14478,13766,13128,12591,12117,11790,11575,11407,10778,10061,9545,9145,8854,8707,8690,8422,7951,7456,7125,7005,7014,7057,7076,6992,6697,6321,6133,5831,5716}
(1 row)


>
> >
> > i'm wondering if is it a normal behaviour of pg_dump and how should I
> > execute it to include data on that tables.
> >
> > Any hint would be appreciated. Thanks in advance and sorry for my english
> >
> > --
> > edugarg
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread luis . roberto



Thanks Adrian,
> You could break it down into multiple transactions if there is way to specify 
> ranges of records.
Say I couldn't break it up, would it be faster in or out of the transaction?


It depends whether you have concurrent transactions using the table. If you do, 
I think it would be better to split the update between smaller transactions.




Re: pg_dump empty tables

2020-06-23 Thread Adrian Klaver

On 6/23/20 6:48 AM, Edu Gargiulo wrote:
On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:


On 6/23/20 6:30 AM, Edu Gargiulo wrote:
 > Hi all,
 >
 > We are using postgresql 11.7 on Debian.
 > I noticed that pg_dump is not including records on tables with
fields
 > defined as array type (integer[] and real[]). The table structure is
 > normally restored but they have 0 records on restoring.

What is the complete command you are using when running pg_dump?


/usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp


Coffee has kicked in and I realized I should have asked for the 
pg_restore command as well. So what is that?






What is the schema for one of the tables? e.g \dt table_name


historic=# \dt well.surface_card
             List of relations
  Schema |     Name     | Type  |  Owner
+--+---+--
  well   | surface_card | table | historic
(1 row)

historic=# \d well.surface_card
                            Table "well.surface_card"
       Column       |           Type           | Collation | Nullable | 
Default

---+--+---+--+-
  id                | bigint                   |           | not null |
  tstamp            | timestamp with time zone |           | not null |
  card_tstamp       | timestamp with time zone |           | not null |
  shutdown_event_id | smallint                 |           | not null |
  quality           | boolean                  |           | not null |
  load_min          | integer                  |           | not null |
  load_max          | integer                  |           | not null |
  stroke_length     | real                     |           | not null |
  stroke_period     | real                     |           | not null |
  positions         | real[]                   |           | not null |
  loads             | integer[]                |           | not null |


What is does a SELECT on those fields show?


historic=# select positions,loads from well.surface_card limit 1;
-
  
{0.05,0.11,0.22,0.33,0.41,0.76,1.1,1.45,1.79,2.3,2.89,3.47,4.05,4.78,5.61,6.44,7.28,8.21,9.31,10.4,11.5,12.66,14.03,15.4,16.77,18.16,19.82,21.48,23.13,24.77,26.72,28.68,30.63,32.59,34.77,37.02,39.27,41.52,43.95,46.49,49.03,51.56,54.22,57.03,59.85,62.66,65.54,68.57,71.61,74.65,77.69,80.87,84.05,87.22,90.38,93.59,96.81,100.03,103.25,106.42,109.58,112.76,115.97,118.98,121.99,125.02,128.03,130.83,133.53,136.25,138.96,141.44,143.75,146.08,148.39,150.52,152.39,154.25,156.13,157.85,159.23,160.6,161.97,163.25,164.1,164.95,165.8,166.63,166.96,167.3,167.63,167.97,167.85,167.7,167.54,167.49,166.88,166.27,165.66,165.05,164.2,163.17,162.17,161.14,159.98,158.61,157.24,155.87,154.42,152.74,151.07,149.39,147.7,145.77,143.84,141.91,139.98,137.87,135.75,133.62,131.49,129.25,126.98,124.7,122.41,120.08,117.69,115.3,112.91,110.49,108.04,105.59,103.14,100.69,98.22,95.75,93.28,90.8,88.35,85.9,83.44,80.98,78.57,76.17,73.76,71.36,69.01,66.69,64.37,62.05,59.79,57.59,55.38,53.17,51.02,48.95,46.88,44.8,42.76,40.84,38.91,37,35.09,33.34,31.6,29.86,28.11,26.54,24.99,23.43,21.85,20.48,19.11,17.74,16.37,15.15,13.98,12.81,11.64,10.59,9.62,8.64,7.69,6.82,6.06,5.3,4.54,3.86,3.32,2.77,2.22,1.73,1.4,1.07,0.74,0.44,0.33,0.22,0.11,0.05}
 | 
{5716,6021,6524,6586,6422,6121,5953,5672,5095,6110,7770,7796,7785,7712,7533,7447,7317,7150,7004,6891,6839,6808,6846,7019,7195,7283,7386,7522,7565,7592,7612,7426,7169,6858,6762,6536,6442,6417,6386,6421,6501,6651,6949,7277,7435,7470,7427,7408,7217,7117,7008,6544,6097,5658,5508,5520,5717,5880,5923,6126,6568,7069,7606,7979,8033,7712,7116,6799,6233,5766,5143,4538,4202,4178,4605,5236,5997,6801,7494,8095,8571,9014,9429,9859,10187,10550,11153,11330,11440,11519,11590,11691,11790,11834,11837,11868,11948,12038,12133,12205,12399,12539,12581,12620,12638,12642,12630,12625,12620,12601,12583,12590,12606,12620,12746,12898,12915,12991,13026,13043,13044,13068,13039,12970,12927,12909,12889,12885,12914,13020,13005,13273,13463,13517,13576,13633,13618,13534,13502,13391,13175,13085,13049,12996,12980,13063,13466,13467,13501,13696,13808,13880,13932,13956,13654,13266,13108,12964,12776,12704,12666,12679,12795,13329,13695,14046,14452,14823,15033,15059,14948,14478,13766,13128,12591,12117,11790,11575,11407,10778,10061,9545,9145,8854,8707,8690,8422,7951,7456,7125,7005,7014,7057,7076,6992,6697,6321,6133,5831,5716}
(1 row)


 >
 > i'm wondering if is it a normal behaviour of pg_dump and how
should I
 > execute it to include data on that tables.
 >
 > Any hint would be appreciated. Thanks in advance and sorry for my
english
 >
 > --
 > edugarg


--
Adrian Klaver
adrian.kla...@aklaver.com

Re: n_distinct off by a factor of 1000

2020-06-23 Thread Fabio Pardi

On 23/06/2020 14:42, Klaudie Willis wrote:
> I got my first hint of why this problem occurs when I looked at the 
> statistics.  For the column in question, "instrument_ref" the statistics 
> claimed it to be:
>
> The default_statistics_target=500, and analyze has been run.
> select * from pg_stats where attname like 'instr%_ref'; -- Result: *40.000*
> select count(distinct instrumentid_ref) from bigtable -- Result: *33 385 922 
> (!!)*
>
> That is an astonishing difference of almost a 1000X. 
>

I think you are counting 2 different things here.

The first query returns all the columns "like 'instr%_ref'" present in the 
statistics (so in the whole cluster), while the second is counting the actual 
number of different rows in bigtable.


regards,

fabio pardi


Re: n_distinct off by a factor of 1000

2020-06-23 Thread Adrian Klaver

On 6/23/20 7:05 AM, Fabio Pardi wrote:


On 23/06/2020 14:42, Klaudie Willis wrote:
I got my first hint of why this problem occurs when I looked at the 
statistics.  For the column in question, "instrument_ref" the 
statistics claimed it to be:


The default_statistics_target=500, and analyze has been run.
select * from pg_stats where attname like 'instr%_ref'; -- Result: 
*40.000*
select count(distinct instrumentid_ref) from bigtable -- Result: *33 
385 922 (!!)*


That is an astonishing difference of almost a 1000X.



I think you are counting 2 different things here.

The first query returns all the columns "like 'instr%_ref'" present in 
the statistics (so in the whole cluster), while the second is counting 
the actual number of different rows in bigtable.


I believe the OP actually meant the query to be:

select n_distinct from pg_stats where attname like 'instr%_ref';




regards,

fabio pardi



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: pg_dump empty tables

2020-06-23 Thread Adrian Klaver

On 6/23/20 6:48 AM, Edu Gargiulo wrote:
On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:


On 6/23/20 6:30 AM, Edu Gargiulo wrote:
 > Hi all,
 >
 > We are using postgresql 11.7 on Debian.
 > I noticed that pg_dump is not including records on tables with
fields
 > defined as array type (integer[] and real[]). The table structure is
 > normally restored but they have 0 records on restoring.

What is the complete command you are using when running pg_dump?


/usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp



When you do the restore are there any errors in the Postgres log?


--
Adrian Klaver
adrian.kla...@aklaver.com




Re: i am getting issue as FATAL: password authentication failed for user ' postgres' tried

2020-06-23 Thread Adrian Klaver

On 6/22/20 9:18 PM, prachi surangalikar wrote:
Please reply to list also.
CCing list.
i  tried to connect to postgres using psql  and then its asking me 
password for user 'administrator' but i have not created any 
administrator user and when i logged in as user 'postgres' it is asking 
for  password then when i entered password it said password 
authentication failed for user 'postgres'.

Even when i connect using pgadmin its giving the same error.


Are you running this on Windows and using the Enterprise DB download of 
Postgres?


If so, when you installed Postgres the installer asked you to create a 
password. This is the password you need when connecting via pgAdmin. Now, 
the 'administrator' could also mean the program (psql) needs to run as 
the Windows Administrator. Can you show the command you are running and 
where/when you are being prompted for the password?




Do i need to change any configuration in pg_hba.conf file?
or add the user 'administrator' in the conf file.

On Thu, Jun 18, 2020 at 8:25 AM Adrian Klaver > wrote:


On 6/17/20 7:14 PM, prachi surangalikar wrote:
 > hello team,
 > i have  tried every  thing  but still i  could not find the
solution to
 > this problem.
 > i made changes in the pg_hba.conf file also , please help me to
solve
 > this problem.

What is the connection string you are using when you get the error?

What are the settings in the pg_hba.conf file?



-- 
Adrian Klaver

adrian.kla...@aklaver.com 




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: pg_dump empty tables

2020-06-23 Thread Edu Gargiulo
On Tue, Jun 23, 2020 at 10:56 AM Adrian Klaver 
wrote:

> On 6/23/20 6:48 AM, Edu Gargiulo wrote:
> > On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver
> > mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 6/23/20 6:30 AM, Edu Gargiulo wrote:
> >  > Hi all,
> >  >
> >  > We are using postgresql 11.7 on Debian.
> >  > I noticed that pg_dump is not including records on tables with
> > fields
> >  > defined as array type (integer[] and real[]). The table structure
> is
> >  > normally restored but they have 0 records on restoring.
> >
> > What is the complete command you are using when running pg_dump?
> >
> >
> > /usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp
>
> Coffee has kicked in and I realized I should have asked for the
> pg_restore command as well. So what is that?
>

pg_restore -d historic -h localhost --clean srvtsdb01.dmp
pg_restore -d historic --schema well --verbose srvtsdb01.dmp
pg_restore -d historic --schema well --clean --verbose srvtsdb01.dmp


Re: pg_dump empty tables

2020-06-23 Thread Adrian Klaver

On 6/23/20 7:37 AM, Edu Gargiulo wrote:



On Tue, Jun 23, 2020 at 10:56 AM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:


On 6/23/20 6:48 AM, Edu Gargiulo wrote:
 > On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver
 > mailto:adrian.kla...@aklaver.com>
>> wrote:
 >
 >     On 6/23/20 6:30 AM, Edu Gargiulo wrote:
 >      > Hi all,
 >      >
 >      > We are using postgresql 11.7 on Debian.
 >      > I noticed that pg_dump is not including records on tables with
 >     fields
 >      > defined as array type (integer[] and real[]). The table
structure is
 >      > normally restored but they have 0 records on restoring.
 >
 >     What is the complete command you are using when running pg_dump?
 >
 >
 > /usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp

Coffee has kicked in and I realized I should have asked for the
pg_restore command as well. So what is that?


pg_restore -d historic -h localhost --clean srvtsdb01.dmp
pg_restore -d historic --schema well --verbose srvtsdb01.dmp
pg_restore -d historic --schema well --clean --verbose srvtsdb01.dmp



With --verbose are you seeing any errors?

What does pg_restore -V show?



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Ron

On 6/23/20 8:32 AM, Jason Ralph wrote:


Hello List,

PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 
20120313 (R


ed Hat 4.4.7-23), 64-bit

I am planning an update on a table with 20Million records, I have been 
researching the best practices.  I will remove all indexes and foreign 
keys prior to the update, however I am not sure if I should use a 
transaction or not.


My thought process is that a transaction would be easier to recover if 
something fails, however it would take more time to write to the WAL log 
in a transaction.




Are you updating *every* row in the table?

Are you updating indexed fields?  (If not, then leave the indexes and FKs, 
since they won't be touched.)


Would it make sense to make a back up of the table then execute update 
without a transaction?




Always make a backup.


How would you guys do it?



It depends on what percentage of the rows are being updated, which columns 
are being updated and how big the records are.



--
Angular momentum makes the world go 'round.


RE: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Jason Ralph

Hello List,
PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (R
ed Hat 4.4.7-23), 64-bit

I am planning an update on a table with 20Million records, I have been 
researching the best practices.  I will remove all indexes and foreign keys 
prior to the update, however I am not sure if I should use a transaction or not.
My thought process is that a transaction would be easier to recover if 
something fails, however it would take more time to write to the WAL log in a 
transaction.

>Are you updating every row in the table?
No I am using an update like so: UPDATE members SET regdate='2038-01-18' WHERE 
regdate='2020-07-07'
DB=# select count(*) from members where regdate = '2020-07-07';
  count
----------
 17333090
(1 row)

>Are you updating indexed fields?  (If not, then leave the indexes and FKs, 
>since they won't be touched.)
Just checked regdate is not indexed so I will leave them in place.



Would it make sense to make a back up of the table then execute update without 
a transaction?

>Always make a backup.
Agreed


How would you guys do it?

>It depends on what percentage of the rows are being updated, which columns are 
>being updated and how big the records are.
Please see above, thanks

Jason Ralph

From: Ron 
Sent: Tuesday, June 23, 2020 10:57 AM
To: pgsql-general@lists.postgresql.org
Subject: Re: UPDATE on 20 Million Records Transaction or not?

On 6/23/20 8:32 AM, Jason Ralph wrote:

Hello List,
PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (R
ed Hat 4.4.7-23), 64-bit

I am planning an update on a table with 20Million records, I have been 
researching the best practices.  I will remove all indexes and foreign keys 
prior to the update, however I am not sure if I should use a transaction or not.
My thought process is that a transaction would be easier to recover if 
something fails, however it would take more time to write to the WAL log in a 
transaction.

Are you updating every row in the table?

Are you updating indexed fields?  (If not, then leave the indexes and FKs, 
since they won't be touched.)



Would it make sense to make a back up of the table then execute update without 
a transaction?

Always make a backup.


How would you guys do it?

It depends on what percentage of the rows are being updated, which columns are 
being updated and how big the records are.

--
Angular momentum makes the world go 'round.


Re: n_distinct off by a factor of 1000

2020-06-23 Thread Klaudie Willis
Adrian, you are correct. My mistake.

K

‐‐‐ Original Message ‐‐‐
On Tuesday, June 23, 2020 4:14 PM, Adrian Klaver  
wrote:

> On 6/23/20 7:05 AM, Fabio Pardi wrote:
>
> > On 23/06/2020 14:42, Klaudie Willis wrote:
> >
> > > I got my first hint of why this problem occurs when I looked at the
> > > statistics.  For the column in question, "instrument_ref" the
> > > statistics claimed it to be:
> > > The default_statistics_target=500, and analyze has been run.
> > > select * from pg_stats where attname like 'instr%_ref'; -- Result:
> > > 40.000
> > > select count(distinct instrumentid_ref) from bigtable -- Result: 33
> > > 385 922 (!!)That is an astonishing difference of almost a 1000X.
> >
> > I think you are counting 2 different things here.
> > The first query returns all the columns "like 'instr%_ref'" present in
> > the statistics (so in the whole cluster), while the second is counting
> > the actual number of different rows in bigtable.
>
> I believe the OP actually meant the query to be:
>
> select n_distinct from pg_stats where attname like 'instr%_ref';
>
> > regards,
> > fabio pardi
>
> --
>
> Adrian Klaver
> adrian.kla...@aklaver.com






Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Ron


I'd make a copy of the table, and test how long the various methods take.

On 6/23/20 10:17 AM, Jason Ralph wrote:


Hello List,

PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 
20120313 (R


ed Hat 4.4.7-23), 64-bit

I am planning an update on a table with 20Million records, I have been 
researching the best practices.  I will remove all indexes and foreign 
keys prior to the update, however I am not sure if I should use a 
transaction or not.


My thought process is that a transaction would be easier to recover if 
something fails, however it would take more time to write to the WAL log 
in a transaction.



>Are you updating *every* row in the table?

No I am using an update like so: UPDATE members SET regdate='2038-01-18' 
WHERE regdate='2020-07-07'


DB=# select count(*) from members where regdate = '2020-07-07';

  count

--

17333090

(1 row)

>Are you updating indexed fields?  (If not, then leave the indexes and 
FKs, since they won't be touched.)


Just checked regdate is not indexed so I will leave them in place.


Would it make sense to make a back up of the table then execute update 
without a transaction?



>Always make a backup.

Agreed


How would you guys do it?


>It depends on what percentage of the rows are being updated, which 
columns are being updated and how big the records are.


Please see above, thanks

*Jason Ralph*

*From:* Ron 
*Sent:* Tuesday, June 23, 2020 10:57 AM
*To:* pgsql-general@lists.postgresql.org
*Subject:* Re: UPDATE on 20 Million Records Transaction or not?

On 6/23/20 8:32 AM, Jason Ralph wrote:

Hello List,

PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (R

ed Hat 4.4.7-23), 64-bit

I am planning an update on a table with 20Million records, I have been
researching the best practices.  I will remove all indexes and foreign
keys prior to the update, however I am not sure if I should use a
transaction or not.

My thought process is that a transaction would be easier to recover if
something fails, however it would take more time to write to the WAL
log in a transaction.


Are you updating *every* row in the table?

Are you updating indexed fields?  (If not, then leave the indexes and FKs, 
since they won't be touched.)



Would it make sense to make a back up of the table then execute update
without a transaction?


Always make a backup.


How would you guys do it?


It depends on what percentage of the rows are being updated, which columns 
are being updated and how big the records are.


--
Angular momentum makes the world go 'round.



--
Angular momentum makes the world go 'round.


getting daily stats for event counts

2020-06-23 Thread David Gauthier
Hi:
9.6.0 on linux

I have a table which logs the inception of an event with an
"incept_datetime" (timestamptz) field.  I want to know how many events
occurred per day from one date to the next using midnight as a marker for
each day reported.  The end result should be something like...

2019-06-23  44
2019-06-24  18
2019-06-25  38
etc...

Is there a query that can do this ?


Re: getting daily stats for event counts

2020-06-23 Thread Tom Lane
David Gauthier  writes:
> I have a table which logs the inception of an event with an
> "incept_datetime" (timestamptz) field.  I want to know how many events
> occurred per day from one date to the next using midnight as a marker for
> each day reported.  The end result should be something like...

> 2019-06-23  44
> 2019-06-24  18
> 2019-06-25  38
> etc...

> Is there a query that can do this ?

Something involving GROUP BY date_trunc('day', incept_datetime)
ought to do it, possibly with some fooling around required
depending on what definition of "midnight" you want to use.

regards, tom lane
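
A minimal sketch of that shape (the table name is hypothetical; date_trunc on a timestamptz uses the session's TimeZone setting, which is where the "definition of midnight" comes in):

SELECT date_trunc('day', incept_datetime) AS day,
       count(*)                           AS events
FROM   event_log                     -- hypothetical table name
WHERE  incept_datetime >= '2019-06-23'
AND    incept_datetime <  '2019-06-26'
GROUP  BY 1
ORDER  BY 1;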




Dependencies of Matviews?

2020-06-23 Thread Thomas Kellerer

I would like to extract the dependency between materialized views.

e.g. in the following situation:

   create materialized view mv1 as select ;
   create materialized view mv2 as select ... from mv1, ...;

I would like to know that mv2 depends on mv1.

I assumed this could be done through pg_depend, but the only dependency I see 
there for the matviews is the one for the namespace.

Clearly this information is stored somehow, as a DROP MATERIALIZED VIEW 
complains about being referenced by other matviews.

Am I missing something in pg_depend or do I need to look in a different system 
table?

Regards
Thomas
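
The information is there, but it is attached to the matview's internal rewrite (_RETURN) rule rather than to the matview itself, so pg_depend has to be joined through pg_rewrite. A sketch of such a lookup (not verified against this exact setup):

SELECT DISTINCT dependent.relname AS matview,
       referenced.relname         AS depends_on
FROM pg_depend d
JOIN pg_rewrite r        ON r.oid = d.objid
JOIN pg_class dependent  ON dependent.oid = r.ev_class
JOIN pg_class referenced ON referenced.oid = d.refobjid
WHERE d.classid    = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND dependent.relkind  = 'm'
  AND referenced.relkind = 'm'
  AND dependent.oid <> referenced.oid;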





Re: pg_dump empty tables

2020-06-23 Thread Edu Gargiulo
On Tue, Jun 23, 2020 at 11:45 AM Adrian Klaver 
wrote:

> On 6/23/20 7:37 AM, Edu Gargiulo wrote:
> >
> >
> > On Tue, Jun 23, 2020 at 10:56 AM Adrian Klaver
> > mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 6/23/20 6:48 AM, Edu Gargiulo wrote:
> >  > On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver
> >  > mailto:adrian.kla...@aklaver.com>
> >  > >> wrote:
> >  >
> >  > On 6/23/20 6:30 AM, Edu Gargiulo wrote:
> >  >  > Hi all,
> >  >  >
> >  >  > We are using postgresql 11.7 on Debian.
> >  >  > I noticed that pg_dump is not including records on tables
> with
> >  > fields
> >  >  > defined as array type (integer[] and real[]). The table
> > structure is
> >  >  > normally restored but they have 0 records on restoring.
> >  >
> >  > What is the complete command you are using when running
> pg_dump?
> >  >
> >  >
> >  > /usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp
> >
> > Coffee has kicked in and I realized I should have asked for the
> > pg_restore command as well. So what is that?
> >
> >
> > pg_restore -d historic -h localhost --clean srvtsdb01.dmp
> > pg_restore -d historic --schema well --verbose srvtsdb01.dmp
> > pg_restore -d historic --schema well --clean --verbose srvtsdb01.dmp
> >
>
> With --verbose are you seeing any errors?
>
> What does pg_restore -V show?
>

Thanks for your response, Adrian. Looking at the pg_restore output, I saw
issues with triggers and the timescaledb extension when restoring those empty
tables.

--
edugarg


Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Ganesh Korde
You can try this

First take backup of table then execute below statements.

create table members_temp
as
select /* all columns except regdate */, '2038-01-18' as regdate
from members where regdate = '2020-07-07';

delete from  members  where regdate = '2020-07-07';

insert into  members  select * from  members_temp ;

drop table  members_temp;

Regards,
Ganesh Korde.

On Tue, Jun 23, 2020 at 9:06 PM Ron  wrote:

>
> I'd make a copy of the table, and test how long the various methods take.
>
> On 6/23/20 10:17 AM, Jason Ralph wrote:
>
>
>
> Hello List,
>
> PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (R
>
> ed Hat 4.4.7-23), 64-bit
>
>
>
> I am planning an update on a table with 20Million records, I have been
> researching the best practices.  I will remove all indexes and foreign keys
> prior to the update, however I am not sure if I should use a transaction or
> not.
>
> My thought process is that a transaction would be easier to recover if
> something fails, however it would take more time to write to the WAL log in
> a transaction.
>
>
> >Are you updating *every* row in the table?
>
> No I am using an update like so: UPDATE members SET regdate='2038-01-18'
> WHERE regdate='2020-07-07'
>
> DB=# select count(*) from members where regdate = '2020-07-07';
>
>   count
>
> --
>
> 17333090
>
> (1 row)
>
> >Are you updating indexed fields?  (If not, then leave the indexes and
> FKs, since they won't be touched.)
>
> Just checked regdate is not indexed so I will leave them in place.
>
>
>
>
> Would it make sense to make a back up of the table then execute update
> without a transaction?
>
>
> >Always make a backup.
>
> Agreed
>
>
> How would you guys do it?
>
>
> >It depends on what percentage of the rows are being updated, which
> columns are being updated and how big the records are.
>
> Please see above, thanks
>
>
>
> *Jason Ralph*
>
>
>
> *From:* Ron  
> *Sent:* Tuesday, June 23, 2020 10:57 AM
> *To:* pgsql-general@lists.postgresql.org
> *Subject:* Re: UPDATE on 20 Million Records Transaction or not?
>
>
>
> On 6/23/20 8:32 AM, Jason Ralph wrote:
>
> Hello List,
>
> PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (R
>
> ed Hat 4.4.7-23), 64-bit
>
>
>
> I am planning an update on a table with 20Million records, I have been
> researching the best practices.  I will remove all indexes and foreign keys
> prior to the update, however I am not sure if I should use a transaction or
> not.
>
> My thought process is that a transaction would be easier to recover if
> something fails, however it would take more time to write to the WAL log in
> a transaction.
>
>
> Are you updating *every* row in the table?
>
> Are you updating indexed fields?  (If not, then leave the indexes and FKs,
> since they won't be touched.)
>
>
>
>
> Would it make sense to make a back up of the table then execute update
> without a transaction?
>
>
> Always make a backup.
>
>
> How would you guys do it?
>
>
> It depends on what percentage of the rows are being updated, which columns
> are being updated and how big the records are.
>
> --
> Angular momentum makes the world go 'round.
>
>
> --
> Angular momentum makes the world go 'round.
>


Re: pg_dump empty tables

2020-06-23 Thread Adrian Klaver

On 6/23/20 9:44 AM, Edu Gargiulo wrote:


On Tue, Jun 23, 2020 at 11:45 AM Adrian Klaver 
mailto:adrian.kla...@aklaver.com>> wrote:


On 6/23/20 7:37 AM, Edu Gargiulo wrote:
 >
 >
 > On Tue, Jun 23, 2020 at 10:56 AM Adrian Klaver
 > mailto:adrian.kla...@aklaver.com>
>> wrote:
 >
 >     On 6/23/20 6:48 AM, Edu Gargiulo wrote:
 >      > On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver
 >      > mailto:adrian.kla...@aklaver.com>> wrote:
 >      >     On 6/23/20 6:30 AM, Edu Gargiulo wrote:
 >      >      > Hi all,
 >      >      >
 >      >      > We are using postgresql 11.7 on Debian.
 >      >      > I noticed that pg_dump is not including records on
tables with
 >      >     fields
 >      >      > defined as array type (integer[] and real[]). The table
 >     structure is
 >      >      > normally restored but they have 0 records on restoring.
 >      >
 >      >     What is the complete command you are using when
running pg_dump?
 >      >
 >      >
 >      > /usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp
 >
 >     Coffee has kicked in and I realized I should have asked for the
 >     pg_restore command as well. So what is that?
 >
 >
 > pg_restore -d historic -h localhost --clean srvtsdb01.dmp
 > pg_restore -d historic --schema well --verbose srvtsdb01.dmp
 > pg_restore -d historic --schema well --clean --verbose srvtsdb01.dmp
 >

With --verbose are you seeing any errors?

What does pg_restore -V show?


Thanks for your response Adrian, looking at the pg_restore output I saw 
issues with triggers and timescaledb extension on restoring those empty 
tables.


I'm going to bet that is the problem.

What were the errors?

Is the timescaledb extension installed on the database you are restoring to?



--
edugarg



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: n_distinct off by a factor of 1000

2020-06-23 Thread Michael Lewis
>
> > > On 23/06/2020 14:42, Klaudie Willis wrote:
> > >
> > > > I got my first hint of why this problem occurs when I looked at the
> > > > statistics.  For the column in question, "instrument_ref" the
> > > > statistics claimed it to be:
> > > > The default_statistics_target=500, and analyze has been run.
> > > > select * from pg_stats where attname like 'instr%_ref'; -- Result:
> > > > 40.000
> > > > select count(distinct instrumentid_ref) from bigtable -- Result: 33
> > > > 385 922 (!!)That is an astonishing difference of almost a 1000X.
>

Try something like this to check how representative those "most common
values" are. If you have n_distinct very low compared to reality and also
the fraction of the table that the "most common" values are claiming to
cover is low, then you can get very bad estimates when querying for values
that are not in the MCVs list. The planner will assume an even distribution
for other values and that may be much much higher or lower than reality.
That is, if you have statistics target of 100 like normal, and those cover
5% of the table, and you have ndistinct value of 500, then the other 400
values are assumed to evenly cover that 95% of the table so each value
would be .95/400 * reltuples as an estimate. If your real count of distinct
values is far higher than that, the number of rows you actually get for each value
in your IN clause drops hugely compared to what the planner expects.

Using a custom ndistinct will dramatically impact the estimates that the
planner is using to make the decision of index vs sequential scan. Also, if
the custom ndistinct and the actual distinct count vary by 2x or 10x as
your data grows, it matters very little IMO as compared to relying on the
sample taken by (auto)analyze job being off by a factor of 1000x or even
100x as you have experienced.


SELECT
    (SELECT SUM(x) FROM UNNEST(most_common_freqs) x) AS frac_mcv,
    tablename,
    attname,
    inherited,
    null_frac,
    n_distinct,
    array_length(most_common_vals, 1) AS n_mcv,
    array_length(histogram_bounds, 1) AS n_hist,
    correlation,
    *
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'table'
  AND attname = 'column';


Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Michael Lewis
>
> >Are you updating *every* row in the table?
>>
>> No I am using an update like so: UPDATE members SET regdate='2038-01-18'
>> WHERE regdate='2020-07-07'
>>
>> DB=# select count(*) from members where regdate = '2020-07-07';
>>
>>   count
>>
>> --
>>
>> 17333090
>>
>> (1 row)
>>
>>
Just update them and be done with it. Do the work in batches if it doesn't
matter that concurrent accesses to the table might see some rows that have
old value and some new. Given you are on PG11 and can commit within a
function, you could adapt something like this to just run until done. Oh,
if you have not tuned the autovacuum process much, depending on the total
number of rows in your table, it could be good to manually vacuum once or
twice during this process so that space is reused.

*the below was shared by Andreas Krogh on one of these mailing lists about
6 months back.


do $_$
declare
    num_rows bigint;
begin
    loop
        delete from YourTable
         where id in (select id from YourTable where id < 500 limit 100);
        commit;
        get diagnostics num_rows = row_count;
        raise notice 'deleted % rows', num_rows;
        exit when num_rows = 0;
    end loop;
end;
$_$;
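
Adapted to the UPDATE discussed in this thread, a sketch could look like the following (it assumes members has an indexed primary key column "id", which is hypothetical here; the batch size is arbitrary):

do $_$
declare
    num_rows bigint;
begin
    loop
        update members
           set regdate = '2038-01-18'
         where id in (select id from members
                       where regdate = '2020-07-07'
                       limit 50000);
        get diagnostics num_rows = row_count;
        raise notice 'updated % rows', num_rows;
        exit when num_rows = 0;
        commit;
    end loop;
end;
$_$;

Since regdate is not indexed, each batch rescans the table to find its candidates, so fewer and larger batches may work out cheaper overall.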


Re: pg_dump empty tables

2020-06-23 Thread Edu Gargiulo
On Tue, Jun 23, 2020 at 2:25 PM Adrian Klaver 
wrote:

> On 6/23/20 9:44 AM, Edu Gargiulo wrote:
> >
> > On Tue, Jun 23, 2020 at 11:45 AM Adrian Klaver
> > mailto:adrian.kla...@aklaver.com>> wrote:
> >
> > On 6/23/20 7:37 AM, Edu Gargiulo wrote:
> >  >
> >  >
> >  > On Tue, Jun 23, 2020 at 10:56 AM Adrian Klaver
> >  > mailto:adrian.kla...@aklaver.com>
> >  > >> wrote:
> >  >
> >  > On 6/23/20 6:48 AM, Edu Gargiulo wrote:
> >  >  > On Tue, Jun 23, 2020 at 10:34 AM Adrian Klaver
> >  >  > mailto:adrian.kla...@aklaver.com>> wrote:
> >  >  > On 6/23/20 6:30 AM, Edu Gargiulo wrote:
> >  >  >  > Hi all,
> >  >  >  >
> >  >  >  > We are using postgresql 11.7 on Debian.
> >  >  >  > I noticed that pg_dump is not including records on
> > tables with
> >  >  > fields
> >  >  >  > defined as array type (integer[] and real[]). The
> table
> >  > structure is
> >  >  >  > normally restored but they have 0 records on
> restoring.
> >  >  >
> >  >  > What is the complete command you are using when
> > running pg_dump?
> >  >  >
> >  >  >
> >  >  > /usr/bin/pg_dump -d historic -Z 9 -Fc -f srvtsdb01.dmp
> >  >
> >  > Coffee has kicked in and I realized I should have asked for
> the
> >  > pg_restore command as well. So what is that?
> >  >
> >  >
> >  > pg_restore -d historic -h localhost --clean srvtsdb01.dmp
> >  > pg_restore -d historic --schema well --verbose srvtsdb01.dmp
> >  > pg_restore -d historic --schema well --clean --verbose
> srvtsdb01.dmp
> >  >
> >
> > With --verbose are you seeing any errors?
> >
> > What does pg_restore -V show?
> >
> >
> > Thanks for your response Adrian, looking at the pg_restore output I saw
> > issues with triggers and timescaledb extension on restoring those empty
> > tables.
>
> I'm going to bet that is the problem.
>
> What where the errors?
>
> Is the timescaledb extension installed on the database you are restoring
> to?
>
> It was not installed on the restoring database. After install and execute
timescaledb_pre_restore() and timescaledb_post_restore() before and after
pg_restore it was restored normally.

Thank you very much
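
For anyone hitting the same thing, the working sequence looks roughly like this (a sketch based on the steps above; database and dump names are the ones used in this thread):

-- in the target database, before restoring:
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT timescaledb_pre_restore();

-- then, from the shell:
--   pg_restore -d historic -h localhost --clean srvtsdb01.dmp

-- back in the database, after the restore completes:
SELECT timescaledb_post_restore();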


Persistent Connections

2020-06-23 Thread Bee.Lists
I have an issue with a server (v10) that’s seeing increasing connections until 
it’s maxed out.  

max_connections for my 4-core server is set to 12.  

I’ve been looking at pg_stat_activity and have installed pg_stat_statements.

I access this server through a web app, and another client on two machines.  I 
also send queries through SSH on the LAN.  psql queries indicate there are too 
many clients already.  I’ve been getting errors from my workstation through a 
Ruby gem that says dropped connections happen.  Essentially, the server isn’t 
giving up connections and clients from all sides are showing some issues.

pg_stat_activity has shown a handful of idle queries that are quite simple.  
I’ve tested those queries and they seem fine on a workstation client.  I’m 
assuming these queries somehow don’t finish and leave the connection open, but 
I could be wrong.  All of this is new to me.  
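
A check along these lines (a sketch against the built-in pg_stat_activity view) shows which sessions are idle and for how long:

SELECT pid, usename, application_name, client_addr, state,
       now() - state_change AS idle_for,
       left(query, 60)      AS last_query
FROM pg_stat_activity
WHERE state IN ('idle', 'idle in transaction')
ORDER BY idle_for DESC;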

It was suggested on Slack that it sounds like my workstation had some TCP 
issues with these connections, and that it was a firewall.  I disabled the 
firewall and the same happened.  The firewall has been restarted.  

I am running no connection pool tool of any sort, and I’ve never used one.  

At this point I don’t know what to even be watching for.  Connections increase 
and connections “drop”, according to the gem I’m using.  I have simple queries 
that are idle and won’t disappear for some reason.  

How can I figure this out so the connections remain within the max_connections 
limit, and connections are not dropped?  

Any insight appreciated.  


Cheers, Bee








RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-23 Thread Jim Hurne
Magnus Hagander  wrote on 06/22/2020 04:44:33 PM:
> Yes, that's how VACUUM FULL works (more or less). 

Thanks for the confirmation and further explanation Magnus! I'm definitely 
getting a better idea of how the vacuum and autovacuum facilities work.


> And for autovacuum, with a cost_delay of 20ms and a cost_limit of 
> 200, autovacuum would spend about 55 hours just on vacuum delay for 
> the reads...

Ah, very interesting. That explains (at least in part) why the autovacuums 
seem to be taking a long time. We'll consider adjusting the cost_delay and 
cost_limit settings. Are there any downsides to setting a very low 
cost_delay (maybe even 0ms), or to setting a large cost_limit (such as 
100,000)?
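
One way such a change could be applied globally (a sketch; the values are illustrative, and per-table autovacuum storage parameters are another option):

ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();   -- both settings can be changed with a reload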


> The logs you posted originally seem to be excluding the actual 
> autovacuum details -- can you include those?

Sure! Below are more of the details from the same set of logs. Looking at 
them myself, I see that there is always some percentage of tuples that are 
dead but are not yet removable. And that number increases on every vacuum, 
which might explain in part why autovacuum elapsed times keep increasing. 

What causes a dead tuple to be unremovable?
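
Common holders of the "oldest xmin" are long-running transactions (and, on replicated setups, replication slots or hot standby feedback). A sketch of a query to spot long-running sessions:

-- sessions whose snapshot may be holding back dead-tuple cleanup
SELECT pid, state, backend_xmin, xact_start,
       now() - xact_start AS xact_age,
       left(query, 60)    AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;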

-- LOGS --
2020-06-12T19:41:58.335881879Z stderr F 2020-06-12 19:41:58 UTC [528141]: 
[2-1] user=,db=,client= LOG:  automatic vacuum of table 
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:41:58.335899718Z stderr F pages: 0 removed, 8649 
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:41:58.335908658Z stderr F tuples: 0 removed, 30445 
remain, 17410 are dead but not yet removable, oldest xmin: 1942750826
2020-06-12T19:41:58.335917078Z stderr F buffer usage: 17444 hits, 
0 misses, 0 dirtied
2020-06-12T19:41:58.335925217Z stderr F avg read rate: 0.000 MB/s, 
avg write rate: 0.000 MB/s
2020-06-12T19:41:58.335931494Z stderr F system usage: CPU: user: 
0.02 s, system: 0.00 s, elapsed: 1.77 s
2020-06-12T19:42:59.704833148Z stderr F 2020-06-12 19:42:59 UTC [528231]: 
[2-1] user=,db=,client= LOG:  automatic vacuum of table 
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T19:42:59.704852755Z stderr F pages: 0 removed, 14951 
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:42:59.704861911Z stderr F tuples: 2 removed, 53054 
remain, 27127 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:42:59.704870553Z stderr F buffer usage: 30144 hits, 
0 misses, 3 dirtied
2020-06-12T19:42:59.704878195Z stderr F avg read rate: 0.000 MB/s, 
avg write rate: 0.008 MB/s
2020-06-12T19:42:59.704884752Z stderr F system usage: CPU: user: 
0.02 s, system: 0.02 s, elapsed: 3.06 s
2020-06-12T19:44:01.928225496Z stderr F 2020-06-12 19:44:01 UTC [528326]: 
[2-1] user=,db=,client= LOG:  automatic vacuum of table 
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:44:01.928265589Z stderr F pages: 0 removed, 22395 
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:44:01.928276366Z stderr F tuples: 0 removed, 76562 
remain, 37235 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:44:01.9282856Z stderr F   buffer usage: 45131 hits, 0 
misses, 433 dirtied
2020-06-12T19:44:01.928293976Z stderr F avg read rate: 0.000 MB/s, 
avg write rate: 0.621 MB/s
2020-06-12T19:44:01.928300512Z stderr F system usage: CPU: user: 
0.06 s, system: 0.01 s, elapsed: 5.44 s
2020-06-12T19:45:14.124140716Z stderr F 2020-06-12 19:45:14 UTC [528433]: 
[2-1] user=,db=,client= LOG:  automatic vacuum of table 
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:45:14.124178864Z stderr F pages: 0 removed, 31029 
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:45:14.124187376Z stderr F tuples: 0 removed, 104441 
remain, 65915 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:45:14.124196608Z stderr F buffer usage: 57692 hits, 
4834 misses, 2095 dirtied
2020-06-12T19:45:14.124203046Z stderr F avg read rate: 2.204 MB/s, 
avg write rate: 0.955 MB/s
2020-06-12T19:45:14.124209167Z stderr F system usage: CPU: user: 
0.11 s, system: 0.05 s, elapsed: 17.13 s
2020-06-12T19:46:16.287517553Z stderr F 2020-06-12 19:46:16 UTC [528529]: 
[2-1] user=,db=,client= LOG:  automatic vacuum of table 
"ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:46:16.287558376Z stderr F pages: 0 removed, 36257 
remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:46:16.287567454Z stderr F tuples: 0 removed, 122273 
remain, 81080 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:46:16.287575752Z stderr F buffer usage: 62700 hits, 
10366 misses, 76 dirtied
2020-06-12T19:46:16.287582866Z stderr F avg read rate: 4.252 MB/s, 
avg write rate: 0.031 MB/s
2020-06-12T19:46:16.28758936Z stderr F  system usage: CPU: user: 0.13 s, 
system: 0.08 s, elapsed: 1

Re: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

2020-06-23 Thread Michael Lewis
On Tue, Jun 23, 2020 at 2:34 PM Jim Hurne  wrote:

> Sure! Below are more of the details from the same set of logs. Looking at
> them myself, I see that there is always some percentage of tuples that are
> dead but are not yet removable. And that number increases on every vacuum,
> which might explain in part why autovacuum elapsed times keep increasing.
>
> What causes a dead tuple to be unremovable?
>

Here are a couple good write ups.
https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/
https://www.2ndquadrant.com/en/blog/when-autovacuum-does-not-vacuum/

Long running transactions are the common one that I see. You might be
dealing with replication slots or prepared transactions. Basically, if some
process might see that "old truth", then it can't be vacuumed away yet.
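(A few quick checks along those lines, using the standard system views; just a sketch:)

-- sessions whose snapshots are holding back the xmin horizon
SELECT pid, state, backend_xmin, xact_start, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- replication slots pinning an old xmin
SELECT slot_name, slot_type, active, xmin, catalog_xmin
FROM pg_replication_slots;

-- prepared transactions that were never committed or rolled back
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;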


Re: Persistent Connections

2020-06-23 Thread Michael Lewis
On Tue, Jun 23, 2020 at 2:29 PM Bee.Lists  wrote:

> I have an issue with a server (v10) that’s seeing increasing connections
> until it’s maxxed-out.
>
> max_connections for my 4-core server is set to 12.
>
> I’ve installed pg_stat_activity and pg_stat_statements.
>

Do you see anything in pg_stat_activity that stays idle for a while and
then *does* disappear on its own? Perhaps some types of connections are
doing client side/application stuff before telling the DB to close the
connection.


> pg_stat_activity has shown a handful of idle queries that are quite
> simple.  I’ve tested those queries and they seem fine on a workstation
> client.  I’m assuming these queries somehow don’t finish and leave the
> connection open, but I could be wrong.  All of this is new to me.
>

Idle means the query finished and that was the last query run. The connection
isn't active or waiting on another process; it is simply sitting open and idle.


> I am running no connection pool tool of any sort, and I’ve never used
> one.
>

It sounds like a good time to set one up.


> How can I figure this out so the connections remain within the
> max_connections limit, and connections are not dropped?
>

I would increase the limit directly, or with a pooler and research which
connections are behaving, and which are taking too long to close or not
closing at all. You could set up a process to snapshot pg_stat_activity
every minute or 5 and trace which pids are terminating properly, and/or
make logging very verbose.
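(As a concrete starting point, a periodic snapshot along these lines, written to a
log or a table, makes it easy to see which pids linger; a sketch, adjust as needed:)

SELECT now() AS sampled_at,
       pid,
       usename,
       application_name,
       client_addr,
       state,
       now() - state_change AS time_in_state,
       left(query, 60) AS last_query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY state_change;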


RE: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Jason Ralph
>Just update them and be done with it. Do the work in batches if it doesn't
>matter that concurrent accesses to the table might see some rows that have
>old value and some new. Given you are on PG11 and can commit within a
>function, you could adapt something like this to just run until done. Oh, if
>you have not tuned the autovacuum process much, depending on the total number
>of rows in your table, it could be good to manually vacuum once or twice
>during this process so that space is reused.

>*the below was shared by Andreas Krogh on one of these mailing lists about 6
>months back.


>do $_$
>declare
>    num_rows bigint;
>begin
>    loop
>        delete from YourTable where id in
>            (select id from YourTable where id < 500 limit 100);
>        commit;
>        get diagnostics num_rows = row_count;
>        raise notice 'deleted % rows', num_rows;
>        exit when num_rows = 0;
>    end loop;
>end;$_$;

Thanks to all the suggestions, I really like the function, I will test this.  I 
have autovacuum fully tuned for this table, so should be good.


Jason Ralph

From: Michael Lewis 
Sent: Tuesday, June 23, 2020 1:43 PM
To: Ganesh Korde 
Cc: Ron ; pgsql-general@lists.postgresql.org
Subject: Re: UPDATE on 20 Million Records Transaction or not?

>Are you updating every row in the table?
No I am using an update like so: UPDATE members SET regdate='2038-01-18' WHERE 
regdate='2020-07-07'
DB=# select count(*) from members where regdate = '2020-07-07';
  count
----------
 17333090
(1 row)
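(If you do go the batched route, a minimal adaptation of that pattern to this
particular UPDATE could look like the following; it is a sketch that assumes
members has a primary key column named id, and it must be run outside an explicit
transaction so the COMMIT inside the loop is allowed:)

do $_$
declare
    num_rows bigint;
begin
    loop
        update members set regdate = '2038-01-18'
        where id in (select id from members
                     where regdate = '2020-07-07'
                     limit 10000);  -- batch size is arbitrary
        commit;
        get diagnostics num_rows = row_count;
        raise notice 'updated % rows', num_rows;
        exit when num_rows = 0;
    end loop;
end;$_$;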



Re: Netapp SnapCenter

2020-06-23 Thread Wolff, Ken L
One last follow-up on this thread.  I don't remember this being mentioned, so I
wanted to share it in case it might be of help. The following is NetApp's Best
Practices for PostgreSQL on NetApp ONTAP storage:

 https://www.netapp.com/us/media/tr-4770.pdf 




Re: Dependencies of Matviews?

2020-06-23 Thread Tom Lane
Thomas Kellerer  writes:
> I would like to extract the dependency between materialized views.
> e.g. in the following situation:
> create materialized view mv1 as select ;
> create materialized view mv2 as select ... from mv1, ...;
> I would like to know that mv2 depends on mv1.
> I assumed this could be done through pg_depend, but the only dependency I see 
> there for the matviews is the one for the namespace.

Most of the interesting dependencies for a view or matview are actually
held by its ON SELECT rule.  For example:

regression=# create materialized view mv1 as select * from int8_tbl;
SELECT 5
regression=# create materialized view mv2 as select * from mv1;
SELECT 5
regression=# select 'mv1'::regclass::oid;
  oid
-------
 58550
(1 row)
regression=# select * from pg_depend where objid >= 58550 or refobjid >= 58550;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype 
---------+-------+----------+------------+----------+-------------+---------
    1247 | 58552 |        0 |       1259 |    58550 |           0 | i
    1247 | 58551 |        0 |       1247 |    58552 |           0 | i
    1259 | 58550 |        0 |       2615 |     2200 |           0 | n
    2618 | 58553 |        0 |       1259 |    58550 |           0 | i
    2618 | 58553 |        0 |       1259 |    58550 |           0 | n
    2618 | 58553 |        0 |       1259 |    37540 |           1 | n
    2618 | 58553 |        0 |       1259 |    37540 |           2 | n
    1247 | 58556 |        0 |       1259 |    58554 |           0 | i
    1247 | 58555 |        0 |       1247 |    58556 |           0 | i
    1259 | 58554 |        0 |       2615 |     2200 |           0 | n
    2618 | 58557 |        0 |       1259 |    58554 |           0 | i
    2618 | 58557 |        0 |       1259 |    58554 |           0 | n
    2618 | 58557 |        0 |       1259 |    58550 |           1 | n
    2618 | 58557 |        0 |       1259 |    58550 |           2 | n
(14 rows)

or more readably,

regression=# select pg_describe_object(classid,objid,objsubid) as obj, pg_describe_object(refclassid,refobjid,refobjsubid) as ref, deptype from pg_depend where objid >= 58550 or refobjid >= 58550;
                  obj                  |                ref                 | deptype 
---------------------------------------+------------------------------------+---------
 type mv1                              | materialized view mv1              | i
 type mv1[]                            | type mv1                           | i
 materialized view mv1                 | schema public                      | n
 rule _RETURN on materialized view mv1 | materialized view mv1              | i
 rule _RETURN on materialized view mv1 | materialized view mv1              | n
 rule _RETURN on materialized view mv1 | column q1 of table int8_tbl        | n   <
 rule _RETURN on materialized view mv1 | column q2 of table int8_tbl        | n   <
 type mv2                              | materialized view mv2              | i
 type mv2[]                            | type mv2                           | i
 materialized view mv2                 | schema public                      | n
 rule _RETURN on materialized view mv2 | materialized view mv2              | i
 rule _RETURN on materialized view mv2 | materialized view mv2              | n
 rule _RETURN on materialized view mv2 | column q1 of materialized view mv1 | n   <
 rule _RETURN on materialized view mv2 | column q2 of materialized view mv1 | n   <
(14 rows)

where I marked the actually-interesting dependencies with <.

regards, tom lane
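(To turn that into the matview-to-matview dependency Thomas asked about, one can
walk from each matview's _RETURN rule to the relations it references; a sketch:)

SELECT dependent.relname  AS matview,
       referenced.relname AS depends_on
FROM pg_depend d
JOIN pg_rewrite r        ON r.oid = d.objid
JOIN pg_class dependent  ON dependent.oid = r.ev_class
JOIN pg_class referenced ON referenced.oid = d.refobjid
WHERE d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND dependent.relkind = 'm'
  AND referenced.relkind = 'm'
  AND dependent.oid <> referenced.oid
GROUP BY dependent.relname, referenced.relname;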




Re: Dependencies of Matviews?

2020-06-23 Thread Thomas Kellerer

Tom Lane wrote on 23.06.2020 at 23:25:

I would like to extract the dependency between materialized views.
e.g. in the following situation:
 create materialized view mv1 as select ;
 create materialized view mv2 as select ... from mv1, ...;

or more readably,

regression=# select pg_describe_object(classid,objid,objsubid) as obj, 
pg_describe_object(refclassid,refobjid,refobjsubid) as ref, deptype from pg_depend 
where objid >= 58550 or refobjid >= 58550;


Ah, great.
That's what I was looking for, thanks

Thomas





Re: i am getting issue as FATAL: password authentication failed for user ' postgres' tried

2020-06-23 Thread Bruce Momjian
On Sat, Jun 20, 2020 at 06:28:52PM +0200, Peter J. Holzer wrote:
> On 2020-06-17 20:00:51 -0700, Adrian Klaver wrote:
> > On 6/17/20 7:14 PM, prachi surangalikar wrote:
> > > i have  tried every  thing  but still i  could not find the solution to
> > > this problem.
> > > i made changes in the pg_hba.conf file also , please help me to solve
> > > this problem.
> > 
> > Should have added to previous post:
> > 
> > Are you sure that you are using the correct password or that the 'postgres'
> > user has a password?
> 
> And that the OP is indeed using the 'postgres' user and not the ' postgres'
> user (as she wrote in the subject).

Uh, how are those different?

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee





Re: Persistent Connections

2020-06-23 Thread Bee.Lists
> 
> On Jun 23, 2020, at 4:51 PM, Michael Lewis  wrote:
> 
> Do you see anything in pg_stat_activity that stays idle for a while and then 
> *does* disappear on its own? Perhaps some types of connections are doing 
> client side/application stuff before telling the DB to close the connection.

I’m finding those queries sticking around.  These queries are very simple.  
Last login type of stuff.  

> Idle means the query finished and that was the last query run. It isn't 
> active or waiting on another process, that connection is open by idle.

OK.  The page that I load up is a dashboard and has a handful of queries.  From
the looks of it, they’re still working, but idle.  But you’re saying they’re
just open connections?  Why would they remain open?

I check for numbackends this way:

pgconns='psql -c "select datid, datname, numbackends, xact_commit, stats_reset from pg_stat_database where datname in ('\''mydbname'\'');"'

> It sounds like a good time to set one up.

OK, some further questions:

Who do the connections belong to?  Not the client, not the server (apparently). 
 Is there one that’s independent and behaves as the front end of connection 
management?

> I would increase the limit directly, or with a pooler and research which 
> connections are behaving, and which are taking too long to close or not 
> closing at all. You could set up a process to snapshot pg_stat_activity every 
> minute or 5 and trace which pids are terminating properly, and/or make 
> logging very verbose.

How do I go about researching connection behaviour?  I guess a pooler should be 
investigated first.  I have that pgconns already logging, so I’ll do one for 
pg_stat_activity.  

Once I find culprits, what options do I have?  Not sure why new connections are 
made when these idle past connections seem valid and usable.  

There is agreement that ORMs shouldn’t be managing a connection pool, and this 
doesn’t achieve to do that.  I’ll be looking into a pooler.  This client (the 
gem is Sequel, btw) uses what it assumes are valid connections, but that’s 
where it fails as the database apparently disconnects prematurely.  The gem has 
a procedure to check how long since the last pool was investigated for legit 
connections, but I think that’s irrelevant.  It’s finding what it’s told are 
legit connections, which are not.  It’s been lied to.  


Cheers, Bee



Re: i am getting issue as FATAL: password authentication failed for user ' postgres' tried

2020-06-23 Thread Adrian Klaver

On 6/23/20 3:16 PM, Bruce Momjian wrote:

On Sat, Jun 20, 2020 at 06:28:52PM +0200, Peter J. Holzer wrote:

On 2020-06-17 20:00:51 -0700, Adrian Klaver wrote:

On 6/17/20 7:14 PM, prachi surangalikar wrote:

i have  tried every  thing  but still i  could not find the solution to
this problem.
i made changes in the pg_hba.conf file also , please help me to solve
this problem.


Should have added to previous post:

Are you sure that you are using the correct password or that the 'postgres'
user has a password?


And that the OP is indeed using the 'postgres' user and not the ' postgres'
user (as she wrote in the subject).


Uh, how are those different?



Extra space.

It happens:

https://stackoverflow.com/questions/62541581/django-db-utils-operationalerror-fatal-password-authentication-failed-for-user
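(A quick way to check for role names carrying stray whitespace, should that be the
suspicion; a sketch:)

SELECT quote_ident(rolname) AS rolname
FROM pg_roles
WHERE rolname <> btrim(rolname);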

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Persistent Connections

2020-06-23 Thread David G. Johnston
Why is there now a second thread for this topic?

On Tue, Jun 23, 2020 at 3:21 PM Bee.Lists  wrote:

> >
> > On Jun 23, 2020, at 4:51 PM, Michael Lewis  wrote:
> >
> > Do you see anything in pg_stat_activity that stays idle for a while and
> then *does* disappear on its own? Perhaps some types of connections are
> doing client side/application stuff before telling the DB to close the
> connection.
>
> I’m finding those queries sticking around.  These queries are very
> simple.  Last login type of stuff.
>
> > Idle means the query finished and that was the last query run. It isn't
> active or waiting on another process, that connection is open by idle.
>
> OK.  The page that I load up is a dashboard and has a handful of queries.
> From the looks of it, it looks like they’re still working, but idle.  But
> you’re saying they’re just open connections?  Why would they remain open?
>

"they (queries) are still working, but idle" - your terminology is
problematic and it is probably affecting your understanding.  As I said on
the other thread you should probably post the actual output you are
commenting on if you want to actually move this discussion forward.

> It sounds like a good time to set one up.
>
> OK, some further questions:
>
> Who do the connections belong to?  Not the client, not the server
> (apparently).  Is there one that’s independent and behaves as the front end
> of connection management?
>

As I asked on the other thread: a connection is a link between two
parties.  What does it mean to "belong to" in this context?  You have
mis-interpreted Tom's answer from the other thread.

> I would increase the limit directly, or with a pooler and research which
> connections are behaving, and which are taking too long to close or not
> closing at all. You could set up a process to snapshot pg_stat_activity
> every minute or 5 and trace which pids are terminating properly, and/or
> make logging very verbose.
>
> How do I go about researching connection behaviour?  I guess a pooler
> should be investigated first.


Until you get a better grasp of the basics you should not be introducing
any more moving parts.  If anything you need to remove some in order to
figure out which one of the existing parts is causing your problem.


> Once I find culprits, what options do I have?  Not sure why new
> connections are made when these idle past connections seem valid and
> usable.
>

Not sure how you expect an answer to "how do I fix the problem" without an
understanding of what the problem is.

There is agreement that ORMs shouldn’t be managing a connection pool, and
> this doesn’t achieve to do that.  I’ll be looking into a pooler.  This
> client (the gem is Sequel, btw) uses what it assumes are valid connections,
> but that’s where it fails as the database apparently disconnects
> prematurely.  The gem has a procedure to check how long since the last pool
> was investigated for legit connections, but I think that’s irrelevant.
> It’s finding what it’s told are legit connections, which are not.  It’s
> been lied to.
>

That the pooler in your application is being lied to is probably the most
likely answer, as Tom said in the other thread.  But the rest of what you
are saying here just sounds like nonsense.  "I'll be looking into a pooler.
... The gem has a procedure to check how long since the last pool was
investigated for legit connections ...".  I'd like to point out that you
seem to be saying that you simultaneously have a connection pool and don't
have a connection pool involved here...

As I suggested on the other thread, and repeated above, you need to build
up a script that can reproduce the problem.  Something that can open a
connection to the server and then wait for a period of time before
executing a query against it to see if that session gets dropped on the
client side while still remaining visible on the server.

David J.


Re: Persistent Connections

2020-06-23 Thread Tim Cross


Bee.Lists  writes:

> I have an issue with a server (v10) that’s seeing increasing connections 
> until it’s maxxed-out.  
>
> max_connections for my 4-core server is set to 12.  
>
> I’ve installed pg_stat_activity and pg_stat_statements.
>
> I access this server through a web app, and another client on two machines.  
> I also send queries through SSH on the LAN.  psql queries indicate there are 
> too many clients already.  I’ve been getting errors from my workstation 
> through a Ruby gem that says dropped connections happen.  Essentially, the 
> server isn’t giving up connections and clients from all sides are showing 
> some issues.
>
> pg_stat_activity has shown a handful of idle queries that are quite simple.  
> I’ve tested those queries and they seem fine on a workstation client.  I’m 
> assuming these queries somehow don’t finish and leave the connection open, 
> but I could be wrong.  All of this is new to me.  
>
> It was suggested on Slack that it sounds like my workstation had some TCP 
> issues with these connections, and that it was a firewall.  I disabled the 
> firewall and the same happened.  The firewall has been restarted.  
>
> I am running no connection pool tool of any sort, and I’ve never used one.  
>
> At this point I don’t know what to even be watching for.  Connections 
> increase and connections “drop”, according to the gem I’m using.  I have 
> simple queries that are idle and won’t disappear for some reason.  
>
> How can I figure this out so the connections remain within the 
> max_connections limit, and connections are not dropped?  
>
> Any insight appreciated.  
>

Sounds like your web app may not be closing connections once it has
finished with them? The fact you're seeing idle connections would seem to
support this. I would verify there isn't a code path in your
client application which is failing to close a connection correctly. Seeing
connections go up and down in the gem may not be telling you the full story:
it could be that your client connection objects are being destroyed in your app
but are not closing the connection correctly, so PG is unaware the client
has finished.

Typically, due to the overhead of making a connection, you don't want
your client app/web app to create a new connection for every query.
Instead, you would use some type of connection pool. Many development
languages support some form of pooling as part of their db connection
library (don't know about Ruby, but JS, Python, PHP, Java, Perl all do)
and there are external 3rd party solutions like pgbouncer which sit
between your client and the database and work a bit like a connection
broker which will manage a pool of connections.

From the description of what you're doing, I would first look to see what
level of connection pooling your development language supports. This
will likely be easier to configure and use than setting up a whole
additional piece of infrastructure which is probably far more powerful
than you need.

I would also go through your code and make sure that every time you
create a database connection, there is some code which closes that
connection once you're finished with it. This is a frequent source of
problems, especially during development, when your code might throw an
error and the disconnect code does not get executed, or you simply
forget to issue a disconnect when you're finished.

A connection pool can help in tracking down such issues as well. Most
pooling solutions will allow you to set a max pool size. In addition to
enabling you to 'reserve' a set number of connections for a client, you
will know which client seems to be running out of connections, helping
to identify the culprit. 
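(To see which client is holding the connections, grouping pg_stat_activity by
application and address makes the distribution obvious; a sketch:)

SELECT application_name,
       client_addr,
       state,
       count(*) AS connections,
       max(now() - state_change) AS longest_in_state
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY application_name, client_addr, state
ORDER BY connections DESC;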

-- 
Tim Cross




Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Tim Cross


Jason Ralph  writes:

> Hello List,
> PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
>
> I am planning an update on a table with 20Million records, I have been 
> researching the best practices.  I will remove all indexes and foreign keys 
> prior to the update, however I am not sure if I should use a transaction or 
> not.
> My thought process is that a transaction would be easier to recover if 
> something fails, however it would take more time to write to the WAL log in a 
> transaction.
>
> Would it make sense to make a back up of the table then execute update 
> without a transaction? How would you guys do it?
>

This is really the sort of thing you need to test in a development
environment. There are just too many unknowns to provide a definitive
answer. You need to run this with minimal 'adjustments' in a dev
scenario to get a baseline and then decide what, if anything, you need
to do.

Just because you have 20M records, it doesn't mean that you have to do
anything 'special'. A lot will depend on how many of that 20M need to be
updated, the size of the records and fields being updated, how the
updated records are spread through the table, what other processes are
updating or querying the table during the operation, which indexes are
used and which are unnecessary or just update overhead etc, etc.

Trying to optimise this operation without knowing where the bottlenecks
are is extremely difficult. Unless you know that every one of the 20M
records will be updated, I wouldn't rush into removing all indexes. I
would definitely look at breaking up the update into smaller
transactions.

As to the question about whether to use a transaction or not, it really
depends on your data and how easily you can recover from a partial
transaction. Assume the update is going to be interrupted before
completion. If you can recover from that situation easily, then not
using a full transaction with commit/rollback may be acceptable.
otherwise, I would avoid it.

At any rate, baseline first as you may be surprised. I was recently
working on an application using PG 9.6 which had numerous web clients
and a backend process which ran multiple times a day and which updated
millions of rows each run in a table with over 100M records. The full
update would take around 20 - 30 min, but consisted of multiple
transactions. The data was part of a weather information system used for
predictive modelling. The data consisted of rainfall, max/min temp,
humidity and vapour pressure for 9am and 3pm and solar radiation for
grid points 5km apart covering all of Australia. The table had records
for every day for the last 4+ years and as yet, does not use
partitioning. This is a lot of records, but each record is quite small.
While at some point, we will need to look at additional optimisations
like partitioning, the performance is currently within acceptable
limits. The only optimisation we have done is basic db tuning. When I
did this tuning, performance saw a significant improvement, but none of
it was specific to this large table or the update process.

The key point is that large numbers of records doesn't necessarily mean
that an update will be slow (or more specifically too slow for
acceptable performance, whatever that is). Assuming you will need to
take lots of special action may be premature - you may need to do none
or only a couple of things. Get a baseline first and you will know how
big of an issue you have. You will also be able to determine if what you
try has any benefit. 
-- 
Tim Cross




Re: UPDATE on 20 Million Records Transaction or not?

2020-06-23 Thread Tom Lane
Tim Cross  writes:
> Jason Ralph  writes:
>> I am planning an update on a table with 20Million records, I have been 
>> researching the best practices.  I will remove all indexes and foreign keys 
>> prior to the update, however I am not sure if I should use a transaction or 
>> not.
>> My thought process is that a transaction would be easier to recover if 
>> something fails, however it would take more time to write to the WAL log in 
>> a transaction.
>> 
>> Would it make sense to make a back up of the table then execute update 
>> without a transaction? How would you guys do it?

> Just because you have 20M records, it doesn't mean that you have to do
> anything 'special'.

Indeed.  That's really not a lot of data, on any modern machine.
Just for experiment, I tried updating a table of 20M rows on my
development machine (no powerhouse by any serious standard, plus
it's running a debug build which is likely 30% or more slower than
production):

regression=# \timing
Timing is on.
regression=# create table foo as select x, 'what' as w from 
generate_series(1,20000000) x;
SELECT 20000000
Time: 23573.818 ms (00:23.574)
regression=# update foo set w = 'other';
UPDATE 20000000
Time: 60878.418 ms (01:00.878)

OK, that's cheating, because I had no indexes, so ...

regression=# create index on foo(w);
CREATE INDEX
Time: 11774.094 ms (00:11.774)
regression=# update foo set w = x::text;
UPDATE 20000000
Time: 186425.606 ms (03:06.426)

That may still be cheating, because I had no foreign-key constraints or
other possibly-expensive features.  And you might have lots of indexes
not just one.  Still, this doesn't seem to be territory where you need to
spend days researching the best alternative.  I'm not even convinced that
it's worth your time to remove and rebuild indexes.

But the real reason I'm commenting is that the original question seems to
embed a whole lot of misunderstanding about how Postgres works.  There is
no such thing as updating "outside a transaction".  You can use an
explicit transaction (ie BEGIN + COMMIT) if you'd like to preserve the
option to roll back after the UPDATE command proper, but that will make
absolutely no difference to performance of the UPDATE.  Likewise, there
is no such thing as not writing WAL (unless you're using an UNLOGGED
table, which you should only ever do for throwaway data).  And also, there's
unlikely to be any performance benefit from carving up the update into
smaller pieces.  If you're seriously short of disk space you might need
to split up the update (because just doing it in one go will about double
the table's size-on-disk) and vacuum between the stages to reclaim disk
space.  But that will take more time not less ... and again, on any
modern machine, such steps shouldn't be necessary for only 20M rows.

regards, tom lane