Difference between Bulk Load (Multiple inserts or single inserts) and COPY

2019-12-06 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/sql-copy.html
Description:

Hello,

> I experimented with Bulk load and COPY.
> Loading in COPY was very fast.
> However, after COPYing data from a CSV file to PostgreSQL Table. The
query
> execution took lot of time for 1 of the first 4 queries.
> Only this slow query was taking so much time, that even if I had used
normal
> bulk load, it would have been faster in total.
> Then all other Query executions took equal time as it took while querying
a
> table after the Bulk data load method.
>
> So, I want to know the exact reason what's the issue with COPY.
> How exactly they differ? The only thing from the document I could
identify
> was row security.
> But it did not mention anything about indexing. Like, in Bulk load, do
> indices(or constraint checks) are created with data loading?
> & in COPY it's done after? so when indices are being created that query
> slows down??

*Added details*

"Table & Query details"
I have 1 Table is there having 3 attributes:
TableName{ Column1 Varchar300,  Column2 Varchar300,  Column3 Varchar300};
I haven't created any primary keys or FKs. No other constraints.

Data set size: 150MB / 1M records

Queries:
Select count(*) from Table;
Select count(distinct( Column1, Column2 , Column3 )) from Table;
Select Column1, Column2, Column3 from Table as T1, Table as T2,  Table as T3
where T1. Column1=T2.Column3 and T1. Column1="xyz";

Please let me know, how Bulk load vs. COPY different in both situations
1) Do the internal representation differs after data is loaded using Bulk
vs. COPY?
2) what if I have added Keys and Constraints, are they checked later? Means
loading is shown completed but in background it's creating indices/checking
constraints.
3) Can it be the reason that some other process(which?) is running in
background during query execution ? as I query the data as soon as the load
after COPY is complete.


Description of Authentication Methods Supported for Map is Misleading

2019-12-06 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/auth-pg-hba-conf.html
Description:

"This option is supported for all authentication methods that receive
external user names."

More properly, the authentication methods supported are:  ident, peer,
gassapi, sspi, and cert.  

LDAP is not supported and attempting to use map with LDAP provides the
following error: 
'authentication option ""map"" is only valid for authentication methods
ident, peer, gssapi, sspi, and cert'


Re: monitoring-stats.html is too impenetrable

2019-12-06 Thread James Salsman
Thanks, Michael, but I am absolutely convinced that whether a needed
index exists or not is absolutely one of the most run-time
consequential inputs to the query planner. Also, that page is where
people look to optimize, unlike the impenetrable wall-of-text stats
page. Please correct me if I am wrong. Thank you for your
consideration.

Best regards,
Jim

On Thu, Dec 5, 2019 at 7:05 PM Michael Paquier  wrote:
>
> On Wed, Dec 04, 2019 at 03:29:55AM -0800, James Salsman wrote:
> > Thank you for your thoughtful reply. This might be much easier:
> >
> > How about adding another example to
> > https://www.postgresql.org/docs/11/planner-stats.html ?
>
> Not sure I see the parallel here.  This page talks about planner
> statistics, and yours about being able to find missing indexes because
> of incorrect stats.
>
> > SELECT relname, seq_scan-idx_scan AS too_much_seq,
> >case when seq_scan-idx_scan>0 THEN 'Missing Index?' ELSE 'OK' END,
> >pg_relation_size(relid::regclass) AS rel_size, seq_scan, idx_scan
> > FROM pg_stat_all_tables
> > WHERE schemaname='public' AND pg_relation_size(relid::regclass)>8
> > ORDER BY too_much_seq DESC;
>
> Again.  this is a bit more complex than that.
> --
> Michael




It is recommended to add detailed description about initdb ...

2019-12-06 Thread PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/creating-cluster.html
Description:

When I run initdb -D /home/myname/pgsql/data in /home/myname/pgsql/bin, it
return bash: initdb: command not found The others are like this. It must
be /home/myname/pgsql/bin/initdb -D /home/myname/pgsql/data.


Re: Description of Authentication Methods Supported for Map is Misleading

2019-12-06 Thread Magnus Hagander
On Fri, Dec 6, 2019 at 10:13 AM PG Doc comments form 
wrote:

> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/10/auth-pg-hba-conf.html
> Description:
>
> "This option is supported for all authentication methods that receive
> external user names."
>
> More properly, the authentication methods supported are:  ident, peer,
> gassapi, sspi, and cert.
>
> LDAP is not supported and attempting to use map with LDAP provides the
> following error:
> 'authentication option ""map"" is only valid for authentication methods
> ident, peer, gssapi, sspi, and cert'
>

This is correct. LDAP authentication does not receive external usernames.
It uses an external service to validate the password, but it gets the
username from the client.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: description of Aggregate Expressions

2019-12-06 Thread John Lumby
On 12/05/19 18:06, David G. Johnston wrote:
On Thu, Dec 5, 2019 at 3:18 PM John Lumby 
<johnlu...@hotmail.com>
 wrote:
In PostgreSQL 12.1 Documentation chapter 4.2.7. Aggregate Expressions  it says


The syntax of an aggregate expression is one of the following:
  ...
aggregate_name (DISTINCT expression [ , ... ] [ order_by_clause ] ) [ FILTER ( 
WHERE filter_clause ) ]
...

I believe this is incorrect in the case where the DISTINCT is on a 
comma-separated list of expressions.
It would imply that this is legal

It is...you didn't get a syntax error.

Hmm,  even though true,  I think this is unhelpful.
If a reference document states that the syntax for a something-or-other 
construct is  one of the following diagrams,
then I expect that the diagrams are valid for *every* kind of 
something-or-other,  not just some.
Yet the diagram I quote always results in being rejected in the case of COUNT -
which I consider to be as good as saying it is invalid syntax.


select count(DISTINCT parent_id , name) from  mytable

but that is rejected with
ERROR:  function count(bigint, text) does not exist

The error is that while the query is syntactically correct in order to execute 
it as written a function would need to exist that does not.  As far as a 
general syntax diagram goes it has correctly communicated what is legal.


whereas

select count(DISTINCT ( parent_id , name) ) from mytable

is accepted.

Correct, converting the two individual columns into a "tuple" allows the 
default tuple distinct-making infrastructure to be used to execute the query.


So I think to handle all cases the line in the doc should read

aggregate_name (DISTINCT ( expression [ , ... ] ) [ order_by_clause ] ) [ 
FILTER ( WHERE filter_clause ) ]

I don't know how to indicate that those extra parentheses can be omitted if the 
list has only one expression.

Then I would have to say the proposed solution to this edge case is worse than 
the problem.  I also don't expect there to be a clean solution to dealing with 
the complexities of expressions at the syntax diagram level.

Yes,  I see what I suggested is not ideal either. But I think something 
needs to be changed.

How about replacing "expression [ , ... ]"  by "parameter_list" in the 
description, and then stating that parameter_list can be either a single 
expression or ,   if the particular aggregate function accepts it (for which,   
consult that function's reference),a comma-separated list of expressions.


David J.





Re: description of Aggregate Expressions

2019-12-06 Thread Tom Lane
John Lumby  writes:
> On 12/05/19 18:06, David G. Johnston wrote:
>> Then I would have to say the proposed solution to this edge case is worse 
>> than the problem.  I also don't expect there to be a clean solution to 
>> dealing with the complexities of expressions at the syntax diagram level.

> Yes,  I see what I suggested is not ideal either. But I think something 
> needs to be changed.

> How about replacing "expression [ , ... ]"  by "parameter_list" in the 
> description, and then stating that parameter_list can be either a single 
> expression or ,   if the particular aggregate function accepts it (for which, 
>   consult that function's reference),a comma-separated list of 
> expressions.

That's just as wrong.  As we tried to explain before, the
parenthesized-list syntax is a row constructor, so it only works
in cases where the aggregate function can accept a composite
argument type.  Most don't.

Moreover, the very same thing holds in *any* expression context,
not only aggregate arguments.  So if we took this seriously there
would have to be a lot of other places plastered with equally
confusing/unhelpful addenda.

regards, tom lane




Re: description of Aggregate Expressions

2019-12-06 Thread John Lumby
Tom Lane wrote

> > How about replacing "expression [ , ... ]"  by "parameter_list" in the 
> > description, 
> > and then stating that parameter_list can be either a single expression or , 
> >  if the particular aggregate function accepts it (for which,   consult that 
> > function's reference),
> >   a comma-separated list of expressions.
> 
> That's just as wrong.  As we tried to explain before, the
> parenthesized-list syntax is a row constructor, so it only works
> in cases where the aggregate function can accept a composite
> argument type.  Most don't.
>

But surely not *all* cases of a multi-expression parameter list of an aggregate 
function are row constructors are they?    What about

select parent_id, name, GROUPING(parent_id , name), count(*) FROM mytable GROUP 
BY ROLLUP(parent_id , name);

In the piece "GROUPING(parent_id , name)" ,
  is "(parent_id , name)"  a row constructor or a list of two expressions?

Or are you saying those are one and the same thing?

Cheers,   John



Re: description of Aggregate Expressions

2019-12-06 Thread John Lumby
John Lumby wrote:
<  
< Tom Lane wrote
< 
< > > How about replacing "expression [ , ... ]"  by "parameter_list" in the 
description, 
< > > and then stating that parameter_list can be either a single expression or 
, 
< > >  if the particular aggregate function accepts it (for which,   consult 
that function's reference),
< > >   a comma-separated list of expressions.
< > 
< > That's just as wrong.  As we tried to explain before, the
< > parenthesized-list syntax is a row constructor, so it only works
< > in cases where the aggregate function can accept a composite
< > argument type.  Most don't.
< >
< 
< But surely not *all* cases of a multi-expression parameter list of an 
aggregate function are row constructors are they?What about
< 
< select parent_id, name, GROUPING(parent_id , name), count(*) FROM mytable 
GROUP BY ROLLUP(parent_id , name);
< 
< In the piece "GROUPING(parent_id , name)" ,
<   is "(parent_id , name)"  a row constructor or a list of two expressions?
< 
< Or are you saying those are one and the same thing?
< 

I think I can answer my own question   -  No they are not the same  -   because 
:

select parent_id, name, GROUPING(ROW(parent_id , name)), count(*) FROM mytable 
GROUP BY ROLLUP(parent_id , name);
ERROR:  arguments to GROUPING must be grouping expressions of the associated 
query level
LINE 1: select parent_id, name, GROUPING(ROW(parent_id , name)), cou...
 ^