Re: [BUGS] BUG #5294: Sorts on more than just the order-by clause

2010-01-22 Thread Greg Stark
On Thu, Jan 21, 2010 at 9:27 PM, Allen Johnson  wrote:
> What I noticed in the production query was that ~1000ms was spent on
> sorting alone. The hack query reduced that to ~400ms. I should also
> note that there was plenty of work_mem and that the sort was not
> hitting disk.
>

The "hack" form actually seems far worse to me but I guess it depends
on the actual data layout.

Have you tried something like this which I would expect to be far faster:

select
 ct.name, c.lname, c.fname, c.mname,
 c.email, c.address1, c.address2,
 c.city, c.state, c.zip_code,
 (select count(*) from attachments where attachments.contact_id = c.id)
from
 contacts c
 inner join contact_types ct on (ct.code = c.contact_type_code)
where
 c.company_id = 1
order by
 ct.name, c.lname, c.fname, c.mname;


The question arises why Postgres can't automatically detect that this
query is equivalent. That might come when we start implementing the
"functional dependency" stuff from the standard and can determine that
the group by list uniquely identifies a row from the first join.
Currently we don't do that kind of analysis.

-- 
greg

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5294: Sorts on more than just the order-by clause

2010-01-22 Thread Greg Stark
On Fri, Jan 22, 2010 at 2:02 PM, Greg Stark  wrote:
> The question arises why Postgres can't automatically detect that this
> query is equivalent.

Hm, actually rereading your query it's not technically equivalent.
Since you weren't grouping on contact.id or contact_type.code if you
happened to have multiple contacts that had the same name and address
and multiple codes which had the same name then your query would
consolidate them into one result record. You might know that will
never happen but the database can't prove that's true so it would
never be able to do this transform.


-- 
greg

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

2010-01-22 Thread WildWezyr
On Jan 21, 6:17 pm, mi...@radev.net ("Milen A. Radev") wrote:
>
> [...]
>
> For the test I changed all regional settings to "Polish (Poland)",
> restarted the machine, created a new cluster and a new database (using
> the commands from the last mail from Heikki).
>
> A side-note: tested exactly the same function but on "Bulgarian
> (Bulgaria)" system locale and cluster and database created with
> "Bulgarian_Bulgaria.1251" locale - no errors.
>
> --
> Milen A. Radev
>

No error for Bulgarian locale might be related to absence of Bulgarian
national characters in words generated by my test code. I've put only
polish characters (ąćęłńóśźż) - and it fails with polish collation,
but after removing polish characters - i get no error with polish
collation. I thing it may behave exactly the same for Bulgarian - to
produce errors you must modify letters to include Bulgarian chars
(line: letters varchar(255) :=   'ąćęłńóśźżjklmnopqrstuvwxyz'; --<--
put these letters here).

Have you tried this?

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails

2010-01-22 Thread Milen A. Radev

WildWezyr написа:

On Jan 21, 6:17 pm, mi...@radev.net ("Milen A. Radev") wrote:

[...]

For the test I changed all regional settings to "Polish (Poland)",
restarted the machine, created a new cluster and a new database (using
the commands from the last mail from Heikki).

A side-note: tested exactly the same function but on "Bulgarian
(Bulgaria)" system locale and cluster and database created with
"Bulgarian_Bulgaria.1251" locale - no errors.

--
Milen A. Radev



No error for Bulgarian locale might be related to absence of Bulgarian
national characters in words generated by my test code. I've put only
polish characters (ąćęłńóśźż) - and it fails with polish collation,
but after removing polish characters - i get no error with polish
collation. I thing it may behave exactly the same for Bulgarian - to
produce errors you must modify letters to include Bulgarian chars
(line: letters varchar(255) :=   'ąćęłńóśźżjklmnopqrstuvwxyz'; --<--
put these letters here).

Have you tried this?



Yes, actually I tried it yesterday but just to be sure I tried it again 
today - no errors:



NOTICE:  i = 800
NOTICE:  i = 900
NOTICE:  i = 1000

Total query runtime: 2231947 ms.
1 row retrieved.


The "letters" variable used in the test (in case you have the time to 
try it yourself):

letters varchar(255) :=   'юяьъщшчцфjklmnopqrstuvwxyz';

And the database definition:

CREATE DATABASE spb
  WITH OWNER = postgres
   ENCODING = 'UTF8'
   LC_COLLATE = 'Bulgarian_Bulgaria.1251'
   LC_CTYPE = 'Bulgarian_Bulgaria.1251'
   CONNECTION LIMIT = -1;


The system locale is "Bulgarian (Bulgaria)".


--
Milen A. Radev

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5294: Sorts on more than just the order-by clause

2010-01-22 Thread Tom Lane
Allen Johnson  writes:
> Ok, I've generated a test database with:
>   * 20,000 users
>   * 250,000 contacts
>   * 1,124,700 attachments
> The summary of the results is that the normal query takes about 32sec
> on my machine. The hack query takes about  13sec.

I poked at this for a bit.  At least with the test data (dunno about
your real data), the first few grouping columns are pretty nearly unique
so the "extra" sort columns really aren't affecting the runtime anyway.
I believe that the reason the hacked query is cheaper is simply that the
sort is sorting fewer rows because it's applied after aggregation
instead of beforehand.

The planner is well aware of that effect, but the reason it fails to
choose hashed aggregation is that it doesn't think the aggregation will
reduce the number of rows --- so it estimates the sort for that case as
being much more expensive than it really is.  Notice that the
post-aggregation and pre-aggregation rowcount estimates are just the
same in both these queries.  If I force choose_hashed_grouping() to
make the other decision, I get the same plan out of the "normal"
query as the hacked query produces.

I have an idea for improving the accuracy of the post-aggregation
rowcount estimate, which I'll post on pgsql-hackers in a bit.  But
it's not something I have enough confidence in to risk back-patching.
So for the moment your hack with forcing the sort to be done separately
is probably your best answer.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs