Re: [BUGS] BUG #5294: Sorts on more than just the order-by clause
On Thu, Jan 21, 2010 at 9:27 PM, Allen Johnson wrote: > What I noticed in the production query was that ~1000ms was spent on > sorting alone. The hack query reduced that to ~400ms. I should also > note that there was plenty of work_mem and that the sort was not > hitting disk. > The "hack" form actually seems far worse to me but I guess it depends on the actual data layout. Have you tried something like this which I would expect to be far faster: select ct.name, c.lname, c.fname, c.mname, c.email, c.address1, c.address2, c.city, c.state, c.zip_code, (select count(*) from attachments where attachments.contact_id = c.id) from contacts c inner join contact_types ct on (ct.code = c.contact_type_code) where c.company_id = 1 order by ct.name, c.lname, c.fname, c.mname; The question arises why Postgres can't automatically detect that this query is equivalent. That might come when we start implementing the "functional dependency" stuff from the standard and can determine that the group by list uniquely identifies a row from the first join. Currently we don't do that kind of analysis. -- greg -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #5294: Sorts on more than just the order-by clause
On Fri, Jan 22, 2010 at 2:02 PM, Greg Stark wrote: > The question arises why Postgres can't automatically detect that this > query is equivalent. Hm, actually rereading your query it's not technically equivalent. Since you weren't grouping on contact.id or contact_type.code if you happened to have multiple contacts that had the same name and address and multiple codes which had the same name then your query would consolidate them into one result record. You might know that will never happen but the database can't prove that's true so it would never be able to do this transform. -- greg -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
On Jan 21, 6:17 pm, mi...@radev.net ("Milen A. Radev") wrote: > > [...] > > For the test I changed all regional settings to "Polish (Poland)", > restarted the machine, created a new cluster and a new database (using > the commands from the last mail from Heikki). > > A side-note: tested exactly the same function but on "Bulgarian > (Bulgaria)" system locale and cluster and database created with > "Bulgarian_Bulgaria.1251" locale - no errors. > > -- > Milen A. Radev > No error for Bulgarian locale might be related to absence of Bulgarian national characters in words generated by my test code. I've put only polish characters (ąćęłńóśźż) - and it fails with polish collation, but after removing polish characters - i get no error with polish collation. I thing it may behave exactly the same for Bulgarian - to produce errors you must modify letters to include Bulgarian chars (line: letters varchar(255) := 'ąćęłńóśźżjklmnopqrstuvwxyz'; --<-- put these letters here). Have you tried this? -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] Re: BUG #5290: Simple loop with insert into and check to avoid duplicate values fails
WildWezyr написа: On Jan 21, 6:17 pm, mi...@radev.net ("Milen A. Radev") wrote: [...] For the test I changed all regional settings to "Polish (Poland)", restarted the machine, created a new cluster and a new database (using the commands from the last mail from Heikki). A side-note: tested exactly the same function but on "Bulgarian (Bulgaria)" system locale and cluster and database created with "Bulgarian_Bulgaria.1251" locale - no errors. -- Milen A. Radev No error for Bulgarian locale might be related to absence of Bulgarian national characters in words generated by my test code. I've put only polish characters (ąćęłńóśźż) - and it fails with polish collation, but after removing polish characters - i get no error with polish collation. I thing it may behave exactly the same for Bulgarian - to produce errors you must modify letters to include Bulgarian chars (line: letters varchar(255) := 'ąćęłńóśźżjklmnopqrstuvwxyz'; --<-- put these letters here). Have you tried this? Yes, actually I tried it yesterday but just to be sure I tried it again today - no errors: NOTICE: i = 800 NOTICE: i = 900 NOTICE: i = 1000 Total query runtime: 2231947 ms. 1 row retrieved. The "letters" variable used in the test (in case you have the time to try it yourself): letters varchar(255) := 'юяьъщшчцфjklmnopqrstuvwxyz'; And the database definition: CREATE DATABASE spb WITH OWNER = postgres ENCODING = 'UTF8' LC_COLLATE = 'Bulgarian_Bulgaria.1251' LC_CTYPE = 'Bulgarian_Bulgaria.1251' CONNECTION LIMIT = -1; The system locale is "Bulgarian (Bulgaria)". -- Milen A. Radev -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #5294: Sorts on more than just the order-by clause
Allen Johnson writes: > Ok, I've generated a test database with: > * 20,000 users > * 250,000 contacts > * 1,124,700 attachments > The summary of the results is that the normal query takes about 32sec > on my machine. The hack query takes about 13sec. I poked at this for a bit. At least with the test data (dunno about your real data), the first few grouping columns are pretty nearly unique so the "extra" sort columns really aren't affecting the runtime anyway. I believe that the reason the hacked query is cheaper is simply that the sort is sorting fewer rows because it's applied after aggregation instead of beforehand. The planner is well aware of that effect, but the reason it fails to choose hashed aggregation is that it doesn't think the aggregation will reduce the number of rows --- so it estimates the sort for that case as being much more expensive than it really is. Notice that the post-aggregation and pre-aggregation rowcount estimates are just the same in both these queries. If I force choose_hashed_grouping() to make the other decision, I get the same plan out of the "normal" query as the hacked query produces. I have an idea for improving the accuracy of the post-aggregation rowcount estimate, which I'll post on pgsql-hackers in a bit. But it's not something I have enough confidence in to risk back-patching. So for the moment your hack with forcing the sort to be done separately is probably your best answer. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs