Re: [PERFORM] hash semi join caused by "IN (select ...)"

2011-05-16 Thread Dave Johansen
On Mon, May 16, 2011 at 3:30 PM, Clemens Eisserer wrote: > Hi, > > I have a quite complex, performance sensitive query in a system with a > few (7) joins: > select from t1 left join t2 WHERE id IN (select ) > > For this query the planner evaluates the IN with a hash semi join last, >

Re: [PERFORM] hash semi join caused by "IN (select ...)"

2011-05-16 Thread Tom Lane
Clemens Eisserer writes: > I have a quite complex, performance sensitive query in a system with a > few (7) joins: > select from t1 left join t2 WHERE id IN (select ) Does it work as expected with one less join? If so, try increasing join_collapse_limit ...

[PERFORM] hash semi join caused by "IN (select ...)"

2011-05-16 Thread Clemens Eisserer
Hi, I have a quite complex, performance sensitive query in a system with a few (7) joins: select from t1 left join t2 WHERE id IN (select ) For this query the planner evaluates the IN with a hash semi join last, and all the joining is done by hash joins for all rows contained in t1.

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Nathan Boley writes: >> The accesses to an index are far more likely to be clustered than the >> accesses to the underlying table, because the index is organized in a >> way that's application-meaningful and the table not so much. > So, to clarify, are you saying that if query were actually reque

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Nathan Boley
> The accesses to an index are far more likely to be clustered than the > accesses to the underlying table, because the index is organized in a > way that's application-meaningful and the table not so much. So, to clarify, are you saying that if query were actually requesting rows uniformly random

Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread Jeff
On May 16, 2011, at 1:06 PM, John Rouillard wrote: that is a #define in pgiosim.c So which is a better test, modifying the #define to allow specifying 200-300 1GB files, or using 64 files but increasing the size of my files to 2-3GB for a total bytes in the file two or three times the memory

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Jeff Janes writes: > On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh wrote: >> Ok, it may not work as well with index'es, since having 1% in cache may very >> well mean that 90% of all requested blocks are there.. for tables in should >> be more trivial. > Why would the index have a meaningful hot

Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread John Rouillard
On Mon, May 16, 2011 at 12:23:13PM -0400, Jeff wrote: > On May 16, 2011, at 9:17 AM, John Rouillard wrote: > >However, in my case I have an 8 disk raid 10 with a read only load (in > >this testing configuration). Shouldn't I expect more iops than a > >single disk can provide? Maybe pgiosim is hitti

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Jeff Janes
On Sun, May 15, 2011 at 9:49 PM, Jesper Krogh wrote: > > Ok, it may not work as well with index'es, since having 1% in cache may very > well mean that 90% of all requested blocks are there.. for tables in should > be more trivial. Why would the index have a meaningful hot-spot unless the underlyi

Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread Jeff
On May 16, 2011, at 9:17 AM, John Rouillard wrote: I am seeing really poor (70) iops with pgiosim. According to: http://www.tomshardware.com/reviews/2tb-hdd-7200,2430-8.html in the database benchmark they are seeing ~170 iops on a single disk for these drives. I would expect an 8 disk raid 10

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Tom Lane
Robert Haas writes: > On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote: >> Ok, it may not work as well with index'es, since having 1% in cache may very >> well mean that 90% of all requested blocks are there.. for tables in should >> be more trivial. > Tables can have hot spots, too. Consid

Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 12:49 AM, Jesper Krogh wrote: >> To me it seems like a robust and fairly trivial way to to get better >> numbers. The >> fear is that the OS-cache is too much in flux to get any stable numbers >> out >> of it. > > Ok, it may not work as well with index'es, since having 1% i

Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-16 Thread Robert Haas
On Mon, May 16, 2011 at 4:19 AM, Robert Klemme wrote: >> - If the planner chooses a Bitmap Index Scan, it effectively scans the >> index to figure out which table blocks to read, and then reads those >> table blocks in block number order, so that the I/O is sequential, >> with skips. > > Are these

Re: [PERFORM] Why query takes soo much time

2011-05-16 Thread Tom Lane
Denis de Bernardy writes: > An alternative plan could have been to hash join the tables together, > to sort the result set, and to apply the limit/offset on the resulting > set. Indeed. I rather wonder why the planner didn't do that to start with. This plan looks to me like it might be suffering

Re: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-16 Thread Merlin Moncure
On Sat, May 14, 2011 at 5:10 AM, Stefan Keller wrote: > Hi, > > I am conducting a benchmark to compare KVP table vs. hstore and got > bad hstore performance results when the no. of records is greater than > about 500'000. > > CREATE TABLE kvp ( id SERIAL PRIMARY KEY, key text NOT NULL, value text

Re: [PERFORM] Using pgiosim realistically

2011-05-16 Thread John Rouillard
On Sat, May 14, 2011 at 12:07:02PM -0500, k...@rice.edu wrote: > On Fri, May 13, 2011 at 09:09:41PM +, John Rouillard wrote: > > I am adding pgiosim to our testing for new database hardware and I am > > seeing something I don't quite get and I think it's because I am using > > pgiosim incorrect

Re: [PERFORM] Why query takes soo much time

2011-05-16 Thread Denis de Bernardy
[big nestloop with a huge number of rows] You're in an edge case, and I doubt you'll get things to run much faster: you want the last 1k rows out of an 18M row result set... It will be slow no matter what you do. What the plan is currently doing, is it's going through these 18M rows using a fo

Re: [PERFORM] Why query takes soo much time

2011-05-16 Thread Craig Ringer
On 05/16/2011 01:39 PM, Adarsh Sharma wrote: Dear all, I have a query on 3 tables in a database as :- _*Explain Analyze Output :-*_ explain anayze select c.clause, s.subject ,s.object , s.verb, s.subject_type , s.object_type ,s.doc_id ,s.svo_id from clause2 c, svo2 s ,page_content p where c.c

Re: [PERFORM] [PERFORMANCE] expanding to SAN: which portion best to move

2011-05-16 Thread Robert Klemme
On Fri, May 13, 2011 at 9:04 PM, Robert Haas wrote: > On Wed, May 4, 2011 at 6:31 AM, Willy-Bas Loos wrote: >> I'm asking them for (real) benchmarks, thanks for the advice. (fio is not >> available for us now to do it myself, grmbl) >> It just occurred to me that it is not necessarily the case th