Re: [PERFORM] Comparative performance

2005-10-04 Thread PFC


It's more understandable if the table names are in front of the column names:


SELECT relationship.topic_id1, relationship.topic_id2, topic.topic_name,
       topic.categ_id, topic.list_name, topic.title,
       topic.url, topic.page_type, relationship.rel_type, entry_type.inverse_id,
       relationship.description AS rel_descrip,
       relationship.created, relationship.updated
FROM relationship, topic, entry_type
WHERE
    ((relationship.topic_id1 = topic.topic_id AND relationship.topic_id2 = 1252)
     OR (relationship.topic_id2 = topic.topic_id AND relationship.topic_id1 = 1252))
    AND relationship.rel_type = entry_type.type_id
    AND entry_type.class_id = 2
ORDER BY rel_type, list_name;

I see a few problems in your schema:

- topic_id1 and topic_id2 play the same role; there is no constraint to
  determine which is which, so it is possible to define the same relation
  twice.
- As you search on two columns with OR, you need a UNION to use the indexes.
- Lack of indexes.
- I don't understand why the planner doesn't pick up hash joins...
- If you use a version before 8.0, the type mismatch will prevent use of
  the indexes.
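(As a sketch of the first point -- assuming the relation is symmetric so the
order of the pair does not matter, and with the constraint name invented here --
a CHECK constraint can force the pair into a canonical order, after which a
unique or primary key on the pair rejects the duplicate form:)

ALTER TABLE relationship
    ADD CONSTRAINT relationship_ordered_pair CHECK (topic_id1 < topic_id2);
-- with the pair always stored as (low id, high id), entering the same
-- relation twice then collides on a key over (topic_id1, topic_id2)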


I'd suggest rewriting the query like this:

SELECT topic.*, foo.*
FROM topic,
    (SELECT topic_id2 AS fetch_id, topic_id1, topic_id2, rel_type,
            description AS rel_descrip, created, updated
       FROM relationship
      WHERE rel_type IN (SELECT type_id FROM entry_type WHERE class_id = 2)
        AND topic_id1 = 1252
     UNION
     SELECT topic_id1 AS fetch_id, topic_id1, topic_id2, rel_type,
            description AS rel_descrip, created, updated
       FROM relationship
      WHERE rel_type IN (SELECT type_id FROM entry_type WHERE class_id = 2)
        AND topic_id2 = 1252) AS foo
WHERE topic.topic_id = foo.fetch_id;


Create the following indexes (the index names below are only examples):

CREATE INDEX entry_type_class_id ON entry_type (class_id);

-- (topic_id1, rel_type, topic_id2) becomes your new PRIMARY KEY
ALTER TABLE relationship ADD PRIMARY KEY (topic_id1, rel_type, topic_id2);

CREATE INDEX relationship_reverse ON relationship (topic_id2, rel_type, topic_id1);

Of course, this doesn't explain how MySQL manages to execute the query  
in about 9 msec.  The only minor differences in the schema are:   
entry_type.title and rel_title are char(32) in MySQL,  
entry_type.class_id is a tinyint, and topic.categ_id, page_type and  
dark_ind are also tinyints. MySQL also doesn't have the REFERENCES.


Can you post the result from MySQL EXPLAIN?

You might be interested in the following code. Just replace mysql_ by pg_,  
it's quite useful.


$global_queries_log = array();

function _getmicrotime()	{	list($u,$s) = explode(' ',microtime());	return $u+$s;	}


/*  Formats query, with given arguments, escaping all strings as needed.
	db_quote_query( 'UPDATE junk SET a=%s WHERE b=%s', array( 1,"po'po" ) )
	=>   UPDATE junk SET a='1' WHERE b='po\'po'
*/
function db_quote_query( $sql, $params=false )
{
	// if no params, send query raw
	if( !$params )
		return $sql;

	// quote params
	foreach( $params as $key => $val )
	{
		if( is_array( $val ))
			$val = implode( ',', $val );
		$params[$key] = "'".mysql_real_escape_string( $val )."'";
	}
	return vsprintf( $sql, $params );
}

/*  Formats query, with given arguments, escaping all strings as needed.
Runs query, logging its execution time.
Returns the query, or dies with error.
*/
function db_query( $sql, $params=false )
{
	// it's already a query
	if( is_resource( $sql ))
		return $sql;

	$sql = db_quote_query( $sql, $params );

	$t = _getmicrotime();
	$r = mysql_query( $sql );
	if( !$r )
	{
		echo "Erreur MySQL :".mysql_error()."Requête :".$sql."Traceback :";
		foreach( debug_backtrace() as $t ) xdump( $t );
		echo "";
		die();
	}
	global $global_queries_log;
	$global_queries_log[] = array( _getmicrotime()-$t, $sql );
	return $r;
}

At the end of your page, display the contents of $global_queries_log.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Stefan Weiss
On 2005-09-30 01:21, Lane Van Ingen wrote:
>   (3) Assure that a disk-based table is always in memory (outside of keeping
> it in
>   memory buffers as a result of frequent activity which would prevent
> LRU
>   operations from taking it out) ?

I was wondering about this too. IMO it would be useful to have a way to tell
PG that some tables were needed frequently, and should be cached if
possible. This would allow application developers to consider joins with
these tables as "cheap", even when querying on columns that are not indexed.
I'm thinking about smallish tables like users, groups, *types, etc which
would be needed every 2-3 queries, but might be swept out of RAM by one
large query in between. Keeping a table like "users" on a RAM fs would not
be an option, because the information is not volatile.


cheers,
stefan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Lane Van Ingen
Yes, Stefan, the kind of usage you are mentioning is exactly why I was
asking.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Stefan Weiss
Sent: Tuesday, October 04, 2005 6:32 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Is There Any Way 


On 2005-09-30 01:21, Lane Van Ingen wrote:
>   (3) Assure that a disk-based table is always in memory (outside of
keeping
> it in
>   memory buffers as a result of frequent activity which would prevent
> LRU
>   operations from taking it out) ?

I was wondering about this too. IMO it would be useful to have a way to tell
PG that some tables were needed frequently, and should be cached if
possible. This would allow application developers to consider joins with
these tables as "cheap", even when querying on columns that are not indexed.
I'm thinking about smallish tables like users, groups, *types, etc which
would be needed every 2-3 queries, but might be swept out of RAM by one
large query in between. Keeping a table like "users" on a RAM fs would not
be an option, because the information is not volatile.


cheers,
stefan

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Slow concurrent update of same row in a given table

2005-10-04 Thread Jim C. Nasby
On Thu, Sep 29, 2005 at 07:59:34AM +0530, Rajesh Kumar Mallah wrote:
> > I see. These problems regularly come up in database design. The best thing
> > you can do is modify your database design/application such that instead of
> > incrementing a count in a single row, you insert a row into a table,
> > recording the 'dispatch_id'. Counting the number of rows for a given
> > dispatch id will give you your count.
> >
> 
> sorry i will be accumulating huge amount of rows in seperate table
> with no extra info when i really want just the count. Do you have
> a better database design in mind?
> 
> Also i encounter same problem in implementing read count of
> articles in sites and in counting banner impressions where same
> row get updated by multiple processes frequently.

Databases like to work on *sets* of data, not individual rows. Something
like this would probably perform much better than what you've got now,
and would prevent having a huge table lying around:

INSERT INTO holding_table ... -- Done for every incoming connection/what-have-you

CREATE OR REPLACE FUNCTION summarize() RETURNS void AS $$
DECLARE
    v_rows int;
BEGIN
    DELETE FROM holding_table;
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    UPDATE count_table SET count = count + v_rows;
END;
$$ LANGUAGE plpgsql;

Periodically (say, once a minute):
SELECT summarize();
VACUUM holding_table;
VACUUM count_table;
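(For illustration only -- the function above assumes something like the two
tables below. The table names come from the function; the columns are invented,
not from the original post:)

CREATE TABLE holding_table (
    dispatch_id integer NOT NULL          -- one row inserted per event to count
);

CREATE TABLE count_table (
    count bigint NOT NULL DEFAULT 0       -- running total maintained by summarize()
);
INSERT INTO count_table (count) VALUES (0);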
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Logarithmic change (decrease) in performance

2005-10-04 Thread Jim C. Nasby
On Wed, Sep 28, 2005 at 06:03:03PM -0400, Ron Peacetree wrote:
> 1= keep more of the data set in RAM
> 2= increase the size of your HD IO buffers
> 3= make your RAID sets wider (more parallel vs sequential IO)
> 4= reduce the atomic latency of your RAID sets
> (time for Fibre Channel 15Krpm HD's vs 7.2Krpm SATA ones?)
> 5= make sure your data is as unfragmented as possible
> 6= change you DB schema to minimize the problem
> a= overall good schema design
> b= partitioning the data so that the system only has to manipulate a
> reasonable chunk of it at a time.

Note that 6 can easily swamp the rest of these tweaks. A poor schema
design will absolutely kill any system. Also of great importance is how
you're using the database. IE: are you doing any row-by-row operations?

> In many cases, there's a number of ways to accomplish the above.
> Unfortunately, most of them require CapEx.
> 
> Also, ITRW world such systems tend to have this as a chronic
> problem.  This is not a "fix it once and it goes away forever".  This
> is a part of the regular maintenance and upgrade plan(s). 

And why DBAs typically make more money than other IT folks. :)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Comparative performance

2005-10-04 Thread Jim C. Nasby
On Thu, Sep 29, 2005 at 08:44:16AM -0400, Joe wrote:
> CREATE TABLE entry (
>   entry_id serial PRIMARY KEY,
>   title VARCHAR(128) NOT NULL,
>   subtitle VARCHAR(128),
>   subject_type SMALLINT,
>   subject_id INTEGER REFERENCES topic,
>   actor_type SMALLINT,
>   actor_id INTEGER REFERENCES topic,
>   actor VARCHAR(64),
>   actor_role VARCHAR(64),
>   rel_entry_id INTEGER,
>   rel_entry VARCHAR(64),
>   description VARCHAR(255),
>   quote text,
>   url VARCHAR(255),
>   entry_date CHAR(10),
>   created DATE NOT NULL DEFAULT CURRENT_DATE,
>   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
> WITHOUT OIDS;
> CREATE INDEX entry_actor_id ON entry (actor_id);
> CREATE INDEX entry_subject_id ON entry (subject_id);

A few tips...

Fields in PostgreSQL have alignment requirements, so the smallints
aren't saving you anything right now. If you put both of them together
though, you'll save 4 bytes on most hardware.

You'll also get some minor gains from putting all the variable-length
fields at the end, as well as nullable fields. If you search the
archives for 'field order' you should be able to find some useful info.
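(A hedged illustration of both points, using a subset of the columns above --
the table name is made up and this is not meant as the definitive layout, just
one possible ordering: fixed-width columns first, the two smallints paired so
they share a single aligned 4-byte slot, variable-length and nullable columns
last.)

CREATE TABLE entry_reordered (
    entry_id     serial PRIMARY KEY,
    subject_type SMALLINT,               -- the two smallints packed together
    actor_type   SMALLINT,
    subject_id   INTEGER REFERENCES topic,
    actor_id     INTEGER REFERENCES topic,
    created      DATE NOT NULL DEFAULT CURRENT_DATE,
    updated      TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    title        VARCHAR(128) NOT NULL,  -- variable-length / nullable fields last
    subtitle     VARCHAR(128),
    description  VARCHAR(255),
    url          VARCHAR(255)
) WITHOUT OIDS;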

Make sure these indexes exist if you'll be updating or inserting into
entry:

CREATE INDEX topic__subject_id ON topic(subject_id);
CREATE INDEX topic__actor_id ON topic(actor_id);

Also, the fact that subject and actor both point to topic along with
subject_type and actor_type make me suspect that your design is
de-normalized. Of course there's no way to know without more info.

FWIW, I usually use timestamptz for both created and updated fields.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [HACKERS] Query in SQL statement

2005-10-04 Thread Jim C. Nasby
On Sat, Oct 01, 2005 at 12:51:08PM -0700, Roger Hand wrote:
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] Behalf Of Jim C. Nasby
> > Sent: Friday, September 30, 2005 4:49 PM
> > Subject: Re: [PERFORM] [HACKERS] Query in SQL statement
>  
> > I suggest ditching the CamelCase and going with underline_seperators.
> > I'd also not use the bareword id, instead using bad_user_id. And I'd
> > name the table bad_user. But that's just me. :)
> 
> I converted a db from MS SQL, where tables and fields were CamelCase, and 
> just lowercased the ddl to create the tables.
> 
> So table and fields names were all created in lowercase, but I didn't have to 
> change
> any of the application code: the SELECT statements worked fine with mixed 
> case.
> 
> -- sample DDL
> CREATE TABLE testtable
> (
>   fieldone int4
> ) 
> insert into TestTable (fieldone) values (11);

That will usually work (see Tom's reply), but fieldone is a heck of a
lot harder to read than field_one. But like I said, these are the coding
conventions I've found work well; YMMV.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Comparative performance

2005-10-04 Thread Jim C. Nasby
On Thu, Sep 29, 2005 at 04:39:36PM -0400, Joe wrote:
> Andreas Pflug wrote:
> >Hm, if you only have 4 tables, why do you need 12 queries?
> >To reduce queries, join them in the query; no need to merge them 
> >physically. If you have only two main tables, I'd bet you only need 1-2 
> >queries for the whole page.
> 
> There are more than four tables and the queries are not functionally 
> overlapping.  As an example, allow me to refer to the page 
> www.freedomcircle.com/topic.php/Economists.
> 
> The top row of navigation buttons (Life, Liberty, etc.) is created from a 
> query of the 'topic' table.  It could've been hard-coded as a PHP array, 
> but with less flexibility.  The alphabetical links are from a SELECT 
> DISTINCT substring from topic.  It could've been generated by a PHP for 
> loop (originally implemented that way) but again with less flexibility.  
> The listing of economists is another SELECT from topic.  The subheadings 
> (Articles, Books) come from a SELECT of an entry_type table --which 
> currently has 70 rows-- and is read into a PHP array since we don't know 
> what headings will be used in a given page.  The detail of the entries 

I suspect this might be something better done in a join.

> comes from that query that I posted earlier, but there are three additional 
> queries that are used for specialized entry types (relationships between 
> topics --e.g., Prof. Williams teaches at George Mason, events, and 
> multi-author or multi-subject articles and books).  And there's yet another 

Likewise...

> table for the specific book information.  Once the data is retrieved it's 
> sorted internally with PHP, at the heading level, before display.

It's often better to let the database sort and/or aggregate data.
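(A small, hedged sketch of what that can look like here -- it reuses names from
the query earlier in the thread and is only illustrative: the heading-level
ordering currently done in PHP can usually be pushed into the statement itself.)

SELECT t.list_name, r.rel_type, r.description
  FROM relationship r
  JOIN topic t ON t.topic_id = r.topic_id1
 ORDER BY r.rel_type, t.list_name;   -- rows arrive already ordered by heading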

> Maybe there is some way to merge all the queries (some already fairly 
> complex) that fetch the data for the entries box but I believe it would be 
> a monstrosity with over 100 lines of SQL.

Also, just because no one else has mentioned it, remember that it's very
easy to get MySQL into a mode where you have no data integrity. If
that's the case it's going to be faster than PostgreSQL (though I'm not
sure how much that affects the performance of SELECTs).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Ultra-cheap NVRAM device

2005-10-04 Thread Jim C. Nasby
There was a discussion about this about 2 months ago. See the archives.

On Mon, Oct 03, 2005 at 01:02:26PM +0200, Steinar H. Gunderson wrote:
> I thought this might be interesting, not the least due to the extremely low
> price ($150 + the price of regular DIMMs):
> 
>   http://www.tomshardware.com/storage/20050907/index.html
> 
> Anybody know a good reason why you can't put a WAL on this, and enjoy a hefty
> speed boost for a fraction of the price of a traditional SSD? (Yes, it's
> SATA, not PCI, so the throughput is not all that impressive -- but still,
> it's got close to zero seek time.)
> 
> /* Steinar */
> -- 
> Homepage: http://www.sesse.net/
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Lists or external TABLE?

2005-10-04 Thread Jim C. Nasby
On Fri, Sep 30, 2005 at 10:34:35AM +0100, Richard Huxton wrote:
> xchris wrote:
> >
> >Let's suppose i need to add an info about addresses (which includes
> >country,city,capetc etc).
> >Addresses can vary from 1 to 20 entries..
> >
> >Talking about performance is it better to include a list of addresses in
> >TABLE A or is it better to create an external TABLE B?
> 
> Don't optimise before you have to.
> 
> Do the addresses belong in "A"? If so, put them there. On the other 
> hand, if you want items in "A" to have more than one address, or to 
> share addresses then clearly you will want a separate address table. 
> It's difficult to say more without a clear example of your requirements.
> 
> Even if you choose to alter your design for performance reasons, you 
> should make sure you run tests with realistic workloads and hardware. 
> But first, trust PG to do its job and design your database according to 
> the problem requirements.

On top of what Richard said, 5000 rows is pretty tiny. Even if each row
was 1K wide, that's still only 5MB.

Also, if from a data-model standpoint it doesn't matter which way you
go, I suggest looking at what it will take to write queries against both
versions before deciding. I tend to stay away from arrays because
they tend to be harder to query against.
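(To make the comparison concrete, a sketch with invented names -- the same
one-to-many address data as a child table versus as an array column. The child
table can be filtered, joined and indexed with ordinary SQL; the array needs
element-wise tricks such as ANY.)

-- separate table: plain SQL, easy to index
CREATE TABLE address (
    a_id    integer NOT NULL,   -- the owning row in table A
    city    text,
    country text
);
SELECT a_id FROM address WHERE city = 'Rome';

-- array column: filtering on one element is clumsier
CREATE TABLE a_with_addresses (
    a_id   integer PRIMARY KEY,
    cities text[]
);
SELECT a_id FROM a_with_addresses WHERE 'Rome' = ANY (cities);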
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-10-04 Thread Joe

Hi Jim,

Jim C. Nasby wrote:

Also, just because no one else has mentioned it, remember that it's very
easy to get MySQL into a mode where you have no data integrity. If
that's the case it's going to be faster than PostgreSQL (though I'm not
sure how much that affects the performance of SELECTs).


Yes indeed.  When I added the REFERENCES to the schema and reran the conversion 
scripts, aside from having to reorder the table creation and loading (they used 
to be in alphabetical order), I also found a few referential integrity errors in 
the MySQL data.


Joe


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Jim C. Nasby
On Tue, Oct 04, 2005 at 12:31:42PM +0200, Stefan Weiss wrote:
> On 2005-09-30 01:21, Lane Van Ingen wrote:
> >   (3) Assure that a disk-based table is always in memory (outside of keeping
> > it in
> >   memory buffers as a result of frequent activity which would prevent
> > LRU
> >   operations from taking it out) ?
> 
> I was wondering about this too. IMO it would be useful to have a way to tell
> PG that some tables were needed frequently, and should be cached if
> possible. This would allow application developers to consider joins with
> these tables as "cheap", even when querying on columns that are not indexed.
> I'm thinking about smallish tables like users, groups, *types, etc which
> would be needed every 2-3 queries, but might be swept out of RAM by one
> large query in between. Keeping a table like "users" on a RAM fs would not
> be an option, because the information is not volatile.

Why do you think you'll know better than the database how frequently
something is used? At best, your guess will be correct and PostgreSQL
(or the kernel) will keep the table in memory. Or, your guess is wrong
and you end up wasting memory that could have been used for something
else.

It would probably be better if you describe why you want to force this
table (or tables) into memory, so we can point you at more appropriate
solutions.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Comparative performance

2005-10-04 Thread Joe

PFC wrote:
- if you use a version before 8, type mismatch will prevent use of the  
indexes.


I'm using 8.0.3, but the type mismatch between relationship.rel_type and 
entry_type.type_id was unintended.  The current databases use SMALLINT for both. 
 The PostgreSQL schema was derived from an export script stored in Subversion, 
apparently before the column datatypes were changed.



CREATE INDEX'es ON
entry_type( class_id )

relationship( topic_id1, rel_type, topic_id2 )    which becomes your new PRIMARY KEY

relationship( topic_id2, rel_type, topic_id1 )


Creating the second relationship index was sufficient to modify the query plan 
to cut down runtime to zero:


 Sort  (cost=75.94..75.95 rows=2 width=381) (actual time=0.000..0.000 rows=0 loops=1)
   Sort Key: r.rel_type, t.list_name
   ->  Nested Loop  (cost=16.00..75.93 rows=2 width=381) (actual time=0.000..0.000 rows=0 loops=1)
         Join Filter: ((("outer".topic_id1 = "inner".topic_id) AND ("outer".topic_id2 = 1252)) OR (("outer".topic_id2 = "inner".topic_id) AND ("outer".topic_id1 = 1252)))
         ->  Nested Loop  (cost=16.00..35.11 rows=1 width=169) (actual time=0.000..0.000 rows=0 loops=1)
               Join Filter: ("inner".rel_type = "outer".type_id)
               ->  Seq Scan on entry_type e  (cost=0.00..18.75 rows=4 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                     Filter: (class_id = 2)
               ->  Materialize  (cost=16.00..16.04 rows=4 width=167) (actual time=0.000..0.000 rows=0 loops=15)
                     ->  Seq Scan on relationship r  (cost=0.00..16.00 rows=4 width=167) (actual time=0.000..0.000 rows=0 loops=1)
                           Filter: ((topic_id2 = 1252) OR (topic_id1 = 1252))
         ->  Seq Scan on topic t  (cost=0.00..30.94 rows=494 width=216) (never executed)
 Total runtime: 0.000 ms
(13 rows)

The overall execution time for the Economists page for PostgreSQL is within 4% 
of the MySQL time, so for the time being I'll leave the query in its current form.


Thanks for your help.

Joe


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] SQL Function performance

2005-10-04 Thread Jim C. Nasby
On Thu, Sep 29, 2005 at 10:54:58PM +0300, [EMAIL PROTECTED] wrote:
> Hi All,
>  
> I have a SQL function like :
>  
> CREATE OR REPLACE FUNCTION
> fn_get_yetkili_inisyer_listesi(int4, int4)
>   RETURNS SETOF kod_adi_liste_type AS
> $BODY$
>  SELECT Y.KOD,Y.ADI
>FROM T_YER Y
>   WHERE EXISTS (SELECT 1
> FROM T_GUZER G
>   WHERE (G.BIN_YER_KOD = $1 OR COALESCE($1,0)=0)
>  AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO,$2) = 1
>  AND G.IN_YER_KOD = Y.KOD)
> AND Y.IPTAL = 'H';
> $BODY$
>   LANGUAGE 'sql' VOLATILE;
> 
> When i use like "SELECT * FROM
> fn_get_yetkili_inisyer_listesi(1, 3474)" and 
> planner result is  "Function Scan on
> fn_get_yetkili_inisyer_listesi  (cost=0.00..12.50 rows=1000
> width=36) (1 row) " and it runs very slow.
>  
> But when i use like 
> 
> "SELECT Y.KOD,Y.ADI
>  FROM T_YER Y
>WHERE EXISTS (SELECT 1
> FROM T_GUZER G
>   WHERE (G.BIN_YER_KOD
> = 1 OR COALESCE(1,0)=0)
>  AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO,3474) = 1
>  AND G.IN_YER_KOD = Y.KOD)
>  AND Y.IPTAL = 'H';" 
> 
> planner result :
> 
> " 
>  QUERY PLAN
>  
> 
> -
>  Seq Scan on t_yer y  (cost=0.00..3307.79 rows=58 width=14)
>Filter: (((iptal)::text = 'H'::text) AND (subplan))
>SubPlan
>  ->  Index Scan using
> t_guzer_ucret_giris_performans_idx on t_guzer g  (cost
> =0.00..28.73 rows=1 width=0)
>Index Cond: ((bin_yer_kod = 1) AND (in_yer_kod =
> $0))
>Filter: (fn_firma_isvisible(firma_no, 3474) = 1)
> (6 rows)
> "
>   and it runs very fast.
> 
> Any idea ?

Need EXPLAIN ANALYZE.

I suspect this is due to a cached query plan. PostgreSQL will cache a
query plan for the SELECT the first time you run the function and that
plan will be re-used. Depending on what data you call the function with,
you could get a very different plan.

Also, you might do better with a JOIN instead of using EXISTS.  You can
also make this function STABLE instead of VOLATILE. Likewise, if
FN_FIRMA_ISVISIBLE can't change any data, you can also make it STABLE
which would likely improve the performance of the query. But neither of
these ideas would account for the difference between function
performance and raw query performance.

On a side note, if OR $1 IS NULL works that will be more readable (and
probably faster) than the OR COALESCE($1,0)=0.
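(Putting the STABLE and "$1 IS NULL" suggestions together, an untested sketch of
how the function could be declared -- the body is otherwise the original one:)

CREATE OR REPLACE FUNCTION fn_get_yetkili_inisyer_listesi(int4, int4)
  RETURNS SETOF kod_adi_liste_type AS
$BODY$
  SELECT Y.KOD, Y.ADI
    FROM T_YER Y
   WHERE EXISTS (SELECT 1
                   FROM T_GUZER G
                  WHERE (G.BIN_YER_KOD = $1 OR $1 IS NULL)
                    AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO, $2) = 1
                    AND G.IN_YER_KOD = Y.KOD)
     AND Y.IPTAL = 'H';
$BODY$
  LANGUAGE 'sql' STABLE;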
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-10-04 Thread Jim C. Nasby
On Fri, Sep 23, 2005 at 08:17:03PM +0800, K C Lau wrote:
> esdt=> create or replace function player_max_atdate (varchar(32)) returns 
> varchar(32) as $$
> esdt$>  select distinct on (PlayerID) AtDate from player where PlayerID= $1 
> order by PlayerID desc, AtDate desc limit 1;
> esdt$> $$ language sql immutable;
> CREATE FUNCTION

That function is not immutable; it should be defined as stable.
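(I.e., a sketch of the same definition with only the volatility changed:)

create or replace function player_max_atdate (varchar(32)) returns varchar(32) as $$
  select distinct on (PlayerID) AtDate from player where PlayerID = $1
   order by PlayerID desc, AtDate desc limit 1;
$$ language sql stable;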

> esdt=> create or replace view VCurPlayer3 as select * from Player where 
> AtDate = player_max_atdate(PlayerID);
> CREATE VIEW
> esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where 
> PlayerID='0';
> 
>  Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9 
> width=23) (actual time=76.660..76.664 rows=1 loops=1)
>Index Cond: ((playerid)::text = '0'::text)
>Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
>  Total runtime: 76.716 ms
> 
> Why wouldn't the function get the row as quickly as the direct sql does?

PostgreSQL doesn't pre-compile functions, at least not until 8.1 (and
I'm not sure how much those are pre-compiled, though they are
syntax-checked at creation). Do you get the same result time when you
run it a second time? What time do you get from running just the
function versus the SQL in the function?

Also, remember that every layer you add to the cake means more work for
the database. If speed is that highly critical you'll probably want to
not wrap things in functions, and possibly not use views either.

Also, keep in mind that getting below 1ms doesn't automatically mean
you'll be able to scale to 1000TPS. Things will definitely change when
you load the system down, so if performance is that critical you should
start testing with the system under load if you're not already.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-10-04 Thread Joe

Jim C. Nasby wrote:

Make sure these indexes exist if you'll be updating or inserting into
entry:

CREATE INDEX topic__subject_id ON topic(subject_id);
CREATE INDEX topic__actor_id ON topic(actor_id);


Actually, topic's primary key is topic_id.


Also, the fact that subject and actor both point to topic along with
subject_type and actor_type make me suspect that your design is
de-normalized. Of course there's no way to know without more info.


Yes, the design is denormalized.  The reason is that a book or article is 
usually by a single author (an "actor" topic) and it will be listed under one 
main topic (a "subject" topic).  There's a topic_entry table where additional 
actors and subjects can be added.


It's somewhat ironic because I used to teach and/or preach normalization and the 
"goodness" of a 3NF+ design (also about having the database do aggregation and 
sorting as you mentioned in your other email).



FWIW, I usually use timestamptz for both created and updated fields.


IIRC 'created' ended up as a DATE because MySQL 4 has a restriction about a 
single TIMESTAMP column per table taking the default value of current_timestamp.


Joe


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-10-04 Thread Jim C. Nasby
On Fri, Sep 23, 2005 at 04:53:55PM +0800, K C Lau wrote:
> Thank you all for your suggestions. I' tried, with some variations too, but 
> still no success. The times given are the best of a few repeated tries on 
> an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.
> 
> For reference, only the following gets the record quickly:
> 
> esdt=> explain analyze select PlayerID,AtDate from Player a
>  where PlayerID='0' and AtDate = (select b.AtDate from Player b
>  where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc LIMIT 
>  1);
> 
>  Index Scan using pk_player on player a  (cost=0.75..4.26 rows=1 width=23) 
> (actual time=0.054..0.057 rows=1 loops=1)
>Index Cond: (((playerid)::text = '0'::text) AND ((atdate)::text = 
> ($0)::text))
>InitPlan
>  ->  Limit  (cost=0.00..0.75 rows=1 width=23) (actual 
> time=0.027..0.028 rows=1 loops=1)
>->  Index Scan Backward using pk_player on player 
> b  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1 
> loops=1)
>  Index Cond: ((playerid)::text = '0'::text)
>  Total runtime: 0.132 ms

If you're doing that, you should try something like the following:
decibel=# explain analyze select * from t where ctid=(select ctid from rrs order by rrs_id desc limit 1);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------------
 Tid Scan on t  (cost=0.44..4.45 rows=1 width=42) (actual time=0.750..0.754 rows=1 loops=1)
   Filter: (ctid = $0)
   InitPlan
     ->  Limit  (cost=0.00..0.44 rows=1 width=10) (actual time=0.548..0.549 rows=1 loops=1)
           ->  Index Scan Backward using rrs_rrs__rrs_id on rrs  (cost=0.00..3.08 rows=7 width=10) (actual time=0.541..0.541 rows=1 loops=1)
 Total runtime: 1.061 ms
(6 rows)

decibel=# select count(*) from t;
 count
--------
 458752

Note that that's on my nice slow laptop to boot (the count took like 10
seconds).

Just remember that ctid *is not safe outside of a transaction*!! So you can't
do something like

SELECT ctid FROM ...
store that in some variable...
SELECT * FROM table WHERE ctid = variable
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Comparative performance

2005-10-04 Thread Jim C. Nasby
On Tue, Oct 04, 2005 at 05:11:19PM -0400, Joe wrote:
> Hi Jim,
> 
> Jim C. Nasby wrote:
> >Also, just because no one else has mentioned it, remember that it's very
> >easy to get MySQL into a mode where you have no data integrity. If
> >that's the case it's going to be faster than PostgreSQL (though I'm not
> >sure how much that affects the performance of SELECTs).
> 
> Yes indeed.  When I added the REFERENCES to the schema and reran the 
> conversion scripts, aside from having to reorder the table creation and 
> loading (they used to be in alphabetical order), I also found a few 
> referential integrity errors in the MySQL data.

Data integrity != referential integrity. :) It's very easy to
accidentally get MyISAM tables in MySQL, which means you are nowhere
near ACID, which also means you can't get anything close to an
apples-to-apples comparison to PostgreSQL.
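(For what it's worth, a quick way to check which engine a table is using, and to
ask for a transactional one explicitly -- MySQL syntax; the table below is just
a placeholder, and the exact output columns and the ENGINE vs. TYPE keyword vary
by MySQL version:)

SHOW TABLE STATUS;                  -- the Engine/Type column shows MyISAM vs InnoDB

CREATE TABLE example_table (
    id INT NOT NULL PRIMARY KEY
) ENGINE = InnoDB;                  -- older 4.0.x releases spell this TYPE = InnoDB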
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-10-04 Thread Jim C. Nasby
On Tue, Oct 04, 2005 at 04:15:41PM -0500, Jim C. Nasby wrote:
> >Index Cond: ((playerid)::text = '0'::text)

Also, why is playerid a text field? Comparing ints will certainly be
faster...
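(If the column really only contains numbers, something along these lines would
convert it on 8.0 and later -- just a sketch; it rewrites the table and its
indexes, so test it first:)

ALTER TABLE player
    ALTER COLUMN playerid TYPE integer
    USING playerid::integer;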
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Mark Lewis
Which version of PG are you using?  One of the new features for 8.0 was
an improved caching algorithm that was smart enough to avoid letting a
single big query sweep everything else out of cache.

-- Mark Lewis


On Tue, 2005-10-04 at 10:45 -0400, Lane Van Ingen wrote:
> Yes, Stefan, the kind of usage you are mentioning is exactly why I was
> asking.
> 
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Stefan Weiss
> Sent: Tuesday, October 04, 2005 6:32 AM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Is There Any Way 
> 
> 
> On 2005-09-30 01:21, Lane Van Ingen wrote:
> >   (3) Assure that a disk-based table is always in memory (outside of
> keeping
> > it in
> >   memory buffers as a result of frequent activity which would prevent
> > LRU
> >   operations from taking it out) ?
> 
> I was wondering about this too. IMO it would be useful to have a way to tell
> PG that some tables were needed frequently, and should be cached if
> possible. This would allow application developers to consider joins with
> these tables as "cheap", even when querying on columns that are not indexed.
> I'm thinking about smallish tables like users, groups, *types, etc which
> would be needed every 2-3 queries, but might be swept out of RAM by one
> large query in between. Keeping a table like "users" on a RAM fs would not
> be an option, because the information is not volatile.
> 
> 
> cheers,
> stefan
> 
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
> 
> 
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Comparative performance

2005-10-04 Thread Dario
PostgreSQL uses MVCC to ensure data integrity. The server must choose the right
version of each tuple according to the transaction ID of the statement. Even for
a SELECT (of the ACID properties, I think C and I apply here), it must do some
extra work.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joe
Sent: Tuesday, October 04, 2005 18:11
To: Jim C. Nasby
CC: Andreas Pflug; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Comparative performance


Hi Jim,

Jim C. Nasby wrote:
> Also, just because no one else has mentioned it, remember that it's very
> easy to get MySQL into a mode where you have no data integrity. If
> that's the case it's going to be faster than PostgreSQL (though I'm not
> sure how much that affects the performance of SELECTs).

Yes indeed.  When I added the REFERENCES to the schema and reran the
conversion
scripts, aside from having to reorder the table creation and loading (they
used
to be in alphabetical order), I also found a few referential integrity
errors in
the MySQL data.

Joe


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Ron Peacetree
pg is _very_ stupid about caching.  Almost all of the caching is left
to the OS, and it's that way by design (as post after post by TL has
pointed out).

That means pg has almost no ability to take application domain
specific knowledge into account when deciding what to cache.
There's plenty of papers on caching out there that show that
context dependent knowledge leads to more effective caching
algorithms than context independent ones are capable of.

(Which means said design choice is a Mistake, but unfortunately
one with too much inertia behind it currently to change easily.)

Under these circumstances, it is quite possible that an expert class
human could optimize memory usage better than the OS + pg.
   
If one is _sure_ they know what they are doing, I'd suggest using
tmpfs or the equivalent for critical read-only tables.  For "hot"
tables that are rarely written to and where data loss would not be
a disaster, "tmpfs" can be combined with an asynchronous writer
process that pushes updates to HD.  Just remember that a power hit
means that 

The (much) more expensive alternative is to buy SSD(s) and put
the critical tables on it at load time.

Ron
 

-Original Message-
From: "Jim C. Nasby" <[EMAIL PROTECTED]>
Sent: Oct 4, 2005 4:57 PM
To: Stefan Weiss <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Is There Any Way 

On Tue, Oct 04, 2005 at 12:31:42PM +0200, Stefan Weiss wrote:
> On 2005-09-30 01:21, Lane Van Ingen wrote:
> >   (3) Assure that a disk-based table is always in memory (outside of keeping
> > it in
> >   memory buffers as a result of frequent activity which would prevent
> > LRU
> >   operations from taking it out) ?
> 
> I was wondering about this too. IMO it would be useful to have a way to tell
> PG that some tables were needed frequently, and should be cached if
> possible. This would allow application developers to consider joins with
> these tables as "cheap", even when querying on columns that are not indexed.
> I'm thinking about smallish tables like users, groups, *types, etc which
> would be needed every 2-3 queries, but might be swept out of RAM by one
> large query in between. Keeping a table like "users" on a RAM fs would not
> be an option, because the information is not volatile.

Why do you think you'll know better than the database how frequently
something is used? At best, your guess will be correct and PostgreSQL
(or the kernel) will keep the table in memory. Or, your guess is wrong
and you end up wasting memory that could have been used for something
else.

It would probably be better if you describe why you want to force this
table (or tables) into memory, so we can point you at more appropriate
solutions.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Jim C. Nasby
On Tue, Oct 04, 2005 at 07:33:47PM -0400, Ron Peacetree wrote:
> pg is _very_ stupid about caching.  Almost all of the caching is left
> to the OS, and it's that way by design (as post after post by TL has
> pointed out).
> 
> That means pg has almost no ability to take application domain
> specific knowledge into account when deciding what to cache.
> There's plenty of papers on caching out there that show that
> context dependent knowledge leads to more effective caching
> algorithms than context independent ones are capable of.
> 
> (Which means said design choice is a Mistake, but unfortunately
> one with too much inertia behind it currentyl to change easily.)
> 
> Under these circumstances, it is quite possible that an expert class
> human could optimize memory usage better than the OS + pg.

Do you have any examples where this has actually happened? Especially
with 8.x, which isn't all that 'stupid' about how it handles buffers?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Douglas J. Trainor


Ron Peacetree sounds like someone talking out of his _AZZ_.
He can save his unreferenced flapdoodle for his SQL Server
clients.  Maybe he will post references so that we may all
learn at the feet of Master Peacetree.  :-)

douglas

On Oct 4, 2005, at 7:33 PM, Ron Peacetree wrote:


pg is _very_ stupid about caching.  Almost all of the caching is left
to the OS, and it's that way by design (as post after post by TL has
pointed out).

That means pg has almost no ability to take application domain
specific knowledge into account when deciding what to cache.
There's plenty of papers on caching out there that show that
context dependent knowledge leads to more effective caching
algorithms than context independent ones are capable of.

(Which means said design choice is a Mistake, but unfortunately
one with too much inertia behind it currentyl to change easily.)

Under these circumstances, it is quite possible that an expert class
human could optimize memory usage better than the OS + pg.

If one is _sure_ they know what they are doing, I'd suggest using
tmpfs or the equivalent for critical read-only tables.  For "hot"
tables that are rarely written to and where data loss would not be
a disaster, "tmpfs" can be combined with an asyncronous writer
process push updates to HD.  Just remember that a power hit
means that

The (much) more expensive alternative is to buy SSD(s) and put
the critical tables on it at load time.

Ron


-Original Message-
From: "Jim C. Nasby" <[EMAIL PROTECTED]>
Sent: Oct 4, 2005 4:57 PM
To: Stefan Weiss <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Is There Any Way 

On Tue, Oct 04, 2005 at 12:31:42PM +0200, Stefan Weiss wrote:

On 2005-09-30 01:21, Lane Van Ingen wrote:
  (3) Assure that a disk-based table is always in memory (outside of 
keeping

it in
  memory buffers as a result of frequent activity which would 
prevent

LRU
  operations from taking it out) ?


I was wondering about this too. IMO it would be useful to have a way 
to tell

PG that some tables were needed frequently, and should be cached if
possible. This would allow application developers to consider joins 
with
these tables as "cheap", even when querying on columns that are not 
indexed.
I'm thinking about smallish tables like users, groups, *types, etc 
which
would be needed every 2-3 queries, but might be swept out of RAM by 
one
large query in between. Keeping a table like "users" on a RAM fs 
would not

be an option, because the information is not volatile.


Why do you think you'll know better than the database how frequently
something is used? At best, your guess will be correct and PostgreSQL
(or the kernel) will keep the table in memory. Or, your guess is wrong
and you end up wasting memory that could have been used for something
else.

It would probably be better if you describe why you want to force this
table (or tables) into memory, so we can point you at more appropriate
solutions.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Indexes on ramdisk

2005-10-04 Thread Emil Briggs

I have an application that has a table that is both read and write intensive. 
Data from iostat indicates that the write speed of the system is the factor 
that is limiting performance. The table has around 20 columns and most of the 
columns are indexed. The data and the indices for the table are distributed 
over several mirrored disk partitions and pg_xlog is on another. I'm looking 
at ways to improve performance and besides the obvious one of getting an SSD 
I thought about putting the indices on a ramdisk. That means that after a 
power failure or shutdown I would have to recreate them but that is 
acceptable for this application. What I am wondering though is whether or not 
I would see much performance benefit and if there would be any startup 
problems after a power down event due to the indices not being present. Any 
insight would be appreciated.
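(For reference, one way to try this on 8.0 and later is a tablespace on the
ramdisk -- the mount point, tablespace, table and index names below are only
placeholders, and anything in that tablespace is of course gone after a reboot,
so the indexes would have to be recreated at startup:)

CREATE TABLESPACE ram_space LOCATION '/mnt/pg_ram/pgdata';

CREATE INDEX some_col_idx ON some_table (some_col) TABLESPACE ram_space;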

Emil

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Joshua D. Drake

Douglas J. Trainor wrote:



Ron Peacetree sounds like someone talking out of his _AZZ_.
He can save his unreferenced flapdoodle for his SQL Server
clients.  Maybe he will post references so that we may all
learn at the feet of Master Peacetree.  :-)


Although I agree that I would definitely like to see some test cases
for what Ron is talking about, I don't think that resorting to insults
is going to help the situation.

Ron, if you would please -- provide some test cases for what you are
describing I am sure that anyone would love to see them. We are all
for improving PostgreSQL.

Sincerely,

Joshua D. Drake

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Ron Peacetree
Unfortunately, no matter what I say or do, I'm not going to please
or convince anyone who has already made up their mind to the extent
that they post comments like Mr Trainor's below.
His response style pretty much proves my earlier point that this
is presently a religious issue within the pg community.

The absolute best proof would be to build a version of pg that does
what Oracle and DB2 have done and implement its own DB-specific
memory manager, and then compare the performance between the two
versions on the same HW, OS, and schema.

The second best proof would be to set up either DB2 or Oracle so
that they _don't_ use their memory managers and compare their
performance to a set up that _does_ use said memory managers
on the same HW, OS, and schema.

I don't currently have the resources for either experiment.

Some might even argue that IBM (where Codd and Date worked)
and Oracle just _might_ have had justification for the huge effort
they put into developing such infrastructure. 

Then there's the large library of research on caching strategies
in just about every HW and SW domain, including DB theory, that
points out that the more context-dependent (i.e. application- or
domain-aware) a caching strategy is, the better it performs.

Maybe after we do all we can about physical IO and sorting
performance I'll take on the religious fanatics on this one.

One problem set at a time.
Ron   
 

-Original Message-
From: "Joshua D. Drake" <[EMAIL PROTECTED]>
Sent: Oct 4, 2005 9:32 PM
To: "Douglas J. Trainor" <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Is There Any Way 

Douglas J. Trainor wrote:

>
> Ron Peacetree sounds like someone talking out of his _AZZ_.
> He can save his unreferenced flapdoodle for his SQL Server
> clients.  Maybe he will post references so that we may all
> learn at the feet of Master Peacetree.  :-)

Although I agree that I would definitely like to see some test cases
for what Ron is talking about, I don't think that resorting to insults
is going to help the situation.

Ron, if you would please -- provide some test cases for what you are
describing I am sure that anyone would love to see them. We are all
for improving PostgreSQL.

Sincerely,

Joshua D. Drake

-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


---(end of broadcast)---
TIP 6: explain analyze is your friend


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Indexes on ramdisk

2005-10-04 Thread Alex Turner
Talk about your IO system a bit.  There might be obvious ways to improve.

What System/Motherboard are you using?
What Controller Cards are you using?
What kind of Disks do you have (SATA, SCSI 7.6k 10k 15k)
What denominations (9, 18, 36, 72, 143, 80, 160, 200, 240 Gig)?
What kind of RAIDs do you have set up (how many drives, what stripe sizes, how many used for what)?
What levels of RAID are you using (0,1,10,5,50)?

With good setup, a dual PCI-X bus motherboard can hit 2GB/sec and
thousands of transactions  to disk if you have a controller/disks
that can keep up.  That is typically enough for most people without
resorting to SSD.

Alex Turner
NetEconomist

On 10/4/05, Emil Briggs <[EMAIL PROTECTED]> wrote:
> I have an application that has a table that is both read and write intensive.
> Data from iostat indicates that the write speed of the system is the factor
> that is limiting performance. The table has around 20 columns and most of the
> columns are indexed. The data and the indices for the table are distributed
> over several mirrored disk partitions and pg_xlog is on another. I'm looking
> at ways to improve performance and besides the obvious one of getting an SSD
> I thought about putting the indices on a ramdisk. That means that after a
> power failure or shutdown I would have to recreate them but that is
> acceptable for this application. What I am wondering though is whether or not
> I would see much performance benefit and if there would be any startup
> problems after a power down event due to the indices not being present. Any
> insight would be appreciated.
>
> Emil
>
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Indexes on ramdisk

2005-10-04 Thread Emil Briggs
> Talk about your IO system a bit. There might be obvious ways to improve.
>
> What System/Motherboard are you using?
> What Controller Cards are you using?
> What kind of Disks do you have (SATA, SCSI 7.6k 10k 15k)
> What denominations (9, 18, 36, 72, 143, 80, 160, 200 240Gig)?
> What kind of RAIDs do you have setup (How many drives what stripe sizes,
> how many used for what).
> What levels of RAID are you using (0,1,10,5,50)?
>

It's a quad opteron system. RAID controller is a 4 channel LSILogic Megaraid  
320 connected to 10 15k 36.7G SCSI disks. The disks are configured in 5 
mirrored partitions. The pg_xlog is on one mirror and the data and indexes 
are spread over the other 4 using tablespaces. These numbers from 
pg_stat_user_tables are from about 2 hours earlier today on this one table.


idx_scan          20578690
idx_tup_fetch  35866104841
n_tup_ins          1940081
n_tup_upd          1604041
n_tup_del          1880424



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Steve Atkins
On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:

> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure. 

The OS and FS world is very, very different now than it was when
the Oracle and DB2 architectures were being crafted. What may have
been an excellent development effort then may not provide such good
ROI now.

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points put that the more context dependent, ie application
> or domain specific awareness, caching strategies are the better
> they are.
> 
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.

Actually, the main "religious fanatic" I've seen recently is yourself.
While I have a gut feel that some of the issues you raise could
certainly do with further investigation, I'm not seeing that much from
you other than statements that much of what postgresql does is wrong
(not "wrong for Ron's use case", but "wrong in every respect").

A little less arrogance and a little more "here are some possibilities
for improvement", "here is an estimate of the amount of effort that
might be needed" and "here are some rough benchmarks showing the
potential return on that investment" would, at the very least, make
the threads far less grating to read.

Cheers,
  Steve

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread mark
On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:
> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already have made their minds up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.

Religious for some. Conservative for others.

Sometimes people need to see the way, before they are willing to
accept it merely on the say so of another person. In some circles, it
is called the scientific method... :-)

Also, there is a cost to complicated, specific optimizations. They can
be a real maintenance and portability headache. What is the value ratio
of performance to maintenance or portability?

> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement it's own DB
> specific memory manager and then compare the performance
> between the two versions on the same HW, OS, and schema.

Not necessarily. Even if a version of PostgreSQL were to be written to
function in this new model, there would be no guarantee that it was
written in the most efficient manner possible. Performance could show
PostgreSQL using its own caching, and disk space management
implementation, and performing poorly. The only true, and accurate way
would be to implement, and then invest time by those most competent to
test, and optimize the implementation. At this point, it would become
a moving target, as those who believe otherwise, would be free to
pursue using more efficient file systems, or modifications to the
operating system to better co-operate with PostgreSQL.

I don't think there can be a true answer to this one. The more
innovative, and clever people, will always be able to make their
solution work better. If the difference in performance was really so
obvious, there wouldn't be doubters on either side. It would be clear
to all. The fact is, there is reason to doubt. Perhaps not doubt that
the final solution would be more efficient, but rather, the reason
to doubt that the difference in efficiency would be significant.

> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a set up that _does_ use said memory managers
> on the same HW, OS, and schema.

Same as above. If Oracle was designed to work with the functionality,
then disabling the functionality, wouldn't prove that an efficient
design would perform equally poorly, or even, poorly at all. I think
it would be obvious that Oracle would have invested most of their
dollars into the common execution paths, with the expected
functionality present.

> I don't currently have the resources for either experiment.

This is the real problem. :-)

> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure. 

Or, not. They might just have more money to throw at the problem,
and be entrenched into their solution to the point that they need
to innovate to ensure that their solution appears to be the best.

> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points put that the more context dependent, ie application
> or domain specific awareness, caching strategies are the better
> they are.

A lot of this is theory. It may be good theory, but there is no
guarantee that the variables listed in these theories match, or
properly estimate the issues that would be found in a real
implementation.

> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
> One problem set at a time.

In any case, I'm on your side - in theory. Intuitively, I don't
understand how anybody could claim that a general solution could ever
be faster than a specific solution. Anybody who claimed this would
go down in my books as a fool. It should be obvious to these people
that, as an extreme, the entire operating system caching layer and
the entire file system layer could be inlined into PostgreSQL,
avoiding many of the expenses involved in switching back and forth
between user space and system space, leaving a more efficient,
although significantly more complicated, solution.

Whether by luck or by the experience of those involved, I haven't seen
any senior PostgreSQL developers actually state that it couldn't be
faster. Instead, I've seen it claimed that the PostgreSQL developers
don't have the resources to attack this problem, as there are other,
far more pressing features, product defects, and more obviously
beneficial optimization opportunities to work on. Memory management,
or disk management, is "good enough" as provided by decent operating
systems, and the itch just isn't bad enough to scratch yet. They
rema

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Hannu Krosing
On E, 2005-10-03 at 14:16 -0700, Josh Berkus wrote:
> Jeff,
> 
> > > Nope, LOTS of testing, at OSDL, GreenPlum and Sun.   For comparison, A
> > > Big-Name Proprietary Database doesn't get much more than that either.
> >
> > I find this claim very suspicious.  I get single-threaded reads in
> > excess of 1GB/sec with XFS and > 250MB/sec with ext3.
> 
> Database reads?  Or raw FS reads?  It's not the same thing.

Just FYI, I ran a count(*) on a 15.6GB table on a lightly loaded db and
it ran in 163 sec. (Dual Opteron 2.6GHz, 6GB RAM, 6 x 74GB 15k disks in
RAID10, reiserfs). A little less than 100MB/sec.

After this I ran count(*) over a 2.4GB file from another tablespace on
another device (4 x 142GB 10k disks in RAID10) and it ran in 22.5 sec on
the first run and 12.5 sec on the second.

db=# show shared_buffers ;
 shared_buffers
----------------
 196608
(1 row)

db=# select version();
                                   version
------------------------------------------------------------------------------
 PostgreSQL 8.0.3 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 3.3.6
(Debian 1:3.3.6-7)
(1 row)


-- 
Hannu Krosing <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Gregory Maxwell
On 10/3/05, Ron Peacetree <[EMAIL PROTECTED]> wrote:
[snip]
> Just how bad is this CPU bound condition?  How powerful a CPU is
> needed to attain a DB IO rate of 25MBps?
>
> If we replace said CPU with one 2x, 10x, etc faster than that, do we
> see any performance increase?
>
> If a modest CPU can drive a DB IO rate of 25MBps, but that rate
> does not go up regardless of how much extra CPU we throw at
> it...

Single threaded was mentioned.
Plus, even if it's purely CPU bound, it's seldom as trivial as throwing
CPU at it; consider the locking in the application, in the
filesystem, and elsewhere in the kernel.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] index on custom function; explain

2005-10-04 Thread Jan Aerts
Some additional thoughts: what appears to take the most time (i.e.
account for the highest cost in the explain), is _not_ running the
function itself (cost=0.00..0.01), but comparing the result from that
function with the name1 column in the mappings table
(cost=0.00..35935.05). Am I right? (See EXPLAIN in previous post.) If
so: that's pretty strange, because the name1-column in the mappings
table is indexed...

jan.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Dann Corbit
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
> [EMAIL PROTECTED] On Behalf Of PFC
> Sent: Thursday, September 29, 2005 9:10 AM
> To: [EMAIL PROTECTED]
> Cc: Pg Hackers; pgsql-performance@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
> 
> 
>   Just to add a little anarchy in your nice debate...
> 
> 	Who really needs all the results of a sort on your terabyte
> table ?

Reports with ORDER BY/GROUP BY, and many other possibilities.  40% of
mainframe CPU cycles are spent sorting.  That is because huge volumes of
data require lots of energy to be meaningfully categorized.  Let's
suppose that instead of a terabyte of data (or a petabyte or whatever)
we have 10% of it.  That's still a lot of data.

> 	I guess not many people do a SELECT from such a table and want all
> the results.

What happens when they do?  The cases where it is already fast are not
very important.  The cases where things go into the crapper are the ones
that need attention.

> 	So, this leaves :
> 	- Really wanting all the results, to fetch using a cursor,
> 	- CLUSTER type things, where you really want everything in order,
> 	- Aggregates (Sort->GroupAggregate), which might really need to sort
> the whole table.
> 	- Complex queries where the whole dataset needs to be examined, in
> order to return a few values
> 	- Joins (again, the whole table is probably not going to be selected)
> 	- And the ones I forgot.
> 
>   However,
> 
>   Most likely you only want to SELECT N rows, in some ordering :
>   - the first N (ORDER BY x LIMIT N)
>   - last N (ORDER BY x DESC LIMIT N)

For these, the QuickSelect algorithm is what is wanted.  For example:
#include <stdlib.h>
typedef double  Etype;

extern Etype    RandomSelect(Etype * A, size_t p, size_t r, size_t i);
extern size_t   RandRange(size_t a, size_t b);
extern size_t   RandomPartition(Etype * A, size_t p, size_t r);
extern size_t   Partition(Etype * A, size_t p, size_t r);

/*
**
** In the following code, every reference to CLR means:
**
**"Introduction to Algorithms"
**By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
**ISBN 0-07-013143-0
*/


/*
** CLR, page 187
*/
Etype   RandomSelect(Etype A[], size_t p, size_t r, size_t i)
{
size_t  q,
k;
if (p == r)
return A[p];
q = RandomPartition(A, p, r);
k = q - p + 1;

if (i <= k)
return RandomSelect(A, p, q, i);
else
return RandomSelect(A, q + 1, r, i - k);
}

size_t  RandRange(size_t a, size_t b)
{
size_t  c = (size_t) ((double) rand() / ((double) RAND_MAX +
1) * (b - a));
return c + a;
}

/*
** CLR, page 162
*/
size_t  RandomPartition(Etype A[], size_t p, size_t r)
{
size_t  i = RandRange(p, r);
Etype   Temp;
Temp = A[p];
A[p] = A[i];
A[i] = Temp;
return Partition(A, p, r);
}

/*
** CLR, page 154
*/
size_t  Partition(Etype A[], size_t p, size_t r)
{
Etype   x,
temp;
size_t  i,
j;

x = A[p];
i = p - 1;
j = r + 1;

for (;;) {
do {
j--;
} while (!(A[j] <= x));
do {
i++;
} while (!(A[i] >= x));
if (i < j) {
temp = A[i];
A[i] = A[j];
A[j] = temp;
} else
return j;
}
}
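
For what it's worth, a tiny hypothetical driver (mine, not part of the
original post) that assumes the routines above are in the same file; it
asks for the 3rd smallest element, counting from 1:

#include <stdio.h>

int main(void)
{
    /* unsorted input; sorted it would be 1.5 2.0 3.0 4.0 7.25 8.0 9.5 */
    Etype   A[] = { 9.5, 2.0, 7.25, 3.0, 8.0, 1.5, 4.0 };
    size_t  n = sizeof A / sizeof A[0];

    /* selects the 3rd smallest (3.0) without fully sorting the array */
    printf("3rd smallest = %g\n", RandomSelect(A, 0, n - 1, 3));
    return 0;
}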

> 	- WHERE x > value ORDER BY x LIMIT N
> 	- WHERE x < value ORDER BY x DESC LIMIT N
> 	- and other variants
> 
> 	Or, you are doing a Merge JOIN against some other table ; in that
> case, yes, you might need the whole sorted terabyte table, but most
> likely there are WHERE clauses in the query that restrict the set, and
> thus, maybe we can get some conditions or limit values on the column
> to sort.

Where clause filters are to be applied AFTER the join operations,
according to the SQL standard.

> 	Also the new, optimized hash join, which is more memory efficient,
> might cover this case.

For == joins.  Not every order by is applied to joins.  And not every
join is an equal join.

> 	Point is, sometimes, you only need part of the results of your sort.
> And the bigger the sort, the most likely it becomes that you only want
> part of the results.

That is an assumption that will sometimes be true, and sometimes not.
It is not possible to predict usage patterns for a general purpose
database system.

> 	So, while we're in the fun hand-waving, new algorithm trying
> mode, why not consider this right from the start ? (I know I'm totally
> in hand-waving mode right now, so slap me if needed).
> 
> 	I'd say your new, fancy sort algorithm needs a few more input values :
> 
> 	- Range of values that must appear in the final result of the sort :
> 	none, minimum, maximum, both, or even a set of values from the
> other side of the join, hashed, or sorted.

That will already

[PERFORM] index on custom function; explain

2005-10-04 Thread [EMAIL PROTECTED]
Hi,

I'm trying to include a custom function in my SQL-queries, which
unfortunately leaves the server hanging...

I basically search through two tables:
* TABLE_MAPPING: lists that 'abc' is mapped to 'def'
  id1 | name1 | id2 | name2
  ---------------------------
  1   | abc   | 2   | def
  3   | uvw   | 4   | xyz
This data means that 'abc' is mapped_to 'def', and 'uvw' is mapped_to
'xyz'. About 1,500,000 records in total.

* TABLE ALIASES: lists different aliases of the same thing
  id1 | name1 | id2 | name2
  ---------------------------
  3   | uvw   | 2   | def
This data means that 'uvw' and 'def' are essentially the same thing.
About 820,000 records in total.

I have indexes on all columns of the above tables.

Based on the two tables above, 'abc' is indirectly mapped_to 'xyz' as
well (through 'def' also-known-as 'uvw').

I wrote this little function: aliases_of
CREATE FUNCTION aliases_of(INTEGER) RETURNS SETOF integer
AS 'SELECT $1
UNION
SELECT id1 FROM aliases WHERE id2 = $1
UNION
SELECT id2 FROM aliases WHERE id1 = $1
'
LANGUAGE SQL
STABLE;

A simple SELECT aliases_of(2) shows:
  aliases_of
  ----------
  2
  3

Now, when I want to traverse the aliases, I would write a query as
follows:
SELECT m1.name1, m1.name2, m2.name1, m2.name2
FROM mappings m1, mappings m2
WHERE m1.name1 = 'abc'
AND m2.name1 IN (SELECT aliases_of(m1.name2));

Unfortunately, this query seems to keep running and to never stop...


An EXPLAIN of the above query shows:
QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..118379.45 rows=1384837 width=80)
   Join Filter: (subplan)
   ->  Index Scan using ind_cmappings_object1_id on c_mappings m1
(cost=0.00..7.08 rows=2 width=40)
 Index Cond: (name1 = 'abc')
   ->  Seq Scan on c_mappings m2  (cost=0.00..35935.05 rows=1423805
width=40)
   SubPlan
 ->  Result  (cost=0.00..0.01 rows=1 width=0)
(7 rows)

Strangely enough, I _do_ get output when I type the following query:
SELECT m1.name1, m1.name2, m2.name1, m2.name2
FROM mappings m1, mappings m2
WHERE m1.name1 = 'abc'
AND m2.name1 IN (
  SELECT m1.name2
  UNION
  SELECT name2 FROM aliases WHERE name1 = m1.name2
  UNION
  SELECT name1 FROM aliases WHERE name2 = m2.name1
);

The EXPLAIN for this query is:
   QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=0.00..36712030.90 rows=1384837 width=80)
   Join Filter: (subplan)
   ->  Index Scan using ind_cmappings_object1_id on c_mappings m1
(cost=0.00..7.08 rows=2 width=40)
 Index Cond: (object1_id = 16575564)
   ->  Seq Scan on c_mappings m2  (cost=0.00..35935.05 rows=1423805
width=40)
   SubPlan
 ->  Unique  (cost=13.21..13.23 rows=1 width=4)
   ->  Sort  (cost=13.21..13.22 rows=3 width=4)
 Sort Key: object2_id
 ->  Append  (cost=0.00..13.18 rows=3 width=4)
   ->  Subquery Scan "*SELECT* 1"  (cost=0.00..0.01
rows=1 width=0)
 ->  Result  (cost=0.00..0.01 rows=1
width=0)
   ->  Subquery Scan "*SELECT* 2"  (cost=0.00..5.92
rows=1 width=4)
 ->  Index Scan using
ind_caliases_object2_id on c_aliases  (cost=0.00..5.92 rows=1 width=4)
   Index Cond: (object2_id = $0)
   ->  Subquery Scan "*SELECT* 3"  (cost=0.00..7.25
rows=1 width=4)
 ->  Index Scan using
ind_caliases_object1_id on c_aliases  (cost=0.00..7.25 rows=1 width=4)
   Index Cond: (object1_id = $0)
(18 rows)

So my questions are:
* Does anyone have any idea how I can integrate a function that lists
all aliases for a given name into such a mapping query?
* Does the STABLE keyword in the function definition make the function
read all its data into memory?
* Is there a way to let postgres use an "Index scan" on that function
instead of a "seq scan"?

Any help very much appreciated,
Jan Aerts


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Is There Any Way ....

2005-10-04 Thread Kevin Grittner
First off, Mr. Trainor's response proves nothing about anyone or
anything except Mr. Trainor.
 
I'm going to offer an opinion on the caching topic.  I don't have
any benchmarks; I'm offering a general sense of the issue based on
decades of experience, so I'll give a short summary of that.
 
I've been earning my living by working with computers since 1972,
and am the architect and primary author of a little-known
database product (developed in 1984) which saw tens of thousands
of installations in various vertical markets.  (Last I checked, a
couple years ago, it was still being used statewide by one state
government after a multi-million dollar attempt to replace it with a
popular commercial database product failed.)  I've installed and
tuned many other database products over the years.  I'm just getting
to know PostgreSQL, and am pretty excited about it.
 
Now on to the meat of it.
 
My experience is that a DBMS can improve performance by caching
certain types of data.  In the product I developed, we had a fairly
small cache which used a weighting algorithm for what to keep
(rather than simply relying on LRU).  Index pages got higher weight
than data pages; the higher in the index, the higher the weight.
Recent access got higher weight than older access, although it took
quite a while for the older access to age out entirely. This
improved performance quite a bit over a generalized caching
product alone.
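 
Purely as an illustration of that kind of weighting (a hypothetical C
sketch of my own, not the product's code, with invented constants):

#include <math.h>
#include <stdio.h>

typedef enum { DATA_PAGE, INDEX_PAGE } page_kind;

typedef struct {
    page_kind kind;          /* index pages outrank data pages          */
    int       index_level;   /* 0 = leaf; higher = closer to the root   */
    double    last_access;   /* clock value when the page was last used */
} cached_page;

/* Higher score = more worth keeping; evict the lowest-scoring page.
   Recency decays slowly instead of being a strict LRU ordering. */
static double keep_score(const cached_page *p, double now)
{
    double base = (p->kind == INDEX_PAGE) ? 4.0 + p->index_level : 1.0;
    return base * exp(-(now - p->last_access) / 1000.0);
}

int main(void)
{
    cached_page root = { INDEX_PAGE, 3, 0.0 };
    cached_page leaf = { INDEX_PAGE, 0, 0.0 };
    cached_page data = { DATA_PAGE,  0, 900.0 };   /* recently touched */
    double now = 1000.0;

    printf("root %.3f  leaf %.3f  data %.3f\n",
           keep_score(&root, now), keep_score(&leaf, now),
           keep_score(&data, now));
    return 0;
}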
 
However, there was a point of diminishing return.  My sense is that
every 10% you add to a "smart" cache yields less benefit at a
higher cost, so beyond a certain point, taking RAM from the general
cache to expand the smart cache degrades performance.  Clever
programming techniques can shift the break-even point, but I very
much doubt it can be eliminated entirely, unless the ratio of
performance between CPU+RAM and persistent storage is much
more extreme than I've ever seen.
 
There is another issue, which has been raised from time to time in
these lists, but not enunciated clearly enough in my view.  These
discussions about caching generally address maximum throughput,
while there are times when it is important that certain tables can
be queried very quickly, even if it hurts overall throughput.  As an
example, there can be tables which are accessed as a user types in
a window and tabs around from one GUI control to another.  The
user perception of the application performance is going to depend
PRIMARILY on how quickly the GUI renders the results of these
queries; if the run time for a large report goes up by 10%, they
will probably not notice.  This is a situation where removing RAM
from a generalized cache, or one based on database internals, to
create an "application specific" cache can yield big rewards.
 
One client has addressed this in a commercial product by defining
a named cache large enough to hold these tables, and binding those
tables to the cache.  One side benefit is that such caches can be
defined as "relaxed LRU" -- meaning that they eliminate the
overhead of tracking accesses, since they can assume that data will
rarely, if ever, be discarded from the cache.
 
It seems to me that in the PostgreSQL world, this would currently
be addressed by binding the tables to a tablespace where the file
system, controller, or drive(s) would cache the data, although this
is somewhat less flexible than the "named cache" approach -- unless
there is a file system that can layer a cache on top of a reference to
some other file system's space.  (And let's not forget the many OS
environments in which people use PostgreSQL.)  So I do see that
there would be benefit to adding a feature to PostgreSQL to define
caches and bind tables or indexes to them.
 
So I do think that it is SMART of PostgreSQL to rely on the
increasingly sophisticated file systems to provide the MAIN cache.
I suspect that a couple types of smaller "smart" caches in front of
this could boost performance, and it might be a significant boost.
I'm not sure what the current shared memory is used for; perhaps
this is already caching specific types of structures for the DBMS.
I'm pretty sure that programmers of GUI apps would appreciate the
named cache feature, so they could tune the database for snappy
GUI response, even under heavy load.
 
I realize this is short on specifics -- I'm shooting for perspective.
For the record, I don't consider myself particularly religious on the
topic, but I do pull back a little at arguments which sound strictly
academic -- I've found that most of what I've drawn from those
circles has needed adjustment in solving real-world problems.
(Particularly when algorithms optimize for best worst-case
performance.  I've found users are much happier with best typical
case performance as long as the product of worst case performance
and worst case frequency is low.)
 
Like many others who have posted on the topic, I am quite
prepared to alter my views in the face of relevant evidence.
 
Feel free to laugh at the old fart.

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Dann Corbit
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
> [EMAIL PROTECTED] On Behalf Of Tom Lane
> Sent: Friday, September 30, 2005 11:02 PM
> To: Jeffrey W. Baker
> Cc: Luke Lonergan; Josh Berkus; Ron Peacetree; pgsql-
> [EMAIL PROTECTED]; pgsql-performance@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
> 
> "Jeffrey W. Baker" <[EMAIL PROTECTED]> writes:
> > I think the largest speedup will be to dump the multiphase merge and
> > merge all tapes in one pass, no matter how large M.  Currently M is
> > capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over
> > the tape.  It could be done in a single pass heap merge with N*log(M)
> > comparisons, and, more importantly, far less input and output.
> 
> I had more or less despaired of this thread yielding any usable ideas
> :-( but I think you have one here.  

I believe I made the exact same suggestion several days ago.

> The reason the current code uses a
> six-way merge is that Knuth's figure 70 (p. 273 of volume 3 first
> edition) shows that there's not much incremental gain from using more
> tapes ... if you are in the regime where number of runs is much greater
> than number of tape drives.  But if you can stay in the regime where
> only one merge pass is needed, that is obviously a win.
> 
> I don't believe we can simply legislate that there be only one merge
> pass.  That would mean that, if we end up with N runs after the initial
> run-forming phase, we need to fit N tuples in memory --- no matter how
> large N is, or how small work_mem is.  But it seems like a good idea to
> try to use an N-way merge where N is as large as work_mem will allow.
> We'd not have to decide on the value of N until after we've completed
> the run-forming phase, at which time we've already seen every tuple
> once, and so we can compute a safe value for N as work_mem divided by
> largest_tuple_size.  (Tape I/O buffers would have to be counted too
> of course.)

You only need to hold the sort column(s) in memory, except for the queue
you are exhausting at the time.  [And of those columns, only the values
for the smallest one in a sub-list.]  Of course, the more data from each
list that you can hold at once, the fewer the disk reads and seeks.

Another idea (not sure if it is pertinent):
Instead of having a fixed size for the sort buffers, size it to the
query.  Given a total pool of size M, give a percentage according to the
difficulty of the work to perform.  So a query with 3 small columns and
a cardinality of 1000 gets a small percentage and a query with 10 GB of
data gets a big percentage of available sort mem.
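
As a rough sketch of that idea (the names and the sizing policy here are
mine, not anything in the backend), the budgeting could look like:

#include <stdio.h>

/* Give each pending sort a share of a common pool proportional to its
   estimated input size, with a small floor so trivial sorts still run. */
static size_t sort_budget(double est_bytes, double total_est, size_t pool)
{
    const size_t floor_bytes = 64 * 1024;
    size_t share = (size_t) (est_bytes / total_est * (double) pool);

    return share > floor_bytes ? share : floor_bytes;
}

int main(void)
{
    size_t pool  = 256u * 1024 * 1024;            /* shared sort memory  */
    double small = 3.0 * 1000 * 16;               /* 3 narrow columns    */
    double big   = 10.0 * 1024 * 1024 * 1024;     /* ~10 GB of sort data */

    printf("small: %zu bytes, big: %zu bytes\n",
           sort_budget(small, small + big, pool),
           sort_budget(big, small + big, pool));
    return 0;
}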
 
> It's been a good while since I looked at the sort code, and so I don't
> recall if there are any fundamental reasons for having a compile-time-
> constant value of the merge order rather than choosing it at runtime.
> My guess is that any inefficiencies added by making it variable would
> be well repaid by the potential savings in I/O.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Dann Corbit
I have perused the tuple sort stuff.

The good:
The documentation of the sort algorithm from Knuth's TAOCP was
beautifully done.  Everyone who writes an algorithm should credit the
original source like this, and also where it deviates.  That was done
very nicely.

The bad:
With random access, tape style merging is not necessary.  A priority
queue based merge will be a lot faster.
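
To make that concrete, here is a toy sketch (mine, not PostgreSQL code) of
a priority-queue merge: k sorted runs are merged through a binary min-heap
of run heads, so each output element costs O(log k) comparisons instead of
a scan over all k heads.

#include <stdio.h>

typedef struct {
    const int *run;      /* current head of this sorted run */
    size_t     left;     /* elements remaining in this run  */
} run_head;

static void sift_down(run_head *h, size_t n, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && *h[l].run < *h[m].run) m = l;
        if (r < n && *h[r].run < *h[m].run) m = r;
        if (m == i) return;
        run_head t = h[i]; h[i] = h[m]; h[m] = t;
        i = m;
    }
}

/* Merge k sorted runs into out[]; returns the number of elements written. */
static size_t kway_merge(run_head *h, size_t k, int *out)
{
    size_t n = 0, i;

    for (i = k; i-- > 0; )          /* heapify the run heads */
        sift_down(h, k, i);
    while (k > 0) {
        out[n++] = *h[0].run;       /* pop the smallest current head */
        if (--h[0].left > 0)
            h[0].run++;             /* advance within the same run */
        else
            h[0] = h[--k];          /* run exhausted: shrink the heap */
        sift_down(h, k, 0);
    }
    return n;
}

int main(void)
{
    int a[] = {1, 4, 9}, b[] = {2, 3, 8}, c[] = {5, 6, 7};
    run_head runs[3] = { {a, 3}, {b, 3}, {c, 3} };
    int out[9];
    size_t i, n = kway_merge(runs, 3, out);

    for (i = 0; i < n; i++)
        printf("%d ", out[i]);
    printf("\n");
    return 0;
}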

The UGLY:
Please, someone, tell me I was hallucinating.  Is that code really
READING AND WRITING THE WHOLE TUPLE with every sort exchange?!  Maybe
there is a layer of abstraction that I am somehow missing.  I just can't
imagine that is what it is really doing.  If (somehow) it really is
doing that, a pointer based sort which forms a permutation based upon
the keys, would be a lot better.
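
A minimal illustration of the pointer/permutation idea (again my own
sketch, with an invented tuple layout): sort an array of pointers by key
and leave the wide tuples where they are.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int  key;
    char payload[128];   /* stands in for the rest of the tuple */
} tuple;

static int cmp_tuple_ptr(const void *a, const void *b)
{
    const tuple *ta = *(const tuple *const *) a;
    const tuple *tb = *(const tuple *const *) b;
    return (ta->key > tb->key) - (ta->key < tb->key);
}

int main(void)
{
    tuple  rows[4] = { {42, "d"}, {7, "a"}, {19, "b"}, {23, "c"} };
    tuple *perm[4];
    size_t i;

    for (i = 0; i < 4; i++)
        perm[i] = &rows[i];

    /* only the pointers are exchanged; the wide tuples stay put */
    qsort(perm, 4, sizeof perm[0], cmp_tuple_ptr);

    for (i = 0; i < 4; i++)
        printf("%d %s\n", perm[i]->key, perm[i]->payload);
    return 0;
}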

The fundamental algorithm itself could also be improved somewhat.

Here is a {public domain} outline for an introspective quick sort that
Pete Filander and I wrote some time ago and contributed to FastDB.  It
is written as a C++ template, but it will take no effort to make it a
simple C routine.  It assumes that e_type has comparison operators, so
in C you would use a compare function instead.

/*
** Sorting stuff by Dann Corbit and Pete Filandr.
** ([EMAIL PROTECTED] and [EMAIL PROTECTED])
** Use it however you like.
*/

//
//  The insertion sort template is used for small partitions.
//

template < class e_type >
void insertion_sort(e_type * array, size_t nmemb)
{
e_type temp,
  *last,
  *first,
  *middle;
if (nmemb > 1) {
first = middle = 1 + array;
last = nmemb - 1 + array;
while (first != last) {
++first;
if ((*(middle) > *(first))) {
middle = first;
}
}
if ((*(array) > *(middle))) {
((void) ((temp) = *(array), *(array) = *(middle), *(middle)
= (temp)));
}
++array;
while (array != last) {
first = array++;
if ((*(first) > *(array))) {
middle = array;
temp = *middle;
do {
*middle-- = *first--;
} while ((*(first) > *(&temp)));
*middle = temp;
}
}
}
}

//
// The median estimate is used to choose pivots for the quicksort
algorithm
//

template < class e_type >
void median_estimate(e_type * array, size_t n)
{
e_type  temp;
long unsigned   lu_seed = 123456789LU;
    const size_t    k = ((lu_seed) = 69069 * (lu_seed) + 362437) % --n;
((void) ((temp) = *(array), *(array) = *(array + k), *(array + k) =
(temp)));
    if ((*((array + 1)) > *((array)))) {
        (temp) = *(array + 1);
        if ((*((array + n)) > *((array)))) {
            *(array + 1) = *(array);
            if ((*(&(temp)) > *((array + n)))) {
                *(array) = *(array + n);
                *(array + n) = (temp);
            } else {
                *(array) = (temp);
            }
        } else {
            *(array + 1) = *(array + n);
            *(array + n) = (temp);
        }
    } else {
        if ((*((array)) > *((array + n)))) {
            if ((*((array + 1)) > *((array + n)))) {
                (temp) = *(array + 1);
                *(array + 1) = *(array + n);
                *(array + n) = *(array);
                *(array) = (temp);
            } else {
                ((void) (((temp)) = *((array)), *((array)) = *((array + n)),
                         *((array + n)) = ((temp))));
            }
        }
    }
}


//
// This is the heart of the quick sort algorithm used here.
// If the sort is going quadratic, we switch to heap sort.
// If the partition is small, we switch to insertion sort.
//

template < class e_type >
void qloop(e_type * array, size_t nmemb, size_t d)
{
e_type temp,
  *first,
  *last;
while (nmemb > 50) {
if (sorted(array, nmemb)) {
return;
}
if (!d--) {
heapsort(array, nmemb);
return;
}
median_estimate(array, nmemb);
first = 1 + array;
last = nmemb - 1 + array;
do {
++first;
} while ((*(array) > *(first)));
do {
--last;
} while ((*(last) > *(array)));
while (last > first) {
((void) ((temp) = *(last), *(last) = *(first), *(first) =
(temp)));
do {
++first;
} while ((*(array) > *(first)));
do {
--last;
} while ((*(last) > *(array)));
}
((void) ((temp) = *(array), *(array) = *(last), *(last) =
(temp)));
qloop(last + 1, nmemb - 1 + array - last, d);
nmemb = last - array;
}
insertion_sort(array, nmemb);
}

//
// This heap sort is better than average because it uses Lamont's heap.
//

template < class e_type >
void heapsort(e_type * array, size_t nmemb)
{
size_t i,
   child,
   parent;
e_type temp;
if (nmemb > 1)

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Dann Corbit
Judy definitely rates a WOW!!

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
> [EMAIL PROTECTED] On Behalf Of Gregory Maxwell
> Sent: Friday, September 30, 2005 7:07 PM
> To: Ron Peacetree
> Cc: Jeffrey W. Baker; pgsql-hackers@postgresql.org; pgsql-
> [EMAIL PROTECTED]
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
> 
> On 9/28/05, Ron Peacetree <[EMAIL PROTECTED]> wrote:
> > 2= We use my method to sort two different tables.  We now have these
> > very efficient representations of a specific ordering on these tables.  A
> > join operation can now be done using these Btrees rather than the
> > original data tables that involves less overhead than many current
> > methods.
> 
> If we want to make joins very fast we should implement them using RD
> trees. For the example cases where a join against a very large table
> will produce a much smaller output, a RD tree will provide pretty much
> the optimal behavior at a very low memory cost.
> 
> On the subject of high speed tree code for in-core applications, you
> should check out http://judy.sourceforge.net/ . The performance
> (insert, remove, lookup, AND storage) is really quite impressive.
> Producing cache friendly code is harder than one might expect, and it
> appears the judy library has already done a lot of the hard work.
> Though it is *L*GPLed, so perhaps that might scare some here away from
> it. :) and good luck directly doing joins with a LC-TRIE. ;)
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Dann Corbit
I see the following routines that seem to be related to sorting.

If I were to examine these routines to consider ways to improve it, what
routines should I key in on?  I am guessing that tuplesort.c is the hub
of activity for database sorting.

 Directory of U:\postgresql-snapshot\src\backend\access\nbtree

08/11/2005  06:22 AM            24,968 nbtsort.c
               1 File(s)         24,968 bytes

 Directory of U:\postgresql-snapshot\src\backend\executor

03/16/2005  01:38 PM             7,418 nodeSort.c
               1 File(s)          7,418 bytes

 Directory of U:\postgresql-snapshot\src\backend\utils\sort

09/23/2005  08:36 AM            67,585 tuplesort.c
               1 File(s)         67,585 bytes

 Directory of U:\postgresql-snapshot\src\bin\pg_dump

06/29/2005  08:03 PM            31,620 pg_dump_sort.c
               1 File(s)         31,620 bytes

 Directory of U:\postgresql-snapshot\src\port

07/27/2005  09:03 PM             5,077 qsort.c
               1 File(s)          5,077 bytes

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Hannu Krosing
On R, 2005-09-30 at 13:38 -0700, Luke Lonergan wrote:

> 
> Bulk loading speed is irrelevant here - that is dominated by parsing, which
> we have covered copiously (har har) previously and have sped up by 500%,
> which still makes Postgres < 1/2 the loading speed of MySQL.

Is this < 1/2 of MySQL with WAL on different spindle and/or WAL
disabled ?

-- 
Hannu Krosing <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Martijn van Oosterhout
On Sat, Oct 01, 2005 at 10:22:40AM -0400, Ron Peacetree wrote:
> Assuming we get the abyssmal physical IO performance fixed...
> (because until we do, _nothing_ is going to help us as much)

I'm still not convinced this is the major problem. For example, in my
totally unscientific tests on an oldish machine I have here:

Direct filesystem copy to /dev/null
21MB/s    10% user 50% system  (dual cpu, so the system is using a whole CPU)

COPY TO /dev/null WITH binary
13MB/s    55% user 45% system  (ergo, CPU bound)

COPY TO /dev/null
4.4MB/s   60% user 40% system

\copy to /dev/null in psql
6.5MB/s   60% user 40% system

This machine has a somewhat strange setup; I'm not sure why the fs copy is
so slow. As to why \copy is faster than COPY, I have no idea, but it is
repeatable. And actually turning the tuples into a printable format is
the most expensive part. But it does point out that the whole process is
probably CPU bound more than anything else.

So, I don't think physical I/O is the problem. It's something further
up the call tree. I wouldn't be surprised at all if it had to do with
the creation and destruction of tuples. The cost of comparing tuples
should not be underestimated.
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


pgpDEXF6GSZ4G.pgp
Description: PGP signature


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Gregory Maxwell
On 9/28/05, Ron Peacetree <[EMAIL PROTECTED]> wrote:
> 2= We use my method to sort two different tables.  We now have these
> very efficient representations of a specific ordering on these tables.  A
> join operation can now be done using these Btrees rather than the
> original data tables that involves less overhead than many current
> methods.

If we want to make joins very fast we should implement them using RD
trees. For the example cases where a join against a very large table
will produce a much smaller output, a RD tree will provide pretty much
the optimal behavior at a very low memory cost.

On the subject of high speed tree code for in-core applications, you
should check out http://judy.sourceforge.net/ . The performance
(insert, remove, lookup, AND storage) is really quite impressive.
Producing cache friendly code is harder than one might expect, and it
appears the judy library has already done a lot of the hard work. 
Though it is *L*GPLed, so perhaps that might scare some here away from
it. :) and good luck directly doing joins with a LC-TRIE. ;)

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Which one FreeBSD or Linux

2005-10-04 Thread ALİ ÇELİK
FreeBSD or Linux: which system has better performance for PostgreSQL?



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-04 Thread Andrew Dunstan



Ron Peacetree wrote:

> The good news is all this means it's easy to demonstrate that we can
> improve the performance of our sorting functionality.
>
> Assuming we get the abyssmal physical IO performance fixed...
> (because until we do, _nothing_ is going to help us as much)

I for one would be paying more attention if such a demonstration were
forthcoming, in the form of a viable patch and some benchmark results.


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend