Re: [PERFORM] Perfomance of views

2005-10-27 Thread Richard Huxton

Svenne Krap wrote:

Hi there.

I am currently building a system where it would be nice to use multiple 
levels of views upon each other (it is a statistics system, where 
traceability is important).


Is there any significant performance reduction in, say, 10 levels of views 
instead of one giant, nested SQL statement? I am especially thinking 
execution-planner-wise.


The planner tries to push conditions "inside" views where it can. It's 
not perfect though, and if you're writing a big query by hand you might 
be able to do better than it.


In short, I'd test if you can.
--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Perfomance of views

2005-10-27 Thread Svenne Krap

What do you mean exactly by "pushing conditions inside"?

I don't think I will have the option of testing on the full queries, as 
these take many days to write (the current ones, which these will replace, 
run on MSSQL and take up more than 5 kB of query text). The current ones are 
nightmares from a maintenance standpoint.


Basically, what the application is doing is selecting some base data from 
the "large" table for a point in time (usually a quarter) and selecting 
all matching auxiliary data from the other tables. They are made in a 
time-travel-like manner with a first and last usable date.
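
As a rough illustration of the time-travel pattern being described (the table
and column names below are made up, not the actual schema), every layer ends
up repeating a filter of this shape:

  -- hypothetical base table: rates(region_id, value, valid_from, valid_to)
  SELECT region_id, value
  FROM rates
  WHERE valid_from <= DATE '2005-07-01'
    AND (valid_to > DATE '2005-07-01' OR valid_to IS NULL);

This is the condition that has to be either baked into each view or passed as
a parameter to each function in the options below.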


The ways I have considered were:
1) write a big query by hand (not preferred, as it gets hard to manage)
2) write layers of views (still not preferred, as I still have to remember 
to put the right conditions on everywhere)
3) write layers of SQL functions (returning the right sets of rows from 
the underlying tables) - which I prefer from a development angle .. it 
gets very clean and I can't forget a parameter anywhere.


But I seem to remember (and I have used PGSQL in production since 7.0) 
that the planner has some problems with solution 3 (i.e. estimating the 
cost and rearranging the query), but frankly that would be the way I 
would like to go.


Based on the current (non-optimal) design and hardware constraints, I 
still have to make sure the query runs fairly optimally - that means the 
planner must use indexes intelligently and so on, as if it were 
(well-)written using solution 1.


What do you think of the three solutions? And is there some resource 
about the planner's capabilities for someone like me (who is very used to 
writing reasonably fast and complex SQL, can read C code, but does not 
really want to dig into the source code)?


Regards

Svenne

Richard Huxton wrote:


Svenne Krap wrote:


Hi there.

I am currently building a system where it would be nice to use 
multiple levels of views upon each other (it is a statistics system, 
where traceability is important).


Is there any significant performance reduction in, say, 10 levels of 
views instead of one giant, nested SQL statement? I am especially thinking 
execution-planner-wise.



The planner tries to push conditions "inside" views where it can. It's 
not perfect though, and if you're writing a big query by hand you 
might be able to do better than it.


In short, I'd test if you can.






Re: [PERFORM] Perfomance of views

2005-10-27 Thread Richard Huxton


Don't forget to CC the list


Svenne Krap wrote:

What do you mean exactly by "pushing conditions inside"?


If I have something like "SELECT * FROM complicated_view WHERE foo = 7" 
then the planner can look "inside" complicated_view and see where it can 
attach the condition "foo=7", rather than running the query and applying 
the condition at the end.
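
A minimal sketch of what that means (the view and table names are invented
for illustration):

  CREATE VIEW complicated_view AS
      SELECT o.id, o.foo, c.name
      FROM orders o JOIN customers c ON c.id = o.customer_id;

  -- "SELECT * FROM complicated_view WHERE foo = 7" can then be planned
  -- as if it had been written directly against the base tables:
  SELECT o.id, o.foo, c.name
  FROM orders o JOIN customers c ON c.id = o.customer_id
  WHERE o.foo = 7;

Jan's reply further down gives a more precise account of how the rewriter
and the planner split this work.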


There are cases where it would be safe for the planner to do this, but it 
isn't smart enough to do so.


I don't think I will have the option of testing on the full queries, as 
these take many days to write (the current ones, which these will replace, 
run on MSSQL and take up more than 5 kB of query text). The current ones are 
nightmares from a maintenance standpoint.


Hmm - it sounds like they would be.

Basically, what the application is doing is selecting some base data from 
the "large" table for a point in time (usually a quarter) and selecting 
all matching auxiliary data from the other tables. They are made in a 
time-travel-like manner with a first and last usable date.


The ways I have considered were:
1) write a big query by hand (not preferred, as it gets hard to manage)


Agreed.

2) write layers of views (still not preferred, as I still have to remember 
to put the right conditions on everywhere)


This is what I'd probably do, but of course I don't have full 
information about your situation.


3) write layers of SQL functions (returning the right sets of rows from 
the underlying tables) - which I prefer from a development angle .. it 
gets very clean and I can't forget a parameter anywhere.


But I seem to remember (and I have used PGSQL in production since 7.0) 
that the planner has some problems with solution 3 (i.e. estimating the 
cost and rearranging the query), but frankly that would be the way I 
would like to go.


Well, 8.x can "inline" a simple sql function into a larger query, but it 
doesn't sound like that will be enough in your case. Once a function 
becomes a "black box" then there's not much the planner can do to figure 
out what to do.
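
A small sketch of the difference (function and table names here are
hypothetical):

  -- A trivial single-SELECT SQL function like this one may be expanded
  -- inline into the calling query, so the planner still sees the bare
  -- expression and can use indexes on the underlying columns:
  CREATE FUNCTION still_valid(timestamp, timestamp, timestamp) RETURNS boolean
  AS 'SELECT $1 <= $3 AND ($2 > $3 OR $2 IS NULL)'
  LANGUAGE sql STABLE;

  -- hypothetical table history_table(valid_from, valid_to, ...)
  SELECT * FROM history_table
  WHERE still_valid(valid_from, valid_to, now()::timestamp);

  -- The same logic wrapped in plpgsql stays an opaque call the planner
  -- cannot look into.

Whether a given function actually gets inlined depends on it being a simple,
non-volatile, single-statement SQL function, so treat this as a best case
rather than a guarantee.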


Based on the current (non-optimal) design and hardware constraints, I 
still have to make sure the query runs fairly optimally - that means the 
planner must use indexes intelligently and so on, as if it were 
(well-)written using solution 1.


Well, #1 and #2 are likely to be the most efficient, but you won't know for 
sure about #2 until you test it.


There are a couple of other options though:

#4 - Write a set-returning function that breaks the query into steps and 
executes each in turn. So - fetch IDs from the main table in step 1 and 
store them in a temporary table, join other tables in later steps.


#5 - Write a function that writes your big query for you and either 
returns the SQL to your application, or runs it and returns the results.
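
A rough sketch of option #4 (every name here is hypothetical, and on older
releases the temp-table statements may need EXECUTE to avoid cached plans
pointing at a dropped table):

  CREATE FUNCTION report_for_period(p_start date, p_end date)
  RETURNS SETOF record AS '
  DECLARE
      r record;
  BEGIN
      -- step 1: collect the interesting ids from the large table
      CREATE TEMP TABLE tmp_ids (id integer);
      INSERT INTO tmp_ids
          SELECT id FROM big_table
          WHERE valid_from <= p_start
            AND (valid_to > p_end OR valid_to IS NULL);
      -- step 2: join the auxiliary data against that small id set
      FOR r IN SELECT t.id, a.payload
               FROM tmp_ids t JOIN aux_table a ON a.id = t.id
      LOOP
          RETURN NEXT r;
      END LOOP;
      DROP TABLE tmp_ids;
      RETURN;
  END;
  ' LANGUAGE plpgsql;

  -- SETOF record means the caller supplies the column list:
  SELECT * FROM report_for_period('2005-07-01', '2005-09-30')
      AS t(id integer, payload text);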


What do you think of the three solutions? And is there some resource 
about the planner's capabilities for someone like me (who is very used to 
writing reasonably fast and complex SQL, can read C code, but does not 
really want to dig into the source code)?


There is some stuff in the "Internals" section of the manuals and it 
might be worth rummaging around on http://techdocs.postgresql.org


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] browsing table with 2 million records

2005-10-27 Thread Merlin Moncure
Christopher wrote:
> > - Present a nifty date selector to choose the records from any day,
> >   hour, minute, second
> > - show them, with "next day" and "previous day" buttons
> >
> > - It's more useful to the user (most likely he wants to know what
> >   happened on 01/05/2005 rather than view page 2857)
> > - It's faster (no more limit/offset! just "date BETWEEN a AND b",
> >   indexed of course)
> > - no more new items pushing old ones to the next page while you browse
> > - you can pretend to your boss it's just like a paginated list
> 
> All very well and good, but now do it generically...

I've done it...  
First of all I totally agree with PFC's rant regarding absolute
positioning while browsing datasets.  Among other things, it has serious
problems if you have multiple sessions updating your table.  Also it's kind of
silly to be doing this in a set-based data paradigm.

The 'SQL' way to browse a dataset is by key.  If your key has multiple
parts or you are trying to sort on two or more fields, you are supposed
to use the row constructor:

select * from t where (x, y) > (xc, yc) order by x,y;

Unfortunately, this gives the wrong answer in postgresql :( (its row 
comparison is not evaluated with the SQL-spec lexicographic semantics).

The alternative is to use boolean logic.  Here is a log snippet from my 
ISAM driver (in ISAM, you are *always* browsing datasets):

prepare system_read_next_menu_item_favorite_file_0 (character varying,
int4, int4, int4)
as select * from system.menu_item_favorite_file
where mif_user_id >= $1 and
  (mif_user_id >  $1 or  mif_menu_item_id >= $2) and
  (mif_user_id >  $1 or  mif_menu_item_id >  $2 or mif_sequence_no >  $3)
order by mif_user_id, mif_menu_item_id, mif_sequence_no
limit $4

This is a Boolean based 'get next record' in a 3 part key plus a
parameterized limit.  You can do this without using prepared statements
of course but with the prepared version you can at least do 

execute system_read_next_menu_item_favorite_file_0('abc', 1, 2, 1);

Merlin




Re: [PERFORM] insertion of bytea

2005-10-27 Thread Chris Mair
> On Tue, Oct 25, 2005 at 03:44:36PM +0200, Chris Mair wrote:
>>Is there a better, faster way to do these inserts?
>
> COPY is generally the fastest way to do bulk inserts (see
> PQputCopyData).


Hi,

I've rewritten the testclient now to use COPY, but I'm getting
the exact same results as when doing bundled, prepared inserts.

I'm CPU-bound with an I/O well below what my disks could do :(


Bye, Chris.


PS1: someone off-list suggested using oprofile, which I will do.

PS2: in case somebody is interested, the test client is here:
 http://www.1006.org/tmp/20051027/

 pgclient-1.1.c is prepared inserts, 2.0 is binary copy.







Re: [PERFORM] Perfomance of views

2005-10-27 Thread Jan Wieck

On 10/27/2005 7:29 AM, Richard Huxton wrote:


Don't forget to CC the list


Svenne Krap wrote:

What do you mean exactly by "pushing conditions inside"?


If I have something like "SELECT * FROM complicated_view WHERE foo = 7" 
then the planner can look "inside" complicated_view and see where it can 
attach the condition "foo=7", rather than running the query and applying 
the condition at the end.


Sorry, but the planner doesn't attach the condition anywhere. It is the 
rewriter that takes the actual query, replaces the view's rangetable and 
expression entries with the actual underlying objects, and adds the view's 
condition with an AND to the query's condition. Simple example:


Given a view

create view v1 as select a1, b1, c2 from t1, t2 where a1 = a2;

The statement

select * from v1 where b1 = 'foo';

will result in a parsetree equivalent to what you would get if the 
original query was


select a1, b1, c2 from t1, t2 where (b1 = 'foo') and (a1 = a2);

It is the planner's and optimizer's job to recognize where in the 
execution plan it can push qualifications down into filters or even 
scankeys. The planner should be able to realize that


select * from v1 where a1 = 42;

is in fact equivalent to

select a1, b1, c2 from t1, t2 where a1 = 42 and a1 = a2;

as well as

select a1, b1, c2 from t1, t2 where a1 = 42 and a1 = a2 and a2 = 42;

This very last addition of "a2 = 42" because of "a2 = a1 = 42" allows it 
to put a constant scankey onto the scan of t2. The 8.0 planner does 
that, so the resulting query plan for the last three selects above is 
absolutely identical.




There are cases where it would be safe for the planner to do this, but it 
isn't smart enough to do so.


Example?


Jan






--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


[PERFORM] how postgresql request the computer resources

2005-10-27 Thread Sidar López Cruz
Is there something that tells postgres to take the resources from the computer 
(RAM, HDD, swap on Linux) as it needs them, without modifying variables in 
postgresql.conf and other operating system things?


A few days ago I was trying to show that Postgres is better than MSSQL, but when 
I execute a simple query like:


(1)
select count(*) from
(
select archivo from fotos
except
select archivo from archivos
) x;
Aggregate  (cost=182162.83..182162.84 rows=1 width=0) (actual time=133974.495..133974.498 rows=1 loops=1)
  ->  Subquery Scan x  (cost=173857.98..181830.63 rows=132878 width=0) (actual time=109148.158..15.279 rows=169672 loops=1)
        ->  SetOp Except  (cost=173857.98..180501.86 rows=132878 width=58) (actual time=109148.144..132094.382 rows=169672 loops=1)
              ->  Sort  (cost=173857.98..177179.92 rows=1328775 width=58) (actual time=109147.656..113870.975 rows=1328775 loops=1)
                    Sort Key: archivo
                    ->  Append  (cost=0.00..38710.50 rows=1328775 width=58) (actual time=27.062..29891.075 rows=1328775 loops=1)
                          ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17515.62 rows=523431 width=58) (actual time=27.052..9560.719 rows=523431 loops=1)
                                ->  Seq Scan on fotos  (cost=0.00..12281.31 rows=523431 width=58) (actual time=27.038..5390.238 rows=523431 loops=1)
                          ->  Subquery Scan "*SELECT* 2"  (cost=0.00..21194.88 rows=805344 width=58) (actual time=10.803..12117.788 rows=805344 loops=1)
                                ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344 width=58) (actual time=10.784..5420.164 rows=805344 loops=1)

Total runtime: 134552.325 ms


(2)
select count(*) from fotos where archivo not in (select archivo from archivos)

Aggregate  (cost=29398.98..29398.99 rows=1 width=0) (actual time=26660.565..26660.569 rows=1 loops=1)
  ->  Seq Scan on fotos  (cost=15154.80..28744.69 rows=261716 width=0) (actual time=13930.060..25859.340 rows=169799 loops=1)
        Filter: (NOT (hashed subplan))
        SubPlan
          ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344 width=58) (actual time=0.319..5647.043 rows=805344 loops=1)

Total runtime: 26747.236 ms


(3)
select count(1) from fotos f where not exists (select a.archivo from archivos a where a.archivo=f.archivo)

Aggregate  (cost=1761354.08..1761354.09 rows=1 width=0) (actual time=89765.384..89765.387 rows=1 loops=1)
  ->  Seq Scan on fotos f  (cost=0.00..1760699.79 rows=261716 width=0) (actual time=75.556..0.234 rows=169799 loops=1)
        Filter: (NOT (subplan))
        SubPlan
          ->  Index Scan using archivos_archivo_idx on archivos a  (cost=0.00..13451.40 rows=4027 width=58) (actual time=0.147..0.147 rows=1 loops=523431)
                Index Cond: ((archivo)::text = ($0)::text)

Total runtime: 89765.714 ms



(4)
SELECT count(*)
FROM fotos f
LEFT JOIN archivos a USING(archivo)
WHERE a.archivo IS NULL
Aggregate  (cost=31798758.40..31798758.41 rows=1 width=0) (actual time=114267.337..114267.341 rows=1 loops=1)
  ->  Merge Left Join  (cost=154143.73..31772412.02 rows=10538550 width=0) (actual time=85472.696..113392.399 rows=169799 loops=1)
        Merge Cond: ("outer"."?column2?" = "inner"."?column2?")
        Filter: ("inner".archivo IS NULL)
        ->  Sort  (cost=62001.08..63309.66 rows=523431 width=58) (actual time=38018.343..39998.201 rows=523431 loops=1)
              Sort Key: (f.archivo)::text
              ->  Seq Scan on fotos f  (cost=0.00..12281.31 rows=523431 width=58) (actual time=0.158..4904.410 rows=523431 loops=1)
        ->  Sort  (cost=92142.65..94156.01 rows=805344 width=58) (actual time=47453.790..50811.216 rows=805701 loops=1)
              Sort Key: (a.archivo)::text
              ->  Seq Scan on archivos a  (cost=0.00..13141.44 rows=805344 width=58) (actual time=0.206..7160.148 rows=805344 loops=1)

Total runtime: 114893.116 ms




WITH ANY OF THESE QUERIES, MSSQL TAKES NO MORE THAN 7 SECONDS


PLEASE HELP ME






Re: [PERFORM] Performance issues with custom functions

2005-10-27 Thread Edward Di Geronimo Jr.




Tom Lane wrote:

  This is fairly hard to read ... it would help a lot if you had shown the
view definitions that the query relies on, so that we could match up the
plan elements with the query a bit better.
  

I wasn't sure how helpful it would be. Here they are:

create view development.network as 
select * from odbc_select('amsterdam', 'bob.dbo.network') as (
    network_id varchar ,
    status_cd varchar  ,
    name varchar  ,
    network_action varchar  ,
    physical_type_cd varchar  ,
    service_type_cd varchar  ,
    parent_network_id varchar  ,
    commission_network_id varchar  ,
    rep_id varchar  ,
    tax_id varchar  ,
    url varchar  ,
    entry_method_cd varchar  ,
    entry_individual_type_cd varchar  ,
    entry_individual_id varchar  ,
    service varchar (30),
    cost_routine varchar (150),
    commission_rate numeric(5, 5)  ,
    directory_number varchar (11),
    search_url varchar (200),
    member_rate numeric(15, 2)  ,
    free_months numeric(18, 0)  ,
    eligibility_hound varchar (60)
)

create view development.network_state as 
select * from odbc_select('amsterdam', 'bob.dbo.network_state') as (
    network_id varchar,
    state_cd varchar,
    product varchar (100) ,
    status_cd varchar,
    entry_method_cd varchar,
    entry_individual_type_cd varchar,
    entry_individual_id varchar,
    logo_id int ,
    from_date timestamp ,
    thru_date timestamp 
)

create view development.xlat_tbl as
select * from odbc_select('amsterdam', 'xlat_tbl') as (
    field_name varchar  ,
    field_value varchar  ,
    status_cd varchar  ,
    descr varchar  ,
    descrshort varchar  ,
    entry_method_cd varchar  ,
    entry_individual_type_cd varchar  ,
    entry_individual_id varchar 
)


  However, I'm thinking the problem is with this IN clause:

  
  
where pl.network_id in (select ns.network_id
from development.network_state ns
  where ns.from_date < current_time
and (ns.thru_date > current_time or 
ns.thru_date is null)
and (ns.state_cd = pl.state_cd or ns.state_cd='')
  )

  
  
Because the sub-SELECT references pl.state_cd (an outer variable
reference), there's no chance of optimizing this into a join-style IN.
So the sub-SELECT has to be re-executed for each row of the outer query.

BTW, it's not apparent to me that your "flattened" query gives the same
answers as the original.  What if a pl row can join to more than one
row of the ns output?
  

Well, I guess you are right. As far as the database can tell, the
queries aren't the same. In practice, they are. network_state is
essentially tracking our contract dates with different discount
healthcare networks. from_date and thru_date track the timeframe we use
that network, with thru_date being null for the current networks. Some
networks cover all states, in which case state_cd is an empty string.
Otherwise, there will be a row per state covered. I can't think of any
way to enforce data integrity on this other than maybe via triggers. Is
there any way to make things more clear to the database (both in
general and on the postgres end of this) ? At the moment, the SQL
Server table has the primary key defined as (network_id, state_cd,
product), which is ok for now, but I'm realizing going forward could be
an issue if we ever stopped using a network in a state and then went
back to it.

I guess the next question is, is there any way I can give postgres
hints about what constraints exist on the data in these views?

Ed




Re: [PERFORM] browsing table with 2 million records

2005-10-27 Thread PFC




I've done it...
First of all I totally agree with PFC's rant regarding absolute
positioning while browsing datasets.  Among other things, it has serious
problems if you have multiple sessions updating your table.  Also it's kind of
silly to be doing this in a set-based data paradigm.


	Recently I've been browsing some site and it had this problem : as users  
kept adding new entries as I was browsing the list page after page, when I  
hit "next page" I got on the next page half of what I already saw on the  
previous page. Of course the webmaster has set the "visited links" color  
the same as "unvisited links", so I couldn't tell, and had to use my  
brain, which was quite upsetting XDDD


	And bookmarking a page to resume browsing at some later time does not  
work either, because if I bookmark page 15, then when I come back, users  
have added 10 pages of content and what I bookmarked is now on page 25...



All very well and good, but now do it generically...


Hehe. I like ranting...
	It is not possible to do it in a generic way that works in all cases. For  
instance :


Forum topic case :
- posts are added at the bottom and not at the top
- page number is relevant and meaningful

However, in most cases, you can use a multipart key and get it right.
	Suppose that, for instance, you have a base of several million records,  
organized according to :


- date (like the original poster)
or :
- country, region, city, customer last name, first name.

	You could ask for the first three, but then you'll get 5 Smiths in  
New York and 1 Van Bliezinsky.


	Or you could precalculate, once a week, a key interval distribution that  
creates reasonably sized intervals (for instance, 100 values in each),  
maybe asking that each interval should only contain one city. So, you  
would get:


Country  Region  City  LastName   FirstName
USA      NYC     NY    Smith,     ''
USA      NYC     NY    Smith,     Albert
USA      NYC     NY    Smith,     Bernard
...
USA      NYC     NY    Smith,     William
...
USA      NYC     NY    Von Braun
...

	So you'd predetermine your "page breaks" ahead of time, and recompute  
them once in a while. You won't get identically sized pages, but if the  
statistical distribution of the data plays nice, you should get evenly  
sized pages.
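
	A sketch of that precomputation (assuming a reasonably modern PostgreSQL  
with window functions, and a hypothetical customers table):

  -- keep every 100th key as a page boundary; rerun weekly
  CREATE TABLE page_breaks AS
  SELECT country, region, city, last_name, first_name
  FROM (
      SELECT country, region, city, last_name, first_name,
             row_number() OVER
                 (ORDER BY country, region, city, last_name, first_name) AS rn
      FROM customers
  ) ranked
  WHERE rn % 100 = 1;

	Each row of page_breaks is then the starting key of one "page", which the  
application can turn into the kind of selector described below.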


	The interesting part is that you can present the user with a selector  
which presents meaningful and useful data, AND is fast to compute, AND is  
fast to use.
	In this case, it would amount to "Select country, region, city", then,  
display a list like this :

Smith, ...Albert
Smith, Albus...Bernard
...
Smith, William...
...
Von Braun...Von Schwarts
...

	So Jeannette Smith would be easy to find, being in the link "Smith,  
Jean...John" for instance.


	If the aim is to quickly locate a particular record, I like  
javascript-powered autocompletion better ; but for browsing, this  
pagination method is cool.


Regards !



Re: [PERFORM] Perfomance of views

2005-10-27 Thread Tom Lane
Svenne Krap <[EMAIL PROTECTED]> writes:
> The ways I have considered were:
> 1) write a big query by hand (not preferred, as it gets hard to manage)
> 2) write layers of views (still not preferred, as I still have to remember 
> to put the right conditions on everywhere)
> 3) write layers of SQL functions (returning the right sets of rows from 
> the underlying tables) - which I prefer from a development angle .. it 
> gets very clean and I can't forget a parameter anywhere.

#1 and #2 should behave pretty similarly, assuming that the "one big
query" would have been structured the same way as the nest of views is.
#3 unfortunately will pretty much suck, because there's no chance for
cross-level optimization.

There's been some discussion of inline-expanding SQL functions that
return sets when they are called in FROM, which would make a SQL
function that contains just a SELECT effectively equivalent to a view
as far as the planner's powers of optimization go.  No one's tried to
make it happen yet though.

regards, tom lane



Re: [PERFORM] Perfomance of views

2005-10-27 Thread Svenne Krap

Tom Lane wrote:


There's been some discussion of inline-expanding SQL functions that
return sets when they are called in FROM, which would make a SQL
function that contains just a SELECT effectively equivalent to a view
as far as the planner's powers of optimization go.  No one's tried to
make it happen yet though.
 



This is exactly what would be brilliant in my case: use the functions as 
a kind of strict, parameterized view that, in the planner (or wherever), 
gets replaced down to a simple (?!?!) SQL statement.
This would imho be highly valuable for almost any kind of complex 
time-travel application (and surely dozens of other applications).


And before anyone suggests it, I don't code C well enough (*cough* 
rusty) to try to do it myself. I would appreciate it if it went on the todo 
for 8.2 though. (I might even be willing to sponsor some money (a single 
or perhaps two thousand US dollars) to get it done and release 
it immediately under the standard PostgreSQL license (BSD).)


I, by the way, also support the idea of a way to force a table into a 
PgSQL-managed cache, as discussed a while ago. Sometimes overall speed 
for the system is less important than the speed of a single query.


I must also say that I am very impressed with the performance 
enhancements of the 8.1 beta; the bitmap index scans are amazing! Good job, 
guys - PgSQL has come a long way from 7.0 (where I started) and the 
future looks bright ;)


Svenne



[PERFORM] Update using primary key slow

2005-10-27 Thread Denis
The following update was captured in the database log and the elapsed time 
was 1058.956 ms.  A later explain analyze shows total run time of 730 ms.  
Although isn't the actual time to update the row 183 ms.  Where is the 
other 547 ms coming from?  Updating the two secondary indexes??


Oct 27 08:09:48 ybcdrdwdb01 postgres[12315]: [145-1] LOG:  duration: 
1058.956 ms  statement: UPDATE CONTRACT SET CUSTOMER_KEY = 143567 
,SOURCE_CODE_KEY = null ,PRIOR_CONTRACT =
Oct 27 08:09:48 ybcdrdwdb01 postgres[12315]: [145-2]  
'265985' ,PRIOR_CONTRACT_ELITE = null ,CONTRACT_NEW = 'N' ,RENEWAL_CONTRACT 
= '1373990'  ,RENEWAL_CONTRACT_ELITE = null
Oct 27 08:09:48 ybcdrdwdb01 postgres[12315]: [145-3]  ,CONTRACT_DROPPED = 
'N' ,TOTAL_SALE_DOLLARS = 3492.00 ,TARGET_DOLLARS = 3576 
,ASSIGN_DOLLARS_OVERRIDE = null ,BOOK_KEY = 160
Oct 27 08:09:48 ybcdrdwdb01 postgres[12315]: [145-4]  ,PUB_SEQUENCE = 25 
,DROP_DATE = null ,OUT_OF_BUSINESS = 'N' ,RENEWAL_SALESREP_KEY = 3639 
,SALESREP_KEY = 3639 ,NEW_CATEGORY =
Oct 27 08:09:48 ybcdrdwdb01 postgres[12315]: [145-5]  
'NEW_INSERT' ,PENDING_DELETE_DATE = null ,CLIENT_NAME = null ,DATE_SOLD = 
'2004-08-30'  ,DATE_RECEIVED = '2004-09-03' 
Oct 27 08:09:48 ybcdrdwdb01 postgres[12315]: [145-6]  ,DATE_ENTERED = 
'2004-09-07'  ,DATE_SHELLED = null  ,DATE_APPROVED = '2004-09-09'  WHERE 
REGION_KEY = 14  AND CONTRACT_KEY =
Oct 27 08:09:48 ybcdrdwdb01 postgres[12315]: [145-7]  1070135


The explain for this update is as follows.


dw=# begin;
BEGIN
dw=#
dw=# explain analyze UPDATE CONTRACT SET CUSTOMER_KEY = 143567, 
SOURCE_CODE_KEY = null ,
dw-# PRIOR_CONTRACT = '265985' ,PRIOR_CONTRACT_ELITE = null ,CONTRACT_NEW = 
'N' ,RENEWAL_CONTRACT = '1373990'  ,RENEWAL_CONTRACT_ELITE = null
dw-#  ,CONTRACT_DROPPED = 'N' ,TOTAL_SALE_DOLLARS = 3492.00 ,TARGET_DOLLARS 
= 3576 ,ASSIGN_DOLLARS_OVERRIDE = null ,BOOK_KEY = 160
dw-#  ,PUB_SEQUENCE = 25 ,DROP_DATE = null ,OUT_OF_BUSINESS = 
'N' ,RENEWAL_SALESREP_KEY = 3639 ,SALESREP_KEY = 3639
dw-#  ,NEW_CATEGORY = 'NEW_INSERT' ,PENDING_DELETE_DATE = null ,CLIENT_NAME 
= null ,DATE_SOLD = '2004-08-30'  ,DATE_RECEIVED = '2004-09-03'
dw-#   ,DATE_ENTERED = '2004-09-07'  ,DATE_SHELLED = null  ,DATE_APPROVED = 
'2004-09-09'
dw-# WHERE REGION_KEY = 14  AND CONTRACT_KEY = 1070135;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using contract_pkey on contract  (cost=0.00..10.61 rows=3 width=115) (actual time=0.181..0.183 rows=1 loops=1)
   Index Cond: ((contract_key = 1070135) AND (region_key = 14))
 Total runtime: 0.730 ms
(3 rows)

dw=# rollback;
ROLLBACK



Here is the table and index definitions

dw=# \d contract
 Table "ods.contract"
 Column  |Type |   Modifiers
-+-+---
 contract_key| integer | not null
 customer_key| integer | not null
 source_code_key | integer |
 prior_contract  | character varying(20)   |
 prior_contract_elite| character varying(20)   |
 renewal_contract| character varying(20)   |
 contract_dropped| character varying(1)| not null
 renewal_contract_elite  | character varying(20)   |
 total_sale_dollars  | numeric(9,2)| not null
 assign_dollars_override | numeric(9,2)|
 target_dollars  | numeric(9,2)|
 book_key| integer | not null
 pub_sequence| integer |
 drop_date   | timestamp without time zone |
 out_of_business | character varying(1)| not null
 salesrep_key| integer |
 renewal_salesrep_key| integer |
 new_category| character varying(20)   |
 region_key  | integer | not null
 contract_new| character varying(1)| not null
 pending_delete_date | timestamp without time zone |
 client_name | character varying(150)  |
 fuzzy_client_name   | character varying(150)  |
 last_update_date| timestamp without time zone | default now()
 date_sold   | date|
 date_received   | date|
 date_entered| date|
 date_shelled| date|
 date_approved   | date|
Indexes:
"contract_pkey" primary key, btree (contract_key, region_key)
"XIE1_Contract" btree (region_key, book_key, pub_sequence)
"XIE2_Contract" btree (customer_key, region_key)


The table contains approximately 5 million rows

Postgres version is PostgreSQL 7.4.3 on i686-pc-linux-gnu, compile

Re: [PERFORM] Perfomance of views

2005-10-27 Thread Richard Huxton

Jan Wieck wrote:

On 10/27/2005 7:29 AM, Richard Huxton wrote:

Svenne Krap wrote:


What do you mean exactly by "pushing conditions inside"?


If I have something like "SELECT * FROM complicated_view WHERE foo = 
7" then the planner can look "inside" complicated_view and see where 
it can attach the condition "foo=7", rather than running the query and 
applying the condition at the end.


Sorry, but the planner doesn't attach the condition anywhere. It is the 
rewriter that takes the actual query, replaces the view's rangetable and 
expression entries with the actual underlying objects, and adds the view's 
condition with an AND to the query's condition. Simple example:


Thanks for the correction Jan.

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] how postgresql request the computer resources

2005-10-27 Thread Richard Huxton

Sidar López Cruz wrote:
Is there something that tells postgres to take the resources from the 
computer (RAM, HDD, swap on Linux) as it needs them, without modifying 
variables in postgresql.conf and other operating system things?


Ah, and how is it to know what to share with other processes?

A few days ago I was trying to show that Postgres is better than MSSQL, but 
when I execute a simple query like:


(1)
select count(*) from
Total runtime: 134552.325 ms

(2)
select count(*) from fotos where archivo not in (select archivo from 
Total runtime: 26747.236 ms


(3)
select count(1) from fotos f where not exists (select a.archivo from 
Total runtime: 89765.714 ms


(4)
SELECT count(*)
Total runtime: 114893.116 ms



WITH ANY OF THESE QUERIES, MSSQL TAKES NO MORE THAN 7 SECONDS


In which case they make a bad choice for showing PostgreSQL is faster 
than MSSQL. Is this the only query you have, or are others giving you 
problems too?


I think count(*) is about the weakest point in PG, but I don't think 
there'll be a general solution available soon. As I'm sure someone has 
mentioned, whatever else, PG needs to check the row for its visibility 
information.


From the start of your email, you seem to suspect your configuration 
needs some work. Once you are happy that your settings in general are 
good, you can override some by issuing set statements before your query. 
For example:

SET work_mem = 1;
might well improve example #2 where you had a hash.

--
  Richard Huxton
  Archonet Ltd




Re: [PERFORM] Update using primary key slow

2005-10-27 Thread Tom Lane
Denis <[EMAIL PROTECTED]> writes:
> The following update was captured in the database log and the elapsed time 
> was 1058.956 ms.  A later explain analyze shows total run time of 730 ms.  
> Although isn't the actual time to update the row 183 ms.  Where is the 
> other 547 ms coming from?  Updating the two secondary indexes??

The 183 msec is the time needed to *fetch* the row, not the time to
update it.  So it could well be that the other time is just the time
needed to update the table and indexes.  If this seems slower than
your hardware ought to be able to handle, I'd wonder about how recently
the table has been vacuumed.

regards, tom lane



Re: [PERFORM] Update using primary key slow

2005-10-27 Thread Denis
[EMAIL PROTECTED] (Tom Lane) wrote in
news:[EMAIL PROTECTED]: 

> Denis <[EMAIL PROTECTED]> writes:
>> The following update was captured in the database log and the elapsed
>> time was 1058.956 ms.  A later explain analyze shows total run time
>> of 730 ms.  Although isn't the actual time to update the row 183 ms. 
>> Where is the other 547 ms coming from?  Updating the two secondary
>> indexes?? 
> 
> The 183 msec is the time needed to *fetch* the row, not the time to
> update it.  So it could well be that the other time is just the time
> needed to update the table and indexes.  If this seems slower than
> your hardware ought to be able to handle, I'd wonder about how
> recently the table has been vacuumed.
> 
>regards, tom lane

There is a vacuumdb done at 6 AM and 5PM

In addition this table is vacuumed at 2AM, 8AM, 10AM, 12PM, 2PM, and 4PM

This is the vacuum from last night at 5PM


INFO:  vacuuming "ods.contract"
INFO:  index "XIE1_Contract" now contains 5105322 row versions in 27710 
pages
DETAIL:  2174 index row versions were removed.
893 index pages have been deleted, 893 are currently reusable.
CPU 1.91s/1.58u sec elapsed 34.14 sec.
INFO:  index "XIE2_Contract" now contains 5105331 row versions in 21701 
pages
DETAIL:  2174 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.40s/1.42u sec elapsed 22.73 sec.
INFO:  index "contract_pkey" now contains 5105337 row versions in 21480 
pages
DETAIL:  2174 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.80s/1.52u sec elapsed 18.59 sec.
INFO:  "contract": removed 2174 row versions in 893 pages
DETAIL:  CPU 0.42s/0.08u sec elapsed 1.22 sec.
INFO:  "contract": found 2174 removable, 5105321 nonremovable row 
versions in 129154 pages
DETAIL:  1357 dead row versions cannot be removed yet.
There were 1967941 unused item pointers.
0 pages are entirely empty.
CPU 11.38s/5.09u sec elapsed 85.48 sec.
INFO:  analyzing "ods.contract"
INFO:  "contract": 129154 pages, 3000 rows sampled, 5277622 estimated 
total rows


Here is the latest vacuum today.

INFO:  vacuuming "ods.contract"
INFO:  index "XIE1_Contract" now contains 5106346 row versions in 28233 
pages
DETAIL:  64146 index row versions were removed.
706 index pages have been deleted, 669 are currently reusable.
CPU 2.03s/2.33u sec elapsed 20.08 sec.
INFO:  index "XIE2_Contract" now contains 5106347 row versions in 21951 
pages
DETAIL:  64146 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 2.16s/3.39u sec elapsed 12.23 sec.
INFO:  index "contract_pkey" now contains 5106347 row versions in 21516 
pages
DETAIL:  64146 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.76s/2.47u sec elapsed 11.80 sec.
INFO:  "contract": removed 64146 row versions in 26115 pages
DETAIL:  CPU 1.94s/2.55u sec elapsed 7.78 sec.
INFO:  "contract": found 64146 removable, 5106307 nonremovable row 
versions in 129154 pages
DETAIL:  890 dead row versions cannot be removed yet.
There were 1905028 unused item pointers.
0 pages are entirely empty.
CPU 14.83s/11.48u sec elapsed 60.96 sec.
INFO:  analyzing "ods.contract"
INFO:  "contract": 129154 pages, 3000 rows sampled, 5236929 estimated 
total rows


I would think this should be very fast.  I already described the CPU and 
memory.  The disk is backed by an EMC DMX2000.  This particular server 
has 1 physical volume group of 500GB which is split over two logical 
volumes, one for $PGDATA and the other for PG_XLOG.  This split was not 
really done for performance, since it comes from the same physical volume 
group, but more for space manageability.  The physical volume group 
consists of 11GB stripes from across the EMC SAN.  So that would be 
about 50 stripes, which is really coming from dozens of backend disk 
drives.  Typical I/O response times for these is 3-5 ms.



[PERFORM] How much memory?

2005-10-27 Thread PostgreSQL
Is there a rule-of-thumb for determining the amount of system memory a 
database requires (other than "all you can afford")?





Re: [PERFORM] Update using primary key slow

2005-10-27 Thread Tom Lane
Denis <[EMAIL PROTECTED]> writes:
> [EMAIL PROTECTED] (Tom Lane) wrote in
> news:[EMAIL PROTECTED]: 
>> The 183 msec is the time needed to *fetch* the row, not the time to
>> update it.  So it could well be that the other time is just the time
>> needed to update the table and indexes.  If this seems slower than
>> your hardware ought to be able to handle, I'd wonder about how
>> recently the table has been vacuumed.

> Here is the latest vacuum today.

> INFO:  "contract": removed 64146 row versions in 26115 pages
> DETAIL:  CPU 1.94s/2.55u sec elapsed 7.78 sec.
> INFO:  "contract": found 64146 removable, 5106307 nonremovable row 
> versions in 129154 pages
> DETAIL:  890 dead row versions cannot be removed yet.
> There were 1905028 unused item pointers.

The "unused item pointers" number seems a bit high, but otherwise that
looks pretty reasonable.

Is it possible that the particular row you were updating has been
updated quite a lot of times since the last vacuum?  Or even quite
a few times within a single transaction?  The only thing I can think
of that would explain such a slow fetch is if the code has to reject
a bunch of recently-dead versions of the row.

regards, tom lane



Re: [PERFORM] What gets cached?

2005-10-27 Thread PostgreSQL
Thank each of you for your replies.  I'm just beginning to understand the 
scope of my opportunities.

Someone (I apologize, I forgot who) recently posted this query:
SELECT oid::regclass, reltuples, relpages
FROM pg_class
ORDER BY 3 DESC

Though the application is a relatively low-volume TP system, it is 
structured a lot like a data warehouse with one primary table that 
everything else hangs off.  What the query above shows is that my largest 
table, at 34 million rows, takes almost 1.4 million pages or 10+ Gb if my 
math is good.  The same table has 14 indexes, totaling another 12Gb.  All 
this is running on a box with 4Gb of memory.

So what I believe I see happening is that almost every query is clearing out 
memory to load the particular index it needs.  Hence my "first queries are 
the fastest" observation at the beginning of this thread.

There are certainly design improvements to be done, but I've already started 
the process of getting the memory increased on our production db server.  We 
are btw running 8.1 beta 3.

""Steinar H. Gunderson"" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> On Mon, Oct 24, 2005 at 11:09:55AM -0400, Alex Turner wrote:
>> Just to play devil's advocate here for a second, but if we have an
>> algorithm that is substantially better than just plain old LRU, which is
>> what I believe the kernel is going to use to cache pages (I'm no kernel
>> hacker), then why don't we apply that and have a significantly larger
>> page cache a la Oracle?
>
> There have (AFAIK) been reports of setting huge amounts of shared_buffers
> (close to the total amount of RAM) performing much better in 8.1 than in
> earlier versions, so this might actually be okay these days.
>
> I haven't heard of anybody reporting increase setting such values, though.
>
> /* Steinar */
> -- 
> Homepage: http://www.sesse.net/
>
>





Re: [PERFORM] Reasons and drawbacks for unused item pointers (was: Update using primary key slow)

2005-10-27 Thread Martin Lesser
Tom Lane <[EMAIL PROTECTED]> writes:

> Denis <[EMAIL PROTECTED]> writes:
>> There were 1905028 unused item pointers.
> The "unused item pointers" number seems a bit high, but otherwise that
> looks pretty reasonable.
>
> Is it possible that the particular row you were updating has been
> updated quite a lot of times since the last vacuum?  Or even quite
> a few times within a single transaction?

What causes these "unused item pointers" and what impact do they have
on performance?

If I understood your last posting correctly, more than one update on a
single row between two vacuums would result in one or more
"unused item pointers". Does this slow down the vacuum process and/or
other processes?  Until now I could not find an answer as to what this number
implies.

Regards

Martin



Re: [PERFORM] Reasons and drawbacks for unused item pointers (was: Update using primary key slow)

2005-10-27 Thread Tom Lane
Martin Lesser <[EMAIL PROTECTED]> writes:
> What causes these "unused item pointers" and what impact do they have
> on performance?

Those are item pointer slots that were once used but aren't used at the
moment.  VACUUM leaves an empty slot behind when it removes a dead
tuple, and the slot is then available for re-use next time a tuple is
created on that page.  See
http://developer.postgresql.org/docs/postgres/storage-page-layout.html

The direct performance impact is really pretty minimal (and none at all
on indexscans, AFAIR).  The reason Denis' number drew my attention was
that it implied that the table had gone un-vacuumed for awhile at some
time in the past.  His stats were showing about 64000 tuples deleted
per vacuum pass, which would have created 64000 unused item pointers
--- but in a steady-state situation those would be eaten up again by
the time of the next vacuum.  To have 1905028 unused pointers in a
table with only 5106307 live entries suggests that at some point there
were 1.9 million (or so) dead but not-yet-vacuumed tuples, which
suggests insufficient vacuuming.

regards, tom lane



Re: [PERFORM] how postgresql request the computer resources

2005-10-27 Thread Michael Best

Richard Huxton wrote:

WITH ANY OF THESE QUERIES, MSSQL TAKES NO MORE THAN 7 SECONDS



In which case they make a bad choice for showing PostgreSQL is faster 
than MSSQL. Is this the only query you have, or are others giving you 
problems too?


I think count(*) is about the weakest point in PG, but I don't think 
there'll be a general solution available soon. As I'm sure someone has 
mentioned, whatever else, PG needs to check the row for its visibility 
information.


 From the start of your email, you seem to suspect your configuration 
needs some work. Once you are happy that your settings in general are 
good, you can override some by issuing set statements before your query. 
For example:

SET work_mem = 1;
might well improve example #2 where you had a hash.

--
  Richard Huxton
  Archonet Ltd


Someone had suggested keeping a vector table with +1 and -1 for row 
insertion and deletion and then running a cron to sum the vectors and 
update a table so that you could select from that table to get the row 
count.  Perhaps some sort of SUM() on a column function.


Since this seems like a reasonable approach (or perhaps there may be yet 
another better mechanism), cannot someone add this sort of functionality 
to Postgresql to do behind the scenes?
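
A rough sketch of that suggestion done by hand today (the helper tables,
function, and trigger names are made up, and the periodic roll-up would come
from cron or similar):

  CREATE TABLE fotos_count_delta (delta integer NOT NULL);
  CREATE TABLE fotos_count (n bigint NOT NULL);

  CREATE FUNCTION fotos_count_trig() RETURNS trigger AS '
  BEGIN
      IF TG_OP = ''INSERT'' THEN
          INSERT INTO fotos_count_delta VALUES (1);
      ELSE
          INSERT INTO fotos_count_delta VALUES (-1);
      END IF;
      RETURN NULL;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER fotos_count_trig AFTER INSERT OR DELETE ON fotos
      FOR EACH ROW EXECUTE PROCEDURE fotos_count_trig();

  -- periodic roll-up, run inside one transaction:
  -- UPDATE fotos_count
  --     SET n = n + (SELECT coalesce(sum(delta), 0) FROM fotos_count_delta);
  -- DELETE FROM fotos_count_delta;

The approximate count is then a one-row read of fotos_count plus a small sum
over the not-yet-rolled-up deltas, instead of a full table scan.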


-Mike



Re: [PERFORM] Reasons and drawbacks for unused item pointers

2005-10-27 Thread Martin Lesser
Tom Lane <[EMAIL PROTECTED]> writes:

> Martin Lesser <[EMAIL PROTECTED]> writes:
>> What causes these "unused item pointers" and what impact do they have
>> on performance?
> The direct performance impact is really pretty minimal (and none at
> all on indexscans, AFAIR).  The reason Denis' number drew my attention
> was that it implied that the table had gone un-vacuumed for awhile at
> some time in the past. [...]  To have 1905028 unused pointers in a
> table with only 5106307 live entries suggests that at some point there
> were 1.9 million (or so) dead but not-yet-vacuumed tuples, which
> suggests insufficient vacuuming.

Does each update of a single row result in an "unused item pointer"?
I.e. if I update one row 10 times between VACUUMings of the table, the result
is 10 unused pointers?

Some rows in some of my tables are updated much more frequently than
others, so I'm not sure whether the number of unused pointers implies that
I should VACUUM more often than every 24 hours.

Martin



Re: [PERFORM] How much memory?

2005-10-27 Thread Ron Peacetree
Databases basically come in 4 sizes:

1= The entire DB fits into memory.
2= The performance critical table(s) fit(s) into memory
3= The indexes of the performance critical table(s) fit into memory.
4= Neither the performance critical tables nor their indexes fit into memory.

Performance decreases (exponentially), and development + maintenance 
cost/difficulty/pain increases (exponentially), as you go down the list.

While it is often not possible to be in class "1" above, do everything you can 
to be in at least class "3" and do everything you can to avoid class "4".

At ~$75-$150 per GB as of this post, RAM is the cheapest investment you can 
make in a high-performance, low-hassle DBMS.  IWill's and Tyan's 16-DIMM-slot 
mainboards are worth every penny.

ron


-Original Message-
From: PostgreSQL <[EMAIL PROTECTED]>
Sent: Oct 27, 2005 3:31 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How much memory?

Is there a rule-of-thumb for determining the amount of system memory a 
database requires (other than "all you can afford")?



Re: [PERFORM] What gets cached?

2005-10-27 Thread Jim C. Nasby
Did the patch that allows multiple seqscans to piggyback on each other
make it into 8.1? It might help in this situation.

BTW, if a query requires loading more than a few percent of an index
PostgreSQL will usually go with a sequential scan instead. You should
check explain/explain analyze on your queries and see what's actually
happening. If you've got stats turned on you can also look at
pg_stat_user_indexes to get a better idea of what indexes are and aren't
being used.
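
For example (assuming the stats collector is enabled), something like this
shows which indexes on a given table are actually being hit:

  SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
  FROM pg_stat_user_indexes
  WHERE relname = 'your_big_table'
  ORDER BY idx_scan;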


-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] how postgresql request the computer resources

2005-10-27 Thread Jim C. Nasby
On Thu, Oct 27, 2005 at 03:58:55PM -0600, Michael Best wrote:
> Someone had suggested keeping a vector table with +1 and -1 for row 
> insertion and deletion and then running a cron to sum the vectors and 
> update a table so that you could select from that table to get the row 
> count.  Perhaps some sort of SUM() on a column function.
> 
> Since this seems like a reasonable approach (or perhaps there may be yet 
> another better mechanism), cannot someone add this sort of functionality 
> to Postgresql to do behind the scenes?

There's all kinds of things that could be added; the issue is
ascertaining what the performance trade-offs are (there's no such thing
as a free lunch) and if the additional code complexity is worth it.

Note that your suggestion probably wouldn't work in this case because
the user isn't doing a simple SELECT count(*) FROM table;. I'd bet that
MSSQL is using index covering to answer his queries so quickly,
something that currently just isn't possible with PostgreSQL. But if you
search the -hackers archives, you'll find a discussion on adding limited
heap tuple visibility information to indexes. That would allow for
partial index covering in many cases, which would probably be a huge win
for the queries the user was asking about.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] How much memory?

2005-10-27 Thread Jim C. Nasby
On Thu, Oct 27, 2005 at 06:39:33PM -0400, Ron Peacetree wrote:
> Databases basically come in 4 sizes:
> 
> 1= The entire DB fits into memory.
> 2= The performance critical table(s) fit(s) into memory
> 3= The indexes of the performance critical table(s) fit into memory.
> 4= Neither the performance critical tables nor their indexes fit into memory.
> 
> Performance decreases (exponentially), and development + maintenance 
> cost/difficulty/pain increases (exponentially), as you go down the list.
> 
> While it is often not possible to be in class "1" above, do everything you 
> can to be in at least class "3" and do everything you can to avoid class "4".
> 
> At ~$75-$150 per GB as of this post, RAM is the cheapest investment you can 
> make in a high-performance, low-hassle DBMS.  IWill's and Tyan's 16-DIMM-slot 
> mainboards are worth every penny.

And note that your next investment after RAM should be better disk IO.
More CPUs *generally* don't buy you much (if anything). My rule of
thumb: the only time your database should be CPU-bound is if you've got
a bad design*.

*NOTE: before everyone goes off about query parallelism and big
in-memory sorts and what-not, keep in mind I said "rule of thumb". :)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
