Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Andrew Sullivan
On Mon, Oct 20, 2003 at 05:42:52PM -0400, Tom Lane wrote:

> hold of these dead rows instead of recycling them.  I suspect you have
> a client process somewhere that is holding an open transaction for a
> long time ... probably not doing anything, just sitting there with an
> unclosed BEGIN ...

Which could be because you're doing something nasty with one of the
"autocommit=off" clients.  Most of the client libraries implement
this by doing "commit;begin;" at every commit.  This means you have
way more idle in transaction connections than you think.  Look in
pg_stat_activity, assuming you've turned on query echoing. 
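
For example, a minimal sketch of that check (on 7.3/7.4 the current_query
column is only populated if stats_command_string is turned on):

    SELECT procpid, usename, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction';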

A

-- 

Andrew Sullivan                          204-4141 Yonge Street
Afilias Canada                         Toronto, Ontario Canada
<[EMAIL PROTECTED]>                                    M2P 2A8
                                         +1 416 646 3304 x110


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Harry Broomhall
Josh Berkus writes:
> Harry,


   Many thanks for your response,

> 
> >It has been suggested to me that I resubmit this question to this list,
> > rather than the GENERAL list it was originally sent to.
> >
> >I asked earlier about ways of doing an UPDATE involving a left outer
> > join and got some very useful feedback.
> 
> The query you posted will always be somewhat slow due to the forced join 
> order, which is unavoidable with a left outer join.  

  Yes - I rather suspected that!  It is a shame it takes two joins to do
the work.

> 
> However, regarding your peculiar behaviour, please post:
> 
> 1) Your random_page_cost and effective_cache_size settings

#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4   # units are one sequential page fetch cost

  i.e. - still set to their defaults.

> 2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN

  First the case with no vacuum analyze:

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
   Merge Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)
   ->  Sort  (cost=99.32..101.82 rows=1000 width=95) (actual time=18578.71..21155.65 rows=335671 loops=1)
         Sort Key: un.cdr_id
         ->  Hash Join  (cost=6.99..49.49 rows=1000 width=95) (actual time=4.70..10011.35 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..20.00 rows=1000 width=49) (actual time=0.02..4265.63 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.57..4.57 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.12..2.77 rows=239 loops=1)
 Total runtime: 80408.42 msec
(12 rows)

  And now the case *with* the vacuum analyze:

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
   Hash Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)
   ->  Hash  (cost=10398.73..10398.73 rows=335671 width=76) (actual time=12371.13..12371.13 rows=0 loops=1)
         ->  Hash Join  (cost=6.99..10398.73 rows=335671 width=76) (actual time=4.91..9412.55 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..8496.71 rows=335671 width=30) (actual time=0.09..3813.54 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.71..4.71 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.22..2.90 rows=239 loops=1)
 Total runtime: 432543.73 msec
(11 rows)

   Please note that since I first posted I have been slightly adjusting the
schema of the tables, but the disparity remains.

   Many thanks for your assistance.

   Regards,
   Harry.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Shridhar Daithankar
Harry Broomhall wrote:
> #effective_cache_size = 1000    # typically 8KB each
> #random_page_cost = 4   # units are one sequential page fetch cost
You must tune the first one at least. Try 
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
parameters.
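
As an illustration only (the numbers are assumptions, not measurements), a
machine with roughly half a gigabyte of RAM free for the OS cache might
start from:

    effective_cache_size = 32768   # in 8KB pages, i.e. ~256MB of expected OS cache
    random_page_cost = 2           # below the default 4 when data is mostly cached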

>>2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN
>
>
>   First the case with no vacuum analyze:
>
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
>    Merge Cond: ("outer".cdr_id = "inner".cdr_id)
>    ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)
>    ->  Sort  (cost=99.32..101.82 rows=1000 width=95) (actual time=18578.71..21155.65 rows=335671 loops=1)
>          Sort Key: un.cdr_id
>          ->  Hash Join  (cost=6.99..49.49 rows=1000 width=95) (actual time=4.70..10011.35 rows=335671 loops=1)
>                Hash Cond: ("outer".interim_cli = "inner".interim_num)
>                Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
>                ->  Seq Scan on import_cdrs un  (cost=0.00..20.00 rows=1000 width=49) (actual time=0.02..4265.63 rows=335671 loops=1)
>                ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.57..4.57 rows=0 loops=1)
>                      ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.12..2.77 rows=239 loops=1)
>  Total runtime: 80408.42 msec
> (12 rows)

You are lucky to get a better plan here, because the planner is way off w.r.t. 
the estimated number of rows.
>
>   And now the case *with* the vacuum analyze:
>
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
>    Hash Cond: ("outer".cdr_id = "inner".cdr_id)
>    ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)
>    ->  Hash  (cost=10398.73..10398.73 rows=335671 width=76) (actual time=12371.13..12371.13 rows=0 loops=1)
>          ->  Hash Join  (cost=6.99..10398.73 rows=335671 width=76) (actual time=4.91..9412.55 rows=335671 loops=1)
>                Hash Cond: ("outer".interim_cli = "inner".interim_num)
>                Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
>                ->  Seq Scan on import_cdrs un  (cost=0.00..8496.71 rows=335671 width=30) (actual time=0.09..3813.54 rows=335671 loops=1)
>                ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.71..4.71 rows=0 loops=1)
>                      ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.22..2.90 rows=239 loops=1)
>  Total runtime: 432543.73 msec
> (11 rows)
>

What happens if you turn off hash joins? Also bump sort memory to something 
good.. around 16MB.. and see what difference it makes to performance..
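
In session-level terms, a sketch of that experiment (7.3/7.4 names; sort_mem
is in KB):

    SET enable_hashjoin TO off;   -- steer the planner away from hash joins
    SET sort_mem = 16384;         -- ~16MB per sort, for this session only
    -- then re-run the query under EXPLAIN ANALYZE and compare plans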

 Shridhar



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Gaetano Mendola
Andrew Sullivan wrote:
> On Mon, Oct 20, 2003 at 05:42:52PM -0400, Tom Lane wrote:
>
>> hold of these dead rows instead of recycling them.  I suspect you have
>> a client process somewhere that is holding an open transaction for a
>> long time ... probably not doing anything, just sitting there with an
>> unclosed BEGIN ...
>
> Which could be because you're doing something nasty with one of the
> "autocommit=off" clients.  Most of the client libraries implement
> this by doing "commit;begin;" at every commit.  This means you have
> way more idle in transaction connections than you think.  Look in
> pg_stat_activity, assuming you've turned on query echoing.

Or is it enough to do a

   ps -eafwww | grep post

to see the state of the connections?
Regards
Gaetano Mendola
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Harry Broomhall
Shridhar Daithankar writes:
> Harry Broomhall wrote:
>  > #effective_cache_size = 1000    # typically 8KB each
>  > #random_page_cost = 4   # units are one sequential page fetch cost
> 
> You must tune the first one at least. Try 
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
> parameters.

   Wow.  Many thanks for the pointer.  I'm going to be spending some time
trying to get my head around all of that!

[SNIP]

>  >  Total runtime: 80408.42 msec
>  > (12 rows)
> 
> You are lucky to get a better plan here, because the planner is way off w.r.t. 
> the estimated number of rows.

   Yes!  I thought that.  Which was why I was so surprised at the difference.

>  >
>  >   And now the case *with* the vacuum analyze:
>  >
[SNIP]
> 
> What happens if you turn off hash joins? Also bump sort memory to something 
> good.. around 16MB.. and see what difference it makes to performance..



   Lots of things to try there.


   It will probably take me some time.

   Regards,
  Harry.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Alexander Priem
The machine is going to be used for a pretty large database (well over 100
tables with some of them containing over a million records from the start,
number of tables and records will grow (much?) larger in the future). This
database is going to be used by a pretty large number of employees. The
number of concurrent users will vary between 1 - 100 or so, depending on the
time of day etc. This will be a database containing client and supplier data
as well as product descriptions and prices/ingredients/labels/brands etc.
Database use will include lots of SELECTS but also lots of INSERTS/UPDATES,
i.e. the database will be pretty active during business hours...

I think you (Scott and Will) are right when you say that NAS devices are not
ideal for this kind of thing. I have been thinking about the hardware
configuration for this machine for some time now (and had a lot of hints
through this list already) and decided to go for a SCSI RAID config after
all. The extra costs will be worth it   :)

The machine I have in mind now is like this :

Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
This machine will contain a PERC4/Di RAID controller with 128MB of battery
backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
(15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
server via a SCSI cable...

This machine will be a bit more expensive than I thought at first (it's
going to be about EUR 14.000, but that's including 3 years of on-site
support from Dell (24x7, 4-hour response) and peripherals like UPS etc.)...

Do you think this machine wil be OK for this task?

Thanks for your help so far   :)

Kind regards,
Alexander Priem.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Shridhar Daithankar
Alexander Priem wrote:
> Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
> This machine will contain a PERC4/Di RAID controller with 128MB of battery
> backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
> of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
> Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
> (15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
> server via a SCSI cable...
> This machine will be a bit more expensive than I thought at first (it's
> going to be about EUR 14.000, but that's including 3 years of on-site
> support from Dell (24x7, 4-hour response) and peripherals like UPS etc.)...
Check out the Opteron as well.. I don't know much about European resellers. IBM 
sells the eServer 325, which has Opterons. Apparently they scale much better at 
higher load. Of course pricing, availability and support are most important.

http://theregister.co.uk/content/61/33378.html
http://www.pc.ibm.com/us/eserver/opteron/325/
Any concrete benchmarks for postgresql w.r.t xeons and opterons? A collection 
would be nice to have..:-)

 Shridhar

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Alexander Priem
I have considered Opteron, yes. But I think there are too many
uncertainties, like :

* It's a new CPU that has not proven itself yet.
* I don't think I can buy directly from IBM (according to their site), so
how about support (24x7) ? This will be very important to our client.
* I need to install and configure a 64bit Linux flavour which I don't know
(yet)

Any suggestions about the usability of the system I described before?

Here is the description again:

Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
This machine will contain a PERC4/Di RAID controller with 128MB of battery
backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
(15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
server via a SCSI cable...

I have never worked with a XEON CPU before. Does anyone know how it performs
running PostgreSQL 7.3.4 / 7.4 on RedHat 9 ? Is it faster than a Pentium 4?
I believe the main difference is cache memory, right? Aside from cache mem,
it's basically a Pentium 4, or am I wrong?

Kind regards,
Alexander.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Geoffrey
Alexander Priem wrote:
> I have considered Opteron, yes. But I think there are too many
> uncertainties, like :
> * It's a new CPU that has not proven itself yet.
> * I don't think I can buy directly from IBM (according to their site), so
> how about support (24x7) ? This will be very important to our client.
> * I need to install and configure a 64bit Linux flavour which I don't know
> (yet)

See http://www.monarchcomputer.com/  they custom build Opteron systems 
and preload them with Linux.  You don't pay the Microsoft tax.

--
Until later, Geoffrey   [EMAIL PROTECTED]
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Harry Broomhall
Shridhar Daithankar writes:

  First - many thanks for your suggestions and pointers to further info.

  I have been trying some of them with some interesting results!

> Harry Broomhall wrote:
>  > #effective_cache_size = 1000    # typically 8KB each
>  > #random_page_cost = 4   # units are one sequential page fetch cost
> 
> You must tune the first one at least. Try 
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
> parameters.

  Changing effective_cache_size seemed to have very little effect.  I took it
in steps up to 300MB (the machine has 640MB memory), and the differences
in speed were less than 10%.

[SNIP]
> 
> What happens if you turn off hash joins?

   This makes the non-vacuum version about 40% slower, and brings the vacuum
version to the same speed (i.e. about 4X faster than it had been!).

> Also bump sort memory to something 
> good.. around 16MB.. and see what difference it makes to performance..


  This was interesting.  Taking it to 10MB made a slight improvement.  At
20MB the vacuum case improved 5X in speed, but the non-vacuum version
slowed down.  Putting it up to 40MB slowed both down again.

  I will need to test with some of the other scripts and functions I have
written, but it looks as if selective use of more sort memory will be
useful.
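
For example, wrapped around just the expensive statements (a sketch;
sort_mem is in KB):

    SET sort_mem = 20480;    -- ~20MB for this session only
    -- ... run the big update here ...
    RESET sort_mem;          -- back to the postgresql.conf default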

  Regards,
  Harry.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] SRFs ... no performance penalty?

2003-10-21 Thread Robert Treat
On Mon, 2003-10-20 at 20:55, Josh Berkus wrote:
> Folks,
> 
> I'm working on the demo session for our upcoming presentation at PHPCon.  
> 
> As a side issue, we ended up comparing 3 versions of the same search screen:
> 
> 1) All in PHP with views;
> 2) Using a function to build a query and count results but executing that 
> query directly and sorting, paging in PHP;
> 3) Using a Set Returning function to handle row-returning, sorting, and 
> paging.
> 
> All three methods were executing a series of moderately complex queries against a 
> medium-sized data set (only about 20,000 rows but it's on a laptop).  The 
> postgresql.conf was tuned like a webserver; e.g. low sort_mem, high 
> max_connections.
> 
> So far, on the average of several searches, we have:
> 
> 1) 0.19687 seconds
> 2) 0.20667 seconds
> 3) 0.20594 seconds
> 

Is this measuring time in the back-end or total time of script
execution? 


> In our tests, using any kind of PL/pgSQL function seems to carry a 0.01 second 
> penalty over using PHP to build the search query.   I'm not sure if this is 
> comparative time for string-parsing or something else; the 0.01 seems to be 
> consistent regardless of scale.
> 
> The difference between using a PL/pgSQL function as a query-builder only (the 
> 7.2.x method) and using SRFs was small enough not to be significant.
> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Hi,


Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with internal drives on RAID5 will be delivered. Postgres will be from RH8.0.

I am planning for these values for the postgres configuration - to begin with:


Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144


Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 167772


Effective_cache_size = 262144 (same as shared_buffers - 25%)



In the /etc/sysctl.conf file:

=========================================

kernel.shmall = 536870912   (512MB; SHMALL = total amount of shared memory available, in bytes or pages)

kernel.shmmax = 536870912   (512MB; SHMMAX = maximum size of a shared memory segment, in bytes)


In a generic sense, these are recommended values I found in some documents. The database will be small in size and will gradually grow over time from a few thousand to a few million records, or more. The activity will be mostly of select statements from a few tables with joins, orderby, groupby clauses. The web application is based on Apache/Resin and hotspot JVM 1.4.0.

Are the above settings ok to begin with? Are there any other parameters that I should configure now, or monitor later on?

In other words, am I missing anything here to take full advantage of 4 CPUs and 8Gigs of RAM?


Appreciate any help.



Thanks,

Anjan







Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Alexander Priem wrote:

> The machine is going to be used for a pretty large database (well over 100
> tables with some of them containing over a million records from the start,
> number of tables and records will grow (much?) larger in the future). This
> database is going to be used by a pretty large number of employees. The
> number of concurrent users will vary between 1 - 100 or so, depending on the
> time of day etc. This will be a database containing client and supplier data
> as well as product descriptions and prices/ingredients/labels/brands etc.
> Database use will include lots of SELECTS but also lots of INSERTS/UPDATES,
> i.e. the database will be pretty active during business hours...
> 
> I think you (Scott and Will) are right when you say that NAS devices are not
> ideal for this kind of thing. I have been thinking about the hardware
> configuration for this machine for some time now (and had a lot of hints
> through this list already) and decided to go for a SCSI RAID config after
> all. The extra costs will be worth it   :)
> 
> The machine I have in mind now is like this :
> 
> Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
> This machine will contain a PERC4/Di RAID controller with 128MB of battery
> backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
> of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
> Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
> (15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
> server via a SCSI cable...

Funny, we're looking at the same basic type of system here, but with a 
Perc3/CI controller.  We have a local supplier who gives us machines with 
a 3 year warranty and looks to be $1,000 to $2,000 lower than the Dell.

We're just going to run two 73 Gig drives in a RAID1 to start with, with 
battery backed RAM.

So that brings up my question, which is better, the Perc4 or Perc3 
controllers, and what's the difference between them?  I find Dell's 
tendency to hide other people's hardware behind their own model numbers 
mildly bothersome, as it makes it hard to comparison shop.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Richard Huxton
On Tuesday 21 October 2003 15:28, Anjan Dave wrote:
> Hi,
>
> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with
> internal drives on RAID5 will be delivered. Postgres will be from RH8.0.

You'll want to upgrade PG to v7.3.4

> I am planning for these values for the postgres configuration - to begin
> with:
>
> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
>
> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that -
> 167772
>
> Effective_cache_size = 262144 (same as shared_buffers - 25%)

My instincts would be to lower the first two substantially, and increase the 
effective cache once you know load levels. I'd probably start with something 
like the values below and work up:
shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the caching)
sort_mem = 4,000 - 8,000 (don't forget this is for each sort)
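
In postgresql.conf syntax, a sketch of those starting values:

    shared_buffers = 10000   # ~80MB at 8KB per buffer
    sort_mem = 4096          # in KB, and allocated per sort, not per connection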

You'll find the annotated postgresql.conf and performance tuning articles 
useful:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> In a generic sense, these are recommended values I found in some
> documents. The database will be small in size and will gradually grow
> over time from few thousands to a few million records, or more. The
> activity will be mostly of select statements from a few tables with
> joins, orderby, groupby clauses. The web application is based on
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to figure out how many concurrent users you'll have and how much 
memory will be required by apache/java. If your database grows radically, 
you'll probably want to re-tune as it grows.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Anjan,

> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with
> internal drives on RAID5 will be delivered. Postgres will be from RH8.0.

How many drives?   RAID5 sucks for heavy read-write databases, unless you have 
5+ drives.  Or a large battery-backed cache.

Also, last I checked, you can't address 8GB of RAM without a 64-bit processor.  
Since when are the Xeons 64-bit?

> Shared_buffers (25% of RAM / 8KB)) = 8589934592 * .25 / 8192 = 262144

That's too high.  Cut it in half at least.  Probably down to 5% of available 
RAM.

> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that -
> 167772

Fine if you're running a few-user-large-operation database.  If this is a 
webserver, you want a much, much lower value.

> Effective_cache_size = 262144 (same as shared_buffers - 25%)

Much too low.  Where did you get these calculations, anyway?

> In a generic sense, these are recommended values I found in some
> documents.

Where?  We need to contact the author of the "documents" and tell them to 
correct things.

> joins, orderby, groupby clauses. The web application is based on
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to estimate the memory consumed by Java & Apache to have realistic 
figures to work with.

> Are the above settings ok to begin with? Are there any other parameters
> that I should configure now, or monitor later on?

No, they're not.  See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
parameters.


-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] SRFs ... no performance penalty?

2003-10-21 Thread Josh Berkus
Robert,

> > 1) 0.19687 seconds
> > 2) 0.20667 seconds
> > 3) 0.20594 seconds
>
> Is this measuring time in the back-end or total time of script
> execution?

Total time of execution, e.g. from clicking the "enter" button to displaying 
the list of matches.  Any other comparison would be misleading.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Seum-Lim Gan
Hi Tom,

1.)
OK. We have narrowed it down.
We did a few (like 5 to 8 times) vacuum analyze  (no full), the
pg_statistics relfilenode grew. There was no database operation when
we did this, no other client connections except the one that does
the vacuum.
If we do plain simple "vacuum " (again no full), we see
pg_statistics_relid_att_index relfilenode grew instead of
pg_statistics.
So, over time, these files will grow if we do vacuum.

Are these expected?

The question now is, if we are not doing anything
to the database, why would they grow after a few vacuums?
2.)
The other problem we have with
 DETAIL:  101802 dead row versions cannot be removed yet.

 DETAIL:  110900 dead row versions cannot be removed yet.

 DETAIL:  753064 dead row versions cannot be removed yet.

 DETAIL:  765328 dead row versions cannot be removed yet.
We will collect more data and see what we can get from
the process. Offhand, the process is connecting to
the database through ODBC and we don't use any BEGIN in
our updates, just doing plain UPDATEs repeatedly
with different keys randomly.
The database is defaulted to autocommit=true in postgresql.conf.
Thanks.

Gan

At 5:25 pm -0400 2003/10/20, Tom Lane wrote:
>Seum-Lim Gan <[EMAIL PROTECTED]> writes:
>>  We tried one more thing: with the table not being updated
>>  at all and we did vacuum. Each time a vacuum is done,
>>  the index file becomes bigger.
>
>It is not possible for plain vacuum to make the index bigger.
>
>VACUUM FULL possibly could make the index bigger, since it has to
>transiently create duplicate index entries for every row it moves.
>
>If you want any really useful comments on your situation, you're going
>to have to offer considerably more detail than you have done so far ---
>preferably, a test case that lets someone else reproduce your results.
>So far, all we can do is guess on the basis of very incomplete
>information.  When you aren't even bothering to mention whether a vacuum
>is FULL or not, I have to wonder whether I have any realistic picture of
>what's going on.
>
>			regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


--
+------------------------------------------------------------------------------+
| Seum-Lim GAN                         email : [EMAIL PROTECTED]                |
| Lucent Technologies                                                           |
| 2000 N. Naperville Road, 6B-403F     tel : (630)-713-6665                     |
| Naperville, IL 60566, USA.           fax : (630)-713-7272                     |
|                                      web : http://inuweb.ih.lucent.com/~slgan |
+------------------------------------------------------------------------------+
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Tom Lane
Seum-Lim Gan <[EMAIL PROTECTED]> writes:
> We did a few (like 5 to 8 times) vacuum analyze  (no full), the
> pg_statistics relfilenode grew.

Well, sure.  ANALYZE puts new rows into pg_statistic, and obsoletes old
ones.  You need to vacuum pg_statistic every so often (not to mention
the other system catalogs).
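
That is, the kind of occasional maintenance pass described above is as
simple as (run with superuser rights):

    VACUUM pg_statistic;
    -- and likewise for the other system catalogs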

> If we do plain simple "vacuum " (again no full), we see
> pg_statistics_relid_att_index relfilenode grew instead of
> pg_statistics.

Don't think I believe that.  Plain vacuum won't touch pg_statistic
at all (unless it's the target table of course).  I'd expect ANALYZE
to make both the stats table and its index grow, though.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Tom Lane
Harry Broomhall <[EMAIL PROTECTED]> writes:

> ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)

> ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)

Hm.  The planner's default cost parameters assume that a full-table
index scan will be much slower than a full-table seq scan.  That's
evidently not the case in your test situation.  You could probably
bring the estimates more in line with reality (and thereby improve the
choice of plan) by reducing random_page_cost towards 1 and increasing
effective_cache_size to represent some realistic fraction of your
available RAM (though I concur with your observation that the
latter doesn't change the estimates all that much).

Beware however that test-case reality and production reality are not the
same thing.  You are evidently testing with tables that fit in RAM.
If your production tables will not, you'd better be wary of being overly
aggressive about reducing random_page_cost.  I believe the default value
(4.0) is fairly representative for situations where many actual disk
fetches are needed, ie, the tables are much larger than RAM.  1.0 would
be appropriate if all your tables are always fully cached in RAM (since
RAM has by definition no random-access penalty).  In intermediate cases
you need to select intermediate values.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
> 
> > Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with
> > internal drives on RAID5 will be delivered. Postgres will be from RH8.0.
> 
> How many drives?   RAID5 sucks for heavy read-write databases, unless you have 
> 5+ drives.  Or a large battery-backed cache.

You don't need a large cache, so much as a cache.  The size isn't usually 
an issue now that 64 to 256 meg caches are the nominal cache sizes.  Back 
when it was a choice of 4 or 8 megs it made a much bigger difference than 
64 versus 256 meg makes today.

Also, if it's a read only environment, RAID5 with n drives equals the 
performance of RAID0 with n-1 drives.

> Also, last I checked, you can't address 8GB of RAM without a 64-bit processor.  
> Since when are the Xeons 64-bit?

Josh, you gotta get out more.  IA32 has supported >4 gig ram for a long 
time now, and so has the linux kernel.  It uses a paging method to do it.  
Individual processes are still limited to ~3 gig on Linux on 32 bit 
hardware though, so the extra mem will almost certainly spend its time as 
kernel cache.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
From what I know, there is a cache-row-set functionality that doesn't
exist with the newer postgres...

Concurrent users will start from 1 to a high of 5000 or more, and could
ramp up rapidly. So far, with increased users, we have gone up to
starting the JVM (resin startup) with 1024megs min and max (recommended
by Sun) - on the app side.

Thanks,
Anjan 


-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 21, 2003 11:57 AM
To: Anjan Dave; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server


On Tuesday 21 October 2003 15:28, Anjan Dave wrote:
> Hi,
>
> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, 
> with internal drives on RAID5 will be delivered. Postgres will be from
> RH8.0.

You'll want to upgrade PG to v7.3.4

> I am planning for these values for the postgres configuration - to 
> begin
> with:
>
> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
>
> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 
> 167772
>
> Effective_cache_size = 262144 (same as shared_buffers - 25%)

My instincts would be to lower the first two substantially, and increase the 
effective cache once you know load levels. I'd probably start with something 
like the values below and work up:
shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the caching)
sort_mem = 4,000 - 8,000 (don't forget this is for each sort)

You'll find the annotated postgresql.conf and performance tuning
articles 
useful: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> In a generic sense, these are recommended values I found in some 
> documents. The database will be small in size and will gradually grow 
> over time from few thousands to a few million records, or more. The 
> activity will be mostly of select statements from a few tables with 
> joins, orderby, groupby clauses. The web application is based on 
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to figure out how many concurrent users you'll have and how
much 
memory will be required by apache/java. If your database grows
radically, 
you'll probably want to re-tune as it grows.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Josh,

The 6650 can have up to 32GB of RAM.

There are 5 drives. In future, they will be replaced by a fiber array -
hopefully.

I read an article that suggests you 'start' with 25% of memory for
shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
link:
http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
Maybe, I misinterpreted it.

I read the document on
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html and the
suggested values are much lower than what I have mentioned here. It
won't hurt to start with lower numbers and increase later on if needed.

Thanks,
Anjan 



-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 21, 2003 12:21 PM
To: Anjan Dave; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server


Anjan,

> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, 
> with internal drives on RAID5 will be delivered. Postgres will be from
> RH8.0.

How many drives?   RAID5 sucks for heavy read-write databases, unless
you have 
5+ drives.  Or a large battery-backed cache.

Also, last I checked, you can't address 8GB of RAM without a 64-bit
processor.  
Since when are the Xeons 64-bit?

> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144

That's too high.  Cut it in half at least.  Probably down to 5% of
available 
RAM.

> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 
> 167772

Fine if you're running a few-user-large-operation database.  If this is
a 
webserver, you want a much, much lower value.

> Effective_cache_size = 262144 (same as shared_buffers - 25%)

Much too low.  Where did you get these calculations, anyway?

> In a generic sense, these are recommended values I found in some 
> documents.

Where?  We need to contact the author of the "documents" and tell them
to 
correct things.

> joins, orderby, groupby clauses. The web application is based on 
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to estimate the memory consumed by Java & Apache to have
realistic 
figures to work with.

> Are the above settings ok to begin with? Are there any other 
> parameters that I should configure now, or monitor later on?

No, they're not.  See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune
these 
parameters.


-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Scott,

> Also, if it's a read only environment, RAID5 with n drives equals the
> performance of RAID0 with n-1 drives.

True.

> Josh, you gotta get out more.  IA32 has supported >4 gig ram for a long
> time now, and so has the linux kernel.  It uses a paging method to do it.
> Individual processes are still limited to ~3 gig on Linux on 32 bit
> hardware though, so the extra mem will almost certainly spend its time as
> kernel cache.

Not that you'd want a single process to grow that large anyway.   

So what is the ceiling on 32-bit processors for RAM? Most of the 64-bit 
vendors are pushing Athlon64 and G5 as "breaking the 4GB barrier", and even 
I can do the math on 2^32.   All these 64-bit vendors, then, are talking 
about the limit on ram *per application* and not per machine?

This has all been academic to me to date, as the only very-high-ram systems 
I've worked with were Sparc or micros.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Low Insert/Update Performance

2003-10-21 Thread Josh Berkus
Rhaoni,

> The Delphi program does just one commit for all queries.
> I was wondering if there are some configuration parameters to be changed to
> improve the performance?

To help you, we'll need you to trap a query and run an EXPLAIN ANALYZE on it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Anjan,

> I read an article that suggests you 'start' with 25% of memory for
> shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
> link:
> http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
> Maybe, I misinterpreted it.

No, I can see how you arrived at that conclusion, and Bruce is an authority.  
I'll contact him.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Anjan,

> From what I know, there is a cache-row-set functionality that doesn't
> exist with the newer postgres...

What?  PostgreSQL has always used the kernel cache for queries.

> Concurrent users will start from 1 to a high of 5000 or more, and could
> ramp up rapidly. So far, with increased users, we have gone up to
> starting the JVM (resin startup) with 1024megs min and max (recommended
> by Sun) - on the app side.

Well, just keep in mind when tuning that your calculations should be based on 
*available* RAM, meaning RAM not used by Apache or the JVM.

With that many concurrent requests, you'll want to be *very* conservative with 
sort_mem; I might stick to the default of 1024 if I were you, or even lower 
it to 512k.
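
In postgresql.conf terms, that is simply (a sketch):

    sort_mem = 512    # KB per sort; with thousands of backends even this adds up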

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Scott,
> 
> > Also, if it's a read only environment, RAID5 with n drives equals the
> > performance of RAID0 with n-1 drives.
> 
> True.
> 
> > Josh, you gotta get out more.  IA32 has supported >4 gig ram for a long
> > time now, and so has the linux kernel.  It uses a paging method to do it.
> > Individual processes are still limited to ~3 gig on Linux on 32 bit
> > hardware though, so the extra mem will almost certainly spend its time as
> > kernel cache.
> 
> Not that you'd want a single process to grow that large anyway.

True :-)  Especially a pgsql backend.

> So what is the ceiling on 32-bit processors for RAM? Most of the 64-bit 
> vendors are pushing Athlon64 and G5 as "breaking the 4GB barrier", and even 
> I can do the math on 2^32.   All these 64-bit vendors, then, are talking 
> about the limit on ram *per application* and not per machine?

I think it's 64 gigs in the current implementation, but that could just be 
a chip set thing, i.e. the theoretical limit is probably 2^63 or 2^64, but 
the realistic limitation is that the current mobo chipsets are gonna have 
a much lower limit, and I seem to recall that being 64 gig last I looked.



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
> 
> > From what I know, there is a cache-row-set functionality that doesn't
> > exist with the newer postgres...
> 
> What?  PostgreSQL has always used the kernel cache for queries.
> 
> > Concurrent users will start from 1 to a high of 5000 or more, and could
> > ramp up rapidly. So far, with increased users, we have gone up to
> > starting the JVM (resin startup) with 1024megs min and max (recommended
> > by Sun) - on the app side.
> 
> Well, just keep in mind when tuning that your calculations should be based on 
> *available* RAM, meaning RAM not used by Apache or the JVM.
> 
> With that many concurrent requests, you'll want to be *very* conservative with 
> sort_mem; I might stick to the default of 1024 if I were you, or even lower 
> it to 512k.

Exactly.  Remember, Anjan, that if you have a single sort that can't 
fit in RAM, it will use the hard drive for temp space, effectively 
"swapping" on its own.  If the concurrent sorts run the server out of 
memory, the server will start swapping processes, quite possibly the sorts, 
in a sort of hideous round robin death spiral that will bring your machine 
to its knees at the worst possible time, midday under load.  sort_mem is 
one of the small "foot guns" in the postgresql.conf file that people tend 
to pick up and go "huh, what's this do?" right before cranking it up.
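
The rough arithmetic behind that warning (illustrative numbers only, since
sort_mem is allocated per sort rather than per server):

    5000 backends * 16384 KB sort_mem  =  ~80 GB of potential requests
    5000 backends *   512 KB sort_mem  =  ~2.5 GB, which at least fits in RAM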


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 10:12:15AM -0700, Josh Berkus wrote:
> 
> So what is the ceiling on 32-bit processors for RAM? Most of the 64-bit 
> vendors are pushing Athlon64 and G5 as "breaking the 4GB barrier", and even 
> I can do the math on 2^32.   All these 64-bit vendors, then, are talking 
> about the limit on ram *per application* and not per machine?

Or per same-time access.  Remember that, back in the old days on the
pre-386s, accessing the extended or expanded memory (anyone remember
which was which?) involved some fairly serious work, and not
everything was seamless.  I expect something similar is at work here. 
Not that I've had a reason to play with 4G ix86 machines, anyway.

A

-- 

Andrew Sullivan                          204-4141 Yonge Street
Afilias Canada                         Toronto, Ontario Canada
<[EMAIL PROTECTED]>                                    M2P 2A8
                                         +1 416 646 3304 x110


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 10:15:57AM -0700, Josh Berkus wrote:
> Anjan,
> 
> > I read an article that suggests you 'start' with 25% of memory for
> > shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
> > link:
> > http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
> > Maybe, I misinterpreted it.
> 
> No, I can see how you arrived at that conclusion, and Bruce is an authority.  
> I'll contact him.

I think the "25%" rule of thumb is slightly stale: above some
threshold, it just falls apart, and lots of people now have machines
well within that threshold.  Heck, I'll bet Bruce's 2-way machine is
within that threshold.

A

-- 

Andrew Sullivan                          204-4141 Yonge Street
Afilias Canada                         Toronto, Ontario Canada
<[EMAIL PROTECTED]>                                    M2P 2A8
                                         +1 416 646 3304 x110


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Jeff
On Tue, 21 Oct 2003 10:12:15 -0700
Josh Berkus <[EMAIL PROTECTED]> wrote:

> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athlon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32.   All these 64-bit
> vendors, then, are talking about the limit on ram *per application*
> and not per machine?

You can have > 4GB per app, but also you get a big performance boost as
you don't have to deal with all the silly paging - think of it like when
we switched from real mode to protected mode. 

If you check out the linux-kernel archives you'll see one of the things
often recommended when things go odd is to turn off HIMEM support. 

-- 
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Tom Lane
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> I think the "25%" rule of thumb is slightly stale: above some
> threshold, it just falls apart, and lots of people now have machines
> well within that threshold.  Heck, I'll bet Bruce's 2-way machine is
> within that threshold.

IIRC, we've not seen much evidence that increasing shared_buffers above
about 10000 delivers any performance boost.  That's 80Mb, so the "25%"
rule doesn't get seriously out of whack until you get to a gig or so of
RAM.  Which was definitely not common at the time the rule was put
forward, but is now.  Probably we should modify the rule-of-thumb to
something like "25%, but not more than 10000 buffers".
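
In conf terms, a sketch of that revised rule:

    # min(25% of RAM, 10000 buffers); at 8KB per buffer the cap is ~80MB
    shared_buffers = 10000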

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Andrew,

> I think the "25%" rule of thumb is slightly stale: above some
> threshold, it just falls apart, and lots of people now have machines
> well within that threshold.  Heck, I'll bet Bruce's 2-way machine is
> within that threshold.

Sure.  But we had a few people on this list do tests (including me) and the 
anecdotal evidence was lower than 25%, substantially.   The falloff is subtle 
until you hit 50% of RAM, like:

%   query throughput
1   --
5   ------
10  ----------
15  ---------
20  ---------
25  --------
30  -------
35  ------
40  -----

... so it's often not immediately apparent when you've set stuff a little too 
high.  However, among the folks that tested, the ideal was never anywhere near 
25%, usually more in the realm of 5-10%.  I've been using 6% as my starting 
figure for the last year for a variety of servers with good results.

Of course, if you have anecdotal evidence to the contrary, then the only way 
to work this would be to have OSDL help us sort it out.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Will LaShell
On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:

> So that brings up my question, which is better, the Perc4 or Perc3 
> controllers, and what's the difference between them?  I find Dell's 
> tendency to hide other people's hardware behind their own model numbers 
> mildly bothersome, as it makes it hard to comparison shop.

Perc4 has an LSI 1030 chip
http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm


Perc3, depending on the model, can be a couple of things, but I think
they are all U160 controllers and not U320.




Will




Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Tom Lane
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> On Tue, Oct 21, 2003 at 10:12:15AM -0700, Josh Berkus wrote:
>> So what is the ceiling on 32-bit processors for RAM?

> ... Remember that, back in the old days on the
> pre-386s, accessing the extended or expanded memory (anyone remember
> which was which?) involved some fairly serious work, and not
> everything was seamless.  I expect something similar is at work here. 

Right.  A 32-bit processor can only (conveniently) allow any individual
process to access 4G worth of address space.  However the total RAM in
the system can be more --- the kernel can set up the hardware address
mappings to let different user processes use different up-to-4G segments
of that RAM.  And the kernel can also use excess RAM for disk buffer
cache.  So there's plenty of value in more-than-4G RAM, as long as
you're not expecting any single user process to need more than 4G.
This is no problem at all for Postgres, in which individual backend
processes don't usually get very large, and we'd just as soon let most
of the RAM go to kernel disk buffers anyway.

I think that some hardware configurations have problems with using RAM
above the first 4G for disk buffers, because of disk controller hardware
that can't cope with physical DMA addresses wider than 32 bits.  The
solution here is to buy a better disk controller.  If you google for
"bounce buffers" you can learn more about this.

What goes around comes around I guess --- I remember playing these same
kinds of games to use more than 64K RAM in 16-bit machines, 25-odd years
ago...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Christopher Browne
In the last exciting episode, [EMAIL PROTECTED] (Josh Berkus) wrote:
> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athalon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32.  All these 64-bit
> vendors, then, are talking about the limit on ram *per application*
> and not per machine?

I have been seeing ia-32 servers with 8GB of RAM; it looks as though
there are ways of having them support ("physically, in theory, if you
could get a suitable motherboard") as much as 64GB.

But that certainly doesn't get you past 2^32 bytes per process, and
possibly not past 2^31 bytes/process.

From Linux kernel help:

 CONFIG_NOHIGHMEM:
Linux can use up to 64 Gigabytes of physical memory on x86
systems.  However, the address space of 32-bit x86 processors is
only 4 Gigabytes large. That means that, if you have a large
amount of physical memory, not all of it can be "permanently
mapped" by the kernel. The physical memory that's not permanently
mapped is called "high memory".

And that leaves open the question of how much shared memory you can
address.  That presumably has to fit into the 4GB, and if your
PostgreSQL processes had (by some fluke) 4GB of shared memory, there
wouldn't be any "local" memory for sort memory and the likes.

Add to that the consideration that there are reports of Linux "falling
over" when you get to right around 2GB/4GB.  I ran a torture test a
while back that _looked_ like it was running into that; I can't verify
that, unfortunately.

I don't see there being a whole lot of use of having more than about
8GB on an ia-32 system; what with shared memory maxing out at
somewhere between 1 and 2GB, that suggests having ~8GB in total.

I'd add another PG cluster if I had 16GB...
-- 
let name="aa454" and tld="freenet.carleton.ca" in name ^ "@" ^ tld;;
http://www.ntlug.org/~cbbrowne/postgresql.html
"A statement is either correct or incorrect.  To be *very* incorrect is
 like being *very* dead ... "
-- Herbert F. Spirer
   Professor of Information Management
   University of Conn.
   (DATAMATION Letters, Sept. 1, 1984)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Josh,
 
The app servers are separate dual-cpu boxes with 2GB RAM on each.
 
Yes, from all the responses I have seen, I will be reducing the numbers to what has 
been suggested.
 
Thanks to all,
anjan

-Original Message- 
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Tue 10/21/2003 1:22 PM 
To: Anjan Dave; Richard Huxton; [EMAIL PROTECTED] 
Cc: 
Subject: Re: [PERFORM] Tuning for mid-size server



Anjan,

> From what I know, there is a cache-row-set functionality that doesn't
> exist with the newer postgres...

What?  PostgreSQL has always used the kernel cache for queries.

> Concurrent users will start from 1 to a high of 5000 or more, and could
> ramp up rapidly. So far, with increased users, we have gone up to
> starting the JVM (resin startup) with 1024megs min and max (recommended
> by Sun) - on the app side.

Well, just keep in mind when tuning that your calculations should be based on
*available* RAM, meaning RAM not used by Apache or the JVM.

With that many concurrent requests, you'll want to be *very* conservative with
sort_mem; I might stick to the default of 1024 if I were you, or even lower
it to 512k.

--
Josh Berkus
Aglio Database Solutions
San Francisco



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Hopefully, I am not steering this in a different direction, but is there a way to 
find out how much sort memory each query is taking up, so that we can scale that up 
with increasing users?

THanks,
Anjan

-Original Message- 
From: scott.marlowe [mailto:[EMAIL PROTECTED] 
Sent: Tue 10/21/2003 1:33 PM 
To: Josh Berkus 
Cc: Anjan Dave; Richard Huxton; [EMAIL PROTECTED] 
Subject: Re: [PERFORM] Tuning for mid-size server



On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
>
> > From what I know, there is a cache-row-set functionality that doesn't
> > exist with the newer postgres...
>
> What?  PostgreSQL has always used the kernel cache for queries.
>
> > Concurrent users will start from 1 to a high of 5000 or more, and could
> > ramp up rapidly. So far, with increased users, we have gone up to
> > starting the JVM (resin startup) with 1024megs min and max (recommended
> > by Sun) - on the app side.
>
> Well, just keep in mind when tuning that your calculations should be based on
> *available* RAM, meaning RAM not used by Apache or the JVM.
>
> With that many concurrent requests, you'll want to be *very* conservative with
> sort_mem; I might stick to the default of 1024 if I were you, or even lower
> it to 512k.

Exactly.  Remember, Anjan, that if you have a single sort that can't
fit in RAM, it will use the hard drive for temp space, effectively
"swapping" on its own.  If the concurrent sorts run the server out of
memory, the server will start swapping processes, quite possibly the sorts,
in a sort of hideous round-robin death spiral that will bring your machine
to its knees at the worst possible time, midday under load.  sort_mem is
one of the small "foot guns" in the postgresql.conf file that people tend
to pick up and go "huh, what's this do?" right before cranking it up.
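
As a back-of-the-envelope illustration of why (made-up numbers, not Anjan's
actual settings):

# sort_mem is per sort, per backend, and one query can run several sorts.
#   cranked up:    sort_mem = 32768 (32MB) x 500 active backends
#                  => up to ~16GB of sort space demanded on a 2GB machine
#   conservative:  sort_mem = 1024 (1MB)   x 500 active backends
#                  => ~500MB worst case, which the box can absorb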




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] RAID controllers etc... was: PostgreSQL data on a NAS device ?

2003-10-21 Thread scott.marlowe
On 21 Oct 2003, Will LaShell wrote:

> On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:
> 
> > So that brings up my question, which is better, the Perc4 or Perc3 
> > controllers, and what's the difference between them?  I find Dell's 
> > tendency to hide other people's hardware behind their own model numbers 
> > mildly bothersome, as it makes it hard to comparison shop.
> 
> Perc4 has an LSI 1030 chip
> http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm
> 
> 
> Perc3
> depending on the model can be a couple of things but I think they are
> all U160 controllers and not U320

Thanks.  I googled around and found this page:

http://www.domsch.com/linux/

Which says what each model is.  It looks like the "RAID" controller they 
wanna charge me for is about $500 or so, so I'm guessing it's the medium 
range Elite 1600 type controller, i.e. U160, which is plenty for the 
machine / drive number we'll be using.  

Has anyone played around with the latest ones to get a feel for them?  I 
want a battery backed controller that runs well under linux and also BSD 
that isn't gonna break the bank.  I'd heard bad stories about the 
performance of the Adaptec RAID controllers, but it seems the newer ones 
aren't bad from what I've found googling.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 11:51:02AM -0700, Josh Berkus wrote:

> Of course, if you have anecdotal evidence to the contrary, then the
> only way to work this would be to have OSDL help us sort it out.

Nope.  I too have such anecdotal evidence that 25% is way too high. 
It also seems to depend pretty heavily on what you're trying to
optimise for and what platform you have.  But I'm glad to hear
(again) that people seem to think 25% is too high for most cases.  I
don't feel so much like I'm tilting at windmills.

A


-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Andrew Sullivan wrote:

> On Tue, Oct 21, 2003 at 11:51:02AM -0700, Josh Berkus wrote:
> 
> > Of course, if you have anecdotal evidence to the contrary, then the
> > only way to work this would be to have OSDL help us sort it out.
> 
> Nope.  I too have such anecdotal evidence that 25% is way too high. 
> It also seems to depend pretty heavily on what you're trying to
> optimise for and what platform you have.  But I'm glad to hear
> (again) that people seem to think 25% is too high for most cases.  I
> don't feel so much like I'm tilting at windmills.

I think where it makes sense is when you have something like a report 
server where the result sets may be huge, but the parallel load is low, 
i.e. 5 or 10 users tossing around 100 Meg or more at a time.

If you've got 5,000 users running queries that are indexed and won't be 
using that much memory each, then there's usually no advantage to going 
over a certain number of buffers, and that certain number may be as low 
as 1000 for some applications.
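
As a rough sketch of the two extremes (values are illustrative only, not
tested recommendations):

# many users, small indexed queries: a modest pool is plenty
shared_buffers = 1024     # 8KB buffers = 8MB

# report server, a few users with huge result sets: larger, but still
# well under the oft-quoted 25% of RAM
shared_buffers = 32768    # 8KB buffers = 256MB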


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] RAID controllers etc... was: PostgreSQL data on a NAS device ?

2003-10-21 Thread Will LaShell
On Tue, 2003-10-21 at 13:36, scott.marlowe wrote:
> On 21 Oct 2003, Will LaShell wrote:
> 
> > On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:
> > 
> > > So that brings up my question, which is better, the Perc4 or Perc3 
> > > controllers, and what's the difference between them?  I find Dell's 
> > > tendency to hide other people's hardware behind their own model numbers 
> > > mildly bothersome, as it makes it hard to comparison shop.
> > 
> > Perc4 has an LSI 1030 chip
> > http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm
> > 
> > 
> > Perc3
> > depending on the model can be a couple of things but I think they are
> > all U160 controllers and not U320
> 
> Thanks.  I googled around and found this page:
> 
> http://www.domsch.com/linux/
> 
> Which says what each model is.  It looks like the "RAID" controller they 
> wanna charge me for is about $500 or so, so I'm guessing it's the medium 
> range Elite 1600 type controller, i.e. U160, which is plenty for the 
> machine / drive number we'll be using.  
> 
> Has anyone played around with the latest ones to get a feel for them?  I 
> want a battery backed controller that runs well under linux and also BSD 
> that isn't gonna break the bank.  I'd heard bad stories about the 
> performance of the Adaptec RAID controllers, but it seems the newer ones 
> aren't bad from what I've found googling.

We own 2 Elite 1650s and we love them.  It would have been nice to have
U320-capable controllers, but the cards are completely reliable.  I
recommend the LSI controllers to everyone because I've never had a
problem with them.





Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 03:11:17PM -0600, scott.marlowe wrote:
> I think where it makes sense is when you have something like a report 
> server where the result sets may be huge, but the parallel load is low, 
> i.e. 5 or 10 users tossing around 100 Meg or more at a time.

In our case, we were noticing that truss showed an unbelievable
amount of time spent by the postmaster doing open() calls to the OS
(this was on Solaris 7).  So we thought, "Let's try a 2G buffer
size."  2G was more than enough to hold the entire data set under
question.  Once the buffer started to fill, even plain SELECTs
started taking a long time.  The buffer algorithm is just not that
clever, was my conclusion.

(Standard disclaimer: not a long, controlled test.  It's just a bit
of gossip.)

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Scott,

> I think where it makes sense is when you have something like a report 
> server where the result sets may be huge, but the parallel load is low, 
> i.e. 5 or 10 users tossing around 100 Meg or more at a time.

I've found that that scenario makes the difference between using 6% and 12% ... 
particularly large data transformations ... but not higher than that.  And 
I've had ample opportunity to test on 2 reporting servers.  For one thing, 
with very large reports one tends to be heavily I/O-bound, and that caching 
is handled by the kernel.
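
For concreteness, on a hypothetical reporting box with 8GB of RAM those
percentages work out to roughly:

#  6% of 8GB ~= 492MB  =>  shared_buffers ~= 63000   (8KB buffers)
# 12% of 8GB ~= 983MB  =>  shared_buffers ~= 126000  (8KB buffers)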

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] analyzing postgresql performance for dbt-2

2003-10-21 Thread markw
I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having
some trouble figuring out what I should be looking for when I'm trying
to tune the database.  I have results for a decent baseline, but when I
try to increase the load on the database, the performance drops.
Nothing in the graphs (in the links listed later) sticks out to me so
I'm wondering if there are other database statistics I should try to
collect. Any suggestions would be great and let me know if I can answer
any other questions.

Here are a pair of results where I just raise the load on the
database, where increasing the load increases the area of the database
touched in addition to increasing the transaction rate.  The overall
metric increases somewhat, but the response time for most of the
interactions also increases significantly:

http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
- load of 100 warehouses
- metric 1249.65

http://developer.osdl.org/markw/dbt2-pgsql/149/
- load of 140 warehouses
- metric 1323.90

Both of these runs had wal_buffers set to 8, checkpoint_segments 200,
and checkpoint_timeout 1800.
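
For reference, in postgresql.conf terms that baseline looks like this (an
illustrative transcription of the values above, not a paste from the actual
config file):

wal_buffers = 8              # WAL buffers, 8KB each
checkpoint_segments = 200    # 16MB per segment, ~3.2GB of WAL between checkpoints
checkpoint_timeout = 1800    # seconds; checkpoint at least every 30 minutes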

So far I've only tried various wal_buffers and checkpoint_segments
settings in the next set of results for a load of 140 warehouses.

http://developer.osdl.org/markw/dbt2-pgsql/148/
- metric 1279.26
- wal_buffers 8
- checkpoint_segments 100
- checkpoint_timeout 300

http://developer.osdl.org/markw/dbt2-pgsql/149/
- metric 1323.90
- wal_buffers 8
- checkpoint_segments 200
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/150/
- metric 1281.13
- wal_buffers 8
- checkpoint_segments 300
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/151/
- metric 1311.99
- wal_buffers 32
- checkpoint_segments 200
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/152/
- metric 1268.37
- wal_buffers 64
- checkpoint_segments 200
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/154/
- metric 1314.62
- wal_buffers 16
- checkpoint_segments 200
- checkpoint_timeout 1800


Thanks!

-- 
Mark Wong - - [EMAIL PROTECTED]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436  (fax)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] analyzing postgresql performance for dbt-2

2003-10-21 Thread Bruce Momjian
[EMAIL PROTECTED] wrote:
> I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having
> some trouble figuring out what I should be looking for when I'm trying
> to tune the database.  I have results for a decent baseline, but when I
> try to increase the load on the database, the performance drops.
> Nothing in the graphs (in the links listed later) sticks out to me so
> I'm wondering if there are other database statistics I should try to
> collect. Any suggestions would be great and let me know if I can answer
> any other questions.
> 
> Here are a pair of results where I just raise the load on the
> database, where increasing the load increases the area of the database
> touched in addition to increasing the transaction rate.  The overall
> metric increases somewhat, but the response time for most of the
> interactions also increases significantly:
> 
> http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
>   - load of 100 warehouses
>   - metric 1249.65
>   
> http://developer.osdl.org/markw/dbt2-pgsql/149/
>   - load of 140 warehouses
>   - metric 1323.90

I looked at these charts and they looked normal to me.  It looked like
the load increased until your computer was saturated.  Is there
something I am missing?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] analyzing postgresql performance for dbt-2

2003-10-21 Thread Mark Wong
On Tue, Oct 21, 2003 at 08:35:56PM -0400, Bruce Momjian wrote:
> [EMAIL PROTECTED] wrote:
> > I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having
> > some trouble figuring out what I should be looking for when I'm trying
> > to tune the database.  I have results for a decent baseline, but when I
> > try to increase the load on the database, the performance drops.
> > Nothing in the graphs (in the links listed later) sticks out to me so
> > I'm wondering if there are other database statistics I should try to
> > collect. Any suggestions would be great and let me know if I can answer
> > any other questions.
> > 
> > Here are a pair of results where I just raise the load on the
> > database, where increasing the load increases the area of the database
> > touched in addition to increasing the transaction rate.  The overall
> > metric increases somewhat, but the response time for most of the
> > interactions also increases significantly:
> > 
> > http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
> > - load of 100 warehouses
> > - metric 1249.65
> > 
> > http://developer.osdl.org/markw/dbt2-pgsql/149/
> > - load of 140 warehouses
> > - metric 1323.90
> 
> I looked at these charts and they looked normal to me.  It looked like
> the load increased until your computer was saturated.  Is there
> something I am missing?

I've run some I/O tests, so I'm pretty sure I haven't saturated that.  And it
looks like I have almost 10% of processor time left.  I do agree that it
appears something might be saturated; I just don't know where to look...

Thanks,
Mark

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org