Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Andrew Sullivan
On Mon, Oct 20, 2003 at 05:42:52PM -0400, Tom Lane wrote:

> hold of these dead rows instead of recycling them.  I suspect you have
> a client process somewhere that is holding an open transaction for a
> long time ... probably not doing anything, just sitting there with an
> unclosed BEGIN ...

Which could be because you're doing something nasty with one of the
"autocommit=off" clients.  Most of the client libraries implement
this by doing "commit;begin;" at every commit.  This means you have
way more idle in transaction connections than you think.  Look in
pg_stat_activity, assuming you've turned on query echoing. 
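
For example, a minimal sketch of that check (on 7.3/7.4 the current_query
column is only populated if stats_command_string is turned on):

    SELECT procpid, usename, current_query
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction';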

A

-- 

Andrew Sullivan                          204-4141 Yonge Street
Afilias Canada                         Toronto, Ontario Canada
<[EMAIL PROTECTED]>                                    M2P 2A8
                                         +1 416 646 3304 x110


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Harry Broomhall
Josh Berkus writes:
> Harry,


   Many thanks for your response,

> 
> >It has been suggested to me that I resubmit this question to this list,
> > rather than the GENERAL list it was originally sent to.
> >
> >I asked earlier about ways of doing an UPDATE involving a left outer
> > join and got some very useful feedback.
> 
> The query you posted will always be somewhat slow due to the forced join 
> order, which is unavoidable with a left outer join.  

  Yes - I rather suspected that!  It is a shame it takes two joins to do
the work.

> 
> However, regarding your peculiar behaviour, please post:
> 
> 1) Your random_page_cost and effective_cache_size settings

#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4   # units are one sequential page fetch cost

  i.e. - still set to their defaults.

> 2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN

  First the case with no vacuum analyze:

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
   Merge Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)
   ->  Sort  (cost=99.32..101.82 rows=1000 width=95) (actual time=18578.71..21155.65 rows=335671 loops=1)
         Sort Key: un.cdr_id
         ->  Hash Join  (cost=6.99..49.49 rows=1000 width=95) (actual time=4.70..10011.35 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..20.00 rows=1000 width=49) (actual time=0.02..4265.63 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.57..4.57 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.12..2.77 rows=239 loops=1)
 Total runtime: 80408.42 msec
(12 rows)

  And now the case *with* the vacuum analyze:

                                  QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
   Hash Cond: ("outer".cdr_id = "inner".cdr_id)
   ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)
   ->  Hash  (cost=10398.73..10398.73 rows=335671 width=76) (actual time=12371.13..12371.13 rows=0 loops=1)
         ->  Hash Join  (cost=6.99..10398.73 rows=335671 width=76) (actual time=4.91..9412.55 rows=335671 loops=1)
               Hash Cond: ("outer".interim_cli = "inner".interim_num)
               Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
               ->  Seq Scan on import_cdrs un  (cost=0.00..8496.71 rows=335671 width=30) (actual time=0.09..3813.54 rows=335671 loops=1)
               ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.71..4.71 rows=0 loops=1)
                     ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.22..2.90 rows=239 loops=1)
 Total runtime: 432543.73 msec
(11 rows)

   Please note that since I first posted I have been slightly adjusting the
schema of the tables, but the disparity remains.

   Many thanks for your assistance.

   Regards,
   Harry.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Shridhar Daithankar
Harry Broomhall wrote:
> #effective_cache_size = 1000    # typically 8KB each
> #random_page_cost = 4   # units are one sequential page fetch cost
You must tune the first one at least. Try 
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
parameters.
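
As an illustration only (the numbers are assumptions, not measurements), a
machine with roughly half a gigabyte of RAM free for the OS cache might
start from:

    effective_cache_size = 32768   # in 8KB pages, i.e. ~256MB of expected OS cache
    random_page_cost = 2           # below the default 4 when data is mostly cached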

>>2) The EXPLAIN ANALYZE of each query instead of just the EXPLAIN
>
>
>   First the case with no vacuum analyze:
>
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Merge Join  (cost=99.32..171.32 rows=1000 width=259) (actual time=18579.92..48277.69 rows=335671 loops=1)
>    Merge Cond: ("outer".cdr_id = "inner".cdr_id)
>    ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)
>    ->  Sort  (cost=99.32..101.82 rows=1000 width=95) (actual time=18578.71..21155.65 rows=335671 loops=1)
>          Sort Key: un.cdr_id
>          ->  Hash Join  (cost=6.99..49.49 rows=1000 width=95) (actual time=4.70..10011.35 rows=335671 loops=1)
>                Hash Cond: ("outer".interim_cli = "inner".interim_num)
>                Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
>                ->  Seq Scan on import_cdrs un  (cost=0.00..20.00 rows=1000 width=49) (actual time=0.02..4265.63 rows=335671 loops=1)
>                ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.57..4.57 rows=0 loops=1)
>                      ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.12..2.77 rows=239 loops=1)
>  Total runtime: 80408.42 msec
> (12 rows)

You are lucky to get a better plan here, because the planner is way off w.r.t. 
the estimated number of rows.
>
>   And now the case *with* the vacuum analyze:
>
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  Hash Join  (cost=15335.91..49619.57 rows=335671 width=202) (actual time=12383.44..49297.58 rows=335671 loops=1)
>    Hash Cond: ("outer".cdr_id = "inner".cdr_id)
>    ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)
>    ->  Hash  (cost=10398.73..10398.73 rows=335671 width=76) (actual time=12371.13..12371.13 rows=0 loops=1)
>          ->  Hash Join  (cost=6.99..10398.73 rows=335671 width=76) (actual time=4.91..9412.55 rows=335671 loops=1)
>                Hash Cond: ("outer".interim_cli = "inner".interim_num)
>                Join Filter: (("outer".starttime >= "inner".starttime) AND ("outer".starttime <= "inner".endtime))
>                ->  Seq Scan on import_cdrs un  (cost=0.00..8496.71 rows=335671 width=30) (actual time=0.09..3813.54 rows=335671 loops=1)
>                ->  Hash  (cost=6.39..6.39 rows=239 width=46) (actual time=4.71..4.71 rows=0 loops=1)
>                      ->  Seq Scan on num_xlate  (cost=0.00..6.39 rows=239 width=46) (actual time=0.22..2.90 rows=239 loops=1)
>  Total runtime: 432543.73 msec
> (11 rows)
>

What happens if you turn off hash joins? Also bump sort memory to something 
good.. around 16MB.. and see what difference it makes to performance..
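
In session-level terms, a sketch of that experiment (7.3/7.4 names; sort_mem
is in KB):

    SET enable_hashjoin TO off;   -- steer the planner away from hash joins
    SET sort_mem = 16384;         -- ~16MB per sort, for this session only
    -- then re-run the query under EXPLAIN ANALYZE and compare plans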

 Shridhar



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Gaetano Mendola
Andrew Sullivan wrote:
> On Mon, Oct 20, 2003 at 05:42:52PM -0400, Tom Lane wrote:
>
>> hold of these dead rows instead of recycling them.  I suspect you have
>> a client process somewhere that is holding an open transaction for a
>> long time ... probably not doing anything, just sitting there with an
>> unclosed BEGIN ...
>
> Which could be because you're doing something nasty with one of the
> "autocommit=off" clients.  Most of the client libraries implement
> this by doing "commit;begin;" at every commit.  This means you have
> way more idle in transaction connections than you think.  Look in
> pg_stat_activity, assuming you've turned on query echoing.

Or is it enough to do a

   ps -eafwww | grep post

to see the state of the connections?
Regards
Gaetano Mendola
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Harry Broomhall
Shridhar Daithankar writes:
> Harry Broomhall wrote:
>  > #effective_cache_size = 1000    # typically 8KB each
>  > #random_page_cost = 4   # units are one sequential page fetch cost
> 
> You must tune the first one at least. Try 
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
> parameters.

   Wow.  Many thanks for the pointer.  I'm going to be spending some time
trying to get my head around all of that!

[SNIP]

>  >  Total runtime: 80408.42 msec
>  > (12 rows)
> 
> You are lucky to get a better plan here, because the planner is way off w.r.t. 
> the estimated number of rows.

   Yes!  I thought that.  Which was why I was so surprised at the difference.

>  >
>  >   And now the case *with* the vacuum analyze:
>  >
[SNIP]
> 
> What happens if you turn off hash joins? Also bump sort memory to something 
> good.. around 16MB.. and see what difference it makes to performance..



   Lots of things to try there.


   It will probably take me some time.

   Regards,
  Harry.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Alexander Priem
The machine is going to be used for a pretty large database (well over 100
tables with some of them containing over a million records from the start,
number of tables and records will grow (much?) larger in the future). This
database is going to be used by a pretty large number of employees. The
number of concurrent users will vary between 1 - 100 or so, depending on the
time of day etc. This will be a database containing client and supplier data
as well as product descriptions and prices/ingredients/labels/brands etc.
Database use will include lots of SELECTS but also lots of INSERTS/UPDATES,
i.e. the database will be pretty active during business hours...

I think you (Scott and Will) are right when you say that NAS devices are not
ideal for this kind of thing. I have been thinking about the hardware
configuration for this machine for some time now (and had a lot of hints
through this list already) and decided to go for a SCSI RAID config after
all. The extra costs will be worth it   :)

The machine I have in mind now is like this :

Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
This machine will contain a PERC4/Di RAID controller with 128MB of battery
backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
(15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
server via a SCSI cable...

This machine will be a bit more expensive than I thought at first (it's
going to be about EUR 14.000, but that's including 3 years of on-site
support from Dell (24x7, 4-hour response) and peripherals like UPS etc.)...

Do you think this machine wil be OK for this task?

Thanks for your help so far   :)

Kind regards,
Alexander Priem.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Shridhar Daithankar
Alexander Priem wrote:
> Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
> This machine will contain a PERC4/Di RAID controller with 128MB of battery
> backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
> of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
> Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
> (15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
> server via a SCSI cable...
> This machine will be a bit more expensive than I thought at first (it's
> going to be about EUR 14.000, but that's including 3 years of on-site
> support from Dell (24x7, 4-hour response) and peripherals like UPS etc.)...
Check out the Opteron as well.. I don't know much about European resellers. IBM 
sells the eServer 325, which has Opterons. Apparently they scale much better at 
higher load. Of course pricing, availability and support are most important.

http://theregister.co.uk/content/61/33378.html
http://www.pc.ibm.com/us/eserver/opteron/325/
Any concrete benchmarks for postgresql w.r.t xeons and opterons? A collection 
would be nice to have..:-)

 Shridhar

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Alexander Priem
I have considered Opteron, yes. But I think there are too many
uncertainties, like :

* It's a new CPU that has not proven itself yet.
* I don't think I can buy directly from IBM (according to their site), so
how about support (24x7) ? This will be very important to our client.
* I need to install and configure a 64bit Linux flavour which I don't know
(yet)

Any suggestions about the usability of the system I described before?

Here is the description again:

Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
This machine will contain a PERC4/Di RAID controller with 128MB of battery
backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
(15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
server via a SCSI cable...

I have never worked with a XEON CPU before. Does anyone know how it performs
running PostgreSQL 7.3.4 / 7.4 on RedHat 9 ? Is it faster than a Pentium 4?
I believe the main difference is cache memory, right? Aside from cache mem,
it's basically a Pentium 4, or am I wrong?

Kind regards,
Alexander.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Geoffrey
Alexander Priem wrote:
> I have considered Opteron, yes. But I think there are too many
> uncertainties, like :
> * It's a new CPU that has not proven itself yet.
> * I don't think I can buy directly from IBM (according to their site), so
> how about support (24x7) ? This will be very important to our client.
> * I need to install and configure a 64bit Linux flavour which I don't know
> (yet)

See http://www.monarchcomputer.com/  they custom build Opteron systems 
and preload them with Linux.  You don't pay the Microsoft tax.

--
Until later, Geoffrey   [EMAIL PROTECTED]
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Harry Broomhall
Shridhar Daithankar writes:

  First - many thanks for your suggestions and pointers to further info.

  I have been trying some of them with some interesting results!

> Harry Broomhall wrote:
>  > #effective_cache_size = 1000    # typically 8KB each
>  > #random_page_cost = 4   # units are one sequential page fetch cost
> 
> You must tune the first one at least. Try 
> http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
> parameters.

  Changing effective_cache_size seemed to have very little effect.  I took it
in steps up to 300MB (the machine has 640MB memory), and the differences
in speed were less than 10%.

[SNIP]
> 
> What happens if you turn off hash joins?

   This makes the non-vacuum version about 40% slower, and brings the vacuum
version to the same speed (i.e. about 4X faster than it had been!).

> Also bump sort memory to something 
> good.. around 16MB.. and see what difference it makes to performance..


  This was interesting.  Taking it to 10MB made a slight improvement.  At
20MB the vacuum case improved 5X in speed, but the non-vacuum version
slowed down.  Putting it up to 40MB slowed both down again.

  I will need to test with some of the other scripts and functions I have
written, but it looks as if selective use of more sort memory will be
useful.
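
For example, wrapped around just the expensive statements (a sketch;
sort_mem is in KB):

    SET sort_mem = 20480;    -- ~20MB for this session only
    -- ... run the big update here ...
    RESET sort_mem;          -- back to the postgresql.conf default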

  Regards,
  Harry.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] SRFs ... no performance penalty?

2003-10-21 Thread Robert Treat
On Mon, 2003-10-20 at 20:55, Josh Berkus wrote:
> Folks,
> 
> I'm working on the demo session for our upcoming presentation at PHPCon.  
> 
> As a side issue, we ended up comparing 3 versions of the same search screen:
> 
> 1) All in PHP with views;
> 2) Using a function to build a query and count results but executing that 
> query directly and sorting, paging in PHP;
> 3) Using a Set Returning function to handle row-returning, sorting, and 
> paging.
> 
> All three methods were executing a series of moderately complex queries against a 
> medium-sized data set (only about 20,000 rows but it's on a laptop).  The 
> postgresql.conf was tuned like a webserver; e.g. low sort_mem, high 
> max_connections.
> 
> So far, on the average of several searches, we have:
> 
> 1) 0.19687 seconds
> 2) 0.20667 seconds
> 3) 0.20594 seconds
> 

Is this measuring time in the back-end or total time of script
execution? 


> In our tests, using any kind of PL/pgSQL function seems to carry a 0.01 second 
> penalty over using PHP to build the search query.   I'm not sure if this is 
> comparative time for string-parsing or something else; the 0.01 seems to be 
> consistent regardless of scale.
> 
> The difference between using a PL/pgSQL function as a query-builder only (the 
> 7.2.x method) and using SRFs was small enough not to be significant.
> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Hi,


Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with internal drives on RAID5 will be delivered. Postgres will be from RH8.0.

I am planning for these values for the postgres configuration - to begin with:


Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144


Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 167772


Effective_cache_size = 262144 (same as shared_buffers - 25%)



In the /etc/sysctl.conf file:

=========================================

kernel.shmall = 536870912   (512MB; SHMALL = total amount of shared memory available, in bytes or pages)

kernel.shmmax = 536870912   (512MB; SHMMAX = maximum size of a shared memory segment, in bytes)


In a generic sense, these are recommended values I found in some documents. The database will be small in size and will gradually grow over time from a few thousand to a few million records, or more. The activity will be mostly of select statements from a few tables with joins, orderby, groupby clauses. The web application is based on Apache/Resin and hotspot JVM 1.4.0.

Are the above settings ok to begin with? Are there any other parameters that I should configure now, or monitor later on?

In other words, am I missing anything here to take full advantage of 4 CPUs and 8Gigs of RAM?


Appreciate any help.



Thanks,

Anjan







Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Alexander Priem wrote:

> The machine is going to be used for a pretty large database (well over 100
> tables with some of them containing over a million records from the start,
> number of tables and records will grow (much?) larger in the future). This
> database is going to be used by a pretty large number of employees. The
> number of concurrent users will vary between 1 - 100 or so, depending on the
> time of day etc. This will be a database containing client and supplier data
> as well as product descriptions and prices/ingredients/labels/brands etc.
> Database use will include lots of SELECTS but also lots of INSERTS/UPDATES,
> i.e. the database will be pretty active during business hours...
> 
> I think you (Scott and Will) are right when you say that NAS devices are not
> ideal for this kind of thing. I have been thinking about the hardware
> configuration for this machine for some time now (and had a lot of hints
> through this list already) and decided to go for a SCSI RAID config after
> all. The extra costs will be worth it   :)
> 
> The machine I have in mind now is like this :
> 
> Dell PowerEdge 1750 machine with Intel Xeon CPU at 3 GHz and 4 GB of RAM.
> This machine will contain a PERC4/Di RAID controller with 128MB of battery
> backed cache memory. The O/S and logfiles will be placed on a RAID-1 setup
> of two 36Gb SCSI-U320 drives (15.000rpm). Database data will be placed on a
> Dell PowerVault 220S rack-module containing six 36Gb SCSI-U320 drives
> (15.000rpm) in a RAID-10 setup. This PowerVault will be connected to the DB
> server via a SCSI cable...

Funny, we're looking at the same basic type of system here, but with a 
Perc3/CI controller.  We have a local supplier who gives us machines with 
a 3 year warranty and looks to be $1,000 to $2,000 lower than the Dell.

We're just going to run two 73 Gig drives in a RAID1 to start with, with 
battery backed RAM.

So that brings up my question, which is better, the Perc4 or Perc3 
controllers, and what's the difference between them?  I find Dell's 
tendency to hide other people's hardware behind their own model numbers 
mildly bothersome, as it makes it hard to comparison shop.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Richard Huxton
On Tuesday 21 October 2003 15:28, Anjan Dave wrote:
> Hi,
>
> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with
> internal drives on RAID5 will be delivered. Postgres will be from RH8.0.

You'll want to upgrade PG to v7.3.4

> I am planning for these values for the postgres configuration - to begin
> with:
>
> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
>
> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that -
> 167772
>
> Effective_cache_size = 262144 (same as shared_buffers - 25%)

My instincts would be to lower the first two substantially, and increase the 
effective cache once you know load levels. I'd probably start with something 
like the values below and work up:
shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the caching)
sort_mem = 4,000 - 8,000 (don't forget this is for each sort)
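
In postgresql.conf syntax, a sketch of those starting values:

    shared_buffers = 10000   # ~80MB at 8KB per buffer
    sort_mem = 4096          # in KB, and allocated per sort, not per connection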

You'll find the annotated postgresql.conf and performance tuning articles 
useful:
http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> In a generic sense, these are recommended values I found in some
> documents. The database will be small in size and will gradually grow
> over time from few thousands to a few million records, or more. The
> activity will be mostly of select statements from a few tables with
> joins, orderby, groupby clauses. The web application is based on
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to figure out how many concurrent users you'll have and how much 
memory will be required by apache/java. If your database grows radically, 
you'll probably want to re-tune as it grows.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Anjan,

> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with
> internal drives on RAID5 will be delivered. Postgres will be from RH8.0.

How many drives?   RAID5 sucks for heavy read-write databases, unless you have 
5+ drives.  Or a large battery-backed cache.

Also, last I checked, you can't address 8GB of RAM without a 64-bit processor.  
Since when are the Xeons 64-bit?

> Shared_buffers (25% of RAM / 8KB)) = 8589934592 * .25 / 8192 = 262144

That's too high.  Cut it in half at least.  Probably down to 5% of available 
RAM.

> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that -
> 167772

Fine if you're running a few-user-large-operation database.  If this is a 
webserver, you want a much, much lower value.

> Effective_cache_size = 262144 (same as shared_buffers - 25%)

Much too low.  Where did you get these calculations, anyway?

> In a generic sense, these are recommended values I found in some
> documents.

Where?  We need to contact the author of the "documents" and tell them to 
correct things.

> joins, orderby, groupby clauses. The web application is based on
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to estimate the memory consumed by Java & Apache to have realistic 
figures to work with.

> Are the above settings ok to begin with? Are there any other parameters
> that I should configure now, or monitor later on?

No, they're not.  See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune these 
parameters.


-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] SRFs ... no performance penalty?

2003-10-21 Thread Josh Berkus
Robert,

> > 1) 0.19687 seconds
> > 2) 0.20667 seconds
> > 3) 0.20594 seconds
>
> Is this measuring time in the back-end or total time of script
> execution?

Total time of execution, e.g. from clicking the "enter" button to displaying 
the list of matches.  Any other comparison would be misleading.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Seum-Lim Gan
Hi Tom,

1.)
OK. We have narrowed it down.
We did a few (like 5 to 8 times) vacuum analyze  (no full), the
pg_statistics relfilenode grew. There was no database operation when
we did this, no other client connections except the one that does
the vacuum.
If we do plain simple "vacuum " (again no full), we see
pg_statistics_relid_att_index relfilenode grew instead of
pg_statistics.
So, over time, these files will grow if we do vacuum.

Are these expected?

The question now is, if we are not doing anything
to the database, why would they grow after a few vacuums?
2.)
The other problem we have with
 DETAIL:  101802 dead row versions cannot be removed yet.

 DETAIL:  110900 dead row versions cannot be removed yet.

 DETAIL:  753064 dead row versions cannot be removed yet.

 DETAIL:  765328 dead row versions cannot be removed yet.
We will collect more data and see what we can get from
the process. Offhand, the process is connecting to
the database through ODBC and we don't use any BEGIN in
our updates, just doing plain UPDATEs repeatedly
with different keys randomly.
The database is defaulted to autocommit=true in postgresql.conf.
Thanks.

Gan

At 5:25 pm -0400 2003/10/20, Tom Lane wrote:
>Seum-Lim Gan <[EMAIL PROTECTED]> writes:
>>  We tried one more thing: with the table not being updated
>>  at all and we did vacuum. Each time a vacuum is done,
>>  the index file becomes bigger.
>
>It is not possible for plain vacuum to make the index bigger.
>
>VACUUM FULL possibly could make the index bigger, since it has to
>transiently create duplicate index entries for every row it moves.
>
>If you want any really useful comments on your situation, you're going
>to have to offer considerably more detail than you have done so far ---
>preferably, a test case that lets someone else reproduce your results.
>So far, all we can do is guess on the basis of very incomplete
>information.  When you aren't even bothering to mention whether a vacuum
>is FULL or not, I have to wonder whether I have any realistic picture of
>what's going on.
>
>			regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


--
+------------------------------------------------------------------------------+
| Seum-Lim GAN                         email : [EMAIL PROTECTED]                |
| Lucent Technologies                                                           |
| 2000 N. Naperville Road, 6B-403F     tel : (630)-713-6665                     |
| Naperville, IL 60566, USA.           fax : (630)-713-7272                     |
|                                      web : http://inuweb.ih.lucent.com/~slgan |
+------------------------------------------------------------------------------+
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] index file bloating still in 7.4 ?

2003-10-21 Thread Tom Lane
Seum-Lim Gan <[EMAIL PROTECTED]> writes:
> We did a few (like 5 to 8 times) vacuum analyze  (no full), the
> pg_statistics relfilenode grew.

Well, sure.  ANALYZE puts new rows into pg_statistic, and obsoletes old
ones.  You need to vacuum pg_statistic every so often (not to mention
the other system catalogs).
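
That is, the kind of occasional maintenance pass described above is as
simple as (run with superuser rights):

    VACUUM pg_statistic;
    -- and likewise for the other system catalogs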

> If we do plain simple "vacuum " (again no full), we see
> pg_statistics_relid_att_index relfilenode grew instead of
> pg_statistics.

Don't think I believe that.  Plain vacuum won't touch pg_statistic
at all (unless it's the target table of course).  I'd expect ANALYZE
to make both the stats table and its index grow, though.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance weirdness with/without vacuum analyze

2003-10-21 Thread Tom Lane
Harry Broomhall <[EMAIL PROTECTED]> writes:

> ->  Index Scan using import_cdrs_cdr_id_key on import_cdrs  (cost=0.00..52.00 rows=1000 width=164) (actual time=0.42..11479.51 rows=335671 loops=1)

> ->  Seq Scan on import_cdrs  (cost=0.00..8496.71 rows=335671 width=126) (actual time=0.15..9504.24 rows=335671 loops=1)

Hm.  The planner's default cost parameters assume that a full-table
index scan will be much slower than a full-table seq scan.  That's
evidently not the case in your test situation.  You could probably
bring the estimates more in line with reality (and thereby improve the
choice of plan) by reducing random_page_cost towards 1 and increasing
effective_cache_size to represent some realistic fraction of your
available RAM (though I concur with your observation that the
latter doesn't change the estimates all that much).

Beware however that test-case reality and production reality are not the
same thing.  You are evidently testing with tables that fit in RAM.
If your production tables will not, you'd better be wary of being overly
aggressive about reducing random_page_cost.  I believe the default value
(4.0) is fairly representative for situations where many actual disk
fetches are needed, ie, the tables are much larger than RAM.  1.0 would
be appropriate if all your tables are always fully cached in RAM (since
RAM has by definition no random-access penalty).  In intermediate cases
you need to select intermediate values.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
> 
> > Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, with
> > internal drives on RAID5 will be delivered. Postgres will be from RH8.0.
> 
> How many drives?   RAID5 sucks for heavy read-write databases, unless you have 
> 5+ drives.  Or a large battery-backed cache.

You don't need a large cache, so much as a cache.  The size isn't usually 
an issue now that 64 to 256 meg caches are the nominal cache sizes.  Back 
when it was a choice of 4 or 8 megs it made a much bigger difference than 
64 versus 256 meg makes today.

Also, if it's a read only environment, RAID5 with n drives equals the 
performance of RAID0 with n-1 drives.

> Also, last I checked, you can't address 8GB of RAM without a 64-bit processor.  
> Since when are the Xeons 64-bit?

Josh, you gotta get out more.  IA32 has supported >4 gig ram for a long 
time now, and so has the linux kernel.  It uses a paging method to do it.  
Individual processes are still limited to ~3 gig on Linux on 32 bit 
hardware though, so the extra mem will almost certainly spend its time as 
kernel cache.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
From what I know, there is a cache-row-set functionality that doesn't
exist with the newer postgres...

Concurrent users will start from 1 to a high of 5000 or more, and could
ramp up rapidly. So far, with increased users, we have gone up to
starting the JVM (resin startup) with 1024megs min and max (recommended
by Sun) - on the app side.

Thanks,
Anjan 


-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 21, 2003 11:57 AM
To: Anjan Dave; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server


On Tuesday 21 October 2003 15:28, Anjan Dave wrote:
> Hi,
>
> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, 
> with internal drives on RAID5 will be delivered. Postgres will be from
> RH8.0.

You'll want to upgrade PG to v7.3.4

> I am planning for these values for the postgres configuration - to 
> begin
> with:
>
> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
>
> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 
> 167772
>
> Effective_cache_size = 262144 (same as shared_buffers - 25%)

My instincts would be to lower the first two substantially, and increase the 
effective cache once you know load levels. I'd probably start with something 
like the values below and work up:
shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the caching)
sort_mem = 4,000 - 8,000 (don't forget this is for each sort)

You'll find the annotated postgresql.conf and performance tuning
articles 
useful: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> In a generic sense, these are recommended values I found in some 
> documents. The database will be small in size and will gradually grow 
> over time from few thousands to a few million records, or more. The 
> activity will be mostly of select statements from a few tables with 
> joins, orderby, groupby clauses. The web application is based on 
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to figure out how many concurrent users you'll have and how
much 
memory will be required by apache/java. If your database grows
radically, 
you'll probably want to re-tune as it grows.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Josh,

The 6650 can have up to 32GB of RAM.

There are 5 drives. In future, they will be replaced by a fiber array -
hopefully.

I read an article that suggests you 'start' with 25% of memory for
shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
link:
http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
Maybe, I misinterpreted it.

I read the document on
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html and the
suggested values are much lower than what I have mentioned here. It
won't hurt to start with lower numbers and increase later on if needed.

Thanks,
Anjan 



-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 21, 2003 12:21 PM
To: Anjan Dave; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Tuning for mid-size server


Anjan,

> Pretty soon, a PowerEdge 6650 with 4 x 2Ghz XEONs, and 8GB Memory, 
> with internal drives on RAID5 will be delivered. Postgres will be from
> RH8.0.

How many drives?   RAID5 sucks for heavy read-write databases, unless
you have 
5+ drives.  Or a large battery-backed cache.

Also, last I checked, you can't address 8GB of RAM without a 64-bit
processor.  
Since when are the Xeons 64-bit?

> Shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144

That's too high.  Cut it in half at least.  Probably down to 5% of
available 
RAM.

> Sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 
> 167772

Fine if you're running a few-user-large-operation database.  If this is
a 
webserver, you want a much, much lower value.

> Effective_cache_size = 262144 (same as shared_buffers - 25%)

Much too low.  Where did you get these calculations, anyway?

> In a generic sense, these are recommended values I found in some 
> documents.

Where?  We need to contact the author of the "documents" and tell them
to 
correct things.

> joins, orderby, groupby clauses. The web application is based on 
> Apache/Resin and hotspot JVM 1.4.0.

You'll need to estimate the memory consumed by Java & Apache to have
realistic 
figures to work with.

> Are the above settings ok to begin with? Are there any other 
> parameters that I should configure now, or monitor later on?

No, they're not.  See:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html to tune
these 
parameters.


-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Scott,

> Also, if it's a read only environment, RAID5 with n drives equals the
> performance of RAID0 with n-1 drives.

True.

> Josh, you gotta get out more.  IA32 has supported >4 gig ram for a long
> time now, and so has the linux kernel.  It uses a paging method to do it.
> Individual processes are still limited to ~3 gig on Linux on 32 bit
> hardware though, so the extra mem will almost certainly spend its time as
> kernel cache.

Not that you'd want a single process to grow that large anyway.   

So what is the ceiling on 32-bit processors for RAM? Most of the 64-bit 
vendors are pushing Athlon64 and G5 as "breaking the 4GB barrier", and even 
I can do the math on 2^32.   All these 64-bit vendors, then, are talking 
about the limit on ram *per application* and not per machine?

This has all been academic to me to date, as the only very-high-ram systems 
I've worked with were Sparc or micros.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Low Insert/Update Performance

2003-10-21 Thread Josh Berkus
Rhaoni,

> The Delphi program does just one commit for all queries.
> I was wondering if there are some configuration parameters to be changed to
> improve the performance?

To help you, we'll need you to trap a query and run an EXPLAIN ANALYZE on it.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Anjan,

> I read an article that suggests you 'start' with 25% of memory for
> shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
> link:
> http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
> Maybe, I misinterpreted it.

No, I can see how you arrived at that conclusion, and Bruce is an authority.  
I'll contact him.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Anjan,

> From what I know, there is a cache-row-set functionality that doesn't
> exist with the newer postgres...

What?  PostgreSQL has always used the kernel cache for queries.

> Concurrent users will start from 1 to a high of 5000 or more, and could
> ramp up rapidly. So far, with increased users, we have gone up to
> starting the JVM (resin startup) with 1024megs min and max (recommended
> by Sun) - on the app side.

Well, just keep in mind when tuning that your calculations should be based on 
*available* RAM, meaning RAM not used by Apache or the JVM.

With that many concurrent requests, you'll want to be *very* conservative with 
sort_mem; I might stick to the default of 1024 if I were you, or even lower 
it to 512k.
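
In postgresql.conf terms, that is simply (a sketch):

    sort_mem = 512    # KB per sort; with thousands of backends even this adds up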

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Scott,
> 
> > Also, if it's a read only environment, RAID5 with n drives equals the
> > performance of RAID0 with n-1 drives.
> 
> True.
> 
> > Josh, you gotta get out more.  IA32 has supported >4 gig ram for a long
> > time now, and so has the linux kernel.  It uses a paging method to do it.
> > Individual processes are still limited to ~3 gig on Linux on 32 bit
> > hardware though, so the extra mem will almost certainly spend its time as
> > kernel cache.
> 
> Not that you'd want a single process to grow that large anyway.

True :-)  Especially a pgsql backend.

> So what is the ceiling on 32-bit processors for RAM? Most of the 64-bit 
> vendors are pushing Athlon64 and G5 as "breaking the 4GB barrier", and even 
> I can do the math on 2^32.   All these 64-bit vendors, then, are talking 
> about the limit on ram *per application* and not per machine?

I think it's 64 gigs in the current implementation, but that could just be 
a chip set thing, i.e. the theoretical limit is probably 2^63 or 2^64, but 
the realistic limitation is that the current mobo chipsets are gonna have 
a much lower limit, and I seem to recall that being 64 gig last I looked.



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
> 
> > From what I know, there is a cache-row-set functionality that doesn't
> > exist with the newer postgres...
> 
> What?  PostgreSQL has always used the kernel cache for queries.
> 
> > Concurrent users will start from 1 to a high of 5000 or more, and could
> > ramp up rapidly. So far, with increased users, we have gone up to
> > starting the JVM (resin startup) with 1024megs min and max (recommended
> > by Sun) - on the app side.
> 
> Well, just keep in mind when tuning that your calculations should be based on 
> *available* RAM, meaning RAM not used by Apache or the JVM.
> 
> With that many concurrent requests, you'll want to be *very* conservative with 
> sort_mem; I might stick to the default of 1024 if I were you, or even lower 
> it to 512k.

Exactly.  Remember, Anjan, that if you have a single sort that can't 
fit in RAM, it will use the hard drive for temp space, effectively 
"swapping" on its own.  If the concurrent sorts run the server out of 
memory, the server will start swapping processes, quite possibly the sorts, 
in a sort of hideous round robin death spiral that will bring your machine 
to its knees at the worst possible time, midday under load.  sort_mem is 
one of the small "foot guns" in the postgresql.conf file that people tend 
to pick up and go "huh, what's this do?" right before cranking it up.
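
The rough arithmetic behind that warning (illustrative numbers only, since
sort_mem is allocated per sort rather than per server):

    5000 backends * 16384 KB sort_mem  =  ~80 GB of potential requests
    5000 backends *   512 KB sort_mem  =  ~2.5 GB, which at least fits in RAM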


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 10:12:15AM -0700, Josh Berkus wrote:
> 
> So what is the ceiling on 32-bit processors for RAM? Most of the 64-bit 
> vendors are pushing Athlon64 and G5 as "breaking the 4GB barrier", and even 
> I can do the math on 2^32.   All these 64-bit vendors, then, are talking 
> about the limit on ram *per application* and not per machine?

Or per same-time access.  Remember that, back in the old days on the
pre-386s, accessing the extended or expanded memory (anyone remember
which was which?) involved some fairly serious work, and not
everything was seamless.  I expect something similar is at work here. 
Not that I've had a reason to play with 4G ix86 machines, anyway.

A

-- 

Andrew Sullivan                          204-4141 Yonge Street
Afilias Canada                         Toronto, Ontario Canada
<[EMAIL PROTECTED]>                                    M2P 2A8
                                         +1 416 646 3304 x110


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 10:15:57AM -0700, Josh Berkus wrote:
> Anjan,
> 
> > I read an article that suggests you 'start' with 25% of memory for
> > shared_buffers. Sort memory was suggested to be at 2-4%. Here's the
> > link:
> > http://www.ca.postgresql.org/docs/momjian/hw_performance/node8.html
> > Maybe, I misinterpreted it.
> 
> No, I can see how you arrived at that conclusion, and Bruce is an authority.  
> I'll contact him.

I think the "25%" rule of thumb is slightly stale: above some
threshold, it just falls apart, and lots of people now have machines
well within that threshold.  Heck, I'll bet Bruce's 2-way machine is
within that threshold.

A

-- 

Andrew Sullivan                          204-4141 Yonge Street
Afilias Canada                         Toronto, Ontario Canada
<[EMAIL PROTECTED]>                                    M2P 2A8
                                         +1 416 646 3304 x110


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Jeff
On Tue, 21 Oct 2003 10:12:15 -0700
Josh Berkus <[EMAIL PROTECTED]> wrote:

> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athlon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32.   All these 64-bit
> vendors, then, are talking about the limit on ram *per application*
> and not per machine?

You can have > 4GB per app, but also you get a big performance boost as
you don't have to deal with all the silly paging - think of it like when
we switched from real mode to protected mode. 

If you check out the linux-kernel archives you'll see one of the things
often recommended when things go odd is to turn off HIMEM support. 

-- 
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Tom Lane
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> I think the "25%" rule of thumb is slightly stale: above some
> threshold, it just falls apart, and lots of people now have machines
> well within that threshold.  Heck, I'll bet Bruce's 2-way machine is
> within that threshold.

IIRC, we've not seen much evidence that increasing shared_buffers above
about 10000 delivers any performance boost.  That's 80Mb, so the "25%"
rule doesn't get seriously out of whack until you get to a gig or so of
RAM.  Which was definitely not common at the time the rule was put
forward, but is now.  Probably we should modify the rule-of-thumb to
something like "25%, but not more than 10000 buffers".
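
In conf terms, a sketch of that revised rule:

    # min(25% of RAM, 10000 buffers); at 8KB per buffer the cap is ~80MB
    shared_buffers = 10000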

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Andrew,

> I think the "25%" rule of thumb is slightly stale: above some
> threshold, it just falls apart, and lots of people now have machines
> well within that threshold.  Heck, I'll bet Bruce's 2-way machine is
> within that threshold.

Sure.  But we had a few people on this list do tests (including me) and the 
anecdotal evidence was lower than 25%, substantially.   The falloff is subtle 
until you hit 50% of RAM, like:

%   query throughput
1   --
5   ------
10  ----------
15  ---------
20  ---------
25  --------
30  -------
35  ------
40  -----

... so it's often not immediately apparent when you've set stuff a little too 
high.  However, among the folks that tested, the ideal was never anywhere near 
25%, usually more in the realm of 5-10%.  I've been using 6% as my starting 
figure for the last year for a variety of servers with good results.

Of course, if you have anecdotal evidence to the contrary, then the only way 
to work this would be to have OSDL help us sort it out.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL data on a NAS device ?

2003-10-21 Thread Will LaShell
On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:

> So that brings up my question, which is better, the Perc4 or Perc3 
> controllers, and what's the difference between them?  I find Dell's 
> tendency to hide other people's hardware behind their own model numbers 
> mildly bothersome, as it makes it hard to comparison shop.

Perc4 has an LSI 1030 chip
http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm


Perc3, depending on the model, can be a couple of things, but I think
they are all U160 controllers and not U320.




Will




Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Tom Lane
Andrew Sullivan <[EMAIL PROTECTED]> writes:
> On Tue, Oct 21, 2003 at 10:12:15AM -0700, Josh Berkus wrote:
>> So what is the ceiling on 32-bit processors for RAM?

> ... Remember that, back in the old days on the
> pre-386s, accessing the extended or expanded memory (anyone remember
> which was which?) involved some fairly serious work, and not
> everything was seamless.  I expect something similar is at work here. 

Right.  A 32-bit processor can only (conveniently) allow any individual
process to access 4G worth of address space.  However the total RAM in
the system can be more --- the kernel can set up the hardware address
mappings to let different user processes use different up-to-4G segments
of that RAM.  And the kernel can also use excess RAM for disk buffer
cache.  So there's plenty of value in more-than-4G RAM, as long as
you're not expecting any single user process to need more than 4G.
This is no problem at all for Postgres, in which individual backend
processes don't usually get very large, and we'd just as soon let most
of the RAM go to kernel disk buffers anyway.

I think that some hardware configurations have problems with using RAM
above the first 4G for disk buffers, because of disk controller hardware
that can't cope with physical DMA addresses wider than 32 bits.  The
solution here is to buy a better disk controller.  If you google for
"bounce buffers" you can learn more about this.

What goes around comes around I guess --- I remember playing these same
kinds of games to use more than 64K RAM in 16-bit machines, 25-odd years
ago...

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Christopher Browne
In the last exciting episode, [EMAIL PROTECTED] (Josh Berkus) wrote:
> So what is the ceiling on 32-bit processors for RAM? Most of the
> 64-bit vendors are pushing Athalon64 and G5 as "breaking the 4GB
> barrier", and even I can do the math on 2^32.  All these 64-bit
> vendors, then, are talking about the limit on ram *per application*
> and not per machine?

I have been seeing ia-32 servers with 8GB of RAM; it looks as though
there are ways of having them support ("physically, in theory, if you
could get a suitable motherboard") as much as 64GB.

But that certainly doesn't get you past 2^32 bytes per process, and
possibly not past 2^31 bytes/process.

From Linux kernel help:

 CONFIG_NOHIGHMEM:
Linux can use up to 64 Gigabytes of physical memory on x86
systems.  However, the address space of 32-bit x86 processors is
only 4 Gigabytes large. That means that, if you have a large
amount of physical memory, not all of it can be "permanently
mapped" by the kernel. The physical memory that's not permanently
mapped is called "high memory".

And that leaves open the question of how much shared memory you can
address.  That presumably has to fit into the 4GB, and if your
PostgreSQL processes had (by some fluke) 4GB of shared memory, there
wouldn't be any "local" memory for sort memory and the likes.

Add to that the consideration that there are reports of Linux "falling
over" when you get to right around 2GB/4GB.  I ran a torture test a
while back that _looked_ like it was running into that; I can't verify
that, unfortunately.

I don't see there being a whole lot of use of having more than about
8GB on an ia-32 system; what with shared memory maxing out at
somewhere between 1 and 2GB, that suggests having ~8GB in total.

I'd add another PG cluster if I had 16GB...
-- 
let name="aa454" and tld="freenet.carleton.ca" in name ^ "@" ^ tld;;
http://www.ntlug.org/~cbbrowne/postgresql.html
"A statement is either correct or incorrect.  To be *very* incorrect is
 like being *very* dead ... "
-- Herbert F. Spirer
   Professor of Information Management
   University of Conn.
   (DATAMATION Letters, Sept. 1, 1984)

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Josh,
 
The app servers are separate dual-cpu boxes with 2GB RAM on each.
 
Yes, from all the responses I have seen, I will be reducing the numbers to what has 
been suggested.
 
Thanks to all,
anjan

-Original Message- 
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Tue 10/21/2003 1:22 PM 
To: Anjan Dave; Richard Huxton; [EMAIL PROTECTED] 
Cc: 
Subject: Re: [PERFORM] Tuning for mid-size server



Anjan,

> From what I know, there is a cache-row-set functionality that doesn't
> exist with the newer postgres...

What?  PostgreSQL has always used the kernel cache for queries.

> Concurrent users will start from 1 to a high of 5000 or more, and could
> ramp up rapidly. So far, with increased users, we have gone up to
> starting the JVM (resin startup) with 1024megs min and max (recommended
> by Sun) - on the app side.

Well, just keep in mind when tuning that your calculations should be based on
*available* RAM, meaning RAM not used by Apache or the JVM.

With that many concurrent requests, you'll want to be *very* conservative with
sort_mem; I might stick to the default of 1024 if I were you, or even lower
it to 512k.

--
Josh Berkus
Aglio Database Solutions
San Francisco



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Anjan Dave
Hopefully, I am not steering this in a different direction, but is there a way to 
find out how much sort memory each query is taking up, so that we can scale that up 
with increasing users?

THanks,
Anjan

-Original Message- 
From: scott.marlowe [mailto:[EMAIL PROTECTED] 
Sent: Tue 10/21/2003 1:33 PM 
To: Josh Berkus 
Cc: Anjan Dave; Richard Huxton; [EMAIL PROTECTED] 
Subject: Re: [PERFORM] Tuning for mid-size server



On Tue, 21 Oct 2003, Josh Berkus wrote:

> Anjan,
>
> > From what I know, there is a cache-row-set functionality that doesn't
> > exist with the newer postgres...
>
> What?  PostgreSQL has always used the kernel cache for queries.
>
> > Concurrent users will start from 1 to a high of 5000 or more, and could
> > ramp up rapidly. So far, with increased users, we have gone up to
> > starting the JVM (resin startup) with 1024megs min and max (recommended
> > by Sun) - on the app side.
>
> Well, just keep in mind when tuning that your calculations should be based on
> *available* RAM, meaning RAM not used by Apache or the JVM.
>
> With that many concurrent requests, you'll want to be *very* conservative with
> sort_mem; I might stick to the default of 1024 if I were you, or even lower
> it to 512k.

Exactly.  Remember, Anjan, that if you have a single sort that can't
fit in RAM, it will use the hard drive for temp space, effectively
"swapping" on its own.  If the concurrent sorts run the server out of
memory, the server will start swapping processes, quite possibly the sorts,
in a sort of hideous round-robin death spiral that will bring your machine
to its knees at the worst possible time, midday under load.  sort_mem is
one of the small "foot guns" in the postgresql.conf file that people tend
to pick up and go "huh, what's this do?" right before cranking it up.
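
As a back-of-the-envelope illustration of why (made-up numbers, not Anjan's
actual settings):

# sort_mem is per sort, per backend, and one query can run several sorts.
#   cranked up:    sort_mem = 32768 (32MB) x 500 active backends
#                  => up to ~16GB of sort space demanded on a 2GB machine
#   conservative:  sort_mem = 1024 (1MB)   x 500 active backends
#                  => ~500MB worst case, which the box can absorb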




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] RAID controllers etc... was: PostgreSQL data on a NAS device ?

2003-10-21 Thread scott.marlowe
On 21 Oct 2003, Will LaShell wrote:

> On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:
> 
> > So that brings up my question, which is better, the Perc4 or Perc3 
> > controllers, and what's the difference between them?  I find Dell's 
> > tendency to hide other people's hardware behind their own model numbers 
> > mildly bothersome, as it makes it hard to comparison shop.
> 
> Perc4 has an LSI 1030 chip
> http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm
> 
> 
> Perc3
> depending on the model can be a couple of things but I think they are
> all U160 controllers and not U320

Thanks.  I googled around and found this page:

http://www.domsch.com/linux/

Which says what each model is.  It looks like the "RAID" controller they 
wanna charge me for is about $500 or so, so I'm guessing it's the medium 
range Elite 1600 type controller, i.e. U160, which is plenty for the 
machine / drive number we'll be using.  

Has anyone played around with the latest ones to get a feel for them?  I 
want a battery backed controller that runs well under linux and also BSD 
that isn't gonna break the bank.  I'd heard bad stories about the 
performance of the Adaptec RAID controllers, but it seems the newer ones 
aren't bad from what I've found googling.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 11:51:02AM -0700, Josh Berkus wrote:

> Of course, if you have anecdotal evidence to the contrary, then the
> only way to work this would be to have OSDL help us sort it out.

Nope.  I too have such anecdotal evidence that 25% is way too high. 
It also seems to depend pretty heavily on what you're trying to
optimise for and what platform you have.  But I'm glad to hear
(again) that people seem to think 25% is too high for most cases.  I
don't feel so much like I'm tilting at windmills.

A


-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread scott.marlowe
On Tue, 21 Oct 2003, Andrew Sullivan wrote:

> On Tue, Oct 21, 2003 at 11:51:02AM -0700, Josh Berkus wrote:
> 
> > Of course, if you have anecdotal evidence to the contrary, then the
> > only way to work this would be to have OSDL help us sort it out.
> 
> Nope.  I too have such anecdotal evidence that 25% is way too high. 
> It also seems to depend pretty heavily on what you're trying to
> optimise for and what platform you have.  But I'm glad to hear
> (again) that people seem to think 25% is too high for most cases.  I
> don't feel so much like I'm tilting at windmills.

I think where it makes sense is when you have something like a report 
server where the result sets may be huge, but the parallel load is low, 
i.e. 5 or 10 users tossing around 100 Meg or more at a time.

If you've got 5,000 users running queries that are indexed and won't be 
using that much memory each, then there's usually no advantage to going 
over a certain number of buffers, and that certain number may be as low 
as 1000 for some applications.
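
As a rough sketch of the two extremes (values are illustrative only, not
tested recommendations):

# many users, small indexed queries: a modest pool is plenty
shared_buffers = 1024     # 8KB buffers = 8MB

# report server, a few users with huge result sets: larger, but still
# well under the oft-quoted 25% of RAM
shared_buffers = 32768    # 8KB buffers = 256MB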


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] RAID controllers etc... was: PostgreSQL data on a NAS device ?

2003-10-21 Thread Will LaShell
On Tue, 2003-10-21 at 13:36, scott.marlowe wrote:
> On 21 Oct 2003, Will LaShell wrote:
> 
> > On Tue, 2003-10-21 at 08:40, scott.marlowe wrote:
> > 
> > > So that brings up my question, which is better, the Perc4 or Perc3 
> > > controllers, and what's the difference between them?  I find Dell's 
> > > tendency to hide other people's hardware behind their own model numbers 
> > > mildly bothersome, as it makes it hard to comparison shop.
> > 
> > Perc4 has an LSI 1030 chip
> > http://docs.us.dell.com/docs/storage/perc4di/en/ug/features.htm
> > 
> > 
> > Perc3
> > depending on the model can be a couple of things but I think they are
> > all U160 controllers and not U320
> 
> Thanks.  I googled around and found this page:
> 
> http://www.domsch.com/linux/
> 
> Which says what each model is.  It looks like the "RAID" controller they 
> wanna charge me for is about $500 or so, so I'm guessing it's the medium 
> range Elite 1600 type controller, i.e. U160, which is plenty for the 
> machine / drive number we'll be using.  
> 
> Has anyone played around with the latest ones to get a feel for them?  I 
> want a battery backed controller that runs well under linux and also BSD 
> that isn't gonna break the bank.  I'd heard bad stories about the 
> performance of the Adaptec RAID controllers, but it seems the newer ones 
> aren't bad from what I've found googling.

We own 2 Elite 1650s and we love them.  It would have been nice to have
U320-capable controllers, but the cards are completely reliable.  I
recommend the LSI controllers to everyone because I've never had a
problem with them.





Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Andrew Sullivan
On Tue, Oct 21, 2003 at 03:11:17PM -0600, scott.marlowe wrote:
> I think where it makes sense is when you have something like a report 
> server where the result sets may be huge, but the parallel load is low, 
> i.e. 5 or 10 users tossing around 100 Meg or more at a time.

In our case, we were noticing that truss showed an unbelievable
amount of time spent by the postmaster doing open() calls to the OS
(this was on Solaris 7).  So we thought, "Let's try a 2G buffer
size."  2G was more than enough to hold the entire data set under
question.  Once the buffer started to fill, even plain SELECTs
started taking a long time.  The buffer algorithm is just not that
clever, was my conclusion.

(Standard disclaimer: not a long, controlled test.  It's just a bit
of gossip.)

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Tuning for mid-size server

2003-10-21 Thread Josh Berkus
Scott,

> I think where it makes sense is when you have something like a report 
> server where the result sets may be huge, but the parallel load is low, 
> i.e. 5 or 10 users tossing around 100 Meg or more at a time.

I've found that that scenario makes the difference between using 6% and 12% ... 
particularly large data transformations ... but not higher than that.  And 
I've had ample opportunity to test on 2 reporting servers.  For one thing, 
with very large reports one tends to be heavily I/O-bound, and that caching 
is handled by the kernel.
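
For concreteness, on a hypothetical reporting box with 8GB of RAM those
percentages work out to roughly:

#  6% of 8GB ~= 492MB  =>  shared_buffers ~= 63000   (8KB buffers)
# 12% of 8GB ~= 983MB  =>  shared_buffers ~= 126000  (8KB buffers)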

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] analyzing postgresql performance for dbt-2

2003-10-21 Thread markw
I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having
some trouble figuring out what I should be looking for when I'm trying
to tune the database.  I have results for a decent baseline, but when I
try to increase the load on the database, the performance drops.
Nothing in the graphs (in the links listed later) sticks out to me so
I'm wondering if there are other database statistics I should try to
collect. Any suggestions would be great and let me know if I can answer
any other questions.

Here are a pair of results where I just raise the load on the
database, where increasing the load increases the area of the database
touched in addition to increasing the transaction rate.  The overall
metric increases somewhat, but the response time for most of the
interactions also increases significantly:

http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
- load of 100 warehouses
- metric 1249.65

http://developer.osdl.org/markw/dbt2-pgsql/149/
- load of 140 warehouses
- metric 1323.90

Both of these runs had wal_buffers set to 8, checkpoint_segments 200,
and checkpoint_timeout 1800.
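
For reference, in postgresql.conf terms that baseline looks like this (an
illustrative transcription of the values above, not a paste from the actual
config file):

wal_buffers = 8              # WAL buffers, 8KB each
checkpoint_segments = 200    # 16MB per segment, ~3.2GB of WAL between checkpoints
checkpoint_timeout = 1800    # seconds; checkpoint at least every 30 minutes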

So far I've only tried various wal_buffers and checkpoint_segments
settings in the next set of results for a load of 140 warehouses.

http://developer.osdl.org/markw/dbt2-pgsql/148/
- metric 1279.26
- wal_buffers 8
- checkpoint_segments 100
- checkpoint_timeout 300

http://developer.osdl.org/markw/dbt2-pgsql/149/
- metric 1323.90
- wal_buffers 8
- checkpoint_segments 200
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/150/
- metric 1281.13
- wal_buffers 8
- checkpoint_segments 300
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/151/
- metric 1311.99
- wal_buffers 32
- checkpoint_segments 200
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/152/
- metric 1268.37
- wal_buffers 64
- checkpoint_segments 200
- checkpoint_timeout 1800

http://developer.osdl.org/markw/dbt2-pgsql/154/
- metric 1314.62
- wal_buffers 16
- checkpoint_segments 200
- checkpoint_timeout 1800


Thanks!

-- 
Mark Wong - - [EMAIL PROTECTED]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436  (fax)

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] analyzing postgresql performance for dbt-2

2003-10-21 Thread Bruce Momjian
[EMAIL PROTECTED] wrote:
> I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having
> some trouble figuring out what I should be looking for when I'm trying
> to tune the database.  I have results for a decent baseline, but when I
> try to increase the load on the database, the performance drops.
> Nothing in the graphs (in the links listed later) sticks out to me so
> I'm wondering if there are other database statistics I should try to
> collect. Any suggestions would be great and let me know if I can answer
> any other questions.
> 
> Here are a pair of results where I just raise the load on the
> database, where increasing the load increases the area of the database
> touched in addition to increasing the transaction rate.  The overall
> metric increases somewhat, but the response time for most of the
> interactions also increases significantly:
> 
> http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
>   - load of 100 warehouses
>   - metric 1249.65
>   
> http://developer.osdl.org/markw/dbt2-pgsql/149/
>   - load of 140 warehouses
>   - metric 1323.90

I looked at these charts and they looked normal to me.  It looked like
the load increased until your computer was saturated.  Is there
something I am missing?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] analyzing postgresql performance for dbt-2

2003-10-21 Thread Mark Wong
On Tue, Oct 21, 2003 at 08:35:56PM -0400, Bruce Momjian wrote:
> [EMAIL PROTECTED] wrote:
> > I'm running our DBT-2 workload against PostgreSQL 7.3.4 and I'm having
> > some trouble figuring out what I should be looking for when I'm trying
> > to tune the database.  I have results for a decent baseline, but when I
> > try to increase the load on the database, the performance drops.
> > Nothing in the graphs (in the links listed later) sticks out to me so
> > I'm wondering if there are other database statistics I should try to
> > collect. Any suggestions would be great and let me know if I can answer
> > any other questions.
> > 
> > Here are a pair of results where I just raise the load on the
> > database, where increasing the load increases the area of the database
> > touched in addition to increasing the transaction rate.  The overall
> > metric increases somewhat, but the response time for most of the
> > interactions also increases significantly:
> > 
> > http://developer.osdl.org/markw/dbt2-pgsql/158/ [baseline]
> > - load of 100 warehouses
> > - metric 1249.65
> > 
> > http://developer.osdl.org/markw/dbt2-pgsql/149/
> > - load of 140 warehouses
> > - metric 1323.90
> 
> I looked at these charts and they looked normal to me.  It looked like
> the load increased until your computer was saturated.  Is there
> something I am missing?

I've run some I/O tests, so I'm pretty sure I haven't saturated that.  And it
looks like I have almost 10% of processor time left.  I do agree that it
appears something might be saturated; I just don't know where to look...

Thanks,
Mark

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org