Re: [PERFORM] Optimizing No matching record Queries

2008-02-13 Thread Richard Huxton

Dean Gibson (DB Administrator) wrote:

The questions are:

1. Why is the planner scanning the entire idx_listing_entrydate, when 
I'd think it should be scanning the entire 
pk_listingstatus_listingstatusid ?


It's looking at the ORDER BY and sees that the query needs the 10 most 
recent, so tries searching by date. That's sensible where you are going 
to have a lot of matches for fklistingsourceid.


Which suggests that statistics for "fklistingsourceid" aren't high 
enough, like Greg suggested. If that doesn't help, the index on 
(fklistingsourceid, entrydate) that Stephen suggested might well do so.
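
A rough sketch of both suggestions in SQL, assuming the rows live in a 
table called "listing" (the actual table name isn't shown in this thread):

ALTER TABLE listing ALTER COLUMN fklistingsourceid SET STATISTICS 1000;
ANALYZE listing;   -- re-gather statistics so the planner sees the new target

CREATE INDEX idx_listing_source_entrydate
    ON listing (fklistingsourceid, entrydate);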


2. Why is "Index Scan using pk_listingstatus_listingstatusid on 
listingstatus listingsta1_  (cost=0.00..0.27 rows=1 width=4) (never 
executed)" ?


Because nothing comes out of the first index-scan.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Dell Perc/6

2008-02-13 Thread Tore Halset

Hello.

I think I started that discussion. We ended up buying a Dell 2900 with  
PERC 6/i and 10 * 145GB SAS 3,5" 15KRpm discs. 6 of the SAS discs are  
in a raid 10 for the database, 2 in a mirror for the wal and the last  
2 in a mirror for the OS. We get 350MB/s writing and 380MB/s reading  
to/from the raid 10 area using dd. The OS is Ubuntu and the filesystem  
for the raid 10 is ext3.


The box is still under testing, but we plan to set it in production  
this week.


Regards,
 - Tore.

On Feb 12, 2008, at 17:32 , Craig James wrote:

Does anyone have performance info about the new Dell Perc/6  
controllers?  I found a long discussion ("Dell vs HP") about the  
Perc/5, but nothing about Perc/6.  What's under the covers?


Here is the (abbreviated) info from Dell on this machine:
PowerEdge 1950 III        Quad Core Intel® Xeon® E5405, 2x6MB Cache, 2.0GHz, 1333MHz FSB
Additional Processors     Quad Core Intel® Xeon® E5405, 2x6MB Cache, 2.0GHz, 1333MHz FSB

Memory                    8GB 667MHz (4x2GB), Dual Ranked DIMMs
Hard Drive Configuration  Integrated SAS/SATA RAID 5, PERC 6/i Integrated


Thanks,
Craig





---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Update with Subquery Performance

2008-02-13 Thread Linux Guru
Yes, I also thought of this method and tested it before I got your mail, and
this solution seems workable.

Thanks for the help

On Feb 12, 2008 9:18 PM, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Linux Guru" <[EMAIL PROTECTED]> writes:
> > Analyzing did not help, here is the out of EXPLAIN ANALYZE of update
> query
> > "Seq Scan on dummy  (cost=0.00..56739774.24 rows=23441 width=275)
> (actual
> > time=18.927..577929.014 rows=22712 loops=1)"
> > "  SubPlan"
> > "->  Aggregate  (cost=2420.41..2420.43 rows=1 width=19) (actual
> time=
> > 25.423..25.425 rows=1 loops=22712)"
> > "  ->  Seq Scan on dummy "temp"  (cost=0.00..2416.01 rows=586
> > width=19) (actual time=0.049..17.834 rows=2414 loops=22712)"
> > "Filter: ((product)::text = ($0)::text)"
> > "Total runtime: 578968.885 ms"
>
> Yeah, that's just not going to be fast.  An index on the product column
> might help a bit, but the real issue is that you're repetitively
> calculating the same aggregates.  I think you need a separate temp
> table, along the lines of
>
> create temp table dummy_agg as
>  select product,
> (case when sum(pd) <> 0 then sum(gd)/sum(pd)*100 else 0 end) as s
>  from dummy
>  group by product;
>
> create index dummy_agg_i on dummy_agg(product); -- optional
>
> update dummy
>  set gp= (select s from dummy_agg where dummy_agg.product = dummy.product
> );
>
> The index would only be needed if you expect a lot of rows (lot of
> different product values).
>
>regards, tom lane
>


Re: [PERFORM] Dell Perc/6

2008-02-13 Thread Dave Cramer


On 13-Feb-08, at 5:02 AM, Tore Halset wrote:


Hello.

I think I started that discussion. We ended up buying a Dell 2900  
with PERC 6/i and 10 * 145GB SAS 3,5" 15KRpm discs. 6 of the SAS  
discs are in a raid 10 for the database, 2 in a mirror for the wal  
and the last 2 in a mirror for the OS. We get 350MB/s writing and  
380MB/s reading to/from the raid 10 area using dd. The OS is Ubuntu  
and the filesystem for the raid 10 is ext3.


Wow that's fantastic. Just to be sure, did you make sure that you read  
and wrote 2x memory to take the cache out of the measurement ?


Dave

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Join Query Perfomance Issue

2008-02-13 Thread Thomas Zaksek

> Can you send the table definitions of the tables involved in the
> query, including index information? Might be if we look hard enough we
> can find something.
>
> Peter



 Table "messungen_v_dat_2007_11_12"
   Column | Type | Modifiers | Description
---+--+---+-
ganglinientyp | character(1) |   |
minute_tag| smallint |   |
messpunkt | integer  |   |
Indexes:
   "messungen_v_dat_2007_11_12_ganglinientyp_key" UNIQUE, btree 
(ganglinientyp, minute_tag, messpunkt)

   "messungen_v_dat_2007_11_12_messpunkt_idx" btree (messpunkt)
   "messungen_v_dat_2007_11_12_messpunkt_minute_tag_idx" btree 
(ganglinientyp, minute_tag)

Foreign-key constraints:
   "messungen_v_dat_2007_11_12_messpunkt_fkey" FOREIGN KEY (messpunkt) 
REFERENCES messpunkt(nr)

Inherits: messungen_v_dat
Has OIDs: no




Table "messpunkt"
Column |  Type   |   
Modifiers| Description

+-++-
nr | integer | not null default 
nextval('messpunkt_nr_seq'::regclass) |
zs_nr  | integer 
||
mw_nr  | integer 
||

Indexes:
   "messpunkt_pkey" PRIMARY KEY, btree (nr)
   "messpunkt_zs_nr_key" UNIQUE, btree (zs_nr, mw_nr)
   "messpunkt_mw_idx" btree (mw_nr)
   "messpunkt_nr_idx" btree (nr)
   "messpunkt_zs_idx" btree (zs_nr)
Foreign-key constraints:
   "messpunkt_mw_nr_fkey" FOREIGN KEY (mw_nr) REFERENCES de_mw(nr)
   "messpunkt_zs_nr_fkey" FOREIGN KEY (zs_nr) REFERENCES de_zs(zs)
Has OIDs: no




 Table "de_mw"
Column |   Type   | Modifiers  
| Description

+--++-
nr | integer  | not null default nextval('de_mw_nr_seq'::regclass) |
j_ges  | smallint ||
mw_abh | integer  ||
Indexes:
   "de_mw_pkey" PRIMARY KEY, btree (nr)
   "de_mw_j_ges_key" UNIQUE, btree (j_ges, mw_abh)
   "de_nw_nr_idx" btree (nr)
Check constraints:
   "de_mw_check" CHECK (j_ges IS NOT NULL AND (j_ges = 0 AND (mw_abh = 
0 OR mw_abh = 255 OR mw_abh IS NULL) OR j_ges > 0 AND j_ges <= 80 AND 
mw_abh <> 0))

Has OIDs: no

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Join Query Perfomance Issue

2008-02-13 Thread Thomas Zaksek
We have tried some recoding now. Using a materialized view we could 
reduce the query to a join over two tables without any functions inside 
the query, for example:


explain analyse SELECT '12.11.2007 18:04:00 UTC' AS zeit,
  'M' AS ganglinientyp,
  zs_de,
  j_ges,
  j_lkw,
  v_pkw,
  v_lkw,
  p_bel
  FROM  messungen_v_dat_2007_11_12 m
  LEFT JOIN messwerte_mv w on w.nr = m.messpunkt
  WHERE  m.ganglinientyp = 'M'   
  AND 992 = m.minute_tag;


Nested Loop Left Join  (cost=0.00..32604.48 rows=3204 width=14) (actual time=11.991..2223.227 rows=2950 loops=1)
  ->  Index Scan using messungen_v_dat_2007_11_12_messpunkt_minute_tag_idx on messungen_v_dat_2007_11_12 m  (cost=0.00..5371.09 rows=3204 width=4) (actual time=0.152..12.385 rows=2950 loops=1)
        Index Cond: ((ganglinientyp = 'M'::bpchar) AND (992 = minute_tag))
  ->  Index Scan using messwerte_mv_nr_idx on messwerte_mv w  (cost=0.00..8.49 rows=1 width=18) (actual time=0.730..0.734 rows=1 loops=2950)
        Index Cond: (w.nr = m.messpunkt)
Total runtime: 2234.143 ms
(6 rows)

To me this plan looks very clean and nearly optimal, but ~2 seconds for 
the nested loop can't be that good, can it?
The behavior of this query and the database is quite a mystery to me; 
yesterday I had it running in about 100ms, today I started testing with 
the same query and got 2000-3000ms :(


Could this be some kind of PostgreSQL server/configuration problem? 
These queries are very performance dependent; they are called many 
times in a complex physical real-time simulation of traffic systems. 
200ms would be OK here, but >1 sec is perhaps not workable.


The old version just used one big (partitioned) table without any joins, 
performing this query in 10-300ms depending on the server load.


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[PERFORM] Small DB Server Advice

2008-02-13 Thread Rory Campbell-Lange
We have a web application for which we intend to run the database on a
dedicated server.

We hope by the end of 2008 to have 10 companies accessing 10 instances
of the database for this application. The dump file of each database is
likely to be less than 100MB at the end of the year. The databases are
read-heavy.

I'm thinking of something along the following lines:

(https://secure.dnuk.com/systems/r325hs-1u.php?configuration=7766)

4 x 147GB 15000 rpm SCSI in RAID 10 with 320-1 RAID CARD + 64MB cache BBU
2x Intel Xeon E5405 / 4x 2.00GHz / 1333MHz FSB / 12MB cache
6GB RAM

Cost around 2320 GBP -- it would be great to get it under 2000
Needs to be in the UK.

I would be grateful for any comments. I'm particularly out of date about
the best processors to go for. DNUK also have Opteron as an option.

Rory




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Small DB Server Advice

2008-02-13 Thread Matthew

On Wed, 13 Feb 2008, Rory Campbell-Lange wrote:

   4 x 147GB 15000 rpm SCSI in RAID 10 with 320-1 RAID CARD + 64MB cache BBU
   2x Intel Xeon E5405 / 4x 2.00GHz / 1333MHz FSB / 12MB cache
   6GB RAM

   Cost around 2320 GBP -- it would be great to get it under 2000
   Needs to be in the UK.



I would be grateful for any comments. I'm particularly out of date about
the best processors to go for. DNUK also have Opteron as an option.


That sounds pretty good. It should run postgres fairly well, especially if 
you have quite a few parallel queries coming in. You won't need a bigger 
BBU cache if it's read-heavy. You'll have eight CPU cores, which is good. 
And RAID 10 is good.


As for Intel/AMD, I think they're neck-and-neck at the moment. Both are 
fast.


Of course, we over here have no idea how much actual read traffic there 
will be, so you may be massively over-providing or it may be woefully 
inadequate, but this machine looks like a fairly good buy for the price.


Matthew

--
No trees were killed in the sending of this message.  However a large
number of electrons were terribly inconvenienced.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[PERFORM] Creating and updating table using function parameter reference

2008-02-13 Thread Linux Guru
I want to create and update two tables in a function such as below, but
using parameters as table names is not allowed and gives an error. Is there
any way I could achieve this?

CREATE OR REPLACE FUNCTION test ( t1  text,t2 text  ) RETURNS numeric AS $$
declare temp1 text;
declare temp2 text;
begin
temp1=t1;
temp2=t2;
select
product,
(case when sum(pd) <> 0 then sum(gd)/sum(pd)*100 else 0 end  ) as gppp
into temp2 from temp1  as dummy
group by dummy.product,dummy.totalclaimsgroup,dummy.avgmems,dummy.months;

update temp1 as t  set
 GPPP=(select gppp  from temp2  as dummy where dummy.product=t.product),

end
$$ LANGUAGE plpgsql


--
ERROR:  syntax error at or near "$1"
LINE 1: ...en sum(gd)/sum(pd)*100 else 0 end ) as gppp from  $1  as dum...
 ^
QUERY:  select product, (case when sum(pd) <> 0 then sum(gd)/sum(pd)*100
else 0 end ) as gppp from  $1  as dummy group by dummy.product,
dummy.totalclaimsgroup,dummy.avgmems,dummy.months
CONTEXT:  SQL statement in PL/PgSQL function "test" near line 10

** Error **

ERROR: syntax error at or near "$1"
SQL state: 42601
Context: SQL statement in PL/PgSQL function "test" near line 10


Re: [PERFORM] Creating and updating table using function parameter reference

2008-02-13 Thread Albert Cervera Areny
On Wednesday 13 February 2008 15:25, Linux Guru wrote:
> I want to create and update two tables in a function such as below, but
> using parameters as tablename is not allowed and gives an error. Is there
> any way I could achieve this?

You're looking for EXECUTE:
http://www.postgresql.org/docs/8.3/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
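
As a minimal sketch of what that could look like (an illustration only, not 
tested: the table names arrive as text parameters and are spliced in with 
quote_ident, the aggregate goes into a real temp table instead of a text 
variable, and the GROUP BY is trimmed to just "product" for brevity):

CREATE OR REPLACE FUNCTION test(t1 text, t2 text) RETURNS void AS $$
BEGIN
    -- build the aggregate table under the name passed in t2
    EXECUTE 'CREATE TEMP TABLE ' || quote_ident(t2) || ' AS '
         || 'SELECT product, '
         || '(CASE WHEN sum(pd) <> 0 THEN sum(gd)/sum(pd)*100 ELSE 0 END) AS gppp '
         || 'FROM ' || quote_ident(t1) || ' GROUP BY product';

    -- update the source table from the aggregate table
    EXECUTE 'UPDATE ' || quote_ident(t1) || ' t SET gppp = '
         || '(SELECT gppp FROM ' || quote_ident(t2) || ' d '
         || 'WHERE d.product = t.product)';
    RETURN;
END;
$$ LANGUAGE plpgsql;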

>
> CREATE OR REPLACE FUNCTION test ( t1  text,t2 text  ) RETURNS numeric AS $$
> declare temp1 text;
> declare temp2 text;
> begin
> temp1=t1;
> temp2=t2;
> select
> product,
> (case when sum(pd) <> 0 then sum(gd)/sum(pd)*100 else 0 end  ) as gppp
> into temp2 from temp1  as dummy
> group by dummy.product,dummy.totalclaimsgroup,dummy.avgmems,dummy.months;
>
> update temp1 as t  set
>  GPPP=(select gppp  from temp2  as dummy where dummy.product=t.product),
>
> end
> $$ LANGUAGE plpgsql
>
>
> --
> ERROR:  syntax error at or near "$1"
> LINE 1: ...en sum(gd)/sum(pd)*100 else 0 end ) as gppp from  $1  as dum...
>  ^
> QUERY:  select product, (case when sum(pd) <> 0 then sum(gd)/sum(pd)*100
> else 0 end ) as gppp from  $1  as dummy group by dummy.product,
> dummy.totalclaimsgroup,dummy.avgmems,dummy.months
> CONTEXT:  SQL statement in PL/PgSQL function "test" near line 10
>
> ** Error **
>
> ERROR: syntax error at or near "$1"
> SQL state: 42601
> Context: SQL statement in PL/PgSQL function "test" near line 10

 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Small DB Server Advice

2008-02-13 Thread Matthew

On Wed, 13 Feb 2008, Magnus Hagander wrote:

On Wed, 13 Feb 2008, Rory Campbell-Lange wrote:
   4 x 147GB 15000 rpm SCSI in RAID 10 with 320-1 RAID CARD + 64MB cache 
BBU


In my experience, battery backed cache is always worth the money. Even if 
you're mostly select, you will have some updates. And it'll also pick up 
other write activity on the system...


Of course. My point was that 64MB should be quite sufficient if most 
accesses are reads. We have a few machines here with 2GB BBU caches as we 
do LOTS of writes - that sort of thing probably isn't necessary here.


Matthew

--
I suppose some of you have done a Continuous Maths course. Yes? Continuous
Maths?  Whoah, it was like that, was it!
   -- Computer Science Lecturer

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Join Query Perfomance Issue

2008-02-13 Thread Tom Lane
Thomas Zaksek <[EMAIL PROTECTED]> writes:
> Nested Loop Left Join  (cost=0.00..32604.48 rows=3204 width=14) (actual 
> time=11.991..2223.227 rows=2950 loops=1)
>->  Index Scan using 
> messungen_v_dat_2007_11_12_messpunkt_minute_tag_idx on 
> messungen_v_dat_2007_11_12 m  (cost=0.00..5371.09 rows=3204 width=4) 
> (actual time=0.152..12.385 rows=2950 loops=1)
>  Index Cond: ((ganglinientyp = 'M'::bpchar) AND (992 = minute_tag))
>->  Index Scan using messwerte_mv_nr_idx on messwerte_mv w  
> (cost=0.00..8.49 rows=1 width=18) (actual time=0.730..0.734 rows=1 
> loops=2950)
>  Index Cond: (w.nr = m.messpunkt)
>  Total runtime: 2234.143 ms
> (6 rows)

> To me this plan looks very clean and nearly optimal,

For so many rows I'm surprised it's not using a bitmap indexscan.
What PG version is this?  How big are these tables?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Dell Perc/6

2008-02-13 Thread Merlin Moncure
On Feb 13, 2008 5:02 AM, Tore Halset <[EMAIL PROTECTED]> wrote:
> Hello.
>
> I think I started that discussion. We ended up buying a Dell 2900 with
> PERC 6/i and 10 * 145GB SAS 3,5" 15KRpm discs. 6 of the SAS discs are
> in a raid 10 for the database, 2 in a mirror for the wal and the last
> 2 in a mirror for the OS. We get 350MB/s writing and 380MB/s reading
> to/from the raid 10 area using dd. The OS is Ubuntu and the filesystem
> for the raid 10 is ext3.

Those are decent numbers.  Can you do a bonnie++ run and post the
results (specifically interested in seeks)?

merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Small DB Server Advice

2008-02-13 Thread Magnus Hagander

Matthew wrote:

On Wed, 13 Feb 2008, Rory Campbell-Lange wrote:
   4 x 147GB 15000 rpm SCSI in RAID 10 with 320-1 RAID CARD + 64MB 
cache BBU

   2x Intel Xeon E5405 / 4x 2.00GHz / 1333MHz FSB / 12MB cache
   6GB RAM

   Cost around 2320 GBP -- it would be great to get it under 2000
   Needs to be in the UK.



I would be grateful for any comments. I'm particularly out of date about
the best processors to go for. DNUK also have Opteron as an option.


That sounds pretty good. It should run postgres fairly well, especially 
if you have quite a few parallel queries coming in. You won't need a 
bigger BBU cache if it's read-heavy. You'll have eight CPU cores, which 
is good. And RAID 10 is good.


In my experience, battery backed cache is always worth the money. Even 
if you're mostly select, you will have some updates. And it'll also pick 
up other write activity on the system...



//Magnus

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Anyone using a SAN?

2008-02-13 Thread Peter Koczan
Hi all,

We're considering setting up a SAN where I work. Is there anyone using
a SAN, for postgres or other purposes? If so I have a few questions
for you.

- Are there any vendors to avoid or ones that are particularly good?

- What performance or reliability implications exist when using SANs?

- Are there any killer features with SANs compared to local storage?

Any other comments are certainly welcome.

Peter

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Kenneth Marshall
On Wed, Feb 13, 2008 at 10:56:54AM -0600, Peter Koczan wrote:
> Hi all,
> 
> We're considering setting up a SAN where I work. Is there anyone using
> a SAN, for postgres or other purposes? If so I have a few questions
> for you.
> 
> - Are there any vendors to avoid or ones that are particularly good?
> 
> - What performance or reliability implications exist when using SANs?
> 
> - Are there any killer features with SANs compared to local storage?
> 
> Any other comments are certainly welcome.
> 
> Peter
> 

Peter,

The key is to understand your usage patterns, both I/O and query.
SANs can be easily bandwidth limited which can tank your database
performance. There have been several threads in the mailing list
about performance problems caused by the use of a SAN for storage.

Cheers,
Ken

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Join Query Perfomance Issue

2008-02-13 Thread Thomas Zaksek

For so many rows I'm surprised it's not using a bitmap indexscan.
What PG version is this?  How big are these tables?

regards, tom lane


It's PG 8.2.6 on FreeBSD.

messungen_v_dat_2007_11_12 is about 4 million rows and messwerte is 
about 10 million rows.



---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] HOT TOAST?

2008-02-13 Thread Josh Berkus
Folks,

Does anyone know if HOT is compatible with pg_toast tables, or do TOASTed rows 
simply get excluded from HOT?  I can run some tests, but if someone knows 
this off the top of their heads it would save me some time.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Alex Deucher
On Feb 13, 2008 12:46 PM, Kenneth Marshall <[EMAIL PROTECTED]> wrote:
> On Wed, Feb 13, 2008 at 10:56:54AM -0600, Peter Koczan wrote:
> > Hi all,
> >
> > We're considering setting up a SAN where I work. Is there anyone using
> > a SAN, for postgres or other purposes? If so I have a few questions
> > for you.
> >
> > - Are there any vendors to avoid or ones that are particularly good?
> >
> > - What performance or reliability implications exist when using SANs?
> >
> > - Are there any killer features with SANs compared to local storage?
> >
> > Any other comments are certainly welcome.
> >
> > Peter
> >
>
> Peter,
>
> The key is to understand your usage patterns, both I/O and query.
> SANs can be easily bandwidth limited which can tank your database
> performance. There have been several threads in the mailing list
> about performance problems caused by the use of a SAN for storage.

It's critical that you set up the SAN with a database in mind
otherwise the performance will be bad.  I tested a DB on a SAN
designed to maximize storage space and performance was terrible.  I
never had the time or resources to reconfigure the SAN to test a more
suitable spindle setup since the SAN was in heavy production use for
file archiving.

Alex

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Small DB Server Advice

2008-02-13 Thread Greg Smith

On Wed, 13 Feb 2008, Rory Campbell-Lange wrote:


(https://secure.dnuk.com/systems/r325hs-1u.php?configuration=7766)
   4 x 147GB 15000 rpm SCSI in RAID 10 with 320-1 RAID CARD + 64MB cache BBU


That's running the LSI Megaraid SCSI controller.  Those are solid but not 
the best performers in their class, particularly on writes.  But given 
your application description (1GB data in a year and read-heavy) that card 
running a 4-spindle RAID10 should be overkill.


I'm particularly out of date about the best processors to go for. DNUK 
also have Opteron as an option.


Current Intel chips benchmark better; occasionally you'll find people who 
claim the better multi-CPU memory model in the Opteron systems gives them 
better performance at high loads, but that's difficult to quantify. 
There's not a huge difference in any case.  You're probably going to 
bottleneck on either disk or on how fast DDR2 memory goes anyway, and both 
sets of products are competitive right now.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] HOT TOAST?

2008-02-13 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes:
> Does anyone know if HOT is compatible with pg_toast tables, or do TOASTed 
> rows 
> simply get excluded from HOT?

The current TOAST code never does any updates, only inserts/deletes.
But the HOT logic should be able to reclaim deleted rows early via
pruning.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Dell Perc/6

2008-02-13 Thread Tore Halset

On Feb 13, 2008, at 12:06, Dave Cramer wrote:



On 13-Feb-08, at 5:02 AM, Tore Halset wrote:


Hello.

I think I started that discussion. We ended up buying a Dell 2900  
with PERC 6/i and 10 * 145GB SAS 3,5" 15KRpm discs. 6 of the SAS  
discs are in a raid 10 for the database, 2 in a mirror for the wal  
and the last 2 in a mirror for the OS. We get 350MB/s writing and  
380MB/s reading to/from the raid 10 area using dd. The OS is Ubuntu  
and the filesystem for the raid 10 is ext3.


Wow that's fantastic. Just to be sure, did you make sure that you  
read and wrote 2x memory to take the cache out of the measurement ?


Dave




The box has 16GB of RAM, but my original test file was only 25GB. 
Sorry. Going to 33GB lowered the numbers for writing. Here are 
some samples.


% sh -c "dd if=/dev/zero of=bigfile bs=8k count=400 && sync"
3276800 bytes (33 GB) copied, 103.722 seconds, 316 MB/s
3276800 bytes (33 GB) copied, 99.669 seconds, 329 MB/s

% time dd if=bigfile of=/dev/null bs=8k
3276800 bytes (33 GB) copied, 85.4235 seconds, 384 MB/s
3276800 bytes (33 GB) copied, 85.4628 seconds, 383 MB/s

Regards,
 - Tore.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Optimizing No matching record Queries

2008-02-13 Thread Pallav Kalva
Thanks for all your replies! I tried increasing the statistics on 
fklistingsourceid to 1000 but it didn't make any difference.  Then I created 
an index on (fklistingsourceid, entrydate); it helped and the query was fast.


This index fixes this problem, but in general I would like to know: 
what if there are queries where it does an "index scan backwards", there 
is no ORDER BY clause, and the query is still bad? Could there be a case 
like that, or does the planner use a backwards index scan only when there 
is an ORDER BY ... DESC as well?



Richard Huxton wrote:

Dean Gibson (DB Administrator) wrote:

The questions are:

1. Why is the planner scanning the entire idx_listing_entrydate, when 
I'd think it should be scanning the entire 
pk_listingstatus_listingstatusid ?


It's looking at the ORDER BY and sees that the query needs the 10 most 
recent, so tries searching by date. That's sensible where you are 
going to have a lot of matches for fklistingsourceid.


Which suggests that statistics for "fklistingsourceid" aren't high 
enough, like Greg suggested. If that doesn't help, the index on 
(fklistingsourceid, entrydate) that Stephen suggested might well do so.


2. Why is "Index Scan using pk_listingstatus_listingstatusid on 
listingstatus listingsta1_  (cost=0.00..0.27 rows=1 width=4) (never 
executed)" ?


Because nothing comes out of the first index-scan.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Dell Perc/6

2008-02-13 Thread Tore Halset

On Feb 13, 2008, at 20:45, Tore Halset wrote:

The box has 16GB of RAM, but my original test file was only 25GB. 
Sorry. Going to 33GB lowered the numbers for writing. Here are 
some samples.


% sh -c "dd if=/dev/zero of=bigfile bs=8k count=400 && sync"
3276800 bytes (33 GB) copied, 103.722 seconds, 316 MB/s
3276800 bytes (33 GB) copied, 99.669 seconds, 329 MB/s

% time dd if=bigfile of=/dev/null bs=8k
3276800 bytes (33 GB) copied, 85.4235 seconds, 384 MB/s
3276800 bytes (33 GB) copied, 85.4628 seconds, 383 MB/s

Regards,
- Tore.


And here are the bonnie++ numbers. I am a bonnie++ newbie so I ran it  
with no options.


Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
harteigen    32136M 83983  97 221757  40 106483  19 89603  97 268787  22 886.1   1
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
harteigen,32136M,83983,97,221757,40,106483,19,89603,97,268787,22,886.1,1,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++



Regards,
 - Tore.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Optimizing No matching record Queries

2008-02-13 Thread Gregory Stark
"Pallav Kalva" <[EMAIL PROTECTED]> writes:

> This index would fix this problem but in general I would like to know what if
> there are queries where it does "index scan backwards" and there is no "order
> by clause" and the query is still bad ? Would there be a case like that or the
> planner uses index scan backwards only when use order by desc also.

I think you're oversimplifying. Basically you were asking the planner for the
most recent record for a given user. The planner had the choice either of

a) going through all the records for a given user and picking the most recent,

or b) scanning the records from most recent to oldest and looking for the
given user.

It was a choice between two evils. If there are a lot of records for the user
then a) will be bad since it has to scan all of them to find the most recent
and if there are no records for the user then b) will be bad because it'll
have to go through all of the records to the beginning of time.

The suggested index lets it scan the records for the given user from most
recent to oldest without seeing any records for any other user.
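
To make the shape concrete (illustration only - the table name "listing" and 
the literal id are placeholders, since the original query isn't quoted here):

SELECT *
  FROM listing                    -- placeholder table name
 WHERE fklistingsourceid = 42     -- placeholder value
 ORDER BY entrydate DESC
 LIMIT 10;

-- With an index on (fklistingsourceid, entrydate) the planner can walk that
-- index backwards for the one source and stop after 10 rows, so it never has
-- to fall into either of the two bad choices above.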

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Tobias Brox
[Peter Koczan - Wed at 10:56:54AM -0600]
> We're considering setting up a SAN where I work. Is there anyone using
> a SAN, for postgres or other purposes? If so I have a few questions
> for you.

Some time ago, my boss was planning to order more hardware - including a
SAN - and coincidentally, SANs were discussed at this list as well.
The consensus on this list seemed to be that running postgres on a SAN is
not cost efficient - one would get better performance for a lower cost
if the database host is connected directly to the disks - and also,
buying the wrong SAN can cause quite a few problems.

My boss (with good help from the local SAN-pusher) considered that the
arguments against the SAN solution on this list were not really valid for
an "enterprise" user.  The SAN-pusher really insisted that through a
state-of-the-art SAN it should theoretically be possible to achieve far
better bandwidth as well as lower latency to the disks.  Personally, I
don't have a clue, but all my colleagues believe him, so I guess he
is right ;-)  What I'm told is that the state-of-the-art SAN allows for
an "insane amount" of hard disks to be installed, much more than what
would fit into any decent database server.  We've ended up buying a SAN,
the physical installation was done last week, and I will be able to tell
in some months if it was a good idea after all, or not.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Join Query Perfomance Issue

2008-02-13 Thread Scott Marlowe
On Feb 12, 2008 4:11 AM, Thomas Zaksek <[EMAIL PROTECTED]> wrote:

> I tried turning off nestloop, but with terrible results:

Yeah, it didn't help.  I was expecting the query planner to switch to
a more efficient join plan.

> I also tried to increase work_mem, now the config is
> work_mem = 4MB

Try setting it higher for JUST THIS query.  i.e.

set work_mem = '128MB';
explain analyze select 

and see how that runs.  Then play with it until you've got it down to
what helps.  Note that setting work_mem too large in postgresql.conf can
be dangerous, so it might be something you set for just this query for
safety reasons.
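
Spelled out as a session-level sketch (the 128MB figure is only a starting 
point to experiment with):

SET work_mem = '128MB';   -- session-local; postgresql.conf is untouched
SHOW work_mem;            -- confirm the new value
-- run EXPLAIN ANALYZE on the problem query here, then:
RESET work_mem;           -- back to the configured default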

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Arjen van der Meijden

On 13-2-2008 22:06 Tobias Brox wrote:

What I'm told is that the state-of-the-art SAN allows for
an "insane amount" of hard disks to be installed, much more than what
would fit into any decent database server.  We've ended up buying a SAN,
the physical installation was done last week, and I will be able to tell
in some months if it was a good idea after all, or not.


Your SAN-pusher should have a look at the HP-submissions for TPC-C... 
The recent Xeon systems are all without SAN's and still able to connect 
hundreds of SAS-disks.


This one has 2+28+600 hard drives connected to it:
http://tpc.org/results/individual_results/HP/hp_ml370g5_2p_X5460_tpcc_080107_es.pdf

Long story short, using SAS you can theoretically connect up to 64k 
disks to a single system. And with the HP-example they connected 26 
external enclosures (MSA70) to 8 internal with external SAS-ports. I.e. 
they ended up with 28+600 harddrives spread out over 16 external 4-port 
SAS-connectors with a bandwidth of 12Gbit per connector...


Obviously it's a bit difficult to share those 628 harddrives amongst 
several systems, but the argument your colleagues have for SAN isn't a 
very good one. All major hardware vendors nowadays have external 
SAS-enclosures which can hold 12-25 external harddrives (and can often 
be stacked to two or three enclosures) and can be connected to normal 
internal PCI-e SAS-raid-cards. Those controllers have commonly two 
external ports and can be used with other controllers in the system to 
combine all those connected enclosures to one or more virtual images, or 
you could have your software LVM/raid on top of those controllers.


Anyway, the common physical limit of 6-16 disks in a single 
server-enclosure isn't very relevant anymore in an argument against SAN.


Best regards,

Arjen

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] HOT TOAST?

2008-02-13 Thread Josh Berkus
Tom,

> The current TOAST code never does any updates, only inserts/deletes.
> But the HOT logic should be able to reclaim deleted rows early via
> pruning.

OK, so for a heavy update application we should still see a vacuum 
reduction, even if most of the rows are 40k large?

Time to run some tests ...

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Greg Smith

On Wed, 13 Feb 2008, Tobias Brox wrote:

What I'm told is that the state-of-the-art SAN allows for an "insane 
amount" of hard disks to be installed, much more than what would fit 
into any decent database server.


You can attach a surprisingly large number of drives to a server nowadays, 
but in general it's easier to manage larger numbers of them on a SAN. 
Also, there are significant redundancy improvements using a SAN that are 
worth quite a bit in some enterprise environments.  Being able to connect 
all the drives, no matter how many, to two or more machines at once 
trivially is typically easier to setup on a SAN than when you're using 
more direct storage.


Basically the performance breaks down like this:

1) Going through the SAN interface (fiber channel etc.) introduces some 
latency and a potential write bottleneck compared with direct storage, 
everything else being equal.  This can really be a problem if you've got a 
poor SAN vendor or interface issues you can't sort out.


2) It can be easier to manage a large number of disks in the SAN, so for 
situations where aggregate disk throughput is the limiting factor the SAN 
solution might make sense.


3) At the high-end, you can get SANs with more cache than any direct 
controller I'm aware of, which for some applications can lead to them 
having a more quantifiable lead over direct storage.  It's easy (albeit 
expensive) to get an EMC array with 16GB worth of memory for caching on it 
for example (and with 480 drives).  And since they've got a more robust 
power setup than a typical server, you can even enable all the individual 
drive caches usefully (that's 16-32MB each nowadays, so at say 100 disks 
you've potentially got another 1.6GB of cache right there).  If you've got 
a typical server you can end up needing to turn off individual direct 
attached drive caches for writes, because they may not survive a power 
cycle even with a UPS, and you have to just rely on the controller write 
cache.


There's no universal advantage on either side here, just a different set 
of trade-offs.  Certainly you'll never come close to the performance/$ 
direct storage gets you if you buy that in SAN form instead, but at higher 
budgets or feature requirements they may make sense anyway.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Tobias Brox
[Arjen van der Meijden]
> Your SAN-pusher should have a look at the HP-submissions for TPC-C... 
> The recent Xeon systems are all without SAN's and still able to connect 
> hundreds of SAS-disks.

Yes, I had a feeling that the various alternative solutions for "direct
connection" hadn't been investigated fully.  I was pushing for it, but
hardware is not my thing.  Anyway, most likely the only harm done by
choosing SAN is that it's more expensive than an equivalent solution with
directly connected disks.  Well, not my money anyway. ;-)

> Obviously its a bit difficult to share those 628 harddrives amongst 
> several systems, but the argument your colleagues have for SAN isn't a 
> very good one.

As far as I've heard, you cannot really benefit much from this with
postgres, one cannot have two postgres servers on two hosts sharing the
same data (i.e. using one for failover or for CPU/memory-bound read
queries).

Having the SAN connected to several hosts gives us two benefits: if the
database host goes down but not the SAN, it will be quite fast to start
up a new postgres instance on a different host - and it will also be
possible to take real-time backups from the SAN without much of a
performance hit.  Anyway, with a warm standby server as described on
http://www.postgresql.org/docs/current/interactive/warm-standby.html one
can achieve pretty much the same without a SAN.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Scott Marlowe
On Feb 13, 2008 5:02 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
> On Wed, 13 Feb 2008, Tobias Brox wrote:
>
> > What I'm told is that the state-of-the-art SAN allows for an "insane
> > amount" of hard disks to be installed, much more than what would fit
> > into any decent database server.
>
> You can attach a surpringly large number of drives to a server nowadays,
> but in general it's easier to manage larger numbers of them on a SAN.
> Also, there are significant redundancy improvements using a SAN that are
> worth quite a bit in some enterprise environments.  Being able to connect
> all the drives, no matter how many, to two or more machines at once
> trivially is typically easier to setup on a SAN than when you're using
> more direct storage.

SNIP

> There's no universal advantage on either side here, just a different set
> of trade-offs.  Certainly you'll never come close to the performance/$
> direct storage gets you if you buy that in SAN form instead, but at higher
> budgets or feature requirements they may make sense anyway.

I agree with everything you've said here, and you've said it far more
clearly than I could have.

I'd like to add that it may still be feasible to have a SAN and a db
with locally attached storage.  Talk the boss into a 4 port caching
SAS controller and four very fast hard drives or something else on the
server so that you can run tests to compare the performance of a
rather limited on board RAID set to the big SAN.  For certain kinds of
things, like loading tables, it will still be a very good idea to have
local drives for caching and transforming data and such.

Going further, the argument for putting the db onto the SAN may be
weakened if the amount of data on the db server can't and likely won't
require a lot of space.  A lot of back-office dbs are running in
the sub-gigabyte range and will never grow to the size of the social
security database.  Even with dozens of apps, an in-house db server
might be using no more than a few dozen gigabytes of storage.  Given
the cost and performance of large SAS and SATA drives, it's not at all
unlikely that you can fit everything you need for the next five years
on a single set of disks on a server that's twice as powerful as most
internal db servers need.

You can hide the cost of the extra drives in the shadow of the receipt
for the SAN.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Bruce Momjian

Should this be summarized somewhere in our docs; just a few lines with
the tradeoffs, direct storage = cheaper, faster, SAN = more configurable?

---

Scott Marlowe wrote:
> On Feb 13, 2008 5:02 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
> > On Wed, 13 Feb 2008, Tobias Brox wrote:
> >
> > > What I'm told is that the state-of-the-art SAN allows for an "insane
> > > amount" of hard disks to be installed, much more than what would fit
> > > into any decent database server.
> >
> > You can attach a surpringly large number of drives to a server nowadays,
> > but in general it's easier to manage larger numbers of them on a SAN.
> > Also, there are significant redundancy improvements using a SAN that are
> > worth quite a bit in some enterprise environments.  Being able to connect
> > all the drives, no matter how many, to two or more machines at once
> > trivially is typically easier to setup on a SAN than when you're using
> > more direct storage.
> 
> SNIP
> 
> > There's no universal advantage on either side here, just a different set
> > of trade-offs.  Certainly you'll never come close to the performance/$
> > direct storage gets you if you buy that in SAN form instead, but at higher
> > budgets or feature requirements they may make sense anyway.
> 
> I agree with everything you've said here, and you've said it far more
> clearly than I could have.
> 
> I'd like to add that it may still be feasable to have a SAN and a db
> with locally attached storage.  Talk the boss into a 4 port caching
> SAS controller and four very fast hard drives or something else on the
> server so that you can run tests to compare the performance of a
> rather limited on board RAID set to the big SAN.  For certain kinds of
> things, like loading tables, it will still be a very good idea to have
> local drives for caching and transforming data and such.
> 
> Going further, the argument for putting the db onto the SAN may be
> weakened if the amount of data on the db server can't and likely won't
> require a lot of space.  A lot of backend office dbs are running in
> the sub gigabyte range and will never grow to the size of the social
> security database.  Even with dozens of apps, an in house db server
> might be using no more than a few dozen gigabytes of storage.  Given
> the cost and performance of large SAS and SATA drives, it's not all
> unlikely that you can fit everything you need for the next five years
> on a single set of disks on a server that's twice as powerful as most
> internal db servers need.
> 
> You can hide the cost of the extra drives in the shadow of the receipt
> for the SAN.
> 

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Anyone using a SAN?

2008-02-13 Thread Peter Koczan
Thanks for all your input, it is very helpful. A SAN for our postgres
deployment is probably sufficient in terms of performance, because we
just don't have that much data. I'm a little concerned about needs for
user and research databases, but if a project needs a big, fast
database, it might be wise to have them shell out for DAS.

My co-workers and I are meeting with a vendor in two weeks (3Par,
specifically), and I think I have a better idea of what I should be
looking at. I'll keep you all up on the situation. Keep the ideas
coming as I still would like to know of any other important factors.

Thanks again.

Peter

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings