[PERFORM] Using statement_timeout as a performance tool?

2007-02-01 Thread Mark Stosberg

Hello,

I'm working on setting up replication with Slony, and will soon have a
slave that a lot of SELECT traffic will be sent to (over 500k/day).

The primary query we need to run is somewhat complex, but seems to
complete on average in well under a second.

However, every so often (less in 1 in 10,000 queries) we'll see the
query take 2 or 3 minutes.

It's not clear why this is happening-- perhaps there is something else
going on that is affecting this query.

I'm considering the use of  "statement_timeout" to limit the time of
this particular query, to suppress the rare "run away", and avoid tying
up the processor for that additional time.

I think it may be better to give up, quit spending cycles on it right
then, and return an "oops, try again in a few minutes" message instead.
>From the data we have, seems like it has a strong chance of working again.

Is anyone else using "statement_timeout" as part of an overall
performance plan?

Mark


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Subselect query enhancement

2007-02-01 Thread Michael Artz

I'm needing help determining the best all-around query for the
following situation.  I have primary table that holds ip information
and two other tables that hold event data for the specific IP in with
a one-to-many mapping between them, ie:

CREATE TABLE ip_info (
   ip IP4,
   --other data
);

CREATE TABLE network_events (
   ip IP4 NOT NULL REFERENCES ip_info(ip),
   name VARCHAR,
   port INTEGER,
   --other data
);

CREATE TABLE host_events (
   ip IP4 NOT NULL REFERENCES ip_info(ip),
   name VARCHAR
   port INTEGER,
   --other data
);

There is quite a bit of commonality between the network_events and
host_events schemas, but they do not currently share an ancestor.
ip_info has about 13 million rows, the network_events table has about
30 million rows, and the host_events table has about 7 million rows.
There are indexes on all the rows.

The query that I would like to execute is to select all the rows of
ip_info that have either  network or host events that meet some
criteria, i.e. name='blah'.  I have 3 different possibilities that I
have thought of to execute this.

First, 2 'ip IN (SELECT ...)' statements joined by an OR:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah')
   OR ip IN (SELECT ip FROM host_events WHERE name='blah');

Next, 1 'ip IN (SELECT ... UNION SELECT ...) statement:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah'
 UNION
 SELECT ip FROM host_events WHERE name='blah');

Or, finally, the UNION statment with DISTINCTs:

SELECT * FROM ip_info
 WHERE ip IN (SELECT DISTINCT ip FROM network_events WHERE name='blah'
 UNION
 SELECT DISTINCT ip FROM host_events WHERE name='blah');


From what I have read, the UNION statement does an implicit DISTINCT,

but I thought that doing it on each of the tables would result in
slightly faster execution.  Can you think of any other ways to
implement the previous query?

I have explained/analyzed all the queries but, unfortunately, they are
on an isolated computer.  The gist is that, for relatively
low-incidence values of name, the UNION performs better, but for
queries on a common name, the dual-subselect query performs better.

The explains look something like:
Dual-subselect:
Seq scan on ip_info
 Filter:  ... AND ((hashed_subplan) OR (hashed_subplan))
 Subplan
   -> Result
 -> Append
   -> various scans on host_events
   -> Result
 -> Append
   -> various scans on network_events

UNION SELECT DISTINCT:

Nested Loop
 -> Unique
   -> Sort
 -> Append
   -> Unique
 -> Sort
   -> Result
 -> Append
   -> various scans on host_events
   -> Unique
 -> Sort
   -> Result
 -> Append
   -> various scans on network_events

If it would help to have more information, I could retype some of
numbers in the explain.

Any ideas?

Thanks,
-Mike

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Richard Huxton

Michael Artz wrote:

I'm needing help determining the best all-around query for the
following situation. 


Not sure whether such a beast exists, but...

> I have primary table that holds ip information

and two other tables that hold event data for the specific IP in with
a one-to-many mapping between them, ie:

[snip]

There is quite a bit of commonality between the network_events and
host_events schemas, but they do not currently share an ancestor.
ip_info has about 13 million rows, the network_events table has about
30 million rows, and the host_events table has about 7 million rows.
There are indexes on all the rows.


What indexes though. Do you have (name,ip) on the two event tables?

How selective is "name" - are there many different values or just a few? 
If lots, it might be worth increasing the statistics gathered on that 
column (ALTER COLUMN ... SET STATISTICS).

http://www.postgresql.org/docs/8.2/static/sql-altertable.html


The query that I would like to execute is to select all the rows of
ip_info that have either  network or host events that meet some
criteria, i.e. name='blah'.  I have 3 different possibilities that I
have thought of to execute this.

First, 2 'ip IN (SELECT ...)' statements joined by an OR:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah')
   OR ip IN (SELECT ip FROM host_events WHERE name='blah');

Next, 1 'ip IN (SELECT ... UNION SELECT ...) statement:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah'
 UNION
 SELECT ip FROM host_events WHERE name='blah');

Or, finally, the UNION statment with DISTINCTs:

SELECT * FROM ip_info
 WHERE ip IN (SELECT DISTINCT ip FROM network_events WHERE name='blah'
 UNION
 SELECT DISTINCT ip FROM host_events WHERE name='blah');

 From what I have read, the UNION statement does an implicit DISTINCT,
but I thought that doing it on each of the tables would result in
slightly faster execution.  Can you think of any other ways to
implement the previous query?


You're right about removing duplicates. Not sure whether the DISTINCTs 
on the sub-selects are helping or hindering. It'll probably depend on 
your hardware, config, number of rows etc.


The only other way I can think of for this query is to UNION two JOINs. 
Might interact well with the (name,ip) index I mentioned above.



I have explained/analyzed all the queries but, unfortunately, they are
on an isolated computer.  The gist is that, for relatively
low-incidence values of name, the UNION performs better, but for
queries on a common name, the dual-subselect query performs better.


Difficult to say much without seeing the full explain analyse. Did the 
row estimates look reasonable?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Ted Allen
I've found that doing joins seems to produce better results on the big 
tables queries I use.  This is not always the case though.


How about this option:

SELECT distinct ip_info.* FROM ip_info RIGHT JOIN network_events USING 
(ip) RIGHT JOIN  host_events USING (ip) WHERE 
(network_events.name='blah' OR host_events.name = 'blah')  AND 
ip_info.ip IS NOT NULL;


That gets rid of the sub-queries your using that look pretty costly.

Michael Artz wrote:

I'm needing help determining the best all-around query for the
following situation.  I have primary table that holds ip information
and two other tables that hold event data for the specific IP in with
a one-to-many mapping between them, ie:

CREATE TABLE ip_info (
   ip IP4,
   --other data
);

CREATE TABLE network_events (
   ip IP4 NOT NULL REFERENCES ip_info(ip),
   name VARCHAR,
   port INTEGER,
   --other data
);

CREATE TABLE host_events (
   ip IP4 NOT NULL REFERENCES ip_info(ip),
   name VARCHAR
   port INTEGER,
   --other data
);

There is quite a bit of commonality between the network_events and
host_events schemas, but they do not currently share an ancestor.
ip_info has about 13 million rows, the network_events table has about
30 million rows, and the host_events table has about 7 million rows.
There are indexes on all the rows.

The query that I would like to execute is to select all the rows of
ip_info that have either  network or host events that meet some
criteria, i.e. name='blah'.  I have 3 different possibilities that I
have thought of to execute this.

First, 2 'ip IN (SELECT ...)' statements joined by an OR:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah')
   OR ip IN (SELECT ip FROM host_events WHERE name='blah');

Next, 1 'ip IN (SELECT ... UNION SELECT ...) statement:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah'
 UNION
 SELECT ip FROM host_events WHERE name='blah');

Or, finally, the UNION statment with DISTINCTs:

SELECT * FROM ip_info
 WHERE ip IN (SELECT DISTINCT ip FROM network_events WHERE name='blah'
 UNION
 SELECT DISTINCT ip FROM host_events WHERE name='blah');


From what I have read, the UNION statement does an implicit DISTINCT,

but I thought that doing it on each of the tables would result in
slightly faster execution.  Can you think of any other ways to
implement the previous query?

I have explained/analyzed all the queries but, unfortunately, they are
on an isolated computer.  The gist is that, for relatively
low-incidence values of name, the UNION performs better, but for
queries on a common name, the dual-subselect query performs better.

The explains look something like:
Dual-subselect:
Seq scan on ip_info
 Filter:  ... AND ((hashed_subplan) OR (hashed_subplan))
 Subplan
   -> Result
 -> Append
   -> various scans on host_events
   -> Result
 -> Append
   -> various scans on network_events

UNION SELECT DISTINCT:

Nested Loop
 -> Unique
   -> Sort
 -> Append
   -> Unique
 -> Sort
   -> Result
 -> Append
   -> various scans on host_events
   -> Unique
 -> Sort
   -> Result
 -> Append
   -> various scans on network_events

If it would help to have more information, I could retype some of
numbers in the explain.

Any ideas?

Thanks,
-Mike

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings




--

*Edward Allen*
Software Engineer
Black Duck Software, Inc.

[EMAIL PROTECTED] 
T +1.781.891.5100 x133
F +1.781.891.5145
http://www.blackducksoftware.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Michael Artz

How about this option:

SELECT distinct ip_info.* FROM ip_info RIGHT JOIN network_events USING
(ip) RIGHT JOIN  host_events USING (ip) WHERE
(network_events.name='blah' OR host_events.name = 'blah')  AND
ip_info.ip IS NOT NULL;


Nah, that seems to be much much worse.  The other queries usually
return in 1-2 minutes, this one has been running for 30 minutes and
has still not returned

-Mike

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Frank Wiles
On Thu, 1 Feb 2007 11:42:03 -0500
"Michael Artz" <[EMAIL PROTECTED]> wrote:

> I'm needing help determining the best all-around query for the
> following situation.  I have primary table that holds ip information
> and two other tables that hold event data for the specific IP in with
> a one-to-many mapping between them, ie:
> 
> CREATE TABLE ip_info (
> ip IP4,
> --other data
> );
> 
> CREATE TABLE network_events (
> ip IP4 NOT NULL REFERENCES ip_info(ip),
> name VARCHAR,
> port INTEGER,
> --other data
> );
> 
> CREATE TABLE host_events (
> ip IP4 NOT NULL REFERENCES ip_info(ip),
> name VARCHAR
> port INTEGER,
> --other data
> );

It would probably help to have an index on that column for all three
tables, then I would wager using joins will be the speed winner. 

 -
   Frank Wiles <[EMAIL PROTECTED]>
   http://www.wiles.org
 -


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Michael Artz

 > I have primary table that holds ip information
> and two other tables that hold event data for the specific IP in with
> a one-to-many mapping between them, ie:
[snip]
> There is quite a bit of commonality between the network_events and
> host_events schemas, but they do not currently share an ancestor.
> ip_info has about 13 million rows, the network_events table has about
> 30 million rows, and the host_events table has about 7 million rows.
> There are indexes on all the rows.

What indexes though. Do you have (name,ip) on the two event tables?


All the columns are indexed individually.  The tables are completely
static, as I reload the whole DB with new data every day.


How selective is "name" - are there many different values or just a few?
If lots, it might be worth increasing the statistics gathered on that
column (ALTER COLUMN ... SET STATISTICS).
http://www.postgresql.org/docs/8.2/static/sql-altertable.html


I guess that is the heart of my question.  "name" is not very
selective (there are only 20 or so choices) however other columns are
fairly selective for certain cases, such as 'port'.  When querying on
and unusual port, the query is very fast, and the single UNIONed
subselect returns quickly.  When 'port' is not very selective (like
port = '80', which is roughly 1/2 of the rows in the DB), the dual
subselect query wins, hands-down.

And I have altered the statistics via the config file:
 default_statistics_target = 100
Perhaps this should be even higher for certain columns?


> The query that I would like to execute is to select all the rows of
> ip_info that have either  network or host events that meet some
> criteria, i.e. name='blah'.  I have 3 different possibilities that I
> have thought of to execute this.
>
> First, 2 'ip IN (SELECT ...)' statements joined by an OR:
>
> SELECT * FROM ip_info
>  WHERE ip IN (SELECT ip FROM network_events WHERE name='blah')
>OR ip IN (SELECT ip FROM host_events WHERE name='blah');
>
> Next, 1 'ip IN (SELECT ... UNION SELECT ...) statement:
>
> SELECT * FROM ip_info
>  WHERE ip IN (SELECT ip FROM network_events WHERE name='blah'
>  UNION
>  SELECT ip FROM host_events WHERE name='blah');
>
> Or, finally, the UNION statment with DISTINCTs:
>
> SELECT * FROM ip_info
>  WHERE ip IN (SELECT DISTINCT ip FROM network_events WHERE name='blah'
>  UNION
>  SELECT DISTINCT ip FROM host_events WHERE name='blah');
>
>  From what I have read, the UNION statement does an implicit DISTINCT,
> but I thought that doing it on each of the tables would result in
> slightly faster execution.  Can you think of any other ways to
> implement the previous query?

You're right about removing duplicates. Not sure whether the DISTINCTs
on the sub-selects are helping or hindering. It'll probably depend on
your hardware, config, number of rows etc.

The only other way I can think of for this query is to UNION two JOINs.
Might interact well with the (name,ip) index I mentioned above.


Nah, that did very poorly.


> I have explained/analyzed all the queries but, unfortunately, they are
> on an isolated computer.  The gist is that, for relatively
> low-incidence values of name, the UNION performs better, but for
> queries on a common name, the dual-subselect query performs better.

Difficult to say much without seeing the full explain analyse. Did the
row estimates look reasonable?


hmm, I think so, but I'm not that good in reading the outputs.  I'll
see if I can retype some of the interesting bits of the explain
analyze.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Richard Huxton

Michael Artz wrote:

 > I have primary table that holds ip information
> and two other tables that hold event data for the specific IP in with
> a one-to-many mapping between them, ie:
[snip]
> There is quite a bit of commonality between the network_events and
> host_events schemas, but they do not currently share an ancestor.
> ip_info has about 13 million rows, the network_events table has about
> 30 million rows, and the host_events table has about 7 million rows.
> There are indexes on all the rows.

What indexes though. Do you have (name,ip) on the two event tables?


All the columns are indexed individually.  The tables are completely
static, as I reload the whole DB with new data every day.


The point of a (name,ip) index would be to let you read off ip numbers 
in order easily.



How selective is "name" - are there many different values or just a few?
If lots, it might be worth increasing the statistics gathered on that
column (ALTER COLUMN ... SET STATISTICS).
http://www.postgresql.org/docs/8.2/static/sql-altertable.html


I guess that is the heart of my question.  "name" is not very
selective (there are only 20 or so choices) however other columns are
fairly selective for certain cases, such as 'port'.  When querying on
and unusual port, the query is very fast, and the single UNIONed
subselect returns quickly.  When 'port' is not very selective (like
port = '80', which is roughly 1/2 of the rows in the DB), the dual
subselect query wins, hands-down.

And I have altered the statistics via the config file:
 default_statistics_target = 100
Perhaps this should be even higher for certain columns?


You're probably better off leaving it at 10 and upping it for the vital 
columns. 25 for names should be a good choice.


You could try partial indexes for those cases where you have 
particularly common values of name/port:


CREATE INDEX idx1 ON host_events (ip) WHERE port=80;

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Michael Artz

Here are some numbers for 3 different queries using a very selective
query (port = ).  I'm thinking that, since the row estimates are
different from the actuals (2 vs 2000), that this particular port
didn't make it into the statistics ... is that true?  Does this
matter?  If this isn't enough data, I can type up some more.

One thing that I forgot to mention is that the network_events and
host_events tables are partitioned by first octet of the IP, so when I
say "various scans of ..." that means that there is a scan of each of
the partitions, the type determined by the planner and the statistics,
I assume.

** Dual subselects:
SELECT * FROM ip_info
WHERE ip IN (SELECT ip FROM network_events WHERE port = )
  OR ip IN (SELECT ip FROM host_events WHERE port = );

Seq scan on ip_info (cost=2776..354575 rows=9312338 width=72) (actual
time=34..8238 rows=234 loops=1)
Filter:  ... AND ((hashed_subplan) OR (hashed_subplan))
Subplan
  -> Result (cost=0..849 rows=459 width=4) (actual time=0.176..2.310
rows=72 loops=1)
-> Append (cost=0.00..849 rows=459 width=4) (actual
time=0.173..2.095 rows=72 loops=1)
  -> various scans on host_events
  -> Result (cost=0..1923 rows=856 width=4) (actual
time=0.072..24.614 rows=2393 loops=1)
-> Append (cost=0..1923 rows=856 width=4) (actual time=0.069..27
rows=2393 loops=1)
  -> various scans on network_events

** Single subselect:

SELECT * FROM ip_info
WHERE ip IN (SELECT DISTINCT ip FROM network_events WHERE port = 
UNION
SELECT DISTINCT ip FROM host_events WHERE port = );

Nested Loop (cost=2841..2856 rows=2 width=72) (actual time=55..106
rows=2349 loops=1)
-> Unique (cost=2841..2856 rows=2 width=72) (actual time=55..66
rows=2349 loops=1)
  -> Sort (cost=2841..2841 rows=2 width=4) (actual time=55..58
rows=2401 loops=1)
-> Append (cost=1956..2841 rows=2 width=4) (actual time=29..50
rows=2401 loops=1)
  -> Unique (cost=1956..1959 rows=2 width=4) (actual time=29..50
rows=2401 loops=1)
-> Sort
  -> Result
-> Append
  -> various scans on network_events
  -> Unique (cost=869..871 rows=1 width=4) (actual time=2.9..3.3
rows=70 loops=1)
-> Sort
  -> Result
-> Append
  -> various scans on host_events


** The join:

SELECT distinct ip_info.*
 FROM ip_info RIGHT JOIN network_events USING (ip)
  RIGHT JOIN  host_events USING (ip)
 WHERE (network_events.port= OR host_events.port=)

Unique (cost=9238..9367 rows=1965 width=72) (actual time=61..61 rows=52 loops=1)
 -> Sort (cost=9238..9288 rows=1965 width=72) (actual time=61..61
rows=63 loops=1)
   -> Hash Join (cost=850..9176 rows=1965 width=76) (actual
time=0..54 rows=2393 loops=1)
 -> Nested Loop Left Join (cost=0..8205 rows=856 width=76)
(actual time=0..54 rows=2393 loops=1)
   -> Append
 -> various scans of network_events
 -> Index Scan of ip_info (cost=0..7 rows=1 width=72) (actual
time=0..0 rows=1 loops 2393)
   ->Hash (cost=849..849 rows=459 width=4) (actual time=0..2 rows=72 loops=1)
 -> Append
   ->various scans of host_events


On 2/1/07, Michael Artz <[EMAIL PROTECTED]> wrote:

>  > I have primary table that holds ip information
> > and two other tables that hold event data for the specific IP in with
> > a one-to-many mapping between them, ie:
> [snip]
> > There is quite a bit of commonality between the network_events and
> > host_events schemas, but they do not currently share an ancestor.
> > ip_info has about 13 million rows, the network_events table has about
> > 30 million rows, and the host_events table has about 7 million rows.
> > There are indexes on all the rows.
>
> What indexes though. Do you have (name,ip) on the two event tables?

All the columns are indexed individually.  The tables are completely
static, as I reload the whole DB with new data every day.

> How selective is "name" - are there many different values or just a few?
> If lots, it might be worth increasing the statistics gathered on that
> column (ALTER COLUMN ... SET STATISTICS).
> http://www.postgresql.org/docs/8.2/static/sql-altertable.html

I guess that is the heart of my question.  "name" is not very
selective (there are only 20 or so choices) however other columns are
fairly selective for certain cases, such as 'port'.  When querying on
and unusual port, the query is very fast, and the single UNIONed
subselect returns quickly.  When 'port' is not very selective (like
port = '80', which is roughly 1/2 of the rows in the DB), the dual
subselect query wins, hands-down.

And I have altered the statistics via the config file:
  default_statistics_target = 100
Perhaps this should be even higher for certain columns?

> > The query that I would like to execute is to select all the rows of
> > ip_info that have either  network or host events that meet some
> > criteria, i.e. name='blah'.  I have 3 different possibilities that I
> > have thought of to execute this.
>

Re: [PERFORM] int4 vs varchar to store ip addr

2007-02-01 Thread Michael Artz

On 1/30/07, Pomarede Nicolas <[EMAIL PROTECTED]> wrote:

On Mon, 29 Jan 2007, Florian Weimer wrote:

> * Pomarede Nicolas:
>
>> I could use PG internal inet/cidr type to store the ip addrs, which
>> would take 12 bytes per IP, thus gaining a few bytes per row.
>
> I thought it's down to 8 bytes in PostgreSQL 8.2, but I could be
> mistaken.
>
>> Apart from gaining some bytes, would the btree index scan be faster
>> with this data type compared to plain varchar ?
>
> It will be faster because less I/O is involved.
>
> For purposes like yours, there is a special ip4 type in a contributed
> package which brings down the byte count to 4.  I'm not sure if it's
> been ported to PostgreSQL 8.2 yet.

Yes thanks for this reference, ip4r package seems to be a nice addition to
postgres for what I'd like to do. Does someone here have some real life
experience with it (regarding performance and stability) ?


I'm using IP4 and have not had a problem with it in 8.2 (or 8.1) in
terms of stability.  As I designed my DB using it, I don't really have
any comparisons to inet and/or varchar.  One of the most useful things
for me is the ability to create a GIST index to support determination
of range inclusion (i.e. 192.168.23.1 is in the 192.168/16 network
range), although it doesn't sound like this would be useful to you.

-Mike

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Andrew Lazarus

>> How about this option:
>>
>> SELECT distinct ip_info.* FROM ip_info RIGHT JOIN network_events USING
>> (ip) RIGHT JOIN  host_events USING (ip) WHERE
>> (network_events.name='blah' OR host_events.name = 'blah')  AND
>> ip_info.ip IS NOT NULL;

MA> Nah, that seems to be much much worse.  The other queries usually
MA> return in 1-2 minutes, this one has been running for 30 minutes and
MA> has still not returned

I find that an OR involving two different fields (in this case even
different tables) is faster when replaced by the equivalent UNION. In this
case---

SELECT distinct ip_info.* FROM ip_info RIGHT JOIN network_events USING
(ip) WHERE
network_events.name='blah' AND ip_info.ip IS NOT NULL
UNION
SELECT distinct ip_info.* FROM ip_info RIGHT JOIN host_events USING (ip) WHERE
host_events.name = 'blah'  AND ip_info.ip IS NOT NULL;

Moreover, at least through 8.1, GROUP BY is faster than DISTINCT.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] drive configuration for a new server

2007-02-01 Thread Ben
I'm looking to replace some old crusty hardware with some sparkling new 
hardware. In the process, I'm looking to move away from the previous 
mentality of having the Big Server for Everything to having a cluster of 
servers, each of which handles some discrete subset of data. But rackspace 
isn't inifinte, so I'm leaning towards cases that give me 8 drive bays. 
This leaves me with an interesting problem of how to configure these 
limited number of drives.


I know that ideally I would have seperate spindles for WAL, indexes, and 
data. But I also know that I must be able to survive a drive failure, and 
I want at least 1TB of space for my data. I suspect with so few drive 
bays, I won't be living in an ideal world.


With an even mix of reads and writes (or possibly more writes than reads), 
is it better to use RAID10 and have everything on the same partition, or 
to have data and indexes on a 6-drive RAID5 with WAL on its own RAID1?


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq