[PERFORM] plperl vs plpgsql

2005-04-15 Thread Alex
Hi,
I am thinking about switching to plperl, as it seems to me much more
flexible and easier for creating functions.

What is the recommended PL for Postgres? Or which one is most widely
used / most popular?
Is there a performance difference between plpgsql and plperl?

Porting to other systems is not a real issue, as all my servers have Perl
installed.

Thanks for any advice
Alex



Re: [PERFORM] plperl vs plpgsql

2005-04-17 Thread Alex
Is there a performance difference between the two? Which of the PLs is
most widely used? One problem I have with plpgsql is that the quoting is
really a pain.
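
For what it's worth, one workaround for the quoting pain: from PostgreSQL 8.0
on, function bodies can be dollar-quoted, so quotes inside the body no longer
have to be doubled. A minimal illustrative sketch (not code from this thread):

CREATE OR REPLACE FUNCTION label_city(city text) RETURNS text AS $$
BEGIN
    -- Without dollar quoting, the whole body would be one big single-quoted
    -- string and this literal would have to be written as ''city: ''.
    RETURN 'city: ' || city;
END;
$$ LANGUAGE plpgsql;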


Christopher Browne wrote:
After takin a swig o' Arrakan spice grog, [EMAIL PROTECTED] (Alex) belched out:
 

i am thinking about swiching to plperl as it seems to me much more
flexible and easier to create functions.
what is the recommended PL for postgres? or which one is most widely
used / most popular?
is there a performance difference between plpgsql and plperl ?
   

If what you're trying to do is "munge text," pl/perl will be a whole
lot more suitable than pl/pgsql because it has a rich set of text
munging tools and string functions which pl/pgsql lacks.
If you intend to do a lot of work involving reading unmunged tuples
from this table and that, pl/pgsql provides a much more natural
syntax, and will probably be a bit faster as the query processor may
even be able to expand some of the actions, rather than needing to
treat Perl code as an "opaque blob."
I would definitely be inclined to use the more natural language for
the given task...
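
To make the "munge text" point concrete, here is a small illustrative pl/perl
function of the kind being described; the function and its name are invented
for this example:

CREATE OR REPLACE FUNCTION squeeze_whitespace(text) RETURNS text AS $$
    # Collapse runs of whitespace and trim the ends -- a one-liner in Perl,
    # but clumsy to express in pl/pgsql.
    my ($s) = @_;
    $s =~ s/\s+/ /g;
    $s =~ s/^ | $//g;
    return $s;
$$ LANGUAGE plperl;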
 




[PERFORM] Some Performance Advice Needed

2004-12-23 Thread Alex
Hi,
I recently ran pgbench against several different servers and got some results
I don't quite understand.

A) EV1: dual Xeon 2GHz, 1GB memory, 10K rpm SCSI, RHEL 3
B) Dual Pentium III 1.4GHz (blade), 10K rpm SCSI disk, 1GB memory, Red Hat 8
C) P4 3.2GHz, 7.2K rpm IDE, 1GB memory, Fedora Core 2
All ran only PostgreSQL 7.4.6.
postgresql.conf settings:
max_connections = 100
shared_buffers = 8192
sort_mem = 8192
vacuum_mem = 32768
max_fsm_pages = 20
max_fsm_relations = 1
wal_sync_method = fsync  
wal_buffers = 64   
checkpoint_segments = 10   
effective_cache_size = 65536
random_page_cost = 1.4

/etc/sysctl.conf:
shmall and shmmax are set to 768MB
Running pgbench reported:
A) 220 tps
B) 240 tps
C) 510 tps
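
As a caveat, pgbench numbers depend heavily on the scaling factor chosen at
initialization and on the client count, so they are only comparable between
machines if the runs were identical. For reference, a typical invocation
(the database name here is a placeholder):

pgbench -i -s 10 bench       # initialize with scaling factor 10 (~1M account rows)
pgbench -c 10 -t 1000 bench  # 10 clients, 1000 transactions each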
Running hdparm reported:
A) 920 MB/s   (SCSI 10K)
B) 270 MB/s   (SCSI 10K)
C) 1750 MB/s  (IDE 7.2K)
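
Note that hdparm has two different read tests: -T reads from the Linux buffer
cache (which is the only way figures like 920 MB/s or 1750 MB/s are possible
at all, and mostly reflects memory speed), while -t does buffered reads from
the device itself and is the number that says something about the disk. For
example (the device name is a placeholder):

hdparm -tT /dev/sda   # prints both "cached reads" (-T) and "buffered disk reads" (-t)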
What I don't quite understand is why the P4 3.2GHz is twice as fast as the
dual Xeon with SCSI disks, why the dual Xeon 2GHz is no faster than the dual
Pentium III 1.4GHz, and why the hdparm results don't make much sense either.

Does anybody have an explanation for this? Is there something I can do to get
more performance out of the SCSI disks?

Thanks for any advice
Alex



[PERFORM] How can I make use of both CPUs in a dual processor machine

2005-02-09 Thread Alex
Hi,
We just got a new dual-processor machine, and I wonder if there is a way to
utilize both processors.

Our DB server is basically fully dedicated to Postgres (it's a dual AMD with
4GB of memory).

I have a batch job that periodically loads about 8 million records into a
table. For this I drop the indices, truncate the table, use COPY to insert
the data, recreate the indices (4 of them), and vacuum the table.
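
For reference, the SQL shape of that sequence looks roughly like this (table,
column, index, and file names are invented, and only two of the four indices
are shown):

DROP INDEX big_table_idx1;
DROP INDEX big_table_idx2;
TRUNCATE TABLE big_table;
COPY big_table FROM '/path/to/batch.dat';
CREATE INDEX big_table_idx1 ON big_table (col_a);
CREATE INDEX big_table_idx2 ON big_table (col_b);
VACUUM ANALYZE big_table;   -- refresh planner statistics after the reload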

That is all done through a Perl batch job.
While this runs, I noticed that only one CPU is really used.
So here are my questions:
Is there a way to utilize both CPUs?
Is it possible to split up the import file and run 2 COPY processes?
Is it possible to create 2 indices at the same time?
Would I actually gain anything from that, or is the bottleneck somewhere
else?

(perl is a given here for the batch job)
If anyone has some experience or ideas... any hints or help on this 
would be appreciated.

Thanks
Alex


Re: [PERFORM] How can I make use of both CPUs in a dual processor

2005-02-09 Thread Alex
Thanks, John.
Well, as I mentioned, I have a dual AMD Opteron 64 2.4GHz, 15K rpm SCSI
disks, and 4GB of memory.
The disks are pretty fast and memory should be more than enough. Currently
we don't have many concurrent connections.

I run PG 8.0.1 on Fedora Core 3.
When I run the batch job, one CPU runs at 80-90% while the other stays at
5-10% max.



John A Meinel wrote:
Alex wrote:
Hi,
we just got a new dual processor machine and I wonder if there is a 
way to utilize both processors.

Our DB server is basically fully dedicated to postgres. (its a dual 
amd with 4gb mem.)

I have a batch job that periodically loads about 8 million records 
into a table.
for this I drop the indices, truncate the table, use the copy to 
insert the data, recreate the indices (4 indices), vacuum the table.

That is all done through a perl batch job.
While I am doing this, I noticed that only one CPU is really used.
So here are my questions:
Is there a way to utilize both CPUs
For Postgres, you get a max of 1 CPU per connection, so to use both CPUs
you need at least 2 connections.

Is it possible to split up the import file and run 2 copy processes
Is it possible to create 2 indices at the same time
You'd want to be a little careful. Postgres uses work_mem for vacuum 
and index creation, so if you have 2 processes doing it, just make 
sure you aren't running out of RAM and going to swap.

Would I actually gain anything from that, or is the bottleneck 
somewhere else ?

More likely, the bottleneck would be disk I/O. Simply because it is 
almost always disk I/O. However, without knowing your configuration, 
how much CPU is used during the operation, etc, it's hard to say.

(perl is a given here for the batch job)
If anyone has some experience or ideas... any hints or help on this 
would be appreciated.

Thanks
Alex
Sorry I wasn't a lot of help. You should probably post your postgres 
version, and more information about how much CPU load there is while 
your load is running.

John
=:->



Re: [PERFORM] How can I make use of both CPUs in a dual processor

2005-02-10 Thread Alex
Thanks for all the suggestions. It seems that creating the indices in
parallel, or even importing the data using two COPYs, is easy to implement.
I also have some jobs that create reports and want to see whether I gain
anything by running the reports in parallel.

I will give it a try next week and let you know the results.
Alex
John A Meinel wrote:
Alex wrote:
Thanks John.
Well as I mentioned. I have a  Dual AMD Opteron 64 2.4ghz, 15k rpm 
SCSI Disks, 4GB of memory.
Disks are pretty fast and memory should be more than enough. 
Currently we dont have many concurrent connections.

Well, you didn't mention Opteron before (it makes a difference against 
Xeons).
How many disks and in what configuration?
Do you have pg_xlog on a separate set of disks?
Are your drives in RAID 10 (0+1) or RAID 5?

If you have enough disks the recommended configuration is at least a 
RAID1 for the OS, RAID 10 for pg_xlog (4 drives), and RAID 10 (the rest
of the drives) for the actual data.

If your dataset is read heavy, or you have more than 6 disks, you can 
get away with RAID 5 for the actual data. But since you are talking 
about loading 8million rows at once, it certainly sounds like you are 
write heavy.

If you only have a few disks, it's still probably better to put
pg_xlog on its own RAID1 (2-drive) mirror. pg_xlog is pretty much
append only, so if you dedicate a disk set to it, you eliminate a lot 
of seek times.

I run PG 8.0.1 on Fedora Core 3
When I now run the batch job, one CPU runs in the 80-90% the other in 
5-10% max.

Anyway, it doesn't completely sound like you are CPU limited, but you 
might be able to get a little bit more if you spawn another process. 
Have you tried dropping the index, doing the copy, and then recreating 
the 4-indexes in separate processes?

The simple test for this is to open 3-4 psql connections, have one of 
them drop the indexes and do the copy, in the other connections you 
can already have typed "CREATE INDEX ..." so when the copy is done and 
committed to the database, you just go to the other terminals and hit 
enter.

Unfortunately you'll have to use wall clock time to see if this is 
faster.

Though I think you could do the same thing with a bash script. The 
authentication should be in "trust" mode so that you don't take the 
time to type your password.

#!/bin/bash
psql -h  -c "DROP INDEX ...; COPY FROM ..."
psql -h  -c "CREATE INDEX ..." &
psql -h  -c "CREATE INDEX ..." &
psql -h  -c "CREATE INDEX ..." &
psql -h  -c "CREATE INDEX ..."
Now, I don't really know how to wait for all child processes in a bash 
script (I could give you the python for it, but you're a perl guy). 
But by not spawning the last INDEX, I'm hoping it takes longer than 
the rest. Try to put the most difficult index there.
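
For completeness, bash's built-in wait does exactly that: it blocks until all
backgrounded jobs have exited. A sketch along the same lines (the host and the
index definitions are placeholders, just as in the script above):

#!/bin/bash
psql -h HOST -c "DROP INDEX ...; COPY FROM ..."
psql -h HOST -c "CREATE INDEX ..." &
psql -h HOST -c "CREATE INDEX ..." &
psql -h HOST -c "CREATE INDEX ..." &
psql -h HOST -c "CREATE INDEX ..." &
wait    # returns once every backgrounded CREATE INDEX has finished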

Then you could just run
time loadscript.sh
I'm sure you could do the equivalent in perl. Just open multiple 
connections to the DB, and have them ready.

I'm guessing since you are on a dual processor machine, you won't get 
much better performance above 2 connections.

You can also try doing 2 COPYs at the same time, but it seems like you 
would have issues. Do you have any serial columns that you expect to 
be in a certain order, or is all the information in the copy?

If the latter, try it, let us know what you get. I can't tell you the 
perl for this, since I'm not a perl guy.

John
=:->



Re: [PERFORM] Poor query performance

2009-07-14 Thread Alex
Forgot to add:

postg...@ec2-75-101-128-4:~$ psql --version
psql (PostgreSQL) 8.3.5



[PERFORM] Poor query performance

2009-07-14 Thread Alex
Below is a query that takes 16 seconds on the first run.  I am having
generally poor performance for queries in uncached areas of the data
and often mediocre (500ms-2s+) performance generally, although
sometimes it's very fast.  All the queries are pretty similar and use
the indexes this way.

I've been trying to tune this thing with little luck.  There are about
1.5M records.  It's using the index properly.  Settings are:
work_mem=20MB, shared_buffers=128MB, effective_cache_size=1024MB.

I have run ANALYZE and VACUUM FULL recently.

The whole database is around 16GB.   The server is an EC2 instance
with 5 compute units in two cores (1 unit is roughly one 2GHz processor) and
1.7GB of RAM.

Swapping seems to be minimal.

Note that the EXPLAIN ANALYZE output below is from my slow-query logger, so
the numbers don't match the time the uncached query took.

There are 118K rows in this select.  It is possible the sort is the
issue, but that's why I have 20MB of work_mem.  Do I really need
more than that?

Slow query: (16.852746963501) [0]

SELECT id FROM "source_listings"
WHERE (post_time BETWEEN '2009-07-02 14:19:29.520886' AND '2009-07-11 14:19:29.520930'
   AND ((geo_lon BETWEEN 10879358 AND 10909241 AND geo_lat BETWEEN 13229080 AND 13242719))
   AND city = 'boston')
ORDER BY post_time DESC LIMIT 108 OFFSET 0

Limit  (cost=30396.63..30396.90 rows=108 width=12) (actual time=1044.575..1044.764 rows=108 loops=1)
  ->  Sort  (cost=30396.63..30401.47 rows=1939 width=12) (actual time=1044.573..1044.630 rows=108 loops=1)
        Sort Key: post_time
        Sort Method:  top-N heapsort  Memory: 21kB
        ->  Bitmap Heap Scan on source_listings  (cost=23080.81..30321.44 rows=1939 width=12) (actual time=321.111..952.704 rows=118212 loops=1)
              Recheck Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
              ->  Bitmap Index Scan on sl_city_etc  (cost=0.00..23080.33 rows=1939 width=0) (actual time=309.007..309.007 rows=118212 loops=1)
                    Index Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
Total runtime: 1045.683 ms



Even without the sort, performance is poor:

cribq=# EXPLAIN ANALYZE SELECT count(id) FROM "source_listings"
        WHERE (post_time BETWEEN '2009-07-02 14:19:29.520886' AND '2009-07-11 14:19:29.520930'
           AND ((geo_lon BETWEEN 10879358 AND 10909241 AND geo_lat BETWEEN 13229080 AND 13242719))
           AND city = 'boston');

                               QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=30326.29..30326.30 rows=1 width=4) (actual time=847.967..847.968 rows=1 loops=1)
   ->  Bitmap Heap Scan on source_listings  (cost=23080.81..30321.44 rows=1939 width=4) (actual time=219.505..769.878 rows=118212 loops=1)
         Recheck Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
         ->  Bitmap Index Scan on sl_city_etc  (cost=0.00..23080.33 rows=1939 width=0) (actual time=206.981..206.981 rows=118212 loops=1)
               Index Cond: ((city = 'boston'::text) AND (post_time >= '2009-07-02 14:19:29.520886'::timestamp without time zone) AND (post_time <= '2009-07-11 14:19:29.52093'::timestamp without time zone) AND (geo_lat >= 13229080) AND (geo_lat <= 13242719) AND (geo_lon >= 10879358) AND (geo_lon <= 10909241))
 Total runtime: 848.816 ms



Re: [PERFORM] Poor query performance

2009-07-16 Thread Alex

>
> How is the index  sl_city_etc defined?
>

 Index "public.sl_city_etc"
    Column    |            Type
--------------+-----------------------------
 city         | text
 listing_type | text
 post_time    | timestamp without time zone
 bedrooms     | integer
 region       | text
 geo_lat      | integer
 geo_lon      | integer
btree, for table "public.source_listings"
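
(The concrete suggestions Alex refers to in his reply below are not preserved
in this archive. For illustration only: the query filters on an equality for
city, a range on post_time, and ranges on geo_lat/geo_lon, but never on
listing_type, so only the leading city column of this index constrains the
bitmap index scan. One experiment commonly suggested for this shape of query
is an index whose column order follows the query, for example:

CREATE INDEX sl_city_post_time ON source_listings (city, post_time);

which matches the city equality plus the post_time range and can also feed the
ORDER BY post_time DESC ... LIMIT directly. The index name here is invented.)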



Re: [PERFORM] Poor query performance

2009-07-16 Thread Alex
Thanks.  That's very helpful.  I'll take your suggestions and see if
things improve.





[PERFORM] Strange memory behavior with rails - caching in connection?

2009-07-16 Thread Alex
I am using Postgres with Rails.  Each Rails application "thread" is
actually a separate process (mongrel) with its own connection.

Normally, the db connection processes (?) look something like this in
top:

15772 postgres  15   0  229m  13m  12m S    0  0.8   0:00.09 postgres: db db [local] idle

These quickly grow to 50+ MB per instance as the application is used.

When I restart mongrel (the Rails application processes) these go back
down to their normal small size.  That makes me suspect this is not
normal caching and there is some sort of unhealthy leak going on.

Is there something Rails could be doing to cause these to grow?  Maybe
the connection is not being cleaned up properly?  Is there some sort
of connection cache going on?





[PERFORM] Memory usage of writer process

2009-08-13 Thread Alex
The writer process seems to be using inordinate amounts of memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11088 postgres  13  -2 3217m 2.9g 2.9g S    0 38.7   0:10.46 postgres: writer process
20190 postgres  13  -2 3219m  71m  68m S    0  0.9   0:52.48 postgres: cribq cribq [local] idle

I am writing moderately large (~3k) records to my database a few times
a second.  Even when I stop doing that, the process continues to take
up all of that memory.

Am I reading this right?  Why is it using so much memory?




Re: [PERFORM] Sustained inserts per sec ... ?

2005-04-01 Thread Alex Turner
1250/sec with an average record size of 26 bytes.
800/sec with an average record size of 48 bytes.
250/sec with an average record size of 618 bytes.

Data from pg_stats and our own job monitoring

The system has four partitions: two RAID 1s, a four-disk RAID 10, and a
six-disk RAID 10.
pg_xlog is on the four-disk RAID 10; the database is on RAID 10.

Data is very spread out because database turnover time is very high,
so our performance is about double this with a fresh DB. (the data
half life is probably measurable in days or weeks).

Alex Turner
netEconomist

On Apr 1, 2005 1:06 PM, Marc G. Fournier <[EMAIL PROTECTED]> wrote:
> 
> Just curious, but does anyone have an idea of what we are capable of?  I
> realize that size of record would affect things, as well as hardware, but
> if anyone has some ideas on max, with 'record size', that would be
> appreciated ...
> 
> Thanks ...
> 
> 
> Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
> Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend
>



Re: [PERFORM] Sustained inserts per sec ... ?

2005-04-01 Thread Alex Turner
Oh - this is with a separate transaction per command.

fsync is on.

Alex Turner
netEconomist

On Apr 1, 2005 4:17 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> 1250/sec with record size average is 26 bytes
> 800/sec with record size average is 48 bytes.
> 250/sec with record size average is 618 bytes.
> 
> Data from pg_stats and our own job monitoring
> 
> System has four partitions, two raid 1s, a four disk RAID 10 and a six
> disk RAID 10.
> pg_xlog is on four disk RAID 10, database is on RAID 10.
> 
> Data is very spread out because database turnover time is very high,
> so our performance is about double this with a fresh DB. (the data
> half life is probably measurable in days or weeks).
> 
> Alex Turner
> netEconomist
> 
> On Apr 1, 2005 1:06 PM, Marc G. Fournier <[EMAIL PROTECTED]> wrote:
> >
> > Just curious, but does anyone have an idea of what we are capable of?  I
> > realize that size of record would affect things, as well as hardware, but
> > if anyone has some ideas on max, with 'record size', that would be
> > appreciated ...
> >
> > Thanks ...
> >
> > 
> > Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
> > Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 
> > 7615664
> >
> > ---(end of broadcast)---
> > TIP 8: explain analyze is your friend
> >
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-04 Thread Alex Turner
To be honest, I've yet to run across a SCSI configuration that can
touch the 3ware SATA controllers.  I have yet to see one top 80MB/sec,
let alone 180MB/sec read or write, which is why we moved _away_ from
SCSI.  I've seen Compaq, Dell and LSI controllers all do pathetically
badly on RAID 1, RAID 5 and RAID 10.

35MB/sec for a three-drive RAID 0 is not bad, it's appalling.  The
hardware manufacturer should be publicly embarrassed for this kind of
speed.  A single U320 10K drive can do close to 70MB/sec sustained.

If someone can offer benchmarks to the contrary (particularly in
linux), I would be greatly interested.

Alex Turner
netEconomist

On Mar 29, 2005 8:17 AM, Dave Cramer <[EMAIL PROTECTED]> wrote:
> Yeah, 35Mb per sec is slow for a raid controller, the 3ware mirrored is
> about 50Mb/sec, and striped is about 100
> 
> Dave
> 
> PFC wrote:
> 
> >
> >> With hardware tuning, I am sure we can do better than 35Mb per sec. Also
> >
> >
> > WTF ?
> >
> > My Laptop does 19 MB/s (reading <10 KB files, reiser4) !
> >
> > A recent desktop 7200rpm IDE drive
> > # hdparm -t /dev/hdc1
> > /dev/hdc1:
> >  Timing buffered disk reads:  148 MB in  3.02 seconds =  49.01 MB/sec
> >
> > # ll "DragonBall 001.avi"
> > -r--r--r--1 peufeu   users218M mar  9 20:07 DragonBall
> > 001.avi
> >
> > # time cat "DragonBall 001.avi" >/dev/null
> > real0m4.162s
> > user0m0.020s
> > sys 0m0.510s
> >
> > (the file was not in the cache)
> > => about 52 MB/s (reiser3.6)
> >
> > So, you have a problem with your hardware...
> >
> > ---(end of broadcast)---
> > TIP 7: don't forget to increase your free space map settings
> >
> >
> 
> --
> Dave Cramer
> http://www.postgresintl.com
> 519 939 0336
> ICQ#14675561
> 
> ---(end of broadcast)---
> TIP 6: Have you searched our list archives?
> 
>http://archives.postgresql.org
>



Re: [PERFORM] Sustained inserts per sec ... ?

2005-04-04 Thread Alex Turner
Yup, Battery backed, cache enabled.  6 drive RAID 10, and 4 drive RAID
10, and 2xRAID 1.

It's a 3ware 9500S-8MI - not bad for $450 plus BBU.

Alex Turner
netEconomist

On Apr 1, 2005 6:03 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Alex Turner <[EMAIL PROTECTED]> writes:
> > On Apr 1, 2005 4:17 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> >> 1250/sec with record size average is 26 bytes
> >> 800/sec with record size average is 48 bytes.
> >> 250/sec with record size average is 618 bytes.
> 
> > Oh - this is with a seperate transaction per command.
> > fsync is on.
> 
> [ raised eyebrow... ]  What kind of disk hardware is that exactly, and
> does it have write cache enabled?  It's hard to believe those numbers
> if not.
> 
> Write caching is fine if it's done in a battery-backed cache, which you
> can get in the higher-end hardware RAID controllers.  Otherwise you're
> going to have problems whenever the power goes away unexpectedly.
> 
> regards, tom lane
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-04 Thread Alex Turner
I'm no drive expert, but it seems to me that our write performance is
excellent.  I think what most are concerned about is OLTP where you
are doing heavy write _and_ heavy read performance at the same time.

Our system is mostly read during the day, but we do a full system
update every night that is all writes, and it's very fast compared to
the smaller SCSI system we moved off of.  Nearly a 6x speed
improvement, as fast as 900 rows/sec with a 48-byte record, one row
per transaction.

I don't know enough about how SATA works to really comment on its
performance as a protocol compared with SCSI.  If anyone has a useful
link on that, it would be greatly appreciated.

More drives will give more throughput/sec, but not necessarily more
transactions/sec.  For that you will need more RAM on the controller,
and definitely a BBU to keep your data safe.

Alex Turner
netEconomist

On Apr 4, 2005 10:39 AM, Steve Poe <[EMAIL PROTECTED]> wrote:
> 
> 
> Alex Turner wrote:
> 
> >To be honest, I've yet to run across a SCSI configuration that can
> >touch the 3ware SATA controllers.  I have yet to see one top 80MB/sec,
> >let alone 180MB/sec read or write, which is why we moved _away_ from
> >SCSI.  I've seen Compaq, Dell and LSI controllers all do pathetically
> >badly on RAID 1, RAID 5 and RAID 10.
> >
> >
> Alex,
> 
> How does the 3ware controller do in heavy writes back to the database?
> It may have been Josh, but someone said that SATA does well with reads
> but not writes. Would not equal amount of SCSI drives outperform SATA?
> I don't want to start a "whose better" war, I am just trying to learn
> here. It would seem the more  drives you could place in a RAID
> configuration, the performance would increase.
> 
> Steve Poe
> 
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-04 Thread Alex Turner
I'm doing some research on SATA vs SCSI right now, but to be honest I'm not
turning up much at the protocol level.  A lot of stupid benchmarks compare
10K Raptor drives against top-of-the-line 15K drives, where unsurprisingly
the SCSI drives win but of course cost 4 times as much.  Although even in
some of those, SATA wins or draws.  I'm trying to find something more
apples-to-apples: 10K to 10K.

Alex Turner
netEconomist



On Apr 4, 2005 3:23 PM, Vivek Khera <[EMAIL PROTECTED]> wrote:
> 
> On Apr 4, 2005, at 3:12 PM, Alex Turner wrote:
> 
> > Our system is mostly read during the day, but we do a full system
> > update everynight that is all writes, and it's very fast compared to
> > the smaller SCSI system we moved off of.  Nearly a 6x spead
> > improvement, as fast as 900 rows/sec with a 48 byte record, one row
> > per transaction.
> >
> 
> Well, if you're not heavily multitasking, the advantage of SCSI is lost
> on you.
> 
> Vivek Khera, Ph.D.
> +1-301-869-4449 x806
> 
> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-06 Thread Alex Turner
It's hardly the same money; the drives are twice as much.

It's all about the controller, baby, with any kind of drive.  A bad SCSI
controller will give sucky performance too, believe me.  We had a
Compaq Smart Array 5304, and its performance was _very_ sub-par.

If someone has a simple benchmark test database to run, I would be
happy to run it on our hardware here.

Alex Turner

On Apr 6, 2005 3:30 AM, William Yu <[EMAIL PROTECTED]> wrote:
> Alex Turner wrote:
> > I'm no drive expert, but it seems to me that our write performance is
> > excellent.  I think what most are concerned about is OLTP where you
> > are doing heavy write _and_ heavy read performance at the same time.
> >
> > Our system is mostly read during the day, but we do a full system
> > update everynight that is all writes, and it's very fast compared to
> > the smaller SCSI system we moved off of.  Nearly a 6x spead
> > improvement, as fast as 900 rows/sec with a 48 byte record, one row
> > per transaction.
> 
> I've started with SATA in a multi-read/multi-write environment. While it
> ran pretty good with 1 thread writing, the addition of a 2nd thread
> (whether reading or writing) would cause exponential slowdowns.
> 
> I suffered through this for a week and then switched to SCSI. Single
> threaded performance was pretty similar but with the advanced command
> queueing SCSI has, I was able to do multiple reads/writes simultaneously
> with only a small performance hit for each thread.
> 
> Perhaps having a SATA caching raid controller might help this situation.
> I don't know. It's pretty hard justifying buying a $$$ 3ware controller
> just to test it when you could spend the same money on SCSI and have a
> guarantee it'll work good under multi-IO scenarios.
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend
>



Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application ?

2005-04-06 Thread Alex Turner
I think everyone was scared off by the 5000 inserts per second number.

I've never seen even Oracle do this on a top end Dell system with
copious SCSI attached storage.

Alex Turner
netEconomist

On Apr 6, 2005 3:17 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>  
> Unfortunately. 
>  
> But we are in the the process to choose Postgresql with pgcluster. I'm
> currently running some tests (performance, stability...) 
> Save the money on the license fees, you get it for your hardware ;-) 
>  
> I still welcome any advices or comments and I'll let you know how the
> project is going on. 
>  
> Benjamin. 
>  
>  
>  
>  "Mohan, Ross" <[EMAIL PROTECTED]> 
> 
> 05/04/2005 20:48 
>  
> To:  <[EMAIL PROTECTED]>
> Cc:
> Subject:  RE: [PERFORM] Postgresql vs SQLserver for this
> application ?
>  
>  
> You never got answers on this? Apologies, I don't have one, but'd be curious
> to hear about any you did get 
>   
> thx 
>   
> Ross 
> 
> -Original Message-
>  From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf
> Of [EMAIL PROTECTED]
>  Sent: Monday, April 04, 2005 4:02 AM
>  To: pgsql-performance@postgresql.org
>  Subject: [PERFORM] Postgresql vs SQLserver for this application ?
>  
> 
>  hi all. 
>  
>  We are designing a quite big application that requires a high-performance
> database backend. 
>  The rates we need to obtain are at least  5000 inserts per second and 15
> selects per second for one connection. There should only be 3 or 4
> simultaneous connections. 
>  I think our main concern is to deal with the constant flow of data coming
> from the inserts that must be available for selection as fast as possible.
> (kind of real time access ...) 
>  
>  As a consequence, the database should rapidly increase up to more than one
> hundred gigs. We still have to determine how and when we shoud backup old
> data to prevent the application from a performance drop. We intend to
> develop some kind of real-time partionning on our main table keep the flows
> up. 
>  
>  At first, we were planning to use SQL Server as it has features that in my
> opinion could help us a lot : 
> - replication 
> - clustering 
>  
>  Recently we started to study Postgresql as a solution for our project : 
> - it also has replication 
> - Postgis module can handle geographic datatypes (which would
> facilitate our developments) 
> - We do have a strong knowledge on Postgresql administration (we use
> it for production processes) 
> - it is free (!) and we could save money for hardware purchase. 
>  
>  Is SQL server clustering a real asset ? How reliable are Postgresql
> replication tools  ? Should I trust Postgresql performance for this kind of
> needs ? 
>  
>  My question is a bit fuzzy but any advices are most welcome...
> hardware,tuning or design tips as well :)) 
>  
>  Thanks a lot. 
>  
>  Benjamin. 
>  
>  
>



Re: RE : RE: [PERFORM] Postgresql vs SQLserver for thisapplication ?

2005-04-06 Thread Alex Turner
I guess I was thinking more in the range of 5000 transactions/sec, not 5000
rows/sec on a bulk import...

Alex

On Apr 6, 2005 12:47 PM, Mohan, Ross <[EMAIL PROTECTED]> wrote:
> 
> 
> 31Million tuples were loaded in approx 279 seconds, or approx 112k rows per 
> second.
> 
> > I'd love to see PG get into this range..i am a big fan of PG (just a
> > rank newbie) but I gotta think the underlying code to do this has to
> > be not-too-complex.
> 
> I'd say we're there.
> 
> ||Yes! PG is there, assuredly!   So VERY cool!  I made a 
> newbie
> error of conflating COPY with INSERT. I don't know if I could get
> oracle to do much more than about 500-1500 rows/sec...PG is quite 
> impressive.
> 
> Makes one wonder why corporations positively insist on giving oracle
>  yearly. 
> 
> -Original Message-
> From: Rod Taylor [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 06, 2005 12:41 PM
> To: Mohan, Ross
> Cc: pgsql-performance@postgresql.org
> Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for thisapplication ?
> 
> On Wed, 2005-04-06 at 16:12 +, Mohan, Ross wrote:
> > I wish I had a Dell system and run case to show you Alex, but I
> > don't... however...using Oracle's "direct path" feature, it's pretty
> > straightforward.
> >
> > We've done 110,000 rows per second into index-less tables on a big
> > system (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a
> > second. Sustained for almost 9 minutes. )
> 
> Just for kicks I did a local test on a desktop machine (single CPU, single 
> IDE drive) using COPY from STDIN for a set of integers in via a single 
> transaction, no indexes.
> 
> 1572864 tuples were loaded in 13715.613ms, which is approx 115k rows per 
> second.
> 
> Okay, no checkpoints and I didn't cross an index boundary, but I also haven't 
> tuned the config file beyond bumping up the buffers.
> 
> Lets try again with more data this time.
> 
> 31Million tuples were loaded in approx 279 seconds, or approx 112k rows per 
> second.
> 
> > I'd love to see PG get into this range..i am a big fan of PG (just a
> > rank newbie) but I gotta think the underlying code to do this has to
> > be not-too-complex.
> 
> I'd say we're there.
> 
> > -Original Message-
> > From: Alex Turner [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, April 06, 2005 11:38 AM
> > To: [EMAIL PROTECTED]
> > Cc: pgsql-performance@postgresql.org; Mohan, Ross
> > Subject: Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this 
> > application ?
> >
> >
> > I think everyone was scared off by the 5000 inserts per second number.
> >
> > I've never seen even Oracle do this on a top end Dell system with
> > copious SCSI attached storage.
> >
> > Alex Turner
> > netEconomist
> >
> > On Apr 6, 2005 3:17 AM, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > >
> > > Unfortunately.
> > >
> > > But we are in the the process to choose Postgresql with pgcluster.
> > > I'm
> > > currently running some tests (performance, stability...) Save the
> > > money on the license fees, you get it for your hardware ;-)
> > >
> > > I still welcome any advices or comments and I'll let you know how
> > > the
> > > project is going on.
> > >
> > > Benjamin.
> > >
> > >
> > >
> > >  "Mohan, Ross" <[EMAIL PROTECTED]>
> > >
> > > 05/04/2005 20:48
> > >
> > > Pour :<[EMAIL PROTECTED]>
> > > cc :
> > > Objet :RE: [PERFORM] Postgresql vs SQLserver for this
> > > application ?
> > >
> > >
> > > You never got answers on this? Apologies, I don't have one, but'd be
> > > curious to hear about any you did get
> > >
> > > thx
> > >
> > > Ross
> > >
> > > -Original Message-
> > >  From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of
> > > [EMAIL PROTECTED]
> > >  Sent: Monday, April 04, 2005 4:02 AM
> > >  To: pgsql-performance@postgresql.org
> > >  Subject: [PERFORM] Postgresql vs SQLserver for this application ?
> > >
> > >
> > >  hi all.
> > >
> > >  We are designing a quite big application that requires a
> > > high-performance database backend.  The rates we need to obtain are

Re: [PERFORM] Réf

2005-04-06 Thread Alex Turner
I think his point was that 900 * 4 != 2400

Alex Turner
netEconomist

On Apr 6, 2005 2:23 PM, Rod Taylor <[EMAIL PROTECTED]> wrote:
> On Wed, 2005-04-06 at 19:42 +0200, Steinar H. Gunderson wrote:
> > On Wed, Apr 06, 2005 at 01:18:29PM -0400, Rod Taylor wrote:
> > > Yeah, I think that can be done provided there is more than one worker.
> > > My limit seems to be about 1000 transactions per second each with a
> > > single insert for a single process (round trip time down the Fibre
> > > Channel is large) but running 4 simultaneously only drops throughput to
> > > about 900 per process (total of 2400 transactions per second) and the
> > > machine still seemed to have lots of oomph to spare.
> >
> > Erm, have I missed something here? 900 * 4 = 2400?
> 
> Nope. You've not missed anything.
> 
> If I ran 10 processes and the requirement would be met.
> --
> 
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-06 Thread Alex Turner
Well - unfortunately software RAID isn't appropriate for everyone, and
some of us need a hardware RAID controller.  The LSI MegaRAID 320-2
card is almost exactly the same price as the 3ware 9500S-12 card
(although I will concede that a 320-2 card can handle at most 2x14
devices compared with the 12 on the 9500S).

If someone can come up with a test, I will be happy to run it and see
how it goes.  I would be _very_ interested in the results having just
spent $7k on a new DB server!!

I have also seen really bad performance out of SATA.  It was with
either an on-board controller, or a cheap RAID controller from
HighPoint.  As soon as I put in a decent controller, things went much
better.  I think it's unfair to base your opinion of SATA on a test
that had a poor controller.

I know I'm not the only one here running SATA RAID and being very
satisfied with the results.

Thanks,

Alex Turner
netEconomist

On Apr 6, 2005 4:01 PM, William Yu <[EMAIL PROTECTED]> wrote:
> It's the same money if you factor in the 3ware controller. Even without
> a caching controller, SCSI works good in multi-threaded IO (not
> withstanding crappy shit from Dell or Compaq). You can get such cards
> from LSI for $75. And of course, many server MBs come with LSI
> controllers built-in. Our older 32-bit production servers all use Linux
> software RAID w/ SCSI and there's no issues when multiple
> users/processes hit the DB.
> 
> *Maybe* a 3ware controller w/ onboard cache + battery backup might do
> much better for multi-threaded IO than just plain-jane SATA.
> Unfortunately, I have not been able to find anything online that can
> confirm or deny this. Hence, the choice is spend $$$ on the 3ware
> controller and hope it meets your needs -- or spend $$$ on SCSI drives
> and be sure.
> 
> Now if you want to run such tests, we'd all be delighted with to see the
> results so we have another option for building servers.
> 
> 
> Alex Turner wrote:
> > It's hardly the same money, the drives are twice as much.
> >
> > It's all about the controller baby with any kind of dive.  A bad SCSI
> > controller will give sucky performance too, believe me.  We had a
> > Compaq Smart Array 5304, and it's performance was _very_ sub par.
> >
> > If someone has a simple benchmark test database to run, I would be
> > happy to run it on our hardware here.
> >
> > Alex Turner
> >
> > On Apr 6, 2005 3:30 AM, William Yu <[EMAIL PROTECTED]> wrote:
> >
> >>Alex Turner wrote:
> >>
> >>>I'm no drive expert, but it seems to me that our write performance is
> >>>excellent.  I think what most are concerned about is OLTP where you
> >>>are doing heavy write _and_ heavy read performance at the same time.
> >>>
> >>>Our system is mostly read during the day, but we do a full system
> >>>update everynight that is all writes, and it's very fast compared to
> >>>the smaller SCSI system we moved off of.  Nearly a 6x spead
> >>>improvement, as fast as 900 rows/sec with a 48 byte record, one row
> >>>per transaction.
> >>
> >>I've started with SATA in a multi-read/multi-write environment. While it
> >>ran pretty good with 1 thread writing, the addition of a 2nd thread
> >>(whether reading or writing) would cause exponential slowdowns.
> >>
> >>I suffered through this for a week and then switched to SCSI. Single
> >>threaded performance was pretty similar but with the advanced command
> >>queueing SCSI has, I was able to do multiple reads/writes simultaneously
> >>with only a small performance hit for each thread.
> >>
> >>Perhaps having a SATA caching raid controller might help this situation.
> >>I don't know. It's pretty hard justifying buying a $$$ 3ware controller
> >>just to test it when you could spend the same money on SCSI and have a
> >>guarantee it'll work good under multi-IO scenarios.
> >>
> >>---(end of broadcast)---
> >>TIP 8: explain analyze is your friend
> >>
> >
> >
> > ---(end of broadcast)---
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
> >
> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-06 Thread Alex Turner
I guess I'm setting myself up here, and I'm really not being ignorant,
but can someone explain exactly how SCSI is supposed to be better than
SATA?

Both systems use drives with platters.  Each drive can physically only
read one thing at a time.

SATA gives each drive its own channel, but you have to share in SCSI.
 A SATA controller typically can do 3Gb/sec (384MB/sec) per drive, but
SCSI can only do 320MB/sec across the entire array.

What am I missing here?

Alex Turner
netEconomist

On Apr 6, 2005 5:41 PM, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> Sorry if I'm pointing out the obvious here, but it seems worth
> mentioning. AFAIK all 3ware controllers are setup so that each SATA
> drive gets it's own SATA bus. My understanding is that by and large,
> SATA still suffers from a general inability to have multiple outstanding
> commands on the bus at once, unlike SCSI. Therefore, to get good
> performance out of SATA you need to have a seperate bus for each drive.
> Theoretically, it shouldn't really matter that it's SATA over ATA, other
> than I certainly wouldn't want to try and cram 8 ATA cables into a
> machine...
> 
> Incidentally, when we were investigating storage options at a previous
> job we talked to someone who deals with RS/6000 storage. He had a bunch
> of info about their serial controller protocol (which I can't think of
> the name of) vs SCSI. SCSI had a lot more overhead, so you could end up
> saturating even a 160MB SCSI bus with only 2 or 3 drives.
> 
> People are finally realizing how important bandwidth has become in
> modern machines. Memory bandwidth is why RS/6000 was (and maybe still
> is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of
> the water. Likewise it's why SCSI is so much better than IDE (unless you
> just give each drive it's own dedicated bandwidth).
> --
> Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
> Give your computer some brain candy! www.distributed.net Team #1828
> 
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-06 Thread Alex Turner
Ok - so I found this fairly good online review of various SATA cards
out there, with 3ware not doing too hot on RAID 5, but ok on RAID 10.

http://www.tweakers.net/reviews/557/

Very interesting stuff.

Alex Turner
netEconomist

On Apr 6, 2005 7:32 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> I guess I'm setting myself up here, and I'm really not being ignorant,
> but can someone explain exactly how is SCSI is supposed to better than
> SATA?
> 
> Both systems use drives with platters.  Each drive can physically only
> read one thing at a time.
> 
> SATA gives each drive it's own channel, but you have to share in SCSI.
>  A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but
> SCSI can only do 320MB/sec across the entire array.
> 
> What am I missing here?
> 
> Alex Turner
> netEconomist
> 
> On Apr 6, 2005 5:41 PM, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> > Sorry if I'm pointing out the obvious here, but it seems worth
> > mentioning. AFAIK all 3ware controllers are setup so that each SATA
> > drive gets it's own SATA bus. My understanding is that by and large,
> > SATA still suffers from a general inability to have multiple outstanding
> > commands on the bus at once, unlike SCSI. Therefore, to get good
> > performance out of SATA you need to have a seperate bus for each drive.
> > Theoretically, it shouldn't really matter that it's SATA over ATA, other
> > than I certainly wouldn't want to try and cram 8 ATA cables into a
> > machine...
> >
> > Incidentally, when we were investigating storage options at a previous
> > job we talked to someone who deals with RS/6000 storage. He had a bunch
> > of info about their serial controller protocol (which I can't think of
> > the name of) vs SCSI. SCSI had a lot more overhead, so you could end up
> > saturating even a 160MB SCSI bus with only 2 or 3 drives.
> >
> > People are finally realizing how important bandwidth has become in
> > modern machines. Memory bandwidth is why RS/6000 was (and maybe still
> > is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of
> > the water. Likewise it's why SCSI is so much better than IDE (unless you
> > just give each drive it's own dedicated bandwidth).
> > --
> > Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
> > Give your computer some brain candy! www.distributed.net Team #1828
> >
> > Windows: "Where do you want to go today?"
> > Linux: "Where do you want to go tomorrow?"
> > FreeBSD: "Are you guys coming, or what?"
> >
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-06 Thread Alex Turner
Ok - I take it back - I'm reading through this now, and realising that
the reviews are pretty clueless in several places...


On Apr 6, 2005 8:12 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> Ok - so I found this fairly good online review of various SATA cards
> out there, with 3ware not doing too hot on RAID 5, but ok on RAID 10.
> 
> http://www.tweakers.net/reviews/557/
> 
> Very interesting stuff.
> 
> Alex Turner
> netEconomist
> 
> On Apr 6, 2005 7:32 PM, Alex Turner <[EMAIL PROTECTED]> wrote:
> > I guess I'm setting myself up here, and I'm really not being ignorant,
> > but can someone explain exactly how is SCSI is supposed to better than
> > SATA?
> >
> > Both systems use drives with platters.  Each drive can physically only
> > read one thing at a time.
> >
> > SATA gives each drive it's own channel, but you have to share in SCSI.
> >  A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but
> > SCSI can only do 320MB/sec across the entire array.
> >
> > What am I missing here?
> >
> > Alex Turner
> > netEconomist
> >
> > On Apr 6, 2005 5:41 PM, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> > > Sorry if I'm pointing out the obvious here, but it seems worth
> > > mentioning. AFAIK all 3ware controllers are setup so that each SATA
> > > drive gets it's own SATA bus. My understanding is that by and large,
> > > SATA still suffers from a general inability to have multiple outstanding
> > > commands on the bus at once, unlike SCSI. Therefore, to get good
> > > performance out of SATA you need to have a seperate bus for each drive.
> > > Theoretically, it shouldn't really matter that it's SATA over ATA, other
> > > than I certainly wouldn't want to try and cram 8 ATA cables into a
> > > machine...
> > >
> > > Incidentally, when we were investigating storage options at a previous
> > > job we talked to someone who deals with RS/6000 storage. He had a bunch
> > > of info about their serial controller protocol (which I can't think of
> > > the name of) vs SCSI. SCSI had a lot more overhead, so you could end up
> > > saturating even a 160MB SCSI bus with only 2 or 3 drives.
> > >
> > > People are finally realizing how important bandwidth has become in
> > > modern machines. Memory bandwidth is why RS/6000 was (and maybe still
> > > is) cleaning Sun's clock, and it's why the Opteron blows Itaniums out of
> > > the water. Likewise it's why SCSI is so much better than IDE (unless you
> > > just give each drive it's own dedicated bandwidth).
> > > --
> > > Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
> > > Give your computer some brain candy! www.distributed.net Team #1828
> > >
> > > Windows: "Where do you want to go today?"
> > > Linux: "Where do you want to go tomorrow?"
> > > FreeBSD: "Are you guys coming, or what?"
> > >
> >
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-06 Thread Alex Turner
Yeah - the more reading I'm doing - the more I'm finding out.

Allegedly the Western Digital Raptor drives implement a version of
ATA-4 Tagged Queuing which allows reordering of commands.  Some
controllers support this.  The 3ware docs say that the controller
supports reordering both on the controller and on the drive. *shrug*

This is all supposed to go away with SATA II, which has NCQ
(Native Command Queuing).  Of course the 3ware controllers don't
support SATA II, but a few others do, and I'm sure 3ware will come out
with a controller that does.

Alex Turner
netEconomist

On 06 Apr 2005 23:00:54 -0400, Greg Stark <[EMAIL PROTECTED]> wrote:
> 
> Alex Turner <[EMAIL PROTECTED]> writes:
> 
> > SATA gives each drive it's own channel, but you have to share in SCSI.
> >  A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but
> > SCSI can only do 320MB/sec across the entire array.
> 
> SCSI controllers often have separate channels for each device too.
> 
> In any case the issue with the IDE protocol is that fundamentally you can only
> have a single command pending. SCSI can have many commands pending. This is
> especially important for a database like postgres that may be busy committing
> one transaction while another is trying to read. Having several commands
> queued on the drive gives it a chance to execute any that are "on the way" to
> the committing transaction.
> 
> However I'm under the impression that 3ware has largely solved this problem.
> Also, if you save a few dollars and can afford one additional drive that
> additional drive may improve your array speed enough to overcome that
> inefficiency.
> 
> --
> greg
> 
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-07 Thread Alex Turner
Based on the reading I'm doing, and somebody please correct me if I'm
wrong, it seems that SCSI drives contain an on-disk controller that
has to process the tagged queue.  SATA-I doesn't have this.  This
additional controller is basically an on-board computer that figures
out the best order in which to process commands.  I believe you are
also paying for the increased tolerances that generate better speed.
If you compare an 80GB 7200RPM IDE drive to a WD Raptor 76GB 10K RPM
to a Seagate 10K.6 drive to a Seagate Cheetah 15K drive, each one
represents a step up in parts and technology, thereby generating a
cost increase (at least that's what the manufacturers tell us).  I know
if you have ever held a 15K drive in your hand, you can notice a
considerable weight difference between it and a 7200RPM IDE drive.

Alex Turner
netEconomist

On Apr 7, 2005 11:37 AM, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:
> Another simple question: Why is SCSI more expensive?  After the
> eleventy-millionth controller is made, it seems like SCSI and SATA are
> using a controller board and a spinning disk.  Is somebody still making
> money by licensing SCSI technology?
> 
> Rick
> 
> [EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM:
> 
> > You asked for it!  ;-)
> >
> > If you want cheap, get SATA.  If you want fast under
> > *load* conditions, get SCSI.  Everything else at this
> > time is marketing hype, either intentional or learned.
> > Ignoring dollars, expect to see SCSI beat SATA by 40%.
> >
> >  * * * What I tell you three times is true * * *
> >
> > Also, compare the warranty you get with any SATA
> > drive with any SCSI drive.  Yes, you still have some
> > change leftover to buy more SATA drives when they
> > fail, but... it fundamentally comes down to some
> > actual implementation and not what is printed on
> > the cardboard box.  Disk systems are bound by the
> > rules of queueing theory.  You can hit the sales rep
> > over the head with your queueing theory book.
> >
> > Ultra320 SCSI is king of the hill for high concurrency
> > databases.  If you're only streaming or serving files,
> > save some money and get a bunch of SATA drives.
> > But if you're reading/writing all over the disk, the
> > simple first-come-first-serve SATA heuristic will
> > hose your performance under load conditions.
> >
> > Next year, they will *try* bring out some SATA cards
> > that improve on first-come-first-serve, but they ain't
> > here now.  There are a lot of rigged performance tests
> > out there...  Maybe by the time they fix the queueing
> > problems, serial Attached SCSI (a/k/a SAS) will be out.
> > Looks like Ultra320 is the end of the line for parallel
> > SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the
> > water.
> >
> > Ultra320 SCSI.
> > Ultra320 SCSI.
> > Ultra320 SCSI.
> >
> > Serial Attached SCSI.
> > Serial Attached SCSI.
> > Serial Attached SCSI.
> >
> > For future trends, see:
> > http://www.incits.org/archive/2003/in031163/in031163.htm
> >
> > douglas
> >
> > p.s. For extra credit, try comparing SATA and SCSI drives
> > when they're 90% full.
> >
> > On Apr 6, 2005, at 8:32 PM, Alex Turner wrote:
> >
> > > I guess I'm setting myself up here, and I'm really not being ignorant,
> > > but can someone explain exactly how is SCSI is supposed to better than
> > > SATA?
> > >
> > > Both systems use drives with platters.  Each drive can physically only
> > > read one thing at a time.
> > >
> > > SATA gives each drive it's own channel, but you have to share in SCSI.
> > >  A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but
> > > SCSI can only do 320MB/sec across the entire array.
> > >
> > > What am I missing here?
> > >
> > > Alex Turner
> > > netEconomist
> >
> >
> > ---(end of broadcast)---
> > TIP 9: the planner will ignore your desire to choose an index scan if
> your
> >   joining column's datatypes do not match
> 
>



Re: [PERFORM] profiling postgresql queries?

2005-04-12 Thread Alex Turner
Speaking of triggers...

Is there any plan to speed up plpgsql triggers?  Fairly simple
cross-table insert triggers seem to slow my inserts to a crawl.
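
For context, here is a minimal sketch of the kind of cross-table insert
trigger being described; the table and column names are invented:

CREATE OR REPLACE FUNCTION orders_audit() RETURNS trigger AS $$
BEGIN
    -- copy a couple of fields from the new row into a second table
    INSERT INTO order_audit (order_id, created_at) VALUES (NEW.id, now());
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_audit_trg AFTER INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_audit();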

Is the best thing just to write the triggers in C? (I really don't want to
put this stuff in the application logic, because it really doesn't belong
there.)

Alex Turner
netEconomist

On Apr 12, 2005 10:10 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> hubert lubaczewski <[EMAIL PROTECTED]> writes:
> > and it made me wonder - is there a way to tell how much time of backend
> > was spent on triggers, index updates and so on?
> 
> In CVS tip, EXPLAIN ANALYZE will break out the time spent in each
> trigger.  This is not in any released version, but if you're desperate
> you could load up a play server with your data and test.
> 
> regards, tom lane
> 
> ---(end of broadcast)---
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
>



Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-14 Thread Alex Turner
I have read a large chunk of this, and I would highly recommend it to
anyone who has been participating in the drive discussions.  It is
most informative!!

Alex Turner
netEconomist

On 4/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Greg,
> 
> I posted this link under a different thread (the $7k server thread).  It is
> a very good read on why SCSI is better for servers than ATA.  I didn't note
> bias, though it is from a drive manufacturer.  YMMV.  There is an
> interesting, though dated appendix on different manufacturers' drive
> characteristics.
> 
> http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf
> 
> Enjoy,
> 
> Rick
> 
> [EMAIL PROTECTED] wrote on 04/14/2005 09:54:45 AM:
> 
> >
> > Our vendor is trying to sell us on an Intel SRCS16 SATA raid controller
> > instead of the 3ware one.
> >
> > Poking around it seems this does come with Linux drivers and there is a
> > battery backup option. So it doesn't seem to be completely insane.
> >
> > Anyone have any experience with these controllers?
> >
> > I'm also wondering about whether I'm better off with one of these SATA
> raid
> > controllers or just going with SCSI drives.
> >
> > --
> > greg
> >
> >
> > ---(end of broadcast)---
> > TIP 8: explain analyze is your friend
> 
> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
>   joining column's datatypes do not match
>



Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-14 Thread Alex Turner
I have put together a little head-to-head performance comparison of a
15k SCSI, a 10k SCSI, a 10K SATA w/TCQ, a 10K SATA wo/TCQ and a 7.2K
SATA drive at Storage Review

http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devID_0=232&devID_1=40&devID_2=259&devID_3=267&devID_4=261&devID_5=248&devCnt=6

It does illustrate some of the weaknesses of SATA drives, but all in
all the Raptor drives put on a good show.

Alex Turner
netEconomist

On 4/14/05, Alex Turner <[EMAIL PROTECTED]> wrote:
> I have read a large chunk of this, and I would highly recommend it to
> anyone who has been participating in the drive discussions.  It is
> most informative!!
> 
> Alex Turner
> netEconomist
> 
> On 4/14/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > Greg,
> >
> > I posted this link under a different thread (the $7k server thread).  It is
> > a very good read on why SCSI is better for servers than ATA.  I didn't note
> > bias, though it is from a drive manufacturer.  YMMV.  There is an
> > interesting, though dated appendix on different manufacturers' drive
> > characteristics.
> >
> > http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf
> >
> > Enjoy,
> >
> > Rick
> >
> > [EMAIL PROTECTED] wrote on 04/14/2005 09:54:45 AM:
> >
> > >
> > > Our vendor is trying to sell us on an Intel SRCS16 SATA raid controller
> > > instead of the 3ware one.
> > >
> > > Poking around it seems this does come with Linux drivers and there is a
> > > battery backup option. So it doesn't seem to be completely insane.
> > >
> > > Anyone have any experience with these controllers?
> > >
> > > I'm also wondering about whether I'm better off with one of these SATA
> > raid
> > > controllers or just going with SCSI drives.
> > >
> > > --
> > > greg
> > >
> > >
> > > ---(end of broadcast)---
> > > TIP 8: explain analyze is your friend
> >
> > ---(end of broadcast)---
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> >   joining column's datatypes do not match
> >
>



Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-14 Thread Alex Turner
Just to clarify: these are tests from http://www.storagereview.com, not
my own.  I guess they couldn't get numbers for those parts.  I think
everyone understands that a 0ms seek time is impossible, and indicates a
missing data point.

Thanks,

Alex Turner
netEconomist

On 4/14/05, Dave Held <[EMAIL PROTECTED]> wrote:
> > -Original Message-----
> > From: Alex Turner [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, April 14, 2005 12:14 PM
> > To: [EMAIL PROTECTED]
> > Cc: Greg Stark; pgsql-performance@postgresql.org;
> > [EMAIL PROTECTED]
> > Subject: Re: [PERFORM] Intel SRCS16 SATA raid?
> >
> >
> > I have put together a little head to head performance of a 15k SCSI,
> > 10k SCSI 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive
> > comparison at storage review
> >
> > http://www.storagereview.com/php/benchmark/compare_rtg_2001.ph
> > p?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devI
> > D_0=232&devID_1=40&devID_2=259&devID_3=267&devID_4=261&devID_5
> > =248&devCnt=6
> >
> > It does illustrate some of the weaknesses of SATA drives, but all in
> > all the Raptor drives put on a good show.
> > [...]
> 
> I think it's a little misleading that your tests show 0ms seek times
> for some of the write tests.  The environmental test also selects a
> missing data point as the winner.  Besides that, it seems to me that
> seek time is one of the most important features for a DB server, which
> means that the SCSI drives are the clear winners and the non-WD SATA
> drives are the embarrassing losers.  Transfer rate is import, but
> perhaps less so because DBs tend to read/write small blocks rather
> than large files.  On the server suite, which seems to me to be the
> most relevant for DBs, the Atlas 15k spanks the other drives by a
> fairly large margin (especially the lesser SATA drives).  When you
> ignore the "consumer app" benchmarks, I wouldn't be so confident in
> saying that the Raptors "put on a good show".
> 
> __
> David B. Held
> Software Engineer/Array Services Group
> 200 14th Ave. East,  Sartell, MN 56377
> 320.534.3637 320.253.7800 800.752.8129
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend
>



Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-14 Thread Alex Turner
Looking at the numbers, the raptor with TCQ enabled was close or beat
the Atlas III 10k drive on most benchmarks.

Naturally a 15k drive is going to be faster in many areas, but it is
also much more expensive.  It was only 44% better on the server tests
than the Raptor with TCQ, but it costs nearly 300% more ($538 at
cdw.com vs. $180 at newegg.com).  Note also that the 15k drive was the
only drive that kept up with the Raptor on raw transfer speed, which is
going to matter for WAL.

For those of us on a budget, a quality controller card with lots of
RAM is going to be our biggest friend because it can cache writes, and
improve performance.  The 3ware controllers seem to be universally
benchmarked as the best SATA RAID 10 controllers where database
performance is concerned.  Even the crappy tweakers.net review had the
3ware as the fastest controller for a MySQL data partition in RAID 10.

The Raptor drives can be had for as little as $180/ea, which is quite
a good price point considering they can keep up with their SCSI 10k
RPM counterparts on almost all tests with NCQ enabled (Note that 3ware
controllers _don't_ support NCQ, although they claim their HBA based
queueing is 95% as good as NCQ on the drive).

Alex Turner
netEconomist

On 4/14/05, Dave Held <[EMAIL PROTECTED]> wrote:
> > -Original Message-----
> > From: Alex Turner [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, April 14, 2005 12:14 PM
> > To: [EMAIL PROTECTED]
> > Cc: Greg Stark; pgsql-performance@postgresql.org;
> > [EMAIL PROTECTED]
> > Subject: Re: [PERFORM] Intel SRCS16 SATA raid?
> >
> >
> > I have put together a little head to head performance of a 15k SCSI,
> > 10k SCSI 10K SATA w/TCQ, 10K SATA wo/TCQ and 7.2K SATA drive
> > comparison at storage review
> >
> > http://www.storagereview.com/php/benchmark/compare_rtg_2001.ph
> > p?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devI
> > D_0=232&devID_1=40&devID_2=259&devID_3=267&devID_4=261&devID_5
> > =248&devCnt=6
> >
> > It does illustrate some of the weaknesses of SATA drives, but all in
> > all the Raptor drives put on a good show.
> > [...]
> 
> I think it's a little misleading that your tests show 0ms seek times
> for some of the write tests.  The environmental test also selects a
> missing data point as the winner.  Besides that, it seems to me that
> seek time is one of the most important features for a DB server, which
> means that the SCSI drives are the clear winners and the non-WD SATA
> drives are the embarrassing losers.  Transfer rate is import, but
> perhaps less so because DBs tend to read/write small blocks rather
> than large files.  On the server suite, which seems to me to be the
> most relevant for DBs, the Atlas 15k spanks the other drives by a
> fairly large margin (especially the lesser SATA drives).  When you
> ignore the "consumer app" benchmarks, I wouldn't be so confident in
> saying that the Raptors "put on a good show".
> 
> __
> David B. Held
> Software Engineer/Array Services Group
> 200 14th Ave. East,  Sartell, MN 56377
> 320.534.3637 320.253.7800 800.752.8129
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-14 Thread Alex Turner
3ware claim that their 'software' implemented command queueing
performs at 95% effectiveness compared to the hardware queueing on a
SCSI drive, so I would say that they agree with you.

I'm still learning, but as I read it, the bits are split across the
platters and there is only 'one' head, but it happens to be reading from
multiple platters.  The 'further' in linear distance the data is from
the current position, the longer it's going to take to get there.
This seems to be true based on a document that was circulated.  A hard
drive takes a considerable amount of time to 'find' a track on the
platter compared to the rotational speed, which would agree with the
fact that you can read 70MB/sec, but it takes up to 13ms to seek.

The ATA protocol is just how the HBA communicates with the drive;
there is no reason why the HBA can't reschedule reads and writes just
like the SCSI drive would do natively, and this is in fact what 3ware
claims.  I get the feeling, based on my own historical experience, that
generally drives don't just have a bunch of bad blocks.  This all leads
me to believe that you can predict with pretty good accuracy how
expensive it is to retrieve a given block knowing its linear
offset.

Alex Turner
netEconomist

On 4/14/05, Kevin Brown <[EMAIL PROTECTED]> wrote:
> Tom Lane wrote:
> > Kevin Brown <[EMAIL PROTECTED]> writes:
> > > I really don't see how this is any different between a system that has
> > > tagged queueing to the disks and one that doesn't.  The only
> > > difference is where the queueing happens.  In the case of SCSI, the
> > > queueing happens on the disks (or at least on the controller).  In the
> > > case of SATA, the queueing happens in the kernel.
> >
> > That's basically what it comes down to: SCSI lets the disk drive itself
> > do the low-level I/O scheduling whereas the ATA spec prevents the drive
> > from doing so (unless it cheats, ie, caches writes).  Also, in SCSI it's
> > possible for the drive to rearrange reads as well as writes --- which
> > AFAICS is just not possible in ATA.  (Maybe in the newest spec...)
> >
> > The reason this is so much more of a win than it was when ATA was
> > designed is that in modern drives the kernel has very little clue about
> > the physical geometry of the disk.  Variable-size tracks, bad-block
> > sparing, and stuff like that make for a very hard-to-predict mapping
> > from linear sector addresses to actual disk locations.
> 
> Yeah, but it's not clear to me, at least, that this is a first-order
> consideration.  A second-order consideration, sure, I'll grant that.
> 
> What I mean is that when it comes to scheduling disk activity,
> knowledge of the specific physical geometry of the disk isn't really
> important.  What's important is whether or not the disk conforms to a
> certain set of expectations.  Namely, that the general organization is
> such that addressing the blocks in block number order guarantees
> maximum throughput.
> 
> Now, bad block remapping destroys that guarantee, but unless you've
> got a LOT of bad blocks, it shouldn't destroy your performance, right?
> 
> > Combine that with the fact that the drive controller can be much
> > smarter than it was twenty years ago, and you can see that the case
> > for doing I/O scheduling in the kernel and not in the drive is
> > pretty weak.
> 
> Well, I certainly grant that allowing the controller to do the I/O
> scheduling is faster than having the kernel do it, as long as it can
> handle insertion of new requests into the list while it's in the
> middle of executing a request.  The most obvious case is when the head
> is in motion and the new request can be satisfied by reading from the
> media between where the head is at the time of the new request and
> where the head is being moved to.
> 
> My argument is that a sufficiently smart kernel scheduler *should*
> yield performance results that are reasonably close to what you can
> get with that feature.  Perhaps not quite as good, but reasonably
> close.  It shouldn't be an orders-of-magnitude type difference.
> 
> --
> Kevin Brown   [EMAIL PROTECTED]
> 
> ---(end of broadcast)---
> TIP 8: explain analyze is your friend
>



Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-15 Thread Alex Turner
No offense to that review, but it really wasn't that good, and it
drew bad conclusions from the data.  I posted it originally and
immediately regretted it.

See http://www.tweakers.net/reviews/557/18

Amazingly the controller with 1Gig cache manages a write throughput of
750MB/sec on a single drive.

quote:
"Floating high above the crowd, the ARC-1120 has a perfect view on the
struggles of the other adapters. "

That's because the adapter has 1Gig of RAM, nothing to do with the RAID
architecture; it's clearly caching the entire dataset.  The drive
can't physically run that fast.  These guys really don't know what they
are doing.

Curiously:
http://www.tweakers.net/reviews/557/25

The 3ware does very well as a data drive for MySQL.

The size of your cache is going to _directly_ affect RAID 5
performance.  Put a gig of memory in a 3ware 9500S and benchmark it
against the Areca then.

Also, folks don't run data partitions on RAID 5 because the write
speed is too low.  When you look at the results for RAID 10, the 3ware
leads the pack.

See also:
http://www20.tomshardware.com/storage/20041227/areca-raid6-06.html

I trust Tom's Hardware a little more to set up a good review, to be honest.

The 3ware trounces the Areca in all the IO/sec tests.

Alex Turner
netEconomist

On 4/15/05, Marinos Yannikos <[EMAIL PROTECTED]> wrote:
> Joshua D. Drake wrote:
> > Well I have never even heard of it. 3ware is the defacto authority of
> > reasonable SATA RAID.
> 
> no! 3ware was rather early in this business, but there are plenty of
> (IMHO, and some other people's opinion) better alternatives available.
> 3ware has good Linux drivers, but the performance of their current
> controllers isn't that good.
> 
> Have a look at this: http://www.tweakers.net/reviews/557/1
> 
> especially the sequential writes with RAID-5 on this page:
> 
> http://www.tweakers.net/reviews/557/19
> 
> We have been a long-time user of a 3ware 8506 controller (8 disks,
> RAID-5) and have purchased 2 Areca ARC-1120 now since we weren't
> satisfied with the performance and the 2TB per array limit...
> 
> -mjy
> 
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster
>



Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-15 Thread Alex Turner
The original thread was about how much you can get for $7k.

You can't fit a 15k RPM SCSI solution into $7K ;)  Some of us are on a budget!

10k RPM SATA drives give acceptable performance at a good price; that's
really the point here.

I have never really argued that SATA is going to match SCSI
performance on multidrive arrays for IO/sec.  But it's all about the
benjamins, baby.  If I told my boss we needed $25k for a database
machine, he'd tell me that was impossible and that I have $5k to do it.
If I tell him $7k, he will swallow that.  We don't _need_ the amazing
performance of a 15k RPM drive config.  Our biggest hit is reads, so
we can buy 3 SATA machines and load balance.  It's all about the
application, and buying what is appropriate.  I don't buy a Corvette
if all I need is a Malibu.

Alex Turner
netEconomist

On 4/15/05, Dave Held <[EMAIL PROTECTED]> wrote:
> > -Original Message-
> > From: Alex Turner [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, April 14, 2005 6:15 PM
> > To: Dave Held
> > Cc: pgsql-performance@postgresql.org
> > Subject: Re: [PERFORM] Intel SRCS16 SATA raid?
> >
> > Looking at the numbers, the raptor with TCQ enabled was close or
> > beat the Atlas III 10k drive on most benchmarks.
> 
> And I would be willing to bet that the Atlas 10k is not using the
> same generation of technology as the Raptors.
> 
> > Naturaly a 15k drive is going to be faster in many areas, but it
> > is also much more expensive.  It was only 44% better on the server
> > tests than the raptor with TCQ, but it costs nearly 300% more ($538
> > cdw.com, $180 newegg.com).
> 
> State that in terms of cars.  Would you be willing to pay 300% more
> for a car that is 44% faster than your competitor's?  Of course you
> would, because we all recognize that the cost of speed/performance
> does not scale linearly.  Naturally, you buy the best speed that you
> can afford, but when it comes to hard drives, the only major feature
> whose price tends to scale anywhere close to linearly is capacity.
> 
> > Note also that the 15k drive was the only drive that kept up with
> > the raptor on raw transfer speed, which is going to matter for WAL.
> 
> So get a Raptor for your WAL partition. ;)
> 
> > [...]
> > The Raptor drives can be had for as little as $180/ea, which is
> > quite a good price point considering they can keep up with their
> > SCSI 10k RPM counterparts on almost all tests with NCQ enabled
> > (Note that 3ware controllers _don't_ support NCQ, although they
> > claim their HBA based queueing is 95% as good as NCQ on the drive).
> 
> Just keep in mind the points made by the Seagate article.  You're
> buying much more than just performance for that $500+.  You're also
> buying vibrational tolerance, high MTBF, better internal
> environmental controls, and a pretty significant margin on seek time,
> which is probably your most important feature for disks storing tables.
> An interesting test would be to stick several drives in a cabinet and
> graph how performance is affected at the different price points/
> technologies/number of drives.
> 
> __
> David B. Held
> Software Engineer/Array Services Group
> 200 14th Ave. East,  Sartell, MN 56377
> 320.534.3637 320.253.7800 800.752.8129
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly
>



Re: [PERFORM] Intel SRCS16 SATA raid?

2005-04-15 Thread Alex Turner
I stand corrected!

Maybe I should re-evaluate our own config!

Alex T

(The Dell PERC controllers do pretty much suck on Linux.)

On 4/15/05, Vivek Khera <[EMAIL PROTECTED]> wrote:
> 
> On Apr 15, 2005, at 11:01 AM, Alex Turner wrote:
> 
> > You can't fit a 15k RPM SCSI solution into $7K ;)  Some of us are on a
> > budget!
> >
> 
> I just bought a pair of Dual Opteron, 4GB RAM, LSI 320-2X RAID dual
> channel with 8 36GB 15kRPM seagate drives.  Each one of these boxes set
> me back just over $7k, including onsite warrantee.
> 
> They totally blow away the Dell Dual XEON with external 14 disk RAID
> (also 15kRPM drives, manufacturer unknown) which also has 4GB RAM and a
> Dell PERC 3/DC controller, the whole of which set me back over $15k.
> 
> Vivek Khera, Ph.D.
> +1-301-869-4449 x806
> 
> 
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
This is fundamentally untrue.

A mirror is still a mirror.  At most in a RAID 10 you can have two
simultaneous seeks.  You are always going to be limited by the seek
time of your drives.  It's a stripe, so you have to read from all
members of the stripe to get data, requiring all drives to seek. 
There is no advantage to seek time in adding more drives.  By adding
more drives you can increase throughput, but the max throughput of the
PCI-X bus isn't that high (I think around 400MB/sec).  You can easily
get this with a six or seven drive RAID 5, or a ten drive RAID 10.  At
that point you start having to factor in the cost of a bigger chassis
to hold more drives, which can be big bucks.

Alex Turner
netEconomist

On 18 Apr 2005 10:59:05 -0400, Greg Stark <[EMAIL PROTECTED]> wrote:
> 
> William Yu <[EMAIL PROTECTED]> writes:
> 
> > Using the above prices for a fixed budget for RAID-10, you could get:
> >
> > SATA 7200 -- 680MB per $1000
> > SATA 10K  -- 200MB per $1000
> > SCSI 10K  -- 125MB per $1000
> 
> What a lot of these analyses miss is that cheaper == faster because cheaper
> means you can buy more spindles for the same price. I'm assuming you picked
> equal sized drives to compare so that 200MB/$1000 for SATA is almost twice as
> many spindles as the 125MB/$1000. That means it would have almost double the
> bandwidth. And the 7200 RPM case would have more than 5x the bandwidth.
> 
> While 10k RPM drives have lower seek times, and SCSI drives have a natural
> seek time advantage, under load a RAID array with fewer spindles will start
> hitting contention sooner which results into higher latency. If the controller
> works well the larger SATA arrays above should be able to maintain their
> mediocre latency much better under load than the SCSI array with fewer drives
> would maintain its low latency response time despite its drives' lower average
> seek time.
> 
> --
> greg
> 
> 
> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
>   joining column's datatypes do not match
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
[snip]
> 
> Adding drives will not let you get lower response times than the average seek
> time on your drives*. But it will let you reach that response time more often.
> 
[snip]

I believe your assertion is fundamentally flawed.  Adding more drives
will not let you reach that response time more often.  All drives are
required to fill every request in all RAID levels (except possibly
0+1, but that isn't used for enterprise applications).  Most requests
in OLTP spend most of the request time seeking, not reading.  Only
in single large block data transfers will you get any benefit from
adding more drives, which is atypical in most database applications.
For most database applications, the only way to increase
transactions/sec is to decrease request service time, which is
generally achieved with better seek times or a better controller card,
or possibly spreading your database across multiple tablespaces on
separate partitions.

My assertion therefore is that simply adding more drives to an already
competent* configuration is about as likely to increase your database
effectiveness as swiss cheese is to make your car run faster.

Alex Turner
netEconomist

*The assumption here is that the DBA didn't simply configure all tables
and xlog on a single 7200 RPM disk, but has separate physical drives for
xlog and tablespaces, at least on 10k drives.
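
As a concrete illustration of the tablespace point above, on 8.0 that
looks something like this (directory, tablespace and table names are
made up):

-- a second array/spindle set mounted at /mnt/raid10
CREATE TABLESPACE bulkdata LOCATION '/mnt/raid10/pg_bulkdata';

-- move a large table off the default storage
ALTER TABLE orders SET TABLESPACE bulkdata;

pg_xlog itself still has to be relocated at the filesystem level (e.g.
a symlink to another drive), since there is no SQL command for that.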



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at
least I would never recommend 1+0 for anything).

RAID 10 and RAID 0+1 are _quite_ different.  One gives you very good
redundancy, the other is only slightly better than RAID 5, but
operates faster in degraded mode (single drive).

Alex Turner
netEconomist

On 4/18/05, John A Meinel <[EMAIL PROTECTED]> wrote:
> Alex Turner wrote:
> 
> >[snip]
> >
> >
> >>Adding drives will not let you get lower response times than the average 
> >>seek
> >>time on your drives*. But it will let you reach that response time more 
> >>often.
> >>
> >>
> >>
> >[snip]
> >
> >I believe your assertion is fundamentaly flawed.  Adding more drives
> >will not let you reach that response time more often.  All drives are
> >required to fill every request in all RAID levels (except possibly
> >0+1, but that isn't used for enterprise applicaitons).
> >
> Actually 0+1 is the recommended configuration for postgres databases
> (both for xlog and for the bulk data), because the write speed of RAID5
> is quite poor.
> Hence you base assumption is not correct, and adding drives *does* help.
> 
> >Most requests
> >in OLTP require most of the request time to seek, not to read.  Only
> >in single large block data transfers will you get any benefit from
> >adding more drives, which is atypical in most database applications.
> >For most database applications, the only way to increase
> >transactions/sec is to decrease request service time, which is
> >generaly achieved with better seek times or a better controller card,
> >or possibly spreading your database accross multiple tablespaces on
> >seperate paritions.
> >
> >
> This is probably true. However, if you are doing lots of concurrent
> connections, and things are properly spread across multiple spindles
> (using RAID0+1, or possibly tablespaces across multiple raids).
> Then each seek occurs on a separate drive, which allows them to occur at
> the same time, rather than sequentially. Having 2 processes competing
> for seeking on the same drive is going to be worse than having them on
> separate drives.
> John
> =:->
> 
> 
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
I think the "add more disks" advice really comes from the point of view
that one disk is never enough.  You should really have at least four
drives configured into two RAID 1s.  Most DBAs will know this, but
most average Joes won't.

Alex Turner
netEconomist

On 4/18/05, Steve Poe <[EMAIL PROTECTED]> wrote:
> Alex,
> 
> In the situation of the animal hospital server I oversee, their
> application is OLTP. Adding hard drives (6-8) does help performance.
> Benchmarks like pgbench and OSDB agree with it, but in reality users
> could not see noticeable change. However, moving the top 5/10 tables and
> indexes to their own space made a greater impact.
> 
> Someone who reads PostgreSQL 8.0 Performance Checklist is going to see
> point #1 add more disks is the key. How about adding a subpoint to
> explaining when more disks isn't enough or applicable? I maybe
> generalizing the complexity of tuning an OLTP application, but some
> clarity could help.
> 
> Steve Poe
> 
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
Ok - well - I am partially wrong...

If your stripe size is 64KB and you are reading 256KB worth of data,
it will be spread across four drives, so you will need to read from
four devices to get your 256KB of data (RAID 0 or 5 or 10), but if you
are only reading 64KB of data, I guess you would only need to read
from one disk.

So my assertion that adding more drives doesn't help is pretty
wrong... particularly with OLTP, because it's always dealing with
blocks that are smaller than the stripe size.

Alex Turner
netEconomist

On 4/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> At 18:56 18/04/2005, Alex Turner wrote:
> >All drives are required to fill every request in all RAID levels
> 
> No, this is definitely wrong. In many cases, most drives don't actually
> have the data requested, how could they handle the request?
> 
> When reading one random sector, only *one* drive out of N is ever used to
> service any given request, be it RAID 0, 1, 0+1, 1+0 or 5.
> 
> When writing:
> - in RAID 0, 1 drive
> - in RAID 1, RAID 0+1 or 1+0, 2 drives
> - in RAID 5, you need to read on all drives and write on 2.
> 
> Otherwise, what would be the point of RAID 0, 0+1 or 1+0?
> 
> Jacques.
> 
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
So I wonder if one could take this stripe size thing further and say
that a larger stripe size is more likely to result in requests getting
served parallelized across disks, which would lead to increased
performance?

Again, thanks to all the people on this list; I know that I have learnt
a _hell_ of a lot since subscribing.

Alex Turner
netEconomist

On 4/18/05, Alex Turner <[EMAIL PROTECTED]> wrote:
> Ok - well - I am partially wrong...
> 
> If you're stripe size is 64Kb, and you are reading 256k worth of data,
> it will be spread across four drives, so you will need to read from
> four devices to get your 256k of data (RAID 0 or 5 or 10), but if you
> are only reading 64kb of data, I guess you would only need to read
> from one disk.
> 
> So my assertion that adding more drives doesn't help is pretty
> wrong... particularly with OLTP because it's always dealing with
> blocks that are smaller that the stripe size.
> 
> Alex Turner
> netEconomist
> 
> On 4/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > At 18:56 18/04/2005, Alex Turner wrote:
> > >All drives are required to fill every request in all RAID levels
> >
> > No, this is definitely wrong. In many cases, most drives don't actually
> > have the data requested, how could they handle the request?
> >
> > When reading one random sector, only *one* drive out of N is ever used to
> > service any given request, be it RAID 0, 1, 0+1, 1+0 or 5.
> >
> > When writing:
> > - in RAID 0, 1 drive
> > - in RAID 1, RAID 0+1 or 1+0, 2 drives
> > - in RAID 5, you need to read on all drives and write on 2.
> >
> > Otherwise, what would be the point of RAID 0, 0+1 or 1+0?
> >
> > Jacques.
> >
> >
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
Mistype.. I meant 0+1 in the second instance :(


On 4/18/05, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
> Alex Turner wrote:
> > Not true - the recommended RAID level is RAID 10, not RAID 0+1 (at
> > least I would never recommend 1+0 for anything).
> 
> Uhmm I was under the impression that 1+0 was RAID 10 and that 0+1 is NOT
> RAID 10.
> 
> Ref: http://www.acnc.com/raid.html
> 
> Sincerely,
> 
> Joshua D. Drake
> 
> 
> > ---(end of broadcast)---
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> >   subscribe-nomail command to [EMAIL PROTECTED] so that your
> >   message can get through to the mailing list cleanly
> 
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
On 4/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> At 20:21 18/04/2005, Alex Turner wrote:
> >So I wonder if one could take this stripe size thing further and say
> >that a larger stripe size is more likely to result in requests getting
> >served parallized across disks which would lead to increased
> >performance?
> 
> Actually, it would be pretty much the opposite. The smaller the stripe
> size, the more evenly distributed data is, and the more disks can be used
> to serve requests. If your stripe size is too large, many random accesses
> within one single file (whose size is smaller than the stripe size/number
> of disks) may all end up on the same disk, rather than being split across
> multiple disks (the extreme case being stripe size = total size of all
> disks, which means concatenation). If all accesses had the same cost (i.e.
> no seek time, only transfer time), the ideal would be to have a stripe size
> equal to the number of disks.
> 
[snip]

Ahh yes - but the critical distinction is this:
The smaller the stripe size, the more disks will be used to serve _a_
request - which is bad for OLTP because you want fewer disks per
request so that you can have more requests per second because the cost
is mostly seek.  If more than one disk has to seek to serve a single
request, you are preventing that disk from serving a second request at
the same time.

To have more throughput in MB/sec, you want a smaller stripe size, so
that you have more disks serving a single request, allowing you to
multiply by the number of effective drives to get total bandwidth.

Because OLTP is made up of small reads and writes to a small number of
different files, I would guess that you want those files split up
across your RAID, but not so much that a single small read or write
operation would traverse more than one disk.  That would imply that
your optimal stripe size is somewhere on the right side of the bell
curve that represents your database read and write block count
distribution.  If on average the db writer never flushes less than 1MB
to disk at a time, then I guess your best stripe size would be 1MB,
but that seems very large to me.

So I think therefore that I may be contending the exact opposite of
what you are postulating!

Alex Turner
netEconomist



Re: [PERFORM] How to improve db performance with $7K?

2005-04-18 Thread Alex Turner
Does it really matter at which end of the cable the queueing is done
(assuming both ends know as much about drive geometry, etc.)?

Alex Turner
netEconomist

On 4/18/05, Bruce Momjian  wrote:
> Kevin Brown wrote:
> > Greg Stark wrote:
> >
> >
> > > I think you're being misled by analyzing the write case.
> > >
> > > Consider the read case. When a user process requests a block and
> > > that read makes its way down to the driver level, the driver can't
> > > just put it aside and wait until it's convenient. It has to go ahead
> > > and issue the read right away.
> >
> > Well, strictly speaking it doesn't *have* to.  It could delay for a
> > couple of milliseconds to see if other requests come in, and then
> > issue the read if none do.  If there are already other requests being
> > fulfilled, then it'll schedule the request in question just like the
> > rest.
> 
> The idea with SCSI or any command queuing is that you don't have to wait
> for another request to come in --- you can send the request as it
> arrives, then if another shows up, you send that too, and the drive
> optimizes the grouping at a later time, knowing what the drive is doing,
> rather queueing in the kernel.
> 
> --
>   Bruce Momjian|  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us   |  (610) 359-1001
>   +  If your life is a hard drive, |  13 Roberts Road
>   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
> 
> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
>   joining column's datatypes do not match
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-20 Thread Alex Turner
I wonder if that's something to think about adding to PostgreSQL: a
setting for multiblock read count like Oracle's (although, having said
that, I believe that Oracle natively caches pages much more
aggressively than PostgreSQL, which instead relies on the OS to do the
file caching).

Alex Turner
netEconomist

P.S. Oracle changed this with 9i: you can change the database block
size on a tablespace-by-tablespace basis, making it smaller for OLTP
tablespaces and larger for warehousing tablespaces (at least I think
it's per tablespace; it might be for the whole DB).

On 4/19/05, Jim C. Nasby <[EMAIL PROTECTED]> wrote:
> On Mon, Apr 18, 2005 at 06:41:37PM -, Mohan, Ross wrote:
> > Don't you think "optimal stripe width" would be
> > a good question to research the binaries for? I'd
> > think that drives the answer, largely.  (uh oh, pun alert)
> >
> > EG, oracle issues IO requests (this may have changed _just_
> > recently) in 64KB chunks, regardless of what you ask for.
> > So when I did my striping (many moons ago, when the Earth
> > was young...) I did it in 128KB widths, and set the oracle
> > "multiblock read count" according. For oracle, any stripe size
> > under 64KB=stupid, anything much over 128K/258K=wasteful.
> >
> > I am eager to find out how PG handles all this.
> 
> AFAIK PostgreSQL requests data one database page at a time (normally
> 8k). Of course the OS might do something different.
> --
> Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
> Give your computer some brain candy! www.distributed.net Team #1828
> 
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
> 
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster
>



Re: [PERFORM] How to improve db performance with $7K?

2005-04-20 Thread Alex Turner
Whilst I admire your purist approach, I would say that if it is
beneficial to performance that a kernel understand drive geometry,
then it is worth investigating teaching it how to deal with that!

I was referring less to the kernel than to the controller.

Let's say we invented a new protocol that included the drive telling
the controller how it was laid out at initialization time, so that the
controller could make better decisions about re-ordering seeks.  It
would be more cost effective to have that set of electronics just once
in the controller, rather than eight times, once on each drive in an
array, which would yield a better performance-to-cost ratio.  Therefore
I would suggest it is something that should be investigated.  After
all, why implement TCQ on each drive if it can be handled more
efficiently at the other end by the controller for less money?!

Alex Turner
netEconomist

On 4/19/05, Dave Held <[EMAIL PROTECTED]> wrote:
> > -Original Message-
> > From: Alex Turner [mailto:[EMAIL PROTECTED]
> > Sent: Monday, April 18, 2005 5:50 PM
> > To: Bruce Momjian
> > Cc: Kevin Brown; pgsql-performance@postgresql.org
> > Subject: Re: [PERFORM] How to improve db performance with $7K?
> >
> > Does it really matter at which end of the cable the queueing is done
> > (Assuming both ends know as much about drive geometry etc..)?
> > [...]
> 
> The parenthetical is an assumption I'd rather not make.  If my
> performance depends on my kernel knowing how my drive is laid
> out, I would always be wondering if a new drive is going to
> break any of the kernel's geometry assumptions.  Drive geometry
> doesn't seem like a kernel's business any more than a kernel
> should be able to decode the ccd signal of an optical mouse.
> The kernel should queue requests at a level of abstraction that
> doesn't depend on intimate knowledge of drive geometry, and the
> drive should queue requests on the concrete level where geometry
> matters.  A drive shouldn't guess whether a process is trying to
> read a file sequentially, and a kernel shouldn't guess whether
> sector 30 is contiguous with sector 31 or not.
> 
> __
> David B. Held
> Software Engineer/Array Services Group
> 200 14th Ave. East,  Sartell, MN 56377
> 320.534.3637 320.253.7800 800.752.8129
> 
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings
>



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Alex Turner
Is:

REINDEX DATABASE blah

supposed to rebuild all indices in the database, or must you specify
each table individually?  (I'm asking because I just tried it and it
only did system tables.)
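
In the meantime, reindexing user tables one at a time does work; for
example (names hypothetical):

-- rebuild every index on one table
REINDEX TABLE orders;

-- or just a single index
REINDEX INDEX orders_customer_idx;

-- generate statements for all ordinary tables in the public schema
SELECT 'REINDEX TABLE ' || quote_ident(relname) || ';'
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'public');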

Alex Turner
netEconomist

On 4/21/05, Josh Berkus  wrote:
> Bill,
> 
> > What about if an out-of-the-ordinary number of rows
> > were deleted (say 75% of rows in the table, as opposed
> > to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> > things get out of whack because of that situation?
> 
> Yes.  You'd want to run REINDEX after and event like that.  As you should now.
> 
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
> 
> ---(end of broadcast)---
> TIP 6: Have you searched our list archives?
> 
>http://archives.postgresql.org
>



Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Alex Turner
Same thing happens in Oracle

ALTER INDEX <index_name> REBUILD

to force a rebuild.  It will mark the free blocks as 'free' below the
PCTFREE value for the tablespace.

Basically, if you build an index with 10,000 entries, and each entry is
1/4 of a block, the database will write 2,500 blocks to the disk.  If
you delete a random 75% of the index values, you will now have 2,500
blocks that have 75% free space.  The database will reuse that free
space in those blocks as you insert new values, but until then, you
still have 2,500 blocks worth of data on disk that is only 25% full.
Rebuilding the index forces the system to physically re-allocate all
that data space, and now you have just 2,500 entries, which use 625
blocks.

I'm not sure that 'blocks' is the correct term in Postgres (it's
'segments' in Oracle), but the concept remains the same.
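
In Postgres you can at least watch the same effect from pg_class; for
instance, with a hypothetical index name:

-- relpages is refreshed by VACUUM/ANALYZE, so run one first
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'orders_customer_idx';

Comparing relpages before and after a REINDEX gives a rough measure of
how bloated the index had become.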

Alex Turner
netEconomist

On 4/21/05, Bill Chandler <[EMAIL PROTECTED]> wrote:
> 
> --- Josh Berkus  wrote:
> > Bill,
> >
> > > What about if an out-of-the-ordinary number of
> > rows
> > > were deleted (say 75% of rows in the table, as
> > opposed
> > > to normal 5%) followed by a 'VACUUM ANALYZE'?
> > Could
> > > things get out of whack because of that situation?
> >
> > Yes.  You'd want to run REINDEX after and event like
> > that.  As you should now.
> >
> > --
> > Josh Berkus
> > Aglio Database Solutions
> > San Francisco
> >
> 
> Thank you.  Though I must say, that is very
> discouraging.  REINDEX is a costly operation, timewise
> and due to the fact that it locks out other processes
> from proceeding.  Updates are constantly coming in and
> queries are occurring continuously.  A REINDEX could
> potentially bring the whole thing to a halt.
> 
> Honestly, this seems like an inordinate amount of
> babysitting for a production application.  I'm not
> sure if the client will be willing to accept it.
> 
> Admittedly my knowledge of the inner workings of an
> RDBMS is limited, but could somebody explain to me why
> this would be so?  If you delete a bunch of rows why
> doesn't the index get updated at the same time?  Is
> this a common issue among all RDBMSs or is it
> something that is PostgreSQL specific?  Is there any
> way around it?
> 
> thanks,
> 
> Bill
> 
> __
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
> 
> ---(end of broadcast)---
> TIP 6: Have you searched our list archives?
> 
>http://archives.postgresql.org
>



[PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
What is the status of Postgres support for any sort of multi-machine
scaling? What are you meant to do once you've upgraded your box and
tuned the conf files as much as you can, but your query load is still
too high for a single machine?

Upgrading stock Dell boxes (I know we could be using better machines,
but I am trying to tackle the real issue) is not a hugely
price-efficient way of getting extra performance, nor particularly
scalable in the long term.

So, when (if ever) is PG meant to be getting a decent partitioning
system? MySQL is getting one (eventually) which is apparently meant to
be similar to Oracle's, according to the docs. Clusgres does not appear
to be widely (or at all) used, and info on it seems pretty thin on the
ground, so I am not too keen on going with that. Is the real solution
to multi-machine partitioning (as in, not like MySQL's MERGE tables) on
PostgreSQL actually doing it in our application API? This seems like a
less-than-perfect solution once we want to add redundancy and things
into the mix.
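
(For reference, the closest thing we have found in stock PostgreSQL is
single-node partitioning using inheritance plus a rule or trigger to
route inserts; a minimal sketch with made-up table names follows, but
it obviously doesn't answer the multi-machine question.)

CREATE TABLE hits (
    hit_date  date     NOT NULL,
    user_id   integer  NOT NULL,
    payload   text
);

-- one child table per month, each with a CHECK constraint for its slice
CREATE TABLE hits_2005_05 (
    CHECK (hit_date >= '2005-05-01' AND hit_date < '2005-06-01')
) INHERITS (hits);

-- route matching inserts into the child; SELECTs on "hits" see all children
CREATE RULE hits_insert_2005_05 AS
    ON INSERT TO hits
    WHERE NEW.hit_date >= '2005-05-01' AND NEW.hit_date < '2005-06-01'
    DO INSTEAD
    INSERT INTO hits_2005_05 VALUES (NEW.hit_date, NEW.user_id, NEW.payload);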



Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
On 10 May 2005, at 15:41, John A Meinel wrote:
Alex Stapleton wrote:
What is the status of Postgres support for any sort of multi-machine
scaling support? What are you meant to do once you've upgraded  
your  box
and tuned the conf files as much as you can? But your query load  is
just too high for a single machine?

Upgrading stock Dell boxes (I know we could be using better machines,
but I am trying to tackle the real issue) is not a hugely price
efficient way of getting extra performance, nor particularly scalable
in the long term.
Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is
far away from Big Iron. I don't know what performance you are looking
for, but you can easily get into inserting 10M rows/day with quality
hardware.
Better hardware = More Efficient != More Scalable
But actually is it your SELECT load that is too high, or your INSERT
load, or something inbetween.
Because Slony is around if it is a SELECT problem.
http://gborg.postgresql.org/project/slony1/projdisplay.php
Basically, Slony is a Master/Slave replication system. So if you have
INSERT going into the Master, you can have as many replicated slaves,
which can handle your SELECT load.
Slony is an asynchronous replicator, so there is a time delay from the
INSERT until it will show up on a slave, but that time could be pretty
small.


So, when/is PG meant to be getting a decent partitioning system?   
MySQL
is getting one (eventually) which is apparently meant to be   
similiar to
Oracle's according to the docs. Clusgres does not appear  to be
widely/or at all used, and info on it seems pretty thin on the   
ground,
so I am
not too keen on going with that. Is the real solution to multi-  
machine
partitioning (as in, not like MySQLs MERGE tables) on  PostgreSQL
actually doing it in our application API? This seems like  a less  
than
perfect solution once we want to add redundancy and  things into  
the mix.

There is also PGCluster
http://pgfoundry.org/projects/pgcluster/
Which is trying to be more of a Synchronous multi-master system. I
haven't heard of Clusgres, so I'm guessing it is an older attempt,  
which
has been overtaken by pgcluster.

Just realize that clusters don't necessarily scale like you would want
them too. Because at some point you have to insert into the same  
table,
which means you need to hold a lock which prevents the other machine
from doing anything. And with synchronous replication, you have to  
wait
for all of the machines to get a copy of the data before you can  
say it
has been committed, which does *not* scale well with the number of  
machines.
This is why I mention partitioning. It solves this issue by storing
different data sets on different machines under the same schema.
These separate chunks of the table can then be replicated as well for
data redundancy and so on. MySQL are working on these things, but PG
just has a bunch of third-party extensions; I wonder why these are
not being integrated into the main trunk :/ Thanks for pointing me to
PGCluster though. It looks like it should be better than Slony at least.



Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
On 10 May 2005, at 16:02, Adam Haberlach wrote:
I think that perhaps he was trying to avoid having to buy "Big  
Iron" at all.
You would be right. Although we are not against paying a bit more  
than $300 for a server ;)

With all the Opteron v. Xeon around here, and talk of $30,000  
machines,
perhaps it would be worth exploring the option of buying 10 cheapass
machines for $300 each.  At the moment, that $300 buys you, from  
Dell, a
2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit  
ethernet.
The aggregate CPU and bandwidth is pretty stupendous, but not as  
easy to
harness as a single machine.

Yes, clustering solutions can distribute the data, and can even do  
it on a
per-table basis in some cases.  This still leaves it up to the  
application's
logic to handle reunification of the data.
If you're going to be programming that sort of logic into your API from
the beginning, it's not too much more work to add basic replication,
load balancing and partitioning into it either. But the DB should be
able to do it for you; adding that stuff in later is often more
difficult and less likely to get done.

Ideas:
1. Create a table/storage type that consists of a select statement
on another machine.  While I don't think the current executor is  
capable of
working on multiple nodes of an execution tree at the same time, it  
would be
great if it could offload a select of tuples from a remote table to an
entirely different server and merge the resulting data into the  
current
execution.  I believe MySQL has this, and Oracle may implement it  
in another
way.
MySQL sort of has this, though it's not as good as Oracle's.
Apparently there is a much better version of it in 5.1, which
should make it to stable sometime next year, I imagine.

2. There is no #2 at this time, but I'm sure one can be
hypothesized.
I would have thought a particularly smart version of pg_pool could do
it. It could partition data to different servers if it knew which
columns to key by on each table.

...Google and other companies have definitely proved that one can  
harness
huge clusters of cheap hardware.  It can't be _that_ hard, can it.  :)
I shudder to think how much the "Big Iron" equivalent of a Google
data center would cost.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John A  
Meinel
Sent: Tuesday, May 10, 2005 7:41 AM
To: Alex Stapleton
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Partitioning / Clustering

Alex Stapleton wrote:
What is the status of Postgres support for any sort of multi-machine
scaling support? What are you meant to do once you've upgraded your
box and tuned the conf files as much as you can? But your query load
is just too high for a single machine?
Upgrading stock Dell boxes (I know we could be using better machines,
but I am trying to tackle the real issue) is not a hugely price
efficient way of getting extra performance, nor particularly scalable
in the long term.
Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell  
is far
away from Big Iron. I don't know what performance you are looking  
for, but
you can easily get into inserting 10M rows/day with quality hardware.

But actually is it your SELECT load that is too high, or your  
INSERT load,
or something inbetween.

Because Slony is around if it is a SELECT problem.
http://gborg.postgresql.org/project/slony1/projdisplay.php
Basically, Slony is a Master/Slave replication system. So if you  
have INSERT
going into the Master, you can have as many replicated slaves,  
which can
handle your SELECT load.
Slony is an asynchronous replicator, so there is a time delay from the
INSERT until it will show up on a slave, but that time could be pretty
small.

This would require some application level support, since an INSERT  
goes to a
different place than a SELECT. But there has been some discussion  
about
pg_pool being able to spread the query load, and having it be aware  
of the
difference between a SELECT and an INSERT and have it route the  
query to the
correct host. The biggest problem being that functions could cause  
a SELECT
func() to actually insert a row, which pg_pool wouldn't know about.  
There
are 2 possible solutions, a) don't do that when you are using this  
system,
b) add some sort of comment hint so that pg_pool can understand  
that the
select is actually an INSERT, and needs to be done on the master.


So, when/is PG meant to be getting a decent partitioning system?
MySQL is getting one (eventually) which is apparently meant to be
similiar to Oracle's according to the docs. Clusgres does not appear
to be widely/or at all used, and info on it seems pretty thin on the
ground, so I am not too keen on going with that. Is the real solution
to multi- machine partitioning (as in, not like MySQLs MERGE tables)
on  PostgreSQL actually doing it in our ap

Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Alex Stapleton
On 11 May 2005, at 08:16, Simon Riggs wrote:
On Tue, 2005-05-10 at 11:03 +0100, Alex Stapleton wrote:
So, when/is PG meant to be getting a decent partitioning system?
ISTM that your question seems to confuse where code comes from.  
Without
meaning to pick on you, or reply rudely, I'd like to explore that
question. Perhaps it should be a FAQ entry.

All code is written by someone, and those people need to eat. Some
people are fully or partly funded to perform their tasks on this  
project
(coding, patching, etc). Others contribute their time for a variety of
reasons where involvement has a positive benefit.

You should ask these questions:
- Is anyone currently working on (Feature X)?
- If not, Can I do it myself?
- If not, and I still want it, can I fund someone else to build it for
me?
Asking "when is Feature X going to happen" is almost certainly  
going to
get the answer "never" otherwise, if the initial development is large
and complex. There are many TODO items that have lain untouched for
years, even though adding the feature has been discussed and agreed.

Best Regards, Simon Riggs
Acceptable Answers to 'So, when/is PG meant to be getting a decent  
partitioning system?':

1. Person X is working on it I believe.
2. It's on the list, but nobody has done anything about it yet
3. You're welcome to take a stab at it; I expect the community
would support your efforts as well.
4. If you have a huge pile of money you could probably buy the
Moon. Thinking along those lines, you can probably pay someone to
write it for you.
5. It's a stupid idea, and it's never going to work, and here's
why...

Unacceptable Answers to the same question:
1. Yours.
Be more helpful, and less arrogant, please. Everyone else who has
contributed to this thread has been very helpful in clarifying the
state of affairs and pointing out what work is and isn't being done,
and alternatives to just waiting for PG to do it for you.






Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Alex Stapleton
On 11 May 2005, at 08:57, David Roussel wrote:
For an interesting look at scalability, clustering, caching, etc for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf
I have implemented similar systems in the past; it's a pretty good
technique. Unfortunately it's not very "Plug-and-Play", as you have to
base most of your API on memcached (I imagine MySQL's NDB tables might
work as well, actually) for it to work well.

They have 2.6 Million active users, posting 200 new blog entries per
minute, plus many comments and countless page views.
Although this system is of a different sort to the type I work on it's
interesting to see how they've made it scale.
They use mysql on dell hardware! And found single master  
replication did
not scale.  There's a section on multimaster replication, not sure if
they use it.  The main approach they use is to parition users into
spefic database clusters.  Caching is done using memcached at the
application level to avoid hitting the db for rendered pageviews
I don't think they are storing pre-rendered pages (or bits of) in  
memcached, but are principally storing the data for the pages in it.  
Gluing pages together is not a hugely intensive process usually :)
The only problem with memcached is that the client's clustering/
partitioning system will probably break if a node dies, and probably  
get confused if you add new nodes onto it as well. Easily extensible  
clustering (no complete redistribution of data required when you add/ 
remove nodes) with the data distributed across nodes seems to be  
nothing but a pipe dream right now.

It's interesting that the solution livejournal have arrived at is  
quite
similar in ways to the way google is set up.
Don't Google use indexing servers which keep track of where data is?  
So that you only need to update them when you add or move data,  
deletes don't even have to be propagated among indexes immediately  
really because you'll find out if data isn't there when you visit  
where it should be. Or am I talking crap?

David
---(end of  
broadcast)---
TIP 8: explain analyze is your friend



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Alex Stapleton
On 11 May 2005, at 09:50, Alex Stapleton wrote:
On 11 May 2005, at 08:57, David Roussel wrote:

For an interesting look at scalability, clustering, caching, etc  
for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf

I have implemented similar systems in the past, it's a pretty good  
technique, unfortunately it's not very "Plug-and-Play" as you have  
to base most of your API on memcached (I imagine MySQLs NDB tables  
might work as well actually) for it to work well.


They have 2.6 Million active users, posting 200 new blog entries per
minute, plus many comments and countless page views.
Although this system is of a different sort to the type I work on  
it's
interesting to see how they've made it scale.

They use mysql on dell hardware! And found single master  
replication did
not scale.  There's a section on multimaster replication, not sure if
they use it.  The main approach they use is to parition users into
spefic database clusters.  Caching is done using memcached at the
application level to avoid hitting the db for rendered pageviews

I don't think they are storing pre-rendered pages (or bits of) in  
memcached, but are principally storing the data for the pages in  
it. Gluing pages together is not a hugely intensive process usually :)
The only problem with memcached is that the clients clustering/ 
partitioning system will probably break if a node dies, and  
probably get confused if you add new nodes onto it as well. Easily  
extensible clustering (no complete redistribution of data required  
when you add/remove nodes) with the data distributed across nodes  
seems to be nothing but a pipe dream right now.


It's interesting that the solution livejournal have arrived at is  
quite
similar in ways to the way google is set up.

Don't Google use indexing servers which keep track of where data  
is? So that you only need to update them when you add or move data,  
deletes don't even have to be propagated among indexes immediately  
really because you'll find out if data isn't there when you visit  
where it should be. Or am I talking crap?
That will teach me to RTFA first ;) Ok so LJ maintain an index of  
which cluster each user is on, kind of like Google does :)
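
For illustration, the kind of user-to-cluster directory being described can be as simple as one small lookup table; the names below are invented and not LiveJournal's actual schema:

-- hypothetical directory mapping each user to the cluster that holds their data
CREATE TABLE user_cluster (
    user_id    integer PRIMARY KEY,
    cluster_id integer NOT NULL
);

-- the application resolves the cluster once, then talks to that cluster directly
SELECT cluster_id FROM user_cluster WHERE user_id = 12345;

Moving or adding a user then only means updating this one small table, which is why adds and moves propagate cheaply.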


David
---(end of  
broadcast)---
TIP 8: explain analyze is your friend



---(end of  
broadcast)---
TIP 4: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Stapleton
On 11 May 2005, at 23:35, PFC wrote:


However, memcached (and for us, pg_memcached) is an excellent way  
to improve
horizontal scalability by taking disposable data (like session  
information)
out of the database and putting it in protected RAM.

So, what is the advantage of such a system versus, say, a  
"sticky sessions" system where each session is assigned to ONE  
application server (not PHP then) which keeps it in RAM as native  
objects instead of serializing and deserializing it on each request ?
I'd say the sticky sessions should perform a lot better, and if  
one machine dies, only the sessions on this one are lost.
But of course you can't do it with PHP as you need an app  
server which can manage sessions. Potentially the savings are huge,  
though.
There's no reason it couldn't be done with PHP to be fair, as long as  
you could ensure that the client was always routed back to the same  
machines. Which has its own set of issues entirely. I am not  
entirely sure that memcached actually does serialize data when it's  
committed into memcached either, although I could be wrong, I have not  
looked at the source. Certainly if you can ensure that a client  
always goes back to the same machine you can simplify the whole thing  
hugely. It's generally not that easy though; you need a proxy server  
of some description capable of understanding the HTTP traffic and  
maintaining a central session lookup table to redirect with. Which  
isn't really solving the problem so much as moving it somewhere else.  
Instead of needing huge memcached pools, you need hardcore  
loadbalancers, and load balancers tend to cost $ in comparison.  
Distributed sticky sessions are a rather nice idea; I would like to  
hear a way of implementing them cheaply (and on PHP) as well. I may  
have to give that some thought in fact. Oh yeah, and load balancer  
software often sucks in annoying (if not always important) ways.

On Google, their distributed system spans a huge number of PCs  
and it has redundancy, ie. individual PC failure is a normal thing  
and is a part of the system, it is handled gracefully. I read a  
paper on this matter, it's pretty impressive. The google filesystem  
has nothing to do with databases though, it's more a massive data  
store / streaming storage.

Since when did Massive Data stores have nothing to do with DBs? Isn't  
Oracle Cluster entirely based on forming an enormous scalable disk  
array to store your DB on?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Turner
Having local sessions is unnecessary, and here is my logic:

Generally most people have less than 100Mb of bandwidth to the internet.

If you make the assertion that you are transferring equal or less
session data between your session server (let's say an RDBMS) and the
app server than you are between the app server and the client, an out
of band 100Mb network for session information is plenty of bandwidth. 
This also represents OLTP style traffic, which postgresql is pretty
good at.  You should easily be able to get over 100 tps.  100 hits per
second is an awful lot of traffic, more than any website I've managed
will ever see.

Why solve the complicated clustered sessions problem, when you don't
really need to?

Alex Turner
netEconomist
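
As a rough sketch (table and column names invented, not from any particular system), the out-of-band session store being argued for here amounts to one small, hot table where every request does a single-row primary key lookup and update:

CREATE TABLE sessions (
    session_id text PRIMARY KEY,
    user_id    integer,
    data       text,
    last_seen  timestamp DEFAULT now()
);

-- typical per-request traffic
SELECT user_id, data FROM sessions WHERE session_id = 'abc123';
UPDATE sessions SET last_seen = now(), data = '...' WHERE session_id = 'abc123';

That is exactly the small-transaction OLTP pattern the post says PostgreSQL handles well.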

On 5/11/05, PFC <[EMAIL PROTECTED]> wrote:
> 
> 
> > However, memcached (and for us, pg_memcached) is an excellent way to
> > improve
> > horizontal scalability by taking disposable data (like session
> > information)
> > out of the database and putting it in protected RAM.
> 
> So, what is the advantage of such a system versus, say, a "sticky
> sessions" system where each session is assigned to ONE application server
> (not PHP then) which keeps it in RAM as native objects instead of
> serializing and deserializing it on each request ?
> I'd say the sticky sessions should perform a lot better, and if one
> machine dies, only the sessions on this one are lost.
> But of course you can't do it with PHP as you need an app server which
> can manage sessions. Potentially the savings are huge, though.
> 
> On Google, their distributed system spans a huge number of PCs and it 
> has
> redundancy, ie. individual PC failure is a normal thing and is a part of
> the system, it is handled gracefully. I read a paper on this matter, it's
> pretty impressive. The google filesystem has nothing to do with databases
> though, it's more a massive data store / streaming storage.
> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
>

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Stapleton
On 12 May 2005, at 15:08, Alex Turner wrote:
Having local sessions is unnesesary, and here is my logic:
Generaly most people have less than 100Mb of bandwidth to the  
internet.

If you make the assertion that you are transferring equal or less
session data between your session server (lets say an RDBMS) and the
app server than you are between the app server and the client, an out
of band 100Mb network for session information is plenty of bandwidth.
This also represents OLTP style traffic, which postgresql is pretty
good at.  You should easily be able to get over 100Tps.  100 hits per
second is an awful lot of traffic, more than any website I've managed
will ever see.
Why solve the complicated clustered sessions problem, when you don't
really need to?
100 hits a second = 8,640,000 hits a day. I work on a site which does
over 100 million dynamic pages a day. In comparison Yahoo probably does
over 100,000,000,000 (100 billion) views a day, if I am interpreting
Alexa's charts correctly. Which is about 1,150,000 a second.

Now considering the site I work on is not even in the top 1000 on  
Alexa, there's a lot of sites out there which need to solve this  
problem I would assume.

There are also only so many hash table lookups a single machine can  
do, even if it's a Quad Opteron behemoth.


Alex Turner
netEconomist
On 5/11/05, PFC <[EMAIL PROTECTED]> wrote:


However, memcached (and for us, pg_memcached) is an excellent way to
improve
horizontal scalability by taking disposable data (like session
information)
out of the database and putting it in protected RAM.
So, what is the advantage of such a system versus, say, a  
"sticky
sessions" system where each session is assigned to ONE application  
server
(not PHP then) which keeps it in RAM as native objects instead of
serializing and deserializing it on each request ?
I'd say the sticky sessions should perform a lot better,  
and if one
machine dies, only the sessions on this one are lost.
But of course you can't do it with PHP as you need an app  
server which
can manage sessions. Potentially the savings are huge, though.

On Google, their distributed system spans a huge number of  
PCs and it has
redundancy, ie. individual PC failure is a normal thing and is a  
part of
the system, it is handled gracefully. I read a paper on this  
matter, it's
pretty impressive. The google filesystem has nothing to do with  
databases
though, it's more a massive data store / streaming storage.

---(end of  
broadcast)---
TIP 1: subscribe and unsubscribe commands go to  
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Turner
Ok - my common sense alarm is going off here...

There are only 6.446 billion people worldwide.  100 Billion page views
would require every person in the world to view about 15 pages of Yahoo
every day.  Not very likely.

http://www.internetworldstats.com/stats.htm
suggests that there are around 1 billion people actualy on the internet.

That means each and every person on the internet has to view 100 pages
per day of yahoo.

pretty unlikely IMHO.  I for one don't even use Yahoo ;)

100 million page views per day suggests that 1 in 100 people on the
internet each viewed 10 pages of a site.  Thats a pretty high
percentage if you ask me.

If I visit 20 web sites in a day and see an average of 10 pages per
site, that means only about 2000 or so sites generate 100 million page
views in a day or better.

100 million pageviews averages to 1157/sec, which we'll double for
peak load to 2314.

I can easily see a system doing 2314 hash lookups per second.  Hell I
wrote a system that could do a thousand times that four years ago on a
single 1Ghz Athlon.  Heck - you can get 2314 lookups/sec on a 486 ;)

Given that session information doesn't _have_ to persist to storage
and can be kept in RAM, a single server could readily manage session
information for even very large sites (of course over a million
concurrent users could really start chewing into RAM, but if you are
Yahoo, you can probably afford a box with 100GB of RAM ;).

We get over 1000 tps on a dual opteron with a couple of mid size RAID
arrays on 10k discs with fsync on for small transactions.  I'm sure
that could easily be bettered with a few more dollars.

Maybe my numbers are off, but somehow it doesn't seem like that many
people need a highly complex session solution to me.

Alex Turner
netEconomist

On 5/12/05, Alex Stapleton <[EMAIL PROTECTED]> wrote:
> 
> On 12 May 2005, at 15:08, Alex Turner wrote:
> 
> > Having local sessions is unnesesary, and here is my logic:
> >
> > Generaly most people have less than 100Mb of bandwidth to the
> > internet.
> >
> > If you make the assertion that you are transferring equal or less
> > session data between your session server (lets say an RDBMS) and the
> > app server than you are between the app server and the client, an out
> > of band 100Mb network for session information is plenty of bandwidth.
> > This also represents OLTP style traffic, which postgresql is pretty
> > good at.  You should easily be able to get over 100Tps.  100 hits per
> > second is an awful lot of traffic, more than any website I've managed
> > will ever see.
> >
> > Why solve the complicated clustered sessions problem, when you don't
> > really need to?
> 
> 100 hits a second = 8,640,000 hits a day. I work on a site which does
>  > 100 million dynamic pages a day. In comparison Yahoo probably does
>  > 100,000,000,000 (100 billion) views a day
>   if I am interpreting Alexa's charts correctly. Which is about
> 1,150,000 a second.
> 
> Now considering the site I work on is not even in the top 1000 on
> Alexa, theres a lot of sites out there which need to solve this
> problem I would assume.
> 
> There are also only so many hash table lookups a single machine can
> do, even if its a Quad Opteron behemoth.
> 
> 
> > Alex Turner
> > netEconomist
> >
> > On 5/11/05, PFC <[EMAIL PROTECTED]> wrote:
> >
> >>
> >>
> >>
> >>> However, memcached (and for us, pg_memcached) is an excellent way to
> >>> improve
> >>> horizontal scalability by taking disposable data (like session
> >>> information)
> >>> out of the database and putting it in protected RAM.
> >>>
> >>
> >> So, what is the advantage of such a system versus, say, a
> >> "sticky
> >> sessions" system where each session is assigned to ONE application
> >> server
> >> (not PHP then) which keeps it in RAM as native objects instead of
> >> serializing and deserializing it on each request ?
> >> I'd say the sticky sessions should perform a lot better,
> >> and if one
> >> machine dies, only the sessions on this one are lost.
> >> But of course you can't do it with PHP as you need an app
> >> server which
> >> can manage sessions. Potentially the savings are huge, though.
> >>
> >> On Google, their distributed system spans a huge number of
> >> PCs and it has
> >> redundancy, ie. individual PC failure is a normal thing and is a
> >> part of
> >> the system, it is handled gracefully. I read a paper on this
> >> matter, it's
> >> pretty impressive. The google filesystem has nothing to do with
> >> databases
> >> though, it's more a massive data store / streaming storage.
> >>
> >> ---(end of
> >> broadcast)---
> >> TIP 1: subscribe and unsubscribe commands go to
> >> [EMAIL PROTECTED]
> >>
> >>
> >
> >
> 
>

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Stapleton
On 12 May 2005, at 18:33, Josh Berkus wrote:
People,

In general I think your point is valid. Just remember that it probably
also matters how you count page views. Because technically images are a
separate page (and this thread did discuss serving up images). So if
there are 20 graphics on a specific page, that is 20 server hits just
for that one page.

Also, there's bots and screen-scrapers and RSS, web e-mails, and web services
and many other things which create hits but are not "people".  I'm currently
working on clickstream for a site which is nowhere in the top 100, and is
getting 3 million real hits a day ... and we know for a fact that at least
1/4 of that is bots.
I doubt bots are generally Alexa toolbar enabled.
Regardless, the strategy you should be employing for a high traffic site is
that if your users hit the database for anything other than direct
interaction (like filling out a webform) then you're lost.  Use memcached,
squid, lighttpd caching, ASP.NET caching, pools, etc.  Keep the load off the
database except for the stuff that only the database can do.
This is the approach I would take as well. There is no point storing  
stuff in a DB if you're only doing direct lookups on it and it isn't  
the sort of data that you care so much about the integrity of.


--
Josh Berkus
Aglio Database Solutions
San Francisco
---(end of  
broadcast)---
TIP 8: explain analyze is your friend



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Optimizing for writes. Data integrity not critical

2005-05-21 Thread Alex Stapleton
Is using a ramdisk in situations like this entirely ill-advised then?  
When data integrity isn't a huge issue and you really need good write  
performance it seems like it wouldn't hurt too much. Unless I am  
missing something?


On 20 May 2005, at 02:45, Christopher Kings-Lynne wrote:

I'm doing the writes individually.  Is there a better way?   
Combining them all into a transaction or something?




Use COPY of course :)

Or at worst bundle 1000 inserts at a time in a transaction...

And if you seriously do not care about your data at all, set fsync  
= off in your postgresql.conf for a mega speedup.


Chris
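
A minimal sketch of those two suggestions, with an invented table name (the COPY form shown reads a server-side file):

-- bundling inserts into one transaction avoids a WAL flush per row
BEGIN;
INSERT INTO samples (id, val) VALUES (1, 10);
INSERT INTO samples (id, val) VALUES (2, 20);
-- ... up to a thousand or so per transaction ...
COMMIT;

-- COPY is faster still for bulk loads
COPY samples (id, val) FROM '/tmp/samples.dat';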

---(end of  
broadcast)---
TIP 1: subscribe and unsubscribe commands go to  
[EMAIL PROTECTED]






---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Optimizing for writes. Data integrity not critical

2005-05-21 Thread Alex Stapleton
I am interested in optimising write performance as well, the machine  
I am testing on is maxing out around 450 UPDATEs a second which is  
quite quick I suppose. I haven't tried turning fsync off yet. The  
table has...a lot of indices as well. They are mostly pretty simple  
partial indexes though.


I would usually just shove stuff into memcached, but I need to store  
and sort (in realtime) tens of thousands of rows. (I am experimenting  
with replacing some in house toplist generating stuff with a PG  
database.) The partial indexes are basically the only thing which  
makes the table usable btw.


The read performance is pretty damn good, but for some reason I chose  
to write the benchmark script in PHP, which can totally destroy the  
accuracy of your results if you decide to call pg_fetch_*(); even  
pg_affected_rows() can skew things significantly.


So any ideas how to improve the number of writes I can do a second?  
The existing system sorts everything by the desired column when a  
request is made, and the data it sorts is updated in realtime (whilst  
it isn't being sorted.) And it can sustain the read/write load (to  
memory) just fine. If PG had heap tables this would probably not be  
a problem at all, but it doesn't, so it is. Running it in a ramdisk  
would be acceptable, it's just annoying to create the db every time  
the machine goes down. And having to run the entire PG instance off  
of the ramdisk isn't great either.
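
For what it's worth, a partial index of the sort being described might look something like this; the toplist table and its columns are invented for illustration:

-- index only the rows the toplist queries actually sort and return
CREATE INDEX toplist_active_score_idx ON toplist (score) WHERE active = true;

-- the sorted read then only has to consider the (much smaller) indexed subset
SELECT player, score FROM toplist WHERE active = true ORDER BY score LIMIT 100;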


On 19 May 2005, at 23:21, Steve Bergman wrote:


Hi,

I am using postgresql in small (almost trivial) application in  
which I pull some data out of a Cobol C/ISAM file and write it into  
a pgsql table.  My users can then use the data however they want by  
interfacing to the data from OpenOffice.org.


The amount of data written is about 60MB and takes a few minutes on  
a 1200MHz Athlon with a single 60GB IDE drive running Fedora Core 3  
with pgsql 7.4.7.  I'd like to speed up the DB writes a bit if  
possible.  Data integrity is not at all critical as the database  
gets dropped, created, and populated immediately before each use.   
Filesystem is ext3, data=ordered and I need to keep it that way as  
there is other data in the filesystem that I do care about.  I have  
not done any tuning in the config file yet, and was wondering what  
things would likely speed up writes in this situation.


I'm doing the writes individually.  Is there a better way?   
Combining them all into a transaction or something?


Thanks,
Steve Bergman

---(end of  
broadcast)---

TIP 8: explain analyze is your friend





---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Select performance vs. mssql

2005-05-24 Thread Alex Turner
Until you start worrying about MVC - we have had problems with the
MSSQL implementation of read consistency because of this 'feature'.

Alex Turner
NetEconomist

On 5/24/05, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Tue, May 24, 2005 at 08:36:36 -0700,  mark durrant <[EMAIL PROTECTED]> wrote:
> > --MSSQL's ability to hit the index only and not having
> > to go to the table itself results in a _big_
> > performance/efficiency gain. If someone who's in
> > development wants to pass this along, it would be a
> > nice addition to PostgreSQL sometime in the future.
> > I'd suspect that as well as making one query faster,
> > it would make everything else faster/more scalable as
> > the server load is so much less.
> 
> This gets brought up a lot. The problem is that the index doesn't include
> information about whether the current transaction can see the referenced
> row. Putting this information in the index will add significant overhead
> to every update and the opinion of the developers is that this would be
> a net loss overall.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Filesystem

2005-06-03 Thread Alex Turner
We have been using XFS for about 6 months now and it has even tolerated
a controller card crash.  So far we have mostly good things to
report about XFS.  I benchmarked raw throughputs at various stripe
sizes, and XFS came out on top for us against reiser and ext3.  I
also used it because of its supposedly good support for large files,
which was verified somewhat by the benchmarks.

I have noticed a problem though - if you have a very large number of files
in a directory, it seems that XFS chokes on simple operations like 'ls' or
'chmod -R ...' where ext3 doesn't, don't know about reiser, I went
straight back to default after that problem (that partition is not on a
DB server though).

Alex Turner
netEconomist

On 6/3/05, Martin Fandel <[EMAIL PROTECTED]> wrote:
> Hi @ all,
> 
> i have only a little question. Which filesystem is preferred for
> postgresql? I plan to use xfs (before i used reiserfs). The reason
> is the xfs_freeze tool to make filesystem snapshots.
> Is the performance better than reiserfs, is it reliable?
> 
> best regards,
> Martin

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] random_page_cost = 1?

2005-06-09 Thread Alex Stapleton
Is this advisable? The disks are rather fast (15k iirc) but somehow I  
don't think they are covered in whatever magic fairy dust it would  
require for a random read to be as fast as a sequential one. However  
I could be wrong - are there any circumstances when this is actually  
going to help performance?
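
One low-risk way to find out is to flip the setting per session and compare plans on your own queries before touching postgresql.conf; the query below is just a placeholder:

-- try the planner setting for this session only
SET random_page_cost = 1.0;
EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_col = 42;
RESET random_page_cost;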


---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


[PERFORM] PostgreSQL using the wrong Index

2005-06-13 Thread Alex Stapleton

We have two index's like so

l1_historical=# \d "N_intra_time_idx"
   Index "N_intra_time_idx"
 Column |            Type
--------+-----------------------------
time   | timestamp without time zone
btree


l1_historical=# \d "N_intra_pkey"
 Index "N_intra_pkey"
 Column |            Type
--------+-----------------------------
symbol | text
time   | timestamp without time zone
unique btree (primary key)

and on queries like this

select * from "N_intra" where symbol='SOMETHING WHICH DOESNT EXIST'  
order by time desc limit 1;


PostgreSQL takes a very long time to complete, as it effectively  
scans the entire table, backwards. And the table is huge, about 450  
million rows. (btw, there are no triggers or any other exciting  
things like that on our tables in this db.)


but on things where the symbol does exist in the table, it's more or  
less fine, and nice and fast.


Whilst the option the planner has taken might be faster most of the  
time, the worst case scenario is unacceptable for obvious reasons.  
I've googled for trying to force the use of a specific index, but  
can't find anything relevant. Does anyone have any suggestions on  
getting it to use an index which hopefully will have better worst  
case performance?


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL using the wrong Index

2005-06-13 Thread Alex Stapleton

Oh, we are running 7.4.2 btw. And our random_page_cost = 1

On 13 Jun 2005, at 14:02, Alex Stapleton wrote:


We have two index's like so

l1_historical=# \d "N_intra_time_idx"
   Index "N_intra_time_idx"
Column |Type
+-
time   | timestamp without time zone
btree


l1_historical=# \d "N_intra_pkey"
 Index "N_intra_pkey"
Column |Type
+-
symbol | text
time   | timestamp without time zone
unique btree (primary key)

and on queries like this

select * from "N_intra" where symbol='SOMETHING WHICH DOESNT EXIST'  
order by time desc limit 1;


PostgreSQL takes a very long time to complete, as it effectively  
scans the entire table, backwards. And the table is huge, about 450  
million rows. (btw, there are no triggers or any other exciting  
things like that on our tables in this db.)


but on things where the symbol does exist in the table, it's more  
or less fine, and nice and fast.


Whilst the option the planner has taken might be faster most of the  
time, the worst case scenario is unacceptable for obvious reasons.  
I've googled for trying to force the use of a specific index, but  
can't find anything relevant. Does anyone have any suggestions on  
getting it to use an index which hopefully will have better worst  
case performance?


---(end of  
broadcast)---
TIP 1: subscribe and unsubscribe commands go to  
[EMAIL PROTECTED]






---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL using the wrong Index

2005-06-13 Thread Alex Stapleton


On 13 Jun 2005, at 15:47, John A Meinel wrote:


Alex Stapleton wrote:



Oh, we are running 7.4.2 btw. And our random_page_cost = 1



Which is only correct if your entire db fits into memory. Also, try
updating to a later 7.4 version if at all possible.



I am aware of this, I didn't configure this machine though  
unfortunately.



On 13 Jun 2005, at 14:02, Alex Stapleton wrote:



We have two index's like so

l1_historical=# \d "N_intra_time_idx"
   Index "N_intra_time_idx"
Column |Type
+-
time   | timestamp without time zone
btree



Just so you are aware, writing this as: "We have an index on
N_intra(time) and one on N_Intra(symbol, time)" is a lot more  
succinct.




Sorry, I happened to have them there in my clipboard at the time so I  
just blindly pasted them in.




l1_historical=# \d "N_intra_pkey"
 Index "N_intra_pkey"
Column |Type
+-
symbol | text
time   | timestamp without time zone
unique btree (primary key)

and on queries like this

select * from "N_intra" where symbol='SOMETHING WHICH DOESNT EXIST'
order by time desc limit 1;

PostgreSQL takes a very long time to complete, as it effectively
scans the entire table, backwards. And the table is huge, about 450
million rows. (btw, there are no triggers or any other exciting
things like that on our tables in this db.)

but on things where the symbol does exist in the table, it's  
more  or

less fine, and nice and fast.





What happens if you do:
SELECT * FROM "N_intra" WHERE symbol='doesnt exist' ORDER BY symbol,
time DESC LIMIT 1;


Hurrah! I should have thought of this, considering I've done it in the  
past :) Thanks a lot, that's great.


Yes, symbol is constant, but it frequently helps the planner  
realize it
can use an index scan if you include all terms in the index in the  
ORDER

BY clause.








Whilst the option the planner has taken might be faster most of the
time, the worst case scenario is unacceptable for obvious reasons.
I've googled for trying to force the use of a specific index, but
can't find anything relevant. Does anyone have any suggestions on
getting it to use an index which hopefully will have better worst
case performance?





Try the above first. You could also create a new index on symbol
   CREATE INDEX "N_intra_symbol_idx" ON "N_intra"(symbol);

Then the WHERE clause should use the symbol index, which means it can
know quickly that an entry doesn't exist. I'm not sure how many  
entries

you have per symbol, though, so this might cause problems in the ORDER
BY time portion.

I'm guessing what you really want is to just do the ORDER BY  
symbol, time.


John
=:->





---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Multiple disks: RAID 5 or PG Cluster

2005-06-18 Thread Alex Turner
Of course these numbers are not true as soon as you exceed the stripe
size for a read operation, which is often only 128k.  Typically a
stripe of mirrors will not read from separate halves of the mirrors
either, so RAID 10 is only N/2 best case in my experience.  RAID 0+1 is
a mirror of stripes and will read from independent halves, but gives
worse redundancy.

Alex Turner
NetEconomist

On 6/18/05, Jacques Caron <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> At 18:00 18/06/2005, PFC wrote:
> > I don't know what I'm talking about, but wouldn't mirroring be faster
> > than striping for random reads like you often get on a database ? (ie. the
> > reads can be dispatched to any disk) ? (of course, not for writes, but if
> > you won't use fsync, random writes should be reduced no ?)
> 
> Roughly, for random reads, the performance (in terms of operations/s)
> compared to a single disk setup, with N being the number of drives, is:
> 
> RAID 0 (striping):
> - read = N
> - write = N
> - capacity = N
> - redundancy = 0
> 
> RAID 1 (mirroring, N=2):
> - read = N
> - write = 1
> - capacity = 1
> - redundancy = 1
> 
> RAID 5 (striping + parity, N>=3):
> - read = N-1
> - write = 1/2
> - capacity = N-1
> - redundancy = 1
> 
> RAID 10 (mirroring + striping, N=2n, N>=4):
> - read = N
> - write = N/2
> - capacity = N/2
> - redundancy < N/2
> 
> So depending on your app, i.e. your read/write ratio, how much data can be
> cached, whether the data is important or not, how much data you have, etc,
> one or the other option might be better.
> 
> Jacques.

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-20 Thread Alex Stapleton
Hi, i'm trying to optimise our autovacuum configuration so that it  
vacuums / analyzes some of our larger tables better. It has been set  
to the default settings for quite some time. We never delete  
anything  (well not often, and not much) from the tables, so I am not  
so worried about the VACUUM status, but I am wary of XID wraparound  
nuking us at some point if we don't sort vacuuming out so we VACUUM  
at least once every year ;) However not running ANALYZE for such huge  
periods of time is probably impacting the statistics accuracy  
somewhat, and I have seen some unusually slow queries at times.  
Anyway, does anyone think we might benefit from a more aggressive  
autovacuum configuration? 


---(end of broadcast)---
TIP 6: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-20 Thread Alex Stapleton


On 20 Jun 2005, at 15:59, Jacques Caron wrote:


Hi,

At 16:44 20/06/2005, Alex Stapleton wrote:


We never delete
anything  (well not often, and not much) from the tables, so I am not
so worried about the VACUUM status



DELETEs are not the only reason you might need to VACUUM. UPDATEs  
are important as well, if not more. Tables that are constantly  
updated (statistics, session data, queues...) really need to be  
VACUUMed a lot.


We UPDATE it even less often.




but I am wary of XID wraparound
nuking us at some point if we don't sort vacuuming out so we VACUUM
at least once every year ;)



That would give you a maximum average of 31 transactions/sec...  
Don't know if that's high or low for you.


It's high as far as inserts go for us. It does them all at the end of  
each minute.





 However not running ANALYZE for such huge
periods of time is probably impacting the statistics accuracy
somewhat, and I have seen some unusually slow queries at times.
Anyway, does anyone think we might benefit from a more aggressive
autovacuum configuration?



ANALYZE is not a very expensive operation, however VACUUM can  
definitely be a big strain and take a long time on big tables,  
depending on your setup. I've found that partitioning tables (at  
the application level) can be quite helpful if you manage to keep  
each partition to a reasonable size (under or close to available  
memory), especially if the partitioning scheme is somehow time- 
related. YMMV.


Jacques.


That's not currently an option as it would require a pretty large  
amount of work to implement. I think we will have to keep that in  
mind though.
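
For the record, a very rough sketch of the time-based, application-level partitioning Jacques describes (table names invented; PostgreSQL of this vintage has no built-in partitioning, so the loader has to pick the right table itself):

-- one table per month, each small enough to ANALYZE (and VACUUM) quickly
CREATE TABLE ticks_2005_05 (symbol text, tick_time timestamp, price numeric);
CREATE TABLE ticks_2005_06 (symbol text, tick_time timestamp, price numeric);

-- a view glues the partitions back together for reads that span months
CREATE VIEW ticks AS
    SELECT * FROM ticks_2005_05
    UNION ALL
    SELECT * FROM ticks_2005_06;

-- routine maintenance then only touches the partition still being written to
ANALYZE ticks_2005_06;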



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-21 Thread Alex Stapleton


On 20 Jun 2005, at 18:46, Josh Berkus wrote:



Alex,




Hi, i'm trying to optimise our autovacuum configuration so that it
vacuums / analyzes some of our larger tables better. It has been set
to the default settings for quite some time. We never delete
anything  (well not often, and not much) from the tables, so I am not
so worried about the VACUUM status, but I am wary of XID wraparound
nuking us at some point if we don't sort vacuuming out so we VACUUM
at least once every year ;)




I personally don't use autovacuum on very large databases.   For DW,
vacuuming is far better tied to ETL operations or a clock schedule of
downtime.



Downtime is something I'd rather avoid if possible. Do you think we  
will need to run VACUUM FULL occasionally? I'd rather not lock tables  
up unless I can't avoid it. We can probably squeeze an automated  
vacuum tied to our data inserters every now and then though.
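
Tying maintenance to the inserters can be as simple as having the load job finish with something like this (table name invented); plain ANALYZE is cheap when nothing was deleted or updated:

-- run by the data inserter after each batch
ANALYZE intra_ticks;

-- when a batch also deleted or updated rows, reclaim the dead space too
VACUUM ANALYZE intra_ticks;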




XID wraparound may be further away than you think.   Try checking
pg_controldata, which will give you the current XID, and you can  
calculate
how long you are away from wraparound.  I just tested a 200G data  
warehouse
and figured out that we are 800 months away from wraparound,  
despite hourly

ETL.



Is this an 8.0 thing? I don't have a pg_controldata from what I can  
see. That's nice to hear though.
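
As a sketch of checking wraparound distance from SQL rather than pg_controldata - assuming your pg_database catalog exposes a datfrozenxid column - something like:

-- age() counts transactions since each database was last frozen;
-- the larger this grows (into the billions), the closer that database is to wraparound
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;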








However not running ANALYZE for such huge
periods of time is probably impacting the statistics accuracy
somewhat, and I have seen some unusually slow queries at times.
Anyway, does anyone think we might benefit from a more aggressive
autovacuum configuration?




Hmmm, good point, you could use autovacuum for ANALYZE only.  Just  
set the
VACUUM settings preposterously high (like 10x) so it never runs.
Then it'll
run ANALYZE only.   I generally use a threshold of 200 and a multiplier of  
0.1x for analyze; that is, re-analyze after 200 + 10% of the rows have changed.



I will try those settings out, that sounds good to me though.



--
Josh Berkus
Aglio Database Solutions
San Francisco







---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] autovacuum suggestions for 500,000,000+ row tables?

2005-06-21 Thread Alex Stapleton


On 21 Jun 2005, at 18:13, Josh Berkus wrote:


Alex,



Downtime is something I'd rather avoid if possible. Do you think we
will need to run VACUUM FULL occasionally? I'd rather not lock tables
up unless I cant avoid it. We can probably squeeze an automated
vacuum tied to our data inserters every now and then though.



As long as your update/deletes are less than 10% of the table for  
all time,

you should never have to vacuum, pending XID wraparound.



Hmm, I guess since we have hundreds of millions of rows, and when we do  
delete things it's only a few thousand, and rarely, VACUUMing  
shouldn't need to happen too often. That's good. Thanks a lot for the  
advice.



Is this an 8.0 thing? I don't have a pg_controldata from what I can
see. Thats nice to hear though.



'fraid so, yes.


Bloody Debian stable. I might have to experiment with building from  
source or using alien on debian to convert the rpms. Fun. Oh well.



--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of  
broadcast)---
TIP 1: subscribe and unsubscribe commands go to  
[EMAIL PROTECTED]






---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Mount database on RAM disk?

2005-07-09 Thread Alex Stapleton


On 8 Jul 2005, at 20:21, Merlin Moncure wrote:


Stuart,

I'm putting together a road map on how our systems can scale as our load
increases. As part of this, I need to look into setting up some fast
read only mirrors of our database. We should have more than enough RAM
to fit everything into memory. I would like to find out if I could
expect better performance by mounting the database from a RAM disk, or
if I would be better off keeping that RAM free and increasing the
effective_cache_size appropriately.

If you're accessing a dedicated, read-only system with a database small
enough to fit in RAM, it'll all be cached there anyway, at least on Linux
and BSD.   You won't be gaining anything by creating a ramdisk.

ditto windows.

Files cached in memory are slower than reading straight from memory but
not nearly enough to justify reserving memory for your use.  In other
words, your O/S is a machine with years and years of engineering
designed best how to dole memory out to caching and various processes.
Why second guess it?


Because sometimes it gets it wrong. The most brutal method is  
occasionally the most desirable. Even if it is not the "right" way to do  
it.



Merlin

---(end of  
broadcast)---

TIP 5: don't forget to increase your free space map settings




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Alex Turner
Also seems pretty silly to put it on a regular SATA connection, when
all that can manage is 150MB/sec.  If you made it connect directly
to 66/64-bit PCI then it could actually _use_ the speed of the RAM, not
to mention PCI-X.

Alex Turner
NetEconomist

On 7/26/05, John A Meinel <[EMAIL PROTECTED]> wrote:
> I saw a review of a relatively inexpensive RAM disk over at
> anandtech.com, the Gigabyte i-RAM
> http://www.anandtech.com/storage/showdoc.aspx?i=2480
> 
> Basically, it is a PCI card, which takes standard DDR RAM, and has a
> SATA port on it, so that to the system, it looks like a normal SATA drive.
> 
> The card costs about $100-150, and you fill it with your own ram, so for
> a 4GB (max size) disk, it costs around $500. Looking for solid state
> storage devices, the cheapest I found was around $5k for 2GB.
> 
> Gigabyte claims that the battery backup can last up to 16h, which seems
> decent, if not really long (the $5k solution has a built-in harddrive so
> that if the power goes out, it uses the battery power to copy the
> ramdisk onto the harddrive for more permanent storage).
> 
> Anyway, would something like this be reasonable as a drive for storing
> pg_xlog? With 4GB you could have as many as 256 checkpoint segments.
> 
> I'm a little leary as it is definitely a version 1.0 product (it is
> still using an FPGA as the controller, so they were obviously pushing to
> get the card into production).
> 
> But it seems like this might be a decent way to improve insert
> performance, without setting fsync=false.
> 
> Probably it should see some serious testing (as in power spikes/pulled
> plugs, etc). I know the article made some claim that if you actually
> pull out the card it goes into "high consumption mode" which is somehow
> greater than if you leave it in the slot with the power off. Which to me
> seems like a lot of bull, and really means the 16h is only under
> best-case circumstances. But even 1-2h is sufficient to handle a simple
> power outage.
> 
> And if you had a UPS with detection of power failure, you could always
> sync the ramdisk to a local partition before the power goes out. Though
> you could do that with a normal in-memory ramdisk (tmpfs) without having
> to buy the card. Though it does give you up-to an extra 4GB of ram, for
> machines which have already maxed out their slots.
> 
> Anyway, I thought I would mention it to the list, to see if anyone else
> has heard of it, or has any thoughts on the matter. I'm sure there are
> some people who are using more expensive ram disks, maybe they have some
> ideas about what this device is missing. (other than costing about
> 1/10th the price)
> 
> John
> =:->
> 
> 
> 
>

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Cheap RAM disk?

2005-07-26 Thread Alex Turner
Please see:

http://www.newegg.com/Product/Product.asp?Item=N82E16820145309
and
http://www.newegg.com/Product/Product.asp?Item=N82E16820145416

The price of Reg ECC is not significantly higher than regular RAM at
this point.  Plus if you go with super fast 2-2-2-6 then it's actually
more expensive than good ol' 2.5 Reg ECC.

Alex Turner
NetEconomist

On 7/26/05, PFC <[EMAIL PROTECTED]> wrote:
> 
> > I'm a little leary as it is definitely a version 1.0 product (it is
> > still using an FPGA as the controller, so they were obviously pushing to
> > get the card into production).
> 
> Not necessarily. FPGA's have become a sensible choice now. My RME 
> studio
> soundcard uses a big FPGA.
> 
> The performance in the test doesn't look that good, though, but don't
> forget it was run under windows. For instance they get 77s to copy the
> Firefox source tree on their Athlon 64/raptor ; my Duron / 7200rpm ide
> drive does it in 30 seconds, but not with windows of course.
> 
> However it doesn't use ECC so... That's a pity, because they could 
> have
> implemented ECC in "software" inside the chip, and have the benefits of
> error correction with normal, cheap RAM.
> 
> Well; wait and see...
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
>

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Need for speed

2005-08-16 Thread Alex Turner
Are you calculating aggregates, and if so, how are you doing it? (I ask
the question from experience of a similar application where I found
that my aggregating PL/pgSQL triggers were bogging the system down, and
changed them to scheduled jobs instead.)

Alex Turner
NetEconomist
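
As an illustration of moving aggregation out of triggers and into a periodic job (all names invented), the scheduled job can simply rebuild a summary table inside one transaction:

-- refresh a per-ad click summary every few minutes instead of maintaining it row-by-row in triggers
BEGIN;
DELETE FROM ad_click_summary;
INSERT INTO ad_click_summary (ad_id, clicks)
    SELECT ad_id, count(*) FROM ad_clicks GROUP BY ad_id;
COMMIT;

Reporting queries keep seeing the previous summary until the new one commits, so the rebuild does not block readers.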

On 8/16/05, Ulrich Wisser <[EMAIL PROTECTED]> wrote:
> Hello,
> 
> one of our services is click counting for on line advertising. We do
> this by importing Apache log files every five minutes. This results in a
> lot of insert and delete statements. At the same time our customers
> shall be able to do on line reporting.
> 
> We have a box with
> Linux Fedora Core 3, Postgres 7.4.2
> Intel(R) Pentium(R) 4 CPU 2.40GHz
> 2 scsi 76GB disks (15.000RPM, 2ms)
> 
> I did put pg_xlog on another file system on other discs.
> 
> Still when several users are on line the reporting gets very slow.
> Queries can take more then 2 min.
> 
> I need some ideas how to improve performance in some orders of
> magnitude. I already thought of a box with the whole database on a ram
> disc. So really any idea is welcome.
> 
> Ulrich
> 
> 
> 
> --
> Ulrich Wisser  / System Developer
> 
> RELEVANT TRAFFIC SWEDEN AB, Riddarg 17A, SE-114 57 Sthlm, Sweden
> Direct (+46)86789755 || Cell (+46)704467893 || Fax (+46)86789769
> 
> http://www.relevanttraffic.com
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
>

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] choosing RAID level for xlogs

2005-08-16 Thread Alex Turner
Theoretically RAID 5 can perform better than RAID 10 over the same
number of drives (more members form the stripe in RAID 5 than in RAID
10).  All you have to do is calculate parity faster than the drives
can write.  Doesn't seem like a hard task really, although most RAID
controllers seem incapable of doing so; it is possible that Clariion
might be able to achieve it.  The other factor is that for partial
block writes, the array has to first read the original block in order
to recalculate the parity, so small random writes are very slow.  If
you are writing chunks that are larger than your stripe size*(n-1),
then in theory the controller doesn't have to re-read a block, and can
just overwrite the parity with the new info.

Consider just four drives.  In RAID 10, it is a stripe of two mirrors,
forming two independent units to write to.  In RAID 5, it is a 3 drive
stripe with parity, giving three independent units to write to. 
Theoretically the RAID 5 should be faster, but I've yet to benchmark a
controller where this holds true.

Of course if you ever do have a drive failure, your array grinds to a
halt because rebuilding a RAID 5 requires reading (n-1) blocks to
rebuild just one block, where n is the number of drives in the array,
whereas a mirror only requires reading from a single spindle of the
RAID.

I would suggest running some benchmarks at RAID 5 and RAID 10 to see
what the _real_ performance actually is; that's the only way to really
tell.

Alex Turner
NetEconomist

On 8/16/05, Anjan Dave <[EMAIL PROTECTED]> wrote:
> Yes, that's true, though, I am a bit confused because the Clariion array
> document I am reading talks about how the write cache can eliminate the
> RAID5 Write Penalty for sequential and large IOs...resulting in better
> sequential write performance than RAID10.
> 
> anjan
> 
> 
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 16, 2005 2:00 PM
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] choosing RAID level for xlogs
> 
> Quoting Anjan Dave <[EMAIL PROTECTED]>:
> 
> > Hi,
> >
> >
> >
> > One simple question. For 125 or more checkpoint segments
> > (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or
> > 170MB) on a very busy database, what is more suitable, a separate 6
> disk
> > RAID5 volume, or a RAID10 volume? Databases will be on separate
> > spindles. Disks are 36GB 15KRPM, 2Gb Fiber Channel. Performance is
> > paramount, but I don't want to use RAID0.
> >
> 
> RAID10 -- no question.  xlog activity is overwhelmingly sequential 8KB
> writes.
> In order for RAID5 to perform a write, the host (or controller) needs to
> perform
> extra calculations for parity.  This turns into latency.  RAID10 does
> not
> perform those extra calculations.
> 
> >
> >
> > PG7.4.7 on RHAS 4.0
> >
> >
> >
> > I can provide more info if needed.
> >
> >
> >
> > Appreciate some recommendations!
> >
> >
> >
> > Thanks,
> >
> > Anjan
> >
> >
> >
> >
> > ---
> > This email message and any included attachments constitute
> confidential
> > and privileged information intended exclusively for the listed
> > addressee(s). If you are not the intended recipient, please notify
> > Vantage by immediately telephoning 215-579-8390, extension 1158. In
> > addition, please reply to this message confirming your receipt of the
> > same in error. A copy of your email reply can also be sent to
> > [EMAIL PROTECTED] Please do not disclose, copy, distribute or take
> > any action in reliance on the contents of this information. Kindly
> > destroy all copies of this message and any attachments. Any other use
> of
> > this email is prohibited. Thank you for your cooperation. For more
> > information about Vantage, please visit our website at
> > http://www.vantage.com <http://www.vantage.com/> .
> > ---
> >
> >
> >
> >
> 
> 
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
> 
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org
>

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] choosing RAID level for xlogs

2005-08-16 Thread Alex Turner
Don't forget that often controllers don't obey fsyncs like a plain
drive does.  That's the point of having a BBU ;)

Alex Turner
NetEconomist

On 8/16/05, John A Meinel <[EMAIL PROTECTED]> wrote:
> Anjan Dave wrote:
> > Yes, that's true, though, I am a bit confused because the Clariion array
> > document I am reading talks about how the write cache can eliminate the
> > RAID5 Write Penalty for sequential and large IOs...resulting in better
> > sequential write performance than RAID10.
> >
> > anjan
> >
> 
> To give a shorter statement after my long one...
> If you have enough cache that the controller can write out big chunks to
> the disk at a time, you can get very good sequential RAID5 performance,
> because the stripe size is large (so it can do a parallel write to all
> disks).
> 
> But for small chunk writes, you suffer the penalty of the read before
> write, and possible multi-disk read (depends on what is in cache).
> 
> RAID10 generally handles small writes better, and I would guess that
> 4disks would perform almost identically to 6disks, since you aren't
> usually writing enough data to span multiple stripes.
> 
> If your battery-backed cache is big enough that you don't fill it, they
> probably perform about the same (superfast) since the cache hides the
> latency of the disks.
> 
> If you start filling up your cache, RAID5 probably can do better because
> of the parallelization.
> 
> But small writes followed by an fsync do favor RAID10 over RAID5.
> 
> John
> =:->
> 
> 
>

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] choosing RAID level for xlogs

2005-08-16 Thread Alex Turner
The other point that is well made is that with enough drives you will
max out the PCI bus before you max out the drives.  64-bit 66MHz can
do about 400MB/sec, which can be achieved by two 3 drive stripes (6
drives in RAID 10).  A true PCI-X card can do better, but can your
controller?  Remember, U320 is only 320MB/channel...

Alex Turner
NetEconomist

On 8/16/05, Anjan Dave <[EMAIL PROTECTED]> wrote:
> Thanks, everyone. I got some excellent replies, including some long 
> explanations. Appreciate the time you guys took out for the responses.
> 
> The gist of it i take, is to use RAID10. I have 400MB+ of write cache on the 
> controller(s), that the RAID5 LUN(s) could benefit from by filling it up and 
> writing out the complete stripe, but come to think of it, it's shared among 
> the two Storage Processors, all the LUNs, not just the ones holding the 
> pg_xlog directory. The other thing (with Clariion) is the write cache 
> mirroring. Write isn't signalled complete to the host until the cache content 
> is mirrored across the other SP (and vice-versa), which is a good thing, but 
> this operation could potentially become a bottleneck with very high load on 
> the SPs.
> 
> Also, one would have to fully trust the controller/manufacturer's claim on 
> signalling the write completion. And, performance is a priority over the 
> drive space lost in RAID10 for me.
> 
> I can use 4 drives instead of 6.
> 
> Thanks,
> Anjan
> 
> -Original Message-
> From: Gregory S. Williamson [mailto:[EMAIL PROTECTED]
> Sent: Tue 8/16/2005 6:22 PM
> To: Anjan Dave; pgsql-performance@postgresql.org
> Cc:
> Subject: RE: [PERFORM] choosing RAID level for xlogs
> 
> 
> 
> I would be very cautious about ever using RAID5, despite 
> manufacturers' claims to the contrary. The link below is authored by a very 
> knowledgable fellow whose posts I know (and trust) from Informix land.
> 
> <http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt>
> 
> Greg Williamson
> DBA
> GlobeXplorer LLC
> 
> 
> -Original Message-
> From:   [EMAIL PROTECTED] on behalf of Anjan Dave
> Sent:   Mon 8/15/2005 1:35 PM
> To: pgsql-performance@postgresql.org
> Cc:
> Subject:[PERFORM] choosing RAID level for xlogs
> Hi,
> 
> 
> 
> One simple question. For 125 or more checkpoint segments
> (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or
> 170MB) on a very busy database, what is more suitable, a separate 6 
> disk
> RAID5 volume, or a RAID10 volume? Databases will be on separate
> spindles. Disks are 36GB 15KRPM, 2Gb Fiber Channel. Performance is
> paramount, but I don't want to use RAID0.
> 
> 
> 
> PG7.4.7 on RHAS 4.0
> 
> 
> 
> I can provide more info if needed.
> 
> 
> 
> Appreciate some recommendations!
> 
> 
> 
> Thanks,
> 
> Anjan
> 
> 
> 
> 
> ---
> This email message and any included attachments constitute 
> confidential
> and privileged information intended exclusively for the listed
> addressee(s). If you are not the intended recipient, please notify
> Vantage by immediately telephoning 215-579-8390, extension 1158. In
> addition, please reply to this message confirming your receipt of the
> same in error. A copy of your email reply can also be sent to
> [EMAIL PROTECTED] Please do not disclose, copy, distribute or take
> any action in reliance on the contents of this information. Kindly
> destroy all copies of this message and any attachments. Any other use 
> of
> this email is prohibited. Thank you for your cooperation. For more
> information about Vantage, please visit our website at
> http://www.vantage.com <http://www.vantage.com/> .
> ---
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ---(end of broadcast)---
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>choose an index scan if your joining column's datatypes do not
>match
>

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] sustained update load of 1-2k/sec

2005-08-19 Thread Alex Turner
I have managed tx speeds that high from postgresql going even as high
as 2500/sec for small tables, but it does require a good RAID
controller card (yes, I'm even running with fsync on).  I'm using 3ware
9500S-8MI with Raptor drives in multiple RAID 10s.  The box wasn't too
$$$ at just around $7k.  I have two independent controllers on two
independent PCI buses to give max throughput, one with a 6 drive RAID
10 and the other with two 4 drive RAID 10s.

Alex Turner
NetEconomist

On 8/19/05, Mark Cotner <[EMAIL PROTECTED]> wrote:
> Hi all,
> I bet you get tired of the same ole questions over and
> over.
> 
> I'm currently working on an application that will poll
> thousands of cable modems per minute and I would like
> to use PostgreSQL to maintain state between polls of
> each device.  This requires a very heavy amount of
> updates in place on a reasonably large table(100k-500k
> rows, ~7 columns mostly integers/bigint).  Each row
> will be refreshed every 15 minutes, or at least that's
> how fast I can poll via SNMP.  I hope I can tune the
> DB to keep up.
> 
> The app is threaded and will likely have well over 100
> concurrent db connections.  Temp tables for storage
> aren't a preferred option since this is designed to be
> a shared nothing approach and I will likely have
> several polling processes.
> 
> Here are some of my assumptions so far . . .
> 
> HUGE WAL
> Vacuum hourly if not more often
> 
> I'm getting 1700tx/sec from MySQL and I would REALLY
> prefer to use PG.  I don't need to match the number,
> just get close.
> 
> Is there a global temp table option?  In memory tables
> would be very beneficial in this case.  I could just
> flush it to disk occasionally with an insert into blah
> select from memory table.
> 
> Any help or creative alternatives would be greatly
> appreciated.  :)
> 
> 'njoy,
> Mark
> 
> 
> --
> Writing software requires an intelligent person,
> creating functional art requires an artist.
> -- Unknown
> 
> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq
>

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] sustained update load of 1-2k/sec

2005-08-19 Thread Alex Turner
Don't forget that Ultra 320 is the speed of the bus, not each drive.
No matter how many honking 15k disks you put on a 320MB bus, you can
only get 320MB/sec in total, and only so many outstanding I/Os on the bus.

Not so with SATA! Each drive is on its own bus, and you are only
limited by the speed of your PCI-X bus, which can be as high as
800MB/sec at 133MHz/64bit.

It's cheap and it's fast - all you have to do is pay for the
enclosure, which can be a bit pricey, but there are some nice 24-bay
and even 40-bay enclosures out there for SATA.

Yes, a 15k RPM drive will give you better seek time and better peak
throughput, but put them all on a single U320 bus and you won't see
much return past a stripe of 3 or 4 drives.

If it's raw transactions per second, it's all about the xlog baby,
which is sequential writes; if it's data warehouse style, it's all
about large block reads, which are sequential reads.

Alex Turner
NetEconomist
P.S. Sorry if I'm a bit punchy, I've been up since yesterday with
server upgrade nightmares that continue ;)

On 8/19/05, Ron <[EMAIL PROTECTED]> wrote:
> Alex mentions a nice setup, but I'm pretty sure I know how to beat
> that IO subsystems HW's performance by at least 1.5x or 2x.  Possibly
> more.  (No, I do NOT work for any vendor I'm about to discuss.)
> 
> Start by replacing the WD Raptors with Maxtor Atlas 15K II's.
> At 5.5ms average access, 97.4MB/s outer track throughput, 85.9MB/s
> average, and 74.4 MB/s inner track throughput, they have the best
> performance characteristics of any tested shipping HDs I know
> of.  (Supposedly the new SAS versions will _sustain_ ~98MB/s, but
> I'll believe that only if I see it under independent testing).
> In comparison, the numbers on the WD740GD are 8.1ms average access,
> 71.8, 62.9, and 53.9 MB/s outer, average and inner track throughputs
> respectively.
> 
> Be prepared to use as many of them as possible (read: as many you can
> afford) if you want to maximize transaction rates, particularly for
> small transactions like this application seems to be mentioning.
> 
> Next, use a better RAID card.  The TOL enterprise stuff (Xyratex,
> Engino, Dot-hill) is probably too expensive, but in the commodity
> market benchmarks indicate that that Areca's 1GB buffer RAID cards
> currently outperform all the other commodity RAID stuff.
> 
> 9 Atlas II's per card in a RAID 5 set, or 16 per card in a RAID 10
> set, should max the RAID card's throughput and come very close to, if
> not attaining, the real world peak bandwidth of the 64b 133MHz PCI-X
> bus they are plugged into.  Say somewhere in the 700-800MB/s range.
> 
> Repeat the above for as many independent PCI-X buses as you have for
> a very fast commodity RAID IO subsystem.
> 
> Two such configured cards used in the same manner as mentioned by
> Alex should easily attain 1.5x - 2x the transaction numbers mentioned
> by Alex unless there's a bottleneck somewhere else in the system design.
> 
> Hope this helps,
> Ron Peacetree
> 
> At 08:40 AM 8/19/2005, Alex Turner wrote:
> >I have managed tx speeds that high from postgresql going even as high
> >as 2500/sec for small tables, but it does require a good RAID
> >controler card (yes I'm even running with fsync on).  I'm using 3ware
> >9500S-8MI with Raptor drives in multiple RAID 10s.  The box wasn't too
> >$$$ at just around $7k.  I have two independant controlers on two
> >independant PCI buses to give max throughput. on with a 6 drive RAID
> >10 and the other with two 4 drive RAID 10s.
> >
> >Alex Turner
> >NetEconomist
> >
> >On 8/19/05, Mark Cotner <[EMAIL PROTECTED]> wrote:
> > > Hi all,
> > > I bet you get tired of the same ole questions over and
> > > over.
> > >
> > > I'm currently working on an application that will poll
> > > thousands of cable modems per minute and I would like
> > > to use PostgreSQL to maintain state between polls of
> > > each device.  This requires a very heavy amount of
> > > updates in place on a reasonably large table(100k-500k
> > > rows, ~7 columns mostly integers/bigint).  Each row
> > > will be refreshed every 15 minutes, or at least that's
> > > how fast I can poll via SNMP.  I hope I can tune the
> > > DB to keep up.
> > >
> > > The app is threaded and will likely have well over 100
> > > concurrent db connections.  Temp tables for storage
> > > aren't a preferred option since this is designed to be
> > > a shared nothing approach and I will likely have
> > > several polling processes.
> > >
> > > Here are some of my a

Re: [PERFORM] Avoid using swap in a cluster

2005-09-02 Thread Alex Stapleton


On 2 Sep 2005, at 10:42, Richard Huxton wrote:


Ricardo Humphreys wrote:


Hi all.
 In a cluster, is there any way to use the main memory of the  
other nodes instead of the swap? If I have a query with many sub- 
queries and a lot of data, I can easily fill all the memory in a  
node. The point is: is there any way to continue using the main  
memory from other nodes in the same query instead of the swap?




I don't know of any clustered version of PG that can spread queries  
over multiple machines. Can I ask what you are using?


IIRC Greenplum's DeepGreen MPP (Version 2) can do it. It does cost
money, but it is a very nice product.



--
  Richard Huxton
  Archonet Ltd

---(end of  
broadcast)---

TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Query take 101 minutes, help, please

2005-09-07 Thread Alex Hayward
On Wed, 7 Sep 2005, Meetesh Karia wrote:

> PG is creating the union of January, February and March tables first and
> that doesn't have an index on it. If you're going to do many queries using
> the union of those three tables, you might want to place their contents into
> one table and create an index on it.
>
> Otherwise, try something like this:
>
> SELECT "Rut Cliente"
> FROM "Internet_Abril"
> WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM
> "Internet_Enero")
> AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM
> "Internet_Febrero")
> AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM
> "Internet_Marzo");

You may also wish to try:

SELECT "Rut Cliente"
FROM "Internet_Abril"
WHERE NOT EXISTS
  (SELECT 1 FROM "Internet_Enero"
   WHERE "Internet_Enero"."Rut Cliente"="Internet_Abril"."Rut Cliente")
AND NOT EXISTS
  (SELECT 1 FROM "Internet_Febrero"
   WHERE "Internet_Febrero"."Rut Cliente"="Internet_Abril"."Rut Cliente")
AND NOT EXISTS
  (SELECT 1 FROM "Internet_Marzo"
   WHERE "Internet_Marzo"."Rut Cliente"="Internet_Abril"."Rut Cliente")

which will probably scan the indexes on the January, February and March
tables once for each row in the April table.
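
For that approach to pay off, each of the earlier months needs an index on
the join column. A minimal sketch, assuming no such indexes exist yet (the
index names here are just examples):

CREATE INDEX internet_enero_rut_cliente_idx   ON "Internet_Enero" ("Rut Cliente");
CREATE INDEX internet_febrero_rut_cliente_idx ON "Internet_Febrero" ("Rut Cliente");
CREATE INDEX internet_marzo_rut_cliente_idx   ON "Internet_Marzo" ("Rut Cliente");

-- refresh planner statistics so the new indexes get used
ANALYZE "Internet_Enero";
ANALYZE "Internet_Febrero";
ANALYZE "Internet_Marzo";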


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance considerations for very heavy INSERT traffic

2005-09-12 Thread Alex Turner
Split your system into multiple partitions of RAID 10s. For max
performance, use a ten drive RAID 10 for pg_xlog (this will max out a
PCI-X bus) on Bus A, and multiple 4/6 drive RAID 10s for tablespaces on Bus
B. For max performance I would recommend using one RAID 10 for raw data
tables, one for aggregate tables and one for indexes.  More RAM
will only help you with queries against your data; if you are
pre-aggregating, then you may not need all that much RAM.
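
A rough sketch of that tablespace split (tablespaces need PostgreSQL 8.0 or
later; the directory paths, table and column names below are made-up examples,
not anything from this thread). pg_xlog itself is placed at initdb time or via
a symlink, not through SQL:

CREATE TABLESPACE raw_space LOCATION '/raid10_a/pgdata';
CREATE TABLESPACE agg_space LOCATION '/raid10_b/pgdata';
CREATE TABLESPACE idx_space LOCATION '/raid10_c/pgdata';

-- raw data tables on one array, aggregate tables on another
CREATE TABLE readings_raw (
    client_id integer NOT NULL,
    ts        timestamptz NOT NULL,
    value     double precision
) TABLESPACE raw_space;

CREATE TABLE readings_hourly (
    client_id integer NOT NULL,
    hour      timestamptz NOT NULL,
    avg_value double precision
) TABLESPACE agg_space;

-- indexes on their own spindles
CREATE INDEX readings_raw_client_ts_idx ON readings_raw (client_id, ts) TABLESPACE idx_space;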

You can easily get 100 transactions per second with even less hardware with a little data partitioning.

Choose your controller carefully, as many don't co-operate with Linux well.
Alex Turner
NetEconomist
On 9/12/05, Brandon Black <[EMAIL PROTECTED]> wrote:

I'm in the process of developing an application which uses PostgreSQL
for data storage.  Our database traffic is very atypical, and as a
result it has been rather challenging to figure out how to best tune
PostgreSQL on what development hardware we have, as well as to figure
out exactly what we should be evaluating and eventually buying for
production hardware.

The vast, overwhelming majority of our database traffic is pretty much
a non-stop stream of INSERTs filling up tables.  It is akin to
data acquisition.  Several thousand clients are sending
once-per-minute updates full of timestamped numerical data at our
central server, which in turn performs INSERTs into several distinct
tables as part of the transaction for that client.  We're talking
on the order of ~100 transactions per second, each containing INSERTs
to multiple tables (which contain only integer and floating point
columns and a timestamp column - the primary key (and only index) is on
a unique integer ID for the client and the timestamp).  The
transaction load is spread evenly over time by having the clients send
their per-minute updates at random times rather than on the exact
minute mark.

There will of course be users using a web-based GUI to extract data
from these tables and display them in graphs and whatnot, but the
SELECT query traffic will always be considerably less frequent and
intensive than the incessant INSERTs, and it's not that big a deal if
the large queries take a little while to run.

This data also expires - rows with timestamps older than X days will be
DELETEd periodically (once an hour or faster), such that the tables
will reach a relatively stable size (pg_autovacuum is handling
vacuuming for now, but considering our case, we're thinking of killing
pg_autovacuum in favor of having the periodic DELETE process also do a
vacuum of affected tables right after the DELETE, and then have it
vacuum the other low traffic tables once a day while it's at it).

There is an aggregation layer in place which proxies the inbound data
from the clients into a small(er) number of persistent postgresql
backend processes.  Right now we're doing one aggregator per 128
clients (so instead of 128 seperate database connections over the
course of a minute for a small transaction each, there is a single
database backend that is constantly committing transactions at a rate
of ~ 2/second).  At a test load of ~1,000 clients, we would have 8
aggregators running and 8 postgresql backends.  Testing has seemed
to indicate we should aggregate even harder - the planned production
load is ~5,000 clients initially, but will grow to almost double that
in the not-too-distant future, and that would mean ~40 backends at 128
clients each initially.  Even on 8 cpus, I'm betting 40 concurrent
backends doing 2 tps is much worse off than 10 backends doing 8 tps.


Test hardware right now is a dual Opteron with 4G of ram, which we've
barely gotten 1,000 clients running against.  Current disk hardware in
testing is whatever we could scrape together (4x 3-ware PCI hardware
RAID controllers, with 8 SATA drives in a RAID10 array off of each -
aggregated up in a 4-way stripe with linux md driver and then formatted
as ext3 with an appropriate stride parameter and data=""
Production will hopefully be a 4-8-way Opteron, 16 or more G of RAM,
and a fiberchannel hardware raid array or two (~ 1TB available RAID10 storage) with 15krpm disks and
battery-backed write cache.

I know I haven't provided a whole lot of application-level detail here,
but does anyone have any general advice on tweaking postgresql to deal
with a very heavy load of concurrent and almost exclusively write-only
transactions?  Increasing shared_buffers seems to always help,
even out to half of the dev box's ram (2G).  A 100ms commit_delay
seemed to help, but tuning it (and _siblings) has been difficult. 
We're using 8.0 with the default 8k blocksize, but are strongly
considering both developing against 8.1 (seems it might handle the
heavy concurrency better), and re-compiling with 32k blocksize since
our storage arrays will inevitably be using fairly wide stripes. 
Any advice on any of this (other than drop the project while you're
still a little bit sane)?

--Brandon
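
A minimal sketch of the hourly expiry pass described above, with a placeholder
table name and retention period (note that VACUUM cannot run inside a
transaction block):

-- delete expired rows, then reclaim the dead tuples right away
DELETE FROM readings_raw WHERE ts < now() - interval '7 days';
VACUUM ANALYZE readings_raw;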





Re: [PERFORM] RAID Stripe size

2005-09-20 Thread Alex Turner
I have benched different stripe sizes with different file systems, and the performance differences can be quite dramatic.

Theoretically a smaller stripe is better for OLTP as you can write more
small transactions independently to more different disks more often
than not, but a large stripe size is good for data warehousing as you
are often doing very large sequential reads, and a larger stripe size
is going to exploit the on-drive cache as you request larger single
chunks from the disk at a time.

It also seems that different controllers are partial to different
defaults that can affect their performance, so I would suggest that
testing this on two different controller cards may be less than optimal.

I would also recommend looking at the file system.  For us JFS worked
significantly faster than reiser for large read loads and large write
loads, so we chose JFS over ext3 and reiser.

I found that lower stripe sizes impacted performance badly, as did overly large stripe sizes.

Alex Turner
NetEconomist

On 16 Sep 2005 04:51:43 -0700, bmmbn <[EMAIL PROTECTED]> wrote:
Hi Everyone

The machine is IBM x345 with ServeRAID 6i 128mb cache and 6 SCSI 15k disks.
2 disks are in RAID1 and hold the OS, SWAP & pg_xlog
4 disks are in RAID10 and hold the Cluster itself.

The DB will have two major tables, one with 10 million rows and one with
100 million rows. All the activity against these tables will be SELECT.

Currently the stripe size is 8k. I read in many places this is a poor
setting. Am I right?

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] RAID Stripe size

2005-09-20 Thread Alex Turner
I have found JFS to be just fine.  We have been running a medium
load on this server for 9 months with no unscheduled down time.
The database is about 30gig on disk, and we get about 3-4 requests per
second that generate result sets in the thousands from about 8am to
about 11pm.

I have found that JFS barfs if you put a million files in a directory
and try to do an 'ls', but then so did reiser; only Ext3 handled this
test successfully.  Fortunately with a database this is an
atypical situation, so JFS has been fine for the DB for us so far.

We have had severe problems with Ext3 when file systems hit 100% usage:
they get all kinds of unhappy. We haven't had the same problem with JFS.

Alex Turner
NetEconomist

On 9/20/05, Welty, Richard <[EMAIL PROTECTED]> wrote:
Alex Turner wrote:
> I would also recommend looking at the file system.  For us JFS worked significantly
> faster than reiser for large read loads and large write loads, so we chose JFS
> over ext3 and reiser.

has jfs been reliable for you? there seems to be a lot of conjecture about instability,
but i find jfs a potentially attractive alternative for a number of reasons.

richard

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Need for speed 2

2005-09-20 Thread Alex Turner
I have found that while the OS may flush to the controller fast with
fsync=true, the controller does as it pleases (it has BBU, so I'm not
too worried), so you get great performance because your controller is
determining the read/write sequence outside of what is being demanded by
fsync.

Alex Turner
NetEconomist

On 8/25/05, Kelly Burkhart <[EMAIL PROTECTED]> wrote:
On Thu, 2005-08-25 at 11:16 -0400, Ron wrote:
> ># - Settings -
> >
> >fsync = false              # turns forced synchronization on or off
> >#wal_sync_method = fsync   # the default varies across platforms:
> >                           # fsync, fdatasync, open_sync, or
>
> I hope you have a battery backed write buffer!

Battery backed write buffer will do nothing here, because the OS is
taking its sweet time flushing to the controller's battery backed write
buffer!

Isn't the reason for battery backed controller cache to make fsync()s
fast?

-K

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Monitoring Postgresql performance

2005-09-28 Thread Alex Stapleton


On 28 Sep 2005, at 15:32, Arnau wrote:



Hi all,

  I have been "googling" a bit searching info about a way to  
monitor postgresql (CPU & Memory, num processes, ... ) and I  
haven't found anything relevant. I'm using munin to monitor others  
parameters of my servers and I'd like to include postgresql or have  
a similar tool. Any of you is using anything like that? all kind of  
hints are welcome :-)


Cheers!



Have you looked at SNMP? It's a bit complex but there are lots of tools
for monitoring system data / sending alerts based on SNMP already.
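
If you just want a few PostgreSQL-level numbers to feed into munin next to
the SNMP data, the built-in statistics views are one starting point -- a small
sketch, assuming the stats collector is enabled in postgresql.conf:

-- number of connected backends
SELECT count(*) AS backends FROM pg_stat_activity;

-- per-database counters: transactions, blocks read from disk, blocks found in cache
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit
FROM pg_stat_database;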



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Indexes on ramdisk

2005-10-04 Thread Alex Turner
Talk about your IO system a bit.  There might be obvious ways to improve.

What System/Motherboard are you using?
What Controller Cards are you using?
What kind of Disks do you have (SATA, SCSI 7.6k 10k 15k)?
What denominations (9, 18, 36, 72, 143, 80, 160, 200, 240Gig)?
What kind of RAIDs do you have set up (how many drives, what stripe sizes, how many used for what)?
What levels of RAID are you using (0,1,10,5,50)?

With a good setup, a dual PCI-X bus motherboard can hit 2GB/sec and
thousands of transactions to disk if you have a controller/disks
that can keep up.  That is typically enough for most people without
resorting to SSD.

Alex Turner
NetEconomist

On 10/4/05, Emil Briggs <[EMAIL PROTECTED]> wrote:
I have an application that has a table that is both read and write intensive.
Data from iostat indicates that the write speed of the system is the factor
that is limiting performance. The table has around 20 columns and most of the
columns are indexed. The data and the indices for the table are distributed
over several mirrored disk partitions and pg_xlog is on another. I'm looking
at ways to improve performance and besides the obvious one of getting an SSD
I thought about putting the indices on a ramdisk. That means that after a
power failure or shutdown I would have to recreate them but that is
acceptable for this application. What I am wondering though is whether or not
I would see much performance benefit and if there would be any startup
problems after a power down event due to the indices not being present. Any
insight would be appreciated.

Emil

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
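
On the ramdisk idea quoted above: with 8.0 or later, one way to try it is a
tablespace on the ramdisk that holds only the indexes -- a hypothetical sketch
(mount point and names made up), with the caveat that whether the server comes
up cleanly when that directory is empty after a power loss is exactly the open
question in the post, so test that before relying on it:

-- ramdisk already mounted at /mnt/ramdisk and owned by the postgres user
CREATE TABLESPACE ram_ix LOCATION '/mnt/ramdisk/pgidx';

-- put only the indexes there; after a power loss they must be recreated
CREATE INDEX busy_table_col1_idx ON busy_table (col1) TABLESPACE ram_ix;
CREATE INDEX busy_table_col2_idx ON busy_table (col2) TABLESPACE ram_ix;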


Re: [PERFORM] Indexes on ramdisk

2005-10-05 Thread Alex Turner
What kind of order of improvement do you need to see?

What period are these numbers for?  Were they collected over 1 hour, 1 day, 1 month?

How much Cache do you have on the controller?

You can certainly get more speed by adding more disks and possibly by
adding more controller RAM/a second controller.  10 disks isn't
really that many for a totally kick-ass DB server.  You can
achieve more block writes with RAID 10s than with RAID 1s.  Whether
it's cost effective is dependent on lots of factors like your chassis
and drive enclosures etc. vs SSD.  SSD will be faster, but last I
heard it was expensive, and I checked a few websites but couldn't get much
price info.  Normally when you can't get price info, that's a bad
sign ;).  If you are doing large chunks of writes to a small
number of tables, then you might be better off with a single large RAID
10 for your tablespace than with separate RAID 1s.  If you are
writing 5 to 1 more table data than index data, you are hurting
yourself by separating onto multiple RAID 1s instead of a single RAID
10, which could write at 2-3x for the 5 and 2-3x for the 1, only
suffer a single seek penalty, and get data onto disk two to three
times as fast (depending how many RAID 1s you join).  Try
un-separating the RAID 1s and combining them into a RAID 10 for indexes and
tablespaces.  The controller will re-sequence your writes/reads to
help with efficiency, and dbwriter is there to make things go easier.

You can at least get some idea by doing an iostat and seeing how many IOs
and how much throughput is happening. That will rapidly help determine
if you are bound by IOs or by MB/sec.

Worst case I'm wrong, but IMHO it's worth a try.

Alex Turner
NetEconomist

On 10/4/05, Emil Briggs <[EMAIL PROTECTED]> wrote:
> Talk about your IO system a bit. There might be obvious ways to improve.
>
> What System/Motherboard are you using?
> What Controller Cards are you using?
> What kind of Disks do you have (SATA, SCSI 7.6k 10k 15k)
> What denominations (9, 18, 36, 72, 143, 80, 160, 200 240Gig)?
> What kind of RAIDs do you have setup (How many drives what stripe sizes,
> how many used for what).
> What levels of RAID are you using (0,1,10,5,50)?
>

It's a quad opteron system. RAID controller is a 4 channel LSILogic Megaraid
320 connected to 10 15k 36.7G SCSI disks. The disks are configured in 5
mirrored partitions. The pg_xlog is on one mirror and the data and indexes
are spread over the other 4 using tablespaces. These numbers from
pg_stat_user_tables are from about 2 hours earlier today on this one table.

idx_scan        20578690
idx_tup_fetch   35866104841
n_tup_ins       1940081
n_tup_upd       1604041
n_tup_del       1880424

---(end of broadcast)---
TIP 4: Have you searched our list archives?
   http://archives.postgresql.org


Re: [PERFORM] Performance on SUSE w/ reiserfs

2005-10-11 Thread Alex Turner
Realise also that unless you are running the 1.5 x86-64 build, Java
will not use more than 1Gig, and if the app server requests more than
1gig, Java will die (I've been there) with an out of memory error, even
though there is plenty of free mem available.  This can easily be
caused by a lazy GC thread if the application is running high on CPU
usage.

The kernel will not report memory used for caching pages as being
unavailable; if a program calls malloc, the kernel will just evict the
oldest cached disk pages and give the memory to the application.

Your free -mo shows 3 gig free even with cached disk pages.  It
looks to me more like either a Java problem or a kernel problem...

Alex Turner
NetEconomist

On 10/10/05, Jon Brisbin <[EMAIL PROTECTED]> wrote:
Tom Lane wrote:
>
> Are you sure it's not cached data pages, rather than cached inodes?
> If so, the above behavior is *good*.
>
> People often have a mistaken notion that having near-zero free RAM means
> they have a problem.  In point of fact, that is the way it is supposed
> to be (at least on Unix-like systems).  This is just a reflection of the
> kernel doing what it is supposed to do, which is to use all spare RAM
> for caching recently accessed disk pages.  If you're not swapping then
> you do not have a problem.

Except for the fact that my Java App server crashes when all the
available memory is being used by caching and not reclaimed :-)

If it wasn't for the app server going down, I probably wouldn't care.

--
Jon Brisbin
Webmaster
NPC International, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly



Re: [PERFORM] Performance on SUSE w/ reiserfs

2005-10-11 Thread Alex Turner
Perhaps this is true for 1.5 on x86-32 (I've only used it on x86-64),
but I was thinking more of 1.4, which many folks are still using.

Alex

On 10/11/05, Alan Stange <[EMAIL PROTECTED]> wrote:
Alex Turner wrote:
> Realise also that unless you are running the 1.5 x86-64 build, java
> will not use more than 1Gig, and if the app server requests more than
> 1gig, Java will die (I've been there) with an out of memory error,
> even though there is plenty of free mem available.  This can easily be
> caused by a lazy GC thread if the application is running high on CPU usage.

On my side of Planet Earth, the standard non-x64 1.5 JVM will happily
use more than 1G of memory (on linux and Solaris, can't speak for
Windows).  If you're running larger programs, it's probably a good idea
to use the -server compiler in the JVM as well.  I regularly run with
-Xmx1800m and regularly have >1GB heap sizes.

The standard GC will not cause an OOM error if space remains for the
requested object.  The GC thread blocks all other threads during its
activity, whatever else is happening on the machine.  The
newer/experimental GC's did have some potential race conditions, but I
believe those have been resolved in the 1.5 JVMs.

Finally, note that the latest _05 release of the 1.5 JVM also now
supports large page sizes on Linux and Windows:
-XX:+UseLargePages   this can be quite beneficial depending on the
memory patterns in your programs.

-- Alan


Re: [PERFORM] Performance on SUSE w/ reiserfs

2005-10-11 Thread Alex Turner
Well - to each his own I guess - we did extensive testing on 1.4, and
it refused to allocate much past 1gig on both Linux x86/x86-64 and
Windows.

Alex

On 10/11/05, Alan Stange <[EMAIL PROTECTED]> wrote:
Alex Turner wrote:
> Perhaps this is true for 1.5 on x86-32 (I've only used it on x86-64)
> but I was more thinking 1.4 which many folks are still using.

The 1.4.x JVM's will also work just fine with much more than 1GB of
memory.   Perhaps you'd like to try again?

-- Alan


Re: [PERFORM] What gets cached?

2005-10-21 Thread Alex Turner
Oracle uses an LRU caching algorithm as well, not LFU.

Alex

On 10/21/05, Martin Nickel <[EMAIL PROTECTED]> wrote:
I was reading a comment in another posting and it started me thinking
about this.  Let's say I startup an Oracle server.  All my queries are a
little bit (sometimes a lot bit) slow until it gets its "normal" things in
memory, then it's up to speed.  The "normal" things would include some
small lookup tables and the indexes for the most frequently used tables.
Let's say I do the same thing in Postgres.  I'm likely to have my very
fastest performance for the first few queries until memory gets filled up.
The only time Postgres seems to take advantage of cached data is when I
repeat the same (or substantially the same) query.  I don't know of any
way to view what is actually cached at any point in time, but it seems
like "most recently used" rather than "most frequently used".
Does this seem true?

---(end of broadcast)---
TIP 4: Have you searched our list archives?
   http://archives.postgresql.org


Re: [PERFORM] Used Memory

2005-10-21 Thread Alex Turner
[snip] to the second processor in my dual Xeon eServer) has got me to the
point that the perpetually high memory usage doesn't affect my
application server.
I'm curious - how does the high memory usage affect your application server?

Alex 



Re: [PERFORM] What gets cached?

2005-10-24 Thread Alex Turner
Just to play devil's advocate here for a second: if we have an
algorithm that is substantially better than just plain old LRU, which is
what I believe the kernel is going to use to cache pages (I'm no kernel
hacker), then why don't we apply that and have a significantly larger
page cache a la Oracle?

Alex

On 10/21/05, Neil Conway <[EMAIL PROTECTED]> wrote:
On Fri, 2005-21-10 at 07:34 -0500, Martin Nickel wrote:
> Let's say I do the same thing in Postgres.  I'm likely to have my very
> fastest performance for the first few queries until memory gets filled up.

No, you're not: if a query doesn't hit the cache (both the OS cache and
the Postgres userspace cache), it will run slower. If the caches are
empty when Postgres starts up (which is true for the userspace cache and
might be true of the OS cache), the first queries that are run should be
slower, not faster.

> The only time Postgres seems to take advantage of cached data is when I
> repeat the same (or substantially the same) query.

Caching is done on a page-by-page basis -- the source text of the query
itself is not relevant. If two different queries happen to hit a similar
set of pages, they will probably both benefit from the same set of
cached pages.

> I don't know of any way to view what is actually cached at any point
> in time, but it seems like "most recently used" rather than "most
> frequently used".

The cache replacement policy in 7.4 and older releases is simple LRU.
The policy in 8.0 is ARC (essentially a version of LRU modified to try
to retain hot pages more accurately). The policy in 8.1 is a clock-based
algorithm.

-Neil

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Is There Any Way ....

2005-10-24 Thread Alex Turner
This is possible with Oracle utilizing the keep pool

alter table t_name storage ( buffer_pool keep);

If Postgres were to implement its own caching system, this seems like
it would be easy to implement (beyond the initial caching effort).

Alex


On 10/24/05, Craig A. James <[EMAIL PROTECTED]> wrote:
> Jim C. Nasby"  wrote:
> > > Stefan Weiss wrote:
> > > ... IMO it would be useful to have a way to tell
> > > PG that some tables were needed frequently, and should be cached if
> > > possible. This would allow application developers to consider joins with
> > > these tables as "cheap", even when querying on columns that are
> > > not indexed.
> >
> > Why do you think you'll know better than the database how frequently
> > something is used? At best, your guess will be correct and PostgreSQL
> > (or the kernel) will keep the table in memory. Or, your guess is wrong
> > and you end up wasting memory that could have been used for something
> > else.
> >
> > It would probably be better if you describe why you want to force this
> > table (or tables) into memory, so we can point you at more appropriate
> > solutions.
>
> Or perhaps we could explain why we NEED to force these tables into memory, so 
> we can point you at a more appropriate implementation.  ;-)
>
> Ok, wittiness aside, here's a concrete example.  I have an application with 
> one critical index that MUST remain in memory at all times.  The index's 
> tablespace is about 2 GB.  As long as it's in memory, performance is 
> excellent - a user's query takes a fraction of a second.  But if it gets 
> swapped out, the user's query might take up to five minutes as the index is 
> re-read from memory.
>
> Now here's the rub.  The only performance I care about is response to queries 
> from the web application.  Everything else is low priority.  But there is 
> other activity going on.  Suppose, for example, that I'm updating tables, 
> performing queries, doing administration, etc., etc., for a period of an 
> hour, during which no customer visits the site.  The another customer comes 
> along and performs a query.
>
> At this point, no heuristic in the world could have guessed that I DON'T CARE 
> ABOUT PERFORMANCE for anything except my web application.  The performance of 
> all the other stuff, the administration, the updates, etc., is utterly 
> irrelevant compared to the performance of the customer's query.
>
> What actually happens is that the other activities have swapped out the 
> critical index, and my customer waits, and waits, and waits... and goes away 
> after a minute or two.  To solve this, we've been forced to purchase two 
> computers, and mirror the database on both.  All administration and 
> modification happens on the "offline" database, and the web application only 
> uses the "online" database.  At some point, we swap the two servers, sync the 
> two databases, and carry on.  It's a very unsatisfactory solution.
>
> There is ONLY one way to convey this sort of information to Postgres, which 
> is to provide the application developer a mechanism to explicitely name the 
> tables that should be locked in memory.
>
> Look at tsearchd that Oleg is working on.  It's a direct response to this 
> problem.
>
> It's been recognized for decades that, as kernel developers (whether a Linux 
> kernel or a database kernel), our ability to predict the behavior of an 
> application is woefully inadequate compared with the application developer's 
> knowledge of the application.  Computer Science simply isn't a match for the 
> human brain yet, not even close.
>
> To give you perspective, since I posted a question about this problem 
> (regarding tsearch2/GIST indexes), half of the responses I received told me 
> that they encountered this problem, and their solution was to use an external 
> full-text engine.  They all confirmed that Postgres can't deal with this 
> problem yet, primarily for the reasons outlined above.
>
> Craig
>
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
>

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Alex Turner
You could also create your own index, so to speak, as a table that
simply contains a list of primary keys and an order value field that
you can use as your offset.  This can be kept in sync with the master
table using triggers pretty easily.  2 million is not very much if you
only have an integer pkey and an integer order value; you can then
join it against the main table.

create table my_index_table (
  primary_key_value int,   -- mirrors main_table's primary key
  order_val int,           -- position used for OFFSET-style paging
  primary key (primary_key_value));

create index my_index_table_order_val_i on my_index_table (order_val);

select * from main_table a, my_index_table b where b.order_val >= 25 and
b.order_val < 50 and a.primary_key_id = b.primary_key_value;

If the data updates a lot then this won't work as well, though, as the
index table will require frequent updates to a potentially large number
of records (although a small number of pages, so it still won't be
horrible).
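
A minimal sketch of the trigger half of this in plpgsql (hypothetical names;
it assumes plpgsql is installed, that new rows always sort last so order_val
can simply be appended, and it leaves re-numbering after deletes to a
periodic batch job):

CREATE OR REPLACE FUNCTION sync_my_index_table() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        -- append the new row at the end of the ordering
        INSERT INTO my_index_table (primary_key_value, order_val)
        SELECT NEW.primary_key_id, coalesce(max(order_val), 0) + 1
        FROM my_index_table;
        RETURN NEW;
    ELSIF TG_OP = ''DELETE'' THEN
        DELETE FROM my_index_table WHERE primary_key_value = OLD.primary_key_id;
        RETURN OLD;
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER main_table_sync_index
AFTER INSERT OR DELETE ON main_table
FOR EACH ROW EXECUTE PROCEDURE sync_my_index_table();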

Alex Turner
NetEconomist

On 10/26/05, Joshua D. Drake <[EMAIL PROTECTED]> wrote:
>
> > We have a GUI that let user browser through the record page by page at
> > about 25 records a time. (Don't ask me why but we have to have this
> > GUI). This translates to something like
> >
> >   select count(*) from table   <-- to give feedback about the DB size
>
> Do you have a integer field that is an ID that increments? E.g; serial?
>
> >   select * from table order by date limit 25 offset 0
>
> You could use a cursor.
>
> Sincerely,
>
> Joshua D. Drake
>
>
> >
> > Tables seems properly indexed, with vacuum and analyze ran regularly.
> > Still this very basic SQLs takes up to a minute run.
> >
> > I read some recent messages that select count(*) would need a table
> > scan for Postgre. That's disappointing. But I can accept an
> > approximation if there are some way to do so. But how can I optimize
> > select * from table order by date limit x offset y? One minute
> > response time is not acceptable.
> >
> > Any help would be appriciated.
> >
> > Wy
> >
> >
> --
> The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> Managed Services, Shared and Dedicated Hosting
> Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
>
>
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings
>

---(end of broadcast)---
TIP 6: explain analyze is your friend
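
On the count(*) in the quoted question, which is only used as size feedback:
if an approximation is acceptable, the planner's own estimate can be read
instead of scanning the table -- a sketch (reltuples is only as fresh as the
last VACUUM or ANALYZE on that table):

SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'mytable';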

