Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-14 Thread Justin



Is this on a 64-bit or 32-bit machine?  We had the problem with a 32-bit
Linux box (not sure what flavor) just a few months ago.  I would
not create a filesystem on a partition of 2+ TB.
  

Yes, this machine is 64-bit.

You do know that effective_cache_size is the size of the OS-level
cache, i.e. it won't show up in PostgreSQL's memory usage.  On a
machine with (I assume) 12 or more gigs of memory, you should have
your shared_buffers set to a much higher number than 100Meg.  (Unless
you're still running 7.4, but that's another story.)
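As a rough illustration only (sizes are assumed for a box with around 12GB
of RAM, not tuned recommendations, and large shared_buffers behaves
differently on Windows, as discussed later in this thread), the two
settings being talked about live in postgresql.conf:

    shared_buffers = 1GB          # memory PostgreSQL manages itself; visible in the postgres processes
    effective_cache_size = 8GB    # planner hint only: rough size of the OS file cache; allocates nothing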



Sorry for my ignorance of Linux; I'm used to the Windows Task Manager or
Performance Monitor showing all the memory usage.  I decided to move to
Linux on the new server to get 64-bit, so I'm still on the learning curve
with that.


I played with shared_buffers and never saw much of an improvement from
100 all the way up to 800 megs.  I moved checkpoint_segments from 3 to 30
and still saw no movement in the numbers.

I agree with you, those numbers are terrible.  I realized after posting
that I had the -C option turned on.  If I read the -C option correctly, it
disconnects and reconnects between transactions, so the way I read it,
the -C option creates the worst case.

The RAID controller is set to make sure it doesn't lie about fsync.

shared_buffers = 800MB
temp_buffers = 204800
work_mem = 256MB
fsync = on
wal_sync_method = fsync



C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 1 -v -h
192.168.1.9 -U postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 1768.940935 (including connections establishing)
tps = 1783.230500 (excluding connections establishing)


C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 1 -v -h
192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 1
number of transactions actually processed: 40/40
tps = 567.149831 (including connections establishing)
tps = 568.648692 (excluding connections establishing)

--now with just Select --

C:\Program Files\PostgreSQL\8.3\bin>pgbench -S -c 10 -t 1 -h
192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 16160.310278 (including connections establishing)
tps = 17436.791630 (excluding connections establishing)

C:\Program Files\PostgreSQL\8.3\bin>pgbench -S -c 40 -t 1 -h
192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
number of clients: 40
number of transactions per client: 1
number of transactions actually processed: 40/40
tps = 18338.529250 (including connections establishing)
tps = 20031.048125 (excluding connections establishing)





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-14 Thread Greg Smith

On Fri, 14 Mar 2008, Justin wrote:


I played with shared_buffers and never saw much of an improvement from
100 all the way up to 800 megs.  I moved checkpoint_segments from 3 to 30
and still saw no movement in the numbers.


Increasing shared_buffers normally improves performance as the size of the 
database goes up, but since the pgbench workload is so simple, the 
operating system will cache it pretty well even if you don't give the 
memory directly to PostgreSQL.  Also, on Windows large settings for 
shared_buffers don't work very well; you might as well keep it in the 
100MB range.



wal_sync_method=fsync


You might get a decent boost in results on the tests that write data (not 
the SELECT ones) by changing


wal_sync_method = open_datasync

which is the default on Windows.  The way you've got your RAID controller 
set up, this is no more or less safe than using fsync.
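A minimal sketch of making that change (data directory path assumed; the
setting is normally changed in postgresql.conf and picked up on a reload):

    # in postgresql.conf:
    #   wal_sync_method = open_datasync
    pg_ctl reload -D /path/to/data
    psql -U postgres -c "SHOW wal_sync_method;"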


I agree with you, those numbers are terrible.  I realized after posting
that I had the -C option turned on.  If I read the -C option correctly, it
disconnects and reconnects between transactions, so the way I read it,
the -C option creates the worst case.


In addition to being an odd testing mode, there's an outstanding bug in 
how -C results are computed that someone submitted a fix for, but it 
hasn't been applied yet.  I would suggest forgetting you ever ran that 
test.



number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 1768.940935 (including connections establishing)



number of clients: 40
number of transactions per client: 1
number of transactions actually processed: 40/40
tps = 567.149831 (including connections establishing)
tps = 568.648692 (excluding connections establishing)


Note how the total number of transactions goes up here, because it's 
actually doing clients x requested transactions in total.  The 40 client 
case is actually doing 4X as many total operations.  That also means you 
can expect 4X as many checkpoints during that run.  It's in a longer run 
like this second one that you might see some impact from increasing 
checkpoint_segments.


To keep comparisons like this more fair, I like to keep the total 
transactions constant and just divide that number by the number of clients 
to figure out what to set the -t parameter to.  40 is a good medium 
length test, so for that case you'd get


-c 10 -t 4
-c 40 -t 1

as the two to compare.
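A sketch of that approach (the total below is an arbitrary illustrative
value, since the exact counts in this thread were truncated; host, user and
database are taken from the runs above):

    # keep the total transaction count constant while varying client count
    TOTAL=40000
    for c in 10 40; do
        pgbench -c $c -t $((TOTAL / c)) -v -h 192.168.1.9 -U postgres empro
    done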

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-14 Thread Justin



Greg Smith wrote:

On Fri, 14 Mar 2008, Justin wrote:


I played with shared_buffers and never saw much of an improvement from
100 all the way up to 800 megs.  I moved checkpoint_segments from 3 to 30
and still saw no movement in the numbers.


Increasing shared_buffers normally improves performance as the size of 
the database goes up, but since the pgbench workload is so simple, the 
operating system will cache it pretty well even if you don't give the 
memory directly to PostgreSQL.  Also, on Windows large settings for 
shared_buffers don't work very well; you might as well keep it in the 
100MB range.



wal_sync_method=fsync


You might get a decent boost in results on the tests that write data (not 
the SELECT ones) by changing


wal_sync_method = open_datasync

which is the default on Windows.  The way you've got your RAID 
controller set up, this is no more or less safe than using fsync.

I moved the Windows server back to fsync a long time ago.  Around here we 
are super paranoid about making sure the data does not become corrupt; 
performance is secondary.  The new server, along with the old server, is 
way overbuilt for the load it will ever see.  I will be making the old 
server a Slony replica located in the manufacturing building.


Also *note*: I tried setting the value open_datasync and got "invalid 
parameter"; instead I used open_sync.


I agree with you, those numbers are terrible.  I realized after posting
that I had the -C option turned on.  If I read the -C option correctly, it
disconnects and reconnects between transactions, so the way I read it,
the -C option creates the worst case.


In addition to being an odd testing mode, there's an outstanding bug 
in how -C results are computed that someone submitted a fix for, but 
it hasn't been applied yet.  I would suggest forgetting you ever ran 
that test.

Why is the -C option odd?



Note how the total number of transactions goes up here, because it's 
actually doing clients x requested transactions in total.  The 40 
client case is actually doing 4X as many total operations.  That also 
means you can expect 4X as many checkpoints during that run.  It's in a 
longer run like this second one that you might see some impact from 
increasing checkpoint_segments.


To keep comparisons like this more fair, I like to keep the total 
transactions constant and just divide that number by the number of 
clients to figure out what to set the -t parameter to.  40 is a 
good medium length test, so for that case you'd get


-c 10 -t 4
-c 40 -t 1

as the two to compare.


 retested with fsync turned on -

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 4 -v -h 
192.168.1.9 -U

postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 4
number of transactions actually processed: 40/40
tps = 767.040279 (including connections establishing)
tps = 767.707166 (excluding connections establishing)


C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 1 -v -h 
192.168.1.9 -U

postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 1
number of transactions actually processed: 40/40
tps = 648.988227 (including connections establishing)
tps = 650.935720 (excluding connections establishing)


---open_sync

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 4 -v -h 
192.168.1.9 -U

postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 4
number of transactions actually processed: 40/40
tps = 798.030461 (including connections establishing)
tps = 798.752349 (excluding connections establishing)

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 1 -v -h 
192.168.1.9 -U

postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 1
number of transactions actually processed: 40/40
tps = 613.879195 (including connections establishing)
tps = 615.592023 (excluding connections establishing)


Re: [PERFORM] ER diagram tool

2008-03-14 Thread Jurgen Haan
ERStudio
Toad Data Modeller

And you might try searching sourceforge or freshmeat.

sathiya psql wrote:
> Is there any tool to draw ER diagram from SQL schema file...
> 
> 
> no other groups are replying.

-- 
Easyflex diensten b.v.
Acaciastraat 16
4921 MA  MADE
T: 0162 - 690410
F: 0162 - 690419
E: [EMAIL PROTECTED]
W: http://www.easyflex.nl

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 8.3 write performance

2008-03-14 Thread Enrico Sirola

Hi,
Following up on my own post: I was using pgbench with the wrong scale
size.  With the configuration I posted before and scale=100 I get the
following:


sudo -u postgres pgbench -c 10 -t 1 -s 100
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 4399.753895 (including connections establishing)
tps = 4405.228901 (excluding connections establishing)

sudo -u postgres pgbench -c 50 -t 1 -s 100
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 50
number of transactions per client: 1
number of transactions actually processed: 50/50
tps = 3208.532479 (including connections establishing)
tps = 3211.816174 (excluding connections establishing)

which I think is in line with expectations.
Thanks,
e.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ER diagram tool

2008-03-14 Thread sathiya psql
>
> 14:31 < rtfm_please> For information about erd
> 14:31 < rtfm_please> see http://druid.sf.net/
> 14:31 < rtfm_please> or http://schemaspy.sourceforge.net/


Thanks very much.

SchemaSpy drew an ER diagram by reading my database...
it did a very good job.

Thanks a lot, guys...


> 
> 14:31 < rtfm_please> or http://uml.sourceforge.net/index.php
>
>


[PERFORM] Lots of "semop" calls under load

2008-03-14 Thread Albe Laurenz
On a database (PostgreSQL 8.2.4 on 64-bit Linux 2.6.18 on 8 AMD Opterons)
that is under high load, I observe the following:

- About 200 database sessions concurrently issue queries, most of them small,
  but I have found one that touches 38000 table and index blocks.
- "vmstat" shows that CPU time is divided between "idle" and "iowait",
  with user and sys time practically zero.
- the run queue is short, but the blocked queue (uninterruptible sleep) is 
around 10.
- Many context switches are reported (over a hundred per second).
- "sar" says that the disk with the database is at 100% of its capacity.
  Storage is on a SAN box.

Queries that normally take seconds at most require up to an hour to finish.

I ran "lsof -p" on a backend running the big query mentioned above, and
it does not use any temp files (work_mem = 20MB).
The query accesses only one table and its index.

What puzzles me is the "strace -tt" output from that backend:

13:44:58.263598 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.313448 semop(229382, 0x7fff482f6070, 1) = 0
13:44:58.313567 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.442917 semop(229382, 0x7fff482f6070, 1) = 0
13:44:58.443074 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.565313 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.682178 semop(229382, 0x7fff482f6070, 1) = 0
13:44:58.682333 semop(393227, 0x7fff482f6480, 1) = 0
13:44:58.807452 semop(393227, 0x7fff482f6050, 1) = 0
13:44:58.924425 semop(393227, 0x7fff482f6480, 1) = 0
13:44:58.924727 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.045456 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.169011 semop(393227, 0x7fff482f6480, 1) = 0
13:44:59.169226 semop(327689, 0x7fff482f64a0, 1) = 0
[many more semops]
13:44:59.602532 semop(327689, 0x7fff482f6070, 1) = 0
13:44:59.602648 lseek(32, 120176640, SEEK_SET) = 120176640
13:44:59.602742 read(32, "{\0\0\0xwv\227\1\0\0\0\320\0\350\0\0 \3  [EMAIL 
PROTECTED]"..., 8192) = 8192
13:44:59.602825 semop(327689, 0x7fff482f64d0, 1) = 0
13:44:59.602872 semop(393227, 0x7fff482f6080, 1) = 0
13:44:59.602929 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.614559 semop(360458, 0x7fff482f6070, 1) = 0
[many more semops]
13:44:59.742103 semop(229382, 0x7fff482f64a0, 1) = 0
13:44:59.742172 semop(393227, 0x7fff482f6050, 1) = 0
13:44:59.756526 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
13:44:59.758096 semop(393227, 0x7fff482f6480, 1) = 0
13:44:59.771655 semop(393227, 0x7fff482f6050, 1) = 0
[hundreds of semops]
13:45:14.339905 semop(393227, 0x7fff482f6050, 1) = 0
13:45:14.466992 semop(360458, 0x7fff482f6070, 1) = 0
13:45:14.467102 lseek(33, 332693504, SEEK_SET) = 332693504
13:45:14.467138 read(33, "{\0\0\0\210\235\351\331\1\0\0\0\204\0010\32\360\37\3 
\340\237 \0\320\237 \0\300\237 \0"..., 8192) = 8192
13:45:14.599815 semop(163844, 0x7fff482f60a0, 1) = 0
13:45:14.66 lseek(32, 125034496, SEEK_SET) = 125034496
13:45:14.600305 read(32, "{\0\0\0\230\257\270\227\1\0\0\0\330\0\340\0\0 \3  
[EMAIL PROTECTED]"..., 8192) = 8192
13:45:14.600391 semop(163844, 0x7fff482f64d0, 1) = 0
13:45:14.600519 semop(393227, 0x7fff482f6480, 1) = 0

and so on. File 32 is the table, file 33 is the index.

Many of the table and index blocks are probably already in shared memory
(shared_buffers = 6GB) and don't have to be read from disk.

My questions:

Is the long duration of the query caused by something else than I/O overload?
What are the semops? Lightweight locks waiting for shared buffer?
Are the lseek and read operations really that fast even though the disk is at 100%?

Is this normal behavior under overload or is something ill tuned?

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-14 Thread Michael Stone

On Thu, Mar 13, 2008 at 05:27:09PM -0400, Greg Smith wrote:
I haven't found fdatasync to be significantly better in my tests on Linux 
but I never went out of my way to try and quantify it.  My understanding 
is that some of the write barrier implementation details on ext3 
filesystems make any sync call a relatively heavy operation but I haven't 
poked at the code yet to figure out why.


Which is why having the checkpoints on a separate ext2 partition tends 
to be a nice win.  (Even if it's not on a separate disk.)
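A hedged sketch of one common way to do that, assuming "checkpoints" here
means the WAL directory (pg_xlog) and using made-up paths; the server must
be stopped first:

    pg_ctl stop -D /var/lib/pgsql/data
    mv /var/lib/pgsql/data/pg_xlog /mnt/wal_ext2/pg_xlog
    ln -s /mnt/wal_ext2/pg_xlog /var/lib/pgsql/data/pg_xlog
    pg_ctl start -D /var/lib/pgsql/data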


Mike Stone

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Lots of "semop" calls under load

2008-03-14 Thread Tom Lane
"Albe Laurenz" <[EMAIL PROTECTED]> writes:
> On a database (PostgreSQL 8.2.4 on 64-bit Linux 2.6.18 on 8 AMD Opterons)
> that is under high load, I observe the following:
> ...
> - "vmstat" shows that CPU time is divided between "idle" and "iowait",
>   with user and sys time practically zero.
> - "sar" says that the disk with the database is on 100% of its capacity.

It sounds like you've simply saturated the disk's I/O bandwidth.
(I've noticed that Linux isn't all that good about distinguishing "idle"
from "iowait" --- more than likely you're really looking at 100% iowait.)

>   Storage is on a SAN box.

What kind of SAN box?  You're going to need something pretty beefy to
keep all those CPUs busy.

> What puzzles me is the "strace -tt" output from that backend:

Some low level of contention and consequent semops/context switches
is to be expected.  I don't think you need to worry if it's only
100/sec.  The sort of "context swap storm" behavior we've seen in
the past is in the tens of thousands of swaps/sec on hardware
much weaker than what you have here --- if you were seeing one of
those I bet you'd be well above 10 swaps/sec.

> Are the lseek and read operations really that fast although the disk is on 
> 100%?

lseek is (should be) cheap ... it doesn't do any actual I/O.  The
read()s you're showing here were probably satisfied from kernel disk
cache.  If you look at a larger sample you'll find slower ones, I think.
Another thing to look for is slow writes.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Adaptec 5805 SAS Raid

2008-03-14 Thread Glyn Astill
Any of you chaps used this controller?





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] The "many nulls" problem

2008-03-14 Thread Kynn Jones
It often happens that a particular piece of information is non-null for a
small minority of cases.  A superficially different manifestation of this is
when two pieces of information are identical in all but a small minority of
cases.  This can be easily mapped to the previous description by defining a
null in one column to mean that its contents should be obtained from those
of another column.  A further variant of this is when one piece of
information is a simple function of another one in all but a small minority
of cases.

(BTW, I vaguely recall that RDb theorists have a technical term for this
particular design issue, but I don't remember it.)

In all these cases, the design choice, at least according to RDb's 101, is
between including a column in the table that will be NULL most of the time,
or defining a second auxiliary column that references the first one and
holds the non-redundant information for the minority of cases for which this
is necessary (and maybe define a VIEW that includes all the columns).

But for me it is a frequent occurrence that my quaint and simple RDb's 101
reasoning doesn't really apply for PostgreSQL.  Basically, Pg is too smart
for it!  For example, does a large proportion of NULLs really imply a lot of
wasted space?  Maybe this is true for fixed-length data types, but what
about for type TEXT or VARCHAR?

Just to be concrete, consider the case of a customers database for some home
shopping website.  Suppose that, as it happens, for the majority of this
site's customers, the shipping and billing addresses are identical.  Or
consider the scenario of a company in which, for most employees, the email
address can be readily computed from the first and last name using the rule
First M. Last => [EMAIL PROTECTED], but the company allows some
flexibility for special cases (e.g. for people like Yasuhiro Tanaka who's
known to everyone by his nickname, Yaz, the email is
[EMAIL PROTECTED] hardly anyone remembers or even knows his
full name.)

What's your schema design approach for such situations?  How would you go
about deciding whether the number of exceptional cases is small enough to
warrant a second table?  Of course, one could do a systematic profiling of
various possible scenarios, but as a first approximation what's your
rule-of-thumb?

TIA!

Kynn


Re: [PERFORM] The "many nulls" problem

2008-03-14 Thread Oleg Bartunov

Kynn,

have you seen contrib/hstore ? You can have one table with common attributes
and hide others in hstore

Oleg
On Fri, 14 Mar 2008, Kynn Jones wrote:


It often happens that a particular piece of information is non-null for a
small minority of cases.  A superficially different manifestation of this is
when two pieces of information are identical in all but a small minority of
cases.  This can be easily mapped to the previous description by defining a
null in one column to mean that its contents should be obtained from those
of another column.  A further variant of this is when one piece of
information is a simple function of another one in all but a small minority
of cases.

(BTW, I vaguely recall that RDb theorists have a technical term for this
particular design issue, but I don't remember it.)

In all these cases, the design choice, at least according to RDb's 101, is
between including a column in the table that will be NULL most of the time,
or defining a second auxiliary column that references the first one and
holds the non-redundant information for the minority of cases for which this
is necessary (and maybe define a VIEW that includes all the columns).

But for me it is a frequent occurrence that my quaint and simple RDb's 101
reasoning doesn't really apply for PostgreSQL.  Basically, Pg is too smart
for it!  For example, does a large proportion of NULLs really imply a lot of
wasted space?  Maybe this is true for fixed-length data types, but what
about for type TEXT or VARCHAR?

Just to be concrete, consider the case of a customers database for some home
shopping website.  Suppose that, as it happens, for the majority of this
site's customers, the shipping and billing addresses are identical.  Or
consider the scenario of a company in which, for most employees, the email
address can be readily computed from the first and last name using the rule
First M. Last => [EMAIL PROTECTED], but the company allows some
flexibility for special cases (e.g. for people like Yasuhiro Tanaka who's
known to everyone by his nickname, Yaz, the email is
[EMAIL PROTECTED] hardly anyone remembers or even knows his
full name.)

What's your schema design approach for such situations?  How would you go
about deciding whether the number of exceptional cases is small enough to
warrant a second table?  Of course, one could do a systematic profiling of
various possible scenarios, but as a first approximation what's your
rule-of-thumb?

TIA!

Kynn



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware question for a DB server

2008-03-14 Thread Pascal Cohen

Greg Smith wrote:

On Wed, 12 Mar 2008, Mark Lewis wrote:


One question that's likely going to be important depending on your
answers above is whether or not you're getting a battery-backed write
cache for that ServeRAID-8K.


Apparently there's an 8k-l and a regular 8-k; the -l doesn't have the 
cache, so if this one is a regular 8-k it will have 256MB and a 
battery.  See 
http://www.redbooks.ibm.com/abstracts/TIPS0054.html?Open#ServeRAID-8k

It is the solution with RAM and battery.


From Pascal's description of the application this system sounds like 
overkill whether or not there's a cache.  For scaling to lots of small 
requests, things like connection pooling may end up being 
more important than worrying about the disk system (the database isn't 
big enough relative to RAM for that to be too important).


I agree with what you are saying.  We are using Java with a pool of 
connections to access the DB.  Today our database is really small 
compared to the RAM, but it may evolve and will probably even grow (I 
hope so; that would be a good situation).


Thanks for your advice/remarks.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hardware question for a DB server

2008-03-14 Thread Scott Marlowe
On Fri, Mar 14, 2008 at 1:24 PM, Pascal Cohen <[EMAIL PROTECTED]> wrote:
>  I agree with what you are saying.  We are using Java with a pool of
>  connections to access the DB.  Today our database is really small
>  compared to the RAM, but it may evolve and will probably even grow (I
>  hope so; that would be a good situation).
>


Keep in mind that the differential cost between a mediocre and a good RAID
controller is often only a few hundred dollars.  If that means you can
scale to 10 or 100 times as many users, it's an investment worth
making up front rather than later on.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The "many nulls" problem

2008-03-14 Thread Heikki Linnakangas

Kynn Jones wrote:

In all these cases, the design choice, at least according to RDb's 101, is
between including a column in the table that will be NULL most of the time,
or defining a second auxiliary column that references the first one and
holds the non-redundant information for the minority of cases for which this
is necessary (and maybe define a VIEW that includes all the columns).

But for me it is a frequent occurrence that my quaint and simple RDb's 101
reasoning doesn't really apply for PostgreSQL.  Basically, Pg is too smart
for it!  For example, does a large proportion of NULLs really imply a lot of
wasted space?  


It depends.  If there are *any* NULLs on a row, a bitmap of the NULLs is 
stored in the tuple header.  Without a NULL bitmap, the tuple header is 23 
bytes, and due to memory alignment it's always rounded up to 24 bytes. 
That one padding byte is "free" for use as the NULL bitmap, so it happens 
that if your table has eight columns or fewer, NULLs will take no space 
at all.  If you have more columns than that and there are *any* NULLs on a 
row, you'll waste a whole 4 or 8 bytes (or more if you have a very wide 
table and go beyond the next 4/8 byte boundary), depending on whether 
you're on a 32-bit or 64-bit platform, regardless of how many NULLs 
there are.


That's on 8.3. 8.2 and earlier versions are similar, but the tuple 
header used to be 27 bytes instead of 23, so you have either one or five 
"free" bytes, depending on architecture.


In any case, that's pretty good compared to many other RDBMSs.

> Maybe this is true for fixed-length data types, but what
> about for type TEXT or VARCHAR?

Datatype doesn't make any difference. Neither does fixed vs variable length.


What's your schema design approach for such situations?  How would you go
about deciding whether the number of exceptional cases is small enough to
warrant a second table?  Of course, one could do a systematic profiling of
various possible scenarios, but as a first approximation what's your
rule-of-thumb?


From a performance point of view, I would go with a single table with 
NULL fields on PostgreSQL.
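As a concrete sketch of that single-table approach for the shipping/billing
example from the original post (table and column names invented here), with
a view papering over the NULL convention:

    CREATE TABLE customer (
        id               serial PRIMARY KEY,
        name             text NOT NULL,
        billing_address  text NOT NULL,
        shipping_address text            -- NULL means "same as billing"
    );

    CREATE VIEW customer_v AS
        SELECT id, name, billing_address,
               COALESCE(shipping_address, billing_address) AS shipping_address
          FROM customer;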


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The "many nulls" problem

2008-03-14 Thread Kynn Jones
On Fri, Mar 14, 2008 at 3:46 PM, Heikki Linnakangas <[EMAIL PROTECTED]>
wrote:
>
> 
>
> From a performance point of view, I would go with a single table with
> NULL fields on PostgreSQL.


Wow.  I'm so glad I asked!  Thank you very much!

Kynn


Re: [PERFORM] The "many nulls" problem

2008-03-14 Thread Kynn Jones
On Fri, Mar 14, 2008 at 2:59 PM, Oleg Bartunov <[EMAIL PROTECTED]> wrote:

> have you seen contrib/hstore ? You can have one table with common
> attributes
> and hide others in hstore
>

That's interesting.  I'll check it out.  Thanks!

Kynn


Re: [PERFORM] The "many nulls" problem

2008-03-14 Thread Oleg Bartunov

On Fri, 14 Mar 2008, Kynn Jones wrote:


On Fri, Mar 14, 2008 at 2:59 PM, Oleg Bartunov <[EMAIL PROTECTED]> wrote:


have you seen contrib/hstore ? You can have one table with common
attributes
and hide others in hstore



That's interesting.  I'll check it out.  Thanks!


actually, hstore was designed specifically for this kind of problem.
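
A minimal sketch of that idea (assuming contrib/hstore has been installed
into the database; the table and key names are invented to match the
example from the original post):

    CREATE TABLE employee (
        id     serial PRIMARY KEY,
        name   text NOT NULL,
        extras hstore              -- rarely-populated attributes, e.g. nickname
    );

    INSERT INTO employee (name, extras)
        VALUES ('Yasuhiro Tanaka', 'nickname=>Yaz'::hstore);

    SELECT name, extras -> 'nickname' AS nickname FROM employee;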




Kynn



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Anyone using a SAN?

2008-03-14 Thread Peter Koczan
Hi all,

I had a few meetings with SAN vendors and I thought I'd give you some
follow-up on points of potential interest.

- Dell/EMC
The representative was like the Dell dude grown up. The sales pitch
mentioned "price point" about twenty times (to the point where it was
annoying), and the pitch ultimately boiled down to "Dude, you're
getting a SAN." My apologies in advance for bringing back repressed
memories of the Dell dude. As far as technical stuff goes, it's about
what you'd expect from a low-level SAN. The cost for a SAN was in the
$2-3 per GB range if you went with the cheap option...not terrible,
but not great either, especially since you'd have to buy lots of GB.
Performance numbers weren't bad, but they weren't great either.

- 3par
The sales pitch was more focused on technical aspects and only
mentioned "price point" twice...which is a win in my books, at least
compared to Dell. Their real place to shine was in the technical
aspect. Whereas Dell just wanted to sell you a storage system that you
put on a network, 3par wanted to sell you a storage system
specifically designed for a network, and change the very way you think
about storage. They had a bunch of cool management concepts, and very
advanced failover, power outage, and backup techniques and tools.
Performance wasn't shabby, either, for instance a RAID 5 set could get
about 90% of the IOPS and transfer rate that a RAID 10 set could. How
exactly this compares to DAS they didn't say. The main stumbling block
with 3par is price. While they didn't give any specific numbers, best
estimates put a SAN in the $5-7 per GB range. The extra features just
might be worth it though.

- Lefthand
This is going to be an upcoming meeting, so I don't have as good of an
opinion. Looking at their website, they seem more to the Dell end in
terms of price and functionality. I'll keep you in touch as I have
more info. They seem good for entry-level SANs, though.

Luckily, almost everything here works with Linux (at least the major
distros), including the management tools, in case people were worried
about that. One of the key points to consider going forward is that
the competition of iSCSI and Fibre Channel techs will likely bring
price down in the future. While SANs are certainly more expensive than
their DAS counterparts, the gap appears to be closing.

However, to paraphrase a discussion between a few of my co-workers,
you can buy toilet paper or kitty litter in huge quantities because
you know you'll eventually use it...and it doesn't change in
performance or basic functionality. Storage is just something that you
don't always want to buy a lot of in one go. It will get bigger, and
cheaper, and probably faster in a relatively short amount of time. The
other thing is that you can't really get a small SAN. The minimum is
usually in the multiple TB range (and usually >10 TB). I'd love to be
able to put together a proof of concept and a test using 3par's
technology and commodity 80GB slow disks, but I really can't. You're
stuck with going all-in right away, and enough people have had
problems being married to specific techs or vendors that it's really
hard to break that uneasiness.

Thanks for reading, hopefully you found it slightly informative.

Peter

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Adaptec 5805 SAS Raid

2008-03-14 Thread Leigh Dyer

Glyn Astill wrote:

Any of you chaps used this controller?



It looks very similar to the rebadged Adaptec that Sun shipped in the 
X4150 I ordered a few weeks ago, though the Sun model had only 256MB of 
cache RAM. I was wary of going Adaptec after my experiences with the 
PERC/3i, which couldn't even seem to manage a single disk's worth of 
read performance from a RAID-1 array, but I was pleasantly surprised by 
this card. I'm only running a RAID-1 array on it, with 2 146GB 10krpm 
SAS drives, but I was impressed with the read performance -- it seems 
quite happy to split sequential reads across the two disks.


Here are the bonnie++ numbers I took during my run-in testing:

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
membrane        12G 54417  89 86808  15 41489   6 59517  96 125266  10 629.6   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
      files:max:min  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
membrane         16 19496  97 +++++ +++ 14220  68  7673  40 +++++ +++  5246  26


I'm not sure if I'd yet be comfortable running a larger array for a 
database on an Adaptec card, but it's definitely a great improvement on 
the earlier Adaptec hardware I've used.


Thanks
Leigh








--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] best way to run maintenance script

2008-03-14 Thread Vinubalaji Gopal
Hi all,
  I have been searching for the best way to run maintenance scripts
that do a vacuum, an analyze, and delete some old data.  Whenever the
maintenance script runs - mainly the pg_maintenance --analyze script -
it slows down PostgreSQL inserts, and I want to avoid that.  The system is
under constant load and I am not interested in the time taken to vacuum.
Is there a utility or mechanism in PostgreSQL that helps reduce the
priority of maintenance queries?

Is writing a PostgreSQL C function and setting the priority of the process
the only way to change the priority of the maintenance script, or is
there a better way?
http://weblog.bignerdranch.com/?p=11

I tried using the nice command (on Linux) on the maintenance script -
it did not have any effect - I guess it does not change the niceness of
the PostgreSQL vacuum process.

(I am running PostgreSQL 8.0 on Linux.)

--
Vinu

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] best way to run maintenance script

2008-03-14 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 14 Mar 2008 17:00:21 -0800
Vinubalaji Gopal <[EMAIL PROTECTED]> wrote:

> Hi all,
>   I have been searching for the best way to run maintenance scripts
> which does a vacuum, analyze and deletes some old data. Whenever the
> maintenance script runs - mainly the pg_maintenance --analyze script -
> it slows down postgresql inserts and I want to avoid that. The system
> is under constant load and I am not interested in the time taken to
> vacuum. Is there a utility or mechanism in postgresql which helps in
> reducing priority of maintenance queries?

You can use parameters such as vacuum_cost_delay to help this... see
the docs:

http://www.postgresql.org/docs/8.3/static/runtime-config-autovacuum.html
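
A hedged illustration of those settings in postgresql.conf (the values are
only examples to show the mechanism, not recommendations for this workload):

    vacuum_cost_delay = 20      # ms to sleep each time the cost limit is reached; 0 disables the delay
    vacuum_cost_limit = 200     # accumulated page cost that triggers the sleep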

> 
> Is writing a postgresql C function and setting the priority of process
> the only way to change the priority of the maintenance script or is
> there a better way.
> http://weblog.bignerdranch.com/?p=11
> 
> I tried using the nice command (Linux system) on the maintenance
> script
> - it did not have any effect - guess it does not change the niceness
> of the postgresql vacuum process.
> 
> (I am running Postgresql 8.0 on a Linux)

If you are truly running 8.0 and not something like 8.0.15 vacuum is
the least of your worries.

Sincerely,

Joshua D. Drake


- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
  PostgreSQL political pundit | Mocker of Dolphins

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH2xkkATb/zqfZUUQRAsFxAJ422xFUGNwJZZVS47SwM9HJEYrb/gCePESL
YZFM27b93ylhy5TuE2MCcww=
=2Zpp
-END PGP SIGNATURE-

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] best way to run maintenance script

2008-03-14 Thread Vinubalaji Gopal
Hi Joshua,

> You can use parameters such as vacuum_cost_delay to help this... see
> the docs:
> 
> http://www.postgresql.org/docs/8.3/static/runtime-config-autovacuum.html

I am checking it out.  It seems to be a nice option for vacuum - but I wish
there was a way to change the priority of the deletes, or I will try to use
the C-based priority hack.


> If you are truly running 8.0 and not something like 8.0.15 vacuum is
> the least of your worries.
It's 8.0.4.

Thanks.

--
Vinu

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] best way to run maintenance script

2008-03-14 Thread Tom Lane
Vinubalaji Gopal <[EMAIL PROTECTED]> writes:
>> If you are truly running 8.0 and not something like 8.0.15 vacuum is
>> the least of your worries.

> Its 8.0.4. 

That's only a little bit better.  Read about all the bug fixes you're
missing at
http://www.postgresql.org/docs/8.0/static/release.html
and then consider updating ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] best way to run maintenance script

2008-03-14 Thread Joshua D. Drake
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Fri, 14 Mar 2008 17:51:52 -0800
Vinubalaji Gopal <[EMAIL PROTECTED]> wrote:

> Hi Joshua,
> 
> > You can use parameters such as vacuum_cost_delay to help this... see
> > the docs:
> > 
> > http://www.postgresql.org/docs/8.3/static/runtime-config-autovacuum.html
> 
> I am checking it out. Seems to be a nice option for vacuum - but wish
> there was a way to change the delete priority or I will try to use
> the C based priority hack.

I think you will find that if you do it the right way, which is to say the
way that it is meant to be done with the configurable options, your
life will be a great deal more pleasant than with some one-off hack.

Sincerely,

Joshua D. Drake

- -- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
  PostgreSQL political pundit | Mocker of Dolphins

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH2zG0ATb/zqfZUUQRAtmeAKCpKUbZP63qmiAPI6x4i9sLaf3LfwCfTPwb
mdS3L7JzlwarEjuu3WGFdaE=
=V7wn
-END PGP SIGNATURE-

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] best way to run maintenance script

2008-03-14 Thread Vinubalaji Gopal

On Fri, 2008-03-14 at 18:37 -0700, Tom Lane wrote:
> That's only a little bit better.  Read about all the bug fixes you're

Sure - will eventually upgrade it sometime - but it has to wait for
now :(


--
Vinu

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] best way to run maintenance script

2008-03-14 Thread Vinubalaji Gopal

> 
> I think you will find if you do it the right way, which is to say the
> way that it is meant to be done with the configurable options, your
> life will be a great deal more pleasant than some one off hack.
> 

Yeah, I agree.  The pg_maintenance script, which calls vacuum and analyze,
is the one thing that is causing the most problems.  I am trying out
various vacuum options (vacuum_cost_limit, vacuum_cost_delay) and
finding it hard to understand the implications of the variables.  What
are the optimal values for the vacuum_* parameters for a really active
database (writes at the rate of ~50 rows/second)?

I started with
vacuum_cost_delay = 200
vacuum_cost_limit = 400

and that did not help much. 

--
Vinu


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance