Re: [PERFORM] Replication System

2008-04-29 Thread Greg Smith

On Tue, 29 Apr 2008, Gauri Kanekar wrote:


We do VACUUM FULL, as VACUUM VERBOSE ANALYZE doesn't regain space for us.


Ah, now we're getting to the root of your problem here.  You expect that 
VACUUM should reclaim space.


Whenever you UPDATE a row, it writes a new one out, then switches to use 
that version.  This leaves behind the original.  Those now unused rows are 
what VACUUM gathers, but it doesn't give that space back to the operating 
system.


The model here assumes that you'll need that space again for the next time 
you UPDATE or INSERT a row.  So instead VACUUM just keeps those available 
for database reuse rather than returning it to the operating system.


Now, if you don't VACUUM frequently enough, this model breaks down, and 
the table can get bigger with space that may never get reused.  The idea 
is that you should be VACUUMing up now unneeded rows at about the same 
rate they're being re-used.  When you don't keep up, the database can 
expand in space that you don't get back again.  The right answer to this 
problem is not to use VACUUM FULL; it's to use regular VACUUM more often.
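To illustrate (the table name here is just an example; pg_relation_size() is available from 8.1 on):

```sql
-- Plain VACUUM marks dead rows as reusable but does not shrink the file:
VACUUM VERBOSE table1;

-- The on-disk size stays the same immediately afterwards ...
SELECT pg_relation_size('table1');

-- ... but subsequent UPDATEs/INSERTs reuse the freed space instead of
-- extending the file, so with frequent VACUUM the size stops growing.
```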


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Where do a novice do to make it run faster?

2008-04-29 Thread A B
Here is some more information.

Size of database:

du -sh /var/lib/pgsql/data/base/*
4,1M/var/lib/pgsql/data/base/1
4,1M/var/lib/pgsql/data/base/10792
4,1M/var/lib/pgsql/data/base/10793
9,1M/var/lib/pgsql/data/base/16388
11M /var/lib/pgsql/data/base/19233
1,6G/var/lib/pgsql/data/base/20970

I'm not sure what the size actually is... But I can't imagine that it
is 1,6 GB!!! I'd say I have 11MB of data in it...

CPU is an Intel Core 2 Duo E6750, 4 GB RAM.
Hard discs are two Seagate 320 GB SATA discs running software RAID
(!!), RAID-1. Yes, this might be a big performance hit, but that is
what I have right now; in the future I can throw more money at
hardware.

Will I see a general improvement in performance in 8.3.X over 8.1.11?


2008/4/29 A B <[EMAIL PROTECTED]>:
> Right now, version 8.1.11 on centos.x86-64, intel dual core cpu with 2
>  sata discs  (mirror raid)
>
>  The queries are most select/inserts.. I guess... I'm not sure exactly
>  what to answer on that.
>  "explain analyze" is something I have not read about yet.
>
>
>  2008/4/28 Claus Guttesen <[EMAIL PROTECTED]>:
>
>
> > >  1) hardware
>  >  >  2) rewriting my queries and table structures
>  >  >  3) using more predefined queries
>  >  >  4) tweek parameters in the db conf files
>  >  >
>  >  >  Of these points:
>  >  >  1) is nothing I can do about right now, but in the future perhaps.
>  >  >  2) will be quite hard right now since there is more code than time.
>  >  >  3) almost like 2 but perhaps more do-able with the current constraints.
>  >  >  4) This seems to be the easiest one to start with...
>  >  >
>  >  >  So what should I do/read concerning point 4?
>  >  >  If you have other good suggestions  I'd be very interested in that.
>  >  >
>  >  >  Thank you :-)
>  >
>  >  You can provide information postgresql-version, what type of queries
>  >  you're running, some explain analyze of those, and what type of
>  >  hardware you're running and what OS is installed.
>  >
>  >  --
>  >  regards
>  >  Claus
>  >
>  >  When lenity and cruelty play for a kingdom,
>  >  the gentlest gamester is the soonest winner.
>  >
>  >  Shakespeare
>  >
>



Re: [PERFORM] Replication System

2008-04-29 Thread Gauri Kanekar
From most of the replies it seems that upgrading to a newer version of
Postgres, maybe 8.3.1, may be one of the solutions to tackle this problem.

Checked about HOT feature in 8.3.1.

Do we need to do any special config changes or any other setting for HOT to
work??

Any special guidelines to follow to make HOT work??

~ Gauri

On Tue, Apr 29, 2008 at 2:07 PM, Greg Smith <[EMAIL PROTECTED]> wrote:

> On Tue, 29 Apr 2008, Gauri Kanekar wrote:
>
We do VACUUM FULL, as VACUUM VERBOSE ANALYZE doesn't regain space for us.
> >
>
> Ah, now we're getting to the root of your problem here.  You expect that
> VACUUM should reclaim space.
>
> Whenever you UPDATE a row, it writes a new one out, then switches to use
> that version.  This leaves behind the original.  Those now unused rows are
> what VACUUM gathers, but it doesn't give that space back to the operating
> system.
>
> The model here assumes that you'll need that space again for the next time
> you UPDATE or INSERT a row.  So instead VACUUM just keeps those available
> for database reuse rather than returning it to the operating system.
>
> Now, if you don't VACUUM frequently enough, this model breaks down, and
> the table can get bigger with space that may never get reused.  The idea is
> that you should be VACUUMing up now unneeded rows at about the same rate
> they're being re-used.  When you don't keep up, the database can expand in
> space that you don't get back again.  The right answer to this problem is
> not to use VACUUM FULL; it's to use regular VACUUM more often.
>
>
> --
> * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
>



-- 
Regards
Gauri


Re: [PERFORM] Replication System

2008-04-29 Thread Shane Ambler

Gauri Kanekar wrote:

Andrew,

Can you explain to me in detail why you said vacuum full is making things
worse.
We do VACUUM FULL, as VACUUM VERBOSE ANALYZE doesn't regain space for us.



vacuum full stops all access so that the data files can be re-written 
without the unused space.


normal vacuum will update the records of what space is no longer used so 
that it can then be reused with the next update/insert. Your db size 
will not shrink straight away, but it will stop growing until you use all 
the free space left from previous updates/deletes.


The more frequently you do a normal vacuum the less time it will take 
and things will run a lot smoother with your file size growing slowly to 
accommodate new data.


Expanding on what others have mentioned as a drawback of vacuum full - 
you should look at REINDEX'ing as well (maybe one index or table at a 
time). You will most likely find this will reclaim some disk space for 
you as well.





--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz



Re: [PERFORM] Replication System

2008-04-29 Thread Pavan Deolasee
On Tue, Apr 29, 2008 at 4:35 PM, Gauri Kanekar
<[EMAIL PROTECTED]> wrote:

>
> Do we need to do any special config changes or any other setting for HOT to
> work??

No. HOT is enabled by default, on all tables. There is no way, and no
need, to disable it.

>
> Any special guideline to follow to make HOT working??
>

You can do a couple of things to benefit from HOT.

1. HOT addresses a special but common case where the UPDATE operation
does not change any of the index keys. So check whether your UPDATE
changes any of the index keys. If so, see if you can avoid having an
index involving that column. Of course, I won't advocate dropping an
index if it would drastically impact your frequently run queries.

2. You may leave some free space in the heap (fillfactor less than
100). My recommendation would be to leave space worth one row or
slightly more than that, to let the first UPDATE be a HOT update.
Subsequent UPDATEs in the page may reuse the dead row created by
earlier UPDATEs.

3. Avoid any long running transactions.
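As a sketch of point 2 (this is the 8.2+ storage-parameter syntax; the table name is illustrative):

```sql
-- Leave ~10% of each heap page free so UPDATEs can stay on the same page:
ALTER TABLE table1 SET (fillfactor = 90);

-- fillfactor only affects newly written pages; rewrite the table once
-- (e.g. CLUSTER, or dump/reload) if it should apply to existing data.
```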

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



[PERFORM] Vacuum statistics

2008-04-29 Thread Francisco Reyes
I recall reading posts in the past where one could query the stat tables and 
see how well autovacuum was performing. Not finding the posts.



I found this query:
SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relkind = 'r';


From the output, how can I tell the number of dead tuples? Or how effective
autovacuum is on a particular table?

Recently inherited several large PostgreSQL DBs (tables in the hundreds of 
millions and some tables over a billion rows) and I am just starting to go 
over them and see how autovacuum has been performing.
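For what it's worth, a query along these lines should work on 8.3, which tracks dead tuples per table in pg_stat_user_tables (those columns don't exist on 8.1/8.2, where contrib/pgstattuple is the usual fallback):

```sql
-- Tables with the most dead tuples, plus when they were last vacuumed:
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```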





Re: [PERFORM] corrupted shared memory message

2008-04-29 Thread Pablo Alcaraz

Tom Lane wrote:

Pablo Alcaraz <[EMAIL PROTECTED]> writes:
  
We have a database running smoothly for months. 2 days ago I get this 
error message. I tried a restore, a full restore (deleting the old 
database and recovering from backup all the information) but we are 
getting this error every time.



I think you've got hardware problems.  Run some memory and disk
diagnostics.

regards, tom lane


  

Bingo! Thanks!
Pablo




Re: [PERFORM] Replication System

2008-04-29 Thread Gauri Kanekar
Thanx for the help.

Need some more help.

"table1" has two indices
unique indx1 = "pkfld"
unique indx2 = "fkfld1,fkfld2"

Did the following steps in the listed order:

1. Vacuumed the whole DB.
2. "table1":
   RecCnt ==> 11970789
   Size ==> 2702.41 MB
3. update "table1" set fld7 = 1000 where fld1/100 = 999 ;
   this UPDATED 1230307 records
4. Checked "table1" size again:
   RecCnt ==> 11970789
   Size ==> 2996.57 MB
5. Again did the update: update "table1" set fld7 = 1000 where fld1/100 = 999 ;
   this UPDATED 1230307 records
6. Got "table1" size as:
   RecCnt ==> 11970789
   Size ==> 3290.64 MB
7. Updated again: update "table1" set fld7 = 1000 where fld1/100 = 999 ;
   this UPDATED 1230307 records
8. "table1" size as:
   RecCnt ==> 11970789
   Size ==> 3584.66 MB

Found that the size increased gradually. Is HOT working over here??
Guide me if I'm doing something wrong.

~ Gauri

On Tue, Apr 29, 2008 at 4:55 PM, Pavan Deolasee <[EMAIL PROTECTED]>
wrote:

> On Tue, Apr 29, 2008 at 4:35 PM, Gauri Kanekar
> <[EMAIL PROTECTED]> wrote:
>
> >
> > Do we need to do any special config changes or any other setting for HOT
> to
> > work??
>
> No. HOT is enabled by default, on all tables. There is no way, and no
> need, to disable it.
>
> >
> > Any special guideline to follow to make HOT working??
> >
>
> You can do a couple of things to benefit from HOT.
>
> 1. HOT addresses a special, but common case where UPDATE operation
> does not change any of the index keys. So check if your UPDATE changes
> any of the index keys. If so, see if you can avoid having index
> involving that column. Of course, I won't advocate dropping an index
> if it would drastically impact your frequently run queries.
>
> 2. You may leave some free space in the heap (fillfactor less than
> 100). My recommendation would be to leave space worth one row or
> slightly more than that, to let the first UPDATE be a HOT update.
> Subsequent UPDATEs in the page may reuse the dead row created by
> earlier UPDATEs.
>
> 3. Avoid any long running transactions.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com
>



-- 
Regards
Gauri


Re: [PERFORM] Replication System

2008-04-29 Thread Alvaro Herrera
Gauri Kanekar wrote:

> Do we need to do any special config changes or any other setting for HOT to
> work??

No.  HOT is always working, if it can.  You don't need to configure it.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Replication System

2008-04-29 Thread Alvaro Herrera
Gauri Kanekar wrote:

> Found that the size increased gradually. Is HOT working over here??
> Guide me if I'm doing something wrong.

Probably not.  Try vacuuming between the updates.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Replication System

2008-04-29 Thread Pavan Deolasee
On Tue, Apr 29, 2008 at 6:29 PM, Gauri Kanekar
<[EMAIL PROTECTED]> wrote:
>
>
> Found that the size increased gradually. Is HOT working over here??
> Guide me if I'm doing something wrong.
>

You have chosen a bad case for HOT. Since you are repeatedly updating
the same set of rows, the dead space created in the first step is in
blocks which are not touched in the subsequent updates. Is this a real
scenario, or are you just testing? If it's just for testing, I would
suggest updating different sets of rows in each step and then checking.

Thanks,
Pavan



-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [PERFORM] Replication System

2008-04-29 Thread Gauri Kanekar
That's how our updates work.
We usually tend to touch the same row many times a day.

~ Gauri

On Tue, Apr 29, 2008 at 6:39 PM, Pavan Deolasee <[EMAIL PROTECTED]>
wrote:

> On Tue, Apr 29, 2008 at 6:29 PM, Gauri Kanekar
> <[EMAIL PROTECTED]> wrote:
> >
> >
> > Found that the size increased gradually. Is HOT working over here??
> > Guide me if I'm doing something wrong.
> >
>
> You have chosen a bad case for HOT. Since you are repeatedly updating
> the same set of rows, the dead space created in the first step is in
> blocks which are not touched in the subsequent updates. Is this a real
> scenario, or are you just testing? If it's just for testing, I would
> suggest updating different sets of rows in each step and then checking.
>
> Thanks,
> Pavan
>
>
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com
>



-- 
Regards
Gauri


Re: [PERFORM] Replication System

2008-04-29 Thread Pavan Deolasee
On Tue, Apr 29, 2008 at 6:42 PM, Gauri Kanekar
<[EMAIL PROTECTED]> wrote:
> That's how our updates work.
> We usually tend to touch the same row many times a day.
>

Then start with a non-100 fillfactor. I would suggest something like
80, and then adjust based on testing. Since you have an update-intensive
setup anyway, leaving free space in the heap won't harm you much in the
long term.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [PERFORM] Replication System

2008-04-29 Thread Gregory Stark
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:

>> Any special guideline to follow to make HOT working??
>>
>
> You can do a couple of things to benefit from HOT.
>
> 1. HOT addresses a special, but common case where UPDATE operation
> does not change any of the index keys. So check if your UPDATE changes
> any of the index keys. If so, see if you can avoid having index
> involving that column. Of course, I won't advocate dropping an index
> if it would drastically impact your frequently run queries.
>
> 2. You may leave some free space in the heap (fillfactor less than
> 100). My recommendation would be to leave space worth one row or
> slightly more than that, to let the first UPDATE be a HOT update.
> Subsequent UPDATEs in the page may reuse the dead row created by
> earlier UPDATEs.
>
> 3. Avoid any long running transactions.

Perhaps we should put this list in the FAQ.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [PERFORM] Replication System

2008-04-29 Thread Tom Lane
Greg Smith <[EMAIL PROTECTED]> writes:
> The model here assumes that you'll need that space again for the next time 
> you UPDATE or INSERT a row.  So instead VACUUM just keeps those available 
> for database reuse rather than returning it to the operating system.

> Now, if you don't VACUUM frequently enough, this model breaks down, and 
> the table can get bigger with space that may never get reused.  The idea 
> is that you should be VACUUMing up now unneeded rows at about the same 
> rate they're being re-used.  When you don't keep up, the database can 
> expand in space that you don't get back again.  The right answer to this 
> problem is not to use VACUUM FULL; it's to use regular VACUUM more often.

Also, you need to make sure you have the FSM parameters set high enough
so that all the free space found by a VACUUM run can be remembered.

The less often you run VACUUM, the more FSM space you need, because
there'll be more free space reclaimed per run.
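For reference, the relevant postgresql.conf settings (of that era) look like this; the values below are illustrative, not recommendations:

```
# Each tracked page slot costs a few bytes of shared memory.
max_fsm_pages = 2000000      # max heap pages with free space to remember
max_fsm_relations = 2000     # max relations to track in the FSM

# The tail of a database-wide "VACUUM VERBOSE" reports how many page
# slots and relations are currently needed, which shows if these are
# set too low.
```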

regards, tom lane



Re: [PERFORM] Where do a novice do to make it run faster?

2008-04-29 Thread Tom Lane
"A B" <[EMAIL PROTECTED]> writes:
> I'm not sure what the size actually is... But I can't imagine that it
> is 1,6 GB!!! I'd say I have 11MB of data in it...

Sounds like you've got a rather severe case of table and/or index bloat.
This is typically caused by not vacuuming often enough.

The easiest way to get the size back down is probably to dump and reload
the database.  After that you need to look at your vacuuming practices.

> Will I see a general improvement in performance in 8.3.X over 8.1.11?

Probably so, if only because it has autovacuum turned on by default.
That's not really a substitute for careful administration practices,
but it helps.

regards, tom lane



Re: [PERFORM] Replication System

2008-04-29 Thread Vivek Khera


On Apr 29, 2008, at 10:16 AM, Tom Lane wrote:


Greg Smith <[EMAIL PROTECTED]> writes:
The model here assumes that you'll need that space again for the next time
you UPDATE or INSERT a row.  So instead VACUUM just keeps those available
for database reuse rather than returning it to the operating system.

[ ... ]

Also, you need to make sure you have the FSM parameters set high enough
so that all the free space found by a VACUUM run can be remembered.

The less often you run VACUUM, the more FSM space you need, because
there'll be more free space reclaimed per run.


I can actually watch one of our applications slow down once the free  
space in the table is used up.  Extending the data file seems to be  
much more expensive than using the free space found in existing pages  
of the file.





Re: [PERFORM] Very poor performance loading 100M of sql data using copy

2008-04-29 Thread John Rouillard
On Tue, Apr 29, 2008 at 05:19:59AM +0930, Shane Ambler wrote:
> John Rouillard wrote:
> 
> >We can't do this as we are backfilling a couple of months of data 
> >into tables with existing data.
> 
> Is this a one off data loading of historic data or an ongoing thing?

Yes, it's a one-off bulk data load of many days of data. The daily
loads will also take 3 hours, but that is ok since we only do those
once a day, so we have 21 hours of slack in the schedule 8-).

> >>>The only indexes we have to drop are the ones on the primary keys
> >>> (there is one non-primary key index in the database as well).
> 
> If this amount of data importing is ongoing then one thought I would try
> is partitioning (this could be worthwhile anyway with the amount of data
> you appear to have).
> Create an inherited table for the month being imported, load the data 
> into it, then add the check constraints, indexes, and modify the 
> rules/triggers to handle the inserts to the parent table.

Hmm, interesting idea, worth considering if we have to do this again
(I hope not). 

Thaks for the reply.

-- 
-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111



Re: [PERFORM] Very poor performance loading 100M of sql data using copy

2008-04-29 Thread John Rouillard
On Mon, Apr 28, 2008 at 02:16:02PM -0400, Greg Smith wrote:
> On Mon, 28 Apr 2008, John Rouillard wrote:
> 
> >   2008-04-21 11:36:43 UTC @(2761)i: LOG: checkpoints ... (27 seconds 
> >   apart)
> > so I changed:
> >  checkpoint_segments = 30
> >  checkpoint_warning = 150
> 
> That's good, but you might go higher than 30 for a bulk loading operation 
> like this, particularly on 8.1 where checkpoints are no fun.  Using 100 is 
> not unreasonable.

Ok. I can do that. I chose 30 to make the WAL logs span the 5 minute

  checkpoint_timeout = 300

so that the 30 segments wouldn't wrap over before the 5 minute
checkpoint that usually occurs. Maybe I should increase both the
timeout and the segments?
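i.e., something along these lines in postgresql.conf for the duration of the bulk load (values illustrative):

```
checkpoint_segments = 100   # more WAL between checkpoints during the load
checkpoint_timeout = 900    # 15 min, so the timeout doesn't fire first
checkpoint_warning = 300    # warn if checkpoints come closer than this
```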
 
> >shared_buffers = 3000
> >I don't see any indication in the docs that increasing shared memory
> >would help speed up a copy operation.
> 
> The index blocks use buffer space, and what ends up happening if there's 
> not enough memory is they are written out more than they need to be (and 
> with your I/O hardware you need to avoid writes unless absolutely 
> necessary).

I forgot to mention the RAID 1/0 is on a 3ware 9550SX-4LP RAID card.
The write cache is on and autoverify is turned off.

> Theoretically the OS is caching around that situation but 
> better to avoid it. 

The system is using 6-8MB of memory for cache.

> You didn't say how much RAM you have,

16GB total, but 8GB or so is taken up with other processes.

> but you should 
> start by a factor of 10 increase to 30,000 and see if that helps; if so, 
> try making it large enough to use 1/4 of total server memory.  3000 is 
> only giving the server 24MB of RAM to work with, and it's unfair to expect 
> it to work well in that situation.

So swap the memory usage from the OS cache to the postgresql process.
Using 1/4 as a guideline it sounds like 600,000 (approx 4GB) is a
better setting. So I'll try 30 to start (1/8 of memory) and see
what it does to the other processes on the box.
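(On 8.1 shared_buffers is counted in 8 kB pages, so the candidate settings being discussed look roughly like this; both figures are illustrative:)

```
shared_buffers = 30000      # ~234 MB, the suggested starting point
#shared_buffers = 262144    # ~2 GB, about 1/8 of a 16 GB machine
```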
 
> While not relevant to this exercise you'll need to set 
> effective_cache_size to a useful value one day as well.

This is a very lightly loaded database, a few queries/hour usually
scattered across the data set, so hopefully that won't be much of an
issue.

-- 
-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111



Re: [PERFORM] Replication System

2008-04-29 Thread Chris Browne
[EMAIL PROTECTED] ("Gauri Kanekar") writes:
> Basically we have some background process which updates "table1" and
> we don't want the application to make any changes to "table1" while
> vacuuming.  Vacuum requires an exclusive lock on "table1", and if any of
> the background processes or the application is on, vacuum doesn't kick
> off.  That's the reason we need to take the site down.

VACUUM has not required an exclusive lock on tables since version 7.1.

What version of PostgreSQL are you running?
-- 
output = ("cbbrowne" "@" "acm.org")
http://linuxdatabases.info/info/sap.html
Rules of the Evil Overlord #192.  "If I appoint someone as my consort,
I will not subsequently inform her that she is being replaced by a
younger, more attractive woman."



Re: [PERFORM] Very poor performance loading 100M of sql data using copy

2008-04-29 Thread Greg Smith

On Tue, 29 Apr 2008, John Rouillard wrote:


So swap the memory usage from the OS cache to the postgresql process.
Using 1/4 as a guideline it sounds like 600,000 (approx 4GB) is a
better setting. So I'll try 30 to start (1/8 of memory) and see
what it does to the other processes on the box.


That is potentially a good setting.  Just be warned that when you do hit a 
checkpoint with a high setting here, you can end up with a lot of data in 
memory that needs to be written out, and under 8.2 that can cause an ugly 
spike in disk writes.  The reason I usually threw out 30,000 as a 
suggested starting figure is that most caching disk controllers can buffer 
at least 256MB of writes to keep that situation from getting too bad. 
Try it out and see what happens, just be warned that's the possible 
downside of setting shared_buffers too high and therefore you might want 
to ease into that more gradually (particularly if this system is shared 
with other apps).

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Replication System

2008-04-29 Thread Shane Ambler

Alvaro Herrera wrote:

Gauri Kanekar wrote:


Do we need to do any special config changes or any other setting for HOT to
work??


No.  HOT is always working, if it can.  You don't need to configure it.



Unless you have upgraded since you started this thread you are still 
running 8.1.3.


HOT is only available in 8.3 and 8.3.1

You DO need to upgrade to get the benefits of HOT



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz



[PERFORM] SSD database benchmarks

2008-04-29 Thread Greg Smith
This is the best write-up I've seen yet on quantifying what SSDs are good 
and bad at in a database context:


http://www.bigdbahead.com/?p=37

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] SSD database benchmarks

2008-04-29 Thread Guido Neitzer

On 29.04.2008, at 12:55, Greg Smith wrote:

This is the best write-up I've seen yet on quantifying what SSDs are  
good and bad at in a database context:


http://www.bigdbahead.com/?p=37


They totally missed "mainly write" applications, which most of my
applications are. Reads in an OLTP setup are typically coming from a
cache (or, like in our case, an index like Solr) while writes go
through... So you might get some decent IO from the SSD when the
database has just started up without caches filled, but as long as your
cache hit ratio is good, it doesn't matter anymore after a couple of
minutes.


Nevertheless it's an interesting development.

cug

--
http://www.event-s.net




Re: [PERFORM] SSD database benchmarks

2008-04-29 Thread Greg Smith

On Tue, 29 Apr 2008, Guido Neitzer wrote:

They totally missed "mainly write" applications which most of my applications 
are.


Not totally--the very first graph shows that even on random data, 100% 
write apps are 1/2 the speed of a regular hard drive.


After poking around the site a bit more they also did some tests with MFT, 
some kernel software from Easy Computing that basically uses a write log 
disk to improve the write situation:


http://www.bigdbahead.com/?p=44

That version also has a better summary of the results.

MFT is clearly not ready for prime time yet from the general buginess, but 
as that approach matures (and more vendors do something similar in 
hardware) it will be really interesting to see what happens.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] SSD database benchmarks

2008-04-29 Thread Merlin Moncure
On Tue, Apr 29, 2008 at 2:55 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
> This is the best write-up I've seen yet on quantifying what SSDs are good
> and bad at in a database context:
>
>  http://www.bigdbahead.com/?p=37
>
>  --
>  * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

here is another really informative article (pdf)
http://www.storagesearch.com/easyco-flashperformance-art.pdf.

The conclusions are different...they claim it's hard to make good use
of ssd advantages if you do any random writing at all.  Of course,
they are also developing an automagical solution to the problem.  The
information is good though...it's certainly worth a read.  The problem
is probably solvable.

merlin



[PERFORM] Understanding histograms

2008-04-29 Thread Len Shapiro

I hope I am posting to the right list.
I am running PostgreSQL 8.1.9 and don't understand the behavior of
histograms for data items not in the MCV list.  I teach databases and
want to use Postgres as an example.  I will appreciate any help that
anyone can provide.

Here is the data I am using. I am interested only in the "rank" attribute.

CREATE TABLE Sailors (
  sid Integer NOT NULL,
  sname varchar(20),
  rank integer,
  age real,
  PRIMARY KEY (sid));

I insert 30 sailor rows:

INSERT INTO Sailors VALUES ( 3, 'Andrew',   10, 30.0);
INSERT INTO Sailors VALUES (17, 'Bart',      5, 30.2);
INSERT INTO Sailors VALUES (29, 'Beth',      3, 30.4);
INSERT INTO Sailors VALUES (28, 'Bryant',    3, 30.6);
INSERT INTO Sailors VALUES ( 4, 'Cynthia',   9, 30.8);
INSERT INTO Sailors VALUES (16, 'David',     9, 30.9);
INSERT INTO Sailors VALUES (27, 'Fei',       3, 31.0);
INSERT INTO Sailors VALUES (12, 'James',     3, 32.0);
INSERT INTO Sailors VALUES (30, 'Janice',    3, 33.0);
INSERT INTO Sailors VALUES ( 2, 'Jim',       8, 34.5);
INSERT INTO Sailors VALUES (15, 'Jingke',   10, 35.0);
INSERT INTO Sailors VALUES (26, 'Jonathan',  9, 36.0);
INSERT INTO Sailors VALUES (24, 'Kal',       3, 36.6);
INSERT INTO Sailors VALUES (14, 'Karen',     8, 37.8);
INSERT INTO Sailors VALUES ( 8, 'Karla',     7, 39.0);
INSERT INTO Sailors VALUES (25, 'Kristen',  10, 39.5);
INSERT INTO Sailors VALUES (19, 'Len',       8, 40.0);
INSERT INTO Sailors VALUES ( 7, 'Lois',      8, 41.0);
INSERT INTO Sailors VALUES (13, 'Mark',      7, 43.0);
INSERT INTO Sailors VALUES (18, 'Melanie',   1, 44.0);
INSERT INTO Sailors VALUES ( 5, 'Niru',      5, 46.0);
INSERT INTO Sailors VALUES (23, 'Pavel',     3, 48.0);
INSERT INTO Sailors VALUES ( 1, 'Sergio',    7, 50.0);
INSERT INTO Sailors VALUES ( 6, 'Suhui',     1, 51.0);
INSERT INTO Sailors VALUES (22, 'Suresh',    9, 52.0);
INSERT INTO Sailors VALUES (20, 'Tim',       7, 54.0);
INSERT INTO Sailors VALUES (21, 'Tom',      10, 56.0);
INSERT INTO Sailors VALUES (11, 'Warren',    3, 58.0);
INSERT INTO Sailors VALUES (10, 'WuChang',   9, 59.0);
INSERT INTO Sailors VALUES ( 9, 'WuChi',    10, 60.0);

after analyzing, I access the pg_stats table with

SELECT  n_distinct, most_common_vals,
   most_common_freqs,   histogram_bounds
FROM pg_stats WHERE tablename = 'sailors' AND attname = 'rank';

and I get:

 n_distinct | most_common_vals | most_common_freqs          | histogram_bounds
------------+------------------+----------------------------+------------------
      -0.23 | {3,9,10,7,8}     | {0.27,0.17,0.17,0.13,0.13} | {1,5}


I have two questions.  I'd appreciate any info you can provide,
including pointers to the source code.

1. Why does Postgres come up with a negative n_distinct?  It
apparently thinks that the number of rank values will increase as the
number of sailors increases.  What/where is the algorithm that decides
that?

2. The most_common_vals and their frequencies make sense.  They say
that the values {3,9,10,7,8} occur a total of 26 times, so other
values occur a total of 4 times.  The other, less common, values are 1
and 5, each occurring twice, so the histogram {1,5} is appropriate.
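For reference, those frequencies can be re-derived directly from the thirty
INSERT statements; a quick sketch in Python (rank values copied from the rows
above):

```python
from collections import Counter

# rank values copied from the 30 INSERT statements above
ranks = [10, 5, 3, 3, 9, 9, 3, 3, 3, 8,
         10, 9, 3, 8, 7, 10, 8, 8, 7, 1,
         5, 3, 7, 1, 9, 7, 10, 3, 9, 10]

counts = Counter(ranks)

# the five most common values, as in most_common_vals (ties broken arbitrarily)
mcv = dict(counts.most_common(5))
print(sorted(mcv))                                # [3, 7, 8, 9, 10]
print(sorted((round(c / 30, 2) for c in mcv.values()), reverse=True))
# [0.27, 0.17, 0.17, 0.13, 0.13]

# the remaining values, which end up in histogram_bounds
print(sorted(v for v in counts if v not in mcv))  # [1, 5]
```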
If I run the query
EXPLAIN SELECT * from sailors where rank = const;
for any const not in the MCV list, I get the plan

Seq Scan on sailors  (cost=0.00..1.38 rows=2 width=21)
 Filter: (rank = const)

The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
sense to me for other values of const not in the MVC list.
For example, if I run the query
EXPLAIN SELECT * from sailors where rank = -1000;
Postgres still gives an estimate of "row=2".
Can someone please explain?

Thanks,

Len Shapiro
Portland State University



Re: [PERFORM] Understanding histograms

2008-04-29 Thread Tom Lane
Len Shapiro <[EMAIL PROTECTED]> writes:
> 1. Why does Postgres come up with a negative n_distinct?

It's a fractional representation.  Per the docs:

> stadistinct   float4  The number of distinct nonnull data values in 
> the column. A value greater than zero is the actual number of distinct 
> values. A value less than zero is the negative of a fraction of the number of 
> rows in the table (for example, a column in which values appear about twice 
> on the average could be represented by stadistinct = -0.5). A zero value 
> means the number of distinct values is unknown
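As a worked example of that fractional form, using the numbers from this
thread (plain arithmetic, sketched in Python):

```python
# A negative n_distinct is a fraction of the row count:
# estimated distinct values = |n_distinct| * number of rows
n_distinct = -0.23   # from the pg_stats row for sailors.rank
reltuples = 30       # rows in Sailors

estimate = abs(n_distinct) * reltuples
print(round(estimate, 1))   # 6.9 -- close to the 7 distinct ranks actually present
```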

> The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
> sense to me for other values of const not in the MVC list.
> For example, if I run the query
> EXPLAIN SELECT * from sailors where rank = -1000;
> Postgres still gives an estimate of "row=2".

I'm not sure what estimate you'd expect instead?  The code has a built in
assumption that no value not present in the MCV list can be more
frequent than the last member of the MCV list, so it's definitely not
gonna guess *more* than 2.
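The arithmetic behind that "rows=2" guess can be sketched as follows. This is
a simplified reading of the equality-selectivity logic (in selfuncs.c), not
the exact code path: the frequency not covered by the MCV list is divided
evenly among the remaining distinct values implied by n_distinct.

```python
reltuples = 30
mcv_freqs = [8/30, 5/30, 5/30, 4/30, 4/30]   # exact frequencies of {3, 9, 10, 7, 8}
ndistinct = 0.23 * reltuples                 # ~6.9 distinct values, from n_distinct = -0.23

# leftover frequency, spread evenly over the non-MCV distinct values
selectivity = (1 - sum(mcv_freqs)) / (ndistinct - len(mcv_freqs))
print(round(reltuples * selectivity))   # 2
```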

regards, tom lane



Re: [PERFORM] Replication Syatem

2008-04-29 Thread Gauri Kanekar
HOT doesn't seem to be working in our case.

This is "table1" structure :
 id        integer                 not null
 code      integer                 not null
 crid      integer                 not null
 status    character varying(1)    default 'A'::character varying
 delta1    bigint                  default 0
 delta2    bigint                  default 0
 delta3    bigint                  default 0
 delta4    bigint                  default 0
 tz_id     integer                 default 0
Indexes:
    "idx1" PRIMARY KEY, btree (id)
    "idx2" UNIQUE, btree (code, crid)
    "idx3" btree (tz_id)
    "idx4" btree (status)

code and crid are foreign keys.

The delta* fields get updated throughout the day, and most of the time the
same row is updated again and again.

table1 contains around 12843694 records.

We don't understand why HOT doesn't work in our case.

We changed fillfactor to 80, 75, and 70, but nothing seems to help.

~Gauri
On Tue, Apr 29, 2008 at 10:18 PM, Shane Ambler <[EMAIL PROTECTED]> wrote:

> Alvaro Herrera wrote:
>
> > Gauri Kanekar escribió:
> >
> >  Do we need to do any special config changes or any other setting for
> > > HOT to
> > > work??
> > >
> >
> > No.  HOT is always working, if it can.  You don't need to configure it.
> >
> >
> Unless you have upgraded since you started this thread you are still
> running 8.1.3.
>
> HOT is only available in 8.3 and 8.3.1
>
> You DO need to upgrade to get the benefits of HOT
>
>
>
>
> --
>
> Shane Ambler
> pgSQL (at) Sheeky (dot) Biz
>
> Get Sheeky @ http://Sheeky.Biz
>



-- 
Regards
Gauri


Re: [PERFORM] Replication Syatem

2008-04-29 Thread Pavan Deolasee
On Wed, Apr 30, 2008 at 10:59 AM, Gauri Kanekar
<[EMAIL PROTECTED]> wrote:
> HOT doesn't seems to be working in our case.
>

Can you please post output of the following query ?

SELECT relid, relname, n_tup_ins, n_tup_upd, n_tup_hot_upd, n_dead_tup
from pg_stat_user_tables WHERE relname = 'table1';


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [PERFORM] Replication Syatem

2008-04-29 Thread Gauri Kanekar
 relid | relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup
-------+---------+-----------+-----------+---------------+------------
 16461 | table1  |         0 |   8352496 |          5389 |    8351242


On Wed, Apr 30, 2008 at 11:07 AM, Pavan Deolasee <[EMAIL PROTECTED]>
wrote:

> On Wed, Apr 30, 2008 at 10:59 AM, Gauri Kanekar
> <[EMAIL PROTECTED]> wrote:
> > HOT doesn't seems to be working in our case.
> >
>
> Can you please post output of the following query ?
>
> SELECT relid, relname, n_tup_ins, n_tup_upd, n_tup_hot_upd, n_dead_tup
> from pg_stat_user_tables WHERE relname = 'table1';
>
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com
>



-- 
Regards
Gauri


Re: [PERFORM] Understanding histograms

2008-04-29 Thread Len Shapiro
Tom,

Thank you for your prompt reply.

On Tue, Apr 29, 2008 at 10:19 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Len Shapiro <[EMAIL PROTECTED]> writes:
>  > 1. Why does Postgres come up with a negative n_distinct?
>
>  It's a fractional representation.  Per the docs:
>
>  > stadistinct   float4  The number of distinct nonnull data values 
> in the column. A value greater than zero is the actual number of distinct 
> values. A value less than zero is the negative of a fraction of the number of 
> rows in the table (for example, a column in which values appear about twice 
> on the average could be represented by stadistinct = -0.5). A zero value 
> means the number of distinct values is unknown

I asked about n_distinct, whose documentation reads in part "The
negated form is used when ANALYZE believes that the number of distinct
values is likely to increase as the table grows".  and I asked about
why ANALYZE believes that the number of distinct values is likely to
increase.  I'm unclear why you quoted to me the documentation on
stadistinct.
>
>
>  > The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
>  > sense to me for other values of const not in the MVC list.
>  > For example, if I run the query
>  > EXPLAIN SELECT * from sailors where rank = -1000;
>  > Postgres still gives an estimate of "row=2".
>
>  I'm not sure what estimate you'd expect instead?

Instead I would expect an estimate of "rows=0" for values of const
that are not in the MCV list and not in the histogram.  When the
histogram has fewer than the maximum number of entries, implying (I am
guessing here) that all non-MCV values are in the histogram list, this
seems like a simple strategy and has the virtue of being accurate.

Where in the source is the code that manipulates the histogram?

> The code has a built in
>  assumption that no value not present in the MCV list can be more
>  frequent than the last member of the MCV list, so it's definitely not
>  gonna guess *more* than 2.

That's interesting.  Where is this in the source code?

Thanks for all your help.

All the best,

Len Shapiro

> regards, tom lane
>



Re: [PERFORM] Replication Syatem

2008-04-29 Thread Pavan Deolasee
On Wed, Apr 30, 2008 at 11:09 AM, Gauri Kanekar
<[EMAIL PROTECTED]> wrote:
>  relid | relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup
> -------+---------+-----------+-----------+---------------+------------
>  16461 | table1  |         0 |   8352496 |          5389 |    8351242
>

Hmm.. So indeed there are very few HOT updates. What is the fillfactor
you are using for these tests? If it's much less than 100, the very
low percentage of HOT updates would make me guess that you are
updating one of the index columns. Otherwise at least the initial
updates, until you fill up the free space, should be HOT.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com



Re: [PERFORM] Replication Syatem

2008-04-29 Thread Gauri Kanekar
"table1" structure :
 id        integer                 not null
 code      integer                 not null
 crid      integer                 not null
 status    character varying(1)    default 'A'::character varying
 delta1    bigint                  default 0
 delta2    bigint                  default 0
 delta3    bigint                  default 0
 delta4    bigint                  default 0
 tz_id     integer                 default 0
Indexes:
    "idx1" PRIMARY KEY, btree (id)
    "idx2" UNIQUE, btree (code, crid)
    "idx3" btree (tz_id)
    "idx4" btree (status)

code and crid are foreign keys.

update table1 set delta1 = 100 where code/100 = 999;


On Wed, Apr 30, 2008 at 12:16 PM, Gauri Kanekar <[EMAIL PROTECTED]>
wrote:

> fillfactor is set to 80 as you suggested.
> The delta* fields are updated, and these fields are not related to any of
> the index fields.
>
>
>
> On Wed, Apr 30, 2008 at 12:13 PM, Pavan Deolasee <[EMAIL PROTECTED]>
> wrote:
>
> > On Wed, Apr 30, 2008 at 11:09 AM, Gauri Kanekar
> > <[EMAIL PROTECTED]> wrote:
> > >  relid | relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup
> > > -------+---------+-----------+-----------+---------------+------------
> > >  16461 | table1  |         0 |   8352496 |          5389 |    8351242
> > >
> >
> > Hmm.. So indeed there are very few HOT updates. What is the fillfactor
> > you are using for these tests ? If its much less than 100, the very
> > low percentage of HOT updates would make me guess that you are
> > updating one of the index columns. Otherwise at least the initial
> > updates until you fill up the free space should be HOT.
> >
> > Thanks,
> > Pavan
> >
> >
> > --
> > Pavan Deolasee
> > EnterpriseDB http://www.enterprisedb.com
> >
>
>
>
> --
> Regards
> Gauri




-- 
Regards
Gauri