Re: [PERFORM] Postgres low end processing.

2003-10-06 Thread Stef
Thanks for the replies,

On Fri, 3 Oct 2003 11:08:48 -0700
Josh Berkus <[EMAIL PROTECTED]> wrote:
=> 1. Make sure that the WAL files (pg_xlog) are on a seperate disk from the 
=> database files, either through mounting or symlinking.
 
I'm not sure I understand how this helps?

=> 2. Tweak the .conf file for low vacuum_mem (1024?), but vacuum very 
=> frequently, like every 1-5 minutes.  Spend some time tuning your 
=> fsm_max_pages to the ideal level so that you're not allocating any extra 
=> memory to the FSM.
=>
=> 3. If your concern is *average* CPU/RAM consumption, and not peak load 
=> activity, increase wal_files and checkpoint_segments to do more efficient 
=> batch processing of pending updates as the cost of some disk space.  If peak 
=> load activity is a problem, don't do this.
=> 
=> 4. Tune all of your queries carefully to avoid anything requiring a 
=> RAM-intensive merge join or CPU-eating calculated expression hash join, or 
=> similar computation-or-RAM-intensive operations.

Thanks, I'll try some of these, and post the results.
The actual machines seem to be Pentium I machines,
with 32M RAM. I've gathered that it is theoretically 
possible, so no to go try it.

Regards
Stef


pgp0.pgp
Description: PGP signature


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Jeff
On Sun, 5 Oct 2003, Neil Conway wrote:

>
> > I don't know any portable way to do that :-(
>
> For the non-portable way of doing this, are you referring to O_DIRECT?
>
> Even if it isn't available everywhere, it might be worth considering
> this at least for the platforms on which it is supported.
>

I strongly agree here only if we can prove there is a benefit.
I think it would be silly of us if some OS supported SnazzyFeatureC that
was able to speed up PG by a large percentage (hopefully, in a rather
non-invasive way in the code).  But, I do see the problem here with bloat
and PG being radically different platform to platform.  I suppose we could
dictate that at least N os's had to have it.. or perhaps supply it has
contrib/ patches Something to think about.

I'd be interested in tinkering with this, but I'm more interested at the
moment of why (with proof, not antecdotal) Solaris is so much slower than
Linux and what we cna do about this.  We're looking to move a rather large
Informix db to PG and ops has reservations about ditching Sun hardware.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Andrew Sullivan
On Mon, Oct 06, 2003 at 08:07:27AM -0400, Jeff wrote:
> I strongly agree here only if we can prove there is a benefit.

There's plenty of academic work which purports to show that LRU is
far from the best choice.  Just in principle, it seems obvious that a
single-case seqscan-type operation (such as vacuum does) is a good
way to lose your cache for no real gain.

> I'd be interested in tinkering with this, but I'm more interested at the
> moment of why (with proof, not antecdotal) Solaris is so much slower than
> Linux and what we cna do about this.  We're looking to move a rather large
> Informix db to PG and ops has reservations about ditching Sun hardware.

Interestingly, we're contemplating ditching Solaris because of the
terrible reliability we're getting from the hardware.

You can use truss to find some of the problems on Solaris.  The
open() syscall takes forever when you don't hit the Postgres shared
buffers (even if you can be sure the page is in filesystem buffers --
we could demonstrate it on a 1 G database on a machine with 10 G of
RAM).  I've heard grumblings about spinlocks on Solaris which might
explain this problem.  I certainly notice that performance gets
geometrically worse when you add a few hundred extra connections.

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Afilias CanadaToronto, Ontario Canada
<[EMAIL PROTECTED]>  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Shridhar Daithankar
Jeff wrote:
I'd be interested in tinkering with this, but I'm more interested at the
moment of why (with proof, not antecdotal) Solaris is so much slower than
Linux and what we cna do about this.  We're looking to move a rather large
Informix db to PG and ops has reservations about ditching Sun hardware.
Is linux on sparc hardware is an option..:-)

 Shridhar

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Postgres low end processing.

2003-10-06 Thread Bruno Wolff III
On Mon, Oct 06, 2003 at 09:55:51 +0200,
  Stef <[EMAIL PROTECTED]> wrote:
> 
> Thanks, I'll try some of these, and post the results.
> The actual machines seem to be Pentium I machines,
> with 32M RAM. I've gathered that it is theoretically 
> possible, so no to go try it.

I am running 7.4beta2 on a Pentium I machine with 48 MB of memory.
I was running an earlier version of Postgres (probably 7.1.x) on it
when it only had 32 MB of memory. It doesn't run very fast, but it
works OK. I remember increase from 32MB to 48MB was very noticible in
the time to serve web pages using information from the DB, but I
don't remember the details since it was a couple of years ago.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] locking/performance, Solaris performance discovery

2003-10-06 Thread Jeff
On Mon, 6 Oct 2003, Andrew Sullivan wrote:

> There's plenty of academic work which purports to show that LRU is
> far from the best choice.  Just in principle, it seems obvious that a
> single-case seqscan-type operation (such as vacuum does) is a good
> way to lose your cache for no real gain.
>

Logically bypassing caches for a seq scan also makes sense.

> Interestingly, we're contemplating ditching Solaris because of the
> terrible reliability we're getting from the hardware.
>

The reason ops likes solaris / sun is twofold. 1. we have a pile of big
sun machines around.  2. Solaris / Sun is quite a bit more graceful in the
egvent of a hardware failure.  We've burned out our fair share of cpu's
etc and solaris has been rather graceful about it.

I've started profiling and running tests... currently it is leaning
towards the sysv semaphores. I see in src/backend/port/ that pg_sema.c is
linked to the sysv implementation.  So what I did was create a
semaphore set, and then fired off 5 copies of a program that attaches
to that semaphore and then locks/unlocks it 1M times.

2xP2-450, Linux 2.4.18: 1 process: 221680 / sec, 5 process: 98039 / sec
4xUltraSparc II-400Mhz, Solaris 2.6: 1 proc: 142857 / sec, 5 process:
23809

So I'm guessing that is where a LOT of the suck is coming from.

What I plan to do next is looking to see if there are other interprocess
locking mechanisms on solaris (perhaps pthread_mutex with that
inter-process flag or something) to see if I can get those numbers a
little closer.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] locking/performance, Solaris performance discovery

2003-10-06 Thread Tom Lane
Jeff <[EMAIL PROTECTED]> writes:
> I've started profiling and running tests... currently it is leaning
> towards the sysv semaphores. I see in src/backend/port/ that pg_sema.c is
> linked to the sysv implementation.

Does Solaris have Posix semaphores?  You could try using those instead.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Shopping for hardware

2003-10-06 Thread Jason Hihn
Ok, I asked this on [novice], but I was told it's be better to post it
here...

I've got some money to spend on a new servers. The biggest concern is the
PostgreSQL database server that will "be the company." (*Everyone* uses the
database server in some form or another) I'm looking for hot-swappable RAID
1 on a Linux platform at the least. Are there any vendors to avoid or
prefer? What works best? Am I better off going with a DIY or getting
something pre-packaged?

In terms of numbers, we expect have an average of 100 active connections
(most of which are idle 9/10ths of the time), with about 85% reading
traffic.  I hope to have one server host about 1000-2000 active databases,
with the largest being about 60 meg (no blobs). Inactive databases will only
be for reading (archival) purposes, and will seldom be accessed. (I could
probably move them off to another server with a r/o disk...)

Does any of this represent a problem for Postgres? The datasets are
typically not that large, only a few queries on a few databases ever return
over 1000 rows.

The configuration that is going on in my head is:
RAID 1, 200gig disks
1 server, 4g ram
Linux 2.4 or 2.6 (depends on when we deploy and 2.6's track record at that
time)

I want something that can do hot-swaps and auto-mirroring after swap.
Unfortunately, this is a new area for me. (I normally stick to S/W for
non-high end systems)

Thanks!


Jason Hihn
Paytime Payroll



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] locking/performance, Solaris performance discovery

2003-10-06 Thread Jeff
On Mon, 6 Oct 2003, Tom Lane wrote:

>
> Does Solaris have Posix semaphores?  You could try using those instead.
>
>   regards, tom lane

Yep. It does.

I learned them quick enough (using posix_sema.c as a guide)
and found out that at least on Sol 2.6 they are slower than sysv - with 5
processes it went to about 16k lock/unlock a second.

I'm going to try to find a box around here I can get sol(8|9) on that has
sufficient disk space and see.  I'm guessing sun has likely made
improvements...


Another odd thing I'm trying to work out is why my profiles come out so
radically different on the linux box and the sun box.

Sun:
 31.17 18.9018.90 internal_mcount
 19.10 30.4811.58  8075381 0.00 0.00  _bt_checkkeys
  5.66 33.91 3.43 24375253 0.00 0.00  FunctionCall2
  4.82 36.83 2.92  8073010 0.00 0.00  _bt_step
  3.51 38.96 2.1314198 0.15 0.15  _read
  2.77 40.64 1.68  8069040 0.00 0.00  varchareq
  2.59 42.21 1.5728454 0.06 0.23  _bt_next
  2.29 43.60 1.39 1003 1.39 1.40  AtEOXact_Buffers
  1.86 44.73 1.13 16281197 0.00 0.00  pg_detoast_datum
  1.81 45.83 1.10 _mcount
  1.68 46.85 1.02 2181 0.47 0.47  pglz_decompress


Linux:
 11.14  0.62 0.62 1879 0.00 0.00  pglz_decompress
  6.71  0.99 0.37 1004 0.00 0.00  AtEOXact_Buffers
  3.80  1.20 0.21  1103045 0.00 0.00  AllocSetAlloc
  3.23  1.38 0.18   174871 0.00 0.00  nocachegetattr
  2.92  1.54 0.16  1634957 0.00 0.00  AllocSetFreeIndex
  2.50  1.68 0.1420303 0.00 0.00  heapgettup
  1.93  1.79 0.11 1003 0.00 0.00  AtEOXact_CatCache
  1.76  1.89 0.10   128442 0.00 0.00  hash_any
  1.72  1.98 0.1090312 0.00 0.00  FunctionCall3
  1.69  2.08 0.0950632 0.00 0.00  ExecTargetList
  1.60  2.17 0.0951647 0.00 0.00  heap_formtuple
  1.55  2.25 0.09   406162 0.00 0.00  newNode
  1.46  2.33 0.08   133044 0.00 0.00  hash_search

It is the same query with slightly different data (The Sun has probably..
20-40k more rows in the table the query hits).

I'll be digging up more info later today.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Shopping for hardware

2003-10-06 Thread scott.marlowe
On Mon, 6 Oct 2003, Jason Hihn wrote:

> Ok, I asked this on [novice], but I was told it's be better to post it
> here...
> 
> I've got some money to spend on a new servers. The biggest concern is the
> PostgreSQL database server that will "be the company." (*Everyone* uses the
> database server in some form or another) I'm looking for hot-swappable RAID
> 1 on a Linux platform at the least. Are there any vendors to avoid or
> prefer? What works best? Am I better off going with a DIY or getting
> something pre-packaged?

Depends on your hardware expertise.  You can do quite well either way.  I 
prefer adding my own components to a pre-built vanilla server.

> In terms of numbers, we expect have an average of 100 active connections
> (most of which are idle 9/10ths of the time), with about 85% reading
> traffic.  I hope to have one server host about 1000-2000 active databases,
> with the largest being about 60 meg (no blobs). Inactive databases will only
> be for reading (archival) purposes, and will seldom be accessed. (I could
> probably move them off to another server with a r/o disk...)

That's not a really big load, but I'm guessing the peaks will be big 
enough to notice.

> Does any of this represent a problem for Postgres? The datasets are
> typically not that large, only a few queries on a few databases ever return
> over 1000 rows.

Nah, this is pretty normal stuff for Postgresql or any other database in 
its approximate class (Sybase, Oracle, Informix, DB2, MSSQL2k).

> The configuration that is going on in my head is:
> RAID 1, 200gig disks
> 1 server, 4g ram
> Linux 2.4 or 2.6 (depends on when we deploy and 2.6's track record at that
> time)

That's a good starting point.  I'd avoid 2.6 until it's had time for the 
bugs to drop out.  The latest 2.4 kernels are pretty stable.

List of things to include if you need more performance, in order of 
priority:

proper tuning of the postgresql.conf file (see 
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html)
hardware RAID card with battery backed cache, the bigger the cache the 
better.
more drives for RAID 1+0
faster CPUs.  

since you've already got 4 gigs of RAM slated, you're set there on linux, 
where having more won't likely help a lot unless you go to a 64 bit 
platform.

> I want something that can do hot-swaps and auto-mirroring after swap.
> Unfortunately, this is a new area for me. (I normally stick to S/W for
> non-high end systems)

The LSI/Megaraid cards can handle hot swaps quite well, make sure you get 
the right kind of hot swap shoes so they isolate the drive from the buss 
when you turn it off and they don't lock up your scsi buss.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Shopping for hardware

2003-10-06 Thread Vivek Khera
> "JH" == Jason Hihn <[EMAIL PROTECTED]> writes:

JH> The configuration that is going on in my head is:
JH> RAID 1, 200gig disks
JH> 1 server, 4g ram
JH> Linux 2.4 or 2.6 (depends on when we deploy and 2.6's track record at that
JH> time)

My recommendation is to get more disks (smaller and faster) rather
than a few large ones.  As for vendors, I always buy from Dell because
they actually honor their "4-hour 24x7 replacement parts with
technician to stick 'em" in guarantee.  That and their hardware is
rock solid and non-funky (ie, I can run FreeBSD on it with no issues).

Here's my latest setup I just got:

Dell PE 2650, dual Xeon processors (lowest speed they sell, as this is
not a bottleneck)
4Gb RAM
Dell PERC3 RAID controller (rebranded AMI controller) dual channel
2x 18Gb internal disks on RAID1 (RAID channel0)
14x 18Gb external disks on RAID5 (RAID channel1, see my posts on this
list from a month or so ago on how I arrived at RAID5).

All the disks are SCSI 15kRPM U320 drives, tho the controller only
does U160.

I run FreeBSD, but it should run linux just fine, too.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Shopping for hardware

2003-10-06 Thread Josh Berkus
Jason,

> In terms of numbers, we expect have an average of 100 active connections
> (most of which are idle 9/10ths of the time), with about 85% reading
> traffic.  I hope to have one server host about 1000-2000 active databases,
> with the largest being about 60 meg (no blobs). Inactive databases will
> only be for reading (archival) purposes, and will seldom be accessed. (I
> could probably move them off to another server with a r/o disk...)

Hey, two people (one of them me) suggested that rather than putting all 2000 
databases on one $15,000 server, that you buy 3 $5000 servers and split 
things up.   You may have considered this suggestion and rejected it, but 
I'mm wondering if you missed it ...

If you're lumping everything on one server, you'll need to remember to 
increase max_fsm_relations to the total number of tables in all databases ... 
for example, for 10 tables in 2000 databases you'll want a setting of 2 
(which sounds huge but it's really only about 1mb memory).

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Seqscan buffer promotion (was: reindex/vacuum locking/performance?)

2003-10-06 Thread James Rogers
On Mon, 2003-10-06 at 05:15, Andrew Sullivan wrote:
> There's plenty of academic work which purports to show that LRU is
> far from the best choice.  Just in principle, it seems obvious that a
> single-case seqscan-type operation (such as vacuum does) is a good
> way to lose your cache for no real gain.


Traditionally, seqscan type operations are accommodated in LRU type
managers by having multiple buffer promotion policies, primarily because
it is simple to implement.  For example, if you are doing a seqscan, a
buffer loaded from disk is never promoted to the top of the LRU. 
Instead it is only partially promoted (say, halfway for example) toward
the top of the buffer list.  A page that is already in the buffer is
promoted either to the halfway point or top depending on where it was
found.  There are numerous variations on the idea, some being more
clever and complex than others.  

The point of this being that a pathological or rare sequential scan can
never trash more than a certain percentage of the cache, while not
significantly impacting the performance of a sequential scan. The
primary nuisance is that it slightly increases the API complexity. I'll
add that I don't know what PostgreSQL actually does in this regard, but
from the thread it appears as though seqscans are handled like the
default case.

Cheers,

-James Rogers
 [EMAIL PROTECTED]



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] count(*) slow on large tables

2003-10-06 Thread Sean Chittenden
> How it will help? This is in addition to trigger proposal that came
> up earlier. With triggers it's not possible to make values visible
> across backends unless trigger updates a table, which eventually
> leads to vacuum/dead tuples problem.
> 
> 1. User creates a trigger to check updates/inserts for certain conditions.
> 2. It updates the count as and when required.
> 3. If the trigger detects the count is not initialized, it would issue the 
> same query first time. There is no avoiding this issue.
> 
> Besides providing facility of resident variables could be used
> imaginatively as well.
> 
> Does this make sense? IMO this is more generalised approach over all.

I do this _VERY_ frequently in my databases, only I have my stored
procs do the aggregate in a predefined MVCC table that's always there.
Here's a denormalized version for public consumption/thought:

CREATE TABLE global.dba_aggregate_cache (
  dbl TEXT NOT NULL,-- The database location, doesn't need to be
-- qualified (ex: schema.table.col)
  op TEXT NOT NULL, -- The operation, SUM, COUNT, etc.
  qual TEXT,-- Any kind of conditional, such as a where clause
  val_int INT,  -- Whatever the value is, of type INT
  val_bigint BIGINT,-- Whatever the value is, of type BIGINT
  val_text TEXT,-- Whatever the value is, of type TEXT
  val_bytea BYTEA,  -- Whatever the value is, of type BYTEA
);
CREATE UNIQUE INDEX dba_aggregate_cache_dbl_op_udx ON 
global.dba_aggregate_cache(dbl,op);

Then, I use a function to retrieve this value instead of a SELECT
COUNT(*).

SELECT public.cache_count('dbl','qual');  -- In this case, the op is COUNT
SELECT public.cache_count('dbl'); -- Returns the COUNT for the table listed in 
the dbl

Then, I create 4 or 5 functions (depends on the op I'm performing):

1) A private function that _doesn't_ run as security definer, that
   populates the global.dba_aggregate_cache row if it's empty.
2) A STABLE function for SELECTs, if the row doesn't exist, then it
   calls function #1 to populate its existence.
3) A STABLE function for INSERTs, if the row doesn't exist, then it
   calls function #1 to populate its existence, then adds the
   necessary bits to make it accurate.
4) A STABLE function for DELETEs, if the row doesn't exist, then it
   calls function #1 to populate its existence, then deletes the
   necessary bits to make it accurate.
5) A STABLE function for UPDATEs, if the row doesn't exist, then it
   calls function #1 to populate its existence, then updates the
   necessary bits to make it accurate.  It's not uncommon for me to
   not have an UPDATE function/trigger.

Create triggers for functions 2-5, and test away.  It's MVCC,
searching through a table that's INDEX'ed for a single row is
obviously vastly faster than a seqscan/aggregate.  If I need any kind
of an aggregate to be fast, I use this system with a derivation of the
above table.  The problem with it being that I have to retrain others
to use cache_count(), or some other function instead of using
COUNT(*).

That said, it'd be nice if there were a way to tell PostgreSQL to do
the above for you and teach COUNT(*), SUM(*), or other aggregates to
use an MVCC backed cache similar to the above.  If people want their
COUNT's to be fast, then they have to live with the INSERT, UPDATE,
DELETE cost.  The above doesn't work with anything complex such as
join's, but it's certainly a start and I think satisfies everyone's
gripes other than the tuple churn that _does_ happen (*nudge nudge*,
pg_autovacuum could be integrated into the backend to handle this).
Those worried about performance, the pages that are constantly being
recycled would likely stay in disk cache (PG or the OS).  There's
still some commit overhead, but still... no need to over optimize by
requiring the table to be stored in the out dated, slow, and over used
shm (also, *nudge nudge*).

Anyway, let me throw that out there as a solution that I use and it
works quite well.  I didn't explain the use of the qual column, but I
think those who grasp the above way of handling things probably grok
how to use the qual column in a dynamically executed query.

CREATE AGGREGATE CACHE anyone?

-sc

-- 
Sean Chittenden

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] SOlaris updates

2003-10-06 Thread Jeff
Ran the test on another linux box - the one that generated the dump the
sun loaded (which should have similar data...) and I got a profile plan
similar to the Sun. Which makes me feel more comfortable.

Still interesting why that other box gave me the different profile.
Now off the fun and exciting world of seeing what I can do about it.


--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Neil Conway
On Sun, 2003-10-05 at 19:50, Neil Conway wrote:
> On Sun, 2003-10-05 at 19:43, Tom Lane wrote:
> > This would be relatively easy to fix as far as our own buffering is
> > concerned, but the thing that's needed to make it really useful is
> > to prevent caching of seqscan-read pages in the kernel disk buffers.

> For the non-portable way of doing this, are you referring to O_DIRECT?

I was hoping you'd reply to this, Tom -- you were referring to O_DIRECT,
right?

(If you were referring to O_DIRECT, I wanted to add that I wouldn't be
surprised if using O_DIRECT on many kernels reduces or eliminates any
readahead the OS will be doing on the sequential read, so the net result
may actually be a loss for a typical seqscan.)

-Neil



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Postgres low end processing.

2003-10-06 Thread Josh Berkus
Stef,

> => 1. Make sure that the WAL files (pg_xlog) are on a seperate disk from the 
> => database files, either through mounting or symlinking.
>  
> I'm not sure I understand how this helps?

It gives you better fsync write performance on a low-end disk setup.   
Otherwise, the disk is forced to do a hop-back-and-forth between the database 
and the xlog, resulting in much slower updates and thus the database tying up 
blocks of RAM longer -- particularly if your shared_buffers are set very low, 
which they will be.

On RAID setups, this is unnecessary becuase the RAID takes care of disk access 
management.  But on a low-end, 2-IDE-disk machine, you have to do it.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> On Sun, 2003-10-05 at 19:50, Neil Conway wrote:
> I was hoping you'd reply to this, Tom -- you were referring to O_DIRECT,
> right?

Not necessarily --- as you point out, it's not clear that O_DIRECT would
help us.  What would be way cool is something similar to what James
Rogers was talking about: a way to tell the kernel not to promote this
page all the way to the top of its LRU list.  I'm not sure that *any*
Unixen have such an API, let alone one that's common across more than
one platform :-(

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Larry Rosenman


--On Monday, October 06, 2003 14:26:10 -0400 Tom Lane <[EMAIL PROTECTED]> 
wrote:

Neil Conway <[EMAIL PROTECTED]> writes:
On Sun, 2003-10-05 at 19:50, Neil Conway wrote:
I was hoping you'd reply to this, Tom -- you were referring to O_DIRECT,
right?
Not necessarily --- as you point out, it's not clear that O_DIRECT would
help us.  What would be way cool is something similar to what James
Rogers was talking about: a way to tell the kernel not to promote this
page all the way to the top of its LRU list.  I'm not sure that *any*
Unixen have such an API, let alone one that's common across more than
one platform :-(
I think Verita's VxFS has this as an option/IOCTL.

You can read the Veritas doc on my
http://www.lerctr.org:8458/
pages under filesystems.

That should work on UnixWare and Solaris sites that have VxFS installed.

VxFS is standard on UW.

LER

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


pgp0.pgp
Description: PGP signature


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Bruce Momjian
Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > On Sun, 2003-10-05 at 19:50, Neil Conway wrote:
> > I was hoping you'd reply to this, Tom -- you were referring to O_DIRECT,
> > right?
> 
> Not necessarily --- as you point out, it's not clear that O_DIRECT would
> help us.  What would be way cool is something similar to what James
> Rogers was talking about: a way to tell the kernel not to promote this
> page all the way to the top of its LRU list.  I'm not sure that *any*
> Unixen have such an API, let alone one that's common across more than
> one platform :-(

Solaris has "free-behind", which prevents a large kernel sequential scan
from blowing out the cache.

I only read about it in the Mauro Solaris Internals book, and it seems
to be done automatically.  I guess most OS's don't do this optimization
because they usually don't read files larger than their cache.

I see BSD/OS madvise() has:

 #define MADV_NORMAL 0   /* no further special treatment */
 #define MADV_RANDOM 1   /* expect random page references */
 #define MADV_SEQUENTIAL 2   /* expect sequential references */
 #define MADV_WILLNEED   3   /* will need these pages */
-->  #define MADV_DONTNEED   4   /* don't need these pages */
 #define MADV_SPACEAVAIL 5   /* insure that resources are reserved */

The marked one seems to have the control we need.  Of course, the kernel
madvise() code has:

/* Not yet implemented */

Looks like NetBSD implements it, but it also unmaps the page from the
address space, which might be more than we want.  NetBSD alao has:

 #define MADV_FREE   6   /* pages are empty, free them */

which frees the page.  I am unclear on its us.

FreeBSD has this comment:

/*
 * vm_page_dontneed
 *
 *  Cache, deactivate, or do nothing as appropriate.  This routine
 *  is typically used by madvise() MADV_DONTNEED.
 *
 *  Generally speaking we want to move the page into the cache so
 *  it gets reused quickly.  However, this can result in a silly syndrome
 *  due to the page recycling too quickly.  Small objects will not be
 *  fully cached.  On the otherhand, if we move the page to the inactive
 *  queue we wind up with a problem whereby very large objects
 *  unnecessarily blow away our inactive and cache queues.
 *
 *  The solution is to move the pages based on a fixed weighting.  We
 *  either leave them alone, deactivate them, or move them to the cache,
 *  where moving them to the cache has the highest weighting.
 *  By forcing some pages into other queues we eventually force the
 *  system to balance the queues, potentially recovering other unrelated
 *  space from active.  The idea is to not force this to happen too
 *  often.
 */

The Linux comment is:

/*
 * Application no longer needs these pages.  If the pages are dirty,
 * it's OK to just throw them away.  The app will be more careful about
 * data it wants to keep.  Be sure to free swap resources too.  The
 * zap_page_range call sets things up for refill_inactive to actually free
 * these pages later if no one else has touched them in the meantime,
 * although we could add these pages to a global reuse list for
 * refill_inactive to pick up before reclaiming other pages.
 *
 * NB: This interface discards data rather than pushes it out to swap,
 * as some implementations do.  This has performance implications for
 * applications like large transactional databases which want to discard
 * pages in anonymous maps after committing to backing store the data
 * that was kept in them.  There is no reason to write this data out to
 * the swap area if the application is discarding it.
 *
 * An interface that causes the system to free clean pages and flush
 * dirty pages is already available as msync(MS_INVALIDATE).
 */

It seems mmap is more for controlling the memory mapping of files rather
than controlling the cache itself.



-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Jason Hihn
Stepping out on a limb... (I'm not a disk kernel guy)

I have long thought that as part of a cache descriptor, there should be a
process-definable replacement-strategy (RS). Each cache entry would be
associated to each process's replacement-strategy variable and the
page-replacement algorithm would then take into consideration the desired
policy. Imagine for simplicity sake, that each strategy gets its own cache
table. When it comes time to replace a page, the system scans the cache
tables, picks the most likely page for replacement from each table, then
selects the most likely page between all policies. This allows the 99% of
apps that can make excellent use of use LRU to use LRU among themselves
(best for everyone), and the MRU (better for databases) (best for everyone
too) to only sacrifice the best pages between MRU apps. Though, once you
have an MRU process, the final decision between taking the page should be
use MRU, and not LRU. Of course there are a number of questions: does each
RS get its own table, to be managed independently, or can we combine them
all into one table? What are the performance implications of the multiple
table management?

One day, I'd like to see function pointers and kernel modules used as ways
for apps to manage replacement policy. fantasyland# insmod MRU.o
fantasyland# vi postgresql.conf { replacement_policy=MRU }
{meanwhile in some postgre .c file:}
set_cache_policy(get_cfg_replacement_policy());
fantasyland# service postmaster restart

Anyone want to throw this at the kernel developers?

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane
> Sent: Monday, October 06, 2003 2:26 PM
> To: Neil Conway
> Cc: Andrew Sullivan; PostgreSQL Performance
> Subject: Re: [PERFORM] reindex/vacuum locking/performance?
>
>
> Neil Conway <[EMAIL PROTECTED]> writes:
> > On Sun, 2003-10-05 at 19:50, Neil Conway wrote:
> > I was hoping you'd reply to this, Tom -- you were referring to O_DIRECT,
> > right?
>
> Not necessarily --- as you point out, it's not clear that O_DIRECT would
> help us.  What would be way cool is something similar to what James
> Rogers was talking about: a way to tell the kernel not to promote this
> page all the way to the top of its LRU list.  I'm not sure that *any*
> Unixen have such an API, let alone one that's common across more than
> one platform :-(
>
>   regards, tom lane
>
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly
>


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] reindex/vacuum locking/performance?

2003-10-06 Thread Ronald Khoo

> This would be relatively easy to fix as far as our own buffering is
> concerned, but the thing that's needed to make it really useful is
> to prevent caching of seqscan-read pages in the kernel disk buffers.
> I don't know any portable way to do that :-(

raw disc ? :-)




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]