[PERFORM] Forcing WAL flush

2013-01-07 Thread james
Is there a way to force a WAL flush so that async commits (from other 
connections) are flushed, short of actually updating a sacrificial row?


Would be nice to do it without generating anything extra, even if it is 
something that causes IO in the checkpoint.


Am I right to think that an empty transaction won't do it, and nor will 
a transaction that is just a NOTIFY?





Re: [PERFORM] Forcing WAL flush

2013-01-07 Thread james

On 2013-01-07 at 16:49, james wrote:


Is there a way to force a WAL flush so that async commits (from other 
connections) are flushed, short of actually updating a sacrificial row?

Would be nice to do it without generating anything extra, even if it is 
something that causes IO in the checkpoint.

Am I right to think that an empty transaction won't do it, and nor will a 
transaction that is just a NOTIFY?


Does pg_start_backup() trigger a full WAL flush?

http://www.postgresql.org/docs/9.2/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP
http://www.postgresql.org/docs/9.2/static/functions-admin.html#FUNCTIONS-ADMIN-BACKUP-TABLE

Bye,
François


That sounds rather heavyweight!

I'm looking for something lightweight - I might call this rather often, 
as a sort of application-level group commit where I commit async but 
defer the ack to the requester (or other externally visible side 
effects) slightly until some other thread forces a flush.
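
If I understand the mechanism correctly, a synchronous commit of any 
transaction that writes a commit record flushes all WAL queued before it, 
including other connections' async commits.  So perhaps the flushing thread 
only needs something as small as this (just a sketch - txid_current() is 
there purely to force an xid so that the commit actually writes WAL):

    begin;
    set local synchronous_commit = on;
    select txid_current();  -- forces an xid, so the commit writes (and flushes) a WAL record
    commit;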






Re: [PERFORM] Performance of complicated query

2013-05-23 Thread james

On 23/05/2013 22:57, Jonathan Morra wrote:
I'm not sure I understand your proposed solution.  There is also the 
case to consider where the same patient can be assigned the same 
device multiple times.  In this case, the value may be reset at each 
assignment (hence the line value - issued_value AS value from the 
original query).




Perhaps you could use triggers to help somewhat?  At least for the 
lifetime part.


For a given assignment of a device to a patient, only the last value is 
useful, so you can maintain that easily enough (a bit like a 
materialised view but before 9.3 I guess).


But, that might fix 'lifetime' but not some arbitrary windowed view.  I 
can see why an 'as at' end time is useful, but not why a start time is 
so useful: if a device has readings before the window but not in the 
window, is that 'no reading' or should the last reading prior to the 
window apply?


It also seems to me that the solution you have is hard to reason about.  
It's like a Haskell program done in one big inline fold rather than a 
bunch of 'where' clauses, and I find these cause significant brain overload.


Perhaps you could break it out into identifiable chunks that work out 
(both for lifetime if not using triggers, and for your date range 
otherwise) the readings that are not superseded (i.e. the last in the date 
bounds for a device assignment), and then work with those.  Consider 
CTEs ('WITH' queries) for doing this?
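
Something along these lines, perhaps - I'm guessing at your table and column 
names (readings, assignments, read_time, issued_value), so treat it as a 
sketch rather than working SQL:

    with latest as (
        select distinct on (assignment_id)
               assignment_id, read_time, value
        from readings
        where read_time <= timestamp '2013-05-23 00:00:00'  -- the 'as at' end time
        order by assignment_id, read_time desc
    )
    select a.patient_id, a.device_id, l.value - a.issued_value as value
    from latest l
    join assignments a on a.assignment_id = l.assignment_id;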


It seems to me that if you can do this, then the problem might be easier 
to express.


Failing that, I'd be looking at using temporary tables, and forcing a 
series of reduce steps using them, but then I'm a nasty old Sybase 
hacker at heart. ;-)




[PERFORM] Postgres recovery time

2005-11-14 Thread Piccarello, James (James)

Does anyone know what factors affect the recovery time of postgres if it does
not shut down cleanly?  With the same size database I've seen times from a few
seconds to a few minutes.  The longest time was 33 minutes.  The 33 minutes was
after a complete system crash and reboot, so there are a lot of other things
going on as well.  125 seconds was the longest time I could reproduce by just
doing a kill -9 on postmaster.

Is it the size of the transaction log?  The dead space in files?

I'm running postgres 7.3.4 on Red Hat 8.0.  Yes, yes, I know it's crazy, but
for a variety of reasons upgrading is not currently feasible.

Jim





Re: [PERFORM] Hardware/OS recommendations for large databases ( 5TB)

2005-11-15 Thread James Mello
Unless there was a way to guarantee consistency, it would be hard at
best to make this work. Convergence on large data sets across boxes is
non-trivial, and diffing databases is difficult at best. Unless there
was some form of automated way to ensure consistency, going 8 ways into
separate boxes is *very* hard. I do suppose that if you have fancy
storage (EMC, Hitachi) you could do BCV or Shadow copies. But in terms
of commodity stuff, I'd have to agree with Merlin. 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of William Yu
Sent: Tuesday, November 15, 2005 10:57 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Hardware/OS recommendations for large databases (
5TB)

Merlin Moncure wrote:
>>You could instead buy 8 machines that total 16 cores, 128GB RAM and
>  
> It's hard to say what would be better.  My gut says the 5u box would 
> be a lot better at handling high cpu/high concurrency problems...like 
> your typical business erp backend.  This is pure speculation of 
> course...I'll defer to the experts here.

In this specific case (data warehouse app), multiple machines is the
better bet. Load data on 1 machine, copy to other servers and then use a
middleman to spread out SQL statements to each machine.

I was going to suggest pgpool as the middleman but I believe it's
limited to 2 machines max at this time. I suppose you could daisy chain
pgpools running on every machine.



[PERFORM] make bulk deletes faster?

2005-12-17 Thread James Klo
I have the following table:

CREATE TABLE timeblock
(
  timeblockid int8 NOT NULL,
  starttime timestamp,
  endtime timestamp,
  duration int4,
  blocktypeid int8,
  domain_id int8,
  create_date timestamp,
  revision_date timestamp,
  scheduleid int8,
  CONSTRAINT timeblock_pkey PRIMARY KEY (timeblockid),
  CONSTRAINT fk25629e03312570b FOREIGN KEY (blocktypeid)
  REFERENCES blocktype (blocktypeid) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk25629e09be84177 FOREIGN KEY (domain_id)
  REFERENCES wa_common_domain (domain_id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
) 
WITH OIDS;

CREATE INDEX timeblock_blocktype_idx
  ON timeblock
  USING btree
  (blocktypeid);

CREATE INDEX timeblock_date_idx
  ON timeblock
  USING btree
  (starttime, endtime);

CREATE INDEX timeblockepoch_idx
  ON timeblock
  USING btree
  (date_trunc('minute'::text, starttime), (date_part('epoch'::text, 
date_trunc('minute'::text, starttime)) * 1000::double precision), 
date_trunc('minute'::text, endtime), (date_part('epoch'::text, 
date_trunc('minute'::text, endtime)) * 1000::double precision));

CREATE INDEX timeblockhourmin_idx
  ON timeblock
  USING btree
  (date_part('hour'::text, starttime), date_part('minute'::text, 
starttime), date_part('hour'::text, endtime), date_part('minute'::text, 
endtime));

CREATE INDEX timeblockid_idx
  ON timeblock
  USING btree
  (timeblockid);


There are also indexes on wa_common_domain and blocktype on pkeys.

explain analyze delete from timeblock where timeblockid = 66

Index Scan using timeblockid_idx on timeblock  (cost=0.00..5.28 rows=1 
width=6) (actual time=0.022..0.022 rows=0 loops=1)
  Index Cond: (timeblockid = 66)
Total runtime: 0.069 ms


I need to routinely move data from the timeblock table to an archive 
table with the same schema named timeblock_archive.  I really need this 
to happen as quickly as possible, as the archive operation appears to 
really tax the db server... 

I'd like some suggestions on how to get the deletes to happen faster: while 
deleting individual rows appears to be extremely fast, when I go to delete 
lots of rows the operation takes an extremely long time to complete (5000 
rows takes about 3 minutes, 100 rows takes close to 4 hours or more 
depending upon server load; wall time, btw).

I've tried several different approaches to doing the delete and I can't 
seem to make it much faster... anyone have any ideas?

The approaches I've taken both use a temp table to define the set that 
needs to be deleted.

Here's what I've tried:

Attempt 1:
--
delete from timeblock where timeblockid in (select timeblockid from 
timeblock_tmp)


Attempt 2:
--
num_to_delete := (select count(1) from tmp_timeblock);
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]',
    timestart, num_to_delete;
cur_offset := 0;
while cur_offset < num_to_delete loop
    delete from timeblock where timeblockid in
        (select timeblockid from tmp_timeblock limit 100 offset cur_offset);
    get diagnostics num_affected = ROW_COUNT;
    RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% rows] cur_offset = %',
        timestart, num_affected, cur_offset;
    cur_offset := cur_offset + 100;
end loop;


Attempt 3:
--
num_to_delete := (select count(1) from tmp_timeblock);
cur_offset := num_to_delete;
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]',
    timestart, num_to_delete;
open del_cursor for select timeblockid from tmp_timeblock;
loop
    fetch del_cursor into del_pkey;
    if not found then
        exit;
    else
        delete from timeblock where timeblockid = del_pkey;
        get diagnostics num_affected = ROW_COUNT;
        cur_offset := cur_offset - num_affected;
        if cur_offset % 1000 = 0 then
            RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% left]',
                timestart, cur_offset;
        end if;
    end if;
end loop;
close del_cursor;
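
I suppose a join-style delete against the temp table might also be worth a 
try (8.1 supports USING in DELETE), though I haven't timed it yet - roughly:

    delete from timeblock
    using tmp_timeblock
    where timeblock.timeblockid = tmp_timeblock.timeblockid;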


I've considered using min(starttime) and max(starttime) from the temp 
table and doing: 

delete from timeblock where starttime between min and max;

however, I'm concerned about leaving orphan data, deleting too much data, 
running into foreign key conflicts, etc.

Dropping the indexes on timeblock could be bad, as this table receives a 
high volume of reads, inserts & updates.

Anyone have any suggestions?

Thanks,

Jim K



Re: [PERFORM] make bulk deletes faster?

2005-12-19 Thread James Klo
In article <[EMAIL PROTECTED]>,
 [EMAIL PROTECTED] (Michael Fuhr) wrote:

> On Sat, Dec 17, 2005 at 09:10:40PM -0800, James Klo wrote:
> > I'd like some suggestions on how to get the deletes to happen faster, as 
> > while deleting individually appears to extremely fast, when I go to 
> > delete lots of rows the operation takes an extremely long time to 
> > complete (5000 rows takes about 3 minutes, 100 rows takes almost 
> > close to 4 hours or more depending upon server load; wall time btw).
> 
> Those times do seem excessive -- do any other tables have foreign
> key references to the table you're deleting from?  If so, do those
> tables have indexes on the referring columns?  Does this table or
> any referring table have triggers?  Also, are you regularly vacuuming
> and analyzing your tables?  Have you examined pg_locks to see if
> an unacquired lock might be slowing things down?

As the table was originally created using Hibernate, yes, there are 
several key references; however, I've already added indexes on the referring 
columns in those tables.  There are no triggers.  We were running 
pg_autovacuum, but found that it wasn't completing.  I believe we disabled 
it, and are now running a cron job every 4 hours.  My archiving method 
also runs ANALYZE - as I figure that after a mass delete it would 
probably keep query speeds from degrading.

I've looked at pg_locks, but I'm not sure I understand quite how to use it 
to determine if there are unacquired locks.  I do know that we 
occasionally get warnings from C3P0 stating that it has detected a 
deadlock and is allocating emergency threads.

BTW, if I didn't mention it, we are using PG 8.1 on Red Hat Enterprise, 4GB 
RAM, 4 dual-core CPUs; I think it's RAID5 (looks like what I would consider 
typical Linux partitioning: /, /tmp, /usr, /var, /boot, /home).  After 
trolling the archives and doing a bit of sleuthing on the DB, I'm led 
to believe that this is more or less a default install of PG 8.1.  As I'm 
relatively new to PG, I'm not sure how it should be configured for our 
setup.  I would suspect that this could probably affect the speed of 
deletes (and queries as well).

Thanks for any help you can provide.



Re: [PERFORM] make bulk deletes faster?

2005-12-19 Thread James Klo

Mitch Skinner wrote:


Have you considered partitioning?

http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html

If you can partition your timeblock table so that you archive an entire
partition at a time, then you can delete the archived rows by just
dropping (or truncating) that partition.  AFAIK there's no way to
"re-parent" a partition (e.g., from the timeblock table to the
timeblock_archive table).

If your app is particularly cooperative you might be able to use
partitioning to avoid moving data around entirely.  If table accesses
are always qualified by something you can use as a partitioning key,
then partitioning can give you the speed benefits of a small table
without the effort of keeping it cleared out.


Yes, I've considered partitioning as a long-term change.  I was thinking 
about this for other reasons - mainly performance.  If I go the 
partitioning route, would I even need to perform archival?
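
Just to make sure I understand the mechanics, I think the 8.1-style setup 
would be roughly this (partition name and date range made up; each partition 
would also need its own indexes):

    create table timeblock_2005_12 (
        check (starttime >= date '2005-12-01'
           and starttime <  date '2006-01-01')
    ) inherits (timeblock);

    -- with constraint_exclusion = on, queries that filter on starttime can
    -- skip partitions whose check constraint rules them out; retiring a
    -- month of data then becomes:
    drop table timeblock_2005_12;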


The larger problem that I need to solve is really twofold:

1. Need to keep reads on timeblocks that are from the current day 
through the following seven days very fast, especially current day reads.


2. Need to be able to maintain the timeblocks for reporting purposes, 
for at least a year (potentially more).  This could probably be better 
handled by performing aggregate analysis, but this isn't on my current radar.



Another good read, if you haven't yet, is
http://powerpostgresql.com/Downloads/annotated_conf_80.html
especially the "Memory", "Checkpoints", and maybe "WAL options"
sections.  If you're doing large deletes then you may need to increase
your free space map settings--if a VACUUM VERBOSE finishes by saying
that you need more FSM pages, then the table may have gotten bloated
over time (which can be fixed with a configuration change and a VACUUM
FULL, though this will lock everything else out of the table while it's
running).



Thanks, I will look into this as well.



[PERFORM] Optimizing a query

2006-12-13 Thread James Cloos
I've currently got this table:

,
| n=# \d nanpa
|  Table "public.nanpa"
|Column   | Type | Modifiers 
| +--+---
|  state  | character(2) | 
|  npa| character(3) | not null
|  nxx| character(3) | not null
|  ocn| character(4) | 
|  company| text | 
|  ratecenter | text | 
|  switch | text | 
|  effective  | date | 
|  use| character(2) | not null
|  assign | date | 
|  ig | character(1) | 
| Indexes:
| "nanpa_pkey" PRIMARY KEY, btree (npa, nxx) CLUSTER
`

and was doing queries of the form:

,
| select * from nanpa where npa=775 and nxx=413;
`

which were quite slow.  Explain showed that it was doing sequential 
scans even though the primary key contained the two terms I was 
selecting on.

Today, looking at it again in prep to this post, I noticed that the
numbers were being converted to ::text, and a quick test showed that
queries of the form:

,
| select * from nanpa where npa=775::bpchar and nxx=413::bpchar;
`

used the index.

I specified char(3) when I created the table simply because npa and 
nxx are defined as three-character strings.  Tagging the queries is 
a pain, especially as I often do queries of that form in psql(1).

(Incidently, there are multiple similar tables, also keyed on
(npa,nxx), which show the same problem.  The nanpa table above is
just a good example.)

Should I convert the columns to text?  Or create an additional index
that expects ::text args?  (If so, how?)

Or is there some other way to ensure the indices get used w/o having
to tag data in the queries?
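
I suppose simply quoting the literals, so they stay character-typed, would 
also keep the index usable, but I'd still like to understand the right fix:

    select * from nanpa where npa='775' and nxx='413';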

Thanks,

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6



Re: [PERFORM] Optimizing a query

2006-12-13 Thread James Cloos
>>>>> "Husam" == Tomeh, Husam <[EMAIL PROTECTED]> writes:

Husam> Have you run vacuum/analyze on the table?

Yes, back when I first noticed how slow it was.
It did not make any difference.

explain analyze says:

,
| n=# explain analyse  select * from nanpa where npa=775 and nxx=473;
|   QUERY PLAN
| ---------------------------------------------------------------------------
|  Seq Scan on nanpa  (cost=0.00..5344.60 rows=4 width=105)
|    (actual time=371.718..516.816 rows=1 loops=1)
|    Filter: (((npa)::text = '775'::text) AND ((nxx)::text = '473'::text))
|  Total runtime: 516.909 ms
| (3 rows)
`

vs:

,
| n=# explain analyse  select * from nanpa where npa=775::char and nxx=473::char;
|   QUERY PLAN
| ---------------------------------------------------------------------------
|  Index Scan using nanpa_pkey on nanpa  (cost=0.00..4.33 rows=1 width=105)
|    (actual time=64.831..64.831 rows=0 loops=1)
|    Index Cond: ((npa = '7'::bpchar) AND (nxx = '4'::bpchar))
|  Total runtime: 64.927 ms
| (3 rows)
`
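
Looking at the index condition, I notice that ::char with no length is 
char(1), so the literals got truncated to '7' and '4' - which presumably 
explains rows=0.  The fairer comparison is probably something like:

    explain analyse select * from nanpa where npa='775'::char(3) and nxx='473'::char(3);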

BTW, I forgot to mention I'm at 8.1.4 on that box.

-JimC
-- 
James Cloos <[EMAIL PROTECTED]> OpenPGP: 1024D/ED7DAEA6



[PERFORM] compact flash disks?

2007-03-06 Thread James Mansion
I see that one can now get compact flash to SATA connectors.

If I were to use a filesystem with noatime etc and little non-sql traffic,
does the physical update pattern tend to have hot sectors that will tend to
wear out CF?

I'm wondering about a RAID5 with data on CF drives and RAID1 for the WAL on
a fast SATA or SAS drive pair.  I'm thinking that this would tend to have
good performance because the seek time for the data is very low, even if the
actual write speed can be slower than state of the art.  2GB CF isn't so
pricey any more.

Just wondering.

James





Re: [PERFORM] compact flash disks?

2007-03-07 Thread James Mansion
>WARNING:  modern TOtL flash RAMs are only good for ~1.2M writes per
>memory cell.  and that's the =good= ones.
>Using flash RAM for write heavy applications like OLTP, or for WAL,
>etc can be very dangerous

Well, that's why I suggested that the WAL would stream to a hard disk
array, where the large IO sequential write speed will be helpful.

Whether OLTP is a problem will presumably depend on the frequency of updates
and vacuum to each physical cluster of rows in a disk block.

Most rows in a trading application will have quite a long lifetime, and be
updated relatively few times (even where we're writing fixings info into
trades).

>Flash write speeds also stink; being ~1/2 flash's already low read speed.

Sure - but it may still be an effective tradeoff where the limiting factor
would otherwise be seek time.

>Much better to use flash RAM for read heavy applications.

Why?  I can get a 'PC' server with 128GB of RAM quite easily now,
and that will mean I can cache most if not all hot data for any trading
app I've worked on.  Settled trades that matured in prior periods can
be moved to tables on real disks - they are hardly ever accessed
anyway.


In the long run, we are going to have to seriously rethink pg's use
of WAL as the way we implement MVCC as it becomes more and more of a
performance bottleneck.
We have WAL because Stonebraker made an assumption about the future
dominance of optical media that has turned out to be false.
...and it's been one of pg's big issues ever since.


>>   2GB CF isn't so
>>pricey any more.
>Heck =16= GB Flash only costs ~$300 US and 128GB SSDs based on flash
>RAM are due out this year.

Quite.  Suppose I have a RAID with double redundancy, then I get enough
capacity for quite a lot of raw data, and can swap a card out every weekend
and let the RAID rebuild it in rotation to keep them within conservative
wear limits.

So long as the wear levelling works moderately well (and without needing FAT
on the disk or whatever) then I should be fine.

I think.  Maybe.

James





Re: [PERFORM] compact flash disks?

2007-03-08 Thread James Mansion
Isn't it likely that a single stream (or perhaps one that can be partitioned
across spindles) will tend to be fastest, since it has a nice localised
stream that a) allows for compression of reasonable blocks and b) fits with
commit aggregation?

RAM capacity on servers is going up and up, but the size of a customer
address or row on an invoice isn't.  I'd like to see an emphasis on speed of
update with an assumption that most hot data is cached, most of the time.

My understanding also is that storing data columnwise is handy when it's
persisted, because linear scans are much faster.  Saw it once with a system
modelled after APL; it blew me away even on a sparc10 once the data was
organised and could be mapped.

Still, for the moment anything that helps with the existing system would be
good.  Would it help to define triggers to be deferrable to commit as well
as end of statement (and per row)?  Seems to me it should be, at least for
ones that raise 'some thing changed' events.  And/or allow specification
that events can fold and should be very cheap (don't know if this is the
case now?  Its not as well documented how this works as I'd like)
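
Constraint triggers can at least be deferred to commit already, if I 
remember the syntax right - something like the below, though the table and 
function names are made up and it doesn't fold events (each row still fires 
at commit):

    create constraint trigger trades_changed
        after insert or update on trades
        deferrable initially deferred
        for each row
        execute procedure notify_trade_changed();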

James




[PERFORM] Dispatch-Merge pattern

2007-03-15 Thread James Riordan

Howdy-

I am currently using PostgreSQL to store and process a high-bandwidth
event stream.  I do not need old events, but the delete and vacuum do
not terminate due to the large number of events being inserted (it
just pushes me over the tipping point of where the machine can keep up
with the events).

I ended up implementing a scheme where a trigger is used to redirect
the events (round robin based on time) to a series of identically
structured tables.  I can then TRUNCATE older tables rather than
DELETE and VACUUM them (which is a significant speed up).

It worked out pretty well, so I thought I'd post the idea to find out if

- it is a stupid way of doing things and there is a correct database
  abstraction for doing this

or

 - it is a reasonable way of solving this problem and might be of use
   to other folks using RDBMSs for event processing.

I then use a view to merge the tables.  Obviously UPDATE would be a
problem, but for my purposes, and I suppose for a lot of event processing,
it isn't an issue.
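
The skeleton is roughly this (names simplified; the real setup has more 
slices and a trigger that picks the slice based on time):

    create table events_0 (ts timestamptz, payload text);
    create table events_1 (like events_0);
    create table events_2 (like events_0);

    create view events as
        select * from events_0
        union all select * from events_1
        union all select * from events_2;

    -- inserts go to the slice for the current period; expiring the
    -- oldest slice is then just:
    truncate table events_2;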

Either way, details are at:
 
http://unsyntax.net/james/blog/tools+and+programming/2007/03/08/Dispatch-Merge-Database-Pattern

Cheers,
James



Re: [PERFORM] compact flash disks?

2007-04-03 Thread James Mansion
>On sequential read speed HDs outperform flash disks... only on random
>access the flash disks are better. So if your application is a DW one,
>you're very likely better off using HDs.

This looks likely to be a non-issue shortly, see here:

http://www.reghardware.co.uk/2007/03/27/sams_doubles_ssd_capacity/

I still think this sort of device will become the OLTP device
of choice before too long - even if we do have to watch the wear rate.

>WARNING:  modern TOtL flash RAMs are only good for ~1.2M writes per
>memory cell.  and that's the =good= ones.

Well, my original question was whether the physical update pattern
of the server does have hotspots that will tend to cause a problem
in normal usage if the wear levelling (such as it is) doesn't entirely
spread the load.  The sorts of application I'm interested in will not
update individual data elements very often.  There's a danger that
index nodes might be rewritten frequently, but one might want to allow
that indexes persist lazily and should be recovered from a scan after
a crash that leaves them dirty, so that they can be cached and avoid
such an access pattern.


Out of interest with respect to WAL - has anyone tested to see whether
one could tune the group commit to pick up slightly bigger blocks and
write the WAL using compression, to up the *effective* write speed of
media? Once again, most data I'm interested in is far from a random
pattern and tends to compress quite well.

If the WAL write is committed to the disk platter, is it OK for
arbitrary data blocks to have failed to commit to disk so we can
recover the updates for committed transactions?

Is there any documentation on the write barrier usage?

James





Re: [PERFORM] SCSI vs SATA

2007-04-04 Thread James Mansion
>sure but for any serious usage one either wants to disable that
>cache(and rely on tagged command queuing or how that is called in SATAII
>world) or rely on the OS/raidcontroller implementing some sort of
>FUA/write barrier feature(which linux for example only does in pretty
>recent kernels)

Does anyone know which other hosts have write barrier implementations?
Solaris?  FreeBSD? Windows?

The buffers should help greatly in such a case, right?  Particularly if
you have quite a wide stripe.





Re: [PERFORM] SCSI vs SATA

2007-04-04 Thread James Mansion
>Right --- the point is not the interface, but whether the drive is built
>for reliability or to hit a low price point.

Personally I take the marketing mumblings about the enterprise drives
with a pinch of salt.  The low-price drives HAVE TO be reliable too,
because a non-negligible failure rate will result in returns processing
costs that destroy a very thin margin.

Granted, there was a move to very short warranties a while back,
but the trend has been for more realistic warranties again recently.
You can bet they don't do this unless the drives are generally pretty
good.

James





Re: [PERFORM] SCSI vs SATA

2007-04-05 Thread James Mansion
>Server drives are generally more tolerant of higher temperatures.  I.e.
>the failure rate for consumer and server class HDs may be about the same
>at 40 degrees C, but by the time the internal case temps get up to 60-70
>degrees C, the consumer grade drives will likely be failing at a much
>higher rate, whether they're working hard or not.

Can you cite any statistical evidence for this?

James





Re: [PERFORM] SCSI vs SATA

2007-04-08 Thread James Mansion
>Logic?

Foul!  That's NOT evidence.

>
>Mechanical devices have decreasing MTBF when run in hotter environments,
>often at non-linear rates.

I agree that this seems intuitive.  But I think taking it as a cast-iron
truth is dangerous.

>Server class drives are designed with a longer lifespan in mind.

Evidence?

>Server class hard drives are rated at higher temperatures than desktop
>drives.
>
>Google can supply any numbers to fill those facts in, but I found a
>dozen or so data sheets for various enterprise versus desktop drives in
>a matter of minutes.

I know what the marketing info says; that's not the point.  Bear in mind
that these are somewhat designed to justify very much higher prices.

I'm looking for statistical evidence that the difference is there, not
marketing collateral.  They may be designed to be more reliable.  And
the design targets *that the manufacturer admits to* may be more
stringent, but I'm interested to know what the actual measured difference
is.

From the sound of it, you DON'T have such evidence.  Which is not a
surprise, because I don't have it either, and I do try to keep my eyes
open for it.

James








Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Craig James




I've got a table with ~121 million records in it.  Select count on it 
currently takes ~45 minutes, and an update to the table to set a value 
on one of the columns I finally killed after it ran 17 hours and had 
still not completed.  Queries into the table are butt slow, and


The update query that started this all I had to kill after 17hours.  
It should have updated all 121+ million records.  That brought my 
select count down to 19 minutes, but still a far cry from acceptable.


If you have a column that needs to be updated often for all rows, 
separate it into a different table, and create a view that joins it back 
to the main table so that your application still sees the old schema.


This will greatly speed up your update since (in Postgres) an update is 
the same as a delete+insert.  By updating that one column, you're 
re-writing your entire 121 million rows.  If you separate it, you're 
only rewriting that one column.  Don't forget to vacuum/analyze and 
reindex when you're done.
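
For example (made-up names - a "big_table" whose frequently-updated "status" 
column has been split out):

    create table big_table_status (
        id     bigint primary key references big_table (id),
        status integer
    );

    create view big_table_v as
        select b.*, s.status
        from big_table b
        left join big_table_status s on s.id = b.id;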


Better yet, if you can stand a short down time, you can drop indexes on 
that column, truncate, then do 121 million inserts, and finally 
reindex.  That will be MUCH faster.


Craig





[PERFORM] Auto-ANALYZE?

2007-05-23 Thread Craig James

Auto-vacuum has made Postgres a much more "friendly" system.  Is there some 
reason the planner can't also auto-ANALYZE in some situations?

Here's an example I ran into:

  create table my_tmp_table (...);
  insert into my_tmp_table (select some stuff from here and there);
  select ... from my_tmp_table join another_table on (...);

The last statement generated a horrible plan, because the planner had no idea 
what was in the temporary table (which only had about 100 rows in it).  Simply 
inserting an ANALYZE before the SELECT improved performance by a factor of 100 
or so.
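
That is, all it took was

  analyze my_tmp_table;

between the INSERT and the SELECT.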

There are several situations where you could automatically analyze the data.

1. Any time you have to do a full table scan, you might as well throw in an 
ANALYZE of the data you're scanning.  If I understand things, ANALYZE takes a 
random sample anyway, so a full table scan should be able to produce even 
better statistics than a normal ANALYZE.

2. If you have a table with NO statistics, the chances of generating a sensible 
plan are pretty random.  Since ANALYZE is quite fast, if the planner encounters 
no statistics, why not ANALYZE it on the spot?  (This might need to be a 
configurable feature, though.)

3. A user-configurable update threshold, such as, "When 75% of the rows have changed 
since the last ANALYZE, trigger an auto-analyze."  The user-configurable part would 
account for the fact that some tables' stats don't change much even after many updates, 
but others may need to be reanalyzed after a modest number of updates.

Auto-vacuum, combined with auto-analyze, would eliminate many of the problems that plague 
neophyte (and sometimes experienced) users of Postgres.  A substantial percentage of the 
questions to this list are answered with, "Have you ANALYZED?"

Craig



Re: [PERFORM] LIKE search and performance

2007-05-24 Thread James Mansion

Alexander Staubo wrote:

On 5/23/07, Andy <[EMAIL PROTECTED]> wrote:

An example would be:
SELECT * FROM table
 WHERE name like '%john%' or street like 
'%srt%'


Anyway, the query planner always does seq scan on the whole table and that
takes some time. How can this be optimized or made in another way to be
faster?


There's no algorithm in existence that can "index" arbitrary
substrings the way you think. The only rational way to accomplish this
is to first break the text into substrings using some algorithm (eg.,
words delimited by whitespace and punctuation), and index the
substrings individually.
That seems rather harsh.  If I'd put an index on each of these columns
I'd certainly expect it to use the indices - and I'm pretty sure that Sybase
would.  I'd expect it to scan the index leaf pages instead of the table
itself - they should be much more compact and also likely to be hot in cache.

Why *wouldn't* the planner do this?

James




Re: [PERFORM] LIKE search and performance

2007-05-24 Thread James Mansion



If Sybase is still like SQL Server (or the other way around), it *may*
end up scanning the index *IFF* the index is a clustered index. If it's
a normal index, it will do a sequential scan on the table.

  
Are you sure it's not covered?  I'll have to check at work - but I'm off next
week so it'll have to wait.



It's not a win on PostgreSQL, because of our MVCC implementation. We
need to scan *both* index *and* data pages if we go down that route, in
which case it's a lot faster to just scan the data pages alone.

  
Why do you need to go to all the data pages - doesn't the index
structure contain all the keys, so you prefilter and then check to see if
the *matched* items are still in view?  I'll be the first to admit I
know zip about Postgres, but it seems odd - doesn't the index contain
copies of the key values?


I suspect that I mis-spoke with 'leaf'.  I really just mean 'all index
pages with data', since the scan does not even need to be in index order,
just a good way to get at the data in a compact way.






Re: [PERFORM] LIKE search and performance

2007-05-24 Thread Craig James

Mark Lewis wrote:


PG could scan the index looking for matches first and only load the
actual rows if it found a match, but that could only be a possible win
if there were very few matches, because the difference in cost between a
full index scan and a sequential scan would need to be greater than the
cost of randomly fetching all of the matching data rows from the table
to look up the visibility information.  


Just out of curiosity: Does Postgres store a duplicate of the data in the
index, even for long strings?  I thought indexes only had to store the string
up to the point where there was no ambiguity; for example, if I have
"missing", "mississippi" and "misty", the index only needs "missin", "missis"
and "mist" in the actual index.  This would make it impossible to use a full
index scan for a LIKE query.

Craig



Re: [PERFORM] LIKE search and performance

2007-05-24 Thread Craig James

Alvaro Herrera wrote:
>> Just out of curiosity: Does Postgres store a duplicate of the data in the
>> index, even for long strings?  I thought indexes only had to store the
>> string up to the point where there was no ambiguity, for example, if I have
>> "missing", "mississippi" and "misty", the index only needs "missin",
>> "missis" and "mist" in the actual index.
>
> What would happen when you inserted a new tuple with just "miss"?  You
> would need to expand all the other tuples in the index.


That's right.  This technique used by some index implementations is a tradeoff between 
size and update speed.  Most words in most natural languages can be distinguished by the 
first few characters.  The chances of having to modify more than a few surrounding nodes 
when you insert "miss" is small, so some implementations choose this method.  
Other implementations choose to store the full string.  I was just curious which method 
Postgres uses.

Craig




[PERFORM] ECC RAM really needed?

2007-05-25 Thread Craig James

We're thinking of building some new servers.  We bought some a while back that 
have ECC (error correcting) RAM, which is absurdly expensive compared to the 
same amount of non-ECC RAM.  Does anyone have any real-life data about the 
error rate of non-ECC RAM, and whether it matters or not?  In my long career, 
I've never once had a computer that corrupted memory, or at least I never knew 
if it did.  ECC sounds like a good idea, but is it solving a non-problem?

Thanks,
Craig



Re: [PERFORM] How PostgreSQL handles multiple DDBB instances?

2007-05-29 Thread Craig James

On Fri, 2007-05-25 at 20:16 +0200, Arnau wrote:
   The point I'm worried is performance. Do you think the performance 
would be better executing exactly the same queries only adding an extra 
column to all the tables e.g. customer_id, than open a connection to the 
only one customers DB and execute the query there?


There is no simple answer to this question; it depends too much on your data.  
In many cases, adding a customer_id to every table, and perhaps also 
per-customer views (per Jeff's suggestion), can work really well.
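
For example (made-up names), a per-customer view plus a grant is all it takes 
for read-only access:

    create view orders_acme as
        select * from orders where customer_id = 42;

    grant select on orders_acme to acme_role;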

However, performance is not the only consideration, or even the main 
consideration.  We operate with about 150 separate databases.  In our cases, 
administration issues and software design outweighed performance issues.

For example, with separate databases, security is simpler, *and* it's easy to 
convince the customer that their data is protected.  Creating views only helps 
for read-only access.  When the customer wants to modify their data, how will 
you keep them from accessing and overwriting one another's data?  Even with 
views, can you convince the customer you've done it right?  With separate 
databases, you use the built-in security of Postgres, and don't have to 
duplicate it in your schema and apps.

With separate databases, it's really easy to discard a customer.  This can be 
particularly important for a big customer with millions of linked records.  In 
a database-for-everyone design, you'll have lots of foreign keys, indexes, etc. 
that make deleting a whole customer a REALLY big job.  Contrast that with just 
discarding a whole database, which typically takes a couple seconds.

But even more important (to us) is the simplicity of the applications and management.  
It's far more than just an extra " ... and customer = xyz" added to every 
query.  Throwing the customers together means every application has to understand 
security, and many operations that would be simple become horribly tangled.  Want to back 
up a customer's data?  You can't use pg_dump, you have to write your own dump app.  Want 
to restore a customer's data?  Same.  Want to do a big update?  Your whole database is 
affected and probably needs to be vacuum/analyzed.  On and on, at every turn, management 
and applications are more complex.

If you have hundreds of separate databases, it's also easy to scale: Just buy 
more servers, and move some of the databases.  With a single monster database, 
as load increases, you may hit the wall sooner or later.

Postgres is really good at maintaining many separate databases.  Why do it 
yourself?

There are indeed performance issues, but even that's not black and white.  
Depending on the specifics of your queries and the load on your servers, you 
may get better performance from a single monster database, or from hundreds of 
separate databases.

So, your question has no simple answer.  You should indeed evaluate the 
performance, but other issues may dominate your decision.

Craig





[PERFORM] Autodetect of software RAID1+0 fails

2007-06-01 Thread Craig James

Apologies for a somewhat off-topic question, but...

The Linux kernel doesn't properly detect my software RAID1+0 when I boot up.  
It detects the two RAID1 arrays, the partitions of which are marked properly.  
But it can't find the RAID0 on top of that, because there's no corresponding 
device to auto-detect.  The result is that it creates /dev/md0 and /dev/md1 and 
assembles the RAID1 devices on bootup, but /dev/md2 isn't created, so the RAID0 
can't be assembled at boot time.

Here's what it looks like:

$ cat /proc/mdstat
Personalities : [raid0] [raid1]
md2 : active raid0 md0[0] md1[1]
      234436224 blocks 64k chunks

md1 : active raid1 sde1[1] sdc1[2]
      117218176 blocks [2/2] [UU]

md0 : active raid1 sdd1[1] sdb1[0]
      117218176 blocks [2/2] [UU]

$ uname -r
2.6.12-1.1381_FC3

After a reboot, I always have to do this:

 mknod /dev/md2 b 9 2
 mdadm --assemble /dev/md2 /dev/md0 /dev/md1
 mount /dev/md2

What am I missing here?

Thanks,
Craig



Re: [PERFORM] Thousands of tables versus one table?

2007-06-06 Thread Craig James

[EMAIL PROTECTED] wrote:
various people (not database experts) are pushing to install Oracle 
cluster so that they can move all of these to one table with a 
customerID column.


They're blowing smoke if they think Oracle can do this.  One of my applications 
had this exact same problem -- table-per-customer versus 
big-table-for-everyone.  Oracle fell over dead, even with the best indexing 
possible, tuned by the experts, and using partitions keyed to the customerID.

We ended up breaking it up into table-per-customer because Oracle fell over 
dead when we had to do a big update on a customer's entire dataset.  All other 
operations were slowed by the additional index on the customer-ID, especially 
complex joins.  With a table-for-everyone, you're forced to create tricky 
partitioning or clustering, clever indexes, and even with that, big updates are 
problematic.  And once you do this, then you become heavily tied to one RDBMS 
and your applications are no longer portable, because clustering, indexing, 
partitioning and other DB tuning tricks are very specific to each RDBMS.

When we moved to Postgres, we never revisited this issue, because both Oracle 
and Postgres are able to handle thousands of tables well.  As I wrote in a 
previous message on a different topic, often the design of your application is 
more important than the performance.  In our case, the table-per-customer makes 
the applications simpler, and security is MUCH easier.

Oracle is simply not better than Postgres in this regard.  As far as I know, 
there is only one specific situation (discussed frequently here) where Oracle 
is faster: the count(), min() and max() functions, and I know significant 
progress has been made since I started using Postgres.  I have not found any 
other query where Oracle is significantly better, and I've found several where 
Postgres is the clear winner.

It's telling that Oracle's license contract prohibits you from publishing 
comparisons and benchmarks.  You have to wonder why.

Craig



Re: [PERFORM] Thousands of tables versus one table?

2007-06-06 Thread Craig James

Scott Marlowe wrote:
OTOH, there are some things, like importing data, which are MUCH faster 
in pgsql than in the big database.


An excellent point, I forgot about this. The COPY command is the best thing 
since the invention of a shirt pocket.  We have a database-per-customer design, 
and one of the monstrous advantages of Postgres is that we can easily do 
backups.  A pg_dump, then scp to a backup server, and in just a minute or two 
we have a full backup.  For recovery, pg_restore is equally fast and amazing.  
Last time I checked, Oracle didn't have anything close to this.

Craig





Re: [PERFORM] Thousands of tables versus one table?

2007-06-06 Thread Craig James

Jonah H. Harris wrote:

On 6/6/07, Craig James <[EMAIL PROTECTED]> wrote:

They're blowing smoke if they think Oracle can do this.


Oracle could handle this fine.


Oracle fell over dead, even with the best indexing possible,
tuned by the experts, and using partitions keyed to the
customerID.


I don't think so, whoever tuned this likely didn't know what they were 
doing.


Wrong on both counts.

You didn't read my message.  I said that *BOTH* Oracle and Postgres performed 
well with table-per-customer.  I wasn't Oracle bashing.  In fact, I was doing 
the opposite: Someone's coworker claimed ORACLE was the miracle cure for all 
problems, and I was simply pointing out that there are no miracle cures.  (I 
prefer Postgres for many reasons, but Oracle is a fine RDBMS that I have used 
extensively.)

The technical question is simple: Table-per-customer or big-table-for-everyone.  The 
answer is, "it depends."  It depends on your application, your 
read-versus-write ratio, the table size, the design of your application software, and a 
dozen other factors.  There is no simple answer, but there are important technical 
insights which, I'm happy to report, various people contributed to this discussion.  
Perhaps you have some technical insight too, because it really is an important question.

The reason I assert (and stand by this) that "They're blowing smoke" when they 
claim Oracle has the magic cure, is because Oracle and Postgres are both relational 
databases, they write their data to disks, and they both have indexes with O(log(N)) 
retrieval/update times.  Oracle doesn't have a magical workaround to these facts, nor 
does Postgres.

Craig



Re: [PERFORM] LIKE search and performance

2007-06-06 Thread James Mansion

[EMAIL PROTECTED] wrote:

What is a real life example where an intelligent and researched
database application would issue a like or ilike query as their
primary condition in a situation where they expected very high
selectivity?
  
In my case the canonical example is to search against textual keys where
the search is performed automatically if the user has typed enough data and
paused.  In almost all cases the '%' trails, and I'm looking for 'starts
with' in effect.  Usually the search will have a specified upper number of
returned rows, if that's an available facility.  I realise in this case that
matching against the index does not allow the match count unless we check
MVCC as we go, but I don't see why another thread can't be doing that.
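
For the trailing-% case, I gather a btree index built with text_pattern_ops 
(or a database in the C locale) lets the planner turn the LIKE into an index 
range scan - made-up names again:

    create index customers_name_like_idx
        on customers (name text_pattern_ops);

    select *
    from customers
    where name like 'john%'
    order by name
    limit 50;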

James




Re: [PERFORM] Best way to delete unreferenced rows?

2007-06-07 Thread Craig James

Tyrrill, Ed wrote:


I have a table, let's call it A, whose primary key, a_id, is referenced
in a second table, let's call it B.  For each unique A.a_id there are
generally many rows in B with the same a_id.  My problem is that I want
to delete a row in A when the last row in B that references it is
deleted.  Right now I just query for rows in A that aren't referenced by
B, and that worked great when the tables were small, but it takes over
an hour now that the tables have grown larger (over 200 million rows in
B and 14 million in A).  The delete has to do a sequential scan of both
tables since I'm looking for what's not in the indexes.

I was going to try creating a trigger after delete on B for each row to
check for more rows in B with the same a_id, and delete the row in A if
none found.  In general I will be deleting 10's of millions of rows from
B and 100's of thousands of rows from A on a daily basis.  What do you
think?  Does anyone have any other suggestions on different ways to
approach this?


Essentially what you're doing is taking the one-hour job and spreading it out 
in little chunks over thousands of queries.  If you have 10^7 rows in B and 10^5 
rows in A, then on average you have 100 references from B to A.  That means 
that 99% of the time, your trigger will scan B and find that there's nothing to 
do.  This could add a lot of overhead to your ordinary transactions, costing a 
lot more in the long run than just doing the once-a-day big cleanout.

You didn't send the specifics of the query you're using, along with an EXPLAIN 
ANALYZE of it in operation.  It may also be that your SQL is not optimal, and 
that somebody could suggest a more efficient query.

It's also possible that it's not the sequential scans that are the problem, but 
rather that it just takes a long time to delete 100,000 rows from table A 
because you have a lot of indexes. Or it could be a combination of performance 
problems.

You haven't given us enough information to really analyze your problem.  Send 
more details!

Craig



Re: [PERFORM] Best way to delete unreferenced rows?

2007-06-11 Thread Craig James

Tyrrill, Ed wrote:

QUERY PLAN
------------------------------------------------------------------------------
 Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645 width=8)
   (actual time=6503583.342..8220629.311 rows=93524 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects
         (cost=0.00..521525.10 rows=13799645 width=8)
         (actual time=15.955..357813.621 rows=13799645 loops=1)
   ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8)
         (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3311212.04 rows=214938304 width=8)
               (actual time=11.175..1881179.825 rows=214938308 loops=1)
 Total runtime: 8229178.269 ms
(8 rows)

I ran vacuum analyze after the last time any inserts, deletes, or
updates were done, and before I ran the query above.  I've attached my
postgresql.conf.  The machine has 4 GB of RAM.


I thought maybe someone with more expertise than me might answer this, but 
since they haven't I'll just make a comment.  It looks to me like the sort of 
214 million rows is what's killing you.  I suppose you could try to increase 
the sort memory, but that's a lot of memory.  It seems to me an index merge of 
a relation this large would be faster, but that's a topic for the experts.

On a theoretical level, the problem is that it's sorting the largest table.  
Perhaps you could re-cast the query so that it only has to sort the smaller 
table, something like

  select a.id from a where a.id not in (select distinct b.id from b)

where "b" is the smaller table.  There's still no guarantee that it won't do a sort on 
"a", though.  In fact one of the clever things about Postgres is that it can convert a query like 
the one above into a regular join, unless you do something like "select ... offset 0" which blocks 
the optimizer from doing the rearrangement.

But I think the first approach is to try to tune for a better plan using your 
original query.

Craig



Re: [PERFORM] test / live environment, major performance difference

2007-06-11 Thread Craig James


On 2007-06-11 Christo Du Preez wrote:

I really hope someone can shed some light on my problem. I'm not sure
if this is a postgres or postgis issue.

Anyway, we have 2 development laptops and one live server, somehow I
managed to get the same query to perform very well om my laptop, but
on both the server and the other laptop it's really performing bad.


One simple possibility that bit me in the past: If you do pg_dump/pg_restore to 
create a copy of the database, you have to ANALYZE the newly-restored database. 
 I mistakenly assumed that pg_restore would do this, but you have to run 
ANALYZE explicitly after a restore.

Craig




[PERFORM] Replication

2007-06-14 Thread Craig James

Looking for replication solutions, I find:

Slony-I
Seems good, single master only, master is a single point of failure,
no good failover system for electing a new master or having a failed
master rejoin the cluster.  Slave databases are mostly for safety or
for parallelizing queries for performance.  Suffers from O(N^2) 
communications (N = cluster size).


Slony-II
Seems brilliant, a solid theoretical foundation, at the forefront of
computer science.  But can't find project status -- when will it be
available?  Is it a pipe dream, or a nearly-ready reality?

PGReplication
Appears to be a page that someone forgot to erase from the old GBorg site.

PGCluster
Seems pretty good, but web site is not current, there are releases in use
that are not on the web site, and also seems to always be a couple steps
behind the current release of Postgres.  Two single-points failure spots,
load balancer and the data replicator.

Is this a good summary of the status of replication?  Have I missed any 
important solutions or mischaracterized anything?

Thanks!
Craig




Re: [PERFORM] Replication

2007-06-14 Thread Craig James

Thanks to all who replied and filled in the blanks.  The problem with the web 
is you never know if you've missed something.

Joshua D. Drake wrote:

Looking for replication solutions, I find...
Slony-II

Dead


Wow, I'm surprised.  Is it dead for lack of need, lack of resources, too 
complex, or all of the above?  It sounded like such a promising theoretical 
foundation.

Ben wrote:

Which replication problem are you trying to solve?


Most of our data is replicated offline using custom tools tailored to our loading pattern, but we 
have a small amount of "global" information, such as user signups, system configuration, 
advertisements, and such, that go into a single small (~5-10 MB) "global database" used 
by all servers.

We need "nearly-real-time replication," and instant failover.  That is, it's far more 
important for the system to keep working than it is to lose a little data.  Transactional integrity 
is not important.  Actual hardware failures are rare, and if a user just happens to sign up, or do 
"save preferences", at the instant the global-database server goes down, it's not a 
tragedy.  But it's not OK for the entire web site to go down when the one global-database server 
fails.

Slony-I can keep several slave databases up to date, which is nice.  And I 
think I can combine it with a PGPool instance on each server, with the master 
as primary and few Slony-copies as secondary.  That way, if the master goes 
down, the PGPool servers all switch to their secondary Slony slaves, and 
read-only access can continue.  If the master crashes, users will be able to do 
most activities, but new users can't sign up, and existing users can't change 
their preferences, until either the master server comes back, or one of the 
slaves is promoted to master.

The problem is, there don't seem to be any "vote a new master" type of tools 
for Slony-I, and also, if the original master comes back online, it has no way to know 
that a new master has been elected.  So I'd have to write a bunch of SOAP services or 
something to do all of this.

I would consider PGCluster, but it seems to be a patch to Postgres itself.  I'm 
reluctant to introduce such a major piece of technology into our entire system, 
when only one tiny part of it needs the replication service.

Thanks,
Craig



Re: [PERFORM] Replication

2007-06-14 Thread Craig James

Andreas Kostyrka wrote:

Slony provides near instantaneous failovers (in the single digit seconds
 range). You can script an automatic failover if the master server
becomes unreachable.


But Slony slaves are read-only, correct?  So the system isn't fully functional 
once the master goes down.


That leaves you the problem of restarting your app
(or making it reconnect) to the new master.


Don't you have to run a Slony app to convert one of the slaves into the master?


5-10MB data implies such a fast initial replication, that making the
server rejoin the cluster by setting it up from scratch is not an issue.


The problem is to PREVENT it from rejoining the cluster.  If you have some 
semi-automatic process that detects the dead server and converts a slave to the 
master, and in the mean time the dead server manages to reboot itself (or its 
network gets fixed, or whatever the problem was), then you have two masters 
sending out updates, and you're screwed.


The problem is, there don't seem to be any "vote a new master" type of
tools for Slony-I, and also, if the original master comes back online,
it has no way to know that a new master has been elected.  So I'd have
to write a bunch of SOAP services or something to do all of this.


You don't need SOAP services, and you do not need to elect a new master.
if dbX goes down, dbY takes over, you should be able to decide on a
static takeover pattern easily enough.


I can't see how that is true.  Any self-healing distributed system needs 
something like the following:

 - A distributed system of nodes that check each other's health
 - A way to detect that a node is down and to transmit that
   information across the nodes
 - An election mechanism that nominates a new master if the
   master fails
 - A way for a node coming online to determine if it is a master
   or a slave

Any solution less than this can cause corruption because you can have two nodes 
that both think they're master, or end up with no master and no process for 
electing a master.  As far as I can tell, Slony doesn't do any of this.  Is 
there a simpler solution?  I've never heard of one.


The point here is, that the servers need to react to a problem, but you
probably want to get the admin on duty to look at the situation as
quickly as possible anyway.


No, our requirement is no administrator interaction.  We need instant, 
automatic recovery from failure so that the system stays online.


Furthermore, you need to checkout pgpool, I seem to remember that it has
some bad habits in routing queries. (E.g. it wants to apply write
queries to all nodes, but slony makes the other nodes readonly.
Furthermore, anything inside a BEGIN is sent to the master node, which
is bad with some ORMs, that by default wrap any access into a transaction)


I should have been more clear about this.  I was planning to use PGPool in the 
PGPool-1 mode (not the new PGPool-2 features that allow replication).  So it 
would only be acting as a failover mechanism.  Slony would be used as the 
replication mechanism.

I don't think I can use PGPool as the replicator, because then it becomes a new 
single point of failure that could bring the whole system down.  If you're 
using it for INSERT/UPDATE, then there can only be one PGPool server.

I was thinking I'd put a PGPool server on every machine in failover mode only.  
It would have the Slony master as the primary connection, and a Slony slave as 
the failover connection.  The applications would route all INSERT/UPDATE 
statements directly to the Slony master, and all SELECT statements to the 
PGPool on localhost.  When the master failed, all of the PGPool servers would 
automatically switch to one of the Slony slaves.

This way, the system would keep running on the Slony slaves (so it would be 
read-only), until a sysadmin could get the master Slony back online.  And when 
the master came online, the PGPool servers would automatically reconnect and 
write-access would be restored.

Does this make sense?

Craig



Re: [PERFORM] Replication

2007-06-18 Thread Craig James

Markus Schiltknecht wrote:
Not quite... there's still Postgres-R, see www.postgres-r.org  And I'm 
continuously working on it, despite not having updated the website for 
almost a year now...


I planned on releasing the next development snapshot together with 8.3, 
as that seems to be delayed, that seems realistic ;-)


Is Postgres-R the same thing as Slony-II?  There's a lot of info and news 
around about Slony-II, but your web page doesn't seem to mention it.

While researching replication solutions, I had a heck of a time sorting out the 
dead or outdated web pages (like the stuff on gborg) from the active projects.

Either way, it's great to know you're working on it.

Craig



Re: [PERFORM] Volunteer to build a configuration tool

2007-06-18 Thread James Neethling



This is my idea:

A JavaScript HTML page that would have some basic questions at the top:

1) How much memory do you have?

2) How many connections will be made to the database?

3) What operating system do you use?

4) Etc…

Next the person would press a button, “generate”, found below the 
questions. The JavaScript HTML page would then generate content for 
two Iframes at the bottom on the page. One Iframe would contain the 
contents of the postgresql.conf file. The postgresql.conf settings 
would be tailored more to the individual's needs than the standard 
default file. The second Iframe would contain the default settings one 
should consider using with their operating system.


My web team would be very happy to develop this for the PostgreSQL 
project. It would have saved us a lot of time by having a 
configuration tool in the beginning. I am willing to make this a very 
high priority for my team.




Hi Lance,

I agree that a page that helps generate a base configuration file is an 
excellent way to get a system administrator halfway to a good 
configuration. We've recently gone through the process of configuring a 
machine, and it is a time-consuming task of testing and benchmarking 
various configuration details.


My thoughts:
Using the browser is a great idea as a universal platform. I can 
foresee a problem in that some users won't have GUI access to the 
machine that they are setting up. I don't have much direct experience in 
this field, but I suspect that a great number of installations happen 
'headless'? This can easily be circumvented by hosting the configuration 
builder on a public internet site, possibly postgresql.org?


Also, Javascript isn't the easiest language to use to get all the 
decisions that need to be made for various configuration options. Would 
it not be a better idea to host a configuration builder centrally, 
possibly on postgresql.org, and have the documentation reference it, 
including the docs that come packaged with postgresql (README, INSTALL 
documentation?). This would mean that you wouldn't be able to package 
the configuration builder, but you would be able to implement more 
application logic and more complex decision making in a hosted 
application. Of course, I have no idea of the skills that your team 
already have :)




To add ideas: perhaps a more advanced tool would be able to add a comment 
indicating a suggested range for the particular setting. For example, 
with 2GB of RAM, it chooses a work_mem of, say, 768MB, with a comment 
indicating a suggested range of 512MB - 1024MB, as sketched below.
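
For instance, reusing the numbers above purely as placeholders, the 
generated file might contain:

   # generated for 2GB RAM
   work_mem = 768MB    # suggested range: 512MB - 1024MB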



Thanks for taking the time to put this together and for offering the 
services of your team.


Kind regards,
James













Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3

2007-06-29 Thread Craig James

Dolafi, Tom wrote:
 min(fmin) | max(fmin) | avg(fmin)
-----------+-----------+----------
         1 |  55296469 |  11423945

 min(fmax) | max(fmax) | avg(fmax)
-----------+-----------+----------
        18 |      3288 |  11424491

There are 5,704,211 rows in the table.


When you're looking for weird index problems, it's more interesting to know if 
there are certain numbers that occur a LOT.  From your statistics above, each 
number occurs about 10 times in the table.  But do some particular numbers 
occur thousands, or even millions, of times?

Here is a query that will print a list of the highest-occuring values.  You 
might expect a few occurances of 20, and maybe 30, but if you have thousands or 
millions of occurances of certain numbers, then that can screw up an index.

  select fmax, c from
   (select fmax, count(fmax) as c from your_table group by fmax) as foo
  where c > 3 order by c desc;

Craig




[PERFORM] Join with lower/upper limits doesn't scale well

2007-07-02 Thread Craig James

I have the same schema in two different databases.  In "smalldb", the two tables of 
interest have about 430,000 rows, in "bigdb", the two tables each contain about 5.5 
million rows.  I'm processing the data, and for various reasons it works out well to process it in 
100,000 row chunks.  However, it turns out for the big schema, selecting 100,000 rows is the 
longest single step of the processing.

Below is the explain/analyze output of the query from each database.  Since 
both tables are indexed on the joined columns, I don't understand why the big 
table should be so much slower -- I hoped this would scale well, or at least 
O(log(N)), not O(N).

What's going on here?  I don't know if I'm reading this right, but it looks 
like the sort is taking all the time, but that doesn't make sense because in 
both cases it's sorting 100,000 rows.

Thanks,
Craig


bigdb=> explain analyze
bigdb->   select r.row_num, m.molkeys from my_rownum r
bigdb->   join my_molkeys m on (r.version_id = m.version_id)
bigdb->   where r.row_num >= 10 AND r.row_num < 20
bigdb->   order by r.row_num;

Sort  (cost=431000.85..431248.23 rows=98951 width=363) (actual 
time=46306.748..46417.448 rows=10 loops=1)
  Sort Key: r.row_num
  ->  Hash Join  (cost=2583.59..422790.68 rows=98951 width=363) (actual 
time=469.010..45752.131 rows=10 loops=1)
Hash Cond: ("outer".version_id = "inner".version_id)
->  Seq Scan on my_molkeys m  (cost=0.00..323448.30 rows=5472530 
width=363) (actual time=11.243..33299.933 rows=5472532 loops=1)
->  Hash  (cost=2336.21..2336.21 rows=98951 width=8) (actual 
time=442.260..442.260 rows=10 loops=1)
  ->  Index Scan using i_chm_rownum_row_num on my_rownum r  
(cost=0.00..2336.21 rows=98951 width=8) (actual time=47.551..278.736 rows=10 
loops=1)
Index Cond: ((row_num >= 10) AND (row_num < 20))
Total runtime: 46543.163 ms


smalldb=> explain analyze
smalldb->   select r.row_num, m.molkeys from my_rownum r
smalldb->   join my_molkeys m on (r.version_id = m.version_id)
smalldb->   where r.row_num >= 10 AND r.row_num < 20
smalldb->   order by r.row_num;

Sort  (cost=43598.23..43853.38 rows=102059 width=295) (actual 
time=4097.180..4207.733 rows=10 loops=1)
  Sort Key: r.row_num
  ->  Hash Join  (cost=2665.09..35107.41 rows=102059 width=295) (actual 
time=411.635..3629.756 rows=10 loops=1)
Hash Cond: ("outer".version_id = "inner".version_id)
->  Seq Scan on my_molkeys m  (cost=0.00..23378.90 rows=459590 
width=295) (actual time=8.563..2011.455 rows=459590 loops=1)
->  Hash  (cost=2409.95..2409.95 rows=102059 width=8) (actual 
time=402.867..402.867 rows=10 loops=1)
  ->  Index Scan using i_chm_rownum_row_num_8525 on my_rownum r  
(cost=0.00..2409.95 rows=102059 width=8) (actual time=37.122..242.528 rows=10 
loops=1)
Index Cond: ((row_num >= 10) AND (row_num < 20))
Total runtime: 4333.501 ms



Table "bigdb.my_rownum"
  Column   |  Type   | Modifiers 
+-+---
version_id | integer | 
parent_id  | integer | 
row_num| integer | 
Indexes:

   "i_chm_rownum_row_num" UNIQUE, btree (row_num)
   "i_chm_rownum_version_id" UNIQUE, btree (version_id)
   "i_chm_rownum_parent_id" btree (parent_id)



Table "bigdb.my_molkeys"
  Column   |  Type   | Modifiers 
+-+---
version_id | integer | 
molkeys| text| 
Indexes:

   "i_chm_molkeys_version_id" UNIQUE, btree (version_id)



[PERFORM] Equivalent queries produce different plans

2007-07-10 Thread Craig James

The two queries below produce different plans.

select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and   r.version_id <  330
order by r.version_id;


select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and   r.version_id <  330
and   m.version_id >= 320
and   m.version_id <  330
order by r.version_id;

I discovered this while looking at the plans for the first query.  It seemed to be ignoring the fact that it 
could push the "between" condition along to the second table, since the condition and the join are 
on the same indexed columns.  So, I added a redundant condition, and bingo, it was a lot faster.  In the 
analysis shown below, the timing (about 1.0 and 1.5 seconds respectively) are for a "hot" database 
that's been queried a couple of times.  In real life on a "cold" database, the times are more like 
10 seconds and 21 seconds, so it's quite significant.

Thanks,
Craig



db=> explain analyze 
db-> select r.version_id, r.row_num, m.molkeys from my_rownum r

db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and   r.version_id <  330
db-> order by r.version_id;

Sort  (cost=264979.51..265091.06 rows=44620 width=366) (actual 
time=1424.126..1476.048 rows=46947 loops=1)
  Sort Key: r.version_id
  ->  Nested Loop  (cost=366.72..261533.64 rows=44620 width=366) (actual 
time=41.649..1186.331 rows=46947 loops=1)
->  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 rows=44620 
width=8) (actual time=41.616..431.783 rows=46947 loops=1)
  Recheck Cond: ((version_id >= 320) AND (version_id < 330))
  ->  Bitmap Index Scan on i_chm_rownum_version_id_4998  
(cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 rows=46947 
loops=1)
Index Cond: ((version_id >= 320) AND (version_id < 
330))
->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 loops=46947)
  Index Cond: ("outer".version_id = m.version_id)
Total runtime: 1534.638 ms
(10 rows)


db=> explain analyze 
db-> select r.version_id, r.row_num, m.molkeys from my_rownum r

db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and r.version_id <330
db-> and m.version_id >=   320
db-> and m.version_id <330
db-> order by r.version_id;

Sort  (cost=157732.20..157732.95 rows=298 width=366) (actual 
time=985.383..1037.423 rows=46947 loops=1)
  Sort Key: r.version_id
  ->  Hash Join  (cost=41279.92..157719.95 rows=298 width=366) (actual 
time=502.875..805.402 rows=46947 loops=1)
Hash Cond: ("outer".version_id = "inner".version_id)
->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 rows=46947 
loops=1)
  Index Cond: ((version_id >= 320) AND (version_id < 330))
->  Hash  (cost=41168.37..41168.37 rows=44620 width=8) (actual 
time=502.813..502.813 rows=46947 loops=1)
  ->  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 
rows=44620 width=8) (actual time=41.621..417.508 rows=46947 loops=1)
Recheck Cond: ((version_id >= 320) AND (version_id < 
330))
->  Bitmap Index Scan on i_chm_rownum_version_id_4998  
(cost=0.00..366.72 rows=44620 width=0) (actual time=21.174..21.174 rows=46947 
loops=1)
  Index Cond: ((version_id >= 320) AND (version_id 
< 330))
Total runtime: 1096.031 ms
(12 rows)



Re: [PERFORM] Equivalent queries produce different plans

2007-07-10 Thread Craig James

Sorry, I forgot to mention: This is 8.1.4, with a fairly ordinary configuration 
on a 4 GB system.

Craig


Craig James wrote:

The two queries below produce different plans.

select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and   r.version_id <  330
order by r.version_id;


select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and   r.version_id <  330
and   m.version_id >= 320
and   m.version_id <  330
order by r.version_id;

I discovered this while looking at the plans for the first query.  It 
seemed to be ignoring the fact that it could push the "between" 
condition along to the second table, since the condition and the join 
are on the same indexed columns.  So, I added a redundant condition, and 
bingo, it was a lot faster.  In the analysis shown below, the timing 
(about 1.0 and 1.5 seconds respectively) are for a "hot" database that's 
been queried a couple of times.  In real life on a "cold" database, the 
times are more like 10 seconds and 21 seconds, so it's quite significant.


Thanks,
Craig



db=> explain analyze db-> select r.version_id, r.row_num, m.molkeys from 
my_rownum r

db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and   r.version_id <  330
db-> order by r.version_id;

Sort  (cost=264979.51..265091.06 rows=44620 width=366) (actual 
time=1424.126..1476.048 rows=46947 loops=1)

  Sort Key: r.version_id
  ->  Nested Loop  (cost=366.72..261533.64 rows=44620 width=366) (actual 
time=41.649..1186.331 rows=46947 loops=1)
->  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 
rows=44620 width=8) (actual time=41.616..431.783 rows=46947 loops=1)
  Recheck Cond: ((version_id >= 320) AND (version_id < 
330))
  ->  Bitmap Index Scan on i_chm_rownum_version_id_4998  
(cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 
rows=46947 loops=1)
Index Cond: ((version_id >= 320) AND (version_id 
< 330))
->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 
loops=46947)

  Index Cond: ("outer".version_id = m.version_id)
Total runtime: 1534.638 ms
(10 rows)


db=> explain analyze db-> select r.version_id, r.row_num, m.molkeys from 
my_rownum r

db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and r.version_id <330
db-> and m.version_id >=   320
db-> and m.version_id <330
db-> order by r.version_id;

Sort  (cost=157732.20..157732.95 rows=298 width=366) (actual 
time=985.383..1037.423 rows=46947 loops=1)

  Sort Key: r.version_id
  ->  Hash Join  (cost=41279.92..157719.95 rows=298 width=366) (actual 
time=502.875..805.402 rows=46947 loops=1)

Hash Cond: ("outer".version_id = "inner".version_id)
->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  
(cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 
rows=46947 loops=1)
  Index Cond: ((version_id >= 320) AND (version_id < 
330))
->  Hash  (cost=41168.37..41168.37 rows=44620 width=8) (actual 
time=502.813..502.813 rows=46947 loops=1)
  ->  Bitmap Heap Scan on my_rownum r  
(cost=366.72..41168.37 rows=44620 width=8) (actual time=41.621..417.508 
rows=46947 loops=1)
Recheck Cond: ((version_id >= 320) AND 
(version_id < 330))
->  Bitmap Index Scan on 
i_chm_rownum_version_id_4998  (cost=0.00..366.72 rows=44620 width=0) 
(actual time=21.174..21.174 rows=46947 loops=1)
  Index Cond: ((version_id >= 320) AND 
(version_id < 330))

Total runtime: 1096.031 ms
(12 rows)







[PERFORM] pg_restore causes 100% CPU

2007-07-12 Thread Craig James

Here's an oddity.  I have 10 databases, each with about a dozen connections to Postgres 
(about 120 connections total), and at midnight they're all idle.  These are mod_perl 
programs (like a FastCGI -- they stay connected so they're ready for instant service).  
So using "ps -ef" and grep, we find one of the databases looks like this:

postgres 22708  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46915) idle
postgres 22709  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46916) idle
postgres 22710  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46917) idle
postgres 22711  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46918) idle
postgres 22712  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46919) idle
postgres 22724  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42440) idle
postgres 22725  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42441) idle
postgres 22726  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42442) idle
postgres 22727  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42443) idle
postgres 22728  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42444) idle
postgres 22731  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42447) idle

Now here's the weird thing.  I'm running a pg_restore of a database (on the order of 4GB 
compressed, maybe 34M rows of ordinary data, and 15M rows in one BLOB table that's 
typically 2K per blob).  When I do this, ALL of the postgres backends start working at 
about 1% CPU apiece.  This means that the 120 "idle" postgres backends are 
together using almost 100% of one CPU on top of the 100% CPU being used by pg_restore.  
See the output of top(1) below.

Is this normal?  All I can guess at is that something's going on in shared 
memory that every Postgres backend has to respond to.

Thanks,
Craig



Tasks: 305 total,   1 running, 304 sleeping,   0 stopped,   0 zombie
Cpu(s): 33.5% us,  1.5% sy,  0.0% ni, 57.8% id,  6.6% wa,  0.2% hi,  0.4% si
Mem:   4151456k total,  4011020k used,   140436k free,10096k buffers
Swap:  2104504k total,94136k used,  2010368k free,  3168596k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND 
6681 postgres  16   0  217m 188m 161m D 50.4  4.6   4:29.30 postmaster   
1577 root  10  -5 000 S  1.0  0.0 108:01.97 md0_raid1
8487 postgres  15   0  187m 8704 4996 S  1.0  0.2   0:06.56 postmaster   
8506 postgres  15   0  187m 8604 4892 S  1.0  0.2   0:06.37 postmaster   
8507 postgres  15   0  187m 8708 5004 S  1.0  0.2   0:06.42 postmaster   
8512 postgres  15   0  187m 8612 4904 S  1.0  0.2   0:06.65 postmaster   
8751 postgres  15   0  187m  10m 7520 S  1.0  0.3   0:07.95 postmaster   
8752 postgres  15   0  187m  10m 7492 S  1.0  0.3   0:07.84 postmaster   
14053 postgres  15   0  187m 8752 5044 S  1.0  0.2   0:06.53 postmaster   
16515 postgres  15   0  187m 8156 4452 S  1.0  0.2   0:06.33 postmaster   
25351 postgres  15   0  187m 9772 6064 S  1.0  0.2   0:06.75 postmaster   
25387 postgres  15   0  187m 8444 4752 S  1.0  0.2   0:06.45 postmaster

Re: [PERFORM] select on 1milion register = 6s

2007-07-28 Thread Craig James

Bruno Rodrigues Siqueira wrote:
Who can help me? My SELECT on a table with 1 million records, 
using an expression index, takes 6 seconds…


Run your query using 


  EXPLAIN ANALYZE SELECT ... your query ...

and then post the results to this newsgroup.  Nobody can help until they see 
the results of EXPLAIN ANALYZE.  Also, include all other relevant information, 
such as Postgres version, operating system, amount of memory, and any changes 
you have made to the Postgres configuration file.

Craig





Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread Craig James

Tilmann Singer wrote:

* [EMAIL PROTECTED] <[EMAIL PROTECTED]> [20070728 21:05]:

Let's try putting the sort/limit in each piece of the UNION to speed them up 
separately.

SELECT * FROM (
 (SELECT * FROM large_table lt
 WHERE lt.user_id = 12345
 ORDER BY created_at DESC LIMIT 10) AS q1
 UNION
 (SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
 ORDER BY created_at DESC LIMIT 10) AS q2
ORDER BY created_at DESC LIMIT 10;


It's not possible to use ORDER BY or LIMIT within unioned queries.

http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION


If I'm reading this documentation correctly, it *is* possible, as long as 
they're inside of a sub-select, as in this case.
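
Something like the following, reusing the table and column names from the 
quoted example, looks like what the docs permit (untested sketch):

  SELECT * FROM (
(SELECT * FROM large_table WHERE user_id = 12345
 ORDER BY created_at DESC LIMIT 10)
UNION
(SELECT * FROM large_table
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id = 12345)
 ORDER BY created_at DESC LIMIT 10)
  ) AS q
  ORDER BY created_at DESC LIMIT 10;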

Craig



Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread James Mansion

Scott Marlowe wrote:

And there's the issue that with windows / NTFS that when one process
opens a file for read, it locks it for all other users.  This means
that things like virus scanners can cause odd, unpredictable failures
of your database.

  

Can you provide some justification for this?

James




Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread James Mansion

Scott Marlowe wrote:

Where unixes generally outperform windows is in starting up new
backends, better file systems, and handling very large shared_buffer
settings.
  


Why do you think that UNIX systems are better at handling large shared 
buffers than Windows?
32 bit Windows systems can suffer from fragmented address space, to be 
sure, but if the
performance of the operating-system supplied mutex or semaphore isn't 
good enough, you can

just use the raw atomic ops.

If what you mean is that pg has a design that's heavily oriented towards 
things that tend to
be cheap on POSIX and doesn't use the core Win32 features effectively, 
then let's track

that as an optimisation opportunity for the Win32 port.




Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread James Mansion

Carlo Stonebanks wrote:

Isn't it just easier to assume that Windows Server can't do anything right?
;-)

  

Well, avoiding the ;-) - people do, and it's remarkably foolish of them.  It's
a long-standing whinge that many people with a UNIX background seem to
just assume that Windows sucks, but you could run 40,000 sockets from a
single Win32 process for a while and some well-known UNIX systems would
still struggle to do this, libevent or no.  Admittedly, the way a Win32
app is architected would be rather different from a typical POSIX one.

Windows has been a cheap target, but it's remarkably adequate and the
TPC results speak for themselves.








[PERFORM] What to vacuum after deleting lots of tables

2007-09-10 Thread Craig James

If I delete a whole bunch of tables (like 10,000 tables), should I vacuum 
system tables, and if so, which ones?  (This system is still on 8.1.4 and isn't 
running autovacuum).

Thanks,
Craig



Re: [PERFORM] Tablespaces and NFS

2007-09-19 Thread Craig James

Carlos Moreno wrote:

Anyone has tried a setup combining tablespaces with NFS-mounted partitions?


There has been some discussion of this recently, you can find it in the 
archives (http://archives.postgresql.org/).  The word seems to be that NFS can 
lead to data corruption.

Craig






Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Craig James

Luiz K. Matsumura wrote:
It is connected to full 100Mb and transfers many things quickly to clients. 
It is running Apache and JBoss, the transfer rate is good, and I did scp to copy 
many archives and it is as quick as the old server.


I have no idea how to continue researching this problem. Now I'm going 
to do some network tests.


Any chance this is your desktop machine, and you're also using it for audio?  
Microsoft built in a feature (!) that reduces network speed by 90% when music 
is playing:

 http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss

Craig



Re: [PERFORM] Block at a time ...

2010-03-17 Thread Craig James

On 3/17/10 2:52 AM, Greg Stark wrote:

On Wed, Mar 17, 2010 at 7:32 AM, Pierre C  wrote:

I was thinking in something like that, except that the factor I'd use
would be something like 50% or 100% of current size, capped at (say) 1 GB.


This turns out to be a bad idea. One of the first thing Oracle DBAs
are told to do is change this default setting to allocate some
reasonably large fixed size rather than scaling upwards.

This might be mostly due to Oracle's extent-based space management but
I'm not so sure. Recall that the filesystem is probably doing some
rounding itself. If you allocate 120kB it's probably allocating 128kB
itself anyways. Having two layers rounding up will result in odd
behaviour.

In any case I was planning on doing this a while back. Then I ran some
experiments and couldn't actually demonstrate any problem. ext2 seems
to do a perfectly reasonable job of avoiding this problem. All the
files were mostly large contiguous blocks after running some tests --
IIRC running pgbench.


This is one of the more-or-less solved problems in Unix/Linux.  Ext* file systems have a 
"reserve" usually of 10% of the disk space that nobody except root can use.  
It's not for root, it's because with 10% of the disk free, you can almost always do a 
decent job of allocating contiguous blocks and get good performance.  Unless Postgres has 
some weird problem that Linux has never seen before (and that wouldn't be 
unprecedented...), there's probably no need to fool with file-allocation strategies.
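
As an aside, on ext2/3/4 you can inspect or adjust that reserve with tune2fs 
(the device name here is just an example):

  tune2fs -l /dev/sda1 | grep -i 'reserved block count'
  tune2fs -m 10 /dev/sda1    # keep 10% of the blocks reserved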

Craig



Re: [PERFORM] Block at a time ...

2010-03-22 Thread Craig James

On 3/22/10 11:47 AM, Scott Carey wrote:


On Mar 17, 2010, at 9:41 AM, Craig James wrote:


On 3/17/10 2:52 AM, Greg Stark wrote:

On Wed, Mar 17, 2010 at 7:32 AM, Pierre C   wrote:

I was thinking in something like that, except that the factor I'd use
would be something like 50% or 100% of current size, capped at (say) 1 GB.


This turns out to be a bad idea. One of the first thing Oracle DBAs
are told to do is change this default setting to allocate some
reasonably large fixed size rather than scaling upwards.

This might be mostly due to Oracle's extent-based space management but
I'm not so sure. Recall that the filesystem is probably doing some
rounding itself. If you allocate 120kB it's probably allocating 128kB
itself anyways. Having two layers rounding up will result in odd
behaviour.

In any case I was planning on doing this a while back. Then I ran some
experiments and couldn't actually demonstrate any problem. ext2 seems
to do a perfectly reasonable job of avoiding this problem. All the
files were mostly large contiguous blocks after running some tests --
IIRC running pgbench.


This is one of the more-or-less solved problems in Unix/Linux.  Ext* file systems have a 
"reserve" usually of 10% of the disk space that nobody except root can use.  
It's not for root, it's because with 10% of the disk free, you can almost always do a 
decent job of allocating contiguous blocks and get good performance.  Unless Postgres has 
some weird problem that Linux has never seen before (and that wouldn't be 
unprecedented...), there's probably no need to fool with file-allocation strategies.

Craig



It's fairly easy to break.  Just do a parallel import with, say, 16 concurrent 
tables being written to at once.  Result?  Fragmented tables.


Is this from real-life experience?  With fragmentation, there's a point of diminishing 
return.  A couple head-seeks now and then hardly matter.  My recollection is that even 
when there are lots of concurrent processes running that are all making files larger and 
larger, the Linux file system still can do a pretty good job of allocating 
mostly-contiguous space.  It doesn't just dumbly allocate from some list, but rather 
tries to allocate in a way that results in pretty good "contiguousness" (if 
that's a word).

On the other hand, this is just from reading discussion groups like this one 
over the last few decades, I haven't tried it...

Craig



Re: [PERFORM] mysql to postgresql, performance questions

2010-03-25 Thread James Mansion

Hannu Krosing wrote:

Pulling the plug should not corrupt a postgreSQL database, unless it was
using disks which lie about write caching.
  
Didn't we recently put the old wives' 'the disks lied' tale to bed in 
favour of actually admitting that some well-known filesystems and 
software raid systems have had trouble with their write barriers?





Re: [PERFORM] why does swap not recover?

2010-03-26 Thread Craig James

On 3/26/10 4:57 PM, Richard Yen wrote:

Hi everyone,

We've recently encountered some swapping issues on our CentOS 64GB Nehalem 
machine, running postgres 8.4.2.  Unfortunately, I was foolish enough to set 
shared_buffers to 40GB.  I was wondering if anyone would have any insight into 
why the swapping suddenly starts, but never recovers?

http://richyen.com/i/swap.png

Note, the machine has been up and running since mid-December 2009.  It was only 
on March 8 that this swapping began, and it's never recovered.

If we look at dstat, we find the following:

http://richyen.com/i/dstat.png

Note that it is constantly paging in, but never paging out.


This happens when you have too many processes using too much space to fit in 
real memory, but none of them are changing their memory image.  If the system 
swaps a process in, but that process doesn't change anything in memory, then 
there are no dirty pages and the kernel can just kick the process out of memory 
without writing anything back to the swap disk -- the data in the swap are 
still valid.

It's a classic problem when processes are running round-robin. Say you have 
space for 100 processes, but you're running 101 process.  When you get to the 
#101, #1 is the oldest so it swaps out.  Then #1 runs, and #2 is the oldest, so 
it gets kicked out.  Then #2 runs and kicks out #3 ... and so forth.  Going 
from 100 to 101 process brings the system nearly to a halt.

Some operating systems try to use tricks to keep this from happening, but it's 
a hard problem to solve.

Craig



[PERFORM] Occasional giant spikes in CPU load

2010-04-07 Thread Craig James

Most of the time Postgres runs nicely, but two or three times a day we get a 
huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 
CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike 
events.  During these spikes, the system is completely unresponsive (you can't 
even login via ssh).

I managed to capture one such event using top(1) with the "batch" option as a 
background process.  See output below - it shows 19 active postgres processes, but I 
think it missed the bulk of the spike.

For some reason, every postgres backend suddenly decides (is told?) to do 
something.  When this happens, the system becomes unusable for anywhere from ten 
seconds to a minute or so, depending on how much web traffic stacks up behind 
this event.  We have two servers, one offline and one public, and they both do 
this, so it's not caused by actual web traffic (and the Apache logs don't show 
any HTTP activity correlated with the spikes).

I thought based on other posts that this might be a background-writer problem, 
but it's not I/O, it's all CPU as far as I can tell.

Any ideas where I can look to find what's triggering this?

8 CPUs, 8 GB memory
8-disk RAID10 (10k SATA)
Postgres 8.3.0
Fedora 8, kernel is 2.6.24.4-64.fc8
Diffs from original postgres.conf:

max_connections = 1000
shared_buffers = 2000MB
work_mem = 256MB
max_fsm_pages = 1600
max_fsm_relations = 625000
synchronous_commit = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
escape_string_warning = off

Thanks,
Craig


top - 11:24:59 up 81 days, 20:27,  4 users,  load average: 0.98, 0.83, 0.92
Tasks: 366 total,  20 running, 346 sleeping,   0 stopped,   0 zombie
Cpu(s): 30.6%us,  1.5%sy,  0.0%ni, 66.3%id,  1.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8194800k total,  8118688k used,76112k free,   36k buffers
Swap:  2031608k total,   169348k used,  1862260k free,  7313232k cached

PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
18972 postgres  20   0 2514m  11m 8752 R   11  0.1   0:00.35 postmaster
10618 postgres  20   0 2514m  12m 9456 R9  0.2   0:00.54 postmaster
10636 postgres  20   0 2514m  11m 9192 R9  0.1   0:00.45 postmaster
25903 postgres  20   0 2514m  11m 8784 R9  0.1   0:00.21 postmaster
10626 postgres  20   0 2514m  11m 8716 R6  0.1   0:00.45 postmaster
10645 postgres  20   0 2514m  12m 9352 R6  0.2   0:00.42 postmaster
10647 postgres  20   0 2514m  11m 9172 R6  0.1   0:00.51 postmaster
18502 postgres  20   0 2514m  11m 9016 R6  0.1   0:00.23 postmaster
10641 postgres  20   0 2514m  12m 9296 R5  0.2   0:00.36 postmaster
10051 postgres  20   0 2514m  13m  10m R4  0.2   0:00.70 postmaster
10622 postgres  20   0 2514m  12m 9216 R4  0.2   0:00.39 postmaster
10640 postgres  20   0 2514m  11m 8592 R4  0.1   0:00.52 postmaster
18497 postgres  20   0 2514m  11m 8804 R4  0.1   0:00.25 postmaster
18498 postgres  20   0 2514m  11m 8804 R4  0.1   0:00.22 postmaster
10341 postgres  20   0 2514m  13m   9m R2  0.2   0:00.57 postmaster
10619 postgres  20   0 2514m  12m 9336 R1  0.2   0:00.38 postmaster
15687 postgres  20   0 2321m  35m  35m R0  0.4   8:36.12 postmaster





Re: [PERFORM] Occasional giant spikes in CPU load

2010-04-07 Thread Craig James

On 4/7/10 2:40 PM, Joshua D. Drake wrote:

On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:

Most of the time Postgres runs nicely, but two or three times a day we get a 
huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 
CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike 
events.  During these spikes, the system is completely unresponsive (you can't 
even login via ssh).

I managed to capture one such event using top(1) with the "batch" option as a 
background process.  See output below - it shows 19 active postgres processes, but I 
think it missed the bulk of the spike.


What does iostat 5 say during the jump?


It's very hard to say ... I'll have to start a background job to watch for a 
day or so.  While it's happening, you can't login, and any open windows become 
unresponsive.  I'll probably have to run it at high priority using nice(1) to 
get any data at all during the event.

Would vmstat be informative?

Thanks,
Craig



Re: [PERFORM] Occasional giant spikes in CPU load

2010-04-07 Thread Craig James

On 4/7/10 3:36 PM, Joshua D. Drake wrote:

On Wed, 2010-04-07 at 14:45 -0700, Craig James wrote:

On 4/7/10 2:40 PM, Joshua D. Drake wrote:

On Wed, 2010-04-07 at 14:37 -0700, Craig James wrote:

Most of the time Postgres runs nicely, but two or three times a day we get a 
huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 
CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike 
events.  During these spikes, the system is completely unresponsive (you can't 
even login via ssh).

I managed to capture one such event using top(1) with the "batch" option as a 
background process.  See output below - it shows 19 active postgres processes, but I 
think it missed the bulk of the spike.


What does iostat 5 say during the jump?


It's very hard to say ... I'll have to start a background job to watch for a 
day or so.  While it's happening, you can't login, and any open windows become 
unresponsive.  I'll probably have to run it at high priority using nice(1) to 
get any data at all during the event.


Do you have sar runing? Say a sar -A ?


No, I don't have it installed.  I'll have a look. At first glance it looks like a 
combination of what I can get with "top -b" and vmstat, but with a single 
program.


My guess is that it is not CPU, it is IO and your CPU usage is all WAIT
on IO.

To have your CPUs so flooded that they are the cause of an inability to
log in is pretty suspect.


I thought so too, except that I can't login during the flood.  If the CPUs were 
all doing iowaits, logging in should be easy.

Greg's suggestion that shared_buffers and work_mem are too big for an 8 GB 
system fits these symptoms -- if it's having a swap storm, login is effectively 
impossible.
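
Rough arithmetic with the settings posted earlier (the number of concurrent 
sorts is just an assumption) shows how easily that overcommits the box:

  shared_buffers                2 GB
  work_mem x ~25 active sorts   256 MB x 25 = ~6.4 GB
  total                         ~8.4 GB, before the OS, page cache and backends get anything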

Craig



Joshua D. Drake




Thanks,
Craig









Re: [PERFORM] Occasional giant spikes in CPU load

2010-04-07 Thread Craig James

On 4/7/10 2:59 PM, Tom Lane wrote:

Craig James  writes:

Most of the time Postgres runs nicely, but two or three times a day we get a 
huge spike in the CPU load that lasts just a short time -- it jumps to 10-20 
CPU loads.  Today it hit 100 CPU loads.  Sometimes days go by with no spike 
events.  During these spikes, the system is completely unresponsive (you can't 
even login via ssh).
I managed to capture one such event using top(1) with the "batch" option as a 
background process.  See output below - it shows 19 active postgres processes, but I 
think it missed the bulk of the spike.



Any ideas where I can look to find what's triggering this?



Postgres 8.3.0

^

If it's really 8.3.0, try updating to 8.3.something-recent.  We've fixed
a whole lot of bugs since then.


Good advice, I've been meaning to do this, maybe this will be a kick in the 
pants to motivate me.


I have a suspicion that this might be an sinval overrun scenario, in
which case you'd need to update to 8.4 to get a real fix.  But updating
in the 8.3 branch would be cheap and easy.

If it is sinval overrun, it would presumably be triggered by a whole lot
of catalog changes being made at approximately the same time.  Can you
correlate the spikes with anything like that?


Not that I know of.  Just regular web traffic.  On the backup server these 
events happen occasionally even when there is little or no web traffic, and 
nobody logged in doing maintenance.



regards, tom lane






Re: [PERFORM] Occasional giant spikes in CPU load

2010-04-07 Thread Craig James

On 4/7/10 5:47 PM, Robert Haas wrote:

On Wed, Apr 7, 2010 at 6:56 PM, David Rees  wrote:

max_fsm_pages = 1600
max_fsm_relations = 625000
synchronous_commit = off


You are playing with fire here.  You should never turn this off unless
you do not care if your data becomes irrecoverably corrupted.


That is not correct.  Turning off synchronous_commit is sensible if
you don't mind losing the last few transactions on a crash.  What will
corrupt your database is if you turn off fsync.


A bit off the original topic, but ...

I set it this way because I was advised that with a battery-backed RAID 
controller, this was a safe setting.  Is that not the case?
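
For reference, the distinction being made, as a minimal postgresql.conf 
sketch (the comments are my summary):

  fsync = on                   # never turn this off; it protects against corruption
  synchronous_commit = off     # a crash may lose the last few commits, but data stays consistent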

Craig



[PERFORM] Dell Perc HX00 RAID controllers: What's inside?

2010-05-07 Thread Craig James

Now that it's time to buy a new computer, Dell has changed their RAID models 
from the Perc6 to Perc H200 and such.  Does anyone know what's inside these?  I 
would hope they've stuck with the Megaraid controller...

Also, I can't find any info on Dell's site about how these devices can be 
configured.  I was thinking of ten disks, as

  OS: RAID1
  WAL: RAID1
  Database: RAID10 using 6 disks

But it's not clear to me if these RAID controllers can handle multiple arrays, 
or if you need a separate controller for each array.  We're a small shop and I 
only get to do this every year or so, and everything changes in between 
purchases!

Thanks,
Craig



Re: [PERFORM] Performance issues when the number of records are around 10 Million

2010-05-12 Thread Craig James

On 5/12/10 4:55 AM, Kevin Grittner wrote:

venu madhav  wrote:

we display in sets of 20/30 etc. The user also has the option to
browse through any of those records hence the limit and offset.


Have you considered alternative techniques for paging?  You might
use values at the edges of the page to run a small query (limit, no
offset) when they page.  You might generate all the pages on the
first pass and cache them for a while.


Kevin is right.  You need to use "hitlists" - a semi-temporary table that holds 
the results of your initial query.  You're repeating a complex, expensive query over and 
over, once for each page of data that the user wants to see.  Instead, using a hitlist, 
your initial query looks something like this:

create sequence hitlist_seq;   -- supplies the original result ordering

create table hitlist_xxx(
   objectid integer,
   sortorder integer default nextval('hitlist_seq')
);

insert into hitlist_xxx (objectid)
(select ... your original query ... order by ...)

You store some object ID or primary key in the "hitlist" table, and the 
sequence records your original order.

Then when your user asks for page 1, 2, 3 ... N, all you have to do is join 
your hitlist to your original data:

  select ... from mytables join hitlist_xxx on (...)
 where sortorder >= 100 and sortorder < 120;

which would instantly return page 5 of your data.

To do this, you need a way to know when a user is finished so that you can 
discard the hitlist.

Craig



Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-05-26 Thread Craig James

On 5/26/10 9:47 AM, Stephen Frost wrote:

* Eliot Gable (egable+pgsql-performa...@gmail.com) wrote:

Since PostgreSQL is written in C, I assume there is no
such additional overhead. I assume that the PL/PGSQL implementation at its
heart also uses SPI to perform those executions. Is that a fair statement?


Right, but I also wouldn't expect a huge improvment either, unless
you're calling these queries a ton, or the queries that you're calling
from the pl/pgsql are pretty short-lived.

Don't get me wrong, C is going to be faster, but it depends on exactly
what's going on as to if it's going to be an overall improvment of, say,
10%, or a 10-fold improvment. :)


Or a 0.1% improvement, which is more likely.  Or that the PL/PGSQL version is 
even faster than the C version, because if you do any string regexp in your 
function, Perl has extremely efficient algorithms, probably better than you 
have time to write in C.
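
Purely as an illustration of that point - the function name and pattern are 
invented - a regexp test is a one-liner in PL/Perl:

  CREATE FUNCTION looks_like_id(text) RETURNS boolean AS $$
  my ($s) = @_;
  return ($s =~ /^[A-Z]{2}\d{6}$/) ? 1 : 0;   # Perl's regexp engine does the work
  $$ LANGUAGE plperl IMMUTABLE;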

We use Perl extensively and have never had any complaints.  The database 
activity completely dominates all queries, and the performance of Perl has 
never even been noticeable.

We use a C functions for a few things, and it is a big nuisance.  Every time 
you upgrade Postgres or your OS, there's a chance the recompile will fail 
because of changed header files.  Any bugs in your code crash Postgres itself.  
We avoid C as much as possible (and I love C, been doing it since 1984).

Craig



Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?

2010-05-27 Thread Craig James

On 5/18/10 3:28 PM, Carlo Stonebanks wrote:

Sample code:

SELECT *
FROM MyTable
WHERE foo = 'bar' AND MySlowFunc('foo') = 'bar'

Let's say this required a SEQSCAN because there were no indexes to
support column foo. For every row where foo <> 'bar' would the filter on
the SEQSCAN short-circuit the AND return false right away, or would it
still execute MySlowFunc('foo') ?


I asked a similar question a few years back, and the answer is that the planner 
just makes a guess and applies it to all functions.  It has no idea whether 
your function is super fast or incredibly slow, they're all assigned the same 
cost.

In this fairly simple case, the planner might reasonably guess that "foo = 'bar'" will 
always be faster than "AnyFunc(foo) = 'bar'".  But for real queries, that might not be 
the case.

In my case, I have a function that is so slow that it ALWAYS is good to avoid 
it.  Unfortunately, there's no way to explain that to Postgres, so I have to 
use other tricks to force the planner not to use it.

  select * from
(select * from MyTable where foo = 'bar' offset 0)
where MySlowFunc(foo) = 'bar';

The "offset 0" prevents the planner from collapsing this query back into your 
original syntax.  It will only apply MySlowFunc() to rows where you already know that foo 
= 'bar'.

It would be nice if Postgres had a way to assign a cost to every function. 
Until then, you have to use convoluted SQL if you have a really slow function.

Craig



Re: [PERFORM] Does FILTER in SEQSCAN short-circuit AND?

2010-05-27 Thread Craig James

On 5/27/10 2:28 PM, Kevin Grittner wrote:

Craig James  wrote:


It would be nice if Postgres had a way to assign a cost to every
function.


The COST clause of CREATE FUNCTION doesn't do what you want?

http://www.postgresql.org/docs/8.4/interactive/sql-createfunction.html


Cool ... I must have missed it when this feature was added.  Nice!
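
For the archives, a minimal sketch (the function name and cost figure are 
only illustrative):

  ALTER FUNCTION myslowfunc(text) COST 10000;    -- default is 100 for non-C functions

or declare it when the function is created:

  CREATE FUNCTION myslowfunc(text) RETURNS text
AS $$ SELECT $1 $$ LANGUAGE sql COST 10000;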

Craig



[PERFORM] Weird XFS WAL problem

2010-06-02 Thread Craig James

I'm testing/tuning a new midsize server and ran into an inexplicable problem.  
With a RAID10 drive, when I move the WAL to a separate RAID1 drive, TPS drops 
from over 1200 to less than 90!   I've checked everything and can't find a 
reason.

Here are the details.

8 cores (2x4 Intel Nehalem 2 GHz)
12 GB memory
12 x 7200 SATA 500 GB disks
3WARE 9650SE-12ML RAID controller with bbu
  2 disks: RAID1  500GB ext4  blocksize=4096
  8 disks: RAID10 2TB, stripe size 64K, blocksize=4096 (ext4 or xfs - see below)
  2 disks: hot swap
Ubuntu 10.04 LTS (Lucid)

With xfs or ext4 on the RAID10 I got decent bonnie++ and pgbench results (this 
one is for xfs):

Version 1.03e   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
argon24064M 70491  99 288158  25 129918  16 65296  97 428210  23 558.9  
 1
--Sequential Create-- Random Create
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 23283  81 + +++ 13775  56 20143  74 + +++ 15152  54
argon,24064M,70491,99,288158,25,129918,16,65296,97,428210,23,558.9,1,16,23283,81,+,+++,13775,56,20143\
,74,+,+++,15152,54

pgbench -i -s 100 -U test
pgbench -c 10 -t 1 -U test
scaling factor: 100
query mode: simple
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 1046.104635 (including connections establishing)
tps = 1046.337276 (excluding connections establishing)

Now the mystery: I moved the pg_xlog directory to a RAID1 array (same 3WARE 
controller, two more SATA 7200 disks).  Run the same tests and ...

tps = 82.325446 (including connections establishing)
tps = 82.326874 (excluding connections establishing)

I thought I'd made a mistake, like maybe I moved the whole database to the 
RAID1 array, but I checked and double checked.  I even watched the lights blink 
- the WAL was definitely on the RAID1 and the rest of Postgres on the RAID10.

So I moved the WAL back to the RAID10 array, and performance jumped right back up 
to the >1200 TPS range.

Next I check the RAID1 itself:

  dd if=/dev/zero of=./bigfile bs=8192 count=200

which yielded 98.8 MB/sec - not bad.  bonnie++ on the RAID1 pair showed good 
performance too:

Version 1.03e   --Sequential Output-- --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
argon24064M 68601  99 110057  18 46534   6 59883  90 123053   7 471.3   
1
--Sequential Create-- Random Create
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
 16 + +++ + +++ + +++ + +++ + +++ + +++
argon,24064M,68601,99,110057,18,46534,6,59883,90,123053,7,471.3,1,16,+,+++,+,+++,+,+++,+,\
+++,+,+++,+,+++

So ... anyone have any idea at all how TPS drops to below 90 when I move the 
WAL to a separate RAID1 disk?  Does this make any sense at all?  It's 
repeatable. It happens for both ext4 and xfs. It's weird.

You can even watch the disk lights and see it: the RAID10 disks are on almost 
constantly when the WAL is on the RAID10, but when you move the WAL over to the 
RAID1, its lights are dim and flicker a lot, like it's barely getting any data, 
and the RAID10 disk's lights barely go on at all.

Thanks,
Craig










--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Craig James

On 6/2/10 4:40 PM, Mark Kirkwood wrote:

On 03/06/10 11:30, Craig James wrote:

I'm testing/tuning a new midsize server and ran into an inexplicable
problem. With an RAID10 drive, when I move the WAL to a separate RAID1
drive, TPS drops from over 1200 to less than 90! I've checked
everything and can't find a reason.


Are the 2 new RAID1 disks the same make and model as the 12 RAID10 ones?


Yes.


Also, are barriers *on* on the RAID1 mount and off on the RAID10 one?


It was the barriers.  "barrier=1" isn't just a bad idea on ext4, it's a 
disaster.

pgbench -i -s 100 -U test
pgbench -c 10 -t 1 -U test

Change WAL to barrier=0

tps = 1463.264981 (including connections establishing)
tps = 1463.725687 (excluding connections establishing)

Change WAL to noatime, nodiratime, barrier=0

tps = 1479.331476 (including connections establishing)
tps = 1479.810545 (excluding connections establishing)

Change WAL to barrier=1

tps = 82.325446 (including connections establishing)
tps = 82.326874 (excluding connections establishing)

This is really hard to believe, because the bonnie++ numbers and dd(1) numbers look good 
(see my original post).  But it's totally repeatable.  It must be some really unfortunate 
"just missed the next sector going by the write head" problem.

So with ext4, bonnie++ and dd aren't the whole story.

BTW, I also learned that if you edit /etc/fstab and use "mount -oremount", the barrier setting WON'T change 
unless it is explicit in the fstab file.  That is, if you put "barrier=0" into /etc/fstab and remount, it will 
change to no barriers.  But if you then remove it from /etc/fstab and remount again, it won't change back to the 
default; you have to actually put "barrier=1" in if you want the default back.  This seems like a bug to me, and it 
made this really hard to track down. "mount -oremount" is not the same as umount/mount!

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need to increase performance of a query

2010-06-10 Thread Craig James

On 6/10/10 12:34 PM, Anne Rosset wrote:

Jochen Erwied wrote:

Thursday, June 10, 2010, 8:36:08 PM you wrote:


psrdb=# (SELECT
psrdb(# MAX(item_rank.rank) AS maxRank
psrdb(# FROM
psrdb(# item_rank item_rank
psrdb(# WHERE
psrdb(# item_rank.project_id='proj2783'
psrdb(# AND item_rank.pf_id IS NULL
psrdb(#
psrdb(# )
psrdb-# ORDER BY
psrdb-# maxRank DESC;


Don't think it really matters, but why do you sort a resultset
consisting of only one row?


Sorry, I should have removed the ORDER by (the full query has a union).
So without the ORDER by, here are the results:
psrdb=# SELECT
psrdb-# MAX(item_rank.rank) AS maxRank
psrdb-# FROM
psrdb-# item_rank item_rank
psrdb-# WHERE
psrdb-# item_rank.pf_id='plan1408';
maxrank
-
2050400
(1 row)

Time: 1.516 ms
psrdb=# SELECT
psrdb-# MAX(item_rank.rank) AS maxRank
psrdb-# FROM
psrdb-# item_rank item_rank
psrdb-# WHERE
psrdb-# item_rank.project_id='proj2783'
psrdb-# AND item_rank.pf_id IS NULL;
maxrank
-
202
(1 row)

Time: 13.177 ms

Is there anything that can be done for the second one?


Postgres often can't use an ordinary index for an IS NULL condition, so it falls back to 
a table scan when your query includes one.  You can create a partial 
index that matches the "IS NULL" condition:

  create index item_rank_null_idx on item_rank(pf_id)
   where item_rank.pf_id is null;
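
For reference, the partial index lines up with the query above because the
query's WHERE clause matches the index predicate, so the planner can
consider it:

   SELECT MAX(item_rank.rank) AS maxRank
     FROM item_rank
    WHERE item_rank.project_id = 'proj2783'
      AND item_rank.pf_id IS NULL;

A variant (not from this thread, just an option) is to key the partial index
on the other filter column, e.g.

   create index item_rank_proj_null_idx on item_rank(project_id)
    where item_rank.pf_id is null;

so the project_id test is satisfied by the index as well.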

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Craig James

[oops, didn't hit "reply to list" first time, resending...]

On 6/15/10 9:02 AM, Steve Wampler wrote:

Chris Browne wrote:

"jgard...@jonathangardner.net"  writes:

My question is how can I configure the database to run as quickly as
possible if I don't care about data consistency or durability? That
is, the data is updated so often and it can be reproduced fairly
rapidly so that if there is a server crash or random particles from
space mess up memory we'd just restart the machine and move on.


For such a scenario, I'd suggest you:

- Set up a filesystem that is memory-backed. On Linux, RamFS or TmpFS
are reasonable options for this.

- The complication would be that your "restart the machine and move
on" needs to consist of quite a few steps:

- recreating the filesystem
- fixing permissions as needed
- running initdb to set up new PG instance
- automating any needful fiddling with postgresql.conf, pg_hba.conf
- starting up that PG instance
- creating users, databases, schemas, ...


How about this: Set up a database entirely on a RAM disk, then install a 
WAL-logging warm standby.  If the production computer goes down, you bring the 
warm standby online, shut it down, and use tar(1) to recreate the database on 
the production server when you bring it back online.  You have speed and you 
have near-100% backup.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-16 Thread Craig James

On 6/16/10 12:00 PM, Josh Berkus wrote:



* fsync=off =>  5,100
* fsync=off and synchronous_commit=off =>  5,500


Now, this *is* interesting ... why should synch_commit make a difference
if fsync is off?

Anyone have any ideas?


I found that pgbench has "noise" of about 20% (I posted about this a couple 
days ago using data from 1000 identical pgbench runs).  Unless you make a bunch of runs 
and average them, a difference of 5,100 to 5,500 appears to be meaningless.

Craig




tmpfs, WAL on same tmpfs:
* Default config: 5,200
* full_page_writes=off =>  5,200
* fsync=off =>  5,250
* synchronous_commit=off =>  5,200
* fsync=off and synchronous_commit=off =>  5,450
* fsync=off and full_page_writes=off =>  5,250
* fsync=off, synchronous_commit=off and full_page_writes=off =>  5,500


So, in this test, it seems like having WAL on tmpfs doesn't make a
significant difference for everything == off.

I'll try running some tests on Amazon when I have a chance.  It would be
worthwhile to get figures without Python's "ceiling".




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] requested shared memory size overflows size_t

2010-06-24 Thread Craig James

Can anyone tell me what's going on here?  I hope this doesn't mean my system 
tables are corrupt...

Thanks,
Craig


select relname, pg_relation_size(relname) from pg_class
where pg_get_userbyid(relowner) = 'emol_warehouse_1'
and relname not like 'pg_%'
order by pg_relation_size(relname) desc;
ERROR:  relation "rownum_temp" does not exist

emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp';
   relname
--
 rownum_temp
(1 row)

emol_warehouse_1=> \d rownum_temp
Did not find any relation named "rownum_temp".
emol_warehouse_1=> create table rownum_temp(i int);
CREATE TABLE
emol_warehouse_1=> drop table rownum_temp;
DROP TABLE
emol_warehouse_1=> select relname, pg_relation_size(relname) from pg_class
where pg_get_userbyid(relowner) = 'emol_warehouse_1'
and relname not like 'pg_%'
order by pg_relation_size(relname) desc;
ERROR:  relation "rownum_temp" does not exist

emol_warehouse_1=> select relname, pg_relation_size(relname) from pg_class;
ERROR:  relation "tables" does not exist





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] System tables screwed up? (WAS requested shared memory size overflows size_t)

2010-06-24 Thread Craig James

On 6/24/10 4:19 PM, Alvaro Herrera wrote:

Excerpts from Craig James's message of jue jun 24 19:03:00 -0400 2010:


select relname, pg_relation_size(relname) from pg_class
  where pg_get_userbyid(relowner) = 'emol_warehouse_1'
  and relname not like 'pg_%'
  order by pg_relation_size(relname) desc;
ERROR:  relation "rownum_temp" does not exist

emol_warehouse_1=>  select relname from pg_class where relname = 'rownum_temp';
 relname
--
   rownum_temp
(1 row)


What's the full row?  I'd just add a "WHERE relkind = 'r'" to the above
query anyway.


Thanks, in fact that works.  But my concern is that these are system tables and 
system functions and yet they seem to be confused.  I've used this query dozens 
of times and never seen this behavior before.  It makes me really nervous...
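
For reference, the amended query with the relkind filter would look roughly
like this.  Using pg_class.oid rather than the relation name for
pg_relation_size() is an extra precaution; it avoids errors on relations
that aren't visible in the current search_path (an assumption about the
cause here, not something confirmed above):

   SELECT c.relname, pg_relation_size(c.oid)
     FROM pg_class c
    WHERE pg_get_userbyid(c.relowner) = 'emol_warehouse_1'
      AND c.relkind = 'r'
      AND c.relname NOT LIKE 'pg_%'
    ORDER BY pg_relation_size(c.oid) DESC;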

Craig

P.S. Sorry I got the Subject wrong the first time by hitting the REPLY key 
mindlessly, I've changed it now.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Occasional giant spikes in CPU load

2010-06-24 Thread Craig James

I'm reviving this question because I never figured it out.  To summarize: At random 
intervals anywhere from a few times per hour to once or twice a day, we see a huge spike 
in CPU load that essentially brings the system to a halt for up to a minute or two.  
Previous answers focused on "what is it doing", i.e. is it really Postgres or 
something else?

Now the question has narrowed down to this: what could trigger EVERY postgres backend to 
do something at the same time?  See the attached output from "top -b", which 
shows what is happening during one of the CPU spikes.

A little background about our system.  We have roughly 100 FastCGI clients 
connected at all times that are called on to generate images from data in the 
database.  Even though there are a lot of these, they don't do much.  They sit 
there most of the time, then they spew out a couple dozen GIF images in about 
one second as a user gets a new page of data.  Each GIF image requires fetching 
a single row using a single indexed column, so it's a trivial amount of work for 
Postgres.

We also have the "heavy lift" application that does the search.  Typically one 
or two of these is running at a time, and takes from a fraction of a second to a few 
minutes to complete.  In this particular instance, immediately before this spike, the CPU 
load was only at about 10% -- a couple users poking around with easy queries.

So what is it that will cause every single Postgres backend to come to life at 
the same moment, when there's no real load on the server?  Maybe if a backend 
crashes?  Some other problem?

There's nothing in the serverlog.

Thanks,
Craig


top - 12:15:09 up 81 days, 21:18,  4 users,  load average: 0.38, 0.38, 0.73
Tasks: 374 total,  95 running, 279 sleeping,   0 stopped,   0 zombie
Cpu(s): 62.5%us,  2.2%sy,  0.0%ni, 34.9%id,  0.2%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   8194800k total,  7948928k used,   245872k free,   36k buffers
Swap:  2031608k total,   161136k used,  1870472k free,  7129744k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
22120 postgres  20   0 2514m  17m  13m R   11  0.2   0:01.02 postmaster
18497 postgres  20   0 2514m  11m 8832 R6  0.1   0:00.62 postmaster
22962 postgres  20   0 2514m  12m 9548 R6  0.2   0:00.22 postmaster
24002 postgres  20   0 2514m  11m 8804 R6  0.1   0:00.15 postmaster
25900 postgres  20   0 2514m  11m 8824 R6  0.1   0:00.55 postmaster
 8941 postgres  20   0 2324m 6172 4676 R5  0.1   0:00.32 postmaster
10622 postgres  20   0 2514m  12m 9444 R5  0.2   0:00.79 postmaster
14021 postgres  20   0 2514m  11m 8548 R5  0.1   0:00.28 postmaster
14075 postgres  20   0 2514m  11m 8672 R5  0.1   0:00.27 postmaster
14423 postgres  20   0 2514m  11m 8572 R5  0.1   0:00.29 postmaster
18896 postgres  20   0 2324m 5644 4204 R5  0.1   0:00.11 postmaster
18897 postgres  20   0 2514m  12m 9800 R5  0.2   0:00.27 postmaster
18928 postgres  20   0 2514m  11m 8792 R5  0.1   0:00.18 postmaster
18973 postgres  20   0 2514m  11m 8792 R5  0.1   0:00.70 postmaster
22049 postgres  20   0 2514m  17m  14m R5  0.2   0:01.11 postmaster
22050 postgres  20   0 2514m  16m  13m R5  0.2   0:01.06 postmaster
22843 postgres  20   0 2514m  12m 9328 R5  0.2   0:00.20 postmaster
24202 postgres  20   0 2324m 5560 4120 R5  0.1   0:00.07 postmaster
24388 postgres  20   0 2514m  12m 9380 R5  0.2   0:00.16 postmaster
25903 postgres  20   0 2514m  11m 8828 R5  0.1   0:00.55 postmaster
28362 postgres  20   0 2514m  11m 8952 R5  0.1   0:00.48 postmaster
 5667 postgres  20   0 2324m 6752 5588 R4  0.1   0:08.93 postmaster
 7531 postgres  20   0 2324m 5452 4008 R4  0.1   0:03.21 postmaster
 9219 postgres  20   0 2514m  11m 8476 R4  0.1   0:00.89 postmaster
 9820 postgres  20   0 2514m  12m 9.9m R4  0.2   0:00.92 postmaster
10050 postgres  20   0 2324m 6172 4676 R4  0.1   0:00.31 postmaster
10645 postgres  20   0 2514m  12m 9512 R4  0.2   0:00.72 postmaster
14582 postgres  20   0 2514m  25m  21m R4  0.3   0:02.10 postmaster
18502 postgres  20   0 2514m  11m 9040 R4  0.1   0:00.64 postmaster
18972 postgres  20   0 2514m  11m 8792 R4  0.1   0:00.76 postmaster
18975 postgres  20   0 2514m  11m 8904 R4  0.1   0:00.63 postmaster
19496 postgres  20   0 2514m  14m  11m R4  0.2   0:00.44 postmaster
22121 postgres  20   0 2514m  16m  13m R4  0.2   0:00.81 postmaster
24340 postgres  20   0 2514m  12m 9424 R4  0.2   0:00.15 postmaster
24483 postgres  20   0 2324m 6008 4536 R4  0.1   0:00.21 postmaster
25668 postgres  20   0 2514m  16m  13m R4  0.2   0:00.91 postmaster
26382 postgres  20   0 2514m  11m 8996 R4  0.1   0:00.50 postmaster
28363 postgres  20   0 2514m  11m 8908 R4  0.1   0:00.34 postmaster
 9754 postgres  20   0 2514m  11m 8752 R3  0.1   0:00.29 postmaster
16113 postgres  20   0 2514m  17m  14m R3  0.2   0:01.10 postmaster
18498 postgres  20   0 2514m  11m 8

Re: [PERFORM] Occasional giant spikes in CPU load

2010-06-25 Thread Craig James

On 6/24/10 9:04 PM, Tom Lane wrote:

Craig James  writes:

So what is it that will cause every single Postgres backend to come to life at 
the same moment, when there's no real load on the server?  Maybe if a backend 
crashes?  Some other problem?


sinval queue overflow comes to mind ... although that really shouldn't
happen if there's "no real load" on the server.  What PG version is
this?


8.3.10.  Upgraded based on your advice when I first asked this question.


  Also, the pg_stat_activity view contents when this happens would
probably be more useful to look at than "top" output.


I'll try.  It's hard to discover anything because the whole machine is 
overwhelmed when this happens.  The only way I got the top(1) output was by 
running it high priority as root using nice(1).  I can't do that with a 
Postgres backend, but I'll see what I can do.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Occasional giant spikes in CPU load

2010-06-25 Thread Craig James

On 6/25/10 7:47 AM, Tom Lane wrote:

Craig James  writes:

On 6/24/10 9:04 PM, Tom Lane wrote:

sinval queue overflow comes to mind ... although that really shouldn't
happen if there's "no real load" on the server.  What PG version is
this?



8.3.10.  Upgraded based on your advice when I first asked this question.


Any chance of going to 8.4?  If this is what I suspect, you really need
this 8.4 fix:
http://archives.postgresql.org/pgsql-committers/2008-06/msg00227.php
which eliminated the thundering-herd behavior that previous releases
exhibit when the sinval queue overflows.


Yes, there is a chance of upgrading to 8.4.4.  I just bought a new server and 
it has 8.4.4 on it, but it won't be online for a while so I can't compare yet.  
This may motivate me to upgrade the current servers to 8.4.4 too.  I was 
pleased to see that 8.4 has a new upgrade-in-place feature that means we don't 
have to dump/restore.  That really helps a lot.

A question about 8.4.4: I've been having problems with bloat.  I thought I'd 
adjusted the FSM parameters correctly based on advice I got here, but 
apparently not.  8.4.4 has removed the configurable FSM parameters completely, 
which is very cool.  But ... if I upgrade a bloated database using the 
upgrade-in-place feature, will 8.4.4 recover the bloat and return it to the OS, 
or do I still have to recover the space manually (like vacuum-full/reindex, or 
cluster, or copy/drop a table)?
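
For reference, the manual space-recovery options mentioned look roughly like
this (table and index names are placeholders):

   VACUUM FULL mytable;
   REINDEX TABLE mytable;
   -- or rewrite the table in index order (8.4 syntax):
   CLUSTER mytable USING mytable_pkey;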


Or you could look at using connection pooling so you don't have quite
so many backends ...


I always just assumed that lots of backends would be harmless if each one 
was doing very little.  If I understand your explanation, it sounds like that's 
not entirely true in pre-8.4.4 releases due to the sinval queue problems.

Thanks,
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Occasional giant spikes in CPU load

2010-06-25 Thread Craig James

On 6/25/10 9:41 AM, Kevin Grittner wrote:

Craig James  wrote:


I always just assumed that lots of backends would be harmless
if each one was doing very little.


Even if each is doing very little, if a large number of them happen
to make a request at the same time, you can have problems.  This is
exactly where a connection pool can massively improve both
throughput and response time.  If you can arrange it, you want a
connection pool which will put a limit on active database
transactions and queue requests to start a new transaction until one
of the pending ones finishes.


No, that doesn't seem to be the case.  There is no external activity that 
triggers this huge spike in usage.  It even happens to our backup server when 
only one of us is using it to do a single query.  This problem seems to be 
triggered by Postgres itself, not by anything external.

Per Tom's suggestion, I think upgrading to 8.4.4 is the answer.  I'll learn 
more when our new hardware comes into use with a shiny new 8.4.4 installation.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] pgbench results on a new server

2010-06-25 Thread Craig James

I've got a new server and want to make sure it's running well.  Are these 
pretty decent numbers?

8 cores (2x4 Intel Nehalem 2 GHz)
12 GB memory
12 x 7200 SATA 500 GB disks
3WARE 9650SE-12ML RAID controller with BBU
  WAL on ext2, 2 disks: RAID1 500GB, blocksize=4096
  Database on ext4, 8 disks: RAID10 2TB, stripe size 64K, blocksize=4096
Ubuntu 10.04 LTS (Lucid)
Postgres 8.4.4

pgbench -i -s 100 -U test
pgbench -c 5 -t 2 -U test
tps = 4903
pgbench -c 10 -t 1 -U test
tps = 4070
pgbench -c20 -t 5000 -U test
tps = 5789
pgbench -c30 -t  -U test
tps = 6961
pgbench -c40 -t 2500 -U test
tps = 2945

Thanks,
Craig



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Architecting a database

2010-06-25 Thread Craig James

On 6/25/10 3:28 PM, Kevin Grittner wrote:

  wrote:

With the PostgreSQL type tables I am not so certain how the data
is arranged within the one file. Does having the data all in one
database allow PostgreSQL to better utilize indexes and caches or
does having a number of smaller databases provide performance
increases? In case it is important, there are 2000 clients
involved, so that would be 2000 databases if I followed my current
FoxPro related structure.


The implications of putting multiple clients in a table, with a
client's rows identified by a client_id column, are probably fairly
obvious. If many of those 2,000 clients have tables with millions of
rows, performance could suffer without very careful indexing,
managing tables with billions of rows can become challenging, and
there could be concerns about how to ensure that data from one
client isn't accidentally shown to another.


You should also ask whether there are social (that is, nontechnical) reasons to 
avoid multiple clients per table.

When a customer asks about security and you tell them, "You get your own database, nobody else 
can log in," they tend to like that.  If you tell them that their data is mixed with everyone 
else's, but "we've done a really good job with our app software and we're pretty sure there 
are no bugs that would let anyone see your data," that may not fly.

People will trust Postgres security (assuming you actually do it right) because 
it's an open source, trusted product used by some really big companies.  But 
your own app?  Do you even trust it?

Even if your application IS secure, it may not matter.  It's what the customer 
believes or worries about that can sell your product.

We've also found another really good reason for separate databases.  It lets 
you experiment without any impact on anything else.  We have scripts that can 
create a database, load it up, and have it ready to demo in just a few 
minutes.  If we don't end up using it, we just blow it away and it's gone.  
No other database is impacted at all.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pgbench results on a new server

2010-06-28 Thread Craig James

On 6/25/10 12:03 PM, Greg Smith wrote:

Craig James wrote:

I've got a new server and want to make sure it's running well.


Any changes to the postgresql.conf file? Generally you need at least a
moderate shared_buffers (1GB or so at a minimum) and checkpoint_segments
(32 or higher) in order for the standard pgbench test to give good results.


max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB

For fun I ran it with the installation defaults, and it never got above 1475 
TPS.


pgbench -c20 -t 5000 -U test
tps = 5789
pgbench -c30 -t  -U test
tps = 6961
pgbench -c40 -t 2500 -U test
tps = 2945


General numbers are OK, the major drop going from 30 to 40 clients is
larger than it should be. I'd suggest running the 40 client count one
again to see if that's consistent.


It is consistent.  When I run pgbench from a different server, I get this:

   pgbench -c40 -t 2500 -U test
   tps = 7999

   pgbench -c100 -t 1000 -U test
   tps = 6693

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-30 Thread Craig James

On 6/30/10 9:42 AM, Dave Crooke wrote:

I haven't jumped in yet on this thread, but here goes 

If you're really looking for query performance, then any database which
is designed with reliability and ACID consistency in mind is going to
inherently have some mis-fit features.

Some other ideas to consider, depending on your query mix:

1. MySQL with the MyISAM database (non-ACID)

2. Put an in-application generic query cache in front of the DB, that
runs in the app address space, e.g. Cache' if using Java

3. Using a DB is a good way to get generic querying capability, but if
the "where" clause in the querying is over a small set of meta-data, and
SQL syntax is not a big requirement, consider non-RDBMS alternatives,
e.g. use XPath over a W3C DOM object tree to get primary keys to
in-memory hash tables (possibly distributed with something like memcached)


These would be good suggestions if the "throwaway" database was the only one.  
But in real life, these throwaway databases are built from other databases that are NOT 
throwaway, where the data matters and ACID is critical.  In other words, they'll probably 
need Postgres anyway.

Sure, you could use both Postgres and MySQL/ISAM, but that means installing and 
maintaining both, plus building all of the other application layers to work on 
both systems.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-02 Thread Craig James

On 7/2/10 6:59 AM, Eliot Gable wrote:

Yes, I have two pl/pgsql functions. They take a prepared set of data
(just the row id of the original results, plus the particular priority
and weight fields) and they return the same set of data with an extra
field called "order" which contains a numerical order to apply when
sorting the rows. One function uses the priority information to break
everything into priority groups, then calls the other function for each
priority group. Each time it gets results back from the inner function,
it returns that set of results. When it has looped through all priority
groups, then it returns the full built-up set of results back to the
calling function.

The pl/pgsql functions implementing the sort are as optimized as they
are likely to get. I don't want to waste my time trying to further
optimize pl/pgsql functions that are never going to be as fast and
efficient as I need. I would rather spend that time re-writing it in C
and get sorting back to <1ms.

I guess the real question is, is a generic C sorting function my only
real alternative? Is there anything else that would allow me to sort
things faster than pl/pgsql functions? For example, if I used pl/perl,
would I be able to expect considerably better performance for sorting
than using pl/pgsql? What about other supported languages? If I can get
close to 1ms sorting performance without resorting to C, it would save
me much time and frustration.


Try coding it in perl on the server.  It is MUCH easier to code, and you don't 
have to link anything or learn the esoteric details of the Postgres/C API.

Perl itself is written in C, and some of its operations are extremely fast.  
Depending on the size and complexity of your data structures, Perl code may be 
just as fast as code you could write in C.
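
A toy sketch of what that could look like; the table and column names here
are made up, the real comparison logic would replace the plain numeric sort,
and RETURNS TABLE is 8.4 syntax:

   CREATE OR REPLACE FUNCTION sort_rows_by_weight()
   RETURNS TABLE(row_id integer, sort_order integer) AS $$
       # fetch the prepared rows (hypothetical table "prepared_rows")
       my $rv = spi_exec_query('SELECT row_id, weight FROM prepared_rows');
       my @rows = @{ $rv->{rows} };
       # the custom comparison goes here; a plain numeric sort as a stand-in
       my @sorted = sort { $a->{weight} <=> $b->{weight} } @rows;
       my $n = 0;
       foreach my $r (@sorted) {
           return_next({ row_id => $r->{row_id}, sort_order => ++$n });
       }
       return undef;
   $$ LANGUAGE plperl;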

Even if it turns out to be slower than you like, it will give you a way to 
package up your sort functionality into a function call, so if you later find 
you need to replace the Perl function with a C function, the rest of your 
application won't change.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance on new linux box

2010-07-08 Thread Craig James

On 7/8/10 9:31 AM, Ryan Wexler wrote:

Thanks a lot for all the comments.  The fact that both my windows box
and the old linux box both show a massive performance improvement over
the new linux box seems to point to hardware to me.  I am not sure how
to test the fsync issue, but i don't see how that could be it.

The raid card the server has in it is:
3Ware 4 Port 9650SE-4LPML RAID Card

Looking it up, it seems to indicate that it has BBU


Make sure the battery isn't dead.  Most RAID controllers drop to non-BBU speeds 
if they detect that the battery is faulty.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance on new linux box

2010-07-08 Thread Craig James

On 7/8/10 12:47 PM, Ryan Wexler wrote:



On Thu, Jul 8, 2010 at 12:46 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:

Ryan Wexler <r...@iridiumsuite.com> wrote:

 > One thing I don't understand is why BBU will result in a huge
 > performance gain.  I thought BBU was all about power failures?

Well, it makes it safe for the controller to consider the write
complete as soon as it hits the RAM cache, rather than waiting for
persistence to the disk itself.  It can then schedule the writes in
a manner which is efficient based on the physical medium.

Something like this was probably happening on your non-server
machines, but without BBU it was not actually safe.  Server class
machines tend to be more conservative about not losing your data,
but without a RAID controller with BBU cache, that slows writes down
to the speed of the rotating disks.

-Kevin

Thanks for the explanations; that makes things clearer.  It still amazes
me that it would account for a 5x change in IO.


It's not exactly a 5x change in I/O, rather it's a 5x change in *transactions*.  Without a BBU 
Postgres has to wait for each transaction to be physically written to the disk, which at 7200 RPM 
(or 10K or 15K) means a few hundred per second.  Most of the time Postgres is just sitting there 
waiting for the disk to say, "OK, I did it."  With BBU, once the RAID card has the data, 
it's virtually guaranteed it will get to the disk even if the power fails, so the RAID controller 
says, "OK, I did it" even though the data is still in the controller's cache and not 
actually on the disk.

It means there's no tight relationship between the disk's rotational speed and 
your transaction rate.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance on new linux box

2010-07-08 Thread Craig James

On 7/8/10 2:18 PM, timothy.noo...@emc.com wrote:

How does the linux machine know that there is a BBU installed and to
change its behavior or change the behavior of Postgres? I am
experiencing performance issues, not with searching but more with IO.


It doesn't.  It trusts the disk controller.  Linux says, "Flush your cache" and the controller 
says, "OK, it's flushed."  In the case of a BBU controller, the controller can say that almost 
instantly because it's got the data in a battery-backed memory that will survive even if the power goes out.  
In the case of a non-BBU controller (RAID or non-RAID), the controller has to actually wait for the head to 
move to the right spot, then wait for the disk to spin around to the right sector, then write the data.  Only 
then can it say, "OK, it's flushed."

So to Linux, it just appears to be a disk that's exceptionally fast at flushing 
its buffers.

Craig



-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Craig James
Sent: Thursday, July 08, 2010 4:02 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance on new linux box

On 7/8/10 12:47 PM, Ryan Wexler wrote:



On Thu, Jul 8, 2010 at 12:46 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:

 Ryan Wexler <r...@iridiumsuite.com> wrote:

  >  One thing I don't understand is why BBU will result in a huge
  >  performance gain.  I thought BBU was all about power failures?

 Well, it makes it safe for the controller to consider the write
 complete as soon as it hits the RAM cache, rather than waiting for
 persistence to the disk itself.  It can then schedule the writes

in

 a manner which is efficient based on the physical medium.

 Something like this was probably happening on your non-server
 machines, but without BBU it was not actually safe.  Server class
 machines tend to be more conservative about not losing your data,
 but without a RAID controller with BBU cache, that slows writes

down

 to the speed of the rotating disks.

 -Kevin

Thanks for the explanations that makes things clearer.  It still

amazes

me that it would account for a 5x change in IO.


It's not exactly a 5x change in I/O, rather it's a 5x change in
*transactions*.  Without a BBU Postgres has to wait for each transaction
to be physically written to the disk, which at 7200 RPM (or 10K or 15K)
means a few hundred per second.  Most of the time Postgres is just
sitting there waiting for the disk to say, "OK, I did it."  With BBU,
once the RAID card has the data, it's virtually guaranteed it will get
to the disk even if the power fails, so the RAID controller says, "OK, I
did it" even though the data is still in the controller's cache and not
actually on the disk.

It means there's no tight relationship between the disk's rotational
speed and your transaction rate.

Craig




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using more tha one index per table

2010-07-21 Thread Craig James

On 7/21/10 5:47 PM, Craig Ringer wrote:

On 21/07/10 22:59, Greg Smith wrote:


  A useful trick to know is that if you replace the version number
with "current", you'll get to the latest version most of the time
(sometimes the name of the page is changed between versions, too, but
this isn't that frequent).


The docs pages could perhaps benefit from an auto-generated note saying:

"The current version of Pg is 8.4. This documentation is for version
8.2. Click [here] for documentation on the current version."

... or something to that effect. It'd be a nice (and more user-friendly)
alternative to url twiddling when searches reveal docs for an old
version, and might help push the /current/ pages up in search rank too.


In addition, why not use symlinks so that the current version is simply called 
"current", as in

   http://www.postgresql.org/docs/current/static/sql-insert.html

If you google for "postgres insert", you get this:

  http://www.postgresql.org/docs/8.1/static/sql-insert.html

The problem is that Google ranks pages based on inbound links, so older versions of 
Postgres *always* come up before the latest version in page ranking.  By using 
"current" and encouraging people to link to that, we could quickly change the 
Google pagerank so that a search for Postgres would turn up the most-recent version of 
documentation.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using more tha one index per table

2010-07-22 Thread Craig James

On 7/21/10 6:47 PM, Greg Smith wrote:

Craig James wrote:

By using "current" and encouraging people to link to that, we could
quickly change the Google pagerank so that a search for Postgres would
turn up the most-recent version of documentation.


How do you propose to encourage people to do that? If I had a good
answer to that question, I'd already be executing on it. I've made a
habit of doing that when writing articles on the wiki, which hopefully
themselves become popular and then elevate those links (all of the ones
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for
example point to current). I don't know how to target "people who link
to the PostgreSQL manual" beyond raising awareness of the issue
periodically on these lists, like I did on this thread.


You don't have to get everyone to do it.  Just get more people to link to 
"current" than to other versions and you win in the Google ranking.

Start by sending an announcement to every PG mailing list.  You'd probably get 
a couple thousand right away, which by itself might do the trick.

Once "current" reaches the top of the Google ranking, it will cascade: People searching 
for Postgres documentation will find "current" first, and will post links to it, which 
will further reinforce its popularity.

There will always be people who link to older versions, but since the versions change 
frequently and "current" lasts forever, its ranking will constantly build until 
it ultimately wins.

There's no downside to it.  It's easy to do.  The other ideas (like putting "out of date" 
disclaimers and such into older versions) might also be useful, but might be a lot of work for just 
a little more gain.  Creating a "current" link is simple and in the long run will be very 
effective.  The sooner it starts, the sooner it will gain traction.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using more tha one index per table

2010-07-23 Thread Craig James

On 7/23/10 2:22 AM, Torsten Zühlsdorff wrote:

Craig James wrote:


A useful trick to know is that if you replace the version number
with "current", you'll get to the latest version most of the time
(sometimes the name of the page is changed between versions, too, but
this isn't that frequent).


The docs pages could perhaps benefit from an auto-generated note saying:

"The current version of Pg is 8.4. This documentation is for version
8.2. Click [here] for documentation on the current version."

... or something to that effect. It'd be a nice (and more user-friendly)
alternative to url twiddling when searches reveal docs for an old
version, and might help push the /current/ pages up in search rank too.


In addition, why not use symlinks so that the current version is
simply called "current", as in

http://www.postgresql.org/docs/current/static/sql-insert.html

If you google for "postgres insert", you get this:

http://www.postgresql.org/docs/8.1/static/sql-insert.html

The problem is that Google ranks pages based on inbound links, so
older versions of Postgres *always* come up before the latest version
in page ranking.


Since 2009 you can deal with this by defining the canonical-version.
(http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html)


This is a really cool feature, but it's not what we need.  The "canonical" refers to the URL, not 
the web page.  It's only supposed to be used if you have multiple URLs that are actually the *same* page; the 
"canonical" URL tells Google "use only this URL for this page."

But in our case, the Postgres manuals for each release have different URLs *and* 
different content, so the "canonical URL" isn't the right solution.

Craig


Greetings from Germany,
Torsten




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using more tha one index per table

2010-07-24 Thread Craig James

On 7/24/10 5:57 AM, Torsten Zühlsdorff wrote:

Craig James wrote:


The problem is that Google ranks pages based on inbound links, so
older versions of Postgres *always* come up before the latest version
in page ranking.


Since 2009 you can deal with this by defining the canonical-version.
(http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html)



This is a really cool feature, but it's not what we need. The
"canonical" refers to the URL, not the web page. It's only supposed to
be used if you have multiple URLs that are actually the *same* page;
the "canonical" URL tells Google "use only this URL for this page."

But in our case, the Postgres manuals for each release have different
URLs *and* different content, so the "canonical URL" isn't the right
solution.


This is true, but the content is allowed to change "a little". Of course
there is no official percentage of allowed change, but it can be quite a lot.
I've used this feature for some clients who push their content into
very different websites, and it does work.
Most of the content of the documentation doesn't change much between
releases. In most cases the canonical will work the way I suggest.

In the case of big changes even the recommendation of using a "current"
version won't work. It's true that Google ranks pages based on inbound
links, but there are more than 200 other factors which influence the
rankings. Most people do not know that changing most of a site's content
makes the inbound links useless for a long time. After big changes in
the documentation the "current" entry will be dropped for some months
and the old entries will appear. But note that every single page of the
documentation is ranked on its own. From my experience I would expect
the canonical version to give better results than the current version.

But the canonical is not the best solution in my opinion. I often edit
the URLs of some documentation pages, because I need them for a specific
PostgreSQL version. The documentation clearly misses a version switch.
Combined with a big note that the currently displayed documentation is
not the one for the current PostgreSQL version, this would be the best
compromise in my opinion.


Here's an idea: Use a "current" URL, plus a JavaScript embedded in every page that compares its own 
URL to the "current" URL and, if it doesn't match, does a "document.write()" indicating 
how to find the most-current version.

That would solve three problems:

  1. There would be a "current" version that people could link to.
  2. If someone found an old version, they would know it and could
 instantly be directed to the current version.
  3. It wouldn't be any burden on the web site maintainers, because
 the JavaScript wouldn't have to be changed.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Two fast searches turn slow when used with OR clause

2010-08-05 Thread Craig James

I can query either my PARENT table joined to PRICES, or my VERSION table joined 
to PRICES, and get an answer in 30-40 msec.  But put the two together, it jumps 
to 4 seconds.  What am I missing here?  I figured this query would be nearly 
instantaneous.  The VERSION.ISOSMILES and PARENT.ISOSMILES columns both have 
unique indexes.  Instead of using these indexes, it's doing a full-table scan 
of both tables, even though there can't possibly be more than one match in each 
table.

I guess I could rewrite this as a UNION of the two subqueries, but that seems 
contrived.
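
For what it's worth, the UNION form would look roughly like this (built from
the same joins as the query below); each arm can then use its own unique
isosmiles index:

   SELECT p.price, p.amount, p.units, s.catalogue_id, vn.version_id
     FROM plus p
     JOIN sample s   ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
     JOIN version vn ON (s.version_id = vn.version_id)
    WHERE vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
   UNION
   SELECT p.price, p.amount, p.units, s.catalogue_id, vn.version_id
     FROM plus p
     JOIN sample s   ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
     JOIN version vn ON (s.version_id = vn.version_id)
     JOIN parent pn  ON (s.parent_id = pn.parent_id)
    WHERE pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
   ORDER BY price;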

This is PG 8.3.10 on Linux.

Thanks,
Craig


=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, 
vn.version_id
-> from plus p join sample s
->  on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
-> join version vn on (s.version_id = vn.version_id) join parent pn
->  on (s.parent_id = pn.parent_id)
-> where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
-> or pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
-> order by price;

 Sort  (cost=71922.00..71922.00 rows=1 width=19) (actual 
time=4337.114..4337.122 rows=10 loops=1)
   Sort Key: p.price   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=18407.53..71921.99 rows=1 width=19) (actual 
time=1122.685..4337.028 rows=10 loops=1)
 ->  Hash Join  (cost=18407.53..71903.71 rows=4 width=20) (actual 
time=1122.624..4336.682 rows=7 loops=1)
   Hash Cond: (s.version_id = vn.version_id)
   Join Filter: ((vn.isosmiles = 
'Fc1n1B1OC(C)(C)C(C)(C)O1'::text) OR (pn.isosmiles = 
'Fc1n1B1OC(C)(C)C(C)(C)O1'::text))
   ->  Hash Join  (cost=8807.15..44470.73 rows=620264 width=54) 
(actual time=431.501..2541.329 rows=620264 loops=1)
 Hash Cond: (s.parent_id = pn.parent_id)
 ->  Seq Scan on sample s  (cost=0.00..21707.64 rows=620264 
width=24) (actual time=0.008..471.340 rows=620264 loops=1)
 ->  Hash  (cost=5335.40..5335.40 rows=277740 width=38) 
(actual time=431.166..431.166 rows=277740 loops=1)
   ->  Seq Scan on parent pn  (cost=0.00..5335.40 
rows=277740 width=38) (actual time=0.012..195.822 rows=277740 loops=1)
   ->  Hash  (cost=5884.06..5884.06 rows=297306 width=38) (actual 
time=467.267..467.267 rows=297306 loops=1)
 ->  Seq Scan on version vn  (cost=0.00..5884.06 
rows=297306 width=38) (actual time=0.017..215.285 rows=297306 loops=1)
 ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 
rows=4 width=26) (actual time=0.039..0.041 rows=1 loops=7)
   Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id 
= s.compound_id))
 Total runtime: 4344.222 ms
(17 rows)


If I only query the VERSION table, it's very fast:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, 
vn.version_id
-> from plus p
-> join sample s on (p.compound_id = s.compound_id and p.supplier_id = 
s.supplier_id)
-> join version vn on (s.version_id = vn.version_id)
-> where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' order by price;

Sort  (cost=45.73..45.74 rows=1 width=19) (actual time=32.438..32.448 rows=10 
loops=1)
   Sort Key: p.price
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.00..45.72 rows=1 width=19) (actual 
time=32.309..32.411 rows=10 loops=1)
 ->  Nested Loop  (cost=0.00..36.58 rows=2 width=20) (actual 
time=32.295..32.319 rows=7 loops=1)
   ->  Index Scan using i_version_isosmiles on version vn  
(cost=0.00..8.39 rows=1 width=4) (actual time=32.280..32.281 rows=1 loops=1)
 Index Cond: (isosmiles = 
'Fc1n1B1OC(C)(C)C(C)(C)O1'::text)
   ->  Index Scan using i_sample_version_id on sample s  
(cost=0.00..28.12 rows=6 width=20) (actual time=0.011..0.024 rows=7 loops=1)
 Index Cond: (s.version_id = vn.version_id)
 ->  Index Scan using i_plus_compound_id on plus p  (cost=0.00..4.51 
rows=4 width=26) (actual time=0.010..0.011 rows=1 loops=7)
   Index Cond: ((p.supplier_id = s.supplier_id) AND (p.compound_id 
= s.compound_id))
 Total runtime: 32.528 ms
(12 rows)


Same good performance if I only query the PARENT table:

x=> explain analyze select p.price, p.amount, p.units, s.catalogue_id, 
pn.parent_id from plus p join sample s on (p.compound_id = s.compound_id and 
p.supplier_id = s.supplier_id) join parent pn on (s.parent_id = pn.parent_id) 
where pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1' order by price;
QUERY PLAN
---
 Sort  (cost=57.73..57.74 rows=1 width=19) (actual time=43.564..43.564 rows=10 
loops=1)
   Sort Key: p.price
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.00..57.72 rows=1 width=19) (a

Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-05 Thread Craig James

On 8/5/10 11:28 AM, Kenneth Cox wrote:

I am using PostgreSQL 8.3.7 on a dedicated IBM 3660 with 24GB RAM
running CentOS 5.4 x86_64. I have a ServeRAID 8k controller with 6 SATA
7500RPM disks in RAID 6, and for the OLAP workload it feels* slow
 My current performance is 85MB/s write, 151 MB/s reads


I get 193MB/sec write and 450MB/sec read on a RAID10 on 8 SATA 7200 RPM disks.  
RAID10 seems to scale linearly -- add disks, get more speed, to the limit of 
your controller.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] write barrier question

2010-08-18 Thread Craig James

On 8/18/10 12:24 PM, Samuel Gendler wrote:

With barriers off, I saw a transaction rate of about 1200.  With
barriers on, it was closer to 1050.  The test had a concurrency of 40
in both cases.


I discovered there is roughly 10-20% "noise" in pgbench results after running 
the exact same test over a 24-hour period on a machine with no other activity.  Be sure 
you run your tests enough times to get good statistics unless you're looking at much 
larger differences.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using Between

2010-08-27 Thread Craig James

On 8/27/10 5:21 PM, Ozer, Pam wrote:

I have a query that

Select Distinct VehicleId

 From Vehicle

Where VehicleMileage between 0 and 15000.

I have an index on VehicleMileage. Is there another way to put an index on a 
between? The index is not being picked up. It does get picked up when I run

Select Distinct VehicleId

 From Vehicle

Where VehicleMileage = 15000.

I just want to make sure that there is not a special index I should be using.


You need to post EXPLAIN ANALYZE of your query. It could be that an index scan 
is actually not a good plan (for example, a sequential scan might be faster if 
most of your vehicles have low mileage).  Without the EXPLAIN ANALYZE, there's 
no way to say what's going on.
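
That is, run something like this in psql and post the full output:

   EXPLAIN ANALYZE
   SELECT DISTINCT VehicleId
     FROM Vehicle
    WHERE VehicleMileage BETWEEN 0 AND 15000;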

Did you ANALYZE your database after you loaded the data?

Craig


Thanks

*Pam Ozer*




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Held idle connections vs use of a Pooler

2010-09-14 Thread Craig James

On 9/14/10 9:10 AM, mark wrote:

Hello,

I am relatively new to postgres (just a few months) so apologies if
any of you are bearing with me.

I am trying to get a rough idea of the amount of bang for the buck I
might see if I put in a connection pooling service into the enviroment
vs our current methodology of using persistent open connections.

We have a number of in-house applications that connect to a central
Postgres instance (8.3.7). The box is admittedly underpowered, with
only 8 cores, 8GB of RAM, and not great disk IO out of an MSA-70.
The database is about 35GB on disk and does mainly (~95%) OLTP-type
queries. I am currently begging for more RAM.

Most of the connections from the various apps hold idle connections
until they need to execute a query; once done, they go back to holding an
open idle connection.  (there are ~600 open connections at any given
time, and most of the time most are idle)

This is typically fine while the number of active queries is low, but
some other application (that doesn't use connection pooling or hold
open connections when not in use) is hitting the db from time to time
with 50-100 small queries (2ms queries from my testing) nearly all at
once. When this happens, the whole response time goes out the door.


While connection pooling may be a good answer for you, there also appears to be a 
problem/bug in 8.3.x that may be biting you.  My installation is very similar to yours 
(hundreds of idle "lightweight" connections, occasional heavy use by certain 
apps).  Look at this thread:

  http://archives.postgresql.org/pgsql-performance/2010-04/msg00071.php

On the server that's been upgraded to 8.4.4, we're not seeing this problem.  
But it's not in full production yet, so I can't say for sure that the CPU 
spikes are gone.

(Unfortunately, the archives.postgresql.org HTML formatting is horrible -- why 
on Earth can't it wrap lines?)

Craig




I think from reading this list for a few weeks the answer is move to
using connection pooling package elsewhere to better manage incoming
connections, with a lower number to the db.

I am told this will require some re-working of some app code as I
understand pg-pool was tried a while back in our QA environment and
server parts of various in-house apps/scripts/..etc started to
experience show stopping problems.

to help make my case to the devs and various managers I was wondering
if someone could expand on what extra work is having to be done while
queries run and there is a high (500-600) number of open yet idle
connections to db. lots of the queries executed use sub-transactions
if that makes a difference.


basically what I am paying extra for with that many persistent
connections, that I might save if I go to the effort of getting the
in-house stuff to make use of a connection pooler ?


thank you for your time.

..: mark




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-11 Thread Craig James

On 10/9/10 6:47 PM, Scott Marlowe wrote:

On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel  wrote:

I know that there have been many discussions on the slowness of count(*) even
when an index is involved because the visibility of the rows has to be
checked. In the past I have seen many suggestions about using triggers and
tables to keep track of counts and while this works fine in a situation where
you know what the report is going to be ahead of time, this is simply not an
option when an unknown WHERE clause is to be used (dynamically generated).
I ran into a fine example of this when I was searching this mailing list,
"Searching in 856,646 pages took 13.48202 seconds. Site search powered by
PostgreSQL 8.3." Obviously at some point count(*) came into play here because
the site made a list of pages (1 2 3 4 5 6>  next). I very commonly make a
list of pages from search results, and the biggest time killer here is the
count(*) portion, even worse yet, I sometimes have to hit the database with
two SELECT statements, one with OFFSET and LIMIT to get the page of results I
need and another to get the amount of total rows so I can estimate how many
pages of results are available. The point I am driving at here is that since
building a list of pages of results is such a common thing to do, there need
to be some specific high speed ways to do this in one query. Maybe an
estimate(*) that works like count but gives an answer from the index without
checking visibility? I am sure that this would be good enough to make a page
list, it is really no big deal if it errors on the positive side, maybe the
list of pages has an extra page off the end. I can live with that. What I
can't live with is taking 13 seconds to get a page of results from 850,000
rows in a table.


99% of the time in the situations you don't need an exact measure, and
assuming analyze has run recently, select rel_tuples from pg_class for
a given table is more than close enough.  I'm sure wrapping that in a
simple estimated_rows() function would be easy enough to do.


First of all, it's not true.  There are plenty of applications that need an 
exact answer.  Second, even if it is only 1%, that means it's 1% of the 
queries, not 1% of people.  Sooner or later a large fraction of developers will 
run into this.  It's probably been the most-asked question I've seen on this 
forum in the four years I've been here.  It's a real problem, and it needs a 
real solution.

I know it's a hard problem to solve, but can we stop hinting that those of us 
who have this problem are somehow being dense?

Thanks,
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] UUID performance as primary key

2010-10-16 Thread Craig James

On 10/15/10 6:58 PM, Navkirat Singh wrote:

I am interested in finding out the pros/cons of using UUID as a
 primary key field. My requirement states that UUID would be perfect
 in my case as I will be having many small databases which will link
up to a global database using the UUID. Hence, the need for a unique
 key across all databases.


You left out one piece of information: How many keys per second do you need?

We put a sequence in the global database that all secondary databases use to 
get their IDs.  It means an extra connect/disconnect (a pooler can minimize 
this), so if you're issuing thousands of IDs per second, this isn't a good 
idea.  But for a small-ish number of IDs per second, it gets you the benefit of 
a universal ID without the size of the UUID field.
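
A minimal sketch of the idea, with made-up names:

   -- in the central/global database:
   CREATE SEQUENCE global_id_seq;

   -- each secondary database opens a short-lived connection to the central
   -- one and asks for the next value before inserting locally:
   SELECT nextval('global_id_seq');

One obvious refinement is to define the sequence with a large INCREMENT BY so
each call hands out a whole block of IDs and the extra round trip happens far
less often.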

Craig (the other one)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-24 Thread James Mansion

Kevin Grittner wrote:

On what do you base that assumption?  I assume that we send a full
8K to the OS cache, and the file system writes disk sectors
according to its own algorithm.  With either platters or BBU cache,
the data is persisted on fsync; why do you see a risk with one but
not the other?
  
Surely 'the data is persisted sometime after our write and before the
fsync returns, but may be written:
- in small chunks
- out of order
- in an unpredictable way'

When I looked at the internals of TokyoCabinet, for example, the design
was flawed but would be 'fairly robust' so long as mmap'd pages that were
dirtied did not get persisted until msync, and were then persisted
atomically.
until msync, and were then persisted atomically.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPUs for new databases

2010-10-26 Thread James Cloos
>>>>> "JB" == Josh Berkus  writes:

JB> In a general workload, fewer faster cores are better.  We do not scale
JB> perfectly across cores.  The only case where that's not true is
JB> maintaining lots of idle connections, and that's really better dealt
JB> with in software.

I've found that RAM speed is the most limiting factor I've run into for
those cases where the db fits in RAM.  The less efficient lookups run
just as fast when the CPU is in power-saving mode as in performance mode,
which implies that the cores are mostly waiting on RAM (cache or main).

I suspect cache size and ram speed will be the most important factors
until the point where disk i/o speed and capacity take over.

I'm sure some db applications run computationally expensive queries on
the server, but most queries seem light on computation and heavy on
gathering and comparing.

It can help to use recent versions of gcc with -march=native.  And
recent versions of glibc offer improved string ops on recent hardware.

-JimC
-- 
James Cloos  OpenPGP: 1024D/ED7DAEA6

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BBU Cache vs. spindles

2010-10-28 Thread James Mansion

Tom Lane wrote:

The other and probably worse problem is that there's no application
control over how soon changes to mmap'd pages get to disk.  An msync
will flush them out, but the kernel is free to write dirty pages sooner.
So if they're depending for consistency on writes not happening until
msync, it's broken by design.  (This is one of the big reasons we don't
use mmap'd space for Postgres disk buffers.)
  
Well, I agree that it sucks for the reason you give - but you use write,
and that's *exactly* the same in terms of when it gets written as when you
update a byte on an mmap'd page.  And you're quite happy to use write.

The only difference is that it's a lot more explicit where the point of
'maybe it's written and maybe it isn't' occurs.

There need be no real difference in the architecture for one over the
other: there does seem to be evidence that write and read can have better
forward-read and write-behind behaviour, because read/write does allow you
to initiate an IO with a hint to a size that exceeds a hardware page.

And yes, after getting into the details while starting to port TC to
Windows, I decided to bin it.  Especially handy that SQLite3 has WAL now.
(And one last dig - TC didn't even have a checksum that would let you tell
when it had been broken; but it might all be fixed now - of course, I don't
have time to check.)

James


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

