Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
> Personally, I've been unimpressed by Dell/Xeon; I think the Sun might do
> better than you think, comparatively. On all the Dell servers I've used so
> far, I've not seen performance that comes even close to the hardware specs.

It's true that any difference will be far less than the GHz ratio, and I can't really speak for Dell servers in general, but a pair of 2.4GHz Xeons in a Dell workstation gets about 23 SPECint_rate2000, and a pair of 1GHz UltraSparc IIIs in a SunFire V210 gets 10. The ratios are the same for other non-FP benchmarks.

Now the Suns do have some architectural advantages, and they used to have far superior memory bandwidth to Intel boxes; they often still do for more than 2 CPUs, and definitely do for more than four. But my personal experience is that for 4 CPUs or less the entry-level UNIX offerings from Sun/IBM/HP fell behind in raw performance (FP excepted) two or three years ago. The posh hardware's an entirely different matter, of course.

On the other hand, I can think of innumerable non-performance-related reasons to buy a 'real UNIX box' as a low-end DB server. CPU performance is way down the priority list compared with IO throughput, stability, manageability, support, etc etc.

Given that the original question was about a very heavily write-oriented environment, I'd take the Sun every day of the week, assuming that those compile option changes have sorted out the oddly slow PG performance at last.

M

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
> Indeed, if our Suns actually disabled the broken hardware when they
> died, fell over, and rebooted themselves, I'd certainly praise them
> to heaven. But I have to say that the really very good reporting of
> failing memory has saved me some headaches.

Ha! Yes, it would seem the obvious thing to do - but as you say, at least you get told what borked and may even be able to remove it without stopping the machine. Sometimes. Or at least you get a nice lunch from your Sun reseller.

> I should say, also, that my initial experience of AIX has been
> extremely good. I can't comment on the fun it might involve in the
> long haul, of course.

The current crop of POWER4+ boxen is reputed to even be able to recover from a failed CPU without a restart. Not *always*, one imagines, but usefully often enough for the banking mob to get sweaty over the feature. More importantly though, IBM seems committed to supporting all this goodness under Linux too (though not BSD I fear - sorry Bruce).

Now if these vendors could somehow eliminate downtime due to human error we'd be talking *serious* reliability.

M

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] RAID or manual split?
> It seems, that if I know the type and frequency of the queries a > database will be seeing, I could split the database by hand over > multiple disks and get better performance that I would with a RAID array > with similar hardware. Unlikely, but possible if you had radically different hardware for different tables. > Six large (3-7 Mrow) 'summary' tables, each being updated continuously > by 5-20 processes with about 0.5 transactions/second/process. Well you should get close to an order of magnitude better performance from a RAID controller with write-back cache on those queries. > Periodically (currently every two weeks), join queries are > performed between one of the 'summary' tables(same one each time) and > each of the other five. Each join touches most rows of both tables, > indexes aren't used. Results are written into a separate group of > 'inventory' tables (about 500 Krow each), one for each join. The more disks the data is spread over the better (the RAID controller will help here with striping). > There are frequent (100-1000/day) queries of both the > inventory and summary tables using the primary key -- always using the > index and returning < 10 rows. RAM is what you need, to cache the data and indexes, and then as much CPU power as you can get. > We're currently getting (barely) acceptable performance from a single > 15k U160 SCSI disk, but db size and activity are growing quickly. > I've got more disks and a battery-backed LSI card on order. 3 or more disks in a stripe set, with write back caching, will almost certainly give a huge performance boost. Try that first, and only if you have issues should you think about futzing with symlinks etc. M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance Bottleneck
> Squid also takes away the work of doing SSL (presuming you're running it > on a different machine). Unfortunately it doesn't support HTTP/1.1 which > means that most generated pages (those that don't set Content-length) end > up forcing squid to close and then reopen the connection to the web > server. It is true that it doesn't support http/1.1, but 'most generated pages'? Unless they are actually emitted progressively they should have a perfectly good content-length header. > I've also had some problems when Squid had a large number of connections > open (several thousand); though that may have been because of my > half_closed_clients setting. Squid 3 coped a lot better when I tried it > (quite a few months ago now - and using FreeBSD and the special kqueue > system call) but crashed under some (admittedly synthetic) conditions. It runs out of the box with a very conservative setting for max open file descriptors - this may or may not be the cause of the problems you have seen. Certainly I ran squid with >16,000 connections back in 1999... > You still have periods of time when the web servers are busy using their > CPUs to generate HTML rather than waiting for database queries. This is > especially true if you cache a lot of data somewhere on the web servers > themselves (which, in my experience, reduces the database load a great > deal). If you REALLY need to reduce the number of connections (because you > have a large number of web servers doing a lot of computation, say) then > it might still be useful. Aha, a postgres related topic in this thread! What you say is very true, but then given that the connection overhead is so vanishingly small, why not simply run without a persistent DB connection in this case? I would maintain that if your webservers are holding open idle DB connections for so long that it's a problem, then simply close the connections! M ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] PostgreSQL performance problem -> tuning
> Mem: 71M Active, 23M Inact, 72M Wired, 436K Cache, 48M Buf, 208M Free > Swap: 368M Total, 2852K Used, 366M Free > > It's right that I can figure that I can use 384M (total RAM) - 72M > (wired) - 48M (buf) = 264M for PostgreSQL. > Hence, if I set effective_cache_size to 24M (3072 8K blocks), > reasonable value (less than 240M, say 48M) for sort_mem, some value for > shared_buffers (i.e. 24M, or 6144 4K blocks (FreeBSD), or 3072 8K blocks > (PostgreSQL)), and rest of RAM 264M (total free with OS cache) - 24M > (reserved for OS cache) - 48M (sort) - 24M (shared) = 168M PostgreSQL > allocate dynamically by himself? Totally, utterly the wrong way around. Start with 384M, subtract whatever is in use by other processes, excepting kernel disk cache, subtract your PG shared buffers, subtract (PG proc size + PG sort mem)*(max number of PG processes you need to run - should be same as max_connections if thinking conservatively), leave some spare room so you can ssh in without swapping, and *the remainder* is what you should set effective_cache_size to. This is all in the docs. The key thing is: set effective_cache_size *last*. Note that Postgres assumes your OS is effective at caching disk blocks, so if that assumption is wrong you lose performance. Also, why on _earth_ would you need 48MB for sort memory? Are you seriously going to run a query that returns 48M of data and then sort it, on a machine with 384M of RAM? M ---(end of broadcast)--- TIP 8: explain analyze is your friend
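To make that ordering concrete, here is the same arithmetic done as a throwaway query (psql makes a fine calculator). Apart from the 384M of RAM and the 24M of shared buffers from the message above, the figures are assumptions pulled out of the air purely for illustration: roughly 40M for other processes, 32 backends, and 4M each for per-backend overhead and sort_mem.

-- Hypothetical numbers, for illustration only:
-- 384M RAM - 40M other processes - 24M shared_buffers
--   - 32 backends * (4M process + 4M sort_mem)
SELECT 384 - 40 - 24 - 32 * (4 + 4)               AS remainder_mb,
       (384 - 40 - 24 - 32 * (4 + 4)) * 1024 / 8  AS effective_cache_size_in_8k_blocks;
-- => 64MB left over, i.e. effective_cache_size = 8192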
[PERFORM] Hardware recommendations to scale to silly load
I'm wondering if the good people out there could perhaps give me some pointers on suitable hardware to solve an upcoming performance issue. I've never really dealt with these kinds of loads before, so any experience you guys have would be invaluable. Apologies in advance for the amount of info below... My app is likely to come under some serious load in the next 6 months, but the increase will be broadly predictable, so there is time to throw hardware at the problem. Currently I have a ~1GB DB, with the largest (and most commonly accessed and updated) two tables having 150,000 and 50,000 rows. A typical user interaction with the system involves about 15 single-table selects, 5 selects with joins or subqueries, 3 inserts, and 3 updates. The current hardware probably (based on benchmarking and profiling) tops out at about 300 inserts/updates *or* 2500 selects per second. There are multiple indexes on each table that updates & inserts happen on. These indexes are necessary to provide adequate select performance. Current hardware/software: Quad 700MHz PIII Xeon/1MB cache 3GB RAM RAID 10 over 4 18GB/10,000rpm drives 128MB battery backed controller cache with write-back enabled Redhat 7.3, kernel 2.4.20 Postgres 7.2.3 (stock redhat issue) I need to increase the overall performance by a factor of 10, while at the same time the DB size increases by a factor of 50. e.g. 3000 inserts/updates or 25,000 selects per second, over a 25GB database with most used tables of 5,000,000 and 1,000,000 rows. Notably, the data is very time-sensitive, so the active dataset at any hour is almost certainly going to be more on the order of 5GB than 25GB (plus I'll want all the indexes in RAM of course). Also, and importantly, the load comes but one hour per week, so buying a Starfire isn't a real option, as it'd just sit idle the rest of the time. I'm particularly interested in keeping the cost down, as I'm a shareholder in the company! So what do I need? Can anyone who has (or has ever had) that kind of load in production offer any pointers, anecdotes, etc? Any theoretical musings also more than welcome. Comments upon my sanity will be referred to my doctor. If the best price/performance option is a second hand 32-cpu Alpha running VMS I'd be happy to go that way ;-) Many thanks for reading this far. Matt ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Hardware recommendations to scale to silly load
> You probably, more than anything, should look at some kind of > superfast, external storage array Yeah, I think that's going to be a given. Low end EMC FibreChannel boxes can do around 20,000 IOs/sec, which is probably close to good enough. You mentioned using multiple RAID controllers as a boost - presumably the trick here is to split the various elements (WAL, tables, indexes) across different controllers using symlinks or suchlike? Can I feasibly split the DB tables across 5 or more controllers? > > Also, and importantly, the load comes but one hour per week, so buying a > > Starfire isn't a real option, as it'd just sit idle the rest of the > > time. I'm particularly interested in keeping the cost down, as I'm a > > shareholder in the company! > > Interesting. If you can't spread the load out, can you batch some parts > of it? Or is the whole thing interactive therefore needing to all be > done in real time at once? All interactive I'm afraid. It's a micropayment system that's going to be used here in the UK to do online voting for a popular TV programme. The phone voting system has a hard limit of [redacted] million votes per hour, and the producers would like to be able to tell people to vote online if the phone lines are busy. They can vote online anyway, but we expect the average viewer to have to make 10 calls just to get through during peak times, so the attraction is obvious. > whether you like it or not, you're gonna need heavy iron if you need to do > this all in one hour once a week. Yeah, I need to rent a Starfire for a month later this year, anybody got one lying around? Near London? > Actually, I've seen stuff like that going on Ebay pretty cheap lately. I > saw a 64 CPU E10k (366 MHz CPUs) with 64 gigs ram and 20 hard drives going > for $24,000 a month ago. Put Linux or BSD on it and Postgresql should > fly. Jeez, and I thought I was joking about the Starfire. Even Slowaris would be OK on one of them. The financial issue is that there's just not that much money in the micropayments game for bursty sales. If I was doing these loads *continuously* then I wouldn't be working, I'd be in the Maldives :-) I'm also looking at renting equipment, or even trying out IBM/HP's 'on-demand' offerings. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Hardware recommendations to scale to silly load
> Don't know how "cheap" they are.
>
> I have an app that does large batch updates. I found that if I dropped
> the indexes, did the updates and recreated the indexes, it was faster
> than doing the updates while the indexes were intact.

Yeah, unfortunately it's not batch work, but real-time financial work. If I drop all the indexes my select performance goes through the floor, as you'd expect.

> Does noatime make much difference on a PostgreSQL database? I haven't
> tested that yet.

Yup, it does. In fact it should probably be in the standard install documentation (unless someone has a reason why it shouldn't). Who *cares* when PG last looked at the tables? If 'nomtime' was available that would probably be a good thing too.

> Can you split it onto multiple boxes? Some database layouts lend themselves
> to this, others don't. Obviously you can't do joins from one server to
> another, so you may lose more in multiple queries than you gain by having
> multiple servers. It's worth looking into though.

I'm considering that. There are some tables which I might be able to split out. There may even be some things I can pull from the DB altogether (session info in particular, so long as I can reliably send a given user's requests to the same app server each time, bearing in mind I can't see the cookies too easily because 50% of the requests are over SSL).

> I know my answers aren't quite the ones you were looking for, but my
> experience is that many people try to solve poor application design
> by simply throwing bigger hardware at the problem. It appears as though
> you've already done your homework, though.

Well, I *hope* that's the case! The core issue is simply that we have to deal with an insane load for 1 hour a week, and there's just no avoiding it. Maybe I can get Sun/HP/IBM to lend some gear (it's a pretty high-profile site).

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware recommendations to scale to silly load
> Are you sure? Have you tested the overall application to see if possibly > you gain more on insert performance than you lose on select performanc? Unfortunately dropping any of the indexes results in much worse select performance that is not remotely clawed back by the improvement in insert performance. Actually there doesn't really seem to *be* that much improvement in insert performance when going from 3 indexes to 2. I guess indexes must be fairly cheap for PG to maintain? > It's possible that compiling Postgres manually with proper optimizations > could yield some improvements, as well as building a custom kernel in > Redhat. > > Also, you don't mention which filesystem you're using: > http://www.potentialtech.com/wmoran/postgresql.php Yeah, I can imagine getting 5% extra from a slim kernel and super-optimised PG. The FS is ext3, metadata journaling (the default), mounted noatime. > But if you're in the situation where you have more time than money, > you may find that an overall audit of your app is worthwhile. Consider > taking parts that are in perl (for example) and recoding them into C > (that is, unless you've already identified that all the bottlenecks are > at the PostgreSQL server) I can pretty cheaply add more CPU horsepower for the app servers, as they scale horizontally, so I can chuck in a couple (or 3, or 4, or ...) more dual-cpu boxen with a gig of ram and tell the load balancer about them. The problem with the DB is that that approach simply won't work - the box just has to get bigger! > I doubt if the suggestions I've made are going to get you 10x, but they > may get you 2x, and then you only need the hardware to do 5x. It all helps :-) A few percent here, a few percent there, pretty soon you're talking serious improvements... Thanks Matt ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware recommendations to scale to silly load
> Are you *sure* about that 3K updates/inserts per second xlates > to 10,800,000 per hour. That, my friend, is a WHOLE HECK OF A LOT! Yup, I know! > During the 1 hour surge, will SELECTs at 10 minutes after the > hour depend on INSERTs at 5 minutes after the hour? Yes, they do. It's a payments system, so things like account balances and purchase histories have to be updated in real time. > Only one hour out of 168? May I ask what kind of app it is? Online voting for an unnamed TV show... > > If the best price/performance option is a second hand 32-cpu Alpha > > running VMS I'd be happy to go that way ;-) > > I'd love to work on a GS320! You may even pick one up for a million > or 2. The license costs for VMS & Rdb would eat you, though. You'd be amazed how little they do go for actually :-) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Disappointing performance in db migrated from MS SQL
> Josh, the disks in the new system should be substantially faster than > the old. Both are Ultra160 SCSI RAID 5 arrays, but the new system has > 15k RPM disks, as opposed to the 10k RPM disks in the old system. Spindle speed does not correlate with 'throughput' in any easy way. What controllers are you using for these disks? ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] Prefetch
I wanted to get some opinions about row prefetching. AFAIK, there is no prefetching done by PostgreSQL; all prefetching is delegated to the operating system.

The hardware (can't say enough good things about it):

Athlon 64, dual channel
4GB ram
240GB usable 4 disk raid5 (ATA133)
Fedora Core 3
PostgreSQL 7.4.7

I have what is essentially a data warehouse of stock data. Each day has around 30,000 records (tickers). A typical operation is to get the 200 day simple moving average (of price) for each ticker and write the result to a summary table. In running this process (Perl/DBI), it is typical to see 70-80% I/O wait time with postgres running at about 8-9%. If I run the next day's date, the postgres cache and file cache is now populated with 199 days of the needed data, postgres runs 80-90% of CPU and total run time is greatly reduced. My conclusion is that this is a high cache hit rate in action.

I've done other things that make sense, like using indexes, playing with the planner constants and turning up the postgres cache buffers. Even playing with extreme hdparm read-ahead numbers (i.e. 64738) yields no apparent difference in database performance. The random nature of the I/O drops disk reads down to about 1MB/sec for the array. A linear table scan can easily yield 70-80MB/sec on this system. Total table size is usually around 1GB and with indexes should be able to fit completely in main memory.

Other databases like Oracle and DB2 implement some sort of row prefetch. Has there been serious consideration of implementing something like a prefetch subsystem? Does anyone have any opinions as to why this would be a bad idea for postgres?

Postgres is great for a multiuser environment and OLTP applications. However, in this set up, a data warehouse, the observed performance is not what I would hope for.

Regards,

Matt Olson
Ocean Consulting
http://www.oceanconsulting.com/

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Prefetch
My postgres binaries and WAL are on a separate disk from the raid array. The table I'm doing the selects from is probably about 4GB in size and 18-20 million records. No concurrent or dependent inserts or deletes are going on. Tom's point and your points about optimizing the application are well taken. I know my approach is sub optimal and prone to getting caught by latency issues (seek times, cache hit rates, etc.). However, the question of prefetch in my mind is all about eliminating latencies, so, I thought my problem would be good for the sake of discussing prefetching. The two approaches I'm in the process of testing are Rod and Greg's suggestion of using 'CLUSTER'. And for the sake of not letting a good idea get away, I'll probably spend time on doing a parallel query approach which Tom suggested. I'll report back to the list what I find and maybe do some _rough_ benchmarking. This is a production app, so I can't get too much in the way of the daily batches. -- Matt Olson Ocean Consulting http://www.oceanconsulting.com/ On Tuesday 10 May 2005 11:13 am, Greg Stark wrote: > Matt Olson writes: > > I've done other things that make sense, like using indexes, playing with > > the planner constants and turning up the postgres cache buffers. > > > > Even playing with extream hdparm read-ahead numbers (i.e. 64738) yields > > no apparent difference in database performance. The random nature of the > > I/O drops disk reads down to about 1MB/sec for the array. A linear table > > scan can easily yield 70-80MB/sec on this system. Total table size is > > usually around 1GB and with indexes should be able to fit completely in > > main memory. > > Actually forcing things to use indexes is the wrong direction to go if > you're trying to process lots of data and want to stream it off disk as > rapidly as possible. I would think about whether you can structure your > data such that you can use sequential scans. That might mean partitioning > your raw data into separate tables and then accessing only the partitions > that are relevant to the query. > > In your application that might be hard. It sounds like you would need more > or less one table per stock ticker which would really be hard to manage. > > One thing you might look into is using the CLUSTER command. But postgres > doesn't maintain the cluster ordering so it would require periodically > rerunning it. > > I'm a bit surprised by your 1MB/s rate. I would expect to see about 10MB/s > even for completely random reads. Is it possible you're seeing something > else interfering? Do you have INSERT/UPDATE/DELETE transactions happening > concurrently with this select scan? If so you should strongly look into > separating the transaction log from the data files. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
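For reference, a minimal sketch of the CLUSTER approach Greg suggests, with hypothetical table and index names (a table of daily prices with an index on ticker and day), using the 7.4/8.x-era syntax. Since the physical ordering is not maintained, it would have to be re-run after each daily batch.

CLUSTER daily_prices_ticker_day_idx ON daily_prices;  -- reorder the table by the index
ANALYZE daily_prices;                                 -- refresh planner statistics afterwards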
[PERFORM] pgsql upgrade
Hi all,

I've run into an issue with a Pg 7.4.6 to 8.1.5 upgrade along with a hardware upgrade. I moved the database from a 2x 3.0ghz Xeon (512kb w/HT) to a 2x Opteron 250. The database is in memory on a tmpfs partition. (the application can rebuild the db during total system failure)

When I first switched it over, the results were exactly what I expected. I was sustaining about a 2.2 on the Xeon and with the Opteron, about a 1.5 with the same traffic. The box is under high load; it outputs about 15mbps.

After a couple hours working perfectly, the 'system' (vs user) load jumped from 3% of total CPU usage to 30%, a 10x increase, and postgres started to write out to data/base at a fairly fast rate. The CPU context switch rate doubled at the same time. Iowait, which was historically 0 on the 7.4 box, went to 0.08. Strangely enough, a vacuum (not full or analyze) stopped postgres from writing to data/base, but the strange load pattern remains. (system is ~30% of the overall load, vs 3% before)

So, my question is, what happened, and how can I get it back to the same load pattern 7.4.6 had, and the same pattern I had for 4 hours before it went crazy?

Matt

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Sunfire X4500 recommendations
My company is purchasing a Sunfire x4500 to run our most I/O-bound databases, and I'd like to get some advice on configuration and tuning. We're currently looking at:

- Solaris 10 + zfs + RAID Z
- CentOS 4 + xfs + RAID 10
- CentOS 4 + ext3 + RAID 10

but we're open to other suggestions.

From previous message threads, it looks like some of you have achieved stellar performance under both Solaris 10 U2/U3 with zfs and CentOS 4.4 with xfs. Would those of you who posted such results please describe how you tuned the OS/fs to yield those figures (e.g. patches, special drivers, read-ahead, checksumming, write-through cache settings, etc.)?

Most of our servers currently run CentOS/RedHat, and we have little experience with Solaris, but we're not opposed to Solaris if there's a compelling reason to switch. For example, it sounds like zfs snapshots may have a lighter performance penalty than LVM snapshots. We've heard that just using LVM (even without active snapshots) imposes a maximum sequential I/O rate of around 600 MB/s (although we haven't yet reached this limit experimentally).

By the way, we've also heard that Solaris is "more stable" under heavy I/O load than Linux. Have any of you experienced this? It's hard to put much stock in such a blanket statement, but naturally we don't want to introduce instabilities.

Thanks in advance for your thoughts!

For reference: Our database cluster will be 3-6 TB in size. The Postgres installation will be 8.1 (at least initially), compiled to use 32 KB blocks (rather than 8 KB). The workload will be predominantly OLAP. The Sunfire X4500 has 2 dual-core Opterons, 16 GB RAM, 48 SATA disks (500 GB/disk * 48 = 24 TB raw -> 12 TB usable under RAID 10).

So far, we've seen the X4500 deliver impressive but suboptimal results using the out-of-the-box installation of Solaris + zfs. The Linux testing is in the early stages (no xfs, yet), but so far it yields comparatively modest write rates and very poor read and rewrite rates.
=== Results under Solaris with zfs: ===

Four concurrent writers:
% time dd if=/dev/zero of=/zpool1/test/50GB-zero1 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero2 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero3 bs=1024k count=51200 ; time sync
% time dd if=/dev/zero of=/zpool1/test/50GB-zero4 bs=1024k count=51200 ; time sync
Seq Write (bs = 1 MB): 128 + 122 + 131 + 124 = 505 MB/s

Four concurrent readers:
% time dd if=/zpool1/test/50GB-zero1 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero2 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero3 of=/dev/null bs=1024k
% time dd if=/zpool1/test/50GB-zero4 of=/dev/null bs=1024k
Seq Read (bs = 1 MB): 181 + 177 + 180 + 178 = 716 MB/s

One bonnie++ process:
% bonnie++ -r 16384 -s 32g:32k -f -n0 -d /zpool1/test/bonnie_scratch
Version 1.03, machine thumper1, size 32G:32k:
  Sequential Output (Block): 604173 K/sec, 98 %CP
  Rewrite:                   268893 K/sec, 43 %CP
  Sequential Input (Block):  543389 K/sec, 59 %CP
  Random Seeks:               519.2 /sec,   3 %CP
thumper1,32G:32k,,,604173,98,268893,43,,,543389,59,519.2,3,

4 concurrent synchronized bonnie++ processes:
% bonnie++ -p4
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -r 16384 -s 32g:32k -y -f -n0 -d /zpool1/test/bonnie_scratch
% bonnie++ -p-1
Combined results of 4 sessions:
  Seq Output:  124 + 124 + 124 + 140 = 512 MB/s
  Rewrite:      93 +  94 +  93 +  96 = 376 MB/s
  Seq Input:   192 + 194 + 193 + 197 = 776 MB/s
  Random Seek: 327 + 327 + 335 + 332 = 1321 seeks/s

=== Results under CentOS 4 with ext3 and LVM: ===

% bonnie++ -s 32g:32k -f -n0 -d /large_lvm_stripe/test/bonnie_scratch
Version 1.03, machine thumper1.rtkinternal, size 32G:32k:
  Sequential Output (Block): 346595 K/sec, 94 %CP
  Rewrite:                    59448 K/sec, 11 %CP
  Sequential Input (Block):  132471 K/sec, 12 %CP
  Random Seeks:               479.4 /sec,   2 %CP
thumper1.rtkinternal,32G:32k,,,346595,94,59448,11,,,132471,12,479.4,2,

Summary of bonnie++ results:

                              sequential   sequential   sequential   scattered
Test case                     write MB/s   rewrite MB/s  read MB/s   seeks/s
---------------------------   ----------   ------------  ---------   ---------
Sol10+zfs, 1 process                 604            269        543         519
Sol10+zfs, 4 processes               512            376        776        1321
CentOS4+ext3+LVM, 1 process          347             59        132         479
Re: [PERFORM] Sunfire X4500 recommendations
Thanks Dimitri! That was very educational material! I'm going to think out loud here, so please correct me if you see any errors.

The section on tuning for OLTP transactions was interesting, although my OLAP workload will be predominantly bulk I/O over large datasets of mostly-sequential blocks.

The NFS+ZFS section talked about the zil_disable control for making zfs ignore commits/fsyncs. Given that Postgres' executor does single-threaded synchronous I/O like the tar example, it seems like it might benefit significantly from setting zil_disable=1, at least in the case of frequently flushed/committed writes. However, zil_disable=1 sounds unsafe for the datafiles' filesystem, and would probably only be acceptable for the xlogs if they're stored on a separate filesystem and you're willing to lose recently committed transactions. This sounds pretty similar to just setting fsync=off in postgresql.conf, which is easier to change later, so I'll skip the zil_disable control.

The RAID-Z section was a little surprising. It made RAID-Z sound just like RAID 50, in that you can customize the trade-off between iops versus usable diskspace and fault-tolerance by adjusting the number/size of parity-protected disk groups. The only difference I noticed was that RAID-Z will apparently set the stripe size across vdevs (RAID-5s) to be as close as possible to the filesystem's block size, to maximize the number of disks involved in concurrently fetching each block. Does that sound about right?

So now I'm wondering what RAID-Z offers that RAID-50 doesn't. I came up with 2 things: an alleged affinity for full-stripe writes and (under RAID-Z2) the added fault-tolerance of RAID-6's 2nd parity bit (allowing 2 disks to fail per zpool). It wasn't mentioned in this blog, but I've heard that under certain circumstances, RAID-Z will magically decide to mirror a block instead of calculating parity on it. I'm not sure how this would happen, and I don't know the circumstances that would trigger this behavior, but I think the goal (if it really happens) is to avoid the performance penalty of having to read the rest of the stripe required to calculate parity. As far as I know, this is only an issue affecting small writes (e.g. single-row updates in an OLTP workload), but not large writes (compared to the RAID's stripe size). Anyway, when I saw the filesystem's intent log mentioned, I thought maybe the small writes are converted to full-stripe writes by deferring their commit until a full stripe's worth of data had been accumulated. Does that sound plausible?

Are there any other noteworthy perks to RAID-Z, rather than RAID-50? If not, I'm inclined to go with your suggestion, Dimitri, and use zfs like RAID-10 to stripe a zpool over a bunch of RAID-1 vdevs. Even though many of our queries do mostly sequential I/O, getting higher seeks/second is more important to us than the sacrificed diskspace.

For the record, those blogs also included a link to a very helpful ZFS Best Practices Guide: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

To sum up, so far the short list of tuning suggestions for ZFS includes:

- Use a separate zpool and filesystem for xlogs if your apps write often.
- Consider setting zil_disable=1 on the xlogs' dedicated filesystem. ZIL is the intent log, and it sounds like disabling it may be like disabling journaling. Previous message threads in the Postgres archives debate whether this is safe for the xlogs, but it didn't seem like a conclusive answer was reached.
- Make filesystem block size (zfs record size) match the Postgres block size.
- Manually adjust vdev_cache. I think this sets the read-ahead size. It defaults to 64 KB. For OLTP workload, reduce it; for DW/OLAP maybe increase it.
- Test various settings for vq_max_pending (until zfs can auto-tune it). See http://blogs.sun.com/erickustarz/entry/vq_max_pending
- A zpool of mirrored disks should support more seeks/second than RAID-Z, just like RAID 10 vs. RAID 50. However, no single Postgres backend will see better than a single disk's seek rate, because the executor currently dispatches only 1 logical I/O request at a time.

>>> Dimitri <[EMAIL PROTECTED]> 03/23/07 2:28 AM >>>
On Friday 23 March 2007 03:20, Matt Smiley wrote:
> My company is purchasing a Sunfire x4500 to run our most I/O-bound
> databases, and I'd like to get some advice on configuration and tuning.
> We're currently looking at:
> - Solaris 10 + zfs + RAID Z
> - CentOS 4 + xfs + RAID 10
> - CentOS 4 + ext3 + RAID 10
> but we're open to other suggestions.

Matt, for Solaris + ZFS you may find answers to all your questions here:

http://blogs.sun.com/roch/category/ZFS
http://blogs.sun.com/realneel/entry/zfs_and_databases

Think to
Re: [PERFORM] Sunfire X4500 recommendations
Hi Dimitri,

First of all, thanks again for the great feedback!

Yes, my I/O load is mostly read operations. There are some bulk writes done in the background periodically throughout the day, but these are not as time-sensitive. I'll have to do some testing to find the best balance of read vs. write speed and tolerance of disk failure vs. usable diskspace.

I'm looking forward to seeing the results of your OLTP tests! Good luck! Since I won't be doing that myself, it'll be all new to me.

About disk failure, I certainly agree that increasing the number of disks will decrease the average time between disk failures. Apart from any performance considerations, I wanted to get a clear idea of the risk of data loss under various RAID configurations. It's a handy reference, so I thought I'd share it:

The goal is to calculate the probability of data loss when we lose a certain number of disks within a short timespan (e.g. losing a 2nd disk before replacing+rebuilding the 1st one). For RAID 10, 50, and Z, we will lose data if any disk group (i.e. mirror or parity-group) loses 2 disks. For RAID 60 and Z2, we will lose data if 3 disks die in the same parity group. The parity groups can include arbitrarily many disks. Having larger groups gives us more usable diskspace but less protection. (Naturally we're more likely to lose 2 disks in a group of 50 than in a group of 5.)

g = number of disks in each group (e.g. mirroring = 2; single-parity = 3 or more; dual-parity = 4 or more)
n = total number of disks

risk of losing any 1 disk = 1/n
risk of losing 1 disk from a particular group = g/n
risk of losing 2 disks in the same group = g/n * (g-1)/(n-1)
risk of losing 3 disks in the same group = g/n * (g-1)/(n-1) * (g-2)/(n-2)

For the x4500, we have 48 disks. If we stripe our data across all those disks, then these are our configuration options:

RAID 10 or 50 -- Mirroring or single-parity must lose 2 disks from the same group to lose data:

disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48            24              0.09%
              3          16           48            32              0.27%
              4          12           48            36              0.53%
              6           8           48            40              1.33%
              8           6           48            42              2.48%
             12           4           48            44              5.85%
             24           2           48            46             24.47%
             48           1           48            47            100.00%

RAID 60 or Z2 -- Double-parity must lose 3 disks from the same group to lose data:

disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48           n/a                n/a
              3          16           48            16              0.01%
              4          12           48            24              0.02%
              6           8           48            32              0.12%
              8           6           48            36              0.32%
             12           4           48            40              1.27%
             24           2           48            44             11.70%
             48           1           48            46            100.00%

So, in terms of fault tolerance:
- RAID 60 and Z2 always beat RAID 10, since they never risk data loss when only 2 disks fail.
- RAID 10 always beats RAID 50 and Z, since it has the largest number of disk groups across which to spread the risk.
- Having more parity groups increases fault tolerance but decreases usable diskspace.

That's all assuming each disk has an equal chance of failure, which is probably true since striping should distribute the workload evenly. And again, these probabilities are only describing the case where we don't have enough time between disk failures to recover the array.

In terms of performance, I think RAID 10 should always be best for write speed. (Since it doesn't calculate parity, writing a new block doesn't require reading the rest of the RAID stripe just to recalculate the parity bits.) I think it's also normally just as fast for reading, since the controller can load-balance the pending read requests to both sides of each mirror.
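The tables above fall straight out of the stated formulas, so here is a quick query to regenerate them for the 48-disk case. It is purely a sanity check of the arithmetic, not a recommendation of any particular layout (for g = 2 the three-disk column is simply 0, i.e. the n/a rows above).

-- Risk that a 2nd (or 3rd) failed disk lands in an already-degraded group,
-- per the formulas above, for n = 48 disks and the group sizes in the tables.
SELECT g AS disks_per_group,
       48 / g AS num_groups,
       round(100.0 * g * (g - 1) / (48 * 47), 2)                AS pct_risk_2_disks_one_group,
       round(100.0 * g * (g - 1) * (g - 2) / (48 * 47 * 46), 2) AS pct_risk_3_disks_one_group
  FROM (SELECT 2 AS g UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 6
        UNION ALL SELECT 8 UNION ALL SELECT 12 UNION ALL SELECT 24 UNION ALL SELECT 48) AS sizes
 ORDER BY g;
-- e.g. g = 2 gives 0.09% / 0.00, g = 24 gives 24.47% / 11.70, matching the tables.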
---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Sunfire X4500 recommendations
Hi David,

Thanks for your feedback! I'm rather a newbie at this, and I do appreciate the critique.

First, let me correct myself: the formulas for the risk of losing data when you lose 2 and 3 disks shouldn't have included the first term (g/n). I'll give the corrected formulas and tables at the end of the email.

> please explain why you are saying that the risk of losing any 1 disk is
> 1/n. shouldn't it be probability of failure * n instead?

1/n represents the assumption that all disks have an equal probability of being the next one to fail. This seems like a fair assumption in general for the active members of a stripe (not including hot spares). A possible exception would be the parity disks (because reads always skip them and writes always hit them), but that's only a consideration if the RAID configuration used dedicated disks for parity instead of distributing it across the RAID 5/6 group members. Apart from that, whether the workload is write-heavy or read-heavy, sequential or scattered, the disks in the stripe ought to handle a roughly equivalent number of iops over their lifetime.

> following this logic the risk of losing all 48 disks in a single group of
> 48 would be 100%

Exactly. Putting all disks in one group is RAID 0 -- no data protection. If you lose even 1 active member of the stripe, the probability of losing your data is 100%.

> also what you are looking for is the probability of the second (and third)
> disks failing in time X (where X is the time necessary to notice the
> failure, get a replacement, and rebuild the disk)

Yep, that's exactly what I'm looking for. That's why I said, "these probabilities are only describing the case where we don't have enough time between disk failures to recover the array." My goal wasn't to estimate how long time X is. (It doesn't seem like a generalizable quantity; due partly to logistical and human factors, it's unique to each operating environment.) Instead, I start with the assumption that time X has been exceeded, and we've lost a 2nd (or 3rd) disk in the array. Given that assumption, I wanted to show the probability that the loss of the 2nd disk has caused the stripe to become unrecoverable. We know that RAID 10 and 50 can tolerate the loss of anywhere between 1 and n/g disks, depending on how lucky you are. I wanted to quantify the amount of luck required, as a risk management tool. The duration of time X can be minimized with hot spares and attentive administrators, but the risk after exceeding time X can only be minimized (as far as I know) by configuring the RAID stripe with small enough underlying failure groups.

> the killer is the time needed to rebuild the disk, with multi-TB arrays
> it's sometimes faster to re-initialize the array and reload from backup
> than it is to do a live rebuild (the kernel.org servers had a raid failure
> recently and HPA mentioned that it took a week to rebuild the array, but
> it would have only taken a couple days to do a restore from backup)

That's very interesting. I guess the rebuild time also would depend on how large the damaged failure group was. Under RAID 10, for example, I think you'd still only have to rebuild 1 disk from its mirror, regardless of how many other disks were in the stripe, right? So shortening the rebuild time may be another good motivation to keep the failure groups small.
> add to this the fact that disk failures do not appear to be truely > independant from each other statisticly (see the recent studies released > by google and cmu), and I wouldn't bother with single-parity for a I don't think I've seen the studies you mentioned. Would you cite them please? This may not be typical of everyone's experience, but what I've seen during in-house load tests is an equal I/O rate for each disk in my stripe, using short-duration sampling intervals to avoid long-term averaging effects. This is what I expected to find, so I didn't delve deeper. Certainly it's true that some disks may be more heavily burdened than others for hours or days, but I wouldn't expect any bias from an application-driven access pattern to persist for a significant fraction of a disk's lifespan. The only influence I'd expect to bias the cumulative I/O handled by a disk over its entire life would be its role in the RAID configuration. Hot spares will have minimal wear-and-tear until they're activated. Dedicated parity disks will probably live longer than data disks, unless the workload is very heavily oriented towards small writes (e.g. logging). > multi-TB array. If the data is easy to recreate (including from backup) or > short lived (say a database of log data that cycles every month or so) I > would just do RAID-0 and plan on loosing the data on drive failure (this > assumes that you can afford the loss of service when this happens). if the > data is more important then I'd do dual-parity or more, along with a ho
[PERFORM] db performance/design question
I'm designing a system that will be doing over a million inserts/deletes on a single table every hour. Rather than using a single table, it is possible for me to partition the data into multiple tables if I wanted to, which would be nice because I can just truncate them when I don't need them. I could even use table spaces to split the IO load over multiple filers. The application does not require all this data be in the same table. The data is fairly temporary, it might last 5 seconds, it might last 2 days, but it will all be deleted eventually and different data will be created. Considering a single table would grow to 10mil+ rows at max, and this machine will sustain about 25mbps of insert/update/delete traffic 24/7 - 365, will I be saving much by partitioning data like that? -- -Matt <http://twiki.spimageworks.com/twiki/bin/view/Software/CueDevelopment>
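A minimal sketch of the manual partitioning described above; the table, column and tablespace names are invented for illustration, and the application itself would have to route each insert to the appropriate table.

-- One table per partition, so stale data can be dropped cheaply with TRUNCATE
CREATE TABLE work_items_p0 (id bigint PRIMARY KEY, payload text, created timestamptz);
CREATE TABLE work_items_p1 (id bigint PRIMARY KEY, payload text, created timestamptz);

-- Optionally spread the I/O by giving partitions their own tablespaces (names hypothetical):
-- CREATE TABLE work_items_p2 (id bigint PRIMARY KEY, payload text, created timestamptz) TABLESPACE filer2;

-- TRUNCATE reclaims the space immediately and leaves no dead rows for VACUUM,
-- unlike a mass DELETE:
TRUNCATE TABLE work_items_p0;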
[PERFORM] Slowing UPDATEs inside a transaction
Hi. I've only been using PostgreSQL properly for a week or so, so I apologise if this has been covered numerous times, however Google is producing nothing of use.

I'm trying to import a large amount of legacy data (billions of denormalised rows) into a pg database with a completely different schema, de-duplicating bits of it on-the-fly while maintaining a reference count. The procedures to do this have proven easy to write, however the speed is not pretty. I've spent some time breaking down the cause and it's come down to a simple UPDATE as evidenced below:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1);

CREATE OR REPLACE FUNCTION test() RETURNS int AS $$
DECLARE
    i int;
BEGIN
    FOR i IN 1..1 LOOP
        UPDATE foo SET b=b+1 WHERE a=1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

When run individually, this function produces the following timing:

Time: 1912.593 ms
Time: 1937.183 ms
Time: 1941.607 ms
Time: 1943.673 ms
Time: 1944.738 ms

However, when run inside a transaction (to simulate a function doing the same work) I get this:

START TRANSACTION
Time: 0.836 ms
Time: 1908.931 ms
Time: 5370.110 ms
Time: 8809.378 ms
Time: 12274.294 ms
Time: 15698.745 ms
Time: 19218.389 ms

There is no disk i/o and the postgresql process runs 100% cpu.

Server is amd64 FreeBSD 8-STABLE w/16GB RAM running postgresql 9.0.3 from packages.

Looking at the timing of real data (heavily grouped), it seems the speed of UPDATEs can vary dependent on how heavily updated a row is, so I set out to produce a test case:

CREATE TABLE foo (a int PRIMARY KEY, b int);
INSERT INTO foo VALUES (1,1),(2,1),(3,1),(4,1);

CREATE OR REPLACE FUNCTION test(int) RETURNS int AS $$
DECLARE
    i int;
BEGIN
    FOR i IN 1..1 LOOP
        UPDATE foo SET b=1 WHERE a=$1;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;

START TRANSACTION;
SELECT test(1);  Time: 1917.305 ms
SELECT test(2);  Time: 1926.758 ms
SELECT test(3);  Time: 1926.498 ms
SELECT test(1);  Time: 5376.691 ms
SELECT test(2);  Time: 5408.231 ms
SELECT test(3);  Time: 5403.654 ms
SELECT test(1);  Time: 8842.425 ms
SELECT test(4);  Time: 1925.954 ms
COMMIT;
START TRANSACTION;
SELECT test(1);  Time: 1911.153 ms

As you can see, the more an individual row is updated /within a transaction/, the slower it becomes for some reason. Unfortunately in my real-world case, I need to do many billions of these UPDATEs. Is there any way I can get around this without pulling my huge source table out of the database and feeding everything in line-at-a-time from outside the database?

Thanks.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slowing UPDATEs inside a transaction
Robert Haas wrote:
> Old row versions have to be kept around until they're no longer of
> interest to any still-running transaction.

Thanks for the explanation. Regarding the snippet above, why would the intermediate history of multiply-modified uncommitted rows be of interest to anything, or is the current behaviour simply "cheaper" overall in terms of cpu/developer time?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Scaling further up
Quoting Andrew Sullivan <[EMAIL PROTECTED]>:

> On Fri, Mar 12, 2004 at 06:25:48PM -0500, Anjan Dave wrote:
> > We upgraded from 8GB to 12GB RAM a month or so ago, but even in the
> > past, I've never seen the system exhaust on it's system cache (~6GB, in
> > 'top'), while it's swapping.
> >
> > Some one had mentioned why not have the entire DB in memory? How do I
> > configure that, for knowledge?
>
> You don't. It'll automatically be in memory if (a) you have enough
> memory, (b) you don't have anything else on the machine using the
> memory, and (c) it's been read at least one time.

This is the preferred method, but you could create a memory disk if running linux. This has several caveats, though.

1. You may have to recompile the kernel for support.
2. You must store the database on a hard drive partition during reboots.
3. Because of #2 this option is generally useful if you have static content that is loaded to the MD upon startup of the system.

You could have some fancy methodology of shutting down the system and then copying the data to a disk-based filesystem, but this is inherently bad since at any moment a power outage would erase any updates or changes. The option is there to start with all data in memory, but in general, this is probably not what you want. Just an FYI.

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
If it's going to be write intensive then the RAID controller will be the most important thing. A dual p3/500 with a write-back cache will smoke either of the boxes you mention using software RAID on write performance. As for the compute intensive side (complex joins & sorts etc), the Dell will most likely beat the Sun by some distance, although what the Sun lacks in CPU power it may make up a bit in memory bandwidth/latency. Matt > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of Subbiah, > Stalin > Sent: 23 March 2004 18:41 > To: 'Andrew Sullivan'; '[EMAIL PROTECTED]' > Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux > > > We are looking into Sun V210 (2 x 1 GHz cpu, 2 gig ram, 5.8Os) vs. Dell 1750 > (2 x 2.4 GHz xeon, 2 gig ram, RH3.0). database will mostly be > write intensive and disks will be on raid 10. Wondering if 64bit 1 GHz to > 32bit 2.4 GHz make a big difference here. > > Thanks! > > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of Andrew > Sullivan > Sent: Tuesday, March 23, 2004 9:37 AM > To: '[EMAIL PROTECTED]' > Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux > > > On Mon, Mar 22, 2004 at 04:05:45PM -0800, Subbiah, Stalin wrote: > > being the key performance booster for postgres. what is the preferred OS > > for postgres deployment if given an option between linux and solaris. As > > One thing this very much depends on is what you're trying to do. > Suns have a reputation for greater reliability. While my own > experience with Sun hardware has been rather shy of sterling, I _can_ > say that it stands head and shoulders above a lot of the x86 gear you > can get. > > If you're planning to use Solaris on x86, don't bother. Solaris is a > slow, bloated pig compared to Linux, at least when it comes to > managing the largish number of processes that Postgres requires. > > If pure speed is what you're after, I have found that 2-way, 32 bit > Linux on P-IIIs compares very favourably to 4 way 64 bit Ultra SPARC > IIs. > > A > > -- > Andrew Sullivan | [EMAIL PROTECTED] > The fact that technology doesn't work is no bar to success in the > marketplace. > --Philip Greenspun > > ---(end of broadcast)--- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) > > ---(end of broadcast)--- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
> > Now if these vendors could somehow eliminate downtime due to human error > > we'd be talking *serious* reliablity. > > You mean making the OS smart enough to know when clearing the arp > cache is a bonehead operation, or just making the hardware smart > enough to realise that the keyswitch really shouldn't be turned > while 40 people are logged in? (Either way, I agree this'd be an > improvement. It'd sure make colocation a lot less painful.) Well I was joking really, but those are two very good examples! Yes, machines should require extra confirmation for operations like those. Hell, even a simple 'init 0' would be well served by a prompt that says "There are currently 400 network sockets open, 50 remote users logged in, and 25 disk IOs per second. What's more, there's nobody logged in at the console to boot me up again afterwards - are you _sure_ you want to shut the machine down?". It's also crazy that there's no prompt after an 'rm -rf' (we could have 'rm -rf --iacceptfullresponsibility' for an unprompted version). Stuff like that would have saved me from a few embarrassments in the past for sure ;-) It drives me absolutely nuts every time I see a $staggeringly_expensive clustered server whose sysadmins are scared to do a failover test in case something goes wrong! Or which has worse uptime than my desktop PC because the cluster software's poorly set up or administered. Or which has both machines on the same circuit breaker. I could go on but it's depressing me. Favourite anecdote: A project manager friend of mine had a new 'lights out' datacenter to set up. The engineers, admins and operators swore blind that everything had been tested in every possible way, and that incredible uptime was guaranteed. 'So if I just pull this disk out everything will keep working?' he asked, and then pulled the disk out without waiting for an answer... Ever since he told me that story I've done exactly that with every piece of so-called 'redundant' hardware a vendor tries to flog me. Ask them to set it up, then just do nasty things to it without asking for permission. Less than half the gear makes it through that filter, and actually you can almost tell from the look on the technical sales rep's face as you reach for the drive/cable/card/whatever whether it will or won't. M ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Wierd context-switching issue on Xeon
As a cross-ref to all the 7.4.x tests people have sent in, here's 7.2.3 (Redhat 7.3), Quad Xeon 700MHz/1MB L2 cache, 3GB RAM. Idle-ish (it's a production server) cs/sec ~5000. 3 test queries running:
   procs                        memory      swap        io     system          cpu
 r  b  w   swpd    free    buff    cache   si  so  bi  bo   in     cs   us sy id
 3  0  0  23380  577680  105912  2145140    0   0   0   0  107 116890   50 14 35
 2  0  0  23380  577680  105912  2145140    0   0   0   0  114 118583   50 15 34
 2  0  0  23380  577680  105912  2145140    0   0   0   0  107 115842   54 14 32
 2  1  0  23380  577680  105920  2145140    0   0   0  32  156 117549   50 16 35
HTH Matt > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of Tom Lane > Sent: 20 April 2004 01:02 > To: [EMAIL PROTECTED] > Cc: Joe Conway; scott.marlowe; Bruce Momjian; [EMAIL PROTECTED]; > [EMAIL PROTECTED]; Neil Conway > Subject: Re: [PERFORM] Wierd context-switching issue on Xeon > > > Here is a test case. To set up, run the "test_setup.sql" script once; > then launch two copies of the "test_run.sql" script. (For those of > you with more than two CPUs, see whether you need one per CPU to make > trouble, or whether two test_runs are enough.) Check that you get a > nestloops-with-index-scans plan shown by the EXPLAIN in test_run. > > In isolation, test_run.sql should do essentially no syscalls at all once > it's past the initial ramp-up. On a machine that's functioning per > expectations, multiple copies of test_run show a relatively low rate of > semop() calls --- a few per second, at most --- and maybe a delaying > select() here and there. > > What I actually see on Josh's client's machine is a context swap storm: > "vmstat 1" shows CS rates around 170K/sec. strace'ing the backends > shows a corresponding rate of semop() syscalls, with a few delaying > select()s sprinkled in. top(1) shows system CPU percent of 25-30 > and idle CPU percent of 16-20. > > I haven't bothered to check how long the test_run query takes, but if it > ends while you're still examining the behavior, just start it again. > > Note the test case assumes you've got shared_buffers set to at least > 1000; with smaller values, you may get some I/O syscalls, which will > probably skew the results. > > regards, tom lane > > ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Postgres over Linux NBD or NFS
How about iSCSI? This is exactly what it's for - presenting a bunch of remote SCSI hardware as if it were local. There are several reference implementations on SourceForge from Intel, Cisco & others. I've never tried it myself, but I would if I had the need. And let's face it there are some very big players selling very pricey kit that uses it, so you should have pretty high confidence that the fundamentals are strong. M > The other goal is to be able to stick LOTS of disk into one > box, and dole it out to multiple servers. It's more > expensive to set up and manage 3 RAID arrays than it is to > set up and manage just 1, because you have to manage 3 sets > of disk hardware rather than 1. [snip] > The "poor man's approach" involves trying to fake this by > building a "disk box" running Linux that exports the storage > either as a filesystem (using NFS) or as disk blocks (NBD). > NFS clearly doesn't provide the filesystem semantics needed > to get decent reliability; with NBD, it's not clear what happens :-(. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Swapping in 7.4.3
> This is normal. My personal workstation has been up for 16 > days, and it shows 65 megs used for swap. The linux kernel > looks for things that haven't been accessed in quite a while > and tosses them into swap to free up the memory for other uses. > > This isn't PostgreSQL's fault, or anything elses. It's how a > typical Unix kernel works. I.e. you're seeing a problem that > simply isn't there. Actually it (and other OSes) does slightly better than that. It _copies_ the least recently used pages into swap, but leaves them in memory. Then when there really is a need to swap stuff out there is no need to actually write to swap because it's already been done, and conversely if those pages are wanted then they don't have to be read from disk because they were never removed from memory. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Performance Bottleneck
> And this is exactly where the pgpool advantage lies. > Especially with the > TPC-W, the Apache is serving a mix of PHP (or whatever CGI > technique is > used) and static content like images. Since the 200+ Apache > kids serve > any of that content by random and the emulated browsers very much > encourage it to ramp up MaxClients children by using up to 4 > concurrent > image connections, one does end up with MaxClients DB > connections that > are all relatively low frequently used. In contrast to that the real > pgpool causes lesser, more active DB connections, which is better for > performance. There are two well-worn and very mature techniques for dealing with the issue of web apps using one DB connection per apache process, both of which work extremely well and attack the issue at its source.
1) Use a front-end caching proxy like Squid as an accelerator. Static content will be served by the accelerator 99% of the time. Additionally, large pages can be served immediately to the accelerator by Apache, which can then go on to serve another request without waiting for the end user's dial-up connection to pull the data down. Massive speedup, fewer apache processes needed.
2) Serve static content off an entirely separate apache server from the dynamic content, using separate domains (e.g. 'static.foo.com').
Personally I favour number 1. Our last biggish peak saw 6000 open HTTP and HTTPS connections and only 200 apache children, all of them nice and busy, not hanging around on street corners looking bored. During quiet times Apache drops back to its configured minimum of 40 kids. Option 2 has the advantage that you can use a leaner build for the 'dynamic' apache server, but with RAM so plentiful these days that's a less useful property. Basically this puts the 'pooling' back in the stateless HTTP area where it truly belongs and can be proven not to have any peculiar side effects (especially when it comes to transaction safety). Even better, so long as you use URL parameters for searches and the like, you can have the accelerator cache those pages for a certain time too so long as slightly stale results are OK. I'm sure pgpool and the like have their place, but being band-aids for poorly configured websites probably isn't the best use for them. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] insert
> > It is likely that you are missing an index on one of those foreign > > key'd items. > > I don't think that is too likely as a foreign key reference > must be a unique key which would have an index. I think you must be thinking of primary keys, not foreign keys. All one-to-many relationships have non-unique foreign keys. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
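To make that concrete, a minimal sketch (table and column names are invented here): the referencing column on the "many" side is neither unique nor automatically indexed, so when inserts, deletes or joins involving FKs are slow it is usually this index that is missing.
CREATE TABLE orders (
    order_id serial PRIMARY KEY
);
CREATE TABLE order_lines (
    line_id  serial PRIMARY KEY,
    order_id integer NOT NULL REFERENCES orders (order_id)   -- non-unique foreign key
);
-- Not created automatically: without it, FK checks fired by UPDATE/DELETE
-- on orders, and joins against order_lines, have to scan the whole table.
CREATE INDEX order_lines_order_id ON order_lines (order_id);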
Re: [PERFORM] Disabling transaction/outdated-tuple behaviour
> Immediately after performing a vaccum, updates take upto 50 > milliseconds to occur, however the update performance > degrades over time, such that after a few hours of continuous > updates, each update takes about half a second. Regular > vacuuming improves the performance temporarily, but during > the vacuum operation (which takes upto 2 minutes), > performance of concurrent updates falls below an acceptable > level (sometimes > 2 seconds per update). You must be doing an enormous number of updates! You can vacuum as often as you like, and should usually do so at least as often as the time it takes for 'all' tuples to be updated. So, in your case, every 10 updates. OK, that seems unnecessary, how about every 100 updates? > According to the documentation, PostgreSQL keeps the old > versions of the tuples in case of use by other transactions > (i.e. each update is actually extending the table). I believe > this behaviour is what is causing my performance problem. Yes, it probably is. > Is there a way to disable this behaviour such that an update > operation would overwrite the current record and does not > generate an outdated tuple each time? (My application does > not need transactional support). No, I don't believe there is. If you really don't need transaction support then you might want to reconsider whether postgres is really the right tool. M ---(end of broadcast)--- TIP 8: explain analyze is your friend
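For illustration, a rough sketch of what "vacuum the hot table often" looks like in practice (the table name is invented; pick a frequency to match your update rate):
-- Run frequently (e.g. from cron) against the heavily-updated table only;
-- it marks dead tuples as reusable without locking out concurrent updates.
VACUUM ANALYZE hot_table;
-- During a quiet window, reclaim the accumulated space and shrink the file
-- (this one takes an exclusive lock):
VACUUM FULL ANALYZE hot_table;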
Re: [PERFORM] Table UPDATE is too slow
> >That looks like poor database normalization, really. Are you > sure you > >don't want to split this into multiple tables instead of having 62 > >columns? > > > No, it is properly normalized. The data in this table is stock > fundamentals, stuff like 52 week high, ex-dividend date, etc, etc. Hmm, the two examples you gave there are actually ripe for breaking out into another table. It's not quite 'normalisation', but if you have data that changes very rarely, why not group it into a separate table? You could have the highly volatile data in one table, the semi-volatile stuff in another, and the pretty static stuff in a third. Looked at another way, if you have sets of fields that tend to change together, group them into tables together. That way you will radically reduce the number of indexes that are affected by each update. But as someone else pointed out, you should at the very least wrap your updates in a big transaction. M ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
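A sketch of the sort of split I mean, with invented names - the static fundamentals in one table, the frequently-updated values in another, joined on the ticker symbol, and the updates batched into one transaction:
CREATE TABLE stock_static (
    symbol      varchar(10) PRIMARY KEY,
    ex_dividend date,
    high_52wk   numeric        -- rarely changes, so its indexes are rarely touched
);
CREATE TABLE stock_volatile (
    symbol       varchar(10) PRIMARY KEY REFERENCES stock_static (symbol),
    last_price   numeric,
    last_updated timestamptz   -- the columns that churn all day live here
);
BEGIN;
UPDATE stock_volatile SET last_price = 12.34, last_updated = now() WHERE symbol = 'ABC';
-- ...the rest of the batch...
COMMIT;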
Re: [PERFORM] Caching of Queries
> I've looked at PREPARE, but apparently it only lasts > per-session - that's worthless in our case (web based > service, one connection per data-requiring connection). That's a non-sequitur. Most 'normal' high volume web apps have persistent DB connections, one per http server process. Are you really dropping DB connections and reconnecting each time a new HTTP request comes in? M ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
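For reference, this is all PREPARE does, which is why it only pays off on a connection you keep around (statement, table and column names invented):
PREPARE get_user (integer) AS
    SELECT * FROM users WHERE user_id = $1;   -- planned once, here
EXECUTE get_user(42);                          -- reused for the life of this session only
EXECUTE get_user(43);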
Re: [PERFORM] IBM P-series machines
As for "vendor support" for Opteron, that sure looks like a trainwreck... If you're going through IBM, then they won't want to respond to any issues if you're not running a "bog-standard" RHAS/RHES release from Red Hat. And that, on Opteron, is preposterous, because there's plenty of the bits of Opteron support that only ever got put in Linux 2.6, whilst RHAT is still back in the 2.4 days. To be fair, they have backported a boatload of 2.6 features to their kernel: http://www.redhat.com/software/rhel/kernel26/ And that page certainly isn't an exhaustive list... M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Which plattform do you recommend I run PostgreSQL for
In the MySQL manual it says that MySQL performs best with Linux 2.4 with ReiserFS on x86. Can anyone official, or in the know, give similar information regarding PostgreSQL? I'm neither official, nor in the know, but I do have a spare moment! I can tell you that any *NIX variant on any modern hardware platform will give you good performance, except for Cygwin/x86. Any differences between OSes on the same hardware are completely swamped by far more direct concerns like IO systems, database design, OS tuning etc. Picking the OS you're most familiar with is usually a good recommendation (and not just for Postgres). ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Opteron vs RHAT
> >>trainwreck... If you're going through IBM, then they won't want to > >>respond to any issues if you're not running a > "bog-standard" RHAS/RHES > >>release from Red Hat. ...> To be fair, we keep on actually running into things that > _can't_ be backported, like fibrechannel drivers that were > written to take advantage of changes in the SCSI support in 2.6. I thought IBM had good support for SUSE? I don't know why I thought that... ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Select with qualified join condition / Batch inserts
> SELECT cmp.WELL_INDEX, cmp.COMPOUND, con.CONCENTRATION > FROM SCR_WELL_COMPOUND cmp, SCR_WELL_CONCENTRATION con > WHERE cmp.BARCODE=con.BARCODE > AND cmp.WELL_INDEX=con.WELL_INDEX > AND cmp.MAT_ID=con.MAT_ID > AND cmp.MAT_ID = 3 > AND cmp.BARCODE='910125864' > AND cmp.ID_LEVEL = 1; Quick guess - type mismatch forcing sequential scan. Try some quotes: AND cmp.MAT_ID = '3' AND cmp.BARCODE='910125864' AND cmp.ID_LEVEL = '1'; M ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
Basically you set a default in seconds for the HTML results to be cached, and then have triggers set that force the cache to regenerate (whenever CRUD happens to the content, for example). Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a believer out of me! Nice to have it in a library, but if you want to be that simplistic then it's easy in any language. What if a process on server B modifies an important value that server A has cached though? Coherency (albeit that the client may choose to not use it) is a must for a general solution. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
It might be easiest to shove the caching logic into pgpool instead. ... When pg_pool is told to cache a query, it can get a table list and monitor for changes. When it gets changes, simply dumps the cache. It's certainly the case that the typical web app (which, along with warehouses, seems to be one half of the needy apps), could probably do worse than use pooling as well. I'm not well up enough on pooling to know how bulletproof it is though, which is why I included it in my list of things that make me go 'hmm'. It would be really nice not to have to take both things together. More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use. Taking an existing production app and justifying a switch to an extra layer of pooling software is relatively hard compared with grabbing data from a view instead of a table (or setting a variable, or adding a tweak to a query, or however else it might be implemented). Eminently doable in pgpool though, and just the right thing for anyone already using it. M ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
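A hand-rolled sketch of the same idea - cache results in a table, invalidate on change - just to show the shape of it (all names invented, 7.4-style single-quoted function body):
CREATE TABLE sales_summary_cache (
    region    text PRIMARY KEY,
    total     numeric,
    cached_at timestamptz DEFAULT now()
);
CREATE OR REPLACE FUNCTION flush_sales_cache() RETURNS trigger AS '
BEGIN
    DELETE FROM sales_summary_cache;   -- any change to the base table empties the cache
    RETURN NULL;
END;' LANGUAGE plpgsql;
CREATE TRIGGER sales_cache_flush
    AFTER INSERT OR UPDATE OR DELETE ON sales
    FOR EACH STATEMENT EXECUTE PROCEDURE flush_sales_cache();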
Re: [PERFORM] Caching of Queries
More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use. How does putting it into PGPool make it any less trivial for people to use? The answers are at http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html . Specifically, it's a separate application that needs configuration, the homepage has no real discussion of the potential pitfalls of pooling and what this implementation does to get around them, you get the idea. I'm sure it's great software, but it doesn't come as part of the DB server, so 95% of people who would benefit from query caching being implemented in it never will. If it shipped with and was turned on by default in SUSE or RedHat that would be a different matter. Which I realise makes me look like one of those people who doesn't appreciate code unless it's 'popular', but I hope I'm not *that* bad... Oh OK, I'll say it, this is a perfect example of why My*** has so much more mindshare. It's not better, but it sure makes the average Joe _feel_ better. Sorry, I've got my corporate hat on today, I'm sure I'll feel a little less cynical tomorrow. M
Re: [PERFORM] Caching of Queries
Any competently written application where caching results would be a suitable performance boost can already implement application or middleware caching fairly easily, and increase performance much more than putting result caching into the database would. I guess the performance increase is that you can spend $10,000 on a developer, or $10,000 on hardware, and for the most part get a more reliable result the second way. MemcacheD is fine(ish), but it's not a panacea, and it's more than easy to shoot yourself in the foot with it. Caching is hard enough that lots of people do it badly - I'd rather use an implementation from the PG team than almost anywhere else. I don't see caching results in the database as much of a win for most well written applications. Toy benchmarks, sure, but for real apps it seems it would add a lot of complexity, and violate the whole point of using an ACID database. Well the point surely is to _remove_ complexity from the application, which is written by God Knows Who, and put it in the DB, which is written by God And You. And you can still have ACID (cached data is not the same as stale data, although once you have the former, the latter can begin to look tempting sometimes). M ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
If you're not using a connection pool of some kind then you might as well forget query plan caching, because your connect overhead will swamp the planning cost. This does not mean you have to use something like pgpool (which makes some rather questionable claims IMO); any decent web application language/environment will support connection pooling. Hmm, a question of definition - there's a difference between a pool and a persistent connection. Pretty much all web apps have one connection per process, which is persistent (i.e. not dropped and remade for each request), but not shared between processes, therefore not pooled. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
> OK, that'd work too... the point is if you're re-connecting > all the time it doesn't really matter what else you do for > performance. Yeah, although there is the chap who was asking questions on the list recently who had some very long-running code on his app servers, so was best off closing the connection because he had far too many postmaster processes just sitting there idle all the time! But you're right, it's a killer usually. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
> I don't know what you are exactly referring to in above URL > when you are talking about "potential pitfalls of pooling". > Please explain more. Sorry, I wasn't implying that pgpool doesn't deal with the issues, just that some people aren't necessarily aware of them up front. For instance, pgpool does an 'abort transaction' and a 'reset all' in lieu of a full reconnect (of course, since a full reconnect is exactly what we are trying to avoid). Is this is enough to guarantee that a given pooled connection behaves exactly as a non-pooled connection would from a client perspective? For instance, temporary tables are usually dropped at the end of a session, so a client (badly coded perhaps) that does not already use persistent connections might be confused when the sequence 'connect, create temp table foo ..., disconnect, connect, create temp table foo ...' results in the error 'Relation 'foo' already exists'. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
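The kind of sequence I'm thinking of, which works fine on a throwaway connection but not necessarily on a recycled one (hypothetical table name):
CREATE TEMP TABLE foo (id integer);
-- client 'disconnects', but the pooled backend session stays alive...
-- client 'reconnects' and innocently runs its setup code again:
CREATE TEMP TABLE foo (id integer);
-- fails with something like: relation "foo" already exists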
[PERFORM] Speeding up this function
Hello, I've thought it would be nice to index certain aspects of my apache log files for analysis. I've used several different techniques and have something usable now, but I'd like to tweak it one step further. My first performance optimization was to change the log format into a CSV format. I processed the logfiles with PHP and plsql stored procedures. Unfortunately, it took more than 24 hours to process 1 day's worth of log files. I've now switched to using C# (using mono) to create hash-tables to do almost all of the pre-processing. This has brought the time down to about 3 hours. Actually, if I take out one step it brought the process down to about 6 minutes, which is a tremendous improvement. The one step that is adding 2.5+ hours to the job is not easily done in C#, as far as I know. Once the mostly-normalized data has been put into a table called usage_raw_access I then use this query:
insert into usage_access
select *, usage_normalize_session(accountid, client, atime) as sessionid
from usage_raw_access;
All it does is try to "link" pageviews together into a session. Here's the function:
create or replace function usage_normalize_session (varchar(12), inet, timestamptz)
returns integer as '
DECLARE
    -- $1 = Account ID, $2 = IP Address, $3 = Time
    RecordSet record;
BEGIN
    SELECT INTO RecordSet DISTINCT sessionid
    FROM usage_access ua
    WHERE ua.accountid = $1
      AND ua.client = $2
      AND ua.atime <= ($3 - ''20 min''::interval)::timestamptz;
    if found then
        return RecordSet.sessionid;
    end if;
    return nextval(''usage_session_ids'');
END;' language plpgsql;
And the table usage_access looks like this:
      Table "public.usage_access"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 [snip]
 client    | inet                     |
 atime     | timestamp with time zone |
 accountid | character varying(12)    |
 sessionid | integer                  |
Indexes: usage_acccess_req_url btree (req_url),
         usage_access_accountid btree (accountid),
         usage_access_atime btree (atime),
         usage_access_hostid btree (hostid),
         usage_access_sessionid btree (sessionid),
         usage_access_sessionlookup btree (accountid, client, atime);
As you can see, this looks for clients who have visited the same site within 20 min. If there is no match, a unique sessionid is assigned from a sequence. If there is a visit, the session id assigned to them is used. I'm only able to process about 25 records per second with my setup. My window to do this job is 3-4 hours and the shorter the better. 
Here is an explain analyze of the query I do (note I limited it to 1000):
EXPLAIN ANALYZE insert into usage_access
select *, usage_normalize_session(accountid, client, atime) as sessionid
from usage_raw_access limit 1000;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Subquery Scan "*SELECT*" (cost=0.00..20.00 rows=1000 width=196) (actual time=51.63..47634.22 rows=1000 loops=1)
   -> Limit (cost=0.00..20.00 rows=1000 width=196) (actual time=51.59..47610.23 rows=1000 loops=1)
         -> Seq Scan on usage_raw_access (cost=0.00..20.00 rows=1000 width=196) (actual time=51.58..47606.14 rows=1001 loops=1)
 Total runtime: 48980.54 msec
I also did an explain of the query that's performed inside the function:
EXPLAIN ANALYZE select sessionid from usage_access ua
where ua.accountid = 'XYZ'
  and ua.client = '64.68.88.45'::inet
  and ua.atime <= '2003-11-02 04:50:01-05'::timestamptz;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using usage_access_sessionlookup on usage_access ua (cost=0.00..6.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=1)
   Index Cond: ((accountid = 'XYZ'::character varying) AND (client = '64.68.88.45'::inet) AND (atime <= '2003-11-02 04:50:01-05'::timestamp with time zone))
 Total runtime: 0.35 msec
(3 rows)
What I'd really like to know is if someone knows a way to do any of the following:
a: Make the INSERT into ... SELECT *, usage_access_sessionlookup().. work faster
b: Make the usage_access_sessionlookup() smarter, better, etc.
c: Do this in C# using a hash-table or some other procedure that would be quicker.
d: Find an algorithm to create the sessionid without having to do any database or hash-table lookups. As the dataset gets bigger, it won't fit in RAM and the lookup queries will become I/O bound, drastically slowing things down.
d: is my first choice. For some reason I just can't seem to get my mind around the data. I wonder if there'
Re: [PERFORM] Speeding up this function
On Tue, 19 Oct 2004 15:49:45 -0400, Jeremy Dunn <[EMAIL PROTECTED]> wrote: > > -Original Message- > > From: [EMAIL PROTECTED] > > [mailto:[EMAIL PROTECTED] On Behalf Of > > Matt Nuzum > > Sent: Tuesday, October 19, 2004 3:35 PM > > To: pgsql-performance > > Subject: [PERFORM] Speeding up this function > > > > > This is probably a stupid question, but why are you trying to create > sessions after the fact? Since it appears that users of your site must > login, why not just assign a sessionID to them at login time, and keep > it in the URL for the duration of the session? Then it would be easy to > track where they've been. > > - Jeremy > > You don't have to log in to visit the sites. These log files are actually for many domains. Right now, we do logging with a web-bug and it does handle the sessions, but it relies on javascript and we want to track a lot more than we are now. Plus, that code is in JavaScript and one of our primary motiviations is to ditch MySQL completely. -- Matthew Nuzum | Makers of "Elite Content Management System" www.followers.net | View samples of Elite CMS in action [EMAIL PROTECTED] | http://www.followers.net/portfolio/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] OS desicion
You are asking the wrong question. The best OS is the OS you (and/or the customer) know and can administer competently. The real performance differences between unices are so small as to be ignorable in this context. The context switching bug is not OS-dependent, but varies in severity across machine architectures (I understand it to be mostly P4/Athlon related, but don't take my word for it). M Tom Fischer wrote: Hi List, I have a Dual-Xeon 3Ghz System with GB RAM and an Adaptec 212ß SCSI RAID with 4 SCA Harddiscs. Our customer wants to have the Machine tuned for best Database performance. Which OS should we used? We are tending between Linux 2.6 or FreeBSD. The Database Size is 5GB and ascending. Most SQL-Queries are Selects, the Tablesizes are beetween 300k and up to 10 MB. I've read the Hardware Performance Guide and the result was to take FreeBSD in the Decision too :) And what is on this Context Switiching Bug i have read in the Archive? Hope you can help me Regards Tom ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] how much mem to give postgres?
How would I turn that off? In the kernel config? Not too familiar with that. I have a 2 proc xeon with 4 gigs of mem on the way for postgres, so I hope HT isn't a problem. If HT is turned off, does it just not use the other "half" of the processor? Or does the processor just work as one unit? You turn it off in the BIOS. There is no 'other half', the processor is just pretending to have two cores by shuffling registers around, which gives maybe a 5-10% performance gain in certain multithreaded situations. A hack to overcome marchitectural limitations due to the overly long pipeline in the Prescott core. Really of most use for desktop interactivity rather than actual throughput. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] OS desicion
The real performance differences between unices are so small as to be ignorable in this context. Well, at least the difference between Linux and BSD. There are substantial tradeoffs should you choose to use Solaris or UnixWare. Yes, quite right, I should have said 'popular x86-based unices'.
Re: [PERFORM] how much mem to give postgres?
Hyperthreading is actually an excellent architectural feature that can give significant performance gains when implemented well and used for an appropriate workload under a decently HT aware OS. IMO, typical RDBMS streams are not an obviously appropriate workload, Intel didn't implement it particularly well and I don't think there are any OSes that support it particularly well. But don't write off using it in the future, when it's been improved at both the OS and the silicon levels. You are quite right of course - unfortunately the current Intel implementation meets nearly none of these criteria! As Rod Taylor pointed out off-list, IBM's SMT implementation on the Power5 is vastly superior. Though he's also just told me that Sun is beating IBM on price/performance for his workload, so who knows how reliable a chap he is... ;-) M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Anything to be gained from a 'Postgres Filesystem'?
I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct IO management into PG, but now it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3, that could be made that would help PG out. I'm thinking along the lines of an FS that's aware of PG's strategies and requirements and therefore optimised to make those activities as efficient as possible - possibly even being aware of PG's disk layout and treating files differently on that basis. Not being an FS guru I'm not really clear on whether this would help much (enough to be worth it anyway) or not - any thoughts? And if there were useful gains to be had, would it need a whole new FS or could an existing one be modified? So there might be (as I said, I'm not an FS guru...):
* great append performance for the WAL?
* optimised scattered writes for checkpointing?
* Knowledge that FSYNC is being used for preserving ordering a lot of the time, rather than requiring actual writes to disk (so long as the writes eventually happen in order...)?
Matt
Matt Clark
Ymogen Ltd
P: 0845 130 4531
W: https://ymogen.net/
M: 0774 870 1584
---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
> Looking at that list, I got the feeling that you'd want to > push that PG-awareness down into the block-io layer as well, > then, so as to be able to optimise for (perhaps) conflicting > goals depending on what the app does; for the IO system to be > able to read the apps mind it needs to have some knowledge of > what the app is / needs / wants and I get the impression that > this awareness needs to go deeper than the FS only. That's a fair point, it would need to be a kernel patch really, although not necessarily a very big one, more a case of looking at FDs and if they're flagged in some way then get the PGfs to do the job instead of/as well as the normal code path. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
I don't have iostat on that machine, but vmstat shows a lot of writes to the drives, and the runnable processes are more than 1: 6 1 0 3617652 292936 279192800 0 52430 1347 4681 25 19 20 37 Assuming that's the output of 'vmstat 1' and not some other delay, 50MB/second of sustained writes is usually considered 'a lot'. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] can't handle large number of INSERT/UPDATEs
and certainly anyone who's been around a computer more than a week or two knows which direction "in" and "out" are customarily seen from. regards, tom lane Apparently not whoever wrote the man page that everyone copied ;-) Interesting. I checked this on several machines. They actually say different things.
Redhat 9      - bi: Blocks sent to a block device (blocks/s).
Latest Cygwin - bi: Blocks sent to a block device (blocks/s).
Redhat 7.x    - bi: Blocks sent to a block device (blocks/s).
Redhat AS3    - bi: blocks sent out to a block device (in blocks/s)
I would say that I probably agree, things should be relative to the cpu. However, it doesn't seem to be something that was universally agreed upon. Or maybe the man-pages were all wrong, and only got updated recently. Looks like the man pages are wrong, for RH7.3 at least. It says bi is 'blocks written', but an actual test like 'dd if=/dev/zero of=/tmp/test bs=1024 count=16384' on an otherwise nearly idle RH7.3 box gives:
   procs                        memory     swap        io     system        cpu
 r  b  w   swpd    free    buff   cache   si  so  bi     bo   in    cs  us sy  id
 0  0  0  75936  474704  230452  953580    0   0   0      0  106  2527   0  0  99
 0  0  0  75936  474704  230452  953580    0   0   0  16512  376  2572   0  2  98
 0  0  0  75936  474704  230452  953580    0   0   0      0  105  2537   0  0 100
Which is in line with bo being 'blocks written'. M ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] preloading indexes
The best way to get all the stuff needed by a query into RAM is to run the query. Is it more that you want to 'pin' the data in RAM so it doesn't get overwritten by other queries?
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 03 November 2004 17:31
To: [EMAIL PROTECTED]
Subject: [PERFORM] preloading indexes
I am working with some pretty convoluted queries that work very slowly the first time they’re called but perform fine on the second call. I am fairly certain that these differences are due to the caching. Can someone point me in a direction that would allow me to pre-cache the critical indexes?
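Something like this is the crude warm-up I mean, run once after a restart (table, column and index names are invented; the second statement just coaxes the planner into walking the index so its pages get cached too):
SELECT count(*) FROM big_table;              -- drags the table's heap pages through the cache
SET enable_seqscan = off;
SELECT count(*) FROM big_table
 WHERE created_on > '2004-01-01';            -- likely an index scan on created_on, warming that index
RESET enable_seqscan;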
Re: [PERFORM] Restricting Postgres
> I have a dual processor system that can support over 150 concurrent > connections handling normal traffic and load. Now suppose I setup > Apache to spawn all of it's children instantly, what will ... > This will spawn 150 children in a short order of time and as > this takes "Doctor, it hurts when I do this!" "Well, don't do that then..." Sorry, couldn't resist ;-) Our Apache/PG driven website also needs to be able to deal with occasional large peaks, so what we do is:
StartServers 15        # Don't create too many children initially
MinSpareServers 10     # Always have at least 10 spares lying around
MaxSpareServers 20     # But no more than 20
MaxClients 150         # Up to 150 - the default 256 is too much for our RAM
So on server restart 15 Apache children are created, then one new child every second up to a maximum of 150. Apache's 'ListenBackLog' is around 500 by default, so there's plenty of scope for queuing inbound requests while we wait for sufficient children to be spawned. In addition we (as _every_ high load site should) run Squid as an accelerator, which dramatically increases the number of client connections that can be handled. Across 2 webservers at peak times we've had 50,000 concurrently open http & https client connections to Squid, with 150 Apache children doing the work that squid can't (i.e. all the dynamic stuff), and PG (on a separate box of course) whipping through nearly 800 mixed selects, inserts and updates per second - and then had to restart Apache on one of the servers for a config change... Not a problem :-) One little tip - if you run squid on the same machine as apache, and use a dual-proc box, then because squid is single-threaded it will _never_ take more than half the CPU - nicely self balancing in a way. M ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Restricting Postgres
> Apache::DBI overall works better to what I require, even if > it is not a > pool per sey. Now if pgpool supported variable rate pooling like > Apache does with it's children, it might help to even things > out. That > and you'd still get the spike if you have to start the webserver and > database server at or around the same time. I still don't quite get it though - you shouldn't be getting more than one child per second being launched by Apache, so that's only one PG postmaster per second, which is really a trivial load. That is unless you have 'StartServers' set high, in which case the 'obvious' answer is to lower it. Are you launching multiple DB connections per Apache process as well? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Restricting Postgres
> Case in point: A first time visitor hits your home page. A > dynamic page is generated (in about 1 second) and served > (taking 2 more seconds) which contains links to 20 additional The gain from an accelerator is actually even more that that, as it takes essentially zero seconds for Apache to return the generated content (which in the case of a message board could be quite large) to Squid, which can then feed it slowly to the user, leaving Apache free again to generate another page. When serving dialup users large dynamic pages this can be a _huge_ gain. I think Martin's pages (dimly recalling another thread) take a pretty long time to generate though, so he may not see quite such a significant gain. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Restricting Postgres
> Correct the 75% of all hits are on a script that can take > anywhere from > a few seconds to a half an hour to complete.The script > essentially > auto-flushes to the browser so they get new information as it arrives > creating the illusion of on demand generation. This is more like a streaming data server, which is a very different beast from a webserver, and probably better suited to the job. Usually either multithreaded or single-process using select() (just like Squid). You could probably build one pretty easily. Using a 30MB Apache process to serve one client for half an hour seems like a hell of a waste of RAM. > A squid proxy would probably cause severe problems when > dealing with a > script that does not complete output for a variable rate of time. No, it's fine, squid gives it to the client as it gets it, but can receive from the server faster. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Restricting Postgres
1- You have a query that runs for half an hour and you spoon feed the results to the client ? (argh) 2- Your script looks for new data every few seconds, sends a packet, then sleeps, and loops ? If it's 2 I have a readymade solution for you, just ask. I'm guessing (2) - PG doesn't give the results of a query in a stream. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Restricting Postgres
These are CGI scripts at the lowest level, nothing more and nothing less. While I could probably embed a small webserver directly into the perl scripts and run that as a daemon, it would take away the portability that the scripts currently offer. If they're CGI *scripts* then they just use the CGI environment, not Apache, so a daemon that accepts the inbound connections, then compiles the scripts a-la Apache::Registry, but puts each in a separate thread would be, er, relatively easy for someone better at multithreaded stuff than me. This should be my last question on the matter, does squid report the proper IP address of the client themselves?That's a critical requirement for the scripts. In the X-Forwarded-For header. Not that you can be sure you're seeing the true client IP anyway if they've gone through an ISP proxy beforehand. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Restricting Postgres
In your webpage include an iframe with a Javascript to refresh it every five seconds. The iframe fetches a page from the server which brings in the new data in form of generated JavaScript which writes in the parent window. Thus, you get a very short request every 5 seconds to fetch new data, and it is displayed in the client's window very naturally. ... Yup. If you go the JS route then you can do even better by using JS to load data into JS objects in the background and manipulate the page content directly, no need for even an Iframe. Ignore the dullards who have JS turned off - it's essential for modern web apps, and refusing JS conflicts absolutely with proper semantic markup. http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good starting point. It's clear that this discussion has moved way away from PG! Although in the context of DB backed web apps I guess it remains a bit on-topic... M ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] appropriate indexing
- ITEM table will, grow, grow, grow (sold items are not deleted) WHERE PRODUCT.SECTION_USED_FK IS NOT NULL AND ITEM.STATUS=1 and (ITEM.KIND=2 or ITEM.KIND=3) Partial index on item.status ? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
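Something along these lines is what I have in mind, assuming those really are the column names in the ITEM table and that status=1 rows are a small fraction of the whole:
-- The query's WHERE clause must include the predicate (status=1) for the planner to use it;
-- indexing on kind lets the kind=2/kind=3 test use the index as well.
CREATE INDEX item_open ON item (kind) WHERE status = 1;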
Re: [PERFORM] Better Hardware, worst Results
[EMAIL PROTECTED] wrote:
Quoting Rod Taylor <[EMAIL PROTECTED]>:
Please send an explain analyze from both.
I'm sendin three explains. In the first the Dell machine didn't use existing indexes, so I turn enable_seqscan off (this is the second explain). The total cost decreased, but the total time not. The third explain refers to the cheaper (and faster) machine. The last thing is the query itself.
 Nested Loop (cost=9008.68..13596.97 rows=1 width=317) (actual time=9272.803..65287.304 rows=2604 loops=1)
 Nested Loop (cost=5155.51..19320.20 rows=1 width=317) (actual time=480.311..62530.121 rows=2604 loops=1)
 Hash Join (cost=2.23..11191.77 rows=9 width=134) (actual time=341.708..21868.167 rows=2604 loops=1)
Well the plan is completely different on the dev machine. Therefore either the PG version or the postgresql.conf is different. No other possible answer. M
Re: [PERFORM] Better Hardware, worst Results
All 3 plans have crappy estimates. Run ANALYZE in production, then send another explain analyze (as an attachment please, to avoid linewrap). Er, no other possible answer except Rod's :-) ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Restricting Postgres
Javascript is too powerful to turn for any random web page. It is only essential for web pages because people write their web pages to only work with javascript. Hmm... I respectfully disagree. It is so powerful that it is impossible to ignore when implementing a sophisticated app. And it is not dangerous to the user so long as they have a popup blocker. Commercially, I can ignore the people who turn it off, and I can gain a huge benefit from knowing that 95% of people have it turned on, because it gives my users a hugely better experience than the equivalent XHTML only page (which I deliver, and which works, but which is a fairly depressing experience compared to the JS enabled version). It is _amazing_ how much crud you can take out of a page if you let JS do the dynamic stuff (with CSS still in full control of the styling). Nice, clean, semantically sensible XHTML, that can be transformed for multiple devices - it's great. An example: /previews/foo.wmv But we want it to appear in a popup when viewed in certain devices Easy - Attach an 'onclick' event handler (or just set the target attribute) when the device has a suitable screen & media player, but leave the markup clean for the rest of the world. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Restricting Postgres
A note though : you'll have to turn off HTTP persistent connections in your server (not in your proxy) or you're back to square one.
I hadn't considered that. On the client side it would seem to be up to the client whether to use a persistent connection or not. If it does, then yeah, a request every 5 seconds would still just hold open a server. One more reason to use a proxy I s'pose.
It's clear that this discussion has moved way away from PG! Although in the context of DB backed web apps I guess it remains a bit on-topic...
I find it very on-topic as - it's a way to help this guy solve his "pg problem" which was in fact a design problem - it's the future of database driven web apps (no more reloading the whole page !) I think in the future there will be a good bit of presentation logic in the client...
Not if Bruno has his way ;-) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Restricting Postgres
Pierre-Frédéric Caillaud wrote: check this marvelus piece of 5 minutes of work : http://boutiquenumerique.com/test/iframe_feed.html cela m'a fait le sourire :-) (apologies for bad french) M ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Strange (?) Index behavior?
> For some reason it's a requirement that partial wildcard > searches are done on this field, such as "SELECT ... WHERE > field LIKE 'A%'" > > I thought an interesting way to do this would be to simply > create partial indexes for each letter on that field, and it > works when the query matches the WHERE clause in the index > exactly like above. The problem is thus: I thought PG could use an ordinary index for 'like' conditions with just a terminating '%'? My other thought is that like 'A%' should grab about 1/26th of the table anyway (if the initial character distribution is random), and so a sequential scan might be the best plan anyway... M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] What is the difference between these?
To me, these three queries seem identical... why doesn't the first one (simplest to understand and write) go the same speed as the third one? All I'm trying to do is get statistics for one day (in this case, today) summarized. Table has ~25M rows. I'm using postgres 7.3.? on rh linux 7.3 (note that I think the difference between the first two might just be related to the data being in memory for the second query).
EXPLAIN ANALYZE select count(distinct sessionid) from usage_access where atime > date_trunc('day', now());
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate (cost=933439.69..933439.69 rows=1 width=4) (actual time=580350.43..580350.43 rows=1 loops=1)
   -> Seq Scan on usage_access (cost=0.00..912400.11 rows=8415831 width=4) (actual time=580164.48..580342.21 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 580350.65 msec
(4 rows)
EXPLAIN ANALYZE select count(distinct sessionid) from (select * from usage_access where atime > date_trunc('day', now())) as temp;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate (cost=933439.69..933439.69 rows=1 width=4) (actual time=348012.85..348012.85 rows=1 loops=1)
   -> Seq Scan on usage_access (cost=0.00..912400.11 rows=8415831 width=4) (actual time=347960.53..348004.68 rows=2964 loops=1)
         Filter: (atime > date_trunc('day'::text, now()))
 Total runtime: 348013.10 msec
(4 rows)
EXPLAIN ANALYZE select count(distinct sessionid) from usage_access where atime between date_trunc('day', now()) and date_trunc('day', now()) + '1 day'::interval;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate (cost=89324.98..89324.98 rows=1 width=4) (actual time=27.84..27.84 rows=1 loops=1)
   -> Index Scan using usage_access_atime on usage_access (cost=0.00..89009.39 rows=126237 width=4) (actual time=0.51..20.37 rows=2964 loops=1)
         Index Cond: ((atime >= date_trunc('day'::text, now())) AND (atime <= (date_trunc('day'::text, now()) + '1 day'::interval)))
 Total runtime: 28.11 msec
(4 rows)
-- Matthew Nuzum | Makers of "Elite Content Management System" www.followers.net | View samples of Elite CMS in action [EMAIL PROTECTED] | http://www.followers.net/portfolio/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Strange (?) Index behavior?
> With that many rows, and a normal index on the field, > postgres figures the best option for say "I%" is not an index > scan, but a sequential scan on the table, with a filter -- > quite obviously this is slow as heck, and yes, I've run > analyze several times and in fact have the vacuum analyze automated. Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an ordinary index OK? If so then... The planner would usually assume (from what Tom usually says) that 1/26 selectivity isn't worth doing an index scan for, but in your case it's wrong (maybe because the rows are very big?) You may be able to get the planner to go for an index scan on "like 'I%'" by tweaking the foo_cost variables in postgresql.conf Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%' ... ", or do that as a stored proc. > With the partial index the index scan is used and the cost > drops from 0..2million to 0..9000 -- a vast improvement. So there are really only 9000 rows out of 76 million starting with 'I'? How about combining some techniques - you could create an index on the first two chars of the field (should be selective enough to give an index scan), select from that, and select the actual data with the like clause. CREATE INDEX idx_firstletters ON table (substr(field, 1, 2)); CREATE INDEX idx_all ON table (field); SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') AS approx WHERE field LIKE 'DE%'; Any good? ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Strange (?) Index behavior?
So there are really only 9000 rows out of 76 million starting with 'I'? How about combining some techniques - you could create an index on the first two chars of the field (should be selective enough to give an index scan), select from that, and select the actual data with the like clause.
I was talking about the cost, not the number of rows. About 74,000 rows are returned but the query only takes about 8 seconds to run with the partial index in place.
Well, 74000/76000000 ~= 0.1%, way less than 1/26, so no surprise that an indexscan is better, and also no surprise that the planner can't know that I is such an uncommon initial char.
CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
CREATE INDEX idx_all ON table (field);
SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE') AS approx WHERE field LIKE 'DE%';
That looks like a pretty slick way to create an index, I didn't know there was such a way to do it.. but it appears that this will not work with queries where the WHERE clause wants to find substrings longer than 2 characters.
I don't see why not, it just uses the functional index to grab the 1/(ascii_chars^2) of the rows that are of obvious interest, and then uses the standard index to filter that set.. Where it won't work is where you just want one initial char! Which is why I suggested the silly query rewrite...
Going back to the initial problem -- having only one large, complete index on the table (no partial indexes) the query "SELECT field FROM table WHERE field LIKE 'A%';" does not use the index. The query "SELECT field FROM table WHERE field LIKE 'AB%';" however, does use the single large index if it exists.
If you were planning the query, what would you do? Assuming we're talking about A-Z as possible first chars, and assuming we don't know the distribution of those chars, then we have to assume 1/26 probability of each char, so a seq scan makes sense. Whereas like 'JK%' should only pull 1/500 rows.
Adding the partial index "CREATE INDEX idx_table_substrfield_A ON table (field) WHERE field LIKE 'A%';" causes all queries with substrings of any length to do index scans, provided I issue the query as:
SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
-- or even --
SELECT field FROM table WHERE field LIKE 'A%';
The latter query, without the partial index described, does a sequential scan on the table itself instead of an index scan.
Yes, because (I assume, Tom will no doubt clarify/correct), by creating the partial indices you create a lot more information about the distribution of the first char - either that, or the planner simply always uses an exactly matching partial index if available. I _think_ that creating 26 partial indexes on '?%' is essentially the same thing as creating one functional index on substr(field,1,1), just messier, unless the partial indexes cause the planner to do something special... M ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
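For completeness, the functional-index version of the same trick (7.4-or-later expression index; 'mytable' and 'field' are placeholders):
CREATE INDEX idx_first_char ON mytable (substr(field, 1, 1));
SELECT field
  FROM (SELECT field FROM mytable
         WHERE substr(field, 1, 1) = 'I') AS approx   -- uses idx_first_char
 WHERE field LIKE 'I%';                               -- exact filter on the small result set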
Re: [PERFORM] ext3 journalling type
> Am I right to assume that "writeback" is both fastest and at > the same time as safe to use as ordered? Maybe any of you > did some benchmarks? It should be fastest because it has the least overhead, and safe because postgres does its own write-order guaranteeing through fsync(). You should also mount the FS with the 'noatime' option. But for some workloads, there are tests showing that 'data=journal' can be the fastest! This is because although the data is written twice (once to the journal, and then to its real location on disk) in this mode data is written _sequentially_ to the journal, and later written out to its destination, which may be at a quieter time. There's a discussion (based around 7.2) here: http://www.kerneltraffic.org/kernel-traffic/kt20020401_160.txt M ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Speed in V8.0
Another man working to the bitter end this Christmas! There could be many reasons, but maybe first you should look at the amount of RAM available? If the tables fit in RAM on the production server but not on the dev server, then that will easily defeat the improvement due to using the native DB version. Why don't you install cygwin on the dev box and do the comparison using the same hardware? M > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Thomas Wegner > Sent: 21 December 2004 23:03 > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Speed in V8.0 > > > Hello, i have a problem between V7.4.3 Cygwin and > V8.0RC2 W2K. I have 2 systems: > > 1. Production Machine > - Dual P4 3000MHz > - 2 GB RAM > - W2K > - PostgreSQL 7.4.3 under Cygwin > - i connect to it over a DSL Line > 2. Develop Machine > - P4 1800MHz > - 760 MB RAM > - PostgreSQL Native Windows > - local connection 100MB/FD > > Both systems use the default postgresql.conf. Now the > problem. I have an (unoptimized, dynamic) query wich was > execute on the production machine over DSL in 2 seconds and > on my develop machine, connected over local LAN, in 119 seconds! > > Whats this? I can not post the query details here public, its > a commercial project. Any first idea? I execute on both > machine the same query with the same database design! > - > Thomas Wegner > CabrioMeter - The Weather Plugin for Trillian > http://www.wegner24.de/cabriometer > > > > ---(end of > broadcast)--- > TIP 4: Don't 'kill -9' the postmaster > ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Swapping on Solaris
> Kevin Schroeder wrote: > It looks to me like you are using no (device or file) swap at all, and > have 1.3G of real memory free, so could in fact give Postgres more of it :-) > Indeed. If you DO run into trouble after giving Postgres more RAM, use the vmstat command. You can use this command like "vmstat 10". (ignore the first line) Keep an eye on the "pi" and "po" parameters. (kilobytes paged in and out) HTH, Matt -- Matt Casters <[EMAIL PROTECTED]> i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Swapping on Solaris
This page may be of use: http://www.serverworldmagazine.com/monthly/2003/02/solaris.shtml From personal experience, for god's sake don't think Solaris' VM/swap implementation is easy - it's damn good, but it ain't easy! Matt Kevin Schroeder wrote: I think it's probably just reserving them. I can't think of anything else. Also, when I run swap activity with sar I don't see any activity, which also points to reserved swap space, not used swap space. swap -s reports total: 358336k bytes allocated + 181144k reserved = 539480k used, 2988840k available Kevin - Original Message - From: "Alan Stange" <[EMAIL PROTECTED]> To: "Kevin Schroeder" <[EMAIL PROTECTED]> Cc: Sent: Wednesday, January 19, 2005 11:04 AM Subject: Re: [PERFORM] Swapping on Solaris Kevin Schroeder wrote: I may be asking the question the wrong way, but when I start up PostgreSQL swap is what gets used the most of. I've got 1282MB free RAM right now and and 515MB swap in use. Granted, swap file usage probably wouldn't be zero, but I would guess that it should be a lot lower so something must be keeping PostgreSQL from using the free RAM that my system is reporting. For example, one of my postgres processes is 201M in size but on 72M is resident in RAM. That extra 130M is available in RAM, according to top, but postgres isn't using it. The test you're doing doesn't measure what you think you're measuring. First, what else is running on the machine?Note that some shared memory allocations do reserve backing pages in swap, even though the pages aren't currently in use. Perhaps this is what you're measuring? "swap -s" has better numbers than top. You'd be better by trying a reboot then starting pgsql and seeing what memory is used. Just because you start a process and see the swap number increase doesn't mean that the new process is in swap. It means some anonymous pages had to be evicted to swap to make room for the new process or some pages had to be reserved in swap for future use. Typically a new process won't be paged out unless something else is causing enormous memory pressure... -- Alan ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM]
Hi, I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be hitting 1B rows soon. (around 250Gb of data) My questions to the list are: has this sort of thing been attempted before? If so, what were the performance results compared to Oracle? I've been reading up on partitioned tables on pgsql; will the performance benefit be comparable to Oracle partitioned tables? What are the gotchas? Should I be testing on 8 or the 7 version? While I didn't find any documents immediately, are there any fine manuals to read on data warehouse performance tuning on PostgreSQL? Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date on the results. Best regards, Matt -- Matt Casters <[EMAIL PROTECTED]> i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM]
Thanks Stephen, My main concern is to get as much read performance on the disks as possible on this given system. CPU is rarely a problem on a typical data warehouse system, this one's not any different. We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third one coming along.(around 350Gb) I was kind of hoping that the new PGSQL tablespaces would allow me to create a storage container spanning multiple file-systems, but unfortunately, that seems to be not the case. Is this correct? That tells me that I probably need to do a full reconfiguration of the disks on the Solaris level to get maximum performance out of the system. Mmmm. This is going to be a though one to crack. Perhaps it will be possible to get some extra juice out of placing the indexes on the smaller disks (150G) and the data on the bigger ones? Thanks! Matt -Oorspronkelijk bericht- Van: Stephen Frost [mailto:[EMAIL PROTECTED] Verzonden: donderdag 20 januari 2005 15:26 Aan: Matt Casters CC: pgsql-performance@postgresql.org Onderwerp: Re: [PERFORM] * Matt Casters ([EMAIL PROTECTED]) wrote: > I have the go ahead of a customer to do some testing on Postgresql in > a couple of weeks as a replacement for Oracle. > The reason for the test is that the number of users of the warehouse > is going to increase and this will have a serious impact on licencing > costs. (I bet that sounds familiar) Rather familiar, yes... :) > We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. > Basically we have 2 large fact tables to deal with: one going for 400M > rows, the other will be hitting 1B rows soon. > (around 250Gb of data) Quite a bit of data. There's one big thing to note here I think- Postgres will not take advantage of multiple CPUs for a given query, Oracle will. So, it depends on your workload as to how that may impact you. Situations where this will be unlikely to affect you: Your main bottle-neck is IO/disk and not CPU. You run multiple queries in parallel frequently. There are other processes on the system which chew up CPU time anyway. Situations where you're likely to be affected would be: You periodically run one big query. You run a set of queries in sequential order. > My questions to the list are: has this sort of thing been attempted > before? If so, what where the performance results compared to Oracle? I'm pretty sure it's been attempted before but unfortunately I don't have any numbers on it myself. My data sets aren't that large (couple million rows) but I've found PostgreSQL at least as fast as Oracle for what we do, and much easier to work with. > I've been reading up on partitioned tabes on pgsql, will the > performance benefit will be comparable to Oracle partitioned tables? In this case I would think so, except that PostgreSQL still won't use multiple CPUs for a given query, even against partitioned tables, aiui. > What are the gotchas? See above? :) Other issues are things having to do w/ your specific SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is it, something like select x,y from a,b where x=%y; to do a right-join, iirc). > Should I be testing on 8 or the 7 version? Now that 8.0 is out I'd say probably test with that and just watch for 8.0.x releases before you go production, if you have time before you have to go into production with the new solution (sounds like you do- changing databases takes time anyway). > Thanks in advance for any help you may have, I'll do my best to keep > pgsql-performance up to date on the results. Hope that helps. 
Others on here will correct me if I misspoke. :) Stephen ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
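To illustrate the tablespace point raised above: a tablespace maps to a single directory, so the usual workaround is one tablespace per filesystem and then assigning objects to them, e.g. indexes on the smaller array and the big fact tables on the larger one. A minimal sketch (8.0 syntax; the mount points and object names are made up for illustration):

CREATE TABLESPACE ts_fact  LOCATION '/raid_300gb/pgdata';
CREATE TABLESPACE ts_index LOCATION '/raid_150gb/pgindex';

-- Place a large table on the big array and its index on the smaller one:
CREATE TABLE fact_sales (
    sale_id   bigint,
    sale_date date,
    amount    numeric
) TABLESPACE ts_fact;

CREATE INDEX idx_fact_sales_date ON fact_sales (sale_date) TABLESPACE ts_index;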
Re: [PERFORM]
Joshua, Actually that's a great idea! I'll have to check if Solaris wants to play ball though. We'll have to see as we don't have the new disks yet, ETA is next week. Cheers, Matt -Oorspronkelijk bericht- Van: Joshua D. Drake [mailto:[EMAIL PROTECTED] Verzonden: donderdag 20 januari 2005 21:26 Aan: [EMAIL PROTECTED] CC: pgsql-performance@postgresql.org Onderwerp: Re: [PERFORM] Matt Casters wrote: > > Thanks Stephen, > > My main concern is to get as much read performance on the disks as > possible on this given system. CPU is rarely a problem on a typical > data warehouse system, this one's not any different. > > We basically have 2 RAID5 disk sets (300Gb) and 150Gb) with a third > one coming along.(around 350Gb) Why not run two raid systems. A RAID 1 for your OS and a RAID 10 for your database? Push all of your extra drives into the RAID 10. Sincerely, Joshua D. Drake > I was kind of hoping that the new PGSQL tablespaces would allow me to > create a storage container spanning multiple file-systems, but > unfortunately, that seems to be not the case. Is this correct? > > That tells me that I probably need to do a full reconfiguration of the > disks on the Solaris level to get maximum performance out of the system. > Mmmm. This is going to be a though one to crack. Perhaps it will be > possible to get some extra juice out of placing the indexes on the > smaller disks (150G) and the data on the bigger ones? > > Thanks! > > Matt > > -Oorspronkelijk bericht- > Van: Stephen Frost [mailto:[EMAIL PROTECTED] > Verzonden: donderdag 20 januari 2005 15:26 > Aan: Matt Casters > CC: pgsql-performance@postgresql.org > Onderwerp: Re: [PERFORM] > > * Matt Casters ([EMAIL PROTECTED]) wrote: > >>I have the go ahead of a customer to do some testing on Postgresql in >>a couple of weeks as a replacement for Oracle. >>The reason for the test is that the number of users of the warehouse >>is going to increase and this will have a serious impact on licencing >>costs. (I bet that sounds familiar) > > > Rather familiar, yes... :) > > >>We're running a medium sized data warehouse on a Solaris box (4CPU, >>8Gb > > RAM) on Oracle. > >>Basically we have 2 large fact tables to deal with: one going for 400M >>rows, the other will be hitting 1B rows soon. >>(around 250Gb of data) > > > Quite a bit of data. There's one big thing to note here I think- > Postgres will not take advantage of multiple CPUs for a given query, Oracle will. > So, it depends on your workload as to how that may impact you. > Situations where this will be unlikely to affect you: > > Your main bottle-neck is IO/disk and not CPU. > You run multiple queries in parallel frequently. > There are other processes on the system which chew up CPU time anyway. > > Situations where you're likely to be affected would be: > > You periodically run one big query. > You run a set of queries in sequential order. > > >>My questions to the list are: has this sort of thing been attempted >>before? If so, what where the performance results compared to Oracle? > > > I'm pretty sure it's been attempted before but unfortunately I don't > have any numbers on it myself. My data sets aren't that large (couple > million > rows) but I've found PostgreSQL at least as fast as Oracle for what we > do, and much easier to work with. > > >>I've been reading up on partitioned tabes on pgsql, will the >>performance benefit will be comparable to Oracle partitioned tables? 
> > > In this case I would think so, except that PostgreSQL still won't use > multiple CPUs for a given query, even against partitioned tables, aiui. > > >>What are the gotchas? > > > See above? :) Other issues are things having to do w/ your specific > SQL- Oracle's old join syntax isn't supported by PostgreSQL (what is > it, something like select x,y from a,b where x=%y; to do a right-join, iirc). > > >>Should I be testing on 8 or the 7 version? > > > Now that 8.0 is out I'd say probably test with that and just watch for > 8.0.x releases before you go production, if you have time before you > have to go into production with the new solution (sounds like you do- > changing databases takes time anyway). > > >>Thanks in advance for any help you may have, I'll do my best to keep >>pgsql-performance up to date on the results. > > > Hope that helps. Others on here will correct me if I misspoke. :) > > Stephen > > > > ---(e
[PERFORM] DWH on Postgresql
Hi, I have the go ahead of a customer to do some testing on Postgresql in a couple of weeks as a replacement for Oracle. The reason for the test is that the number of users of the warehouse is going to increase and this will have a serious impact on licencing costs. (I bet that sounds familiar) We're running a medium sized data warehouse on a Solaris box (4CPU, 8Gb RAM) on Oracle. Basically we have 2 large fact tables to deal with: one going for 400M rows, the other will be hitting 1B rows soon. (around 250Gb of data) My questions to the list are: has this sort of thing been attempted before? If so, what were the results? I've been reading up on partitioned tables on pgsql; will the performance benefit be comparable to Oracle partitioned tables? What are the gotchas? Should I be testing on 8 or the 7 version? Thanks in advance for any help you may have, I'll do my best to keep pgsql-performance up to date on the results. Best regards, Matt ___ Matt Casters i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 OKEGEM, Belgium Tel. 054/25.01.37 GSM 0486/97.29.37
Re: [PERFORM]
> On Thu, Jan 20, 2005 at 11:31:29 -0500, > Alex Turner <[EMAIL PROTECTED]> wrote: >> I am curious - I wasn't aware that postgresql supported partitioned tables, >> Could someone point me to the docs on this. > > Some people have been doing it using a union view. There isn't actually > a partition feature. > > Actually, there is. I found this example on pgsql-performance: >> CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC ); >> ANALYZE super_foo ; >> >> CREATE TABLE sub_foo1 () INHERITS ( super_foo ); >> INSERT INTO sub_foo1 VALUES ( 1, 1 ); >> -- repeat insert until sub_foo1 has 1,000,000 rows >> CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition ); >> ANALYZE sub_foo1 ; >> >> CREATE TABLE sub_foo2 () INHERITS ( super_foo ); >> INSERT INTO sub_foo2 VALUES ( 2, 1 ); >> -- repeat insert until sub_foo2 has 1,000,000 rows >> CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition ); >> ANALYZE sub_foo2 ; >> I think that in certain cases this system even beats Oracle as it stores less information in the table partitions (and in doing so causes less disk IO). BTW, internally, Oracle sees partitions as tables too. Even the "Union all" system that MS SQL Server uses works fine as long as the optimiser supports pruning it correctly. Cheers, Matt -- Matt Casters <[EMAIL PROTECTED]> i-Bridge bvba, http://www.kettle.be Fonteinstraat 70, 9400 Okegem, Belgium Phone +32 (0) 486/97.29.37 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
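As a quick sketch of how the quoted example might be filled and queried (generate_series() stands in for the "repeat insert" comments; this is illustrative only):

INSERT INTO sub_foo1 SELECT 1, s FROM generate_series(1, 1000000) AS s;
INSERT INTO sub_foo2 SELECT 2, s FROM generate_series(1, 1000000) AS s;
ANALYZE sub_foo1;
ANALYZE sub_foo2;

-- Queries go through the parent; the per-child index on "partition"
-- keeps the child that holds no matching rows cheap to skip over:
EXPLAIN SELECT count(*) FROM super_foo WHERE partition = 2 AND bar < 1000;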
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Presumably it can't _ever_ know without being explicitly told, because even for a plain SELECT there might be triggers involved that update tables, or it might be a select of a stored proc, etc. So in the general case, you can't assume that a select doesn't cause an update, and you can't be sure that the table list in an update is a complete list of the tables that might be updated. Tatsuo Ishii wrote: Can I ask a question? Suppose table A gets updated on the master at time 00:00. Until 00:03 pgpool needs to send all queries regarding A to the master only. My question is, how can pgpool know a query is related to A? -- Tatsuo Ishii ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
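A tiny example of why a "read" can still write (all names here are hypothetical; assumes plpgsql is installed in the database):

-- Hypothetical audit table and function with a hidden side effect.
CREATE TABLE audit_log (noted timestamptz DEFAULT now(), note text);

CREATE FUNCTION note_access(p_note text) RETURNS integer AS '
BEGIN
    -- the write hidden behind a SELECT
    INSERT INTO audit_log(note) VALUES (p_note);
    RETURN 1;
END;
' LANGUAGE plpgsql;

-- Looks read-only to anything that merely inspects the SQL text:
SELECT note_access('someone ran a report');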
Re: [PERFORM]
>> > Some people have been doing it using a union view. There isn't actually >> > a partition feature. >> >> Actually, there is. I found this example on pgsql-performance: >> >> >> CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC ); >> >> ANALYZE super_foo ; >> >> >> >> CREATE TABLE sub_foo1 () INHERITS ( super_foo ); > [...] >> >> >> >> CREATE TABLE sub_foo2 () INHERITS ( super_foo ); > [...] >> >> > > Yes, this could be used instead of a view. But there is one thing > missing. You can't just insert into super_foo and acquire the "correct > partition". You will still have to insert into the correct underlying > table. "Real" partitioning will take care of correct partition > selection. This IS bad news. It would mean a serious change in the ETL. I think I can solve the other problems, but I don't know about this one... Regards, Matt ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
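One possible workaround for the ETL problem (a sketch only, using the names from the quoted example): put rewrite rules, or a trigger, on the parent so loads can keep targeting super_foo and the rows get redirected to the matching child.

CREATE RULE super_foo_ins_p1 AS ON INSERT TO super_foo
    WHERE NEW.partition = 1
    DO INSTEAD INSERT INTO sub_foo1 VALUES (NEW.partition, NEW.bar);

CREATE RULE super_foo_ins_p2 AS ON INSERT TO super_foo
    WHERE NEW.partition = 2
    DO INSTEAD INSERT INTO sub_foo2 VALUES (NEW.partition, NEW.bar);

-- Rows for partitions without a matching rule still land in the parent,
-- so one rule (or a trigger branch) is needed per partition.
INSERT INTO super_foo VALUES (2, 42);  -- ends up in sub_foo2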
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
Yes, I wasn't really choosing my examples particularly carefully, but I think the conclusion stands: pgpool (or anyone/thing except for the server) cannot in general tell from the SQL it is handed by the client whether an update will occur, nor which tables might be affected. That's not to say that pgpool couldn't make a good guess in the majority of cases! M Joshua D. Drake wrote: Matt Clark wrote: Presumably it can't _ever_ know without being explicitly told, because even for a plain SELECT there might be triggers involved that update tables, or it might be a select of a stored proc, etc. So in the general case, you can't assume that a select doesn't cause an update, and you can't be sure that the table list in an update is a complete list of the tables that might be updated. Uhmmm no :) There is no such thing as a select trigger. The closest you would get is a function that is called via select which could be detected by making sure you are prepending with a BEGIN or START Transaction. Thus yes pgPool can be made to do this. Sincerely, Joshua D. Drake ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Query plan excluding index on view
I'm trying to fine tune this query to return in a reasonable amount of time and am having difficulties getting the query to run the way I'd like. I have a couple of semi-related entities that are stored in individual tables, say, A and B. There is then a view created that pulls together the common fields from these 2 tables. These are then related through an m:m relationship to a classification. Quick definitions of all of this follow: Table: ItemA id <- primary key name description Table: ItemB id <- primary key name description View: Combined SELECT id, name, description from ItemA UNION ALL SELECT id, name, description from ItemB Table: xref id <- primary key item_id <- indexed, points to either ItemA.id or ItemB.id classification_id <- indexed, points to classification.id Table: classification id <- primary key name I'm trying to query from the classification, through the xref, and to the view to get a list of Items (either A or B) that are tied to a specific classification. My query is rather simple, basically as follows: SELECT id, name, description FROM combined c INNER JOIN xref on c.id = xref.item_id WHERE xref.classification_id = 1 This query runs in about 2-3 minutes (I should mention that ItemA has ~18M records and xref has ~26M records - and both will continue to grow). The explain text shows a disregard for the indexes on ItemA and ItemB and a sequential scan is done on both of them. However, if I rewrite this query to join directly to ItemA rather than to the view, it runs in ~50ms because it now uses the proper index. I know it's generally requested to include the EXPLAIN text when submitting a specific question, but I thought perhaps this was generic enough that someone might at least have some suggestions. If required I can certainly work up a simpler example, or I could include my actual explain (though it doesn't exactly match everything defined above as I tried to keep things rather generic). Any links would be nice as well; from all my searching the past few days, most of the performance tuning resources I could find were about tuning the server itself, not really a specific query - at least not one that dealt with this issue. If you've read this far - thank you much!
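For reference, a minimal runnable sketch of the structure described above (column types are guesses; only the shape matters):

CREATE TABLE itema (id integer PRIMARY KEY, name text, description text);
CREATE TABLE itemb (id integer PRIMARY KEY, name text, description text);

CREATE VIEW combined AS
    SELECT id, name, description FROM itema
    UNION ALL
    SELECT id, name, description FROM itemb;

CREATE TABLE classification (id integer PRIMARY KEY, name text);
CREATE TABLE xref (
    id                integer PRIMARY KEY,
    item_id           integer,  -- points to either itema.id or itemb.id
    classification_id integer REFERENCES classification (id)
);
CREATE INDEX idx_xref_item_id ON xref (item_id);
CREATE INDEX idx_xref_classification_id ON xref (classification_id);

-- The problem query: join the UNION ALL view to xref.
SELECT c.id, c.name, c.description
FROM combined c
JOIN xref ON c.id = xref.item_id
WHERE xref.classification_id = 1;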
Re: [PERFORM] Query plan excluding index on view
Sorry for not including this extra bit originally. Below is the explain detail from both the query to the view that takes longer and then the query directly to the single table that performs quickly.

Hash Join  (cost=49082.96..1940745.80 rows=11412 width=76)
  Hash Cond: (outer.?column1? = inner.listing_fid)
  ->  Append  (cost=0.00..1290709.94 rows=18487347 width=252)
        ->  Subquery Scan *SELECT* 1  (cost=0.00..1285922.80 rows=18384890 width=251)
              ->  Seq Scan on company  (cost=0.00..1102073.90 rows=18384890 width=251)
        ->  Subquery Scan *SELECT* 2  (cost=0.00..4787.14 rows=102457 width=252)
              ->  Seq Scan on school  (cost=0.00..3762.57 rows=102457 width=252)
  ->  Hash  (cost=49042.64..49042.64 rows=16130 width=8)
        ->  Bitmap Heap Scan on listing_node_xref xref  (cost=102.45..49042.64 rows=16130 width=8)
              Recheck Cond: (node_fid = 173204537)
              ->  Bitmap Index Scan on idx_listing_node_xref_node_fid  (cost=0.00..102.45 rows=16130 width=0)
                    Index Cond: (node_fid = 173204537)

Nested Loop  (cost=102.45..98564.97 rows=11349 width=517)
  ->  Bitmap Heap Scan on listing_node_xref xref  (cost=102.45..49042.64 rows=16130 width=8)
        Recheck Cond: (node_fid = 173204537)
        ->  Bitmap Index Scan on idx_listing_node_xref_node_fid  (cost=0.00..102.45 rows=16130 width=0)
              Index Cond: (node_fid = 173204537)
  ->  Index Scan using idx_pki_company_id on company c  (cost=0.00..3.06 rows=1 width=517)
        Index Cond: (c.id = outer.listing_fid)

On Thu, Apr 3, 2008 at 7:19 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Matt Klinker" <[EMAIL PROTECTED]> writes: > > I knew I'd forget something! I've tried this on both 8.2 and 8.3 with the > > same results. > > Then you're going to have to provide more details ... > >regards, tom lane >
Re: [PERFORM] Query plan excluding index on view
.304.90 rows=16240 width=0) Index Cond: (node_fid = 173204537) -> Index Scan using pk_company_id on company c (cost=0.00..9.67 rows=1 width=424) Index Cond: (c.id = xref.listing_fid) On Thu, Apr 3, 2008 at 11:49 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Matt Klinker" <[EMAIL PROTECTED]> writes: > > Sorry for not including this extra bit originally. Below is the explain > > detail from both the query to the view that takes longer and then the > query > > directly to the single table that performs quickly. > ... > > -> Subquery Scan *SELECT* 1 (cost=0.00..1285922.80 > rows=18384890 > > width=251) > > -> Seq Scan on company (cost=0.00..1102073.90 > rows=18384890 > > The presence of a Subquery Scan node tells me that either this is a much > older PG version than you stated, or there are some interesting details > to the query that you omitted. Please drop the fan-dance routine and > show us a complete reproducible test case. > >regards, tom lane >
Re: [SOLVED] [PERFORM] Query plan excluding index on view
Removing the constants definitely did take care of the issue on 8.3 (still same query plan on 8.1). Thanks for your help in getting this resolved, and sorry again for not including all relevant information on my initial request On Fri, Apr 4, 2008 at 10:20 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Matt Klinker" <[EMAIL PROTECTED]> writes: > > --Joined View: > > CREATE OR REPLACE VIEW directory_listing AS > > SELECT school.id, school.name, school.description, 119075291 AS > > listing_type_fid > >FROM school > > UNION ALL > > SELECT company.id, company.name, company.description, 119074833 AS > > listing_type_fid > >FROM company; > > Ah, there's the problem :-(. Can you get rid of the constants here? > The planner's currently not smart about UNION ALL subqueries unless > their SELECT lists contain just simple column references. > > (Yes, fixing that is on the todo list, but don't hold your breath... > it'll be 8.4 material at the earliest.) > >regards, tom lane >
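For anyone hitting the same thing, a sketch of the kind of rewrite being described: keep each UNION ALL arm down to simple column references. (The ALTER TABLE lines are just one possible way to keep a listing type available; they are an assumption, not necessarily the fix the original poster used.)

DROP VIEW directory_listing;  -- recreated from scratch since the column list changes
CREATE VIEW directory_listing AS
    SELECT id, name, description FROM school
    UNION ALL
    SELECT id, name, description FROM company;

-- If the type id is still needed, store it as a real column instead of a
-- constant in the view, so the select lists stay simple column references:
-- ALTER TABLE school  ADD COLUMN listing_type_fid integer DEFAULT 119075291;
-- ALTER TABLE company ADD COLUMN listing_type_fid integer DEFAULT 119074833;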
Re: [PERFORM] limit clause breaks query planner?
Hi David, Early in this thread, Pavel suggested: > you should partial index > > create index foo(b) on mytable where a is null; Rather, you might try the opposite partial index (where a is NOT null) as a replacement for the original unqualified index on column A. This new index will be ignored by the query you're trying to tune, but it'll be available to the other queries that filter to a non-null value of column A. (Omitting NULL from that index should be ok because you normally wouldn't want to use an index when 95% of the table's rows match the filtered key.) Then you can temporarily disable Seq Scans in your session for just this one query, as follows: SQL> create table my_table ( a int, b int ) ; CREATE TABLE SQL> create index idx_a_not_null on my_table ( a ) where a is not null ; CREATE INDEX SQL> create index idx_b on my_table ( b ) ; CREATE INDEX SQL> insert into my_table (a, b) select case when random() <= 0.95 then null else i end as a, mod(i, 10) as b from generate_series(1, 1000) s(i) ; INSERT 0 1000 SQL> analyze my_table ; ANALYZE Review the statistics available to the optimizer: SQL> select attname, null_frac, n_distinct, most_common_vals, most_common_freqs, histogram_bounds, correlation from pg_stats where tablename = 'my_table' order by attname ; attname | null_frac | n_distinct | most_common_vals| most_common_freqs | histogram_bounds| correlation -+---++---+--++- a | 0.945 | -1 | | | {2771,1301755,2096051,3059786,3680728,4653531,5882434,6737141,8240245,9428702,9875768} | 1 b | 0 | 10 | {9,4,3,1,2,6,8,5,7,0} | {0.110333,0.104,0.102333,0.100333,0.100333,0.0996667,0.0986667,0.098,0.096,0.09} | |0.127294 (2 rows) SQL> select relname, reltuples, relpages from pg_class where relname in ('my_table', 'idx_a_not_null', 'idx_b') order by relname ; relname | reltuples | relpages +---+-- idx_a_not_null |499955 | 1100 idx_b | 1e+07 |21946 my_table | 1e+07 |39492 (3 rows) Run the test query, first without disabling Seq Scan to show this example reproduces the plan you're trying to avoid. SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ; QUERY PLAN --- Limit (cost=0.00..2.66 rows=15 width=8) (actual time=0.070..0.263 rows=15 loops=1) -> Seq Scan on my_table (cost=0.00..164492.00 rows=929250 width=8) (actual time=0.061..0.159 rows=15 loops=1) Filter: ((a IS NULL) AND (b = 5)) Total runtime: 0.371 ms (4 rows) Now run the same query without the Seq Scan option. SQL> set enable_seqscan = false ; SET SQL> explain analyze select * from my_table where a is null and b = 5 limit 15 ; QUERY PLAN -- Limit (cost=0.00..46.33 rows=15 width=8) (actual time=0.081..0.232 rows=15 loops=1) -> Index Scan using idx_b on my_table (cost=0.00..2869913.63 rows=929250 width=8) (actual time=0.072..0.130 rows=15 loops=1) Index Cond: (b = 5) Filter: (a IS NULL) Total runtime: 0.341 ms (5 rows) SQL> reset enable_seqscan ; RESET Yes, it's unsavory to temporarily adjust a session-level parameter to tune a single query, but I don't know of a less intrusive way to avoid the SeqScan. Here's why I think it might be your simplest option: As far as I can tell, the plan nodes for accessing the table/index are unaware of the LIMIT. The cost of the Limit node is estimated as the cost of its input row-source multiplied by the ratio of requested/returned rows. 
For example, from the preceding plan output: 2869913.63 for "Index Scan" upper cost * (15 row limit / 929250 returned rows) = 46.326 upper cost for the "Limit" node The underlying plan nodes each assume that all the rows matching their filter predicates will be returned up the pipeline; the cost estimate is only reduced at the Limit node. A Seq Scan and an Index Scan (over a complete index) will both expected the same number of input rows (pg_class.reltuples). They also produce the same estimate
Re: [PERFORM] limit clause breaks query planner?
"Tom Lane" <[EMAIL PROTECTED]> writes: > "Matt Smiley" <[EMAIL PROTECTED]> writes: > > So an Index Scan is always going to have a higher cost estimate than > > an equivalent Seq Scan returning the same result rows (unless > > random_page_cost is < 1). That's why I think the planner is always > > preferring the plan that uses a Seq Scan. > > If that were the case, we'd never choose an indexscan at all... You're right, that was a silly guess. > It's true that a plain indexscan is not preferred for queries that will > return a large fraction of the table. However, it should be willing to > use a bitmap scan for this query, given default cost settings (the > default cost settings will cause it to prefer bitmap scan for retrieving > up to about a third of the table, in my experience). I too am confused > about why it doesn't prefer that choice in the OP's example. It looks like the bitmap scan has a higher cost estimate because the entire bitmap index must be built before beginning the heap scan and returning rows up the pipeline. The row-count limit can't be pushed lower than the bitmap-heap-scan like it can for the basic index-scan. test_8_3_3=# set enable_seqscan = false ; SET test_8_3_3=# set enable_indexscan = false ; SET test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN Limit (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086 rows=15 loops=1) -> Bitmap Heap Scan on my_table (cost=17070.22..69478.96 rows=988217 width=8) (actual time=606.892..606.983 rows=15 loops=1) Recheck Cond: (b = 3) Filter: (a IS NULL) -> Bitmap Index Scan on idx_b (cost=0.00..16823.17 rows=109 width=0) (actual time=592.657..592.657 rows=100 loops=1) Index Cond: (b = 3) Total runtime: 607.340 ms (7 rows) > It would be interesting to alter the random_page_cost setting and see if he > gets > different results. Using an unmodified postgresql.conf, the cost estimate for an index-scan were so much higher than for a seqscan that random_page_cost had to be set below 0.2 before the index-scan was preferred. However, it looks like this was mainly because effective_cache_size was too small. The planner thought the cache was only 128 MB, and the size of the complete table+index was 39492 + 21946 pages * 8 KB/block = 330 MB. It makes sense for the cost estimate to be so much higher if blocks are expected to be repeatedly re-fetched from disk. I wonder if David's effective_cache_size is too small. 
test_8_3_3=# reset all ; RESET test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN --- Limit (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15 loops=1) -> Seq Scan on my_table (cost=0.00..164492.74 rows=988217 width=8) (actual time=0.028..0.138 rows=15 loops=1) Filter: ((a IS NULL) AND (b = 3)) Total runtime: 0.338 ms (4 rows) test_8_3_3=# set enable_seqscan = false ; SET test_8_3_3=# show random_page_cost ; random_page_cost -- 4 (1 row) test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN -- Limit (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15 loops=1) -> Index Scan using idx_b on my_table (cost=0.00..3029924.36 rows=988217 width=8) (actual time=0.043..0.100 rows=15 loops=1) Index Cond: (b = 3) Filter: (a IS NULL) Total runtime: 0.308 ms (5 rows) test_8_3_3=# set random_page_cost = 0.19 ; SET test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 limit 15 ; QUERY PLAN - Limit (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15 loops=1) -> Index Scan using idx_b on my_table (cost=0.00..161190.65 rows=988217 width=8) (actual time=0.042..0.097 rows=15 loops=1) Index Cond: (b = 3) Filter: (a IS NULL) Total runtime: 0.307 ms (5 rows) Now fix effective_cache_size and try again. test_8_3_3=# reset all ; RESET test_8_3_3=# set effect
Re: [PERFORM] limit clause breaks query planner?
"Tom Lane" <[EMAIL PROTECTED]> writes: > I'm not sure offhand whether the existing correlation stats would be of use > for > it, or whether we'd have to get ANALYZE to gather additional data. Please forgive the tangent, but would it be practical to add support for gathering statistics on an arbitrary expression associated with a table, rather than just on materialized columns? For example: analyze my_tab for expression 'my_func(my_tab.col)' ; It seems like any time you'd consider using a functional index, this feature would let the planner calculate decent selectivity estimates for the expression's otherwise opaque data distribution. The expression might be treated as a virtual column on the table; not sure if that helps or hurts. Should I post this question on pgsql-hackers? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] inaccurate stats on large tables
Hi Kiran, You gave great info on your problem. First, is this the query you're actually trying to speed up, or is it a simplified version? It looks like the optimizer has already chosen the best execution plan for the given query. Since the query has no joins, we only have to consider access paths. You're fetching 58221/37909009 = 0.15% of the rows, so a sequential scan is clearly inappropriate. A basic index scan is likely to incur extra scattered I/O, so a bitmap index scan is favored. To improve on this query's runtime, you could try any of the following: - Reorganize the data to reduce this query's scattered I/O (i.e. cluster on "paliasorigin_search3_idx" rather than "paliasorigin_alias_casefold_idx"). Bear in mind, this may adversely affect other queries. - Increase the cache hit frequency by ensuring the underlying filesystem cache has plenty of RAM (usually so under Linux) and checking that other concurrent queries aren't polluting the cache. Consider adding RAM if you think the working set of blocks required by most queries is larger than the combined Postgres and filesystem caches. If other processes than the db do I/O on this machine, consider them as resource consumers, too. - Restructure the table, partitioning along a column that would be useful for pruning whole partitions for your painful queries. In this case, origin_id or tax_id seems like a good bet, but again, consider other queries against this table. 38 million rows probably makes your table around 2 GB (guessing about 55 bytes/row). Depending on the size and growth rate of the table, it may be time to consider partitioning. Out of curiosity, what runtime are you typically seeing from this query? The explain-analyze ran in 113 ms, which I'm guessing is the effect of caching, not the runtime you're trying to improve. - Rebuild the indexes on this table. Under certain use conditions, btree indexes can get horribly bloated. Rebuilding the indexes returns them to their most compact and balanced form. For example: reindex index "paliasorigin_search3_idx"; Apart from the locking and CPU usage during the rebuild, this has no negative consequences, so I'd try this before something drastic like partitioning. First review the current size of the index for comparison: select pg_size_pretty(pg_relation_size('paliasorigin_search3_idx')); Since you asked specifically about improving the row-count estimate, like the previous responder said, you should consider increasing the statistics target. This will help if individual columns are being underestimated, but not if the overestimate is due to joint variation. In other words, the optimizer has no way to tell if there is there a logical relationship between columns A and B such that certain values in B only occur with certain values of A. Just judging from the names, it sounds like origin_id and tax_id might have a parent-child relationship, so I thought it was worth mentioning. Do the columns individually have good estimates? explain analyze select * from paliasorigin where origin_id=20; explain analyze select * from paliasorigin where tax_id=9606; If not, increase the statistics on that column, reanalyze the table, and recheck the selectivity estimate: alter table paliasorigin alter column origin_id set statistics 20; analyze paliasorigin; explain analyze select * from paliasorigin where origin_id=20; Good luck! Matt -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
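For the clustering suggestion above, a short sketch of the reorganisation step (which form applies depends on the server version):

-- On 8.3 and later:
CLUSTER paliasorigin USING paliasorigin_search3_idx;
-- (on older releases the equivalent is: CLUSTER paliasorigin_search3_idx ON paliasorigin;)

-- CLUSTER takes an exclusive lock and rewrites the table, so run it in a
-- maintenance window, then refresh the statistics:
ANALYZE paliasorigin;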
Re: [PERFORM] too many clog files
Hi Duan, As others have said, you should probably attempt to run pg_dump to export your database. If that doesn't work, consider restoring from backup. If the dump does work, you can create a clean PGDATA directory (using initdb like when you setup your original installation), and create a fresh copy of your database using the dump file. Then abandon your potentially damaged PGDATA directory. For future reference: - The "autovacuum" parameter in postgresql.conf is off by default under Postgres 8.1. You should probably turn it on to ensure regular vacuuming, unless you have your own cronjob to do the vacuuming. - About finding old transactions, there are 2 places you have to look for old transactions. The usual place is in pg_stat_activity. The 2nd place is "pg_prepared_xacts", where prepared transactions are listed. If there's a prepared transaction in your system, it might explain why your old commit-logs aren't being purged. The following query shows both prepared and normal transactions: select l.transactionid, age(l.transactionid) as age, /* measured in number of other transactions elapsed, not in terms of time */ l.pid, case when l.pid is null then false else true end as is_prepared, a.backend_start, p.prepared as time_xact_was_prepared, p.gid as prepared_name from pg_locks l left outer join pg_stat_activity a on l.pid = a.procpid left outer join pg_prepared_xacts p on l.transactionid = p.transaction where l.locktype = 'transactionid' and l.mode = 'ExclusiveLock' and l.granted order by age(l.transactionid) desc ; transactionid | age | pid | is_prepared | backend_start | time_xact_was_prepared | prepared_name ---+-+--+-+---+---+-- 316645 | 44 | | f | | 2008-09-09 00:31:46.724178-07 | my_prepared_transaction1 316689 | 0 | 6093 | t | 2008-09-09 00:40:10.928287-07 | | (2 rows) Note that unless you run this query as a superuser (e.g. "postgres"), the columns from pg_stat_activity will only be visible for sessions that belong to you. To rollback this example prepared transaction, you'd type: ROLLBACK PREPARED 'my_prepared_transaction1'; Hope this helps! Matt -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
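A quick way to double-check the autovacuum point above from SQL rather than by reading postgresql.conf (the stats_* settings matter because 8.1's autovacuum needs row-level statistics collection turned on):

SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum', 'stats_start_collector', 'stats_row_level');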
Re: [PERFORM] too many clog files
Alvaro Herrera wrote: > Move the old clog files back where they were, and run VACUUM FREEZE in > all your databases. That should clean up all the old pg_clog files, if > you're really that desperate. Has anyone actually seen a CLOG file get removed under 8.2 or 8.3? How about 8.1? I'm probably missing something, but looking at src/backend/commands/vacuum.c (under 8.2.9 and 8.3.3), it seems like vac_truncate_clog() scans through *all* tuples of pg_database looking for the oldest datfrozenxid. Won't that always be template0, which as far as I know can never be vacuumed (or otherwise connected to)? postgres=# select datname, datfrozenxid, age(datfrozenxid), datallowconn from pg_database order by age(datfrozenxid), datname ; datname | datfrozenxid | age| datallowconn --+--+--+-- template1| 36347792 | 3859 | t postgres | 36347733 | 3918 | t mss_test | 36347436 | 4215 | t template0| 526 | 36351125 | f (4 rows) I looked at several of my 8.2 databases' pg_clog directories, and they all have all the sequentially numbered segments ( through current segment). Would it be reasonable for vac_truncate_clog() to skip databases where datallowconn is false (i.e. template0)? Looking back to the 8.1.13 code, it does exactly that: if (!dbform->datallowconn) continue; Also, Duan, if you have lots of files under pg_clog, you may be burning through transactions faster than necessary. Do your applications leave autocommit turned on? And since no one else mentioned it, as a work-around for a small filesystem you can potentially shutdown your database, move the pg_clog directory to a separate filesystem, and create a symlink to it under your PGDATA directory. That's not a solution, just a mitigation. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Arjen van der Meijden wrote: > Afaik the Perc 5/i and /e are more or less rebranded LSI-cards (they're > not identical in layout etc), so it would be a bit weird if they > performed much less than the similar LSI's wouldn't you think? I've recently had to replace a PERC4/DC with the exact same card made by LSI (320-2) because the PERCs firmware was crippled. Its idea of RAID10 actually appears to be concatenated RAID1 arrays. Since replacing it and rebuilding the array on the LSI card, performance has been considerably better (14 disk SCSI shelf) > Areca may be the fastest around right now, but if you'd like to get it > all from one supplier, its not too bad to be stuck with Dell's perc 5 or > 6 series. The PERC6 isn't too bad, however it grinds to a halt when the IO queue gets large and it has the serious limitation of not supporting more than 8 spans, so trying to build a RAID10 array greater than 16 disks is pointless if you're not just after the extra capacity. Are there any reasonable choices for bigger (3+ shelf) direct-connected RAID10 arrays, or are hideously expensive SANs the only option? I've checked out the latest Areca controllers, but the manual available on their website states there's a limitation of 32 disks in an array... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Glyn Astill wrote: > Did you try flashing the PERC with the LSI firmware? > > I tried flashing a PERC3/dc with LSI firmware, it worked fine but I > saw no difference in performance so I assumed it must be something > else on the board that cripples it. No, for a few reasons: 1. I read somewhere on the interwebs that doing so would brick the card 2. I don't have access to a DOS/Windows machine 3. Dodgy hardware isn't what you want when dealing with large databases If it's not just a firmware issue it wouldn't surprise me if you could just link a couple of pins/contacts/etc on the card and gain the LSI's capabilities, but it's not an idea I'd entertain outside of personal use... -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Rajesh Kumar Mallah wrote: >> I've checked out the latest Areca controllers, but the manual >> available on their website states there's a limitation of 32 disks >> in an array... > > Where exactly is there limitation of 32 drives. the datasheet of > 1680 states support upto 128drives using enclosures. The 1680 manual: http://www.areca.us//support/download/RaidCards/Documents/Manual_Spec/SAS_Manual.zip Page 25: > Note: > > 1. The maximum no. is 32 disk drived included in a single RAID set Page 49: > 1. Up to 32 disk drives can be included in a single RAID set. > 2. Up to 8 RAID sets can be created per controller (point 2 meaning you can't do s/w RAID over umpteen h/w RAID1 pairs) Page 50: > To create RAID 30/50/60 volume, you need create multiple RAID sets > first with the same disk members on each RAID set. The max no. disk > drives per volume set: 32 for RAID 0/1/10/3/5/6 and 128 for RAID > 30/50/60. ...and a few more times saying the same thing -- -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Scott Carey wrote: > You probably don’t want a single array with more than 32 drives anyway, > its almost always better to start carving out chunks and using software > raid 0 or 1 on top of that for various reasons. I wouldn’t put more than > 16 drives in one array on any of these RAID cards, they’re just not > optimized for really big arrays and tend to fade between 6 to 16 in one > array, depending on the quality. This is what I'm looking at now. The server I'm working on at the moment currently has a PERC6/e and 3xMD1000s which needs to be tested in a few setups. I need to code a benchmarker yet (I haven't found one yet that can come close to replicating our DB usage patterns), but I intend to try: 1. 3x h/w RAID10 (one per shelf), sofware RAID0 2. lots x h/w RAID1, software RAID0 if the PERC will let me create enough arrays 3. Pure s/w RAID10 if I can convince the PERC to let the OS see the disks 4. 2x h/w RAID30, software RAID0 I'm not holding much hope out for the last one :) I'm just glad work on a rewrite of my inherited backend systems should start soon; get rid of the multi-TB MySQL hell and move to a distributed PG setup on dirt cheap Dell R200s/blades > You can do direct-attached storage to 100+ drives or more if you want. > The price and manageability cost go up a lot if it gets too big > however. Having global hot spare drives is critical. Not that the cost > of using SAN’s and such is low... SAS expanders have made DAS with > large arrays very accessible though. For large storage arrays (RAID60 or similar) you can't beat a RAID controller and disk shelf(s), especially if you keep the raidsets small and use cheap ludicrous capacity SATA disks You just need to be aware that performance doesn't scale well/easily over 1-2 shelves on the things -- -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Glyn Astill wrote: >> Stupid question, but why do people bother with the Perc line of >> cards if the LSI brand is better? It seems the headache of trying >> to get the Perc cards to perform is not worth any money saved. > > I think in most cases the dell cards actually cost more, people end > up stuck with them because they come bundled with their servers - > they find out too late that they've got a lemon. That's what's been happening with me... The fact Dell prices can have a fair bit of downward movement when you get the account manager on the phone makes them especially attractive to the people controlling the purse strings. The biggest reason for me however is the lack of comparative reviews. I struggled to get the LSI card to replace the PERC3 because all I had to go on was qualitative mailing list/forum posts from strangers. The only way I got it was to make the argument that other than trying the LSI, we'd have no choice other than replacing the server+shelf+disks. I want to see just how much better a high-end Areca/Adaptec controller is, but I just don't think I can get approval for a £1000 card "because some guy on the internet said the PERC sucks". Would that same person say it sucked if it came in Areca packaging? Am I listening to the advice of a professional, or a fanboy? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow-ish Query Needs Some Love
On Feb 2, 6:06 am, Edgardo Portal wrote: > On 2010-02-02, Matt White wrote: > > > I have a relatively straightforward query that by itself isn't that > > slow, but we have to run it up to 40 times on one webpage load, so it > > needs to run much faster than it does. Here it is: > > > SELECT COUNT(*) FROM users, user_groups > > WHERE users.user_group_id = user_groups.id AND NOT users.deleted AND > > user_groups.partner_id IN > > (partner_id_1, partner_id_2); > > > The structure is partners have user groups which have users. In the > > test data there are over 200,000 user groups and users but only ~3000 > > partners. Anyone have any bright ideas on how to speed this query up? > > Can you avoid running it 40 times, maybe by restructuring the > query (or making a view) along the lines of the following and > adding some logic to your page? > > SELECT p.partner_id, ug.user_group_id, u.id, count(*) > FROM partners p > LEFT JOIN user_groups ug > ON ug.partner_id=p.partner_id > LEFT JOIN users u > ON u.user_group_id=ug.id > WHERE NOT u.deleted > GROUP BY 1,2,3 > ; Thanks for the suggestion. The view didn't seem to speed things up. Perhaps we can reduce the number of times it's called, we'll see. Any additional ideas would be helpful. Thanks. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
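One further option, sketched under the same table and column names as the query above: fetch the counts for all of the page's partners in a single query and let the application pick out each number, so the statement runs once instead of 40 times.

-- One round trip for the whole page instead of 40 COUNT(*) queries.
SELECT ug.partner_id, count(*) AS active_users
FROM user_groups ug
JOIN users u ON u.user_group_id = ug.id
WHERE NOT u.deleted
  AND ug.partner_id IN (1, 2, 3)  -- the page's partner ids go here
GROUP BY ug.partner_id;

Partners with no matching users simply won't appear in the result, so the page should treat a missing id as a count of zero.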