Re: [PERFORM] Nested Loop

2007-03-27 Thread Gauri Kanekar

Hi,

here is the query

SELECT rs.id AS sid, rs.name AS sname, rc.id AS campid, rc.name AS campname,
       rc.rev_type AS revtype, rc.act_type AS actntype, ra.id AS advid,
       ra.name AS advname, rpt_chn.id AS chanid, rpt_chn.name AS channame,
       rpt_cre.dn AS dn, SUM(rm.imdel) AS impression, SUM(rm.cdel) AS click,
       rd.sqldate AS date
FROM rm, rn CROSS JOIN rd, ra, rs, rc, rpt_chn, rpt_cre
WHERE rm.date_key = rd.key AND rm.net_key = rn.key AND rm.adv_key = ra.key
  AND rm.camp_key = rc.key AND rm.s_key = rs.key AND rm.chn_key = rpt_chn.key
  AND rm.cre_key = rpt_cre.key AND ra.name != 'SYSTEM'
  AND rd.sqldate BETWEEN '12/1/2006' AND '12/30/2006'
  AND ( rn.id IN ( 607 ) )
GROUP BY rd.sqldate, rs.id, rs.name, ra.id, ra.name, rc.id, rc.name,
         rc.rev_type, rc.act_type, rpt_chn.id, rpt_chn.name, rpt_cre.dn;



On 3/26/07, Ragnar <[EMAIL PROTECTED]> wrote:


On mán, 2007-03-26 at 20:33 +0530, Gauri Kanekar wrote:

you did not show your query, nor did you answer whether you had vacuumed
and analyzed.

> enable_seqscan = off

why this? this is unlikely to help


>
> QUERY PLAN
> ...
>  ->  Nested Loop
> (cost=0.00..1104714.83 rows=6801 width=44) (actual
> time=1820.153..229779.814 rows=10945938 loops=1)

the estimates are way off here. you sure you have analyzed?

gnari

>





--
Regards
Gauri


Re: [PERFORM] Nested Loop

2007-03-27 Thread Ragnar
On þri, 2007-03-27 at 16:13 +0530, Gauri Kanekar wrote:
>  
> SELECT rs.id AS sid, rs.name AS sname, rc.id AS campid, rc.name AS
> campname, rc.rev_type AS revtype, rc.act_type AS actntype, ra.id AS
> advid, ra.name AS advname, rpt_chn.id AS chanid, rpt_chn.name AS
> channame, rpt_cre.dn AS dn, SUM(rm.imdel) AS impression, SUM(rm.cdel)
> AS click, rd.sqldate AS date FROM rm, rn CROSS JOIN rd, ra, rs, rc,
> rpt_chn, rpt_cre WHERE rm.date_key = rd.key AND rm.net_key = rn.key
> AND rm.adv_key = ra.key AND rm.camp_key = rc.key AND rm.s_key = rs.key
> AND rm.chn_key = rpt_chn.key AND rm.cre_key = rpt_cre.key AND
> ra.name != 'SYSTEM' AND rd.sqldate BETWEEN '12/1/2006' AND
> '12/30/2006' AND ( rn.id IN ( 607 ) ) GROUP BY rd.sqldate , rs.id,
> rs.name, ra.id, ra.name, rc.id, rc.name, rc.rev_type , rc.act_type,
> rpt_chn.id, rpt_chn.name, rpt_cre.dn;

you did not answer the other questions, so do this:
1) VACUUM ANALYZE your database
2) set these in your postgresql.conf:
   enable_seqscan = true
   join_collapse_limit = 8
3) restart postgresql
4) do the EXPLAIN ANALYZE again, and send us its output (sketch below)
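
In psql, those steps boil down to something like this (the SETs are just a
session-level alternative to editing postgresql.conf and restarting):

-- refresh the planner statistics for the whole database
VACUUM ANALYZE;

-- either change postgresql.conf and restart, or for this session only:
SET enable_seqscan = on;
SET join_collapse_limit = 8;

-- re-run the problem query with actual row counts and timings
EXPLAIN ANALYZE
SELECT ... ;   -- the full query quoted above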

gnari






Re: [PERFORM] [GENERAL] ERROR: out of shared memory

2007-03-27 Thread Merlin Moncure

On 3/26/07, Tom Lane <[EMAIL PROTECTED]> wrote:

"Sorin N. Ciolofan" <[EMAIL PROTECTED]> writes:
> I have to manage an application written in Java which calls another module
> written in Java which uses the Postgres DBMS in a Linux environment. I'm new to
> Postgres. The problem is that for large amounts of data the application
> throws an:
>  org.postgresql.util.PSQLException: ERROR: out of shared memory

AFAIK the only very likely way to cause that is to touch enough
different tables in one transaction that you run out of lock entries.
While you could postpone the problem by increasing the
max_locks_per_transaction setting, I suspect there may be some basic
application misdesign involved here.  How many tables have you got?


or advisory locks...these are easy to spot.  query pg_locks and look
for entries of locktype 'advisory'.  I've already seen some apps in
the wild that use them, openads is one.
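
for example, something along these lines (just filtering the standard
pg_locks view; for advisory locks the lock key shows up in classid/objid):

SELECT locktype, classid, objid, pid, mode, granted
FROM pg_locks
WHERE locktype = 'advisory';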

merlin



Re: [PERFORM] [GENERAL] ERROR: out of shared memory

2007-03-27 Thread Tom Lane
"Sorin N. Ciolofan" <[EMAIL PROTECTED]> writes:
> It seems that the legacy application creates tables dynamically and the
> number of the created tables depends on the size of the input of the
> application. For the specific input which generated that error I've
> estimated a number of created tables of about 4000. 
> Could this be the problem?

If you have transactions that touch many of them,
then yup, you could be out of lock table space.  Try increasing
max_locks_per_transaction.
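
For reference, a quick sketch (the 256 is only an illustration; the setting
lives in postgresql.conf and takes effect after a server restart):

-- check the current value (the default is 64)
SHOW max_locks_per_transaction;

-- then in postgresql.conf, followed by a restart:
--   max_locks_per_transaction = 256

-- the shared lock table has room for roughly
-- max_locks_per_transaction * (max_connections + max_prepared_transactions) locks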

regards, tom lane



[PERFORM] How to enable jdbc???

2007-03-27 Thread Michael Dengler

Hi,

In Postgres 7.4.* I had to pass --with-java to the configure script
for JDBC support.

Does Postgres 8.2.* include it by default? If not, how do I enable it?

Thanks

Miguel



Re: [PERFORM] How to enable jdbc???

2007-03-27 Thread Joshua D. Drake

Michael Dengler wrote:

Hi,

In Postgres 7.4.* I had to pass --with-java to the configure script
for JDBC support.

Does Postgres 8.2.* include it by default? If not, how do I enable it?


Just download the driver from jdbc.postgresql.org; the JDBC driver is
distributed separately now, so there is no --with-java configure step.



Thanks

Miguel





--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [PERFORM] Sunfire X4500 recommendations

2007-03-27 Thread Matt Smiley
Hi Dimitri,

First of all, thanks again for the great feedback!

Yes, my I/O load is mostly read operations.  There are some bulk writes done in 
the background periodically throughout the day, but these are not as 
time-sensitive.  I'll have to do some testing to find the best balance of read 
vs. write speed and tolerance of disk failure vs. usable diskspace.

I'm looking forward to seeing the results of your OLTP tests!  Good luck!  
Since I won't be doing that myself, it'll be all new to me.

About disk failure, I certainly agree that increasing the number of disks will 
decrease the average time between disk failures.  Apart from any performance 
considerations, I wanted to get a clear idea of the risk of data loss under 
various RAID configurations.  It's a handy reference, so I thought I'd share it:



The goal is to calculate the probability of data loss when we lose a certain 
number of disks within a short timespan (e.g. losing a 2nd disk before 
replacing+rebuilding the 1st one).  For RAID 10, 50, and Z, we will lose data 
if any disk group (i.e. mirror or parity-group) loses 2 disks.  For RAID 60 
and Z2, we will lose data if 3 disks die in the same parity group.  The parity 
groups can include arbitrarily many disks.  Having larger groups gives us more 
usable diskspace but less protection.  (Naturally we're more likely to lose 2 
disks in a group of 50 than in a group of 5.)

g = number of disks in each group (e.g. mirroring = 2; single-parity = 3 or 
more; dual-parity = 4 or more)
n = total number of disks
risk of losing any 1 disk = 1/n
risk of losing 1 disk from a particular group = g/n
risk of losing 2 disks in the same group = g/n * (g-1)/(n-1)
risk of losing 3 disks in the same group = g/n * (g-1)/(n-1) * (g-2)/(n-2)

For the x4500, we have 48 disks.  If we stripe our data across all those disks, 
then these are our configuration options:

RAID 10 or 50 -- Mirroring or single-parity must lose 2 disks from the same 
group to lose data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48            24              0.09%
              3          16           48            32              0.27%
              4          12           48            36              0.53%
              6           8           48            40              1.33%
              8           6           48            42              2.48%
             12           4           48            44              5.85%
             24           2           48            46             24.47%
             48           1           48            47            100.00%

RAID 60 or Z2 -- Double-parity must lose 3 disks from the same group to lose 
data:
disks_per_group  num_groups  total_disks  usable_disks  risk_of_data_loss
              2          24           48           n/a                n/a
              3          16           48            16              0.01%
              4          12           48            24              0.02%
              6           8           48            32              0.12%
              8           6           48            36              0.32%
             12           4           48            40              1.27%
             24           2           48            44             11.70%
             48           1           48            46            100.00%
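
The tables above follow directly from those formulas; as a sanity check, a
throwaway query along these lines reproduces the numbers (it assumes
PostgreSQL's generate_series(); usable disks are counted as one parity disk
per group for RAID 50/Z and two for RAID 60/Z2, and the g = 2 row is only
meaningful for mirroring):

SELECT g AS disks_per_group,
       48 / g AS num_groups,
       48 - (48 / g) AS usable_raid50,
       48 - 2 * (48 / g) AS usable_raid60,
       round(100.0 * g * (g - 1) / (48 * 47), 2) AS pct_risk_2_in_group,
       round(100.0 * g * (g - 1) * (g - 2) / (48 * 47 * 46), 2) AS pct_risk_3_in_group
FROM generate_series(2, 48) AS g
WHERE 48 % g = 0;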

So, in terms of fault tolerance:
 - RAID 60 and Z2 always beat RAID 10, since they never risk data loss when 
only 2 disks fail.
 - RAID 10 always beats RAID 50 and Z, since it has the largest number of disk 
groups across which to spread the risk.
 - Having more parity groups increases fault tolerance but decreases usable 
diskspace.

That's all assuming each disk has an equal chance of failure, which is probably 
true since striping should distribute the workload evenly.  And again, these 
probabilities are only describing the case where we don't have enough time 
between disk failures to recover the array.

In terms of performance, I think RAID 10 should always be best for write speed. 
 (Since it doesn't calculate parity, writing a new block doesn't require 
reading the rest of the RAID stripe just to recalculate the parity bits.)  I 
think it's also normally just as fast for reading, since the controller can 
load-balance the pending read requests to both sides of each mirror.







Re: [PERFORM] Sunfire X4500 recommendations

2007-03-27 Thread david

On Tue, 27 Mar 2007, Matt Smiley wrote:




The goal is to calculate the probability of data loss when we lose a 
certain number of disks within a short timespan (e.g. losing a 2nd disk 
before replacing+rebuilding the 1st one).  For RAID 10, 50, and Z, we 
will lose data if any disk group (i.e. mirror or parity-group) loses 2 
disks.  For RAID 60 and Z2, we will lose data if 3 disks die in the 
same parity group.  The parity groups can include arbitrarily many 
disks.  Having larger groups gives us more usable diskspace but less 
protection.  (Naturally we're more likely to lose 2 disks in a group of 
50 than in a group of 5.)


   g = number of disks in each group (e.g. mirroring = 2; single-parity = 3 or 
more; dual-parity = 4 or more)
   n = total number of disks
   risk of losing any 1 disk = 1/n


please explain why you are saying that the risk of losing any 1 disk is 
1/n. shouldn't it be probability of failure * n instead?



   risk of losing 1 disk from a particular group = g/n
   risk of losing 2 disks in the same group = g/n * (g-1)/(n-1)
   risk of losing 3 disks in the same group = g/n * (g-1)/(n-1) * (g-2)/(n-2)


following this logic the risk of losing all 48 disks in a single group of 
48 would be 100%


also what you are looking for is the probability of the second (and third) 
disks failing in time X (where X is the time necessary to notice the 
failure, get a replacement, and rebuild the disk)


the killer is the time needed to rebuild the disk, with multi-TB arrays 
it's sometimes faster to re-initialize the array and reload from backup 
than it is to do a live rebuild (the kernel.org servers had a RAID failure 
recently and HPA mentioned that it took a week to rebuild the array, but 
it would have only taken a couple of days to do a restore from backup)


add to this the fact that disk failures do not appear to be truly 
independent from each other statistically (see the recent studies released 
by Google and CMU), and I wouldn't bother with single-parity for a 
multi-TB array. If the data is easy to recreate (including from backup) or 
short lived (say a database of log data that cycles every month or so) I 
would just do RAID-0 and plan on losing the data on drive failure (this 
assumes that you can afford the loss of service when this happens). if the 
data is more important then I'd do dual-parity or more, along with a hot 
spare so that the rebuild can start as soon as the first failure is 
noticed by the system to give myself a fighting chance to save things.



In terms of performance, I think RAID 10 should always be best for write 
speed.  (Since it doesn't calculate parity, writing a new block doesn't 
require reading the rest of the RAID stripe just to recalculate the 
parity bits.)  I think it's also normally just as fast for reading, 
since the controller can load-balance the pending read requests to both 
sides of each mirror.


this depends on your write pattern. if you are doing sequential writes 
(say writing a log archive) then RAID 5 can be faster than RAID 10. since 
there is no data there to begin with, the system doesn't have to read 
anything to calculate the parity, and with the data spread across more 
spindles you have a higher potential throughput.


if your write pattern is more random, and especially if you are 
overwriting existing data, then the reads needed to calculate the parity 
will slow you down.


as for read speed, it all depends on your access pattern and stripe size. 
if you are reading data that spans disks (larger than your stripe size) 
you end up with a single read tying up multiple spindles. with RAID 1 
(and variants) you can read from either disk of the set if you need 
different data within the same stripe that's on different disk tracks (if 
it's on the same track you'll get it just as fast reading from a single 
drive, or so close to it that it doesn't matter). beyond that the question 
is how many spindles you can keep busy reading (as opposed to seeking to 
new data or sitting idle because you don't need their data).


the worst case for reading is to be jumping through your data in strides 
of stripe size * number of disks available (accounting for RAID type), as 
all your reads will end up hitting the same disk.


David Lang
