[PERFORM] Suggestions for benchmarking 7.4RC2 against 7.3

2003-11-11 Thread Rajesh Kumar Mallah
Hi,

I plan to put 7.4-RC2 on our production servers in the next few hours.

Since the hardware config & the performance-related GUC parameters
are going to remain the same, I am interested in seeing the performance
improvements in 7.4 as compared to 7.3.
For this I plan to use OSDB 0.14 and compare the results for both
cases.
Does anyone have suggestions for comparing 7.4 against 7.3?
Since I am using OSDB for only the second time, any tips/guidance
on its usage are also appreciated.


H/W config:

CPU: 4 X Intel(R) Xeon(TM) CPU 2.00GHz
MEM : 2 GB
I/O config : PGDATA on 1 RPM Ultra160 SCSI, pg_xlog on a similar
separate SCSI disk
GUC:
shared_buffers = 1
max_fsm_relations = 5000
max_fsm_pages = 55099264
sort_mem = 16384
vacuum_mem = 8192
All other performance-related parameters have their default
values, e.g.:
#effective_cache_size = 1000    # typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)


BTW, I get the following error at the moment:
-
/usr/local/bin/osdb-pg-ui --postgresql=no_hash_index
"osdb"
"Invoked: /usr/local/bin/osdb-pg-ui --postgresql=no_hash_index"
create_tables()                   0.78 seconds   return value = 0
load()                            1.02 seconds   return value = 0
create_idx_uniques_key_bt()       0.64 seconds   return value = 0
create_idx_updates_key_bt()       0.61 seconds   return value = 0
create_idx_hundred_key_bt()       0.61 seconds   return value = 0
create_idx_tenpct_key_bt()        0.62 seconds   return value = 0
create_idx_tenpct_key_code_bt()   0.45 seconds   return value = 0
create_idx_tiny_key_bt()          0.46 seconds   return value = 0
create_idx_tenpct_int_bt()        0.46 seconds   return value = 0
create_idx_tenpct_signed_bt()     0.45 seconds   return value = 0
create_idx_uniques_code_h()       0.46 seconds   return value = 0
create_idx_tenpct_double_bt()     0.46 seconds   return value = 0
create_idx_updates_decim_bt()     0.45 seconds   return value = 0
create_idx_tenpct_float_bt()      0.46 seconds   return value = 0
create_idx_updates_int_bt()       0.46 seconds   return value = 0
create_idx_tenpct_decim_bt()      0.46 seconds   return value = 0
create_idx_hundred_code_h()       0.45 seconds   return value = 0
create_idx_tenpct_name_h()        0.46 seconds   return value = 0
create_idx_updates_code_h()       0.46 seconds   return value = 0
create_idx_tenpct_code_h()        0.45 seconds   return value = 0
create_idx_updates_double_bt()    0.46 seconds   return value = 0
create_idx_hundred_foreign()      0.41 seconds   return value = 0
populateDataBase()               11.54 seconds   return value = 0
Error in test Counting tuples at (6746)osdb.c:294:
... empty database -- empty results
perror() reports: Resource temporarily unavailable
someone sighup'd the parent

Any clue?

--

Regards
Mallah.




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Suggestions for benchmarking 7.4RC2 against 7.3

2003-11-11 Thread Rajesh Kumar Mallah


the error mentioned in the first email has been overcome
by running osdb on the same machine hosting the DB server.
regds
mallah.


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Greg Stark

Dennis Bjorklund <[EMAIL PROTECTED]> writes:

> On Mon, 10 Nov 2003, Marc G. Fournier wrote:
> 
> > 
> > explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> > FROM company c, traffic_logs ts
> >WHERE c.company_id = ts.company_id
> >  AND month_trunc(ts.runtime) = '2003-10-01'
> > GROUP BY company_name,ts.company_id;

So, depending on how much work you're willing to do, there are some more
dramatic speedups you could get:

Use partial indexes like this (you'll need one for every month):

create index i on traffic_log (company_id) 
 where month_trunc(runtime) = '2003-10-01'

then group by company_id only so it can use the index:

select *
  from company
  join (
    select company_id, sum(bytes) as total_traffic
      from traffic_log
     where month_trunc(runtime) = '2003-10-01'
     group by company_id
  ) as x using (company_id)
  order by company_name



Actually you might be able to get the same effect using function indexes like:

create index i on traffic_log (month_trunc(runtime), company_id)
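[Editor's sketch] The two-step rewrite above (a partial index per month, plus grouping by company_id in a subquery that is joined back only to attach the name) can be illustrated end-to-end. This is a hypothetical, self-contained sketch using SQLite with invented data; table and column names follow the thread, and the partial-index predicate is spelled as a date range since SQLite has no month_trunc():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE company (company_id INTEGER PRIMARY KEY, company_name TEXT);
CREATE TABLE traffic_log (company_id INTEGER, runtime TEXT, bytes INTEGER);
INSERT INTO company VALUES (1, 'acme'), (2, 'globex');
INSERT INTO traffic_log VALUES
  (1, '2003-10-05', 100), (1, '2003-10-20', 250),
  (2, '2003-10-11', 400), (2, '2003-09-30', 999);
-- One partial index per month of interest (in PostgreSQL the predicate
-- would be month_trunc(runtime) = '2003-10-01').
CREATE INDEX i ON traffic_log (company_id)
  WHERE runtime >= '2003-10-01' AND runtime < '2003-11-01';
""")

# Group by company_id in a subquery (so the index is usable), then join
# back to company just to attach the name.
rows = cur.execute("""
SELECT company_name, x.total_traffic
  FROM company
  JOIN (SELECT company_id, SUM(bytes) AS total_traffic
          FROM traffic_log
         WHERE runtime >= '2003-10-01' AND runtime < '2003-11-01'
         GROUP BY company_id) AS x USING (company_id)
 ORDER BY company_name
""").fetchall()
print(rows)  # [('acme', 350), ('globex', 400)]
```

Note how the September row is excluded by the month predicate; the sum only covers October traffic.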


-- 
greg




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Marc G. Fournier

On Tue, 11 Nov 2003, Greg Stark wrote:

> Actually you might be able to get the same effect using function indexes
> like:
>
> create index i on traffic_log (month_trunc(runtime), company_id)

had actually thought of that one ... is it something that is only
available in v7.4?

ams=# create index i on traffic_logs ( month_trunc(runtime), company_id );
ERROR:  parser: parse error at or near "," at character 54




Re: [PERFORM] Suggestions for benchmarking 7.4RC2 against 7.3

2003-11-11 Thread Christopher Browne
After a long battle with technology, [EMAIL PROTECTED] (Rajesh Kumar
Mallah), an earthling, wrote:
> the error mentioned in first email has been overcome
> by running osdb on the same machine hosting the DB server.

Yes, it seems unrealistic to try to run the "client" on a separate
host from the database.  

I got the osdb benchmark running last week, and had to separate client
from server.  I had to jump through a fair number of hoops including
copying data files over to the server.  The benchmark software needs a
bit more work...
-- 
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/lsf.html
Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.



Re: [PERFORM] Suggestions for benchmarking 7.4RC2 against 7.3

2003-11-11 Thread Josh Berkus
Rajesh, Chris,

> I got the osdb benchmark running last week, and had to separate client
> from server.  I had to jump through a fair number of hoops including
> copying data files over to the server.  The benchmark software needs a
> bit more work...

What about the OSDL's TPC-derivative benchmarks?   That's a much more 
respected database test, and probably less buggy than OSDB.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Josh Berkus
marc,

> had actually thought of that one ... is it something that is only
> available in v7.4?

Yes.  New feature.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Marc G. Fournier


On Tue, 11 Nov 2003, Dennis Bjorklund wrote:

> On Mon, 10 Nov 2003, Marc G. Fournier wrote:
>
> >
> > explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> > FROM company c, traffic_logs ts
> >WHERE c.company_id = ts.company_id
> >  AND month_trunc(ts.runtime) = '2003-10-01'
> > GROUP BY company_name,ts.company_id;
>
> What if you do
>
>   ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
>
> and add an index like (runtime, company_name, company_id)?

Good thought, but even simplifying it to the *lowest* query possible, with
no table joins, is painfully slow:

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
   WHERE month_trunc(ts.runtime) = '2003-10-01'
GROUP BY ts.company_id;


QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=31630.84..31693.05 rows=829 width=16) (actual time=14862.71..26552.39 rows=144 loops=1)
   ->  Group  (cost=31630.84..31672.31 rows=8295 width=16) (actual time=9634.28..20967.07 rows=462198 loops=1)
         ->  Sort  (cost=31630.84..31651.57 rows=8295 width=16) (actual time=9634.24..12838.73 rows=462198 loops=1)
               Sort Key: company_id
               ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..31090.93 rows=8295 width=16) (actual time=0.26..6043.35 rows=462198 loops=1)
                     Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 26659.35 msec
(7 rows)



-OR-

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
   WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
GROUP BY ts.company_id;


QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=81044.53..84424.21 rows=45062 width=16) (actual time=13307.52..29274.66 rows=144 loops=1)
   ->  Group  (cost=81044.53..83297.65 rows=450625 width=16) (actual time=10809.02..-673265.13 rows=462198 loops=1)
         ->  Sort  (cost=81044.53..82171.09 rows=450625 width=16) (actual time=10808.99..14069.79 rows=462198 loops=1)
               Sort Key: company_id
               ->  Seq Scan on traffic_logs ts  (cost=0.00..38727.35 rows=450625 width=16) (actual time=0.07..6801.92 rows=462198 loops=1)
                     Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without time zone) AND (runtime < '2003-11-01 00:00:00'::timestamp without time zone))
 Total runtime: 29385.97 msec
(7 rows)


Just as a side note, just doing a straight scan for the records, with no
SUM()/GROUP BY involved, with the month_trunc() index is still >8k msec:

   QUERY PLAN
------------------------------------------------------------------------------------------
 Index Scan using tl_month on traffic_logs ts  (cost=0.00..31096.36 rows=8297 width=16) (actual time=0.96..5432.93 rows=462198 loops=1)
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 8092.88 msec
(3 rows)

and without the index, >15k msec:

   QUERY PLAN
------------------------------------------------------------------------------------------
 Seq Scan on traffic_logs ts  (cost=0.00..38719.55 rows=8297 width=16) (actual time=0.11..11354.45 rows=462198 loops=1)
   Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 15353.57 msec
(3 rows)

so the GROUP BY is affecting the overall time, but even without it, it's
still taking a helluva long time ...

I'm going to modify my load script so that it dumps monthly totals to
traffic_logs, and 'details' to a schema.traffic_logs table ... I don't
need the 'per day totals' at the top level at all, only speed ... the 'per
day totals' are only required at the 'per client' level, and moving the
'per day' data into a client schema will shrink the table significantly ...

If it wasn't for trying to pull in that 'whole month' summary, it would be
fine :(
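[Editor's sketch] The restructuring described above, rolling per-day detail up into a monthly summary table so the whole-month report touches a handful of rows instead of ~462k, might look roughly like this. Table names are hypothetical and SQLite is used purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE traffic_detail  (company_id INTEGER, day TEXT, bytes INTEGER);
CREATE TABLE traffic_monthly (company_id INTEGER, month TEXT, bytes INTEGER,
                              PRIMARY KEY (company_id, month));
INSERT INTO traffic_detail VALUES
  (1, '2003-10-05', 100), (1, '2003-10-20', 250), (2, '2003-10-11', 400);
""")

# Load step: roll per-day detail up into one row per company per month.
cur.execute("""
INSERT INTO traffic_monthly
SELECT company_id, substr(day, 1, 7) AS month, SUM(bytes)
  FROM traffic_detail
 GROUP BY company_id, substr(day, 1, 7)
""")

# The whole-month report now scans a few summary rows, not the detail table.
totals = cur.execute("""
SELECT company_id, bytes FROM traffic_monthly
 WHERE month = '2003-10' ORDER BY company_id
""").fetchall()
print(totals)  # [(1, 350), (2, 400)]
```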



Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Greg Stark
"Marc G. Fournier" <[EMAIL PROTECTED]> writes:

> On Tue, 11 Nov 2003, Greg Stark wrote:
> 
> > Actually you might be able to get the same effect using function indexes
> > like:
> >
> > create index i on traffic_log (month_trunc(runtime), company_id)
> 
> had actually thought of that one ... is it something that is only
> available in v7.4?

Hum, I thought you could do simple functional indexes like that in 7.3, but
perhaps only single-column indexes.

In any case, given your situation I would seriously consider putting a
"month" integer column on your table anyways. Then your index would be a
simple (month, company_id) index.

-- 
greg




Re: [PERFORM] Suggestions for benchmarking 7.4RC2 against 7.3

2003-11-11 Thread Rajesh Kumar Mallah




Josh Berkus wrote:

> > I got the osdb benchmark running last week, and had to separate client
> > from server.  I had to jump through a fair number of hoops including
> > copying data files over to the server.  The benchmark software needs a
> > bit more work...
>
> What about the OSDL's TPC-derivative benchmarks?   That's a much more
> respected database test, and probably less buggy than OSDB.

Hmm... really sorry! My pg_dump | psql run is almost finished, in the
next 20 mins. Creating indexes at the moment :)

Really sorry, I can't roll back and delay any more because my
website has been *unavailable* for the past 30 mins.

I ran the OSDB .15 version and pg_bench.

Regds
Mallah.








Re: [PERFORM] Suggestions for benchmarking 7.4RC2 against 7.3

2003-11-11 Thread Rajesh Kumar Mallah






RC2 has been running in production without any apparent problems
so far.  Well, it's difficult to say at the moment how much speed
gain there is until the heavy-duty batch SQL scripts are run by
cron.

Count(*) and GROUP BY on large tables are significantly (5x) faster,
and better error reporting has made it easier to spot faulty data,
e.g. in fkey violations.

Will post the OSDB .15 version's results on 7.3 & 7.4 soon.

Regds
Mallah.








Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread scott.marlowe
On 11 Nov 2003, Greg Stark wrote:

> Hum, I thought you could do simple functional indexes like that in 7.3, but
> perhaps only single-column indexes.
> 
> In any case, given your situation I would seriously consider putting a
> "month" integer column on your table anyways. Then your index would be a
> simple (month, company_id) index.

In 7.3 and before, you had to use only column names as inputs, so you 
could cheat:

alter table test add alp int;
alter table test add omg int;
update test set alp=0;
update test set omg=13;

and then create a functional index:

create index test_xy on test (substr(info,alp,omg));

select * from test where substr(info,alp,omg)='abcd';







[PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread Chris Field
We are getting ready to spec out a new machine and are wondering about
the wisdom of buying a quad versus a dual processor machine, seeing as
how postgres is not a threaded application, and this server will only be
used for log/transaction analysis (it will only ever have a few large
queries running).  Is there any performance to be gained, and if so is
it worth the large cost?  Any thoughts/experience are much
appreciated...




-- 
Chris Field
[EMAIL PROTECTED]
Affinity Solutions Inc.
386 Park Avenue South
Suite 1209
New York, NY 10016
(212) 685-8748 ext. 32




Re: [PERFORM] Server Configs

2003-11-11 Thread Anjan Dave
Dear Gurus,

We are planning to add more db server hardware for the apps. The
question is, what makes more sense regarding
performance/scalability/price of the hardware...

There are a couple of apps, currently on a dual-cpu Dell server. The
usage of the apps is going to increase quite a lot, and considering the
prices, we are looking at the following options:

Option 1:
==
Have each app on a separate db server (looking at 4 of these). The
server being a PowerEdge 2650, Dual 2.8GHz/512KB XEONS, 2GB RAM, PERC-3
RAID-5, split back plane (2+3), and 5 x 36GB HDDs (10K RPM).

Note: These servers are 1/3 the price of the Quad-cpu 6650 server.

Option 2:
==
Have two to three apps dbs hosted on a single server. The server being a
PowerEdge 6650, 4 x 2GHz/1MB XEONS, 8GB RAM, PERC-3 RAID-5, split back
plane (2+3), and 5 x 36GB HDDs (10K RPM).

Note: This server is 3 times the price of option 1.



Appreciate your guidance.

Thanks,
Anjan



Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread Rod Taylor
On Tue, 2003-11-11 at 18:32, Chris Field wrote:
> We are getting ready to spec out a new machine and are wondering about
> the wisdom of buying a quad versus a dual processor machine.  Seing as
> how postgres in not a threaded application, and this server will only be
> used for log/transaction analysis (it will only ever have a few large
> queries running).  Is there any performance to be gained, and if so is
> it worth the large cost?  Any thoughts/experience are much
> appreciated...

Since you're asking the question, I'll assume you don't have CPU
intensive queries or monstrous loads.

I'd probably invest in a Quad system with 2 chips in it (2 empty
sockets) and put the difference in funds into a few extra GB of Ram or
improved IO.

In 6 months or a year, if you start doing longer or more complex
queries, toss in the other 2 chips. So long as you don't hit a memory
limit, it'll be fine.




Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread scott.marlowe
On Tue, 11 Nov 2003, Rod Taylor wrote:

> Since you're asking the question, I'll assume you don't have CPU
> intensive queries or monstrous loads.
> 
> I'd probably invest in a Quad system with 2 chips in it (2 empty
> sockets) and put the difference in funds into a few extra GB of Ram or
> improved IO.
> 
> In 6 months or a year, if you start doing longer or more complex
> queries, toss in the other 2 chips. So long as you don't hit a memory
> limit, it'll be fine.

Note that you want to carefully look at the difference in cost of the 
motherboard versus the CPUs.  It's often the motherboard that raises the 
cost, not the CPUs so much.  Although with Xeons, the CPUs are not cheap.

The second issue is that Intel (and AMD probably) only guarantee proper
performance from chips in the same batch, so you may wind up replacing the
two working CPUs with two new ones to go with the other two you'll be
buying, to make sure that they work together.

My guess is that more CPUs aren't gonna help this problem a lot, so look 
more at fast RAM and lots of it, as well as a fast I/O subsystem.

2 CPUs should be plenty.




Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread Allan Wind
On 2003-11-11T17:40:14-0700, scott.marlowe wrote:
> 2 CPUs should be plenty.

for everyone?  No, I must have been thinking of someone else :-)


/Allan
-- 
Allan Wind
P.O. Box 2022
Woburn, MA 01888-0022
USA




Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread fred
> Note that you want to carefully look at the difference in cost of the
> motherboard versus the CPUs.  It's often the motherboard that raises the
> cost, not the CPUs so much.  Although with Xeons, the CPUs are not cheap.
>
> The second issue is that Intel (and AMD probably) only guarantee proper
> performance from chips int he same batch, so you may wind up replacing the
> two working CPUs with two new ones to go with the other two you'll be
> buying, to make sure that they work together.
>
> My guess is that more CPUs aren't gonna help this problem a lot, so look
> more at fast RAM and lots of it, as well as a fast I/O subsystem.
>
> 2 CPUs should be plenty.
I agree that the additional CPUs won't help much, since I haven't found
any benefits in terms of individual query speed for a quad vs. an SMP dual
on benchmarks I've run on test machines I was considering purchasing.
Quads are also expensive: on similar architectures the quad was 20k vs.
7k for the dual.




Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread Ron Johnson
On Tue, 2003-11-11 at 17:32, Chris Field wrote:
> We are getting ready to spec out a new machine and are wondering about
> the wisdom of buying a quad versus a dual processor machine.  Seing as
> how postgres in not a threaded application, and this server will only be
> used for log/transaction analysis (it will only ever have a few large
> queries running).  Is there any performance to be gained, and if so is
> it worth the large cost?  Any thoughts/experience are much
> appreciated...

Xeon or Opteron?  The faster Opterons *really* blaze, especially
in 64-bit mode.  As others have said, though, RAM and I/O are most
important.

-- 
-
Ron Johnson, Jr. [EMAIL PROTECTED]
Jefferson, LA USA

"As I like to joke, I may have invented it, but Microsoft made it
popular"
David Bradley, regarding Ctrl-Alt-Del 




Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread Chris Field
We are looking at Xeon.  We are currently running it on a quad Sun V880
compiled to be 64-bit and have been getting dreadful performance.  I don't
think we really have much to gain from going 64-bit.


- Original Message - 
From: "Ron Johnson" <[EMAIL PROTECTED]>
To: "PgSQL Performance ML" <[EMAIL PROTECTED]>
Sent: Tuesday, November 11, 2003 8:24 PM
Subject: Re: [PERFORM] Value of Quad vs. Dual Processor machine





Re: [PERFORM] Value of Quad vs. Dual Processor machine

2003-11-11 Thread Fred Moyer
One thing I learned after spending about a week comparing the Athlon (2
ghz, 333 mhz frontside bus) and Xeon (2.4 ghz, 266 mhz frontside bus)
platforms was that on average the select queries I was benchmarking ran
30% faster on the Athlon (this was with data cached in memory so may not
apply to the larger data sets where I/O is the limiting factor.)

I benchmarked against the Opteron 244 when it came out and it came in
about the same as the Athlon (makes sense since both used 333 mhz
memory).  The results were within +/- 5-10% of the Athlon's.  From testing
against a couple of other machines I noticed that the memory bus speeds
were almost directly proportional to the query times under these
conditions.

Not sure how these compare against the quad sun but the AMD chips
returned the select queries faster than the Xeons from the informal
investigations I did.  Definitely try it before you buy it if possible.





Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Greg Stark

"Marc G. Fournier" <[EMAIL PROTECTED]> writes:

> Just as a side note, just doing a straight scan for the records, with no
> SUM()/GROUP BY involved, with the month_trunc() index is still >8k msec:

Well, so the problem isn't the query at all; you just have too much data to
massage online.  You can preprocess the data offline into a more manageable
amount of data for your online reports.

What I used to do for a similar situation was to do hourly queries sort of
like this:

insert into data_aggregate (day, hour, company_id, total_bytes)
 (select trunc(now(),'day'), trunc(now(), 'hour'), company_id, sum(bytes)
from raw_data
   where time between trunc(now(),'hour') and trunc(now(),'hour')+'1 hour'::interval
   group by company_id
 )

[this was actually on oracle and the data looked kind of different, i'm making
this up as i go along]

Then later the reports could run quickly based on data_aggregate instead of
slowly based on the much larger data set accumulated by the minute. Once I had
this schema set up it was easy to follow it for all of the rapidly growing
data tables.

Now in my situation I had thousands of records accumulating per second, so
hourly was already a big win. I originally chose hourly because I thought I
might want time-of-day reports but that never panned out. On the other hand it
was a win when the system broke once because I could easily see that and fix
it before midnight when it would have actually mattered. Perhaps in your
situation you would want daily aggregates or something else.

One of the other advantages of these aggregate tables was that we could purge
the old data much sooner with much less resistance from the business. Since
the reports were all still available and a lot of ad-hoc queries could still
be done without the raw data anyways.

Alternatively you can just give up on online reports. Eventually you'll have
some query that takes way more than 8s anyways. You can pregenerate the entire
report as a batch job instead. Either send it off as a nightly e-mail, store
it as an html or csv file for the web server, or (my favourite) store the data
for the report as an sql table and then have multiple front-ends that do a
simple "select *" to pull the data and format it.
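[Editor's sketch] That last option, pregenerating the report as a table that front-ends read with a plain "select *", can be sketched like so. All names here are hypothetical and SQLite stands in for the real database purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE raw_data (company_id INTEGER, time TEXT, bytes INTEGER);
CREATE TABLE report_total_traffic (company_id INTEGER, total_bytes INTEGER);
INSERT INTO raw_data VALUES
  (1, '2003-10-01 00:05', 10), (1, '2003-10-01 00:30', 20),
  (2, '2003-10-01 00:59', 5);
""")

def rebuild_report(cur):
    """Batch job (e.g. nightly cron): regenerate the report table."""
    cur.execute("DELETE FROM report_total_traffic")
    cur.execute("""
        INSERT INTO report_total_traffic
        SELECT company_id, SUM(bytes) FROM raw_data GROUP BY company_id
    """)

rebuild_report(cur)

# Front-ends only ever run a cheap "select *" against the pregenerated
# table; the expensive aggregation happened in the batch job.
report = cur.execute(
    "SELECT * FROM report_total_traffic ORDER BY company_id").fetchall()
print(report)  # [(1, 30), (2, 5)]
```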

-- 
greg

