Re: [PERFORM] Postgres low end processing.

2003-10-07 Thread Stef
Hi again all,

I've tested postgres 7.3.4 on Linux version 2.4.17 
and this is what I found:

The initial instance took up 8372K, and this fluctuated
between roughly 8372K and 10372K, plus about 3500K for
every connection.

I did quite a few transactions on both connections, plus
a few vacuums and a pg_dump, and the total memory usage
didn't seem to go over 16M.

I set all the _buffers, _mem and _fsm settings to the minimum,
restarting every time, but this caused no noticeable
increase or decrease in total memory usage.

(I used a program called gmemusage to get these stats.)
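For reference, here is roughly what those minimal settings look like in a
7.3-era postgresql.conf. This is an illustrative low-memory sketch, not a
recommendation; the exact minimums depend on the version:

```
# postgresql.conf -- illustrative near-minimum memory settings (7.3-era)
max_connections = 4        # each backend adds a few MB of its own
shared_buffers = 16        # 8KB pages; close to the allowed minimum
sort_mem = 64              # KB per in-memory sort/hash step
vacuum_mem = 1024          # KB used by VACUUM
max_fsm_pages = 1000       # free space map: pages tracked
max_fsm_relations = 10     # free space map: relations tracked
wal_buffers = 4            # 8KB WAL buffer pages
```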

On the same machine, I tested postgres 7.1.2 with basically
the same conf options (minus the _fsm settings) and got the following:

The initial instance was 1772K and fluctuated up to about 4000K,
plus about 3400K for every connection.

Doing the same transactions, vacuum + pg_dump, total
memory usage didn't really go over 11M, 
which was exactly what I needed. 

Although I've lived through some of the shortcomings of
7.1.2, it is still very stable, and works perfectly for
what it is going to be used for.

Again, here, I was only able to restrict things a little
by changing the configuration options; there was no major
difference in memory usage.

Regards
Stef

On Mon, 6 Oct 2003 09:55:51 +0200
Stef <[EMAIL PROTECTED]> wrote:

=> Thanks for the replies,
=> 
=> On Fri, 3 Oct 2003 11:08:48 -0700
=> Josh Berkus <[EMAIL PROTECTED]> wrote:
=> => 1. Make sure that the WAL files (pg_xlog) are on a separate disk from the 
=> => database files, either through mounting or symlinking.
=>  
=> I'm not sure I understand how this helps?
=> 
=> => 2. Tweak the .conf file for low vacuum_mem (1024?), but vacuum very 
=> => frequently, like every 1-5 minutes.  Spend some time tuning your 
=> => fsm_max_pages to the ideal level so that you're not allocating any extra 
=> => memory to the FSM.
=> =>
=> => 3. If your concern is *average* CPU/RAM consumption, and not peak load 
=> => activity, increase wal_files and checkpoint_segments to do more efficient 
=> => batch processing of pending updates at the cost of some disk space.  If peak 
=> => load activity is a problem, don't do this.
=> => 
=> => 4. Tune all of your queries carefully to avoid anything requiring a 
=> => RAM-intensive merge join or CPU-eating calculated expression hash join, or 
=> => similar computation-or-RAM-intensive operations.
=> 
=> Thanks, I'll try some of these, and post the results.
=> The actual machines seem to be Pentium I machines
=> with 32M RAM. I've gathered that it is theoretically 
=> possible, so now to go and try it.
=> 
=> Regards
=> Stef
=> 


pgp0.pgp
Description: PGP signature


Re: [PERFORM] Postgres low end processing.

2003-10-07 Thread Jeff
On Tue, 7 Oct 2003, Stef wrote:

> The initial instance took up 8372K and this fluctuated
> between +- 8372K  and 10372K, plus +- 3500K for
> every connection.
>

Does that include or exclude the size of, say, shared code & libraries?
I know Linux does copy-on-write forking, so the real per-connection cost
may be less...
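(On recent Linux kernels, /proc/<pid>/smaps answers exactly this question: its
Pss field charges each shared page 1/N to each of the N processes mapping it.
A small sketch, assuming the usual smaps text format; Linux-only:)

```python
# Sketch: sum the "Pss:" lines of a Linux /proc/<pid>/smaps dump.
# Pss (proportional set size) splits each shared page evenly among the
# processes mapping it, so summing Pss across backends avoids
# double-counting shared buffers and shared libraries.

def total_pss_kb(smaps_text: str) -> int:
    """Return the total Pss in kB from the text of a smaps file."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Pss:"):
            # Lines look like: "Pss:                 123 kB"
            total += int(line.split()[1])
    return total

if __name__ == "__main__":
    import os
    if os.path.exists("/proc/self/smaps"):  # only present on Linux
        with open("/proc/self/smaps") as f:
            print(total_pss_kb(f.read()), "kB Pss for this process")
```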

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] TPC-R benchmarks

2003-10-07 Thread Timothy D. Witham

On Thu, 2003-09-25 at 14:32, Jenny Zhang wrote:
> I am running TPC-H with scale factor of 1 on RedHat7.2 with the kernel
> 2.5.74.  Q17 can always finish in about 7 seconds on my system.  The
> execution plan is:

  I just want to point out that we at the OSDL are not running
an official TPC benchmark of any kind.  We have fair-use
implementations of these benchmarks, but because of implementation
differences our performance tests cannot be compared with official
TPC benchmark results.

> 
>  Aggregate  (cost=780402.43..780402.43 rows=1 width=48)
>->  Nested Loop  (cost=0.00..780397.50 rows=1973 width=48)
>  Join Filter: ("inner".l_quantity < (subplan))
>  ->  Seq Scan on part  (cost=0.00..8548.00 rows=197 width=12)
>Filter: ((p_brand = 'Brand#31'::bpchar) AND (p_container
> = 'LG CASE'::bpchar))
>  ->  Index Scan using i_l_partkey on lineitem 
> (cost=0.00..124.32 rows=30 width=36)
>Index Cond: ("outer".p_partkey = lineitem.l_partkey)
>  SubPlan
>->  Aggregate  (cost=124.40..124.40 rows=1 width=11)
>  ->  Index Scan using i_l_partkey on lineitem 
> (cost=0.00..124.32 rows=30 width=11)
>Index Cond: (l_partkey = $0)
> (11 rows)
> 
> Hope this helps,
> Jenny
> On Thu, 2003-09-25 at 12:40, Oleg Lebedev wrote:
> > I am running TPC-R benchmarks with a scale factor of 1, which corresponds
> > to approximately 1 GB database size, on PostgreSQL 7.3.4 installed on
> > CygWin on Windows XP. I dedicated 128 MB of shared memory to my postgres
> > installation.
> > Most of the queries were able to complete in a matter of minutes, but
> > query 17 was taking hours and hours. The query is shown below. Is there
> > any way to optimize it?
> >  
> > select
> >  sum(l_extendedprice) / 7.0 as avg_yearly
> > from
> >  lineitem,
> >  part
> > where
> >  p_partkey = l_partkey
> >  and p_brand = 'Brand#11'
> >  and p_container = 'SM PKG'
> >  and l_quantity < (
> >   select
> >0.2 * avg(l_quantity)
> >   from
> >lineitem
> >   where
> >l_partkey = p_partkey
> >  );
> >  
> > Thanks.
> >  
> > Oleg
> > 
> 
> 
-- 
Timothy D. Witham - Lab Director - [EMAIL PROTECTED]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton OR, 97005
(503)-626-2455 x11 (office)(503)-702-2871 (cell)
(503)-626-2436 (fax)


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly
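For what it's worth, Oleg's Q17 recomputes the correlated per-part average for
every candidate row. A manual rewrite that aggregates once per part and joins
against the result is a common workaround (not something proposed in this
thread). A self-contained sketch on SQLite, with made-up data, showing the two
forms return the same answer:

```python
# Sketch: TPC-H Q17's correlated subquery vs. an equivalent rewrite that
# pre-aggregates avg(l_quantity) per part and joins once.  Table and
# column names follow the query in the thread; the data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (p_partkey INTEGER, p_brand TEXT, p_container TEXT);
    CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL,
                           l_extendedprice REAL);
    INSERT INTO part VALUES (1, 'Brand#11', 'SM PKG'), (2, 'Brand#11', 'SM PKG'),
                            (3, 'Brand#99', 'LG BOX');
    INSERT INTO lineitem VALUES (1, 1, 100.0), (1, 10, 900.0),
                                (2, 2, 150.0), (2, 8, 600.0), (3, 5, 50.0);
""")

correlated = """
    SELECT sum(l_extendedprice) / 7.0
    FROM lineitem, part
    WHERE p_partkey = l_partkey
      AND p_brand = 'Brand#11' AND p_container = 'SM PKG'
      AND l_quantity < (SELECT 0.2 * avg(l2.l_quantity)
                        FROM lineitem l2 WHERE l2.l_partkey = p_partkey)
"""

rewritten = """
    SELECT sum(l_extendedprice) / 7.0
    FROM lineitem
    JOIN part ON p_partkey = l_partkey
    JOIN (SELECT l_partkey AS k, 0.2 * avg(l_quantity) AS threshold
          FROM lineitem GROUP BY l_partkey) agg ON agg.k = l_partkey
    WHERE p_brand = 'Brand#11' AND p_container = 'SM PKG'
      AND l_quantity < threshold
"""

a = conn.execute(correlated).fetchone()[0]
b = conn.execute(rewritten).fetchone()[0]
assert a == b  # same answer, but the average is computed once per part
```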


Re: [PERFORM] TPC-R benchmarks

2003-10-07 Thread Josh Berkus
Tom,

I've found the problem with TPC-R query #19.  And it, unfortunately, appears 
to be a problem in the PostgreSQL query planner.

To sum up the below:  it appears that whenever a set of WHERE conditions 
exceeds a certain level of complexity, the planner just ignores all 
applicable indexes and goes for a seq scan.   While this may be unavoidable 
to some degree, it seems to me that we need to raise the threshold of 
complexity at which it does this.

tpcr=# select version();
                                   version
--------------------------------------------------------------------------------
 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030226 
(prerelease) (SuSE Linux)
(1 row)

I've tested a number of indexes on the query, and found the two most efficient 
on subsets of the query.  Thus:

explain analyze
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 8 and l_quantity <= 8 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=10380.70..10380.70 rows=1 width=30) (actual 
time=161.61..161.61 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..10380.67 rows=13 width=30) (actual 
time=81.54..161.47 rows=17 loops=1)
 ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=62 
width=4) (actual time=81.21..137.24 rows=98 loops=1)
   Index Cond: (p_brand = 'Brand#33'::bpchar)
   Filter: (((p_container = 'SM CASE'::bpchar) OR (p_container = 
'SM BOX'::bpchar) OR (p_container = 'SM PACK'::bpchar) OR (p_container = 'SM 
PKG'::bpchar)) AND (p_size >= 1) AND (p_size <= 5))
 ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.84 
rows=1 width=26) (actual time=0.22..0.24 rows=0 loops=98)
   Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND 
(lineitem.l_quantity >= 8::numeric) AND (lineitem.l_quantity <= 18::numeric))
   Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR 
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 161.71 msec



This also works for a similar query:

explain analyze
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#52'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 14 and l_quantity <= 14 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);

 Aggregate  (cost=11449.36..11449.36 rows=1 width=30) (actual 
time=195.72..195.72 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11449.29 rows=28 width=30) (actual 
time=56.42..195.39 rows=48 loops=1)
 ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=139 
width=4) (actual time=56.15..153.17 rows=166 loops=1)
   Index Cond: (p_brand = 'Brand#52'::bpchar)
   Filter: (((p_container = 'MED BAG'::bpchar) OR (p_container = 
'MED BOX'::bpchar) OR (p_container = 'MED PKG'::bpchar) OR (p_container = 
'MED PACK'::bpchar)) AND (p_size >= 1) AND (p_size <= 10))
 ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.29 
rows=1 width=26) (actual time=0.23..0.25 rows=0 loops=166)
   Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND 
(lineitem.l_quantity >= 14::numeric) AND (lineitem.l_quantity <= 
24::numeric))
   Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR 
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 195.82 msec
(9 rows)


If, however, I combine the two where clauses with an OR, the planner gets 
confused and insists on loading the entire tables into memory (even though I 
don't have that much memory):

explain
select
sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#33'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 8 and l_quantity <= 8 + 10
and p_size between 1 and 5
a
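(The archive truncates Josh's message just above.) A workaround often used for
exactly this Q19 shape, when the planner handles the ORed branches badly, is to
split the OR into a UNION ALL of the two index-friendly queries; that is valid
here because the branches are disjoint on p_brand. A sketch on SQLite with toy
data (not the thread's actual fix):

```python
# Sketch: replacing "branch1 OR branch2" with a UNION ALL of two queries,
# safe here because the branches match different brands, so no row can
# satisfy both and be double-counted.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (brand TEXT, qty INTEGER, price REAL);
    INSERT INTO t VALUES ('Brand#33', 10, 100.0), ('Brand#33', 99, 5.0),
                         ('Brand#52', 20, 200.0), ('Brand#11', 15, 999.0);
""")

or_form = """
    SELECT sum(price) FROM t
    WHERE (brand = 'Brand#33' AND qty BETWEEN 8 AND 18)
       OR (brand = 'Brand#52' AND qty BETWEEN 14 AND 24)
"""

union_form = """
    SELECT sum(s) FROM (
        SELECT sum(price) AS s FROM t
        WHERE brand = 'Brand#33' AND qty BETWEEN 8 AND 18
        UNION ALL
        SELECT sum(price) FROM t
        WHERE brand = 'Brand#52' AND qty BETWEEN 14 AND 24
    )
"""

# Both forms produce the same total; the UNION ALL form lets each branch
# use its own index scan instead of one combined (possibly seq-scan) plan.
assert conn.execute(or_form).fetchone()[0] == conn.execute(union_form).fetchone()[0]
```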

Re: [PERFORM] TPC-R benchmarks

2003-10-07 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes:
> To sum up the below:  it appears that whenever a set of WHERE conditions 
> exceeds a certain level of complexity, the planner just ignores all 
> applicable indexes and goes for a seq scan.

It looks to me like the planner is coercing the WHERE clause into
canonical OR-of-ANDs form (DNF).  Which is often a good heuristic
but it seems unhelpful for this query.
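Tom's point can be illustrated mechanically: distributing AND over OR to reach
OR-of-ANDs multiplies disjunct counts, so each extra ORed factor multiplies the
clause list again. A toy sketch, with sets of literal strings standing in for
WHERE conditions (not the planner's actual representation):

```python
# Sketch: canonicalizing a boolean expression into OR-of-ANDs (DNF).
# A DNF is a list of disjuncts; each disjunct is a frozenset of literals.
# ANDing two DNFs is a cross product of their disjuncts -- that product
# is where the combinatorial blowup comes from.
from itertools import product

def lit(name):   # a bare condition, e.g. "p_brand = 'Brand#33'"
    return [frozenset([name])]

def OR(*dnfs):   # OR of DNFs just concatenates their disjunct lists
    return [d for dnf in dnfs for d in dnf]

def AND(*dnfs):  # AND of DNFs multiplies them out
    return [frozenset().union(*combo) for combo in product(*dnfs)]

# (a AND b) OR (c AND d) is already DNF: 2 disjuncts.
two_branches = OR(AND(lit("a"), lit("b")), AND(lit("c"), lit("d")))
assert len(two_branches) == 2

# ANDing it with another OR pair cross-multiplies: 2 * 2 = 4 disjuncts,
# and every further ORed factor doubles the count again.
crossed = AND(two_branches, OR(lit("e"), lit("f")))
assert len(crossed) == 4
```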

> How can we fix this?

Feel free to propose improvements to the heuristics in
src/backend/optimizer/prep/prepqual.c ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend