Re: [PERFORM] disable archiving

2007-07-24 Thread valgog
On Jul 23, 7:24 pm, [EMAIL PROTECTED] (Paul van den Bogaard)
wrote:
> the manual somewhere states "... if archiving is enabled..." To me
> this implies that archiving can be disabled. However I cannot find
> the parameter to use to get this result. Or should I enable archiving
> and use a backup script like
>
> #!/usr/bin/bash
> exit 0
>
> Would appreciate a hint. And yes I know I put my database in danger
> etc. This is for some benchmarks where I do not want the overhead of
> archiving. Just a file system that will not fill with zillions of
> these 16MB WAL files ;^)
>
> Thanks
> Paul.
>

Is it normal to hijack other threads, or is it a bug?

If it is not a bug, please change the subject of the topic back to
what it was!

With best regards,

Valentine Gogichashvili
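
For reference, the answer to Paul's original question: before 8.3 there is no
separate archive_mode switch; archiving is off whenever archive_command is
left unset (the default), and WAL segments are then recycled rather than
accumulated. A sketch of the relevant postgresql.conf line, under that
assumption:

  # postgresql.conf: an empty (default) archive_command disables archiving,
  # so the dummy exit-0 script is not needed
  archive_command = ''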




Re: [PERFORM] multicolumn index column order

2007-07-24 Thread Valentine Gogichashvili

On 7/24/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


valgog <[EMAIL PROTECTED]> wrote ..
> On Jul 23, 7:00 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> > valgog <[EMAIL PROTECTED]> writes:
> > > how to build a multicolumn index with one column order ASCENDING
> > > and another column order DESCENDING?
> >
> > Use 8.3 ;-)
> >
> > In existing releases you could fake it with a custom reverse-sorting
> > operator class, but it's a pain in the neck to create one.

I've often gotten what I want by using a calculated index on (f1, -f2).
ORDER BY will take an expression, e.g. ORDER BY f1, -f2. Simpler than a
custom operator.




Yes, this is true, but I do not know how to make text ordering reversible.
There is no - (minus) operator for text values. For now it is not a problem
for me, but in theory I do not see any other way to reverse the sort order
of text fields...
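
For text columns the practical answer is the one Tom hints at: 8.3 lets you
declare the sort direction per index column, with no custom operator class
needed. A minimal sketch of the 8.3 syntax (table and column names are
placeholders):

  -- 8.3+: per-column sort order in the index definition itself
  CREATE INDEX t_f1_f2_idx ON t (f1 ASC, f2 DESC);

  -- matches queries of the form:
  SELECT * FROM t ORDER BY f1 ASC, f2 DESC;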


Re: [PERFORM] multicolumn index column order

2007-07-24 Thread andrew
valgog <[EMAIL PROTECTED]> wrote ..
> On Jul 23, 7:00 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> > valgog <[EMAIL PROTECTED]> writes:
> > > how to build a multicolumn index with one column order ASCENDING and
> > > another column order DESCENDING?
> >
> > Use 8.3 ;-)
> >
> > In existing releases you could fake it with a custom reverse-sorting
> > operator class, but it's a pain in the neck to create one.

I've often gotten what I want by using a calculated index on (f1, -f2). ORDER 
BY will take an expression, e.g. ORDER BY f1, -f2. Simpler than a custom 
operator.
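
A sketch of that calculated-index trick for numeric columns (table and column
names are placeholders):

  -- expression index: f2 is negated, so a plain ascending scan
  -- yields rows in (f1 ASC, f2 DESC) order
  CREATE INDEX t_f1_negf2_idx ON t (f1, (-f2));

  -- the ORDER BY must use the same expression for the index to apply
  SELECT * FROM t ORDER BY f1, -f2 LIMIT 20;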



Re: [PERFORM] Query performance issue

2007-07-24 Thread Chris

Jonathan Gray wrote:

We’re experiencing a query performance problem related to the planner
and its ability to perform a specific type of merge.

We have created a test case (as attached, or here:
http://www3.streamy.com/postgres/indextest.sql) which involves a
hypothetical customer ordering system, with customers, orders, and
customer groups.

If we want to retrieve a single customer's 10 most recent orders, sorted
by date, we can use a double index on (customer,date); Postgres’s query
planner will use the double index with a backwards index scan on the
second indexed column (date).

However, if we want to retrieve a “customer class’s” 10 most recent
orders, sorted by date, we are not able to get Postgres to use double
indexes.


You don't have any indexes on the 'customerclass' table.

Creating a foreign key doesn't create an index, you need to do that 
separately.


Try

create index cc_customerid_class on indextest.customerclass(classid, 
customerid);


--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] Query performance issue

2007-07-24 Thread Chris

Chris wrote:

Jonathan Gray wrote:

We’re experiencing a query performance problem related to the planner
and its ability to perform a specific type of merge.

We have created a test case (as attached, or here:
http://www3.streamy.com/postgres/indextest.sql) which involves a
hypothetical customer ordering system, with customers, orders, and
customer groups.

If we want to retrieve a single customer's 10 most recent orders, sorted
by date, we can use a double index on (customer,date); Postgres’s query
planner will use the double index with a backwards index scan on the
second indexed column (date).

However, if we want to retrieve a “customer class’s” 10 most recent
orders, sorted by date, we are not able to get Postgres to use double
indexes.


You don't have any indexes on the 'customerclass' table.

Creating a foreign key doesn't create an index, you need to do that 
separately.


Try

create index cc_customerid_class on indextest.customerclass(classid, 
customerid);




It could also be that since you don't have very much data (10,000 rows), 
postgres is ignoring the indexes because it'll be quicker to scan the 
tables.


If you bump it up to say 100k rows, what happens?

--
Postgresql & php tutorials
http://www.designmagick.com/
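
A sketch of one way to do the 100k-row bump Chris suggests, using
generate_series (the column list for indextest.orders is assumed from the
test case, so adjust it to the real schema):

  -- inflate the orders table to ~100k rows for more realistic plans
  INSERT INTO indextest.orders (customerid, orderstamp)
  SELECT (random() * 1000)::int,
         now() - random() * interval '365 days'
  FROM generate_series(1, 100000);

  -- refresh planner statistics afterwards
  ANALYZE indextest.orders;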



Re: [PERFORM] Query performance issue

2007-07-24 Thread Jonathan Gray
Chris,

Creating indexes on the customerclass table does speed up the queries but
still does not create the plan we are looking for (using the double index
with a backward index scan on the orders table).

The plans we now get, with times on par or slightly better than with the
plpgsql hack, are:

  EXPLAIN ANALYZE
  SELECT o.orderid,o.orderstamp FROM indextest.orders o 
  INNER JOIN indextest.customerclass cc ON (cc.classid = 2) 
  WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 5;

 
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..176.65 rows=5 width=12) (actual time=0.930..3.675 rows=5 loops=1)
   ->  Nested Loop  (cost=0.00..46388.80 rows=1313 width=12) (actual time=0.927..3.664 rows=5 loops=1)
         ->  Index Scan Backward using orders_orderstamp_idx on orders o  (cost=0.00..6225.26 rows=141001 width=16) (actual time=0.015..0.957 rows=433 loops=1)
         ->  Index Scan using customerclass_customerid_idx on customerclass cc  (cost=0.00..0.27 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=433)
               Index Cond: (o.customerid = cc.customerid)
               Filter: (classid = 2)

And

  EXPLAIN ANALYZE
  SELECT o.orderid,o.orderstamp FROM indextest.orders o 
  INNER JOIN indextest.customerclass cc ON (cc.classid = 2) 
  WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 100;

                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1978.80..1979.05 rows=100 width=12) (actual time=6.167..6.448 rows=100 loops=1)
   ->  Sort  (cost=1978.80..1982.09 rows=1313 width=12) (actual time=6.165..6.268 rows=100 loops=1)
         Sort Key: o.orderstamp
         ->  Nested Loop  (cost=3.99..1910.80 rows=1313 width=12) (actual time=0.059..4.576 rows=939 loops=1)
               ->  Bitmap Heap Scan on customerclass cc  (cost=3.99..55.16 rows=95 width=4) (actual time=0.045..0.194 rows=95 loops=1)
                     Recheck Cond: (classid = 2)
                     ->  Bitmap Index Scan on customerclass_classid_idx  (cost=0.00..3.96 rows=95 width=0) (actual time=0.032..0.032 rows=95 loops=1)
                           Index Cond: (classid = 2)
               ->  Index Scan using orders_customerid_idx on orders o  (cost=0.00..19.35 rows=15 width=16) (actual time=0.006..0.025 rows=10 loops=95)
                     Index Cond: (o.customerid = cc.customerid)


As I said, this is a hypothetical test case we have arrived at that
describes our situation as best we can in a simple case.  We're
interested in potential issues with the approach, why postgres would not
attempt something like it, and how we might go about implementing it
ourselves at a lower level than we currently have (in SPI, libpq, etc). 

If it could be generalized then we could use it in cases where we aren't
pulling from just one table (the orders table) but rather trying to merge,
in sorted order, results from different conditions on different tables.
Right now we use something like the plpgsql or plpythonu functions in the
example and they outperform our regular SQL queries by a fairly significant
margin.

An example might be:

SELECT * FROM (
  (SELECT orderid, stamp FROM indextest.orders_usa WHERE customerid = <id>
   ORDER BY stamp DESC LIMIT 5)
  UNION
  (SELECT orderid, stamp FROM indextest.orders_can WHERE customerid = <id>
   ORDER BY stamp DESC LIMIT 5)
) AS x ORDER BY x.stamp DESC

Again, that's a general example, but some of my queries contain between 5 and
10 different sorted joins of this kind, and it would be helpful to have
something internal in postgres to handle this efficiently (do something just
like the above query but without having to do the full LIMIT 5 for each set;
some kind of in-order merge/heap join?)
 
Jonathan Gray

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chris
Sent: Tuesday, July 24, 2007 1:51 AM
To: Jonathan Gray
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance issue

Chris wrote:
> Jonathan Gray wrote:
>> We're experiencing a query performance problem related to the planner 
>> and its ability to perform a specific type of merge.
>>
>>  
>>
>> We have created a test case (as attached, or here: 
>> http://www3.streamy.com/postgres/indextest.sql) which involves a 
>> hypothetical customer ordering system, with customers, orders, and 
>> customer groups.
>>
>>  
>>
>> If we want to retrieve a single customer's 10 most recent orders, 
>> sorted by date, we can use a double index on (customer,date); 
>> Postgres's query planner will use the double index with a backwards 
>> index scan on the second indexed column (date).
>>
>> However, if we want to retrieve a "customer class's" 10 most recent 
>> orders, sorted by date, we are not able to get Postgres to use double 
>> indexes.

Re: [PERFORM] Query performance issue

2007-07-24 Thread Chris

Jonathan Gray wrote:

Chris,

Creating indexes on the customerclass table does speed up the queries but
still does not create the plan we are looking for (using the double index
with a backward index scan on the orders table).


Stupid question - why is that particular plan your "goal" plan?


The plans we now get, with times on par or slightly better than with the
plpgsql hack, are:

  EXPLAIN ANALYZE
  SELECT o.orderid,o.orderstamp FROM indextest.orders o 
  INNER JOIN indextest.customerclass cc ON (cc.classid = 2) 
  WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 5;


Didn't notice this before...

Shouldn't this be:

INNER JOIN indextest.customerclass cc ON (o.customerid = cc.customerid)
WHERE cc.classid = 2

ie join on the common field not the classid one which doesn't appear in 
the 2nd table?



As I said, this is a hypothetical test case we have arrived at that
describes our situation as best as we can given a simple case.  We're
interested in potential issues with the approach, why postgres would not
attempt something like it, and how we might go about implementing it
ourselves at a lower level than we currently have (in SPI, libpq, etc). 


If it could be generalized then we could use it in cases where we aren't
pulling from just one table (the orders table) but rather trying to merge,
in sorted order, results from different conditions on different tables.
Right now we use something like the plpgsql or plpythonu functions in the
example and they outperform our regular SQL queries by a fairly significant
margin.


I'm sure if you posted the queries you are running with relevant info 
you'd get some help ;)


--
Postgresql & php tutorials
http://www.designmagick.com/



Re: [PERFORM] Query performance issue

2007-07-24 Thread Jonathan Gray
That particular plan is our goal because we've "hacked" it together to
perform better than the normal SQL plans.  Analytically it makes sense to
approach this particular problem in this way because it is relatively
invariant to the distributions and sizes of the tables (the only cost being
increased index size).

Also, changing the query around doesn't change the query plan at all.  The
planner is intelligent enough to figure out what it really needs to join on
despite my poor query writing.  I originally had it this way to ensure my
(customerid,orderstamp) conditions were in the correct order, but again it
appears not to matter.

I will try to get a more complex/sophisticated test case running.  I'm not
able to post my actual structure or queries but I'll try to produce a better
example of the other (multiple table) case tomorrow. 

Thanks.

Jonathan Gray


-Original Message-
From: Chris [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 24, 2007 2:36 AM
To: Jonathan Gray
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query performance issue

Jonathan Gray wrote:
> Chris,
> 
> Creating indexes on the customerclass table does speed up the queries but
> still does not create the plan we are looking for (using the double index
> with a backward index scan on the orders table).

Stupid question - why is that particular plan your "goal" plan?

> The plans we now get, with times on par or slightly better than with the
> plpgsql hack, are:
> 
>   EXPLAIN ANALYZE
>   SELECT o.orderid,o.orderstamp FROM indextest.orders o 
>   INNER JOIN indextest.customerclass cc ON (cc.classid = 2) 
>   WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 5;

Didn't notice this before...

Shouldn't this be:

INNER JOIN indextest.customerclass cc ON (o.customerid = cc.customerid)
WHERE cc.classid = 2

ie join on the common field not the classid one which doesn't appear in 
the 2nd table?

> As I said, this is a hypothetical test case we have arrived at that
> describes our situation as best as we can given a simple case.  We're
> interested in potential issues with the approach, why postgres would not
> attempt something like it, and how we might go about implementing it
> ourselves at a lower level than we currently have (in SPI, libpq, etc). 
> 
> If it could be generalized then we could use it in cases where we aren't
> pulling from just one table (the orders table) but rather trying to merge,
> in sorted order, results from different conditions on different tables.
> Right now we use something like the plpgsql or plpythonu functions in the
> example and they outperform our regular SQL queries by a fairly significant
> margin.

I'm sure if you posted the queries you are running with relevant info 
you'd get some help ;)

-- 
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] Postgres configuration for 64 CPUs, 128 GB RAM...

2007-07-24 Thread Marc Mamin
 
Hello,

thank you for all your comments and recommendations.

I'm aware that the conditions for this benchmark are not ideal, mostly
due to the lack of time to prepare it. We will also need an additional
benchmark on a less powerful - more realistic - server to better
understand the scalability of our application.


Our application is based on Java and generates dynamic reports from
log file content. Dynamic means here that a report is calculated
from the Postgres data the first time it is requested (it will then be
cached). Java is used to drive the data preparation and to
handle/generate the report requests.

This is much more an OLAP system than an OLTP one, at least as far as our
performance concerns go.




Data preparation:

1) parsing the log files with a heavy use of perl (regular expressions)
to generate csv files. Prepared statements also maintain reference
tables in the DB. Postgres performance is not an issue for this first
step.

2) loading the csv files with COPY. As around 70% of the data to load
go into a single daily table, we don't allow concurrent jobs for this
step. We have between a few and a few hundred files to load into a
single table; they are processed one after the other. A primary key is
always defined; for the case when the required indexes are already built
and the new data are above a given size, we use a "shadow" table
instead (without the indexes), build the indexes after the import,
and then replace the live table with the shadow one.
For example, we have a table of 13 GB + 11 GB of indexes (5 of them).
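
A minimal sketch of that shadow-table swap (all object, column, and file
names here are placeholders, not the real schema):

  -- build an index-free copy of the live table's structure
  CREATE TABLE daily_shadow (LIKE daily INCLUDING DEFAULTS);

  -- the bulk load is fast because no indexes have to be maintained
  COPY daily_shadow FROM '/data/load/daily.csv' WITH CSV;

  -- build the primary key and indexes once, after the data is in
  ALTER TABLE daily_shadow ADD PRIMARY KEY (id);
  CREATE INDEX daily_shadow_ts_idx ON daily_shadow (ts);

  -- swap the tables in a single transaction
  BEGIN;
  ALTER TABLE daily RENAME TO daily_old;
  ALTER TABLE daily_shadow RENAME TO daily;
  COMMIT;
  DROP TABLE daily_old;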

Performances :

a) is there an "ideal" size to consider for our csv files (100 x 10 MB,
or better 1 x 1 GB)?
b) maintenance_work_mem: I'll use around 1 GB as recommended by Stefan

3) Data aggregation. This is the heaviest part for Postgres. On our
current system some queries need above one hour, with phases of around
100% cpu use alternating with times of heavy i/o load when temporary
results are written/read to disk (pgsql_tmp). During the
aggregation, other postgres activities are low (at least they should be) as
this should take place at night. Currently we have a locking mechanism
to avoid having more than one such query running concurrently. This
may be too strict for the benchmark server but better reflects our current
hardware capabilities.

Performances : Here we should favor a single huge transaction and
assume a low probability of another transaction requiring large
sort space at the same time. Considering this, is it reasonable to set
work_mem to 3 GB? (I guess I should raise this parameter dynamically
before running the aggregation queries.)
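
Raising it per session rather than globally might look like this (a sketch;
the 3 GB figure is the value under discussion, not a recommendation, and the
unit suffix needs 8.2 or later - older releases take the value in KB):

  -- applies only to the current session; other connections keep the default
  SET work_mem = '3GB';

  -- ... run the aggregation queries ...

  RESET work_mem;  -- back to the server default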

4) Queries (report generation)

We have only a few queries which are not yet satisfactory; they require large
sort operations. The data are structured in different aggregation levels
(minutes, hours, days) with logical time-based partitions in order to
limit the data size to compute for a given report. Moreover we can scale
our infrastructure by using different or dedicated Postgres servers
for different customers. Smaller customers may share a single instance,
each of them having its own schema (the lock mechanism for large
aggregations applies to a whole Postgres instance, not to a single
customer). The benchmark will help us to plan such distribution.

During the benchmark, we will probably not have more than 50 non-idle
connections simultaneously. It is a bit too early for us to fine-tune
this part. The benchmark will mainly focus on steps 1 to 3.

During the benchmark, the DB will reach a size of about 400 GB,
simulating 3 different customers, with the data split roughly equally
across 3 schemas.



I will post our configuration(s) later on.



Thanks again for all your valuable input.

Marc Mamin



[PERFORM] Table Statistics with pgAdmin III

2007-07-24 Thread Campbell, Lance
I have installed pgAdmin III 1.6.  In the tool when you click on a
particular table you can select a tab called "Statistics".  This tab has
all kinds of info on your table.  For some reason the only info I see is
for table size, toast table size and indexes size.  Is there a reason
that the other 15 fields have zeros in them?  I was thinking that maybe
I needed to turn on a setting within my database in order to get
statistics reported.

 

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu



Re: [PERFORM] Table Statistics with pgAdmin III

2007-07-24 Thread Jean-Max Reymond

Campbell, Lance wrote:
I have installed pgAdmin III 1.6.  In the tool when you click on a 
particular table you can select a tab called “Statistics”.  This tab has 
all kinds of info on your table.  For some reason the only info I see is 
for table size, toast table size and indexes size.  Is there a reason 
that the other 15 fields have zeros in them?  I was thinking that maybe 
I needed to turn on a setting within my database in order to get 
statistics reported.


it seems that the module pgstattuple is needed

--
Jean-Max Reymond
CKR Solutions http://www.ckr-solutions.com
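
For reference, once the contrib module Jean-Max mentions is installed, its
output can also be queried directly (a sketch; the table name is a
placeholder):

  -- pgstattuple scans the table and reports tuple-level statistics:
  -- table_len, tuple_count, tuple_percent, dead_tuple_count, free_space, ...
  SELECT * FROM pgstattuple('mytable');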



Re: [PERFORM] Table Statistics with pgAdmin III

2007-07-24 Thread Dave Page


> --- Original Message ---
> From: Jean-Max Reymond <[EMAIL PROTECTED]>
> To: pgsql-performance@postgresql.org
> Sent: 24/07/07, 18:23:53
> Subject: Re: [PERFORM] Table Statistics with pgAdmin III
> 
> Campbell, Lance wrote:
> > I have installed pgAdmin III 1.6.  In the tool when you click on a 
> > particular table you can select a tab called “Statistics”.  This tab has 
> > all kinds of info on your table.  For some reason the only info I see is 
> > for table size, toast table size and indexes size.  Is there a reason 
> > that the other 15 fields have zeros in them?  I was thinking that maybe 
> > I needed to turn on a setting within my database in order to get 
> > statistics reported.
> 
> it seems that the module pgstattuple is needed

That'll allow you to see extra stats in 1.8, but won't alter what you already 
see; in fact 1.6 won't use it at all. What values are at zero?

Regards, Dave



[PERFORM] index over timestamp not being used

2007-07-24 Thread Arnau

Hi all,

 I've got the following two tables running on PostgreSQL 8.1.4:

  transactions
      Column     |            Type             |   Modifiers
 ----------------+-----------------------------+----------------
  transaction_id | character varying(32)       | not null
  user_id        | bigint                      | not null
  timestamp_in   | timestamp without time zone | default now()
  type_id        | integer                     |
  technology_id  | integer                     |
 Indexes:
     "pk_phusrtrans_transid" PRIMARY KEY, btree (transaction_id)
     "idx_phusrtrans_paytyptech" btree (type_id, technology_id)
     "idx_putrnsctns_tstampin" btree (timestamp_in)


  statistics
      Column     |            Type             | Modifiers
 ----------------+-----------------------------+-----------
  statistic_id   | bigint                      | not null
  duration       | bigint                      |
  transaction_id | character varying(32)       |
 Indexes:
     "pk_phstat_statid" PRIMARY KEY, btree (statistic_id)
     "idx_phstat_transid" btree (transaction_id)


The idea is to have, for every date, a summary of how many transactions there
were, their total duration, and their type. To do so, I've written the
following query:


SELECT
  count(t.transaction_id) AS num_transactions
  , SUM(s.duration) AS duration
  , date(t.timestamp_in) as date
  , t.type_id
FROM
 transactions t
 LEFT OUTER JOIN statistics s ON t.transaction_id = s.transaction_id
WHERE
 t.timestamp_in >= to_timestamp('20070101', 'YYYYMMDD')
GROUP BY date, t.type_id;

I think this could be sped up if the index idx_putrnsctns_tstampin
(the index over the timestamp) could be used, but I haven't been able to
make that happen. Any suggestion?

Thanks all
--
Arnau



Re: [PERFORM] Table Statistics with pgAdmin III

2007-07-24 Thread Campbell, Lance
All of the fields are zero except for the three I listed in my posting.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Page
Sent: Tuesday, July 24, 2007 12:50 PM
To: Jean-Max Reymond
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Table Statistics with pgAdmin III



> --- Original Message ---
> From: Jean-Max Reymond <[EMAIL PROTECTED]>
> To: pgsql-performance@postgresql.org
> Sent: 24/07/07, 18:23:53
> Subject: Re: [PERFORM] Table Statistics with pgAdmin III
> 
> Campbell, Lance a écrit :
> > I have installed pgAdmin III 1.6.  In the tool when you click on a 
> > particular table you can select a tab called "Statistics".  This tab has 
> > all kinds of info on your table.  For some reason the only info I see is 
> > for table size, toast table size and indexes size.  Is there a reason 
> > that the other 15 fields have zeros in them?  I was thinking that maybe 
> > I needed to turn on a setting within my database in order to get 
> > statistics reported.
> 
> it seems that the module pgstattuple is needed

That'll allow you to see extra stats in 1.8, but won't alter what you already 
see, in fact 1.6 won't use it at all. What values are at zero?

Regards, Dave

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] index over timestamp not being used

2007-07-24 Thread Tom Lane
Arnau <[EMAIL PROTECTED]> writes:
> timestamp_in  | timestamp without time zone | default now()

> SELECT ...
> FROM
>   transactions t
>   LEFT OUTER JOIN statistics s ON t.transaction_id = s.transaction_id
> WHERE
> t.timestamp_in >= to_timestamp('20070101', 'YYYYMMDD')
> GROUP BY date, t.type_id;

to_timestamp() produces timestamp *with* timezone, so your WHERE clause
is effectively
   t.timestamp_in::timestamptz >= to_timestamp('20070101', 'YYYYMMDD')
which doesn't match the index.

The first question you should ask yourself is whether you picked the
right datatype for the column.  IMHO timestamp with tz is the more
appropriate choice in the majority of cases.

If you do want to stick with timestamp without tz, you'll need to cast
the result of to_timestamp to that.

Alternatively, do you really need to_timestamp at all?  The standard
timestamp input routine won't have any problem with that format:
t.timestamp_in >= '20070101'

regards, tom lane
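
Applied to the query above, Tom's two suggestions might look like this (a
sketch against the transactions table shown earlier; count(*) stands in for
the real select list):

  -- option 1: cast to_timestamp()'s timestamptz result back to plain
  -- timestamp, so the comparison matches the indexed column's type
  SELECT count(*)
  FROM transactions t
  WHERE t.timestamp_in >= to_timestamp('20070101', 'YYYYMMDD')::timestamp;

  -- option 2: skip to_timestamp entirely; the standard input routine
  -- parses this literal fine and the comparison stays indexable
  SELECT count(*)
  FROM transactions t
  WHERE t.timestamp_in >= '20070101';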



Re: [PERFORM] index over timestamp not being used

2007-07-24 Thread Arnau

Hi Tom,



Alternatively, do you really need to_timestamp at all?  The standard
timestamp input routine won't have any problem with that format:
t.timestamp_in >= '20070101'


This is what I'm always worried about: what happens if one day the internal 
format in which the DB stores dates/timestamps changes? I mean, if 
instead of being stored as YYYYMMDD they are stored as DDMMYYYY, would we 
have to change all the queries? I thought the 
to_char/to_date/to_timestamp functions were intended for this purpose


--
Arnau



Re: [PERFORM] index over timestamp not being used

2007-07-24 Thread Tom Lane
Arnau <[EMAIL PROTECTED]> writes:
>> Alternatively, do you really need to_timestamp at all?  The standard
>> timestamp input routine won't have any problem with that format:
>> t.timestamp_in >= '20070101'

> This is what I'm always worried about: what happens if one day the internal 
> format in which the DB stores dates/timestamps changes? I mean, if 
> instead of being stored as YYYYMMDD they are stored as DDMMYYYY, would we 
> have to change all the queries?

You are confusing internal storage format with the external
representation.

> I thought the 
> to_char/to_date/to_timestamp functions were intended for this purpose

No, they're intended for dealing with wacky formats that the regular
input/output routines can't understand or produce.

regards, tom lane
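
A small illustration of the distinction (a sketch): the storage format is
internal and opaque, the text form is produced on output, and to_char is only
needed when a non-default format is explicitly wanted.

  -- the default text representation is governed by DateStyle, not by storage
  SHOW DateStyle;

  -- to_char is for explicitly requested output formats
  SELECT to_char(now(), 'DD-MM-YYYY HH24:MI');

  -- on input, the standard routine parses unambiguous literals directly
  SELECT timestamp '20070101';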



Re: [PERFORM] 8.2 -> 8.3 performance numbers

2007-07-24 Thread Merlin Moncure

On 7/20/07, Josh Berkus <[EMAIL PROTECTED]> wrote:

Jim,

> Has anyone benchmarked HEAD against 8.2? I'd like some numbers to use in
> my OSCon lightning talk. Numbers for both with and without HOT would be
> even better (I know we've got HOT-specific benchmarks, but I want
> complete 8.2 -> 8.3 numbers).

We've done it on TPCE, which is a hard benchmark for PostgreSQL.  On
that it's +9% without HOT and +13% with HOT.  I think SpecJ would show a
greater difference, but we're still focussed on benchmarks we can
publish (i.e. 8.2.4) right now.


Are there any industry standard benchmarks that you know of which
PostgreSQL excels at?

merlin



[PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Karl Denninger

I've got an interesting issue here that I'm running into with 8.2.3

This is an application that has run quite well for a long time, and has 
been operating without significant changes (other than recompilation) 
since back in the early 7.x Postgres days.  But now we're seeing a LOT 
more load than we used to with it, and suddenly, we're seeing odd 
performance issues.


It APPEARS that the problem isn't query performance per se.  That is, 
while I can find a few processes here and there in a run state when I 
look with ps, I don't see them consistently churning.


But here's the query that has a habit of taking the most time...

select forum, * from post where toppost = 1 and (replied > (select 
lastview from forumlog where login='theuser' and forum=post.forum and 
number is null)) is not false AND (replied > (select lastview from 
forumlog where login='theuser' and forum=post.forum and 
number=post.number)) is not false order by pinned desc, replied desc 
offset 0 limit 20

Now the numeric and "login" fields may change; when I plug it into 
explain what I get back is:


                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=57270.22..57270.27 rows=20 width=757)
   ->  Sort  (cost=57270.22..57270.50 rows=113 width=757)
         Sort Key: pinned, replied
         ->  Index Scan using post_top on post  (cost=0.00..57266.37 rows=113 width=757)
               Index Cond: (toppost = 1)
               Filter: (((replied > (subplan)) IS NOT FALSE) AND ((replied > (subplan)) IS NOT FALSE))
               SubPlan
                 ->  Index Scan using forumlog_composite on forumlog  (cost=0.00..8.29 rows=1 width=8)
                       Index Cond: (("login" = 'theuser'::text) AND (forum = $0) AND (number = $1))
                 ->  Bitmap Heap Scan on forumlog  (cost=4.39..47.61 rows=1 width=8)
                       Recheck Cond: (("login" = 'theuser'::text) AND (forum = $0))
                       Filter: (number IS NULL)
                       ->  Bitmap Index Scan on forumlog_composite  (cost=0.00..4.39 rows=12 width=0)
                             Index Cond: (("login" = 'theuser'::text) AND (forum = $0))
(14 rows)

And indeed, it returns a fairly reasonable number of rows.

This takes a second or two to return - not all that bad - although this 
is one that people hit a LOT. 


One thing that DOES bother me is this line from the EXPLAIN output:

->  Index Scan using post_top on post  (cost=0.00..57266.53 rows=113 width=757)


This is indexed using:

 "post_top" btree (toppost)

Ain't nothing fancy there.  So how come the planner thinks this is going 
to take that long?!?


More interesting, if I do a simple query on that line, I get

ticker=> explain select forum from post where toppost='1';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using post_top on post  (cost=0.00..632.03 rows=1013 width=11)
   Index Cond: (toppost = 1)

Hmm; that's a bit more reasonable.  So what's up with the above line?

What I'm seeing is that as concurrency increases, I see the CPU load 
spike.  Disk I/O is low to moderate at less than 10% of maximum 
according to systat -vm, no swap in use, 300mb dedicated to shared 
memory buffers for Postgres (machine has 1GB of RAM and is a P4/3G/HT 
running FreeBSD 6.2-STABLE)  It does not swap at all, so it does not 
appear I've got a problem with running out of physical memory.  shmem is 
pinned to physical memory via the sysctl tuning parameter to prevent 
page table thrashing.


However, load goes up substantially and under moderate to high 
concurrency gets into the 4-5 range with response going somewhat to 
crap.  The application is still usable, but it's not "crisp".  If I do a 
"ps" during times that performance is particularly bad, I don't see any 
particular overrepresentation of this query vs. others (I have the 
application doing a "setproctitle" so I know what command - and thus 
what sets of queries - it is executing.)


Not sure where to start here.  It appears that I'm CPU limited and the 
problem may be that this is a web-served application that must connect 
to the Postgres backend for each transaction, perform its queries, and 
then close the connection down - in other words the load may be coming 
not from Postgres but rather from places I can't fix at the application 
layer (e.g. fork() overhead, etc).  The DBMS and Apache server are on 
the same machine, so there's no actual network overhead involved. 

If that's the case the only solution is to throw more hardware at it.  I 
can do that, but before I go tossing more CPU at the problem I'd like to 
know I'm not just wasting money.


The application uses the "C" language interface and just calls 
"Connectdb" - the only parameter is 

Re: [PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Tom Lane
Karl Denninger <[EMAIL PROTECTED]> writes:
> But here's the query that has a habit of taking the most time

> select forum, * from post where toppost = 1 and (replied > (select 
> lastview from forumlog where login='theuser' and forum=post.forum and 
> number is null)) is not false AND (replied > (select lastview from 
> forumlog where login='theuser' and forum=post.forum and 
> number=post.number)) is not false order by pinned desc, replied desc 
> offset 0 limit 20

Did that ever perform well for you?  It's the sub-selects that are
likely to hurt ... in particular,

>  ->  Index Scan using post_top on post  (cost=0.00..57266.37 rows=113 width=757)
>        Index Cond: (toppost = 1)
>        Filter: (((replied > (subplan)) IS NOT FALSE) AND ((replied > (subplan)) IS NOT FALSE))

versus

>  Index Scan using post_top on post  (cost=0.00..632.03 rows=1013 width=11)
>    Index Cond: (toppost = 1)

The planner thinks that the two subplan filter conditions will eliminate
about 90% of the rows returned by the bare indexscan (IIRC this is
purely a rule of thumb, not based on any statistics) and that testing
them 1013 times will add over 56000 cost units to the basic indexscan.
That part I believe --- correlated subqueries are expensive.

regards, tom lane



Re: [PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Karl Denninger
Yeah, the problem doesn't appear to be there.  As I said, if I look at 
the PS of the system when it's bogging, there aren't a whole bunch of 
processes stuck doing these, so while this does take a second or two to 
come back, that's not that bad.


It's GENERAL performance that just bites - the system is obviously out of 
CPU, but what I can't get a handle on is WHY.  It does not appear to be 
accumulating large amounts of runtime in processes I can catch, but the 
load average is quite high.


This is why I'm wondering if what I'm taking here is a hit on the 
fork/exec inside the postmaster, in the setup internally in there, in 
the IPC between my process via libpq, etc - and how I can profile what's 
going on.


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Tom Lane wrote:
> Karl Denninger <[EMAIL PROTECTED]> writes:
>> But here's the query that has a habit of taking the most time...
>>
>> select forum, * from post where toppost = 1 and (replied > (select 
>> lastview from forumlog where login='theuser' and forum=post.forum and 
>> number is null)) is not false AND (replied > (select lastview from 
>> forumlog where login='theuser' and forum=post.forum and 
>> number=post.number)) is not false order by pinned desc, replied desc 
>> offset 0 limit 20
>
> Did that ever perform well for you?  It's the sub-selects that are
> likely to hurt ... in particular,
>
>>  ->  Index Scan using post_top on post  (cost=0.00..57266.37 rows=113 width=757)
>>        Index Cond: (toppost = 1)
>>        Filter: (((replied > (subplan)) IS NOT FALSE) AND ((replied > (subplan)) IS NOT FALSE))
>
> versus
>
>>  Index Scan using post_top on post  (cost=0.00..632.03 rows=1013 width=11)
>>    Index Cond: (toppost = 1)
>
> The planner thinks that the two subplan filter conditions will eliminate
> about 90% of the rows returned by the bare indexscan (IIRC this is
> purely a rule of thumb, not based on any statistics) and that testing
> them 1013 times will add over 56000 cost units to the basic indexscan.
> That part I believe --- correlated subqueries are expensive.
>
> regards, tom lane


Re: [PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Merlin Moncure

On 7/25/07, Karl Denninger <[EMAIL PROTECTED]> wrote:


 Yeah, the problem doesn't appear to be there.  As I said, if I look at the
PS of the system when it's bogging, there aren't a whole bunch of processes
stuck doing these, so while this does take a second or two to come back,
that's not that bad.

 It's GENERAL performance that just bites - the system is obviously out of
CPU, but what I can't get a handle on is WHY.  It does not appear to be
accumulating large amounts of runtime in processes I can catch, but the load
average is quite high.


8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased
load in high concurrency conditions.  on a client's machine, after
patching, the postmaster load dropped from the 4-5 range to the 1-2 range
on a 500 tps server.  maybe this is biting you?  symptoms are high load avg
and high cpu usage of the stats collector process.

merlin



Re: [PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Karl Denninger
Hm... now that's interesting.  Stats collector IS accumulating 
quite a bit of runtime... methinks it's time to go grab 8.2.4.


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Merlin Moncure wrote:
> On 7/25/07, Karl Denninger <[EMAIL PROTECTED]> wrote:
>> Yeah, the problem doesn't appear to be there.  As I said, if I look at the
>> PS of the system when it's bogging, there aren't a whole bunch of processes
>> stuck doing these, so while this does take a second or two to come back,
>> that's not that bad.
>>
>> It's GENERAL performance that just bites - the system is obviously out of
>> CPU, but what I can't get a handle on is WHY.  It does not appear to be
>> accumulating large amounts of runtime in processes I can catch, but the
>> load average is quite high.
>
> 8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased
> load in high concurrency conditions.  on a client's machine, after
> patching, the postmaster load dropped from the 4-5 range to the 1-2 range
> on a 500 tps server.  maybe this is biting you?  symptoms are high load avg
> and high cpu usage of the stats collector process.
>
> merlin



Re: [PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Tom Lane
Karl Denninger <[EMAIL PROTECTED]> writes:
> Hm... now that's interesting.  Stats collector IS accumulating 
> quite a bit of runtime... methinks it's time to go grab 8.2.4.

I think Merlin might have nailed it --- the "stats collector bug" is
that it tries to write out the stats file way more often than it
should.  So any excessive userland CPU time you see is just the tip
of the iceberg compared to the system and I/O costs incurred.

regards, tom lane



Re: [PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Karl Denninger

Aha!

BIG difference.  I won't know for sure until the biz day tomorrow, but 
the "first blush" look is that it makes a HUGE difference in system 
load, and I no longer have the stats collector process at the top of the 
"top" list.


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Tom Lane wrote:
> Karl Denninger <[EMAIL PROTECTED]> writes:
>> Hm... now that's interesting.  Stats collector IS accumulating 
>> quite a bit of runtime... methinks it's time to go grab 8.2.4.
>
> I think Merlin might have nailed it --- the "stats collector bug" is
> that it tries to write out the stats file way more often than it
> should.  So any excessive userland CPU time you see is just the tip
> of the iceberg compared to the system and I/O costs incurred.
>
> regards, tom lane



Re: [PERFORM] Performance issue with 8.2.3 - "C" application

2007-07-24 Thread Nis Jørgensen
Karl Denninger wrote:
> I've got an interesting issue here that I'm running into with 8.2.3
> 
> This is an application that has run quite well for a long time, and has
> been operating without significant changes (other than recompilation)
> since back in the early 7.x Postgres days.  But now we're seeing a LOT
> more load than we used to with it, and suddenly, we're seeing odd
> performance issues.
> 
> It APPEARS that the problem isn't query performance per se.  That is,
> while I can find a few processes here and there in a run state when I
> look with ps, I don't see them consistently churning.
> 
> But here's the query that has a habit of taking the most time
> 
> select forum, * from post where toppost = 1 and (replied > (select
> lastview from forumlog where login='theuser' and forum=post.forum and
> number is null)) is not false AND (replied > (select lastview from
> forumlog where login='theuser' and forum=post.forum and
> number=post.number)) is not false order by pinned desc, replied desc offset 0 
> limit 20

Since I can do little to help you with anything else, here is a little
help from a guy with a hammer. It seems you may be able to convert the
subqueries into a left join. Not sure whether this helps, nor whether I
got some bits of the logic wrong, but something like this might help the
planner find a better plan:

SELECT forum, *
FROM post
LEFT JOIN forumlog
ON post.forum = forumlog.forum
AND forumlog.login = 'theuser'
AND (post.number = forumlog.number OR forumlog.number IS NULL)
AND post.replied <= lastview
WHERE forumlog.forum IS NULL
AND post.toppost = 1
ORDER BY pinned DESC, replied DESC OFFSET 0 LIMIT 20 ;


Nis

