Re: [PERFORM] Help: how to speed up query after db server reboot

2009-09-03 Thread Magnus Hagander
On Wed, Sep 2, 2009 at 00:01, Wei Yan wrote:
> Hi:
>
> Looks like after postgres db server reboot, first query is very slow
> (10+mins). After the system cache built, query is pretty fast.
> Now the question is how to speed up the first query slow issue?
>
> Any pointers?

Schedule a run of a couple of representative queries right as the
database has started? That should pre-populate the cache before your
users get there, hopefully.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_stat_activity.current_query explanation?

2009-09-03 Thread Joshua Tolley
On Wed, Sep 02, 2009 at 11:29:14AM -0400, Pat Chan wrote:
>One is '' and the other is simply the word 'end'. 
>I googled everywhere about this but to not avail. If you could shed some
>light on this subject that would be great!
>Thank you in advance.

'' means that the client has opened a transaction but
isn't doing anything right now. If you issue a "BEGIN;" command and then just
sit there, for instance, you'll see these.

'END' is synonymous with 'COMMIT', so where those show up, it means the client
is in the middle of committing a transaction.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature


Re: [PERFORM] Help: how to speed up query after db server reboot

2009-09-03 Thread Tom Lane
Magnus Hagander  writes:
> On Wed, Sep 2, 2009 at 00:01, Wei Yan wrote:
>> Looks like after postgres db server reboot, first query is very slow
>> (10+mins). After the system cache built, query is pretty fast.
>> Now the question is how to speed up the first query slow issue?

> Schedule a run of a couple of representative queries right as the
> database has started? That should pre-populate the cache before your
> users get there, hopefully.

I wonder if VACUUMing his key tables would be a good answer.  I bet that
a lot of the problem is swapping in indexes in a slow random-access
fashion.  In recent-model Postgres, VACUUM will do a sequential scan of
the indexes (at least for btree) which should be a much more efficient
way of bringing them into kernel cache.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] partition query using Seq Scan even when index is present

2009-09-03 Thread Merlin Moncure
On Wed, Sep 2, 2009 at 4:15 PM, Kenneth Cox wrote:
> With postgresql-8.3.6, I have many partitions inheriting a table.  SELECT
> min() on the parent performs a Seq Scan, but SELECT min() on a child uses
> the index.  Is this another case where the planner is not aware enough to
> come up with the best plan?  I tried creating an index on the parent table
> to no avail.  Is there a way to formulate the query so that it uses the
> index?  Here is the general flavor:
>
> create table calls (caller text, ts timestamptz);
> create table calls_partition_2009_08 (check (ts >= '2009-08-01' and ts <
> '2009-09-01')) inherits (calls);
> create index calls_partition_2009_08_ts on calls_partition_2009_08 (ts);
> insert into calls_partition_2009_08 (ts)
>  select to_timestamp(unix_time)
>    from generate_series(extract(epoch from '2009-08-01'::timestamptz)::int,
>                         extract(epoch from '2009-08-31
> 23:59'::timestamptz)::int, 60) as unix_time;
> analyze calls_partition_2009_08;
> explain select min(ts) from calls;

ATM, constraint exclusion mainly only supports queries of the form:
SELECT ... WHERE 'x', with x being an expression in the check
constraint.  Table partitioning unfortunately is not a free lunch, you
have to be aware of it at all times when writing queries vs your
partitioned tables.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Ivan Voras

Scott Otis wrote:

> 2x Intel Xeon Quad Core (@2 Ghz - Clovertown,L5335)
> 4GB RAM
> 4x Seagate 73GB SAS HDD 10k RPM – in RAID ( stripped and mirrored )

Would love to get some advice on how to change my conf settings / setup 
to get better I/O performance.



~1500 databases w/ ~60 tables each


This tells us nothing - size and complexity of databases is more 
important than their number.



Total I/O (these number are pretty constant throughout the day):

Reads: ~ 100 / sec for about 2.6 Mb/sec

Writes: ~ 400 /sec for about 46.1Mb/sec


Again, not enough information. How did you measure these? With iostat? 
Are those random reads or sequential? (i.e. what was the IO transaction 
size?) Caching can explain why you have 4x more writes than reads, but 
it's still unusual, especially with the high write transfer rate you claim.


If random, you're doing ~~ 500 IOPS on a RAID10 array of 4 10 kRPM 
drives, which is much more than you should - you're lucky you have the 
performance you do.


By the way, why do you think your setup is slow? Is your application 
slow and you think your database is the reason?



shared_buffers = 1GB

work_mem = 5MB

maintenance_work_mem = 256MB


Ok.


synchronous_commit = off


Ok. Could be important if your IO is slow as yours is.


checkpoint_segments = 6


You could try raising this to 20, but I doubt it will help you that 
much. OTOH it won't hurt.



checkpoint_warning = 30s

effective_cache_size = 1GB


Ok.

Most of the SQL happening is selects – very little inserts, updates and 
deletes comparatively.


Are you sure? Your write rate is a bit big for there to be very little 
insert/update/delete activity.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread Kevin Grittner
astro77  wrote:
 
> I've got a table set up with an XML field that I would like to search
on 
> with
> 2.5 million records. The xml are serialized objects from my
application
> which are too complex to break out into separate tables. I'm trying
to run a
> query similar to this:
> 
>   SELECT  serialized_object as outVal
>from object  where
>   (
>
array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
> serialized_object, 
>  ARRAY
>  [
>  ARRAY['a',
'http://schemas.datacontract.org/2004/07/Objects'],
>  ARRAY['b',
'http://schemas.datacontract.org/2004/07/Security']
>  
>  ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
> 
>   )
>   limit 1000;
 
I would try to minimize how many XML values it had to read, parse, and
search.  The best approach that comes to mind would be to use tsearch2
techniques (with a GIN or GiST index on the tsvector) to identify
which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
to combine that with your xpath search.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Andy Colson

Scott Otis wrote:
Would love to get some advice on how to change my conf settings / setup 
to get better I/O performance.


Total I/O (these number are pretty constant throughout the day):
Reads: ~ 100 / sec for about 2.6 Mb/sec
Writes: ~ 400 /sec for about 46.1Mb/sec


Most of the SQL happening is selects – very little inserts, updates and 
deletes comparatively.




Maybe I'm wrong, but those two don't seem to jive.  You say its mostly selects, 
but you show higher writes per second.

Does freebsd have a vmstat or iostat?  How did you get the numbers above?  
How's the cpu's look?  (are they pegged?)

The io stats above seem low  (reading 2 meg a second is a tiny fraction of what 
your system should be capable of).  Have you tried a dd test?

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] partition query using Seq Scan even when index is present

2009-09-03 Thread Kenneth Cox
Thank you, Greg!  I tweaked your function to use recursion to search all  
inherited tables; my inheritance structure is two levels deep.


This function is for integers only; I will copy/waste to create one for  
timestamps.  Extra credit for anyone who can rewrite it to be polymorphic.


-- Same as max(_colname) from _relname but much faster for inherited
-- tables with an index on _colname.  In postgresql-8.3.6 a naive query
-- on a parent table will not use the indexes on the child tables.
create or replace function partition_max_int(_relname text, _colname text)  
returns int AS

$$
declare
  childtable RECORD;
  childres RECORD;
  maxval int;
  tmpval int;
  sql text;
begin
  -- find max in this table (only)
  sql := 'select max('||_colname||') from only '||quote_ident(_relname);
  execute sql into maxval;

  -- recurse to find max in descendants
  FOR childtable in
  select pc.relname as relname
from pg_class pc
join pg_inherits pi on pc.oid=pi.inhrelid
where inhparent=(select oid from pg_class where relname=_relname)
  LOOP
tmpval := partition_max_int(childtable.relname, _colname);
IF tmpval is not NULL and (tmpval > maxval or maxval is null) THEN
  maxval := tmpval;
END IF;
  END LOOP;

  return maxval;
end;
$$
language 'plpgsql' STABLE;

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Andy Colson

Scott Otis wrote:
I agree that they don't make sense - part of the reason I am looking for 
help :)


I am using iostat to get those numbers ( which I specify to average over 
5 min then collect to display in Cacti ).


2 processes are taking up a good deal of CPU - the postgres stats 
collector and autovacuum ones.  Both of those are using a lot of 1 core 
each.


I am not familiar with a dd test - what is that?

Thanks,

Scott

On Sep 3, 2009, at 8:03 AM, "Andy Colson"  wrote:


Scott Otis wrote:
Would love to get some advice on how to change my conf settings / 
setup to get better I/O performance.

Total I/O (these number are pretty constant throughout the day):
Reads: ~ 100 / sec for about 2.6 Mb/sec
Writes: ~ 400 /sec for about 46.1Mb/sec
Most of the SQL happening is selects – very little inserts, updates 
and deletes comparatively.


Maybe I'm wrong, but those two don't seem to jive.  You say its mostly 
selects, but you show higher writes per second.


Does freebsd have a vmstat or iostat?  How did you get the numbers 
above?  How's the cpu's look?  (are they pegged?)


The io stats above seem low  (reading 2 meg a second is a tiny 
fraction of what your system should be capable of).  Have you tried a 
dd test?


-Andy


Please keep the list included so others may help.


the dd test:

http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm


I think Ivan is right, the 2 meg a second is probably because most of the reads 
are from cache.  But he and I looked at the writes differently.  If we ignore 
the 400/sec, and just read 46 meg a second (assuming you meant megabyte and not 
megabit) then, that's pretty slow (for sequential writing) -- which the dd test 
will measure your sequential read and write speed.

Ivan asked a good question:
By the way, why do you think your setup is slow? Is your application slow and 
you think your database is the reason?


-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread Robert Haas
On Wed, Sep 2, 2009 at 11:04 AM, astro77 wrote:
>
> I've got a table set up with an XML field that I would like to search on with
> 2.5 million records. The xml are serialized objects from my application
> which are too complex to break out into separate tables. I'm trying to run a
> query similar to this:
>
>        SELECT  serialized_object as outVal
>         from object  where
>        (
>        array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
> serialized_object,
>             ARRAY
>             [
>             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>
>             ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>
>        )
>        limit 1000;
>
> I've also set up an index on the xpath query like this...
>
> CREATE INDEX concurrently
> idx_object_nodeid
> ON
> object
> USING
> btree(
>
>  cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
>             ARRAY
>             [
>             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>
>             ])as text[])
> );
>
> The query takes around 30 minutes to complete with or without the index in
> place and does not cache the query. Additionally the EXPLAIN say that the
> index is not being used. I've looked everywhere but can't seem to find solid
> info on how to achieve this. Any ideas would be greatly appreciated.

Why do you have a cast in the index definition?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Scott Otis
Sorry about not responding to the whole list earlier - this is my first time 
posting to a mailing list.

Would providing more information about the size and complexities of the 
databases help?

I measure I/O stats with iostat - here is the command I use:

iostat -d -x mfid0 -t 290 2

I tried looking at the man page for iostat but couldn't find anywhere how to 
determine what the stats are for sequential vs random - any help there?

When using 'top -m io' the postgres stats collector process is constantly at 
99% - 100%.

When using 'top' the WCPU for the postgres stats collector and the autovacuum 
process are constantly at 20% - 21%.

Is that normal?  It seems to me that the stats collector is doing all the I/O 
(which would mean the stats collector is doing 46.1 megabytes /sec).

Also, the I/O stats don't change hardly at all (except at night during backups 
which makes sense).  They don't go up or down with user activity on the server 
- which makes me wonder a little bit.  I have a feeling that if I just turned 
off Apache that the I/O stats wouldn't change.  Which leads me to believe that 
the I/O is not query related - its stats collecting and autovacuuming related.  
Is that expected?

It seems to me that the stats collector shouldn't be using that much I/O and 
CPU (and the autovacuum shouldn't be using that much CPU)  - therefore 
something in my configuration must be messed up or could be changed somehow.  
But maybe I'm wrong - please let me know.

I don't think my setup is necessarily slow.  I just want to make it as 
efficient as possible and wanted to get some feedback to see if am setting 
things up right.  I am also looking out into the future and seeing how much 
load I can put on this server before getting another one.  If I can reduce the 
I/O and CPU that the stats collector and autovacuum are using without losing 
any functionality then I can put more load on the server.

Again thanks for all the help.

Scott Otis
CIO / Lead Developer
Intand
www.intand.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Andy Colson

Scott Otis wrote:

Sorry about not responding to the whole list earlier - this is my first time 
posting to a mailing list.

Would providing more information about the size and complexities of the 
databases help?

I measure I/O stats with iostat - here is the command I use:

iostat -d -x mfid0 -t 290 2

I tried looking at the man page for iostat but couldn't find anywhere how to 
determine what the stats are for sequential vs random - any help there?

When using 'top -m io' the postgres stats collector process is constantly at 
99% - 100%.

When using 'top' the WCPU for the postgres stats collector and the autovacuum 
process are constantly at 20% - 21%.

Is that normal?  It seems to me that the stats collector is doing all the I/O 
(which would mean the stats collector is doing 46.1 megabytes /sec).

Also, the I/O stats don't change hardly at all (except at night during backups 
which makes sense).  They don't go up or down with user activity on the server 
- which makes me wonder a little bit.  I have a feeling that if I just turned 
off Apache that the I/O stats wouldn't change.  Which leads me to believe that 
the I/O is not query related - its stats collecting and autovacuuming related.  
Is that expected?

It seems to me that the stats collector shouldn't be using that much I/O and 
CPU (and the autovacuum shouldn't be using that much CPU)  - therefore 
something in my configuration must be messed up or could be changed somehow.  
But maybe I'm wrong - please let me know.

I don't think my setup is necessarily slow.  I just want to make it as 
efficient as possible and wanted to get some feedback to see if am setting 
things up right.  I am also looking out into the future and seeing how much 
load I can put on this server before getting another one.  If I can reduce the 
I/O and CPU that the stats collector and autovacuum are using without losing 
any functionality then I can put more load on the server.

Again thanks for all the help.

Scott Otis
CIO / Lead Developer
Intand
www.intand.com




When using 'top -m io' the postgres stats collector process is constantly at 
99% - 100%.
When using 'top' the WCPU for the postgres stats collector and the autovacuum 
process are constantly at 20% - 21%.


Yeah, that sounds excessive.  But my database gets 20 transactions a DAY, so, I 
have no experience with a busy box.

You say its mostly selects, but do you have any triggers or anything that might 
update a table?  Do you do inserts or updates to track traffic?

What does:

select * from pg_stat_activity

look like? (I think vacuum will show up in there, right?)  I'm curious if we 
can find the table autovacuum is working on, maybe that'll help pin it down.

-Andy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] PostgreSQL not using index for statement with group by

2009-09-03 Thread Mark Starkman
I am new to PostgreSQL and I am evaluating it for use as a data
warehouse.  I am really struggling to get a simple query to perform
well.  I have put the appropriate indexes on the table (at least they
are appropriate from my use with several other RDBMS's).  However, the
query doesn't perform well, and I'm not sure how to get it to return in
reasonable amount of time.  Right now the query takes between 2 - 3
minutes to return.  There are about 39 million rows in the table. Here
is all of the information that I have.  Please let me know if you I have
done anything wrong or what needs to change.

 

Thanks,

Mark

 

Table Definition:

CREATE TABLE temp_inventory_fact

(

  item_id integer NOT NULL,

  date_id timestamp with time zone NOT NULL,

  "CBL_Key" integer NOT NULL,

  product_group_id integer NOT NULL,

  supplier_id numeric(19) NOT NULL,

  "Cost" numeric(19,9) NOT NULL,

  qty_on_hand numeric(19,9) NOT NULL,

  qty_allocated numeric(19,9) NOT NULL,

  qty_backordered numeric(19,9) NOT NULL,

  qty_on_po numeric(19,9) NOT NULL,

  qty_in_transit numeric(19,9) NOT NULL,

  qty_reserved numeric(19,9) NOT NULL,

  nonstock_id boolean NOT NULL

)

WITH (

  OIDS=FALSE

);

 

Query:

select product_group_id, SUM("Cost")

FROM temp_inventory_Fact

where product_group_id < 100

group by product_group_id

order by product_group_id

limit 50;

 

Indexes on table:

CREATE INDEX idx_temp_inventory_fact_product_cost ON temp_inventory_fact
(product_group_id, "Cost");

CREATE INDEX idx_temp_inventory_fact_product ON temp_inventory_fact
(product_group_id);



Re: [PERFORM] SAAS and MySQL

2009-09-03 Thread Tguru


So as I understand, what you need is an online database program able to
perform ETL tasks, that works in the cloud. 

There are a few companies out there able to perform what you are asking.
What I could propose is a company called Talend. With Talend On Demand. 

This solution is based on the open source Talend Open Studio. You are
offered a collaborative platform to work on, meaning that all your teams in
different countries will be working on the same database on a secured web
service. 

Go check it out on the website: 
http://www.talend.com/talend-on-demand/talend-on-demand.php . Hope this
helps.








Rstat wrote:
> 
> 
> Hi all, 
> 
> We are a young, new company on the market. We are starting to open up new
> markets in other countries (Europe). 
> 
> It somewhat is a challenge for us: we can't share our data and mysql
> database between all our different services. So here is my question: do
> you think it would be possible to find an ETL program that could work in
> the cloud? 
> 
> It would not have to be too complex, but sturdy and working as a Software
> as a Service. 
> 
> Thanks a lot for your help.
> 

-- 
View this message in context: 
http://www.nabble.com/SAAS-and-MySQL-tp25258395p25276553.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread astro77

I was receiving an error that an XML field does not support the various
indexes available in postgresql. Is there an example of how to do this
properly?


Robert Haas wrote:
> 
> On Wed, Sep 2, 2009 at 11:04 AM, astro77 wrote:
>>
>> I've got a table set up with an XML field that I would like to search on
>> with
>> 2.5 million records. The xml are serialized objects from my application
>> which are too complex to break out into separate tables. I'm trying to
>> run a
>> query similar to this:
>>
>>        SELECT  serialized_object as outVal
>>         from object  where
>>        (
>>      
>>  array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
>> serialized_object,
>>             ARRAY
>>             [
>>             ARRAY['a',
>> 'http://schemas.datacontract.org/2004/07/Objects'],
>>             ARRAY['b',
>> 'http://schemas.datacontract.org/2004/07/Security']
>>
>>             ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>>
>>        )
>>        limit 1000;
>>
>> I've also set up an index on the xpath query like this...
>>
>> CREATE INDEX concurrently
>> idx_object_nodeid
>> ON
>> object
>> USING
>> btree(
>>
>>  cast(xpath('/a:root/a:Identification/b:ObjectId/text()',
>> serialized_object,
>>             ARRAY
>>             [
>>             ARRAY['a',
>> 'http://schemas.datacontract.org/2004/07/Objects'],
>>             ARRAY['b',
>> 'http://schemas.datacontract.org/2004/07/Security']
>>
>>             ])as text[])
>> );
>>
>> The query takes around 30 minutes to complete with or without the index
>> in
>> place and does not cache the query. Additionally the EXPLAIN say that the
>> index is not being used. I've looked everywhere but can't seem to find
>> solid
>> info on how to achieve this. Any ideas would be greatly appreciated.
> 
> Why do you have a cast in the index definition?
> 
> ...Robert
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25283175.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Robert Haas
On Thu, Sep 3, 2009 at 4:16 PM, Scott Otis wrote:
> Sorry about not responding to the whole list earlier - this is my first time 
> posting to a mailing list.
>
> Would providing more information about the size and complexities of the 
> databases help?
>
> I measure I/O stats with iostat - here is the command I use:
>
> iostat -d -x mfid0 -t 290 2
>
> I tried looking at the man page for iostat but couldn't find anywhere how to 
> determine what the stats are for sequential vs random - any help there?
>
> When using 'top -m io' the postgres stats collector process is constantly at 
> 99% - 100%.
>
> When using 'top' the WCPU for the postgres stats collector and the autovacuum 
> process are constantly at 20% - 21%.
>
> Is that normal?  It seems to me that the stats collector is doing all the I/O 
> (which would mean the stats collector is doing 46.1 megabytes /sec).
>
> Also, the I/O stats don't change hardly at all (except at night during 
> backups which makes sense).  They don't go up or down with user activity on 
> the server - which makes me wonder a little bit.  I have a feeling that if I 
> just turned off Apache that the I/O stats wouldn't change.  Which leads me to 
> believe that the I/O is not query related - its stats collecting and 
> autovacuuming related.  Is that expected?
>
> It seems to me that the stats collector shouldn't be using that much I/O and 
> CPU (and the autovacuum shouldn't be using that much CPU)  - therefore 
> something in my configuration must be messed up or could be changed somehow.  
> But maybe I'm wrong - please let me know.
>
> I don't think my setup is necessarily slow.  I just want to make it as 
> efficient as possible and wanted to get some feedback to see if am setting 
> things up right.  I am also looking out into the future and seeing how much 
> load I can put on this server before getting another one.  If I can reduce 
> the I/O and CPU that the stats collector and autovacuum are using without 
> losing any functionality then I can put more load on the server.
>
> Again thanks for all the help.

Can you post to the list all the uncommented lines from your
postgresql.conf file and attach the results of "select * from
pg_stat_all_tables" as an attachment?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread Robert Haas
On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote:
> I was receiving an error that an XML field does not support the various
> indexes available in postgresql.

Please post what happens when you try.

> Is there an example of how to do this
> properly?

Not sure.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Andy Colson

Robert Haas wrote:

On Thu, Sep 3, 2009 at 4:16 PM, Scott Otis wrote:

Sorry about not responding to the whole list earlier - this is my first time 
posting to a mailing list.

Would providing more information about the size and complexities of the 
databases help?

I measure I/O stats with iostat - here is the command I use:

iostat -d -x mfid0 -t 290 2

I tried looking at the man page for iostat but couldn't find anywhere how to 
determine what the stats are for sequential vs random - any help there?

When using 'top -m io' the postgres stats collector process is constantly at 
99% - 100%.

When using 'top' the WCPU for the postgres stats collector and the autovacuum 
process are constantly at 20% - 21%.

Is that normal?  It seems to me that the stats collector is doing all the I/O 
(which would mean the stats collector is doing 46.1 megabytes /sec).

Also, the I/O stats don't change hardly at all (except at night during backups 
which makes sense).  They don't go up or down with user activity on the server 
- which makes me wonder a little bit.  I have a feeling that if I just turned 
off Apache that the I/O stats wouldn't change.  Which leads me to believe that 
the I/O is not query related - its stats collecting and autovacuuming related.  
Is that expected?

It seems to me that the stats collector shouldn't be using that much I/O and 
CPU (and the autovacuum shouldn't be using that much CPU)  - therefore 
something in my configuration must be messed up or could be changed somehow.  
But maybe I'm wrong - please let me know.

I don't think my setup is necessarily slow.  I just want to make it as 
efficient as possible and wanted to get some feedback to see if am setting 
things up right.  I am also looking out into the future and seeing how much 
load I can put on this server before getting another one.  If I can reduce the 
I/O and CPU that the stats collector and autovacuum are using without losing 
any functionality then I can put more load on the server.

Again thanks for all the help.


Can you post to the list all the uncommented lines from your
postgresql.conf file and attach the results of "select * from
pg_stat_all_tables" as an attachment?

...Robert



The first message he posted had this, and other info... Which is funny, because 
I almost asked the exact same question :-)


FreeBSD 6.4
Apache 2.2
PostgreSQL 8.3.6
PHP 5.2.9


~1500 databases w/ ~60 tables each


Conf settings:

listen_addresses = '*'
max_connections = 600
ssl = on
password_encryption = on
shared_buffers = 1GB
work_mem = 5MB
maintenance_work_mem = 256MB
max_fsm_pages = 280
max_fsm_relations = 16
synchronous_commit = off
checkpoint_segments = 6
checkpoint_warning = 30s
effective_cache_size = 1GB


pg_stat_bgwriter:

checkpoints_timed: 16660
checkpoints_req: 1309
buffers_checkpoint: 656346
buffers_clean: 120922
maxwritten_clean: 1
buffers_backend: 167623
buffers_alloc: 472802349



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Robert Haas
On Thu, Sep 3, 2009 at 5:27 PM, Andy Colson wrote:
> Robert Haas wrote:
>>
>> On Thu, Sep 3, 2009 at 4:16 PM, Scott Otis wrote:
>>>
>>> Sorry about not responding to the whole list earlier - this is my first
>>> time posting to a mailing list.
>>>
>>> Would providing more information about the size and complexities of the
>>> databases help?
>>>
>>> I measure I/O stats with iostat - here is the command I use:
>>>
>>> iostat -d -x mfid0 -t 290 2
>>>
>>> I tried looking at the man page for iostat but couldn't find anywhere how
>>> to determine what the stats are for sequential vs random - any help there?
>>>
>>> When using 'top -m io' the postgres stats collector process is constantly
>>> at 99% - 100%.
>>>
>>> When using 'top' the WCPU for the postgres stats collector and the
>>> autovacuum process are constantly at 20% - 21%.
>>>
>>> Is that normal?  It seems to me that the stats collector is doing all the
>>> I/O (which would mean the stats collector is doing 46.1 megabytes /sec).
>>>
>>> Also, the I/O stats don't change hardly at all (except at night during
>>> backups which makes sense).  They don't go up or down with user activity on
>>> the server - which makes me wonder a little bit.  I have a feeling that if I
>>> just turned off Apache that the I/O stats wouldn't change.  Which leads me
>>> to believe that the I/O is not query related - its stats collecting and
>>> autovacuuming related.  Is that expected?
>>>
>>> It seems to me that the stats collector shouldn't be using that much I/O
>>> and CPU (and the autovacuum shouldn't be using that much CPU)  - therefore
>>> something in my configuration must be messed up or could be changed somehow.
>>>  But maybe I'm wrong - please let me know.
>>>
>>> I don't think my setup is necessarily slow.  I just want to make it as
>>> efficient as possible and wanted to get some feedback to see if am setting
>>> things up right.  I am also looking out into the future and seeing how much
>>> load I can put on this server before getting another one.  If I can reduce
>>> the I/O and CPU that the stats collector and autovacuum are using without
>>> losing any functionality then I can put more load on the server.
>>>
>>> Again thanks for all the help.
>>
>> Can you post to the list all the uncommented lines from your
>> postgresql.conf file and attach the results of "select * from
>> pg_stat_all_tables" as an attachment?
>>
>> ...Robert
>>
>
> The first message he posted had this, and other info... Which is funny,
> because I almost asked the exact same question :-)
>
>
> FreeBSD 6.4
> Apache 2.2
> PostgreSQL 8.3.6
> PHP 5.2.9
>
>
> ~1500 databases w/ ~60 tables each
>
>
> Conf settings:
>
> listen_addresses = '*'
> max_connections = 600
> ssl = on
> password_encryption = on
> shared_buffers = 1GB
> work_mem = 5MB
> maintenance_work_mem = 256MB
> max_fsm_pages = 280
> max_fsm_relations = 16
> synchronous_commit = off
> checkpoint_segments = 6
> checkpoint_warning = 30s
> effective_cache_size = 1GB
>
>
> pg_stat_bgwriter:
>
> checkpoints_timed: 16660
> checkpoints_req: 1309
> buffers_checkpoint: 656346
> buffers_clean: 120922
> maxwritten_clean: 1
> buffers_backend: 167623
> buffers_alloc: 472802349

You're right - I missed that.  But I still want to see pg_stat_all_tables.

I wonder if it would be worth attaching strace to the stats collector
and trying to get some idea what it's doing (if FreeBSD has
strace...).

Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Ivan Voras
2009/9/3 Scott Otis :
> Sorry about not responding to the whole list earlier - this is my first time 
> posting to a mailing list.
>
> Would providing more information about the size and complexities of the 
> databases help?
>
> I measure I/O stats with iostat - here is the command I use:
>
> iostat -d -x mfid0 -t 290 2

Simply do "iostat mfid0 1" and post 10 lines of its output.

> When using 'top -m io' the postgres stats collector process is constantly at 
> 99% - 100%.

In itself it doesn't mean much. The number of IOs is important.

> I don't think my setup is necessarily slow.  I just want to make it as 
> efficient as possible and wanted to get some feedback to see if am setting 
> things up right.  I am also looking out into the future and seeing how much 
> load I can put on this server before getting another one.  If I can reduce 
> the I/O and CPU that the stats collector and autovacuum are using without 
> losing any functionality then I can put more load on the server.

In general it's tricky to optimize for unknown targets - if your
performance is OK right now, you should leave it alone.

On the other hand, your diagnosis of stats collector doing 46 MB/s
points to something very abnormal. You should probably post your
entire postgresql.conf.

-- 
f+rEnSIBITAhITAhLR1nM9F4cIs5KJrhbcsVtUIt7K1MhWJy1A==

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL not using index for statement with group by

2009-09-03 Thread Kevin Grittner
Mark Starkman  wrote:
 
> I'm not sure how to get it to return in
> reasonable amount of time.
 
Some more information could help.
 
What version of PostgreSQL is this?
 
Please give an overview of the hardware and OS.
 
Please show your postgresql.conf file, excluding comments.
 
Please run your query with EXPLAIN ANALYZE in front, so we can see the
execution plan, with cost estimates compared to actual information. 
If the the plan indicates a sequential scan, and you think an indexed
scan may be faster, you might be able to coerce it into the indexed
plan for diagnostic purposes by running this on the connection before
an EXPLAIN ANALYZE run:
 
set enable_seqscan = off;
 
You don't want to leave it off, or try to use that in production, but
it might be useful in figuring out what's going on.
 
That might be enough to diagnose the issue.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Scott Otis
> Simply do "iostat mfid0 1" and post 10 lines of its output.

  tty   mfid0 cpu
 tin tout  KB/t tps  MB/s  us ni sy in id
   0  152 108.54 335 35.51  43  0 30  1 27
   0  525 85.73 759 63.55  14  0 12  0 74
   0   86 67.72 520 34.39  13  0 12  0 75
   0   86 86.89 746 63.26  12  0 12  0 76
   0   86 70.09 594 40.65  13  0 11  0 76
   0   86 78.50 756 57.99  13  0 10  0 77
   0  351 81.46 774 61.61  12  0 11  0 77
   0   86 63.87 621 38.72   9  0  8  0 83
   0   86 80.87 821 64.86   8  0  8  0 83
   0   86 58.78 637 36.55  11  0 11  0 77

Scott




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Scott Otis
> Can you post to the list all the uncommented lines from your
postgresql.conf file and attach the results of "select * from
pg_stat_all_tables" as an attachment?

I attached a CSV of "select * from pg_stat_all_tables" from one of our
more heavily used databases.  Note: I turned off stats collection and
autvacuuming a couple days ago to see what it would do and then
restarted postgres - I turned those back on this morning to that is why
there aren't more autovacuumed and autoanalyzed tables.

Sorry if this is a little verbose - I didn't want to leave anything out.

Uncommented lines from Postgresql.conf:

listen_addresses = '*'
max_connections = 600
ssl = on
password_encryption = on
shared_buffers = 1GB
work_mem = 5MB
maintenance_work_mem = 256MB
max_fsm_pages = 280
max_fsm_relations = 16
synchronous_commit = off
checkpoint_segments = 6
checkpoint_warning = 30s
effective_cache_size = 1GB
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/pgsql'
log_filename = '%m%d%y_%H%M%S-pgsql.log'
log_rotation_age = 1d
log_rotation_size = 10MB
log_min_messages = warning
log_error_verbosity = default
log_min_error_statement = warning
silent_mode = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_temp_files = 0
track_activities = on
track_counts = on
update_process_title = off
log_parser_stats = off
log_planner_stats = off
log_executor_stats = off
log_statement_stats = off
autovacuum = on
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'



pg_stat_all_tables2.csv
Description: pg_stat_all_tables2.csv

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SAAS and MySQL

2009-09-03 Thread Merlin Moncure
On Thu, Sep 3, 2009 at 9:52 AM, Tguru wrote:
>
>
> So as I understand, what you need is an online database program able to
> perform ETL tasks, that works in the cloud.
>
> There are a few companies out there able to perform what you are asking.
> What I could propose is a company called Talend. With Talend On Demand.
>
> This solution is based on the open source Talend Open Studio. You are
> offered a collaborative platform to work on, meaning that all your teams in
> different countries will be working on the same database on a secured web
> service.
>
> Go check it out on the website:
> http://www.talend.com/talend-on-demand/talend-on-demand.php . Hope this
> helps.
>
>
>
>
>
>
>
>
> Rstat wrote:
>>
>>
>> Hi all,
>>
>> We are a young, new company on the market. We are starting to open up new
>> markets in other countries (Europe).
>>
>> It somewhat is a challenge for us: we can't share our data and mysql
>> database between all our different services. So here is my question: do
>> you think it would be possible to find an ETL program that could work in
>> the cloud?
>>
>> It would not have to be too complex, but sturdy and working as a Software
>> as a Service.
>>
>> Thanks a lot for your help.
>>
>

huh?

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL not using index for statement with group by

2009-09-03 Thread Scott Marlowe
On Thu, Sep 3, 2009 at 7:33 AM, Mark Starkman wrote:
> I am new to PostgreSQL and I am evaluating it for use as a data  warehouse.
> I am really struggling to get a simple query to perform well.  I have put
> the appropriate indexes on the table (at least they are appropriate from my
> use with several other RDBMS’s).

Ok, first things first.  Pgsql isn't like most other dbms.  It's
indexes do not contain visibility info, which means that if the db
were to use the indexes to look up entries in a table, it still has to
go back to the table to look those values up to see if they are
visible to the current transation.

So, if you're retrieving a decent percentage of the table, it's
cheaper to just hit the table.  Note that this makes PostgreSQL poorly
suited for very wide tables.

Generally the trick to making large accesses run fast in pgsql is to
increase work_mem.  But some queries just aren't efficient in pgsql
that can be efficient in other dbs.

Possibly clustering on product_group_id would help.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Robert Haas
On Thu, Sep 3, 2009 at 7:11 PM, Scott Otis wrote:
>> Can you post to the list all the uncommented lines from your
> postgresql.conf file and attach the results of "select * from
> pg_stat_all_tables" as an attachment?
>
> I attached a CSV of "select * from pg_stat_all_tables" from one of our
> more heavily used databases.  Note: I turned off stats collection and
> autvacuuming a couple days ago to see what it would do and then
> restarted postgres - I turned those back on this morning to that is why
> there aren't more autovacuumed and autoanalyzed tables.

Do you by any chance have a bazillion databases in this cluster?  Can
you do these?

select sum(1) from pg_database;
select pg_relation_size('pg_database');
select sum(pg_column_size(d.*)) from pg_database;

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Scott Otis
Robert Haas :
 
> Do you by any chance have a bazillion databases in this cluster?  Can
> you do these?

> select sum(1) from pg_database;
 
1555 

> select pg_relation_size('pg_database');
 
221184

> select sum(pg_column_size(d.*)) from pg_database;
 
That gave me:
 
ERROR:  missing FROM-clause entry for table "d"
LINE 1: select sum(pg_column_size(d.*)) from pg_database;

So I did this: 
 
select sum(pg_column_size(d.*)) from pg_database as d;
 
and got:
 
192910
 
Also did this:
 
select sum(pg_database_size(datname)) from pg_database;
 
and got:
 
13329800428 (12.4GB)
 
Scott
 


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Claus Guttesen
>> > max_fsm_pages = 280
>> > max_fsm_relations = 16
>
>> What does the last couple of lines from a 'vacuum analyze verbose'
>> say? I have max_fsm_pages = 400 and max_fsm_relations = 1500.
>
>> You can also try to lower random_page_cost to a lower value like 1.2
>> but I doubt this will help in your case.
>
> last couple lines from 'vacuumdb -a -v -z':
>
> INFO:  free space map contains 114754 pages in 42148 relations
> DETAIL:  A total of 734736 page slots are in use (including overhead).

v---
> 734736 page slots are required to track all free space.
^---

> Current limits are:  280 page slots, 16 relations, using 26810 kB.

You can lower your max_fsm_pages setting to a number above 'xyz page
slots required ...' to 100 and fsm-relations to like 5.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Seeking performance advice and explanation for high I/O on 8.3

2009-09-03 Thread Scott Otis
Claus Guttesen kome...@gmail.com:

> > Would love to get some advice on how to change my conf settings / setup to
> > get better I/O performance.
> >
> > Server Specs:
> >
> > 2x Intel Xeon Quad Core (@2 Ghz - Clovertown,L5335)
> > 4GB RAM
> > 4x Seagate 73GB SAS HDD 10k RPM - in RAID ( stripped and mirrored )
> >
> > FreeBSD 6.4
> > Apache 2.2
> > PostgreSQL 8.3.6
> > PHP 5.2.9
> >
> > ~1500 databases w/ ~60 tables each
> >
> > max_connections = 600
> > shared_buffers = 1GB

> On a dual-core HP DL380 with 16 GB ram I have set shared_buffers at
> 512 MB for 900 max_connections. Far the largest table have approx. 120
> mill. records. You could try to lower shared_buffers.

> > max_fsm_pages = 280
> > max_fsm_relations = 16

> What does the last couple of lines from a 'vacuum analyze verbose'
> say? I have max_fsm_pages = 400 and max_fsm_relations = 1500.

> You can also try to lower random_page_cost to a lower value like 1.2
> but I doubt this will help in your case.
 
last couple lines from 'vacuumdb -a -v -z':

INFO:  free space map contains 114754 pages in 42148 relations
DETAIL:  A total of 734736 page slots are in use (including overhead).
734736 page slots are required to track all free space.
Current limits are:  280 page slots, 16 relations, using 26810 kB.


Scott