Re: [PERFORM] Heavy continuous load

2011-11-11 Thread kzsolt
Looks like I found more magic.

My table: each record is about 1 kbyte and contains a dozen columns, some
text, some numeric; some of the numeric columns are indexed. The database is
located on a ramdisk (tmpfs) ((I hope)). The table is continuously filled
with rows.

If the table has fewer than 4M records, everything looks fine.
But at around 6M records the CPU load goes very high, and even COUNT(*) needs
8 sec of execution time (1 sec/Mrec). The inserts slow down too.
Stranger still, if I search for a record by an indexed column, the server
brings it up very quickly!

My server config is:
/
max_connections  = 24
shared_buffers  = 256MB
log_destination = 'stderr'  # Valid values are combinations of
logging_collector  = true
silent_mode = on    # Run server silently.
log_line_prefix = '%t %d %u '
datestyle = 'iso, ymd'
lc_messages = 'hu_HU'   # locale for system error message
lc_monetary = 'hu_HU'   # locale for monetary formatting
lc_numeric = 'hu_HU'    # locale for number formatting
lc_time = 'hu_HU'   # locale for time formatting
default_text_search_config = 'pg_catalog.hungarian'
port = 9033
unix_socket_directory = standard disk
log_directory = standard disk
log_filename = 'sqld.log'
effective_cache_size = 8MB
checkpoint_segments = 16
synchronous_commit = off
/

Any idea how to increase the performance?
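A few diagnostics that could narrow this down (a sketch only; "t" is a
placeholder for the actual table name, and dead-row bloat from the continuous
load is one common cause of slow sequential scans):

```sql
-- Physical size on the tmpfs, including indexes and TOAST:
SELECT pg_size_pretty(pg_total_relation_size('t'));
-- Live vs. dead rows; a large n_dead_tup suggests vacuum is not keeping up:
SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 't';
-- See where the 8 seconds go:
EXPLAIN ANALYZE SELECT count(*) FROM t;
```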




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] WAL partition filling up after high WAL activity

2011-11-11 Thread Rafael Martinez

On 11/09/2011 05:06 PM, Greg Smith wrote:
> On 11/07/2011 05:18 PM, Richard Yen wrote:
>> My biggest question is: we know from the docs that there should be no
>> more than (2 + checkpoint_completion_target) * checkpoint_segments + 1
>> files.  For us, that would mean no more than 48 files, which equates
>> to 384MB--far lower than the 9.7GB partition size.  **Why would WAL
>> use up so much disk space?**
>>
> 
> That's only true if things are operating normally.  There are at least
> two ways this can fail to be a proper upper limit on space used:
> 
> 1) You are archiving to a second system, and the archiving isn't keeping
> up.  Things that haven't been archived can't be re-used, so more disk
> space is used.
> 
> 2) Disk I/O is slow, and the checkpoint writes take a significant period
> of time.  The internal scheduling assumes each individual write will
> happen without too much delay.  That assumption can easily be untrue on
> a busy system.  The worst I've seen now are checkpoints that take 6
> hours to sync, where the time is supposed to be a few seconds.  Disk
> space in that case was a giant multiple of checkpoint_segments.  (The
> source of that problem is very much improved in PostgreSQL 9.1)
> 


Hello

We had a similar case in June but did not find the cause of our
problem. More details and information:
http://archives.postgresql.org/pgsql-docs/2011-06/msg7.php

Your explanation in 2) sounds like a good candidate for the problem we
had. As I said in June, I think we need to improve the documentation in
this area. A note in the documentation about what you have explained in
2), perhaps with some hints about how to find out whether this is
happening, would be a great improvement.

We did not understand why we experienced this problem in June when
creating a GIN index on a tsvector column. But we found out that a lot
of the tsvector data was generated from "garbage" data (base64 encoding
of huge attachments). When we generated the right tsvector data, the
creation of the GIN index ran smoothly and the problem with extra WAL
files disappeared.

PS.- In our case, the disk space used by all the extra WAL files was
almost equivalent to the 17GB of our GIN index.
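A quick way to watch for this (a sketch; requires superuser, and assumes the
default 16MB segment size with WAL living in pg_xlog, as in 8.x/9.0):

```sql
SELECT count(*) AS segments,
       pg_size_pretty(count(*) * 16777216) AS approx_size
FROM pg_ls_dir('pg_xlog') AS f
WHERE f ~ '^[0-9A-F]{24}$';
```

Comparing the result against (2 + checkpoint_completion_target) *
checkpoint_segments + 1 shows how far past the normal ceiling the server
has drifted.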

regards,
- -- 
 Rafael Martinez Guerrero
 Center for Information Technology
 University of Oslo, Norway

 PGP Public Key: http://folk.uio.no/rafael/



[PERFORM] avoiding seq scans when two columns are very correlated

2011-11-11 Thread Ruslan Zakirov
Hello,

A table has two columns, id and EffectiveId. The first is the primary key.
EffectiveId is almost always equal to id (95%) unless records have been
merged. Many queries have an id = EffectiveId condition. Both columns have
very high cardinality, so Pg reasonably decides that the condition has very
low selectivity and picks a sequential scan.

A simple perl script that demonstrates the estimation error:
https://gist.github.com/1356744

The estimate is ~200 times off (5 vs 950); the real situation is very
similar. Understandably, the difference depends on the correlation
coefficient.

In the application, this misestimate results in a sequential scan of this
table winning the leading position in execution plans over other tables and
index scans.

What can I do to avoid this problem?

Tested with PostgreSQL 9.0.3 on x86_64-apple-darwin10.6.0, compiled by
GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664),
64-bit
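The misestimate can also be reproduced in plain SQL (an illustrative sketch,
not the gist itself; table and column names are made up):

```sql
CREATE TABLE tickets AS
SELECT i AS id,
       CASE WHEN random() < 0.95 THEN i ELSE i - 1 END AS effectiveid
FROM generate_series(1, 100000) AS i;
ANALYZE tickets;
-- The planner treats id and effectiveid as independent columns, so the
-- row estimate for this condition comes out far below the real ~95%:
EXPLAIN SELECT * FROM tickets WHERE id = effectiveid;
```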

-- 
Best regards, Ruslan.



Re: [PERFORM] avoiding seq scans when two columns are very correlated

2011-11-11 Thread Tom Lane
Ruslan Zakirov  writes:
> A table has two columns id and EffectiveId. First is primary key.
> EffectiveId is almost always equal to id (95%) unless records are
> merged. Many queries have id = EffectiveId condition. Both columns are
> very distinct and Pg reasonably decides that condition has very low
> selectivity and picks sequence scan.

I think the only way is to rethink your data representation.  PG doesn't
have cross-column statistics at all, and even if it did, you'd be asking
for an estimate of conditions in the "long tail" of the distribution.
That's unlikely to be very accurate.

Consider adding a "merged" boolean, or defining effectiveid differently.
For instance you could set it to null in unmerged records; then you
could get the equivalent of the current meaning with
COALESCE(effectiveid, id).  In either case, PG would then have
statistics that bear directly on the question of how many merged vs
unmerged records there are.
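A sketch of the NULL-based variant (table and column names are illustrative,
carried over from the thread rather than from any real schema):

```sql
-- Represent "unmerged" as NULL instead of effectiveid = id:
UPDATE tickets SET effectiveid = NULL WHERE effectiveid = id;
ANALYZE tickets;  -- null_frac now directly measures the unmerged share

-- Equivalent of the old id = effectiveid test:
SELECT * FROM tickets WHERE effectiveid IS NULL;

-- The old meaning, where both cases are needed:
SELECT COALESCE(effectiveid, id) AS effective_id FROM tickets;
```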

regards, tom lane



[PERFORM] where clause + function, execution order

2011-11-11 Thread Sorin Dudui
Hi,

I have the following function:

===
CREATE OR REPLACE FUNCTION xxx(text)
  RETURNS SETOF v AS
$BODY$
select a.x, a.y,
CASE
WHEN strpos($1,b.x) > 0
THEN b.x
ELSE NULL
END AS mp_hm
from  a LEFT JOIN  b ON a.id=b.id

$BODY$
  LANGUAGE sql STABLE
  COST 1000
  ROWS 1;
===

which I call as:

select * from xxx('test0|test1')  where a.x = 'value'


I am wondering when the where clause (a.x = 'value') is executed. After the
select statement in the function finishes? Or is it appended to the select
statement in the function?


Thank you,
Sorin




Re: [PERFORM] where clause + function, execution order

2011-11-11 Thread Julius Tuskenis

Hello,

On 2011.11.11 17:38, Sorin Dudui wrote:
> Hi,
>
> I have the following function:
>
> ===
> CREATE OR REPLACE FUNCTION xxx(text)
>   RETURNS SETOF v AS
> $BODY$
> select a.x, a.y,
> CASE
> WHEN strpos($1,b.x) > 0
> THEN b.x
> ELSE NULL
> END AS mp_hm
> from  a LEFT JOIN  b ON a.id=b.id
> $BODY$
>   LANGUAGE sql STABLE
>   COST 1000
>   ROWS 1;
> ===
>
> which I call as:
>
> select * from xxx('test0|test1')  where a.x = 'value'

You should get an error, as there is no "a" in this statement...

> I am wondering when the where clause (a.x = 'value') is executed.
> After the select statement in the function finishes? Or is it appended
> to the select statement in the function?

The function's execution plan is prepared when the function is created,
so the "where" clause should filter the function's result rather than
alter its execution.


--
Julius Tuskenis
Head of the programming department
UAB nSoft
mob. +37068233050


Re: [PERFORM] where clause + function, execution order

2011-11-11 Thread Richard Huxton

On 11/11/11 15:54, Julius Tuskenis wrote:
> On 2011.11.11 17:38, Sorin Dudui wrote:
>> I have the following function:
>>
>> CREATE OR REPLACE FUNCTION xxx(text)
[snip]
>> LANGUAGE sql STABLE
>
> The function's execution plan is prepared when the function is created,
> so the "where" clause should filter the function's result rather than
> alter its execution.

Not true for SQL functions. They can be inlined, but I'm not sure if
this one will be.

What does EXPLAIN ANALYSE show for this query?


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] where clause + function, execution order

2011-11-11 Thread Tom Lane
Sorin Dudui  writes:
> I am wondering when the where clause (a.x = 'value') is executed. After the 
> select statement in the function finishes? Or is it appended at the select 
> statement in the function?

EXPLAIN is your friend ...

In this case the function looks inline-able, so reasonably recent
versions of PG should do what you want.
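A sketch of the check (note the filter has to reference an output column of
the SETOF v result, not the function's internal alias "a"; the column name x
is assumed from the function body):

```sql
-- If the function was inlined, the plan shows the underlying join of
-- a and b with the filter pushed down; otherwise a Function Scan on xxx.
EXPLAIN ANALYZE
SELECT * FROM xxx('test0|test1') AS v WHERE v.x = 'value';
```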

regards, tom lane



Re: [PERFORM] where clause + function, execution order

2011-11-11 Thread Sorin Dudui
Hi,

this is the EXPLAIN ANALYSE output:


"Merge Left Join  (cost=0.00..2820.34 rows=23138 width=777) (actual 
time=0.049..317.935 rows=26809 loops=1)"
"  Merge Cond: ((a.admin10)::text = (b.link_id)::text)"
"  ->  Index Scan using admin_lookup_admin10 on admin_lookup a  
(cost=0.00..845.04 rows=5224 width=742) (actual time=0.015..40.263 rows=8100 
loops=1)"
"Filter: (((admin40)::text <> '-1'::text) AND (((admin40)::text = 
'ITA10'::text) OR ((admin40)::text = 'ITA15'::text) OR ((admin40)::text = 
'ITA19'::text) OR ((admin40)::text = 'ITA04'::text) OR ((admin40)::text = 
'ITA09'::text) OR ((admin40)::text = 'ITA03'::text) OR ((admin40)::text = 
'ITA08'::text) OR ((admin40)::text = 'ITA17'::text) OR ((admin40)::text = 
'ITA02'::text) OR ((admin40)::text = 'ITA18'::text) OR ((admin40)::text = 
'ITA01'::text) OR ((admin40)::text = 'ITA20'::text) OR ((admin40)::text = 
'ITA13'::text) OR ((admin40)::text = 'ITA11'::text) OR ((admin40)::text = 
'ITA14'::text) OR ((admin40)::text = 'ITA16'::text) OR ((admin40)::text = 
'ITA07'::text) OR ((admin40)::text = 'ITA06'::text) OR ((admin40)::text = 
'ITA12'::text) OR ((admin40)::text = 'ITA05'::text)))"
"  ->  Index Scan using reg_data_a08id_copy on registrations_data b  
(cost=0.00..1496.89 rows=24174 width=45) (actual time=0.008..70.408 rows=24174 
loops=1)"
"Total runtime: 372.765 ms"


Regards,
Sorin

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Richard Huxton
Sent: Friday, 11 November 2011 17:00
To: Julius Tuskenis
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] where clause + function, execution order

On 11/11/11 15:54, Julius Tuskenis wrote:
> On 2011.11.11 17:38, Sorin Dudui wrote:
>> I have the following function:
>>
>> CREATE OR REPLACE FUNCTION xxx(text)
[snip]
>> LANGUAGE sql STABLE

> The function's execution plan is prepared when the function is created,
> so the "where" clause should filter the function's result rather than
> alter its execution.

Not true for SQL functions. They can be inlined, but I'm not sure if this one 
will be.

What does EXPLAIN ANALYSE show for this query?


-- 
   Richard Huxton
   Archonet Ltd



Re: [PERFORM] avoiding seq scans when two columns are very correlated

2011-11-11 Thread Ruslan Zakirov
On Fri, Nov 11, 2011 at 7:36 PM, Tom Lane  wrote:
> Ruslan Zakirov  writes:
>> A table has two columns id and EffectiveId. First is primary key.
>> EffectiveId is almost always equal to id (95%) unless records are
>> merged. Many queries have id = EffectiveId condition. Both columns are
>> very distinct and Pg reasonably decides that condition has very low
>> selectivity and picks sequence scan.
>
> I think the only way is to rethink your data representation.  PG doesn't
> have cross-column statistics at all, and even if it did, you'd be asking
> for an estimate of conditions in the "long tail" of the distribution.
> That's unlikely to be very accurate.

Rethinking the schema is an option that requires more consideration, as we
have done it this way for years and run the product on MySQL, Pg and Oracle.
The issue affects Oracle as well, but there it can be worked around by
dropping indexes, or perhaps by building correlation statistics in 11g (we
haven't tried that yet).

I wonder if the "cross column statistics" patch that has been floating
around would help with such a case?

> Consider adding a "merged" boolean, or defining effectiveid differently.
> For instance you could set it to null in unmerged records; then you
> could get the equivalent of the current meaning with
> COALESCE(effectiveid, id).  In either case, PG would then have
> statistics that bear directly on the question of how many merged vs
> unmerged records there are.

NULL in EffectiveId is the way to go; however, when we actually need
those records (not a frequent situation) the query becomes frightening:

SELECT main.* FROM Tickets main
JOIN Tickets te
ON te.EffectiveId = main.id
OR (te.id = main.id AND te.EffectiveId IS NULL)
JOIN OtherTable ot
ON ot.Ticket = te.id

Past experience suggests that joins with ORs are handled poorly by many optimizers.

In the current situation the join condition is very straightforward and effective.

>                        regards, tom lane

-- 
Best regards, Ruslan.



Re: [PERFORM] where clause + function, execution order

2011-11-11 Thread Richard Huxton

On 11/11/11 16:28, Sorin Dudui wrote:

Hi,

this is the EXPLAIN ANALYSE output:


"Merge Left Join  (cost=0.00..2820.34 rows=23138 width=777) (actual 
time=0.049..317.935 rows=26809 loops=1)"
"  Merge Cond: ((a.admin10)::text = (b.link_id)::text)"
"  ->   Index Scan using admin_lookup_admin10 on admin_lookup a  (cost=0.00..845.04 
rows=5224 width=742) (actual time=0.015..40.263 rows=8100 loops=1)"
"Filter: (((admin40)::text<>  '-1'::text) AND (((admin40)::text = 
'ITA10'::text) OR ((admin40)::text = 'ITA15'::text) OR ((admin40)::text = 'ITA19'::text) OR 
((admin40)::text = 'ITA04'::text) OR ((admin40)::text = 'ITA09'::text) OR ((admin40)::text = 
'ITA03'::text) OR ((admin40)::text = 'ITA08'::text) OR ((admin40)::text = 'ITA17'::text) OR 
((admin40)::text = 'ITA02'::text) OR ((admin40)::text = 'ITA18'::text) OR ((admin40)::text = 
'ITA01'::text) OR ((admin40)::text = 'ITA20'::text) OR ((admin40)::text = 'ITA13'::text) OR 
((admin40)::text = 'ITA11'::text) OR ((admin40)::text = 'ITA14'::text) OR ((admin40)::text = 
'ITA16'::text) OR ((admin40)::text = 'ITA07'::text) OR ((admin40)::text = 'ITA06'::text) OR 
((admin40)::text = 'ITA12'::text) OR ((admin40)::text = 'ITA05'::text)))"
"  ->   Index Scan using reg_data_a08id_copy on registrations_data b  
(cost=0.00..1496.89 rows=24174 width=45) (actual time=0.008..70.408 rows=24174 loops=1)"
"Total runtime: 372.765 ms"


That certainly looks like it has been inlined. You are testing for
"ITA10", "ITA15", etc. outside the function call, no? It's pushing those
tests down, using the index "admin_lookup_admin10" to evaluate them, then
joining afterwards.


--
  Richard Huxton
  Archonet Ltd



[PERFORM] unlogged tables

2011-11-11 Thread Anibal David Acosta
Hello, just for clarification.

Unlogged tables are not memory tables, are they?

If we stop the Postgres server (a normal stop) and start it again, does all
the information in unlogged tables remain?

So, should I expect data loss only in case of a crash, power failure or OS
crash?

In case of a crash, is it possible for data corruption to happen in unlogged
tables?

For performance purposes, can I use async commit together with unlogged
tables?

Thanks!



Re: [PERFORM] unlogged tables

2011-11-11 Thread Stephen Frost
* Anibal David Acosta (a...@devshock.com) wrote:
> Unlogged tables are not memory tables, are they?

Unlogged tables are not memory tables.

> If we stop the Postgres server (a normal stop) and start it again, does
> all the information in unlogged tables remain?

Yes.

> So, should I expect data loss only in case of a crash, power failure or
> OS crash?

Yes.

> In case of a crash, is it possible for data corruption to happen in
> unlogged tables?

In a crash, unlogged tables are automatically truncated.

> For performance purposes, can I use async commit together with unlogged
> tables?

I'm not aware of any issues (beyond those already documented for async
commit..) with having async commit and unlogged tables.

Thanks,

Stephen
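A small sketch of the combination (illustrative table and values; unlogged
tables exist from PostgreSQL 9.1 onward):

```sql
-- No WAL is written for this table: it survives a clean restart,
-- but is truncated automatically during crash recovery.
CREATE UNLOGGED TABLE metrics (ts timestamptz, value numeric);

-- Async commit is orthogonal: it can lose the last few commits on a
-- crash, but it does not corrupt data.
SET synchronous_commit = off;
INSERT INTO metrics VALUES (now(), 42);
```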




[PERFORM] Using incrond for archiving

2011-11-11 Thread Shaun Thomas

Hey guys,

I've been running some tests while setting up some tiered storage, and I 
noticed something. Even having an empty 'echo' as archive_command 
drastically slows down certain operations. For instance:


=> ALTER TABLE foo SET TABLESPACE slow_tier;
ALTER TABLE
Time: 3969.962 ms

When I set archive_command to anything:

=> ALTER TABLE foo SET TABLESPACE slow_tier;
ALTER TABLE
Time: 11969.962 ms

I'm guessing it has something to do with the forking code, but I haven't 
dug into it very deeply yet.


I remembered seeing incrond as a way to grab file triggers, and did some 
tests with an incrontab of this:


/db/wal/ IN_CLOSE_WRITE cp -a $@/$# /db/archive/$#

Sure enough, files don't appear there until PG closes them after 
writing. The background writing also doesn't appear to affect speed of 
my test command.


So my real question: is this safe? Supposedly the trigger only gets 
activated when the xlog file is closed, which only the PG daemon should 
be doing. I was only testing, so I didn't add a 'test -f' command to 
prevent overwriting existing archives, but I figured... why bother if 
there's no future there?
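For reference, the overwrite guard could be folded into the trigger itself
(an untested sketch; incrond does not pass commands through a shell, so shell
operators need an explicit sh -c, with incron's $@/$# expansion as in the
original line):

```
/db/wal/ IN_CLOSE_WRITE sh -c 'test ! -f /db/archive/$# && cp -a $@/$# /db/archive/$#'
```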


I'd say tripling the latency for some database writes is a pretty 
significant difference, though. I'll defer to the experts in case this 
is sketchy. :)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email



Re: [PERFORM] WAL partition filling up after high WAL activity

2011-11-11 Thread Greg Smith

On 11/11/2011 04:54 AM, Rafael Martinez wrote:

Your explanation in 2) sounds like a good candidate for the problem we
had. As I said in june, I think we need to improve the documentation in
this area. A note in the documentation about what you have explained in
2) with maybe some hints about how to find out if this is happening will
be a great improvement.
   


A new counter was added to pg_stat_bgwriter in PostgreSQL 9.1 that 
tracks when the problem I described happens.  It's hard to identify it 
specifically without a source code change of some sort.  Initially I 
added new logging to the server code to identify the issue before the 
new counter was there.  The only thing you can easily look at that tends 
to correlate well with the worst problems here is the output from 
turning log_checkpoint on.  Specifically, the "sync" times going way up 
is a sign there's a problem with write speed.


As for the documentation, not much has really changed from when you 
brought this up on the docs list.  The number of WAL files that can be 
created by a "short-term peak" is unlimited, which is why there's no 
better limit listed than that.  Some of the underlying things that make 
the problem worse are operating system level issues, not ones in the 
database itself; the PostgreSQL documentation doesn't try to wander too 
far into that level.  There are also a large number of things you can do 
at the application level that will generate a lot of WAL activity.  It 
would be impractical to list all of them in the checkpoint documentation 
though.


On reviewing this section of the docs again, one thing that we could do 
is make the "WAL Configuration" section talk more about log_checkpoints 
and interpreting its output.  Right now there's no mention of that 
parameter in the section that talks about parameters to configure; there 
really should be.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

