Re: [PERFORM] Heavy continuous load
Looks like I found more magic. My table is: each record is about 1 kbyte and contains a dozen columns, some text, some numeric; some of the numeric columns are indexed. The database is located on a ramdisk (tmpfs) (I hope). The table is continuously filled with rows. If the table has fewer than 4Mrec then everything looks fine. But at around 6Mrec the CPU load goes very high and even COUNT(*) needs 8 sec of execution time (1 sec/Mrec). Inserts are slowing down too. Stranger still, if I search for a record by an indexed column, the server returns it very quickly!

My server config is:

    max_connections = 24
    shared_buffers = 256MB
    log_destination = 'stderr'
    logging_collector = true
    silent_mode = on                  # Run server silently.
    log_line_prefix = '%t %d %u '
    datestyle = 'iso, ymd'
    lc_messages = 'hu_HU'             # locale for system error messages
    lc_monetary = 'hu_HU'             # locale for monetary formatting
    lc_numeric = 'hu_HU'              # locale for number formatting
    lc_time = 'hu_HU'                 # locale for time formatting
    default_text_search_config = 'pg_catalog.hungarian'
    port = 9033
    unix_socket_directory = standard disk
    log_directory = standard disk
    log_filename = 'sqld.log'
    effective_cache_size = 8MB
    checkpoint_segments = 16
    synchronous_commit = off

Any idea how it is possible to increase the performance?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Heavy-contgnous-load-tp4913425p4965371.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
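One thing worth checking for a continuously-filled table is heap bloat and whether autovacuum is keeping up, since COUNT(*) in PostgreSQL has to scan the whole heap and its runtime grows with the physical table size. A diagnostic sketch ('mytable' is a placeholder for the actual table name):

```sql
-- Physical size, live vs. dead tuples, and last autovacuum run.
-- A large n_dead_tup or a heap much bigger than the live data suggests
-- bloat from the continuous insert (and any update/delete) traffic.
SELECT relname,
       pg_size_pretty(pg_relation_size(relid)) AS heap_size,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'mytable';
```

If the heap is far larger than expected, a VACUUM (or adjusting autovacuum settings) is the usual next step.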
Re: [PERFORM] WAL partition filling up after high WAL activity
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/09/2011 05:06 PM, Greg Smith wrote:
> On 11/07/2011 05:18 PM, Richard Yen wrote:
>> My biggest question is: we know from the docs that there should be no
>> more than (2 + checkpoint_completion_target) * checkpoint_segments + 1
>> files. For us, that would mean no more than 48 files, which equates
>> to 384MB -- far lower than the 9.7GB partition size. **Why would WAL
>> use up so much disk space?**
>
> That's only true if things are operating normally. There are at least
> two ways this can fail to be a proper upper limit on space used:
>
> 1) You are archiving to a second system, and the archiving isn't keeping
> up. Things that haven't been archived can't be re-used, so more disk
> space is used.
>
> 2) Disk I/O is slow, and the checkpoint writes take a significant period
> of time. The internal scheduling assumes each individual write will
> happen without too much delay. That assumption can easily be untrue on
> a busy system. The worst I've seen now are checkpoints that take 6
> hours to sync, where the time is supposed to be a few seconds. Disk
> space in that case was a giant multiple of checkpoint_segments. (The
> source of that problem is very much improved in PostgreSQL 9.1)

Hello

We had a similar case in June, but we did not find the cause of our
problem. More details and information:
http://archives.postgresql.org/pgsql-docs/2011-06/msg7.php

Your explanation in 2) sounds like a good candidate for the problem we
had. As I said in June, I think we need to improve the documentation in
this area. A note in the documentation about what you have explained in
2), with maybe some hints about how to find out whether this is
happening, would be a great improvement.

We did not understand why we experienced this problem in June when
creating a GIN index on a tsvector column. But we found out that a lot
of the tsvector data was generated from "garbage" data (base64 encoding
of huge attachments).
When we generated the right tsvector data, the creation of the GIN index
ran smoothly and the problem with the extra WAL files disappeared.

PS.- In our case, the disk space used by all the extra WAL files was
almost equivalent to the 17GB of our GIN index.

regards,
- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk688LoACgkQBhuKQurGihTbvQCfaSBdYNF2oOtErcx/e4u0Zw1J
pLIAn2Ztdbuz33es2uw8ddSIjj8UXe3s
=olkD
-----END PGP SIGNATURE-----
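The upper bound quoted earlier in this thread is easy to evaluate numerically. A small sketch (the parameter values below are illustrative, not the original poster's actual settings; the 16MB segment size is the compiled-in default):

```python
# Expected upper bound on WAL segment files under *normal* operation,
# per the docs formula: (2 + checkpoint_completion_target) * checkpoint_segments + 1.
# As the thread explains, archiving lag or slow checkpoint syncs can blow
# well past this bound.
SEGMENT_MB = 16  # default WAL segment size

def max_wal_files(checkpoint_segments, checkpoint_completion_target):
    """Docs-formula upper bound on the number of WAL segment files."""
    return int((2 + checkpoint_completion_target) * checkpoint_segments + 1)

files = max_wal_files(16, 0.5)          # e.g. checkpoint_segments = 16
print(files, files * SEGMENT_MB, "MB")  # segment count and disk space
```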
[PERFORM] avoiding seq scans when two columns are very correlated
Hello,

A table has two columns, id and EffectiveId. The first is the primary
key. EffectiveId is almost always equal to id (95%), unless records are
merged. Many queries have an id = EffectiveId condition. Both columns
have very many distinct values, and Pg reasonably decides that the
condition has very low selectivity and picks a sequential scan.

A simple perl script that demonstrates the estimation error:
https://gist.github.com/1356744

The estimate is ~200 times off (5 vs 950); the real situation is very
similar. Understandably, the difference depends on the correlation
coefficient.

In the application, such a wrong estimate results in a seq scan of this
table winning the leading position in execution plans over other tables
and index scans.

What can I do to avoid this problem?

Tested with PostgreSQL 9.0.3 on x86_64-apple-darwin10.6.0, compiled by
GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664),
64-bit

--
Best regards, Ruslan.
Re: [PERFORM] avoiding seq scans when two columns are very correlated
Ruslan Zakirov writes:
> A table has two columns id and EffectiveId. First is primary key.
> EffectiveId is almost always equal to id (95%) unless records are
> merged. Many queries have id = EffectiveId condition. Both columns are
> very distinct and Pg reasonably decides that condition has very low
> selectivity and picks sequence scan.

I think the only way is to rethink your data representation. PG doesn't
have cross-column statistics at all, and even if it did, you'd be asking
for an estimate of conditions in the "long tail" of the distribution.
That's unlikely to be very accurate.

Consider adding a "merged" boolean, or defining effectiveid differently.
For instance you could set it to null in unmerged records; then you
could get the equivalent of the current meaning with
COALESCE(effectiveid, id). In either case, PG would then have
statistics that bear directly on the question of how many merged vs
unmerged records there are.

			regards, tom lane
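The two alternatives above might be sketched like this (table and column names are illustrative, not from the poster's actual schema):

```sql
-- Option 1: an explicit "merged" flag. The planner keeps per-column
-- statistics on booleans, so it can estimate this condition directly.
ALTER TABLE tickets ADD COLUMN merged boolean NOT NULL DEFAULT false;
SELECT * FROM tickets WHERE NOT merged;

-- Option 2: NULL effectiveid means "not merged". null_frac statistics
-- make this estimable, and the old meaning is recovered with COALESCE.
SELECT id, COALESCE(effectiveid, id) AS effectiveid
FROM tickets
WHERE effectiveid IS NULL;   -- the unmerged records
```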
[PERFORM] where clause + function, execution order
Hi,

I have the following function:

===
CREATE OR REPLACE FUNCTION xxx(text)
  RETURNS SETOF v AS
$BODY$
select a.x, a.y,
       CASE WHEN strpos($1, b.x) > 0 THEN b.x ELSE NULL END AS mp_hm
from a LEFT JOIN b ON a.id = b.id
$BODY$
  LANGUAGE sql STABLE
  COST 1000
  ROWS 1;
===

which I call as:

select * from xxx('test0|test1') where a.x = 'value'

I am wondering when the where clause (a.x = 'value') is executed. After
the select statement in the function finishes? Or is it appended to the
select statement in the function?

Thank you,
Sorin
Re: [PERFORM] where clause + function, execution order
Hello,

On 2011.11.11 17:38, Sorin Dudui wrote:
> Hi, I have the following function:
>
> ===
> CREATE OR REPLACE FUNCTION xxx(text)
>   RETURNS SETOF v AS
> $BODY$
> select a.x, a.y,
>        CASE WHEN strpos($1, b.x) > 0 THEN b.x ELSE NULL END AS mp_hm
> from a LEFT JOIN b ON a.id = b.id
> $BODY$
>   LANGUAGE sql STABLE
>   COST 1000
>   ROWS 1;
> ===
>
> which I call as:
>
> select * from xxx('test0|test1') where a.x = 'value'

You should get an error, as there is no "a" in this statement...

> I am wondering when the where clause (a.x = 'value') is executed. After
> the select statement in the function finishes? Or is it appended to the
> select statement in the function?

The function's execution plan is prepared when it is created, so the
"where" clause should filter the function result, not alter its
execution.

--
Julius Tuskenis
Head of the programming department
UAB nSoft
mob. +37068233050
Re: [PERFORM] where clause + function, execution order
On 11/11/11 15:54, Julius Tuskenis wrote:
> On 2011.11.11 17:38, Sorin Dudui wrote:
>> I have the following function:
>>
>> CREATE OR REPLACE FUNCTION xxx(text)
[snip]
>> LANGUAGE sql STABLE
>
> Function execute plan is prepared when creating it, so the "where"
> clause should check the function result not altering its execution..

Not true for SQL functions. They can be inlined, but I'm not sure if
this one will be. What does EXPLAIN ANALYSE show for this query?

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] where clause + function, execution order
Sorin Dudui writes:
> I am wondering when the where clause (a.x = 'value') is executed. After
> the select statement in the function finishes? Or is it appended at the
> select statement in the function?

EXPLAIN is your friend ... In this case the function looks inline-able,
so reasonably recent versions of PG should do what you want.

			regards, tom lane
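Following the advice above, one way to check whether inlining happened is to compare the plan for the function call against the plan for the hand-expanded query (a sketch; note that the outer WHERE can only reference the function's output columns, so `a.x` in the original call would need to be just `x`):

```sql
-- If the function is inlined, this plan should match the expanded form
-- below, with the outer filter pushed down into the scan.
EXPLAIN
SELECT * FROM xxx('test0|test1') WHERE x = 'value';

-- The hand-expanded equivalent of the function body plus the outer filter:
EXPLAIN
SELECT a.x, a.y,
       CASE WHEN strpos('test0|test1', b.x) > 0 THEN b.x ELSE NULL END AS mp_hm
FROM a LEFT JOIN b ON a.id = b.id
WHERE a.x = 'value';
```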
Re: [PERFORM] where clause + function, execution order
Hi,

this is the EXPLAIN ANALYSE output:

"Merge Left Join  (cost=0.00..2820.34 rows=23138 width=777) (actual time=0.049..317.935 rows=26809 loops=1)"
"  Merge Cond: ((a.admin10)::text = (b.link_id)::text)"
"  ->  Index Scan using admin_lookup_admin10 on admin_lookup a  (cost=0.00..845.04 rows=5224 width=742) (actual time=0.015..40.263 rows=8100 loops=1)"
"        Filter: (((admin40)::text <> '-1'::text) AND (((admin40)::text = 'ITA10'::text) OR ((admin40)::text = 'ITA15'::text) OR ((admin40)::text = 'ITA19'::text) OR ((admin40)::text = 'ITA04'::text) OR ((admin40)::text = 'ITA09'::text) OR ((admin40)::text = 'ITA03'::text) OR ((admin40)::text = 'ITA08'::text) OR ((admin40)::text = 'ITA17'::text) OR ((admin40)::text = 'ITA02'::text) OR ((admin40)::text = 'ITA18'::text) OR ((admin40)::text = 'ITA01'::text) OR ((admin40)::text = 'ITA20'::text) OR ((admin40)::text = 'ITA13'::text) OR ((admin40)::text = 'ITA11'::text) OR ((admin40)::text = 'ITA14'::text) OR ((admin40)::text = 'ITA16'::text) OR ((admin40)::text = 'ITA07'::text) OR ((admin40)::text = 'ITA06'::text) OR ((admin40)::text = 'ITA12'::text) OR ((admin40)::text = 'ITA05'::text)))"
"  ->  Index Scan using reg_data_a08id_copy on registrations_data b  (cost=0.00..1496.89 rows=24174 width=45) (actual time=0.008..70.408 rows=24174 loops=1)"
"Total runtime: 372.765 ms"

Regards,
Sorin

-----Original message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On behalf of Richard Huxton
Sent: Friday, 11 November 2011 17:00
To: Julius Tuskenis
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] where clause + function, execution order

On 11/11/11 15:54, Julius Tuskenis wrote:
> On 2011.11.11 17:38, Sorin Dudui wrote:
>> I have the following function:
>>
>> CREATE OR REPLACE FUNCTION xxx(text)
[snip]
>> LANGUAGE sql STABLE
> Function execute plan is prepared when creating it, so the "where"
> clause should check the function result not altering its execution..

Not true for SQL functions. They can be inlined, but I'm not sure if
this one will be. What does EXPLAIN ANALYSE show for this query?

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] avoiding seq scans when two columns are very correlated
On Fri, Nov 11, 2011 at 7:36 PM, Tom Lane wrote:
> Ruslan Zakirov writes:
>> A table has two columns id and EffectiveId. First is primary key.
>> EffectiveId is almost always equal to id (95%) unless records are
>> merged. Many queries have id = EffectiveId condition. Both columns are
>> very distinct and Pg reasonably decides that condition has very low
>> selectivity and picks sequence scan.
>
> I think the only way is to rethink your data representation. PG doesn't
> have cross-column statistics at all, and even if it did, you'd be asking
> for an estimate of conditions in the "long tail" of the distribution.
> That's unlikely to be very accurate.

Rethinking the schema is an option that requires more consideration, as
we have done it this way for years and the product runs on mysql, Pg and
Oracle. The issue affects Oracle too, but there it can be worked around
by dropping indexes, or maybe by building correlation statistics in 11g
(didn't try it yet).

I wonder if the "CROSS COLUMN STATISTICS" patch that floats around would
help with such a case?

> Consider adding a "merged" boolean, or defining effectiveid differently.
> For instance you could set it to null in unmerged records; then you
> could get the equivalent of the current meaning with
> COALESCE(effectiveid, id). In either case, PG would then have
> statistics that bear directly on the question of how many merged vs
> unmerged records there are.

NULL in EffectiveId is the way to go; however, when we actually need
those records (not such a frequent situation) the query becomes
frightening:

SELECT main.* FROM Tickets main
    JOIN Tickets te
        ON te.EffectiveId = main.id
        OR (te.id = main.id AND te.EffectiveId IS NULL)
    JOIN OtherTable ot
        ON ot.Ticket = te.id

Past experience suggests that joins with ORs are handled poorly by many
optimizers. In the current situation the join condition is very
straightforward and effective.

--
Best regards, Ruslan.
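For what it's worth, one common workaround for OR-ed join conditions (a sketch, not tested against this schema) is to split the join into a UNION so that each branch keeps a simple, index-friendly condition:

```sql
-- Branch 1: rows joined through EffectiveId.
SELECT main.* FROM Tickets main
JOIN Tickets te ON te.EffectiveId = main.id
JOIN OtherTable ot ON ot.Ticket = te.id
UNION
-- Branch 2: unmerged rows, where EffectiveId is NULL and te is main itself.
SELECT main.* FROM Tickets main
JOIN Tickets te ON te.id = main.id AND te.EffectiveId IS NULL
JOIN OtherTable ot ON ot.Ticket = te.id;
```

UNION (rather than UNION ALL) also deduplicates rows that could otherwise appear in both branches.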
Re: [PERFORM] where clause + function, execution order
On 11/11/11 16:28, Sorin Dudui wrote:
> Hi,
>
> this is the EXPLAIN ANALYSE output:
>
> "Merge Left Join  (cost=0.00..2820.34 rows=23138 width=777) (actual time=0.049..317.935 rows=26809 loops=1)"
> "  Merge Cond: ((a.admin10)::text = (b.link_id)::text)"
> "  ->  Index Scan using admin_lookup_admin10 on admin_lookup a  (cost=0.00..845.04 rows=5224 width=742) (actual time=0.015..40.263 rows=8100 loops=1)"
> "        Filter: (((admin40)::text <> '-1'::text) AND (((admin40)::text = 'ITA10'::text) OR ((admin40)::text = 'ITA15'::text) OR [snip] ((admin40)::text = 'ITA05'::text)))"
> "  ->  Index Scan using reg_data_a08id_copy on registrations_data b  (cost=0.00..1496.89 rows=24174 width=45) (actual time=0.008..70.408 rows=24174 loops=1)"
> "Total runtime: 372.765 ms"

That certainly looks like it's been inlined. You are testing for
"ITA10", "ITA15" etc. outside the function-call, no? It's pushing those
tests down, using index "admin_lookup_admin10" to test for them, then
joining afterwards.

--
Richard Huxton
Archonet Ltd
[PERFORM] unlogged tables
Hello,

just for clarification. Unlogged tables are not memory tables, are they?

If we stop the postgres server (normal stop) and start it again, does
all information in unlogged tables still remain?

So, can I expect data loss only in case of a crash, power failure or OS
crash, right?

In case of a crash, is it possible that data corruption happens in an
unlogged table?

For performance purposes, can I use async commit and unlogged tables
together?

Thanks!
Re: [PERFORM] unlogged tables
* Anibal David Acosta (a...@devshock.com) wrote:
> Unlogged tables are not memory tables don't?

Unlogged tables are not memory tables.

> If we stop postgres server (normal stop) and start again, all information in
> unlogged tables still remain?

Yes.

> So, can I expect a data loss just in case of crash, power failure or SO
> crash don't?

Yes.

> In case of crash, is possible that data corruption happened in a unlogged
> tables?

In a crash, unlogged tables are automatically truncated.

> For performance purpose can I use async commit and unlogged tables?

I'm not aware of any issues (beyond those already documented for async
commit..) with having async commit and unlogged tables.

Thanks,

Stephen
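A minimal sketch of the combination being asked about (the table name and contents are illustrative):

```sql
-- Unlogged tables skip WAL entirely: contents survive a clean restart,
-- but the table is truncated after a crash, so only put rebuildable
-- data in one.
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    text
);

-- Async commit can be combined with it, e.g. per session:
SET synchronous_commit = off;
INSERT INTO session_cache VALUES ('abc123', 'some cached data');
```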
[PERFORM] Using incrond for archiving
Hey guys,

I've been running some tests while setting up some tiered storage, and I
noticed something. Even having an empty 'echo' as archive_command
drastically slows down certain operations. For instance, without
archiving:

=> ALTER TABLE foo SET TABLESPACE slow_tier;
ALTER TABLE
Time: 3969.962 ms

When I set archive_command to anything:

=> ALTER TABLE foo SET TABLESPACE slow_tier;
ALTER TABLE
Time: 11969.962 ms

I'm guessing it has something to do with the forking code, but I haven't
dug into it very deeply yet.

I remembered seeing incrond as a way to grab file triggers, and did some
tests with an incrontab of this:

/db/wal/ IN_CLOSE_WRITE cp -a $@/$# /db/archive/$#

Sure enough, files don't appear there until PG closes them after
writing. The background writing also doesn't appear to affect the speed
of my test command.

So my real question: is this safe? Supposedly the trigger only gets
activated when the xlog file is closed, which only the PG daemon should
be doing. I was only testing, so I didn't add a 'test -f' command to
prevent overwriting existing archives, but I figured... why bother if
there's no future there?

I'd say tripling the latency for some database writes is a pretty
significant difference, though. I'll defer to the experts in case this
is sketchy. :)

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions
related to this email
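For comparison, the 'test -f' guard mentioned above is what a conventional archive_command uses to avoid clobbering an existing archive. A sketch as a small shell function (paths and names are illustrative):

```shell
#!/bin/sh
# Sketch of a safe WAL archive step: refuse to overwrite an existing file.
# Usage: archive_wal <source-segment> <destination-file>
archive_wal() {
    src="$1"
    dest="$2"
    if [ -f "$dest" ]; then
        # An archive with this name already exists; overwriting it could
        # destroy a segment still needed for recovery.
        echo "archive file already exists: $dest" >&2
        return 1
    fi
    cp "$src" "$dest"
}
```

In postgresql.conf this corresponds to the documented pattern archive_command = 'test ! -f /db/archive/%f && cp %p /db/archive/%f'.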
Re: [PERFORM] WAL partition filling up after high WAL activity
On 11/11/2011 04:54 AM, Rafael Martinez wrote:
> Your explanation in 2) sounds like a good candidate for the problem we
> had. As I said in june, I think we need to improve the documentation in
> this area. A note in the documentation about what you have explained in
> 2) with maybe some hints about how to find out if this is happening
> will be a great improvement.

A new counter was added to pg_stat_bgwriter in PostgreSQL 9.1 that
tracks when the problem I described happens. It's hard to identify it
specifically without a source code change of some sort. Initially I
added new logging to the server code to identify the issue before the
new counter was there.

The only thing you can easily look at that tends to correlate well with
the worst problems here is the output from turning log_checkpoints on.
Specifically, the "sync" times going way up is a sign there's a problem
with write speed.

As for the documentation, not much has really changed from when you
brought this up on the docs list. The number of WAL files that can be
created by a "short-term peak" is unlimited, which is why there's no
better limit listed than that. Some of the underlying things that make
the problem worse are operating system level issues, not ones in the
database itself; the PostgreSQL documentation doesn't try to wander too
far into that level. There are also a large number of things you can do
at the application level that will generate a lot of WAL activity. It
would be impractical to list all of them in the checkpoint
documentation, though.

On reviewing this section of the docs again, one thing that we could do
is make the "WAL Configuration" section talk more about log_checkpoints
and interpreting its output. Right now there's no mention of that
parameter in the section that talks about parameters to configure; there
really should be.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
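For reference, this is roughly the kind of log_checkpoints output being discussed (a sketch: the numbers are invented and the exact wording varies between versions; the symptom described in 2) earlier in the thread is a sync= value of minutes or hours rather than fractions of a second):

```
# postgresql.conf
log_checkpoints = on

# A 9.x checkpoint log line then looks roughly like:
LOG:  checkpoint complete: wrote 3392 buffers (20.7%); 0 transaction log
      file(s) added, 0 removed, 16 recycled; write=338.615 s,
      sync=0.035 s, total=338.721 s
```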