[GENERAL] Strange duplicate key violation error
I have this table "replica_sync_log" which is updated thousands of times a day to reflect the state of various schemas in a database that acts as an offline secondary to various other databases (each of the source databases is mapped to its own schema in the secondary). The table has the following definition:

Table "radio_prod_default.replica_sync_log"

      Column       |            Type             | Modifiers
 ------------------+-----------------------------+-----------
  db_host          | text                        | not null
  db_port          | text                        | not null
  db_name          | text                        | not null
  last_sync_id     | integer                     |
  last_sync_time   | timestamp without time zone |
  last_commit_time | timestamp without time zone |
 Indexes:
     "replica_sync_log_pkey" PRIMARY KEY, btree (db_host, db_port, db_name), tablespace "data1"
 Tablespace: "data1"

There is in fact one of these tables for each schema, and each one contains exactly one row (the "log" in the name is a bit misleading; these just contain the current replica state, not a running log). Here is an error that I got this morning that doesn't make much sense to me:

2007-06-28 08:53:54.937 PDT [d:radio_reports_new u:slony s:4683d86f.3681 3] ERROR: duplicate key violates unique constraint "replica_sync_log_pkey"
2007-06-28 08:53:54.937 PDT [d:radio_reports_new u:slony s:4683d86f.3681 4] STATEMENT: UPDATE replica_sync_log SET last_sync_id = 7147892, last_sync_time = '2007-06-27 23:28:04.586846'::timestamp, last_commit_time = 'Thu Jun 28 08:53:54 2007'::timestamp WHERE db_name = 'radio_prod_default' AND db_host = 'radiodb-default-1' AND db_port = '5432';

I've never seen this error before in millions of updates to this table. It confuses me because the columns indexed by the primary key *are not* being modified, so how could the update violate the constraint? Furthermore, there is only one record in the table anyway. The updates are performed by daemons that are each in charge of replicating the changes for one database, so there is no concurrent updating of this table; only one process updates it. The error caused the daemon to pause, so I restarted it. The error did not recur the second time. fwiw this is postgres 8.1.8 on Linux.

Thanks for any insights.

-Casey
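For reference, the table definition and the failing statement above, reconstructed as runnable SQL (tablespace clauses omitted; all names, types and values are taken from the \d output and the log excerpt):

    CREATE TABLE replica_sync_log (
        db_host          text NOT NULL,
        db_port          text NOT NULL,
        db_name          text NOT NULL,
        last_sync_id     integer,
        last_sync_time   timestamp without time zone,
        last_commit_time timestamp without time zone,
        PRIMARY KEY (db_host, db_port, db_name)
    );

    -- the statement that unexpectedly reported the duplicate key violation
    UPDATE replica_sync_log
       SET last_sync_id     = 7147892,
           last_sync_time   = '2007-06-27 23:28:04.586846'::timestamp,
           last_commit_time = 'Thu Jun 28 08:53:54 2007'::timestamp
     WHERE db_name = 'radio_prod_default'
       AND db_host = 'radiodb-default-1'
       AND db_port = '5432';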
Re: [GENERAL] Backup Large Tables
Are you dumping the whole database or just a single table? If it's the former, try the latter and see if you still get errors. If pg_dump is not working, maybe some system table is hosed. What errors are you getting?

If you can get in via psql, log in as a superuser and execute:

COPY mytable TO 'mytable.txt';

That will dump the table data to a text file which can be re-imported into a new database using the COPY FROM command. Basically you're just doing part of what pg_dump does for you by hand.

-Casey

On Sep 21, 2006, at 9:19 PM, Charles Ambrose wrote: Hi! I encounter errors in dumping the database using pg_dump. The database, I think, is corrupt. It was looking for triggers and stored procedures that are no longer in the database. This is also the reason why I opted to create a program to dump the database.

On 9/22/06, Michael Nolan <[EMAIL PROTECTED]> wrote: I have a table with over 6 million rows in it that I do a dump on every night. It takes less than 2 minutes to create a file that is around 650 MB. Are you maybe dumping this file in 'insert' mode? -- Mike Nolan

On 9/21/06, Charles Ambrose <[EMAIL PROTECTED]> wrote: Hi! I have some fairly large database tables (say an average of 3 million to 4 million records). Using the pg_dump utility takes forever to dump the database tables. As an alternative, I have created a program that gets all the data from the table and then puts it into a text file. I was also unsuccessful with this alternative for dumping the database.
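A minimal sketch of that COPY round trip (the table name and path are placeholders; note that a server-side COPY to a file requires an absolute path and superuser privileges):

    -- in the old (possibly damaged) database:
    COPY mytable TO '/tmp/mytable.txt';

    -- in the freshly created database, after recreating the table:
    COPY mytable FROM '/tmp/mytable.txt';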
Re: [GENERAL] postgress 8.1.4 deadlocking??
On Sep 27, 2006, at 7:28 AM, Rafal Pietrak wrote:

> Hi, I fell into the following problem (unfortunately, the database contains sensitive customer information, so I can publish very little of it). Currently a postgres process takes close to 100% CPU time. I restarted the process a moment ago, and it was calm for a brief minute. It started to consume the CPU once I issued: ALTER GROUP majster DROP USER rafal; ... which took close to 5 minutes to complete. Then I issued:

Sounds like it was blocked (unsure by what). You can use pg_locks to check that.

> VACUUM ANALYZE; ... which I had to cancel when it didn't complete within another 10 minutes. CPU utilisation remains at 100% even after the VACUUM was canceled.

VACUUM sometimes hangs around for a while after it is cancelled, which can be annoying. I think this has been improved for 8.2, FWIW.

What was consuming all the CPU? I assume from what you wrote previously that it is a postgres backend. If so, what was it doing? You can use ps, the pg_stat_activity view and strace to figure that out.

> pg_dumpall was successful during the large-CPU-usage time. ...another postgres restart, and another VACUUM ANALYZE; this one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And everything seems to be working.

Was something else using the database previously? Was something different this time from the last?

> the database is currently almost empty: ca. 100k records within ca. 20 tables, with another 30 or so views. There are ca. 6k userids (roles) created in the cluster.

How big is the data on disk? Is it possible that you have queries scanning tables containing lots of dead tuples? If so a VACUUM FULL would help, but that's totally speculative.

> Any ideas where I should look (system tables? process stats?) to diagnose the problem? Just to be prepared when it appears again.

Postgres itself should not spontaneously start using all of the CPU. Some query operations must be running unless there is an underlying problem (hardware, os, fs). When you saw this happen, did you happen to check whether it was user or system CPU usage? I had a corrupt filesystem once that caused pg to go off into the weeds consuming all CPU forever, but it was all *system* CPU time. Also I couldn't shut pg down because it was presumably caught in an eternal system call, unable to respond to signals. Your description above sounds different from this to me, however.

-Casey
Re: [GENERAL] postgress 8.1.4 deadlocking??
On Sep 27, 2006, at 12:35 PM, Rafal Pietrak wrote:

> Thanks Duncan for the analysis. This happened again, so I'm able to peek at the details you've pointed out.
> On Wed, 2006-09-27 at 09:33 -0700, Casey Duncan wrote:
> Sounds like it was blocked (unsure by what). You can use pg_locks to check that.
> That view currently contains 240 entries, the source of the blocking. Now I have to dig up the ultimate reason why those locks end up there (and are not being cleaned). Any ideas?

You need to look at the procpid that's holding the lock and then use pg_stat_activity to see what it is doing. I'm not sure what locks that query needs to hold; it may be blocked by any concurrent query from a user in that group, I dunno.

> Would those locks go away by themselves in time?

Yeah, once the transactions holding the locks complete.

> What was consuming all the CPU? I assume from what you wrote previously that it is a postgres backend. If so, what was it doing?
> ps output (just one backend):
> postgres 4759 0.0 2.0 19368 5292 ? S 21:02 0:00 postgres: czyn_738773 czyn 127.0.0.1(55431) UPDATE waiting
> "czyn" is the name of the database, "czyn_738773" is the database username. The IP address is localhost, since the database is a web service back-end.

If the update is waiting, it is blocked and should not be consuming any noticeable CPU. Was this using lots of CPU at that moment? Still, I'm not sure what that UPDATE could be.

> You can use ps, the pg_stat_activity view and strace to figure that out.
> excerpt from pg_stat_activity:
> 17660 | czyn | 4759 | 19541 | czyn_738773 | | | 2006-09-27 21:02:05.914728+02 | 127.0.0.1 | 55431
> (18 rows)
> But this is not very meaningful to me.

You need to add this to your postgresql.conf:

stats_command_string = on

Then send a HUP signal to the postmaster (you can use pg_ctl reload for this, or 'kill -HUP'). Then you can see what the exact command is.

Another option that may be helpful for diagnosis is:

log_min_duration_statement = 500

This will output the sql for all commands taking longer than 500 ms. You can adjust this up and down depending on what counts as "slow" in your app. You probably only want it on temporarily, but I find it sometimes helpful to always leave it on with a suitably high value (like 1000). That way you can always troll the logs for laggards.

> ...another postgres-process restart, and another VACUUM ANALYZE; this one completes in 3-5 secs. Now "ALTER GROUP ..." goes OK. And everything seems to be working.
> Was something else using the database previously? Was something else different this time than the last?
> No. My impression was (naive, I know) that not all things were correctly cleaned up on the first restart. Now I think that on the second go I was just a little bit more lucky - the www service was calm for a little longer. That was enough for VACUUM to complete.

Yeah, it sounds like things are fairly busy given the locks above.

> Still, trouble started later.
> How big is the data on disk? Is it possible that you have queries
> close to nothing: ca. 200 MiB

Sounds like the problem is more about concurrency than data size. How many concurrent connections do you have (select count(*) from pg_stat_activity)? How many are in transaction and running queries concurrently? If you see multiple instances of a given query (or the same one appearing frequently), try running EXPLAIN ANALYZE on it. That'll tell you what makes it slow.

> problem (hardware, os, fs). When you saw this happen did you happen to check if it was user or system cpu usage? I had a corrupt
> It's the user CPU time.
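A sketch of the lock lookup described above, using the 8.1-era column names (pg_locks.pid, pg_stat_activity.procpid); current_query is only populated once stats_command_string is turned on:

    -- who is waiting, and on what
    SELECT pid, locktype, relation::regclass AS relation, mode, granted
      FROM pg_locks
     WHERE NOT granted;

    -- what each backend is currently doing
    SELECT procpid, usename, query_start, current_query
      FROM pg_stat_activity;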
Yeah, just sounds like a scaling issue. Hopefully easily solved ;^)

> As it currently looks like some unknown UPDATE causes the trouble, how can I identify what it is?

See the config recommendations above.

> 1. the database is not very sophisticated; apart from what I've mentioned earlier, it has just a few stored procedures in PL/pgSQL - nothing fancy. And there are no triggers, just specialized views (although quite a lot of referential integrity constraints).

These can be a source of lock contention on updates (though this is much better on 8.1 compared to 7.4). Once you know the update query that's getting blocked you can know for sure.

> 2. machine is 'from the old days': 450MHz CPU

What is the iowait like on this box? You can use iostat to determine this. That will tell you if the disks are also struggling.

> 3. the system is currently under 'available-for-public' condition, so there may be 'waves' (peaks) of activity.
> 4. when the system was tested in-lab (small number of users - no risk /
[GENERAL] Expected accuracy of planner statistics
I have some databases that have grown significantly over time (as databases do). As the databases have grown, I have noticed that the statistics have grown less and less accurate. In particular, the n_distinct values have become many orders of magnitude too small for certain foreign key columns. Predictably this leads to poor query plans. The databases in question were all using the default stats target value, so naturally the thing to do is to increase that and see what happens. First I'll show you one table in question:

qa_full=# \d fk
Table "public.fk"

     Column     |            Type             | Modifiers
 ---------------+-----------------------------+---------------
  fk_id         | bigint                      | not null
  st_id         | bigint                      | not null
  is_positive   | boolean                     | not null
  mc_id         | character varying(20)       | not null
  matching_seed | character varying(20)       |
  ft_id         | character varying(20)       |
  s_title       | text                        | not null
  a_summary     | text                        | not null
  date_created  | timestamp without time zone | default now()
  qx_id         | bigint                      |
 Indexes:
     "fk_pkey" PRIMARY KEY, btree (fk_id)
     "fk_st_mc_id_idx" UNIQUE, btree (st_id, mc_id)
     "fk_date_created_is_positive_idx" btree (is_positive, date_created)
     "fk_st_id_idx" btree (st_id)
 Foreign-key constraints:
     "fk_qx_id_fkey" FOREIGN KEY (qx_id) REFERENCES st(st_id) ON DELETE RESTRICT
     "fk_st_id_fkey" FOREIGN KEY (st_id) REFERENCES st(st_id) ON DELETE RESTRICT

qa_full=# select count(*) from fk;
 count: 19889

Here are the n_distinct stats on the st_id column with stock stats settings:

qa_full=# select n_distinct from pg_stats where tablename='fk' and attname='st_id';
 attname | n_distinct
 --------+-----------
 st_id   |      14910

Here's the actual distinct count:

qa_full=# select count(distinct st_id) from fk;
   count
 ----------
  15191387
 (1 row)

Here's what it looks like after turning the stats target up to 100:

qa_full=# select n_distinct from pg_stats where tablename='fk' and attname='st_id';
 attname | n_distinct
 --------+-----------
 st_id   |     136977

Still way off (3 orders of magnitude), so let's pull out the stops and go for 1000:

qa_full=# select n_distinct from pg_stats where tablename='fk' and attname='st_id';
 attname | n_distinct
 --------+-----------
 st_id   |     860796

Better, but still way off. Here's more of the pg_stats row for the curious, with the stats target at 1000:

 schemaname        | public
 tablename         | fk
 attname           | st_id
 null_frac         | 0
 avg_width         | 8
 n_distinct        | 860796
 most_common_vals  | {9822972459012807,81553350123749183,50260420266636724,16953859416556337, ... many elided ...}
 most_common_freqs | {7.3e-05,6.7e-05,5.3e-05,5e-05,5e-05,4.7e-05,4.7e-05, ... many elided ...}
 histogram_bounds  | {9474697855526,186642098833097,425502410065792,655064117100237, ... many elided ...}
 correlation       | 0.770339

The correlation is likely high here because this table has been clustered on th
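For reference, the commands behind those experiments (the per-column form of the stats target, followed by a fresh ANALYZE and a re-check of pg_stats; the global default_statistics_target GUC is the other knob):

    ALTER TABLE fk ALTER COLUMN st_id SET STATISTICS 1000;
    ANALYZE fk;
    SELECT n_distinct FROM pg_stats WHERE tablename = 'fk' AND attname = 'st_id';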
Re: [GENERAL] Expected accuracy of planner statistics
On Sep 28, 2006, at 8:51 PM, Tom Lane wrote: [..] The information we've seen says that the only statistically reliable way to arrive at an accurate n_distinct estimate is to examine most of the table :-(. Which seems infeasible for extremely large tables, which is exactly where the problem is worst. Marginal increases in the sample size seem unlikely to help much ... as indeed your experiment shows. I think a first step might be to introduce a new analyze command, such as ANALYZE FULL. This would be executed deliberately (IOW not by autovacuum) like CLUSTER or VACUUM FULL when deemed necessary by the dba. The command as implied would scan the entire table and fill in the stats based on that (as analyze used to do IIRC). It would also be useful if this command froze the stats so that autovacuum didn't clobber them with inaccurate ones shortly thereafter. Perhaps an explicit ANALYZE FULL FREEZE command would be useful for that case, the behavior being that a normal ANALYZE would not overwrite the stats for a stats-frozen table, another ANALYZE FULL would, however. Such a frozen state would also be useful if you wanted to hand-tweak stats for a single table and have it stick and still use autovac. As I understand it now, with autovac on, you cannot do that unless you hack the pg_autovacuum table (i.e., set anl_base_thresh to an artificially high value). Another option (that I think others have suggested) would be to make this the behavior for VACUUM ANALYZE. That saves the baggage of a new command at least. Another advantage would be that the autovac daemon could run it. Perhaps some smarts could also be built in. What if VACUUM ANALYZE first runs a normal (sampled) ANALYZE. Then it performs the VACUUM with full ANALYZE pass. The stats gathered by the latter full pass are compared to that of the first sampled pass. If the full ANALYZE statistics are sufficiently different from the sampled pass, then the table is flagged so that normal ANALYZE is not performed by the autovac daemon on that table. Also, a global ANALYZE could ignore it (though this seems more magical). A more pie-in-the-sky idea could take advantage of the fact that the larger a table is the less likely the statistics will change much over time. If we cannot afford to sample many rows in a given analyze pass, then perhaps we should use a "newton's method" approach where we attempt to converge on an accurate value over time with each analyze pass contributing more samples to the statistics and honing them incrementally rather than simply replacing the old ones. I'm not statistician, so it's not clear to me how much more state you would need to keep between analyze passes to make this viable, but in order for this to work the following would need to be true: 1) Analyze would need to be run on a regular basis (luckily we have autovaccum to help). You would want to analyze this table periodically even if nothing much changed, however. Perhaps tuning the autovac parameters is enough here. 2) Each analyze pass would need to sample randomly so that multiple passes tend to sample different rows. 3) The stats would need to somehow be cumulative. Perhaps this means storing sample values between passes, or some other statistical voodoo. 4) Needs to be smart enough to realize when a table has changed drastically, and toss out the old stats in this case. Either that or we require a human to tell us via ANALYZE FULL/VACUUM ANALYZE. 
I think that the incremental stats approach would more or less depend on the full ANALYZE functionality for bootstrapping. I think when you first load the table, you want to get the stats right immediately and not wait some indeterminate amount of time for them to "converge" on the right value.

-Casey
Re: [GENERAL] Array assignment behavior (was Re: [ADMIN] Stored procedure array limits)
On Sep 29, 2006, at 9:14 AM, Tom Lane wrote: [ expanding this thread, as it now needs wider discussion ] "Paul B. Anderson" <[EMAIL PROTECTED]> writes: Actually, I was not filling all of the arrays in sequential order. I added code to initialize them in order and the function seems to be working now. Is that a known problem? Well, it's a documented behavior: section 8.10.4 saith A stored array value can be enlarged by assigning to an element adjacent to those already present, or by assigning to a slice that is adjacent to or overlaps the data already present. Up to 8.2 we didn't have a lot of choice about this, because without any ability to have nulls embedded in arrays, there wasn't any sane thing to do with the intermediate positions if you assigned to an element not adjacent to the existing range. As of 8.2 we could allow assignment to arbitrary positions by filling the intermediate positions with nulls. The code hasn't actually been changed to allow that, but it's something we could consider doing now. Comments? At first blush, this strikes me as a bit too magical/implicit. Are there other languages where sequences behave similarly? The best analogy that comes to mind is sparse files, but in that case there is an implicit contract that the intervening empty regions do not actually occupy physical space, doesn't sound like that's true here. I think the result of this change would be more difficult debugging of off-by-one errors and their ilk, rather than actually being a real benefit. OTOH, perhaps there is a real use-case I am missing here. I don't see the rest of this thread on GENERAL and I couldn't find it searching the archives, where did it come from? -Casey ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Two efficiency questions - clustering and ints
On Oct 6, 2006, at 11:12 AM, John D. Burger wrote:

> Richard Huxton wrote:
>>> Should I always cluster the tables? That is, even if no column jumps out as being involved in most queries, should I pick a likely one and cluster on it?
>> Well you cluster on an index, and if you don't think the index is useful, I'd drop it. If you have an index, clustering isn't necessarily going to help you unless you regularly read a series of rows in order.
> Right, but clustering on that index means an index scan (with that index) will be faster. This is uninteresting if the index doesn't get used, but is there any =downside= to doing it? Here's a simpler question - for static data, should I always cluster on the index I think will be used the most?

Depends on the queries. If the index is on a foreign key value where there may be many rows with the same key scattered about, it will help queries that look up using that foreign key. Clustering on a column with high cardinality isn't much of a win typically for single key lookups (depends on the lookup pattern), but would be for ranges and possibly for sorting on that column.

It also depends on the size of the table and indices. If they are small enough to fit in memory then clustering to reduce random access isn't really helpful. I would suggest doing some timing tests on typical queries with the data unclustered and clustered to know what you are gaining.

-Casey
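One way to run that before/after timing test, using the pre-8.3 CLUSTER syntax and hypothetical table/index names:

    -- time a representative query before clustering
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- physically reorder the table by the index; this takes an exclusive
    -- lock and rewrites the table, so it is not free on large tables
    CLUSTER orders_customer_id_idx ON orders;
    ANALYZE orders;

    -- time it again
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;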
Re: [GENERAL] how to get number of minutes?
select extract(epoch from interval '2 hours')/60;

'epoch' returns the number of epoch seconds that make up the interval. That differs from 'seconds', which just returns the "seconds place", which is zero for 2:00:00 of course.

-Casey

On Oct 6, 2006, at 12:22 PM, Chris Hoover wrote:

> If I subtract 2 timestamps, how do I get the result returned as the total number of minutes? Example: now() - (now()-'2 hours'::interval) = 2:00:00 -- I need the result returned as 120 minutes. Thanks, Chris
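Putting the two together for the original example, as a one-line sketch:

    -- minutes between two timestamps
    SELECT extract(epoch FROM (now() - (now() - interval '2 hours'))) / 60;  -- 120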
Re: [GENERAL] COPY FROM STDIN instead of INSERT
On Oct 18, 2006, at 5:20 AM, Ilja Golshtein wrote:

>> When starting a database from scratch it is much faster to import the data and then create the indexes. The time to create an index on a full table is less than the extra time from each index update from the inserts. The more indexes to update, the more time updating indexes takes. The problem with a live database is that removing the indexes slows down current users, and if you are adding 2,000 rows to a table that already has 5,000,000 rows in it then you will lose the benefit.
> I agree with you 100%. What you are describing is a very good and useful technique for some maintenance operations. My current goal is to increase performance in the normal [almost ;)] OLTP mode of my application, so removing indexes for some time is not an option here. And my question remains: is it OK to use COPY BINARY FROM STDIN instead of multiple INSERTs?

I don't think I would use BINARY; it seems likely to be susceptible to changes in the underlying data type storage. From the docs: "To determine the appropriate binary format for the actual tuple data you should consult the PostgreSQL source, in particular the *send and *recv functions for each column's data type (typically these functions are found in the src/backend/utils/adt/ directory of the source distribution)."

Regular text COPY is quite a big performance win over INSERTs, and doesn't require your application to know the vagaries of the data storage. Also, if you have many indices, the time to update them will probably dominate anyhow, making the difference between binary and text copy negligible.

As was mentioned, COPY can only insert static data, and does not support rules (that you might use to support constraint exclusion, etc). AFAIK, the locking semantics are the same as INSERT, i.e., it does not lock the entire table or anything like that. Since it would allow transactions that insert data to finish faster, it should actually work better under high concurrency.

-Casey
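A minimal sketch of the text-format alternative discussed above (table and columns are hypothetical; the data rows are tab-separated and the stream is terminated by a backslash-period line):

    COPY mytable (id, name) FROM STDIN;
    1	alpha
    2	beta
    \.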
Re: [GENERAL] Help needed
On Oct 19, 2006, at 8:19 AM, <[EMAIL PROTECTED]> wrote:

> I want to upgrade a system from 7.3.8 to 8.1.5. I am new to Psql and looking for handy hints. Any known problems or pit-falls?

You'll need to dump the database and reload (pg_dump and pg_restore); 8.1 uses a different data format. There can be issues with character encodings. IIRC 8.1 is much more strict about what it allows than 7.x was, so you could potentially run into errors importing the data into 8.1 from 7.x. I would keep the old 7.3 database around after dumping it, to correct any possible encoding glitches.

There are also quite a few query planner changes between these versions. They are certainly improvements, but they can mean that queries that work correctly on 7.x fail to use the same indices or just behave differently on 8.1. Most of the time things will just work (and work better), but you'll want to watch out for that.

> Anything we need to do in preparation?

I would just do a dry run. Dump the 7.3 database and install 8.1.5 somewhere (perhaps in a local directory or on another machine). Then load the dump into it and run the ANALYZE sql command. If it loads successfully, test out your applications against it. Test it and test it some more.

8.1 is typically configured differently than 7.x with respect to things like shared_buffers and whatnot. You'll definitely want to adjust the defaults for things (upwards) and you'll probably want different settings than before, even running on the same hardware. You'll probably also want to turn autovacuum on, and possibly vacuum_cost_delay. This page describes a lot of the settings with recommendations (even though it's getting pretty dated now, the info is still useful): http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

hth,
-Casey
Re: [GENERAL] AutoVacuum on demand?
On Nov 13, 2006, at 1:05 PM, Glen Parker wrote: Matthew T. O'Connor wrote: Glen Parker wrote: I would like a way to run the autovacuum daemon on demand periodically. Every night at 2 AM, for example. Anybody know if this is possible? If not, it's a feature request :-) Autovacuum can be enabled / disabled on the fly using the GUC settings. Perhaps you can write a cron script that runs at 2AM that enables autovac, and at 6AM (just a guess) disables it in the same fashion. I think that should work. Would be curious to hear your results. I thought it sounded like a good idea, but then... SET autovacuum = on; ERROR: parameter "autovacuum" cannot be changed now I've turned it on and off in the postgresql.conf several times (followed by a postmaster HUP) and it worked as expected. I think this is a global setting that cannot be toggled in a particular session. -Casey ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] statement_timeout
When I configure statement_timeout globally, I typically override it for superusers and other accounts used by dbas. Just issue:

ALTER USER postgres SET statement_timeout = 0;

Repeat for other superusers (slony, etc). Then the policy won't apply to them.

-Casey

On Nov 16, 2006, at 6:46 AM, Marcin Mańk wrote:

> Hello. I have an unconfirmed feeling that autovac does not like a system-wide statement_timeout. I.e. when, in some panic move, I set the system-wide statement_timeout to 90 seconds, autovac stopped working (I do not know 100% whether there is a dependency). Oops... Now I checked that pg_dump has the same issue, and vacuumdb. This is all on 8.1.4. Yeah, a system-wide statement_timeout is not much of a brilliant idea :( Regards, Marcin Mańk
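A sketch of that per-role override pattern, with a way to verify it (role names other than postgres/slony would be whatever accounts your dbas use; on 8.1 the per-role settings should show up in pg_roles.rolconfig):

    ALTER USER postgres SET statement_timeout = 0;
    ALTER USER slony SET statement_timeout = 0;

    -- confirm which roles carry per-role overrides
    SELECT rolname, rolconfig FROM pg_roles WHERE rolconfig IS NOT NULL;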
Re: [GENERAL] n00b RAID + wal hot standby question
On Dec 4, 2006, at 1:11 PM, Anton Melser wrote:

> Hi, I am just starting at a company and we are inheriting a previously built solution. It looks pretty good, but my previous experience with pg is seriously small-time compared with this... I am very new at the job, and don't know what hd config we have, but it will be RAID-something I imagine (hey, I was working with desktop "servers" before this!). If that is very important I can find out. We seem to be saving our WAL to the same partition as PGDATA, and I notice that we are maxing out a reasonable-looking server. The db is not very big (~4 gig, 400 meg pg_dump), and though I can't see any vacuum strategy (autovacuum on an 8.1.4 is disabled), we didn't have as much consistent CPU usage at my old job (with a 6 gig db and MUCH less CPU and RAM, and probably as many connections), and my vacuum strategy was also pitiful! Sure, completely different environments, but I am thinking that WAL replication could be a factor. So my question... being in complete ignorance of how RAID works (the performance details)... would it be better to try and separate the WAL destination from PGDATA? How much of a difference could it make? Should we wait till the customer starts complaining (no explosion in traffic/db size realistic for the foreseeable future...)? Any abuse welcome.

When you say maxing out, what do you mean? Posting some vmstat output under load would be very helpful (assuming *nix, otherwise can't help you ;^).

My very general RAID recommendation would be: put the WAL on a 2-disk RAID-1 array, either by itself or together with the system if you expect little disk activity from the rest of the system. Put the data on a RAID-10 array (at least 4 disks, but more would be much better). As for the hardware itself, we've had good luck with 10k rpm WD Raptor SATA drives + 3ware 95xx RAID controllers (the 9550SX being preferred due to their smarter caching and higher bandwidth).

The most important thing, though, is to keep as much of the database in RAM as possible. That means putting as much RAM in the database box as you can afford and not sharing it with other processes (i.e., move other RAM-hungry things to another box). And configure postgres to use the available RAM by tuning shared_buffers, work_mem, etc. Tuning checkpoint_segments and wal_buffers can help with write performance.

And running autovacuum (or regular full database vacuums) is very important to the performance of the database over time. Otherwise you will definitely notice significant performance degradation as the garbage in the tables mounts (unless of course the db is read-only). If it's been off for a long time (or never run), you'll probably need to spend some quality time doing a "VACUUM FULL" and possibly a "REINDEX DATABASE" first.

-Casey
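That one-time cleanup, spelled out as SQL (run during a maintenance window, since both commands take heavy locks; "mydb" is a placeholder for the name of the database being cleaned up):

    VACUUM FULL VERBOSE;
    REINDEX DATABASE mydb;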
Re: [GENERAL] [PERFORM] Locking in PostgreSQL?
On Dec 5, 2006, at 11:04 PM, Joost Kraaijeveld wrote:

> Does PostgreSQL lock the entire row in a table if I update only 1 column?

Know that updating 1 column is actually updating the whole row. So if one transaction updates column A of a row, it will block another concurrent transaction that tries to update column B of the same row. As was mentioned however, neither of these transactions block others reading the row in question, though they see the row as it existed before the updates until those update transactions commit.

If you know that your application will suffer excessive update contention trying to update different columns of the same row, you could consider splitting the columns into separate tables. This is an optimization to favor write contention over read performance (since you would likely need to join the tables when selecting) and I wouldn't do it speculatively. I'd only do it if profiling the application demonstrated significantly better performance with two tables.

-Casey
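A hypothetical sketch of that column split (all table and column names are invented): columns updated by different transactions live in separate tables sharing a key, rejoined by a view for reads.

    CREATE TABLE account_profile (
        account_id bigint PRIMARY KEY,
        email      text NOT NULL
    );

    CREATE TABLE account_counters (
        account_id  bigint PRIMARY KEY REFERENCES account_profile(account_id),
        login_count integer NOT NULL DEFAULT 0
    );

    -- reads join the two halves back together
    CREATE VIEW accounts AS
        SELECT p.account_id, p.email, c.login_count
          FROM account_profile p
          JOIN account_counters c USING (account_id);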
Re: [GENERAL] Anything I can do to speed up this query?
On Dec 6, 2006, at 2:34 PM, Wei Weng wrote:

>> On Tue, 2006-12-05 at 15:56 -0500, Wei Weng wrote: I have a table that has roughly 200,000 entries and many columns. The query is very simple: SELECT Field1, Field2, Field3... FieldN FROM TargetTable; TargetTable has an index that is Field1.
> I think I have discovered the reason why the query runs so slow. The original query has an ORDER BY Field1 clause that I forgot to put in my email. So the query looks like this: SELECT Field1, Field2, Field3... FieldN FROM TargetTable ORDER BY Field1 DESC; What is an effective way to optimize this query (or the system) so it runs a little faster than it does now? Thanks, and I really appreciate all the help I've gotten so far.

Clustering the table on the index used for the sort might help, but performance of the above will probably still be worse than your original example, due to the added random-access overhead caused by the index scan (assuming the sort uses an index). If the table changes infrequently, you could consider creating an in-order copy of the data (using INSERT INTO) so that you don't need to use an index to order it.

If you need the full list of results before the application can do anything (thus cursors won't help), then you'll either need to reduce the amount of data scanned and returned (optimize fields for byte size, move any fields not needed by this query to another table, reduce the number of rows in the table, etc) or scan it faster (faster disks and cpu).

If you have multiple cpus on the database side, and you are not I/O bound, you could consider using table partitioning to break the table up into pieces where each piece contains a fraction of the whole (say one quarter). Then issue four concurrent queries from the application and reassemble the results on that end. You might not need to use table partitioning if you can efficiently determine the "cleave" points at run time. Then you would just use a where clause to select the proper range.

-Casey
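A sketch of that last idea without partitioning (names follow the example above; the boundary values are placeholders that would be derived from the data, e.g. from percentiles of Field1, and each statement would run on its own connection, with the application concatenating the results in order):

    SELECT field1, field2 FROM targettable WHERE field1 >= 750000 ORDER BY field1 DESC;
    SELECT field1, field2 FROM targettable WHERE field1 >= 500000 AND field1 < 750000 ORDER BY field1 DESC;
    SELECT field1, field2 FROM targettable WHERE field1 >= 250000 AND field1 < 500000 ORDER BY field1 DESC;
    SELECT field1, field2 FROM targettable WHERE field1 < 250000 ORDER BY field1 DESC;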
Re: [GENERAL] replication advice needed
You could use slony for this in a couple of ways. One is a simpler, more hacky way; another is a bit more involved but perhaps more "correct".

For the simple way, set up the source database as a provider and the remote replica as a normal slony subscriber. Don't run the slon daemons all the time; just launch them and let them run every night to catch up the replica. Newer slony versions allow you to run slon in a mode where it terminates after a certain point. This is simple, but has a couple of drawbacks:

- The data will become bloated in the source during the day, because each insert/update to a replicated table will create log data in ancillary slony tables that won't get cleaned out until you replicate at night. This is probably a pretty minor issue, though, unless the data is huge.
- The replication process will put appreciable load on the source database. And as the log tables grow, it takes longer and longer for the replica to catch up.

Another approach is perhaps more "standard", though more complex:

- Set up a slony provider and subscriber both on the local network (you need at least one secondary to use log shipping).
- Set up a remote secondary as a "log shipping" node.
- Run slons for the local provider and subscriber with "log shipping" mode turned on. This will generate sql files that contain the data updates made to the provider.
- Periodically tar up the files, ship them over to the remote secondary and apply them (using psql or some custom script).

An advantage to the latter is that you can compress the log shipping files. They are not very compact (since they contain sql statements and your data), but they should compress quite well.

hth,
-Casey

On Jan 10, 2007, at 11:02 AM, Gene wrote:

> I searched the web sites for advice but came up short on something that looked like what I need. I need to "replicate" a few tables nightly (2am-6am). Each table will have about 1 million new records (~100B each, ~5/second) each day. I have tried using something like a cron job (copy from/copy to) but it has limitations as far as handling duplicate keys (I just need to replace) and it will not fill in gaps if there happen to be any. The servers are also far from each other and only have a 50 KB/s ipsec link between them, so data transfer needs to be somewhat minimized. Thanks for any suggestions, gene
[GENERAL] Where art thou pg_clog?
We have a production system with multiple identical database instances on the same hardware, with the same configuration, running databases with the exact same schema. They each have different data, but the database sizes and load patterns are almost exactly the same. We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh well ;^) and since then we have noticed the following error on two of the servers:

2007-02-15 00:35:03.324 PST ERROR: could not access status of transaction 2565134864
2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/098E": No such file or directory

The first time this happened, I chalked it up to some kind of disk corruption based on the mailing list archives. So I dumped the databases, did a fresh initdb, forced an fsck (these run with a jfs data partition and an ext2 wal partition) which found no problems, and then reloaded the databases. Now, about a week later, I see the same problem on a different server. We never saw this problem running 8.1.3 on these same machines over many months, so I'm beginning to suspect that something we changed since running 8.1.3 is to blame. Before the upgrade these systems ran postgres 8.1.3 and slony 1.1.5. Now they run postgres 8.1.5 and slony 1.2.6 (I don't know that the slony version is important; I add it here for completeness). Nothing else important has changed on these boxes. I see that 8.1.8 is out now, though nothing I see in the release notes seems relevant to this issue.

Here are some specific things I'd like to know:

1. Is it possible to "fix" this problem without a dumpall/initdb/restore? That takes many hours and can only be done when I'm supposed to be at home relaxing (yeah right) ;^) FWIW, the system is functioning fine right now from what I can tell, save the above errors in the log every few minutes.
2. What more info can I give to figure out the "cause" of this? Are there files I can inspect to find out more?
3. Is it possible that this is a side effect of the upgrade to 8.1.5?

Thanks for any insights,

-Casey
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 1:50 PM, Peter Eisentraut wrote: Casey Duncan wrote: 2007-02-15 00:35:03.324 PST ERROR: could not access status of transaction 2565134864 2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory The first time this happened, I chalked it up to some kind of disk corruption based on the mailing list archives. So I dumped the databases, did a fresh initdb, forced an fsck (these run with a jfs data partition and an ext2 wal partition) which found no problems and then reloaded the databases. Now about a week later Unless you actually executed 2565134864 transactions in that one week, this is still data corruption. Check for faulty memory. I'd be more inclined to agree with you if it happened on only one server machine. But this has now happened on two different machines in the space of a week. My understanding is that the transaction id logged is garbage because the bookkeeping fields have been clobbered for some tuple(s). The one last week was really low (like < 1000). -Casey ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 1:46 PM, Alvaro Herrera wrote: Casey Duncan wrote: We have a production system with multiple identical database instances on the same hardware, with the same configuration, running databases with the exact same schema. They each have different data, but the database sizes and load patterns are almost exactly the same. We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh well ;^) and since then we have noticed the following error on two of the servers: 2007-02-15 00:35:03.324 PST ERROR: could not access status of transaction 2565134864 2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory Can you relate it to autovacuum? Maybe. Here's what I get when I crank up the logging to debug4: 2007-02-15 14:20:48.771 PST DEBUG: StartTransaction 2007-02-15 14:20:48.771 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052708/1/0, nestlvl: 1, children: <> 2007-02-15 14:20:48.771 PST DEBUG: vacuuming "pg_catalog.pg_statistic" 2007-02-15 14:20:48.771 PST ERROR: could not access status of transaction 2565134864 2007-02-15 14:20:48.772 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory 2007-02-15 14:20:48.772 PST DEBUG: proc_exit(0) 2007-02-15 14:20:48.772 PST DEBUG: shmem_exit(0) 2007-02-15 14:20:48.773 PST DEBUG: exit(0) 2007-02-15 14:20:48.775 PST DEBUG: reaping dead processes does that imply that it is the pg_statistic table that is hosed? Interestingly I can manually vacuum that table in all of the databases on this machine without provoking the error. -Casey ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 2:44 PM, Alvaro Herrera wrote: Casey Duncan wrote: On Feb 15, 2007, at 1:46 PM, Alvaro Herrera wrote: [..] Can you relate it to autovacuum? Maybe. Here's what I get when I crank up the logging to debug4: 2007-02-15 14:20:48.771 PST DEBUG: StartTransaction 2007-02-15 14:20:48.771 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052708/1/0, nestlvl: 1, children: <> 2007-02-15 14:20:48.771 PST DEBUG: vacuuming "pg_catalog.pg_statistic" 2007-02-15 14:20:48.771 PST ERROR: could not access status of transaction 2565134864 2007-02-15 14:20:48.772 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory 2007-02-15 14:20:48.772 PST DEBUG: proc_exit(0) 2007-02-15 14:20:48.772 PST DEBUG: shmem_exit(0) 2007-02-15 14:20:48.773 PST DEBUG: exit(0) 2007-02-15 14:20:48.775 PST DEBUG: reaping dead processes does that imply that it is the pg_statistic table that is hosed? Interestingly I can manually vacuum that table in all of the databases on this machine without provoking the error. Except template0 I presume? Is this autovacuum running in template0 perchance? I note that 800 million transactions have passed since the Xid in the error message was current. Wouldn't you know it! A little farther back up in the log file: 2007-02-15 14:20:48.480 PST LOG: autovacuum: processing database "template0" 2007-02-15 14:20:48.480 PST DEBUG: StartTransaction 2007-02-15 14:20:48.480 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <> 2007-02-15 14:20:48.481 PST DEBUG: autovacuum: VACUUM FREEZE whole database 2007-02-15 14:20:48.481 PST DEBUG: CommitTransaction 2007-02-15 14:20:48.481 PST DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <> fwiw, I did a cluster-wide vacuum on 1/20/2007. Not sure if that has any impact on anything, just thought I'd throw it out there. I'm curious how template0 got stomped on. Certainly nothing's been changing it. Of course it might just be some random bug so the fact it landed on a file for template0 could be completely arbitrary. Anyhow it does seem curious to me. -Casey ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote:

Casey Duncan wrote: Interestingly I can manually vacuum that table in all of the databases on this machine without provoking the error.

Except template0 I presume? Is this autovacuum running in template0 perchance? I note that 800 million transactions have passed since the Xid in the error message was current.

Wouldn't you know it! A little farther back up in the log file:

2007-02-15 14:20:48.480 PST LOG: autovacuum: processing database "template0"
2007-02-15 14:20:48.480 PST DEBUG: StartTransaction
2007-02-15 14:20:48.480 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <>
2007-02-15 14:20:48.481 PST DEBUG: autovacuum: VACUUM FREEZE whole database
2007-02-15 14:20:48.481 PST DEBUG: CommitTransaction
2007-02-15 14:20:48.481 PST DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <>

This is a bug we fixed in 8.1.7. I suggest you update to the latest of the 8.1 series, to get that fix among others.

ok, great.

To fix the problem, set pg_database.datallowconn=true for template0, then connect to it and do a VACUUM FREEZE. Then set datallowconn=false again.

Do you mean to do this after upgrading to 8.1.8? If I try that in 8.1.5, I get (unsurprisingly):

% psql -U postgres template0 -c "vacuum freeze"
ERROR: could not access status of transaction 2565134864
DETAIL: could not open file "pg_clog/098E": No such file or directory

I'm curious how template0 got stomped on. Certainly nothing's been changing it. Of course it might just be some random bug, so the fact that it landed on a file for template0 could be completely arbitrary.

The problem is that all databases are vacuumed every so many transactions, to avoid Xid wraparound problems; even non-connectable databases. The problem is that a bug in autovacuum caused that vacuum operation to neglect using the FREEZE flag; this negligence makes it leave non-permanent Xids in the tables, leading to the problem you're seeing.

Ironically, we were earlier bitten by the bug that autovacuum didn't do the cluster-wide vacuum until too late. Now we got bitten by the fact that it did do the cluster-wide vacuum. Talk about damned-if-you-do-and-damned-if-you-don't! 8^)

ok, this is a much better sounding explanation than "random data corruption" ;^) Thanks!

-Casey
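Alvaro's fix above, written out (run as a superuser; connections to template0 are allowed just long enough to freeze it):

    UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';

    -- then, connected to template0:
    VACUUM FREEZE;

    -- and finally, from any other database:
    UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';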
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 5:50 PM, Alvaro Herrera wrote: Casey Duncan wrote: On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote: Casey Duncan wrote: To fix the problem, set pg_database.datallowconn=true for template0, then connect to it and do a VACUUM FREEZE. Then set datallowconn=false again. Do you mean to do this after upgrading to 8.1.8? If I try than in 8.1.5, I get (unsurprisingly): % psql -U postgres template0 -c "vacuum freeze" ERROR: could not access status of transaction 2565134864 DETAIL: could not open file "pg_clog/098E": No such file or directory Hum, yeah, I forgot to mention that you need to create the 098E pg_clog segment for that to work at all :-) Fill it with byte 0x55 till the needed position, which is the bit pattern for "all transactions committed". I'd make sure to remove it manually after the freeze is done, just in case! (I think the system would remove it at next checkpoint, but anyway.) That seems a bit scary to do on a running production server. Could I get away with dropping the template0 database and loading one from another identical pg instance (or a new one) or will that freak things out? -Casey ---(end of broadcast)--- TIP 6: explain analyze is your friend
[GENERAL] Large aggregate query running out of memory in ExecutorState
I have some nightly statisics queries that runs against a view which unions several large tables. Recently one of these queries started running into out of memory errors. This is on postgresql 8.1.8 running on 32-bit Debian Linux. Here is the error in the log including the query (excluding memory detail): 2007-03-02 05:20:46.302 PST [d:reports u:hits s:45e81fd1.635a 3] ERROR: out of memory 2007-03-02 05:20:46.302 PST [d:reports u:hits s:45e81fd1.635a 4] DETAIL: Failed on request of size 16. 2007-03-02 05:20:46.302 PST [d:reports u:hits s:45e81fd1.635a 5] STATEMENT: select count(case when is_positive then true end), count(case when not is_positive then true end), count(distinct st_id)", count(distinct st_id & X''::bigint) from fb_packed; Here is the query plan for the above: QUERY PLAN - Aggregate (cost=20393102.13..20393102.14 rows=1 width=9) -> Append (cost=0.00..12015096.06 rows=418900303 width=45) -> Seq Scan on fb_packed (cost=0.00..20.30 rows=1030 width=45) -> Seq Scan on fb_packed (cost=0.00..1411898.37 rows=75468337 width=45) -> Seq Scan on fb_packed (cost=0.00..1675027.88 rows=89690588 width=45) -> Seq Scan on fb_packed (cost=0.00..1553071.43 rows=83061343 width=45) -> Seq Scan on fb_packed (cost=0.00..1632686.20 rows=87618920 width=45) -> Seq Scan on fb_packed (cost=0.00..1553388.85 rows=83060085 width=45) (8 rows) Not very complicated at all, just lots of rows to process. The estimates for row quantity and width are roughly correct above. Here is the relevant bits from the memory detail in the logs. Notice the first ExecutorState value, which doesn't seem reasonable: TopMemoryContext: 8241296 total in 12 blocks; 5528 free (13 chunks); 8235768 used TopTransactionContext: 8192 total in 1 blocks; 7856 free (0 chunks); 336 used Local Buffer Lookup Table: 57344 total in 3 blocks; 23096 free (7 chunks); 34248 used Type information cache: 8192 total in 1 blocks; 1864 free (0 chunks); 6328 used Operator class cache: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used MessageContext: 122880 total in 4 blocks; 59280 free (3 chunks); 63600 used smgr relation table: 24576 total in 2 blocks; 9952 free (3 chunks); 14624 used Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used PortalHeapMemory: 1024 total in 1 blocks; 848 free (0 chunks); 176 used ExecutorState: -1525554452 total in 337 blocks; 1476423576 free (20505908 chunks); 1292989268 used ExecutorState: 24576 total in 2 blocks; 15560 free (3 chunks); 9016 used [...Remainder elided] This just started recently, with no changes to the query itself. Seems like we just crossed some size threshold that triggered this condition. I tried analyzing all of the tables again just in case, but it didn't help. TIA, -Casey ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Unit testing plpgsql code
Anyone know of a unit testing framework for plpgsql stored procedures? We are about to convert a bunch of stored procedures from Oracle over to Postgresql. We currently use the utPLSQL package to unit test those, so something comparable to it would be optimal. We'll probably roll our own if nothing currently exists, but I figured it was worth it to ask first before reinventing the wheel. TIA, -Casey ---(end of broadcast)--- TIP 8: explain analyze is your friend
[GENERAL] Possible infinite loop in query using bitmap scans
I have this report query that runs daily on a table with several hundred million rows total using pg 8.1.3 on Debian Linux on hw with dual opteron processors: SELECT count(*) FROM webhits WHERE path LIKE '/radio/tuner_%.swf' AND status = 200 AND date_recorded >= '3/10/2006'::TIMESTAMP AND date_recorded < '3/11/2006'::TIMESTAMP; Here is the explain output: QUERY PLAN -- Aggregate (cost=794775.08..794775.09 rows=1 width=0) -> Bitmap Heap Scan on webhits (cost=315820.45..794771.74 rows=1337 width=0) Recheck Cond: ((date_recorded >= '2006-03-10 00:00:00'::timestamp without time zone) AND (date_recorded < '2006-03-11 00:00:00'::timestamp without time zone)) Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND (status = 200)) -> BitmapAnd (cost=315820.45..315820.45 rows=249152 width=0) -> Bitmap Index Scan on webhits_date_idx1 (cost=0.00..140407.45 rows=15379741 width=0) Index Cond: ((date_recorded >= '2006-03-10 00:00:00'::timestamp without time zone) AND (date_recorded < '2006-03-11 00:00:00'::timestamp without time zone)) -> Bitmap Index Scan on webhits_path_idx2 (cost=0.00..175412.76 rows=15343959 width=0) Index Cond: (((path)::text >= '/radio/tuner'::character varying) AND ((path)::text < '/radio/tunes'::character varying)) According to the planner it should take <15 minutes which is typical in practice. About half the times it runs, however, it never terminates (even after days) and just spins consuming 99+% of CPU with no disk activity. This query was never a problem in postgres versions < 8.1.2, however the data has grown substantially since that time. I notice it uses the recent in-memory bitmap feature, so I wondered if it was exposing a bug. If I restart the postmaster, the query will complete in the expected time. -Casey ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Possible infinite loop in query using bitmap scans
On Mar 13, 2006, at 9:50 AM, Michael Fuhr wrote: On Sun, Mar 12, 2006 at 11:36:23PM -0800, Casey Duncan wrote: SELECT count(*) FROM webhits WHERE path LIKE '/radio/tuner_%.swf' AND status = 200 AND date_recorded >= '3/10/2006'::TIMESTAMP AND date_recorded < '3/11/2006'::TIMESTAMP; [...] Aggregate (cost=794775.08..794775.09 rows=1 width=0) [...] According to the planner it should take <15 minutes which is typical in practice. The planner's cost estimate is in units of disk page fetches, not time. The above estimate isn't 794775.09 ms (~13.25 min) but rather 794775.09 times the cost of a single page fetch, however much that is. See "Using EXPLAIN" in the "Performance Tips" chapter of the documentation. http://www.postgresql.org/docs/8.1/interactive/performance- tips.html#USING-EXPLAIN Doh! I should've known that. About half the times it runs, however, it never terminates (even after days) and just spins consuming 99+% of CPU with no disk activity. This query was never a problem in postgres versions < 8.1.2, however the data has grown substantially since that time. I notice it uses the recent in-memory bitmap feature, so I wondered if it was exposing a bug. If the problem happens half the time then you have a somewhat repeatable test case. Do you get more consistent performance if you set enable_bitmapscan to off? What's the query plan if you do that? Here's the plan with bitmap scans off: QUERY PLAN -- Aggregate (cost=891363.71..891363.72 rows=1 width=0) -> Index Scan using webhits_date_idx1 on webhits (cost=0.00..891360.30 rows=1362 width=0) Index Cond: ((date_recorded >= '2006-03-10 00:00:00'::timestamp without time zone) AND (date_recorded < '2006-03-11 00:00:00'::timestamp without time zone)) Filter: (((path)::text ~~ '/radio/tuner_%.swf'::text) AND (status = 200)) (4 rows) The query runs to completion this way in about 40 minutes. I turned bitmap scans back on and it hangs again (I ran it for about 5 hours). If you narrow the search criteria so the query returns fewer rows, do you still see the problem? Can you identify a "sour spot" where the problem starts to happen? I'll do that tomorrow, and let you know. If I restart the postmaster, the query will complete in the expected time. Does the problem eventually start happening again? If so, after how long? How did you determine that the restart is relevant? Do you consistently see different (presumably better) performance after a restart than if you don't restart? This is a production box, so I can't restart it whenever I want and I haven't yet reproduced it elsewhere -- the data base size makes that cumbersome at best -- but once after it hung up, I restarted postgres and the report ran to completion for a couple of days then started hanging again. Today it seems to pretty consistently hang, I'll see if I can restart it overnight and test it again. -Casey ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Possible infinite loop in query using bitmap scans
On Mar 13, 2006, at 5:25 PM, Casey Duncan wrote:

> [..]
>> If I restart the postmaster, the query will complete in the expected time.
>
> Does the problem eventually start happening again? If so, after how long? How did you determine that the restart is relevant? Do you consistently see different (presumably better) performance after a restart than if you don't restart?

I restarted postgres this morning and this time it didn't seem to help. That query has been running for several hours now. I'm going to let it go a while longer to see if it eventually completes, but I suspect it won't. So perhaps the prior restart was just dumb luck. I'll try some narrower queries as well.

-Casey
Re: [GENERAL] Possible infinite loop in query using bitmap scans
On Mar 14, 2006, at 1:31 PM, Casey Duncan wrote:

> On Mar 13, 2006, at 5:25 PM, Casey Duncan wrote:
>> [..]
>> If I restart the postmaster, the query will complete in the expected time.
>>
>> Does the problem eventually start happening again? If so, after how long? How did you determine that the restart is relevant? Do you consistently see different (presumably better) performance after a restart than if you don't restart?
>
> I restarted postgres this morning and this time it didn't seem to help. That query has been running for several hours now. I'm going to let it go a while longer to see if it eventually completes, but I suspect it won't. So perhaps the prior restart was just dumb luck. I'll try some narrower queries as well.

Interestingly, I can only narrow the query by about an hour (i.e., to a 23-hour span) before the plan changes to a simple index scan. That may explain why this only started hanging recently, since it probably only recently started using the bitmap scan for a full-day interval. Let me know if there is anything I should do or any info you need to further pin this down. In the meantime I will disable bitmap scans for this query. Thanks.

-Casey
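For the record, this is the sort of comparison I was doing (a sketch; 'reports' is a placeholder database name, and the narrowed window below simply starts an hour later, which is one way of shaving the range):

psql reports <<'EOF'
-- full-day window: planner picks the BitmapAnd plan that hangs
EXPLAIN SELECT count(*) FROM webhits
 WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
   AND date_recorded >= '2006-03-10 00:00:00'::TIMESTAMP
   AND date_recorded <  '2006-03-11 00:00:00'::TIMESTAMP;

-- ~23-hour window: planner switches to a plain index scan on webhits_date_idx1
EXPLAIN SELECT count(*) FROM webhits
 WHERE path LIKE '/radio/tuner_%.swf' AND status = 200
   AND date_recorded >= '2006-03-10 01:00:00'::TIMESTAMP
   AND date_recorded <  '2006-03-11 00:00:00'::TIMESTAMP;
EOF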
Re: [GENERAL] What is your favorite front end for user interaction to postgresql databases?
On May 8, 2006, at 3:33 PM, Karen Hill wrote:

> What is your favorite front end for end users to interact with your postgresql db? Is it java, .net, web apache + php, MS-Access, ruby on rails? Why is it your favorite? Which would you recommend for end users on multiple OSes?

This is totally dependent on the individual user's preference:

- Many users comfortable with command line shells prefer psql (this is my choice).
- Others who like a GUI might use pgAdmin (or myriad others).
- If interoperability with MS Office is your main concern, I could see Access/ODBC being useful. As a general front end, probably not.

When you start talking about Java vs. .NET vs. PHP vs. Ruby, that's a whole other ball of wax. That's a lot less about interaction, or even about databases, and a lot more about programmer preference. Personally, I would recommend:

- Python + psycopg (highly interactive, great for complex scripts or even apps)
- Java + JDBC + Hibernate (I think JDBC sucks, but Hibernate totally rocks and more or less hides it from you)
- Shell + psql (great for simple reporting, automating dba tasks, etc.)

> Also, what do you think of having the database management system do all work (business logic etc) with the front end as a user interface vs. having N-tier with an application server handling business logic, and the db just accepting data. I currently have the opinion that N-tier is not as good as having the db contain the business logic via stored procedures. This comes from having to re-invent the wheel every time a new user app needs to be created.

From personal experience (and others will disagree), I find putting logic in the database to be a bad idea. I only use stored procedures for triggers and the like, and I try to avoid those whenever possible. Here are my reasons why:

- I don't like the PL/pgSQL language (yes, there are alternatives, but they have their own drawbacks)
- It's complex to test and upgrade (we actually wrote non-trivial infrastructure to automate both)
- It's difficult to debug (compared to external languages like Python or Java)
- It's difficult to profile, therefore getting good performance can be difficult

I had a very complex system coded in stored procedures that performed poorly and was hard to maintain. It's now refactored into Java/Hibernate code that's simpler to understand, performs much better, and is easier to extend and maintain. Of course that's just my particular case and obviously YMMV. Stored procs could make a lot of sense if you have many different clients accessing the db in different ways and you want to strictly enforce business rules across all of them. I had no such requirements in my case.

In any case I would strongly recommend doing the simplest thing that you can get away with. If your business rules can be fulfilled with grants, views and constraints alone, use them.

-Casey
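For what it's worth, this is the flavor of shell + psql reporting script I have in mind (just a sketch; the database 'mydb', table 'hits', and column 'recorded' are made-up names):

#!/bin/sh
# print yesterday's hit count; 'mydb', 'hits', and 'recorded' are placeholders
YESTERDAY=$(date -d yesterday +%Y-%m-%d)
psql -t -A -c "SELECT count(*) FROM hits
                WHERE recorded >= '$YESTERDAY'
                  AND recorded <  '$YESTERDAY'::date + 1;" mydb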
[GENERAL] autovacuum "connections" are hidden
Trying to drop a database this morning, I ran into the not-so-unusual error:

dropdb: database removal failed: ERROR: database "test_seg1" is being accessed by other users

However, when I did "select * from pg_stat_activity" on the pg server, it showed no connection to that db. Then I looked at the processes:

tmp0% ps ax | grep test_seg1
10317 ?  D  0:36 postgres: autovacuum process   test_seg1

I imagine this doesn't show up because there is no connection per se; the autovacuum process is a bona fide backend process in its own right. It's just a bit confusing in this circumstance. I guess this is more of a heads-up than a question.

Another circumstance where this has bitten me is in doing a slony SUBSCRIBE_SET. At least in 1.1.5, the autovac daemon can deadlock with slony and cause the subscribe operation to fail. When this happened to me it was somewhat baffling at first, because I had altered pg_hba.conf to forbid all non-superuser connections and there were no other connections visible at the time (in pg_stat_activity). Turning off autovacuum during the subscribe fixed this, but not without a little head-scratching (and maybe some luck).

No way comes to my mind to improve the visibility of autovac "connections", but I thought I would throw this out here in case anyone had any thoughts on the matter.

-Casey
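For what it's worth, here is roughly what I check now before a dropdb (a sketch, using the same db name as above; connect to any other db to query pg_stat_activity):

# look for ordinary client connections to the db
psql -c "SELECT datname, procpid, usename, current_query
           FROM pg_stat_activity
          WHERE datname = 'test_seg1';" template1

# pg_stat_activity misses the autovacuum worker, so also check the process list
ps ax | grep "[a]utovacuum process.*test_seg1"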
Re: [GENERAL] autovacuum "connections" are hidden
On May 17, 2006, at 12:34 PM, Tom Lane wrote:

> Casey Duncan <[EMAIL PROTECTED]> writes:
>> however, when I did "select * from pg_stat_activity" on the pg server, it showed no connection to that db. Then I looked at the processes:
>>
>> tmp0% ps ax | grep test_seg1
>> 10317 ?  D  0:36 postgres: autovacuum process   test_seg1
>
> Hmm, autovacuum won't show up in pg_stat_activity because it never calls pgstat_report_activity(). Seems like maybe it should, though.

Is this worth filing a bug about for posterity?

-Casey
Re: [GENERAL] Cannot restart postgres
Are there any other partitions on that machine with space available? If so, you could move some files there from your postgres data dir and symlink to them from their original location. At least then you might get it to start so you can get a pg_dump to work.

-Casey

On Jun 19, 2006, at 1:43 PM, elein wrote:

> 7.x installation. (Yeah, yeah, upgrade.) I have a disk which has only the database on it. It has become full and postgres won't start. Is there an option (besides restore from backup) other than trying to move the data dir to a bigger disk? Would pg_resetxlog do the right thing to free up some space so I can go in and vacuum and delete rows? This is somewhat urgent. Thanks for your help.
>
> --elein
> [EMAIL PROTECTED]
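P.S. A minimal sketch of the sort of thing I mean, assuming the spare partition is mounted at /spare_disk and the data dir is /var/lib/pgsql/data (both made up; adjust for your install), and with postgres stopped:

# move the WAL directory to the partition with free space and link it back
mv /var/lib/pgsql/data/pg_xlog /spare_disk/pg_xlog
ln -s /spare_disk/pg_xlog /var/lib/pgsql/data/pg_xlog

# then try starting postgres and take a dump right away
pg_ctl -D /var/lib/pgsql/data start
pg_dumpall > /spare_disk/everything.sql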
Re: [GENERAL] Can't start PostgreSQL
It seems you were running a pre-8.x postgresql version before; its data files are not compatible with the new version you have now. You'll need to find out the version that used to be installed by looking at the PG_VERSION file in your postgres data directory. Once you do that, you will need to install the old version (you can download it from postgresql.org), start it against the old database directory, and dump the databases. You can use pg_dumpall for this. Once you have dumped the dbs, move the old data directory aside, run initdb for the new postgresql (8.1.4), and start it. Then import the databases you previously dumped.

hth,
-Casey

On Jul 27, 2006, at 9:22 AM, Ian Johnson wrote:

> Hi List
>
> I want to develop an application in PostgreSQL but when starting the service I get the following message:
>
> An old version of the database format was found. You need to upgrade the data format before using PostgreSQL. See /usr/share/doc/postgresql-8.1.4/README.rpm-dist for more information.
>
> I am running Fedora Core 5 but initially had built the system with Fedora Core 3, with PostgreSQL installed and automatically updated over the past 2 years. I may have created a test database 2 years ago which may be the source of the problem. Is this the source of the problem? I could find nothing in README.rpm-dist that helps. I would appreciate any advice on how to deal with this problem.
>
> Thank you
> Ian Johnson
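P.S. Here is a rough outline of those steps as shell commands (paths, port, and the old version number are examples only; check PG_VERSION first to see what you actually had):

cat /var/lib/pgsql/data/PG_VERSION           # e.g. prints "7.4"

# with the matching old binaries installed and the old server running:
pg_dumpall -p 5432 > /tmp/all_databases.sql

# stop the old server, move the old data dir aside, init a fresh 8.1.4 cluster
pg_ctl -D /var/lib/pgsql/data stop
mv /var/lib/pgsql/data /var/lib/pgsql/data.old
initdb -D /var/lib/pgsql/data
pg_ctl -D /var/lib/pgsql/data start

# load everything into the new server
psql -f /tmp/all_databases.sql postgres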