Re: [GENERAL] Row count estimation bug in BETWEEN?
Tom Lane-2 wrote:
> PG doesn't try to estimate inequalities exactly, because it usually
> doesn't make enough of a difference to matter. Currently we don't
> even bother to distinguish say ">" from ">=" for estimation purposes,
> though certainly we would need to in order to deal with zero-width
> ranges with any great amount of precision.

Thank you for your answer! I'm sorry, but after looking into the documentation and sources (the scalarineqsel function in selfuncs.c, the clauselist_selectivity and addRangeClause functions in clausesel.c) and experimenting a little, I've got the impression that PostgreSQL actually does bother to distinguish ">" from ">=" for estimation purposes sometimes (probably when the MCV list is used), but in my example it uses the histogram and indeed doesn't distinguish them.

My simple test (using MCVs) is below:

CREATE TABLE t2(n int);
INSERT INTO t2(n) VALUES (0),(0),(0),(0),(1),(1),(1),(1),(2),(2),(2),(2);
ANALYZE t2;
EXPLAIN SELECT * FROM t2 WHERE n > 0 AND n < 2  -- rows=4
EXPLAIN SELECT * FROM t2 WHERE n > 0 AND n < 2  -- rows=12

Looking further, I found the ineq_histogram_selectivity function in selfuncs.c, and this fragment seems relevant:

/*
 * We have values[i-1] <= constant <= values[i].
 *
 * Convert the constant and the two nearest bin boundary
 * values to a uniform comparison scale, and do a linear
 * interpolation within this bin.
 */
binfrac = (val - low) / (high - low);

And now I'm stuck. Can ">" operators be distinguished from ">=" operators at this point?

WBR, Yaroslav Schekin.

--
View this message in context: http://postgresql.nabble.com/Row-count-estimation-bug-in-BETWEEN-tp5853687p5853725.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
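One way to check which statistics path the planner can take here is the pg_stats view; with only three distinct values in t2, ANALYZE should put them all in the MCV list and leave histogram_bounds null, so the MCV path (which does distinguish ">" from ">=") is available:

```sql
-- Inspect the per-column statistics ANALYZE gathered for t2.n.
SELECT most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 't2' AND attname = 'n';
```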
Re: [GENERAL] Row count estimation bug in BETWEEN?
> My simple test (using MCVs) is below ...

I posted the wrong second query in the test; it should have been:

EXPLAIN SELECT * FROM t2 WHERE n >= 0 AND n <= 2  -- rows=12

WBR, Yaroslav Schekin.
Re: [GENERAL] Row count estimation bug in BETWEEN?
Tom Lane-2 wrote:
> The assumption is that the histogram represents a continuous
> distribution of values in which no one value occurs often enough to be
> interesting (if it did, it would be in the MCV list...). Therefore it
> does not matter much whether any specific histogram entry is exactly
> "=". And of course, for comparison values that are between histogram
> entries, we have no idea whatsoever whether there are any "=" entries
> in the table;

This assumption is correct for continuous types, but in my example the type (bigint) is discrete (as are some other types, like date, numerics with a defined scale, and even varchar/text), so the assumption is wrong for it.

Tom Lane-2 wrote:
> so even if the code did distinguish ">" from ">=", it would be unclear
> what to do with the knowledge.

The vague idea that popped into my head is: as the code in convert_to_scalar already switches on the value type, a flag to distinguish ">=" operators from ">" operators could be added there. It would use the equivalence of "a > const::sometype" and "a >= next_value(const::sometype)", i.e. that "a > 2::int" equals "a >= 3::int". So the corresponding convert_to... functions would use "value+1" instead of "value" in my case, the next date if the type is date, "value+0.01" if the type is numeric(n, 2), etc.

IMHO, the problem with these estimations is that they are horribly off. I've searched the archives, and it seems that PostgreSQL's users are bitten by this sometimes, e.g.: http://www.postgresql.org/message-id/4583.1358289...@sss.pgh.pa.us

WBR, Yaroslav Schekin.
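For integer columns, the proposed rewrite can already be tried by hand; for example, against the t2 table from the earlier test:

```sql
-- "n > 0" and "n >= 1" select exactly the same integer rows, but the
-- planner may estimate them differently once it falls back to the histogram.
EXPLAIN SELECT * FROM t2 WHERE n >  0 AND n <  2;
EXPLAIN SELECT * FROM t2 WHERE n >= 1 AND n <= 1;
```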
Re: [GENERAL] Move PGdata to a different drive
David Fetter wrote:
> On Thu, Jun 04, 2009 at 11:11:29AM -0400, Bruce Momjian wrote:
>> Jennifer Trey wrote:
>>> Hi, What file should I be working with?
>>
>> Just shut down the server and move the directory wherever you want and
>> restart the server. There are no file contents that need changing. Of
>> course you should adjust your scripts or the PGDATA environment
>> variable.
>
> Another way to do this is to mount the drive at the place where PGDATA
> used to be.

The down side of this approach is that it only works if you want your PGDATA at the root of that new drive.

And of course one can create a safety symlink to the new location of PGDATA from the old location just in case, e.g., if one suspects hidden references to the PGDATA location in home-made scripts, etc. Of course, that had better be straightened out sooner rather than later and the safety symlink removed for the sake of clarity, but when you need to get everything rolling *ASAP*, temporary symlinks can be helpful.

Yar
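The move-plus-safety-symlink procedure could be sketched roughly as below. The paths and the PGCTL override are illustrative, not from the thread; a real run would use pg_ctl (or your init script) and your actual data directory.

```shell
# Sketch: relocate PGDATA and leave a safety symlink at the old path.
# PGCTL is overridable so the steps can be rehearsed without a live server.
PGCTL=${PGCTL:-pg_ctl}

move_pgdata() {
    old=$1 new=$2
    "$PGCTL" -D "$old" stop || return 1   # shut the server down first
    mv "$old" "$new" || return 1          # relocate the whole data directory
    ln -s "$new" "$old"                   # safety symlink for hidden PGDATA references
    "$PGCTL" -D "$new" start              # restart against the new location
}
```

Once all scripts and environment variables point at the new location, the symlink can be removed.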
Re: [GENERAL] How to store text files in the postgresql?
DimitryASuplatov wrote:
> My task is to store a lot (10^5) of small (<10 MB) text files in the
> database with the ability to restore them back to the hard drive on
> demand.

I cannot but ask the community a related question here: can such a design, that is, storing quite large objects of varying size in a PostgreSQL database, be a good idea in the first place? I used to believe that what RDBMSs were really good at was storing a huge number of relations, each of a small and mostly uniform size if expressed in bytes; but today people tend to put big things, e.g., email or files, in relational databases because it's convenient for them. That's absolutely normal, as the typical data objects we have to deal with keep growing in size, but how well can databases stand the pressure? And couldn't it still be better to store large things as plain files and put just their names in the database? File systems were designed for that kind of job, after all, unlike RDBMSs.

Thanks! Yar
[GENERAL] ZFS prefetch considered evil?
Hi All,

I have a mid-size database (~300G) used as an email store, running on a FreeBSD + ZFS combo. Its PG_DATA is on ZFS whilst xlog goes to a different FFS disk. ZFS prefetch was enabled by default, and disk time on PG_DATA was near 100% all the time, with transfer rates heavily biased towards reads: ~50-100M/s read vs ~2-5M/s write.

A former researcher, I was going to set up disk performance monitoring to collect some history and see if disabling prefetch would have any effect, but today I had to find out the difference the hard way. Sorry, but that's why the numbers I can provide are quite approximate. Due to a peak in user activity the server just melted down, with mail data queries taking minutes to execute. As a last resort, I rebooted the server with ZFS prefetch disabled -- it couldn't be disabled at run time in FreeBSD. Now IMAP feels much more responsive; transfer rates on PG_DATA are mostly <10M/s read and 1-2M/s write; and disk time stays way below 100% unless a bunch of email is being inserted.

My conclusion is that although ZFS prefetch is supposed to be adaptive and handle random access more or less OK, in reality there is plenty of room for improvement, so to speak, and for now PostgreSQL performance can benefit from its simply staying disabled. The same may apply to other database systems as well.

Thanks, Yar
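For reference, on FreeBSD the prefetch switch is a boot-time tunable, which is why a reboot was needed; the setting below goes in /boot/loader.conf and takes effect at the next boot:

```shell
# /boot/loader.conf
vfs.zfs.prefetch_disable="1"   # turn off ZFS file-level prefetch
```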
Re: [GENERAL] ZFS prefetch considered evil?
On 08/07/2009, at 8:39 PM, Alban Hertroys wrote:
> On Jul 8, 2009, at 2:50 AM, Yaroslav Tykhiy wrote:
>> I have a mid-size database (~300G) used as an email store and running
>> on a FreeBSD + ZFS combo. [original message quoted in full]
>
> Are you sure you weren't hitting swap?

A sceptic myself, I genuinely understand your doubt. But this time I was sure, because I paid attention to the name of the device involved. Moreover, a thrashing system wouldn't have had such a disparity between disk read and write rates.

> IIRC prefetch tries to keep data (disk blocks?) in memory that it
> fetched recently.

What you described is just a disk cache.
And a trivial implementation of prefetch would work as follows: an application or other file/disk consumer asks the provider (driver, kernel, whatever) to read, say, two disk blocks' worth of data. The provider thinks, "I know you are short-sighted; I bet you are going to ask for more contiguous blocks very soon," so it schedules a disk read for many more contiguous blocks than requested and caches them in RAM. For bulk data applications such as file serving this trick works like a charm. But other applications do truly random access and never come back for the prefetched blocks; in that case both disk bandwidth and cache space are wasted. An advanced implementation can try to distinguish sequential and random access patterns, but in reality that appears to be a challenging task.

> ZFS uses quite a bit of memory, so if you distributed all your memory
> to be used by just postgres and disk cache then you didn't leave
> enough space for the prefetch data and _something_ will be moved to
> swap.

I hope you know that FreeBSD is exceptionally good at distributing available memory between its consumers. That said, useless prefetch indeed puts extra pressure on the disk cache and results in unnecessary cache evictions, thus making things even worse. It is true that ZFS is memory hungry and so rather sensitive to non-optimal memory use patterns. Useless prefetch wastes memory that could be used to speed up other ZFS operations.

> If you're running FreeBSD i386 then ZFS requires some careful tuning
> due to the limits a 32-bit OS puts on memory. I recall ZFS not being
> very stable on i386 a while ago for those reasons, which has by now
> been fixed as far as possible, but it's not ideal (and it likely never
> will be).

I use FreeBSD/amd64 and I'm generally happy with ZFS on that platform.

> You'll probably want to ask about this on the FreeBSD mailing lists as
> well, they'll know much better than I do ;)

Are you a local FreeBSD expert?
;-) Jokes apart, I don't think this topic has much to do with FreeBSD as such; it is mostly about making the advanced technologies of PostgreSQL and ZFS work well together. Even ZFS developers admit that database-related applications may call for exceptions from general ZFS practices and rules. When I set up my next ZFS-based PostgreSQL server, I think I'll play with the recordsize property of ZFS and see if setting it to PAGESIZE makes any difference.

Thanks, Yar
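The recordsize experiment mentioned above would look something like this (the dataset name is an example; recordsize only applies to files written after the change, so it is best set before loading any data):

```shell
zfs set recordsize=8k tank/pgdata   # match PostgreSQL's default 8 kB page
zfs get recordsize tank/pgdata      # verify the property took effect
```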
Re: [GENERAL] Failover, Wal Logging, and Multiple Spares
On 18/08/2009, at 9:36 AM, Bryan Murphy wrote:
> Ok, I've asked this a few times, but nobody ever responded. I think I
> finally got it though, could somebody confirm my logic? Basically, you
> set up a chain of servers, and when one fails you replicate to the
> next link in the chain, like so:
>
> Master (A) --> Warm Standby (B) --> Warm Standby (C) --> etc.
>
> Master fails, and it now becomes:
>
> Old Master (A) x> New Master (B) --> Warm Standby (C)
>
> And, of course, you might have an additional replication chain from
> Master (A) just in case you goof something up in the failover process,
> but that's the basic idea.

Excuse me, but I fail to see how you are going to replicate from one warm standby to another warm standby. I don't think PostgreSQL can do that. That said, the idea of only partially degrading a warm standby cluster by electing a new master node looked very attractive to me, too.

> On Sun, Aug 16, 2009 at 9:35 PM, Bryan Murphy wrote:
>> Assuming we are running a Postgres instance that is shipping log
>> files to 2 or more warm spares, is there a way I can fail over to one
>> of the spares and have the second spare start receiving updates from
>> the new master without missing a beat? I can live with losing the old
>> master, and at least at the moment it would be a controlled failover,
>> but I would like to know if it's possible during an uncontrolled
>> failover as well (catastrophic hardware failure). Right now we have
>> just that setup, but every time I've failed over to the new master,
>> we've had to rebuild our spares from scratch, and unfortunately this
>> is a multi-hour-long process. We can't afford the risk of not having
>> a warm spare for that length of time. We're planning to move entirely
>> to a Slony cluster, but I'd like to fail over to a more powerful
>> machine before we begin the Slony migration as the current server is
>> already overloaded.

Encouraged by Bruce Momjian, I tried and had some success in this area. It was a controlled failover, but it worked like a charm.
An obvious condition was that the warm standbys be in perfect sync; you can't do the trick if some of them received the last WAL segment while the others didn't. Please see http://archives.postgresql.org/pgsql-general/2009-07/msg00215.php for my report. Of course, questions and comments are welcome.

Cheers, Yar
Re: [GENERAL] "Could not open relation XXX: No such file or directory"
Hi there,

On 19/08/2009, at 8:38 PM, Craig Ringer wrote:
> On 19/08/2009 6:26 PM, Alan Millington wrote:
>> 2009-08-19 03:06:45 ERROR: could not read block 0 of relation
>> 1663/52752/52896: No such file or directory
>>
>> Clearly something is amiss, but I don't know what. I should be
>> grateful for any suggestions as to what I should check.
>
> Got a virus scanner installed? If so, remove it (do not just disable
> it) and see if you can reproduce the problem. Ditto anti-spyware
> software. You should also `chkdsk' your file system(s) and use a SMART
> diagnostic tool to test your hard disk (assuming it's a single ATA
> disk).

By the way, `chkdsk' in Windows or `fsck' in Unix can, in a way, be a _source_ of file loss if the file metadata got damaged badly, e.g., by a system crash, and the file node has to be cleared. So I've always been curious if there is a way to retrieve surviving records from a PostgreSQL database damaged by file loss. Do you know any? (Of course, the only true solution is to have been making backups beforehand, but...)

Thanks! Yar
Re: R: [GENERAL] Field's position in Table
On 20/08/2009, at 7:24 PM, vinny wrote:
> I can't really think of any real reason to put the field at a
> particular position, applications don't really care about the order of
> fields.

... unless an application is brain-damaged by its use of a wildcard select, which is a well-known no-no even for home-made scripts, as has already been pointed out here. My point here is that applications' robustness to apparent field order, like liberty, shouldn't be taken for granted: it needs to be explicitly minded, protected, and sometimes fought for. :-)

Yar
Re: [GENERAL] "Could not open relation XXX: No such file or directory"
On 21/08/2009, at 12:40 PM, Seth Gordon wrote:
> Yaroslav Tykhiy wrote:
>> By the way, `chkdsk' in Windows or `fsck' in Unix can, in a way, be a
>> _source_ of file loss ... So I've always been curious if there is a
>> way to retrieve surviving records from a PostgreSQL database damaged
>> by file loss. Do you know any? (Of course, the only true solution is
>> to have been making backups beforehand, but...)
>
> The Ubuntu Linux site has this page on data recovery (also applicable
> to other Linux flavors): https://help.ubuntu.com/community/DataRecovery
>
> I assume that a database file, because of its structure, is harder to
> recover after it becomes corrupt than, say, an XML file. But any port
> in a storm, right?

Excuse me, but my curiosity was about a somewhat different thing. Let's assume we did file-system-level data recovery but lost just a couple of files from $PGDATA/base that were damaged hopelessly. Now, if we start pgsql and try accessing the database, pgsql will fail as soon as it hits a missing file. So I wondered if there was a way to tell pgsql to ignore such errors at the cost of returning possibly inconsistent and corrupted data.

It has just occurred to me that recreating the files zero-filled is another option to try. As long as the objects stored in the database are small and/or uncompressed, screwing up a few pages shouldn't affect data from the other pages, right?

Yar
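If one does try the zero-filling route, something like the sketch below could recreate a lost segment file. The page count and path are made-up examples, and the 8192-byte page size is the default build setting; the server generally treats all-zero pages as uninitialized, so the data those pages held is gone but the rest of the relation may become readable again.

```shell
# Recreate a hopelessly lost relation file as zero-filled pages so the
# server can at least open it. The page count should match the size the
# original file had, e.g. as recorded in a file-level backup listing.
recreate_zeroed() {
    path=$1 pages=$2
    dd if=/dev/zero of="$path" bs=8192 count="$pages" 2>/dev/null
}

# Example (hypothetical relfilenode path):
#   recreate_zeroed "$PGDATA/base/16384/24576" 128
```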
Re: [GENERAL] [Skytools-users] WAL Shipping + checkpoint
On 31/08/2009, at 6:16 PM, Sébastien Lardière wrote:
> On 28/08/2009 18:14, Simon Riggs wrote:
>> On Fri, 2009-08-28 at 17:54 +0200, Sébastien Lardière wrote:
>>> Since this moment, the slave didn't make any checkpoint. Now we know
>>> why. Thanks a lot! But how can I fix it?
>>
>> Current issue: Rebuild standby from base backup.
>> Cause: Locate the bug in the Index AM that causes it.
>> Symptom: Currently index AMs don't detect the situation that an index
>> is corrupt, so they keep trying to recover the index for ever, even
>> though attempting to do so permanently prevents restartpoints. I would
>> also like them to have a mechanism for marking an index corrupt and
>> then auto-rebuilding it following recovery.
>
> Thanks, I made a restore backup on the slave this morning, and it
> works!

Could you detail your solution please, if any? I've seen frozen pg_controldata output on my standby server for ages and attributed that to the ancient version of PostgreSQL (8.0.x) I'm stuck with. There have been no index-related error messages in my log, though.

Thank you! Yar
Re: [GENERAL] where clauses and multiple tables
On 09/09/2009, at 9:02 AM, David W Noon wrote:
> On Tue, 8 Sep 2009 14:25:20 -0700, Scott Frankel wrote about [GENERAL]
> where clauses and multiple tables:
>> Is it possible to join tables in the where clause of a statement?
> [snip]
>> Given a statement as follows:
>>
>> SELECT foo.foo_id, foo.name FROM foo, bar
>> WHERE foo.bar_id = bar.bar_id AND bar.name = 'martini';
>
> Just use an IN predicate:
>
> SELECT foo_id, name FROM foo
> WHERE bar_id IN (SELECT bar_id FROM bar WHERE name = 'martini');
>
> This is frequently called a semi-join.

By the way, folks, do you think there may be a performance gain or loss from rewriting this with an explicit JOIN? E.g.:

SELECT DISTINCT foo.foo_id, foo.name
FROM foo JOIN bar ON foo.bar_id = bar.bar_id
WHERE bar.name = 'martini';

Thanks! Yar
Re: [GENERAL] Can I Save images in postgres?
On 09/09/2009, at 10:43 AM, John R Pierce wrote:
> 纪晓曦 wrote:
>> Can I save images in postgres? How do I define the column? Does the
>> format matter? Can I save JPG/PNG? How?
>
> you can save images as BYTEA data, and the format is totally up to
> your application, as postgres just treats it as a block of bytes.
> however, I generally find it easier to store my images on a file
> system, and just put the file path in the database.

By the way, here's a relatively recent discussion of this issue: http://archives.postgresql.org/pgsql-general/2009-06/msg00599.php

Yar
Re: [GENERAL] adding another node to our pitr config
On 06/10/2009, at 11:51 PM, Geoffrey wrote:
> We are currently using WAL shipping to have a hot spare of our
> databases. We want to add another node to this configuration. The
> question is, what is the best way to go about this? Currently, our
> script checks to see if the WAL file already exists on the target
> server; if not, then we scp the file over. This is a local machine, so
> the scp overhead is not considered to be an issue. So, the current
> approach is:
>
> ssh $1 "test ! -f $2/$4" && scp $3 $1:$2/$4
>
> So, should I simply duplicate that line for the second server and
> place it below this one, or should they be dependent upon each other?
> That is:
>
> archive_command = 'archive.sh node1 /esc/master/pitr %p %f node2'
>
> ssh $1 "test ! -f $2/$4" && scp $3 $1:$2/$4 && ssh $5 "test ! -f $2/$4" && scp $3 $5:$2/$4
>
> The second node will not be at the same location, thus the network
> reliability is less. Thanks for any insights.

I've been interested in a similar setup too, although I haven't implemented it yet. I think there are at least three obvious approaches to consider. They all are basically solutions or workarounds to the following issue with multiple spare nodes: if the WAL copy operation to spare node 1 succeeds but the one to spare node 2 fails, you have to handle the partial success somehow. Your suggested archive_command will keep returning failure because the WAL segment already exists on node 1.

1. A shared WAL spool on a fault-tolerant SAN mounted via NFS or similar by all nodes. Then you can use a trivial `test && cp && mv' archive_command on the master node and have all the spare nodes fetch WAL files from the spool. (mv is there to make the copy atomic.) An additional advantage is that you can use the same WAL spool for periodic DB backups: you run a file-level backup once in a while and rely on the spool to accumulate the matching WALs for you.
A problem with this approach is that it can be non-trivial to implement a truly fault-tolerant shared spool, and you'll end up with a single point of failure in it.

2. Destructive copy. Just let your archive_command overwrite existing WAL segments. Then a failure with node 2 will result in a retry from scratch.

3. Delegation of failure handling to archive_command. Instead of relying on the pg archiver process to retry archive_command if it returned failure, run the copy operation to each spare node in a loop until success. Then you'll be sure all the nodes have received the WAL by the end of the script.

From a probabilistic point of view, (3) will be notably better than (2) only if the probability of failure for each node is high.
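Approach (3) could be sketched roughly like this. Node names, the retry interval, and the helper name are illustrative; the copy step mirrors the test-then-scp idiom from the original archive.sh line.

```shell
# Keep trying to place a WAL segment on a standby until it is there, so
# archive_command only returns success once every node has the file.
copy_until_done() {
    node=$1 src=$2 dst=$3
    until ssh "$node" "test -f $dst" || scp "$src" "$node:$dst"; do
        sleep 5   # transient failure; retry instead of failing the archiver
    done
}

# archive.sh body would then be, for both standbys:
#   copy_until_done node1 "$3" "$2/$4"
#   copy_until_done node2 "$3" "$2/$4"
```

Note the trade-off: while the loop spins, the master cannot recycle WAL segments, so pg_xlog keeps growing until the standby is reachable again.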
Re: [GENERAL] Reversing flow of WAL shipping
Hi David,

On 22/10/2009, at 2:52 PM, David Jantzen wrote:
> I want to run a warm standby scenario by you. I'm pretty sure it'll
> work, but it's a very large database so even the slightest mistake can
> mean a major setback.
>
> Scenario: Server A is the provider node, shipping WAL files to Server
> B. Server B is destined to become the provider node (newer hardware),
> and has a sibling Server C that will be the warm standby.
>
> Here's the question: when I turn Server B into the production/provider
> node, it's going to switch into the next timeline. If I then attempt
> to put Server A (or Server C) into recovery mode, will it switch to
> the new current timeline, or complain about being in the older
> timeline? Do I have to take another full backup of Server B after it's
> become the production/provider node?

I did a similar trick more than once with a mission-critical database by starting with one master/provider server (A) and two warm standby servers (B, C), then failing over from A to B, then shipping WAL files from the new master B to C. However, I might have been (ab)using the feature or bug in PostgreSQL 8.0 that it wouldn't switch to a new timeline in my environment. (I'd love to find out why that was so.) Please see http://archives.postgresql.org/pgsql-general/2009-07/msg00215.php for details.

Yar
Re: [GENERAL] High Availability with Postgres
Hi,

On 21/06/2010, at 3:37 AM, Raymond O'Donnell wrote:
> On 20/06/2010 17:34, Elior Soliman wrote:
>> Hello, my company is looking for a high-availability solution for
>> Postgres.
>
> There's quite a bit of information in the documentation here:
> http://www.postgresql.org/docs/8.4/static/high-availability.html

And to keep oneself up to speed: http://momjian.us/main/blogs/pgblog/2010.html#June_16_2010

Yar
Re: [GENERAL] Warm Standby and resetting the primary as a standby
On Sat, Aug 21, 2010 at 12:45:44PM -0400, Bruce Momjian wrote:
> Derrick Rice wrote:
> > I've been reading up on the documentation for WAL shipping and warm
> > standby configuration. One concern that I have (a common one, I'm
> > sure) is that it seems that after bringing a standby server up as
> > primary, other standby servers (including the original primary) need
> > to be rebased before they can read the new primary's WALs in
> > continuous recovery mode.
> >
> > It seems that the cause of this is a change to the leading digit of
> > the WAL files:
> >
> > http://archives.postgresql.org/pgsql-general/2010-03/msg00985.php
> > http://archives.postgresql.org/pgsql-admin/2009-08/msg00179.php
> >
> > I was hoping that someone would shed some light on this situation
> > with a technical explanation. It's not clear to me why the WAL files
> > are incompatible or why the digit increases. What does that first
> > digit mean to postgresql? Is it possible to have the restore_command
> > ignore the leading digit?
>
> The first digit in the WAL filename is the timeline.
>
> I think we need to figure out a better way to promote slaves when
> there is a new master, but no one has done the research yet.

In PostgreSQL 8.0, I used to rely on what seemed to be a bug: it didn't switch timelines if restore_command returned a non-zero status, and that worked like a charm more than once for me. Can switching timelines be made optional in recovery.conf, or dependent on what restore_command returns? Sorry if I'm missing any important architectural points here.

Yar
Re: [GENERAL] Warm Standby Weirdness
On Thu, Aug 19, 2010 at 11:22:15PM -0400, Tom Lane wrote:
> Sam Nelson writes:
> > Here's the output from pg_controldata:
> >
> > $ pg_controldata `pwd`
> > WARNING: Calculated CRC checksum does not match value stored in
> > file. Either the file is corrupt, or it has a different layout than
> > this program is expecting. The results below are untrustworthy.
[...]
> This is just an educated guess, but I'm going to bet on 32-bit vs
> 64-bit. Are you trying to copy the DB to a machine with different word
> size? Won't work.

Just in case anyone ever _really_ needs to run a legacy 32-bit DB on a 64-bit system, they may find it good to know that it works, no problem, at least on FreeBSD. All it takes is the 32-bit libraries (a stock part of a 64-bit FreeBSD install) and probably compatibility libraries (from a package) if the 32-bit PostgreSQL was built on an older system. Of course, the old PostgreSQL binaries will have to be copied over and used with the old database, but they can work all right on a 64-bit system as long as a compatibility environment is provided using standard system components.

My 2 cents' worth.

Yar
[GENERAL] Query plan choice issue
Hi all,

I'm seeing funny behaviour in PostgreSQL 8.4.4: a query can be executed using either of two different query plans, one taking a few milliseconds and the other tens of seconds. The work_mem setting doesn't seem to affect it -- I tried increasing and decreasing it by 2x and 4x, but that didn't seem to favour the fast plan choice. Honestly, I have no idea what affects the plan choice, but I saw PostgreSQL change it at random.

The query in question looks like this -- sorry, it's rather complex:

SELECT message_idnr FROM dbmail_messages m
JOIN dbmail_physmessage p ON m.physmessage_id=p.id
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id
WHERE mailbox_idnr = 12345 AND status IN (0,1)
AND headername ILIKE 'MESSAGE-ID'
AND SUBSTRING(headervalue,0,255) ILIKE '%<@mail.gmail.com>%'
ORDER BY message_idnr;

It comes from DBMail. That said, I don't think DBMail can be blamed here, because at least sometimes PostgreSQL is able to do the right thing with respect to that query.
Here is the slow plan:

                                  QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=1234610.11..1234610.11 rows=1 width=8) (actual time=20933.166..20933.168 rows=1 loops=1)
   Sort Key: m.message_idnr
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.00..1234610.10 rows=1 width=8) (actual time=3327.658..20933.122 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..1234601.59 rows=1 width=24) (actual time=3327.599..20933.056 rows=1 loops=1)
               ->  Nested Loop  (cost=0.00..1234549.74 rows=6 width=8) (actual time=3327.503..20932.944 rows=1 loops=1)
                     Join Filter: (v.headername_id = n.id)
                     ->  Seq Scan on dbmail_headername n  (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 loops=1)
                           Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
                     ->  Seq Scan on dbmail_headervalue v  (cost=0.00..1234407.96 rows=2666 width=16) (actual time=3327.465..20931.942 rows=1 loops=1)
                           Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
               ->  Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m  (cost=0.00..8.63 rows=1 width=16) (actual time=0.088..0.100 rows=1 loops=1)
                     Index Cond: (m.physmessage_id = v.physmessage_id)
                     Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345))
         ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p  (cost=0.00..8.50 rows=1 width=8) (actual time=0.048..0.050 rows=1 loops=1)
               Index Cond: (p.id = m.physmessage_id)
 Total runtime: 20933.241 ms
(17 rows)

And here is the fast plan:

                                  QUERY PLAN
------------------------------------------------------------------------
 Sort  (cost=1175284.93..1175284.94 rows=1 width=8) (actual time=14.163..14.165 rows=1 loops=1)
   Sort Key: m.message_idnr
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=0.00..1175284.92 rows=1 width=8) (actual time=4.272..14.152 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..1175276.41 rows=1 width=24) (actual time=4.261..14.135 rows=1 loops=1)
               ->  Nested Loop  (cost=0.00..1175268.46 rows=1 width=32) (actual time=4.249..14.117 rows=1 loops=1)
                     ->  Index Scan using dbmail_messages_mailbox_idx on dbmail_messages m  (cost=0.00..4153.35 rows=786 width=16) (actual time=0.043..2.810 rows=358 loops=1)
                           Index Cond: (mailbox_idnr = 12345)
                           Filter: (status = ANY ('{0,1}'::integer[]))
                     ->  Index Scan using dbmail_headervalue_2 on dbmail_headervalue v  (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358)
                           Index Cond: (v.physmessage_id = m.physmessage_id)
                           Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
               ->  Index Scan using dbmail_headername_pkey on dbmail_headername n  (cost=0.00..7.94 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
                     Index Cond: (n.id = v.headername_id)
                     Filter: ((n.headername)::text ~~* 'MESSAGE-ID'::text)
         ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p  (cost=0.00..8.50 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)
               Index Cond: (p.id = m.physmessage_id)
 Total runtime: 14.231 ms
(18 rows)
[GENERAL] Schema search path
Hi there,

Sorry, but I've got yet another issue to discuss today, this time on the schema search path. In fact it may not be a bug, but it may be worth a note in the documentation.

It seems that if the table in SELECT FROM has an explicit schema specifier, further references to the same table name will implicitly inherit it. E.g., this query will be valid because the second reference will be to foo.bar, not public.bar:

SELECT * FROM foo.bar WHERE bar.a=1;
                            ^^^ this means foo.bar

Here is a more complex case where I initially came across this issue:

psql (8.4.4)
Type "help" for help.

pgsql=# show search_path;
  search_path
----------------
 "$user",public
(1 row)

pgsql=# create table public.tbl_bar (a int);
CREATE TABLE
pgsql=# create schema sch_foo;
CREATE SCHEMA
pgsql=# create table sch_foo.tbl_bar (a int);
CREATE TABLE
pgsql=# insert into public.tbl_bar (a) values (1);
INSERT 0 1
pgsql=# insert into sch_foo.tbl_bar (a) values (2);
INSERT 0 1
pgsql=# select a from tbl_bar where not exists (select a from sch_foo.tbl_bar where tbl_bar.a=sch_foo.tbl_bar.a);
 a
---
(0 rows)

pgsql=# select a from tbl_bar where not exists (select a from sch_foo.tbl_bar where public.tbl_bar.a=sch_foo.tbl_bar.a);
 a
---
 1
(1 row)

As just shown, this can be even more confusing with nested queries. Do you think it's a feature or a bug? :-)

Thanks!

Yar
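For anyone wanting to sidestep the trap entirely: giving each table its own distinct alias makes every column reference unambiguous, whatever the search path says. Here is a small sketch of the same anti-join shape; sqlite3 merely stands in for PostgreSQL so the example is runnable anywhere, and the alias names outer_bar/inner_bar are mine, not from the thread:

```python
import sqlite3

# Two same-named tables in different schemas, as in the psql session above.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sch_foo")
conn.execute("CREATE TABLE tbl_bar (a INTEGER)")          # "main" schema
conn.execute("CREATE TABLE sch_foo.tbl_bar (a INTEGER)")
conn.execute("INSERT INTO tbl_bar VALUES (1)")
conn.execute("INSERT INTO sch_foo.tbl_bar VALUES (2)")

# Distinct aliases leave no room for the inner table to capture a
# reference that was meant for the outer one.
rows = conn.execute("""
    SELECT outer_bar.a
    FROM tbl_bar AS outer_bar
    WHERE NOT EXISTS (
        SELECT 1 FROM sch_foo.tbl_bar AS inner_bar
        WHERE outer_bar.a = inner_bar.a
    )
""").fetchall()
print(rows)  # [(1,)]
```

With explicit aliases the query returns the row from public (here: main) as intended, matching the explicitly-qualified variant in the psql session.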
Re: [GENERAL] Query plan choice issue
Hi Martin,

Thank you for your response!

On 13/09/2010, at 10:49 AM, Martin Gainty wrote:

> a cursory look of the plan details a FTS on dbmail_headername invoked by the JOIN clause
> JOIN dbmail_headername n ON v.headername_id=n.id
> you would accelerate the seek appreciably by placing indexes on both participating columns
> v.headername_id
> n.id

Granted, there was no index on v.headername_id, but creating one just slowed the query down, with a different plan:

CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue (headername_id);
EXPLAIN ANALYSE ...

                                 QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=222020.81..222020.81 rows=1 width=8) (actual time=28636.426..28636.426 rows=0 loops=1)
   Sort Key: m.message_idnr
   Sort Method:  quicksort  Memory: 25kB
   ->  Nested Loop  (cost=1409.82..222020.80 rows=1 width=8) (actual time=28636.409..28636.409 rows=0 loops=1)
         ->  Nested Loop  (cost=1409.82..222012.27 rows=1 width=24) (actual time=28636.405..28636.405 rows=0 loops=1)
               ->  Nested Loop  (cost=1409.82..221959.94 rows=6 width=8) (actual time=28543.441..28624.750 rows=1 loops=1)
                     ->  Seq Scan on dbmail_headername n  (cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1 loops=1)
                           Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
                     ->  Bitmap Heap Scan on dbmail_headervalue v  (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
                           Recheck Cond: (v.headername_id = n.id)
                           Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
                           ->  Bitmap Index Scan on dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
                                 Index Cond: (v.headername_id = n.id)
               ->  Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m  (cost=0.00..8.71 rows=1 width=16) (actual time=11.646..11.646 rows=0 loops=1)
                     Index Cond: (m.physmessage_id = v.physmessage_id)
                     Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345))
         ->  Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p  (cost=0.00..8.52 rows=1 width=8) (never executed)
               Index Cond: (p.id = m.physmessage_id)
 Total runtime: 28636.517 ms
(19 rows)

> I also see a FTS on dbmail_headervalue invoked by the JOIN clause
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id
> place indexes on both columns
> v.physmessage_id
> p.id

Both columns are already indexed here:

On public.dbmail_headervalue (alias v):
    "dbmail_headervalue_2" btree (physmessage_id)
On public.dbmail_physmessage (alias p):
    "dbmail_physmessage_pkey" PRIMARY KEY, btree (id)

Perhaps I should provide some data on the table sizes. dbmail_headervalue is the largest table with respect to its record count: 36 million records. dbmail_headername is small: 5640 records. dbmail_physmessage and dbmail_messages are of an average size: ~2 million records each.

Sorry for my cluelessness on this issue. But Postgresql's ability to build a fast query plan for this query type at least occasionally is encouraging. :-)

Yar

> the join clause
> JOIN dbmail_physmessage p ON m.physmessage_id=p.id
> uses indexes for both participants
>
> Martin
>
> From: y...@barnet.com.au
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query plan choice issue
> Date: Mon, 13 Sep 2010 09:36:35 +1000
>
> Hi all,
>
> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can be executed using either of two different query plans, one taking a few milliseconds and the other, tens of seconds. The work_mem setting doesn't seem to affect it -- tried to increase or decrease it by 2 or 4 times, but it didn't seem to favour the fast plan choice. Honestly, I have no idea what affects the plan choice, but I saw Postgresql change it at random. The query in question looks like this -- sorry, it's rather complex:
>
> SELECT message_idnr FROM dbmail_messages m JOIN dbmail_physmessage p ON m.physmessage_id=p.id JOIN dbma
Re: [GENERAL] Schema search path
On 14/09/2010, at 8:56 AM, Tom Lane wrote:

> Bruce Momjian writes:
>> Yaroslav Tykhiy wrote:
>>> SELECT * FROM foo.bar WHERE bar.a=1;
>>>                             ^^^ this means foo.bar
>>> Do you think it's a feature or a bug? :-)
>
> Feature, and SQL-standard behavior. It might be worth pointing out that this has nothing to do with search_path; rather, the key is that the FROM clause establishes a table alias "bar" for the query.

Sure, that makes sense because it just extends the well-known aliasing for unqualified column names, as in "SELECT a FROM foo", to table names as well. But a remark on this feature in the SCHEMA-related documentation pages could be a good idea IMHO.

Thanks!

Yar
Re: [GENERAL] Query plan choice issue
Hi Tom,

On 14/09/2010, at 12:41 AM, Tom Lane wrote:

> Yaroslav Tykhiy writes:
>> ->  Bitmap Heap Scan on dbmail_headervalue v  (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
>>       Recheck Cond: (v.headername_id = n.id)
>>       Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
>>       ->  Bitmap Index Scan on dbmail_headervalue_testing  (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
>>             Index Cond: (v.headername_id = n.id)
>
> I think the major problem you're having is that the planner is completely clueless about the selectivity of the condition
>     "substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'
> If it knew that that would match only one row, instead of several thousand, it would likely pick a different plan. In recent versions of PG you could probably make a noticeable improvement in this if you just dropped the substring() restriction ... do you actually need that?
>
> Alternatively, if you don't want to change the query logic at all, I'd try making an index on substring(v.headervalue, 0, 255). I'm not expecting the query to actually *use* the index, mind you. But its existence will prompt ANALYZE to collect stats on the expression's value, and that will help the planner with estimating the ~~* condition.

Well, that substring() and ILIKE combo looked suspicious to me, too. However, there already was an index on substring(v.headervalue, 0, 255), but the fast query plan didn't seem to use it; it used a different index instead:

mail=# \d dbmail_headervalue
                   Table "public.dbmail_headervalue"
     Column     |  Type  |                     Modifiers
----------------+--------+---------------------------------------------------
 headername_id  | bigint | not null
 physmessage_id | bigint | not null
 id             | bigint | not null default nextval('dbmail_headervalue_idnr_seq'::regclass)
 headervalue    | text   | not null default ''::text
Indexes:
    "dbmail_headervalue_pkey" PRIMARY KEY, btree (id)
    "dbmail_headervalue_1" UNIQUE, btree (physmessage_id, id)
    "dbmail_headervalue_2" btree (physmessage_id)
    "dbmail_headervalue_3" btree ("substring"(headervalue, 0, 255))

...
EXPLAIN ANALYZE ...
   ->  Index Scan using dbmail_headervalue_2 on dbmail_headervalue v  (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358)
         Index Cond: (v.physmessage_id = m.physmessage_id)
         Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
...

Meanwhile, a mate of mine lurking on this list pointed out that reducing random_page_cost might help here, and it did: a random_page_cost of 2 made the fast query favourable. Can it mean that the default planner configuration slightly overfavours seq scans?

Thank you all guys!

Yar
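The effect of random_page_cost can be seen with some toy arithmetic. The numbers below are made up for illustration (they are not taken from the plans in this thread), and PostgreSQL's real cost model in costsize.c also accounts for CPU costs, index correlation and caching; the sketch only shows the basic mechanism by which lowering random_page_cost shrinks the estimated cost of an index scan until it undercuts a sequential scan:

```python
# Toy version of the planner's page-cost arithmetic, illustrative only.

def seq_scan_cost(pages, seq_page_cost=1.0):
    # A sequential scan reads every page at the cheap sequential rate.
    return pages * seq_page_cost

def index_scan_cost(pages_fetched, random_page_cost):
    # An index scan fetches scattered heap pages at the random rate.
    return pages_fetched * random_page_cost

table_pages = 100_000      # hypothetical table size in pages
pages_fetched = 30_000     # hypothetical pages touched via the index

for rpc in (4.0, 2.0, 1.5):   # 4.0 is the stock default
    idx = index_scan_cost(pages_fetched, rpc)
    seq = seq_scan_cost(table_pages)
    winner = "index scan" if idx < seq else "seq scan"
    print(f"random_page_cost={rpc}: index={idx:.0f} vs seq={seq:.0f} -> {winner}")
```

With these made-up figures the default random_page_cost of 4.0 favours the sequential scan, while 2.0 and below flip the choice to the index scan, which matches the behaviour reported above in spirit. On systems where the working set is mostly cached, random reads really are nearly as cheap as sequential ones, which is the usual justification for lowering the setting.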
Re: [GENERAL] Query plan choice issue
On 14/09/2010, at 10:37 AM, Yaroslav Tykhiy wrote:

> On 14/09/2010, at 12:41 AM, Tom Lane wrote:
>> Yaroslav Tykhiy writes:
>> [...]
>> I think the major problem you're having is that the planner is completely clueless about the selectivity of the condition
>>     "substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'
>> If it knew that that would match only one row, instead of several thousand, it would likely pick a different plan. In recent versions of PG you could probably make a noticeable improvement in this if you just dropped the substring() restriction ... do you actually need that?
>>
>> Alternatively, if you don't want to change the query logic at all, I'd try making an index on substring(v.headervalue, 0, 255). I'm not expecting the query to actually *use* the index, mind you. But its existence will prompt ANALYZE to collect stats on the expression's value, and that will help the planner with estimating the ~~* condition.
>
> Well, that substring() and ILIKE combo looked suspicious to me, too. However, there already was an index on substring(v.headervalue, 0, 255), but the fast query plan didn't seem to use it; it used a different index instead:
> [...]
> Meanwhile, a mate of mine lurking on this list pointed out that reducing random_page_cost might help here, and it did: a random_page_cost of 2 made the fast query favourable. Can it mean that the default planner configuration slightly overfavours seq scans?

Funnily, after a few days of running with random_page_cost=2, exactly the same query became slow again, and I had to reduce random_page_cost further, to 1.5, to make it fast. Can it be a sign of a problem in the planner?

Thanks!

Yar
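One plausible mechanism for this flip-flopping (a guess on my part, not something established in the thread): when two rival plans have nearly equal estimated costs, any drift in the statistics, e.g. after a routine autovacuum ANALYZE refreshes a row-count estimate, can change which plan looks cheaper. A toy illustration with entirely made-up cost functions:

```python
# Two rival plans whose estimated costs cross near the current row
# estimate. All figures are invented; the point is only that a small
# drift in the estimate flips the winner.

def plan_a_cost(est_rows):
    # e.g. a nested loop whose cost grows quickly with the row estimate
    return 1_000 + 40.0 * est_rows

def plan_b_cost(est_rows):
    # e.g. a plan with a large fixed scan cost but cheap per-row cost
    return 100_000 + 2.0 * est_rows

for est_rows in (2_000, 2_600, 3_000):
    a, b = plan_a_cost(est_rows), plan_b_cost(est_rows)
    print(f"est_rows={est_rows}: plan A={a:.0f}, plan B={b:.0f} -> "
          f"{'A wins' if a < b else 'B wins'}")
```

Here a drift of the estimate from 2,000 to 3,000 rows is enough to hand the win from plan A to plan B even though nothing about the data materially changed. Lowering random_page_cost effectively moves that crossover point further away, which would explain why each reduction bought only a temporary reprieve.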
Re: [GENERAL] Restore/dump from "/usr/local/pgsql/data" directory
On 25/09/2010, at 1:11 AM, Craig Ringer wrote:

> On 24/09/2010 8:40 PM, Raymond O'Donnell wrote:
>> On 24/09/2010 13:21, kongs...@stud.ntnu.no wrote:
>>>> What version of PG was it?
>>> The "PG_VERSION" file = 8.3
>> OK, well at least it's not an ancient version that's not available any more. :-) As Craig said, the best thing is to get hold of a copy of 8.3 that matches the architecture of the old server machine
>
> Or compile one, if necessary. You should *certainly* compile one in preference to trying to hack outdated packages onto your new system by force, as some people seem to do. *BAD* *IDEA*, do not try it. Just by way of warning.
>
> If you do compile it, specify a non-default --prefix that's a unique new subtree, so you can just "rm -rf" it when you're done with it. Think --prefix=/opt/pgsql8.3 . If you install it directly into /usr/local you'll have a crufty old libpq and headers hanging around later.

I'd like to just add a few words of encouragement here. Postgresql's build system is absolutely marvellous in that it can produce a totally self-contained installation under --prefix that can be copied across compatible systems etc. You can even build several variants and install them at different prefixes if unsure which is going to match the database. So your only worry is to make sure you are playing around with a copy of the database, not with the master copy itself. :-)

A virtual machine can be useful for experiments, as usual, but a Postgresql installation to a custom prefix will not litter the base system, so it's pretty safe to try it out on a live machine of suitable architecture rather than in a "sandbox", which can save some time.

Yar
[GENERAL] Warm standby: 1 to N
Hi All,

Let's consider the following case: WAL segments from a master have been shipped to N warm standby servers, and now the master fails. Using this or that mechanism, one of the warm standbys takes over and becomes the new master. Now the question is what to do with the other N-1 warm standbys.

At the moment of the failure, all N warm standbys were exact copies of the master. So at least in theory, the N-1 remaining warm standbys can be fed with WAL segments from the new master. Do you think it will work in practice? Are there any pitfalls?

Thanks!

Yar
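One pitfall worth checking (my assumption, not something established in this thread): when a standby is promoted, it switches to a new WAL timeline, so the segment files it archives carry a different timeline ID than the ones the surviving standbys are waiting for; they need the timeline history file (and a restore_command that fetches it) to follow the new master. A toy sketch of the segment naming that causes the mismatch:

```python
# Toy model of WAL segment file naming: 8 hex digits of timeline ID,
# then 8 hex digits of log ID, then 8 hex digits of segment number.
# Illustrative only; real recovery also involves timeline history files.

def wal_segment_name(timeline: int, log: int, seg: int) -> str:
    return f"{timeline:08X}{log:08X}{seg:08X}"

# Before failover, every server is on timeline 1:
expected_by_standby = wal_segment_name(1, 0, 5)

# After promotion, the new master switches to timeline 2 and archives:
archived_by_new_master = wal_segment_name(2, 0, 5)

print(expected_by_standby)     # 000000010000000000000005
print(archived_by_new_master)  # 000000020000000000000005

# The names differ, so an old standby keeps waiting for a timeline-1
# segment that the new master will never produce:
assert expected_by_standby != archived_by_new_master
```

So "just keep feeding them" is not quite enough: the surviving standbys have to be told about the timeline switch, or, failing that, re-seeded from a fresh base backup of the new master.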