Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we just dropped new binaries in place) but it wouldn't start up. I found this in the logs:
>>
>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut down at 2015-05-27 13:12:55 PDT
>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL: the database system is starting up
>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL: could not access status of transaction 1
>
> I am debugging a problem today that looks very similar to this. AFAICT the problem is that WAL replay of an online checkpoint in which multixact files are removed fails because replay tries to read a file that has already been removed.

Hmm, so what exactly is the sequence of events here? It's possible that I'm not thinking clearly just now, but it seems to me that if we're replaying the same checkpoint we replayed previously, the offset of the oldest multixact will be the first file that we didn't remove.

However, I can see that there could be a problem if we try to replay an older checkpoint after having already replayed a newer one. For example, if a standby replays checkpoint A, truncating the members space, and performs a restartpoint; then replays checkpoint B, truncating the members space again but without performing a restartpoint; and is then shut down, it will resume replay from checkpoint A, and trouble will ensue. Is that the scenario, or is there something else?

> I think the fix to this is to verify whether the file exists on disk before reading it; if it doesn't, assume the truncation has already happened and that it's not necessary to remove it.

That might be an OK fix, but this implementation doesn't seem very clean. If we're going to remove the invariant that MultiXactState->oldestOffset will always be valid after replaying a checkpoint, then we should be explicit about that and add a flag indicating whether or not it's currently valid. Shoving nextOffset in there and hoping that's good enough seems like a bad idea to me.

I think we should modify the API for find_multixact_start (sketched below). Let's have it return a Boolean and return oldestOffset via an out parameter. If !InRecovery, it will always return true and set the out parameter; but if in recovery, it is allowed to return false without setting the out parameter. Both values can get stored in MultiXactState, and we can adjust the logic elsewhere to disregard oldestOffset when the accompanying flag is false.

This still leaves open an ugly possibility: can we reach normal running without a valid oldestOffset? If so, until the next checkpoint happens, autovacuum has no clue whether it needs to worry. There's got to be a fix for that, but it escapes me at the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
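A minimal sketch of the find_multixact_start() API change proposed above. This is illustrative only, not the committed code; multixact_offsets_page_exists() and read_multixact_offset() are hypothetical helpers standing in for the parts not spelled out in the message:

    /*
     * Return true and set *result if the offsets data for 'multi' can be
     * consulted; return false (only possible in recovery) if the file
     * backing it is not on disk.
     */
    static bool
    find_multixact_start(MultiXactId multi, MultiXactOffset *result)
    {
        if (InRecovery &&
            !multixact_offsets_page_exists(MultiXactIdToOffsetPage(multi)))
            return false;       /* caller must disregard oldestOffset */

        /* existing lookup: read the offsets SLRU page for 'multi' */
        *result = read_multixact_offset(multi);
        return true;
    }

Callers would then store both the offset and a validity flag in MultiXactState, consulting the offset only when the flag is set.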
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Wed, May 27, 2015 at 10:14 PM, Alvaro Herrera wrote:
> Well, I'm not very clear on what the problematic case is. The scenario where I actually saw this first reported was a pg_basebackup taken on a very large database, so the master could have truncated multixact and the standby receives a truncated directory but actually tries to apply a checkpoint that is much older than what the master currently has transmitted as pg_multixact contents.

OK, that makes sense.

>> That might be an OK fix, but this implementation doesn't seem very clean. If we're going to remove the invariant that MultiXactState->oldestOffset will always be valid after replaying a checkpoint, then we should be explicit about that and add a flag indicating whether or not it's currently valid. Shoving nextOffset in there and hoping that's good enough seems like a bad idea to me.
>>
>> I think we should modify the API for find_multixact_start. Let's have it return a Boolean and return oldestOffset via an out parameter. If !InRecovery, it will always return true and set the out parameter; but if in recovery, it is allowed to return false without setting the out parameter. Both values can get stored in MultiXactState, and we can adjust the logic elsewhere to disregard oldestOffset when the accompanying flag is false.
>
> Sounds good. I think I prefer that multixact creation is rejected altogether if the new flag is false. Is that what you mean when you say "adjust the logic"?

No. I'm not sure quite what you mean here. We can't reject multixact creation during normal running, and during recovery, we won't create any really new multixacts, but we must replay the creation of multixacts. What I meant was stuff like this:

    if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
        (MultiXactState->nextOffset - MultiXactState->oldestOffset
         > MULTIXACT_MEMBER_SAFE_THRESHOLD))

I meant that we'd change the second prong of the test to check MultiXactState->oldestOffsetValid && MultiXactState->nextOffset - MultiXactState->oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD (sketched below). And likewise change anything else that relies on oldestOffset. Or else we guarantee that we can't reach those points until oldestOffset is valid, and then check that it is with an Assert() or elog().

>> This still leaves open an ugly possibility: can we reach normal running without a valid oldestOffset? If so, until the next checkpoint happens, autovacuum has no clue whether it needs to worry. There's got to be a fix for that, but it escapes me at the moment.
>
> I think the fix to that issue is to set the oldest offset on TrimMultiXact. That way, once WAL replay has finished, we're certain that we have a valid oldest offset to create new multixacts with.
>
> I'm also wondering whether the call to DetermineSafeOldestOffset on StartupMultiXact is good. At that point, we haven't replayed any WAL yet, so the oldest multi might be pointing at a file that has already been removed -- again considering the pg_basebackup scenario where the multixact files are copied much later than pg_control, so the checkpoint to replay is old but the pg_multixact contents have already been truncated in the master and are copied truncated.

Moving the call from StartupMultiXact() to TrimMultiXact() seems like a good idea. I'm not sure why we didn't do that before.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
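To make the proposed adjustment concrete, the second prong of the test would look something like this; oldestOffsetValid is a placeholder name for whatever the validity flag ends up being called:

    /*
     * Sketch of the adjusted wraparound test: only consult oldestOffset
     * when the accompanying validity flag is set.
     */
    if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
        (MultiXactState->oldestOffsetValid &&
         MultiXactState->nextOffset - MultiXactState->oldestOffset
         > MULTIXACT_MEMBER_SAFE_THRESHOLD))
    {
        /* trigger the existing anti-wraparound measures here */
    }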
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we just dropped new binaries in place) but it wouldn't start up. I found this in the logs:
>>
>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut down at 2015-05-27 13:12:55 PDT
>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL: the database system is starting up
>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL: could not access status of transaction 1
>
> I am debugging a problem today that looks very similar to this. AFAICT the problem is that WAL replay of an online checkpoint in which multixact files are removed fails because replay tries to read a file that has already been removed.

Steve: Can you tell us more about how you shut down the old cluster? Did you by any chance perform an immediate shutdown? Do you have the actual log messages that were written when the system was shut down for the upgrade?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we just dropped new binaries in place) but it wouldn't start up. I found this in the logs:
>>
>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut down at 2015-05-27 13:12:55 PDT
>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL: the database system is starting up
>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL: could not access status of transaction 1
>
> I am debugging a problem today that looks very similar to this. AFAICT the problem is that WAL replay of an online checkpoint in which multixact files are removed fails because replay tries to read a file that has already been removed.

Wait a minute, wait a minute. There's a serious problem with this theory, at least in Steve's scenario. This message:

    2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut down at 2015-05-27

That message implies a *clean shutdown*. If he had performed an immediate shutdown or just pulled the plug, it would have said "database system was interrupted" or some such.

There may be bugs in redo, also, but they don't explain what happened to Steve.

Steve, is there any chance we can get your pg_controldata output and a list of all the files in pg_clog?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 8:01 AM, Robert Haas wrote:
> On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera wrote:
>> Steve Kehlet wrote:
>>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we just dropped new binaries in place) but it wouldn't start up. I found this in the logs:
>>>
>>> waiting for server to start2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut down at 2015-05-27 13:12:55 PDT
>>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL: the database system is starting up
>>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL: could not access status of transaction 1
>>
>> I am debugging a problem today that looks very similar to this. AFAICT the problem is that WAL replay of an online checkpoint in which multixact files are removed fails because replay tries to read a file that has already been removed.
>
> Wait a minute, wait a minute. There's a serious problem with this theory, at least in Steve's scenario. This message:
>
>     2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut down at 2015-05-27
>
> That message implies a *clean shutdown*. If he had performed an immediate shutdown or just pulled the plug, it would have said "database system was interrupted" or some such.
>
> There may be bugs in redo, also, but they don't explain what happened to Steve.
>
> Steve, is there any chance we can get your pg_controldata output and a list of all the files in pg_clog?

Err, make that pg_multixact/members, which I assume is at issue here. You didn't show us the DETAIL line from this message, which would presumably clarify:

    FATAL: could not access status of transaction 1

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 8:03 AM, Robert Haas wrote:
>> Steve, is there any chance we can get your pg_controldata output and a list of all the files in pg_clog?
>
> Err, make that pg_multixact/members, which I assume is at issue here. You didn't show us the DETAIL line from this message, which would presumably clarify:
>
>     FATAL: could not access status of transaction 1

And I'm still wrong, probably. The new code in 9.4.2 cares about being able to look at an *offsets* file to find the corresponding member offset. So most likely it is an offsets file that is missing here.

The question is, how are we ending up with an offsets file that is referenced by the control file but not actually present on disk? It seems like it would be good to compare the pg_controldata output to what is actually present in pg_multixact/offsets (hopefully that's the right directory, now that I'm on my third try) and try to understand what is going on here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 8:51 AM, Robert Haas wrote:
> [ speculation ]

OK, I finally managed to reproduce this, after some off-list help from Steve Kehlet (the reporter), Alvaro, and Thomas Munro. Here's how to do it:

1. Install any pre-9.3 version of the server and generate enough multixacts to create at least TWO new segments. When you shut down the server, all segments except for the most current one will be removed. At this point, the only thing in $PGDATA/pg_multixact/offsets should be a single file, and the name of that file should not be 0000 or 0001.

2. Use pg_upgrade to upgrade to 9.3.4. It is possible that versions < 9.3.4 will also work here, but you must not use 9.3.5 or higher, because 9.3.5 includes Bruce's commit 3d2e18510, which arranged to preserve relminmxid and datminmxid values. At this point, pg_controldata on the new cluster should show an oldestMultiXid value greater than 1 (copied from the old cluster), but all the datminmxid values are 1. Also, initdb will have left behind a bogus file in pg_multixact/offsets.

3. Move to 9.3.5 (or 9.3.6), not via pg_upgrade, but just by dropping in the new binaries. Follow the instructions in the 9.3.5 release notes; since you created at least TWO new segments in step one, there will be no 0001 file, and the query there will say that you should remove the bogus file. So do that, leaving just the good file in pg_multixact/offsets.

At this point, pg_multixact/offsets is OK, and pg_controldata still says that oldestMultiXid > 1, so that is also OK. The only problem is that we've got some bogus datminmxid values floating around. Our next step will be to convince vacuum to propagate the bogus datminmxid values back into pg_controldata.

4. Consume at least one transaction ID (e.g. SELECT txid_current()) and then do this:

    postgres=# set vacuum_freeze_min_age = 0;
    SET
    postgres=# set vacuum_freeze_table_age = 0;
    SET
    postgres=# vacuum;
    VACUUM

Setting the GUCs forces full table scans, so that we advance relfrozenxid. But notice that we were careful not to just run VACUUM FREEZE, which would have also advanced relminmxid, which, for purposes of reproducing this bug, is not what we want to happen. So relminmxid is still (incorrectly) set to 1 for every database. However, since the vacuum did advance relfrozenxid, it will call vac_truncate_clog, which will call SetMultiXactIdLimit, which will propagate the bogus datminmxid = 1 setting into shared memory.

(In my testing, this step doesn't work if performed on 9.3.4; you have to do it on 9.3.5. I think that's because of Tom's commit 78db307bb, but I believe in a more complex test scenario you might be able to get this to happen on 9.3.4 also.) I believe it's the case that an autovacuum of even a single table can substitute for this step if it happens to advance relfrozenxid but not relminmxid.

5. The next checkpoint, or the shutdown checkpoint in any event, will propagate the bogus value of 1 from shared memory back into the control file.

6. Now try to start 9.3.7. It will see the bogus oldestMultiXid = 1 value in the control file, attempt to read the corresponding offsets file, and die.

In the process of investigating this, we found a few other things that seem like they may also be bugs:

- As noted upthread, replaying an older checkpoint after a newer checkpoint has already happened may lead to similar problems. This may be possible when starting from an online base backup, or when restarting a standby that did not perform a restartpoint when replaying the last checkpoint before the shutdown.
- pg_upgrade sets datminmxid = old_cluster.controldata.chkpnt_nxtmulti, which is correct only if there are ZERO multixacts in use at the time of the upgrade. It would be best, I think, to set this to the same value it had in the old cluster, but if we're going to use a blanket value, I think it needs to be chkpnt_oldstMulti.

- There's a third possible problem related to boundary cases in SlruScanDirCbRemoveMembers, but I don't understand that one well enough to explain it. Maybe Thomas can jump in here and explain the concern.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, May 28, 2015 at 4:06 PM, Joshua D. Drake wrote:
> FTR: Robert, you have been a Samurai on this issue. Our many thanks.

Thanks! I really appreciate the kind words.

So, in thinking through this situation further, it seems to me that the situation is pretty dire:

1. If you pg_upgrade to 9.3 before 9.3.5, then you may have relminmxid or datminmxid values which are 1 instead of the correct value. Setting the value to 1 was too far in the past if your MXID counter is < 2B, and too far in the future if your MXID counter is > 2B.

2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid values which are equal to the next-mxid counter instead of the correct value; in other words, they are too new.

3. If you pg_upgrade to 9.3.5, 9.3.6, 9.4.0, or 9.4.1, then you will have the first problem for tables in template databases, and the second one for the rest. (See 866f3017a.)

4. Wrong relminmxid or datminmxid values can eventually propagate into the control file, as demonstrated in my previous post. Therefore, we can't count on relminmxid to be correct, we can't count on datminmxid to be correct, and we can't count on the control file to be correct. That's a sack of sad.

5. If the values are too far in the past, then nothing really terrible will happen unless you upgrade to 9.3.7 or 9.4.2, at which point the system will refuse to start. Forcing a VACUUM FREEZE on every database, including the unconnectable ones, should fix this and allow you to upgrade safely - which you want to do, because 9.3.7 and 9.4.2 fix a different set of multixact data loss bugs.

6. If the values are too far in the future, the system may fail to prevent wraparound, leading to data loss. I am not totally clear on whether a VACUUM FREEZE will fix this problem. It seems like the chances are better if you are running at least 9.3.5+ or 9.4.X, because of 78db307bb. But I'm not sure how complete a fix that is.

So what do we do about this? I have a few ideas:

A. Most obviously, we should fix pg_upgrade so that it installs chkpnt_oldstMulti instead of chkpnt_nxtmulti into datminmxid, so that we stop creating new instances of this problem. That won't get us out of the hole we've dug for ourselves, but we can at least try to stop digging. (This is assuming I'm right that chkpnt_nxtmulti is the wrong thing - anyone want to double-check me on that one?)

B. We need to change find_multixact_start() to fail softly. This is important because it's legitimate for it to fail in recovery, as discussed upthread, and also because we probably want to eliminate the fail-to-start hazard introduced in 9.4.2 and 9.3.7. find_multixact_start() is used in three places, and they each require separate handling:

- In SetMultiXactIdLimit, find_multixact_start() is used to set MultiXactState->oldestOffset, which is used to determine how aggressively to vacuum. If find_multixact_start() fails, we don't know how aggressively we need to vacuum to prevent members wraparound; it's probably best to decide to vacuum as aggressively as possible. Of course, if we're in recovery, we won't vacuum either way; the fact that it fails softly is good enough.

- In DetermineSafeOldestOffset, find_multixact_start() is used to set MultiXactState->offsetStopLimit. If it fails here, we don't know when to refuse multixact creation to prevent wraparound. Again, in recovery, that's fine. If it happens in normal running, it's not clear what to do. Refusing multixact creation is an awfully blunt instrument.
Maybe we can scan pg_multixact/members to determine a workable stop limit: the first file greater than the current file that exists, minus two segments, is a good stop point (a sketch follows this message). Perhaps we ought to use this mechanism here categorically, not just when find_multixact_start() fails. It might be more robust than what we have now.

- In TruncateMultiXact, find_multixact_start() is used to set the truncation point for the members SLRU. If it fails here, I'm guessing the right solution is not to truncate anything - instead, rely on intense vacuuming to eventually advance oldestMXact to a value whose member data still exists; truncate then.

C. I think we should also change TruncateMultiXact() to truncate offsets first, and then members. As things stand, if we truncate members first, we increase the risk of seeing an offset that will fail when passed to find_multixact_start(), because TruncateMultiXact() might get interrupted before it finishes. That seems like an unnecessary risk.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
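A sketch of the members-directory scan idea from point B, built on the existing SLRU directory-scan machinery (SlruScanDirectory and its callback convention, both mentioned later in this thread). The callback name and struct are invented for illustration, and real code would need wraparound-aware segment comparisons, which are elided here:

    typedef struct
    {
        int     currentPage;        /* page the next member would occupy */
        int     earliestLivePage;   /* smallest existing page beyond it */
        bool    found;
    } MembersGapScan;

    static bool
    SlruScanDirCbFindGap(SlruCtl ctl, char *filename, int segpage, void *data)
    {
        MembersGapScan *scan = (MembersGapScan *) data;

        /* Track the earliest on-disk segment past the current one. */
        if (segpage > scan->currentPage &&
            (!scan->found || segpage < scan->earliestLivePage))
        {
            scan->earliestLivePage = segpage;
            scan->found = true;
        }
        return false;           /* keep scanning the whole directory */
    }

The stop limit would then be earliestLivePage minus two segments' worth of members, guaranteeing a real gap on disk rather than a presumed one.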
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
[...] what we're doing right now. The current logic purports to leave a one-file gap in the member space, but there's no guarantee that the gap really exists on disk the way we think it does. With this approach, we can be certain that there is a gap. And that is a darned good thing to be certain about.

>> C. I think we should also change TruncateMultiXact() to truncate offsets first, and then members. As things stand, if we truncate members first, we increase the risk of seeing an offset that will fail when passed to find_multixact_start(), because TruncateMultiXact() might get interrupted before it finishes. That seems like an unnecessary risk.
>
> Not sure about this point. We did it the way you propose previously, and found it to be a problem because sometimes we tried to read an offset file that was no longer there. Do we really read member files anywhere? I thought we only tried to read offset files. If we remove member files, what is it that we try to read and find not to be present?

Do you have a link to the previous discussion? I mean, the problem we're having right now is that sometimes we have an offset, but the corresponding member isn't there. So clearly offsets reference members. Do members also reference offsets? I didn't think so, but life is full of surprises.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, May 29, 2015 at 10:17 AM, Tom Lane wrote:
> Thomas Munro writes:
>> On Fri, May 29, 2015 at 11:24 AM, Robert Haas wrote:
>>> B. We need to change find_multixact_start() to fail softly.
>>
>> Here is an experimental WIP patch that changes StartupMultiXact and SetMultiXactIdLimit to find the oldest multixact that exists on disk (by scanning the directory), and uses that if it is more recent than the oldestMultiXactId from shmem,
>
> Not sure about the details of this patch, but I was planning to propose what I think is the same thing: the way to make find_multixact_start() fail softly is to have it find the oldest actually existing file if the one that should be there isn't.

Working on that now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, May 29, 2015 at 12:43 PM, Robert Haas wrote:
> Working on that now.

OK, here's a patch. Actually two patches, differing only in whitespace, for 9.3 and for master (ha!). I now think that the root of the problem here is that DetermineSafeOldestOffset() and SetMultiXactIdLimit() were largely ignorant of the possibility that they might be called at points in time when the cluster was inconsistent. SetMultiXactIdLimit() bracketed certain parts of its logic with if (!InRecovery), but those guards were ineffective because it gets called before InRecovery is set in the first place.

It seems pretty clear that we can't effectively determine anything about member wraparound until the cluster is consistent. Before then, there might be files missing from the offsets or members SLRUs which get put back during replay. There could even be gaps in the sequence of files, with some things having made it to disk before the crash (or having made it into the backup) and others not. So all the work of determining what the safe stop points and vacuum thresholds for members are needs to be postponed until TrimMultiXact() time. And that's fine, because we don't need this information in recovery anyway - it only affects behavior in normal running.

So this patch does the following:

1. Moves the call to DetermineSafeOldestOffset() that appears in StartupMultiXact() to TrimMultiXact(), so that we don't try to do this until we're consistent. Also, instead of passing MultiXactState->oldestMultiXactId, pass the newer of that value and the earliest offset that exists on disk. That way, it won't try to read data that's not there. Note that the second call to DetermineSafeOldestOffset() in TruncateMultiXact() doesn't need a similar guard, because we already bail out of that function early if the multixacts we're going to truncate away don't exist.

2. Adds a new flag MultiXactState->didTrimMultiXact to indicate whether we've finished TrimMultiXact(), and arranges for SetMultiXactIdLimit() to use that rather than InRecovery to test whether it's safe to do complicated things that might require that the cluster is consistent. This is a slight behavior change, since formerly we would have tried to do that stuff very early in the startup process, and now it won't happen until somebody completes a vacuum operation. If that's a problem, we could consider doing it in TrimMultiXact(), but I don't think it's safe the way it was. The new flag also prevents oldestOffset from being set while in recovery; I think it would be safe to do that in recovery once we've reached consistency, but I don't believe it's necessary.

3. Arranges for TrimMultiXact() to set oldestOffset. This is necessary because, previously, we relied on SetMultiXactIdLimit doing that during early startup or during recovery, and that's no longer true. Here too we set oldestOffset keeping in mind that our notion of the oldest multixact may point to something that doesn't exist; if so, we use the oldest MXID that does.

4. Modifies TruncateMultiXact() so that it doesn't re-scan the SLRU directory on every call to find the oldest file that exists. Instead, it arranges to remember the value from the first scan and then updates it thereafter to reflect its own truncation activity (a sketch of this caching appears after this message). This isn't absolutely necessary, but because this oldest-file logic is used in multiple places (TrimMultiXact, SetMultiXactIdLimit, and TruncateMultiXact all need it directly or indirectly) caching the value seems like a better idea than recomputing it frequently.
I have tested that this patch fixes Steve Kehlet's problem, or at least what I believe to be Steve Kehlet's problem based on the reproduction scenario I described upthread. I believe it will also fix the problems with starting up from a base backup which Alvaro mentioned upthread. It won't fix the fact that pg_upgrade is putting a wrong value into everybody's datminmxid field, which should really be addressed too, but I've been working on this for about three days virtually non-stop and I don't have the energy to tackle it right now. If anyone feels the urge to step into that breach, I think what it needs to do is: when upgrading from a 9.3-or-later instance, copy over each database's datminmxid into the corresponding database in the new cluster.

Aside from that, it's very possible that despite my best efforts this has serious bugs. Review and testing would be very much appreciated.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 699497c..8d28a5c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -197,8 +197,9 @@ typedef struct MultiXactStateData
 	Mu
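A sketch of the caching described in item 4, using the shared-memory field names from the attached patch; the directory-scan helper is hypothetical and the body is abbreviated:

    static MultiXactId
    GetOldestMultiXactOnDisk(void)
    {
        MultiXactId result;

        LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
        if (!MultiXactState->oldestMultiXactOnDiskValid)
        {
            /*
             * First call: pay for one scan of pg_multixact/offsets, then
             * remember the answer.  Later truncations update the cached
             * value directly instead of rescanning.
             */
            MultiXactState->oldestMultiXactOnDisk =
                ScanOffsetsDirForEarliest();    /* hypothetical helper */
            MultiXactState->oldestMultiXactOnDiskValid = true;
        }
        result = MultiXactState->oldestMultiXactOnDisk;
        LWLockRelease(MultiXactGenLock);

        return result;
    }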
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, May 29, 2015 at 3:08 PM, Robert Haas wrote:
> It won't fix the fact that pg_upgrade is putting a wrong value into everybody's datminmxid field, which should really be addressed too, but I've been working on this for about three days virtually non-stop and I don't have the energy to tackle it right now. If anyone feels the urge to step into that breach, I think what it needs to do is: when upgrading from a 9.3-or-later instance, copy over each database's datminmxid into the corresponding database in the new cluster.

Bruce was kind enough to spend some time on IM with me this afternoon, and I think this may actually be OK. What pg_upgrade does is:

1. First, put next-mxid into the relminmxid for all tables, including catalog tables. This is the correct behavior for upgrades from a pre-9.3 release, and is correct for catalog tables in general.

2. Next, restoring the schema dump will set the relminmxid values for all non-catalog tables to the value dumped from the old cluster. At this point, everything is fine provided that we are coming from a release 9.3 or newer. But if the old cluster is pre-9.3, it will have dumped *zero* values for all of its relminmxid values; so all of the user tables go from the correct value they had after step 1 to an incorrect value.

3. Finally, if the old cluster is pre-9.3, repeat step 1, undoing the damage done in step 2.

This is a bit convoluted, but I don't know of a reason why it shouldn't work. Sorry for the false alarm.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, May 29, 2015 at 9:46 PM, Andres Freund wrote:
> On 2015-05-29 15:08:11 -0400, Robert Haas wrote:
>> It seems pretty clear that we can't effectively determine anything about member wraparound until the cluster is consistent.
>
> I wonder if this doesn't actually hint at a bigger problem. Currently, to determine where we need to truncate, SlruScanDirectory() is used. That, afaics, could actually be a problem during recovery when we're not consistent.

I agree. I actually meant to mention this in my previous email, but, owing to exhaustion and burnout, didn't.

> I think at least for 9.5+ we should a) invent proper truncation records for pg_multixact b) start storing oldestValidMultiOffset in pg_control. The current hack of scanning the directories to get knowledge we should have is a pretty bad hack, and we should not continue using it forever. I think we might end up needing to do a) even in the backbranches.

That may be the right thing to do. I'm concerned that changing the behavior of master too much will make every subsequent fix twice as hard, because we'll have to do one fix in master and another fix in the back-branches. I'm also concerned that it will create even more convoluted failure scenarios. The failure-to-start problem discussed on this thread requires a chain of four (maybe three) different PostgreSQL versions in order to create it, and the more things we go change, the harder it's going to be to reason about this stuff.

The diseased and rotting elephant in the room here is that clusters with bogus relminmxid, datminmxid, and/or oldestMultiXid values may exist in the wild and we really have no plan to get rid of them. 78db307bb may have helped somewhat - although I haven't grokked what it's about well enough to be sure - but it's certainly not a complete solution, as this bug report itself illustrates rather well. Unless we figure out some clever solution that is not now apparent to me, or impose a hard pg_upgrade compatibility break at some point, we basically can't count on pg_control's "oldest multixact" information to be correct ever again. We may be running into clusters 15 years from now that have problems that are just holdovers from what was fixed in 9.3.5.

One thing I think we should definitely do is add one or two additional fields to pg_controldata that get filled in by pg_upgrade (sketched after this message). One of them should be "the oldest known catversion in the lineage of this cluster" and the other should be "the most recent catversion in the lineage of this cluster before this one". Or maybe we should store PG_VERSION_NUM values. Or store both things. I think that would make troubleshooting this kind of problem a lot easier - just from the pg_controldata output, you'd be able to tell whether the cluster had been pg_upgraded, whether it had been pg_upgraded once or multiple times, and at least some of the versions involved, without relying on the user's memory of what they did and when. Fortunately, Steve Kehlet had a pretty clear idea of what his history was, but not all users know that kind of thing, and I've wanted it more than once while troubleshooting.

Another thing I think we should do is add a field to pg_class that is propagated by pg_upgrade and stores the most recent PG_VERSION_NUM that is known to have performed a scan_all vacuum of the table.
This would allow us to do things in the future like (a) force a full-table vacuum of any table that hasn't been vacuumed since $BUGGYRELEASE, or (b) advise users to manually inspect the values and manually perform said vacuum, or (c) only believe that certain information about a table is accurate if it's been full-scanned by a vacuum newer than $BUGGYRELEASE. It could also be used as part of a strategy for reclaiming HEAP_MOVED_IN/HEAP_MOVED_OFF; e.g. you can't upgrade to 10.5, which repurposes those bits, unless you've done a scan_all vacuum on every table with a release new enough to guarantee that they're not used for their historical purpose.

> This problem isn't conflicting with most of the fixes you describe, so I'll continue with reviewing those.

Thank you.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
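To illustrate the pg_control idea, the additions might look something like this; the field names are invented here, and nothing of the sort exists in ControlFileData today:

    typedef struct ControlFileData
    {
        /* ... all existing fields unchanged ... */

        /* oldest catversion this cluster has ever had, per pg_upgrade */
        uint32      oldestLineageCatversion;

        /* catversion immediately preceding the current one, per pg_upgrade */
        uint32      priorLineageCatversion;
    } ControlFileData;

pg_upgrade would carry these forward on each upgrade, so pg_controldata output alone would reveal whether, and from what, a cluster had been upgraded.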
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Sat, May 30, 2015 at 8:55 PM, Andres Freund wrote:
> Is oldestMulti, nextMulti - 1 really suitable for this? Are both actually guaranteed to exist in the offsets slru and be valid? Hm. I guess you intend to simply truncate everything else, but just in offsets?

oldestMulti in theory is the right thing, I think, but in actuality we know that some people have 1 here instead of the correct value.

>> One argument against this idea is that we may not want to keep a full set of member files on standbys (due to disk space usage), but that's what will happen unless we truncate during replay.
>
> I think that argument is pretty much the death-knell.

Yes. Truncating on the standby is really not optional.

>>> I think at least for 9.5+ we should a) invent proper truncation records for pg_multixact b) start storing oldestValidMultiOffset in pg_control. The current hack of scanning the directories to get knowledge we should have is a pretty bad hack, and we should not continue using it forever. I think we might end up needing to do a) even in the backbranches.
>>
>> Definitely agree with WAL-logging truncations; also +1 on backpatching that to 9.3. We already have experience with adding extra WAL records on minor releases, and it didn't seem to have bitten too hard.
>
> I'm inclined to agree. My only problem is that I'm not sure whether we can find a way of doing all this without adding a pg_control field. Let me try to sketch this out:
>
> 1) We continue determining the oldest segment via SlruScanDirectory(SlruScanDirCbFindEarliest) on the master to find the oldest offsets segment to truncate. Alternatively, if we determine it to be safe, we could use oldestMulti to find that.
> 2) SlruScanDirCbRemoveMembers is changed to return the range of members to remove, instead of doing it itself.
> 3) We WAL log [oldest offset segment guaranteed to not be alive, nextMulti) for offsets, and [oldest members segment guaranteed to not be alive, nextMultiOff) for members, and redo truncations for the entire range during recovery.
>
> I'm pretty tired right now, but this sounds doable.

I'm probably biased here, but I think we should finish reviewing, testing, and committing my patch before we embark on designing this. So far we have no reports of trouble attributable to the lack of the WAL-logging support discussed here, as opposed to several reports of trouble from the status quo within days of release.

I'm having trouble reconstructing the series of events where what you're worried about here really becomes a problem, and I think we ought to have a very clear idea about that before back-patching changes of this type. It's true that if the state of the SLRU directory is in the future, because recovery was restarted from an earlier checkpoint, we might replay a checkpoint and remove some of those files from the future. But so what? By the time we've reached the minimum recovery point, they will have been recreated by the same WAL records that created them in the first place.

If, in the previous replay, we had wrapped all the way around, some of the stuff we keep may actually already have been overwritten by future WAL records, but they'll be overwritten again. Now, that could mess up our determination of which members to remove, I guess, but I'm not clear that actually matters either: if the offsets space has wrapped around, the members space will certainly have wrapped around as well, so we can remove anything we like at this stage and we're still OK.
I agree this is ugly the way it is, but where is the actual bug?

As far as your actual outline goes, I think if we do this, we need to be very careful about step #2. Right now, we decide what we need to keep and then remove everything else, but that's kind of wonky because new stuff may be getting created at the same time, so we keep adjusting our idea of exactly what needs to be removed. It would be far better to invert that logic: decide what needs to be removed - presumably, everything from the oldest member that now exists up until some later point - and then remove precisely that stuff and nothing else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
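For concreteness, a hypothetical shape for the truncation WAL record sketched in Andres's outline above; every name here is invented for illustration, as nothing of the kind existed at the time of this exchange:

    /*
     * Hypothetical WAL record for multixact truncation: one half-open
     * range for the offsets SLRU, one for the members SLRU.  Redo would
     * remove exactly these ranges, matching the "decide what needs to
     * be removed" inversion suggested above.
     */
    typedef struct xl_multixact_truncate
    {
        /* remove offsets segments covering [startTruncOff, endTruncOff) */
        MultiXactId     startTruncOff;
        MultiXactId     endTruncOff;        /* = nextMulti at truncation */

        /* remove members segments covering [startTruncMemb, endTruncMemb) */
        MultiXactOffset startTruncMemb;
        MultiXactOffset endTruncMemb;       /* = nextOffset at truncation */
    } xl_multixact_truncate;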
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Mon, Jun 1, 2015 at 12:46 AM, Noah Misch wrote:
> Incomplete review, done in a relative rush:

Thanks.

> On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
>> OK, here's a patch. Actually two patches, differing only in whitespace, for 9.3 and for master (ha!). I now think that the root of the problem here is that DetermineSafeOldestOffset() and SetMultiXactIdLimit() were largely ignorant of the possibility that they might be called at points in time when the cluster was inconsistent.
>
> A cause perhaps closer to the root is commit f741300 moving truncation from VACUUM to checkpoints. CLOG has given us deep experience with VACUUM-time truncation. Commit f6a6c46d and this patch are about bringing CHECKPOINT-time truncation up to the same level.
>
> Speaking of commit f6a6c46d, it seems logical that updating allocation stop limits should happen proximate to truncation. That's currently the case for CLOG (vac_truncate_clog() does both) and pg_multixact/members (checkpoint's TruncateMultiXact() call does both). However, pg_multixact/offsets is truncated from TruncateMultiXact(), but vac_truncate_clog() updates its limit. I did not distill an errant test case, but this is fishy.

Good point. Because we confine ourselves to using half the offset space, it seems much harder for us to get into trouble here than it is with members. The first scenario that occurred to me is that the SLRU might actually wrap. That seems tough, though: between one checkpoint and the next, vacuum would need to advance oldest_datminmxid by 2^31 MXIDs while generating 2^31 new ones, or something like that. That doesn't seem real plausible.

But then it occurred to me that it's probably sufficient to advance the head of the SLRU far enough that TruncateMultiXact thinks that the tail is in the future instead of in the past. I see no reason why that couldn't happen. Then we'd end up leaving some files behind that we should have removed. I'm not sure exactly what problem that would cause; would they just get overwritten on the next pass through the space, or would they cause errors? I have not had time to check.

>> SetMultiXactIdLimit() bracketed certain parts of its logic with if (!InRecovery), but those guards were ineffective because it gets called before InRecovery is set in the first place.
>
> SetTransactionIdLimit() checks InRecovery for the same things, and it is called at nearly the same moments as SetMultiXactIdLimit(). Do you have a sense of whether it is subject to similar problems as a result?

Well, I think it's pretty weird that those things will get done before beginning recovery, even on an inconsistent cluster, but not during recovery. That is pretty strange. I don't see that it can actually do any worse than emit a few log messages at the start of recovery that won't show up again until the end of recovery, though.

>> 1. Moves the call to DetermineSafeOldestOffset() that appears in StartupMultiXact() to TrimMultiXact(), so that we don't try to do this until we're consistent. Also, instead of passing MultiXactState->oldestMultiXactId, pass the newer of that value and the earliest offset that exists on disk. That way, it won't try to read data that's not there.
>
> Perhaps emit a LOG message when we do that, since it's our last opportunity to point to the potential data loss?
If the problem is just that somebody's minmxid got set to 1 instead of the real value, I think that there is no data loss, because none of those older values are actually present there. But we could add a LOG message anyway. How do you suggest that we phrase that?

>> + * PostgreSQL 9.3.0 through 9.3.6 and PostgreSQL 9.4.0 through 9.4.1
>> + * had bugs that could allow users who reached those release through
>
> s/release/releases/

Fixed.

>> @@ -2859,6 +2947,14 @@ TruncateMultiXact(void)
>>  	SimpleLruTruncate(MultiXactOffsetCtl,
>>  					  MultiXactIdToOffsetPage(oldestMXact));
>>
>> +	/* Update oldest-on-disk value in shared memory. */
>> +	earliest = range.rangeStart * MULTIXACT_OFFSETS_PER_PAGE;
>> +	if (earliest < FirstMultiXactId)
>> +		earliest = FirstMultiXactId;
>> +	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>> +	Assert(MultiXactState->oldestMultiXactOnDiskValid);
>> +	MultiXactState->oldestMultiXactOnDiskValid = earliest;
>
> That last line needs s/Valid//, I presume. Is it okay that oldestMultiXactOnDisk becomes too-old during TruncateMultiXact(), despite its Valid indicator remaining true?

Ay yai
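For clarity, the assignment with Noah's s/Valid// applied would read as follows; the trailing LWLockRelease is assumed from surrounding context, which the quoted hunk cuts off:

    /* Update oldest-on-disk value in shared memory. */
    earliest = range.rangeStart * MULTIXACT_OFFSETS_PER_PAGE;
    if (earliest < FirstMultiXactId)
        earliest = FirstMultiXactId;
    LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
    Assert(MultiXactState->oldestMultiXactOnDiskValid);
    MultiXactState->oldestMultiXactOnDisk = earliest;   /* not ...Valid */
    LWLockRelease(MultiXactGenLock);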
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Mon, Jun 1, 2015 at 4:58 AM, Andres Freund wrote:
>> I'm probably biased here, but I think we should finish reviewing, testing, and committing my patch before we embark on designing this.
>
> Probably, yes. I am wondering whether doing this immediately won't end up making some things simpler and more robust though.

I'm open to being convinced of that, but as of this moment I'm not seeing any clear-cut evidence that we need to go so far.

>> So far we have no reports of trouble attributable to the lack of the WAL-logging support discussed here, as opposed to several reports of trouble from the status quo within days of release.
>
> The lack of WAL logging actually has caused problems in the 9.3.3 (?) era, where we didn't do any truncation during recovery...

Right, but now we're piggybacking on the checkpoint records, and I don't have any evidence that this approach can't be made robust. It's possible that it can't be made robust, but that's not currently clear.

>> By the time we've reached the minimum recovery point, they will have been recreated by the same WAL records that created them in the first place.
>
> I'm not sure that's true. I think we could end up erroneously removing files that were included in the base backup. Anyway, let's focus on your patch for now.

OK, but I am interested in discussing the other thing too. I just can't piece together the scenario myself - there may well be one. The base backup will begin replay from the checkpoint caused by pg_start_backup() and remove anything that wasn't there at the start of the backup. But all of that stuff should get recreated by the time we reach the minimum recovery point (end of backup).

>> If, in the previous replay, we had wrapped all the way around, some of the stuff we keep may actually already have been overwritten by future WAL records, but they'll be overwritten again. Now, that could mess up our determination of which members to remove, I guess, but I'm not clear that actually matters either: if the offsets space has wrapped around, the members space will certainly have wrapped around as well, so we can remove anything we like at this stage and we're still OK. I agree this is ugly the way it is, but where is the actual bug?
>
> I'm more worried about the cases where we didn't ever actually "badly wrap around" (i.e. overwrite needed data); but where that's not clear on the standby because the base backup isn't in a consistent state.

I agree. The current patch tries to make it so that we never call find_multixact_start() while in recovery, but it doesn't quite succeed: the call in TruncateMultiXact still happens during recovery, but only once we're sure that the mxact we plan to call it on actually exists on disk. That won't be called until we replay the first checkpoint, but that might still be prior to consistency.

Since I forgot to attach the revised patch with fixes for the points Noah mentioned to that email, here it is attached to this one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

commit d33b4eb0167f465edb00bd6c0e1bcaa67dd69fe9
Author: Robert Haas
Date:   Fri May 29 14:35:53 2015 -0400

    foo

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..aca829d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -199,8 +199,9 @@ typedef struct MultiXactStateData
 	MultiXactOffset nextOffset;
 
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that may still be referenced from a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
@@ -213,6 +214,18 @@ typedef struct MultiXactStateData
 	 */
 	MultiXactId lastCheckpointedOldest;
 
+	/*
+	 * This is the oldest file that actually exist on the disk.  This value
+	 * is initialized by scanning pg_multixact/offsets, and subsequently
+	 * updated each time we complete a truncation.  We need a flag to
+	 * indicate whether this has been initialized yet.
+	 */
+	MultiXactId oldestMultiXactOnDisk;
+	bool		oldestMultiXactOnDiskValid;
+
+	/* Has TrimMultiXact been called yet? */
+	bool		didTrimMultiXact;
+
 	/* support for anti-wraparound measures */
 	MultiXactId multiVacLimit;
 	MultiXactId multiWarnLimit;
@@ -344,6 +357,8 @@ static char *mxstatus_to_string(MultiXactStatus status);
 
 /* management of SLRU infrastructure */
 static int	ZeroMultiXactOffsetPage(int page
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Tue, Jun 2, 2015 at 1:21 AM, Noah Misch wrote:
> On Mon, Jun 01, 2015 at 02:06:05PM -0400, Robert Haas wrote:
>> On Mon, Jun 1, 2015 at 12:46 AM, Noah Misch wrote:
>>> On Fri, May 29, 2015 at 03:08:11PM -0400, Robert Haas wrote:
>>>> SetMultiXactIdLimit() bracketed certain parts of its logic with if (!InRecovery), but those guards were ineffective because it gets called before InRecovery is set in the first place.
>>>
>>> SetTransactionIdLimit() checks InRecovery for the same things, and it is called at nearly the same moments as SetMultiXactIdLimit(). Do you have a sense of whether it is subject to similar problems as a result?
>>
>> Well, I think it's pretty weird that those things will get done before beginning recovery, even on an inconsistent cluster, but not during recovery. That is pretty strange. I don't see that it can actually do any worse than emit a few log messages at the start of recovery that won't show up again until the end of recovery, though.
>
> Granted. Would it be better to update both functions at the same time, and perhaps to make that a master-only change? Does the status quo cause more practical harm via SetMultiXactIdLimit() than via SetTransactionIdLimit()?

It does in the case of the call to find_multixact_start(). If that fails, we take the server down for no good reason, as demonstrated by the original report. I'll revert the changes to the other two things in this function that I changed to be protected by did_trim. There's no special reason to think that's a necessary change.

>>>> 1. Moves the call to DetermineSafeOldestOffset() that appears in StartupMultiXact() to TrimMultiXact(), so that we don't try to do this until we're consistent. Also, instead of passing MultiXactState->oldestMultiXactId, pass the newer of that value and the earliest offset that exists on disk. That way, it won't try to read data that's not there.
>>>
>>> Perhaps emit a LOG message when we do that, since it's our last opportunity to point to the potential data loss?
>>
>> If the problem is just that somebody's minmxid got set to 1 instead of the real value, I think that there is no data loss, because none of those older values are actually present there. But we could add a LOG message anyway. How do you suggest that we phrase that?
>
> There's no data loss if 1 <= true_minmxid <= 2^31 at the time minmxid got set to 1. Otherwise, data loss is possible.

Yes, but in that scenario, the log message you propose wouldn't be triggered. If true_minmxid > 2^31, then the stored minmxid will not precede the files on disk; it will follow them (assuming the older stuff hasn't been truncated, as is likely). So the message would be essentially:

    LOG: you didn't lose data, but if exactly the opposite of what this message is telling you about had happened, then you would have.
    DETAIL: Have a nice day.

> I don't hope for an actionable message, but we might want a reporter to grep logs for it when we diagnose future reports. Perhaps this:
>
> "missing pg_multixact/members files; begins at MultiXactId %u, expected %u"

This seems misleading. In the known failure case, it's not that the pg_multixact files are unexpectedly missing; it's that we incorrectly think that they should still be there.
Maybe:

    oldest MultiXactId on disk %u follows expected oldest MultiXact %u

> For good measure, perhaps emit this when lastCheckpointedOldest > earliest by more than one segment:
>
> "excess pg_multixact/members files; begins at MultiXactId %u, expected %u"

So, this scenario will happen whenever the system was interrupted in the middle of a truncation, or when the system is started from a base backup and a truncation happened after files were copied. I'm wary of giving users the idea that this is an atypical event. Perhaps a message at DEBUG1?

>> I'm not sure what you mean about it becoming too old. At least with that fix, it should get updated to exactly the first file that we didn't remove. Isn't that right?
>
> Consider a function raw_GOMXOD() that differs from GetOldestMultiXactOnDisk() only in that it never reads or writes the cache. I might expect oldestMultiXactOnDisk==raw_GOMXOD() if oldestMultiXactOnDiskValid, and that does hold most of the time. It does not always hold between the start of the quoted code's SimpleLruTruncate() and its oldestMultiXactOnDisk assignment. That&
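One way the proposed wording might be emitted, reusing the variable names from the patch upthread; this is a sketch of the idea, not committed code:

    /*
     * If what's actually on disk starts later than our notion of the
     * oldest multixact, note it.  Hedged at LOG here; the discussion
     * above considers DEBUG1 for the excess-files case.
     */
    if (MultiXactIdPrecedes(oldestMXact, earliest))
        ereport(LOG,
                (errmsg("oldest MultiXactId on disk (%u) follows expected oldest MultiXactId (%u)",
                        earliest, oldestMXact)));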
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Tue, Jun 2, 2015 at 8:56 AM, Andres Freund wrote:
> But what *definitely* looks wrong to me is that a TruncateMultiXact() in this scenario now (since a couple weeks ago) does a SimpleLruReadPage_ReadOnly() in the members slru via find_multixact_start(). That just won't work acceptably when we're not yet consistent. There very well could not be a valid members segment at that point? Am I missing something?

Yes: that code isn't new. TruncateMultiXact() called SimpleLruReadPage_ReadOnly() directly in 9.3.0 and every subsequent release until 9.3.7/9.4.2. The only thing that's changed is that we've moved that logic into a function called find_multixact_start() instead of having it directly inside that function. We did that because we needed to use the same logic in some other places. The reason why 9.3.7/9.4.2 are causing problems that people didn't have previously is that those new, additional call sites were poorly chosen and didn't include adequate protection against calling that function with an invalid input value. What this patch is about is getting back to the situation that we were in from 9.3.0 - 9.3.6 and 9.4.0 - 9.4.1, where TruncateMultiXact() did the thing that you're complaining about here but no one else did.

From my point of view, I think that you are absolutely right to question what's going on in TruncateMultiXact(). It's kooky, and there may well be bugs buried there. But I don't think fixing that should be the priority right now, because we have zero reports of problems attributable to that logic. I think the priority should be on undoing the damage that we did in 9.3.7/9.4.2, when we made other places do the same thing. We started getting trouble reports attributable to those changes *almost immediately*, which means that whether or not TruncateMultiXact() is broken, these new call sites definitely are. I think we really need to fix those new places ASAP.

> I think at the very least we'll have to skip this step while not yet consistent. That really sucks, because we'll possibly end up with multixacts that are completely filled by the time we've reached consistency.

That would be a departure from the behavior of every existing release that includes this code based on, to my knowledge, zero trouble reports.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Tue, Jun 2, 2015 at 11:27 AM, Andres Freund wrote:
> On 2015-06-02 11:16:22 -0400, Robert Haas wrote:
>> I'm having trouble figuring out what to do about this. I mean, the
>> essential principle of this patch is that if we can't count on
>> relminmxid, datminmxid, or the control file to be accurate, we can at
>> least look at what is present on the disk. If we also cannot count on
>> that to be accurate, we are left without any reliable source of
>> information. Consider a hypothetical cluster where all our stored
>> minmxids of whatever form are corrupted (say, all change to 1) and in
>> addition there are stray files in pg_multixact. I don't think there's
>> really any way to get ourselves out of trouble in that scenario.
>
> If we were to truncate after vacuum, and only on the primary (via WAL
> logging), we could, afaics, just rely on all the values to be
> recomputed. I mean relminmxid will be recomputed after a vacuum, and
> thus, after some time, will datminmxid and the control file value. We
> could just force a value of 1 to always trigger anti-wraparound vacuums
> (or wait for that to happen implicitly, to delay the impact?). That
> should then fix the problem in a relatively short amount of time?

The exact circumstances under which we're willing to replace a relminmxid with a newly-computed one that differs are not altogether clear to me, but there's an "if" statement protecting that logic, so there are some circumstances in which we'll leave the existing value intact. If we force non-stop vacuuming in that scenario, autovacuum will just run like crazy without accomplishing anything, which wouldn't be good. It would similarly do so when the oldest MXID reference in the relation is in fact 1, but that value can't be vacuumed away yet.

Also, the database might be really big. Even if it were true that a full scan of every table would get us out of this state, describing the time that it would take to do that as "relatively short" seems to me to be considerably understating the impact of a full-cluster VACUUM.

With regard to the more general question of WAL-logging this, are you going to work on that? Are you hoping Alvaro or I will work on that? Should we draw straws? It seems like somebody needs to do it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
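For readers following along: the guard being referred to has roughly this shape (a sketch of the idea only, with illustrative names - not the actual vacuum code):

    /*
     * Sketch: an "only advance" guard on relminmxid updates.  If the stored
     * value fails this test - for example, because the recomputed value
     * doesn't "precede" it in circular multixact space - it is left intact,
     * which is why forcing non-stop vacuums could spin without fixing
     * anything.
     */
    if (MultiXactIdIsValid(newminmxid) &&
        pgcform->relminmxid != newminmxid &&
        MultiXactIdPrecedes(pgcform->relminmxid, newminmxid))
    {
        pgcform->relminmxid = newminmxid;
        dirty = true;
    }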
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Tue, Jun 2, 2015 at 11:36 AM, Andres Freund wrote:
>> That would be a departure from the behavior of every existing release
>> that includes this code based on, to my knowledge, zero trouble
>> reports.
>
> On the other hand, we're now at about bug #5 attributable to the odd way
> truncation works for multixacts. It's obviously complex and hard to get
> right. It makes it harder to cope with the wrong values left in
> datminmxid etc. So I'm still wondering whether fixing this for good isn't
> the better approach.

It may well be. But I think we should do something more surgical first. Perhaps we can justify the pain and risk of making changes to the WAL format in the back-branches, but let's not do it in a rush. If we can get this patch to a state where it undoes the damage inflicted in 9.3.7/9.4.2, then we will be in a state where we have as much reliability as we had in 9.3.6 plus the protections against member-space wraparound added in 9.3.7 - which, like the patch I'm proposing now, were directly motivated by multiple, independent bug reports. That seems like a good place to get to. If nothing else, it will buy us some time to figure out what else we want to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Tue, Jun 2, 2015 at 11:44 AM, Andres Freund wrote:
> On 2015-06-02 11:37:02 -0400, Robert Haas wrote:
>> The exact circumstances under which we're willing to replace a
>> relminmxid with a newly-computed one that differs are not altogether
>> clear to me, but there's an "if" statement protecting that logic, so
>> there are some circumstances in which we'll leave the existing value
>> intact.
>
> I guess we'd have to change that then.

Yeah, but first we'd need to assess why it's like that. Tom was the one who installed the current logic, but I haven't been able to fully understand it.

>> It would similarly do so when the oldest MXID reference in the
>> relation is in fact 1, but that value can't be vacuumed away yet.
>
> I'd thought of just forcing consumption of one multixact in that
> case. Not pretty, but imo acceptable.

What if multixact 1 still has living members?

>> Also, the database might be really big. Even if it were true that a
>> full scan of every table would get us out of this state, describing
>> the time that it would take to do that as "relatively short" seems to
>> me to be considerably understating the impact of a full-cluster
>> VACUUM.
>
> Well. We're dealing with a corrupted cluster. Having a way out that's
> done automatically, even if it takes a long while, is pretty good
> already. In many cases the corruption (i.e. pg_upgrade) happened long
> ago, so the table's relminmxid will already have been recomputed. I
> think that's acceptable.

I'm a long way from being convinced that automated recovery is possible. There are so many different scenarios here that it's very difficult to reason generally about what the "right" thing to do is. I agree it would be nice if we had it, though.

>> With regard to the more general question of WAL-logging this, are you
>> going to work on that? Are you hoping Alvaro or I will work on that?
>> Should we draw straws? It seems like somebody needs to do it.
>
> I'm willing to invest the time to develop an initial version, but I'll
> need help evaluating it. I don't have many testing resources available
> atm, and I'm not going to trust stuff I developed while travelling by
> just looking at the code.

I'm willing to help with that. Hopefully I'm not the only one, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Tue, Jun 2, 2015 at 4:19 PM, Andres Freund wrote:
> I'm not really convinced tying things closer to having done trimming is
> easier to understand than tying things to recovery having finished.
>
> E.g.
> if (did_trim)
>     oldestOffset = GetOldestReferencedOffset(oldest_datminmxid);
> imo is harder to understand than if (!InRecovery).
>
> Maybe we could just name it finishedStartup and rename the functions
> accordingly?

Basing that particular call site on InRecovery doesn't work, because InRecovery isn't set early enough. But I'm fine with renaming it to whatever.

> Maybe it's worthwhile to add a 'NB: At this stage the data directory is
> not yet necessarily consistent' to StartupMultiXact's comments, to avoid
> reintroducing problems like this?

Sure.

>> /*
>> + * We can read this without a lock, because it only changes when
>> nothing
>> + * else is running.
>> + */
>> + did_trim = MultiXactState->didTrimMultiXact;
>
> Err, Hot Standby? It might be ok to not lock, but the comment is
> definitely wrong. I'm inclined to simply use locking, this doesn't look
> sufficiently critical performancewise.

/me nods. Good point.

> Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
> the disk it'll always get one at a segment boundary, right? I'm not sure
> that's actually ok; because the value at the beginning of the segment
> can very well end up being a 0, as MaybeExtendOffsetSlru() will have
> filled the page with zeros.
>
> I think that should be harmless, the worst that can happen is that
> oldestOffset erroneously is 0, which should be correct, even though
> possibly overly conservative, in these cases.

Uh oh. That seems like a real bad problem for this approach. What keeps that from being the opposite of too conservative? There's no "safe" value in a circular numbering space.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
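The locked read being agreed to there would look something like this (a sketch using the field name from the quoted patch; as noted, this path is not performance-critical):

    /* Sketch: read the trim flag under the lock rather than unlocked. */
    bool        did_trim;

    LWLockAcquire(MultiXactGenLock, LW_SHARED);
    did_trim = MultiXactState->didTrimMultiXact;
    LWLockRelease(MultiXactGenLock);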
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Wed, Jun 3, 2015 at 4:48 AM, Thomas Munro wrote:
> On Wed, Jun 3, 2015 at 3:42 PM, Alvaro Herrera
> wrote:
>> Thomas Munro wrote:
>>> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera
>>> wrote:
>>> > My guess is that the file existed, and perhaps had one or more pages,
>>> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
>>> > back. read() returns 0 in this case but doesn't set errno.
>>> >
>>> > I didn't find a way to set things so that the file exists but is of
>>> > shorter contents than oldestMulti by the time the checkpoint record is
>>> > replayed.
>>>
>>> I'm just starting to learn about the recovery machinery, so forgive me
>>> if I'm missing something basic here, but I just don't get this. As I
>>> understand it, offsets/0046 should either have been copied with that
>>> page present in it if it existed before the backup started (apparently
>>> not in this case), or extended to contain it by WAL records that come
>>> after the backup label but before the checkpoint record that
>>> references it (also apparently not in this case).
>>
>> Exactly --- that's the spot at which I am, also. I have had this
>> spinning in my head for three days now, and tried every single variation
>> that I could think of, but like you I was unable to reproduce the issue.
>> However, our customer took a second base backup and it failed in exactly
>> the same way, modulo some changes to the counters (the file that
>> didn't exist was 004B rather than 0046). I'm still at a loss as to what
>> the failure mode is. We must be missing some crucial detail ...
>
> I have finally reproduced that error! See attached repro shell script.
>
> The conditions are:
>
> 1. next multixact == oldest multixact (no active multixacts, pointing
> past the end)
> 2. next multixact would be the first item on a new page (multixact % 2048 ==
> 0)
> 3. the page must not be the first in a segment (or we'd get the
> read-zeroes case)
>
> That gives you odds of 1/2048 * 31/32 * (probability of a wraparound
> vacuum followed by no multixact creations right before your backup
> checkpoint). That seems like reasonably low odds... if it happened
> twice in a row, maybe I'm missing something here and there is some
> other way to get this...
>
> I realise now that this is actually a symptom of a problem spotted by
> Noah recently:
>
> http://www.postgresql.org/message-id/20150601045534.gb23...@tornado.leadboat.com
>
> He noticed the problem for segment boundaries, when not in recovery.
> In recovery, segment boundaries don't raise an error (the read-zeroes
> case applies), but page boundaries do. The fix is probably to do
> nothing if they are the same, as we do elsewhere, like in the attached
> patch.

Actually, we still need to call DetermineSafeOldestOffset in that case. Otherwise, if someone goes from lots of multixacts to none, the stop point won't advance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
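Stated as code, Thomas's three conditions look like this (a sketch; the constants are assumed values for the default 8 kB block size - 2048 offsets per page and 32 pages per SLRU segment - which is also where the 1/2048 * 31/32 odds above come from):

    /* Sketch of the repro preconditions; constants assumed for BLCKSZ = 8192. */
    #define MULTIXACT_OFFSETS_PER_PAGE  2048
    #define SLRU_PAGES_PER_SEGMENT      32

    static bool
    repro_possible(uint32 next_multi, uint32 oldest_multi)
    {
        uint32      pageno = next_multi / MULTIXACT_OFFSETS_PER_PAGE;

        return next_multi == oldest_multi &&                  /* condition 1 */
            next_multi % MULTIXACT_OFFSETS_PER_PAGE == 0 &&   /* condition 2 */
            pageno % SLRU_PAGES_PER_SEGMENT != 0;             /* condition 3 */
    }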
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund wrote:
>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>> > that's actually ok; because the value at the beginning of the segment
>> > can very well end up being a 0, as MaybeExtendOffsetSlru() will have
>> > filled the page with zeros.
>> >
>> > I think that should be harmless, the worst that can happen is that
>> > oldestOffset erroneously is 0, which should be correct, even though
>> > possibly overly conservative, in these cases.
>>
>> Uh oh. That seems like a real bad problem for this approach. What
>> keeps that from being the opposite of too conservative? There's no
>> "safe" value in a circular numbering space.
>
> I think it *might* (I'm really jetlagged) be fine because that'll only
> happen after an upgrade from < 9.3. And in that case we initialize
> nextOffset to 0. That ought to save us?

That's pretty much betting the farm on the bugs we know about today being the only ones there are. That seems imprudent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
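To spell out the "circular numbering space" point: offsets are compared modulo 2^32, roughly as in the sketch below, so every value precedes half the space and follows the other half - there is no universal minimum that can serve as a conservative fallback.

    /*
     * Sketch: wraparound-aware comparison in a 2^32 circular space, where
     * MultiXactOffset is a 32-bit unsigned integer.  "Precedes" means the
     * signed distance from offset2 to offset1 is negative.
     */
    static bool
    MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
    {
        int32       diff = (int32) (offset1 - offset2);

        return (diff < 0);
    }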
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Wed, Jun 3, 2015 at 8:24 AM, Robert Haas wrote:
> On Tue, Jun 2, 2015 at 5:22 PM, Andres Freund wrote:
>>> > Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning
>>> > the disk it'll always get one at a segment boundary, right? I'm not sure
>>> > that's actually ok; because the value at the beginning of the segment
>>> > can very well end up being a 0, as MaybeExtendOffsetSlru() will have
>>> > filled the page with zeros.
>>> >
>>> > I think that should be harmless, the worst that can happen is that
>>> > oldestOffset erroneously is 0, which should be correct, even though
>>> > possibly overly conservative, in these cases.
>>>
>>> Uh oh. That seems like a real bad problem for this approach. What
>>> keeps that from being the opposite of too conservative? There's no
>>> "safe" value in a circular numbering space.
>>
>> I think it *might* (I'm really jetlagged) be fine because that'll only
>> happen after an upgrade from < 9.3. And in that case we initialize
>> nextOffset to 0. That ought to save us?
>
> That's pretty much betting the farm on the bugs we know about today
> being the only ones there are. That seems imprudent.

So here's a patch taking a different approach. In this approach, if the multixact whose members we want to look up doesn't exist, we don't use a later one (that might or might not be valid). Instead, we attempt to cope with the unknown. That means:

1. In TruncateMultiXact(), we don't truncate.

2. If setting the offset stop limit (the point where we refuse to create new multixact space), we don't arm the stop point. This means that if you're in this situation, you run without member wraparound protection until it's corrected. A message gets logged once per checkpoint telling you that you have this problem, and another message gets logged when things get straightened out and the guards are enabled.

3. If setting the vacuum force point, we assume that it's appropriate to immediately force vacuum.

I've only tested this very lightly - this is just to see what you and Noah and others think of the approach. As compared with the previous approach, it has the advantage of making minimal assumptions about the sanity of what's on disk. It has the disadvantage that, for some people, the member-wraparound guard won't be enabled at startup -- but note that those people can't start 9.3.7/9.4.2 *at all*, so currently they are either running without member wraparound protection anyway (if they haven't upgraded to those releases) or they're down entirely. Another disadvantage is that we'll be triggering what may be quite a bit of autovacuum activity for some people, which could be painful. On the plus side, they'll hopefully end up with sane relminmxid and datminmxid guards afterwards.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..4400fc5 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -198,13 +198,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation.  We don't always know this value,
+	 * so there's a flag here to indicate whether or not we currently do.
+	 */
 	MultiXactOffset oldestOffset;
+	bool		oldestOffsetKnown;
 
 	/*
 	 * This is what the previous checkpoint stored as the truncate position.
@@ -221,6 +232,7 @@
 
 	/* support for members anti-wraparound measures */
 	MultiXactOffset offsetStopLimit;
+	bool		offsetStopLimitKnown;
 
 	/*
 	 * Per-backend data starts here.  We have two arrays stored in the area
@@ -350,10 +362,11 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
 					MultiXactOffset offset2);
 static void ExtendMultiXactOffset(MultiXactId multi);
 static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
stat
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 2:42 AM, Noah Misch wrote:
> I like that change a lot. It's much easier to seek forgiveness for wasting <=
> 28 GiB of disk than for deleting visibility information wrongly.

I'm glad you like it. I concur.

>> 2. If setting the offset stop limit (the point where we refuse to
>> create new multixact space), we don't arm the stop point. This means
>> that if you're in this situation, you run without member wraparound
>> protection until it's corrected. A message gets logged once per
>> checkpoint telling you that you have this problem, and another message
>> gets logged when things get straightened out and the guards are
>> enabled.
>>
>> 3. If setting the vacuum force point, we assume that it's appropriate
>> to immediately force vacuum.
>
> Those seem reasonable, too.

Cool.

>> I've only tested this very lightly - this is just to see what you and
>> Noah and others think of the approach. As compared with the previous
>> approach, it has the advantage of making minimal assumptions about the
>> sanity of what's on disk. It has the disadvantage that, for some
>> people, the member-wraparound guard won't be enabled at startup -- but
>> note that those people can't start 9.3.7/9.4.2 *at all*, so currently
>> they are either running without member wraparound protection anyway
>> (if they haven't upgraded to those releases) or they're down entirely.
>
> That disadvantage is negligible, considering.

All right.

>> Another disadvantage is that we'll be triggering what may be quite a
>> bit of autovacuum activity for some people, which could be painful.
>> On the plus side, they'll hopefully end up with sane relminmxid and
>> datminmxid guards afterwards.
>
> That sounds good so long as each table requires just one successful emergency
> autovacuum. I'm not seeing code to ensure that the launched autovacuum will
> indeed perform a full-table scan and update relminmxid; is it there?

No. Oops.

> For sites that can't tolerate an autovacuum storm, what alternative can we
> provide? Is "SET vacuum_multixact_freeze_table_age = 0; VACUUM " of
> every table, done before applying the minor update, sufficient?

I don't know. In practical terms, they probably need to ensure that if pg_multixact/offsets/ does not exist, no relations have relminmxid = 1 and no remaining databases have datminmxid = 1. Exactly what it will take to get there is possibly dependent on which minor release you are running; on current minor releases, I am hopeful that what you propose is sufficient.

>> static void
>> -DetermineSafeOldestOffset(MultiXactId oldestMXact)
>> +DetermineSafeOldestOffset(MultiXactOffset oldestMXact)
>
> Leftover change from an earlier iteration? The values passed continue to be
> MultiXactId values.

Oopsie.
>> 	/* move back to start of the corresponding segment */
>> -	oldestOffset -= oldestOffset %
>> -		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> +	offsetStopLimit = oldestOffset - (oldestOffset %
>> +		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT));
>> +	/* always leave one segment before the wraparound point */
>> +	offsetStopLimit -= (MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> +
>> +	/* if nothing has changed, we're done */
>> +	if (prevOffsetStopLimitKnown && offsetStopLimit == prevOffsetStopLimit)
>> +		return;
>>
>> 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
>> -	/* always leave one segment before the wraparound point */
>> -	MultiXactState->offsetStopLimit = oldestOffset -
>> -		(MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT);
>> +	MultiXactState->offsetStopLimit = oldestOffset;
>
> That last line needs s/oldestOffset/offsetStopLimit/, I presume.

Another oops. Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
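Collecting the quoted lines, the corrected computation is (a sketch; MULTIXACT_MEMBERS_PER_PAGE works out to 1636 for the default 8 kB block size, an assumption here):

    /* Sketch of the stop-limit arithmetic, with Noah's fix applied. */
    MultiXactOffset per_segment =
        MULTIXACT_MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT;

    /* move back to start of the corresponding segment */
    offsetStopLimit = oldestOffset - (oldestOffset % per_segment);
    /* always leave one segment before the wraparound point */
    offsetStopLimit -= per_segment;

    LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
    MultiXactState->offsetStopLimit = offsetStopLimit;    /* the corrected line */
    LWLockRelease(MultiXactGenLock);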
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas wrote:
> Thanks for the review.

Here's a new version. I've fixed the things Alvaro and Noah noted, and some compiler warnings about set but unused variables.

I also tested it, and it doesn't quite work as hoped. If started on a cluster where oldestMultiXid is incorrectly set to 1, it starts up and indicates that the member wraparound guards are disabled. But even after everything is fixed, they don't get enabled until after the next full restart. I think that's because TruncateMultiXact() bails out too early, without calling DetermineSafeOldestOffset.

My attempt at a quick fix for that problem didn't work out, so I'm posting this version for now to facilitate further review and testing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

commit eb39cf10e4ff853ed4b9d3fab599cf42911e6f70
Author: Robert Haas
Date:   Thu Jun 4 11:58:49 2015 -0400

    bar

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 699497c..209d3e6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -196,13 +196,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation.  We don't always know this value,
+	 * so there's a flag here to indicate whether or not we currently do.
+	 */
 	MultiXactOffset oldestOffset;
+	bool		oldestOffsetKnown;
 
 	/*
 	 * This is what the previous checkpoint stored as the truncate position.
@@ -219,6 +230,7 @@ typedef struct MultiXactStateData
 
 	/* support for members anti-wraparound measures */
 	MultiXactOffset offsetStopLimit;
+	bool		offsetStopLimitKnown;
 
 	/*
 	 * Per-backend data starts here.  We have two arrays stored in the area
@@ -348,10 +360,11 @@ static bool MultiXactOffsetPrecedes(MultiXactOffset offset1,
 					MultiXactOffset offset2);
 static void ExtendMultiXactOffset(MultiXactId multi);
 static void ExtendMultiXactMember(MultiXactOffset offset, int nmembers);
-static void DetermineSafeOldestOffset(MultiXactId oldestMXact);
+static void DetermineSafeOldestOffset(MultiXactOffset oldestMXact);
 static bool MultiXactOffsetWouldWrap(MultiXactOffset boundary,
 					 MultiXactOffset start, uint32 distance);
-static MultiXactOffset find_multixact_start(MultiXactId multi);
+static bool SetOffsetVacuumLimit(bool finish_setup);
+static bool find_multixact_start(MultiXactId multi, MultiXactOffset *result);
 static void WriteMZeroPageXlogRec(int pageno, uint8 info);
@@ -960,7 +973,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
  * against catastrophic data loss due to multixact wraparound.  The basic
  * rules are:
  *
- * If we're past multiVacLimit or the safe threshold for member storage space,
+ * If we're past multiVacLimit or the safe threshold for member storage
+ * space, or we don't know what the safe threshold for member storage is,
  * start trying to force autovacuum cycles.
  * If we're past multiWarnLimit, start issuing warnings.
 * If we're past multiStopLimit, refuse to create new MultiXactIds.
@@ -969,6 +983,7 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 	if (!MultiXactIdPrecedes(result, MultiXactState->multiVacLimit) ||
+		!MultiXactState->oldestOffsetKnown ||
 		(MultiXactState->nextOffset - MultiXactState->oldestOffset
 		 > MULTIXACT_MEMBER_SAFE_THRESHOLD))
 	{
@@ -1083,7 +1098,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 	 *--
 	 */
 #define OFFSET_WARN_SEGMENTS	20
-	if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
+	if (MultiXactState->offsetStopLimitKnown &&
+		MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit, nextOffset,
 								 nmembers))
 		ereport(ERROR,
 				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
@@ -1095,7 +,8 @@ GetNewMultiXactId(int nmembers, MultiXactOffset *offset)
 						MultiXactState->offsetStopLimit - nextOffset - 1),
 				 errhint("Execute a database-wide VACUUM in database with OID %u with reduced vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age settings.",
 						 MultiXactState->oldestMultiXactDB)));
-	else if (MultiXactOffsetWouldWrap(MultiXactState->offsetStopLimit,
+	else if (M
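The declarations in the patch above show the reworked contract: find_multixact_start() now reports whether it could determine an answer rather than assuming the lookup succeeds. Caller-side, that is meant to be used along these lines (a sketch; the log wording follows a later message in this thread):

    /* Sketch: tolerate an unknown oldest offset instead of failing outright. */
    MultiXactOffset oldestOffset = 0;
    bool            oldestOffsetKnown;

    oldestOffsetKnown = find_multixact_start(oldestMultiXactId, &oldestOffset);
    if (!oldestOffsetKnown)
        ereport(LOG,
                (errmsg("MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk",
                        oldestMultiXactId)));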
Re: [HACKERS] Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 1:27 PM, Andres Freund wrote:
> On 2015-06-04 12:57:42 -0400, Robert Haas wrote:
>> +	/*
>> +	 * Do we need an emergency autovacuum?  If we're not sure, assume yes.
>> +	 */
>> +	return !oldestOffsetKnown ||
>> +		(nextOffset - oldestOffset > MULTIXACT_MEMBER_SAFE_THRESHOLD);
>
> I think without teaching autovac about those rules, this might just lead
> to lots of autovac processes starting without knowing they should do
> something? They know about autovacuum_multixact_freeze_age, but they
> know neither about !oldestOffsetKnown nor, afaics, about pending offset
> wraparounds?

You're right, but that's why the latest patch has changes in MultiXactMemberFreezeThreshold.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
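The rough idea of those changes: the freeze age autovacuum compares against stops being a constant - it shrinks as member space fills up, and collapses to zero (vacuum everything now) when the oldest offset is unknown. A sketch of that shape (the danger-threshold constant and the exact scaling here are assumptions, not the committed code):

    /*
     * Sketch: effective multixact freeze threshold for autovacuum.  Below
     * the safe member-space threshold, use the configured maximum; above
     * it, scale toward zero so autovacuum starts freezing multixacts.
     */
    static int
    member_freeze_threshold(bool oldestOffsetKnown, uint32 multixacts,
                            MultiXactOffset members)
    {
        double      fraction;

        if (!oldestOffsetKnown)
            return 0;           /* not sure: assume an emergency vacuum is needed */

        if (members <= MULTIXACT_MEMBER_SAFE_THRESHOLD)
            return autovacuum_multixact_freeze_max_age;

        /* scale the threshold down as member usage approaches the limit */
        fraction = (double) (members - MULTIXACT_MEMBER_SAFE_THRESHOLD) /
            (double) (MULTIXACT_MEMBER_DANGER_THRESHOLD -
                      MULTIXACT_MEMBER_SAFE_THRESHOLD);

        if (fraction >= 1.0)
            return 0;

        return (int) (multixacts * (1.0 - fraction));
    }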
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 12:57 PM, Robert Haas wrote:
> On Thu, Jun 4, 2015 at 9:42 AM, Robert Haas wrote:
>> Thanks for the review.
>
> Here's a new version. I've fixed the things Alvaro and Noah noted,
> and some compiler warnings about set but unused variables.
>
> I also tested it, and it doesn't quite work as hoped. If started on a
> cluster where oldestMultiXid is incorrectly set to 1, it starts up and
> indicates that the member wraparound guards are disabled. But even
> after everything is fixed, they don't get enabled until after the next
> full restart. I think that's because TruncateMultiXact() bails out
> too early, without calling DetermineSafeOldestOffset.
>
> My attempt at a quick fix for that problem didn't work out, so I'm
> posting this version for now to facilitate further review and testing.

Here's a new version with some more fixes and improvements:

- SetOffsetVacuumLimit was failing to set MultiXactState->oldestOffset when the oldest offset became known if the now-known value happened to be zero. Fixed.

- SetOffsetVacuumLimit now logs useful information at the DEBUG1 level, so that you can see that it's doing what it's supposed to.

- TruncateMultiXact now calls DetermineSafeOldestOffset to adjust the offsetStopLimit even if it can't truncate anything. This seems useless, but it's not, because it may be that the last checkpoint advanced lastCheckpointedOldest from a bogus value (i.e. 1) to a real value, and now we can actually set offsetStopLimit properly.

- TruncateMultiXact no longer calls find_multixact_start when there are no remaining multixacts. This is actually a completely separate bug that goes all the way back to 9.3.0 and can potentially cause TruncateMultiXact to remove every file in pg_multixact/offsets. Restarting the cluster becomes impossible because TrimMultiXact barfs.

- TruncateMultiXact now logs a message if the oldest multixact does not precede the earliest one on disk and is not equal to the next multixact and yet does not exist. The value of the log message is that it discovered the bug mentioned in the previous line, so I think it's earning its keep.

With this version, I'm able to see that when you start up a 9.3.latest+this patch with a cluster that has a bogus value of 1 in relminmxid, datminmxid, and the control file, autovacuum vacuums everything in sight, all the values get set back to the right thing, and the next checkpoint enables the member-wraparound guards. This works with both autovacuum=on and autovacuum=off; the emergency mechanism kicks in as intended. We'll want to warn people with big databases who upgrade to 9.3.0 - 9.3.4 via pg_upgrade that they may want to pre-vacuum those tables before upgrading to avoid a vacuum storm. But generally I'm pretty happy with this: forcing those values to get fixed so that we can guard against member-space wraparound seems like the right thing to do.

So, to summarize, this patch does the following:

- Fixes the failure-to-start problems introduced in 9.4.2 in complicated pg_upgrade scenarios.

- Prevents the new calls to find_multixact_start we added in 9.4.2 from happening during recovery, where they can only create failure scenarios. The call in TruncateMultiXact that has been there all along is not eliminated, but now handles failure more gracefully.

- Fixes possible incorrect removal of every single pg_multixact/offsets file when no multixacts exist; one file should be kept.
- Forces aggressive autovacuuming when the control file's oldestMultiXid doesn't point to a valid MultiXact and enables member wraparound at the next checkpoint following the correction of that problem.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

commit 87aa15fe5257060e0c971e135dd9f460fdc00bd0
Author: Robert Haas
Date:   Thu Jun 4 11:58:49 2015 -0400

    bar

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..7c457a6 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -198,13 +198,24 @@ typedef struct MultiXactStateData
 	/* next-to-be-assigned offset */
 	MultiXactOffset nextOffset;
 
+	/* Have we completed multixact startup? */
+	bool		finishedStartup;
+
 	/*
-	 * Oldest multixact that is still on disk.  Anything older than this
-	 * should not be consulted.  These values are updated by vacuum.
+	 * Oldest multixact that is still potentially referenced by a relation.
+	 * Anything older than this should not be consulted.  These values are
+	 * updated by vacuum.
 	 */
 	MultiXactId oldestMultiXactId;
 	Oid			oldestMultiXactDB;
+
+	/*
+	 * Oldest multixact offset that is potentially referenced by a
+	 * multixact referenced by a relation.  We don't always know this value,
+	 * so there's a flag here to ind
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Thu, Jun 4, 2015 at 5:29 PM, Robert Haas wrote: > - Forces aggressive autovacuuming when the control file's > oldestMultiXid doesn't point to a valid MultiXact and enables member > wraparound at the next checkpoint following the correction of that > problem. Err, enables member wraparound *protection* at the next checkpoint, not the wraparound itself. It's worth noting that every startup will now include one of the following two messages: LOG: MultiXact member wraparound protections are now enabled Or: LOG: MultiXact member wraparound protections are disabled because oldest checkpointed MultiXact %u does not exist on disk ...where %u is probably 1 If you get the second one, you'll get the first one later after vacuum has done its thing and a checkpoint has happened. This is, obviously, some log chatter for people who don't have a problem and never have, but I think it's worth emitting the first message at startup even when there's no problem, so that people don't have to make inferences from the absence of a message. We can tell people very simply that (1) if they see the first message, everything is fine; (2) if they see the second message, autovacuum is going to clean things up and they will eventually see the first message; and (3) if they see neither message, they haven't upgraded to a fixed version yet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch wrote: > On Thu, Jun 04, 2015 at 05:29:51PM -0400, Robert Haas wrote: >> Here's a new version with some more fixes and improvements: > > I read through this version and found nothing to change. I encourage other > hackers to study the patch, though. The surrounding code is challenging. Andres tested this and discovered that my changes to find_multixact_start() were far more creative than intended. Committed and back-patched with a trivial fix for that stupidity and a novel-length explanation of the changes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 12:00 PM, Andres Freund wrote:
> On 2015-06-05 11:43:45 -0400, Tom Lane wrote:
>> Robert Haas writes:
>> > On Fri, Jun 5, 2015 at 2:20 AM, Noah Misch wrote:
>> >> I read through this version and found nothing to change. I encourage
>> >> other
>> >> hackers to study the patch, though. The surrounding code is challenging.
>>
>> > Andres tested this and discovered that my changes to
>> > find_multixact_start() were far more creative than intended.
>> > Committed and back-patched with a trivial fix for that stupidity and a
>> > novel-length explanation of the changes.
>>
>> So where are we on this? Are we ready to schedule a new set of
>> back-branch releases? If not, what issues remain to be looked at?
>
> We're currently still doing bad things while the database is in an
> inconsistent state (i.e. read from SLRUs and truncate based on the
> results of that). It's quite easy to reproduce base backup startup
> failures.
>
> On the other hand, that's not new. And the fix requires, afaics, a new
> type of WAL record (issued very infrequently). I'll post a first version
> of the patch, rebased on top of Robert's commit, tonight or tomorrow. I
> guess we can then decide what we'd like to do.

There are at least two other known issues that seem like they should be fixed before we release:

1. The problem that we might truncate an SLRU members page away when it's in the buffers, but not drop it from the buffers, leading to a failure when we try to write it later.

2. Thomas's bug fix for another longstanding bug that occurs when you run his checkpoint-segment-boundary.sh script.

I think we might want to try to fix one or both of those before cutting a new release. I'm less sold on the idea of installing WAL-logging in this minor release. That probably needs to be done, but right now we've got stuff that worked in early 9.3.X releases and is now broken, and I'm in favor of fixing that first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 2:47 PM, Andres Freund wrote:
> On 2015-06-05 14:33:12 -0400, Tom Lane wrote:
>> Robert Haas writes:
>> > 1. The problem that we might truncate an SLRU members page away when
>> > it's in the buffers, but not drop it from the buffers, leading to a
>> > failure when we try to write it later.
>
> I've got a fix for this, and about three other issues I found during
> development of the new truncation codepath.
>
> I'll commit the fix tomorrow.

OK. Then I think we should release next week, so we get the fixes we have out before PGCon. The current situation is not good.

>> > I think we might want to try to fix one or both of those before
>> > cutting a new release. I'm less sold on the idea of installing
>> > WAL-logging in this minor release. That probably needs to be done,
>> > but right now we've got stuff that worked in early 9.3.X releases and
>> > is now broken, and I'm in favor of fixing that first.
>
> I've implemented this, and so far it removes more code than it
> adds. It's imo also a pretty clear win in how understandable the code
> is. The remaining work, besides testing, is primarily going over lots
> of comments and updating them. Some of them are outdated by the patch,
> and some already were.
>
> Will post tonight, together with the other fixes, after I get back from
> climbing.
>
> My gut feeling right now is that it's a significant improvement, and
> that it'll be reasonable to include it. But I'd definitely like some
> independent testing for it, and I'm not sure if that's doable in time
> for the wrap.

I think we would be foolish to rush that part into the tree. We probably got here in the first place by rushing the last round of fixes too much; let's try not to double down on that mistake.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 2:36 PM, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Robert Haas writes:
>
>> > There are at least two other known issues that seem like they should
>> > be fixed before we release:
>>
>> > 1. The problem that we might truncate an SLRU members page away when
>> > it's in the buffers, but not drop it from the buffers, leading to a
>> > failure when we try to write it later.
>>
>> > 2. Thomas's bug fix for another longstanding bug that occurs when you
>> > run his checkpoint-segment-boundary.sh script.
>>
>> > I think we might want to try to fix one or both of those before
>> > cutting a new release. I'm less sold on the idea of installing
>> > WAL-logging in this minor release. That probably needs to be done,
>> > but right now we've got stuff that worked in early 9.3.X releases and
>> > is now broken, and I'm in favor of fixing that first.
>>
>> Okay, but if we're not committing today to a release wrap on Monday,
>> I don't see it happening till after PGCon.
>
> In that case, I think we should get a release out next week. The
> current situation is rather badly broken and dangerous, and the above
> two bugs are nowhere near as problematic. If we can get fixes for these over
> the weekend, that would be an additional bonus.

Yeah, I think I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 5, 2015 at 4:40 PM, Andres Freund wrote:
>> I think we would be foolish to rush that part into the tree. We
>> probably got here in the first place by rushing the last round of
>> fixes too much; let's try not to double down on that mistake.
>
> My problem with that approach is that I think the code has gotten
> significantly more complex in the last few weeks. I have very little trust
> that the interactions between vacuum, the deferred truncations in the
> checkpointer, the state management in shared memory and recovery are correct.
> There are just too many non-local subtleties here.
>
> I don't know what the right thing to do here is.

That may be true, but we don't need to get to perfect to be better than 9.4.2 and 9.4.3, where some people can't start the database. I will grant you that, if the patch I committed today introduces some regression that is even worse, life will suck.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Mon, Jun 8, 2015 at 1:23 PM, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On June 8, 2015 7:06:31 PM GMT+02:00, Alvaro Herrera
>> wrote:
>> >I might be misreading the code, but PMSIGNAL_START_AUTOVAC_LAUNCHER
>> >only causes things to happen (i.e. a new worker to be started) when
>> >autovacuum is disabled. If autovacuum is enabled, postmaster
>> >receives the signal and doesn't do anything about it, because the
>> >launcher is already running. Of course, regularly scheduled autovac
>> >workers will eventually start running, but perhaps this is not good
>> >enough.
>>
>> Well that's just the same for the plain xid precedent? I'd not mind
>> improving further, but that seems like a separate thing.
>
> Sure. I'm just concerned that we might be putting excessive trust in
> emergency workers being launched at a high pace. With normally
> configured systems (naptime=1min) it shouldn't be a problem, but we have
> seen systems with naptime set to one hour or so, and those might feel
> some pain; and it would get worse the more databases you have, because
> people might feel the need to space the autovac runs even more.
>
> (My personal alarm bells go off when I see autovac_naptime=15min or
> more, but apparently not everybody sees things that way.)

Uh, I'd echo that sentiment if you did s/15min/1min/.

I think Andres's patch is just improving the existing mechanism so that it's reliable, and you're proposing something notably different which might be better, but which is really a different proposal altogether.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
On Fri, Jun 12, 2015 at 7:27 PM, Steve Kehlet wrote: > Just wanted to report that I rolled back my VM to where it was with 9.4.2 > installed and it wouldn't start. I installed 9.4.4 and now it starts up just > fine: > >> 2015-06-12 16:05:58 PDT [6453]: [1-1] LOG: database system was shut down >> at 2015-05-27 13:12:55 PDT >> 2015-06-12 16:05:58 PDT [6453]: [2-1] LOG: MultiXact member wraparound >> protections are disabled because oldest checkpointed MultiXact 1 does not >> exist on disk >> 2015-06-12 16:05:58 PDT [6457]: [1-1] LOG: autovacuum launcher started >> 2015-06-12 16:05:58 PDT [6452]: [1-1] LOG: database system is ready to >> accept connections >> done >> server started > > And this is showing up in my serverlog periodically as the emergency > autovacuums are running: > >> 2015-06-12 16:13:44 PDT [6454]: [1-1] LOG: MultiXact member wraparound >> protections are disabled because oldest checkpointed MultiXact 1 does not >> exist on disk > > **Thank you Robert and all involved for the resolution to this.** > >> With the fixes introduced in this release, such a situation will result in >> immediate emergency autovacuuming until a correct oldestMultiXid value can >> be determined > > Okay, I notice these vacuums are of the "to prevent wraparound" type (like > VACUUM FREEZE), that do hold locks preventing ALTER TABLEs and such. Good to > know, we'll plan our software updates accordingly. > > Is there any risk until these autovacuums finish? As long as you see only a modest number of files in pg_multixact/members, you're OK. But in theory, until that emergency autovacuuming finishes, there's nothing keeping that directory from wrapping around. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
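If you want to keep an eye on that while the emergency vacuums run, counting the segment files under pg_multixact/members is enough; each one is 256 kB with default build options. A trivial sketch in C (any shell one-liner does the same job):

    #include <dirent.h>
    #include <stdio.h>

    /* Count segment files in pg_multixact/members; run with cwd = $PGDATA. */
    int
    main(void)
    {
        DIR        *dir = opendir("pg_multixact/members");
        struct dirent *de;
        long        count = 0;

        if (dir == NULL)
        {
            perror("opendir");
            return 1;
        }
        while ((de = readdir(dir)) != NULL)
        {
            if (de->d_name[0] != '.')   /* skip "." and ".." */
                count++;
        }
        closedir(dir);
        printf("pg_multixact/members segment files: %ld\n", count);
        return 0;
    }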
Re: [GENERAL] [HACKERS] Change in order of criteria - reg
On Wed, Jun 1, 2016 at 5:22 AM, Amit Langote wrote:
> On 2016/06/01 13:07, sri harsha wrote:
>> Hi,
>>
>> In PostgreSQL, does the order in which the criteria are given matter??
>> For example
>>
>> Query 1 : Select * from TABLE where a > 5 and b < 10;
>>
>> Query 2 : Select * from TABLE where b < 10 and a > 5;
>>
>> Are query 1 and query 2 the same in PostgreSQL or different?? If it's
>> different, WHY??
>
> tl;dr they are the same. As in, they obviously produce the same result and
> result in invoking the same plan.
>
> Internally, the optimizer will order the application of those quals in the
> resulting plan based on the per-tuple cost of individual quals. So a cheaper,
> more selective qual might result in short-circuiting of relatively expensive
> quals for a large number of rows in the table, saving some cost at
> run-time. Also, if an index scan is chosen and quals are pushed down, the
> underlying index method might know to order quals smartly.
>
> However, the cost markings of operators/functions involved in quals had
> better match reality. By default, most operators/functions in a database are
> marked with a cost of 1 unit. The stable sorting used in the ordering of
> quals would mean the order of applying quals in the resulting plan matches
> the original order (i.e., the order in which they appear in the query). So,
> if the first specified qual really happens to be an expensive qual but is
> marked as having the same cost as other less expensive quals, one would have
> to pay the price of evaluating it for all the rows. Whereas correctly
> marking the costs could have avoided that (as explained above). Note that
> I am not suggesting that ordering quals in the query by their perceived cost
> is the solution. Keep the optimizer informed by setting costs appropriately
> and it will do the right thing more often than not. :)

I think that if the costs are actually identical, the system will keep the quals in the same order they were written - so then the order does matter, a little bit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
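A toy illustration of that tie-breaking behavior (not the planner's code - just a demonstration of why equal costs preserve the written order under a stable comparison):

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy model of a qual with an estimated per-tuple cost. */
    typedef struct Qual
    {
        const char *text;           /* the qual as written */
        double      cost;           /* estimated per-tuple cost */
        int         written_pos;    /* position in the WHERE clause */
    } Qual;

    static int
    cmp_quals(const void *a, const void *b)
    {
        const Qual *qa = a;
        const Qual *qb = b;

        if (qa->cost != qb->cost)
            return (qa->cost < qb->cost) ? -1 : 1;
        /* tie on cost: fall back to written order (qsort itself isn't stable) */
        return qa->written_pos - qb->written_pos;
    }

    int
    main(void)
    {
        Qual        quals[] = {
            {"a > 5", 1.0, 0},
            {"b < 10", 1.0, 1},
        };

        qsort(quals, 2, sizeof(Qual), cmp_quals);
        /* equal costs, so "a > 5" is still evaluated first */
        for (int i = 0; i < 2; i++)
            printf("%s\n", quals[i].text);
        return 0;
    }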
Re: [HACKERS] [GENERAL] PgQ and pg_dump
On Thu, Jun 16, 2016 at 1:46 PM, Martín Marqués wrote:
> The comment is accurate on what is going to be dumpable and what's not
> from the code. In our case, as the pgq schema is not dumpable because
> it comes from an extension, the other objects it contains will not be
> dumpable either.
>
> That's the reason why the PgQ event tables created by
> pgq.create_queue() are not dumped.

That sucks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] pg_dumping extensions having sequences with 9.6beta3
On Wed, Jul 27, 2016 at 2:24 AM, Michael Paquier wrote:
> On Wed, Jul 27, 2016 at 8:07 AM, Stephen Frost wrote:
>> That'd be great. It's definitely on my list of things to look into, but
>> I'm extremely busy this week. I hope to look into it on Friday, would
>> be great to see what you find.
>
> Sequences that are directly defined in extensions do not get dumped,
> and sequences that are part of a serial column in an extension are
> getting dumped. Looking into this problem, getOwnedSeqs() is visibly
> making an incorrect assumption: sequences owned by table columns are
> dumped unconditionally, but this is not true for sequences that are
> part of extensions. More precisely, dobj->dump is being enforced to
> DUMP_COMPONENT_ALL, which causes the sequence definition to be dumped.
> Oops.
>
> The patch attached fixes the problem for me. I have also added
> tests in test_pg_dump in the shape of sequences defined in an
> extension, and sequences that are part of a serial column. This patch
> also works in the case where a sequence is created as part
> of a serial column, and gets removed afterwards, say with ALTER EXTENSION
> DROP SEQUENCE. The behavior for sequences and serial columns that are
> not part of extensions is unchanged.
>
> Stephen, it would be good if you could check the correctness of this
> patch as you did all this refactoring of pg_dump to support catalog
> ACLs. I am sure, by the way, that checking for (owning_tab->dobj.dump &
> DUMP_COMPONENT_DEFINITION) != 0 is not good because of, for example, the
> case of a serial column created in an extension where the sequence is
> dropped from the extension afterwards.

Stephen, is this still on your list of things for today? Please provide a status update.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Request - repeat value of \pset title during \watch interations
On Thu, Mar 10, 2016 at 1:40 AM, David G. Johnston wrote:
> Adding -hackers for consideration in the Commitfest.

I don't much like how this patch uses the arbitrary constant 50 in no fewer than 5 locations.

Also, it seems like we could arrange for head_title to be "" rather than NULL when myopt.title is NULL. Then instead of this:

+		if (head_title)
+			snprintf(title, strlen(myopt.title) + 50,
+					 _("Watch every %lds\t%s\n%s"),
+					 sleep, asctime(localtime(&timer)), head_title);
+		else
+			snprintf(title, 50, _("Watch every %lds\t%s"),
+					 sleep, asctime(localtime(&timer)));

...we could just use the first branch of that if all the time.

		if (res == -1)
+		{
+			pg_free(title);
+			pg_free(head_title);
			return false;
+		}

Instead of repeating the cleanup code, how about making this break; then, change the return statement at the bottom of the function to return (res != -1).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
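Put together, the suggested shape is roughly this (a sketch using the patch's variable names, not a tested diff; PSQLexecWatch is the psql helper the loop already uses):

    /* Sketch of the suggested restructuring of the \watch loop. */
    head_title = pg_strdup(myopt.title != NULL ? myopt.title : "");
    title = pg_malloc(strlen(head_title) + 50);

    for (;;)
    {
        /* ... build the per-iteration title and run the query ... */
        res = PSQLexecWatch(query_buf->data, &myopt);
        if (res == -1)
            break;              /* fall through to the shared cleanup */
        /* ... sleep until the next iteration ... */
    }

    pg_free(title);
    pg_free(head_title);
    return (res != -1);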
Re: [HACKERS] [GENERAL] Request - repeat value of \pset title during \watch interations
On Sun, Mar 20, 2016 at 9:31 AM, Michael Paquier wrote:
> And the patch attached gives the following output:
> With title:
> =# \watch 1
> Watch every 1s	Sun Mar 20 22:28:38 2016
> popo
> a
> ---
> 1
> (1 row)
>
> And without title:
> Watch every 1s	Sun Mar 20 22:29:31 2016
>
> a
> ---
> 1
> (1 row)

And does everybody agree that this is a desirable change?

As for the patch itself, you could replace all this:

+	/*
+	 * Take into account any title present in the user setup as a part of
+	 * what is printed for each iteration by using it as a header.
+	 */
+	if (myopt.title)
+	{
+		title_len = strlen(myopt.title);
+		title = pg_malloc(title_len + 50);
+		head_title = pg_strdup(myopt.title);
+	}
+	else
+	{
+		title_len = 0;
+		title = pg_malloc(50);
+		head_title = pg_strdup("");
+	}

...with:

	head_title = pg_strdup(myopt.title != NULL ? myopt.title : "");
	title_len = strlen(head_title);
	title = pg_malloc(title_len + 50);

Better yet, include the + 50 in title_len, and then you don't need to reference the number 50 again further down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Request - repeat value of \pset title during \watch interations
On Mon, Mar 21, 2016 at 11:17 AM, David G. Johnston wrote:
>> And does everybody agree that this is a desirable change?
>
> Adding the title is desirable. While I'm inclined to bike-shed this,
> anything that gets it in I can live with, and so I'm content letting the
> author/committer decide where exactly things (including whitespace) appear.
>
> It is a bit odd that the "Watch every %s" gets centered if the result is
> wide but that the title remains left-aligned.

Well, the title isn't normally centered, but yeah, that is odd. Come to think of it, I think I might have expected the title to appear *above* "Watch every %s", not below it. That might decrease the oddness.

As for letting the committer decide, I don't care about this personally at all, so I'm only looking at it to be nice to the people who do. Whatever the consensus is, that's OK with me. I just don't want to get yelled at later for committing something here, so it would be nice to see a few votes for whatever we're gonna do here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Request - repeat value of \pset title during \watch interations
On Mon, Mar 21, 2016 at 2:09 PM, David G. Johnston wrote:
> On Monday, March 21, 2016, Tom Lane wrote:
>> "David G. Johnston" writes:
>> > I'd rather not omit sleep but removing "Watch every" is fine (preferred
>> > actually), so:
>> > Title Is Here Mon Mar 21 15:05:06 2016 (5s)
>>
>> Meh ... seems a bit awkward to me. Couldn't you include " (5s)" in the
>> title, if you want that info? If it's variable, you could still
>> accommodate that:
>
> Actually, only if it's a variable that you set up and repeat and show. A
> bit cumbersome, and it mixes the parts that are title and those that are present
> only because you are watching.

Ah, come on. This doesn't really seem like an issue we should spend more time quibbling about. I think Tom's version is fine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Large object corruption during 'piped' pg_restore
On Fri, Jan 21, 2011 at 12:44 PM, Bosco Rama wrote: > Tom Lane wrote: >> >> So I'm not sure whether to fix it, or leave it as a known failure case >> in old branches. Comments? > > I understand the reluctance to fool with stable code. I have zero insight > into your installed versions distribution and backward compatibility needs > so any comment I may have here is purely selfish. > > As an end user there is one area of the DB that I want to work correctly > 100% of the time and that is the dump/restore tool(s). If it's not going > to work under certain circumstances it should at least tell me so and > fail. I don't think having the tool appear to work while corrupting the > data (even if documented as doing so) is a viable method of operation. Yeah, I lean toward saying we should back-patch this. Yeah, it'll be lightly tested, but it's a pretty confined change, so it's unlikely to break anything else. ISTM the worst case scenario is that it takes two minor releases to get it right, and even that seems fairly unlikely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries
On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan wrote: > Quite right, but the commitfest manager isn't meant to be a substitute for > one. Bug fixes aren't subject to the same restrictions as feature changes. Another option would be to add this here: http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Shared memory changes in 9.4?
On Tue, May 27, 2014 at 8:22 PM, Maciek Sakrejda wrote: > On Mon, May 26, 2014 at 12:24 AM, Andres Freund > wrote: >> >> Any chance you're using a 9.3 configuration file instead of the one >> generated by initdb? >> dynamic_shared_memory_type defaults to 'posix' if not specified in the >> config file (on platforms supporting it). If initdb detects that 'posix' >> can't be used it'll emit a different value. If you're copying the config >> from 9.3 and your environment doesn't support posix shm that'll cause >> the above error. >> I still think dynamic_shared_memory_type should default to 'none' >> because of such problems > > It works with 'none' and 'sysv'--I think the issue is that technically our > environment does support 'posix', but '/dev/shm' is indeed not mounted in > the LXC container, leading to a discrepancy between what initdb decides and > what's actually possible. Thanks for your help. I think it would be good to understand why initdb isn't getting this right. Did you run initdb outside the LXC container, where /dev/shm would have worked, but then run postgres inside the LXC container, where /dev/shm does not work? I ask because initdb is supposed to be doing the same thing that postgres does, so it really ought to come to the same conclusion about what will and won't work. With regard to Andres' proposal, I'm not that keen on setting dynamic_shared_memory_type='none' by default. Would we leave it that way until we get in-core users of the facility, and then change it? I guess that'd be OK, but frankly if enabling dynamic_shared_memory_type by default is causing us many problems, then we'd better reconsider the design of the facility now, before we start adding more dependencies on it. We've already fixed a bunch of DSM-related issues as a result of the fact that the default *isn't* none, and I dunno how many of those we would have found if the default had been none. I tend to think DSM is an important facility that we're going to be wanting to build on in future releases, so I'm keen to have it available by default so that we can iron out any kinks before we get too far down that path. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
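To make the workaround concrete, a minimal sketch, assuming a 9.4 server in an environment where 'sysv' works (the value is only an example):

    -- see what initdb chose (or what was copied over from a 9.3 config):
    SHOW dynamic_shared_memory_type;

    -- pick an implementation that works without /dev/shm; needs a restart:
    ALTER SYSTEM SET dynamic_shared_memory_type = 'sysv';  -- 9.4+; or edit postgresql.conf directly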
Re: [HACKERS] [GENERAL] Question about partial functional indexes and the query planner
On Tue, Jun 10, 2014 at 7:19 PM, Tom Lane wrote: > Given the lack of previous complaints, I'm not sure this amounts to > a back-patchable bug, but it does seem like something worth fixing > going forward. Agreed, although I'd be willing to see us slip it into 9.4. It's doubtful that anyone will get upset if their query plans change between beta1 and beta2, but the same cannot be said for released branches. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane wrote: > One thing that occurs to me is that if the generic plan estimate comes > out much cheaper than the custom one, maybe we should assume that the > generic's cost estimate is bogus. Right offhand I can't think of a reason > for a custom plan to look worse than a generic one, unless there's a > statistical quirk like this one. That's an interesting idea, but what do we do after deciding that it's bogus? The generic plan really can't be cheaper than the custom plan, but it could be the same price, or as close as makes no difference. I have long thought that maybe the custom vs. generic plan decision should have something to do with whether the parameters are MCVs of the table columns they're being compared against. This case is interesting because it demonstrates that querying for a *non*-MCV can require a switch to a custom plan, which is something I don't think I've seen before. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Performance issue with libpq prepared queries on 9.3 and 9.4
On Mon, Nov 17, 2014 at 4:27 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Nov 13, 2014 at 7:34 PM, Tom Lane wrote: >>> One thing that occurs to me is that if the generic plan estimate comes >>> out much cheaper than the custom one, maybe we should assume that the >>> generic's cost estimate is bogus. Right offhand I can't think of a reason >>> for a custom plan to look worse than a generic one, unless there's a >>> statistical quirk like this one. > >> That's an interesting idea, but what do we do after deciding that it's >> bogus? > > Keep using custom plans. It's possible that the estimate that's in error > is the custom one, but that's not the way to bet IMO, since the custom > plan estimate is based on better information. > >> The generic plan really can't be cheaper than the custom plan, >> but it could be the same price, or as close as makes no difference. > > Right, and what we want to do is use the generic plan as long as it's > close to the same cost (close enough to not justify replanning effort). > The trick here is to not be fooled by estimation errors. Can we assume > that generic cost < custom cost is always an estimation error? Maybe. It seems like kind of a fragile bet to me. There's going to be some qual selectivity below which an index scan on a particular table outperforms a sequential scan, but the selectivity estimated for a generic plan can be either higher or lower than the selectivity we'd estimate for some particular value. And once one of the two plans decides on an index scan while the other one decides on a sequential scan, it can cascade through and change the whole plan - e.g. because it affects whether the tuples emerge with usable pathkeys. I don't feel very confident about assuming that applying < to the result of all that is going to tell us anything useful. I think what's really going on here is that the generic plan will be optimal for some range of possible qual selectivities. Typically, the qual is of the form col = val, so the plan will be optimal for all values where the estimated frequency is between some values A and B. What would be nice is to notice when we see a value that is outside that range, and switch to a custom plan in that case. I realize that the planner isn't really well set up to deliver the information we'd need to test that for each new supplied value, but that's really what we need to do. The current system wastes CPU cycles by replanning up to 5 times even when there is no benefit to be gained by it, but can also cause big performance problems when it settles into a generic plan and then a value with different characteristics shows up later on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
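To make the failure mode concrete, a sketch with a hypothetical table and prepared statement (all names invented for illustration):

    PREPARE q(text) AS SELECT * FROM orders WHERE status = $1;

    EXECUTE q('shipped');      -- the first five executions get custom plans for the supplied value
    -- after five executions, the planner compares the average custom-plan cost
    -- against the generic-plan cost and may settle on the generic plan
    EXECUTE q('rare-status');  -- a value with very different selectivity now reuses that
                               -- generic plan, which may be badly wrong for it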
Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)
On Thu, Mar 30, 2017 at 4:45 PM, Tom Lane wrote: > In short, it seems like this statement in the docs is correctly describing > our code's behavior, but said behavior is wrong and should be changed. > I'd propose fixing it like that in HEAD; I'm not sure if the back branches > should also be changed. Sounds reasonable, but I don't see much advantage to changing it in the back-branches. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)
On Fri, Mar 31, 2017 at 11:29 AM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Mar 30, 2017 at 4:45 PM, Tom Lane wrote: >>> In short, it seems like this statement in the docs is correctly describing >>> our code's behavior, but said behavior is wrong and should be changed. >>> I'd propose fixing it like that in HEAD; I'm not sure if the back branches >>> should also be changed. > >> Sounds reasonable, but I don't see much advantage to changing it in >> the back-branches. > > Well, it's a SQL-compliance bug, and we often back-patch bug fixes. Personally, I'd be more inclined to view it as a definitional change. Maybe we picked the wrong definition before, but it's well-established behavior at this point. The SQL standard also says that identifiers are supposed to be folded to upper case, so I don't think the theory that every deviation from the standard should be fixed and back-patched holds a lot of water. > The argument for not back-patching a bug fix usually boils down to > fear of breaking existing applications, but it's hard to see how > removal of a permission check could break a working application --- > especially when the permission check is as hard to trigger as this one. > How many table owners ever revoke their own REFERENCES permission? Sure, but that argument cuts both ways. If nobody ever does that, who will be helped by back-patching this? I certainly agree that back-patching this change is pretty low risk. I just don't think it has any real benefits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] contributing patches
I have to dig this up and see if I still have it. ...Robert -Original Message- From: Bruce Momjian [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 11, 2008 1:05 PM To: Robert Haas Cc: Tom Lane; pgsql-general@postgresql.org Subject: Re: [GENERAL] contributing patches Great, would you please send your truncate patch to the patches email list? Thanks. --- Robert Haas wrote: > > We require that all submissions conform to the Postgres BSD license, > > but we are not picky about requiring paperwork to prove it. Just put > > the same copyright header into any added files as you see in existing > > files. > > OK cool. > > > Yeah, the core team does not have a lot of bandwidth right now for > > thinking about 8.4 development. You don't have to wait for 8.3 > > final release, but probably after the first couple of betas are out > > would be better than now. Bear in mind also that any patches > developed > > now are likely to have merge problems after the pgindent run that will > > occur late in beta --- so you might want to wait till after that > before > > starting any large coding effort. > > OK. I wrote a patch to implement the following TODO item, which I > selected on the basis of the fact that it appeared to be easy: > > * %Add a separate TRUNCATE permission > > Currently only the owner can TRUNCATE a table because triggers are not > called, and the table is locked in exclusive mode. > > It was easy, so I can update the patch as necessary until it can be > submitted. > > Thanks, > > ...Robert > > ---(end of broadcast)--- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org/ -- Bruce Momjian <[EMAIL PROTECTED]> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] libpq port number handling
On Thu, Sep 24, 2009 at 8:36 PM, Tom Lane wrote: > BTW, are port numbers still limited to 16 bits in IPv6? Yes. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] libpq port number handling
On Thu, Sep 24, 2009 at 8:59 PM, Tom Lane wrote: > Sam Mason writes: >> + if (portnum < 1 || portnum > 65535) > > BTW, it strikes me that we could tighten this even more by rejecting > target ports below 1024. This is guaranteed safe on all Unix systems > I know of, because privileged ports can only be listened to by root-owned > processes and we know the postmaster won't be one. I am not sure > whether it would be possible to start the postmaster on a low-numbered > port on Windows though. Anyone know? Even if it's possible, do we > want to allow it? I don't think we get much benefit out of artificially limiting libpq in this way. In 99.99% of cases it won't matter, and in the other 0.01% it will be a needless annoyance. I think we should restrict ourselves to checking what is legal, not what we think is a good idea. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] Urgent Help Required
*Don't* VACUUM FULL. Just VACUUM. It's not the same thing. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
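The practical difference, in brief (table name hypothetical):

    VACUUM mytable;       -- reclaims dead row versions for reuse; runs alongside normal reads and writes
    VACUUM FULL mytable;  -- compacts the table itself, but takes an exclusive lock for the duration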
Re: [HACKERS] [GENERAL] Error compiling sepgsql in PG9.1
2011/5/24 Kohei Kaigai : > The attached patch makes the configure script abort when it is run with the > '--with-selinux' > option but libselinux is older than the minimum required by SE-PostgreSQL. > > As the documentation says, it needs at least libselinux-2.0.93, because this > or a later > version supports selabel_lookup(3) for database object classes, used for > initial labeling. > > The current configure script checks for the existence of libselinux, but does no > version check. > (getpeercon_raw(3) has been a supported API for a long time.) > The selinux_sepgsql_context_path(3) is a good watermark for libselinux-2.0.93 > instead. Looks to me like you need to adjust the wording of the error message. Maybe "libselinux version 2.0.93 or newer is required", or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries
On Wed, Feb 9, 2011 at 4:50 AM, Thom Brown wrote: > On 9 February 2011 02:11, Robert Haas wrote: >> On Tue, Feb 8, 2011 at 8:30 PM, Andrew Dunstan wrote: >>> Quite right, but the commitfest manager isn't meant to be a substitute for >>> one. Bug fixes aren't subject to the same restrictions as feature changes. >> >> Another option would be to add this here: >> >> http://wiki.postgresql.org/wiki/PostgreSQL_9.1_Open_Items > > I've removed it from the commitfest because it really doesn't belong > there, and I've added it to the open items list. So, I finally got around to looking at this, and I think there is a simpler solution. When an overflow occurs while calculating the next value, that just means that the value we're about to return is the last one that should be generated. So we just need to frob the context state so that the next call will decide we're done. There are any number of ways to do that; I just picked what looked like the easiest one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company generate-series-overflow.patch Description: Binary data -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
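A quick way to exercise the boundary case the patch addresses (2147483647 is INT_MAX):

    SELECT generate_series(2147483643, 2147483647);
    -- should return 2147483643 through 2147483647 and then stop,
    -- rather than failing while computing the (overflowing) next value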
Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries
On Fri, Jun 17, 2011 at 2:15 PM, Tom Lane wrote: > Robert Haas writes: >> So, I finally got around to looking at this, and I think there is a >> simpler solution. When an overflow occurs while calculating the next >> value, that just means that the value we're about to return is the >> last one that should be generated. So we just need to frob the >> context state so that the next call will decide we're done. There are >> any number of ways to do that; I just picked what looked like the >> easiest one. > > +1 for this solution. > > BTW, there was some mention of changing the timestamp versions of > generate_series as well, but right offhand I'm not convinced that > those need any change. I think you'll get overflow detection there > automatically from the functions being used --- and if not, it's a > bug in those functions, not in generate_series. Maybe not, because those functions probably throw an error if an overflow is detected, and that's not really correct. By definition, the second generate_series() argument is the point at which we should stop generating, and that point has to be within the range of the underlying data type. So if an overflow occurs, that's just another way of saying that we've certainly gone past the stop point and needn't generate anything further. The error is an artifact of the method we've used to generate the next point. I'm not sure how much energy it's worth expending on that case. Using really large dates may be less common than using values that strain the range of a 4-byte integer. But it might at least be worth a TODO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Issues with generate_series using integer boundaries
On Fri, Jun 17, 2011 at 10:39 AM, David Johnston wrote: > Tangential comment but have you considered emitting a warning (and/or log > entry) when you are 10,000-50,000 away from issuing the last available > number in the sequence so that some recognition exists that any code > depending on the sequence is going to fail soon? > > Also, during sequence creation you know the integer type being used so that > maximum value is known and an overflow should not need to come into play (I > guess the trade-off is the implicit "try-catch" [or whatever mechanism C > uses] performance hit versus the need to store another full integer in the > data structure). > > You could also give access to the "warning threshold" value so that the > developer can change it to whatever value is desired (with a meaningful > default of course). There are already tools out there that can monitor this stuff - for example, check_postgres.pl. http://bucardo.org/check_postgres/check_postgres.pl.html#sequence We tend to avoid emitting warnings for this kind of thing because they can consume vast amounts of disk space, and a lot of times no one's looking at them anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [PERFORM] change sample size for statistics
On Mon, Jun 13, 2011 at 6:33 PM, Willy-Bas Loos wrote: > On Fri, Jun 10, 2011 at 9:58 PM, Josh Berkus wrote: >> >> It's not 10%. We use a fixed sample size, which is configurable on the >> system, table, or column basis. > > It seems that you are referring to "alter column set statistics" and > "default_statistics_target", which are the number of percentiles in the > histogram (and MCV's) . > I mean the number of records that are scanned by analyze to come to the > statistics for the planner, especially n_disctict. In 9.0+ you can do ALTER TABLE .. ALTER COLUMN .. SET (n_distinct = ...); -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
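Filled in, that looks something like the following; the table, column, and value are purely illustrative:

    ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = -0.05);
    -- negative values mean a fraction of the row count; positive values are an absolute count
    ANALYZE orders;  -- the override is picked up at the next ANALYZE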
Re: [HACKERS] [GENERAL] Creating temp tables inside read only transactions
On Fri, Jul 8, 2011 at 2:21 AM, Darren Duncan wrote: > I think an even better way to support this is would be based on Postgres > having support for directly using multiple databases within the same SQL > session at once, as if namespaces were another level deep, the first level > being the databases, the second level the schemas, and the third level the > schema objects. > > Kind of like what the SQL standard defines its catalog/schema/object > namespaces. > > This instead of needing to use federating or that contrib module to use > multiple Pg databases of the same cluster at once. But if that's what you want, just don't put your data in different databases in the first place. That's what schemas are for. If for some reason we needed to have tables that happened to be called x.y.z and a.b.c accessible from a single SQL session, we could allow that much more simply by allowing schemas to be nested. Then we could allow arbitrary numbers of levels, not just three. The whole point of having databases and schemas as separate objects is that they do different things: schemas are just containers for names, allowing common access to data, and databases are completely separate entities, allowing privilege separation for (say) a multi-tenant hosting environment. We're not going to throw out the latter concept just so people can use two dots in their table names instead of one. > Under this scenario, we make the property of a database being read-only or > read-write for the current SQL session associated with a database rather > than the whole SQL session. A given transaction can read from any database > but can only make changes to the ones not read-only. > > Also, the proper way to do temporary tables would be to put them in another > database than the main one, where the whole other database has the property > of being temporary. > > Under this scenario, there would be separate system catalogs for each > database, and so the ones for read-only databases are read-only, and the > ones for other databases aren't. > > Then the system catalog itself fundamentally isn't more complicated, per > database, and anything extra to handle cross-database queries or whatever, > if anything, is a separate layer. Code that only deals with a single > database at once would be an optimized situation and perform no worse than > it does now. I think you should make more of an effort to understand how the system works now, and why, before proposing radical redesigns. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL][HACKERS] register creation date of table
On Fri, Oct 14, 2011 at 6:20 AM, Willy-Bas Loos wrote: > 1. I think that there is no such information in the system tables. Is > that correct? Yes. It's been discussed before but some people (particularly Tom, IIRC) are not convinced that it's useful enough to justify its existence. > 2. I would like to go back in time. I think that I will just look up > the creation date for the files in the data directory and translate > their oid's to the object names and then update their dates. This > would of course only work from the last restore. Is that a good way to > do it? Well, that timestamp will get bumped on TRUNCATE, CLUSTER, VACUUM FULL, and rewriting versions of ALTER TABLE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
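For the oid-to-name translation step, something along these lines should work on 9.0 and later (the relation name and filenode are hypothetical):

    -- file path on disk for a given relation:
    SELECT pg_relation_filepath('some_table');

    -- or map a filenode found in the data directory back to a relation:
    SELECT oid::regclass FROM pg_class WHERE relfilenode = 16423;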
Re: [GENERAL] [RRR] [HACKERS] Commitfest: The Good, The Bad, and the Ugly
On Tue, Sep 28, 2010 at 9:11 PM, Itagaki Takahiro wrote: > On Wed, Sep 29, 2010 at 6:03 AM, David Fetter wrote: >> The Good: >> - Most patches still in play have a reviewer. > > As far as I remember, there were discussions about the issue > "A patch has a reviewer, but in Needs Review state for several weeks " > in 9.0 development. > > Do we have any plans for it? According to the commitfest app, one patch > has only one reviewer at once. A new reviewer might avoid reviewing > a patch that have another reviewer already. No, the column is very clearly labelled "Reviewers", not "Reviewer". And we have certainly had patches with more than one person's name in that field in the past. The issue is rather that we don't have enough people reviewing. We haven't had enough people volunteer to do reviews to even assign ONE person to each patch, let alone two. There are, as of this writing, SEVEN patches that have no reviewer at all. Of course, several of the committers, including you, me, and Tom, have been working our way through the patches. And that is great. But the point of the CommitFest process is that everyone is supposed to pitch in and help out, not just the committers. That is not happening, and it's a problem. This process does not work and will not scale if the committers are responsible for doing all the work on every patch from beginning to end. That has never worked, and the fact that we have a few more committers now doesn't change that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [RRR] [HACKERS] Commitfest: The Good, The Bad, and the Ugly
On Tue, Sep 28, 2010 at 9:33 PM, Itagaki Takahiro wrote: > On Wed, Sep 29, 2010 at 10:18 AM, Robert Haas wrote: >> No, the column is very clearly labelled "Reviewers", not "Reviewer". >> And we have certainly had patches with more than one person's name in >> that field in the past. The issue is rather that we don't have enough >> people reviewing. We haven't had enough people volunteer to do >> reviews to even assign ONE person to each patch, let alone two. There >> are, as of this writing, SEVEN patches that have no reviewer at all. > > Some of them might be too difficult to review. For example, replication > or snapshot management requires special skills to review. > > I'm worrying about new reviewers hesitate to review a patch that has > a previous reviewer, and then, if they think the remaining patches are > too difficult for them, they would just leave the commitfest page. That's a legitimate concern, but I am not sure how much of a problem it is in practice. Most people who become round-robin reviewers are getting pulled into the process a little more than just stumbling across the CF page by happenstance, or at least I hope they are. Not all patches can benefit from multiple reviewers, but CF managers can and should encourage multiple reviews of those that can. However, at the moment, the problem is that regardless of who is assigned to do what, we're not getting enough reviews done. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] is not distinct from any(...)
I'm trying to write a SQL statement to determine whether a value is in an array, but I want the comparison to be done using IS NOT DISTINCT FROM rather than =. My first thought was that instead of writing: SELECT value = ANY(array) ...I could simply write: SELECT value IS NOT DISTINCT FROM ANY(array) That doesn't seem to work, because IS NOT DISTINCT FROM is not an operator. So then I tried creating an operator === (anyelement, anyelement) that just does IS NOT DISTINCT FROM and writing: select 1 === any(array[1]); which got me: ERROR: could not find array type for data type anyelement Grr... any suggestions? ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
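One possible workaround, sketched here against a literal array (assumes unnest(), available in 8.4 and later):

    SELECT EXISTS (
        SELECT 1
        FROM unnest(ARRAY[1, 2, NULL]) AS t(x)
        WHERE x IS NOT DISTINCT FROM NULL
    );
    -- returns true: unlike =, IS NOT DISTINCT FROM matches the NULL element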
[GENERAL] calling a function that takes a row type and returns a set of rows
So, say I have something like this - the actual example is something a bit more useful: CREATE TABLE foo (a integer, b integer); INSERT INTO foo VALUES (1, 1); -- must have some data to generate the failure CREATE FUNCTION bar (foo) RETURNS SETOF foo AS $$ DECLARE f foo; BEGIN f.a := 1; RETURN NEXT f; f.a := 2; RETURN NEXT f; END $$ LANGUAGE plpgsql; I can't find any legal way of calling this function. SELECT bar(f) FROM foo f; ERROR: set-valued function called in context that cannot accept a set SELECT * FROM foo f, bar(f); ERROR: function expression in FROM may not refer to other relations of same query level Any help appreciated. Thanks, ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] calling a function that takes a row type and returns a set of rows
Hmm, the implicit cursor approach won't work for me because I want to be able to call the function on an arbitrary slice of the rows in the table, but the explicit cursor approach looks like it might work. I'll give that a try, thanks. ...Robert On Fri, Oct 10, 2008 at 4:01 PM, Pavel Stehule <[EMAIL PROTECTED]> wrote: > Hello > > PostgreSQL doesn't support pipe functions, so you cannot do what you > want. But you could use SQL SETOF functions, which can be > called in a normal context. I dislike this feature, but it should be > useful for you, > > try: > > create or replace function bar1(foo) > returns setof foo as $$ > select 1, $1.b > union all > select 2, $1.b; > $$ language sql; > > postgres=# select (bar1(foo)).* from foo; > a | b > ---+--- > 1 | 1 > 2 | 1 > (2 rows) > > I think a much better and cleaner version is using an explicit or > implicit cursor in the function > > -- implicit cursor > create or replace function bar() returns setof foo as $$ > declare r record; > begin > for r in select * from foo loop > r.a := 1; > return next r; > r.a := 2; > return next r; > end loop; > return; > end; > $$ language plpgsql; > > postgres=# select * from bar(); > a | b > ---+--- > 1 | 1 > 2 | 1 > (2 rows) > > -- using an explicit cursor (it's a more complicated variant, and I think > it's better not to use it) > create or replace function bar(c refcursor) returns setof foo as $$ > declare r record; > begin > loop > fetch c into r; > exit when not found; > r.a := 1; > return next r; > r.a := 2; > return next r; > end loop; > return; > end; > $$ language plpgsql; > > begin; > declare x cursor for select * from foo; > select * from bar('x'::refcursor); > commit; > > postgres=# declare x cursor for select * from foo; > DECLARE CURSOR > postgres=# select * from bar('x'::refcursor); > a | b > ---+--- > 1 | 1 > 2 | 1 > (2 rows) > > postgres=# commit; > COMMIT > > Regards > Pavel Stehule > > > 2008/10/10 Robert Haas <[EMAIL PROTECTED]>: >> So, say I have something like this - the actual example is something a >> bit more useful: >> >> CREATE TABLE foo (a integer, b integer); >> INSERT INTO foo VALUES (1, 1); -- must have some data to generate the >> failure >> >> CREATE FUNCTION bar (foo) RETURNS SETOF foo AS $$ >> DECLARE >> f foo; >> BEGIN >> f.a := 1; >> RETURN NEXT f; >> f.a := 2; >> RETURN NEXT f; >> END >> $$ LANGUAGE plpgsql; >> >> I can't find any legal way of calling this function. >> >> SELECT bar(f) FROM foo f; >> ERROR: set-valued function called in context that cannot accept a set >> >> SELECT * FROM foo f, bar(f); >> ERROR: function expression in FROM may not refer to other relations >> of same query level >> >> Any help appreciated. >> >> Thanks, >> >> ...Robert >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >> > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] calling a function that takes a row type and returns a set of rows
> You need LATERAL support for this: > SELECT * FROM foo f LATERAL bar(f); > > I'm not sure about the syntax, but LATERAL is a standard JOIN type wherein > upper "nodes" are visible. That would be really nice. Then you could presumably also do: SELECT f.id, f.name, f.apple, f.banana, bar.apple AS bar_apple, bar.banana AS bar_banana FROM foo f LATERAL bar(f); ...which I frequently wish to do, and can't. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
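For the record, LATERAL did eventually land in PostgreSQL 9.3, where the query above can be written against the foo/bar definitions from this thread as:

    SELECT f.*, b.*
    FROM foo f
    CROSS JOIN LATERAL bar(f) AS b;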
Re: [GENERAL] [HACKERS] Hot Standby utility and administrator functions
> * pg_last_recovered_xact_xid() > Will throw an ERROR if *not* executed in recovery mode. > returns bigint > > * pg_last_completed_xact_xid() > Will throw an ERROR *if* executed in recovery mode. > returns bigint Should these return xid? ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] why hash on the primary key?
I'm seeing a lot of plans in my database that look like this: portal=# explain select * from foo i, foo j where i.id = j.id; QUERY PLAN - Hash Join (cost=769.87..2159.36 rows=13283 width=264) Hash Cond: (i.id = j.id) -> Seq Scan on foo i (cost=0.00..343.83 rows=13283 width=132) -> Hash (cost=343.83..343.83 rows=13283 width=132) -> Seq Scan on foo j (cost=0.00..343.83 rows=13283 width=132) It seems very strange for the planner to decide to build an in-memory hash table on a column that is already indexed (the primary key, no less!). But this is happening A LOT - I often see plans where a majority of the joins are executed this way (and they're not all self-joins either...). It seems like the planner is concluding that it's going to need most or all of the pages in the table anyway, and that building a hash table as it goes is quicker than reading the index pages in from disk. On a simple query like the above, setting enable_seqscan to off or random_page_cost to 1 generates the expected plan: QUERY PLAN - Merge Join (cost=0.00..2534.24 rows=13283 width=264) Merge Cond: (i.id = j.id) -> Index Scan using foo_pkey on foo i (cost=0.00..1167.50 rows=13283 width=132) -> Index Scan using foo_pkey on foo j (cost=0.00..1167.50 rows=13283 width=132) (4 rows) Experimentation shows this is actually about 25% faster. But this is kind of a blunt instrument, and my attempts to fiddle with various parameters have not been really successful in generating better plans for more complicated examples. Any suggestions/explanations? ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why hash on the primary key?
> Could you send the output of these two queries using "explain analyze" > instead of plain explain? portal=# explain analyze select * from foo i, foo j where i.id = j.id; QUERY PLAN Hash Join (cost=747.87..2127.36 rows=13283 width=272) (actual time=68.434..205.536 rows=13283 loops=1) Hash Cond: (i.id = j.id) -> Seq Scan on foo i (cost=0.00..315.83 rows=13283 width=136) (actual time=0.024..23.349 rows=13283 loops=1) -> Hash (cost=315.83..315.83 rows=13283 width=136) (actual time=68.353..68.353 rows=13283 loops=1) -> Seq Scan on foo j (cost=0.00..315.83 rows=13283 width=136) (actual time=0.009..23.839 rows=13283 loops=1) Total runtime: 223.390 ms (6 rows) portal=# set enable_seqscan to false; SET portal=# explain analyze select * from foo i, foo j where i.id = j.id; QUERY PLAN Merge Join (cost=0.00..2310.24 rows=13283 width=272) (actual time=0.272..149.317 rows=13283 loops=1) Merge Cond: (i.id = j.id) -> Index Scan using foo_pkey on foo i (cost=0.00..1055.50 rows=13283 width=136) (actual time=0.205..29.714 rows=13283 loops=1) -> Index Scan using foo_pkey on foo j (cost=0.00..1055.50 rows=13283 width=136) (actual time=0.025..37.534 rows=13283 loops=1) Total runtime: 166.364 ms (5 rows) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] why hash on the primary key?
> What's strange about it? A probe into an in-memory hashtable is a lot > cheaper than a probe into an index, so this type of plan makes plenty > of sense if the hashtable will fit in RAM and there are going to be a > lot of probes. (Where "a lot" means "enough to amortize the cost of > building the hashtable", of course.) Hmm... it didn't occur to me that the index probe itself might be more expensive than a hash probe. Is that due to concurrency control, or are you talking about the need to possibly read index pages in from disk? >> Experimentation shows this is actually about 25% faster. > > Well, that just says your cost parameters need a bit of adjustment > if you'd like the planner to get the crossover point exactly right. Any sense of which ones might be worth fiddling with? ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
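The parameters mentioned, as a session-local experiment (the values are only examples):

    SET random_page_cost = 2.0;       -- default 4.0; lower it when most reads come from cache
    SET effective_cache_size = '4GB'; -- how much of the data the planner may assume is cached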
Re: [GENERAL] why hash on the primary key?
>>> Well, that just says your cost parameters need a bit of adjustment >>> if you'd like the planner to get the crossover point exactly right. > >> Any sense of which ones might be worth fiddling with? > > random_page_cost, effective_cache_size, maybe the cpu_xxx parameters. I fiddled with this some more on a more complex query with more than 20 joins. It appears that the culprit is from_collapse_limit. If I raise from_collapse_limit, most of the hash joins turn into Nested Loops or Nested Loop Left Joins with Index Scans. The estimated cost is about 20% of the cost of the original plan, but the planning takes so much longer that the actual time is 60% larger. This is really the pits. I had hoped that the work Simon Riggs was doing on join removal would hit 8.4, but that doesn't seem likely at this point. The problem is that the application is a web app where the user can select which columns they want to see. Most of the time only ~6-8 columns of 40+ that are available are selected, and the joins needed to generate the unselected columns can be tossed (which would hopefully both avoid unnecessary join steps and also speed up planning the remaining joins). ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
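For reference, the knobs involved in that experiment (session-local; the values are arbitrary):

    SET from_collapse_limit = 30;  -- default 8; raising it lets the planner flatten more FROM items
    SET join_collapse_limit = 30;  -- the analogous limit for explicit JOIN syntax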
Re: [HACKERS] [GENERAL] Updating column on row update
On Sun, Nov 22, 2009 at 6:51 PM, Tom Lane wrote: > Craig Ringer writes: >> I do think this comes up often enough that a built-in trigger "update >> named column with result of expression on insert" trigger might be >> desirable. > > There's something of the sort in contrib already, I believe, though > it's so old it still uses abstime :-( > >> So might "CREATE LANGUAGE ... IF NOT EXISTS". Maybe even "CREATE ROLE >> ... IF NOT EXISTS" and "CREATE USER ... IF NOT EXISTS" - I know I'd find >> them really handy. > > CREATE IF NOT EXISTS has been proposed and rejected before, more than > once. Please see the archives. Search for CINE to find the discussions. This is a good place to start: http://archives.postgresql.org/pgsql-hackers/2009-05/msg00252.php Despite Tom's assertions to the contrary, I am unable to find a clear consensus against this feature in the archives, and still think it would be useful. MySQL and SQLite both support it, at least in the specific case of CREATE TABLE IF NOT EXISTS. But I've exhausted my quota of beating my head against a brick wall on this issue. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Updating column on row update
On Sun, Nov 22, 2009 at 11:38 PM, Tom Lane wrote: > Robert Haas writes: >> On Sun, Nov 22, 2009 at 6:51 PM, Tom Lane wrote: >>> CREATE IF NOT EXISTS has been proposed and rejected before, more than >>> once. Please see the archives. > >> Search for CINE to find the discussions. This is a good place to start: >> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00252.php > >> Despite Tom's assertions to the contrary, I am unable to find a clear >> consensus against this feature in the archives, > > I think you didn't look back far enough --- that issue was settled years > ago. IIRC the killer argument is that after CINE you do not know the > state of the object: it exists, yes, but what properties has it got? > If it already existed then it's still got its old definition, which > might or might not be what you're expecting. > > CREATE OR REPLACE has got far safer semantics from the viewpoint of a > script that wants to bull through without having any actual error > handling (which is more or less the scenario we are arguing here, no?) > After successful execution of the command you know exactly what > properties the object has got. Sure. I think that CINE only makes sense for objects for which COR can't be implemented - things that have internal substructure, like tables or tablespaces. I agree that there are pitfalls for the unwary but I still think it's better than nothing. I understand that you disagree. > Whether it would be sensible to have CREATE OR REPLACE semantics for a > language is something I'm not very clear on. It seems like changing any > of the properties of a pg_language entry could be rather fatal from the > viewpoint of an existing function using the language. > > [ thinks for awhile... ] Actually, CREATE LANGUAGE is unique among > creation commands in that the common cases have no parameters, at least > not since we added pg_pltemplate. So you could imagine defining CINE > for a language as disallowing any parameters and having these semantics: > * language not present -> create from template > * language present, matches template -> OK, do nothing > * language present, does not match template -> report error > This would meet the objection of not being sure what the state is > after successful execution of the command. It doesn't scale to any > other object type, but is it worth doing for this one type? CREATE OR REPLACE seems like a better fit in this case. For example, it seems plausible that someone might want to add an inline handler to a procedural language that didn't have one without dropping and recreating the language. Even changing the call handler seems like it could be potentially useful in an upgrade scenario. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Updating column on row update
On Tue, Nov 24, 2009 at 12:28 PM, Tom Lane wrote: > "Kevin Grittner" writes: >> So we're conceding that this is a valid need and people will now have >> a way to meet it. Is the argument against having CINE syntax that it >> would be more prone to error than the above, or that the code would be >> so large and complex as to create a maintenance burden? > > The argument against CINE is that it's unsafe. The fragment proposed > by Andrew is no safer, of course, but it could be made safe by adding > additional checks that the properties of the existing object are what > the script expects. So in principle that's an acceptable approach, > whereas CINE will never be safe. Well, there can be methods extrinsic to the system for controlling this sort of thing. For example, I can provide a script, using CINE, that will either install version 2 of my app into some database or that will upgrade an existing version 1 installation to version 2. It's true that if someone has taken the version-1 schema and made manual modifications to it, then things might blow up. But, I can tell people that they shouldn't do that, or the upgrade script might break. If they do and it does then they get to keep both pieces. Even if I do the whole thing in PL/pgsql, I'm still not going to check for every stupid thing someone might have done to break the schema... I think the cat is already out of the bag on this one, and it's just a matter of whether we're willing to provide some convenient syntax or leave people to hand-code it. > But actually I thought we had more or less concluded that CREATE OR > REPLACE LANGUAGE would be acceptable (perhaps only if it's given > without any extra args?). I'm not sure there's any value in that restriction - seems more confusing than helpful. > Or for that matter there seems to be enough > opinion on the side of just installing plpgsql by default. CINE is > a markedly inferior alternative to either of those. For languages, yes. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] Updating column on row update
On Tue, Nov 24, 2009 at 2:07 PM, Kevin Grittner wrote: > Tom Lane wrote: > >> If it did so, that would be outside the apparent meaning of the >> command, which is to do nothing if an object of that name exists. >> That's why we've gone with CREATE OR REPLACE instead. > > I think that "fail on existence of an object conflicting with given > definition" is behavior which could be documented and rates fairly > low on my astonishment scale. (I can't speak for anyone else.) I think CINE should create the object if it does not exist and otherwise do nothing. It might be useful to have some kind of consistency-checking behavior, but it would probably be more useful if decoupled from CINE, and in any case, that's not what "CREATE IF NOT EXISTS" means to me. > I am skeptical that, in the absence of built-in support for checking > the existing object against the supplied definition, people would > generally go any further than Andrew's example. When they did, I'm > skeptical about how often they would get the details exactly right. Bingo. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
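For completeness, the sort of hand-rolled substitute people write in the absence of CINE, sketched here for a table (uses a DO block, so 9.0 or later; all names hypothetical):

    DO $$
    BEGIN
        IF NOT EXISTS (SELECT 1
                       FROM pg_class c
                       JOIN pg_namespace n ON n.oid = c.relnamespace
                       WHERE c.relname = 'app_settings'
                         AND n.nspname = 'public') THEN
            CREATE TABLE public.app_settings (key text PRIMARY KEY, value text);
        END IF;
    END
    $$;
    -- note: if the table already exists, this says nothing about whether it has
    -- the expected columns, which is exactly the objection discussed above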
Re: [GENERAL] [HACKERS] Fwd: psql+krb5
2009/11/30 rahimeh khodadadi : > > > -- Forwarded message -- > From: rahimeh khodadadi > Date: 2009/11/29 > Subject: Re: psql+krb5 > To: Denis Feklushkin Please review the guidelines for reporting a problem, which you can find here: http://wiki.postgresql.org/wiki/Guide_to_reporting_problems It seems to me that you've done the exact opposite of nearly everything suggested there, starting with cross-posting your email to four mailing lists at least three of which are irrelevant to the problem that you're attempting to solve. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] Fwd: psql+krb5
On Tue, Dec 1, 2009 at 11:26 AM, Scott Marlowe wrote: > Except that he posted a month ago and got no answers... Gee, I wonder why. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] [HACKERS] Suggestion for an option
2009/1/22 Informatica-Cooperativa Cnel. Oviedo : > Good morning, everyone, > > I'm a postgres user from Paraguay, writing to ask > about the possibility of including the following statement in a future > version (use of an alias in the HAVING condition): > > > SELECT id, sum(salario) as SumaSalario > FROM salarios > GROUP BY id > HAVING SumaSalario>500; I've wished for that syntax once or twice myself, but I'm assuming there's a reason we haven't implemented it? Part of the problem is it's inherently ambiguous if salarios happens to contain a column called sumasalario, which is a problem that seems to arise for me fairly regularly in practice. Still, it would be nice for WHERE/GROUP BY/HAVING clauses to have an explicit way to reference "the target list column called foo". ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
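The portable spellings, for reference, using the schema from the quoted message:

    -- repeat the aggregate in HAVING:
    SELECT id, sum(salario) AS sumasalario
    FROM salarios
    GROUP BY id
    HAVING sum(salario) > 500;

    -- or wrap the aggregation so the alias is in scope:
    SELECT * FROM (
        SELECT id, sum(salario) AS sumasalario
        FROM salarios
        GROUP BY id
    ) s
    WHERE sumasalario > 500;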
Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!
On Mon, Jan 30, 2012 at 11:18 PM, Tom Lane wrote: > I don't recall that we thought very hard about what should happen when > pg_dump switches are used to produce a selective dump, but ISTM > reasonable that if it's "user data" then it should be dumped only if > data in a regular user table would be. Yep. > What's not apparent to me is whether there's an argument for doing more > than that. It strikes me that the current design is not very friendly > towards the idea of an extension that creates a table that's meant > solely to hold user data --- you'd have to mark it as "config" which > seems a bit unfortunate terminology for that case. Is it important to > do something about that, and if so what? Is this anything more than a naming problem? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
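For context, the marking under discussion is done from an extension's install script, roughly like this (9.1 and later; names hypothetical):

    CREATE TABLE my_ext_data (id int, payload text);
    -- tell pg_dump to dump this extension-owned table's contents as user data;
    -- the second argument is an optional WHERE condition ('' means all rows):
    SELECT pg_catalog.pg_extension_config_dump('my_ext_data', '');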
Re: [BUGS] [GENERAL] Altering a table with a rowtype column
On Wed, Mar 7, 2012 at 3:49 PM, Merlin Moncure wrote: > On Wed, Mar 7, 2012 at 2:31 PM, Tom Lane wrote: >> Merlin Moncure writes: >>> On Wed, Mar 7, 2012 at 11:45 AM, Mike Blackwell >>> wrote: >>>> alter table a add column even_more_stuff boolean not null default false; >> >>> aha! that's not what you posted last time. you appended 'not null >>> default false'; which inexplicably breaks the ALTER. >> >>> try this: >>> ALTER TABLE a ADD COLUMN even_more_stuff text not null; >>> ALTER TABLE a ALTER even_more_stuff set default false; >>> ALTER TABLE a DROP COLUMN even_more_stuff; >>> ALTER TABLE a ADD COLUMN even_more_stuff boolean not null default false; >> >>> (this really looks like a bug in postgres, cc-ing to bugs) >> >> It is not a bug. The ALTER ADD ... DEFAULT ... form implies rewriting >> every existing tuple of the rowtype to insert a non-null value in the >> added column, and we don't have support for doing that to rowtype >> columns, only to the target table and descendants. > > I'm not buying that..it implies no such thing. In particular, for > table-as-rowtype columns, there's no way that I can see to have > default values be generated. So why does it follow that the dependent > table has to be rewritten? Column constraints are not enforced on the > rowtype, so it follows that default shouldn't be either considering > there's no way to get the default to fire. Composite type (or table > based composite) defaults are applied to the composite as a whole, not > to specific fields. I think Tom's correct about what the right behavior would be if composite types supported defaults, but they don't, never have, and maybe never will. I had a previous argument about this with Tom, and lost, though I am not sure that anyone other than Tom thinks that the current behavior is for the best. But see commits a06e41deebdf74b8b5109329dc75b2e9d9057962 and a40b1e0bf32b1da46c1baa9bc7da87f207cd37d8. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [BUGS] [GENERAL] Altering a table with a rowtype column
On Tue, Mar 20, 2012 at 2:48 PM, Tom Lane wrote: >> I think Tom's correct about what the right behavior would be if >> composite types supported defaults, but they don't, never have, and >> maybe never will. I had a previous argument about this with Tom, and >> lost, though I am not sure that anyone other than Tom thinks that the >> current behavior is for the best. > > Um, did I say I thought it was for the best? I thought I said we don't > have support for doing better. > > If we are willing to legislate that column defaults are not and never > will be applied to composite types, then I think Merlin might be right > that we could just let an ALTER ADD with DEFAULT ignore the existence of > composite columns. I tend to think that's exactly what we should do, and it's what that patch did, although as you point out my commit message was the product of confused thinking. > I'd always figured that we'd want to try to fix that > omission eventually, though. It's mildly tempting, but as Merlin points out, it's hard to know exactly when you'd apply those rules. We talked a while back about domains with NOT NULL constraints; if someone does a left join with a domain-typed column on the outer side, what are you going to put there if you don't put NULL? This case seems somewhat similar. Defaults make sense when applied to table columns, because the semantics are clear: columns not explicitly mentioned get their default value if any, else NULL. But if we rule that a composite type with no default gets the composite type's default values for each column, then we're overriding the general SQL presumption that unspecified columns are NULL. And similarly for temps created by uninitialized variables or, worse, LEFT JOINs. In languages like C++ or even Perl, there's always a very clear notion of when an object gets created, and constructors and so on run at that time. Defaults logically should run at the same time that a constructor would, but that concept doesn't really exist in SQL, which is seemingly deliberately quite murky about when values spring into existence. Does the SQL standard say anything on this topic? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Tue, Mar 31, 2009 at 10:44 AM, Greg Stark wrote: > On Tue, Mar 31, 2009 at 3:42 PM, Sam Mason wrote: >> >> string_to_array('',',')::INT[] => invalid input syntax for integer: "" > > Oof. That's a good point. +1. I find this argument much more compelling than anything else that's been offered up so far. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Wed, Apr 1, 2009 at 12:52 PM, David E. Wheeler wrote: > Well, I'd just point out that the return value of string_to_array() is > text[]. Thus, this is not a problem with string_to_array(), but a casting > problem from text[] to int[]. Making string_to_array() return a NULL for > this case to make casting simpler is addressing the problem in the wrong > place, IMHO. If I want to do this in Perl, for example, I'd do something > like this: > > my @ints = grep { defined $_ && $_ ne '' } split ',', $string; I've written code that looks a whole lot like this myself, but there's no easy way to do that in SQL. SQL, in particular, lacks closures, so grep {} and map {} don't exist. I really, really wish they did, but I believe that our type system is too woefully pathetic to be up to the job. So it seems to me that arguing that SQL (which lacks those primitives) should match Perl (which has them) isn't really getting us anywhere. > my @ints = map { $_ || 0 } split ',', $string; > > This ensures that I get the proper number of records in the example of > something like '1,2,,4'. I can't see that there's any way to do this in SQL regardless of how we define this operation. ...Robert -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Wed, Apr 1, 2009 at 1:05 PM, justin wrote: > I'm still a holdout. We are taking a string and putting it into an array based > on a delimiter. That is very simple and straightforward. Yet many argue that > if we want to cast this into another data type the function should deal with > it in limited cases. > > string_to_array('',',')::INT[] works as proposed > > But > string_to_array(',,,', ',' )::INT[] Fails > or > string_to_array('1,2,,4', ',' )::INT[] Fails. But... but... those aren't comma-separated lists of integers. If they were, it would work. string_to_array('cow,dog,horse', ',')::INT[] will also fail. If you take 0 items of any type whatsoever and join them together with commas, you will get the empty string. It is also true that if you join 1 item together with commas, you will get that item back, and if that item is the empty string, you will now have the empty string. I think it's better to worry more about the first case because it applies to any type at all, whereas the latter case ONLY applies in situations where the empty string is a potentially legal value. ...Robert
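The round trip described above is easy to see with array_to_string, the inverse operation (a sketch):

SELECT array_to_string(ARRAY[]::text[], ',');  -- '' (zero items joined)
SELECT array_to_string(ARRAY[''], ',');        -- '' (one empty item joined)

Both calls produce the empty string, so string_to_array('', ',') has no way to tell the two cases apart.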
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Wed, Apr 1, 2009 at 5:22 PM, Tom Lane wrote: > Or we could stick to the current behavior and say "use COALESCE() to > resolve the ambiguity, if you need to". If there's no consensus on changing the behavior, it's probably better to be backward compatible than not. ...Robert
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Thu, Apr 2, 2009 at 12:17 PM, David E. Wheeler wrote: > On Apr 1, 2009, at 2:22 PM, Tom Lane wrote: > >> Another way to state the point is that we can offer people a choice of >> two limitations: string_to_array doesn't work for zero-length lists, >> or string_to_array doesn't work for empty strings (except most of the >> time, it does). The former is sounding less likely to bite people >> unexpectedly. > > Right, very well put. > >> Or we could stick to the current behavior and say "use COALESCE() to >> resolve the ambiguity, if you need to". > > Steve has a point that leaving it as-is leaves it impossible to tell the > difference between string_to_array(NULL, ',') and string_to_array('', ','). > The former properly handles an unknown value, while the latter, where '' is > a known value, seems weird to be returning NULL. *shrug* CASE WHEN blah IS NOT NULL THEN string_to_array(blah, ',') END More and more I'm leaning toward leaving this alone. No matter how you define it, the behavior can be changed to whichever alternative you prefer with a 1-line CASE expression. ...Robert
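Concretely, under the historical behavior (where string_to_array('', ',') yields NULL), either camp can get its preferred result with a one-line wrapper; a sketch:

-- camp 1: treat '' as a zero-element list
SELECT COALESCE(string_to_array('', ','), '{}'::text[]);  -- {}
-- camp 2: treat '' as a list containing one empty string
SELECT COALESCE(string_to_array('', ','), ARRAY['']);     -- {""}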
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Thu, Apr 2, 2009 at 12:10 PM, David E. Wheeler wrote: > On Apr 1, 2009, at 12:19 PM, Robert Haas wrote: > >>> my @ints = map { $_ || 0 } split ',', $string; >>> >>> This ensures that I get the proper number of records in the example of >>> something like '1,2,,4'. >> >> I can't see that there's any way to do this in SQL regardless of how >> we define this operation. > > It's easy enough to write a function to do it: > > CREATE OR REPLACE FUNCTION trim_blanks (anyarray) RETURNS anyarray AS $$ > SELECT ARRAY( > SELECT CASE WHEN $1[i] IS NULL OR $1[i] = '' THEN '0' ELSE $1[i] END > FROM generate_series(1, array_upper($1, 1)) s(i) > ORDER BY i > ); > $$ LANGUAGE SQL IMMUTABLE; Ah! Thanks for the tip. ...Robert
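For the archives, a usage sketch (assuming text input; note that the function substitutes '0' for NULL or empty elements rather than trimming them):

SELECT trim_blanks(string_to_array('1,2,,4', ','))::int[];  -- {1,2,0,4}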
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Thu, Apr 2, 2009 at 2:04 PM, Tom Lane wrote: > Right at the moment, if we stick with the historical definition > of the function, *both* camps have to write out their choice of > the above. Seems like this is the worst of all possible worlds. > We should probably pick one or the other. ISTM there are three camps. ...Robert
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Thu, Apr 2, 2009 at 2:18 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Apr 2, 2009 at 2:04 PM, Tom Lane wrote: >>> Right at the moment, if we stick with the historical definition >>> of the function, *both* camps have to write out their choice of >>> the above. Seems like this is the worst of all possible worlds. >>> We should probably pick one or the other. > >> ISTM there are three camps. > > If there's a camp that actually *wants* a NULL result for this case, > I missed the reasoning. AFAICS we can either say that every application > is going to have to put in a CASE wrapper around this function, or say > that we'll make it do the right thing for some of them and the rest have > to put the same wrapper around it. So that we don't break existing apps because of an issue that is trivial to work around. ...Robert
Re: [HACKERS] [GENERAL] string_to_array with empty input
On Thu, Apr 2, 2009 at 2:50 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Apr 2, 2009 at 2:18 PM, Tom Lane wrote: >>> If there's a camp that actually *wants* a NULL result for this case, >>> I missed the reasoning. > >> So that we don't break existing apps because of an issue that is >> trivial to work around. > > We would only be breaking them if a NULL result were actually the > correct behavior for the application's requirements, which seems > a bit unlikely. But that's completely untrue. If the most useful behavior is either ARRAY[''] or ARRAY[], then there are presumably lots and lots of people out there who have apps that do COALESCE(string_to_array(...), something). Whichever way you change string_to_array() will break all of the people doing this who wanted the opposite behavior for no good reason. ...Robert
[GENERAL] complex referential integrity constraints
So, I have the following problem. Suppose you have two kinds of animals, sheep and wolves. Since they have very similar properties, you create a single table to hold both kinds of animals, and an animal_type table to specify the type of each animal: CREATE TABLE animal_type ( id integer not null, name varchar(80) not null, primary key (id) ); INSERT INTO animal_type VALUES (1, 'Sheep'); INSERT INTO animal_type VALUES (2, 'Wolf'); CREATE TABLE animal ( id serial, type_id integer not null references animal_type (id), name varchar(80) not null, age integer not null, weight_in_pounds integer not null, primary key (id) ); The animal_type table is more or less written in stone, but the animal table will be updated frequently. Now, let's suppose that we want to keep track of all of the cases where one animal is mauled by another animal: CREATE TABLE mauling ( id serial, attacker_id integer not null references animal (id), victim_id integer not null references animal (id), attack_time timestamp not null, primary key (id) ); The problem with this is that I have a very unsettled feeling about the foreign key constraints on this table. The victim_id constraint is fine, but the attacker_id constraint is really inadequate, because the attacker CAN NEVER BE A SHEEP. I really want a way to write a constraint that says that the attacker must be an animal, but specifically, a wolf. It would be really nice to be able to write: FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id) Or: CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2; -- and then FOREIGN KEY (attacker_id) REFERENCES INDEX wolves ...but that's entirely speculative syntax. I don't think there's any easy way to do this. (Please tell me I'm wrong.) The problem really comes in when people start modifying the animal table. Every once in a while we have a case where we record something as a wolf, but it turns out to have been a sheep in wolf's clothing. In this case, we want to do something like this: UPDATE animal SET type_id = 1 WHERE id = 572; HOWEVER, this operation MUST NOT be allowed if it turns out there is a row in the mauling table where attacker_id = 572, because that would violate my integrity constraint that says that sheep do not maul. Any suggestions? I've thought about creating rules or triggers to check the conditions, but I'm scared that this could either (a) get really complicated when there are a lot more tables and constraints involved or (b) introduce race conditions. Thanks, ...Robert
Re: [GENERAL] complex referential integrity constraints
The ability to make a foreign key reference a specific partial unique index (rather than just a set of columns that have a unique index) would solve many problems of this type. As another example, you might have a table where one of the columns is "is_deleted boolean not null". By creating a partial unique index on the primary key of that table "WHERE NOT is_deleted" and then pointing a foreign key at it, you could enforce that each row in the child table references a parent who isn't deleted (an approximation of this is sketched after this message). However, this would break down when there's more than one intermediate step involved. For example, if you have: CREATE TABLE animal_type ( id serial, name varchar(60) not null, is_attacker boolean not null, primary key (id) ); CREATE TABLE animal ( id serial, type_id integer not null references animal_type (id), name varchar(60) not null, primary key (id) ); CREATE TABLE mauling ( id serial, attacker_id integer not null references animal (id), victim_id integer not null references animal (id), attack_time timestamp with time zone not null, primary key (id) ); It would be easy to enforce the constraint that the attacker must be an animal of some specific type, but difficult to enforce the constraint that the attacker must be an animal whose type, in turn, has a true value for is_attacker. The best idea that I can think of right now to handle multiple levels of tables is to allow FOREIGN KEY constraints to reference a VIEW, rather than a table. Then you could say: CREATE VIEW attackers AS SELECT a.id FROM animal a, animal_type t WHERE a.type_id = t.id AND t.is_attacker; ...and then FOREIGN KEY (attacker_id) REFERENCES attackers (id). This syntax would solve a number of other problems as well, such as requiring that some record in table A has a parent either in table P or in table Q. However, I think this would probably require implementing some kind of materialized view so that you could actually build an index on the view, and that opens up a whole new can of worms, because it's not very difficult to define a view that is costly to update incrementally. The problem is really that there is a pretty large gap between writing a foreign key constraint, which is trivial, and enforcing a constraint using triggers, which is quite a bit more complex (and therefore, easy to screw up), because the foreign key automatically handles all the cases (insert into child table, update of child table, update of parent table, delete from parent table) whereas with triggers you have to address each of those cases individually. Unfortunately, something tells me that implementing a more powerful system for foreign key constraints is a non-trivial project, however useful it would be. Still, I'd love to see it in the TODO file, too. ...Robert -Original Message- From: Joris Dobbelsteen [mailto:[EMAIL PROTECTED] Sent: Thursday, February 22, 2007 8:03 AM To: Robert Haas; elein Cc: pgsql-general@postgresql.org Subject: RE: [GENERAL] complex referential integrity constraints I partially agree: If people CAN do stupid things, they are 'clever' enough to find a way to actually do it. I've seen them destroy things, by just using a system in a way it was not intended. They effectively found a way to blow away the very thing that part was designed for. But indeed, it's a lot of work, especially if the number of tables that must be referenced increases. I'm a strong supporter for ensuring consistency. Postgres has what it takes to do the job, but it doesn't make my life a lot easier. But it seems to be as good as it gets today...
Perhaps we should rather define a 'database' constraint along the lines of: "For every mauling, the attacking animal must be of the attacker type" (in a computer-understandable manner). From set theory this should be possible without too many problems; however, doing so efficiently might be slightly harder. This might be a fun project and useful for the TODO list. At least it makes it a lot easier (and more maintainable) to enforce database-wide constraints. - Joris >-Original Message- >From: Robert Haas [mailto:[EMAIL PROTECTED] >Sent: Wednesday, February 21, 2007 3:37 >To: Joris Dobbelsteen; elein >Cc: pgsql-general@postgresql.org >Subject: RE: [GENERAL] complex referential integrity constraints > >Yes, exactly. And while you might not care about all of those >(e.g. I care about the first two but am not worried about the >third one because I'm the only one who will ever update that >table), writing multiple triggers to enforce each constraint >of this type quickly gets old if there are even a few of them.
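The is_deleted example above can be approximated today, without the proposed feature, by carrying the flag into the child table; a sketch (the names are made up, not from the thread):

CREATE TABLE parent (
    id         serial PRIMARY KEY,
    is_deleted boolean NOT NULL DEFAULT false,
    UNIQUE (id, is_deleted)            -- gives the composite FK below a target
);

CREATE TABLE child (
    id                serial PRIMARY KEY,
    parent_id         integer NOT NULL,
    parent_is_deleted boolean NOT NULL DEFAULT false CHECK (NOT parent_is_deleted),
    FOREIGN KEY (parent_id, parent_is_deleted) REFERENCES parent (id, is_deleted)
);

-- Soft-deleting a referenced parent now fails: the (id, false) key the child
-- row points at ceases to exist, so the foreign key rejects the update.
-- UPDATE parent SET is_deleted = true WHERE id = 1;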
Re: [GENERAL] complex referential integrity constraints
The idea here is that a wolf can attack a sheep, or a wolf can attack another wolf, but sheep can't attack anything. I suppose I could list each wolf in both the predator and prey tables, but that seems a bit duplicative (and causes other problems). ...Robert -Original Message- From: David Fetter [mailto:[EMAIL PROTECTED] Sent: Monday, February 19, 2007 1:04 PM To: Robert Haas Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints On Fri, Feb 16, 2007 at 09:58:56AM -0500, Robert Haas wrote: > So, I have the following problem. > > Suppose you have two kinds of animals, sheep and wolves. Since they > have very similar properties, you create a single table to hold both > kinds of animals, and an animal_type table to specify the type of each > animal: > > CREATE TABLE animal_type ( > id integer not null, > name varchar(80) not null, > primary key (id) > ); > INSERT INTO animal_type VALUES (1, 'Sheep'); > INSERT INTO animal_type VALUES (2, 'Wolf'); > > CREATE TABLE animal ( > id serial, > type_id integer not null references animal_type (id), > name varchar(80) not null, > age integer not null, > weight_in_pounds integer not null, > primary key (id) > ); > > The animal_type table is more or less written in stone, but the animal > table will be updated frequently. Now, let's suppose that we want to > keep track of all of the cases where one animal is mauled by another > animal: > > CREATE TABLE mauling ( > id serial, > attacker_id integer not null references animal (id), > victim_id integer not null references animal (id), > attack_time timestamp not null, > primary key (id) > ); > > The problem with this is that I have a very unsettled feeling about the > foreign key constraints on this table. The victim_id constraint is > fine, but the attacker_id constraint is really inadequate, because the > attacker CAN NEVER BE A SHEEP. I really want a way to write a > constraint that says that the attacker must be an animal, but > specifically, a wolf. > > It would be really nice to be able to write: > > FOREIGN KEY (attacker_id, 2) REFERENCES animal (id, type_id) > > Or: > > CREATE UNIQUE INDEX wolves ON animal (id) WHERE type_id = 2; > -- and then > FOREIGN KEY (attacker_id) REFERENCES INDEX wolves > > ...but that's entirely speculative syntax. I don't think there's any > easy way to do this. (Please tell me I'm wrong.) > > The problem really comes in when people start modifying the animal > table. Every once in a while we have a case where we record something > as a wolf, but it turns out to have been a sheep in wolf's clothing. In > this case, we want to do something like this: > > UPDATE animal SET type_id = 1 WHERE id = 572; > > HOWEVER, this operation MUST NOT be allowed if it turns out there is a > row in the mauling table where attacker_id = 572, because that would > violate my integrity constraint that says that sheep do not maul. > > Any suggestions? I've thought about creating rules or triggers to check > the conditions, but I'm scared that this could either (a) get really > complicated when there are a lot more tables and constraints involved or > (b) introduce race conditions. > > Thanks, > > ...Robert I'd do something like this: CREATE TABLE animal_type ( animal_name TEXT PRIMARY KEY, CHECK(animal_name = trim(animal_name)) ); /* Only one of {Wolf,wolf} can be in the table.
*/ CREATE UNIQUE INDEX just_one_animal_name ON animal_type(LOWER(animal_name)); CREATE TABLE predator ( animal_name TEXT NOT NULL REFERENCES animal_type(animal_name) ON DELETE CASCADE, PRIMARY KEY(animal_name) ); CREATE TABLE prey ( animal_name TEXT NOT NULL REFERENCES animal_type(animal_name) ON DELETE CASCADE, PRIMARY KEY(animal_name) ); CREATE TABLE mauling ( id SERIAL PRIMARY KEY, attacker_id TEXT NOT NULL REFERENCES predator (animal_name), victim_id TEXT NOT NULL REFERENCES prey (animal_name), attack_time TIMESTAMP WITH TIME ZONE NOT NULL ); Cheers, D -- David Fetter <[EMAIL PROTECTED]> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
Re: [GENERAL] complex referential integrity constraints
Yes, exactly. And while you might not care about all of those (e.g. I care about the first two but am not worried about the third one because I'm the only one who will ever update that table), writing multiple triggers to enforce each constraint of this type quickly gets old if there are even a few of them. It is exponentially harder to write a constraint of this type than it is to write a simple foreign key constraint. ...Robert -Original Message- From: Joris Dobbelsteen [mailto:[EMAIL PROTECTED] Sent: Monday, February 19, 2007 5:59 AM To: elein; Robert Haas Cc: pgsql-general@postgresql.org Subject: RE: [GENERAL] complex referential integrity constraints >Why don't you add a field in animal_types that is boolean mauler. >Then you can add a trigger on the mauling table to raise an >error when the attacker_id is not an animal of a mauler type. This is only partial. You need a lot more triggers to guarantee the constraints are enforced. Precisely, you need to validate: * mauling on insert/update of attacker_id * animal on update of type_id * animal_type on update of your property Of course you need to think about the MVCC model, such that: Transaction 1 executes INSERT INTO mauling VALUES ('someattacker'), Transaction 2 executes UPDATE animal_type SET mauler = false WHERE name = 'someattacker', such that both transactions happen in parallel. This is perfectly possible and will make it possible to violate the constraint, UNLESS locking of the tuples is done correctly. These constraints are not trivial to implement (unfortunately). It would be great if they were. - Joris
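A sketch of the tuple locking Joris describes, covering the insert path only (this assumes the boolean mauler column proposed above; per his list, triggers on animal and animal_type would still be needed for the other two cases):

CREATE OR REPLACE FUNCTION check_attacker() RETURNS trigger AS $$
BEGIN
    -- FOR SHARE locks the animal and animal_type rows, so a concurrent
    -- UPDATE animal_type SET mauler = false cannot slip past this check.
    PERFORM 1
    FROM animal a
    JOIN animal_type t ON t.id = a.type_id
    WHERE a.id = NEW.attacker_id AND t.mauler
    FOR SHARE;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'attacker % is not a mauler', NEW.attacker_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mauling_attacker_check
    BEFORE INSERT OR UPDATE ON mauling
    FOR EACH ROW EXECUTE PROCEDURE check_attacker();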
Re: [GENERAL] complex referential integrity constraints
Actually, what would be really nice is if there were just a button I could push that would make all of my data automatically correct. Can that go into 8.3? Thanks, ...Robert -Original Message- From: Alvaro Herrera [mailto:[EMAIL PROTECTED] Sent: Friday, February 23, 2007 9:35 AM To: Alban Hertroys Cc: Robert Haas; David Fetter; pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints Alban Hertroys wrote: > Robert Haas wrote: > > The idea here is that a wolf can attack a sheep, or a wolf can attack > > another wolf, but sheep can't attack anything. I suppose I could list > > each wolf in both the predator and prey tables, but that seems a bit > > duplicative (and causes other problems). > > I'm quite certain a wolf is much more likely to attack a sheep than to > attack another wolf, and even more unlikely to attack for example a > lion. It seems to me that just the fact that it can isn't enough > information. > > It looks like you need "weighted constraints"; there's 0 chance that a > sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep, > >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact > numbers will vary, and I have absolutely no idea what they would be > like. It probably requires some kind of ranking system that adjusts > according to the known animals and their likelihood to attack each other. Depending on what you're modelling, even this could be too simple -- for example, while a single wolf is unlikely to attack a lion, a pack of wolves have a lot more probability of doing so. Do you keep packs of wolves in your barn? If so, watch your lions. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [GENERAL] complex referential integrity constraints
I don't understand what a weighted constraint would mean. Either the attacker_id can be a wolf, or it can't. Knowing that it is only 1% likely over the long haul is insufficient to disallow any particular transaction. It's certainly true that the constraint as stated is insufficient to guarantee that the table will contain good data. For example if we looked at the maulings table and wolves were always mauling other wolves but never sheep, we would naturally want to dig into that a little more and find out why they weren't picking easier targets. But this is neither here nor there, because NO constraint (foreign key, check, or what have you) is ever strong enough to ensure that the data in a table is completely clean. At least as I understand it, the purpose of these constraints is to allow us to write application code which relies on certain basic invariants being true, i.e. so that we can join animal to animal_type and not have to worry about rows dropping out because some animals had an invalid type, or rows getting added because there are two animal_type records with the same id. Besides, the problem as stated is a proxy for some real problem which is part of a non-zoological project the details of which (a) would take too long to explain and (b) should probably not be posted to a public mailing list. :-) So far, the best ideas I've seen have been: (a) Tom Lane's idea of denormalizing by copying the animal type column into the maulings table with ON UPDATE CASCADE, and then adding a CHECK constraint on that column (a sketch of this appears after this message), and (b) Creating a separate table called "wolf" and some triggers that ensure that the wolf table will always contain the subset of IDs from the animal table where the type_id is that of a wolf, with a foreign key constraint from that id column back to animal with "on delete cascade". This ensures that nobody can delete a wolf or change it into a sheep if it has maulings, but permits it otherwise. For what it's worth, I've adopted the latter solution for the present. Unfortunately, it's too much work to do it everywhere it would be nice to have, so I'm just doing it in some really critical cases and hoping that the others don't break. Thanks, ...Robert -Original Message- From: Alban Hertroys [mailto:[EMAIL PROTECTED] Sent: Friday, February 23, 2007 4:02 AM To: Robert Haas Cc: David Fetter; pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints Robert Haas wrote: > The idea here is that a wolf can attack a sheep, or a wolf can attack > another wolf, but sheep can't attack anything. I suppose I could list > each wolf in both the predator and prey tables, but that seems a bit > duplicative (and causes other problems). > > ...Robert I'm quite certain a wolf is much more likely to attack a sheep than to attack another wolf, and even more unlikely to attack for example a lion. It seems to me that just the fact that it can isn't enough information. It looks like you need "weighted constraints"; there's 0 chance that a sheep attacks a wolf, but there's >0 chance that a wolf attacks a sheep, >0 chance it attacks a wolf and >0 chance it attacks a lion. The exact numbers will vary, and I have absolutely no idea what they would be like. It probably requires some kind of ranking system that adjusts according to the known animals and their likelihood to attack each other. I'm pretty sure you can't get this done without defining some triggers. -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v.
T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
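For reference, option (a) can be written directly against the schema from the start of the thread; a sketch (type_id = 2 hardcodes 'Wolf', as in the original message):

ALTER TABLE animal ADD UNIQUE (id, type_id);   -- target for the composite FK

ALTER TABLE mauling
    ADD COLUMN attacker_type_id integer NOT NULL DEFAULT 2
        CHECK (attacker_type_id = 2);

ALTER TABLE mauling
    ADD FOREIGN KEY (attacker_id, attacker_type_id)
        REFERENCES animal (id, type_id) ON UPDATE CASCADE;

-- Turning wolf 572 into a sheep now fails if it has maulings: the cascade
-- tries to set mauling.attacker_type_id = 1, and the CHECK rejects it.
-- UPDATE animal SET type_id = 1 WHERE id = 572;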
Re: [GENERAL] complex referential integrity constraints
I sort of think that this kind of stuff belongs in a separate table somehow. For example in this case I would want to: CREATE TABLE attack_probability ( attacker_type_id integer not null references animal_type (id), victim_type_id integer not null references animal_type (id), percent_chance integer not null, primary key (attacker_type_id, victim_type_id) ); ...and then declare that the CONSTRAINT on the "maulings" table is that if I look up the types of the attacker and victim, that pair of types must be present in the attack_probability table with percent_chance > 0 (a trigger-based sketch of this follows below). I guess my point here is that I think in your proposal you are letting domain-specific data creep into the schema. It's the job of the schema to enforce integrity constraints, but not to know specifically how things work. The fact (if it is a fact) that the chance of one type of animal attacking another can be captured as a probability (rather than, say, one probability for the day time and another probability for the night time, or one probability for each specific animal rather than each animal type, or I don't know what's going on and want to infer the probabilities from the data after I've gathered it) is domain-specific. I don't really want the information about attack probabilities (or whatever) to be something that's hardcoded in my schema; I want it to be part of the data in the schema, with the schema enforcing such constraints on that data as I may see fit to define. I don't want to have to select things out of system tables to find out attack probabilities. Also, as a practical matter, I suspect that such a setup would result in an absurdly complex constraint language. ...Robert -Original Message- From: Alban Hertroys [mailto:[EMAIL PROTECTED] Sent: Monday, February 26, 2007 4:15 AM To: Robert Haas Cc: David Fetter; pgsql-general@postgresql.org Subject: Re: [GENERAL] complex referential integrity constraints Robert Haas wrote: > I don't understand what a weighted constraint would mean. Either the > attacker_id can be a wolf, or it can't. Knowing that it is only 1% > likely over the long haul is insufficient to disallow any particular > transaction. Basically I suggested combining the constraint with a probability. If the probability of one animal attacking another is 0% it can't attack the other animal - that's a strict constraint. The other way around it can, and you'll also immediately know how likely that is to happen. An added bonus is that you can generalize certain constraints. If certain animals are less than - say 25% - likely to attack certain other animals you could determine that the attacked animal is not in fact prey. An example would probably be wolves attacking other wolves (or predators in general). For relations that depend on an animal being prey, a constraint would be that this number be <25%. In this discussion it is also not entirely defined what attacking means. Is a ram defending his herd from wolves attacking (wolves)? I realise this all started from an analogy to a real problem, so most of this is probably not very relevant to your actual problem. No less interesting, though. -- Alban Hertroys [EMAIL PROTECTED] magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
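Since no declarative foreign key can express the lookup described above, a trigger comes closest; a sketch (assuming the schema upthread, and subject to the same locking caveats Joris raised earlier):

CREATE OR REPLACE FUNCTION check_mauling_probability() RETURNS trigger AS $$
BEGIN
    -- FOR SHARE locks the rows consulted, so concurrent updates to
    -- attack_probability or the animals' types cannot invalidate the check.
    PERFORM 1
    FROM animal atk
    JOIN animal vic ON vic.id = NEW.victim_id
    JOIN attack_probability p ON p.attacker_type_id = atk.type_id
                             AND p.victim_type_id = vic.type_id
    WHERE atk.id = NEW.attacker_id AND p.percent_chance > 0
    FOR SHARE;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'no positive attack probability for attacker % on victim %',
            NEW.attacker_id, NEW.victim_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mauling_probability_check
    BEFORE INSERT OR UPDATE ON mauling
    FOR EACH ROW EXECUTE PROCEDURE check_mauling_probability();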