Re: [BUGS] Recovery bug

2010-10-19 Thread Heikki Linnakangas

On 19.10.2010 08:51, Fujii Masao wrote:

On Tue, Oct 19, 2010 at 7:00 AM, Jeff Davis  wrote:

Do users have any expectation that they can restore a backup without
using recovery.conf by merely having the WAL segments in pg_xlog?


I would expect that to work.


What's the use case?


Creating a stand-alone backup tarball that contains both the base backup 
and all WAL files required to restore.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Recovery bug

2010-10-19 Thread Fujii Masao
On Tue, Oct 19, 2010 at 5:18 PM, Heikki Linnakangas
 wrote:
> On 19.10.2010 08:51, Fujii Masao wrote:
>>
>> On Tue, Oct 19, 2010 at 7:00 AM, Jeff Davis  wrote:
>
> Do users have any expectation that they can restore a backup without
> using recovery.conf by merely having the WAL segments in pg_xlog?

 I would expect that to work.
>>
>> What's the use case?
>
> Creating a stand-alone backup tarball that contains both the base backup and
> all WAL files required to restore.

Oh, I had forgotten that case. Thanks. I'd drop my proposal.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Recovery bug

2010-10-19 Thread Heikki Linnakangas

On 18.10.2010 01:48, Jeff Davis wrote:

On Fri, 2010-10-15 at 15:58 -0700, Jeff Davis wrote:

I don't have a fix yet, because I think it requires a little discussion.
For instance, it seems to be dangerous to assume that we're starting up
from a backup with access to the archive when it might have been a crash
of the primary system. This is obviously wrong in the case of an
automatic restart, or one with no restore_command. Fixing this issue
might also remove the annoying "If you are not restoring from a backup,
try removing..." PANIC error message.

Also, in general we should do more logging during recovery, at least the
first stages, indicating what WAL segments it's looking for to get
started, why it thinks it needs that segment (from backup or control
data), etc. Ideally we would verify that the necessary files exist (at
least the initial ones) before making permanent changes. It was pretty
painful trying to work backwards on this problem from the final
controldata (where checkpoint and prior checkpoint are the same, and
redo is before both), a crash, a PANIC, a backup_label.old, and not much
else.



Here's a proposed fix. I didn't solve the problem of determining whether
we really are restoring a backup, or if there's just a backup_label file
left around.

I did two things:
   1. If reading a checkpoint from the backup_label location, verify that
the REDO location for that checkpoint exists in addition to the
checkpoint itself. If not, elog with a FATAL immediately.


Makes sense. I wonder if we could just move the rename() after reading 
the checkpoint record?



   2. Change the error that happens when the checkpoint location
referenced in the backup_label doesn't exist to a FATAL. If it can
happen due to a normal crash, a FATAL seems more appropriate than a
PANIC.


I guess, although it's really not appropriate that the database doesn't 
recover after a crash during a base backup.



I still think it would be nice if postgres knew whether it was restoring
a backup or recovering from a crash, otherwise it's hard to
automatically recover from failures. I thought about using the presence
of recoveryRestoreCommand or PrimaryConnInfo to determine that. But it
seemed potentially dangerous if the person restoring a backup simply
forgot to set those, and then it tries restoring from the controldata
instead (which is unsafe to do during a backup).


Right, that's not good either.

One alternative is to not remove any WAL files during a base backup. The 
obvious downside is that if the backup takes a long time, you run out of 
disk space.


The fundamental problem is that by definition, a base backup is 
completely indistinguishable from the data directory in the original 
server. Or is it? We recommend that you exclude the files under pg_xlog 
from the backup. So we could create a "pg_xlog/just_kidding" file along 
with backup_label. When starting recovery, if just_kidding exists, we 
can assume that we're doing crash recovery and ignore backup_label.


Excluding pg_xlog is just a recommendation at the moment, though, so we 
would need a big warning in the docs. And some way to enforce that 
just_kidding is not included in the backup would be nice, maybe we could 
remove read-permission from it?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #5717: Domain as array of numeric/varchar does not respect limits

2010-10-19 Thread Richard Huxton

The following bug has been logged online:

Bug reference:  5717
Logged by:  Richard Huxton
Email address:  d...@archonet.com
PostgreSQL version: 9.0.1
Operating system:   linux
Description:Domain as array of numeric/varchar does not respect
limits
Details: 

Summary: you can insert numbers that are outside the numeric(n,m)
restrictions via a function's return value *iff* the numbers are elements of
an array. This does not apply to a single numeric. A similar issue applies
to varchar lengths.

The only route appears to be through the return value of an array.
Presumably the system trusts the value to be restricted to the domain when
it isn't.

The following allows (and displays) {121.} and {0.0001} in a column
defined as numeric(4,2)[1].


BEGIN;

CREATE DOMAIN mynums numeric(4,2)[1];

CREATE TEMP TABLE tt(n mynums);
CREATE TEMP TABLE tt2(n numeric[1]);

CREATE FUNCTION mul_num(n mynums) RETURNS mynums AS $$
DECLARE
n2 mynums;
i  integer;
BEGIN
n2[1] := n[1] * n[1];
RETURN n2;
END;
$$ LANGUAGE plpgsql;

INSERT INTO tt VALUES (ARRAY[1]);
SELECT * FROM tt;

\echo
\echo 'This should not work'
\echo
INSERT INTO tt SELECT mul_num(ARRAY[11]);
INSERT INTO tt SELECT mul_num(ARRAY[0.01]);
SELECT * FROM tt;

\echo
\echo 'This fails, which is what I expect'
\echo
SAVEPOINT s1;
INSERT INTO tt VALUES (ARRAY[121]);
ROLLBACK TO s1;
INSERT INTO tt2 VALUES (ARRAY[121]);
INSERT INTO tt SELECT n FROM tt2;

ROLLBACK;

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Recovery bug

2010-10-19 Thread Jeff Davis
On Tue, 2010-10-19 at 12:26 +0300, Heikki Linnakangas wrote:
> >1. If reading a checkpoint from the backup_label location, verify that
> > the REDO location for that checkpoint exists in addition to the
> > checkpoint itself. If not, elog with a FATAL immediately.
> 
> Makes sense. I wonder if we could just move the rename() after reading 
> the checkpoint record?

I assume you mean "after reading the record at the REDO location"? And
you'd need to make sure that any changes to the control data were after
reading the record at the REDO location, as well.

I, too, thought about juggling the order around, but there are quite a
few global variables so it seemed fairly fragile. I'm open to
suggestion, however.

> >2. Change the error that happens when the checkpoint location
> > referenced in the backup_label doesn't exist to a FATAL. If it can
> > happen due to a normal crash, a FATAL seems more appropriate than a
> > PANIC.
> 
> I guess, although it's really not appropriate that the database doesn't 
> recover after a crash during a base backup.

Agreed. I'm a little concerned that any fix for that might be intrusive
enough that we didn't want to back-patch it though.

> One alternative is to not remove any WAL files during a base backup. The 
> obvious downside is that if the backup takes a long time, you run out of 
> disk space.

That doesn't seem very appealing.

> The fundamental problem is that by definition, a base backup is 
> completely indistinguishable from the data directory in the original 
> server. Or is it? We recommend that you exclude the files under pg_xlog 
> from the backup. So we could create a "pg_xlog/just_kidding" file along 
> with backup_label. When starting recovery, if just_kidding exists, we 
> can assume that we're doing crash recovery and ignore backup_label.

I had some similar ideas, but I rejected them because they added room
for error into the process. One of the problems is that assuming that
we're doing normal recovery when we're doing backup recovery is more
dangerous than the other way around (setting aside this particular bug I
found, anyway). That's because the control data might start the WAL
replay too _late_, which is worse than starting it too early. 

> Excluding pg_xlog is just a recommendation at the moment, though, so we 
> would need a big warning in the docs. And some way to enforce that 
> just_kidding is not included in the backup would be nice, maybe we could 
> remove read-permission from it?

Hmm, removing the read bit would add some confidence into the process. I
like that idea better than just assuming that the user won't copy it.

I think I like this direction most, because it doesn't leave us
guessing. If the file is there then we assume normal recovery. If we
encounter recovery.conf we throw FATAL. If we encounter backup_label we
can simply remove it (perhaps with a warning that there was a crash in
the middle of a backup).

However, I do still have some lingering doubts. One is that the file
probably needs to have some content, just in case a backup tool blindly
creates the directory entry and then silently ignores the fact that it
couldn't copy the contents. Another concern is: what if they are using
some kind of filesystem mirroring tool that doesn't take consistent
snapshots (do such tools exist?)? Are there other system-level tools
that people might be using that are safe now, but would be dangerous if
they somehow got a copy of the file?

Regards,
Jeff Davis


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5716: Regression joining tables in UPDATE with composite types

2010-10-19 Thread Tom Lane
"Andrew Tipton"  writes:
> Attempting to execute an UPDATE that joins to another table where the join
> condition is comparing a composite type fails with the (presumably internal)
> error message "psql:testcase.sql:29: ERROR:  could not find pathkey item to
> sort".

Fixed, thanks for the report!

BTW ... while this is unrelated to the cause of the problem, I think
this is quite an inefficient coding technique:

> CREATE TYPE price_key AS (
> id INTEGER
> );

> CREATE FUNCTION price_key_from_table(price) RETURNS price_key AS $$
> SELECT $1.id
> $$ LANGUAGE SQL IMMUTABLE;

> CREATE FUNCTION price_key_from_input(price_input) RETURNS price_key AS $$
> SELECT $1.id
> $$ LANGUAGE SQL IMMUTABLE;

> UPDATE price ...
> WHERE price_key_from_table(price.*) = 
> price_key_from_input(input_prices.*);

Comparing composite types is probably a good two orders of magnitude
slower than comparing plain ints would be.  I'm sure that coding
technique looks cute, but you're paying through the nose for it.
Consider making price_key a simple domain over int.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Recovery bug

2010-10-19 Thread Jeff Davis
On Tue, 2010-10-19 at 09:51 -0700, Jeff Davis wrote:
> On Tue, 2010-10-19 at 12:26 +0300, Heikki Linnakangas wrote:
> > Excluding pg_xlog is just a recommendation at the moment, though, so we 
> > would need a big warning in the docs. And some way to enforce that 
> > just_kidding is not included in the backup would be nice, maybe we could 
> > remove read-permission from it?
> 
> Hmm, removing the read bit would add some confidence into the process. I
> like that idea better than just assuming that the user won't copy it.
> 
> I think I like this direction most, because it doesn't leave us
> guessing. If the file is there then we assume normal recovery. If we
> encounter recovery.conf we throw FATAL. If we encounter backup_label we
> can simply remove it (perhaps with a warning that there was a crash in
> the middle of a backup).
> 

On second thought, this doesn't sound backpatch-friendly. We should
probably put a simpler fix in first and back-patch it. Then we can do
something like your proposal for 9.1. What do you think of my proposed
patch?

Regards,
Jeff Davis


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] Recovery bug

2010-10-19 Thread Robert Haas
On Tue, Oct 19, 2010 at 5:26 AM, Heikki Linnakangas
 wrote:
> The fundamental problem is that by definition, a base backup is completely
> indistinguishable from the data directory in the original server. Or is it?
> We recommend that you exclude the files under pg_xlog from the backup. So we
> could create a "pg_xlog/just_kidding" file along with backup_label. When
> starting recovery, if just_kidding exists, we can assume that we're doing
> crash recovery and ignore backup_label.

I think you'll find that it's completely impossible to make this work
reliably given all of the ways people may choose to make a backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5705: btree_gist: Index on inet changes query result

2010-10-19 Thread Robert Haas
On Mon, Oct 11, 2010 at 7:50 PM, Tom Lane  wrote:
> "Andreas Karlsson"  writes:
>> I was looking at the code to see how one would improve indexing of the inet
>> types and saw an inconsistency between the compressed format
>> (gbt_inet_compress) and how network_cmp_internal works. The btree_gist
>> module ignores the netmask.
>
> Well, actually the btree_gist implementation for inet is a completely
> broken piece of junk: it thinks that convert_network_to_scalar is 100%
> trustworthy and can be used as a substitute for the real comparison
> functions, which isn't even approximately true.  I'm not sure why
> Teodor implemented it like that instead of using the type's real
> comparison functions, but it's pretty much useless if you want the
> same sort order that the type itself defines.

Are you planning to fix this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] BUG #5705: btree_gist: Index on inet changes query result

2010-10-19 Thread Tom Lane
Robert Haas  writes:
> On Mon, Oct 11, 2010 at 7:50 PM, Tom Lane  wrote:
>> Well, actually the btree_gist implementation for inet is a completely
>> broken piece of junk: it thinks that convert_network_to_scalar is 100%
>> trustworthy and can be used as a substitute for the real comparison
>> functions, which isn't even approximately true.

> Are you planning to fix this?

No.  I don't understand why Teodor did it like that, so I'm not going
to try to change it.  I'd be willing to take responsibility for ripping
out btree_gist's inet support altogether ...

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs