On Wed, Jun 24, 2015 at 3:36 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
> On Wed, Jun 24, 2015 at 1:40 AM, Fujii Masao <masao.fu...@gmail.com> wrote:
>> On Tue, Jun 23, 2015 at 11:21 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>>> On 06/23/2015 05:03 PM, Fujii Masao wrote:
>>>> On Tue, Jun 23, 2015 at 9:19 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>>>>> On 06/23/2015 07:51 AM, Michael Paquier wrote:
>>>>>> So... Attached are a set of patches dedicated to fixing this issue:
>>>>>
>>>>> Thanks for working on this!
>>>>>
>>>>>> - 0001, add if_not_exists to pg_tablespace_location, returning NULL
>>>>>>   if the path does not exist
>>>>>> - 0002, same with pg_stat_file, returning NULL if the file does not
>>>>>>   exist
>>>>>> - 0003, same with pg_read_*file. I added them to all the existing
>>>>>>   functions for consistency.
>>>>>> - 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
>>>>>>   (thanks Robert for the naming!)
>>>>>> - 0005, as things get complex, a set of regression tests aimed at
>>>>>>   covering those things. pg_tablespace_location is
>>>>>>   platform-dependent, so there are no tests for it.
>>>>>> - 0006, the fix for pg_rewind, using what has been implemented before.
>>>>>
>>>>> With these patches, pg_read_file() will return NULL on any failure to
>>>>> open the file, which makes pg_rewind assume that the file doesn't
>>>>> exist in the source server, so it will remove the file from the
>>>>> destination. That's dangerous; those functions should check
>>>>> specifically for ENOENT.
>>>>
>>>> I'm wondering whether using pg_read_file() to copy the file from the
>>>> source server is reasonable. ISTM that it has two problems, as follows.
>>>>
>>>> 1. It cannot read a very large file, like a 1GB file. So if such a
>>>>    large file was created in the source server after failover,
>>>>    pg_rewind would not be able to copy the file. No?
>>>
>>> pg_read_binary_file() handles large files just fine. It cannot return
>>> more than 1GB in one call, but you can call it several times and
>>> retrieve the file in chunks. That's what pg_rewind does, except for
>>> reading the control file, which is known to be small.
>>
>> Yeah, you're right.
>>
>> I found that pg_rewind creates a temporary table to fetch the file in
>> chunks. This would prevent pg_rewind from using a *hot standby* server
>> as a source server at all. This is of course a limitation of pg_rewind,
>> but we might want to alleviate it in the future.
>
> This is something that a replication command could address properly.
>
>>>> 2. Many users may not allow a remote client to connect to the
>>>>    PostgreSQL server as a superuser for security reasons. IOW, there
>>>>    would be no entry in pg_hba.conf for such a connection. In this
>>>>    case, pg_rewind always fails because pg_read_file() needs superuser
>>>>    privilege. No?
>>>>
>>>> I'm tempted to implement a replication command version of
>>>> pg_read_file(). That is, it would read and send the data like the
>>>> BASE_BACKUP replication command does...
>>>
>>> Yeah, that would definitely be nice. Peter suggested it back in January
>>> (http://www.postgresql.org/message-id/54ac4801.7050...@gmx.net). I
>>> think it's way too late to do that for 9.5, however. I'm particularly
>>> worried that if we design the required API in a rush, we're not going
>>> to get it right, and will have to change it again soon. That might be
>>> difficult in a minor release. Using pg_read_file() and friends is quite
>>> flexible, even though we just found out that they're not quite flexible
>>> enough right now (the ENOENT problem).
>>
>> I agree that it's too late to do what I said...
>>
>> But just using pg_read_file() cannot address the #2 problem that I
>> pointed out in my previous email. Also, requiring superuser privilege
>> for pg_rewind really conflicts with the motivation for adding the
>> replication privilege in the first place.
>>
>> So should we change pg_read_file() so that even a replication user can
>> read the file?
>
> From the security perspective, a replication user can take a base
> backup, so it can already easily retrieve the contents of PGDATA. Hence
> I guess that it would be fine. However, what about cases where
> pg_hba.conf authorizes access for a given replication user via psql and
> blocks it for the replication protocol? We could say that the OP should
> not give out replication access that easily, but in this case the user
> would have access to the contents of PGDATA even though he should not.
> Is that unrealistic?
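(As an aside, the chunk-by-chunk fetch Heikki describes above can be
driven entirely from SQL. A minimal sketch; the file path
'base/12345/67890' and the 1MB chunk size are only illustrative:

    -- fetch bytes [0, 1048576) of a file, path relative to the data
    -- directory
    SELECT pg_read_binary_file('base/12345/67890', 0, 1048576);
    -- advance the offset and repeat until the returned bytea is shorter
    -- than the requested length
    SELECT pg_read_binary_file('base/12345/67890', 1048576, 1048576);

pg_rewind drives this loop through a temporary table on the source, which
is exactly what prevents it from using a hot standby as the source.)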
The most realistic case is that both the source and target servers have a
pg_hba.conf containing the following authentication setting for
replication. That is, each server allows the other to use the replication
user to connect via the replication protocol.

    # TYPE  DATABASE     USER     ADDRESS    METHOD
    host    replication  repuser  X.X.X.X/Y  md5

This case makes me think that allowing even a replication user to call
pg_read_file() may not be a good solution for us. In that case the
replication user would need to log in to a real database like "postgres"
to call pg_read_file(), but usually there would be no entry in
pg_hba.conf allowing the replication user to connect to any real
database. So if we want to address this problem, a replication command
version of pg_read_file() would be required. However, it's too late to do
that now...

>> Or should a replication user version of pg_read_file() be implemented?
>
> You mean a new function? How is it different from authorizing
> pg_read_file() usage for a replication user?
>
> Honestly, I can live with this superuser restriction in 9.5. And come
> back to the replication user restriction in 9.6 once things cool down
> a bit.

Yeah, finally I agree with you. This seems to be the only approach we can
adopt for now.

Regards,

-- 
Fujii Masao
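P.S. For clarity, here is how the new flags proposed in Michael's patches
would look from SQL. This is only a sketch based on the patch
descriptions above; the exact argument names and positions may differ in
the final patches:

    -- with patch 0003, a missing file yields NULL instead of an error
    -- when the new boolean flag is passed
    SELECT pg_read_file('file_that_does_not_exist', 0, 8192, true);

    -- with patch 0004, pg_ls_dir takes if_not_exists (a missing
    -- directory is not an error) and include_dot_dirs (whether '.' and
    -- '..' are listed)
    SELECT pg_ls_dir('some_missing_dir', true, false);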