Eric Haszlakiewicz <[EMAIL PROTECTED]> writes:
> On Sun, Oct 19, 2008 at 10:15:22PM -0400, Tom Lane wrote:
>> Well, different chroot would do it, but you didn't mention that ;-)

>   er.. why does a chroot matter?

Putting the servers in different chroots would mean that they see two
different /tmp directories, thus no conflict from both trying to open
Unix-domain sockets at /tmp/.s.PGSQL.5432.

>> Anyway, I still think that the proposed documentation patches are wrong,
>> because the code ought to work as long as you don't have a direct
>> conflict on TCP or Unix sockets.  It's true that the port number is used

> I don't understand how the configuration I have contains a conflict.

It doesn't.  So the question is why do you have a problem?

>> What platform is this, anyway?

> I'm running on NetBSD 4.

> Well, it seems that something doesn't work right with the "try the next key"
> code when the userids are the same.  I'm not really sure what I should try
> here.

I read the code and the shmget spec a bit more.  It looks to me like the
issue may be about the ordering of error checks in the kernel.  The
Single Unix Spec quoth

    The shmget() function will fail if:

    [EEXIST]
    A shared memory identifier exists for the argument key but
    (shmflg&IPC_CREAT)&&(shmflg&IPC_EXCL) is non-zero.

    [EINVAL]
    The value of size is less than the system-imposed minimum or greater
    than the system-imposed maximum, or a shared memory identifier exists
    for the argument key but the size of the segment associated with it is
    less than size and size is not 0. 

    [ and some other error cases that aren't interesting here ]

If you are starting the two servers with different shmem sizing
parameters then it is possible that the second reason for giving EINVAL
applies.  Now our code is expecting to get EEXIST if there's a shmem
conflict, and it treats EINVAL as fatal because of the first reason for
giving EINVAL.  I wonder whether NetBSD is coded so that it kicks out
EINVAL in this situation.  It would be within its rights according to
SUS I suppose (since the spec quoth "If more than one error occurs in
processing a function call, any one of the possible errors may be
returned, as the order of detection is undefined.") but I would still
argue that this is a kernel bug because that behavior is useless.
The EINVAL error is sufficiently ambiguous that it should not be
returned if there is a less ambiguous reason to fail.
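
To make the distinction concrete, here is a minimal sketch of a "try the
next key" loop of the kind described above.  It is not the actual
PostgreSQL source: the names classify_shmget_errno and
create_segment_trying_keys, the KEY_* enum, and the 0600 permission bits
are all made up for illustration.  It only shows why an EINVAL coming
back in place of EEXIST turns a harmless key collision into a hard
failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical names for illustration; not PostgreSQL's identifiers. */
typedef enum { KEY_IN_USE, KEY_FATAL } key_result;

/*
 * Classify errno after shmget(key, size, IPC_CREAT | IPC_EXCL | perms)
 * has failed.  EEXIST means "somebody else owns this key, try another".
 * EINVAL is treated as fatal because it normally means the requested
 * size is outside the system limits, which no other key would fix.
 */
key_result classify_shmget_errno(int err)
{
    switch (err) {
    case EEXIST:
        return KEY_IN_USE;
    case EINVAL:
    default:
        return KEY_FATAL;
    }
}

/*
 * Try successive keys starting at first_key.  Returns the segment id
 * and stores the key that worked in *key_out, or returns -1 with errno
 * set by the last shmget() call.
 */
int create_segment_trying_keys(key_t first_key, size_t size,
                               int max_tries, key_t *key_out)
{
    assert(key_out != NULL);

    for (int i = 0; i < max_tries; i++) {
        key_t key = (key_t) (first_key + i);
        int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);

        if (id >= 0) {
            *key_out = key;
            return id;
        }
        if (classify_shmget_errno(errno) != KEY_IN_USE)
            return -1;          /* fatal: do not try further keys */
    }
    return -1;                  /* every key tried was already in use */
}
```

With a kernel that reports the size mismatch first, the loop above would
give up on the very first key instead of moving on to the next one.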

For comparison, the Linux manpage for shmget says in so many words

       If shmflg specifies both IPC_CREAT and IPC_EXCL and a shared
       memory segment already exists for key, then shmget() fails with
       errno set to EEXIST.

and the Darwin (some-BSD-derived) manpage also gives EEXIST priority,
saying

     [EINVAL]           No shared memory segment is to be created, and a
                        shared memory segment exists for key, but the size of
                        the segment associated with it is less than size,
                        which is non-zero.

So the first question for you is did you give the two servers different
shmem sizing parameters?  If so, does the behavior change if you start
them in the opposite order?  If the answer to both is "yes" then I think
you ought to file a bug against the NetBSD kernel.  They're returning an
error code that is uselessly confusing and out of step with other
implementations.
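
If it helps to check the kernel's behavior in isolation, something like
the following sketch could be used.  It is only an assumption about how
one might probe this, not an established test: the function name
probe_shmget_conflict_errno, the segment sizes, and the permission bits
are arbitrary choices.  It creates a small segment, then asks for a
larger one with the same key and IPC_CREAT | IPC_EXCL, and reports which
errno comes back.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical probe.  Returns the errno from the second, conflicting
 * shmget() call (0 if it unexpectedly succeeded), or -1 if the initial
 * small segment could not be created (key already taken, SysV shared
 * memory unavailable, and so on).
 */
int probe_shmget_conflict_errno(key_t key)
{
    /* IPC_PRIVATE always creates a fresh segment, so it cannot collide. */
    assert(key != IPC_PRIVATE);

    const size_t small_size = 4096;   /* arbitrary sizes for the probe */
    const size_t large_size = 8192;

    int first = shmget(key, small_size, IPC_CREAT | IPC_EXCL | 0600);
    if (first < 0)
        return -1;

    /* Same key, larger size: both the EEXIST and EINVAL conditions hold. */
    errno = 0;
    int second = shmget(key, large_size, IPC_CREAT | IPC_EXCL | 0600);
    int err = (second < 0) ? errno : 0;

    /* Clean up whatever was created. */
    if (second >= 0 && second != first)
        shmctl(second, IPC_RMID, NULL);
    shmctl(first, IPC_RMID, NULL);

    return err;
}
```

Calling this with some otherwise unused key and comparing the result
against EEXIST and EINVAL would show which check the kernel applies
first.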

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
