Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Andrew Klosterman
On Mon, 13 Feb 2006, Stephen Frost wrote:

> * Andrew Klosterman ([EMAIL PROTECTED]) wrote:
> > (gdb) bt
> > #0  0x401c3851 in kill () from /lib/libc.so.6
> > #1  0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
> > #2  0x40139823 in memalign () from /usr/lib/libefence.so.0
> > #3  0x401399ad in malloc () from /usr/lib/libefence.so.0
> > #4  0x40139a10 in calloc () from /usr/lib/libefence.so.0
> > #5  0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
> > #6  0x402c8b3f in ?? () from /usr/lib/libpq.so.4
> > #7  0x402ded88 in ?? () from /usr/lib/libpq.so.4
> > #8  0x in ?? ()
> >
> > Looks like something fishy going on between libpq and libkrb5.  I'm
> > especially suspicious since I'm not using kerberos for authentication at
> > all.
>
> Seems kind of unlikely...  What exact (.deb) versions of libpq and
> Postgres are you using?  You originally posted w/ 8.1.0 but perhaps on
> the client you had something more recent?
>
>   Thanks,
>
>   Stephen

Running "aptitude show X" for each package name X, and filtering the output
appropriately, gives the following results on my development systems:

Package: libpq-dev
Version: 8.1.0-3

Package: libpq3
Version: 1:7.4.9-2

Package: libpq4
Version: 8.1.0-3

Package: postgresql-8.1
Version: 8.1.0-3

Package: postgresql-contrib-8.1
Version: 8.1.0-3

Package: postgresql-server-dev-8.1
Version: 8.1.0-3

Package: postgresql-client-8.1
Version: 8.1.0-3

Package: postgresql-common
Version: 39

(I frequently update and upgrade my installations...)

--Andrew J. Klosterman
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Jens-Wolfhard Schicke

--On Monday, February 13, 2006 21:25:30 -0500 Stephen Frost
<[EMAIL PROTECTED]> wrote:

* Andrew Klosterman ([EMAIL PROTECTED]) wrote:

> Seems kind of unlikely...  What exact (.deb) versions of libpq and
> Postgres are you using?  You originally posted w/ 8.1.0 but perhaps on
> the client you had something more recent?

> aptitude install build-essential debhelper cdbs bison perl libperl-dev \
> tk8.4-dev flex libreadline5-dev libssl-dev zlib1g-dev \
> libpam0g-dev libxml2-dev libkrb5-dev libxslt1-dev python-dev \
> gettext bzip2 fakeroot

You might want to add valgrind to this list. It analyzes code at the
machine-instruction level and does a lot of memory checking and
uninitialized-value checking while the program runs. It has helped me fix
every SIGSEGV I have ever encountered, except those caused by infinite
recursion.


With kind regards,
Jens Schicke
--
Jens Schicke  [EMAIL PROTECTED]
asco GmbH http://www.asco.de
Mittelweg 7   Tel 0531/3906-127
38106 Braunschweig   Fax 0531/3906-400



Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Volkan YAZICI
On Feb 13 04:01, Andrew Klosterman wrote:
> I threw in a pthread mutex around the code making the database connections
> for each of my threads.  The problem is still there ("corrupted
> double-linked list").
> ...
> Program received signal SIGILL, Illegal instruction.
> [Switching to Thread 16384 (LWP 24753)]
> 0x401c3851 in kill () from /lib/libc.so.6
> (gdb) bt
> #0  0x401c3851 in kill () from /lib/libc.so.6
> #1  0x40139dd5 in EF_Abort () from /usr/lib/libefence.so.0
> #2  0x40139823 in memalign () from /usr/lib/libefence.so.0
> #3  0x401399ad in malloc () from /usr/lib/libefence.so.0
> #4  0x40139a10 in calloc () from /usr/lib/libefence.so.0
> #5  0x404a182f in krb5_set_default_tgs_ktypes () from /usr/lib/libkrb5.so.3
> #6  0x402c8b3f in ?? () from /usr/lib/libpq.so.4
> #7  0x402ded88 in ?? () from /usr/lib/libpq.so.4
> #8  0x in ?? ()

I have run into other thread-safety issues caused by the libc shipped in
Debian's repositories. For instance, getpwuid_r() is broken in Debian's
current stable libc package, and this causes a similar memory leak in the
libpq code.

IMHO, testing the code against a newer libc version may be the solution.
Otherwise, to get an exact answer, as Tom said, we need libpq symbols in
the backtrace.
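For comparison, reentrant use of getpwuid_r() normally looks like the sketch below: each caller supplies its own struct passwd and buffer, so no static storage is shared between threads. This is a generic POSIX example with a hypothetical helper name, current_user_name(), not libpq's code, and it cannot by itself show whether a particular libc build is broken.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the current user's login name into out.
 * Returns 0 on success, or an errno-style code (ENOENT if the uid has
 * no passwd entry, ERANGE if out is too small). */
static int current_user_name(char *out, size_t outlen)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buflen = (hint > 0) ? (size_t) hint : 16384;
    char *buf = malloc(buflen);
    int rc;

    if (buf == NULL)
        return ENOMEM;

    /* pwd and buf belong to this call only; nothing is shared. */
    rc = getpwuid_r(getuid(), &pwd, buf, buflen, &result);
    if (rc == 0 && result == NULL)
        rc = ENOENT;                 /* lookup succeeded, no entry */
    if (rc == 0) {
        if (strlen(pwd.pw_name) >= outlen)
            rc = ERANGE;             /* caller's buffer too small */
        else
            strcpy(out, pwd.pw_name);
    }
    free(buf);
    return rc;
}
```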


Regards.



Re: [BUGS] BUG #2257: Can't stop server while autovacuum is running

2006-02-14 Thread Tom Lane
"Evgeny Gridasov" <[EMAIL PROTECTED]> writes:
> autovacuum process (when active) did not respond to kill (TERM). Only kill
> -9 helped to stop autovacuum process.

autovacuum does respond to shutdown requests, but in poking at this I
found that btree index vacuuming may fail to notice a pending interrupt
for long periods, if you've got large indexes.  I've committed a fix
for that.
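The pattern behind this kind of fix can be sketched in C: a signal handler only sets a flag, and any long-running loop has to poll that flag, or a shutdown request goes unnoticed until the loop ends. This is a minimal illustration, not the committed patch; interrupt_pending, handle_sigterm() and vacuum_pages() are hypothetical names, loosely in the spirit of the backend's CHECK_FOR_INTERRUPTS() convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Set asynchronously by the handler; sig_atomic_t makes the store safe
 * and volatile stops the loop from caching the value in a register. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Hypothetical SIGTERM handler: it only records the request. */
static void handle_sigterm(int signo)
{
    (void) signo;
    interrupt_pending = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigterm_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = handle_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical stand-in for scanning a large index page by page.
 * Returns how many pages were processed before stopping. Without the
 * per-iteration check, a TERM request would only take effect after the
 * whole scan finished. */
static size_t vacuum_pages(size_t npages)
{
    size_t done;
    for (done = 0; done < npages; done++) {
        if (interrupt_pending)
            break;              /* notice the shutdown request promptly */
        /* ... per-page work would go here ... */
    }
    return done;
}
```

Dropping the check inside the loop reproduces the reported symptom in miniature: the process stays busy until the entire scan completes.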

regards, tom lane



Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Stephen Frost
* Andrew Klosterman ([EMAIL PROTECTED]) wrote:
> Alright, I have built a system with the symbols left into the binaries.
[...]
> Again, it is showing a bad malloc in what appears to be some code using
> kerberos.  But there's nothing in my setup that I can think of right now
> that should induce a connection to be set up using kerberos.

The Kerberos libraries are still called when support for them has been
compiled in, although they generally don't cause any problems.  For some
reason the line numbers in the backtrace line up but the function names
don't quite (perhaps because of inlining).  Anyhow, the error is being
reported down in 'krb5_init_context()', so either something strange is
happening or it's actually a Kerberos bug after all.  The Kerberos
libraries are called to get the 'username' to use, which is determined
before actually connecting to the backend (and finding out which
authentication mechanism the backend thinks we should be trying).

It's a bit of a chicken-and-egg problem: the backend decides which
authentication mechanism to ask for based (at least in part) on the
username, through pg_hba.conf.  You can't find out the authentication
method until you know the username, so all methods of finding the
username have to be exhausted first.  You could avoid this by explicitly
passing 'user=' in the connection parameters, though...  It would be
interesting to know what happens then...
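A hypothetical model of the username fallback described above: an explicit user wins outright; otherwise each lookup is tried in order until one succeeds. resolve_username() and the stub lookups are illustrative names, not libpq functions, and the order shown (a Kerberos-style lookup before the OS account) is an assumption based on this thread. The point is that a supplied user never reaches the Kerberos stub at all.

```c
#include <stddef.h>

/* A lookup returns a user name, or NULL if it cannot determine one. */
typedef const char *(*name_lookup_fn)(void);

/* Hypothetical resolution order: explicit user first, then each
 * fallback in turn; stops at the first source that yields a name. */
static const char *resolve_username(const char *explicit_user,
                                    const name_lookup_fn *fallbacks,
                                    size_t nfallbacks)
{
    size_t i;

    if (explicit_user != NULL && explicit_user[0] != '\0')
        return explicit_user;   /* no fallback is ever called */

    for (i = 0; i < nfallbacks; i++) {
        const char *name = fallbacks[i]();
        if (name != NULL)
            return name;
    }
    return NULL;                /* every method exhausted */
}

/* Stubs for illustration; a real client would call into the Kerberos
 * library or getpwuid_r() here. krb_calls counts Kerberos attempts. */
static int krb_calls = 0;
static const char *stub_krb_lookup(void) { krb_calls++; return NULL; }
static const char *stub_os_lookup(void) { return "osuser"; }
```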

It might also be interesting to look into the Kerberos libraries to see
why they're attempting to malloc(0); perhaps there's a bug there when
Kerberos isn't set up on the machine?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Stephen Frost
* Andrew Klosterman ([EMAIL PROTECTED]) wrote:
> On Tue, 14 Feb 2006, Stephen Frost wrote:
> 
> > It's kind of a chicken-and-egg here because the backend decides what
> > authentication mechanism to ask for based off the username (at least in
> > part) through pg_hba.conf, so you can't find out the authentication
> > method until you know the username so all methods to find the username
> > have to be exhausted.  You could avoid this by explicitly passing
> > 'user=' into the connection parameters though...  Would be interesting
> > to know what happens then...
> 
> When asking about "explicitly passing 'user=' in to the connection
> parameters" do you mean that the EXEC SQL CONNECT line that ecpg parses
> should specify a username?

Oh, I see now.  You're not using PQconnectdb but rather PQsetdbLogin, or
at least that's what ECPG is using.  This means you can't pass in your
own conninfo string, and during the PQsetdbLogin call libpq calls
connectOptions1 with an empty conninfo string.  That makes libpq think
no username has been set, which in turn makes it ask the Kerberos
libraries for a username...

As an initial comment, it seems like it'd be a good thing to update ECPG
to use PQconnectdb.  It's possible this is exposed already in some way
but I'm not familiar enough with ECPG to know.

Another approach would be to have PQsetdbLogin build up a conninfo
string and pass that into connectOptions1 instead of calling
connectOptions1 with an empty string and then changing the values
afterwards.  That'd probably be too large of a change to get in as a
bugfix though.  An alternative might be to move the pg_fe_getauthname()
call to connectOptions2 as it's actually a bit more work than one might
expect and if that can be avoided then that's probably all to the good.
I'm a little worried about whether that would work for all the various
ways of using libpq to connect to the database...
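The two call sequences can be contrasted in a small model. Everything here is hypothetical: struct conn_opts, options1_eager(), options1_lazy() and options2() only mimic the shape of the connectOptions1/connectOptions2 split described above, and expensive_default_user() stands in for pg_fe_getauthname(). In the eager variant, parsing the empty conninfo fills in a default user before the explicit login is applied, so the lookup runs even though its result is overwritten; deferring the default to the second phase avoids that.

```c
#include <stddef.h>

/* Hypothetical subset of connection options. */
struct conn_opts {
    const char *user;
};

/* Counts how often the costly lookup runs. */
static int lookup_calls = 0;

/* Stand-in for the default-user lookup (Kerberos, then OS account). */
static const char *expensive_default_user(void)
{
    lookup_calls++;
    return "default_user";
}

/* Current shape: phase 1 parses the (empty) conninfo and immediately
 * fills in defaults, including the user name. */
static void options1_eager(struct conn_opts *o)
{
    o->user = expensive_default_user();
}

/* Proposed shape: phase 1 only parses; missing values stay NULL. */
static void options1_lazy(struct conn_opts *o)
{
    o->user = NULL;
}

/* Phase 2 runs after caller-supplied values are applied, so it only
 * computes a default when nothing was provided. */
static void options2(struct conn_opts *o)
{
    if (o->user == NULL)
        o->user = expensive_default_user();
}

/* Model of a PQsetdbLogin-style call: parse, override, then finish. */
static const char *setdb_login_model(int eager, const char *login)
{
    struct conn_opts o;

    if (eager)
        options1_eager(&o);
    else
        options1_lazy(&o);
    if (login != NULL)
        o.user = login;     /* explicit login overrides any default */
    options2(&o);
    return o.user;
}
```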

Sorry I don't have a simple answer. :/  In the end it seems like the
Kerberos libraries should be able to survive Kerberos not being
configured or whatever is going on to make it try to malloc 0-bytes...

Thanks,

Stephen




Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Tom Lane
Stephen Frost <[EMAIL PROTECTED]> writes:
> Another approach would be to have PQsetdbLogin build up a conninfo
> string and pass that into connectOptions1 instead of calling
> connectOptions1 with an empty string and then changing the values
> afterwards.  That'd probably be too large of a change to get in as a
> bugfix though.  An alternative might be to move the pg_fe_getauthname()
> call to connectOptions2 as it's actually a bit more work than one might
> expect and if that can be avoided then that's probably all to the good.

Right offhand I like the idea of pushing it into connectOptions2 --- can
you experiment with that?  Seems like there is no reason to call
Kerberos if the user supplies the name to connect as.

> Sorry I don't have a simple answer. :/  In the end it seems like the
> Kerberos libraries should be able to survive Kerberos not being
> configured or whatever is going on to make it try to malloc 0-bytes...

We may be spending too much time on this one point --- as long as
Kerberos isn't *writing* into the zero-length alloc, there is nothing
illegal immoral or fattening about malloc(0).  Can you get ElectricFence
to not abort right here but continue on to the real problem?
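Tom's point about malloc(0) can be made concrete. Under the C standard, a zero-size request may return either NULL or a unique pointer that must not be used to access an object, and either result may be passed to free(); calloc() with a zero element count behaves the same way. The helper name zero_alloc_is_safe() is hypothetical. On the Electric Fence side, its efence(3) man page documents an EF_ALLOW_MALLOC_0 environment variable that stops it from trapping zero-size allocations; check the documentation for your installed version before relying on it.

```c
#include <stdlib.h>

/* Exercise the zero-size allocation contract. Each result may be NULL
 * or a non-NULL pointer, and both are valid arguments to free().
 * Nothing is read from or written through either pointer; doing so is
 * the only part that would be a bug. Returns 1 after freeing both. */
static int zero_alloc_is_safe(void)
{
    void *p = malloc(0);
    void *q = calloc(0, sizeof(int));

    /* p and q may each be NULL or unique; never dereference them. */
    free(p);
    free(q);
    return 1;
}
```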

regards, tom lane



Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > Another approach would be to have PQsetdbLogin build up a conninfo
> > string and pass that into connectOptions1 instead of calling
> > connectOptions1 with an empty string and then changing the values
> > afterwards.  That'd probably be too large of a change to get in as a
> > bugfix though.  An alternative might be to move the pg_fe_getauthname()
> > call to connectOptions2 as it's actually a bit more work than one might
> > expect and if that can be avoided then that's probably all to the good.
> 
> Right offhand I like the idea of pushing it into connectOptions2 --- can
> you experiment with that?  Seems like there is no reason to call
> Kerberos if the user supplies the name to connect as.

Sure thing, I'll take a look at this probably tomorrow night or Thursday
evening.

> > Sorry I don't have a simple answer. :/  In the end it seems like the
> > Kerberos libraries should be able to survive Kerberos not being
> > configured or whatever is going on to make it try to malloc 0-bytes...
> 
> We may be spending too much time on this one point --- as long as
> Kerberos isn't *writing* into the zero-length alloc, there is nothing
> illegal immoral or fattening about malloc(0).  Can you get ElectricFence
> to not abort right here but continue on to the real problem?

Good point.

Stephen




Re: [BUGS] BUG #2246: Bad malloc interactions: ecpg, openssl

2006-02-14 Thread Tom Lane
Andrew Klosterman <[EMAIL PROTECTED]> writes:
> (gdb) print *conn
> ...
>   allow_ssl_try = 1 '\001', wait_ssl_try = 0 '\0', ssl = 0x806d1d0,
>   peer = 0x807e430,
> ...
> *** glibc detected *** corrupted double-linked list: 0x0807e428 ***

Hm, it looks like the problem is associated with whatever was allocated
just before conn->peer (which is returned by SSL_get_peer_certificate
called from open_client_SSL).  Can you get efence or some other tool to
produce a trace of malloc calls so we can determine what that is?
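A do-it-yourself version of the malloc trace Tom asks for: log each allocation with its call site, then look up which logged block starts closest below a suspect address such as conn->peer. This is a hypothetical sketch; traced_malloc, find_preceding() and the fixed-size log are illustrative names, it only sees allocations routed through the macro (not those made inside OpenSSL or Kerberos), and it never removes freed blocks, so stale entries are possible.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One logged allocation: start address, size, and call site. */
struct alloc_rec {
    uintptr_t   addr;
    size_t      size;
    const char *site;
};

#define MAX_RECS 1024
static struct alloc_rec alloc_log[MAX_RECS];
static size_t alloc_count = 0;

/* Allocate and, if there is room in the log, record the block. */
static void *traced_malloc_at(size_t size, const char *site)
{
    void *p = malloc(size);
    if (p != NULL && alloc_count < MAX_RECS) {
        alloc_log[alloc_count].addr = (uintptr_t) p;
        alloc_log[alloc_count].size = size;
        alloc_log[alloc_count].site = site;
        alloc_count++;
    }
    return p;
}

/* Capture the call site as "file:line" via the preprocessor. */
#define STR2(x) #x
#define STR(x) STR2(x)
#define traced_malloc(size) \
    traced_malloc_at((size), __FILE__ ":" STR(__LINE__))

/* Return the logged block with the highest start address strictly
 * below target: the nearest lower neighbour, and so the likeliest
 * candidate for having overrun into target's memory. Returns NULL if
 * no logged block lies below target. */
static const struct alloc_rec *find_preceding(uintptr_t target)
{
    const struct alloc_rec *best = NULL;
    size_t i;

    for (i = 0; i < alloc_count; i++) {
        if (alloc_log[i].addr < target &&
            (best == NULL || alloc_log[i].addr > best->addr))
            best = &alloc_log[i];
    }
    return best;
}
```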

regards, tom lane
