dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0

2020-10-05 Thread Mario Emmenlauer


Dear All,

I've used PostgreSQL since version 9.x successfully on Linux, macOS
and Windows. Today I've upgraded from 12.3 to 13.0 and suddenly I can
not start the server any more on Ubuntu 20.04 (inside Docker on Ubuntu
18.04) and on macOS 10.15.

I get reproducibly the error:
2020-10-05 11:48:19.720 CEST [84731] WARNING:  dup(0) failed after 0 successes: 
Bad file descriptor
2020-10-05 11:48:19.720 CEST [84731] FATAL:  insufficient file descriptors 
available to start server process
2020-10-05 11:48:19.720 CEST [84731] DETAIL:  System allows 0, we need at least 
58.
2020-10-05 11:48:19.720 CEST [84731] LOG:  database system is shut down

What makes this quite curious is that everything continues to work on
Ubuntu 18.04, and Windows with Visual Studio 2019.

I compile postgreSQL myself from source, but there are no patches or
tweaks involved (that I could think relevant for this problem).

I've searched for the particular error and understand that it is usually
caused by system limits on new files(?) But in my case, the failing
setups are relatively modern, powerful CI test machines, with virtually
no load at the time of test.

On Linux Ubuntu 20.04, `ulimit -n` shows `1048576` which seems also
relatively high (but must be the default, not changed by me).

Does this problem mean anything to anyone? I'm completely lost where
to go from here, or what to try next :-( Help would be greatly
appreciated!

All the best,

Mario Emmenlauer




Re: dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0

2020-10-05 Thread Mario Emmenlauer
On 05.10.20 13:22, Mario Emmenlauer wrote:
> I've used PostgreSQL since version 9.x successfully on Linux, macOS
> and Windows. Today I've upgraded from 12.3 to 13.0 and suddenly I can
> not start the server any more on Ubuntu 20.04 (inside Docker on Ubuntu
> 18.04) and on macOS 10.15.
> 
> I get reproducibly the error:
> 2020-10-05 11:48:19.720 CEST [84731] WARNING:  dup(0) failed after 0 
> successes: Bad file descriptor
> 2020-10-05 11:48:19.720 CEST [84731] FATAL:  insufficient file descriptors 
> available to start server process
> 2020-10-05 11:48:19.720 CEST [84731] DETAIL:  System allows 0, we need at 
> least 58.
> 2020-10-05 11:48:19.720 CEST [84731] LOG:  database system is shut down
> 
> What makes this quite curious is that everything continues to work on
> Ubuntu 18.04, and Windows with Visual Studio 2019.
> 
> I compile postgreSQL myself from source, but there are no patches or
> tweaks involved (that I could think relevant for this problem).
> 
> I've searched for the particular error and understand that it is usually
> caused by system limits on new files(?) But in my case, the failing
> setups are relatively modern, powerful CI test machines, with virtually
> no load at the time of test.
> 
> On Linux Ubuntu 20.04, `ulimit -n` shows `1048576` which seems also
> relatively high (but must be the default, not changed by me).
> 
> Does this problem mean anything to anyone? I'm completely lost where
> to go from here, or what to try next :-( Help would be greatly
> appreciated!


I've just regression-tested 12.4 for the above problem, and I can now
isolate that 12.3 and 12.4 do _not_ give me above problem whereas 13.0
_does_ give me above problem. I tried also to isolate commits that may
be related, but could not find anything useful :-(

Help would be greatly appreciated!

All the best,

Mario Emmenlauer




Is it possible to write a generic UPSERT?

2020-11-12 Thread Mario Emmenlauer


Dear all,

I hope this request is not too naiive. Please ignore if its not
matching this lists focus, or if it just shows my lack of SQL
knowledge. Any pointer to documentation would be appreciated!

My question is about the "ON CONFLICT UPDATE" clause. We often have
one dedicated condition that we want to achieve: Insert an item if
it does not exist, or update all given fields if it does exist.

Your excellent documentation makes it easy to implement this in SQL.

But the statement is slightly complex to type, and I find me and my
colleagues often spend more time on this than I would hope. Our two
main challenges are:
(1) we have to look up the uniqueness constraints on the table, and
(2) we have to duplicate the insert statement in the UPDATE section
again, because virtually all fields should get overwritten
(except for the conflicting ones). On long inserts this can be
quite annoying and error-prone.

I can see how "ON CONFLICT" is very powerful. But that power seems
often a burden for us. We would prefer something that is less manual
effort for the specific use case. Basically, we would like:
INSERT if not exist, and
UPDATE _all_ non-conflicting fields in case of _any_ conflict

In my (naiive) thinking, such a construct would cover 99% of our
use cases. Or did other people make very different experiences?


Now the question: I fail to generically implement this. I do not
necessarily mean that this has to be supported by PostgreSQL. Any
solution would work for us, be it in SQL, PostgreSQL or C/C++. For
example an acceptable solution would be a C/C++ wrapper method that
generates the statement, given the corresponding INSERT statement and
the types and values of all arguments. The wrapper would also have
access to the table definitions because we create all tables in code.

We currently do not have concurrency concerns so a more complex
solution using multiple statements would be fine.

Has anybody ever done something like this? Is there an SQL way to
achieve this? Or another programmatic way?

All the best,

Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer  Tel. Buero: +49-89-74677203
Balanstr. 43   mailto: memmenlauer * biodataanalysis.de
D-81669 München  http://www.biodataanalysis.de/




Re: Is it possible to write a generic UPSERT?

2020-11-12 Thread Mario Emmenlauer
On 12.11.20 17:55, Alban Hertroys wrote:
>> On 12 Nov 2020, at 14:58, Mario Emmenlauer  wrote:
> 
> (…)
> 
>> But the statement is slightly complex to type, and I find me and my
>> colleagues often spend more time on this than I would hope. Our two
>> main challenges are:
>> (1) we have to look up the uniqueness constraints on the table, and
>> (2) we have to duplicate the insert statement in the UPDATE section
>>again, because virtually all fields should get overwritten
>>(except for the conflicting ones). On long inserts this can be
>>quite annoying and error-prone.
>>
>> I can see how "ON CONFLICT" is very powerful. But that power seems
>> often a burden for us. We would prefer something that is less manual
>> effort for the specific use case. Basically, we would like:
>>INSERT if not exist, and
>>UPDATE _all_ non-conflicting fields in case of _any_ conflict
>>
>> In my (naiive) thinking, such a construct would cover 99% of our
>> use cases. Or did other people make very different experiences?
> 
> (…)
> 
>> Has anybody ever done something like this? Is there an SQL way to
>> achieve this? Or another programmatic way?
> 
> We generate the SQL @work based on the definitions in, IIRC, the 
> information_schema. It has tables for both the column lists per table and the 
> primary key definitions.
> 
> With that, an SQL statement that returns the required SQL statement is easy 
> to generate, after which you can execute it either from a plpgsql execute 
> statement in a function or in a do-block.


This is actually a very very interesting idea! I did not consider
that we could completely generate the statements based on the actual
table information from the information_schema. I need to give this a
bit more thought but I very much like the idea, thanks for pushing me
in a new direction!

All the best,

Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer  Tel. Buero: +49-89-74677203
Balanstr. 43   mailto: memmenlauer * biodataanalysis.de
D-81669 München  http://www.biodataanalysis.de/




Re: Is it possible to write a generic UPSERT?

2020-11-13 Thread Mario Emmenlauer
On 12.11.20 18:34, Michael Lewis wrote:
> On Thu, Nov 12, 2020 at 6:58 AM Mario Emmenlauer  <mailto:ma...@emmenlauer.de>> wrote:
> 
> I can see how "ON CONFLICT" is very powerful. But that power seems
> often a burden for us. We would prefer something that is less manual
> effort for the specific use case. Basically, we would like:
>     INSERT if not exist, and
>     UPDATE _all_ non-conflicting fields in case of _any_ conflict
> 
> 
> If you do not have significant ratio of HOT (heap only tuple) updates that 
> you want to preserve and you don't have sequences that are GENERATED AS ALWAYS
> (rather than BY DEFAULT), you could consider just doing a DELETE where the 
> keys exist, then insert all the rows. It should be trivial to figure out the 
> primary
> key or other unique index to match on.
> 
> MERGE command is implemented for this use case in some DBMS, but not Postgres 
> (yet?).

Actually I'm quite happy you suggest this, because its the one thing
that also came to my mind :) I was not really sure if this is a sane
idea. But it seems this could be the "easiest" way out, because its
short (like "easy to write") and works for all our use cases.

The main potential problem for my use case may be cascading deletes,
that I think we may want to use in the long run.

All the best,

Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer  Tel. Buero: +49-89-74677203
Balanstr. 43   mailto: memmenlauer * biodataanalysis.de
D-81669 München  http://www.biodataanalysis.de/




lib and share are installed differently, but why?

2021-08-30 Thread Mario Emmenlauer


Dear All,

I've been building and using PostgreSQL from source for a few years
now, thanks for the awesome work!

Since upgrading from 12.5 to 12.8, I've found a problem that I can
not resolve myself. We build PostgreSQL twice, once to folder A, and
once to folder B. This is because A is more easily deployable, while
B holds all our development build targets. The only difference in
the two builds is the change in `--prefix` for configure. Everything
else is identical, and both builds start from a clean directory.

But I see a difference in the installations:
 - The contents of `share` go to A/share/ vs B/share/postgresql/
 - The contents of `lib` go to B/lib vs B/lib/postgresql/

I've played with all sorts of --libdir, --datarootdir and the like,
but the behaviour stays the same. I fail to understand why in the
case of --prefix=B (the target where we host a lot of other stuff),
the install will use subdirectory 'postgresql', whereas for the
--prefix=A (which is an empty directory) the install will not use
a subdirectory.

Whats more, when I try to run the server in A, initdb will complain
that the file in A/share/postgresql/postgresql.bki is missing, which
indeed is true (because the file is at A/share/postgresql.bki).

Any help would be appreciated!

All the best,

Mario Emmenlauer


--
BioDataAnalysis GmbH, Mario Emmenlauer  Tel. Buero: +49-89-74677203
Balanstr. 43   mailto: memmenlauer * biodataanalysis.de
D-81669 München  http://www.biodataanalysis.de/




Re: lib and share are installed differently, but why?

2021-08-30 Thread Mario Emmenlauer
On 30.08.21 16:24, Tom Lane wrote:
> Mario Emmenlauer  writes:
>> I've been building and using PostgreSQL from source for a few years
>> now, thanks for the awesome work!
> 
>> Since upgrading from 12.5 to 12.8, I've found a problem that I can
>> not resolve myself. We build PostgreSQL twice, once to folder A, and
>> once to folder B. This is because A is more easily deployable, while
>> B holds all our development build targets. The only difference in
>> the two builds is the change in `--prefix` for configure. Everything
>> else is identical, and both builds start from a clean directory.
> 
>> But I see a difference in the installations:
>>  - The contents of `share` go to A/share/ vs B/share/postgresql/
>>  - The contents of `lib` go to B/lib vs B/lib/postgresql/
> 
> You didn't say exactly what "A" and "B" stand for, but I'll bet
> that the string "A" includes "pgsql" or "postgres".  Makefile.global
> makes some installation path changes when that's not true, e.g.
> 
> pkglibdir = $(libdir)
> ifeq "$(findstring pgsql, $(pkglibdir))" ""
> ifeq "$(findstring postgres, $(pkglibdir))" ""
> override pkglibdir := $(pkglibdir)/postgresql
> endif
> endif

Ahhh! This is very likely what I am tripped over! Thanks a million!
Indeed my path A does *not* contain "postgres" while B is set to
"A/postgresql". This explains the difference in the install.


> This behavior is pretty ancient; I'm quite sure it didn't change
> between 12.5 and 12.8.  So I speculate that you made some change
> in your build process that you didn't mention.

I'm pretty sure there is no change in my build. But my provided
information may still be incorrect. What I can see is that since
postgres 12.8 I get the aforementioned error that the file
"A/share/postgresql/postgresql.bki" would be missing. I did not
get this error before, and therefore never investigated if the
folder layout was the same between A and B. So, as you say, it may
have been different since basically forever.

But a new thing seems to be this error.

After this find, I will change my build not to compile twice any
more. Therefore the issue is resolved for me!

But it may be good to keep this email for future reference, if
someone else trips over the problem of a missing "postgresql.bki".
Hint to this future person: does your install path contain the
strings "pgsql" or "postgres", then search in this area for the
error...

All the best, and thanks a lot,

Mario Emmenlauer




Re: dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0

2021-09-01 Thread Mario Emmenlauer
On 05.10.20 14:35, Tom Lane wrote:
> Mario Emmenlauer  writes:
>> I get reproducibly the error:
>> 2020-10-05 11:48:19.720 CEST [84731] WARNING:  dup(0) failed after 0 
>> successes: Bad file descriptor
> 
> Hmph.  That code loop assumes that stdin exists to be duplicated,
> but maybe if it had been closed, you'd get this error.
> 
> However, that logic hasn't changed in decades, and we've not heard
> complaints about it before.  Are you starting the server in some
> unusual way?

Replying to a very old thread here: I could indeed trace the problem of
the failing `dup(0)` back to how we start the server! We start the
server from an executable that closes stdin very early on, and this
seems to lead to the problem.

We solved it by not closing stdin. But for future reference, if other
people may be affected by this, the code could probably be modified to
revert to stdout or stderr or a temporary file in case stdin is not
available (guessing here...).

All the best,

Mario Emmenlauer