[GENERAL] Server-side hooks for user session start and session end

2015-09-26 Thread Nikolai Zhubr

Hi all,

I'm trying to find a soultion to automatically execute something 
(preferrably a function or at least some pure sql statements) at the 
beginning and at the end of a user session. As an example, imagine just 
storing of all login and logout timestamps (though the real task is a 
bit more complicated than that)


I would not like it to be somehow explicitely client-side initiated 
(like e.g. trivially calling some certain do_at_sess_begin(), 
do_at_sess_end()) because first, a malicious client could then mess it 
up easily, and furthermore, in the case that the session somehow ended 
abnormally (due to say temporary network fault) client-side finalizer 
function would not be called anyway.


My first try was to create an event trigger for sql drop events, then 
create a temporary table and watch for its drop at the end of a session. 
However, it seems drop events are just not fired for temporary tables 
(although documentation does not state this anywhere, as far as I can 
see, maybe I'm wrong about that)


It would seem like one of the simplest things ever, but I'm now totally 
stuck out of ideas.


As a partial solution, handling just session _end_ would already be OK.

Any thoughts?


Thank you,
Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server-side hooks for user session start and session end

2015-09-26 Thread Nikolai Zhubr

Hi Pavel,
26.09.2015 19:26, Pavel Stehule wrote:
[...]

This cannot be solved without patching PostgreSQL source code :( . There
are not good hooks for custom extension. Patch is relative simple, but I
cannot to publish it.


Ok, I see. Creating such a patch might be not very hard actually.
But living with custom-patched server could cause massive pain later...


You have to modify main loop in src/backend/tcop/postgres.c file. Login
point can be immediately before loop. Logout point can be joined to
'X',EOF message.


Thanks for the hint, I've noticed this loop while searching for some 
more 'regular' methods.



Attention - there are some corner case, what are necessary to solve -
mainly how to handle exceptions in login/logout procedures.


Right. And I'd guess that might be the reason for not implementing such 
thing in upstream (yet) - the lack of proper context for exceptions and 
such.



p.s. I understand so this hack is necessary for porting older
application from other databases, but I am not sure if it is a good idea
to use it. Not all tasks can be solved well in database.


No, the application in question has been running on top of postgres 
almost from its very start and for quite some years successfully now, it 
definitely does not contain anything foreign to postgres.
And the events of session start and session end would seem quite generic 
and usefull anyway?



Thank you,
Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server-side hooks for user session start and session end

2015-09-26 Thread Nikolai Zhubr

26.09.2015 20:59, Tom Lane wrote:
[...]

A session-end hook seems pretty problematic to me: you simply cannot
guarantee that it will run at all.  (Consider process crash or server
abort cases.)  So anything built on the assumption that it gets to run
at session end is going to be inherently unreliable.


Yes, I understand that.
For such cases (of e.g. abnormal shutdown) it would also be nice if a 
database could 'autorun' some specified function just after server has 
performed whetever startup/replay/cleanup procedures necessary and 
immediately before it is ready for normal operation. This would allow to 
perform some checks and cleanups to restore consistency. And actually, 
such 'autorun' function could be interesting regardless of this session 
begin/end thing.
And there are already quite some hooks in use for session-end cleanups 
at C-level, like e.g. for disposing temp tables, just none of them 
expose anything to SQL level. I'd guess this technique is available for 
use by extensions, so potentially SQL-level hook could also be 
implemented. Though I think it is a bit beyond my capability at the 
moment...


Regards,


Thank you,
Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server-side hooks for user session start and session end

2015-09-27 Thread Nikolai Zhubr

Hi,
27.09.2015 8:29, Pavel Stehule:

I'll check it.


It is working. Patch attached


Oh, brilliant! This is a _huge_ help actually!

If I understand it correctly, any unhandled SQL-level exceptions will 
essentially be ignored there, so that the session will continue regardless?


And maybe it could even be proposed for upstream?
It is so wonderfully small and looks not too much intrusive.


Thank you,
Nikolai



Regards

Pavel





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server-side hooks for user session start and session end

2015-09-27 Thread Nikolai Zhubr

Hi,
27.09.2015 22:59, Pavel Stehule:


2015-09-27 21:40 GMT+02:00 Jim Nasby mailto:jim.na...@bluetreble.com>>:

_PG_fini


It should not work - see a doc


I'd rather consider registering with on_shmem_exit -- as per the 
insightfull note placed in postgresql.c:4290.

(Haven't tried - just thinking)


Thank you,
Nikolai


If the file includes a function named |_PG_fini|, that function will be
called immediately before unloading the file. Likewise, the function
receives no parameters and should return void. Note that |_PG_fini| will
only be called during an unload of the file, not during process
termination. (Presently, unloads are disabled and will never occur, but
this may change in the future.)




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Server-side hooks for user session start and session end

2015-09-27 Thread Nikolai Zhubr

Hi,
27.09.2015 23:22, Pavel Stehule wrote:
[...]

updated patch - fixed error reporting


Wow! I'll definitely borrow it :)


Thank you,
Nikolai



Regards

Pavel


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Nonblocking libpq + openssl = ?

2016-09-16 Thread Nikolai Zhubr

Hello all,

Not sure this is exactly right list, so feel free to point me to some 
other as appropriate.


While working on a higher-level binding to the libpq library, I've 
(likely) discovered a problem with non-blocking operation in case of 
using openssl. And, it looks so striking I'd like to share my observation.


For libpq, non-blocking operation is documented as a normal supported 
feature, e.g. [1]
Now, openssl transport is also documented as a normal supported feature, 
e.g. [2]
I have not found anywhere in documentaion any clear warnings that 
non-blocking operation and openssl transport are mutually exclusive or 
might not quite work as specified in any way.


From [1] we learn (through some intricate wording) that in order to 
avoid blocking at PQgetResult() one can employ PQsetnonblocking(), 
PQflush(), PQconsumeInput() and PQisBusy(), supposedly all of them 
non-blocking after calling PQsetnonblocking(), although not stated 
explicitely so, but otherwise it would make just no sence whatsoever, right?


Now lets have a look at e.g. PQconsumeInput():

===
.
/*
 * Load more data, if available. We do this no matter what state we are
 * in, since we are probably getting called because the application wants
 * to get rid of a read-select condition. Note that we will NOT block
 * waiting for more input.
 */
if (pqReadData(conn) < 0)
return 0;

/* Parsing of the data waits till later. */
 return 1;
}
===

It is stated that pqReadData() will NOT block. Now let's get inside:

===
.
/* OK, try to read some data */
retry3:
nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
 conn->inBufSize - conn->inEnd);
.
/*
 * Still not sure that it's EOF, because some data could have just
 * arrived.
 */
retry4:
nread = pqsecure_read(conn, conn->inBuffer + conn->inEnd,
conn->inBufSize - conn->inEnd);



Now in case of SSL, this pqsecure_read() is just a wrapper around 
pgtls_read(), so lets look further:



pgtls_read(PGconn *conn, void *ptr, size_t len)
{
.
rloop:
SOCK_ERRNO_SET(0);
n = SSL_read(conn->ssl, ptr, len);
err = SSL_get_error(conn->ssl, n);
switch (err)
{
..
break;
case SSL_ERROR_WANT_WRITE:
/* Returning 0 here would cause caller to wait for read-ready,
 * which is not correct since what SSL wants is wait for
 * write-ready.  The former could get us stuck in an infinite
 * wait, so don't risk it; busy-loop instead. */
goto rloop;
==

So going PQconsumeInput()->pqReadData()->pqsecure_read()->pgtls_read() 
in a supposedly non-blocking operation we finally come to a tight 
busy-loop waiting for SSL_ERROR_WANT_WRITE to go down! How could such 
thing ever be,


- with no even sleep(1),
- no timeout,
- no diagnostics of any sort,
- a comment implying that getting stuck in a (potentially) infinite 
sleepless loop deep inside a library is OK.


And looking more into this pgtls_read() function it seems it just has 
inadequate interface. So that it has really no way to reliably indicate 
some important details to its caller, namely the need to wait for 
write-readyness. It's like if ssl support was a quick-n-dirty hack 
rather than a consistently integrated feature. Or do I read it all wrong?

Any thoughts?

[1] https://www.postgresql.org/docs/9.5/static/libpq-async.html
[2] https://www.postgresql.org/docs/9.5/static/libpq-ssl.html

Thank you,
Regards,

Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nonblocking libpq + openssl = ?

2016-09-16 Thread Nikolai Zhubr

17.09.2016 2:05, Andres Freund:
[...]

Well, it's not pretty. I quite dislike this bit, and I've complained
about it before.  But it is noteworthy that it's nearly impossible to
hit these days, due to ssl-renegotiation support having been ripped out.
That's what could trigger openssl to require writes upon reads.


Looks like it _usually_ happens so that such interdependent reads and 
writes are unnecessary in the absence of renegotiations. But still [1] 
instructs to always check for both SSL_ERROR_WANT_READ and 
SSL_ERROR_WANT_WRITE in all cases. Supposedly it is for a reason. The 
way it is implemented in fe-secure-openssl.c looks just somewhat unfinished.

I'm wondering is there really something that prevents doing it properly?

[1] https://www.openssl.org/docs/manmaster/ssl/SSL_get_error.html

Thank you,
Regards,
Nikolai



Greetings,

Andres Freund






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nonblocking libpq + openssl = ?

2016-09-17 Thread Nikolai Zhubr

17.09.2016 3:27, Andres Freund:
[...]

Looks like it _usually_ happens so that such interdependent reads and writes
are unnecessary in the absence of renegotiations. But still [1] instructs to
always check for both SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE in all
cases. Supposedly it is for a reason. The way it is implemented in
fe-secure-openssl.c looks just somewhat unfinished.
I'm wondering is there really something that prevents doing it properly?


The relevant user-level API of libpq (PQisBusy) doesn't have a way to
return "waiting for write". So we'd have to break API compatibility.


Ah, I see. But then, this is a very common sort of problem (Existing API 
spec getting inadequate for some new features added later, maintaining 
complete interface compatability getting impossible.)


In this specific case, I'd say a reasonable approach would be to 
urgently introduce some new PQisBusyParams() returning the flag in 
question, and subsequently deprecating the historical PQisBusy(). Maybe 
something else would be necessary. Meanwhile, it would seem logical to 
move this busy-loop to PQisBusy() so it would become more evident 
PQisBusy() is flawed.


Besides, even with no changes to API, one good thing can be done still.
If SSL_ERROR_WANT_WRITE is so unlikely to ever happen in pgtls_read(), 
why not just throw a (descriptive enough) error and get out immediately? 
And see if someone compains about dropped connections because of this?


And while we are at it, it would be nice to have something like 
pqWaitTimed() included in the API, so as to be able to (mostly) avoid 
messing with the underlying OS handles/sockets outside of libpq (and 
keeping user code more generic therefore). Is there a reason for not 
providing this?



Thank you,
Regards,
Nikolai



Greetings,

Andres Freund






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nonblocking libpq + openssl = ?

2016-09-17 Thread Nikolai Zhubr

17.09.2016 11:15, I wrote:
[...]

The relevant user-level API of libpq (PQisBusy) doesn't have a way to
return "waiting for write". So we'd have to break API compatibility.


Ah, I see. But then, this is a very common sort of problem (Existing API
spec getting inadequate for some new features added later, maintaining
complete interface compatability getting impossible.)

In this specific case, I'd say a reasonable approach would be to
urgently introduce some new PQisBusyParams() returning the flag in
question, and subsequently deprecating the historical PQisBusy(). Maybe
something else would be necessary. Meanwhile, it would seem logical to
move this busy-loop to PQisBusy() so it would become more evident
PQisBusy() is flawed.


I'm now doing a respective cleanup I'll use for myself in my project.

Also, I'd be happy to share my changes to libpq so that they could 
eventually be used upstream, so I'll try to post it to pgsql-hackers 
when ready.



Thank you,
Regards,
Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Nonblocking libpq + openssl = ?

2016-09-17 Thread Nikolai Zhubr

17.09.2016 3:27, Andres Freund:
[...]

The relevant user-level API of libpq (PQisBusy) doesn't have a way to
return "waiting for write". So we'd have to break API compatibility.


I think this is actually not entirely correct.
PQisBusy is fine, I do not see a need to touch it at all.
Rather, the return value of PQConsumeInput could be extended (in fact 
has just been extended by me here) as follows:


Old spec:
0 == error
1 == no error

New spec:
0 == error
1 == no error
2 == no error but can't proceed until pending write finished.

Agreed, this is still an incompatible API change, but not excessively 
destructive and it should be fairly easy to make adjustments for such 
change (unless the usage has already been hopelessly broken).



Thank you,
Regards,
Nikolai



Greetings,

Andres Freund






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-01-30 Thread Nikolai Zhubr

Hello all,

(Hopefully this is right place to post on the subject, otherwise please 
let me know.)


I'm observing some strange inexplicable effect in 9.5.5 server running 
on x86 windows (32-bit windows xp sp3). That is, CPU usage in backend 
process for the session in question starts to grow, going from approx 
0-1 to 8-15 percent and more, stays that high for several seconds, then 
goes back to 0-1. All the effect takes about 15-30 seconds, and repeats 
stably every 10-20 minutes (as long as the respective client continues 
to run the same queries). Apparently it is essential to consider the 
pattern of requests going to the server: in this case there is a 
continuous stream of very small cheap queries, but quite a lot of them 
per second. Trying to understand the reason, I've managed to craft a 
pure artifical test triggering very similar CPU load waves without the 
need for any specific database at all, easy to repeat:


1. "select localtimestamp" 40 times (As separate requests, one by one, 
but no delay inserted in between)

2. wait 1/2 second.
3. goto 1

That's it. Just let it run for > 20 minutes in one session. (These 
queries are so cheap that normally they consume approx zero resources)


Screenshots: https://yadi.sk/i/J_yj_0t43BgdGw
(I can also send as file if this link does not work)

Other notes:
- the server instance in question is EDB 9.5.5-1 win32 binaries.
- the production machine is Core 2 duo 2600MHz, 2Gb ram, typical CPU 
load is rather low, like 0% to 3%, therefore the effect is easily noticable.

- no antivirus or other weird or abusive software involved.
- communication goes through libpq (tcp/ip only).
- turning SSL on/off does not matter.
- my test application issueing "select localtimestamp" in the endless 
loop is written in pascal, although this probably does not matter.

- query execution time as seen by the client is not affected (stays low).
- pausing the test in the client causes backend to drop CPU usage 
immediately to 0, resuming causes it to go back to where it was (unless 
pausing for too long), so excessive CPU load is tied to some normal 
activity of backend (i.e. no activity == no load).
- the effect is NOT observed (yet?) when running test on a server 
machine directly (pointing it to 127.0.0.1)
- the effect looks more substantial in SMP case (2 Cores), compared to 
UMP case (when testing in a VM, see below).

- nothing appears in the log.

To me it looks like some sort of wait/check/synchronize issue for a 
socket/lock/signal or similar. Probably some rare corner case, probably 
windows-specific. However, looking through backend/port/win32/socket.c 
and backend/port/win32_latch.c I cannot immediately see anything wrong 
yet (but WaitForMultipleObjects is a tricky thing IIRC).


Luckily I've managed to construct a configuration very similar to 
production machine in a development VM (VirtualBox) and reproduced the 
effect there, so now I'm able to safely and comfortably test whatever 
ideas might appear (Although I'm somewhat reluctant to rebuild and 
hand-debug the server here myself, because preparing proper build 
environment on windows is quite a lot of work...)


Any help/thoughts/instructions greatly appreciated.

Should I also report this to bugtracker at postgresql.org?


Thank you,
Nikolai



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-01-30 Thread Nikolai Zhubr

31.01.2017 6:43, Michael Paquier wrote:

On Mon, Jan 30, 2017 at 6:16 PM, Nikolai Zhubr  wrote:

I'm observing some strange inexplicable effect in 9.5.5 server running on
x86 windows (32-bit windows xp sp3).


Oh, well... You are aware that this is out of support by Microsoft, right?


Sure. The box is quite old.
However, at the moment I'm not confident the issue is XP-specific. Not 
even really sure it is windows-specific.
And, I can't see any indication that the behaviour of e.g. 
WaitForMultipleObjects() changed much from NT 3.51 to Windows 10 anyway.


[...]

1. "select localtimestamp" 40 times (As separate requests, one by one, but
no delay inserted in between)
2. wait 1/2 second.
3. goto 1


Craig, could this be a side-effect of 519b0757? That's new in 9.5, and
that's directly related to the code paths discussed here.


Meanwhile I'll redo my tests using 9.4 instead of 9.5 and report back.

Thank you.

Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-01-31 Thread Nikolai Zhubr

31.01.2017 10:37, I wrote:

[...]

1. "select localtimestamp" 40 times (As separate requests, one by
one, but
no delay inserted in between)
2. wait 1/2 second.
3. goto 1


Craig, could this be a side-effect of 519b0757? That's new in 9.5, and
that's directly related to the code paths discussed here.


Meanwhile I'll redo my tests using 9.4 instead of 9.5 and report back.


Exactly same trouble with 9.4 (Specifically EDB 9.4.10-1 win32)

Would it make sense to check some even older ones?


Thank you.

Nikolai



Thank you.

Nikolai






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-01-31 Thread Nikolai Zhubr

31.01.2017 19:51, Andres Freund:
[...]

Exactly same trouble with 9.4 (Specifically EDB 9.4.10-1 win32)


That's good to know, less because of 519b0757, more because of the latch
changes - but they also went in in 9.5...


Would it make sense to check some even older ones?


Could you use process monitor or such to see what the process is doing
while using a lot of CPU?


I'm not sure how to do this, especially considering that the process in 
question is running as a service?


Now, some more input:

* 9.5.2 server running on linux x86_64 - unaffected! (What a relief! We 
are moving to Centos soon anyway!)


* 9.4.4 server running on win7 32-bit - affected, same thing as on XP.


Thank you.

Nikolai



Regards,

Andres





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-01 Thread Nikolai Zhubr

01.02.2017 1:02, I wrote:
[...]

Could you use process monitor or such to see what the process is doing
while using a lot of CPU?


I'm not sure how to do this, especially considering that the process in
question is running as a service?

Now, some more input:

* 9.5.2 server running on linux x86_64 - unaffected! (What a relief! We
are moving to Centos soon anyway!)

* 9.4.4 server running on win7 32-bit - affected, same thing as on XP.


I've managed to create a "fix" (see diff below).
It looks like the wait logic is somehow broken on windows currently, 
though I can not find the problem myself yet.
It would be great if someone more familiar with the (windows-specific) 
code came up with ideas.

I have a build environment ready so I could do more tests then.

--- be-secure.c.orig2017-02-01 22:37:37.228032608 +0300
+++ be-secure.c 2017-02-01 22:51:17.655751292 +0300
@@ -159,6 +159,7 @@
 * socket to become ready again.
 */
}
+Sleep(15); /* n.zhubr */
goto retry;
}

@@ -238,6 +239,7 @@
 * socket to become ready again.
 */
}
+Sleep(15); /* n.zhubr */
goto retry;
}


Thank you.

Nikolai




Thank you.

Nikolai



Regards,

Andres









--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-03 Thread Nikolai Zhubr

02.02.2017 2:14, I wrote:

01.02.2017 1:02, I wrote:
[...]

Could you use process monitor or such to see what the process is doing
while using a lot of CPU?


I'm not sure how to do this, especially considering that the process in
question is running as a service?

Now, some more input:

* 9.5.2 server running on linux x86_64 - unaffected! (What a relief! We
are moving to Centos soon anyway!)

* 9.4.4 server running on win7 32-bit - affected, same thing as on XP.


I've managed to create a "fix" (see diff below).
It looks like the wait logic is somehow broken on windows currently,
though I can not find the problem myself yet.
It would be great if someone more familiar with the (windows-specific)
code came up with ideas.
I have a build environment ready so I could do more tests then.


Some update.

Adding this "Sleep(15)" before "goto retry" into secure_read() has 
apparently eliminated the effect at our production server too. That is, 
my load-bug-detector has been quiet for > 24hr or more.


Now by adding more debigging stuff into secure_read() and secure_write() 
I've found that:


* secure_write() is likely irrelevant, as "goto retry" there was never 
actually hit yet;


* in secure_read(), during the intervals of excessive cpu load, 
WaitLatchOrSocket() was never observed to indicate latch event, and was 
never observed to (erroneously) indicate socket readiness more than once 
(with socket read attempt in between), which I was suspecting happening, 
so I can not blame secure_read() itself and this all makes me wonder 
even more...


Note: I'm testing with SSL off now.

As always, and hints greatly appreciated!


Thank you.
Nikolai



--- be-secure.c.orig 2017-02-01 22:37:37.228032608 +0300
+++ be-secure.c 2017-02-01 22:51:17.655751292 +0300
@@ -159,6 +159,7 @@
* socket to become ready again.
*/
}
+ Sleep(15); /* n.zhubr */
goto retry;
}

@@ -238,6 +239,7 @@
* socket to become ready again.
*/
}
+ Sleep(15); /* n.zhubr */
goto retry;
}


Thank you.

Nikolai




Thank you.

Nikolai



Regards,

Andres













--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-03 Thread Nikolai Zhubr

03.02.2017 13:52, I wrote:
[...]

Adding this "Sleep(15)" before "goto retry" into secure_read() has
apparently eliminated the effect at our production server too. That is,
my load-bug-detector has been quiet for > 24hr or more.

Now by adding more debigging stuff into secure_read() and secure_write()
I've found that:

* secure_write() is likely irrelevant, as "goto retry" there was never
actually hit yet;

* in secure_read(), during the intervals of excessive cpu load,
WaitLatchOrSocket() was never observed to indicate latch event, and was
never observed to (erroneously) indicate socket readiness more than once
(with socket read attempt in between), which I was suspecting happening,
so I can not blame secure_read() itself and this all makes me wonder
even more...


Ok, secure_read() is likely irrelevant too.

I think what happened after I inserted "Sleep(15)" into secure_read() is 
that this "Sleep(15)" was essentially added into the main "for(;;)" loop 
of PostgresMain (through ReadCommand), introducing an artifical 
additional CPU relaxation step along with every incoming query and 
therefore just masking a real CPU eater.


So probably I'll have to somehow profile this "for(;;)" in PostgresMain.


Thank you.

Nikolai



Note: I'm testing with SSL off now.

As always, and hints greatly appreciated!


Thank you.
Nikolai



--- be-secure.c.orig 2017-02-01 22:37:37.228032608 +0300
+++ be-secure.c 2017-02-01 22:51:17.655751292 +0300
@@ -159,6 +159,7 @@
* socket to become ready again.
*/
}
+ Sleep(15); /* n.zhubr */
goto retry;
}

@@ -238,6 +239,7 @@
* socket to become ready again.
*/
}
+ Sleep(15); /* n.zhubr */
goto retry;
}


Thank you.

Nikolai




Thank you.

Nikolai



Regards,

Andres

















--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-03 Thread Nikolai Zhubr

03.02.2017 20:29, Andres Freund:
[...]

Could you use process monitor or such to see what the process is doing
while using a lot of CPU?


I'm not sure how to do this, especially considering that the process in
question is running as a service?


I don't think that stops you, if you have admin privileges.


Well, profiling postgres.exe is still beyond my capability at this time 
anyway. Instead, I'll try to prepare a most simple client application 
example for testing the behaviour in question so that anyone could run 
it easily. (And while working on such an example, maybe I'll get some 
more understanding of what is actually going on here.)



Thank you.

Nikolai








--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-05 Thread Nikolai Zhubr

03.02.2017 22:16, Andres Freund:
[...]

Well, profiling postgres.exe is still beyond my capability at this time
anyway. Instead, I'll try to prepare a most simple client application
example for testing the behaviour in question so that anyone could run it
easily. (And while working on such an example, maybe I'll get some more
understanding of what is actually going on here.)


I'm missing something. All you need to do is to start
processexplorer.exe and filter out other applications?


Ah, ok. I first thought about some kind of code-level debugging.

And yes, running Process Explorer gave some new and unexpected input. 
During the period of this strange high load it claims 40% CPU is used by 
interrupts (normally 0.01%) and 3% used by backend postgres.exe 
(normally approx 0%). I'd guess this means some problem happening in the 
OS (which hosts this postgres.exe), probably related to network 
communication? (Because nothing else seems likely related to interrupts 
in such scenario?)


The screenshot: https://yadi.sk/i/hC8FMZCE3CyzPs


Thank you.

Nikolai




Andres





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-05 Thread Nikolai Zhubr

05.02.2017 17:10, I wrote:
[...]

And yes, running Process Explorer gave some new and unexpected input.
During the period of this strange high load it claims 40% CPU is used by
interrupts (normally 0.01%) and 3% used by backend postgres.exe
(normally approx 0%). I'd guess this means some problem happening in the
OS (which hosts this postgres.exe), probably related to network
communication? (Because nothing else seems likely related to interrupts
in such scenario?)


Additionally, I've now got a kernrate viewer reports, one for high load, 
and one for normal load period, attached below. Here, tcpip shows some 
more kernel activity during the problematic period, but the difference 
is not so huge.


Time   36348 hits, 25000 events per hit 
 ModuleHits   msec  %Total  Events/Sec
intelppm  35048  4564096 %19198071
hal1030  45640 2 %  564198
ntkrnlpa204  45640 0 %  111744
tcpip20  45640 0 %   10955
win32k   18  45640 0 %9859
afd   6  45640 0 %3286
ipnat 6  45640 0 %3286
NDIS  4  45640 0 %2191
..

Time   37227 hits, 25000 events per hit 
 ModuleHits   msec  %Total  Events/Sec
intelppm  35856  4682896 %19142393
hal1089  46828 2 %  581382
ntkrnlpa229  46828 0 %  122255
win32k   29  46828 0 %   15482
tcpip 9  46828 0 %4804
NDIS  4  46828 0 %2135
afd   3  46828 0 %1601
psched3  46828 0 %1601
ipnat 2  46828 0 %1067


Thank you.

Nikolai



The screenshot: https://yadi.sk/i/hC8FMZCE3CyzPs


Thank you.

Nikolai




Andres







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Actual row order in UPDATE and SELECT FOR UPDATE

2016-02-15 Thread Nikolai Zhubr

Hello all,

I can't find any clear description of how to reliably figure and/or 
enforce specific row (locking) order within UPDATE and SELECT FOR UPDATE 
statements dealing with multiple rows.
I'd like to get rid of some deadlocks (caused by share locks). While the 
manual explains locks and deadlocks themselves pretty fine (in e.g.

http://www.postgresql.org/docs/9.5/static/explicit-locking.html
) it somehow avoids discussing multi-row updates there. On the other 
hand, the UPDATE section of the manual somehow avoids discussing actual 
update order and respective locking too.
So is it defined anywhere explicitely? Or do I rather have to convert 
all multi-row UPDATE statements into single-row updates and then wrap 
them into e.g. plpgsql loops? That would look quite strange...

Any hints?


Thank you,
Nikolai


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Actual row order in UPDATE and SELECT FOR UPDATE

2016-02-15 Thread Nikolai Zhubr

Hello all,
16.02.2016 2:41, Vitaly Burovoy:
[...]

UPDATE deals with rows in order they are fetched from a heap. In a
common case it can be considered as unordered.

[...]

However SELECT can fetch rows in a specific order and locking by FOR
UPDATE clause is applied _after_ sorting. Then you can use CTE[1] with
Ok. So if I get it correctly, plain UPDATE statement by itself can not 
be constructed in such way that reliably avoids deadlocks in case of 
possibly overlapping concurrent updates. So in order to be safe, UPDATE 
statements will need to always be 'protected' by respective SELECT FOR 
UPDATE first. I'd suppose this fact deserves some more explicit mention 
in the manual, as it is not so obvious...


Thanks a lot for your code example and comprehensive advice.
I think I'm able to fix my deadlocks now.


Regards,
Nikolai


SELECT ... ORDER BY ... FOR UPDATE to pass all necessary data to the
UPDATE statement. Since UPDATE still deals with unordered rows they
have already locked, and parallel queries are waiting in SELECT
statement rather than in UPDATE:

WITH lck AS (
 SELECT
id,  -- for WHERE clause in UPDATE

-- you can do calculations here or in-place (see "field2" below)
field1 + 1 as field1,
...
 FROM your_table
 WHERE ...
 ORDER BY id  -- for example
 FOR UPDATE
)
UPDATE your_table t
SET
   field1=lck.field1,  -- lhs is always field of updatable table;
-- rhs must be pointed by a "table" if they are the same in both "tables"

   field2=field2 + 2, -- or if you make changes in-place and it doesn't appear
--in a table mentioned in "FROM" clause, you can avoid table/alias name
...
FROM lck
WHERE
 t.id=lck.id



Thank you,
Nikolai


[1]http://www.postgresql.org/docs/current/static/queries-with.html




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-08 Thread Nikolai Zhubr

05.02.2017 22:05, I wrote:
[...]

And yes, running Process Explorer gave some new and unexpected input.
During the period of this strange high load it claims 40% CPU is used by
interrupts (normally 0.01%) and 3% used by backend postgres.exe
(normally approx 0%). I'd guess this means some problem happening in the
OS (which hosts this postgres.exe), probably related to network
communication? (Because nothing else seems likely related to interrupts
in such scenario?)


Ok, I've got a working example as simple as the following:

do {
for (i=0; i<40; i++) {
Sleep(15);
res = PQexec(conn, "SELECT localtimestamp ");
if (PQresultStatus(res) != PGRES_TUPLES_OK) return 1;
PQclear(res);
}
fprintf(stdout, "*");
Sleep(350);
} while(1);

Here, Sleep(15) and Sleep(350) are very important, removing or changing 
them can cause the effect to disappear.
Looks like some timing-sensitive issue in windows TCP/IP implementation? 
But then, how can it be that no-one have noticed it yet? Puzzling.



Thank you.

Regards,
Nikolai



Additionally, I've now got a kernrate viewer reports, one for high load,
and one for normal load period, attached below. Here, tcpip shows some
more kernel activity during the problematic period, but the difference
is not so huge.

Time 36348 hits, 25000 events per hit 
Module Hits msec %Total Events/Sec
intelppm 35048 45640 96 % 19198071
hal 1030 45640 2 % 564198
ntkrnlpa 204 45640 0 % 111744
tcpip 20 45640 0 % 10955
win32k 18 45640 0 % 9859
afd 6 45640 0 % 3286
ipnat 6 45640 0 % 3286
NDIS 4 45640 0 % 2191
..

Time 37227 hits, 25000 events per hit 
Module Hits msec %Total Events/Sec
intelppm 35856 46828 96 % 19142393
hal 1089 46828 2 % 581382
ntkrnlpa 229 46828 0 % 122255
win32k 29 46828 0 % 15482
tcpip 9 46828 0 % 4804
NDIS 4 46828 0 % 2135
afd 3 46828 0 % 1601
psched 3 46828 0 % 1601
ipnat 2 46828 0 % 1067


Thank you.

Nikolai



The screenshot: https://yadi.sk/i/hC8FMZCE3CyzPs


Thank you.

Nikolai




Andres











--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-12 Thread Nikolai Zhubr

Hello all,

In order to locate the problem more precisely, I'd like to prepare a 
test, involving some ping-like communication between the server and a 
test client. That is, I'd like to repeatedly send something valid to the 
server and get some valid replies from it, but without any kind of real 
activity happening on the server. I've looked through the main loop in 
PostgresMain() but could not find any suitable candidates.


Any thoughts?

Thank you.

Nikolai


03.02.2017 16:30, I wrote:
[...]

Ok, secure_read() is likely irrelevant too.

I think what happened after I inserted "Sleep(15)" into secure_read() is
that this "Sleep(15)" was essentially added into the main "for(;;)" loop
of PostgresMain (through ReadCommand), introducing an artifical
additional CPU relaxation step along with every incoming query and
therefore just masking a real CPU eater.

So probably I'll have to somehow profile this "for(;;)" in PostgresMain.


Thank you.

Nikolai





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-13 Thread Nikolai Zhubr

13.02.2017 23:58, Rader, David:

How about using pg_isready?
https://www.postgresql.org/docs/current/static/app-pg-isready.html


But it doesn't actually communicate with the server AFAIK, just checks 
if a connection could be established?


Maybe I should have been more specific.
What I need is debugging/profiling pure communication side of server 
operation, implying huge lots of requests and replies going over the 
wire to and from the server within some continued (valid) session, but 
so that the server is not actually doing anything above that (no sql, no 
locking, no synchronizing, zero usefull activity, just pumping network I/O)



Regards,
Nikolai



--
David Rader
dav...@openscg.com <mailto:dav...@openscg.com>

On Sun, Feb 12, 2017 at 12:23 PM, Nikolai Zhubr mailto:n-a-zh...@yandex.ru>> wrote:

Hello all,

In order to locate the problem more precisely, I'd like to prepare a
test, involving some ping-like communication between the server and
a test client. That is, I'd like to repeatedly send something valid
to the server and get some valid replies from it, but without any
kind of real activity happening on the server. I've looked through
the main loop in PostgresMain() but could not find any suitable
candidates.

Any thoughts?

Thank you.

Nikolai


03.02.2017 16:30, I wrote:
[...]

Ok, secure_read() is likely irrelevant too.

I think what happened after I inserted "Sleep(15)" into
secure_read() is
that this "Sleep(15)" was essentially added into the main
"for(;;)" loop
of PostgresMain (through ReadCommand), introducing an artifical
additional CPU relaxation step along with every incoming query and
therefore just masking a real CPU eater.

So probably I'll have to somehow profile this "for(;;)" in
PostgresMain.


Thank you.

Nikolai




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-13 Thread Nikolai Zhubr

14.02.2017 1:10, Thomas Kellerer:

Nikolai Zhubr schrieb am 13.02.2017 um 23:03:

Maybe I should have been more specific.
What I need is debugging/profiling pure communication side of server
operation, implying huge lots of requests and replies going over the
wire to and from the server within some continued (valid) session,
but so that the server is not actually doing anything above that (no
sql, no locking, no synchronizing, zero usefull activity, just
pumping network I/O)



If you are willing to drop the "no sql" requirement you could use
something like

select rpad('*', 1, '*');

this will send a lot of data over the wire, the SQL overhead should be
fairly small.


Well yes, but I've been there already.
Now I'd like to locate a CPU eater more precisely - supposedly there is 
some issue with communication, that is why I don't want to mix in 
anything else.


Anyway, I've now got pretty sure the standard protocol out of the box 
does not provide such short-circuite capability so I'll have to hack it 
a bit.



Regards,
Nikolai



You can send more data if you combine that with e.g. generate_series()




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-14 Thread Nikolai Zhubr

14.02.2017 12:47, John R Pierce:

On 2/13/2017 11:03 PM, Nikolai Zhubr wrote:

Now I'd like to locate a CPU eater more precisely - supposedly there
is some issue with communication, that is why I don't want to mix in
anything else.


use iperf to test the network transport layer, without any postgres in
the loop?


No, the network performance in general is fine. However, some specific 
continuous communication pattern causes something to go very wrong on 
windows periodically. (And there is no such problem on linux btw) While 
the issue now seems rather related to windows itself and not postgres, 
but the pattern typical for some of my postgres uses is apparently a 
reliable trigger. Therefore, I'm thinking to first isolate the problem 
inside postgres, and if it still triggers then, second step, create a 
test "server" consisting of communication code of postgres to run 
without any other parts of postgres.



Regards,
Nikolai








--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-14 Thread Nikolai Zhubr

14.02.2017 18:15, Rader, David:
[...]

Try the libpq call from pg_isready. It does actually make a round trip
to the postgres server and asks the server if it is ready to accept
connections. So you are running the socket communication code of
postgres and a small bit of "status" check but not any sql.


Indeed, such function is available. But essentially, this function is a 
(kind of) combined login+logout, therefore it would not work for my 
purpose. (Despite its name, it can not be used to perform some 
communication "ping" within an established session, such thing simply 
does not exist in the protocol, at least as of 9.5)



Regards,
Nikolai



from mirror:
https://github.com/postgres/postgres/blob/master/src/bin/scripts/pg_isready.c

relevant call is PQping:

rv = PQpingParams(keywords, values, 1);







--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-02-26 Thread Nikolai Zhubr

15.02.2017 0:06, I wrote:
[...]

Indeed, such function is available. But essentially, this function is a
(kind of) combined login+logout, therefore it would not work for my
purpose. (Despite its name, it can not be used to perform some
communication "ping" within an established session, such thing simply
does not exist in the protocol, at least as of 9.5)


Ok, I've made some additions to the protocol to be able to do pings to a 
real server, some standalone tests mimicing postgresql communication and 
employing RDTSC, and run rediculously extensive experiments. And my 
findings to the moment is that the reason for CPU load issue is a wierd 
behaviour of windows' core functions, specifically 
WaitForMultipleObjects (and apparently the whole family of Wait 
functions) which is used in postgresql backend for waiting.


So, what I've observed is that Wait* functions _usually_ go to sleep 
nicely when the state is not signalled, but _sometimes_, depending on 
unknown criteria, it can choose to instead do a busy-loop wait or 
something CPU-expensive. Maybe it tries to optimize the delay, or maybe 
it is a bug. The effect somewhat varies depending on windows version, 
CPU cores, selected system timer frequency, and Wait* call pattern 
(frequency).


I can not currently see how it can be fixed in a generic and reliable 
way in postgres without sacrificing some performance. On the other hand, 
leaving it as-is is exposing the server to a substantial CPU abuse (even 
unintentional, as it initially happened in my case).



Regards,
Nikolai




Regards,
Nikolai



from mirror:
https://github.com/postgres/postgres/blob/master/src/bin/scripts/pg_isready.c


relevant call is PQping:

rv = PQpingParams(keywords, values, 1);











--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).

2017-03-01 Thread Nikolai Zhubr

27.02.2017 10:08, I wrote:
[...]

So, what I've observed is that Wait* functions _usually_ go to sleep
nicely when the state is not signalled, but _sometimes_, depending on
unknown criteria, it can choose to instead do a busy-loop wait or
something CPU-expensive. Maybe it tries to optimize the delay, or maybe
it is a bug. The effect somewhat varies depending on windows version,
CPU cores, selected system timer frequency, and Wait* call pattern
(frequency).

I can not currently see how it can be fixed in a generic and reliable
way in postgres without sacrificing some performance. On the other hand,
leaving it as-is is exposing the server to a substantial CPU abuse (even
unintentional, as it initially happened in my case).


Ok, I've finally found what happened. False alarm, in short. The server 
is perfectly fine.


My mistake was that I was using most basic ways of monitoring CPU load, 
i.e. provided by GetProcessTimes() and taskmgr.exe and such. And because 
they only operate at the scheduler granularity (usually 16ms approx), 
some rounding has to happen in cpu consumption calculation. With this 
approach, this rounding can apparently distort the measurement 
dramatically (in some corner cases).


Now, xperf employs a more fair approach to cpu consumption measurement, 
based on all the exact times of process switching-in and -out, which 
apparently gives a much better result.


So, in my case the results of taskmgr and xperf are very different, and 
considering the above, taskmgr's report should be dismissed as bogus. 
And xperf indicates that my stress-tests actually do not cause postgres 
backend process to consume any noticable cpu percentage al all. That's good.



Regards,
Nikolai




Regards,
Nikolai




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general