On Tue, Jul 9, 2019 at 8:30 AM Konstantin Knizhnik
<k.knizh...@postgrespro.ru> wrote:
> Rebased version of the patch is attached.

Thanks for including nice documentation in the patch, which gives a
good overview of what's going on. I haven't read any code yet, but I
took it for a quick drive to understand the user experience. These
are just some first impressions.

I started my server with -c connection_proxies=1 and tried to connect
to port 6543, and the proxy segfaulted on a null pointer dereference
while accessing port->gss->enc. I rebuilt without --with-gssapi to
get past that.

Using SELECT pg_backend_pid() from many different connections, I
could see that they were often being served by the same process
(although sometimes it created an extra one when there didn't seem to
be a good reason for it to do that). I could see the proxy managing
these connections with SELECT * FROM pg_pooler_state() (I suppose
this would be wrapped in a view with a name like pg_stat_proxies). I
could see that once I did something like SET foo.bar = 42, a backend
became dedicated to my connection and no other connection could use
it. As described. Neat.

Obviously your concept of tainted backends (= backends that can't be
reused by other connections because they contain non-default session
state) is quite simplistic and would help only the very simplest use
cases. Obviously the problems that need to be solved first to do
better than that are quite large. Personally I think we should move
all GUCs into the Session struct, put the Session struct into shared
memory, and then figure out how to put things like prepared plans
into something like Ideriha-san's experimental shared memory context
so that they can also be accessed by any process, and then we'll
mostly be tackling problems that we'll have to tackle for threads
too. But I think you made the right choice to experiment with just
reusing the backends that have no state like that.

On my FreeBSD box (which doesn't have epoll(), so it's latch.c's
old-school poll() for now), I see the connection proxy process eating
a lot of CPU and the temperature rising.
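
CPU burn of this kind in a poll()-based event loop is often caused by
waiting for POLLOUT on a socket whose send buffer already has room:
level-triggered poll() then reports the socket ready on every call
and never sleeps. Whether that is what happens in the patch is an
assumption, not something confirmed from the code. The standalone
sketch below only illustrates the effect; the helper names
(open_idle_socket, count_ready_returns, wanted_events) are
hypothetical and do not come from the patch or from latch.c.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Create a connected AF_UNIX socket pair and return one end.
 * The peer end is deliberately left open (and leaked) so that the
 * returned socket never sees EOF; good enough for a demo.
 */
int
open_idle_socket(void)
{
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

    assert(rc == 0);
    (void) rc;
    return sv[0];
}

/*
 * Call poll() 'iterations' times on fd with the given event mask and
 * count how many calls reported the fd as ready instead of timing out.
 */
int
count_ready_returns(int fd, short events, int iterations, int timeout_ms)
{
    int ready = 0;

    for (int i = 0; i < iterations; i++)
    {
        struct pollfd pfd = { .fd = fd, .events = events, .revents = 0 };

        if (poll(&pfd, 1, timeout_ms) > 0)
            ready++;
    }
    return ready;
}

/*
 * A common remedy (assumed here, not taken from the patch): ask for
 * POLLOUT only while there is buffered output waiting to be sent.
 */
short
wanted_events(size_t pending_output_bytes)
{
    short events = POLLIN;

    if (pending_output_bytes > 0)
        events |= POLLOUT;
    return events;
}
```

On an idle socket from open_idle_socket(), calling
count_ready_returns(fd, POLLIN | POLLOUT, 5, 10) should report the
socket ready on every iteration, while passing wanted_events(0)
should let every call sleep for the full timeout. Edge-triggered
interfaces such as epoll with EPOLLET hide this difference, which
would explain a loop that behaves on Linux but spins elsewhere.
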

I see with truss that it's doing this as fast as it can:

    poll({ 13/POLLIN 17/POLLIN|POLLOUT },2,1000) = 1 (0x1)

Ouch. I admit that I had the idea to test on FreeBSD because I
noticed the patch introduces EPOLLET and I figured this might have
been tested only on Linux. FWIW the same happens on a Mac.

That's all I had time for today, but I'm planning to poke this some
more, and get a better understanding of how this works at an OS
level. I can see fd passing, IO multiplexing, and other interesting
things happening. I suspect there are many people on this list who
have thoughts about the architecture we should use to allow a smaller
number of PGPROCs and a larger number of connections, with various
different motivations.

> Thank you, I will look at Takeshi Ideriha's patch.

Cool.

> > Could you please fix these compiler warnings so we can see this
> > running check-world on CI?
> >
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.46324
> > https://travis-ci.org/postgresql-cfbot/postgresql/builds/555180678
> >
> Sorry, I do not have access to Windows host, so can not check Win32
> build myself.

    C:\projects\postgresql\src\include\../interfaces/libpq/libpq-int.h(33):
    fatal error C1083: Cannot open include file: 'pthread-win32.h': No
    such file or directory (src/backend/postmaster/proxy.c)
    [C:\projects\postgresql\postgres.vcxproj]

These relative includes in proxy.c are part of the problem:

    #include "../interfaces/libpq/libpq-fe.h"
    #include "../interfaces/libpq/libpq-int.h"

I didn't dig into this much but some first reactions:

1. I see that proxy.c uses libpq, and correctly loads it as a dynamic
library just like postgres_fdw. Unfortunately it's part of core, so
it can't use the same technique as postgres_fdw to add the libpq
headers to the include path.

2. libpq-int.h isn't supposed to be included by code outside libpq,
and in this case it fails to find pthread-win32.h, which it
apparently expects to find in either the same directory or the
include path.

I didn't look into what exactly is going on (I don't have Windows
either), but I think we can say the root problem is that you
shouldn't be including that from backend code.

-- 
Thomas Munro
https://enterprisedb.com