Hello, Allan

2014-09-11 0:29 GMT+04:00 Allan Kamau <kamaual...@gmail.com>:
> Dear Dmitriy,
>
> To add on to David's suggestions, data caching is a difficult task to
> undertake. Consider an example where your data may not all fit into
> memory: when you cache these data outside PostgreSQL you would need to
> look into memory management, as well as into issues around concurrent
> population of this cache and means to keep the data in the cache fresh,
> in tune with any changes to the data. These are no trivial tasks, and the
> database community has spent years constructing and improving algorithms
> to do this on behalf of the front-end database application developer.
> Also, each time a TCP connection is created, additional compute resources
> are consumed by the OS as well as by the database management server
> software.
>
Memory is a limited resource, so "out of memory" can be hit anywhere. By
caching in this case I mean flushing the data retrieved by the HTTP server
from the database server to disk while dispatching the "socket read-ready"
event, whenever the retrieved data cannot yet be sent to the HTTP client
(its socket is not write-ready). When the socket to the HTTP client becomes
write-ready, the data is streamed from that cache. I believe this is not
hard to implement; see the sketch below. And note that there is no need to
make such a cache shared: it consists of temporary files whose lifetime is
that of the HTTP request being dispatched.
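A rough sketch of what I mean (untested; the request struct, the callback
names and the surrounding event loop are hypothetical, and error handling
is omitted for brevity):

#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

struct request {
    int   client_fd;   /* socket to the HTTP client */
    FILE *spill;       /* temporary file; lives only for this request */
    long  read_pos;    /* how much of the spilled data was already sent */
};

/* Called while dispatching the database socket's "read-ready" event
 * when the client socket is not write-ready: park the chunk on disk. */
static void spill_chunk(struct request *r, const char *buf, size_t len)
{
    if (!r->spill)
        r->spill = tmpfile(); /* unlinked; vanishes when the request ends */
    fseek(r->spill, 0, SEEK_END);
    fwrite(buf, 1, len, r->spill);
}

/* Called when the client socket becomes write-ready: stream the cached
 * data out without holding up the database connection any longer. */
static void drain_spill(struct request *r)
{
    char buf[8192];
    size_t n;
    fseek(r->spill, r->read_pos, SEEK_SET);
    while ((n = fread(buf, 1, sizeof buf, r->spill)) > 0) {
        ssize_t sent = send(r->client_fd, buf, n, 0);
        if (sent <= 0)
            break; /* not write-ready anymore; resume on the next event */
        r->read_pos += sent;
    }
}

Since tmpfile() returns an already-unlinked file, the cache cleans itself
up when the request is destroyed, and no sharing or invalidation is needed.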
> A simpler way would be to use connection pooling, where a thread of your
> running application "borrows" a connection from a pool of open
> connections, executes the SQL command, then returns the connection
> immediately on completion of the SQL command. This will require few
> concurrent connections (depending on configuration) and lets the database
> do the caching of the data for you. For effective caching of database
> data you may need to make adjustments to the PostgreSQL configuration
> file (postgresql.conf) as well as to the operating system resources
> configuration. This way the response time of your client application will
> degrade gracefully with the increase of concurrent client requests.
>
In practice, this approach works. How well? Well enough. But with
thread/connection pools your site can be blocked by an attacker (a "cool
hacker") who simply initiates many slow concurrent connections. Threads are
a good fit only when you're doing a lot of CPU work rather than
communication work.

> For a small number of concurrent connections, the speed advantage a
> direct "streaming" solution may have over the traditional connection
> pooling solution may be hardly noticeable to the end user. The easier way
> to improve response time is to look into PostgreSQL performance tuning as
> well as investing in faster hardware (mainly the disk subsystem and more
> RAM).
>
Yes, I agree. Good and fast hardware is always good :-). But any hardware
has limited resources, and I believe that for many tasks an async solution
helps to leverage them to the max.

> Regards,
>
> Allan.
>
> On Wed, Sep 10, 2014 at 8:25 PM, Dmitriy Igrishin <dmit...@gmail.com>
> wrote:
>
>> Hello, Steve
>>
>> 2014-09-10 21:08 GMT+04:00 Steve Atkins <st...@blighty.com>:
>>
>>> On Sep 10, 2014, at 12:16 AM, Dmitriy Igrishin <dmit...@gmail.com>
>>> wrote:
>>>
>>> > Hello, David
>>> >
>>> > 2014-09-10 4:31 GMT+04:00 David Boreham <david_l...@boreham.org>:
>>> > Hi Dmitriy, are you able to say a little about what's driving your
>>> > quest for async http-to-pg? I'm curious as to the motivations, and
>>> > whether they match up with some of my own reasons for wanting to use
>>> > low-thread-count solutions.
>>> >
>>> > For many web projects I consider Postgres as a development platform.
>>> > Thus, I prefer to keep the business logic (data integrity trigger
>>> > functions and API functions) in the database. Because of the nature
>>> > of the Web, many concurrent clients can request a site, and I want
>>> > to serve the maximum possible number of them with minimal overhead.
>>> > I also want to avoid complex solutions. So I believe that with an
>>> > asynchronous solution it's possible to *stream* the data from the
>>> > database to the maximum number of clients (which may well request my
>>> > site over slow connections).
>>>
>>> That's going to require you to have one database connection open for
>>> each client. If the client is over a slow connection it'll keep the
>>> database connection open far longer than is needed (compared to the
>>> usual "pull data from the database as fast as the disks will go, then
>>> spoonfeed it out to the slow client" approach). Requiring a live
>>> database backend for every open client connection doesn't seem like a
>>> good idea if you're supporting many slow concurrent clients.
>>
>> Good point. Thus, some caching on the HTTP server side should be
>> implemented then.
>>
>> --
>> // Dmitriy.

--
// Dmitriy.
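P.S. For the record, here is roughly what I mean by "streaming", using
libpq's asynchronous API in single-row mode (an untested sketch; the query,
the request struct and forward_or_spill() are hypothetical, error handling
omitted):

#include <libpq-fe.h>

struct request; /* the per-request state from the sketch above */
void forward_or_spill(struct request *r, const char *data); /* hypothetical */

/* Fire the query without blocking, then ask libpq to hand rows over one
 * at a time instead of buffering the whole result set in memory. */
static void start_streaming(PGconn *conn)
{
    PQsendQuery(conn, "SELECT page_body FROM pages"); /* hypothetical query */
    PQsetSingleRowMode(conn);
}

/* Called by the event loop when PQsocket(conn) reports read-ready. */
static void on_db_readable(PGconn *conn, struct request *r)
{
    PQconsumeInput(conn);
    while (!PQisBusy(conn)) {
        PGresult *res = PQgetResult(conn);
        if (!res)
            return; /* query finished; the connection is free again */
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            forward_or_spill(r, PQgetvalue(res, 0, 0));
        PQclear(res);
    }
}

Each row is forwarded to the client if its socket is write-ready, or
spilled to the temporary file otherwise, so a slow client never blocks
the event loop or monopolizes a database backend.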