On 22.12.2017 16:13, Claudio Freire wrote:


On Fri, Dec 22, 2017 at 10:07 AM, Konstantin Knizhnik <k.knizh...@postgrespro.ru> wrote:

    While experimenting with a pthreads version of Postgres, I found
    that I cannot create more than 100k backends even on a system
    with 4Tb of RAM.
    I do not want to discuss here the idea of creating such a large
    number of backends - yes, most real production systems use
    pgbouncer or a similar connection pooling tool to restrict the
    number of connections to the database.
    But there are 144 cores in this system, and if we want to utilize
    all of its resources, the optimal number of backends will be
    several hundred (especially taking into account that Postgres
    backends are usually not CPU bound and have to read data from
    disk, so the number of backends should be much larger than the
    number of cores).

    There are several per-backend arrays in Postgres whose size
    depends on the maximal number of backends.
    For max_connections=100000, Postgres allocates 26Mb for each snapshot:

            CurrentRunningXacts->xids = (TransactionId *)
                malloc(TOTAL_MAX_CACHED_SUBXIDS * sizeof(TransactionId));

    This value seems to be grossly overestimated, because
    TOTAL_MAX_CACHED_SUBXIDS is defined as:

    /*
     * During Hot Standby processing we have a data structure called
     * KnownAssignedXids, created in shared memory. Local data structures are
     * also created in various backends during GetSnapshotData(),
     * TransactionIdIsInProgress() and GetRunningTransactionData(). All of the
     * main structures created in those functions must be identically sized,
     * since we may at times copy the whole of the data structures around. We
     * refer to this size as TOTAL_MAX_CACHED_SUBXIDS.
     *
     * Ideally we'd only create this structure if we were actually doing hot
     * standby in the current run, but we don't know that yet at the time
     * shared memory is being set up.
     */
    #define TOTAL_MAX_CACHED_SUBXIDS \
        ((PGPROC_MAX_CACHED_SUBXIDS + 1) * PROCARRAY_MAXPROCS)
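
    Just to show where the 26Mb comes from, here is a back-of-envelope
    calculation (assuming PGPROC_MAX_CACHED_SUBXIDS is 64 and
    TransactionId is 4 bytes, and taking PROCARRAY_MAXPROCS to be
    roughly max_connections here):

        /* illustrative arithmetic only, not actual Postgres code */
        size_t snapshot_xids_size =
            (64 + 1)                /* PGPROC_MAX_CACHED_SUBXIDS + 1 */
            * (size_t) 100000       /* ~PROCARRAY_MAXPROCS for max_connections=100000 */
            * 4;                    /* sizeof(TransactionId) */
        /* = 65 * 100000 * 4 = 26,000,000 bytes, i.e. the ~26Mb above */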


    Another 12Mb array is used for deadlock detection:

    #2  0x00000000008ac397 in InitDeadLockChecking () at deadlock.c:196
    196            (EDGE *) palloc(maxPossibleConstraints * sizeof(EDGE));
    (gdb) list
    191         * last MaxBackends entries in possibleConstraints[] are reserved as
    192         * output workspace for FindLockCycle.
    193         */
    194        maxPossibleConstraints = MaxBackends * 4;
    195        possibleConstraints =
    196            (EDGE *) palloc(maxPossibleConstraints * sizeof(EDGE));
    197
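
    A similar back-of-envelope estimate for the ~12Mb (assuming
    sizeof(EDGE) is about 32 bytes on a 64-bit build; the struct in
    deadlock.c holds a few pointers plus two ints):

        /* illustrative arithmetic only */
        size_t deadlock_edges_size =
            (size_t) 100000 * 4     /* maxPossibleConstraints = MaxBackends * 4 */
            * 32;                   /* assumed sizeof(EDGE) */
        /* = 12,800,000 bytes, i.e. roughly the 12Mb above */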


    As a result, the amount of dynamic memory allocated for each backend
    exceeds 50Mb, and so 100k backends cannot be launched even on a
    system with 4Tb!
    I think that we should use a more accurate allocation policy in these
    places and not waste memory in such a manner (even if it is virtual).
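
    One possible shape of such a policy (purely a sketch, with
    illustrative names, not a proposed patch) is to start small and grow
    the array on demand instead of allocating for the worst case up front:

        #include <stdlib.h>

        typedef unsigned int TransactionId;     /* stand-in for the Postgres typedef */

        static TransactionId *xid_buf = NULL;
        static size_t xid_buf_capacity = 0;

        /* Grow xid_buf geometrically so that it can hold at least "needed" xids. */
        static int
        ensure_xid_buf(size_t needed)
        {
            size_t newcap;
            TransactionId *p;

            if (needed <= xid_buf_capacity)
                return 1;
            newcap = xid_buf_capacity ? xid_buf_capacity : 1024;
            while (newcap < needed)
                newcap *= 2;
            p = realloc(xid_buf, newcap * sizeof(TransactionId));
            if (p == NULL)
                return 0;               /* caller reports out of memory */
            xid_buf = p;
            xid_buf_capacity = newcap;
            return 1;
        }

    With something like this, each backend pays only for the xids it
    actually sees, not for PGPROC_MAX_CACHED_SUBXIDS times the maximal
    number of backends.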


Don't forget each thread also has its own stack. I don't think you can expect 100k threads to ever work.

Yes, Postgres requires a large stack. Although the minimal pthread stack size is 16kb, Postgres requires at least 512kb, and even that is not enough to pass the regression tests. But even with a 1Mb thread stack size, 100k connections require just (!) 100Gb. 50Mb per backend, however, is too much.
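
For what it's worth, in the pthreads experiment the per-thread stack can be capped explicitly instead of inheriting the (often 8Mb) glibc default. A minimal sketch, with backend_main as a placeholder for the threaded backend entry point:

    #include <pthread.h>

    extern void *backend_main(void *arg);   /* placeholder for the real entry point */

    int
    launch_backend(pthread_t *tid, void *arg)
    {
        pthread_attr_t attr;
        int rc;

        pthread_attr_init(&attr);
        /* 1Mb per thread, as in the estimate above: 100k threads ~ 100Gb of stacks */
        pthread_attr_setstacksize(&attr, 1024 * 1024);
        rc = pthread_create(tid, &attr, backend_main, arg);
        pthread_attr_destroy(&attr);
        return rc;
    }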


If you get to that point, you really need to consider async query execution. There was a lot of work related to that in other threads; you may want to take a look.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
