Hi Holger,

I did not completely understand your mail, but reading Sven's answer reminded me that some time ago we also had problems with running out of semaphores.
After writing with Esteban Lorenzano, we did the following:

1) At startup of the application: Smalltalk vm maxExternalSemaphoresSilently: 65535.
2) Setting the default pool size of VOMongoRepository from 10 to 2 (we have our own VOMongoRepository subclass).

Perhaps this is not your topic, perhaps it helps.

Sabine

On Mon, 10 Feb 2020 at 15:14, Sven Van Caekenberghe <s...@stfx.eu> wrote:

> Hi Holger,
>
> That is a complicated story ;-)
>
> But running out of external semaphores means that you are using too many
> sockets, are not closing/releasing them (in time), and/or your GC does not
> run often enough to keep up (it is easy to deplete the external semaphore
> table without the GC kicking in).
>
> You must have a loop somewhere that goes too fast and maybe does not clean
> up properly while doing so.
>
> YMMV, but I do similar things -- implement/offer REST services that call
> other REST/network services, all with timeouts, in several variations, for
> years -- and I do not have problems like you describe.
>
> I would suggest enabling logging so that you can see better where the
> allocations happen and whether your cleanup code does its work.
>
> Sven
>
> PS: Zinc logging is easy, just do
>
>     ZnLogEvent logToTranscript
>
> > On 9 Feb 2020, at 16:31, Holger Freyther <hol...@freyther.de> wrote:
> >
> > tl;dr: I am searching for a pattern (later: code) to apply expiration
> > to operations.
> >
> > Introduction:
> >
> > One nice aspect of MongoDB is that it has built-in data distribution [1]
> > and configurable retention [2]. The upstream project has a document called
> > "Server Discovery and Monitoring (SDAM)" defining how a client should
> > behave. Martin Dias is currently implementing SDAM in MongoTalk/Voyage and
> > I took it for a test drive.
> >
> > Behavior:
> >
> > My software stack uses Zinc, Zinc-REST, Voyage and Mongo. When a new REST
> > request arrives I use Voyage (e.g. >>#selectOne:), which in turn uses
> > MongoTalk.
> > The MongoTalk code needs to select the right server; currently this is
> > done by waiting for a result.
> >
> > Next I started to simulate database outages. The REST clients retried
> > when not receiving a result within two seconds (no back-off/jitter). What
> > happened was roughly the following:
> >
> > [
> >     "1) ZnServer accepts a new connection."
> >     "2) MongoTalk waits for a server for longer than 2s."
> >     "... nothing: the above keeps waiting ..."
> > ] repeat.
> >
> > Problem:
> >
> > What happened next surprised me. I expected to have a bad time once the
> > database recovered and all the stale requests (remember, the REST clients
> > had already given up and closed the socket) would be answered. Instead my
> > image crashed early in the test because the ExternalSemaphoreTable was
> > full.
> >
> > Let's focus on the timeout behavior here and discuss the existence of
> > the ExternalSemaphoreTable and its number of entries separately at a
> > different time.
> >
> > To me the two main problems are:
> >
> > 1) Lack of back-pressure in ZnManagingMultiThreadedServer.
> >
> > 2) A disconnect between how long the application layer handling REST is
> > allowed to take and, further down the stack, how long MongoTalk may sleep
> > and wait for a server.
> >
> > The first item is difficult. Even answering HTTP 500 when we are out of
> > space in the ExternalSemaphoreTable is difficult... Let's ignore this for
> > now as well.
> >
> > What I am looking for:
> >
> > 1) Voluntary timeout
> >
> > Inside my application code I would like to tag an operation with a
> > timeout. This means everything that is done should complete within X
> > seconds. It can be used on a voluntary basis:
> >
> > #lookupPerson
> >     "We expect all database operations to complete within two seconds."
> >     person := ComputeContext current
> >         withTimeout: 2 seconds
> >         during: [ repository selectOne: Person where: [ :each | each name = ... ] ].
> > MongoTalk>>stuff
> >     "See whether the outer context timeout has expired and signal, e.g.
> >     before writing something into the socket, to keep consistency."
> >     ComputeContext current checkExpired.
> >
> > MongoTalk>>other
> >     "Sleep for up to the remaining timeout."
> >     (someSemaphore waitTimeoutContext: ComputeContext current)
> >         ifFalse: [ SomethingExpired signal ].
> >
> > 2) Cancellation
> >
> > This is more difficult to write in pseudo code (without TaskIt?). In the
> > case above we are waiting for the database to become ready while the
> > client has already closed the file descriptor, and we are not able to see
> > this until much later.
> >
> > The idea is that, in addition to the timeout, we can pass a block that
> > is called when an operation should be cancelled, and the ComputeContext
> > can be checked to see whether something has been cancelled.
> >
> > The above takes inspiration from Go's context package [3]. In Go the
> > context should be passed as a parameter, but we could make it a Process
> > variable?
> >
> > Question:
> >
> > How do you handle this in your systems? Is this something we can
> > consider for Pharo 9?
> >
> > thanks
> > holger
> >
> > [1] MongoDB has the concept of a "replica set" and works by having a
> > primary, secondaries and arbiters running.
> > [2] For every write one can configure whether it should succeed
> > immediately (before it is even on disk) or only once it has been written
> > to multiple stores (e.g. majority, US and EMEA).
> > [3] https://golang.org/pkg/context/
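The deadline half of the ComputeContext idea above could be prototyped today with Pharo's DynamicVariable, which carries a dynamically scoped, per-process value (set with class-side value:during:, read with value). A minimal sketch, assuming that mechanism; ComputeDeadline, TimedOut and all selectors below are hypothetical names, not existing API:

```smalltalk
"Sketch only: a deadline carried in a dynamic (per-process) variable.
ComputeDeadline and TimedOut are made-up names for illustration."

Error subclass: #TimedOut
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'ComputeContext'

DynamicVariable subclass: #ComputeDeadline
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'ComputeContext'

ComputeDeadline class >> withTimeout: aDuration during: aBlock
	"Make 'now + aDuration' the deadline for everything run inside aBlock."
	^ self value: DateAndTime now + aDuration during: aBlock

ComputeDeadline class >> checkExpired
	"Signal TimedOut if a deadline is set and has passed; do nothing otherwise.
	Library code (e.g. MongoTalk) would call this before expensive steps,
	such as before writing to the socket."
	self value ifNotNil: [ :deadline |
		DateAndTime now > deadline ifTrue: [ TimedOut signal: 'deadline expired' ] ]

ComputeDeadline class >> remainingMilliseconds
	"Answer how many milliseconds are left, or nil when no deadline is set."
	^ self value ifNotNil: [ :deadline |
		(deadline - DateAndTime now) asMilliSeconds max: 0 ]
```

Application code would then read roughly as in the #lookupPerson example, e.g. ComputeDeadline withTimeout: 2 seconds during: [ repository selectOne: ... ], and bounded waits could combine remainingMilliseconds with Semaphore>>waitTimeoutMSecs:. Cancellation could travel the same way, as a second dynamic variable holding a flag or callback block, which is essentially what Go's context does with its Done channel.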