Overflowing to Kafka (option B) is better. Actually, I would dump all the activations there and have a separate process drain that Kafka topic into the datastore or logstore.
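As a rough illustration of that drain process, here is a minimal Scala sketch. The `pollBatch` and `persistBatch` functions are hypothetical stand-ins for the real Kafka consumer and ArtifactStore client (in a real service you would commit offsets only after a successful persist):

```scala
// Hypothetical sketch of a drainer for the overflow topic: it repeatedly polls
// a batch of activation records and persists them to the store, stopping when
// the topic yields an empty batch. Names here are illustrative, not OpenWhisk APIs.
object OverflowDrainer {
  type Record = String

  // Returns the total number of records persisted.
  def drain(pollBatch: () => Seq[Record],
            persistBatch: Seq[Record] => Unit): Int = {
    var total = 0
    var batch = pollBatch()
    while (batch.nonEmpty) {
      persistBatch(batch) // in a real consumer, commit offsets only after this succeeds
      total += batch.size
      batch = pollBatch()
    }
    total
  }
}
```

With in-memory stubs for the two functions this loop can be exercised without a running Kafka cluster, which also makes the persistence logic easy to unit test.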
There is another approach of routing the logs directly to a logstore without going through the invoker at all. IBM may have experimented with this; maybe someone else can comment on that.

-r

> On Jun 20, 2019, at 2:20 AM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
>
> Hi Team,
>
> When the rate of activation is high (especially with concurrency
> enabled) in a specific invoker, it is possible that the rate of storing
> activations in the ArtifactStore lags behind the rate of activation
> record generation.
>
> For CouchDB this was somewhat mitigated by using a Batcher
> implementation which internally used the CouchDB bulk insert API
> (#2812) [1]. However, Batcher is currently configured with a queue size
> of Int.max [2], which can potentially lead to the invoker going OOM.
>
> We tried to implement similar support for CosmosDB (#4513) [3]. In our
> tests we saw that even a queue size of 100k filled up quickly under
> higher load.
>
> For #2812 Rodric had mentioned the need to support backpressure [4]:
>
>> we should perhaps open an issue to refactor the relevant code so that we can
>> backpressure the invoker feed when the activations can't be drained fast
>> enough.
>
> Currently the storeActivation call is not waited upon in
> ContainerProxy, and hence there is no backpressure. I wanted to check
> what options we could try if activations can't be drained fast enough.
>
> Option A - Wait till enqueue
> -------------------------------------
>
> When calling storeActivation, wait until the call gets "enqueued". If
> it gets rejected because the queue is full (assuming here that the
> ArtifactStore has a queued storage), then we wait and retry a few
> times. If it gets queued, then we simply complete the call.
>
> With this we would not be occupying the invoker slot until storage is
> complete. Instead we occupy it just a bit longer, until the activation
> gets enqueued to the in-memory buffer.
>
> Option B - Overflow to Kafka and a new OverflownActivationRecorderService
> ----------------------------------------------------------------------------------------------------
>
> With enqueuing the activations there is always a chance of increased
> heap pressure, especially if the activation size is large (a
> user-controlled aspect). So another option would be to overflow to
> Kafka for storing activations.
>
> If the internal queue is full (the queue size can now be small), then
> we would enqueue the record to Kafka. Kafka would in general have a
> higher throughput rate than normal storage.
>
> Then on the other end we can have a new microservice which would poll
> this "overflowActivations" topic and persist the records in the
> ArtifactStore. Here we can use a single but partitioned topic if we
> need to scale out queue processing across multiple service instances.
>
> Any feedback on which option to pursue would be helpful!
>
> Chetan Mehrotra
> [1]: https://github.com/apache/incubator-openwhisk/pull/2812
> [2]: https://github.com/apache/incubator-openwhisk/blob/master/common/scala/src/main/scala/org/apache/openwhisk/core/database/Batcher.scala#L56
> [3]: https://github.com/apache/incubator-openwhisk/pull/4513
> [4]: https://github.com/apache/incubator-openwhisk/pull/2812#pullrequestreview-67378126
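To make Option A concrete, here is a minimal Scala sketch of the "wait till enqueue" behavior, assuming the store front-end is a small bounded in-memory queue. The class and parameter names are hypothetical, not OpenWhisk code; the point is that a full queue makes the caller wait and retry for a bounded time instead of growing the heap:

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

// Hypothetical bounded buffer in front of the store. storeActivation-style
// callers complete only once the record is accepted; if the queue is full,
// offer() blocks for a bounded wait and is retried a few times, so a slow
// store backpressures the caller rather than exhausting memory.
class BoundedActivationBuffer[A](capacity: Int) {
  private val queue = new ArrayBlockingQueue[A](capacity)

  // Returns true once the record is enqueued, false if all retries time out.
  def enqueueWithRetry(record: A, retries: Int = 3, waitMs: Long = 100): Boolean = {
    var attempt = 0
    var accepted = false
    while (!accepted && attempt <= retries) {
      accepted = queue.offer(record, waitMs, TimeUnit.MILLISECONDS)
      attempt += 1
    }
    accepted
  }

  // Called by the storage side as it drains records to the ArtifactStore.
  def drainOne(): Option[A] = Option(queue.poll())
}
```

A `false` return here is the signal that would trigger the fallback behavior (further waiting, or the Kafka overflow of Option B).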
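And a small sketch of the Option B routing decision: try the small in-memory queue first and, if it is full, overflow the record to Kafka. The Kafka producer is abstracted as a plain function here so the routing logic stays self-contained; in a real implementation it would wrap a producer send to the "overflowActivations" topic. All names are illustrative:

```scala
// Hypothetical overflow router for Option B. tryEnqueue stands in for the
// bounded in-memory queue's non-blocking offer; sendToKafka stands in for a
// producer publishing to the overflow topic, to be drained by a separate service.
object OverflowRouter {
  sealed trait Route
  case object Buffered   extends Route
  case object Overflowed extends Route

  def store[A](record: A, tryEnqueue: A => Boolean, sendToKafka: A => Unit): Route =
    if (tryEnqueue(record)) Buffered
    else { sendToKafka(record); Overflowed }
}
```

Keeping the decision in one place also makes it easy to count overflows, which would be a useful metric for sizing the in-memory queue.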