First of all, I totally agree with adding more descriptive logs.

Let me rephrase your sentences to see whether I understood the issue
correctly.

So the main issue here is that there are too many "Rescheduling Run"
messages in invokers?
Since each controller is given its own share of every invoker's resources
and schedules activations against its own semaphore, such messages should
not appear very often.
But they do occasionally show up in real deployments, and you suspect this
is due to some design flaws or bugs in the system.
Do I understand correctly?

If yes, two things come to mind.

1. getInvokerSlot
https://github.com/apache/incubator-openwhisk/blob/master/core/controller/src/main/scala/org/apache/openwhisk/core/loadBalancer/ShardingContainerPoolBalancer.scala#L477

As you know, each invoker's memory is divided among all controllers, and if
a controller's share falls below the minimum action memory, the controller
is given the minimum memory size instead.

private def getInvokerSlot(memory: ByteSize): ByteSize = {
    val newTreshold = if (memory / _clusterSize < MemoryLimit.minMemory) {
      logging.warn(
        this,
        s"registered controllers: ${_clusterSize}: the slots per invoker fall below the min memory of one action.")(
        TransactionId.loadbalancer)
      MemoryLimit.minMemory      // here
    } else {
      memory / _clusterSize
    }
    newTreshold
  }

This can easily happen with small invokers in a large-scale deployment.
For example, if invokers have 10GB of memory and there are 100 controllers,
each controller's share of an invoker is only 100MB.
That is below the 128MB minimum, so every controller is bumped up to 128MB
per invoker, and if they all schedule activations to one invoker, 128MB *
100 = 12.8GB can be scheduled onto a 10GB invoker.
Clearly, that is more than the invoker can handle.
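
To make that arithmetic concrete, here is a small self-contained sketch
(hypothetical names and constants, not the actual OpenWhisk code; it only
mimics the clamping done in getInvokerSlot):

object SlotOversubscription {
  // Illustrative numbers: ~10GB of invoker user memory, 100 controllers,
  // 128MB minimum action memory.
  val invokerUserMemoryMB = 10000
  val clusterSize = 100
  val minMemoryMB = 128

  // Same clamping idea as getInvokerSlot: never hand out less than the minimum.
  def invokerSlotMB(memoryMB: Int, clusterSize: Int): Int =
    math.max(memoryMB / clusterSize, minMemoryMB)

  def main(args: Array[String]): Unit = {
    val perController = invokerSlotMB(invokerUserMemoryMB, clusterSize) // 128, not 100
    val totalSchedulable = perController * clusterSize                  // 12800
    println(s"each controller: $perController MB, all controllers: $totalSchedulable MB, " +
      s"invoker actually has: $invokerUserMemoryMB MB")
    // 12800 MB > 10000 MB, so the invoker can be oversubscribed by design.
  }
}

So the smaller the invokers and the larger the controller cluster, the
bigger the gap between what controllers believe they can schedule and what
the invoker can actually run.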



2. MessageFeed

Currently, each invoker has its own user memory assigned and creates
containers based on it.
But the MessageFeed does not take that memory into account.
https://github.com/apache/incubator-openwhisk/blob/master/common/scala/src/main/scala/org/apache/openwhisk/core/connector/MessageConsumer.scala#L106

It still uses "maxPeek" to decide whether it will forward messages to
InvokerReactive.
So if only 256MB remain free in the invoker when InvokerReactive sends
MessageFeed.Processed to the MessageFeed, the MessageFeed will hand the
next message to the invoker regardless of the remaining memory.
If that message requires a memory limit higher than 256MB, it will induce a
"Rescheduling Run" message.


I hope this helps you diagnose the issue.
If I missed or misunderstood anything, kindly let me know.

Best regards
Dominic





On Thu, Jul 4, 2019 at 5:39 PM, Sven Lange-Last <sven.lange-l...@de.ibm.com> wrote:

> When running Apache OpenWhisk with a high load of heterogeneous action
> invocations - i.e. high variance in memory / time limit and high number of
> different namespaces / actions -, single invokers are occasionally
> overloaded. With this post, I want to share my observations and
> conclusions with the community. As a consequence, I'm planning to provide
> a series of pull requests that are meant to provide more insights when
> running into the invoker overloads.
>
> Please share your observations and conclusions as well. I'm looking
> forward to your feedback.
>
> Invoker overloads can be concluded from the occurrence of the (in)famous
> "Rescheduling Run message" log entry on invokers [1]:
>
> "Rescheduling Run message, too many message in the pool, freePoolSize: 0
> containers and 0 MB, busyPoolSize: 2 containers and 512 MB,
> maxContainersMemory 512 MB, userNamespace: namespace, action:
> ExecutableWhiskAction/namespace/package/action@0.0.1, needed memory: 256
> MB, waiting messages: 0"
>
> Excess activations on overloaded invokers need to wait until any of the
> currently running actions completes. This can take up to the maximum
> action time limit, potentially causing unacceptable wait times for these
> excess activations.
>
> In large OW installations, we see thousands of said log entries at times
> in a few days. Do other OW adopters share these observations?
>
> I'm aware of two conditions that cause overloaded invokers by design:
>
> 1. A controller schedules an activation - but no acknowledgement is
> received from the invoker within the expected time because the invoker
> takes too long to complete the activation. Based on timer expiration, a
> forced acknowledgement removes said activation from the load balancer's
> memory book-keeping. With the released capacity, the load balancer
> schedules a new activation to the invoker - which may still be running the
> action that timed out in the controller before.
>
> 2. A controller cannot identify an invoker that has enough free capacity
> to schedule an activation - in particular, this can happen if the
> activation's memory limit is equal to the controller's shard memory size
> on the invoker. If there is at least one usable invoker, the load balancer
> will select a random invoker to schedule the activation. This situation is
> called overload in the load balancer code and will yield a book-keeping
> semaphore with a negative value. Apparently, the selected invoker cannot
> process the scheduled activation.
>
> Did I miss other conditions that cause overloaded invokers by design?
>
> I suspect that there are additional causes for overloaded invokers -
> design flaws in the controller / load balancer or even bugs. I'm
> suggesting that we extend log messages and improve existing metrics / introduce
> new metrics to better understand what's going on with overloaded invokers.
> We need to be careful when extending log messages - we must neither
> considerably increase the log volume nor impact performance due to
> additional operations. Overall, the goal is to eventually fix invoker
> overloads.
>
> I already opened a pull request (together with Sugandha) to add the action
> timeout limit to the invoker assignment message in load balancer plus some
> other changes [2]. Please check the PR for details.
>
> I'm planning further pull requests in these areas:
>
> * At times, we see a non-negligible amount of forced acknowledgements in
> large OW installations. For this reason, I suggest extending log messages
> in the processCompletion() method [3] in the common load balancer code to
> provide more diagnostic information when forced acknowledgements occur. In
> particular, I want to add more information about the invoker processing
> the activation and the action itself. A metric reflecting forced
> acknowledgements also seems helpful.
>
> * As discussed above, the load balancer will schedule an activation to a
> randomly selected usable invoker if it cannot find a usable invoker that
> has enough free user memory ("overload"). This can also be caused by
> fragmentation where all invokers are running activations with small memory
> limits and the activation to be scheduled has a very high memory limit.
> Even though the invoker pool may have plenty of free user memory in total,
> no single invoker may have enough free memory to fit a large activation.
>
> For this reason, I'm planning to extend the schedule() method [4] in the
> ShardingContainerPoolBalancer to collect more information about scheduling
> that is logged afterwards: how many invokers were visited? Which minimum,
> average and maximum free memory did the usable invokers have that were not
> selected?
>
> * When the "Rescheduling Run message" log entry on invokers [1] occurs, we
> don't know what's currently going on in busy and free container pools. I'm
> planning to extend the log message with more detail information about the
> pools to better understand the scheduling history of this invoker. We need
> to understand which activations currently occupy the invoker.
>
> Please let me know what you think.
>
> [1]
>
> https://github.com/apache/incubator-openwhisk/blob/ee33c22224515d0161b5b595b0d45dbc14164c6e/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/ContainerPool.scala#L175-L199
> [2] https://github.com/apache/incubator-openwhisk/pull/4537
> [3]
>
> https://github.com/apache/incubator-openwhisk/blob/ee33c22224515d0161b5b595b0d45dbc14164c6e/core/controller/src/main/scala/org/apache/openwhisk/core/loadBalancer/CommonLoadBalancer.scala#L217-L222
> [4]
>
> https://github.com/apache/incubator-openwhisk/blob/ee33c22224515d0161b5b595b0d45dbc14164c6e/core/controller/src/main/scala/org/apache/openwhisk/core/loadBalancer/ShardingContainerPoolBalancer.scala#L377-L398
>
>
> Mit freundlichen Grüßen / Regards,
>
> Sven Lange-Last
> Senior Software Engineer
> IBM Cloud Functions
> Apache OpenWhisk
>
>
> E-mail: sven.lange-l...@de.ibm.com
> Schoenaicher Str. 220
> Boeblingen, 71032
> Germany
>
>
>
>
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart,
> HRB 243294
>
>
>