I've noticed that one of my systems is getting hammered...and that more and
more traffic is being sent to the system having trouble.  Looking at
LeastLoadedNodeSelector.java I can see why.

LeastLoadedNodeSelector finds the node in the cluster that is least loaded,
but its calculation of least loaded is based on the number of active
connections and ignores failures. As a result, more connections tend to be
made to the machine that failed on a previous attempt.

Here is the code for the compare function that sorts the list of nodes.  It
checks active count, then borrowed count, and lastly corrupted count.
Corrupted count is the interesting one, but it's almost never reached, since
the borrowed count will almost always differ between the nodes in the
cluster.

        public int compareTo(Candidate candidate) {
            int value = numActive - candidate.numActive;

            if (value == 0)
                value = numBorrowed - candidate.numBorrowed;

            if (value == 0)
                value = numCorrupted - candidate.numCorrupted;

            return value;
        }
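For comparison, here is a rough sketch of an ordering that looks at failures
first. This is only an illustration: FailureAwareCandidate, its constructor,
the host field and the leastLoaded helper are hypothetical names, not part of
the existing code, and I am assuming numCorrupted counts failed connections
for a node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the Candidate class quoted above (not the real code).
class FailureAwareCandidate implements Comparable<FailureAwareCandidate> {
    final String host;
    final int numActive;
    final int numBorrowed;
    final int numCorrupted; // assumed to count failed connections for this node

    FailureAwareCandidate(String host, int numActive, int numBorrowed, int numCorrupted) {
        this.host = host;
        this.numActive = numActive;
        this.numBorrowed = numBorrowed;
        this.numCorrupted = numCorrupted;
    }

    @Override
    public int compareTo(FailureAwareCandidate other) {
        // Compare failures first so a failing node sorts after healthy ones,
        // even when it has fewer active connections.
        int value = Integer.compare(numCorrupted, other.numCorrupted);

        if (value == 0)
            value = Integer.compare(numActive, other.numActive);

        if (value == 0)
            value = Integer.compare(numBorrowed, other.numBorrowed);

        return value;
    }

    // Returns the candidate that sorts first under the ordering above.
    static FailureAwareCandidate leastLoaded(List<FailureAwareCandidate> nodes) {
        return Collections.min(nodes);
    }
}
```

With this ordering, a node with 0 active connections and 5 failures would sort
after a node with 3 active connections and 0 failures. A real fix would
probably also need the failure count to decay over time, otherwise a node that
failed once could be starved indefinitely.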

I've seen this problem with other companies and products: least-loaded as a
means of picking servers is almost always prone to death spirals when a
server can have a failure.

Is there any way to configure away from this in C*?

Thanks,

Brian Tarbox
