### Description
While running a simple load test with one UAC (SIPp) and two UAS instances
(both SIPp), with Kamailio in the middle acting as a call-stateful proxy, I
observed that the dispatcher module reports one call as stuck even though the
dialog module reports NO active calls.

Setup:
------- 
                                                                                
                              |------> UAS1(sipp)
UAC(sipp) ----------------------Kamailio(Call Stateful Proxy)------------
                                                                                
                              |------> UAS2(sipp)

Scenario: a simple INVITE / 200 OK / ACK exchange, with the UAC sending a BYE
after 60 seconds.

### Troubleshooting
 
Note that this is purely a timing issue. It generally shows up only after very
long runs, around 24 hours or so, but it can also crop up within a few hours.

#### Reproduction

Just run the calls via sipp as described in the setup. 

#### Debugging Data

**OUTPUT after cleanly stopping the test:**

> Output of "kamctl stats dialog" command
     {
  "jsonrpc":  "2.0",
  "result": [
    "dialog:active_dialogs = 0",
    "dialog:early_dialogs = 0",
    "dialog:expired_dialogs = 0",
    "dialog:failed_dialogs = 959",
    "dialog:processed_dialogs = 1695662"
  ],
  "id": 21282
 }

> Output of "kamcmd dispatcher.list"

   {
        NRSETS: 1
        RECORDS: {
                SET: {
                        ID: 1
                        TARGETS: {
                                DEST: {
                                        URI: sip:10.214.3.20:5060;transport=udp
                                        FLAGS: AP
                                        PRIORITY: 0
                                        ATTRS: {
                                                BODY: duid=sample-cas-1;maxload=1000
                                                DUID: sample-cas-1
                                                MAXLOAD: 1000
                                                WEIGHT: 0
                                                RWEIGHT: 0
                                                SOCKET: 
                                        }
                                        LATENCY: {
                                                AVG: 2.409000
                                                STD: 108.999000
                                                EST: 1.124000
                                                MAX: 9503
                                                TIMEOUT: 22
                                        }
                                        RUNTIME: {
                                                DLGLOAD: 997
                                        }
                                }
                                DEST: {
                                        URI: sip:10.214.3.19:5060;transport=udp
                                        FLAGS: AP
                                        PRIORITY: 0
                                        ATTRS: {
                                                BODY: duid=sample-cas-0;maxload=1000
                                                DUID: sample-cas-0
                                                MAXLOAD: 1000
                                                WEIGHT: 0
                                                RWEIGHT: 0
                                                SOCKET: 
                                        }
                                        LATENCY: {
                                                AVG: 1.429000
                                                STD: 40.279000
                                                EST: 0.999000
                                                MAX: 3502
                                                TIMEOUT: 22
                                        }
                                        RUNTIME: {
                                                DLGLOAD: 985
                                        }
                                }
                        }
                }
        }
}

NOTE: After an overnight load run the dispatcher shows many calls as stuck
(DLGLOAD of 997 and 985 against a MAXLOAD of 1000 in the output above), and as
a result the Kamailio dispatcher module started sending back 404s for most of
the calls.
We kept a close watch on the tests after that and caught an iteration in
which, initially, only one call is stuck.

### Possible Solutions

The problem most likely lies in the dispatcher module, which does not seem to
take a lock when incrementing/decrementing the load variable for each UAS (the
'dload' variable, to be precise). As a result, concurrent increments and
decrements of 'dload' can overwrite each other, and the dispatcher reports
spurious values.
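To illustrate the suspected mechanism, here is a minimal, deterministic sketch of a lost update. It is not Kamailio code and not a trace from the failing system: `demo_dload`, `demo_worker_t` and the `demo_*` functions are hypothetical names, and the interleaving is replayed by hand in a single thread, assuming that an unlocked `dload--` is executed as a separate read, modify and write.

```c
#include <assert.h>

/* Hypothetical stand-in for one destination's 'dload' counter. */
int demo_dload;

/* Private temporary of one worker process (think: a CPU register). */
typedef struct {
    int tmp;
} demo_worker_t;

/* An unlocked "dload += d" split into its three steps. */
void demo_read(demo_worker_t *w)          { w->tmp = demo_dload; }
void demo_modify(demo_worker_t *w, int d) { w->tmp += d; }
void demo_write(const demo_worker_t *w)   { demo_dload = w->tmp; }

/* Two calls end at the same moment and their decrements overlap. */
int demo_replay_racy(int start)
{
    demo_worker_t a, b;

    demo_dload = start;
    demo_read(&a);          /* A sees 'start'                      */
    demo_read(&b);          /* B sees 'start' as well              */
    demo_modify(&a, -1);
    demo_modify(&b, -1);
    demo_write(&a);         /* counter becomes start - 1           */
    demo_write(&b);         /* start - 1 again: one decrement lost */
    return demo_dload;
}

/* Same two decrements, but each one completes before the next starts,
 * which is what holding a lock around the update would enforce. */
int demo_replay_serialized(int start)
{
    demo_worker_t a, b;

    demo_dload = start;
    demo_read(&a);
    demo_modify(&a, -1);
    demo_write(&a);
    demo_read(&b);
    demo_modify(&b, -1);
    demo_write(&b);
    assert(demo_dload == start - 2);
    return demo_dload;
}
```

With two calls in progress, `demo_replay_racy(2)` leaves the counter at 1 although both calls have ended, while `demo_replay_serialized(2)` returns 0. That would match the observed symptom of one call counted by the dispatcher with no active dialogs.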

FIX: The following needs to be done. In each snippet the added lines are the
lock_get() and lock_release() calls.

1. File: **dispatch.c**
   Function: **ds_load_add()**
   Take the lock just before the dset->dlist[dst].dload++; line and release it
   right after. Like:

        lock_get(&dset->lock);
        dset->dlist[dst].dload++;
        lock_release(&dset->lock);

2. File: **dispatch.c**
   Function: **ds_load_replace()**
   The code needs a lock while decrementing as well. Like:

        lock_get(&idx->lock);
        if(idx->dlist[olddst].dload > 0) {
                idx->dlist[olddst].dload--;
                lock_release(&idx->lock);
                ......
        } else {
                lock_release(&idx->lock);
                ......
        }

3. File: **dispatch.c**
   Function: **ds_load_remove_byid()**
   The code needs a lock while decrementing as well. Like:

        lock_get(&idx->lock);
        if(idx->dlist[olddst].dload > 0) {
                idx->dlist[olddst].dload--;
                lock_release(&idx->lock);
                ...........
        } else {
                lock_release(&idx->lock);
                ........
        }
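The three changes above follow one pattern: take the set lock, touch the counter, and release the lock on every path. A self-contained sketch of that pattern follows. It uses `pthread_mutex_t` as a stand-in for the Kamailio lock behind lock_get()/lock_release(), and the `demo_set_t`, `demo_dest_t` and `demo_load_*` names are hypothetical, not the real dispatcher structures or functions.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_MAX_DESTS 8

/* Hypothetical, simplified destination and set records. */
typedef struct {
    int dload;                      /* calls currently counted on this target */
} demo_dest_t;

typedef struct {
    pthread_mutex_t lock;           /* stand-in for the per-set lock */
    int nr;                         /* number of destinations in use */
    demo_dest_t dlist[DEMO_MAX_DESTS];
} demo_set_t;

void demo_set_init(demo_set_t *set, int nr)
{
    int i;

    assert(nr > 0 && nr <= DEMO_MAX_DESTS);
    pthread_mutex_init(&set->lock, NULL);
    set->nr = nr;
    for (i = 0; i < nr; i++)
        set->dlist[i].dload = 0;
}

/* Increment under the lock, as proposed for ds_load_add(). */
void demo_load_add(demo_set_t *set, int dst)
{
    assert(dst >= 0 && dst < set->nr);
    pthread_mutex_lock(&set->lock);
    set->dlist[dst].dload++;
    pthread_mutex_unlock(&set->lock);
}

/* Decrement under the lock, never going below zero.
 * Returns 1 if the counter was decremented, 0 if it was already 0.
 * The result is kept in a local so there is a single unlock call. */
int demo_load_remove(demo_set_t *set, int dst)
{
    int removed = 0;

    assert(dst >= 0 && dst < set->nr);
    pthread_mutex_lock(&set->lock);
    if (set->dlist[dst].dload > 0) {
        set->dlist[dst].dload--;
        removed = 1;
    }
    pthread_mutex_unlock(&set->lock);
    return removed;
}

/* Read the counter under the same lock. */
int demo_load_get(demo_set_t *set, int dst)
{
    int v;

    assert(dst >= 0 && dst < set->nr);
    pthread_mutex_lock(&set->lock);
    v = set->dlist[dst].dload;
    pthread_mutex_unlock(&set->lock);
    return v;
}
```

Recording the outcome in a local variable and unlocking once is a design choice of this sketch; releasing the lock in both branches, as in the snippets above, is equivalent as long as no path returns while still holding it.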

NOTE: I believe the lock should also be taken in "ds_get_leastloaded()" in
dispatch.c, because otherwise we may select an incorrect destination,
especially when the destinations are at about the same load and that load is
close to the 'maxload' value configured in the dispatcher list file. I will
wait for comments from the community on this.
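One way to picture the concern about selection is to do the scan for the least-loaded destination and the increment of its counter under a single lock, so that two workers cannot both pick the same nearly full destination. The sketch below is only an illustration under that assumption: the `sel_*` names and structures are hypothetical, `pthread_mutex_t` stands in for the Kamailio lock, and it does not reproduce the actual ds_get_leastloaded() logic.

```c
#include <assert.h>
#include <pthread.h>

#define SEL_MAX_DESTS 8

/* Hypothetical, simplified destination and set records. */
typedef struct {
    int dload;                      /* current load */
    int maxload;                    /* configured upper bound */
} sel_dest_t;

typedef struct {
    pthread_mutex_t lock;           /* stand-in for the per-set lock */
    int nr;
    sel_dest_t dlist[SEL_MAX_DESTS];
} sel_set_t;

void sel_set_init(sel_set_t *set, int nr, int maxload)
{
    int i;

    assert(nr > 0 && nr <= SEL_MAX_DESTS);
    pthread_mutex_init(&set->lock, NULL);
    set->nr = nr;
    for (i = 0; i < nr; i++) {
        set->dlist[i].dload = 0;
        set->dlist[i].maxload = maxload;
    }
}

/* Pick the least-loaded destination that is still below maxload and
 * count the new call on it, all while holding the lock.
 * Returns the chosen index, or -1 if every destination is full. */
int sel_pick_and_reserve(sel_set_t *set)
{
    int i;
    int best = -1;

    pthread_mutex_lock(&set->lock);
    for (i = 0; i < set->nr; i++) {
        const sel_dest_t *d = &set->dlist[i];

        if (d->dload >= d->maxload)
            continue;               /* full, not eligible */
        if (best < 0 || d->dload < set->dlist[best].dload)
            best = i;               /* ties keep the lower index */
    }
    if (best >= 0)
        set->dlist[best].dload++;   /* reserve before anyone else can look */
    pthread_mutex_unlock(&set->lock);
    return best;
}
```

With two destinations and a `maxload` of 2, successive calls return 0, 1, 0, 1 and then -1, so no destination is ever pushed past its limit. Whether holding the lock across the whole scan is acceptable for the real module is a separate question that this sketch does not answer.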

The above fix solved my issue, so please review it and, if you deem it
suitable, I can go ahead and deliver the fix.

### Additional Information

Kamailio Version: 
[root@CPaaSVM ~]# kamailio -v
version: kamailio 5.3.2 (x86_64/linux) 7ba545
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, 
USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, 
DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, 
USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, 
BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 7ba545
compiled on 19:01:39 May  1 2020 with gcc 4.8.5

NOTE: I do not think this is version specific; it probably exists in later
versions as well.

* **Operating System**:
CentOS 7.7



https://github.com/kamailio/kamailio/issues/2322