Fixes for SEGV in dialog module.

<!-- Kamailio Pull Request Template -->

<!--
IMPORTANT:
  - for detailed contributing guidelines, read:
    https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
  - pull requests must be done to master branch, unless they are backports
    of fixes from master branch to a stable branch
  - backports to stable branches must be done with 'git cherry-pick -x 
...'
  - code is contributed under BSD for core and main components (tm, sl, auth, 
tls)
  - code is contributed GPLv2 or a compatible license for the other components
  - GPL code is contributed with OpenSSL licensing exception
-->

#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the 
checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING 
guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on 
sr-dev mailing list -->
- [x] Commit message has the format required by CONTRIBUTING guide
- [x] Commits are split per component (core, individual modules, libs, utils, 
...)
- [x] Each component has a single commit (if not, squash them into one commit)
- [x] No commits to README files for modules (changes must be done to docbook 
files
in `doc/` subfolder, the README file is autogenerated)

#### Type Of Change
- [ ] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)

#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the 
checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [ ] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)

#### Description
Hi !

The problem - periodically, 2-3 times in а month, kamailio dumps core with SEGV.

Different OS versions (Ubuntu 18,20,22), different kamailio versions from 5.7.1 
up to 5.8.4,
high call load, python kemi, actively used dialog module.

Analyzing core files shows more or less the same picture - SEGV happends in 
dialog mododule
around the dialog vars, can be in rpc call (dlg.list), in destroy dlg, in 
accessing dlg variable.

Backtrace can look like:

```
Core was generated by `/usr/sbin/kamailio -P /run/kamailio/kamailio.pid -f 
/etc/kamailio/kamailio.cfg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, 
key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236     dlg_var.c: No such file or directory.
(gdb) bt full
#0  get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, 
key=key@entry=0x7ffe67990d68) at dlg_var.c:236
        var = 0x8
        var_list = <optimized out>
#1  0x00007fa2a2ce0397 in get_dlg_varref (dlg=dlg@entry=0x7fa2818849b8, 
key=key@entry=0x7ffe67990d68) at dlg_var.c:297
        var = 0x0
        __func__ = "get_dlg_varref"
#2  0x00007fa2a2ccd429 in ki_dlg_var_get_mode (msg=<optimized out>, 
name=0x7ffe67990d68, rmode=0) at dialog.c:2462
        dlg = 0x7fa2818849b8
        pval = <optimized out>
#3  0x000055e2d5b87cbb in sr_kemi_exec_func (ket=ket@entry=0x7fa2a2d0c0c0 
<sr_kemi_dialog_exports+1728>, msg=msg@entry=0x7fa2a34577c0, 
pno=pno@entry=1, vps=vps@entry=0x7ffe67990d60) at core/kemiexec.c:82
        ret = <optimized out>
        __func__ = "sr_kemi_exec_func"
#4  0x00007fa2a2930376 in sr_apy_kemi_exec_func_ex 
(ket=ket@entry=0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>, 
self=self@entry=0x7fa280ce2b80, args=args@entry=0x7fa2804ba3a0, 
idx=idx@entry=485) at apy_kemi.c:327
        fname = <optimized out>
        i = <optimized out>
        ret = <optimized out>
        vps = {{vtype = 2, v = {n = -2142122656, l = 140335914274144, s = {s = 
0x7fa28051cd60 "rtpe_setid", len = 10}, dict = 0x7fa28051cd60}}, 
{vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 
0, v = {
              n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v 
= {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, 
l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0,
              s = {s = 0x0, len = 0}, dict = 0x0}}}
        env_P = <optimized out>
        lmsg = 0x7fa2a34577c0
        xret = <optimized out>
        slen = 10
        alen = 1
        pobj = <optimized out>
        __func__ = "sr_apy_kemi_exec_func_ex"
#5  0x00007fa2a2931f45 in sr_apy_kemi_exec_func (self=0x7fa280ce2b80, 
args=0x7fa2804ba3a0, idx=485) at apy_kemi.c:356
        ket = 0x7fa2a2d0c0c0 <sr_kemi_dialog_exports+1728>
        ret = 0x0
        pstate = 0x0
        pframe = 0x0
        tvb = {tv_sec = 0, tv_usec = 0}
        tve = {tv_sec = 0, tv_usec = 0}
        tz = {tz_minuteswest = 2, tz_dsttime = 0}
        tdiff = <optimized out>
        __func__ = "sr_apy_kemi_exec_func"
#6  0x00007fa2a266f697 in ?? () from /lib/x86_64-linux-gnu/libpython3.8.so.1.0

```

Inspecting dlg vars, the chain look like this:

```
(gdb) frame 0
#0  get_dlg_variable_unsafe (dlg=dlg@entry=0x7fa2818849b8, 
key=key@entry=0x7ffe67990d68) at dlg_var.c:236
236     in dlg_var.c
(gdb) p *dlg.vars
$1 = {key = {s = 0x7fa2833ea040 "mos", len = 3}, value = {s = 
0x7fa2833ea0b0 "4.3", len = 3}, vflags = 1, next = 0x7fa2825fa3d0}
(gdb) p *$.next
$2 = {key = {s = 0x7fa200332e34 <error: Cannot access memory at address 
0x7fa200332e34>, len = -1061109568}, value = {s = 0xabcdefed <error: 
Cannot access memory at address 0xabcdefed>, len = 49}, vflags = 1, next = 
0x8}
(gdb)
```



Additionaly, inspecting logs i found that this is happends in SIP race 
conditions, f.e. when caller and calle sends BYE at the same time and
kamailio processing two different transactions of same dialog in two different 
workers, so present some competition from identical scenarios.
Long time i can't reproduce this in lab, until insert an artificial delay 
into the scenario which process BYE request to increase competition.
This code quickly causes SEGV in race conditions with BYE:

```
#kemi get/set dlg var
while time.time()-start < 0.5:
    for i in ('bbb','ccc'):
        dlgvar=KSR.dialog.var_get(i)
        KSR.dialog.var_sets(i,str(random.randint(1,500000)))
```

SEGV not fire if i change dialog var_get/set to kamscript variant:

```
while time.time()-start < 0.5:
    for i in ('bbb','ccc'):
        dlgvar=KSR.pv.get(f"$dlg_var({i})")
        KSR.pv.sets(f"$dlg_var({i})", str(random.randint(1,500000)))

```

By analyzing source i found some places which are not properly covered by 
locks, and some dangerous places
where direct pointer to dialog var structure used, which can change in 
concurrent process,
so in first case i insert the locks, in second - it is better to use the clone 
of variable in private PV buffer.
Also found some mistakes with dialog flags, which can potentionaly affect DMQ 
and DB dialog behavior.


You can view, comment on, or merge this pull request online at:

  https://github.com/kamailio/kamailio/pull/4151

-- Commit Summary --

  * dialog: covering by locks accessing/changing variables in kemi functions

-- File Changes --

    M src/modules/dialog/dialog.c (36)
    M src/modules/dialog/dlg_hash.h (2)
    M src/modules/dialog/dlg_var.c (2)

-- Patch Links --

https://github.com/kamailio/kamailio/pull/4151.patch
https://github.com/kamailio/kamailio/pull/4151.diff

-- 
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/pull/4151
You are receiving this because you are subscribed to this thread.

Message ID: <kamailio/kamailio/pull/4...@github.com>
_______________________________________________
Kamailio - Development Mailing List -- sr-dev@lists.kamailio.org
To unsubscribe send an email to sr-dev-le...@lists.kamailio.org
Important: keep the mailing list in the recipients, do not reply only to the 
sender!

Reply via email to