Thanks for your help, indeed duplicating the result sounds cleaner than duplicating before escaping. However, you patch still fails, when allocating in "pool_data_stack_malloc" this time. I get the following stack trace:
Panic: pool_data_stack_malloc(): stack frame changed Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(+0x91662) [0x7f4106ba1662] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x916e9) [0x7f4106ba16e9] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f4106b3aae1] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xac14e) [0x7f4106bbc14e] -> /usr/local/lib/dovecot/libdovecot.so.0(p_strdup+0x28) [0x7f4106bcbd88] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x5ef0c) [0x7f4106b6ef0c] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x5f30f) [0x7f4106b6f30f] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x5ebef) [0x7f4106b6ebef] -> /usr/local/lib/dovecot/libdovecot.so.0(connection_input_default+0xb1) [0x7f4106b9ee81] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x5f) [0x7f4106bb5fdf] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0x109) [0x7f4106bb7499] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x25) [0x7f4106bb6085] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f4106bb6228] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x5ef6c) [0x7f4106b6ef6c] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x5fafc) [0x7f4106b6fafc] -> /home/kaiyou/projects/dovecot/src/doveadm/.libs/doveadm(+0x1b4b0) [0x55f11cd3e4b0] -> /home/kaiyou/projects/dovecot/src/doveadm/.libs/doveadm(doveadm_cmd_ver2_to_cmd_wrapper+0x23a) [0x55f11cd5ae6a] -> /home/kaiyou/projects/dovecot/src/doveadm/.libs/doveadm(doveadm_cmd_run_ver2+0x555) [0x55f11cd5bc15] -> /home/kaiyou/projects/dovecot/src/doveadm/.libs/doveadm(doveadm_cmd_try_run_ver2+0x37) [0x55f11cd5bc67] -> /home/kaiyou/projects/dovecot/src/doveadm/.libs/doveadm(main+0x1da) [0x55f11cd3c31a] -> /usr/lib/libc.so.6(__libc_start_main+0xf1) [0x7f4106792291] -> /home/kaiyou/projects/dovecot/src/doveadm/.libs/doveadm(_start+0x2a) [0x55f11cd3c6fa] The trace is missing some symbols, I will debug tomorrow and see where the call comes from exactly. Regards, On 10/17/2016 06:23 PM, Aki Tuomi wrote: > Hi! > > Looking at the code, I think the bug is that it just copies pointer to the > value, it should, instead, duplicate the value to some memory region. Can you > see if this following patch fixes it? > > Aki > >> On October 17, 2016 at 4:14 PM Pierre Jaury <pie...@jaury.eu> wrote: >> >> >> Okay, it seems to me that the bug is due to "t_str_tabunescape" using >> the unsafe datastack ("t_strdup_noconst") while the string is actually >> returned in an async callback. >> >> Before it is handled by "client_dict_lookup", "client_dict_wait" >> actually fires some IO loops that are more than likely to call "t_pop" >> and free/flush the result string (I checked, it does call "t_pop" a >> couple times indeed). Maybe it would be safer to use a standard >> datastack pool when unescaping a string in that context. >> >> I could work on a patch that would: >> >> - add an optional "pool" attribute to the "client_dict_cmd" structure, >> - pass the pool to the async lookup function, >> - use the pool when escaping strings that should survive the callback >> chain >> >> What do you think? >> >> Regards, >> kaiyou >> >> >> On 10/17/2016 09:52 AM, Pierre Jaury wrote: >>> While trying to isolate properly and reproduce, I was able to trigger >>> the same bug with the following code: >>> >>> struct dict *dict; >>> char* dict_uri = "proxy::sieve"; >>> char* key = "priv/key"; >>> char* username = "ad...@domain.tld"; >>> char* value, error; >>> >>> dict_drivers_register_builtin(); >>> dict_init(dict_uri, DICT_DATA_TYPE_STRING, username, >>> doveadm_settings->base_dir, &dict, &error); >>> dict_lookup(dict, pool_datastack_create(), key, &value); >>> printf(">%s\n", value); // outputs an empty string >>> dict_deinit(&dict); >>> >>> I trimmed it to the bare minimal string manipulation functions involved >>> but cannot reproduce in that case: >>> >>> pool_t pool = pool_datastack_create(); >>> >>> char* s1 = "test\001n\001rtest"; >>> char* s2 = t_str_tabunescape(s1); >>> char* s3 = p_strdup(pool, s2); >>> >>> printf("1>%s\n", s1); >>> printf("2>%s\n", s2); >>> printf("3>%s\n", s3); // all three output the string with NL and CR >>> >>> Maybe I am missing a function call in the process or maybe the issue is >>> related to the way unescaping is performed in the async callback >>> function in "dict-client.c", or maybe even some other edge case. >>> >>> Finally, I was able to run the first snippet without bug by removing the >>> string duplication in "t_str_tabunescape" (which I realize is not a >>> proper solution), or by explicitely using the following pool: >>> >>> return str_tabunescape(p_strdup(pool_datastack_create(), str)); >>> >>> >>> Hope this helps. >>> >>> kaiyou >>> >>> >>> On 10/17/2016 07:51 AM, Aki Tuomi wrote: >>>> Hi! >>>> >>>> This does sound like a bug, we'll have look. >>>> >>>> Aki >>>> >>>> >>>> On 17.10.2016 01:26, Pierre Jaury wrote: >>>>> I dived a little bit further into the rabbit hole, up to the point where >>>>> debugging has become unpracticle but I still haven't found the root >>>>> cause for sure. >>>>> >>>>> I read most of the code for "p_strdup" based on datastack memory pools >>>>> (which are used for dictionary lookups both with doveadm and by extdata) >>>>> and it seems ok. Still, after "t_malloc_real" is called in "t_malloc0", >>>>> the allocated buffer has the same address as the source string. >>>>> >>>>> The only sensible explanation I can come up with is that during >>>>> unescaping, strings are not allocated properly, leading to the memory >>>>> pool reusing the string address and zeroing it in the process before the >>>>> string copy operation. >>>>> >>>>> I will follow on this path tomorrow, any lead is more than welcome. >>>>> >>>>> kaiyou. >>>>> >>>>> On 10/16/2016 11:16 PM, Pierre Jaury wrote: >>>>>> Hello, >>>>>> >>>>>> I am using a dict proxy for my sieve extdata plugin to access some >>>>>> fields from an SQLite database (autoreply text and other >>>>>> database-configured items). >>>>>> >>>>>> All tests are performed against version 2.2.25. >>>>>> >>>>>> $ dovecot --version >>>>>> 2.2.25 (7be1766) >>>>>> >>>>>> My configuration looks like: >>>>>> >>>>>> dict { >>>>>> sieve = sqlite:/etc/dovecot/pigeonhole-sieve.dict >>>>>> } >>>>>> >>>>>> [...] >>>>>> sieve_extdata_dict_uri = proxy::sieve >>>>>> >>>>>> I am able to read pretty much any attribute without any issue, except >>>>>> when the value contains a special character like "\r" or "\n". By using >>>>>> the doveadm dict client, I narrowed it down to the dictionary management >>>>>> part (either server, protocol or client). >>>>>> >>>>>> I was suspecting escaping functions from "lib/strescape.c" (mostly >>>>>> str_tabescape and its counterpart, used by "lib-dict/client.c"), so I >>>>>> monitored socket communications. It seems that escaping is done properly >>>>>> on the server and the socket is not an issue either. >>>>>> >>>>>> The following strace dump results from running "doveadm dict get" >>>>>> against the dict socket: >>>>>> >>>>>> connect(8, {sa_family=AF_UNIX, sun_path="..."}, 110) = 0 >>>>>> fstat(8, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 >>>>>> [...] >>>>>> write(8, "H2\t0\t0\tad...@domain.tld\tsieve\n", 30) = 30 >>>>>> [...] >>>>>> read(8, "Otest\1r\1ntest\n", 8192) = 14 >>>>>> >>>>>> Indeed "\1r" and "\1n" are the escape sequences used by >>>>>> "lib/strescape.c". I went deeped and debugged the call to "dict_lookup" >>>>>> performed by doveadm. Indeed the client gets the proper string from the >>>>>> socket and to my surprise, it is properly unescaped. >>>>>> >>>>>> Then, in "client_dict_lookup" ("lib-dict/dict-client.c"), the call to >>>>>> "p_strdup" returns an empty string (null byte set at the target address). >>>>>> >>>>>> Before the call to the dict "->lookup" attribute (client_dict_lookup): >>>>>> >>>>>> RAX: 0x7ffff73a37c0 (push r14) >>>>>> RBX: 0x6831b8 ("priv/reply_body") >>>>>> RCX: 0x7fffffffe240 --> 0x682a60 --> 0x6831b8 ("priv/reply_body") >>>>>> RDX: 0x6831b8 ("priv/reply_body") >>>>>> RSI: 0x683288 --> 0x7ffff7653120 --> 0x7ffff73ea620 ([...]) >>>>>> RDI: 0x690ad0 --> 0x7ffff7400713 --> 0x75250079786f7270 ('proxy') >>>>>> >>>>>> 0x7ffff73a1f10 <dict_lookup+32>: mov rcx,r11 (value_r) >>>>>> 0x7ffff73a1f13 <dict_lookup+35>: mov rdx,r8 (key) >>>>>> 0x7ffff73a1f16 <dict_lookup+38>: mov rsi,r10 (pool) >>>>>> 0x7ffff73a1f19 <dict_lookup+41>: mov rdi,r9 (dict) >>>>>> 0x7ffff73a1f1c <dict_lookup+44>: add rsp,0x8 >>>>>> => 0x7ffff73a1f20 <dict_lookup+48>: jmp rax >>>>>> >>>>>> Before the call to p_strdup in "client_dict_lookup": >>>>>> >>>>>> RSI: 0x6832d8 ("test\r\ntest") (lookup.result.value) >>>>>> RDI: 0x683288 --> 0x7ffff7653120 --> [...] (pool) >>>>>> RAX: 0x0 (result) >>>>>> >>>>>> 0x7ffff73a384f: nop >>>>>> 0x7ffff73a3850: mov rsi,QWORD PTR [rsp+0x8] >>>>>> 0x7ffff73a3855: mov rdi,r14 >>>>>> => 0x7ffff73a3858: call 0x7ffff736d3c0 <p_strdup@plt> >>>>>> 0x7ffff73a385d: mov QWORD PTR [r13+0x0],rax >>>>>> 0x7ffff73a3861: mov rsi,QWORD PTR [rsp+0x18] >>>>>> 0x7ffff73a3866: xor rsi,QWORD PTR fs:0x28 >>>>>> 0x7ffff73a386f: mov eax,ebx >>>>>> >>>>>> After the call: >>>>>> >>>>>> 0x7ffff73a3850: mov rsi,QWORD PTR [rsp+0x8] >>>>>> 0x7ffff73a3855: mov rdi,r14 >>>>>> 0x7ffff73a3858: call 0x7ffff736d3c0 <p_strdup@plt> >>>>>> => 0x7ffff73a385d: mov QWORD PTR [r13+0x0],rax >>>>>> 0x7ffff73a3861: mov rsi,QWORD PTR [rsp+0x18] >>>>>> 0x7ffff73a3866: xor rsi,QWORD PTR fs:0x28 >>>>>> 0x7ffff73a386f: mov eax,ebx >>>>>> 0x7ffff73a3871: jne 0x7ffff73a38da >>>>>> >>>>>> RSI: 0x0 >>>>>> RDI: 0x6832d8 --> 0x0 >>>>>> RAX: 0x6832d8 --> 0x0 (result) >>>>>> >>>>>> It is worth noting that I can reproduce the exact same execution flow >>>>>> with a non-multiline result string (lookup.result.value) that is >>>>>> properly copied by "p_strdup" and returned in RAX, then displayed by >>>>>> doveadm. >>>>>> >>>>>> I am not familiar with the pooling mechanism hidden behind the call to >>>>>> p_strdump and not quite sure why this behaviour is emerging. Maybe I am >>>>>> even miles away from an understanding of the issue here, but it sounds >>>>>> to me like something is wrong in the way "p_strdup" performs the copy. >>>>>> >>>>>> Hope this helps, >>>>>> kaiyou. >>>>>> >>>>>> >>>>> >