https://bugs.kde.org/show_bug.cgi?id=417377
Bug ID: 417377 Summary: False-positive "uninitialised value(s)" reports from memcheck Product: valgrind Version: unspecified Platform: Other OS: Linux Status: REPORTED Severity: normal Priority: NOR Component: memcheck Assignee: jsew...@acm.org Reporter: l...@yuriev.ru Target Milestone: --- SUMMARY When dealing with memcheck messages about reading uninitialized data in some project, I came to the conclusion that these are false-positives. I would like to understand with the problem. Below I described my actions and findings. Basically I see the reports like this: ==21788== Conditional jump or move depends on uninitialised value(s) ==21788== at 0x4E561A7: mdbx_pnl_check (core.c:2733) ==21788== by 0x4E561A7: mdbx_page_alloc.isra.61 (core.c:5073) ==21788== by 0x4E5742A: mdbx_page_touch (core.c:5433) ==21788== by 0x4E57F8F: mdbx_page_search (core.c:10794) ==21788== by 0x4E5E214: mdbx_cursor_touch (core.c:11656) ==21788== by 0x4E623F2: mdbx_cursor_put (core.c:11865) ==21788== by 0x4E63C39: mdbx_put (core.c:14879) ==21788== by 0x11A068: testcase::insert(std::unique_ptr<keygen::result, keygen::buffer_deleter> const&, std::unique_ptr<keygen::result, keygen::buffer_deleter> const&, unsigned int) (test.cc:617) ==21788== by 0x115358: testcase_nested::grow_head(unsigned int) (nested.cc:195) ==21788== by 0x1157D9: testcase_nested::run() (nested.cc:272) ==21788== by 0x1190A1: test_execute(actor_config const&) (test.cc:585) ==21788== by 0x10F18D: osal_actor_start(actor_config const&, int&) (osal-unix.cc:369) ==21788== by 0x10C4C0: main (main.cc:525) ==21788== Uninitialised value was created by a client request ==21788== at 0x4E5ABD2: mdbx_page_malloc (core.c:3563) ==21788== by 0x4E5EBE7: mdbx_page_split (core.c:14502) ==21788== by 0x4E62720: mdbx_cursor_put (core.c:12214) ==21788== by 0x4E63C39: mdbx_put (core.c:14879) ==21788== by 0x11A068: testcase::insert(std::unique_ptr<keygen::result, keygen::buffer_deleter> const&, std::unique_ptr<keygen::result, keygen::buffer_deleter> const&, unsigned int) (test.cc:617) ==21788== by 0x115358: testcase_nested::grow_head(unsigned int) (nested.cc:195) ==21788== by 0x1157D9: testcase_nested::run() (nested.cc:272) ==21788== by 0x1190A1: test_execute(actor_config const&) (test.cc:585) ==21788== by 0x10F18D: osal_actor_start(actor_config const&, int&) (osal-unix.cc:369) ==21788== by 0x10C4C0: main (main.cc:525) This is a test code for an embedded DBMS (called MDBX), which is well checked. The presence of such an error (getting uninitialized data in the list of pages) is extremely unlikely, since it will lead to damage to the DBMS (B+tree) structure, and the test could not be completed successfully. In addition, the MDBX code supports the build mode with-valgrind, in which more strict control is organized through Valgrind's "Client Request" mechanism. In this mode, there are no messages about using uninitialized data (not counting the symptoms of https://bugs.kde.org/show_bug.cgi?id=417075). However, the absence of messages in this mode is not sufficient, since (first) users do not use it to check their programs, and (second) there may be an errors using the "client request" mechanism. So I tried to make some modifications to manifest a bugs. In simpler terms: 1) To minimize the influence I defined xx_VALGRIND_MAKE_MEM_UNDEFINED macro exactly the same as VALGRIND_MAKE_MEM_UNDEFINED, and redefne all native Valgrind's macros to the "do{}while(0)". This allows me to make a client request in one place, but disable all the others without changing the source code. 2) Added initialization of the suspected memory allocation with a characteristic value (e.g. 42) with a following call to xx_VALGRIND_MAKE_MEM_UNDEFINED(). Thus now if I see the values "42" in the data, then the valgrind messages are correct, otherwise false-positive. For complete changeset see https://github.com/leo-yuriev/libmdbx/commit/5bf3d285cb55a8266308d5b5d9e236e40f282c04 So, actually the source code of the relevant function is: MDBX_page *mdbx_page_malloc(MDBX_txn *txn, unsigned num) { MDBX_env *env = txn->mt_env; MDBX_page *np = env->me_dpages; size_t size = env->me_psize; if (likely(num == 1 && np)) { ASAN_UNPOISON_MEMORY_REGION(np, size); VALGRIND_MEMPOOL_ALLOC(env, np, size); VALGRIND_MAKE_MEM_DEFINED(&np->mp_next, sizeof(np->mp_next)); env->me_dpages = np->mp_next; } else { size = pgno2bytes(env, num); np = mdbx_malloc(size); if (unlikely(!np)) { txn->mt_flags |= MDBX_TXN_ERROR; return np; } VALGRIND_MEMPOOL_ALLOC(env, np, size); } memset(np, 42, size); #if MDBX_DEBUG np->mp_pgno = 0; #endif xx_VALGRIND_MAKE_MEM_UNDEFINED(np, size); np->mp_flags = 0; np->mp_pages = num; return np; } https://github.com/leo-yuriev/libmdbx/blob/5bf3d285cb55a8266308d5b5d9e236e40f282c04/src/elements/core.c#L3540-L3567 Then I ran the test in stop mode on an error, connect by gdb+vgdb, and looked at the contents of the memory bytes that valgrind reported as uninitialized. There are no 42 values. Always and everywhere I saw correct meaningful values, which is expected (otherwise the test would break quickly). This allows me to be sure that valgrind messages are false-positive. A couple of significant notes: 1. If we comment out xx_VALGRIND_MAKE_ME_UNDEFINED, but leave memset(.., 42, ..), as expected Valgrind stops reporting uninitialized data, and the DBMS test still works without errors. 2. I use Valgrind version 3.13 (with GCC 7.4), since it does not show the problem https://bugs.kde.org/show_bug.cgi?id=417075, but version 3.15 (with GCC 9.1) behaves similarly, i.e. it gives false-positive reports. 3. I was able to reproduce the problem only in a multi-process test scenario, when the original process runs multiple copies of itself that are operating concurrently with a single DB in shared memory. However, the" non-initialized " memory allocations reported by valgrind are not shared between processes, i.e. they are always local. STEPS TO REPRODUCE 1. git clone https://github.com/leo-yuriev/libmdbx.git --branch digging-valgrind 2. cd libmdbx 3. make -j2 mdbx_test && rm -f /dev/shm/mdbx-test.* *.log && \ valgrind --trace-children=yes --log-file=valgrind-%p.log --leak-check=full --track-origins=yes --error-exitcode=42 --suppressions=test/valgrind_suppress.txt --vgdb=yes --vgdb-error=2 \ ./mdbx_test basic > test.log 4. Wait about 30 seconds and look at the valgrind-*.log files to find out the pid of the process with the error. 5. gdb ./mdbx_test; target remote | vgdb --pid=XYZ I assume that such an error can be difficult to reproduce, analyze, and fix. However, it is likely that this error is the cause of many cases of false-positive reports. Therefore, I am ready to provide all possible assistance in fixing a bug. On the other hand, in addition to the correction itself, it is important for me to confirm the existence of such an Valgrind's bug. Because it allows you to explain the reason and eliminate some users' distrust of my code. Thanks in advance. Regards, Leonid. -- You are receiving this mail because: You are watching all bug changes.