[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
New submission from Howard A. Landman : I have a program qtd.py that reliably dies with free(): invalid pointer after about 13 hours of runtime (on a RPi3B+). This is hard to debug because (1) try:except: will not catch SIGABRT (2) signal.signal(signal.SIGABRT, sigabrt_handler) also fails to catch SIGABRT (3) python3-dbg does not work with libraries like RPi.GPIO and spidev() This happens under both Buster and Stretch, so I don't think it's OS-dependent. I managed to get a core dump and gdb back trace gives: warning: core file may not match specified executable file. [New LWP 10277] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1". Core was generated by `python3 qtd.py'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x76c5e308 in __GI_abort () at abort.c:100 #2 0x76cae51c in __libc_message (action=action@entry=do_abort, fmt=) at ../sysdeps/posix/libc_fatal.c:181 #3 0x76cb5044 in malloc_printerr (str=) at malloc.c:5341 #4 0x76cb6d50 in _int_free (av=0x76d917d4 , p=0x43417c , have_lock=) at malloc.c:4165 #5 0x001b3bb4 in list_ass_item (a=, i=, v=, a=, i=, v=) at ../Objects/listobject.c:739 Backtrace stopped: Cannot access memory at address 0x17abeff8 So, as far as I can tell, it looks like list_ass_item() is where the error happens. I'm willing to help debug and maybe even fix this, but is there any way to remove any of the roadblocks (1-3) above? NOTE: qtd.py will exit unless there is a Texas Instruments TDC7201 chip attached to the RPI's SPI pins. -- components: Interpreter Core hgrepos: 389 messages: 373911 nosy: Howard_Landman priority: normal severity: normal status: open title: free(): invalid pointer in list_ass_item() in Python 3.7.3 type: crash versions: Python 3.7 ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Change by Howard A. Landman : -- hgrepos: -389 ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: This is not a memory leak problem. "Top" reports VIRT 21768 RES 13516 for the whole run, and Python internal resource reporting says 13564 kb for the whole run. So that's less than 1 kb leaked in 118.6M measurement cycles; most likely zero. -- ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12423] signal handler doesn't handle SIGABRT from os.abort
Howard A. Landman added the comment: I don't think changing the documentation makes this not be a bug. My situation: I have a Python 3.7.3 program that reliably dies (after about 13 hours, having called its measure() method between 118.6M and 118.7M times) with free(): invalid pointer, which calls abort(). I believe that this is a bug in Python; and it's NOT a memory leak, since the size of the program doesn't change at all over time and is under 14 MB (real). I would like to debug it. This basically says "you're screwed". I can't catch the abort, and I can't use python3-dbg because it won't bind with the RPi.GPIO or spidev libraries. So all I can get from a core dump is that free() is being called by list_ass_item() at ../Objects/listobject.c line 739. Assuming that I'm right and that this is a bug in Python, how do you expect anyone to ever debug it? At a bare minimum, there needs to be an easy way to get a full stack trace through both Python and C. -- nosy: +Howard_Landman ___ Python tracker <https://bugs.python.org/issue12423> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: After a quick glance at the source code for the spidev library, I think it is unlikely but not impossible to be the home for the bug. It does do malloc() and free(), but only for data that is greater than 256 bytes. Short tx and rx data is kept in static local buffers. Also, these calls do not match the partial stack trace I got. There is a small amount of allocating and deallocating Python objects, however, including calls to PyList_New(), PyList_SET_ITEM(), and Py_TYPE(self)->tp_free((PyObject *)self), so it's possible that the bug is buried under one of those. -- nosy: -christian.heimes, stestagg ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: I'm running under 32-bit Raspbian, so let's assume the magic number is 13. There are only two places in my own code where the number 13 appears: (1) My result_list is 14 items long, i.e. 0 to 13. Relevant code from qtd.py: cum_results = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] ... while batches != 0: ... result_list = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] for m in range(ITERS): ... result = tdc.measure(...) result_list[result] += 1 ... for i in range(len(result_list)): cum_results[i] += result_list[i] I notice that result_list is getting thrown away each cycle, and thus must be garbage-collected. I could try changing that and see if it has any effect. (2) in the tdc7201 library, 13 occurs as a possible (error) result code from measure(). However several of the failing runs had zero errors of this class, which means the code with 13 in it never got executed even once. I am not using pin 13 of the RPi's header, so I never send that number to RPi.GPIO. GPIO13 is pin 33 of the header, and I am not using that pin either, so I don't think RPi.GPIO is translating one of my pin number arguments to 13 internally. (But I should check Broadcom Mode numbers.) -- ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: Made some progress. By running it under gdb and breaking in malloc_printerr(), I got a better stack trace: Breakpoint 1, malloc_printerr (str=0x76e028f8 "free(): invalid pointer") at malloc.c:5341 5341malloc.c: No such file or directory. (gdb) bt #0 malloc_printerr (str=0x76e028f8 "free(): invalid pointer") at malloc.c:5341 #1 0x76d44d50 in _int_free (av=0x76e1f7d4 , p=0x43417c , have_lock=) at malloc.c:4165 #2 0x001b4f40 in list_dealloc (op=0x760ef288) at ../Objects/listobject.c:324 #3 0x001be784 in frame_dealloc ( f=Frame 0x765fcc30, for file /home/pi/src/QTD/src/tdc7201/tdc7201/__init__.py, line 719, in read_regs24 (i=40, reg=28)) at ../Objects/frameobject.c:470 #4 function_code_fastcall (globals=, nargs=, args=, co=) at ../Objects/call.c:291 #5 _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=) at ../Objects/call.c:408 #6 0x0011f984 in call_function (kwnames=0x0, oparg=, pp_stack=0x7effed40) at ../Python/ceval.c:4554 #7 _PyEval_EvalFrameDefault (f=, throwflag=) at ../Python/ceval.c:3110 #8 0x0011d63c in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x760e8960, for file /home/pi/src/QTD/src/tdc7201/tdc7201/__init__.py, line 962, in measure (self=, reg1=[130, 66, 0, 7, 255, 255, 6, 64, 0, 1, 65535, 1600, Breakpoint 1, malloc_printerr (str=0x76e028f8 "free(): invalid pointer") at malloc.c:5341 5341malloc.c: No such file or directory. (gdb) bt #0 malloc_printerr (str=0x76e028f8 "free(): invalid pointer") at malloc.c:5341 #1 0x76d44d50 in _int_free (av=0x76e1f7d4 , p=0x43417c , have_lock=) at malloc.c:4165 #2 0x001b4f40 in list_dealloc (op=0x760ef288) at ../Objects/listobject.c:324 #3 0x001be784 in frame_dealloc ( f=Frame 0x765fcc30, for file /home/pi/src/QTD/src/tdc7201/tdc7201/__init__.py, line 719, in read_regs24 (i=40, reg=28)) at ../Objects/frameobject.c:470 #4 function_code_fastcall (globals=, nargs=, args=, co=) at ../Objects/call.c:291 #5 _PyFunction_FastCallKeywords (func=, stack=, nargs=, kwnames=) at ../Objects/call.c:408 #6 0x0011f984 in call_function (kwnames=0x0, oparg=, pp_stack=0x7effed40) at ../Python/ceval.c:4554 #7 _PyEval_EvalFrameDefault (f=, throwflag=) at ../Python/ceval.c:3110 #8 0x0011d63c in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x760e8960, for file /home/pi/src/QTD/src/tdc7201/tdc7201/__init__.py, line 962, in measure (self=, reg1=[130, 66, 0, 7, 255, 255, 6, 64, 0, 1, 65535, 1600, 1, None, None, None, 1284, 322, 2371, 0, 0, 0, 0, 0, 0, 0, 0, 2322, 23172, None], chip_select=1, sclk=23, miso=21, mosi=19, cs1=24, cs2=26, enable=12, osc_enable=16, trig1=7, int1=37, trig2=None, int2=None, start=18, stop=22, meas_mode=2, cal_pers=10, cal_count=, norm_lsb=, tof1=, tof2=, tof3=0, tof4=0, tof5=0) at remote 0x76191750>, simulate=True, error_prefix='59835 ', log_file=<_io.TextIOWrapper at remote 0x761df4b0>, cf1=130, timeout=, n_stop=3, threshold=, pulse=2)) at ../Python/ceval.c:3930 1, None, None, None, 1284, 322, 2371, 0, 0, 0, 0, 0, 0, 0, 0, 2322, 23172, None], chip_select=1, sclk=23, miso=21, mosi=19, cs1=24, cs2=26, enable=12, osc_enable=16, trig1=7, int1=37, trig2=None, int2=None, start=18, stop=22, meas_mode=2, cal_pers=10, cal_count=, norm_lsb=, tof1=, tof2=, tof3=0, tof4=0, tof5=0) at remote 0x76191750>, simulate=True, error_prefix='59835 ', log_file=<_io.TextIOWrapper at remote 0x761df4b0>, cf1=130, timeout=, n_stop=3, threshold=, pulse=2)) at ../Python/ceval.c:3930 ... read_regs24 (i=40, reg=28) performs an SPI read of 13 24-bit registers (so 39 bytes); the corresponding Python code is: def read_regs24(self): """Read all 24-bit chip registers, using auto-increment feature.""" result24 = self._spi.xfer([self.MINREG24|self._AI, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]) #print("AI read 24-bits =", result24) #print("length =", len(result24)) i = 1 for reg in range(self.MINREG24, self.MAXREG24+1): # Data comes in MSB first. self.reg1[reg] = (result24[i] << 16) | (result24[i+1] << 8) | result24[i+2] i += 3 The zero padding is necessary to make sure that enough SPI clock cycles are sent for the data bytes to get clocked back in. The _AI flag turns on auto-increment, so each byte is from the next address. So it looks like the bug is either in the spidev library, or in Python deallocation of this f
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: Getting closer to isolating this. A small program that does nothing but call read_regs24() over and over dies with the same error after about 107.4M iterations. So it seems to be definitely in that method. But that only takes a few hours, not half a day. Going to try to separate the SPI operation from the remaining Python code next. -- ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: It appears to be in the spidev library xfer method, when used for reading. Calling that 107.4M times (using the same code as inside my read_regs24() method) causes the free() error. Breakpoint 1, malloc_printerr (str=0x76e028f8 "free(): invalid pointer") at malloc.c:5341 5341malloc.c: No such file or directory. (gdb) bt #0 malloc_printerr (str=0x76e028f8 "free(): invalid pointer") at malloc.c:5341 #1 0x76d44d50 in _int_free (av=0x76e1f7d4 , p=0x43417c , have_lock=) at malloc.c:4165 #2 0x001b4f40 in list_dealloc (op=0x766d0f58) at ../Objects/listobject.c:324 #3 0x7669b660 in ?? () from /usr/lib/python3/dist-packages/spidev.cpython-37m-arm-linux-gnueabihf.so Backtrace stopped: previous frame identical to this frame (corrupt stack?) Should I be worried about the "corrupt stack" message? -- ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: OK, this has been filed against the spidev library: https://github.com/doceme/py-spidev/issues/107 Do you want it closed, or left open until that gets resolved? -- resolution: -> third party ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41335] free(): invalid pointer in list_ass_item() in Python 3.7.3
Howard A. Landman added the comment: As far as we can tell, this is a known Py_DECREF problem with spidev==3.4. Testing on spidev==3.5 has not triggered the bug so far, so it appears to be already fixed. Under 3.4, changing the list to a tuple did not affect the behavior. -- ___ Python tracker <https://bugs.python.org/issue41335> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com