Re: coredump without '+' final argument
Good day, Mike -

Without any arguments, it does nothing. I did write in previous mails (I think):

$ pil L_RT.l -pr     # coredumps
$ pil L_RT.l -pr +   # no coredump

Sorry if I did not make that clear.

L_RT.l is a half-finished part of an application-specific Web-Based IP + VPN + Router + Firewall + DNS + DHCP + RADIUS / LDAP + SNMP Configurator I am writing for my company. Without any arguments, it assumes it is just being sourced and does nothing.

Arguments:
'-pr' | '-PR' | 'prin_route' : load & process routes, with a function that expects a single 'route' LIST argument.

I got as far as getting it to merge the 2 main command-line accessible kernel RT-NETLINK route info data sources, /proc/net/route and 'ip route show', before discovering this pil bug, as I believe it is.

Certainly, a pil debugger, when configured in Emacs mode with an Emacs server running, SHOULD IMHO attempt to bring up a PicoLisp debug session and a GDB debug session in Emacs buffers using 'emacsclient -e'. That is what I am now focusing on getting working.

But secondly, the debugger is not detecting any problems, yet a coredump occurs WITHOUT debugging enabled, which suggests a problem with the implementation of the special handling for the trailing '+' last member of (argv) (though this is never shown in lists returned by (argv)).

I just thought I should report this anomalous / buggy coredump to the pil development team - it is one that has got me foxed & I don't have time to investigate it in depth.

Best Regards,
Jason

On 02/08/2023, tankf33...@disroot.org wrote:
> On 02-08-2023 03:03, Jason Vas Dias wrote:
>> Here's an improved version of that program,
>
>
> $ pil L_RT.l
> :
> $
>
>
> I have got just a prompt.
>
> (mike)
>
-- UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe
Re: coredump without '+' final argument
I should also have made it clearer that on my host the core dump occurs when trying to print out this pair of routes, as printed by the run with the trailing '+' (debugging enabled):

192.168.42.1/32 ppp0 0.0.0.0 UP,HO 0 { prefsrc:192.168.42.10 protocol:kernel scope:link type:unicast}
192.168.42.1/32 ppp0 0.0.0.0 UP,HO 50 { }

These are created by libreSwan for my VPN. WHY it creates 2 routes with identical 'keys' ('dst' fields) I do not know, but this is definitely part of the problem: the code has previously called

(idx 'tri, (list "192.168.42.1/32" (list $attrList1 ...)) T)
..
(idx 'tri, (list "192.168.42.1/32" (list $attrList2 ...)) T)

i.e. it is asking 'idx' to store 2 nodes with the same key and different contents. Is this allowed? The documentation suggests so, IMHO. If so, there does appear to be a problem with the interaction of 'idx' and the '+' debugging mode that results in this coredump, and/or the problem causing the coredump is not being detected.

Best Regards,
Jason
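A minimal sketch of the 'idx' behaviour asked about above. 'idx' has no separate notion of a key: it orders whole values, so two lists that share a first element but differ afterwards are stored as two distinct entries, while inserting an exactly equal value again adds nothing. The variable '*Tri' and the route lists are hypothetical sample data, not taken from L_RT.l.

```picolisp
# Hypothetical index; the route lists below are made-up sample data
(off *Tri)

# Same destination, different contents: both are inserted (idx returns NIL)
(idx '*Tri (list "192.168.42.1/32" 0 "protocol:kernel") T)
(idx '*Tri (list "192.168.42.1/32" 50) T)

# An exactly equal value is found, not inserted again (returns non-NIL)
(idx '*Tri (list "192.168.42.1/32" 50) T)

# (idx 'var) returns all entries as a sorted list
(println (idx '*Tri))
# -> (("192.168.42.1/32" 0 "protocol:kernel") ("192.168.42.1/32" 50))
```

So storing two routes with the same destination is not by itself an error; the lists are simply compared element by element.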
Re: coredump without '+' final argument
Good day Alex -

RE:
> Can you debug this a little more? E.g. look at the output of (traceAll) and see
> *where* exactly it happens.

That's the whole problem - doesn't 'traceAll' depend on debug mode being enabled by the trailing '+'? And the coredump does NOT occur in debug mode, nor in normal usage where there are not 2 routes that share the same key / destination. The coredump only occurs when NOT in debug mode, and only when 2 routes share the same IDX key / destination.

Best Regards,
Jason

On 02/08/2023, Alexander Burger wrote:
> Hi Jason,
>
> I did not try to install and run it.
>
> But I think it is by chance that "+" has an influence on the crash. There must
> be a "hard" reason.
>
> What I can see from the stack backtrace, it crashes in 'consTree', so it must be
> in one of the 'idx' calls.
>
> Can you debug this a little more? E.g. look at the output of (traceAll) and see
> *where* exactly it happens.
>
>
> BTW, I cannot see your mail in the mail archive. Not sure if anyone else except
> me got it. Probably because you mailed to me and put the list only into CC.
> I send this directly to the list.
>
> ☺/ A!ex
>
>
> On Tue, Aug 01, 2023 at 10:34:48PM +0100, Jason Vas Dias wrote:
>>
>> Good day -
>>
>> Why, without a final '+' argument, does the attached program coredump,
>> when with a final '+' argument (enabling debugging), it does not?
>>
>> This is with picolisp 23.07.28 on my Fedora 36 12-core x86_64 laptop
>> PC - my route printing / processing program:
>>
>> $ ./L_RT.l -pr +
>> 0.0.0.0/0 wlp59s0 192.168.43.1 UP,GW 600 { dev:wlp59s0 gateway:192.168.43.1 metric:600 prefsrc:192.168.43.70 protocol:dhcp scope:global type:unicast}
>> 0.0.0.0/32 * 0.0.0.0 UP,HO 0 { protocol:boot scope:global type:blackhole }
>> ...
>>
>> $ ./L_RT.l -pr
>> 192.168.42.1/32 ppp0 0.0.0.0 UP,HO 0 { dev:ppp0 metric:50 prefsrc:192.168.42.10 protocol:kernel scope:link type:unicast}
>> ...
>> Segmentation Fault
>> :
>> [jvd@jvdspc]:~/src/pil21/src [3292] 22:05:06 [#:980!:28555]{1}
>> $ gdb ../bin/picolisp /tmp/pil.1800629.core
>> GNU gdb (GDB) Fedora 12.1-2.fc36 ...
>> Reading symbols from ../bin/picolisp...
>> (No debugging symbols found in ../bin/picolisp)
>> [New LWP 1800629]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `/usr/bin/picolisp /usr/lib/picolisp/lib.l /usr/bin/pil /home/jvd/bin/L_RT.l -pr'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x00444921 in consTree ()
>> Missing separate debuginfos, use: dnf debuginfo-install libffi-3.4.2-8.fc36.x86_64 ncurses-libs-6.2-9.20210508.fc36.x86_64 readline-8.2-2.fc36.x86_64
>> (gdb) where
>> #0  0x00444921 in consTree ()
>> #1  0x00422428 in _for ()
>> #2  0x004212f7 in _prog ()
>> #3  0x0042324d in _let ()
>> #4  0x0042324d in _let ()
>> #5  0x00432469 in evExpr ()
>> #6  0x0041fd02 in _eval ()
>> #7  0x004211d8 in _bool ()
>> #8  0x00421218 in _not ()
>> #9  0x004214ac in _if ()
>> #10 0x004212f7 in _prog ()
>> #11 0x0042324d in _let ()
>> #12 0x0043e505 in loop1 ()
>> #13 0x00422573 in _for ()
>> #14 0x0042324d in _let ()
>> #15 0x0042324d in _let ()
>> #16 0x004238c7 in _catch ()
>> #17 0x00434476 in repl ()
>> #18 0x004495b8 in main ()
>> (gdb) quit
>>
>>
>> This is from the route which has 2 identical idx tree 'keys':
>> 192.168.42.1/32 ppp0 0.0.0.0 UP,HO 0 { dev:ppp0 metric:50 prefsrc:192.168.42.10 protocol:kernel scope:link type:unicast}
>> 192.168.42.1/32 ppp0 0.0.0.0 UP,HO 50 { }
>>
>> Why does this situation cause a coredump / inability to process 'ip
>> route' output without final '+' command line argument in effect?
>> Very strange - if the debugger is enabled, it
>> should detect a problem and trap to it, no?
>>
>> Any constructive ideas / suggested workarounds gratefully received.
>>
>> Note, it is not fixed by doing a '(load "@lib/debug.l") in the program
Re: coredump without '+' final argument
The coredump occurs within this loop of the 'prin_route' function, for the same route, only when debug mode is NOT enabled, as can be proved by the output ending with '{':

$ L_RT.l -pr
0.0.0.0/0 wlp59s0 192.168.43.1 UP,GW 600 { prefsrc:192.168.43.70 protocol:dhcp scope:global type:unicast}
0.0.0.0/32 * 0.0.0.0 UP,HO 0 { protocol:boot scope:global type:blackhole }
..
192.168.42.1/32 ppp0 0.0.0.0 UP,HO 0 { prefsrc:192.168.42.10 protocol:kernel scope:link type:unicast}
192.168.42.1/32 ppp0 0.0.0.0 UP,HO 50 {
Segmentation fault (core dumped)

So the code MUST be in this loop when the coredump occurs:

(for r (idx ratr)
   (when (and (bool r) (lst? r))
      (let ( k (car r)  v (cdr r) )
         (case k
            ('( "dst" "gateway" "dev" "metric" "mtu" ))
            (T (out 1 (prin (pack k ":" v "^I")))) ) ) ) )

Why, only when the trailing '+' "Enable Debug Mode" is in '(argv)', should the behaviour of 'idx' change so drastically?

I can send you hundreds of such coredumps - they are not very helpful unless you can combine using GDB with a live picolisp to inspect the stack. That is what I'd like to get working.

I suspect the CFA stack frame info being generated, and possibly the data layouts, may be different when not in debug mode than when in debug mode?
Here are more details of the one that just happened:

(gdb) where
#0  0x00444921 in consTree ()
#1  0x00422428 in _for ()
#2  0x004212f7 in _prog ()
#3  0x0042324d in _let ()
#4  0x0042324d in _let ()
#5  0x00432469 in evExpr ()
#6  0x0041fd02 in _eval ()
#7  0x004211d8 in _bool ()
#8  0x00421218 in _not ()
#9  0x004214ac in _if ()
#10 0x004212f7 in _prog ()
#11 0x0042324d in _let ()
#12 0x0043e505 in loop1 ()
#13 0x00422573 in _for ()
#14 0x0042324d in _let ()
#15 0x0042324d in _let ()
#16 0x004238c7 in _catch ()
#17 0x0042324d in _let ()
#18 0x00434476 in repl ()
#19 0x004495b8 in main ()
(gdb) info reg
rax            0x45b6b8            4568760
rbx            0x7f38d2f23780      139882033985408
rcx            0x7f38d2f247c0      139882033989568
rdx            0x7f38d2f23780      139882033985408
rsi            0x3                 3
rdi            0x7f38d2f247c0      139882033989568
rbp            0x7ffc913afc80      0x7ffc913afc80
rsp            0x7ffc913afc40      0x7ffc913afc40
r8             0x45b5e8            4568552
r9             0x45b5e8            4568552
r10            0x45b6b8            4568760
r11            0x202               514
r12            0x7ffc913afc40      140722745048128
r13            0x45b6b8            4568760
r14            0x7ffc913afc50      140722745048144
r15            0x45b5e8            4568552
rip            0x444921            0x444921
eflags         0x10212             [ AF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) disass
Dump of assembler code for function consTree:
   0x004448a0 <+0>:   push   %rbp
   0x004448a1 <+1>:   mov    %rsp,%rbp
   0x004448a4 <+4>:   push   %r15
   0x004448a6 <+6>:   push   %r14
   0x004448a8 <+8>:   push   %r12
   0x004448aa <+10>:  push   %rbx
   0x004448ab <+11>:  mov    %rsi,%rax
   0x004448ae <+14>:  mov    %rdi,%rbx
   0x004448b1 <+17>:  test   $0xf,%bl
   0x004448b4 <+20>:  jne    0x4449c4
   0x004448ba <+26>:  mov    %rsp,%rcx
   0x004448bd <+29>:  lea    -0x10(%rcx),%r14
   0x004448c1 <+33>:  mov    %r14,%rsp
   0x004448c4 <+36>:  mov    $0x45b5e8,%r15
   0x004448cb <+43>:  mov    (%r15),%rdx
   0x004448ce <+46>:  mov    %rdx,-0x8(%rcx)
   0x004448d2 <+50>:  mov    %rsp,%rcx
   0x004448d5 <+53>:  lea    -0x10(%rcx),%r12
   0x004448d9 <+57>:  mov    %r12,%rsp
   0x004448dc <+60>:  movq   $0xa,-0x10(%rcx)
   0x004448e4 <+68>:  mov    %r14,-0x8(%rcx)
   0x004448e8 <+72>:  mov    %r12,(%r15)
   0x004448eb <+75>:  mov    $0xa,%ecx
   0x004448f0 <+80>:  mov    0x8(%rbx),%rsi
   0x004448f4 <+84>:  mov    0x8(%rsi),%rdx
   0x004448f8 <+88>:  test   $0xf,%dl
   0x004448fb <+91>:  jne    0x44492e
   0x004448fd <+93>:  add    $0x8,%rsi
   0x00444901 <+97>
Re: coredump without '+' final argument
Hi Jason,

> > Can you debug this a little more? E.g. look at the output of (traceAll) and
> > see *where* exactly it happens.
>
> That's the whole problem - doesn't 'traceAll' depend on Debug Mode
> being enabled by trailing '+' ?

Oh, right, you said it happens only if *not* in debug mode.

Still, as I said, I'm quite sure it does not directly have to do with debug mode. Rather it looks like a heisenbug to me, where the error appears and disappears depending on unrelated things like memory or stack layout, timing etc.

It *can* be, though, that your program conflicts with stuff loaded only in debug mode.

I did not succeed in testing it here, but some parts of your code look suspicious, at least not following the Pil conventions. Perhaps some of your lower-cased locally bound symbols conflict somewhere?

In any case you could try other ways to debug it without complete debug mode, e.g. by inserting (msg '<1>) or so in various parts of the program until you find which 'idx' call crashes, and what the environment is at that moment.

☺/ A!ex
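A sketch of the marker technique suggested above, assuming a hypothetical wrapper 'addRoute' around the 'idx' insertion (the function name and '*Routes' are illustrative, not from L_RT.l). 'msg' prints its first argument, then the remaining ones, to standard error, so the last marker seen before a crash narrows down which call failed.

```picolisp
# Hypothetical wrapper: brackets the 'idx' call with markers on stderr
(de addRoute (Route)
   (msg '<1> " inserting " (car Route))   # printed before the call
   (prog1
      (idx '*Routes Route T)
      (msg '<2> " done") ) )               # printed only if 'idx' returned

(off *Routes)
(addRoute (list "192.168.42.1/32" 0))
(addRoute (list "192.168.42.1/32" 50))
```

A '<1>' line without a matching '<2>' line would point at the 'idx' call for that route.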
Re: coredump without '+' final argument
> So the code MUST be in this loop when the coredump occurs :

OK

Though I don't know the reason for the crash, please try to stick with pil conventions:

> (for r (idx ratr)

   (for R (idx Ratr)

>    (when (and (bool r) (lst? r))

      (when (and R (lst? R))

   which is

      (when (pair R)

>       (let
>          ( k (car r) v (cdr r) )

         (let ((K . V) R)

>          (case k
>             ('( "dst" "gateway" "dev" "metric" "mtu" ))

'case' does not eval the keys, so the quote is wrong.

> Why, only when the trailing '+' "Enable Debug Mode" is in '(argv)' ,
> should the behaviour of 'idx' change so drastically ?
> I can send you hundreds of such coredumps - they are not very helpful

Mike Pechkin tried to reproduce it, also with your recommended invocation, but it does not crash. I think it is a heisenbug.

> unless you can combine using GDB with use of a live picolisp to inspect
> the stack . That is what I'd like to get working .
>
> I suspect the CFA stack frame info being generated and possibly data layouts
> when not in debug mode may be different to when in debug mode ?

Debug mode does not change anything in the interpreter.

☺/ A!ex
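The suggestions above, put together into one runnable sketch: upper-case locals, '(pair R)' as the test, destructuring with '(let ((K . V) R) ...)' (K is bound to the CAR and V to the CDR of R), and unquoted 'case' keys. The function name 'prinAttrs' and the sample attribute pairs in '*Ratr' are hypothetical; the loop body follows the rewrite proposed in the reply.

```picolisp
# Hypothetical sample data: an idx tree of (Attribute . Value) pairs
(off *Ratr)
(for P '(("dst" . "192.168.42.1/32") ("dev" . "ppp0") ("metric" . 50)
         ("prefsrc" . "192.168.42.10") ("protocol" . "kernel") )
   (idx '*Ratr P T) )

# Print all attributes except those handled elsewhere
(de prinAttrs (Var)
   (for R (idx Var)                  # all entries, sorted
      (when (pair R)                 # same as (and R (lst? R))
         (let ((K . V) R)            # K = (car R), V = (cdr R)
            (case K                  # keys are not evaluated: no quote
               (("dst" "gateway" "dev" "metric" "mtu"))
               (T (prin K ":" V "^I")) ) ) ) )
   (prinl) )

(prinAttrs '*Ratr)
# prints: prefsrc:192.168.42.10<TAB>protocol:kernel<TAB>
```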
Re: coredump without '+' final argument
On Wed, Aug 02, 2023 at 07:41:23PM +0200, Alexander Burger wrote:
> Though I don't know the reason for the crash, please
> try to stick with pil conventions

For example, in 'load-routes' there is

   (let ( cnt 0 tits NIL)

However, 'cnt' is a built-in function, which is now bound to 0 (null-pointer). If some code (in 'load-routes' or any other function called from within it) calls 'cnt', it is sure to crash.

☺/ A!ex
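A small sketch of the convention behind this advice: in PicoLisp, locally bound symbols are capitalized (e.g. 'Cnt' instead of 'cnt'), so they cannot shadow the lower-case built-in functions. The function 'countBig' and its data are hypothetical; the point is only that the built-in 'cnt' remains usable while the local counter is bound.

```picolisp
# Hypothetical example; 'Cnt' is a capitalized local,
# so the built-in function 'cnt' is left untouched
(de countBig (Lst)
   (let (Cnt 0)
      (for N Lst
         (inc 'Cnt) )                      # local counter
      (list Cnt                            # number of elements
         (cnt '((N) (> N 1)) Lst) ) ) )    # built-in: how many are > 1

(println (countBig '(1 2 3)))   # -> (3 2)
```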
Re: coredump without '+' final argument
On Wed, Aug 02, 2023 at 09:15:54PM +0200, Alexander Burger wrote:
> On Wed, Aug 02, 2023 at 07:41:23PM +0200, Alexander Burger wrote:
> > Though I don't know the reason for the crash, please
> > try to stick with pil conventions

Other issues are:

1. In 'ipv4-route-flag' there is

      (let ... fs NI

   Probably a mistype and NIL was meant.

2. 'load-routes' uses 'ratr' without binding it in an argument or a 'let'.

3. 'prin_route' binds 'dstr' but never uses it.

(I found these issues with (lintAll))

☺/ A!ex
Re: coredump without '+' final argument
Loop is now as you suggested, same problem:

(let ( ... r NIL)
   ...
   (for r (idx ratr)
      (when (and (bool r) (lst? r))
         (let ( (k . v) r )
            (case k
               (( "dst" "gateway" "dev" "metric" ))
               (T (out 1 (prin (pack k ":" v "^I")))) ) ) ) )
   ...)

So '(let ( ( k . v ) l ) ...)' copies the CONS cell in l, while '(let ( ( k v ) l ) ... )' sets k & v to (car l) and (cadr l) respectively, right? That is cool.

I will try printing all symbols from the program in debug mode and comparing them with all symbols in non-debug mode.

Why, though, when run in an Emacs terminal, or with the '+' debug mode option, do no warnings, coredumps or errors occur (since we are in debug mode)? That in itself is a bug in the debugger if some major re-naming has occurred - it should print a message about 'Redefining Symbols', no? It doesn't:

: (load "/home/jvd/J/L_RT.l")
# pil_inc redefined
# load-routes redefined
# prin_route redefined
-> NIL
:

(this was because I had a previous version loaded). You'd hope, when running with debug enabled in an Emacs terminal, that any redefinition of a core built-in symbol would be warned about, no?

So yes, I think picolisp definitely needs the ability to control both GDB and pil debugger driver Emacs sessions for the same process, to enable investigating situations such as this - one needs to be able to inspect the PicoLisp stack in Emacs and see which variables / symbols / strings / numbers / external symbols & in each environment they refer to - this is not trivial, but is what is needed, and is what eg