Re: coredump without '+' final argument

2023-08-02 Thread Jason Vas Dias
Good day, Mike -

 Without any arguments, it does nothing.
  I did write in previous mails (I think):
    $ pil L_RT.l -pr    # coredumps
    $ pil L_RT.l -pr +  # no coredump

 Sorry if I did not make that clear.
 L_RT.l is a half-finished part of an application-specific Web-Based
  IP + VPN + Router + Firewall + DNS + DHCP + RADIUS / LDAP + SNMP
  Configurator I am writing for my company.
 Without any arguments, it assumes it is just being sourced and does nothing.
 Arguments:
   '-pr' | '-PR' | 'prin_route' : load & process routes, with a function that
  expects a single 'route' LIST argument.
  I got as far as getting it to merge the 2 main command-line accessible
 kernel RT-NETLINK route info data sources: /proc/net/route and 'ip route show'
 -- before discovering this pil bug, as I believe this is.

 Certainly, a pil debugger, when configured in Emacs mode, with an Emacs Server
 running, SHOULD IMHO attempt to bring up a picolisp Debug session and
 a GDB Debug Session in Emacs buffers using 'emacsclient -e' .
 That is what I am now focusing on getting working.

 But secondly, the debugger is not detecting any problems, yet a coredump
 occurs WITHOUT debugging enabled, which suggests a problem with the
 implementation of the special handling for the trailing '+' last member of
 (argv) (though this is never shown in lists returned by (argv) ).

 I just thought I should report this anomalous / buggy coredump to the pil
 development team - it is one that has got me foxed & I don't have time
 to investigate it in depth.

Best Regards,
Jason


On 02/08/2023, tankf33...@disroot.org  wrote:
> On 02-08-2023 03:03, Jason Vas Dias wrote:
>> Here's an improved version of that program,
>
> 
> $ pil L_RT.l
> :
> $
> 
>
> I have got just a prompt.
>
> (mike)
>

-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: coredump without '+' final argument

2023-08-02 Thread Jason Vas Dias
I should also have made more clear that on my host
the core dump occurs when trying to print out this
pair of routes, shown here as printed by the run with the trailing '+'
(debugging enabled):

192.168.42.1/32 ppp0    0.0.0.0 UP,HO
0   {   prefsrc:192.168.42.10   protocol:kernel
scope:link  type:unicast}
192.168.42.1/32 ppp0    0.0.0.0 UP,HO
50  {   }

These are created by libreSwan for my VPN. WHY it creates
2 routes with identical 'keys' ('dst' fields) I do not know, but
this is definitely part of the problem -- the code has previously called
   (idx 'tri, (list "192.168.42.1/32" (list $attrList1 ...)) T)
   (idx 'tri, (list "192.168.42.1/32" (list $attrList2 ...)) T)
(i.e. it is asking 'idx' to store 2 nodes with the same key and
 different contents).
Is this allowed? The documentation suggests so IMHO.
If so, there does appear to be a problem with the interaction
of 'idx' and the '+' debugging mode that results in this coredump,
and/or in the problem causing the coredump not being detected.
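
For reference, a minimal sketch of that situation in isolation, assuming
(as I read the 'idx' reference) that 'idx' compares the whole stored item
and not just its first element. The variable name 'Tri' and the attribute
lists are made up for illustration and are not taken from L_RT.l:

```picolisp
# Illustrative only: two items sharing the same "dst" string but with
# different attribute lists, inserted into one 'idx' tree.
(off Tri)                                   # start with an empty tree
(idx 'Tri (list "192.168.42.1/32" '(("metric" . "0"))) T)
(idx 'Tri (list "192.168.42.1/32" '(("metric" . "50"))) T)
(length (idx 'Tri))                         # both items should be present
```

If that reading is right, the two routes are simply two distinct entries,
which would make the duplicate 'dst' by itself a legal use of 'idx'.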

Best Regards,
Jason






Re: coredump without '+' final argument

2023-08-02 Thread Jason Vas Dias
Good day Alex -
RE:
>Can you debug this a little more? E.g. look at the output of (traceAll) and see
> *where* exactly it happens.

That's the whole problem - doesn't 'traceAll' depend on Debug Mode
being enabled by trailing '+' ?
And the coredump does NOT occur in debug mode, nor in normal
usage where there are not 2 routes that share the same Key / destination .

The coredump only occurs when NOT in Debug Mode,  only when
2 routes share the same IDX Key / destination .

Best Regards,
Jason

On 02/08/2023, Alexander Burger  wrote:
> Hi Jason,
>
> I did not try to install and run it.
>
> But I think it is by chance that "+" has an influence on the crash. There
> must be a "hard" reason.
>
> What I can see from the stack backtrace, it crashes in 'consTree', so it
> must be in one of the 'idx' calls.
>
> Can you debug this a little more? E.g. look at the output of (traceAll) and
> see *where* exactly it happens.
>
>
> BTW, I cannot see your mail in the mail archive. Not sure if anyone else
> except me got it. Probably because you mailed to me and put the list only
> into CC. I send this directly to the list.
>
> ☺/ A!ex
>
>
> On Tue, Aug 01, 2023 at 10:34:48PM +0100, Jason Vas Dias wrote:
>>
>> Good day -
>>
>>   Why, without a final '+' argument, does the attached program coredump,
>>   when with a final '+' argument (enabling debugging) , it does not ?
>>
>>   This is with picolisp 23.07.28 on my Fedora 36 12-core x86_64 laptop
>>   PC - my route printing / processing program:
>>
>>  $ ./L_RT.l -pr +
>> 0.0.0.0/0    wlp59s0 192.168.43.1    UP,GW
>>  600 {   dev:wlp59s0 gateway:192.168.43.1    metric:600
>> prefsrc:192.168.43.70   protocol:dhcp   scope:global    type:unicast}
>> 0.0.0.0/32   *   0.0.0.0 UP,HO
>>  0   {   protocol:boot   scope:global    type:blackhole  }
>> ...
>>
>>  $ ./L_RT.l -pr
>>  192.168.42.1/32 ppp0    0.0.0.0 UP,HO
>>  0   {   dev:ppp0    metric:50   prefsrc:192.168.42.10
>> protocol:kernel scope:link  type:unicast}
>>  ...
>>  Segmentation Fault
>>   :
>>  [jvd@jvdspc]:~/src/pil21/src [3292] 22:05:06 [#:980!:28555]{1}  
>>   $ gdb ../bin/picolisp /tmp/pil.1800629.core
>>   GNU gdb (GDB) Fedora 12.1-2.fc36 ...
>>   Reading symbols from ../bin/picolisp...
>>   (No debugging symbols found in ../bin/picolisp)
>>   [New LWP 1800629]
>>   [Thread debugging using libthread_db enabled]
>>   Using host libthread_db library "/lib64/libthread_db.so.1".
>>   Core was generated by `/usr/bin/picolisp /usr/lib/picolisp/lib.l
>> /usr/bin/pil /home/jvd/bin/L_RT.l -pr'.
>>   Program terminated with signal SIGSEGV, Segmentation fault.
>>   #0  0x00444921 in consTree ()
>>   Missing separate debuginfos, use: dnf debuginfo-install
>> libffi-3.4.2-8.fc36.x86_64 ncurses-libs-6.2-9.20210508.fc36.x86_64
>> readline-8.2-2.fc36.x86_64
>>   (gdb) where
>>   #0  0x00444921 in consTree ()
>>   #1  0x00422428 in _for ()
>>   #2  0x004212f7 in _prog ()
>>   #3  0x0042324d in _let ()
>>   #4  0x0042324d in _let ()
>>   #5  0x00432469 in evExpr ()
>>   #6  0x0041fd02 in _eval ()
>>   #7  0x004211d8 in _bool ()
>>   #8  0x00421218 in _not ()
>>   #9  0x004214ac in _if ()
>>   #10 0x004212f7 in _prog ()
>>   #11 0x0042324d in _let ()
>>   #12 0x0043e505 in loop1 ()
>>   #13 0x00422573 in _for ()
>>   #14 0x0042324d in _let ()
>>   #15 0x0042324d in _let ()
>>   #16 0x004238c7 in _catch ()
>>   #17 0x00434476 in repl ()
>>   #18 0x004495b8 in main ()
>>   (gdb)
>>
>>   quit
>>  
>>
>>  This is from the route which has 2 identical idx tree 'keys':
>>   192.168.42.1/32 ppp0    0.0.0.0 UP,HO
>>  0   {   dev:ppp0    metric:50   prefsrc:192.168.42.10
>> protocol:kernel scope:link  type:unicast}
>>   192.168.42.1/32 ppp0    0.0.0.0 UP,HO
>>  50  {   }
>>
>>  Why does this situation cause a coredump / inability to process 'ip
>>  route' output without final '+' command line  argument in effect ?
>>  Very strange - if the debugger is enabled , it
>>  should detect a problem and trap to it, no ?
>>
>>  Any constructive ideas / suggested workarounds gratefully received .
>>
>>  Note, it is not fixed by doing a '(load "@lib/debug.l") in the program 

Re: coredump without '+' final argument

2023-08-02 Thread Jason Vas Dias
The coredump occurs within this loop of the 'prin_route' function,
 for the same route, only when debug mode is NOT enabled, as can be
 proved by the output ending with '{' :
  $ L_RT.l -pr
0.0.0.0/0   wlp59s0 192.168.43.1    UP,GW
600 {   prefsrc:192.168.43.70   protocol:dhcp
scope:global    type:unicast}
0.0.0.0/32  *   0.0.0.0 UP,HO
0   {   protocol:boot   scope:global    type:blackhole
}
..
192.168.42.1/32 ppp0    0.0.0.0 UP,HO
0   {   prefsrc:192.168.42.10   protocol:kernel
scope:link  type:unicast}
192.168.42.1/32 ppp0    0.0.0.0 UP,HO
50  {   Segmentation fault (core dumped)

So the code MUST be in this loop when the coredump occurs :

   (for r (idx ratr)
      (when (and (bool r) (lst? r))
         (let
            ( k (car r) v (cdr r) )
            (case k
               ('( "dst" "gateway" "dev" "metric" "mtu" ))
               (T
                  (out 1 (prin (pack k ":" v "^I")))
               )
            )
         )
      )
   )

Why, only when the trailing '+' "Enable Debug Mode" is in '(argv)' ,
should the behaviour of 'idx' change so drastically ?

I can send you hundreds of such coredumps - they are not very helpful
unless you can combine using GDB with use of a live picolisp to inspect
the stack . That is what I'd like to get working .

I suspect the CFA stack frame info being generated and possibly data layouts
when not in debug mode may be different to when in debug mode ?

Here's more details of the one that just happened :

(gdb) where
#0  0x00444921 in consTree ()
#1  0x00422428 in _for ()
#2  0x004212f7 in _prog ()
#3  0x0042324d in _let ()
#4  0x0042324d in _let ()
#5  0x00432469 in evExpr ()
#6  0x0041fd02 in _eval ()
#7  0x004211d8 in _bool ()
#8  0x00421218 in _not ()
#9  0x004214ac in _if ()
#10 0x004212f7 in _prog ()
#11 0x0042324d in _let ()
#12 0x0043e505 in loop1 ()
#13 0x00422573 in _for ()
#14 0x0042324d in _let ()
#15 0x0042324d in _let ()
#16 0x004238c7 in _catch ()
#17 0x0042324d in _let ()
#18 0x00434476 in repl ()
#19 0x004495b8 in main ()
(gdb) info reg
rax            0x45b6b8            4568760
rbx            0x7f38d2f23780      139882033985408
rcx            0x7f38d2f247c0      139882033989568
rdx            0x7f38d2f23780      139882033985408
rsi            0x3                 3
rdi            0x7f38d2f247c0      139882033989568
rbp            0x7ffc913afc80      0x7ffc913afc80
rsp            0x7ffc913afc40      0x7ffc913afc40
r8             0x45b5e8            4568552
r9             0x45b5e8            4568552
r10            0x45b6b8            4568760
r11            0x202               514
r12            0x7ffc913afc40      140722745048128
r13            0x45b6b8            4568760
r14            0x7ffc913afc50      140722745048144
r15            0x45b5e8            4568552
rip            0x444921            0x444921
eflags         0x10212             [ AF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb) disass
Dump of assembler code for function consTree:
   0x004448a0 <+0>:     push   %rbp
   0x004448a1 <+1>:     mov    %rsp,%rbp
   0x004448a4 <+4>:     push   %r15
   0x004448a6 <+6>:     push   %r14
   0x004448a8 <+8>:     push   %r12
   0x004448aa <+10>:    push   %rbx
   0x004448ab <+11>:    mov    %rsi,%rax
   0x004448ae <+14>:    mov    %rdi,%rbx
   0x004448b1 <+17>:    test   $0xf,%bl
   0x004448b4 <+20>:    jne    0x4449c4
   0x004448ba <+26>:    mov    %rsp,%rcx
   0x004448bd <+29>:    lea    -0x10(%rcx),%r14
   0x004448c1 <+33>:    mov    %r14,%rsp
   0x004448c4 <+36>:    mov    $0x45b5e8,%r15
   0x004448cb <+43>:    mov    (%r15),%rdx
   0x004448ce <+46>:    mov    %rdx,-0x8(%rcx)
   0x004448d2 <+50>:    mov    %rsp,%rcx
   0x004448d5 <+53>:    lea    -0x10(%rcx),%r12
   0x004448d9 <+57>:    mov    %r12,%rsp
   0x004448dc <+60>:    movq   $0xa,-0x10(%rcx)
   0x004448e4 <+68>:    mov    %r14,-0x8(%rcx)
   0x004448e8 <+72>:    mov    %r12,(%r15)
   0x004448eb <+75>:    mov    $0xa,%ecx
   0x004448f0 <+80>:    mov    0x8(%rbx),%rsi
   0x004448f4 <+84>:    mov    0x8(%rsi),%rdx
   0x004448f8 <+88>:    test   $0xf,%dl
   0x004448fb <+91>:    jne    0x44492e
   0x004448fd <+93>:    add    $0x8,%rsi
   0x00444901 <+97>

Re: coredump without '+' final argument

2023-08-02 Thread Alexander Burger
Hi Jason,

> >Can you debug this a little more? E.g. look at the output of (traceAll) and 
> >see
> > *where* exactly it happens.
> 
> That's the whole problem - doesn't 'traceAll' depend on Debug Mode
> being enabled by trailing '+' ?

Oh, right, you said it happens only if *not* in debug mode.

Still, as I said, I'm quite sure it does not directly have to do with debug
mode. Rather it looks like a heisenbug to me, where the error appears and
disappears depending on unrelated things like memory or stack layout, timing
etc.

It *can* be, though, that your program conflicts with stuff loaded only in debug
mode.

I did not manage to test it here, but some parts of your code look suspicious,
at least not following the Pil conventions. Perhaps some of your lower-cased
locally bound symbols conflict somewhere?

In any case you could try other ways to debug it without complete debug mode,
e.g. by inserting

   (msg '<1>)

or so in various parts of the program until you find which 'idx' call crashes,
and what the environment is at that moment.

☺/ A!ex



Re: coredump without '+' final argument

2023-08-02 Thread Alexander Burger
> So the code MUST be in this loop when the coredump occurs :

OK

Though I don't know the reason for the crash, please
try to stick with pil conventions


>(for r (idx ratr)
 (for R (idx Ratr)


>  (when (and (bool r) (lst? r))
   (when (and R (lst? R))


which is

(when (pair R)



>  (let
>   ( k (car r) v (cdr r) )

(let
   ((K . V) R)

>   (case k
>('( "dst" "gateway" "dev" "metric" "mtu" ))

'case' does not eval the keys, so the quote is wrong.
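
As a sketch of what the unquoted form looks like (the predicate name
'routeKey?' is hypothetical and only serves to illustrate the syntax):
the reader expands '(...) to (quote ...), so with the quote the key list
just gains the extra member 'quote'.

```picolisp
# Hypothetical helper: T if K is one of the keys the loop wants to skip.
(de routeKey? (K)
   (case K
      (("dst" "gateway" "dev" "metric" "mtu") T)  # keys are not evaluated
      (T NIL) ) )                                 # catch-all clause

(routeKey? "dev")       # matches a member of the key list
(routeKey? "prefsrc")   # falls through to the T clause
```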


> Why, only when the trailing '+' "Enable Debug Mode" is in '(argv)' ,
> should the behaviour of 'idx' change so drastically ?

> I can send you hundreds of such coredumps - they are not very helpful

Mike Pechkin tried to reproduce it, also with your recommended invocation, but
it does not crash. I think it is a heisenbug.



> unless you can combine using GDB with use of a live picolisp to inspect
> the stack . That is what I'd like to get working .
> 
> I suspect the CFA stack frame info being generated and possibly data layouts
> when not in debug mode may be different to when in debug mode ?

Debug mode does not change anything in the interpreter.

☺/ A!ex



Re: coredump without '+' final argument

2023-08-02 Thread Alexander Burger
On Wed, Aug 02, 2023 at 07:41:23PM +0200, Alexander Burger wrote:
> Though I don't know the reason for the crash, please
> try to stick with pil conventions

For example, in 'load-routes' there is

   (let ( cnt 0 tits NIL)

However, 'cnt' is a built-in function, which is now bound to 0 (null-pointer).
If some code (in 'load-routes' or any other function called from within it)
calls 'cnt', it is sure to crash.
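
A minimal sketch of the conventional spelling (the function 'countLines'
is hypothetical, not part of L_RT.l): locally bound symbols start with an
upper-case letter, so the built-in 'cnt' keeps its function value and
stays callable.

```picolisp
# Hypothetical helper: count the non-NIL items of a list.
(de countLines (Lines)
   (let Cnt 0                   # 'Cnt', not 'cnt': built-in is not rebound
      (for L Lines
         (when L (inc 'Cnt)) )
      Cnt ) )

(countLines '("a" NIL "b"))     # counts the two non-NIL items
(cnt num? '(1 a 2))             # built-in 'cnt' still works afterwards
```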

☺/ A!ex



Re: coredump without '+' final argument

2023-08-02 Thread Alexander Burger
On Wed, Aug 02, 2023 at 09:15:54PM +0200, Alexander Burger wrote:
> On Wed, Aug 02, 2023 at 07:41:23PM +0200, Alexander Burger wrote:
> > Though I don't know the reason for the crash, please
> > try to stick with pil conventions

Other issues are:

1. In 'ipv4-route-flag' there is

(let
 ...
  fs NI

   Probably a mistype and NIL was meant.

2. 'load-routes' uses 'ratr' without binding it in an argument or a 'let'.

3. 'prin_route' binds 'dstr' but never uses it.

(I found these issues with (lintAll))

☺/ A!ex



Re: coredump without '+' final argument

2023-08-02 Thread Jason Vas Dias
Loop is now as you suggested, same problem :
 (let ( ... r NIL)
 ...
   (for r (idx ratr)
 (when (and (bool r) (lst? r))
 (let
  ( (k . v) r )
  (case k
   (( "dst" "gateway" "dev" "metric"  ))
   (T
(out 1 (prin (pack k ":" v "^I")))
   )
  )
 )
)
   )
 ...)

  So '(let ( ( k . v ) l ) ...)' copies CONS cell in l, while
  '(let ( ( k v ) l ) ... )' sets k & v to (car l) and (cadr l)
  respectively, right ?
  That is cool.
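
  A small sketch to check that understanding, assuming the destructuring
  'let' of pil21. As far as I can tell nothing is copied: the symbols are
  just bound to the CAR and CDR (or CADR) of the value. The sample data is
  made up:

```picolisp
# (K . V) binds K to the CAR and V to the CDR of the value
(let ((K . V) '("dev" . "ppp0"))
   (list K V) )                 # V is the atom "ppp0"

# on a two-element list, (K . V) leaves V as the rest of the list
(let ((K . V) '("dev" "ppp0"))
   (list K V) )                 # V is the list ("ppp0")

# (K V) binds K to the CAR and V to the CADR
(let ((K V) '("dev" "ppp0"))
   (list K V) )                 # V is the atom "ppp0"
```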

 I will try printing all symbols from the program in debug mode and
 comparing with all symbols in non-debug mode .

 But why, when run under Emacs in an Emacs terminal, or with the '+'
 debug mode option, do no warnings, coredumps or errors occur?
 Since we are then in Debug Mode, that in itself is a bug in the Debugger
 if some major re-naming has occurred - it should print a message about
 'Redefining Symbols', no? It doesn't:
: (load "/home/jvd/J/L_RT.l")
# pil_inc redefined
# load-routes redefined
# prin_route redefined
-> NIL
:
(this was because I had a previous version loaded).
You'd hope , when running with debug enabled in an Emacs terminal, that
any redefinition of a core built-in symbol would be warned about, no?

So yes, I think picolisp definitely needs ability to control both
GDB and pil debugger driver Emacs sessions for the same process
to enable investigating situations such as this - one needs to be
able to inspect the picoLisp Stack in Emacs and see which Variables
/ symbols / strings / numbers / external symbols & in each environment
 they refer to - this is not trivial, but is what is needed, and is what eg