> From: Mark H Weaver <m...@netris.org>
> Cc: Panicz Maciej Godek <godek.mac...@gmail.com>, guile-user@gnu.org
> Date: Sun, 25 Aug 2013 12:59:43 -0400
>
> > Anyway, I looked into this a bit.  I can confirm that the simple
> > program you mentioned the first time aborts due to "stack overflow"
> [...]
> > [...] it turns out that GC_get_stack_base, which
> > is implemented in libgc, returns zero as the stack base.
>
> Thanks for looking into this, Eli!
>
> This raises the question: what's the relevant difference between
> Panicz's simple 'main' and Guile's 'main' (in libguile/guile.c) that
> causes one to (apparently) initialize the stack base properly, where the
> other fails?  It would be worthwhile to find out.

Thanks for the suggestion, here's some follow-up:

Comparison between libguile/guile.exe and the test program shows that
the former goes through a different initialization process, calling
libgc as part of it.  Here's the call stack from guilify_self_1, when
called from libguile/guile.exe:

  Breakpoint 1, guilify_self_1 (base=base@entry=0x28fe8c) at threads.c:533
  533     {
  (gdb) n
  541       t.pthread = scm_i_pthread_self ();
  (gdb) n
  542       t.handle = SCM_BOOL_F;
  (gdb) p t.pthread
  $1 = 0
  (gdb) bt
  #0  guilify_self_1 (base=base@entry=0x28fe8c) at threads.c:542
  #1  0x004330b7 in scm_threads_prehistory (base=base@entry=0x28fe8c)
      at threads.c:2176
  #2  0x00412c2d in scm_i_init_guile (base=base@entry=0x28fe8c)
      at init.c:386
  #3  0x0043249c in scm_i_init_thread_for_guile (base=0x28fe8c,
      parent=<optimized out>) at threads.c:835
  #4  scm_i_init_thread_for_guile (base=0x28fe8c, parent=<optimized out>)
      at threads.c:814
  #5  0x004324c4 in with_guile_and_parent (base=0x28fe8c, data=0x28feb4)
      at threads.c:901
  #6  0x709cae6f in ?? () from D:\usr\bin\libgc-1.dll
  #7  0x004326cc in scm_i_with_guile_and_parent (parent=<optimized out>,
      data=0x28fee0, data@entry=0x28feb0,
      func=func@entry=0x412abc <invoke_main_func>) at threads.c:951
  #8  scm_with_guile (func=func@entry=0x412abc <invoke_main_func>,
      data=data@entry=0x28fee0) at threads.c:957
  #9  0x00412beb in scm_boot_guile (argc=argc@entry=1,
      argv=argv@entry=0x28c6970,
      main_func=main_func@entry=0x4013d4 <inner_main>,
      closure=closure@entry=0x0) at init.c:320
  #10 0x004c5eab in main (argc=1, argv=0x28c6970) at guile.c:108
  (gdb) p t.base
  $2 = (SCM_STACKITEM *) 0x28fe8c

while the latter does not go through libgc:

  Breakpoint 1, guilify_self_1 (base=base@entry=0x2deff0c) at threads.c:533
  533     {
  (gdb) bt
  #0  guilify_self_1 (base=base@entry=0x2deff0c) at threads.c:533
  #1  0x00402f5f in scm_threads_prehistory (base=base@entry=0x2deff0c)
      at threads.c:2176
  #2  0x00404b89 in scm_i_init_guile (base=base@entry=0x2deff0c)
      at init.c:386
  #3  0x00402344 in scm_i_init_thread_for_guile (base=0x2deff0c,
      parent=<optimized out>) at threads.c:835
  #4  scm_i_init_thread_for_guile (base=0x2deff0c, parent=<optimized out>)
      at threads.c:814
  #5  0x00402519 in scm_init_guile () at threads.c:869
  #6  0x004013c0 in main () at guile-hello.c:13
  (gdb) p base->mem_base
  $1 = (void *) 0x0

In the former case, scm_i_with_guile_and_parent does this:

  static void *
  scm_i_with_guile_and_parent (void *(*func)(void *), void *data, SCM parent)
  {
    struct with_guile_args args;

    args.func = func;
    args.data = data;
    args.parent = parent;

    return GC_call_with_stack_base (with_guile_and_parent, &args);
  }        ^^^^^^^^^^^^^^^^^^^^^^^

and the call to GC_call_with_stack_base correctly initializes the stack
base:

  GC_API void * GC_CALL GC_call_with_stack_base(GC_stack_base_func fn, void *arg)
  {
    struct GC_stack_base base;
    void *result;

    base.mem_base = (void *)&base;  <<<<<<<<<<<<<<<<<<<<<<<<<<

I have verified in the debugger that base.mem_base gets a good value
here.

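To make the pattern concrete without depending on libgc, here is a
minimal standalone sketch of what that assignment does.  All the names
in it (stack_base_sketch, call_with_stack_base_sketch, and so on) are
hypothetical and are not libgc's API; it only models the idea of taking
the address of a local in the wrapper's frame and handing it to the
callback as the stack base.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libgc's struct GC_stack_base; only the
   mem_base field discussed above is modelled. */
struct stack_base_sketch {
    void *mem_base;
};

typedef void *(*stack_base_fn_sketch)(struct stack_base_sketch *sb, void *arg);

/* Same pattern as the quoted code: take the address of a local in this
   frame as the stack base, then run the callback with it. */
void *call_with_stack_base_sketch(stack_base_fn_sketch fn, void *arg)
{
    struct stack_base_sketch base;

    base.mem_base = (void *)&base;
    return fn(&base, arg);
}

/* Callback: records in *arg whether it was handed a non-null base. */
static void *record_base_seen(struct stack_base_sketch *sb, void *arg)
{
    int *seen = arg;

    assert(sb != NULL);
    *seen = (sb->mem_base != NULL);
    return arg;
}

/* Returns 1 when the callback saw a non-null stack base, else 0. */
int base_is_nonnull_sketch(void)
{
    int seen = 0;

    call_with_stack_base_sketch(record_base_seen, &seen);
    return seen;
}
```

Because the base comes from the address of a local, this path cannot
produce a zero base, whatever the initialization state of the collector.
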
By contrast, in the test program, GC_call_with_stack_base is never
called.  Instead, GC_get_stack_base is called, which returns zero.  The
reason for that seems to be that GC_get_stack_base is called _before_
GC_init:

  (gdb) break GC_get_stack_base
  Breakpoint 1 at 0x4bbed8
  (gdb) break GC_init
  Breakpoint 2 at 0x4bbfa8
  (gdb) r
  Starting program: D:\usr\eli\utils\guile-2.0.9\guile-hello.exe
  [New Thread 11200.0x39ac]

  Breakpoint 1, 0x004bbed8 in GC_get_stack_base ()
  (gdb) bt
  #0  0x004bbed8 in GC_get_stack_base ()
  #1  0x00402508 in scm_init_guile () at threads.c:868
  #2  0x004013c0 in main () at guile-hello.c:13
  (gdb) c
  Continuing.

  Breakpoint 2, 0x004bbfa8 in GC_init ()
  (gdb) bt
  #0  0x004bbfa8 in GC_init ()
  #1  0x0043d53b in scm_storage_prehistory () at gc.c:653
  #2  0x00404b81 in scm_i_init_guile (base=base@entry=0x2deff0c)
      at init.c:385
  #3  0x00402344 in scm_i_init_thread_for_guile (base=0x2deff0c,
      parent=<optimized out>) at threads.c:835
  #4  scm_i_init_thread_for_guile (base=0x2deff0c, parent=<optimized out>)
      at threads.c:814
  #5  0x00402519 in scm_init_guile () at threads.c:869
  #6  0x004013c0 in main () at guile-hello.c:13
  (gdb)

(Stepping through GC_init shows that GC_setpagesize is called and
returns the correct value: 0x1000.  But it is called too late.)

This happens because scm_init_guile calls GC_get_stack_base without
verifying that libgc was initialized.  Changing the test program to call
GC_init at the beginning, like this:

  #include <stdio.h>
  #include <guile/2.0/libguile.h>
  #include <gc/gc.h>

  int main (void)
  {
    GC_init ();
    scm_init_guile ();
    return 0;
  }

passes the stack overflow test, but crashes further down the Guile
initialization path:

  Program received signal SIGSEGV, Segmentation fault.
  0x0042c66d in symbol_lookup_assoc_fn (obj=0x2f5870, alist=0x1e4d50,
      closure=0x0) at symbols.c:176
  176       SCM sym = SCM_CAAR (alist);
  (gdb) bt
  #0  0x0042c66d in symbol_lookup_assoc_fn (obj=0x2f5870, alist=0x1e4d50,
      closure=0x0) at symbols.c:176
  #1  0x0046a045 in weak_bucket_assoc (table=table@entry=0x2d8fd8,
      buckets=buckets@entry=0x2d4000,
      bucket_index=bucket_index@entry=2165,
      hash_fn=hash_fn@entry=0x42c110 <symbol_lookup_hash_fn>,
      assoc=assoc@entry=0x42c63c <symbol_lookup_assoc_fn>,
      object=object@entry=0x2f5870, closure=closure@entry=0x0)
      at hashtab.c:214
  #2  0x0046a80e in scm_hash_fn_create_handle_x (table=0x2d8fd8,
      obj=0x2f5870, init=init@entry=0x904,
      hash_fn=hash_fn@entry=0x42c110 <symbol_lookup_hash_fn>,
      assoc_fn=assoc_fn@entry=0x42c63c <symbol_lookup_assoc_fn>,
      closure=closure@entry=0x0) at hashtab.c:698
  #3  0x0042c207 in intern_symbol (symbol=<optimized out>)
      at symbols.c:195
  #4  scm_i_str2symbol (str=0x4dd028 <scm_logbit_p__name_string_raw_cell>)
      at symbols.c:218
  #5  0x0042c440 in scm_string_to_symbol (
      string=string@entry=0x4dd028 <scm_logbit_p__name_string_raw_cell>)
      at symbols.c:323
  #6  0x0041d0b5 in scm_init_numbers () at ../libguile/numbers.x:48
  #7  0x00404cb9 in scm_i_init_guile (base=base@entry=0x2deff0c)
      at init.c:453
  #8  0x00402348 in scm_i_init_thread_for_guile (base=0x2deff0c,
      parent=<optimized out>) at threads.c:835
  #9  scm_i_init_thread_for_guile (base=0x2deff0c, parent=<optimized out>)
      at threads.c:814
  #10 0x0040251d in scm_init_guile () at threads.c:869
  #11 0x004013c5 in main () at guile-hello.c:15
  (gdb)

Any further ideas?
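
P.S. For anyone who wants to see the ordering problem in isolation, here
is a toy model of it.  It is not libgc code and every name in it is
hypothetical; it only assumes, as the debugger session suggests, that
the stack-base query depends on state that the initialization routine
sets up, and that querying first yields zero.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; none of this is libgc code.  Zero means "not yet
   initialized", standing in for the state GC_init sets up. */
static size_t page_size_sketch = 0;

void init_sketch(void)
{
    page_size_sketch = 0x1000;  /* value seen in the debugger session above */
}

/* Unguarded query: yields NULL if init_sketch() has not run yet. */
void *get_stack_base_sketch(void *addr_on_stack)
{
    assert(addr_on_stack != NULL);
    if (page_size_sketch == 0)
        return NULL;
    return addr_on_stack;
}

/* Guarded query: initializes first if needed, then asks. */
void *get_stack_base_checked_sketch(void *addr_on_stack)
{
    if (page_size_sketch == 0)
        init_sketch();
    return get_stack_base_sketch(addr_on_stack);
}
```

The guarded variant corresponds to what the modified test program does
by hand when it calls GC_init before scm_init_guile.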