Hey Steve,

I'm no expert here, but what sticks out to me is the garbage collection 
frames in those stacks, which suggest things are crashing while GC is 
running. Does the failure frequency change if you adjust how often garbage 
collection runs?

Charles

On Thursday, July 2, 2020 at 2:18:31 PM UTC-7 est...@yottadb.com wrote:

> We have developed a Go API (which we call a Go wrapper - 
> https://pkg.go.dev/lang.yottadb.com/go/yottadb?tab=doc) to the YottaDB 
> hierarchical key-value database (https://yottadb.com). It works well for 
> the most part, but there are some edge cases during process shutdown that 
> are standing between us and a full production-grade API. We have a test 
> program called threeenp1C2 that is an implementation of the classic 3n+1 
> problem that does some database-intensive activity. It comes in two 
> flavors: a multi-process version (1 [application] goroutine per process) 
> and a single process version (multiple [application] goroutines in one 
> process). The latter runs fine; the discussion below is about the 
> multi-process version.
>
> A Go main spawns 15 copies of itself as workers, each of which runs an 
> internal routine. Each process computes the lengths of 3n+1 sequences for 
> a block of integers, then picks up another block. The results are stored 
> in the database so that processes can use the results of other processes' 
> work. When they finish computing the results for the whole problem, they 
> shut down. So, for example, the overall problem may be to compute the 3n+1 
> sequences for all integers from one through a million, with processes 
> working on blocks of 10,000 initial integers. There is nothing special 
> about the computation other than that it generates a lot of database 
> activity.
>
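For readers unfamiliar with the benchmark, the per-integer work is tiny; a 
minimal sketch of the sequence-length computation (the function name 
collatzLen is mine, not taken from the test, and the real test additionally 
caches results in YottaDB so workers can reuse each other's work):

```go
package main

import "fmt"

// collatzLen returns the number of terms in the 3n+1 sequence starting at n,
// counting n itself and the final 1. For example, 6 → 3 → 10 → 5 → 16 → 8 →
// 4 → 2 → 1 has 9 terms.
func collatzLen(n uint64) int {
	length := 1
	for n != 1 {
		if n%2 == 0 {
			n /= 2
		} else {
			n = 3*n + 1
		}
		length++
	}
	return length
}

func main() {
	fmt.Println(collatzLen(27)) // prints 112
}
```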
> This version runs fine and always shuts down cleanly if allowed to run to 
> completion. But since the database is used for mission-critical 
> applications, we have a number of stress tests. The threeenp1C2 test 
> involves starting 15 cooperative worker processes and then sending each 
> process a SIGTERM or SIGINT, depending on the test. Sporadically, one of 
> the processes receiving the signal generates a core file due to a SIGSEGV 
> instead of shutting down cleanly. That's the 10,000 ft view. Here are 
> more details:
>
>    - The database engine is daemonless and runs in the address space of 
>    each process, with processes cooperating with one another to manage the 
>    database using a variety of semaphores and data structures in shared 
>    memory segments, as well as OS semaphores and mutexes. The database 
>    engine is single-threaded, and when multiple threads in a process call 
>    the database engine, there are mechanisms to ensure that non-reentrant 
>    code is called without reentrancy.
>    - The database engine is written in C and traditionally has relied 
>    heavily on signals, but with the Go wrapper calling the engine through 
>    cgo, things were a bit dicey. So the code was reworked for use with Go, 
>    such that Go now handles the signals and lets the YottaDB engine know 
>    about them. To that end, a goroutine is started for each signal type we 
>    want to know about (around 17 of them), each of which then calls into a 
>    "signal dispatcher" in the C code to drive signal handlers for those 
>    signals we are notified of.
>    - When a fatal signal such as a SIGTERM occurs, the goroutines started 
>    for signal handling are all told to close down, and we wait for them to 
>    shut down before driving a panic to stop the world (doing this reduced 
>    the failure rate from nearly 100%). Core dumps now occur 3-10% of the 
>    time. These strange failures are in the bowels of Go (usually in either 
>    a futex call or something called newstack?). Empirical evidence 
>    suggests the failure rate increases when the system is loaded - I guess 
>    thus affecting the timing of how/when things shut down, though proof is 
>    hard to come by.
>    - The database engine uses optimistic concurrency control to implement 
>    ACID transactions. This means that Go application code (say Routine A) 
>    calls the database engine through cgo, passing it a Go entry point (say 
>    Routine B) that the database engine calls one or more times until it 
>    completes the transaction (to avoid live-locks, in a final retry, 
>    should one be required, the engine locks out all other accesses, 
>    essentially single-threading the database). Routine B itself calls into 
>    the YottaDB API through cgo. So mixed stacks of C and Go code are 
>    common.
>    - To avoid endangering database integrity, the engine attempts to shut 
>    down at “safe” points. If certain fatal signals are received at an 
>    unsafe spot, we defer handling the signal until the process is in a 
>    safe place. To ensure that everything stops when it reaches a safe 
>    place, the engine calls back into Go and drives a panic of choice to 
>    shut down the entire process. I find myself wondering if the "sandwich 
>    stack" of C and Go routines is somehow causing the panic to generate 
>    cores.
>    
> These failures are like nothing I've seen. Sometimes one of the threads 
> that we create in the C code is still around (it's been told to shut down 
> but evidently hasn't gotten around to it yet - like thread 11 in the gdb 
> trace below). It has a stack trace with ydb_stm_thread() in it and is 
> generally asleep on a timer. Note I'm using Go 1.14.4 on Ubuntu 18.04.
>
> Here is the list of goroutines from delve (which I've only just started 
> using, so I'm not well versed in it):
>
> [3:01pm] [estess@flyingv] : 
> /testarea/estess/tst_V994_R129_dbg_04_200701_144712/go_0_6/sigint/go/src/threeenp1C2
>  
> > dlv core threeenp1C2 core.88642 --check-go-version=false
> Type 'help' for list of commands.
> (dlv) goroutines
>   Goroutine 1 - User: /snap/go/5830/src/time/sleep.go:84 time.NewTimer 
> (0x4ed308) (thread 89352)
>   Goroutine 2 - User: /snap/go/5830/src/runtime/proc.go:305 runtime.gopark 
> (0x445270)
>   Goroutine 3 - User: /snap/go/5830/src/runtime/proc.go:305 runtime.gopark 
> (0x445270)
>   Goroutine 4 - User: /snap/go/5830/src/runtime/proc.go:305 runtime.gopark 
> (0x445270)
>   Goroutine 18 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
>   Goroutine 34 - User: /snap/go/5830/src/runtime/sigqueue.go:147 
> os/signal.signal_recv (0x45a74c) (thread 88678)
>   Goroutine 35 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
>   Goroutine 36 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
>   Goroutine 37 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
>   Goroutine 38 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
>   Goroutine 53 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
>   Goroutine 54 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
> * Goroutine 66 - User: /snap/go/5830/src/runtime/sys_linux_amd64.s:568 
> runtime.futex (0x476273) (thread 88642)
>   Goroutine 67 - User: /snap/go/5830/src/runtime/mgcmark.go:1241 
> runtime.scanobject (0x42fa96) (thread 88718)
>   Goroutine 68 - User: /snap/go/5830/src/runtime/proc.go:305 
> runtime.gopark (0x445270)
> [15 goroutines]
> (dlv) bt
> 0  0x0000000000476273 in runtime.futex
>    at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> 1  0x0000000000471b50 in runtime.systemstack_switch
>    at /snap/go/5830/src/runtime/asm_amd64.s:330
> 2  0x000000000042b2db in runtime.gcMarkDone
>    at /snap/go/5830/src/runtime/mgc.go:1449
> 3  0x000000000042c38e in runtime.gcBgMarkWorker
>    at /snap/go/5830/src/runtime/mgc.go:2000
> 4  0x0000000000473c61 in runtime.goexit
>    at /snap/go/5830/src/runtime/asm_amd64.s:1373
> (dlv)
>
> And here's the thread list and traceback done in gdb:
>
> [3:04pm] [estess@flyingv] : 
> /testarea/estess/tst_V994_R129_dbg_04_200701_144712/go_0_6/sigint/go/src/threeenp1C2
>  
> > gdb threeenp1C2 core.88642 
> GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from threeenp1C2...done.
> [New LWP 88642]
> [New LWP 88678]
> [New LWP 88718]
> [New LWP 89351]
> [New LWP 88681]
> [New LWP 89352]
> [New LWP 88682]
> [New LWP 88684]
> [New LWP 89203]
> [New LWP 88666]
> [New LWP 88740]
> [New LWP 88783]
> [New LWP 88738]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by 
> `/extra3/testarea1/estess/tst_V994_R129_dbg_04_200701_144712/go_0_6/sigint/go/sr'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> 568             MOVL    AX, ret+40(FP)
> [Current thread is 1 (Thread 0x7fc4a58621c0 (LWP 88642))]
> Loading Go Runtime support.
> (gdb) thread apply all where
>
> Thread 13 (Thread 0x7fc44ffff700 (LWP 88738)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f076 in runtime.futexsleep (addr=0xc0001804c8, val=0, 
> ns=-1) at /snap/go/5830/src/runtime/os_linux.go:45
> #2  0x000000000041c25f in runtime.notesleep (n=0xc0001804c8) at 
> /snap/go/5830/src/runtime/lock_futex.go:151
> #3  0x0000000000448ce0 in runtime.stopm () at 
> /snap/go/5830/src/runtime/proc.go:1834
> #4  0x000000000044a2fd in runtime.findrunnable (gp=0xc00004b000, 
> inheritTime=false) at /snap/go/5830/src/runtime/proc.go:2366
> #5  0x000000000044ae3c in runtime.schedule () at 
> /snap/go/5830/src/runtime/proc.go:2526
> #6  0x000000000044b3bd in runtime.park_m (gp=0xc00008a900) at 
> /snap/go/5830/src/runtime/proc.go:2696
> #7  0x0000000000471b3b in runtime.mcall () at 
> /snap/go/5830/src/runtime/asm_amd64.s:318
> #8  0x0000000000000000 in ?? ()
>
> Thread 12 (Thread 0x7fc44bffd700 (LWP 88783)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f076 in runtime.futexsleep (addr=0xc0001032c8, val=0, 
> ns=-1) at /snap/go/5830/src/runtime/os_linux.go:45
> #2  0x000000000041c25f in runtime.notesleep (n=0xc0001032c8) at 
> /snap/go/5830/src/runtime/lock_futex.go:151
> #3  0x0000000000448ce0 in runtime.stopm () at 
> /snap/go/5830/src/runtime/proc.go:1834
> #4  0x000000000044a2fd in runtime.findrunnable (gp=0xc00003e800, 
> inheritTime=false) at /snap/go/5830/src/runtime/proc.go:2366
> #5  0x000000000044ae3c in runtime.schedule () at 
> /snap/go/5830/src/runtime/proc.go:2526
> #6  0x000000000044b3bd in runtime.park_m (gp=0xc000105c80) at 
> /snap/go/5830/src/runtime/proc.go:2696
> #7  0x0000000000471b3b in runtime.mcall () at 
> /snap/go/5830/src/runtime/asm_amd64.s:318
> #8  0x0000000000000000 in ?? ()
>
> Thread 11 (Thread 0x7fc44dffe700 (LWP 88740)):
> #0  __clock_nanosleep (clock_id=1, flags=1, req=0x7fc44dffdec0, rem=0x0) 
> at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
> #1  0x00007fc4a49b70f0 in ydb_stm_thread (dummy_parm=0x0) at 
> /Distrib/YottaDB/V994_R129/sr_unix/ydb_stm_thread.c:109
> #2  0x00007fc4a45796db in start_thread (arg=0x7fc44dffe700) at 
> pthread_create.c:463
> #3  0x00007fc4a42a288f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> Thread 10 (Thread 0x7fc47bfee700 (LWP 88666)):
> #0  runtime.usleep () at /snap/go/5830/src/runtime/sys_linux_amd64.s:146
> #1  0x000000000044fbfd in runtime.sysmon () at 
> /snap/go/5830/src/runtime/proc.go:4479
> #2  0x0000000000447793 in runtime.mstart1 () at 
> /snap/go/5830/src/runtime/proc.go:1097
> #3  0x00000000004476ae in runtime.mstart () at 
> /snap/go/5830/src/runtime/proc.go:1062
> #4  0x000000000053291c in crosscall_amd64 () at gcc_amd64.S:35
> #5  0x00007ffe8a519cd0 in ?? ()
> #6  0x0000000001c952a0 in ?? ()
> #7  0x0000000000000000 in ?? ()
>
> Thread 9 (Thread 0x7fc443fff700 (LWP 89203)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f076 in runtime.futexsleep (addr=0xc000180848, val=0, 
> ns=-1) at /snap/go/5830/src/runtime/os_linux.go:45
> #2  0x000000000041c25f in runtime.notesleep (n=0xc000180848) at 
> /snap/go/5830/src/runtime/lock_futex.go:151
> #3  0x0000000000448ce0 in runtime.stopm () at 
> /snap/go/5830/src/runtime/proc.go:1834
> #4  0x0000000000449622 in runtime.startlockedm (gp=0xc000000180) at 
> /snap/go/5830/src/runtime/proc.go:2007
> #5  0x000000000044aba9 in runtime.schedule () at 
> /snap/go/5830/src/runtime/proc.go:2563
> #6  0x000000000044be36 in runtime.goexit0 (gp=0xc000183080) at 
> /snap/go/5830/src/runtime/proc.go:2855
> #7  0x0000000000471b3b in runtime.mcall () at 
> /snap/go/5830/src/runtime/asm_amd64.s:318
> #8  0x0000000000000000 in ?? ()
>
> Thread 8 (Thread 0x7fc45ffff700 (LWP 88684)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f076 in runtime.futexsleep (addr=0x12ee618 
> <runtime.newmHandoff+24>, val=0, ns=-1) at 
> /snap/go/5830/src/runtime/os_linux.go:45
> #2  0x000000000041c25f in runtime.notesleep (n=0x12ee618 
> <runtime.newmHandoff+24>) at /snap/go/5830/src/runtime/lock_futex.go:151
> #3  0x0000000000448c02 in runtime.templateThread () at 
> /snap/go/5830/src/runtime/proc.go:1812
> #4  0x0000000000447793 in runtime.mstart1 () at 
> /snap/go/5830/src/runtime/proc.go:1097
> #5  0x00000000004476ae in runtime.mstart () at 
> /snap/go/5830/src/runtime/proc.go:1062
> #6  0x000000000053291c in crosscall_amd64 () at gcc_amd64.S:35
> #7  0x00007ffe8a519d70 in ?? ()
> #8  0x0000000001c956c0 in ?? ()
> #9  0x0000000000000000 in ?? ()
>
> Thread 7 (Thread 0x7fc46fffd700 (LWP 88682)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f076 in runtime.futexsleep (addr=0xc000088148, val=0, 
> ns=-1) at /snap/go/5830/src/runtime/os_linux.go:45
> #2  0x000000000041c25f in runtime.notesleep (n=0xc000088148) at 
> /snap/go/5830/src/runtime/lock_futex.go:151
> #3  0x0000000000449378 in runtime.stoplockedm () at 
> /snap/go/5830/src/runtime/proc.go:1977
> #4  0x000000000044afe6 in runtime.schedule () at 
> /snap/go/5830/src/runtime/proc.go:2460
> #5  0x000000000044b3bd in runtime.park_m (gp=0xc00008a480) at 
> /snap/go/5830/src/runtime/proc.go:2696
> #6  0x0000000000471b3b in runtime.mcall () at 
> /snap/go/5830/src/runtime/asm_amd64.s:318
> #7  0x0000000000000000 in ?? ()
>
> Thread 6 (Thread 0x7fc435ffe700 (LWP 89352)):
> #0  0x0000000000545d84 in __tsan::MemoryRangeSet(__tsan::ThreadState*, 
> unsigned long, unsigned long, unsigned long, unsigned long long) [clone 
> .isra.176] [clone .part.177] ()
> #1  0x0000000000475a73 in racecall () at 
> /snap/go/5830/src/runtime/race_amd64.s:381
> #2  0x0000000000000000 in ?? ()
>
> Thread 5 (Thread 0x7fc471ffe700 (LWP 88681)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f076 in runtime.futexsleep (addr=0xc000058bc8, val=0, 
> ns=-1) at /snap/go/5830/src/runtime/os_linux.go:45
> #2  0x000000000041c25f in runtime.notesleep (n=0xc000058bc8) at 
> /snap/go/5830/src/runtime/lock_futex.go:151
> #3  0x0000000000448ce0 in runtime.stopm () at 
> /snap/go/5830/src/runtime/proc.go:1834
> #4  0x000000000044a2fd in runtime.findrunnable (gp=0xc000046000, 
> inheritTime=false) at /snap/go/5830/src/runtime/proc.go:2366
> #5  0x000000000044ae3c in runtime.schedule () at 
> /snap/go/5830/src/runtime/proc.go:2526
> #6  0x000000000044b3bd in runtime.park_m (gp=0xc000105b00) at 
> /snap/go/5830/src/runtime/proc.go:2696
> #7  0x0000000000471b3b in runtime.mcall () at 
> /snap/go/5830/src/runtime/asm_amd64.s:318
> #8  0x0000000000000000 in ?? ()
>
> Thread 4 (Thread 0x7fc437fff700 (LWP 89351)):
> #0  runtime.epollwait () at /snap/go/5830/src/runtime/sys_linux_amd64.s:705
> #1  0x000000000043ed52 in runtime.netpoll (delay=9999949504, ~r1=...) at 
> /snap/go/5830/src/runtime/netpoll_epoll.go:119
> #2  0x000000000044a01b in runtime.findrunnable (gp=0xc00003e800, 
> inheritTime=false) at /snap/go/5830/src/runtime/proc.go:2329
> #3  0x000000000044ae3c in runtime.schedule () at 
> /snap/go/5830/src/runtime/proc.go:2526
> #4  0x000000000044b3bd in runtime.park_m (gp=0xc000105b00) at 
> /snap/go/5830/src/runtime/proc.go:2696
> #5  0x0000000000471b3b in runtime.mcall () at 
> /snap/go/5830/src/runtime/asm_amd64.s:318
> #6  0x0000000000000000 in ?? ()
>
> Thread 3 (Thread 0x7fc457fff700 (LWP 88718)):
> #0  runtime.scanobject (b=824635064320, gcw=0xc000042698) at 
> /snap/go/5830/src/runtime/mgcmark.go:1241
> #1  0x000000000042f30b in runtime.gcDrain (gcw=0xc000042698, flags=3) at 
> /snap/go/5830/src/runtime/mgcmark.go:1032
> #2  0x000000000046ec40 in runtime.gcBgMarkWorker.func2 () at 
> /snap/go/5830/src/runtime/mgc.go:1940
> #3  0x0000000000471bc6 in runtime.systemstack () at 
> /snap/go/5830/src/runtime/asm_amd64.s:370
> #4  0x0000000000447640 in ?? () at <autogenerated>:1
> #5  0x0000000000000000 in ?? ()
>
> Thread 2 (Thread 0x7fc473fff700 (LWP 88678)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f076 in runtime.futexsleep (addr=0x12ee720 
> <runtime.sig>, val=0, ns=-1) at /snap/go/5830/src/runtime/os_linux.go:45
> #2  0x000000000041c336 in runtime.notetsleep_internal (n=0x12ee720 
> <runtime.sig>, ns=-1, ~r2=<optimized out>)
>     at /snap/go/5830/src/runtime/lock_futex.go:174
> #3  0x000000000041c53c in runtime.notetsleepg (n=0x12ee720 <runtime.sig>, 
> ns=-1, ~r2=<optimized out>) at /snap/go/5830/src/runtime/lock_futex.go:228
> #4  0x000000000045a74c in os/signal.signal_recv (~r0=<optimized out>) at 
> /snap/go/5830/src/runtime/sigqueue.go:147
> #5  0x0000000000515ce0 in os/signal.loop () at 
> /snap/go/5830/src/os/signal/signal_unix.go:23
> #6  0x0000000000473c61 in runtime.goexit () at 
> /snap/go/5830/src/runtime/asm_amd64.s:1373
> #7  0x0000000000000000 in ?? ()
>
> Thread 1 (Thread 0x7fc4a58621c0 (LWP 88642)):
> #0  runtime.futex () at /snap/go/5830/src/runtime/sys_linux_amd64.s:568
> #1  0x000000000043f0f4 in runtime.futexsleep (addr=0x8b35d0 
> <runtime.sched+304>, val=0, ns=100000) at 
> /snap/go/5830/src/runtime/os_linux.go:51
> #2  0x000000000041c3de in runtime.notetsleep_internal (n=0x8b35d0 
> <runtime.sched+304>, ns=100000, ~r2=<optimized out>)
>     at /snap/go/5830/src/runtime/lock_futex.go:193
> #3  0x000000000041c4b1 in runtime.notetsleep (n=0x8b35d0 
> <runtime.sched+304>, ns=100000, ~r2=<optimized out>)
>     at /snap/go/5830/src/runtime/lock_futex.go:216
> #4  0x0000000000447dcc in runtime.forEachP (fn={void (runtime.p *)} 
> 0x7ffe8a519ef8) at /snap/go/5830/src/runtime/proc.go:1292
> #5  0x000000000046e7fe in runtime.gcMarkDone.func1 () at 
> /snap/go/5830/src/runtime/mgc.go:1456
> #6  0x0000000000471bc6 in runtime.systemstack () at 
> /snap/go/5830/src/runtime/asm_amd64.s:370
> #7  0x0000000000447640 in ?? () at <autogenerated>:1
> #8  0x0000000000471a54 in runtime.rt0_go () at 
> /snap/go/5830/src/runtime/asm_amd64.s:220
> #9  0x00000000005644d0 in ?? ()
> #10 0x0000000000471a5b in runtime.rt0_go () at 
> /snap/go/5830/src/runtime/asm_amd64.s:225
> #11 0x0000000000000003 in ?? ()
> #12 0x00007ffe8a51a058 in ?? ()
> #13 0x0000000000000003 in ?? ()
> #14 0x00007ffe8a51a058 in ?? ()
> #15 0x0000000000000000 in ?? ()
> (gdb) i goroutines
> * 1 running  time.NewTimer
>   2 waiting  runtime.gopark
>   3 waiting  runtime.gopark
>   4 waiting  runtime.gopark
>   18 waiting  runtime.gopark
> * 34 syscall  runtime.notetsleepg
>   35 waiting  runtime.gopark
> * 66 waiting  runtime.systemstack_switch
>   36 waiting  runtime.gopark
> * 67 waiting  runtime.systemstack_switch
>   53 waiting  runtime.gopark
>   68 waiting  runtime.gopark
>   37 waiting  runtime.gopark
>   38 waiting  runtime.gopark
>   54 waiting  runtime.gopark
>
> (gdb)
>
> In the testing I've done with tracing turned on, I have seen the panic 
> begin, seen the deferred yottadb.Exit() handler set in the main routine 
> start up, and watched the main routine's cleanup handler run - and then 
> the process dumps one of these strange cores on a futex access. It is 
> ALWAYS the main thread of the process that fails. I really don't know at a 
> low level what is happening, so I have no idea how to fix it. Note this 
> also seems to have occurred after Go took down its signal handlers, as 
> there was ZERO output from this failure - so this was not a typical 
> synchronous signal as Go defines them. This failure seems to have been 
> handled by the default handlers - thus creating the core even though 
> $GOTRACEBACK was not set.
>
> If anyone has any thoughts on what could be causing this, we would really 
> appreciate the suggestions.
>
> In case anyone would like to try to reproduce this, below are links to 
> the source for the YottaDB runtime (it would need to be built using the 
> directions in the README), the Go wrapper, and the specific test program 
> I'm referring to. I'm sorry this is not a nice tidy little package, but a 
> lot of code is involved and I've been unable to reproduce this failure 
> without the full thing:
>
>    - YottaDB: https://gitlab.com/estess/YDB
>    - Go Wrapper: https://gitlab.com/estess/YDBGo/-/tree/develop
>    - threeenp1C2: 
>    https://gitlab.com/estess/YDBTest/-/blob/master/go/inref/threeenp1C2.go
>
> Note this facility uses a yottadb.pc config file that is located in the 
> install directory of the YottaDB runtime. To run the test after everything 
> is built and installed, do the following:
>
>    1. $ydb_dist/GDE exit {this runs GDE to create a global directory file 
>    [default mumps.gld] for the database in the current directory}
>    2. $ydb_dist/mupip create {this creates the database [default 
>    mumps.dat] in the current directory}
>    3. Run threeenp1C2 {it will sit and wait for input - it's just a test, 
>    so it has limited glamor}
>    4. Enter "1000000", which should keep it running long enough for it to 
>    be shot with a signal.
>    5. From another session, do "kill threeenp1C2" to kill all of the 
>    spawned processes. Each spawned process creates an output file. If 
>    there are no cores, try again. It may be good to script this so it runs 
>    over and over until a core occurs. Our failure rate is once in 20-30 
>    runs, though failures happen more often the more loaded the system is. 
>
> If further information from this or any of the other cores would help, I 
> can certainly provide it if you let me know what, how, and where.
>
> Thank you for your time.
> Steve
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.