Still errors I'm afraid :/

On Thursday, 21 March 2019 21:54:59 UTC-7, Ian Lance Taylor wrote:
>
> On Thu, Mar 21, 2019 at 9:39 PM Tom <hype...@gmail.com> wrote:
> > 
> > I've been stuck on this for a few days so thought I would ask the brains
> > trust.
> > 
> > TL;DR: When native amd64 instructions mutate a []uint64 slice (updating its
> > len and values), I see spurious, random memory corruption under heavy load
> > (# runnable goroutines > GOMAXPROCS, all doing the same thing continuously),
> > and only when the GC is enabled. Any debugging ideas or things I should look
> > into?
> > 
> > Background: 
> > 
> > I'm calling into Go assembly with a few pointers to slices (*[]uint64), and
> > that assembly is mutating them (reading/writing values, updating len within
> > capacity). I'm experiencing random memory corruption, but I can only trigger
> > it in the following scenarios:
> > 
> > Heavy load - Doing a zillion things at once (specifically running all my test
> > cases in parallel) and maxing out my machine.
> > Parallelism - A panic due to memory corruption happens faster if --parallel
> > is set higher, and never if not in parallel.
> > GC - The panic never happens if the GC is disabled (of course, the test
> > process eventually runs out of memory).
> > 
> > The memory corruption varies, but usually results in an element of an
> > unrelated slice being zeroed, the len of an unrelated slice being zeroed, or
> > (less likely) a segfault.
> > 
> > Tested on go1.11.2 and go1.12.1. I can only trigger this if I run all my test
> > cases at once (with --count at 8000 or so & using t.Parallel()). Running
> > things serially or individually yields the correct behaviour.
> > 
> > The assembly in question looks like this: 
> > 
> > TEXT ·jitcall(SB),NOSPLIT|NOFRAME,$0-24 
> >         GO_ARGS 
> >         MOVQ asm+0(FP),     AX  // Load the address of the assembly section.
> >         MOVQ stack+8(FP),   R10 // Load the address of the 1st slice.
> >         MOVQ locals+16(FP), R11 // Load the address of the 2nd slice.
> >         MOVQ 0(AX),         AX  // Dereference pointer to native code.
> >         JMP AX                  // Jump to native code.
> > 
> > And slice manipulation like this (this is a 'pop'): 
> > 
> >  MOVQ r13,     [r10+8]       // Load the length of the slice.
> >  DECQ r13                    // Decrement the len (I can guarantee this will never underflow).
> >  MOVQ r12,     [r10]         // Load the 0th element address.
> >  LEAQ r12,     [r12 + r13*8] // Compute the address of the last element.
> >  MOVQ reg,     [r12]         // Load the element to reg.
> >  MOVQ [r10+8], r13           // Write the len back.
> > 
> > or 'push' like this (note: cap is always large enough for any pushes) ...
> > 
> >  MOVQ r12,     [r10]          // Load the 0th element address.
> >  MOVQ r13,     [r10+8]        // Load the len.
> >  LEAQ r12,     [r12 + r13*8]  // Compute the address of the last element + 1.
> >  INCQ r13                     // Increment the len.
> >  MOVQ [r10+8], r13            // Save the len.
> >  MOVQ [r12],   reg            // Write the new element.
> > 
> > 
> > I acknowledge that calling into code like this is unsupported, but I struggle
> > to understand how such corruption can happen, and having stared at it for a
> > few days, I am frankly stumped. I mean, even if non-cooperative preemption
> > were in these versions of Go, I would expect the GC to abort when it can't
> > find the stack maps for my RIP value. With no GC safe points in my native
> > assembly, I don't see how the GC could interfere (yet the issue disappears
> > with the GC off??).
> > 
> > Questions: 
> > 
> > Any ideas what I'm doing wrong?
> > Any ideas how I can trace this from the application side and also the runtime
> > side? I've tried schedtrace and the like, but the output didn't appear useful
> > or correlated to the crashes.
> > Any suggestions for assumptions I might have missed and should write tests /
> > guards for?
>
> See whether it helps to add runtime.KeepAlive calls for the slices and 
> any other pointers that you pass to the assembly code.  If that fixes 
> the problem, then it's a liveness problem. 
>
> Ian 
>
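
For anyone following the thread, here is a minimal sketch of where the suggested runtime.KeepAlive calls would sit. It is not the code from this thread: the assembly routine is replaced by a hypothetical pure-Go stand-in (fakeJitcall) that performs the 'push' described above, so the example runs on its own, and the wrapper name (run) and both signatures are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// fakeJitcall is a hypothetical pure-Go stand-in for the assembly jitcall
// routine. It mimics the 'push' from the thread: store the value at index
// len, then grow len by one. Like the original, it assumes cap is large
// enough and will panic if it is not.
func fakeJitcall(stack, locals *[]uint64, v uint64) {
	s := *stack
	n := len(s)
	s = s[:n+1] // grow len within the existing capacity
	s[n] = v    // write the new element
	*stack = s  // write the updated slice header back
}

// run is a hypothetical wrapper showing where the KeepAlive calls would go.
func run(stack, locals *[]uint64, v uint64) {
	fakeJitcall(stack, locals, v)

	// Mark both pointers as reachable at least until this point, so the
	// slice headers and their backing arrays are not treated as dead
	// while the callee may still be using them.
	runtime.KeepAlive(stack)
	runtime.KeepAlive(locals)
}

func main() {
	stack := make([]uint64, 0, 8)
	locals := make([]uint64, 4)

	run(&stack, &locals, 42)
	fmt.Println(len(stack), stack[0])
}
```

If adding the calls makes the corruption go away, that points at a liveness problem, as described above. If it does not, liveness of these particular pointers is probably not the cause.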
