I managed to resolve the issue by allocating in non-GC memory, using persistentalloc. Should anyone want to look into it in detail, here is the commit <https://gitlab.com/AbelThar/go.batch/commit/ed813d3728781e97999013503ef81b5e6f977e9d>
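For anyone curious, the change boils down to something like the sketch below. This is not the exact commit: it assumes the Go 1.12-era runtime helper persistentalloc(size, align uintptr, sysStat *uint64) unsafe.Pointer, and the allb/acquirem/releasem names from the quoted message further down; the alignment and stat arguments are my assumptions.

    // In package runtime (sketch only). persistentalloc returns zeroed
    // memory from outside the GC heap, so the batch is never scanned or
    // freed by the collector.
    //go:nosplit
    func allocb() *b {
        mp := acquirem()

        bp := (*b)(persistentalloc(unsafe.Sizeof(b{}), sys.PtrSize, &memstats.other_sys))
        allb = append(allb, bp)

        releasem(mp)
        return bp
    }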
On Monday, 8 April 2019 20:50:48 UTC+2, Tharen Abela wrote:
>
> I am keeping an `allb` slice, and with that I did see it occasionally
> succeed.
>
> I am using the binarytree test
> <https://gitlab.com/AbelThar/go.batch/blob/b10ef431c29b01fa7568a7bf9712a0286033266f/batching/src/runnables/binarytree.go>,
> since the issue involves the GC. In fact, running it with GOGC=off, or
> keeping a slice of pointers in the program itself, does consistently
> succeed as well.
>
> What I do know is that when I allocate a batch, I keep a raw pointer to it
> in the slice, and it is never popped or removed from there at any point.
> Each P keeps its own batch as a uintptr, and the others are stored in
> either a global batch queue or a queue of empty batches (the same as
> gQueue, but holding the batch type via a buintptr): all of which are
> invisible to the GC.
>
> Now I modified the batch allocation to print the pointer of `allb` and the
> newly allocated batch:
>
>     // Allocate a new batch
>     //go:nosplit
>     //go:yeswritebarrierrec
>     func allocb() *b {
>         // Break the cycle by doing acquirem/releasem around new(b).
>         // The acquirem/releasem increments m.locks during new(b),
>         // which keeps the garbage collector from being invoked.
>         mp := acquirem()
>
>         var bp *b
>
>         bp = new(b)
>         allb = append(allb, bp)
>         print("allb: ", allb, ", bp:", bp, "\n")
>
>         releasem(mp)
>         return bp
>     }
>
> With GOGC=off, 6 batches are created:
>
>     GOGC=off GODEBUG=gccheckmark=1 gobatch run ./binarytree.go
>     allb: [1/1]0xc000010010, bp:0xc0000160f0
>     allb: [2/2]0xc000012010, bp:0xc000016100
>     allb: [3/4]0xc00000e020, bp:0xc000016110
>     allb: [4/4]0xc00000e020, bp:0xc000016120
>     allb: [5/8]0xc000062000, bp:0xc000060000
>     allb: [6/8]0xc000062000, bp:0xc000060010
>
> When it does succeed with the GC on, it consistently takes 13 batches,
> which I find rather odd.
>     GODEBUG=gccheckmark=1 gobatch run ./binarytree.go
>     allb: [1/1]0xc000010010, bp:0xc0000160f0
>     allb: [2/2]0xc000012010, bp:0xc000016100
>     allb: [3/4]0xc00000e020, bp:0xc000016110
>     allb: [4/4]0xc00000e020, bp:0xc000016120
>     allb: [5/8]0xc000062000, bp:0xc000060000
>     allb: [6/8]0xc000062000, bp:0xc000060010
>     allb: [7/8]0xc000062000, bp:0xc000016150
>     allb: [8/8]0xc000062000, bp:0xc0004b8000
>     allb: [9/16]0xc000510000, bp:0xc00044a040
>     allb: [10/16]0xc000510000, bp:0xc000514000
>     allb: [11/16]0xc000510000, bp:0xc000540000
>     allb: [12/16]0xc000510000, bp:0xc0004b8010
>     allb: [13/16]0xc000510000, bp:0xc0004b8020
>
> When it crashes, it prints the following (full stack trace on pastebin:
> <https://pastebin.com/40iYNQrh>):
>
>     GODEBUG=gccheckmark=1 gobatch run ./binarytree.go
>     allb: [1/1]0xc000010010, bp:0xc0000160f0
>     allb: [2/2]0xc000012010, bp:0xc000016100
>     allb: [3/4]0xc00000e020, bp:0xc000016110
>     allb: [4/4]0xc00000e020, bp:0xc000016120
>     allb: [5/8]0xc00006a000, bp:0xc000068000
>     allb: [6/8]0xc00006a000, bp:0xc000068010
>     allb: [7/8]0xc00006a000, bp:0xc000016170
>     allb: [8/8]0xc00006a000, bp:0xc0004b4000
>     allb: [9/16]0xc0004ea000, bp:0xc0004b4010
>     allb: [10/16]0xc0004ea000, bp:0xc0004ee000
>     allb: [11/16]0xc0004ea000, bp:0xc000448040
>     runtime: marking free object 0xc000448040 found at *(0xc0004ea000+0x50)
>     base=0xc0004ea000 s.base()=0xc0004ea000 s.limit=0xc0004ec000 s.spanclass=18 s.elemsize=128 s.state=mSpanInUse
>     *(base+0) = 0xc0000160f0
>     *(base+8) = 0xc000016100
>     *(base+16) = 0xc000016110
>     *(base+24) = 0xc000016120
>     *(base+32) = 0xc000068000
>     *(base+40) = 0xc000068010
>     *(base+48) = 0xc000016170
>     *(base+56) = 0xc0004b4000
>     *(base+64) = 0xc0004b4010
>     *(base+72) = 0xc0004ee000
>     *(base+80) = 0xc000448040 <==
>     *(base+88) = 0x0
>     *(base+96) = 0x0
>     *(base+104) = 0x0
>     *(base+112) = 0x0
>     *(base+120) = 0x0
>     obj=0xc000448040 s.base()=0xc000448000 s.limit=0xc00044a000 s.spanclass=5 s.elemsize=16 s.state=mSpanInUse
>     *(obj+0) = 0x0
>     *(obj+8) = 0xc0004ee000
>     fatal error: marking free object
>
> At this point I assume the damage has already been done earlier, and the
> trace only shows where it was finally detected.
>
> What I do notice is that the error is not detected at the same tree depth
> every time. For the full stack trace, it was when the depth was 3:
>
>     goroutine 1 [runnable]:
>     runtime.newobject(0x464820, 0x2)
>         /go.batch/src/runtime/malloc.go:1067 +0x51 fp=0xc000084b20 sp=0xc000084b18 pc=0x40a701
>     main.bottomUpTree(0xffffffffffffbefb, 0x3, 0xc0008bb840)
>         /go.batch/batching/src/runnables/binarytree.go:33 +0x91 fp=0xc000084b60 sp=0xc000084b20 pc=0x44f011
>
> ...and in another stack trace <https://pastebin.com/AukWCxAe>, it occurred
> when the depth was 1.
>
> I ran it a few more times, and it always seems to crash after batch 11,
> with the depth ranging from 0 to 3.
>
> The stack trace for depth 0 <https://pastebin.com/ZMG02MM3> of goroutine 1
> starts out more interestingly, in that it did not trigger at `newobject`:
>     goroutine 1 [GC assist marking]:
>     runtime.systemstack_switch()
>         /go.batch/src/runtime/asm_amd64.s:311 fp=0xc000086930 sp=0xc000086928 pc=0x446d30
>     runtime.gcAssistAlloc(0xc000000180)
>         /go.batch/src/runtime/mgcmark.go:422 +0x15c fp=0xc000086990 sp=0xc000086930 pc=0x416e5c
>     runtime.mallocgc(0x18, 0x464820, 0x1, 0x18)
>         /go.batch/src/runtime/malloc.go:843 +0x8e6 fp=0xc000086a30 sp=0xc000086990 pc=0x40a456
>     runtime.newobject(0x464820, 0xc00045e000)
>         /go.batch/src/runtime/malloc.go:1068 +0x38 fp=0xc000086a60 sp=0xc000086a30 pc=0x40a6e8
>     main.bottomUpTree(0xfffffffffffdec85, 0x0, 0x20)
>         /go.batch/batching/src/runnables/binarytree.go:29 +0xfc fp=0xc000086aa0 sp=0xc000086a60 pc=0x44f07c
>
> I think at this point I may be overthinking it a bit, and my lack of
> experience is showing.
> If there is something else I should be looking into, I am open to ideas.
>
> On Monday, 8 April 2019 19:42:56 UTC+2, Ian Lance Taylor wrote:
>>
>> On Sun, Apr 7, 2019 at 12:30 PM Tharen Abela <abela...@gmail.com> wrote:
>> >
>> > The gist of the problem is that I am allocating an object in the
>> > runtime (which I refer to as the batch struct), and the GC is
>> > deallocating the object, even though a reference is being kept in a
>> > slice (similar to allp and allm).
>> > While allocating, I call acquirem to prevent the GC from being
>> > triggered, and during that window I append the batch pointer to the
>> > slice.
>> >
>> > From running `GODEBUG=gccheckmark=1` I know that the batch object
>> > allocated was being freed, yet when it crashes it says the object is
>> > being marked (hence "marking free object").
>> >
>> > My intention is to keep each batch allocated until the end of the
>> > program, holding it in an extra batch queue, so it should never be
>> > freed.
>> >
>> > Thinking about it now, I am not sure whether the deallocation occurs
>> > after the program's work is finished and everything is being torn
>> > down, while a reference is still kept in allb, so a double free
>> > occurs, OR, as I have been assuming so far, whether it happens while
>> > work is still incomplete, so the GC is incorrectly deallocating a
>> > batch object that is still in use.
>> >
>> > Another thing to take note of is that the batch in P is referenced by
>> > a uintptr; I'm not sure how that might affect it.
>>
>> That is going to be your problem. The GC only tracks values with live
>> pointers. A value of type `uintptr` can not be a live pointer. The
>> runtime can only get away with the `guintptr`, `muintptr` and
>> `puintptr` types because it knows that there are existing other
>> pointers to all G and P values (in the allgs and allp slices and the
>> allm linked list). If there is ever any moment that your batch
>> objects are only referenced by `uintptr` values and not by a value of
>> pointer type, then the garbage collector can collect it.
>>
>> Ian
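As a side note (not from the thread itself), a batch-flavoured equivalent of the guintptr/puintptr pattern Ian mentions would look roughly like the hypothetical sketch below; the underlying type is a plain uintptr, so the collector neither traces it nor treats it as keeping the batch alive:

    // Hypothetical buintptr, mirroring the runtime's guintptr/puintptr
    // pattern (the name and methods are assumptions, not the actual
    // go.batch code). To the GC this is just an integer.
    type buintptr uintptr

    //go:nosplit
    func (bp buintptr) ptr() *b { return (*b)(unsafe.Pointer(bp)) }

    //go:nosplit
    func (bp *buintptr) set(bb *b) { *bp = buintptr(unsafe.Pointer(bb)) }

Ian's point can also be reproduced outside the runtime with ordinary Go. The standalone sketch below (the names batch, newBatch and held are made up for illustration) keeps only a uintptr to an allocation and uses a finalizer to observe that the object is collected anyway:

    package main

    import (
        "fmt"
        "runtime"
        "time"
        "unsafe"
    )

    type batch struct{ data [128]byte }

    // held stands in for a P field or queue slot that stores the batch
    // only as an integer address.
    var held uintptr

    //go:noinline
    func newBatch() *batch { return new(batch) } // forces a heap allocation

    func main() {
        collected := make(chan struct{})

        b := newBatch()
        runtime.SetFinalizer(b, func(*batch) { close(collected) })

        // From here on, the only remaining reference is a uintptr,
        // which the GC treats as a plain integer.
        held = uintptr(unsafe.Pointer(b))
        b = nil
        _ = b

        runtime.GC()

        select {
        case <-collected:
            fmt.Printf("batch at %#x was collected even though `held` still stores its address\n", held)
        case <-time.After(time.Second):
            fmt.Println("batch was not collected (finalizer did not run in time)")
        }
    }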