On Tue, Apr 30, 2019 at 6:33 PM vaastav anand <vaastav.anan...@gmail.com>
wrote:

> I have encountered a SIGBUS with go before but I was hacking inside the
> runtime and using shared mem with mmap.
>
> goroutines are assigned IDs incrementally and each goroutine at bare
> minimum has 2.1KB stack space in go1.11 down from 2.7KB in go1.10 if I
> recall correctly. So, at the very least at that point you could have easily
> burnt through at least 7.5GB of memory. I am not sure what could happen if
> you somehow exceed the amount of memory available. Seems like that is a
> test you could write and see if launching more goroutines than that could
> fit in the size of memory could actually cause a SIGBUS.
>

The stack trace only listed 282 goroutines, which seems about right
considering the number of clients that are connected. Its about 3
goroutines per client connection, plus the other stuff in the server. I
think it just indicates that I have turned over a lot of client connections
over time.


>
> On Monday, 29 April 2019 23:25:52 UTC-7, Justin Israel wrote:
>>
>>
>>
>> On Tue, Apr 30, 2019 at 6:09 PM vaastav anand <vaastav...@gmail.com>
>> wrote:
>>
>>> Ok, so in the 2nd piece of code you posted, is some request being pushed
>>> onto some OS queue? If so, is it possible that you may be maxing the queue
>>> out and then pushing something else into it and that could cause a SIGBUS
>>> as well.... This seems super farfetched tho but it is hard to debug without
>>> really knowing what the application might really be doing.
>>>
>>
>> I want to say that I really appreciate you taking the time to try and
>> give me some possible ideas, even though this is a really vague problem. I
>> had only hoped someone had encountered something similar.
>>
>> So that line in the SIGBUS crash is just trying to add a subscription to
>> a message topic callback in the nats client connection:
>> https://godoc.org/github.com/nats-io/go-nats#Conn.Subscribe
>> It's pretty high level logic at my application level.
>>
>> One thing that stood out to me was that in the crash, the goroutine id
>> number was 3538668. I had to double check to confirm that the go runtime
>> just uses an insignificant increasing number. I guess it does indicate that
>> the application turned over > 3 mil goroutines by that point. I'm wondering
>> if this is caused by something in the gnatsd embedded server (
>> https://github.com/nats-io/gnatsd/tree/master/server) since most the
>> goroutines do come from that, with all the client handling going on. If we
>> are talking about something that is managing very large queues, that would
>> be the one doing so in this application.
>>
>>
>>>
>>> On Monday, 29 April 2019 22:57:40 UTC-7, Justin Israel wrote:
>>>>
>>>>
>>>>
>>>> On Tue, Apr 30, 2019 at 5:43 PM vaastav anand <vaastav...@gmail.com>
>>>> wrote:
>>>>
>>>>> I'd be very surprised if the anonymous goroutine is the reason behind
>>>>> a SIGBUS violation.
>>>>> So, if I remember SIGBUS correctly, it means that you are issuing a
>>>>> read/write to a memory address which is not really addressable or it is
>>>>> misaligned. I think the chances of the address being misaligned are very
>>>>> low.....so it really has to be a non-existent address.
>>>>> It can happen if you have try to access memory outside the region
>>>>> mmaped into your application.
>>>>> If your application has any kind of mmap or shared memory access, I
>>>>> would start there.
>>>>> In any case your best bet is to somehow reproduce the bug consistently
>>>>> and then figure out which memory access is causing the fault.
>>>>>
>>>>
>>>> My application isn't doing anything with mmap or shared memory, and my
>>>> direct and indirect dependencies don't seem to be anything like that
>>>> either. Its limited to pretty much nats.io client, gnatds embedded
>>>> server, and a thrift rpc.
>>>>
>>>> It seems so random that I doubt I could get a reproducible crash. So I
>>>> can really only try testing this on go 1.11 instead to see if any of the GC
>>>> work in 1.12 causes this.
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Monday, 29 April 2019 21:59:34 UTC-7, Justin Israel wrote:
>>>>>>
>>>>>>
>>>>>> On Thursday, November 29, 2018 at 6:22:56 PM UTC+13, Justin Israel
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 29, 2018 at 6:20 PM Justin Israel <justin...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Thu, Nov 29, 2018 at 5:32 PM Ian Lance Taylor <ia...@golang.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Nov 28, 2018 at 7:18 PM Justin Israel <justin...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > I've got a service that I have been testing quite a lot over the
>>>>>>>>> last few days. Only after I handed it off for some testing to a 
>>>>>>>>> colleague,
>>>>>>>>> was he able to produce a SIGBUS panic that I had not seen before:
>>>>>>>>> >
>>>>>>>>> > go 1.11.2 linux/amd64
>>>>>>>>> >
>>>>>>>>> > The service does set up its own SIGINT/SIGTERM handling via the
>>>>>>>>> typical siginal.Notify approach. The nature of the program is that it
>>>>>>>>> listens on nats.io message queues, and receives requests to run
>>>>>>>>> tasks as sub-processes. My tests have been running between 40-200 of 
>>>>>>>>> these
>>>>>>>>> instances over the course of a few days. But this panic occurred on a
>>>>>>>>> completely different machine that those I had been testing...
>>>>>>>>> >
>>>>>>>>> > goroutine 1121 [runnable (scan)]:
>>>>>>>>> > fatal error: unexpected signal during runtime execution
>>>>>>>>> > panic during panic
>>>>>>>>> > [signal SIGBUS: bus error code=0x2 addr=0xfa2adc pc=0x451637]
>>>>>>>>> >
>>>>>>>>> > runtime stack:
>>>>>>>>> > runtime.throw(0xcf7fe3, 0x2a)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:608 +0x72
>>>>>>>>> > runtime.sigpanic()
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/signal_unix.go:374 +0x2f2
>>>>>>>>> > runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff,
>>>>>>>>> 0x0, 0xc0004baa80, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:190 +0x377
>>>>>>>>> > runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0,
>>>>>>>>> 0xc0004baa80, 0x0)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:728 +0xf3
>>>>>>>>> > runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0,
>>>>>>>>> 0xc0004baa80)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:682 +0x52
>>>>>>>>> > runtime.tracebackothers(0xc00012e780)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:947 +0x187
>>>>>>>>> > runtime.dopanic_m(0xc00012e780, 0x42dcc2, 0x7f83f6ffc808, 0x1)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:805 +0x2aa
>>>>>>>>> > runtime.fatalthrow.func1()
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:663 +0x5f
>>>>>>>>> > runtime.fatalthrow()
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:660 +0x57
>>>>>>>>> > runtime.throw(0xcf7fe3, 0x2a)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/panic.go:608 +0x72
>>>>>>>>> > runtime.sigpanic()
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/signal_unix.go:374 +0x2f2
>>>>>>>>> > runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff,
>>>>>>>>> 0x0, 0xc0004baa80, 0x0, 0x0, 0x7fffffff, 0x7f83f6ffcd00, 0x0, 0x0, 
>>>>>>>>> ...)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/traceback.go:190 +0x377
>>>>>>>>> > runtime.scanstack(0xc0004baa80, 0xc000031270)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:786 +0x15a
>>>>>>>>> > runtime.scang(0xc0004baa80, 0xc000031270)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/proc.go:947 +0x218
>>>>>>>>> > runtime.markroot.func1()
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:264 +0x6d
>>>>>>>>> > runtime.markroot(0xc000031270, 0xc000000047)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:245 +0x309
>>>>>>>>> > runtime.gcDrain(0xc000031270, 0x6)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgcmark.go:882 +0x117
>>>>>>>>> > runtime.gcBgMarkWorker.func2()
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/mgc.go:1858 +0x13f
>>>>>>>>> > runtime.systemstack(0x7f83f7ffeb90)
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/asm_amd64.s:351 +0x66
>>>>>>>>> > runtime.mstart()
>>>>>>>>> >         /vol/apps/go/1.11.2/src/runtime/proc.go:1229
>>>>>>>>> >
>>>>>>>>> > Much appreciated for any insight.
>>>>>>>>>
>>>>>>>>> Is the problem repeatable?
>>>>>>>>>
>>>>>>>>> It looks like it crashed while tracing back the stack during
>>>>>>>>> garbage
>>>>>>>>> collection, but I don't know why since the panic was evidently
>>>>>>>>> able to
>>>>>>>>> trace back the stack just fine.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for the reply. Unfortunately it was rare and never happened
>>>>>>>> in my own testing of thousands of runs of this service. The colleague 
>>>>>>>> that
>>>>>>>> saw this crash on one of his workstations was not able to repro it 
>>>>>>>> after
>>>>>>>> attempting another run of the workflow. I wasn't really sure how to 
>>>>>>>> debug
>>>>>>>> this particular crash since it was in the gc and I have seen a "panic
>>>>>>>> during panic" before. Thought it might jump out at someone.
>>>>>>>>
>>>>>>>
>>>>>>> Oops. I meant that I *haven't* seen a "panic during panic" before
>>>>>>> :-)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Ian
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> This is a follow up to the issue of seeing a SIGBUS in my
>>>>>> application. While I still don't have a way to reproduce the problem, I
>>>>>> have received reports from my users of another similar SIGBUS:
>>>>>>
>>>>>> unexpected fault address 0x7fdf50
>>>>>> fatal error: fault
>>>>>> [signal 0xb code=0x2 addr=0x7fdf50 pc=0x7fdf50]
>>>>>>
>>>>>> runtime.throw(0xad7840, 0x5)
>>>>>>         /s/go/1.12.1/src/runtime/panic.go:617 +0x72 fp=0xc000f75aa8
>>>>>> sp=0xc000f75a78 pc=0x444a5e
>>>>>> runtime.sigpanic()
>>>>>>         /s/go/1.12.1/src/runtime/sigpanic_unix.go:387 +0x47e
>>>>>> fp=0xc000f75ad8 sp=0xc000f75aa8 pc=0x444a5e
>>>>>>
>>>>>> project.com/project/obj.(*Server).newPushHandler.func1.1.1(0xc0008ea330,
>>>>>> 0x25, 0x0)
>>>>>>
>>>>>> This is an anonymous inline function closure that was passed to a
>>>>>> nats.io client topic subscription. If I am reading this correctly,
>>>>>> it seems the address to the anonymous function is suddenly invalid?
>>>>>>
>>>>>> ie.
>>>>>>
>>>>>> go func() {
>>>>>>     ...
>>>>>>     someChan := make(chan bool, 1)
>>>>>>     natsConn.Subscribe(topic, func(_ string, typ Type) {
>>>>>>         ...
>>>>>>         someChan <- true
>>>>>>     })
>>>>>>     ...
>>>>>> }()
>>>>>>
>>>>>> Could I be triggering a bug based on this anonymous function closure
>>>>>> in the goroutine? I can try defining things outside the goroutine,
>>>>>> including the function. But honestly without this being a reliable crash 
>>>>>> I
>>>>>> would be fumbling in the dark.
>>>>>>
>>>>>> Justin
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "golang-nuts" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/golang-nuts/5tIkzXWCK0k/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> golan...@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "golang-nuts" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/golang-nuts/5tIkzXWCK0k/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> golan...@googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>> --
> You received this message because you are subscribed to a topic in the
> Google Groups "golang-nuts" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/golang-nuts/5tIkzXWCK0k/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to