Hi Christian,

I'd need to take a deeper look before giving any constructive comment, but your use of auto_ptr could be at fault; this may be a memory allocation problem. As Murphy hinted, you might want to use the facilities available in NOX rather than write everything on your own. For example, messenger already handles binary messages over a TCP connection, and it uses cooperative threads, so that should help.
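Murphy's queue suggestion (quoted below) can be sketched roughly as follows. This is plain C++11 with `std::thread`, not NOX's cooperative-threading API, which the sketch does not attempt to reproduce. `PacketOut`, `SendQueue` and `run_demo` are hypothetical names, and the send callback is only a stand-in for a call such as `send_openflow_packet()`. What it shows is the ownership rule: native threads only enqueue, and exactly one thread ever performs sends, so a connection's transmit buffer is never touched by two threads at once.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the arguments passed to send_openflow_packet():
// datapath id, output port and the raw packet bytes.
struct PacketOut {
    std::uint64_t datapath_id;
    std::uint16_t out_port;
    std::vector<std::uint8_t> data;
};

// Producers (native threads) only call push(); a single worker thread owns
// every send. Assumption: in NOX this worker would have to be a cooperative
// thread; std::thread is used here only to keep the sketch self-contained.
class SendQueue {
public:
    explicit SendQueue(std::function<void(const PacketOut&)> send)
        : send_(std::move(send)), worker_([this] { run(); }) {}

    ~SendQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains anything still queued, then exits
    }

    // Safe to call from any thread.
    void push(PacketOut pkt) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(pkt));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stopping and nothing left to send
            PacketOut pkt = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            send_(pkt);  // the only place a send ever happens
            lock.lock();
        }
    }

    std::function<void(const PacketOut&)> send_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<PacketOut> queue_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};

// Usage sketch: several native threads push packets concurrently; the
// counting lambda stands in for the real send call (hypothetical).
int run_demo(int num_producers, int packets_each) {
    assert(num_producers >= 0 && packets_each >= 0);
    int sent = 0;  // written only by the SendQueue worker thread
    {
        SendQueue q([&sent](const PacketOut&) { ++sent; });
        std::vector<std::thread> producers;
        for (int t = 0; t < num_producers; ++t) {
            producers.emplace_back([&q, t, packets_each] {
                for (int i = 0; i < packets_each; ++i)
                    q.push(PacketOut{7, static_cast<std::uint16_t>(t + 1),
                                     std::vector<std::uint8_t>(78)});
            });
        }
        for (auto& th : producers) th.join();
    }  // ~SendQueue drains the queue and joins the worker
    return sent;  // safe to read: the worker thread has been joined
}
```

Called from a `main()`, `run_demo(4, 100)` should report 400 sends, because the destructor drains the queue before joining the worker. Note that in NOX the consuming side would still need to be a cooperative thread (or something like the messenger component mentioned above); a native consumer thread would reintroduce the same race Murphy describes.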
Regards
KK

On 22 September 2011 22:48, Murphy McCauley <jam...@nau.edu> wrote:
> I think this is a race condition because you're calling
> send_openflow_packet() from a non-cooperative thread.
> I think your best bet is probably to rewrite your server thread as a
> cooperative thread.
> There are other options... you could have a queue which is consumed by a
> simple cooperative thread that does the actual sends from there. Or... I
> can't remember for sure if the cooperative threading library supports
> scheduling one thread group from a native thread, but... when you want to
> send from the native thread, schedule a cooperative thread (and then wait
> for it to run) which just blocks until you're done sending from the native
> thread. Or you could put locks into NOX's async output stuff (I would
> suggest not doing that). There are probably lots of other options too. I'd
> still suggest rewriting your server thread as a cooperative thread if
> possible. :)
>
> Hope that helps.
>
> -- Murphy
>
> On Sep 21, 2011, at 11:06 AM, Christian Esteve Rothenberg wrote:
>
> Dear NOX friends,
>
> we are facing a nasty bug and we would very much appreciate any help
> in debugging and understanding the root cause. We have been
> struggling for some time now... :(
>
> The code base is fairly simple and has worked well for some time,
> but for some reason it has started to crash:
> https://github.com/chesteve/RouteFlow/blob/master/rf-controller/src/nox/netapps/routeflowc/routeflowc.cc
>
> As far as I can tell, the code has remained unchanged and only the datapath
> and application traffic (i.e., payload of packet-in and packet-out packets)
> has changed.
>
> This is the error we are seeing in NOX, a failed assertion:
>
> /usr/include/c++/4.5/backward/auto_ptr.h:194: element_type*
> std::auto_ptr<_Tp>::operator->() const [with _Tp = vigil::Buffer,
> element_type = vigil::Buffer]: Assertion '_M_ptr != 0' failed.
> Caught signal 6.
> 0xb74ae2be 64 (vigil::fault_handler(int)+0x4e)
> 0xb7748400 3068602152 (__kernel_sigreturn+0x0)
> 0xb71dc34e 296 (abort+0x17e)
> 0xb74ecc11 80 (vigil::Openflow_stream_connection::send_tx_buf()+0x121)
> 0xb74ece21 80 (vigil::Openflow_stream_connection::do_send_openflow(ofp_header const*)+0xc1)
> 0xb74ed7cf 80 (vigil::Openflow_connection::call_send_openflow(ofp_header const*)+0x2f)
> 0xb74ee14f 64 (vigil::Openflow_connection::send_openflow(ofp_header const*, bool)+0x5f)
> 0xb74eedae 96 (vigil::Openflow_connection::send_packet(vigil::Buffer const&, ofp_action_header const*, unsigned int, unsigned short, bool)+0xfe)
> 0xb74eeeb9 96 (vigil::Openflow_connection::send_packet(vigil::Buffer const&, unsigned short, unsigned short, bool)+0x79)
> 0xb75f18b4 96 (vigil::nox::send_openflow_packet_out(vigil::datapathid const&, vigil::Buffer const&, unsigned short, unsigned short, bool)+0x74)
> 0xb75ce7cc 48 (vigil::container::Component::send_openflow_packet(vigil::datapathid const&, vigil::Buffer const&, unsigned short, unsigned short, bool) const+0x3c)
>
> Using gdb, the backtrace is as follows:
>
> (gdb) bt
> #0 0xb772c367 in ?? () from /lib/ld-linux.so.2
> #1 0xb772c979 in ?? () from /lib/ld-linux.so.2
> #2 0xb7730a31 in ?? () from /lib/ld-linux.so.2
> #3 0xb7736c40 in ?? () from /lib/ld-linux.so.2
> #4 0xb7487dc2 in fgets () at /usr/include/bits/stdio2.h:255
> #5 read_mem_map () at ../../../src/lib/fault.cc:79
> #6 vigil::dump_backtrace () at ../../../src/lib/fault.cc:180
> #7 0xb74882be in vigil::fault_handler (sig_nr=6) at ../../../src/lib/fault.cc:280
> #8 <signal handler called>
> #9 0xb7722424 in __kernel_vsyscall ()
> #10 0xb71b2e71 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #11 0xb71b634e in abort () at abort.c:92
> #12 0xb74c6c11 in __replacement_assert (this=0x93d7830) at /usr/include/c++/4.5/i686-linux-gnu/bits/c++config.h:326
> #13 operator-> (this=0x93d7830) at /usr/include/c++/4.5/backward/auto_ptr.h:194
> #14 vigil::Openflow_stream_connection::send_tx_buf (this=0x93d7830) at ../../../src/lib/openflow.cc:824
> #15 0xb74c6e21 in vigil::Openflow_stream_connection::do_send_openflow (this=0x93d7830, oh=0x9104c50) at ../../../src/lib/openflow.cc:844
> #16 0xb74c77cf in vigil::Openflow_connection::call_send_openflow (this=0x93d7830, oh=0x9104c50) at ../../../src/lib/openflow.cc:248
> #17 0xb74c814f in vigil::Openflow_connection::send_openflow (this=0x93d7830, oh=0x9104c50, block=true) at ../../../src/lib/openflow.cc:232
> #18 0xb74c8dae in vigil::Openflow_connection::send_packet (this=0x93d7830, packet=..., actions=0xb64a6618, actions_len=8, in_port=65535, block=true) at ../../../src/lib/openflow.cc:445
> #19 0xb74c8eb9 in vigil::Openflow_connection::send_packet (this=0x93d7830, packet=..., out_port=1, in_port=65535, block=true) at ../../../src/lib/openflow.cc:413
> #20 0xb75cb8b4 in vigil::nox::send_openflow_packet_out (datapath_id=..., packet=..., out_port=1, in_port=65535, block=true) at ../../../src/builtin/nox.cc:435
> #21 0xb75a87cc in vigil::container::Component::send_openflow_packet (this=0x92232f8, datapath_id=..., packet=..., out_port=1, in_port=65535, block=true) at ../../../src/builtin/component.cc:83
> #22 0xb64b2876 in process_message (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
> #23 (anonymous namespace)::RouteFlowC::server (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:470
> #24 0xb64b034d in operator() (function_obj_ptr=...) at /usr/include/boost/bind/mem_fn_template.hpp:49
> #25 operator()<boost::_mfi::mf0<void, <unnamed>::RouteFlowC>, boost::_bi::list0> (function_obj_ptr=...) at /usr/include/boost/bind/bind.hpp:253
> #26 operator() (function_obj_ptr=...) at /usr/include/boost/bind/bind_template.hpp:20
> #27 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, <unnamed>::RouteFlowC>, boost::_bi::list1<boost::_bi::value<<unnamed>::RouteFlowC*> > >, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:153
> #28 0xb74f9f35 in operator() (thread_=0x9352cf0) at /usr/include/boost/function/function_template.hpp:1013
> #29 vigil::thread_main (thread_=0x9352cf0) at ../../../src/lib/threads/impl.cc:1359
> #30 0xb7174e99 in start_thread (arg=0xb64a8b70) at pthread_create.c:304
> #31 0xb725873e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
>
> (gdb) f 24
> #24 0xb64b2876 in process_message (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
> 195 (Buffer &) buff, pack_msg.port_out, OFPP_NONE, true) )
> (gdb) display buff
> 12: buff = {<vigil::Buffer> = {_vptr.Buffer = 0xb755ba98, m_data = 0x939a980 "\001", m_size = 78}, base = 0x939a980 "\001", capacity = 78}
>
> Inspecting frame 22, which seems to be the starting point of the issue
> according to the NOX log, the variable values seem OK:
>
> (gdb) frame 22
> #22 0xb64b2876 in process_message (this=0x92232f8) at ../../../../../src/nox/netapps/routeflowc/routeflowc.cc:195
> 195 (Buffer &) buff, pack_msg.port_out, OFPP_NONE, true) )
> (gdb) info args
> msg = 0xb64a72e0
> this = 0x92232f8
> (gdb) info locals
> buff = {<vigil::Buffer> = {_vptr.Buffer = 0xb755ba98, m_data = 0x939a980 "\001", m_size = 78}, base = 0x939a980 "\001", capacity = 78}
> pack_msg = {datapath_id = 7, port_out = 1, pkt_id = 2697}
>
> Inspecting frame 14, where the problem arises, we can see that at line 824
> _M_ptr = 0x0, which causes the failed assertion.
>
> Does anyone know why this may be happening or how to prevent it?
>
> (gdb) f 14
> #14 vigil::Openflow_stream_connection::send_tx_buf (this=0x93d7830) at ../../../src/lib/openflow.cc:824
> 824 if (!tx_buf->size()) {
> (gdb) info args
> this = 0x93d7830
> (gdb) info locals
> bytes_written = 102
> error = <value optimized out>
> (gdb) print tx_buf
> $9 = {_M_ptr = 0x0}
>
> Thanks in advance for any hint!
>
> Christian
>
> --
> Christian Esteve Rothenberg, Ph.D.
> Converged Networks Division (DRC)
> Tel.:+55 19-3705-4479 / Cel.: +55 19-8193-7087
> est...@cpqd.com.br
> www.cpqd.com.br
>
> --
> Christian
> _______________________________________________
> nox-dev mailing list
> nox-dev@noxrepo.org
> http://noxrepo.org/mailman/listinfo/nox-dev