:have you put a SFENCE between write A and write B?  You never tell us
:where you've tried to put the various fence instructions...
:
:--
:  John-Mark Gurney                          Voice: +1 415 225 5579

    No, I haven't tried doing that because both the AMD and Intel manuals
    make it very clear that writes are ordered.

    This is the code on the read side:

        wi = ip->ip_windex;                 <<<<<<< READ B (INDEX)
        while ((ri = ip->ip_rindex) != wi) {
            ip->ip_rindex = ri + 1;
            ri &= MAXCPUFIFO_MASK;
            ip->ip_func[ri](ip->ip_arg[ri], frame);
            ip->ip_xindex = ip->ip_rindex;
        }

    ip_func is lwkt_putport_remote which is basically:

        static void
        lwkt_putport_remote(lwkt_msg_t msg)
        {
            lwkt_port_t port = msg->ms_target_port;     <<<<<< READ A
            thread_t td = port->mp_td;

            TAILQ_INSERT_TAIL(&port->mp_msgq, msg, ms_node);
            [ CRASH ON BAD 'PORT' VARIABLE ]
            if (port->mp_flags & MSGPORTF_WAITING)
                lwkt_schedule(td);
        }

    When the crash occurs, the data load of A is bad data... the contents
    of that field in the msg structure BEFORE the other cpu had written it
    rather than after.  It is looking at the correct message structure: it
    happened to be in a register when it crashed and it matches the message
    structure that was transmitted.  The contents of the field in the
    message structure post-crash were *CORRECT*.

    There are about 16 instructions between the READ B where the code
    sees the updated index and the READ A where the code reads the bad data.

    ------------

    On the sending side we have this:

        int
        lwkt_default_putport(lwkt_port_t port, lwkt_msg_t msg)
        {
            crit_enter();
            msg->ms_flags |= MSGF_QUEUED;   /* abort interlock */
            msg->ms_flags &= ~MSGF_DONE;
            msg->ms_target_port = port;     <<<<<<<<<<< WRITE A
            _lwkt_putport(port, msg, 0);
            crit_exit();
            return(EASYNC);
        }

    [ inline that default_putport calls obviously comes before, putting it
      after so the code flow is more obvious ]

        static __inline void
        _lwkt_putport(lwkt_port_t port, lwkt_msg_t msg, int force)
        {
            thread_t td = port->mp_td;

            if (force || td->td_gd == mycpu) {
                TAILQ_INSERT_TAIL(&port->mp_msgq, msg, ms_node);
                if (port->mp_flags & MSGPORTF_WAITING)
                    lwkt_schedule(td);
            } else {
                lwkt_send_ipiq(td->td_gd, (ipifunc_t)lwkt_putport_remote, msg);
            }
        }

        lwkt_send_ipiq( ... )
        {
            ...
            [ about 7-8 lines of executed C code ]
            ...
            /*
             * Queue the new message
             */
            windex = ip->ip_windex & MAXCPUFIFO_MASK;
            ip->ip_func[windex] = (ipifunc2_t)func;
            ip->ip_arg[windex] = arg;
            ++ip->ip_windex;                <<<<<<<<<<< WRITE B (INDEX)
            --gd->gd_intr_nesting_level;
            ...
        }

    That is about 30 instructions between the writing of A and the
    writing of B.  It seems very unlikely that the writes got misordered
    on the sending side.

    But on the receiving side there are ~16 instructions between the
    read B and the read A.  This seemed very unlikely to me too but I have
    not been able to come to any other conclusion.  During tests when we
    added 'too much' debug code to the READ side the problem went away.
    When we added debug code to the WRITE side the problem seemed to stay put.

    The original crash was reported on a system with 4 processor boards
    (8 logical cpus).  The user pulled 3 boards out so there was one
    processor board and 2 logical cpus and the problem still occurred.  It
    seems so unlikely that this could occur across physical cpus that I was
    not surprised at all by this.  But 16 instructions seemed unlikely to me.

    The only scenario I can come up with is that the READ SIDE on the HT cpu
    (logical cpu #1) did a speculative read of B before logical cpu #0 wrote
    to it, then somehow held that speculative read for 16 whole instructions
    on logical cpu #1.  Is that even possible?  Holding speculative read
    data across 16 instructions?

    The only other possibility is that there are major interactions in
    the instruction pipeline and cpu #1 is reading e.g. the index B from
    the pipeline or write buffer and data A from memory prior to data A
    being retired to memory by cpu #0.  That seems ridiculous to me, but I
    wonder if it's possible without an SFENCE.

    This crash occurs fairly rarely.  It takes a lot of packets for it to
    occur... perhaps a million or more.

    In any case, we are now testing a kernel with a locked bus cycle
    in between the READ B and the READ A to see if that fixes the problem.
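
    To make the ordering requirement concrete, here is a minimal user-space
    sketch of the same "index publishes the data" pattern, written with C11
    atomics rather than the kernel's actual primitives.  Everything in it
    (struct fifo, fifo_push, fifo_pop, FIFO_SIZE) is a made-up name for
    illustration and is not DragonFly code.  The release store marks the
    spot where a fence between WRITE A and WRITE B would go, and the acquire
    load marks the spot where a barrier between READ B and READ A would go.
    On x86 a compiler will normally emit plain loads and stores for both, so
    this mostly documents the intended ordering and stops the compiler from
    reordering; it is not a claim about what fixes the crash.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FIFO_SIZE 16                    /* hypothetical; power of two */
#define FIFO_MASK (FIFO_SIZE - 1)

struct msg {
    void *target_port;                  /* stands in for ms_target_port */
};

struct fifo {
    struct msg *arg[FIFO_SIZE];         /* stands in for ip_arg[] */
    atomic_uint windex;                 /* written by the sender only */
    atomic_uint rindex;                 /* written by the receiver only */
};

static void fifo_init(struct fifo *f)
{
    atomic_init(&f->windex, 0);
    atomic_init(&f->rindex, 0);
}

/* Sender: returns 1 on success, 0 if the fifo is full. */
static int fifo_push(struct fifo *f, struct msg *m, void *port)
{
    unsigned wi, ri;

    assert(m != NULL);
    wi = atomic_load_explicit(&f->windex, memory_order_relaxed);
    ri = atomic_load_explicit(&f->rindex, memory_order_acquire);
    if (wi - ri == FIFO_SIZE)
        return 0;

    m->target_port = port;              /* WRITE A */
    f->arg[wi & FIFO_MASK] = m;
    /* Release: WRITE A and the slot store are ordered before WRITE B. */
    atomic_store_explicit(&f->windex, wi + 1, memory_order_release);
    return 1;
}

/* Receiver: returns the next message, or NULL if the fifo is empty. */
static struct msg *fifo_pop(struct fifo *f)
{
    unsigned ri, wi;
    struct msg *m;

    ri = atomic_load_explicit(&f->rindex, memory_order_relaxed);
    /* Acquire: READ B is ordered before the slot load and READ A. */
    wi = atomic_load_explicit(&f->windex, memory_order_acquire);
    if (ri == wi)
        return NULL;

    m = f->arg[ri & FIFO_MASK];
    /* Release: the slot load is done before the sender may reuse it. */
    atomic_store_explicit(&f->rindex, ri + 1, memory_order_release);
    return m;                           /* caller's m->target_port is READ A */
}
```

    A receiver would call fifo_pop() in a loop and only then dereference
    m->target_port, which corresponds to what lwkt_putport_remote does with
    msg->ms_target_port.
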
    If that doesn't work I will put an SFENCE between the WRITE A and the
    WRITE B.  And if that doesn't work then I'm shooting up the wrong
    alley and it isn't an instruction/memory ordering issue.

                                        -Matt
                                        Matthew Dillon
                                        <[EMAIL PROTECTED]>
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"