It sounds like a paradox. Did adding another goroutine really make my testing/synctest-based network simulation fully deterministic, and suitable for DST?
Yep. On the fifth rewrite, I finally discovered the fundamental way to leverage the testing/synctest package and get a fully deterministic network simulation.

The trick, now implemented and available in the latest release of my network package and its simulation engine (https://github.com/glycerine/rpc25519 and https://github.com/glycerine/gosimnet), is to use one additional goroutine to accept and queue all channel operations "in the background".

Don't try to interleave synctest.Wait with select and channel operations on the same goroutine. It's too much of a mess. More importantly, it didn't work. It was incredibly hard to get determinism out of it. I tried four different ways that did not work. They would look like they were going to work, but then under load testing I would get straggling requests that missed their previous batch. This created non-determinism, aka a non-reproducible simulation. That's not good. We want the determinism of DST so that any bug we find in our distributed system is instantly reproducible. If DST is a new idea to you, this is a great motivating conversation[1].

Instead of mixing client requests over channels with sleep/synctest.Wait logic directly, what you want to do is: buffer all client goroutine channel requests into a master event queue (MEQ) on a separate goroutine that runs completely independently of the main scheduler goroutine (the one that will sleep and call synctest.Wait). Let that background accumulator goroutine be the one with your big for/select loop to service client requests. Those requests that used to go directly to the scheduler goroutine now all get queued, and then handled in one batch once the scheduling time quantum ends.

The scheduler simply sleeps for its time quantum, invokes the barrier synctest.Wait(), then locks the MEQ and reads out the accumulated events, and then unlocks the MEQ so the background goroutine will have access again when the scheduler restarts the clock (with its next sleep).
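
Here is a minimal sketch of that arrangement. It is not the actual rpc25519/gosimnet code: the names event, meq, accumulate, runQuantum and the idle channel are all hypothetical, and so is the sort key (client id, then sequence number). So that it runs as an ordinary program outside a synctest bubble, the barrier is passed in as a function. Inside a bubble it would be synctest.Wait; here main substitutes a WaitGroup plus a handshake with the accumulator, which is only a stand-in.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// event is a hypothetical client request; the fields are illustrative only.
type event struct {
	client int // id of the client goroutine that sent it
	seq    int // per-client sequence number
}

// meq is the master event queue shared by the accumulator and the scheduler.
type meq struct {
	mu     sync.Mutex
	events []event
}

// accumulate runs on the background goroutine. It owns the for/select loop:
// it receives client requests and queues them, and never dispatches anything.
func (q *meq) accumulate(in <-chan event, idle <-chan struct{}) {
	for {
		select {
		case ev := <-in:
			q.mu.Lock()
			q.events = append(q.events, ev)
			q.mu.Unlock()
		case <-idle:
			// Handshake used only by the stand-in barrier in runDemo.
		}
	}
}

// runQuantum is one scheduler step: sleep for the time quantum, call the
// barrier, then lock the MEQ, take the whole batch, and unlock.
// Inside a testing/synctest bubble the barrier would be synctest.Wait.
func (q *meq) runQuantum(quantum time.Duration, barrier func()) []event {
	time.Sleep(quantum)
	barrier()
	q.mu.Lock()
	batch := q.events
	q.events = nil
	q.mu.Unlock()
	// Deterministic order: client id, then sequence number (an assumed key).
	sort.Slice(batch, func(i, j int) bool {
		if batch[i].client != batch[j].client {
			return batch[i].client < batch[j].client
		}
		return batch[i].seq < batch[j].seq
	})
	return batch
}

// runDemo starts three clients that each send two events, then runs one quantum.
func runDemo() []event {
	q := &meq{}
	in := make(chan event) // unbuffered: a send completes only when the accumulator receives it
	idle := make(chan struct{})
	go q.accumulate(in, idle)

	var clients sync.WaitGroup
	for c := 0; c < 3; c++ {
		clients.Add(1)
		go func(c int) {
			defer clients.Done()
			for s := 0; s < 2; s++ {
				in <- event{client: c, seq: s}
			}
		}(c)
	}

	// Stand-in for synctest.Wait: once every send has completed, the idle
	// handshake can succeed only after the accumulator has appended the last
	// event and returned to its select.
	barrier := func() {
		clients.Wait()
		idle <- struct{}{}
	}
	return q.runQuantum(time.Millisecond, barrier)
}

func main() {
	fmt.Println(runDemo())
}
```

The clients race each other to the channel, so the arrival order in the MEQ varies from run to run, but the batch that comes out of runQuantum is the same six events in the same order every time.
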
The scheduler sorts the accumulated batch of events using deterministic sorting criteria, dispatches them (matching sends and reads and firing timers in the network), and then deterministically orders any newly available replies.

And voila: deterministic simulation testing (DST) of network operations in Go.

Enjoy.

Jason

[1] https://www.youtube.com/watch?v=C1nZzQqcPZw&t=936s "FoundationDB: from idea to Apple acquisition". Dave Scherer, CTO of FoundationDB and Antithesis, really motivates why they invented DST. In short, it's crazy difficult to test distributed systems well in any other way.
