On 13.11.2023 at 22:57, Johannes Berg wrote:

So maybe for my future self and all the bystanders, I'll try to explain how I see the issue that causes patches 4, 6, 7 and 8 to be needed:
--- snip ---
Hello Johannes,
Sorry for my delayed response to your detailed email. I find it quite hard to discuss such a complex topic via mailing lists without it sounding impolite.

Maybe also as some basis for my reasoning: I'm quite familiar with discrete event-based simulation, with a special focus on SystemC simulation. There are some common (old) principles of DES that map perfectly to the time travel mode, so my mental model has always been shaped around those constraints:

- Every instance/object/element in a DES has some inputs/signals that activate it either at the current moment or at some later point in time.
- Every instance/object/element in a DES will run, starting from its activation time, until it has "finished", without advancing the global simulation time.
- Events (or activations) occurring at the exact same point in simulation time happen in parallel. Therefore, the actual order of execution in a sequential simulator is more or less unimportant (though some implementations may require a deterministic order, ensuring simulations with the same parameters yield the exact same output, e.g., SystemC). The parallel execution of events at the same simulation time is an optimization that may be implemented, but is not always utilized.
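Just to make that concrete, a minimal sketch of such a sequential DES kernel could look like this (all names are made up, this is not taken from any real simulator):

    /* Minimal sketch of a sequential DES kernel; every identifier here is
     * hypothetical and only meant to illustrate the three points above. */
    #include <stdint.h>

    struct event {
        uint64_t time;             /* activation time in simulation time    */
        void (*run)(void *ctx);    /* runs to completion, time stands still */
        void *ctx;
    };

    /* queue_* are assumed helpers of a time-ordered event queue. */
    int queue_empty(void);
    uint64_t queue_peek_time(void);
    struct event *queue_pop(void);

    void des_run(void)
    {
        while (!queue_empty()) {
            uint64_t now = queue_peek_time();

            /* Execute *all* events scheduled for 'now'; their relative
             * order must not matter, which is what would allow running
             * them in parallel as an optimization. */
            while (!queue_empty() && queue_peek_time() == now) {
                struct event *ev = queue_pop();
                ev->run(ev->ctx);  /* may only schedule new events >= now */
            }
            /* Only after 'now' is fully drained may global time advance. */
        }
    }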


After reviewing all your analyses, I believe the most significant difference in my implementation lies in the last point: I did not enforce the order of message processing when messages occur at exactly the same simulation time. Consequently, I modified my implementation to eliminate the synchronous (and, to be honest, quite hacky) read operation with special handling on the timetravel socket. Instead, I implemented a central epoll routine, which is called by my master simulation kernel (NS3). My rationale was that if I haven't received the request from the TT-protocol, I cannot advance time.
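Roughly, the shape of that central routine is something like the following sketch (handle_socket() and all_tt_requests_received() are placeholders, not the actual NS3/TT glue code):

    /* Sketch of the central poll routine the master scheduler (NS3 here)
     * calls before advancing global simulation time. */
    #include <sys/epoll.h>

    void handle_socket(int fd);          /* TT socket, virtio, ... */
    int all_tt_requests_received(void);

    void master_drain_events(int epfd)
    {
        struct epoll_event evs[16];

        while (!all_tt_requests_received()) {
            int n = epoll_wait(epfd, evs, 16, -1);

            if (n < 0)
                continue;               /* e.g. EINTR */

            for (int i = 0; i < n; i++)
                handle_socket(evs[i].data.fd);
        }
        /* Only now has every UML instance sent its time-travel request,
         * so nothing can still happen "now" and time may advance. */
    }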


In conjunction with running not a single UML instance but many (my current use case consists of 10 node pairs == 20 UML nodes), this can create all sorts of race/deadlock conditions, which we have identified.
For me, the ACK of vhost/virtio seemed somewhat redundant, as it provides the same information as the TT-protocol, assuming my device simulation resides within the scheduler. I must admit my assumption was incorrect, primarily because the implementation of the TT-protocol in the kernel is somewhat fragile and, most importantly, your requirement that ***nothing*** is allowed to interfere (creating SIGIOs) in certain states of a time-traveling UML instance is not well documented. We need the ACK not as an acknowledgment of registering the interrupt, but to know that we are allowed to send the next TT message. This very tight coupling of the two protocols does not appear to be the best design or, at the very least, is poorly documented.

The prohibition of interference in certain TT states also led to my second mistake: I relaxed my second DES requirement and allowed interrupts while the UML instance is in the RUN state. This was based on the impression that UML was built to work this way without TT, so why should it break in TT mode (which you proved wrong). Whether this is semantically reasonable can be questioned, but it triggered technical problems with the current implementation.
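In other words, on the device-simulation side the required ordering is essentially this (heavily simplified, the helper names are invented and not the real vhost-user/TT API):

    /* Device-simulation side, heavily simplified; every name here is
     * invented for illustration. */
    void raise_virtio_interrupt(void *dev);
    void wait_for_virtio_ack(void *dev);        /* blocks until the UML ACK */
    void send_next_timetravel_message(void *tt);

    void device_raises_irq(void *dev, void *tt)
    {
        raise_virtio_interrupt(dev);   /* UML instance will get a SIGIO      */
        wait_for_virtio_ack(dev);      /* not "IRQ registered", but "you are */
                                       /* allowed to send the next TT msg"   */
        send_next_timetravel_message(tt);
    }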

With this realization, I tend to agree that perhaps all the patches ensuring thread-safe (or reentrancy-safe) access to the event list can be dropped. Still, we should ensure that SIGIO is simply processed synchronously in the idle loop. This aligns with my last DES constraint: since everything happens at the same moment in simulation time, we do not need "real" interrupts but can process interrupts (SIGIOs) later (but at the same simulation time). I think this approach only works in ext or cpu-inf mode and may be problematic in "normal" timetravel mode. I might even consider dropping the signal handler, which marks the interrupts pending, and processing the signals with a signalfd, but that is, again, only an optimization (a small sketch follows below).

Additionally, to address the interrupt acknowledgment for the serial line, I'd like to propose this: why not add an extra file descriptor on the command line that the kernel could write to, such as an eventfd or a pipe, to signal the acknowledgment of the interrupt? For example, the command line changes to ssl0=fd:0,fd:1,fd:3. If somebody uses the serial line driver with timetravel mode but without that acknowledgment fd, we can emit a warning or an error.
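Coming back to the signalfd point, this is roughly what I mean (not existing UML code, just the shape of the idea):

    /* Block SIGIO and read it from a signalfd instead of handling it
     * asynchronously; process_pending_irqs() is a placeholder. */
    #include <signal.h>
    #include <sys/signalfd.h>
    #include <unistd.h>

    void process_pending_irqs(void);

    static int setup_sigio_fd(void)
    {
        sigset_t mask;

        sigemptyset(&mask);
        sigaddset(&mask, SIGIO);
        sigprocmask(SIG_BLOCK, &mask, NULL);   /* no async handler runs */

        return signalfd(-1, &mask, SFD_CLOEXEC | SFD_NONBLOCK);
    }

    static void idle_loop_poll(int sfd)
    {
        struct signalfd_siginfo si;

        /* At a stable point in simulation time, drain the pending SIGIOs
         * and run the interrupt processing synchronously. */
        while (read(sfd, &si, sizeof(si)) == sizeof(si))
            process_pending_irqs();
    }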
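And the device-simulation side of the proposed acknowledgment fd (fd:3 in the ssl0=fd:0,fd:1,fd:3 example) could be as simple as this sketch, assuming an eventfd or pipe:

    /* External device-simulation side: ack_fd is the proposed extra fd the
     * UML kernel writes to once it has taken the interrupt. Sketch only. */
    #include <stdint.h>
    #include <unistd.h>

    static void wait_for_serial_irq_ack(int ack_fd)
    {
        uint64_t val;

        /* Blocks until UML signals the ack; only then is the simulation
         * allowed to send the next time-travel message. */
        read(ack_fd, &val, sizeof(val));
    }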

I believe all these changes should work well with the shared memory optimization and should make the entire time travel ext protocol a bit more robust,
easier to use, and harder to misuse. ;-)

However, even after the lengthy discussion on "when" interrupts should be processed, I still hold the opinion that the response to raising an interrupt should go back to the device simulation immediately and not be delayed in simulation time. Delaying it only makes the device simulation harder without real benefit. If you want to delay the interrupt handling (ISR and so on), that is still possible and in both cases highly dependent on the UML implementation. If we want to add an interrupt delay, we need to implement something in UML anyway. If you want to delay it in the device driver, you can always do that, and then you are not at the mercy of some hard-to-determine extra delay from UML.
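In DES terms, adding such a delay simply means scheduling the interrupt delivery as its own later event instead of delivering it immediately; a generic sketch (des_schedule() and raise_irq() stand in for whatever the simulator provides):

    /* Generic sketch: model the interrupt delay by scheduling the delivery
     * as a later event; des_schedule() and raise_irq() are placeholders. */
    #include <stdint.h>

    void des_schedule(uint64_t when, void (*fn)(void *), void *arg);
    void raise_irq(void *dev);

    void device_completes_io(void *dev, uint64_t now)
    {
        uint64_t irq_delay = 1000;   /* whatever delay the model wants */

        des_schedule(now + irq_delay, raise_irq, dev);
    }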

Overall, if you think that makes sense, I could start on some patches, or perhaps you feel more comfortable doing that.


Benjamin


