On 13.11.2023 at 22:57, Johannes Berg wrote:

So maybe for my future self and all the bystanders, I'll try to explain how I see the issue that causes patches 4, 6, 7 and 8 to be needed:
--- snip ---
Hello Johannes,
Sorry for my delayed response to your detailed email. I find it quite hard to discuss such a complex topic via mailing lists without it sounding impolite.

Maybe also as some basis for my reasoning: I'm quite familiar with discrete event-based simulation, with a special focus on SystemC simulation. There are some common (old) principles of DES that map perfectly to the time travel mode, so my mental model has always been shaped around those constraints:

- Every instance/object/element in a DES has some inputs/signals that activate it either at the current moment or at some later point in time.
- Every instance/object/element in a DES will run, starting from its activation time, until it has "finished", without advancing the global simulation time.
- Events (or activations) occurring at the exact same point in simulation time happen in parallel. Therefore, the actual order of execution in a sequential simulator is more or less unimportant (though some implementations may require a deterministic order, ensuring simulations with the same parameters yield the exact same output, e.g., SystemC). The parallel execution of events at the same simulation time is an optimization that may be implemented, but is not always utilized.
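Just to make that concrete, a minimal sketch of such a sequential DES kernel could look like this (all names are made up, this is not taken from any real simulator):

    /* Minimal sketch of a sequential DES kernel; every identifier here is
     * hypothetical and only meant to illustrate the three points above. */
    #include <stdint.h>

    struct event {
        uint64_t time;             /* activation time in simulation time    */
        void (*run)(void *ctx);    /* runs to completion, time stands still */
        void *ctx;
    };

    /* queue_* are assumed helpers of a time-ordered event queue. */
    int queue_empty(void);
    uint64_t queue_peek_time(void);
    struct event *queue_pop(void);

    void des_run(void)
    {
        while (!queue_empty()) {
            uint64_t now = queue_peek_time();

            /* Execute *all* events scheduled for 'now'; their relative
             * order must not matter, which is what would allow running
             * them in parallel as an optimization. */
            while (!queue_empty() && queue_peek_time() == now) {
                struct event *ev = queue_pop();
                ev->run(ev->ctx);  /* may only schedule new events >= now */
            }
            /* Only after 'now' is fully drained may global time advance. */
        }
    }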


After reviewing all your analyses, I believe the most significant difference in my implementation lies in the last point: I did not enforce the order of message processing when messages occur at exactly the same simulation time. Consequently, I modified my implementation to eliminate the synchronous (and, to be honest, quite hacky) read operation with special handling on the timetravel socket. Instead, I implemented a central epoll routine, which is called by my master simulation kernel (NS3). My rationale was that if I haven't received the request from the TT-protocol, I cannot advance time.
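Roughly, the shape of that central routine is something like the following sketch (handle_socket() and all_tt_requests_received() are placeholders, not the actual NS3/TT glue code):

    /* Sketch of the central poll routine the master scheduler (NS3 here)
     * calls before advancing global simulation time. */
    #include <sys/epoll.h>

    void handle_socket(int fd);          /* TT socket, virtio, ... */
    int all_tt_requests_received(void);

    void master_drain_events(int epfd)
    {
        struct epoll_event evs[16];

        while (!all_tt_requests_received()) {
            int n = epoll_wait(epfd, evs, 16, -1);

            if (n < 0)
                continue;               /* e.g. EINTR */

            for (int i = 0; i < n; i++)
                handle_socket(evs[i].data.fd);
        }
        /* Only now has every UML instance sent its time-travel request,
         * so nothing can still happen "now" and time may advance. */
    }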


In conjunction with running not a single UML instance but many (my current use case consists of 10 node pairs == 20 UML nodes), this can create all sorts of race/deadlock conditions, which we have identified.
For me, the ACK of vhost/virtio seemed somewhat redundant, as it provides the same information as the TT-protocol, assuming my device simulation resides within the scheduler. I must admit my assumption was incorrect, primarily because the implementation of the TT-protocol in the kernel is somewhat fragile and, most importantly, your requirement that ***nothing*** is allowed to interfere (creating SIGIOs) in certain states of a time-traveling UML instance is not well documented. We need the ACK not as an acknowledgment of registering the interrupt, but to know that we are allowed to send the next TT message. This very tight coupling of the two protocols does not appear to be the best design or, at the very least, is poorly documented.

The prohibition of interference in certain TT states also led to my second mistake: I relaxed my second DES requirement and allowed interrupts while the UML instance is in the RUN state. This was based on the impression that UML was built to work this way without TT, so why should it break in TT mode (which you proved wrong). Whether this is semantically reasonable can be questioned, but it triggered technical problems with the current implementation.
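In other words, on the device-simulation side the required ordering is essentially this (heavily simplified, the helper names are invented and not the real vhost-user/TT API):

    /* Device-simulation side, heavily simplified; every name here is
     * invented for illustration. */
    void raise_virtio_interrupt(void *dev);
    void wait_for_virtio_ack(void *dev);        /* blocks until the UML ACK */
    void send_next_timetravel_message(void *tt);

    void device_raises_irq(void *dev, void *tt)
    {
        raise_virtio_interrupt(dev);   /* UML instance will get a SIGIO      */
        wait_for_virtio_ack(dev);      /* not "IRQ registered", but "you are */
                                       /* allowed to send the next TT msg"   */
        send_next_timetravel_message(tt);
    }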

With this realization, I tend to agree that perhaps all the patches ensuring thread-safe (or reentrancy-safe) access to the event list can be dropped. Still, we should ensure that SIGIO is simply processed synchronously in the idle loop. This aligns with my last DES constraint: since everything happens at the same moment in simulation time, we do not need "real" interrupts but can process interrupts (SIGIOs) later (but at the same simulation time). I think this approach only works in ext or cpu-inf mode and may be problematic in "normal" timetravel mode. I might even consider dropping the signal handler, which marks the interrupts pending, and processing the signals with a signalfd, but that is, again, only an optimization (a small sketch follows below).

Additionally, to address the interrupt acknowledgment for the serial line, I'd like to propose this: why not add an extra file descriptor on the command line that the kernel could write to, such as an eventfd or a pipe, to signal the acknowledgment of the interrupt? For example, the command line changes to ssl0=fd:0,fd:1,fd:3. If somebody uses the serial line driver with timetravel mode but without that acknowledgment fd, we can emit a warning or an error.
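Coming back to the signalfd point, this is roughly what I mean (not existing UML code, just the shape of the idea):

    /* Block SIGIO and read it from a signalfd instead of handling it
     * asynchronously; process_pending_irqs() is a placeholder. */
    #include <signal.h>
    #include <sys/signalfd.h>
    #include <unistd.h>

    void process_pending_irqs(void);

    static int setup_sigio_fd(void)
    {
        sigset_t mask;

        sigemptyset(&mask);
        sigaddset(&mask, SIGIO);
        sigprocmask(SIG_BLOCK, &mask, NULL);   /* no async handler runs */

        return signalfd(-1, &mask, SFD_CLOEXEC | SFD_NONBLOCK);
    }

    static void idle_loop_poll(int sfd)
    {
        struct signalfd_siginfo si;

        /* At a stable point in simulation time, drain the pending SIGIOs
         * and run the interrupt processing synchronously. */
        while (read(sfd, &si, sizeof(si)) == sizeof(si))
            process_pending_irqs();
    }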
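And the device-simulation side of the proposed acknowledgment fd (fd:3 in the ssl0=fd:0,fd:1,fd:3 example) could be as simple as this sketch, assuming an eventfd or pipe:

    /* External device-simulation side: ack_fd is the proposed extra fd the
     * UML kernel writes to once it has taken the interrupt. Sketch only. */
    #include <stdint.h>
    #include <unistd.h>

    static void wait_for_serial_irq_ack(int ack_fd)
    {
        uint64_t val;

        /* Blocks until UML signals the ack; only then is the simulation
         * allowed to send the next time-travel message. */
        read(ack_fd, &val, sizeof(val));
    }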

I believe all these changes should work well with the shared memory optimization and should make the entire time travel ext protocol a bit more robust,
easier to use, and harder to misuse. ;-)

However, even after the lengthy discussion on "when" interrupts should be processed, I still hold the opinion that the response to raising an interrupt should go back to the device simulation immediately and not be delayed in simulation time. Delaying it only makes the device simulation harder without real benefit. If you want to delay the interrupt handling (ISR and so on), that is still possible and in both cases highly dependent on the UML implementation. If we want to add an interrupt delay, we need to implement something in UML anyway. If you want to delay it in the device driver, you can always do that, and then you are not at the mercy of some hard-to-determine extra delay from UML.
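In DES terms, adding such a delay simply means scheduling the interrupt delivery as its own later event instead of delivering it immediately; a generic sketch (des_schedule() and raise_irq() stand in for whatever the simulator provides):

    /* Generic sketch: model the interrupt delay by scheduling the delivery
     * as a later event; des_schedule() and raise_irq() are placeholders. */
    #include <stdint.h>

    void des_schedule(uint64_t when, void (*fn)(void *), void *arg);
    void raise_irq(void *dev);

    void device_completes_io(void *dev, uint64_t now)
    {
        uint64_t irq_delay = 1000;   /* whatever delay the model wants */

        des_schedule(now + irq_delay, raise_irq, dev);
    }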

Overall, if you think that makes sense, I could start on some patches, or perhaps you feel more comfortable doing that.


Benjamin


