> >> how close is thread safety?
> >
> > In a useful form: a fair way off.
> >
> > It's relatively simple to hack something together that runs. Making it
> > work correctly and go fast is much harder though. My current prototype
> > (running on 2 cores) runs about a quarter the speed of normal qemu, and
> > dies shortly after booting because the guest atomic synchronisation
> > primitives don't work right.
>
> This latter problem seems like the hardest to solve to me. Did you have
> any ideas here that don't involve hand coding the translation for atomic
> instructions?

Yes. I do exclusive access locking at the TLB level, i.e. creating a TLB
entry for a writable page forces that page to be flushed from all the
other CPU TLBs. There's some wiggle room in the definition of a
"writable" page. If other criteria are met it should be sufficient to
just do this exclusion for atomic accesses.

If necessary the same technique can be used to avoid write ordering and
coherency problems without having to accurately map guest barriers onto
the equivalent host operations. This is handy when most of the guest
barriers are implicit, e.g. when emulating a strictly ordered guest on a
weakly ordered host.

In theory this could be taken to extremes and used to split emulation of
a single machine over multiple address spaces/nodes. In practice the
contention on a normal SMP operating system is high enough that this is
not practical.

Paul
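
For illustration, the TLB-level exclusion described above can be sketched
roughly as follows. This is a toy, single-threaded model and not qemu
code: the names (TLBEntry, tlb_fill, tlb_flush_page, tlb_has) and the
sizes are hypothetical. It also assumes that a read-only fill evicts a
writable entry held by another CPU, which the description above does not
spell out but which exclusivity seems to need. A real implementation
would additionally have to synchronise the cross-CPU flush itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPUS 2   /* hypothetical: two emulated CPUs */
#define TLB_SIZE 8   /* hypothetical: tiny direct-mapped software TLB */

typedef struct {
    bool valid;
    bool writable;
    unsigned long page;   /* guest page number */
} TLBEntry;

/* One software TLB per emulated CPU; zero-initialised, so all invalid. */
static TLBEntry tlb[NUM_CPUS][TLB_SIZE];

static TLBEntry *tlb_slot(int cpu, unsigned long page)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    return &tlb[cpu][page % TLB_SIZE];
}

/* True if the CPU holds a mapping for the page (writable, if required). */
static bool tlb_has(int cpu, unsigned long page, bool need_write)
{
    TLBEntry *e = tlb_slot(cpu, page);
    return e->valid && e->page == page && (!need_write || e->writable);
}

/* Drop the page from one CPU's TLB, if it is present. */
static void tlb_flush_page(int cpu, unsigned long page)
{
    TLBEntry *e = tlb_slot(cpu, page);
    if (e->valid && e->page == page)
        e->valid = false;
}

/*
 * Install a TLB entry. A writable entry flushes the page from every
 * other CPU's TLB, so the filling CPU has exclusive access. A read-only
 * entry only evicts another CPU's *writable* entry (an assumption of
 * this sketch), so read-only mappings can still be shared.
 */
static void tlb_fill(int cpu, unsigned long page, bool writable)
{
    for (int other = 0; other < NUM_CPUS; other++) {
        if (other == cpu)
            continue;
        if (writable || tlb_has(other, page, true))
            tlb_flush_page(other, page);
    }

    TLBEntry *e = tlb_slot(cpu, page);
    e->valid = true;
    e->writable = writable;
    e->page = page;
}
```

With this shape, any CPU that lost its entry takes a TLB miss on its next
access to the page and has to go back through tlb_fill, which is the
point where the exclusion is enforced.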