陳韋任 wrote:
> > As x86 doesn't use or need barrier instructions, when translating x86
> > to (say) run on ARM host, multi-threaded code that needs barriers
> > isn't easy to detect, so barriers may be required between every memory
> > access in the generated ARM code.
>
> Sounds awful to me. Regardless current QEMU's support for multi-threaded
> application, it's possible to emulate a architecture with stronger memory
> model on a weaker one?

It's possible. Unfortunately, those barriers tend to be quite expensive,
and they would be needed often, so it would run slowly: probably a lot
slower than using a single host thread with preemption to simulate
multiple guest CPUs. But someone should try it and find out.

It might be possible to do some deep analysis of the guest to work out
which memory accesses don't need barriers, but it's a hard research
problem with no guarantee of a good solution.

One strategy which comes to mind is simulated MESI or MOESI (cache
coherency protocols) at the page level, so independent guest threads
never have unsynchronised access to the same page. The same could be
done at finer granularity, with more emulation overhead (but still maybe
less than barriers). Another is software transactional memory
techniques.

Neither will run system software at great speed, but certain kinds of
mostly-independent processing, for example a guest running mainly
userspace number crunching in independent processes, might work all
right.

-- Jamie
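
To make the page-level idea a little more concrete, here is a toy,
single-threaded model of the ownership bookkeeping. It is only a sketch
under assumptions: it uses the simpler three-state MSI scheme rather than
full MESI or MOESI, it is not taken from QEMU, and every name in it
(page_read, page_write, page_get, sync_count, NUM_VCPUS, NUM_PAGES) is
hypothetical. A real implementation would presumably trap accesses through
host page protection and issue a host barrier at each ownership transfer;
here a counter just records where those synchronisation points would fall.

```c
#include <assert.h>

#define NUM_VCPUS 4   /* hypothetical number of guest CPUs */
#define NUM_PAGES 8   /* hypothetical number of guest pages */

/* Per-vCPU view of a guest page. PAGE_INVALID is 0 so that the
 * zero-initialised table below starts with no vCPU holding any page. */
typedef enum { PAGE_INVALID = 0, PAGE_SHARED, PAGE_MODIFIED } page_state;

static page_state state[NUM_VCPUS][NUM_PAGES];
static unsigned sync_events;  /* ownership transfers, i.e. barrier points */

static void check(int cpu, int page)
{
    assert(cpu >= 0 && cpu < NUM_VCPUS);
    assert(page >= 0 && page < NUM_PAGES);
}

/* Guest read: free if this vCPU already holds the page; otherwise
 * downgrade any exclusive writer to shared and take a shared copy. */
void page_read(int cpu, int page)
{
    check(cpu, page);
    if (state[cpu][page] != PAGE_INVALID)
        return;                           /* fast path, no barrier */
    for (int other = 0; other < NUM_VCPUS; other++) {
        if (state[other][page] == PAGE_MODIFIED) {
            state[other][page] = PAGE_SHARED;
            sync_events++;                /* assumed: barrier goes here */
        }
    }
    state[cpu][page] = PAGE_SHARED;
}

/* Guest write: free if this vCPU already owns the page exclusively;
 * otherwise invalidate every other copy and take exclusive ownership. */
void page_write(int cpu, int page)
{
    check(cpu, page);
    if (state[cpu][page] == PAGE_MODIFIED)
        return;                           /* fast path, no barrier */
    for (int other = 0; other < NUM_VCPUS; other++) {
        if (other != cpu && state[other][page] != PAGE_INVALID) {
            state[other][page] = PAGE_INVALID;
            sync_events++;                /* assumed: barrier goes here */
        }
    }
    state[cpu][page] = PAGE_MODIFIED;
}

page_state page_get(int cpu, int page)
{
    check(cpu, page);
    return state[cpu][page];
}

unsigned sync_count(void)
{
    return sync_events;
}
```

In this model, a vCPU writing repeatedly to a page nobody else touches
never increments the counter, which is the case the mostly-independent
workloads would hit. Only when a second vCPU reads or writes the same page
does a transfer (and so a barrier) occur.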