On 23 July 2010 17:13, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> 2010/7/23 Alexander Graf <ag...@suse.de>:
>>
>> On 23.07.2010, at 09:53, Jan Kiszka wrote:
>>
>>> wang Tiger wrote:
>>>> On 22 July 2010 23:47, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>>> 2010/7/22 wang Tiger <tigerwang1...@gmail.com>:
>>>>>> In our implementation for the x86_64 target, all devices except the
>>>>>> LAPIC are emulated in a separate thread. VCPUs are emulated in other
>>>>>> threads (one thread per VCPU).
>>>>>> By observing some device drivers in Linux, we have a hypothesis that
>>>>>> drivers in the OS have already ensured correct synchronization on
>>>>>> concurrent hardware accesses.
>>>>> This hypothesis is too optimistic. If hardware emulation code assumes
>>>>> it is only executed in a single-threaded fashion, but guests can
>>>>> execute it in parallel, then this opens up the possibility of race
>>>>> conditions that malicious guests can exploit. There needs to be
>>>>> isolation: a guest should not be able to cause QEMU to crash.
>>>>
>>>> In our prototype, we assume the guest behaves correctly. If hardware
>>>> emulation code can ensure atomic access (behave like real hardware),
>>>> VCPUs can access devices freely. We actually refined some hardware
>>>> emulation code (e.g. BMDMA, IOAPIC) to ensure the atomicity of
>>>> hardware access.
>>>
>>> This approach is surely helpful for a prototype to explore the limits.
>>> But it's not applicable to production systems. It would create a huge
>>> source of potential subtle regressions for other guest OSes,
>>> specifically those that you cannot analyze regarding synchronized
>>> hardware access. We must play safe.
>>>
>>> That's why we currently have the global mutex. Its conversion can only
>>> happen step-wise, e.g. by establishing an infrastructure to declare the
>>> need of device models for that Big Lock. Then you can start converting
>>> individual models to private locks or even smart lock-less patterns.
>>
>> But isn't that independent from making TCG atomic-capable and parallel?
>> At that point a TCG vCPU would have the exact same issues and interfaces
>> as a KVM vCPU, right? And then we can tackle the concurrent device access
>> issues together.
>
> An issue that might affect COREMU today is core QEMU subsystems that
> are not thread-safe and are used from hardware emulation, for example:
>
> cpu_physical_memory_read/write() to RAM will use qemu_get_ram_ptr().
> This function moves the found RAMBlock to the head of the global RAM
> blocks list in a non-atomic way. Therefore, two unrelated hardware
> devices executing cpu_physical_memory_*() simultaneously face a race
> condition. I have seen this happen when playing with parallel
> hardware emulation.
>
> Tiger: If you are only locking the hardware thread for the ARM target,
> your hardware emulation is not safe for other targets. Have I missed
> something in the COREMU patch that defends against this problem?
>
> Stefan
>

In fact, we solve this problem with a really simple method. In our
prototype, we removed this piece of code, like this:

void *qemu_get_ram_ptr(ram_addr_t addr)
{
    ......
    /* Move this entry to the start of the list. */
#ifndef CONFIG_COREMU
    /* Different cores can access this function at the same time.
     * For coremu, disable this optimization to avoid data races.
     * XXX or use a spin lock here if the performance impact is big. */
    if (prev) {
        prev->next = block->next;
        block->next = *prevp;
        *prevp = block;
    }
#endif
    return block->host + (addr - block->offset);
}

CONFIG_COREMU is defined when TCG parallel mode is configured. The list
is effectively read-only as long as no device is hot-plugged, so we
don't use a lock to protect it. Reimplementing it as a lock-free list
would also be reasonable, but that seems unnecessary. :-)

--
Zhaoguo Wang, Parallel Processing Institute, Fudan University
Address: Room 320, Software Building, 825 Zhangheng Road, Shanghai, China
tigerwang1...@gmail.com
http://ppi.fudan.edu.cn/zhaoguo_wang
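
For illustration, here is a minimal sketch of the spin-lock alternative
hinted at in the XXX comment above. It is not the COREMU code: the
RAMBlock structure, the ram_blocks list head, and the ram_list_lock name
are simplified stand-ins, and a POSIX pthread spinlock is used in place
of whatever locking primitive COREMU provides. The point is only to show
that the move-to-front optimization could be kept if the lookup and the
list rewrite both happen under one lock.

#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ram_addr_t;               /* stand-in for QEMU's type */

typedef struct RAMBlock {
    uint8_t *host;                         /* host pointer backing this block */
    ram_addr_t offset;                     /* guest RAM offset of this block  */
    ram_addr_t length;
    struct RAMBlock *next;
} RAMBlock;

static RAMBlock *ram_blocks;               /* singly linked list head */
static pthread_spinlock_t ram_list_lock;   /* protects the whole list  */

/* Call once during RAM setup, before any VCPU or device thread runs. */
static void ram_list_lock_init(void)
{
    pthread_spin_init(&ram_list_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Locked variant: lookup and move-to-front are atomic with respect to
 * concurrent callers, so the optimization need not be disabled. */
static void *qemu_get_ram_ptr_locked(ram_addr_t addr)
{
    RAMBlock *prev = NULL;
    RAMBlock *block;
    void *host;

    pthread_spin_lock(&ram_list_lock);

    /* Same linear search as the original code, now under the lock. */
    for (block = ram_blocks; block; prev = block, block = block->next) {
        if (addr >= block->offset && addr < block->offset + block->length) {
            break;
        }
    }
    if (!block) {
        pthread_spin_unlock(&ram_list_lock);
        return NULL;                       /* the real code aborts here */
    }

    /* Move the entry to the front; safe because no other thread can
     * traverse or rewrite the list while we hold the spinlock. */
    if (prev) {
        prev->next = block->next;
        block->next = ram_blocks;
        ram_blocks = block;
    }

    host = block->host + (addr - block->offset);
    pthread_spin_unlock(&ram_list_lock);
    return host;
}

Whether this beats simply dropping the optimization, as the patch above
does, depends on how long the RAM block list gets and how contended the
lock becomes, which is exactly the trade-off the XXX comment leaves open.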