On Tue, Aug 18, 2015 at 5:51 PM, Andrey Korolyov <and...@xdel.ru> wrote:
> "Fixed" with a cherry-pick of 7a72f7a140bfd3a5dae73088947010bfdbcf6a40
> and its predecessor 7103f60de8bed21a0ad5d15d2ad5b7a333dda201. Of course
> this is not a real fix, as the race precondition is only shifted or
> made to disappear, not addressed directly. Though there are not too
> many hotplug users around, I hope this information will be useful for
> those who hit the same issue in the next year or so, until 3.18+ is
> stable enough for the hypervisor kernel role. Any suggestions on
> further debugging or re-exposing the race are of course very welcome.
>
> CCing kvm@ as it looks like a hypervisor subsystem issue. The entire
> discussion can be found at
> https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg03117.html
So no, the issue is still there, though the appearance rate is lower. What may be interesting is that non-SMP guests are affected as well; before that I suspected the vCPUs were being resumed in a racy manner, triggering the memory corruption. Also, the chance of hitting the problem grows at least faster than linearly with the number of plugged DIMMs: at 8G total it is now almost impossible to catch the issue (which is better than the state of things at the beginning of this thread), while at 16G total it reproduces at a fairly high rate under active memory operations.

Migrating the suspended VM results in the same corruption, so a core analysis could very likely reveal the root of the issue. The problem is that I have zero clues about what exactly could be wrong there and how it could depend on the machine size, if we leave race conditions out of view.
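For reference, the general shape of the setup I am talking about is a guest started with hotplug slots reserved and DIMMs plugged at runtime while memory-intensive work runs inside; a minimal sketch is below. The sizes, slot count and ids are placeholders of my own, not the exact values from the earlier thread:

  # start the guest with room reserved for hotpluggable memory
  qemu-system-x86_64 -enable-kvm -m 4G,slots=8,maxmem=16G ...

  # then plug DIMMs one by one from the monitor
  (qemu) object_add memory-backend-ram,id=mem1,size=2G
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

The corruption shows up (or not) depending on how much memory ends up plugged this way, as described above.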