Hello Avi,

Regarding the VMCSINFO patch: we really need this functionality in our
development. YOSHIDA Masanori (masanori.yoshida...@hitachi.com), a developer
at Hitachi, has said that they need it too. Could you please tell us why the
patch is unacceptable? Do you dislike the whole idea of exporting VMCSINFO,
or only the way we implemented the patch? And do you have any suggestions
about how we should proceed?

Below is why we need this patch and how we will use it in our development.

We once ran into an abnormal situation: a host scheduler bug stopped a
guest's vcpu for a long time, which in turn stopped the guest's heartbeat
(the host itself was still running).
     
We want an efficient way to analyse bugs in similar situations, where a
guest misbehaves because of something on the host side. These situations
have actually happened many times, particularly during development.
  
So here is the requirement:
To find the root cause, we have to debug both the host side and the guest
side. For that we first need crash dumps of both the host and the guest,
and they must be taken at the same time, while the abnormal situation still
persists. The only way to do this is to panic the host while the abnormal
guest is running on it, so that the guest's image is contained in the
host's crash dump.

Retrieving the guest's crash dump from the host's crash dump is therefore
the key step towards our goal. Unfortunately, in the kvm implementation some
of the guest's register values are held in the VMCS, and the internal layout
of the VMCS is not published by Intel. If we cannot retrieve these registers
from the VMCS, the guest crash dump we produce is incomplete, and key
information is missing when we analyse it.

So we wrote this patch to export the internal layout of the VMCS. With the
patch applied, we can write the register values stored in the VMCS into the
guest's crash dump, which is what we want.
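
To illustrate the idea, here is a minimal sketch in C of how a userland tool
might read a guest register out of a raw VMCS region found in the host dump,
given a table of field offsets. This is not the patch's actual format: the
struct vmcs_field_offset record and the vmcs_read_field64() helper are
hypothetical names used only for illustration. Only the field encodings
(GUEST_RIP and so on) are architectural values from the Intel SDM. The sketch
also assumes the tool runs on a machine with the same byte order as the
dumped host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Architectural VMCS field encodings (Intel SDM, VMCS field encodings). */
#define GUEST_CR3 0x6802u
#define GUEST_RSP 0x681Cu
#define GUEST_RIP 0x681Eu

/*
 * Hypothetical record: one exported (encoding, offset) pair, where
 * offset is the byte position of the field inside the VMCS region.
 * The real export format of the patch may differ.
 */
struct vmcs_field_offset {
    uint32_t encoding;
    uint32_t offset;
};

/*
 * Look up `encoding` in the offset table and copy the 64-bit value
 * found at that offset in the raw VMCS image into *out.
 * Returns 0 on success, -1 if the field is not in the table or the
 * offset would read past the end of the VMCS image.
 */
int vmcs_read_field64(const uint8_t *vmcs, size_t vmcs_len,
                      const struct vmcs_field_offset *tbl, size_t n,
                      uint32_t encoding, uint64_t *out)
{
    assert(vmcs != NULL && tbl != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].encoding != encoding)
            continue;
        /* Bounds check written so that it cannot overflow. */
        if (tbl[i].offset > vmcs_len ||
            vmcs_len - tbl[i].offset < sizeof(*out))
            return -1;
        /* memcpy avoids unaligned access; host byte order is assumed. */
        memcpy(out, vmcs + tbl[i].offset, sizeof(*out));
        return 0;
    }
    return -1;
}
```

A tool would call vmcs_read_field64() once per register (GUEST_RIP,
GUEST_RSP, GUEST_CR3, ...) and store the results in the register section of
the guest's crash dump.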
  
If a bug is found in a customer's environment, we have two ways to avoid
affecting the other guests running on the same host. First, we can reproduce
the buggy situation in another environment and do the analysis there.
Second, we can migrate the other guests to other hosts.

After the abnormal situation is reproduced, we panic the host *manually*.
Then we can use userland tools to extract the guest's crash dump from the
host's crash dump, using the information provided by this patch. Finally we
can analyse the two dumps separately to find out which side causes the
problem.