On 2017-09-05 23:55, Mike Larkin wrote:
> On Mon, Sep 04, 2017 at 12:03:31AM -0700, Carlos Cardenas wrote:
>> * Fix logic handling stopping a VM.  Prevents VMD from crashing.
>> * Add additional error code to notify the user that a vm cannot be
>>   stopped when not running.
>> * Add additional log_debug statements.
>>
> 
> See below. Also this diff has tabs vs spaces problems like the
> previous one.
> 
> If we fix the style issues and you can shed some light on the part
> I noted below, I think we can get both these diffs in.
> 
> -ml
> 
>> diff --git usr.sbin/vmctl/vmctl.c usr.sbin/vmctl/vmctl.c
>> index 64d82ca847d..5fb7fbfd74c 100644
>> --- usr.sbin/vmctl/vmctl.c
>> +++ usr.sbin/vmctl/vmctl.c
>> @@ -439,12 +439,19 @@ terminate_vm_complete(struct imsg *imsg, int *ret)
>>              vmr = (struct vmop_result *)imsg->data;
>>              res = vmr->vmr_result;
>>              if (res) {
>> -                    errno = res;
>> -                    if (res == ENOENT)
>> -                            warnx("vm not found");
>> -                    else
>> -                            warn("terminate vm command failed");
>> -                    *ret = EIO;
>> +                       switch (res) {
>> +                       case VMD_VM_STOP_INVALID:
>> +                               warnx("cannot stop vm that is not running");
>> +                               *ret = EINVAL;
>> +                               break;
>> +                       case ENOENT:
>> +                               warnx("vm not found");
>> +                               *ret = EIO;
>> +                               break;
>> +                       default:
>> +                               warn("terminate vm command failed");
>> +                               *ret = EIO;
>> +                       }
>>              } else {
>>                      warnx("sent request to terminate vm %d", vmr->vmr_id);
>>                      *ret = 0;
>> @@ -453,6 +460,7 @@ terminate_vm_complete(struct imsg *imsg, int *ret)
>>              warnx("unexpected response received from vmd");
>>              *ret = EINVAL;
>>      }
>> +    errno = *ret;
>>  
>>      return (1);
>>  }
>> diff --git usr.sbin/vmd/vmd.h usr.sbin/vmd/vmd.h
>> index 22da6d58a7b..1240339db52 100644
>> --- usr.sbin/vmd/vmd.h
>> +++ usr.sbin/vmd/vmd.h
>> @@ -54,6 +54,7 @@
>>  #define VMD_BIOS_MISSING    1001
>>  #define VMD_DISK_MISSING    1002
>>  #define VMD_DISK_INVALID    1003
>> +#define VMD_VM_STOP_INVALID 1004
>>  
>>  /* 100.64.0.0/10 from rfc6598 (IPv4 Prefix for Shared Address Space) */
>>  #define VMD_DHCP_PREFIX             "100.64.0.0/10"
>> diff --git usr.sbin/vmd/vmm.c usr.sbin/vmd/vmm.c
>> index d3233147cff..66083a5f959 100644
>> --- usr.sbin/vmd/vmm.c
>> +++ usr.sbin/vmd/vmm.c
>> @@ -169,10 +169,9 @@ vmm_dispatch_parent(int fd, struct privsep_proc *p, 
>> struct imsg *imsg)
>>                      else
>>                              res = 0;
>>              } else {
>> -                    /* Terminate VMs that are unknown or shutting down */
>> -                    vtp.vtp_vm_id = vm_vmid2id(vm->vm_vmid, vm);
>> -                    res = terminate_vm(&vtp);
>> -                    vm_remove(vm);
>> +                        log_debug("%s: cannot stop vm that is not running",
>> +                                __func__);
>> +                        res = VMD_VM_STOP_INVALID;

> 
> Is this really what we want? (Was this the part that was crashing vmd?)
> This will break (I think) stopping a vm, then while it is shutting down
> during vmmci shutdown processing, stopping it again to force kill it.
> 
> Is the problem that we are doing vm_remove unconditionally, regardless of
> the result of the previous calls? (Eg, what if vm_vmid2id or terminate_vm
> failed?)
> 
Using the previous patch in the series, this is what is observed when
attempting to stop a running vm (normal use case):
control_dispatch_imsg vid: 1, name: , uid: 0
proc_compose_imsg: about to compose_event to PROC 0
proc_compose_imsg: about to compose_event to PROC 2
vmm_dispatch_parent: recv'ed TERMINATE_VM for 1
vmm_dispatch_parent: sending shutdown request to vm 1
proc_compose_imsg: about to compose_event to PROC 0
vmd_dispatch_vmm: forwarding TERMINATE VM for 1
proc_compose_imsg: about to compose_event to PROC 1
control_close
vmm_sighdlr: handling signal
vmm_sighdlr: attempting to terminate vm 1
terminate_vm: terminating vmid 1
proc_compose_imsg: about to compose_event to PROC 0
vmm_sighdlr: calling vm_remove
vm_remove: removing vm 1 from running config
vm_remove: calling vm_stop
vm_stop: stopping vm 1
vmd_dispatch_vmm: handling TERMINATE_EVENT for 1
vmd_dispatch_vmm: about to stop vm 1
vm_stop: stopping vm 1

For the non-normal cases:
* Stopping a non-running vm:
** if vm is NULL (means vmd and control has the vm in it's list but vmm
doesn't) -> vmd crash
** if vm is not NULL but is missing a VM managed by vmm (like attempting
to stop a non-running vm that has been configured by vm.conf),
terminate_vm causes a vmd crash

vm_remove does the right thing by attempting to stop the VM first and
then clean up.

The patch above makes the assumption that a vmd instance does not
attempt to recover running VM (from a previous vmd instance) and that
the canonical vm list is maintained by vmd.

One thing that I will admit is that my vm test set for these changes
were limited, using amd64 OpenBSD 6.1 and -current as the guests and
they tended to be well behaved guests (no need for sending vmctl stop
more than once).  If my vm test set was expanded to included to have
some "ill-behaved" guests (do you have/know of any?) they might require
more tests in that area of the code (testing vm, testing vm->vm_shutdown
and vm->vm_running, testing vmid2id, etc) before sending the
VMD_VM_STOP_INVALID message.

I think it might be a good idea to rework this patch to add additional
checks for things if I'm able to add some "ill-behaved" guests that will
exercise those additional checks (if not more).

What do you think?

+--+
Carlos
>>              }
>>              cmd = IMSG_VMDOP_TERMINATE_VM_RESPONSE;
>>              break;
>> @@ -346,6 +345,8 @@ vmm_sighdlr(int sig, short event, void *arg)
>>  
>>                              vmid = vm->vm_params.vmc_params.vcp_id;
>>                              vtp.vtp_vm_id = vmid;
>> +                                log_debug("%s: attempting to terminate vm 
>> %d",
>> +                                        __func__, vm->vm_vmid);
>>                              if (terminate_vm(&vtp) == 0) {
>>                                      memset(&vmr, 0, sizeof(vmr));
>>                                      vmr.vmr_result = ret;
>> @@ -505,7 +506,7 @@ vmm_dispatch_vm(int fd, short event, void *arg)
>>   * supplied vm_terminate_params structure (vtp->vtp_vm_id)
>>   *
>>   * Parameters
>> - *  vtp: vm_create_params struct containing the ID of the VM to terminate
>> + *  vtp: vm_terminate_params struct containing the ID of the VM to terminate
>>   *
>>   * Return values:
>>   *  0: success
>> @@ -515,6 +516,7 @@ vmm_dispatch_vm(int fd, short event, void *arg)
>>  int
>>  terminate_vm(struct vm_terminate_params *vtp)
>>  {
>> +        log_debug("%s: terminating vmid %d", __func__, vtp->vtp_vm_id);
>>      if (ioctl(env->vmd_fd, VMM_IOC_TERM, vtp) < 0)
>>              return (errno);
>>  
>> -- 
>> 2.14.1
>>

Reply via email to