On Wed, 27 Feb 2019 01:35:01 +0530 Kirti Wankhede <kwankh...@nvidia.com> wrote:
> Alex,
> 
> On 2/22/2019 3:53 AM, Alex Williamson wrote:
> > On Wed, 20 Feb 2019 02:53:16 +0530
> > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> > 
> >> - Defined MIGRATION region type and sub-type.
> >> - Used 2 bits to define VFIO device states.
> >>   Bit 0 => 0/1 => _STOPPED/_RUNNING
> >>   Bit 1 => 0/1 => _RESUMING/_SAVING
> >>   Combination of these bits defines VFIO device's state during migration
> >>   _RUNNING => Normal VFIO device running state.
> >>   _STOPPED => VFIO device stopped.
> >>   _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
> >>                         starts saving state of device, i.e. pre-copy state
> >>   _SAVING | _STOPPED => vCPUs are stopped, VFIO device should be stopped,
> >>                         and save device state, i.e. stop-and-copy state
> >>   _RESUMING => VFIO device resuming state.
> > 
> > Shouldn't we have a non-_RESUMING/_SAVING run state? If these are
> > indicating direction of flow, maybe we need two bits:
> > 
> > 00b - None, normal runtime
> > 01b - Saving
> > 10b - Resuming
> > 11b - Invalid/reserved (maybe a Failed state indicator)
> > 
> 
> There have to be 2 more states:
> _SAVING | _RUNNING => SAVING while device is RUNNING - pre-copy phase
> _SAVING | _STOPPED => SAVING while device is STOPPED - stop-and-copy phase
> 
> So the 2 bits used in this patch are:
> 00b - _RESUMING
> 01b - _RUNNING - Normal Running
> 10b - _SAVING | _STOPPED - stop-and-copy phase
> 11b - _SAVING | _RUNNING - pre-copy phase

Not sure if the "use two bits" approach is the most elegant here --
rightmost bit unset can mean either _RESUMING or _STOPPED... What about
simply making these four distinct states?

#define _RESUMING       0
#define _RUNNING        1 // or the other way around?
#define _SAVING_STOPPED 2
#define _SAVING_RUNNING 3

You could still check whether you're currently SAVING by ANDing with 10b.

(...)

> >> + *
> >> + * data_offset: (read/write)
> >> + *     User application should read data_offset in the migration region,
> >> + *     from where the user application should read data during _SAVING
> >> + *     state.
> >> + *     User application would write data_offset in the migration region,
> >> + *     from where the user application had written data during _RESUMING
> >> + *     state.
> > 
> > Why wouldn't data_offset simply be fixed? Do we really want to support
> > the user writing arbitrary offsets in the migration region? Each
> > chunk can simply start at the kernel provided data_offset.
> > 
> 
> data_offset differs based on whether the region is trapped or mmapped. In
> case the region is mmapped, QEMU writes data to the mmapped region and
> then writes the data_offset field, indicating data is present in the
> mmapped buffer. Read below for more detailed steps.
> 
> >> + *
> >> + * data_size: (write only)
> >> + *     User application should write the size of the data copied in the
> >> + *     migration region during _RESUMING state.
> > 
> > How much does the user read on _SAVING then?
> > 
> 
> pending_bytes tells how many bytes should be read on _SAVING.
> 
> >> + *
> >> + * start_pfn: (write only)
> >> + *     Start address pfn to get bitmap of dirty pages from vendor driver
> >> + *     during _SAVING state.
> >> + *
> >> + * page_size: (write only)
> >> + *     User application should write the page_size of pfn.
> >> + *
> >> + * total_pfns: (write only)
> >> + *     Total pfn count from start_pfn for which dirty bitmap is requested.
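
To check my understanding of the flow, here's a rough sketch of what the
userspace side of one _SAVING iteration and one dirty bitmap request
would look like, assuming a trapped (non-mmapped) region. The MIG_*
field offsets and the helper names are made up for illustration (the
real layout would come from struct vfio_device_migration_info in this
patch), and error handling is omitted for brevity:

#include <stdint.h>
#include <unistd.h>

/* Hypothetical offsets of the fields within the migration region;
 * the actual values would be defined by the proposed uAPI struct. */
#define MIG_PENDING_BYTES 0x08
#define MIG_DATA_OFFSET   0x10
#define MIG_START_PFN     0x20
#define MIG_PAGE_SIZE     0x28
#define MIG_TOTAL_PFNS    0x30

/* One _SAVING iteration: ask how much data the vendor driver has
 * pending, where in the region it starts, and read it from there. */
static ssize_t save_one_chunk(int device_fd, uint64_t region_base,
                              void *buf, size_t buf_size)
{
    uint64_t pending = 0, data_offset = 0;
    size_t chunk;

    pread(device_fd, &pending, sizeof(pending),
          region_base + MIG_PENDING_BYTES);
    if (!pending)
        return 0;

    pread(device_fd, &data_offset, sizeof(data_offset),
          region_base + MIG_DATA_OFFSET);

    chunk = pending < buf_size ? pending : buf_size;
    return pread(device_fd, buf, chunk, region_base + data_offset);
}

/* Requesting the dirty bitmap: all three fields are write only, so
 * presumably writing total_pfns is what triggers the vendor driver. */
static void request_dirty_bitmap(int device_fd, uint64_t region_base,
                                 uint64_t start_pfn, uint64_t page_size,
                                 uint64_t total_pfns)
{
    pwrite(device_fd, &start_pfn, sizeof(start_pfn),
           region_base + MIG_START_PFN);
    pwrite(device_fd, &page_size, sizeof(page_size),
           region_base + MIG_PAGE_SIZE);
    pwrite(device_fd, &total_pfns, sizeof(total_pfns),
           region_base + MIG_TOTAL_PFNS);
}

If that matches what you have in mind, it might be worth spelling the
sequence out explicitly in the comment block.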
> > 
> > So effectively the area that begins at data_offset is dual purpose
> > (triple purpose vs Yan's, config, memory, and dirty bitmap)

I'm wondering whether splitting it up would combine the best of the two
approaches: Just having an optional dirty bitmap region would make it
more obvious if dirty page tracking is not available.
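
For instance, the type/subtype region capability that device-specific
regions already use would let userspace probe for it cleanly. A sketch
of what that could look like, where the MIGRATION type and DIRTY_BITMAP
subtype values are invented for illustration (the capability chain walk
itself is existing uAPI):

#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

#define REGION_TYPE_MIGRATION       3 /* hypothetical */
#define REGION_SUBTYPE_DIRTY_BITMAP 2 /* hypothetical */

static bool region_is_dirty_bitmap(int device_fd, __u32 index)
{
    char buf[4096];
    struct vfio_region_info *info = (void *)buf;
    struct vfio_info_cap_header *hdr;

    memset(buf, 0, sizeof(buf));
    info->argsz = sizeof(buf);
    info->index = index;
    if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info) < 0 ||
        !(info->flags & VFIO_REGION_INFO_FLAG_CAPS))
        return false;

    /* Walk the capability chain looking for the type/subtype cap. */
    for (hdr = (void *)(buf + info->cap_offset); ;
         hdr = (void *)(buf + hdr->next)) {
        if (hdr->id == VFIO_REGION_INFO_CAP_TYPE) {
            struct vfio_region_info_cap_type *cap = (void *)hdr;

            return cap->type == REGION_TYPE_MIGRATION &&
                   cap->subtype == REGION_SUBTYPE_DIRTY_BITMAP;
        }
        if (!hdr->next)
            return false;
    }
}

A device that does not implement dirty page tracking simply would not
expose such a region, so userspace could fail or fall back at setup
time instead of finding out mid-migration.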