Re: [Qemu-devel] Jobs 2.0 QAPI [RFC]

John Snow Fri, 18 Dec 2015 13:26:19 -0800


On 12/18/2015 01:07 PM, Eric Blake wrote:
> On 12/16/2015 05:50 PM, John Snow wrote:
>> In working through a prototype to enable multiple block jobs. A few
>> problem spots in our API compatibility become apparent.
>>
> 
>> ==== qmp_query_block_jobs ====
>>
> 
>> Let's consider using a new command. I'm not sure if this is strictly
>> possible with current QAPI, but Eric will yell at me if it isn't --
>> forgive the wiggly pseudo-specification.
>>
>> query-jobs
>>
>> Will return a list of all jobs.
>>
>> query-jobs sys="block"
>>
>> Will return a list of all block jobs. This will be the only valid
>> subsystem to start with -- we don't have non-block jobs yet and it may
>> be some time before we do.
>>
>> Each subsystem will have a sys= enumeration, and the jobs for that
>> particular subsystem can subclass the default return type. The QAPI
>> commands can be defined to accept a union of the subclasses so we don't
>> have to specify in advance which type each command accepts, but we can
>> allow the responsible subsystem to interpret it on demand dynamically.
>>
>> The base type can be something along the lines of:
>>
>> { 'struct': 'JobInfo',
>>   'data': {
>>     'type': JobTypeEnum,
>>     'sys': JobSysEnum,
>>     'id': str,
>>     'busy': bool,
>>     'paused': bool,
>>     'ready': bool
>>   }
>> }
>>
>> And the BlockJobInfo can inherit/extend, looking like this:
>>
>> { 'struct': 'BlockJobInfo',
>>   'base': JobInfo
>>   'data': {
>>     'len': int,
>>     'offset': int,
>>     'speed': 'int',
>>     'io-status': BlockDeviceIoStatus,
>>     'devices': <see below>,
>>   }
>> }
> 
> Done in QAPI via flat unions.  All the common fields, including the
> discriminator 'sys', are present in the base type, and then each
> subclass of a job adds additional parameters to the QMP type by
> specifying their subtype as a branch of the union.  So should be doable.
> 
>>
>> 'devices' will now report a more nuanced view of this Job's associated
>> nodes in our graph, as in the following:
>>
>> { "devices": [
>>   { "name": "drive0",
>>     "nodes": [
>>       { "id": "#node123",
>>         "permissions": "+RW-X"
>>       },
>>       { "id": "#node456",
>>         "permissions": "+R"
>>       }
>>     ]
>>   }
>> }
>>
>> This lets us show exactly what nodes we are touching, what BlockBackends
>> they belong to, and what permissions are involved. Unambiguously. You
>> can use your own imagination for a permissions string that will make
>> sense -- This depends on Jeff Cody's work primarily.
> 
> We'd probably want it to look more JSON-y, maybe something like:
> 
> { "id": "#node456",
>   "permissions": [ { "name":"read", "mode":"require" },
>                    { "name":"write", "mode":"allow" },
>                    { "name":"meta", "mode":"unspecified" } ] }
> 
> but I agree that how it is represented does not affect the proposal here
> of merely representing a list of all associated BDS affected by this
> job, broken out by permissions per BDS (as some jobs will affect
> multiple BDS, and not all of them with the same permissions).
>


And as not shown here, some may affect multiple devices, so this should
give the full granularity.

The more JSON-y permissions is fine, also. It will likely evolve to
match whatever Jeff Cody needs for his fine-grained op blocker idea.

>>
>> The new command gives a chance to make a clean break and any users of
>> this new command won't be confused about the information they receive.
>>
>> The legacy qmp-query-block-jobs command can continue to operate
>> providing a best-effort approximation of the data it finds. The legacy
>> command will *abort* and return error if it detects any job that was
>> launched by a new-style command, e.g. if it detects there is more than
>> one job attached to any node under a single BlockBackend -- clearly the
>> user has been using new features -- and should abort announcing this
>> command is deprecated.
> 
> Makes sense. If an old client only uses the old commands, things will
> still work; while a new client, once converted to use the new commands,
> should do the conversion completely and not rely on the old view.
> 

Yes. All or nothing. I prefer this approach to an incomplete report. I
shouldn't have used "best-effort" above; I should have said:

"The legacy command will report back jobs if it has sufficient
resolution to do so." e.g. no conflicting device or BDS id results to
the query.

I don't very much like the idea of a query command that can give you
incomplete data, especially to an unwitting querier.

>>
>> If the graph looks reasonably plain, it can report back the information
>> it always has, except that it will now also report the "id" field in
>> addition to be perfectly unambiguous about how to issue commands to this
>> job, like I outlined in the first paragraph of this section.
> 
> Old clients won't care about the new information, but it doesn't hurt to
> supply it, especially if it lets us share code with the new query command.
> 

That's the thought. The wrapper will investigate the list for
suitability and terminate if it finds duplicate BB/BDS IDs.

Though, I'll need to fill in the legacy device field anyway, so I could
just strip out the "devices" graph

>>
>>
>> ==== block-job-cancel ====
>>
>> This command can remain as a legacy command. Provided the "device"
>> provided has precisely one job attached, we can allow this command to
>> cancel it. If more complicated configurations are found, abort and
>> encourage the user to use a new API.
> 
> Again, won't affect old client usage patterns, and new clients should
> make the lock-step conversion to using only the new interface.  Sounds
> reasonable.
> 
>>
>> We can refuse to augment block-job-cancel; forcing users who want to
>> specify IDs to cancel to use the new API.
>>
>> We can add a new "job-cancel" command which removes the block specificity.
> 
> Adding a new job-cancel sounds right.
> 
>>
>> We can introduce sub-classed arguments as needed for e.g. BlockJob
>> cancellations:
>>
>> { 'command': 'job-cancel',
>>   'data': { 'id': str,
>>             '*force': bool,
>>             'options': JobCancelOpts
>>   }
>> }
>>
>> The idea is that JobCancelOpts would be a subclassable type that can be
>> passed directly to the subsystem's implementation of the cancel
>> operation for further interpretation, so different subsystems can get
>> relevant arguments as needed. We don't currently have any such need for
>> BlockJobCancelOpts, but it can't hurt to be prepared for it.
> 
> To be subclassable, we need a flat union, and the discriminator must be
> non-optional within that union.  Either 'options' is that flat union
> (and we have a layer of {} nesting in QMP}, or we directly make the
> 'data' of job-cancel be the flat union (no need for an "options":{}
> nesting in QMP); the latter still depends on some of my pending patches,
> but we'll get there in plenty of time.
> 

Ah, shame. It would have been nice to delay interpretation of the union
pending the result of the Job lookup.

If the discriminator must always be present, though, then so be it.

I am fine with either approach; the approach outlined above was a hope
to delay the evaluation of the union to allow for type resolution based
on the ID. If that's a non-starter, I'm fine with the flat union:

{ 'id': str,
  'sys': str,
  '*force': bool,
  <subclass options>
}

where if the 'sys' of the job you find is not the sys you supplied, then
it errors out with "Wrong subsystem ID provided for job '#job123',
expected 'blk', given 'abc'" or so.

>>
>> The type can be interpreted through the lens of whichever type the job
>> we've identified expects to see. (Block Jobs expect to see
>> BlockJobCancelOpts, etc.)
>>
>>
>> ==== block-job-pause, block-job-resume, block-job-complete ====
>>
>> Same story as above. Introduce job-pause, job-resume and job-complete
>> with subclassible job structures we can use for the block subsystem.
>>
>> Legacy commands continue to operate only as long as the "device"
>> argument is unambiguous to decipher.
> 
> Seems reasonable. As you surmised, I'm willing to help come up with the
> proper QAPI representation, if that becomes a sticking point in the design.
> 

I'll give it a college try and send some actual patches for x-job-query,
x-job-cancel, etc.

For the flat union support on the top level as hinted for job-cancel,
what branch of yours should I develop on top of?

>>
>>
>> ==== block-job-set-speed ====
>>
>> This one is interesting since it definitely does apply only to Block Jobs.
>>
>> We can augment it and limp it along, allowing e.g.
>>
>> { 'data':
>>   { '*device': 'str',
>>     '*id': 'str',
>>     'speed': 'int'
>>   }
>> }
>>
>> Where we follow the "One of ID or Device but not both nor either"
>> pattern that we've applied in the past.
> 
> Or even "at least one of ID or Device, and if you pass both, they must
> both resolve to the same object".  But "exactly one" works - old clients
> will pass "device", and it will always resolve (because they never used
> the new API to confuse things); new clients will pass only "id" and it
> will be the right thing.
> 

Less work and never potentially ambiguous.

>>
>> Or, we can say "If there isn't one job per device, reject the command
>> and use the new API"
>>
>> where the new API is:
>>
>> job-set-option :
>>   { 'id': str,
>>     'options': {
>>       '*etc': ...,
>>       '*etc2': ...,
>>     }
>>   }
> 
> That has a bit more flexibility, especially if we add new options for
> other commands, more than just block-job speed.
> 
>>
>> Similar to the other commands, the idea would be to take the subclassed
>> options structure that belongs to that Job's Subclass (e.g.
>> BlockJobOptions) but allow any field inside to be absent.
>>
>> Then we could have commands like this:
>>
>> job-set-option \
>> arguments={ 'id': '#job789',
>>             'options': {
>>               'speed': 10000
>>             }
>> }
>>
>> And so on. These would remain a flexible way of setting any various
>> post-launch options a job would need, and the relevant job-subsystem can
>> be responsible for rejecting certain options, etc.
> 
> Keeping type-safety via a flat union may require it to look more like:
> 
> job-set-option \
> arguments={ 'id': '#job789',
>             'sys': 'block',
>             'speed': 10000
>           }
> 
> where the use of the 'sys' discriminator tells what other fields are
> being supplied, and we can avoid the "options":{} nesting.  Then we'd
> need a sanity check in the code that the 'sys' requested by
> job-set-option matches the sys that owns '#job789', and fail if they differ.
> 

Yes, I was assuming again we could resolve the job first and then
interpret the data, but if it's easiest to always require the
discriminator (instead of having to /look/ for it dynamically), then
your outline is perfectly acceptable to me.

>>
>> I believe that covers all the existing jobs interface in the QMP, and
>> that should be sufficiently vague and open-ended enough to behave well
>> with a generic jobs system if we roll one out in the future.
>>
>> Eric, Markus: Please inform me of all the myriad ways my fever dreams
>> for QAPI are wrong. :~)
> 
> At this stage, your concepts seem reasonable, even if the concrete way
> of representing a subclass in qapi is not quite spelled out.
> 

Thanks!

I'll get rolling on some x-versions to mix alongside my multi-block-job
patches.

--js

Re: [Qemu-devel] Jobs 2.0 QAPI [RFC]

Reply via email to