Michal Privoznik <mpriv...@redhat.com> writes: > On 8/29/19 3:12 PM, Eric Blake wrote: >> On 8/29/19 8:04 AM, Michal Privoznik wrote: >> >>>>> A bit of background: up until very recently libvirt used qemu-ga >>>>> in all or nothing way. It didn't care why a qemu-ga command >>>>> failed. But very recently a new API was introduced which >>>>> implements 'best effort' approach (in some cases) and thus >>>>> libvirt must differentiate between: {CommandNotFound, >>>>> CommandDisabled} and some generic error. While the former classes >>>>> mean the API can issue some other commands the latter raises a >>>>> red flag causing the API to fail. >>>> >>>> Why do you need to distinguish CommandNotFound from CommandDisabled? >>> >>> I don't. That's why I've put them both in curly braces. Perhaps this >>> says its better: >>> >>> switch (klass) { >>> case CommandNotFound: >>> case CommandDisabled: >>> /* okay */ >>> break; >>> >> >> So the obvious counter-question - why not use class CommandNotFound for >> a command that was disabled, rather than readding another class that has >> no distinctive purpose? >> >> > > Because disabling a command is not the same as nonexistent > command. While a command can be disabled by user/sysadmin, they are > disabled at runtime by qemu-ga itself for a short period of time > (e.g. on FS freeze some commands are disabled - typically those which > require write disk access). And I guess reporting CommandNotFound for > a command that does exist only is disabled temporarily doesn't reflect > the reality, does it? > > On the other hand, CommandNotFound would fix the issue for libvirt, so > if you don't want to invent a new error class, then that's the way to > go.
I'm fine with changing the error to CommandNotFound. I'm reluctant to add back CommandDisabled. I doubt it's necessary. To arrive at an informed opinion, I had to figure out how this command disablement stuff works. I can just as well send it out, so here goes. Let's review our command disable feature. Commands are enabled on registration, see qmp_register_command(). To disable, call qmp_disable_command(). Only qga/main.c does, in two places: * ga_disable_non_whitelisted(): disable all commands except for ga_freeze_whitelist[], which is documented as /* commands that are safe to issue while filesystems are frozen */ * initialize_agent(): disable blacklisted commands. I figure these are the ones blacklisted with -b, plus commands blacklisted due to build configuration. The latter feels inappropriate; we should use QAPI schema conditionals to compile them out instead (QAPI conditionals didn't exist when the blacklisting code was written). Disabled commands can be re-enabled with qmp_enable_command(). Only qga/main.c does, in ga_enable_non_blacklisted(). I figure it re-enables the commands ga_disable_non_whitelisted() disables. Gets called when guest-fsfreeze-freeze freezes nothing[1], and when guest-fsfreeze-thaw succeeds[2]. Command dispatch fails when the command is disabled, in do_qmp_dispatch(). The proposed patch changes the error reply. QGA's guest-info shows whether a command is disabled (GuestAgentCommandInfo member @enabled, set in qmp_command_info()). QMP's query-commands skips disabled commands, in query_commands_cb(). Dead, as nothing ever disables QMP commands. Skipping feels like a bad idea anyway. Analysis: There are three kinds of disabled commands: compile-time (should be compiled out instead), permanently blacklisted with -b, temporarily disabled while filesystems are frozen. There are two states: thawed (first two kinds disabled) and frozen (all three kinds disabled). Command guest-fsfreeze-freeze[3] goes to state frozen or else fails. Command guest-fsfreeze-thaw goes to state thawed or else fails. guest-fsfreeze-status reports the state. Note that the transition to frozen (and thus the temporary command disablement) is under the control of the QGA client. There is no TOCTTOU between guest-info telling you which commands are disabled and executing the next command. My point is: the client can figure out whether a command is disabled before executing it. Of course, that doesn't mean we should make it figure it out. [1] POSIX only, WTF? [2] Except for execute_fsfreeze_hook(), which can still fail the command on POSIX, WTF? [3] guest-fsfreeze-freeze's doc comment notes "The frozen state is limited for up to 10 seconds by VSS." Sounds like some spontaneous transition back to thawed. If this is actually true, GAState member @frozen is not updated to reflect the spontaneous thaw. WTF?