Peter Xu <pet...@redhat.com> writes:

> On Wed, Oct 11, 2023 at 04:21:02PM +0200, Markus Armbruster wrote:
>> > I'll have another look at the schema change, and how the types are used.
>> 
>> Schema changes:
>> 
>> 1. Change MigrationParameters members @tls-creds, @tls-hostname,
>>    @tls-authz from 'str' to 'StrOrNull'
>> 
>> 2. Replace MigrateSetParameters by MigrationParameters.
>> 
>>    No change, since they are identical after change 1.
>> 
>> To determine the patch's impact, we need to examine uses of
>> MigrationParameters members @tls-FOO before the patch.  These are:
>> 
>> * Return type of query-migrate-parameters
>> 
>>   Introspection shows the type change: the type's set of values now
>>   includes JSON null.
>> 
>>   Is JSON null possible?  See [*] below.
>> 
>> * migrate_params_init()
>> 
>>   Before the patch, we initialize to "".
>> 
>>   Afterwards, we initialize to "" wrapped in a StrOrNull.
>> 
>>   The initial value means "off" before and after.
>> 
>> * migrate_params_check()
>> 
>>   An error check gets updated.  Ignoring for now.
>> 
>> * migrate_params_test_apply()
>> 
>>   Function deleted in the patch, but you wrote that's wrong.  Ignoring
>>   for now.
>> 
>> * migrate_params_apply()
>> 
>>   Duplicates the three parameters from argument @parameters into the
>>   migration object's member parameters.
>> 
>>   Argument @parameters comes from QMP via command
>>   migrate-set-parameters.  Before the patch,
>>   qmp_migrate_set_parameters() maps JSON null values to "".  Afterwards,
>>   it passes the values verbatim.
>> 
>>   Parameters stored in the migration object before and after the patch:
>> 
>>   - When initialized and never changed: char * "", and StrOrNull
>>     QTYPE_QSTRING "".
>> 
>>   - When set to non-empty string with migrate-set-parameters or
>>     equivalent: that non-empty string, and QTYPE_QSTRING with that
>>     non-empty string.
>> 
>>   - When reset with migrate-set-parameters with value "": "", and
>>     QTYPE_QSTRING "".
>> 
>>   - When reset with migrate-set-parameters with value null: "", and
>>     QTYPE_QNULL.
>> 
>>   Note that there's now a difference between passing "" and null to
>>   migrate-set-parameters: the former results in value QTYPE_QSTRING "",
>>   the latter QTYPE_QNULL.  Both values mean "off".  I hate this.  I very
>>   much want a single C representation of "off".
>
> Yes.
>
> One option to avoid such ugliness is we keep using strings for tls fields,
> then the only OFF is empty string.  It is not perfect either, but we need
> to support empty string anyway as OFF.  That's the simplest to me.
>
> We also have the benefit of decoupling this series from the qapi string
> visitor issue, which means I'll just revive v1 which still doesn't use
> StrOrNull.

Another option is to keep the schema as is, but normalize the two "off"
values to just one value on assignment.

>> 
>> * MigrationState member @parameters.
>> 
>>   Uses:
>> 
>>   - Properties "tls-creds", "tls-hostname", "tls-authz"
>> 
>>     These are externally accessible with -global.  The additional null
>>     value is not accessible there: string input visitor limitation.  It
>>     could become accessible depending on how we fix the crash bugs
>>     related to that limitation, but we can worry about that when we do
>>     it.
>> 
>>     Digression: why do these properties even exist?  I believe we
>>     created the "migration" (pseudo-)device just so we can use "compat
>>     props" to apply machine- and accelerator-specific configuration
>>     tweaks.  We then added configuration for *all* configuration
>>     parameters, not just the ones that need tweaking.  The external
>>     exposure of properties via -global is not something we wanted, it
>>     just came with the part we wanted (compat props).  Accidental
>>     external interface.  Ugh.
>
> IIRC both of them used to be the goals: either allow compat properties for
> old machine types, or specify migration parameters in cmdline for easier
> debugging and tests.  I use "-global" in mostly every migration test script
> after it's introduced.

You use -global just because it's easier than using monitor commands,
correct?

>>     None of the tls-FOO are tweaked via compat props, so no worries
>>     there.
>> 
>>     I believe property access with qom-get and qom-set is not possible,
>>     because the migration object is not part to the QOM tree, and
>>     therefore is not reachable via any QOM path.  Aside: feels like
>>     abuse of QOM.
>
> Yes slightly abused, as the comment shows..
>
> static const TypeInfo migration_type = {
>     .name = TYPE_MIGRATION,
>     /*
>      * NOTE: TYPE_MIGRATION is not really a device, as the object is
>      * not created using qdev_new(), it is not attached to the qdev
>      * device tree, and it is never realized.
>      *
>      * TODO: Make this TYPE_OBJECT once QOM provides something like
>      * TYPE_DEVICE's "-global" properties.
>      */
>     .parent = TYPE_DEVICE,
>
> It's just that current code works mostly as expected without much churns
> elsewhere.. so the TODO is not really getting much attention..

I see a number of abuses:

1. We're using TYPE_DEVICE for something that clearly isn't a device.

2. The device is not connected to the qdev device tree rooted at the
main system bus.  I'm not sure this qualifies as abuse.  The connection
the device tree supports is plugging into a bus, which bus-less devices
cannot do.  Instead, they are commonly part of other devices.  There
might be exceptions.  The qdev device tree feels like a relic of the
pre-QOM past.

3. We're keeping the device around unrealized.  No written contract
forbids this, but it's highly irregular.

4. The object is not connected to the QOM composition tree.  No written
contract forbids this, but it's highly irregular, especially for an
object whose purpose is to provide an external configuration interface.

The comment is about the abuse of qdev (items 1 to 3).  Even with the
TODO comment fully addressed, we'd still abuse QOM (item 4).

>> 
>>     It's also not part of the device tree rooted at the main system bus,
>>     which means it isn't visible in "info qtree".  It is visible in
>>     "info qdm", "device_add migration,help", and "-device
>>     migration,help".  Output of the latter two changes.  All harmless.
>> 
>>     I *think* that's all.
>> 
>>   - migrate_tls(), migrate_tls_authz(), migrate_tls_creds(),
>>     migrate_tls_hostname()
>> 
>>     Before the patch, these return the respective migration parameter
>>     directly.  I believe the value is never NULL.  Value "" is special
>>     and means "off".
>> 
>>     After the patch, these return the respective migration parameter
>>     when it's a non-empty QTYPE_QSTRING, else NULL.  Value NULL means
>>     off.
>> 
>>     Note this maps both C representations of "off" to NULL.
>> 
>>     This changes the return value for "off" from "" to NULL.
>>     Improvement, because it results in a more pleasant "is off" check.
>
> Maybe more than pleasant. :) Internally NULL is a must (even if empty
> string), to make things like "if (tls_authz)" work.

Three possible interfaces, and the resulting tests:

* Non-empty string when on, "" when off:
  if (*tls_authz)

* Non-empty string when on, NULL when off:
  if (tls_authz)

* Non-empty string when on, NULL or "" when off:
  if (tls_authz && *tls_authz)
  
All three work equally well, but the second one is more pleasant than
the others.

>>   - qmp_query_migrate_parameters()
>> 
>>     The three tls_FOO get duplicated into the return value.
>> 
>>     Looks like the two different C representations of "off" bleed into
>>     QMP (ugh!), and [*] JSON null is possible (incompatible change).
>> 
>> * hmp_info_migrate_parameters()
>> 
>>   The two different C representations of "off" are first mapped to NULL
>>   with str_from_StrOrNull(), and then mapped to "" with a ?: operator.
>>   Works.
>> 
>> Bottom line:
>> 
>> * Affected external interfaces:
>> 
>>   - query-migrate-parameters: can now return either "" or null when TLS
>>     is off.  null is an incompatible change.  Needs fixing.
>
> If we want to seriously keep using StrOrNull as the interface, I think we
> need to allow both NULL or "" to exist?  Because the "logical goal" is to
> use NULL but "" for compatibility?

The interface was ill-conceived from the start.  Probably so it could be
shoehorned more easily into existing migraion configuration
infrastructure.

We have three syntactically independent parameters: @tls-creds,
@tls-hostname, @tls-authz.  The latter two are meaningless (and silently
ignored) unless the first one is given.  I hate that.

TLS is off by default.  To enable it, set @tls-creds.  Except setting it
to the otherwise invalid value "" disables it.  That's because all we
have for configuration is the "set migration parameter to value"
interface.  Only works because we have an invalid value we can abuse.  I
hate that.

I found this interface days before the freeze, too late to replace it by
a saner one.

> I worry we spend a lot of time working on the NULL/"" but then found nobody
> is using NULL anyway.

I naively assumed deprecating "" would get null used.

>>   - query-qmp-schema: shows null is now possible.  Correctly reflects
>>     the backward incompatible change.  If we fix compatibility break, we
>>     get a tolerable loss of typing precision instead.
>> 
>> 2. Two different C representations of "off".  Strong dislike.  I
>>    recommend to fix the compatibility break by switching to a single C
>>    representation.
>
> IIUC the goal is to switch to StrOrNull, but that face challenge.
>
>> 
>> Thoughts?
>
> If I'd vote (based on our current code), I'd double check tls* fields are
> working properly with libvirt on empty strings, then switch all StrOrNull
> in migration parameters to strings, for both QAPI and QEMU internally.
> Then making sure all migrate_tls_*() helpers do proper job on converting ""
> to NULL.

Accept cleanup of this bad interface is not practical.  Cut our losses
and run.

> If we design a new interface, just remember to use NULL (hopefully) to be
> clear and single way to say "OFF" starting from the 1st day.

Migration configuration and control is in an ugly state.

Configuration is almost entirely special (own QMP commands for
everything), with a little abuse of general infrastructure stirred in
(-global, compat props).  Having capabilities in addition to parameters
is a useless complication.  Too many features of questionable utility
with way too many configuration knobs.

Control is entirely special.  Solves the largely same problems as jobs
do, just different.

Is there hope for improvement, or should I hold my nose and look the
other way?

> But that's only my 2 cents.
>
> Thanks,


Reply via email to