Daniel P. Berrangé <berra...@redhat.com> writes:

> On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berra...@redhat.com> writes:
>>
>> > modify them so that we can load the
>> > files straight into the python interpreter as code, and not parse
>> > them as data. I feel unhappy about treating data as code though.
>>
>> Stress on *can* load.  Doesn't mean we should.
>>
>> Ancient prior art: Lisp programs routinely use s-expressions as
>> configuration file syntax.  They don't load them as code, they read them
>> as data.
>>
>> With Python, it's ast.parse(), I think.
>
> Yes, that could work
>
>
>> > struct: ImageInfoSpecificQCow2
>> > data:
>> >   compat: str
>> >   "*data-file": str
>> >   "*data-file-raw": bool
>> >   "*lazy-refcounts": bool
>> >   "*corrupt": bool
>> >   refcount-bits: int
>> >   "*encrypt": ImageInfoSpecificQCow2Encryption
>> >   "*bitmaps":
>> >     - Qcow2BitmapInfo
>> >   compression-type: Qcow2CompressionType
>> >
>> >
>> > Then we could use a regular off-the-shelf YAML parser in Python.
>> >
>> > The ugliness with quotes is due to the use of "*". Slightly less ugly
>> > if we simply declare that quotes are always used, even where they're
>> > not strictly required.
>>
>> StrictYAML insists on quotes.
>
> I wouldn't suggest StrictYAML, just normal YAML is what pretty much
> everyone uses.
>
> If we came up with a different way to mark a field as optional
> instead of using the magic "*" then we wouldn't need to quote
> anything
Nope.  Stupid test script:

    $ cat test-yaml.py
    #!/usr/bin/python3

    import sys, yaml

    data = yaml.load(sys.stdin, Loader=yaml.CLoader)
    print(data)

Example taken from block.json:

    $ ./test-yaml.py <<EOF
    enum: FloppyDriveType
    data:
      - 144
      - 288
      - 120
      - none
      - auto
    EOF
    {'enum': 'FloppyDriveType', 'data': [144, 288, 120, 'none', 'auto']}

The upper layer will choke on this:

    qapi/block.yaml: In enum 'FloppyDriveType':
    qapi/block.yaml:61: 'data' member requires a string name

You could propose to provide "wouldn't need to quote anything" by
silently converting numbers back to strings.  I got two issues with that.

1. What's the point in switching to an off-the-shelf parser to replace
   less than 400 SLOC if I then have to write non-trivial glue code to
   make the syntax less surprising?

2. Let me trot out the tired Norway problem again.  Say we QAPIfy -k so
   that its argument is an enum, for nice introspection.  Something like

       $ ./test-yaml.py <<EOF
       enum: Keymap
       data:
         - ar
         - cz
         - de
         - de-ch
         - en-gb
         - en-us
         - no
         - pt
         - pt-br
       EOF
       {'enum': 'Keymap', 'data': ['ar', 'cz', 'de', 'de-ch', 'en-gb', 'en-us', False, 'pt', 'pt-br']}

   To which of the eleven ways to say False in YAML should we convert?

YAML and the schema language are fundamentally at odds here: they fight
over types.  YAML lets the value determine the type regardless of
context.  But in the schema, the context determines the type.

>> I hate having to quote identifiers.
>> There's a reason we don't write
>>
>>     'int'
>>     'main'('int', 'argc', 'char' *'argv'[])
>>     {
>>         'printf'("hello world\n");
>>         return 0;
>>     }
>>
>> > struct: ImageInfoSpecificQCow2
>> > data:
>> >   "compat": "str"
>> >   "*data-file": "str"
>> >   "*data-file-raw": "bool"
>> >   "*lazy-refcounts": "bool"
>> >   "*corrupt": "bool"
>> >   "refcount-bits": "int"
>> >   "*encrypt": "ImageInfoSpecificQCow2Encryption"
>> >   "*bitmaps":
>> >     - "Qcow2BitmapInfo"
>> >   "compression-type": "Qcow2CompressionType"
>> >
>> > With the use of "---" to denote the start of document, we have no trouble
>> > parsing our files which would actually be a concatenation of multiple
>> > documents. The Python YAML library provides the easy yaml.load_all()
>> > method.
>>
>> Required reading on YAML:
>> https://www.arp242.net/yaml-config.html
>
> I don't think this is especially helpful to our evaluation. You can write
> such blog posts about pretty much anything if you want to pick holes in a
> proposal. Certainly there's plenty of awful stuff you can write about
> JSON, and Python.

Picking holes in a proposal is precisely what we need to do before we
act on it and rebase our QAPI schema DSL.  That's expensive and painful,
so we'd better not screw it up *again*.

Here's my superficial five-minute assessment of the essay's main points
for our use case:

* Insecure by default

  Valid criticism of existing YAML tools, but hardly relevant for us,
  because 1. "don't do that then", and 2. the QAPI schema is trusted
  input.

* Can be hard to edit, especially for large files

  Valid when you use YAML to describe data.  We would use it as an IDL,
  though.  If parts of the interface description get too deeply nested
  or too long for comfort, we'd better provide means to rearrange it in
  more pleasant ways.  However, if our new base language got
  uncomfortable earlier than the old one, and the existing means to
  rearrange prove too weak (they are pretty weak), then we'd create
  additional work.
  I'm cautiously optimistic that YAML would do okay here.

* It’s pretty complex

  If you go "we'll use only a simple subset", then I go "define the
  subset, and tell me how to enforce the subset".  I have no interest in
  teaching contributors one by one which of the nine ways to write a
  multiline string to avoid, or which of the eleven ways to say False to
  use.

* Surprising behavior

  See "fight over types" above.

* It’s not portable

  Probably irrelevant, because feeding the schema to another YAML parser
  is so unlikely to be useful.

>> Some of the criticism there doesn't matter for our use case.
>
> Yeah, what matters is whether it can do the job we need in a way that is
> better than what we have today, and whether there are any further options
> to consider that might be viable alternatives.

Would it improve things enough to be worth the switching pain?
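To make the "non-trivial glue code" from point 1 a bit more concrete, here is a minimal sketch of the check a YAML-based frontend would need before handing expressions to the upper layer.  It assumes the loader returns plain Python dicts, lists and scalars, exactly like the test-yaml.py output above.  The names non_string_scalars, check_expr and possible_spellings are hypothetical, not part of the existing QAPI generator.  The boolean spellings follow the YAML 1.1 bool type; a given parser may recognize only a subset of them.

```python
from typing import Any, Iterator, List, Tuple

# YAML 1.1 bool spellings (https://yaml.org/type/bool.html).  Assumption:
# a particular parser may implement only some of them.
YAML11_TRUE = ("y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE",
               "on", "On", "ON")
YAML11_FALSE = ("n", "N", "no", "No", "NO", "false", "False", "FALSE",
                "off", "Off", "OFF")


def non_string_scalars(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every scalar in node that is not a str."""
    if isinstance(node, dict):
        for key, value in node.items():
            # Keys are scalars too: an unquoted "no:" key loads as False.
            if not isinstance(key, str):
                yield f"{path}/<key {key!r}>", key
            yield from non_string_scalars(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from non_string_scalars(item, f"{path}[{index}]")
    elif not isinstance(node, str):
        # bool, int, float, None, dates, ...: the loader chose the type.
        yield path, node


def check_expr(expr: dict) -> List[str]:
    """Return one error message per non-string scalar in a schema expression."""
    return [f"{path}: expected a string, YAML produced {value!r}"
            for path, value in non_string_scalars(expr)]


def possible_spellings(value: bool) -> Tuple[str, ...]:
    """List the plain scalars a full YAML 1.1 resolver would load as value."""
    return YAML11_TRUE if value else YAML11_FALSE


# The two loader outputs printed by test-yaml.py above.
keymap = {'enum': 'Keymap',
          'data': ['ar', 'cz', 'de', 'de-ch', 'en-gb', 'en-us', False,
                   'pt', 'pt-br']}
floppy = {'enum': 'FloppyDriveType', 'data': [144, 288, 120, 'none', 'auto']}

for expr in (keymap, floppy):
    for error in check_expr(expr):
        print(error)
# /data[6]: expected a string, YAML produced False
# /data[0]: expected a string, YAML produced 144
# /data[1]: expected a string, YAML produced 288
# /data[2]: expected a string, YAML produced 120
```

Detecting the damage is the easy half.  Once the loader has returned False, the original spelling is gone: possible_spellings(False) can only list the eleven candidates, and picking the right one is exactly the conversion question from point 2.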