Kirill: Thank you for your constructive criticism. This is the gem that made it worthwhile to post my document. I think all of your points are spot-on, and I will be fixing the documentation.
I can well believe that the C implementation of YAML is much faster than the Python one, but I am aiming for something that will be reasonably quick in pure Python. I will double-check the JSON C test results, but something I probably did not make clear is that the 22 seconds is not spent parsing -- that is for the entire test, which involves reading reStructuredText and generating some 160 separate PDF files.

Best regards,
Pat

On Mon, Mar 1, 2010 at 8:02 PM, Kirill Simonov <x...@gamma.dn.ua> wrote:
> Patrick Maupin wrote:
>>
>> All:
>>
>> Finding .ini configuration files too limiting, JSON and XML too hard to
>> manually edit, and YAML too complex to parse quickly, I have started
>> work on a new configuration file parser.
>
> I'd like to note that with the optional libyaml bindings, the PyYAML parser
> is pretty fast.
>
>> I call the new format RSON (for "Readable Serial Object Notation"),
>> and it is designed to be a superset of JSON.
>>
>> I would love for it to be considered valuable enough to be a part of
>> the standard library, but even if that does not come to pass, I would
>> be very interested in feedback to help me polish the specification,
>> and then possibly help for implementation and testing.
>>
>> The documentation is in rst PEP form, at:
>>
>> http://rson.googlecode.com/svn/trunk/doc/draftpep.txt
>
> === cut ===
> Because YAML does allow for highly readable configuration files, it
> is tempting to overlook its other flaws for the task. But a fully
> (or almost) compliant parser has to understand the whole YAML
> specification, and this is apparently expensive. Running the rst2pdf
> testsuite, without sphinx or most of the other optional packages, in
> "fast" mode (preloading all the modules, and then forking for every
> test) generates 161 smallish PDF files, totaling around 2.5 MB. On
> one test system this process takes 22 seconds.
> Disabling the _json C
> scanner and reading the configuration files using the json pure Python
> implementation adds about 0.3 seconds to the 22 seconds. But using
> pyyaml v. 3.09 instead of json adds 33 seconds to the 22-second process!
> It might seem that this is an edge case, but it makes it unacceptable to
> use YAML for this sort of testing, and taking 200 ms to read in 1000
> lines of simple JSON will be unacceptable in many other application
> domains as well.
> === cut ===
>
> I'd question your testing methodology. From your description, it looks like
> the _json speedup never was enabled. Also PyYAML provides optional bindings
> to libyaml, which makes parsing and emitting yaml much faster. In my tests,
> it parses a 10 MB file in 3 sec.
>
> === cut ===
> RSON semantics are based on JSON. Like JSON, an RSON document represents
> either a single scalar object, or a DAG (Directed Acyclic Graph), which
> may contain only a few simple data types.
> === cut ===
>
> JSON doesn't represent a DAG, at least, not an arbitrary DAG since each node
> in the document has no more than one parent. It would be more accurate to
> say that it represents a tree-like structure.
>
> === cut ===
> The YAML syntax for supporting back-references was considered and deemed
> unsatisfactory. A human user who wants to put identical information in a
> "ship to" and "bill to" address is much more likely to use cut and paste
> than he is to understand and use backreferences, so the additional overhead
> of supporting more complex document structures is unwarranted.
>
> The concept of a "merge" in YAML, where two sub-trees of data can be
> merged together (similar to a recursive Python dictionary update)
> is quite useful, though, and will be copied. This does not alter the
> outcome that parsing an RSON file will result in a DAG, but does give
> more flexibility in the syntax that can be used to achieve a particular
> output DAG.
> === cut ===
>
> This paragraph assumes the reader is familiar with intricate details of the
> YAML grammar and semantics. I bet most of your audience are completely lost
> here.
>
> === cut ===
> Enhanced example::
>
> key1/key2a
> key3a = Some random string
> key3b = 42
> key1/key2a
> key3c
> 1
> 2
> {}
> key4a = anything
> key4b = something else
> []
> a
> b
> c
> 3
> 4
> key1/key2b = [1, 2, 3, 4]
> key5 = ""
> This is a multi-line string. It is
> dedented to the farthest left
> column that is indented from
> the line containing "".
> key6 = [""]
> This is an array of strings, one per line.
> Each string is dedented appropriately.
> === cut ===
>
> Frankly, this is an example that only a mother could love. I'd suggest you
> add some real-world examples, make sure they look nice, and put them in
> the introductory part of the document. Examples are how the format will be
> evaluated by the readers, and yours don't stand a chance.
>
> Seriously, the only reason YAML enjoys its moderate popularity despite its
> overcomplicated grammar, chronic lack of manpower and deficient
> implementations is because it's so cute.
>
> Disclaimer: I'm the author of PyYAML and libyaml.
>
> Thanks,
> Kirill
> --
> http://mail.python.org/mailman/listinfo/python-list
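
On the tree-versus-DAG point raised above, here is a minimal sketch using only the standard json module. The address data and variable names are made up for illustration. It shows that a Python structure with one node shared by two parents is written out twice by JSON, and comes back as two separate copies:

```python
import json

# Hypothetical data: one address dict referenced from two places,
# so in memory this node has two parents (a DAG, not a tree).
address = {"street": "1 Main St", "city": "Springfield"}
order = {"ship_to": address, "bill_to": address}

# JSON has no way to express the sharing, so the node is serialized twice.
text = json.dumps(order)
loaded = json.loads(text)

# After the round trip the two addresses are equal in value...
same_value = loaded["ship_to"] == loaded["bill_to"]
# ...but they are distinct objects: each node now has exactly one parent.
same_object = loaded["ship_to"] is loaded["bill_to"]

print(same_value, same_object)
```

This is the "cut and paste" outcome described in the quoted draft: the information is identical, but nothing in the parsed result ties the two copies together.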
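
For the "merge" idea that the quoted draft compares to a recursive Python dictionary update, a rough sketch might look like the following. The function name `merge` and the sample dictionaries are assumptions for illustration (the key names are loosely borrowed from the quoted example); this is not taken from the RSON draft or from any YAML implementation:

```python
def merge(base, update):
    """Return a new dict with `update` merged into `base`.

    Hypothetical helper: where both sides hold a dict under the same key,
    the two dicts are merged recursively; otherwise the value from
    `update` replaces the value from `base`.
    """
    result = dict(base)  # shallow copy, so `base` itself is not modified
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Two sub-trees that share the path key1 -> key2a.
first = {"key1": {"key2a": {"key3a": "Some random string", "key3b": 42}}}
second = {"key1": {"key2a": {"key3c": [1, 2]}, "key2b": [1, 2, 3, 4]}}

merged = merge(first, second)
print(merged)
```

A plain `dict.update` would replace the whole `key1` entry with the one from `second`; recursing keeps `key3a` and `key3b` alongside the new `key3c`.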