On Jul 5, 2012, at 7:57 AM, Michael Natterer wrote:

> And XML was ruled out because it's not the latest fad any longer?

I think this is pretty much the right answer.  There is a ton of XML hate in 
the world right now.

Having fought this battle while dealing with millions of lines of code and 
hundreds of thousands of lines of JSON and/or XML, I can offer the following 
advice…

XML is probably the right answer here.  XML sucks in the following ways.

1. It's verbose.  This is actually good for humans, but it sucks as a wire 
format, and some people feel the verbosity makes it unreadable.  That's only 
true if you're able to keep all the context in your head.  Once someone screws 
up the indentation, or you're 1000 lines in and 12 nested levels deep, having 
the extra context of tag names makes a huge difference.  Also, gzip is awesome 
here and solves the on-disk space issues (there's a tiny illustration of that 
right after this list).

2. It's complex.  No argument here.  There are a lot of things it's supposed 
to do, and a major ambiguity that people always complain about (attributes vs. 
elements).

3. Many of the parsers are memory hogs (tree parsers) or very slow (though 
that's gotten much better and doesn't apply to the parser gegl is using); the 
slow ones were mostly copying too many strings.  A short sketch of the 
streaming style that avoids the memory problem follows a little further down.
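
For the gzip point in 1, a tiny hedged illustration (Python here, nothing 
GEGL-specific; the repeated op line is made up):

import gzip

# A deliberately repetitive, verbose payload: the same made-up <node> line
# over and over.  gzip thrives on exactly this kind of redundancy.
xml = ("<node operation='gegl:gaussian-blur' std-dev-x='4.0' std-dev-y='4.0'/>\n"
       * 1000).encode()
print(len(xml), "->", len(gzip.compress(xml)))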

1 and 3 mean it sucks as an on-wire format for interactive HTTP requests 
(though gzip pretty much negates 1).  2 means it's hard to write a fast JS 
parser for it, which means your HTML5 app will get slow.
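
Here's the streaming sketch promised in 3.  This is Python's stdlib pull 
parser, not whatever parser GEGL actually uses, and the file name is made up; 
the point is just that elements get handled and then thrown away instead of 
being accumulated into a full in-memory tree:

import xml.etree.ElementTree as ET

# "end" events fire as each element finishes parsing, so we never need
# the whole document in memory at once.
for event, elem in ET.iterparse("big-graph.xml", events=("end",)):
    if elem.tag == "node":
        print(elem.get("operation"))   # handle each node as it streams past
    elem.clear()                       # then drop its contents to keep memory flat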

Everyone says "it's more readable!"  Then they try to maintain a large file in 
JSON.  Then they discover that validation, line numbers for errors, and a more 
expressive grammar go a long way towards keeping programs simpler.  The first 
time you spend an hour trying to track down where your missing "," caused your 
entire file to fail to parse, you'll wish you had a better parser.  I haven't 
found a JSON parser that will actually spit out line numbers and context for 
errors.  With XML, it's easy to combine multiple grammars (think embedding GEGL 
ops into another XML document).  It has a validation language (two of them, in 
fact; yes, they have warts… but they do actually work for most things).  It's 
easier for new brains to look at (though slower for familiar brains).  It's 
more self-describing, for those who expect their file format to be produced or 
consumed by many other programs.  It's amazing how important strict 
specification can be when it comes to using a file as an interchange format.  
XML is much better at this than most other options.
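
A hedged sketch of two of those points (error locations and mixing grammars 
via namespaces), again with Python's stdlib parser rather than anything 
GEGL-specific; the namespace URI and the surrounding document are invented:

import xml.etree.ElementTree as ET

doc = """<page xmlns:gegl="http://example.org/gegl">
  <title>composite demo</title>
  <gegl:node operation="gegl:over"/>
</page>"""

ET.fromstring(doc)        # parses fine; the gegl:* element lives in its own namespace

broken = doc.replace("</title>", "</titel>")   # introduce a typo
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    line, column = err.position   # the parser tells you exactly where it gave up
    print("parse error at line %d, column %d: %s" % (line, column, err))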

Anyways, if your serialization is just temporary (like a wire format), needs 
to be parsed fast by a huge variety of hardware in languages without a byte 
array (JS), or is only produced and consumed by your own application, then 
JSON (or BSON, or protocol buffers) seems like a good choice.  If you're going 
for more of an interchange format, stick with XML.

Thus I would strongly suggest using XML for this.

Also, as far as structure goes, if you want to represent a general graph, you 
can draw inspiration from DOT, the language of Graphviz.  There is also 
GraphML.  You could frankly use GraphML straight out of the box, though it has 
lots of features you're probably not interested in.

The general structure is usually:

<graph>
  … graph attributes …
  <node />
  <node />
  <node />
  <edge />
  <edge />
  <edge />
</graph>

So you don't try to put a tree in the text at all.  It's just a list of nodes 
and edges.
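
To make that concrete, here's a hedged sketch of reading such a flat node/edge 
list; the attribute names (id, from, to) and the op names are only 
illustrative, not GEGL's or GraphML's actual schema:

import xml.etree.ElementTree as ET

doc = """<graph>
  <node id="load" operation="gegl:load"/>
  <node id="blur" operation="gegl:gaussian-blur"/>
  <node id="save" operation="gegl:save"/>
  <edge from="load" to="blur"/>
  <edge from="blur" to="save"/>
</graph>"""

root = ET.fromstring(doc)
nodes = {n.get("id"): dict(n.attrib) for n in root.findall("node")}
edges = [(e.get("from"), e.get("to")) for e in root.findall("edge")]

# nodes maps id -> attributes, edges is a plain list of (from, to) pairs;
# that's enough to describe an arbitrary graph with no nesting at all.
print(nodes)
print(edges)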

--
Daniel
