Miko O'Sullivan wrote:
> We already have the ability to embed foreign languages (XML, HTML,
> whatever) using here docs:
> 
>  $myml = MyXmlParser->new(<< '(MARKUP)');
>   <thingy>
>       <blah>blah blah</blah>
>   </thingy>
>  (MARKUP)

True, but what kind of magic is hiding inside MyXmlParser?

One problem is that writing MyXmlParser to parse and validate XML and
then generate some corresponding Perl data structure is difficult and
error prone.

In the simple case, XML::Simple is your friend.  But as Robin points 
out, the simple approach falls down when you need finer control
over what you're doing.
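
For what it's worth, here is a minimal sketch of that simple case as it
works today, at runtime.  It assumes XML::Simple is installed, and the
variable names are mine, purely for illustration.

```perl
use strict;
use warnings;
use XML::Simple;    # exports XMLin() by default

my $markup = <<'MARKUP';
<thingy>
    <blah>blah blah</blah>
</thingy>
MARKUP

# XMLin() accepts a string of XML.  By default the root element
# (<thingy>) is discarded and its children become hash keys.
my $thingy = XMLin($markup);

print $thingy->{blah}, "\n";
```

Note that all of this parsing happens when the program runs, each time
it runs.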

You can use the XML::Schema modules (if you're feeling brave) and that
will generate a validating parser with control over the generated data
structure.  But it's big and bulky and the complexities of XML Schema 
itself make it a daunting task.

There are various other modules and techniques which can achieve the
desired result, but I've yet to find one that was both easy to use and
powerful (although I need to check out those links that Robin posted).

So I'm wondering: if the Perl 6 parser is as flexible and powerful 
as promised, can we adapt it to simplify the task of parsing XML
into internal data structures?

One benefit of inlined XML over the example above is that it would be 
parsed at compile time, not runtime.  When our modified parser 
sees this:

  use Perl6::XML;

  <thingy>
    <blah>blah blah</blah>
  </thingy>

It would effectively re-write it as if written:

  my $thingy = {
     blah => 'blah blah',
  };

and then generate the appropriate opcodes to implement it at runtime.

A further benefit would be that your parsed and validated XML markup
could then be stored as Parrot bytecode.  You would effectively be
"compiling" XML into bytecode that you could load into other programs
with a simple "use".  That would be neat.

As and when we need more control over the XML validation or code 
generation, we would write our own modified XML grammar modules.  
Apocalypse 5 suggests this would be a simple matter of defining a
few new 'rule' constructs.  For example, we might want to add a rule
for matching thingy/blah that constructs a list rather than a scalar.
Thus, the XML would be parsed as if written:

  my $thingy = {
     blah => [ 'blah blah' ],
  };
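
The nearest thing I know of today is XML::Simple's ForceArray option,
which does at runtime roughly what such a rule might do at compile
time.  A sketch only, assuming XML::Simple is installed; it is a
stand-in for the idea, not the Perl 6 grammar mechanism itself.

```perl
use strict;
use warnings;
use XML::Simple;    # exports XMLin() by default

my $markup = '<thingy><blah>blah blah</blah></thingy>';

# ForceArray => ['blah'] asks for <blah> elements to be folded into
# an array reference even when only one of them is present.
my $thingy = XMLin($markup, ForceArray => ['blah']);

print ref($thingy->{blah}), "\n";    # the value is now an array ref
print $thingy->{blah}[0], "\n";      # its first element is the text
```

The difference is that here the choice is a parser option applied per
call, whereas a custom rule would live in the grammar itself.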

This is all largely hypothetical, of course.  Hence the continued hand
waving and general lack of detail.  Consider it a thought in progress.

:-)

A
