Doug Ransom <[EMAIL PROTECTED]> writes:

> I have configured this message so that replies should go to haskell-cafe
> automatically.

Sure.  However, I use 'f' to follow up, which causes both addresses to
be included :-)  (Removed haskell@ manually)

> What XML parser are you using in Haskell?  I am familiar with this
> stuff: http://www.cs.york.ac.uk/fp/HaXml.

I wrote my own, since it was basically just a toy project.  Not
exactly featureful (discards attributes(!), doesn't validate, ignores
charsets, doesn't understand CDATA sections...), and not particularly
elegant (no monadic parser-combinators - although I tried that first,
I found it made parsing an element too strict for my taste).  It does
work, though, and lets you access the XML structure in several layers, 
from a string of tokens (stago, tagc, ..) to SAX-like (i.e. tags,
characters, entities) to DOM-like (a tree of elements), all lazily
built.  Works for me.
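
As a rough illustration of the layering described above (not the actual
code from this project), here is a minimal sketch with two of the layers:
a SAX-like event stream and a DOM-like tree built lazily on top of it.
All names (Event, Node, events, forest, parseTree) are hypothetical, the
token layer is folded into the event layer for brevity, and, like the
parser described, it drops attributes and knows nothing about CDATA,
comments or self-closing tags.

```haskell
import Data.Char (isSpace)

-- SAX-like layer: a flat stream of events (attributes are discarded).
data Event = Open String | Close String | Chars String | EntityRef String
  deriving (Eq, Show)

-- DOM-like layer: a tree of elements.
data Node = Element String [Node] | Text String | Entity String
  deriving (Eq, Show)

-- Keep only the element name, dropping any attributes after it.
tagName :: String -> String
tagName = takeWhile (not . isSpace)

-- Lazily turn the input into events; each event is produced on demand.
events :: String -> [Event]
events [] = []
events ('<':'/':rest) =
  let (name, rest') = break (== '>') rest
  in Close (tagName name) : events (drop 1 rest')
events ('<':rest) =
  let (body, rest') = break (== '>') rest
  in Open (tagName body) : events (drop 1 rest')
events ('&':rest) =
  let (name, rest') = break (== ';') rest
  in EntityRef name : events (drop 1 rest')
events s =
  let (txt, rest) = break (`elem` "<&") s
  in Chars txt : events rest

-- Build sibling nodes up to the next end tag; also return what is left.
-- The let bindings are lazy, so the tree is only built as it is consumed.
forest :: [Event] -> ([Node], [Event])
forest [] = ([], [])
forest (Close _ : es) = ([], es)
forest (Open n : es) =
  let (kids, rest)  = forest es
      (sibs, rest') = forest rest
  in (Element n kids : sibs, rest')
forest (Chars s : es) =
  let (sibs, rest) = forest es in (Text s : sibs, rest)
forest (EntityRef e : es) =
  let (sibs, rest) = forest es in (Entity e : sibs, rest)

parseTree :: String -> [Node]
parseTree = fst . forest . events
```

Because the end-tag name is never checked against the open element, this
sketch accepts mismatched tags silently, which is in the same
non-validating spirit as the parser described above.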

What I want now is a good heuristic for parsing HTML documents.  As it
is, I have to do ugly stuff like ignoring stray end tags and closing
a few levels of open contexts (although that is actually legal HTML,
which could be handled by validating with a DTD), and of course coping
with '&'s not used as entity markers -- and that's just to parse a few,
fairly well-behaved, FrontPage-generated pages.
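
One way to picture the tag-related part of such a heuristic is a pass
that rebalances the tag stream with a stack of open elements.  The
sketch below is only an illustration under that assumption; the names
(Tag, balance) are hypothetical, and it does not address bare '&'
characters or DTD-driven rules about which end tags may be omitted.

```haskell
-- A simplified tag stream (attributes and entities left out).
data Tag = Open String | Close String | Chars String
  deriving (Eq, Show)

-- Rebalance a tag stream.  The accumulator holds the names of the
-- currently open elements, innermost first.
balance :: [Tag] -> [Tag]
balance = go []
  where
    -- End of input: close whatever is still open.
    go open [] = map Close open
    go open (Open n : ts)  = Open n : go (n : open) ts
    go open (Chars s : ts) = Chars s : go open ts
    go open (Close n : ts)
      -- The element is open somewhere up the stack: close the
      -- intervening levels first, then the element itself.
      | n `elem` open =
          let (inner, rest) = break (== n) open
          in map Close inner ++ Close n : go (drop 1 rest) ts
      -- Stray end tag with no matching open element: ignore it.
      | otherwise = go open ts
```

For example, given the stream for "<ul><li>a</ul></b>", this closes the
"li" implicitly before closing "ul" and drops the stray "</b>".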

> I am researching combinator libraries now.

That is probably a good idea, although I couldn't get it to work the
way I wanted (probably me being stupid), and I got the nagging suspicion
that XML is really too simple to parse to warrant any complex
machinery. 

-kzm
-- 
If I haven't seen further, it is by standing in the footprints of giants
