--- Robin Berjon <[EMAIL PROTECTED]> wrote:
> If it is creating a /toolset/ to make recuperating data from a
> quasi-XML (aka tag soup) then it is an interesting area of research.
> I can think of two approaches:
>
>  - have a parametrisable XML grammar. By default it would really
>    parse XML, and barf with extreme prejudice on errors. However
>    individual rules will be relaxable and modifiable to accept
>    different, possibly slightly broken, input. This is imho the
>    least desirable approach.

Why is this the least desirable approach?

=Austin

>  - base a quasi-parser on something that does quasi-parsing well,
>    namely an HTML parser, which would be wrapped to look like an XML
>    parser but would be able to correct most typical problems (poorly
>    defined entities, missing end tags, encoding errors, etc).
>    Advantages are: a) it addresses 98% of existing problems, b) trying
>    to solve the remaining issues in any non ad hoc manner is suicidal,
>    c) can be pointed to to developers in trouble, and d) has very low
>    general public visibility. Oh, and e) the perl-xml community is
>    already on it, expect something in the month to come.
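
For anyone trying to picture the second approach, here is a rough sketch of the idea. It is in Python rather than Perl and is not the perl-xml work mentioned above. It assumes Python's standard `html.parser` as the forgiving front end; the names `SoupToXML` and `repair` are made up for illustration. It only handles missing or stray end tags, unquoted attributes and bare ampersands. Encoding repair, comments, and documents with no single root element are left out.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class SoupToXML(HTMLParser):
    """Hypothetical wrapper: reads tag soup, emits well-formed XML text."""

    def __init__(self):
        # convert_charrefs=True hands us decoded text; unknown entities
        # are passed through literally and escaped again on output.
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the repaired document
        self.open = []   # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get their own name as value.
        rendered = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        self.open.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)
        self.handle_endtag(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag: drop it
        # Close everything still open inside this element, then the element.
        while self.open:
            name = self.open.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered trailing text
        while self.open:
            self.out.append(f"</{self.open.pop()}>")
        return "".join(self.out)


def repair(soup):
    """Return an XML-ish serialisation of the given tag soup."""
    parser = SoupToXML()
    parser.feed(soup)
    return parser.result()
```

Feeding the output of `repair()` to a strict XML parser is then the "wrapped to look like an XML parser" part. Note that `html.parser` lowercases element names, so this sketch is not suitable where case matters.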