--- Robin Berjon <[EMAIL PROTECTED]> wrote:
> If it is about creating a /toolset/ for recovering data from
> quasi-XML (aka tag soup), then it is an interesting area of
> research. I can think of two approaches:
> 
>    - have a parametrisable XML grammar. By default it would really
> parse XML, and barf with extreme prejudice on errors. However,
> individual rules would be relaxable and modifiable to accept
> different, possibly slightly broken, input. This is imho the
> least desirable approach.

Why is this the least desirable approach?
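
To check that I am reading "relaxable rules" the right way, here is a
minimal sketch in Python. It is a toy tag-structure checker, not a real
XML grammar, and every name in it (Rules, check, ParseError) is made up
for illustration. It assumes attribute values never contain '<' or '>'.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    """Hypothetical rule switches; all False means strict behaviour."""
    allow_unquoted_attrs: bool = False
    allow_missing_end_tags: bool = False


class ParseError(Exception):
    pass


STRICT = Rules()
# Groups: "/" of an end tag, element name, raw attribute text, "/" of <x/>.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*?)(/?)>")
QUOTED = re.compile(r'''\s+[\w:.-]+\s*=\s*("[^"]*"|'[^']*')''')
UNQUOTED = re.compile(r'''\s+[\w:.-]+\s*=\s*[^\s"'=<>]+''')


def check_attrs(attrs, rules):
    """Accept quoted attributes, and unquoted ones only if the rule is relaxed."""
    pos = 0
    while pos < len(attrs):
        m = QUOTED.match(attrs, pos)
        if m is None and rules.allow_unquoted_attrs:
            m = UNQUOTED.match(attrs, pos)
        if m is None:
            if attrs[pos:].strip() == "":
                return  # only trailing whitespace left
            raise ParseError(f"bad attribute syntax: {attrs[pos:]!r}")
        pos = m.end()


def check(text, rules=STRICT):
    """Return True if the tags nest properly under the given rules."""
    stack = []
    for m in TAG.finditer(text):
        closing, name, attrs, selfclose = m.groups()
        if closing:
            if rules.allow_missing_end_tags and name in stack:
                # Relaxed rule: implicitly close whatever is still open inside.
                while stack[-1] != name:
                    stack.pop()
            if not stack or stack[-1] != name:
                raise ParseError(f"unexpected </{name}>")
            stack.pop()
        else:
            check_attrs(attrs, rules)
            if not selfclose:
                stack.append(name)
    if stack and not rules.allow_missing_end_tags:
        raise ParseError(f"unclosed <{stack[-1]}>")
    return True
```

With the defaults, check('<a href=x></a>') raises ParseError; passing
Rules(allow_unquoted_attrs=True) lets the same input through. Each
switch relaxes exactly one rule, which is how I understand the proposal.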


=Austin

>    - base a quasi-parser on something that does quasi-parsing well,
> namely an HTML parser, which would be wrapped to look like an XML
> parser but would be able to correct most typical problems (poorly
> defined entities, missing end tags, encoding errors, etc).
> Advantages are: a) it addresses 98% of existing problems, b) trying
> to solve the remaining issues in any non ad hoc manner is suicidal,
> c) it can be pointed out to developers in trouble, and d) it has very
> low general public visibility. Oh, and e) the perl-xml community is
> already on it, expect something in the coming month.
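
For comparison, the wrapping idea can be sketched with a stock lenient
HTML parser. This is a minimal Python sketch using only the standard
library (html.parser and xml.etree.ElementTree), not whatever the
perl-xml folks are building. The names SoupToTree, soup_to_xml and VOID
are made up for illustration, and it assumes the input has a single
top-level element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML (partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class SoupToTree(HTMLParser):
    """Hypothetical wrapper: lenient HTML parsing in, well-formed tree out."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get an empty value.
        clean = {k: (v if v is not None else "") for k, v in attrs}
        self.builder.start(tag, clean)
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: drop it
        # Close everything left open inside the element being closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.open_tags:  # text outside the root element is dropped
            self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered text
        while self.open_tags:  # supply the missing end tags
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def soup_to_xml(text):
    """Return a well-formed XML string recovered from tag soup."""
    parser = SoupToTree()
    parser.feed(text)
    return ET.tostring(parser.finish(), encoding="unicode")
```

Feeding it something like '<p>Fish &amp; chips<br><b>bold<i>both</b> tail'
yields a string that a strict XML parser accepts, with the missing
</i> and </p> supplied. Encoding errors are not handled at all here.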
