Thanks for the reply. I had looked at the JTidy project. Unfortunately, its
current stable release removes empty tags, which is no good for me, and
building the latest source (which includes a config option for not deleting
empty tags, if I understand correctly) reports too many errors. It seems
I'll have to write something to pre-parse the docs.
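
A minimal sketch of what such a pre-parse step might look like, assuming the
only repair needed is the bare ampersand case from my original question. The
class and method names are made up for illustration, and the regex treats
anything shaped like &name; or &#123; or &#x1F; as an intentional reference,
so an unknown but well-formed entity such as &foo; would still be left for
the parser to reject. It also rewrites ampersands inside CDATA sections and
comments, which may not be wanted.

```java
import java.util.regex.Pattern;

// Hypothetical helper: escapes ampersands that do not start a reference,
// so the text can be handed to an XML parser afterwards.
public class AmpersandFixer {

    // Matches '&' only when it is NOT followed by something shaped like
    // an entity or character reference: name; or #digits; or #xHEX;
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Returns a copy of the input with every bare '&' replaced by "&amp;".
    public static String escapeBareAmpersands(String text) {
        return BARE_AMP.matcher(text).replaceAll("&amp;");
    }
}
```

The repaired string could then be wrapped in a StringReader and an
InputSource and passed to the parser as usual.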

Regards,
Derek



keshlam wrote:
> 
> Closest thing I can think of is the W3C's "tidy" tool, which repairs some 
> of the common/obvious errors.
> 
> ______________________________________
> "... Three things see no end: A loop with exit code done wrong,
> A semaphore untested, And the change that comes along. ..."
>   -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (
> http://www.ovff.org/pegasus/songs/threes-rev-11.html)
> 
> 
> 
> Derek Alexander <d.alexan...@lse.ac.uk> 
> 07/22/2009 09:55 AM
> Please respond to: j-users@xerces.apache.org
> 
> To: j-users@xerces.apache.org
> cc:
> Subject: repairing document while parsing?
> 
> Hi,
> 
> Is there any way with Xerces (or any other XML parser you know of) to plug
> in some kind of error handler that can attempt to repair the document being
> parsed, rather than just log errors?
> 
> The specific case I have is XHTML documents that may have attribute values
> that aren't escaped properly, e.g., href="http://some.server/path?blah&foo=baa"
> 
> What I want to do is catch the error that &foo is not a known entity,
> replace it with &amp;foo as it ought to be, and have the parser carry on
> with that.
> 
> Cheers,
> Derek
> 
> 
> -- 
> View this message in context: 
> http://www.nabble.com/repairing-document-while-parsing--tp24607002p24607002.html
> 
> Sent from the Xerces - J - Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/repairing-document-while-parsing--tp24607002p24608002.html