Re: Parsing HTML

2025-05-29 Thread Stanimir Stamenkov
Tue, 27 May 2025 17:08:55 +, /Olivier Cailloux/: Can anyone point me towards some way of reading HTML (non XML) files using Xerces-J? I tried various things using org.apache.xerces.parsers.DOMParserImpl but parsing this file for example (valid according to Nu validator) fails. Haven't us

Re: Parsing HTML

2025-05-27 Thread Paul Kinnucan
:34 PM To: j-users@xerces.apache.org Subject: Re: Parsing HTML Supporting an HTML DOM, and being able to serialize to HTML, does not necessarily imply being able to parse HTML. As far as I know, that last is not supported by Xerces. I was able to (ab)use the W3C's _tidy_ tool to do some basic H

Re: Parsing HTML

2025-05-27 Thread Joseph Kesselman
redundant. From: Olivier Cailloux Sent: Tuesday, May 27, 2025 1:08:55 PM To: j-users@xerces.apache.org Subject: Parsing HTML Dear list, Apache Xerces-J says that it implements DOM Level 1 HTML. I asked recently about the bootstrapping support, which did not yield answers,

Parsing HTML

2025-05-27 Thread Olivier Cailloux
Dear list, Apache Xerces-J says that it implements DOM Level 1 HTML. I asked recently about the bootstrapping support, which did not yield answers, so let me broaden the question. Can anyone point me towards some way of reading HTML (non XML

Re: Problem with parsing HTML

2012-05-13 Thread Michael Glavassevich
-mail: mrgla...@apache.org > > "Yizhou Z." wrote on 12/05/2012 11:40:23 AM: > > > > Hi. I am using NekoHTML to parse a piece of HTML code which includes > > an input element: > > > > id="Password1" /> > > > > My

Re: Problem with parsing HTML

2012-05-13 Thread Yizhou Z.
>> "Yizhou Z." wrote on 12/05/2012 11:40:23 AM: >> >> >> > Hi. I am using NekoHTML to parse a piece of HTML code which includes >> > an input element: >> >> > > > id="Password1" /> >> > >> > My pr

Re: Problem with parsing HTML

2012-05-12 Thread Yizhou Z.
t; > Hi. I am using NekoHTML to parse a piece of HTML code which includes > > an input element: > > > > id="Password1" /> > > > > My program for parsing HTML is below. > > > > DOMParser parser = new DOMParser(); > > parser.setPropert

Re: Problem with parsing HTML

2012-05-12 Thread Michael Glavassevich
b E-mail: mrgla...@ca.ibm.com E-mail: mrgla...@apache.org "Yizhou Z." wrote on 12/05/2012 11:40:23 AM: > Hi. I am using NekoHTML to parse a piece of HTML code which includes > an input element: > id="Password1" /> > > My program for parsing

Re: Parsing HTML

2005-10-06 Thread Andy Clark
g APIs available in Xerces (e.g. DOM and SAX). Here's the link where you can download and evaluate it: http://www.apache.org/~andyc/neko/doc/html/ Another option for parsing HTML and using XML interfaces is JTidy. It's primarily used for cleaning up HTML and saving the result back t

Re: Parsing HTML

2005-10-05 Thread Michael Glavassevich
"Paul Green" <[EMAIL PROTECTED]> wrote on 10/05/2005 08:14:33 AM: > Hi, > > I read recently (in Elliotte Rusty Harold's "Processing XML with Java") > that Xerces-J is capable of parsing an HTML document into a DOM tree. > Xerces-J 1.4.4 does indeed contain an "html" package with all the required

Parsing HTML

2005-10-05 Thread Paul Green
Hi, I read recently (in Elliotte Rusty Harold's "Processing XML with Java") that Xerces-J is capable of parsing an HTML document into a DOM tree. Xerces-J 1.4.4 does indeed contain an "html" package with all the required interfaces to represent an HTML document in DOM form. However, I have been un