Well, after writing that mail I checked libxml's homepage, and
it seems they've built in an HTML mode, so maybe it's forgiving
enough to parse just about anything.
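
For what it's worth, once any parser hands you a DOM tree, pulling
out the text is just a recursive walk over the nodes. A rough sketch
in JavaScript, since that's what you already use in IE. The tree is
built by hand here as a stand-in for real parser output, and el, text
and collectText are names made up for this example, not part of any
library:

```javascript
// Node type constants as defined by the W3C DOM.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helpers that build a tiny DOM-like tree by hand.
// A real parser would produce nodes with these same properties.
function el(name, ...children) {
  return { nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes: children };
}
function text(value) {
  return { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: value, childNodes: [] };
}

// Depth-first walk that collects the value of every non-blank #text node.
function collectText(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    const value = node.nodeValue.trim();
    if (value !== "") out.push(value);
  }
  for (const child of node.childNodes) collectText(child, out);
  return out;
}

// Made-up sample data: a two-row table with one whitespace-only text node.
const table = el("TABLE",
  el("TBODY",
    el("TR", el("TD", text("Name")), el("TD", text("Price"))),
    el("TR", el("TD", text("Widget")), text("\n  "), el("TD", text("9.99")))));

console.log(collectText(table));
```

The walk doesn't care how deep the table cells sit, so there's no
need for long firstChild.childNodes[...] chains hardcoded per page.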


At 18:06 14.1. 2001, James Duncan wrote the following:
-------------------------------------------------------------- 
>But I thought you said that the DOM XML wouldn't parse a normal HTML web
>page because 98% of web pages aren't truly XML compatible and the XML parser
>would die with an error message(s)?
>
>I want to be able to feed the parser any old HTML web page and read the node
>values from the DOM (created by the parser), just like I do with IE and
>Javascript.
>
>Thanks
>
>PS: I am learning slowly so don't get tooooo mad with me ;)
>
>
>-----Original Message-----
>From: Cynic [mailto:[EMAIL PROTECTED]]
>Sent: 14 January 2001 17:01
>To: James Duncan; [EMAIL PROTECTED]
>Cc: [EMAIL PROTECTED]
>Subject: RE: [PHP-WIN] DOM
>
>What you want has already been done, with two different
>approaches: DOM XML functions and Sablotron functions (SAX
>interface). Just use one of these modules in your script.
>
>
>At 16:28 14.1. 2001, James Duncan wrote the following:
>--------------------------------------------------------------
>>As I'm asking stupid questions at the moment: Could someone write an
>>(XML/HTML?) parser for PHP that exposes the DOM in the same way as the
>>Javascript one does in IE 5? This would allow me to access the node
>>elements (#text, etc) via PHP on an HTML file stored on the server in the
>>same way as I can via Javascript in IE 5? Why do I want to do this? It
>>would allow me to download a web page, parse it into a DOM tree-structure,
>>loop through all #text nodes and extract all the textual data. This would
>>make capturing textual data from an HTML file so much easier than
>>attempting to strip all the HTML tags, etc. The parser would only need to
>>support a "read" mode for my requirements, which should simplify the
>>parser (it wouldn't need to worry about updating node values, etc or
>>writing them back to the HTML file). It sounds like a good idea to me but
>>I might be way off course...
>>
>>This would allow all work to be performed server-side, whereas at the
>>moment I'm having to send the HTML file to IE, run Javascript DOM code to
>>extract the #text values, dump those values into a hidden field and post
>>the data back to the server, where PHP can process it.
>>
>>Thanks
>>
>>James
>>
>>-----Original Message-----
>>From: Cynic [mailto:[EMAIL PROTECTED]]
>>Sent: 14 January 2001 01:38
>>To: James Duncan; [EMAIL PROTECTED]
>>Cc: [EMAIL PROTECTED]
>>Subject: RE: [PHP-WIN] DOM
>>
>>It's not PHP vs. DOM. It's XML (DOM) vs. (bad) HTML. PHP just
>>provides you with an interface to an XML parser.
>>
>>www.php4win.de
>>
>>
>>At 01:14 14.1. 2001, James Duncan wrote the following:
>>--------------------------------------------------------------
>>>Yikes. I'm just reading more about DOM and PHP at the moment on the
>>>PHPBuilder website.
>>>
>>>Does anyone have a version of PHP compiled with DOM support included for
>>>Windows (I'm developing on a Windows system before moving it over to
>>>Linux - RedHat)?
>>>
>>>So loading any old web page and trying to construct a DOM document from it
>>>via PHP isn't going to work? How does IE v5 manage to parse the same web
>>>page correctly (or what seems to be correctly)? I've already read in the
>>>DOM table node elements #text and their values via Javascript in IE.
>>>
>>>Still learning lots ;)
>>>
>>>Thanks
>>>
>>>James
>>>
>>>
>>>-----Original Message-----
>>>From: Cynic [mailto:[EMAIL PROTECTED]]
>>>Sent: 14 January 2001 00:07
>>>To: James Duncan; [EMAIL PROTECTED]
>>>Cc: [EMAIL PROTECTED]
>>>Subject: RE: [PHP-WIN] DOM
>>>
>>>I should warn you that XML functions require the document to be
>>>very 'correct'. Most (I guess 98%... I wish browsers weren't so
>>>forgiving, all might've been much easier and better) of HTML
>>>pages on the internet basically aren't HTML (which is a son of
>>>SGML, and an older, heavily crippled brother of XML), and even
>>>strict HTML wasn't XML compliant before XHTML 1.0, which is the
>>>latest version of HTML and fully XML compliant.
>>>If you try to load such a document into an XML parser, it'll
>>>die with an error message, because XML requires the document
>>>to be well-formed.
>>>
>>>At 00:54 14.1. 2001, James Duncan wrote the following:
>>>--------------------------------------------------------------
>>>>Ah rite... thanks for the info. As I said I'm very new to all of this and
>>>>reading lots, whilst trying to make sense of it all ;) So it is possible
>>>>to use PHP to access DOM elements (via the XML DOM library) created from
>>>>an HTML source file (a code example would be very handy)? Does anyone
>>>>know if an XML parser will be built into PHP in the future? I then assume
>>>>I could access DOM elements from an HTML file in the same easy way as I
>>>>can via Javascript in IE?
>>>>
>>>>Thanks
>>>>
>>>>James
>>>>
>>>>
>>>>-----Original Message-----
>>>>From: Cynic [mailto:[EMAIL PROTECTED]]
>>>>Sent: 13 January 2001 23:22
>>>>To: James Duncan; [EMAIL PROTECTED]
>>>>Cc: [EMAIL PROTECTED]
>>>>Subject: RE: [PHP-WIN] DOM
>>>>
>>>>You don't understand the basic concept.
>>>>
>>>>DOM (Document Object Model) is a tree representing the structure
>>>>of a document, where the elements (logically separate parts of the
>>>>content) are enclosed within tags to allow for computerized
>>>>processing. IE exposes its own version of DOM through its
>>>>implementation of JS. If you want to access and manipulate an HTML
>>>>document in PHP using this tree-like abstraction (DOM), you will
>>>>have to use the XML DOM library. No XML parser is an integral part of
>>>>the language.
>>>>
>>>>
>>>>At 18:20 13.1. 2001, James Duncan wrote the following:
>>>>--------------------------------------------------------------
>>>>>I don't think this will work in my case because I don't control the
>>>>>layout of the HTML page and hence can't add the hidden fields. I'm
>>>>>downloading the HTML pages from a website. It would require as much work
>>>>>to insert the hidden fields as trying to strip the HTML tags in an
>>>>>attempt to read the data directly from the HTML page itself. There must
>>>>>be a way to access the DOM directly from PHP? I notice in the manual
>>>>>there is a section regarding XML DOM but not the DOM itself.
>>>>>
>>>>>Are the DOM values only available on the client? If that's the case then
>>>>>PHP can't be used to read them because it's limited to the server side?
>>>>>
>>>>>Thanks
>>>>>
>>>>>James
>>>>>
>>>>>-----Original Message-----
>>>>>From: Michael Stearne [mailto:[EMAIL PROTECTED]]
>>>>>Sent: 13 January 2001 17:06
>>>>>To: James Duncan
>>>>>Cc: [EMAIL PROTECTED]
>>>>>Subject: Re: [PHP-WIN] DOM
>>>>>
>>>>>Could you do something like:
>>>>>
>>>>>myForm.myField.value =
>>>>>  tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeValue;
>>>>>
>>>>>Set up a form of hidden fields.  Extract the values from the DOM and
>>>>>then have the user hit a Submit button to get to the next page.  At that
>>>>>point the values that were collected and put into the hidden form fields
>>>>>will be submitted and your next page (the PHP page) could INSERT the
>>>>>values into the database.
>>>>>Michael
>>>>>
>>>>>
>>>>>On Friday, January 12, 2001, at 07:30 PM, James Duncan wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I'm still new to HTML, Javascript and PHP but learning (fast
>>>>>> hopefully). I've just started accessing DOM elements. I have worked
>>>>>> out how to update the contents of table cells directly using this
>>>>>> method, etc. In Javascript I would use code like:
>>>>>>
>>>>>>   alert("Value is: " +
>>>>>>     tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeName);
>>>>>>   alert("Value is: " +
>>>>>>     tablejames.firstChild.childNodes[1].childNodes[5].firstChild.firstChild.nodeValue);
>>>>>>
>>>>>> This Javascript shows the name and value of the child element.
>>>>>>
>>>>>> Now I want to use PHP to extract data (values) from HTML pages like I
>>>>>> do with the above Javascript. Is this possible? Obviously with the
>>>>>> Javascript the HTML page has already been rendered in the browser
>>>>>> (i.e. all tree elements have been created). This makes extracting
>>>>>> data a simple case of finding the "#text" elements and reading in the
>>>>>> values. Can I do the same thing with PHP and an HTML file I've
>>>>>> downloaded from the Internet? Obviously this file is sitting on my
>>>>>> server and hasn't been rendered in a browser...
>>>>>>
>>>>>> The whole point of this exercise is so that I can extract values from
>>>>>> an HTML table and populate them into a database. Maybe it's easier to
>>>>>> process the HTML file line by line and strip the unwanted HTML tags?
>>>>>> However, with this approach I've got to hardcode each webpage...
>>>>>>
>>>>>> If this is a silly question then sorry but you only learn if you ask
>>>>>> ;)
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> James
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> PHP Windows Mailing List (http://www.php.net/)
>>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>>> To contact the list administrators, e-mail: [EMAIL PROTECTED]
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>------end of quote------
>>>>
>>>>
>>>>
>>>>____________________________________________________________
>>>>Cynic:
>>>>
>>>>A member of a group of ancient Greek philosophers who taught
>>>>that virtue constitutes happiness and that self control is
>>>>the essential part of virtue.
>>>>
>>>>[EMAIL PROTECTED]
>>>------end of quote------
>>------end of quote------
>------end of quote------
------end of quote------ 



____________________________________________________________
Cynic:

A member of a group of ancient Greek philosophers who taught
that virtue constitutes happiness and that self control is
the essential part of virtue.

[EMAIL PROTECTED]



-- 
PHP Windows Mailing List (http://www.php.net/)
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
To contact the list administrators, e-mail: [EMAIL PROTECTED]
