But I thought you said that the DOM XML functions wouldn't parse a normal
HTML web page, because 98% of web pages aren't truly XML compatible and the
XML parser would die with an error message?
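
As a rough illustration of the well-formedness point, here is a minimal sketch of a nesting check. It is not how PHP's DOM XML functions or any real XML parser work; the function name firstNestingError is made up for this example, and it only looks at tag nesting, ignoring the other XML rules (quoted attributes, entities, a single root element).

```javascript
// Hypothetical helper: reports the first tag-nesting problem, or null if none.
// XML names are case-sensitive, so <P> ... </p> would also count as an error.
function firstNestingError(markup) {
  const open = []; // stack of currently open element names
  const tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(\/?)>/g;
  let m;
  while ((m = tagRe.exec(markup)) !== null) {
    const [, closing, name, selfClosing] = m;
    if (selfClosing) continue; // <br/> opens and closes itself
    if (!closing) {
      open.push(name); // start tag: remember it
      continue;
    }
    const expected = open.pop(); // end tag: must match the innermost open tag
    if (expected === undefined) return `unexpected </${name}>`;
    if (expected !== name) return `expected </${expected}> but found </${name}>`;
  }
  return open.length > 0 ? `unclosed <${open[open.length - 1]}>` : null;
}

// Typical HTML leaves <br> unclosed, which a strict parser rejects.
console.log(firstNestingError("<p>one<br>two</p>"));
// The XHTML-style spelling of the same markup passes the nesting check.
console.log(firstNestingError("<p>one<br/>two</p>"));
```

A browser quietly repairs the first input; a strict XML parser stops at the first such error, which is the behaviour being asked about here.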

I want to be able to feed the parser any old HTML web page and read the node
values from the DOM (created by the parser), just like I do with IE and
Javascript.
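
To make the request concrete, the following is a minimal sketch of a read-only, forgiving parser that builds a small tree and collects the #text values. Everything in it is an assumption for illustration: parseLenient and collectText are hypothetical names, the tree it builds will not match what IE produces for the same page, and it does not handle script/style contents, entities, or ">" inside attribute values.

```javascript
// Tags that never have children, so they are not pushed on the open stack.
const VOID_TAGS = new Set(["br", "hr", "img", "input", "meta", "link"]);

// Hypothetical lenient parser: never throws, ignores stray end tags,
// and leaves unclosed elements open until an ancestor is closed.
function parseLenient(html) {
  const root = { nodeName: "#document", nodeValue: null, childNodes: [] };
  const stack = [root];
  // Comments | doctype-like declarations | start or end tags | text runs.
  const tokenRe = /<!--[\s\S]*?-->|<![^>]*>|<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+/g;
  let m;
  while ((m = tokenRe.exec(html)) !== null) {
    const token = m[0];
    const top = stack[stack.length - 1];
    if (token.startsWith("<!")) continue; // skip comments and doctype
    if (m[1] === undefined) {
      // Text between tags becomes a #text node (whitespace-only runs dropped).
      if (token.trim() !== "") {
        top.childNodes.push({ nodeName: "#text", nodeValue: token, childNodes: [] });
      }
    } else if (token.startsWith("</")) {
      // Close the nearest matching open element; ignore the tag if none matches.
      const name = m[1].toUpperCase();
      for (let i = stack.length - 1; i > 0; i--) {
        if (stack[i].nodeName === name) {
          stack.length = i;
          break;
        }
      }
    } else {
      const el = { nodeName: m[1].toUpperCase(), nodeValue: null, childNodes: [] };
      top.childNodes.push(el);
      if (!VOID_TAGS.has(m[1].toLowerCase()) && !token.endsWith("/>")) stack.push(el);
    }
  }
  return root;
}

// Walk the tree depth-first and gather every #text value in document order.
function collectText(node, out = []) {
  if (node.nodeName === "#text") out.push(node.nodeValue.trim());
  for (const child of node.childNodes) collectText(child, out);
  return out;
}

const samplePage =
  "<html><body><table><tr><td>Apples<td><b>42</b></table><p>Unclosed paragraph</body>";
const sampleTexts = collectText(parseLenient(samplePage));
console.log(sampleTexts); // [ 'Apples', '42', 'Unclosed paragraph' ]
```

The sample page leaves td, tr, p and html unclosed, yet the text values still come out in document order, which is all a "read" mode needs.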

Thanks

PS: I am learning slowly so don't get tooooo mad with me ;)


-----Original Message-----
From: Cynic [mailto:[EMAIL PROTECTED]]
Sent: 14 January 2001 17:01
To: James Duncan; [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: RE: [PHP-WIN] DOM

What you want has already been done, with two different
approaches: DOM XML functions and Sablotron functions (SAX
interface). Just use one of these modules in your script.
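
For readers unfamiliar with the difference between the two approaches: a DOM-style parser builds the whole tree first, while a SAX-style parser calls handlers as it meets each start tag, end tag and piece of text. The sketch below only illustrates the event style; scanEvents and the handler names are assumptions for this example and are not the API of Sablotron, the DOM XML functions, or any PHP extension.

```javascript
// Hypothetical event-style scanner: no tree is built, handlers fire in order.
function scanEvents(markup, handlers) {
  // End tags | start tags | text runs.
  const tokenRe = /<\/([a-zA-Z][a-zA-Z0-9]*)\s*>|<([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+/g;
  let m;
  while ((m = tokenRe.exec(markup)) !== null) {
    if (m[1] !== undefined) {
      handlers.endElement(m[1]);
    } else if (m[2] !== undefined) {
      handlers.startElement(m[2]);
    } else if (m[0].trim() !== "") {
      handlers.characterData(m[0].trim());
    }
  }
}

// Collect only the text, ignoring the element events entirely.
const cellTexts = [];
scanEvents("<row><cell>Apples</cell><cell>42</cell></row>", {
  startElement() {},
  endElement() {},
  characterData(text) {
    cellTexts.push(text);
  },
});
console.log(cellTexts);
```

For pulling text out of a page, the event style avoids holding the whole document in memory; the tree style is more convenient when you need to navigate by position, as with childNodes[1].childNodes[4].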


At 16:28 14.1. 2001, James Duncan wrote the following:
--------------------------------------------------------------
>As I'm asking stupid questions at the moment: Could someone write an
>(XML/HTML?) parser for PHP that exposes the DOM in the same way as the
>Javascript one does in IE 5? This would allow me to access the node
>elements (#text, etc) via PHP on an HTML file stored on the server in the
>same way as I can via Javascript in IE 5? Why do I want to do this? It
>would allow me to download a web page, parse it into a DOM tree-structure,
>loop through all #text nodes and extract all the textual data. This would
>make capturing textual data from an HTML file so much easier than
>attempting to strip all the HTML tags, etc. The parser would only need to
>support a "read" mode for my requirements, which should simplify the parser
>(it wouldn't need to worry about updating node values, etc or writing them
>back to the HTML file). It sounds like a good idea to me but I might be way
>off course...
>
>This would allow all work to be performed server-side, whereas at the
>moment I'm having to send the HTML file to IE, run Javascript DOM code to
>extract the #text values, dump those values into a hidden field and post
>the data back to the server, where PHP can process it.
>
>Thanks
>
>James
>
>-----Original Message-----
>From: Cynic [mailto:[EMAIL PROTECTED]]
>Sent: 14 January 2001 01:38
>To: James Duncan; [EMAIL PROTECTED]
>Cc: [EMAIL PROTECTED]
>Subject: RE: [PHP-WIN] DOM
>
>It's not PHP vs. DOM. It's XML (DOM) vs. (bad) HTML. PHP just
>provides you with an interface to an XML parser.
>
>www.php4win.de
>
>
>At 01:14 14.1. 2001, James Duncan wrote the following:
>--------------------------------------------------------------
>>Yikes. I'm just reading more about DOM and PHP at the moment on the
>>PHPBuilder website.
>>
>>Does anyone have a version of PHP compiled with DOM support included for
>>Windows (I'm developing on a Windows system before moving it over to
>>Linux - RedHat)?
>>
>>So loading any old web page and trying to construct a DOM document from it
>>via PHP isn't going to work? How does IE v5 manage to parse the same web
>>page correctly (or what seems to be correctly)? I've already read in the
>>DOM table node elements #text and their values via Javascript in IE.
>>
>>Still learning lots ;)
>>
>>Thanks
>>
>>James
>>
>>
>>-----Original Message-----
>>From: Cynic [mailto:[EMAIL PROTECTED]]
>>Sent: 14 January 2001 00:07
>>To: James Duncan; [EMAIL PROTECTED]
>>Cc: [EMAIL PROTECTED]
>>Subject: RE: [PHP-WIN] DOM
>>
>>I should warn you that XML functions require the document to be
>>very 'correct'. Most HTML pages on the internet (I guess 98%... I
>>wish browsers weren't so forgiving, all might've been much easier
>>and better) basically aren't really HTML (which is a son of SGML,
>>and an older, heavily crippled brother of XML). Even strict HTML
>>isn't XML compliant; only XHTML 1.0, the latest version of HTML,
>>is fully XML compliant.
>>If you try to load such a document into an XML parser, it'll
>>die with an error message, because XML requires the document
>>to be well-formed.
>>
>>At 00:54 14.1. 2001, James Duncan wrote the following:
>>--------------------------------------------------------------
>>>Ah rite... thanks for the info. As I said I'm very new to all of this and
>>>reading lots, whilst trying to make sense of it all ;) So it is possible
>>>to use PHP to access DOM elements (via the XML DOM library) created from
>>>an HTML source file (a code example would be very handy)? Does anyone
>>>know if an XML parser will be built into PHP in the future? I then assume
>>>I could access DOM elements from an HTML file in the same easy way as I
>>>can via Javascript in IE?
>>>
>>>Thanks
>>>
>>>James
>>>
>>>
>>>-----Original Message-----
>>>From: Cynic [mailto:[EMAIL PROTECTED]]
>>>Sent: 13 January 2001 23:22
>>>To: James Duncan; [EMAIL PROTECTED]
>>>Cc: [EMAIL PROTECTED]
>>>Subject: RE: [PHP-WIN] DOM
>>>
>>>You don't understand the basic concept.
>>>
>>>DOM (Document Object Model) is a tree representing the structure
>>>of a document, where the elements (logically separate parts of the
>>>content) are enclosed within tags to allow for computerized
>>>processing. IE exposes its own version of DOM through its
>>>implementation of JS. If you want to access and manipulate an HTML
>>>document in PHP using this tree-like abstraction (DOM), you will
>>>have to use the XML DOM library. No XML parser is an integral part
>>>of the language.
>>>
>>>
>>>At 18:20 13.1. 2001, James Duncan wrote the following:
>>>--------------------------------------------------------------
>>>>I don't think this will work in my case because I don't control the
>>>>layout of the HTML page and hence can't add the hidden fields. I'm
>>>>downloading the HTML pages from a website. It would require as much work
>>>>to insert the hidden fields as trying to strip the HTML tags in an
>>>>attempt to read the data directly from the HTML page itself. There must
>>>>be a way to access the DOM directly from PHP? I notice in the manual
>>>>there is a section regarding XML DOM but not the DOM itself.
>>>>
>>>>Are the DOM values only available on the client? If that's the case then
>>>>PHP can't be used to read them because it's limited to the server side?
>>>>
>>>>Thanks
>>>>
>>>>James
>>>>
>>>>-----Original Message-----
>>>>From: Michael Stearne [mailto:[EMAIL PROTECTED]]
>>>>Sent: 13 January 2001 17:06
>>>>To: James Duncan
>>>>Cc: [EMAIL PROTECTED]
>>>>Subject: Re: [PHP-WIN] DOM
>>>>
>>>>Could you do something like:
>>>>
>>>>myForm.myField.value =
>>>>  tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeValue;
>>>>
>>>>Set up a form of hidden fields.  Extract the values from the DOM and
>>>>then have the user hit a Submit button to get to the next page.  At that
>>>>point the values that were collected and put into the hidden form fields
>>>>will be submitted and your next page (the PHP page) could INSERT the
>>>>values into the database.
>>>>
>>>>Michael
>>>>
>>>>
>>>>On Friday, January 12, 2001, at 07:30 PM, James Duncan wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I'm still new to HTML, Javascript and PHP but learning (fast
>>>>> hopefully). I've just started accessing DOM elements. I have worked out
>>>>> how to update the contents of table cells directly using this method,
>>>>> etc. In Javascript I would use code like:
>>>>>
>>>>>   alert("Value is: " +
>>>>>     tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeName);
>>>>>   alert("Value is: " +
>>>>>     tablejames.firstChild.childNodes[1].childNodes[5].firstChild.firstChild.nodeValue);
>>>>>
>>>>> This Javascript shows the name and value of the child element.
>>>>>
>>>>> Now I want to use PHP to extract data (values) from HTML pages like I
>>>>> do with the above Javascript. Is this possible? Obviously with the
>>>>> Javascript the HTML page has already been rendered in the browser (i.e.
>>>>> all tree elements have been created). This makes extracting data a
>>>>> simple case of finding the "#text" elements and reading in the values.
>>>>> Can I do the same thing with PHP and an HTML file I've downloaded from
>>>>> the Internet? Obviously this file is sitting on my server and hasn't
>>>>> been rendered in a browser...
>>>>>
>>>>> The whole point of this exercise is so that I can extract values from
>>>>> an HTML table and populate them into a database. Maybe it's easier to
>>>>> process the HTML file line by line and strip the unwanted HTML tags?
>>>>> However, with this approach I've got to hardcode each webpage...
>>>>>
>>>>> If this is a silly question then sorry but you only learn if you ask ;)
>>>>>
>>>>> Thanks
>>>>>
>>>>> James
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> PHP Windows Mailing List (http://www.php.net/)
>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>> To contact the list administrators, e-mail: [EMAIL PROTECTED]
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>------end of quote------
>>>
>>>
>>>
>>>____________________________________________________________
>>>Cynic:
>>>
>>>A member of a group of ancient Greek philosophers who taught
>>>that virtue constitutes happiness and that self control is
>>>the essential part of virtue.
>>>
>>>[EMAIL PROTECTED]
>>------end of quote------
>------end of quote------
------end of quote------





