That sounds like the way to go. Yes, the sites I have looked at so far put
their data into tables. Is there a PHP command that replaces everything up to
the first occurrence of <table> with "", or something along those lines?
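
Something along these lines might do it. This is only a minimal sketch: the
function name strip_before_table and the sample page are made up for
illustration, and it assumes a PHP version that has stripos() and scalar type
declarations.

```php
<?php
// Hypothetical helper: drop everything before the first <table tag.
// The search is case-insensitive, so <TABLE> and <table> both match.
// If there is no table at all, the input is returned unchanged.
function strip_before_table(string $html): string
{
    $pos = stripos($html, '<table');
    return $pos === false ? $html : substr($html, $pos);
}

// Made-up sample page; "ABC" is a placeholder value.
$page = "<html><body><h1>Prices</h1><TABLE><tr><td>ABC</td></tr></TABLE></body></html>";
echo strip_before_table($page), "\n";
```

Note that this keeps the <table> tag itself and everything after it, so the
trailing </body></html> still has to be dealt with separately.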
I agree with the DOM statement too. I worked out the DOM access via
Javascript on IE quite quickly, but I've only had a very brief look at the PHP
DOM access libraries and they look like much harder work (especially as I'm
new to programming).
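
For comparison, here is a minimal sketch of what the DOM route could look
like. It assumes a PHP build with the bundled DOM extension (DOMDocument and
DOMXPath), which may not match the libraries mentioned above; find_row and the
sample page are hypothetical. It loops over the text nodes inside table cells
until one matches a company name or EPIC code, then reads the cells of that
row.

```php
<?php
// Hypothetical helper: return the cell texts of the first table row that
// mentions $needle (e.g. a company name or EPIC code), or [] if none does.
function find_row(string $html, string $needle): array
{
    // Downloaded HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Loop over every text node that sits inside a table cell.
    foreach ($xpath->query('//td//text()') as $text) {
        if (stripos($text->nodeValue, $needle) === false) {
            continue;
        }
        // Climb to the enclosing row, then read each of its cells.
        $row = $xpath->query('ancestor::tr[1]', $text)->item(0);
        if ($row === null) {
            continue;
        }
        $cells = [];
        foreach ($xpath->query('td', $row) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        return $cells;
    }
    return [];
}

// Made-up sample page; the codes and prices are placeholders.
$page = '<html><body><table>'
      . '<tr><td>ABC</td><td>123.5</td></tr>'
      . '<tr><td>XYZ</td><td>42</td></tr>'
      . '</table></body></html>';
print_r(find_row($page, 'xyz'));
```

The match is case-insensitive and stops at the first hit, so a short needle
could match the wrong row; a real version would probably want an exact
comparison on the EPIC column.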
Thanks
James
-----Original Message-----
From: Tom [mailto:[EMAIL PROTECTED]]
Sent: 16 January 2001 10:27
Cc: [EMAIL PROTECTED]
Subject: Re: [PHP-WIN] DOM
James
My guess is that whatever the site, the data will be in a table.
It is fairly trivial to strip off everything before the beginning of the table
(replace everything up to and including <TABLE> with "").
Then replace (say) <tr> with </td> and </tr> with "".
Finally, strip out the end of the file (</table> and onwards).
You will then just have the table data, all separated by </td><td>, which
should be easy to handle (presumably you want to make a distinction between
EPIC and price - do this by using the datatype, and there'll probably be a
pile of formatting to sort out as well, but that should really be pretty
trivial!)
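
As a rough sketch of the idea, here is a variation on the steps above rather
than a literal transcription: it cuts out the body of the first table with one
regular expression, then splits on </tr> and </td>. The function name
table_cells and the sample page are made up, and it assumes closing tags are
present and tables are not nested.

```php
<?php
// Hypothetical helper: return the first table in $html as an array of rows,
// each row being an array of plain-text cell values.
function table_cells(string $html): array
{
    // Keep only what lies between the first <table ...> and the next </table>.
    if (!preg_match('/<table[^>]*>(.*?)<\/table>/is', $html, $match)) {
        return [];
    }

    $rows = [];
    foreach (preg_split('/<\/tr>/i', $match[1]) as $row) {
        $cells = preg_split('/<\/td>/i', $row);
        // The last piece is whatever followed the final </td>, so drop it.
        array_pop($cells);
        // strip_tags removes the leftover <tr>, <td> and formatting tags.
        $cells = array_map(function ($cell) {
            return trim(strip_tags($cell));
        }, $cells);
        // Rows without any </td> (e.g. <th> header rows) end up empty.
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample page; the codes and prices are placeholders.
$page = "<html><body><p>Prices</p><TABLE border=1>"
      . "<tr><td>ABC</td><td><b>123.5</b></td></tr>"
      . "<tr><td>XYZ</td><td>42</td></tr>"
      . "</TABLE><p>footer</p></body></html>";
print_r(table_cells($page));
```

For the sample page this gives two rows, ABC / 123.5 and XYZ / 42, which can
then be type-checked and written to the database.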
Alternatively, you could spend a considerable amount of time trying to get a
generic XML parser to work and rebuild a DOM, which would no doubt improve
your XML skills immeasurably, but you would probably die trying!
Tom
James Duncan wrote:
> Thanks Tom. Yes you have it exactly right. That is the approach I'm
> currently aiming for! However, as you say this approach is hard-coded to
> each source website. These websites have a nasty habit of changing their
> format slightly on a fairly regular basis. I'm also attempting to pull share
> price information from many different websites at the same time because none
> provide the full set of data I require plus some shares (off market
> particularly) are only provided on dedicated web sites.
>
> The reason I'm attempting to access the HTML textual data via the DOM is
> because I can run a looped search on all the #text fields until I find a
> match on a company name or EPIC code and then all data on the nested #text
> elements will be referring to that company. This allows easy data capture
> and transfer to my database. Another major benefit of this approach is that
> the same PHP code can be used to search ANY HTML file and recover the
> required data without source code changes. That's the idea but whether it's
> actually possible in reality is another matter ;)
>
> Thanks
>
> James
>
> -----Original Message-----
> From: Tom [mailto:[EMAIL PROTECTED]]
> Sent: 15 January 2001 10:31
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PHP-WIN] DOM
>
> James
>
> If I'm reading your many posts right, then what you are trying to do is pull
> the share prices from the same site at (say) half hourly intervals, so that
> you can use them yourself / analyse them or whatever.
> In this case, I suspect that the format of the page you pull down will
> ALWAYS BE IDENTICAL, so you actually only have to work out a suitable parser
> to extract the data once.
> If I remember rightly from a couple of weeks back, you are using MySQL as
> the database? In this case, pull the html file down, save it on your server
> and examine how the html is constructed (it will almost certainly be an ASP /
> PHP while construct to build a table, all of whose rows will thus be
> identical apart from the data).
> Then you can use a command line (run from a PHP script if you like) MySQL
> LOAD DATA INFILE 'blah.html' INTO TABLE Share_Prices FIELDS TERMINATED BY
> '</td><td>';
> type of construct.
> Note that you will want to strip out the beginning and end of the file first
> as well. This may sound like a bit of work, but you only have to do it once,
> as the file format will always be the same (barring the addition of new
> stocks).
>
> Tom
>
> James Duncan wrote:
>
> > I don't think this will work in my case because I don't control the layout
> > of the HTML page and hence can't add the hidden fields. I'm downloading the
> > HTML pages from a website. It would require as much work to insert the
> > hidden fields as trying to strip the HTML tags in an attempt to read the
> > data directly from the HTML page itself. There must be a way to access the
> > DOM directly from PHP? I notice in the manual there is a section regarding
> > XML DOM but not the DOM itself.
> >
> > Are the DOM values only available on the client? If that's the case then
> > PHP can't be used to read them because it's limited to the server side?
> >
> > Thanks
> >
> > James
> >
> > -----Original Message-----
> > From: Michael Stearne [mailto:[EMAIL PROTECTED]]
> > Sent: 13 January 2001 17:06
> > To: James Duncan
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [PHP-WIN] DOM
> >
> > Could you do something like:
> >
> > myForm.myField.value=tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeValue;
> >
> > Set up a form of hidden fields. Extract the values from the DOM and then
> > have the user hit a Submit button to get to the next page. At that point
> > the values that were collected and put into the hidden form fields will be
> > submitted and your next page (the PHP page) could INSERT the values into
> > the database.
> >
> > Michael
> >
> > On Friday, January 12, 2001, at 07:30 PM, James Duncan wrote:
> >
> > > Hi folks,
> > >
> > > I'm still new to HTML, Javascript and PHP but learning (fast hopefully).
> > > I've just started accessing DOM elements. I have worked out how to
> > > update the contents of table cells directly using this method, etc. In
> > > Javascript I would use code like:
> > >
> > > alert("Value is: " +
> > > tablejames.firstChild.childNodes[1].childNodes[4].firstChild.firstChild.nodeName);
> > > alert("Value is: " +
> > > tablejames.firstChild.childNodes[1].childNodes[5].firstChild.firstChild.nodeValue);
> > >
> > > This Javascript shows the name and value of the child element.
> > >
> > > Now I want to use PHP to extract data (values) from HTML pages like I do
> > > with the above Javascript. Is this possible? Obviously with the
> > > Javascript the HTML page has already been rendered in the browser (i.e.
> > > all tree elements have been created). This makes extracting data a simple
> > > case of finding the "#text" elements and reading in the values. Can I do
> > > the same thing with PHP and an HTML file I've downloaded from the
> > > Internet? Obviously this file is sitting on my server and hasn't been
> > > rendered in a browser...
> > >
> > > The whole point of this exercise is so that I can extract values from an
> > > HTML table and populate them into a database. Maybe it's easier to
> > > process the HTML file line by line and strip the unwanted HTML tags?
> > > However, with this approach I've got to hardcode each webpage...
> > >
> > > If this is a silly question then sorry but you only learn if you ask ;)
> > >
> > > Thanks
> > >
> > > James
> > >
> > >
> > >
> > > --
> > > PHP Windows Mailing List (http://www.php.net/)
> > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > To contact the list administrators, e-mail: [EMAIL PROTECTED]