[PHP] Re: html stripping

Al Thu, 04 Dec 2003 19:23:28 -0800

<[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> No, this is not some shady game or strip poker knock-off.  My question has
> to do with a personal project I have started.  I have a script that grabs
> names and ids from a database, puts them in an array and then based on that
> grabs a URL and parses that URL for this name, drops all the html crap, and
> takes the information/stats and inserts or updates the stats table.  Problem
> is on certain names the page that it is parsing is different, and so I get
> loads and loads of extra HTML, and one name in particular doesn't return all
> the stats.


In essence, you're trying to spider / trawl web pages whose HTML formatting
is not consistent, using fixed-string functions like substr() and
str_replace() that cannot adapt when the text around your data changes.

You'd be much better off extracting the relevant data from the HTML with
functions such as preg_match() and preg_replace(), which let you search
text using Perl-style regular expressions (aka regexes). Function
documentation can be found at:
http://au3.php.net/manual/en/function.preg-match.php and
http://au3.php.net/manual/en/function.preg-replace.php
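
As a rough sketch of how the two functions fit together (the "Kills:" label
and the table markup are made up for illustration, since I don't know what
your pages look like), something along these lines pulls one number out of a
page fragment and then strips whatever tags are left:

```php
<?php
// Hypothetical page fragment; your real markup will differ.
$html = '<tr><td class="lbl">Kills:</td><td><b>1,204</b></td></tr>';

// Capture the digits (and thousands separators) that follow the "Kills:"
// label, skipping over any tags or whitespace in between.
if (preg_match('/Kills:(?:\s|<[^>]*>)*([\d,]+)/i', $html, $m)) {
    $kills = (int) str_replace(',', '', $m[1]);
} else {
    $kills = null; // label not found on this page
}

// preg_replace() can drop the remaining tags and collapse the whitespace.
$text = preg_replace('/<[^>]*>/', ' ', $html);
$text = trim(preg_replace('/\s+/', ' ', $text));
```

Checking for a null result before you touch the stats table should also tell
you which names are coming back incomplete.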

If you have no idea what a regex is, think of it as powerful, wildcard-based
searching. It allows you to write search expressions that are very tolerant
of changing search text and can still extract the data you need.
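
To illustrate that tolerance, here is a minimal sketch. The helper name
extract_stat(), the "Wins" label and both page layouts are assumptions of
mine, not anything taken from your script. The same pattern finds the number
in two differently formatted snippets, where a fixed substr() offset would
only work for one of them:

```php
<?php
// Hypothetical helper: returns the number that follows $label, or null
// if the label is not on the page.
function extract_stat($html, $label)
{
    // Label, optional colon, then any mix of whitespace, &nbsp; and tags,
    // then the digits we want to capture.
    $pattern = '/' . preg_quote($label, '/')
             . '\s*:?(?:\s|&nbsp;|<[^>]*>)*(\d+)/i';
    return preg_match($pattern, $html, $m) ? (int) $m[1] : null;
}

// Two made-up layouts for the same stat.
$pageA = '<td>Wins:</td><td>42</td>';
$pageB = "<p><span class=\"stat\">wins</span>\n  <strong>42</strong></p>";

$a = extract_stat($pageA, 'Wins');
$b = extract_stat($pageB, 'Wins');
$c = extract_stat('<p>No stats here</p>', 'Wins'); // label missing
```

Here $a and $b both come out as 42 and $c is null. A pattern like this is
still only as good as the layouts you have tested it against, so expect to
loosen it as you come across new page variants.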

Regexes can be confusing when you start out. Try looking at this tutorial
for some starters: http://www.phpfreaks.com/tutorials/52/0.php

Good luck.

Al

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
