Re: [PHP] Extract printable text from web page using preg_match

Martin Zvarík Tue, 27 Feb 2007 10:23:09 -0800

I believe it is better to use strpos() in this case, because it is faster than a regular expression.


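<?php
// Case-insensitive search; matching '<body' also covers tags that carry
// attributes, e.g. <body class="home">.
$start = stripos($website_code, '<body');
if ($start === false) {
    $start = 0; // no <body> tag: fall back to the whole document
}

// Optional: stop at </body>; without it, read to the end of the document.
$end = stripos($website_code, '</body>', $start);
if ($end === false) {
    $end = strlen($website_code);
}

$code = substr($website_code, $start, $end - $start);

echo strip_tags($code); // tags removed; <script>/<style> contents remain
?>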

This is very simple, but there are many HTML parsers out there if you want to do more complex stuff.
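
For the <script>, <style> and comment exclusions asked about in the original question, a rough sketch along the same lines might remove those blocks with preg_replace() before calling strip_tags(). The function name extract_visible_text() is made up for this example, and the patterns are an assumption about reasonably well-formed markup, not a substitute for a real parser:

```php
<?php
// Hypothetical helper for illustration; not part of any library.
function extract_visible_text($html)
{
    // Keep only what follows the opening <body ...> tag, if there is one.
    $start = stripos($html, '<body');
    if ($start !== false) {
        $html = substr($html, $start);
    }

    // Cut everything from </body> onward, if it is present.
    $end = stripos($html, '</body>');
    if ($end !== false) {
        $html = substr($html, 0, $end);
    }

    // Drop script and style elements together with their contents,
    // then HTML comments. Each match is replaced by a single space.
    $patterns = array(
        '#<script\b[^>]*>.*?</script\s*>#is',
        '#<style\b[^>]*>.*?</style\s*>#is',
        '#<!--.*?-->#s',
    );
    $html = preg_replace($patterns, ' ', $html);

    // Remove the remaining tags and collapse runs of whitespace.
    $text = strip_tags($html);
    return trim(preg_replace('/\s+/', ' ', $text));
}
?>
```

Called on '<body><p>Hello <b>world</b></p><script>var a = 1;</script> Bye</body>', this should return "Hello world Bye". Entities such as &amp; are left as they are, and unclosed <script> or <style> blocks are not handled.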


Martin

-----------------------
M5 wrote:

I am trying to write a regex function to extract the readable (visible, screen-rendered) portion of any web page. Specifically, I only want the text between the <body> tags, excluding any <script> or <style> tags within the document, also excluding comments. Has anyone here seen such a regex? Is it possible to do in one expression?
...Rene


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
