Hello,

I am working on a script that reads in an HTML file and outputs formatted
plain text.  Not a significant task, but one area where I am having
difficulty is gracefully converting the 'A' element.  The desired
outcome is text that keeps the link target in brackets after the link text:

------------------------------------
$body = "<p><a href=\"mailto:[EMAIL PROTECTED]\">first link</a></p>
         <p><a href=\"http://www.tao.ca/\">second link</a></p>
         <p><a href=\"http://www.tao.ca\">http://www.tao.ca</a></p>
         <p><a href=\"mailto:[EMAIL PROTECTED]\">[EMAIL PROTECTED]</a></p>";

// Regexes for dealing with HTML elements go here
// Most of them omitted for simplicity

// http links: link text, then the full URL in brackets
$body = preg_replace('/<a href="(http:\/\/)(.*)".*>(.*)<\/a>/Usi',
                     "\\3 (\\1\\2)", $body);
// mailto links: link text, then the address (without "mailto:") in brackets
$body = preg_replace('/<a href="(mailto:)(.*)".*>(.*)<\/a>/Usi',
                     "\\3 (\\2)", $body);

//---------------//
// output        //
//---------------//

/*

first link ([EMAIL PROTECTED])
second link (http://www.tao.ca/)
http://www.tao.ca (http://www.tao.ca)
[EMAIL PROTECTED] ([EMAIL PROTECTED])

*/

------------------------------------------

The regexes above handle the first and second links fine, but leave redundant
text in the third and fourth, where the link text is the same as the link
target.  Ideally, the replacement would leave out the text in brackets on the
3rd and 4th lines.  This is what I am having difficulty with.  How can I
incorporate such logic into my regexes?
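
For illustration only, here is a minimal sketch of one possible approach:
compare the link text with the link target inside a callback, and add the
bracketed part only when they differ.  It assumes PHP 7 or later (closures
and type declarations).  The function name links_to_text, the address
someone@example.org, and the choice to ignore letter case and a trailing
slash when comparing are all assumptions made for this example, not part of
the script above.

```php
<?php
// Hypothetical helper: turns <a> elements into "text (target)", but leaves
// out the bracketed target when it only repeats the link text.
function links_to_text(string $html): string
{
    // Group 1: the href (http:// or mailto: only); group 2: the link text.
    $pattern = '/<a\s+href="((?:http:\/\/|mailto:)[^"]*)"[^>]*>(.*?)<\/a>/si';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $text = $m[2];
        // For mailto: links show only the address, as in the output above.
        $target = preg_replace('/^mailto:/i', '', $m[1]);

        // Assumption: letter case and a trailing slash do not matter.
        $same = strcasecmp(rtrim($text, '/'), rtrim($target, '/')) === 0;

        return $same ? $text : "$text ($target)";
    }, $html);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Usage with a made-up address:
$a = links_to_text('<a href="mailto:someone@example.org">first link</a>');
// $a is 'first link (someone@example.org)'
$b = links_to_text('<a href="http://www.tao.ca">http://www.tao.ca</a>');
// $b is 'http://www.tao.ca'
```

A callback is used here because a plain replacement string cannot decide
between two outputs; a back-reference inside the pattern (matching the link
text against the captured target) might be another way to express the same
comparison.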

Thanks for your help,


Michael Caplan
Institute for Social Ecology
http://www.social-ecology.org/

1118 Maple Hill Road
Plainfield, VT, 05667 USA


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
To contact the list administrators, e-mail: [EMAIL PROTECTED]
