I'm having a problem with regular expressions. Not a good week this week;
it seems like I always hit a wall of problems. I know this has surely been
asked a thousand times. I looked around and didn't find anything, so if you
know of something, please point me to it.

So here is what I want to do: I need to parse an XML document, but before
parsing it I need to get rid of the bad HTML that I don't want. The
document also requires some markup that I do need, so I don't want to get
rid of all the HTML.

I already wrote a little bit of code that picks out my good elements and
checks for bad stuff. The only problem is that "text<text-1" is good
content, but I need to change < to &lt; or it will do bad things to my XML
parser.

Here is what I tried:

$simple = <<<XMLDATA
<?xml version='1.0'?>
 <!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
 <!ENTITY plainEntity "FOO entity">
 <!ENTITY systemEntity SYSTEM "xmltest2.xml">
 ]>
 <item>
text 
   <bad stuff>
text<text-1
   text
 <image  title="Ceci est mon titre2" description="Ceci est ma description"
link="http://www.windplanet.com/"
url="http://www.windplanet.com/images/news/988991159.gif"
align="left" width="235"  height="131"  size="13310"/>
text
        text
 <image title="Ceci est mon titre" description="Ceci est ma description"
link="http://www.windplanet.com/"
url="http://www.windplanet.com/images/news/988991159.gif" align="left"
width="235"  height="131"  size="13310"/>
 </item>

XMLDATA;
//$simple = str_replace("\n\n"," &lt;br/>  &lt;br/> ",$simple); 

/* find all the < except those followed by one of these ... */
$data = $simple;
print $data;
if (preg_match_all("/\<(?:(?:\!|\/|\?|)(?:<!xml|<!DOCTYPE|<!ENTITY|<!image|<!item|))/", $data, $cbadhtml)) {
  foreach ($cbadhtml as $key => $myarray) {
    foreach ($myarray as $key2 => $myarray2) {
      print "<p><font color='red'>You can't use HTML here so "
        . htmlentities($myarray2) . " is not allowed</font></p>\n";
    }
  }
  // found HTML? then we exit
  //exit;
}

It finds all the < but doesn't exclude the ones that I accept, so how can I
find the bad < and transform them to &lt;?
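
To make the goal concrete, here is a rough sketch of the kind of thing I am after, not a tested solution. It assumes a negative lookahead, written (?! ... ), is the right tool, and that the allowed names are the ones from my attempt above. The function name escapeBadLt is made up for illustration.

```php
<?php
// Hypothetical helper (name invented for this sketch): escape every "<"
// that does not start one of the accepted constructs.
function escapeBadLt(string $xml): string
{
    // Assumed whitelist, taken from the names used in the attempt above.
    $allowed = 'xml|DOCTYPE|ENTITY|image|item';

    // (?! ... ) is a negative lookahead: the "<" matches only when it is
    // NOT followed by an optional "!", "/" or "?" and then an allowed name.
    // \b stops "item" from also accepting a longer name such as "items".
    $pattern = '~<(?![!/?]?(?:' . $allowed . ')\b)~';

    return preg_replace($pattern, '&lt;', $xml);
}

echo escapeBadLt('text<text-1 <bad stuff> <item>ok</item>'), "\n";
// prints: text&lt;text-1 &lt;bad stuff> <item>ok</item>
```

The ">" left behind after "bad stuff" is deliberately untouched in this sketch; only the "<" characters are rewritten, since those are what upset the parser.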

Thank you and have a nice day.

-- 
Francis Fillion, BAA SI
Broadcasting live from his linux box.
And the maintainer of http://www.windplanet.com
