On Sun, 5 Oct 2003 04:46:16 -0400, you wrote: >I'm trying to strip comments out of my code. I can get it to strip one >section of comments but the problem comes in when I have more then one >comment section to strip. > >I am using this: $code = preg_replace('/\/*(.*?)*\//is', '$1', $code) and >need help fixing my regex.
As someone already mentioned, regexes aren't the right tool for this job. Consider: echo ("/*"); /* test */ And while that's unlikely in real code it is /possible/, and doing it the right way is so easy due to the tokenizer functions (http://www.php.net/manual/en/ref.tokenizer.php) that it would be foolish not to. The following script prints out it's own source code, sans comments (I use something like this to replace tabs with spaces). It's adapted from a fragment in the manual. It removes /* comments */, <!-- comments --> and // comments <?php $incoming = file_get_contents ($PATH_TRANSLATED); echo (strip_comments ($incoming)); function strip_comments ($in) { $out = ''; $tokens = token_get_all ($in); foreach ($tokens as $token) { if (is_string ($token)) { $out .= $token; } else { list ($id, $text) = $token; switch ($id) { case T_INLINE_HTML : $out .= preg_replace ('/<!--(.|\s)*?-->/', '', $text); break; case T_COMMENT : case T_ML_COMMENT : break; default : $out .= $text; break; } } } return ($out); } ?> I'm reasonably certain I can get away with using a regex to strip HTML comments because SGML/XML are stricter on the placing of angle brackets. If anyone can come up with a case that breaks the regex I'll take a shot at an XSLT-based fix. -- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php