On Sun, 5 Oct 2003 04:46:16 -0400, you wrote:

>I'm trying to strip comments out of my code.  I can get it to strip one
>section of comments but the problem comes in when I have more then one
>comment section to strip.
>
>I am using this: $code = preg_replace('/\/*(.*?)*\//is', '$1', $code) and
>need help fixing my regex.

As someone already mentioned, regexes aren't the right tool for this job.
Consider:

echo ("/*"); /* test */

And while that's unlikely in real code it is /possible/, and doing it the
right way is so easy due to the tokenizer functions
(http://www.php.net/manual/en/ref.tokenizer.php) that it would be foolish
not to.

The following script prints out it's own source code, sans comments (I use
something like this to replace tabs with spaces). It's adapted from a
fragment in the manual.

It removes /* comments */, <!-- comments --> and // comments

<?php

$incoming = file_get_contents ($PATH_TRANSLATED);
echo (strip_comments ($incoming));

function strip_comments ($in)
{
    $out = '';
    $tokens = token_get_all ($in);

    foreach ($tokens as $token)
    {
        if (is_string ($token))
        {
            $out .= $token;
        } else {
            list ($id, $text) = $token;
            switch ($id) { 
                case T_INLINE_HTML :
                    $out .= preg_replace ('/<!--(.|\s)*?-->/', '', $text);
                    break;
                case T_COMMENT :
                case T_ML_COMMENT : break;
                default : $out .= $text;
                          break;
            }
        }
    }

    return ($out);
}
?>

I'm reasonably certain I can get away with using a regex to strip HTML
comments because SGML/XML are stricter on the placing of angle brackets.

If anyone can come up with a case that breaks the regex I'll take a shot at
an XSLT-based fix.

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to