> -----Original Message-----
> From: Andreas Perstinger [mailto:andiper...@gmail.com]
> Sent: Tuesday, May 28, 2013 11:10 PM
> To: php-general@lists.php.net
> Subject: Re: [PHP] need some regex help to strip out // comments but not
> http:// urls
> 
> On 28.05.2013 23:17, Daevid Vincent wrote:
> > I want to remove all comments of the // variety, HOWEVER I don't want to
> > remove URLs...
> >
> 
> You need a negative look behind assertion
> ( http://www.php.net/manual/en/regexp.reference.assertions.php ).
> 
> "(?<!http:)//" will match "//" only if it isn't preceded by "http:".
> 
> Bye, Andreas

This worked like a CHAMP Andreas my friend! You are a regex guru!
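
For anyone following along, here is a small sketch of what the lookbehind does. The three sample lines are made up for illustration (they are not from the thread). It compares Andreas' `(?<!http:)` pattern with a shorter `(?<!:)` variant that only looks at the single character before the slashes.

```php
<?php
// Hypothetical sample lines, not taken from the original thread.
$lines = array(
    'var a = 1; // trailing comment',
    'var u = "http://example.com/";',
    'var s = "https://example.com/";',
);

// Rejects "//" only when the five characters before it are exactly "http:".
$narrow = '@(?<!http:)//.*$@';
// Rejects "//" whenever the single character before it is ":".
$broad = '@(?<!:)//.*$@';

foreach ($lines as $line) {
    // preg_match() returns 1 when the line would be treated as containing a comment.
    printf("%-34s narrow=%d broad=%d\n", $line, preg_match($narrow, $line), preg_match($broad, $line));
}
```

The `https://` line is the interesting one: the five characters before its `//` are `ttps:`, so the narrow pattern still treats it as a comment start, while the broad one leaves it alone.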

> -----Original Message-----
> From: Sean Greenslade [mailto:zootboys...@gmail.com]
> Sent: Wednesday, May 29, 2013 10:28 AM
>
> Also, (I haven't tested it, but) I don't think that example you gave
> would work. Without any sort of quoting around the "http://", I would
> assume the JS interpreter would take that double slash as a
> comment starter. Do tell me if I'm wrong, though.

You're wrong Sean. :-p

This regex works in all cases listed in my example target string.

\s*(?<!:)//.*?$

Or in my actual compress() method:

$sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

Target test case with intentional traps:

// another comment here
<iframe src="http://foo.com">
function bookmarksite(title,url){
    if (window.sidebar) // firefox
        window.sidebar.addPanel(title, url, "");
    else if(window.opera && window.print){ // opera
        var elem = document.createElement('a');
        elem.setAttribute('href',url);
        elem.setAttribute('title',title);
        elem.setAttribute('rel','sidebar');
        elem.click();
    } 
    else if(document.all)// ie
        window.external.AddFavorite(url, title);
}
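
A minimal, standalone sketch of running that one preg_replace() against the test case above, in case anyone wants to try it outside the class. The variable `$out` and the nowdoc label `JS` are just names picked for this example.

```php
<?php
// Test input copied from the message above.
$sBlob = <<<'JS'
// another comment here
<iframe src="http://foo.com">
function bookmarksite(title,url){
    if (window.sidebar) // firefox
        window.sidebar.addPanel(title, url, "");
    else if(window.opera && window.print){ // opera
        var elem = document.createElement('a');
        elem.setAttribute('href',url);
        elem.setAttribute('title',title);
        elem.setAttribute('rel','sidebar');
        elem.click();
    }
    else if(document.all)// ie
        window.external.AddFavorite(url, title);
}
JS;

// Strip "//" comments (plus any whitespace right before them) unless the
// character immediately before the slashes is a colon, as in "http://".
$out = preg_replace('@\s*(?<!:)//.*?$@m', '', $sBlob);

echo $out, "\n";
```

The lookbehind only inspects the one character before the slashes, so a `//` inside a string literal that is not preceded by a colon (a protocol-relative URL such as "//cdn.example.com/x.js", for instance) would still be cut off at that point.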


And for those interested here is the whole method...

public function compress($sBlob)
{
        //remove C style /* */ blocks as well as PHPDoc /** */ blocks
        $sBlob = preg_replace("@/\*(.*?)\*/@s",'',$sBlob);
        //$sBlob = preg_replace("/\*[^*]*\*+(?:[^*/][^*]*\*+)*/s",'',$sBlob);
        //$sBlob = preg_replace("/\\*(?:.|[\\n\\r])*?\\*/s",'',$sBlob);

        //remove // or # style comments at the start of a line possibly redundant with next preg_replace
        $sBlob = preg_replace("@^\s*((^\s*(#+|//+)\s*.+?$\n)+)@m",'',$sBlob);
        //remove // style comments that might be tagged onto valid code lines. we don't try for # style as that's risky and not widely used
        // @see http://www.php.net/manual/en/regexp.reference.assertions.php
        $sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

        if (in_array($this->_file_name_suffix, array('html','htm')))
        {
                //remove <!-- --> blocks
                $sBlob = preg_replace("/<!--[^\[](.*?)-->/s",'',$sBlob);

                //if Tidy is enabled...
                //if (!extension_loaded('tidy')) dl( ((PHP_SHLIB_SUFFIX === 'dll') ? 'php_' : '') . 'tidy.' . PHP_SHLIB_SUFFIX);
                if (FALSE && extension_loaded('tidy'))
                {
                        //use Tidy to clean up the rest. There may be some redundancy with the above, but it shouldn't hurt
                        //See all parameters available here: http://tidy.sourceforge.net/docs/quickref.html
                        $tconfig = array(
                                                'clean' => true,
                                                'hide-comments' => true,
                                                'hide-endtags' => true,
                                                'drop-proprietary-attributes' => true,
                                                'join-classes' => true,
                                                'join-styles' => true,
                                                'quote-marks' => false,
                                                'fix-uri' => false,
                                                'numeric-entities' => true,
                                                'preserve-entities' => true,
                                                'doctype' => 'omit',
                                                'tab-size' => 1,
                                                'wrap' => 0,
                                                'wrap-php' => false,
                                                'char-encoding' => 'raw',
                                                'input-encoding' => 'raw',
                                                'output-encoding' => 'raw',
                                                'ascii-chars' => true,
                                                'newline' => 'LF',
                                                'tidy-mark' => false,
                                                'quiet' => true,
                                                'show-errors' => ($this->_debug ? 6 : 0),
                                                'show-warnings' => $this->_debug,
                        );

                        if ($this->_log_messages) $tconfig['error-file'] = DBLOGPATH.'/'.$this->get_file_name().'_tidy.log';

                        $tidy = tidy_parse_string($sBlob, $tconfig, 'utf8');
                        $tidy->cleanRepair();
                        $sBlob = tidy_get_output($tidy);

                        /*
                        //FIXME: [dv] this is an attempted hack to restore what Tidy fucks up...
                        //http://lists.w3.org/Archives/Public/html-tidy/2013AprJun/ should be a message from me on 2013-05-01
                        //$sBlob = str_replace(array('&lt;?=', '?&gt;', '-&gt;'), array('<?=', '?>', '->'), $sBlob);
                        //$sBlob = str_replace(array('&lt;', '&gt;'), array('<', '>'), $sBlob);
                        */
                }
        }

        //condense multiple white spaces with a single white space
        //http://stackoverflow.com/questions/1981349/regex-to-replace-multiple-spaces-with-a-single-space
        //http://stackoverflow.com/questions/2326125/remove-multiple-whitespaces-in-php
        $sBlob = preg_replace('/\s+/', ' ', $sBlob);

        //ini_set('xdebug.var_display_max_data', -1); var_dump($sBlob);
        return $sBlob;
}

Unfortunately, I never was able to get Tidy to not dick with my < and > chars.
However, I think even without it I get most of what I was looking to
accomplish, so I'm not crying over it.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
