Re: [Haskell-cafe] Regular Expression with PCRE

Carter Tazio Schonwald Fri, 16 Mar 2012 17:18:49 -0700

There are a lot of reasons why I don't recommend that approach, but I think
they're best explained by the following now-classic Stack Overflow question
and answer:


http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

Basically this applies in your case because recognizing whether a sequence of
characters is inside an HTML comment block is likely not expressible using
regexes.

There may be a way to do it for a very controlled, restricted subset of HTML,
but it might require some complex regexes.

That said, if you're OK with some false positives and can deal with them, a
simple regex-based solution is the way to go!
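
As a rough illustration of that simple approach (a sketch, not a vetted
solution): assuming the regex-pcre package and its Text.Regex.PCRE module from
this thread, asking (=~) for a (before, match, after) triple lets you drop each
comment and recurse on the rest. Pinning the result to Bool gives a negated
match as plain `not`. The names stripComments and (!~) are made up for this
example; neither is exported by regex-pcre.

```haskell
module StripComments where

import Text.Regex.PCRE ((=~))

-- | Hypothetical helper: remove every "<!-- ... -->" span.
-- (?s) lets '.' match newlines; ".*?" is lazy, so each match stops at
-- the first "-->" instead of swallowing everything up to the last one.
stripComments :: String -> String
stripComments input =
  case input =~ "(?s)<!--.*?-->" :: (String, String, String) of
    -- On failure the triple is (whole input, "", ""), and the pattern
    -- can never match the empty string, so "" means "no comment left".
    (before, "", _)    -> before
    (before, _, after) -> before ++ stripComments after

-- | Hypothetical negated match, specialised to String and Bool.
-- It cannot share the fully polymorphic type of (=~), because "no match"
-- only has an obvious meaning for a Bool result.
(!~) :: String -> String -> Bool
text !~ pattern = not (text =~ pattern)
```

With this, stripComments "a<!-- x -->b" should give "ab", and an unterminated
"<!--" is left untouched. It will also happily delete a "<!-- ... -->" that
sits inside a script string or an attribute value, which is exactly the kind
of false positive mentioned above.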

Cheers,



-- 
Carter Tazio Schonwald


On Friday, March 16, 2012 at 7:08 PM, Joseph Bozeman wrote:

> My goal is to remove the HTML comments. It probably would be at least as 
> efficient to use an HTML parser, but I usually strip files by hand, and I 
> always use regex then. I didn't want to bother importing yet another package, 
> because if I could just get this line to work, I could get all my stripping 
> done with three functions, and then I have four that I use to apply a 
> template to the text once it's bare.
> 
> On Fri, Mar 16, 2012 at 5:41 PM, Carter Tazio Schonwald 
> <carter.schonw...@gmail.com (mailto:carter.schonw...@gmail.com)> wrote:
> > Have you considered using one of the many amazing HTML parsers on Hackage?
> > 
> > If the goal is to just get the HTML comments, that might be a much more 
> > effective use of your time 
> > 
> > -- 
> > Carter Tazio Schonwald
> > 
> > 
> > On Friday, March 16, 2012 at 4:55 PM, Joseph Bozeman wrote:
> > 
> > 
> > 
> > > Hey everyone, I'm hoping someone can point me in the right direction. 
> > > 
> > > > The regex-pcre package exports (=~) and (=~~) as two useful infix
> > > > functions. They're great! The only problem is that they only report
> > > > positive matches. I have a file that contains HTML comments (it was
> > > > generated in Word) and I really just want the barest text. I already
> > > > have a function that strips out all the tags, and I have a function
> > > > that finds all the links and sticks those in another file for later
> > > > perusal.
> > > 
> > > What I'd like is advice on how to implement the (!~) and (!~~) operators. 
> > > They should have the same types as (=~) and (=~~). I'm stuck, though. 
> > > > Here's the source for both of those functions; they depend on the
> > > > Text.Regex.PCRE module.
> > > 
> > > > (=~) :: ( RegexMaker Regex CompOption ExecOption source
> > > >         , RegexContext Regex source1 target )
> > > >      => source1 -> source -> target
> > > > (=~) x r = let q :: Regex
> > > >                q = makeRegex r
> > > >            in match q x
> > > 
> > > > (=~~) :: ( RegexMaker Regex CompOption ExecOption source
> > > >          , RegexContext Regex source1 target
> > > >          , Monad m )
> > > >       => source1 -> source -> m target
> > > > (=~~) x r = do (q :: Regex) <- makeRegexM r
> > > >                matchM q x
> > > > 
> > > What I figured I could do was find a function that was the inverse of 
> > > "match" and "matchM", but I can't find any in the docs. I really hope I 
> > > don't have to implement that, too. I'm still new at this, and that seems 
> > > like it would be over my head.
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Haskell-Cafe mailing list
> > > Haskell-Cafe@haskell.org (mailto:Haskell-Cafe@haskell.org)
> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> > > 
> > > 
> > > 
> > 
> > 
>

