I hate it when I post something and then find a bit of information I should 
have included.

http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg

The poster lists four valid HTML constructs that regexes are ill-equipped to 
handle.
The commenters add more examples.
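
As a rough illustration of the kind of trouble meant here, below is a minimal sketch in Python using only the standard library. The sample input, the pattern, and the names SAMPLE, NAIVE_HREF, LinkCollector, hrefs_by_regex and hrefs_by_parser are all made up for this example; they are not taken from the linked post. The input is valid HTML containing a commented-out link and a ">" inside a quoted attribute value.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input, chosen for illustration only:
# a commented-out link, then a real link whose title contains ">".
SAMPLE = '''<!-- <a href="commented-out.html">old</a> -->
<a title="5 > 3" href="real.html">real</a>'''

# A naive pattern that looks reasonable at first glance:
# "<a", anything up to the closing ">", then an href attribute.
NAIVE_HREF = re.compile(r'<a[^>]*href="([^"]*)"')


def hrefs_by_regex(text):
    """Return href values found by the naive pattern."""
    return NAIVE_HREF.findall(text)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags seen by a real parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def hrefs_by_parser(text):
    """Return href values found by html.parser."""
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    print("regex :", hrefs_by_regex(SAMPLE))
    print("parser:", hrefs_by_parser(SAMPLE))
```

On this input the naive pattern reports the link inside the comment and misses the real one, because it stops at the ">" in the title attribute. The parser skips the comment and reports only real.html. The pattern could be patched for these two cases, but each patch tends to expose another valid construct it does not cover.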

Somu, I believe this is what you've been asking for.



On Sat, Apr 14, 2012 at 09:44:59AM -0700, Michael Rasmussen wrote:
> On Sat, Apr 14, 2012 at 07:05:54PM +0300, Shlomi Fish wrote:
> > Hi Somu,
> > 
> > On Sat, 14 Apr 2012 21:01:03 +0530
> > Somu <som....@gmail.com> wrote:
> > 
> > > OK. Can i ask "WHY?"
> > > Why can't it be done using regex. Isn't a html file just another long
> > > string with more, but similar special characters??
> > > 
> > 
> > first of all I should note that you appear to be replying to the wrong
> > messages which breaks the flow of the thread. Otherwise, please read the
> > links which I gave you:
> 
> I did, he may or may not have but ...
> They all say not to do it, without giving the "WHY".  The closest is 
>   http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
> "It's a solved problem" being the "WHY" given. 
> 
> Well, that's not totally fair of me. 
> 
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
> Does start:
>  You can't parse [X]HTML with regex. Because HTML can't be parsed by
>  regex. Regex is not a tool that can be used to correctly parse HTML. 
>    ...
>  Regular expressions are a tool that is insufficiently sophisticated to
>  understand the constructs employed by HTML.
> 
> Though the humor in the rest of the post masks that essential statement.
> 
> Somu, regex to HTML parsing is like:
>   screwdriver to nail
>   butter knife to screw
>   mid sized car to coal transport
>   bicycle to 3,000 km journey to be completed in 48 hours
>   meat to a vegetarian
>   hair brush to can of paint
> 
> To a greater or lesser degree you might try to use one for the purpose,
> but it's not the right tool for the job.  
> 
> -- 
>             Michael Rasmussen, Portland Oregon  
>       Other Adventures: http://www.jamhome.us/ or http://westy.saunter.us/
> Fortune Cookie Fortune du courrier:
> By being willing to be a bad artist, you have a chance to BE an artist, 
> and perhaps over time, a very good one 
>     ~ Julia Cameron
> 
> s/artist/what you want to be/
> 
> -- 
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
> 
> 

-- 
            Michael Rasmussen, Portland Oregon  
      Other Adventures: http://www.jamhome.us/ or http://westy.saunter.us/
Fortune Cookie Fortune du courrier:
You're suddenly worried about how much is in your retirement account, 
but other people are worried about how much is on their dinner plate tonight.
    ~ Rick Steves on the economy March 2009
