I hate it when I post something and then find a bit of information I should have included.
http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg
The poster lists four valid HTML constructs that regexes are ill-equipped to handle, and the commenters add more examples. Somu, I believe this is what you've been asking for.

On Sat, Apr 14, 2012 at 09:44:59AM -0700, Michael Rasmussen wrote:
> On Sat, Apr 14, 2012 at 07:05:54PM +0300, Shlomi Fish wrote:
> > Hi Somu,
> >
> > On Sat, 14 Apr 2012 21:01:03 +0530
> > Somu <som....@gmail.com> wrote:
> >
> > > OK. Can I ask "WHY?"
> > > Why can't it be done using regex? Isn't an HTML file just another long
> > > string with more, but similar, special characters??
> >
> > First of all, I should note that you appear to be replying to the wrong
> > messages, which breaks the flow of the thread. Otherwise, please read the
> > links which I gave you:
>
> I did; he may or may not have, but ...
> They all say not to do it without the "WHY". The closest is
> http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
> with "It's a solved problem" being the "WHY" given.
>
> Well, that's not totally fair of me.
>
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
> does start:
>   You can't parse [X]HTML with regex. Because HTML can't be parsed by
>   regex. Regex is not a tool that can be used to correctly parse HTML.
>   ...
>   Regular expressions are a tool that is insufficiently sophisticated to
>   understand the constructs employed by HTML.
>
> Though the humor in the rest of the post masks that essential statement.
>
> Somu, regex to HTML parsing is like:
>   screwdriver to nail
>   butter knife to screw
>   mid-sized car to coal transport
>   bicycle to 3,000 km journey to be completed in 48 hours
>   meat to a vegetarian
>   hair brush to can of paint
>
> To a greater or lesser degree you might try to use one for the purpose,
> but it's not the right tool for the job.
>
> --
> Michael Rasmussen, Portland Oregon
> Other Adventures: http://www.jamhome.us/ or http://westy.saunter.us/
> Fortune Cookie Fortune du courrier:
> By being willing to be a bad artist, you have a chance to BE an artist,
> and perhaps over time, a very good one
>     ~ Julia Cameron
>
> s/artist/what you want to be/
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/

--
Michael Rasmussen, Portland Oregon
Other Adventures: http://www.jamhome.us/ or http://westy.saunter.us/
Fortune Cookie Fortune du courrier:
You're suddenly worried about how much is in your retirement account,
but other people are worried about how much is on their dinner plate tonight.
    ~ Rick Steves on the economy March 2009
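To make the "WHY" concrete, here is a minimal sketch of three valid HTML constructs that trip up a naive pattern. It is written in Python purely for illustration; the patterns use only plain character classes and a non-greedy `.*?`, which behave the same way in Perl. The names `naive_hrefs`, `parser_hrefs`, `HrefCollector`, and `naive_div_contents` are hypothetical. The regex picks up a link that sits inside an HTML comment, misses a real link whose quoted `title` attribute contains `>`, and pairs an outer `<div>` with the inner `</div>`. Python's standard-library `html.parser.HTMLParser` handles the comment and the quoted attribute correctly; on the Perl side, the CPAN module HTML::Parser fills a similar role.

```python
import re
from html.parser import HTMLParser

# Hypothetical helpers for illustration only; a sketch, not a library API.

# Deliberately naive: an <a ...> start tag is assumed to end at the first '>'.
NAIVE_HREF = re.compile(r'<a\s[^>]*href="([^"]*)"', re.IGNORECASE)

SAMPLE = (
    '<!-- <a href="old.html">retired link</a> -->\n'
    '<a title="x > y" href="real.html">current link</a>\n'
)


def naive_hrefs(html):
    """Extract href values with a single regular expression."""
    return NAIVE_HREF.findall(html)


class HrefCollector(HTMLParser):
    """Collect href attributes from real <a> start tags only."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes removed;
        # comments go to handle_comment, which this class does not override.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def parser_hrefs(html):
    """Extract href values using a real HTML tokenizer."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


def naive_div_contents(html):
    """Non-greedy <div>...</div> match; it cannot count nesting depth."""
    return re.findall(r"<div>(.*?)</div>", html)


if __name__ == "__main__":
    print(naive_hrefs(SAMPLE))    # ['old.html']
    print(parser_hrefs(SAMPLE))   # ['real.html']
    print(naive_div_contents("<div>outer <div>inner</div> tail</div>"))
    # ['outer <div>inner']
```

The regex returns only the commented-out link and misses the live one, while the parser does the reverse. The `<div>` pattern stops at the first `</div>` because a classical regular expression has no way to track arbitrary nesting depth (Perl's recursive patterns are an extension beyond classical regexes).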