On Thu, Nov 26, 2015 at 10:44 AM, Grobu <snailcoder@retrosite.invalid> wrote:
> On 26/11/15 00:06, Chris Angelico wrote:
>>
>> On Thu, Nov 26, 2015 at 9:48 AM, ryguy7272 <ryanshu...@gmail.com> wrote:
>>>
>>> Thanks!! Is that regex? Can you explain exactly what it is doing?
>>> Also, it seems to pick up a lot more than just the list I wanted, but
>>> that's ok, I can see why it does that.
>>>
>>> Can you just please explain what it's doing???
>>
>> It's a trap!
>>
>> Don't use a regex to parse HTML, unless you're deliberately trying to
>> entice young and innocent programmers to the dark side.
>>
>> ChrisA
>
> Sorry, I wasn't aware of regex being on the dark side :-)
> Now that you mention it, I suppose that their being complex and
> error-inducing could lead to broken code all too easily when there is a
> reliable, ready-made solution like BeautifulSoup.

Regular expressions have their uses, but parsing HTML is not one of
them. The most important use of a regex is letting an end user control
the search pattern; it's a compact language for describing a variety
of text search concepts.

For hard-coded regular expressions, there are some places where
they're very good, and a lot of places where they're the wrong tool
for the job. And one of those wrong-tool-for-job places is parsing
stuff that fundamentally cannot be parsed with regexes, such as HTML.
You _need_ a proper parser, which is what Beautiful Soup is for.

ChrisA
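
To illustrate the regex-versus-parser point, here is a minimal sketch
rather than anything from the original thread. The sample markup and the
names `items_by_regex`, `ListItemParser` and `items_by_parser` are made up
for this example, and the standard library's `html.parser` stands in for
Beautiful Soup so that the snippet runs without third-party packages.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup, invented for this illustration.
SAMPLE = """
<ul>
  <li>alpha</li>
  <li class="x">beta</li>
  <li>gam<b>ma</b></li>
  <!-- <li>commented out</li> -->
</ul>
"""


def items_by_regex(html):
    # Naive hard-coded pattern: it skips <li> tags that carry attributes,
    # keeps nested markup in the result, and matches inside comments.
    return re.findall(r"<li>(.*?)</li>", html)


class ListItemParser(HTMLParser):
    """Collect the text of each (non-nested) <li> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # text chunks of the <li> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.items.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data):
        # Only keep text that appears inside an open <li>.
        if self._current is not None:
            self._current.append(data)


def items_by_parser(html):
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


print(items_by_regex(SAMPLE))   # ['alpha', 'gam<b>ma</b>', 'commented out']
print(items_by_parser(SAMPLE))  # ['alpha', 'beta', 'gamma']
```

The regex can be patched for each of these cases, but every patch adds
another special case, while the parser already understands attributes,
nested tags and comments. With Beautiful Soup the parser half would
typically shrink to a `find_all("li")` call on the parsed document.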