On Mon, Dec 18, 2017 at 2:29 AM, Peng Yu <pengyu...@gmail.com> wrote:
> Hi,
>
> I would like to extract "a...@efg.hij.xyz". But it only shows ".hij".
> Does anybody see what is wrong with it? Thanks.
>
> $ cat main.py
> #!/usr/bin/env python
> # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:
>
> import re
> email_regex = re.compile('[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')
> s = 'a...@efg.hij.xyz.'
> for email in re.findall(email_regex, s):
>   print email
>
> $ ./main.py
> .hij

What is the goal of your email address extraction? There are two possible goals: one cannot be done perfectly but doesn't need to be, and the other cannot be done perfectly and is thus virtually useless.

If you want to detect email addresses in text and turn them into mailto: links, it's okay to miss some edge cases, and for that, I would recommend keeping your regex REALLY simple - something like what you have above, but maybe even simpler. (And I wouldn't have the parentheses in there, which I think is what you're getting tripped up on: when the pattern contains a capturing group, re.findall returns the text of that group rather than the whole match.)

But if you're trying to *validate* an email address - for instance, if you receive a form submission and want to know whether it included an email address - then my recommendation is simply DON'T. You can't get all the edge cases right; it is effectively impossible for a regex to match every valid email address and reject every invalid one. And that's only counting *syntactically* valid addresses - it doesn't take into account the fact that mail to a well-formed address like "b...@junk.example.com" is not going to get anywhere. So for validation, basically just don't.

ChrisA
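
To illustrate the parentheses point, here is a minimal sketch. It assumes Python 3 (the original script is Python 2), uses a made-up sample address rather than the one from the post, and the `(?:...)+` variant is just one possible fix for the link-detection case, not a complete email pattern:

```python
import re

# Made-up sample text; the address is a placeholder, not the one from the post.
text = "contact: someone@efg.hij.xyz."

# Same shape as the original pattern: the parentheses form a capturing group,
# so findall() returns only what that group matched.
capturing = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)')

# One possible alternative: a non-capturing group (?:...) repeated with +,
# so findall() returns whole matches and every dotted component is consumed.
simple = re.compile(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+')

group_only = capturing.findall(text)
# finditer() exposes the whole match via group(0), even with a capturing group.
full_matches = [m.group(0) for m in capturing.finditer(text)]
whole = simple.findall(text)

print(group_only)    # ['.hij']
print(full_matches)  # ['someone@efg.hij']
print(whole)         # ['someone@efg.hij.xyz']
```

Note that the original pattern has two separate issues: the capturing group changes what findall returns, and without a repetition on that group the match stops after the first dotted component. The trailing period of the sentence is left out of the last result because the group requires at least one character after each dot.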