On Tue, Nov 29, 2011 at 4:54 PM, Sheppy R <bobross...@gmail.com> wrote:
> Couldn't you just use the non-whitespace character to capture everything
> before and after the @ symbol?
>
> s/^.*\s(\S+@\S+)\s.*$/$1/

Yes, you could, of course, but that is why I said "nearly no syntax checking": the minor check for a . in the address helps to weed out funny non-addresses like mystuff@someplace.

The biggest problem with email addresses is that the rules for how an address can be formatted are so relaxed, and therefore so complex, that to the best of my knowledge nobody has ever managed to write a 100% correct regular expression that checks whether a string really meets all the criteria of a valid email address. One of the problems is that even if you were to manage to create such a thing, there are a few possibilities in the specification that are valid but are unlikely to be accepted by any email server or mail client. Take for instance the following address:

    1....1....1....1@something...@somewhere.info

This is technically a valid email address, but I can already tell you that your mail client is likely to choke on it, and the mail server at somewhere.info will not like it much either. That is the problem with email addresses as they are actually used, as opposed to what the specification allows.

Personally, I would avoid doing any sanity checking on the addresses you filter out, at least on the first pass. After all, when you are harvesting addresses it is better in the end to have a few non-existent ones than to miss a few valid ones: the cost of missing an address is an order of magnitude bigger than the cost of sending an email that bounces because the address does not exist. So on the first pass, grab anything that smells like an email address. In the next step you can filter further, for instance keeping only the addresses whose domain name your DNS server can look up.
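A minimal Perl sketch of those first two passes, assuming the text to search is already in a string. The names `extract_candidates` and `filter_resolvable` are hypothetical. The pattern is deliberately loose (no whitespace, an @, and at least one dot after it), in the spirit of the . check mentioned above; it is not a validator. The default resolver uses Perl's built-in `gethostbyname`, which only looks for an address record, whereas mail is routed by MX records, so for a real second pass a CPAN module such as Net::DNS would be a better fit.

```perl
use strict;
use warnings;

# First pass: collect anything that roughly looks like an address.
# Deliberately loose: no whitespace, an @, and at least one dot after it.
sub extract_candidates {
    my ($text) = @_;
    my (%seen, @out);
    while ($text =~ /([^\s<>"\@]+\@[^\s<>"\@]+\.[^\s<>"\@]+)/g) {
        my $addr = $1;
        $addr =~ s/[.,;:!?)\]]+\z//;                  # drop trailing sentence punctuation
        push @out, $addr unless $seen{lc $addr}++;    # skip case-insensitive duplicates
    }
    return @out;
}

# Second pass: keep only addresses whose domain part resolves.
# The resolver is a code ref; the default uses the built-in gethostbyname,
# which looks up an address record, not an MX record.
sub filter_resolvable {
    my ($resolver, @addrs) = @_;
    $resolver ||= sub {
        my $packed = gethostbyname($_[0]);
        return defined $packed;
    };
    return grep {
        my ($domain) = /\@([^\@]+)\z/;
        defined $domain && $resolver->($domain);
    } @addrs;
}

my $text = 'Mail <joe@example.com>, mystuff@someplace or ann@mail.example.org.';
my @candidates = extract_candidates($text);    # joe@example.com, ann@mail.example.org

# Stub resolver for illustration only: pretend just example.com resolves.
my @live = filter_resolvable(sub { $_[0] eq 'example.com' }, @candidates);
print "$_\n" for @live;    # prints joe@example.com
```

Passing the resolver in as a code ref keeps the sketch testable without network access; calling `filter_resolvable(undef, @candidates)` would do real lookups instead.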
Then the next step is to try mailing them and filter out the ones that bounce: check your email account for any messages stating that the receiving mail server does not know the account. In the end you will have a list of valid, existing email addresses that you can then spam to no end, or, well, whatever else your intention is with them ;-)

Regards,
Rob Coops
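For the bounce step, a minimal Perl sketch, assuming the bounce messages have already been pulled out of the mailbox as plain strings and that they follow the standard delivery status notification format of RFC 3464, where each recipient is reported in a `Final-Recipient: rfc822; ...` field alongside an `Action:` field. Many servers send free-form bounce text instead, which this sketch will not catch. The names `failed_recipients` and `remove_bounced` are hypothetical.

```perl
use strict;
use warnings;

# Collect recipients reported as permanently failed in RFC 3464-style
# delivery status notifications. Each element of @bounces is assumed to be
# the raw text of one bounce message, already fetched from the mailbox.
sub failed_recipients {
    my @bounces = @_;
    my %failed;
    for my $msg (@bounces) {
        # Per-recipient fields are grouped in blocks separated by blank lines.
        for my $block (split /\r?\n[ \t]*\r?\n/, $msg) {
            next unless $block =~ /^Action:[ \t]*failed\b/mi;
            while ($block =~ /^Final-Recipient:[ \t]*rfc822[ \t]*;[ \t]*<?([^\s>]+)/mgi) {
                $failed{ lc $1 } = 1;
            }
        }
    }
    return \%failed;
}

# Drop every address that one of the bounces reported as failed.
sub remove_bounced {
    my ($addrs, @bounces) = @_;
    my $failed = failed_recipients(@bounces);
    return grep { !$failed->{ lc $_ } } @$addrs;
}

my $bounce = join "\n",
    'Reporting-MTA: dns; mx.example.org',
    '',
    'Final-Recipient: rfc822; ann@example.org',
    'Action: failed',
    'Status: 5.1.1',
    '';
my @still_good = remove_bounced([ 'joe@example.com', 'ann@example.org' ], $bounce);
print "$_\n" for @still_good;    # prints joe@example.com
```

Only `Action: failed` counts, so a temporary `Action: delayed` notice does not get an address dropped from the list.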