Jean-Claude Neveu wrote:
Hello,

I was wondering if someone could tell me where I'm going wrong with my regular expression. I'm trying to write a regexp that identifies whether a string contains a correctly-formatted currency amount. I want to support dollars, UK pounds and Euros, but the example below deliberately omits Euros in case the Euro symbol gets mangled anywhere in email or listserver processing. I also want people to be able to omit the currency symbol if they wish.

If Euro symbols can get mangled, so can Pound signs. They're both outside ASCII.

My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"

Here's how I think it should work (but clearly I'm wrong, because it does not actually work):

^\$\£? Require zero or one instance of $ or £ at the start of the string.

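^[$£]? is correct. As written, \$\£? requires a Dollar sign and then allows an optional Pound sign, so "£12.42" and "12.42" are both rejected. And, as you're using re.match, the ^ is superfluous. (A previous message suggested ^[\$£]? which will also work. You generally need to escape a Dollar sign, but not inside a character class.)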
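To make the difference concrete, here is a small sketch comparing the pattern as posted with the character-class version. It assumes Python 3 (where str patterns are Unicode) and the sample strings are only illustrations, not taken from any real data.

```python
import re

# The pattern as posted: a mandatory "$" followed by an optional "£".
original = re.compile(r"^\$\£?\d{0,10}(\.\d{2})?$")

# The corrected pattern: at most one symbol, either "$" or "£".
fixed = re.compile(r"^[$£]?\d{0,10}(\.\d{2})?$")

samples = ["$12.42", "£12.42", "12.42", "$£12.42"]

# Map each sample to (matches original?, matches fixed?).
results = {s: (bool(original.match(s)), bool(fixed.match(s))) for s in samples}

for s in samples:
    print(s, results[s])
```

The posted pattern accepts "$£12.42" and rejects "£12.42" and "12.42", which is the opposite of what was intended; the character class behaves as described.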

You should also think about the encoding. In my terminal, "£" is identical to '\xc2\xa3'. That is, two bytes encoding a single code point in UTF-8. If you assume this encoding, it's best to make it explicit. And if you don't assume a specific encoding it's best to convert to unicode to do the comparisons, so for 2.x (or portability) your string should start with u"
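The encoding point can be sketched as follows, assuming Python 3 and assuming the input arrives as UTF-8 bytes (the `raw` value is a made-up example). With a byte pattern, the character class sees the two bytes of "£" as two separate members, so it can never consume the whole symbol; decoding to text first avoids that.

```python
import re

pound = u"\u00a3"  # the Pound sign as text

# The same character is two bytes in UTF-8 but one byte in Latin-1.
assert pound.encode("utf-8") == b"\xc2\xa3"
assert pound.encode("latin-1") == b"\xa3"

raw = b"\xc2\xa312.42"  # hypothetical input: UTF-8 bytes for "£12.42"

# Byte pattern: the class holds "$", 0xC2 and 0xA3 as three separate
# single-byte members, so it consumes only half of the Pound sign.
byte_rex = re.compile(rb"[$\xc2\xa3]?\d{0,10}(\.\d{2})?$")
byte_match = byte_rex.match(raw)

# Text pattern: decode explicitly, then match against a str pattern.
text_rex = re.compile(r"[$£]?\d{0,10}(\.\d{2})?$")
text_match = text_rex.match(raw.decode("utf-8"))

print(byte_match)           # no match on the raw bytes
print(text_match.group(0))  # the full amount, matched as text
```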

d{0,10}     Next, require between zero and ten alpha characters.

There's a backslash missing, but not from your original expression. Digits are not "alpha characters".

(\.\d{2})? Optionally, two characters can follow. They must be preceded by a decimal point.

That works. Of course, \d{2} is longer than the simpler \d\d.

Note that you can comment the original expression like this:

rex = u"""(?x)
    ^[$£]?     # Zero or one instance of $ or £
               # at the start of the string.
    \d{0,10}   # Between zero and ten digits.
    (\.\d{2})? # Optionally, two digits.
               # They must be preceded by a decimal point.
    $          # End of string
"""

Then anybody (including you) who comes to read this in the future will have some idea what you were trying to do.
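For completeness, a minimal sketch of how such a commented pattern might be used, assuming Python 3. The helper name `is_amount` is hypothetical and only there for illustration.

```python
import re

# Verbose mode: whitespace and "#" comments in the pattern are ignored
# (except inside a character class).
rex = r"""(?x)
    ^[$£]?     # Zero or one instance of $ or £
    \d{0,10}   # Between zero and ten digits
    (\.\d{2})? # Optionally, a decimal point and two digits
    $          # End of string
"""
amount_re = re.compile(rex)


def is_amount(text):
    """Return True if text matches the currency pattern (hypothetical helper)."""
    return amount_re.match(text) is not None


for candidate in ["$12.42", "$12", "£12.42", "12.4", "$12,482.96", "", "$12\n"]:
    print(repr(candidate), is_amount(candidate))
```

Two things are worth noticing when you run this. Because \d{0,10} allows zero digits, the empty string (and a bare "$") is accepted. And $ also matches just before a trailing newline, so "$12\n" is accepted too; re.fullmatch or \Z would be stricter if that matters.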

Examples of acceptable input should be:

$12.42
$12
£12.42
$12,482.96 (now I think about it, I have not catered for this in my regexp)

Yes, you need to think about that.
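One possible way to cater for the comma-separated form, offered as a sketch rather than a definitive answer. It assumes Python 3, assumes commas may only appear between groups of exactly three digits, and (unlike the original pattern) assumes at least one digit is required. The name `grouped_re` is made up for the example.

```python
import re

grouped_re = re.compile(r"""(?x)
    ^[$£]?                  # Optional currency symbol
    (?:
        \d{1,3}(?:,\d{3})+  # Either comma-grouped digits, e.g. 12,482
      |
        \d{1,10}            # or up to ten digits with no commas
    )
    (?:\.\d{2})?            # Optionally, a decimal point and two digits
    $                       # End of string
""")

accepted = ["$12.42", "$12", "£12.42", "$12,482.96", "12,482,000"]
rejected = ["$1,2", "$12,48", "$1234,567", "$"]

for candidate in accepted + rejected:
    print(candidate, bool(grouped_re.match(candidate)))
```

Whether mixed forms, other separators, or amounts without a leading digit should be allowed is a decision for you; the sketch picks one reading of "correctly formatted".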


               Graham

