Jean-Claude Neveu wrote:
Hello,
I was wondering if someone could tell me where I'm going wrong with my
regular expression. I'm trying to write a regexp that identifies whether
a string contains a correctly-formatted currency amount. I want to
support dollars, UK pounds and Euros, but the example below deliberately
omits Euros in case the Euro symbol gets mangled anywhere in email or
listserver processing. I also want people to be able to omit the
currency symbol if they wish.
If Euro symbols can get mangled, so can Pound signs.
They're both outside ASCII.
My regexp that I'm matching against is: "^\$\£?\d{0,10}(\.\d{2})?$"
Here's how I think it should work (but clearly I'm wrong, because it
does not actually work):
^\$\£? Require zero or one instance of $ or £ at the start of the
string.
^[$£]? is correct. And, as you're using re.match, the ^ is
superfluous. (A previous message suggested ^[\$£]? which
will also work. You generally need to escape a Dollar sign
but not here.)
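To illustrate the difference, here is a minimal sketch. It assumes Python 3, where strings are Unicode by default, and the names original, fixed, samples and results are only for illustration. The sequence \$\£? demands a Dollar sign and then allows an optional Pound sign, whereas the class [$£]? allows at most one of the two:

```python
import re

# The pattern as posted: a mandatory "$" followed by an optional "£".
original = re.compile(r"^\$\£?\d{0,10}(\.\d{2})?$")
# The corrected pattern: one optional character, either "$" or "£".
fixed = re.compile(r"^[$£]?\d{0,10}(\.\d{2})?$")

samples = ["$12.42", "£12.42", "12.42", "$£12"]
# Map each sample to (matches original?, matches fixed?).
results = {s: (bool(original.match(s)), bool(fixed.match(s))) for s in samples}
for s, (old, new) in results.items():
    print(f"{s!r}: original={old}, fixed={new}")
```

The posted pattern rejects "£12.42" and "12.42" but accepts "$£12", which is probably the opposite of what was intended.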
You should also think about the encoding. In my terminal,
"£" is identical to '\xc2\xa3'. That is, two bytes for a
UTF-8 code point. If you assume this encoding, it's best to
make it explicit. And if you don't assume a specific
encoding it's best to convert to unicode to do the
comparisons, so for 2.x (or portability) your string should
start with u" (a unicode literal).
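Here is a rough sketch of why the encoding matters. It is written for Python 3 rather than 2.x (an assumption on my part), and text_rex, byte_rex and the three result names are made up for the example. Once the pattern is UTF-8 bytes, the character class no longer contains one Pound sign but two separate bytes:

```python
import re

pound = "£"
# One code point as text, two bytes once encoded as UTF-8.
assert len(pound) == 1
assert pound.encode("utf-8") == b"\xc2\xa3"

# Text pattern: the class holds two characters, "$" and "£".
text_rex = re.compile(r"^[$£]?\d+$")
# Same pattern as UTF-8 bytes: the class now holds "$", 0xC2 and 0xA3.
byte_rex = re.compile(r"^[$£]?\d+$".encode("utf-8"))

text_ok = bool(text_rex.match("£12"))                    # text against text
bytes_ok = bool(byte_rex.match("£12".encode("utf-8")))   # real UTF-8 "£12"
half_ok = bool(byte_rex.match(b"\xa312"))                # half a Pound sign
print(text_ok, bytes_ok, half_ok)
```

With bytes, the class can consume only one of the two bytes of "£", so a genuine "£12" is rejected while a stray 0xA3 byte followed by digits is accepted. Comparing Unicode text against a Unicode pattern avoids both problems.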
d{0,10} Next, require between zero and ten alpha characters.
There's a backslash missing, but not from your original
expression. Digits are not "alpha characters".
(\.\d{2})? Optionally, two characters can follow. They must be preceded
by a decimal point.
That works. Of course, \d{2} is longer than the simpler \d\d.
Note that you can comment the original expression like this:
rex = u"""(?x)
    ^[$£]?      # Zero or one instance of $ or £
                # at the start of the string.
    \d{0,10}    # Between zero and ten digits
    (\.\d{2})?  # Optionally, two digits.
                # They must be preceded by a decimal point.
    $           # End of string
"""
Then anybody (including you) who comes to read this in the
future will have some idea what you were trying to do.
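For what it's worth, a commented pattern like this can be used as follows. This is only a sketch, assuming Python 3.4 or later for re.fullmatch; CURRENCY and is_currency are hypothetical names, and re.VERBOSE is the flag equivalent of the inline (?x):

```python
import re

CURRENCY = re.compile(r"""
    [$£]?        # optional currency symbol, $ or £
    \d{0,10}     # between zero and ten digits
    (\.\d{2})?   # optional decimal point followed by two digits
""", re.VERBOSE)


def is_currency(text):
    """Return True if the whole of text matches the pattern."""
    # fullmatch anchors at both ends, so ^ and $ are not needed.
    return CURRENCY.fullmatch(text) is not None


for sample in ["$12.42", "£12", "12.4", "", "$12\n"]:
    print(repr(sample), is_currency(sample))
```

Two things show up here. Because every part of the pattern is optional, the empty string is accepted, which is probably not wanted. And fullmatch rejects "$12\n", whereas a pattern ending in $ used with re.match would accept it, because $ also matches just before a trailing newline.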
Examples of acceptable input should be:
$12.42
$12
£12.42
$12,482.96 (now I think about it, I have not catered for this in my
regexp)
Yes, you need to think about that.
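One possible way to cater for it, offered as a sketch rather than the definitive answer: accept either groups of three digits separated by commas or a plain run of digits. This assumes commas appear only as thousands separators and that Python 3.4+ is available for fullmatch; AMOUNT and is_amount are names invented for the example.

```python
import re

AMOUNT = re.compile(r"""
    [$£]?                 # optional currency symbol
    (?:
        \d{1,3}(?:,\d{3})*  # 1-3 digits, then comma-separated groups of three
      |
        \d+                 # or a plain run of digits with no commas
    )
    (?:\.\d{2})?          # optional decimal point and two digits
""", re.VERBOSE)


def is_amount(text):
    """Return True if text is a well-formed amount under these assumptions."""
    return AMOUNT.fullmatch(text) is not None


for sample in ["$12,482.96", "$12482.96", "£12.42", "$1,23.45", "$12,4826", ""]:
    print(repr(sample), is_amount(sample))
```

Unlike the original pattern, this one requires at least one digit, so the empty string and a lone currency symbol are both rejected. It also drops the ten-digit limit, which would need to be reinstated separately if it matters.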
Graham