Sure, that is one pattern to try to detect, but there are many more (e.g., cannot have multiple '$' or '€'). For speed, I'm looking for a *single* expression to detect valid currency strings in one grep.
The one shown for US works. For euros, it might suffice to replace '$' with '€' but I cannot test that in my location. Can you? I.e., Change currencypattern to currencypattern <- "^\\€?\\-?([1-9]{1}[0-9]{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))€|^\\-?\\€?([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))€|^\\€?\\(([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))\\)€" Does Excel display euro values per the format within this test vector? x <- c("1,234.00", "12,34.00", "€1,000", "(124)", "$(123)", "(€123)", " 1,000 ", "NA") and does grep yield the correct answer? It should, but better to test it than assume. grep(currencypattern, trim(x)) [1] 1 3 4 5 7 # correct answer I suppose my biggest holdup is knowing how Excel formats currencies in other denominations. Maybe there's a way for me to test euro, sterling, etc. in my location, but I haven't discovered it yet. :( Again, thanks for your help. -Dan On Mon, Oct 20, 2014 at 3:39 PM, David Winsemius <dwinsem...@comcast.net> wrote: > > On Oct 20, 2014, at 2:34 PM, Dan Murphy wrote: > >> Good ideas, David. >> >> 1) By "confirm that MS Excel honors that OutDec" I mean that, in a >> location (France? others?) where options("OutDec") is a comma, does MS >> Excel format numbers that way when displaying currencies with decimal >> places? I have no way of knowing if that is true in all OutDec = "," >> locales. >> >> 2) I wish it were as simple as just removing unwanted "adornments." >> The issue is that such "adornments" must be in their proper places for >> the character string to represent a currency value, or a numeric value >> for that matter. If I add one more comma to your first element in the >> wrong place, it should not translate to a valid numeric, but it does >> with your gsub, which would be a bug if that were in pasteFromExcel: >>> gsub(rmchar, "", c("$1,0,00", "1,200", "800")) >> [1] "1000" "1200" "800" > > If you wanted to restrict the substitutions to only the commas that were > succeeded by three digits then this succeeds: > > gsub("(\\,)(\\d{3,3})", "\\2", c("1,000,000,000.00") ) > [1] "1000000000.00" > > You should also take a look at formatC which has provisions for output using > commas. > > - > david. > > >> >> When I originally looked into this I believed I couldn't be the first >> one asking that question .. and I wasn't. There are many hits for >> regular expressions that purport to successfully identify well-formed >> *US dollar* currency strings. The expression in pasteFromExcel is >> based on >> http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex. >> >> I'm curious if anyone has come across -- and tested -- a similar >> regular expression in other places that might have use for >> pasteFromExcel. >> >> This is how pasteFromExcel uses its currency regular expression (the >> first ugly assignment is what I'm looking for in other locales around >> the world -- maybe there's a Regular Expression mailing list out >> there): >> >> currencypattern <- >> "^\\$?\\-?([1-9]{1}[0-9]{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\-?\\$?([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\$?\\(([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))\\)$" >> >> # Here's a test vector >> x <- c("1,234.00", "12,34.00", "$1,000", "(124)", "$(123)", "($123)", >> " 1,000 ", "NA") >> >> # grep will tell you whether elements of x, trimmed of >> beginning/ending whitespace, match the currencypattern >> grep(currencypattern, trim(x)) >> [1] 1 3 4 5 7 # correct answer >> >> *Now* one may remove unwanted characters from the well-formed strings. >> And deal with the "negatives" of course .. and NAs. See how that's >> done in excelRio.r in the excelRio package on github: >> https://github.com/trinostics/excelRio >> >> Thanks for your interest. >> >> >> On Mon, Oct 20, 2014 at 10:56 AM, David Winsemius >> <dwinsem...@comcast.net> wrote: >>> >>> On Oct 20, 2014, at 10:29 AM, Dan Murphy wrote: >>> >>>> Nice. >>>> So if someone were to offer a currency regular expression that works >>>> in their locale, I should also ask them to give me the results of >>>> Sys.getlocale("LC_MONETARY") >>>> and >>>> options("OutDec") >>>> and confirm that MS Excel honors that OutDec. >>> >>> I'm not sure we can know what you mean by "confirm that MS Excel honors >>> that OutDec." The result of options("OutDec") was intended for you to >>> determine what character not to remove from a monetary value in an R >>> workspace. If the assumption is that all values will be in the same unit >>> and that the user is not doing any currency conversions then: >>> >>>> decsep <- options("OutDec") >>>> rmchar <- paste0( "[$£€", c(".", ",")[!c(".", ",") %in% decsep], "]" ) >>>> gsub(rmchar, "", c("$1,000", "1,200", "800")) >>> [1] "1000" "1200" "800" >>> >>> >>>> Thank you, David. >>>> -Dan >>>> >>>> On Mon, Oct 20, 2014 at 10:04 AM, David Winsemius >>>> <dwinsem...@comcast.net> wrote: >>>>> >>>>> On Oct 19, 2014, at 11:18 PM, Dan Murphy wrote: >>>>> >>>>>> To Users of Excel: >>>>>> >>>>>> Following advice from Brian and Markus, I created an RMarkdown "vignette" >>>>>> that shows an example of how the pasteFromExcel function in the excelRio >>>>>> package on github could be used by an actuary to transfer a triangle from >>>>>> Excel to R. See today's post at http://trinostics.blogspot.com/ >>>>>> >>>>>> Unfortunately, if you are located outside the US, the demonstrated >>>>>> functionality will not work for you because the currency regex >>>>>> implemented >>>>>> assumes the dollar sign ($) and comma/decimal punctuation of the form >>>>>> 999,999.00. >>>>>> >>>>>> If anyone is interested in contributing currency regex expressions that >>>>>> work in your locale, I would be happy to try to incorporate them in the >>>>>> package. If anyone knows how best to determine the user's locale (might >>>>>> "timezone" suffice?), I'd appreciate that help too. >>>>>> >>>>> >>>>> ?Sys.getlocale # perhaps "LC_MONETARY" >>>>> >>>>> ?options # look for OutDec >>>>> >>>>> >>>>>> [[alternative HTML version deleted]] >>>>> >>>>> >>>>> >>>>> David Winsemius >>>>> Alameda, CA, USA >>>>> >>> >>> David Winsemius >>> Alameda, CA, USA >>> > > David Winsemius > Alameda, CA, USA > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.