Re: [R] pasteFromExcel

Dan Murphy Tue, 21 Oct 2014 06:07:36 -0700

Sure, that is one pattern to try to detect, but there are many more
(e.g., cannot have multiple '$' or '€'). For speed, I'm looking for a
*single* expression to detect valid currency strings in one grep.


The one shown for US works. For euros, it might suffice to replace '$'
with '€' but I cannot test that in my location. Can you? I.e.,

Change currencypattern to

currencypattern <-
  
"^\\€?\\-?([1-9]{1}[0-9]{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))€|^\\-?\\€?([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))€|^\\€?\\(([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))\\)€"

Does Excel display euro values per the format within this test vector?

x <- c("1,234.00", "12,34.00", "€1,000", "(124)", "$(123)", "(€123)",
"  1,000   ", "NA")

and does grep yield the correct answer? It should, but better to test
it than assume.

grep(currencypattern, trim(x))
[1] 1 3 4 5 7  # correct answer

I suppose my biggest holdup is knowing how Excel formats currencies in
other denominations. Maybe there's a way for me to test euro,
sterling, etc. in my location, but I haven't discovered it yet. :(

Again, thanks for your help.

-Dan

On Mon, Oct 20, 2014 at 3:39 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Oct 20, 2014, at 2:34 PM, Dan Murphy wrote:
>
>> Good ideas, David.
>>
>> 1) By "confirm that MS Excel honors that OutDec" I mean that, in a
>> location (France? others?) where options("OutDec") is a comma, does MS
>> Excel format numbers that way when displaying currencies with decimal
>> places? I have no way of knowing if that is true in all OutDec = ","
>> locales.
>>
>> 2) I wish it were as simple as just removing unwanted "adornments."
>> The issue is that such "adornments" must be in their proper places for
>> the character string to represent a currency value, or a numeric value
>> for that matter. If I add one more comma to your first element in the
>> wrong place, it should not translate to a valid numeric, but it does
>> with your gsub, which would be a bug if that were in pasteFromExcel:
>>> gsub(rmchar, "", c("$1,0,00", "1,200", "800"))
>> [1] "1000" "1200" "800"
>
> If you wanted to restrict the substitutions to only the commas that were 
> succeeded by three digits then this succeeds:
>
> gsub("(\\,)(\\d{3,3})", "\\2", c("1,000,000,000.00") )
> [1] "1000000000.00"
>
> You should also take a look at formatC which has provisions for output using 
> commas.
>
> -
> david.
>
>
>>
>> When I originally looked into this I believed I couldn't be the first
>> one asking that question .. and I wasn't. There are many hits for
>> regular expressions that purport to successfully identify well-formed
>> *US dollar* currency strings. The expression in pasteFromExcel is
>> based on 
>> http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex.
>>
>> I'm curious if anyone has come across -- and tested -- a similar
>> regular expression in other places that might have use for
>> pasteFromExcel.
>>
>> This is how pasteFromExcel uses its currency regular expression (the
>> first ugly assignment is what I'm looking for in other locales around
>> the world -- maybe there's a Regular Expression mailing list out
>> there):
>>
>> currencypattern <-
>> "^\\$?\\-?([1-9]{1}[0-9]{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\-?\\$?([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))$|^\\$?\\(([1-9]{1}\\d{0,2}(\\,\\d{3})*(\\.\\d{0,2})?|[1-9]{1}\\d{0,}(\\.\\d{0,2})?|0(\\.\\d{0,2})?|(\\.\\d{1,2}))\\)$"
>>
>> # Here's a test vector
>> x <- c("1,234.00", "12,34.00", "$1,000", "(124)", "$(123)", "($123)",
>> "  1,000   ", "NA")
>>
>> # grep will tell you whether elements of x, trimmed of
>> beginning/ending whitespace, match the currencypattern
>> grep(currencypattern, trim(x))
>> [1] 1 3 4 5 7  # correct answer
>>
>> *Now* one may remove unwanted characters from the well-formed strings.
>> And deal with the "negatives" of course .. and NAs. See how that's
>> done in excelRio.r in the excelRio package on github:
>> https://github.com/trinostics/excelRio
>>
>> Thanks for your interest.
>>
>>
>> On Mon, Oct 20, 2014 at 10:56 AM, David Winsemius
>> <dwinsem...@comcast.net> wrote:
>>>
>>> On Oct 20, 2014, at 10:29 AM, Dan Murphy wrote:
>>>
>>>> Nice.
>>>> So if someone were to offer a currency regular expression that works
>>>> in their locale, I should also ask them to give me the results of
>>>> Sys.getlocale("LC_MONETARY")
>>>> and
>>>> options("OutDec")
>>>> and confirm that MS Excel honors that OutDec.
>>>
>>> I'm not sure we can know what you mean by "confirm that MS Excel honors 
>>> that OutDec." The result of options("OutDec") was intended for you to 
>>> determine what character not to remove from a monetary value in an R 
>>> workspace. If the assumption is that all values will be in the same unit 
>>> and that the user is not doing any currency conversions then:
>>>
>>>> decsep <- options("OutDec")
>>>> rmchar <- paste0( "[$£€", c(".", ",")[!c(".", ",") %in% decsep], "]" )
>>>> gsub(rmchar, "", c("$1,000", "1,200", "800"))
>>> [1] "1000" "1200" "800"
>>>
>>>
>>>> Thank you, David.
>>>> -Dan
>>>>
>>>> On Mon, Oct 20, 2014 at 10:04 AM, David Winsemius
>>>> <dwinsem...@comcast.net> wrote:
>>>>>
>>>>> On Oct 19, 2014, at 11:18 PM, Dan Murphy wrote:
>>>>>
>>>>>> To Users of Excel:
>>>>>>
>>>>>> Following advice from Brian and Markus, I created an RMarkdown "vignette"
>>>>>> that shows an example of how the pasteFromExcel function in the excelRio
>>>>>> package on github could be used by an actuary to transfer a triangle from
>>>>>> Excel to R. See today's post at http://trinostics.blogspot.com/
>>>>>>
>>>>>> Unfortunately, if you are located outside the US, the demonstrated
>>>>>> functionality will not work for you because the currency regex 
>>>>>> implemented
>>>>>> assumes the dollar sign ($) and comma/decimal punctuation of the form
>>>>>> 999,999.00.
>>>>>>
>>>>>> If anyone is interested in contributing currency regex expressions that
>>>>>> work in your locale, I would be happy to try to incorporate them in the
>>>>>> package. If anyone knows how best to determine the user's locale (might
>>>>>> "timezone" suffice?), I'd appreciate that help too.
>>>>>>
>>>>>
>>>>> ?Sys.getlocale   # perhaps "LC_MONETARY"
>>>>>
>>>>> ?options   # look for OutDec
>>>>>
>>>>>
>>>>>>     [[alternative HTML version deleted]]
>>>>>
>>>>>
>>>>>
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>>
>>>
>>> David Winsemius
>>> Alameda, CA, USA
>>>
>
> David Winsemius
> Alameda, CA, USA
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] pasteFromExcel

Reply via email to