On Jan 30, 2012, at 1:30 PM, Marc Schwartz wrote:
>
> On Jan 30, 2012, at 12:15 PM, David Winsemius wrote:
>
>>
>> On Jan 30, 2012, at 8:44 AM, Paul Miller wrote:
>>
>>> Hi Rui, Marc, and Gabor,
>>>
>>> Thanks for your replies to my question. All were helpful and it was
>>> interesting to see how different people approach various aspects of the
>>> same problem.
>>>
>>> Spent some time this weekend looking at Rui's solution, which is certainly
>>> much clearer than my own. Managed to figure out pretty much all the details
>>> of how it works. Also managed to tweak it slightly in order to make it do
>>> exactly what I wanted. (See revised code below.)
>>>
>>> Still have a couple of questions though. The first concerns the insertion
>>> of the code "Y > 2012" to set year values beyond 2012 to NA (on line 10 of
>>> the function below). When I add this (or use it in place of
>>> "nchar(Y) > 4"), the code successfully finds the problem date "05/16/2015".
>>> After that, though, it produces the following error message:
>>>
>>> Error in if (any(is.na(x) & M != "un" & Y != "un")) cat("Warning: Invalid date values in",  :
>>>   missing value where TRUE/FALSE needed
>>
>> It's a bit dangerous to use comparison operators on mixed data types. In
>> your case you are comparing a character value to a numeric value and may not
>> realize that 2015 is not the same as "2015". Try "123" > 1000 if you want a
>> quick counter-example. You may want to coerce the Y value to "numeric" mode
>> to be safe.
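David's counter-example is easy to check directly. A short R sketch follows; the vector `Y` of year strings is a hypothetical stand-in for the year component extracted in Paul's function, which is not reproduced in this message.

```r
# A number compared with a character string is first coerced to character,
# so the comparison is lexical (string order), not numeric.
"123" > 1000        # TRUE: "123" vs "1000" compared as strings
123 > 1000          # FALSE: numeric comparison

# Hypothetical year strings, as they might be split out of "mm/dd/yyyy" values
Y <- c("2009", "2015", "un")
yr <- suppressWarnings(as.numeric(Y))   # coerce to numeric; "un" becomes NA
yr > 2012           # FALSE TRUE NA
```

Coercing first makes the intent explicit, and unknown years show up as NA, which then has to be handled deliberately rather than being compared as text.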
>>
>> Also 'any' does not expect the logical connectives. You probably want:
>>
>> any(is.na(x) , M != "un" , Y != "un")
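The error message quoted above is what `if()` reports when its condition is NA. `any()` returns NA when none of its elements is TRUE but at least one is NA, for example when some year values have been set to NA so that `Y != "un"` is NA. A minimal R sketch of that behaviour with two NA-safe alternatives follows. That this is the exact cause inside Paul's function, which is not shown here, is an assumption. The last lines show that separating conditions with commas inside `any()` tests something different from combining them with `&`.

```r
# any() returns NA when no element is TRUE but at least one is NA
any(c(FALSE, NA))                  # NA
any(c(TRUE, NA))                   # TRUE: one TRUE is enough

# if() cannot branch on NA; this is the error seen in the thread
res <- tryCatch(if (any(c(FALSE, NA))) "yes" else "no",
                error = function(e) conditionMessage(e))
res                                # "missing value where TRUE/FALSE needed"

# Two NA-safe versions of the test
isTRUE(any(c(FALSE, NA)))          # FALSE
any(c(FALSE, NA), na.rm = TRUE)    # FALSE

# Commas inside any() pool all elements; `&` combines them position by position
a <- c(TRUE, FALSE)
b <- c(FALSE, TRUE)
any(a & b)                         # FALSE: no position where both are TRUE
any(a, b)                          # TRUE: some element of a or b is TRUE
```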
>
>
> Perhaps I am missing something relevant here, but I am still confused by what
> I see as over-engineering of the code being implemented. If the primary
> requirements are:
>
> 1. Impute the 15th of month if it is 'un'
> 2. Reject dates prior to 1900 or after 2011
> 3. Reject dates with an unknown ('un') month or year
> 4. Reject years with >4 digits, also presuming that the value passed should
> always be 10 characters in length
>
> If that is the basic functionality required, then a modest modification of my
> prior code should work:
Ack... a typo in my code for the upper end of the date range. It should be:
checkDate <- function(x) {
  # Replace unknown day with 15
  tmp <- gsub("/un/", "/15/", x)
  # Parse as mm/dd/yyyy; an unknown ('un') month or year gives NA
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")
  # Return the original values that fail any of the checks
  as.character(x[is.na(tmp2) |
                 tmp2 < as.Date("1900/01/01") |
                 tmp2 > as.Date("2011/12/31") |
                 nchar(as.character(x)) > 10])
}
Marc
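A quick way to exercise Marc's checkDate() against each of the four requirements is to feed it one value per rule. The test values below are hypothetical, and the function is copied so the snippet runs on its own.

```r
# Marc's function, copied here so the example is self-contained
checkDate <- function(x) {
  tmp <- gsub("/un/", "/15/", x)             # impute day 15 for an unknown day
  tmp2 <- as.Date(tmp, format = "%m/%d/%Y")  # NA if month/year unknown or invalid
  as.character(x[is.na(tmp2) |
                 tmp2 < as.Date("1900/01/01") |
                 tmp2 > as.Date("2011/12/31") |
                 nchar(as.character(x)) > 10])
}

# Hypothetical test values, one per rule
dates <- c("01/01/2010",   # valid
           "05/un/2005",   # unknown day, imputed as 05/15/2005: valid
           "un/15/2005",   # unknown month: rejected
           "05/16/un",     # unknown year: rejected
           "13/01/2005",   # impossible month: rejected
           "12/31/1899",   # before 1900: rejected
           "05/16/2015",   # after 2011: rejected
           "05/16/20105")  # 11 characters: rejected by the nchar() check
bad <- checkDate(dates)
print(bad)                 # every value except the first two
```

The nchar() test is not redundant: as.Date() reads at most four digits for %Y and ignores trailing characters, so "05/16/20105" parses as 2010-05-16 and would pass the date-range checks on its own.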
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.