Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Chel Hee Lee Wed, 03 Dec 2014 17:50:56 -0800

The output in the object 'new1' is apparently the same as the output in the object 'new2'. Are you trying to compare the entries of the two outputs 'new1' and 'new2'? If so, the function 'all()' would be useful:


> all(new1 == new2, na.rm=TRUE)
[1] TRUE

If you are interested in comparing the two objects in terms of class, then the function 'identical()' is useful, since it also compares attributes. Here the class and the name of the last column differ:


> attributes(new1)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"

$class
[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

$row.names
[1] 1 2 3 4 5 6 7

> attributes(new2)
$names
[1] "id"      "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"
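
The attribute listings above suggest two reasons for the FALSE: the class vectors differ, and the last column is named 'oldflag' in one object and 'oiddate' in the other. A minimal base R sketch of that effect follows. The toy data frames 'a' and 'b' and the hand-set class vector are illustrative assumptions, not the objects from the thread.

```r
# Two toy data frames holding the same values (hypothetical stand-ins
# for new1 and new2).
a <- data.frame(id = c("1", "2"),
                oldflag = as.Date(c("2008-07-18", NA)),
                stringsAsFactors = FALSE)
b <- a
names(b)[2] <- "oiddate"                       # different column name
class(a) <- c("tbl_df", "tbl", "data.frame")   # mimic the extra classes by hand

same_before <- identical(a, b)

# Harmonise class and names, then compare again.
a_plain <- as.data.frame(a)    # drops the classes listed before "data.frame"
names(b) <- names(a_plain)
same_after <- identical(a_plain, b)

print(c(before = same_before, after = same_after))
```

If the comparison should ignore class and names altogether, comparing the columns one at a time is probably the simpler route.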

I hope this helps.

Chel Hee Lee

On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

Hello,

Two alternative approaches - mutate() vs. sapply() - were used to get the
desired results (i.e., creating a new column of the most recent date from 4
dates) with help from Arun and Mark on this forum.  I now find that the two
data objects (created using two different approaches) are not identical
although the results are exactly the same.

identical(new1, new2)
[1] FALSE

Please see the reproducible example below.

I don't understand why the code returns FALSE here.  Any hints/comments will
be appreciated.

Thanks,

Pradip

####################  reproducible example  ####################
library(dplyr)
# data object - description

temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp),
                     colClasses=c("character", "Date", "Date", "Date", "Date"),
                     header=TRUE, as.is=TRUE
                     )


# create a new column - dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>%
  rowwise() %>%
  mutate(oldflag = as.Date(max(mrjdate, cocdate, inhdate, haldate,
                               na.rm = TRUE),
                           origin = '1970-01-01'))

# create a new column - Base R solution (Acknowledgement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
   if (all(is.na(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                            'haldate')])))) {
     max_d <- NA
   } else {
     max_d <- max(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                             'haldate')]), na.rm = TRUE)
   }
   max_d}),
   origin = "1970-01-01")

identical(new1, new2)

# print records

print (new1); print(new2)

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Sunday, November 09, 2014 6:11 AM
To: 'Mark Sharp'
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your 
help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: Mark Sharp [mailto:msh...@txbiomed.org]
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on each entire column as a vector, so max() inside it returns
the maximum over the entire data set rather than one value per row.

I am almost certain there is some nice way to handle this, but the sapply() 
function is a standard approach.
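
One candidate for the "nice way" mentioned above is base R's pmax(), which takes the maximum element by element across several vectors, whereas max() collapses all of its arguments into a single value. This is only a sketch on three of the rows from the example data. The names 'd', 'overall' and 'per_row' are illustrative, and pmax() was not proposed in the thread itself.

```r
d <- data.frame(
  id      = c("1", "2", "3"),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24")),
  cocdate = as.Date(c("2008-07-18", NA, NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13")),
  haldate = as.Date(c("2007-11-07", NA, NA)),
  stringsAsFactors = FALSE
)

# max() over whole columns yields one date for the entire data set.
overall <- max(d$mrjdate, d$cocdate, d$inhdate, d$haldate, na.rm = TRUE)

# pmax() works row by row; a row holding only NA stays NA.
per_row <- do.call(pmax, c(d[-1], na.rm = TRUE))

print(overall)
print(per_row)
```

Because pmax() keeps the Date class of its first argument, no as.Date(..., origin = ...) conversion should be needed with this variant.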

max() does not want a dataframe thus the use of unlist().
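
A side effect worth knowing: unlist() drops the Date class and returns the underlying day counts as plain numbers, which is why the result has to be converted back with as.Date(..., origin = "1970-01-01"). A small sketch, assuming a hypothetical one-row data frame 'one_row':

```r
one_row <- data.frame(mrjdate = as.Date("2004-11-04"),
                      cocdate = as.Date("2008-07-18"))

v <- unlist(one_row)     # named numeric vector: days since 1970-01-01
print(class(v))

# Take the maximum of the day counts, then turn it back into a Date.
latest <- as.Date(max(v), origin = "1970-01-01")
print(latest)
```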

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
   if (all(is.na(unlist(data1[row, -1])))) {
     max_d <- NA
   } else {
     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
   }
   max_d}),
   origin = "1970-01-01")

data3
   id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center Texas Biomedical Research Institute 
P.O. Box 760549 San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org






______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

