Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Diggs
> Sent: Monday, April 25, 2011 11:05 AM
> To: christoph.jaec...@wi.tum.de
> Cc: r-help@r-project.org
> Subject: Re: [R] Problem with ddply in the plyr-package:
> surprising output of a date-column
>
> On 4/25/2011 10:19 AM, Christoph Jäckel wrote:
> > Hi Together,
> >
> > I have a problem with the plyr package - more precisely with the ddply
> > function - and would be very grateful for any help. I hope the example
> > here is precise enough for someone to identify the problem. Basically,
> > in this step I want to identify observations that are identical in
> > terms of certain identifiers (ID1, ID2, ID3) and just want to save
> > those observations (in this step, without deleting any rows or
> > manipulating any data) in a separate data.frame. However, I get the
> > warning message below and the column with dates is messed up.
> > Interestingly, the value column (the type is factor here, but if you
> > change that with as.integer it doesn't make any difference) is handled
> > correctly. Any idea what I do wrong?
> >
> > df<- data.frame(cbind(ID1=c(1,2,2,3,3,4,4),ID2=c('a','b','b','c','d','e','e'),ID3=c("v1","v1","v1","v1","v2","v1","v1"),
> > Date=c("1985-05-1","1985-05-2","1985-05-3","1985-05-4","1985-05-5","1985-05-6","1985-05-7"),
> > Value=c(1,2,3,4,5,6,7)))
> > df[,1]<- as.character(df[,1])
> > df[,2]<- as.character(df[,2])
> > df$Date<- strptime(df$Date,"%Y-%m-%d")
> >
> > #Apparently there are two observation that have the same IDs: ID1=2 and ID1=4
> > ddply(df,.(ID1,ID2,ID3),nrow)
> > #I want to save those IDs in a separate data.frame, so the desired output is:
> > df[c(2:3,6:7),]
> >
> > #My idea: Write a custom function that only returns observations with multiple rows.
> > #Seems to work except that the Date column doesn't make any sense anymore
> > #Warning message: In output[[var]][rng]<- df[[var]]: number of items to replace is not a multiple of replacement length
> > ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
> >
> > #Notice that it works perfectly if I only have one observation with multiple rows
> > ddply(df[1:6,],.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
>
> Works for me:
>
> > df[c(2:3,6:7),]
>   ID1 ID2 ID3      Date Value
> 2   2   b  v1 1985-05-2     2
> 3   2   b  v1 1985-05-3     3
> 6   4   e  v1 1985-05-6     6
> 7   4   e  v1 1985-05-7     7
> > ddply(df,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
>   ID1 ID2 ID3      Date Value
> 1   2   b  v1 1985-05-2     2
> 2   2   b  v1 1985-05-3     3
> 3   4   e  v1 1985-05-6     6
> 4   4   e  v1 1985-05-7     7
>
> [ ... version info elided ... ]
>
> A couple of things: there was just an update of plyr to 1.5.2; maybe
> that fixes what you are seeing? Also, your df consists of only factors.
> cbind-ing the data before turning it into a data.frame makes it a
> character matrix which gets converted to factors.
>
> > str(df)
> 'data.frame': 7 obs. of 5 variables:
>  $ ID1  : Factor w/ 4 levels "1","2","3","4": 1 2 2 3 3 4 4
>  $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
>  $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
>  $ Date : Factor w/ 7 levels "1985-05-1","1985-05-2",..: 1 2 3 4 5 6 7
>  $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7

The OP's data.frame contained a POSIXlt (not factor) object
in the "Date" column

> str(df)
'data.frame': 7 obs. of 5 variables:
 $ ID1  : chr "1" "2" "2" "3" ...
 $ ID2  : chr "a" "b" "b" "c" ...
 $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
 $ Date : POSIXlt, format: "1985-05-01" "1985-05-02" ...
 $ Value: Factor w/ 7 levels "1","2","3","4",..: 1 2 3 4 5 6 7

and apparently plyr's equivalent of rbind doesn't support
that class.
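
To illustrate why the class of the date column can matter, here is a
rough base-R sketch. It does not use plyr and does not reproduce plyr's
internals; it only shows that a POSIXlt object is list-based while a
POSIXct object is an atomic vector, and that a POSIXct column comes
through a split / filter / rbind round trip in base R. The object names
(dates_lt, dates_ct, df2, pieces, keep, rebuilt) and the small example
data are made up for this illustration.

```r
# Sketch only: base R, no plyr. All object names here are hypothetical.
dates_lt <- strptime(c("1985-05-1", "1985-05-2", "1985-05-3",
                       "1985-05-4", "1985-05-5"),
                     "%Y-%m-%d", tz = "UTC")

# strptime() returns POSIXlt. Underneath, that is a list of components
# (sec, min, hour, mday, mon, year, ...), not one number per date.
class(dates_lt)
is.list(unclass(dates_lt))    # TRUE

# as.POSIXct() gives an atomic vector with one number per element
# (as.Date() would be another atomic choice for pure dates).
dates_ct <- as.POSIXct(dates_lt)
is.list(unclass(dates_ct))    # FALSE

# Split by ID, keep only the groups with more than one row, recombine.
df2 <- data.frame(ID = c("a", "b", "b", "c", "c"), Date = dates_ct)
pieces <- split(df2, df2$ID)
keep <- Filter(function(d) nrow(d) > 1, pieces)
rebuilt <- do.call(rbind, unname(keep))

# The Date column is still a date-time column after recombining.
class(rebuilt$Date)
format(rebuilt$Date, "%Y-%m-%d", tz = "UTC")
```

Whether the same conversion cures the ddply warning depends on the plyr
version in use, so treat this as something to try rather than a
guaranteed fix.
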
If you want to continue using POSIXlt objects you can get
your immediate result without ddply; subscripting will do the job:

> nDups <- with(df, ave(rep(0,nrow(df)), ID1, ID2, ID3, FUN=length))
> print(nDups)
[1] 1 2 2 1 1 2 2
> df[nDups>1, ]
  ID1 ID2 ID3       Date Value
2   2   b  v1 1985-05-02     2
3   2   b  v1 1985-05-03     3
6   4   e  v1 1985-05-06     6
7   4   e  v1 1985-05-07     7
> str(.Last.value)
'data.frame': 4 obs. of 5 variables:
 $ ID1  : chr "2" "2" "4" "4"
 $ ID2  : chr "b" "b" "e" "e"
 $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1
 $ Date : POSIXlt, format: "1985-05-02" "1985-05-03" ...
 $ Value: Factor w/ 7 levels "1","2","3","4",..: 2 3 6 7

If you need plyr for other tasks you ought to use a different
class for your date data (or wait until plyr can deal with
POSIXlt objects).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> Maybe that has something to do with the odd "dates" since they are not
> really dates at all, just string representations of factor levels.
> Compare with:
>
> DF <- data.frame(ID1=c(1,2,2,3,3,4,4),
>    ID2=c('a','b','b','c','d','e','e'),
>    ID3=c("v1","v1","v1","v1","v2","v1","v1"),
>    Date=as.Date(c("1985-05-1","1985-05-2","1985-05-3",
>       "1985-05-4","1985-05-5","1985-05-6","1985-05-7")),
>    Value=c(1,2,3,4,5,6,7))
> str(DF)
> #'data.frame': 7 obs. of 5 variables:
> # $ ID1  : num 1 2 2 3 3 4 4
> # $ ID2  : Factor w/ 5 levels "a","b","c","d",..: 1 2 2 3 4 5 5
> # $ ID3  : Factor w/ 2 levels "v1","v2": 1 1 1 1 2 1 1
> # $ Date : Date, format: "1985-05-01" "1985-05-02" ...
> # $ Value: num 1 2 3 4 5 6 7
>
> This version also works for me.
>
> ddply(DF,.(ID1,ID2,ID3),function(df) if(nrow(df)<=1){NULL}else{df})
> #  ID1 ID2 ID3       Date Value
> #1   2   b  v1 1985-05-02     2
> #2   2   b  v1 1985-05-03     3
> #3   4   e  v1 1985-05-06     6
> #4   4   e  v1 1985-05-07     7
>
> > Thanks in advance,
> >
> > Christoph
> >
> > --------------------------------------------------------------
> > Christoph Jäckel (Dipl.-Kfm.)
> > --------------------------------------------------------------
> > Research Assistant
> > Chair for Financial Management and Capital Markets | Lehrstuhls für
> > Finanzmanagement und Kapitalmärkte
> > TUM School of Management | Technische Universität München
> > Arcisstr. 21 | D-80333 München | Germany
>
> --
> Brian S. Diggs, PhD
> Senior Research Associate, Department of Surgery
> Oregon Health & Science University
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.