> d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))

I'm sorry, I stuck in the unname() in the mail but did not run it - its closing
parenthesis should be after split's closing parenthesis, not at the end.
> d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date)), function(x)x==max(x)))
> identical(d$flag , d$flag2)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: arun [mailto:smartpink...@yahoo.com]
> Sent: Saturday, October 20, 2012 9:29 AM
> To: William Dunlap
> Cc: R help; Flavio Barros; ramoss
> Subject: Re: [R] Creating a new by variable in a dataframe
>
> Hi Bill,
>
> Thanks for the reply.
> It was unnecessarily complicated.
> d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])),use.names=FALSE)
> #or
> d$flag<-unlist(lapply(split(d,d$date),function(x) x[3]==max(x[3])))
> should have done the same job.
> str(d)
> #'data.frame':    10 obs. of  4 variables:
> # $ transaction: chr  "T01" "T02" "T03" "T04" ...
> # $ date       : Date, format: "2012-10-19" "2012-10-19" ...
> # $ time       : int  8 9 10 11 12 13 14 15 16 17
> # $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
>
> I am getting error messages with:
> d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
> Error in match.fun(FUN) : argument "FUN" is missing, with no default
>
>
> A.K.
>
>
> ----- Original Message -----
> From: William Dunlap <wdun...@tibco.com>
> To: arun <smartpink...@yahoo.com>; Flavio Barros <flaviomargar...@gmail.com>
> Cc: R help <r-help@r-project.org>; ramoss <ramine.mossad...@finra.org>
> Sent: Saturday, October 20, 2012 12:04 PM
> Subject: RE: [R] Creating a new by variable in a dataframe
>
> > d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
>
> I think that line is unnecessarily complicated. lapply() returns a list
> and rbind applied to one argument, L, mainly adds dimensions c(length(L),1)
> to it (it also changes its names to rownames). unlist doesn't care about
> the dimensions, so you may as well leave out the rbind. The only difference
> in the results with and without calling rbind is that the rbind version omits
> the names from flag. Use the more direct unname() on split's output or
> unlist's output if that concerns you.
>
> Also, if you are interested in saving time and memory when the input, d, is
> large, you will be better off applying split() to just the column of the
> data.frame that you want split instead of to the entire data.frame.
>    d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date), function(x)x==max(x))))
> (I used d[[3]] instead of the more readable d$time to follow your original
> more closely.)
>
> You ought to check that the data is sorted by date: otherwise these give the
> wrong answer.
>
> What result do you want when there are several transactions at the last time
> in the day?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of arun
> > Sent: Friday, October 19, 2012 7:49 PM
> > To: Flavio Barros
> > Cc: R help; ramoss
> > Subject: Re: [R] Creating a new by variable in a dataframe
> >
> >
> >
> > Hi,
> > Without using "ifelse()" on the same example dataset.
> > d <- data.frame(stringsAsFactors = FALSE,
> >   transaction = c("T01", "T02", "T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),
> >   date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
> >            "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),
> >   time = c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
> >            "16:00", "17:00"))
> >
> > d$date <- as.Date(d$date,format="%Y-%m-%d")
> > d$time<-strptime(d$time,format="%H:%M")$hour
> > d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
> > d$datetime<-as.POSIXct(paste(d$date,d$time," "),format="%Y-%m-%d %H")
> > d1<-d[,c(1,5,4)]
> > d1
> > #   transaction            datetime  flag
> > #1          T01 2012-10-19 08:00:00 FALSE
> > #2          T02 2012-10-19 09:00:00 FALSE
> > #3          T03 2012-10-19 10:00:00 FALSE
> > #4          T04 2012-10-19 11:00:00  TRUE
> > #5          T05 2012-10-22 12:00:00  TRUE
> > #6          T06 2012-10-23 13:00:00 FALSE
> > #7          T07 2012-10-23 14:00:00 FALSE
> > #8          T08 2012-10-23 15:00:00 FALSE
> > #9          T09 2012-10-23 16:00:00 FALSE
> > #10         T10 2012-10-23 17:00:00  TRUE
> >
> > str(d1)
> > #'data.frame':    10 obs. of  3 variables:
> > # $ transaction: chr  "T01" "T02" "T03" "T04" ...
> > # $ datetime   : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
> > # $ flag       : logi  FALSE FALSE FALSE TRUE TRUE FALSE ...
> >
> > A.K.
> >
> >
> > ----- Original Message -----
> > From: Flavio Barros <flaviomargar...@gmail.com>
> > To: William Dunlap <wdun...@tibco.com>
> > Cc: "r-help@r-project.org" <r-help@r-project.org>; ramoss <ramine.mossad...@finra.org>
> > Sent: Friday, October 19, 2012 4:24 PM
> > Subject: Re: [R] Creating a new by variable in a dataframe
> >
> > I think I have a better solution
> >
> > ## Example data.frame
> > d <- data.frame(stringsAsFactors = FALSE,
> >   transaction = c("T01", "T02", "T03", "T04", "T05", "T06", "T07", "T08", "T09", "T10"),
> >   date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19", "2012-10-22",
> >            "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23", "2012-10-23"),
> >   time = c("08:00", "09:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00",
> >            "16:00", "17:00"))
> >
> > ## Date transformation
> > d$date <- as.Date(d$date)
> > d$time <- strptime(d$time, format='%H')
> >
> > library(reshape)
> >
> > ## Create factor to split the data
> > fdate <- factor(format(d$date, '%D'))
> >
> > ## Create a list with logical TRUE when it is the last transaction
> > ex <- sapply(split(d, fdate), function(x)
> >   ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))
> >
> > ## Coerce to logical vector
> > flag <- unlist(rbind(ex))
> >
> > ## With reshape we have the transform function and can add the flag column
> > d <- transform(d, flag = flag)
> >
> > On Fri, Oct 19, 2012 at 3:51 PM, William Dunlap <wdun...@tibco.com> wrote:
> >
> > > Suppose your data frame is
> > >   d <- data.frame(
> > >       stringsAsFactors = FALSE,
> > >       transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
> > >          "T07", "T08", "T09", "T10"),
> > >       date = c("2012-10-19", "2012-10-19", "2012-10-19",
> > >          "2012-10-19", "2012-10-22", "2012-10-23",
> > >          "2012-10-23", "2012-10-23", "2012-10-23",
> > >          "2012-10-23"),
> > >       time = c("08:00", "09:00", "10:00", "11:00", "12:00",
> > >          "13:00", "14:00", "15:00", "16:00", "17:00"
> > >          ))
> > > (Convert the date and time to your favorite classes, it doesn't matter
> > > here.)
> > >
> > > A general way to say if an item is the last of its group is:
> > >   isLastInGroup <- function(...) ave(logical(length(..1)), ...,
> > >       FUN=function(x)seq_along(x)==length(x))
> > >   is_last_of_dayA <- with(d, isLastInGroup(date))
> > > If you know your data is sorted by date you could save a little time for
> > > large datasets by using
> > >   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
> > >   is_last_of_dayB <- isLastInRun(d$date)
> > > The above d is sorted by date so you get the same results for both:
> > >   > cbind(d, is_last_of_dayA, is_last_of_dayB)
> > >      transaction       date  time is_last_of_dayA is_last_of_dayB
> > >   1          T01 2012-10-19 08:00           FALSE           FALSE
> > >   2          T02 2012-10-19 09:00           FALSE           FALSE
> > >   3          T03 2012-10-19 10:00           FALSE           FALSE
> > >   4          T04 2012-10-19 11:00            TRUE            TRUE
> > >   5          T05 2012-10-22 12:00            TRUE            TRUE
> > >   6          T06 2012-10-23 13:00           FALSE           FALSE
> > >   7          T07 2012-10-23 14:00           FALSE           FALSE
> > >   8          T08 2012-10-23 15:00           FALSE           FALSE
> > >   9          T09 2012-10-23 16:00           FALSE           FALSE
> > >   10         T10 2012-10-23 17:00            TRUE            TRUE
> > >
> > >
> > > Bill Dunlap
> > > Spotfire, TIBCO Software
> > > wdunlap tibco.com
> > >
> > >
> > > > -----Original Message-----
> > > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ramoss
> > > > Sent: Friday, October 19, 2012 10:52 AM
> > > > To: r-help@r-project.org
> > > > Subject: [R] Creating a new by variable in a dataframe
> > > >
> > > > Hello,
> > > >
> > > > I have a dataframe w/ 3 variables of interest: transaction,date(tdate) &
> > > > time(event_tim).
> > > > How could I create a 4th variable (last_trans) that would flag the last
> > > > transaction of the day for each day?
> > > > In SAS I use:
> > > > proc sort data=all6;
> > > > by tdate event_tim;
> > > > run;
> > > > /*Create last transaction flag per day*/
> > > > data all6;
> > > > set all6;
> > > > by tdate event_tim;
> > > > last_trans=last.tdate;
> > > >
> > > > Thanks ahead for any suggestions.
> > > >
> > > > --
> > > > View this message in context:
> > > > http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
> > > > Sent from the R help mailing list archive at Nabble.com.
> > > >
> > > > ______________________________________________
> > > > R-help@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> >
> > --
> > Att,
> >
> > Flávio Barros
> >
> > [[alternative HTML version deleted]]
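
The two caveats raised in the thread (data not sorted by date, and several
transactions sharing the last time of a day) can be illustrated with a small
sketch. This is not code from any of the posters: the data frame below is made
up for illustration, the names flag_split, flag_all_ties and flag_one_per_day
are hypothetical, and breaking ties by position in the data is an assumption
about what result is wanted.

```r
# Hypothetical data: same columns as the thread's d, but rows are
# deliberately NOT sorted by date, and 2012-10-19 has a tie at hour 11.
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03", "T04", "T05", "T06"),
  date = as.Date(c("2012-10-23", "2012-10-19", "2012-10-23",
                   "2012-10-19", "2012-10-22", "2012-10-19")),
  time = c(13L, 11L, 17L, 9L, 12L, 11L)
)

# split() + unlist() returns the flags in group order, not row order,
# so on unsorted data they end up attached to the wrong rows.
flag_split <- unlist(lapply(unname(split(d$time, d$date)),
                            function(x) x == max(x)))

# ave() applies FUN within each date group and writes the results back
# into the original row positions, so no sorting is required.
# Ties: every row at the day's maximum time is flagged.
flag_all_ties <- as.logical(ave(d$time, d$date,
                                FUN = function(x) x == max(x)))

# Assumed tie rule: flag only the last row (in data order) among the
# rows at the day's maximum time, giving exactly one flag per day.
flag_one_per_day <- as.logical(ave(d$time, d$date,
  FUN = function(x) seq_along(x) == max(which(x == max(x)))))

cbind(d, flag_split, flag_all_ties, flag_one_per_day)
```

On sorted data without ties all three columns agree; they differ only in the
situations the thread warns about, so comparing them is a cheap check before
trusting the split()-based one-liner.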