> d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
I think that line is unnecessarily complicated. lapply() returns a list, and
rbind() applied to one argument, L, mainly adds dimensions c(1, length(L)) to
it (it also turns its names into column names). unlist() doesn't care about
the dimensions, so you may as well leave out the rbind().

The only difference in the results with and without calling rbind() is that
the rbind() version omits the names from flag. Use the more direct unname()
on split()'s output or unlist()'s output if that concerns you.

Also, if you are interested in saving time and memory when the input, d, is
large, you will be better off applying split() to just the column of the
data.frame that you want split instead of to the entire data.frame.

  d$flag2 <- unlist(lapply(unname(split(d[[3]], d$date)), function(x) x == max(x)))

(I used d[[3]] instead of the more readable d$time to follow your original
more closely.)

You ought to check that the data is sorted by date: otherwise these give
the wrong answer.

What result do you want when there are several transactions at the last time
in the day?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of arun
> Sent: Friday, October 19, 2012 7:49 PM
> To: Flavio Barros
> Cc: R help; ramoss
> Subject: Re: [R] Creating a new by variable in a dataframe
>
> HI,
> Without using "ifelse()" on the same example dataset.
> d <- data.frame(stringsAsFactors = FALSE,
>   transaction = c("T01", "T02", "T03", "T04", "T05",
>                   "T06", "T07", "T08", "T09", "T10"),
>   date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19",
>            "2012-10-22", "2012-10-23", "2012-10-23", "2012-10-23",
>            "2012-10-23", "2012-10-23"),
>   time = c("08:00", "09:00", "10:00", "11:00", "12:00",
>            "13:00", "14:00", "15:00", "16:00", "17:00"))
>
> d$date <- as.Date(d$date,format="%Y-%m-%d")
> d$time<-strptime(d$time,format="%H:%M")$hour
> d$flag<-unlist(rbind(lapply(split(d,d$date),function(x) x[3]==max(x[3]))))
> d$datetime<-as.POSIXct(paste(d$date,d$time," "),format="%Y-%m-%d %H")
> d1<-d[,c(1,5,4)]
> d1
> #   transaction            datetime  flag
> #1          T01 2012-10-19 08:00:00 FALSE
> #2          T02 2012-10-19 09:00:00 FALSE
> #3          T03 2012-10-19 10:00:00 FALSE
> #4          T04 2012-10-19 11:00:00  TRUE
> #5          T05 2012-10-22 12:00:00  TRUE
> #6          T06 2012-10-23 13:00:00 FALSE
> #7          T07 2012-10-23 14:00:00 FALSE
> #8          T08 2012-10-23 15:00:00 FALSE
> #9          T09 2012-10-23 16:00:00 FALSE
> #10         T10 2012-10-23 17:00:00  TRUE
>
> str(d1)
> #'data.frame': 10 obs. of 3 variables:
> # $ transaction: chr "T01" "T02" "T03" "T04" ...
> # $ datetime : POSIXct, format: "2012-10-19 08:00:00" "2012-10-19 09:00:00" ...
> # $ flag : logi FALSE FALSE FALSE TRUE TRUE FALSE ...
>
> A.K.
>
> ----- Original Message -----
> From: Flavio Barros <flaviomargar...@gmail.com>
> To: William Dunlap <wdun...@tibco.com>
> Cc: "r-help@r-project.org" <r-help@r-project.org>; ramoss <ramine.mossad...@finra.org>
> Sent: Friday, October 19, 2012 4:24 PM
> Subject: Re: [R] Creating a new by variable in a dataframe
>
> I think I have a better solution
>
> ## Example data.frame
> d <- data.frame(stringsAsFactors = FALSE,
>   transaction = c("T01", "T02", "T03", "T04", "T05",
>                   "T06", "T07", "T08", "T09", "T10"),
>   date = c("2012-10-19", "2012-10-19", "2012-10-19", "2012-10-19",
>            "2012-10-22", "2012-10-23", "2012-10-23", "2012-10-23",
>            "2012-10-23", "2012-10-23"),
>   time = c("08:00", "09:00", "10:00", "11:00", "12:00",
>            "13:00", "14:00", "15:00", "16:00", "17:00"))
>
> ## As date transformation
> d$date <- as.Date(d$date)
> d$time <- strptime(d$time, format='%H')
>
> library(reshape)
>
> ## Create factor to split the data
> fdate <- factor(format(d$date, '%D'))
>
> ## Create a list with logical TRUE when it is the last transaction
> ex <- sapply(split(d, fdate), function(x)
>   ifelse(as.numeric(x[,'time'])==max(as.numeric(x[,'time'])),T,F))
>
> ## Coerce to logical vector
> flag <- unlist(rbind(ex))
>
> ## With reshape we have the transform function, so we can add the flag column
> d <- transform(d, flag = flag)
>
> On Fri, Oct 19, 2012 at 3:51 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> > Suppose your data frame is
> >   d <- data.frame(
> >     stringsAsFactors = FALSE,
> >     transaction = c("T01", "T02", "T03", "T04", "T05", "T06",
> >                     "T07", "T08", "T09", "T10"),
> >     date = c("2012-10-19", "2012-10-19", "2012-10-19",
> >              "2012-10-19", "2012-10-22", "2012-10-23",
> >              "2012-10-23", "2012-10-23", "2012-10-23",
> >              "2012-10-23"),
> >     time = c("08:00", "09:00", "10:00", "11:00", "12:00",
> >              "13:00", "14:00", "15:00", "16:00", "17:00"
> >     ))
> > (Convert the date and time to your favorite classes, it doesn't matter
> > here.)
> >
> > A general way to say if an item is the last of its group is:
> >   isLastInGroup <- function(...) ave(logical(length(..1)), ...,
> >       FUN=function(x)seq_along(x)==length(x))
> >   is_last_of_dayA <- with(d, isLastInGroup(date))
> > If you know your data is sorted by date you could save a little time for
> > large datasets by using
> >   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
> >   is_last_of_dayB <- isLastInRun(d$date)
> > The above d is sorted by date so you get the same results for both:
> > > cbind(d, is_last_of_dayA, is_last_of_dayB)
> >    transaction       date  time is_last_of_dayA is_last_of_dayB
> > 1          T01 2012-10-19 08:00           FALSE           FALSE
> > 2          T02 2012-10-19 09:00           FALSE           FALSE
> > 3          T03 2012-10-19 10:00           FALSE           FALSE
> > 4          T04 2012-10-19 11:00            TRUE            TRUE
> > 5          T05 2012-10-22 12:00            TRUE            TRUE
> > 6          T06 2012-10-23 13:00           FALSE           FALSE
> > 7          T07 2012-10-23 14:00           FALSE           FALSE
> > 8          T08 2012-10-23 15:00           FALSE           FALSE
> > 9          T09 2012-10-23 16:00           FALSE           FALSE
> > 10         T10 2012-10-23 17:00            TRUE            TRUE
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> > > -----Original Message-----
> > > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> > > Of ramoss
> > > Sent: Friday, October 19, 2012 10:52 AM
> > > To: r-help@r-project.org
> > > Subject: [R] Creating a new by variable in a dataframe
> > >
> > > Hello,
> > >
> > > I have a dataframe w/ 3 variables of interest: transaction,date(tdate) &
> > > time(event_tim).
> > > How could I create a 4th variable (last_trans) that would flag the last
> > > transaction of the day for each day?
> > > In SAS I use:
> > > proc sort data=all6;
> > > by tdate event_tim;
> > > run;
> > > /*Create last transaction flag per day*/
> > > data all6;
> > > set all6;
> > > by tdate event_tim;
> > > last_trans=last.tdate;
> > >
> > > Thanks ahead for any suggestions.
> > >
> > > --
> > > View this message in context:
> > > http://r.789695.n4.nabble.com/Creating-a-new-by-variable-in-a-dataframe-tp4646782.html
> > > Sent from the R help mailing list archive at Nabble.com.
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Att,
>
> Flávio Barros
>
> [[alternative HTML version deleted]]
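
The sorting caveat in the reply at the top of this thread can be checked with a short sketch. Everything here is an assumption made for illustration: the data is a made-up, deliberately unsorted subset of the thread's example with times already converted to hours, and the names L, by_split and by_ave do not come from the thread. It suggests that the split()/unlist() form returns its flags in date-group order, while ave() keeps each flag aligned with the row it came from.

```r
# Shape check: rbind() on a single list turns it into a one-row list-matrix.
L <- list(a = 1:2, b = 3)
dim(rbind(L))

# Hypothetical, deliberately unsorted subset of the thread's example data.
d <- data.frame(
  stringsAsFactors = FALSE,
  transaction = c("T01", "T02", "T03", "T04", "T05"),
  date = c("2012-10-23", "2012-10-19", "2012-10-19", "2012-10-23", "2012-10-22"),
  time = c(13, 8, 11, 17, 12)
)

# split() groups the times by date; unlist() then concatenates the groups
# in sorted-date order, which is not the row order of this d.
by_split <- unlist(lapply(unname(split(d$time, d$date)), function(x) x == max(x)))

# ave() runs the same test within each date but writes each result back to
# the position of its own row. It returns 0/1 here, hence the comparison.
by_ave <- ave(d$time, d$date, FUN = function(x) x == max(x)) == 1

cbind(d, by_split, by_ave)
```

On this input the two columns disagree, so sorting d by date first (or using an ave()-based form) seems necessary before assigning the split()-based result back to the data frame. Both forms flag every row that ties for the latest time in a day, which is the open question raised in the reply.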