HI, df$NewPrices<- unsplit(lapply(split(df,df$Stocks),function(x) {do.call(rbind,lapply(seq_len(nrow(x)),function(i) {if(x[i,]$Offsets==2) x[i+2,]$Prices else if(x[i,]$Offsets==1) x[i+1,]$Prices else x[i,]$Prices })) }),df$Stocks)
df$NewPrices #[1] 13 17 12 16 17 18 16 17 18 sqldf(Q2)[,1] #[1] 13 17 12 16 17 18 16 17 18 I think sqldf() would be faster. A.K. ----- Original Message ----- From: Ira Sharenow <irasharenow...@yahoo.com> To: r-help@r-project.org Cc: Sent: Saturday, September 7, 2013 4:11 PM Subject: [R] Create a new column based on values in two other columns I am trying to add a column to a data frame. Each day for each stock I make a prediction for a future date. Then I need to compare my predictions to the actual values. So looking at the first row of data: For Stock A on 2011-01-01 I predicted that on 2011-01-02 the price would be 10.25. Now I need an ActualPrices column. The first value should be 13. I solved the problem using sqldf, but I would appreciate some advice on how to solve the problem using standard R techniques. The real data frame has over 100,000 rows. I know that the conditions for the correct row can be found in the WHERE clause of the SQL query and then I need to look in the Prices column to get the value, but I do not know how to do that in standard R. If another library would be easier, I am open to other ideas. Dates = as.Date(c(rep("2011-01-01",3), rep("2011-01-02",3), rep("2011-01-03",3) ), "%Y-%m-%d") Stocks = rep(c("A", "B", "C"), 3) Offsets = c(1,2,0,1,1,1,0,0,0) Prices = 10:18 PredPrices = 10:18 + 0.25 df = data.frame(Stocks, Dates, Offsets, Prices, PredPrices ) df$NewDates = df$Dates + df$Offsets df StocksDates Offsets Prices PredPricesNewDates 1A 2011-01-0111010.25 2011-01-02 2B 2011-01-0121111.25 2011-01-03 3C 2011-01-0101212.25 2011-01-01 4A 2011-01-0211313.25 2011-01-03 5B 2011-01-0211414.25 2011-01-03 6C 2011-01-0211515.25 2011-01-03 7A 2011-01-0301616.25 2011-01-03 8B 2011-01-0301717.25 2011-01-03 9C 2011-01-0301818.25 2011-01-03 library(sqldf) # To see everything in this small example Q1 = "SELECT df1.Stocks, df1.Dates, df1.Offsets, df1.Prices, df1.PredPrices, df2.Prices AS NewPrices FROM df AS df1, df AS df2 WHERE df1.NewDates = df2.Dates AND df1.Stocks = df2.Stocks"; sqldf(Q1) # To get the column. This what I really want Q2 = "SELECT df2.Prices AS NewPrices FROM df AS df1, df AS df2 WHERE df1.NewDates = df2.Dates AND df1.Stocks = df2.Stocks"; sqldf(Q2) As I will need to reshape my data so that each row is for a specific date, a second starting point is this data frame. dfWide= reshape(df, direction = "wide", idvar = "Dates", timevar = "Stocks") > dfWide Dates Offsets.A Prices.A PredPrices.A NewDates.A Offsets.B Prices.B PredPrices.B NewDates.B Offsets.C Prices.C PredPrices.C NewDates.C 1 2011-01-0111010.25 2011-01-0221111.25 2011-01-0301212.25 2011-01-01 4 2011-01-0211313.25 2011-01-0311414.25 2011-01-0311515.25 2011-01-03 7 2011-01-0301616.25 2011-01-0301717.25 2011-01-0301818.25 2011-01-03 [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.