Hi All,
I have the following dataset:

> str(pfi_v3)
'data.frame': 714 obs. of 8 variables:
 $ project_id       : int 1 2 3 4 5 6 7 8 9 10 ...
 $ project_lat      : num 51.4 51.5 52.2 51.5 53.5 ...
 $ project_lon      : num -0.642 -1.85 0.08 0.126 -1.392 ...
 $ sector           : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
 $ project_duration : int 1826 3652 121 520 1087 730 730 730 790 522 ...
 $ project_delay    : int -323 0 -60 0 0 0 0 0 0 -91 ...
 $ capital_value    : num 6.7 5.8 21.8 47.3 47 24.2 40.7 71.9 10.7 70 ...
 $ contract_type    : Factor w/ 2 levels "Lumpsum","Turnkey": 2 2 2 2 2 2 2 2 2 2 ...

I'm using the following commands to create the training and test sets:

library(caTools)
# sample.split() expects the outcome vector, not the whole data frame
split <- sample.split(pfi_v3$project_delay, SplitRatio = 0.8)
trainPFI <- subset(pfi_v3, split == TRUE)
testPFI <- subset(pfi_v3, split == FALSE)

I am using several predictive models to estimate the delay in projects. The commands are given below:

1. Linear regression

lm_m <- lm(project_delay ~ project_lon + project_lat + project_duration + sector + contract_type + capital_value, data = trainPFI)
lm_pred <- predict(lm_m, newdata = testPFI)

2. Regression tree

library(rpart)
tree_m <- rpart(project_delay ~ project_lon + project_lat + project_duration + sector + contract_type + capital_value, data = trainPFI)
tree_pred <- predict(tree_m, newdata = testPFI)

3. cp-optimised regression tree

library(caret)
# tr.control and cp.grid are defined earlier (not shown here)
train_m <- train(project_delay ~ project_lon + project_lat + project_duration + sector + contract_type + capital_value, data = trainPFI, method = "rpart", trControl = tr.control, tuneGrid = cp.grid)
train_pred <- predict(train_m, newdata = testPFI)

4. Random Forest

library(randomForest)
rf_m <- randomForest(project_delay ~ project_lon + project_lat + project_duration + sector + contract_type + capital_value, data = trainPFI, importance = TRUE, ntree = 2000)
rf_pred <- predict(rf_m, newdata = testPFI)

5.
Conditional Forest

library(party)
cf_m <- cforest(project_delay ~ project_lon + project_lat + project_duration + sector + contract_type + capital_value, data = trainPFI, controls = cforest_unbiased(ntree = 2000, mtry = 3))
cf_pred <- predict(cf_m, testPFI, OOB = TRUE, type = "response")

Those are all the models. Now I want to create a new data frame that combines the actual and predicted values, with the following columns:

$project_id
$actual_delay
$lm_predicted_delay
$tree_predicted_delay
$train_predicted_delay
$rf_predicted_delay
$cf_predicted_delay

I want to use this data frame to draw a line chart comparing the predictions. How can I achieve this?

Any help will be highly appreciated.

Many Thanks and Kind Regards
--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY
muhammad2.bi...@live.uwe.ac.uk

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
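
A minimal sketch of one way the combined data frame and line chart described in the question could be built, using base R only. The toy testPFI and the five simulated prediction vectors are hypothetical stand-ins so the example runs on its own; in the real script they would be the objects created by the predict() calls in the post. The names results and noisy are also made up for illustration. as.numeric() is used because some predict methods may return a one-column matrix or a named vector rather than a plain numeric vector.

```r
# Toy stand-ins for testPFI and the five prediction vectors (hypothetical data,
# present only so the sketch runs without the modelling packages).
set.seed(1)
n <- 10
testPFI <- data.frame(project_id    = seq_len(n),
                      project_delay = round(rnorm(n, mean = -50, sd = 100)))
noisy <- function(x) x + rnorm(length(x), sd = 30)
lm_pred    <- noisy(testPFI$project_delay)
tree_pred  <- noisy(testPFI$project_delay)
train_pred <- noisy(testPFI$project_delay)
rf_pred    <- noisy(testPFI$project_delay)
cf_pred    <- matrix(noisy(testPFI$project_delay), ncol = 1)  # one-column matrix case

# One row per test project: actual delay next to each model's prediction.
# as.numeric() drops names and flattens one-column matrices.
results <- data.frame(
  project_id            = testPFI$project_id,
  actual_delay          = testPFI$project_delay,
  lm_predicted_delay    = as.numeric(lm_pred),
  tree_predicted_delay  = as.numeric(tree_pred),
  train_predicted_delay = as.numeric(train_pred),
  rf_predicted_delay    = as.numeric(rf_pred),
  cf_predicted_delay    = as.numeric(cf_pred)
)

# Line chart: one line per column, with project_id on the x axis.
matplot(results$project_id, as.matrix(results[, -1]),
        type = "l", lty = 1, col = 1:6,
        xlab = "project_id", ylab = "project_delay")
legend("topright", legend = names(results)[-1], lty = 1, col = 1:6, cex = 0.7)
```

This relies on every prediction vector being in the same row order as testPFI, which holds when each one comes from predict() on testPFI itself.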