Please ignore the typos in the predict() calls for some of the models: each call should use the model fitted just above it (lm_m, tree_m and train_m, not lm_m2, tree_m2 and tr_m).
> On 11 May 2016, at 12:47 am, Muhammad Bilal <muhammad2.bi...@live.uwe.ac.uk> wrote:
>
> Hi All,
>
> I have the following dataset:
>
> > str(pfi_v3)
> 'data.frame': 714 obs. of 8 variables:
>  $ project_id       : int 1 2 3 4 5 6 7 8 9 10 ...
>  $ project_lat      : num 51.4 51.5 52.2 51.5 53.5 ...
>  $ project_lon      : num -0.642 -1.85 0.08 0.126 -1.392 ...
>  $ sector           : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
>  $ project_duration : int 1826 3652 121 520 1087 730 730 730 790 522 ...
>  $ project_delay    : int -323 0 -60 0 0 0 0 0 0 -91 ...
>  $ capital_value    : num 6.7 5.8 21.8 47.3 47 24.2 40.7 71.9 10.7 70 ...
>  $ contract_type    : Factor w/ 2 levels "Lumpsum","Turnkey": 2 2 2 2 2 2 2 2 2 2 ...
>
> I'm using following commands to create training and test sets:
>
> split <- sample.split(pfi_v3, SplitRatio = 0.8)
> trainPFI <- subset(pfi_v3, split == TRUE)
> testPFI <- subset(pfi_v3, split == FALSE)
>
> I am using several predictive models to estimate delay in projects.
>
> The commands are given as below:
>
> 1. Simple linear regression
>
> lm_m <- lm(project_delay ~ project_lon +
>                project_lat +
>                project_duration +
>                sector +
>                contract_type +
>                capital_value,
>            data = trainPFI)
>
> lm_pred <- predict(lm_m2, newdata = testPFI)
>
> 2. Regression tree
>
> tree_m <- rpart(project_delay ~ project_lon +
>                     project_lat +
>                     project_duration +
>                     sector +
>                     contract_type +
>                     capital_value,
>                 data = trainPFI)
>
> tree_pred <- predict(tree_m2, newdata = testPFI)
>
> 3. Cp optimsed regression tree
>
> train_m <- train(project_delay ~ project_lon +
>                      project_lat +
>                      project_duration +
>                      sector +
>                      contract_type +
>                      capital_value,
>                  data = trainPFI,
>                  method="rpart",
>                  trControl=tr.control, tuneGrid = cp.grid)
>
> train_pred <- predict(tr_m, newdata = testPFI)
>
> 4. Random Forest
>
> rf_m <- randomForest(project_delay ~ project_lon +
>                          project_lat +
>                          project_duration +
>                          sector +
>                          contract_type +
>                          capital_value,
>                      data = trainPFI,
>                      importance=TRUE,
>                      ntree = 2000)
>
> rf_pred <- predict(rf_m, newdata = testPFI)
>
> 5. Conditional Forest
>
> cf_m <- cforest(project_delay ~ project_lon +
>                     project_lat +
>                     project_duration +
>                     sector +
>                     contract_type +
>                     capital_value,
>                 data = trainPFI,
>                 controls=cforest_unbiased(ntree=2000, mtry=3))
>
> cf_pred <- predict(cf_m, testPFI, OOB=TRUE, type = "response")
>
> That is it.
>
> Now I want to create a new data frame to combine the actual and predicted
> values such that the new frame has the following columns:
>
> $project_id
> $actual_delay
> $lm_predicted_delay
> $tree_predicted_delay
> $train_predicted_delay
> $rf_predicted_delay
> $cf_predicted_delay
>
> I want to use this dataframe to draw the line chart to compare predictions.
>
> How to achieve this?
>
> Any help will be highly appreciated.
>
> Many Thanks and Kind Regards
>
> --
> Muhammad Bilal
> Research Fellow and Doctoral Researcher,
> Bristol Enterprise, Research, and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol,
> BS16 1QY
>
> muhammad2.bi...@live.uwe.ac.uk<mailto:olugbenga2.akin...@live.uwe.ac.uk>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
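
On the question of combining the actual and predicted values: here is a minimal sketch in base R, assuming each prediction vector lines up row for row with testPFI. The names compare_df and delay_cols are hypothetical, and the toy data at the top is made up only so the snippet runs on its own; with the real objects you would skip that part and use your own testPFI and *_pred values.

```r
# Toy stand-ins for testPFI and the five prediction vectors. These values are
# invented purely so the sketch is self-contained; they are not the PFI data.
set.seed(1)
n <- 10
testPFI <- data.frame(project_id    = seq_len(n),
                      project_delay = round(rnorm(n, mean = -50, sd = 100)))
lm_pred    <- testPFI$project_delay + rnorm(n, sd = 40)
tree_pred  <- testPFI$project_delay + rnorm(n, sd = 35)
train_pred <- testPFI$project_delay + rnorm(n, sd = 30)
rf_pred    <- testPFI$project_delay + rnorm(n, sd = 25)
# Stored as a one-column matrix to mimic predictions that are not plain vectors.
cf_pred    <- matrix(testPFI$project_delay + rnorm(n, sd = 20), ncol = 1)

# One row per test project: the actual delay next to each model's prediction.
# as.numeric() flattens any prediction that comes back as a one-column matrix.
compare_df <- data.frame(
  project_id            = testPFI$project_id,
  actual_delay          = testPFI$project_delay,
  lm_predicted_delay    = as.numeric(lm_pred),
  tree_predicted_delay  = as.numeric(tree_pred),
  train_predicted_delay = as.numeric(train_pred),
  rf_predicted_delay    = as.numeric(rf_pred),
  cf_predicted_delay    = as.numeric(cf_pred)
)

# Line chart with one line per column and the projects along the x axis.
delay_cols <- setdiff(names(compare_df), "project_id")
matplot(compare_df$project_id, as.matrix(compare_df[, delay_cols]),
        type = "l", lty = 1, col = seq_along(delay_cols),
        xlab = "project_id", ylab = "delay")
legend("topright", legend = delay_cols,
       col = seq_along(delay_cols), lty = 1, cex = 0.7)
```

A separate thing that may be worth checking: as far as I know, sample.split() expects the outcome vector rather than the whole data frame, so sample.split(pfi_v3$project_delay, SplitRatio = 0.8) is probably closer to what was intended for the train/test split.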