Hi All,
I have the following dataset:

> str(pfi_v3)
'data.frame': 714 obs. of  8 variables:
 $ project_id             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ project_lat            : num  51.4 51.5 52.2 51.5 53.5 ...
 $ project_lon            : num  -0.642 -1.85 0.08 0.126 -1.392 ...
 $ sector                 : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
 $ project_duration       : int  1826 3652 121 520 1087 730 730 730 790 522 ...
 $ project_delay          : int  -323 0 -60 0 0 0 0 0 0 -91 ...
 $ capital_value          : num  6.7 5.8 21.8 47.3 47 24.2 40.7 71.9 10.7 70 ...
 $ contract_type          : Factor w/ 2 levels "Lumpsum","Turnkey": 2 2 2 2 2 2 2 2 2 2 ...

I'm using the following commands to create the training and test sets:

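library(caTools)
set.seed(123)  # make the random split reproducible
split <- sample.split(pfi_v3$project_delay, SplitRatio = 0.8)  # split on the outcome vector, not the whole data frame
trainPFI <- subset(pfi_v3, split == TRUE)
testPFI <- subset(pfi_v3, split == FALSE)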
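If caTools is not available, or just to see what an 80/20 split does, a base-R sketch with sample() could look like the following. It is a swapped-in technique, not sample.split(): it picks rows purely at random and does not try to keep the outcome's distribution similar across the two sets. df_demo is a hypothetical stand-in for pfi_v3 with made-up values.

```r
# Base-R 80/20 train/test split (sketch; df_demo is a hypothetical stand-in for pfi_v3)
set.seed(42)  # make the random split reproducible

df_demo <- data.frame(
  project_id    = 1:20,
  project_delay = round(rnorm(20, mean = -30, sd = 50))
)

n <- nrow(df_demo)
# Draw 80% of the row indices without replacement
train_idx <- sample(seq_len(n), size = floor(0.8 * n))

train_demo <- df_demo[train_idx, ]   # rows selected for training
test_demo  <- df_demo[-train_idx, ]  # all remaining rows

nrow(train_demo)  # 16
nrow(test_demo)   # 4
```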

I am using several predictive models to estimate delay in projects.

The commands are as follows:

1. Multiple linear regression

lm_m <- lm(project_delay ~ project_lon +
             project_lat +
             project_duration +
             sector +
             contract_type,
           data = trainPFI)

lm_pred <- predict(lm_m, newdata = testPFI)

2. Regression tree

library(rpart)

tree_m <- rpart(project_delay ~ project_lon +
                  project_lat +
                  project_duration +
                  sector +
                  contract_type,
                data = trainPFI)

tree_pred <- predict(tree_m, newdata = testPFI)

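3. cp-optimised regression tree

library(caret)

# tr.control (a trainControl() object) and cp.grid (a data frame with a cp column)
# are assumed to be defined beforehand
train_m <- train(project_delay ~ project_lon +
                   project_lat +
                   project_duration +
                   sector +
                   contract_type,
                 data = trainPFI,
                 method = "rpart",  # tune rpart's cp; caret defaults to "rf" otherwise
                 trControl = tr.control, tuneGrid = cp.grid)

train_pred <- predict(train_m, newdata = testPFI)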

4. Random Forest

library(randomForest)

rf_m <- randomForest(project_delay ~ project_lon +
                       project_lat +
                       project_duration +
                       sector +
                       contract_type,
                     data = trainPFI,
                     ntree = 2000)

rf_pred <- predict(rf_m, newdata = testPFI)

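5. Conditional inference forest

library(party)

cf_m <- cforest(project_delay ~ project_lon +
                  project_lat +
                  project_duration +
                  sector +
                  contract_type,
                data = trainPFI,
                controls = cforest_unbiased(ntree = 2000, mtry = 3))

# OOB = TRUE only applies to predictions on the training data (newdata = NULL),
# so it is dropped when predicting on the test set
cf_pred <- predict(cf_m, newdata = testPFI, type = "response")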

That is it.
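Each predict() call above returns one prediction per row of testPFI, in the same row order, so the actual and predicted values can be bound column-wise with data.frame(). The sketch below uses made-up numbers so that it runs on its own: actual and the five *_pred vectors here are hypothetical stand-ins, not model output, and the column names are only illustrative. as.vector() is used defensively because cforest predictions may come back as a one-column matrix.

```r
# Hypothetical stand-ins for testPFI$project_delay and the five prediction vectors
actual     <- c(-323, 0, -60, 0, -91)
lm_pred    <- c(-250, -10, -40, 5, -70)
tree_pred  <- c(-200, 0, -55, 0, -80)
train_pred <- c(-210, -5, -50, 0, -85)
rf_pred    <- c(-280, -3, -58, 2, -88)
cf_pred    <- matrix(c(-290, -2, -57, 1, -90), ncol = 1)  # mimic a one-column matrix

comparison <- data.frame(
  obs     = seq_along(actual),       # row index within the test set
  actual  = actual,
  lm      = as.vector(lm_pred),
  tree    = as.vector(tree_pred),
  tree_cp = as.vector(train_pred),
  rf      = as.vector(rf_pred),
  cforest = as.vector(cf_pred)       # flattens the matrix to a plain vector
)

str(comparison)
```

With the real objects, actual = testPFI$project_delay and the prediction vectors from the models above would take the place of the made-up values.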

Now I want to create a new data frame to combine the actual and predicted 
values such that the new frame has the following columns:

I want to use this data frame to draw a line chart comparing the predictions with the actual values.

How can I achieve this?
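For the line chart, one option in base R is matplot(), which draws one line per column of a matrix. A minimal sketch, assuming a wide data frame with one row per test-set project, an obs index column, and one column for the actual values plus one per model; the table, its column names and its numbers are hypothetical.

```r
# Hypothetical wide table: one row per test-set project, one column per series
comparison <- data.frame(
  obs    = 1:5,
  actual = c(-323, 0, -60, 0, -91),
  lm     = c(-250, -10, -40, 5, -70),
  rf     = c(-280, -3, -58, 2, -88)
)

series <- setdiff(names(comparison), "obs")  # every column except the x-axis index

# One line per series; colours are assigned per column
matplot(comparison$obs, as.matrix(comparison[, series]),
        type = "l", lty = 1, col = seq_along(series),
        xlab = "Test-set project", ylab = "project_delay",
        main = "Actual vs predicted delay")
legend("bottomright", legend = series, lty = 1, col = seq_along(series))
```

If ggplot2 is preferred instead, the wide table would first need reshaping to long format (one row per project and series) before mapping the series to colour.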

Any help will be highly appreciated.

Many Thanks and

Kind Regards

Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
BS16 1QY


