[R] Creating data frame of predicted and actual values in R for plotting

Muhammad Bilal Tue, 10 May 2016 16:48:00 -0700

Hi All,

I have the following dataset:


> str(pfi_v3)
'data.frame': 714 obs. of  8 variables:
 $ project_id             : int  1 2 3 4 5 6 7 8 9 10 ...
 $ project_lat            : num  51.4 51.5 52.2 51.5 53.5 ...
 $ project_lon            : num  -0.642 -1.85 0.08 0.126 -1.392 ...
 $ sector                 : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
 $ project_duration       : int  1826 3652 121 520 1087 730 730 730 790 522 ...
 $ project_delay          : int  -323 0 -60 0 0 0 0 0 0 -91 ...
 $ capital_value          : num  6.7 5.8 21.8 47.3 47 24.2 40.7 71.9 10.7 70 ...
 $ contract_type          : Factor w/ 2 levels "Lumpsum","Turnkey": 2 2 2 2 2 2 2 2 2 2 ...


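I'm using the following commands to create the training and test sets:

# sample.split() (caTools) expects the outcome vector, not the whole data frame
split <- sample.split(pfi_v3$project_delay, SplitRatio = 0.8)
trainPFI <- subset(pfi_v3, split == TRUE)
testPFI <- subset(pfi_v3, split == FALSE)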


I am using several predictive models to estimate project delay.


The commands are given below:


1. Simple linear regression

lm_m <- lm(project_delay ~ project_lon +
                           project_lat +
                           project_duration +
                           sector +
                           contract_type +
                           capital_value,
           data = trainPFI)

lm_pred <- predict(lm_m, newdata = testPFI)


2. Regression tree

tree_m <- rpart(project_delay ~ project_lon +
                                project_lat +
                                project_duration +
                                sector +
                                contract_type +
                                capital_value,
                data = trainPFI)

tree_pred <- predict(tree_m, newdata = testPFI)

3. Cp-optimised regression tree

# tr.control and cp.grid are defined elsewhere (not shown here)
train_m <- train(project_delay ~ project_lon +
                                 project_lat +
                                 project_duration +
                                 sector +
                                 contract_type +
                                 capital_value,
                 data = trainPFI,
                 method = "rpart",
                 trControl = tr.control, tuneGrid = cp.grid)


train_pred <- predict(train_m, newdata = testPFI)


4. Random Forest

rf_m <- randomForest(project_delay ~ project_lon +
                       project_lat +
                       project_duration +
                       sector +
                       contract_type +
                       capital_value,
                     data = trainPFI,
                     importance=TRUE,
                     ntree = 2000)

rf_pred <- predict(rf_m, newdata = testPFI)

5. Conditional Forest
cf_m <- cforest(project_delay ~ project_lon +
                       project_lat +
                       project_duration +
                       sector +
                       contract_type +
                       capital_value,
                     data = trainPFI,
                     controls=cforest_unbiased(ntree=2000, mtry=3))

# OOB = TRUE only applies when newdata is NULL, so it is switched off here
cf_pred <- predict(cf_m, newdata = testPFI, OOB = FALSE, type = "response")

That is it.


Now I want to create a new data frame to combine the actual and predicted 
values such that the new frame has the following columns:

$project_id

$actual_delay

$lm_predicted_delay

$tree_predicted_delay

$train_predicted_delay

$rf_predicted_delay

$cf_predicted_delay
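
As a minimal sketch of the structure I am after, assuming toy stand-ins for testPFI and for two of the prediction vectors (all values below are made up for illustration, and `results` is just a placeholder name), the frame might be assembled like this:

```r
# Toy stand-ins for the real test set and predictions (hypothetical values).
testPFI <- data.frame(project_id    = 1:5,
                      project_delay = c(-323, 0, -60, 0, -91))
lm_pred <- c(-300, 10, -40, 5, -80)

# Predictions from some models may come back as a one-column matrix
# rather than a plain vector; this toy matrix imitates that case.
cf_pred <- matrix(c(-310, -5, -70, 2, -100), ncol = 1)

# as.numeric() flattens vectors and one-column matrices alike, so every
# prediction column ends up as an ordinary numeric vector.
results <- data.frame(
  project_id         = testPFI$project_id,
  actual_delay       = testPFI$project_delay,
  lm_predicted_delay = as.numeric(lm_pred),
  cf_predicted_delay = as.numeric(cf_pred)
)

str(results)
```

The remaining columns (tree, train, rf) would presumably be added the same way, one `as.numeric()` call per prediction object.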


I want to use this data frame to draw a line chart comparing the predictions.
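
The kind of chart I mean can be sketched with base R's matplot(), assuming a small made-up frame in the shape described above (the name `results` and all values are hypothetical), with one line per column plotted against project_id:

```r
# Hypothetical frame in the target shape (made-up values).
results <- data.frame(
  project_id         = 1:5,
  actual_delay       = c(-323, 0, -60, 0, -91),
  lm_predicted_delay = c(-300, 10, -40, 5, -80),
  rf_predicted_delay = c(-310, -5, -70, 2, -100)
)

# Every column except the id is drawn as its own line.
value_cols <- setdiff(names(results), "project_id")
line_cols  <- seq_along(value_cols)

matplot(results$project_id,
        as.matrix(results[, value_cols]),
        type = "l", lty = 1, col = line_cols,
        xlab = "project_id", ylab = "delay")
legend("bottomright", legend = value_cols, lty = 1, col = line_cols)
```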


How can I achieve this with the predictions from all five models?


Any help will be highly appreciated.


Many thanks and kind regards,

--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bi...@live.uwe.ac.uk



