Dear Experts,

I have fitted MARS and GAM models on a real dataset. My goal is prediction. I 
have run cross-validation many times to estimate the out-of-sample prediction 
error, using the Mean Squared Error (MSE) as the evaluation criterion. I 
have published my paper and the reviewers have asked me to run simulations.
So I now want to run a simulation study, as this may be a more objective way 
to compare the performance of these 2 algorithms. I want to find out which 
method (GAM or MARS) gives the lower MSE, and under which circumstances.
I want to study the influence of 3 factors: the sample size (n), the 
percentage of Y-outliers, and the percentage of missing data in X.

Sample size: n = 50, 100, 200, 300 and 500
Y-outliers: 10%, 20%, 30%, 40% and 50% of the Y values
Missing data: 10%, 20%, 30%, 40% and 50% of the X values
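
To make the three factors concrete, here is a minimal sketch of how each one might be imposed on a data frame before fitting. The function names are hypothetical (they come from no package), and two assumptions are built in that may not match what the reviewers have in mind: an "outlier" is taken to be a response value shifted by 5 standard deviations in a random direction, and missing values are generated completely at random (MCAR), independently for each predictor.

```r
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Factor 1: draw a random subsample of n rows without replacement.
subsample_rows <- function(data, n) {
  data[sample(nrow(data), n, replace = FALSE), , drop = FALSE]
}

# Factor 2: turn a fraction `prop` of the response values into outliers.
# Assumption: an outlier is the original value shifted by `shift`
# standard deviations of the response, with a random sign.
add_y_outliers <- function(data, response, prop, shift = 5) {
  n   <- nrow(data)
  k   <- round(prop * n)
  idx <- sample(n, k)
  s   <- sd(data[[response]])
  data[[response]][idx] <- data[[response]][idx] +
    sample(c(-1, 1), k, replace = TRUE) * shift * s
  data
}

# Factor 3: set a fraction `prop` of each listed predictor to NA.
# Assumption: missing completely at random, independently per column.
add_x_missing <- function(data, predictors, prop) {
  n <- nrow(data)
  for (v in predictors) {
    data[[v]][sample(n, round(prop * n))] <- NA
  }
  data
}

# Small demonstration on artificial data.
set.seed(1)
toy   <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))
toy50 <- subsample_rows(toy, 50)
toy_o <- add_y_outliers(toy, "y", prop = 0.10)
toy_m <- add_x_missing(toy, c("x1", "x2"), prop = 0.20)
```

Whether the contamination should be applied to the training set only, or to the test set as well, is a separate design choice that this sketch leaves open.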

Below is the reproducible R code for GAM and MARS that I use to calculate the 
MSE over many repeated random train/test splits. 
How can I modify this code to vary the sample size, the percentage of 
Y-outliers and the percentage of missing data?

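###MSE CROSSVALIDATION GAM (gam1)
# install.packages(c("ISLR", "mgcv"))  # run once if not yet installed
library(ISLR)   # provides the Wage data
library(mgcv)

set.seed(123)
n.rep <- 1000           # number of random train/test splits
p <- 0.667              # proportion of observations used for training
n <- nrow(Wage)
mse <- numeric(n.rep)   # stores the test MSE of each split

for (i in seq_len(n.rep)) {
  # Random split into training and test sets
  sam <- sample(seq_len(n), floor(p * n), replace = FALSE)
  Training <- Wage[sam, ]
  Testing  <- Wage[-sam, ]

  # Fit on the training set only; fitting on the full Wage data
  # would let the test observations leak into the model
  GAM1 <- gam(wage ~ education + s(age, bs = "ps") + year, data = Training)

  # MSE on the held-out test set
  ypred <- predict(GAM1, newdata = Testing)
  mse[i] <- mean((Testing$wage - ypred)^2)
}
mean(mse)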
########

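#####MSE CROSSVALIDATION MARS (Mars1)
# install.packages(c("ISLR", "earth"))  # run once if not yet installed
library(ISLR)   # provides the Wage data
library(earth)

set.seed(123)
n.rep <- 1000           # number of random train/test splits
p <- 0.667              # proportion of observations used for training
n <- nrow(Wage)
mse <- numeric(n.rep)   # stores the test MSE of each split

for (i in seq_len(n.rep)) {
  # Random split into training and test sets
  sam <- sample(seq_len(n), floor(p * n), replace = FALSE)
  Training <- Wage[sam, ]
  Testing  <- Wage[-sam, ]

  # Fit on the training set only; fitting on the full Wage data
  # would let the test observations leak into the model
  mars1 <- earth(wage ~ age + as.factor(education) + year, data = Training)

  # MSE on the held-out test set
  ypred <- predict(mars1, newdata = Testing)
  mse[i] <- mean((Testing$wage - ypred)^2)
}
mean(mse)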
#########
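
Since the two loops above differ only in the model call, the repeated hold-out step could be wrapped in one function that takes the fitting routine as an argument. The following is a minimal sketch under that assumption; `repeated_holdout_mse` and its arguments are hypothetical names, and `lm()` on the built-in `mtcars` data stands in for the real models so that the example runs without extra packages.

```r
# Hypothetical helper: repeated random hold-out estimate of the test MSE.
# fit_fun is any function that takes a training data frame and returns
# a fitted model for which predict(fit, newdata = ...) works.
repeated_holdout_mse <- function(data, response, fit_fun,
                                 reps = 100, p = 0.667) {
  n   <- nrow(data)
  mse <- numeric(reps)
  for (i in seq_len(reps)) {
    sam      <- sample(n, floor(p * n))
    training <- data[sam, , drop = FALSE]
    testing  <- data[-sam, , drop = FALSE]
    fit      <- fit_fun(training)
    # as.numeric() also flattens one-column matrix predictions
    ypred    <- as.numeric(predict(fit, newdata = testing))
    mse[i]   <- mean((testing[[response]] - ypred)^2)
  }
  mse
}

# Stand-in demonstration with lm() on a built-in dataset.
set.seed(123)
mse_lm <- repeated_holdout_mse(mtcars, "mpg",
                               function(d) lm(mpg ~ wt + hp, data = d),
                               reps = 50)
mean(mse_lm)
```

For the models in this post, `fit_fun` would be something like `function(d) gam(wage ~ education + s(age, bs = "ps") + year, data = d)` or `function(d) earth(wage ~ age + as.factor(education) + year, data = d)`. Resetting the seed to the same value before each call gives both methods identical splits.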

