Thanks Phil, I have your solution and another one, which I will try in the next day or so. I will post the results to the list then.

Cheers,
Andy

On Wed, Oct 13, 2010 at 10:30 AM, Phil Spector <spec...@stat.berkeley.edu> wrote:

> Andrew -
>    I think
>
> answer = replicate(50, {
>     fit1 <- rpart(CHAB~., data=chabun, method="anova",
>                   control=rpart.control(minsplit=10, cp=0.01, xval=10));
>     x = printcp(fit1);
>     x[which.min(x[,'xerror']),'nsplit']})
>
> will put the numbers you want into answer, but there was no reproducible
> example to test it on. Unfortunately, I don't know of any way to suppress
> the printing from printcp().
>
>                                        - Phil Spector
>                                          Statistical Computing Facility
>                                          Department of Statistics
>                                          UC Berkeley
>                                          spec...@stat.berkeley.edu
>
> On Wed, 13 Oct 2010, Andrew Halford wrote:
>
>> Hi All,
>>
>> I have to say upfront that I am a complete neophyte when it comes to
>> programming. Nevertheless I enjoy the challenge of using R because of its
>> incredible statistical resources.
>>
>> My problem is this: I am running a regression tree analysis using
>> "rpart" and I need to run the calculation repeatedly (say n=50 times) to
>> obtain a distribution of results, from which I will pick the median one to
>> represent the most parsimonious tree size. Unfortunately rpart does not
>> provide this ability, so it will have to be coded.
>>
>> Could anyone help me with this? I have provided the code (and relevant
>> output) for the analysis I am running. I need to run it n=50 times and from
>> each output pick the appropriate tree size and post it to a datafile where I
>> can then look at the frequency distribution of tree sizes.
>>
>> Here is the code and output from a single run:
>>
>> fit1 <- rpart(CHAB~., data=chabun, method="anova",
>>               control=rpart.control(minsplit=10, cp=0.01, xval=10))
>> printcp(fit1)
>>
>> Regression tree:
>> rpart(formula = CHAB ~ ., data = chabun, method = "anova",
>>     control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
>>
>> Variables actually used in tree construction:
>> [1] EXP LAT POC RUG
>>
>> Root node error: 35904/33 = 1088
>>
>> n= 33
>>
>>         CP nsplit rel error xerror    xstd
>> 1 0.539806      0   1.00000 1.0337 0.41238
>> 2 0.050516      1   0.46019 1.2149 0.38787
>> 3 0.016788      2   0.40968 1.2719 0.41280
>> 4 0.010221      3   0.39289 1.1852 0.38300
>> 5 0.010000      4   0.38267 1.1740 0.38333
>>
>> Each time I re-run the model I will get a slightly different output. I want
>> to extract the nsplit number corresponding to the lowest xerror for each run
>> of the model (in this case it is for nsplit = 0) over 50 runs and then look
>> at the distribution of nsplits after 50 runs.
>>
>> Any help appreciated.
>>
>> Andy
>>
>> --
>> Andrew Halford
>> Associate Researcher
>> Marine Laboratory
>> University of Guam
>> Ph: +1 671 734 2948
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

--
Andrew Halford Ph.D
Associate Researcher Scientist
Marine Laboratory
University of Guam
Ph: +1 671 734 2948
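
On the printing from printcp() raised in the quoted reply: one possible way around it is to skip printcp() and read the complexity table straight from the fitted object, since an rpart fit stores that table in its cptable component. The following is only a sketch under assumptions. The chabun data from the thread is not available, so the built-in mtcars data stands in for it, and best_nsplit is a hypothetical helper name introduced here for illustration.

```r
library(rpart)

# Hypothetical helper: fit one tree and return the nsplit value on the
# row of the complexity table with the lowest cross-validated error.
best_nsplit <- function(formula, data) {
  fit <- rpart(formula, data = data, method = "anova",
               control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
  cp <- fit$cptable  # the table printcp() displays, read without printing
  unname(cp[which.min(cp[, "xerror"]), "nsplit"])
}

# Assumption: mtcars is a stand-in for the thread's chabun data frame.
set.seed(1)  # cross-validation folds are random; the seed is only for repeatability
nsplits <- replicate(50, best_nsplit(mpg ~ ., mtcars))

table(nsplits)   # frequency distribution of the selected tree sizes
median(nsplits)  # median tree size over the 50 runs
```

With the real data, the call would presumably become best_nsplit(CHAB ~ ., chabun). The values will differ between sessions unless a seed is set, because the cross-validation folds are drawn at random on every fit.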