Re: [R] repeating an analysis

Andrew Halford Wed, 13 Oct 2010 17:40:13 -0700

Thanks Phil, I have your solution and another one, which I will try in the
next day or so; I will post the results to the list then.


cheers

andy

On Wed, Oct 13, 2010 at 10:30 AM, Phil Spector <spec...@stat.berkeley.edu> wrote:

> Andrew -
>   I think
>
> answer = replicate(50, {
>     fit1 <- rpart(CHAB ~ ., data = chabun, method = "anova",
>                   control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
>     x = printcp(fit1)
>     x[which.min(x[, 'xerror']), 'nsplit']
> })
>
> will put the numbers you want into answer, but there was no reproducible
> example to test it on.  Unfortunately, I don't know of any way to suppress
> the printing from printcp().
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spec...@stat.berkeley.edu
>
> On Wed, 13 Oct 2010, Andrew Halford wrote:
>
>> Hi All,
>>
>> I have to say upfront that I am a complete neophyte when it comes to
>> programming. Nevertheless I enjoy the challenge of using R because of its
>> incredible statistical resources.
>>
>> My problem is this: I am running a regression tree analysis using
>> "rpart" and I need to run the calculation repeatedly (say n=50 times) to
>> obtain a distribution of results from which I will pick the median one to
>> represent the most parsimonious tree size. Unfortunately rpart does not
>> provide this, so it will have to be coded.
>>
>> Could anyone help me with this? I have provided the code (and relevant
>> output) for the analysis I am running. I need to run it n=50 times and from
>> each output pick the appropriate tree size and post it to a datafile where I
>> can then look at the frequency distribution of tree sizes.
>>
>> Here is the code and output from a single run
>>
>> > fit1 <- rpart(CHAB~.,data=chabun, method="anova",
>> +               control=rpart.control(minsplit=10, cp=0.01, xval=10))
>> > printcp(fit1)
>>
>> Regression tree:
>> rpart(formula = CHAB ~ ., data = chabun, method = "anova",
>>     control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
>>
>> Variables actually used in tree construction:
>> [1] EXP LAT POC RUG
>>
>> Root node error: 35904/33 = 1088
>>
>> n= 33
>>
>>         CP nsplit rel error xerror    xstd
>> 1 0.539806      0   1.00000 1.0337 0.41238
>> 2 0.050516      1   0.46019 1.2149 0.38787
>> 3 0.016788      2   0.40968 1.2719 0.41280
>> 4 0.010221      3   0.39289 1.1852 0.38300
>> 5 0.010000      4   0.38267 1.1740 0.38333
>>
>> Each time I re-run the model I will get a slightly different output. I want
>> to extract the nsplit number corresponding to the lowest xerror for each run
>> of the model (in this case it is for nsplit = 0) over 50 runs and then look
>> at the distribution of nsplits after 50 runs.
>>
>> Any help appreciated.
>>
>>
>> Andy
>>
>>
>> --
>> Andrew Halford
>> Associate Researcher
>> Marine Laboratory
>> University of Guam
>> Ph: +1 671 734 2948
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
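
For later readers, a minimal sketch of one way to do what the thread describes without the printing from printcp(). It reads the complexity table straight from the fitted object's cptable component, which is the same matrix printcp() displays. The chabun data from the thread is not available, so the built-in mtcars data (response mpg) stands in for it, and the helper name best_nsplit is hypothetical. Treat it as an illustration under those assumptions, not as the method either poster ended up using.

```r
library(rpart)

# Assumption: 'chabun' from the thread is not available, so the built-in
# mtcars data set (response: mpg) is used purely as a stand-in.
set.seed(1)  # cross-validation folds are random; fix the seed for repeatability

# Hypothetical helper: fit one tree and return the nsplit value of the row
# with the lowest cross-validated error (xerror).
best_nsplit <- function(formula, data) {
  fit <- rpart(formula, data = data, method = "anova",
               control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
  cp <- fit$cptable                      # same table printcp() shows, no printing
  cp[which.min(cp[, "xerror"]), "nsplit"]
}

# Repeat the analysis 50 times and collect the selected tree sizes.
answer <- replicate(50, best_nsplit(mpg ~ ., mtcars))

table(answer)     # frequency distribution of the selected tree sizes
median(answer)    # median tree size over the 50 runs
```

With the original data, the call would presumably become best_nsplit(CHAB ~ ., chabun). Note that which.min() returns the first minimum, so ties in xerror resolve to the smaller tree.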


-- 
Andrew Halford Ph.D
Associate Researcher Scientist
Marine Laboratory
University of Guam
Ph: +1 671 734 2948
