Dear R users,

I am trying to apply the analysis described in a paper to the data I am
working with.

The data are: 80 patients for whom I have survival data (time in days, and a
binary event indicator), and microarray expression data for 200 genes
(continuous predictor variables).
My data matrix "data.test" has 202 columns and 80 rows.

What I want to do is: 
- run recursive partitioning on this data to get groups of patients that are
homogeneous in terms of survival/prognosis.
- extract the "correlation" of the expression of each single gene (each of the
200 genes) with recurrence-free survival (time and event): I want to know which
variables best predict a poor/good prognosis based on the survival data.

I am using function "ctree" from the "party" package.

I came up with this command:
test <- ctree(Surv(time, event) ~ .,
        data = data.test,
        controls = ctree_control(teststat = "max", testtype = "Bonferroni",
                                 mincriterion = 0.95, savesplitstats = TRUE),
        ytrafo = function(data) trafo(data, numeric_trafo = rank),
        xtrafo = function(data) trafo(data, surv_trafo = logrank_trafo(data,
                                      ties.method = "logrank"))
)
This works, but as I am not a statistician I find it quite confusing, and I
might not be running it properly.

My technical problem is that I would like to extract the test statistics from
my "test" object (BinaryTree class), i.e. the P-value of each of the 200
comparisons (survival data versus each gene): I would like to know which of
the genes are really associated with survival at each node of the tree.

I tried:
test@tree$criterion$statistic
but the maximum value of this is 16, so I assume it is not a p-value as such: 
what is it?
and:
test@tree$criterion$criterion
The maximum value is 0.96 and the minimum value is 0; only one value is > 0.95.
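
To make the question concrete, here is a minimal sketch of the kind of extraction I am after, on simulated data rather than my real data (the data frame "sim" and its columns X1..X5 are made up for illustration). It assumes, as I read the party vignette, that the "criterion" element of a node stores 1 minus the (here Bonferroni-adjusted) p-value for each covariate, and that "statistic" holds the corresponding test statistic. I am not certain this reading is correct.

```r
library(party)     # ctree(), ctree_control(), nodes()
library(survival)  # Surv()

set.seed(1)
n <- 80

# Simulated stand-in for data.test: 5 "genes" instead of 200 (assumption)
sim <- data.frame(matrix(rnorm(n * 5), nrow = n))   # columns X1..X5
sim$time  <- rexp(n, rate = exp(0.8 * sim$X1))      # only X1 affects survival
sim$event <- rbinom(n, 1, 0.8)                      # 1 = event, 0 = censored

fit <- ctree(Surv(time, event) ~ ., data = sim,
             controls = ctree_control(teststat = "max",
                                      testtype = "Bonferroni",
                                      mincriterion = 0.95))

# Node 1 is the root; nodes() returns a list of node objects
root <- nodes(fit, 1)[[1]]

stat <- root$criterion$statistic       # one test statistic per covariate
pval <- 1 - root$criterion$criterion   # assumed: criterion = 1 - adjusted p-value

# One row per covariate, smallest adjusted p-value first
res <- data.frame(statistic = stat, p.adjusted = pval)
res[order(res$p.adjusted), ]
```

The same access pattern with another node number in place of 1 should give the values for the other inner nodes, if I understand the object correctly.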

str(test) gives quite a lot of information, but at the moment it confuses me
more than it helps.

I want to know:
- if my command for "ctree" makes sense to people who have more experience than 
me with this kind of data...
- which elements of "test" represent which statistics and how to interpret
them: as I understand it, setting "mincriterion" to 0.95 is equivalent to
setting a P-value threshold of 0.05 (ctree help: "when 'mincriterion = 0.95',
the p-value must be smaller than 0.05 in order to split this node.")
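
As a small arithmetic check of that reading (assuming the stored criterion is 1 minus the p-value, which is my interpretation of the help text), the two extreme values I see would translate as follows. The vector "crit" simply holds the numbers quoted above:

```r
# Extreme criterion values observed in test@tree$criterion$criterion
crit <- c(max = 0.96, min = 0)
mincriterion <- 0.95

# Assumed relation: criterion = 1 - p-value
p_value <- 1 - crit

# A variable qualifies for splitting when its criterion exceeds mincriterion,
# i.e. when its p-value is below 1 - mincriterion
splits <- crit > mincriterion

data.frame(criterion = crit, p_value = p_value, passes = splits)
```

Under this assumption only the variable with criterion 0.96 passes the threshold, which would match the single value above 0.95 that I see.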

I hope my explanation is clear. I might be completely mistaken: any tips or
guidance are more than welcome...

Thanks!
Sarah



sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] stats4    grid      splines   stats     graphics  grDevices utils     datasets  methods
[10] base

other attached packages:
 [1] biomaRt_2.10.0    party_1.0-2       vcd_1.2-13        colorspace_1.1-1  MASS_7.3-20
 [6] strucchange_1.4-7 sandwich_2.2-9    zoo_1.7-7         coin_1.0-21       mvtnorm_0.9-9992
[11] modeltools_0.2-19 survival_2.36-14

loaded via a namespace (and not attached):
[1] lattice_0.20-6 RCurl_1.91-1.1 tools_2.14.2   XML_3.9-4.1

------------------
Sarah Bonnin
Bioinformatician
Centre for Genomic Regulation
C/ Dr. Aiguader, 88
08003 Barcelona, Spain



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
