Joel Neilson wrote:
> No problem - I apologize for the lack of clarity.
> 
>  >>> import rpy2.robjects as robjects
>  >>> r = robjects.r
>  >>> wilcox = robjects.r['wilcox.test']
>  >>> vec1 = [1,2,3,4,5]
>  >>> vec2 = [4,5,6,7,8]
>  >>> rvec1 = robjects.FloatVector(vec1)
>  >>> rvec2 = robjects.FloatVector(vec2)
>  >>> address = wilcox(rvec1, rvec2)
> Warning message:
> In wilcox.test.default(c(1, 2, 3, 4, 5), c(4, 5, 6, 7, 8)) :
>   cannot compute exact p-value with ties
>  >>> address
> <RVector - Python:0x6c9e18 / R:0xda4608>
> 
>  >>> print address                                   
> 
>     Wilcoxon rank sum test with continuity correction
> 
> data:  c(1, 2, 3, 4, 5) and c(4, 5, 6, 7, 8)              #herein likely lies the problem if it's big

Yes. This is happening because:
- the R print method for objects of R class 'htest' reports everything 
about the data used, and
- from R's standpoint the Python variables 'rvec1' and 'rvec2' are 
anonymous (that is, data structures without any associated name/symbol), 
so R spells out their full contents instead of a variable name.
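To see why the 'data:' line grows with the input, here is a rough 
pure-Python imitation of how R spells out an anonymous numeric vector. 
The helpers `fake_deparse` and `fake_data_name` are hypothetical; R's 
real deparse() has its own formatting and line-breaking rules, so treat 
this only as a sketch of the size problem.

```python
def fake_deparse(values):
    """Hypothetical stand-in for R's deparse() on an unnamed numeric vector."""
    # '%g' drops the trailing '.0', so 1.0 is written as '1', as in R's output
    return "c(" + ", ".join("%g" % v for v in values) + ")"


def fake_data_name(x, y):
    # Assumption: the two deparsed arguments are joined with "and",
    # which matches the 'data:' line printed above
    return fake_deparse(x) + " and " + fake_deparse(y)


short = fake_data_name([1.0, 2, 3, 4, 5], [4.0, 5, 6, 7, 8])
print(short)  # c(1, 2, 3, 4, 5) and c(4, 5, 6, 7, 8)

# With about 1500 values in total, the 'data:' text alone runs to thousands of characters
long_name = fake_data_name([float(i) for i in range(750)],
                           [float(i) for i in range(750, 1500)])
print(len(long_name))
```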


> W = 2, p-value = 0.03558
> alternative hypothesis: true location shift is not equal to 0
> 
> #right here is the problem that I ran into.  If I convert address to a string and split it to get out the p value,

That's not the most efficient way to proceed: you are converting a 
(mostly) numerical data structure into a string, only to parse that 
string and extract the one numerical value of interest. It is better to 
extract the value of interest directly.

Try instead:

 >>> test_res = wilcox(rvec1, rvec2)
 >>> print(test_res.names)
[1] "statistic"   "parameter"   "p.value"     "null.value"  "alternative"
[6] "method"      "data.name"
 >>> test_res.subset('p.value')[0][0]
0.035578833239594126
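subset('p.value') picks the component by its name, so it does not depend 
on where 'p.value' happens to sit in the list. The same idea in plain 
Python, with two parallel lists standing in for the R list and a 
hypothetical component() helper (no rpy2 needed to run it; the values 
are copied from the printed result above, and 'parameter' is NULL in R):

```python
# Component names as printed by test_res.names above
names = ["statistic", "parameter", "p.value", "null.value",
         "alternative", "method", "data.name"]
# Illustrative values taken from the printed test result
values = [2.0, None, 0.035578833239594126, 0.0, "two.sided",
          "Wilcoxon rank sum test with continuity correction",
          "c(1, 2, 3, 4, 5) and c(4, 5, 6, 7, 8)"]


def component(names, values, wanted):
    """Hypothetical helper: look a component up by name, not by position."""
    if wanted not in names:
        raise KeyError(wanted)
    return values[names.index(wanted)]


pval = component(names, values, "p.value")
print(pval)
```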



> #funny things start happening once the aggregate vector length (i.e. both of them together) is about 1500
> #i think it is because R returns the primary data as illustrated above, and once that data line gets out to 1500 or
> #so, converting the address to a string returns only one of the following four lines, and if it's the fourth line, that gets
> #truncated

There were (and still are) issues with long strings, as you found out.
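A pure-Python illustration of why parsing the printed form is fragile: 
the hypothetical naive_pvalue() below assumes the p-value sits on the 
third non-blank line of the printout. The wrapped 'data:' line in the 
second sample is made up for the demonstration; it stands in for the 
long-data case you describe.

```python
def naive_pvalue(printed):
    """Hypothetical parser that hard-codes the layout of the printed test."""
    lines = [ln for ln in printed.splitlines() if ln.strip()]
    # Assumes: title, 'data:' line, then 'W = ..., p-value = ...'
    return float(lines[2].split("p-value = ")[1])


short_output = """
    Wilcoxon rank sum test with continuity correction

data:  c(1, 2, 3, 4, 5) and c(4, 5, 6, 7, 8)
W = 2, p-value = 0.03558
alternative hypothesis: true location shift is not equal to 0
"""

# Made-up printout in which the 'data:' line has spilled onto a second line
wrapped_output = short_output.replace(" and c(", "\nand c(")

print(naive_pvalue(short_output))  # only the 4 printed significant digits survive
try:
    naive_pvalue(wrapped_output)
except IndexError:
    print("layout changed: the third line is no longer the p-value line")
```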

> #but inside the R documentation for the wilcox test (below), i found that besides the above output, which i am
> #used to seeing, R is storing the following values as a list:
> 
> 1. statistic
> the value of the test statistic with a name describing it.
> 2. parameter
> the parameter(s) for the exact distribution of the test statistic.
> 3. p.value
> the p-value for the test.
> 4. null.value
> the location parameter mu.
> 5. alternative
> a character string describing the alternative hypothesis.
> 6. method
> the type of test applied.
> 7. data.name
> a character string giving the names of the data.
> 8. conf.int
> a confidence interval for the location parameter. (Only present if 
> argument conf.int = TRUE.)
> 9. estimate
> an estimate of the location parameter. (Only present if argument 
> conf.int = TRUE.)
> 
> #so directly extracting what you need from the stored variable seems to do the trick:
> 
>  >>> pval = str(address[2])
>  >>> pval
> '[1] 0.03557883'
>  >>> pvalactual = float(pval[4:])
>  >>> pvalactual
> 0.035578829999999999
> 
> #totally easy in hindsight, which is the way i guess most things are
> #but i hope this is helpful to other rookies who run into the problem

Same here: there is no need to convert a numerical value into its 
string representation when what you are after is the numerical value 
itself. It is slower, and you are currently losing precision.

 >>> pval = address[2][0]
 >>> pval
0.035578833239594126
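A quick check of the precision point in plain Python, using the two 
values shown in this thread: R prints doubles with 7 significant digits 
by default, so parsing the printed string can recover only those digits.

```python
full = 0.035578833239594126   # value returned by address[2][0]
printed = "[1] 0.03557883"    # str(address[2]) as shown earlier

# Re-create the 7-significant-digit formatting (R's default digits = 7)
assert "%.7g" % full == "0.03557883"

parsed = float(printed[4:])   # same slicing as in the quoted code
print(parsed == full)         # False: the string round-trip dropped digits
print(abs(parsed - full))     # about 3.2e-09
```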

> 
> On Apr 4, 2009, at 5:47 AM, Laurent Gautier wrote:
> 
>> Joel,
>>
>> Good that you solved your issue.
>> However, I am not certain what you mean by "extracting the required 
>> object directly from the address rather than first converting the 
>> address to a string".
>>
>> Self-contained examples often constitute a very efficient way to 
>> demonstrate the problem when requesting help from the list.
>>
>>
>> L.
>>
>> Joel Neilson wrote:
>>> although i still don't understand what's happening and why, this
>>> problem went away if i extracted the required object directly from
>>> the address rather than first converting the address to a string or
>>> list and then indexing out what i wanted.
>>> i'm new to both python and computer science in general, so if this
>>> is obvious to everyone on the list i apologize. however, it seems
>>> that others have run into analogous problems with long R outputs
>>> (see: '[Rpy] R console: long output' thread) and it was not obvious
>>> to me upon reading these threads precisely where the problem was
>>> occurring. now i know and hopefully this is useful information.
>>> ------------------------------------------------------------------------------
>>>  
>>>
>>> _______________________________________________
>>> rpy-list mailing list
>>> rpy-list@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rpy-list
>>
> 
> Joel R. Neilson, Ph.D.
> Research Scientist/Sharp Lab
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> 40 Ames Street, E17-528
> Cambridge, MA 02139
> 
> t:  617.253.6457
> f:  617.253.3867
> 
> jneil...@mit.edu
> 
> 
> 

