Joel Neilson wrote:
> No problem - I apologize for the lack of clarity.
>
> >>> import rpy2.robjects as robjects
> >>> r = robjects.r
> >>> wilcox = robjects.r['wilcox.test']
> >>> vec1 = [1,2,3,4,5]
> >>> vec2 = [4,5,6,7,8]
> >>> rvec1 = robjects.FloatVector(vec1)
> >>> rvec2 = robjects.FloatVector(vec2)
> >>> address = wilcox(rvec1, rvec2)
> Warning message:
> In wilcox.test.default(c(1, 2, 3, 4, 5), c(4, 5, 6, 7, 8)) :
>   cannot compute exact p-value with ties
> >>> address
> <RVector - Python:0x6c9e18 / R:0xda4608>
>
> >>> print address
>
>         Wilcoxon rank sum test with continuity correction
>
> data:  c(1, 2, 3, 4, 5) and c(4, 5, 6, 7, 8)     #herein likely
> lies the problem if it's big

Yes. This is happening because:
- the R print method for objects of R class 'htest' reports the data
  that was used
- from R's standpoint, the Python variables 'rvec1' and 'rvec2' are
  anonymous (that is, data structures without any associated
  name/symbol), so R prints the values themselves where it would
  otherwise print a variable name.

> W = 2, p-value = 0.03558
> alternative hypothesis: true location shift is not equal to 0
>
> #right here is the problem that I ran into.  If I convert address to a
> string and split it to get out the p value,

That's not the most efficient way to proceed: you are converting a
(mostly) numerical data structure into a string in order to parse it and
extract one numerical value of interest. It is better to extract the
value of interest directly.

Try instead:
>>> test_res = wilcox(rvec1, rvec2)
>>> print(test_res.names)
[1] "statistic"   "parameter"   "p.value"     "null.value"  "alternative"
[6] "method"      "data.name"
>>> test_res.subset('p.value')[0][0]
0.035578833239594126

> #funny things start happening once the aggregate vector length (e.g.
> both of them) is about 1500
> #i think it is because R returns the primary data as illustrated above,
> and once that data line gets out to 1500 or
> #so converting the address to a string returns only one of the following
> four lines, and if it's the fourth line, that gets
> #truncated

There were/are issues with long strings (as you found).

> #but inside the R documentation for the wilcox test (below), i found
> that besides the above output, which i am
> #used to seeing, R is storing the following values as a list:
>
>    1. statistic
>       the value of the test statistic with a name describing it.
>    2. parameter
>       the parameter(s) for the exact distribution of the test statistic.
>    3. p.value
>       the p-value for the test.
>    4. null.value
>       the location parameter mu.
>    5. alternative
>       a character string describing the alternative hypothesis.
>    6. method
>       the type of test applied.
>    7. data.name
>       a character string giving the names of the data.
>    8. conf.int
>       a confidence interval for the location parameter. (Only present if
>       argument conf.int = TRUE.)
>    9. estimate
>       an estimate of the location parameter. (Only present if argument
>       conf.int = TRUE.)
>
> #so directly extracting what you need from the stored variable seems to
> do the trick:
>
> >>> pval = str(address[2])
> >>> pval
> '[1] 0.03557883'
> >>> pvalactual = float(pval[4:])
> >>> pvalactual
> 0.035578829999999999
>
> #totally easy in hindsight, which is the way i guess most things are
> #but i hope this is helpful to other rookies who run into the problem

Same here: it is not necessary to convert a numerical value into its
string representation when you are primarily after the numerical value.
It is slower (and you are currently losing precision).

>>> pval = address[2][0]
>>> pval
0.035578833239594126

>
> On Apr 4, 2009, at 5:47 AM, Laurent Gautier wrote:
>
>> Joel,
>>
>> Good that you solved your issue.
>> However, I am not certain of what you mean by "extracting the required
>> object directly from the address rather than first converting the
>> address to a string".
>>
>> Self-contained examples often constitute a very efficient way to
>> demonstrate the problem when requesting help from the list.
>>
>>
>> L.
>>
>>
>> Joel Neilson wrote:
>>> although i still don't understand what's happening and why, this
>>> problem went away if i extracted the required object directly from
>>> the address rather than first converting the address to a string or
>>> list and then indexing out what i wanted.
>>> i'm new to both python and computer science in general, so if this
>>> is obvious to everyone on the list i apologize.  however, it seems
>>> that the others have run into analogous problems with long R outputs
>>> (see: '[Rpy] R console: long output' thread) and it was not obvious
>>> to me upon reading these threads precisely where the problem was
>>> occurring.  now i know and hopefully this is useful information.
>>> ------------------------------------------------------------------------------
>>> _______________________________________________
>>> rpy-list mailing list
>>> rpy-list@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rpy-list
>
> Joel R. Neilson, Ph.D.
> Research Scientist/Sharp Lab
> Koch Institute for Integrative Cancer Research
> Massachusetts Institute of Technology
> 40 Ames Street, E17-528
> Cambridge, MA 02139
>
> t: 617.253.6457
> f: 617.253.3867
>
> jneil...@mit.edu
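
The precision point made in the reply can be checked without R or rpy2. The sketch below only reuses two values already shown in the thread: the full-precision float obtained from address[2][0], and the '[1] 0.03557883' string that R prints (R shows 7 significant digits by default). The variable names are made up for illustration and are not part of rpy2.

```python
# Full-precision p-value, as returned by address[2][0] in the thread.
pval_direct = 0.035578833239594126

# What str(address[2]) gave: R's printed form, rounded for display.
printed = "[1] 0.03557883"

# Parsing the printed form, as in the quoted pval[4:] approach:
# skip the "[1] " index prefix and convert the rest to a float.
pval_parsed = float(printed[4:])

# The digits dropped by the print step cannot be recovered by parsing.
abs_err = abs(pval_direct - pval_parsed)
rel_err = abs_err / pval_direct

print("direct:", repr(pval_direct))
print("parsed:", repr(pval_parsed))
print("absolute error:", abs_err)
print("relative error:", rel_err)
```

The absolute error here is about 3.2e-09, which may not matter for a single test but is avoidable, and the string slice would also break if R's printed layout changed (for example a different index prefix or scientific notation).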