[Rpy] dataframe from structured numpy array

Gary Strangman Fri, 23 Apr 2010 07:23:17 -0700

Hi all,

I have a mixed-type structured numpy array (including columns of ints, 
floats and strings), sometimes with missing values. In rpy2-2.1, what is 
the recommended (including fastest and least memory-expensive) way to 
convert such a structure to an R dataframe? Here's where I've been:


1)  When I try to convert a structured numpy array directly:

>>> df = rpy2.robjects.DataFrame(strucarray)

... I get "ValueError: tlist can be either an instance of 
rpy2.rlike.container.TaggedList or an instance of 
rpy2.rinterface.SexpVector of type VECSXP, or a Python dict" (even after 
importing rpy2.robjects.numpy2ri).

2) I can convert strucarray to a dictionary of numpy.ndarrays first. In 
this case

>>> df = rpy2.robjects.DataFrame(dct_of_numpy_arrays)

... no longer throws an error. However, passing the resulting dataframe to 
lme() generates an RRuntimeError on non-conformable arrays. (I believe 
this is because the string columns are converted to "Class: array, Mode: 
character" instead of to a factor. I don't know how to control that.)
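
For reference, a minimal sketch of the numpy side of step 2. The field names and values here are made up for illustration, and `split_columns` and `string_columns` are hypothetical helpers, not part of numpy or rpy2. Indexing a structured array by field name returns a view rather than a copy, so building the dictionary itself should add little memory. The string columns it reports are the ones that would need to become factors before the data reaches lme().

```python
import numpy as np

# Illustrative data only: three fields of mixed type.
strucarray = np.array(
    [(1, 0.5, "ctl"), (2, 1.25, "trt"), (3, 2.0, "ctl")],
    dtype=[("subject", "i4"), ("score", "f8"), ("group", "U3")],
)


def split_columns(arr):
    """Return {field name: column}; each column is a view into arr."""
    return {name: arr[name] for name in arr.dtype.names}


def string_columns(arr):
    """Return the names of fields holding byte strings or unicode strings."""
    return [name for name in arr.dtype.names
            if arr.dtype[name].kind in ("S", "U")]


dct_of_numpy_arrays = split_columns(strucarray)
factor_candidates = string_columns(strucarray)
print(factor_candidates)
```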

3) I tried instead converting the structured array to a dictionary of 
lists and converting that to a dataframe

>>> df = rpy2.robjects.DataFrame(dct_of_lists)

... but that gives me a complete mess (every value in the entire dict of 
lists ends up in its own column). Same problem if the dictionary values 
are tuples instead of lists.

4) As the topmost error message suggests, I can first convert my numpy 
array to a tagged list of lists via 
rpy2.rlike.container.TaggedList(list_of_columns,column_names) but then 
when I go to convert to a dataframe ...

>>> df = rpy2.robjects.DataFrame(tlist)

... I get a "ValueError: All parameters must be of type Sexp_Type or 
Python int/long, float, bool, or None".

Clearly #2 gets me closest. The examples work fine in R using read.table() 
and then passing the data to lme(), but I have to do a /lot/ of these and 
don't want to burn a hole in my disk by saving from python and 
read.table()'ing into R each time (not to mention the speed hit there). The 
structured arrays are also often pretty big, meaning I'm looking to avoid 
as many intermediate data formats as possible ... thus, numpy-array to 
list-of-numpy-columns to tagged-list to data-frame is non-optimal to say 
the least.
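
One possible shape for a direct, column-by-column conversion is sketched below. It rests on several assumptions: that the installed rpy2 provides `robjects.IntVector`, `FloatVector`, `StrVector` and `FactorVector`, and that `robjects.DataFrame` accepts a dict of R vectors (the error message in #1 suggests a dict is allowed). `r_vector_kind` and `to_r_dataframe` are hypothetical names, missing values are not handled, and `tolist()` still creates one temporary Python list per column, so this is not copy-free.

```python
import numpy as np


def r_vector_kind(column):
    """Name the R vector type a numpy column should become."""
    kind = column.dtype.kind
    if kind in ("S", "U"):
        return "factor"   # strings become factors, as lme() expects
    if kind == "i":
        return "int"
    if kind == "f":
        return "float"
    raise TypeError("unsupported column dtype: %r" % (column.dtype,))


def to_r_dataframe(strucarray):
    """Build an R data frame one column at a time (requires rpy2 and R)."""
    # Imported lazily so r_vector_kind() can be used without R installed.
    import rpy2.robjects as ro

    columns = {}
    for name in strucarray.dtype.names:
        col = strucarray[name]
        kind = r_vector_kind(col)
        values = col.tolist()
        if kind == "factor":
            # Byte-string fields come back as bytes; decode them to text.
            values = [v.decode() if isinstance(v, bytes) else v
                      for v in values]
            columns[name] = ro.FactorVector(ro.StrVector(values))
        elif kind == "int":
            columns[name] = ro.IntVector(values)
        else:
            columns[name] = ro.FloatVector(values)
    return ro.DataFrame(columns)
```

With something like this, `df = to_r_dataframe(strucarray)` would replace the call in #2, with the string columns arriving in R as factors rather than character arrays.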

Am I missing something obvious here?

Gary