Hi all, I have a mixed-type structured numpy array (including columns of ints, floats and strings), sometimes with missing values. In rpy2-2.1, what is the recommended (including fastest and least memory-expensive) way to convert such a structure to an R dataframe? Here's where I've been:
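For concreteness, here is a toy array of the kind I mean (the column names and dtypes are made up purely for illustration):

```python
import numpy as np

# A small structured array with int, float, and string columns,
# mimicking the mixed-type data described above; the field names
# and dtypes here are illustrative only.
strucarray = np.array(
    [(1, 2.5, "a"), (2, 0.5, "b"), (3, 1.0, "a")],
    dtype=[("id", "i4"), ("value", "f8"), ("group", "U10")],
)

print(strucarray.dtype.names)   # ('id', 'value', 'group')
print(strucarray["group"])      # ['a' 'b' 'a']
```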
1) When I try to convert a structured numpy array directly:

>>> df = rpy2.robjects.DataFrame(strucarray)

... I get "ValueError: tlist can be either an instance of rpy2.rlike.container.TaggedList or an instance of rpy2.rinterface.SexpVector of type VECSXP, or a Python dict" (even after importing rpy2.robjects.numpy2ri).

2) I can convert strucarray to a dictionary of numpy.ndarrays first. In that case

>>> df = rpy2.robjects.DataFrame(dct_of_numpy_arrays)

... no longer throws an error. However, passing the resulting dataframe to lme() raises an RRuntimeError about non-conformable arrays. (I believe this is because the string columns are converted to "Class: array, Mode: character" instead of to a factor; I don't know how to control that.)

3) I tried instead converting the structured array to a dictionary of lists and converting that to a dataframe:

>>> df = rpy2.robjects.DataFrame(dct_of_lists)

... but that gives me a complete mess: every value in the entire dict of lists ends up in its own column. The same thing happens if the dictionary values are tuples instead of lists.

4) As the error message in #1 suggests, I can first convert my numpy array to a tagged list of lists via rpy2.rlike.container.TaggedList(list_of_columns, column_names), but then when I go to convert that to a dataframe ...

>>> df = rpy2.robjects.DataFrame(tlist)

... I get "ValueError: All parameters must be of type Sexp_Type or Python int/long, float, bool, or None".

Clearly #2 gets me closest. The examples work fine in R using read.table() and then passing the data to lme(), but I have to do a /lot/ of these and don't want to burn a hole in my disk by saving from Python and read.table'ing into R each time (not to mention the speed hit). The structured arrays are also often quite large, so I'm looking to avoid as many intermediate data formats as possible; numpy-array to list-of-numpy-columns to tagged-list to data-frame is non-optimal, to say the least. Am I missing something obvious here?
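For reference, the column-splitting step in #2 amounts to the following (the helper name is my own, and the sample data is illustrative):

```python
import numpy as np

def strucarray_to_dict(arr):
    """Split a structured array into a dict mapping each field name to
    its plain per-column ndarray -- the intermediate form used in #2."""
    return {name: arr[name] for name in arr.dtype.names}

# Toy data; the real arrays are much larger.
arr = np.array(
    [(1, 2.5, "a"), (2, 0.5, "b")],
    dtype=[("id", "i4"), ("value", "f8"), ("group", "U10")],
)
dct_of_numpy_arrays = strucarray_to_dict(arr)
# dct_of_numpy_arrays["group"] is still a character array at this point;
# this is the column that ends up as "Class: array, Mode: character"
# rather than a factor after conversion to an R dataframe.
```

As far as I can tell, single-field access on a structured array returns a view rather than a copy, so at least this step shouldn't add to the memory cost.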
Gary

_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list