A quick update on that point.

Silently translating R's NA into ordinary Python values is something I clearly
overlooked, and it can lead to hard-to-track problems in someone's code.

The following improvements are in the 2.1-dev repository:

- the parameter 'as_is' was added to the signature of the method
  DataFrame.from_csvfile() (set 'as_is' to True and the returned DataFrame
  will have columns of type StrVector rather than FactorVector),

- NAs are now subtypes of Python types (for a start, that should be NA,
  NA_integer_, and NA_real_). Taking the same example again, that gives:

import rpy2.robjects as ro
fcr = ro.r('factor(c("a", "b", NA, "a", NA))')

 >>> list(fcr)
[1, 2, NA_integer_, 1, NA_integer_]

(used to be:
[1, 2, -2147483648, 1, -2147483648]
)
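
To illustrate the idea of an NA that is a subtype of a Python type, here is a
minimal pure-Python sketch. It is not the code in the rpy2 repository: the
names NAIntegerType, NA_integer, wrap_codes and as_python_list are
hypothetical and only meant to show the principle (an int subclass that still
carries R's sentinel value but prints as NA_integer_).

```python
class NAIntegerType(int):
    """Hypothetical stand-in for an integer NA: still an int, but prints as NA."""

    _instance = None
    # Value R uses internally to encode a missing integer.
    R_NA_INTEGER = -2147483648

    def __new__(cls):
        # Keep a single shared instance so that `x is NA_integer` works.
        if cls._instance is None:
            cls._instance = super().__new__(cls, cls.R_NA_INTEGER)
        return cls._instance

    def __repr__(self):
        return "NA_integer_"

    __str__ = __repr__


NA_integer = NAIntegerType()


def wrap_codes(raw_codes):
    """Replace R's integer NA sentinel with the NA_integer object."""
    return [NA_integer if c == NAIntegerType.R_NA_INTEGER else c
            for c in raw_codes]


def as_python_list(codes, levels):
    """Map factor codes (1-offset, as in R) to level strings, and NA to None."""
    return [None if c is NA_integer else levels[c - 1] for c in codes]


codes = wrap_codes([1, 2, -2147483648, 1, -2147483648])
print(codes)
print(as_python_list(codes, ["a", "b"]))
```

Because the NA object remains an int, existing code that expects integers
keeps working, while anyone printing or testing the values can tell a missing
value from a genuine number.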



L.





On 15/01/10 17:41, Laurent Gautier wrote:
> Hi Luca,
>
> Unfortunately this does not seem to be caused by your installation.
>
> The problem exists for IntVector, and FactorVector inherits from it.
> A few features are likely missing from FactorVector, but the good thing
> is that they can already be implemented simply.
>
> Let's take an example:
>
> import rpy2.robjects as ro
> fcr = ro.r('factor(c("a", "b", NA, "a", NA))')
>
> 'fcr' is now a FactorVector, that is an IntVector with levels.
>
> >>> list(fcr)
> [1, 2, -2147483648, 1, -2147483648]
>
> That large negative integer is the one used by R to encode missing 
> "integer" values:
>
> >>> ro.NA_integer[0]
> -2147483648
>
> What is happening when doing 'list(fcr)' is that fcr will be iterated 
> through and each element stored into a result Python list.
> The issue is that Python does not have a "missing integer" value, but
> that should not stop us from writing a simple function to deal with it 
> as needed.
>
> def as_character_list(factor):
>     na_val = ro.NA_integer[0]
>     res = [None, ] * len(factor)
>     for i, elt in enumerate(factor):
>         if elt != na_val:
>             #NOTE: R is using 1-offset indices
>             res[i] = factor.levels[elt-1]
>     return res
>
> >>> as_character_list(fcr)
> ['a', 'b', None, 'a',  None]
>
>
> What we have implemented is a variant of the R base function 
> "as.character.factor":
>
> from rpy2.robjects.packages import importr
> base = importr("base")
>
> >>> list(base.as_character(fcr))
> ['a', 'b', 'NA', 'a', 'NA']
>
>
>
> L.
>
>
>
>
> On 1/15/10 2:36 PM, Luca Beltrame wrote:
>> Hello,
>>
>> in my code, I need to convert the columns from a robjects.DataFrame to other
>> data types (list, for example). However, I've found a problem when dealing
>> with data that contains NAs. In particular, I'm referring to non-numeric
>> columns, which are represented as FactorVectors.
>>
>> Example code:
>>
>> import rpy2.robjects as robjects
>>
>> data = robjects.DataFrame.from_csvfile("file_with_NAs_in_columns", sep="\t")
>>
>> column_with_na = data.rx2("Column")
>>
>> print column_with_na
>>
>> [1] <NA> <NA> <NA> some_value
>> Levels: some_value
>>
>> and if I issue
>>
>> print column_with_na[0]
>>
>> I get:
>> -2147483648
>>
>> And of course, when accessing the levels I only get some_value. Converting
>> to other types of Vector doesn't seem to help.
>>
>> Notice that this works if I do
>>
>> base = importr("base")
>> column_value = base.as_vector(column_with_na)
>> column_value = list(column_value)
>> print column_value
>> ['NA', 'NA', 'NA', 'some_value']
>>
>> Is there a way to translate the column, *including* the NAs, into a Python
>> list without resorting to the hackish way described above?
>>
>> This is with RPy 2.1 alpha 2. I admit that there may be a problem with my
>> installation, as I'm running a local copy of rpy2 2.1 while I still have a
>> system-wide 2.0.x needed for some projects.
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>>  
>>
>> Throughout its 18-year history, RSA Conference consistently attracts the
>> world's best and brightest in the field, creating opportunities for 
>> Conference
>> attendees to learn about information security's most important issues 
>> through
>> interactions with peers, luminaries and emerging and established 
>> companies.
>> http://p.sf.net/sfu/rsaconf-dev2dev
>>
>>
>>
>> _______________________________________________
>> rpy-list mailing list
>> rpy-list@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rpy-list
>


