Thanks Rui!

Does anybody have ideas for filling in the missing values _while_ binding
the data frames, instead of afterwards?
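
A rough sketch of what filling while binding could look like, in base R only.
The name rbind_fill_value and its defaults are made up for illustration; this
is not a function from reshape or plyr. It assumes exactly two data frames
whose non-numeric columns are factors, and it only fills columns that are
absent from one of the frames (NAs already present in the data are left alone).

```r
# Hypothetical helper, not part of reshape/plyr: bind two data frames and
# fill columns missing from one of them with a sentinel instead of NA.
rbind_fill_value <- function(a, b, fill_factor = "NOT_REALIZED",
                             fill_num = 99999) {
  all_cols <- union(names(a), names(b))
  # Add every column that df lacks, typed after the column in `other`.
  pad <- function(df, other) {
    for (nm in setdiff(all_cols, names(df))) {
      df[[nm]] <- if (is.numeric(other[[nm]])) {
        rep(fill_num, nrow(df))
      } else {
        factor(rep(fill_factor, nrow(df)))
      }
    }
    df[all_cols]  # same column order in both frames
  }
  # rbind() on data frames merges the factor levels of both inputs.
  rbind(pad(a, b), pad(b, a))
}

a <- data.frame(trg_type = "Scientists", stringsAsFactors = TRUE)
b <- data.frame(trg_type = "of", child_type_1 = "used",
                stringsAsFactors = TRUE)
res <- rbind_fill_value(a, b)
str(res)
```

Because the sentinel is written before rbind() runs, no NA is ever introduced
for the missing columns, so no second pass over the result is needed.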

Ingmar

2012/8/22 Rui Barradas <ruipbarra...@sapo.pt>

> Hello,
>
> Your function doesn't seem to be very difficult to generalize.
>
> # stringsAsFactors=TRUE is needed on R >= 4.0.0, where the default is FALSE
> d <- read.table(text="
>    trg_type child_type_1
> 1 Scientists NA
> 2        of         used
> ", header=TRUE, stringsAsFactors=TRUE)
> str(d)
>
> subs_na <- function(tok, na_factor_level = "NOT_REALIZED", na_num = 99999)
> {
>     ifac <- which(sapply(tok, is.factor))
>     inum <- which(sapply(tok, is.numeric))
>     for(i in ifac) {
>         levels(tok[, i]) <- c(levels(tok[, i]), na_factor_level)
>         tok[is.na(tok[, i]), i] <- as.factor(na_factor_level)
>     }
>     for(i in inum)
>         tok[is.na(tok[, i]), i] <- na_num
>     tok
> }
>
> r1 <- substitute_na(d)
> r2 <- subs_na(d)
> str(r1)
> str(r2)
> identical(r1, r2)  # TRUE
>
> You could use the same coding for characters, Dates, etc.
>
> Hope this helps,
>
> Rui Barradas
>
> On 22-08-2012 20:16, Ingmar Schuster wrote:
>
>> Hi,
>>
>> I have a data set with variables that are _not_ missing at random. Now I
>> use a package for learning a Bayesian Network which won't accept NA as a
>> value. From a database I query data.frames with k, k+n, k+2n, ... variables
>> (there are always at least k variables as leftmost columns). Using
>> rbind.fill from the reshape package on two data frames I would get a data
>> frame like
>>
>>     trg_type child_type_1
>> 1 Scientists NA
>> 2        of         used
>>
>> Now to get rid of NA values I use the following function, which works for
>> data frames with only factor values:
>>
>>    substitute_na <- function(tok, na_factor_level = "NOT_REALIZED") {
>>      for (i in seq_along(tok)) {
>>        levels(tok[, i]) <- c(levels(tok[, i]), na_factor_level)
>>      }
>>      tok[is.na(tok)] <- as.factor(na_factor_level)
>>      return(tok)
>>    }
>>
>> Is there a better/faster way to do it? It would also be great to be able
>> to distinguish factor columns from numeric columns and use a special
>> numeric value there. The current version of rbind.fill makes no direct
>> reference to the fill value, so I cannot simply change its implementation
>> for my purpose.
>>
>>
>> Thanks!
>>
>> Ingmar
>>
>>
>
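
Following Rui's remark that the same coding could be used for characters,
Dates, etc., here is a minimal sketch of such an extension. The function name
subs_na_ext and the sentinel defaults na_chr and na_date are assumptions made
for illustration; they do not come from the thread or from any package.

```r
# Hypothetical extension of subs_na(): also handles character and Date
# columns. All sentinel defaults are placeholders chosen for illustration.
subs_na_ext <- function(tok, na_factor_level = "NOT_REALIZED",
                        na_num = 99999, na_chr = "NOT_REALIZED",
                        na_date = as.Date("1900-01-01")) {
  for (i in seq_along(tok)) {
    col <- tok[[i]]
    miss <- is.na(col)
    if (is.factor(col)) {
      # union() avoids adding the level twice if it is already present
      levels(col) <- union(levels(col), na_factor_level)
      col[miss] <- na_factor_level
    } else if (is.numeric(col)) {
      col[miss] <- na_num
    } else if (is.character(col)) {
      col[miss] <- na_chr
    } else if (inherits(col, "Date")) {
      col[miss] <- na_date
    }
    tok[[i]] <- col
  }
  tok
}

d2 <- data.frame(f  = factor(c("a", NA)),
                 n  = c(1, NA),
                 s  = c("x", NA),
                 dt = as.Date(c("2012-08-22", NA)),
                 stringsAsFactors = FALSE)
r3 <- subs_na_ext(d2)
str(r3)
```

Columns of any other class are returned unchanged, so further types can be
added as extra branches.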


-- 
Ingmar Schuster
Natural Language Processing Group
Department of Computer Science
University of Leipzig
Johannisgasse 26
04103 Leipzig, Germany

Tel. +49 341 9732205

http://asv.informatik.uni-leipzig.de/en/staff/Ingmar_Schuster


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.