On Wed, 17-Sep-2014 at 12:36AM -0300, walmes . wrote:

|> Hello R users,
|> 
|> I'm writing a brief tutorial on getting statistical measures by splitting
|> according to strata and over columns. When I used plyr::ddply I got an
|> unexpected result, with NA/NaN for nonexistent cells. Below is minimal
|> reproducible code with the result that I got. For comparison, the result of
|> aggregate is shown. Why this behaviour? What can I do to avoid it?
|> 
|> > require(plyr)
|> >
|> > hab <-
|> +     read.table("http://www.leg.ufpr.br/~walmes/data/ipea_habitacao.csv",
|> +                header=TRUE, sep=",", stringsAsFactors=FALSE, quote="",
|> +                encoding="utf-8")
|> >
|> > hab <- hab[,-ncol(hab)]
|> > names(hab) <- c("sig", "cod", "mun", "agua", "ener", "tel", "carro",
|> +                 "comp", "tot")
|> > hab <- transform(hab, sig=factor(sig))
|> > hab$siz <- cut(hab$tot, breaks=c(-Inf, 5000, Inf),
|> +                labels=c("P","G"))


However:
> summary(hab$tot)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    227    1328    2640    8264    5440 3039000      89 

Those 89 NAs in tot carry through the cut() call: cut() returns NA for
a missing input, so hab$siz is NA in those rows as well.

The simplest workaround is

> hab <- na.omit(hab)
> 
Then ddply will play nicely.
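
To see the effect without downloading the data, here is a small sketch on a made-up data frame. The name toy and its values are invented for illustration; only the cut() call mirrors the one quoted above. It also shows that na.omit() drops a row when any column is missing, so if that removes more than you want, subsetting on the grouping variable alone may be a gentler alternative.

```r
# Toy stand-in for hab; the values are invented for illustration only
toy <- data.frame(tot  = c(227, 2640, NA, 8264, NA, 5440),
                  agua = c(10, NA, 5, 7, 3, 8))

# Same cut() call as in the original post: NA input gives NA output
toy$siz <- cut(toy$tot, breaks = c(-Inf, 5000, Inf),
               labels = c("P", "G"))
table(toy$siz, useNA = "ifany")

# na.omit() removes every row with an NA in ANY column
toy_complete <- na.omit(toy)
nrow(toy_complete)

# Alternative: drop only the rows where the grouping variable is NA
toy_sub <- toy[!is.na(toy$siz), ]
nrow(toy_sub)
```

Here na.omit() keeps 3 rows while the subset keeps 4, because one row has a valid tot but a missing agua.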

HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_                 Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)                              ..... Eleanor Roosevelt
          
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
