On Wed, 17-Sep-2014 at 12:36AM -0300, walmes . wrote:

|> Hello R users,
|>
|> I'm writing a brief tutorial on getting statistical measures by splitting
|> according to strata and over columns. When I used plyr::ddply I got an
|> unexpected result, with NA/NaN for non-existing cells. Below is a minimal
|> reproducible code with the result that I got. For comparison, the result of
|> aggregate is shown. Why this behaviour? What can I do to avoid it?
|>
|> > require(plyr)
|> >
|> > hab <-
|> +     read.table("http://www.leg.ufpr.br/~walmes/data/ipea_habitacao.csv",
|> +                header=TRUE, sep=",", stringsAsFactors=FALSE, quote="",
|> +                encoding="utf-8")
|> >
|> > hab <- hab[,-ncol(hab)]
|> > names(hab) <- c("sig", "cod", "mun", "agua", "ener", "tel", "carro",
|> +                 "comp", "tot")
|> > hab <- transform(hab, sig=factor(sig))
|> > hab$siz <- cut(hab$tot, breaks=c(-Inf, 5000, Inf),
|> +                labels=c("P","G"))

However:

> summary(hab$tot)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    227    1328    2640    8264    5440 3039000      89

Those NAs interfere with the cut() statement: cut() returns NA for each
missing value, so siz ends up with missing entries as well. The simplest
workaround is

> hab <- na.omit(hab)

Then ddply will play nicely.

HTH

--
Patrick Connolly
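
To illustrate the point about NAs and cut() made in the reply above, here is a minimal sketch in base R. It does not use the real data: `toy` and its values are made up for illustration, and only the column names `sig`, `tot` and `siz` are taken from the original post. It also shows one thing worth keeping in mind, namely that na.omit() drops a row if any column is NA, so subsetting on `tot` alone may be the gentler option when other columns have their own missing values.

```r
# Toy stand-in for the real `hab` data frame (values are invented).
toy <- data.frame(sig = c("A", "A", "B", "B"),
                  tot = c(1200, NA, 7000, 300),
                  stringsAsFactors = FALSE)

# cut() passes NA through, so the derived factor gets an NA entry too.
toy$siz <- cut(toy$tot, breaks = c(-Inf, 5000, Inf), labels = c("P", "G"))
n_missing <- sum(is.na(toy$siz))

# na.omit() removes every row that has an NA in any column.
clean <- na.omit(toy)

# Alternative: remove only the rows where `tot` itself is missing.
clean_tot <- toy[!is.na(toy$tot), ]
```

With the rows containing NA removed, every remaining row falls into one of the two `siz` levels, which is the state in which the grouping step is expected to behave.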