Hi,

If you are not satisfied with the introductory docs that are distributed with the R installation, you could consider Introductory Statistics with R by P. Dalgaard for beginners, and maybe Modern Applied Statistics with S by W. N. Venables and B. D. Ripley, which is a bit outdated and applies a little more to S but is still worth reading.

Regards
Petr

r-help-boun...@r-project.org wrote on 27.04.2010 10:05:25:

> Thanks dennis.
>
> Is there a book on R u could recommend.
>
> On Mon, Apr 26, 2010 at 7:12 PM, Dennis Murphy <djmu...@gmail.com> wrote:
>
> > Hi:
> >
> > On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <mosherste...@gmail.com> wrote:
> >
> > > Thanks,
> > >
> > > I was trying to stick with the base package and figure out how the base
> > > routines worked.
> >
> > If you want to use base functions, then here's a solution with aggregate
> > (the Id column was removed first):
> >
> > > with(DF, aggregate(DF[, -2], list(Year = Year), FUN = mean, na.rm = TRUE))
> >   Year        D Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> > 1 1980 1.000000 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
> > 2 1981 0.500000 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
> > 3 1982 0.500000 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
> > 4 1983 0.500000 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
> > 5 1986 0.000000 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
> > 6 1987 1.333333 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
> > 7 1988 1.333333 238 246 249 246 244 213 212 224 232 238 232 230
> > 8 1989 1.333333 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
> >
> > The problem with tapply() is that the function has to be called recursively
> > on each column you want to summarize.
> > You could do it in a loop:
> >
> > > res <- matrix(NA, 8, 14)
> > > res[, 1] <- unique(DF$Year)
> > > res[, 2] <- with(DF, tapply(D, Year, mean, na.rm = TRUE))
> > > for(j in 3:14) res[, j] <- tapply(DF[, j], DF$Year, mean, na.rm = TRUE)
> > > res
> >      [,1]     [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
> > [1,] 1980 1.000000  NaN  NaN  NaN  NaN  NaN  212  203   209   228   237   NaN
> > [2,] 1981 0.500000  NaN  251  243  246  241  NaN  NaN   NaN   230   NaN   231
> > [3,] 1982 0.500000  236  237  242  240  242  205  199   NaN   NaN   NaN   NaN
> > [4,] 1983 0.500000  NaN  247  NaN  NaN  NaN  NaN  NaN   205   NaN   225   NaN
> > [5,] 1986 0.000000  NaN  NaN  NaN  240  NaN  NaN  NaN   213   NaN   NaN   NaN
> > [6,] 1987 1.333333  241  NaN  NaN  NaN  NaN  218  NaN   NaN   235   243   240
> > [7,] 1988 1.333333  238  246  249  246  244  213  212   224   232   238   232
> > [8,] 1989 1.333333  232  233  238  239  231  NaN  215   NaN   NaN   NaN   NaN
> >      [,14]
> > [1,]   NaN
> > [2,]   245
> > [3,]   NaN
> > [4,]   NaN
> > [5,]   NaN
> > [6,]   NaN
> > [7,]   230
> > [8,]   238
> >
> > but it's not the most efficient way to do things.
> >
> > Essentially, this approach conforms to the 'split-apply-combine' strategy,
> > which is more efficiently implemented in functions like aggregate() or in
> > packages such as doBy, plyr, reshape and data.table, some of which were
> > mentioned earlier by Petr Pikal.
> >
> > HTH,
> > Dennis
> >
> > On Mon, Apr 26, 2010 at 8:01 AM, steven mosher <mosherste...@gmail.com> wrote:
> >
> >> Thanks,
> >>
> >> I was trying to stick with the base package and figure out how the base
> >> routines worked. I looked at plyr and it was very appealing. I guess I'll
> >> give in and use it
> >>
> >> On Mon, Apr 26, 2010 at 2:33 AM, Dennis Murphy <djmu...@gmail.com> wrote:
> >>
> >>> Hi:
> >>>
> >>> Use of ddply() in the plyr package appears to work.
> >>>
> >>> library(plyr)
> >>> ddply(DF[, -1], .(Year), colwise(mean), na.rm = TRUE)
> >>>
> >>>          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> >>> 1 1.000000 1980 NaN NaN NaN NaN NaN 212 203 209 228 237 NaN NaN
> >>> 2 0.500000 1981 NaN 251 243 246 241 NaN NaN NaN 230 NaN 231 245
> >>> 3 0.500000 1982 236 237 242 240 242 205 199 NaN NaN NaN NaN NaN
> >>> 4 0.500000 1983 NaN 247 NaN NaN NaN NaN NaN 205 NaN 225 NaN NaN
> >>> 5 0.000000 1986 NaN NaN NaN 240 NaN NaN NaN 213 NaN NaN NaN NaN
> >>> 6 1.333333 1987 241 NaN NaN NaN NaN 218 NaN NaN 235 243 240 NaN
> >>> 7 1.333333 1988 238 246 249 246 244 213 212 224 232 238 232 230
> >>> 8 1.333333 1989 232 233 238 239 231 NaN 215 NaN NaN NaN NaN 238
> >>>
> >>> Replace the NaNs with NAs and that should do it....
> >>>
> >>> HTH,
> >>> Dennis
> >>>
> >>> On Sun, Apr 25, 2010 at 9:52 PM, steven mosher <mosherste...@gmail.com> wrote:
> >>>
> >>>> Having some difficulties with understanding how tapply works and getting
> >>>> return values I expect
> >>>>
> >>>> Data: dataframe. DF DF$Id $D $Year.......
> >>>>
> >>>> Id          D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> >>>> 11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >>>> 11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >>>> 11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
> >>>> 11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >>>> 11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
> >>>> 11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
> >>>> 11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
> >>>> 11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >>>> 11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
> >>>> 11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >>>> 11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >>>> 11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >>>> 11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >>>> 11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >>>>
> >>>> and the result should be a dataframe of column means by year with the
> >>>> variable D dropped (or kept, doesn't matter)
> >>>>
> >>>> 11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >>>> 11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >>>> 11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >>>> 11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
> >>>> 11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >>>> 11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >>>> 11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
> >>>> 11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >>>>
> >>>> It would seem that tapply should work
> >>>> result<-tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)
> >>>>
> >>>> but I get errors about the length of arguments, which
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> R-help@r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
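
A note on the 'split-apply-combine' idea mentioned in the thread, since it may help to see the three steps spelled out in base R. tapply() expects its first argument to have one element per element of the grouping vector, and the length of a data frame is its number of columns, which is likely why the original call on DF[, 1:15] complained about argument lengths. The sketch below is only an illustration: the small data frame is made up (it is not the poster's data), and the names pieces, means and res are placeholders.

```r
# Small made-up data frame (hypothetical values, not the poster's data).
DF <- data.frame(
  Id   = rep(11264402000, 5),
  D    = c(1, 0, 1, 0, 1),
  Year = c(1980, 1981, 1981, 1982, 1982),
  Jan  = c(NA, NA, NA, 236, 236),
  Feb  = c(NA, NA, 251, 237, NA)
)

# split: one data frame per year (Id column dropped first)
pieces <- split(DF[, -1], DF$Year)

# apply: column means within each piece, ignoring NAs
means <- lapply(pieces, colMeans, na.rm = TRUE)

# combine: stack the named vectors into a matrix, one row per year
res <- do.call(rbind, means)
res
```

Columns that are entirely NA within a year come back as NaN, as in the outputs quoted above. aggregate() and ddply() do the same three steps internally and return a data frame instead of a matrix.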