Hello, I would like to post the following question to R-help on Nabble (thanks in advance for your attention, Gustavo Vieira):

Hi there.

I have a data set at hand with 5,220 cases and I'd like to automate some procedures (but I have almost no programming knowledge). The data set has some continuous variables that are grouped by two others: the species name and the locality where it was collected. So each sample is defined as 'one species at one locality'.

For every sample I'd like to do multiple imputation (when applicable), test for the presence of outliers, standardize the variables, correct some species abundances, save the individual sample to a tab-delimited text file, and then assemble all the individual samples (now without NAs and outliers, with corrected abundances and the new standardized variables) into a single data set.

That task is pretty complex for me, since my programming knowledge is poor (and my free time to learn R programming is scarce). Could someone help me with that? I could provide the data set and the script I have written to do it, sample by sample [ouch!].

Thanks in advance for your attention and all the best (g...@hotmail.com).

[Below is an example of the code I've used to accomplish my goals, sample by sample, which illustrates the complexity of the procedures:

#Subsetting the data (v1-v11 are continuous "predictors"): species 1 at locality 1
#(all data [5520 cases] are in a data frame called 'morfo')
sp1.loc1<-morfo[which(spps=="sp1" & taxoc=="loc1"),] #getting only the observations of sp1 (species 1) at loc1 (locality 1)
str(sp1.loc1) #abundance -> 19 cases and the abundance variable ('abund') says 18
sp1.loc1$abund<-rep(19,19)
summary(sp1.loc1) #missing values present; abundance for sp1 at loc1 corrected
attach(sp1.loc1)

#Dealing with NAs:
install.packages("mice", dependencies = TRUE) #ok (R at: home & work)
library(mice)
imp <- mice(sp1.loc1)
sp1.loc1 <- complete(imp)
summary(sp1.loc1) #just checking... No more NAs!
attach(sp1.loc1)

#Detecting univariate outliers
z.crit <- qnorm(0.9999)
subset(sp1.loc1, select = id, subset = abs(scale(v1)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)
morfo[47,6]
sort(v2[taxoc=="loc1"]) #the nearest observation close to 32.00 is 25.10
sp1.loc1[,6][sp1.loc1[,6]==32.00]<-25.10
subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit) #Rechecking for outliers (now, it's ok)
subset(sp1.loc1, select = id, subset = abs(scale(v3)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v4)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v5)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v6)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v7)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v8)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v9)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v10)) > z.crit)
subset(sp1.loc1, select = id, subset = abs(scale(v11)) > z.crit)

#Standardizing variables
v1.std<-with(sp1.loc1,(scale(v1)))
v1.pad<-v1.std[,1]
v2.std<-with(sp1.loc1,(scale(v2)))
v2.pad<-v2.std[,1]
v3.std<-with(sp1.loc1,(scale(v3)))
v3.pad<-v3.std[,1]
v4.std<-with(sp1.loc1,(scale(v4)))
v4.pad<-v4.std[,1]
v5.std<-with(sp1.loc1,(scale(v5)))
v5.pad<-v5.std[,1]
v6.std<-with(sp1.loc1,(scale(v6)))
v6.pad<-v6.std[,1]
v7.std<-with(sp1.loc1,(scale(v7)))
v7.pad<-v7.std[,1]
v8.std<-with(sp1.loc1,(scale(v8)))
v8.pad<-v8.std[,1]
v9.std<-with(sp1.loc1,(scale(v9)))
v9.pad<-v9.std[,1]
v10.std<-with(sp1.loc1,(scale(v10)))
v10.pad<-v10.std[,1]
v11.std<-with(sp1.loc1,(scale(v11)))
v11.pad<-v11.std[,1] #was v1.std[,1], which copied v1 into v11.pad

#Joining the new standardized variables to the sp1.loc1 data set
sp1.loc1<-data.frame(sp1.loc1,v1.pad,v2.pad,v3.pad,v4.pad,v5.pad,v6.pad,v7.pad,v8.pad,v9.pad,v10.pad,v11.pad)
attach(sp1.loc1)
write.table(sp1.loc1,"sp1.at.loc1.txt",quote=FALSE,row.names=FALSE,col.names=TRUE,sep="\t")
detach(sp1.loc1)

#Subsetting the data (v1-v11 are continuous "predictors"): species 2 at locality 1...]

--
"Time will tell"
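
For illustration only, here is a rough sketch of how the per-sample steps in the script above might be wrapped in one function and applied to every species-by-locality combination with split() and lapply(). The helper name process_sample, its arguments, the tempdir() output folder and the toy data are all assumptions made for the example, not part of the original script. The sketch only reports possible outliers (the manual replacement by the nearest value is left to the analyst), and it calls mice only when NAs are present and the package is installed.

```r
# Hypothetical helper: process one species-by-locality sample.
# Column names (id, spps, taxoc, abund, v1, v2, ...) follow the script above.
process_sample <- function(d, vars, z.crit = qnorm(0.9999), out.dir = tempdir()) {
  # Correct the abundance to the number of cases actually present
  d$abund <- nrow(d)

  # Multiple imputation only when NAs are present and mice is installed
  if (anyNA(d[vars]) && requireNamespace("mice", quietly = TRUE)) {
    imp <- mice::mice(d, printFlag = FALSE)
    d <- mice::complete(imp)
  }

  # Standardize each variable; scale() returns a one-column matrix
  z <- as.data.frame(lapply(d[vars], function(x) scale(x)[, 1]))
  names(z) <- paste0(vars, ".pad")

  # Report (rather than silently replace) possible univariate outliers
  out <- which(abs(as.matrix(z)) > z.crit, arr.ind = TRUE)
  if (nrow(out) > 0) {
    message("Possible outliers in ", d$spps[1], " at ", d$taxoc[1],
            ": id ", paste(d$id[out[, "row"]], collapse = ", "))
  }

  # Join the standardized variables and save the sample as tab-delimited text
  d <- cbind(d, z)
  file <- file.path(out.dir, paste0(d$spps[1], ".at.", d$taxoc[1], ".txt"))
  write.table(d, file, quote = FALSE, row.names = FALSE, sep = "\t")
  d
}

# Toy data standing in for 'morfo' (made up for this example)
set.seed(1)
morfo <- data.frame(
  id    = 1:12,
  spps  = rep(c("sp1", "sp2"), each = 6),
  taxoc = rep(c("loc1", "loc2"), times = 6),
  abund = 99,
  v1    = rnorm(12, 20, 2),
  v2    = rnorm(12, 30, 3)
)
vars <- c("v1", "v2")

# One data frame per species-by-locality sample; empty combinations dropped
samples <- split(morfo, list(morfo$spps, morfo$taxoc), drop = TRUE)

# Process every sample, then assemble them into a single data set
cleaned <- do.call(rbind, lapply(samples, process_sample, vars = vars))
rownames(cleaned) <- NULL
```

With the real data, vars would presumably be paste0("v", 1:11) and out.dir a project folder. Building the column names with paste0() instead of typing each one also avoids copy-and-paste slips between variables.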

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.