[R] help for an R automated procedures

Gustavo Vieira Thu, 28 Feb 2013 02:32:22 -0800

Dear list, I would like to post the following question to r-help on Nabble
(thanks in advance for your attention, Gustavo Vieira):
Hi there.
I have a data set on hand with 5,220 cases and I'd like to automate some
procedures (but I have almost no programming knowledge). The data set has
some continuous variables that are grouped by two others: the species name
and the locality where the specimens were collected. So each sample is
defined as 'one species at one locality'. For every sample I'd like to do
multiple imputation (when applicable), test for outliers, standardize the
variables, correct some species abundances, save the individual sample to a
tab-delimited text file, and finally assemble all the individual samples
(now without NAs and outliers, with corrected abundances and the new
standardized variables) into a single data set. That task is pretty complex
for me, since my programming knowledge is poor (and my free time to learn R
programming is scarce). Could someone help me with that? (I could provide
the data set and the script I have written to do it sample by sample
[ouch!].)
Thanks in advance for your attention and all the best (g...@hotmail.com).
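
The workflow described above is a split-apply-combine pattern. What follows is only a minimal sketch of that pattern in base R, not a tested solution for the real data: the toy 'morfo' data frame and the helper process_sample() are hypothetical, and only the column names (spps, taxoc, abund, v1, v2) are taken from the script quoted below.

```r
# Toy stand-in for 'morfo'; the values are invented for illustration.
set.seed(1)
morfo <- data.frame(
  id    = 1:12,
  spps  = rep(c("sp1", "sp2"), each = 6),
  taxoc = rep(c("loc1", "loc2"), times = 6),
  abund = 0,
  v1    = rnorm(12, mean = 10),
  v2    = rnorm(12, mean = 20)
)

# Hypothetical helper: everything done to one sample goes in here.
process_sample <- function(d) {
  d$abund <- nrow(d)   # abundance = number of cases in the sample
  # imputation, outlier checks and standardization would be added here
  d
}

# One data frame per species-by-locality combination;
# drop = TRUE skips combinations that never occur.
samples   <- split(morfo, list(morfo$spps, morfo$taxoc), drop = TRUE)
processed <- lapply(samples, process_sample)

# Stack the processed samples back into a single data set.
morfo.new <- do.call(rbind, processed)
rownames(morfo.new) <- NULL
```

Writing the per-sample steps once, inside a function, is what removes the need to copy the script for every species and locality.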


[Below is an example of the code I've used to accomplish my goals, sample
by sample, which illustrates the complexity of the procedures:

#Subsetting the data (v1-v11 are continuous "predictors"): species 1 at
#locality 1 (all data [5520 cases] are in a data frame called 'morfo')
#Getting only the observations of sp1 (species 1) at loc1 (locality 1)
sp1.loc1<-morfo[which(spps=="sp1" & taxoc=="loc1"),]
#abundance -> 19 cases, but the abundance variable ('abund') says 18
str(sp1.loc1)
sp1.loc1$abund<-rep(19,19)
#missing values present; abundance for sp1 at loc1 corrected
summary(sp1.loc1)
attach(sp1.loc1)

#Dealing with NAs:
install.packages("mice", dependencies = TRUE) #ok (R at: home & work)
library(mice)
imp <- mice(sp1.loc1)
sp1.loc1 <- complete(imp)
summary(sp1.loc1) #just checking... No more NAs!
attach(sp1.loc1)


#Detecting univariate outliers
z.crit <- qnorm(0.9999)

subset(sp1.loc1, select = id, subset = abs(scale(v1)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)
morfo[47,6]
sort(v2[taxoc=="loc1"]) #the observation nearest to 32.00 is 25.10
sp1.loc1[,6][sp1.loc1[,6]==32.00]<-25.10
#Rechecking for outliers (now it's ok)
subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v3)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v4)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v5)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v6)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v7)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v8)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v9)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v10)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v11)) > z.crit)
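
# The eleven near-identical checks above could be collapsed into one loop.
# A minimal sketch on an invented toy sample (only three variables, and the
# extreme value in v2 is planted on purpose); with the real data 'vars'
# would presumably be paste0("v", 1:11).

```r
# Toy sample standing in for sp1.loc1 (values invented for illustration).
set.seed(2)
sp1.loc1 <- data.frame(id = 1:30, v1 = rnorm(30), v2 = rnorm(30), v3 = rnorm(30))
sp1.loc1$v2[7] <- 50   # plant one extreme value

z.crit <- qnorm(0.9999)
vars   <- c("v1", "v2", "v3")

# For each variable, return the ids whose absolute z-score exceeds z.crit.
outliers <- lapply(sp1.loc1[vars], function(x) {
  sp1.loc1$id[which(abs(scale(x)[, 1]) > z.crit)]
})
outliers
```

# The result is a named list with one element per variable, so variables
# without flagged cases show up as empty vectors instead of empty printouts.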

#Standardizing variables
v1.std<-with(sp1.loc1,(scale(v1)))
v1.pad<-v1.std[,1]

v2.std<-with(sp1.loc1,(scale(v2)))
v2.pad<-v2.std[,1]

v3.std<-with(sp1.loc1,(scale(v3)))
v3.pad<-v3.std[,1]

v4.std<-with(sp1.loc1,(scale(v4)))
v4.pad<-v4.std[,1]

v5.std<-with(sp1.loc1,(scale(v5)))
v5.pad<-v5.std[,1]

v6.std<-with(sp1.loc1,(scale(v6)))
v6.pad<-v6.std[,1]

v7.std<-with(sp1.loc1,(scale(v7)))
v7.pad<-v7.std[,1]

v8.std<-with(sp1.loc1,(scale(v8)))
v8.pad<-v8.std[,1]

v9.std<-with(sp1.loc1,(scale(v9)))
v9.pad<-v9.std[,1]

v10.std<-with(sp1.loc1,(scale(v10)))
v10.pad<-v10.std[,1]

v11.std<-with(sp1.loc1,(scale(v11)))
v11.pad<-v11.std[,1]


#Joining the new standardized variables to the sp1.loc1 data set

sp1.loc1<-data.frame(sp1.loc1,v1.pad,v2.pad,v3.pad,v4.pad,v5.pad,v6.pad,v7.pad,v8.pad,v9.pad,v10.pad,v11.pad)
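
# Standardizing and joining can also be done in a single step, because
# scale() accepts several columns at once. A minimal sketch on an invented
# toy sample; the '.pad' suffix follows the naming used above, everything
# else is illustrative.

```r
# Toy sample standing in for sp1.loc1 (values invented for illustration).
set.seed(3)
sp1.loc1 <- data.frame(id = 1:10, v1 = rnorm(10, 5), v2 = rnorm(10, 50, 10))
vars <- c("v1", "v2")   # with the real data: paste0("v", 1:11)

# scale() returns a matrix with one column of z-scores per input column.
std <- as.data.frame(scale(sp1.loc1[vars]))
names(std) <- paste0(vars, ".pad")

# Append the standardized columns to the sample.
sp1.loc1 <- cbind(sp1.loc1, std)
```

# Deriving the new column names from 'vars' avoids the copy-and-paste slips
# that are easy to make when each variable is handled on its own line.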

attach(sp1.loc1)

write.table(sp1.loc1,"sp1.at.loc1.txt",quote=FALSE,row.names=FALSE,
col.names=TRUE,sep="\t")

detach(sp1.loc1)
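
# If the processed samples are kept in a named list, the files can be
# written in a loop, with each file name built from the list name. A minimal
# sketch with two invented samples; 'processed' and 'out.dir' are
# hypothetical names, and tempdir() is used only so the example is safe to
# run anywhere.

```r
# Two toy processed samples (invented values).
processed <- list(
  sp1.loc1 = data.frame(id = 1:3, spps = "sp1", taxoc = "loc1",
                        v1 = c(1.2, 3.4, 2.2)),
  sp2.loc1 = data.frame(id = 4:5, spps = "sp2", taxoc = "loc1",
                        v1 = c(0.7, 1.9))
)

out.dir <- tempdir()   # assumption: replace with the folder you actually want

# One tab-delimited file per sample, named after the list element.
for (nm in names(processed)) {
  write.table(processed[[nm]], file.path(out.dir, paste0(nm, ".txt")),
              quote = FALSE, row.names = FALSE, col.names = TRUE, sep = "\t")
}

# Read one file back to check that it round-trips.
check <- read.delim(file.path(out.dir, "sp1.loc1.txt"))
```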

#Subsetting the data (v1-v11 are continuous "predictors"): species 2 at
#locality 1...]
--

"Time will tell"
--

                                          

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
