Re: [R] Randomizing a dataframe

Mark Na Fri, 10 Jul 2009 15:27:46 -0700

Greg's reply was just what I needed to get me going. I used his advice to
produce a program which does just what I need. In case it helps someone
else, my program is below.


Mark Na


library(reshape)
data<-read.csv("data.csv")
datam<-melt(data,id=("TREE")) #value = number of individuals

datam<-datam[rep(1:nrow(datam), datam$value),] #expand rows based on number of individuals
rownames(datam) <- 1:nrow(datam) #fix rownames
datam<-subset(datam,select=c("TREE","variable")) #drop columns
names(datam)<-c("TREE","SPECIES") #rename columns

datap<-data.frame(sample(datam$TREE),datam$SPECIES) #randomly permute TREE
names(datap)<-c("TREE","SPECIES") #rename columns

datat<-data.frame(table(datap)) #collapse rows based on number of individuals = Freq
datac<-cast(datat,TREE~SPECIES,value="Freq") #the final permuted table
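
For repeating the randomization many times, the same three steps (expand to
one row per insect, permute the tree labels, tabulate) can be wrapped in a
function. What follows is only a sketch under some assumptions: it uses base R
instead of reshape, it expects the counts as a matrix with trees in rows and
species in columns (e.g. as.matrix(data[,-1]) with TREE moved to the rownames),
and permute_counts and the example data are made up for illustration.

```r
# Hypothetical helper: one random table with the same row and column totals.
permute_counts <- function(counts) {
  counts <- as.matrix(counts)
  n <- as.vector(counts)
  # one entry per individual insect: its tree (row) and its species (column)
  tree    <- rep(as.vector(row(counts)), n)
  species <- rep(as.vector(col(counts)), n)
  # shuffle the tree labels; sample.int avoids the sample() surprise
  # when the vector has length 1
  tree <- tree[sample.int(length(tree))]
  # fixed factor levels keep trees or species whose permuted count is zero
  tree    <- factor(tree, levels = seq_len(nrow(counts)))
  species <- factor(species, levels = seq_len(ncol(counts)))
  out <- unclass(table(tree, species))
  dimnames(out) <- dimnames(counts)
  out
}

# made-up example data: 5 trees x 3 species
set.seed(1)
counts <- matrix(rpois(15, 3), nrow = 5,
                 dimnames = list(paste0("T", 1:5), paste0("SPEC", 1:3)))

# a list of 100 permuted tables
perms <- replicate(100, permute_counts(counts), simplify = FALSE)
```

Each element of perms should have the same rowSums and colSums as counts. Base
R's r2dtable() also generates random two-way tables with given marginals and
may be worth comparing against.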



On Wed, Jul 8, 2009 at 11:28 AM, Greg Snow <[email protected]> wrote:

> Here is one approach (there are others, some that are probably better, but
> this can get you started):
>
> 1. rearrange your data so that every insect is a single row with 2 columns:
> the tree id and the species (this new dataset will have as many rows as the
> sum of the values in the old dataset).  The reshape package may be able to
> help with this step (you may also need the rep function).
>
> 2. randomly permute one of the 2 columns (see ?sample).
>
> 3. restructure the permuted data back to the original (the table function
> may be enough here, the reshape package will give more options).
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [email protected]
> 801.408.8111
>
>
> > -----Original Message-----
> > From: [email protected] [mailto:r-help-boun...@r-
> > project.org] On Behalf Of Mark Na
> > Sent: Wednesday, July 08, 2009 9:54 AM
> > To: [email protected]
> > Subject: [R] Randomizing a dataframe
> >
> > Hi R-helpers,
> >
> > I have a dataframe (called data) with trees in rows (n=100) and insect
> > species (n=10) in columns. My tree IDs are in a column called TREE and
> > each species has a column labeled SPEC1, SPEC2, SPEC3, etc...
> >
> > I wish to randomize the values in my dataframe such that row and column
> > totals are held constant, i.e. in my randomized data each tree will have
> > the same number of individual insects as in the real data (constant row
> > totals) and each species will have the same number of individuals as in
> > the real data (constant column totals).
> >
> > I will eventually want to do this many times, but I would appreciate help
> > getting started with the randomization.
> >
> > Thank you, Mark Na
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

