Hi,

Thanks for the reply.

I should explain more; I'll be as brief as I can. The code for generating the dataframe is below.

What I'm doing is individual-based modelling of a pathogen and its host. The way I've thought of doing this is with two dataframes: one for the pathogen and its genes and effectors, and one for the host and its resistance genes. During the processes of the model these things can be pulled out of the dataframes and operated on, before being stored again in the dataframes.

I have generated my dataset as below; it was suggested by "arun" in a reply to a previous email I wrote with the subject "Trouble with data structures".

Path_Number <- 0500 # The number of pathogen individuals in the population.

# Create the initial dataframe, with initial number of effectors and initial number of expressed effectors.
inds <- data.frame(ID=formatC(0001:Path_Number,width=4,flag=0),No_of_Effectors="",No_Expressed_Effectors="")

# Generate the number of effector genes each individual has.
inds$No_of_Effectors <- round(as.numeric(lapply(1:nrow(inds),function(x) runif(1, min=1, max=550))))

# Generate the actual effector genes (indexing with [x] so each individual gets its own count).
Effectors <- lapply(1:nrow(inds),function(x) sample(1:10000,inds$No_of_Effectors[x],replace=TRUE))

# Add them to the dataframe.
inds <- data.frame(inds,Effectors=as.character(Effectors))

What I'm trying to do is, for each individual, extract the values in the effector genes cell to an object. As far as I can tell,

Ind_Genes<-strsplit(as.character(inds2[1,4]),",")

will do this for the first individual, or I can get all of them with

All_Genes<-strsplit(as.character(inds2[,4]),",")

What I then want to do is, according to a generated number for each individual...

round(as.numeric(lapply(1:nrow(inds2),function(x) runif(1, min=10, max=50))))

... sample that many genes from Ind_Genes and make a new object called Expressed_Genes, which can be stored in the dataframe.
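
For reference, the step described above can be sketched as follows. This is only a minimal sketch under some assumptions that are not in my code above: the genes are kept as a list column (via I()) rather than as a comma-separated string, the population is cut down to a hypothetical 5 individuals, and the names n_ind, genes, n_expr and expressed are made up for illustration. The number of expressed genes is capped at the number of genes an individual actually has, since sampling without replacement cannot return more values than exist.

```r
set.seed(1)  # only so the sketch is reproducible

n_ind <- 5   # hypothetical small population, for illustration only

# One integer vector of gene IDs per individual, kept as a list.
n_genes <- sample(1:550, n_ind, replace = TRUE)
genes <- lapply(n_genes, function(n) sample(1:10000, n, replace = TRUE))

# Number of genes to express per individual (10 to 50), capped at the
# number of genes that individual has.
n_expr <- pmin(sample(10:50, n_ind, replace = TRUE), n_genes)

# Pair each individual's gene vector with its own sample size.
# Indexing with sample.int() avoids sample()'s special case for a
# single-number input.
expressed <- mapply(function(g, k) g[sample.int(length(g), k)],
                    genes, n_expr, SIMPLIFY = FALSE)

# Store both as list columns so they can be pulled out and put back.
inds <- data.frame(ID = formatC(1:n_ind, width = 4, flag = "0"))
inds$Effectors <- I(genes)
inds$Expressed <- I(expressed)

# Pull one individual's expressed genes out again.
inds$Expressed[[1]]
```

With list columns, inds$Effectors[[i]] returns a plain vector for individual i, so nothing has to be parsed back out of a string with strsplit().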

My attempt at doing this is:

Expressed_Genes<-lapply(First_Ind_Genes,function(x) sample(First_Ind_Genes,round(runif(1, min=10, max=50)),replace=F))

to get the expressed genes for each individual (this might be part of a for loop), or applied to the whole list of every individual's genes like so:

Expressed_Genes<-lapply(All_Genes,function(x) sample(All_Genes,3,replace=F))

What usually happens, however, is that I get errors:

Error in sample(First_Ind_Genes, round(runif(1, min = 10, max = 50)), :
  cannot take a sample larger than the population when 'replace = FALSE'

or, if I allow replacement (which I don't want), rather than sampling 3 values it samples all the values, 3 times.

I do not know of another way to extract these gene values and then do things with them. For my model it is essential that I can pull the genes or expressed genes out of the dataframe, work functions or operations on them, and then store them back again. For example, if an individual turns on a gene that was not on before, then the genes would need to be pulled from the dataframe, as would the expressed genes, a random value from the genes object added to the expressed genes object, and then both put back. A similar thing would happen when I wanted to mutate the genes.

Hopefully I've explained better. I have been thinking of changing my approach, from dataframes of pathogen and host from which values are pulled into objects and operated on, to multi-dimensional ragged arrays. I've been told this may be simpler for me. Every row is an effector gene, and there can be columns for the gene value, expression state (1 or 0/T or F), fitness contribution etc. This 2D layout of rows and columns is then repeated in the z dimension of the array for each individual.
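
One possible way to write down the ragged layout just described is a list with one data frame per individual, since a true R array needs every slice to have the same dimensions. The following is only a rough sketch: the helper names (make_individual, express_one, mutate_one, duplicate_row, delete_row) are hypothetical, and the fitness values are random placeholders rather than anything from my model.

```r
set.seed(2)  # only so the sketch is reproducible

# Hypothetical constructor: one data frame per individual, one row per effector.
make_individual <- function(n_genes) {
  data.frame(gene      = sample(1:10000, n_genes, replace = TRUE),
             expressed = FALSE,
             fitness   = runif(n_genes))  # placeholder values
}

# Individuals differ in row count, so a list stands in for the "z" dimension.
pathogens <- lapply(sample(1:550, 3, replace = TRUE), make_individual)
n_before <- nrow(pathogens[[1]])

# Switch on one gene that is currently off (assumes at least one is off).
express_one <- function(ind) {
  off <- which(!ind$expressed)
  ind$expressed[off[sample.int(length(off), 1)]] <- TRUE
  ind
}

# Point mutation: give one randomly chosen gene a new value.
mutate_one <- function(ind) {
  ind$gene[sample.int(nrow(ind), 1)] <- sample(1:10000, 1)
  ind
}

# Duplication appends a copy of row i; deletion drops row i.
duplicate_row <- function(ind, i) rbind(ind, ind[i, ])
delete_row    <- function(ind, i) ind[-i, , drop = FALSE]

# Pull an individual out, operate on it, and store it back.
ind <- pathogens[[1]]
ind <- express_one(ind)
ind <- mutate_one(ind)
ind <- duplicate_row(ind, 1)
pathogens[[1]] <- ind

sapply(pathogens, nrow)  # row counts differ between individuals
```

Each function takes an individual and returns the modified individual, so the pull out, operate, store back cycle is always the same three steps regardless of the operation.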

The array is ragged in the sense that each individual (each slice through the array in the z direction) would have a different number of rows, i.e. a different number of effectors. I can then simulate mutations by changing the gene values, cause duplications by adding rows of duplicated genes, or even cause deletions by removing rows. Once I have this set up for the pathogen I may make a similar array for the host plants; then, perhaps with indexing or some such thing, I can write functions to do the interactions and immunology and such.

Best,

Ben W.

UEA (ENV) & The Sainsbury Laboratory.

________________________________
From: Jean V Adams [jvad...@usgs.gov]
Sent: 07 November 2012 21:12
To: Benjamin Ward (ENV)
Cc: r-help@r-project.org
Subject: Re: [R] sample from list

Ben,

Can you provide a small example data set for inds so that we can run the code you have supplied? It's difficult for me to follow what you've got and where you're trying to go.

Jean

"Benjamin Ward (ENV)" <b.w...@uea.ac.uk> wrote on 11/06/2012 03:29:52 PM:

>
> Hi all,
>
> I have a list of genes present in 500 individuals, the individuals
> are the elements:
> Genes <- lapply(1:nrow(inds),function(x) sample(1:10000,inds$No_of_Genes,replace=TRUE))
>
> (This was later written to a dataframe as well as kept as the list
> object: inds2 <- data.frame(inds,Genes=I(Genes)))
>
> I also have a vector of how many of those genes are expressed in
> the individuals, this can also kept as a vector object or written to
> a data frame:
>
> inds2$No_Expressed_Genes <- round(as.numeric(lapply(1:nrow(inds2),function(x) runif(1, min=10, max=50))))
>
> I want to create another list which consists of each individuals
> expressed genes - essentially a subset of the total genes the
> individuals have in the "Genes" list, by sampling from the Genes
> list for each individual, the number of genes (values)in the
> Num_Expressed_Genes vector. i.e. if Num_Expressed_Genes = 3 then
> sample 3 values from the element in the Genes list.
> I can't quite figure it out though. So far I have the following:
>
> #Defines The number of expressed genes for each individual in my data frame.
> Num_Expressed_Genes <- round(as.numeric(lapply(1:nrow(inds2),function(x) runif(1, min=10, max=50))))
>
>
> #My attempts to apply the sample function to every element (individual organism) of the "Genes" list , to subset the genes expressed.
> Expressed_Genes <- lapply(1:nrow(inds),function(x) sample(Genes,Num_Expressed_Genes, replace=FALSE))
> Expressed_Genes <- lapply(Genes,function(x) sample(Genes,Num_Expressed_Genes, replace=FALSE))
>
> So far though I'm getting results like this:
>
> [[49]]
> [[49]][[1]]
> [1] 3540 27 5344 7278 9758 8077 ............................... [217]
>
>
> [[49]][[2]]
> [1] 740 3362 8588 8574 4371 1447 .............................. [340]
>
>
> When what I need is more:
>
> [[49]]
> [1] 6070 1106 6275
> In a case where Num_Expressed_Genes = 3 and the values are taken
> from the much larger set of values for element (individual) 49 in my
> Genes list.
>
> I'm not sure what I'm doing wrong but it seems what is happening is
> instead of picking out a few values according to the
> Num_Expressed_Genes vector - as an example say 3 again, It's drawing
> a large number of values, if not all of them, from elements in the
> list, 3 times.
>
> Any help is greatly appreciated,
> I've thought of using loops to achieve the same task, but I'm trying
> to get my individual/genes/expressed genes data.frame set up for my
> individual based model and get it running using vectors and as
> little loops as possible.
>
> Thanks,
> Ben.

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.