Hi Sashi,

On Thu, Dec 9, 2010 at 9:44 AM, Sashi Challa <cha...@ohsu.edu> wrote:
> Hello All,
>
> I have a toy dataframe like this. It has 8 columns separated by tab.
>
> Name      SampleID Al1 Al2 X      Y      R      Th
> rs191191  A1       A   B   0.999  0.09   0.78   0.090
> abc928291 A1       B   J   0.3838 0.3839 0.028  0.888
> abcnab    A1       H   K   0.3939 0.939  0.3939 0.77
> rx82922   B1       J   K   0.3838 0.393  0.393  0.00
> rcn3939   B1       M   O   0.000  0.000  0.000  0.77
> tcn39399  B1       P   I   0.393  0.393  0.393  0.56
>
> Note that the SampleID is repeating. So I want to be able to split the
> dataset based on the SampleID and write the splitted dataset of every
> SampleID into a new file.
> I tried split followed by lapply to do this.
>
> infile <- read.csv("test.txt", sep="\t", as.is = TRUE, header = TRUE)
> infile.split <- split(infile, infile$SampleID)
> names(infile.split[1]) ## outputs "A1"

Correct, names() returns the top-level names of infile.split (i.e., the names
of the two data frames).

> ## now A1, B1 are two lists in infile.split as I understand it. Correct me if
> I am wrong.

It is a single, named list containing two data frames (A1 and B1). Data frames
are themselves built on lists, so in a sense it does contain two lists, but
that is not really the point.

>
> lapply(infile.split,function(x){
> filename <- names(x) #### here I expect to see A1 or B1, I
> didn't, I tried (names(x)[1]) and that gave me "Name" and not A1 or B1.

By using lapply() on the actual object, your function receives each element
of the list, but not that element's name. That is, x is in turn:

infile.split[[1]]
infile.split[[2]]

Trying names() on those:

names(infile.split[[1]])

should show what you are getting: the column names of the data frame, which is
why names(x)[1] gave you "Name".

> final_filename <- paste(filename,"toy_set.txt",sep="_")
> write.table(x, file = paste(path, final_filename,sep="/",
> row.names=FALSE, quote=FALSE,sep="\t")

FYI, you are missing a closing parenthesis after sep="/" to end the paste()
call. As written, row.names, quote, and the second sep are all passed to
paste() instead of write.table().

> } )
>
> In lapply I wanted to give a unique filename to all the split Sample Ids,
> i.e. name them here as A1_toy_set.txt, B1_toy_set_txt.
> How do I get those names, i.e. A1, B1 to a create a filename like above.

Try this:

## read your data from the clipboard (obviously you do not need to)
infile <- read.table("clipboard", header = TRUE)
split.infile <- split(infile, infile$SampleID) # split data
path <- "~" # generic path

## rather than applying to the data itself, instead apply to the names
lapply(names(split.infile), function(x) {
  write.table(x = split.infile[[x]],
              file = paste(path, paste(x, "toy_set.txt", sep = "_"), sep = "/"),
              row.names = FALSE, quote = FALSE, sep = "\t")
  cat("wrote", x, fill = TRUE)
})

It will return a list of two NULLs, but that is fine, because the files are
written as a side effect.

> When I write each of the element in the list obtained after split into a
> file, the column names would have names like A1.Name, A1.SampleID, A1.Al1,
> …..
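
Another way to get the names into the function, sketched here as an alternative rather than as the approach above, is Map(), which walks the split list and its names in parallel. This is a minimal, self-contained sketch under some assumptions: the toy data is typed in with only three of the original columns, tempdir() stands in for your real output directory, and file.path() is used in place of paste(..., sep = "/").

```r
# Toy data from the post (assumption: only Name, SampleID and X kept for brevity)
infile <- data.frame(
  Name     = c("rs191191", "abc928291", "abcnab", "rx82922", "rcn3939", "tcn39399"),
  SampleID = c("A1", "A1", "A1", "B1", "B1", "B1"),
  X        = c(0.999, 0.3838, 0.3939, 0.3838, 0.000, 0.393),
  stringsAsFactors = FALSE
)

split.infile <- split(infile, infile$SampleID)
path <- tempdir()  # assumption: a temporary directory stands in for your real path

# Map() passes each data frame (df) together with its SampleID (id)
written <- Map(function(df, id) {
  out <- file.path(path, paste(id, "toy_set.txt", sep = "_"))
  write.table(df, file = out, row.names = FALSE, quote = FALSE, sep = "\t")
  out  # return the file name instead of NULL
}, split.infile, names(split.infile))

unlist(written)
```

Because Map() keeps the names of its first argument, `written` is a list named A1 and B1 holding the paths of the files it wrote, and reading one back with read.delim(written[["A1"]]) shows the original column names without any prefix.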
> Can I get rid of "A1" in the column names within the lapply (other than
> reading in the file again and changing the names) ?

Can you post the output of str(yourdataframe)? I did not have that issue when
I copied and pasted the data from your email and used the code shown above.

Cheers,

Josh

>
> Thanks for your time,
>
> Regards
> Sashi
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
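
On the "A1.Name"-style column names: one possible cause, which is only a guess and was not confirmed in this thread, is handing the whole split list (rather than one of its elements) to write.table(). write.table() coerces input that is not a data frame or matrix into a data frame, and coercing a named list of data frames prefixes each column with the list name. A minimal sketch using the same reduced toy data (again an assumption, not the full 8-column file):

```r
# Toy data (assumption: only three of the original columns)
infile <- data.frame(
  Name     = c("rs191191", "abc928291", "abcnab", "rx82922", "rcn3939", "tcn39399"),
  SampleID = c("A1", "A1", "A1", "B1", "B1", "B1"),
  X        = c(0.999, 0.3838, 0.3939, 0.3838, 0.000, 0.393),
  stringsAsFactors = FALSE
)
infile.split <- split(infile, infile$SampleID)

# A single element keeps the original column names
names(infile.split[["A1"]])

# Coercing the whole list puts the groups side by side and prefixes every
# column with its list name (A1., B1.)
names(as.data.frame(infile.split))
```

If str() on the object you are writing shows a list rather than a data frame, that would point to this cause; writing infile.split[["A1"]] (or the x inside lapply()) avoids it.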