Hi Jeff,

Does this work okay for you?

ST <- list(
  data.frame(a = 1:10),
  data.frame(b = c(NA, NA, NA, NA, NA, 6:10)),
  data.frame(c = c(1, NA, NA, 4:10)),
  data.frame(d = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)),
  data.frame(e = c(1, 2, 3, 4, NA, NA, 7:9, NA)))

doit <- function(data, rows, minpresent) {
  if (sum(!is.na(data[rows, ])) >= minpresent) {
    data
  } else {
    NULL
  }
}

results <- lapply(ST, doit, rows = 1:5, minpresent = 2)

## print results
results

In your actual case, you would change to rows = 1:9000 and
minpresent = 1000. You will have a list where each element is a
dataset, and if the dataset does not meet requirements, the element
is NULL.

Hope this helps,

Josh

On Mon, May 21, 2012 at 8:32 AM, jeff6868 <geoffrey_kl...@etu.u-bourgogne.fr> wrote:
> Hi everyone.
>
> I'm working on a list of files (about 50 files). I've listed them thanks to
> the function: list.files.
> Each of my files contains 35000 lines of data. These files may also contain
> some missing values NA (sometimes till 10 000 NAs following each other).
> The aim is to do some correlation matrices between these files (I already
> have the script). But as I have often missing values, the script doesn't
> work yet for all my files.
>
> In this topic, I would like to select a part of the data of these files
> before the correlation.
> In the files list I've created, I would like to select only the 9000 first
> lines of each of my files: myfiles[1:9000,1], and then, in these 9000 lines,
> I would like to keep only in my list the files which contains at least 1000
> non-NA lines (so numeric data) on my 9000 lines.
>
> I would like then to apply my script on this list of files which contains at
> least 1000 numeric data on the first 9000 lines of my whole data.
>
> I've created easy data.frames for the example, if someone could explain me
> how I can do this easily (at least 2 non NA values for the 5 first lines for
> example for these fake data.frames just here).
> Thank you very much!
>
> ST1 <- data.frame(a=1:10)
> ST2 <- data.frame(b=c(NA,NA,NA,NA,NA,6:10))
> ST3 <- data.frame(c=c(1,NA,NA,4:10))
> ST4 <- data.frame(d=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA))
> ST5 <- data.frame(e=c(1,2,3,4,NA,NA,7:9,NA))
>
> ( in this example, the aim is to keep only in the list.files: ST1, ST3 and
> ST5 because they all contains at least 2 non-NA values in the 5 first lines,
> and so to remove from the list.files ST2 and ST4 because they contain both
> too much NAs in the first 5 lines). Hope you've understood! Thanks again!
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/select-part-of-files-from-a-list-files-tp4630769.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
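
A possible follow-up, offered only as a sketch: if the goal is a list holding just the datasets that pass (rather than NULL placeholders for the ones that fail), base R's Filter() can do the selection in one step. The helper name enough_data and the labels ST1 to ST5 are made up for illustration; the toy data are the ones from the example in this thread, and drop = FALSE is added on the assumption that a file may have more than one column.

```r
# Toy data from the thread; the names ST1..ST5 are hypothetical labels
# added so the surviving datasets can be identified afterwards.
ST <- list(
  ST1 = data.frame(a = 1:10),
  ST2 = data.frame(b = c(NA, NA, NA, NA, NA, 6:10)),
  ST3 = data.frame(c = c(1, NA, NA, 4:10)),
  ST4 = data.frame(d = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)),
  ST5 = data.frame(e = c(1, 2, 3, 4, NA, NA, 7:9, NA)))

# TRUE when the selected rows hold at least 'minpresent' non-missing values.
# drop = FALSE keeps a data frame even for single-column data, so the count
# covers every column of the selected rows.
enough_data <- function(data, rows, minpresent) {
  sum(!is.na(data[rows, , drop = FALSE])) >= minpresent
}

# Keep only the datasets that pass the check; list names are preserved.
kept <- Filter(function(d) enough_data(d, rows = 1:5, minpresent = 2), ST)

names(kept)
```

For the real files this would presumably become rows = 1:9000 and minpresent = 1000, as in the reply above.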