Yes. Also, the original poster said that the files had the same column structure, so there may be stronger heuristics to see whether the first line is a header line. E.g., assuming that the first column is called "ID" (and doesn't have ID as a possible value) use
    first <- readLines(file, 1)
    if (grepl("^ID", first)) ... else ...

-pd

> On 13 Aug 2019, at 20:39 , Sarah Goslee <sarah.gos...@gmail.com> wrote:
>
> Like Bert, I can't see an easy approach for datasets that have
> character rather than numeric data. But here's a simple approach for
> distinguishing files that have possible character headers but numeric
> data.
>
>
> readheader <- function(filename) {
>
>   possibleheader <- read.table(filename, nrows=1, sep=",", header=FALSE)
>
>   if(all(is.numeric(possibleheader[,1]))) {
>     # no header
>     infile <- read.table(filename, sep=",", header=FALSE)
>   } else {
>     # has header
>     infile <- read.table(filename, sep=",", header=TRUE)
>   }
>
>   infile
> }
>
>
> #### file noheader.csv ####
>
> 1,1,1
> 2,2,2
> 3,3,3
>
>
> #### file hasheader.csv ####
>
> a,b,c
> 1,1,1
> 2,2,2
> 3,3,3
>
> ########################
>
>> readheader("noheader.csv")
>   V1 V2 V3
> 1  1  1  1
> 2  2  2  2
> 3  3  3  3
>> readheader("hasheader.csv")
>   a b c
> 1 1 1 1
> 2 2 2 2
> 3 3 3 3
>
> Sarah
>
> On Tue, Aug 13, 2019 at 2:00 PM Christopher W Ryan <cr...@binghamton.edu>
> wrote:
>>
>> Alas, we spend so much time and energy on data wrangling . . . .
>>
>> I'm given a collection of csv files to work with---"found data". They arose
>> via saving Excel files to csv format. They all have the same column
>> structure, except that some were saved with column names and some were not.
>>
>> I have a code snippet that I've used before to traverse a directory and
>> read into R all the csv files of a certain filename pattern within it, and
>> combine them all into a single dataframe:
>>
>> library(dplyr)
>> ## specify the csv files that I will want to access
>> files.to.read <- list.files(path = "H:/EH", pattern =
>> "WICLeadLabOrdersDone.+", all.files = FALSE, full.names = TRUE, recursive =
>> FALSE, ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
>>
>> ## function to read csv files back in
>> read.csv.files <- function(filename) {
>>     bb <- read.csv(filename, colClasses = "character", header = TRUE)
>>     bb
>> }
>>
>> ## now read the csv files, as all character
>> b <- lapply(files.to.read, read.csv.files)
>>
>> ddd <- bind_rows(b)
>>
>> But this assumes that all files have column names in their first row. In
>> this case, some don't. Any advice how to handle it so that those with
>> column names and those without are read in and combined properly? The only
>> thing I've come up with so far is:
>>
>> ## function to read csv files back in
>> ## Unfortunately, some of the csv files are saved with column headers, and
>> some are saved without them.
>> ## This presents a problem when defining the function to read them: header
>> = TRUE or header = FALSE?
>> ## The best solution I can think of as of 13 August 2019 is to use header =
>> FALSE and skip the
>> ## first row of every file. This will sacrifice one record from each csv of
>> about 80 files
>> read.csv.files <- function(filename) {
>>     bb <- read.csv(filename, colClasses = "character", header = FALSE, skip
>> = 1)
>>     bb
>> }
>>
>> This sacrifices about 80 out of about 1600 records. For my purposes in this
>> instance, this may be acceptable, but of course I'd rather not.
>>
>> Thanks.
>>
>> --Chris Ryan
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Sarah Goslee (she/her)
> http://www.numberwright.com

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com
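
For completeness, here is one way the readLines() check suggested at the top of this message might be folded into the original poster's read-and-combine loop. This is only a sketch under assumptions: the header line is taken to start with "ID" (a placeholder, since the real column names were never posted), no data value in the first column starts with "ID", and every file has the same number of columns. has_header() and read_all() are hypothetical helper names, not existing functions.

```r
## Assumption: a header line, when present, starts with ID (possibly quoted).
## has_header() and read_all() are hypothetical names used for illustration.
has_header <- function(filename, header_pattern = "^\"?ID") {
  first <- readLines(filename, n = 1)
  length(first) == 1 && grepl(header_pattern, first)
}

read_all <- function(files, header_pattern = "^\"?ID") {
  ## decide per file whether the first line is a header
  hdr <- vapply(files, has_header, logical(1),
                header_pattern = header_pattern)

  ## read every file as character, with header = TRUE only where detected
  b <- Map(function(f, h) read.csv(f, colClasses = "character", header = h),
           files, hdr)

  ## headerless files come back as V1, V2, ...; borrow the names from the
  ## first file that had a real header (assumes identical column structure)
  if (any(hdr)) {
    nms <- names(b[[which(hdr)[1]]])
    b <- lapply(b, function(d) { names(d) <- nms; d })
  }

  do.call(rbind, unname(b))
}

## usage, with the file list built as in the original post:
## ddd <- read_all(files.to.read)
```

Base rbind() is used here only to keep the sketch free of package dependencies; once the names have been aligned, bind_rows() from the original post should serve equally well. No record is skipped, so the roughly 80 rows lost to skip = 1 are retained.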