I have recently been bitten by an aspect of the behaviour of the read.csv() function.
Some lines in a (fairly large) *.csv file that I read in had too many entries. I would have hoped that this would cause read.csv() to throw an error, or at least issue a warning, but it read the file without complaint, putting the extra entries into an additional line.

This behaviour is illustrated by the toy example in the attached file ``junk.csv''. Just do

    junk <- read.csv("junk.csv",header=TRUE)
    junk

to see the problem.

If the offending over-long line were in the fourth line of data or earlier, an error would be thrown, but if it is in the fifth line of data or later no error is given.

This is in a way compatible with what the help on read.csv() says:

    The number of data columns is determined by looking at the first
    five lines of input (or the whole file if it has less than five
    lines), or from the length of col.names if it is specified and is
    longer.

However, the help for read.table() says the same thing. And yet if one does

    gorp <- read.table("junk.csv",sep=",",header=TRUE)

one gets an error, whereas read.csv() gives none.

Am I correct in saying that is inappropriate behaviour on the part of read.csv(), or am I missing something?

    cheers,

        Rolf Turner
P. S.:

> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/C/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] datasets  utils     stats     graphics  grDevices methods   base

other attached packages:
[1] misc_0.0-13     gtools_2.6.2    spatstat_1.21-2 deldir_0.0-13
[5] mgcv_1.6-2      fortunes_1.4-0  MASS_7.3-8

loaded via a namespace (and not attached):
[1] grid_2.12.0        lattice_0.19-13    Matrix_0.999375-44 nlme_3.1-97
[5] tools_2.12.0
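
For readers without the attachment, the behaviour described above can be reproduced with a sketch along the following lines. The file contents are a hypothetical stand-in for junk.csv (the real attachment is not reproduced here), and the variable names are invented for illustration. The sketch relies on count.fields() from utils to locate over-long lines before reading, and on the documented defaults of the fill argument: read.csv() is defined with fill = TRUE, whereas read.table() defaults to fill = !blank.lines.skip, i.e. FALSE. That difference may account for the discrepancy, though this is offered as a likely explanation rather than a definitive diagnosis.

```r
# Hypothetical stand-in for junk.csv: a header, five well-formed data
# lines, one over-long data line (the sixth), and one more good line.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19",
             "20,21,22"), tmp)

# Count the fields on every line of the file; lines whose count
# differs from that of the header line are suspect.
n_fields  <- count.fields(tmp, sep = ",")
bad_lines <- which(n_fields != n_fields[1])

# With the default fill = TRUE, read.csv() reads the file silently and
# the surplus entry spills into an extra row.
loose <- read.csv(tmp, header = TRUE)

# With fill = FALSE (the read.table() default), the same call is
# expected to stop with an error; capture the message instead.
strict <- tryCatch(read.csv(tmp, header = TRUE, fill = FALSE),
                   error = function(e) conditionMessage(e))

print(bad_lines)
print(loose)
print(strict)
```

Checking count.fields() first has the advantage of reporting which lines of the file are malformed, rather than merely that something went wrong somewhere.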
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.