[R] The behaviour of read.csv().

Rolf Turner Thu, 02 Dec 2010 15:55:02 -0800

I have recently been bitten by an aspect of the behaviour of
the read.csv() function.


Some lines in a (fairly large) *.csv file that I read in had
too many entries.  I would have hoped that this would cause
read.csv() to throw an error, or at least issue a warning,
but it read the file without complaint, putting the extra
entries into an additional row of the resulting data frame.

This behaviour is illustrated by the toy example in the
attached file "junk.csv".  Just do

        junk <- read.csv("junk.csv",header=TRUE)
        junk

to see the problem.
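
In case the attachment does not come through, here is a self-contained
sketch of the same kind of file.  The contents are made up for
illustration (they are not the actual junk.csv): three columns, with a
four-entry line as the sixth line of data.

```r
# Hypothetical stand-in for junk.csv: a header plus 6 data lines,
# the last of which has one entry too many.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19"), path)

junk <- read.csv(path, header = TRUE)
junk        # the stray 19 starts a row of its own, padded with NA
nrow(junk)  # 7, although the file has only 6 lines of data
```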

If the offending over-long line were the fourth line of data
or earlier, an error would be thrown, but if it is the fifth line
of data or later, no error is given.
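
One way to locate such lines before reading the file, wherever they
occur, is count.fields() from the utils package.  The sketch below uses
a made-up file (not the actual junk.csv) and assumes that the header
line has the correct number of fields.

```r
# Hypothetical file: header + 6 data lines, the last one over-long.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19"), path)

# Number of comma-separated fields on each line, header included.
n_fields <- count.fields(path, sep = ",")

# File line numbers whose field count differs from the header's.
bad <- which(n_fields != n_fields[1])
bad   # 7, i.e. the sixth line of data
```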

This is in a way compatible with what the help on read.csv()
says:

        The number of data columns is determined by looking at
        the first five lines of input (or the whole file if it
        has less than five lines), or from the length of col.names
        if it is specified and is longer.

However, the help for read.table() says the same thing.  And yet if
one does

        gorp <- read.table("junk.csv",sep=",",header=TRUE)

one gets an error, whereas read.csv() gives none.
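
The Usage section of the help page may account for the difference:
read.csv() is listed with fill = TRUE, whereas read.table() has
fill = !blank.lines.skip, which is FALSE by default.  A sketch with the
defaults swapped, using a made-up file (not the actual junk.csv):

```r
# Hypothetical file: header + 6 data lines, the last one over-long.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19"), path)

# read.csv() with fill switched off: now the over-long line is an error.
res_csv <- tryCatch(read.csv(path, header = TRUE, fill = FALSE),
                    error = function(e) e)
inherits(res_csv, "error")   # TRUE

# read.table() with fill switched on: the extra entry is silently
# wrapped into an additional row, just as with read.csv().
res_tab <- read.table(path, sep = ",", header = TRUE, fill = TRUE)
nrow(res_tab)                # 7
```

So passing fill = FALSE to read.csv() appears to be enough to get an
error for lines with too many entries.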

Am I correct in saying that this is inappropriate behaviour on
the part of read.csv(), or am I missing something?

                cheers,

                        Rolf Turner


P. S.:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/C/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] datasets  utils     stats     graphics  grDevices methods   base     

other attached packages:
[1] misc_0.0-13     gtools_2.6.2    spatstat_1.21-2 deldir_0.0-13  
[5] mgcv_1.6-2      fortunes_1.4-0  MASS_7.3-8     

loaded via a namespace (and not attached):
[1] grid_2.12.0        lattice_0.19-13    Matrix_0.999375-44 nlme_3.1-97       
[5] tools_2.12.0

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
