On Friday, 4 October 2013 at 07:55 -0400, Duncan Murdoch wrote:
> On 13-10-04 7:31 AM, Joshua Ulrich wrote:
> > On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsem...@comcast.net> wrote:
> >>
> >> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
> >>
> >>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> >>>> Hi!
> >>>>
> >>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> >>>> quoted integers as an acceptable value for columns for which
> >>>> colClasses="integer". But when colClasses is omitted, these columns are
> >>>> read as integer anyway.
> >>>>
> >>>> For example, let's consider a file named file.dat, containing:
> >>>> "1"
> >>>> "2"
> >>>>
> >>>> > read.table("file.dat", colClasses="integer")
> >>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> >>>> na.strings, :
> >>>>   scan() expected 'an integer' and got '"1"'
> >>>>
> >>>> But:
> >>>> > str(read.table("file.dat"))
> >>>> 'data.frame': 2 obs. of 1 variable:
> >>>>  $ V1: int 1 2
> >>>>
> >>>> The latter result is indeed documented in ?read.table:
> >>>>      Unless ‘colClasses’ is specified, all columns are read as
> >>>>      character columns and then converted using ‘type.convert’ to
> >>>>      logical, integer, numeric, complex or (depending on ‘as.is’)
> >>>>      factor as appropriate. Quotes are (by default) interpreted in all
> >>>>      fields, so a column of values like ‘"42"’ will result in an
> >>>>      integer column.
> >>>>
> >>>> Should the former behavior be considered a bug?
> >>>>
> >>> No. If you tell read.table the column is integer and it's actually
> >>> character on disk, it should be an error.
> >>
> >> My reading of the `read.table` help page is that one should expect that
> >> when there is an 'integer'-class and an `as.integer` function and
> >> "integer" is the argument to colClasses, that `as.integer` will be
> >> applied to the values in the column. Should I be reading elsewhere?
> >>
> > I assume you're referring to the paragraph below.
> >
> >      Possible values are ‘NA’ (the default, when ‘type.convert’ is
> >      used), ‘"NULL"’ (when the column is skipped), one of the
> >      atomic vector classes (logical, integer, numeric, complex,
> >      character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
> >      Otherwise there needs to be an ‘as’ method (from package
> >      ‘methods’) for conversion from ‘"character"’ to the specified
> >      formal class.
> >
> > I read that as meaning that an "as" method is required for classes not
> > already listed in the prior sentence. It doesn't say an "as" method
> > will be applied if colClasses is one of the atomic, factor, Date, or
> > POSIXct classes; but I can see how you might assume that, since all
> > the atomic, factor, Date, and POSIXct classes already have "as"
> > methods...
>
> And this does suggest a workaround for ffdf: instead of declaring the
> class to be "integer", declare a class "ffdf_integer", and write a
> conversion method. Or simply read everything as character and call
> as.integer() explicitly.

This is indeed an interesting workaround for read.table.ffdf(), thanks!
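
A minimal sketch of the two workarounds suggested above, applied to plain read.table() rather than read.table.ffdf(). The class name "quoted_integer" is hypothetical (chosen here for illustration, it is not part of R or ff), and the temporary file simply recreates the file.dat example from the original report:

```r
# Sketch only: "quoted_integer" is a made-up class name used for illustration.
library(methods)

# Recreate the example file: two quoted integers, one per line.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Workaround 1: declare a custom class and register a conversion from
# character. read.table() reads the column as character (stripping the
# quotes) and then converts it with the registered 'as' method.
setClass("quoted_integer")
setAs("character", "quoted_integer", function(from) as.integer(from))
d1 <- read.table(path, colClasses = "quoted_integer")

# Workaround 2: read the column as character and convert it explicitly.
d2 <- read.table(path, colClasses = "character")
d2$V1 <- as.integer(d2$V1)

str(d1)
str(d2)
```

Both variables should end up with an integer column V1. The first form keeps the conversion inside the read.table() call, which is presumably what matters when the reading is done chunk by chunk; the second is simpler when the whole table fits in memory.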
I still think adapting the behavior of scan() would be an interesting
improvement for R users, though.

Regards

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel