On Feb 28, 2010, at 4:55 PM, Rolf Turner wrote:
I had occasion recently to read in a one-line *.csv file that
looked like:
"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"
That "F" (for female) in the last field got transformed to
FALSE. Apparently read.table (and hence read.csv, which calls it)
infers that if the entries of a field are all F's and T's, then
the field is logical.
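A minimal sketch reproducing the behaviour with the one-row file above, read inline rather than from disk. It assumes R >= 2.14.0 for the `text =` argument; other columns may come back as factor or character depending on the R version's `stringsAsFactors` default, but the gender column is logical either way.

```r
# The one-row file from the post, supplied inline.
csv <- '"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"'

d <- read.csv(text = csv)

# Quotes are stripped before type conversion, so the field is seen as F
# and the whole gender column is converted to logical.
str(d)
print(class(d$gender))  # "logical"
print(d$gender)         # FALSE
```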
If I change the file to
"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"
"Mingdinkler, Melvin Queue",999888777,"01/04/1942","M"
then the read functions correctly interpret the last field
as being character.
The translation of "F" into FALSE resulted in some mysterious
contretemps in further analysis, which it took me a while to
track down.
I solved the problem by putting in a colClasses argument in my
call to read.csv(). But I really think that the read functions
are being too clever by half here. If field entries are surrounded
by quotes, shouldn't they be left as character? Even if they are
all F's and T's?
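A sketch of the colClasses workaround, under the assumption that only the gender column needs protecting. Entries given as `NA` in `colClasses` are still converted automatically; the fifth entry forces the gender column to stay character.

```r
csv <- '"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"'

# NA = let type.convert decide; "character" = keep the column as text.
d <- read.csv(text = csv,
              colClasses = c(NA, NA, NA, NA, "character"))

print(class(d$gender))  # "character"
print(d$gender)         # "F"
```

Listing every column explicitly is more verbose but also protects against other surprises, such as ID numbers with leading zeros being read as integers.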
Furthermore, using F's and T's to represent FALSE's and TRUE's is
bad practice anyway. Since FALSE and TRUE are reserved words, it
would make sense for the read functions to assume that a field is
logical if it consists entirely of these words. But T's and F's
... I don't think so.
It is documented that conversion to logical will be attempted, so it
does make sense that T/F would become TRUE/FALSE, since that is
typical behavior elsewhere in R. But at the very least this sentence
in the type.convert help page:
"Given a character vector, it attempts to convert it to logical,
integer, numeric or complex, and failing that converts it to factor
unless as.is = TRUE."
... ought to be clarified. It is not at all clear that conversion
to logical will still be attempted even if as.is = TRUE; that is,
the only conversion skipped is the one to factor.
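A small sketch illustrating that point: `as.is = TRUE` suppresses only the final fallback to factor, so an all-T/F vector is still converted to logical, both by `type.convert` directly and via `read.csv`. The values are the ones from the thread; the `text =` argument assumes R >= 2.14.0.

```r
# as.is = TRUE does not stop the attempt to convert to logical.
x1 <- type.convert(c("F", "T"), as.is = TRUE)
print(class(x1))  # "logical"

# One non-logical value ("M") leaves the vector as character.
x2 <- type.convert(c("F", "M"), as.is = TRUE)
print(class(x2))  # "character"

# The same holds through read.csv, which passes as.is on to read.table.
csv <- '"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"'
d <- read.csv(text = csv, as.is = TRUE)
print(class(d$gender))  # "logical"
```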
I would argue that this behaviour should be changed. I can see no
downside to such a change.
cheers,
Rolf Turner
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.