I had occasion recently to read in a one-line *.csv file that looked like:
"CandidateName","NSN","Ethnicity","dob","gender" "Smith, Mary Jane",111222333,"E","2/25/1989","F" That "F" (for female) in the last field got transformed to FALSE. Apparently read.csv (and hence read.table) are inferring that if the entries of a file are all F's and T's then the field is interpreted as logical. If I change the file to "CandidateName","NSN","Ethnicity","dob","gender" "Smith, Mary Jane",111222333,"E","2/25/1989","F" "Mingdinkler, Melvin Queue",999888777,"01/04/1942","M" then the read functions correctly interpret the last field as being character. The translation of "F" into FALSE resulted in some mysterious contretemps in further analysis, which it took me a while to track down. I solved the problem by putting in a colClasses argument in my call to read.csv(). But I really think that the read functions are being too clever by half here. If field entries are surrounded by quotes, shouldn't they be left as character? Even if they are all F's and T's? Furthermore using F's and T's to represent TRUE's and FALSE's is bad practice anyway. Since FALSE and TRUE are reserved words it would make sense for the read function to assume that a field is logical if it consists entirely of these words. But T's and F's .... I don't think so. I would argue that this behaviour should be changed. I can see no downside to such a change. cheers, Rolf Turner ###################################################################### Attention:\ This e-mail message is privileged and confid...{{dropped:9}} ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.