On 02/06/2010 12:57 AM, Barry Rowlingson wrote:
On Fri, Feb 5, 2010 at 10:23 AM, analys...@hotmail.com
<analys...@hotmail.com> wrote:
The CSV files are downloaded from a database, and it looks like some
character fields contain the CR-LF sequence within them.
This causes R to see a new record/row, so the number of rows it sees
differs from (and is usually higher than) the number of rows actually
extracted.
Hard to tell without an example, but I just tried this in a file:
1,2,"this
is a test",99
2,3,"oneliner",45
and:
read.table("test.csv",sep=",")
  V1 V2              V3 V4
1  1  2 this\nis a test 99
2  2  3        oneliner 45
seemed to work. But if your strings aren't "quoted" (hard to tell
without an example) then you might have to find another way. Hard to
tell without an example.
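
For anyone who wants to reproduce the quoted case, here is a minimal sketch. The temporary file and the variable names are my own placeholders, not something from the original post:

```r
# Write the three-line sample to a temporary file (hypothetical path).
path <- tempfile(fileext = ".csv")
writeLines(c('1,2,"this', 'is a test",99', '2,3,"oneliner",45'), path)

# The line break sits inside a quoted field, so read.table keeps it
# as part of the value instead of starting a new record.
quoted <- read.table(path, sep = ",", stringsAsFactors = FALSE)
print(quoted)
```

This relies on read.table's default quote characters, so it only helps if the database export really does quote its character fields.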
Maybe the database output looks like this:
1,2,this
is a test,99
2,3,oneliner,45
in which case the same read.table() call fails:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings, :
line 1 did not have 4 elements
However, if we try read.csv, which defaults to fill=TRUE and so pads
short lines instead of stopping:
read.csv("test.csv",header=FALSE)
         V1 V2       V3 V4
1         1  2     this NA
2 is a test 99          NA
3         2  3 oneliner 45
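
One way to find which physical lines are fragments of a split record is count.fields(). This is only a sketch: the sample file is made up, and it assumes you already know that every complete record should have 4 fields.

```r
# Recreate the unquoted sample in a temporary file (hypothetical path).
path <- tempfile(fileext = ".csv")
writeLines(c("1,2,this", "is a test,99", "2,3,oneliner,45"), path)

# Number of comma-separated fields on each physical line.
n_fields <- count.fields(path, sep = ",")

# Assumption: a complete record has exactly 4 fields.
expected <- 4
broken <- which(n_fields != expected)
print(n_fields)
print(broken)
```

Lines listed in broken are the ones to inspect. Consecutive short lines whose field counts add up to about one record are likely a single row that was split by an embedded EOL.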
If you can determine whether the embedded EOLs are different from those
at the end of a record, you could do a global replace on the input file
for the embedded EOLs to some character that isn't used (e.g. ~ or |) in
the input file. I'll leave the syntax to the regexperts.
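
A rough sketch of that global replace done in R itself, under one big assumption: the embedded breaks are CR-LF while real record ends are a bare LF. Check your own file before relying on that. The sample data, the temporary path, and the choice of "~" as placeholder are illustrative only.

```r
# Hypothetical sample: embedded break is CR-LF, record ends are bare LF.
raw_text <- "1,2,this\r\nis a test,99\n2,3,oneliner,45\n"
path <- tempfile(fileext = ".csv")
writeBin(charToRaw(raw_text), path)  # write the bytes exactly as given

# Read the whole file as one string so both kinds of EOL are preserved.
txt <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)

# Replace only the embedded CR-LF pairs with a character unused in the data.
cleaned <- gsub("\r\n", "~", txt, fixed = TRUE)

dat <- read.csv(text = cleaned, header = FALSE, stringsAsFactors = FALSE)

# Optionally put a line break back into the affected field.
dat$V3 <- gsub("~", "\n", dat$V3, fixed = TRUE)
print(dat)
```

If the two kinds of EOL turn out to be identical, this approach cannot tell them apart, and the fix has to happen at export time (for example by having the database quote its character fields).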
Jim
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.