Re: [R] strange behavior when reading csv - line wraps

Martin Tomko Sat, 30 May 2009 01:33:58 -0700

Jim,
the two lines I put in are the actual problematic input lines.

In these examples, there are no quotes or # signs, although I have no means of making sure they do not occur in the inputs (any hints on how I could deal with that?). I am trying to avoid as much pre-processing outside R as possible, and I have to process about 500 files with up to 3000 records each, so I need a more or less automated/batch solution, and any string substitution will have to occur in R. But for the moment, I do not see a reason for substitution, and the wrapping still occurs.
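
One possible explanation for the wrapping, offered as a sketch and not as a confirmed diagnosis for these files: read.table (and therefore read.csv) determines the number of columns from the first five lines of input unless col.names is longer, so with fill = TRUE a later line with more fields can spill onto extra rows. The example below uses count.fields() to find the widest line and passes that many column names. Setting quote = "" makes quotes ordinary characters, and read.csv already defaults to comment.char = "". The sample lines, the temporary file and the names wrapped, nFields and maxFields are made up for illustration.

```r
# Hypothetical sample: the widest line is not among the first five lines.
lines <- c("1;a;b", "2;a", "3;a;b", "4;a", "5;a;b",
           "6;a;b;c;d;e;f;g")
inputFile <- tempfile(fileext = ".csv")
writeLines(lines, inputFile)

# Without col.names, the column count is taken from the first five lines,
# so the fields of line 6 spill over onto additional rows.
wrapped <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)

# Count the fields on every line; quote = "" and comment.char = "" make
# quotes and '#' ordinary characters while counting.
nFields <- count.fields(inputFile, sep = ";", quote = "", comment.char = "")
maxFields <- max(nFields, na.rm = TRUE)

# Supplying one column name per field of the widest line keeps each
# input line on a single row; shorter lines are padded by fill = TRUE.
dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE,
                       quote = "", comment.char = "",
                       col.names = paste0("V", seq_len(maxFields)))
dim(dataPoints)
```

For a batch of files, the same two steps (count.fields, then read.csv with col.names) could be wrapped in a function and applied to each file name with lapply.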


Cheers
Martin



jim holtman wrote:

You need to supply the actual input line so we can see what is happening. Are you sure you do not have unbalanced quotes in your input (try quote='') or do you have comment characters ("#") in your input?

On Fri, May 29, 2009 at 3:15 PM, Martin Tomko <martin.to...@geo.uzh.ch> wrote:


    Dear All,
    I am observing a strange behavior and searching the archives and
    help pages didn't help much.
    I have a csv with a variable number of fields in each line.

    I use
    dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)

    to read it in, and it works. But some lines are long and 'wrap',
    or split and continue on the next line. So when I check the dim of
    the frame, the dimensions are not correct, and I can see in a printout
    that such a line is split into two in the frame. I checked the input
    file and all is good.

    an example of the input is:
    
37;2175168475;13;8.522729;47.19537;16366...@n00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;

    where the last value occurs on the next line in the data frame.

    It does not have to be the last value, as in the following example,
    where the word "kempten" starts the next line:
    
39;167757703;12;10.309295;47.724545;21903...@n00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;

    What could be the reason?

    I was thinking about solving the issue by using a different
    separator for the first 7 fields and concatenating all of the
    remaining values into a single string value, but could not figure
    out how to do such a substitution in R. Unfortunately, on my
    system I cannot specify a range for sed...
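
The idea described above, keeping the first 7 fields and joining the rest into one string, can be done in R without sed. The following is a minimal sketch under stated assumptions: the two input lines are shortened, made-up stand-ins for the real records, and the names nFixed, fixed and tags are hypothetical.

```r
# Hypothetical input lines in the same shape as the examples: seven
# leading fields followed by a variable number of tags.
lines <- c("37;2175168475;13;8.5;47.1;user1;30;sculpture;bird;tourism;",
           "39;167757703;12;10.3;47.7;user2;36;white;building;")

nFixed <- 7  # number of leading fields kept as separate columns

# Split each line on ';' (the trailing empty field is dropped by strsplit).
fields <- strsplit(lines, ";", fixed = TRUE)

# Keep the first nFixed fields of every line, one row per line.
fixed <- t(vapply(fields, function(f) f[seq_len(nFixed)], character(nFixed)))

# Join whatever remains on each line into one space-separated string.
tags <- vapply(fields, function(f) {
  paste(f[-seq_len(nFixed)], collapse = " ")
}, character(1))

dataPoints <- data.frame(fixed, tags = tags, stringsAsFactors = FALSE)
```

With real files, lines would come from readLines(inputFile). All columns are character here, so numeric fields would still need as.numeric() afterwards.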

    Thanks for any help/pointers
    Martin

    ______________________________________________
    R-help@r-project.org <mailto:R-help@r-project.org> mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide
    http://www.R-project.org/posting-guide.html
    <http://www.r-project.org/posting-guide.html>
    and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

