Jim,
the two lines I put in are the actual problematic input lines.
In these examples, there are neither quotes nor # signs, although I have
no means of making sure they do not occur in the inputs (any hints on how
I could deal with that?).
I am trying to avoid as much pre-processing outside R as possible, and I
have to process about 500 files with up to 3000 records each, so I need
a more or less automated/batch solution. Any string substitution
will therefore have to occur in R. But for the moment, I do not see a reason
for substitution, and the wrapping still occurs.
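A minimal sketch of how both concerns might be handled inside R. It rests on one documented behavior of read.table: the number of columns is determined from the first five lines of input unless col.names is supplied, so with fill=TRUE a later, longer line can be split across rows. Whether that is what happens with the real files is an assumption to check. The sample file contents and the names nFields and maxFields are made up for illustration; quote="" and comment.char="" make any stray quotes or # signs count as ordinary text.

```r
# Hypothetical sample standing in for one of the input files:
# the longest line (8 fields) appears only after the first five lines.
inputFile <- tempfile(fileext = ".csv")
writeLines(c(
  "1;a;b",
  "2;a;b;c",
  "3;a",
  "4;a;b",
  "5;a;b;c",
  "6;a;b;c;d;e;f's;#g"
), inputFile)

# Count the fields on every line, with quote and comment handling disabled
# so that ' " and # are treated as plain characters.
nFields <- count.fields(inputFile, sep = ";", quote = "", comment.char = "")
maxFields <- max(nFields)

# Supplying col.names of the maximum length tells read.csv how many
# columns to allocate, instead of letting it guess from the first lines.
dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE,
                       quote = "", comment.char = "",
                       col.names = paste0("V", seq_len(maxFields)),
                       stringsAsFactors = FALSE)
dim(dataPoints)
```

For a batch run, the same two calls could be wrapped in a function and applied over list.files() output with lapply.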
Cheers
Martin
jim holtman wrote:
You need to supply the actual input line so we can see what is
happening. Are you sure you do not have unbalanced quotes in your
input (try quote='') or do you have comment characters ("#") in your
input?
On Fri, May 29, 2009 at 3:15 PM, Martin Tomko <martin.to...@geo.uzh.ch> wrote:
Dear All,
I am observing a strange behavior and searching the archives and
help pages didn't help much.
I have a csv with a variable number of fields in each line.
I use
dataPoints <- read.csv(inputFile, head=FALSE, sep=";",fill =TRUE);
to read it in, and it works. But some lines are long and 'wrap',
that is, they are split and continue on the next row. So when I check
the dim of the frame, it is not correct, and I can see in a printout
that such a line is split into two rows in the frame. I checked the input
file and all is good.
an example of the input is:
37;2175168475;13;8.522729;47.19537;16366...@n00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;
where the last value ends up on the next row in the data frame.
It does not have to be the last value: in the following example,
the word "kempten" starts the next row:
39;167757703;12;10.309295;47.724545;21903...@n00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;
What could be the reason?
I was thinking about solving the issue by using a different
separator for the first 7 fields and concatenating all of the
remaining values into a single string value, but could not figure
out how to do such a substitution in R. Unfortunately, on my system
I cannot specify a range for sed...
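The idea described above (seven fixed fields, everything else collapsed into one string) could be sketched in R without sed roughly as follows. The helper name splitFixedAndTags, the space used to join the remaining values, and the sample lines are all assumptions made for illustration; with real files the lines would come from readLines(inputFile).

```r
# Hypothetical helper: keep the first nFixed fields as separate columns and
# collapse all remaining fields into a single space-separated string.
splitFixedAndTags <- function(lines, nFixed = 7, sep = ";") {
  # fixed = TRUE treats sep literally; trailing empty fields are dropped.
  parts <- strsplit(lines, sep, fixed = TRUE)
  # One row per input line, one column per fixed field (NA if missing).
  fixed <- t(vapply(parts, function(p) p[seq_len(nFixed)],
                    character(nFixed)))
  # Everything after the fixed fields becomes one string per line.
  tags <- vapply(parts, function(p) {
    if (length(p) > nFixed) paste(p[-seq_len(nFixed)], collapse = " ") else ""
  }, character(1))
  data.frame(fixed, tags = tags, stringsAsFactors = FALSE)
}

# Made-up lines in the same shape as the examples in this thread.
lines <- c("37;100;13;8.5;47.1;user1;30;sculpture;bird;tourism;",
           "39;200;12;10.3;47.7;user2;36;white;building;")
res <- splitFixedAndTags(lines)
res$tags
```

Because the lines are split by hand, quotes and # signs in the input have no special meaning here. The fixed columns come back as character and would still need as.numeric() or type.convert() where numbers are expected.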
Thanks for any help/pointers
Martin
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?