It is still not clear to me exactly how you want to read the lines in.  If
the lines have a variable number of fields, and some of the lines might be
wrapped, is there some way to determine where each line starts?

If you are reading them in with read.csv, then the system is assuming that
each line starts a new row.  If this is not the case, then you will have to
state the rules that determine where the lines start.  You can always read
the data in with 'scan' to separate each line and then do whatever
processing is required to put together the rows in a data frame that you
want.
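
As a rough sketch of that approach, and nothing more: the rule below (a
record starts with digits followed by ';') is only my assumption about
your data, and the sample text and the names txt, lines, starts,
record_id and records are made up for illustration.

```r
# Hypothetical stand-in for a file; for real data pass the file name to scan().
txt <- paste("37;2175168475;13;sculpture;bird",
             "39;167757703;12;white;building",
             "kempten;niemcy",
             "40;1;2;tower",
             sep = "\n")

# One string per physical line. quote = "" and comment.char = "" keep
# quote characters and '#' from being treated specially.
lines <- scan(text = txt, what = "", sep = "\n", quote = "",
              comment.char = "", quiet = TRUE)

# Assumed rule: a line beginning with digits followed by ';' starts a new
# record; every other line continues the previous record.
starts <- grepl("^[0-9]+;", lines)

# cumsum() labels each line with the index of the record it belongs to.
record_id <- cumsum(starts)

# Glue the pieces of each record back together with ';'.
records <- unname(vapply(split(lines, record_id), paste, character(1),
                         collapse = ";"))
```

If your real rule is different, only the grepl() pattern should need to
change; the cumsum() grouping stays the same.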

In one of your examples, you indicated that the line was split starting at
the word "kempten"; if this is in the middle of the line, then you would
have to create the break after reading the line in with 'scan' and then
create the rows in the data frame.  All of this can be done in R if you can
state what the criteria are.
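
For instance, if the criterion were the one you mentioned earlier (keep
the first 7 fields and put everything after them into a single string),
a sketch could look like the following.  The sample records and the
names n_fixed, fields, fixed, rest and tags are hypothetical; only the
number 7 comes from your message.

```r
# Hypothetical records, already one per element (e.g. as returned by scan()).
records <- c("37;2175168475;13;8.52;47.19;owner1;30;sculpture;bird;tourism",
             "39;167757703;12;10.30;47.72;owner2;36;white;building")

n_fixed <- 7  # number of leading fields kept as separate columns

fields <- strsplit(records, ";", fixed = TRUE)

# Leading fields become a character matrix, one row per record.
# Records with fewer than n_fixed fields are padded with NA.
fixed <- t(vapply(fields, function(x) x[seq_len(n_fixed)],
                  character(n_fixed)))

# Everything after the 7th field is collapsed into a single string.
rest <- vapply(fields,
               function(x) paste(x[-seq_len(n_fixed)], collapse = ";"),
               character(1))

df <- data.frame(fixed, tags = rest, stringsAsFactors = FALSE)
```

This always gives one row per record with a fixed number of columns, so
the number of fields in a line no longer affects the shape of the result.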

On Sat, May 30, 2009 at 4:32 AM, Martin Tomko <martin.to...@geo.uzh.ch> wrote:

> Jim,
> the two lines I put in are the actual problematic input lines.
> In these examples, there are no quotes nor # signs, although I have no
> means to make sure they do not occur in the inputs (any hints how I could
> deal with that?).
> I am trying to avoid as much pre-processing outside R as possible, and I
> have to process about 500 files with up to 3000 records each, so I need a
> more or less automated/batch solution, so any string substitution will
> have to occur in R. But for the moment, I do not see a reason for
> substitution, and the wrapping still occurs.
>
> Cheers
> Martin
>
>
>
> jim holtman wrote:
>
>> You need to supply the actual input line so we can see what is happening.
>>  Are you sure you do not have unbalanced quotes in your input (try quote='')
>> or do you have comment characters ("#") in your input?
>>
>>  On Fri, May 29, 2009 at 3:15 PM, Martin Tomko <martin.to...@geo.uzh.ch> wrote:
>>
>>    Dear All,
>>    I am observing a strange behavior and searching the archives and
>>    help pages didn't help much.
>>    I have a csv with a variable number of fields in each line.
>>
>>    I use
>>    dataPoints <- read.csv(inputFile, header = FALSE, sep = ";", fill = TRUE)
>>
>>    to read it in, and it works. But some lines are long and 'wrap',
>>    or split and continue on the next line. So when I check the dim of
>>    the frame, it is not correct, and I can see when I do a printout
>>    that the line is split into two in the frame. I checked the input
>>    file and all is good.
>>
>>    an example of the input is:
>>    37;2175168475;13;8.522729;47.19537;16366...@n00
>> ;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;
>>
>>    where the last value occurs on the next line in the data frame.
>>
>>    It does not have to be the last value; in the following example,
>>    the word "kempten" starts the next line:
>>    39;167757703;12;10.309295;47.724545;21903...@n00
>> ;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;
>>
>>    What could be the reason?
>>
>>    I was thinking about solving the issue by using a different
>>    separator for the first 7 fields and concatenating all of the
>>    remaining values into a single string value, but could not
>>    figure out how to do such a substitution in R.
>>    Unfortunately, on my system I cannot specify a range for sed...
>>
>>    Thanks for any help/pointers
>>    Martin
>>
>>    ______________________________________________
>>    R-help@r-project.org mailing list
>>    https://stat.ethz.ch/mailman/listinfo/r-help
>>    PLEASE do read the posting guide
>>    http://www.R-project.org/posting-guide.html
>>    and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

