You probably told R to write out the file as a single long line with fields 
separated alternately by 380 TABs and one newline; that's what the ncol 
argument does (write() is just a small wrapper around cat()).
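
To illustrate that equivalence on a toy vector (the names x, f1 and f2 are 
only for this example, and 3 columns stand in for your 381):

```r
x  <- 1:6
f1 <- tempfile()
f2 <- tempfile()

# write() with 3 columns and TAB as the field separator ...
write(x, file = f1, ncolumns = 3, sep = "\t")

# ... amounts to one cat() call whose separators cycle through
# two TABs followed by one newline
cat(x, file = f2, sep = c(rep.int("\t", 2), "\n"))

# both files should contain the same two lines
identical(readLines(f1), readLines(f2))
```

As far as cat() is concerned, the whole data set is one stream of fields; the 
line breaks are just every third (or, in your case, every 381st) separator.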

cat() doesn't print lines that are longer than 2 GiB, so it will insert an 
extra \n after every 2 GiB of data. (IIRC, this is because in the C code, 
fill=FALSE is replaced by fill=INT_MAX or so.)

The only way around this limitation that I can think of is to write a wrapper 
function that breaks up the matrix or list of vectors into smaller chunks and 
appends them separately to the output file.  I'm planning to add such a 
function to one of my packages, so I'd be interested if somebody has a better 
solution.
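
A minimal sketch of what such a wrapper might look like, assuming the data is 
a list of equal-length vectors. The function name write_chunked and its 
arguments path and chunk_size are made up for this illustration; this is not 
an existing function, and I have not tried it on data of your size:

```r
# Hypothetical helper (not in base R or any package): write a list of
# equal-length vectors to a text file, one vector per line, using a
# separate cat() call for every chunk of rows.
write_chunked <- function(rows, path, sep = "\t", chunk_size = 10000L) {
  stopifnot(is.list(rows), chunk_size >= 1)
  n <- length(rows)
  con <- file(path, open = "wt")      # one connection shared by all chunks
  on.exit(close(con))
  if (n == 0L) return(invisible(0L))
  ncolumns <- length(rows[[1L]])
  stopifnot(all(vapply(rows, length, integer(1)) == ncolumns))
  for (start in seq(1L, n, by = chunk_size)) {
    end   <- min(start + chunk_size - 1L, n)
    chunk <- unlist(rows[start:end], use.names = FALSE)
    # same separator pattern that write() hands to cat()
    cat(chunk, file = con, sep = c(rep.int(sep, ncolumns - 1L), "\n"))
  }
  invisible(n)
}

# toy usage: three rows written in chunks of two rows
rows <- list(1:3, 4:6, 7:9)
out  <- tempfile()
write_chunked(rows, out, chunk_size = 2L)
readLines(out)
```

Each cat() call then only sees a small part of the data, so its output should 
stay far below the line-length limit as long as chunk_size is moderate.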

Best,
Stefan


On 16 Sep 2014, at 18:54, Maxime Vallee <vall...@iarc.fr> wrote:

> In my script I have one list of 1,132,533 vectors (each vector contains
> 381 elements). 
> 
> When I use "write" to save this list in a flat text file (I unlist my
> list, separate by tabs, and set ncol to 381), I end up with a file of
> 1,132,535 lines (2 additional lines). I checked back: my R list does not
> have those two additional items before writing.
> 
> With awk, I checked for lines that were not made of 381 fields: there were
> two, separated by around 400k lines.
> 
> I made sub-files, using those "incomplete" lines as boundaries. My files
> are very close in size : 1.9 GB (respectively 1971841853 B and 1972614897
> B). It feels like a 32 bit / 64 bit issue.
> 
> My R version is this:
> ./Rscript -e 'sessionInfo()$platform'
> [1] "x86_64-unknown-linux-gnu (64-bit)"
> 
> Somewhere around the 1.9 GB mark, something is changing my tabs to
> unwanted carriage returns...
> Any idea that might cause this, and if it looks solvable in R?


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
