Hi Petar,

If you're going to share this matrix across R sessions, save()/load() is
probably one of your best options.

Otherwise, you could try the rhdf5 package from Bioconductor:

1. Install the package with:

     source("http://bioconductor.org/biocLite.R")
     biocLite("rhdf5")

2. Then:

     library(rhdf5)

     h5createFile("my_big_matrix.h5")

     # write a matrix
     my_big_matrix <- matrix(runif(5000*10000), nrow=5000)
     attr(my_big_matrix, "scale") <- "liter"
     h5write(my_big_matrix, "my_big_matrix.h5", "my_big_matrix") # takes 1 min.
     # file size on disk is 248M

     # read a matrix
     my_big_matrix <- h5read("my_big_matrix.h5", "my_big_matrix") # takes 7.4 sec.

Multiply the above numbers (obtained on a laptop with a traditional
hard drive) by 100 for your monster matrix, or less if you have super
fast I/O.

Two advantages of using the HDF5 format: (1) it should not be too hard
to use the HDF5 C library in the C code you're going to use to read the
matrix, and (2) my understanding is that HDF5 is good at letting you
access arbitrary slices of the data, so chunk-processing should be easy
and efficient:

  http://www.hdfgroup.org/HDF5/
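
To illustrate the slice access mentioned above, here is a minimal sketch
using the index argument of h5read(). It assumes rhdf5 is installed; the
small matrix, the temporary file names and the chunk size of 250 rows are
made up for the example. Each chunk is appended to a plain text file,
since that is the input format the C program asks for.

```r
# Sketch only: assumes the rhdf5 package is installed. The matrix size,
# file names and chunk size are illustrative, not tuned for a huge matrix.
library(rhdf5)

h5_file  <- tempfile(fileext = ".h5")
txt_file <- tempfile(fileext = ".txt")

m <- matrix(runif(1000 * 50), nrow = 1000)
h5createFile(h5_file)
h5write(m, h5_file, "m")

# Read rows 1..100 only; NULL means "everything" along that dimension.
block <- h5read(h5_file, "m", index = list(1:100, NULL))

# Walk through the dataset 250 rows at a time and append each chunk
# to a text file, so the full matrix is never held as text in memory.
chunk_size <- 250
for (start in seq(1, nrow(m), by = chunk_size)) {
  rows  <- start:min(start + chunk_size - 1, nrow(m))
  chunk <- h5read(h5_file, "m", index = list(rows, NULL))
  write.table(chunk, txt_file, append = (start > 1),
              row.names = FALSE, col.names = FALSE)
}
```

Only one chunk is in memory at a time, so the chunk size can be chosen
to fit the available RAM.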

Cheers,
H.


On 10/29/2013 02:34 PM, Petar Milin wrote:
Hello,

On Oct 29, 2013, at 10:16 PM, Prof Brian Ripley <rip...@stats.ox.ac.uk> wrote:

On 29/10/2013 20:42, Rui Barradas wrote:
Hello,

You can use the argument to write.csv or write.table  append = TRUE to
write the matrix in chunks. Something like the following.

That was going to be my suggestion. But the reason long vectors have not been 
implemented is that it is rather implausible to be useful. A text file with the
values of such a numeric matrix is likely to be 100GB. What are you going to do 
with such a file?  For transfer to another program I would seriously consider a 
binary format (e.g. use writeBin), as it is the conversion to and from text 
that is time consuming.
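
As an illustration of the binary route suggested above, here is a minimal
sketch with writeBin()/readBin(). It is not code from the original
message: the small matrix, the file name and the layout (two integers for
the dimensions, followed by the values in column-major order) are
assumptions made for the example. Writing one column per call keeps each
writeBin() call small.

```r
# Sketch only: the file layout (nrow, ncol, then doubles in column-major
# order) is an assumption for this example, not a standard format.
m <- matrix(runif(20), nrow = 4)
bin_file <- tempfile(fileext = ".bin")

# Write the dimensions, then the values one column at a time.
con <- file(bin_file, "wb")
writeBin(dim(m), con)                 # two 4-byte integers
for (j in seq_len(ncol(m))) {
  writeBin(m[, j], con)               # 8-byte doubles
}
close(con)

# Read the dimensions back, then the values, and rebuild the matrix.
con <- file(bin_file, "rb")
d  <- readBin(con, "integer", n = 2)
m2 <- matrix(readBin(con, "double", n = prod(d)), nrow = d[1])
close(con)
```

No conversion to or from text takes place, so the values come back
bit-for-bit identical.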

I need to submit it to a cluster analysis (k-means). From an independent source
I have been advised to use a k-means algorithm written in C, which is very fast
and efficient. It asks for a txt file as input.

I tried a few options in R, where I am more comfortable, but a solution never
came, even after too many hours.

Thanks!
Best,
PM

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
