Thanks Corey,
I've looked into them before and I don't think they can help me with this problem. The bigmemory/biganalytics functions are great for handling and analysing data sets that are too big for R to store in memory. However, I believe my problem goes one step beyond that: my distance matrix has more entries than R's architecture can store in a single object, even if I had enough memory to hold it. Again, I'm no expert in this, so I may be wrong.

Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data Analysis, Modelling and Training
(mobile) 0410 689 945
(fax / office)
ch...@trickysolutions.com.au

Disclaimer: The information in this email and any attachments to it are confidential and may contain legally privileged information. If you are not the named or intended recipient, please delete this communication and contact us immediately. Please note you are not authorised to copy, use or disclose this communication or any attachments without our consent. Although this email has been checked by anti-virus software, there is a risk that email messages may be corrupted or infected by viruses or other interferences. No responsibility is accepted for such interference. Unless expressly stated, the views of the writer are not those of the company. Tricky Solutions always does our best to provide accurate forecasts and analyses based on the data supplied, however it is possible that some important predictors were not included in the data sent to us. Information provided by us should not be solely relied upon when making decisions and clients should use their own judgement.

*From:* Corey Dow-Hygelund [mailto:godelsthe...@gmail.com]
*Sent:* Thursday, 11 August 2011 3:00 AM
*To:* Chris Howden
*Cc:* r-help@r-project.org
*Subject:* Re: [R] Can R handle a matrix with 8 billion entries?

You might want to look into the packages bigmemory and biganalytics.
Corey

On Tue, Aug 9, 2011 at 8:38 PM, Chris Howden <ch...@trickysolutions.com.au> wrote:

Hi,

I'm trying to do a hierarchical cluster analysis in R with a big data set, and I'm running into problems using the dist() function. I've been looking at a few threads about R's memory and have read the memory limits section in the R help. However, I'm no computer expert, so I'm hoping I've misunderstood something and R can handle my big data set somehow. At the moment I think my data set is simply too big and there is no way around it, but I'd like to be proved wrong!

My data set has 90523 rows of data and 24 columns. My understanding is that this means the full distance matrix has 90523^2 = 8194413529 elements, which roughly translates as 8GB of memory being required (if I assume each entry requires 1 byte). I only have 4GB on a 32-bit build of Windows and R, so there is no way that's going to work.

So then I thought of getting access to a more powerful computer, maybe using cloud computing. However, the R help on memory limits says: "On all builds of R, the maximum length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9." As the distance matrix I require has more elements than this, does this mean it's too big for R no matter what I do?

Any ideas would be welcome.

Thanks.
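For reference, here is a worked version of the element-count and memory arithmetic in the question. It is a sketch that assumes, as the R documentation for dist() describes, that dist() stores only the lower triangle of the distance matrix as a vector, and that each entry is an 8-byte double:

```latex
\begin{aligned}
n &= 90523 \\
n^2 &= 8\,194\,413\,529 && \text{(full } n \times n \text{ matrix)} \\
\tfrac{n(n-1)}{2} &= \tfrac{90523 \times 90522}{2} = 4\,097\,161\,503 && \text{(lower triangle kept by dist())} \\
2^{31} - 1 &= 2\,147\,483\,647 && \text{(maximum vector length quoted from the R help)} \\
8 \times \tfrac{n(n-1)}{2} &= 32\,777\,292\,024 \text{ bytes} \approx 32.8\ \text{GB}
\end{aligned}
```

So even the triangle-only representation has roughly 1.9 times as many elements as the quoted vector-length limit, and at 8 bytes per entry it would need about 32.8 GB, well beyond 4GB.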
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
*The mark of a successful man is one that has spent an entire day on the bank of a river without feeling guilty about it.*