Re: [R] Memory limit for Windows 64bit build of R

2012-08-06 Thread Jay Emerson
Alan, More RAM will definitely help. But if you have an object needing more than 2^31-1 ~ 2 billion elements, you'll hit a wall regardless. This could be particularly limiting for matrices. It is less limiting for data.frame objects (where each column could be 2 billion elements). But many R a
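A quick back-of-the-envelope check of that ceiling (base R only; the 46,340 figure below is plain arithmetic, not from the original post):

```r
# Maximum number of elements in a single R vector or matrix
# (before long-vector support): 2^31 - 1.
elem_limit <- 2^31 - 1

# A square double-precision matrix hits the wall at about 46,340 x 46,340:
max_side <- floor(sqrt(elem_limit))

# Even at that size, the matrix alone needs roughly 16 GB of RAM:
gb_needed <- max_side^2 * 8 / 2^30
```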

Re: [R] bigmemory

2012-05-11 Thread Jay Emerson
R internally uses 32-bit integers for indexing (though this may change). For this and other reasons, these external objects with specialized purposes (larger-than-RAM, shared memory) simply can't behave exactly as R objects. Best case, some R functions will work. Others would simply break. Others

Re: [R] bigmemory

2012-05-11 Thread Jay Emerson
To answer your first question about read.big.matrix(), we don't know what your acc3.dat file is, but it doesn't appear to have been detected as a standard file (like a CSV file) or -- perhaps -- doesn't even exist (or doesn't exist in your current directory)? Next: > In addition, I am planning to

[R] bigmemory on Solaris

2011-12-01 Thread Jay Emerson
At one point we might have gotten something working (an older version?) on Solaris x86, but as far as I remember we were never successful on Solaris SPARC -- it isn't a platform we can test and support. We believe there are problems with BOOST library compatibilities. We'll try (again) to clear up the other

Re: [R] Foreach (doMC)

2011-10-20 Thread Jay Emerson
were of great help to > me on several occasions and I have deep respect for everybody devoting his > time to open source software! > > Jannis > > > > On 10/19/2011 01:26 PM, Jay Emerson wrote: >>> >>> P.S. Is there any particular reason why there are s

Re: [R] Foreach (doMC)

2011-10-19 Thread Jay Emerson
> P.S. Is there any particular reason why there are so seldom answers to posts > regarding foreach and all these doMC/doSMP packages ? Do so few people use > these packages or does this have anything to do with the commercial origin of > these packages? Jannis, An interesting question. I'm a

Re: [R] efficient coding with foreach and bigmemory

2011-09-30 Thread Jay Emerson
First, we strongly recommend 64-bit R. Otherwise, you may not be able to scale up as far as you would like. Second, as I think you realize, with big objects you may have to do things in chunks. I generally recommend working a column at a time rather than in blocks of rows if possible (better per
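A minimal sketch of the column-at-a-time pattern, using a plain matrix as a stand-in for a big.matrix (the variable names are made up for illustration):

```r
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

# Process one column at a time, so only a single column's worth of
# data is pulled into working memory on each iteration:
col_means <- numeric(ncol(x))
for (j in seq_len(ncol(x))) {
  col_means[j] <- mean(x[, j])
}
```

With a filebacked big.matrix the loop body is identical; only the object behind `x[, j]` changes.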

Re: [R] Exception while using NeweyWest function with doMC

2011-08-29 Thread Jay Emerson
Simon, Though we're pleased to see another use of bigmemory, it really isn't clear that it is gaining you anything in your example; anything like as.big.matrix(matrix(...)) still consumes full RAM for both the inner matrix() and the new big.matrix -- is the filebacking really necessary? It also do

Re: [R] Installation of bigmemory fails

2011-06-25 Thread Jay Emerson
Premal, Package authors generally welcome direct emails. We've been away from this project since the release of 2.13.0 and I only just noticed the build errors. These generally occur because of some (usually small and solvable) problem with compilers and the BOOST libraries. We'll look at it an

Re: [R] Kolmogorov-smirnov test

2011-02-28 Thread Jay Emerson
Taylor Arnold and I have developed a package ks.test (available on R-Forge in beta version) that modifies stats::ks.test to handle discrete null distributions for one-sample tests. We also have a draft of a paper we could provide (email us). The package uses methodology of Conover (1972) and Gles

Re: [R] lm without intercept

2011-02-18 Thread Jay Emerson
No, this is a cute problem, though: the definition of R^2 changes without the intercept, because the "empty" model used for calculating the total sums of squares is always predicting 0 (so the total sums of squares are sums of squares of the observations themselves, without centering around the sample mean
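The point can be checked directly in base R: summary.lm() uses the uncentered total sum of squares when the intercept is dropped (the data below are simulated purely for illustration):

```r
set.seed(1)
x <- 1:20
y <- 5 + 0.1 * x + rnorm(20)

fit_int <- lm(y ~ x)      # with intercept
fit_no  <- lm(y ~ x + 0)  # without intercept

# With an intercept, R^2 compares against the mean-only model:
r2_int <- 1 - sum(resid(fit_int)^2) / sum((y - mean(y))^2)

# Without one, the "empty" model always predicts 0, so the total
# sum of squares is uncentered:
r2_no <- 1 - sum(resid(fit_no)^2) / sum(y^2)
```

Both hand computations reproduce what summary() reports, which is why the no-intercept R^2 is often surprisingly large.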

Re: [R] [Fwd: adding more columns in big.matrix object of bigmemory package]

2010-12-17 Thread Jay Emerson
For good reasons (having to do with avoiding copies of massive things) we leave such merging to the user: create a new filebacking of the proper size, and fill it (likely a column at a time, assuming you have enough RAM to support that). Jay On Fri, Dec 17, 2010 at 2:16 AM, utkarshsinghal wrote:
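The suggested pattern, sketched with ordinary matrices (with bigmemory the target would instead come from filebacked.big.matrix(), but the fill loop is the same):

```r
a <- matrix(as.numeric(1:6),  nrow = 3)   # 3 x 2
b <- matrix(as.numeric(7:12), nrow = 3)   # 3 x 2

# Pre-create the target at the proper combined size, then fill it
# a column at a time (never holding a full extra copy of a and b):
out <- matrix(NA_real_, nrow = 3, ncol = ncol(a) + ncol(b))
for (j in seq_len(ncol(a))) out[, j] <- a[, j]
for (j in seq_len(ncol(b))) out[, ncol(a) + j] <- b[, j]
```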

Re: [R] big data and lmer

2010-10-22 Thread Jay Emerson
Though bigmemory, ff, and other big data solutions (databases, etc...) can help easily manage massive data, their data objects are not natively compatible with all the advanced functionality of R. Exceptions include lm and glm (both ff and bigmemory support this via Lumley's biglm package), kmeans,

Re: [R] merging and working with big data sets

2010-10-12 Thread Jay Emerson
I can't speak for ff and filehash, but bigmemory's data structure doesn't allow "clever" merges (for actually good reasons). However, it is still probably less painful (and faster) than other options, though we don't implement it: we leave it to the user because details may vary depending on the e

Re: [R] bigmemory doubt

2010-09-08 Thread Jay Emerson
By far the easiest way to achieve this would be to use the bigmemory C++ structures in your program itself. However, if you do something on your own (but fundamentally have a column-major matrix in shared memory), it should be possible to play around with the pointer with R/bigmemory to accomplish

Re: [R] Bigmemory: Error Running Example

2010-08-11 Thread Jay Emerson
It seems very likely you are working on a 32-bit version of R, but it's a little surprising still that you would have a problem with any single year. Please tell us the operating system and version of R. Did you preprocess the airline CSV file using the utilities provided on bigmemory.org? If you

Re: [R] (help) This is an R workspace memory processing question

2010-06-23 Thread Jay Emerson
You should look at packages like ff, bigmemory, RMySQL, and so on. However, you should really consider moving to a different platform for large-data work (Linux, Mac, or Windows 7 64-bit). Jay

Re: [R] Parallel computing on Windows (foreach) (Sergey Goriatchev)

2010-06-16 Thread Jay Emerson
foreach (or virtually anything you might use for concurrent programming) only really makes sense if the work the "clients" are doing is substantial enough to overwhelm the communication overhead. And there are many ways to accomplish the same task more or less efficiently (for example, doing block
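The block idea, sketched serially with lapply() (a `foreach(b = blocks) %dopar% { ... }` loop would parallelize the same structure; the sizes here are toy values):

```r
x <- matrix(rnorm(12 * 10), nrow = 12)

# Give each "worker" a block of rows rather than one row apiece,
# so real work dominates the communication overhead:
blocks  <- split(seq_len(nrow(x)), rep(1:3, each = 4))
partial <- lapply(blocks, function(rows) rowSums(x[rows, , drop = FALSE]))
res     <- unlist(partial, use.names = FALSE)
```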

[R] [R-pkgs] bigmemory 4.2.3

2010-05-17 Thread Jay Emerson
visit http://www.bigmemory.org/. Jay Emerson & Mike Kane -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay

[R] [R-pkgs] "Bayesian change point" package bcp 2.2.0 available

2010-05-17 Thread Jay Emerson
consider taking advantage of it for tasks that might be computationally intensive and could be easily done in parallel.  Some vignettes are available at http://cran.r-project.org/web/packages/foreach/index.html. Jay Emerson & Chandra Erdman (Apologies, the first version of this announcement was not pl

[R] [R-pkgs] "Bayesian change point" package bcp 2.2.0 available

2010-05-10 Thread Jay Emerson
consider taking advantage of it for tasks that might be computationally intensive and could be easily done in parallel. Some vignettes are available at http://cran.r-project.org/web/packages/foreach/index.html. Jay Emerson & Chandra Erdman -- John W. Emerson (Jay) Associate Professor of Stati

Re: [R] bigmemory package woes

2010-04-24 Thread Jay Emerson
Zerdna, Please note that the CRAN version 3.12 is about to be replaced by a new cluster of packages now on R-Forge; we consider the new bigmemory >= 4.0 to be "stable" and recommend you start using it immediately. Please see http://www.bigmemory.org. In your case, two comments: (1) Your for() l

Re: [R] Huge data sets and RAM problems

2010-04-20 Thread Jay Emerson
Stella, A few brief words of advice: 1. Work through your code a line at a time, making sure that each is what you would expect. I think some of your later problems are a result of something early not being as expected. For example, if the read.delim() is in fact not giving you what you expect,

Re: [R] large dataset

2010-03-27 Thread Jay Emerson
A little more information would help, such as the number of columns. I imagine it must be large, because 100,000 rows isn't overwhelming. Second, does the read.csv() fail, or does it work but only after a long time? And third, how much RAM do you have available? R Core provides some guidelines
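A rough rule of thumb (my own, not from the original thread): a numeric cell costs 8 bytes, and read.csv() plus downstream copies can transiently need a multiple of the object's size. For a hypothetical 100,000 x 500 numeric data set:

```r
rows <- 1e5
cols <- 500   # hypothetical column count

# In-memory size of the numeric data alone, in MB (~381 MB here);
# transient copies during import can push the peak well beyond this.
mb <- rows * cols * 8 / 2^20
```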

Re: [R] Mosaic plots

2010-03-23 Thread Jay Emerson
As pointed out by others, vcd supports mosaic plots on top of the grid engine (which is extremely helpful for those of us who love playing around with grid). The standard mosaicplot() function is directly available (it isn't clear if you knew this). The proper display of names is a real challenge

Re: [R] question about bigmemory: releasing RAM from a big.matrix that isn't used anymore

2010-02-06 Thread Jay Emerson
>>> See inline for responses. But people are always welcome to contact >>> us directly. Hi all, I'm on a Linux server with 48Gb RAM. I did the following: x <- big.matrix(nrow=2,ncol=50,type='short',init=0,dimnames=list(1:2,1:50)) #Gets around the 2^31 issue - yeah! >>> We stron

Re: [R] Multicore package: sharing/modifying variable accross processes

2009-10-31 Thread Jay Emerson
Renaud, Package bigmemory can help you with shared-memory matrices, either in RAM or filebacked. Mutex support currently exists as part of the package, although for various reasons will soon be abstracted from the package and provided via a new package, synchronicity. bigmemory works beautifully

Re: [R] Estimation in a changepoint regression with R

2009-10-16 Thread Jay Emerson
Package bcp does Bayesian changepoint analysis, though not in the general regression framework. The most recent reference is Bioinformatics 24(19), 2143-2148, doi: 10.1093/bioinformatics/btn404; slightly older is JSS 23(3). Both reference some alternatives you might want to consider (including s

Re: [R] reading web log file into R

2009-09-23 Thread Jay Emerson
Sebastian, There is rarely a completely free lunch, but fortunately for us R has some wonderful tools to make this possible. R supports regular expressions with commands like grep(), gsub(), strsplit(), and others documented on the help pages. It's just a matter of constructing an algorithm tha
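A small sketch of the idea with a made-up Apache-style log line (the pattern and field names are illustrative, not from the original thread):

```r
line <- '127.0.0.1 - - [23/Sep/2009:10:00:00 -0400] "GET /index.html HTTP/1.1" 200 2326'

# One capture group per field of interest:
pat <- '^(\\S+) \\S+ \\S+ \\[([^]]+)\\] "([^"]*)" (\\d+) (\\d+)'
m   <- regmatches(line, regexec(pat, line))[[1]]

ip      <- m[2]
stamp   <- m[3]
request <- m[4]
status  <- as.integer(m[5])
bytes   <- as.integer(m[6])
```

Reading the file with readLines() and vectorizing this over all lines gets you a clean data.frame.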

Re: [R] kmeans.big.matrix

2009-07-22 Thread Jay Emerson
This sort of question is ideal to send directly to the maintainer. We've removed kmeans.big.matrix for the time being and will place it in a new package, bigmemoryAnalytics. bigmemory itself is the core building block and tool, and we don't want to pollute it with lots of extras. Allan's point i

Re: [R] Building a big.matrix using foreach

2009-07-19 Thread Jay Emerson
Michael, If you have a big.matrix, you just want to iterate over the rows. I'm not in R and am just making this up on the fly (from a bar in Beijing, if you believe that): foreach(i=1:nrow(x),.combine=c) %dopar% f(x[i,]) should work, essentially applying the function f() to the rows of x? But p
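For reference, a serial base-R equivalent of that one-liner (f() here is a hypothetical per-row function returning a single number):

```r
x <- matrix(rnorm(5 * 3), nrow = 5)
f <- function(row) sum(row)   # stand-in for the real per-row work

# Serial equivalent of: foreach(i = 1:nrow(x), .combine = c) %dopar% f(x[i, ])
res <- vapply(seq_len(nrow(x)), function(i) f(x[i, ]), numeric(1))
```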

Re: [R] bigmemory - extracting submatrix from big.matrix object

2009-06-03 Thread Jay Emerson
source code of the function "colmean" can help, > if that is not too much to ask for. Or if we can develop a function similar > to "apply" of the base R. > > > Regards > Utkarsh > > > > > Jay Emerson wrote: >> >> We also have ColCountNA(),

Re: [R] bigmemory - extracting submatrix from big.matrix object

2009-06-02 Thread Jay Emerson
Thanks for trying this out. Problem 1. We'll check this. Options should certainly be available. Thanks! Problem 2. Fascinating. We just (yesterday) implemented a sub.big.matrix() function doing exactly this, creating something that is a big matrix but which just references a contiguous subset

[R] [R-pkgs] Major bigmemory revision released.

2009-04-16 Thread Jay Emerson
matrices for larger-than-RAM applications. We're working on updating the package vignette, and a draft is available upon request (just send me an email if you're interested). The user interface is largely unchanged. Feedback, bug reports, etc... are welcome. Jay Emerson & Michael Kane

Re: [R] Using very large matrix

2009-03-02 Thread Jay Emerson
released > on 32 bit R on 32 bit MS Windows and only closed source  I normally > use > 64 bit R on 64 bit Linux :) > > I tried to use the bigmemory in cran with 32 bit windows, but I had some > serious problems. > > Best, > > On Thursday 26 February 2009 1

Re: [R] Using very large matrix

2009-02-26 Thread Jay Emerson
Corrado, Package bigmemory has undergone a major re-engineering and will be available soon (available now in Beta version upon request). The version currently on CRAN is probably of limited use unless you're in Linux. bigmemory may be useful to you for data management, at the very least, where

[R] [R-pkgs] Package bigmemory now available on CRAN

2008-06-26 Thread Jay Emerson
Package "bigmemory" is now available on CRAN. A brief abstract follows: Multi-gigabyte data sets challenge and frustrate R users even on well-equipped hardware. C/C++ and Fortran programming can be helpful, but is cumbersome for interactive data analysis and lacks the flexibility and power of R's

Re: [R] R package building

2008-05-17 Thread Jay Emerson
I agree with others that the packaging system is generally easy to use, and between the "Writing R Extensions" documentation and other scattered sources (including these lists) there shouldn't be many obstacles. Using "package.skeleton()" is a great way to get started: I'd recommend just having on

Re: [R] R on a computer cluster

2008-02-17 Thread Jay Emerson
Gabriele, In addition to the suggestions from Markus (below), there is NetWorkSpaces (package nws). I have used both nws and snow together with a package I'm developing (bigmemoRy) which allocates matrices to shared memory (helping avoid the bottleneck Markus alluded to for processors on the same

Re: [R] Memory problem?

2008-01-31 Thread Jay Emerson
Elena, Page 23 of the R Installation Guide provides some memory guidelines that you might find helpful. There are a few things you could try using R, at least to get up and running: - Look at fewer tumors at a time using standard R as you have been. - Look at the ff package, which leaves the dat