Alan,
More RAM will definitely help. But if you have an object needing more than
2^31 - 1 (about 2 billion) elements, you'll hit a wall regardless. This could be
particularly limiting for matrices; it is less limiting for data.frame
objects (where each column can hold up to about 2 billion elements). But many R [...]
R internally uses 32-bit integers for indexing (though this may change).
For this and other reasons, these external objects with specialized purposes
(larger-than-RAM, shared memory) simply can't behave exactly like native R
objects. In the best case, some R functions will work; others will simply break; others [...]
To answer your first question about read.big.matrix(): we don't know what
your acc3.dat file is, but it doesn't appear to have been detected as a
standard file (like a CSV file), or perhaps it doesn't even exist (or isn't
in your current working directory).
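For reference, here is a minimal sketch of what a read.big.matrix() call on a
plain CSV file looks like; the file name and options below are placeholders,
not your actual data:
library(bigmemory)
# The indexing limit mentioned above: .Machine$integer.max is 2^31 - 1 = 2147483647
.Machine$integer.max
# Hypothetical comma-separated file with a header row; backing and descriptor
# files are written to disk so the object can be re-attached later.
x <- read.big.matrix("acc3.csv", header = TRUE, sep = ",", type = "double",
                     backingfile = "acc3.bin", descriptorfile = "acc3.desc")
dim(x)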
Next:
> In addition, I am planning to [...]
At one point we might have gotten something working (an older version?) on
Solaris x86, but we were never successful on Solaris SPARC as far as I
remember -- it isn't a platform we can test and support. We believe there are
problems with BOOST library compatibilities.
We'll try (again) to clear up the other [...]
> [...] were of great help to
> me on several occasions and I have deep respect for everybody devoting his
> time to open source software!
>
> Jannis
>
>
>
> On 10/19/2011 01:26 PM, Jay Emerson wrote:
> P.S. Is there any particular reason why there are so seldom answers to posts
> regarding foreach and all these doMC/doSMP packages ? Do so few people use
> these packages or does this have anything to do with the commercial origin of
> these packages?
Jannis,
An interesting question. I'm a [...]
First, we strongly recommend 64-bit R. Otherwise, you may not be able
to scale up as far as you would like.
Second, as I think you realize, with big objects you may have to do
things in chunks. I generally recommend working a column at a time
rather than in blocks of rows if possible (better performance [...]).
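Roughly, the column-at-a-time pattern looks like this (an untested sketch with
a toy matrix; only one column is ever held in RAM as an ordinary vector):
library(bigmemory)
X <- big.matrix(nrow = 1e6, ncol = 20, type = "double", init = 0)  # toy stand-in
col_means <- numeric(ncol(X))
for (j in seq_len(ncol(X))) {
  v <- X[, j]               # extract a single column as a regular numeric vector
  col_means[j] <- mean(v)   # any ordinary R function can be applied here
}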
Simon,
Though we're pleased to see another use of bigmemory, it really isn't
clear that it is gaining you anything in your example; anything like
as.big.matrix(matrix(...)) still consumes full RAM for both the inner
matrix() and the new big.matrix -- so is the filebacking really necessary?
It also [...]
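If the filebacking really is needed, a sketch of the alternative we usually
suggest (dimensions here are made up): allocate the filebacked matrix directly
and fill it in pieces, so the full in-RAM matrix() is never created:
library(bigmemory)
n <- 1e5; p <- 100
x <- filebacked.big.matrix(n, p, type = "double",
                           backingfile = "x.bin", descriptorfile = "x.desc")
for (j in seq_len(p)) {
  x[, j] <- rnorm(n)        # fill one column at a time instead of copying a whole matrix
}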
Premal,
Package authors generally welcome direct emails.
We've been away from this project since the release of 2.13.0 and I only just
noticed the build errors. These generally occur because of some (usually
small and solvable) problem with compilers and the BOOST libraries. We'll
look at it and [...]
Taylor Arnold and I have developed a package, ks.test (available on R-Forge
in beta version), that modifies stats::ks.test to handle discrete null
distributions for one-sample tests. We also have a draft of a paper we could
provide (email us). The package uses methodology of Conover (1972) and Gleser [...]
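To illustrate the problem being addressed (the call to our package below is
hypothetical; it assumes the beta version keeps the stats::ks.test calling
convention and accepts a step function as the discrete null):
set.seed(1)
x <- sample(1:6, 100, replace = TRUE)      # draws from a discrete (fair die) null
stats::ks.test(x, ecdf(1:6))               # warns about ties; p-value assumes continuity
# library(ks.test)                         # beta package from R-Forge, per above
# ks.test(x, ecdf(1:6))                    # hypothetical call handling the discrete null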
No, this is a cute problem, though: the definition of R^2 changes without the
intercept, because the "empty" model used for calculating the total sum of
squares is always predicting 0 (so the total sum of squares is the sum of
squares of the observations themselves, without centering around the sample
mean).
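A quick check in R (simulated data) shows the two definitions side by side:
set.seed(1)
x <- rnorm(50)
y <- 10 + x + rnorm(50)
f1 <- lm(y ~ x)                                # with intercept
f0 <- lm(y ~ x - 1)                            # without intercept
summary(f1)$r.squared
1 - sum(resid(f1)^2) / sum((y - mean(y))^2)    # same value, computed by hand
summary(f0)$r.squared
1 - sum(resid(f0)^2) / sum(y^2)                # no centering around mean(y)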
For good reasons (having to do with avoiding copies of massive things)
we leave such merging to the user: create a new filebacking of the
proper size, and fill it (likely a column at a time, assuming you have
enough RAM to support that).
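Something like the following untested sketch (names and sizes are made up),
here stacking the rows of two matrices A and B into one filebacked result:
library(bigmemory)
A <- big.matrix(1000, 5, type = "double", init = 1)   # stand-ins for your
B <- big.matrix(1000, 5, type = "double", init = 2)   # actual filebacked data
out <- filebacked.big.matrix(nrow(A) + nrow(B), ncol(A), type = "double",
                             backingfile = "merged.bin",
                             descriptorfile = "merged.desc")
for (j in seq_len(ncol(A))) {
  out[, j] <- c(A[, j], B[, j])                        # fill one column at a time
}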
Jay
On Fri, Dec 17, 2010 at 2:16 AM, utkarshsinghal
wrote:
Though bigmemory, ff, and other big data solutions (databases, etc...)
can help easily manage massive data, their data objects are not
natively compatible with all the advanced functionality of R.
Exceptions include lm and glm (both ff and bigmemory support this via
Lumley's biglm package), kmeans, [...]
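For example, a rough sketch of the chunked approach with biglm (the chunks
here are simulated stand-ins for pieces of a large file):
library(biglm)
make_chunk <- function(n) {
  x <- rnorm(n)
  data.frame(x = x, y = 1 + 2 * x + rnorm(n))
}
fit <- biglm(y ~ x, data = make_chunk(1e4))    # fit on the first chunk
for (i in 1:9) {
  fit <- update(fit, make_chunk(1e4))          # fold in the remaining chunks
}
summary(fit)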
I can't speak for ff and filehash, but bigmemory's data structure
doesn't allow "clever" merges (for good reasons, actually). However,
it is still probably less painful (and faster) than other options,
though we don't implement it ourselves: we leave it to the user because
the details may vary depending on the [...]
By far the easiest way to achieve this would be to use the bigmemory
C++ structures in your program itself. However, if you do something
on your own (but fundamentally have a column-major matrix in shared
memory), it should be possible to play around with the pointer with
R/bigmemory to accomplish [...]
It seems very likely you are working on a 32-bit version of R, but it's a
little surprising still that you would have a problem with any single year.
Please tell us the operating system and version of R. Did you preprocess
the airline CSV file using the utilities provided on bigmemory.org? If you [...]
You should look at packages like ff, bigmemory, RMySQL, and so on. However,
you should really consider moving to a different platform for large-data
work (Linux, Mac, or Windows 7 64-bit).
Jay
This is an R workspace memory processing question.
foreach (or virtually anything you might use for concurrent programming)
only really makes sense if the work the "clients" are doing is substantial
enough to overwhelm the communication overhead. And there are many ways to
accomplish the same task more or less efficiently (for example, doing block
[...]
visit http://www.bigmemory.org/.
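As a toy illustration of that overhead point (a sketch only; two cores assumed):
library(foreach)
library(doParallel)
registerDoParallel(cores = 2)
x <- runif(1e6)
# Fine-grained: one tiny task per element, dominated by communication overhead.
# res1 <- foreach(xi = x, .combine = c) %dopar% sqrt(xi)
# Coarse-grained: a few large blocks, so the workers do substantial work per task.
blocks <- split(x, cut(seq_along(x), 4, labels = FALSE))
res2 <- foreach(b = blocks, .combine = c) %dopar% sqrt(b)
stopImplicitCluster()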
Jay Emerson & Mike Kane
--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay
[...] consider taking advantage of it for tasks that
might be computationally intensive and could be easily done in
parallel. Some vignettes are available at
http://cran.r-project.org/web/packages/foreach/index.html.
Jay Emerson & Chandra Erdman
(Apologies, the first version of this announcement was not pl[...])
[...] consider taking advantage of it for tasks that might be computationally
intensive and could be easily done in parallel. Some vignettes are
available at http://cran.r-project.org/web/packages/foreach/index.html.
Jay Emerson & Chandra Erdman
--
John W. Emerson (Jay)
Associate Professor of Statistics
Zerdna,
Please note that the CRAN version 3.12 is about
to be replaced by a new cluster of packages now on R-Forge; we consider the
new bigmemory >= 4.0 to be "stable" and recommend you start using it
immediately. Please see http://www.bigmemory.org.
In your case, two comments:
(1) Your for() loop [...]
Stella,
A few brief words of advice:
1. Work through your code a line at a time, making sure that each step gives
what you would expect. I think some of your later problems are a result of
something earlier not being as expected. For example, if the read.delim() is
in fact not giving you what you expect, [...]
A little more information would help. First, how many columns are there? I
imagine the number must be large, because 100,000 rows isn't overwhelming.
Second, does the read.csv() fail, or does it work but only after a long time?
And third, how much RAM do you have available?
R Core provides some guidelines [...]
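In the meantime, a few things often worth trying with read.csv() on a wide
file (illustrative only; "big.csv" and the counts are placeholders):
peek <- read.csv("big.csv", nrows = 1000)      # inspect the structure cheaply first
classes <- sapply(peek, class)
dat <- read.csv("big.csv",
                colClasses = classes,          # avoid re-guessing the type of every column
                nrows = 110000,                # a slight overestimate of the row count helps
                comment.char = "")             # disable comment scanning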
As pointed out by others, vcd supports mosaic plots on top of the grid
engine (which is extremely helpful for those of us who love playing around
with grid). The standard mosaicplot() function is directly available (it
isn't clear if you knew this). The proper display of names is a real
challenge
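For example, with a built-in table:
mosaicplot(HairEyeColor, main = "Hair and eye color")   # base graphics version
library(vcd)
mosaic(HairEyeColor, shade = TRUE)                      # grid-based version from vcd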
>>> See inline for responses. But people are always welcome to contact
>>> us directly.
Hi all,
I'm on a Linux server with 48Gb RAM. I did the following:
x <- big.matrix(nrow=2, ncol=50, type='short', init=0, dimnames=list(1:2, 1:50))
# Gets around the 2^31 issue - yeah!
>>> We strongly [...]
Renaud,
Package bigmemory can help you with shared-memory matrices, either in RAM or
filebacked. Mutex support currently exists as part of the package, although
for various reasons it will soon be abstracted from the package and provided
via a new package, synchronicity.
bigmemory works beautifully [...]
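The basic pattern is to pass a small descriptor around and attach to the same
memory elsewhere. A minimal sketch (run here in a single session, but the same
describe()/attach.big.matrix() pair is what you would use across processes):
library(bigmemory)
x <- big.matrix(1000, 10, type = "double", init = 0)
desc <- describe(x)              # lightweight descriptor, cheap to hand to a worker
y <- attach.big.matrix(desc)     # e.g. inside a worker process
y[1, 1] <- 42
x[1, 1]                          # 42: both names refer to the same underlying memory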
Package bcp does Bayesian changepoint analysis, though not in the general
regression framework. The most recent reference is Bioinformatics 24(19),
2143-2148, doi: 10.1093/bioinformatics/btn404; slightly older is JSS 23(3).
Both reference some alternatives you might want to consider (including s[...]).
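A tiny bcp example on simulated data with one obvious changepoint:
library(bcp)
set.seed(1)
y <- c(rnorm(50, mean = 0), rnorm(50, mean = 3))
fit <- bcp(y)
plot(fit)      # posterior means and posterior probabilities of a change at each position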
Sebastian,
There is rarely a completely free lunch, but fortunately for us R has some
wonderful tools to make this possible. R supports regular expressions with
commands like grep(), gsub(), strsplit(), and others documented on the help
pages. It's just a matter of constructing an algorithm that [...]
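For example (made-up strings):
x <- c("sample_A_001.txt", "sample_B_002.txt", "notes.md")
grep("^sample_", x, value = TRUE)      # keep only the sample files
gsub("\\.txt$", ".csv", x)             # change the file extension
strsplit("sample_A_001", "_")[[1]]     # break a name into its pieces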
This sort of question is ideal to send directly to the maintainer.
We've removed kmeans.big.matrix for the time being and will place it in a
new package, bigmemoryAnalytics. bigmemory itself is the core building
block and tool, and we don't want to pollute it with lots of extras.
Allan's point is [...]
Michael,
If you have a big.matrix, you just want to iterate over the rows. I'm not
in R and am just making this up on the fly (from a bar in Beijing, if you
believe that):
foreach(i=1:nrow(x),.combine=c) %dopar% f(x[i,])
should work, essentially applying the function f() to the rows of x. But [...]
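A slightly fuller (still untested) sketch, registering a backend and attaching
the big.matrix inside the workers via its descriptor:
library(bigmemory)
library(foreach)
library(doParallel)
registerDoParallel(cores = 2)
x <- big.matrix(100, 5, type = "double", init = 1)
desc <- describe(x)
f <- function(row) sum(row)                      # stand-in for the real per-row work
res <- foreach(i = 1:nrow(x), .combine = c, .packages = "bigmemory") %dopar% {
  xi <- attach.big.matrix(desc)                  # re-attach to the shared matrix
  f(xi[i, ])
}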
> [...] source code of the function "colmean" can help,
> if that is not too much to ask for. Or if we can develop a function similar
> to "apply" of the base R.
>
>
> Regards
> Utkarsh
>
>
>
>
> Jay Emerson wrote:
>>
>> We also have ColCountNA(), [...]
Thanks for trying this out.
Problem 1. We'll check this. Options should certainly be available. Thanks!
Problem 2. Fascinating. We just (yesterday) implemented a sub.big.matrix()
function doing exactly this, creating something that is a big.matrix but
which just references a contiguous subset of the original matrix.
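Usage looks roughly like this (untested; argument names from memory, see
?sub.big.matrix):
library(bigmemory)
x   <- big.matrix(1e4, 10, type = "double", init = 0)
top <- sub.big.matrix(x, firstRow = 1, lastRow = 100)   # a view, not a copy
dim(top)        # 100 x 10
top[1, 1] <- 99
x[1, 1]         # 99: the sub-matrix shares memory with x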
[...] matrices for larger-than-RAM applications. We're working on updating
the package vignette, and a draft is available upon request (just send
me an email if you're interested). The user interface is largely unchanged.
Feedback, bug reports, etc... are welcome.
Jay Emerson & Michael Kane
> [...] released
> on 32 bit R on 32 bit MS Windows and only closed source. I normally use
> 64 bit R on 64 bit Linux :)
>
> I tried to use the bigmemory in cran with 32 bit windows, but I had some
> serious problems.
>
> Best,
>
> On Thursday 26 February 2009 1[...]
Corrado,
Package bigmemory has undergone a major re-engineering and will be available
soon (available now in beta version upon request). The version currently on
CRAN is probably of limited use unless you're on Linux.
bigmemory may be useful to you for data management, at the very least, where [...]
Package "bigmemory" is now available on CRAN. A brief abstract follows:
Multi-gigabyte data sets challenge and frustrate R users even on
well-equipped hardware.
C/C++ and Fortran programming can be helpful, but is cumbersome for
interactive data analysis and lacks the flexibility and power of R's [...]
I agree with others that the packaging system is generally easy to
use, and between the "Writing R Extensions" documentation and other
scattered sources (including these lists) there shouldn't be many
obstacles. Using "package.skeleton()" is a great way to get started:
I'd recommend just having on[...]
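For example, with a couple of functions sitting in your workspace:
hello  <- function(name) paste("Hello,", name)
square <- function(x) x^2
package.skeleton(name = "myFirstPkg", list = c("hello", "square"))
# then edit the generated DESCRIPTION and .Rd files and run R CMD check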
Gabriele,
In addition to the suggestions from Markus (below), there is
NetWorkSpaces (package nws). I have used both nws and snow together
with a package I'm developing (bigmemoRy) which allocates matrices to
shared memory (helping avoid the bottleneck Markus alluded to for
processors on the same machine).
Elena,
Page 23 of the R Installation Guide provides some memory guidelines
that you might find helpful.
There are a few things you could try using R, at least to get up and running:
- Look at fewer tumors at a time using standard R as you have been.
- Look at the ff package, which leaves the data on disk [...]
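A minimal ff illustration (sizes made up):
library(ff)
x <- ff(vmode = "double", length = 1e7)   # lives in a file, not in RAM
x[1:5] <- rnorm(5)
x[1:5]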