I see.  For now: Yes, you need to change and recompile.
I will take a look what was actually changed and will run some test cases.

Best,
Uwe


On 06.10.2011 16:50, Patrick McCann wrote:
The error I am referring to is in unique.c in Base R, it cannot
accomodate greater than 2^29 values, even though it appears the
overflow protection should be 2^30. The only relevance of the arules
package is I was using it while I discovered this issue.

Thanks,
Patrick



2011/10/6 Uwe Ligges<lig...@statistik.tu-dortmund.de>:


On 05.10.2011 22:15, Patrick McCann wrote:

Hi,

I am trying to read in a rather large list of transactions using the
arules library.

You mean the arules package?


It seems in the coerce method into the dgCmatrix, it
somewhere calls unique. Unique.c throws an error when  n>    536870912;
however, when 4*n was modified to 2*n in 2004, the overflow protection
should have changed from 2^29 to 2^30, right? If so, how would I
change it in my copy? Do I have to recompile everything?

Yes.

There is also the way to ask the maintainer to improve it, but it won't work
without reinstallation of the changed package sources.

Uwe Ligges



Thanks,
Patrick McCann


Here is a simple to reproduce example:

runif(2^29+5)->a
sum(unique(a))->b

Error in unique.default(a) : length 536870917 is too large for hashing

traceback()

3: unique.default(a)
2: unique(a)
1: unique(a)

unique.default

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
     z<- .Internal(unique(x, incomparables, fromLast))
     if (is.factor(x))
         factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
             ordered = is.ordered(x))
     else if (inherits(x, "POSIXct"))
         structure(z, class = class(x), tzone = attr(x, "tzone"))
     else if (inherits(x, "Date"))
         structure(z, class = class(x))
     else z
}
<environment: namespace:base>

 From http://svn.r-project.org/R/trunk/src/main/unique.c I see:


/*
  Choose M to be the smallest power of 2
  not less than 2*n and set K = log2(M).
  Need K>= 1 and hence M>= 2, and 2^M<= 2^31 -1, hence n<= 2^29.

  Dec 2004: modified from 4*n to 2*n, since in the worst case we have
  a 50% full table, and that is still rather efficient -- see
  R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
*/
static void MKsetup(int n, HashData *d)
{
    int n4 = 2 * n;

    if(n<    0 || n>    536870912) /* protect against overflow to -ve */
        error(_("length %d is too large for hashing"), n);
    d->M = 2;
    d->K = 1;
    while (d->M<    n4) {
        d->M *= 2;
        d->K += 1;
    }
}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to