[R] Maximum length for a -e argument to Rscript?

2017-04-21 Thread Ben Haller
  Hi!  I’m attempting to use Rscript to do some automated plotting.  It is 
working well, except that I seem to be running into a maximum line length 
issue, and I’m wondering if it is a bug on your end.  Here’s an example of the 
command I’m trying to run:

/usr/local/bin/Rscript -e '{x <- c(-1.31171, -0.686165, 1.62771, 0.320195, 
-0.322011, 1.66518, -0.271971, -0.665367, 0.516482, -0.716343, -0.317471, 
0.068046, -0.100371, -1.15907, 0.263329, -0.936049, -0.852444, 0.358817, 
-0.233959, 0.209891, -0.831575, -0.952987, -0.0420206, -1.78527, -0.280584, 
-0.62353, 1.42597, 0.127994, 0.0751232, 0.896835, -0.319488, 0.897876, 0.18457, 
0.779571, -0.0543194, 0.226722, -0.769983, -0.723463, 0.144386, -0.468544, 
-0.349417, 0.336786, 0.749212, -1.62397, 0.683075, -0.746449, 0.300921, 
-0.365468, 0.548271, 1.13169, -1.34042, -0.0740572, 1.34986, 0.531771, 
-0.147157, 0.824894, -1.05816, 1.58867, -0.885764, 1.11912, 0.361512, 1.77985, 
0.585099, -1.205, 2.44134, -0.331372, -0.346322, 0.0535267, -1.75089, 
0.0773243, -1.07846, -1.29632, 1.0622, 1.34867, 0.199777, 0.197516, 0.574185, 
1.06555, -0.885166, -0.788576, -1.46061, -1.54026, 0.690576, -0.88821, 
0.343747, -0.100751, -0.865596, -0.128504, 0.222334, -1.18932, -0.555258, 
-0.557368, 0.219272, 0.298858, 0.848022, 0.142608, 1.10082, -0.348039, 
0.0566489, 0.662136, 0.50451, -0.909399, 1.02446, 1.40592, -0.114786, -1.10718, 
2.02549, 0.0818607, -1.037, 1.18961, -0.204, 2.83165, -0.959653, -0.393082, 
-0.463351, 0.914054, 1.14472, -1.32927, 1.25416, 0.372267, 0.410832, 1.04187, 
1.22288, 1.27131, 0.0949385, 0.194053, -0.226184, -0.502155, -1.36834, 
-0.000591861, -0.565903, 1.14099, 1.67811, 0.331709, -0.756879, 0.889596, 
0.718098, 0.740242, -0.861818, 0.0332746, 1.01745, 0.584536, -1.14245, 
-0.85, -1.34237, 0.660603, 1.16048, -0.898828, 0.965746, -1.16953, 
-2.33417, 0.591078, -0.364892, 0.0719267, -1.21676, 1.12646, 1.37534, 
0.0712832, 1.22889, -0.0110024, 0.248213, -1.12013, -0.525197, -0.352156, 
-0.317182, -0.89552, 1.53422, -1.36777, 1.52099, 1.18789, -3.15251, 1.24008, 
-0.564289, -0.515629, -0.0920464, 2.94027, 0.895481, -0.157643, -0.347874, 
-0.290823, -0.771436, 1.29285, 0.216689, -1.86856, 2.24075, 0.888635, 0.430417, 
-0.585856, 1.13119, -0.243977, 0.544491, 0.921995, 0.815365, 1.2584, -1.29347, 
0.0574579, 0.990557, -1.58657, -0.264373, 0.865893, 0.599298, -0.417531, 
0.132897, 1.88597, 1.33112, -0.880904, 0.0762161, 0.0567852, 0.593295, 
-0.632135, 0.885625, 0.365863, -0.17885, 0.420185, -0.508275, 0.974357, 
0.628085, 0.710578, 1.72447, 1.38488, 1.01301, 1.30015, 0.260501, 0.808981, 
0.440228, 0.416344, -1.66273, -0.397224, -0.512086, -0.175854, -0.663143, 
0.369712, -1.01654, 0.660465, 0.124851, -1.51101, -0.95725, 2.09893, 1.26819, 
1.08086, 0.493204, 0.79073, 1.49191, 0.563689, 0.414473, 2.27361, 0.871923, 
0.193703, -0.185039, -0.312859, -1.42267, -2.11561, 0.311996, -0.0906527, 
1.19139, 1.57502, 1.10587, 0.416333, 2.35374, -1.0531, 0.0450512, 0.979958, 
0.398269, 0.0897618, -0.855305, -1.59337, -0.084904, 0.245872, 1.27115, 1.3512, 
-0.166962, 1.01098, -1.19854, -2.05932, -0.98, 0.704973, 0.694688, 1.20408, 
-1.12553, 0.770056, 1.01602, 0.295223, -1.18801, 1.51707, 1.1418, -0.148787, 
1.28886, 1.23981, 1.67984, 0.0185941, -0.877581, 0.495042, -0.368668, 1.59972, 
-2.20849, -1.36852, -0.972566, -1.01848, -0.366674, -2.60273, -0.540706, 
-0.475797, 0.227651, -1.11476, 1.73452, -0.212185, 3.04994, -0.251225, 
-0.0443482, -0.489125, 0.557936, -0.246389, -0.954287, 0.388668, 0.759049, 
-0.501683, -1.98722, 0.158712, -0.897082, -1.17644, 0.192465, -1.49901, 
-0.289129, -0.0460198, -0.520331, 0.432488, -0.471775, 1.21482, 0.998091, 
-0.794933, -0.36989, 0.937091, 1.27297, 1.06108, -0.784307, 0.70919, -0.309221, 
-0.516031, 0.479702, -0.669637, 1.60288, 0.657474, -0.666286, -1.01816, 
-0.452818, -0.283749, 1.05511, -1.2002, 0.112269, -1.37403, 1.00512, 0.664936, 
0.600516, -1.08099, -0.705522, 0.103051, 0.0461179, 1.74404, 0.727475, 2.41175, 
1.20414, 1.71095, 0.0957544, 0.610755, 0.545194, -0.936381, 0.190467, 0.485791, 
0.0855697, 0.142676, 0.721754, -1.84506, 2.1227, -1.1271, -1.11228, -1.2807, 
0.13385, 0.228727, -0.34391, 1.09837, -0.37053, 0.832574, 0.673463, 0.717732, 
-0.307687, 1.12486, 0.159803, -1.51251, 1.403, 2.0215, 0.010914, -0.543749, 
0.137119, 0.200364, -0.104029, -0.930966, -1.56781, -0.526978, -0.537582, 
1.11872, -0.99061, -0.501421, 1.21982, 0.813201, -0.539109, 0.433523, 
-0.0615188, 2.04298, 0.697692, 1.34515, 1.7298, 0.515137, 2.08119, 0.550985, 
1.49876, 1.31187, 0.920405, 0.597678, 0.884353, -0.732892, -0.143545, 
-0.236836, -0.330872, 1.55577, -1.74473, -0.493322, 1.46375, 1.14347, 1.76164, 
1.73099, -0.234701, -0.0546848, 0.346991, -0.393301, 1.34267, -1.58519, 
-0.381789, 0.622675, 1.34655, 2.84895, -0.371, -0.519666, -1.64944, 0.573592, 
1.06484, -0.0239894, -0.604563, 0.0680641, -0.881325, 1.07265, 0.182585, 
0.373288, 2.20228, -0.763593, -0.25659, 1.9397, -0.166943, -0.672522, -1.35983, 
0.227406, 0.49471, -1

Re: [R] Maximum length for a -e argument to Rscript?

2017-04-21 Thread Ben Haller
  OK, thanks for the information Ben and Henrik.  It sounds like even if the 
Rscript stall were fixed, there would still be a 10,000-byte limit on expressions 
passed with -e, as well as a roughly 4K maximum line length when using -e or 
stdin, if I understand the section Ben cited; so I need to switch to using a 
script file instead to get past these various limits.  Thanks for your help.
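
  For the record, here is a minimal sketch of the script-file approach (the file 
names and the plot call are illustrative stand-ins, not code from this thread): 
the calling program writes the values to a file, and the R script reads them 
from there rather than from -e.

# plot_values.R -- run as:  Rscript plot_values.R /tmp/values.txt
args <- commandArgs(trailingOnly = TRUE)  # command-line arguments after the script name
x <- scan(args[1], quiet = TRUE)          # read the numeric vector from the file
pdf("plot.pdf")
hist(x)                                   # stand-in for the real plotting code
dev.off()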

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


> On Apr 22, 2017, at 6:59 AM, Henrik Bengtsson  
> wrote:
> 
> That Rscript stalls sounds like a bug, but not sure it's R or the
> terminal that needs to be fixed:
> 
>  Rscript -e "$long_expression"
> WARNING: '-e 
> {x~+~<-~+~c(-1.31171,~+~-0.686165,~+~1.62771,~+~0.320195,~+~-0.322011,~+~1.66518,~+~-0.271971,~+~-0.665367,~+~0.516482,~+~-0.716343,~+~-0.317471,~+~0.068046,~+~-0.100371,~+~-1.15907,~+~0.263329,~+~-0.936049,~+~-0.852444,~+~0.358817,~+~-0.233959,~+~0.209891,~+~-0.831575,~+~-0.952987,~+~-0.0420206,~+~-1.78527,~+~-0.280584,~+~-0.62353,~+~1.42597,~+~0.127994,~+~0.0751232,~+~0.896835,~+~-0.319488,~+~0.897876,~+~0.18457,~+~0.779571,~+~-0.0543194,~+~0.226722,~+~-0.769983,~+~-0.723463,~+~0.144386,~+~-0.468544,~+~-0.349417,~+~0.336786,~+~0.749212,~+~-1.62397,~+~0.683075,~+~-0.746449,~+~0.300921,~+~-0.365468,~+~0.548271,~+~1.13169,~+~-1.34042,~+~-0.0740572,~+~1.34986,~+~0.531771,~+~-0.147157,~+~0.824894,~+~-1.05816,~+~1.58867,~+~-0.885764,~+~1.11912,~+~0.361512,~+~1.77985,~+~0.585099,~+~-1.205,~+~2.44134,~+~-0.331372,~+~-0.346322,~+~0.0535267,~+~-1.75089,~+~0.0773243,~+~-1.07846,~+~-1.29632,~+~1.0622,~+~1.34867,~+~0.199777,~+~0.197516,~+~0.574185,~+~1.06555,~+~-0.885166,~+~-0.788576,~+~-1.46061,~+~-1.5402
> ^C^C
> ^C
> ^C^C^C^C^C^C
> ^\Quit (core dumped)
> 
> On my default Ubuntu 16.04 terminal, R 3.3.3 hangs and does not
> respond to user interrupts (SIGINT), but it does respond to Ctrl-\
> (SIGQUIT).
> 
> A workaround is to pass the expression via standard input to R, e.g.
> 
> $ echo "$long_expression" | R --no-save
> 
> /Henrik
> 
> On Fri, Apr 21, 2017 at 11:07 AM, Ben Tupper  wrote:
>> Hi,
>> 
>> I suspect you are over the 10kb limit for the expression.  See
>> 
>> https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Invoking-R-from-the-command-line
>> 
>> Cheers,
>> Ben
>> 
>>> On Apr 21, 2017, at 3:44 AM, Ben Haller  wrote:
>>> 
>>> Hi!  I’m attempting to use Rscript to do some automated plotting.  It is 
>>> working well, except that I seem to be running into a maximum line length 
>>> issue, and I’m wondering if it is a bug on your end.  Here’s an example of 
>>> the command I’m trying to run:
>>> 
>>> /usr/local/bin/Rscript -e '{x <- c(-1.31171, -0.686165, 1.62771, 0.320195, 
>>> -0.322011, 1.66518, -0.271971, -0.665367, 0.516482, -0.716343, -0.317471, 
>>> 0.068046, -0.100371, -1.15907, 0.263329, -0.936049, -0.852444, 0.358817, 
>>> -0.233959, 0.209891, -0.831575, -0.952987, -0.0420206, -1.78527, -0.280584, 
>>> -0.62353, 1.42597, 0.127994, 0.0751232, 0.896835, -0.319488, 0.897876, 
>>> 0.18457, 0.779571, -0.0543194, 0.226722, -0.769983, -0.723463, 0.144386, 
>>> -0.468544, -0.349417, 0.336786, 0.749212, -1.62397, 0.683075, -0.746449, 
>>> 0.300921, -0.365468, 0.548271, 1.13169, -1.34042, -0.0740572, 1.34986, 
>>> 0.531771, -0.147157, 0.824894, -1.05816, 1.58867, -0.885764, 1.11912, 
>>> 0.361512, 1.77985, 0.585099, -1.205, 2.44134, -0.331372, -0.346322, 
>>> 0.0535267, -1.75089, 0.0773243, -1.07846, -1.29632, 1.0622, 1.34867, 
>>> 0.199777, 0.197516, 0.574185, 1.06555, -0.885166, -0.788576, -1.46061, 
>>> -1.54026, 0.690576, -0.88821, 0.343747, -0.100751, -0.865596, -0.128504, 
>>> 0.222334, -1.18932, -0.555258, -0.557368, 0.219272, 0.298858, 0.848022, 
>>> 0.142608, 1.10082, -0.348039, 0.0566489, 0.662136, 0.50451, -0.909399, 
>>> 1.02446, 1.40592, -0.114786, -1.10718, 2.02549, 0.0818607, -1.037, 1.18961, 
>>> -0.204, 2.83165, -0.959653, -0.393082, -0.463351, 0.914054, 1.14472, 
>>> -1.32927, 1.25416, 0.372267, 0.410832, 1.04187, 1.22288, 1.27131, 
>>> 0.0949385, 0.194053, -0.226184, -0.502155, -1.36834, -0.000591861, 
>>> -0.565903, 1.14099, 1.67811, 0.331709, -0.756879, 0.889596, 0.718098, 
>>> 0.740242, -0.861818, 0.0332746, 1.01745, 0.584536, -1.14245, -0.85, 
>>> -1.34237, 0.660603, 1.16048, -0.898828, 0.965746, -1.16953, -2.33417, 
>>> 0.591078, -0.364892, 0.0719267, -1.21676, 1.12646, 1.37534, 0.0712832, 
>>> 1.22889, -0.0110024, 0.248213, -1.12013, -0.525197, -0.352156, -0.317182, 
>>> -0.89552, 1.53422, -1.36777, 1.52099, 1.18789, -3.15251, 1.24008, 
>>> -0.564289, 

[R] Confidence intervals and polynomial fits

2011-05-06 Thread Ben Haller
  Hi all!  I'm getting a model fit from glm() (a binary logistic regression 
fit, but I don't think that's important) for a formula that contains powers of 
the explanatory variable up to fourth.  So the fit looks something like this 
(typing into mail; the actual fit code is complicated because it involves 
step-down and so forth):

x_sq <- x * x
x_cb <- x * x * x
x_qt <- x * x * x * x
model <- glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)

  This works great, and I get a nice fit.  My question regards the confidence 
intervals on that fit.  I am calling:

cis <- confint.default(model, level=0.95)

to get the confidence intervals for each coefficient.  This confint.default() 
call gives me a table like this (where "dispersal" is the real name of what I'm 
calling "x" above):

                   2.5 %      97.5 %
(Intercept)    8.7214297   8.9842533
dispersal    -37.5621412 -35.6629981
dispersal_sq  66.8077669  74.4490123
dispersal_cb -74.5963861 -62.8347057
dispersal_qt  22.6590159  28.6506186

  A side note: calling confint() instead of confint.default() is much, much 
slower -- my model has many other terms I'm not discussing here, and is fit to 
300,000 data points -- but a check indicates that the intervals returned for my 
model by confint() are virtually identical, so this is not the source of my 
problem:

                   2.5 %      97.5 %
(Intercept)    8.7216693   8.9844958
dispersal    -37.5643922 -35.6652276
dispersal_sq  66.8121519  74.4534753
dispersal_cb -74.5995820 -62.8377766
dispersal_qt  22.6592724  28.6509494

  These tables show the problem: the 95% confidence limits for the quartic term 
are every bit as wide as the limits for the other terms.  Since the quartic 
term coefficient gets multiplied by the fourth power of x, this means that the 
width of the confidence band starts out nice and narrow (when x is close to 
zero, where the width of the confidence band is pretty much just determined by 
the confidence limits on the intercept) but balloons out to be tremendously 
wide for larger values of x.  The width of it looks quite unreasonable to me, 
given that my fit is constrained by 300,000 data points.
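
  As an aside, a confidence band on the fitted curve itself can be computed from 
the full covariance matrix of the coefficients, rather than by combining 
per-coefficient limits; a minimal hedged sketch with simulated data (the 
variable names and sizes below are illustrative, not the real dataset):

set.seed(1)
x <- runif(5000)
outcome <- rbinom(5000, 1, plogis(1 - 3 * x + 2 * x^2))       # simulated stand-in data
model <- glm(outcome ~ x + I(x^2) + I(x^3) + I(x^4), family = binomial)
newd <- data.frame(x = seq(0, 1, length.out = 200))
pr <- predict(model, newdata = newd, type = "link", se.fit = TRUE)
band <- plogis(cbind(fit   = pr$fit,
                     lower = pr$fit - 1.96 * pr$se.fit,        # 95% band on the
                     upper = pr$fit + 1.96 * pr$se.fit))       # probability scale

Because predict() uses the whole covariance matrix of the coefficients, this 
band reflects the strong correlations among the polynomial terms, and is 
typically much narrower than the separate per-term intervals would suggest.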

  Intuitively, I would have expected the confidence limits for the quartic term 
to be much narrower than for, say, the linear term, because the effect of 
variation in the quartic term is so much larger.  But the way confidence 
intervals are calculated in R, at least, does not seem to follow my instincts 
here.  In other words, predicted values using the 2.5% value of the linear 
coefficient and the 97.5% value of the linear coefficient are really not very 
different, but predicted values using the 2.5% value of the quadratic 
coefficient and the 97.5% value of the quadratic coefficient are enormously, 
wildly divergent -- far beyond the range of variability in the original data, 
in fact.  I feel quite confident, in a seat-of-the-pants sense, that the actual 
95% limits of that quartic coefficient are much, much narrower than what R is 
giving me.

  Looking at it another way: if I just exclude the quartic term from the glm() 
fit, I get a dramatically narrower confidence band, even though the fit to the 
data is not nearly as good (I'm keeping the fourth power for good reasons).  
This again seems to me to indicate that the confidence band with the quartic 
term included is artificially, and incorrectly, wide.  If a fit with reasonably 
narrow confidence limits can be produced for the model with terms only up to 
cubic (or, taking this reductio even further, for a quadratic or a linear 
model, also), why does adding the quartic term, which improves the quality of 
the fit to the data, cause the confidence limits to nevertheless balloon 
outwards?  That seems fundamentally illogical to me, but maybe my instincts are 
bad.

  So what's going on here?  Is this just a limitation of the way confint() 
computes confidence intervals?  Is there a better function to use, that is 
compatible with binomial glm() fits?  Or am I misinterpreting the meaning of 
the confidence limits in some way?  Or what?

  Thanks for any advice.  I've had trouble finding examples of polynomial fits 
like this; people sometimes fit the square of the explanatory variable, of 
course, but going up to fourth powers seems to be quite uncommon, so I've had 
trouble finding out how others have dealt with these sorts of issues that crop 
up only with higher powers.

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Confidence intervals and polynomial fits

2011-05-06 Thread Ben Haller
> From what you have written, I am not exactly sure what your
> seat-of-the-pant sense is coming from.  My pantseat typically does not
> tell me much; however, quartic trends tend to be less stable than linear,
> so I am not terribly surprised.

  My pantseat is not normally very informative either, but if you saw the width 
of the confidence limits I'm getting for the quartic coefficient, I think your 
pantseat would agree with mine.  :->  The confidence band is staggeringly wide, 
many times the variation in the data itself; and with 300,000 data points to 
fit to, that just should not be the case.  With that many data points, it seems 
quite obvious that one can be "95% confident" that the true data lies within a 
band somewhere reasonably close to the band that the sample data actually fall 
into, not a band many times wider.

> As two side notes:
> 
> x_qt <- x^4 # shorter code-wise
> and you can certainly just enter a quartic term if the data is just
> quartic and you are not really interested in the lower trends.

  Yep, for sure.

  Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Confidence intervals and polynomial fits

2011-05-06 Thread Ben Haller
On May 6, 2011, at 12:31 PM, David Winsemius wrote:

> On May 6, 2011, at 11:35 AM, Ben Haller wrote:
> 
>> Hi all!  I'm getting a model fit from glm() (a binary logistic regression 
>> fit, but I don't think that's important) for a formula that contains powers 
>> of the explanatory variable up to fourth.  So the fit looks something like 
>> this (typing into mail; the actual fit code is complicated because it 
>> involves step-down and so forth):
>> 
>> x_sq <- x * x
>> x_cb <- x * x * x
>> x_qt <- x * x * x * x
>> model <- glm(outcome ~ x + x_sq + x_cb + x_qt, binomial)
> 
> It might have been easier with:
> 
> model <- glm(outcome ~ poly(x, 4) , binomial)

  Interesting.  The actual model I'm fitting has lots more terms, and needs to 
be able to be stepped down; sometimes some of the terms of the polynomials will 
get dropped while others get retained, for example.  But more to the point, 
poly() seems to be doing something tricky involving "orthogonal polynomials" 
that I don't understand.  I don't think I want whatever that is; I want my 
original variable x, plus its powers.  For example, if I do:

x <- runif(10)
poly(x, 3)

the columns I get are not x, x^2, x^3; they are something else.  So the poly() 
fit is not equivalent to my fit.
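
  For what it's worth, poly() can also return the raw powers instead of 
orthogonal columns via its raw argument; a quick illustrative check:

x <- runif(10)
head(poly(x, 3))                  # orthogonal polynomial columns: not x, x^2, x^3
head(poly(x, 3, raw = TRUE))      # raw = TRUE gives exactly x, x^2, x^3
all.equal(as.vector(poly(x, 3, raw = TRUE)[, 3]), x^3)   # TRUE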

> Since the authors of confint might have been expecting a poly() formulation 
> the results might be more reliable. I'm just using lm() but I think the 
> conclusions are more general:
> 
> ...
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.45737    0.52499   0.871   0.3893
> x           -0.75989    1.15080  -0.660   0.5131
> x2           1.30987    0.67330   1.945   0.0594 .
> x3          -0.03559    0.11058  -0.322   0.7494
> 
> ...
> 
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   5.4271     0.1434  37.839  < 2e-16 ***
> poly(x, 3)1  30.0235     0.9184  32.692  < 2e-16 ***
> poly(x, 3)2   8.7823     0.9184   9.563 1.53e-11 ***
> poly(x, 3)3  -0.2956     0.9184  -0.322    0.749

  Your illustration underscores what I mean.  Whatever these orthogonal 
polynomial terms are that you're using, they are clearly not the original x, 
x^2 and x^3, and they give a different fit than those terms do.  
I probably ought to learn about this technique, since it looks interesting; but 
for my purposes I need the fit to actually be in terms of x, since x is my 
explanatory variable.  And the fit I'm getting is highly significant (all terms 
< 2e-16), so the lack of fit problem you're illustrating does not seem to apply 
to my case.

  Or maybe I'm totally misunderstanding your point...?  :->

  Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Confidence intervals and polynomial fits

2011-05-06 Thread Ben Haller
On May 6, 2011, at 1:58 PM, Prof Brian Ripley wrote:

> On Fri, 6 May 2011, Bert Gunter wrote:
> 
>> FWIW:
>> 
>> Fitting higher order polynomials (say > 2) is almost always a bad idea.
>> 
>> See e.g.  the Hastie, Tibshirani, et. al book on "Statistical
>> Learning" for a detailed explanation why. The Wikipedia entry on
>> "smoothing splines" also contains a brief explanation, I believe.
>> 
>> Your ~0 P values for the coefficients also suggest problems/confusion
>> (!) -- perhaps you need to consider something along the lines of
>> "functional data analysis"  for your analysis.
>> 
>> Having no knowledge of your issues, these remarks are entirely
>> speculative and may well be wrong. So feel free to dismiss.
>> Nevertheless, you may find it useful to consult your local
>> statistician for help.
> 
> That is the main piece of advice I would have given.  But if you must DIY, 
> consider the merits of orthogonal polynomials.  Computing individual 
> confidence intervals for highly correlated coefficients is very dubious 
> practice.  Without the example the posting guide asked for, we can only guess 
> if that is what is happening.

  Thanks to both of you.  Yes, I am admittedly out of my depth; the 
statistician I would normally ask is on sabbatical, and I'm a bit at sea.  Of 
course McGill has a whole department of mathematics and statistics; I guess I 
ought to try to make a friend over there (I'm in the biology department).  
Anyhow, I've just downloaded the Hastie et al. book and will try to figure out 
whether my use of higher order polynomials is incorrect in my situation.  
Eliminating those would certainly solve my problem with the confidence 
intervals.  :->

  I was figuring that the ~0 P-values for the coefficients were just the result of 
my having 300,000 data points; I figured the regression procedure was thus able 
to pin down very accurate estimates of them.  I'll look into "functional data 
analysis" as you recommend, though; I'm entirely unfamiliar with it.

  As for correlated coefficients: x, x^2, x^3 etc. would obviously be highly 
correlated, for values close to zero.  Is this what you mean, as a potential 
source of problems?  Or if you mean that the various other terms in my model 
might be correlated with x, that is not the case; each independent variable is 
completely uncorrelated with the others (this data comes from simulations, so 
the independent variables for each data point were in fact chosen by random 
drawing).
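
  One direct way to check this kind of coefficient correlation is to look at the 
correlation matrix implied by the fit's covariance matrix; an illustrative 
one-liner (where model stands for the glm() fit discussed earlier):

# large off-diagonal entries among the x, x^2, x^3, x^4 rows would confirm
# that the individual confidence intervals are strongly interdependent
round(cov2cor(vcov(model)), 2)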

  It didn't seem easy to post an example, since my dataset is so large, but if 
either of you would be willing to look at this further, I could upload my 
dataset to a web server somewhere and post a link to it.  In any case, thanks 
very much for your help; I'll look into the things you mentioned.

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Confidence intervals and polynomial fits

2011-05-07 Thread Ben Haller
On May 6, 2011, at 4:27 PM, David Winsemius wrote:

> On May 6, 2011, at 4:16 PM, Ben Haller wrote:
>> 
> 
>> As for correlated coefficients: x, x^2, x^3 etc. would obviously be highly 
>> correlated, for values close to zero.
> 
> Not just for x close to zero:
> 
> > cor( (10:20)^2, (10:20)^3 )
> [1] 0.9961938
> > cor( (100:200)^2, (100:200)^3 )
> [1] 0.9966219

  Wow, that's very interesting.  Quite unexpected, for me.  Food for thought.  
Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Questions regrading the lasso and glmnet

2011-05-28 Thread Ben Haller
the product of the Cx 
column and the Cy column, and so it is not centered/scaled.  I don't know if 
this is correct or not.  So question 3:

3. Is my model matrix correct, or do I have a problem with the scale of the 
interaction variables?

  Having run glmnet(), I now have lots of coefficients for the different values 
of lambda.  One thing that seems problematic is that the coefficient for an 
interaction (Cx:Cy, say), will become non-zero while the coefficients for the 
underlying terms Cx and Cy are still zero.  Of course glmnet() has no way of 
preventing this, since it doesn't know the formula or which matrix columns are 
interactions, etc.  From my previous work on model selection via step-down, my 
understanding is that this state of affairs should not be allowed.  Using 
step-down, Cx and Cy will not be dropped as long as Cx:Cy is retained, for 
example.  As I understand it, this is for essentially the same conceptual 
reasons that one ought to retain the intercept in a regression even if it is 
non-significant; the slope of the regression depends upon it, and so dropping 
it is not correct.  However, I don't see any way that I can tell glmnet() about 
these relationships, since it doesn't have a functional interface.  So:

4. Is it a problem that the lasso fit gives non-zero coefficients for 
interactions whose underlying terms have zero coefficients?

  And finally comes the actual model selection problem.  The best lambda value 
given by cv.glmnet() is for a model that contains most of the 116 terms; I 
suppose that, as with the BIC step-down, this is because 300,000 rows is a lot 
of data, and so most of those terms are in fact well-supported by the data.  
But I want a simple explanatory model with only the terms of large effect, as 
described before.  I'm thinking of finding the prediction rate for each lambda 
value used by glmnet, and using the same subjective "simplest model with a 
pretty good prediction rate" criterion that I was using before.  But I'm 
wondering:

5. Is there any way to choose a simple explanatory model, smaller than the best 
predictive model supported by the data, that is less arbitrary / subjective?
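
  One hedged sketch of that "prediction rate per lambda" idea, using only 
standard glmnet calls (object names are illustrative; X is the model matrix and 
y the 0/1 outcome):

library(glmnet)
fit <- glmnet(X, y, family = "binomial")
pred <- predict(fit, newx = X, type = "class")   # one column of class predictions per lambda
accuracy <- colMeans(pred == y)                  # in-sample prediction rate for each lambda
cbind(lambda = fit$lambda, nonzero = fit$df, accuracy)

An out-of-sample version of the same table comes from cv.glmnet(X, y, family = 
"binomial", type.measure = "class"), which reports misclassification error 
across the lambda path.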

  And finally, I've got sort of a philosophical question that I'm struggling 
with that makes me wonder if the lasso is really the right tool for my job in 
the first place.  As lambda increases, the coefficients of terms diverge from 
zero, one by one, until all terms have non-zero coefficients and you've arrived 
at the full model.  The funny thing is, for most values of lambda, some or 
another term in my formula will have only just recently diverged from zero, and 
so that term will have a very small coefficient -- the shrinkage aspect of the 
lasso will have shrunk it almost, but not quite, to zero.  This is weird, given 
that my goal is a simple explanatory model; having terms with very small but 
non-zero coefficients is contrary to the goals of my statistical analysis.  The 
lasso is considered to be better for producing explanatory models than some 
other methods, because coefficients do eventually go to zero, instead of just 
getting smaller and smaller; but still, I wonder if I ought to be using some 
more extreme method of model selection that, like 
the step-down I was using before, either keeps a term in or drops it, with no 
in-between.  On the other hand, I gather that the shrinkage that the lasso does 
is considered to be a good thing.  I don't really understand enough about all 
this to know what to do.  So:

6. Should I be looking at a method other than the lasso, so that I would not 
get some terms with very small coefficients?  Or should I use the lasso, and 
just ignore, in my explanation of the model, terms with tiny coefficients?  Or 
is there some other way of handling this problem?

  Thanks to anybody who has even read this far, and many many thanks to any 
responses!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] gam() (in mgcv) with multiple interactions

2011-06-07 Thread Ben Haller
  Hi!  I'm learning mgcv, and reading Simon Wood's book on GAMs, as recommended 
to me earlier by some folks on this list.  I've run into a question to which I 
can't find the answer in his book, so I'm hoping somebody here knows.

   My outcome variable is binary, so I'm doing a binomial fit with gam().  I 
have five independent variables, all continuous, all uniformly distributed in 
[0, 1].  (This dataset is the result of a simulation model.)  Let's call them 
a,b,c,d,e for simplicity.  I'm interested in interactions such as a*b, so I'm 
using tensor product smooths such as te(a,b).  So far so good.  But I'm also 
interested in, let's say, a*d.  So ok, I put te(a,d) in as well.  Both of these 
have a as a marginal basis (if I'm using the right terminology; all I mean is, 
both interactions involve a), and I would have expected them to share that 
basis; I would have expected them to be constrained such that the effect of a 
when b=0, for one, would be the same as the effect of a when d=0, for the 
other.  This would be just as, in a GLM with formula a*b + a*d, that formula 
would expand to a + b + d + a:b + a:d, and there is only one "a"; a doesn't get 
to be different for the a*b interaction than it is for the a*d interaction.  
But with tensor product smooths in gam(), that does not 
seem to be the case.  I'm still just getting to know mgcv and experimenting 
with things, so I may be doing something wrong; but the plots I have done of 
fits of this type appear to show different marginal effects.

  I tried explicitly including terms for the marginal basis; in my example, I 
tried a formula like te(a) + te(b) + te(d) + te(a, b) + te(a, d).  No dice; in 
this case, the main effect of a is different between all three places where it 
occurs in the model.  I.e. te(a) shows a different effect of a than te(a, b) 
shows at b=0, which is again different from the effect shown by te(a, d) at 
d=0.  I don't even know what that could possibly mean; it seems wrong to me 
that this could even be the case, but what do I know.  :->

   I could move up to a higher-order tensor like te(a,b,d), but there are three 
problems with that.  One, the b:d interaction (in my simplified example) is 
then also part of the model, and I'm not interested in it.  Two, given the set 
of interactions that I *am* interested in, I would actually be forced to do the 
full five-way te(a,b,c,d,e), and with a 300,000 row dataset, I shudder to think 
how long that will take to run, since it would have something like 5^5 free 
parameters to fit; that doesn't seem worth pursuing.  And three, interpretation 
of a five-way interaction would be unpleasant, to say the least; I'd much 
rather be able to stay with just the two-way (and one three-way) interactions 
that I know are of interest (I know this from previous logistic regression 
modelling of the dataset).

  For those who like to see the actual R code, here are two fits I've tried:

gam(outcome ~ te(acl, dispersal) + te(amplitude, dispersal) +
    te(slope, curvature, amplitude), family=binomial, data=rla, method="REML")

gam(outcome ~ te(slope) + te(curvature) + te(amplitude) + te(acl) + te(dispersal) +
    te(slope, curvature) + te(slope, amplitude) + te(curvature, amplitude) +
    te(acl, dispersal) + te(amplitude, dispersal) + te(slope, curvature, amplitude),
    family=binomial, data=rla, method="REML")
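
  For concreteness, here is a small self-contained toy with the same structure 
(the data are simulated, and a, b, d are illustrative stand-ins for the real 
variables):

library(mgcv)
set.seed(1)
n <- 2000
a <- runif(n); b <- runif(n); d <- runif(n)
eta <- sin(2 * pi * a) + a * b - 2 * a * d               # arbitrary "true" surface
y <- rbinom(n, 1, plogis(eta))
toy <- gam(y ~ te(a) + te(b) + te(d) + te(a, b) + te(a, d),
           family = binomial, method = "REML")
plot(toy, pages = 1)   # compare the te(a) smooth against slices of te(a, b) and te(a, d)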

  So.  Any advice?  How can I correctly do a gam() fit involving multiple 
interactions that involve the same independent variable?

  Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gam() (in mgcv) with multiple interactions

2011-06-10 Thread Ben Haller
On Jun 9, 2011, at 11:35 AM, Simon Wood wrote:

> I think that the main problem here is that smooths are not constrained to 
> pass through the origin, so the covariate taking the value zero doesn't 
> correspond to no effect in the way that you would like it to. Another way of 
> putting this is that smooths are translation invariant, you get essentially 
> the same inference from the model y_i = f(x_i) + e_i as from y_i = f(x_i + k) 
> + e_i (which implies that x_i=0 can have no special status).

  OK, I understand the translation invariance, I think, and why x_i=0 has no 
special status.  I don't understand the consequence that you are saying follows 
from that, though.  Onward...

> All mgcv does in the case of te(a) + te(b) + te(d) + te(a, b) +
> te(a, d) is to remove the bases for te(a), te(b) and te(d) from the basis of 
> te(a,b) and te(a,d). Further constraining  te(a,b) and te(a,d) so that 
> te(0,b) = te(a,0) = 0 etc wouldn't make much sense (in general 0 might not 
> even be in the range of a and b).

  Hmm.  Perhaps my question would be better phrased as a question of model 
interpretation.  Can I think of the smooth found for te(a) as the "main effect" 
of a?  If so, should I just not be bothered by the fact that te(a,b) at b=0 has 
a different shape from te(a)?  Or is the "main effect" of a really, here, te(a) 
+ te(a,b | b=0) + te(a,d | d=0) (if my notation makes any sense, I'm not much 
at math)?  Or is the whole concept of "main effect" meaningless for these kinds 
of models -- in which case, how do I interpret what te(a) means?  Or perhaps I 
should not be trying to interpret a by itself; perhaps I can only interpret the 
interactions, not the main effects.  In that case, do I interpret te(a,b) by 
itself, or do I need to conceptually "add in" te(a) to understand what te(a,b) 
is telling me?  My head is spinning.  Clearly I just don't understand what 
these GAM models even mean, on a fundamental conceptual level.

> In general I find functional ANOVA not entirely intuitive to think about, but 
> there is a very good book on it by Chong Gu (Smoothing spline ANOVA, 2002, 
> Springer), and the associated package gss is on CRAN.

  Is what I'm doing functional ANOVA?  I realize ANOVA and regression are 
fundamentally related; but I think of ANOVA as involving discrete levels 
(factors) in the independent variables, like treatment groups.  My independent 
variables are all continuous, so I would not have thought of this as ANOVA.  
Anyhow, OK.  I will go get that book today, and see if I can figure all this 
out.

  Thanks for your help!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/


[R] error in optim, within polr(): "initial value in 'vmmin' is not finite"

2011-02-16 Thread Ben Haller
  Hi all.  I'm just starting to explore ordinal multinomial regression.  My 
dataset is 300,000 rows, with an outcome (ordinal factor from 1 to 9) and five 
independent variables (all continuous).  My first stab at it was this:

pomod <- polr(Npf ~ o_stddev + o_skewness + o_kurtosis + o_acl_1e + dispersal, 
rlc, Hess=TRUE)

  And that worked; I got a good model fit.  However, a variety of other things 
that I've tried give me this error:

Error in optim(s0, fmin, gmin, method = "BFGS", ...) : 
  initial value in 'vmmin' is not finite

  This occurs, for example, when I try to use the method="probit" option of 
polr().  It also occurs when I try a regression involving interactions, such as:

pomod <- polr(Npf ~ o_stddev * o_skewness * o_kurtosis * o_acl_1e * dispersal, 
rlc, Hess=TRUE)

  I have good reason to believe that interactions are important here, so I'd 
very much like to be able to fit such models.  I have been doing that 
successfully with logistic regression (considering my outcome variable to be 
binary, either "1" or "2-9") using glm(), but now using polr() it gives me this 
error.  I've searched Google and the R lists for information about this error, 
and while I did find a couple of other people asking about it, I didn't find 
any advice about what to do about it that I can apply to my situation.
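
  One thing that sometimes helps with this particular optim error -- offered 
only as a hedged sketch, not as a known fix -- is to give polr() explicit 
starting values via its start argument, seeded from the additive fit that does 
converge:

# start is in the format c(coefficients, zeta); new interaction coefficients start at 0
f_full <- Npf ~ o_stddev * o_skewness * o_kurtosis * o_acl_1e * dispersal
p_full <- ncol(model.matrix(f_full, rlc)) - 1            # slope coefficients in the full model
st <- c(coef(pomod), rep(0, p_full - length(coef(pomod))), pomod$zeta)
pomod_int <- polr(f_full, data = rlc, start = st, Hess = TRUE)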

  I'd be happy to share my dataset with anyone willing to help me on this, but 
300,000 rows is a bit large to include in this email.  :->

  Thanks!

Ben Haller
McGill University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Evaluating model fits for ordinal multinomial regressions with polr()

2011-02-16 Thread Ben Haller
  Hi all.  I'm just starting to explore ordinal multinomial regression.  I've 
been doing logistic regression in the past, and I've gotten used to various 
functions for getting pseudo-R2 metrics and other metrics to evaluate the 
quality of a model.  For example:

val.prob()
NagelkerkeR2()
ClassLog() 
LogRegR2()

  But these don't seem to work for the models generated by polr().  Does anyone 
have any recommendations for ways to evaluate the goodness-of-fit and such 
things for ordinal multinomial regression models?  I've searched Google and 
looked at the docs for the package (MASS) that defines polr(), but haven't 
discovered anything.
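
  (One rough possibility, sketched here only as a guess: a McFadden-style 
pseudo-R2 can be computed by hand from log-likelihoods, which polr() fits do 
provide; pomod below is a polr() fit.)

null_fit <- update(pomod, . ~ 1)                             # intercepts-only model
mcfadden_R2 <- as.numeric(1 - logLik(pomod) / logLik(null_fit))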

  Thanks!

Ben Haller
McGill University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multi process support in R

2011-02-17 Thread Ben Haller
On Feb 17, 2011, at 11:40 AM, Alaios wrote:

> ...Is it possible to split work across many cores in R, and if so, what is 
> this library called?

  I'd recommend the "mclapply" function in the "multicore" package.  The only 
drawback is that you can't run your code in a GUI any more.
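
  A minimal illustrative sketch (the toy workload is just a stand-in):

library(multicore)   # in R >= 2.14 the same mclapply() is also in the 'parallel' package
# square the numbers 1..8, spread across 4 cores; results come back as a list
results <- mclapply(1:8, function(i) i^2, mc.cores = 4)
unlist(results)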

Ben Haller
McGill University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error in optim, within polr(): "initial value in 'vmmin' is not finite"

2011-02-18 Thread Ben Haller
  An update for the benefit of the list/posterity: I resolved this issue by 
switching over to using the lrm() function of package rms.  It seems to pick 
better starts, or something; in any case, it has been able to converge on a 
solution for every model I've tried, although for the most complex ones I 
needed to raise maxit (maximum iterations) above the default of 12 slightly.  
The lrm() function does not support interactions higher than third-order, and 
it does only logistic regressions, not probit or other types, so it does have 
its drawbacks; but it has solved my difficulties quite nicely.  Just in case 
anybody cares.  :->

Ben Haller
McGill University


On Feb 16, 2011, at 5:41 PM, Ben Haller wrote:

>  Hi all.  I'm just starting to explore ordinal multinomial regression.  My 
> dataset is 300,000 rows, with an outcome (ordinal factor from 1 to 9) and 
> five independent variables (all continuous).  My first stab at it was this:
> 
> pomod <- polr(Npf ~ o_stddev + o_skewness + o_kurtosis + o_acl_1e + 
> dispersal, rlc, Hess=TRUE)
> 
>  And that worked; I got a good model fit.  However, a variety of other things 
> that I've tried give me this error:
> 
> Error in optim(s0, fmin, gmin, method = "BFGS", ...) : 
>  initial value in 'vmmin' is not finite
> 
>  This occurs, for example, when I try to use the method="probit" option of 
> polr().  It also occurs when I try a regression involving interactions, such 
> as:
> 
> pomod <- polr(Npf ~ o_stddev * o_skewness * o_kurtosis * o_acl_1e * 
> dispersal, rlc, Hess=TRUE)
> 
>  I have good reason to believe that interactions are important here, so I'd 
> very much like to be able to fit such models.  I have been doing that 
> successfully with logistic regression (considering my outcome variable to be 
> binary, either "1" or "2-9") using glm(), but now using polr() it gives me 
> this error.  I've searched Google and the R lists for information about this 
> error, and while I did find a couple of other people asking about it, I 
> didn't find any advice about what to do about it that I can apply to my 
> situation.
> 
>  I'd be happy to share my dataset with anyone willing to help me on this, but 
> 300,000 rows is a bit large to include in this email.  :->
> 
>  Thanks!
> 
> Ben Haller
> McGill University
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SPDEP Package, neighbours list for Moran's I on large grid dataset

2011-04-20 Thread Ben Haller
Laurent Jégou wrote:

> Hello list members, I'd like to calculate Moran's I on a large dataset,
> an 8640x3432 grid of values.
> 
> When I try to create the neighbours list with the cell2nb function on
> such a scale, R works for several hours and seems to crash. I'm using the latest
> version (2.13), 64-bit, on a Mac Pro with 4 GB of memory.
> 
> I think that I'm doing it wrong :-)
> 
> I'd like to use a "moving window" weight matrix instead of a full-scale
> one, but I was not lucky enough to find a solution in the docs.

  I confronted this problem recently, and decided to roll my own.  For a large 
dataset, an exhaustive computation of Moran's I is very time-consuming, as you 
say; but an estimate of it, based on a subset of pairs chosen randomly, seemed 
(for my data, at least) to converge quite nicely.  Here's my code.  It assumes 
a square grid, so you'll need to adapt it for your non-square grid, but that 
should be trivial.  To determine the right number of pairs for a good estimate 
in my application, I called this function with various numbers of pairs, and 
plotted the estimate produced as a function of the number of pairs used, and 
observed good convergence by 200,000 pairs.  You will probably want to do this 
yourself, for your dataset.  200,000 pairs took just a second or two to run, on 
my machine, so this is much, much faster than an exhaustive calculation.  As a 
bonus, this code also gives you Geary's C.  Oh, and the code below assumes a 
wraparound lattice (i.e. a torus, i.e. periodic boundaries), which may not be 
what you want; just get rid of the d_wraparound stuff and the following pmax() 
call if you want non-wraparound boundaries, and that should work fine, I think. 
 Not sure what you mean by the moving window thing, so I leave that as an 
exercise for the reader.  :->

  I've never made an R package, and I'm not sure there's much demand for this 
code (is there?), so I'm presently unlikely to package it up.  However, if 
someone who owns a package related to this problem would like to adopt this 
code, generalize it to non-square lattices, add a flag to choose periodic or 
non-periodic boundaries, etc., feel free to do so; I hereby place it in the 
public domain.  I'd just like to hear about it if someone does this, and 
receive credit somewhere, that's all.  I'm not super-good in R, either, so if 
there's a better way to do some things, like the somewhat hacked random index 
generation (trying to avoid floating point issues when generating a random 
integer), please let me know.  I'm always interested in learning how to do 
things better.

  And of course this code is provided without warranty, may have bugs, etc.

  Enjoy!

Ben Haller
McGill University


correlation_stats <- function(p, n_pairs=20)
{
    # Compute Moran's I and Geary's C for the landscape p.  This is tricky because the exact
    # calculation would be far too time-consuming, as it involves comparison of every point
    # to every other point.  So I am trying to estimate it here with a small subset of all
    # possible point comparisons.
    p_size <- NROW(p)       # 512
    p_length <- length(p)   # 262144
    mean_p <- mean(p)
    pv <- as.vector(p)
    
    # select points and look up their values; for speed, points are selected by their vector index
    p1ix <- floor(runif(n_pairs, min=0, max=p_length - 0.001))
    p1x <- (p1ix %/% p_size) + 1
    p1y <- (p1ix %% p_size) + 1
    p1ix <- p1ix + 1
    
    p2ix <- floor(runif(n_pairs, min=0, max=p_length - 0.001))
    p2x <- (p2ix %/% p_size) + 1
    p2y <- (p2ix %% p_size) + 1
    p2ix <- p2ix + 1
    
    v1 <- pv[p1ix] - mean_p
    v2 <- pv[p2ix] - mean_p
    
    # Calculate distances between point pairs, both directly and wrapping around.
    # The average direct distance is much smaller than the average wraparound distance,
    # because points near the center vertically have a maximum direct distance near 256,
    # but a minimum wraparound distance near 256.  The Moran's I estimate from wraparound
    # distances is different, as a result.  Rupert recommends taking the shorter of the
    # two distances, whichever it is, because that keeps us within the meaningful scale
    # of our space, before it just starts repeating due to periodicity.
    d_direct <- 1 / sqrt(((p1x - p2x) ^ 2) + ((p1y - p2y) ^ 2))
    d_direct[is.infinite(d_direct)] <- 0
    
    d_wraparound <- 1 / sqrt(((p1x - p2x) ^ 2) + ((p_size - abs(p1y - p2y)) ^ 2))
    d_wraparound[is.infinite(d_wraparound)] <- 0
    
    d <- pmax(d_direct, d_wraparound)   # max because we want the min distance, and these are 1/distan

Re: [R] SPDEP Package, neighbours list for Moran's I on large grid dataset

2011-04-22 Thread Ben Haller
On Apr 22, 2011, at 2:07 AM, Laurent Jégou wrote:

> Hello, thank you very much for this solution; I'll try to adapt it to my 
> problem, I think I'll be able to.

  Great!

> By the way, I wrote about a "sliding-window" weight matrix; it's a term 
> used in satellite image processing, especially convolution filters, 
> meaning that the main matrix is not compared against another one of the same 
> dimensions, but against a much smaller one. The calculation is carried out value by 
> value (pixel by pixel on an image), centering the small matrix on each one, hence 
> the name "sliding window".

  Yeah, I'm familiar with the concept in general, but I don't see how you can 
really apply it to Moran's I calculations; the Moran's I value is, by 
definition, based on calculating the value for every point compared to every 
other point, and looking at only points within a smaller window risks getting a 
value that is not representative of the larger context.  But if it makes sense 
to you in your application, then more power to you.  :->

Ben Haller
McGill University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.