[R] No answer in anova.nls

2008-08-11 Thread Nazareno Andrade
Dear R-helpers,

I am trying to check whether a model of the form y(t) = a/(1 + b*t) fits the
curve of downloads per day of a file in a specific online community better
than a model of the form y(t) = a*exp(-b*t). To do so, I used nls to fit both
models, and I am now trying to compare the fits with anova. The problem is
that anova does not report an F statistic or a p-value when I compare these
two models.

The data for a file is typically the following:
> d
   V1  V2
1   1 293
2   2 101
3   3  63
4   4  53
5   5  42
6   6  19
7   7  28
8   8  23
9   9  18
10 10  17
11 11  14
12 12  18
13 13   5
14 14   9
15 15  10
16 15   0

My code:

d <- read.table(url("http://ece.ubc.ca/~nazareno/85247.arrivalRates"))
plot(d)
# Exponential decay: y(t) = a * exp(-b * t)
f.exp.nw <- nls(V2 ~ a. * exp(-b. * V1), data = d,
                start = list(a. = d$V2[1], b. = 0.05))
# Hyperbolic decay: y(t) = a / (1 + b * t)
f.exp5.nw <- nls(V2 ~ a. / (1 + b. * V1), data = d,
                 start = list(a. = d$V2[1], b. = 2))
# Overlay the fitted curves on the data
lines(d$V1, predict(f.exp.nw), col = "royalblue")
lines(d$V1, predict(f.exp5.nw), col = "orange")

anova(f.exp.nw, f.exp5.nw)

However, the output from anova.nls is:

Analysis of Variance Table

Model 1: V2 ~ a. * exp(-b. * V1)
Model 2: V2 ~ a./(1 + b. * V1)
  Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1     13     4994.9
2     13      314.7  0    0.0

and I cannot interpret the lack of an F value. Looking at the implementation
of the anova.nls() function, this seems to be related to the fact that the
residual degrees of freedom of the two fits are the same, but I could not find
any information on whether they are required to differ. I'd greatly appreciate
it if you could point out any mistakes I might be making, or suggest a
(preferably online) reference on this issue.
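The F statistic that anova() reports for two least-squares fits is an extra-sum-of-squares test: the drop in residual sum of squares is divided by the number of extra parameters in the larger model. Both fits here have two parameters (a. and b.), so that number is 0 and no F value can be formed; the test also presumes that the smaller model is a special case of the larger one. A minimal sketch of the arithmetic, fed with the values from the table above (extra_ss_f is a hypothetical helper, not the anova.nls source):

```r
# Extra-sum-of-squares F test for two least-squares fits.
# Only meaningful when the "small" model is a special case (nested) of the "big" one.
extra_ss_f <- function(rss_small, df_small, rss_big, df_big) {
  df_diff <- df_small - df_big  # number of extra parameters in the bigger model
  if (df_diff <= 0) {
    # No extra parameters: F would divide by zero, so nothing is reported.
    return(list(df = df_diff, F = NA_real_, p = NA_real_))
  }
  F_stat <- ((rss_small - rss_big) / df_diff) / (rss_big / df_big)
  p_val <- pf(F_stat, df_diff, df_big, lower.tail = FALSE)
  list(df = df_diff, F = F_stat, p = p_val)
}

# Values from the anova() output: both fits have 13 residual df.
extra_ss_f(4994.9, 13, 314.7, 13)  # $df is 0; $F and $p are NA
```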


As a side question, could someone with more experience on this matter confirm
whether the proper way to check whether "the y(t) = a/(1 + b*t) form models
the behavior of downloads of files in this community more precisely" is to
count for how many files it outperforms the exponential model?

thank you very much in advance,
Nazareno

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] No answer in anova.nls

2008-08-12 Thread Nazareno Andrade
(sorry if this arrives multiple times, I sent it from the wrong email
address to the r-help the first time)

Thanks for both answers. I'll look into that.

I understand I can do a qualitative evaluation of the fits using visual
checks, but I'd like to quantify for how many out of hundreds of downloads
each model fits the data better. My hypothesis is that there are two groups
of downloads, each modeled by one of the functions. Is there an automated
method for quantifying this in R?
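Because both models have the same number of parameters and are fitted to the same points, ranking them by residual sum of squares gives the same ordering as ranking them by AIC under Gaussian errors. A minimal sketch of an automated tally, assuming each file's counts sit in a data frame with columns V1 (day) and V2 (downloads); the helper better_model, its starting values, and the list files are hypothetical. Counting wins says nothing by itself about whether either model is adequate, which is the statistical issue raised in the replies.

```r
# Fit both decay models to one file's data and report which has the smaller RSS.
# Starting values are illustrative assumptions; poor starts can make nls() fail.
better_model <- function(d, b_exp = 0.5, b_hyp = 1) {
  fit_exp <- tryCatch(
    nls(V2 ~ a * exp(-b * V1), data = d, start = list(a = d$V2[1], b = b_exp)),
    error = function(e) NULL)
  fit_hyp <- tryCatch(
    nls(V2 ~ a / (1 + b * V1), data = d, start = list(a = d$V2[1], b = b_hyp)),
    error = function(e) NULL)
  if (is.null(fit_exp) || is.null(fit_hyp)) return(NA_character_)  # a fit failed
  # deviance() of an nls fit is its residual sum of squares
  if (deviance(fit_exp) < deviance(fit_hyp)) "exponential" else "hyperbolic"
}

# Hypothetical usage: 'files' is a list of data frames, one per file.
# table(vapply(files, better_model, character(1)), useNA = "ifany")
```

Files where either fit fails are counted as NA rather than silently assigned to one model.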

thank you again,
Nazareno

On Tue, Aug 12, 2008 at 9:19 AM, Bert Gunter <[EMAIL PROTECTED]> wrote:
> To add to Brian's points (which you should heed!) -- you **may** find it
> also useful to look at (possibly smoothed) residuals to compare lack of fit
> from your alternative models. If any shows up, some subject matter knowledge
> might lead you to choose one or the other of your models -- or neither.
>
> -- Bert Gunter
> Genentech
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of Prof Brian Ripley
> Sent: Monday, August 11, 2008 11:34 PM
> To: Nazareno Andrade
> Cc: r-help
> Subject: Re: [R] No answer in anova.nls
>
> The reason for no F test showing up is that the additional df is 0 and the
> F value is Inf.  But the underlying problem is that your models are not
> nested and so ANOVA between them is invalid.
>
> I suggest you seek help from a local statistician: your misunderstanding
> and your question about model adequacy are subtle statistical issues and
> not help on R.
>
> --
> Brian D. Ripley,  [EMAIL PROTECTED
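Bert Gunter's suggestion of comparing (possibly smoothed) residuals can be sketched as follows, assuming fits like f.exp.nw and f.exp5.nw and the data frame d from the original message. lowess() is one choice of smoother among several, and plot_smoothed_residuals is a hypothetical helper; a smooth that stays systematically away from zero points to lack of fit.

```r
# Plot the residuals of two fits against time, each with a lowess smooth.
plot_smoothed_residuals <- function(fit1, fit2, t,
                                    labels = c("exponential", "hyperbolic"),
                                    cols = c("royalblue", "orange")) {
  r1 <- residuals(fit1)
  r2 <- residuals(fit2)
  plot(t, r1, col = cols[1], ylim = range(r1, r2),
       xlab = "day", ylab = "residual")
  points(t, r2, col = cols[2])
  abline(h = 0, lty = 2)          # reference line: no systematic error
  s1 <- lowess(t, r1)
  s2 <- lowess(t, r2)
  lines(s1, col = cols[1])
  lines(s2, col = cols[2])
  legend("topright", legend = labels, col = cols, lty = 1)
  invisible(list(s1, s2))         # smoothed curves, for further inspection
}

# Usage with the fits from the original message:
# plot_smoothed_residuals(f.exp.nw, f.exp5.nw, d$V1)
```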

[R] Pdf file size for very large scatter plots

2008-08-15 Thread Nazareno Andrade
Dear all,

I am plotting a scatter plot of a large sample (1e+05 ordered pairs).
This produces a large (~5 MB) file with the pdf or postscript device, and
I am wondering whether there are ways to reduce the size of the resulting
file so that it is easier to include in a document. I'd rather stick with
PDF or PS, as I am using LaTeX.

thanks,
Nazareno



Re: [R] Pdf file size for very large scatter plots

2008-08-15 Thread Nazareno Andrade
Jim,

Thanks for the answer. Using pch="." reduces the file to ~3MB... Still large.

I'll look into hexbin, but if I understand it correctly, it would 'round'
nearby points into the same hexagon. Couldn't that give an inaccurate view
of the scatter plot?
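Binning does give up position within a bin, so detail finer than the bin size is lost; in exchange, the counts show how many points pile up in dense regions, which 100,000 overplotted dots hide. A minimal sketch of the same idea with rectangular bins in base R (hexagons, as in the hexbin package, tile the plane more evenly; bin_counts and the 100-bin grid are assumptions for illustration). The PDF then holds at most one shaded rectangle per non-empty bin instead of one mark per point.

```r
# Count points on a rectangular grid; returns bin centres and a count matrix.
bin_counts <- function(x, y, nbins = 100) {
  xb <- seq(min(x), max(x), length.out = nbins + 1)  # bin edges along x
  yb <- seq(min(y), max(y), length.out = nbins + 1)  # bin edges along y
  counts <- table(cut(x, xb, include.lowest = TRUE),
                  cut(y, yb, include.lowest = TRUE))
  list(x = head(xb, -1) + diff(xb) / 2,              # bin centres
       y = head(yb, -1) + diff(yb) / 2,
       z = matrix(as.vector(counts), nbins, nbins))  # z[i, j]: x bin i, y bin j
}

x <- rnorm(1e5); y <- rnorm(1e5)
b <- bin_counts(x, y)
b$z[b$z == 0] <- NA                     # leave empty bins blank
pdf("binned.pdf")
image(b$x, b$y, b$z, col = rev(gray.colors(32, start = 0, end = 0.9)),
      xlab = "x", ylab = "y")           # darker = more points
dev.off()
```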

Here's the code I'm using:

pdf(); plot(rnorm(1e5), rnorm(1e5), pch = "."); dev.off()
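One way to shrink the file while staying with pdf() is to drop points that would land on the same spot at print resolution: round each coordinate to a grid and plot only the unique positions. A minimal sketch extending the code above; thin_points is a hypothetical helper, the 500-step grid is an assumption to tune to the final figure size, and the saving depends on how many points coincide.

```r
# Keep one point per grid cell; 'cells' is the number of grid steps across each axis.
thin_points <- function(x, y, cells = 500) {
  sx <- diff(range(x)) / cells
  sy <- diff(range(y)) / cells
  xy <- unique(cbind(round(x / sx) * sx, round(y / sy) * sy))  # unique rows
  list(x = xy[, 1], y = xy[, 2])
}

x <- rnorm(1e5); y <- rnorm(1e5)
full <- tempfile(fileext = ".pdf")
thin <- tempfile(fileext = ".pdf")

pdf(full); plot(x, y, pch = "."); dev.off()
p <- thin_points(x, y)
pdf(thin); plot(p$x, p$y, pch = "."); dev.off()

# Number of points drawn and file sizes (bytes) before and after thinning
c(points_full = length(x), points_thin = length(p$x))
file.info(c(full, thin))$size
```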

thanks again,
Nazareno

On Fri, Aug 15, 2008 at 12:27 PM, jim holtman <[EMAIL PROTECTED]> wrote:
> Have you tried using  pch='.'?
>
> Also you might consider using 'hexbin' for creating the scatter plot.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
