Re: [R] Adding legends on plots using Lattice package

2012-09-20 Thread Deepayan Sarkar
On Mon, Sep 17, 2012 at 2:34 PM, jpm miao  wrote:
> To dear Dr Sarkar and anyone that knows about Lattice package,
>
>I make 4 graphs with the lattice package. Each of the graphs has two time
> series. All the series are plotted as solid lines by default, and I would
> like one series to be a solid line and the other a dotted line in
> each graph. How can I modify the xyplot() call in the following lines to
> achieve this? It seems that the "key" or "auto.key" parameters are needed, but
> I don't know how to do it. Thank you very much!
>
>
> require(graphics)
> library(lattice)
> data1<-read.csv(file="G_Results_3FX1.csv", header=TRUE)

Reproducible example please (the EuStockMarkets dataset may be suitable).

-Deepayan


> xts<-ts(data1[,2:9],frequency = 4, start = c(1983, 1))
> xyplot(xts, screen=list("a","a","b","b","c","c","d","d"), layout=c(2,2),
> scales=list(x="same",y="same"))
>
>
>
> Miao
>
>
>
>
>
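A minimal sketch of the kind of plot the poster describes, using the EuStockMarkets data as Deepayan suggests. The per-series `lty` values and the `auto.key` settings are my assumptions about what is wanted, not the poster's actual data:

```r
library(lattice)

## Two series per panel; within each panel the first series is solid
## (lty = 1) and the second dotted (lty = 2). xyplot.ts recycles
## 'lty' and 'col' over the series.
p <- xyplot(EuStockMarkets,
            screens = c("a", "a", "b", "b"),  # which panel each series goes in
            lty     = c(1, 2, 1, 2),
            col     = "black",
            layout  = c(2, 1),
            auto.key = list(lines = TRUE, points = FALSE))
p
```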

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] validate.lrm - confidence interval for bootstrap-corrected AUC?

2012-09-20 Thread Whee Sze Ong
Hi



Does anyone know whether the rms package provides a confidence interval for
the bootstrap-corrected Dxy or c-index?



I have fitted a logistic model, and would like to obtain the 95% confidence
interval of the bootstrap-corrected area under the ROC curve estimate.



Thanks.
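One generic approach, sketched here in base R with simulated data, is to bootstrap the c-index directly and take percentile limits. Note this gives a CI for the apparent c-index of a plain glm() fit, not for the optimism-corrected estimate that rms::validate() reports; the data and model below are made up for illustration:

```r
set.seed(1)
n   <- 200
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(0.8 * x))
dat <- data.frame(x = x, y = y)

## c-index (= AUC) of a logistic fit, computed from the rank statistic
cindex <- function(d) {
  p  <- fitted(glm(y ~ x, family = binomial, data = d))
  r  <- rank(p)
  n1 <- sum(d$y == 1)
  n0 <- sum(d$y == 0)
  (sum(r[d$y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## percentile bootstrap over whole-data resamples
B     <- 500
boots <- replicate(B, cindex(dat[sample(n, replace = TRUE), ]))
ci    <- quantile(boots, c(0.025, 0.975))
ci
```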

[[alternative HTML version deleted]]

__


[R] invalid labels; length 2 should be 1 or 0

2012-09-20 Thread ya
hi list,

I googled this invalid labels issue. It seems different people doing different 
analyses have encountered this problem, so I guess it is not specific to the mice 
package.

However, in general they have categorical variables. In my case, I double 
checked: bulg_1, bulg_2, and bulg_3 are continuous variables that need to 
be imputed. 

Why was the factor() function used here: Error in factor(x[, type == (-2)], 
labels = 1:n.class)?

Thank you very much.

Best regards,

ya


 
From: ya
Sent: 2012-09-19 11:30
To: 32680822
Subject: Fw: invalid labels; length 2 should be 1 or 0

  

Dear list,

I am trying to impute the two level data, I have a question about a warning. 
Could you give me some suggestions please? Thank you very much.

Here is my code and output of mice package:

> ini <- mice(try, maxit=0)
> pred=ini$pred
> pred
(an 18 x 18 predictor matrix with rows and columns FAC1_1 ... FAC10_1,
ClassSize_1 ... ClassSize_3, intercept, TeacherID_1, bulg_1 ... bulg_3;
every entry is 0 except the two rows shown below)

            FAC1_1-FAC10_1  ClassSize_1-3  intercept  TeacherID_1  bulg_1  bulg_2  bulg_3
TeacherID_1        1              1            0           0          0       0       1
bulg_3             1              1            0           1          0       0       0
> pred["bulg_1",]=c(2,2,2,2,2,2,2,2,2,2,1,0,0,2,-2,0,0,0)
> imp=mice(try,meth=c("","","","","","","","","","","","","","","","2l.norm","2l.norm","2l.norm"),pred=pred,maxit=3)

 iter imp variable
  1   1  bulg_3Error in factor(x[, type == (-2)], labels = 1:n.class) : 
  invalid labels; length 2 should be 1 or 0

> class(formi$bulg_1)
[1] "numeric"
> class(formi$bulg_2)
[1] "numeric"
> class(formi$bulg_3)
[1] "numeric"

TeacherID_1 is the second-level ID. bulg_1, bulg_2, and bulg_3 are 
continuous variables that need to be imputed.  Why was factor() used for 
continuous variables?
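For what it may be worth (an assumption on my part, since the data are not available): mice's "2l.norm" method expects each imputed variable's row of the predictor matrix to code the class (level-2) variable as -2, and in the output above only the bulg_1 row was changed; bulg_2 and bulg_3 kept the all-default rows with no -2. A base-R sketch of setting all three rows consistently, reusing the poster's bulg_1 codes:

```r
vars <- c(paste0("FAC", 1:10, "_1"),
          paste0("ClassSize_", 1:3),
          "intercept", "TeacherID_1",
          paste0("bulg_", 1:3))
pred <- matrix(0, length(vars), length(vars),
               dimnames = list(vars, vars))

## the row the poster used for bulg_1: 2 = random effect, 1 = fixed,
## -2 = class variable (TeacherID_1), 0 = not used
row_b <- c(rep(2, 10), 1, 0, 0, 2, -2, 0, 0, 0)

for (v in c("bulg_1", "bulg_2", "bulg_3")) {
  pred[v, ] <- row_b
  pred[v, v] <- 0          # a variable never predicts itself
}

## every 2l.norm row now has exactly one -2 entry
rowSums(pred[c("bulg_1", "bulg_2", "bulg_3"), ] == -2)
```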

Thank you very much.

Best regards,

ya
[[alternative HTML version deleted]]

__

[R] optim and "the function should not" advice

2012-09-20 Thread Gildas Mazo
Dear R users, 

I'm using optim to optimize a pretty complicated function. This function takes 
the parameter vector "theta" and within its body I use instructions like 

sigma<-theta[a:b]; computations with sigma... 
out<-c() 
for (i in 1:d){ 
a<-theta[(3*d+i):c] 
out[i]<-evaluation of an expression involving 'a' (I use symbolic 
differentiation) 
} 

Unfortunately, for certain problems 'optim' returns a parameter vector that 
has not moved at all from the initial parameters, and the output says that 
although the function has been evaluated a large number of times, the gradient 
(which I supplied to optim) has been evaluated only once. I used the 
BFGS method. 

By chance I looked at the help and read: "The parameter vector passed to fn 
has special semantics and may be shared between calls: the function should not 
change or copy it". Could the instructions above be the cause of the failure? 
If so, how should I deal with symbolic differentiation? 

Thanks in advance, 
Gildas 



[[alternative HTML version deleted]]

__


Re: [R] optim and "the function should not" advice

2012-09-20 Thread Prof Brian Ripley

On 20/09/2012 09:24, Gildas Mazo wrote:

Dear R users,

I'm using optim to optimize a pretty complicated function. This function takes the 
parameter vector "theta" and within its body I use instructions like

sigma<-theta[a:b]; computations with sigma...
out<-c()
for (i in 1:d){
a<-theta[(3*d+i):c]
out[i]<-evaluation of an expression involving 'a' (I use symbolic 
differentiation)
}

Unfortunately, for certain problems 'optim' returns a parameter vector that 
has not moved at all from the initial parameters, and the output says that 
although the function has been evaluated a large number of times, the gradient 
(which I supplied to optim) has been evaluated only once. I used the 
BFGS method.


On face value that means it is unable to find a small step that goes 
downhill consistent with the gradient, and usually indicates an error in 
the gradient function or using numerical derivatives on a 
non-differentiable function.
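One quick way to test the "error in the gradient function" hypothesis is to compare the analytic gradient with a central finite difference at the starting point before calling optim(). The objective below is a toy of my own; the poster would substitute their own fn and gr:

```r
## toy objective and its hand-written gradient
f  <- function(th) sum((th - c(1, 2))^2) + th[1] * th[2]
gr <- function(th) c(2 * (th[1] - 1) + th[2],
                     2 * (th[2] - 2) + th[1])

## central finite-difference gradient
num_grad <- function(f, th, h = 1e-6) {
  vapply(seq_along(th), function(i) {
    e <- replace(numeric(length(th)), i, h)
    (f(th + e) - f(th - e)) / (2 * h)
  }, numeric(1))
}

th0 <- c(0.5, -0.3)
max(abs(gr(th0) - num_grad(f, th0)))   # tiny if gr is correct

res <- optim(th0, f, gr, method = "BFGS")
res$par                                 # minimum at (0, 2)
```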



By chance I looked at the help and read: "The parameter vector passed to fn has 
special semantics and may be shared between calls: the function should not change or copy 
it". Could the instructions above be the cause of the failure? If so, how should I 
deal with symbolic differentiation?


None of the code you show us changes 'theta'.  It would be a very 
unusual thing to do, but it has happened in error when people have used 
compiled code.



Thanks in advance,
Gildas



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__


[R] R/C++ interfaces: crashes when using .c(), followed by correct results when R restarted

2012-09-20 Thread Franckx Laurent
Dear all


I have written a function in C++  , equil_distC, that I am calling from an R 
script.

In the last few days, R has repeatedly crashed when calling this function, or 
delivered obviously wrong outputs. However, when I restarted R after the crash, 
the results turned out to be OK most of the time although I had not modified 
the C++ code.

Although the code runs correctly now, I am not sure why it does work right now, 
and didn't just an hour ago.

Let me summarize the main steps I undertook:

* After having modified the script with the C++ code, I compiled it using R CMD 
SHLIB mylatestversion.cpp - no problems occurred then.
* Next, I loaded the dll into R with
dyn.load("C:\\Ccodefortransortmodel\\mylatestversion.dll")
* Next, I called equil_distC with
testres2 <- .C("equil_distC", as.integer(rowmaxfiets),as.integer(colmaxfiets), 
as.double(nodes), as.integer(links_dist_calc), as.integer(cyclepathsfirstline), 
as.integer(numzones), as.integer(numnodes), as.integer(numlinks), result = 
double(6))
* About half of the time, whenever I had changed the C++ code, the results were 
either nonsensical or R crashed.
* When R did not crash, I unloaded the dll with:
dyn.unload("C:\\Ccodefortransortmodel\\equildistCforodpair_2012_09_20v8.dll")
* Next, I deleted the .dll and the .o of the most recent version, I modified 
the C++ code and I ran the above procedure again.
* In most cases when R crashed, a new run of the procedure above resulted in 
correct results, even though the C++ code was not modified.

I have also noticed that, in several cases, apparently irrelevant changes to 
the code would lead to a crash. This occurred for instance when I replaced %d 
with %f in Rprintf() - my first conjecture was actually that this was the root 
of the problem, but the code runs correctly now with %f.

My own conjecture is that, one way or another, old versions of equil_distC 
remained loaded in memory.

For instance, even after a dyn.unload of all previous versions of the dll, 
"is.loaded("equil_distC")" would still lead to a TRUE.

I would like to understand the structural reason for these problems. Again, the 
problem is not the C++ code - it does run correctly now, although it had 
crashed in the previous run.


Best regards



Laurent Franckx, PhD
VITO NV
Boeretang 200, 2400 MOL, Belgium
Tel. + 32 14 33 58 22
Skype: laurent.franckx
laurent.fran...@vito.be









VITO Disclaimer: http://www.vito.be/e-maildisclaimer

__


Re: [R] R/C++ interfaces: crashes when using .c(), followed by correct results when R restarted

2012-09-20 Thread Prof Brian Ripley
First, the posting guide asks you to send questions about compiled code 
to the R-devel list.


These are all classic symptoms of the use of uninitialized variables or 
of writing out of bounds, both of which you can catch by using valgrind: see 
'Writing R Extensions'.  They are much harder to find on Windows (which, 
although you did not tell us your OS, is what this looks like).
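On the lingering is.loaded() result specifically: is.loaded() only asks whether the symbol can currently be resolved somewhere, so a quick way to see what is actually in the session is getLoadedDLLs(). Shown here in a clean session without the poster's DLL:

```r
## every DLL currently loaded in this R session
names(getLoadedDLLs())

## TRUE only while some loaded DLL still exports the symbol;
## repeated dyn.load() calls must each be matched by dyn.unload()
is.loaded("equil_distC")
```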



On 20/09/2012 10:19, Franckx Laurent wrote:

Dear all


I have written a function in C++  , equil_distC, that I am calling from an R 
script.

In the last few days, R has repeatedly crashed when calling this function, or 
delivered obviously wrong outputs. However, when I restarted R after the crash, 
the results turned out to be OK most of the time although I had not modified 
the C++ code.

Although the code runs correctly now, I am not sure why it does work right now, 
and didn't just an hour ago.

Let me summarize the main steps I undertook:

* After having modified the script with the C++ code, I compiled it using R CMD 
SHLIB mylatestversion.cpp - no problems occurred then.
* Next, I loaded the dll into R with
dyn.load("C:\\Ccodefortransortmodel\\mylatestversion.dll")
* Next, I called equil_distC with
testres2 <- .C("equil_distC", as.integer(rowmaxfiets),as.integer(colmaxfiets), 
as.double(nodes), as.integer(links_dist_calc), as.integer(cyclepathsfirstline), 
as.integer(numzones), as.integer(numnodes), as.integer(numlinks), result = double(6))
* About half of the time, whenever I had changed the C++ code, the results were 
either nonsensical or R crashed.
* When R did not crash, I unloaded the dll with:
dyn.unload("C:\\Ccodefortransortmodel\\equildistCforodpair_2012_09_20v8.dll")
* Next, I deleted the .dll and the .o of the most recent version, I modified 
the C++ code and I ran the above procedure again.
* In most cases when R crashed, a new run of the procedure above resulted in 
correct results, even though the C++ code was not modified.

I have also noticed that, in several cases, apparently irrelevant changes to 
the code would lead to a crash. This occurred for instance when I replaced %d 
with %f in Rprintf() - my first conjecture was actually that this was the root 
of the problem, but the code runs correctly now with %f.

My own conjecture is that, one way or another, old versions of equil_distC 
remained loaded in memory.

For instance, even after a dyn.unload of all previous versions of the dll, 
"is.loaded("equil_distC")" would still lead to a TRUE.

I would like to understand the structural reason for these problems. Again, the 
problem is not the C++ code - it does run correctly now, although it had 
crashed in the previous run.


Best regards



Laurent Franckx, PhD
VITO NV
Boeretang 200, 2400 MOL, Belgium
Tel. + 32 14 33 58 22
Skype: laurent.franckx
laurent.fran...@vito.be




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__


[R] Solved!! ( Dummy Variable : Doubt )

2012-09-20 Thread Eva Prieto Castro
Hi,

Finally I was able to resolve it; I understood how to use dummy variables in lm().

Thanks!

Eva

--- On Thu, 20/9/12, Eva Prieto Castro wrote:

From: Eva Prieto Castro 
Subject: Re: Dummy Variable : Doubt
To: R-help@r-project.org
Date: Thursday, 20 September 2012, 11:27

Sorry, I meant to write Dummy, not Gummy.

Regards

--- On Thu, 20/9/12, Eva Prieto Castro wrote:

From: Eva Prieto Castro 
Subject: Gummy Variable : Doubt
To: R-help@r-project.org
Date: Thursday, 20 September 2012, 11:13



[[alternative HTML version deleted]]

__


[R] filtering out known instrumental error from time series

2012-09-20 Thread Arnaud Duranel

Hello all,
I am working with high-frequency hydrological time series, from 
automatic loggers. These have a known instrumental error, determined 
from laboratory tests. Could anybody please advise on methods (an indeed 
R packages/functions) that I could use to remove part of this 
instrumental noise, making sure that the time series are not smoothed 
beyond the known instrumental error?

Many thanks
A. Duranel - UCL Department of Geography
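One simple idea, sketched here with base R on simulated data (the logger series and the 0.05 error SD below are made up): choose the smoothing level of smooth.spline() so that the residual standard deviation matches the known instrumental error, i.e. never remove more variation than the instrument could have introduced.

```r
set.seed(1)
t     <- seq(0, 10, length.out = 500)
truth <- sin(t) + 0.3 * sin(3 * t)        # hypothetical hydrological signal
err   <- 0.05                             # known instrumental error (SD)
y     <- truth + rnorm(length(t), sd = err)

## pick spar so that sd(residuals) is as close as possible to err
resid_gap <- function(spar) {
  fit <- smooth.spline(t, y, spar = spar)
  (sd(y - fitted(fit)) - err)^2
}
spar_star <- optimize(resid_gap, c(0.1, 1.2))$minimum
fit <- smooth.spline(t, y, spar = spar_star)

sd(y - fitted(fit))    # close to err by construction
```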

__


Re: [R] Dummy Variable : Doubt

2012-09-20 Thread Eva Prieto Castro
Sorry, I meant to write Dummy, not Gummy.

Regards

--- On Thu, 20/9/12, Eva Prieto Castro wrote:

From: Eva Prieto Castro 
Subject: Gummy Variable : Doubt
To: R-help@r-project.org
Date: Thursday, 20 September 2012, 11:13




[[alternative HTML version deleted]]

__


[R] Gummy Variable : Doubt

2012-09-20 Thread Eva Prieto Castro


Hi, 


 

I have a system in which I analyze 2 subjects and 1 variable, so I have
2 models as follows:

 

y ~ x_1[, 1] + x_2[, 1] + x_1[, 2] + x_2[, 2]

 

Where 

 

x_1[, i] = cos(2 * pi * t / T_i)

x_2[, i] = sin(2 * pi * t / T_i)

 

i = 1, 2

 

Data have two columns: t and y.

 

As you can see, I have a multiple-component model, with rhythm and
without trend, and I have fundamental periods (T_1 = 24 hours; T_2 = 12 hours).

 

I have to compare the parameters between the two models (one for each
subject), using a parametric test as described in the document I attach (page 500,
Parametric solution):

 

I have to reach results as follows:

 

__

H0: Equality of...        df          F           p
__

MESOR                     (1, 171)    224.0246    <0.0001
(A,phi) 24h               (2, 171)    7.6332      0.0007
(A,phi) 24h               (2, 171)    5.8370      0.0035
Rhythmic components       (4, 171)    6.3568      <0.0001
Whole model               (5, 171)    51.6583     <0.0001




I know how to obtain the df values, and I know how to obtain F and p for the
whole model, because "whole model" means that all parameters of the two series
are equal; that means all values are in the same series, so I construct a
single series by concatenating the respective t and y vectors.

 

The problem is that I don't know how to obtain F in the other cases (H1:
equal MESOR; H2.x: equal amplitude and acrophase; H3: equal rhythmic
components). I suppose I have to use dummy variables, but I don't know how
to do it.

 

I found something similar in a solution manual of a Weisberg
book (1985), chapter 6, problem 9, as follows:

m1 <- lm(Yvar ~ Xvar + Fvar + Fvar:Xvar, na.action=na.omit, weights=theWeights)  # model 1, the most general
m2 <- lm(Yvar ~ Xvar + Fvar,             na.action=na.omit, weights=theWeights)  # model 2, parallel
m3 <- lm(Yvar ~ Xvar + Fvar:Xvar,        na.action=na.omit, weights=theWeights)  # model 3, common intercept
m4 <- lm(Yvar ~ Xvar,                    na.action=na.omit, weights=theWeights)  # model 4, the least general (all the same)


 

Could you help me, please?

 

Thank you in advance.
Eva
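In case it helps, here is how I would sketch the dummy-variable approach with simulated data (two hypothetical subjects, 24 h and 12 h components; all numbers below are made up): fit the full two-group cosinor model plus nested reduced models, and anova() then gives the F-test for each equality hypothesis.

```r
set.seed(42)
t <- rep(seq(0, 47.5, by = 0.5), 2)          # two days, half-hour steps
g <- factor(rep(c("s1", "s2"), each = 96))   # subject (dummy) variable
c24 <- cos(2 * pi * t / 24); s24 <- sin(2 * pi * t / 24)
c12 <- cos(2 * pi * t / 12); s12 <- sin(2 * pi * t / 12)
y <- ifelse(g == "s1",
            10 + 2 * c24 + 1 * s12,
            12 + 1 * c24 + 2 * s12) + rnorm(length(t))

full    <- lm(y ~ g * (c24 + s24 + c12 + s12))   # everything may differ
eq24    <- lm(y ~ g + c24 + s24 + g:(c12 + s12)) # 24 h (A, phi) equal
eqmesor <- lm(y ~ c24 + s24 + c12 + s12 +
                  g:(c24 + s24 + c12 + s12))     # MESOR equal
eqall   <- lm(y ~ c24 + s24 + c12 + s12)         # whole model equal

anova(eq24,    full)   # F-test: equality of the 24 h component
anova(eqmesor, full)   # F-test: equality of MESOR
anova(eqall,   full)   # F-test: whole model
```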


 


[[alternative HTML version deleted]]

__


[R] Problem with Newton_Raphson

2012-09-20 Thread Christopher Kelvin
Hello,
I have been trying to estimate the parameters of the generalized exponential 
distribution. The random-number generator for the GE distribution 
is x <- -log(1 - U^(1/p1))/b, where U is uniform on (0, 1). The data I have 
generated to estimate the parameters are right-censored, and the code is given 
below. The problem is that the Newton-Raphson approach isn't working and I do 
not know what is wrong. Can somebody suggest something or help identify 
the problem? 

p1<-0.6;b<-2
n=20;rr=5000
U<-runif(n,0,1)
for (i in 1:rr){

x<-(-log(1-U^(1/p1))/b)

 meantrue<-gamma(1+(1/p1))*b
  meantrue
  d<-meantrue/0.30
  cen<- runif(n,min=0,max=d)
  s<-ifelse(x<=cen,1,0)
  q<-c(x,cen)

    z<-function(data, p){ 
    shape<-p[1]
    scale<-p[2]
    log1<-n*sum(s)*log(p[1])+ 
n*sum(s)*log(p[2])+(p[1]-1)*sum(s)*log(1-((exp(-(p[2])*sum(x)
-(p[2])*sum(t) + (p[1])*log((exp(-(p[2])*sum(x-
(p[1])*sum(s)*log((exp(-(p[2])*sum(x
  return(-log1)
  }
}
  start <- c(1,1)
  zz<-optim(start,fn=z,data=q,hessian=T)
  zz
  m1<-zz$par[2]
  p<-zz$par[1] 


Thank you
Chris Kelvin
INSPEM. UPM


__


Re: [R] qplot: plotting precipitation data

2012-09-20 Thread Hadley Wickham
Hmmm, I'm not sure what the problem is, but I suspect it's related to
the fact that xmin and xmax have different factor levels, and there are
some bugs in ggplot2 related to combining factors in some situations
(it's basically impossible to always do it right).

Explicitly ensuring the levels were the same doesn't help, but setting
the xlims does.  Winston, is this related to some of the other bugs
we've been working on lately?

mydata <- structure(list(chrom = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label =
c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7",
"chr8", "chr9", "chrX"), class = "factor"), start = c(5291000L, 10988025L,
11767950L, 11840900L, 12267450L, 12276675L), end = c(5291926L, 10988526L,
11768676L, 11841851L, 12268076L, 12277051L), peak = c(8L, 7L, 8L, 8L, 12L, 7L)),
.Names = c("chrom", "start", "end", "peak" ), row.names = c(NA, -6L), class =
"data.frame")

ggplot(mydata) +
  geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak))

unique(mydata$start)
unique(mydata$end)

levels <- sort(unique(c(mydata$start, mydata$end)))
mydata$start <- factor(mydata$start, levels = levels)
mydata$end <- factor(mydata$end, levels = levels)

ggplot(mydata, aes(x = start)) +
  geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak)) +
  xlim(as.character(levels))


On Sun, Sep 16, 2012 at 11:11 AM, Rui Barradas  wrote:
> Maybe a bug in ggplot2::geom_rect?
>
> I'm Cceing this to Hadley Wickham, maybe he has an answer.
>
> Rui Barradas
>
> Em 16-09-2012 17:04, John Kane escreveu:
>>>
>>> -Original Message-
>>> From: ruipbarra...@sapo.pt
>>> Sent: Sun, 16 Sep 2012 13:13:47 +0100
>>> To: jrkrid...@inbox.com
>>> Subject: Re: [R] qplot: plotting precipitation data
>>>
>>> Hello,
>>>
>>> Relative to the OP's "request" for rectangles, I'm not understanding them.
>>
>> Neither am I, really. I just googled a couple of sites for possible
>> "chromatin precipitation" graphs, and since the OP was not sure of the name
>> of the geom, I assumed they wanted a bar chart, as it seemed
>> like the simplest graph matching the "rectangles" statement.  I was
>> assuming a terminology or language problem here, and I could not see any
>> reason the OP wanted purely rectangles.
>>
>>> In your plot using geom_bar, the levels of as.factor(start) are sorted
>>> ascending. If both
>>>
>>>   > as.factor(mydata$start)
>>> [1] 5291000  10988025 11767950 11840900 12267450 12276675
>>> Levels: 5291000 10988025 11767950 11840900 12267450 12276675
>>>   > as.factor(mydata$end)
>>> [1] 5291926  10988526 11768676 11841851 12268076 12277051
>>> Levels: 5291926 10988526 11768676 11841851 12268076 12277051
>>>
>>> also are, why isn't geom_rect ploting them by that order?
>>>
>>> p2 <- ggplot(mydata, aes(x = as.factor(start), y = peak))
>>> p2 + geom_rect(aes(xmin = as.factor(start), xmax = as.factor(end), ymin
>>> = 0, ymax = peak))
>>>
>>> The level 5291926 is placed last. Shouldn't it be expected to plot
>>> first?
>>
>> This is far beyond my knowledge of ggplot but I would certainly think it
>> should.
>>   as.numeric( as.factor(mydata$start)))
>> [1] 1 2 3 4 5 6
>>
>> so why would we get something like 2 3 4 5 6 1  if I am reading this
>> correctly?
>>
>>
>>> Rui Barradas
>>>
>>> Em 16-09-2012 00:20, John Kane escreveu:

 Thanks for the data. It makes things much easier.

 Do you want a bar chart (i.e. geom  = bar in qplot or geom_bar in
 ggplot)? That sounds like what you mean when you speak of rectangles.

 If so try this ggplot() command -- I almost never use qplot() so I am not
 quite sure how to specify it there.

 p <- ggplot(mydata, aes(as.factor(start), peak)) +
   geom_bar(stat = "identity")
 p


 John Kane
 Kingston ON Canada


> -Original Message-
> From: hnorp...@googlemail.com
> Sent: Sat, 15 Sep 2012 18:39:54 +0200
> To: r-help@r-project.org
> Subject: [R] qplot: plotting precipitation data
>
> Dear list,
>
> I wish to plot chromatin precipitation data: I would like to have a
> rectangles (x:end-start, y:peak) but I do not have an idea how to
> define
> x
> (in terms of qplot syntax) and to choose the correct geom.
>mydata is a subset of a larger file.
>
>> mydata
>
> chrom     start      end       peak
> 1 chr11  5291000   5291926      8
> 2 chr11 10988025  10988526      7
> 3 chr11 11767950  11768676      8
> 4 chr11 11840900  11841851      8
> 5 chr11 12267450  12268076     12
> 6 chr11 12276675  12277051      7
>>
>> dput(mydata)
>
> structure(list(chrom = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label =
> c("chr1",
> "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
> "chr17", "chr18", "chr19", "chr2", "chr3", "chr4", "chr5", "chr6",
> "chr7", "chr8", "chr9", "chrX"), class = "factor"), star

[R] Sweave - if \Sexpr{} than \SweaveInput{"my.Rnw"}

2012-09-20 Thread Witold E Wolski
Depending on the result of an R computation, I would like to include one of two
Sweave documents in the main Sweave document.
How can I do it?

So I was thinking of using LaTeX features:

\newif\ifpaper

\ifpaper

\SweaveInput{"my1.Rnw"}
\else
 \SweaveInput{"my2.Rnw"}
\fi

But how do I set paper to true or false given an \Sexpr ??

\papertrue % or

\paperfalse


Any ideas?
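One pattern I believe works (an untested sketch): let an \Sexpr{} emit the \papertrue or \paperfalse token itself, since \Sexpr output is pasted verbatim into the woven .tex. The catch is that \SweaveInput is processed by Sweave before LaTeX ever sees \ifpaper, so both child files would be woven regardless; conditioning on the already-woven .tex files with plain \input avoids that. Here isPaper is an assumed logical flag defined in an earlier R chunk:

```latex
\newif\ifpaper
% \Sexpr{} output is inserted verbatim into the .tex, so let R choose:
\Sexpr{if (isPaper) "\\papertrue" else "\\paperfalse"}

\ifpaper
  \input{my1}   % my1.Rnw / my2.Rnw woven to .tex in a previous step
\else
  \input{my2}
\fi
```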


cheers


-- 
Witold Eryk Wolski

Triemlistrasse 155
8047 Zuerich

[[alternative HTML version deleted]]

__


Re: [R] Problem with Newton_Raphson

2012-09-20 Thread Berend Hasselman

On 20-09-2012, at 13:46, Christopher Kelvin wrote:

> Hello,
> I have been trying to estimate the parameters of the generalized exponential 
> distribution. The random-number generator for the GE distribution is 
> x <- -log(1 - U^(1/p1))/b, where U is uniform on (0, 1). The data I have 
> generated to estimate the parameters are right-censored, and the code is given 
> below. The problem is that the Newton-Raphson approach isn't working and I do 
> not know what is wrong. Can somebody suggest something or help identify 
> the problem? 
> 

Newton-Raphson is not one of the methods offered by optim.

> p1<-0.6;b<-2
> n=20;rr=5000
> U<-runif(n,0,1)
> for (i in 1:rr){
> 
> x<-(-log(1-U^(1/p1))/b)
> 
>  meantrue<-gamma(1+(1/p1))*b
>   meantrue
>   d<-meantrue/0.30
>   cen<- runif(n,min=0,max=d)
>   s<-ifelse(x<=cen,1,0)
>   q<-c(x,cen)
> 
> z<-function(data, p){ 
> shape<-p[1]
> scale<-p[2]
> log1<-n*sum(s)*log(p[1])+ 
> n*sum(s)*log(p[2])+(p[1]-1)*sum(s)*log(1-((exp(-(p[2])*sum(x)
> -(p[2])*sum(t) + (p[1])*log((exp(-(p[2])*sum(x-
> (p[1])*sum(s)*log((exp(-(p[2])*sum(x
>   return(-log1)
>   }
> }
>   start <- c(1,1)
>   zz<-optim(start,fn=z,data=q,hessian=T)
>   zz
>   m1<-zz$par[2]
>   p<-zz$par[1] 
> 

Running your code given above gives an error message:

Error in sum(t) : invalid 'type' (closure) of argument

Where is object 't'?

Why are you defining function z within the rr loop? Only the last definition is 
given to optim.
Why use p[1] and p[2] explicitly in the calculation of log1 in the body of 
function z when you can use shape and scale defined in the lines before log1 <-.

Berend



[R] multinomial MCMCglmm

2012-09-20 Thread Vaniscotte

Dear all,

I would like to add mixed effects in a multinomial model and I am trying 
to use MCMCglmm for that.


The main problem I face: my data set consists of trapping data, where the
observations at each trap (1 or 0 for each species) have been aggregated per
trapline. Therefore we have a proportion of presence/absence for each species
per trapline.


ex:
   ID_line mesh habitat Apsy Mygl Crle Crru Miag Miar Mimi Mumu Misu Soar Somi
11  028S6A   28   copse    2    0    0    0    0    0    0    0    0    0    0
12  028S6B   28   copse    1    1    0    0    0    0    0    0    0    0    0
13  028S6C   28   hedge    2    0    0    4    0    0    0    0    0    0    0
14  028S6D   28   hedge    1    0    0    7    0    0    0    0    1    0    0
15  028S6E   28   hedge    7    0    0    1    0    0    0    0    0    0    0

   empty
11    28
12    28
13    24
14    21
15    22

When I run the following:

> test1 <- 
MCMCglmm(fixed=cbind(Apsy,Mygl,Crle,Crru,Miag,Miar,Mimi,Mumu,Misu,Soar,Somi,empty)~habitat,random=~mesh,family="multinomial12",data=metalSmA[,c(2,9,23:34)],rcov=~us(trait):units) 



I got some error when running regarding the variance structure:

> "ill-conditioned G/R structure: use proper priors if you haven't or 
rescale data if you have"


I guess that the problem comes from the nature of my observations, which
are frequencies rather than 0/1 per unit.


Does anyone know whether a multinomial model fitted with MCMCglmm can
handle such frequency tables, and how to specify appropriate G/R variance
structures?



Regards

Amélie Vaniscotte



Re: [R] Wilcoxon Test and Mean Ratios

2012-09-20 Thread Ben Bolker
Mohamed Radhouane Aniba  gmail.com> writes:

> 
> 
> Thank you Thomas,
> 
> So you think a t-test is more adequate to use in this case ?
> 
> Rad

  No, because a t-test makes even stronger parametric assumptions.
You were given more specific advice on stackoverflow 
http://stackoverflow.com/questions/12499687/wilcoxon-test-and-mean-ratios
If you want to prove that there is *some* difference between the
distributions, you're done. If you want to test for some specific
difference, you need to think more about what kind of test you want
to do.  Permutation tests with various test statistics are a way
to approach that.

  Ben Bolker
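
As an illustration of the permutation-test idea, here is a minimal, generic sketch (the data below are placeholders, not the poster's vectors; the difference in means is used as the statistic, but a difference in medians or trimmed means could be substituted):

```r
set.seed(1)
a <- rnorm(50)                           # placeholder group 1
c <- c(rep(0, 30), rnorm(20, mean = 2))  # placeholder group 2 with many zeros

obs    <- mean(c) - mean(a)              # observed test statistic
pooled <- c(a, c)
perm <- replicate(9999, {
  idx <- sample(length(pooled), length(a))   # random relabelling of the pooled data
  mean(pooled[-idx]) - mean(pooled[idx])
})
# two-sided permutation p-value (add 1 to include the observed statistic)
(1 + sum(abs(perm) >= abs(obs))) / (1 + length(perm))
```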


> 
> On Sep 19, 2012, at 8:43 PM, Thomas Lumley  uw.edu> wrote:
> 
> > On Thu, Sep 20, 2012 at 5:46 AM, Mohamed Radhouane Aniba
> >  gmail.com> wrote:
> >> Hello All,
> >> 
> >> I am writing to ask your opinion on how to interpret this case. I have two
vectors "a" and "b" that I am trying
> to compare.

  [snip]

> > 
> > There's nothing conceptually strange about the Wilcoxon test showing a
> > difference in the opposite direction to the difference in means.  It's
> > probably easiest to think about this in terms of the Mann-Whitney
> > version of the same test, which is based on the proportion of pairs of
> > one observation from each group where the `a' observation is higher.
> > Your 'c' vector has a lot more zeros, so a randomly chosen observation
> > from 'c' is likely to be smaller than one from 'a', but the non-zero
> > observations seem to be larger, so the mean of 'c' is higher.
> > 
> > The Wilcoxon test probably isn't very useful in a setting like this,
> > since its results really make sense only under 'stochastic ordering',
> > where the shift is in the same direction across the whole
> > distribution.
> > 
> >  -thomas
> > 
> > -- 
> > Thomas Lumley
> > Professor of Biostatistics
> > University of Auckland



Re: [R] Sweave - if \Sexpr{} than \SweaveInput{"my.Rnw"}

2012-09-20 Thread Duncan Murdoch

On 20/09/2012 8:47 AM, Witold E Wolski wrote:

Depending on an R computation, I would like to include a particular Sweave document
in the main Sweave document.
How can I do it?

So I was thinking of using LaTeX features:

\newif\ifpaper

\ifpaper

\SweaveInput{"my1.Rnw"}
\else
  \SweaveInput{"my2.Rnw"}
\fi

But how do I set paper to true or false given an \Sexpr ??

\papertrue % or

\paperfalse


Any ideas?


The \SweaveInput directives are processed before any expressions are 
evaluated, so you can't do it that way.  You can have Sweave chunks emit 
LaTeX code, so this might achieve a similar effect:


<>=
if ( test ) name <- "my1"
else name <- "my2"

Sweave( paste0(name, ".Rnw") )
paste0("\\input{", name, ".tex}")
@

I've never tried having a Sweave chunk call Sweave(), so there might be 
troubles there, and you might only be able to input .tex files, not Rnw 
files.
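
For completeness, a hedged sketch of that approach as a full chunk (the chunk options and the `usePaper` flag are assumptions, and cat() is used so the \input line reaches the LaTeX output without the quotes that a bare paste0() result would print):

```
<<conditional-child, results=tex, echo=FALSE>>=
name <- if (isTRUE(usePaper)) "my1" else "my2"  # usePaper computed earlier in the document
Sweave(paste0(name, ".Rnw"))                    # writes name.tex next to the master file
cat(sprintf("\\input{%s.tex}\n", name))
@
```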


Duncan Murdoch



[R] Quantile regression with large number of fixed effects

2012-09-20 Thread C.Steinwender
Dear R users, 

I am trying to estimate a median regression with fixed effects. I have an 
unbalanced panel data set with 5,000 individuals and 10 years, resulting in a 
total of 20,000 observations.

When I try to add individual (firmid) fixed effects to the quantile regression 
using the following command:
result<-rq(y~x+factor(firmid),tau=0.5)

I get the following error message: 
"Error: cannot allocate vector of size 536.9 Mb"

I assume this has to do with the large number of fixed effects (because it 
worked when I only used year fixed effects). I have already set the memory to 
the (I believe) maximum limit using 
memory.limit(4000)

What can I do to implement this regression in R? 

Many thanks in advance,

Claudia

Please access the attached hyperlink for an important electronic communications 
disclaimer: http://lse.ac.uk/emailDisclaimer
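
One avenue worth trying (a sketch, not a verified fix): quantreg ships a sparse solver that avoids materializing the dense ~20,000 x 5,000 dummy-variable matrix. Whether the formula interface accepts it directly depends on your quantreg version:

```r
library(quantreg)
# "sfn" = sparse Newton solver; keeps the factor(firmid) dummies sparse
result <- rq(y ~ x + factor(firmid), tau = 0.5, method = "sfn")
```

Running a 64-bit build of R would also lift the ~4 GB address-space ceiling that memory.limit(4000) runs into on 32-bit Windows.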



[R] Variance Inflation Factor vif() with a matrix

2012-09-20 Thread Martin H. Schmidt

Hi everyone,

Running the vif() function from the car package like


> reg2 <- lm(CARsPur~Delay_max10+LawChange+MarketTrend_20d+MultiTrade, 
data=data.frame(VarVecPur))

> vif(reg2)
Delay_max10   LawChange MarketTrend_20d  MultiTrade
       1.010572        1.009874        1.004278        1.003351


gives a useful result. But using the right-hand variables as a matrix in 
the following way doesn't work with the vif() function:



> reg  <- lm(CARsPur~VarVecPur)
> summary(reg)

Call:
lm(formula = CARsPur ~ VarVecPur)

Residuals:
 Min   1Q   Median   3Q  Max
-0.72885 -0.06461  0.00493  0.06873  0.74936

Coefficients:
  Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.037860   0.006175  -6.131 9.25e-10 ***
VarVecPurDelay_max10  0.003661   0.001593   2.298   0.0216 *
VarVecPurLawChange0.004679   0.006185   0.757   0.4493
VarVecPurMarketTrend_20d  0.019015   0.001409  13.493  < 2e-16 ***
VarVecPurMultiTrade  -0.005081   0.003129  -1.624   0.1045
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1229 on 6272 degrees of freedom
Multiple R-squared: 0.03021,  Adjusted R-squared: 0.02959
F-statistic: 48.84 on 4 and 6272 DF,  p-value: < 2.2e-16

> vif(reg)
Error in vif.lm(reg) : model contains fewer than 2 terms


Is there a solution or a way to work around?

Thank you very much in advance.
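
A workaround that should behave like the first formulation (assuming VarVecPur is a numeric matrix with column names): vif() counts model *terms*, and a matrix on the right-hand side is a single term, so convert it to a data frame first so that each column becomes its own term:

```r
library(car)
# build a data frame in which each column of VarVecPur is a separate variable
d   <- data.frame(CARsPur = CARsPur, as.data.frame(VarVecPur))
reg <- lm(CARsPur ~ ., data = d)  # one term per column, as in reg2 above
vif(reg)
```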



--
Kind Regards,

Martin H. Schmidt
Humboldt University Berlin



Re: [R] validate.lrm - confidence interval for boostrap-corrected AUC ?

2012-09-20 Thread Frank Harrell
It does not offer that feature.  That is an area of active research.  If
anyone knows of a paper that has solved this problem please let me know.
Frank

Whee Sze Ong wrote
> Hi
> 
> 
> 
> Does anyone know whether the rms package provides a confidence interval
> for
> the bootstrap-corrected Dxy or c-index?
> 
> 
> 
> I have fitted a logistic model, and would like to obtain the 95%
> confidence
> interval of the bootstrap-corrected area under the ROC curve estimate.
> 
> 
> 
> Thanks.
> 
>   [[alternative HTML version deleted]]
> 





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/validate-lrm-confidence-interval-for-boostrap-corrected-AUC-tp4643708p4643722.html
Sent from the R help mailing list archive at Nabble.com.



[R] stats q: median regression and using ln(y+1)

2012-09-20 Thread knallgrau
Hi there,

I hope you don't mind me asking a general stats question rather than an
R-specific one.

I'm estimating a median regression model and would like to interpret my (mainly 
categorical) regression coefficients in terms of percentage changes in the 
dependent variable. There are about 10-15% (naturally occurring) zeros in my 
dataset. The dependent variable (y) ranges from zero to about 100,000.

Would using ln(y+1) as the dependent variable be more defensible with quantile 
regression compared to the OLS context?

Best,
Irene



Re: [R] Problem with Newton_Raphson

2012-09-20 Thread Berend Hasselman

On 20-09-2012, at 15:17, Christopher Kelvin wrote:

> By your suggestion if i understand you, i have changed (p[1]) and (p[2]) and 
> have also corrected the error sum(t), but if i run it, the parameters 
> estimates are very large.
> Can you please run it and help me out? The code is given below.
> 
> 
> p1<-0.6;b<-2
> n=20;rr=5000
> U<-runif(n,0,1)
> for (i in 1:rr){
> 
> x<-(-log(1-U^(1/p1))/b)
> 
>  meantrue<-gamma(1+(1/p1))*b
>   meantrue
>   d<-meantrue/0.30
>   cen<- runif(n,min=0,max=d)
>   s<-ifelse(x<=cen,1,0)
>   q<-c(x,cen)
> }
> z<-function(data, p){ 
> shape<-p[1]
> scale<-p[2]
> log1<-n*sum(s)*log(shape)+ 
> n*sum(s)*log(scale)+(shape-1)*sum(s)*log(1-((exp(-(scale)*sum(x)
> -(scale)*sum(x) + (shape)*log((exp(-(scale)*sum(x-
> (shape)*sum(s)*log((exp(-(scale)*sum(x
>   return(-log1)
>   }
>   start <- c(1,1)
>   zz<-optim(start,fn=z,data=q,hessian=T)
>   zz


1. You think you are using a (quasi) Newton method, but the default method is 
"Nelder-Mead". You should specify the method explicitly, for example method="BFGS". 
When you do that you will see that the results are completely different. You 
should also carefully inspect the return value of optim. In your case 
zz$convergence is 1, which means "the iteration limit maxit had been reached".
When you use method="BFGS" you will get zz$convergence equal to 0.

This may do what you want

zz<-optim(start, fn=z, data=q, method="BFGS", hessian=T)
zz

2. when providing examples such as yours it is a good idea to issue 
set.seed() at the start of your script to ensure reproducibility.

3. The function z does not calculate what you want: since fully formed 
expressions are terminated by a newline, and since the remainder of the line 
after log1<- is a complete expression, log1 does not include the value of the two 
following lines.
See the difference between 

a <- 1
b <- 11
a
-b

and

a-
b

So you could write this

log1 <- n*sum(s)*log(shape)+ n*sum(s)*log(scale)+
(shape-1)*sum(s)*log(1-((exp(-(scale)*sum(x) -(scale)*sum(x) + 
(shape)*log((exp(-(scale)*sum(x - 
(shape)*sum(s)*log((exp(-(scale)*sum(x

and you will see rather different results.
You will have to determine if they are what you expect/desire.

A final remark on function z:

- do not calculate things like n*sum(s) repeatedly: doing something like 
A<-n*sum(s) and reusing A is more efficient.
- same thing for log((exp(-(scale)*sum(x (recalculated four times)
- same thing for sum(x)

See below for a partly rewritten function z and results with method="BFGS".
I have put a set.seed(1) at the start of your code.

good luck,

Berend

z<-function(p,data){ 
shape<-p[1]
scale<-p[2]
log1 <- n*sum(s)*log(shape)+ n*sum(s)*log(scale)+
(shape-1)*sum(s)*log(1-((exp(-(scale)*sum(x) -(scale)*sum(x) + 
(shape)*log((exp(-(scale)*sum(x - 
(shape)*sum(s)*log((exp(-(scale)*sum(x
return(-log1)
}

start <- c(1,1)
> z(start, data=q)
[1] -138.7918

> zz<-optim(start, fn=z, data=q, method="BFGS", hessian=T)
There were 34 warnings (use warnings() to see them)
> zz
$par
[1] 1.009614e+11 8.568498e+01

$value
[1] -1.27583e+15

$counts
function gradient 
 270   87 

$convergence
[1] 0

$message
NULL

$hessian
       [,1] [,2]
[1,] -62500    0
[2,]      0    0



Re: [R] Problem with Newton_Raphson

2012-09-20 Thread Berend Hasselman

On 20-09-2012, at 16:06, Berend Hasselman wrote:

> 
> On 20-09-2012, at 15:17, Christopher Kelvin wrote:
> 
> .
> A final remark on function z:
> 
> - do not calculate things like n*sum(s) repeatedly: doing something like 
> A<-n*sum(s) and reusing A is more efficient.
> - same thing for log((exp(-(scale)*sum(x (recalculated four times)
> - same thing for sum(x)
> 

Oops!
No not four times. Twice.
But (exp(-(scale)*sum(x))) is calculated three times.

Berend



Re: [R] R CMD check and browser 2.

2012-09-20 Thread R. Michael Weylandt
On Mon, Sep 17, 2012 at 3:49 PM, Christian Hoffmann
 wrote:
> Hi,
>
> The browser opens, when the command
>
> * checking whether package 'cwhmisc' can be installed ... OK
>
> is executed.
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
>  [1] tcltk stats4splines   parallel  datasets  compiler graphics
>  [8] grDevices stats grid  utils tools methods base
>
> other attached packages:
>  [1] survival_2.36-14  spatial_7.3-3 rpart_3.1-53 nnet_7.3-1
>  [5] nlme_3.1-104  mgcv_1.7-18   foreign_0.8-50 codetools_0.2-8
>  [9] cluster_1.14.2class_7.3-3   boot_1.3-4 Matrix_1.0-6
> [13] MASS_7.3-18   KernSmooth_2.23-7 cwhmisc_3.0 lattice_0.20-6
>>
>
> To my knowledge there is no access to teh help system within the package,
> and this effect is occurring sporadically.
>
> What would a 'minimal "non-working" example' be?

Hi Christian,

Please reply to both me and the list (or just the list, since I'll see
it that way as well) so that others can help you when I am busy (as
for the last few days)

A non-working example would be a package wherein you've chipped away
as much as you can while the problem is still being displayed.
Alternatively, if you really want us to be able to look at this,
you'll have to point to a server where a copy of your package can be
obtained.

Alternatively: does the problem occur if you CMD check other packages?
If so, it's not a problem with your package but rather one of your
environment variables likely got messed up in some creative fashion.

Michael

>
> Cheers
>
> Christian
>
> Am 15.09.12 23:53, schrieb R. Michael Weylandt:
>>
>> On Sat, Sep 15, 2012 at 8:56 PM, Christian Hoffmann
>>  wrote:
>>>
>>> Hi,
>>>
>>> Every time I do
>>>
>>> system("R CMD check mypackckage"
>>>
>>> the process executes a help.start(). Why does this happen?
>>>
>> Hi Christian,
>>
>> I'm afraid I can't replicate this with the packages I keep local
>> copies of and installed version of R: could you perhaps make the
>> package source available (or point us to it if its own GitHub,
>> r-forge, etc.?) and give us your sessionInfo() so we can try to
>> reproduce? If the package is sensitive, perhaps work up a minimal
>> "non-working" example?
>>
>>
>> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
>>
>> My immediate guess is that something in the package accesses the help
>> system, but obviously I can't be sure without having something to play
>> with.
>>
>> Cheers,
>> Michael
>>
>
> --
> Christian W. Hoffmann,
> CH - 8915 Hausen am Albis, Switzerland
> Rigiblickstrasse 15 b, Tel.+41-44-7640853
> c-w.hoffm...@sunrise.ch,
> christ...@echoffmann.ch,
> www.echoffmann.ch
>



Re: [R] Problem with Newton_Raphson

2012-09-20 Thread Christopher Kelvin
Following your suggestion, if I understand you, I have changed (p[1]) and (p[2]) and 
have also corrected the error sum(t), but when I run it, the parameter estimates 
are very large.
Can you please run it and help me out? The code is given below.


p1<-0.6;b<-2
n=20;rr=5000
U<-runif(n,0,1)
for (i in 1:rr){

x<-(-log(1-U^(1/p1))/b)

 meantrue<-gamma(1+(1/p1))*b
  meantrue
  d<-meantrue/0.30
  cen<- runif(n,min=0,max=d)
  s<-ifelse(x<=cen,1,0)
  q<-c(x,cen)
}
    z<-function(data, p){ 
    shape<-p[1]
    scale<-p[2]
    log1<-n*sum(s)*log(shape)+ 
n*sum(s)*log(scale)+(shape-1)*sum(s)*log(1-((exp(-(scale)*sum(x)
-(scale)*sum(x) + (shape)*log((exp(-(scale)*sum(x-
(shape)*sum(s)*log((exp(-(scale)*sum(x
  return(-log1)
  }
  start <- c(1,1)
  zz<-optim(start,fn=z,data=q,hessian=T)
  zz



Thank you
Chris Kelvin








- Original Message -
From: Berend Hasselman 
To: Christopher Kelvin 
Cc: "r-help@r-project.org" 
Sent: Thursday, September 20, 2012 8:52 PM
Subject: Re: [R] Problem with Newton_Raphson


On 20-09-2012, at 13:46, Christopher Kelvin wrote:

> Hello,
> I have being trying to estimate the parameters of the generalized exponential 
> distribution. The random number generation for the GE distribution is 
> x<-(-log(1-U^(1/p1))/b), where U stands for uniform dist. The data i have 
> generated to estimate the parameters is right censored and the code is given 
> below; The problem is that, the newton-Raphson approach isnt working and i do 
> not know what is wrong. Can somebody suggest something or help in identifying 
> what the problem might be. 
> 

Newton-Raphson? is not a method for optim.

> p1<-0.6;b<-2
> n=20;rr=5000
> U<-runif(n,0,1)
> for (i in 1:rr){
> 
> x<-(-log(1-U^(1/p1))/b)
> 
>  meantrue<-gamma(1+(1/p1))*b
>   meantrue
>   d<-meantrue/0.30
>   cen<- runif(n,min=0,max=d)
>   s<-ifelse(x<=cen,1,0)
>   q<-c(x,cen)
> 
>     z<-function(data, p){ 
>     shape<-p[1]
>     scale<-p[2]
>     log1<-n*sum(s)*log(p[1])+ 
>n*sum(s)*log(p[2])+(p[1]-1)*sum(s)*log(1-((exp(-(p[2])*sum(x)
> -(p[2])*sum(t) + (p[1])*log((exp(-(p[2])*sum(x-
> (p[1])*sum(s)*log((exp(-(p[2])*sum(x
>   return(-log1)
>   }
> }
>   start <- c(1,1)
>   zz<-optim(start,fn=z,data=q,hessian=T)
>   zz
>   m1<-zz$par[2]
>   p<-zz$par[1] 
> 

Running your code given above gives an error message:

Error in sum(t) : invalid 'type' (closure) of argument

Where is object 't'?

Why are you defining function z within the rr loop? Only the last definition is 
given to optim.
Why use p[1] and p[2] explicitly in the calculation of log1 in the body of 
function z when you can use shape and scale defined in the lines before log1 <-.

Berend



[R] Applying glm coefficients (Beginner Question)

2012-09-20 Thread SirRon
Hello,
I am working with a dataset with three variables and one binomial parameter.
The glm function provides coefficients for these three variables, e.g.
-1.5 | 27.2 | -2.9

If I'm not mistaken, $fitted.values gives me an estimate of how likely my
parameter is to be true/1. I would like to apply these coefficients to
other variables to predict the binomial parameter, but I'm not sure how to
make use of them.

To clarify a bit more I'm looking for a formula to calculate the chance that
the parameter is true/1, based on the three variables/coefficients,
something like

-1.5*V1+27.2*V2-2.9*V3

I hope someone understands my awkwardly worded question and is able to help
me out - thanks!
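
In case it helps: for a binomial glm with the default logit link, the fitted probability is the inverse logit of the linear predictor, and the intercept must be included. A sketch with placeholder names (`fit`, `new`, and V1-V3 are assumptions, not the poster's objects):

```r
# fit <- glm(y ~ V1 + V2 + V3, family = binomial, data = train)
b   <- coef(fit)                       # intercept first, then V1..V3
eta <- b[1] + b[2]*new$V1 + b[3]*new$V2 + b[4]*new$V3
p   <- plogis(eta)                     # = 1 / (1 + exp(-eta))
# equivalent, and less error-prone:
p2  <- predict(fit, newdata = new, type = "response")
```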



--
View this message in context: 
http://r.789695.n4.nabble.com/Applying-glm-coefficients-Beginner-Question-tp4643737.html
Sent from the R help mailing list archive at Nabble.com.



[R] Line over Boxplot

2012-09-20 Thread gfishel
Very much a rookie at R, and have only recently started using it again so
pardon the simple question. I am trying to produce a box plot from one data
set and then overlay a line plot from another data set. The box plot data
set is made up of 20 sets of 30 data points, or 600 total data points. The
line has only 30 total data points. The box plot is plotting fine, but for
some reason, the line plot is starting at the 6th data position and running
off the screen. I tried modifying the text file so that the data repeated
itself 30 times to make the total number of lines in the file identical, but
that did not help! Here are my two datasets.

temp.final.text
  
tmax.final.text
  

And here is my script

R   --save --no-save --vanilla  <<  EOF

pdf(file="boxplot_tmax_$$MM$DD${HH}.pdf", height=10, width=12)

soton.df = read.table ("tmax.final.text", header=TRUE)
gfs.df = read.table ("greg.txt", header=TRUE)
boxplot (TMAX ~ HOUR, data=soton.df, xlab="Forecast Hour", ylab="MAX TEMP",
main="GEFS $$MM$DD ${HH}Z FORECAST MAX TEMPS", whiskcol="red",
col="red", outline=TRUE, ylim=c(0,100),xlim=c(1,30),xaxs="i",yaxs="i")
lines (data=gfs.df, type="o", col="green")
par(new=TRUE)
abline(h=seq(0,100,by=5),lty=2)
abline(v=seq(1,30,by=1),lty=2)



EOF

Thanks for helping me out!
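
For what it's worth: lines() has no `data` argument, so `lines(data=gfs.df, ...)` cannot work like the formula interface of boxplot(); it needs explicit x and y vectors, and the boxplot groups sit at x positions 1, 2, .... A sketch (the column name `TMAX` in greg.txt is an assumption):

```r
# after the boxplot() call; one overlay point per forecast hour
lines(seq_len(nrow(gfs.df)), gfs.df$TMAX, type = "o", col = "green")
```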




--
View this message in context: 
http://r.789695.n4.nabble.com/Line-over-Boxplot-tp4643736.html
Sent from the R help mailing list archive at Nabble.com.



[R] lattice dotplot reorder contiguous levels

2012-09-20 Thread maxbre
my reproducible example

test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A", 
"B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902, 
0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442, 
10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315, 
30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30, 
0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61, 
3.39, 20, 4.59), samp.time = structure(c(2L, 4L, 4L, 4L, 4L, 
4L, 5L, 4L, 8L, 8L, 8L, 8L, 8L, 9L, 8L, 7L, 8L, 8L, 8L, 8L, 3L, 
3L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 6L, 4L, 8L, 4L, 8L, 4L, 3L, 
8L, 4L, 8L, 4L, 8L, 4L, 9L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 1L), .Label = c("2", 
"4", "12", "24", "96", "135", "167", "168", "169"), class = "factor")),
.Names = c("site", 
"conc", "samp.time"), row.names = c(NA, 52L), class = "data.frame")



dotplot(samp.time~conc|site, data=test,
scales=list(x=list(log=10), y = list(relation = "free")),
layout=c(1,5), strip=FALSE, strip.left=TRUE
)


my objective is to use “site” as the conditioning variable but with “samp.time”
correctly grouped by “site”; the problem here is to ensure that the levels of
“samp.time” within each “site” are contiguous, as otherwise they would not be
contiguous in the dot plot itself (i.e. avoiding the gaps between the y-axis
categories - see the dotplot)


I’ve been trying with this but without much success

test$samp.time.new<-
  with(test,reorder(samp.time,as.numeric(site)))


dotplot(samp.time.new~conc|site, data=test,
scales=list(x=list(log=10), y = list(relation = "free")),
layout=c(1,5), strip=FALSE, strip.left=TRUE
)

I think (I hope) a possible alternative solution is to create, for "ylim", a
suitable character vector of a different length to pass to each panel of the
dotplot (I'm not posting this attempt because it is too muddled so far)

can anyone point me in the right direction?
any help much appreciated

thank you




--
View this message in context: 
http://r.789695.n4.nabble.com/lattice-dotplot-reorder-contiguous-levels-tp4643741.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Problem with Newton_Raphson

2012-09-20 Thread Christopher Kelvin
Thank you very much for everything. Your suggestions were very helpful. 

Chris


- Original Message -
From: Berend Hasselman 
To: Christopher Kelvin 
Cc: "r-help@r-project.org" 
Sent: Thursday, September 20, 2012 10:06 PM
Subject: Re: [R] Problem with Newton_Raphson


On 20-09-2012, at 15:17, Christopher Kelvin wrote:

> By your suggestion if i understand you, i have changed (p[1]) and (p[2]) and 
> have also corrected the error sum(t), but if i run it, the parameters 
> estimates are very large.
> Can you please run it and help me out? The code is given below.
> 
> 
> p1<-0.6;b<-2
> n=20;rr=5000
> U<-runif(n,0,1)
> for (i in 1:rr){
> 
> x<-(-log(1-U^(1/p1))/b)
> 
>  meantrue<-gamma(1+(1/p1))*b
>   meantrue
>   d<-meantrue/0.30
>   cen<- runif(n,min=0,max=d)
>   s<-ifelse(x<=cen,1,0)
>   q<-c(x,cen)
> }
>     z<-function(data, p){ 
>     shape<-p[1]
>     scale<-p[2]
>     log1<-n*sum(s)*log(shape)+ 
>n*sum(s)*log(scale)+(shape-1)*sum(s)*log(1-((exp(-(scale)*sum(x)
> -(scale)*sum(x) + (shape)*log((exp(-(scale)*sum(x-
> (shape)*sum(s)*log((exp(-(scale)*sum(x
>   return(-log1)
>   }
>   start <- c(1,1)
>   zz<-optim(start,fn=z,data=q,hessian=T)
>   zz


1. You think you are using a (quasi) Newton method. But the default method is 
"Nelder-Mead". You should specify method explicitly for example method="BFGS". 
When you do that you will see that the results are completely different. You 
should also carefully inspect the return value of optim. In your case 
zz$convergence is 1 which means "indicates that the iteration limit maxit had 
been reached.".
When you use method="BFGS" you will get zz$ convergence is 0.

This may do what you want

zz<-optim(start, fn=z, data=q, method="BFGS", hessian=T)
zz

2. when providing examples such as yours it is a good idea to issue 
set.seed() at the start of your script to ensure reproducibility.

3. The function z does not calculate what you want: since fully formed 
expressions are terminated by a newline and since the remainder of the line 
after log1<- is a complete expression, log1 does include the value of the two 
following  lines.
See the difference between 

a <- 1
b <- 11
a
-b

and

a-
b

So you could write this

    log1 <- n*sum(s)*log(shape)+ n*sum(s)*log(scale)+
            (shape-1)*sum(s)*log(1-((exp(-(scale)*sum(x) -(scale)*sum(x) + 
            (shape)*log((exp(-(scale)*sum(x - 
(shape)*sum(s)*log((exp(-(scale)*sum(x

and you will see rather different results.
You will have to determine if they are what you expect/desire.

A final remark on function z:

- do not calculate things like n*sum(s) repeatedly: doing something like 
A<-n*sum(s) and reusing A is more efficient.
- same thing for log((exp(-(scale)*sum(x (recalculated four times)
- same thing for sum(x)

See below for a partly rewritten function z and results with method="BFGS".
I have put a set.seed(1) at the start of your code.

good luck,

Berend

z<-function(p,data){ 
    shape<-p[1]
    scale<-p[2]
    log1 <- n*sum(s)*log(shape)+ n*sum(s)*log(scale)+
            (shape-1)*sum(s)*log(1-((exp(-(scale)*sum(x) -(scale)*sum(x) + 
            (shape)*log((exp(-(scale)*sum(x - 
(shape)*sum(s)*log((exp(-(scale)*sum(x
    return(-log1)
}

start <- c(1,1)
> z(start, data=q)
[1] -138.7918

> zz<-optim(start, fn=z, data=q, method="BFGS", hessian=T)
There were 34 warnings (use warnings() to see them)
> zz
$par
[1] 1.009614e+11 8.568498e+01

$value
[1] -1.27583e+15

$counts
function gradient 
     270       87 

$convergence
[1] 0

$message
NULL

$hessian
       [,1] [,2]
[1,] -62500    0
[2,]      0    0



Re: [R] issue accessing help files

2012-09-20 Thread Basil Iannone
Hi Heramb,

No "Help(anova)" does not work either. Thanks for the suggestions. I put
the question out there in case anyone else was having similar problems. I
think I will throw in the towel and install the latest version of R to see
if that resolves the issue.

Thanks

On Thu, Sep 20, 2012 at 12:46 AM, Heramb Gadgil wrote:

> Try this;
>
> help(anova)
>
> I have used this in R-2.14.1
>
> It has worked fine for me. Hope it works for you as well.
>
> Best,
> Heramb
>
> On Thu, Sep 20, 2012 at 1:40 AM, Rui Barradas wrote:
>
>> Hello,
>>
>> I had a problem seeing the help pages with R 2.14.(0 or 1? I don't
>> remember) on Windows 7.
>> Then I realized that after a command like print, Rgui would first display
>> an error message saying that the temp directory used by help didn't exist.
>> The solution I've found was to manually create it whenever this happened.
>> And to issue a print statement after a missed ?function one. Apparently the
>> error message was waiting to be displayed somewhere.
>>
>> The problem was corrected with R 2.15.0 so I recommend you update your
>> installation of R.
>>
>> Hope this helps,
>>
>> Rui Barradas
>> Em 19-09-2012 19:19, Basil Iannone escreveu:
>>
>>  Dear R-help community,
>>>
>>> I am unable to access help files when using the typical
>>> "?function.of.interest" command.
>>>
>>> For example, if I type "?anova", Internet Explorer opens, but I am never
>>> connected to the usual page describing the nuances of the anova function.
>>>
>>> This is a new problem (just started occurring a few days ago). I am
>>> currently using R 2.14.1 on a Windows XP machine. As suggested by someone
>>> else on another list serve I have placed the session info below for a
>>> brief
>>> session where this occurred. NOTE: I am not trying to access any
>>> functions
>>> for libraries that are not active (for example, anova is in the default
>>> Stats package).
>>>
>>> Any suggestions on how to resolve this issue or any insights on why this
>>> may be occurring are greatly appreciated. Thanks in advance for your
>>> help.
>>>
>>>  sessionInfo()

>>> R version 2.14.1 (2011-12-22)
>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats graphics  grDevices utils datasets  methods   base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.14.1
>>>
>>>
>>>
>
>


-- 
Basil Iannone
University of Illinois at Chicago
Department of Biological Sciences (MC 066)
845 W. Taylor St.
Chicago, IL  60607-7060
Email: bian...@uic.edu
Phone: 312-355-3231
Fax: 312-413-2435
http://www2.uic.edu/~bianno2



[R] puzzling script bug

2012-09-20 Thread Robert Douglas Kinley
Windows XP (SP3)   ,   R 2.15.1  32bit

Hi ...

I have a script which fails and closes my R session.

Unfortunately, it bombs out at a different point each time I run it.

I'm guessing that it may be something to do with memory management, or
perhaps it's to do with the various .C dll's the script calls.

Has anyone come across similar problems and if so, how did you track down the 
cause ?

Pathetically grateful for any pointers ...

Cheers   Bob Kinley
 



[R] fourier series for finding average values

2012-09-20 Thread eliza botto

Dear UseRs,
I have a matrix of 365 rows and 444 columns. I drew each column of this matrix
against the number of days in a year, which are obviously 365. Now I have 444
curves, and I want to use Fourier analysis for the approximation of the average
values.
Does anyone know how to do it?
Any help in this regard will be deeply appreciated...
regards
eliza
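One possible reading of the question — a minimal sketch with made-up data, assuming the goal is to smooth the day-by-day means with a low-order Fourier (harmonic) regression:

```r
set.seed(1)
m <- matrix(rnorm(365 * 444), nrow = 365)  # stand-in for the 365 x 444 matrix
avg <- rowMeans(m)                         # average value for each day
day <- 1:365

# approximate the daily averages with an intercept plus two harmonics
fit <- lm(avg ~ sin(2*pi*day/365) + cos(2*pi*day/365) +
                sin(4*pi*day/365) + cos(4*pi*day/365))
smoothed <- fitted(fit)                    # Fourier approximation, length 365
```

More harmonics can be added until the fit is adequate; fft() is the alternative if the full spectrum is wanted.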
  


Re: [R] effective way to return only the first argument of "which()"

2012-09-20 Thread Mike Spam
Thank you very much, especially Milan and Bert!
I will do some speedtests and fit the function to my needs.

I think the best way would be a modified function in C...
But I am not familiar enough with C. Perhaps this would be a simple
but useful extension.
If someone has a solution, i would appreciate a post in this mailing list.

Cheers and thanks to all,
Nico
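For the archives: base R already has Position(), which walks the vector and stops at the first match, so no full logical vector is allocated — not compiled C tuned for this job, but a reasonable middle ground:

```r
x <- c(0.2, 0.5, 0.995, 0.1, 0.999)
Position(function(v) v > 0.99, x)  # 3: index of the first element > 0.99
```

The predicate is an ordinary R function, so for very long vectors the per-element call overhead still matters; the C extension discussed above would avoid that.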


2012/9/19 Bert Gunter :
> Well, following up on this observation, which can be put under the
> heading of "Sometimes vectorization can be much slower than explicit
> loops" , I offer the following:
>
>  firsti  <-function(x,k)
> {
>   i <- 1
>   while(x[i]<=k){i <- i+1}
>   i
> }
>
>> system.time(for(i in 1:100)which(x>.99)[1])
>    user  system elapsed
>   19.10    2.42   22.20
>
>> system.time(for(i in 1:100)which.max(x>.99))
>    user  system elapsed
>   30.45    6.75   37.46
>
>> system.time(for(i in 1:100)firsti(x,.99))
>    user  system elapsed
>    0.03    0.00    0.03
>
> ## About a 500 - 1000 fold speedup !
>
>> firsti(x,.99)
> [1] 122
>
> It doesn't seem to scale too badly, either (whatever THAT means!):
> (Of course, the which() versions are essentially identical in timing,
> and so are omitted)
>
>> system.time(for(i in 1:100)firsti(x,.))
>    user  system elapsed
>    2.70    0.00    2.72
>
>> firsti(x,.)
> [1] 18200
>
> Of course, at some point, the explicit looping is awful -- with k =
> .99, the index was about 36, and the timing test took 54
> seconds.
>
> So I guess the point is -- as always -- that the optimal approach
> depends on the nature of the data. Prudence and robustness clearly
> demands the vectorized which() approaches if you have no information.
> But if you do know something about the data, then you can often write
> much faster tailored solutions. Which is hardly revelatory, of course.
>
> Cheers to all,
> Bert
>
> On Wed, Sep 19, 2012 at 8:55 AM, Milan Bouchet-Valat  
> wrote:
>> Le mercredi 19 septembre 2012 à 15:23 +, William Dunlap a écrit :
>>> The original method is faster than which.max for longish numeric vectors
>>> (in R-2.15.1), but you should check time and memory usage on your
>>> own machine:
>>>
>>> > x <- runif(18e6)
>>> > system.time(for(i in 1:100)which(x>0.99)[1])
>>>    user  system elapsed
>>>   11.64    1.05   12.70
>>> > system.time(for(i in 1:100)which.max(x>0.99))
>>>    user  system elapsed
>>>   16.38    2.94   19.35
>> If you know the probability that such an element appears at the beginning of
>> the vector, a custom hack might well be more efficient. The problem with
>> ">", which() and which.max() is that they will go over all the elements
>> of the vector even if it's not needed at all. So you can start with a
>> small subset of the vector, and increase its size in a few steps until
>> you find the value you're looking for.
>>
>> Proof of concept (the values of n obviously need to be adapted):
>> x <-runif(1e7)
>>
>> find <- function(x, lim) {
>> len <- length(x)
>>
>> for(n in 2^(14:0)) {
>> val <- which(x[seq.int(1L, len/n)] > lim)
>>
>> if(length(val) > 0) return(val[1])
>> }
>>
>> return(NULL)
>> }
>>
>>> system.time(for(i in 1:100)which(x>0.999)[1])
>> utilisateur système  écoulé
>>   9.740   5.795  15.890
>>> system.time(for(i in 1:100)which.max(x>0.999))
>> utilisateur système  écoulé
>>  14.288   9.510  24.562
>>> system.time(for(i in 1:100)find(x, .999))
>> utilisateur système  écoulé
>>   0.017   0.002   0.019
>>> find(x, .999)
>> [1] 1376
>>
>> (Looks almost like cheating... ;-)
>>
>>
>>
>>
>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>> > -Original Message-
>>> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
>>> > On Behalf
>>> > Of Jeff Newmiller
>>> > Sent: Wednesday, September 19, 2012 8:06 AM
>>> > To: Mike Spam; r-help@r-project.org
>>> > Subject: Re: [R] effective way to return only the first argument of 
>>> > "which()"
>>> >
>>> > ?which.max
>>> > ---
>>> > Jeff NewmillerThe .   .  Go 
>>> > Live...
>>> > DCN:Basics: ##.#.   ##.#.  Live 
>>> > Go...
>>> >   Live:   OO#.. Dead: OO#..  Playing
>>> > Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
>>> > /Software/Embedded Controllers)   .OO#.   .OO#.  
>>> > rocks...1k
>>> > ---
>>> > Sent from my phone. Please excuse my brevity.
>>> >
>>> > Mike Spam  wrote:
>>> >
>>> > >Hi,
>>> > >
>>> > >I was looking for a function like "which()" but only returns the first
>>> > >argument.
>>> > >Compare:
>>> > >
>>> > >x <- c(1,2,3,4,5,6)
>>> > >y <- 4
>>> > >which(x>y)
>>> > >
>>> > >returns:
>>> > >5,6
>>> > >
>>> > >which(x>y)[1]
>>> > >returns:
>>> > >5
>>> > >
>>> > >which(x>y)[1] is e

Re: [R] Variance Inflation Factor VIC() with a matrix

2012-09-20 Thread John Fox
Dear Martin,

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Martin H. Schmidt
> Sent: Thursday, September 20, 2012 8:52 AM
> To: r-help@r-project.org
> Subject: [R] Variance Inflation Factor VIC() with a matrix
> 
> Hi everyone,
> 
> Running the vif() function from the car package like
> 
> 
>  > reg2 <- lm(CARsPur~Delay_max10+LawChange+MarketTrend_20d+MultiTrade,
> data=data.frame(VarVecPur))
>  > vif(reg2)
>  Delay_max10   LawChange MarketTrend_20d  MultiTrade
>        1.010572        1.009874        1.004278        1.003351
> 
> 
> gives a useful result. But using the right-hand variables as a matrix in
> the following way doesn't work with the vif() function:
> 
> 
>  > reg  <- lm(CARsPur~VarVecPur)
>  > summary(reg)
> 
> Call:
> lm(formula = CARsPur ~ VarVecPur)
> 
> Residuals:
>   Min   1Q   Median   3Q  Max
> -0.72885 -0.06461  0.00493  0.06873  0.74936
> 
> Coefficients:
>Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -0.037860   0.006175  -6.131 9.25e-10 ***
> VarVecPurDelay_max10  0.003661   0.001593   2.298   0.0216 *
> VarVecPurLawChange0.004679   0.006185   0.757   0.4493
> VarVecPurMarketTrend_20d  0.019015   0.001409  13.493  < 2e-16 ***
> VarVecPurMultiTrade  -0.005081   0.003129  -1.624   0.1045
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 0.1229 on 6272 degrees of freedom
> Multiple R-squared: 0.03021,Adjusted R-squared: 0.02959
> F-statistic: 48.84 on 4 and 6272 DF,  p-value: < 2.2e-16
> 
>  > vif(reg)
> Error in vif.lm(reg) : model contains fewer than 2 terms
> 
> 
> Is there a solution or a way to work around?

Not with vif() in the car package, which wants to compute generalized variance 
inflation factors (GVIFs) for multi-df terms in the model. Single-df VIFs are 
pretty simple, so you could just write your own function. Alternatively, there 
are other packages on CRAN, such as DAAG, that compute VIFs, so you might try 
one of these.

I hope this helps,
 John

---
John Fox
Senator McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
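As a sketch of the "write your own" suggestion above: for single-df numeric predictors, VIF_j = 1/(1 - R_j^2), which is just the j-th diagonal entry of the inverse of the predictors' correlation matrix. Assuming a numeric matrix X of right-hand-side variables (simulated here):

```r
vif_manual <- function(X) {
  # VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of solve(cor(X))
  diag(solve(cor(X)))
}

set.seed(1)
X <- matrix(rnorm(300), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
vif_manual(X)  # all close to 1 for independent simulated predictors
```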


> 
> Thank you very much in advance.
> 
> 
> 
> --
> Kind Regards,
> 
> Martin H. Schmidt
> Humboldt University Berlin
> 


Re: [R] Sweave - if \Sexpr{} than \SweaveInput{"my.Rnw"}

2012-09-20 Thread Yihui Xie
If you want to program Sweave documents, you can try the knitr
package. This case will be something like:

<<>>=
paper <- TRUE # or change it to FALSE
@

<<eval=paper, child='my1.Rnw'>>=
@

<<eval=!paper, child='my2.Rnw'>>=
@

i.e. you use the logical variable 'paper' to control which child
document to include in the parent document. See
http://yihui.name/knitr/

Regards,
Yihui
--
Yihui Xie 
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA
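For the pure-LaTeX route in the original question below, one untested sketch is to let \Sexpr emit the TeX switch itself, assuming 'paper' is a logical defined in an earlier chunk:

```latex
\newif\ifpaper
\Sexpr{if (paper) "\\papertrue" else "\\paperfalse"}

\ifpaper
  \SweaveInput{my1.Rnw}
\else
  \SweaveInput{my2.Rnw}
\fi
```

Note that Sweave runs before TeX and ignores the TeX conditional, so both children are still woven; only the LaTeX inclusion is conditional — which is why the knitr eval= approach above is cleaner.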


On Thu, Sep 20, 2012 at 7:47 AM, Witold E Wolski  wrote:
> Depending on an R computation I would like to include an Sweave documents
> in the main Sweave document.
> How can I do it?
>
> So I was thinking  to use Latex features :
>
> \newif\ifpaper
>
> \ifpaper
>
> \SweaveInput{"my1.Rnw"}
> \else
>  \SweaveInput{"my2.Rnw"}
> \fi
>
> But how do I set paper to true or false given an \Sexpr ??
>
> \papertrue % or
>
> \paperfalse
>
>
> Any ideas?
>
>
> cheers
>
>
> --
> Witold Eryk Wolski
>
> Triemlistrasse 155
> 8047 Zuerich



Re: [R] Applying glm coefficients (Beginner Question)

2012-09-20 Thread David Winsemius

On Sep 20, 2012, at 6:55 AM, SirRon wrote:

> Hello,
> I am working with a dataset with three variables and one binomial parameter.
> The glm function provides coefficients for these three variables, e.g.
> -1.5 | 27.2 | -2.9
> 
> If I'm not mistaken, $fitted.values gives me an estimate of how likely my
> parameter is to be true/1 .

Not at all how I would have expressed it.

> I would like to apply these coefficients on
> other variables to predict the binomial parameter but I'm not sure how to
> make use of them.

On other instances of similarly measured variables? Then use the new data 
argument to predict().

> 
> To clarify a bit more I'm looking for a formula to calculate the chance that
> the parameter is true/1, based on the three variables/coefficients,
> something like
> 
> -1.5*V1+27.2*V2-2.9*V3

I am guessing you will be using the type="response" argument to predict(), but 
again, it is not the case that this will answer the question as you have 
expressed it and as I have interpreted it. It is not going to return the 
probability that the "parameter is true", at least if the word "parameter" is 
what most people are calling "coefficient". 

?predict.glm

> I hope someone understands my awkwardly worded question and is able to help
> me out - thanks!

Awkwardly worded questions will get much better answers if they are accompanied 
by some test data.

-- 
David Winsemius, MD
Alameda, CA, USA
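To make the predict() route concrete — a minimal sketch with simulated data standing in for the poster's three variables:

```r
set.seed(1)
d <- data.frame(V1 = rnorm(50), V2 = rnorm(50), V3 = rnorm(50))
d$y <- rbinom(50, 1, plogis(0.5 * d$V1 - 0.3 * d$V2))  # made-up outcome

fit <- glm(y ~ V1 + V2 + V3, data = d, family = binomial)

# probability that y == 1 for a new observation:
newd <- data.frame(V1 = 0.1, V2 = -0.2, V3 = 0.3)
p <- predict(fit, newdata = newd, type = "response")

# identical to applying the coefficients by hand on the link scale:
p_hand <- plogis(sum(coef(fit) * c(1, 0.1, -0.2, 0.3)))
```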



[R] Fortune nomination

2012-09-20 Thread Bert Gunter
On Thu, Sep 20, 2012 at 9:01 AM, David Winsemius  wrote:

(In response to an OP's apology for an "awkwardly worded question"):

> Awkwardly worded questions will get much better answers if they are 
> accompanied by some test data.

Fortune nomination!

Cheers,
Bert

> --
> David Winsemius, MD
> Alameda, CA, USA
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] (no subject)

2012-09-20 Thread Rui Barradas
Maybe it is longer, but it's also more general: it issues an error if 
the tables are not 1-dim. That's where most of the function's extra 
lines are. Otherwise it's the same as your first solution. The second 
one has the problem you've mentioned.


Rui Barradas
Em 20-09-2012 16:46, Stefan Th. Gries escreveu:

Yes, but this is way longer than any of the three solutions I sent, is it not?
STG
--
Stefan Th. Gries
---
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries
---


On Thu, Sep 20, 2012 at 8:43 AM, Rui Barradas  wrote:

Hello,

The trick is to use the table's dimnames attributes. Try the following.

addTables <- function(t1, t2){
 dn1 <- dimnames(t1)
 dn2 <- dimnames(t2)
 if(length(dn1) == 1){
 dn1 <- unlist(dn1)
 dn2 <- unlist(dn2)
 dns <- sort(unique(c(dn1, dn2)))
 tsum <- array(integer(length(dns)), dim = length(dns))
 dimnames(tsum) <- list(dns)
 tsum[dn1] <- t1
 tsum[dn2] <- tsum[dn2] + t2
 }else
 stop(paste("table with", length(dn1), "dimensions is not implemented."))
 tsum
}


a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
ta <- table(a)
tb <- table(b)
rm(a, b)

addTables(ta, tb)

Hope this helps,

Rui Barradas
Em 20-09-2012 15:57, Stefan Th. Gries escreveu:

From my book on corpus linguistics with R:

# (10)   Imagine you have two vectors a and b such that
a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b<-c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")

# Of these vectors, you can create frequency lists by writing
freq.list.a<-table(a); freq.list.b<-table(b)
rm(a); rm(b)

# How do you merge these two frequency lists without merging the two
vectors first? More specifically, if I delete a and b from your
memory,
rm(a); rm(b)
# how do you generate the following table only from freq.list.a and
freq.list.b, i.e., without any reference to a and b themselves? Before
you complain about this question as being unrealistic, consider the
possibility that you generated the frequency lists of two corpora
(here, a and b) that are so large that you cannot combine them into
one (a.and.b<-c(a, b)) and generate a frequency list of that combined
vector (table(a.and.b)) ...
joint.freqs
a b d e f g i j
3 1 3 1 5 5 1 1

joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a),
names(freq.list.b)))))) # You generate an empty vector joint.freqs (i)
that is as long as there are different types in both a and b (but note
that, as requested, this information is not taken from a or b, but
from their frequency lists) ...
names(joint.freqs)<-sort(unique(c(names(freq.list.a),
names(freq.list.b)))) # ... and (ii) whose elements have these
different types as names.
joint.freqs[names(freq.list.a)]<-freq.list.a # The elements of the new
vector joint.freqs that have the same names as the frequencies in the
first frequency list are assigned the respective frequencies.

joint.freqs[names(freq.list.b)]<-joint.freqs[names(freq.list.b)]+freq.list.b
# The elements of the new vector joint.freqs that have the same names
as the frequencies in the second frequency list are assigned the sum
of the values they already have (either the ones from the first
frequency list or just zeroes) and the respective frequencies.
joint.freqs # look at the result

# Another shorter and more elegant solution was proposed by Claire
Crawford (but uses a function which will only be introduced later in
the book)
freq.list.a.b<-c(freq.list.a, freq.list.b) # first the two frequency
lists are merged into a single vector ...
joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b),
sum)) # ... and then the sums of all numbers that share the same names
are computed
joint.freqs # look at the result

# The shortest, but certainly not memory-efficient way to do this
involves just using the frequency lists to create one big vector with
all elements and tabulate that.
table(c(rep(names(freq.list.a), freq.list.a), rep(names(freq.list.b),
freq.list.b))) # kind of cheating but possible with short vectors ...

HTH,
STG
--
Stefan Th. Gries
---
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries







Re: [R] (no subject)

2012-09-20 Thread Gabor Grothendieck
On Thu, Sep 20, 2012 at 10:57 AM, Stefan Th. Gries  wrote:
> >From my book on corpus linguistics with R:
>
> # (10)   Imagine you have two vectors a and b such that
> a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
> b<-c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
>
> # Of these vectors, you can create frequency lists by writing
> freq.list.a<-table(a); freq.list.b<-table(b)
> rm(a); rm(b)
>
> # How do you merge these two frequency lists without merging the two
> vectors first? More specifically, if I delete a and b from your
> memory,
> rm(a); rm(b)
> # how do you generate the following table only from freq.list.a and
> freq.list.b, i.e., without any reference to a and b themselves? Before
> you complain about this question as being unrealistic, consider the
> possibility that you generated the frequency lists of two corpora
> (here, a and b) that are so large that you cannot combine them into
> one (a.and.b<-c(a, b)) and generate a frequency list of that combined
> vector (table(a.and.b)) ...
> joint.freqs
> a b d e f g i j
> 3 1 3 1 5 5 1 1
>
> joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a),
> names(freq.list.b)))))) # You generate an empty vector joint.freqs (i)
> that is as long as there are different types in both a and b (but note
> that, as requested, this information is not taken from a or b, but
> from their frequency lists) ...
> names(joint.freqs)<-sort(unique(c(names(freq.list.a),
> names(freq.list.b)))) # ... and (ii) whose elements have these
> different types as names.
> joint.freqs[names(freq.list.a)]<-freq.list.a # The elements of the new
> vector joint.freqs that have the same names as the frequencies in the
> first frequency list are assigned the respective frequencies.
> joint.freqs[names(freq.list.b)]<-joint.freqs[names(freq.list.b)]+freq.list.b
> # The elements of the new vector joint.freqs that have the same names
> as the frequencies in the second frequency list are assigned the sum
> of the values they already have (either the ones from the first
> frequency list or just zeroes) and the respective frequencies.
> joint.freqs # look at the result
>
> # Another shorter and more elegant solution was proposed by Claire
> Crawford (but uses a function which will only be introduced later in
> the book)
> freq.list.a.b<-c(freq.list.a, freq.list.b) # first the two frequency
> lists are merged into a single vector ...
> joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b),
> sum)) # ... and then the sums of all numbers that share the same names
> are computed
> joint.freqs # look at the result
>
> # The shortest, but certainly not memory-efficient way to do this
> involves just using the frequency lists to create one big vector with
> all elements and tabulate that.
> table(c(rep(names(freq.list.a), freq.list.a), rep(names(freq.list.b),
> freq.list.b))) # kind of cheating but possible with short vectors ...
>

Try:

rowsum(freq.list.a.b, names(freq.list.a.b))
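Spelled out with the vectors from the post (rowsum() returns a one-column matrix here):

```r
a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
freq.list.a.b <- c(table(a), table(b))

# sum the two frequency lists by shared names
rowsum(freq.list.a.b, names(freq.list.a.b))
```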

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] issue accessing help files

2012-09-20 Thread R. Michael Weylandt
On Thu, Sep 20, 2012 at 4:20 PM, Basil Iannone  wrote:
> Hi Heramb,
>
> No "Help(anova)" does not work either. Thanks for the suggestions. I put
> the question out there in case anyone else was having similar problems. I
> think I will throw in the towel and install the latest version of R to see
> if that resolves the issue.
>
> Thanks
>

"Help(anova)" should not work. R is case sensitive -- did "help(anova)" work?

Michael



Re: [R] Wilcoxon Test and Mean Ratios

2012-09-20 Thread avinash barnwal
Hi,

In the Wilcoxon test, we look at medians rather than means. A ratio of
medians should be more coherent with the p-value.

On Thu, Sep 20, 2012 at 6:30 PM, Ben Bolker  wrote:

> Mohamed Radhouane Aniba  gmail.com> writes:
>
> >
> >
> > Thank you Thomas,
> >
> > So you think a t-test is more adequate to use in this case ?
> >
> > Rad
>
>   No, because a t-test makes even stronger parametric assumptions.
> You were given more specific advice on stackoverflow
> http://stackoverflow.com/questions/12499687/wilcoxon-test-and-mean-ratios
> If you want to prove that there is *some* difference between the
> distributions, you're done. If you want to test for some specific
> difference, you need to think more about what kind of test you want
> to do.  Permutation tests with various test statistics are a way
> to approach that.
>
>   Ben Bolker
>
>
> >
> > On Sep 19, 2012, at 8:43 PM, Thomas Lumley  uw.edu> wrote:
> >
> > > On Thu, Sep 20, 2012 at 5:46 AM, Mohamed Radhouane Aniba
> > >  gmail.com> wrote:
> > >> Hello All,
> > >>
> > >> I am writing to ask your opinion on how to interpret this case. I
> have two
> vectors "a" and "b" that I am trying
> > to compare.
>
>   [snip]
>
> > >
> > > There's nothing conceptually strange about the Wilcoxon test showing a
> > > difference in the opposite direction to the difference in means.  It's
> > > probably easiest to think about this in terms of the Mann-Whitney
> > > version of the same test, which is based on the proportion of pairs of
> > > one observation from each group where the `a' observation is higher.
> > > Your 'c' vector has a lot more zeros, so a randomly chosen observation
> > > from 'c' is likely to be smaller than one from 'a', but the non-zero
> > > observations seem to be larger, so the mean of 'c' is higher.
> > >
> > > The Wilcoxon test probably isn't very useful in a setting like this,
> > > since its results really make sense only under 'stochastic ordering',
> > > where the shift is in the same direction across the whole
> > > distribution.
> > >
> > >  -thomas
> > >
> > > --
> > > Thomas Lumley
> > > Professor of Biostatistics
> > > University of Auckland
>



-- 
Avinash Barnwal, M.Sc.
Statistics and Informatics
Department of Mathematics
IIT Kharagpur
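A minimal sketch of the permutation-test idea mentioned above, using the difference in means as the test statistic, with made-up zero-heavy data of the kind described (the names 'a' and 'c_' are illustrative only):

```r
set.seed(1)
a  <- rexp(40)                               # made-up group 1
c_ <- c(rep(0, 25), rexp(15, rate = 0.3))    # made-up zero-heavy group 2

obs    <- mean(c_) - mean(a)                 # observed difference in means
pooled <- c(a, c_)
n      <- length(a)

perm <- replicate(10000, {
  idx <- sample(length(pooled), n)           # random relabelling of the pool
  mean(pooled[-idx]) - mean(pooled[idx])
})

pval <- mean(abs(perm) >= abs(obs))          # two-sided permutation p-value
```

Any other statistic (ratio of medians, trimmed means, ...) can be substituted for the difference in means without changing the machinery.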



Re: [R] puzzling script bug

2012-09-20 Thread R. Michael Weylandt
On Thu, Sep 20, 2012 at 4:34 PM, Robert Douglas Kinley
 wrote:
> Windows XP (SP3)   ,   R 2.15.1  32bit
>
> Hi ...
>
> I have a script which fails and closes my R session.
>
> Unfortunately, it bombs out at a different point each time I run it.
>
> I'm guessing that it may be something to do with memory management, or
> perhaps it's to do with the various .C dll's the script calls.

Much more likely this one.

>
> Has anyone come across similar problems and if so, how did you track down the 
> cause ?

Valgrind perhaps? Or stepping through it interactively?

See the Writing-R-Extensions manual for some tips, but it's a hard
process. You'll likely be rewarded for spending at least a little time
trying to identify which C call it is (first heuristic: if a crash is
at line n, you can immediately eliminate all the C calls after n)

Good luck,
Michael

>
> Pathetically grateful for any pointers ...
>
> Cheers   Bob Kinley
>
>


Re: [R] fourier series for finding average values

2012-09-20 Thread Jeff Newmiller
If you want help understanding the theory of what you want to do, that is off 
topic here.
If you understand what you want to do, or just want to see what resources you 
can leverage in R, may I recommend the RSiteSearch function. I do think you 
should be wary of applying a tool you don't understand, particularly Discrete 
Fourier analysis, but that would be decidedly off topic here.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

eliza botto  wrote:

>
>Dear UseRs,
>i have a matrix of 365 rows and 444 columns. i drew each column of this
>matrix against the number of days in a year, which are obviously 365.
>now i have 444 curves and i want to Use Fourier analysis for the
>approximation of the average values.
>does anyone know how to do it?
>any help in this regards will be deeply appreciated...
>regards
>eliza
> 
>   [[alternative HTML version deleted]]
>


[R] Plot to tiff, font size problem in multiple plot figures

2012-09-20 Thread Claudia Penaloza
I am creating graphs for a publication and would like them to have the same
font size... but when I create a figure with multiple plots, the font size
decreases even though I haven't changed the tiff() resolution or
pointsize specifications, I have increased the figure size according to how
many plots it will ultimately have, and I have made sure the margins are
equivalent for single and multiple plot figures... example code below (font
size is consistent between 1x1 and 2x1 figure but decreases for 3x2 figure):

tiff("1x1.tif",
width=3,height=2.5,units="in",res=600,pointsize=8,compression="lzw",restoreConsole=T)
par(mfrow=c(1,1),mar=c(4,4,.5,.5)+0.1)
plot(x=rnorm(10),y=rnorm(10))
dev.off()

tiff("2x1.tif",
height=2.5*2,width=3,units="in",res=600,pointsize=8,compression="lzw",restoreConsole=T)
par(mfrow=c(2,1),mar=c(2,4,2.5,0.5)+0.1)
plot(x=rnorm(10),y=rnorm(10),xaxt="n",xlab="")
par(mar=c(4,4,0.5,0.5)+0.1)
plot(x=rnorm(10),y=rnorm(10))
dev.off()

tiff("3x2.tif",
height=2.5*3,width=3*2,units="in",res=600,pointsize=8,compression="lzw",restoreConsole=T)
par(mfrow=c(3,2),mar=c(.5,4,4,0.5)+0.1)
plot(x=rnorm(10),y=rnorm(10),xaxt="n",xlab="")
par(mar=c(.5,2,4,2.5)+0.1)
plot(x=rnorm(10),y=rnorm(10),xaxt="n",xlab="",yaxt="n",ylab="")
par(mar=c(2.5,4,2,0.5)+0.1)
plot(x=rnorm(10),y=rnorm(10),xaxt="n",xlab="")
par(mar=c(2.5,2,2,2.5)+0.1)
plot(x=rnorm(10),y=rnorm(10),xaxt="n",xlab="",yaxt="n",ylab="")
par(mar=c(4.5,4,0,0.5)+0.1)
plot(x=rnorm(10),y=rnorm(10))
par(mar=c(4.5,2,0,2.5)+0.1)
plot(x=rnorm(10),y=rnorm(10),yaxt="n",ylab="")
dev.off()

Why is this happening?
Any help greatly appreciated.
claudia
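A likely cause, from ?par: once mfrow/mfcol asks for three or more rows or columns, base graphics silently multiplies the base cex by 0.66 (and by 0.83 for an exactly 2x2 layout) — which matches the 3x2 figure shrinking while 1x1 and 2x1 do not. If so, resetting cex after setting mfrow should restore consistent sizing:

```r
pdf(NULL)                 # any device; tiff() behaves the same way
par(mfrow = c(3, 2))
par("cex")                # 0.66 now, not 1 -- hence the smaller text
par(cex = 1)              # restore so fonts match the 1x1 figure
dev.off()
```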



[R] aggregate help

2012-09-20 Thread Sam Steingold
I want to count attributes of IDs:
--8<---cut here---start->8---
z <- data.frame(id=c(10,20,10,30,10,20),
a1=c("a","b","a","c","b","b"),
a2=c("x","y","x","z","z","y"),
stringsAsFactors=FALSE)
> z
  id a1 a2
1 10  a  x
2 20  b  y
3 10  a  x
4 30  c  z
5 10  b  z
6 20  b  y
--8<---cut here---end--->8---
I want to get something like
--8<---cut here---start->8---
id a1.tot a1.val1 a1.num1 a1.val2 a1.num2 a2.tot a2.val1 a2.num1 a2.val2 a2.num2
10   3 "a"  2  "b"  1   3  "x" 2   "z" 1
20   2 "b"  2   0   2  "y" 2   0
30   1 "c"  1   0   1  "z" 1   0
--8<---cut here---end--->8---
(except that I don't care what appears in the cells marked with )
I tried this:
--8<---cut here---start->8---
aggregate(z,by=list(id=z$id),function (s) {
  t <- sort(table(s),decreasing=TRUE)
  if (length(t) == 1)
list(length(s),names(t)[1],t[1],"junk",0)
  else
list(length(s),names(t)[1],t[1],names(t)[2],t[2])
 })
  id id a1 a2
1 10  3  3  3
2 20  2  2  2
3 30  1  1  1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
--8<---cut here---end--->8---
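For reference, one possible approach (a sketch only; the helper name summarize.col is invented for illustration): aggregate() flattens list-valued results poorly, but split() plus rbind() can build a flat data frame of the requested shape, with cbind() supplying the a1./a2. column-name prefixes.

```r
z <- data.frame(id = c(10, 20, 10, 30, 10, 20),
                a1 = c("a","b","a","c","b","b"),
                a2 = c("x","y","x","z","z","y"),
                stringsAsFactors = FALSE)

# Top-two value/count summary for one character column
summarize.col <- function(s) {
  t <- sort(table(s), decreasing = TRUE)
  data.frame(tot  = length(s),
             val1 = names(t)[1], num1 = t[[1]],
             val2 = if (length(t) > 1) names(t)[2] else NA,
             num2 = if (length(t) > 1) t[[2]] else 0,
             stringsAsFactors = FALSE)
}

# One row per id; cbind() prefixes the column names (a1.tot, a1.val1, ...)
res <- do.call(rbind, lapply(split(z, z$id), function(d)
  cbind(id = d$id[1], a1 = summarize.col(d$a1), a2 = summarize.col(d$a2))))
res
```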
Thanks!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://palestinefacts.org
http://pmw.org.il http://dhimmi.com http://jihadwatch.org http://ffii.org
I'm out of my mind, but feel free to leave a message...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lattice dotplot reorder contiguous levels

2012-09-20 Thread Bert Gunter
I don't entirely understand what you want as an alternative. What is
wrong with relation ="same", the default?

In any case, it sounds like you'll need to write your own panel
function. If you look at panel.dotplot(), you'll see it's fairly
straightforward, so modification should not be difficult.

BTW, thanks for the reproducible example. I wouldn't have bothered at
all without it, although maybe the result is the same so far as you're
concerned.

Cheers,
Bert

On Thu, Sep 20, 2012 at 7:18 AM, maxbre  wrote:
> my reproducible example
>
> test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
> 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
> "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
> 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
> 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
> 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
> 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
> 3.39, 20, 4.59), samp.time = structure(c(2L, 4L, 4L, 4L, 4L,
> 4L, 5L, 4L, 8L, 8L, 8L, 8L, 8L, 9L, 8L, 7L, 8L, 8L, 8L, 8L, 3L,
> 3L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 6L, 4L, 8L, 4L, 8L, 4L, 3L,
> 8L, 4L, 8L, 4L, 8L, 4L, 9L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 1L), .Label = c("2",
> "4", "12", "24", "96", "135", "167", "168", "169"), class = "factor")),
> .Names = c("site",
> "conc", "samp.time"), row.names = c(NA, 52L), class = "data.frame")
>
>
>
> dotplot(samp.time~conc|site, data=test,
> scales=list(x=list(log=10), y = list(relation = "same")),
> layout=c(1,5), strip=FALSE, strip.left=TRUE
> )
>
>
> my objective is to use “site” as the conditioning variable but with “samp.time”
> correctly grouped by “site”; the problem here is to ensure that the levels of
> “samp.time” within each “site” are contiguous, as otherwise they would not be
> contiguous in the dot plot itself (i.e., avoiding that sort of hole between
> y-axis categories - see the dotplot)
>
>
> I’ve been trying with this but without much success
>
> test$samp.time.new<-
>   with(test,reorder(samp.time,as.numeric(site)))
>
>
> dotplot(samp.time.new~conc|site, data=test,
> scales=list(x=list(log=10), y = list(relation = "free")),
> layout=c(1,5), strip=FALSE, strip.left=TRUE
> )
>
> I think (I hope) a possible different solution is to create for "ylim" a
> proper character vector of different length to pass to each panel of the
> dotplot (I’m not posting this attempt because too much confused up to now)
>
> can anyone point me in the right direction?
> any help much appreciated
>
> thank you
>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/lattice-dotplot-reorder-contiguous-levels-tp4643741.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2012-09-20 Thread arun
HI Stefan,
Thanks for the solutions.

Just to add 1 more:
f.a<-table(a); f.b<-table(b)
c(f.a[!names(f.a)%in%names(f.b)],f.b[!names(f.b)%in%names(f.a)],xtabs(f.a[names(f.a)%in%names(f.b)]+f.b[names(f.b)%in%names(f.a)]~
names(f.a[names(f.a)%in%names(f.b)])))

#e i j a b d f g 
#1 1 1 3 1 3 5 5 

A.K.



- Original Message -
From: Stefan Th. Gries 
To: mce...@lightminersystems.com
Cc: r-help@r-project.org
Sent: Thursday, September 20, 2012 10:57 AM
Subject: [R] (no subject)

From my book on corpus linguistics with R:

# (10)   Imagine you have two vectors a and b such that
a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b<-c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")

# Of these vectors, you can create frequency lists by writing
freq.list.a<-table(a); freq.list.b<-table(b)
rm(a); rm(b)

# How do you merge these two frequency lists without merging the two
vectors first? More specifically, if I delete a and b from your
memory,
rm(a); rm(b)
# how do you generate the following table only from freq.list.a and
freq.list.b, i.e., without any reference to a and b themselves? Before
you complain about this question as being unrealistic, consider the
possibility that you generated the frequency lists of two corpora
(here, a and b) that are so large that you cannot combine them into
one (a.and.b<-c(a, b)) and generate a frequency list of that combined
vector (table(a.and.b)) ...
joint.freqs
a b d e f g i j
3 1 3 1 5 5 1 1

joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a),
names(freq.list.b)))))) # You generate an empty vector joint.freqs (i)
that is as long as there are different types in both a and b (but note
that, as requested, this information is not taken from a or b, but
from their frequency lists) ...
names(joint.freqs)<-sort(unique(c(names(freq.list.a),
names(freq.list.b)))) # ... and (ii) whose elements have these
different types as names.
joint.freqs[names(freq.list.a)]<-freq.list.a # The elements of the new
vector joint.freqs that have the same names as the frequencies in the
first frequency list are assigned the respective frequencies.
joint.freqs[names(freq.list.b)]<-joint.freqs[names(freq.list.b)]+freq.list.b
# The elements of the new vector joint.freqs that have the same names
as the frequencies in the second frequency list are assigned the sum
of the values they already have (either the ones from the first
frequency list or just zeroes) and the respective frequencies.
joint.freqs # look at the result

# Another shorter and more elegant solution was proposed by Claire
Crawford (but uses a function which will only be introduced later in
the book)
freq.list.a.b<-c(freq.list.a, freq.list.b) # first the two frequency
lists are merged into a single vector ...
joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b),
sum)) # ... and then the sums of all numbers that share the same names
are computed
joint.freqs # look at the result

# The shortest, but certainly not memory-efficient way to do this
involves just using the frequency lists to create one big vector with
all elements and tabulate that.
table(c(rep(names(freq.list.a), freq.list.a), rep(names(freq.list.b),
freq.list.b))) # kind of cheating but possible with short vectors ...

HTH,
STG
--
Stefan Th. Gries
---
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Line over Boxplot

2012-09-20 Thread Rui Barradas

Hello,

Works for me after correcting your lines() instruction. Your code 
doesn't say what columns to use as coordinates, just where to look for them.
Also, (1) always explicitly close the device using dev.off(). (2) The 
grid lines were over the boxes. A way to avoid this is to plot the 
boxes, then the grid, then replot the boxes with the parameter add = 
TRUE (there's also a function for this, see ?grid). And (3) you don't 
need the call to par(new=TRUE); abline() doesn't restart the graphics device.

The full code follows.


url1 <- "http://r.789695.n4.nabble.com/file/n4643736/tmax.final.text";
url2 <- "http://r.789695.n4.nabble.com/file/n4643736/temp.final.text";

pdf(file="boxplot_tmax_$$MM$DD${HH}.pdf", height=10, width=12)

soton.df = read.table(url1, header=TRUE)
gfs.df = read.table(url2, header=TRUE)

boxplot (TMAX ~ HOUR, data=soton.df,
xlab="Forecast Hour", ylab="MAX TEMP",
main="GEFS $$MM$DD ${HH}Z FORECAST MAX TEMPS",
whiskcol="red", col="red", outline=TRUE,
ylim=c(0, 100), xlim=c(1, 30),
xaxs="i", yaxs="i")

abline(h=seq(0, 100,by=5), lty=2)
abline(v=seq(1, 30, by=1), lty=2)

boxplot (TMAX ~ HOUR, data=soton.df,
xlab="Forecast Hour", ylab="MAX TEMP",
main="GEFS $$MM$DD ${HH}Z FORECAST MAX TEMPS",
whiskcol="red", col="red", outline=TRUE,
ylim=c(0, 100),xlim=c(1, 30),
xaxs="i", yaxs="i",
add = TRUE)
lines(TMAX ~ HOUR, data=gfs.df, type="o", col="green")

dev.off()

Hope this helps,

Rui Barradas
On 20-09-2012 14:41, gfishel wrote:

Very much a rookie at R, and have only recently started using it again so
pardon the simple question. I am trying to produce a box plot from one data
set and then overlay a line plot from another data set. The box plot data
set is made up of 20 sets of 30 data points, or 600 total data points. The
line has only 30 total data points. The box plot is plotting fine, but for
some reason, the line plot is starting at the 6th data position and running
off the screen. I tried modifying the text file so that the data repeated it
self 30 times to make the total number of lines in the file identical, but
that did not help! Here are my two datasets.

temp.final.text

tmax.final.text


And here is my script

R   --save --no-save --vanilla  <<  EOF

pdf(file="boxplot_tmax_$$MM$DD${HH}.pdf", height=10, width=12)

soton.df = read.table ("tmax.final.text", header=TRUE)
gfs.df = read.table ("greg.txt", header=TRUE)
boxplot (TMAX ~ HOUR, data=soton.df, xlab="Forecast Hour", ylab="MAX TEMP",
main="GEFS $$MM$DD ${HH}Z FORECAST MAX TEMPS", whiskcol="red",
col="red", outline=TRUE, ylim=c(0,100),xlim=c(1,30),xaxs="i",yaxs="i")
lines (data=gfs.df, type="o", col="green")
par(new=TRUE)
abline(h=seq(0,100,by=5),lty=2)
abline(v=seq(1,30,by=1),lty=2)



EOF

Thanks for  helping me out!




--
View this message in context: 
http://r.789695.n4.nabble.com/Line-over-Boxplot-tp4643736.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Line over Boxplot

2012-09-20 Thread Greg Snow
I expect that the coordinate system being set up and used by boxplot
is different from what you are expecting.  See the ?boxplot and ?bxp
help pages for details.  You may be able to have the boxplots drawn
where you expect by using the "at" argument (you may want to specify
"xlim" as well).

On Thu, Sep 20, 2012 at 7:41 AM, gfishel  wrote:
> Very much a rookie at R, and have only recently started using it again so
> pardon the simple question. I am trying to produce a box plot from one data
> set and then overlay a line plot from another data set. The box plot data
> set is made up of 20 sets of 30 data points, or 600 total data points. The
> line has only 30 total data points. The box plot is plotting fine, but for
> some reason, the line plot is starting at the 6th data position and running
> off the screen. I tried modifying the text file so that the data repeated it
> self 30 times to make the total number of lines in the file identical, but
> that did not help! Here are my two datasets.
>
> temp.final.text
> 
> tmax.final.text
> 
>
> And here is my script
>
> R   --save --no-save --vanilla  <<  EOF
>
> pdf(file="boxplot_tmax_$$MM$DD${HH}.pdf", height=10, width=12)
>
> soton.df = read.table ("tmax.final.text", header=TRUE)
> gfs.df = read.table ("greg.txt", header=TRUE)
> boxplot (TMAX ~ HOUR, data=soton.df, xlab="Forecast Hour", ylab="MAX TEMP",
> main="GEFS $$MM$DD ${HH}Z FORECAST MAX TEMPS", whiskcol="red",
> col="red", outline=TRUE, ylim=c(0,100),xlim=c(1,30),xaxs="i",yaxs="i")
> lines (data=gfs.df, type="o", col="green")
> par(new=TRUE)
> abline(h=seq(0,100,by=5),lty=2)
> abline(v=seq(1,30,by=1),lty=2)
>
>
>
> EOF
>
> Thanks for  helping me out!
>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Line-over-Boxplot-tp4643736.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Applying glm coefficients (Beginner Question)

2012-09-20 Thread SirRon
Thanks for the reply! Using predict() on new data works just fine. What I'm
interested in is whether I can use the coefficients or other data to develop
my own formula that does the same as predict().



--
View this message in context: 
http://r.789695.n4.nabble.com/Applying-glm-coefficients-Beginner-Question-tp4643737p4643769.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] issue accessing help files

2012-09-20 Thread Basil Iannone
I apologize. I meant to type "help(anova)" and not "Help(anova)". I am
installing the newest version later today after I complete some analyses. I
will let everyone know if doing so solves my problem.

Thanks,

On Thu, Sep 20, 2012 at 11:46 AM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:

> On Thu, Sep 20, 2012 at 4:20 PM, Basil Iannone  wrote:
> > Hi Heramb,
> >
> > No "Help(anova)" does not work either. Thanks for the suggestions. I put
> > the question out there in case anyone else was having similar problems. I
> > think I will throw in the towel and install the latest version of R to
> see
> > if that resolves the issue.
> >
> > Thanks
> >
>
> "Help(anova)" should not work. R is case sensitive -- did "help(anova)"
> work?
>
> Michael
>



-- 
Basil Iannone
University of Illinois at Chicago
Department of Biological Sciences (MC 066)
845 W. Taylor St.
Chicago, IL  60607-7060
Email: bian...@uic.edu
Phone: 312-355-3231
Fax: 312-413-2435
http://www2.uic.edu/~bianno2

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SQL query with Multicore option on R -linux

2012-09-20 Thread Madana_Babu
Hi all, 

I have the following sql query that I am executing on a machine with single
core. I want to know how can I execute the same sqery on a maching that is
running with 4 cores. Please provide me the code. 

NEW_TABLE <- rhive.query("SELECT A, B, COUNT(C) FROM TABLE_A WHERE
A>='01-01-2012'") 

Also let me know how can I leverage only 2 / 3 cores of the machine. 

Regards, 
Madana
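For what it's worth, rhive.query() itself runs the query on the Hive side, so an R-side core count does not apply to it directly (as far as I know); for R-side post-processing of the result, the parallel package lets you cap the number of workers explicitly. A generic sketch with made-up chunks, capped at 2 of the 4 cores:

```r
# Cap R-side parallel work at 2 workers regardless of how many cores exist
library(parallel)
chunks <- split(1:8, rep(1:4, each = 2))   # toy stand-in for query results
cl <- makeCluster(2)                       # use only 2 of the machine's cores
res <- parLapply(cl, chunks, sum)
stopCluster(cl)
unlist(res)
```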



--
View this message in context: 
http://r.789695.n4.nabble.com/SQL-query-with-Multicore-option-on-R-linux-tp4643771.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon Test and Mean Ratios

2012-09-20 Thread peter dalgaard

On Sep 20, 2012, at 02:43 , Thomas Lumley wrote:

> On Thu, Sep 20, 2012 at 5:46 AM, Mohamed Radhouane Aniba
>  wrote:
>> Hello All,
>> 
>> I am writing to ask your opinion on how to interpret this case. I have two 
>> vectors "a" and "b" that I am trying to compare.
>> 
>> The wilcoxon test is giving me a pvalue of 5.139217e-303 of a over b with 
>> the alternative "greater". Now if I make a summary on each of them I have 
>> the following
>> 
>>> summary(a)
>> Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
>> 0.000 0.0001411 0.0002381 0.0002671 0.0003623 0.0012910
>>> summary(c)
>> Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
>> 0.000 0.000 0.000 0.0004947 0.0002972 1.000
>> 
>> The mean ratio is then around 0.5399031 which naively goes in opposite 
>> direction of the wilcoxon test ( I was expecting to find a ratio >> 1)
>> 
> 
> There's nothing conceptually strange about the Wilcoxon test showing a
> difference in the opposite direction to the difference in means.  It's
> probably easiest to think about this in terms of the Mann-Whitney
> version of the same test, which is based on the proportion of pairs of
> one observation from each group where the `a' observation is higher.
> Your 'c' vector has a lot more zeros, so a randomly chosen observation
> from 'c' is likely to be smaller than one from 'a', but the non-zero
> observations seem to be larger, so the mean of 'c' is higher.
> 
> The Wilcoxon test probably isn't very useful in a setting like this,
> since its results really make sense only under 'stochastic ordering',
> where the shift is in the same direction across the whole
> distribution.
> 
>  -thomas

I was sure I had seen a definition where X was "larger than" Y if P(X>Y) > 
P(Y<X). In any case it's worth emphasizing that that is what the Wilcoxon 
test tests for, not whether the means differ, nor whether the medians do. 
As a counterexample of the latter, try

x <- rep(0:1, c(60,40))
y <- rep(0:1, c(80,20))
wilcox.test(x,y)
median(x)
median(y)

(and the "location shift" reference in wilcox.test output is a bit of a red 
herring.)

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Applying glm coefficients (Beginner Question)

2012-09-20 Thread David Winsemius

On Sep 20, 2012, at 10:58 AM, SirRon wrote:

> Thanks for the reply! Using predict() on new data works just fine. What I'm
> interested in is, if I can use the coefficients or other data, to develop my
> own formula which does the same as predict().

Of course you can. Extract the coefficients, apply them to the new data, and 
build predictions. The code is all there to see how it's done by the Masters. 
Perhaps you need to do a bit of reading. Or do some searching. There must be 
many worked examples of constructing logistic regression estimates from first 
principles on various faculty websites around the world. The Posting Guide 
requests that you not consider this list an elementary statistics tutoring 
service.
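To sketch what that by-hand computation looks like (toy data and names invented here; a binomial/logit model is assumed): the linear predictor is the model matrix times the coefficient vector, and the fitted probability is the inverse link of that.

```r
# By-hand predictions for a logistic glm: eta = X %*% beta, p = plogis(eta)
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- rbinom(50, 1, plogis(0.5 + d$x1 - d$x2))
fit <- glm(y ~ x1 + x2, data = d, family = binomial)

newd <- data.frame(x1 = c(0, 1), x2 = c(1, 0))
X   <- model.matrix(~ x1 + x2, data = newd)  # same formula as the fit
eta <- as.vector(X %*% coef(fit))            # linear predictor
p   <- plogis(eta)                           # inverse logit link
all.equal(p, unname(predict(fit, newdata = newd, type = "response")))
```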

-- 

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon Test and Mean Ratios

2012-09-20 Thread avinash barnwal
Hi,

http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test

We can clearly see that null hypothesis is median different or not.
One way of proving non difference is P(X>Y) = P(X<Y) when stochastically ordered.

On Thu, Sep 20, 2012, peter dalgaard  wrote:
>
> On Sep 20, 2012, at 02:43 , Thomas Lumley wrote:
>
>> On Thu, Sep 20, 2012 at 5:46 AM, Mohamed Radhouane Aniba
>>  wrote:
>>> Hello All,
>>>
>>> I am writing to ask your opinion on how to interpret this case. I have
>>> two vectors "a" and "b" that I am trying to compare.
>>>
>>> The wilcoxon test is giving me a pvalue of 5.139217e-303 of a over b with
>>> the alternative "greater". Now if I make a summary on each of them I have
>>> the following
>>>
 summary(a)
>>> Min.   1st Qu.Median  Mean   3rd Qu.  Max.
>>> 0.000 0.0001411 0.0002381 0.0002671 0.0003623 0.0012910
 summary(c)
>>> Min.   1st Qu.Median  Mean   3rd Qu.  Max.
>>> 0.000 0.000 0.000 0.0004947 0.0002972 1.000
>>>
>>> The mean ratio is then around 0.5399031 which naively goes in opposite
>>> direction of the wilcoxon test ( I was expecting to find a ratio >> 1)
>>>
>>
>> There's nothing conceptually strange about the Wilcoxon test showing a
>> difference in the opposite direction to the difference in means.  It's
>> probably easiest to think about this in terms of the Mann-Whitney
>> version of the same test, which is based on the proportion of pairs of
>> one observation from each group where the `a' observation is higher.
>> Your 'c' vector has a lot more zeros, so a randomly chosen observation
>> from 'c' is likely to be smaller than one from 'a', but the non-zero
>> observations seem to be larger, so the mean of 'c' is higher.
>>
>> The Wilcoxon test probably isn't very useful in a setting like this,
>> since its results really make sense only under 'stochastic ordering',
>> where the shift is in the same direction across the whole
>> distribution.
>>
>>  -thomas
>
> I was sure I had seen a definition where X was "larger than" Y if P(X>Y) >
> P(Y<X). In any case it's worth emphasizing that that is what the Wilcoxon test tests for, not whether the
> means differ, nor whether the medians do. As a counterexample of the latter,
> try
>
> x <- rep(0:1, c(60,40))
> y <- rep(0:1, c(80,20))
> wilcox.test(x,y)
> median(x)
> median(y)
>
> (and the "location shift" reference in wilcox.test output is a bit of a red
> herring.)
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Avinash Barnwal, M.Sc.
Statistics and Informatics
Department of Mathematics
IIT Kharagpur

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ES with time varying (GARCH model) under nonnormal (using cornish fisher expansion)

2012-09-20 Thread Eko andryanto Prakasa
Hi,
 
I have tried to measure ES with the Cornish-Fisher expansion using the 
PerformanceAnalytics package, but I am still confused because I use a GARCH 
model to measure volatility and I don't know how to combine it with the ES 
functions in the PerformanceAnalytics package.
 
I have measured ES under normality using the formula 
ES = sqrt(S)*dnorm(qnorm(p))*value
where S is the volatility that I got from the GARCH model and p is the 
probability, here 5%.
 
To get ES under non-normality, I modified the formula above to
ES = sqrt(S)*dnorm(ZES)*value
where ZES is the alpha prime obtained from 
   ZES = Zscore - (1/6*(Zscore^2 - 1)*skewness).
But the result is less than the VaR (whose volatility also used the GARCH 
model), which is not reasonable, since ES should be larger than VaR.
 
Do you have any ideas, suggestions or solutions?
 
Thanks for the attention...
 
Eko
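For reference, a hedged sketch of the usual closed forms under normality (numbers are made up; note the 1/p factor in the normal ES, without which the "ES" can come out below VaR):

```r
p     <- 0.05
sigma <- 0.02      # e.g. one-step-ahead GARCH sd, i.e. sqrt(S)
value <- 1e6
VaR <- sigma * qnorm(1 - p) * value            # ~ sigma * 1.645 * value
ES  <- sigma * dnorm(qnorm(p)) / p * value     # ~ sigma * 2.063 * value
c(VaR = VaR, ES = ES)                          # ES > VaR, as required
```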
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] issue accessing help files

2012-09-20 Thread Basil Iannone
Okay. I installed the newest version of R and I still cannot link to help
files. Is anyone else having this problem? Again, I am using Windows XP.

Thanks for the help.

On Thu, Sep 20, 2012 at 12:23 PM, Basil Iannone  wrote:

> I apologize. I meant to type "help(anova)" and not "Help(anova)". I am
> installing the newest version later today after I complete some analyses. I
> will let everyone know if doing so solves my problem.
>
> Thanks,
>
>
> On Thu, Sep 20, 2012 at 11:46 AM, R. Michael Weylandt <
> michael.weyla...@gmail.com> wrote:
>
>> On Thu, Sep 20, 2012 at 4:20 PM, Basil Iannone  wrote:
>> > Hi Heramb,
>> >
>> > No "Help(anova)" does not work either. Thanks for the suggestions. I put
>> > the question out there in case anyone else was having similar problems.
>> I
>> > think I will throw in the towel and install the latest version of R to
>> see
>> > if that resolves the issue.
>> >
>> > Thanks
>> >
>>
>> "Help(anova)" should not work. R is case sensitive -- did "help(anova)"
>> work?
>>
>> Michael
>>
>
>
>
> --
> Basil Iannone
> University of Illinois at Chicago
> Department of Biological Sciences (MC 066)
> 845 W. Taylor St.
> Chicago, IL  60607-7060
> Email: bian...@uic.edu
> Phone: 312-355-3231
> Fax: 312-413-2435
> http://www2.uic.edu/~bianno2
>



-- 
Basil Iannone
University of Illinois at Chicago
Department of Biological Sciences (MC 066)
845 W. Taylor St.
Chicago, IL  60607-7060
Email: bian...@uic.edu
Phone: 312-355-3231
Fax: 312-413-2435
http://www2.uic.edu/~bianno2

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sweave - if \Sexpr{} than \SweaveInput{"my.Rnw"}

2012-09-20 Thread Duncan Murdoch

On 20/09/2012 9:05 AM, Duncan Murdoch wrote:

On 20/09/2012 8:47 AM, Witold E Wolski wrote:
> Depending on an R computation I would like to include an Sweave documents
> in the main Sweave document.
> How can I do it?
>
> So I was thinking  to use Latex features :
>
> \newif\ifpaper
>
> \ifpaper
>
> \SweaveInput{"my1.Rnw"}
> \else
>   \SweaveInput{"my2.Rnw"}
> \fi
>
> But how do I set paper to true or false given an \Sexpr ??
>
> \papertrue % or
>
> \paperfalse
>
>
> Any ideas?

The SweaveInput directives are processed before any expressions are
evaluated, so you can't do it that way.  You can have Sweave chunks emit
LaTex code, so this might achieve a similar effect:

<<results=tex>>=
name <- if ( test ) "my1" else "my2"

Sweave( paste0(name, ".Rnw") )
paste0("\\input{", name, ".tex}")
@

I've never tried having a Sweave chunk call Sweave(), so there might be
troubles there, and you might only be able to input .tex files, not Rnw
files.



I was curious, so I tried this.  It's fine to run Sweave in a code 
chunk, but it prints a fair bit by default, so the chunk up above won't 
work exactly as written.  You need to cat() the \input line, and tell 
Sweave not to print anything, or just ignore what it prints.  I think  
this would do it:


<<results=tex>>=
name <- if ( test ) "my1" else "my2"
Sweave( paste0(name, ".Rnw"), quiet=TRUE )
cat( paste0("\\input{", name, ".tex}") )
@


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wilcoxon Test and Mean Ratios

2012-09-20 Thread Thomas Lumley
On Fri, Sep 21, 2012 at 6:43 AM, avinash barnwal
 wrote:
> Hi,
>
> http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
>
> We can clearly see that null hypothesis is median different or not.
> One way of proving non difference is P(X>Y) = P(X<Y) when stochastically ordered.


Avinash.  No.

Firstly, the Wikipedia link is for the WIlcoxon signed rank test,
which is a different test and so is irrelevant. Even if the
signed-rank test were the one being discussed, you are still
incorrect. The signed rank test is on the median of differences, not
the difference in medians.  These are not the same, and need not even
be in the same direction.

Secondly, it is easy to establish that the WIlcoxon rank sum test need
not agree with the ordering in  medians, just by looking at examples,
as Peter showed

Thirdly,  there is a well-known demonstration originally due to Brad
Efron, "Efron's non-transitive dice', which implies that the
Mann-Whitney U test (which *is* equivalent to the Wilcoxon rank-sum
test) need not agree with the ordering given by *any* one-sample
summary statistic.

In this case, assuming the sample sizes are not too small (which looks
plausible given the p-value), the question is what summary the
original poster wants to compare: the mean (in which case the t-test
is the only option) or some other summary.  It's not possible to work
this out from the distribution of the data, so we need to ask the
original poster.  With reasonably large sample sizes he can get a
permutation test and bootstrap confidence interval for any summary
statistic of interest, but for the mean these will just reduce to the
t-test.

Rank tests (apart from Mood's test for quantiles, which has different
problems) can really behave very strangely in the absence of
stochastic ordering, because without stochastic ordering there is no
non-parametric way to define the direction of difference between two
samples.  It's important to remember that all the beautiful theory for
rank tests was developed under the (much stronger) location shift
model: the distribution can have any shape, but the shape is assumed
to be identical in the two groups.  Or, as one of my colleagues puts
it "you don't know whether the treatment raises or lowers the outcome,
but you know it doesn't change anything else".

Knowledgeable and sensible statisticians who like the Wilcoxon test
(Frank Harrell comes to mind) like it because they believe stochastic
ordering is a reasonable assumption in the problems they work in, not
because they think you can do non-parametric testing in its absence.
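A small simulated illustration of the point (invented data shaped like the original poster's 'a' and 'c', here called 'cc'): 'a' wins most pairwise comparisons, yet 'cc' has the larger mean.

```r
set.seed(42)
a  <- runif(1000, 1e-4, 4e-4)                  # small, but never zero
cc <- c(rep(0, 800), runif(200, 2e-3, 3e-3))   # mostly zero, a few large
wilcox.test(a, cc, alternative = "greater")$p.value   # very small
mean(a) < mean(cc)                                    # TRUE
```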


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lattice dotplot reorder contiguous levels

2012-09-20 Thread Richard M. Heiberger
Is result3 what you are looking for?



test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
"B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
3.39, 20, 4.59), samp.time = structure(c(2L, 4L, 4L, 4L, 4L,
4L, 5L, 4L, 8L, 8L, 8L, 8L, 8L, 9L, 8L, 7L, 8L, 8L, 8L, 8L, 3L,
3L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 6L, 4L, 8L, 4L, 8L, 4L, 3L,
8L, 4L, 8L, 4L, 8L, 4L, 9L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 1L), .Label = c("2",
"4", "12", "24", "96", "135", "167", "168", "169"), class = "factor")),
.Names = c("site",
"conc", "samp.time"), row.names = c(NA, 52L), class = "data.frame")

result0 <-
dotplot(samp.time~conc|site, data=test,
scales=list(x=list(log=10), y = list(relation = "free")),
layout=c(1,5), strip=FALSE, strip.left=TRUE
)
result0


test$sample.time <- as.numeric(as.character(test$samp.time))

result1 <-
xyplot(sample.time ~ conc | site,
   data=test,
   scales=list(x=list(log=10), y = list(relation = "same")),
   layout=c(1,5), strip=FALSE, strip.left=TRUE,
   panel=panel.dotplot)
result1

result2 <-
update(result1, panel=function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.rug(y=y)
}
)
result2

trellis.par.get("superpose.line")
new.col <- c(trellis.par.get("superpose.line")$col, "darkgreen", "darkblue")
trellis.par.set(superpose.line=list(col=new.col),
superpose.symbol=list(col=new.col))


result3 <-
xyplot(sample.time ~ conc | site,
   groups=samp.time, pch=16,
   data=test,
   scales=list(x=list(log=10), y = list(relation =
"same"), alternating=1),
   layout=c(1,5), strip=FALSE, strip.left=TRUE,
   par.settings = list(clip = list(panel = "off")),
   panel=function(x, y, ...) {
 panel.xyplot(x, y, ...)
 gg <- list(...)$groups
 subs <- list(...)$subscripts
 col=trellis.par.get("superpose.line")$col
 panel.rug(y=unique(y),
col=unique(col[match(gg[subs], levels(gg))]))
 for (uy in unique(y))
  panel.axis(side="right", at=uy, outside=TRUE,
 text.col=col[match(uy, levels(gg))],
 line.col=col[match(uy, levels(gg))])
 }
   )
result3


On 9/20/12, maxbre  wrote:
> my reproducible example
>
> test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
> 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
> "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
> 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
> 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
> 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
> 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
> 3.39, 20, 4.59), samp.time = structure(c(2L, 4L, 4L, 4L, 4L,
> 4L, 5L, 4L, 8L, 8L, 8L, 8L, 8L, 9L, 8L, 7L, 8L, 8L, 8L, 8L, 3L,
> 3L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 6L, 4L, 8L, 4L, 8L, 4L, 3L,
> 8L, 4L, 8L, 4L, 8L, 4L, 9L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 1L), .Label = c("2",
> "4", "12", "24", "96", "135", "167", "168", "169"), class = "factor")),
> .Names = c("site",
> "conc", "samp.time"), row.names = c(NA, 52L), class = "data.frame")
>
>
>
> dotplot(samp.time~conc|site, data=test,
> scales=list(x=list(log=10), y = list(relation = "free")),
> layout=c(1,5), strip=FALSE, strip.left=TRUE
> )
>
>
> my objective is to use “site” as conditioning variable but with “samp.time”
> correctly grouped by “site”; the problem here is to ensure that levels of
> “samp.time” within each “site” are contiguous, as otherwise they would not be
> contiguous in the dot plot itself (i.e., avoiding those holes between the
> y-axis categories seen in the dot plot)
>
>
> I’ve been trying with this but without much success
>
> test$samp.time.new<-
>   with(test,reorder(samp.time,as.numeric(site)))
>
>
> dotplot(samp.time.new~conc|site, data=test,
> scales=list(x=list(log=10), y = list(relation = "free")),
> layout=c(1,5), strip=FALSE, strip.left=TRUE
> )
>
> I think (I hope) a possible different solution is to create for "ylim" a
> proper character vector of a different length to pass to each panel of the
> dotplot (I’m not posting this attempt because it is too confused so far)
>
> c

Re: [R] issue accessing help files

2012-09-20 Thread Duncan Murdoch

On 20/09/2012 2:14 PM, Basil Iannone wrote:

Okay. I installed the newest version of R and I still cannot link to help
files. Is anyone else having this problem? Again, I am using Window's XP.


I don't think you've told us what output you get when you ask for help.  
Here's what I see from ?mean:


First, this appears in the R console:

starting httpd help server ... done

then the browser opens, with something like this showing:

http://127.0.0.1:25567/library/base/html/mean.html

You're not getting the second part; do you get the first?

If so, here's something to try.  From within R, print the value of 
tools:::httpdPort:


> tools:::httpdPort
[1] 25567

You'll probably get a different number than I did.  Then, with R still 
open, go to IE, and enter the URL I give above (i.e. 
http://127.0.0.1:25567/library/base/html/mean.html), with 25567 replaced 
by whatever port number you have.  Tell us what happens.


In the meantime, you can go back to the old text style help with

options(help_type="text")

You won't get links between pages or some other fancy stuff from the 
HTML pages, but you should get to see something.


Duncan Murdoch



Thanks for the help.

On Thu, Sep 20, 2012 at 12:23 PM, Basil Iannone  wrote:

> I apologize. I meant to type "help(anova)" and not "Help(anova)". I am
> installing the newest version later today after I complete some analyses. I
> will let everyone know if doing so solves my problem.
>
> Thanks,
>
>
> On Thu, Sep 20, 2012 at 11:46 AM, R. Michael Weylandt <
> michael.weyla...@gmail.com> wrote:
>
>> On Thu, Sep 20, 2012 at 4:20 PM, Basil Iannone  wrote:
>> > Hi Heramb,
>> >
>> > No "Help(anova)" does not work either. Thanks for the suggestions. I put
>> > the question out there in case anyone else was having similar problems.
>> I
>> > think I will throw in the towel and install the latest version of R to
>> see
>> > if that resolves the issue.
>> >
>> > Thanks
>> >
>>
>> "Help(anova)" should not work. R is case sensitive -- did "help(anova)"
>> work?
>>
>> Michael
>>
>
>
>
> --
> Basil Iannone
> University of Illinois at Chicago
> Department of Biological Sciences (MC 066)
> 845 W. Taylor St.
> Chicago, IL  60607-7060
> Email: bian...@uic.edu
> Phone: 312-355-3231
> Fax: 312-413-2435
> http://www2.uic.edu/~bianno2
>







[R] VarBrul in R

2012-09-20 Thread Trevor Jenkins
Several years ago there were R implementations of a socio-linguistics
analysis method called Variable Rule Analysis, namely rbrul and r-varb.
Neither of the sites listed (in the method's Wikipedia page
http://en.wikipedia.org/wiki/Variable_rules_analysis ) appears to be online
any more (one was at UPenn and the other at Indiana). Does anyone know a)
whether the code for either or both of these implementations survives out
there and b) is anyone maintaining these implementations?

There are no listings for either of them at R Forge. And no listing for
varbrul, which is the name for the original Fortran program. (Even that
Fortran source can't be found online anymore.)

Regards, Trevor.

<>< Re: deemed!

[[alternative HTML version deleted]]



Re: [R] Variance Inflation Factor VIC() with a matrix

2012-09-20 Thread Michael Friendly

You've stumbled across the answer to your question --

while lm() supports y ~ X formulas without a data= argument
and y ~ X1 + X2 + X3 formulas with one, you can't depend on
all contributed functions to do the same.

As John pointed out, the advantage of car::vif over other
implementations is that it correctly handles the cases
of factors, polynomial terms, etc. for which generalized
VIF is more useful, and this is most easily accommodated
with the formula interface.

The matrix interface takes less typing, but sometimes
leaves you wondering later what you actually had in VarVecPur.
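
A workaround sketch for the original question, assuming the objects CARsPur
and VarVecPur from the post: rebuild a data frame from the matrix and refit
with a formula, so that vif() sees separate named terms:

```r
## give the matrix columns back to lm() as data-frame variables,
## so each becomes a named term that car::vif() can handle
d <- data.frame(CARsPur = CARsPur, VarVecPur)
reg3 <- lm(CARsPur ~ ., data = d)
vif(reg3)
```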

-Michael


On 9/20/2012 8:52 AM, Martin H. Schmidt wrote:

Hi everyone,

Running the vif() function from the car package like


 > reg2 <- lm(CARsPur~Delay_max10+LawChange+MarketTrend_20d+MultiTrade,
data=data.frame(VarVecPur))
 > vif(reg2)
    Delay_max10       LawChange MarketTrend_20d      MultiTrade
       1.010572        1.009874        1.004278        1.003351


gives a useful result. But using the right-hand variables as a matrix in
the following way doesn't work with the vif() function:


 > reg  <- lm(CARsPur~VarVecPur)
 > summary(reg)

Call:
lm(formula = CARsPur ~ VarVecPur)

Residuals:
     Min       1Q   Median       3Q      Max
-0.72885 -0.06461  0.00493  0.06873  0.74936

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.037860   0.006175  -6.131 9.25e-10 ***
VarVecPurDelay_max10  0.003661   0.001593   2.298   0.0216 *
VarVecPurLawChange0.004679   0.006185   0.757   0.4493
VarVecPurMarketTrend_20d  0.019015   0.001409  13.493  < 2e-16 ***
VarVecPurMultiTrade  -0.005081   0.003129  -1.624   0.1045
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1229 on 6272 degrees of freedom
Multiple R-squared: 0.03021,  Adjusted R-squared: 0.02959
F-statistic: 48.84 on 4 and 6272 DF,  p-value: < 2.2e-16

 > vif(reg)
Error in vif.lm(reg) : model contains fewer than 2 terms


Is there a solution or a way to work around?

Thank you very much in advanced.






--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



Re: [R] Fortune nomination

2012-09-20 Thread Achim Zeileis

On Thu, 20 Sep 2012, Bert Gunter wrote:


On Thu, Sep 20, 2012 at 9:01 AM, David Winsemius  wrote:

(In response to an OP's apology for an "awkwardly worded question"):


Awkwardly worded questions will get much better answers if they are accompanied 
by some test data.


Fortune nomination!


Yes, definitely :-) Added to the devel-version on R-Forge.
thx,
Z


Cheers,
Bert


--
David Winsemius, MD
Alameda, CA, USA





--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm






Re: [R] effective way to return only the first argument of "which()"

2012-09-20 Thread R. Michael Weylandt
On Thu, Sep 20, 2012 at 4:39 PM, Mike Spam  wrote:
> Thank you very much, especially Milan and Bert!
> I will do some speedtests and fit the function to my needs.
>
> I think the best way would be a modified function in C...
> But i am not familiar enough with C. Perhaps this would be a simple
> but useful extension.
> If someone has a solution, i would appreciate a post in this mailing list.
>
> Cheers and thanks to all,
> Nico
>

5 hours and Dirk hasn't taken the bait? I suppose I'll give it a try,
though my Rcpp-fu is not great:

#

library(inline)
library(Rcpp)

## which.first() for R
which.first <- cxxfunction(signature(x = 'logical'), '
  NumericVector xx(x);
   // Rcpp magic which makes a numeric vector xx from the SEXP x
  int i = 0;

  while(i < xx.size() && xx[i] != 1){ // bounds check: stop if no TRUE found
    i++;
  }
  return(wrap(i + 1)); // Remember, C++ indices are 0-based
', plugin = "Rcpp")

##

This gives the index of the first element for which x is TRUE. If it's a specific
condition you are evaluating, you could push that logic down to C++
and put it inside the while loop as well to save time there but your
original post didn't say what the logic was.

Note that using this will require a working R development environment,
which is harder on some systems than others.
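
For completeness, a pure-R sketch (not part of the reply above) that also
stops at the first TRUE without any compilation: base R's Position()
short-circuits, so it can beat which(x)[1] when the first match occurs early.

```r
## short-circuiting alternative to which(x)[1]; returns NA
## (Position's default nomatch) when no element is TRUE
which.first.r <- function(x) Position(isTRUE, x)

which.first.r(c(FALSE, FALSE, TRUE, TRUE))  # 3
```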

Cheers,
Michael



Re: [R] Line over Boxplot

2012-09-20 Thread gfishel
Thanks for your help! Unfortunately, I am now getting this:

> pdf(file="boxplot_tmax_2012091912.pdf", height=10, width=12) 
> 
> soton.df = read.table ( tmax.final.text, header=TRUE ) 
Error in read.table(tmax.final.text, header = TRUE) : 
  object 'tmax.final.text' not found
Execution halted

The file is there, has headers, and is located in the directory I am running
R in! Now I am really stumped!

Greg



--
View this message in context: 
http://r.789695.n4.nabble.com/Line-over-Boxplot-tp4643736p4643791.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] aggregate help

2012-09-20 Thread arun
Hi,
Try this:

z1<-aggregate(z,list(id=z$id),FUN=paste,sep=",")
dat1<-data.frame(id=z1[,1],a1.total=unlist(lapply(z1[,3],length)),a1.val1=unique(z$a1),a1.num=unlist(lapply(lapply(z1[,3],table),`[`,1)),a1.val2=unlist(lapply(z1[,3],`[`,3)),a1.num2=unlist(lapply(lapply(z1[,3],table),`[`,2)),a2.total=unlist(lapply(z1[,4],length)),a2.val1=unique(z$a2),a2.num=unlist(lapply(lapply(z1[,4],table),`[`,1)),a2.val2=unlist(lapply(z1[,4],`[`,3)),a2.num2=unlist(lapply(lapply(z1[,4],table),`[`,2)))
dat1

# id a1.total a1.val1 a1.num a1.val2 a1.num2 a2.total a2.val1 a2.num a2.val2
#0 10    3   a  2   b   1    3   x  2   z
#1 20    2   b  2      NA    2   y  2    
#2 30    1   c  1      NA    1   z  1    
#  a2.num2
#0   1
#1  NA
#2  NA
#It is not an elegant way!
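
A possibly tidier sketch of the same summary, using by() and table(); the
column names come out slightly differently from the requested layout:

```r
## for each id: total rows, then the two most frequent values of each
## attribute with their counts (second slot is NA when only one value)
top2 <- function(s) {
  t <- sort(table(s), decreasing = TRUE)
  c(total = length(s),
    val1 = names(t)[1], num1 = unname(t[1]),
    val2 = names(t)[2], num2 = unname(t[2]))
}
do.call(rbind, by(z, z$id, function(d) c(a1 = top2(d$a1), a2 = top2(d$a2))))
```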


A.K.



- Original Message -
From: Sam Steingold 
To: r-help@r-project.org
Cc: 
Sent: Thursday, September 20, 2012 2:06 PM
Subject: [R] aggregate help

I want to count attributes of IDs:
--8<---cut here---start->8---
z <- data.frame(id=c(10,20,10,30,10,20),
                a1=c("a","b","a","c","b","b"),
                a2=c("x","y","x","z","z","y"),
                stringsAsFactors=FALSE)
> z
  id a1 a2
1 10  a  x
2 20  b  y
3 10  a  x
4 30  c  z
5 10  b  z
6 20  b  y
--8<---cut here---end--->8---
I want to get something like
--8<---cut here---start->8---
id a1.tot a1.val1 a1.num1 a1.val2 a1.num2 a2.tot a2.val1 a2.num1 a2.val2 a2.num2
10   3     "a"      2      "b"      1       3      "x"     2       "z"     1
20   2     "b"      2           0       2      "y"     2           0
30   1     "c"      1           0       1      "z"     1           0
--8<---cut here---end--->8---
(except that I don't care what appears in the cells marked with )
I tried this:
--8<---cut here---start->8---
aggregate(z,by=list(id=z$id),function (s) {
  t <- sort(table(s),decreasing=TRUE)
  if (length(t) == 1)
    list(length(s),names(t)[1],t[1],"junk",0)
  else
    list(length(s),names(t)[1],t[1],names(t)[2],t[2])
})
  id id a1 a2
1 10  3  3  3
2 20  2  2  2
3 30  1  1  1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
--8<---cut here---end--->8---
Thanks!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://palestinefacts.org
http://pmw.org.il http://dhimmi.com http://jihadwatch.org http://ffii.org
I'm out of my mind, but feel free to leave a message...





[R] question on assigning an argument in a function that is created by the function itself

2012-09-20 Thread Benjamin Caldwell
Hi,

I need some help with making a function a bit more elegant. How would you
all suggest avoiding the problem I've made myself below - I've written a
function that creates a temporary matrix by subsetting a larger one I assign
it. I then call vectors from that matrix, add each item in the vector to
create a cumulative vector for each factor, and then patch them all back
together.

That's really an aside, because what I want to know is: how do I feed this
or any function via its arguments the vectors I want it to deal with in
advance, if those vectors are part of a matrix created in the function. My
example is below.

Thanks very much!

Ben Caldwell

#make some data

a<-rep("a",9)
b<-rep("b",9)
c<-rep("c",9)

level3<-c(a,b,c)
d1<-9:1
set.seed(20)
d2<-rnorm(9,1,.1)
datas1<-d1*d2

e1<-rep(2.5,4)
e2<-rep(10,5)
datas2<-c(e1,e2)


number<-rep(1:9,3)
dummya<-data.frame(datas1,datas2,number)

dummya
dummy <- data.frame(level3,dummya)
dummy

#first function
cumulative <- function(v,x) {
b <- rep (2,length(v))

o<-order(x)
v <-v[o]

n<-length(v)
 b[1] <- v[1]
 for (i in 2:n) {
 b[i] <- v[i] + b[i-1]
}
 b
}

#function I wish I could only write once with four arguments (two for v and
x) not two
cut.paste.1 <- function(datas, level) {

n <- length(unique(level))
sat <- NULL

for (i in 1:n) {
ref.frame<- datas[unique(level)[i] == level,]
v <- ref.frame$datas1 # but I have to modify it below
x <- ref.frame$number
sat<-append(sat,cumulative (v,x))
}
sat
}

cut.paste.1(dummy,level3)

cut.paste.2 <- function(datas, level) {

n <- length(unique(level))
sat <- NULL

for (i in 1:n) {
ref.frame<- datas[unique(level)[i] == level,]
v <- ref.frame$datas2 # here, but I wish I could make it an argument for
the function
x <- ref.frame$number
sat<-append(sat,cumulative (v,x))
}
sat
}

cut.paste.2(dummy,level3)

[[alternative HTML version deleted]]



Re: [R] qplot: plotting precipitation data

2012-09-20 Thread Winston Chang
It was a bit hard for me to follow the thread and figure out exactly
what the problem is that you're having, but I think it has something
to do with the ticks on the x axis not appearing in the correct order?

It's probably related to this issue:
https://github.com/hadley/ggplot2/issues/577
I believe it's happening because the x scale gets "trained" separately
on xmin and xmax values, and if that happens when one of them
_doesn't_ have all the factor levels, the factor levels get
recomputed, and are placed in lexicographical order.

One workaround is to use scale_x_discrete(drop=FALSE).

-Winston

mydata <- structure(list(chrom = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label =
c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7",
"chr8", "chr9", "chrX"), class = "factor"), start = c(5291000L, 10988025L,
11767950L, 11840900L, 12267450L, 12276675L), end = c(5291926L, 10988526L,
11768676L, 11841851L, 12268076L, 12277051L), peak = c(8L, 7L, 8L, 8L, 12L, 7L)),
.Names = c("chrom", "start", "end", "peak" ), row.names = c(NA, -6L), class =
"data.frame")

# Continuous x axis - rects are very, very narrow, but they're there
ggplot(mydata) +
  geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak))

# Continuous x axis and geom_bar: Width of each bar is determined by
# resolution of the data. Still narrow, but not as much.
ggplot(mydata) +
  geom_bar(aes(x = start, y = peak), stat="identity")


unique(mydata$start)
unique(mydata$end)

# Convert start and end to factors with the same set of levels
levels <- sort(unique(c(mydata$start, mydata$end)))
mydata$start <- factor(mydata$start, levels = levels)
mydata$end <- factor(mydata$end, levels = levels)

# X ticks appear in wrong order
ggplot(mydata) +
  geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak))

# Can specify the limits directly
ggplot(mydata) +
  geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak)) +
  xlim(as.character(levels))

# Or can use scale_x_discrete(drop = FALSE)
ggplot(mydata) +
  geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak)) +
  scale_x_discrete(drop = FALSE)


On Thu, Sep 20, 2012 at 7:19 AM, Hadley Wickham  wrote:
> Hmmm, I'm not sure what the problem is, but I suspect it's related to
> the fact the xmin and xmax have different factors levels and there are
> some bugs in ggplot2 related to combining factors in some situations
> (it's basically impossible to always do it right).
>
> Explicitly ensuring the levels were the same doesn't help, but setting
> the xlims does.  Winston, is this related to some of the other bugs
> we've been working on lately?
>
> mydata <- structure(list(chrom = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label =
> c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
> "chr17", "chr18", "chr19", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7",
> "chr8", "chr9", "chrX"), class = "factor"), start = c(5291000L, 10988025L,
> 11767950L, 11840900L, 12267450L, 12276675L), end = c(5291926L, 10988526L,
> 11768676L, 11841851L, 12268076L, 12277051L), peak = c(8L, 7L, 8L, 8L, 12L, 
> 7L)),
> .Names = c("chrom", "start", "end", "peak" ), row.names = c(NA, -6L), class =
> "data.frame")
>
> ggplot(mydata) +
>   geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak))
>
> unique(mydata$start)
> unique(mydata$end)
>
> levels <- sort(unique(c(mydata$start, mydata$end)))
> mydata$start <- factor(mydata$start, levels = levels)
> mydata$end <- factor(mydata$end, levels = levels)
>
> ggplot(mydata, aes(x = start)) +
>   geom_rect(aes(xmin = start, xmax = end, ymin = 0, ymax = peak)) +
>   xlim(as.character(levels))
>
>
> On Sun, Sep 16, 2012 at 11:11 AM, Rui Barradas  wrote:
>> Maybe a bug in ggplot2::geom_rect?
>>
>> I'm CCing this to Hadley Wickham, maybe he has an answer.
>>
>> Rui Barradas
>>
>> On 16-09-2012 17:04, John Kane wrote:

 -Original Message-
 From: ruipbarra...@sapo.pt
 Sent: Sun, 16 Sep 2012 13:13:47 +0100
 To: jrkrid...@inbox.com
 Subject: Re: [R] qplot: plotting precipitation data

 Hello,

 Relative to the OP's "request" for rectangles, I'm not understanding them.
>>>
>>> Neither am I really, I just googled a couple of sites for possible
>>> "chromatin precipitation" graphs and since the OP was not sure of the name
>>> of the geom made the assumption that they wanted a bar chart as it seemed
>>> like the simplest graph matching the "rectangles" statement.  I was
>>> assuming a terminology or language problem here and I could not see any
>>> reason the OP wanted purely rectangles.
>>>
 In your plot using geom_bar, the levels of as.factor(start) are sorted
 ascending. If both

   > as.factor(mydata$start)
 [1] 5291000  10988025 11767950 11840900 12267450 12276675
 Levels: 5291000 10988025 11767950 11840900 12267450 12276675
   > as.factor(mydata$end)
 [1] 5291926  10988526 11768676 1

Re: [R] Line over Boxplot

2012-09-20 Thread Peter Langfelder
Enclose the file name tmax.final.text in quotes. Otherwise R is
looking for a variable named tmax.final.text, not the file name named
"tmax.final.text".
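
For example, assuming the file sits in R's working directory:

```r
## a quoted string reads the file; an unquoted name looks up an R object
soton.df <- read.table("tmax.final.text", header = TRUE)
```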

HTH

Peter

On Thu, Sep 20, 2012 at 1:02 PM, gfishel  wrote:
> Thanks for your help! Unfortunately, I am now getting this:
>
>> pdf(file="boxplot_tmax_2012091912.pdf", height=10, width=12)
>>
>> soton.df = read.table ( tmax.final.text, header=TRUE )
> Error in read.table(tmax.final.text, header = TRUE) :
>   object 'tmax.final.text' not found
> Execution halted
>
> The file is there, has headers, and is located in the directory I am running
> R in! Now I am really stumped!
>
> Greg
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Line-over-Boxplot-tp4643736p4643791.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] question on assigning an argument in a function that is created by the function itself

2012-09-20 Thread Rui Barradas

Hello,

Inline.
On 20-09-2012 22:48, Benjamin Caldwell wrote:

Hi,

I need some help with making a function a bit more elegant.


Yes you do!
Below, your first function, for instance, becomes a one liner.
Trick: R is vectorized. Use functions that act on whole vectors, 
avoiding loops.



  How would you
all suggest avoiding the problem I've made myself below - I've written a
function that creates a temporary matrix


No, not a matrix, a data.frame. They're very different, both 
conceptually and in implementation.
A matrix is just a folded vector. To give a vector a 'dim' attribute 
changes its operative nature. It becomes the R implementation of the 
_linear_algebra_ concept of the same name. See ?matrix (and also ?array).
A data.frame is a special type of ?list. It's R's implementation of the 
_statistical_ concept of variables and observations.



  by subsetting a larger one I assign
it. I then call vectors from that matrix, add each item in the vector to
create a cumulative vector for each factor, and then patch them all back
together.

That's really an aside, because what I want to know is: how do I feed this
or any function via its arguments the vectors I want it to deal with in
advance, if those vectors are part of a matrix created in the function. My
example is below.


Now some code.


# first function (no longer needed)
cumul2 <- function(v, x) cumsum( v[order(x)] )

# function to replace cut.paste.1
# (two for v and x) not two
c.p.1 <- function(datas, level) {
sp <- split(datas, level)
sm <- sapply(sp, function(x) cumsum(x$datas1))
as.vector(sm)
}

c.p.1(dummy, level3)

# function to replace both cut.paste.1 and cut.paste.2
cut_paste <- function(datas, column, level) {
sp <- split(datas, level)
sm <- sapply(sp, function(x) cumsum( x[[column]] ))
as.vector(sm)
}

cut.paste.2(dummy, level3)
cut_paste(dummy, "datas2", level3)

cut.paste.1(dummy, level3)
cut_paste(dummy, "datas1", level3)
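
An even shorter route (a sketch, not part of the reply above) is ave(),
which applies a function within groups and returns the result aligned with
the original row order rather than in split order:

```r
## cumulative sums within each level of level3, one column per variable
dummy$cum1 <- ave(dummy$datas1, level3, FUN = cumsum)
dummy$cum2 <- ave(dummy$datas2, level3, FUN = cumsum)
```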


Hope this helps,

Rui Barradas


Thanks very much!

Ben Caldwell

#make some data

a<-rep("a",9)
b<-rep("b",9)
c<-rep("c",9)

level3<-c(a,b,c)
d1<-9:1
set.seed(20)
d2<-rnorm(9,1,.1)
datas1<-d1*d2

e1<-rep(2.5,4)
e2<-rep(10,5)
datas2<-c(e1,e2)


number<-rep(1:9,3)
dummya<-data.frame(datas1,datas2,number)

dummya
dummy <- data.frame(level3,dummya)
dummy

#first function
cumulative <- function(v,x) {
b <- rep (2,length(v))

o<-order(x)
v <-v[o]

n<-length(v)
  b[1] <- v[1]
  for (i in 2:n) {
  b[i] <- v[i] + b[i-1]
}
  b
}

#function I wish I could only write once with four arguments (two for v and
x) not two
cut.paste.1 <- function(datas, level) {

n <- length(unique(level))
sat <- NULL

for (i in 1:n) {
ref.frame<- datas[unique(level)[i] == level,]
v <- ref.frame$datas1 # but I have to modify it below
x <- ref.frame$number
sat<-append(sat,cumulative (v,x))
}
sat
}

cut.paste.1(dummy,level3)

cut.paste.2 <- function(datas, level) {

n <- length(unique(level))
sat <- NULL

for (i in 1:n) {
ref.frame<- datas[unique(level)[i] == level,]
v <- ref.frame$datas2 # here, but I wish I could make it an argument for
the function
x <- ref.frame$number
sat<-append(sat,cumulative (v,x))
}
sat
}

cut.paste.2(dummy,level3)

[[alternative HTML version deleted]]





Re: [R] issue accessing help files

2012-09-20 Thread Duncan Murdoch

On 12-09-20 6:58 PM, Basil Iannone wrote:

Hi Duncan,

The output I see after typing ?mean is

starting httpd help server ... done

Then the browser opens but goes nowhere. I just see a blank IE page with no
address in the address bar.

OK. I tried what you suggested. I typed the following

tools:::httpdPort
[1] 17081

I then pasted the following link into my IE address bar (replacing the
25567 with 17081).

http://127.0.0.1:17081/library/base/html/mean.html

Doing so took me to the page describing the mean.

Thanks for the help. Please keep me posted as to whether the above information
helps you out. In the meantime I will access the files as you suggested.


That tells me that the server is running properly, but somehow IE isn't 
getting the message that it should go to that page.


Could you try the same procedure (the port might be different next time, 
better check), but instead of starting IE and pasting the URL there, 
paste the URL into the Run box in the start menu?  That does something 
pretty close to what R does.  If it fails the way R did, then the 
problem is with IE, not with R; if that works, then likely the problem 
is with something in R.


If it does show up as a problem in R, could you run the following:

setInternet2(NA) # reports the proxy setting
setInternet2(FALSE) # turns it off
?mean

I don't normally use a proxy.  When I turn it on, things are very slow, 
but they still work.  (I assume the system looks for a proxy, then gives 
up and runs the regular code.)  It could be something like that is 
happening to you, or perhaps you are running a proxy, and it doesn't 
know what to do with the local connection that R is asking for.


Duncan Murdoch



On Thu, Sep 20, 2012 at 2:22 PM, Duncan Murdoch wrote:


On 20/09/2012 2:14 PM, Basil Iannone wrote:


Okay. I installed the newest version of R and I still cannot link to help
files. Is anyone else having this problem? Again, I am using Window's XP.



I don't think you've told us what output you get when you ask for help.
  Here's what I see from ?mean:

First, this appears in the R console:

starting httpd help server ... done

then the browser opens, with something like this showing:

http://127.0.0.1:25567/library/base/html/mean.html

You're not getting the second part; do you get the first?

If so, here's something to try.  From within R, print the value of
tools:::httpdPort:


tools:::httpdPort

[1] 25567

You'll probably get a different number than I did.  Then, with R still
open, go to IE, and enter the URL I give above (i.e.
http://127.0.0.1:25567/**library/base/html/mean.html),
with 25567 replaced by whatever port number you have.  Tell us what happens.

In the meantime, you can go back to the old text style help with

options(help_type="text")

You won't get links between pages or some other fancy stuff from the HTML
pages, but you should get to see something.

Duncan Murdoch




Thanks for the help.

On Thu, Sep 20, 2012 at 12:23 PM, Basil Iannone 
wrote:


I apologize. I meant to type "help(anova)" and not "Help(anova)". I am
installing the newest version later today after I complete some

analyses. I

will let everyone know if doing so solves my problem.

Thanks,


On Thu, Sep 20, 2012 at 11:46 AM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:


On Thu, Sep 20, 2012 at 4:20 PM, Basil Iannone 

wrote:

Hi Heramb,

No "Help(anova)" does not work either. Thanks for the suggestions. I

put

the question out there in case anyone else was having similar

problems.

I

think I will throw in the towel and install the latest version of R

to

see

if that resolves the issue.

Thanks



"Help(anova)" should not work. R is case sensitive -- did "help(anova)"
work?

Michael





--
Basil Iannone
University of Illinois at Chicago
Department of Biological Sciences (MC 066)
845 W. Taylor St.
Chicago, IL  60607-7060
Email: bian...@uic.edu
Phone: 312-355-3231
Fax: 312-413-2435
http://www2.uic.edu/~bianno2













__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] issue accessing help files

2012-09-20 Thread Basil Iannone
Hi Duncan,

The output I see after typing ?mean is

starting httpd help server ... done

Then the browser opens but goes nowhere. I just see a blank IE page with no
address in the address bar.

OK. I tried what you suggested. I typed the following

tools:::httpdPort
[1] 17081

I then pasted the following link into my IE address bar (replacing the
25567 with 17081).

http://127.0.0.1:17081/library/base/html/mean.html

Doing so took me to the page describing the mean.
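The manual check just described can also be scripted from within R itself. The sketch below is an editor's illustration, not part of the thread; note that tools:::httpdPort is an unexported internal whose form may differ across R versions, and the help server must already be running (e.g. after a first ?mean).

```r
# Rebuild the URL the dynamic help server serves and open it directly.
# tools:::httpdPort is internal/unexported -- assumed here to return the
# port number, as it does in the R version used in this thread.
port <- tools:::httpdPort
url  <- sprintf("http://127.0.0.1:%d/library/base/html/mean.html", port)
browseURL(url)  # should land on the same page that ?mean targets
```

If browseURL() opens the page but ?mean does not, that points at how R launches the browser rather than at the help server.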

Thanks for the help. Please keep me posted as to if the above information
helps you out. In the mean time I will access the files as you suggested.

On Thu, Sep 20, 2012 at 2:22 PM, Duncan Murdoch wrote:

> On 20/09/2012 2:14 PM, Basil Iannone wrote:
>
>> Okay. I installed the newest version of R and I still cannot link to help
>> files. Is anyone else having this problem? Again, I am using Windows XP.
>>
>
> I don't think you've told us what output you get when you ask for help.
>  Here's what I see from ?mean:
>
> First, this appears in the R console:
>
> starting httpd help server ... done
>
> then the browser opens, with something like this showing:
>
> http://127.0.0.1:25567/**library/base/html/mean.html
>
> You're not getting the second part; do you get the first?
>
> If so, here's something to try.  From within R, print the value of
> tools:::httpdPort:
>
> > tools:::httpdPort
> [1] 25567
>
> You'll probably get a different number than I did.  Then, with R still
> open, go to IE, and enter the URL I give above (i.e.
> http://127.0.0.1:25567/**library/base/html/mean.html),
> with 25567 replaced by whatever port number you have.  Tell us what happens.
>
> In the meantime, you can go back to the old text style help with
>
> options(help_type="text")
>
> You won't get links between pages or some other fancy stuff from the HTML
> pages, but you should get to see something.
>
> Duncan Murdoch
>
>
>
>> Thanks for the help.
>>
>> On Thu, Sep 20, 2012 at 12:23 PM, Basil Iannone 
>> wrote:
>>
>> > I apologize. I meant to type "help(anova)" and not "Help(anova)". I am
>> > installing the newest version later today after I complete some
>> analyses. I
>> > will let everyone know if doing so solves my problem.
>> >
>> > Thanks,
>> >
>> >
>> > On Thu, Sep 20, 2012 at 11:46 AM, R. Michael Weylandt <
>> > michael.weyla...@gmail.com> wrote:
>> >
>> >> On Thu, Sep 20, 2012 at 4:20 PM, Basil Iannone 
>> wrote:
>> >> > Hi Heramb,
>> >> >
>> >> > No "Help(anova)" does not work either. Thanks for the suggestions. I
>> put
>> >> > the question out there in case anyone else was having similar
>> problems.
>> >> I
>> >> > think I will throw in the towel and install the latest version of R
>> to
>> >> see
>> >> > if that resolves the issue.
>> >> >
>> >> > Thanks
>> >> >
>> >>
>> >> "Help(anova)" should not work. R is case sensitive -- did "help(anova)"
>> >> work?
>> >>
>> >> Michael
>> >>
>> >
>> >
>> >
>> > --
>> > Basil Iannone
>> > University of Illinois at Chicago
>> > Department of Biological Sciences (MC 066)
>> > 845 W. Taylor St.
>> > Chicago, IL  60607-7060
>> > Email: bian...@uic.edu
>> > Phone: 312-355-3231
>> > Fax: 312-413-2435
>> > http://www2.uic.edu/~bianno2
>> >
>>
>>
>>
>>
>


-- 
Basil Iannone
University of Illinois at Chicago
Department of Biological Sciences (MC 066)
845 W. Taylor St.
Chicago, IL  60607-7060
Email: bian...@uic.edu
Phone: 312-355-3231
Fax: 312-413-2435
http://www2.uic.edu/~bianno2

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
I'm working with some data from which a client would like to make a decision 
tree predicting brand preference based on inputs such as price, speed, etc.  
After running the decision tree analysis using rpart, it appears that this data 
is not capable of predicting brand preference.  

Here's the data set:

BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
Brand 17  0.7193  0.      0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
Brand 22  0.75    0.      0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167

Here are my R commands:

> test.df = read.csv("test.csv")
> head(test.df)
 BRNDPRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000

> testTree = rpart(BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE +
> SPED + REVW, method="class", data=test.df)

> printcp(testTree)

Classification tree:
rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
MODE + SPED + REVW, data = test.df, method = "class")

Variables actually used in tree construction:
[1] FORM

Root node error: 21/22 = 0.95455

n= 22 

CP nsplit rel error xerror xstd
1 0.047619  0   1.0 1.04760
2 0.01  1   0.95238 1.04760

I note that only one variable (FORM) was actually used in tree construction. 
When I run a plot using:

> plot(testTree)
> text(testTree)

...I get a tree with one branch.  

It looks to me like I'm doing everything right, and this data is just not 
capable of predicting brand preference. 

Am I missing anything?

Thanks very much in advance for any thoughts!

-Vik





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Bhupendrasinh Thakre
Not very sure what the problem is, as I was not able to load your data for a run. 
You might want to use the dput() command to present the data. 

Now on the programming side. As we can see, we have more than 2 levels for 
the brands, and hence method = "class" is not able to understand what you 
actually want from it.

Suggestion : For predictions having more than 2 levels I will go for Weka and 
specifically C4.5 algorithm. You also have the RWeka package for it.

Best Regards,

Bhupendrasinh Thakre
Sent from my iPhone

On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld  wrote:

> I'm working with some data from which a client would like to make a decision 
> tree predicting brand preference based on inputs such as price, speed, etc.  
> After running the decision tree analysis using rpart, it appears that this 
> data is not capable of predicting brand preference.  
> 
> Here's the data set:
> 
> BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
> Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
> Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
> Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
> Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
> Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
> Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
> Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
> Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
> Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
> Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
> Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
> Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
> Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
> Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
> Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
> Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
> Brand 17  0.7193  0.      0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
> Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
> Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
> Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
> Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
> Brand 22  0.75    0.      0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167
> 
> Here are my R commands:
> 
>> test.df = read.csv("test.csv")
>> head(test.df)
> BRNDPRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
> 1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
> 2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
> 3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
> 4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.5000
> 5 Brand 5 0.7586 0.4224 0.7328 0.6638 0.7328 0.6379 0.8621 0.8621 0.6897
> 6 Brand 6 0.7500 0.0833 0.5833 0.4167 0.5000 0.4167 0.7500 0.6667 0.5000
> 
>> testTree = rpart(BRAND~PRI  + PROM  + FORM +  FAMI+   DRRE +  FREC  + MODE + 
>>  SPED +  REVW, method="class", data=test.df)
> 
>> printcp(testTree)
> 
> Classification tree:
> rpart(formula = BRND ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
>MODE + SPED + REVW, data = test.df, method = "class")
> 
> Variables actually used in tree construction:
> [1] FORM
> 
> Root node error: 21/22 = 0.95455
> 
> n= 22 
> 
>CP nsplit rel error xerror xstd
> 1 0.047619  0   1.0 1.04760
> 2 0.01  1   0.95238 1.04760
> 
> I note that only one variable (FORM) was actually used in tree construction. 
> When I run a plot using:
> 
>> plot(testTree)
>> text(testTree)
> 
> ...I get a tree with one branch.  
> 
> It looks to me like I'm doing everything right, and this data is just not 
> capable of predicting brand preference. 
> 
> Am I missing anything?
> 
> Thanks very much in advance for any thoughts!
> 
> -Vik
> 
> 
> 
> 
> 
>[[alternative HTML version deleted]]
> 
> ___

[R] Axis annotation using lattice spplot

2012-09-20 Thread V Zintzen
Hi there,

I am having difficulties with what seems like a very simple thing.

My objective is to plot a distribution map for a species.
I have produced a plot with spplot which uses a raster, a few shapefiles and
xy points which are the species coordinates.
It all works fine until I want to add coordinates for this map.
I need to have the ticks and labels for those coordinates *inside* the
plotting area, not outside (outside, it works fine).
Apparently, using the code below (negatice 'tck'), the ticks are printed
inside the plotting area, but under the raster, which is not of much use to
me.

My questions are:
(1) how can I have the ticks and labels appearing on top of the raster?
(2) what are the functions to have the labels also moved on the inside of
the plotting area, sitting nicely close to the ticks?

x.labels<-c("10°","15°","20°","25°","30","35°","40°")
y.labels<-c("55°","50°","45°","40°","35°","30°","25°","20°")
spplot (  raster.map, 
scales=list(tck=c(-0.3,0), draw=T, x=list(at=x.UTM,
labels=x.labels), y=list(at=y.UTM, labels=y.labels)), #works fine with
tck=c(0.3,0), but not with a negative value
sp.layout=list(EEZ,coastline,sp)) #add shapefiles and the
distribution points for the selected species
  

Any help would be greatly appreciated!
-Vincent
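One untested idea along these lines (an editor's sketch, not from the thread): suppress the outside axes and draw the ticks and labels from inside a custom panel function with lattice's panel.axis(), so they are drawn after (i.e. on top of) the raster. panel.gridplot() is sp's default panel for gridded objects, and raster.map, x.UTM, y.UTM, x.labels, y.labels, EEZ, coastline and sp are the objects from the question.

```r
library(sp)
library(lattice)

spplot(raster.map,
       scales = list(draw = FALSE),        # drop the outside axes entirely
       sp.layout = list(EEZ, coastline, sp),
       panel = function(...) {
         panel.gridplot(...)               # raster + sp.layout drawn first
         # ticks and labels drawn last, i.e. on top of the raster;
         # outside = FALSE keeps them inside the plotting area,
         # half = FALSE draws them on this side only
         panel.axis("bottom", at = x.UTM, labels = x.labels,
                    outside = FALSE, half = FALSE)
         panel.axis("left", at = y.UTM, labels = y.labels,
                    outside = FALSE, half = FALSE)
       })
```

panel.axis() places the labels next to the ticks automatically, which also addresses question (2).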



--
View this message in context: 
http://r.789695.n4.nabble.com/Axis-annotation-using-lattice-spplot-tp4643810.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
Thanks! Here's the dput output:

> dput(test.df)
structure(list(BRND = structure(c(1L, 12L, 16L, 17L, 18L, 19L, 
20L, 21L, 22L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L, 
14L, 15L), .Label = c("Brand 1", "Brand 10", "Brand 11", "Brand 12", 
"Brand 13", "Brand 14", "Brand 15", "Brand 16", "Brand 17", "Brand 18", 
"Brand 19", "Brand 2", "Brand 20", "Brand 21", "Brand 22", "Brand 3", 
"Brand 4", "Brand 5", "Brand 6", "Brand 7", "Brand 8", "Brand 9"
), class = "factor"), PRI = c(0.6989, 0.8621, 0.6, 0.6429, 0.7586, 
0.75, 0.7742, 0.6429, 0.575, 0.8095, 0.6308, 0.7212, 0.7419, 
0.7176, 0.7287, 0.7, 0.7193, 0.7778, 0.8028, 0.7736, 0.8481, 
0.75), PROM = c(0.4731, 0.3793, 0.1, 0.25, 0.4224, 0.0833, 0.4839, 
0.2679, 0.175, 0.5238, 0.3, 0.3152, 0.2258, 0.2706, 0.3437, 0.4, 
0., 0.4127, 0.2817, 0.2453, 0.2152, 0.), FORM = c(0.7849, 
0.8621, 0.6, 0.5714, 0.7328, 0.5833, 0.6129, 0.6964, 0.65, 0.6667, 
0.6077, 0.7152, 0.6129, 0.6353, 0.5995, 0.6, 0.6667, 0.6508, 
0.6197, 0.6226, 0.6329, 0.6667), FAMI = c(0.6989, 0.931, 0.7, 
0.5, 0.6638, 0.4167, 0.5161, 0.7143, 0.55, 0.6429, 0.5846, 0.6545, 
0.5806, 0.5647, 0.5788, 0.4, 0.6667, 0.6349, 0.4366, 0.3774, 
0.4051, 0.5), DRRE = c(0.7419, 0.7586, 0.9, 0.6071, 0.7328, 0.5, 
0.8065, 0.875, 0.625, 0.6667, 0.6769, 0.6606, 0.7097, 0.6941, 
0.8527, 1, 0.7018, 0.7937, 0.7042, 0.5849, 0.6329, 0.6667), FREC = c(0.6022, 
0.6897, 0.7, 0.5, 0.6379, 0.4167, 0.6452, 0.5536, 0.375, 0.5952, 
0.5231, 0.503, 0.6129, 0.4471, 0.5478, 0.4, 0.5263, 0.6032, 0.4366, 
0.3019, 0.4557, 0.5833), MODE = c(0.8817, 0.8966, 0.7, 0.75, 
0.8621, 0.75, 0.7742, 0.8036, 0.825, 0.8571, 0.7462, 0.8061, 
0.871, 0.7176, 0.8217, 0.9, 0.7719, 0.8571, 0.7183, 0.717, 0.6962, 
0.9167), SPED = c(0.9032, 0.9655, 0.8, 0.8214, 0.8621, 0.6667, 
0.9032, 0.9464, 0.85, 0.8095, 0.8846, 0.8909, 0.9677, 0.9412, 
0.8941, 0.9, 0.8596, 0.9206, 0.9155, 0.8679, 0.8481, 0.9167), 
REVW = c(0.6452, 0.8276, 0.6, 0.5, 0.6897, 0.5, 0.6129, 0.6607, 
0.475, 0.5714, 0.6, 0.6, 0.3226, 0.5176, 0.6227, 0.5, 0.7018, 
0.619, 0.5634, 0.4717, 0.3418, 0.4167)), .Names = c("BRND", 
"PRI", "PROM", "FORM", "FAMI", "DRRE", "FREC", "MODE", "SPED", 
"REVW"), class = "data.frame", row.names = c(NA, -22L))


I've downloaded rWeka and am looking at the documentation.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Parallel Programming

2012-09-20 Thread Tjun Kiat Teo
I am trying to do parallel programming and I tried this

library(doSNOW)
library(foreach)

testfunc<-function(x){
x<-x+1
x
}

noc<-2

cl <- makeCluster(do.call(rbind,rep(list("localhost"),noc)), type = "SOCK")
registerDoSNOW(cl)
clusterExport(cl=cl,c("testfunc.r"))


testl<-foreach(pp=1:2) %dopar% {
testfunc(pp)
}


And this works but if I try to enclose my commands inside a text file
to be sourced it doesn't work

noc<-2

testfunc<-function(x){
x<-x+1
x
}

cl <- makeCluster(do.call(rbind,rep(list("localhost"),noc)), type = "SOCK")

registerDoSNOW(cl)

clusterExport(cl=cl,c("a","testfunc.r"))

testl<-foreach(pp=1:2) %dopar% {
source("test.r")
}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Parallel Programming

2012-09-20 Thread Jeff Newmiller
Then don't do that.

Use your script file to define functions. Source that file before the loop to 
load them into memory. Call those functions from within your loop.
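A minimal sketch of that pattern, reusing the names from the question (it assumes test.r defines testfunc() and nothing else needs exporting):

```r
library(doSNOW)
library(foreach)

cl <- makeCluster(2, type = "SOCK")
registerDoSNOW(cl)

# Load the function definitions once on every worker, not per iteration.
# clusterExport() copies *objects* by name; to run a script file on the
# workers, call source() there via clusterCall().
clusterCall(cl, source, "test.r")
source("test.r")  # and once in the master, if testfunc() is also used locally

# The loop then only *calls* the already-defined function.
testl <- foreach(pp = 1:2) %dopar% testfunc(pp)

stopCluster(cl)
```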
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Tjun Kiat Teo  wrote:

>I am trying to do parallel programming and I tried this
>
>library(doSNOW)
>library(foreach)
>
>testfunc<-function(x){
>x<-x+1
>x
>}
>
>noc<-2
>
>cl <- makeCluster(do.call(rbind,rep(list("localhost"),noc)), type =
>"SOCK")
>registerDoSNOW(cl)
>clusterExport(cl=cl,c("testfunc.r"))
>
>
>testl<-foreach(pp=1:2) %dopar% {
>testfunc(pp)
>}
>
>
>And this works but if I try to enclose my commands inside a text file
>to be sourced it doesn't work
>
>noc<-2
>
>testfunc<-function(x){
>x<-x+1
>x
>}
>
>cl <- makeCluster(do.call(rbind,rep(list("localhost"),noc)), type =
>"SOCK")
>
>registerDoSNOW(cl)
>
>clusterExport(cl=cl,c("a","testfunc.r"))
>
>testl<-foreach(pp=1:2) %dopar% {
>source("test.r")
>}
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
Bhupendrasinh, thanks very much!  I ran J48 on a respondent-level data set and 
got a 61.75% correct classification rate!

Correctly Classified Instances 988   61.75   %
Incorrectly Classified Instances   612   38.25   %
Kappa statistic  0.5651
Mean absolute error  0.0432
Root mean squared error  0.1469
Relative absolute error 52.7086 %
Root relative squared error 72.6299 %
Coverage of cases (0.95 level)  99.6875 %
Mean rel. region size (0.95 level)  15.4915 %
Total Number of Instances 1600 

When I plot it I get an enormous chart.  Running :

>respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE 
>+ SPED + REVW, data = respLevel)
>respLevelTree

...reports:

J48 pruned tree
--

Is there a way to further prune the tree so that I can present a chart that 
would fit on a single page or two?

Thanks very much in advance for any thoughts.


-Vik




On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:

> Not very sure what the problem is as I was not able to take your data for 
> run. You might want to use dput() command to present the data. 
> 
> Now on the programming side. As we can see that we have more than 2 levels 
> for the brands and hence method  = class is not able to able to understand 
> what you actually want from it.
> 
> Suggestion : For predictions having more than 2 levels I will go for Weka and 
> specifically C4.5 algorithm. You also have the RWeka package for it.
> 
> Best Regards,
> 
> Bhupendrasinh Thakre
> Sent from my iPhone
> 
> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld  wrote:
> 
>> I'm working with some data from which a client would like to make a decision 
>> tree predicting brand preference based on inputs such as price, speed, etc.  
>> After running the decision tree analysis using rpart, it appears that this 
>> data is not capable of predicting brand preference.  
>> 
>> Here's the data set:
>> 
>> BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
>> Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
>> Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
>> Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
>> Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
>> Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
>> Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
>> Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
>> Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
>> Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
>> Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
>> Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
>> Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
>> Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
>> Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
>> Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
>> Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
>> Brand 17  0.7193  0.      0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
>> Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9206  0.619
>> Brand 19  0.8028  0.2817  0.6197  0.4366  0.7042  0.4366  0.7183  0.9155  0.5634
>> Brand 20  0.7736  0.2453  0.6226  0.3774  0.5849  0.3019  0.717   0.8679  0.4717
>> Brand 21  0.8481  0.2152  0.6329  0.4051  0.6329  0.4557  0.6962  0.8481  0.3418
>> Brand 22  0.75    0.      0.6667  0.5     0.6667  0.5833  0.9167  0.9167  0.4167
>> 
>> Here are my R commands:
>> 
>>> test.df = read.csv("test.csv")
>>> head(test.df)
>>BRNDPRI   PROM   FORM   FAMI   DRRE   FREC   MODE   SPED   REVW
>> 1 Brand 1 0.6989 0.4731 0.7849 0.6989 0.7419 0.6022 0.8817 0.9032 0.6452
>> 2 Brand 2 0.8621 0.3793 0.8621 0.9310 0.7586 0.6897 0.8966 0.9655 0.8276
>> 3 Brand 3 0.6000 0.1000 0.6000 0.7000 0.9000 0.7000 0.7000 0.8000 0.6000
>> 4 Brand 4 0.6429 0.2500 0.5714 0.5000 0.6071 0.5000 0.7500 0.8214 0.50

Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Bhupendrasinh Thakre
One possible way to think of it is using "variable reduction" before going for 
J48. You may want to use one of the several methods available for that. Again, 
prediction for brands is more of a business question to me. 

Two solutions I can think of:
1. Variable reduction before the decision tree.
2. Let intuition decide how many of them are "really" important.

Please let us know your findings. All the best.

Best Regards,

Bhupendrasinh Thakre
Sent from my iPhone

On Sep 21, 2012, at 12:16 AM, Vik Rubenfeld  wrote:

> Bhupendrashinh, thanks very much!  I ran J48 on a respondent-level data set 
> and got a 61.75% correct classification rate!
> 
> Correctly Classified Instances 988   61.75   %
> Incorrectly Classified Instances   612   38.25   %
> Kappa statistic  0.5651
> Mean absolute error  0.0432
> Root mean squared error  0.1469
> Relative absolute error 52.7086 %
> Root relative squared error 72.6299 %
> Coverage of cases (0.95 level)  99.6875 %
> Mean rel. region size (0.95 level)  15.4915 %
> Total Number of Instances 1600 
> 
> When I plot it I get an enormous chart.  Running :
> 
> >respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
> >MODE + SPED + REVW, data = respLevel)
> >respLevelTree
> 
> ...reports:
> 
> J48 pruned tree
> --
> 
> Is there a way to further prune the tree so that I can present a chart that 
> would fit on a single page or two?
> 
> Thanks very much in advance for any thoughts.
> 
> 
> -Vik
> 
> 
> 
> 
> On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:
> 
>> Not very sure what the problem is as I was not able to take your data for 
>> run. You might want to use dput() command to present the data. 
>> 
>> Now on the programming side. As we can see that we have more than 2 levels 
>> for the brands and hence method  = class is not able to able to understand 
>> what you actually want from it.
>> 
>> Suggestion : For predictions having more than 2 levels I will go for Weka 
>> and specifically C4.5 algorithm. You also have the RWeka package for it.
>> 
>> Best Regards,
>> 
>> Bhupendrasinh Thakre
>> Sent from my iPhone
>> 
>> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld  wrote:
>> 
>>> I'm working with some data from which a client would like to make a 
>>> decision tree predicting brand preference based on inputs such as price, 
>>> speed, etc.  After running the decision tree analysis using rpart, it 
>>> appears that this data is not capable of predicting brand preference.  
>>> 
>>> Here's the data set:
>>> 
>>> BRND  PRI   PROM  FORM  FAMI  DRRE  FREC  MODE  
>>> SPED  REVW
>>> Brand 1   0.69890.47310.78490.69890.74190.6022
>>> 0.88170.90320.6452
>>> Brand 2   0.86210.37930.8621 0.9310.75860.6897
>>> 0.89660.96550.8276
>>> Brand 3  0.6   0.1   0.6   0.7   0.9   0.7  
>>>  0.7   0.8   0.6
>>> Brand 4   0.6429  0.250.5714   0.50.6071   0.5  
>>> 0.750.8214   0.5
>>> Brand 5   0.75860.42240.73280.66380.73280.6379
>>> 0.86210.86210.6897
>>> Brand 6 0.750.08330.58330.4167   0.50.4167  
>>> 0.750.6667   0.5
>>> Brand 7   0.77420.48390.61290.51610.80650.6452
>>> 0.77420.90320.6129
>>> Brand 8   0.64290.26790.69640.7143 0.8750.5536
>>> 0.80360.94640.6607
>>> Brand 90.575 0.175  0.65  0.55 0.625 0.375 
>>> 0.825  0.85 0.475
>>> Brand 10  0.80950.52380.66670.64290.66670.5952
>>> 0.85710.80950.5714
>>> Brand 11  0.6308   0.30.60770.58460.67690.5231
>>> 0.74620.8846   0.6
>>> Brand 12  0.72120.31520.71520.65450.6606 0.503
>>> 0.80610.8909   0.6
>>> Brand 13  0.74190.22580.61290.58060.70970.6129 
>>> 0.8710.96770.3226
>>> Brand 14  0.71760.27060.63530.56470.69410.4471
>>> 0.71760.94120.5176
>>> Brand 15  0.72870.34370.59950.57880.85270.5478
>>> 0.82170.89410.6227
>>> Brand 16 0.7   0.4   0.6   0.4 1   0.4  
>>>  0.9   0.9   0.5
>>> Brand 17  0.71930.0.66670.66670.70180.5263
>>> 0.77190.85960.7018
>>> Brand 18  0.77780.41270.65080.63490.79370.6032
>>> 0.85710.9206 0.619
>>> Brand 19  0.80280.28170.61970.43660.70420.4366
>>> 0.71830.91550.5634
>>> Brand 20  0.77360.24530.62260.37740.58490.3019 
>>> 0.7170.86790.4717
>>> Brand 21  

Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Vik Rubenfeld
Very good. Could you point me in a couple of potential directions for variable 
reduction? E.g., correlation analysis?
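Two quick base-R screens along those lines (an editor's sketch using the brand-level test.df posted earlier in the thread; for respondent-level data, substitute that data frame):

```r
# Keep only the numeric predictors, dropping the BRND factor.
num <- test.df[, sapply(test.df, is.numeric)]

# 1. Pairwise correlations: of any pair with |r| above roughly 0.8,
#    one member is a candidate to drop before refitting the tree.
round(cor(num), 2)

# 2. Principal components: if a few components carry most of the
#    variance, the predictors can be collapsed substantially.
summary(prcomp(num, scale. = TRUE))
```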


On Sep 20, 2012, at 10:36 PM, Bhupendrasinh Thakre wrote:

> One possible way to think of it is using " variable reduction" before going 
> for J48. You may want to use several methods available for that. Again 
> prediction for brands is more of a business question to me. 
> 
> Two solution which I can think of.
> 1. Variable reduction before decision tree.
> 2. Let the intuition decide how many of them are "really" important.
> 
> Please let us know your findings. All the best.
> 
> Best Regards,
> 
> Bhupendrasinh Thakre
> Sent from my iPhone
> 
> On Sep 21, 2012, at 12:16 AM, Vik Rubenfeld  wrote:
> 
>> Bhupendrashinh, thanks very much!  I ran J48 on a respondent-level data set 
>> and got a 61.75% correct classification rate!
>> 
>> Correctly Classified Instances 988   61.75   %
>> Incorrectly Classified Instances   612   38.25   %
>> Kappa statistic  0.5651
>> Mean absolute error  0.0432
>> Root mean squared error  0.1469
>> Relative absolute error 52.7086 %
>> Root relative squared error 72.6299 %
>> Coverage of cases (0.95 level)  99.6875 %
>> Mean rel. region size (0.95 level)  15.4915 %
>> Total Number of Instances 1600 
>> 
>> When I plot it I get an enormous chart.  Running :
>> 
>> >respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + 
>> >MODE + SPED + REVW, data = respLevel)
>> >respLevelTree
>> 
>> ...reports:
>> 
>> J48 pruned tree
>> --
>> 
>> Is there a way to further prune the tree so that I can present a chart that 
>> would fit on a single page or two?
>> 
>> Thanks very much in advance for any thoughts.
>> 
>> 
>> -Vik
>> 
>> 
>> 
>> 
>> On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:
>> 
>>> Not very sure what the problem is as I was not able to take your data for 
>>> run. You might want to use dput() command to present the data. 
>>> 
>>> Now on the programming side. As we can see that we have more than 2 levels 
>>> for the brands and hence method  = class is not able to able to understand 
>>> what you actually want from it.
>>> 
>>> Suggestion : For predictions having more than 2 levels I will go for Weka 
>>> and specifically C4.5 algorithm. You also have the RWeka package for it.
>>> 
>>> Best Regards,
>>> 
>>> Bhupendrasinh Thakre
>>> Sent from my iPhone
>>> 
>>> On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld  wrote:
>>> 
 I'm working with some data from which a client would like to make a 
 decision tree predicting brand preference based on inputs such as price, 
 speed, etc.  After running the decision tree analysis using rpart, it 
 appears that this data is not capable of predicting brand preference.  
 
 Here's the data set:
 
BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
Brand 17  0.7193  0.      0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
Brand 18  0.7778  0.4127

Re: [R] Decision Tree: Am I Missing Anything?

2012-09-20 Thread Achim Zeileis

Hi,

just to add a few points to the discussion:

- rpart() is able to deal with responses with more than two classes. 
Setting method="class" explicitly is not necessary if the response is a 
factor (as in this case).


- If your tree on this data is so huge that it can't even be plotted, I 
wouldn't be surprised if it overfitted the data set. You should check for 
this and possibly try to avoid unnecessary splits.
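A hedged sketch of such a check for an rpart fit, using the built-in iris data as a stand-in for the poster's data: printcp() shows the cross-validated error for each value of the complexity parameter, and prune() cuts the tree back to the cp with the smallest xerror.

```r
library(rpart)

## fit a classification tree (method = "class" is inferred for a factor response)
fit <- rpart(Species ~ ., data = iris)

## inspect cross-validated error (xerror) across complexity parameter values
printcp(fit)

## prune back to the cp value with the smallest cross-validated error
best.cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best.cp)
```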


- There are various ways to do so for J48 trees without variable 
reduction. One could require a larger minimal leaf size (default is 2) or 
one can use "reduced error pruning"; see WOW("J48") for more options. They 
can be easily used as e.g. J48(..., control = Weka_control(R = TRUE, M = 10)) etc.
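For instance (a sketch on the built-in iris data; per WOW("J48"), R = TRUE requests reduced-error pruning and M sets the minimum number of instances per leaf):

```r
library(RWeka)

## J48 with reduced-error pruning and at least 10 instances per leaf
fit <- J48(Species ~ ., data = iris,
           control = Weka_control(R = TRUE, M = 10))
fit
```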

- There are various other ways of fitting decision trees, see for example 
http://CRAN.R-project.org/view=MachineLearning for an overview. In 
particular, you might like the "partykit" package which additionally 
provides the ctree() method and has a unified plotting interface for 
ctree, rpart, and J48.
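A minimal sketch of both routes in partykit, again on iris (assuming the partykit package is installed): ctree() fits a conditional inference tree directly, and as.party() coerces an rpart fit so both can be drawn with the same plot method.

```r
library(partykit)
library(rpart)

## conditional inference tree
ct <- ctree(Species ~ ., data = iris)
plot(ct)

## unified plotting for an rpart tree via coercion to class "party"
rp <- rpart(Species ~ ., data = iris)
plot(as.party(rp))
```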


hth,
Z

On Thu, 20 Sep 2012, Vik Rubenfeld wrote:


Bhupendrashinh, thanks very much!  I ran J48 on a respondent-level data set and 
got a 61.75% correct classification rate!

Correctly Classified Instances 988   61.75   %
Incorrectly Classified Instances   612   38.25   %
Kappa statistic  0.5651
Mean absolute error  0.0432
Root mean squared error  0.1469
Relative absolute error 52.7086 %
Root relative squared error 72.6299 %
Coverage of cases (0.95 level)  99.6875 %
Mean rel. region size (0.95 level)  15.4915 %
Total Number of Instances 1600

When I plot it I get an enormous chart. Running:

respLevelTree = J48(BRAND_NAME ~ PRI + PROM + FORM + FAMI + DRRE + FREC + MODE 
+ SPED + REVW, data = respLevel)
respLevelTree


...reports:

J48 pruned tree
--

Is there a way to further prune the tree so that I can present a chart that 
would fit on a single page or two?

Thanks very much in advance for any thoughts.


-Vik




On Sep 20, 2012, at 8:37 PM, Bhupendrasinh Thakre wrote:


Not very sure what the problem is, as I was not able to run your data. 
You might want to use the dput() command to present the data.

Now on the programming side: as we can see, there are more than 2 levels for 
the brands, and hence method = "class" is not able to understand what you 
actually want from it.

Suggestion : For predictions having more than 2 levels I will go for Weka and 
specifically C4.5 algorithm. You also have the RWeka package for it.

Best Regards,

Bhupendrasinh Thakre
Sent from my iPhone

On Sep 20, 2012, at 9:47 PM, Vik Rubenfeld  wrote:


I'm working with some data from which a client would like to make a decision 
tree predicting brand preference based on inputs such as price, speed, etc.  
After running the decision tree analysis using rpart, it appears that this data 
is not capable of predicting brand preference.

Here's the data set:

BRND      PRI     PROM    FORM    FAMI    DRRE    FREC    MODE    SPED    REVW
Brand 1   0.6989  0.4731  0.7849  0.6989  0.7419  0.6022  0.8817  0.9032  0.6452
Brand 2   0.8621  0.3793  0.8621  0.931   0.7586  0.6897  0.8966  0.9655  0.8276
Brand 3   0.6     0.1     0.6     0.7     0.9     0.7     0.7     0.8     0.6
Brand 4   0.6429  0.25    0.5714  0.5     0.6071  0.5     0.75    0.8214  0.5
Brand 5   0.7586  0.4224  0.7328  0.6638  0.7328  0.6379  0.8621  0.8621  0.6897
Brand 6   0.75    0.0833  0.5833  0.4167  0.5     0.4167  0.75    0.6667  0.5
Brand 7   0.7742  0.4839  0.6129  0.5161  0.8065  0.6452  0.7742  0.9032  0.6129
Brand 8   0.6429  0.2679  0.6964  0.7143  0.875   0.5536  0.8036  0.9464  0.6607
Brand 9   0.575   0.175   0.65    0.55    0.625   0.375   0.825   0.85    0.475
Brand 10  0.8095  0.5238  0.6667  0.6429  0.6667  0.5952  0.8571  0.8095  0.5714
Brand 11  0.6308  0.3     0.6077  0.5846  0.6769  0.5231  0.7462  0.8846  0.6
Brand 12  0.7212  0.3152  0.7152  0.6545  0.6606  0.503   0.8061  0.8909  0.6
Brand 13  0.7419  0.2258  0.6129  0.5806  0.7097  0.6129  0.871   0.9677  0.3226
Brand 14  0.7176  0.2706  0.6353  0.5647  0.6941  0.4471  0.7176  0.9412  0.5176
Brand 15  0.7287  0.3437  0.5995  0.5788  0.8527  0.5478  0.8217  0.8941  0.6227
Brand 16  0.7     0.4     0.6     0.4     1       0.4     0.9     0.9     0.5
Brand 17  0.7193  0.      0.6667  0.6667  0.7018  0.5263  0.7719  0.8596  0.7018
Brand 18  0.7778  0.4127  0.6508  0.6349  0.7937  0.6032  0.8571  0.9

Re: [R] lattice dotplot reorder contiguous levels

2012-09-20 Thread Deepayan Sarkar
On Thu, Sep 20, 2012 at 7:48 PM, maxbre  wrote:
> my reproducible example
>
> test<-structure(list(site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
> 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("A",
> "B", "C", "D", "E"), class = "factor"), conc = c(2.32, 0.902,
> 0.468, 5.51, 1.49, 0.532, 0.72, 0.956, 0.887, 20, 30, 2.12, 0.442,
> 10, 50, 110, 3.36, 2.41, 20, 70, 3610, 100, 4.79, 20, 0.0315,
> 30, 60, 1, 3.37, 80, 1.21, 0.302, 0.728, 1.29, 30, 40, 90, 30,
> 0.697, 6.25, 0.576, 0.335, 20, 10, 620, 40, 9.98, 4.76, 2.61,
> 3.39, 20, 4.59), samp.time = structure(c(2L, 4L, 4L, 4L, 4L,
> 4L, 5L, 4L, 8L, 8L, 8L, 8L, 8L, 9L, 8L, 7L, 8L, 8L, 8L, 8L, 3L,
> 3L, 2L, 4L, 4L, 4L, 4L, 4L, 1L, 4L, 6L, 4L, 8L, 4L, 8L, 4L, 3L,
> 8L, 4L, 8L, 4L, 8L, 4L, 9L, 3L, 8L, 8L, 8L, 8L, 8L, 8L, 1L), .Label = c("2",
> "4", "12", "24", "96", "135", "167", "168", "169"), class = "factor")),
> .Names = c("site",
> "conc", "samp.time"), row.names = c(NA, 52L), class = "data.frame")
>
>
>
> dotplot(samp.time~conc|site, data=test,
> scales=list(x=list(log=10), y = list(relation = "free")),
> layout=c(1,5), strip=FALSE, strip.left=TRUE
> )
>
>
> my objective is to use “site” as the conditioning variable but with “samp.time”
> correctly grouped by “site”; the problem here is to ensure that the levels of
> “samp.time” within each “site” are contiguous, as otherwise they would not be
> contiguous in the dot plot itself (i.e., to avoid those holes between
> y-axis categories -- see the dotplot)
>
>
> I’ve been trying with this but without much success
>
> test$samp.time.new<-
>   with(test,reorder(samp.time,as.numeric(site)))
>
>
> dotplot(samp.time.new~conc|site, data=test,
> scales=list(x=list(log=10), y = list(relation = "free")),
> layout=c(1,5), strip=FALSE, strip.left=TRUE
> )
>
> I think (I hope) a possible alternative solution is to create for "ylim" a
> proper character vector of a different length to pass to each panel of the
> dotplot (I'm not posting this attempt because it is too confused up to now)
>
> can anyone point me in the right direction?

The problem here is that there is crossing between sites and
samp.time. You can try some imaginative permutations of site, such as

test$samp.time.new <- with(test, reorder(samp.time,
    as.numeric(factor(site, levels = c("A", "C", "D", "B", "E")))))

which gets all but site B right. There may be another permutation that
works for everything, but it would be much easier to make a nested
factor, i.e.,

test$samp.time.new <- with(test, reorder(samp.time:site, as.numeric(site)))

That just leaves getting the y-labels right, which I will leave for
you to figure out.

(Hint: ylim = some_function_of(levels(test$samp.time.new)))
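One way to act on that hint (a sketch, assuming the nested factor from the line above and relying on two documented lattice behaviours: ylim may be a character vector of factor levels, and with relation = "free" it may be a list with one component per panel; the grepl() pattern assumes the "time:site" labels that the interaction produces):

```r
lev <- levels(test$samp.time.new)

## one character vector of y levels per site panel, matched on the ":<site>" suffix
ylims <- lapply(levels(test$site),
                function(s) lev[grepl(paste0(":", s, "$"), lev)])

dotplot(samp.time.new ~ conc | site, data = test,
        scales = list(x = list(log = 10), y = list(relation = "free")),
        layout = c(1, 5), strip = FALSE, strip.left = TRUE,
        ylim = ylims)
```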

-Deepayan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.