[R] variable selection in linear regression

2011-06-07 Thread Syaiba Balqish
Hello,

With due respect, I would like to ask about some commands in R.
It is regarding variable selection in linear regression.
In R there is a built-in function called "step" which
selects variables according to AIC.

Let's say I have data [y, x1, x2, x3, x4].
We start with y ~ b0.
I compute the partial F test and choose the variable
with the maximum partial F to enter the model, say
x4 with a maximum partial F = 58.02377.
Therefore, our next model is y ~ b0 + b4*x4.

My questions...

1. How should I write the code so that x4 is carried forward into the next step?
2. The formula for the partial F test is
F* = ((SSE(reduced model) - SSE(full model)) / (dfR - dfF)) / (SSE(full model) / dfF)
which can be written more simply as
F* = MSR(xi | x1, x2, ..., x(i-1), x(i+1)) / MSE(x1, x2, ..., x(i-1), xi, x(i+1))
If I want to code the simplified form, how can I write it
for every xi not yet in the model, conditionally on the x's already in the model?
For example, I want to select among the other variables (x1, x2, x3) after x4 is
selected:
F* = MSR(x3 | x4) / MSE(x3, x4)

Below I attach my simple code:

library(MASS)                          # for ginv()

p <- dim(mydata)[2]
d <- p - 1                             # number of candidate predictors
n <- dim(mydata)[1]
x <- as.matrix(mydata[, 2:p])
y <- as.matrix(mydata[, 1])
X <- as.matrix(rep(1, n))              # design matrix of the null model

## null model y ~ b0
b <- lm(y ~ 1, data = mydata)$coefficients
yhat <- X %*% b
res <- y - yhat
sigma.hat <- sqrt(sum(res^2) / (n - ncol(X)))
cv <- sigma.hat^2 * ginv(t(X) %*% X)
se <- sqrt(diag(cv))

## partial F statistic for each candidate predictor against the null model
pc   <- matrix(0, nrow = 1, ncol = d)  # correlations with y
resF <- matrix(0, nrow = n, ncol = d)  # residuals of each one-variable fit
pf   <- matrix(0, nrow = 1, ncol = d)  # partial F statistics
for (j in 1:d) {
  pc[, j]   <- cor(x = x[, j], y = mydata[, 1])
  resF[, j] <- lsfit(x[, j], y)$residuals
  sseF <- t(as.matrix(apply(resF^2, 2, sum)))
  resR <- lm(y ~ 1, data = mydata)$residuals
  sseR <- sum(resR^2)
  dfF <- n - 2
  dfR <- n - 1
  pf[, j] <- ((sseR - sseF[, j]) / (dfR - dfF)) / (sseF[, j] / dfF)
}
max.pf <- max(pf)                      # the variable with the largest partial F enters
max.pc <- max(pc)
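For question 1, one possible approach (a sketch, not from the original post): add1() computes exactly these partial F statistics for every variable not yet in the model, and update() then carries the winner forward, so the loop above can be driven by base R. This assumes mydata has columns y, x1, x2, x3, x4 as in the example.

fit  <- lm(y ~ 1, data = mydata)                        # start from y ~ b0
cand <- add1(fit, scope = ~ x1 + x2 + x3 + x4, test = "F")
cand                                                    # partial F for each candidate
best <- rownames(cand)[-1][which.max(cand[-1, "F value"])]
fit  <- update(fit, as.formula(paste(". ~ . +", best))) # e.g. adds x4

Repeating the add1()/update() pair until no candidate's F exceeds the chosen cut-off gives forward selection by partial F; step() runs the same kind of loop, but with AIC.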


Thank you, and I look forward to hearing some replies.
Sincerely,

Iba
Universiti Putra Malaysia
 


--
View this message in context: 
http://r.789695.n4.nabble.com/variable-selection-in-linear-regression-tp3578795p3578795.html
Sent from the R help mailing list archive at Nabble.com.



[R] R results explanation

2011-06-07 Thread mpavlic
Hi all, 

 
this might be a stupid question, but still.

Every time I find some new function it's pretty easy to understand how to
use the syntax and to perform a test. Even the general idea of what the
function does is pretty easy to understand, but I cannot find a
detailed explanation of the R output for each function. For
example, with the function fitdistr() in the MASS package I am fitting the data
to a chosen distribution.

fitdistr(k, "lognormal")
      meanlog         sdlog
  3.150725905   0.759584332
 (0.012839319) (0.009078769)

But when I get the result I cannot find what these numbers mean. Is there an
explanation somewhere in the R help for each function?

Thanks for the help, m

--
View this message in context: 
http://r.789695.n4.nabble.com/R-results-explanation-tp3578918p3578918.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R and DBSCAN

2011-06-07 Thread Paco Pastor

Hello Christian

Thanks for answering. Yes, I have tried dbscan from fpc but I'm still
stuck on the memory problem. Regarding your answer, I'm not sure which
memory parameter I should look at. Following is the code I tried with
the dbscan parameters; maybe you can see if there is any mistake.


sstdat=read.csv("sst.dat",sep=";",header=F,col.names=c("lon","lat","sst"))

library(fpc)
sst1=subset(sstdat, sst<50)
sst2=subset(sst1, lon>-6)
sst2=subset(sst2, lon<40)
sst2=subset(sst2, lat<46)

> dbscan(sst2$sst, 0.1, MinPts = 5, scale = FALSE, method = c("hybrid"),
+        seeds = FALSE, showplot = FALSE, countmode = NULL)

Error: cannot allocate vector of size 858.2 Mb
> head(sst2)
 lon   lat   sst
1257 35.18 24.98 26.78
1258 35.22 24.98 26.78
1259 35.27 24.98 26.78
1260 35.31 24.98 26.78
1261 35.35 24.98 26.78
1262 35.40 24.98 26.85


In this example I only apply dbscan to the temperature values, not lon/lat,
so the eps parameter is 0.1. As it is a gridded data set, any point is
surrounded by eight data points, so I thought that at least 5 of the
surrounding points should be within the reachability distance. But I'm
not sure this is the right approach: by only considering the temperature
value I may be missing the spatial information. How should I deal
with the longitude and latitude data?


dimensions of sst2 are: 152243 rows x 3 columns
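One hedged possibility for bringing the spatial information in without building a full distance matrix is to cluster on all three scaled columns; a sketch only, and the eps value here is arbitrary and would need tuning:

library(fpc)
xyz <- scale(sst2[, c("lon", "lat", "sst")])   # put the three dimensions on a common scale
cl  <- dbscan(xyz, eps = 0.15, MinPts = 5, method = "raw")  # "raw" avoids storing the distance matrix
table(cl$cluster)                              # cluster sizes; 0 is noise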

Thanks again

El 03/06/2011 18:24, Christian Hennig escribió:
Have you considered the dbscan function in library fpc, or was it 
another one?
dbscan in fpc doesn't have a "distance" parameter but several options, 
one
of which may resolve your memory problem (look up the documentation of 
the "memory" parameter).


Using a distance matrix for hundreds of thousands of points is a 
recipe for disaster (memory-wise). I'm not sure whether the function 
that you used did that, but dbscan in fpc can avoid it.


It is true that dbscan requires tuning constants that the user has to 
provide. There is unfortunately no general rule how to do this; it 
would be necessary to understand the method and the meaning of the 
constants, and how this translates into the requirements of your 
application.


You may try several different choices and do some cluster validation 
to see what works, but I can't explain this in general terms easily 
via email.


Hope this helps at least a bit.

Best regards,
Christian


On Fri, 3 Jun 2011, Paco Pastor wrote:


Hello everyone,

When looking for information about clustering of spatial data in R I
was directed towards DBSCAN. I've read some docs about it and then
new questions have arisen.


DBSCAN requires some parameters, one of them is "distance". As my
data are three-dimensional (longitude, latitude and temperature),
which "distance" should I use? Which dimension is related to that
distance? I suppose it should be temperature. How do I find such a
minimum distance with R?


Another parameter is the minimum number of points needed to form a
cluster. Is there any method to find that number? Unfortunately I
haven't found one.


Searching through Google I could not find an R example of using
dbscan on a dataset similar to mine. Do you know of any website with
such examples, so I can read them and try to adapt them to my case?


The last question is that my first R attempt with DBSCAN (without a
proper answer to the prior questions) resulted in a memory problem: R
says it cannot allocate a vector. I start with a 4 km spaced grid with
779191 points that ends in approximately 30 rows x 3 columns
(latitude, longitude and temperature) after removing invalid SST
points. Any hint to address this memory problem? Does it depend on my
computer or on DBSCAN itself?


Thanks for the patience to read a long and probably boring message 
and for your help.


--
---
Francisco Pastor
Meteorology department, Instituto Universitario CEAM-UMH
http://www.ceam.es
---
mail: p...@ceam.es
skype: paco.pastor.guzman
Researcher ID: http://www.researcherid.com/rid/B-8331-2008
Cosis profile: http://www.cosis.net/profile/francisco.pastor
---
Parque Tecnologico, C/ Charles R. Darwin, 14
46980 PATERNA (Valencia), Spain
Tlf. 96 131 82 27 - Fax. 96 131 81 90





[R] Logistic Regression

2011-06-07 Thread farahnazlakhani
I am working on my thesis, in which I have a couple of independent variables
that are categorical in nature and the dependent variable is dichotomous.
Initially I ran univariate analyses and added the variables with significant
p-values (p < 0.25) to my full model.
I have three confusions. Firstly, I am looking for confounding variables by
using the formula "(crude beta-coefficient - adjusted beta-coefficient) / crude
beta-coefficient x 100"; as per the rule, if the percentage change for any variable is > 10%
then I have considered it a confounder. From the
initial model I dropped one variable with an insignificant p-value to
form the adjusted model. Now how will I know whether the variable that I dropped
from the initial model was a confounder or not?
Secondly, I wanted to know: if the percentage comes out negative, like
-17.84%, will it be considered a confounder or not? I also wanted to
know whether confounders should be removed from the model or kept in the
model.
Lastly, I am running a likelihood ratio test to identify
whether the value falls in the critical region or not. If the value does not
fall in the critical region, what does that show, and what should I do in this
case? In my final reduced model all p-values are significant, but the
value from the likelihood ratio test still does not fall in the critical
region. So what does that show?
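For what it's worth, the percentage-change quantity in the first question is easy to compute directly in R; a minimal sketch with hypothetical variables (binary outcome y, exposure x coded 0/1, candidate confounder z, all in a data frame dat), not the poster's data:

crude    <- glm(y ~ x,     family = binomial, data = dat)
adjusted <- glm(y ~ x + z, family = binomial, data = dat)
b.crude <- coef(crude)["x"]
b.adj   <- coef(adjusted)["x"]
100 * (b.crude - b.adj) / b.crude   # the "> 10%" quantity from the post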


--
View this message in context: 
http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3578962.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] R results explanation

2011-06-07 Thread Duncan Murdoch

On 07/06/2011 4:14 AM, mpavlic wrote:

Hi all,


this might be a stupid question, but still.

Everytime i find some new function it's prettty easy to understand how to
use the syntax and to perform a text. Even the general idea of what the
function does is pretty easy to understand, but i can not find an
explanation (detailed explanation) of the R output for each function. For
example, a function fitdistr() in MASS package i testing the data against
the distributions (wanted distribution).

fitdistr(k, "lognormal")

      meanlog         sdlog

   3.150725905   0.759584332
  (0.012839319) (0.009078769)

But when i get the result I can not find what these numbers mean. Is there a
explanation somewhere in R help for each function?


The Value section of the help page describes what the function returns. 
The help page for fitdistr has


Value:

 An object of class ‘"fitdistr"’, a list with four components,

estimate: the parameter estimates,

  sd: the estimated standard errors,

vcov: the estimated variance-covariance matrix, and

  loglik: the log-likelihood.


You should be aware that objects often print only part of their value; 
the rest can be extracted using the usual methods, e.g.


fit <- fitdistr(k, "lognormal")
fit$loglik

will print the log likelihood.
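As a concrete illustration (a sketch with simulated data, not the original k; the component names are those listed in the Value section above):

library(MASS)
set.seed(1)
k <- rlnorm(500, meanlog = 3, sdlog = 0.75)   # simulated stand-in for the data
fit <- fitdistr(k, "lognormal")
fit$estimate   # meanlog and sdlog (the first printed row)
fit$sd         # their standard errors (the row printed in parentheses)
fit$loglik     # log-likelihood, not shown by the default print method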

This doesn't explain what "meanlog" and "sdlog" are; for that you need 
to know about the lognormal distribution, or consult the reference on 
the help page.


Duncan Murdoch



Thanks for the help, m

--
View this message in context: 
http://r.789695.n4.nabble.com/R-results-explanation-tp3578918p3578918.html
Sent from the R help mailing list archive at Nabble.com.



[R] access data on server

2011-06-07 Thread Brown, Mathew
Hello,

I'm running R on Linux (Ubuntu) and I'm trying to run a script that will read
and plot data stored on a Linux server. I've looked around and haven't been able to
figure out how to do this.

I want to load several files from the server and then be able to manipulate
them.

Any ideas?

Thanks!



Re: [R] Question about curve function

2011-06-07 Thread peter dalgaard

On Jun 6, 2011, at 11:22 , Prof Brian Ripley wrote:

> As a further example of the trickiness, the "function" method of plot() 
> relies on curve(x, ...) being a request to plot the function x(x) against x.  
> I've added a comment to that effect to the help page.

Ouch. This springs to mind:

> fortune(106)

If the answer is parse() you should usually rethink the question.
   -- Thomas Lumley
  R-help (February 2005)


but curve() predates that insight by half a decade or more. It could probably 
do with a redesign, if anyone is up to it.

By the way, it really does work if the 2nd arg is an expression object (as 
opposed to an expression evaluating to an expression object):

do.call(curve,list(expression(x)))

or

cl <- quote(curve(x))
cl[[2]] <- expression(x)
eval(cl)

(The trouble with nonstandard evaluation is that it doesn't follow standard 
evaluation rules...)
 


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



[R] Probit Transformation

2011-06-07 Thread Stuart
Hi

I have a data set consisting of many variables and, in some cases,
missing values. I want to apply a probit transformation to my whole data set
using the VGAM library or any other possible way. I would appreciate it if
someone could help in writing code for the probit transformation.


V1           V10          V11          V12          V13          V14          V15          V16
95.62962963  0.00730066   0.007109069  0.009523787  0.009393327  36.75331497  55.71260452  55.52732849
100          0.00609076   0.00591477   0.00746469   0.007355629  33.56465912  50.78512955  32.36806107
99.62962963  0.006729639  0.006615394  0.008195653  0.008123143  29.41248512  44.49405289  46.5644722
100          0.004690148  0.005060644  0.004959545  0.004777349  12.15983486  18.36569214  1.916931152
95.5556      0.006579496  0.006267744  0.007883405  0.00750326   27.65305138  41.82440567  31.90625763
99.62962963  NA           NA           NA           NA           NA           NA           0
100          0.006345392  0.006233124  0.007698979  0.007630885  30.69077492  46.43167877  39.03141403
97.03703704  0.005294371  0.004554533  0.006608589  0.006270184  42.88692093  65.0030899   13.15470886
99.25925926  0.004527965  0.004476037  0.004671435  0.004626574  6.895369053  10.40838909  0.006126262
98.89694042  0.004666994  NA           0.00484924   NA           8.208506584  12.39075851  0.000979055
97.40740741  0.005578207  0.005641043  0.006787194  0.006205489  35.79841614  54.18977356  15.91392231
97.43961353  0.004523328  NA           0.004811963  0.004580847  14.03365326  21.20001221  0.257747442
98.14814815  0.005641896  0.006085037  0.006456159  0.006326901  24.5825119   37.18153763  17.51481247
99.25925926  0.004614782  NA           0.004935548  0.004629854  15.11430836  22.84581566  0.419624776
98.58796296  NA           NA           NA           NA           NA           NA           0
99.24242424  0.004876296  0.004559089  0.005570014  0.005358154  27.62986755  41.76262283  4.181870937
99.62962963  0.00670645   0.006495661  0.008073027  0.007948128  27.7416153   41.95372772  45.09982681
99.62962963  0.006381809  0.006114239  0.007883402  0.007754351  33.53539276  50.7783699   35.65010834
100          NA           NA           NA           NA           NA           NA           NA


Thanks very much
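A minimal sketch of one possible reading of the request, assuming the columns hold percentages to be pushed through the standard normal quantile function (the probit); the file name is hypothetical, and NAs are carried through unchanged:

dat <- as.matrix(read.table("mydata.txt", header = TRUE))  # hypothetical input file
p   <- dat / 100                          # rescale percentages to (0, 1)
p[!is.na(p) & (p <= 0 | p >= 1)] <- NA    # the probit is undefined at 0 and 1
probit.dat <- dat                         # keep the dimensions and column names
probit.dat[] <- qnorm(p)                  # elementwise probit, NAs preserved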



[R] About DCC-garch model...

2011-06-07 Thread windseav
Hi, everyone, 

I have currently run into a problem with the DCC-GARCH model. I use the package
ccgarch and the function dcc.estimation. One of the outputs of this function
is the DCC matrix, which gives the conditional correlation matrix at every time
period you supply. However, I cannot figure out how the function calculates the
conditional correlation matrix at the first time period, since there is no
data to condition on... How do we usually deal with the first-period data
of a GARCH model? Could anyone suggest a paper about the DCC GARCH model,
including how to deal with the first time period's data?

Thank you all guys in advance!!  



--
View this message in context: 
http://r.789695.n4.nabble.com/About-DCC-garch-model-tp3579140p3579140.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Logistic Regression

2011-06-07 Thread Mike Marchywka





> Date: Tue, 7 Jun 2011 01:38:32 -0700
> From: farah.farid@student.aku.edu
> To: r-help@r-project.org
> Subject: [R] Logistic Regression
>
> I am working on my thesis in which i have couple of independent variables
> that are categorical in nature and the depndent variable is dichotomus.
> Initially I run univariate analysis and added the variables with significant
> p-values (p<0.25) in my full model.
> I have three confusions. Firstly, I am looking for confounding variables by

I'm not sure what your thesis is about, some system that you are studying
with statistics or maybe a thesis about statistics itself, but
according to this disputed Wikipedia entry,

http://en.wikipedia.org/wiki/Confounding

whether a variable is confounding or extraneous is determined by the reality of your system.
It may help to consider factors related to that and use the statistics
to avoid fooling yourself. Look at the pictures (a non-pompous way of saying
look at graphs and scatter plots for some ideas to test) and then test various
ideas. You see bad cause/effect inferences all the time in many fields,
from econ to biotech (although anecdotes suggest these mistakes usually
favour the sponsors LOL). Consider some mundane "known" examples of
what your data would look like and see if that relates to what you have.
If you were naively measuring car velocity
at a single point in front of a traffic light, along with the color of the light, what might you
observe (much like with an earlier example on iron in patients, there are a number
of more precisely defined measurements you could take on a given "thing")?

If your concern is that "I ran test A and it said B, but test C said D and
D seems inconsistent with B", it generally helps to look at the assumptions and
detailed equations for each model and explore what those mean with your data.
With continuous variables anyway, non-monotonic relationships can easily destroy a correlation
even with strong causality.


> using formula "(crude beta-cofficient - adjusted beta-cofficient)/ crude
> beta-cofficient x 100" as per rule if the percentage of any variable is >10%
> than I have considered that as confounder. I wanted to know that from
> initial model i have deducted one variable with insignificant p-value to
> form adjusted model. Now how will i know if the variable that i deducted
> from initial model was confounder or not?
> Secondly, I wanted to know if the percentage comes in negative like
> (-17.84%) than will it be considered as confounder or not? I also wanted to
> know that confounders should be removed from model? or should be kept in
> model?
> Lastly, I wanted to know that I am running likelihood ratio test to identify
> if the value is falling in critical region or not. So if the value doesnot
> fall in critical region than what does it show? what should I do in this
> case? In my final reduced model all p-values are significant but still the
> value identified via likelihood ratio test is not falling in critical
> region. So what does that show?
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3578962.html
> Sent from the R help mailing list archive at Nabble.com.
>
  


Re: [R] access data on server

2011-06-07 Thread Juergen Rose
On Tuesday, 07.06.2011 at 11:40 +0200, Brown, Mathew wrote:
> Hello,
> 
> I'm running R on Linux (Ubantu) and I'm trying to run a script that will read
> and plot data on a linux server. I've looked around and haven't been able to
> figure out how to do this. 
> 
> I want to load several files on the server and then be able to manipulate
> them.
> 
> Any ideas?

Export an NFS share containing the files to read and plot from the server.
Mount the share on the Ubuntu client and access all the files as if they were
local files.
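Besides NFS, R can often read remote files directly over a connection; a sketch with hypothetical host and paths (pipe() streams the file over ssh, url() works if the server exposes it over HTTP):

dat1 <- read.csv(pipe("ssh user@myserver cat /remote/path/file1.csv"))
dat2 <- read.csv(url("http://myserver/data/file2.csv"))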



Re: [R] RCurl and kerberos

2011-06-07 Thread TAPO (Thomas Agersten Poulsen)
Hi again,

RCurl is just smarter than me:
getURL("http://my.web.service",.opts=curlOptions(username=":"))
does the trick.

Don't know how I missed that yesterday.

Hope nobody wasted time on this.

Cheers,
Thomas

> -Original Message-
> From: TAPO (Thomas Agersten Poulsen)
> Sent: 6. juni 2011 16:10
> To: r-help@r-project.org
> Cc: TAPO (Thomas Agersten Poulsen); JK (Jesper Agerbo Krogh)
> Subject: RCurl and kerberos
> 
> Dear list,
> 
> I would like to call a Kerberos-authenticated web-service from within
> R.
> 
> Curl can do it:
> $ curl --negotiate -u : "http://my.web.service/";
> 
> so I would expect that RCurl also has the capability, but I have not
> been able to find the correct options to set.
> 
> listCurlOptions() does not return anything with negotiate, and
> searching the source of RCurl, the only thing I found was
> 
> ./RCurl/R/curlInfo.S:names(CurlFeatureBits) = c("ipv6", "kerberos4",
> "ssl", "libz", "ntlm", "gssnegotiate",
> 
> but e.g.
> getURL("http://my.web.service",.opts=curlOptions(username=":",httpauth=
> "gssnegotiate"))
> 
> does not work.
> 
> Does anybody know if RCurl or another package is able to negotiate the
> GSS api or otherwise access a Kerberos enabled webservice and how?
> 
> Testing was done in R 2.12.1 on Ubuntu 10.04.1 LTS / Lucid.
> 
> Thanks in advance,
> Thomas.
> 



[R] Populating values from html

2011-06-07 Thread amrita
Can we populate values into an Excel sheet from HTML forms, which then has to be
used in R for data analysis?

Can we directly retrieve the values from HTML forms into R for analysis?

--
View this message in context: 
http://r.789695.n4.nabble.com/Populating-values-from-html-tp3579215p3579215.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] list demographics

2011-06-07 Thread Jim Lemon

On 06/07/2011 06:20 AM, Sarah Goslee wrote:

Hi all,

I got curious about something, so in proper scientific fashion I
obtained some data and analyzed it.

Question: what is the female participation in the R-help email list?

Data: the most recent list postings, obtained from the website. I took
my best shot at classifying the names given in the email header as
male/female, but ended with a fair number of unknowns.

This dataset had 2797 list messages, in 895 questions. 1501 messages
were replies to one of those questions by someone not the original
querent.

Across all messages, 6.5% were from women, 77.8% from men, 15.7% unknown.

For new questions, 11.7% were from women, 61.2% from men, 27.0% unknown.

Among responses to other people's questions, 2.6% were from women,
92.3% from men, 5.1% unknown.

Nine women answered other people's questions, but only two were what
I'd consider active participants, offering more than two answers. (Not
divided up by separate questions, so could be several replies in one
discussion.)

For men, 214 answered questions, and 90 offered more than two answers.
(Not divided up by separate questions, so could be several replies in
one discussion.)

Six active participants were unclassifiable, so even if all of those
were female, that would still be only eight women actively
participating in the list in this sample.

Is the list representative of statisticians? People who use R? People
who participate in statistical software email lists? I have no idea,
but I found it interesting that there is so little female
participation in the list, even asking questions (where you'd expect
to see students and new R users), and almost no female participation
in answering questions.

For those of you who teach, are your classes heavily skewed?

Sarah


Hi Sarah,
This is consistent with other studies of gender-related behavior 
(finally a chance to use "gender" correctly!). Women are more likely to 
ask for help when faced with a problem, so the over-representation of 
female "askers" is to be expected. Men may be quicker to post an answer, 
and that tends to inhibit other "answerers", unless the answer is wrong 
or incomplete or you can one-up the previous answerer. In the dark ages 
when I was foolish enough to study statistics, there were ten males in 
my course. That was everybody. Not that all statisticians are male, 
Australia's own Annette Dobson is one of the best known statisticians 
Down Under, and I could make quite a list of notable female 
statisticians just from memory.

Jim



Re: [R] Logistic Regression

2011-06-07 Thread Frank Harrell
The "10% change" idea was never a good one and has not been backed up by
simulations.  It is quite arbitrary and results in optimistic standard
errors of remaining variables.  In fact a paper presented at the Joint
Statistical Meetings about 3 years ago (I'm sorry I've forgotten the names
of the authors) showed that conflicting results are obtained according to
whether you apply the 10% to the coefficients or to the odds ratios, and
there is no theory to guide the choice.  Why risk residual confounding? 
Form a good model a priori and adjust for all potential confounders; don't
base the choice on P-values.  Use propensity scores if overfitting is an
issue.
Frank


farahnazlakhani wrote:
> 
> I am working on my thesis in which i have couple of independent variables
> that are categorical in nature and the depndent variable is dichotomus. 
> Initially I run univariate analysis and added the variables with
> significant p-values (p<0.25) in my full model. 
> I have three confusions. Firstly, I am looking for confounding variables
> by using formula "(crude beta-cofficient - adjusted beta-cofficient)/
> crude beta-cofficient x 100" as per rule if the percentage of any variable
> is >10% than I have considered that as confounder. I wanted to know that
> from initial model i have deducted one variable with insignificant p-value
> to form adjusted model. Now how will i know if the variable that i
> deducted from initial model was confounder or not? 
> Secondly, I wanted to know if the percentage comes in negative like
> (-17.84%) than will it be considered as confounder or not? I also wanted
> to know that confounders should be removed from model? or should be kept
> in model?
> Lastly, I wanted to know that I am running likelihood ratio test to
> identify if the value is falling in critical region or not. So if the
> value doesnot fall in critical region than what does it show? what should
> I do in this case? In my final reduced model all p-values are significant
> but still the value identified via likelihood ratio test is not falling in
> critical region. So what does that show?
> 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3579392.html
Sent from the R help mailing list archive at Nabble.com.



[R] write gene_id in a bed file

2011-06-07 Thread ads pit
Hi all,
I have built the following data frame:
 head(href)
   chr tx_start   tx_end g_id strand cds_start  cds_end exon_count
1 chr1  8384389  8404227 NM_001080397  +   8384389  8404073  8
2 chr1 16767166 16786584 NM_001145277  +  16767256 16785491  7
3 chr1 16767166 16786584 NM_001145278  +  16767256 16785385  8
4 chr1 16767166 16786584NM_018090  +  16767256 16785385  8
5 chr1 48998526 50489626NM_032785  -  48999844 50489468 14
6 chr1 33546713 33585995NM_052998  +  33547850 33585783 12

Now, to export it into a BED file, I did:
output_href<- write.table(new_CTTS, file="new_href.bed", quote=FALSE, sep =
"\t", na="NA", row.names=FALSE, col.names=TRUE)

The thing is when I look at the file I only have:
 head(test)
  chr1 X564620 X564649 X564644 X565645  X94 X. X10
1 chr1  565369  565404  565371  566372  217  +   8
2 chr1  565463  565541  565480  566481 1214  +  15
3 chr1  565653  565697  565662  53 1031  +  28
4 chr1  565861  565922  565883  566884  316  +  12
5 chr1  566537  566573  566564  567565  119  +  11
6 chr1  567535  567579  567562  568563 2085  +  39

but what I want is to include the corresponding gene ID as well in a last
column. How can I do that?

Best,
Nanami
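A sketch of one way to get the gene ID into the file: a BED line is chrom, start, end, name, score, strand, so selecting those columns from href (the object shown above, not new_CTTS) and writing without a header keeps g_id as the fourth column; the score here is just a placeholder.

bed <- data.frame(chrom  = href$chr,
                  start  = href$tx_start,
                  end    = href$tx_end,
                  name   = href$g_id,
                  score  = 0,
                  strand = href$strand)
write.table(bed, file = "new_href.bed", quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)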




Re: [R] Segfaults of eigen

2011-06-07 Thread Juergen Rose
Dear Prof. Ripley,

On Monday, 21.02.2011 at 11:40 +, Prof Brian Ripley wrote:
> So there is very likely a bug in your system software or compiler.
> There is no such problem on other x86_64 systems (including 
> Fedora 14 and 12, FreeBSD, Solaris and Windows).
> 
> You can debug segfaults for yourself (see 'Writing R Extensions') or 
> ask your vendor for support.

I still suffer from the same problem, but now with R version 2.12.2.

> On Mon, 21 Feb 2011, Juergen Rose wrote:
> 
> > Am Montag, den 21.02.2011, 11:11 +0100 schrieb Juergen Rose:
> >> Hi,
..
> >> With large(?) matrices eigen produces on several of my systems
> >> segfaults:
> >>
> >>> data(iris)
> >>> D <- dist(iris[,-5])
..
> >>> eigen(D)
> >>
> >>  *** caught segfault ***
> >> address (nil), cause 'unknown'
> >>
> >> Traceback:
> >>  1: .Call("La_rs", x, only.values, PACKAGE = "base")
> >>  2: eigen(D)

The problem happens only if blas-atlas and lapack-atlas are the current
libraries; it does not happen with blas-reference and lapack-reference.

I also tried to follow 'Writing R Extensions', but I do not really
understand what I should do in my case. Do I have to compile the libraries
and R with debugging information?

I tried without debugging information in R:
> data(iris)
> D <- dist(iris[,-5])
> debug(eigen)
> eigen(D)
debugging in: eigen(D)
debug: {
x <- as.matrix(x)
if (!is.null(dimnames(x))) 
dimnames(x) <- list(NULL, NULL)
n <- nrow(x)
if (!n) 
stop("0 x 0 matrix")
if (n != ncol(x)) 
stop("non-square matrix in 'eigen'")
complex.x <- is.complex(x)
if (!complex.x && !is.double(x)) 
storage.mode(x) <- "double"
if (!all(is.finite(x))) 
stop("infinite or missing values in 'x'")
if (missing(symmetric)) 
symmetric <- isSymmetric.matrix(x)
if (!EISPACK) {
if (symmetric) {
z <- if (!complex.x) 
.Call("La_rs", x, only.values, PACKAGE = "base")
else .Call("La_rs_cmplx", x, only.values, PACKAGE = "base")
ord <- rev(seq_along(z$values))
}
else {
z <- if (!complex.x) 
.Call("La_rg", x, only.values, PACKAGE = "base")
else .Call("La_rg_cmplx", x, only.values, PACKAGE = "base")
ord <- sort.list(Mod(z$values), decreasing = TRUE)
}
return(list(values = z$values[ord], vectors = if (!only.values)
z$vectors[, 
ord, drop = FALSE]))
}
dbl.n <- double(n)
if (symmetric) {
if (complex.x) {
xr <- Re(x)
xi <- Im(x)
z <- .Fortran("ch", n, n, xr, xi, values = dbl.n, 
!only.values, vectors = xr, ivectors = xi, dbl.n, 
dbl.n, double(2 * n), ierr = integer(1L), PACKAGE =
"base")
if (z$ierr) 
stop(gettextf("'ch' returned code %d in 'eigen'", 
  z$ierr), domain = NA)
if (!only.values) 
z$vectors <- matrix(complex(real = z$vectors, 
  imaginary = z$ivectors), ncol = n)
}
else {
z <- .Fortran("rs", n, n, x, values = dbl.n, !only.values, 
vectors = x, dbl.n, dbl.n, ierr = integer(1L), 
PACKAGE = "base")
if (z$ierr) 
stop(gettextf("'rs' returned code %d in 'eigen'", 
  z$ierr), domain = NA)
}
ord <- sort.list(z$values, decreasing = TRUE)
}
else {
if (complex.x) {
xr <- Re(x)
xi <- Im(x)
z <- .Fortran("cg", n, n, xr, xi, values = dbl.n, 
ivalues = dbl.n, !only.values, vectors = xr, 
ivectors = xi, dbl.n, dbl.n, dbl.n, ierr = integer(1L), 
PACKAGE = "base")
if (z$ierr) 
stop(gettextf("'cg' returned code %d in 'eigen'", 
  z$ierr), domain = NA)
z$values <- complex(real = z$values, imaginary = z$ivalues)
if (!only.values) 
z$vectors <- matrix(complex(real = z$vectors, 
  imaginary = z$ivectors), ncol = n)
}
else {
z <- .Fortran("rg", n, n, x, values = dbl.n, ivalues =
dbl.n, 
!only.values, vectors = x, integer(n), dbl.n, 
ierr = integer(1L), PACKAGE = "base")
if (z$ierr) 
stop(gettextf("'rg' returned code %d in 'eigen'", 
  z$ierr), domain = NA)
ind <- z$ivalues > 0L
if (any(ind)) {
ind <- seq.int(n)[ind]
z$values <- complex(real = z$values, imaginary = z
$ivalues)
if (!only.values) {
  z$vectors[, ind] <- complex(real = z$vectors[, 
ind], imaginary = z$vectors[, ind + 1])
  z$vectors[, ind + 1] <- Conj(z$vectors[, ind])
}
}
}
ord <- sort.list(Mod(z$values), decreasing = TRUE)
}
  

[R] rgl: insert pauses in animation sequence / movie formats other than gif?

2011-06-07 Thread Michael Friendly

Two questions related to creating animated movies with rgl:

1. I've created an rgl scene with 5 different views I want to display in
a movie, but I'd like to insert pauses (say, 5 seconds) at each view.
How can I do this?

I first created 5 userMatrix's, then

play3d( par3dinterp( userMatrix=list(M1, M2, M3, M4, M5)), duration=2*60/5 )

then tried simply repeating each twice,

play3d( par3dinterp( userMatrix=list(M1, M1, M2, M2, M3, M3, M4, M4, M5, M5)),
        duration=2*60/5 )

but that didn't give the desired effect.  I see that play3d() has a
times= argument, but the documentation doesn't indicate how to use it in
this context.
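For what it's worth, an untested sketch of one way to get the pauses: if memory serves, the times= argument belongs to par3dinterp() rather than play3d(), and listing each view matrix twice, at times 5 s apart, should hold that view before the next 1 s transition.

f <- par3dinterp(times      = c(0, 5,  6, 11, 12, 17, 18, 23, 24, 29),
                 userMatrix = list(M1, M1, M2, M2, M3, M3, M4, M4, M5, M5))
play3d(f, duration = 29)
## movie3d(f, duration = 29, fps = 10) would render the same sequence to frames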

2. With movie3d(), I can get an animated .gif, but I wonder if there are 
other movie formats (.mov, .mpg) I can get,
either with convert, or an external tool, e.g., so I can embed a movie 
in a LaTeX  -> .pdf document using the

movie15 package.

e.g., I tried using convert at in a command prompt window,
> convert -delay 1x8 coffee-av3D-1*.png coffee-av3D-1.mov

but it seems that only one frame was used.

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



[R] ID50 i) comparisons ii) dose.p vs Reed-Muench

2011-06-07 Thread geast
I'm analysing some ID50 data for 2 different groups and had already
calculated this by hand using Reed-Muench formula, when I came across the
dose.p function in R.

I have 2 queries:

1) dose.p gives me a different answer from Reed-Muench, and I actually suspect
a wrong answer, given that the dose that dose.p states will infect 50% is
stronger than a dose used in my experiments that caused above 50%
infection!

>SetA<-data.frame(c("Hi","Med","Lo"),c(8.44,7.46,6.22),c(31,27,35),c(21,14,6))
>names(SetA)<-c("Group","dose","total","infected")
>attach(SetA)
>y<-cbind(infected,total-infected)
>model<-glm(y~dose,binomial)

>library(MASS)
> dose.p(model,p=0.5)
#                Dose        SE
# p = 0.5:     7.6056 0.2263418
 
BUT! The 50% result is higher than the Med dose (log 7.46), which in the test
produced 51.9% infection
(and by the R-M formula the ID50 would be log 7.40).


2) How can I compare the ID50's of two sets to see if the difference between
their ID50s is significant?

I also have one further Set B and will calculate the ID50 for this (by R-M it is
working out at log 6.84), and am interested to see if Set A and Set B differ
in their ID50.
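A sketch of one way to make that comparison, assuming a second data frame SetB with the same columns as SetA: fit both sets in one binomial GLM and test whether adding a group term (a shift of the curve, i.e. a different ID50) improves the fit.

SetAB <- rbind(data.frame(SetA, set = "A"), data.frame(SetB, set = "B"))
fit0  <- glm(cbind(infected, total - infected) ~ dose,       binomial, data = SetAB)
fit1  <- glm(cbind(infected, total - infected) ~ dose + set, binomial, data = SetAB)
anova(fit0, fit1, test = "Chisq")   # a small p-value suggests the ID50s differ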


Many thanks in advance

--
View this message in context: 
http://r.789695.n4.nabble.com/ID50-i-comparisons-ii-dose-p-vs-Reed-Muench-tp3579311p3579311.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Populating values from html

2011-06-07 Thread Mike Marchywka









> Date: Tue, 7 Jun 2011 03:35:46 -0700
> From: ammasamri...@gmail.com
> To: r-help@r-project.org
> Subject: [R] Populating values from html
>
> can we populate values into an excel sheet from html forms that has to be
> used in R for data analysis
>
> Can we directly retireve the values from html forms into R fro analysis


Judging from the way many websites offer data, you'd think that jpg is
the best means for getting it LOL. HTML, PDF, and image formats
are designed for human consumption of limited aspects of a data set.
Normally you would prefer something closer to raw data, like CSV.
After having visited yet another public website that offers
data in PDF (YAPWTODIP), I would suggest you first contact the
site and ask them to present the data in a form which allows it to
be easily examined in an open source environment (note this criterion does not
make Excel a preferred choice either). If you have to scrape web pages,
apparently there are some R facilities, but depending on the page you
may need to do a lot of work, and it will not survive if the page
is redone for artistic reasons etc. See any of these results, for example:

http://www.google.com/search?hl=en&q=cran+page+scraping
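If the submitted values end up in an HTML table on some results page, one hedged possibility is the XML package; a sketch with a hypothetical URL:

library(XML)
tabs <- readHTMLTable("http://example.com/results.html")  # one data frame per <table> on the page
str(tabs)
write.csv(tabs[[1]], "results.csv", row.names = FALSE)    # a file Excel can open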








>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Populating-values-from-html-tp3579215p3579215.html
> Sent from the R help mailing list archive at Nabble.com.
>
  


[R] Draw a Dendrogram

2011-06-07 Thread Ayoub Maatallaoui

Hello,
I'm a research student working on everyday sounds classification.
I need to draw a dendrogram to show how the classification is done, but
since I have never used R before, I guess that some help from someone would be
great :)

Has any one of you done something like that before?
Thank you

--
Ayoub
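A minimal sketch of the standard base-R approach, assuming a hypothetical numeric feature matrix feats with one row per sound:

feats <- matrix(rnorm(20 * 5), nrow = 20)   # made-up features: 20 sounds x 5 measures
d  <- dist(feats)                           # pairwise distances between sounds
hc <- hclust(d, method = "average")         # hierarchical clustering
plot(hc, main = "Dendrogram of sound classification")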



Re: [R] About DCC-garch model...

2011-06-07 Thread Arun.stat
Dear Windseav, I found that it is quite subjective, because the effect of the
initial value dies out after a couple of time periods, hence whatever value
you put there rarely matters. However, I found that the common practice is to use
the unconditional variance/covariance/correlation for the first period. I
can recall (I probably checked some time back) that the fGarch package (or maybe
something related, I can't recall now) uses the average sum of squares of the
observations as the estimated variance for the first period.

Thanks,
_

Arun Kumar Saha, FRM
QUANTITATIVE RISK AND HEDGE CONSULTING SPECIALIST
Visit me at: http://in.linkedin.com/in/ArunFRM
_

--
View this message in context: 
http://r.789695.n4.nabble.com/About-DCC-garch-model-tp3579140p3579489.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Draw a Dendrogram

2011-06-07 Thread Sarah Goslee
Hi Ayoub,

You'd be best served by learning how to search and to get help.

Within R,
??dendrogram
will give you a list of all the functions mentioning dendrogram, while
?dendrogram
will give you the help page for the function specifically named dendrogram().

While it can be a bit difficult to search for R-related information,
http://www.rseek.org
is a restricted Google search engine specifically intended for questions like
yours.

Beyond that, I'd highly recommend both one of the many excellent intro
to R guides and the posting guide for this email list.

Sarah

On Tue, Jun 7, 2011 at 8:22 AM, Ayoub Maatallaoui
 wrote:
> Hello,
> i'm a research student working on everyday sounds classification.
> i need to draw a dendrogram to show how the classification is done, but
> while i never used R before, i guess that a help from someone would be great
> :)
> does any one of you did something like that before?
> Thank you
>
> --
> Ayoub
>



-- 
Sarah Goslee
http://www.functionaldiversity.org



[R] Sorting a data frame with values of different lengths

2011-06-07 Thread William Armstrong
Hi all,

I am attempting to run a script in which I permute my data and run a
Wilcoxon rank sum test on the data 1000 times and compare my original test
statistic to the permuted test statistics to more accurately estimate a
significance level for the trends I am observing.

In the process of doing this, I need to sort a data frame that measures 1001 x
3 based on the value of the test statistic (W).  I am trying to get the
data in ascending order based on W.  My data frame looks something like
this:

> W_table
   pds_gagehandles.i.  p     W
W                mibe  1   746
2                mibe  2 870.5
3                mibe  3   767
4                mibe  4  1066
5                mibe  5   885
6                mibe  6 931.5
7                mibe  7   765
8                mibe  8   930
9                mibe  9 696.5
10               mibe 10 711.5
11               mibe 11  1006

I am trying to sort it using the command: > W_table[order(W_table$W),],
which is spitting out:

   pds_gagehandles.i.  p     W
11               mibe 11  1006
4                mibe  4  1066
9                mibe  9 696.5
10               mibe 10 711.5
W                mibe  1   746
7                mibe  7   765
3                mibe  3   767
2                mibe  2 870.5
5                mibe  5   885
8                mibe  8   930
6                mibe  6 931.5

I want this data frame to have the values that are currently the first two
as the last two, i.e. I need it in ascending order.  I am thinking that I am
having this problem because the W values are of different lengths, and ?order states the
data should be "a sequence of numeric, complex, character or logical
vectors, all of the same length, or a classed R object".  Maybe because the
values over 1000 have an extra digit, R is seeing them as '100' for some
reason? Does anyone know of another function I can use to accomplish this
task or a way to work around this error?

Thank you very much,

Billy

--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-a-data-frame-with-values-of-different-lengths-tp3579653p3579653.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sorting a data frame with values of different lengths

2011-06-07 Thread Sarah Goslee
Hi,

It looks to me that your data frame is being sorted as text.

What does
str(W_table)
show?

How was W_table created? Your W column appears to not be numeric.

Sarah

On Tue, Jun 7, 2011 at 9:51 AM, William Armstrong
 wrote:
> Hi all,
>
> I am attempting to run a script in which I permute my data and run a
> Wilcoxon rank sum test on the data 1000 times and compare my original test
> statistic to the permuted test statistics to more accurately estimate a
> significance level for the trends I am observing.
>
> In the process of doing this, I need to sort a data frame that measures 3 x
> 1001 based on the value of the test statistic (W).  I am trying to get the
> data in ascending order based on W.  My data frame looks something like
> this:
>
>> W_table
>   pds_gagehandles.i.  p     W
> W                mibe  1   746
> 2                mibe  2 870.5
> 3                mibe  3   767
> 4                mibe  4  1066
> 5                mibe  5   885
> 6                mibe  6 931.5
> 7                mibe  7   765
> 8                mibe  8   930
> 9                mibe  9 696.5
> 10               mibe 10 711.5
> 11               mibe 11  1006
>
> I am trying to sort it using the command: > W_table[order(W_table$W),],
> which is spitting out:
>
>   pds_gagehandles.i.  p     W
> 11               mibe 11  1006
> 4                mibe  4  1066
> 9                mibe  9 696.5
> 10               mibe 10 711.5
> W                mibe  1   746
> 7                mibe  7   765
> 3                mibe  3   767
> 2                mibe  2 870.5
> 5                mibe  5   885
> 8                mibe  8   930
> 6                mibe  6 931.5
>
> I want this data frame to have the values that are currently the first two
> as the last two, i.e. I need it in ascending order.  I am thinking that I am
> having this problem because W is different lengths and ?order states the
> data should be "a sequence of numeric, complex, character or logical
> vectors, all of the same length, or a classed R object".  Maybe because the
> values over 1000 have an extra digit R is seeing them as '100' for some
> reason? Does anyone know of another function I can use to accomplish this
> task or a way to work around this error?
>
> Thank you very much,
>
> Billy
>
> --


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Sorting a data frame with values of different lengths

2011-06-07 Thread William Armstrong
Hi Sarah,

str(W_table) gives me:

> str(W_table)
'data.frame':   11 obs. of  3 variables:
 $ pds_gagehandles.i.: Factor w/ 1 level "mibe": 1 1 1 1 1 1 1 1 1 1 ...
 $ p : chr  "1" "2" "3" "4" ...
 $ W : chr  "746" "870.5" "767" "1066" ...

here is the script I am using, with the lines that create W and the W_table
bolded:

for (i in 1:length(pds_gagehandles)) {
  POTWY <- get(paste(pds_gagehandles[i], "_POTWY", sep = ""))
  num_wyrs <- length(POTWY)
  pre70wyrs_ind <- 1:(length(POTWY) - 39)
  post70wyrs_ind <- (length(POTWY) - 38):length(POTWY)

  for (p in 1:11) {
    if (p == 1) {
      pre70_POTWY <- POTWY[pre70wyrs_ind]
      post70_POTWY <- POTWY[post70wyrs_ind]
      original_stats <- wilcox.test(pre70_POTWY, post70_POTWY)
      W <- original_stats$statistic
      W_table <- data.frame(pds_gagehandles[i], p, W)
      W_table_name <- paste(pds_gagehandles[i], "_W_table", sep = "")
    } else {
      sample_POTWY <- sample(POTWY)
      sample_pre70_POTWY <- sample_POTWY[pre70wyrs_ind]
      sample_post70_POTWY <- sample_POTWY[post70wyrs_ind]

      sample_stats <- wilcox.test(sample_pre70_POTWY, sample_post70_POTWY)
      sample_W <- sample_stats$statistic
      sample_info <- c(pds_gagehandles[i], p, sample_W)
      W_table <- rbind(W_table, sample_info)
      if (p == 1001) {
        assign(W_table_name, W_table)
        write.table(W_table, col.names = c('gage_handle', 'iteration', 'W'),
                    file = paste(W_table_name, ".txt", sep = ""),
                    sep = "\t", row.names = FALSE)
      }
    }
  }
}

Thank you very much for your help.

Billy

--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-a-data-frame-with-values-of-different-lengths-tp3579653p3579680.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Not missing at random

2011-06-07 Thread Joshua Wiley
Hi Blaz,

What do you do if the number of values sampled to be set missing
(e.g., 4) is greater than the number of values for a given case that
are less than your < 3 threshold?  If no special considerations are
needed for that, I do not see why you cannot apply the same technique
you did below with MCAR to MNAR.

Best regards,

Josh
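A sketch of how the two pieces might be combined (the same idea as the nmar() function quoted further down, with the per-case count of missing values sampled as in the MCAR code; the count is capped at the number of eligible values in each case):

nmar2 <- function(x, value = 3, p = 0.30, m = 3) {
  cand <- unique(which(x < value, arr.ind = TRUE)[, "row"])   # rows with something to blank
  rows <- sample(cand, min(length(cand), floor(nrow(x) * p))) # cases that get missings
  for (i in rows) {
    cols <- which(x[i, ] < value)                      # eligible columns in this case
    k    <- sample(seq_len(min(m, length(cols))), 1)   # how many of them to blank
    x[i, cols[sample.int(length(cols), k)]] <- NA
  }
  x
}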

On Tue, Jun 7, 2011 at 12:17 AM, Blaz Simcic  wrote:
> Josh,
>
> thanks for the answer, it really helped me. I have another question, if you
> maybe know how to do it.
>
> I would also like to sample the number of missing values within selected cases,
> as I did with MCAR (see below).
>
> Can you help me with this?
>
> Thanks,
>
> Blaz from Slovenia
>
> Here is my code for MCAR:
>
> N <- 1000                            # number of cases
>
> n <- 12                              # number of variables
>
> X <- matrix(rnorm(N * n), N, n)      # matrix
>
> pMiss <- 0.20                        # percent of missing values
>
> idMiss <- sample(1:N, N * pMiss)     # sample cases
>
> nMiss <- length(idMiss)
>
> m <- 3                               # maximum number of missing values within selected cases
>
> howmanyMiss <- sapply(idMiss, function(x) sample(1:m, 1))
>
> howmanyMiss                          # number of missing values within selected cases
>
> varMiss <- lapply(howmanyMiss, function(x) sample(1:n, x))   # which values are missing
>
> ids <- cbind(rep(idMiss, howmanyMiss), unlist(varMiss))
>
> Xmiss <- X
>
> Xmiss[ids] <- NA
>
> Xmiss
>
> 
> From: Joshua Wiley 
> To: Blaz Simcic 
> Cc: r-help@r-project.org
> Sent: Mon, June 6, 2011 10:34:38 PM
> Subject: Re: [R] Not missing at random
>
> Hi Blaz,
>
> See below.
>
> x <- matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
>             nrow = 7, ncol = 7, byrow = TRUE)   # matrix
>
> pMiss <- 30                          # percent of missing values
>
> N <- dim(x)[1]                       # number of cases
>
> candidate <- which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 | x[,6]<3 |
>                    x[,7]<3)          # I want to sample all cases with at least 1 value
>                                      # lower than 3, so I have to find candidates
>
> ## easier to use this
> ## find all x < 3 and return their row and column indices
> ## select only row indices, and then find unique
> candidate <- unique(which(x < 3, arr.ind = TRUE)[, "row"])
>
> idMiss <- sample(candidate, N * pMiss / 100)   # I sampled cases
>
> ## from the subset of x cases that will be missing
> ## find all that are < 3 and set to NA
> x[idMiss, ][x[idMiss, ] < 3] <- NA
>
> ## If you are going to do this a lot, consider a function
> nmar <- function(x, op = "<", value = 3, p = 30) {
>   op <- get(op)
>   candidate <- unique(which(op(x, value), arr.ind = TRUE)[, "row"])
>   idMiss <- sample(candidate, nrow(x) * p / 100)
>   x[idMiss, ][op(x[idMiss, ], value)] <- NA
>   return(x)
> }
>
> nmar(x)
>
> ## has the advantage that you can easily change
> ## p, the cut off value, the operator (e.g., "<", ">", "<=", etc.)
>
> Cheers,
>
> Josh
>
> On Sun, Jun 5, 2011 at 11:17 PM, Blaz Simcic  wrote:
>>
>>
>> Hello!
>>
>> I would like to sample 30 % of cases (with at least 1 value lower than 3 -
>> in
>> the row) and among them I want to set all values lower than 3 (within
>> selected
>> cases) as NA (NMAR- Not missing at random). I managed to sample cases, but
>> I
>> don’t know how to set values (lower than 3) as NA.
>>
>> R code:
>>
>> x <-
>>
>> matrix(c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,3,3,3,4),
>>  nrow = 7, ncol=7, byrow=TRUE) matrix
>>
>> pMiss <- 30 percent of missing values
>>
>> N <- dim(x)[1]   number of cases
>>
>> candidate<-which(x[,1]<3 | x[,2]<3 | x[,3]<3 | x[,4]<3 | x[,5]<3 | x[,6]<3
>> |
>> x[,7]<3)     I want to sample all cases with at least 1 value lower
>> than 3,
>> so I have to find candidates
>>
>> idMiss <- sample(candidate, N * p / 100)     I sampled cases
>>
>> Now I'd like to set all values among sampled cases as NA.
>>
>> Any suggestion?
>>
>> Thanks,
>> Blaž
>>        [[alternative HTML version deleted]]
>>
>>
>>
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting a data frame with values of different lengths

2011-06-07 Thread William Armstrong
Also, I tried changing a line to store W as numeric:

sample_info<-c(pds_gagehandles[i],p,as.numeric(sample_W))

But it is still sorting incorrectly:

> W_table[order(W_table$as.numeric.W.),]
   pds_gagehandles.i.  p as.numeric.W.
8                mibe  8        1004.5
1                mibe  1           746
10               mibe 10           746
6                mibe  6         811.5
2                mibe  2         818.5
11               mibe 11         822.5
4                mibe  4           879
5                mibe  5           888
9                mibe  9           901
7                mibe  7         903.5
3                mibe  3         919.5
> str(W_table)
'data.frame':   11 obs. of  3 variables:
 $ pds_gagehandles.i.: Factor w/ 1 level "mibe": 1 1 1 1 1 1 1 1 1 1 ...
 $ p                 : chr  "1" "2" "3" "4" ...
 $ as.numeric.W.     : chr  "746" "818.5" "919.5" "879" ...


--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-a-data-frame-with-values-of-different-lengths-tp3579653p3579691.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] write gene_id in a bed file

2011-06-07 Thread Mohamed Lajnef
Hi Nanami,

you are not exporting the object you built: you create href but write new_CTTS to the file. Is that intended?

Regards
M



On 07/06/11 at 14:42, ads pit wrote:
> Hi all,
> I have build the following data frame
>   head(href)
> chr tx_start   tx_end g_id strand cds_start  cds_end exon_count
> 1 chr1  8384389  8404227 NM_001080397  +   8384389  8404073  8
> 2 chr1 16767166 16786584 NM_001145277  +  16767256 16785491  7
> 3 chr1 16767166 16786584 NM_001145278  +  16767256 16785385  8
> 4 chr1 16767166 16786584NM_018090  +  16767256 16785385  8
> 5 chr1 48998526 50489626NM_032785  -  48999844 50489468 14
> 6 chr1 33546713 33585995NM_052998  +  33547850 33585783 12
>
> Now when I'm trying to export it into a bed file I did:
> output_href<- write.table(new_CTTS, file="new_href.bed", quote=FALSE, sep =
> "\t", na="NA", row.names=FALSE, col.names=TRUE)
>
> The thing is when I look at the file I only have:
>   head(test)
>chr1 X564620 X564649 X564644 X565645  X94 X. X10
> 1 chr1  565369  565404  565371  566372  217  +   8
> 2 chr1  565463  565541  565480  566481 1214  +  15
> 3 chr1  565653  565697  565662  53 1031  +  28
> 4 chr1  565861  565922  565883  566884  316  +  12
> 5 chr1  566537  566573  566564  567565  119  +  11
> 6 chr1  567535  567579  567562  568563 2085  +  39
>
> but what I want is to include the corresponding gene ID as well in a last
> column. How can I do that?
>
> Best,
> Nanami
>
>   [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 

Mohamed Lajnef,IE INSERM U955 eq 15#
Pôle de Psychiatrie#
Hôpital CHENEVIER  #
40, rue Mesly  #
94010 CRETEIL Cedex FRANCE #
mohamed.laj...@inserm.fr#
tel : 01 49 81 32 79   #
Sec : 01 49 81 32 90   #
fax : 01 49 81 30 99   #




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about curve function

2011-06-07 Thread Uwe Ligges



On 07.06.2011 11:57, peter dalgaard wrote:


On Jun 6, 2011, at 11:22 , Prof Brian Ripley wrote:


As a further example of the trickiness, the "function" method of plot() relies 
on curve(x, ...) being a request to plot the function x(x) against x.  I've added a 
comment to that effect to the help page.


Ouch. This springs to mind:


fortune(106)


If the answer is parse() you should usually rethink the question.
-- Thomas Lumley
   R-help (February 2005)


but curve() predates that insight by half a decade or more. It could probably 
do with a redesign, if anyone is up to it.

By the way, it really does work if the 2nd arg is an expression object (as 
opposed to an expression evaluating to an expression object):

do.call(curve,list(expression(x)))

or

cl<- quote(curve(x))
cl[[2]]<- expression(x)
eval(cl)

(The trouble with nonstandard evaluation is that it doesn't follow standard 
evaluation rules...)


If this is not already a fortune, I will add it.

Which is why I usually circumvent curve(). It is typically faster to 
just evaluate a function at positions x and plot it than to spend 
minutes thinking about how curve() expects its arguments.
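
For instance, a plain-vanilla version of that approach (function and grid chosen arbitrarily here):

f <- function(x) sin(x) / x
x <- seq(0.01, 10, length.out = 400)   # evaluate on a grid ...
plot(x, f(x), type = "l")              # ... and plot, no curve() involved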


Uwe

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting a data frame with values of different lengths

2011-06-07 Thread Sarah Goslee
Hi,

On Tue, Jun 7, 2011 at 10:01 AM, William Armstrong
 wrote:
> Hi Sarah,
>
> str(W_table) gives me:
>
>> str(W_table)
> 'data.frame':   11 obs. of  3 variables:
>  $ pds_gagehandles.i.: Factor w/ 1 level "mibe": 1 1 1 1 1 1 1 1 1 1 ...
>  $ p                 : chr  "1" "2" "3" "4" ...
>  $ W                 : chr  "746" "870.5" "767" "1066" ...

See? W is character, so that's how it's being used to sort.

Somewhere in your code it's being converted to character. Since I
can't run your code myself, I can't easily see where that's happening.
Nothing jumps out as an obvious error.

Sarah

> here is the script I am using, with the lines that create W and the W_table
> bolded:
>
> for(i in 1:length(pds_gagehandles)){
>        POTWY<-get(paste(pds_gagehandles[i],"_POTWY",sep=""))
>        num_wyrs<-length(POTWY)
>        pre70wyrs_ind<-1:(length(POTWY)-39)
>        post70wyrs_ind<-(length(POTWY)-38):length(POTWY)
>
>        for(p in 1:11){
>                if(p==1){
>                        pre70_POTWY<-POTWY[pre70wyrs_ind]
>                        post70_POTWY<-POTWY[post70wyrs_ind]
>                        original_stats<-wilcox.test(pre70_POTWY,post70_POTWY)
>                        W<-original_stats$statistic
>                        W_table<-data.frame(pds_gagehandles[i],p,W)
> W_table_name<-paste(pds_gagehandles[i],"_W_table",sep="")
>                }else{
>                        sample_POTWY<-sample(POTWY)
>                        sample_pre70_POTWY<-sample_POTWY[pre70wyrs_ind]
>                        sample_post70_POTWY<-sample_POTWY[post70wyrs_ind]
>                        
> sample_stats<-wilcox.test(sample_pre70_POTWY,sample_post70_POTWY)
>                        sample_W<-sample_stats$statistic
>                        sample_info<-c(pds_gagehandles[i],p,sample_W)
>                        W_table<-rbind(W_table,sample_info)
> if(p==1001){
>                                assign(W_table_name,W_table)
>
> write.table(W_table,col.names=c('gage_handle','iteration','W'),file=paste(W_table_name,".txt",sep=""),sep="\t",row.names=FALSE)
>                        }
>                }
>        }
> }
>
> Thank you very much for your help.
>
> Billy
>
> --

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Line Graphs

2011-06-07 Thread Robert Baer
I want to plot 6 line graphs. I have 10 points: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
0.7, 0.8, 0.9 and 1.0.
At each point, say 0.1, I have 6 variables A, B, C, D, E and F. The variables
all have values between 0 and 1 (inclusive). I also want to
label the x axis from 0.1 to 1.0 and the y axis from 0.1 to 1.0.
My goal is to plot a line graph representing the mean of the variables at
each level, so for 0.1 on the x axis we should expect 6 values for y.
The plot omits 1.0, and the abline function does not draw the line y = x
on the plot.
This is what I have so far:

# Calculate range from 0 to max value of cars and trucks
g_range <- range(0, 1)

# Graph autos using y axis that ranges from 0 to max
# value in cars or trucks vector.  Turn off axes and
# annotations (axis labels) so we can specify them ourself
plot(B, type="o", pch = 0, lty=1,col="blue", ylim=g_range, axes=FALSE,
ann=FALSE)
As you are currently doing it, you are not specifying an x variable.  This 
means you are getting an 'index plot'.  If I understand what you want, it 
would be simpler to just specify x in your plot statement.  From there, if 
you absolutely need EVERY 0.1 labeled, you can suppress the axis and label 
it by hand.  Something along the lines of:

A =runif(10)
B =runif(10)
C =runif(10)
D =runif(10)
E =runif(10)
F =runif(10)
x = seq(0.1, 1.0, 0.1)
plot(x, B, type="o", pch = 0, lty=1,col="blue", xlim = c(0,1), ylim = 
c(0,1),

 xlab=expression(paste(lambda[0])),
 ylab= expression(paste("Estimate of ", lambda[0])))

# Graph trucks with red dashed line and square points
lines(x, A, type="o", pch=2, lty=1, col="red")
lines(x, C, type="o", pch=3, lty=1, col="green")
lines(x, D, type="o", pch=4, lty=1, col="orange")
lines(x, E, type="o", pch=6, lty=1, col="brown")
lines(x, F, type="o", pch=8, lty=1, col="yellow")
abline(0, 1, col = "black")
# Create a title with a red, bold/italic font
title(main="Methods", col.main="red", font.main=4)

# Create a legend at (1, g_range[2]) that is slightly smaller
# (cex) and uses the same line colors and points used by
# the actual plots
legend(1, g_range[2], c("B","A", "C", "D", "E", "F", "Actual"), cex=0.6,
col=c("blue","red", "green", "orange", "brown","yellow", "black"),
pch=c(0,2,3,4,6,8,9), lty=1)


Rob


# Make x axis using the values of pi_0 labels
#

axis(1, at=1:10,
lab=c("0.1","0.2","0.3","0.4","0.5","0.6","0.7","0.8","0.9","1.0"))

# Make y axis with horizontal labels that display ticks at
del = seq(0.1,1, 0.1)
axis(2, at=del,
lab=c("0.1","0.2","0.3","0.4","0.5","0.6","0.7","0.8","0.9","1.0"))

# Create box around plot
box()

# Graph trucks with red dashed line and square points
lines(A, type="o", pch=2, lty=1, col="red")
lines(C, type="o", pch=3, lty=1, col="green")
lines(D, type="o", pch=4, lty=1, col="orange")
lines(E, type="o", pch=6, lty=1, col="brown")
lines(F, type="o", pch=8, lty=1, col="yellow")
abline(0, 1, col = "black")
# Create a title with a red, bold/italic font
#title(main="Methods", col.main="red", font.main=4)

# Label the x and y axes
title(xlab=expression(paste(lambda[0])))
title(ylab= expression(paste("Estimate of ", lambda[0])))

# Create a legend at (1, g_range[2]) that is slightly smaller
# (cex) and uses the same line colors and points used by
# the actual plots
legend(1, g_range[2], c("B","A", "C", "D", "E", "F", "Actual"), cex=0.6,
col=c("blue","red", "green", "orange", "brown","yellow", "black"),
pch=c(0,2,3,4,6,8,9), lty=1)




--
Thanks,
Jim.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




--
Robert W. Baer, Ph.D.
Professor of Physiology
Kirksville College of Osteopathic Medicine
A. T. Still University of Health Sciences
800 W. Jefferson St.
Kirksville, MO 63501
660-626-2322
FAX 660-626-2965

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2011-06-07 Thread Bert Gunter
IMHO, you evidence considerable confusion and misunderstanding of
statistical methods. I would say that most of what you describe is
nonsense. Of course, maybe I'm just the one who's confused, but I
would strongly suggest you consult with a local statistician. This
list is unlikely to be able to provide you the level of support and
understanding that you need.

Cheers,
Bert

On Tue, Jun 7, 2011 at 1:38 AM, farahnazlakhani
 wrote:
> I am working on my thesis, in which I have a couple of independent variables
> that are categorical in nature and the dependent variable is dichotomous.
> Initially I ran univariate analyses and added the variables with significant
> p-values (p < 0.25) to my full model.
> I have three confusions. Firstly, I am looking for confounding variables using
> the formula "(crude beta-coefficient - adjusted beta-coefficient) / crude
> beta-coefficient x 100"; as per the rule, if the percentage for any variable
> exceeds 10% then I have considered it a confounder. From the initial model I
> removed one variable with an insignificant p-value to form the adjusted model.
> Now how will I know whether the variable that I removed from the initial model
> was a confounder or not?
> Secondly, if the percentage comes out negative, like -17.84%, should it be
> considered a confounder or not? I also wanted to know whether confounders
> should be removed from the model or kept in it.
> Lastly, I am running a likelihood ratio test to check whether the value falls
> in the critical region or not. If the value does not fall in the critical
> region, what does that show, and what should I do in this case? In my final
> reduced model all p-values are significant, but the value from the likelihood
> ratio test still does not fall in the critical region. What does that show?
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Logistic-Regression-tp3578962p3578962.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about curve function

2011-06-07 Thread Matt Shotwell
On Tue, 2011-06-07 at 16:17 +0200, Uwe Ligges wrote:
> 
> On 07.06.2011 11:57, peter dalgaard wrote:
> >
> > On Jun 6, 2011, at 11:22 , Prof Brian Ripley wrote:
> >
> >> As a further example of the trickiness, the "function" method of plot() 
> >> relies on curve(x, ...) being a request to plot the function x(x) against 
> >> x.  I've added a comment to that effect to the help page.
> >
> > Ouch. This springs to mind:
> >
> >> fortune(106)
> >
> > If the answer is parse() you should usually rethink the question.
> > -- Thomas Lumley
> >R-help (February 2005)
> >
> >
> > but curve() predates that insight by half a decade or more. It could 
> > probably do with a redesign, if anyone is up to it.
> >
> > By the way, it really does work if the 2nd arg is an expression object (as 
> > opposed to an expression evaluating to an expression object):
> >
> > do.call(curve,list(expression(x)))
> >
> > or
> >
> > cl<- quote(curve(x))
> > cl[[2]]<- expression(x)
> > eval(cl)
> >
> > (The trouble with nonstandard evaluation is that it doesn't follow standard 
> > evaluation rules...)
> 
> If this is not already a fortune, I will add it.

And one more for Uwe's principle: "when discontent, circumvent!"  :)

> Which is why I usually circumvent curve(). It is typically faster to 
> just evaluate a function at positions x and plot it than to spend 
> minutes thinking about how curve() expects its arguments.
> 
> Uwe
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rgl: insert pauses in animation sequence / movie formats other than gif?

2011-06-07 Thread Duncan Murdoch

On 07/06/2011 9:24 AM, Michael Friendly wrote:

Two questions related to creating animated movies with rgl:

1. I've created an rgl scene with 5 different views I want to display in
a movie, but I'd like to insert pauses (say, 5 seconds)
at each view.  How can I do this?

I first created 5 userMatrix's, then

play3d( par3dinterp( userMatrix=list(M1, M2, M3, M4, M5)),
        duration=2*60/5 )

then tried simply repeating each twice,

play3d( par3dinterp( userMatrix=list(M1, M1, M2, M2, M3, M3, M4, M4, M5,
        M5)), duration=2*60/5 )

but that didn't give the desired effect.  I see that play3d() has a
times= argument, but the documentation
doesn't indicate how to use it in this context.


Something like this works:

play3d(par3dinterp(times=c(0,5,6,11,12,17),
 userMatrix=list(m1,m1,m2,m2,m3,m3),
 method="linear"))

The "linear" says to use linear interpolation between time points, so it 
will stay exactly constant when the userMatrix doesn't change.  If you 
leave it out, it uses spline interpolation, and that gives some 
irritating overshooting.



2. With movie3d(), I can get an animated .gif, but I wonder if there are
other movie formats (.mov, .mpg) I can get,
either with convert, or an external tool, e.g., so I can embed a movie
in a LaTeX  ->  .pdf document using the
movie15 package.

e.g., I tried using convert at in a command prompt window,
  >  convert -delay 1x8 coffee-av3D-1*.png coffee-av3D-1.mov

but it seems that only one frame was used.

I don't know what formats convert handles these days.  I used ffmpeg to 
put together an mp4 movie a while ago.  The command line was


ffmpeg  -b 240 -r 24 -i movie%03d.png -s xga movie.mp4

That is:  a bitrate of 2.4 million bits per second, showing at 24 frames per second, 
frames named movie001.png etc, output size 1024x768 ("xga" size) in file 
movie.mp4.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Taking Integral and Optimization using Integrate, Optim and maxNR

2011-06-07 Thread Uwe Ligges



On 06.06.2011 20:14, MARYAM ZOLGHADR wrote:

Dear All, Hello!

I have some questions in R programming, as follows:

Question 1- How to take the integral of this function with respect to y, such 
that x would appear in the output after taking integral.

f(x,y)=(0.1766*exp(-exp(y+lnx))*-exp(y+lnx))/(1-exp(-exp(y+lnx))) y in 
(-6.907,-1.246)

It is doable in maple but not in R. At least I could not find the way.

p.s: result from maple is:

g(x)=dilog*exp(0.001000755564*x)+0.5*ln(exp(0.001000755564*x))^2-dilog*(exp(0.287653*x))-0.5*ln(exp(0.287653*x))^2

Where dilog=integral(log(t)/(1-t)) for t in (1,x)



Question 2- Then I want to optimize (maximize) the answer of the above integral 
for x. Assume x in (0, 100).

The answer should be something between 26 and 27.



What I did is: got answer of integral(f(x,y)) from maple which is g(x), and 
then applied it in R. The code is as follows:

##In the case n=1

library(stats)
require(graphics)
integrand=function(t){(log(t))/(1-t)}
#dilog=integrate(integrand, lower=1, upper=x)
fr<- function(x){(integrate(integrand, lower=1, 
upper=x)$value)*(exp(0.001000755564*x))+0.5*log(exp(0.001000755564*x))^2-(integrate(integrand,
 lower=1, upper=x)$value)*(exp(0.287653*x))-0.5*log(exp(0.287653*x))^2}

optim(20, fr, NULL, method = "BFGS")



Question 2-1-Default by optim is to minimize a function, how I can use it to 
maximization?



If you want to maximize f, either minimize -f or read ?optim and find 
that it tells us:
"By default this function performs minimization, but it will maximize if 
control$fnscale is negative. "
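
A tiny illustration of that option, on a toy objective rather than the poster's:

f <- function(x) -(x - 3)^2                                   # maximum at x = 3
optim(0, f, method = "BFGS", control = list(fnscale = -1))$par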




Question 2-2- The above code produced errors and did not work. My guess is that 
something is wrong with how I take the integral. The output of the integrate 
function in R is a compound object; I had to somehow tell it to take just the 
value and ignore the rest. But I guess there is still something wrong with it. 
I also tried maxNR, but it didn't work either.


If you integrate  log(t)/(1-t) from t=1 (as you specified in your 
code!), then at t=1 the value is undefined ...
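
R will not return a symbolic antiderivative, but question 1 can also be handled numerically: integrate over y for each fixed x, then optimize the result over x. A sketch only, assuming "lnx" in the post means log(x) and transcribing f as written there (whether this reproduces the 26-27 figure from the post depends on the exact form intended):

f <- function(y, x) {
  u <- exp(y + log(x))
  0.1766 * exp(-u) * (-u) / (1 - exp(-u))
}
g <- function(x) integrate(function(y) f(y, x), lower = -6.907, upper = -1.246)$value
optimize(g, interval = c(0, 100), maximum = TRUE)   # maximize g(x) over x in (0, 100)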



Uwe Ligges




Question 2-3- The above is the easiest case of my problem. Assume the case 
where I have the summation f(x1,y)+f(x2,y)+...+f(x12,y).



The article has done it with the E04JAF NAG Fortran Library routine, but I 
want to do it in R.

Thanx all.



Cheers,

Maryam

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting a data frame with values of different lengths

2011-06-07 Thread William Armstrong
Thanks for catching that, Sarah.

It seems like the problem was that I was using the c() function to combine
terms (including W) that I was adding to a data frame.

This caused R to convert the numeric W to a character string.

I fixed this by using data.frame() and then rbind() instead of c() and
rbind(), which is now giving me the results I was expecting.

Thank you very much for your help.

Billy
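
For readers hitting the same thing, a small illustration of the coercion (values made up):

c("mibe", 2, 903.5)        # c() coerces everything to a common type, here character
row1 <- data.frame(gage_handle = "mibe", iteration = 1, W = 746)
row2 <- data.frame(gage_handle = "mibe", iteration = 2, W = 903.5)
W_table <- rbind(row1, row2)
str(W_table)               # W stays numeric, so order(W_table$W) sorts numerically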

--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-a-data-frame-with-values-of-different-lengths-tp3579653p358.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] gam() (in mgcv) with multiple interactions

2011-06-07 Thread Ben Haller
  Hi!  I'm learning mgcv, and reading Simon Wood's book on GAMs, as recommended 
to me earlier by some folks on this list.  I've run into a question to which I 
can't find the answer in his book, so I'm hoping somebody here knows.

   My outcome variable is binary, so I'm doing a binomial fit with gam().  I 
have five independent variables, all continuous, all uniformly distributed in 
[0, 1].  (This dataset is the result of a simulation model.)  Let's call them 
a,b,c,d,e for simplicity.  I'm interested in interactions such as a*b, so I'm 
using tensor product smooths such as te(a,b).  So far so good.  But I'm also 
interested in, let's say, a*d.  So ok, I put te(a,d) in as well.  Both of these 
have a as a marginal basis (if I'm using the right terminology; all I mean is, 
both interactions involve a), and I would have expected them to share that 
basis; I would have expected them to be constrained such that the effect of a 
when b=0, for one, would be the same as the effect of a when d=0, for the 
other.  This would be just as, in a GLM with formula a*b + a*d, that formula 
would expand to a + b + d + a:b + a:d, and there is only one "a"; a doesn't get 
to be different for the a*b interaction than it is for the
a*d interaction.  But with tensor product smooths in gam(), that does not 
seem to be the case.  I'm still just getting to know mgcv and experimenting 
with things, so I may be doing something wrong; but the plots I have done of 
fits of this type appear to show different marginal effects.

  I tried explicitly including terms for the marginal basis; in my example, I 
tried a formula like te(a) + te(b) + te(d) + te(a, b) + te(a, d).  No dice; in 
this case, the main effect of a is different between all three places where it 
occurs in the model.  I.e. te(a) shows a different effect of a than te(a, b) 
shows at b=0, which is again different from the effect shown by te(a, d) at 
d=0.  I don't even know what that could possibly mean; it seems wrong to me 
that this could even be the case, but what do I know.  :->

   I could move up to a higher-order tensor like te(a,b,d), but there are three 
problems with that.  One, the b:d interaction (in my simplified example) is 
then also part of the model, and I'm not interested in it.  Two, given the set 
of interactions that I *am* interested in, I would actually be forced to do the 
full five-way te(a,b,c,d,e), and with a 300,000 row dataset, I shudder to think 
how long that will take to run, since it would have something like 5^5 free 
parameters to fit; that doesn't seem worth pursuing.  And three, interpretation 
of a five-way interaction would be unpleasant, to say the least; I'd much 
rather be able to stay with just the two-way (and one three-way) interactions 
that I know are of interest (I know this from previous logistic regression 
modelling of the dataset).

  For those who like to see the actual R code, here are two fits I've tried:

gam(outcome ~ te(acl, dispersal) + te(amplitude, dispersal) + te(slope, 
curvature, amplitude), family=binomial, data=rla, method="REML")

gam(outcome ~ te(slope) + te(curvature) + te(amplitude) + te(acl) + 
te(dispersal) + te(slope, curvature) + te(slope, amplitude) + te(curvature, 
amplitude) + te(acl, dispersal) + te(amplitude, dispersal) + te(slope, 
curvature, amplitude), family=binomial, data=rla, method="REML")

  So.  Any advice?  How can I correctly do a gam() fit involving multiple 
interactions that involve the same independent variable?

  Thanks!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rgl: insert pauses in animation sequence / movie formats other than gif?

2011-06-07 Thread Michael Friendly

On 6/7/2011 10:52 AM, Duncan Murdoch wrote:


Something like this works:

play3d(par3dinterp(times=c(0,5,6,11,12,17),
 userMatrix=list(m1,m1,m2,m2,m3,m3),
 method="linear"))

The "linear" says to use linear interpolation between time points, so 
it will stay exactly constant when the userMatrix doesn't change.  If 
you leave it out, it uses spline interpolation, and that gives some 
irritating overshooting.

Thanks very much for this, Duncan.

Something like this would make an excellent documentation example for 
play3d/movie3d, or perhaps better par3dinterp, even under \dontrun{}.

I take it that the times are relative rather than absolute, and time is 
scaled according to duration and
fps when the scene is play3d()'d.  Yes?  If so, the description for 
time= could also be clarified.


best,
-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rgl: insert pauses in animation sequence / movie formats other than gif?

2011-06-07 Thread Duncan Murdoch

On 07/06/2011 12:22 PM, Michael Friendly wrote:

On 6/7/2011 10:52 AM, Duncan Murdoch wrote:
>
>  Something like this works:
>
>  play3d(par3dinterp(times=c(0,5,6,11,12,17),
>   userMatrix=list(m1,m1,m2,m2,m3,m3),
>   method="linear"))
>
>  The "linear" says to use linear interpolation between time points, so
>  it will stay exactly constant when the userMatrix doesn't change.  If
>  you leave it out, it uses spline interpolation, and that gives some
>  irritating overshooting.
Thanks very much for this, Duncan.

Something like this would make an excellent documentation example for
play3d/movie3d, or perhaps better
par3dinterp, even under \dontrun{}.

I take it that the times are relative rather than absolute, and time is
scaled according to duration and
fps when the scene is play3d()'d.  Yes?  If so, the description for
time= could also be clarified.


Definitely it should be clarified in the docs; they are absolute times, 
in units of seconds.  Things get a little complicated when you go past 
the last time (17 seconds):  there are a number of different 
extrapolation methods described in ?par3dinterp.


What the par3dinterp function returns is a function that takes a time 
(expressed in seconds) and computes the parameters (i.e. the userMatrix 
in the example) that should be in effect at that time.  It doesn't know 
anything about fps, that's used by play3d to choose what times to pass 
to that constructed function.  The function that play3d is given can do 
lots of things other than returning new viewpoint parameters; for 
example, the flag demo redraws the scene in every frame (using 
skipRedraw=TRUE to avoid showing partial versions).


Duncan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Creating a file with reusable functions accessible throughout a computational biology cancer project

2011-06-07 Thread Ben Ganzfried
Hi,

My project is set up the following way:
root directory contains the following folders:
  folders: "Breast_Cancer" AND "Colorectal_Cancer" AND "Lung_Cancer" AND
"Prostate_Cancer"

I want to create a file, call it: "repeating_functions.R" and place it in
the root directory such that I can call these functions from within the
sub-folders in each type of cancer.  My confusion is that I'm not sure of
the syntax to make this happen.  For example:

Within the "Prostate_Cancer" folder, I have the following folders:
"curated" AND "src" AND "uncurated"

Within "uncurated" I have a ton of files, one of which could be:
PMID5377_fullpdata.csv

within "src" I have my R scripts, the one corresponding to the above
"uncurated" file would be:
PMID5377_curation.R

Here's the problem I'm trying to address:
Many of the uncurated files will require the same R code to curate them and
I find myself spending a lot of time copying and pasting the same code over
and over. I've spent at least 40 hours copying code I've already written and
pasting it into a new dataset.  There has simply got to be a better way to
do this.

A common example of the code I'll write in an "uncurated" file is the
following (let's call the following snippet of code UNCURATED_EXAMPLE1):
##characteristics_ch1.2 -> G
tmp <- uncurated$characteristics_ch1.2
tmp <- sub("grade: ","",tmp,fixed=TRUE)
tmp[tmp=="I"] <- "low"
tmp[tmp=="II"] <- "low"
tmp[tmp=="III"] <- "high"
curated$G <- tmp

The thing that changes depending on the dataset is *typically* the column
header (ie "uncurated$characteristics_ch1.2" might be
"uncurated$description" or "uncurated_characteristics_ch1.7" depending on
the dataset), although sometimes I want to substitute different words (ie
"grade" can be referred to in many different ways).

What's the easiest way to automate this?  I'd like, at a minimum, to make
UNCURATED_EXAMPLE1 look like the following:
tmp <- uncurated$characteristics_ch1.2
insert_call_to_repeating_functions.R_and_access_("grade")_function
curated$G <- tmp

It would be even better if I could say, for Prostate_Cancer, write one R
script that standardizes all the "uncurated" datasets; rather than writing
100 different R scripts.  Although I don't know how feasible this is.

I'm sorry if this sounds confusing.  Basically, I have thousands of
"uncurated" datasets with clinical information and I'm trying to standardize
all the datasets via R scripts so that all the information is standardized
for statistical analysis.  Not all of the datasets contain the same
information, but many of them do contain similar data (ie age, stage, grade,
days_to_recurrence, and many others).  Furthermore, in many cases the
standardization code is very similar across datasets (ie I'll want to delete
the words "Age: " before the actual number).  But this is not always the
case (ie sometimes a dataset will not put the different patient data (ie
age, stage, grade) in separate columns, instead putting it all in one
column, so I have to write a different function to split it by the ";" and
make a new table that is separated by column).  Anyway, I would be forever
grateful for any advice to make this quicker and am happy to provide any
clarifications.

Thank you very much.

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a file with reusable functions accessible throughout a computational biology cancer project

2011-06-07 Thread Duncan Murdoch

On 07/06/2011 12:41 PM, Ben Ganzfried wrote:

Hi,

My project is set up the following way:
root directory contains the following folders:
   folders: "Breast_Cancer" AND "Colorectal_Cancer" AND "Lung_Cancer" AND
"Prostate_Cancer"

I want to create a file, call it: "repeating_functions.R" and place it in
the root directory such that I can call these functions from within the
sub-folders in each type of cancer.  My confusion is that I'm not sure of
the syntax to make this happen.  For example:

Within the "Prostate_Cancer" folder, I have the following folders:
"curated" AND "src" AND "uncurated"

Within "uncurated" I have a ton of files, one of which could be:
PMID5377_fullpdata.csv

within "src" I have my R scripts, the one corresponding to the above
"uncurated" file would be:
PMID5377_curation.R

Here's the problem I'm trying to address:
Many of the uncurated files will require the same R code to curate them and
I find myself spending a lot of time copying and pasting the same code over
and over. I've spent at least 40 hours copying code I've already written and
pasting it into a new dataset.  There has simply got to be a better way to
do this.


There is:  you should put your common functions in a package.  Packages 
are a good way to organize your own code, you don't need to publish 
them.  (You will get a warning if you put "Not for distribution" into 
the License field in the DESCRIPTION file, but it's just a warning.)  
You can also put datasets in a package; this makes sense if they are 
relatively static.  If you get new data every day you probably wouldn't.

A common example of the code I'll write in an "uncurated" file is the
following (let's call the following snippet of code UNCURATED_EXAMPLE1):
##characteristics_ch1.2 ->  G
tmp<- uncurated$characteristics_ch1.2
tmp<- sub("grade: ","",tmp,fixed=TRUE)
tmp[tmp=="I"]<- "low"
tmp[tmp=="II"]<- "low"
tmp[tmp=="III"]<- "high"
curated$G<- tmp

The thing that changes depending on the dataset is *typically* the column
header (ie "uncurated$characteristics_ch1.2" might be
"uncurated$description" or "uncurated_characteristics_ch1.7" depending on
the dataset), although sometimes I want to substitute different words (ie
"grade" can be referred to in many different ways).

What's the easiest way to automate this?  I'd like, at a minimum, to make
UNCURATED_EXAMPLE1 look like the following:
tmp<- uncurated$characteristics_ch1.2
insert_call_to_repeating_functions.R_and_access_("grade")_function
curated$G<- tmp

It would be even better if I could say, for Prostate_Cancer, write one R
script that standardizes all the "uncurated" datasets; rather than writing
100 different R scripts.  Although I don't know how feasible this is.


Both of those sound very easy.   For example,

curate <- function(characteristic, word="grade: ") {
  tmp <- sub(word, "", characteristic, fixed=TRUE)
  tmp[tmp=="I"] <- "low"
  tmp[tmp=="II"] <- "low"
  tmp[tmp=="III"] <- "high"
  tmp
}

Then your script would just need one line

curated$G <- curate(uncurated$characteristics_ch1.2)

I don't know where you'll find the names of all the datasets, but if you 
can get them into a vector, it's pretty easy to write a loop that calls 
curate() for each one.
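
For example, something along these lines; the folder layout and the column name are only a guess at the setup described above, and it assumes every file carries the same grade column:

files <- list.files("Prostate_Cancer/uncurated", pattern = "\\.csv$", full.names = TRUE)
for (f in files) {
  uncurated <- read.csv(f, stringsAsFactors = FALSE)
  curated   <- data.frame(G = curate(uncurated$characteristics_ch1.2))
  write.csv(curated, file.path("Prostate_Cancer/curated", basename(f)), row.names = FALSE)
}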


Deciding how much goes in the package and how much is one-off code that 
stays with a particular dataset is a judgment call.  I'd guess based on 
your description that curate() belongs in the package but the rest 
doesn't, but you know a lot more about the details than I do.


Duncan Murdoch

I'm sorry if this sounds confusing.  Basically, I have thousands of
"uncurated" datasets with clinical information and I'm trying to standardize
all the datasets via R scripts so that all the information is standardized
for statistical analysis.  Not all of the datasets contain the same
information, but many of them do contain similar data (ie age, stage, grade,
days_to_recurrence, and many others).  Furthermore, in many cases the
standardization code is very similar across datasets (ie I'll want to delete
the words "Age: " before the actual number).  But this is not always the
case (ie sometimes a dataset will not put the different patient data (ie
age, stage, grade) in separate columns, instead putting it all in one
column, so I have to write a different function to split it by the ";" and
make a new table that is separated by column).  Anyway, I would be forever
grateful for any advice to make this quicker and am happy to provide any
clarifications.

Thank you very much.

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http:

Re: [R] rgl: insert pauses in animation sequence / movie formats other than gif?

2011-06-07 Thread Michael Friendly

On 6/7/2011 12:36 PM, Duncan Murdoch wrote:

On 07/06/2011 12:22 PM, Michael Friendly wrote:

On 6/7/2011 10:52 AM, Duncan Murdoch wrote:
>
>  Something like this works:
>
>  play3d(par3dinterp(times=c(0,5,6,11,12,17),
>   userMatrix=list(m1,m1,m2,m2,m3,m3),
>   method="linear"))
>
>  The "linear" says to use linear interpolation between time points, so
>  it will stay exactly constant when the userMatrix doesn't change.  If
>  you leave it out, it uses spline interpolation, and that gives some
>  irritating overshooting.
Thanks very much for this, Duncan.

Something like this would make an excellent documentation example for
play3d/movie3d, or perhaps better
par3dinterp, even under \dontrun{}.

I take it that the times are relative rather than absolute, and time is
scaled according to duration and
fps when the scene is play3d()'d.  Yes?  If so, the description for
time= could also be clarified.


Definitely it should be clarified in the docs; they are absolute 
times, in units of seconds.  Things get a little complicated when you 
go past the last time (17 seconds):  there are a number of different 
extrapolation methods described in ?par3dinterp.


What the par3dinterp function returns is a function that takes a time 
(expressed in seconds) and computes the parameters (i.e. the 
userMatrix in the example) that should be in effect at that time.  It 
doesn't know anything about fps, that's used by play3d to choose what 
times to pass to that constructed function.  The function that play3d 
is given can do lots of things other than returning new viewpoint 
parameters; for example, the flag demo redraws the scene in every 
frame (using skipRedraw=TRUE to avoid showing partial versions).


Duncan


OK, I don't exactly understand all the relationships among time (in 
seconds) for par3dinterp() and duration
and fps for play3d() and movie3d(), but the following gave me *exactly* 
what I was looking for: transitions among 5 views, with pauses at each:

# define an order to show the views and basic times for each:
views <- list(M1, M3, M4, M5, M2)
times <- 5 * (seq_along(views)-1)

# add pauses; use linear interpolation in time
interp3d.fun <- par3dinterp( times=sort(c(times, times+2)),
userMatrix=rep(views, each=2),  method="linear")
play3d( interp3d.fun, duration=2*60/5)

best,
-Michael



--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] predictor raised to a constant power using gnm

2011-06-07 Thread natalia norden

Hi,

I'm using the package gnm to perform non-linear models and I cannot  
find how to write the formula for a power function model. I found out  
that there was a function Raise() to do this but it no longer exists  
in the package.  I guess I need to use the nonlin function Mult(), but  
I couldn't find how.


The equation is something like (1)   y ~ A * x^B   or  (2)y ~ A *  
B^x.


Thank you for your help,

Best regards,
Natalia
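
Not a gnm answer, but as a cross-check both curves can also be fit directly with nls(); the data frame name and the starting values below are placeholders only:

fit1 <- nls(y ~ A * x^B, data = dat, start = list(A = 1, B = 1))   # (1) y ~ A * x^B
fit2 <- nls(y ~ A * B^x, data = dat, start = list(A = 1, B = 1))   # (2) y ~ A * B^x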

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Taking Integral and Optimization using Integrate, Optim and maxNR

2011-06-07 Thread Berend Hasselman

Dear All, Hello!

I have some questions in R programming, as follows:

Question 1- How to take the integral of this function with respect to y,
such that x would appear in the output after taking integral.

f(x,y)=(0.1766*exp(-exp(y+lnx))*-exp(y+lnx))/(1-exp(-exp(y+lnx))) y in
(-6.907,-1.246)

It is doable in maple but not in R. At least I could not find the way.

p.s: result from maple is:

g(x)=dilog*exp(0.001000755564*x)+0.5*ln(exp(0.001000755564*x))^2-dilog*(exp(0.287653*x))-0.5*ln(exp(0.287653*x))^2

Where dilog=integral(log(t)/(1-t)) for t in (1,x)

Question 2- Then I want to optimize (maximize) the answer of the above
integral for x. Assume x in (0, 100).

The answer should be something between 26 and 27.

What I did is: got answer of integral(f(x,y)) from maple which is g(x), and
then applied it in R. The code is as follows:

##In the case n=1

library(stats)
require(graphics)
integrand=function(t){(log(t))/(1-t)}
#dilog=integrate(integrand, lower=1, upper=x)
fr <- function(x){(integrate(integrand, lower=1,
upper=x)$value)*(exp(0.001000755564*x))+0.5*log(exp(0.001000755564*x))^2-(integrate(integrand,
lower=1,
upper=x)$value)*(exp(0.287653*x))-0.5*log(exp(0.287653*x))^2}

optim(20, fr, NULL, method = "BFGS")

Question 2-1-Default by optim is to minimize a function, how I can use it to
maximization?

Question 2-2- The above code produced errors and did not work. My guess is
that something is wrong with how I take the integral. The output of the
integrate function in R is a compound object; I had to somehow tell it to
take just the value and ignore the rest. But I guess there is still
something wrong with it. I also tried maxNR, but it didn't work either.


The integral of dilog (should) exist (according to the books).

dilog <- function(t) log(t)/(1-t)

integrate(dilog,1,2)
-0.822467 with absolute error < 9.1e-15

Now if you do:

z <- integrate(dilog,1.000,2)

str(z)

you will see that z is a list:

List of 5
 $ value   : num -0.822
 $ abs.error   : num 9.13e-15
 $ subdivisions: int 1
 $ message : chr "OK"
 $ call: language integrate(f = dilog, lower = 1, upper = 2)
 - attr(*, "class")= chr "integrate"

If you need the result of integrate then it is standard  R to use z$value to
get the value of the integral.

Berend



--
View this message in context: 
http://r.789695.n4.nabble.com/Taking-Integral-and-Optimization-using-Integrate-Optim-and-maxNR-tp3577923p3580456.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2 and facet

2011-06-07 Thread James Rome
I have a data frame  (attached) that has interpolated EOT errors for
each minute before flight landing. It also has the runway and an index
for the flight:

> times[1:4,]
  time       error runway flight
1    0 -0.02206235    04R      1
2    1 -0.07961631    04R      1
3    2 -0.13795380    04R      1
4    3 -0.20726073    04R      1

> sapply(times, class)
     time     error    runway    flight
"numeric" "numeric"  "factor"  "factor"

I want to plot this using ggplot2
library(ggplot2)
pp2 = qplot(times$time, times$error, times)
pp2 = pp2 + facet_wrap(~ times$runway)
print(pp2)
But when I try it, I get the error
Error in fix.by(by.x, x) : 'by' must specify valid column(s)
I get the same error if I make times$time a factor:

> times$time = as.factor(times$time)
> sapply(times, class)
     time     error    runway    flight
 "factor" "numeric"  "factor"  "factor"

> pp2 = qplot(times$time, times$error, times)
> pp2 = pp2 + facet_wrap(~ times$runway)
> print(pp2)
Error in fix.by(by.x, x) : 'by' must specify valid column(s)

What am I doing wrong?

And I really want to make a boxplot for every minute of times$time, but
when I try that, I get one single box.

Thanks for the help,
Jim
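
A hedged guess at a fix: pass the data frame through data= and use bare column names inside qplot() and facet_wrap(), rather than times$... ; for one boxplot per minute the x variable also has to be discrete:

library(ggplot2)
pp2 <- qplot(time, error, data = times) + facet_wrap(~ runway)
print(pp2)

# one boxplot per minute of time, still split by runway
qplot(factor(time), error, data = times, geom = "boxplot") + facet_wrap(~ runway)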

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Draw a Dendrogram

2011-06-07 Thread Ayoub Maatallaoui

On 07/06/2011 at 15:43, Sarah Goslee wrote:

Hi Ayoub,

You'd be best served by learning how to search and to get help.

Within R,
??dendrogram
will give you a list of all the functions mentioning dendrogram, while
?dendrogram
will give you the help page for the function specifically named dendrogram().

While it can be a bit difficult to search for R-related information,
http://www.rseek.org
is a restricted Google search engine specifically intended for questions like
yours.

Beyond that, I'd highly recommend both one of the many excellent intro
to R guides and the posting guide for this email list.

Sarah

On Tue, Jun 7, 2011 at 8:22 AM, Ayoub Maatallaoui
  wrote:

Hello,
I'm a research student working on everyday sound classification.
I need to draw a dendrogram to show how the classification is done, but
since I have never used R before, I guess that some help would be great
:)
does any one of you did something like that before?
Thank you

--
Ayoub





Hi Sarah,
first, thanks for the fast reply, and thanks again :) for the information.
I guess that by digging through the first 2 commands in R I can find a 
solution to my task.

but I'll keep in touch if things don't go the way I need ^^

Regards

Ayoub

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] assign a cluster based on a variable

2011-06-07 Thread Dominik P.H. Kalisch

Hi,

I have two matrices of the following form:

cluster (n=18):
12062  1
12063  2
12064  2
12065  3
12066  5

KreisSA (n=2304)
12062
12062
12067
12065
12063
12067

I try to assign the cluster[,2] to KreisSAa by the follwoing loop:

n <- nrow(cluster)
KreisSAa <- numeric()

for(i in 1:n){
 KreisSAa[KreisSA == cluster[i,1]] <- cluster[i,2]
}

The result is a vector of the length n=4608 where after the entry 2304 
are just NA's

Has someone an idea what I do wrong?

Thanks for your help.
Dominik
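
One loop-free alternative, untested against the real data and assuming KreisSA is a one-column matrix or data frame with the region ids in its first column (as cluster has):

ids      <- KreisSA[, 1]                          # assumed id column
KreisSAa <- cluster[match(ids, cluster[, 1]), 2]  # cluster value for each of the 2304 rows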

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot 2: Histogram with bell curve?

2011-06-07 Thread dicko ahmadou
Hi

Don't use t as var names, because t is also a function (transpose).
This code should work...
set.seed(1)
T <- rnorm(500)
qplot(T, geom = "blank") + 
geom_histogram(aes(y = ..density..), colour = "black", fill = "blue") + 
stat_density(geom = "line", colour = "red")



--
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-2-Histogram-with-bell-curve-tp3580359p3580457.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] count length of continues elements in a vector

2011-06-07 Thread davetracz
I am performing a precipitation analysis. The data are daily
precipitation amounts, e.g.

x<- c(4,5,3,0,0,0,2,4,6,4,0,0,0,2,2,0,3,4,1,0,...)

I would like to find the length of each "storm", where the length of a storm is
defined as the number of days with continuous precipitation. In this case the
returned vector would be:

 (3,4,2,3,...)

I would also like the amount of precipitation associated with each "storm"
in another variable. In this case the variable would look like:

(12,16,4,8,...)

any suggestions would be appreciated.

thanks
Dave
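
One base-R approach, sketched on the example series above with rle():

x <- c(4,5,3,0,0,0,2,4,6,4,0,0,0,2,2,0,3,4,1,0)
r <- rle(x > 0)                                          # runs of wet/dry days
storm_lengths <- r$lengths[r$values]                     # 3 4 2 3
storm_id <- rep(seq_along(r$lengths), r$lengths)         # label each day with its run
storm_totals <- tapply(x[x > 0], storm_id[x > 0], sum)   # 12 16 4 8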

--
View this message in context: 
http://r.789695.n4.nabble.com/count-length-of-continues-elements-in-a-vector-tp3579754p3579754.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help on selecting genes showing highest variance

2011-06-07 Thread GIS Visitor 33
Hi

I have a problem for which I would like to know a solution. I have a gene 
expression data set and I would like to choose only, let's say, the top 200 
genes that have the highest expression variance across patients.

How do i do this in R?

I tried x=apply(leukemiadata,1,var)
x1=x[order(-1*x)]

but the problem here is that x and x1 are numeric vectors. If I choose the first 
200 after sorting in descending order, I do not know how to get back to the 
associated rows (genes) from just the numeric values.

Kindly help! 


Regards
Ap
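
A sketch of the usual idiom: order() returns row indices, so they can subset the expression matrix directly (assuming genes are rows and patients are columns, as in the apply() call above):

v   <- apply(leukemiadata, 1, var)            # per-gene variance across patients
top <- order(v, decreasing = TRUE)[1:200]     # row indices of the 200 largest
leukemia200 <- leukemiadata[top, ]            # keeps all patient columns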
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Classifying boolean values

2011-06-07 Thread Grifone
Hi to all, I'm new to this forum and new to R. I have to build a tree
classifier that has boolean values as response.
When I build the tree with:

echoknn.tree <- tree(class ~ ., data=echoknn.train)

where "class" is a coloumn of my dataset (echoknn.train) of boolean values,
the result is a tree where leaf nodes are numbers in the range [0,1]; but
this isn't the result that I expect to have. I'd want that result of
classifier is TRUE or FALSE.
Can someone help me?

Thanks.

Fabio
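
One likely explanation, hedged because the data aren't shown: with a logical or 0/1 response, tree() fits a regression tree, so the leaves hold fitted proportions in [0, 1]. Coding the response as a factor gives a classification tree whose leaves predict a class:

library(tree)
echoknn.train$class <- factor(echoknn.train$class)      # make the response categorical
echoknn.tree <- tree(class ~ ., data = echoknn.train)   # now a classification tree
predict(echoknn.tree, type = "class")                   # TRUE/FALSE labels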


--
View this message in context: 
http://r.789695.n4.nabble.com/Classifying-boolean-values-tp3579993p3579993.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RgoogleMaps Axes

2011-06-07 Thread Erik Gregory
R Help,


I posted a question on StackOverflow yesterday regarding an issue I've been 
having with the RgoogleMaps packages' displaying of axes.  Here is the text of 
that submission:
http://stackoverflow.com/questions/6258408/rgooglemaps-axes

"I can't find any documentation of the following problem I'm having with the 
axis labels in RGoogleMaps:
library(RgoogleMaps)
datas <-structure(list(LAT =c(37.875,37.925,37.775,37.875,37.875),
                   LON =c(-122.225,-122.225,-122.075,-122.075,-122.025)),
                   .Names=c("LAT","LON"),class="data.frame",
                   row.names =c(1418L,1419L,1536L,1538L,1578L))
# Get bounding box.
boxt <-qbbox(lat =datas$LAT,lon =datas$LON)
MyMap<-GetMap.bbox(boxt$lonR,boxt$latR,destfile ="Arvin12Map.png",
maptype ="mobile")
PlotOnStaticMap(MyMap,lat =datas$LAT,lon =datas$LON,
                axes =TRUE,mar =rep(4,4))

When I run this on my computer the horizontal axis ranges from 300W to 60E, but 
the ticks in between aren't linearly spaced (300W, 200W, 100W, 0, 100E, 160W, 
60W). Also, the vertical axis moves linearly from 300S to 300N. It seems that 
no matter what data I supply for datas, the axes are always labeled this way.
My question is:
1. Does this problem occur on other machines using this code?
2. Does anyone have an explanation for it?
and
3. Can anybody suggest a way to get the correct axes labels (assuming these are 
"incorrect", but maybe i'm somehow misinterpreting the plot!)?
Thank you for your time."

There has been no answer other than that I ought to contact the package 
maintainer.  Since it would be nice to have a publicly displayed solution, I 
opted to post here first before doing so.  Does anyone have any insight to 
share?
Thank you very much for your time,

-Erik Gregory
CSU Sacramento, Mathematics
Student Assistant, California Environmental Protection Agency

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About DCC-garch model...

2011-06-07 Thread windseav
Hi Arun, thank you so much for your reply. 

I have tried using the cor() function in R to calculate the unconditional
correlation matrix of my time series, but it is not the same as the
first-period Dynamic Conditional Correlation matrix calculated by the
function dcc.estimation... I don't know why.

Besides, is this function very slow because of the optimization methods? I
tried 30 time series with 771 time periods, and the function ran for one
and a half hours under Windows without returning a result...

Thank you!!  

--
View this message in context: 
http://r.789695.n4.nabble.com/About-DCC-garch-model-tp3579140p3579909.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Arima and Sarima Models

2011-06-07 Thread Flavio2f
Dear all,
I would like to know the commands to:
1. Plot the periodogram of a time series.
2. Calculate a seasonal difference of the 7th order.
3. Include an AR or MA term of the 9th order in the seasonal (or non-seasonal) part.
4. Get the significance level (p-value) for the estimated parameters of an
ARMA(1,1); the default output gives only the parameter value and its s.e.
Thanks
Flávio 
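
Some hedged pointers, shown on a simulated series z rather than the real data (the exact model terms depend on the series):

z <- ts(arima.sim(list(ar = 0.5, ma = 0.3), n = 200), frequency = 7)
spec.pgram(z)                                        # 1. periodogram
dz <- diff(z, lag = 7)                               # 2. seasonal (lag-7) difference
fit9 <- arima(z, order = c(0, 0, 0),                 # 3. seasonal AR term at lag 9
              seasonal = list(order = c(1, 0, 0), period = 9))
fit  <- arima(z, order = c(1, 0, 1))                 # 4. ARMA(1,1) ...
2 * pnorm(-abs(fit$coef / sqrt(diag(fit$var.coef)))) # ... rough two-sided p-values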

--
View this message in context: 
http://r.789695.n4.nabble.com/Arima-and-Sarima-Models-tp3580300p3580300.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sorting DataFrames

2011-06-07 Thread Cox, Samantha Lucy
I am a new user, and I am trying to sort out a data frame.



I have, for example, bins of data.  Within each bin I have multiple counts of 
animals and the depths at which these counts were taken.  How would I summarise 
this to keep only the maximum count per bin alongside the corresponding depth 
(not the maximum depth - I want the depth at which the maximum number of 
animals occurs)?



Thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Get RStudio to show line number of error

2011-06-07 Thread idris
Using RStudio 0.93.89 on Windows 7. When I do "Run Lines" on a segment of
code in the Workspace and get an error message displayed in the Console, I
want to know what line number the error is on.

Is this possible in RStudio or can you suggest another workflow where I can
actually know what line my error is on?

Cheers.

--
View this message in context: 
http://r.789695.n4.nabble.com/Get-RStudio-to-show-line-number-of-error-tp3580243p3580243.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting Dataframes

2011-06-07 Thread David Winsemius

SamiC wrote:
> 
> I am a new user, and i am trying to sort out a data frame.
> 
> I have for example bins of data.  Within each bin i have multiple counts
> of animals and the depths at which these count were taken.  How would I
> summarise this to being only the maximum count per bin alongisde the
> corresponding height (but not the maximum depth - i want the depth at
> which the maximum number of animals occurs).
> 
> 
> Thank you
> 

Post a small example that contains sufficient complexity to represent the
problem and also post what you consider to be the correct answer.

-- 
David.

--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-Dataframes-tp3580075p3580521.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting a data frame with values of different lengths

2011-06-07 Thread Tim Smith
William,

I think to convert to numeric, you might need to do something like:

as.numeric(as.character())  ## and not just as.numeric()

As it stands, it would appear that it is still being read as a character string.
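
A small illustration of the point with a made-up factor (not the actual W_table
column). One likely reason the column ends up character in the first place: c()
returns a single atomic type, so c(pds_gagehandles[i], p, as.numeric(sample_W))
coerces everything back to character.

f <- factor(c("746", "818.5", "1004.5"))
as.numeric(f)                 # 2 3 1 -- the underlying level codes, not the values
as.numeric(as.character(f))   # 746.0 818.5 1004.5 -- the values themselves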






From: William Armstrong 
To: r-help@r-project.org
Sent: Tue, June 7, 2011 10:05:37 AM
Subject: Re: [R] Sorting a data frame with values of different lengths

Also, I tried changing a line to store W as numeric:

sample_info<-c(pds_gagehandles[i],p,as.numeric(sample_W))

But it is still sorting incorrectly:

> W_table[order(W_table$as.numeric.W.),]
   pds_gagehandles.i.  p as.numeric.W.
8                mibe  8        1004.5
1                mibe  1           746
10               mibe 10           746
6                mibe  6         811.5
2                mibe  2         818.5
11               mibe 11         822.5
4                mibe  4           879
5                mibe  5           888
9                mibe  9           901
7                mibe  7         903.5
3                mibe  3         919.5
> str(W_table)
'data.frame':   11 obs. of  3 variables:
$ pds_gagehandles.i.: Factor w/ 1 level "mibe": 1 1 1 1 1 1 1 1 1 1 ...
$ p : chr  "1" "2" "3" "4" ...
$ as.numeric.W. : chr  "746" "818.5" "919.5" "879" ...


--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-a-data-frame-with-values-of-different-lengths-tp3579653p3579691.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] tab-completion + running R from gedit

2011-06-07 Thread W Kruijer
Dear R-users,

having a bit of experience with R under windows, I recently switched to
linux (ubuntu 10.4, 64-bit).
I'm using gedit 2.30.3 and GNOME Terminal 2.30.2

Everything works fine, except that occasionally gedit and/or the terminal
seem to behave strangely.
For example:

Thr_min <- 0
Thr_max <- 10
Thr_grid<- seq(from=Thr_min,to=Thr_max,length=101)
cn_best <- 34
Thr   <- Thr_grid[cn_best]

which gives output

> Thr_min<- 0
> Thr_max<- 10
> Thr_grid<- seq(from=Thr_min,to=Thr_max,length=101)
> cn_best<- 34
> Thr_<- Thr_grid[cn_best]

Hence, I end up with a variable Thr_ instead of Thr.
Apparently this happens because of the two tabs in the last statement, and the
fact that at that point there already exist other variables whose names start
with Thr_.
(Sorry if those tabs did not show up correctly in the format of this e-mail;
in the code above I used one or two tabs to keep the assignment operators
aligned.)

Of course the problem could be avoided by leaving out the tabs, changing the
variable names, or running the code
using the source command. Nonetheless, I would be very much interested in
running the above code correctly
by copy-pasting from gedit.

Does anyone have ideas how to change the behavior of R, gedit and/or the
GNOME terminal ?

Thanks !

Willem Kruijer
Wageningen,  Netherlands

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2 Histogram with density curve

2011-06-07 Thread wwreith
I am learning ggplot2 commands and I have figured out how to create
histograms and density curves but I am not sure how to add a density curve
on top of a histogram. 

Here are the two graphs that I created. 

## Histogram
t<-rnorm(500)
w<-qplot(t, main="Normal Random Sample", fill=I("blue"), colour=I("black"),
geom="histogram")
w

##Density Curve
t<-rnorm(500)
r<-qplot(t, main="Normal Random Sample", colour=I("black"), geom="density",
adjust=4)
r
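
A sketch of one way to combine the two, assuming the histogram is drawn on the
density scale so the density layer is directly comparable:

t <- rnorm(500)
qplot(t, geom = "blank", main = "Normal Random Sample") +
  geom_histogram(aes(y = ..density..), fill = "blue", colour = "black") +
  geom_density(colour = "red", adjust = 4)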

--
View this message in context: 
http://r.789695.n4.nabble.com/ggplot2-Histogram-with-density-curve-tp3579894p3579894.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] giving factor names

2011-06-07 Thread kieran martin
Hi,

I've been driving myself insane with this problem. I have a trellis plot of
contours, and I want each level to have something like "z=value" for each
one. I can get each one to say z, or each one to say the value (by using
as.factor) but not both. Heres an artificial example to show what I mean (as
my actual data set is much larger!)

library(lattice)
x <- c(1, 2, 3)
y <- c(2, 4, 6)
z <- c(0.1, 0.5, 2)
combo <- expand.grid(x, y, z)
combo <- data.frame(combo)
names(combo) <- c("x", "y", "z")
outcome <- function(l) {
  (l[1] * l[2]) / l[3]
}
resp <- apply(combo, 1, outcome)
levelplot(resp ~ x * y | z, data = combo,
          pretty = TRUE, region = TRUE, contour = FALSE)

So in this final graph I want the panel strips to read z=0.1, z=0.5 and z=2 in turn.
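
One possible route, just a sketch: build a factor whose labels already contain
the "z =" text and condition on that instead of on z itself.

combo$zf <- factor(combo$z, labels = paste("z =", sort(unique(combo$z))))
levelplot(resp ~ x * y | zf, data = combo,
          pretty = TRUE, region = TRUE, contour = FALSE)

(lattice's strip.custom(strip.names = TRUE) is another way to get a name into the
strip, though the label then comes from the conditioning expression itself.)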

Thanks,

Kieran Martin
University of Southampton

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sorting Dataframes

2011-06-07 Thread SamiC
I am a new user, and I am trying to sort out a data frame.

I have, for example, bins of data.  Within each bin I have multiple counts of
animals and the depths at which these counts were taken.  How would I
summarise this to keep only the maximum count per bin alongside the
corresponding depth (not the maximum depth - I want the depth at which
the maximum number of animals occurs)?


Thank you


--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-Dataframes-tp3580075p3580075.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Setting up a State Space Model in dlm

2011-06-07 Thread Michael Ash
This question pertains to setting up a model in the package "dlm"
(dynamic linear models,
http://cran.r-project.org/web/packages/dlm/index.html

I have read both the vignette and "An R Package for Dynamic Linear
Models" (http://www.jstatsoft.org/v36/i12/paper), both of which are
very helpful.  There is also some discussion at
https://stat.ethz.ch/pipermail/r-help/2009-May/198463.html

I have what I think is a relatively straightforward state-space model
but am unable to translate it into the terms of dlm.   It would be
very helpful to get a basic dlm setup for the problem and I would
guess that I can then modify it with more lags, etc., etc.

The main equation is
pi[t] = a * pi[t-1] + b*(U[t] - UN[t]) + e[t]

(see http://chart.apis.google.com/chart?cht=tx&chl=%5Cpi_t=a%5Cpi_{t-1}%2bb%28U_t-U^N_{t}%29%2Be_t
for a pretty version)

with pi and U observed, a and b fixed coefficients, and e a
well-behaved error term (gaussian, say, variance unknown).
The object of interest is the unobserved and time-varying component UN
which evolves according to

UN[t] = UN[t-1] + w[t]

(see 
http://chart.apis.google.com/chart?cht=tx&chl=U%5EN_%7Bt%7D%20=%20U%5EN_%7Bt-1%7D%20%2B%20%5Cepsilon_t
for a pretty version)
that is, a random walk with well-behaved error term with var(w) known
(or assumed).

I'm interested in the estimates of a and b and also in estimating the
time series of UN.

Note that the term b*(U[t] - UN[t]) makes this a nonlinear model.

Below is code that does not work as expected.  I see the model as
having four parameters, a, b, var(e), and UN.  (Or do I have a
parameter UN[t] for every period?)

I do not fully understand the dlm syntax.  Is FF specified properly?
What should X look like?  How does m0 relate to parm()?


I would be grateful if someone would be willing to glance at the code.
 Thanks. Michael

library(quantmod)
library(dlm)
## Get and organize the data
getSymbols("UNRATE",src="FRED")  ## Unemployment rate
getSymbols("GDPDEF",src="FRED")  ## Quarterly GDP Implicit Price Deflator
u <- aggregate(UNRATE,as.yearqtr,mean)
gdpdef <- aggregate(GDPDEF,as.yearqtr,mean)
pi <- diff(log(gdpdef))*400
pilag <- lag(pi,-1)
tvnairu <- cbind(pi,pilag,u)
tvnairu.df <- subset(data.frame(tvnairu), !is.na(pi) & !is.na(u) &
!is.na(pilag))


## First attempt
buildNAIRU <- function(x) {
  modNAIRU <- dlm(FF=t(matrix(c(1,1,1,0))),
  GG=diag(4),
  W=matrix(c(0,0,0,0,  0,0,0,0,  0,0,0.04,0,  0,0,0,0),4,4),
  V=exp(x[4]), m0=rep(0,4), C0=diag(1e07,4),
  JFF = t(matrix(c(1,1,0,0))),
  X=cbind( tvnairu.df$pilag, tvnairu.df$u))
  return(modNAIRU)
}

(fitNAIRU <- dlmMLE(tvnairu.df$pi, parm=c(0,0,0,0) , build=buildNAIRU,
hessian=TRUE, control=list(maxit=500)))
(dlmNAIRU <- buildNAIRU(fitNAIRU$par))


## Second attempt
buildNAIRU <- function(x) {
  modNAIRU <- dlm(FF=t(matrix(c(1,1,0,0))),
  GG=diag(4),
  W=matrix(c(0,0,0,0, 0,0,0,0, 0,0,0.04,0, 0,0,0,0 ),4,4),
  V=exp(x[4]), m0=c(0.7,-0.3,5,1), C0=diag(100,4),
  JFF = t(matrix(c(1,1,0,0))),
  X=cbind( tvnairu.df$pilag, tvnairu.df$u - x[3] ))
  return(modNAIRU)
}

(fitNAIRU <- dlmMLE(tvnairu.df$pi, parm=c(0,0,0,0) , build=buildNAIRU,
hessian=TRUE, control=list(maxit=500)))
(dlmNAIRU <- buildNAIRU(fitNAIRU$par))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot 2: Histogram with bell curve?

2011-06-07 Thread wwreith
I am learning ggplot2 commands specifically qplot for the time being and I
have figured out how to create histograms and normal density curves but I am
not sure how to add a normal bell curve or other dist. as well on top of a
histogram. 

Here are the two graphs that I created. 

## Histogram 
t<-rnorm(500) 
w<-qplot(t, main="Normal Random Sample", fill=I("blue"), colour=I("black"),
geom="histogram") 
w 

##Density Curve 
t<-rnorm(500) 
r<-qplot(t, main="Normal Random Sample", colour=I("black"), geom="density",
adjust=4) 
r  



--
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-2-Histogram-with-bell-curve-tp3580359p3580359.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Classifying boolean values

2011-06-07 Thread Sarah Goslee
It's likely that class is numeric and you actually want factor
(regression tree vs classification tree).

str(echoknn.train) will show you.
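
If str() does show a numeric 0/1 (or logical) column, here is a minimal sketch of
the difference, with made-up data rather than the real echoknn.train:

library(tree)
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$class <- as.numeric(d$x1 + d$x2 > 0)   # a 0/1 numeric response, like the leaves suggest
tree(class ~ ., data = d)                # regression tree: leaf values are means in [0, 1]
d$class <- factor(d$class, labels = c(FALSE, TRUE))
tree(class ~ ., data = d)                # classification tree: leaves predict FALSE / TRUE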

By saying, "I have to build a tree classifier" you make me think that
this is a course assignment. If it is, you should perhaps talk to your
instructor. If not, then a more detailed and reproducible example will
usually get you a more informative answer, since it will allow people
to actually run and debug your code.

Sarah

On Tue, Jun 7, 2011 at 11:47 AM, Grifone  wrote:
> Hi to all, I'm new to this forum and new to R. I have to build a tree
> classifier that has boolean values as response.
> When I build the tree with:
>
> echoknn.tree <- tree(class ~ ., data=echoknn.train)
>
> where "class" is a coloumn of my dataset (echoknn.train) of boolean values,
> the result is a tree where leaf nodes are numbers in the range [0,1]; but
> this isn't the result that I expect to have. I'd want that result of
> classifier is TRUE or FALSE.
> Can someone help me?
>
> Thanks.
>
> Fabio
>
>


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MARS for complex samples / survey data?

2011-06-07 Thread JPRISCIANDARO
Hello forum,

I am interested in investigating non-linear relationships between variables
using something akin to Multivariate Adaptive Regression Splines. The
problem is that my data have sample weights and stratification variables,
and I don't think earth and similar packages support these. 

Any suggestions? It doesn't have to be MARS per se, just something to
estimate non-linear relationships that does a better job than polynomials
and can incorporate sample weights and stratification variables.

Thanks,
Jim

--
View this message in context: 
http://r.789695.n4.nabble.com/MARS-for-complex-samples-survey-data-tp3580600p3580600.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 and facet

2011-06-07 Thread Ista Zahn
Hi James,
Specify data = times in the qplot call and get rid of times$
everywhere. For example, do

   pp2 = qplot(time, error, data = times)
   pp2 + facet_wrap(~ runway)

Best,
Ista
On Tue, Jun 7, 2011 at 4:01 PM, James Rome  wrote:
> I have a data frame  (attached) that has interpolated EOT errors for
> each minute before flight landing. It also has the runway and an index
> for the flight:
>
>> > times[1:4,]
>  time       error runway flight
> 1    0 -0.02206235    04R      1
> 2    1 -0.07961631    04R      1
> 3    2 -0.13795380    04R      1
> 4    3 -0.20726073    04R      1
>
>> > sapply(times, class)
>     time     error    runway    flight
> "numeric" "numeric"  "factor"  "factor"
>
> I want to plot this using ggplot2
>    library(ggplot2)
>    pp2 = qplot(times$time, times$error, times)
>    pp2 = pp2 + facet_wrap(~ times$runway)
>    print(pp2)
> But when I try it, I get the error
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
> I get the same error if I make times$time a factor:
>
>> > times$time = as.factor(times$time)
>> > sapply(times, class)
>     time     error    runway    flight
>  "factor" "numeric"  "factor"  "factor"
>
>> >     pp2 = qplot(times$time, times$error, times)
>> >     pp2 = pp2 + facet_wrap(~ times$runway)
>> >     print(pp2)
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
>
> What am I doing wrong?
>
> And I really want to make a boxplot for every minute of times$time, but
> when I try that, I get one single box.
>
> Thanks for the help,
> Jim
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MARS for complex survey data

2011-06-07 Thread JPRISCIANDARO
Hello forum, 

I am interested in investigating non-linear relationships between variables
using something akin to Multivariate Adaptive Regression Splines. The
problem is that my data have sample weights and stratification variables,
and I don't think earth and similar packages support these.
 
Any suggestions? It doesn't have to be MARS per se, just something to
estimate non-linear relationships that does a better job than polynomials
and can incorporate sample weights and stratification variables.
 
Thanks, 
Jim 

--
View this message in context: 
http://r.789695.n4.nabble.com/MARS-for-complex-survey-data-tp3580659p3580659.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] WinBUGS on survival, simple but confusing question

2011-06-07 Thread tingtingzhan
Hi All,

I'm using WinBUGS on a very simple survival model (log-normal with one
covariate "Treat"), but I cannot understand the way it handles censored
data.  I'm posting the R file which generates the data from pre-specified
parameters, as well as the .bug file.

The question is, if I use NA to denote the censored data (as suggested by
the example Mice in WinBUGS Example Vol.I), 90% of the time I get the error
message:
"value of log normal surt.BUGS[ii] must be positive"
where "ii" is the index of the first NA in my simulated survival data.  
However, there are some very rare occasions on which WinBUGS performs as we
expect, and I have included one such dataset in the following code.

Any help would be greatly appreciated, if someone could try out these code
and let me know if I missed anything.

Best wishes,
Tingting Zhan



###
## R file.   Save to "~/BUGS.survival.simple.R"
## source("~/BUGS.survival.simple.R")

rm(list=ls(all=TRUE));
library(survival);
library(R2WinBUGS);
library(BRugs);


## True parameters
alpha = c(2.31, 0.14);
scale = 0.25;

## sample size
n.Treat = n.Control = 20;
N = n.Treat + n.Control;

## covariate
Treat = c(rep(0, n.Control), rep(1, n.Treat))

## true survival time (which is censored at day 14)
surt.cens = rep(14, N);

surt = # round(c(rlnorm(n.Control, alpha[1], scale), rlnorm(n.Treat,
alpha[1]+alpha[2], scale)));
 c(9, 9, 7, 9, 8, 10, 10, 10, 7, 8, 9, 8, 9, 7, 8, 9, 9, 11, 8, 8,
13, 9, 18, 11, 8, 11, 22, 11, 14, 8, 12, 16, 13, 10, 15, 15, 8, 16, 11, 10);  
## occasionally works


## survival time and censoring time to be passed into WinBUGS
surt.BUGS = surt;
surt.BUGS[surt >= surt.cens] = NA;  
if(any(is.na(surt.BUGS))) cat("surt.BUGS taking NA (censored) value at",
which(is.na(surt.BUGS)), ".\n");

surt.cens.BUGS = surt.cens;
surt.cens.BUGS[surt < surt.cens] = 0;

## non-info prior
sd.gamma1 = sd.gamma2 = 1e-2;
pos.lim = 1e-4;
norm.sd = 1e-3;

BUGS.data = list("N", "Treat", "surt.BUGS", "surt.cens.BUGS", "sd.gamma1",
"sd.gamma2", "norm.sd", "pos.lim");

BUGS.parameters = c("alpha", "surt.sigma", "surt.BUGS");

BUGS.sim = bugs(BUGS.data, inits = NULL, BUGS.parameters, model.file =
"~\\BUGS.survival.bug", n.chains = 7, n.iter = 700, debug = T);

## end of R file
###


###
## bug file.   Save to "~/BUGS.survival.bug"

model{
  for(i in 1:N)
  {
surt.BUGS[i] ~ dlnorm(surt.mean[i], surt.tau)I(surt.cens.BUGS[i], );
surt.mean[i] <- alpha[1] + alpha[2]*Treat[i];
  }

  surt.tau ~ dgamma(sd.gamma1, sd.gamma2)I(pos.lim,);
  surt.sigma <- sqrt(1/surt.tau);

  alpha[1] ~ dnorm(0, norm.sd);
  alpha[2] ~ dnorm(0, norm.sd);
}

## end of bug file
###



--
View this message in context: 
http://r.789695.n4.nabble.com/WinBUGS-on-survival-simple-but-confusing-question-tp3580685p3580685.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] count length of continues elements in a vector

2011-06-07 Thread Marc Schwartz
On Jun 7, 2011, at 9:25 AM, davetracz wrote:

> I am performing a precipitation analysis. data is in the form of daily
> precipitation amounts, e.g.
> 
> x<- c(4,5,3,0,0,0,2,4,6,4,0,0,0,2,2,0,3,4,1,0,...)
> 
> I would like to find the length of the "storm", length of storm would be
> defined as the number of days with continues precipitation. in this case the
> returned vector would be:
> 
> (3,4,2,3,...)
> 
> I would also like the amount of precipitation associated with each "storm"
> in another variable. in this case the variable would look like:
> 
> (12,16,4,8,...)
> 
> any suggestions would be appreciated.
> 
> thanks
> Dave


?rle will be useful here

x <- c(4, 5, 3, 0, 0, 0, 2, 4, 6, 4, 0, 0, 0, 2, 2, 0, 3, 4, 1, 0)

# Get the runs of values in 'x' >= 1
R1 <- rle(x >= 1)

> R1
Run Length Encoding
  lengths: int [1:8] 3 3 4 3 2 1 3 1
  values : logi [1:8] TRUE FALSE TRUE FALSE TRUE FALSE ...


# Split 'x' by the sequences of values >= 1 in 'R1'
# The values will be grouped by integers, including '0s'
# > rep(cumsum(R1$values) * R1$values, R1$lengths)
#  [1] 1 1 1 0 0 0 2 2 2 2 0 0 0 3 3 0 4 4 4 0
# so we want to remove the first '0' group
R2 <- split(x, rep(cumsum(R1$values) * R1$values, R1$lengths))[-1]
 

> R2
$`1`
[1] 4 5 3

$`2`
[1] 2 4 6 4

$`3`
[1] 2 2

$`4`
[1] 3 4 1


Now use normal list processing to get the lengths and sums


> sapply(R2, length)
1 2 3 4 
3 4 2 3 

> sapply(R2, sum)
 1  2  3  4 
12 16  4  8 


HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating a file with reusable functions accessible throughout a computational biology cancer project

2011-06-07 Thread Ben Ganzfried
Thank you very much.  This is incredibly helpful, I just added an R package
and put a bunch of code in it which works very well.  I just had a quick
follow-up question.

Suppose across the uncurated data-sets, stage of cancer progression is
entered in the following way, where the column headers are all different
names:

Example Data-set 1      Example Data-set 2      Example Data-set 3
Tumor stage: T3b        pt stage: 3             stage: I
Tumor stage: T2b        pt stage: 2             stage: II
Tumor stage: T3a        pt stage: 1             stage: IV
Tumor stage: T4c        pt stage: 4             stage: IV

What would the code in the R package I'm creating look like such that the
corresponding code in each of these R scripts could be:

curated$stage <- curate_stage (uncurated$characteristics_ch1.5[or whatever
the column header is called])

That is, what code in the function curate_stage() would make it such that
the output would be:

Stage Output Data-set 1    Stage Output Data-set 2    Stage Output Data-set 3
3                          3                          1
2                          2                          2
3                          1                          4
4                          4                          4


Your helpful advice from before was that:

curate <- function(characteristic, word="grade: ") {
 tmp <- sub(word, "", characteristic, fixed=TRUE)

 tmp[tmp=="I"] <- "low"
 tmp[tmp=="II"] <- "low"
 tmp[tmp=="III"] <- "high"
 tmp
}

I'm wondering about situations where it's not the same exact wording (i.e. I
can't just find "grade: " and take the value after it) but is very similar
(i.e. the word "stage" appears in the entry but is preceded or followed by
different text, as in "Tumor stage: ", "pt stage: ", and "stage: ").
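
Purely as a sketch (the regular expressions and the Roman-numeral mapping here are
assumptions, not the project's actual rules), a curate_stage() along these lines
would give the output above for all three header styles:

curate_stage <- function(characteristic) {
  tmp <- sub(".*stage:\\s*", "", characteristic)  # drops "Tumor stage: ", "pt stage: ", "stage: "
  tmp <- sub("^T([0-9]).*", "\\1", tmp)           # "T3b" -> "3"
  roman <- c(I = "1", II = "2", III = "3", IV = "4")
  tmp[tmp %in% names(roman)] <- roman[tmp[tmp %in% names(roman)]]
  as.numeric(tmp)                                 # anything unrecognised becomes NA
}
curate_stage(c("Tumor stage: T3b", "pt stage: 3", "stage: IV"))   # 3 3 4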

Thanks in advance-- you guys are saving me countless hours of repetitive
coding, allowing me to move onto the more important and interesting parts of
the study. I really appreciate all your suggestions and help.

Sincerely,

Ben


On Tue, Jun 7, 2011 at 12:53 PM, Duncan Murdoch wrote:

> On 07/06/2011 12:41 PM, Ben Ganzfried wrote:
>
>> Hi,
>>
>> My project is set up the following way:
>> root directory contains the following folders:
>>   folders: "Breast_Cancer" AND "Colorectal_Cancer" AND "Lung_Cancer" AND
>> "Prostate_Cancer"
>>
>> I want to create a file, call it: "repeating_functions.R" and place it in
>> the root directory such that I can call these functions from within the
>> sub-folders in each type of cancer.  My confusion is that I'm not sure of
>> the syntax to make this happen.  For example:
>>
>> Within the "Prostate_Cancer" folder, I have the following folders:
>> "curated" AND "src" AND "uncurated"
>>
>> Within "uncurated" I have a ton of files, one of which could be:
>> PMID5377_fullpdata.csv
>>
>> within "src" I have my R scripts, the one corresponding to the above
>> "uncurated" file would be:
>> PMID5377_curation.R
>>
>> Here's the problem I'm trying to address:
>> Many of the uncurated files will require the same R code to curate them
>> and
>> I find myself spending a lot of time copying and pasting the same code
>> over
>> and over. I've spent at least 40 hours copying code I've already written
>> and
>> pasting it into a new dataset.  There has simply got to be a better way to
>> do this.
>>
>
> There is:  you should put your common functions in a package.  Packages are
> a good way to organize your own code, you don't need to publish them.  (You
> will get a warning if you put "Not for distribution" into the License field
> in the DESCRIPTION file, but it's just a warning.)  You can also put
> datasets in a package; this makes sense if they are relatively static.  If
> you get new data every day you probably wouldn't.
>
>  A common example of the code I'll write in an "uncurated" file is the
>> following (let's call the following snippet of code UNCURATED_EXAMPLE1):
>> ##characteristics_ch1.2 ->  G
>> tmp<- uncurated$characteristics_ch1.2
>> tmp<- sub("grade: ","",tmp,fixed=TRUE)
>> tmp[tmp=="I"]<- "low"
>> tmp[tmp=="II"]<- "low"
>> tmp[tmp=="III"]<- "high"
>> curated$G<- tmp
>>
>> The thing that changes depending on the dataset is *typically* the column
>> header (ie "uncurated$characteristics_ch1.2" might be
>> "uncurated$description" or "uncurated_characteristics_ch1.7" depending on
>> the dataset), although sometimes I want to substitute different words (ie
>> "grade" can be referred to in many different ways).
>>
>> What's the easiest way to automate this?  I'd like, at a minimum, to make
>> UNCURATED_EXAMPLE1 look like the following:
>> tmp<- uncurated$characteristics_ch1.2
>> insert_call_to_repeating_functions.R_and_access_("grade")_function
>> curated$G<- tmp
>>
>> It would be even better if I could say, for Prostate_Cancer, write one R
>> script that standardizes all the "uncurated" datasets; rather than writing
>> 100 different R scripts.  Although I don't know how feasible this i

Re: [R] ggplot 2: Histogram with bell curve?

2011-06-07 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of dicko ahmadou
> Sent: Tuesday, June 07, 2011 11:41 AM
> To: r-help@r-project.org
> Subject: Re: [R] ggplot 2: Histogram with bell curve?
> 
> Hi
> 
> Don't use t as var names, because t is also a function (transpose).
> This code should work...
> set.seed(1)
> T <- rnorm(500)

I think that it is more dangerous to use 'T' than 't' since
'T' is a built-in data name (with value TRUE) and I've
seen it used more than 't' (probably because S and S+
regard it as a reserved word).  I don't think that reusing
built-in names as local names is as dangerous as some would make
out but reusing built-in data names is probably more dangerous
than using built-in function names for local data names
(because function name lookup is done differently than general
name lookup - the former will only look for functions).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
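
A toy illustration of the difference (made up here, not from the original thread):

T <- 0                          # masks the built-in data name T (= TRUE)
mean(c(1, NA, 3), na.rm = T)    # na.rm is now 0, i.e. FALSE, so this returns NA
rm(T)                           # the built-in meaning is back
mean(c(1, NA, 3), na.rm = T)    # 2

t <- 1:3                        # masks only the data meaning of t
t(matrix(1:4, 2))               # still works: function lookup finds base::t
rm(t)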

> qplot(T, geom = "blank") + 
> geom_histogram(aes(y = ..density..), colour = "black", fill = 
> "blue") + 
> stat_density(geom = "line", colour = "red")
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/ggplot-2-Histogram-with-bell-cur
ve-tp3580359p3580457.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] assign a cluster based on a variable

2011-06-07 Thread Uwe Ligges



On 07.06.2011 16:24, Dominik P.H. Kalisch wrote:

Hi,

I have two matrices of the following form:

cluster (n=18):
12062 1
12063 2
12064 2
12065 3
12066 5

KreisSA (n=2304)
12062
12062
12067
12065
12063
12067

I try to assign the cluster[,2] to KreisSAa by the follwoing loop:

n <- nrow(cluster)
KreisSAa <- numeric()

for(i in 1:n){
KreisSAa[KreisSA == cluster[i,1]] <- cluster[i,2]
}

The result is a vector of the length n=4608 where after the entry 2304
are just NA's
Has someone an idea what I do wrong?


Since we do not know the structure of your objects, we cannot say 
easily. You may want to provide


str(cluster)
and
str(KreisSA)

Anyway: I'd just use merge()

Uwe Ligges
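
A minimal sketch of the merge() route (the column names are assumptions, since the
real structure of the two objects was not shown):

cluster <- data.frame(id  = c(12062, 12063, 12064, 12065, 12066),
                      grp = c(1, 2, 2, 3, 5))
KreisSA <- data.frame(id  = c(12062, 12062, 12067, 12065, 12063, 12067))

merge(KreisSA, cluster, by = "id", all.x = TRUE)   # grp is NA where no cluster id matches

# match() does the same lookup but keeps the original row order of KreisSA:
KreisSA$grp <- cluster$grp[match(KreisSA$id, cluster$id)]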







Thanks for your help.
Dominik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bigining with a Program of SVR

2011-06-07 Thread ypriverol
Well:
 I programmed a script in R using the caret package and the results are very
interesting ...
 I have two datasets. The first dataset has a linear distribution
experimentally:
  values are: 4.3, 5.3, 6.3 .. 10.3 ...
 The svmRadial kernel works perfectly and I can obtain an R2 = 0.98 between
the predicted value and the
 estimated value.
 But the second dataset doesn't have a linear distribution of the data
experimentally ... then I can only say
 that the molecules are in well 1, 2, 3, ... 10.
 When I tested svmRadial the results were disastrous.
 Is the problem related to non-linear regression?
 Is the problem related to the fact that svmRadial in caret selects the RMSD
to choose the better model?
 
Thanks in advance
Yasset  


   


--
View this message in context: 
http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3580705.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error message

2011-06-07 Thread Parida, Mrutyunjaya
Hi
fn <- dir(pattern="txt",full.name=T)
>  fn
[1] "./GSM696980_US81503234_252741110209_S01_CGH_107_Sep09_1_1_32914.txt"
[2] "./GSM696981_US81503234_252741110209_S01_CGH_107_Sep09_1_2_32916.txt"
[3] "./GSM696982_US81503234_252741110209_S01_CGH_107_Sep09_1_3_33021.txt"
[4] "./GSM696983_US81503234_252741110209_S01_CGH_107_Sep09_1_4_33024.txt"
> RG <- 
> read.maimages(fn,source="agilent",quote="",other=c("gProcessedSignal","rProcessedSignal","gProcessedSigError","rProcessedSigError"),
+ columns=list(G = "gMeanSignal",Gb = "gBGMedianSignal", R ="rMeanSignal", Rb = 
"rBGMedianSignal"))
Error in readGenericHeader(fullname, columns = columns, sep = sep) :
  Specified column headings not found in file


I am trying to read the data into R using read.maimages(), but it is showing me 
this error. The data is from agilent and I checked the columns they exist on 
the dataset. I am also sending you an array file with this email.
Please comment on this.
Thanking you
Mrutyunjaya Parida



The information transmitted is intended only for the per...{{dropped:10}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regular Expressions for "Large" Data Set

2011-06-07 Thread Abraham Mathew
I'm running R 2.13 on Ubuntu 10.10

I have a data set which is comprised of character strings.

site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt')

dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499")
dat

I want to loop through the data and construct a data frame with the zip code,
state abbreviation, and city name in separate columns. Given the size of this
data set, I was wondering if there was an efficient way to get the desired
results.

Thanks
Abraham



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Generic function to split a vector by user defined values

2011-06-07 Thread Marc Schwartz
Hi all,

In follow up to my reply regarding splitting/grouping a vector:

  https://stat.ethz.ch/pipermail/r-help/2011-June/280361.html

it seems logical that a generic approach might be useful. So here is one 
possibility, which I present for use and improvement as may be appropriate.

x : a vector
split: the value to use for the splits

splitVec <- function(x, split)
{
  is.na(x) <- x == split                  # mark the split values as NA
  R1 <- rle(!is.na(x))                    # runs of kept vs. split values
  # number each run of kept values, split on those group ids, and drop the
  # '0' group, which holds the split values themselves
  split(x, rep(cumsum(R1$values) * R1$values, R1$lengths))[-1]
}


So with Dave's original integer vector:

x <- c(4, 5, 3, 0, 0, 0, 2, 4, 6, 4, 0, 0, 0, 2, 2, 0, 3, 4, 1, 0)

> splitVec(x, 0)
$`1`
[1] 4 5 3

$`2`
[1] 2 4 6 4

$`3`
[1] 2 2

$`4`
[1] 3 4 1


With a character vector:

set.seed(1)
Vec <- sample(c(letters[1:4], "Z"), 10, replace = TRUE)

> Vec
 [1] "b" "b" "c" "Z" "b" "Z" "Z" "d" "d" "a"

> splitVec(Vec, "Z")
$`1`
[1] "b" "b" "c"

$`2`
[1] "b"

$`3`
[1] "d" "d" "a"


Hope that this might be useful to folks.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 and facet

2011-06-07 Thread James Rome
Times is extracted from a larger data frame with the city in it also, so
the variables are not unique. But I tried what you suggested, and get
> pp2 = qplot(time, error, times)
> pp2 = pp2 + facet_wrap(~ runway)
> print(pp2)
Error in eval(expr, envir, enclos) : object 'error' not found

So I am confused.

Thanks,
Jim
On 6/7/2011 4:12 PM, Ista Zahn wrote:

Hi James,
Specify data = times in the qplot call and get rid of times$
everywhere. For example, do

   pp2 = qplot(time, error, data = times)
   pp2 + facet_wrap(~ runway)

Best,
Ista
On Tue, Jun 7, 2011 at 4:01 PM, James Rome  wrote:

> I have a data frame  (attached) that has interpolated EOT errors for
> each minute before flight landing. It also has the runway and an index
> for the flight:
>
>>> times[1:4,]
>  time   error runway flight
> 10 -0.0220623504R  1
> 21 -0.0796163104R  1
> 32 -0.1379538004R  1
> 43 -0.2072607304R  1
>
>>> sapply(times, class)
> time errorrunwayflight
> "numeric" "numeric"  "factor"  "factor"
>
> I want to plot this using ggplot2
>library(ggplot2)
>pp2 = qplot(times$time, times$error, times)
>pp2 = pp2 + facet_wrap(~ times$runway)
>print(pp2)
> But when I try it, I get the error
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
> I get the same error if I make times$time a factor:
>
>>> times$time = as.factor(times$time)
>>> sapply(times, class)
> time errorrunwayflight
>  "factor" "numeric"  "factor"  "factor"
>
>>> pp2 = qplot(times$time, times$error, times)
>>> pp2 = pp2 + facet_wrap(~ times$runway)
>>> print(pp2)
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
>
> What am I doing wrong?
>
> And I really want to make a boxplot for every minute of times$time, but
> when I try that, I get one single box.
>
> Thanks for the help,
> Jim
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regular Expressions for "Large" Data Set

2011-06-07 Thread Marc Schwartz
On Jun 7, 2011, at 3:55 PM, Abraham Mathew wrote:

> I'm running R 2.13 on Ubuntu 10.10
> 
> I have a data set which is comprised of character strings.
> 
> site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt')
> 
> dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499")
> dat
> 
> I want to loop through the data and construct a data frame with the zip
> code,
> state abbreviation, and city name in seperate columns. Given the size of
> this
> data set, I was wondering if there was an efficient way to get the desired
> results.
> 
> Thanks
> Abraham


Since the original text file is a CSV file (without a header), just use:

> system.time(DF <- 
> read.csv("http://www.census.gov/tiger/tms/gazetteer/zips.txt", header = 
> FALSE))
   user  system elapsed 
  0.385   0.033   1.832 


> str(DF)
'data.frame':   29470 obs. of  8 variables:
 $ V1: int  1 1 1 1 1 1 1 1 1 1 ...
 $ V2: int  35004 35005 35006 35007 35010 35014 35016 35019 35020 35023 ...
 $ V3: Factor w/ 51 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ V4: Factor w/ 16698 levels "02821","04465",..: 150 168 180 7710 10434 348 
547 812 1250 7044 ...
 $ V5: num  86.5 87 87.2 86.8 86 ...
 $ V6: num  33.6 33.6 33.4 33.2 32.9 ...
 $ V7: int  6055 10616 3205 14218 19942 3062 13650 1781 40549 39677 ...
 $ V8: num  0.001499 0.002627 0.000793 0.003519 0.004935 ...


> head(DF)
  V1V2 V3 V4   V5   V6V7   V8
1  1 35004 AL  ACMAR 86.51557 33.58413  6055 0.001499
2  1 35005 AL ADAMSVILLE 86.95973 33.58844 10616 0.002627
3  1 35006 AL  ADGER 87.16746 33.43428  3205 0.000793
4  1 35007 AL   KEYSTONE 86.81286 33.23687 14218 0.003519
5  1 35010 AL   NEW SITE 85.95109 32.94145 19942 0.004935
6  1 35014 AL ALPINE 86.20893 33.33116  3062 0.000758


HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 and facet

2011-06-07 Thread James Rome
Something is very strange:
> pp2 = qplot(time, error, times)
> plot(pp2)
Error in xy.coords(x, y, xlabel, ylabel, log) :
  'x' is a list, but does not have components 'x' and 'y'
On 6/7/2011 4:12 PM, Ista Zahn wrote:

Hi James,
Specify data = times in the qplot call and get rid of times$
everywhere. For example, do

   pp2 = qplot(time, error, data = times)
   pp2 + facet_wrap(~ runway)

Best,
Ista
On Tue, Jun 7, 2011 at 4:01 PM, James Rome  wrote:

> I have a data frame  (attached) that has interpolated EOT errors for
> each minute before flight landing. It also has the runway and an index
> for the flight:
>
>>> times[1:4,]
>  time   error runway flight
> 10 -0.0220623504R  1
> 21 -0.0796163104R  1
> 32 -0.1379538004R  1
> 43 -0.2072607304R  1
>
>>> sapply(times, class)
> time errorrunwayflight
> "numeric" "numeric"  "factor"  "factor"
>
> I want to plot this using ggplot2
>library(ggplot2)
>pp2 = qplot(times$time, times$error, times)
>pp2 = pp2 + facet_wrap(~ times$runway)
>print(pp2)
> But when I try it, I get the error
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
> I get the same error if I make times$time a factor:
>
>>> times$time = as.factor(times$time)
>>> sapply(times, class)
> time errorrunwayflight
>  "factor" "numeric"  "factor"  "factor"
>
>>> pp2 = qplot(times$time, times$error, times)
>>> pp2 = pp2 + facet_wrap(~ times$runway)
>>> print(pp2)
> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
>
> What am I doing wrong?
>
> And I really want to make a boxplot for every minute of times$time, but
> when I try that, I get one single box.
>
> Thanks for the help,
> Jim
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 and facet

2011-06-07 Thread Ista Zahn
Hi James,

On Tue, Jun 7, 2011 at 5:12 PM, James Rome  wrote:
> Times is extracted from a larger data frame with the city in it also, so
> the variables are not unique. But I tried what you suggested, and get
>>     pp2 = qplot(time, error, times)
>>     pp2 = pp2 + facet_wrap(~ runway)
>>     print(pp2)
> Error in eval(expr, envir, enclos) : object 'error' not found

That is _not_ what I suggested. I said

  pp2 = qplot(time, error, data = times)
  pp2 + facet_wrap(~ runway)

You are missing the data = part.

>
> So I am confused.
>
> Thanks,
> Jim
> On 6/7/2011 4:12 PM, Ista Zahn wrote:
>
> Hi James,
> Specify data = times in the qplot call and get rid of times$
> everywhere. For example, do
>
>   pp2 = qplot(time, error, data = times)
>   pp2 + facet_wrap(~ runway)
>
> Best,
> Ista
> On Tue, Jun 7, 2011 at 4:01 PM, James Rome  wrote:
>
>> I have a data frame  (attached) that has interpolated EOT errors for
>> each minute before flight landing. It also has the runway and an index
>> for the flight:
>>
 times[1:4,]
>>  time       error runway flight
>> 1    0 -0.02206235    04R      1
>> 2    1 -0.07961631    04R      1
>> 3    2 -0.13795380    04R      1
>> 4    3 -0.20726073    04R      1
>>
 sapply(times, class)
>>     time     error    runway    flight
>> "numeric" "numeric"  "factor"  "factor"
>>
>> I want to plot this using ggplot2
>>    library(ggplot2)
>>    pp2 = qplot(times$time, times$error, times)
>>    pp2 = pp2 + facet_wrap(~ times$runway)
>>    print(pp2)
>> But when I try it, I get the error
>> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
>> I get the same error if I make times$time a factor:
>>
 times$time = as.factor(times$time)
 sapply(times, class)
>>     time     error    runway    flight
>>  "factor" "numeric"  "factor"  "factor"
>>
     pp2 = qplot(times$time, times$error, times)
     pp2 = pp2 + facet_wrap(~ times$runway)
     print(pp2)
>> Error in fix.by(by.x, x) : 'by' must specify valid column(s)
>>
>> What am I doing wrong?
>>
>> And I really want to make a boxplot for every minute of times$time, but
>> when I try that, I get one single box.
>>
>> Thanks for the help,
>> Jim
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error message

2011-06-07 Thread Martin Morgan

On 06/07/2011 01:38 PM, Parida, Mrutyunjaya wrote:

Hi
fn<- dir(pattern="txt",full.name=T)

  fn

[1] "./GSM696980_US81503234_252741110209_S01_CGH_107_Sep09_1_1_32914.txt"
[2] "./GSM696981_US81503234_252741110209_S01_CGH_107_Sep09_1_2_32916.txt"
[3] "./GSM696982_US81503234_252741110209_S01_CGH_107_Sep09_1_3_33021.txt"
[4] "./GSM696983_US81503234_252741110209_S01_CGH_107_Sep09_1_4_33024.txt"

RG<- 
read.maimages(fn,source="agilent",quote="",other=c("gProcessedSignal","rProcessedSignal","gProcessedSigError","rProcessedSigError"),

+ columns=list(G = "gMeanSignal",Gb = "gBGMedianSignal", R ="rMeanSignal", Rb = 
"rBGMedianSignal"))
Error in readGenericHeader(fullname, columns = columns, sep = sep) :
   Specified column headings not found in file


I am trying to read the data into R using read.maimages(), but it is showing me 
this error. The data is from agilent and I checked the columns they exist on 
the dataset. I am also sending you an array file with this email.
Please comment on this.


This is a Bioconductor package so please ask there.

  http://bioconductor.org/help/mailing-list/

Perhaps not all the files are the same and the headings are missing 
from some, or the files are produced by different versions of Agilent 
software and so have different numbers of headings, or...


Martin


Thanking you
Mrutyunjaya Parida



The information transmitted is intended only for the p...{{dropped:17}}


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting Dataframes

2011-06-07 Thread SamiC
So I have figured out how to do it via a series of loops and conditions, but
I am thinking there must be a quicker way to do it.

an example.

Bin  Depth  Fish          Bin  Depth  MaxFish
1    4      2       to:   1    8      24
1    8      24            2    8      21
1    12     4             3    12     33
2    4      3
2    8      21
2    12     2
3    4      12
3    8      2
3    12     33

--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-Dataframes-tp3580075p3581046.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Line Graphs

2011-06-07 Thread Robert Baer
Thanks Rob, but the legend is not appearing in the plot. I think the best 
place for it is on the top left.


Is there anyway I can also get it broken down in tenths instead of fifths?

The legend is easy; just specify where you want it.  The first 2 parameters 
specify the x, y of the top left corner.  See ?legend.


As to getting tenths on the axes, using axes = FALSE and drawing them 
manually where you want them is as easy as anything.  The following should 
work:

A =runif(10)
B =runif(10)
C =runif(10)
D =runif(10)
E =runif(10)
F =runif(10)
x = seq(0.1, 1.0, 0.1)

plot(x, B, type="o", pch = 0, lty=1,col="blue", xlim = c(0,1),
   ylim = c(0,1), axes = FALSE,
   xlab=expression(paste(lambda[0])),
   ylab= expression(paste("Estimate of ", lambda[0])))
axis(1, at = seq(0,1,0.1))
axis(2, at = seq(0,1,0.1))
box()

# Add the remaining series, each with its own colour and plotting symbol
lines(x, A, type="o", pch=2, lty=1, col="red")
lines(x, C, type="o", pch=3, lty=1, col="green")
lines(x, D, type="o", pch=4, lty=1, col="orange")
lines(x, E, type="o", pch=6, lty=1, col="brown")
lines(x, F, type="o", pch=8, lty=1, col="yellow")
abline(0, 1, col = "black")
# Create a title with a red, bold/italic font
title(main="Methods", col.main="red", font.main=4)

# Create a legend at (1, g_range[2]) that is slightly smaller
# (cex) and uses the same line colors and points used by
# the actual plots
legend(0, 1, c("B","A", "C", "D", "E", "F", "Actual"), cex=0.6,
col=c("blue","red", "green", "orange", "brown","yellow", "black"),
pch=c(0,2,3,4,6,8,9), lty=1)

HTH,
Rob

I want to plot 6 line graphs. I have 10 points 0.1, 0.2, 0.3, 0.4, 0.5, 0.6,
0.7, 0.8, 0.9 and 1.0.
At each point say 0.1, I have 6 variables A, B, C, D, E and F. The variables
all have values between 0 and 1 (and including 0 and 1). I also want to
label the x axis from 0.1 to 1.0 and the y axis from 0.1 to 1.0.
My goal is to plot a line graph representing the mean of the variables at
each level. SO for 0.1 on the xaxis, we should expect 6 values for y.
This is what I have so far. The plot omits 1.0 and the abline function does
not make the line y = x on the plot
This is what I have so far:

# Calculate range from 0 to max value of cars and trucks
g_range <- range(0, 1)

# Graph autos using y axis that ranges from 0 to max
# value in cars or trucks vector.  Turn off axes and
# annotations (axis labels) so we can specify them ourself
plot(B, type="o", pch = 0, lty=1,col="blue", ylim=g_range, axes=FALSE,
ann=FALSE)

As you are currently doing it, you are not specifying an x variable.  This 
means you are getting an 'index plot'.  If I understand what you want, it 
would be simpler to just specify x in your plot statement.  From there, if 
you absolutely need EVERY 0.1 labeled, you could suppress the axis and label 
by hand.  Something along the lines of:

A =runif(10)
B =runif(10)
C =runif(10)
D =runif(10)
E =runif(10)
F =runif(10)
x = seq(0.1, 1.0, 0.1)
plot(x, B, type="o", pch = 0, lty=1,col="blue", xlim = c(0,1), ylim = 
c(0,1),

xlab=expression(paste(lambda[0])),

ylab= expression(paste("Estimate of ", lambda[0])))


# Add the remaining series, each with its own colour and plotting symbol

lines(x, A, type="o", pch=2, lty=1, col="red")
lines(x, C, type="o", pch=3, lty=1, col="green")
lines(x, D, type="o", pch=4, lty=1, col="orange")
lines(x, E, type="o", pch=6, lty=1, col="brown")
lines(x, F, type="o", pch=8, lty=1, col="yellow")

abline(0, 1, col = "black")
# Create a title with a red, bold/italic font
title(main="Methods", col.main="red", font.main=4)


# Create a legend at (1, g_range[2]) that is slightly smaller
# (cex) and uses the same line colors and points used by
# the actual plots
legend(1, g_range[2], c("B","A", "C", "D", "E", "F", "Actual"), cex=0.6,
col=c("blue","red", "green", "orange", "brown","yellow", "black"),
pch=c(0,2,3,4,6,8,9), lty=1)



Rob


# Make x axis using the values of pi_0 labels
#

axis(1, at=1:10,
lab=c("0.1","0.2","0.3","0.4","0.5","0.6","0.7","0.8","0.9","1.0"))

# Make y axis with horizontal labels that display ticks at
del = seq(0.1,1, 0.1)
axis(2, at=del,
lab=c("0.1","0.2","0.3","0.4","0.5","0.6","0.7","0.8","0.9","1.0"))

# Create box around plot
box()

# Add the remaining series, each with its own colour and plotting symbol
lines(A, type="o", pch=2, lty=1, col="red")
lines(C, type="o", pch=3, lty=1, col="green")
lines(D, type="o", pch=4, lty=1, col="orange")
lines(E, type="o", pch=6, lty=1, col="brown")
lines(F, type="o", pch=8, lty=1, col="yellow")
abline(0, 1, col = "black")
# Create a title with a red, bold/italic font
#title(main="Methods", col.main="red", font.main=4)

# Label the x and y axes
title(xlab=expression(paste(lambda[0])))
title(ylab= expression(paste("Estimate of ", lambda[0])))

# Create a legend at (1, g_range[2]) that is slightly smaller
# (cex) and uses the same line colors and points used by
# the actual plots
legend(1, g_range[2], c("B","A", "C", "D", "E", "F", "Actual"), cex=0.6,
col=c("blue","red", "green", "orange", "brown","yellow", "blac

Re: [R] Sorting Dataframes

2011-06-07 Thread David A. Johnston
Here's one way:

# Here I read in your data to a variable 'x'
x = read.delim(textConnection(
"Bin Depth Fish
1 4 2
1 8 24
1 12 4
2 4 3
2 8 21
2 12 2
3 4 12
3 8 2
3 12 33"), sep = " ", header = TRUE)


# for each Bin, keep the row with the largest Fish count
do.call(rbind, lapply(split(x, x$Bin), function(grp)
  grp[which.max(grp$Fish), ]))

--
View this message in context: 
http://r.789695.n4.nabble.com/Sorting-Dataframes-tp3580075p3581189.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About DCC-garch model...

2011-06-07 Thread Marcin Płóciennik
Hi,
I thought that a common practice is just to omit the first period data
since it does not have much influence on further results / calculations.

Cheers
Marcin



2011/6/7 windseav 

> Hi, everyone,
>
> I currently run into a problem about DCC-Garch model. I use the package
> cc-garch and the function dcc.estimation. One of the output of this
> function
> is DCC matrix, which shows conditional correlation matrix at every time
> period you gives. However, I cannot figue out how the function calculate
> the
> conditional correlation matrix at the first time period, since there is no
> data to be conditioned... How do we usually deal with the first period data
> of GARCH model? Could anyone suggest any paper about the DCC GARCH model,
> including how to deal with the first time period data?
>
> Thank you all guys in advance!!
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/About-DCC-garch-model-tp3579140p3579140.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot 2: Histogram with bell curve?

2011-06-07 Thread dicko ahmadou
My bad, you are right.
I always use TRUE instead of T, so I forgot that by default T = TRUE in R.


--
View this message in context: 
http://r.789695.n4.nabble.com/ggplot-2-Histogram-with-bell-curve-tp3580359p3581032.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Adding values to the end of a data frame

2011-06-07 Thread Abraham Mathew
Let's say that I'm trying to write a function that will allow me to automate a
process where I examine all possible combinations of various string groupings.
Each time I run the one() function, I want to append the new values to the end
of a data frame. The data frame will basically be one column with a lot of rows.

roots <- c("car insurance", "auto insurance")
prefix <- c("cheap", "budget")
suffix <- c("rate", "rates")

one <- function(x, y, z=0) {
  nu <- do.call(paste, expand.grid(x, y, z))
  mydf <- data.frame(nu)
  print(mydf)
}

one(roots, suffix2)
one(prefix, roots)
one(prefix, roots, suffix2)

The code above just overwrites the contents of the data frame each time I run
the one() function.

How can I add the new values to the end of the data frame?


Help!

I'm running R 2.13 on Ubuntu 10.10

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] XML segfault on some architectures

2011-06-07 Thread Janet Young
Hi,

I have found an architecture-specific segfault problem with the XML package. I
originally found the problem using the parseKGML2Graph function in the
Bioconductor KEGGgraph package, but as far as I can tell the underlying issue
seems to be with xmlTreeParse, which is called by parseKGML2Graph.

I'm trying this piece of code, from the xmlTreeParse help page:

library(XML)
fileName <- system.file("exampleData", "test.xml", package="XML")
x <- xmlTreeParse(fileName)

On my Mac and on nodes of one of the linux clusters I have access to, this 
works fine. But on another of the linux clusters I use, I get a segfault every 
time, on both 32-bit and 64-bit nodes of the cluster.  The unames for those 
nodes are here:

Linux kong053 2.6.18-194.17.1.el5xen #1 SMP Wed Sep 29 13:30:21 EDT 2010 x86_64 
x86_64 x86_64 GNU/Linux
Linux king049 2.6.18-194.26.1.el5xen #1 SMP Tue Nov 9 14:13:46 EST 2010 i686 
i686 i386 GNU/Linux

I think I've included all the relevant info below, but please let me know if 
there's anything else you'd like to see.

thanks,

Janet

--- 

Dr. Janet Young 

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168, 
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org


--- 






Output on a 64-bit node:

> library(XML)

> fileName <- system.file("exampleData", "test.xml", package="XML")

> fileName
[1] "/home/btrask/traskdata/lib_linux_64/R/library/XML/exampleData/test.xml"

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] XML_3.4-0


> system("uname -a")
Linux kong053 2.6.18-194.17.1.el5xen #1 SMP Wed Sep 29 13:30:21 EDT 2010 x86_64 
x86_64 x86_64 GNU/Linux

> x <- xmlTreeParse(fileName)

 *** caught segfault ***
address 0x51c4f, cause 'memory not mapped'

Traceback:
 1: .Call("RS_XML_ParseTree", as.character(file), handlers, 
as.logical(ignoreBlanks), as.logical(replaceEntities), as.logical(asText), 
as.logical(trim), as.logical(validate), as.logical(getDTD), 
as.logical(isURL), as.logical(addAttributeNamespaces), 
as.logical(useInternalNodes), FALSE, as.logical(isSchema), 
as.logical(fullNamespaceInfo), as.character(encoding), 
as.logical(useDotNames), xinclude, error, addFinalizer, PACKAGE = "XML")
 2: xmlTreeParse(fileName)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cleveland dot plots

2011-06-07 Thread Colin Wahl
I would rather use Cleveland dot plots than bar charts to display my study
results. I have not been able to find (or figure out) an R package that is
capable of producing the publication-quality dot charts I'm looking for. I have
either not been able to get error bars (lattice), cannot order the data display
properly (latticeExtra), or cannot make adjustments to the axes. Does anyone
have a quick suggestion for a package that handles Cleveland dot plots well?

--
View this message in context: 
http://r.789695.n4.nabble.com/Cleveland-dot-plots-tp3581122p3581122.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] predict with model (rms package)

2011-06-07 Thread Mark Seeto
Dear R-help,

In the rms package, I have fitted an ols model with a variable
represented as a restricted cubic spline, with the knot locations
specified as a previously defined vector. When I save the model object
and open it in another workspace which does not contain the vector of
knot locations, I get an error message if I try to predict with that
model. This also happens if only one workspace is used, but the vector
of knot locations is removed:

library(rms)
set.seed(1)
x <- rnorm(100)
y <- 1 + x + x^2 + rnorm(100)

x.knots <- quantile(x, c(0.2, 0.5, 0.8))
ols1 <- ols(y ~ rcs(x, x.knots))

predict(ols1, data.frame(x = 0))  # This works
rm(x.knots)
predict(ols1, data.frame(x = 0))  # Gives error

The first predict gives
1
0.8340293

while the second predict gives
Error in rcs(x, x.knots) : object 'x.knots' not found

The same error happens if x.knots is simply defined as a vector like
c(-1, 0, 1) (i.e. not using quantile). Is this the intended behaviour?
The requirement that x.knots be in the workspace seems strange, given
that the knot locations are stored in ols1$Design$parms.

Thanks for any help you can give.

Mark Seeto
National Acoustic Laboratories, Australia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting DataFrames

2011-06-07 Thread Ethan Brown
Hello Samantha, I'm having some trouble understanding your question in
terms of what's happening in R. Are these "bins" columns of a
data.frame? Rows?

It's helpful for us to have a small example to look at--for instance,
you could create a small subset of your data called x, then type the
command

dump("x", file=stdout())

which will print an expression that will recreate the object x.

Best,
Ethan

On Tue, Jun 7, 2011 at 9:42 AM, Cox, Samantha Lucy
 wrote:
> I am a new user, and I am trying to sort out a data frame.
>
>
>
> I have, for example, bins of data.  Within each bin I have multiple counts of 
> animals and the depths at which these counts were taken.  How would I 
> summarise this to keep only the maximum count per bin alongside the 
> corresponding depth (not the maximum depth - I want the depth at which 
> the maximum number of animals occurs)?
>
>
>
> Thank you
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting Dataframes

2011-06-07 Thread Dennis Murphy
Hi:

Here's another approach using the plyr package:

library(plyr)
# Write a function that takes a data frame as input and outputs a data frame
f <- function(df) df[which.max(df$Fish), ]
ddply(x, 'Bin', f)
  Bin Depth Fish
1   1     8   24
2   2     8   21
3   3    12   33

HTH,
Dennis

On Tue, Jun 7, 2011 at 3:32 PM, SamiC  wrote:
> So I have figured out how to do it via a series of loops and conditions, but
> I am thinking there must be a quicker way to do it.
>
> an example.
>
> Bin  Depth  Fish            to:    Bin  Depth  MaxFish
> 1    4      2                      1    8      24
> 1    8      24                     2    8      21
> 1    12     4                      3    12     33
> 2    4      3
> 2    8      21
> 2    12     2
> 3    4      12
> 3    8      2
> 3    12     33
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Sorting-Dataframes-tp3580075p3581046.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Adding values to the end of a data frame

2011-06-07 Thread Dennis Murphy
Alas, you don't have a suffix2 object defined, but try this:

d1 <- one(prefix, roots)
d2 <- one(roots, suffix)
rbind(d1, d2)

To see a potential flaw in your function (at least as far as console
output is concerned), try
rbind(d1, one(roots, suffix))
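
One way around that (just a sketch, using the roots/prefix/suffix vectors from
your post) is to have the function return its data frame rather than print it,
and then grow the result with rbind():

one <- function(x, y, z = NULL) {
  # build every combination and paste the pieces into a single column
  grid <- if (is.null(z)) expand.grid(x, y) else expand.grid(x, y, z)
  data.frame(phrase = do.call(paste, grid), stringsAsFactors = FALSE)
}
out <- rbind(one(prefix, roots), one(roots, suffix))  # no printing side effects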

HTH,
Dennis

On Tue, Jun 7, 2011 at 3:30 PM, Abraham Mathew  wrote:
> Let's say that I'm trying to write a functions that will allow me to
> automate a process
> where I examine all possible combinations of various string groupings. Each
> time I run
> the one function, I want to include the new values to the end of a data
> frame. The data
> frame will basically be one column with a lot of rows.
>
> roots <- c("car insurance", "auto insurance")
> prefix <- c("cheap", "budget")
> suffix <- c("rate", "rates")
>
> one <- function(x, y, z=0) {
>      nu <- do.call(paste, expand.grid(x, y, z))
>      mydf <- data.frame(nu)
>      print(mydf)
> }
>
> one(roots, suffix2)
> one(prefix, roots)
> one(prefix, roots, suffix2)
>
> The code above just replaces each value in the data frame each time I run
> the one function.
>
> How can I add the new values to the end of the data frame?
>
>
> Help!
>
> I'm running R 2.13 on Ubuntu 10.10
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with Princurve

2011-06-07 Thread guy33
As suggested above, specifying useful starting points definitely helps in the
case of:

x <- seq(0,2*pi, length=1000)
x <- cbind(x/(2*pi), sin(x))
fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100,
                        start = cbind(sort(x[,1]), rep(1, nrow(x))))


Interestingly, I find that if you simply scale the X-axis from [0,1] to
[0,2*pi], the algorithm converges without the starting points, as in:

x <- seq(0,2*pi, length=1000)
x <- cbind(x, sin(x))
fit1 <- principal.curve(x, plot = TRUE, trace = TRUE)

I assume this is because scaling the data in this way changes the first
principal component. However, this begs the question of what happens when
you consider more than one sine wave (and what happens when you scale the
x-axis).  For example:

x <- seq(0,10*pi, length=1000)
x <- cbind(x, sin(x))
fit1 <- principal.curve(x, plot = TRUE, trace = TRUE)

I can't seem to get a good curve for this, with or without starting
conditions.  Can anyone get a better fit somehow?

--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-Princurve-tp3535721p3581363.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with Princurve

2011-06-07 Thread guy33

guy33 wrote:
> 
> As suggested above, specifying useful starting points definitely helps in
> the case of:
> 
> x <- seq(0,2*pi, length=1000)
> x <- cbind(x/(2*pi), sin(x))
> fit1 <- principal.curve(x, plot = TRUE, trace = TRUE, maxit = 100, start =
> cbind(sort(x[,1]), rep(1, nrow(x))))
> 
> 
> Interestingly, I find that if you simply scale the X-axis from [0,1] to
> [0,2*pi], the algorithm converges without the starting points, as in:
> 
> x <- seq(0,2*pi, length=1000)
> x <- cbind(x, sin(x))
> fit1 <- principal.curve(x, plot = TRUE, trace = TRUE)
> 
> I assume this is because scaling the data in this way changes the first
> principal component. However, this 
> 

Out of curiosity, am I right about why scaling the data in this way seems to
"fix" the problem? What I meant is that when the x-range is [0,1] and the
y-range is [-1,1], var(x) = 0.0835 and var(y) = 0.5, so the first principal
component, which the algorithm uses as its starting curve, is vertical.
However, when the x-range is scaled to [0,2*pi], var(x) becomes 3.299 > var(y)
(still 0.5), so the first principal component now lies in the horizontal
direction, which leads to the expected behavior.
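
A quick numerical check of that reasoning (values are approximate):

x1 <- seq(0, 2*pi, length = 1000)
var(x1/(2*pi)); var(sin(x1))   # ~0.084 vs ~0.5, so the first PC is vertical
var(x1); var(sin(x1))          # ~3.3  vs ~0.5, so the first PC is horizontal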



> begs the question of what happens when you consider more than one sine
> wave (and what happens when you scale the x-axis).  For example:
> 
> x <- seq(0,10*pi, length=1000)
> x <- cbind(x, sin(x))
> fit1 <- principal.curve(x, plot = TRUE, trace = TRUE)
> 
> I can't seem to get a good curve for this, with or without starting
> conditions.  Can anyone get a better fit somehow?
> 


--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-Princurve-tp3535721p3581380.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cleveland dot plots

2011-06-07 Thread Peter Alspach
Kia ora Colin

I don't know if there is a package that does what you want, but they are easy 
enough to create using plot().  Error bars can be added with arrows().
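
For example, a minimal sketch along those lines (hypothetical means and
standard errors, ordered by size as Cleveland recommends):

means <- c(3.2, 4.1, 2.7, 5.0)          # made-up group means
se    <- c(0.4, 0.3, 0.5, 0.6)          # made-up standard errors
grp   <- c("A", "B", "C", "D")
ord   <- order(means)
plot(means[ord], seq_along(means), pch = 19, yaxt = "n",
     xlim = range(means - se, means + se),
     xlab = "Mean response", ylab = "")
axis(2, at = seq_along(means), labels = grp[ord], las = 1)
arrows(means[ord] - se[ord], seq_along(means),   # horizontal error bars
       means[ord] + se[ord], seq_along(means),
       angle = 90, code = 3, length = 0.05)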

HTH ...

Peter Alspach

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Colin Wahl
> Sent: Wednesday, 8 June 2011 11:30 a.m.
> To: r-help@r-project.org
> Subject: [R] Cleveland dot plots
> 
> I would rather use cleveland dot plots than bar charts to display my
> study
> results. I have not been able to find (or figure out) an R package that
> is
> capable of producing the publication quality dot charts Im looking for.
> I
> have either not been able to get error bars (lattice), cannot order the
> data
> display properly (latticeExtra), or cannot make adjustments to axes.
> Does
> anyone have a quick suggestion for a package that can handle cleveland
> dot
> plots well?
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Cleveland-
> dot-plots-tp3581122p3581122.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About DCC-garch model...

2011-06-07 Thread Arun.stat
Hi Marcin, I do not think you can just ignore the first period's estimate (or
have I misunderstood your statement?). (M)GARCH estimation is essentially an
iterative procedure, so you need to have something as the starting value.

Thanks,
_

Arun Kumar Saha, FRM
QUANTITATIVE RISK AND HEDGE CONSULTING SPECIALIST
Visit me at: http://in.linkedin.com/in/ArunFRM
_

--
View this message in context: 
http://r.789695.n4.nabble.com/About-DCC-garch-model-tp3579140p3581242.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

