[R] using R's svd from outside R

2010-09-02 Thread Ralf Goertz
Hi,

I have to compute the singular value decomposition of rather large
matrices. My test matrix is 10558 by 4255, and it takes about three
minutes to decompose in R on a 64-bit quad-core Linux machine. (R runs
the svd in parallel; all four cores are at maximum load while doing
this.) I tried several BLAS and LAPACK libraries as well as the GNU
Scientific Library in my C++ program. Apart from being unable to make
them run the svd in parallel (although I thought I did everything needed
to do so), execution time always exceeds 25 minutes, which is still far
more than the roughly 12 minutes one would expect from the non-parallel
R timing.

I am now going to call R from within my program, but this is not very
elegant. So my questions are: does R use a special svd routine, and is it
possible to use that routine directly by linking in the relevant
libraries? (Sorry, but I couldn't figure that out by looking at the
source code.) If that is possible, can I have the code run in parallel?

Thanks,

Ralf



Re: [R] "pairs" with same xlim and ylim scale

2010-09-02 Thread Shi, Tao
Hi Dejian,

You're right on this!  Do you know how to pass those two arguments into
lower.panel?  Thanks!

...Tao



From: Dejian Zhao 
To: r-help@r-project.org
Sent: Tue, August 31, 2010 6:10:16 PM
Subject: Re: [R] "pairs" with same xlim and ylim scale

I think you have successfully passed "xlim" and "ylim" into the
function pairs1. If you compare the two graphs produced by the code you
provided, you can see that the xlim and ylim in the second graph have
been reset to the assigned values. It seems that the program halted
while producing the second plot after xlim and ylim were added.
According to the error message, the two added parameters were not used
by lower.panel, i.e. by the customized function f.xy.
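
(One possible workaround - a sketch, not from the original thread: give the
panel function explicit xlim/ylim formals so it absorbs those values instead
of forwarding them to points(), which does not accept them:

  f.xy <- function(x, y, xlim = NULL, ylim = NULL, ...) {
      points(x, y, ...)   # xlim/ylim are swallowed by the formals above
      abline(0, 1, col = 2)
  }

The same trick applies to the other panel functions.)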

On 2010-9-1 2:26, Shi, Tao wrote:
> Hi list,
>
> I have a function which basically is a wrapper of pairs with some useful panel
> functions.  However, I'm having trouble passing "xlim" and "ylim" into the
> function so that the x and y axes are on the same scale and the 45-degree
> lines are exactly diagonal.  I've looked at some old posts; they didn't help
> much.
>
> Thanks!
>
> ...Tao
>
>
> pairs1<- function(x, ...) {
>  f.xy<- function(x, y, ...) {
>  points(x, y, ...)
>  abline(0, 1, col = 2)
>  }
>
>  panel.cor<- function(x, y, digits=2, prefix="", cex.cor) {
>   usr<- par("usr"); on.exit(par(usr))
>   par(usr = c(0, 1, 0, 1))
>   r<- abs(cor(x, y, method="p", use="pairwise.complete.obs"))
>   txt<- format(c(r, 0.123456789), digits=digits)[1]
>   txt<- paste(prefix, txt, sep="")
>   if(missing(cex.cor)) cex<- 0.8/strwidth(txt)
>   text(0.5, 0.5, txt, cex = cex * r)
>   }
>
>   panel.hist<- function(x, ...) {
>   usr<- par("usr"); on.exit(par(usr))
>   par(usr = c(usr[1:2], 0, 1.5) )
>   h<- hist(x, plot = FALSE)
>   breaks<- h$breaks; nB<- length(breaks)
>   y<- h$counts; y<- y/max(y)
>   rect(breaks[-nB], 0, breaks[-1], y, col="cyan", ...)
>   }
>
>  pairs(x, lower.panel=f.xy, upper.panel=panel.cor, diag.panel=panel.hist, ...)
> }
>
>
>> x<- rnorm(100, sd=0.2)
>> x<- cbind(x=x-0.1, y=x+0.1)
>> pairs1(x)
>> pairs1(x, xlim=c(-1,1), ylim=c(-1,1))
>>  
> Error in lower.panel(...) :
>unused argument(s) (xlim = c(-1, 1), ylim = c(-1, 1))
>
>
>



Re: [R] How to generate integers from uniform distribution with fixed mean

2010-09-02 Thread Barry Rowlingson
On Thu, Sep 2, 2010 at 7:17 AM, Yi  wrote:
> Hi, folks,
>
> runif(n, min, max) is the typical code for generating RVs from a uniform dist.
>
> But what if we need to fix the mean at 20, and we want the values to be
> integers only?

 It's not clear what you want. Uniformly random integers with expected
mean 20 - but over what range? Any range centred on 20 will work; for
example, you could use sample() with replacement. To see the
distribution, tabulate a large sample:

 table(sample(17:23, 10000, TRUE))

 which gives a uniform distribution of integers from 17 to 23; the
mean was 20.0057 for 10000 samples.

 Is that what you want?

Barry



Re: [R] Cross Spectrum Analysis

2010-09-02 Thread aftar

Hi

I'm sorry, but I don't think that coherence is the same as the cross
spectrum. People use coherence since it is much easier to work with. I
know how to plot and calculate the coherence and phase using R; what I
don't know is how to calculate the cross spectrum using R.
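
(A minimal sketch of the raw cross-periodogram via fft() - illustrative code
assuming two equal-length series x and y, not from the original thread; the
raw periodogram still needs smoothing to give a consistent cross-spectrum
estimate, and scaling conventions vary:

  n <- 256
  x <- rnorm(n)
  y <- as.numeric(stats::filter(x, rep(1/3, 3), circular = TRUE)) + rnorm(n)
  cross <- fft(x) * Conj(fft(y)) / n   # complex cross-periodogram
  co    <- Re(cross)                   # co-spectrum
  quad  <- -Im(cross)                  # quadrature spectrum
  freq  <- (0:(n - 1)) / n             # frequency, cycles per interval

Mod(cross) and Arg(cross) then give the amplitude and phase spectra.)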

Regards
Aftar



[R] R code changes to crap

2010-09-02 Thread khush ........
Hi,

I am using vim to edit my files... I do not know what I pressed, but my
R code has been converted to such crap as this:

%PDF-1.4
%<81>â<81>ã<81>Ï<81>Ó\r
1 0 obj
<<
/CreationDate (D:20100902122215)
/ModDate (D:20100902122215)
/Title (R Graphics Output)
/Producer (R 2.11.1)
/Creator (R)

How do I convert it back to my actual code? I do not have a backup of
it.

Thanks in advance
Khushwant



[R] reshape to wide format takes extremely long

2010-09-02 Thread Coen van Hasselt
Hello,

I have a data.frame with the following format:

> head(clin2)
    Study Subject Type  Obs Cycle Day       Date  Time
1 A001101   10108  ALB 44.0    98   1 2004-03-11 14:26
2 A001101   10108  ALP 95.0    98   1 2004-03-11 14:26
3 A001101   10108  ALT 61.0    98   1 2004-03-11 14:26
5 A001101   10108  AST 33.0    98   1 2004-03-11 14:26

I want to transform this data.frame so that I have "Obs" columns for
each "Type". The full dataset is 45000 rows long. For a short subset
of 100 rows, reshaping takes 0.2 seconds, and produces what I want.
All columns are either numeric or character format (incl. date/time).

> reshape(clin2, v.names="Obs", timevar="Type", direction="wide",
          idvar=c("Study","Subject","Cycle","Day","Date","Time"))
      Study Subject Cycle Day       Date  Time Obs.ALB Obs.ALP Obs.ALT Obs.AST
1   A001101   10108    98   1 2004-03-11 14:26      44      95      61      33
11  A001101   10108     1   1 2004-03-12 14:01      41      85      39      33
21  A001101   10108     1   8 2004-03-22 10:34      40      90      70      34
30  A001101   10108     1  15 2004-03-29 09:56      45      97      66      48
[...]

However, when using the same reshape command for the full data.frame
of 45000 rows, it still wasn't finished when run overnight (8 GB RAM +
8 GB swap in use).

The time to process this data.frame from a 100-row subset to a
1000-row subset increases from 0.2 sec to 60 sec.

I would greatly appreciate advice on why the time for reshaping
increases exponentially with the number of rows, and how I can do this
in an elegant way.

Thanks!

Coen.



Re: [R] R code changes to crap

2010-09-02 Thread peter dalgaard

On Sep 2, 2010, at 09:16 , khush  wrote:

> Hi,
> 
> I am using vim to edit my files... I do not know what I pressed, but my
> R code has been converted to such crap as this:
> 
> %PDF-1.4
> %<81>â<81>ã<81>Ï<81>Ó\r
> 1 0 obj
> <<
> /CreationDate (D:20100902122215)
> /ModDate (D:20100902122215)
> /Title (R Graphics Output)
> /Producer (R 2.11.1)
> /Creator (R)
> 
> How do I convert it back to my actual code? I do not have a backup of
> it.
> 

Well, that's a PDF file, with some R Graphics inside, so presumably you did 
pdf(file="foo.R") or copied the graphics output as PDF to foo.R using the 
user-friendly interface to shooting yourself in the foot.

Depending on your (unstated) platform and configuration, VIM may be leaving 
.bak files around, which you can use, or you may have the commands echoed into 
an output scriptfile, from which you can extract them. Otherwise, I suspect 
that you are out of luck.


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] general question on binomial test / sign test

2010-09-02 Thread Kay Cecil Cichini
I test the null that the coin is fair (p(succ) = p(fail) = 0.5) with
one trial and get a p-value of 1. Actually I want to prove the
alternative hypothesis that the estimate is different from 0.5, which
certainly cannot be proven here. But in reverse, the p-value of 1 says
that I can be 100% sure that the estimate of 0.5 is true (??) - that's
the point that astonishes me.


Thanks if anybody could clarify this for me,
kay

Quoting Greg Snow:

Try thinking this one through from first principles, you are  
essentially saying that your null hypothesis is that you are  
flipping a fair coin and you want to do a 2-tailed test.  You then  
flip the coin exactly once, what do you expect to happen?  The  
p-value of 1 just means that what you saw was perfectly consistent  
with what is predicted to happen flipping a single time.


Does that help?

If not, please explain what you mean a little better.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Kay Cichini
Sent: Wednesday, September 01, 2010 3:06 PM
To: r-help@r-project.org
Subject: [R] general question on binomial test / sign test


hello,

I did several binomial tests and noticed for one sparse dataset that
binom.test(1,1,0.5) gives a p-value of 1 for the null, which I can't
quite grasp. That would say that a prob of 1/2 has a p-value of 0?? I
must be wrong but can't figure out the right interpretation...

best,
kay





-

Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck










Re: [R] R code changes to crap

2010-09-02 Thread Ted Harding
On 02-Sep-10 07:16:54, khush  wrote:
> Hi,
> 
> I am using vim to edit my files... I do not know what I pressed,
> but my R code has been converted to such crap as this:
> 
> %PDF-1.4
> %<81>â<81>ã<81>Ï<81>Ó\r
> 1 0 obj
> <<
> /CreationDate (D:20100902122215)
> /ModDate (D:20100902122215)
> /Title (R Graphics Output)
> /Producer (R 2.11.1)
> /Creator (R)
> 
> How do I convert it back to my actual code? I do not have a backup
> of it.
> 
> Thanks in advance
> Khushwant

The content you have listed above is part of the header of a PDF file,
apparently generated by R from a plot() command ("R Graphics Output"),
presumably encapsulated between pdf(...) and dev.off() commands.
It is most unlikely that such a file would contain any "R code"
(i.e. a list of R commands).

Are you sure you are editing the correct file? What are you trying
to do? In any case, there are almost no circumstances in which it
is meaningful to apply a text editor (like 'vim') to a PDF file!

Possibly, if you are using a Windows platform, you may have mixed
up a text file with R commands (e.g. "project1.R") with a PDF
file generated by R using the same name (e.g. "project1.pdf")
because Windows (by default) does not show you the extension
(respectively ".R" and ".pdf").

Please clarify!
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 02-Sep-10   Time: 09:01:35
-- XFMail --



Re: [R] ggplot2 multiple group barchart

2010-09-02 Thread ONKELINX, Thierry
Dear Greg,

First convert your data.frame to long format using melt(). Then use
ddply() to calculate the averages. Once you get to that point it should
be rather straightforward.

library(ggplot2)
library(reshape)  # melt()
library(plyr)     # ddply()

v1 <- c(1, 2, 3, 3, 4)
v2 <- c(4, 3, 1, 1, 9)
v3 <- c(3, 5, 7, 2, 9)
gender <- c("m", "f", "m", "f", "f")

d.data <- data.frame(v1, v2, v3, gender)

# long format: one row per (gender, variable) pair
molten <- melt(d.data, id.vars = "gender")

# mean of each variable within each gender
Average <- ddply(molten, c("gender", "variable"), function(z) {
    c(Mean = mean(z$value))
})

# dodged bars: two bars (m/f) per variable
ggplot(data = Average, aes(x = variable, y = Mean, fill = gender)) +
    geom_bar(stat = "identity", position = position_dodge()) +
    coord_flip() +
    geom_text(position = position_dodge(0.5),
              aes(label = round(Mean, 1)), vjust = 0.5, hjust = 4,
              colour = "white", size = 7)

# alternative: one panel per gender
ggplot(data = Average, aes(x = variable, y = Mean)) +
    geom_bar(stat = "identity") +
    coord_flip() +
    geom_text(aes(label = round(Mean, 1)), vjust = 0.5, hjust = 4,
              colour = "white", size = 7) +
    facet_wrap(~ gender)

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Waller Gregor (wall)
> Sent: Wednesday, 1 September 2010 17:16
> To: r-help@r-project.org
> Subject: [R] ggplot2 multiple group barchart
> 
> 
> hi there.. I've got a problem with ggplot2.
> 
> here's my example:
> 
> library(ggplot2)
> 
> v1 <- c(1,2,3,3,4)
> v2 <- c(4,3,1,1,9)
> v3 <- c(3,5,7,2,9)
> gender <- c("m","f","m","f","f")
> 
> d.data <- data.frame(v1, v2, v3, gender)
> d.data
> 
> x <- names(d.data[1:3])
> y <- mean(d.data[1:3])
> 
> pl <- ggplot(data=d.data, aes(x=x, y=y))
> pl <- pl + geom_bar()
> pl <- pl + coord_flip()
> pl <- pl + geom_text(aes(label=round(y,1)), vjust=0.5, hjust=4,
>                      colour="white", size=7)
> pl
> 
> this gives me a nice barchart to compare the means of my
> variables "v1", "v2" and "v3".
> my question: how do I have to proceed if I want this barchart
> split by the variable "gender",
> so that I get two small bars for "v1", one for female and one for
> male, two bars for "v2" etc.
> I need them all in one chart.
> 
> fill=gender, position="dodge" do not work...
> 
> any ideas?
> 
> thanks a lot
> 
> greg
> 
> 




[R] Using library and lib.loc

2010-09-02 Thread omerle
Hi,

I didn't find any post on this subject, so I'll ask for some advice.

Let's say that I have two library trees.

Number 1 is the default R library tree, on path1.
Number 2 is another library tree on a server, with all packages, on path2.

When I call library(aaMI, lib.loc=path2), it loads the package even if it's
not in the default R library tree.
When I call library(fOptions, lib.loc=path2), it doesn't load, because
timeSeries is not in the default R library tree (timeSeries is a required
package for fOptions):

> library(fOptions, lib.loc = .lib.loc)
Loading required package: timeDate
Loading required package: timeSeries
Error: package 'timeSeries' could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc =
lib.loc) :
  there is no package called 'timeSeries'

(R reports those errors in French on my machine; I've given the English
equivalents above. By the way, how can I make R report messages in English
directly? Setting the language to English when installing R is not
sufficient!)

How can I set lib.loc for every package that I load? Or is there a
global way of doing this?
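
(For reference, a sketch of one global route - assuming path2 holds the server
tree: .libPaths() prepends additional library trees for the whole session, so
required packages such as timeSeries are found there too:

  .libPaths(c(path2, .libPaths()))  # search the server tree first
  library(fOptions)                 # timeSeries now resolves from path2

This affects every subsequent library() and require() call in the session.)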

Thanks,

Olivier





Re: [R] general question on binomial test / sign test

2010-09-02 Thread Ted Harding
You state: "in reverse the p-value of 1 says that i can 100% sure
that the estimate of 0.5 is true". This is where your logic about
significance tests goes wrong.

The general logic of a significance test is that a test statistic
(say T) is chosen such that large values represent a discrepancy
between possible data and the hypothesis under test. When you
have the data, T evaluates to a value (say t0). The null hypothesis
(NH) implies a distribution for the statistic T if the NH is true.

Then the value of Prob(T >= t0 | NH) can be calculated. If this is
small, then the probability of obtaining data at least as discrepant
as the data you did obtain is small; if sufficiently small, then
the conjunction of NH and your data (as assessed by the statistic T)
is so unlikely that you can decide to not believe that it is possible.
If you so decide, then you reject the NH because the data are so
discrepant that you can't believe it!

This is on the same lines as the "reductio ad absurdum" in classical
logic: "An hypothesis A implies that an outcome B must occur. But I
have observed that B did not occur. Therefore A cannot be true."

But it does not follow that, if you observe that B did occur
(which is *consistent* with A), then A must be true. A could be
false, yet B still occur -- the only basis on which occurrence
of B could *prove* that A must be true is when you have the prior
information that B will occur *if and only if* A is true. In the
reductio ad absurdum, and in the parallel logic of significance
testing, all you have is "B will occur *if* A is true". The "only if"
part is not there. So you cannot deduce that "A is true" from
the observation that "B occurred", since what you have to start with
allows B to occur if A is false (i.e. "B will occur *if* A is true"
says nothing about what may or may not happen if A is false).

So, in your single toss of a coin, it is true that "I will observe
either 'succ' or 'fail' if the coin is fair". But (as in the above)
you cannot deduce that "the coin is fair" if you observe either
'succ' or 'fail', since it is possible (indeed necessary) that you
obtain such an observation if the coin is not fair (even if the
coin is the same, either 'succ' or 'fail', on both sides, therefore
completely unfair). This is an attempt to expand Greg Snow's reply!

Your 2-sided test takes the form T=1 if either outcome='succ' or
outcome='fail'. And that is the only possible value for T since
no other outcome is possible. Hence Prob(T==1) = 1 whether the coin
is fair or not. It is not possible for such data to discriminate
between a fair and an unfair coin.

And, as explained above, a P-value of 1 cannot prove that the
null hypothesis is true. All that is possible with a significance
test is that a small P-value can be taken as evidence that the
NH is false.
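
(A small illustration of this asymmetry, using stats::binom.test():

  binom.test(1, 1, p = 0.5)$p.value    # 1: one success in one trial
  binom.test(9, 10, p = 0.5)$p.value   # ~0.021: enough data CAN discredit the NH

With a single toss the two-sided test can never reject; with more data
it can.)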

Hoping this helps!
Ted.

On 02-Sep-10 07:41:17, Kay Cecil Cichini wrote:
> I test the null that the coin is fair (p(succ) = p(fail) = 0.5) with
> one trial and get a p-value of 1. Actually I want to prove the
> alternative hypothesis that the estimate is different from 0.5, which
> certainly cannot be proven here. But in reverse, the p-value of 1 says
> that I can be 100% sure that the estimate of 0.5 is true (??) - that's
> the point that astonishes me.
> 
> Thanks if anybody could clarify this for me,
> kay
> 
> Quoting Greg Snow:
> 
>> Try thinking this one through from first principles, you are  
>> essentially saying that your null hypothesis is that you are  
>> flipping a fair coin and you want to do a 2-tailed test.  You then  
>> flip the coin exactly once, what do you expect to happen?  The  
>> p-value of 1 just means that what you saw was perfectly consistent  
>> with what is predicted to happen flipping a single time.
>>
>> Does that help?
>>
>> If not, please explain what you mean a little better.
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.s...@imail.org
>> 801.408.8111
>>
>>
>>> -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
>>> project.org] On Behalf Of Kay Cichini
>>> Sent: Wednesday, September 01, 2010 3:06 PM
>>> To: r-help@r-project.org
>>> Subject: [R] general question on binomial test / sign test
>>>
>>>
>>> hello,
>>>
>>> I did several binomial tests and noticed for one sparse dataset that
>>> binom.test(1,1,0.5) gives a p-value of 1 for the null, which I can't
>>> quite grasp. That would say that a prob of 1/2 has a p-value of 0?? I
>>> must be wrong but can't figure out the right interpretation...
>>>
>>> best,
>>> kay
>>>
>>>
>>>
>>>
>>>
>>> -
>>> 
>>> Kay Cichini
>>> Postgraduate student
>>> Institute of Botany
>>> Univ. of Innsbruck
>>> 
>>>

[R] Is there any package or function perform stepwise variable selection under GEE method?

2010-09-02 Thread 黃敏慈
Hi ,

I use library(gee), library(geepack), and library(yags) to perform GEE
data analysis, but none of them can do variable selection!

Both step and stepAIC can do variable selection based on the AIC
criterion for linear regression and glm, but they do not work when the
model is based on GEE.

I want to ask whether any variable selection function or package for
GEE models is available now?
Thanks!

Best,



Re: [R] reshape to wide format takes extremely long

2010-09-02 Thread Dennis Murphy
Hi:

I did the following test using function ddply() in the plyr package on a toy
data frame with 50000 observations: five studies, 20 subjects per
study, 25 cycles per subject, five days per cycle and four observations by
type per day. No date-time variable was included.

# Test data frame
big <- data.frame(study = factor(rep(1:5, each = 10000)),
                  subject = factor(rep(1:100, each = 500)),
                  cycle = rep(rep(1:25, each = 20), 100),
                  day = rep(rep(1:5, each = 4), 500),
                  type = rep(c('ALB', 'ALP', 'ALT', 'AST'), 12500),
                  obs = rpois(50000, 70))
> dim(big)
[1] 50000     6

# 64-bit R on a Windows 7 box with 8Gb RAM and a 2.93GHz Core Duo chip.
system.time(bigtab <- ddply(big, .(study, subject, cycle, day), function(x)
xtabs(obs ~ type, data = x)))
   user  system elapsed
  30.22    0.02   30.60

> dim(bigtab)
[1] 12500     8
> head(bigtab)
  study subject cycle day ALB ALP ALT AST
1     1       1     1   1  77  80  67  70
2     1       1     1   2  60  54  70  70
3     1       1     1   3  71  77  69  65
4     1       1     1   4  62  71  73  68
5     1       1     1   5  78  67  69  78
6     1       1     2   1  71  69  74  69
> tail(bigtab)
      study subject cycle day ALB ALP ALT AST
12495     5     100    24   5  75  83  72  70
12496     5     100    25   1  85  52  62  70
12497     5     100    25   2  79  64  84  68
12498     5     100    25   3  67  65  74  81
12499     5     100    25   4  62  86  66  80
12500     5     100    25   5  58  76  85  84

There may be an easier/more efficient way to do this with melt() and cast()
in the reshape package, but I moved on when I couldn't figure it out within
ten minutes (probably because I was thinking 'xtabs of obs by type for
study/subject/cycle/day combinations - that's the ticket!' :)). Packages
sqldf and data.table are other viable options for this sort of task, and now
that there is a test data set to play with, it would be interesting to see
what else can be done. I'd be surprised if this couldn't be done within a
few seconds because the data frame is not that large.

Anyway, HTH,
Dennis



On Thu, Sep 2, 2010 at 12:24 AM, Coen van Hasselt wrote:

> Hello,
>
> I have a data.frame with the following format:
>
> > head(clin2)
>     Study Subject Type  Obs Cycle Day       Date  Time
> 1 A001101   10108  ALB 44.0    98   1 2004-03-11 14:26
> 2 A001101   10108  ALP 95.0    98   1 2004-03-11 14:26
> 3 A001101   10108  ALT 61.0    98   1 2004-03-11 14:26
> 5 A001101   10108  AST 33.0    98   1 2004-03-11 14:26
>
> I want to transform this data.frame so that I have "Obs" columns for
> each "Type". The full dataset is 45000 rows long. For a short subset
> of 100 rows, reshaping takes 0.2 seconds, and produces what I want.
> All columns are either numeric or character format (incl. date/time).
>
> > reshape(clin2, v.names="Obs", timevar="Type", direction="wide",
>           idvar=c("Study","Subject","Cycle","Day","Date","Time"))
>       Study Subject Cycle Day       Date  Time Obs.ALB Obs.ALP Obs.ALT Obs.AST
> 1   A001101   10108    98   1 2004-03-11 14:26      44      95      61      33
> 11  A001101   10108     1   1 2004-03-12 14:01      41      85      39      33
> 21  A001101   10108     1   8 2004-03-22 10:34      40      90      70      34
> 30  A001101   10108     1  15 2004-03-29 09:56      45      97      66      48
> [...]
>
> However, when using the same reshape command for the full data.frame
> of 45000 rows, it still wasn't finished when run overnight (8 GB RAM +
> 8 GB swap in use).
>
> The time to process this data.frame from a 100-row subset to a
> 1000-row subset increases from 0.2 sec to 60 sec.
>
> I would greatly appreciate advice on why the time for reshaping
> increases exponentially with the number of rows, and how I can do this
> in an elegant way.
>
> Thanks!
>
> Coen.
>



Re: [R] how to represent error bar in a plot legend?

2010-09-02 Thread Jim Lemon

On 09/02/2010 04:56 AM, josef.kar...@phila.gov wrote:

I have a simple barplot of 4 mean values; each mean value has an
associated 95% confidence interval drawn on the plot as an error bar.
I want to make a legend on the plot that uses the error-bar symbol and
explains "95% C.I.".
How do I show the error-bar symbol in the legend? I could not find any
"pch" values that are appropriate.


Hi Josef,
There is a way to do this, but it is not too easy. First, use the
my.symbols function in the TeachingDemos package to define the "error
bar" symbol. Then you can draw a legend without one or more symbol
elements and place your "error bar" symbol where the symbol would have
been. You can see how to do this in the code of the legendg function in
the plotrix package, getting the necessary coordinates from the legend
function.
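
(A simpler base-R sketch of the same idea, with made-up bar heights and CI
values rather than the my.symbols/legendg route: reserve a legend slot with
no symbol, then draw an error-bar glyph at the coordinates legend() returns.

  mids <- barplot(c(3, 5, 4, 6), ylim = c(0, 8))
  arrows(mids, c(2.5, 4.5, 3.5, 5.5), mids, c(3.5, 5.5, 4.5, 6.5),
         angle = 90, code = 3, length = 0.05)   # the 95% CI bars
  lg <- legend("topleft", legend = "95% C.I.", pch = NA)
  x0 <- lg$text$x - 0.4                         # just left of the legend text
  arrows(x0, lg$text$y - 0.4, x0, lg$text$y + 0.4,
         angle = 90, code = 3, length = 0.05)   # the error-bar glyph

arrows() with angle = 90 and code = 3 is the usual error-bar idiom.)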


Jim



Re: [R] reshape to wide format takes extremely long

2010-09-02 Thread ONKELINX, Thierry
Dear Dennis,

cast() is in this case much faster.

> system.time(bigtab <- ddply(big, .(study, subject, cycle, day),
function(x) xtabs(obs ~ type, data = x)))
   user  system elapsed
  35.36    0.12   35.53
> system.time(bigtab2 <- cast(data = big, study + subject + cycle + day
~ type, value = "obs", fun = mean))
   user  system elapsed
   4.09    0.00    4.09

I have the feeling that ddply() has a lot of overhead when the number of
levels is large.

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Murphy
> Sent: Thursday, 2 September 2010 11:34
> To: Coen van Hasselt
> CC: r-help@r-project.org
> Subject: Re: [R] reshape to wide format takes extremely long
> 
> Hi:
> 
> I did the following test using function ddply() in the plyr 
> package on a toy data frame with 50000 observations: 
> five studies, 20 subjects per study, 25 cycles per subject, 
> five days per cycle and four observations by type per day. No 
> date-time variable was included.
> 
> # Test data frame
> big <- data.frame(study = factor(rep(1:5, each = 10000)),
>                   subject = factor(rep(1:100, each = 500)),
>                   cycle = rep(rep(1:25, each = 20), 100),
>                   day = rep(rep(1:5, each = 4), 500),
>                   type = rep(c('ALB', 'ALP', 'ALT', 'AST'), 12500),
>                   obs = rpois(50000, 70))
> > dim(big)
> [1] 50000     6
> 
> # 64-bit R on a Windows 7 box with 8Gb RAM and a 2.93GHz Core 
> Duo chip.
> system.time(bigtab <- ddply(big, .(study, subject, cycle,
> day), function(x) xtabs(obs ~ type, data = x)))
>    user  system elapsed
>   30.22    0.02   30.60
> 
> > dim(bigtab)
> [1] 12500     8
> > head(bigtab)
>   study subject cycle day ALB ALP ALT AST
> 1     1       1     1   1  77  80  67  70
> 2     1       1     1   2  60  54  70  70
> 3     1       1     1   3  71  77  69  65
> 4     1       1     1   4  62  71  73  68
> 5     1       1     1   5  78  67  69  78
> 6     1       1     2   1  71  69  74  69
> > tail(bigtab)
>       study subject cycle day ALB ALP ALT AST
> 12495     5     100    24   5  75  83  72  70
> 12496     5     100    25   1  85  52  62  70
> 12497     5     100    25   2  79  64  84  68
> 12498     5     100    25   3  67  65  74  81
> 12499     5     100    25   4  62  86  66  80
> 12500     5     100    25   5  58  76  85  84
> 
> There may be an easier/more efficient way to do this with 
> melt() and cast() in the reshape package, but I moved on when I 
> couldn't figure it out within ten minutes (probably because I 
> was thinking 'xtabs of obs by type for 
> study/subject/cycle/day combinations - that's the ticket!' :)).
> Packages sqldf and data.table are other viable options for 
> this sort of task, and now that there is a test data set to 
> play with, it would be interesting to see what else can be 
> done. I'd be surprised if this couldn't be done within a few 
> seconds because the data frame is not that large.
> 
> Anyway, HTH,
> Dennis
> 
> 
> 
> On Thu, Sep 2, 2010 at 12:24 AM, Coen van Hasselt wrote:
> 
> > Hello,
> >
> > I have a data.frame with the following format:
> >
> > > head(clin2)
> >     Study Subject Type  Obs Cycle Day       Date  Time
> > 1 A001101   10108  ALB 44.0    98   1 2004-03-11 14:26
> > 2 A001101   10108  ALP 95.0    98   1 2004-03-11 14:26
> > 3 A001101   10108  ALT 61.0    98   1 2004-03-11 14:26
> > 5 A001101   10108  AST 33.0    98   1 2004-03-11 14:26
> >
> > I want to transform this data.frame so that I have "Obs" columns for
> > each "Type". The full dataset is 45000 rows long. For a short subset
> > of 100 rows, reshaping takes 0.2 seconds, and produces what I want.
> > All columns are either numeric or character format (incl. date/time).
> >
> > > reshape(clin2, v.names="Obs", timevar="Type", direction="wide",
> >           idvar=c("Study","Subject","Cycle","Day","Date","Time"))
> >       Study Subject Cycle Day       Date  Time Obs.ALB Obs.ALP Obs.ALT Obs.AST
> > 1   A001101   10108    98   1 2004-03-11 14:26      44      95      61      33
> > 11  A001101   10108     1   1 2004-03-12 14:01      41      85      39      33
> > [...]

Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :

2010-09-02 Thread Duncan Murdoch

On 02/09/2010 2:29 AM, raje...@cse.iitm.ac.in wrote:

Hi,

I've built my own package on Windows, and when I run R CMD check
Package-Name I get:

* install options are ' --no-html'
* installing *source* package 'AceTest' ...
** libs
  making DLL ...
g++ ...etc. 
installing to 

  ... done
** R
** preparing package for lazy loading
** help
Warning: ./man/AceTest-package.Rd:34: All text must be in a section
Warning: ./man/AceTest-package.Rd:35: All text must be in a section
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
Error in inDL(x, as.logical(local), as.logical(now), ...) : 
  unable to load shared library 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest/libs/AceTest.dll':

  LoadLibrary failure:  The specified module could not be found.
ERROR: loading failed


The message "The specified module could not be found" comes from 
Windows.  It probably means that your dll depends on some other dll and 
the other one is unavailable, but it might mean that AceTest.dll itself 
can't be loaded (permission problems, defective dll, etc.) In some cases 
Windows will pop up a dialog box with more information than that skimpy 
message, e.g. if you install the package and try to run library(AceTest) 
 from within Rgui.


Duncan Murdoch



* removing 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest'


can someone point out what went wrong?


Re: [R] reshape to wide format takes extremely long

2010-09-02 Thread carslaw

Picking up on Thierry's example: I don't think you need any function,
because you are just reshaping (not aggregating). Therefore:

bigtab2 <- cast(data = big, study + subject + cycle + day ~ type,
                value = "obs")
> head(bigtab2)
  study subject cycle day ALB ALP ALT AST
1     1       1     1   1  66  71  83  76
2     1       1     1   2  66  87  78  58
3     1       1     1   3  72  84  78  61
4     1       1     1   4  72  63  68  69
5     1       1     1   5  64  68  89  89
6     1       1     2   1  78  65  65  76

system.time(bigtab2 <- cast(data = big, study + subject + cycle + day
                            ~ type, value = "obs"))
   user  system elapsed
  0.760   0.000   0.782

david



[R] R graphics: Add labels to stacked bar chart

2010-09-02 Thread Jens Oldeland

Hi,

I am looking for a way to add labels, i.e. absolute values, into a 
stacked bar chart using the basic plot functions of R. The labels should 
be inside the stacked bars.


For example,

###  I have this dataset

height = cbind(x = c(465, 91) / 465 * 100,
  y = c(840, 200) / 840 * 100,
  z = c(37, 17) / 37 * 100)

### and use this plot
barplot(height,
   beside = FALSE,
   horiz=T,
   col = c(2, 3)
   )

how can I put the values from height within the respective stacked bars?


Thank you!

--
+
Dipl.Biol. Jens Oldeland
Biodiversity of Plants
Biocentre Klein Flottbek and Botanical Garden
University of Hamburg 
Ohnhorststr. 18

22609 Hamburg,
Germany

Tel:0049-(0)40-42816-407
Fax:0049-(0)40-42816-543
Mail:   oldel...@botanik.uni-hamburg.de
   oldel...@gmx.de  (for attachments > 2mb!!)
Skype:  jens.oldeland
http://www.biologie.uni-hamburg.de/bzf/fbda005/fbda005.htm
http://jensoldeland.wordpress.com
+



Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :

2010-09-02 Thread Duncan Murdoch

On 02/09/2010 6:46 AM, raje...@cse.iitm.ac.in wrote:
It is dependent on another dll but it did not give compilation errors. It seemed to link fine at that point. Why does it have a problem at this stage? 


Windows needs to be able to find the other DLL at load time.  It will 
find it if it's in the same directory or on the PATH.
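
(A sketch of one fix from within R - "C:/path/to/otherdll" is a placeholder
for wherever the dependent DLL actually lives:

  Sys.setenv(PATH = paste("C:/path/to/otherdll", Sys.getenv("PATH"), sep = ";"))
  library(AceTest)   # Windows can now resolve the dependency at load time

Copying the dependency next to AceTest.dll in the package's libs directory
also works.)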


Duncan Murdoch




From: "Duncan Murdoch"  
To: raje...@cse.iitm.ac.in 
Cc: "r-help"  
Sent: Thursday, September 2, 2010 4:05:14 PM 
Subject: Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) : 

On 02/09/2010 2:29 AM, raje...@cse.iitm.ac.in wrote: 
Hi, 

I've built my own package in windows and when I run R CMD check Package-Name I get, 

* install options are ' --no-html' 
* installing *source* package 'AceTest' ... 
** libs 
making DLL ... 
g++ ...etc. 
installing to  
... done 
** R 
** preparing package for lazy loading 
** help 
Warning: ./man/AceTest-package.Rd:34: All text must be in a section 
Warning: ./man/AceTest-package.Rd:35: All text must be in a section 
*** installing help indices 
** building package indices ... 
** testing if installed package can be loaded 
Error in inDL(x, as.logical(local), as.logical(now), ...) : 
unable to load shared library 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest/libs/AceTest.dll': 
LoadLibrary failure: The specified module could not be found. 
ERROR: loading failed 


The message "The specified module could not be found" comes from 
Windows. It probably means that your dll depends on some other dll and 
the other one is unavailable, but it might mean that AceTest.dll itself 
can't be loaded (permission problems, defective dll, etc.) In some cases 
Windows will pop up a dialog box with more information than that skimpy 
message, e.g. if you install the package and try to run library(AceTest) 
from within Rgui. 

Duncan Murdoch 



* removing 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest' 



can someone point out what went wrong? 








[R] draw a graph of the semi-partial R-squared /CAH

2010-09-02 Thread Yan leeuw
Dear r-help,

I am using CAH (hierarchical cluster analysis). I would like to cut my
dendrogram. What is the R command to draw a graph of the semi-partial
R-squared?

Best Regards



Re: [R] R CMD check Package(Windows): Error in inDL(x, as.logical(local), as.logical(now), ...) :

2010-09-02 Thread raje...@cse.iitm.ac.in
It is dependent on another dll but it did not give compilation errors. It 
seemed to link fine at that point. Why does it have a problem at this stage? 


From: "Duncan Murdoch"  
To: raje...@cse.iitm.ac.in 
Cc: "r-help"  
Sent: Thursday, September 2, 2010 4:05:14 PM 
Subject: Re: [R] R CMD check Package(Windows): Error in inDL(x, 
as.logical(local), as.logical(now), ...) : 

On 02/09/2010 2:29 AM, raje...@cse.iitm.ac.in wrote: 
> Hi, 
> 
> I've built my own package in windows and when I run R CMD check Package-Name 
> I get, 
> 
> * install options are ' --no-html' 
> * installing *source* package 'AceTest' ... 
> ** libs 
> making DLL ... 
> g++ ...etc. 
> installing to  
> ... done 
> ** R 
> ** preparing package for lazy loading 
> ** help 
> Warning: ./man/AceTest-package.Rd:34: All text must be in a section 
> Warning: ./man/AceTest-package.Rd:35: All text must be in a section 
> *** installing help indices 
> ** building package indices ... 
> ** testing if installed package can be loaded 
> Error in inDL(x, as.logical(local), as.logical(now), ...) : 
> unable to load shared library 
> 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest/libs/AceTest.dll': 
> LoadLibrary failure: The specified module could not be found. 
> ERROR: loading failed 

The message "The specified module could not be found" comes from 
Windows. It probably means that your dll depends on some other dll and 
the other one is unavailable, but it might mean that AceTest.dll itself 
can't be loaded (permission problems, defective dll, etc.) In some cases 
Windows will pop up a dialog box with more information than that skimpy 
message, e.g. if you install the package and try to run library(AceTest) 
from within Rgui. 

Duncan Murdoch 


> * removing 'H:/RTick/Project/Client/AceTest.Rcheck/AceTest' 
> 
> 
> can someone point out what went wrong? 
> 




[R] Error: could not find function "ad.test"

2010-09-02 Thread DrCJones

Hi,
I'm trying to run an Anderson-Darling test for normality on a given
variable 'Y':
ad.test(Y)

I think I need the 'nortest' package, but since it does not appear in
any of the Ubuntu repositories for 2.10.1, I am wondering if it goes by
the name of something else now?

Thanks



Re: [R] R graphics: Add labels to stacked bar chart

2010-09-02 Thread Jim Lemon

On 09/02/2010 08:50 PM, Jens Oldeland wrote:
> ...
> I am looking for a way to add labels, i.e. absolute values, into a
> stacked bar chart using the basic plot functions of R. The labels
> should be inside the stacked bars.

barpos <- barplot(height, beside = FALSE,
                  horiz = TRUE, col = c(2, 3))  # barplot() returns the bar centres
library(plotrix)
# x = centre of each stacked segment, y = bar centre, labels = the heights
boxed.labels(c(height[1,]/2, height[1,] + height[2,]/2),
             rep(barpos, 2), c(height[1,], round(height[2,], 1)))

Jim



[R] nlme formula from model specification

2010-09-02 Thread Mikkel Meyer Andersen
Dear R-community,

I'm analysing some noise using the nlme package. I'm writing in order
to get my usage of lme verified.

In practice, a number of samples have been processed by a machine
measuring the same signal at four different channels. I want to model
the noise. I have taken the noise part (the signal runs from position 1
to 3500; after that there is only noise).

My data looks like this:
channel.matrix:
  pos channel0 channel1 channel2 channel3 samplenumber
   1 350183   1211
   2 3502370   141
   3 350391   1331
   4 3504373   141
   5 350565451
   6 350670   1601
...
 495 399552991
 496 3996246   101
 497 399732771
 498 399824391
 499 3999316   111
 500 400003671
2301 350114392
2302 3502334   132
2303 350341852
2304 350431   1022
2305 350523582
2306 350605822
...

The model is
channel0 ~ alpha_i + eps_{i, j} + channel1 + channel2 + channel3
where i is the sample number, j is the position, and:
  alpha_i:                 fixed effect for each samplenumber
  eps_{i, j}:              random effect, here with correlation
                           structure as AR(1)
  channel1, ..., channel3: fixed effects, depending on neither
                           samplenumber nor position

(And then afterwards I would model channel1 ~ ... + channel2 + channel3 etc.)

I then use this function call:
channel.matrix.grouped <- groupedData(channel0 ~ pos | samplenumber,
  data = channel.matrix)

fit <- lme(channel0 ~ pos + samplenumber + channel1 + channel2 + channel3,
  random = ~ pos | samplenumber,
  correlation = corAR1(value = 0.5, form = ~ pos | samplenumber),
  data = channel.matrix.grouped)

Is that the right way to express the model in (n)lme-notation?

Cheers, Mikkel.



Re: [R] reshape to wide format takes extremely long

2010-09-02 Thread Dennis Murphy
Hi:

Thanks, David and Thierry. I knew what I did was inefficient, but I'm not
yet very adept with cast(). Thanks for the lesson!
The time is less than half without the fun = mean argument, too. On my
system, David's call took 2.06 s elapsed; Thierry's took 4.88 s. Both are
big improvements over my band-aid attempt.

Regards,
Dennis

On Thu, Sep 2, 2010 at 3:36 AM, carslaw  wrote:

>
> Picking up on Thierry's example: I don't think you need any function,
> because you are just reshaping (not aggregating). Therefore:
>
> bigtab2 <- cast(data = big, study + subject + cycle + day ~ type,
>                 value = "obs")
> > head(bigtab2)
>   study subject cycle day ALB ALP ALT AST
> 1     1       1     1   1  66  71  83  76
> 2     1       1     1   2  66  87  78  58
> 3     1       1     1   3  72  84  78  61
> 4     1       1     1   4  72  63  68  69
> 5     1       1     1   5  64  68  89  89
> 6     1       1     2   1  78  65  65  76
>
> system.time(bigtab2 <- cast(data = big, study + subject + cycle + day
>                             ~ type, value = "obs"))
>   user  system elapsed
>  0.760   0.000   0.782
>
> david



Re: [R] Error: could not find function "ad.test"

2010-09-02 Thread Dennis Murphy
Hi:

One option to search for functions in R is to download the sos package:

library(sos)
findFn('Anderson-Darling')

On my system (Win 7), it found 51 matches. The two likeliest packages are
nortest and ADGofTest. To answer your question, nortest still exists on
CRAN. I can't comment on Ubuntu, but it's possible you may have to install
it manually from the gzipped sources. I'm sure that others with hands-on
Ubuntu experience can provide a more complete answer.
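
(A sketch of the direct route - nortest is an ordinary CRAN package, so
installing from within R should also work on Ubuntu:

  install.packages("nortest")
  library(nortest)
  ad.test(rnorm(100))   # Anderson-Darling normality test

After that, ad.test(Y) works as in the original question.)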

HTH,
Dennis

On Thu, Sep 2, 2010 at 2:42 AM, DrCJones  wrote:

>
> Hi,
> I'm trying to run an Anderson-Darling test for normality on a given
> variable 'Y':
> ad.test(Y)
>
> I think I need the 'nortest' package, but since it does not appear in
> any of the Ubuntu repositories for 2.10.1, I am wondering if it goes
> by the name of something else now?
>
> Thanks



Re: [R] nlme formula from model specification

2010-09-02 Thread ONKELINX, Thierry
Dear Mikkel,

You need to do some reading on terminology.

In your model the fixed effects are channel1, channel2 and channel3;
samplenumber is a random effect, and eps_{i,j} is the within-group
error term.

The model you described has the notation below. You do not need to
create the grouped-data structure.

lme(channel0 ~ pos + samplenumber + channel1 + channel2 + channel3,
   random = ~ 1 | samplenumber,
   correlation = corAR1(value = 0.5, form = ~ pos | samplenumber),
   data = channel.matrix) 

HTH,

Thierry

PS There is a dedicated mailing list for mixed models:
R-sig-mixed-models



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Mikkel Meyer Andersen
> Sent: Thursday, 2 September 2010 13:30
> To: r-help@r-project.org
> Subject: [R] nlme formula from model specification
> 
> Dear R-community,
> 
> I'm analysing some noise using the nlme package. I'm writing 
> in order to get my usage of lme verified.
> 
> In practice, a number of samples have been processed by a 
> machine measuring the same signal at four different channels. 
> I want to model the noise. I have taken the noise part (the signal 
> runs from position 1 to 3500; after that there is only noise).
> 
> My data looks like this:
> channel.matrix:
>   pos channel0 channel1 channel2 channel3 samplenumber
>1 350183   1211
>2 3502370   141
>3 350391   1331
>4 3504373   141
>5 350565451
>6 350670   1601
> ...
>  495 399552991
>  496 3996246   101
>  497 399732771
>  498 399824391
>  499 3999316   111
>  500 400003671
> 2301 350114392
> 2302 3502334   132
> 2303 350341852
> 2304 350431   1022
> 2305 350523582
> 2306 350605822
> ...
> 
> The model is
> channel0 ~ alpha_i + eps_{i, j} + channel1 + channel2 + channel3
> where i is the sample number, j is the position, and:
>   alpha_i:                 fixed effect for each samplenumber
>   eps_{i, j}:              random effect, here with correlation
>                            structure as AR(1)
>   channel1, ..., channel3: fixed effects, depending on neither
>                            samplenumber nor position
> 
> (And then afterwards I would model channel1 ~ ... + channel2 
> + channel3 etc.)
> 
> I then use this function call:
> channel.matrix.grouped <- groupedData(channel0 ~ pos | samplenumber,
>   data = channel.matrix)
> 
> fit <- lme(channel0 ~ pos + samplenumber + channel1 + 
> channel2 + channel3,
>   random = ~ pos | samplenumber,
>   correlation = corAR1(value = 0.5, form = ~ pos | samplenumber),
>   data = channel.matrix.grouped)
> 
> Is that the right way to express the model in (n)lme-notation?
> 
> Cheers, Mikkel.
> 


[R] how to cluster vectors of factors

2010-09-02 Thread tueken

Hello all,

I wonder what I can use to cluster vectors composed of several factors.
Say around 30 different factors make up a vector: if a factor is
present it is encoded as 1, if not present it is encoded as 0.
I was thinking of using hierarchical clustering. As I understand it,
the distance between two vectors is calculated with the Euclidean
distance function, but I don't think this distance would separate the
data correctly, because two vectors with different compositions could
end up at a similar distance to a third vector.
I hope someone can give me a clue!
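
(A sketch of one standard approach - for 0/1 presence/absence vectors an
asymmetric binary (Jaccard-type) distance is often more sensible than
Euclidean, and base R supports it directly:

  set.seed(1)
  m <- matrix(rbinom(10 * 30, 1, 0.3), nrow = 10)  # 10 vectors, 30 binary factors
  d <- dist(m, method = "binary")                  # Jaccard-type distance
  plot(hclust(d, method = "average"))              # hierarchical clustering

daisy() in the cluster package offers further metrics for binary and mixed
data.)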

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there any package or function perform stepwise variable selection under GEE method?

2010-09-02 Thread Frank Harrell


The absence of stepwise methods works to your advantage, as these 
yield invalid statistical inference and inflated regression 
coefficients.


Frank E Harrell Jr   Professor and Chairman   School of Medicine
                     Department of Biostatistics   Vanderbilt University

On Thu, 2 Sep 2010, 黃敏慈 wrote:


Hi,

I use library(gee), library(geepack), and library(yags) to perform GEE data
analysis, but none of them can do variable selection!

Both step and stepAIC can do variable selection based on the AIC criterion
under linear regression and glm, but they do not work when the model is
based on GEE.

I want to ask whether any variable selection function or package under the
GEE model is available now?
Thanks!

Best,


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nlme formula from model specification

2010-09-02 Thread Mikkel Meyer Andersen
Dear Thierry,

Thanks for the quick answer. I'm moving this to r-sig-mixed-models
(but also posting on r-help to notify).

I reserved "Mixed-effects models in S and S-PLUS" by Pinheiro and
Bates, New York : Springer, 2000. Do you know any other good
references?

Cheers, Mikkel.

2010/9/2 ONKELINX, Thierry :
> Dear Mikkel,
>
> You need to do some reading on terminology.
>
In your model the fixed effects are channel1, channel2 and channel3;
samplenumber is a random effect, and the error term is just the residual error.
>
> The model you described has the notation below. You do not need to
> create the grouped data structure.
>
> lme(channel0 ~ pos + samplenumber + channel1 + channel2 + channel3,
>   random = ~ 1 | samplenumber,
>   correlation = corAR1(value = 0.5, form = ~ pos | samplenumber),
>   data = channel.matrix)
>
> HTH,
>
> Thierry
>
> PS There is a dedicated mailing list for mixed models:
> R-sig-mixed-models
>
> 
> 
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek
> team Biometrie & Kwaliteitszorg
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> Research Institute for Nature and Forest
> team Biometrics & Quality Assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> tel. + 32 54/436 185
> thierry.onkel...@inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to
> say what the experiment died of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of
> data.
> ~ John Tukey
>
>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On Behalf Of Mikkel Meyer Andersen
>> Sent: Thursday 2 September 2010 13:30
>> To: r-help@r-project.org
>> Subject: [R] nlme formula from model specification
>>
>> Dear R-community,
>>
>> I'm analysing some noise using the nlme-package. I'm writing
>> in order to get my usage of lme verified.
>>
>> In practice, a number of samples have been processed by a
>> machine measuring the same signal at four different channels.
>> I want to model the noise. I have taken the noise (the signal
>> is from position 1 to 3500, and after that there is only noise).
>>
>> My data looks like this:
>> channel.matrix:
>>       pos channel0 channel1 channel2 channel3 samplenumber
>>    1 3501        8        3       12        1            1
>>    2 3502        3        7        0       14            1
>>    3 3503        9        1       13        3            1
>>    4 3504        3        7        3       14            1
>>    5 3505        6        5        4        5            1
>>    6 3506        7        0       16        0            1
>> ...
>>  495 3995        5        2        9        9            1
>>  496 3996        2        4        6       10            1
>>  497 3997        3        2        7        7            1
>>  498 3998        2        4        3        9            1
>>  499 3999        3        1        6       11            1
>>  500 4000        0        3        6        7            1
>> 2301 3501        1        4        3        9            2
>> 2302 3502        3        3        4       13            2
>> 2303 3503        4        1        8        5            2
>> 2304 3504        3        1       10        2            2
>> 2305 3505        2        3        5        8            2
>> 2306 3506        0        5        8        2            2
>> ...
>>
>> The model is
>> channel0 ~ alpha_i + eps_{i, j} + channel1 + channel2 +
>> channel3 where i is sample number, j is position, and:
>>   alpha_i:                 fixed effect for each samplenumber
>>   eps_{i, j}:              random effect, here with correlation
>> structure as AR(1)
>>   channel1, ..., channel3: fixed effect for each channel not
>> depending on
>>                            samplenumber nor position
>>
>> (And then afterwards I would model channel1 ~ ... + channel2
>> + channel3 etc.)
>>
>> I then use this function call:
>> channel.matrix.grouped <- groupedData(channel0 ~ pos | samplenumber,
>>   data = channel.matrix)
>>
>> fit <- lme(channel0 ~ pos + samplenumber + channel1 +
>> channel2 + channel3,
>>   random = ~ pos | samplenumber,
>>   correlation = corAR1(value = 0.5, form = ~ pos | samplenumber),
>>   data = channel.matrix.grouped)
>>
>> Is that the right way to express the model in (n)lme-notation?
>>
>> Cheers, Mikkel.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

Re: [R] general question on binomial test / sign test

2010-09-02 Thread Kay Cecil Cichini


thanks a lot for the elaborations.

your explanations made clear to me that either
binom.test(1,1,0.5,"two-sided") or binom.test(0,1,0.5) giving a
p-value of 1 simply indicates I have absolutely no assurance to reject H0.


considering binom.test(0,1,0.5,alternative="greater") and
binom.test(1,1,0.5,alternative="less"), where I get a p-value of 1 and
0.5 respectively - am I right in stating that for the first estimate
0/1 I have no assurance at all for rejection of H0, and for the second
estimate = 1/1 I have the same chance of being wrong in either rejecting
or keeping H0?


many thanks,
kay



Quoting ted.hard...@manchester.ac.uk:


You state: "in reverse the p-value of 1 says that I can be 100% sure
that the estimate of 0.5 is true". This is where your logic about
significance tests goes wrong.

The general logic of a significance test is that a test statistic
(say T) is chosen such that large values represent a discrepancy
between possible data and the hypothesis under test. When you
have the data, T evaluates to a value (say t0). The null hypothesis
(NH) implies a distribution for the statistic T if the NH is true.

Then the value of Prob(T >= t0 | NH) can be calculated. If this is
small, then the probability of obtaining data at least as discrepant
as the data you did obtain is small; if sufficiently small, then
the conjunction of NH and your data (as assessed by the statistic T)
is so unlikely that you can decide to not believe that it is possible.
If you so decide, then you reject the NH because the data are so
discrepant that you can't believe it!

This is on the same lines as the "reductio ad absurdum" in classical
logic: "An hypothesis A implies that an outcome B must occur. But I
have observed that B did not occur. Therefore A cannot be true."

But it does not follow that, if you observe that B did occur
(which is *consistent* with A), then A must be true. A could be
false, yet B still occur -- the only basis on which occurrence
of B could *prove* that A must be true is when you have the prior
information that B will occur *if and only if* A is true. In the
reductio ad absurdum, and in the parallel logic of significance
testing, all you have is "B will occur *if* A is true". The "only if"
part is not there. So you cannot deduce that "A is true" from
the observation that "B occurred", since what you have to start with
allows B to occur if A is false (i.e. "B will occur *if* A is true"
says nothing about what may or may not happen if A is false).

So, in your single toss of a coin, it is true that "I will observe
either 'succ' or 'fail' if the coin is fair". But (as in the above)
you cannot deduce that "the coin is fair" if you observe either
'succ' or 'fail', since it is possible (indeed necessary) that you
obtain such an observation if the coin is not fair (even if the
coin is the same, either 'succ' or 'fail', on both sides, therefore
completely unfair). This is an attempt to expand Greg Snow's reply!

Your 2-sided test takes the form T=1 if either outcome='succ' or
outcome='fail'. And that is the only possible value for T since
no other outcome is possible. Hence Prob(T==1) = 1 whether the coin
is fair or not. It is not possible for such data to discriminate
between a fair and an unfair coin.

And, as explained above, a P-value of 1 cannot prove that the
null hypothesis is true. All that is possible with a significance
test is that a small P-value can be taken as evidence that the
NH is false.
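
A quick numeric illustration of that point:

# one toss: both possible outcomes are equally consistent with a fair
# coin, so the two-sided p-value is 1 whatever is observed
binom.test(0, 1, p = 0.5)$p.value   # 1
binom.test(1, 1, p = 0.5)$p.value   # 1
# with more tosses the test can discriminate:
binom.test(9, 10, p = 0.5)$p.value  # about 0.021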

Hoping this helps!
Ted.

On 02-Sep-10 07:41:17, Kay Cecil Cichini wrote:

I test the null that the coin is fair (p(succ) = p(fail) = 0.5) with
one trial and get a p-value of 1. Actually I want to prove the
alternative H that the estimate is different from 0.5, which certainly
cannot be proven here. But in reverse the p-value of 1 says that I can
be 100% sure that the estimate of 0.5 is true (??) - that's the point
that astonishes me.

thanks if anybody could clarify this for me,
kay

Quoting Greg Snow :


Try thinking this one through from first principles, you are
essentially saying that your null hypothesis is that you are
flipping a fair coin and you want to do a 2-tailed test.  You then
flip the coin exactly once, what do you expect to happen?  The
p-value of 1 just means that what you saw was perfectly consistent
with what is predicted to happen flipping a single time.

Does that help?

If not, please explain what you mean a little better.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Kay Cichini
Sent: Wednesday, September 01, 2010 3:06 PM
To: r-help@r-project.org
Subject: [R] general question on binomial test / sign test


hello,

i did several binomial tests and noticed for one sparse dataset that
binom.test(1,1,0.5) gives a p-value of 1 for the null, which I can't quite
grasp. that would say that th

Re: [R] how to represent error bar in a plot legend?

2010-09-02 Thread Josef . Kardos
I think I found a simple solution, although it requires some tinkering to 
find the x,y coordinates of the plot region to place the text...

barplot( )   # the barplot call itself, arguments omitted in the original
text(x = 2.9, y = 0.43, srt = 90, labels = "H", cex = 1.5, col = "blue")
# srt rotates the "H" character so that it resembles an error bar
text(x = 3.5, y = 0.432, labels = "95% C.I.", cex = 1.1)
rect(2.62, 0.41, 3.9, 0.45)   # draws box around text
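
A self-contained version of the same trick (bar heights and legend
coordinates here are made up for illustration):

bp <- barplot(c(0.30, 0.42, 0.35), ylim = c(0, 0.5), names.arg = c("A", "B", "C"))
arrows(bp, c(0.27, 0.39, 0.32), bp, c(0.33, 0.45, 0.38),
       angle = 90, code = 3, length = 0.05)   # the error bars themselves
text(x = 2.9, y = 0.43, srt = 90, labels = "H", cex = 1.5, col = "blue")
text(x = 3.5, y = 0.432, labels = "95% C.I.", cex = 1.1)
rect(2.62, 0.41, 3.9, 0.45)   # box around the makeshift legend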


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lower triangle of the correlation matrix with xtable

2010-09-02 Thread Olga Lyashevska
Dear all,

mydata<-data.frame(x1=c(1,4,6),x2=c(3,1,2),x3=c(2,1,3))
cor(mydata)
           x1         x2        x3
x1  1.0000000 -0.5960396 0.3973597
x2 -0.5960396  1.0000000 0.5000000
x3  0.3973597  0.5000000 1.0000000

I wonder if it is possible to fill only lower triangle of this
correlation matrix? Using 'dist' doesn't seem to be useful as it doesnt
allow to convert this table with xtable. 

print(xtable(cor(mydata),digits=3))

Any ideas?
Cheers,
Olga

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Kolmogorov Smirnov p-values

2010-09-02 Thread Samsiddhi Bhattacharjee
I was just testing out ks.test()

> y <- runif(1, min=0, max=1)
> ks.test(y, runif, min=0, max=1, alternative="greater")

One-sample Kolmogorov-Smirnov test

data:  y
D^+ = 0.9761, p-value < 2.2e-16
alternative hypothesis: the CDF of x lies above the null hypothesis
>

It seems that every time I run it, I get a highly significant p-value
(for two-sided or one-sided, exact=TRUE or FALSE). Can anybody tell me
what is going on?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lower triangle of the correlation matrix with xtable

2010-09-02 Thread Sarah Goslee
Like this?

> mydata<-data.frame(x1=c(1,4,6),x2=c(3,1,2),x3=c(2,1,3))
> as.dist(cor(mydata))
           x1         x2
x2 -0.5960396
x3  0.3973597  0.5000000

Sarah

On Thu, Sep 2, 2010 at 9:51 AM, Olga Lyashevska  wrote:
> Dear all,
>
> mydata<-data.frame(x1=c(1,4,6),x2=c(3,1,2),x3=c(2,1,3))
> cor(mydata)
>           x1         x2        x3
> x1  1.0000000 -0.5960396 0.3973597
> x2 -0.5960396  1.0000000 0.5000000
> x3  0.3973597  0.5000000 1.0000000
>
> I wonder if it is possible to fill only lower triangle of this
> correlation matrix? Using 'dist' doesn't seem to be useful as it doesnt
> allow to convert this table with xtable.
>
> print(xtable(cor(mydata),digits=3))
>
> Any ideas?
> Cheers,
> Olga
>



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov Smirnov p-values

2010-09-02 Thread Alain Guillet

 Hi,

Are you sure you don't want to do   ks.test(y, punif, min=0, max=1, 
alternative="greater") instead of what you tried?


Alain


On 02-Sep-10 15:52, Samsiddhi Bhattacharjee wrote:

ks.test(y, runif, min=0, max=1, alternative="greater")


--
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
Bureau c.316
Voie du Roman Pays, 20
B-1348 Louvain-la-Neuve
Belgium

tel: +32 10 47 30 50

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lower triangle of the correlation matrix with xtable

2010-09-02 Thread Steve_Friedman
try lower.tri

and see

??lower.tri

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147


   
From: Olga Lyashevska
Sent by: r-help-boun...@r-project.org
To: R mailing list
Date: 09/02/2010 09:51 AM
Subject: [R] lower triangle of the correlation matrix with xtable




Dear all,

mydata<-data.frame(x1=c(1,4,6),x2=c(3,1,2),x3=c(2,1,3))
cor(mydata)
   x1 x2x3
x1  1.000 -0.5960396 0.3973597
x2 -0.5960396  1.000 0.500
x3  0.3973597  0.500 1.000

I wonder if it is possible to fill only lower triangle of this
correlation matrix? Using 'dist' doesn't seem to be useful as it doesnt
allow to convert this table with xtable.

print(xtable(cor(mydata),digits=3))

Any ideas?
Cheers,
Olga

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kolmogorov Smirnov p-values

2010-09-02 Thread Samsiddhi Bhattacharjee
oops, sorry... really careless. thanks!

On Thu, Sep 2, 2010 at 10:03 AM, Alain Guillet
 wrote:
>  Hi,
>
> Are you sure you don't want to do   ks.test(y, punif, min=0, max=1,
> alternative="greater") instead of what you tried?
>
> Alain
>
>
> On 02-Sep-10 15:52, Samsiddhi Bhattacharjee wrote:
>>
>> ks.test(y, runif, min=0, max=1, alternative="greater")
>
> --
> Alain Guillet
> Statistician and Computer Scientist
>
> SMCS - IMMAQ - Université catholique de Louvain
> Bureau c.316
> Voie du Roman Pays, 20
> B-1348 Louvain-la-Neuve
> Belgium
>
> tel: +32 10 47 30 50
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lower triangle of the correlation matrix with xtable

2010-09-02 Thread Olga Lyashevska
if I try as.dist I get the following error:

On Thu, 2010-09-02 at 09:57 -0400, Sarah Goslee wrote:
> > mydata<-data.frame(x1=c(1,4,6),x2=c(3,1,2),x3=c(2,1,3))
> > as.dist(cor(mydata))
>            x1         x2
> x2 -0.5960396
> x3  0.3973597  0.5000000

print(xtable(as.dist(cor(mydata)),digits=3)) 
Error in UseMethod("xtable") : 
  no applicable method for 'xtable' applied to an object of class "dist"

Jorge's solution with upper.tri works fine!
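
That fix presumably looks something like this sketch: blank the upper
triangle and let xtable print the NAs as empty cells (its default):

library(xtable)
m <- cor(mydata)
m[upper.tri(m)] <- NA          # keep the lower triangle and the diagonal
print(xtable(m, digits = 3))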

Thanks everyone for your prompt response.

Cheers
Olga

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] general question on binomial test / sign test

2010-09-02 Thread Kay Cichini

I'd like to add that I actually wanted to test the location of differences
of paired samples coming from a non-normal, asymmetric distribution. The
alternative hypothesis was that negative differences occur in more than 0.5
of all cases. Thus I tested
binom.test(x=nr.diff.under.0, n=all.diffs, 0.5, alternative="greater").
Then this one of many tests for a sparse dataset came up where x=0 and
n=1. There I thought the H0 is that x is less than 0.5, and I then had my
trouble interpreting the p-value of 1.

best,
kay
Kay Cichini wrote:
> 
> 
> thanks a lot for the elaborations.
> 
> your explanations made clear to me that either
> binom.test(1,1,0.5,"two-sided") or binom.test(0,1,0.5) giving a
> p-value of 1 simply indicates I have absolutely no assurance to reject H0.
> 
> considering binom.test(0,1,0.5,alternative="greater") and
> binom.test(1,1,0.5,alternative="less"), where I get a p-value of 1 and
> 0.5 respectively - am I right in stating that for the first estimate
> 0/1 I have no assurance at all for rejection of H0, and for the second
> estimate = 1/1 I have the same chance of being wrong in either rejecting
> or keeping H0?
> 
> many thanks,
> kay
> 
> 
> 
> Quoting ted.hard...@manchester.ac.uk:
> 
>> You state: "in reverse the p-value of 1 says that I can be 100% sure
>> that the estimate of 0.5 is true". This is where your logic about
>> significance tests goes wrong.
>>
>> The general logic of a significance test is that a test statistic
>> (say T) is chosen such that large values represent a discrepancy
>> between possible data and the hypothesis under test. When you
>> have the data, T evaluates to a value (say t0). The null hypothesis
>> (NH) implies a distribution for the statistic T if the NH is true.
>>
>> Then the value of Prob(T >= t0 | NH) can be calculated. If this is
>> small, then the probability of obtaining data at least as discrepant
>> as the data you did obtain is small; if sufficiently small, then
>> the conjunction of NH and your data (as assessed by the statistic T)
>> is so unlikely that you can decide to not believe that it is possible.
>> If you so decide, then you reject the NH because the data are so
>> discrepant that you can't believe it!
>>
>> This is on the same lines as the "reductio ad absurdum" in classical
>> logic: "An hypothesis A implies that an outcome B must occur. But I
>> have observed that B did not occur. Therefore A cannot be true."
>>
>> But it does not follow that, if you observe that B did occur
>> (which is *consistent* with A), then A must be true. A could be
>> false, yet B still occur -- the only basis on which occurrence
>> of B could *prove* that A must be true is when you have the prior
>> information that B will occur *if and only if* A is true. In the
>> reductio ad absurdum, and in the parallel logic of significance
>> testing, all you have is "B will occur *if* A is true". The "only if"
>> part is not there. So you cannot deduce that "A is true" from
>> the observation that "B occurred", since what you have to start with
>> allows B to occur if A is false (i.e. "B will occur *if* A is true"
>> says nothing about what may or may not happen if A is false).
>>
>> So, in your single toss of a coin, it is true that "I will observe
>> either 'succ' or 'fail' if the coin is fair". But (as in the above)
>> you cannot deduce that "the coin is fair" if you observe either
>> 'succ' or 'fail', since it is possible (indeed necessary) that you
>> obtain such an observation if the coin is not fair (even if the
>> coin is the same, either 'succ' or 'fail', on both sides, therefore
>> completely unfair). This is an attempt to expand Greg Snow's reply!
>>
>> Your 2-sided test takes the form T=1 if either outcome='succ' or
>> outcome='fail'. And that is the only possible value for T since
>> no other outcome is possible. Hence Prob(T==1) = 1 whether the coin
>> is fair or not. It is not possible for such data to discriminate
>> between a fair and an unfair coin.
>>
>> And, as explained above, a P-value of 1 cannot prove that the
>> null hypothesis is true. All that is possible with a significance
>> test is that a small P-value can be taken as evidence that the
>> NH is false.
>>
>> Hoping this helps!
>> Ted.
>>
>> On 02-Sep-10 07:41:17, Kay Cecil Cichini wrote:
>>> I test the null that the coin is fair (p(succ) = p(fail) = 0.5) with
>>> one trial and get a p-value of 1. Actually I want to prove the
>>> alternative H that the estimate is different from 0.5, which certainly
>>> cannot be proven here. But in reverse the p-value of 1 says that I can
>>> be 100% sure that the estimate of 0.5 is true (??) - that's the point
>>> that astonishes me.
>>>
>>> thanks if anybody could clarify this for me,
>>> kay
>>>
>>> Quoting Greg Snow :
>>>
 Try thinking this one through from first principles, you are
 essentially saying that your null hypothesis is that you are
 flipping a fair coin and you want to do a 2-tailed test.  You then
 flip the coin exactly once, what do you expect to happen? ...

[R] date

2010-09-02 Thread Dunia Scheid
Hello all,

I have 2 strings representing the start and end values of a date and
time.
For example,
time1 <- c("21/04/2005","23/05/2005","11/04/2005")
time2 <- c("15/07/2009", "03/06/2008", "15/10/2005")
as.difftime(time1,time2)
Time differences in secs
[1] NA NA NA
attr(,"tzone")
[1] ""

How can I calculate the difference between these 2 strings?

Regards,
Dunia


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] star models

2010-09-02 Thread Setlhare Lekgatlhamang
Hi,
Have you gotten help on this?

Lexi

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of ma...@unizar.es
Sent: Saturday, August 28, 2010 5:08 PM
To: r-help@r-project.org
Subject: [R] star models

Hi,
I am trying to implement a STAR model, but I have some problems.
I am following the instructions for the model, which are at:

http://bm2.genes.nig.ac.jp/RGM2/R_current/library/tsDyn/man/star.html

  which come from:

http://bm2.genes.nig.ac.jp/RGM2/pkg.php?p=tsDyn

The model is:
star(x, m=2, noRegimes, d = 1, steps = d, series, rob = FALSE, mTh,  
thDelay, thVar, sig=0.05, trace=TRUE, control=list(), ...)

I have implemented the example:
mod.star <- star(log10(lynx), m=2, mTh=c(0,1), control=list(maxit=3000))
mod.star

However, when I use the variables from my data.frame, R does not find
them, so I have tried to create a matrix with my data variables and to
fit the model with the matrix, but I cannot create the matrix either.

Please, could someone help me to implement this model?

Thanks.
  Mercedes.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor

2010-09-02 Thread stephenb

sorry to jump in late, but I am doing similar things now and was browsing.

IMHO anova is not appropriate here. It applies when the richer model has p
more variables than the simpler model, which is not the case here: the
competing models use different variables.

You are left with information criteria (IC).

By transforming a continuous variable into a categorical one you are
smoothing, which is the idea of a GAM. If you look at what GAMs offer you
may find better approximations to f(age), as well as tools for testing
among different f(age) transformations.

regards.
S.
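
For instance (a sketch using the survival package's built-in lung data,
not the model from this thread), a penalized spline gives a smooth f(age)
inside coxph, and the printout separates its linear and nonlinear parts:

library(survival)
fit <- coxph(Surv(time, status) ~ pspline(age, df = 4) + sex, data = lung)
fit                                  # tests for linear and nonlinear f(age)
termplot(fit, term = 1, se = TRUE)   # plot the fitted f(age)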
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-get-design-matrix-tp891987p2524289.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] date

2010-09-02 Thread Ista Zahn
Hi Dunia,
You need to convert the character strings to Dates.

time1 <- as.Date(c("21/04/2005","23/05/2005","11/04/2005"), "%d/%m/%Y")
time2 <- as.Date(c("15/07/2009", "03/06/2008", "15/10/2005"), "%d/%m/%Y")

time2-time1
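
If particular units are wanted, difftime() can be called directly on the
converted dates:

difftime(time2, time1, units = "days")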


Best,
Ista

On Thu, Sep 2, 2010 at 10:32 AM, Dunia Scheid  wrote:
> Hello all,
>
> I have 2 strings representing the start and end values of a date and
> time.
> For example,
> time1 <- c("21/04/2005","23/05/2005","11/04/2005")
> time2 <- c("15/07/2009", "03/06/2008", "15/10/2005")
> as.difftime(time1,time2)
> Time differences in secs
> [1] NA NA NA
> attr(,"tzone")
> [1] ""
>
> How can I calculate the difference between these 2 strings?
>
> Regards,
> Dunia
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How using the weights argument in nls2?

2010-09-02 Thread Ivan Allaman

Good morning gentlemen!

How does one use a weighted model in nls2? The values from nls are
plausible, while the values from nls2 are not. I believe this discrepancy
is because I did not include the weights argument in nls2.

Here's an example:

MOISTURE <- c(28.41640,  28.47340,  29.05821,  28.52201,  30.92055, 
31.07901,  31.35840, 31.69617,  32.07168,  31.87296,  31.35525,  32.66118, 
33.23385,  32.72256,
32.57929,  32.12674,  52.35225,  52.77275,  64.90770,  64.90770,  85.23800,
84.43300,  68.96560,  68.41395,  70.82880,  71.18400,  96.13240,  96.07920,
95.35160,  94.71660,  87.59190,  88.63250,  89.78760,  90.17820, 88.46160,
87.10860, 94.86660,  94.51830,  75.79000,  76.98780, 144.70950, 143.88950,
111.58620, 112.71510, 120.85300, 121.43100, 116.34840, 114.87420, 195.35040,
191.36040, 265.35220, 267.25450, 227.13700, 228.78000, 238.37120, 242.70700,
299.54890, 291.04110, 220.09920, 219.82650, 236.79150, 243.70710, 208.79880,
208.12260, 417.21420, 429.59480, 360.91080, 371.66400, 357.72520, 360.53640,
383.82600, 383.82600, 434.02700, 432.57500, 440.56260, 438.32340, 468.69600,
469.82140, 497.93680, 497.17010)

YEARS <-  rep(c(86,109, 132, 158, 184, 220, 254, 276, 310,
337),c(8,8,8,8,8,8,8,8,8,8))

VARIANCE <- c(2.0879048 ,   2.0879048,2.0879048,2.0879048,   
2.0879048,
 2.0879048,2.0879048,2.0879048,0.3442724,0.3442724,
0.3442724,0.3442724,0.3442724,0.3442724,   0.3442724,
0.3442724,  151.9481710,  151.9481710,  151.9481710,  151.9481710,
151.9481710,  151.9481710,  151.9481710,  151.9481710,  115.3208995,
115.3208995,  115.3208995,  115.3208995,  115.3208995,  115.3208995,
115.3208995,  115.3208995,   51.9965027,   51.9965027,   51.9965027,
51.9965027,   51.9965027,   51.9965027,  51.9965027,   51.9965027,
180.0496045, 180.0496045,  180.0496045,  180.0496045,  180.0496045,
180.0496045,  180.0496045,  180.0496045,  791.3223240,  791.3223240,
791.3223240,  791.3223240,  791.3223240,  791.3223240,  791.3223240,
791.3223240, 1280.0179973, 1280.0179973, 1280.0179973, 1280.0179973,
1280.0179973, 1280.0179973, 1280.0179973, 1280.0179973,  728.9582154,
728.9582154,  728.9582154,  728.9582154,  728.9582154,  728.9582154,
728.9582154,  728.9582154,  752.4591144,  752.4591144,  752.4591144,
752.4591144,  752.4591144,  752.4591144,  752.4591144,  752.4591144)

test <- data.frame(YEARS,MOISTURE,VARIANCE)

mod.nls <- nls(MOISTURE ~ A/(1+B*exp(-k*YEARS)),
data = test, 
weights = 1/VARIANCE,
start = list(A=1500, B=200, k=0.03345),
control=list(maxiter = 500),
trace=TRUE)
summary(mod.nls)

Following the example from the package's PDF documentation:

st1 <- expand.grid(A = seq(0, 2000, len = 10),
B = seq(0, 500, len = 10), k = seq(-1, 10, len = 10))
st1

mod.nls2 <-nls2(MOISTURE ~ A/(1+B*exp(-k*YEARS)),
data = test, 
start = st1,
algorithm="brute-force")
mod.nls2
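
nls2 is written as a drop-in front end to nls, so (assuming the installed
version forwards nls arguments - check ?nls2) the weights should be
supplied the same way:

mod.nls2w <- nls2(MOISTURE ~ A/(1 + B*exp(-k*YEARS)),
                  data = test,
                  start = st1,
                  weights = 1/VARIANCE,
                  algorithm = "brute-force")
mod.nls2w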

I appreciate everyone's attention.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/How-using-the-weights-argument-in-nls2-tp2524328p2524328.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help on glm and optim

2010-09-02 Thread Zhang,Yanwei
Dear all,

I'm trying to use the "optim" function to replicate the results  from the "glm" 
using an example from the help page of "glm", but I could not get the "optim" 
function to work. Would you please point out where I did wrong? Thanks a lot.

The following is the code:

# Step 1: fit the glm
clotting <- data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
fit1 <- glm(lot1 ~ log(u), data=clotting, family=Gamma)

# Step 2: use optim
# define loglikelihood function to be maximized over
# theta is a vector of three parameters: intercept, coefficient for
# log(u), and dispersion parameter
loglik <- function(theta,data){
E <- 1/(theta[1]+theta[2]*log(data$u))
V <- theta[3]*E^2
loglik <- 
sum(dgamma(data$lot1,shape=1/theta[3],rate=1/(E*theta[3]),log=T))
return(loglik)
}

# use the glm result as initial values
theta <- c(as.vector(coef(fit1)),0.002446059)
fit2 <- optim(theta, loglik,  clotting, gr = NULL, hessian = TRUE,
control = list(fnscale = -1))

# Then I got the following error message:
Error in optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = 
list(fnscale = -1)) :
  non-finite finite-difference value [3]


Wayne (Yanwei) Zhang
Statistical Research
CNA
Phone: 312-822-6296
Email: yanwei.zh...@cna.com





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Linear models (lme4) - basic question

2010-09-02 Thread James Nead
Hi,

Sorry for a basic question on linear models.

I am trying to adjust raw data for both fixed and random effects. The data
that I output should account for these effects, so that I can use the
adjusted data for further analysis.

For example, if I have the blood sugar levels for 30 patients, and I know
that 'weight' is a fixed effect and that 'height' is a random effect, what
I'd want as output is blood sugar levels that have been adjusted for these
effects.



library("lme4")
sugar <- c(1:10,sample(1:100,20))
weight <- 1:30
height <- rep(sample(1:15,15,replace=F),2)

lm.adj <- lmer(sugar ~ weight + (1|height))

adjusted.sugar <- fitted(lm.adj)

=

I'm not sure if I'm doing this right...?

thanks!



  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on glm and optim

2010-09-02 Thread Thomas Lumley

On Thu, 2 Sep 2010, Zhang,Yanwei wrote:


Dear all,

I'm trying to use the "optim" function to replicate the results  from the "glm" using an example 
from the help page of "glm", but I could not get the "optim" function to work. Would you please 
point out where I did wrong? Thanks a lot.

The following is the code:

# Step 1: fit the glm
clotting <- data.frame(
   u = c(5,10,15,20,30,40,60,80,100),
   lot1 = c(118,58,42,35,27,25,21,19,18),
   lot2 = c(69,35,26,21,18,16,13,12,12))
fit1 <- glm(lot1 ~ log(u), data=clotting, family=Gamma)

# Step 2: use optim
# define loglikelihood function to be maximized over
# theta is a vector of three parameters: intercept, coefficient for
# log(u), and dispersion parameter
loglik <- function(theta,data){
   E <- 1/(theta[1]+theta[2]*log(data$u))
   V <- theta[3]*E^2
   loglik <- 
sum(dgamma(data$lot1,shape=1/theta[3],rate=1/(E*theta[3]),log=T))
   return(loglik)
}

# use the glm result as initial values
theta <- c(as.vector(coef(fit1)),0.002446059)
fit2 <- optim(theta, loglik,  clotting, gr = NULL, hessian = TRUE,
   control = list(fnscale = -1))

# Then I got the following error message:
Error in optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = 
list(fnscale = -1)) :
 non-finite finite-difference value [3]



If you use trace(loglik, tracer=quote(print(theta))) to trace the inputs to 
loglik() you will find that it is being called with negative values of theta[3] 
to get finite differences.   One fix is to reparametrize and use the log scale 
rather than the scale as a parameter.

   -thomas


Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
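
In code, that reparametrization might look like this (a sketch along the
lines of the suggestion above; theta[3] is replaced by its log so the
optimizer can move freely):

loglik2 <- function(theta, data) {
    E   <- 1/(theta[1] + theta[2]*log(data$u))
    phi <- exp(theta[3])                  # dispersion, kept positive
    sum(dgamma(data$lot1, shape = 1/phi, rate = 1/(E*phi), log = TRUE))
}
theta0 <- c(as.vector(coef(fit1)), log(0.002446059))
fit2 <- optim(theta0, loglik2, data = clotting, hessian = TRUE,
              control = list(fnscale = -1))
exp(fit2$par[3])                          # dispersion back on the original scale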

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on glm and optim

2010-09-02 Thread Zhang,Yanwei
Thomas, 
Thanks a lot. This solves my problem.  


Wayne (Yanwei) Zhang
Statistical Research 
CNA

-Original Message-
From: Thomas Lumley [mailto:tlum...@u.washington.edu] 
Sent: Thursday, September 02, 2010 11:24 AM
To: Zhang,Yanwei
Cc: r-help@r-project.org
Subject: Re: [R] Help on glm and optim

On Thu, 2 Sep 2010, Zhang,Yanwei wrote:

> Dear all,
>
> I'm trying to use the "optim" function to replicate the results  from the 
> "glm" using an example from the help page of "glm", but I could not get the 
> "optim" function to work. Would you please point out where I did wrong? 
> Thanks a lot.
>
> The following is the code:
>
> # Step 1: fit the glm
> clotting <- data.frame(
>u = c(5,10,15,20,30,40,60,80,100),
>lot1 = c(118,58,42,35,27,25,21,19,18),
>lot2 = c(69,35,26,21,18,16,13,12,12))
> fit1 <- glm(lot1 ~ log(u), data=clotting, family=Gamma)
>
> # Step 2: use optim
> # define loglikelihood function to be maximized over
> # theta is a vector of three parameters: intercept, coefficient for
> # log(u), and dispersion parameter
> loglik <- function(theta,data){
>E <- 1/(theta[1]+theta[2]*log(data$u))
>V <- theta[3]*E^2
>loglik <- 
> sum(dgamma(data$lot1,shape=1/theta[3],rate=1/(E*theta[3]),log=T))
>return(loglik)
> }
>
> # use the glm result as initial values
> theta <- c(as.vector(coef(fit1)),0.002446059)
> fit2 <- optim(theta, loglik,  clotting, gr = NULL, hessian = TRUE,
>control = list(fnscale = -1))
>
> # Then I got the following error message:
> Error in optim(theta, loglik, clotting, gr = NULL, hessian = TRUE, control = 
> list(fnscale = -1)) :
>  non-finite finite-difference value [3]
>

If you use trace(loglik, tracer=quote(print(theta))) to trace the inputs to 
loglik() you will find that it is being called with negative values of theta[3] 
to get finite differences.   One fix is to reparametrize and use the log scale 
rather than the scale as a parameter.

-thomas


Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models (lme4) - basic question

2010-09-02 Thread Bert Gunter
Why should height be a random effect?

I suspect you may need a tutorial on what a random effect in a mixed
model is. I see no obvious reason why one would cluster on height.
Perhaps if you clarify, it may become obvious either what your
concerns are (and that your model is correct) or that you
misunderstand and need a tutorial on random effects in mixed models.
Some kind soul on this list may provide that (not me, though) -- or
you can talk with your local statistician.

Cheers,
Bert

On Thu, Sep 2, 2010 at 9:22 AM, James Nead  wrote:
> Hi,
>
> Sorry for a basic questions on linear models.
>
> I am trying to adjust raw data for both fixed and random effects. The data
> that I output should account for these effects, so that I can use the
> adjusted data for further analysis.
>
> For example, if I have the blood sugar levels for 30 patients, and I know that
> 'weight' is a fixed effect and that 'height' is a random effect, what I'd want
> as output is blood sugar levels that have been adjusted for these effects.
>
>
> 
> library("lme4")
> sugar <- c(1:10,sample(1:100,20))
> weight <- 1:30
> height <- rep(sample(1:15,15,replace=F),2)
>
> lm.adj <- lmer(sugar ~ weight + (1|height))
>
> adjusted.sugar <- fitted(lm.adj)
>
> =
>
> I'm not sure if I'm doing this right...?
>
> thanks!
>
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models (lme4) - basic question

2010-09-02 Thread James Nead
Hi Bert,

Thanks for the reply.

Height was just an example. Perhaps I should have said 'x' instead of
height.

Essentially, what I want to do is adjust the data for known effects. After I've 
done this, I can conduct further analysis on the data (for example, if another 
variable 'z' has an effect on sugar). So, if I have to adjust the data for a 
fixed effect (weight) and random effect (height/x), then am I using the model 
correctly?

Thanks again for the reply!









From: Bert Gunter 

Cc: r-help@r-project.org
Sent: Thu, September 2, 2010 12:46:13 PM
Subject: Re: [R] Linear models (lme4) - basic question

Why should height be a random effect?

I suspect you may need a tutorial on what a random effect in a mixed
model is. I see no obvious reason why one would cluster on height.
Perhaps if you clarify, it may become obvious either what your
concerns are (and that your model is correct) or that you
misunderstand and need a tutorial on random effects in mixed models.
Some kind soul on this list may provide that (not me, though) -- or
you can talk with your local statistician.

Cheers,
Bert


> Hi,
>
> Sorry for a basic questions on linear models.
>
> I am trying to adjust raw data for both fixed and random effects. The data
> that I output should account for these effects, so that I can use the
> adjusted data for further analysis.
>
> For example, if I have the blood sugar levels for 30 patients, and I know that
> 'weight' is a fixed effect and that 'height' is a random effect, what I'd want
> as output is blood sugar levels that have been adjusted for these effects.
>
>
> 
> library("lme4")
> sugar <- c(1:10,sample(1:100,20))
> weight <- 1:30
> height <- rep(sample(1:15,15,replace=F),2)
>
> lm.adj <- lmer(sugar ~ weight + (1|height))
>
> adjusted.sugar <- fitted(lm.adj)
>
> =
>
> I'm not sure if I'm doing this right...?
>
> thanks!
>
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models (lme4) - basic question

2010-09-02 Thread James Nead
Sorry, forgot to mention that the processed data will be used as input for a 
classification algorithm. So, I need to adjust for known effects before I can 
use the data.

thanks





From: Bert Gunter 

Cc: r-help@r-project.org
Sent: Thu, September 2, 2010 12:46:13 PM
Subject: Re: [R] Linear models (lme4) - basic question

Why should height be a random effect?

I suspect you may need a tutorial on what a random effect in a mixed
model is. I see no obvious reason why one would cluster on height.
Perhaps if you clarify, it may become obvious either what your
concerns are (and that your model is correct) or that you
misunderstand and need a tutorial on random effects in mixed models.
Some kind soul on this list may provide that (not me, though) -- or
you can talk with your local statistician.

Cheers,
Bert


> Hi,
>
> Sorry for a basic questions on linear models.
>
> I am trying to adjust raw data for both fixed and random effects. The data
> that I output should account for these effects, so that I can use the
> adjusted data for further analysis.
>
> For example, if I have the blood sugar levels for 30 patients, and I know that
> 'weight' is a fixed effect and that 'height' is a random effect, what I'd want
> as output is blood sugar levels that have been adjusted for these effects.
>
>
> 
> library("lme4")
> sugar <- c(1:10,sample(1:100,20))
> weight <- 1:30
> height <- rep(sample(1:15,15,replace=F),2)
>
> lm.adj <- lmer(sugar ~ weight + (1|height))
>
> adjusted.sugar <- fitted(lm.adj)
>
> =
>
> I'm not sure if I'm doing this right...?
>
> thanks!
>
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



  

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] general question on binomial test / sign test

2010-09-02 Thread Greg Snow
Just to add to Ted's addition to my response.  I think you are moving towards 
better understanding (and your misunderstandings are common), but to further 
clarify:

First, make sure that you understand that the probability of A given B, p(A|B), 
is different from the probability of B given A, p(B|A).  A simple example from 
probability class:  I have 3 coins in my pocket, one has 2 heads, one has 2 
tails, and the other is fair (p(H)=p(T)=0.5), I take one coin out at random 
(each has probability =1/3 of being chosen) and flip it, I observe "Heads", 
what is the probability that it is the 2 headed coin?

So here the probability of Heads given the coin is 2 headed is 1.0, but that 
does not mean that the probability that I chose the 2 headed coin is 1 given I 
saw heads (it is 2/3rds).  

The same applies with testing, the p-value is the probability of the data given 
that the null hypothesis is TRUE.  Many people try to interpret it as the 
probability that the null hypothesis is true given the data, but that is not 
the case (not even for Bayesians unless they use a really weird prior).  I 
think that you are starting to understand this part: high p-values mean that
the data are consistent with the null; we cannot reject the null, but they do
not prove the null.

A great article for help in understanding p-values better is:

 Murdoch, D., Tsai, Y., and Adcock, J. (2008) _P-Values are Random
 Variables_. The American Statistician, 62, 242-245.

Which talks about p-values as random variables.  There are a couple of 
functions in the TeachingDemos package that implement some of the simulations 
discussed in the article, I would suggest playing with run.Pvalue.norm.sim and 
run.Pvalue.binom.sim.  Notice that when the null hypothesis is true (and you 
have a big enough sample size for the binomial case) that the p-values follow a 
uniform distribution, the p-values 1.0, 0.5, 0.1, and 0.001 are all equally 
likely.  As the difference between the null hypothesis and the truth increases 
you get more and more p-values close to 0.

The real tricky bit about hypothesis testing is that we compute a single 
p-value, a single observation from a distribution, and based on that try to 
decide if the process that produced that observation is a uniform distribution 
or something else (that may be close to a uniform or very different).
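
A small simulation in the same spirit (an illustration, not from the
article) shows both behaviours:

set.seed(42)
p.null <- replicate(10000, t.test(rnorm(20))$p.value)              # H0 true
p.alt  <- replicate(10000, t.test(rnorm(20, mean = 0.8))$p.value)  # H0 false
hist(p.null)   # roughly flat (uniform)
hist(p.alt)    # piled up near 0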

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Kay Cecil Cichini
> Sent: Thursday, September 02, 2010 6:40 AM
> To: ted.hard...@manchester.ac.uk
> Cc: r-help@r-project.org
> Subject: Re: [R] general question on binomial test / sign test
> 
> 
> thanks a lot for the elaborations.
> 
> your explanations made clear to me that either
> binom.test(1,1,0.5,"two-sided") or binom.test(0,1,0.5) giving a
> p-value of 1 simply indicates I have absolutely no assurance to reject
> H0.
> 
> considering binom.test(0,1,0.5,alternative="greater") and
> binom.test(1,1,0.5,alternative="less"), where I get a p-value of 1 and
> 0.5 respectively - am I right in stating that for the first estimate
> 0/1 I have no assurance at all for rejection of H0, and for the second
> estimate = 1/1 I have the same chance of being wrong in either rejecting
> or keeping H0?
> 
> many thanks,
> kay
> 
> 
> 
> Quoting ted.hard...@manchester.ac.uk:
> 
> > You state: "in reverse the p-value of 1 says that I can be 100% sure
> > that the estimate of 0.5 is true". This is where your logic about
> > significance tests goes wrong.
> >
> > The general logic of a significance test is that a test statistic
> > (say T) is chosen such that large values represent a discrepancy
> > between possible data and the hypothesis under test. When you
> > have the data, T evaluates to a value (say t0). The null hypothesis
> > (NH) implies a distribution for the statistic T if the NH is true.
> >
> > Then the value of Prob(T >= t0 | NH) can be calculated. If this is
> > small, then the probability of obtaining data at least as discrepant
> > as the data you did obtain is small; if sufficiently small, then
> > the conjunction of NH and your data (as assessed by the statistic T)
> > is so unlikely that you can decide to not believe that it is
> possible.
> > If you so decide, then you reject the NH because the data are so
> > discrepant that you can't believe it!
> >
> > This is on the same lines as the "reductio ad absurdum" in classical
> > logic: "An hypothesis A implies that an outcome B must occur. But I
> > have observed that B did not occur. Therefore A cannot be true."
> >
> > But it does not follow that, if you observe that B did occur
> > (which is *consistent* with A), then A must be true. A could be
> > false, yet B still occur -- the only basis on which occurrence
> > of B could *prove* that A must be true is when you have the prior
> > information that B will occur *if and only if* A is true. ...

Re: [R] Linear models (lme4) - basic question

2010-09-02 Thread Ben Bolker
James Nead  yahoo.com> writes:

> 
> Sorry, forgot to mention that the processed data will be used as input for a 
> classification algorithm. So, I need to adjust for known effects before I can 
> use the data.
> 
> > I am trying to adjust raw data for both fixed and random effects.
> > The data that I output should account for these effects, so that I
> > can use the adjusted data for further analysis.
> >
> > For example, if I have the blood sugar levels for 30 patients, and I
> > know that 'weight' is a fixed effect and that 'height' is a random
> > effect, what I'd want as output is blood sugar levels that have been
> > adjusted for these effects.

  What's not clear to me is what you mean by 'adjusted for'.
fitted(lm.adj) will give predicted values based on the height
and weight. I don't really know what the justification for/meaning
of the adjustment is, so I don't know whether you want to predict
on the basis of the heights, or whether you want to get a 'population-level'
prediction, i.e. one with height effects set to zero.  Maybe you want
residuals(lm.adj) ...?

  I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] general question on binomial test / sign test

2010-09-02 Thread David Winsemius


On Sep 2, 2010, at 2:01 PM, Greg Snow wrote:




The real tricky bit about hypothesis testing is that we compute a  
single p-value, a single observation from a distribution, and based  
on that try to decide if the process that produced that observation  
is a uniform distribution or something else (that may be close to a  
uniform or very different).


My friendly addition would be to point the OP in the direction of  
using qqplot() for the examination of distributional properties rather  
than doing any sort of hypothesis testing. There is a learning curve  
for using that tool, but it will pay off in the end.


--
David.


Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Kay Cecil Cichini
Sent: Thursday, September 02, 2010 6:40 AM
To: ted.hard...@manchester.ac.uk
Cc: r-help@r-project.org
Subject: Re: [R] general question on binomial test / sign test


thanks a lot for the elaborations.

your explanations made clear to me that either
binom.test(1,1,0.5,"two-sided") or binom.test(0,1,0.5) giving a
p-value of 1 simply indicates I have absolutely no assurance to reject
H0.

considering binom.test(0,1,0.5,alternative="greater") and
binom.test(1,1,0.5,alternative="less"), where I get a p-value of 1 and
0.5 respectively - am I right in stating that for the first estimate
0/1 I have no assurance at all for rejection of H0, and for the second
estimate = 1/1 I have the same chance of being wrong in either rejecting
or keeping H0?

many thanks,
kay




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models (lme4) - basic question

2010-09-02 Thread James Nead
My apologies - I have made this more confusing than it needs to be.

I have microarray gene expression data which I want to use for classification
algorithms. However, I want to 'adjust' the data for all confounding factors
(such as age, experiment number, etc.) before I can use the data as input for
the classification algorithms. Since the phenotype is known to be affected by
age, I thought that this would be a fixed effect, whereas something like
'beadchip' would be a random effect.

Should I be looking at something else for this?







From: Ben Bolker 
To: r-h...@stat.math.ethz.ch
Sent: Thu, September 2, 2010 2:06:47 PM
Subject: Re: [R] Linear models (lme4) - basic question

James Nead  yahoo.com> writes:

> 
> Sorry, forgot to mention that the processed data will be used as input for a 
> classification algorithm. So, I need to adjust for known effects before I can 
> use the data.
> 
> > I am trying to adjust raw data for both fixed and random effects.
> > The data that I output should account for these effects, so that I
> > can use the adjusted data for further analysis.
> >
> > For example, if I have the blood sugar levels for 30 patients, and I
> > know that 'weight' is a fixed effect and that 'height' is a random
> > effect, what I'd want as output is blood sugar levels that have been
> > adjusted for these effects.

  What's not clear to me is what you mean by 'adjusted for'.
fitted(lm.adj) will give predicted values based on the height
and weight. I don't really know what the justification for/meaning
of the adjustment is, so I don't know whether you want to predict
on the basis of the heights, or whether you want to get a 'population-level'
prediction, i.e. one with height effects set to zero.  Maybe you want
residuals(lm.adj) ...?

  I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models (lme4) - basic question

2010-09-02 Thread Ben Bolker
On 10-09-02 02:26 PM, James Nead wrote:
> My apologies - I have made this more confusing than it needs to be.
>
> I had microarray gene expression data which I want to use for
> classification algorithms. However, I want to 'adjust' the data for
> all confounding factors (such as age, experiment number etc.), before
> I could use the data as input for the classification algorithms. Since
> the phenotype is known to be affected by age, I thought that this
> would be a fixed effect whereas something like 'beadchip' would be a
> random effect.
>
> Should I be looking at something else for this?
>

  Sounds to me as though you should use residuals() rather than fitted()
if you want to "adjust for confounding factors".

  But since you've made up a nice small example, I think you should look
at the results
 of fitted() and residuals()
for your example and see if it's doing what you want.
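
As a minimal sketch of that comparison (made-up data and variable names,
not the poster's actual design):

library(lme4)
d <- data.frame(expr = rnorm(60),
                age  = runif(60, 20, 70),
                beadchip = factor(rep(1:6, each = 10)))
fit <- lmer(expr ~ age + (1 | beadchip), data = d)  # fixed age, random chip
adjusted <- residuals(fit)       # expression with age/chip effects removed
head(cbind(fitted(fit), adjusted))  # compare the two, as suggested above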
>
>
> 
> *From:* Ben Bolker 
> *To:* r-h...@stat.math.ethz.ch
> *Sent:* Thu, September 2, 2010 2:06:47 PM
> *Subject:* Re: [R] Linear models (lme4) - basic question
>
> James Nead  yahoo.com > writes:
>
> >
> > Sorry, forgot to mention that the processed data will be used as
> input for a
> > classification algorithm. So, I need to adjust for known effects
> before I can
> > use the data.
> >
> > > I am trying to adjust raw data for both fixed and mixed effects.
> > The data that I
> > > output should account for these effects, so that I can use
> > the adjusted data
> > >for
> > > further analysis.
> > >
> > > For example, if I have the blood sugar levels for 30 patients,
> > and I know that
> > > 'weight' is a fixed effect and that 'height' is a random effect,
> > what I'd want
> > > as output is blood sugar levels that have been adjusted for these
> effects.
>
>   What's not clear to me is what you mean by 'adjusted for'.
> fitted(lm.adj) will give predicted values based on the height
> and weight. I don't really know what the justification for/meaning
> of the adjustment is, so I don't know whether you want to predict
> on the basis of the heights, or whether you want to get a
> 'population-level'
> prediction, i.e. one with height effects set to zero.  Maybe you want
> residuals(lm.adj) ...?
>
>   I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org
> 
>
> __
> R-help@r-project.org  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] general question on binomial test / sign test

2010-09-02 Thread Greg Snow
David,

The original poster was not looking at distributions and testing distributions, 
I referred to the distribution of the p-value to help them understand (in 
reference to the paper mentioned).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: David Winsemius [mailto:dwinsem...@comcast.net]
> Sent: Thursday, September 02, 2010 12:12 PM
> To: Greg Snow
> Cc: Kay Cecil Cichini; ted.hard...@manchester.ac.uk; r-h...@r-
> project.org
> Subject: Re: [R] general question on binomial test / sign test
> 
> 
> On Sep 2, 2010, at 2:01 PM, Greg Snow wrote:
> 
> 
> >
> > The real tricky bit about hypothesis testing is that we compute a
> > single p-value, a single observation from a distribution, and based
> > on that try to decide if the process that produced that observation
> > is a uniform distribution or something else (that may be close to a
> > uniform or very different).
> 
> My friendly addition would be to point the OP in the direction of
> using qqplot() for the examination of distributional properties rather
> than doing any sort of hypothesis testing. There is a learning curve
> for using that tool, but it will pay off in the end.
> 
> --
> David.
> >
> > Hope this helps,
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> >> project.org] On Behalf Of Kay Cecil Cichini
> >> Sent: Thursday, September 02, 2010 6:40 AM
> >> To: ted.hard...@manchester.ac.uk
> >> Cc: r-help@r-project.org
> >> Subject: Re: [R] general question on binomial test / sign test
> >>
> >>
> >> thanks a lot for the elaborations.
> >>
> >> your explanations clearly brought to me that either
> >> binom.test(1,1,0.5,"two-sided") or binom.test(0,1,0.5) giving a
> >> p-value of 1 simply indicates I have absolutely no assurance to reject
> >> H0.
> >>
> >> considering binom.test(0,1,0.5,alternative="greater") and
> >> binom.test(1,1,0.5,alternative="less") where i get a p-value of 1
> and
> >> 0.5,respectively - am i right in stating that for the first estimate
> >> 0/1 i have no ensurance at all for rejection of H0 and for the
> second
> >> estimate = 1/1 I have the same chance of being wrong in either
> >> rejecting
> >> or keeping H0.
> >>
> >> many thanks,
> >> kay
> >>
> >>
> 
> David Winsemius, MD
> West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear models (lme4) - basic question

2010-09-02 Thread Bert Gunter
Perhaps even more to the point, "covariate adjustment" and
"classification" should not be separate. One should fit the
appropriate model that does both.

-- Bert

On Thu, Sep 2, 2010 at 11:34 AM, Ben Bolker  wrote:
> On 10-09-02 02:26 PM, James Nead wrote:
>> My apologies - I have made this more confusing than it needs to be.
>>
>> I had microarray gene expression data which I want to use for
>> classification algorithms. However, I want to 'adjust' the data for
>> all confounding factors (such as age, experiment number etc.), before
>> I could use the data as input for the classification algorithms. Since
>> the phenotype is known to be affected by age, I thought that this
>> would be a fixed effect whereas something like 'beadchip' would be a
>> random effect.
>>
>> Should I be looking at something else for this?
>>
>
>  Sounds to me as though you should use residuals() rather than fitted()
> if you want to "adjust for confounding factors".
>
>  But since you've made up a nice small example, I think you should look
> at the results
>  of fitted() and residuals()
> for your example and see if it's doing what you want.
>>
>>
>> 
>> *From:* Ben Bolker 
>> *To:* r-h...@stat.math.ethz.ch
>> *Sent:* Thu, September 2, 2010 2:06:47 PM
>> *Subject:* Re: [R] Linear models (lme4) - basic question
>>
>> James Nead  yahoo.com > writes:
>>
>> >
>> > Sorry, forgot to mention that the processed data will be used as
>> input for a
>> > classification algorithm. So, I need to adjust for known effects
>> before I can
>> > use the data.
>> >
>> > > I am trying to adjust raw data for both fixed and mixed effects.
>> > The data that I
>> > > output should account for these effects, so that I can use
>> > the adjusted data
>> > >for
>> > > further analysis.
>> > >
>> > > For example, if I have the blood sugar levels for 30 patients,
>> > and I know that
>> > > 'weight' is a fixed effect and that 'height' is a random effect,
>> > what I'd want
>> > > as output is blood sugar levels that have been adjusted for these
>> effects.
>>
>>   What's not clear to me is what you mean by 'adjusted for'.
>> fitted(lm.adj) will give predicted values based on the height
>> and weight. I don't really know what the justification for/meaning
>> of the adjustment is, so I don't know whether you want to predict
>> on the basis of the heights, or whether you want to get a
>> 'population-level'
>> prediction, i.e. one with height effects set to zero.  Maybe you want
>> residuals(lm.adj) ...?
>>
>>   I suggest that follow-ups go to r-sig-mixed-mod...@r-project.org
>> 
>>
>> __
>> R-help@r-project.org  mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why is vector assignment in R recreates the entire vector ?

2010-09-02 Thread Norm Matloff

Tal wrote:

> A friend recently brought to my attention that vector assignment actually
> recreates the entire vector on which the assignment is performed.
...

I brought this up in r-devel a few months ago.  You can read my posting,
and the various replies, at

http://www.mail-archive.com/r-de...@r-project.org/msg20089.html

Some of the replies not only explain the process, but list lines in the
source code where this takes place, enabling a closer look at how/when
duplication occurs. 
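
One small way to watch this happen (assuming an R build with tracemem
enabled, as the standard binaries are; the reported addresses will differ):

x <- runif(1e6)
tracemem(x)   # report whenever this vector is duplicated
y <- x        # no copy yet; both names point at the same vector
y[1] <- 0     # this assignment triggers the duplication tracemem reports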

Norm Matloff

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Simultaneous equations

2010-09-02 Thread benhartley903

Dear all, 
I am relatively new to R and have had some difficulty in understanding the
user manual for a package that I have downloaded to evaluate non-linear
simultaneous equations. 
The package is called systemfit. 
Does anyone have any experience of this package? 
What I am trying to do is solve a system of non-linear simultaneous equations... 

Could anyone give me an example (please) of the code that would be needed to
return solutions for the following system using systemfit (or any other
better package): 

y=1/x 
y=sin(x) 

within a range of say x=-10 to 10 (x in radians) 

Thanks, I really appreciate your help in advance. 

Ben 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Simultaneous-equations-tp2524645p2524645.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simultaneous equations

2010-09-02 Thread Berend Hasselman


benhartley903 wrote:
> 
> Dear all, 
> I am relatively new to R and have had some difficulty in understanding the
> user manual for a package that I have downloaded to evaluate non-linear
> simultaneous equations. 
> The package is called systemfit. 
> Does anyone have any experience of this package? 
> What I am trying to do is solve a system of non-linear simultaneous equations... 
> 
> Could anyone give me an example (please) of the code that would be needed
> to return solutions for the following system using systemfit (or any other
> better package): 
> 
> y=1/x 
> y=sin(x) 
> 
> within a range of say x=-10 to 10 (x in radians) 
> 
> Thanks, I really appreciate your help in advance. 
> 
> Ben 
> 

systemfit is intended for estimating a system of simultaneous equations.
You seem to want to solve a system.
I assume you want to solve this system for x and y.

You could use package nleqslv as follows:

library(nleqslv)

fun <- function(x) {
 f <- numeric(length(x))
 f[1] <-  x[2] - 1/x[1]
 f[2] <-  x[2] - sin(x[1])
 f
}
x.start <- c(1,1)
nleqslv(x.start,fun)  # will give 1.1141571 0.8975395

x.start <- c(2,2)
nleqslv(x.start,fun)  # will give 2.7726047 0.3606717


Berend

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Simultaneous-equations-tp2524645p2524710.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simultaneous equations

2010-09-02 Thread Peter Dalgaard
On 09/02/2010 09:25 PM, benhartley903 wrote:
> 
> Dear all, 
> I am relatively new to R and have had some difficulty in understanding the
> user manual for a package that I have downloaded to evaluate non-linear
> simultaneous equations. 
> The package is called systemfit. 
> Does anyone have any experience of this package? 
> What I am trying to do is solve a system of non-linear simultaneous equations... 
> 
> Could anyone give me an example (please) of the code that would be needed to
> return solutions for the following system using systemfit (or any other
> better package): 
> 
> y=1/x 
> y=sin(x) 
> 
> within a range of say x=-10 to 10 (x in radians) 
> 
> Thanks, I really appreciate your help in advance. 
> 
> Ben 

Systemfit is not even in the same ballpark...!

I'd just rewrite the above as

x * sin(x) - 1 = 0

make a graph to bracket the roots and then use uniroot.

> f <- function(x) x*sin(x)-1
> curve(f, from=-10, to=10)
> f(c(0,2,5,7,10))
[1] -1.0000000  0.8185949 -5.7946214  3.5989062 -6.4402111

So the roots are

> uniroot(f,interval=c(7,10))$root
[1] 9.317241
> uniroot(f,interval=c(5,7))$root
[1] 6.43914
> uniroot(f,interval=c(2,5))$root
[1] 2.772631
> uniroot(f,interval=c(0,2))$root
[1] 1.114161

and 4 more symmetrically below zero.
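
A hedged generalization of the same bracket-and-solve idea, scanning a
grid for sign changes instead of reading the brackets off the graph:

f <- function(x) x * sin(x) - 1
xs <- seq(-10, 10, by = 0.1)
i <- which(diff(sign(f(xs))) != 0)   # grid intervals containing a root
sapply(i, function(k) uniroot(f, c(xs[k], xs[k + 1]))$root)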

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor

2010-09-02 Thread Frank Harrell

On Thu, 2 Sep 2010, stephenb wrote:

> sorry to bump in late, but I am doing similar things now and was browsing.
>
> IMHO anova is not appropriate here. it applies when the richer model has p
> more variables than the simpler model. this is not the case here. the
> competing models use different variables.

A simple approach is to have the factor variable in the model and to 
formally test for added information given by the continuous variable 
(linear, quadratic, spline, etc.).  AIC could also be used.

> you are left with IC.
>
> by transforming a continuous variable into categorical you are smoothing,
> which is the idea of GAM. if you look at what is offered in GAMs you may
> find better approximations f(age) as well as tools for testing among
> different f(age) transformations.

I don't follow that comment.  Smoothing uses the full continuous 
variable to begin with.

A restricted cubic spline function in age is a simple approach.  E.g.:

require(rms)
dd <- datadist(mydata); options(datadist='dd')
f <- cph(Surv(dtime,death) ~ rcs(age,4) + sex, data=mydata)
plot(Predict(f, age))

Note that you can always expect the categorized version of age not to 
fit the data except sometimes when behavior is dictated by law 
(driving, drinking, military service, medicare).

Frank

> regards.
> S.
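
A hedged sketch of the formal test Frank describes, assuming 'age3' is a
hypothetical three-level factor version of age (names are illustrative
only, extending his rms snippet above):

require(rms)
f1 <- cph(Surv(dtime, death) ~ age3 + sex, data = mydata)
f2 <- cph(Surv(dtime, death) ~ age3 + rcs(age, 4) + sex, data = mydata)
lrtest(f1, f2)    # does continuous age add information to the factor?
AIC(f1); AIC(f2)  # or compare by AIC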


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Power analysis

2010-09-02 Thread Lewis G. Dean
I am aware this is fairly simple, but is currently driving me mad! Could
someone help me out with conducting a post-hoc power analysis on a Wilcoxon
test. I am being driven slightly mad by some conflicting advice!
Thanks in advance,

Lewis

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simultaneous equations

2010-09-02 Thread benhartley903

Thanks Peter
Actually I should have specified. These are not actually the functions I
ultimately want to solve can't be rearranged explicitly and substituted. I
do need a way to solve simultaneously. 

Ben
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Simultaneous-equations-tp2524645p2524751.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Ordering data by variable

2010-09-02 Thread Mestat

Hi listers,
I can order data like this:
x<-c(2,6,8,8,1)
y<-c(1,6,3,5,4)
o<-order(x)
frame<-rbind(x,y)[,o]
But, I would like to know if there is a way to order my data without setting
up a data frame. I would like to keep independent vectors x and y.
Any suggestions?
Thanks in advance,
Marcio
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Ordering-data-by-variable-tp2524754p2524754.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ordering data by variable

2010-09-02 Thread Joshua Wiley
Hi Marcio,

Is this what you want?

x <- c(2,6,8,8,1)
y <- c(1,6,3,5,4)
o <- order(x)

# If you want each vector order by x
x[o]
y[o]

You can also use sort(), but then each vector would be sorted by
itself, not both by x.

HTH,

Josh

On Thu, Sep 2, 2010 at 1:48 PM, Mestat  wrote:
>
> Hi listers,
> I can order data like this:
> x<-c(2,6,8,8,1)
> y<-c(1,6,3,5,4)
> o<-order(x)
> frame<-rbind(x,y)[,o]
> But, I would like to know if there is a way to order my data without setting
> up a data frame. I would like to keep independent vectors x and y.
> Any suggestions?
> Thanks in advance,
> Marcio
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Ordering-data-by-variable-tp2524754p2524754.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How using the weights argument in nls2?

2010-09-02 Thread Gabor Grothendieck
On Thu, Sep 2, 2010 at 11:09 AM, Ivan Allaman  wrote:
>
> Good morning gentlemen!
>
> How do I use a weighted model in nls2? The values from nls are reasonable,
> while the values from nls2 are not. I believe that this discrepancy is
> because I did not include the weights argument in nls2.
>

Just to follow up, nls2 was ignoring the weights argument.  This is
now fixed in the development version of nls2.  The weights and no
weights versions below give different residual sum of squares showing
that weights is not ignored:

library(nls2)
# grab development version
source("http://nls2.googlecode.com/svn/trunk/R/nls2.R";)
BOD2 <- cbind(BOD, w = 1:6)
nls2(demand ~ a + b*Time, data = BOD2, start = c(a = 1, b = 1),
   weights = w, alg = "brute")

# compare against
nls2(demand ~ a + b*Time, data = BOD2, start = c(a = 1, b = 1),
   alg = "brute")

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor

2010-09-02 Thread Bond, Stephen
Totally agreed.

I made a mistake in calling the categorization a GAM. If we apply a step 
function to the continuous age we get a limited range ordinal variable. 
Categorizing is creating several binary variables from the continuous (with 
treatment contrasts).

Stephen B
-Original Message-
From: Frank Harrell [mailto:harre...@gmail.com] On Behalf Of Frank Harrell
Sent: Thursday, September 02, 2010 4:23 PM
To: Bond, Stephen
Cc: r-help@r-project.org
Subject: Re: [R] Comparing COXPH models, one with age as a continuous variable, 
one with age as a three-level factor

On Thu, 2 Sep 2010, stephenb wrote:

>
> sorry to bump in late, but I am doing similar things now and was browsing.
>
> IMHO anova is not appropriate here. it applies when the richer model has p
> more variables than the simpler model. this is not the case here. the
> competing models use different variables.

A simple approach is to have the factor variable in the model and to 
formally test for added information given by the continuous variable 
(linear, quadratic, spline, etc).  AIC could also be used.

  >
> you are left with IC.
>
> by transforming a continuous variable into categorical you are smoothing,
> which is the idea of GAM. if you look at what is offered in GAMs you may
> find better approximations f(age) as well as tools for testing among
> different f(age) transformations.

I don't follow that comment.  Smoothing uses the full continuous 
variable to begin with.

A restricted cubic spline function in age is a simple approach.  E.g.:

require(rms)
dd <- datadist(mydata); options(datadist='dd')
f <- cph(Surv(dtime,death) ~ rcs(age,4) + sex, data=mydata)
plot(Predict(f, age))

Note that you can always expect the categorized version of age not to 
fit the data except sometimes when behavior is dictated by law 
(driving, drinking, military service, medicare).

Frank

>
> regards.
> S.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power analysis

2010-09-02 Thread Greg Snow
Be happy, don't do post-hoc power analyses.

The standard "post-hoc power analysis" is actually counterproductive. It is 
much better to just create confidence intervals.  Or give a better 
description/justification showing that your case is not the standard/worse than 
useless version.
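
For the Wilcoxon case, one hedged way to follow that advice (made-up
data): the test itself can report a Hodges-Lehmann estimate with a
confidence interval.

x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88)
y <- c(0.88, 0.65, 0.60, 2.05, 1.06, 1.29)
wilcox.test(x, y, conf.int = TRUE)  # estimate + CI instead of post-hoc power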

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Lewis G. Dean
> Sent: Thursday, September 02, 2010 9:45 AM
> To: r-help@r-project.org
> Subject: [R] Power analysis
> 
> I am aware this is fairly simple, but is currently driving me mad!
> Could
> someone help me out with conducting a post-hoc power analysis on a
> Wilcoxon
> test. I am being driven slightly mad by some conflicting advice!
> Thanks in advance,
> 
> Lewis
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ordering data by variable

2010-09-02 Thread Greg Snow
Suggestion:  use the power of R.

If x and y are independent then sorting y based on x is meaningless.

If sorting y based on x is meaningful, then they are not independent.

Trying to force non-independent things to pretend that they are independent 
just causes future headaches.

Part of the great power of R is the ability to group things together that 
should be grouped.  The wise learn this and use it (in some cases (mine) that 
wisdom comes at the expense of not having properly grouped in the past).

Learn the power of with/within, data= arguments, and apply style functions, 
then you will be eager to combine things into data frames (or lists or ...) 
when appropriate.
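
A small sketch of the grouped style, using the poster's numbers:

d <- data.frame(x = c(2, 6, 8, 8, 1), y = c(1, 6, 3, 5, 4))
d2 <- d[order(d$x), ]  # both columns reordered together, rows kept aligned
with(d2, plot(x, y))   # columns referenced by name, no $ needed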

<>

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of Mestat
> Sent: Thursday, September 02, 2010 2:49 PM
> To: r-help@r-project.org
> Subject: [R] Ordering data by variable
> 
> 
> Hi listers,
> I can order data like this:
> x<-c(2,6,8,8,1)
> y<-c(1,6,3,5,4)
> o<-order(x)
> frame<-rbind(x,y)[,o]
> But, I would like to know if there is a way to order my data without
> setting
> up a data frame. I would like to keep independent vectors x and y.
> Any suggestions?
> Thanks in advance,
> Marcio
> --
> View this message in context: http://r.789695.n4.nabble.com/Ordering-
> data-by-variable-tp2524754p2524754.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] general question on binomial test / sign test

2010-09-02 Thread Ted Harding
On 02-Sep-10 18:01:55, Greg Snow wrote:
> Just to add to Ted's addition to my response.  I think you are moving
> towards better understanding (and your misunderstandings are common),
> but to further clarify:
> [Wise words about P(A|B), P(B|A), P-values, etc., snipped]
> 
> The real tricky bit about hypothesis testing is that we compute a
> single p-value, a single observation from a distribution, and based on
> that try to decide if the process that produced that observation is a
> uniform distribution or something else (that may be close to a uniform
> or very different).

Indeed. And this is precisely why I began my original reply as follows:

>> Zitat von ted.hard...@manchester.ac.uk:
>>> [...]
>>> The general logic of a singificance test is that a test statistic
>>> (say T) is chosen such that large values represent a discrepancy
>>> between possible data and the hypothesis under test. When you
>>> have the data, T evaluates to a value (say t0). The null hypothesis
>>> (NH) implies a distribution for the statistic T if the NH is true.
>>>
>>> Then the value of Prob(T >= t0 | NH) can be calculated. If this is
>>> small, then the probability of obtaining data at least as discrepant
>>> as the data you did obtain is small; if sufficiently small, then
>>> the conjunction of NH and your data (as assessed by the statistic T)
>>> is so unlikely that you can decide to not believe that it is
>>> possible.
>>> If you so decide, then you reject the NH because the data are so
>>> discrepant that you can't believe it!

The point is that the test statistic T represents *discrepancy*
between data and NH in some sense. In what sense? That depends on
what you are interested in finding out; and, whatever it is,
there will be some T that represents it.

It might be whether two samples come from distributions with equal
means, or not. Then you might use T = mean(Ysample) - mean(Xsample).
Large values of |T| represent discrepancy (in either direction)
between data and an NH that the true means are equal. Large values
of T, discrepancy in the positive direction, large values of -T
diuscrepancy in the negative direction. Or it might be whether or
not the two samples are drawn from populations with equal variances,
when you might use T = var(Ysample)/var(Xsample). Or it might be
whether the distribution from which X was sampled is symmetric,
in which case you might use skewness(Xsample). Or you might be
interested in whether the numbers falling into disjoint classes
are consistent with hypothetical probabilities p1,...,pk of
falling into these classes -- in which case you might use the
chi-squared statistic T = sum(((ni - N*pi)^2)/(N*pi)). And so on.
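
As a concrete instance of that last statistic (made-up counts):

obs <- c(18, 25, 17)                     # observed counts in k = 3 classes
p0  <- rep(1/3, 3)                       # hypothesised class probabilities
N   <- sum(obs)
t0  <- sum((obs - N * p0)^2 / (N * p0))  # the chi-squared discrepancy T
pchisq(t0, df = length(obs) - 1, lower.tail = FALSE)  # Prob(T >= t0 | NH)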

Once you have decided on what "discrepant" means, and chosen a
statistic T to represent discrepancy, then the NH implies a
distribution for T and you can calculate
  P-value = Prob(T >= t0 | NH)
where t0 is the value of T calculated from the data.

*THEN* small P-value is in direct correspondence with large T,
i.e. small P is equivalent to large discrepancy. And it is also
the direct measure of how likely you were to get so large a
discrepancy if the NH really was true.

Thus the P-values, calculated from the distribution of (T | NH),
are ordered, not just numerically from small P to large, but also
equivalently by discrepancy (from large discrepancy to small).

Thus the uniform distribution of P under the NH does not just
mean that any value of P is as likely as any other, so "So what?
Why prefer on P-value to another?"

We also have that different parts of the [0,1] P-scale have
different *meanings* -- the parts near 0 are highly discrepant
from NH, the parts near 1 are highly consistent with NH,
*with respect to the meaning of "discrepancy" implied by the
choice of test statistic T*.

So it helps to understand hypothesis testing if you keep in
mind what the test statistic T *represents* in real terms.

Greg's point about "try to decide if the process that produced that
observation is a uniform distribution or something else (that may
be close to a uniform or very different)" is not in the first instance
relevant to the direct interpretation of small P-value as large
discrepancy -- that involves only the Null Hypothesis NH, under
which the P-values have a uniform distribution.

Where it somes into its own is that an Alternative Hypothesis AH
would correspond to some degree of discrepancy of a certain kind,
and if T is well chosen then its distribution under AH will give
large values of T greater probability than they would get under NH.
Thus the AHs that are implied by a large value of a certain test
statistic T are those AHs that give such values of T greater
probability than they would get under NH. Thus we are now getting
into the domain of the Power of the test to detect discrepancy.

Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 02-Sep-10

[R] specify the covariance matrix for random effect

2010-09-02 Thread Qiu, Weiyu
Hi,

I'm doing a generalized linear mixed model, and I currently use an R function 
called "glmm". However, in this function they use a standard normal 
distribution for the random effect, which doesn't satisfy my case, i.e.  my 
random effect also follows a normal distribution, but observations over time 
are somehow correlated, so the covariance matrix would be different from the 
identity matrix. Besides, it's also different from the commonly used AR(1), 
AR(2) structures, so actually I need a way to write down the covariance matrix 
for the random effect in GLMM by myself.

Does anyone know how to do this in R? Thank you in advance for any help.

--
Best,
Vicky

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NLS equation self starting non linear

2010-09-02 Thread Marlin Keith Cox
These data are kilojoules of energy consumed by starving fish over a
time period (Days).  The KJ reach a lower asymptote and level off, and I
would like to use a non-linear plot to show this leveling off.  The data are
noisy and the sample sizes are not the largest.  I have tried self-starting
Weibull curves and the following; both end with errors.

Days<-c(12, 12, 12, 12, 22, 22, 22, 22, 32, 32, 32, 32, 37, 38, 38, 39, 39,
39, 39, 39, 39, 39,  1,  1,  1,  1)
Joules<-c(8.782010,  8.540524,  8.507526, 11.296904,  4.232690, 13.026588,
10.213342,  4.771482,  4.560359,  6.146684, 9.651727,  8.064329,  9.419335,
7.129264,  6.079012,  7.095888, 17.996794,  7.028287,  8.028352,  5.497564,
11.607090,  9.987215, 11.065307, 21.433094, 20.366385, 22.475717)
X11()
par(cex=1.6)
plot(Joules~Days,xlab="Days after fasting was initiated",ylab="Mean energy
per fish (KJ)")
model<-nls(joules~a+b*exp(-c*Days),start=list(a=8,b=9,c=-.229),
control=list(minFactor=1e-12),trace=TRUE)
summary(model)

init <- nls(Joules~ SSweibull(Days,Asym,Drop,lrc,pwr),
control=list(minFactor=1e-12),data=.GlobalEnv)
init
Data here is reproducible.

As always, thank you...keith

-- 
M. Keith Cox, Ph.D.
Alaska NOAA Fisheries, National Marine Fisheries Service
Auke Bay Laboratories
17109 Pt. Lena Loop Rd.
Juneau, AK 99801
keith@noaa.gov
marlink...@gmail.com
U.S. (907) 789-6603

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] specify the covariance matrix for random effect

2010-09-02 Thread Ben Bolker
Qiu, Weiyu  ualberta.ca> writes:

> 
> Hi,
> 
> I'm doing a generalized linear mixed model, and I currently use an 
> R function called "glmm". However, in
> this function they use a standard normal distribution for the random 
> effect, which doesn't satisfy my
> case, i.e.  my random effect also follows a normal distribution, 
> but observations over time are somehow
> correlated, so the covariance matrix would be different the
> identity matrix. Besides, it's also
> different from the commonly used AR(1), AR(2) structures, so 
> actually I need a way to write down the
> covariance matrix for the random effect in GLMM by myself.
> 

  If you could get by with an AR or ARMA structure then you could
use lme() with the 'correlation' argument from the nlme package.
If you have enough data/are willing to fit a completely unstructured
correlation matrix, you could use corSymm.  See ?corStruct in
the nlme package documentation: it is *in principle* possible to
write functions to implement your own correlation structures, e.g.
see

corSymm
nlme:::corSymm.corMatrix
nlme:::coef.corNatural

  However, this will not be easy unless you are already a good
R programmer and familiar with the nlme package.  If you really
need to do this I definitely recommend that you buy and study
Pinheiro and Bates's 2000 book.
  It might also be worth taking a look at the cor* objects in the
"ape" package, which are coded slightly more transparently.

  Followups should probably go to r-sig-mixed-mod...@r-project.org

  good luck
Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] testing for emptyenv

2010-09-02 Thread Peng, C

Is there a complete list of these very handy and powerful functions in base
R?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/testing-for-emptyenv-tp2432922p2525031.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "pairs" with same xlim and ylim scale

2010-09-02 Thread Dejian Zhao
When "pairs" draws plots, "lower.panel" invokes "f.xy". Maybe there is 
something in "f.xy" incompatible with "pairs". You can read the code of 
"pairs" to see what happens.


"pairs" has two methods, as you can see in the help message (?pairs). 
According to your code, pairs is supposed to invoke "Default S3 method".

> methods(pairs)
[1] pairs.default  pairs.formula*
   Non-visible functions are asterisked
Therefore, you should check the code of the function "pairs.default" to 
see how error occurs. Just type "pairs.default" at the R command prompt 
and enter, you can get the source code of "pairs.default".
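
One common workaround, assuming the error comes from pairs() forwarding
xlim/ylim to every panel function: give f.xy formals that absorb them, so
points() never sees them as unused arguments. A hedged sketch:

f.xy <- function(x, y, xlim = NULL, ylim = NULL, ...) {
    points(x, y, ...)     # xlim/ylim are swallowed here, not passed on
    abline(0, 1, col = 2)
}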




On 2010-9-2 15:15, Shi, Tao wrote:

Hi Dejian,

You're right on this!  Do you know how to pass those two argument into
lower.panel?  Thanks!

...Tao



From: Dejian Zhao
To: r-help@r-project.org
Sent: Tue, August 31, 2010 6:10:16 PM
Subject: Re: [R] "pairs" with same xlim and ylim scale

I think you have successfully passed the "xlim" and "ylim" into the
function pairs1. Compare the two graphs produced by the codes you
provided, you can find the xlim and ylim in the second graph have been
reset to the assigned value. It seems that the program halted in
producing the second plot after adding xlim and ylim. According to the
error message, the two added parameters were not used in lower.panel, or
the customized function f.xy.

On 2010-9-1 2:26, Shi, Tao wrote:
   

Hi list,

I have a function which basically is a wrapper of pairs with some useful panel
functions.  However, I'm having trouble to pass the "xlim" and "ylim" into the
function so the x and y axes are in the same scale and 45 degree lines are
exactly diagonal.   I've looked at some old posts, they didn't help much.  I
 

[[elided Yahoo spam]]
   

Thanks!

...Tao


pairs1<- function(x, ...) {
  f.xy<- function(x, y, ...) {
  points(x, y, ...)
  abline(0, 1, col = 2)
  }

  panel.cor<- function(x, y, digits=2, prefix="", cex.cor) {
   usr<- par("usr"); on.exit(par(usr))
   par(usr = c(0, 1, 0, 1))
   r<- abs(cor(x, y, method="p", use="pairwise.complete.obs"))
   txt<- format(c(r, 0.123456789), digits=digits)[1]
   txt<- paste(prefix, txt, sep="")
   if(missing(cex.cor)) cex<- 0.8/strwidth(txt)
   text(0.5, 0.5, txt, cex = cex * r)
   }

   panel.hist<- function(x, ...) {
   usr<- par("usr"); on.exit(par(usr))
   par(usr = c(usr[1:2], 0, 1.5) )
   h<- hist(x, plot = FALSE)
   breaks<- h$breaks; nB<- length(breaks)
   y<- h$counts; y<- y/max(y)
   rect(breaks[-nB], 0, breaks[-1], y, col="cyan", ...)
   }

  pairs(x, lower.panel=f.xy, upper.panel=panel.cor, diag.panel=panel.hist,
...)
}


 

x<- rnorm(100, sd=0.2)
x<- cbind(x=x-0.1, y=x+0.1)
pairs1(x)
pairs1(x, xlim=c(-1,1), ylim=c(-1,1))

   

Error in lower.panel(...) :
unused argument(s) (xlim = c(-1, 1), ylim = c(-1, 1))



 [[alternative HTML version deleted]]

__
R-help@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

 

__
R-help@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Making plots in big scatterplot matrix large enough to see

2010-09-02 Thread Jocelyn Paine

William,

Thanks. I adapted your example by doing:
  library(psych)
  pdf(file="myfile.pdf",width=30,height=30)
  pairs.panels(data,gap=0)
  dev.off()

The R part worked. I could see it doing so when I replaced the call to 
'pdf' by

  windows(width=30,height=30)
. The remaining problem was that Adobe Reader only displayed ten rows of 
my file, and then hung. That doesn't surprise me, because it's an 
unreliable piece of software, often hanging on PDFs I find on Google. 
Annoying, though.


Jocelyn Paine
http://www.j-paine.org
http://www.spreadsheet-parts.org
+44 (0)7768 534 091

On Tue, 31 Aug 2010, William Revelle wrote:


Jocelyn,
 In a partial answer to your question, try setting gap=0 in the calls to 
pairs.  This will make the plots closer together.


(You might also find pairs.panels in the psych package useful,  -- it 
implements one of the help examples for pairs to report the histogram on the 
diagonal and reports the correlations in the upper off diagonal).


On a Mac, I just tried setting
quartz(width=30, height=30)   #make a big graphics window


#then
library(psych)
my.data <- sim.item(24)  #create 500 cases of 24 variables
pairs.panels(my.data, gap=0)  #the gap =0 makes the plots right next to each 
other


#And then save the graphics window as a pdf.  I can open this in a pdf  and 
scroll around pretty easily.



Bill

At 5:21 AM +0100 8/31/10, Jocelyn Paine wrote:
I've got a data frame with 23 columns, and wanted to plot a scatterplot 
matrix of it. I called

  pairs( df )
where 'df' is my data frame. This did generate the matrix, but the plotting 
window did not expand to make the individual plots large enough to see. 
Each one was only about 10 pixels high and wide.


I tried sending the plot to a file, with a high and wide image, by doing
  png( "plot.png", width = 4000, height = 4000 )
but I got these errors:
  Error in png( "plot.png", width = 4000, height = 4000 ) :
  unable to start device
  In addition: Warning messages:
  1: In png( "plot.png", width = 4000, height = 4000 ) :
 Unable to allocate bitmap
  2: In png( "plot.png", width = 4000, height = 4000 ) :
 opening device failed

The messages aren't helpful, because they don't tell you _why_ R can't 
start the device, allocate it, or open it. The documentation for png says:

  Windows imposes limits on the size of bitmaps: these are not documented
  in the SDK and may depend on the version of Windows. It seems that width
  and height are each limited to 2^15-1.
However, 2^15-1 is 32767, so that isn't the problem here. I tried various 
values for height and width. 2400 was OK, but 2500 wasn't. So it seems R 
can't produce plots that are more than about 2400 pixels square. This is 
with R 2.10.1.


Why is png failing on big images? Also, what's the recommended way to make 
a file containing a scatterplot matrix when you have lots of variables? 
'pairs' is a very useful function, but obviously one does need to be 
careful when doing this, and I don't know what experts would recommend. Do 
you loop round the variables plotting each pair to a different file? I was 
hoping that I could put them all into one very big image and view parts of 
it at a time.


Thanks,

Jocelyn Paine
http://www.j-paine.org
http://www.spreadsheet-parts.org
+44 (0)7768 534 091

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.



--
William Revelle http://personality-project.org/revelle.html
Professor   http://personality-project.org
Department of Psychology http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
Use R for psychology   http://personality-project.org/r
It is 6 minutes to midnight http://www.thebulletin.org



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Making plots in big scatterplot matrix large enough to see

2010-09-02 Thread Jocelyn Paine
Greg, thanks for the suggestion. That's useful to know for future work. 
It's not so good in this case, because I'm making the plots for a 
colleague who doesn't know R, and it would be a bother for me to have to 
send him several files and him reassemble them. What I did was to use 
pairs.panels, as suggested by William Revelle on this thread.


I'd like to ask a general question about the interface though. There's a 
size below which individual scatterplots are not legible. It makes no 
sense at all for a scatterplot routine to plot them at that size or 
smaller. So why didn't the author(s) of 'pairs' program it so that it 
always makes them at least legible size, and expands the image window 
until it can fit them all in?


Regards,

Jocelyn Paine
http://www.j-paine.org
+44 (0)7768 534 091

Jocelyn's Cartoons:
http://www.j-paine.org/blog/jocelyns_cartoons/

On Tue, 31 Aug 2010, Greg Snow wrote:

Look at the pairs2 function in the TeachingDemos package, this lets you 
produce smaller portions of the total scatterplot matrix at a time (with 
bigger plots), you could print the smaller portions then assemble the 
full matrix on a large wall, or just use it to look at potentially 
interesting parts.


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Jocelyn Paine
Sent: Monday, August 30, 2010 10:21 PM
To: r-help@r-project.org
Subject: [R] Making plots in big scatterplot matrix large enough to see

I've got a data frame with 23 columns, and wanted to plot a scatterplot
matrix of it. I called
   pairs( df )
where 'df' is my data frame. This did generate the matrix, but the
plotting window did not expand to make the individual plots large
enough
to see. Each one was only about 10 pixels high and wide.

I tried sending the plot to a file, with a high and wide image,
by doing
   png( "plot.png", width = 4000, height = 4000 )
but I got these errors:
   Error in png( "plot.png", width = 4000, height = 4000 ) :
   unable to start device
   In addition: Warning messages:
   1: In png( "plot.png", width = 4000, height = 4000 ) :
  Unable to allocate bitmap
   2: In png( "plot.png", width = 4000, height = 4000 ) :
  opening device failed

The messages aren't helpful, because they don't tell you _why_ R can't
start the device, allocate it, or open it. The documentation for png
says:
   Windows imposes limits on the size of bitmaps: these are not
documented
   in the SDK and may depend on the version of Windows. It seems that
width
   and height are each limited to 2^15-1.
However, 2^15-1 is 32767, so that isn't the problem here. I tried
various
values for height and width. 2400 was OK, but 2500 wasn't. So it seems
R
can't produce plots that are more than about 2400 pixels square. This
is
with R 2.10.1.

Why is png failing on big images? Also, what's the recommended way to
make
a file containing a scatterplot matrix when you have lots of variables?
'pairs' is a very useful function, but obviously one does need to be
careful when doing this, and I don't know what experts would recommend.
Do
you loop round the variables plotting each pair to a different file? I
was
hoping that I could put them all into one very big image and view parts
of
it at a time.

Thanks,

Jocelyn Paine
http://www.j-paine.org
http://www.spreadsheet-parts.org
+44 (0)7768 534 091

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-
guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Making plots in big scatterplot matrix large enough to see

2010-09-02 Thread Peter Ehlers

On 2010-08-31 13:49, Greg Snow wrote:

Look at the pairs2 function in the TeachingDemos package, this lets you produce 
smaller portions of the total scatterplot matrix at a time (with bigger plots), 
you could print the smaller portions then assemble the full matrix on a large 
wall, or just use it to look at potentially interesting parts.



If I remember correctly, there's also the xysplom() function
in pkg:HH.

  -Peter Ehlers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] calculate monthly mean

2010-09-02 Thread Ali Alsamawi


Hello group


   I'm trying to make a 3D plot with the scatterplot3d package, and I get an error 
saying "length(color) must be equal length(x) or 1". My data have dimensions 
(lon, lat, lev, time), with time in months, and I want to 
calculate the monthly mean over time. How can I do that? Is there any 
function that does that?

Thanks a lot

 







##load rgl package
library(rgl)
library(fields)
library(ncdf)
library(scatterplot3d)
## open binary file to read
 nc <- open.ncdf("/srv/ccrc/data05/z3236814/mm5/co2/2000/q.21.mon.nc")

v1 <- nc$var [[1]]

v2 <- nc$var [[2]]

v3 <- nc$var [[3]]

data1 <- get.var.ncdf(nc,v1)
data2 <- get.var.ncdf(nc,v2)
data3 <- get.var.ncdf(nc,v3)


coldat = data1[1:111,1:101,23,1:60]

## creat colour 
hcol = cumsum(coldat)
coldat = hcol
hcol = hcol/max(hcol,na.rm=TRUE)


col <- hsv(h=hcol,s=1,v=1)

X <- scatterplot3d(data3[1:111,1:101], data2[1:111,1:101], data1[1:111,1:101,23,1:60], color=col)
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power analysis

2010-09-02 Thread C Peng

I agree with Greg's point. In fact, it does not make logical sense in many
cases. It is similar to the use of the "statistically unreliable" reliability
measure Cronbach's alpha in some non-statistical fields.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Power-analysis-tp2524729p2524907.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] date

2010-09-02 Thread Linlin Yan
try to use difftime() instead of as.difftime().
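
For example (a hedged sketch using the strings from the question): parse
the day/month/year strings into Date objects first, then subtract.

time1 <- c("21/04/2005", "23/05/2005", "11/04/2005")
time2 <- c("15/07/2009", "03/06/2008", "15/10/2005")
d1 <- as.Date(time1, format = "%d/%m/%Y")
d2 <- as.Date(time2, format = "%d/%m/%Y")
difftime(d2, d1, units = "days")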

On Thu, Sep 2, 2010 at 10:32 PM, Dunia Scheid  wrote:
> Hello all,
>
> I've 2 strings that representing the start and end values of a date and
> time.
> For example,
> time1 <- c("21/04/2005","23/05/2005","11/04/2005")
> time2 <- c("15/07/2009", "03/06/2008", "15/10/2005")
> as.difftime(time1,time2)
> Time differences in secs
> [1] NA NA NA
> attr(,"tzone")
> [1] ""
>
> How can i calculate the difference  between this 2 string?
>
> Regards,
> Dunia
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NLS equation self starting non linear

2010-09-02 Thread Gabor Grothendieck
On Thu, Sep 2, 2010 at 7:39 PM, Marlin Keith Cox  wrote:
> This data are kilojoules of energy that are consumed in starving fish over a
> time period (Days).  The KJ reach a lower asymptote and level off and I
> would like to use a non-linear plot to show this leveling off.  The data are
> noisy and the sample sizes not the largest.  I have tried selfstarting
> weibull curves and tried the following, both end with errors.
>
> Days<-c(12, 12, 12, 12, 22, 22, 22, 22, 32, 32, 32, 32, 37, 38, 38, 39, 39,
> 39, 39, 39, 39, 39,  1,  1,  1,  1)
> Joules<-c(8.782010,  8.540524,  8.507526, 11.296904,  4.232690, 13.026588,
> 10.213342,  4.771482,  4.560359,  6.146684, 9.651727,  8.064329,  9.419335,
> 7.129264,  6.079012,  7.095888, 17.996794,  7.028287,  8.028352,  5.497564,
> 11.607090,  9.987215, 11.065307, 21.433094, 20.366385, 22.475717)
> X11()
> par(cex=1.6)
> plot(Joules~Days,xlab="Days after fasting was initiated",ylab="Mean energy
> per fish (KJ)")
> model<-nls(joules~a+b*exp(-c*Days),start=list(a=8,b=9,c=-.229),
> control=list(minFactor=1e-12),trace=TRUE)
> summary(model)

Note that Joules is defined above but joules is used as well.  We have
assumed they are the same.

Also note that the formula is linear in log(joules) if a=0 so lets
assume a=0 and use lm to solve.  Also switch to the "plinear"
algorithm which only requires starting values for the nonlinear
parameters -- in this case only c (which we get from the lm).

> joules <- Joules
> coef(lm(log(joules) ~ Days))
(Intercept)Days
 2.61442015 -0.01614845
>
> # so use 0.016 as starting value of c
> fm <- nls(joules~cbind(1, exp(-c*Days)), start = list(c = 0.016), alg = 
> "plinear"); fm
Nonlinear regression model
  model:  joules ~ cbind(1, exp(-c * Days))
   data:  parent.frame()
  c   .lin1   .lin2
 0.2314  8.3630 13.2039
 residual sum-of-squares: 290.3

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Making plots in big scatterplot matrix large enough to see

2010-09-02 Thread Peter Ehlers

On 2010-09-02 22:16, Jocelyn Paine wrote:

Greg, thanks for the suggestion. That's useful to know for future work.
It's not so good in this case, because I'm making the plots for a
colleague who doesn't know R, and it would be a bother for me to have to
send him several files and him reassemble them. What I did was to use
pairs.panels, as suggested by William Revelle on this thread.

I'd like to ask a general question about the interface, though. There's a
size below which individual scatterplots are not legible. It makes no
sense at all for a scatterplot routine to plot them at that size or
smaller. So why didn't the author(s) of 'pairs' program it so that it
always draws them at a legible size at minimum, and expands the image
window until it can fit them all in?

Regards,

Jocelyn Paine
http://www.j-paine.org
+44 (0)7768 534 091


Hmm, I had no trouble creating and viewing William's pdf file.
I was also able to view the pairs plot fairly well on my screen.
And that's on my laptop. Perhaps my display has better resolution
than yours.

Your suggestion to "expand the image window until it can fit them
all in" would, at the very least, involve determining the resolution
of the display device. But even then, there's bound to be someone
who'll want a pairs plot for 1000 variables.

As usual with R, improvements are always welcome.
You could submit appropriate code and, if it is deemed useful,
I'm fairly sure that a better pairs() function will become
part of Rx.xx.x.

  -Peter Ehlers



Jocelyn's Cartoons:
http://www.j-paine.org/blog/jocelyns_cartoons/

On Tue, 31 Aug 2010, Greg Snow wrote:


Look at the pairs2 function in the TeachingDemos package; this lets you
produce smaller portions of the total scatterplot matrix at a time (with
bigger plots). You could print the smaller portions and then assemble the
full matrix on a large wall, or just use it to look at potentially
interesting parts.
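
A sketch of that approach (see ?pairs2 for the exact arguments; 'df' is
the 23-column data frame from the original post):

library(TeachingDemos)
# look at one readable block of the full 23 x 23 matrix at a time
pairs2(df[, 1:8], df[, 9:16])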

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
project.org] On Behalf Of Jocelyn Paine
Sent: Monday, August 30, 2010 10:21 PM
To: r-help@r-project.org
Subject: [R] Making plots in big scatterplot matrix large enough to see

I've got a data frame with 23 columns, and wanted to plot a scatterplot
matrix of it. I called
pairs( df )
where 'df' is my data frame. This did generate the matrix, but the
plotting window did not expand to make the individual plots large
enough
to see. Each one was only about 10 pixels high and wide.

I tried sending the plot to a file, with a high and wide image,
by doing
png( "plot.png", width = 4000, height = 4000 )
but I got these errors:
Error in png( "plot.png", width = 4000, height = 4000 ) :
unable to start device
In addition: Warning messages:
1: In png( "plot.png", width = 4000, height = 4000 ) :
   Unable to allocate bitmap
2: In png( "plot.png", width = 4000, height = 4000 ) :
   opening device failed

The messages aren't helpful, because they don't tell you _why_ R can't
start the device, allocate it, or open it. The documentation for png
says:
Windows imposes limits on the size of bitmaps: these are not
documented
in the SDK and may depend on the version of Windows. It seems that
width
and height are each limited to 2^15-1.
However, 2^15-1 is 32767, so that isn't the problem here. I tried
various
values for height and width. 2400 was OK, but 2500 wasn't. So it seems
R
can't produce plots that are more than about 2400 pixels square. This
is
with R 2.10.1.
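
One workaround worth noting (a sketch, not from the original thread):
vector devices such as pdf() are not subject to the Windows bitmap limit,
and the output can be zoomed when viewed:

pdf("plot.pdf", width = 40, height = 40)   # size in inches, not pixels
pairs(df)                                  # 'df' as above
dev.off()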

Why is png failing on big images? Also, what's the recommended way to
make
a file containing a scatterplot matrix when you have lots of variables?
'pairs' is a very useful function, but obviously one does need to be
careful when doing this, and I don't know what experts would recommend.
Do
you loop round the variables plotting each pair to a different file? I
was
hoping that I could put them all into one very big image and view parts
of
it at a time.

Thanks,

Jocelyn Paine
http://www.j-paine.org
http://www.spreadsheet-parts.org
+44 (0)7768 534 091







Re: [R] "pairs" with same xlim and ylim scale

2010-09-02 Thread Shi, Tao
Hi Dejian,

Thanks for the reply!

I finally found the problem. It is actually in the "panel.cor" function:
adding "..." to the function signature and to the "text" call fixed
everything.
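
A sketch of the described fix (only the changed lines of panel.cor are
shown; the rest of the wrapper is unchanged):

panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {  # add '...'
    ## ... body as before ...
    text(0.5, 0.5, txt, cex = cex * r, ...)   # pass '...' through to text()
}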


Best,

...Tao




- Original Message 
> From: Dejian Zhao 
> To: r-help@r-project.org
> Sent: Thu, September 2, 2010 7:57:55 PM
> Subject: Re: [R] "pairs" with same xlim and ylim scale
> 
> When "pairs" draws plots, "lower.panel" invokes "f.xy". Maybe there is 
> something in "f.xy" incompatible with "pairs". You can read the code of 
> "pairs" to see what happens.
> 
> "pairs" has two methods, as you can see  in the help message (?pairs). 
> According to your code, pairs is supposed to  invoke "Default S3 method".
>  > methods(pairs)
> [1] pairs.default   pairs.formula*
> Non-visible functions are  asterisked
> Therefore, you should check the code of the function  "pairs.default" to 
> see how error occurs. Just type "pairs.default" at the R  command prompt 
> and enter, you can get the source code of  "pairs.default".
> 
> 
> 
> On 2010-9-2 15:15, Shi, Tao wrote:
> > [earlier messages in this thread quoted in full; trimmed]

Re: [R] NLS equation self starting non linear

2010-09-02 Thread Peter Ehlers

On 2010-09-02 22:32, Gabor Grothendieck wrote:

On Thu, Sep 2, 2010 at 7:39 PM, Marlin Keith Cox  wrote:

[original post and Gabor's reply quoted in full; trimmed -- see earlier
in this thread]



Keith,

You would get Gabor's solution from your nls() call if you did:

1. replace 'Joules' with 'joules'
2. replace 'c=-.229' with 'c=.229' in your start vector. You
already have the 'minus' in your formula.

I find it useful to add the curve you get with your start values
to the scatterplot to see how reasonable the values are:

plot(Joules~Days)
curve(8 + 9 * exp(-.229 * x), add=TRUE)

After you get a convergent solution, you can add a curve with
the model values.
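
A sketch with both changes applied (Days and Joules as in the quoted
post; the overlay uses the converged model):

joules <- Joules
model <- nls(joules ~ a + b * exp(-c * Days),
             start = list(a = 8, b = 9, c = 0.229),
             control = list(minFactor = 1e-12))
curve(predict(model, newdata = data.frame(Days = x)), add = TRUE, col = 4)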

  -Peter Ehlers



[R] running an exe in the background

2010-09-02 Thread raje...@cse.iitm.ac.in

Hi,

I'd like to be able to run a .exe in the background whenever library(package-x) 
is called. Is this possible?

  ~Aks
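
One possibility (a sketch, untested; the file names are only
illustrative) is to start the program from the package's .onLoad hook
with a non-blocking system() call:

.onLoad <- function(libname, pkgname) {
    ## assumes the package ships a helper binary under inst/bin
    exe <- system.file("bin", "helper.exe", package = pkgname)
    if (nzchar(exe)) system(shQuote(exe), wait = FALSE)  # don't block R
}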



Re: [R] Power analysis

2010-09-02 Thread Dennis Murphy
Hi:

Just to add to the discussion, see the following article by Russell Lenth on
the subject:

http://www.stat.uiowa.edu/techrep/tr378.pdf

Dennis

On Thu, Sep 2, 2010 at 3:59 PM, C Peng  wrote:

>
> Agree with Greg's point. In fact, it does not make logical sense in many
> cases; it is similar to the use of the "statistically unreliable"
> reliability measure Cronbach's alpha in some non-statistical fields.




Re: [R] calculate monthly mean

2010-09-02 Thread Joshua Wiley
Hello Ali,

It looks to me like you have two questions: 1) the error you get from
scatterplot3d, and 2) how to calculate the monthly mean.

If you could provide some sample data, that would make it a lot easier
to help you understand why the error occurred and what to do about it.
For now, maybe this will get you started on calculating the monthly
mean.

# Create a data frame with the data in 'x' and the dates in 'y'
sampdat <- data.frame(x = 1:365, y = Sys.Date() + 1:365)
# convert 'y' to a factor by cut()ing the dates by month
sampdat$y <- cut.Date(x = sampdat$y, breaks = "month",
  include.lowest = TRUE, right = TRUE)
# now just find the mean of 'x' for each level of 'y'
by(data = sampdat$x, INDICES = sampdat$y, FUN = mean)


# For some documentation
?Date
?cut.Date #cut() will call the right method, but I wanted to be explicit
?by
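
An equivalent one-liner (same sampdat as above), returning the means as a
data frame:

aggregate(x ~ y, data = sampdat, FUN = mean)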

Cheers,

Josh

On Thu, Sep 2, 2010 at 5:06 PM, Ali Alsamawi  wrote:
>
>
> Hello group
>
>
>       I'm trying to plot in 3D with the scatterplot3d package, and I got an
> error saying "length(color) must be equal length(x) or 1". My data has
> dimensions (lon, lat, lev, time), with time in months. I want to calculate
> the monthly mean over the time dimension. How can I do that? Is there a
> function for this?
>
> Thanks a lot
>
> 
> ##load rgl package
> library(rgl)
> library(fields)
> library(ncdf)
> library(scatterplot3d)
> ## open binary file to read
>     nc <- open.ncdf("/srv/ccrc/data05/z3236814/mm5/co2/2000/q.21.mon.nc")
>
> v1 <- nc$var [[1]]
>
> v2 <- nc$var [[2]]
>
> v3 <- nc$var [[3]]
>
> data1 <- get.var.ncdf(nc,v1)
> data2 <- get.var.ncdf(nc,v2)
> data3 <- get.var.ncdf(nc,v3)
>
>
> coldat = data1[1:111,1:101,23,1:60]
>
> ## creat colour
> hcol = cumsum(coldat)
> coldat = hcol
> hcol = hcol/max(hcol,na.rm=TRUE)
>
>
> col <- hsv(h=hcol,s=1,v=1)
>
> X <- scatterplot3d(data3[1:111,1:101], data2[1:111,1:101],
>                    data1[1:111,1:101,23,1:60], color=col)



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



[R] define colors for groups in lattice xyplot

2010-09-02 Thread geert . demeyer
Dear all,

Lattice provides automatic coloring for subgroups on each panel through
the simple use of a groups statement. For an application I want to change
these colors to a predefined set. This works well using a panel function
instead of the default as long as there are only points in the graphs.
When I set type="b", things get messed up. Any idea why? I include sample
code for illustration below.

Thanks for your ideas.

Geert

dataset <- data.frame(time = rep(1:5, times = 9),
                      genotype = factor(rep(rep(c("A","B","C"), each = 5),
                                            times = 3)),
                      location = factor(rep(paste("LOC", 1:3), each = 15)),
                      color = rep(rep(c("red","green","blue"), each = 5),
                                  times = 3),
                      result = rnorm(45))

library(lattice)

xyplot( result ~ time | location, data=dataset,groups=genotype,pch=19, 
type="b")

xyplot(result ~ time | location, data = dataset, groups = genotype,
       fill.color = dataset$color,
       panel = function(x, y, fill.color, ..., subscripts) {
           fill <- fill.color[subscripts]
           panel.xyplot(x, y, pch = 19, col = fill)
       })

xyplot(result ~ time | location, data = dataset, groups = genotype,
       fill.color = dataset$color,
       panel = function(x, y, fill.color, ..., subscripts) {
           fill <- fill.color[subscripts]
           panel.xyplot(x, y, pch = 19, col = fill, type = "b")
       })
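
A sketch of an alternative that avoids the custom panel entirely, likely
sidestepping the problem: with a custom panel the groups are drawn in one
pass, so the "b" lines can join points across genotypes. Setting the group
colors through the superpose settings keeps lattice's own per-group
handling (the colors below match the dataset's color column):

myCols <- c("red", "green", "blue")
xyplot(result ~ time | location, data = dataset, groups = genotype,
       type = "b", pch = 19,
       par.settings = list(superpose.symbol = list(col = myCols, pch = 19),
                           superpose.line = list(col = myCols)))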

