Re: [R] maxNR in maxLik package never stops

2010-08-30 Thread Arne Henningsen
Hi Martin!

Which version of maxLik do you use?
(You can figure this out, e.g. by typing 'help( package = "maxLik" )'
at the R prompt.)

If you are not using version 0.8-0, please upgrade to the latest version.

You can get more information on the optimization process by adding
"print.level=4" as argument of maxLik() or maxNR().

You could also send us an example that allows us to replicate your problem.

Best wishes,
Arne


On 29 August 2010 01:06, "Martin Boďa"  wrote:
>
> Greetings,
>
>
>
>  I use the maxNR function from the maxLik package to find the REML estimates 
> of the variance component parameters in heteroskedastic linear regression 
> models. I assume additive/multiplicative/mixed heteroskedasticity in the 
> model and need to estimate the respective parameters of the 
> additive/multiplicative/mixed variance components.
>
>
>
>  For my research purposes I run a batch of simulations and estimate the 
> variance component parameters for a given design matrix but different 
> (simulated) response vectors. However, it happens very frequently, usually 
> after the first simulation, that R "freezes" in the log-likelihood 
> maximization step (for which maxNR is used). Nothing appears wrong with R, 
> it keeps working, but the maximization never stops (frankly, I cannot say it 
> is really "never"; I did not allow it more than one day). Therefore my 
> simulations crash - I cannot complete even one simulation.
>
>
>
>  I see two possible solutions:
>
>  1. Using some additional parameters of maxNR (such as number of iterations 
> or something).
>
>  2. Using some function that could help me control the length of the 
> maximization step - if it were to take too long, it would be evaluated as an 
> unsuccessful simulation.
>
>
>
>  Since I do not have much experience with R, it is clear to me that I cannot 
> resolve this myself. Maybe there are other viable solutions.
>
>
>
>  I shall be much obliged for your help and advice.
>
>
>
>
> Martin Boďa
>
>  (Matej Bel University / Faculty of Natural Sciences / Banská Bystrica / 
> Slovakia).
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Arne Henningsen
http://www.arne-henningsen.name

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Band-wise Conditional Sum - Actual problem

2010-08-30 Thread Vincy Pyne
Dear R helpers,

Thanks a lot for your earlier guidance, especially Mr David Winsemius. However, 
there seems to have been a miscommunication from my end about my requirement. 
As I had mentioned in my earlier mail, I am dealing with a very large database 
of borrowers, part of which I had given in my earlier mail as reproduced below. 
For a given rating, say "A", I needed the band-wise sums of the eads (where the 
bands are constructed using the ead size itself), and not the number of 
borrowers falling in a particular band.

I am reproducing the data and the solution provided by Mr Winsemius (which 
generates the number of band-wise borrowers for a given rating).

rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", 
"A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A",
 "AA", "B","A", "AA", "BBB", "A", "BBB")

ead <- c(169229.93,100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360,  
600, 17715000,  14430325.24, 1180946.57, 15, 167490, 81255.16, 54812.5, 
3000, 1275702.94, 9100, 1763142.3, 3283048.61, 120, 11800, 3000,  
96894.02,  453671.72,  7590, 106065.24, 940711.67,  2443000, 950, 39000, 
1501939.67)

df <- data.frame(rating, ead)   # combine rating and ead into a data frame

df$ead.cat <- cut(df$ead, breaks = c(0, 1e+05, 5e+05, 1e+06, 2e+06, 5e+06,
                                     1e+07, 1e+08))

df  

df_sorted <- df[order(df$rating),]   # the output is as given below.

> df_sorted
   rating         ead       ead.cat
1       A   169229.93 (1e+05,5e+05]
3       A  5877794.25 (5e+06,1e+07]
6       A    21000.00     (0,1e+05]
12      A       15.00 (1e+05,5e+05]
13      A   167490.00 (1e+05,5e+05]
18      A     9100.00     (0,1e+05]
23      A     3000.00     (0,1e+05]
25      A   453671.72 (1e+05,5e+05]
28      A   940711.67 (5e+05,1e+06]
31      A    39000.00     (0,1e+05]
5      AA 75040962.06 (1e+07,1e+08]
9      AA 17715000.00 (1e+07,1e+08]
10     AA 14430325.24 (1e+07,1e+08]
11     AA  1180946.57 (1e+06,2e+06]
14     AA    81255.16     (0,1e+05]
17     AA  1275702.94 (1e+06,2e+06]
26     AA     7590.00     (0,1e+05]
29     AA  2443000.00 (2e+06,5e+06]
2     AAA      100.00     (0,1e+05]
19    AAA  1763142.30 (1e+06,2e+06]
27      B   106065.24 (1e+05,5e+05]
7      BB  1028360.00 (1e+06,2e+06]
15     BB    54812.50     (0,1e+05]
22     BB    11800.00     (0,1e+05]
24     BB    96894.02     (0,1e+05]
4     BBB  9530148.63 (5e+06,1e+07]
8     BBB      600.00 (5e+06,1e+07]
16    BBB     3000.00     (0,1e+05]
20    BBB  3283048.61 (2e+06,5e+06]
21    BBB      120.00 (1e+06,2e+06]
30    BBB      950.00 (5e+06,1e+07]
32    BBB  1501939.67 (1e+06,2e+06]


## The following command fetches the rating-wise and ead-size-wise number of 
borrowers. Thus, for rating A, there are 4 borrowers in the ead range 
(0,1e+05], 4 borrowers in the range (1e+05,5e+05], and so on.

> with(df, tapply(ead.cat, rating, table))
$A

(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
        4             4             1             0             0             1             0 

$AA

(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
        2             0             0             2             1             0             3 

$AAA

(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
        1             0             0             1             0             0             0 

$B

(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
        0             1             0             0             0             0             0 

$BB

(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
        3             0             0             1             0             0             0 

$BBB

(0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
        1             0             0             2             1             3             0


 My ACTUAL REQUIREMENT

Actually, for a given rating, I don't want the number of borrowers falling in 
each ead range. What I want is the sum of the eads falling in each range. 
Thus, say for rating "A", I need the following.


  rating       ead.cat  ead_total
1      A     (0,1e+05]   72100.00   # (21000+9100+3000+39000)
2      A (1e+05,5e+05]  940391.65   # (169229.93+15.00+167490.00+453671.72
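What the poster asks for can be had directly from tapply() or aggregate(). This is a sketch, not from the thread, assuming a data frame df with columns rating, ead and ead.cat as constructed above:

```r
# sum of ead within each rating x band cell (NA where a band is empty):
with(df, tapply(ead, list(rating, ead.cat), sum))

# or in long form, one row per non-empty rating/band combination:
aggregate(ead ~ rating + ead.cat, data = df, FUN = sum)
```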

Re: [R] LOOping problem with R

2010-08-30 Thread Berend Hasselman

You made a mistake with theta

theta<-c(0.08,0.06,0.09,0)

This should be (see the fortran)

theta<-c(0.06,0.08,0.09,0)

The innermost loop (for( k in ...)) is better written as a while loop to take
into account how Fortran handles loops (see the previous replies):

k <- i
while( k <= j-1 ){
   a<-a*theta[k]
   k <- k + 1
}
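An equivalent vectorized form (a sketch; i, j, theta and a as in the thread) also does nothing when j - 1 < i, matching the Fortran loop semantics:

```r
theta <- c(0.06, 0.08, 0.09, 0)
i <- 2; j <- 4; a <- 1
# prod() over the selected elements replaces the explicit loop;
# the guard skips the update entirely when the index range is empty
if (j - 1 >= i) a <- a * prod(theta[i:(j - 1)])
# a == 0.08 * 0.09
```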

Berend



-- 
View this message in context: 
http://r.789695.n4.nabble.com/LOOping-problem-with-R-tp2399596p2399709.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] glm prb (Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") : )

2010-08-30 Thread peter dalgaard

On Aug 29, 2010, at 10:24 PM, David Winsemius wrote:

> 
> On Aug 29, 2010, at 3:13 PM, moleps wrote:
> 
>> 
>> glm(A~B+C+D+E+F,family = binomial(link = "logit"),data=tre,na.action=na.omit)
>> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
>> contrasts can be applied only to factors with 2 or more levels
>> 
>> however,
>> 
>> glm(A~B+C+D+E,family = binomial(link = "logit"),data=tre,na.action=na.omit)
>> 
>> runs fine
>> 
>> glm(A~B+C+D+F,family = binomial(link = "logit"),data=tre,na.action=na.omit)
>> 
>> runs fine
>> 
>> 
>> 
>> glm(A~E+F,family = binomial(link = "logit"),data=tre,na.action=na.omit)
>> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
>> contrasts can be applied only to factors with 2 or more levels
>> 
>> Why is this? Could it be due to collinearity between the two?
> 
> Perhaps, at least to the extent that the term "collinearity" is an 
> appropriate term for factor interactions. The obvious question at this point 
> is: What does:
> 
> with( tre, table(E,F) )   # show?

Yes. Also notice the pattern of missing values. A likely scenario for the error 
would be to have E and F being something like "gender" and "menarche", where 
the latter is absent for boys.
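A minimal sketch of that scenario (hypothetical data, not the poster's 'tre'): one variable is defined only for a subset of cases, so after NA omission the factor has fewer than two observed levels:

```r
# hypothetical data illustrating the error; the column named F here mirrors
# the original post (and shadows the F/FALSE shorthand inside the data frame)
tre <- data.frame(
  A = rep(0:1, 10),
  E = factor(rep(c("boy", "girl"), each = 10)),
  F = factor(ifelse(rep(c(TRUE, FALSE), each = 10), NA, "yes"))
)
# na.omit drops the rows where F is NA; F then has a single level:
glm(A ~ E + F, family = binomial, data = tre, na.action = na.omit)
# Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
#   contrasts can be applied only to factors with 2 or more levels
```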

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



[R] R crashes when communicating with JAGS

2010-08-30 Thread Florian Weiler
Dear all,

I have a question regarding using JAGS with R. My problem is that every
single time I call JAGS from R, the latter crashes (i.e. it turns pale
and Windows tells me "R has stopped working"). Strangely, if there is a
mistake in my JAGS code, this is reported without a crash. Only when the
code is right and the model should start to run does R stop working. I
already tried different versions of R (currently R 2.10.1 and
JAGS-2.1.0-1), but to no avail. A couple of days ago I even upgraded
from Vista to Windows 7, hoping this would sort out the problem, but it
is still exactly the same. Could anyone think of a possible solution for
this problem?

I worked on other computers using Windows without encountering the
described problem. However, I am quite new to using JAGS, so I might
just be missing the obvious.

Thanks for your help and precious time,
Florian



[R] compare three values

2010-08-30 Thread Alfredo Alessandrini
Hi,

I've three values. What is the best method to choice the lowest values
with an if function?

example:

a = 3

b = 1

c = 5

if (lowest(a,b,c) is a) {}

if (lowest(a,b,c) is b) {}

if (lowest(a,b,c) is c) {}



Thanks,

Alfredo



Re: [R] R crashes when communicating with JAGS

2010-08-30 Thread Florian Weiler
Hi again,

I was asked to provide some code. In my case it doesn't really matter
which example I use, so I'll just write down a very basic, canned
example:

First I create some data:
N <- 1000
x <- rnorm(N, 0, 5)

Then I specify a model in JAGS, storing it in the directory with the
extension .bug:
model {
for (i in 1:N) {
x[i] ~ dnorm(mu, tau) ## the model
}
mu ~ dnorm(0, .0001) ## uninformative priors
tau <- pow(sigma, -2)
sigma ~ dunif(0, 100)
}

Then I call the rjags library and finally I try to run the jags model:
library('rjags')

jags <- jags.model('example.bug',
                   data = list('x' = x,
                               'N' = N))

This is where it always crashes, i.e. I don't get an error message;
R simply stops working and I have to close and reopen the program. I
don't even get to draw samples from the model; it crashes the moment I
try to call JAGS.

Sorry for not providing the example right away,
Florian


On Mon, Aug 30, 2010 at 9:17 AM, Florian Weiler
 wrote:
> Dear all,
>
> I have a question regarding using JAGS and R. My problem is that every
> single time I want to call JAGS from R the latter crashes (i.e. it
> turns pale and the Windows tells me "R has stopped working".
> Strangely, if there is a mistake in my jags code, this will be
> reported without a crash. Only when the code is right and the model
> should start to run, R stops working. I already tried to use different
> version of R (currently R 2.10.1 and JAGS-2.1.0-1),but to no avail. A
> couple of days ago I even upgraded from Vista to Windows 7, hoping
> this would sort our the problem, but its still exactly the same. Could
> anyone think of a possible solution for this problem?
>
> I worked on other computers using Windows without encountering the
> described problem. However, I am quite new to using JAGS, so I might
> just be missing the obvious.
>
> Thanks for your help and precious time,
> Florian
>



Re: [R] compare three values

2010-08-30 Thread ONKELINX, Thierry
Have a look at switch() and which.min()

x <- c(a = 3, b = 1, c = 5)
switch(which.min(x), "a is lowest", "b is lowest", "c is lowest")
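Since which.min() returns the index of the first minimum, the asker's three if () branches can also collapse to a single name lookup (a small variation on the above, not a different method):

```r
x <- c(a = 3, b = 1, c = 5)
lowest <- names(x)[which.min(x)]  # name of the smallest element: "b"
if (lowest == "b") {
  # ... handle the case where b is the lowest value
}
```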

HTH,

Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
  

> -Oorspronkelijk bericht-
> Van: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] Namens Alfredo Alessandrini
> Verzonden: maandag 30 augustus 2010 12:03
> Aan: r-help@r-project.org
> Onderwerp: [R] compare three values
> 
> Hi,
> 
> I've three values. What is the best method to choice the 
> lowest values with an if function?
> 
> example:
> 
> a = 3
> 
> b = 1
> 
> c = 5
> 
> if (lowest(a,b,c) is a) {}
> 
> if (lowest(a,b,c) is b) {}
> 
> if (lowest(a,b,c) is c) {}
> 
> 
> 
> Thanks,
> 
> Alfredo
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.

Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer 
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in  this message 
and any annex are purely those of the writer and may not be regarded as stating 
an official position of INBO, as long as the message is not confirmed by a duly 
signed document.



Re: [R] maxNR in maxLik package never stops

2010-08-30 Thread Ott-Siim Toomet
Hi,

sure, maxNR uses 'iterlim' (or was it 'maxIter'?), where you can specify
the maximum number of iterations.  The default was 100 iterations, AFAIK.

There was a bug in one of the previous versions which led to infinite
loops in constrained maximization.  Do you use constrained or
unconstrained maximization?
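A sketch of both of the original poster's options on a toy log-likelihood (not his REML problem; setTimeLimit() is base R and is an addition, not from the thread):

```r
library(maxLik)
set.seed(1)
x <- rnorm(100, mean = 2, sd = 1)
# toy normal log-likelihood in (mu, log sigma):
loglik <- function(theta) {
  sum(dnorm(x, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}

# 1. cap the number of Newton-Raphson iterations so a run cannot loop forever:
fit <- maxNR(loglik, start = c(mu = 0, logsigma = 0), iterlim = 200)
summary(fit)  # reports whether convergence or the iteration limit stopped it

# 2. coarse wall-clock guard around one simulation run:
res <- tryCatch({
  setTimeLimit(elapsed = 600, transient = TRUE)  # abort after ~10 minutes
  maxNR(loglik, start = c(mu = 0, logsigma = 0), iterlim = 200)
}, error = function(e) NULL)  # NULL marks an unsuccessful simulation
```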

and yes -- send us a (non)working example!

Best,
Ott

> Hi Martin!
>
> Which version of maxLik do you use?
> (You can figure this out, e.g. by typing 'help( package = "maxLik" )'
> at the R prompt.)
>
> If you are not using version 0.8-0, please upgrade to the latest version.
>
> You can get more information on the optimization process by adding
> "print.level=4" as argument of maxLik() or maxNR().
>
> You could send us also an example which allows us to replicate your
> problem.
>
> Best wishes,
> Arne
>
>
> On 29 August 2010 01:06, "Martin Boďa"  wrote:
>>
>>
>>
>>
>>
>>
>> Greetings,
>>
>>
>>
>>  I use maxNR function under maxLik package to find the REML estimates
>> of the parameters of variance components in heteroskedastic linear
>> regression models. I assume that in the model there is
>> additive/multiplicative/mixed heteroskedasticity and I need estimate the
>> respective parameters of additive/multiplicative/mixed variance
>> components.
>>
>>
>>
>>  For my research purposes I make a batch of simulations and estimate
>> for the given design matrix but different (simulated) response vectors
>> the variance component parameters. However, it happens very frequently,
>> usually after the first simulation that the log-likelihood function
>> maximization step (for which maxNR is used) R "freezes". It appears
>> nothing wrong with R, it is working but the maximization never stops
>> (frankly, I cannot say that it is really "never", I did not allow it
>> more time than one day). Therefore, my simulations crash - I cannot even
>> perform one simulation.
>>
>>
>>
>>  I see two possible solutions:
>>
>>  1. Using some additional parameters of maxNR (such as number of
>> iterations or something).
>>
>>  2. Using some function that could help me control the length of the
>> maximization step - if it were to take to long, it would be evaluated as
>> an unsuccessful simulation.
>>
>>
>>
>>  Since I do not have much experience with R, it is clear to me that I
>> cannot resolve the thing myself. Maybe there are other viable solutions.
>>
>>
>>
>>  I shall be much obliged for your help and advice.
>>
>>
>>
>>
>> Martin Boďa
>>
>>  (Matej Bel University / Faculty of Natural Sciences / Banská Bystrica
>> / Slovakia).
>>
>>
>>
>>
>>
>>        [[alternative HTML version deleted]]
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> --
> Arne Henningsen
> http://www.arne-henningsen.name
>



[R] lattice: limits in reversed order with relation="same"

2010-08-30 Thread Thaler, Thorn, LAUSANNE, Applied Mathematics
Hi everybody,

I want an x-axis which has xlim=c(max, min) rather than xlim=c(min, max)
in order to reflect the type of the process (cooling):

library(lattice)
myprepanel <- function(x,y,...) list(xlim=rev(range(x)))
x <- rep(1:10, 100)
z <- factor(sample(10, 1000, T))
y <- rnorm(1000, x, as.numeric(z))

xyplot(y~x|z, scales=list(x="free"), prepanel=myprepanel)

This works as expected. But I don't like to have the x-axis annotation
in every panel since in fact it is the same in every panel. Doing the
same with relation="same" does not work, however. From "Lattice -
Multivariate Data Visualization with R" p.142:

"Note that this will not work when relation="same", because in that case
the process of combining limits from different packets makes the final
limit sorted [...]"

So which part do I have to change to get a multipanel plot with x-axis
in reversed order but where the x-axes are only drawn in the panels at
the boundary (like in xyplot(y~x|z))? Do I have to provide an own axis
function?

Any help appreciated.

BR

Thorn



Re: [R] lattice: limits in reversed order with relation="same"

2010-08-30 Thread Deepayan Sarkar
On Mon, Aug 30, 2010 at 3:57 PM, Thaler, Thorn, LAUSANNE, Applied
Mathematics  wrote:
> Hi everybody,
>
> I want an x-axis which has xlim=c(max, min) rather than xlim=c(min, max)
> in order to reflect the type of the process (cooling):
>
> library(lattice)
> myprepanel <- function(x,y,...) list(xlim=rev(range(x)))
> x <- rep(1:10, 100)
> z <- factor(sample(10, 1000, T))
> y <- rnorm(1000, x, as.numeric(z))
>
> xyplot(y~x|z, scales=list(x="free"), prepanel=myprepanel)
>
> This works as expected. But I don't like to have the x-axis annotation
> in every panel since in fact it is the same in every panel. Doing the
> same with relation="same" does not work, however. From "Lattice -
> Multivariate Data Visualization with R" p.142:
>
> "Note that this will not work when relation="same", because in that case
> the process of combining limits from different packets makes the final
> limit sorted [...]"
>
> So which part do I have to change to get a multipanel plot with x-axis
> in reversed order but where the x-axes are only drawn in the panels at
> the boundary (like in xyplot(y~x|z))? Do I have to provide an own axis
> function?

No (in fact that wouldn't work anyway), you can simply do

xyplot(y~x|z, xlim = rev(extendrange(x)))

The point is that in this case you have to explicitly specify a
pre-computed limit, and cannot rely on the prepanel function to give
you a nice default.

-Deepayan
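Putting the suggestion next to the thread's own data gives a complete sketch (set.seed() added here for reproducibility; extendrange() is exported from grDevices):

```r
library(lattice)
set.seed(1)
x <- rep(1:10, 100)
z <- factor(sample(10, 1000, TRUE))
y <- rnorm(1000, x, as.numeric(z))

# shared (relation = "same") x-axis, drawn in reversed order,
# with axes only on the boundary panels as in a plain xyplot(y ~ x | z):
xyplot(y ~ x | z, xlim = rev(extendrange(x)))
```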



Re: [R] lattice: limits in reversed order with relation="same"

2010-08-30 Thread Thaler, Thorn, LAUSANNE, Applied Mathematics
> No (in fact that wouldn't work anyway), you can simply do
> 
> xyplot(y~x|z, xlim = rev(extendrange(x)))
> 
> The point is that in this case you have to explicitly specify a
> pre-computed limit, and cannot rely on the prepanel function to give
> you a nice default.

Thanks, that does the trick. Because I'm curious: is extendrange(x) the 
default way to calculate the limits if none are given?

By the way, it would be very nice if one could write


df <- data.frame(x=1:3, y=2:4, df.col=3:5)
xyplot(y~x, data=df, col=df.col)

instead of 

xyplot(y~x, data=df, col=df$df.col)

which is a common mistake I make, assuming that providing the data argument to 
a lattice command allows omitting the name of the data.frame in all parts of 
the argument list, not just in the formula part. Is there a reason why this is 
not the default behavior?

Thanks again + BR

Thorn




Re: [R] odd subset behavior

2010-08-30 Thread Ivan Calandra
  Hi!

I might be a bit late, but maybe you meant:
x[x>= -0.50 | x<= -1.0]

See ?"|"

HTH,
Ivan



Le 8/29/2010 04:54, Joshua Wiley a écrit :
> Dear Erin,
>
> -0.50 is greater than -1.00, this means that your logical test returns
> all FALSE answers.  Now consider the results of:
>
> x[FALSE]
>
> you are selecting none of 'x', hence numeric(0).
>
> Perhaps you mean:
>
> x[x<= -0.50&  x>= -1.0]
>
> HTH,
>
> Josh
>
> On Sat, Aug 28, 2010 at 7:47 PM, Erin Hodgess  wrote:
>> Dear R People:
>>
>> I just produced the following:
>>
>>   x<- seq(-3,3,.01)
>>   x[x>= -0.50&  x<= -1.0]
>> numeric(0)
>>   x[x>= -0.50&&  x<= -1.0]
>> numeric(0)
>>
>> What am I doing wrong, please?  This seems strange.
>>
>> Thanks,
>> Erin
>>
>>
>> --
>> Erin Hodgess
>> Associate Professor
>> Department of Computer and Mathematical Sciences
>> University of Houston - Downtown
>> mailto: erinm.hodg...@gmail.com
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php


[[alternative HTML version deleted]]



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
It's not just that counts might be zero, but also that the base of
each bar starts at zero. I really don't see how logging the y-axis of
a histogram makes sense.

Hadley

On Sunday, August 29, 2010, Joshua Wiley  wrote:
> Hi Derek,
>
> Here is an option using the package ggplot2:
>
> library(ggplot2)
> x <- sample(x = 10:50, size = 50, replace = TRUE)
> qplot(x = x, geom = "histogram") + scale_y_log10()
>
> However, the log scale is often inappropriate for histograms, because
> the y-axis represents counts, which could potentially be 0, and
> therefore undefined (R outputs -Inf).  Another option using base
> graphics would be something along the lines (no pun intended) of:
>
> temp <- hist(x, plot = FALSE) #get histogram data
> plot(x = temp$mids, y = log(temp$counts), type = "h")
>
> HTH,
>
> Josh
>
> On Sun, Aug 29, 2010 at 6:58 PM, Derek M Jones  wrote:
>> All,
>>
>> I have been trying to get calls to hist(...) to be plotted
>> with the y-axis having a log scale.
>>
>> I have tried: par(ylog=TRUE)
>>
>> I have also looked at the histogram package.
>>
>> Suggestions welcome.
>>
>> --
>> Derek M. Jones                         tel: +44 (0) 1252 520 667
>> Knowledge Software Ltd                 mailto:de...@knosof.co.uk
>> Source code analysis                   http://www.knosof.co.uk
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
> __
> r-h...@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] lattice: limits in reversed order with relation="same"

2010-08-30 Thread Deepayan Sarkar
On Mon, Aug 30, 2010 at 3:56 AM, Thaler,Thorn,LAUSANNE,Applied
Mathematics  wrote:
>> No (in fact that wouldn't work anyway), you can simply do
>>
>> xyplot(y~x|z, xlim = rev(extendrange(x)))
>>
>> The point is that in this case you have to explicitly specify a
>> pre-computed limit, and cannot rely on the prepanel function to give
>> you a nice default.
>
> Thanks that does the trick. Because I'm curious: is extendrange(x) the 
> default way to
> calculate the limits if not given?

Well, it's actually lattice:::extend.limits(range(x)), but
extendrange() does basically the same thing, and is available to the
user (i.e., exported), albeit from a different package.

> By the way, it would be very nice if one could write
>
>
> df <- data.frame(x=1:3, y=2:4, df.col=3:5)
> xyplot(y~x, data=df, col=df.col)
>
> instead of
>
> xyplot(y~x, data=df, col=df$df.col)

And what if, say,

df <- data.frame(x=1:3, y=2:4, df.col=c("one", "two", "three"))

and you still want the different 'df.col' values to have different colors?

The Trellis philosophy thinks of 'df.col' as a "grouping variable" (to
be distinguished by color etc. depending on context), so you would say

xyplot(y~x, data=df, groups=df.col)

ggplot2 goes further and allows you to associate different graphical
parameters (color, plotting character, etc.) with different variables,
which is a bit harder to do with lattice. In either case, the point is
that except in special cases, you cannot expect the values of a
variable to be exactly interpretable as valid R color specifications.

You can of course do

with(df, xyplot(y~x, col=df.col))

-Deepayan



Re: [R] predict.loess and NA/NaN values

2010-08-30 Thread Prof Brian Ripley

The underlying problem is your expectations.

R (unlike S) was set up many years ago to use na.omit as the default, 
and when fitting, both lm() and loess() silently omit cases with 
missing values.  So why should prediction from 'newdata' be different 
unless documented to be so (which it is nowadays for predict.lm, 
even though you are adding to the evidence that that was a mistake)?


loess() is somewhat different from lm() in that it does not in general 
allow extrapolation, and the prediction for Inf and NaN is simply 
undefined.


Nevertheless, take a look at the version in R-devel (pre-2.12.0), which 
gives you more options.
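One user-level way to code around the dropped cases (a sketch, not from the thread; it assumes the single predictor column is named 'x') is to pad the predictions back to the full length of newdata:

```r
# return a vector aligned with newdata, NA where prediction is undefined
predict_loess_padded <- function(model, newdata) {
  out <- rep(NA_real_, nrow(newdata))
  ok  <- is.finite(newdata$x)   # drop Inf, -Inf, NA, NaN before predicting
  if (any(ok)) out[ok] <- predict(model, newdata[ok, , drop = FALSE])
  out
}

x <- rnorm(15); y <- x + rnorm(15)
m <- loess(y ~ x)
predict_loess_padded(m, data.frame(x = c(0.5, Inf, -Inf, NA, NaN)))
# length-5 result, NA in the positions predict.loess would silently skip
```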


On Fri, 27 Aug 2010, Philipp Pagel wrote:



Hi!

In a current project, I am fitting loess models to subsets of data in
order to use the loess predicitons for normalization (similar to what
is done in many microarray analyses). While working on this I ran into
a problem when I tried to predict from the loess models and the data
contained NAs or NaNs. I tracked down the problem to the fact that
predict.loess will not return a value at all when fed with such
values. A toy example:

x <- rnorm(15)
y <- x + rnorm(15)
model.lm <- lm(y~x)
model.loess <- loess(y~x)
predict(model.lm, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))
predict(model.loess, data.frame(x=c(0.5, Inf, -Inf, NA, NaN)))

The behaviour of predict.lm meets my expectation: I get a vector of
length 5 where the unpredictable ones are NA or NaN. predict.loess on the
other hand returns only 3 values quietly skipping the last two.

I was unable to find anything in the manual page that explains this
behaviour or says how to change it. So I'm asking the community: Is
there a way to fix this or do I have to code around it?

This is in R 2.11.1 (Linux), by the way.

Thanks in advance

Philipp


--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] lattice: limits in reversed order with relation="same"

2010-08-30 Thread Thaler, Thorn, LAUSANNE, Applied Mathematics
> Well, it's actually lattice:::extend.limits(range(x)), but
> extendrange() does basically the same thing, and is available to the
> user (i.e., exported), albeit from a different package.

Thanks again Deepayan for the insight.

A followup question though-- in another setting I'd like to have 
relation="free" to allow for different x scales. Thus, I can use the prepanel 
function: 

myprepanel <- function(x,y,...) list(xlim=rev(range(x)))

x <- rep(1:10, 100)
z <- factor(sample(10, 1000, T))
y <- rnorm(1000, x, as.numeric(z))

xyplot(y~x|z, scales=list(x="free"), prepanel=myprepanel)

which works as expected. But now I want to reverse the x-axis only for specific 
panels (z %in% 1:9 say). In the above example 'myprepanel' is called 10 times 
(well actually it is called nlevels(z) times). Hence, my approach would be to 
include an 'if' clause in the prepanel function and reverse the xlims only for 
the appropriate panels. 

What I usually do in such cases in panel functions (not beautiful but working 
though) is something like

if (panel.number() == 1)
  do something

I'm aware that this is maybe not in the spirit of lattice and that this kind of 
code maybe breaks as soon as the ordering of panels may change. However, it 
does the trick for panel functions. It does, however, not work for the prepanel 
since panel.number() is not defined at this stage (well, actually it returns a 
0x0 matrix).

Thus, I've three questions:

1.) How can I achieve different xlims for different panels in an _automatic_ 
way? What I've done so far is something like

xyplot(y~x|z, scales=list(x="free"), prepanel=myprepanel, xlim=list(NULL, NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, extendrange(x[z==10])))

Thus, for the first 9 panels the xlims are calculated automatically with 
myprepanel while for the last one I've to provide the limits explicitly. Is 
there a more elegant way to do this?
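One possibility (a hedged sketch, not a confirmed lattice idiom): build the xlim list programmatically from the data instead of spelling out the NULLs by hand, reversing the range only for the panels that should run right-to-left:

```r
# Hedged sketch: per-panel x limits computed up front; panels for
# levels 1-9 get a reversed range, the last panel a normal one.
library(lattice)
x <- rep(1:10, 100)
z <- factor(sample(10, 1000, TRUE))
y <- rnorm(1000, x, as.numeric(z))

xlims <- lapply(levels(z), function(lev) {
  r <- extendrange(x[z == lev])
  if (lev %in% as.character(1:9)) rev(r) else r   # reverse panels 1-9
})
xyplot(y ~ x | z, scales = list(x = "free"), xlim = xlims)
```

This keeps the automatic extendrange() behaviour but puts the per-panel decision in ordinary code rather than in the prepanel function.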

2.) Within a prepanel function: do I have any possibility to get the 
information for which panel the prepanel function is called?

3.) Is the only way to adjust the panel function for different panels using 
panel.number() or is there another (better) way?

Thanks again for your kind help, it improves my insight on lattice a lot,

BR, Thorn



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Derek M Jones

Hadley,


It's not just that counts might be zero, but also that the base of
each bar starts at zero. I really don't see how logging the y/axis of
a histogram makes sense.


I have counts ranging over 4-6 orders of magnitude with peaks
occurring at various 'magic' values.  Using a log scale for the
y-axis enables the smaller peaks, which would otherwise
be almost invisible bumps along the x-axis, to be seen.

The references given for logHist in David Scott's DistributionUtils
package are:

Barndorff-Nielsen, O. (1977) Exponentially decreasing distributions for 
the logarithm of particle size, Proc. Roy. Soc. Lond., A353, 401–419.


Barndorff-Nielsen, O. and Blæsild, P (1983). Hyperbolic distributions. 
In Encyclopedia of Statistical Sciences, eds., Johnson, N. L., Kotz, S. 
and Read, C. B., Vol. 3, pp. 700–707. New York: Wiley.


Fieller, N. J., Flenley, E. C. and Olbricht, W. (1992) Statistics of 
particle size data. Appl. Statist., 41, 127–146.




Hadley

On Sunday, August 29, 2010, Joshua Wiley  wrote:

Hi Derek,

Here is an option using the package ggplot2:

library(ggplot2)
x <- sample(x = 10:50, size = 50, replace = TRUE)
qplot(x = x, geom = "histogram") + scale_y_log10()

However, the log scale is often inappropriate for histograms, because
the y-axis represents counts, which could potentially be 0, and
therefore undefined (R outputs -Inf).  Another option using base
graphics would be something along the lines (no pun intended) of:

temp <- hist(x, plot = FALSE)  # get histogram data
plot(x = temp$mids, y = log(temp$counts), type = "h")

HTH,

Josh

On Sun, Aug 29, 2010 at 6:58 PM, Derek M Jones  wrote:

All,

I have been trying to get calls to hist(...) to be plotted
with the y-axis having a log scale.

I have tried: par(ylog=TRUE)

I have also looked at the histogram package.

Suggestions welcome.

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk






--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/






--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk



[R] [Reply] R-help Digest, Vol 90, Issue 34

2010-08-30 Thread jungsg

   I no longer work at NFRDI. Please send your mail to me at
   sukgeun.j...@gmail.com.


[R] different interface to by (tapply)?

2010-08-30 Thread ivo welch
dear R experts:

has someone written a function that returns the results of by() as a
data frame?   of course, this can work only if the output of the
function that is an argument to by() is a numerical vector.
presumably, what is now names(byobject) would become a column in the
data frame, and the by object's list elements would become columns.
it's a little bit like flattening the by() output object (so that the
name of the list item and its contents become the same row), and
having the right names for the columns.  I don't know how to do this
quickly in the R way.  (Doing it slowly, e.g., with a for loop over
the list of vectors, is easy, but would not make a nice function for
me to use often.)

for example, lets say my by() output is currently

by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )

$`A`
[1] 2 3
$`B`
[1] 4 5

then the revised by() would instead produce

charid  m  s
A          2  3
B          4  5

working with data frames is often more intuitive than working with the
output of by().  the R wizards are probably chuckling now about how
easy this is...

regards,

/iaw


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



Re: [R] lattice help required

2010-08-30 Thread Kay Cecil Cichini

felix,
thanks a lot for the hint!

i actually found another way by setting up a panel function by which i  
can control every single panel with panel.number(). maybe there is  
more efficient coding - i don't know. i also alternated tickmarks and  
tick-labeling by panel-rows, which is nicer, but ran into the next  
trouble of not knowing how to trigger the ylabs which by default will  
be printed to only one side - i'll show code and would be happy if  
someone would point out how to put the ylabs correctly so that i'll be  
able to bring my lattice-odyssey to an end.


here's the code:

library(lattice)

y1 <- rnorm(120,100,10)
y2 <- rnorm(120,10,1)
facs <- expand.grid(Sites=rep(c("Site I","Site II","Site III"),20),
Treatment=c("A","B"))


trellis.par.set(clip=list(panel="off",strip="off"),
layout.widths=list(right.padding=10,
  left.padding=8))

PanFun<-function(...){
panel.dotplot(...)
if(is.element(panel.number(),1:2))
{at<-pretty(c(min(y1),max(y1)))
   panel.axis("right",at=at,outside=T,
   labels = F,half=F)}
if(panel.number() == 3)
{at<-pretty(c(min(y1),max(y1)))
   panel.axis("right",at=at,outside=T,
   labels = T,half=F)
   ylab="response y1"}  # how to set ylabs???
if(panel.number() == 4)
{at<-pretty(c(min(y2),max(y2)))
   panel.axis("left",at=at,outside=T,
   labels = T,half=F)
ylab="response y2"}    # how to set ylabs ???
if(is.element(panel.number(),5:6))
{at<-pretty(c(min(y2),max(y2)))
   panel.axis("left",at=at,outside=T,
   labels = F,half=F)}
}

dotplot(y1+y2 ~ Treatment|Sites,
outer=TRUE,data=facs,
scales = list(
x = list(rot = 0, tck=c(1,0)),
  y = list(draw=F,relation="free",
  limits=list(range(y1),range(y1),range(y1),
  range(y2),range(y2),range(y2))),
ylab=NULL,  ### this ylab argument would
xlab=c("Site 1", "Site 2","Site 3"),  ### plot only at left side..
strip=FALSE, panel=PanFun)
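For the ylab question, one hedged possibility: with outer=TRUE, lattice accepts a vector of labels, one per row of the layout, just as the xlab vector above supplies one label per column (Richard's earlier example uses the same form), so ylab need not be set inside the panel function at all:

```r
# Hedged sketch: pass one ylab per layout row instead of trying to
# assign ylab inside the panel function.
library(lattice)
y1 <- rnorm(120, 100, 10)
y2 <- rnorm(120, 10, 1)
facs <- expand.grid(Sites = rep(c("Site I", "Site II", "Site III"), 20),
                    Treatment = c("A", "B"))

dotplot(y1 + y2 ~ Treatment | Sites,
        outer = TRUE, data = facs,
        scales = list(y = list(relation = "free")),
        ylab = c("response y1", "response y2"),   # one label per row
        xlab = c("Site 1", "Site 2", "Site 3"),
        strip = FALSE)
```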


Zitat von Felix Andrews :


Deepayan Sarkar has a function combineLimits() in the development
version of the latticeExtra package (i.e. the version on
r-forge.r-project.org) which will set common scales in each row or
column of your layout. It can also remove the internal axes.

# Felix


On 26 August 2010 04:43, Kay Cichini  wrote:


.. thanks again, richard.
and you swiftly saw the next problem comming up - when using par.settings =
list(layout.widths = list(axis.panel = c(1, 0))) getting rid of the double
tick labeling would be natural -
but i'll leave it at that for today.

many thanks,
kay


Richard M. Heiberger wrote:


The multiple y axes are protecting you in this situation.


z <- cbind(rnorm(100,c(1,10),1), rnorm(100,c(20,30),1))
dotplot(z[,1]+z[,2] ~ facs$Treatment|facs$Sites,
        outer=TRUE,
        scales = list(
          y = list(
            relation="free")),
        ylab=c("y1", "y2"),
        xlab=c("Site 1", "Site 2"),
        strip=FALSE,
        main="problem")

dotplot(z[,1]+z[,2] ~ facs$Treatment|facs$Sites,
        outer=TRUE,
        scales = list(
          y = list(
            relation="free",
            limits=list(c(-5,13),c(-5,13),c(18,32),c(18,32)))),
        ylab=c("y1", "y2"),
        xlab=c("Site 1", "Site 2"),
        strip=FALSE, main="protected")

For more control (such as suppressing the y-tick labels in the right-hand
column,
I recommend Deepayan Sarkar's book.

Rich






-

Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck








--
Felix Andrews / ???
http://www.neurofr

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
> I have counts ranging over 4-6 orders of magnitude with peaks
> occurring at various 'magic' values.  Using a log scale for the
> y-axis enables the smaller peaks, which would otherwise
> be almost invisible bumps along the x-axis, to be seen

That doesn't justify the use of a _histogram_  - and regardless of
what distributional display you use, logging the counts imposes some
pretty heavy restrictions on the shape of the distribution (e.g. that
it must not drop to zero).

It may be useful for your purposes, but that doesn't necessarily make
it a meaningful graphic.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



[R] lost in vegan package

2010-08-30 Thread eva

Hi R Helpers, 
I'm still new to R and I experience many difficulties. I'm using the vegan
package (R version 2.11) trying to calculate checkerboard units for each
species pair of a matrix. I've prepared the function:

pair.checker <- function (dataset) {designdist(dataset,
method = "(A-J)*(B-J)", terms = "binary", abcd = FALSE)}

to use with function oecosimu as follows:

oecosimu(dataset, pair.checker, "tswap", nsimul=5000, burnin=0, thin=thin,
statistic="pair.checker")

It seemed to work but the output did not include each species pair name. I
don't know what to do. First column was all NAs. I copied and pasted the
results of the console and named each species pair in Excel by hand. But
then I got this really big matrix with 3828 possible species pairs. The
console couldn't show all posssible species pairs even after resetting the
max.print option. I've tried saving the output as:

 save(out, file="d:/eva", ascii=TRUE) 

but I first got a blank txt file and then when I retried the method I got 

Error in file (file, "wb"):cannot open the connection.

I can't use write.table either (I got the message that the output is class
dist...) .
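One hedged workaround for the write.table error: convert the "dist" object to an ordinary matrix first (shown with a stand-in dist object, since the real oecosimu output is not available here):

```r
# Hedged sketch: write.table() cannot handle class "dist", but it can
# write the square symmetric matrix that as.matrix() makes from one.
d <- dist(matrix(rnorm(20), nrow = 5))          # stand-in dist object
m <- as.matrix(d)
write.table(m, file = file.path(tempdir(), "checkerboard.txt"),
            quote = FALSE, sep = "\t")
```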

Thanks in advance

Eva





Re: [R] different interface to by (tapply)?

2010-08-30 Thread Henrique Dallazuanna
Try this:

as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))

On Mon, Aug 30, 2010 at 10:19 AM, ivo welch  wrote:

> dear R experts:
>
> has someone written a function that returns the results of by() as a
> data frame?   of course, this can work only if the output of the
> function that is an argument to by() is a numerical vector.
> presumably, what is now names(byobject) would become a column in the
> data frame, and the by object's list elements would become columns.
> it's a little bit like flattening the by() output object (so that the
> name of the list item and its contents become the same row), and
> having the right names for the columns.  I don't know how to do this
> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
> the list of vectors, is easy, but would not make a nice function for
> me to use often.)
>
> for example, lets say my by() output is currently
>
> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>
> $`A`
> [1] 2 3
> $`B`
> [2] 4 5
>
> then the revised by() would instead produce
>
> charid  m  s
> A  2  3
> B  4  5
>
> working with data frames is often more intuitive than working with the
> output of by().  the R wizards are probably chuckling now about how
> easy this is...
>
> regards,
>
> /iaw
>
> 
> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
>
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] different interface to by (tapply)?

2010-08-30 Thread ivo welch
serious?

 key <- c(1,1,1,2,2,2)
 val1 <- rnorm(6)
 indf <- data.frame( key, val1)
 outdf <- by(indf, indf$key, function(x) c(m=mean(x), s=sd(x)) )
 outdf
indf$key: 1
 m.key m.val1  s.key s.val1 
1.0000 0.6005 0.0000 1.0191 
------------------------------
indf$key: 2
  m.key  m.val1   s.key  s.val1 
 2.0000 -0.8177  0.0000  0.3978 

> as.data.frame(by(indf, indf$key, function(x) c(m=mean(x), s=sd(x
Error in as.data.frame.default(by(indf, indf$key, function(x) c(m = mean(x),  :
  cannot coerce class '"by"' into a data.frame

/iaw

Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



On Mon, Aug 30, 2010 at 9:36 AM, Henrique Dallazuanna  wrote:
> Try this:
>
> as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
>
> On Mon, Aug 30, 2010 at 10:19 AM, ivo welch  wrote:
>>
>> dear R experts:
>>
>> has someone written a function that returns the results of by() as a
>> data frame?   of course, this can work only if the output of the
>> function that is an argument to by() is a numerical vector.
>> presumably, what is now names(byobject) would become a column in the
>> data frame, and the by object's list elements would become columns.
>> it's a little bit like flattening the by() output object (so that the
>> name of the list item and its contents become the same row), and
>> having the right names for the columns.  I don't know how to do this
>> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
>> the list of vectors, is easy, but would not make a nice function for
>> me to use often.)
>>
>> for example, lets say my by() output is currently
>>
>> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>>
>> $`A`
>> [1] 2 3
>> $`B`
>> [2] 4 5
>>
>> then the revised by() would instead produce
>>
>> charid  m  s
>> A          2  3
>> B          4  5
>>
>> working with data frames is often more intuitive than working with the
>> output of by().  the R wizards are probably chuckling now about how
>> easy this is...
>>
>> regards,
>>
>> /iaw
>>
>> 
>> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



Re: [R] Band-wise Conditional Sum - Actual problem

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:


Dear R helpers,

Thanks a lot for your earlier guidance esp. Mr David Winsemius Sir.  
However, there seems to be mis-communication from my end  
corresponding to my requirement. As I had mentioned in my earlier  
mail, I am dealing with a very large database of borrowers and I had  
given a part of it in my earlier mail as given below. For a given  
rating say "A", I needed to have the bad-wise sums of ead's (where  
bands are constructed using the ead size itself.) and not the number  
of borrowers falling in a particular band.


I am reproducing the data and solution as provided by Winsemius Sir  
(which generates the number of band-wise borrowers for a given rating.


rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA",  
"AA", "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB",  
"A", "BB", "A", "AA", "B","A", "AA", "BBB", "A", "BBB")


ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000,
1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,
167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,
3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590,
106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)


df <- data.frame(rating, ead)

df$ead.cat <- cut(df$ead, breaks=c(0, 1e+05, 5e+05, 1e+06,
2e+06, 5e+06, 1e+07, 1e+08))


df

df_sorted <- df[order(df$rating),]  # the output is as given  
below.


> df_sorted
   rating         ead       ead.cat
1       A   169229.93 (1e+05,5e+05]
3       A  5877794.25 (5e+06,1e+07]
6       A    21000.00     (0,1e+05]
12      A   150000.00 (1e+05,5e+05]
13      A   167490.00 (1e+05,5e+05]
18      A     9100.00     (0,1e+05]
23      A     3000.00     (0,1e+05]
25      A   453671.72 (1e+05,5e+05]
28      A   940711.67 (5e+05,1e+06]
31      A    39000.00     (0,1e+05]
5      AA 75040962.06 (1e+07,1e+08]
9      AA 17715000.00 (1e+07,1e+08]
10     AA 14430325.24 (1e+07,1e+08]
11     AA  1180946.57 (1e+06,2e+06]
14     AA    81255.16     (0,1e+05]
17     AA  1275702.94 (1e+06,2e+06]
26     AA     7590.00     (0,1e+05]
29     AA  2443000.00 (2e+06,5e+06]
2     AAA      100.00     (0,1e+05]
19    AAA  1763142.30 (1e+06,2e+06]
27      B   106065.24 (1e+05,5e+05]
7      BB  1028360.00 (1e+06,2e+06]
15     BB    54812.50     (0,1e+05]
22     BB    11800.00     (0,1e+05]
24     BB    96894.02     (0,1e+05]
4     BBB  9530148.63 (5e+06,1e+07]
8     BBB  6000000.00 (5e+06,1e+07]
16    BBB     3000.00     (0,1e+05]
20    BBB  3283048.61 (2e+06,5e+06]
21    BBB  1200000.00 (1e+06,2e+06]
30    BBB  9500000.00 (5e+06,1e+07]
32    BBB  1501939.67 (1e+06,2e+06]


## The following command fetches rating-wise and ead size no of  
borrowers. Thus, for rating A, there are 4 borrowers in the ead  
range (0, 1e+05], 4 borrowers in the range (1e+05 to 5e+05] and so  
on..


> with(df, tapply(ead.cat, rating, table))
$A
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
            4             4             1             0             0             1             0 

$AA
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
            2             0             0             2             1             0             3 

$AAA
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
            1             0             0             1             0             0             0 

$B
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
            0             1             0             0             0             0             0 

$BB
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
            3             0             0             1             0             0             0 

$BBB
    (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08] 
            1             0             0             2             1             3             0 



 My ACTUAL REQUIREMENT

Actually for a given rating, I don't want the number of borrowers  
falling in each of the ead_range. What I want is sum of eads falling  
in each range. Thus, say for rating "A", I need following.



  rating   ead.cat   ead_total
1      A (0,1e+05]    72100.00   # (21000+9100+3000+39000)

2      A (1e+05,
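(One hedged sketch of the requested band-wise sums, using xtabs(), which sums a left-hand-side variable within each cell rather than counting rows. The ead values below are restored where the digest stripped runs of zeros, so treat the exact numbers as an assumption; the check against the poster's own 72100 total comes out right.)

```r
# Hedged sketch: sum ead within each rating x ead.cat cell.
rating <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA",
            "AA", "A", "A", "AA", "BB", "BBB", "AA", "A", "AAA", "BBB",
            "BBB", "BB", "A", "BB", "A", "AA", "B", "A", "AA", "BBB",
            "A", "BBB")
ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000,
         1028360, 6000000, 17715000, 14430325.24, 1180946.57, 150000,
         167490, 81255.16, 54812.5, 3000, 1275702.94, 9100, 1763142.3,
         3283048.61, 1200000, 11800, 3000, 96894.02, 453671.72, 7590,
         106065.24, 940711.67, 2443000, 9500000, 39000, 1501939.67)
df <- data.frame(rating, ead)
df$ead.cat <- cut(df$ead,
                  breaks = c(0, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7, 1e8))

sums <- xtabs(ead ~ rating + ead.cat, data = df)   # band-wise totals
sums["A", "(0,1e+05]"]   # 72100 = 21000 + 9100 + 3000 + 39000
```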

Re: [R] different interface to by (tapply)?

2010-08-30 Thread Marc Schwartz

FYI, since R version 2.11.0, aggregate() can return a vector of summary 
results, rather than just a scalar:

> aggregate(iris$Sepal.Length, list(Species = iris$Species),
            function(x) c(Mean = mean(x), SD = sd(x)))
     Species x.Mean      x.SD
1     setosa  5.006 0.3524897
2 versicolor  5.936 0.5161711
3  virginica  6.588 0.6358796


There is also now a formula interface:

> aggregate(. ~ Species, data = iris,
            FUN = function(x) c(Mean = mean(x), SD = sd(x)))
     Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean
1     setosa             5.006       0.3524897            3.428
2 versicolor             5.936       0.5161711            2.770
3  virginica             6.588       0.6358796            2.974
  Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
1      0.3790644             1.462       0.1736640            0.246
2      0.3137983             4.260       0.4699110            1.326
3      0.3224966             5.552       0.5518947            2.026
  Petal.Width.SD
1      0.1053856
2      0.1977527
3      0.2746501


HTH,

Marc Schwartz


On Aug 30, 2010, at 8:36 AM, Henrique Dallazuanna wrote:

> Try this:
> 
> as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
> 
> On Mon, Aug 30, 2010 at 10:19 AM, ivo welch  wrote:
> 
>> dear R experts:
>> 
>> has someone written a function that returns the results of by() as a
>> data frame?   of course, this can work only if the output of the
>> function that is an argument to by() is a numerical vector.
>> presumably, what is now names(byobject) would become a column in the
>> data frame, and the by object's list elements would become columns.
>> it's a little bit like flattening the by() output object (so that the
>> name of the list item and its contents become the same row), and
>> having the right names for the columns.  I don't know how to do this
>> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
>> the list of vectors, is easy, but would not make a nice function for
>> me to use often.)
>> 
>> for example, lets say my by() output is currently
>> 
>> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>> 
>> $`A`
>> [1] 2 3
>> $`B`
>> [2] 4 5
>> 
>> then the revised by() would instead produce
>> 
>> charid  m  s
>> A  2  3
>> B  4  5
>> 
>> working with data frames is often more intuitive than working with the
>> output of by().  the R wizards are probably chuckling now about how
>> easy this is...
>> 
>> regards,
>> 
>> /iaw
>> 
>> 
>> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



Re: [R] different interface to by (tapply)?

2010-08-30 Thread ivo welch
perfect.  this is the R way to do it quick and easy.  thank you, marc.

(PS, in my earlier example, what I wanted was aggregate( . ~ key,
data=indf, FUN = function(x) c(m=mean(x), s=sd(x)))  )


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)





On Mon, Aug 30, 2010 at 10:47 AM, Marc Schwartz  wrote:
>
> FYI, since R version 2.11.0, aggregate() can return a vector of summary 
> results, rather than just a scalar:
>
>> aggregate(iris$Sepal.Length, list(Species = iris$Species),
>            function(x) c(Mean = mean(x), SD = sd(x)))
>     Species    x.Mean      x.SD
> 1     setosa 5.006 0.3524897
> 2 versicolor 5.936 0.5161711
> 3  virginica 6.588 0.6358796
>
>
> There is also now a formula interface:
>
>> aggregate(. ~ Species, data = iris,
>            FUN = function(x) c(Mean = mean(x), SD = sd(x)))
>     Species Sepal.Length.Mean Sepal.Length.SD Sepal.Width.Mean
> 1     setosa         5.006       0.3524897        3.428
> 2 versicolor         5.936       0.5161711        2.770
> 3  virginica         6.588       0.6358796        2.974
>  Sepal.Width.SD Petal.Length.Mean Petal.Length.SD Petal.Width.Mean
> 1      0.3790644         1.462       0.1736640        0.246
> 2      0.3137983         4.260       0.4699110        1.326
> 3      0.3224966         5.552       0.5518947        2.026
>  Petal.Width.SD
> 1      0.1053856
> 2      0.1977527
> 3      0.2746501
>
>
> HTH,
>
> Marc Schwartz
>
>
> On Aug 30, 2010, at 8:36 AM, Henrique Dallazuanna wrote:
>
>> Try this:
>>
>> as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
>>
>> On Mon, Aug 30, 2010 at 10:19 AM, ivo welch  wrote:
>>
>>> dear R experts:
>>>
>>> has someone written a function that returns the results of by() as a
>>> data frame?   of course, this can work only if the output of the
>>> function that is an argument to by() is a numerical vector.
>>> presumably, what is now names(byobject) would become a column in the
>>> data frame, and the by object's list elements would become columns.
>>> it's a little bit like flattening the by() output object (so that the
>>> name of the list item and its contents become the same row), and
>>> having the right names for the columns.  I don't know how to do this
>>> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
>>> the list of vectors, is easy, but would not make a nice function for
>>> me to use often.)
>>>
>>> for example, lets say my by() output is currently
>>>
>>> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
>>>
>>> $`A`
>>> [1] 2 3
>>> $`B`
>>> [2] 4 5
>>>
>>> then the revised by() would instead produce
>>>
>>> charid  m  s
>>> A          2  3
>>> B          4  5
>>>
>>> working with data frames is often more intuitive than working with the
>>> output of by().  the R wizards are probably chuckling now about how
>>> easy this is...
>>>
>>> regards,
>>>
>>> /iaw
>>>
>>> 
>>> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
>
>



Re: [R] different interface to by (tapply)?

2010-08-30 Thread Erik Iverson

I can definitely recommend the "plyr" package for these sorts of
operations.

http://had.co.nz/plyr/

ivo welch wrote:

dear R experts:

has someone written a function that returns the results of by() as a
data frame?   of course, this can work only if the output of the
function that is an argument to by() is a numerical vector.
presumably, what is now names(byobject) would become a column in the
data frame, and the by object's list elements would become columns.
it's a little bit like flattening the by() output object (so that the
name of the list item and its contents become the same row), and
having the right names for the columns.  I don't know how to do this
quickly in the R way.  (Doing it slowly, e.g., with a for loop over
the list of vectors, is easy, but would not make a nice function for
me to use often.)

for example, lets say my by() output is currently

by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )

$`A`
[1] 2 3
$`B`
[2] 4 5

then the revised by() would instead produce

charid  m  s
A  2  3
B  4  5

working with data frames is often more intuitive than working with the
output of by().  the R wizards are probably chuckling now about how
easy this is...

regards,

/iaw


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



Re: [R] different interface to by (tapply)?

2010-08-30 Thread William Dunlap
Have you tried aggregate or plyr's ddply?
by() is meant for functions that return such
complicated return values that automatically combining
them is not feasible (e.g., lm()).  aggregate()
works for functions that return scalars or
simple vectors and returns a data.frame.
ddply is part of a family of apply functions
with a uniform interface.

I didn't notice any sample data so I made
up some and your by() call didn't work with
what I made up, so perhaps you need something
else.

> indf<-data.frame(charid=c("A","A","A","B","A","B"), x= 11:16)
> by(indf$x, indf$charid, function(x)c(m=mean(x),s=sd(x)))
indf$charid: A
        m         s 
12.750000  1.707825 
------------------------------
indf$charid: B
        m         s 
15.000000  1.414214 
> ddply(indf, .variables=.(charid), .fun=function(df)c(m=mean(df$x),s=sd(df$x)))
  charid     m        s
1      A 12.75 1.707825
2      B 15.00 1.414214
> str(.Last.value)
'data.frame':   2 obs. of  3 variables:
 $ charid: Factor w/ 2 levels "A","B": 1 2
 $ m : num  12.8 15
 $ s : num  1.71 1.41

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of ivo welch
> Sent: Monday, August 30, 2010 6:19 AM
> To: r-help
> Subject: [R] different interface to by (tapply)?
> 
> dear R experts:
> 
> has someone written a function that returns the results of by() as a
> data frame?   of course, this can work only if the output of the
> function that is an argument to by() is a numerical vector.
> presumably, what is now names(byobject) would become a column in the
> data frame, and the by object's list elements would become columns.
> it's a little bit like flattening the by() output object (so that the
> name of the list item and its contents become the same row), and
> having the right names for the columns.  I don't know how to do this
> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
> the list of vectors, is easy, but would not make a nice function for
> me to use often.)
> 
> for example, lets say my by() output is currently
> 
> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
> 
> $`A`
> [1] 2 3
> $`B`
> [2] 4 5
> 
> then the revised by() would instead produce
> 
> charid  m  s
> A          2  3
> B          4  5
> 
> working with data frames is often more intuitive than working with the
> output of by().  the R wizards are probably chuckling now about how
> easy this is...
> 
> regards,
> 
> /iaw
> 
> 
> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
> 



[R] listing files recursively

2010-08-30 Thread Hyunchul Kim
Hi, all

How to get all filenames in a directory and all of its subdirectories?

something like
filenames <- c(Sys.glob('/path/to/directory/*'),
Sys.glob('/path/to/directory/*/*'), Sys.glob('/path/to/directory/*/*/*'),
...)

Thanks in advance,

Hyunchul



[R] How to Remove Autocorrelation from Simple Moving Average time series

2010-08-30 Thread Ramesh Kallol
Hi R experts,

 

I am trying to remove autocorrelation from Simple Moving Average time series. I 
know that this can be done by using seasonal ARIMA like,

 

library(TTR)

data <- rnorm(252)
n <- 21
sma_data <- SMA(data, n)
sma_data <- sma_data[-1:-n]
acf(sma_data, length(sma_data))

arima <- arima(sma_data, c(0, 0, 0), seasonal = list(order = c(0, 0, n)))
tsdiag(arima, 100)
arima$aic

 

 

But is there any easy way to do this in Excel? (e.g. differencing, a dummy
variable approach, etc.)
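Not an Excel answer, but a quick R sketch (simulated data, mirroring the code
above) of why the SMA induces autocorrelation and what simple differencing
does to it:

```r
# A moving average of white noise is autocorrelated out to roughly lag n-1;
# first differencing reduces, but does not fully remove, that structure
set.seed(1)                # reproducible simulated data
library(TTR)               # assumed installed, as in the code above
x <- rnorm(252)
s <- SMA(x, 21)[-(1:21)]   # drop the leading NAs
acf(s)                     # strong short-lag autocorrelation
acf(diff(s))               # much weaker after first differencing
```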

 

 

Thanks and Regards,

Ramesh Kallol | Amba Research

Ph +91 80 3980 8467 | Mob +91 9019720734

Bangalore * Colombo * London * New York * San José * Singapore * 
www.ambaresearch.com  

 




Re: [R] listing files recursively

2010-08-30 Thread Joshua Wiley
Hi,

This should do it:

list.files(path = "/your/path", recursive = TRUE)

Cheers,

Josh
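
A slightly fuller sketch (the path and pattern are placeholders):
full.names = TRUE returns paths you can pass straight to file functions,
and pattern filters by a regular expression:

```r
# All .R/.r files below /your/path, as full paths rather than relative names
r_files <- list.files(path = "/your/path", pattern = "\\.[Rr]$",
                      recursive = TRUE, full.names = TRUE)
head(r_files)
```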

On Mon, Aug 30, 2010 at 8:01 AM, Hyunchul Kim
 wrote:
> Hi, all
>
> how to get all filenames in a directory and its all subdirectories?
>
> something like
> filenames <- c(Sys.glob('/path/to/directory/*'),
> Sys.glob('/path/to/directory/*/*'), Sys.glob('/path/to/directory/*/*/*'),
> ...)
>
> Thanks in advance,
>
> Hyunchul
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] listing files recursively

2010-08-30 Thread Phil Spector

Hyunchul -
   You don't have to rely on the operating system to 
provide information about the file system.  Look at


?list.files

For example,

list.files('/path/to/directory',recursive=TRUE)

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu

On Tue, 31 Aug 2010, Hyunchul Kim wrote:


Hi, all

how to get all filenames in a directory and its all subdirectories?

something like
filenames <- c(Sys.glob('/path/to/directory/*'),
Sys.glob('/path/to/directory/*/*'), Sys.glob('/path/to/directory/*/*/*'),
...)

Thanks in advance,

Hyunchul






[R] 'mgcv' package, problem with predicting binomial (logit) data

2010-08-30 Thread Jef Vlegels
Dear R-help list,

I’m using the mgcv package to plot predictions based on the gam function.

I predict the chance of being a (frequent) participant at theater plays vs.
not being a participant by age.
Because my outcome variable is dichotomous, I use the binomial family with
logit link function.

Dataset in attachment, code to read it in R:
library(foreign)  # read.spss() lives in the foreign package
data <- read.spss("pas_r.sav")
attach(data)

In a first step I use ‘gam’ to model my data and ‘predict’ to calculate and
plot the predicted values, this all works fine. 
My code looks like this:

test <- gam(participant ~ s(age,fx=FALSE,bs='cr'), family=binomial(logit))
summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(age)
plot(age[I1], test.pred$fit[I1],lty=1, type="l")
lines(age[I1],test.pred$fit[I1]+2*test.pred$se.fit[I1],lty=2)
lines(age[I1],test.pred$fit[I1]-2*test.pred$se.fit[I1],lty=2)
plot(age[I1], test.pred$fit[I1] , type="l")

In a second step, I want to calculate a similar model, but only for
respondents with a certain characteristic. For example, in this case, only
for male respondents.
I use a code that looks like this:

participant_male <- participant[gender=="male"]
age_male <- age[gender=="male"]

test2<-gam(participant_male ~ s(age_male, fx=FALSE, bs="cr"),
family=binomial(logit), na.action=na.omit)
summary(test2)
plot(test2, shade=TRUE)

I get a nice smoother function in this plot, like I expected.
Then, when I want to plot the predicted values, I use a code that looks like
this:

Test2.pred <- predict(test5,se.fit=TRUE, type="response")
I1<-order(age_male)
plot(age_male[I1], test2.pred$fit[I1],lty=1)

This last plot, of the predictions, is not what I expect. It’s just a random
scatterplot, not what I would expect from the smoother plot. Does anybody
know what I did wrong?
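
One pattern worth trying (a sketch, not a diagnosis of the problem above):
predict over an evenly spaced, NA-free grid of ages rather than the raw,
unsorted data, so the plotted curve is ordered and on the response scale.
test2 and age_male are from the code above; the grid length is arbitrary:

```r
# Build a clean prediction grid over the observed age range
newd <- data.frame(age_male = seq(min(age_male, na.rm = TRUE),
                                  max(age_male, na.rm = TRUE), length = 200))
pred <- predict(test2, newdata = newd, type = "response", se.fit = TRUE)
plot(newd$age_male, pred$fit, type = "l",
     xlab = "age", ylab = "predicted probability")
lines(newd$age_male, pred$fit + 2 * pred$se.fit, lty = 2)
lines(newd$age_male, pred$fit - 2 * pred$se.fit, lty = 2)
```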

Thanks in advance,
Jef Vlegels

Jef Vlegels
Ghent University - Department of Sociology
Korte Meer 3, B-9000 Gent, Belgium
Tel:  09 264 8343
www.psw.UGent.be/sociologie




Re: [R] different interface to by (tapply)?

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 9:19 AM, ivo welch wrote:


dear R experts:

has someone written a function that returns the results of by() as a
data frame?   of course, this can work only if the output of the
function that is an argument to by() is a numerical vector.
presumably, what is now names(byobject) would become a column in the
data frame, and the by object's list elements would become columns.
it's a little bit like flattening the by() output object (so that the
name of the list item and its contents become the same row), and
having the right names for the columns.  I don't know how to do this
quickly in the R way.  (Doing it slowly, e.g., with a for loop over
the list of vectors, is easy, but would not make a nice function for
me to use often.)

for example, lets say my by() output is currently

by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )

$`A`
[1] 2 3
$`B`
[2] 4 5


Doesn't by() return names for the "m" and "s" values?



then the revised by() would instead produce

charid  m  s
A  2  3
B  4  5


by(), like table() and tapply(), returns a list of matrix- or  
array-like objects.


I would expect that as.data.frame would return a dataframe from such  
an argument, but my experience is that it generally forces this into a  
"long" format and one may need to resort to reshape() or reshape::melt  
and reshape::cast() to get it back into a rectangular format. You  
_could_ have created a richer example that would become the basis for  
further elaboration.
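
For a function returning a named numeric vector, one common idiom (a sketch
built on made-up data, since the poster's indf was not supplied) is to rbind
the list elements and bind the group names back on:

```r
# Flatten a by() result into a data.frame: one row per group,
# one column per element of the returned named vector
indf <- data.frame(charid = c("A","A","A","B","A","B"), x = 11:16)
b <- by(indf$x, indf$charid, function(v) c(m = mean(v), s = sd(v)))
data.frame(charid = names(b), do.call(rbind, b), row.names = NULL)
#   charid     m        s
# 1      A 12.75 1.707825
# 2      B 15.00 1.414214
```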


--
David.




working with data frames is often more intuitive than working with the
output of by().  the R wizards are probably chuckling now about how
easy this is...

regards,

/iaw


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



David Winsemius, MD
West Hartford, CT



[R] getNodeSet - what am I doing wrong?

2010-08-30 Thread Johannes Graumann
Hi,

Why is the following retuning a nodset of length 0:

> library(XML)
> test <- xmlTreeParse(
> "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE)
> getNodeSet(test,"//modifications_row")

Thanks for any hint.

Joh



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Derek M Jones

Hadley,


>> I have counts ranging over 4-6 orders of magnitude with peaks
>> occurring at various 'magic' values.  Using a log scale for the
>> y-axis enables the smaller peaks, which would otherwise
>> be almost invisible bumps along the x-axis, to be seen.

> That doesn't justify the use of a _histogram_ - and regardless of

The usage highlights meaningful characteristics of the data.
What better justification for any method of analysis and display is
there?

> what distributional display you use, logging the counts imposes some
> pretty heavy restrictions on the shape of the distribution (e.g. that
> it must not drop to zero).

Does there have to be a recognized statistical distribution to use R?
In my case I am using R for all of the analysis and graphics in a
new book.  This means that sometimes I have to deal with data sets
that are more or less a jumble of numbers with patterns in a few
places.  For instance, the numeric value of integer constants
appearing as one operand of the binary bitwise-AND operator (see
figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data
at: www.knosof.co.uk/cbook/bandcons.hist.gz).

qplot(band, binwidth=8, geom="histogram") + scale_y_log()
does a good job of highlighting the peaks.

> It may be useful for your purposes, but that doesn't necessarily make
> it a meaningful graphic.

Doesn't being useful for my purpose make it meaningful, at least for me
and I hope my readers?

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk



[R] Neyman-Scott in 1-dimension?

2010-08-30 Thread Moreno I. Coco

Dear R-Users,

I would like to use the spatstat package in R for simulation and  
analysis of point processes in 1d = 1 dimension, i.e. on the line. I  
tried to use the owin class to obtain a 1-dimensional window, e.g.  
[1,..L], but it doesn't work.


Does anyone know if the spatstat package can deal with linear point  
processes? If yes, how?


Thank you very much for your help,

Moreno

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



[R] Include local files when running R remotely

2010-08-30 Thread Erik Shilts
I run R on a remote UNIX server where the data are stored that I ssh into
through Emacs, while I store my R scripts on local Windows network drives.
So far this arrangement hasn't been a problem, though now I'd like to use
source() or a similar function to include other R scripts to get a better
handle on my process management. Is this possible when running R remotely?
When I try to use source() it tells me that the file doesn't exist since
it's looking for it on the UNIX servers.

Any help is much appreciated. Thanks!
Erik



Re: [R] Include local files when running R remotely

2010-08-30 Thread Barry Rowlingson
On Mon, Aug 30, 2010 at 5:29 PM, Erik Shilts  wrote:
> I run R on a remote UNIX server where the data are stored that I ssh into
> through Emacs, while I store my R scripts on local Windows network drives.
> So far this arrangement hasn't been a problem, though now I'd like to use
> source() or a similar function to include other R scripts to get a better
> handle on my process management. Is this possible when running R remotely?
> When I try to use source() it tells me that the file doesn't exist since
> it's looking for it on the UNIX servers.
>
> Any help is much appreciated. Thanks!

 Sounds like a file sharing problem rather than an R problem - ideas
that pop into my head include:

 * mounting the Windows drives on the Unix box - this requires sharing
them out in Windows, and being able to run smbmount on the unix box.

 * get rsync for Windows (maybe install cygwin and get a bunch of
useful Unixy tools for Windows) and use that to efficiently copy from
one machine to the other. But that relies on you remembering to sync
and making sure you dont sync the wrong way. Because then you'll be
sunk.

 * there may be implementations of NFS on Windows, which mean you
could NFS-mount the Unix drives on your Windows box - but I've always
found those things to be a bit flaky.

 The two options involving mounting will require cooperation of your
superuser...

Barry



Re: [R] Putting legend *outside* plotting area

2010-08-30 Thread Chris Campbell
On Sun, Aug 29, 2010 at 20:00, Worik R  wrote:

> Is there  a simple way to put a legend outside the plot area for a simple
> plot?
>
> I found... (at http://www.harding.edu/fmccown/R/)
>
> # Expand right side of clipping rect to make room for the legend
> par(xpd=T, mar=par()$mar+c(0,0,0,4))
>
> # Graph autos (transposing the matrix) using heat colors,
> # put 10% of the space between each bar, and make labels
> # smaller with horizontal y-axis labels
> barplot(t(autos_data), main="Autos", ylab="Total",
>   col=heat.colors(3), space=0.1, cex.axis=0.8, las=1,
>   names.arg=c("Mon","Tue","Wed","Thu","Fri"), cex=0.8)
>
> # Place the legend at (6,30) using heat colors
> legend(6, 30, names(autos_data), cex=0.8, fill=heat.colors(3));
>
> But I do not understand
> par(xpd=T, mar=par()$mar+c(0,0,0,4))
>
> Anyway it does not work for me when I change...
>
>
> > y1 <- runif(100, max=1, min=0)
> > y2 <- rnorm(100)
> > ylim <- c(min(y1, y2), max(y1,y2))
> > plot(x, y1, pch=1, ylim=ylim)
> > points(x, y2, pch=2)
> > legend("topleft", legend=c("Uniform", "Normal"), pch=c(1,2))
>
> To
>
> > par(xpd=T, mar=par()$mar+c(0,0,0,4))
> > plot(x, y1, pch=1, ylim=ylim)
> > points(x, y2, pch=2)
> > legend(6, 30, legend=c("Uniform", "Normal"), pch=c(1,2))
>

It looks like (6,30) is way outside of the plot area, which is why you don't
see the legend.  I like to use the values of par()$usr (the xlim and ylim of
the current plot) to set the legend position.  This way it is always
relative to the plotting area.  Try this:

par(mar=c(5.1, 4.1, 4.1, 6.1) )
plot(1:10)
legend( par()$usr[2], mean(par()$usr[3:4]), legend=c("Uniform", "Normal"),
pch=c(1,2), xpd=TRUE, xjust=0, yjust=0.5)

And play around with xjust and yjust to get the legend where you want it.
Hope that helps


>
> I do not get to see the legend at all.
>
> I am confused
>
> cheers
> Worik
>



Re: [R] Finding source files automatically within a script

2010-08-30 Thread Abhisek
Many thanks!  I'll try going down the environment-variables path, as I kind of
know how to do that.

abhisek

On Sun, Aug 29, 2010 at 9:25 PM, Gabor Grothendieck  wrote:

> On Sun, Aug 29, 2010 at 1:13 PM, Abhisek  wrote:
> > Hi there,
> >
> > Ive tried trawling the lists for a solution but I could not find any
> > matches.  I am typing up all my code on a Linux machine and I call this
> > other script file from my script file using source("foo.r").  Now
> sometimes
> > i access my folder from my Windows machine at work (the files are on
> > dropbox).  But of course my windows machine would not understand the
> linux
> > path name.
> >
> > Is there any syntax for searching the file system for "foo.r" from within
> > the script file that I am writing?  I know that I can always change the
> > working directory to the one where the script file is stored but is there
> > any other way?
> >
> > Thanks!
> > Abhisek Banerjee
> >
>
> You could (1) use file.exists to successively check specific paths.
> It will return FALSE if the path does not exist letting you skip over
> the bad paths.  See ?file.exists
>
> You could alternately (2) set an option in the your .Profile on both
> machines (?options, ?getOption, ?Startup), (3) outside of R set up
> environment variables on both machines reading them in with Sys.getenv
> or (4) set up configuration files (e.g. a file with just one line
> giving the path to the file) which you read in from your home
> directory e.g., readLines("~/.myconfig")  Note that ~ works from
> within R even on Windows and forward slashes work in Windows filenames
> from within R as well as backslashes.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>



Re: [R] Include local files when running R remotely

2010-08-30 Thread RICHARD M. HEIBERGER
Since you are already on emacs, try using ESS.  I am pretty sure we do that.
http://ess.r-project.org

Please follow up on the ESS mailing ess-h...@stat.math.ethz.ch

Rich

On Mon, Aug 30, 2010 at 12:48 PM, Barry Rowlingson
 wrote:
> On Mon, Aug 30, 2010 at 5:29 PM, Erik Shilts  wrote:
>> I run R on a remote UNIX server where the data are stored that I ssh into
>> through Emacs, while I store my R scripts on local Windows network drives.
>> So far this arrangement hasn't been a problem, though now I'd like to use
>> source() or a similar function to include other R scripts to get a better
>> handle on my process management. Is this possible when running R remotely?
>> When I try to use source() it tells me that the file doesn't exist since
>> it's looking for it on the UNIX servers.
>>
>> Any help is much appreciated. Thanks!
>
>  Sounds like a file sharing problem rather than an R problem - ideas
> that pop into my head include:
>
>  * mounting the Windows drives on the Unix box - this requires sharing
> them out in Windows, and being able to run smbmount on the unix box.
>
>  * get rsync for Windows (maybe install cygwin and get a bunch of
> useful Unixy tools for Windows) and use that to efficiently copy from
> one machine to the other. But that relies on you remembering to sync
> and making sure you dont sync the wrong way. Because then you'll be
> sunk.
>
>  * there may be implementations of NFS on Windows, which mean you
> could NFS-mount the Unix drives on your Windows box - but I've always
> found those things to be a bit flaky.
>
>  The two options involving mounting will require cooperation of your
> superuser...
>
> Barry
>



Re: [R] About plot graphs

2010-08-30 Thread Greg Snow
Stephen,

You may have missed an important point in my post (or you may have understood 
it, but not stating it here could mislead future readers of the exchange).  
Below you have:

> > detach()
> > attach(singer[1:15,])
>
> The following object(s) are masked from women :
>
>  height
>
> > plot(weight,height)
>
> A graph weight(x) height(y) plotted.
>
>

But you need to look really closely at the plot created: this is one of the 
most dangerous plots that R can produce.  It looks fine; there are no errors or 
warnings (well, there was a warning earlier, but most people have forgotten 
about it by the time they look at this plot).

You say that it is a graph of weight(x) height(y), while that is technically 
true it should really be expanded:

It is a plot of the weight from the dataset "women" on the x axis and the 
heights of the first 15 singers (singer dataset) on the y axis.  There is no 
linkage between the two datasets (other than both having a height column), so 
this plot is completely meaningless (compare it with with(women, 
plot(weight,height)) ).  It is even worse than meaningless, because it could 
encourage someone to draw exactly the wrong conclusion about the relationship.

My last example there was not showing how to work around a warning/error; it 
was a dire caution about how using/misusing attach can lead to wrong decisions.  
Functions like with and within are much preferred because they do not lead to 
these types of problems.
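
The safer pattern can be sketched as:

```r
# with() evaluates the expression inside the data frame without attaching it,
# so both columns are guaranteed to come from the same dataset
library(lattice)                        # the singer data ships with lattice
with(women, plot(weight, height))       # columns always paired correctly
with(singer[1:15, ], summary(height))   # scoped to this subset only
```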


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: Stephen Liu [mailto:sati...@yahoo.com]
> Sent: Sunday, August 29, 2010 6:55 AM
> To: Greg Snow; gavin.simp...@ucl.ac.uk
> Cc: r-help@r-project.org
> Subject: Re: [R] About plot graphs
>
> Hi Greg,
>
> Thanks for your advice.
>
> > data(women)
> > women
>    height weight
> 1      58    115
> 2      59    117
> 3      60    120
> 4      61    123
> 5      62    126
> 6      63    129
> 7      64    132
> 8      65    135
> 9      66    139
> 10     67    142
> 11     68    146
> 12     69    150
> 13     70    154
> 14     71    159
> 15     72    164
> >
>
> > attach(women)
> > library(lattice)
>
> > data(singer)
> Warning message:
> In data(singer) : data set 'singer' not found
>
>
> Continued;
>
> > attach(singer)
>
> The following object(s) are masked from women :
>
>  height
>
> > plot(weight,height)
> Error in xy.coords(x, y, xlabel, ylabel, log) :
>   'x' and 'y' lengths differ
>
> No graph plotted.
>
>
> > detach()
> > attach(singer[1:15,])
>
> The following object(s) are masked from women :
>
>  height
>
> > plot(weight,height)
>
> A graph weight(x) height(y) plotted.
>
>
> B.R.
> Stephen L
>
>
>
>
>
>
> - Original Message 
> From: Greg Snow 
> To: Stephen Liu ; "gavin.simp...@ucl.ac.uk"
> 
> Cc: "r-help@r-project.org" 
> Sent: Sun, August 29, 2010 2:58:12 AM
> Subject: RE: [R] About plot graphs
>
> Gavin gave some problems with relying attaching data, here is another
> example,
> somewhat artificial, but not unrealistic (I had similar to this happen
> to me
> before I learned better):
>
> attach(women)
> # do some stuff
> library(lattice)
> attach(singer)
> # do some more stuff
>
> # now we want to go back and look at the women data
> plot(weight,height)
>
> #or even worse
> detach()
> attach(singer[1:15,])
>
> plot(weight,height)
>
> # what conclusions do we draw from the plot?
>
> detach()
> detach()
>
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> > project.org] On Behalf Of Stephen Liu
> > Sent: Friday, August 27, 2010 11:14 PM
> > To: gavin.simp...@ucl.ac.uk
> > Cc: r-help@r-project.org
> > Subject: Re: [R] About plot graphs
> >
> > Hi Gavin,
> >
> >
> > Thanks for your advice and the examples explaining plotting settings.
> >
> > The steps on your examples work on my test.
> >
> >
> > > 2) Don't attach() things, you are asking for trouble
> >
> > > If a function has a formula method (which plot does) then use it
> like
> > > this: plot(Draft_No. ~ Day_of_year, data = Test01)
> >
> > > If the function doesn't have a formula method, then wrap it in a
> > > with()
> > > call:
> >
> > > with(Test01, plot(Day_of_year, Draft_No.))
> >
> > > No need for attach.
> >
> >
> > Noted and thanks.  What will be the problem caused by "attach()"?
> >
> >
> > > dev.new(height = 6, width = 12)
> > > layout(matrix(1:2, ncol = 2))
> > > op <- par(pty = "s") ## this is the important bit
> > > plot(runif(100), rnorm(100))
> > > plot(runif(100), rnorm(100), col = "red")
> > > par(op) ## now reset the pars
> > > layout(1)
> >
> > What is the function of layout(1) ?  Tks
> >
> >
> > B.R.
> > satimis
> >
> >
> >
> >
> > - Original Message 
> > From: Gavin Simpson 
> > To: Stephen Liu 
> > Cc: "r-help@r-p

Re: [R] log y 'axis' of histogram

2010-08-30 Thread Hadley Wickham
>> That doesn't justify the use of a _histogram_  - and regardless of
>
> The usage highlights meaningful characteristics of the data.
> What better justification for any method of analysis and display is
> there?

That you're displaying something that is mathematically well founded
and meaningful - but my emphasis there was on histogram.  I don't
think a histogram makes sense, but there are other ways of displaying
the same data that would (e.g. a frequency polygon, or maybe a density
plot)

>> what distributional display you use, logging the counts imposes some
>> pretty heavy restrictions on the shape of the distribution (e.g. that
>> it must not drop to zero).
>
> Does there have to be a recognized statistical distribution to use R?

My point is about the display - if your binned counts look like 1,
100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log
counts?

> In my case I am using R for all of the analysis and graphics in a
> new book.  This means that sometimes I have to deal with data sets
> that are more or less a jumble of numbers with patterns in a few
> places.  For instance, the numeric value of integer constants
> appearing as one operand of the binary bitwise-AND operator (see
> figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data
> at: www.knosof.co.uk/cbook/bandcons.hist.gz)
>
> qplot(band, binwidth=8, geom="histogram") + scale_y_log()
> does a good job of highlighting the peaks.

I couldn't find that figure, but I'd think geom = "freqpoly" would be
more appropriate.  (I'd also suggest adding a bit more space between
the data and the margins in your figures - they overlap in many
plots).
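
With 2010-era ggplot2/qplot syntax, the suggestion might look like this (a
sketch, assuming band holds the constants from the data file linked above):

```r
# Frequency polygon instead of a histogram, with a log10 y scale;
# the line can simply be interrupted where a bin count is zero
library(ggplot2)
qplot(band, binwidth = 8, geom = "freqpoly") + scale_y_log10()
```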

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] predict.loess and NA/NaN values

2010-08-30 Thread Philipp Pagel

> What you can do is patch the code to add the NAs back after the 
> Prediction step (which many predict() methods do).

Thanks Andy for your hints and especially for digging into the problem
like this! I have, in the meantime, written a simple wrapper around
predict.loess that fills in the NAs, where I would like to have them.
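
Such a wrapper might look roughly like this (a sketch, not Philipp's actual
code; the function name is made up):

```r
# Predict only on complete rows, then restore NAs at the original positions
predict_loess_na <- function(fit, newdata, ...) {
  ok  <- complete.cases(newdata)
  out <- rep(NA_real_, nrow(newdata))
  out[ok] <- predict(fit, newdata = newdata[ok, , drop = FALSE], ...)
  out
}
```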

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/



Re: [R] Include local files when running R remotely

2010-08-30 Thread Erik Shilts
Thanks for the response. I'm currently running ESS. I sent a message to the
ESS listserv to see if anyone can help with that since I don't see how to do
it in the Manual or Readme.
Erik


On Mon, Aug 30, 2010 at 2:04 PM, RICHARD M. HEIBERGER wrote:

> Since you are already on emacs, try using ESS.  I am pretty sure we do
> that.
> http://ess.r-project.org
>
> Please follow up on the ESS mailing ess-h...@stat.math.ethz.ch
>
> Rich
>
> On Mon, Aug 30, 2010 at 12:48 PM, Barry Rowlingson
>  wrote:
> > On Mon, Aug 30, 2010 at 5:29 PM, Erik Shilts 
> wrote:
> >> I run R on a remote UNIX server where the data are stored that I ssh
> into
> >> through Emacs, while I store my R scripts on local Windows network
> drives.
> >> So far this arrangement hasn't been a problem, though now I'd like to
> use
> >> source() or a similar function to include other R scripts to get a
> better
> >> handle on my process management. Is this possible when running R
> remotely?
> >> When I try to use source() it tells me that the file doesn't exist since
> >> it's looking for it on the UNIX servers.
> >>
> >> Any help is much appreciated. Thanks!
> >
> >  Sounds like a file sharing problem rather than an R problem - ideas
> > that pop into my head include:
> >
> >  * mounting the Windows drives on the Unix box - this requires sharing
> > them out in Windows, and being able to run smbmount on the unix box.
> >
> >  * get rsync for Windows (maybe install cygwin and get a bunch of
> > useful Unixy tools for Windows) and use that to efficiently copy from
> > one machine to the other. But that relies on you remembering to sync
> > and making sure you don't sync the wrong way. Because then you'll be
> > sunk.
> >
> >  * there may be implementations of NFS on Windows, which mean you
> > could NFS-mount the Unix drives on your Windows box - but I've always
> > found those things to be a bit flaky.
> >
> >  The two options involving mounting will require cooperation of your
> > superuser...
> >
> > Barry
> >
> >
>




Re: [R] lattice: limits in reversed order with relation="same"

2010-08-30 Thread Deepayan Sarkar
On Mon, Aug 30, 2010 at 5:51 AM, Thaler,Thorn,LAUSANNE,Applied
Mathematics  wrote:
>> Well, it's actually lattice:::extend.limits(range(x)), but
>> extendrange() does basically the same thing, and is available to the
>> user (i.e., exported), albeit from a different package.
>
> Thanks again Deepayan for the insight.
>
> A followup question though-- in another setting I'd like to have 
> relation="free" to allow for different x scales. Thus, I can use the prepanel 
> function:
>
> myprepanel <- function(x,y,...) list(xlim=rev(range(x)))
>
> x <- rep(1:10, 100)
> z <- factor(sample(10, 1000, T))
> y <- rnorm(1000, x, as.numeric(z))
>
> xyplot(y~x|z, scales=list(x="free"), prepanel=myprepanel)
>
> which works as expected. But now I want to reverse the x-axis only for 
> specific panels (z %in% 1:9 say). In the above example 'myprepanel' is called 
> 10 times (well actually it is called nlevels(z) times). Hence, my approach 
> would be to include an 'if' clause in the prepanel function and reverse the 
> xlims only for the appropriate panels.
>
> What I usually do in such cases in panel functions (not beautiful but working 
> though) is something like
>
> if (panel.number() == 1)
>  do something
>
> I'm aware that this is maybe not in the spirit of lattice and that this kind 
> of code maybe breaks as soon as the ordering of panels may change. However, 
> it does the trick for panel functions. It does, however, not work for the 
> prepanel since panel.number() is not defined at this stage (well, actually it 
> returns a 0x0 matrix).
>
> Thus, I've three questions:
>
> 1.) How can I achieve different xlims for different panels in an _automatic_ 
> way? What I've done so far is something like
>
> xyplot(y~x|z, scales=list(x="free"), prepanel=myprepanel, xlim=list(NULL, 
> NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, extendrange(x[z==10])))
>
> Thus, for the first 9 panels the xlims are calculated automatically with 
> myprepanel while for the last one I've to provide the limits explicitly. Is 
> there a more elegant way to do this?
>
> 2.) Within a prepanel function: do I have any possibility to get the 
> information for which panel the prepanel function is called?
>
> 3.) Is the only way to adjust the panel function for different panels using 
> panel.number() or is there another (better) way?
>

That is the recommended way for panels.

The only way really to keep track of prepanel calls is to use an
external counter. E.g.,

prepanel.counter <- 0

myprepanel <- function(x, y, ...) {
    prepanel.counter <<- prepanel.counter + 1
    list(xlim = rev(range(x)))
}
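
If the counter is then used to pick limits per panel, a sketch might look
like this (with the caveat, as noted, that it assumes prepanel calls happen
in packet order, which lattice does not guarantee):

```r
library(lattice)

x <- rep(1:10, 100)
z <- factor(sample(10, 1000, TRUE))
y <- rnorm(1000, x, as.numeric(z))

prepanel.counter <- 0
myprepanel <- function(x, y, ...) {
    prepanel.counter <<- prepanel.counter + 1
    # reverse the x-limits for every packet except the 10th
    if (prepanel.counter < 10) list(xlim = rev(range(x)))
    else list(xlim = range(x))
}
xyplot(y ~ x | z, scales = list(x = "free"), prepanel = myprepanel)
```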

-Deepayan



Re: [R] 'mgcv' package, problem with predicting binomial (logit) data

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 11:44 AM, Jef Vlegels wrote:

There was a problem with the data in attachment, here is a better  
version.


Use something like:
data <- read.table("pas_r.txt",header=TRUE,sep=";")
to read it.


I did, but I hate dealing with attached datasets and also am
uncomfortable dealing with datasets named "data", so I messed your code
up a bit to avoid attach()-ment. I also find it cleaner to use subsets
of dataframes rather than coding new variables. Eventually it came
down to removing the NAs for the participant column. I didn't see the
need for the order() operation and thought it might create problems, so
I just commented it out. (You also had a number of misspellings of your
model names.)


test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),  
family=binomial(logit), data=data.age)

summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data.age,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(data.age$age)
with(data.age, plot(age[I1], test.pred$fit[I1],lty=1, type="l"))
with(data.age, lines(age[I1],test.pred$fit[I1]+2*test.pred$se.fit[I1],lty=2))
with(data.age, lines(age[I1],test.pred$fit[I1]-2*test.pred$se.fit[I1],lty=2))
with(data.age, plot(age[I1], test.pred$fit[I1] , type="l"))

#In a second step, I want to calculate a similar model, but only for
#respondents with a certain characteristic. For example, in this case,  
only

#for male respondents.
#I use a code that looks like this:

#participant_male <- participant[gender=="male"]
# age_male <- age[gender=="male"]

test.males<-gam(participant ~ s(age, fx=FALSE, bs="cr"),  
data=subset(data.age, gender=="male"&!is.na(participant)),

family=binomial(logit), na.action=na.omit)
summary(test.males)
plot(test.males, shade=TRUE)

#I get a nice smoother function in this plot, like I expected.
#Then, when I want to plot the predicted values, I use a code that  
looks like

#this:

Test2.pred <- predict(test.males,newdata=subset(data.age,  
gender=="male"&!is.na(participant)), se.fit=TRUE, type="response")

#I1<-with(subset(data.age, gender=="male"), order(age))
with(subset(data.age, gender=="male"&!is.na(participant)), plot(age,  
Test2.pred$fit, lty=1) )


I have not figured out why the NA's caused so much trouble, but I was
eventually driven to exclude them in the subset step by mismatched
length errors from plot().
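
For the archive, here is a self-contained sketch of the male-only fit and
prediction plot. The original attachment is not available, so the data
below are simulated stand-ins:

```r
library(mgcv)

# simulated stand-in for the (unavailable) attached data set
set.seed(1)
n <- 500
age <- runif(n, 18, 80)
gender <- factor(sample(c("male", "female"), n, replace = TRUE))
participant <- rbinom(n, 1, plogis(-3 + 0.05 * age))
d <- data.frame(age, gender, participant)

males <- subset(d, gender == "male" & !is.na(participant))
fit <- gam(participant ~ s(age, bs = "cr"), family = binomial, data = males)
pr <- predict(fit, newdata = males, se.fit = TRUE, type = "response")
o <- order(males$age)
plot(males$age[o], pr$fit[o], type = "l")  # smooth curve, not a scatter
lines(males$age[o], pr$fit[o] + 2 * pr$se.fit[o], lty = 2)
lines(males$age[o], pr$fit[o] - 2 * pr$se.fit[o], lty = 2)
```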





Thanks,
Jef

-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: maandag 30 augustus 2010 17:22
To: Jef Vlegels
Subject: FYI offlist Re: [R] 'mgcv' package, problem with predicting
binomial (logit) data


On Aug 30, 2010, at 11:17 AM, Jef Vlegels wrote:


Dear R-help list,

I'm using the mgcv package to plot predictions based on the gam
function.

I predict the chance of being a (frequent) participant at theater
plays vs.
not being a participant by age.
Because my outcome variable is dichotomous, I use the binomial
family with
logit link function.

Dataset in attachment, code to read it in R:


Nope. .sav files are not accepted by the mail-server. See the Posting
Guide. (.txt and .pdf are the safest formats)



data <- read.spss("pas_r.sav")
attach(data)

In a first step I use 'gam' to model my data and 'predict' to
calculate and
plot the predicted values, this all works fine.
My code looks like this:

test <- gam(participant ~ s(age,fx=FALSE,bs='cr'),
family=binomial(logit))
summary(test)
plot(test, shade=TRUE)
gam.check(test)
test.pred <- predict(test,newdata=data,se.fit=TRUE,type='response',
na.action=na.omit)
I1<-order(age)
plot(age[I1], test.pred$fit[I1],lty=1, type="l")
lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2)
lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2)
plot(age[I1], test.pred$fit[I1] , type="l")

In a second step, I want to calculate a similar model, but only for
respondents with a certain characteristic. For example, in this
case, only
for male respondents.
I use a code that looks like this:

participant_male <- participant[gender=="male"]
age_male <- age[gender=="male"]

test2<-gam(participant_male ~ s(age_male, fx=FALSE, bs="cr"),
family=binomial(logit), na.action=na.omit)
summary(test2)
plot(test2, shade=TRUE)

I get a nice smoother function in this plot, like I expected.
Then, when I want to plot the predicted values, I use a code that
looks like
this:

Test2.pred <- predict(test5,se.fit=TRUE, type="response")
I1<-order(age_male)
plot(age_male[I1], test2.pred$fit[I1],lty=1)

This last plot, of the predictions, is not what I expect. It's just
a random
scatterplot, not what I would expect from the smoother plot. Does
anybody
know what I did wrong?

Thanks in advance,
Jef Vlegels

Jef Vlegels
Ghent University - Department of Sociology
Korte Meer 3, B-9000 Gent, Belgium
Tel:  09 264 8343
www.psw.UGent.be/sociologie



Re: [R] getNodeSet - what am I doing wrong?

2010-08-30 Thread Duncan Temple Lang

Hi Johannes

 This is a common issue.  The document has a default XML namespace, e.g.
the root node is defined as

   <... xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1" ...>

 So you need to specify which namespace to match in the XPath expression
in getNodeSet().  The XML package  provides a "convenient" facility for
this. You need only specify the prefix such as "x" and that will
be bound to the default namespace. You need to specify this in
two places - where you use it in the XPath expression and
in the namespaces argument of getNodeSet()

So
   getNodeSet(test, "//x:modifications_row", "x")

gives you probably what you want.
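
Put together, a sketch (requires network access to fetch the document):

```r
library(XML)

doc <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml",
                    useInternalNodes = TRUE)
# "x" is bound to the document's default namespace by getNodeSet()
rows <- getNodeSet(doc, "//x:modifications_row", "x")
length(rows)   # number of matching nodes
```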

 D.



On 8/30/10 8:02 AM, Johannes Graumann wrote:
> library(XML)
>> test <- xmlTreeParse(
>> "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE)
>> getNodeSet(test,"//modifications_row")



Re: [R] predict.loess and NA/NaN values

2010-08-30 Thread Philipp Pagel
On Mon, Aug 30, 2010 at 01:50:03PM +0100, Prof Brian Ripley wrote:
> The underlying problem is your expectations.
> 
> R (unlike S) was set up many years ago to use na.omit as the
> default, and when fitting both lm() and loess() silently omit cases
> with missing values.  So why should prediction from 'newdata' be
> different unless documented to be so (which it is nowadays for
> predict.lm, even though you are adding to the evidence that was a
> mistake)?

Thanks for your insights into the underlying philosophy. I agree that
na.omit is a sensible default for model fitting. But I am not so sure
that quietly omitting unpredictable values is such a good idea -
especially if predict methods for different types of model implement
inconsistent approaches. I see no disadvantage in returning NA where
no prediction/computation is possible -- the value is 'Not Available',
after all. (And the length of the result vector would match
nrow(newdata) which would be handy for most practical purposes)

> loess() is somewhat different from lm() in that it does not in
> general allow extrapolation, and the prediction for Inf and NaN is
> simply undefined.

Of course this is correct but I still think that predict.loess not
only acts in a way that will most likely be surprising to most users
but also inconsistent with itself (Inf vs. NA/NaN). If extrapolation
is the problem Inf should not yield anything but it does (and the same
applies to values outside of the original x-range):

x <- rnorm(15)
y <- rnorm(15)
model.loess <- loess(y~x)
predict(model.loess, data.frame(x=c(0.5, Inf)))
# [1] -0.02508801  NA
predict(model.loess, data.frame(x=min(x)-10))
# [1] NA


Actually, while tracking down my problem I did consider that
extrapolation could be the problem and, according to the last example
in ?loess, tried to set control = loess.control(surface = "direct").
To my surprise, now even Inf fails - although I am much happier with
getting an error message than with silent omission.

Anyway, writing a little wrapper that puts NAs back into the results is
not a big deal, and in that respect my problem is solved.
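
Such a wrapper might look like the following sketch (the function name is
made up; it assumes a single predictor column in newdata and returns plain
predictions only, of length nrow(newdata)):

```r
predict_loess_na <- function(object, newdata, ...) {
    ok  <- is.finite(newdata[[1]])          # rows loess can predict from
    out <- rep(NA_real_, nrow(newdata))
    out[ok] <- predict(object, newdata[ok, , drop = FALSE], ...)
    out
}

x <- rnorm(15)
y <- rnorm(15)
model.loess <- loess(y ~ x)
# length 3, with NA in positions that cannot be predicted
predict_loess_na(model.loess, data.frame(x = c(0.5, NA, Inf)))
```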

> Nevertheless, take a look at the version in R-devel (pre-2.12.0)
> which gives you more options.

Thanks for that information - I will definitely have a look at that.

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/



[R] online documentation only?

2010-08-30 Thread Bob McCall

Greetings:

I recently installed R 2.11.1 for windows. It seems that there is only
online help now. Is there any way to get the local docs? I don't have
always-on high-speed internet. 

thanks,
Bob
-- 
View this message in context: 
http://r.789695.n4.nabble.com/online-documentation-only-tp2400647p2400647.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] log y 'axis' of histogram

2010-08-30 Thread Derek M Jones

Hadley,


That you're displaying something that is mathematically well founded
and meaningful - but my emphasis there was on histogram.  I don't
think a histogram makes sense, but there are other ways of displaying
the same data that would (e.g. a frequency polygon, or maybe a density
plot)


The problem I have with geom = "freqpoly" is that it is not immediately
obvious to the casual reader of the figure that binned data has been
plotted.  The horizontal line at the top of each bar does make that
obvious.  Lots of solid black is an eye sore and using something
like fill="white" helps to solve this problem (although this
currently appears red for me, probably some configuration issue to
sort out).

I'm not sure that a histogram using variable width bins and one log
scale has any meaningful interpretation; having both axis use a log
scale might make sense with variable width bins.


what distributional display you use, logging the counts imposes some
pretty heavy restrictions on the shape of the distribution (e.g. that
it must not drop to zero).


Does there have to be a recognized statistical distribution to use R?


My point is about the display - if your binned counts look like 1,
100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log
counts?


Many functions cannot handle log(0) so the safest thing to do is
remove 0s.  What about 1 and other values more than X orders of
magnitude less than the maximum?  This is an issue on any log scaled
plot and invariably they don't appear (and neither do the log(0)
cases).

Having a scale that gets closer to zero without ever getting there
is something that has to be accepted when displaying a log scale.

Logarithms are familiar to a technical readership and using them for
data spanning several orders of magnitude can highlight meaningful
relationships.  A non-technical readership is likely to completely
misunderstand a log scale and I have no idea how to display this
kind of data to such people.


I couldn't find that figure, but I'd think geom = "freqpoly" would be
more appropriate.  (I'd also suggest adding a bit more space between
the data and the margins in your figures - they overlap in many
plots).


My mistake, I was looking at a very old printed copy.  See figure 1234.1
These figures are from a previous book
www.knosof.co.uk/cbook
which used grap to draw all the graphs
www.lunabase.org/~faber/Vault/software/grap/
with the numbers being extracted and processed by various C programs and
awk scripts.

--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd mailto:de...@knosof.co.uk
Source code analysis   http://www.knosof.co.uk



Re: [R] online documentation only?

2010-08-30 Thread Steve Lianoglou
Hi,

On Mon, Aug 30, 2010 at 3:53 PM, Bob McCall  wrote:
>
> Greetings:
>
> I recently installed R 2.11.1 for windows. It seems that there is only
> online help now. Is there any way to get the local docs? I don't have
> always-on high-speed internet.

You don't have to be connected to the internet. You just need an R
session running.

You'll notice on first invocation that R will start a web server, e.g.:

R> ?sort
starting httpd help server ... done

And your web browser will open to:

http://127.0.0.1:X/library/base/html/sort.html

Where X is some random port on which the current R process runs some
type of minimal web server that is used to display the help
files ... this assumes that you have options(help_type='html'), or
some such.

No external internet connection necessary.
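
In short, something like this should work entirely offline:

```r
options(help_type = "html")   # use the local HTML help system
help.start()                  # opens the local help index in the browser
?sort                         # served by R's internal httpd help server
```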

-steve


-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] Brown-Forsythe test of equality of MEANS

2010-08-30 Thread Iasonas Lamprianou

Dear friends,
two years ago (as I found on the web) Paul sent the following message but I was 
not able to find if he got an answer. Today I have the same question and it 
would be great if I could find out that this test has been implemented 
(somehow) in R. Please do not confuse it with the Brown-Forsythe test of 
equality of variances. Thank you:

I've been searching around for a function for computing the Brown-Forsythe F* 
statistic which is a substitute for the normal ANOVA F statistic for when there 
are unequal variances, and when there is evidence of non-normality. A couple of 
other people have asked this question, the responses I found have been:

?oneway.test

However, that function appears to use the Welch W statistic which, while good 
at handling unequal variances, is not as good as F* at handling non-normal 
distributions (or so my textbook tells me). So, two questions:

   1. Is there a function ready to use for calculating the Brown-Forsythe F*?
   2. If not, what do people use for checking the results of a (one-way) ANOVA 
when there is non-normality as well as non-constant variances? 

Thanks,

  
Dr. Iasonas Lamprianou


Assistant Professor (Educational Research and Evaluation)
Department of Education Sciences
European University-Cyprus
P.O. Box 22006
1516 Nicosia
Cyprus 
Tel.: +357-22-713178
Fax: +357-22-590539


Honorary Research Fellow
Department of Education
The University of Manchester
Oxford Road, Manchester M13 9PL, UK
Tel. 0044  161 275 3485
iasonas.lampria...@manchester.ac.uk






Re: [R] online documentation only?

2010-08-30 Thread Johnson, Cedrick W.

try help.start()

that starts a local help process (within R) and open your browser to 
that local location.


-c

On 08/30/2010 03:53 PM, Bob McCall wrote:


Greetings:

I recently installed R 2.11.1 for windows. It seems that there is only
online help now. Is there any way to get the local docs? I don't have
always-on high-speed internet.

thanks,
Bob




[R] New to R

2010-08-30 Thread Josh HIll
I'm relatively new to R, and not particularly adept yet, but I was wondering
if there was a simply way to simulate missing data that are MAR, MNAR and
MCAR. I've got a good work-around for the MCAR data, but it's sort of hard
to work with.

Josh
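
No reply is archived here, but the three mechanisms can be sketched
directly on simulated data. The amputation probabilities below are
arbitrary illustrative choices:

```r
set.seed(42)
n <- 1000
x <- rnorm(n)        # fully observed covariate
y <- rnorm(n, x)     # variable to receive missing values

# MCAR: missingness independent of everything
y.mcar <- ifelse(runif(n) < 0.2, NA, y)

# MAR: missingness depends only on the observed covariate x
y.mar <- ifelse(runif(n) < plogis(x), NA, y)

# MNAR: missingness depends on the (unobserved) value of y itself
y.mnar <- ifelse(runif(n) < plogis(y), NA, y)
```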




Re: [R] online documentation only?

2010-08-30 Thread Bob McCall

Thanks for the replies. I tried help.start() and ?foo but my browser opened
and was blank. It just said can't open site as I was offline. Must be my
system. I'll try again. Maybe I can get it sorted out. Maybe the install was
bad.

Thanks again,

Bob
-- 
View this message in context: 
http://r.789695.n4.nabble.com/online-documentation-only-tp2400647p2400682.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Brown-Forsythe test of equality of MEANS

2010-08-30 Thread Bert Gunter
Inline below.

-- Bert

On Mon, Aug 30, 2010 at 1:05 PM, Iasonas Lamprianou wrote:

>
> Dear friends,
> two years ago (as I found on the web) Paul sent the following message but I
> was not able to find if he got an answer. Today I have the same question and
> it would be great if I could find out that this test has been implemented
> (somehow) in R. Please do not confuse it with the Brown-Forsythe test of
> equality of variances. Thank you:
>
> I've been searching around for a function for computing the Brown-Forsythe
> F* statistic which is a substitute for the normal ANOVA F statistic for when
> there are unequal variances, and when there is evidence of non-normality.


False, I think, as I'm not entirely clear on your meaning. Brown-Forsythe is
a test for the equality of spreads among groups. From Wikipedia:

The transformed response variable is constructed to measure the spread in
each group. Let

   z_ij = | y_ij - median(y_j) |

where median(y_j) is the median of group j. The Brown-Forsythe test
statistic is the model F statistic from a one-way ANOVA on the z_ij:

In particular, it is NOT " a substitute for the normal ANOVA F statistic for
when there are unequal variances, and when there is evidence of
non-normality."

--

Bert Gunter

Genentech Nonclinical Statistics




Re: [R] Brown-Forsythe test of equality of MEANS

2010-08-30 Thread JRG
On 30 Aug 2010 at 13:25, Bert Gunter wrote:

> Inline below.
> 
> -- Bert

Wrong.  There *is* a Brown-Forsythe test of equality of means given 
heterogeneity of variance.  
[Kirk's experimental design text, 3rd Ed., p. 155 describes the test.]

---JRG

John R. Gleason

> 
> On Mon, Aug 30, 2010 at 1:05 PM, Iasonas Lamprianou 
> wrote:
> 
> >
> > Dear friends,
> > two years ago (as I found on the web) Paul sent the following message but I
> > was not able to find if he got an answer. Today I have the same question and
> > it would be great if I could find out that this test has been implemented
> > (somehow) in R. Please do not confuse it with the Brown-Forsythe test of
> > equality of variances. Thank you:
> >
> > I've been searching around for a function for computing the Brown-Forsythe
> > F* statistic which is a substitute for the normal ANOVA F statistic for when
> > there are unequal variances, and when there is evidence of non-normality.
> 
> 
> False, I think, as I'm not entirely clear on your meaning. Brown-Forsythe is
> a test for the equality of spreads among groups. From Wikipedia:
> 
> The transformed response variable is constructed to measure the spread in
> each group. Let
> 
>    z_ij = | y_ij - median(y_j) |
> 
> where median(y_j) is the median of group j. The Brown-Forsythe test
> statistic is the model F statistic from a one-way ANOVA on the z_ij:
> 
> In particular, it is NOT " a substitute for the normal ANOVA F statistic for
> when there are unequal variances, and when there is evidence of
> non-normality."
> 
> --
> 
> Bert Gunter
> 
> Genentech Nonclinical Statistics
> 
> 
>
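
For completeness, a minimal sketch of the Brown-Forsythe F* test of equal
means (statistic and Satterthwaite-type denominator df following Brown &
Forsythe, 1974; the function name is made up and the code has not been
checked against published examples):

```r
bf.means.test <- function(y, g) {
    g  <- factor(g)
    ni <- tapply(y, g, length)
    mi <- tapply(y, g, mean)
    vi <- tapply(y, g, var)
    N  <- sum(ni)
    wi <- (1 - ni / N) * vi               # denominator weights
    Fstar <- sum(ni * (mi - mean(y))^2) / sum(wi)
    ci  <- wi / sum(wi)
    df2 <- 1 / sum(ci^2 / (ni - 1))       # Satterthwaite-type df
    df1 <- nlevels(g) - 1
    c(F.star = Fstar, df1 = df1, df2 = df2,
      p.value = pf(Fstar, df1, df2, lower.tail = FALSE))
}
```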



Re: [R] different interface to by (tapply)?

2010-08-30 Thread Dennis Murphy
Hi:

You've already gotten some good replies re aggregate() and plyr; here are
two more choices, from packages doBy and data.table, plus the others for
a contained summary:

 key <- c(1,1,1,2,2,2)
 val1 <- rnorm(6)
 indf <- data.frame( key, val1)
 outdf <- by(indf, indf$key, function(x) c(m=mean(x), s=sd(x)) )
 outdf

# Alternatives:

# aggregate (base) with new formula interface

# write a small function to return multiple outputs
f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))

aggregate(val1 ~ key, data = indf, FUN = f)
  key  val1.mean    val1.sd
1   1 -0.9783589  0.6378922
2   2  0.2816016  1.4490699

# package doBy   (get the same output)

library(doBy)
summaryBy(val1 ~ key, data = indf, FUN = f)
  key  val1.mean   val1.sd
1   1 -0.9783589 0.6378922
2   2  0.2816016 1.4490699

# package plyr

library(plyr)
ddply(indf, .(key), summarise, mean = mean(val1), sd = sd(val1))
  key   meansd
1   1 -0.9783589 0.6378922
2   2  0.2816016 1.4490699

# package data.table

library(data.table)
indt <- data.table(indf)
indt[, list(mean = mean(val1), sd = sd(val1)), by = list(as.integer(key))]
 key   meansd
[1,]   1 -0.9783589 0.6378922
[2,]   2  0.2816016 1.4490699

It's a cornucopia! :) Multiple grouping variables are no problem with these
functions, BTW.
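
And, for ivo's original question, one more base-R route that flattens the
by() output into a data frame (a sketch, computing on the value column
only so that the summaries are numeric vectors):

```r
key  <- c(1, 1, 1, 2, 2, 2)
val1 <- rnorm(6)
indf <- data.frame(key, val1)

res <- by(indf$val1, indf$key, function(x) c(m = mean(x), s = sd(x)))
out <- data.frame(key = names(res), do.call(rbind, res), row.names = NULL)
out   # columns: key, m, s
```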

HTH,
Dennis


On Mon, Aug 30, 2010 at 7:39 AM, ivo welch  wrote:

> serious?
>
>  key <- c(1,1,1,2,2,2)
>  val1 <- rnorm(6)
>  indf <- data.frame( key, val1)
>  outdf <- by(indf, indf$key, function(x) c(m=mean(x), s=sd(x)) )
>  outdf
> indf$key: 1
>  m.key m.val1  s.key s.val1
> 1. 0.6005 0. 1.0191
>
> --
> indf$key: 2
>  m.key  m.val1   s.key  s.val1
>  2. -0.8177  0.  0.3978
>
> > as.data.frame(by(indf, indf$key, function(x) c(m=mean(x), s=sd(x
> Error in as.data.frame.default(by(indf, indf$key, function(x) c(m =
> mean(x),  :
>  cannot coerce class '"by"' into a data.frame
>
> /iaw
> 
> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
>
>
>
> On Mon, Aug 30, 2010 at 9:36 AM, Henrique Dallazuanna 
> wrote:
> > Try this:
> >
> > as.data.frame(by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) ))
> >
> > On Mon, Aug 30, 2010 at 10:19 AM, ivo welch  wrote:
> >>
> >> dear R experts:
> >>
> >> has someone written a function that returns the results of by() as a
> >> data frame?   of course, this can work only if the output of the
> >> function that is an argument to by() is a numerical vector.
> >> presumably, what is now names(byobject) would become a column in the
> >> data frame, and the by object's list elements would become columns.
> >> it's a little bit like flattening the by() output object (so that the
> >> name of the list item and its contents become the same row), and
> >> having the right names for the columns.  I don't know how to do this
> >> quickly in the R way.  (Doing it slowly, e.g., with a for loop over
> >> the list of vectors, is easy, but would not make a nice function for
> >> me to use often.)
> >>
> >> for example, lets say my by() output is currently
> >>
> >> by( indf, indf$charid, function(x) c(m=mean(x), s=sd(x)) )
> >>
> >> $`A`
> >> [1] 2 3
> >> $`B`
> >> [2] 4 5
> >>
> >> then the revised by() would instead produce
> >>
> >> charid  m  s
> >> A  2  3
> >> B  4  5
> >>
> >> working with data frames is often more intuitive than working with the
> >> output of by().  the R wizards are probably chuckling now about how
> >> easy this is...
> >>
> >> regards,
> >>
> >> /iaw
> >>
> >> 
> >> Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
> >>
> >
> >
> >
> > --
> > Henrique Dallazuanna
> > Curitiba-Paraná-Brasil
> > 25° 25' 40" S 49° 16' 22" O
> >
>
>




Re: [R] log y 'axis' of histogram

2010-08-30 Thread David Scott

 On 31/08/10 03:37, Derek M Jones wrote:

Hadley,


I have counts ranging over 4-6 orders of magnitude with peaks
occurring at various 'magic' values.  Using a log scale for the
y-axis enables the smaller peaks, which would otherwise
be almost invisible bumps along the x-axis, to be seen

That doesn't justify the use of a _histogram_  - and regardless of

The usage highlights meaningful characteristics of the data.
What better justification for any method of analysis and display is
there?


what distributional display you use, logging the counts imposes some
pretty heavy restrictions on the shape of the distribution (e.g. that
it must not drop to zero).

Does there have to be a recognized statistical distribution to use R?
In my case I am using R for all of the analysis and graphics in a
new book.  This means that sometimes I have to deal with data sets
that are more or less a jumble of numbers with patterns in a few
places.  For instance, the numeric value of integer constants
appearing as one operand of the binary bitwise-AND operator (see
figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data
at: www.knosof.co.uk/cbook/bandcons.hist.gz)

qplot(band, binwidth=8, geom="histogram") + scale_y_log10()
does a good job of highlighting the peaks.


It may be useful for your purposes, but that doesn't necessarily make
it a meaningful graphic.

Doesn't being useful for my purpose make it meaningful, at least for me
and I hope my readers?

Hadley is correct about the problem of where to end the bars when trying 
to draw a log-histogram: basically you have to decide to cut them off 
somewhere. He is also right that a log-histogram is perhaps not a great 
graphic to use. However, they are used and indeed there is one in the 
Fieller, Flenley, Olbricht paper (published in Applied Statistics, now 
JRSS C) for example. I haven't searched for others, but certainly when I 
wrote a log-histogram routine it wasn't because I thought of doing such 
a plot all on my own.


A number of authors, including Barndorff-Nielsen in at least some of his 
papers (I haven't gone back and checked all his older work) just plot 
the midpoints of the tops of the log-histogram. (That is an option in 
logHist). Another approach is to fit an empirical density to the data 
and plot the log-density. That matches the advice often seen in this 
forum that plotting empirical density functions is preferable to drawing 
histograms. My feeling is that either of these two approaches is 
probably preferable to using log-histograms for the reasons Hadley 
enunciated. When plotting data plus a fitted curve, the midpoints 
approach does have the advantage of distinguishing data and theoretical 
curve more clearly.
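[Editor's note: to make the log-density alternative concrete, here is a minimal base-R sketch. The data and variable names are illustrative, not from the thread; it plots the log of a kernel density estimate and overlays the midpoints of the log-histogram bar tops, as described above.]

```r
set.seed(1)
x <- rexp(1000)                      # right-skewed example data

## Empirical density estimate; its log is plotted instead of a
## log-histogram, as suggested above
d <- density(x, from = 0)
plot(d$x, log(d$y), type = "l",
     xlab = "x", ylab = "log(density)")

## Midpoints of the tops of the log-histogram bars (the
## Barndorff-Nielsen style display), overlaid as points
h <- hist(x, plot = FALSE)
keep <- h$density > 0                # log() needs positive bar heights
points(h$mids[keep], log(h$density[keep]), pch = 16)
```

An exponential tail shows up as an approximately straight line in this display, which is the clarity advantage mentioned above.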


Overall the idea of a plot with a logged y-axis is definitely a good one 
and its use is endemic in literature concerned with heavy-tailed 
distributions, particularly finance. The advantage is the clarity 
offered regarding tail behaviour, where for example exponential tails in 
the density correspond to straight lines in the logged y-axis plot.


Hope this helps.

David Scott


--
_
David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regarding naive baysian classifier in R

2010-08-30 Thread Vishnampettai

Hi,
  I have a question about naive Bayes. I am able to classify the
data properly, but I am stuck on how to get the probability values for naive
Bayes. In the case of SVM we have the "attr" function that helps in
displaying the probability values. Is there any function similar to "attr"
in naive Bayes that can be used for displaying the class probabilities?
My code is given below:
library(e1071)
train<-read.table("C:/Users/Aadhithya/Desktop/ThesisDatasetsComb/Part1/Dataset1/groups/train/D1TRg10.txt",header=T);
test<-read.table("C:/Users/Aadhithya/Desktop/ThesisDatasetsComb/Part1/Dataset1/groups/test/D1TEg10.txt",header=T);
length(test);
cl <- c(c(rep("ALL",10), rep("AML",10)));
cl <- factor(cl)
model <- naiveBayes(t(train),cl);
pred<-predict(model, t(test),decision.values=TRUE,probability=TRUE)
attr(pred,"probabilities");
table(pred,t(cl)); 
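[Editor's note: a likely answer, sketched below with the built-in iris data standing in for the poster's files, which we don't have. For e1071's naiveBayes, the posterior class probabilities come from predict() with type = "raw"; there is no "probabilities" attribute as there is for predict.svm().]

```r
library(e1071)

## iris stands in for the poster's train/test data (assumption: any
## data frame of predictors plus a factor class label works the same way)
model <- naiveBayes(iris[, 1:4], iris$Species)

post <- predict(model, iris[, 1:4], type = "raw")  # posterior probabilities
cls  <- predict(model, iris[, 1:4])                # predicted class labels

head(post)                 # one column per class; rows sum to 1
table(cls, iris$Species)
```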



Re: [R] Brown-Forsythe test of equality of MEANS

2010-08-30 Thread Bert Gunter
Thanks. I stand corrected, then.

-- Bert

On Mon, Aug 30, 2010 at 1:45 PM, JRG  wrote:

> On 30 Aug 2010 at 13:25, Bert Gunter wrote:
>
> > Inline below.
> >
> > -- Bert
>
> Wrong.  There *is* a Brown-Forsythe test of equality of means given
> heterogeneity of variance.
> [Kirk's experimental design text, 3rd ed., p. 155 describes the test.]
>
> ---JRG
>
> John R. Gleason
>
> >
> > On Mon, Aug 30, 2010 at 1:05 PM, Iasonas Lamprianou <
> lampria...@yahoo.com>wrote:
> >
> > >
> > > Dear friends,
> > > two years ago (as I found on the web) Paul sent the following message
> but I
> > > was not able to find if he got an answer. Today I have the same
> question and
> > > it would be great if I could find out that this test has been
> implemented
> > > (somehow) in R. Please do not confuse it with the Brown-Forsythe test
> of
> > > equality of variances. Thank you:
> > >
> > > I've been searching around for a function for computing the
> Brown-Forsythe
> > > F* statistic which is a substitute for the normal ANOVA F statistic for
> when
> > > there are unequal variances, and when there is evidence of
> non-normality.
> >
> >
> > False, I think, as I'm not entirely clear on your meaning. Brown-Forsythe
> > is a test for the equality of spreads among groups. From Wikipedia:
> >
> > The transformed response variable is constructed to measure the
> > spread in each group. Let
> >
> >   z_ij = | y_ij - median(y_j) |
> >
> > where median(y_j) is the median of group j. The Brown-Forsythe test
> > statistic is the model F statistic from a one-way ANOVA on z_ij:
> >
> > In particular, it is NOT "a substitute for the normal ANOVA F statistic
> > for when there are unequal variances, and when there is evidence of
> > non-normality."
> >
> > --
> >
> > Bert Gunter
> >
> > Genentech Nonclinical Statistics
> >
> >
> >
>
>
>


-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
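[Editor's note: the spread test Bert describes is easy to reproduce in base R; the sketch below uses simulated data, nothing from this thread. I believe leveneTest() in the car package, with center = median, computes the same statistic.]

```r
set.seed(42)
g <- factor(rep(1:3, each = 20))
y <- rnorm(60, sd = rep(c(1, 1, 3), each = 20))  # group 3 has more spread

## z_ij = |y_ij - median_j|, then an ordinary one-way ANOVA on z
z   <- abs(y - ave(y, g, FUN = median))
fit <- anova(lm(z ~ g))
fit          # the F statistic here is the Brown-Forsythe spread test
```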




[R] Posthoc test for 3-way interaction (log-linear model)

2010-08-30 Thread Sachi Ito
Hi,

I have analyzed my data using log-linear model  as seen below:


> yes.no <- c("Yes","No")
> tk <- c("On","Off")
> ats <- c("S","V","M")

> L <- gl(2,1,12,yes.no)
> T <- gl(2,2,12,tk)
> A <- gl(3,4,12,ats)
> n <- c(1056,4774,22,283,326,2916,27,360,274,1770,15,226)

> library(MASS)
> l.loglm <- data.frame(A,T,L,n)
> l.loglm

   A  T    L     n
1  S  On   Yes  1056
2  S  On   No   4774
3  S  Off  Yes    22
4  S  Off  No    283
5  V  On   Yes   326
6  V  On   No   2916
7  V  Off  Yes    27
8  V  Off  No    360
9  M  On   Yes   274
10 M  On   No   1770
11 M  Off  Yes    15
12 M  Off  No    226


Model comparison based on likelihood ratio chi-square statistics revealed that 
the 3-way interaction (saturated) model was marginally significantly different 
from the 2-way association model (see below):


> anova(loglm.null,loglm.LA.LT.AT)
LR tests for hierarchical log-linear models

Model 1:
 n ~ T + A + L 
Model 2:
 n ~ L:T + L:A + A:T 

            Deviance df Delta(Dev) Delta(df) P(> Delta(Dev))
Model 1   305.997600  7
Model 2     4.620979  2 301.376622         5         0.00000
Saturated   0.000000  0   4.620979         2         0.09921


Now, I'd like to run a post-hoc test and see which one of the 3 levels of the 
variable "A" is significantly different from each other (S vs. V vs. M). 

I'd greatly appreciate if anyone can let me know how to run the post-hoc test.

Thank you in advance!
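[Editor's note: one ad hoc approach, sketched below and not a canonical post-hoc procedure, is to refit the two models on each pair of levels of A and compare them within each subset; the data frame is rebuilt from the counts posted above. Some multiplicity correction (e.g. Bonferroni over the three pairs) would be advisable.]

```r
library(MASS)

## Rebuild the poster's data frame from the counts given above
l.loglm <- data.frame(
  A = gl(3, 4, 12, labels = c("S", "V", "M")),
  T = gl(2, 2, 12, labels = c("On", "Off")),
  L = gl(2, 1, 12, labels = c("Yes", "No")),
  n = c(1056, 4774, 22, 283, 326, 2916, 27, 360, 274, 1770, 15, 226))

## Compare the two-way association model with the saturated model
## separately within each pair of A levels
for (pair in list(c("S", "V"), c("S", "M"), c("V", "M"))) {
  sub   <- subset(l.loglm, A %in% pair)
  sub$A <- droplevels(sub$A)
  m2 <- loglm(n ~ L:T + L:A + A:T, data = sub)  # all two-way terms
  m3 <- loglm(n ~ L * T * A, data = sub)        # saturated
  cat("\n--", paste(pair, collapse = " vs "), "--\n")
  print(anova(m2, m3))
}
```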



[R] Help With Post-hoc Testing

2010-08-30 Thread Bruce Johnson
I am trying to do post-hoc tests associated with a repeated measures
analysis with one factor nested within respondents.

The factor (SOI) has 17 levels.  The overall testing is working fine, but I
can't seem to get the multiple comparisons to work.

The first step is to "stack" the data. 

 Then I used "lme" to specify and test the overall model. 

 Finally I'm trying to use "glht" to do multiple comparisons.

Here's my code and the resulting output.

> ImpSoi<-ImpData[,c("iobs", soi)]

> ImpSoi[1:5,]

  iobs soi1 soi2 soi3 soi4 soi5 soi6 soi7 soi8 soi9 soi10 soi11 soi12 soi13
soi14 soi15 soi16

1   32 7.00 7.00 7.00 7.00 7.00 6.00  7.0 7.00 5.00  7.00  7.00  7.00  7.00
5.00  7.00  6.00

2   70 5.95 4.95 7.00 4.95 5.20 5.40  4.2 3.95 4.15  4.95  4.85  4.95  6.75
5.95  5.20  5.10

3   78 3.00 1.00 4.75 2.75 3.00 4.50  4.0 4.00 1.50  4.00  4.00  4.50  2.50
3.00  3.75  4.00

4  104 3.75 3.50 6.25 5.25 4.25 3.75  4.0 5.25 4.75  4.75  5.00  5.75  6.00
4.00  4.75  3.75

5  117 5.00 5.00 4.00 5.00 2.00 4.62  5.0 4.00 4.00  4.00  5.00  4.00  4.70
4.70  5.00  2.00

  soi17

1  7.00

2  5.15

3  4.00

4  5.50

5  4.00

> stack <-
reshape(ImpSoi,varying=soi,timevar="solution",idvar="iobs",sep="",
dir="long")

> solution <- factor(solution,levels(1:17),)

> stack[1:5,]

      iobs solution  soi
32.1    32        1 7.00
70.1    70        1 5.95
78.1    78        1 3.00
104.1  104        1 3.75
117.1  117        1 5.00

> Lmes.mod <- lme(soi ~ solution + iobs, random = ~1 | iobs/solution, data =
stack)

> anova(Lmes.mod)

            numDF denDF   F-value p-value
(Intercept)     1  2383 2894.8342  <.0001
solution        1  2383    0.0003  0.9870
iobs            1   147    0.0126  0.9109

> summary(glht(Lmes.mod, linfct=mcp(solution="Tukey")))

Error in mcp2matrix(model, linfct = linfct) : 

  Variable(s) 'solution' of class 'numeric' is/are not contained as a factor
in 'model'.

> 

I don't understand the error since "solution" is clearly a factor in the
model.

Any suggestions would be welcome.

Bruce
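[Editor's note: the likely cause is that the factor() call above assigned to a free-standing variable `solution` rather than to the column `stack$solution`, so `solution` still enters the lme model as numeric. A sketch of the fix follows, on simulated stand-in data since the poster's file isn't available; it requires the nlme and multcomp packages.]

```r
library(nlme)
library(multcomp)

## Simulated stand-in for the poster's stacked data; the key point is
## that 'solution' is a factor *inside* the data frame
set.seed(1)
stack <- data.frame(iobs     = factor(rep(1:12, each = 17)),
                    solution = factor(rep(1:17, times = 12)),
                    soi      = rnorm(204, mean = 5))

mod <- lme(soi ~ solution, random = ~ 1 | iobs, data = stack)

## Tukey all-pairs comparisons among the 17 levels now work, because
## the model sees 'solution' as a factor
summary(glht(mod, linfct = mcp(solution = "Tukey")))
```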

 

 





Re: [R] different interface to by (tapply)?

2010-08-30 Thread Gabor Grothendieck
On Mon, Aug 30, 2010 at 3:54 PM, Dennis Murphy  wrote:
> Hi:
>
> You've already gotten some good replies re aggregate() and plyr; here are
> two more choices, from packages doBy and data.table, plus the others for
> a contained summary:
>
>  key <- c(1,1,1,2,2,2)
>  val1 <- rnorm(6)
>  indf <- data.frame( key, val1)
>  outdf <- by(indf, indf$key, function(x) c(m=mean(x), s=sd(x)) )
>  outdf
>
> # Alternatives:
>
> # aggregate (base) with new formula interface
>
> # write a small function to return multiple outputs
> f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
>
> aggregate(val1 ~ key, data = indf, FUN = f)
>  key  val1.mean    val1.sd
> 1   1 -0.9783589  0.6378922
> 2   2  0.2816016  1.4490699
>
> # package doBy   (get the same output)
>
> library(doBy)
> summaryBy(val1 ~ key, data = indf, FUN = f)
>  key  val1.mean   val1.sd
> 1   1 -0.9783589 0.6378922
> 2   2  0.2816016 1.4490699
>
> # package plyr
>
> library(plyr)
> ddply(indf, .(key), summarise, mean = mean(val1), sd = sd(val1))
>  key       mean        sd
> 1   1 -0.9783589 0.6378922
> 2   2  0.2816016 1.4490699
>
> # package data.table
>
> library(data.table)
> indt <- data.table(indf)
> indt[, list(mean = mean(val1), sd = sd(val1)), by = list(as.integer(key))]
>     key       mean        sd
> [1,]   1 -0.9783589 0.6378922
> [2,]   2  0.2816016 1.4490699
>
> It's a cornucopia! :) Multiple grouping variables are no problem with these
> functions, BTW.
>
> HTH,


And here are yet four more:

>
> f.by <- function(x) c(key = x$key[1], mean = mean(x$val), sd = sd(x$val))
> do.call(rbind, by(indf, indf["key"], f.by))
  key        mean        sd
1   1 0.006794852 0.3779713
2   2 0.251890650 0.4379315
>
> library(sqldf)
> sqldf("select key, avg(val1) mean, stdev(val1) sd from indf group by key")
  key        mean        sd
1   1 0.006794852 0.3779713
2   2 0.251890650 0.4379315
>
> library(remix)
> remix(val1 ~ key, transform(indf, key = factor(key)), funs = c(mean, sd))
val1 ~ key
==

+-----+---+------+------+------+
|     |   |      | mean | sd   |
+=====+===+======+======+======+
| key | 1 | val1 | 0.01 | 0.38 |
+     +---+------+------+------+
|     | 2 | val1 | 0.25 | 0.44 |
+-----+---+------+------+------+
>
> library(Hmisc)
> summary(val1 ~ key, indf, fun = function(x) c(mean = mean(x), sd = sd(x)))
val1    N=6

+-------+-+-+-----------+---------+
|       | |N|mean       |sd.val1  |
+-------+-+-+-----------+---------+
|key    |1|3|0.006794852|0.3779713|
|       |2|3|0.251890650|0.4379315|
+-------+-+-+-----------+---------+
|Overall| |6|0.129342751|0.3897180|
+-------+-+-+-----------+---------+

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] reordering levels error

2010-08-30 Thread Felipe Carrillo
Please consider the following dataset:
I want to reorder the levels by year but get the following error:

Error in tapply(v, x, FUN, ...) : arguments must have same length
 
I suspect that I need to add the levels before I melt the dataset,
 but either way I have only used 'reorder' once before and can't figure
 out how it works. Thanks for any advice.
 
winter <- (list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
47L, 48L, 49L, 50L, 51L, 52L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L), BY2010 = c(0L, 0L, 460L, 1126L, 1755L,
11153L, 27998L, 56336L, 12486L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BY2009 = c(0L, 95L, 341L, 373L, 2859L, 5769L,
33263L, 98435L, 345339L, 622621L, 349958L, 531232L, 652803L,
345358L, 142991L, 148883L, 957501L, 32495L, 14862L, 7210L, 9765L,
6846L, 5067L, 6201L, 3045L, 106230L, 1183L, 195L, 6855L, 10261L,
4179L, 650L, 240L, 165L, 1189L, 863L, 562L, 1350L, 188L, 479L,
770L, 0L, 558L, 314L, 147L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    BY2008 = c(0L, 0L, 77L, 124L, 159L, 2376L, 9480L, 17314L,
    99574L, 323679L, 198211L, 93630L, 129183L, 111820L, 71260L,
    35241L, 14020L, 20778L, 21694L, 15016L, 13400L, 9187L, 3607L,
    2804L, 2417L, 5291L, 16216L, 898L, 558L, 709L, 972L, 61L,
    372L, 3086L, 10108L, 4295L, 882L, 2593L, 36L, 233L, 243L,
    0L, 70L, 272L, 308L, 134L, 40L, 0L, 0L, 0L, 0L, 0L, 0L),
    BY2007 = c(0L, 0L, 0L, 10775L, 4166L, 4958L, 16221L, 29401L,
    34951L, 33188L, 146044L, 105007L, 185297L, 159682L, 207537L,
    140694L, 128275L, 44274L, 27079L, 18928L, 10437L, 29984L,
    18395L, 25846L, 4573L, 31995L, 3679L, 1043L, 9636L, 1524L,
    827L, 7009L, 233L, 433L, 0L, 1103L, 257L, 128L, 66L, 70L,
    535L, 429L, 97L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    BY2006 = c(0L, 707L, 2390L, 8860L, 24430L, 40885L, 72792L,
    205521L, 344493L, 662973L, 526409L, 631953L, 850491L, 842678L,
    445987L, 558152L, 332032L, 174326L, 80601L, 48696L, 98571L,
    103563L, 149469L, 78081L, 182478L, 2158L, 16566L, 4027L,
    2655L, 1112L, 567L, 2595L, 4976L, 6336L, 294L, 1758L, 291L,
    203L, 450L, 1098L, 788L, 195L, 532L, 0L, 0L, 0L, 0L, 0L,
    0L, 167L, 0L, 0L, 0L), BY2005 = c(0L, 0L, 868L, 2044L, 4064L,
    6049L, 9399L, 13304L, 45172L, 242155L, 476864L, 712534L,
    1058409L, 2115018L, 1510342L, 1138213L, 333192L, 158820L,
    94379L, 348882L, 39290L, 29701L, 47258L, 69837L, 7884L, 49338L,
    22168L, 19397L, 19397L, 15984L, 8688L, 1200L, 1623L, 1291L,
    1356L, 707L, 875L, 875L, 222L, 883L, 0L, 0L, 129L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("week", "BY2010",
"BY2009", "BY2008", "BY2007", "BY2006", "BY2005"), class = "data.frame", 
row.names = c(NA,
-53L))
 str(winter)
 w_melt <- melt(winter,id="week",variable="year");str(w_melt)
 #  Reorder DOESN'T WORK
 w_melt <- 
reorder(w_melt$year,c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))
pdf("wtest.pdf")
ggplot(w_melt,aes(week,value/1000,colour=year,order= - as.numeric(year))) + 
geom_line(size=.75)+  theme_bw() +
opts(title="Cumulative",axis.text.x = theme_text(angle=45,hjust=1)) +
 labs(y="Number of individuals X 1,000",x="week")
dev.off()
 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA






Re: [R] Brown-Forsythe test of equality of MEANS

2010-08-30 Thread Iasonas Lamprianou
Thank you for replying. But there is another test with the same name which 
tests for equality of means. It is a robust version of ANOVA, like the Welch 
(ANOVA) test. They are both available in SPSS. The Welch test is available 
through oneway.test in R, but the Brown-Forsythe test for the equality of 
means is not. You may have a look at 


De Beuckelaer, A. (1996). A closer examination on some parametric
alternatives to the ANOVA F test. Statistical Papers, 37, 291-305, if you
like to see what I mean. 
But thank you for replying so soon. I hope that somebody else solved this 
issue. Also, by the way, I have a second question, if anyone can help. I am 
looking for an R package to run Dunnett's T3 or C test for post hoc 
comparisons when we 'suffer' from heteroscedasticity. (Even Games-Howell would 
do if the Dunnett's tests are not available.)

Thank you again for your help. 
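[Editor's note: for readers finding this thread later, the F* statistic of Brown and Forsythe (1974) for equality of means is short enough to code directly. The sketch below is an illustrative base-R implementation with a Satterthwaite-type denominator df, written from the usual textbook formula rather than taken from any package, so check it against a reference before relying on it.]

```r
## Brown-Forsythe F* test of equal means under unequal variances:
##   F* = sum n_j (m_j - m)^2  /  sum (1 - n_j/N) s_j^2
bf_means_test <- function(y, g) {
  g <- factor(g)
  n <- tapply(y, g, length)        # group sizes
  m <- tapply(y, g, mean)          # group means
  v <- tapply(y, g, var)           # group variances
  N <- sum(n)

  den   <- sum((1 - n / N) * v)
  Fstar <- sum(n * (m - mean(y))^2) / den

  ## Satterthwaite approximation for the denominator df
  cj  <- (1 - n / N) * v / den
  df2 <- 1 / sum(cj^2 / (n - 1))
  df1 <- nlevels(g) - 1

  c(F = Fstar, df1 = df1, df2 = df2,
    p = pf(Fstar, df1, df2, lower.tail = FALSE))
}

## Illustrative data with unequal group variances
set.seed(1)
y <- c(rnorm(10, 0, 1), rnorm(15, 0.5, 2), rnorm(20, 1, 3))
g <- rep(c("a", "b", "c"), c(10, 15, 20))
bf_means_test(y, g)
```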




Dr. Iasonas Lamprianou





Assistant Professor (Educational Research and Evaluation)

Department of Education Sciences

European University-Cyprus

P.O. Box 22006

1516 Nicosia

Cyprus 

Tel.: +357-22-713178

Fax: +357-22-590539





Honorary Research Fellow

Department of Education

The University of Manchester

Oxford Road, Manchester M13 9PL, UK

Tel. 0044  161 275 3485

iasonas.lampria...@manchester.ac.uk

--- On Mon, 30/8/10, Bert Gunter  wrote:

From: Bert Gunter 
Subject: Re: [R] Brown-Forsythe test of equality of MEANS
To: "Iasonas Lamprianou" 
Cc: r-help@r-project.org
Date: Monday, 30 August, 2010, 21:25

Inline below.
 
-- Bert


On Mon, Aug 30, 2010 at 1:05 PM, Iasonas Lamprianou  
wrote:


Dear friends,
two years ago (as I found on the web) Paul sent the following message but I was 
not able to find if he got an answer. Today I have the same question and it 
would be great if I could find out that this test has been implemented 
(somehow) in R. Please do not confuse it with the Brown-Forsythe test of 
equality of variances. Thank you:


I've been searching around for a function for computing the Brown-Forsythe F* 
statistic which is a substitute for the normal ANOVA F statistic for when there 
are unequal variances, and when there is evidence of non-normality.

 
False, I think, as I'm not entirely clear on your meaning. Brown-Fosythe is a 
test for the equality of spreads among groups. From Wikipedia:
 

The transformed response variable is constructed to measure the spread in each
group. Let

  z_ij = | y_ij - median(y_j) |

where median(y_j) is the median of group j. The Brown-Forsythe test statistic
is the model F statistic from a one-way ANOVA on z_ij:

In particular, it is NOT " a substitute for the normal ANOVA F statistic for 
when there are unequal variances, and when there is evidence of non-normality."
--
Bert Gunter
Genentech Nonclinical Statistics
 



  



Re: [R] reordering levels error

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 5:25 PM, Felipe Carrillo wrote:


Please consider the following dataset:
I want to reorder the levels by year but get the following error:

Error in tapply(v, x, FUN, ...) : arguments must have same length

I suspect that I need to add the levels before I melt the dataset
 but either way I have only use 'reorder' once before and can't figure
 out how it works..Thanks for any advice.

winter <- (list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,


I believe you omitted a necessary "structure" call above.


34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
47L, 48L, 49L, 50L, 51L, 52L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L), BY2010 = c(0L, 0L, 460L, 1126L, 1755L,
11153L, 27998L, 56336L, 12486L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BY2009 = c(0L, 95L, 341L, 373L, 2859L, 5769L,
33263L, 98435L, 345339L, 622621L, 349958L, 531232L, 652803L,
345358L, 142991L, 148883L, 957501L, 32495L, 14862L, 7210L, 9765L,
6846L, 5067L, 6201L, 3045L, 106230L, 1183L, 195L, 6855L, 10261L,
4179L, 650L, 240L, 165L, 1189L, 863L, 562L, 1350L, 188L, 479L,
770L, 0L, 558L, 314L, 147L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
BY2008 = c(0L, 0L, 77L, 124L, 159L, 2376L, 9480L, 17314L,
99574L, 323679L, 198211L, 93630L, 129183L, 111820L, 71260L,
35241L, 14020L, 20778L, 21694L, 15016L, 13400L, 9187L, 3607L,
2804L, 2417L, 5291L, 16216L, 898L, 558L, 709L, 972L, 61L,
372L, 3086L, 10108L, 4295L, 882L, 2593L, 36L, 233L, 243L,
0L, 70L, 272L, 308L, 134L, 40L, 0L, 0L, 0L, 0L, 0L, 0L),
BY2007 = c(0L, 0L, 0L, 10775L, 4166L, 4958L, 16221L, 29401L,
34951L, 33188L, 146044L, 105007L, 185297L, 159682L, 207537L,
140694L, 128275L, 44274L, 27079L, 18928L, 10437L, 29984L,
18395L, 25846L, 4573L, 31995L, 3679L, 1043L, 9636L, 1524L,
827L, 7009L, 233L, 433L, 0L, 1103L, 257L, 128L, 66L, 70L,
535L, 429L, 97L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
BY2006 = c(0L, 707L, 2390L, 8860L, 24430L, 40885L, 72792L,
205521L, 344493L, 662973L, 526409L, 631953L, 850491L, 842678L,
445987L, 558152L, 332032L, 174326L, 80601L, 48696L, 98571L,
103563L, 149469L, 78081L, 182478L, 2158L, 16566L, 4027L,
2655L, 1112L, 567L, 2595L, 4976L, 6336L, 294L, 1758L, 291L,
203L, 450L, 1098L, 788L, 195L, 532L, 0L, 0L, 0L, 0L, 0L,
0L, 167L, 0L, 0L, 0L), BY2005 = c(0L, 0L, 868L, 2044L, 4064L,
6049L, 9399L, 13304L, 45172L, 242155L, 476864L, 712534L,
1058409L, 2115018L, 1510342L, 1138213L, 333192L, 158820L,
94379L, 348882L, 39290L, 29701L, 47258L, 69837L, 7884L, 49338L,
22168L, 19397L, 19397L, 15984L, 8688L, 1200L, 1623L, 1291L,
1356L, 707L, 875L, 875L, 222L, 883L, 0L, 0L, 129L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("week", "BY2010",
"BY2009", "BY2008", "BY2007", "BY2006", "BY2005"), class =  
"data.frame",

row.names = c(NA,
-53L))
 str(winter)
 w_melt <- melt(winter,id="week",variable="year");str(w_melt)
 #  Reorder


Of course it "DOESN'T WORK". The second argument is much shorter than  
the first.



 w_melt <-
reorder(w_melt 
$year,c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))


> ?reorder

(Not that you would want to have it "work" for the reasons cited below.)

Apparently all you want to do is reverse the ordering of w_melt$year,  
so assigning the output of reorder to what used to be a full dataframe  
is going to create havoc.


>  w_melt <- melt(winter,id="week",variable="year");str(w_melt)
'data.frame':   318 obs. of  3 variables:
 $ week : int  26 27 28 29 30 31 32 33 34 35 ...
 $ year : Factor w/ 6 levels "BY2010","BY2009",..: 1 1 1 1 1 1 1 1 1  
1 ...

 $ value: int  0 0 460 1126 1755 11153 27998 56336 12486 0 ...
>
>  w_melt <- melt(winter,id="week",variable="year");str(w_melt)
'data.frame':   318 obs. of  3 variables:
 $ week : int  26 27 28 29 30 31 32 33 34 35 ...
 $ year : Factor w/ 6 levels "BY2010","BY2009",..: 1 1 1 1 1 1 1 1 1  
1 ...

 $ value: int  0 0 460 1126 1755 11153 27998 56336 12486 0 ...
> levels(w_melt$year)
[1] "BY2010" "BY2009" "BY2008" "BY2007" "BY2006" "BY2005"

So try instead:

> w_melt$year <- reorder(w_melt$year, 7-as.numeric(w_melt$year))
> levels(w_melt$year)
[1] "BY2005" "BY2006" "BY2007" "BY2008" "BY2009" "BY2010"

You have not described what you are trying to show, so I have not  
proceeded any further.
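[Editor's note: reorder() ranks levels by a numeric summary of a second variable; to impose the explicit ordering given in the original post, setting the levels directly is simpler, e.g.:]

```r
f <- factor(c("BY2010", "BY2005", "BY2009", "BY2006"))

## Explicit level order, taken from the original post; unused levels
## are kept, in the order given
f <- factor(f, levels = c("BY2005", "BY2009", "BY2006",
                          "BY2008", "BY2007", "BY2010"))
levels(f)
```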



pdf("wtest.pdf")
ggplot(w_melt,aes(week,value/1000,colour=year,order= -  
as.numeric(year))) +

geom_line(size=.75)+  theme_bw() +
opts(title="Cumulative",axis.text.x = theme_text(angle=45,hjust=1)) +
 labs(y="Number of individuals X 1,000",x="week")
dev.off()

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA





[R] search path for attaching packages

2010-08-30 Thread Atkinson, Elizabeth J. (Beth)
Several of us locally are puzzling over the following problem:

We have a central repository of R packages on our linux system.
Occasionally, we'd like to install a different version of the same
package (such as making updates to the survival package before it is
released to the larger group, or a downloaded beta version).   When
using the library function, it appears that lib.loc is not being used
and instead the central repository is being used.  We've even tried
modifying .libPaths( ) so that our local path is first in the search
list.  Is this a bug or a "feature"?

For example: 

>
install.packages('DBI',lib='/data1/bsi/epi/s105816.SNPlinkage/CNV/rlib')
> library(DBI,
lib.loc='/data1/bsi/epi/s105816.SNPlinkage/CNV/rlib')

I can look in my directory and see that DBI is version 0.2-5,
but when I use the library command I get the version installed in the
central location instead of in my specified directory.  

> library(help=DBI)

Information on package 'DBI'

Package:   DBI
Version:   0.2-4
Title: R Database Interface
Author:R Special Interest Group on Databases (R-SIG-DB)
Maintainer:David A. James 
Depends:   R (>= 2.3.0), methods
LazyLoad:  yes
License:   LGPL (version 2 or later)
Collate:   DBI.R Util.R zzz.R
Packaged:  Tue Oct 16 21:43:31 2007; dj
Built: R 2.10.0; ; 2009-10-26 17:51:14 UTC; unix

Further information is available in the following vignettes in
directory
'/usr/local/biotools/r/R-2.10.0/lib64/R/library/DBI/doc':

> version
   _
platform   x86_64-unknown-linux-gnu 
arch   x86_64   
os linux-gnu
system x86_64, linux-gnu
status  
major  2
minor  10.0 
year   2009 
month  10   
day26   
svn rev50208
language   R
version.string R version 2.10.0 (2009-10-26)

Beth Atkinson, MS | Statistician III | Division of Biomedical Statistics
& Informatics| Assistant Professor of Biostatistics  
Phone: 507-284-0431 | Admin Assistant: 507-266-4556 | Fax: 507-284-9542
| atkin...@mayo.edu
Mayo Clinic | 200 First Street SW | Rochester, MN 55905 |
www.mayoclinic.org
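[Editor's note: one thing worth checking is that library(help = PKG) does not consult the attached copy at all; it takes its own lib.loc argument and defaults to searching .libPaths() in order. A quick base-R sketch for checking what is actually resolved, shown with the base package "stats"; substitute "DBI" and the local lib.loc path as in the post:]

```r
.libPaths()                              # library trees, in search order

## Version of the copy that library() would pick up by default
packageDescription("stats")$Version

## Directory the package actually resolves to
find.package("stats")

## For a specific tree, pass lib.loc explicitly (path is the poster's):
## packageDescription("DBI",
##     lib.loc = "/data1/bsi/epi/s105816.SNPlinkage/CNV/rlib")$Version
```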






Re: [R] reordering levels error

2010-08-30 Thread Felipe Carrillo
Sorry about the structure thing; I basically want my levels in this order:
 w_melt <- 
reorder(w_melt$year,c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))
Here's the new dataset. Please disregard the reversed year; I was just trying
it, but it didn't do what I wanted, and I forgot to delete it.
With ggplot2 I am trying to show the line with the max value on top and then
show the other lines in the same order as in the w_melt reorder code above.

winter <- structure(list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
47L, 48L, 49L, 50L, 51L, 52L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L), BY2010 = c(0L, 0L, 460L, 1126L, 1755L,
11153L, 27998L, 56336L, 12486L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BY2009 = c(0L, 95L, 341L, 373L, 2859L, 5769L,
33263L, 98435L, 345339L, 622621L, 349958L, 531232L, 652803L,
345358L, 142991L, 148883L, 957501L, 32495L, 14862L, 7210L, 9765L,
6846L, 5067L, 6201L, 3045L, 106230L, 1183L, 195L, 6855L, 10261L,
4179L, 650L, 240L, 165L, 1189L, 863L, 562L, 1350L, 188L, 479L,
770L, 0L, 558L, 314L, 147L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    BY2008 = c(0L, 0L, 77L, 124L, 159L, 2376L, 9480L, 17314L,
    99574L, 323679L, 198211L, 93630L, 129183L, 111820L, 71260L,
    35241L, 14020L, 20778L, 21694L, 15016L, 13400L, 9187L, 3607L,
    2804L, 2417L, 5291L, 16216L, 898L, 558L, 709L, 972L, 61L,
    372L, 3086L, 10108L, 4295L, 882L, 2593L, 36L, 233L, 243L,
    0L, 70L, 272L, 308L, 134L, 40L, 0L, 0L, 0L, 0L, 0L, 0L),
    BY2007 = c(0L, 0L, 0L, 10775L, 4166L, 4958L, 16221L, 29401L,
    34951L, 33188L, 146044L, 105007L, 185297L, 159682L, 207537L,
    140694L, 128275L, 44274L, 27079L, 18928L, 10437L, 29984L,
    18395L, 25846L, 4573L, 31995L, 3679L, 1043L, 9636L, 1524L,
    827L, 7009L, 233L, 433L, 0L, 1103L, 257L, 128L, 66L, 70L,
    535L, 429L, 97L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    BY2006 = c(0L, 707L, 2390L, 8860L, 24430L, 40885L, 72792L,
    205521L, 344493L, 662973L, 526409L, 631953L, 850491L, 842678L,
    445987L, 558152L, 332032L, 174326L, 80601L, 48696L, 98571L,
    103563L, 149469L, 78081L, 182478L, 2158L, 16566L, 4027L,
    2655L, 1112L, 567L, 2595L, 4976L, 6336L, 294L, 1758L, 291L,
    203L, 450L, 1098L, 788L, 195L, 532L, 0L, 0L, 0L, 0L, 0L,
    0L, 167L, 0L, 0L, 0L), BY2005 = c(0L, 0L, 868L, 2044L, 4064L,
    6049L, 9399L, 13304L, 45172L, 242155L, 476864L, 712534L,
    1058409L, 2115018L, 1510342L, 1138213L, 333192L, 158820L,
    94379L, 348882L, 39290L, 29701L, 47258L, 69837L, 7884L, 49338L,
    22168L, 19397L, 19397L, 15984L, 8688L, 1200L, 1623L, 1291L,
    1356L, 707L, 875L, 875L, 222L, 883L, 0L, 0L, 129L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("week", "BY2010",
"BY2009", "BY2008", "BY2007", "BY2006", "BY2005"), class = "data.frame", 
row.names = c(NA,
-53L))
winter
 w_melt <- melt(winter,id="week",variable="year");str(w_melt)
 #  Reorder DOESN'T WORK
 w_melt <- 
reorder(w_melt$year,c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))
pdf("wtest.pdf")
ggplot(w_melt,aes(week,value/1000,colour)) + geom_line(size=.75)+  theme_bw() +
opts(title="Cumulative",axis.text.x = theme_text(angle=45,hjust=1)) +
 labs(y="Number of individuals X 1,000",x="week")
dev.off()

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA



- Original Message 
> From: David Winsemius 
> To: r-help Help 
> Sent: Mon, August 30, 2010 2:49:15 PM
> Subject: Re: [R] reordering levels error
> 
> 
> On Aug 30, 2010, at 5:25 PM, Felipe Carrillo wrote:
> 
> > Please consider the following dataset:
> > I want to reorder the levels by year but get the following error:
> > 
> > Error in tapply(v, x, FUN, ...) : arguments must have same length
> > 
> > I suspect that I need to add the levels before I melt the dataset
> >  but either way I have only use 'reorder' once before and can't figure
> >  out how it works..Thanks for any advice.
> > 
> > winter <- (list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
> 
> I believe you omitted a necessary"structure" call above.
> 
> > 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
> > 47L, 48L, 49L, 50L, 51L, 52L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
> > 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
> > 22L, 23L, 24L, 25L, 26L), BY2010 = c(0L, 0L, 460L, 1126L, 1755L,
> > 11153L, 27998L, 56336L, 12486L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> > 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> > 0L, 0L, 0L, 0L), BY2009 = c(0L, 95L, 341L, 373L, 2859L, 5769L,
> > 33263L, 98435L, 345339L, 622621L, 349958L, 531232L, 652803L,
> > 345358L, 142991L, 148883L, 957501L, 32495L, 14862L, 7210L, 9765L,
> > 6846

Re: [R] reordering levels error

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 6:09 PM, Felipe Carrillo wrote:

Sorry about the structure thing, I basically want my levels on this  
order:

 w_melt <-
reorder(w_melt 
$year,c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))
 Here's the new dataset , please discard the reverse year, I was  
just trying it

but it didn't do what I wanted
and forgot to delete it.
With ggplot2 I am trying to show the line with max value above and  
then show the

other lines in the same order as the w_melt reorder code above.


I cannot parse that into English ... too many references to undefined  
items and indefinite pronouns to arrive at a meaningful expanded  
sentence. "Max value above"? "Other lines in order"? Try your plotting  
with the simple modification offered below and then report back with  
more detail and fewer indefinite pronouns and vague "as aboves".




winter <- structure(list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
47L, 48L, 49L, 50L, 51L, 52L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
22L, 23L, 24L, 25L, 26L), BY2010 = c(0L, 0L, 460L, 1126L, 1755L,
11153L, 27998L, 56336L, 12486L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), BY2009 = c(0L, 95L, 341L, 373L, 2859L, 5769L,
33263L, 98435L, 345339L, 622621L, 349958L, 531232L, 652803L,
345358L, 142991L, 148883L, 957501L, 32495L, 14862L, 7210L, 9765L,
6846L, 5067L, 6201L, 3045L, 106230L, 1183L, 195L, 6855L, 10261L,
4179L, 650L, 240L, 165L, 1189L, 863L, 562L, 1350L, 188L, 479L,
770L, 0L, 558L, 314L, 147L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
BY2008 = c(0L, 0L, 77L, 124L, 159L, 2376L, 9480L, 17314L,
99574L, 323679L, 198211L, 93630L, 129183L, 111820L, 71260L,
35241L, 14020L, 20778L, 21694L, 15016L, 13400L, 9187L, 3607L,
2804L, 2417L, 5291L, 16216L, 898L, 558L, 709L, 972L, 61L,
372L, 3086L, 10108L, 4295L, 882L, 2593L, 36L, 233L, 243L,
0L, 70L, 272L, 308L, 134L, 40L, 0L, 0L, 0L, 0L, 0L, 0L),
BY2007 = c(0L, 0L, 0L, 10775L, 4166L, 4958L, 16221L, 29401L,
34951L, 33188L, 146044L, 105007L, 185297L, 159682L, 207537L,
140694L, 128275L, 44274L, 27079L, 18928L, 10437L, 29984L,
18395L, 25846L, 4573L, 31995L, 3679L, 1043L, 9636L, 1524L,
827L, 7009L, 233L, 433L, 0L, 1103L, 257L, 128L, 66L, 70L,
535L, 429L, 97L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
BY2006 = c(0L, 707L, 2390L, 8860L, 24430L, 40885L, 72792L,
205521L, 344493L, 662973L, 526409L, 631953L, 850491L, 842678L,
445987L, 558152L, 332032L, 174326L, 80601L, 48696L, 98571L,
103563L, 149469L, 78081L, 182478L, 2158L, 16566L, 4027L,
2655L, 1112L, 567L, 2595L, 4976L, 6336L, 294L, 1758L, 291L,
203L, 450L, 1098L, 788L, 195L, 532L, 0L, 0L, 0L, 0L, 0L,
0L, 167L, 0L, 0L, 0L), BY2005 = c(0L, 0L, 868L, 2044L, 4064L,
6049L, 9399L, 13304L, 45172L, 242155L, 476864L, 712534L,
1058409L, 2115018L, 1510342L, 1138213L, 333192L, 158820L,
94379L, 348882L, 39290L, 29701L, 47258L, 69837L, 7884L, 49338L,
22168L, 19397L, 19397L, 15984L, 8688L, 1200L, 1623L, 1291L,
1356L, 707L, 875L, 875L, 222L, 883L, 0L, 0L, 129L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("week", "BY2010",
"BY2009", "BY2008", "BY2007", "BY2006", "BY2005"), class = "data.frame",
row.names = c(NA, -53L))
winter
 w_melt <- melt(winter,id="week",variable="year");str(w_melt)
 #  Reorder DOESN'T WORK
 w_melt <-
reorder(w_melt$year, c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))
Let's drop your misguided use of reorder and use the correct method for  
reversing the print order of a factor. (Let's also drop the misguided  
assignment of a factor to what is the name of the entire dataset.)



 w_melt$year <- factor(w_melt$year, levels= rev(levels(w_melt$year) ) )
# Should reverse the order
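As an aside, a minimal base-R sketch (on hypothetical data) of the difference between reorder(), which orders levels by a summary statistic, and setting an explicit level order, which is what is wanted here:

```r
# Hypothetical sketch: two ways to control factor level order.
year <- factor(c("BY2010", "BY2005", "BY2009", "BY2006"))
levels(year)   # alphabetical by default

# Impose an explicit order (levels with no observations are kept, empty):
year_explicit <- factor(year,
    levels = c("BY2005", "BY2009", "BY2006", "BY2008", "BY2007", "BY2010"))

# Or just reverse the current order, as suggested above:
year_rev <- factor(year, levels = rev(levels(year)))
levels(year_rev)
```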


pdf("wtest.pdf")
ggplot(w_melt, aes(week, value/1000, colour = year)) + geom_line(size = .75) +
  theme_bw() +
  opts(title = "Cumulative", axis.text.x = theme_text(angle = 45, hjust = 1)) +
  labs(y = "Number of individuals X 1,000", x = "week")
dev.off()

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA



- Original Message 

From: David Winsemius 
To: r-help Help 
Sent: Mon, August 30, 2010 2:49:15 PM
Subject: Re: [R] reordering levels error


On Aug 30, 2010, at 5:25 PM, Felipe Carrillo wrote:


Please consider the following dataset:
I want to reorder the levels by year but get the following error:

Error in tapply(v, x, FUN, ...) : arguments must have same length

I suspect that I need to add the levels before I melt the dataset,
but either way I have only used 'reorder' once before and can't figure
out how it works. Thanks for any advice.

winter <- (list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,


I believe you omitted a necessary "structure" call above.



[R] par("usr") when used in heatmap()

2010-08-30 Thread array chip
Hi, is there any way I can retrieve the user coordinates for the region of the
heatmap (only the heatmap itself, not the dendrogram or the x- and y-axis
annotations)? I found that par("usr") didn't give the user coordinates that I
want. I want those user coordinates to add some additional information outside
of the heatmap region (some text, some lines, etc.).


par("usr") works perfectly with image().

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] different interface to by (tapply)?

2010-08-30 Thread ivo welch
mercy!!! ;-)


thanks, everyone.  sure beats me trying to reinvent a slower version of the
wheel.  came in very handy.

I think it would be nice to see some of these pointers in the "?by" manual
page.  not sure who to ask to do this, but maybe this person reads r-help.

/iaw


Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)




On Mon, Aug 30, 2010 at 5:23 PM, Gabor Grothendieck  wrote:

> On Mon, Aug 30, 2010 at 3:54 PM, Dennis Murphy  wrote:
> > Hi:
> >
> > You've already gotten some good replies re aggregate() and plyr; here are
> > two more choices, from packages doBy and data.table, plus the others for
> > a contained summary:
> >
> >  key <- c(1,1,1,2,2,2)
> >  val1 <- rnorm(6)
> >  indf <- data.frame( key, val1)
> >  outdf <- by(indf, indf$key, function(x) c(m=mean(x), s=sd(x)) )
> >  outdf
> >
> > # Alternatives:
> >
> > # aggregate (base) with new formula interface
> >
> > # write a small function to return multiple outputs
> > f <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm =
> TRUE))
> >
> > aggregate(val1 ~ key, data = indf, FUN = f)
> >  key  val1.meanval1.sd
> > 1   1 -0.9783589  0.6378922
> > 2   2  0.2816016  1.4490699
> >
> > # package doBy   (get the same output)
> >
> > library(doBy)
> > summaryBy(val1 ~ key, data = indf, FUN = f)
> >  key  val1.mean   val1.sd
> > 1   1 -0.9783589 0.6378922
> > 2   2  0.2816016 1.4490699
> >
> > # package plyr
> >
> > library(plyr)
> > ddply(indf, .(key), summarise, mean = mean(val1), sd = sd(val1))
> >  key   meansd
> > 1   1 -0.9783589 0.6378922
> > 2   2  0.2816016 1.4490699
> >
> > # package data.table
> >
> > library(data.table)
> > indt <- data.table(indf)
> > indt[, list(mean = mean(val1), sd = sd(val1)), by =
> list(as.integer(key))]
> > key   meansd
> > [1,]   1 -0.9783589 0.6378922
> > [2,]   2  0.2816016 1.4490699
> >
> > It's a cornucopia! :) Multiple grouping variables are no problem with
> these
> > functions, BTW.
> >
> > HTH,
>
>
> And here are yet four more:
>
> >
> > f.by <- function(x) c(key = x$key[1], mean = mean(x$val), sd =
> sd(x$val))
> > do.call(rbind, by(indf, indf["key"], f.by))
>  keymeansd
> 1   1 0.006794852 0.3779713
> 2   2 0.251890650 0.4379315
> >
> > library(sqldf)
> > sqldf("select key, avg(val1) mean, stdev(val1) sd from indf group by
> key")
>  keymeansd
> 1   1 0.006794852 0.3779713
> 2   2 0.251890650 0.4379315
> >
> > library(remix)
> > remix(val1 ~ key, transform(indf, key = factor(key)), funs = c(mean, sd))
> val1 ~ key
> ==
>
> +-+---+--+---+--+
> || mean  | sd   |
> +=+===+==+===+==+
> | key | 1 | val1 | 0.01  | 0.38 |
> + +---+--+---+--+
> | | 2 | val1 | 0.25  | 0.44 |
> +-+---+--+---+--+
> >
> > library(Hmisc)
> > summary(val1 ~ key, indf, fun = function(x) c(mean = mean(x), sd =
> sd(x)))
> val1N=6
>
> +---+-+-+---+-+
> |   | |N|mean   |sd.val1  |
> +---+-+-+---+-+
> |key|1|3|0.006794852|0.3779713|
> |   |2|3|0.251890650|0.4379315|
> +---+-+-+---+-+
> |Overall| |6|0.129342751|0.3897180|
> +---+-+-+---+-+
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First and second derivatives of raw observations

2010-08-30 Thread mtorabi
Dear all,

I was asked to send the following question:

We have some (raw) observations and would like to get the first and second
derivatives in R. Any comment would be appreciated.

Thanks,
Mahmoud

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First and second derivatives of raw observations

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 6:40 PM, mtor...@math.carleton.ca wrote:


Dear all,

I was asked to send the following question:

We have some (raw) observations and would like to get the first and  
second

derivatives in R. Any comment would be appreciated.


From the time of Newton, the quick and dirty (but often very  
informative) path to first and second derivatives has been via first  
and second differences. That, of course, assumes that the intervals  
between these "raw" observation are equal, which might or might not be  
a sensible assumption given the impossibly vague description of the  
problem.
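A minimal base-R sketch of that first/second-difference idea, assuming equally spaced observations (the data here are simulated for illustration):

```r
# First and second derivative estimates by finite differences,
# assuming the observations are equally spaced with spacing h.
h <- 0.1
x <- seq(0, 2 * pi, by = h)
y <- sin(x)   # stand-in for the "raw" observations

d1 <- diff(y) / h                       # first differences, length n - 1
d2 <- diff(y, differences = 2) / h^2    # second differences, length n - 2

# d1[i] approximates the derivative near the midpoint of x[i] and x[i+1];
# d2[i] approximates the second derivative near x[i+1].
```

With noisy data the differences amplify the noise, which is one motivation for the model-based approaches suggested elsewhere in this thread.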


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] online documentation only?

2010-08-30 Thread Duncan Murdoch

On 30/08/2010 3:53 PM, Bob McCall wrote:

Greetings:

I recently installed R 2.11.1 for windows. It seems that there is only
online help now. Is there any way to get the local docs? I don't have
always-on high-speed internet. 


The local docs are generated on demand.  You use your browser, but you 
aren't really online.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] New to R

2010-08-30 Thread Dennis Murphy
Hi:

Try this:

library(sos) # install from CRAN if you don't have it
findFn('imputation')

I got 285 hits. That should be enough to get you started.

Here's a recent paper about how to use sos from the R Journal (Dec. 2009):
http://journal.r-project.org/archive/2009-2/RJournal_2009-2_Graves~et~al.pdf

HTH,
Dennis

On Mon, Aug 30, 2010 at 1:15 PM, Josh HIll  wrote:

> I'm relatively new to R, and not particularly adept yet, but I was
> wondering
> if there was a simply way to simulate missing data that are MAR, MNAR and
> MCAR. I've got a good work-around for the MCAR data, but it's sort of hard
> to work with.
>
> Josh
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] online documentation only?

2010-08-30 Thread Duncan Murdoch

On 30/08/2010 4:16 PM, Bob McCall wrote:

Thanks for the replies. I tried help.start() and ?foo but my browser opened
and was blank. It just said can't open site as I was offline. Must be my
system. I'll try again. Maybe I can get it sorted out. Maybe the install was
bad.



I'd guess you're set up to use a proxy server.  If so, you need to tell 
your browser to access 127.0.0.1 directly, not through the proxy.  If 
that's not it, you need to tell us more about your setup:  OS, browser, 
version numbers, etc.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] search path for attaching packages

2010-08-30 Thread Duncan Murdoch

On 30/08/2010 6:00 PM, Atkinson, Elizabeth J. (Beth) wrote:

Several of us locally are puzzling over the following problem:

We have a central repository of R packages on our linux system.
Occasionally, we'd like to install a different version of the same
package (such as making updates to the survival package before it is
released to the larger group, or a downloaded beta version).   When
using the library function, it appears that lib.loc is not being used
and instead the central repository is being used.  We've even tried
modifying .libPaths( ) so that our local path is first in the search
list.  Is this a bug or a "feature"?

	For example: 


>
install.packages('DBI',lib='/data1/bsi/epi/s105816.SNPlinkage/CNV/rlib')
> library(DBI,
lib.loc='/data1/bsi/epi/s105816.SNPlinkage/CNV/rlib')

I can look in my directory and see that DBI is version 0.2-5,
but when I use the library command I get the version installed in the
central location instead of in my specified directory.  


> library(help=DBI)


You didn't specify the lib.loc in this command.  Typing library() will 
tell you where it will look, in the order given by .libPaths().
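To make that concrete, a hedged sketch (the directory below is a placeholder standing in for a real personal library such as the one in the post):

```r
# Sketch: putting a personal library first in the search path.
# tempdir() stands in for a real directory of your own.
mylib <- file.path(tempdir(), "rlib")
dir.create(mylib, showWarnings = FALSE)

# Prepend it; library() will now search it before the central repository:
.libPaths(c(mylib, .libPaths()))
.libPaths()   # personal library listed first

# Note that lib.loc must be supplied to *every* call that should use
# the personal copy, including library(help = ...):
# library(DBI, lib.loc = mylib)
# library(help = DBI, lib.loc = mylib)
```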


Duncan Murdoch




Information on package 'DBI'

Package:   DBI
Version:   0.2-4
Title: R Database Interface
Author:R Special Interest Group on Databases (R-SIG-DB)
Maintainer:David A. James 
Depends:   R (>= 2.3.0), methods
LazyLoad:  yes
License:   LGPL (version 2 or later)
Collate:   DBI.R Util.R zzz.R
Packaged:  Tue Oct 16 21:43:31 2007; dj
Built: R 2.10.0; ; 2009-10-26 17:51:14 UTC; unix

Further information is available in the following vignettes in
directory
'/usr/local/biotools/r/R-2.10.0/lib64/R/library/DBI/doc':


version
   _
platform   x86_64-unknown-linux-gnu 
arch   x86_64   
os linux-gnu
system x86_64, linux-gnu
status  
major  2
minor  10.0 
year   2009 
month  10   
day26   
svn rev50208
language   R
version.string R version 2.10.0 (2009-10-26)


Beth Atkinson, MS | Statistician III | Division of Biomedical Statistics
& Informatics| Assistant Professor of Biostatistics  
Phone: 507-284-0431 | Admin Assistant: 507-266-4556 | Fax: 507-284-9542

| atkin...@mayo.edu
Mayo Clinic | 200 First Street SW | Rochester, MN 55905 |
www.mayoclinic.org



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First and second derivatives of raw observations

2010-08-30 Thread Duncan Murdoch

On 30/08/2010 6:40 PM, mtor...@math.carleton.ca wrote:

Dear all,

I was asked to send the following question:

We have some (raw) observations and would like to get the first and second
derivatives in R. Any comment would be appreciated.


Fit a model, and take derivatives of the fit.  Which model?  Depends on 
the context.
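One concrete instance of "fit a model and differentiate the fit" is a smoothing spline from the stats package, whose predict() method takes a deriv argument; a sketch on simulated data:

```r
# Fit a smoothing spline to noisy observations, then evaluate the
# first and second derivatives of the fitted curve.
set.seed(1)
x <- seq(0, 2 * pi, length.out = 200)
y <- sin(x) + rnorm(length(x), sd = 0.05)   # simulated "raw" data

fit <- smooth.spline(x, y)

d1 <- predict(fit, x, deriv = 1)   # first derivative of the fit
d2 <- predict(fit, x, deriv = 2)   # second derivative of the fit

# Away from the boundaries, d1$y should track cos(x) and d2$y
# should track -sin(x), up to smoothing error.
```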


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First and second derivatives of raw observations

2010-08-30 Thread Spencer Graves
   There is a strong argument for fitting something like splines 
and then differentiating the spline fit.



  I trust you won't object to my immodest recommendation of the 
"fda" package and book by Ramsay, Hooker and Graves (2009) Functional 
Data Analysis with R and Matlab (Springer).  The "fda" package includes 
scripts to work all but one of the 76 examples in the book.



  Hope this helps.
  Spencer Graves


On 8/30/2010 4:13 PM, Duncan Murdoch wrote:

On 30/08/2010 6:40 PM, mtor...@math.carleton.ca wrote:

Dear all,

I was asked to send the following question:

We have some (raw) observations and would like to get the first and 
second

derivatives in R. Any comment would be appreciated.


Fit a model, and take derivatives of the fit.  Which model?  Depends 
on the context.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help With Post-hoc Testing

2010-08-30 Thread Dennis Murphy
Hi:

See below.

On Mon, Aug 30, 2010 at 2:21 PM, Bruce Johnson
wrote:

> I am trying to do post-hoc tests associated with a repeated measures
> analysis with on factor nested within respondents.
>
> The factor (SOI) has 17 levels.  The overall testing is working fine, but I
> can't seem to get the multiple comparisons to work.
>
> The first step is to "stack" the data.
>
>  Then I used "lme" to specify and test the overall model.
>
>  Finally I'm trying to use "glht" to do multiple comparisons.
>
> Here's my code and the resulting output.
>
> > ImpSoi<-ImpData[,c("iobs", soi)]
>
> > ImpSoi[1:5,]
>
>  iobs soi1 soi2 soi3 soi4 soi5 soi6 soi7 soi8 soi9 soi10 soi11 soi12 soi13
> soi14 soi15 soi16
>
> 1   32 7.00 7.00 7.00 7.00 7.00 6.00  7.0 7.00 5.00  7.00  7.00  7.00  7.00
> 5.00  7.00  6.00
>
> 2   70 5.95 4.95 7.00 4.95 5.20 5.40  4.2 3.95 4.15  4.95  4.85  4.95  6.75
> 5.95  5.20  5.10
>
> 3   78 3.00 1.00 4.75 2.75 3.00 4.50  4.0 4.00 1.50  4.00  4.00  4.50  2.50
> 3.00  3.75  4.00
>
> 4  104 3.75 3.50 6.25 5.25 4.25 3.75  4.0 5.25 4.75  4.75  5.00  5.75  6.00
> 4.00  4.75  3.75
>
> 5  117 5.00 5.00 4.00 5.00 2.00 4.62  5.0 4.00 4.00  4.00  5.00  4.00  4.70
> 4.70  5.00  2.00
>
>  soi17
>
> 1  7.00
>
> 2  5.15
>
> 3  4.00
>
> 4  5.50
>
> 5  4.00
>
> > stack <-
> reshape(ImpSoi,varying=soi,timevar="solution",idvar="iobs",sep="",
> dir="long")
>
> > solution <- factor(solution,levels(1:17),)
>
> > stack[1:5,]
>
>  iobs solution  soi
>
> 32.1321 7.00
>
> 70.1701 5.95
>
> 78.1781 3.00
>
> 104.1  1041 3.75
>
> 117.1  1171 5.00
>

Try str(stack) at this point. Is solution a factor? (Hint: You defined
solution *outside* of the stack data frame. Now look at ls(), which
tells you the variables in your global workspace.)


> > Lmes.mod <- lme(soi ~ solution + iobs, random = ~1 | iobs/solution, data
> =
> stack)
>
> > anova(Lmes.mod)
>
>numDF denDF   F-value p-value
>
> (Intercept) 1  2383 2894.8342  <.0001
>
> solution1  23830.0003  0.9870
>
> iobs1   1470.0126  0.9109
>
> > summary(glht(Lmes.mod, linfct=mcp(solution="Tukey")))
>
> Error in mcp2matrix(model, linfct = linfct) :
>
>  Variable(s) 'solution' of class 'numeric' is/are not contained as a factor
> in 'model'.
>

This is telling you solution is a numeric variable in stack at this point,
correctly so.


>

>
>
> I don't understand the error since "solution" is clearly a factor in the
> model.
>

Here's what I don't understand. Why do you believe a procedure that
will produce 17 * 16 / 2 = 136 pairwise comparisons will be scientifically
meaningful? Please tell me you're not using a time variable as an
unordered factor with 17 levels... If you don't understand why that's a
problem, you need to consult with a local statistical expert. Seriously.

HTH,
Dennis

>
> Any suggestions would be welcome.
>
> Bruce
>
>
>
>
>
>
>[[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R crashes when communicating with JAGS

2010-08-30 Thread Ben Bolker
Florian Weiler  johnshopkins.it> writes:

> First I create some data:
> N <- 1000
> x <- rnorm(N, 0, 5)
> 
> Then I specify a model in JAGS, storing it in the directory with the
> extension .bug:
> model {
>   for (i in 1:N) {
>   x[i] ~ dnorm(mu, tau) ## the model
>   }
>   mu ~ dnorm(0, .0001) ## uninformative priors
>   tau <- pow(sigma, -2)
>   sigma ~ dunif(0, 100)
> }
> 
> Then I call the rjags library and finally I try to run the jags model:
> library('rjags')
> 
> jags <- jags.model('example.bug',
>data = list('x' = x,
>'N' = N))
> 
> This is where its always crashing, i.e. I don't get an error message,
> but R simply stops working and I have to close and reopen the program.
> I don't even start drawing samples from the model, it just crashes the
> moment I try to call on JAGS.
> 

[snip]

> > My problem is that every
> > single time I want to call JAGS from R the latter crashes (i.e. it
> > turns pale and the Windows tells me "R has stopped working".
> > Strangely, if there is a mistake in my jags code, this will be
> > reported without a crash. Only when the code is right and the model
> > should start to run, R stops working. I already tried to use different
> > version of R (currently R 2.10.1 and JAGS-2.1.0-1),but to no avail. A
> > couple of days ago I even upgraded from Vista to Windows 7, hoping
> > this would sort our the problem, but its still exactly the same. Could
> > anyone think of a possible solution for this problem?
> >
> > I worked on other computers using Windows without encountering the
> > described problem. However, I am quite new to using JAGS, so I might
> > just be missing the obvious.

  This sort of thing is unfortunately really hard to debug remotely.
  Is it feasible to upgrade to the latest release of R (2.11.1), JAGS, and
rjags, just to rule that out as a possibility?  You said you had
used other Windows machines where it worked -- do they have
similar configurations?  Can you try to look for differences in the
configurations?  Do you get any messages at all from JAGS before
the program stops (I get

Compiling model graph
   Resolving undeclared variables
   Allocating nodes
   Graph Size: 1009

  |++| 100%

: does any of this appear before R crashes)?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First and second derivatives of raw observations

2010-08-30 Thread Spencer Graves
p.s.  My suggestion is a special case of Duncan Murdoch's suggestion: 
If you have a model that should fit the data, use it.  If you don't -- 
or if you have only something rather general -- then the more general 
tools of functional data analysis may be useful.


##
   There is a strong argument for fitting something like splines 
and then differentiating the spline fit.



  I trust you won't object to my immodest recommendation of the 
"fda" package and book by Ramsay, Hooker and Graves (2009) Functional 
Data Analysis with R and Matlab (Springer).  The "fda" package includes 
scripts to work all but one of the 76 examples in the book.



  Hope this helps.
  Spencer Graves


On 8/30/2010 4:13 PM, Duncan Murdoch wrote:

On 30/08/2010 6:40 PM, mtor...@math.carleton.ca wrote:

Dear all,

I was asked to send the following question:

We have some (raw) observations and would like to get the first and
second
derivatives in R. Any comment would be appreciated.


Fit a model, and take derivatives of the fit.  Which model?  Depends
on the context.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reordering levels error

2010-08-30 Thread Dennis Murphy
Hi:

I don't know if this is exactly what you wanted, but here goes. I made
a few adjustments in the data frame before calling ggplot():

# library(ggplot2)

# Reorient the order in which variables appear
winter <- winter[, c(1, 7, 3, 6, 4, 5, 2)]

# Get rid of second week 26 at the end
winter2 <- winter[-53, ]

# Create an ordered factor for labeling the weeks
wkord <- c(26:52, 1:25)
winter2$week <- ordered(winter2$week, levels = wkord)

# melt the data frame
w <- melt(winter2, id = 'week')

# create ordered factor for year
w$year <- substring(as.character(w$variable), 3, 6)
w$year <- factor(w$year, levels = c(2005, 2009, 2006, 2008, 2007, 2010),
  ordered = TRUE)

# x variable in ggplot() must be continuous for geom_line() to work
w$wk <- as.numeric(w$week)
g <- ggplot(w, aes(x = wk, y = value/1000, colour = year))

# Plot with week numbers on x-axis, rotated and resized for presentation
g + geom_line() +
  scale_x_continuous(breaks = 1:52, labels = levels(w$week)) +
  opts(axis.text.x = theme_text(angle = 90, vjust = 0.5, hjust = 1, size = 7))

HTH,
Dennis

On Mon, Aug 30, 2010 at 2:25 PM, Felipe Carrillo
wrote:

> Please consider the following dataset:
> I want to reorder the levels by year but get the following error:
>
> Error in tapply(v, x, FUN, ...) : arguments must have same length
>
> I suspect that I need to add the levels before I melt the dataset,
>  but either way I have only used 'reorder' once before and can't figure
>  out how it works. Thanks for any advice.
>
> winter <- (list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
> 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
> 47L, 48L, 49L, 50L, 51L, 52L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
> 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
> 22L, 23L, 24L, 25L, 26L), BY2010 = c(0L, 0L, 460L, 1126L, 1755L,
> 11153L, 27998L, 56336L, 12486L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L), BY2009 = c(0L, 95L, 341L, 373L, 2859L, 5769L,
> 33263L, 98435L, 345339L, 622621L, 349958L, 531232L, 652803L,
> 345358L, 142991L, 148883L, 957501L, 32495L, 14862L, 7210L, 9765L,
> 6846L, 5067L, 6201L, 3045L, 106230L, 1183L, 195L, 6855L, 10261L,
> 4179L, 650L, 240L, 165L, 1189L, 863L, 562L, 1350L, 188L, 479L,
> 770L, 0L, 558L, 314L, 147L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
> BY2008 = c(0L, 0L, 77L, 124L, 159L, 2376L, 9480L, 17314L,
> 99574L, 323679L, 198211L, 93630L, 129183L, 111820L, 71260L,
> 35241L, 14020L, 20778L, 21694L, 15016L, 13400L, 9187L, 3607L,
> 2804L, 2417L, 5291L, 16216L, 898L, 558L, 709L, 972L, 61L,
> 372L, 3086L, 10108L, 4295L, 882L, 2593L, 36L, 233L, 243L,
> 0L, 70L, 272L, 308L, 134L, 40L, 0L, 0L, 0L, 0L, 0L, 0L),
> BY2007 = c(0L, 0L, 0L, 10775L, 4166L, 4958L, 16221L, 29401L,
> 34951L, 33188L, 146044L, 105007L, 185297L, 159682L, 207537L,
> 140694L, 128275L, 44274L, 27079L, 18928L, 10437L, 29984L,
> 18395L, 25846L, 4573L, 31995L, 3679L, 1043L, 9636L, 1524L,
> 827L, 7009L, 233L, 433L, 0L, 1103L, 257L, 128L, 66L, 70L,
> 535L, 429L, 97L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
> BY2006 = c(0L, 707L, 2390L, 8860L, 24430L, 40885L, 72792L,
> 205521L, 344493L, 662973L, 526409L, 631953L, 850491L, 842678L,
> 445987L, 558152L, 332032L, 174326L, 80601L, 48696L, 98571L,
> 103563L, 149469L, 78081L, 182478L, 2158L, 16566L, 4027L,
> 2655L, 1112L, 567L, 2595L, 4976L, 6336L, 294L, 1758L, 291L,
> 203L, 450L, 1098L, 788L, 195L, 532L, 0L, 0L, 0L, 0L, 0L,
> 0L, 167L, 0L, 0L, 0L), BY2005 = c(0L, 0L, 868L, 2044L, 4064L,
> 6049L, 9399L, 13304L, 45172L, 242155L, 476864L, 712534L,
> 1058409L, 2115018L, 1510342L, 1138213L, 333192L, 158820L,
> 94379L, 348882L, 39290L, 29701L, 47258L, 69837L, 7884L, 49338L,
> 22168L, 19397L, 19397L, 15984L, 8688L, 1200L, 1623L, 1291L,
> 1356L, 707L, 875L, 875L, 222L, 883L, 0L, 0L, 129L, 0L, 0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("week", "BY2010",
> "BY2009", "BY2008", "BY2007", "BY2006", "BY2005"), class = "data.frame",
> row.names = c(NA,
> -53L))
>  str(winter)
>  w_melt <- melt(winter,id="week",variable="year");str(w_melt)
>  #  Reorder DOESN'T WORK
>  w_melt <-
>
> reorder(w_melt$year,c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))
> pdf("wtest.pdf")
> ggplot(w_melt,aes(week,value/1000,colour=year,order= - as.numeric(year))) +
> geom_line(size=.75)+  theme_bw() +
> opts(title="Cumulative",axis.text.x = theme_text(angle=45,hjust=1)) +
>  labs(y="Number of individuals X 1,000",x="week")
> dev.off()
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA
>
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Re: [R] reordering levels error

2010-08-30 Thread Felipe Carrillo

 Thanks Dennis:
I always seem to have a hard time defining levels. That's exactly what I 
needed. 

 

>
>From: Dennis Murphy 
>To: Felipe Carrillo 
>Cc: r-h...@stat.math.ethz.ch
>Sent: Mon, August 30, 2010 5:41:02 PM
>Subject: Re: [R] reordering levels error
>
>Hi:
>
>I don't know if this is exactly what you wanted, but here goes. I made
>a few adjustments in the data frame before calling ggplot():
>
># library(ggplot2)
>
># Reorient the order in which variables appear
>winter <- winter[, c(1, 7, 3, 6, 4, 5, 2)]
>
># Get rid of second week 26 at the end
>winter2 <- winter[-53, ]
>
># Create an ordered factor for labeling the weeks
>wkord <- c(26:52, 1:25)
>winter2$week <- ordered(winter2$week, levels = wkord)
>
># melt the data frame
>w <- melt(winter2, id = 'week')
>
># create ordered factor for year
>w$year <- substring(as.character(w$variable), 3, 6)
>w$year <- factor(w$year, levels = c(2005, 2009, 2006, 2008, 2007, 2010),
>  ordered = TRUE)
>
># x variable in ggplot() must be continuous for geom_line() to work
>w$wk <- as.numeric(w$week)
>g <- ggplot(w, aes(x = wk, y = value/1000, colour = year))
>
># Plot with week numbers on x-axis, rotated and resized for presentation
>g + geom_line() +
>  scale_x_continuous(breaks = 1:52, labels = levels(w$week)) +
>  opts(axis.text.x = theme_text(angle = 90, vjust = 0.5, hjust = 1, size = 7))
>
>HTH,
>Dennis
>
>
>On Mon, Aug 30, 2010 at 2:25 PM, Felipe Carrillo  
>wrote:
>
>Please consider the following dataset:
>>I want to reorder the levels by year but get the following error:
>>
>>Error in tapply(v, x, FUN, ...) : arguments must have same length
>> 
>>I suspect that I need to add the levels before I melt the dataset
>> but either way I have only used 'reorder' once before and can't figure
>> out how it works... Thanks for any advice.
>> 
>>winter <- (list(week = c(26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
>>34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
>>47L, 48L, 49L, 50L, 51L, 52L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
>>9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L,
>>22L, 23L, 24L, 25L, 26L), BY2010 = c(0L, 0L, 460L, 1126L, 1755L,
>>11153L, 27998L, 56336L, 12486L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
>>0L, 0L, 0L, 0L), BY2009 = c(0L, 95L, 341L, 373L, 2859L, 5769L,
>>33263L, 98435L, 345339L, 622621L, 349958L, 531232L, 652803L,
>>345358L, 142991L, 148883L, 957501L, 32495L, 14862L, 7210L, 9765L,
>>6846L, 5067L, 6201L, 3045L, 106230L, 1183L, 195L, 6855L, 10261L,
>>4179L, 650L, 240L, 165L, 1189L, 863L, 562L, 1350L, 188L, 479L,
>>770L, 0L, 558L, 314L, 147L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
>>    BY2008 = c(0L, 0L, 77L, 124L, 159L, 2376L, 9480L, 17314L,
>>    99574L, 323679L, 198211L, 93630L, 129183L, 111820L, 71260L,
>>    35241L, 14020L, 20778L, 21694L, 15016L, 13400L, 9187L, 3607L,
>>    2804L, 2417L, 5291L, 16216L, 898L, 558L, 709L, 972L, 61L,
>>    372L, 3086L, 10108L, 4295L, 882L, 2593L, 36L, 233L, 243L,
>>    0L, 70L, 272L, 308L, 134L, 40L, 0L, 0L, 0L, 0L, 0L, 0L),
>>    BY2007 = c(0L, 0L, 0L, 10775L, 4166L, 4958L, 16221L, 29401L,
>>    34951L, 33188L, 146044L, 105007L, 185297L, 159682L, 207537L,
>>    140694L, 128275L, 44274L, 27079L, 18928L, 10437L, 29984L,
>>    18395L, 25846L, 4573L, 31995L, 3679L, 1043L, 9636L, 1524L,
>>    827L, 7009L, 233L, 433L, 0L, 1103L, 257L, 128L, 66L, 70L,
>>    535L, 429L, 97L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
>>    BY2006 = c(0L, 707L, 2390L, 8860L, 24430L, 40885L, 72792L,
>>    205521L, 344493L, 662973L, 526409L, 631953L, 850491L, 842678L,
>>    445987L, 558152L, 332032L, 174326L, 80601L, 48696L, 98571L,
>>    103563L, 149469L, 78081L, 182478L, 2158L, 16566L, 4027L,
>>    2655L, 1112L, 567L, 2595L, 4976L, 6336L, 294L, 1758L, 291L,
>>    203L, 450L, 1098L, 788L, 195L, 532L, 0L, 0L, 0L, 0L, 0L,
>>    0L, 167L, 0L, 0L, 0L), BY2005 = c(0L, 0L, 868L, 2044L, 4064L,
>>    6049L, 9399L, 13304L, 45172L, 242155L, 476864L, 712534L,
>>    1058409L, 2115018L, 1510342L, 1138213L, 333192L, 158820L,
>>    94379L, 348882L, 39290L, 29701L, 47258L, 69837L, 7884L, 49338L,
>>    22168L, 19397L, 19397L, 15984L, 8688L, 1200L, 1623L, 1291L,
>>    1356L, 707L, 875L, 875L, 222L, 883L, 0L, 0L, 129L, 0L, 0L,
>>    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("week", "BY2010",
>>"BY2009", "BY2008", "BY2007", "BY2006", "BY2005"), class = "data.frame",
>>row.names = c(NA,
>>-53L))
>> str(winter)
>> w_melt <- melt(winter,id="week",variable="year");str(w_melt)
>> #  Reorder DOESN'T WORK
>> w_melt <-
>>reorder(w_melt$year,c("BY2005","BY2009","BY2006","BY2008","BY2007","BY2010"))
>>pdf("wtest.pdf")
>>ggplot(w_melt,aes(week,value/1000,colour=year,order= - as.numeric(year))) +
>>geom_line(size=.75)+  theme_bw() +
>>opts(title="Cumulative",axis.text.x = theme_text(angle=45,hjust=1)) +
>> labs(y="Number of individuals X 1,000",x="week")
>>dev.off()
>> 
>>Felipe D. Carrillo
>>Supervisory Fishery Biologist
>>Department of the Interior
>>US Fish & Wildlife Service
>>California, USA
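[Editorial note: the core of Dennis's fix above — turning `week` into an ordered factor whose levels run 26–52 and then 1–25, so the positions can serve as a continuous x for `geom_line()` — can be sketched in isolation. This is a standalone illustration with toy data, not Felipe's actual frame:]

```r
# Reorder week levels so a "water year" (weeks 26..52 then 1..25)
# plots in chronological order rather than numeric order.
wk <- c(26:52, 1:25)                      # desired chronological order
f  <- ordered(c(26:52, 1:25), levels = wk)
levels(f)[1:3]                            # "26" "27" "28"
as.numeric(f)[1:3]                        # 1 2 3 -- continuous positions,
                                          # usable as x with labels = levels(f)
```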

Re: [R] [Rd] S4 Method Rd Warning

2010-08-30 Thread Dario Strbenac
Thanks for this. I was thinking the spaces rule only applied within the 
\alias{} statements. I'm not sure why I thought this.

 Original message 
>Date: Mon, 30 Aug 2010 18:43:35 +0100 (BST)
>From: Prof Brian Ripley   
>Subject: Re: [Rd] S4 Method Rd Warning  
>To: Duncan Murdoch 
>Cc: d.strbe...@garvan.org.au
>
>On Mon, 30 Aug 2010, Duncan Murdoch wrote:
>
>> On 30/08/2010 1:00 AM, Dario Strbenac wrote:
>>> Hello,
>>> 
>>> I am using R 2.11.0. I have a curious problem where I get a warning in R 
>>> CMD check which is seemingly not relevant to my Rd file.
>>> 
>>
>> 2.11.0 isn't the current release, and there have been fixes to this stuff 
>> since 2.11.1 was released.  Could you try 2.11.1-patched or the devel 
>> version 
>> of 2.12.0 and see if you still get the warnings?
>
>But there are incorrect spaces at e.g. {GRanges, BSgenome}.  Classes 
>in signatures are comma-separated, not comma+space separated.
>
>If we've changed it, we are now accepting erroneous files AFAICS.
>
>>
>> Duncan Murdoch
>>> The warning says :
>>> 
>>> * checking Rd \usage sections ... WARNING
>>> Bad \usage lines found in documentation object 'enrichmentCalc':
>>> S4method{enrichmentCalc}{GenomeDataList, BSgenome}(rs, 
>>> organism, seqLen=NULL, ...)
>>> S4method{enrichmentCalc}{GenomeData, BSgenome}(rs, 
>>> organism, seqLen=NULL, do.warn=FALSE)
>>> S4method{enrichmentCalc}{GRanges, BSgenome}(rs, 
>>> organism, seqLen=NULL)
>>> 
>>> Functions with \usage entries need to have the appropriate \alias entries,
>>> and all their arguments documented.
>>> The \usage entries must correspond to syntactically valid R code.
>>> See the chapter 'Writing R documentation files' in manual 'Writing R
>>> Extensions'.
>>> 
>>> But, I have documented all the arguments, and I have all the aliases. What 
>>> else could it be warning me about ?
>>> 
>>> The Rd file contents are :
>>> 
>>> \name{enrichmentCalc}
>>> \alias{enrichmentCalc}
>>> \alias{enrichmentCalc,GenomeDataList,BSgenome-method}
>>> \alias{enrichmentCalc,GenomeData,BSgenome-method}
>>> \alias{enrichmentCalc,GRanges,BSgenome-method}
>>> 
>>> \title{Calculate sequencing enrichment}
>>> \description{A description}
>>> \usage{
>>>   \S4method{enrichmentCalc}{GenomeDataList, BSgenome}(rs, organism, 
>>> seqLen=NULL, ...)
>>>   \S4method{enrichmentCalc}{GenomeData, BSgenome}(rs, organism, 
>>> seqLen=NULL, do.warn=FALSE)
>>>   \S4method{enrichmentCalc}{GRanges, BSgenome}(rs, organism, seqLen=NULL)
>>> }
>>> \arguments{
>>>   \item{rs}{jjj}
>>>   \item{organism}{ghi}
>>>   \item{seqLen}{def}
>>>   \item{do.warn}{abc}
>>>   \item{...}{xyz}
>>> }
>>> \details{
>>>   Details.
>>> }
>>> \value{
>>>  Text here.
>>> }
>>> \author{A Person}
>>> \examples{
>>>  #See the manual
>>> }
>>> 
>>> Thanks,
>>>Dario.
>>> 
>>> --
>>> Dario Strbenac
>>> Research Assistant
>>> Cancer Epigenetics
>>> Garvan Institute of Medical Research
>>> Darlinghurst NSW 2010
>>> Australia
>>> 
>>> __
>>> r-de...@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>>
>> __
>> r-de...@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>-- 
>Brian D. Ripley,  rip...@stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford, Tel:  +44 1865 272861 (self)
>1 South Parks Road, +44 1865 272866 (PA)
>Oxford OX1 3TG, UKFax:  +44 1865 272595


--
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
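[Editorial note: to make Prof Ripley's point concrete, the warning goes away once the space after each comma inside the \S4method signatures is removed, matching the \alias entries exactly:]

```
\usage{
  \S4method{enrichmentCalc}{GenomeDataList,BSgenome}(rs, organism, seqLen=NULL, ...)
  \S4method{enrichmentCalc}{GenomeData,BSgenome}(rs, organism, seqLen=NULL, do.warn=FALSE)
  \S4method{enrichmentCalc}{GRanges,BSgenome}(rs, organism, seqLen=NULL)
}
```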

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] polr

2010-08-30 Thread Simo Vundla
Dear All,

I am trying to use the polr() function to analyse ordinal categorical data 
responses. Instead of using polr() directly, I have altered the script slightly 
(imposed a constraint to make all the parameters greater than or equal to zero) 
(see below), 


fit <- list(coefficients = exp(beta), zeta = zeta, deviance = 
deviance,)

However my main problem is that when I use this "new script" I am unable to get 
the t-values and p-values (though able to get the parameter values and the 
deviance). I have downloaded and scrutinised the polr() script in an attempt to 
understand the section where these t- and p- values are estimated but without 
much success.

Where in polr() are these p- and t-values estimated? 

Thanks

Simo
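[Editorial note: for stock MASS::polr (not Simo's modified script), `summary.polr` forms the t-values as Value / Std. Error using the Hessian-based covariance matrix; p-values are not printed but follow from a normal approximation. A minimal sketch with the MASS `housing` example data:]

```r
library(MASS)  # polr() lives in MASS
fit  <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
             data = housing, Hess = TRUE)  # Hess=TRUE so SEs are available
ctab <- coef(summary(fit))                 # columns: Value, Std. Error, t value
p    <- 2 * pnorm(-abs(ctab[, "t value"])) # two-sided normal approximation
cbind(ctab, "p value" = p)
```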



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error: cannot allocate vector of size 198.4 Mb

2010-08-30 Thread 나여나

Hi, All

   I have a problem of R memory space.

   I am getting "Error: cannot allocate vector of size 198.4 Mb"


   --
   I've tried with:
   > memory.limit(size=2047);
   [1] 2047
   > memory.size(max=TRUE);
   [1] 12.75

   > library('RODBC');
   > Channel<-odbcConnectAccess('c:/test.MDB');  # inputdata:15 cols, 200
   rows, 173 MB
   > x<-sqlFetch(Channel,'data1');
   > odbcCloseAll();

   > gc(TRUE)
   Garbage collection 69 = 2+2+65 (level 2) ...
   3.6 Mbytes of cons cells used (38%)
   183.8 Mbytes of vectors used (36%)
  used  (Mb) gc trigger  (Mb) max used  (Mb)
   Ncells   134456   3.6 35   9.4   35   9.4
   Vcells 24082306 183.8   67102875 512.0 78179534 596.5
   > memory.size()
   [1] 192.16

   > NROW(x)
   [1] 200

   > x

  YEAR MONTH  ACT_AMT  T   M_WEIGHT ACT_AMTSUMGROWTH COMPCNTCOMPCV
   MODLCNTMODLCV FLAG_1 FLAG_2 PRICE HIGHEND_AMT
   1  2002 1   511997  1 0.01563573   32745313 1.000   2
   132.41674  13 170.54307  1  0  11906.91   0.287
   2  2002 2  2254900  2 0.06886176   32745313 1.000   2
   113.06057  17 176.79751  0  0  35232.81   0.4922613

   


   > hbnreg<-function(data,option)
   + {
   + for(i in 1:NROW(option)) {
   + nam<-paste(substring(option[i],1,nchar(option[i])-2))
   + if(substring(option[i],nchar(option[i]))=='T') assign(nam,TRUE)
   + else assign(nam,FALSE)
   + }
   +
   x<-lm("ACT_AMT~T+M_WEIGHT+ACT_AMTSUM+GROWTH+COMPCNT+COMPCV+MODLCNT+MODLCV+FL
   AG_1+FLAG_2+PRICE+HIGHEND_AMT",data=data)
   + y=list()
   + if(summary==TRUE){
   + z<-summary(x)
   + y$coefficients<-z$coefficients
   + y$residuals<-z$residuals
   + }
   + #if(influence==TRUE){
   + #z<-influence(x)
   + #y$hat<-z$hat
   + }
   +
   + y
   + }

   > y<-hbnreg(x,c('summary=T','influence=T'));
   Error:cannot allocate vector of size 198.4 Mb  < error

   -
   my work enviroment :

   > version   _
   platform   i386-pc-mingw32
   arch   i386
   os mingw32
   system i386, mingw32
   status
   major  2
   minor  11.1
   year   2010
   month  05
   day31
   svn rev52157
   language   R
   version.string R version 2.11.1 (2010-05-31)

   - PC OS : 32bit WinXP pro sp3
   - PC RAM : 1 GB
   - Virtual memory : 1524 MB

   --

   Could it be a hardware problem?


   Thanks and best regards.


   Young-Ju, Park
   from Korea

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Band-wise Conditional Sum - Actual problem

2010-08-30 Thread Vincy Pyne
Dear David and Dennis Sir,

Thanks a lot for your guidance. 

As guided by Mr Dennis Murphy Sir in his reply 

Replace table in the tapply call with sum. While you're at it, typing 
?tapply to find out what the function does wouldn't hurt...

I had  really tried earlier to understand the apply, tapply, mapply and sapply 
commands before writing back to the R forum. But I was not able to figure out 
where was the problem. But Mr Dennis Sir really inspired me and when I 
revisited 'tapply', I realized that instead of using 'ead' for getting sum, I 
was using 'ead.cat', and that solved my problem. 

Then I had a new problem of 'How to get rid of NA's' , Again instead of  
posting to the group, I had accessed the earlier R mails and in the end got the 
solution. I sincerely thank both of you for taking so much efforts and guiding 
me.

I will certainly take efforts to understand 'R' at the earliest.

Regards

Vincy
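[Editorial note: the band-wise sum Vincy describes can also be written without tapply at all — `xtabs` sums a numeric response over factor cross-classifications. A toy illustration (hypothetical mini data, not the real portfolio):]

```r
df <- data.frame(rating = c("A", "A", "AA", "AA", "BBB"),
                 ead    = c(100, 250, 900, 50, 400))
df$ead.cat <- cut(df$ead, breaks = c(0, 200, 500, 1000))  # exposure bands
xtabs(ead ~ rating + ead.cat, data = df)  # band-wise sums of ead per rating
# equivalent: with(df, tapply(ead, list(rating, ead.cat), sum))
```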




--- On Mon, 8/30/10, David Winsemius  wrote:

From: David Winsemius 
Subject: Re: [R] Band-wise Conditional Sum - Actual problem

Cc: r-help@r-project.org
Received: Monday, August 30, 2010, 2:43 PM


On Aug 30, 2010, at 4:05 AM, Vincy Pyne wrote:

> Dear R helpers,
> 
> Thanks a lot for your earlier guidance esp. Mr Davind Winsemius Sir. However, 
> there seems to be mis-communication from my end
 corresponding to my requirement. As I had mentioned in my earlier mail, I am 
dealing with a very large database of borrowers and I had given a part of it in 
my earlier mail as given below. For a given rating say "A", I needed to have 
the bad-wise sums of ead's (where bands are constructed using the ead size 
itself.) and not the number of borrowers falling in a particular band.
> 
> I am reproducing the data and solution as provided by Winsemius Sir (which 
> generates the number of band-wise borrowers for a given rating.
> 
> rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", 
> "A", "A", "AA","BB","BBB","AA", "A", "AAA","BBB","BBB", "BB", "A", "BB", "A", 
> "AA", "B","A", "AA", "BBB", "A", "BBB")
> 
> ead <- c(169229.93, 100, 5877794.25, 9530148.63, 75040962.06, 21000, 1028360, 
> 600, 17715000, 14430325.24, 1180946.57, 15, 167490, 81255.16, 
> 54812.5, 3000, 1275702.94, 9100, 1763142.3, 3283048.61, 120, 11800, 3000, 
> 96894.02, 453671.72, 7590, 106065.24, 940711.67, 2443000, 950, 39000, 1501939.67)
> 
> df <- data.frame(rating = rating, ead = ead)
> df$ead.cat <- cut(df$ead, breaks = c(0, 1e5, 5e5, 1e6, 2e6, 5e6, 1e7, 1e8))
> 
> df
> 
> df_sorted <- df[order(df$rating),]      # the output is as given below.
> 
> > df_sorted
>    rating          ead       ead.cat
> 1       A    169229.93 (1e+05,5e+05]
> 3       A   5877794.25 (5e+06,1e+07]
> 6       A     21000.00     (0,1e+05]
> 12      A        15.00 (1e+05,5e+05]
> 13      A    167490.00 (1e+05,5e+05]
> 18      A      9100.00     (0,1e+05]
> 23      A      3000.00     (0,1e+05]
> 25      A    453671.72 (1e+05,5e+05]
> 28      A    940711.67 (5e+05,1e+06]
> 31      A     39000.00     (0,1e+05]
> 5      AA  75040962.06 (1e+07,1e+08]
> 9      AA  17715000.00 (1e+07,1e+08]
> 10     AA  14430325.24 (1e+07,1e+08]
> 11     AA   1180946.57 (1e+06,2e+06]
> 14     AA     81255.16     (0,1e+05]
> 17     AA   1275702.94 (1e+06,2e+06]
> 26     AA      7590.00     (0,1e+05]
> 29     AA   2443000.00 (2e+06,5e+06]
> 2     AAA       100.00     (0,1e+05]
> 19    AAA   1763142.30 (1e+06,2e+06]
> 27      B    106065.24 (1e+05,5e+05]
> 7      BB   1028360.00 (1e+06,2e+06]
> 15     BB     54812.50     (0,1e+05]
> 22     BB     11800.00     (0,1e+05]
> 24     BB     96894.02     (0,1e+05]
> 4     BBB   9530148.63 (5e+06,1e+07]
> 8     BBB       600.00 (5e+06,1e+07]
> 16    BBB      3000.00     (0,1e+05]
> 20    BBB   3283048.61 (2e+06,5e+06]
> 21    BBB       120.00 (1e+06,2e+06]
> 30    BBB       950.00 (5e+06,1e+07]
> 32    BBB   1501939.67 (1e+06,2e+06]
> 
> 
> ## The following command fetches rating-wise and ead-size-wise numbers of 
> borrowers. Thus, for rating A, there are 4 borrowers in the ead range 
> (0, 1e+05], 4 borrowers in the range (1e+05, 5e+05], and so on.
> 
> > with(df, tapply(ead.cat, rating, table))
> $A
> 
>     (0,1e+05] (1e+05,5e+05] (5e+05,1e+06] (1e+06,2e+06] (2e+06,5e+06] (5e+06,1e+07] (1e+07,1e+08]
>             4             4             1             0             0             1             0
> 
> $

[R] Q about package Icens: change the color of the shading in plot function

2010-08-30 Thread Mendolia, Franco
Hello!

I want to use the Icens package for analyzing interval-censored data. This code 
from the manual gives me what I want.

library(Icens)
data(cosmesis)
csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R))
e1 <- VEM(csub1)
plot(e1)

However, I would like to change the color of the shading from green to 
something less green, say gray. Any ideas how I could do that? I looked at par, 
but I wasn't able to find what I need.

Thanks,
Franco
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: cannot allocate vector of size 198.4 Mb

2010-08-30 Thread Steve Lianoglou
Hi,

On Mon, Aug 30, 2010 at 9:17 PM, 나여나  wrote:
>
>    Hi, All
>
>   I have a problem of R memory space.
>
>   I am getting "Error: cannot allocate vector of size 198.4 Mb"

It's a RAM thing: you don't have enough.

The OS said "nice try" when R asked for that last 198.4 MB chunk of RAM.

This question comes up quite often. Read through some of these if you're
still confused:
http://search.gmane.org/search.php?group=gmane.comp.lang.r.general&query=cannot+allocate+vector+of+size

-steve
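[Editorial note: a rough sanity check on the numbers (hypothetical sizes, same idea): a matrix of doubles costs 8 bytes per cell, and `lm()` materialises several copies of the data — the model frame, the QR decomposition, fitted values, and residuals — so peak demand is a small multiple of the raw object size:]

```r
m <- matrix(rnorm(2.5e6), ncol = 10)   # 2.5 million doubles ~ 20 MB
print(object.size(m), units = "Mb")    # about 19.1 Mb
# lm() keeps the model frame, qr(), effects, fitted values and residuals,
# so transient use is several times object.size() of the input -- easy to
# exhaust what a 32-bit process with a ~2 GB address space can hand out.
```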




-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Q about package Icens: change the color of the shading in plot function

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 10:58 PM, Mendolia, Franco wrote:


Hello!

I want to use the Icens package for analyzing interval-censored  
data. This code from the manual gives me what I want.


library(Icens)
data(cosmesis)
csub1 <- subset(cosmesis, subset=Trt==0, select=c(L,R))
e1 <- VEM(csub1)
plot(e1)

However, I would like to change the color of the shading from green  
to something less green, say gray. Any ideas how I could do that? I  
looked at par, but I wasn't able to find what I need.


methods(plot)

getAnywhere(plot.isurv)

(The polygon calls are hard coded and do not accept col= arguments.)

--
David.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Speeding up prediction of survival estimates when using `survfit'

2010-08-30 Thread Ravi Varadhan
Hi,

I fit a Cox PH model to estimate the cause-specific hazards (in a competing 
risks setting).  Then , I compute the survival estimates for all the 
individuals in my data set using the `survfit' function.  I am currently 
playing with a data set that has about 6000 observations and 12 covariates.  I 
am finding that the survfit function is very slow.  

Here is a simple simulation example (modified from Frank Harrell's example for 
`cph') that illustrates the problem:

library(survival)
n <- 500
set.seed(4321) 

age <- 50 + 12*rnorm(n) 

sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) 

cens <- 5 * runif(n) 

h <- 0.02 * exp(0.04 * (age-50) + 0.8 * (sex=='Female')) 

dt <- -log(runif(n))/h 

e <- ifelse(dt <= cens, 1, 0) 

dt <- pmin(dt, cens) 

Srv <- Surv(dt, e)

 f <- coxph(Srv ~ age + sex, x=TRUE, y=TRUE) 

system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))


When I run the above code with sample sizes, n, taking on values of 500, 1000, 
2000, and 4000, the time it takes for survfit to run are as follows:

# n <- 500
> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
   user  system elapsed 
   0.16    0.00    0.15 


# n <- 1000
> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
   user  system elapsed 
   1.45    0.00    1.48 


# n <- 2000
> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
   user  system elapsed 
  10.19    0.00   10.25 


# n <- 4000
> system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
   user  system elapsed 
  72.87    0.05   74.87 


I eventually want to use `survfit' on a data set with roughly 50K observations, 
which I am afraid is going to be painfully slow.  I would much appreciate 
hints/suggestions on how to make `survfit' faster or any other faster 
alternatives.  
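[Editorial note: one workaround sketch — hedged, since it uses the Breslow-type baseline and so differs slightly from survfit's handling of ties: compute the baseline cumulative hazard once with `basehaz`, then scale it by each subject's risk score instead of calling survfit on all of newdata. Toy fit standing in for the coxph model above:]

```r
library(survival)
set.seed(1)
d  <- data.frame(time = rexp(200), status = rbinom(200, 1, 0.7),
                 age = rnorm(200), sex = rbinom(200, 1, 0.4))
f  <- coxph(Surv(time, status) ~ age + sex, data = d, x = TRUE)
H0 <- basehaz(f, centered = FALSE)     # baseline cumulative hazard H0(t)
lp <- as.vector(f$x %*% coef(f))       # each subject's linear predictor
S  <- exp(-outer(H0$hazard, exp(lp)))  # S(t|x) = exp(-H0(t) * exp(lp)):
dim(S)                                 # one row per event time, one column per subject
```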

Thanks.

Best,
Ravi.


Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Q about package Icens: change the color of the shading in plot function

2010-08-30 Thread David Winsemius


On Aug 30, 2010, at 11:25 PM, David Winsemius wrote:



On Aug 30, 2010, at 10:58 PM, Mendolia, Franco wrote:


methods(plot)

getAnywhere(plot.isurv)

(The polygon calls are hard coded and do not accept col= arguments.)


Hack it,...add a col argument:

plotisurv <- function (x, type = "eq", surv = FALSE, bounds = FALSE,  
shade = 3,

 density = 30, angle = 45, lty = 1, new = TRUE, xlab = "Time",
 ylab = "Probability", main = "GMLE", ltybnds = 2,  
col="green", ...)


And then in the body  change:

   border = FALSE, col = "green")

to:

   border = FALSE, col = col)



--
David.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Making plots in big scatterplot matrix large enough to see

2010-08-30 Thread Jocelyn Paine
I've got a data frame with 23 columns, and wanted to plot a scatterplot 
matrix of it. I called

  pairs( df )
where 'df' is my data frame. This did generate the matrix, but the 
plotting window did not expand to make the individual plots large enough 
to see. Each one was only about 10 pixels high and wide.


I tried sending the plot to a file, with a high and wide image, 
by doing

  png( "plot.png", width = 4000, height = 4000 )
but I got these errors:
  Error in png( "plot.png", width = 4000, height = 4000 ) :
  unable to start device
  In addition: Warning messages:
  1: In png( "plot.png", width = 4000, height = 4000 ) :
 Unable to allocate bitmap
  2: In png( "plot.png", width = 4000, height = 4000 ) :
 opening device failed

The messages aren't helpful, because they don't tell you _why_ R can't 
start the device, allocate it, or open it. The documentation for png says:

  Windows imposes limits on the size of bitmaps: these are not documented
  in the SDK and may depend on the version of Windows. It seems that width
  and height are each limited to 2^15-1.
However, 2^15-1 is 32767, so that isn't the problem here. I tried various 
values for height and width. 2400 was OK, but 2500 wasn't. So it seems R 
can't produce plots that are more than about 2400 pixels square. This is 
with R 2.10.1.


Why is png failing on big images? Also, what's the recommended way to make 
a file containing a scatterplot matrix when you have lots of variables? 
'pairs' is a very useful function, but obviously one does need to be 
careful when doing this, and I don't know what experts would recommend. Do 
you loop round the variables plotting each pair to a different file? I was 
hoping that I could put them all into one very big image and view parts of 
it at a time.
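[Editorial note: one route around the Windows bitmap limit, sketched here rather than taken from the thread: vector devices such as pdf() have no pixel dimensions, so a large pairs() matrix stays zoomable. Stand-in data for the 23-column frame:]

```r
df <- as.data.frame(matrix(rnorm(100 * 23), ncol = 23))  # stand-in for the real frame
pdf("pairs.pdf", width = 40, height = 40)  # size in inches; no bitmap cap
pairs(df, pch = ".", gap = 0.2)            # tiny glyphs keep small panels legible
dev.off()
```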


Thanks,

Jocelyn Paine
http://www.j-paine.org
http://www.spreadsheet-parts.org
+44 (0)7768 534 091

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Q about package Icens: change the color of the shading in plot function

2010-08-30 Thread Mendolia, Franco
___
From: David Winsemius [dwinsem...@comcast.net]
Sent: Monday, August 30, 2010 10:41 PM
To: David Winsemius
Cc: Mendolia, Franco; r-help@r-project.org
Subject: Re: [R] Q about package Icens: change the color of the shading in plot 
function


Thanks, wasn't aware of the getAnywhere function.

Franco
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   >