Re: [R] compare histograms

2010-10-15 Thread Rainer M Krug
On Fri, Oct 15, 2010 at 2:47 AM, Michael Bedward
wrote:

> Hi Rainer,
>
> Great - many thanks for that.  Yes, I'm using OSX
>
> I initially tried to use install.packages to get a pre-built
> binary of earthmovdist from Rforge, but it failed with...
>
> In getDependencies(pkgs, dependencies, available, lib) :
>  package ‘earthmovdist’ is not available
>

Yes - we had some problems with getting the package built for OSX, but we
(more specifically Dirk) are working on that.



> When I tried installing with type="source" this was also failing.
>
> However, after reading your post I looked at the error messages
> properly and it turned out to be a trivial problem. The .First
> function defined in my .Rprofile was printing some text to the console
> with cat() which was being incorrectly picked up by the package build
> as if it was a makefile argument. When I commented out the call to cat
> the package installed successfully. I haven't had this problem
> installing other packages from source so I think there must be a
> little problem with your setup (?)
>

Thanks for letting us know - could you send us the offending entry in your
.Rprofile (or the whole .Rprofile?), so that we can see if it is an OSX or
general problem?


>
> Now that it's installed I look forward to trying it out shortly.
>

Great - please give us some feedback on what you think about it.

Cheers,

Rainer


>
> Thanks again.
>
> Michael
>
>
>
>
> On 15 October 2010 03:17, Rainer M Krug  wrote:
> >
> >
> > On Thu, Oct 14, 2010 at 3:15 AM, Michael Bedward <
> michael.bedw...@gmail.com>
> > wrote:
> >>
> >> Hi Juan,
> >>
> >> Yes, you can use EMD to quantify the difference between any pair of
> >> histograms regardless of their shape. The only constraint, at least
> >> the way that I've done it previously, is to have compatible bins. The
> >> original application of EMD was to compare images based on colour
> >> histograms which could have all sorts of shapes.
> >>
> >> I looked at the package that Dennis alerted me to on RForge but
> >> unfortunately it seems to be inactive
> >
> > No - well, it depends how you define inactive: the functionality we wanted
> > to include is included, therefore no further development was necessary.
> >
> >>
> >> and the nightly builds are broken. I've downloaded the source code and
> >> will have a look at it
> >> sometime in the next few days.
> >
> > Thanks for alerting us - we will look into that. But just don't use the
> > nightly builds, as they are no different from the last release. Just
> > download the package for your system (I assume Windows or Mac, as I just
> > installed from source without problems under Linux).
> > Let me know if it doesn't work,
> > Cheers,
> > Rainer
> >
> >>
> >> Meanwhile, let me know if you want a copy of my own code. It uses the
> >> lpSolve package.
> >>
> >> Michael
> >>
> >> On 14 October 2010 08:46, Juan Pablo Fededa  wrote:
> >> > Hi Michael,
> >> >
> >> >
> >> > I have the same challenge, can you use this earth movers distance to
> >> > compare bimodal distributions?
> >> > Thanks & cheers,
> >> >
> >> >
> >> > Juan
> >> >
> >> >
> >> > On Wed, Oct 13, 2010 at 4:39 AM, Michael Bedward
> >> > 
> >> > wrote:
> >> >>
> >> >> Just to add to Greg's comments: I've previously used 'Earth Movers
> >> >> Distance' to compare histograms. Note, this is a distance metric
> >> >> rather than a parametric statistic (i.e. not a test) but it at least
> >> >> provides a consistent way of quantifying similarity.
> >> >>
> >> >> It's relatively easy to implement the metric in R (formulating it as a
> >> >> linear programming problem; see the sketch at the end of this message).
> >> >> Happy to dig out the code if needed.
> >> >>
> >> >> Michael
> >> >>
> >> >> On 13 October 2010 02:44, Greg Snow  wrote:
> >> >> > That depends a lot on what you mean by the histograms being
> >> >> > equivalent.
> >> >> >
> >> >> > You could just plot them and compare visually.  It may be easier to
> >> >> > compare them if you plot density estimates rather than histograms.
> >> >> > Even better would be to do a qqplot comparing the 2 sets of data
> >> >> > rather than the histograms.
> >> >> >
> >> >> > If you want a formal test then the ks.test function can compare 2
> >> >> > datasets.  Note that the null hypothesis is that they come from the
> >> >> > same distribution: a significant result means that they are likely
> >> >> > different (but the difference may not be of practical importance),
> >> >> > while a non-significant test could mean they are the same, or that
> >> >> > you just do not have enough power to find the difference (or the
> >> >> > difference is hard for the ks test to see).  You could also use a
> >> >> > chi-squared test to compare this way.
> >> >> >
> >> >> > Another approach would be to use the vis.test function from the
> >> >> > TeachingDemos package.  Write a small function that will either plot
> >> >> > your 2 histograms (density plots),
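A hedged sketch of the transportation-LP formulation Michael describes above
(not his code; the toy histograms and the bins-apart ground distance are
illustrative assumptions):

library(lpSolve)
h1 <- c(4, 3, 2, 1)                  # source histogram (bin counts)
h2 <- c(1, 2, 3, 4)                  # sink histogram: same bins, same total mass
cost <- abs(outer(seq_along(h1), seq_along(h2), "-"))  # ground distance = bins apart
sol <- lp.transport(cost, "min",
                    row.signs = rep("=", nrow(cost)), row.rhs = h1,
                    col.signs = rep("=", ncol(cost)), col.rhs = h2)
sol$objval / sum(h1)                 # EMD: minimal work per unit of mass

With equal total mass on both sides the optimal transport cost is exactly the
EMD; with unequal masses the "=" constraints would need to be relaxed to "<=".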

[R] Downloading file with lapply

2010-10-15 Thread Santosh Srinivas
I'm still getting familiar with lapply

I have this date sequence
x <- seq(as.Date("01-Jan-2010",format="%d-%b-%Y"), Sys.Date(), by=1) #to
generate series of dates

I want to apply the function for all values of x, so I use lapply (still a
newbie!).
I wrote this test function:
pFun <- function (x) {
  print(paste("This is: ",x,sep=""))
  }


When I run it:
> lapply(x,pFun(x))
It gives the result correctly, but ends with an error:

 Error in match.fun(FUN) : 
  'pFun(x)' is not a function, character or symbol

Question 1: What is the issue here??


Now, to the real problem. I wrote a function cmDownFun(sDate), which is
working correctly.
If I do cmDownFun(x[6]), it works correctly (i.e. my function was ok
this time around!)

However, if I do lapply(x, cmDownFun(x)),
it fails at 01-Jan-2010; this is because of: HTTP status was '404 Not
Found'.

This is OK, because the file does not exist ... but I want to continue for
the remaining values of x
lapply(x,try(cmDownFun(x),silent = TRUE)) .. does not work
I also put the try inside the function itself, but that does not work either.




--
Thanks R-Helpers. Yes, this is a silly question and it will not be repeated!
:-)



Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Joshua Wiley
Hi,

You might look at Reduce().  It seems faster.  I converted the matrix
to a list in an incredibly sloppy way (which you should not emulate)
because I cannot think of the simple way.

> probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise
> probabilities
> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
> F[,1] <- probs[,1,drop=TRUE];
> add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
>
>
> system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
 36.758   0.416  42.464
>
> system.time(for (cc in 2:ncol(F)) {
+  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
+ })
   user  system elapsed
  0.980   0.196   1.328
>
> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], 
> probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539
>

.539 seconds for 10 vectors each with 1 million elements does not seem
bad.  For 30,000 each, on my system it took .014 seconds, which for
200,000 iterations, I estimated as:

> (200000*.014)/60/60
[1] 0.778

(and this is on a stone age system with a single core processor and
only 756MB of RAM)

It should not be difficult to get the list back to a matrix.
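For instance, a minimal sketch of that round trip on a toy 5 x 10 matrix
(split into columns, accumulate with Reduce(), cbind() back together):

probs <- t(matrix(1:50, nrow = 10))                            # toy 5 x 10 matrix
cols  <- lapply(seq_len(ncol(probs)), function(j) probs[, j])  # one vector per column
F2    <- do.call(cbind, Reduce(`+`, cols, accumulate = TRUE))  # list back to matrix
all.equal(F2, t(apply(probs, 1, cumsum)))                      # TRUE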

Cheers,

Josh



On Thu, Oct 14, 2010 at 11:51 PM, Gregor  wrote:
> Dear all,
>
> Maybe the "easiest" solution: Is there anything that speaks against 
> generalizing
> cumsum from base to cope with matrices (as is done in matlab)? E.g.:
>
> "cumsum(Matrix, 1)"
> equivalent to
> "apply(Matrix, 1, cumsum)"
>
> The main advantage could be optimized code if the Matrix is extremely
> non-square (e.g. 100,000x10), but the summation is done over the short side
> (in this case 10).
> apply would practically yield a loop over 100,000 elements, and
> vectorization w.r.t. the long side (loop over 10 elements) provides
> considerable efficiency gains.
>
> Many regards,
> Gregor
>
>
>
>
> On Tue, 12 Oct 2010 10:24:53 +0200
> Gregor  wrote:
>
>> Dear all,
>>
>> I am struggling with a (currently) cost-intensive problem: calculating the
>> (non-normalized) cumulative distribution function, given the (non-normalized)
>> probabilities. something like:
>>
>> probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilities
>> F <- t(apply(probs, 1, cumsum)) #SLOOOW!
>>
>> One (already faster, but for sure not ideal) solution - thanks to Henrik 
>> Bengtsson:
>>
>> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>> F[,1] <- probs[,1,drop=TRUE];
>> for (cc in 2:ncol(F)) {
>>   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>> }
>>
>> In my case, probs is a (30,000 x 10) matrix, and I need to iterate this
>> step around 200,000 times, so speed is crucial. I currently can make sure
>> to have no NAs, but in order to extend matrixStats, this could be a
>> nontrivial issue.
>>
>> Any ideas for speeding up this - probably routine - task?
>>
>> Thanks in advance,
>> Gregor
>>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] AIC in bestglm, glm, and lm - why do they differ?

2010-10-15 Thread Bill.Venables
AIC is only defined up to an additive constant (as is log-likelihood).

It should not surprise you that the values for AIC differ between packages.

The real question is whether the change in AIC when going from one model to
another is the same.  If not, at least one of them is wrong.
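For instance, a quick check with the built-in cars data (a stand-in, since
the original models were not posted) that the AIC differences agree between
lm() and glm():

fit1 <- lm(dist ~ speed, data = cars)
fit2 <- lm(dist ~ poly(speed, 2), data = cars)
g1 <- glm(dist ~ speed, data = cars)       # gaussian family by default
g2 <- glm(dist ~ poly(speed, 2), data = cars)
AIC(fit1) - AIC(fit2)                      # change in AIC under lm()
AIC(g1) - AIC(g2)                          # should give the same change under glm()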

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Darren M Gillis
Sent: Friday, 15 October 2010 1:37 PM
To: r-help@r-project.org
Subject: [R] AIC in bestglm, glm, and lm - why do they differ?

I recently found the "bestglm" package while trolling CRAN for a function to
save some keystrokes with simple (k<10) nested models.  I am interested in
using the best of several nested linear models which I have explored using
lm(), glm() and now bestglm().  When I compare the AIC values for the 'best'
candidate model I get the same AIC values with lm() and glm() but a
different AIC (and likelihood) value from bestglm().  Is this the result of
some difference in likelihood calculation that I am missing in reviewing the
documentation and help files? I can provide code if there is interest in
looking into this, otherwise I will continue to assemble my tables the long
way with glm() and lm(), though the options and potential of the bestglm()
package has me very interested.

 

Cheers, Darren Gillis

 

Biological Sciences

University of Manitoba

Winnipeg, MB

 

 




[R] using apply function and storing output

2010-10-15 Thread David A.

Hi list,

I have a 1710x244 matrix of numerical values and I would like to calculate the 
mean of every group of three consecutive values per column to obtain a new 
matrix of 570x244.  I could get it done using a for loop but how can I do that 
using apply functions?
In addition to this, do I have to initialize a 570x244 matrix with 0's to 
store the calculated values or can the output matrix be generated while 
calculating the mean values?

Cheers,

Dave
  


Re: [R] Downloading file with lapply

2010-10-15 Thread Uwe Ligges



On 15.10.2010 09:19, Santosh Srinivas wrote:

I'm still getting familiar with lapply

I have this date sequence
x<- seq(as.Date("01-Jan-2010",format="%d-%b-%Y"), Sys.Date(), by=1) #to
generate series of dates

I want to apply the function for all values of x, so I use lapply (still a
newbie!)
I wrote this test function
pFun<- function (x) {
   print(paste("This is: ",x,sep=""))
   }


When I run it:

lapply(x,pFun(x))

It gives the result correctly, but ends with an error:

  Error in match.fun(FUN) :
   'pFun(x)' is not a function, character or symbol

Question 1: What is the issue here??


You have to pass the function, not to call it already:

lapply(x, pFun)




Now, to the real problem. I wrote a function cmDownFun(sDate), which is
working correctly.
If I do cmDownFun(x[6]), it works correctly (i.e. my function was ok
this time around!)

However, if I do lapply(x, cmDownFun(x)),
it fails at 01-Jan-2010; this is because of: HTTP status was '404 Not
Found'.

This is OK, because the file does not exist ... but I want to continue for
the remaining values of x
lapply(x,try(cmDownFun(x),silent = TRUE)) .. does not work
I also put the try inside the function itself, but that does not work either.



Then either use try() in cmDownFun directly or use an anonymous function 
rather than a call, again:


lapply(x, function(x) try(cmDownFun(x),silent = TRUE))
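For the first option, a minimal sketch (the body of cmDownFun is
hypothetical - the real URL scheme was never posted - and only the
try()/inherits() pattern is the point):

cmDownFun <- function(sDate) {
  fname <- paste(format(sDate, "%d-%b-%Y"), ".csv", sep = "")  # assumed file naming
  url   <- paste("http://example.com/", fname, sep = "")       # hypothetical URL
  res <- try(download.file(url, fname, quiet = TRUE), silent = TRUE)
  if (inherits(res, "try-error")) NULL else fname              # NULL marks a missing day
}
# lapply(x, cmDownFun) then continues past dates whose file is missing.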

Uwe Ligges






--
Thanks R-Helpers. Yes, this is a silly question and it will not be repeated!
:-)



Re: [R] using apply function and storing output

2010-10-15 Thread Dennis Murphy
Hi:

Look into the rollmean() function in package zoo.
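For instance, a minimal sketch on a toy 6 x 2 matrix (non-overlapping means
of every 3 consecutive rows, per column; the zoo() wrapper is just to get a
rollapply() method):

library(zoo)
m <- matrix(1:12, nrow = 6)
rollapply(zoo(m), width = 3, by = 3, FUN = mean, align = "left")
# rows 1-3 and 4-6 collapse to their means: 2, 5 (col 1) and 8, 11 (col 2)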

HTH,
Dennis

On Fri, Oct 15, 2010 at 12:34 AM, David A.  wrote:

>
> Hi list,
>
> I have a 1710x244 matrix of numerical values and I would like to calculate
> the mean of every group of three consecutive values per column to obtain a
> new matrix of 570x244.  I could get it done using a for loop but how can I
> do that using apply functions?
> In addition to this, do I have to initialize a 570x244 matrix with 0's to
> store the calculated values or can the output matrix be generated while
> calculating the mean values?
>
> Cheers,
>
> Dave
>


Re: [R] using apply function and storing output

2010-10-15 Thread Dimitris Rizopoulos
one efficient way to do this, avoiding loops, is using the rowsum() 
function, e.g.,


mat <- matrix(rnorm(1710*244), 1710, 244)

id <- rep(seq_len(570), each = 3)
means <- rowsum(mat, id, FALSE) / 3

Regarding the second part of your question: indeed, a "golden" rule of 
efficient R programming when it comes to loops is to pre-allocate the 
output object. Compare, for instance,


system.time({
x <- NULL
for (i in 1:5e04) x <- c(x, rnorm(1))
})

versus

system.time({
x <- numeric(5e04)
for (i in 1:5e04) x[i] <- rnorm(1)
})


I hope it helps.

Best,
Dimitris


On 10/15/2010 9:34 AM, David A. wrote:


Hi list,

I have a 1710x244 matrix of numerical values and I would like to calculate the 
mean of every group of three consecutive values per column to obtain a new 
matrix of 570x244.  I could get it done using a for loop but how can I do that 
using apply functions?
In addition to this, do I have to initialize a 570x244 matrix with 0's to 
store the calculated values or can the output matrix be generated while 
calculating the mean values?

Cheers,

Dave




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/



Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Anh Nguyen
Hello Dennis,

That's a very good suggestion. I've attached a template here as a .png file,
I hope you can view it. This is what I've managed to achieve in S-Plus (we
use S-Plus at work but I also use R because there's some very good R
packages for PK data that I want to take advantage of that is not available
in S-Plus). The only problem with this is, unfortunately, I cannot figure
out how make the scale non-uniform and I hope to fix that. My data looks
like this:

ID  Dose  Time  Conc  Pred  ...
1   5     0     0     0
1   5     0.5   6     8
1   5     1     16    20
...
1   7     0     0     0
1   7     0.5   10    12
1   7     1     20    19
...
1   10    3     60    55
...
2   5     12    4     2
...
etc.


I don't care if it's ggplot or something else as long as it looks like how I
envisioned.




On Fri, Oct 15, 2010 at 12:22 AM, Dennis Murphy  wrote:

> I don't recall that you submitted a reproducible example to use as a
> template for assistance. Ista was kind enough to offer a potential solution,
> but it was an abstraction based on the limited information provided in your
> previous mail. If you need help, please provide an example data set that
> illustrates the problems you're encountering and what you hope to achieve -
> your chances of a successful resolution will be much higher when you do.
> BTW, there's a dedicated newsgroup for ggplot2:
> look for the mailing list link at  http://had.co.nz/ggplot2/
>
> HTH,
> Dennis
>
>
> On Thu, Oct 14, 2010 at 10:02 PM, Anh Nguyen  wrote:
>
>> I found a few problems with this method:
>>
>> - There is only one line for predicted dose at 5 mg.
>> - The different doses are 5, 7, and 10 mg but somehow there is a legend
>> for
>> 5,6,7,8,9,10.
>> - Is there a way to make the line smooth?
>> - The plots are also getting a little crowded and I was wondering if there
>> is a way to split it into 2 or more pages?
>>
>> Thanks for your help.
>>
>> On Thu, Oct 14, 2010 at 8:09 PM, Ista Zahn > >wrote:
>>
>> > Hi,
>> > Assuming the data is in a data.frame named "D", something like
>> >
>> > library(ggplot2) # May need install.packages("ggplot2") first
>> > ggplot(D, aes(x=Time, y=Concentration, color=Dose)) +
>> > geom_point() +
>> > geom_line(aes(y = PredictedConcentration, group=1)) +
>> > facet_wrap(~ID, scales="free", ncol=3)
>> >
>> > should do it.
>> >
>> > -Ista
>> > On Thu, Oct 14, 2010 at 10:25 PM, thaliagoo 
>> wrote:
>> > >
>> > > Hello-- I have a data for small population who took 1 drug at 3
>> different
>> > > doses. I have the actual drug concentrations as well as predicted
>> > > concentrations by my model. This is what I'm looking for:
>> > >
>> > > - Time vs Concentration by ID (individual plots), with each subject
>> > > occupying 1 plot -- there is to be 9 plots per page (3x3)
>> > > - Observed drug concentration is made up of points, and predicted drug
>> > > concentration is a curve without points. Points and curve will be the
>> > same
>> > > color for each dose. Different doses will have different colors.
>> > > - A legend to specify which color correlates to which dose.
>> > > - Axes should be different for each individual (as some individual
>> will
>> > have
>> > > much higher drug concentration than others) and I want to see in
>> detail
>> > how
>> > > well predicted data fits observed data.
>> > >
>> > > Any help would be greatly appreciated.
>> > >
>> > >
>> >
>> >
>> >
>> > --
>> > Ista Zahn
>> > Graduate student
>> > University of Rochester
>> > Department of Clinical and Social Psychology
>> > http://yourpsyche.org
>> >
>>
>>
>
>

[R] Help with R

2010-10-15 Thread Muteba Mwamba, John
Dear Sir/Madam;
I'm not sure whether this is the correct contact for help.
I've been recently working with R on my project; unfortunately it suddenly 
crashes!
It gives me the following message:
"FATAL ERROR: unable to restore saved data in .RDATA"
I decided to uninstall the copy (R 2.11.0) and installed a new version 
(2.11.1), but I'm still receiving the same message. When I click OK the program closes.
Please help
John Weirstrass Muteba Mwamba
Department of Economics and Econometrics
University of Johannesburg
Tel: +27-11-559-4371
Fax: +27-11-559-3039
Email: joh...@uj.ac.za
D-Ring 218





Re: [R] Downloading file with lapply

2010-10-15 Thread Santosh Srinivas
Thx.

-Original Message-
From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] 
Sent: 15 October 2010 13:11
To: Santosh Srinivas
Cc: r-help@r-project.org
Subject: Re: [R] Downloading file with lapply



On 15.10.2010 09:19, Santosh Srinivas wrote:
> I'm still getting familiar with lapply
>
> I have this date sequence
> x<- seq(as.Date("01-Jan-2010",format="%d-%b-%Y"), Sys.Date(), by=1) #to
> generate series of dates
>
> I want to apply the function for all values of x, so I use lapply (still a
> newbie!)
> I wrote this test function
> pFun<- function (x) {
>print(paste("This is: ",x,sep=""))
>}
>
>
> When I run it:
>> lapply(x,pFun(x))
> It gives the result correctly, but ends with an error:
>
>   Error in match.fun(FUN) :
>'pFun(x)' is not a function, character or symbol
>
> Question 1: What is the issue here??

You have to pass the function, not to call it already:

lapply(x, pFun)


>
> Now, to the real problem. I wrote a function cmDownFun(sDate), which is
> working correctly.
> If I do cmDownFun(x[6]), it works correctly (i.e. my function was ok
> this time around!)
>
> However, if I do lapply(x, cmDownFun(x)),
> it fails at 01-Jan-2010; this is because of: HTTP status was '404 Not
> Found'.
>
> This is OK, because the file does not exist ... but I want to continue for
> the remaining values of x
> lapply(x,try(cmDownFun(x),silent = TRUE)) .. does not work
> I also put the try inside the function itself, but that does not work either.


Then either use try() in cmDownFun directly or use an anonymous function 
rather than a call, again:

lapply(x, function(x) try(cmDownFun(x),silent = TRUE))

Uwe Ligges


>
>
>

> --
> Thanks R-Helpers. Yes, this is a silly question and it will not be
> repeated! :-)
>


Re: [R] import statsoft statistica files?

2010-10-15 Thread marbal


orduek wrote:
> 
> Is there an option to import .sta files to R?
> I know you can import SPSS ones.
> thank you.
> 

Hi:
I have the same problem.
Did you manage in the end to import .sta files to R?
thank you



[R] Set value if else...

2010-10-15 Thread Joel

Hi

I want to set a variable to either 1 or 0 depending on a value in a
dataframe, and then add this as a column to the dataframe.

This could be done with a loop, but as we are able to operate on a
complete row or column without a loop, it would be sweet if it could be done.

for example:

table:

Name   Age
Joel   24
agust  17
maja   40
and so on...

And what I want to do is a command that gives me 
VoteRight<-1 if table$age>18 else set VoteRight to 0

Then use cbind or something to add it to the table.

so i get
Name   Age  VoteRight
Joel   24   1
agust  17   0
maja   40   1

And as I said before, I'm guessing this can be done without a loop...

//Joel



Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Joshua Wiley
On Fri, Oct 15, 2010 at 12:23 AM, Joshua Wiley  wrote:
>
> Hi,
>
> You might look at Reduce().  It seems faster.  I converted the matrix
> to a list in an incredibly sloppy way (which you should not emulate)
> because I cannot think of  the simple way.

Dennis provided the answer: system.time(add(unclass(as.data.frame(probs))))

I probably could have suggested Reduce days ago if I were anywhere near
that efficient in my coding.

>
> > probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise
> > probabilities
> > F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
> > F[,1] <- probs[,1,drop=TRUE];
> > add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
> >
> >
> > system.time(F.slow <- t(apply(probs, 1, cumsum)))
>   user  system elapsed
>  36.758   0.416  42.464
> >
> > system.time(for (cc in 2:ncol(F)) {
> +  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
> + })
>   user  system elapsed
>  0.980   0.196   1.328
> >
> > system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], 
> > probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
>   user  system elapsed
>  0.420   0.072   0.539

Josh



[R] Biweight (Tukey) M-estimator of location with known scale parameter

2010-10-15 Thread Ondrej Vozar
Dear colleagues,

I would like to ask you how to estimate
the biweight M-estimator of Tukey with a known
scale parameter.

I know how to estimate the biweight M-estimator
when an estimated scale is used, using the function
rlm in package MASS.

library(MASS)
x<-rnorm(1000)
rlm(x~1,psi="psi.bisquare")

But I would like to estimate this
with the true scale s=1.

I also checked package robustbase, but I have not
found any solution.

Thanks for any advice.
Ondrej Vozar



Re: [R] Set value if else...

2010-10-15 Thread Joshua Wiley
Dear Joel,

On Fri, Oct 15, 2010 at 1:16 AM, Joel  wrote:
>
> Hi
>
> I want to set a variable to either 1 or 0 depending on a value in a
> dataframe, and then add this as a column to the dataframe.
>
> This could be done with a loop, but as we are able to operate on a
> complete row or column without a loop, it would be sweet if it could be done.
>
> for example:
>
> table:
>
> Name  Age
> Joel     24
> agust   17
> maja    40
> and so on...
>
> And what I want to do is a command that gives me
> VoteRight<-1 if table$age>18 else set VoteRight to 0

You're closer than you know:

table$VoteRight <- ifelse(table$age > 18, 1, 0)

?ifelse   is a vectorized function [ifelse(logical test, value if
TRUE, value if FALSE)], and you would not even need to cbind() it into
your table.
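A tiny demonstration on the toy data from your question (using 'tab' rather
than 'table', since the latter masks base::table):

tab <- data.frame(Name = c("Joel", "agust", "maja"), Age = c(24, 17, 40))
tab$VoteRight <- ifelse(tab$Age > 18, 1, 0)
tab
#    Name Age VoteRight
# 1  Joel  24         1
# 2 agust  17         0
# 3  maja  40         1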

Josh

>
> Then use cbind or something to add it to the table.
>
> so i get
> Name  Age  VoteRight
> Joel     24    1
> agust   17    0
> maja    40    1
>
> And as I said before im guessing this can be done without a loop...
>
> //Joel
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



Re: [R] Set value if else...

2010-10-15 Thread Dimitris Rizopoulos

try this:

table$VoteRight <- as.numeric(table$age > 18)



Best,
Dimitris


On 10/15/2010 10:16 AM, Joel wrote:


Hi

I want to set a variable to either 1 or 0 depending on a value in a
dataframe, and then add this as a column to the dataframe.

This could be done with a loop, but as we are able to operate on a
complete row or column without a loop, it would be sweet if it could be done.

for example:

table:

Name   Age
Joel   24
agust  17
maja   40
and so on...

And what I want to do is a command that gives me
VoteRight<-1 if table$age>18 else set VoteRight to 0

Then use cbind or something to add it to the table.

so i get
Name   Age  VoteRight
Joel   24   1
agust  17   0
maja   40   1

And as I said before im guessing this can be done without a loop...

//Joel


--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/



Re: [R] Set value if else...

2010-10-15 Thread Joel

Indeed I was close :)

Thx for the fast response!

Have a good day

//Joel



Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread David Winsemius


On Oct 15, 2010, at 3:46 AM, Anh Nguyen wrote:


Hello Dennis,

That's a very good suggestion. I've attached a template here as a .png file,
I hope you can view it. This is what I've managed to achieve in S-Plus (we
use S-Plus at work but I also use R because there's some very good R
packages for PK data that I want to take advantage of, and that is not
available in S-Plus). The only problem with this is, unfortunately, I cannot
figure out how to make the scale non-uniform, and I hope to fix that.


That would be easy if your efforts (which I have not yet seen) were in
lattice. If that were the case, then adding this would solve your problem:

scales=list(y=list(relation="free"))

--
David

My data looks
like this:

ID  Dose  Time  Conc  Pred  ...
1   5     0     0     0
1   5     0.5   6     8
1   5     1     16    20
...
1   7     0     0     0
1   7     0.5   10    12
1   7     1     20    19
...
1   10    3     60    55
...
2   5     12    4     2
...
etc.


I don't care if it's ggplot or something else as long as it looks
like how I envisioned.




On Fri, Oct 15, 2010 at 12:22 AM, Dennis Murphy   
wrote:



I don't recall that you submitted a reproducible example to use as a
template for assistance. Ista was kind enough to offer a potential solution,
but it was an abstraction based on the limited information provided in your
previous mail. If you need help, please provide an example data set that
illustrates the problems you're encountering and what you hope to achieve -
your chances of a successful resolution will be much higher when you do.

BTW, there's a dedicated newsgroup for ggplot2:
look for the mailing list link at http://had.co.nz/ggplot2/

HTH,
Dennis


On Thu, Oct 14, 2010 at 10:02 PM, Anh Nguyen   
wrote:



I found a few problems with this method:

- There is only one line for predicted dose at 5 mg.
- The different doses are 5, 7, and 10 mg but somehow there is a legend
for 5,6,7,8,9,10.
- Is there a way to make the line smooth?
- The plots are also getting a little crowded and I was wondering if there
is a way to split it into 2 or more pages?

Thanks for your help.

On Thu, Oct 14, 2010 at 8:09 PM, Ista Zahn  

wrote:



Hi,
Assuming the data is in a data.frame named "D", something like

library(ggplot2) # May need install.packages("ggplot2") first
ggplot(D, aes(x=Time, y=Concentration, color=Dose)) +
geom_point() +
geom_line(aes(y = PredictedConcentration, group=1)) +
facet_wrap(~ID, scales="free", ncol=3)

should do it.

-Ista
On Thu, Oct 14, 2010 at 10:25 PM, thaliagoo 

wrote:


Hello-- I have data for a small population who took 1 drug at 3 different
doses. I have the actual drug concentrations as well as predicted
concentrations by my model. This is what I'm looking for:

- Time vs Concentration by ID (individual plots), with each subject
occupying 1 plot -- there is to be 9 plots per page (3x3)
- Observed drug concentration is made up of points, and predicted drug
concentration is a curve without points. Points and curve will be the same
color for each dose. Different doses will have different colors.
- A legend to specify which color correlates to which dose.
- Axes should be different for each individual (as some individuals will
have much higher drug concentration than others) and I want to see in
detail how well predicted data fits observed data.

Any help would be greatly appreciated.






--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org












David Winsemius, MD
West Hartford, CT


Re: [R] Set value if else...

2010-10-15 Thread Rubén Roa
x <- data.frame(x=1:10)
require(gtools)
x$y <- ifelse(odd(x$x),0,1)
HTH

R.

 

Dr. Rubén Roa-Ureta
AZTI - Tecnalia / Marine Research Unit
Txatxarramendi Ugartea z/g
48395 Sukarrieta (Bizkaia)
SPAIN



> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On behalf of Joel
> Sent: Friday, 15 October 2010 10:16
> To: r-help@r-project.org
> Subject: [R] Set value if else...
> 
> 
> Hi
> 
> I want to set a variable to either 1 or 0 depending on a 
> value in a dataframe, and then add this as a column to the dataframe.
> 
> This could be done with a loop, but as we are able to 
> operate on a complete row or column without a loop, it would 
> be sweet if it could be done.
> 
> for example:
> 
> table:
> 
> Name   Age
> Joel   24
> agust  17
> maja   40
> and so on...
> 
> And what I want to do is a command that gives me
> VoteRight<-1 if table$age>18 else set VoteRight to 0
> 
> Then use cbind or something to add it to the table.
> 
> so i get
> Name   Age  VoteRight
> Joel   24   1
> agust  17   0
> maja   40   1
> 
> And as I said before im guessing this can be done without a loop...
> 
> //Joel
> 



[R] Create Arrays

2010-10-15 Thread dpender

Hi,

For this example:

O <- c(0, 0, 0, 2, 0, 0, 2, 0)

I want to create an array every time O[i] > 0.  The array should be in the
form:

R[j] <- array(-1, dim=c(2,O[i])) 

i.e. if O[i] > 0 four times, I want four R arrays.

Does anyone have any suggestions?

Thanks,

Doug



Re: [R] using apply function and storing output

2010-10-15 Thread David A.

Hi Dennis and Dimitris,

thanks for your answers. I am trying the rollmean() function and also the 
rollapply() function because I also want to calculate CV for the values.
For this I created a co.var() function. I am having problems using them.

>co.var<-function(x)(
+sd(x)/mean(x)
+)


> dim(mydata)
[1] 1710  244

>xxx<-rollmean(mydata,3,by=3) 

works fine and creates a vector which I will transform into a matrix. I still 
have to find out how to store the output in my previously created 570x244 0's 
matrix in an ordered way.

but, since the examples in the help page say it's the same, I tried

> xxx<-rollapply(mydata,3,mean,by=3)
Error in UseMethod("rollapply") : 
  no applicable method for 'rollapply' applied to an object of class 
"c('matrix', 'integer', 'numeric')"

and, with my created function...

> xxx<-rollapply(ord_raw_filt.df,3,FUN=co.var,by=3)
Error in UseMethod("rollapply") : 
  no applicable method for 'rollapply' applied to an object of class 
"c('matrix', 'integer', 'numeric')"

Can you help me with the error?

Dave
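A likely fix, assuming the error just means this version of zoo has no
rollapply() method for plain matrices: wrap the matrix in zoo() first, then
strip the index off again with coredata():

library(zoo)
z   <- rollapply(zoo(mydata), width = 3, by = 3, FUN = co.var, align = "left")
cvs <- coredata(z)   # an ordinary 570x244 matrix - no pre-allocation needed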

Date: Fri, 15 Oct 2010 00:45:08 -0700
Subject: Re: [R] using apply function and storing output
From: djmu...@gmail.com
To: dasol...@hotmail.com
CC: r-help@r-project.org

Hi:

Look into the rollmean() function in package zoo.

HTH,
Dennis

On Fri, Oct 15, 2010 at 12:34 AM, David A.  wrote:



Hi list,



I have a 1710x244 matrix of numerical values and I would like to calculate the 
mean of every group of three consecutive values per column to obtain a new 
matrix of 570x244.  I could get it done using a for loop but how can I do that 
using apply functions?


In addition to this, do I have to initialize a 570x244 matrix with 0's to 
store the calculated values or can the output matrix be generated while 
calculating the mean values?



Cheers,



Dave



[[alternative HTML version deleted]]





  


Re: [R] Create Arrays

2010-10-15 Thread Barry Rowlingson
On Fri, Oct 15, 2010 at 9:55 AM, dpender  wrote:
>
> Hi,
>
> For this example:
>
> O <- c(0, 0, 0, 2, 0, 0, 2, 0)
>
> I want to create an array every time O[i] > 0.  The array should be in the
> form:
>
> R[j] <- array(-1, dim=c(2,O[i]))
>
> i.e. if O[i] > 0 4 times I want 4 R arrays.
>
> Does anyone have any suggestions?
>

 Suggestion number 1 is don't use O for objects! Far too confusing!

 Serious suggestion is that a concrete example will help. Let me try:

If:

 X = c(0,0,0,2,0,0,3,0) # I'm not using O here!

 Then

R will be a list of length 2 because there are 2 values in X bigger than 0.

R[1] will be array(-1,dim=c(2,2))   # because X[4] is 2
and
R[2] will be array(-1,dim=c(2,3))  # because X[7] is 3

 Yup?

 Okay, first get rid of the zeroes:

Xnz = X[X!=0]

 That simplifies the problem. Then use lapply to iterate over Xnz with
a function that returns the array given the value:

> lapply(Xnz,function(x){array(-1,dim=c(2,x))})
[[1]]
 [,1] [,2]
[1,]   -1   -1
[2,]   -1   -1

[[2]]
 [,1] [,2] [,3]
[1,]   -1   -1   -1
[2,]   -1   -1   -1

 2-d arrays are just matrices, so you can do it all in one line with:

lapply(X[X!=0],function(x){matrix(-1,2,x)})

Barry



Re: [R] Create Arrays

2010-10-15 Thread Gerrit . Eichner

Hi, Doug,

maybe

columns <- c( 0, 3, 0, 2, 0, 1)
lapply( columns[ columns > 0],
function( o) array( -1, dim = c( 2, o)))

does what you want?

Regards -- Gerrit

-
AOR Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109  http://www.uni-giessen.de/~gcb7
-


Zitat von dpender :



Hi,

For this example:

O <- c(0, 0, 0, 2, 0, 0, 2, 0)

I want to create an array every time O[i] > 0.  The array should be in the
form:

R[j] <- array(-1, dim=c(2,O[i]))

i.e. if O[i] > 0 four times, I want four R arrays.

Does anyone have any suggestions?

Thanks,

Doug






[R] Random assignment

2010-10-15 Thread John Haart
Dear List,

I am doing some simulation in R and need basic help! 

I have a list of animal families for which i know the number of species in each 
family.

I am working under the assumption that a species has a 7.48% chance of being at 
risk. 

I want to simulate the number of species expected to be at risk under a random 
binomial distribution with 10,000 randomizations. 

I am relatively new to this field and would greatly appreciate an "idiot-proof" 
response, i.e. how should the data be entered into R? I was thinking of using 
read.table, header = T, where the table has F = Family Name, and SP = Number of 
species in that family?

John
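A hedged sketch of one way to set this up, assuming the data are read with
read.table(file, header = TRUE) into a data frame with columns F and SP
(toy values below):

fam <- data.frame(F = c("FamilyA", "FamilyB"), SP = c(40, 120))   # toy stand-in
p <- 0.0748                                  # per-species risk probability
sims <- sapply(fam$SP, function(n) rbinom(10000, size = n, prob = p))
colnames(sims) <- as.character(fam$F)        # 10,000 simulated at-risk counts per family
colMeans(sims)                               # compare against the expected p * SP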



Re: [R] Biweight (Tukey) M-estimator of location with known scale parameter

2010-10-15 Thread Christian Hennig

Have a look at the package smoothmest.
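If only the location estimate is needed, a hedged sketch (not the smoothmest
API): with the scale fixed at s = 1, the estimate solves
sum(psi((x - mu)/s)) = 0, which iteratively reweighted means can find using
the psi.bisquare() weight function from MASS:

library(MASS)
x  <- rnorm(1000)
s  <- 1                               # known scale
mu <- median(x)                       # robust starting value
for (i in 1:50) {
  w  <- psi.bisquare((x - mu) / s)    # weights psi(u)/u (MASS convention)
  mu <- sum(w * x) / sum(w)           # weighted-mean update
}
mu                                    # fixed-scale biweight location estimate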

Christian

On Fri, 15 Oct 2010, Ondrej Vozar wrote:


Dear colleagues,

I would like to ask you how to estimate
the biweight M-estimator of Tukey with a known
scale parameter.

I know how to estimate the biweight M-estimator
when an estimated scale is used, using the function
rlm in package MASS.

library(MASS)
x<-rnorm(1000)
rlm(x~1,psi="psi.bisquare")

But I would like to estimate this
with the true scale s=1.

I also checked package robustbase, but I have not
found any solution.

Thanks for any advice.
Ondrej Vozar




*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche



Re: [R] interaction contrasts

2010-10-15 Thread Kay Cichini

hello,

i recently asked the list for help with some interaction contrasts (see
below), for which i had to change the reference level of the model "on the
fly" (i read a post saying that this is possible in multcomp).

if someone has a clue how this is coded in multcomp's glht() - please point
me there.

yours,
kay


Kay Cichini wrote:
> 
> hello list,
> 
> i'd very much appreciate help with setting up the 
> contrast for a 2-factorial crossed design.
> 
> here is a toy example:
> 
> library(multcomp)
> 
> dat<-data.frame(fac1=gl(4,8,labels=LETTERS[1:4]),
> fac2=rep(c("I","II"),16),y=rnorm(32,1,1))
> 
> mod<-lm(y~fac1*fac2,data=dat)
> 
> ## the contrasts i'm interressted in:
> 
> c1<-rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
>   "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
>   "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
>   "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
>   "fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
>   "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
>   "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1))
> 
> summary(glht(mod,c1))
> 
> ## now i want to add the remaining combinations 
> ## "fac2, B*C"
> ## "fac2, B*D"
> ## "fac2, C*D"
> ## to the simultanous tests to see whether the effects 
> ## of fac2 within the levels of fac1 differ between  
> ## each combination of the levels of fac1, or not ??
> 
> thanks for any advice!
> 
> yours,
> kay
> 
> 
> 
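A hedged sketch of one possibility: the remaining combinations are simply
differences of the interaction coefficients (columns 6-8 of the toy model's
coefficient vector, i.e. fac1B:fac2II, fac1C:fac2II, fac1D:fac2II), so no
change of reference level seems to be needed:

c2 <- rbind("fac2-effect, B*C" = c(0, 0, 0, 0, 0, 1, -1,  0),
            "fac2-effect, B*D" = c(0, 0, 0, 0, 0, 1,  0, -1),
            "fac2-effect, C*D" = c(0, 0, 0, 0, 0, 0,  1, -1))
summary(glht(mod, rbind(c1, c2)))   # all ten contrasts tested simultaneously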


-

Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck





Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Dennis Murphy
Hi:

To get the plots precisely as you have given them in your png file, you're
most likely going to have to use base graphics, especially if you want a
separate legend in each panel. Packages ggplot2 and lattice have more
structured ways of constructing such graphs, so you give up a bit of freedom
in the details of plot construction to get nicer default configurations.

Perhaps you've written a panel function in S-PLUS (?) to produce each graph
in your png file - if so, you could simply add a couple of lines to
determine the max y-value and from that the limits of the y scale. It
shouldn't be at all difficult to make such modifications. Since R base
graphics is very similar to base graphics in S-PLUS, you could possibly get
away with popping your S-PLUS code directly into R and see how far that
takes you.

Lattice is based on Trellis graphics, but the syntax in lattice has changed
a fair bit vis a vis Trellis as the package has developed. ggplot2 is a more
recent graphics system in R predicated on the grammar of graphics exposited
by Leland Wilkinson.

For my example, I've modified the Theophylline data set in package nlme,
described in Pinheiro & Bates (2000), Mixed Effects Models in S and S-PLUS,
Springer. The original data set has eleven unique doses - I combined them
into three intervals and converted them to factor with cut(). I also created
four groups of Subjects and put them into a variable ID. The output data
frame is called theo. I didn't fit the nlme models to the subjects -
instead, I was lazy and used loess smoothing instead. The code to generate
the data frame is given below; this is what we mean by a 'reproducible
example', something that can be copied and pasted into an R session.

# Use modified version of Theophylline data in package nlme

library(nlme)
theo <- Theoph
theo <- subset(theo, Dose > 3.5)
theo$dose <- cut(theo$Dose, breaks = c(3, 4.5, 5, 6), labels = c('4.25',
'4.8', '5.5'))
theo <- as.data.frame(theo)
theo$ID <- with(theo, ifelse(Subject %in% c(6, 7, 12), 1,
  ifelse(Subject %in% c(2, 8, 10), 2,
  ifelse(Subject %in% c(4, 11, 5), 3, 4) )))
# ID is used for faceting, dose for color and shape

# lattice version:

xyplot(conc ~ Time | factor(ID), data = theo, col.line = 1:3,
  pch = 1:3, col = 1:3, groups = dose, type = c('p', 'smooth'),
  scales = list(y = list(relation = 'free')),
  auto.key = list(corner = c(0.93, 0.4), lines = TRUE, points = TRUE,
  text = levels(theo$dose)) )

# ggplot2 version:
# scales = 'free_y' allows independent y scales per panel
g <- ggplot(theo, aes(x = Time, y = conc, shape = dose, colour = dose,
   group = Subject))
g + geom_point() + geom_smooth(method = 'loess', se = FALSE) +
facet_wrap( ~ ID, ncol = 2, scales = 'free_y') +
opts(legend.position = c(0.9, 0.4))

This is meant to give you some indication how the two graphics systems work
- it's a foundation, not an end product. There are a couple of ways I could
have adjusted the y-scales in the lattice graphs (either directly in the
scales = ... part or by using a prepanel function for loess), but since you're
not likely to use loess in your plots, it's not an important consideration.

Both ggplot2 and lattice place the legend outside the plot area by default;
I've illustrated a couple of ways to pull it into the plot region FYI.

One other thing: if your data set contains fitted values from your PK models
for each subject * dose combination, the loess smoothing is unnecessary. In
ggplot2, you would use geom_line(aes(y = pred), ...) in place of
geom_smooth(), and in lattice you would replace 'smooth' by 'l' in the type
= argument of xyplot().

HTH,
Dennis


On Fri, Oct 15, 2010 at 12:46 AM, Anh Nguyen  wrote:

> Hello Dennis,
>
> That's a very good suggestion. I've attached a template here as a .png
> file, I hope you can view it. This is what I've managed to achieve in S-Plus
> (we use S-Plus at work but I also use R because there's some very good R
> packages for PK data that I want to take advantage of that is not available
> in S-Plus). The only problem with this is, unfortunately, I cannot figure
> out how to make the scale non-uniform, and I hope to fix that. My data looks
> like this:
>
> ID  Dose  Time  Conc  Pred  ...
> 1   5     0     0     0
> 1   5     0.5   6     8
> 1   5     1     16    20
> ...
> 1   7     0     0     0
> 1   7     0.5   10    12
> 1   7     1     20    19
> ...
> 1   10    3     60    55
> ...
> 2   5     12    4     2
> ...
> etc.
>
>
> I don't care if it's ggplot or something else as long as it looks like how
> I envisioned.
>
>
>
>
> On Fri, Oct 15, 20

Re: [R] Create Arrays

2010-10-15 Thread dpender

Barry, Gerrit,

That was what I am after but unfortunately only the starting point.  I am
now trying to amend a function that inserts the R matrices into a dataset in
the correct places:

i.e.

H <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)

T <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)

O <- c(0, 0, 0, 3, 0, 0, 0, 2, 0)


R <- lapply(O[O!=0], function(x){matrix(-1,1,x)})

m <- rbind(H, T)

O2 <- rbind(O,O)

f <- function(x, y, z) {
if(nrow(x) != nrow(y) || nrow(y) != length(z)) 
  stop('unequal numbers of rows among inputs')
out <- matrix(NA,2,length(H)+sum(O))
for(i in 1:2) 
   out[i, ] <- append(x[i,], as.numeric(z[[i]]), after = which(y[i,] >
0) - 1)
out
   }

f(m, O2, R)


f is the function that requires amendment and was originally written for a
single R value and treated each row of out separately.  

I now want to insert R[i] into m before every point that O[i] > 0

Hope this makes sense.

Doug
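A hedged sketch of one way to do the insertion without append(), reusing the
thread's H, T and O (note T also masks TRUE) and assuming each block of -1s
is meant to go immediately before the corresponding point where O[i] > 0 (as
the `after = which(...) - 1` in f suggests):

insert_runs <- function(row, O) {
  out <- numeric(0)
  for (i in seq_along(row)) {
    if (O[i] > 0) out <- c(out, rep(-1, O[i]))   # the -1 block goes in first
    out <- c(out, row[i])                        # then the original value
  }
  out
}
out <- rbind(insert_runs(H, O), insert_runs(T, O))
dim(out)   # 2 x (length(H) + sum(O)), i.e. 2 x 14 here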






[R] R 2.12.0 is released

2010-10-15 Thread Peter Dalgaard
I've rolled up R-2.12.0.tar.gz a short while ago. This is a development
release which contains a number of new features.

Also, a number of mostly minor bugs have been fixed. See the full list
of changes below.

You can get it from

http://cran.r-project.org/src/base/R-2/R-2.12.0.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course.

  For the R Core Team

  Peter Dalgaard

These are the md5sums for the freshly created files, in case you wish
to check that they are uncorrupted:

MD5 (AUTHORS) = ac9746b4845ae81f51cfc99262f5
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = dbdfe3f2451a4394b4f982e45ba4395c
MD5 (INSTALL) = 70447ae7f2c35233d3065b004aa4f331
MD5 (NEWS) = 8431585f21ee3e9a1cb83e1eb552d0e5
MD5 (ONEWS) = 0c3e10eef74439786e5fceddd06dac71
MD5 (OONEWS) = 58d5b3b614888a6e366ff0c60073584f
MD5 (R-latest.tar.gz) = aa003654d238d70bf5bc7433b8257aac
MD5 (README) = 296871fcf14f49787910c57b92655c76
MD5 (RESOURCES) = 020479f381d5f9038dcb18708997f5da
MD5 (THANKS) = f2ccf22f3e20ebaa86f8ee5cc6b0f655
MD5 (R-2/R-2.12.0.tar.gz) = aa003654d238d70bf5bc7433b8257aac

This is the relevant part of the NEWS file (hoping that the new format survives 
email...)

R News

CHANGES IN R VERSION 2.12.0:

  NEW FEATURES:

• Reading a package's CITATION file now defaults to ASCII rather
  than Latin-1: a package with a non-ASCII CITATION file should
  declare an encoding in its DESCRIPTION file and use that encoding
  for the CITATION file.

• difftime() now defaults to the "tzone" attribute of "POSIXlt"
  objects rather than to the current timezone as set by the default
  for the tz argument.  (Wish of PR#14182.)

• pretty() is now generic, with new methods for "Date" and "POSIXt"
  classes (based on code contributed by Felix Andrews).

• unique() and match() are now faster on character vectors where
  all elements are in the global CHARSXP cache and have unmarked
  encoding (ASCII).  Thanks to Matthew Dowle for suggesting
  improvements to the way the hash code is generated in unique.c.

• The enquote() utility, in use internally, is exported now.

• .C() and .Fortran() now map non-zero return values (other than
  NA_LOGICAL) for logical vectors to TRUE: it has been an implicit
  assumption that they are treated as true.

• The print() methods for "glm" and "lm" objects now insert
  linebreaks in long calls in the same way that the print() methods
  for "summary.[g]lm" objects have long done.  This does change the
  layout of the examples for a number of packages, e.g. MASS.
  (PR#14250)

• constrOptim() can now be used with method "SANN".  (PR#14245)

  It gains an argument hessian to be passed to optim(), which
  allows all the ... arguments to be intended for f() and grad().
  (PR#14071)

• curve() now allows expr to be an object of mode "expression" as
  well as "call" and "function".

• The "POSIX[cl]t" methods for Axis() have been replaced by a
  single method for "POSIXt".

  There are no longer separate plot() methods for "POSIX[cl]t" and
  "Date": the default method has been able to handle those classes
  for a long time.  This _inter alia_ allows a single date-time
  object to be supplied, the wish of PR#14016.

  The methods had a different default ("") for xlab.

• Classes "POSIXct", "POSIXlt" and "difftime" have generators
  .POSIXct(), .POSIXlt() and .difftime().  Package authors are
  advised to make use of them (they are available from R 2.11.0) to
  proof against planned future changes to the classes.

  The ordering of the classes has been changed, so "POSIXt" is now
  the second class.  See the document ‘Updating packages for
  changes in R 2.12.x’ on <http://developer.r-project.org> for
  the consequences for a handful of CRAN packages.

• The "POSIXct" method of as.Date() allows a timezone to be
  specified (but still defaults to UTC).

• New list2env() utility function as an inverse of
  as.list(<environment>) and for fast multi-assign() to existing
  environment.  as.environment() is now generic and uses list2env()
  as list method.

• There are several small changes to output which ‘zap’ small
  numbers, e.g. in printing quantiles of residuals in summaries
  from "lm" and "glm" fits, and in test statistics in print.anova().

• Special names such as "dim", "names", etc, are now allowed as
  slot names of S4 classes, with "class" the only remaining
  exception.

• File .Renviron can have architecture-specific versions such as
  .Renviron.i386 on systems with sub-architectures.

• installed.packages() has a new argument subarch to filter on
  sub-architecture.

• The summary() method for packageStatus() now has a separate
  print() method.

Re: [R] Data Gaps

2010-10-15 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org wrote on 14.10.2010 10:34:12:

> 
> Thanks Dennis.
> 
> 
> 
> One more thing if you don't mind.  How do I extract the individual H and T
> “arrays” from f(m,o,l) so that I can combine them with a date/time array
> and write to a file?
> 

Try looking at the ?merge function, as it may do what you want.
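
For instance, a minimal sketch with made-up stand-ins ('HT' for the two
extracted rows and 'dt' for your date/time values):

HT <- matrix(rnorm(10), nrow = 2, dimnames = list(c("H", "T"), NULL))
dt <- seq(as.POSIXct("2010-10-15 00:00"), by = "hour", length.out = ncol(HT))

res <- data.frame(time = dt, H = HT["H", ], T = HT["T", ])
write.csv(res, "HT_series.csv", row.names = FALSE)

merge() itself is the tool when the pieces share a key column to join on,
e.g. merge(res, other, by = "time").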

Regards
Petr


> 
> Sorry if it’s a simple question but I’m completely new to R.
> 
> 
> 
> Cheers,
> 
> 
> 
> Doug
> 
> 
> 
> -- 
> View this message in context: http://r.789695.n4.nabble.com/Data-Gaps-
> tp2993317p2994997.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging and working with BIG data sets. Is sqldf the best way??

2010-10-15 Thread Chris Howden
Thanks for the advice Gabor,

I was indeed not starting and finishing with sqldf(), which was why it was
not working for me. Please forgive a blatantly obvious mistake.


I have tried what you suggested and unfortunately R is still having problems
doing the join. The problem seems to be one of memory since I am receiving
the below error message when I run the natural join using sqldf.
Error: cannot allocate vector of size 250.0 Mb
Timing stopped at: 32.8 0.48 34.79


I have tried it on a subset of the data and it works. So I think it's a
memory issue, being caused by my very large dataset (11 million rows and 2
columns).

I think I may have to admit that R cannot do this (on my machine), and try
doing it in a full-blown database such as PostgreSQL.


Unless you (or anyone else) have any other suggestions?

Thanks again for your help.



For anyone who's interested here's all my code and R log.
##
# Info on input data
##
> class(A)
[1] "data.frame"
> class(B)
[1] "data.frame"

> names(A)
[1] "POINTID"  "alistair"
> names(B)
[1] "POINTID"        "alistair_range"

> dim(A)
[1] 11048592        2
> dim(B)
[1] 11048592        2


##
# Tried the join with an index on the entire data set
##
> sqldf()


>  system.time(sqldf("create index ai1 on A(POINTID, alistair)"))
   user  system elapsed
  76.85    0.34   79.67

>  system.time(sqldf("create index ai2 on B(POINTID, alistair_range)"))
   user  system elapsed
  75.43    0.43   77.16

> system.time(sqldf("select * from main.A natural join main.B"))
Error: cannot allocate vector of size 250.0 Mb
Timing stopped at: 32.8 0.48 34.79

> sqldf()
Error in sqliteCloseConnection(conn, ...) :
  RS-DBI driver: (close the pending result sets before closing this
connection)


##
# Also tried the join with an index built from only the variable I intend
# to merge on, since I wasn't exactly sure which index was correct.
##
> sqldf()


> system.time(sqldf("create index ai1 on A(POINTID)"))
   user  system elapsed
  66.67    0.44   69.28

> system.time(sqldf("create index ai2 on B(POINTID)"))
   user  system elapsed
  68.18    0.31   68.73

> system.time(sqldf("select * from main.A natural join main.B"))
Error: cannot allocate vector of size 31.2 Mb
Timing stopped at: 10.56 0.04 10.87

> sqldf()
Error in sqliteCloseConnection(conn, ...) :
  RS-DBI driver: (close the pending result sets before closing this
connection)

##
# and some memory info
##
> memory.size()
[1] 412.6

> memory.size(NA)
[1] 4095



Chris Howden
Founding Partner
Tricky Solutions
Tricky Solutions 4 Tricky Problems
Evidence Based Strategic Development, IP development, Data Analysis,
Modelling, and Training
(mobile) 0410 689 945
(fax / office) (+618) 8952 7878
ch...@trickysolutions.com.au


-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Friday, 15 October 2010 1:03 PM
To: Chris Howden
Cc: r-help@r-project.org
Subject: Re: [R] merging and working with BIG data sets. Is sqldf the best
way??

On Thu, Oct 14, 2010 at 10:56 PM, Chris Howden
 wrote:
> Thanks for the suggestion and code Gabor,
>
> I've tried creating 2 indices:
>
> 1) just for the variable I intend to merge on
> 2) on the entire data set I am merging (which I think is the one I should
> be using??)
>
> However neither seemed to work. The first was still going after 2 hours,
> and the second after 12 hours, so I stopped the join.
>
> If it's not too much bother I was wondering if you could let me know which
> index I should be using?
>
>
> Or in other words since I plan to merge using POINTID do I create an index
> on
>
> system.time(sqldf("create index ai1 on A(POINTID)"))
> system.time(sqldf("create index ai2 on B(POINTID)"))
>
> or
>
> system.time(sqldf("create index ai1 on A(POINTID,alistair)"))
> system.time(sqldf("create index ai2 on B(POINTID, alistair_range)")
>
>
>
> I'm now using the following join statement
> system.time(data2 <- sqldf("select * from A natural join B"))
>

If you only ran the three sqldf statements you mentioned in your post
(thereby omitting 2 of the 5 sqldf calls in example 4i):

sqldf("create...")
sqldf("create...")
sqldf("select...")

then what you are doing is: create a database, upload your data to it,
create an index, and destroy the database; then create a second database,
upload the data again, create a second index, and destroy that database
too; and finally create a third database, upload the data to it, and do
the join without any indexes.

You must bracket all this with empty sqldf calls as shown in 4i to
force persistence:

sqldf()
sqldf("create...")
sqldf("create...")
sqldf("select...")
sqldf()

or else put all of the sql statements in a vector to one sqldf call:

sqldf(c("create...", "create...", "select..."))

Also you can replace "select ..." with "explain query plan select
...", and it will let you know which indexes it's actually using, e.g.
in example 4i if we do that:

> sqldf("explain query plan select * from main.DF1 natural join main.DF2

Re: [R] Random assignment

2010-10-15 Thread Dennis Murphy
Hi:

I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart  wrote:

> Dear List,
>
> I am doing some simulation in R and need basic help!
>
> I have a list of animal families for which i know the number of species in
> each family.
>
> I am working under the assumption that a species has a 7.48% chance of
> being at risk.
>

Is this over all families, or within a particular family? If the former, why
does a distinction of family matter?

>
> I want to simulate the number of species expected to be at risk under a
> random binomial distribution with 10,000 randomizations.
>

I guess you've stated the p, but what's the n? The number of species in each
family? If you're simulating on a family by family basis, then it would seem
that a binomial(nspecies, 0.0748) distribution would be the reference.
Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional to
size) over all families? The latter is doable, but it would require a
two-stage simulation: one to randomly select a family and then to randomly
select a species.
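
To make the per-family version concrete, here is a minimal sketch (family
names and sizes invented):

fam <- data.frame(F = c("FamA", "FamB", "FamC"), SP = c(80, 12, 150))
p <- 0.0748

## 10,000 randomizations: one binomial draw per family per column
sim <- replicate(10000, rbinom(nrow(fam), size = fam$SP, prob = p))

rowMeans(sim)   # simulated expected number at risk per family
fam$SP * p      # analytic expectation, for comparison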

Dennis


>
> I am relatively new to this field and would greatly appreciate an
> "idiot-proof" response, i.e. how should the data be entered into R? I was
> thinking of using read.table, header = T, where the table has F = Family
> Name, and SP = Number of species in that family?
>
> John
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with R

2010-10-15 Thread Philipp Pagel
On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:

> "FATAL ERROR: unable to restore saved data in .RDATA"

Without more information it's hard to know what exactly went wrong.

Anyway, the message most likely means that the .RData file got
corrupted. Deleting it should solve the problem. Note that this means
that you will get an empty workspace and have to recreate whatever
data was in it before.

> I decided to uninstall the copy (an R 2.11.0) and installed a new
> version (2.11.1) but I'm still receiving the same message. When I
> click OK the window closes.

Re-installation of R will most likely not fix this (unless a change in
the format of the .RData files had occurred - but to my knowledge no
such thing has happened, recently.)

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with R

2010-10-15 Thread peter dalgaard

On Oct 15, 2010, at 12:37 , Philipp Pagel wrote:

> On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:
> 
>> "FATAL ERROR: unable to restore saved data in .RDATA"
> 
> Without more information it's hard to know what exactly went wrong.
> 
> Anyway, the message most likely means that the .RData file got
> corrupted. Deleting it should solve the problem. Note that this means
> that you will get an empty workspace and have to recreate whatever
> data was in it before.

Renaming the file is a better idea. Then at least you have some chance of being 
able to load() it into an R Session and recover things from the workspace. No 
guarantees, though.

-pd

> 
>> I decided to uninstall the copy (an R 2.11.0) and installed a new
>> version (2.11.1) but I'm still receiving the same message. When I
>> click OK the window closes.
> 
> Re-installation of R will most likely not fix this (unless a change in
> the format of the .RData files had occurred - but to my knowledge no
> such thing has happened, recently.)
> 
> cu
>   Philipp
> 
> -- 
> Dr. Philipp Pagel
> Lehrstuhl für Genomorientierte Bioinformatik
> Technische Universität München
> Wissenschaftszentrum Weihenstephan
> Maximus-von-Imhof-Forum 3
> 85354 Freising, Germany
> http://webclu.bio.wzw.tum.de/~pagel/
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with R

2010-10-15 Thread Petr PIKAL
Sometimes such a message appears when you try to open an .RData file in an 
environment where the packages that were in use when the file was created 
are not installed. Then it is possible to just install the necessary 
packages. Without the whole story it is impossible to say what the real 
cause of the error is.

Regards
Petr

r-help-boun...@r-project.org wrote on 15.10.2010 12:45:30:

> 
> On Oct 15, 2010, at 12:37 , Philipp Pagel wrote:
> 
> > On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:
> > 
> >> "FATAL ERROR: unable to restore saved data in .RDATA"
> > 
> > Without more information it's hard to know what exactly went wrong.
> > 
> > Anyway, the message most likely means that the .RData file got
> > corrupted. Deleting it should solve the problem. Note that this means
> > that you will get an empty workspace and have to recreate whatever
> > data was in it before.
> 
> Renaming the file is a better idea. Then at least you have some chance 
of 
> being able to load() it into an R Session and recover things from the 
> workspace. No guarantees, though.
> 
> -pd
> 
> > 
> >> I decided to uninstall the copy (an R 2.11.0) and installed a new
> >> version (2.11.1) but I'm still receiving the same message. When I
> >> click OK the window closes.
> > 
> > Re-installation of R will most likely not fix this (unless a change in
> > the format of the .RData files had occurred - but to my knowledge no
> > such thing has happened, recently.)
> > 
> > cu
> >Philipp
> > 
> > -- 
> > Dr. Philipp Pagel
> > Lehrstuhl für Genomorientierte Bioinformatik
> > Technische Universität München
> > Wissenschaftszentrum Weihenstephan
> > Maximus-von-Imhof-Forum 3
> > 85354 Freising, Germany
> > http://webclu.bio.wzw.tum.de/~pagel/
> > 
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> -- 
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] creating 'all' sum contrasts

2010-10-15 Thread Michael Hopkins


OK, my last question didn't get any replies so I am going to try and ask a 
different way.

When I generate contrasts with contr.sum() for a 3 level categorical variable I 
get the 2 orthogonal contrasts:

> contr.sum( c(1,2,3) )
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1

This provides the contrasts <1-3> and <2-3> as expected.  But I also want it to 
create <1-2> (i.e. <1-3> - <2-3>).  So in general I want all possible 
orthogonal contrasts - think of it as the contrasts for all pairwise 
comparisons between the levels.

Are there any options for contrast() or other functions/libraries that will 
allow me to do this automatically?  I could go through and create new columns 
but I am using this for complex multi-factor experiments with varying levels 
per factor and fitting the models within other functions (e.g. regsubsets()) so 
an automatic solution using what is already available would be far preferable.


Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread John Haart
Hi Denis and list

Thanks for this , and sorry for not providing enough information

First let me put the study into a bit more context : -

I know the number of species at risk in each family; what I am asking is: "Is 
risk random according to family, or do certain families have a disproportionate 
number of at-risk species?"

My idea was to randomly allocate risk to the families based on the criteria 
below (binomial(nspecies, 0.0748)) and then compare this to the "true data" and 
see if there was a significant difference.

So in answer to your questions (assuming my method is correct!):

> Is this over all families, or within a particular family? If the former, why
> does a distinction of family matter?

Within a particular family - this is because I am looking to see whether risk in 
the "observed" data set is random with respect to family, so this will provide the 
baseline to compare against.

> I guess you've stated the p, but what's the n? The number of species in each
> family?

This varies widely; for instance, I have some families that are monotypic 
(with 1 species) and others with 100+ species.


> Assuming you have multiple families, do you want separate simulations per
> family, or do you want to do some sort of weighting (perhaps proportional to
> size) over all families?

I am assuming I want some sort of weighting. This is because I want to 
calculate the number of species expected to be at risk in EACH family under the 
random binomial distribution (assuming every species has a 7.48% chance of 
being at risk).

Thanks

John




On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

Hi:

I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart  wrote:

> Dear List,
> 
> I am doing some simulation in R and need basic help!
> 
> I have a list of animal families for which i know the number of species in
> each family.
> 
> I am working under the assumption that a species has a 7.48% chance of
> being at risk.
> 

Is this over all families, or within a particular family? If the former, why
does a distinction of family matter?

> 
> I want to simulate the number of species expected to be at risk under a
> random binomial distribution with 10,000 randomizations.
> 

I guess you've stated the p, but what's the n? The number of species in each
family? If you're simulating on a family by family basis, then it would seem
that a binomial(nspecies, 0.0748) distribution would be the reference.
Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional to
size) over all families? The latter is doable, but it would require a
two-stage simulation: one to randomly select a family and then to randomly
select a species.

Dennis


> 
> I am relatively knew to this field and would greatly appreciate a
> "idiot-proof" response, I.e how should the data be entered into R? I was
> thinking of using read.table, header = T, where the table has F = Family
> Name, and SP = Number of species in that family?
> 
> John
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread Michael Bedward
Hi John,

The word "species" attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In
particular, I don't see what you need the simulation for.

If family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.

If you do your simulation, that's the result you'll get.  Perhaps to
initially identify families with disproportionate observed extinction
rates, all you need is the dbinom function?
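
For instance (made-up counts), the chance of seeing at least 'obs' at-risk
species in a family of Fn species under the binomial model -- here via
pbinom() for the tail probability:

Fn  <- 80      # species in the family (made up)
obs <- 12      # observed number at risk (made up)
p   <- 0.0748
pbinom(obs - 1, size = Fn, prob = p, lower.tail = FALSE)   # P(X >= obs)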

Michael


On 15 October 2010 22:29, John Haart  wrote:
> Hi Denis and list
>
> Thanks for this , and sorry for not providing enough information
>
> First let me put the study into a bit more context : -
>
> I know the number of species at risk in each family, what i am asking  is "Is 
> risk random according to family or do certain families have a 
> disproportionate number of at risk species?"
>
> My idea was to randomly allocate risk to the families based on the criteria 
> below (binomial(nspecies, 0.0748)) and then compare this to the "true data" 
> and see if there was a significant difference.
>
> So in answer to your questions, (assuming my method is correct !)
>
>> Is this over all families, or within a particular family? If the former, why
>> does a distinction of family matter?
>
> Within a particular family  - this is because i am looking to see if risk in 
> the "observed" data set is random in respect to family so this will provide 
> the baseline to compare against.
>
>> I guess you've stated the p, but what's the n? The number of species in each
>> family?
>
> This varies largely, for instance i have some families that are monotypic  
> (with 1 species) and then i have other families with 100+ species
>
>
>> Assuming you have multiple families, do you want separate simulations per
>> family, or do you want to do some sort of weighting (perhaps proportional to
>> size) over all families?
>
> I am assuming i want some sort of weighting. This is because i am wanting to 
> calculate the number of species expected to be at risk in EACH family under 
> the random binomial distribution ( assuming every species has a 7.48% chance 
> of being at risk.
>
> Thanks
>
> John
>
>
>
>
> On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
>
> Hi:
>
> I don't believe you've provided quite enough information just yet...
>
> On Fri, Oct 15, 2010 at 2:22 AM, John Haart  wrote:
>
>> Dear List,
>>
>> I am doing some simulation in R and need basic help!
>>
>> I have a list of animal families for which i know the number of species in
>> each family.
>>
>> I am working under the assumption that a species has a 7.48% chance of
>> being at risk.
>>
>
> Is this over all families, or within a particular family? If the former, why
> does a distinction of family matter?
>
>>
>> I want to simulate the number of species expected to be at risk under a
>> random binomial distribution with 10,000 randomizations.
>>
>
> I guess you've stated the p, but what's the n? The number of species in each
> family? If you're simulating on a family by family basis, then it would seem
> that a binomial(nspecies, 0.0748) distribution would be the reference.
> Assuming you have multiple families, do you want separate simulations per
> family, or do you want to do some sort of weighting (perhaps proportional to
> size) over all families? The latter is doable, but it would require a
> two-stage simulation: one to randomly select a family and then to randomly
> select a species.
>
> Dennis
>
>
>>
>> I am relatively knew to this field and would greatly appreciate a
>> "idiot-proof" response, I.e how should the data be entered into R? I was
>> thinking of using read.table, header = T, where the table has F = Family
>> Name, and SP = Number of species in that family?
>>
>> John
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create Arrays

2010-10-15 Thread Gerrit Eichner

Hi, Doug,

maybe


HH <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)
TT <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)
columnnumbers <- c(0, 0, 0, 3, 0, 0, 0, 2, 0)

TMP <- lapply( seq( columnnumbers),
               function( i, CN, M) {
                 if( CN[i] == 0) as.matrix( M[, i]) else
                   matrix( -1, nrow( M), CN[i])
               }, CN = columnnumbers, M = rbind( HH, TT))

do.call( cbind, TMP)



gets close to what you want (after some adaptation, of course).


 HTH  --  Gerrit

-
AOR Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109http://www.uni-giessen.de/cms/eichner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread John Haart
Hi Michael,

Thanks for this - the reason I am following this approach is that it appeared 
in a paper I was reading, and I thought it was an interesting angle to take.

The paper is 

Vamosi & Wilson (2008). Nonrandom extinction leads to elevated loss of 
angiosperm evolutionary history. Ecology Letters 11: 1047–1053.

and the specific method I am following states:

> We calculated the number of species expected to be at risk in each family 
> under a random binomial distribution in 10 000 randomizations [generated 
> using R version 2.6.0 (R Development Team 2007)] assuming every species has a 
> 7.48% chance of being at risk. 

I guess the reason I am doing the simulation is that I am not hugely 
statistically minded and the paper was asking the same question I am interested 
in answering :).

So following your approach -

> if family F has Fn species, your random expectation is that p * Fn of
> them will be at risk (p = 0.0748). The variance on that expectation
> will be p * (1-p) * Fn.


Family F = Bromeliaceae, with Fn = 80, p = 0.0748
random expectation = p * Fn = 0.0748 * 80 = 5.984
variance = p * (1-p) * Fn = (0.0748 * 0.9252) * 80 = 5.5363968

So the random expectation is that the Bromeliaceae will have 6 species at risk, 
if risk is assigned randomly?

So if I do this for all the families, it will be the same as doing the 
simulation experiment outlined in the method above?

Thanks

John




On 15 Oct 2010, at 12:49, Michael Bedward wrote:

Hi John,

The word "species" attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In
particular, I don't see what you need the simulation for.

If family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.

If you do your simulation that's the result you'll get.  Perhaps to
initial identify families with disproportionate observed extinction
rates all you need is the dbinom function ?

Michael


On 15 October 2010 22:29, John Haart  wrote:
> Hi Denis and list
> 
> Thanks for this , and sorry for not providing enough information
> 
> First let me put the study into a bit more context : -
> 
> I know the number of species at risk in each family, what i am asking  is "Is 
> risk random according to family or do certain families have a 
> disproportionate number of at risk species?"
> 
> My idea was to randomly allocate risk to the families based on the criteria 
> below (binomial(nspecies, 0.0748)) and then compare this to the "true data" 
> and see if there was a significant difference.
> 
> So in answer to your questions, (assuming my method is correct !)
> 
>> Is this over all families, or within a particular family? If the former, why
>> does a distinction of family matter?
> 
> Within a particular family  - this is because i am looking to see if risk in 
> the "observed" data set is random in respect to family so this will provide 
> the baseline to compare against.
> 
>> I guess you've stated the p, but what's the n? The number of species in each
>> family?
> 
> This varies largely, for instance i have some families that are monotypic  
> (with 1 species) and then i have other families with 100+ species
> 
> 
>> Assuming you have multiple families, do you want separate simulations per
>> family, or do you want to do some sort of weighting (perhaps proportional to
>> size) over all families?
> 
> I am assuming i want some sort of weighting. This is because i am wanting to 
> calculate the number of species expected to be at risk in EACH family under 
> the random binomial distribution ( assuming every species has a 7.48% chance 
> of being at risk.
> 
> Thanks
> 
> John
> 
> 
> 
> 
> On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
> 
> Hi:
> 
> I don't believe you've provided quite enough information just yet...
> 
> On Fri, Oct 15, 2010 at 2:22 AM, John Haart  wrote:
> 
>> Dear List,
>> 
>> I am doing some simulation in R and need basic help!
>> 
>> I have a list of animal families for which i know the number of species in
>> each family.
>> 
>> I am working under the assumption that a species has a 7.48% chance of
>> being at risk.
>> 
> 
> Is this over all families, or within a particular family? If the former, why
> does a distinction of family matter?
> 
>> 
>> I want to simulate the number of species expected to be at risk under a
>> random binomial distribution with 10,000 randomizations.
>> 
> 
> I guess you've stated the p, but what's the n? The number of species in each
> family? If you're simulating on a family by family basis, then it would seem
> that a binomial(nspecies, 0.0748) distribution would be the reference.
> Assuming you have multiple families, do you want separate simulations per
> family, or do you want to do some sort of weighting (perhaps proportional to
> size) over all families? The latter is doable, but it would require a
> two-stage simulation: one to randomly select a family and then to random

Re: [R] Nonlinear Regression Parameter Shared Across Multiple Data Sets

2010-10-15 Thread Jared Blashka
Looking at the source for nlrob, it looks like it saves the coefficients
from the results of running an nls and then passes those coefficients back
into the next nls request. The issue that it's running into is that nls
returns the coefficients as upper, LOGEC501, LOGEC502, and LOGEC503, rather
than just upper and a vector named LOGEC50. Does anyone know a way to
restructure the formula/start parameter so that coef returns a vector
instead of each element individually? Right now, I've had to 'hack' nlrob so
it recombines similarly named elements into a vector, but was wondering if
there was a way to accomplish the end goal without those measures.
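
For reference, the plain-nls version of the shared-'upper' fit does accept
the vector start; a self-contained sketch on simulated data (the true
parameter values here are invented):

set.seed(1)
dset <- rep(1:3, each = 20)
X <- rep(seq(-10, -6, length.out = 20), times = 3)
Y <- 100 / (1 + 10^(X - c(-8.6, -8.3, -8.0)[dset])) + rnorm(60, sd = 2)
all <- data.frame(X, Y, dset)

fit <- nls(Y ~ upper / (1 + 10^(X - LOGEC50[dset])), data = all,
           start = list(upper = max(all$Y), LOGEC50 = c(-8.5, -8.5, -8.5)))
coef(fit)   # names come back as upper, LOGEC501, LOGEC502, LOGEC503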

Thanks,
Jared

On Wed, Oct 13, 2010 at 3:14 PM, Jared Blashka wrote:

> As an addendum to my question, I'm attempting to apply the solution to the
> robust non-linear regression function nlrob from the robustbase package, and
> it doesn't work in that situation. I'm getting
>
> allRobustFit <- nlrob(Y ~ (upper)/(1+10^(X-LOGEC50[dset])), data=all
> ,start=list(upper=max(all$Y),LOGEC50=c(-8.5,-8.5,-8.5)))
> Error in nls(formula, data = data, start = start, algorithm = algorithm,
>  :
>   parameters without starting value in 'data': LOGEC50
>
> I'm guessing this is because the nlrob function doesn't know what to do
> with a vector for a start value. Am I correct and is there another method of
> using nlrob in the same way?
>
> Thanks,
> Jared
>
> On Tue, Oct 12, 2010 at 8:58 AM, Jared Blashka wrote:
>
>> Thanks so much! It works great.
>>
>> I had thought the way to do it relied on combining the data sets, but I
>> couldn't figure out how to alter the formula to work with the combination.
>>
>> Jared
>>
>>
>> On Tue, Oct 12, 2010 at 7:07 AM, Keith Jewell wrote:
>>
>>>
>>> "Jared Blashka"  wrote in message
>>> news:aanlktinffmudugqnkudvr=fmf0wrrtsbjxjexuki_...@mail.gmail.com...
>>> > I'm working with 3 different data sets and applying this non-linear
>>> > regression formula to each of them.
>>> >
>>> > nls(Y ~ (upper)/(1+10^(X-LOGEC50)), data=std_no_outliers,
>>> > start=list(upper=max(std_no_outliers$Y),LOGEC50=-8.5))
>>> >
>>> > Previously, all of the regressions were calculated in Prism, but I'd
>>> like
>>> > to
>>> > be able to automate the calculation process in a script, which is why
>>> I'm
>>> > trying to move to R. The issue I'm running into is that previously, in
>>> > Prism, I was able to calculate a shared value for a constraint so that
>>> all
>>> > three data sets shared the same value, but have other constraints
>>> > calculated
>>> > separately. So Prism would figure out what single value for the
>>> constraint
>>> > in question would work best across all three data sets. For my formula,
>>> > each
>>> > data set needs it's own LOGEC50 value, but the upper value should be
>>> the
>>> > same across the 3 sets. Is there a way to do this within R, or with a
>>> > package I'm not aware of, or will I need to write my own nls function
>>> to
>>> > work with multiple data sets, because I've got no idea where to start
>>> with
>>> > that.
>>> >
>>> > Thanks,
>>> > Jared
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> An approach which works for me (code below to illustrate principle, not
>>> tried...)
>>>
>>> 1) combine all three "data sets" into one dataframe with a column (e.g.
>>> dset) indicating data set (1, 2 or 3)
>>>
>>> 2) express your formula with upper as single valued and LOGEC50 as a
>>> vector indexed by dset e.g.
>>> Y ~ upper/(1+10^(X-LOGEC50[dset]))
>>>
>>> 3) in the start list, make LOGEC50 a vector e.g. using -8.5 as start for
>>> all three LOGEC50 values
>>>   start = list(upper=max(std_no_outliers$Y), LOGEC50=c(-8.5, -8.5, -8.5))
>>>
>>> Hope that helps,
>>>
>>> Keith J
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scores for a new observation from PCAgrid() in pcaPP

2010-10-15 Thread Kari Ruohonen
Hi,
I am trying to compute scores for a new observation based on a previously
computed PCA by PCAgrid() function in the pcaPP package. My data has
more variables than observations.

Here is an imaginary data set to show the case:
> library(MASS)    # for mvrnorm
> n.samples<-30
> n.bins<-1000
> x.sim<-rep(0,n.bins)
> V.sim<-diag(n.bins)
> mtx<-array(dim=c(n.samples,n.bins))
> for(i in 1:n.samples) mtx[i,]<-mvrnorm(1,x.sim,V.sim)

With prcomp() I can do the following:

> pc.pr2<-prcomp(mtx,scale=TRUE)
> newscr.pr2<-scale(t(mtx[1,]),pc.pr2$center,pc.pr2$scale)%*%pc.pr2$rotation

The latter computes the scores for the first row of mtx. I can verify
that the scores are the same as computed originally by comparing with

> pc.pr2$x[1,] # that will print out the scores for the first
observation

Now, if I tried the same with PCAgrid() as follows:

> library(pcaPP)
> pc.pp2<-PCAgrid(mtx,k=min(dim(mtx)),scale=mad)
> newscr.pp2<-scale(t(mtx[1,]),pc.pp2$center,pc.pp2$scale)%*%pc.pp2$loadings

The newscr.pp2 do not match the scores in the pc.pp2 object as can be
verified by comparing with:
> pc.pp2$x[1,] 

I wonder what I am missing? Or is it so that for the grid method such
computation of scores from the loadings and original observations is not
possible?

For the case p > n.

Many thanks for any help,
Kari

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging and working with BIG data sets. Is sqldf the best way??

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 6:14 AM, Chris Howden
 wrote:
> Thanks for the advice Gabor,
>
> I was indeed not starting and finishing with sqldf(). Which was why it was
> not working for me. Please forgive a blatantly obvious mistake.
>
>
> I have tried what U suggested and unfortunately R is still having problems
> doing the join. The problem seems to be one of memory since I am receiving
> the below error message when I run the natural join using sqldf.
> Error: cannot allocate vector of size 250.0 Mb
> Timing stopped at: 32.8 0.48 34.79
>

Specify the dbname argument so that sqldf uses an external, rather than
in-memory, database.  See example 6b (also shown in several other
examples) on the home page:

   sqldf(c("create...", "create...", "select..."), dbname = tempfile())
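
A self-contained toy version of that pattern (column names borrowed from
this thread, sizes shrunk so it runs quickly):

    library(sqldf)
    A <- data.frame(POINTID = 1:100000, alistair = runif(100000))
    B <- data.frame(POINTID = 1:100000, alistair_range = runif(100000))
    out <- sqldf(c("create index ai1 on A(POINTID)",
                   "create index ai2 on B(POINTID)",
                   "select * from main.A natural join main.B"),
                 dbname = tempfile())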

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread Michael Bedward
Hi John,

I haven't read that particular paper but in answer to your question...

> So if i do this for all the families it will be the same as doing the 
> simulation experiment
> outline in the method above?

Yes :)
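
A quick numerical check of that claim, using the Bromeliaceae figures from
your message:

p <- 0.0748; Fn <- 80
x <- rbinom(10000, size = Fn, prob = p)
mean(x)   # close to p * Fn         = 5.984
var(x)    # close to p * (1-p) * Fn = 5.536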

Michael


On 15 October 2010 23:18, John Haart  wrote:
> Hi Michael,
>
> Thanks for this - the reason i am following this approach is that it appeared 
> in a paper i was reading, and i thought it was a interesting angle to take
>
> The paper is
>
> Vamosi & Wilson, 2008. Nonrandom extinction leads to elevated loss of 
> angiosperm evolutionary history. Ecology Letters, (2008) 11: 1047–1053.
>
> and the specific method i am following states :-
>
>> We calculated the number of species expected to be at risk in each family 
>> under a random binomial distribution in 10 000 randomizations [generated 
>> using R version 2.6.0 (R Development Team 2007)] assuming every species has 
>> a 7.48% chance of being at risk.
>
> I guess the reason i am doing the simulation is because i am not hugely 
> statistically minded and the paper was asking the same question i am 
> interested in answering :).
>
> So following your approach -
>
>> if family F has Fn species, your random expectation is that p * Fn of
>> them will be at risk (p = 0.0748). The variance on that expectation
>> will be p * (1-p) * Fn.
>
>
> Family f = Bromeliaceae , with Fn = 80, p=0.0748
> random expectation = p*Fn = (0.0748*80) = 5.984
> variance = p * (1-p) * Fn = (0.0748*0.9252) *80 = 5.5363968
>
> So the random expectation is that the Bromeliaceae will have 6 species at 
> risk, if risk is assigned randomly?
>
> So if i do this for all the families it will be the same as doing the 
> simulation experiment outline in the method above?
>
> Thanks
>
> John
>
>
>
>
> On 15 Oct 2010, at 12:49, Michael Bedward wrote:
>
> Hi John,
>
> The word "species" attracted my attention :)
>
> Like Dennis, I'm not sure I understand your idea properly. In
> particular, I don't see what you need the simulation for.
>
> If family F has Fn species, your random expectation is that p * Fn of
> them will be at risk (p = 0.0748). The variance on that expectation
> will be p * (1-p) * Fn.
>
> If you do your simulation that's the result you'll get.  Perhaps to
> initial identify families with disproportionate observed extinction
> rates all you need is the dbinom function ?
>
> Michael
>
>
> On 15 October 2010 22:29, John Haart  wrote:
>> Hi Denis and list
>>
>> Thanks for this , and sorry for not providing enough information
>>
>> First let me put the study into a bit more context : -
>>
>> I know the number of species at risk in each family, what i am asking  is 
>> "Is risk random according to family or do certain families have a 
>> disproportionate number of at risk species?"
>>
>> My idea was to randomly allocate risk to the families based on the criteria 
>> below (binomial(nspecies, 0.0748)) and then compare this to the "true data" 
>> and see if there was a significant difference.
>>
>> So in answer to your questions, (assuming my method is correct !)
>>
>>> Is this over all families, or within a particular family? If the former, why
>>> does a distinction of family matter?
>>
>> Within a particular family  - this is because i am looking to see if risk in 
>> the "observed" data set is random in respect to family so this will provide 
>> the baseline to compare against.
>>
>>> I guess you've stated the p, but what's the n? The number of species in each
>>> family?
>>
>> This varies largely, for instance i have some families that are monotypic  
>> (with 1 species) and then i have other families with 100+ species
>>
>>
>>> Assuming you have multiple families, do you want separate simulations per
>>> family, or do you want to do some sort of weighting (perhaps proportional to
>>> size) over all families?
>>
>> I am assuming i want some sort of weighting. This is because i am wanting to 
>> calculate the number of species expected to be at risk in EACH family under 
>> the random binomial distribution ( assuming every species has a 7.48% chance 
>> of being at risk.
>>
>> Thanks
>>
>> John
>>
>>
>>
>>
>> On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
>>
>> Hi:
>>
>> I don't believe you've provided quite enough information just yet...
>>
>> On Fri, Oct 15, 2010 at 2:22 AM, John Haart  wrote:
>>
>>> Dear List,
>>>
>>> I am doing some simulation in R and need basic help!
>>>
>>> I have a list of animal families for which i know the number of species in
>>> each family.
>>>
>>> I am working under the assumption that a species has a 7.48% chance of
>>> being at risk.
>>>
>>
>> Is this over all families, or within a particular family? If the former, why
>> does a distinction of family matter?
>>
>>>
>>> I want to simulate the number of species expected to be at risk under a
>>> random binomial distribution with 10,000 randomizations.
>>>
>>
>> I guess you've stated the p, but what's the n? The number of species in each
>> family? If you're simulating on a fami

Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Douglas Bates
Although I know there is another message in this thread I am replying
to this message to be able to include the whole discussion to this
point.

Gregor mentioned the possibility of extending the compiled code for
cumsum so that it would handle the matrix case.  The work by Dirk
Eddelbuettel and Romain Francois on developing the Rcpp package make
it surprisingly easy to create compiled code for this task.  I am
copying the Rcpp-devel list on this in case one of the readers of that
list has time to create a sample implementation before I can get to
it.  It's midterm season and I have to tackle the stack of blue books
on my desk before doing fun things like writing code.
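
In the meantime, here is a rough sketch of what such a function could look
like, using the inline package's cxxfunction() to compile an Rcpp snippet
(an illustration only, not the eventual implementation):

library(inline)
src <- '
  Rcpp::NumericMatrix x(xs);
  int nr = x.nrow(), nc = x.ncol();
  Rcpp::NumericMatrix out(nr, nc);
  for (int i = 0; i < nr; i++) {
    double acc = 0.0;
    for (int j = 0; j < nc; j++) {
      acc += x(i, j);        // running row total
      out(i, j) = acc;
    }
  }
  return out;
'
rowCumsums <- cxxfunction(signature(xs = "matrix"), src, plugin = "Rcpp")

probs <- matrix(runif(50), nrow = 5)
all.equal(rowCumsums(probs), t(apply(probs, 1, cumsum)))   # TRUE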

On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley  wrote:
> Hi,
>
> You might look at Reduce().  It seems faster.  I converted the matrix
> to a list in an incredibly sloppy way (which you should not emulate)
> because I cannot think of  the simple way.
>
>> probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise 
>> probabilities
>> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>> F[,1] <- probs[,1,drop=TRUE];
>> add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
>>
>>
>> system.time(F.slow <- t(apply(probs, 1, cumsum)))
>   user  system elapsed
>  36.758   0.416  42.464
>>
>> system.time(for (cc in 2:ncol(F)) {
> +  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
> + })
>   user  system elapsed
>  0.980   0.196   1.328
>>
>> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], 
>> probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
>   user  system elapsed
>  0.420   0.072   0.539
>>
>
> .539 seconds for 10 vectors each with 1 million elements does not seem
> bad.  For 30,000 each, on my system it took .014 seconds, which for
> 200,000 iterations, I estimated as:
>
>> (200000*.014)/60/60
> [1] 0.778
>
> (and this is on a stone age system with a single core processor and
> only 756MB of RAM)
>
> It should not be difficult to get the list back to a matrix.
>
> Cheers,
>
> Josh
>
>
>
> On Thu, Oct 14, 2010 at 11:51 PM, Gregor  wrote:
>> Dear all,
>>
>> Maybe the "easiest" solution: Is there anything that speaks against 
>> generalizing
>> cumsum from base to cope with matrices (as is done in matlab)? E.g.:
>>
>> "cumsum(Matrix, 1)"
>> equivalent to
>> "apply(Matrix, 1, cumsum)"
>>
>> The main advantage could be optimized code if the Matrix is extremely non-square
>> (e.g. 100,000x10), but the summation is done over the short side (in this 
>> case 10).
>> apply would practically yield a loop over 100,000 elements, and 
>> vectorization w.r.t.
>> the long side (loop over 10 elements) provides considerable efficiency gains.
>>
>> Many regards,
>> Gregor
>>
>>
>>
>>
>> On Tue, 12 Oct 2010 10:24:53 +0200
>> Gregor  wrote:
>>
>>> Dear all,
>>>
>>> I am struggling with a (currently) cost-intensive problem: calculating the
>>> (non-normalized) cumulative distribution function, given the 
>>> (non-normalized)
>>> probabilities. something like:
>>>
>>> probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilities
>>> F <- t(apply(probs, 1, cumsum)) #SLOOOW!
>>>
>>> One (already faster, but for sure not ideal) solution - thanks to Henrik 
>>> Bengtsson:
>>>
>>> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>>> F[,1] <- probs[,1,drop=TRUE];
>>> for (cc in 2:ncol(F)) {
>>>   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>>> }
>>>
>>> In my case, probs is a (30,000 x 10) matrix, and I need to iterate this 
>>> step around
>>> 200,000 times, so speed is crucial. I currently can make sure to have no 
>>> NAs, but
>>> in order to extend matrixStats, this could be a nontrivial issue.
>>>
>>> Any ideas for speeding up this - probably routine - task?
>>>
>>> Thanks in advance,
>>> Gregor
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide comme

[R] split data with missing data condition

2010-10-15 Thread Jumlong Vongprasert
Dear all
I have data like this:
  x  y
  [1,] 59.74889  3.1317081
  [2,] 38.77629  1.7102589
  [3,]   NA  2.2312962
  [4,] 32.35268  1.3889621
  [5,] 74.01394  1.5361227
  [6,] 34.82584  1.1665412
  [7,] 42.72262  2.7870875
  [8,] 70.54999  3.3917257
  [9,] 59.37573  2.6763249
 [10,] 68.87422  1.9697770
 [11,] 19.00898  2.0584415
 [12,] 60.27915  2.5365194
 [13,] 50.76850  2.3943836
 [14,]   NA  2.2862790
 [15,] 39.01229  1.7924957

and I want to split the data into two sets: a data set of non-missing values
and a data set of missing values.
How can I do this?
Many Thanks.
Jumlong


-- 
Jumlong Vongprasert
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani
THAILAND
34000

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating 'all' sum contrasts

2010-10-15 Thread Berwin A Turlach
G'day Michael,

On Fri, 15 Oct 2010 12:09:07 +0100
Michael Hopkins  wrote:

> OK, my last question didn't get any replies so I am going to try and
> ask a different way.
> 
> When I generate contrasts with contr.sum() for a 3 level categorical
> variable I get the 2 orthogonal contrasts:
> 
> > contr.sum( c(1,2,3) )
>   [,1] [,2]
> 110
> 201
> 3   -1   -1

These two contrasts are *not* orthogonal.

> This provides the contrasts <1-3> and <2-3> as expected.  But I also
> want it to create <1-2> (i.e. <1-3> - <2-3>).  So in general I want
> all possible orthogonal contrasts - think of it as the contrasts for
> all pairwise comparisons between the levels.

You have to decide what you want.  The contrasts for all pairwise
comparisons between the levels or all possible orthogonal contrasts?

The latter is actually not well defined.  For a factor with p levels,
there would be p-1 orthogonal contrasts, which are only identifiable up to
rotation, hence there are infinitely many such sets.  But there are p(p-1)/2
pairwise comparisons.  So unless p=2, you have to decide what you want.

> Are there are any options for contrast() or other functions/libraries
> that will allow me to do this automatically?

Look at package multcomp, in particular functions glht and mcp, these
might help.
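
For example (toy data; "Tukey" requests all pairwise comparisons of the
factor levels):

library(multcomp)
set.seed(1)
dat <- data.frame(y = rnorm(30), f = gl(3, 10, labels = c("a", "b", "c")))
fit <- aov(y ~ f, data = dat)
summary(glht(fit, linfct = mcp(f = "Tukey")))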

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split data with missing data condition

2010-10-15 Thread Dimitris Rizopoulos

you can do the following:

mat <- cbind(x = runif(15, 50, 70), y = rnorm(15, 2))
mat[sample(15, 2), "x"] <- NA

na.x <- is.na(mat[, 1])
mat[na.x, ]
mat[!na.x, ]


I hope it helps.

Best,
Dimitris


On 10/15/2010 2:45 PM, Jumlong Vongprasert wrote:

Dear all
I have data like this:
   x  y
   [1,] 59.74889  3.1317081
   [2,] 38.77629  1.7102589
   [3,]   NA  2.2312962
   [4,] 32.35268  1.3889621
   [5,] 74.01394  1.5361227
   [6,] 34.82584  1.1665412
   [7,] 42.72262  2.7870875
   [8,] 70.54999  3.3917257
   [9,] 59.37573  2.6763249
  [10,] 68.87422  1.9697770
  [11,] 19.00898  2.0584415
  [12,] 60.27915  2.5365194
  [13,] 50.76850  2.3943836
  [14,]   NA  2.2862790
  [15,] 39.01229  1.7924957

and I want to spit data into two set of data,  data set of nonmising and
data set of missing.
How I can do this.
Many Thanks.
Jumlong




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split data with missing data condition

2010-10-15 Thread jim holtman
Try this:

> a <- read.table(textConnection(" x  y
+  59.74889  3.1317081
+  38.77629  1.7102589
+NA  2.2312962
+  32.35268  1.3889621
+  74.01394  1.5361227
+  34.82584  1.1665412
+  42.72262  2.7870875
+  70.54999  3.3917257
+  59.37573  2.6763249
+   68.87422  1.9697770
+   19.00898  2.0584415
+   60.27915  2.5365194
+   50.76850  2.3943836
+ NA  2.2862790
+   39.01229  1.7924957"), header=TRUE)
> a <- as.matrix(a)
> # good data
> a.good <- a[complete.cases(a),, drop=FALSE]
> a.bad <- a[!complete.cases(a),, drop=FALSE]
> a.good
 xy
 [1,] 59.74889 3.131708
 [2,] 38.77629 1.710259
 [3,] 32.35268 1.388962
 [4,] 74.01394 1.536123
 [5,] 34.82584 1.166541
 [6,] 42.72262 2.787088
 [7,] 70.54999 3.391726
 [8,] 59.37573 2.676325
 [9,] 68.87422 1.969777
[10,] 19.00898 2.058441
[11,] 60.27915 2.536519
[12,] 50.76850 2.394384
[13,] 39.01229 1.792496
> a.bad
      x        y
[1,] NA 2.231296
[2,] NA 2.286279
>


On Fri, Oct 15, 2010 at 8:45 AM, Jumlong Vongprasert
 wrote:
> Dear all
> I have data like this:
>              x          y
>  [1,] 59.74889  3.1317081
>  [2,] 38.77629  1.7102589
>  [3,]       NA  2.2312962
>  [4,] 32.35268  1.3889621
>  [5,] 74.01394  1.5361227
>  [6,] 34.82584  1.1665412
>  [7,] 42.72262  2.7870875
>  [8,] 70.54999  3.3917257
>  [9,] 59.37573  2.6763249
>  [10,] 68.87422  1.9697770
>  [11,] 19.00898  2.0584415
>  [12,] 60.27915  2.5365194
>  [13,] 50.76850  2.3943836
>  [14,]       NA  2.2862790
>  [15,] 39.01229  1.7924957
>
> and I want to split the data into two data sets, one of non-missing
> rows and one of missing rows.
> How can I do this?
> Many Thanks.
> Jumlong
>
>
> --
> Jumlong Vongprasert
> Institute of Research and Development
> Ubon Ratchathani Rajabhat University
> Ubon Ratchathani
> THAILAND
> 34000
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Color individual leaf labels in dendrogram

2010-10-15 Thread Kennedy

Hi,

I have performed a clustering of a matrix and plotted the result with
pltree. See code below. I want to color the labels of the leaves
individually. For example I want the label name "Node 2" to be plotted in
red. How do I do this?

Sincerely 

Henrik


  library(cluster) 

  D <- matrix(nr=4,nc=4)
  rownames(D) <- c("Node 1","Node 2","Node 3","Node 4")
  D[1,] <- c(0,.6,.1,.7)
  D[2,] <- c(.6,.0,.3,.9)
  D[3,] <- c(.1,.3,0,.9)
  D[4,] <- c(.7,.9,.9,0)

  C <- agnes(D,diss=T,method="complete")

  pltree(C)

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Color-individual-leaf-labels-in-dendrogram-tp2996982p2996982.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split data with missing data condition

2010-10-15 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org napsal dne 15.10.2010 15:00:46:

> you can do the following:
> 
> mat <- cbind(x = runif(15, 50, 70), y = rnorm(15, 2))
> mat[sample(15, 2), "x"] <- NA
> 
> na.x <- is.na(mat[, 1])
> mat[na.x, ]
> mat[!na.x, ]

Or if you have missing data in several columns and you want to select only 
those rows which are complete, use complete.cases

mat[complete.cases(mat), ]
mat[!complete.cases(mat), ]

Regards
Petr

> 
> 
> I hope it helps.
> 
> Best,
> Dimitris
> 
> 
> On 10/15/2010 2:45 PM, Jumlong Vongprasert wrote:
> > Dear all
> > I have data like this:
> >x  y
> >[1,] 59.74889  3.1317081
> >[2,] 38.77629  1.7102589
> >[3,]   NA  2.2312962
> >[4,] 32.35268  1.3889621
> >[5,] 74.01394  1.5361227
> >[6,] 34.82584  1.1665412
> >[7,] 42.72262  2.7870875
> >[8,] 70.54999  3.3917257
> >[9,] 59.37573  2.6763249
> >   [10,] 68.87422  1.9697770
> >   [11,] 19.00898  2.0584415
> >   [12,] 60.27915  2.5365194
> >   [13,] 50.76850  2.3943836
> >   [14,]   NA  2.2862790
> >   [15,] 39.01229  1.7924957
> >
> > and I want to split the data into two data sets, one of non-missing
> > rows and one of missing rows.
> > How can I do this?
> > Many Thanks.
> > Jumlong
> >
> >
> 
> -- 
> Dimitris Rizopoulos
> Assistant Professor
> Department of Biostatistics
> Erasmus University Medical Center
> 
> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
> Tel: +31/(0)10/7043478
> Fax: +31/(0)10/7043014
> Web: http://www.erasmusmc.nl/biostatistiek/
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Rcpp-devel] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Romain Francois

Hi,

This would probably deserve some abstraction, we had C++ versions of 
apply in our TODO list for some time, but here is a shot:



require( Rcpp )
require( inline )
require( rbenchmark )   # for benchmark() below


f.Rcpp <- cxxfunction( signature( x = "matrix" ), '

NumericMatrix input( x ) ;
NumericMatrix output  = clone( input ) ;

int nr = input.nrow(), nc = input.ncol() ;
NumericVector tmp( nr );
for( int i=0; i<nc; i++){
    // add column i to the running sums and copy them into the output
    tmp = tmp + input.column(i) ;
    NumericMatrix::Column target( output, i ) ;
    std::copy( tmp.begin(), tmp.end(), target.begin() ) ;
}
return output ;

', plugin = "Rcpp" )

f.R <- function( x ) t( apply( x, 1, cumsum ) )   # plain R version used below

probs <- t(matrix(rep(1:1000), nrow=10)) # matrix with row-wise probabilities


bm <- benchmark(
f.R( probs ),
f.Rcpp( probs ),
columns=c("test", "elapsed", "relative", "user.self", "sys.self"),
order="relative",
replications=10)
print( bm )


I get this on my iMac with R 2.12.0 and Rcpp as just released to CRAN.

$ Rscript cumsum.R
           test elapsed  relative user.self sys.self
2 f.Rcpp(probs)   4.614  1.000000     4.375    0.239
1    f.R(probs)  68.645 14.877556    67.765    0.877

When we implement "apply" in C++, we will probably leverage loop 
unrolling to achieve greater performance.


Romain

On 15/10/10 14:34, Douglas Bates wrote:

Although I know there is another message in this thread I am replying
to this message to be able to include the whole discussion to this
point.

Gregor mentioned the possibility of extending the compiled code for
cumsum so that it would handle the matrix case.  The work by Dirk
Eddelbuettel and Romain Francois on developing the Rcpp package make
it surprisingly easy to create compiled code for this task.  I am
copying the Rcpp-devel list on this in case one of the readers of that
list has time to create a sample implementation before I can get to
it.  It's midterm season and I have to tackle the stack of blue books
on my desk before doing fun things like writing code.

On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley  wrote:

Hi,

You might look at Reduce().  It seems faster.  I converted the matrix
to a list in an incredibly sloppy way (which you should not emulate)
because I cannot think of  the simple way.


probs<- t(matrix(rep(1:1000), nrow=10)) # matrix with row-wise probabilities
F<- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
F[,1]<- probs[,1,drop=TRUE];
add<- function(x) {Reduce(`+`, x, accumulate = TRUE)}


system.time(F.slow<- t(apply(probs, 1, cumsum)))

   user  system elapsed
  36.758   0.416  42.464


system.time(for (cc in 2:ncol(F)) {

+  F[,cc]<- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
+ })
   user  system elapsed
  0.980   0.196   1.328


system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], 
probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))

   user  system elapsed
  0.420   0.072   0.539




.539 seconds for 10 vectors each with 1 million elements does not seem
bad.  For 30,000 each, on my system it took .014 seconds, which for
200,000 iterations, I estimated as (in hours):


(200000*.014)/60/60

[1] 0.778

(and this is on a stone age system with a single core processor and
only 756MB of RAM)

It should not be difficult to get the list back to a matrix.

Cheers,

Josh



On Thu, Oct 14, 2010 at 11:51 PM, Gregor  wrote:

Dear all,

Maybe the "easiest" solution: Is there anything that speaks against generalizing
cumsum from base to cope with matrices (as is done in matlab)? E.g.:

"cumsum(Matrix, 1)"
equivalent to
"apply(Matrix, 1, cumsum)"

The main advantage could be optimized code if the Matrix is extremely nonsquare
(e.g. 100,000x10), but the summation is done over the short side (in this case 
10).
apply would practically yield a loop over 100,000 elements, and vectorization 
w.r.t.
the long side (loop over 10 elements) provides considerable efficiency gains.

Many regards,
Gregor




On Tue, 12 Oct 2010 10:24:53 +0200
Gregor  wrote:


Dear all,

I am struggling with a (currently) cost-intensive problem: calculating the
(non-normalized) cumulative distribution function, given the (non-normalized)
probabilities. something like:

probs<- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilities
F<- t(apply(probs, 1, cumsum)) #SLOOOW!

One (already faster, but for sure not ideal) solution - thanks to Henrik 
Bengtsson:

F<- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
F[,1]<- probs[,1,drop=TRUE];
for (cc in 2:ncol(F)) {
   F[,cc]<- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
}

In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step 
around
200,000 times, so speed is crucial. I currently can make sure to have no NAs, 
but
in order to extend matrixStats, this could be a nontrivial issue.

Any ideas for speeding up this - probably routine - task?

Thanks in advance,
Gregor



--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/b8wOqW : LondonR Rcpp slides
|- http://bit.ly/cCmbgg : Rcpp 0.8.6
`- http://bit.ly/bzoWrs : Rcpp svn revision 2000

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] interaction contrasts

2010-10-15 Thread Kay Cichini

..by some (extensive) trial and error reordering the contrast matrix and the
reference level 
i figured it out myself -
for anyone who might find this helpful searching for a similar contrast in
the future: 
this should be the right one:

c2<-rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
  "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
  "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
  "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
"fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
  "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
  "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1),
"fac2-effect, B*C"=c(0,0,0,0,0,-1,1,0),
"fac2-effect, B*D"=c(0,0,0,0,0,-1,0,1),
"fac2-effect, C*D"=c(0,0,0,0,0,0,-1,1))

summary(glht(mod,c2))


Kay Cichini wrote:
> 
> hello,
> 
> i was shortly asking the list for help with some interaction contrasts
> (see below) for which
> i had to change the reference level of the model "on the fly" (i read a
> post that this is possible in 
> multcomp).
> 
> if someone has a clue how this is coded in multcomp; glht() - please point
> me there.
> 
> yours,
> kay
> 
> 
> Kay Cichini wrote:
>> 
>> hello list,
>> 
>> i'd very much appreciate help with setting up the 
>> contrast for a 2-factorial crossed design.
>> 
>> here is a toy example:
>> 
>> library(multcomp)
>> 
>> dat<-data.frame(fac1=gl(4,8,labels=LETTERS[1:4]),
>> fac2=rep(c("I","II"),16),y=rnorm(32,1,1))
>> 
>> mod<-lm(y~fac1*fac2,data=dat)
>> 
>> ## the contrasts i'm interested in:
>> 
>> c1<-rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
>>   "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
>>   "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
>>   "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
>>  "fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
>>   "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
>>   "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1))
>> 
>> summary(glht(mod,c1))
>> 
>> ## now i want to add the remaining combinations 
>> ## "fac2, B*C"
>> ## "fac2, B*D"
>> ## "fac2, C*D"
>> ## to the simultaneous tests to see whether the effects 
>> ## of fac2 within the levels of fac1 differ between  
>> ## each combination of the levels of fac1, or not ??
>> 
>> thanks for any advice!
>> 
>> yours,
>> kay
>> 
>> 
>> 
> 
> 


-

Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck


-- 
View this message in context: 
http://r.789695.n4.nabble.com/interaction-contrasts-tp2993845p2996987.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in DGEList function

2010-10-15 Thread Ying Ye

Hi!

I am a new R user and have no clue about this error (see below) while  
using the edgeR package:


> Y <- clade_reads
> y <- Y[,c(g1,g2)]
> grouping <- c( rep(1,length(g1)), rep(2,length(g2)) )
> size <- apply(y, 2, sum)
> d <- DGEList(data = y, group = grouping, lib.size = size)
Error in DGEList(data = y, group = grouping, lib.size = size) :
  unused argument(s) (data = y)


And g1 and g2 are two defined groups. Could anyone kindly interpret  
this?


Many thanks!

mikecrux

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread Michael Bedward
Hello again John,

I was going to suggest that you just use qbinom to generate the
expected number of extinctions. For example, for the family with 80
spp the central 95% expectation is:

qbinom(c(0.025, 0.975), 80, 0.0748)

which gives 2 - 11 spp.

If you wanted to do look across a large number of families you'd need
to deal with multiple comparison error but as a quick first look it
might be helpful.
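
A sketch of the 10,000-draw simulation John described, with made-up
family sizes:

nspp <- c(80, 12, 1, 100)   # species per family (hypothetical)
sims <- sapply(nspp, function(n) rbinom(10000, n, 0.0748))
apply(sims, 2, quantile, probs = c(0.025, 0.975))

which should reproduce the qbinom intervals above, up to simulation
error.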

However, I've just got a copy of the paper and it seems that the
authors are calculating something different to a simple binomial
expectation: they are differentiating between high-risk (red listed)
and low-risk species within a family. They state that this equation
(expressed here in R-ese)...

choose(N, R) * p^R * b^(N - R)

...gives the probability of an entire family becoming extinct, where
N is number of spp in family; R is number of those that are red
listed; p is extinction probability for red list spp (presumably over
some period but I haven't read the paper properly yet); b is
extinction probability for other spp.

Then, in their simulations they hold b constant but play around with a
range of values for p.

So this sounds a bit different to what you originally posted as your
objective (?)


Michael



On 15 October 2010 22:49, Michael Bedward  wrote:
> Hi John,
>
> The word "species" attracted my attention :)
>
> Like Dennis, I'm not sure I understand your idea properly. In
> particular, I don't see what you need the simulation for.
>
> If family F has Fn species, your random expectation is that p * Fn of
> them will be at risk (p = 0.0748). The variance on that expectation
> will be p * (1-p) * Fn.
>
> If you do your simulation that's the result you'll get.  Perhaps to
> initially identify families with disproportionate observed extinction
> rates all you need is the dbinom function ?
>
> Michael
>
>
> On 15 October 2010 22:29, John Haart  wrote:
>> Hi Denis and list
>>
>> Thanks for this , and sorry for not providing enough information
>>
>> First let me put the study into a bit more context : -
>>
>> I know the number of species at risk in each family, what i am asking  is 
>> "Is risk random according to family or do certain families have a 
>> disproportionate number of at risk species?"
>>
>> My idea was to randomly allocate risk to the families based on the criteria 
>> below (binomial(nspecies, 0.0748)) and then compare this to the "true data" 
>> and see if there was a significant difference.
>>
>> So in answer to your questions, (assuming my method is correct !)
>>
>>> Is this over all families, or within a particular family? If the former, why
>>> does a distinction of family matter?
>>
>> Within a particular family  - this is because i am looking to see if risk in 
>> the "observed" data set is random in respect to family so this will provide 
>> the baseline to compare against.
>>
>>> I guess you've stated the p, but what's the n? The number of species in each
>>> family?
>>
>> This varies largely, for instance i have some families that are monotypic  
>> (with 1 species) and then i have other families with 100+ species
>>
>>
>>> Assuming you have multiple families, do you want separate simulations per
>>> family, or do you want to do some sort of weighting (perhaps proportional to
>>> size) over all families?
>>
>> I am assuming i want some sort of weighting. This is because i am wanting to 
>> calculate the number of species expected to be at risk in EACH family under 
>> the random binomial distribution ( assuming every species has a 7.48% chance 
>> of being at risk.
>>
>> Thanks
>>
>> John
>>
>>
>>
>>
>> On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
>>
>> Hi:
>>
>> I don't believe you've provided quite enough information just yet...
>>
>> On Fri, Oct 15, 2010 at 2:22 AM, John Haart  wrote:
>>
>>> Dear List,
>>>
>>> I am doing some simulation in R and need basic help!
>>>
>>> I have a list of animal families for which i know the number of species in
>>> each family.
>>>
>>> I am working under the assumption that a species has a 7.48% chance of
>>> being at risk.
>>>
>>
>> Is this over all families, or within a particular family? If the former, why
>> does a distinction of family matter?
>>
>>>
>>> I want to simulate the number of species expected to be at risk under a
>>> random binomial distribution with 10,000 randomizations.
>>>
>>
>> I guess you've stated the p, but what's the n? The number of species in each
>> family? If you're simulating on a family by family basis, then it would seem
>> that a binomial(nspecies, 0.0748) distribution would be the reference.
>> Assuming you have multiple families, do you want separate simulations per
>> family, or do you want to do some sort of weighting (perhaps proportional to
>> size) over all families? The latter is doable, but it would require a
>> two-stage simulation: one to randomly select a family and then to randomly
>> select a species.
>>
>> Dennis
>>
>>
>>>
>>> I am relatively new to this field and would greatly appreciate a
>>>

Re: [R] Error in DGEList function

2010-10-15 Thread Martin Morgan
On 10/15/2010 06:17 AM, Ying Ye wrote:
> Hi!
> 
> I am a new R user and have no clue about this error (see below) while using
> the edgeR package:

edgeR is a Bioconductor pacakge so please subscribe to the Bioconductor
list and ask there.

http://bioconductor.org/help/mailing-list/

include the output of the sessionInfo() command after library(edgeR)
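
For the archive, a guess at the immediate problem: the error only says
that DGEList() has no argument named 'data'.  If your version of edgeR
follows the usual signature (an assumption, so check ?DGEList), the
count matrix is passed as 'counts', e.g.

d <- DGEList(counts = y, group = grouping, lib.size = size)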

Martin

> 
>> Y <- clade_reads
>> y <- Y[,c(g1,g2)]
>> grouping <- c( rep(1,length(g1)), rep(2,length(g2)) )
>> size <- apply(y, 2, sum)
>> d <- DGEList(data = y, group = grouping, lib.size = size)
> Error in DGEList(data = y, group = grouping, lib.size = size) :
>   unused argument(s) (data = y)
> 
> 
> And g1 and g2 are two defined groups. Could anyone kindly interpret this?
> 
> Many thanks!
> 
> mikecrux
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] p-values from coxph?

2010-10-15 Thread Öhagen Patrik

Dear List,

I each iteration of a simulation study, I would like to save the p-value 
generated by "coxph". I fail to see how to adress the p-value. Do I have to 
calculate it myself from the Wald Test statistic?


Cheers, Paddy

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Color individual leaf labels in dendrogram

2010-10-15 Thread Bryan Hanson
Henrik, there is an easily adaptable example in this thread:
http://r.789695.n4.nabble.com/coloring-leaves-in-a-hclust-or-dendrogram-plot
-tt795496.html#a795497
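
One common way (an untested sketch, not necessarily identical to the
linked example) is to walk the tree with dendrapply() and set the
lab.col nodePar of the leaf you want:

d <- as.dendrogram(as.hclust(C))   # C is the agnes fit from your post
colLab <- function(n) {
  if (is.leaf(n) && attr(n, "label") == "Node 2")
    attr(n, "nodePar") <- c(attr(n, "nodePar"), list(lab.col = "red"))
  n
}
plot(dendrapply(d, colLab))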

HTH. Bryan
*
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA



On 10/15/10 9:05 AM, "Kennedy"  wrote:

> 
> Hi,
> 
> I have performed a clustering of a matrix and plotted the result with
> pltree. See code below. I want to color the labels of the leaves
> individually. For example I want the label name "Node 2" to be plotted in
> red. How do I do this?
> 
> Sincerely 
> 
> Henrik
> 
> 
>   library(cluster)
> 
>   D <- matrix(nr=4,nc=4)
>   rownames(D) <- c("Node 1","Node 2","Node 3","Node 4")
>   D[1,] <- c(0,.6,.1,.7)
>   D[2,] <- c(.6,.0,.3,.9)
>   D[3,] <- c(.1,.3,0,.9)
>   D[4,] <- c(.7,.9,.9,0)
> 
>   C <- agnes(D,diss=T,method="complete")
> 
>   pltree(C)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create Arrays

2010-10-15 Thread dpender

Hi Gerrit,

Almost it, but I need to insert M[,i] as well as matrix( -1, nrow( M),
CN[i]) when CN[i] != 0

I know this is not correct but can something like the following be done?

HH <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)
TT <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)
c <- c(0, 0, 0, 2, 0, 0, 0, 2, 0)

TMP <- lapply( seq(c),
function( i, CN, M) {
 if( CN[i] == 0) as.matrix( M[, i]) else
  (matrix( -1, nrow( M), CN[i]) && as.matrix( M[, i]))
 }, CN = c, M = rbind( HH, TT))

do.call( cbind, TMP)
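
For what it's worth, replacing the invalid && with cbind() makes the
idea run; a sketch of the presumed intent, not a tested solution:

TMP <- lapply( seq_along(c),
function( i, CN, M) {
 if( CN[i] == 0) as.matrix( M[, i]) else
  cbind( matrix( -1, nrow( M), CN[i]), as.matrix( M[, i]))
 }, CN = c, M = rbind( HH, TT))

do.call( cbind, TMP)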

Doug

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Create-Arrays-tp2996706p2997060.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Many Thanks

2010-10-15 Thread Jumlong Vongprasert

 Dear R-help mailing list and software development team.
After I have used R for a few weeks, I was exposed to the best of the program.
In addition, the R-help mailing list is a great help to new users.
I do my job as I want and get great support from the R-help mailing list.
Thanks R-help mailing list.
Thanks software development team.
Jumlong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] boxplot issue

2010-10-15 Thread Jonas Josefsson

Hi!

I am trying to produce a graph which shows overlap in latitude for a 
number of species.


I have a dataframe which looks as follows

species1,species2,species3,species4.
minlat  6147947,612352,627241,6112791
maxlat 7542842,723423,745329,7634921


I want to produce a graph with one horizontal bar for each species where 
minlat sets the minimum value and maxlat sets the maximum value for the bar. I 
want all bars to be stacked on top of each other to show where I have 
overlap.


I have tried using boxplot but it only produces the standard mean and sd 
of the two values...


Thanks!

Jonas Josefsson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [Debian][NetBeans] setting NetBeans to work with JRI libraries

2010-10-15 Thread filip

I am having hard time properly setting NetBeans to work with JRI libs
(http://rosuda.org/JRI/). Most of the instructions I have found so far
are written for Eclipse or Windows (or both).

I have set java.library.path variable in config: customize:VM
arguments field, by specifying
"-Djava.library.path=/home/filip/Pobrane/JRI/". When I run the
application I get the following:

"Cannot find JRI native library!
Please make sure that the JRI native library is in a directory listed
in java.library.path.

java.lang.UnsatisfiedLinkError: /home/fb/Pobrane/JRI/libjri.so:
libR.so: cannot open shared object file: No such file or directory
(...)"

So I ran:

locate libR

with the following result:

/usr/local/lib/R/lib/libRblas.so
/usr/local/lib/R/lib/libRlapack.so

However this command:
ls /usr/local/lib/R/lib/

lists:

libRblas.so  libRlapack.so  libR.so

I have tried various customisations to the config, copying the
libraries and other desperate moves. I can properly compile and run the
rtest.java and rtest2.java under the examples section of my JRI
installation via the ./run script.

It would seem that NetBeans has a problem loading libraries
that reference other libraries (libjri is loaded, but not libR). I
have compiled R 2.12.0 with "--with-blas="-lgoto2" --enable-BLAS-shlib
--enable-R-shlib" (shared libraries, shared BLAS). Could the BLAS
be a problem? 


-- 
while(!succeed) { try(); }

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] tessellation from biological data in spatstat

2010-10-15 Thread Dr Richard Mort

Hi,

I'm new to this mailing list so apologies if this is too basic. I have 
confocal images (512x512) from which I have extracted the x,y coordinates 
of labelled cells, exported from ImageJ as a .csv file. I also 
have images that define an underlying pattern in the tissue, defined as 
areas of different pixel values, 0 or 255 (also 512x512). I've exported 
these images as .txt files. I'd like to use spatstat to look at how the 
individual cells plot onto these patterns.


#I can import my x,y data and do basic stats with spatstat by doing the 
following:


library(spatstat)
data01 <- read.csv("250810.csv", skip=1)
x <- data01[[2]]
y <- data01[[3]]
data02 <- ppp(x, y, xrange=c(0,512), yrange=c(0,512))
unitname(data02) <- c("pixel", "pixels")
plot(data02)
#etc etc

#I can also import my text images and convert them to a tessellation 
using the following:


a <- read.table("FOLLICLES.txt")
win <- owin(c(0,512), c(0,512))
unitname(win) <- c("pixel", "pixels")
b <- as.matrix(a, xrange=c(0,512), yrange=c(0,512))
unitname(b) <- c("pixel", "pixels")
c <- im(b, xcol=seq(ncol(b)), yrow=seq(nrow(b)), unitname="pixels")
d <- tess(image = c)
plot(d)

#I can then plot my ppp pattern on top of my tessellation using:

plot(d)
plot(data02, add=TRUE)

#So far so good, but when I run for example:

qb <- quadratcount(data02, tess = d)

#I get the following error:

Warning message:
In quadratcount.ppp(data02, tess = d) :
 Tessellation does not contain all the points of X

#When I investigate further the following is returned for each object:

> d
Tessellation
Tessellation is determined by a factor-valued image with 2 levels
window: binary image mask
512 x 512 pixel array (ny, nx)
enclosing rectangle: [0.5, 512.5] x [0.5, 512.5] pixels 
> data02

planar point pattern: 254 points
window: rectangle = [0, 512] x [0, 512] units 

#I don't understand why my tessellation is a different size but even 
taking this into account my ppp should plot inside the window and does 
when I do plot(data02, add=TRUE). I've spent ages playing around with 
redefining the windows but I must be missing something (probably obvious).


Any help would be appreciated, I can provide files.
Kind regards
Richard

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to extract parameter estimates of variance function from lme fit

2010-10-15 Thread Hoai Thu Thai

Dear R-Users,

I have a question concerning extraction of parameter estimates of 
variance function from lme fit.


To fit my simulated data, we use varConstPower ( constant plus power 
variance function).

fm<-lme(UPDRS~time,data=data.simula,random=~time,method="ML",weights=varConstPower(fixed=list(power=1)))
I extract the results of this function by using the following codes:
y<-summary(fm)
x<-y$modelStruct$varStruct
> x
Variance function structure of class varConstPower representing
       const        power 
1.184535e-15 1.000000e+00 

These are the const and power values that appear in the summary(fm), and 
that I want to extract.

However, I got a different value of const when extracting from x:

> x$const
const
-34.36943

Does anyone know why there is such difference and how to extract the 
expected value (1.184535e-15) ?


Thanks in advance for your help.

Thu


---

THAI Hoai Thu
INSERM U738 - Université Paris 7
16 rue Henri Huchard
75018 Paris, FRANCE
Tel: 01 57 27 75 39
Email: hoai-thu.t...@inserm.fr

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boxplot issue

2010-10-15 Thread David Winsemius

A) you hijacked another thread.

On Oct 15, 2010, at 9:50 AM, Jonas Josefsson wrote:


Hi!

I am trying to produce a graph which shows overlap in latitude for a  
number of species.


I have a dataframe which looks as follows

   species1,species2,species3,species4.
minlat  6147947,612352,627241,6112791
maxlat 7542842,723423,745329,7634921

B) This "looks" like a data input snafu. It appears that the commas in  
the original data were not properly specified as delimiters.




I want to produce a graph with one horizontal bar for each species  
where minlat sets minimum value and maxlat sets maximum value for  
the bar. I want all bars to be stacked on top of each other to show  
where I have overlap.


I have tried using boxplot but it only produces the standard mean  
and sd of the two values...


C) boxplot is designed to handle raw data and then pass plotting  
responsibility off to the bxp function. Once you address your data  
input issues you can use bxp.
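
For the record, an untested sketch of that route with made-up
latitudes: bxp() expects the list layout that boxplot() returns, with
stats as the 5-number matrix, here rigged so each box spans
minlat..maxlat:

minlat <- c(61.5, 61.2, 62.7, 61.1)   # invented values
maxlat <- c(75.4, 72.3, 74.5, 76.3)
z <- list(stats = rbind(minlat, minlat, (minlat + maxlat)/2, maxlat, maxlat),
          n = rep(1, 4), conf = rbind(minlat, maxlat),
          out = numeric(0), group = numeric(0),
          names = paste0("species", 1:4))
bxp(z, horizontal = TRUE)   # one bar per species, spanning minlat..maxlat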


--
David


Thanks!

Jonas Josefsson


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract parameter estimates of variance function from lme fit

2010-10-15 Thread Henrique Dallazuanna
Try this:

coef(fm$modelStruct$varStruct, uncons = FALSE)

(varFunc coefficients are stored internally on an unconstrained scale,
here a log scale, which is why x$const showed -34.36943, i.e.
log(1.184535e-15); uncons = FALSE returns them on the natural scale.)

On Fri, Oct 15, 2010 at 11:42 AM, Hoai Thu Thai wrote:

> Dear R-Users,
>
> I have a question concerning extraction of parameter estimates of variance
> function from lme fit.
>
> To fit my simulated data, we use varConstPower ( constant plus power
> variance function).
>
> fm<-lme(UPDRS~time,data=data.simula,random=~time,method="ML",weights=varConstPower(fixed=list(power=1)))
> I extract the results of this function by using the following codes:
> y<-summary(fm)
> x<-y$modelStruct$varStruct
> > x
> Variance function structure of class varConstPower representing
>        const        power
> 1.184535e-15 1.000000e+00
>
> These are the constants and power that apppear in the summary(fm), and that
> I want to extract.
> Hower I got different value of const when extracting from x:
>
> > x$const
>const
> -34.36943
>
> Does anyone know why there is such difference and how to extract the
> expected value (1.184535e-15) ?
>
> Thanks in advance for your help.
>
> Thu
>
>
> ---
>
> THAI Hoai Thu
> INSERM U738 - Université Paris 7
> 16 rue Henri Huchard
> 75018 Paris, FRANCE
> Tel: 01 57 27 75 39
> Email: hoai-thu.t...@inserm.fr
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract parameter estimates of variance function from lme fit

2010-10-15 Thread Hoai Thu Thai

Thank you Henrique!! It works.

Thu


On 15/10/2010 16:53, Henrique Dallazuanna wrote:

coef(fm$modelStruct$varStruct, uncons = FALSE)


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] p-values from coxph?

2010-10-15 Thread David Winsemius


On Oct 15, 2010, at 9:21 AM, Öhagen Patrik wrote:



Dear List,

In each iteration of a simulation study, I would like to save the p- 
value generated by "coxph". I fail to see how to address the p-value.  
Do I have to calculate it myself from the Wald Test statistic?


No. And the most important reason is that it would not give you the same  
value as is print()-ed by coxph().


If you ask for str(print(coxph(...))) you get NULL (after the  
side-effect of printing): the print function only produces  
side-effects. On the other hand you can use the summary function, which  
gives you a richer set of output.  Using the first example on the help  
page for coxph:


str(summary(coxph(Surv(time, status) ~ x + strata(sex), test1)))
List of 12
 $ call: language coxph(formula = Surv(time, status) ~ x +  
strata(sex), data = test1)

 $ fail: NULL
 $ na.action   : NULL
 $ n   : int 7
 $ loglik  : num [1:2] -3.87 -3.33
 $ coefficients: num [1, 1:5] 0.802 2.231 0.822 0.976 0.329
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "x"
  .. ..$ : chr [1:5] "coef" "exp(coef)" "se(coef)" "z" ...
 $ conf.int: num [1, 1:4] 2.231 0.448 0.445 11.18
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "x"
  .. ..$ : chr [1:4] "exp(coef)" "exp(-coef)" "lower .95" "upper .95"
 $ logtest : Named num [1:3] 1.087 1 0.297
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ sctest  : Named num [1:3] 1.051 1 0.305
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ rsq : Named num [1:2] 0.144 0.669
  ..- attr(*, "names")= chr [1:2] "rsq" "maxrsq"
 $ waldtest: Named num [1:3] 0.95 1 0.329
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ used.robust : logi FALSE

So the fifth element of the coefficients leaf of the list structure has  
the same "p-value" as the print()-ed one.


Try:

> summary(fit)$coefficients[5]
[1] 0.3292583

(It does seem to me that the name for that leaf of the fit object is  
not particularly in accord with what I would have considered  
"coefficients"., but I am really in no solid position to criticize  
Terry Therneau to whom we all owe a great deal of gratitude.)



--
David.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Many Thanks

2010-10-15 Thread Andrew Miles
And thank YOU for taking the time to express your gratitude.  I'm sure  
all those who regularly take the time to contribute to the list  
appreciate the appreciation.


Andrew Miles

On Oct 15, 2010, at 9:49 AM, Jumlong Vongprasert wrote:


Dear R-help mailing list and software development team.
After I have used R for a few weeks, I was exposed to the best of the  
program.

In addition, the R-help mailing list is a great help to new users.
I do my job as I want and get great support from the R-help mailing  
list.

Thanks R-help mailing list.
Thanks software development team.
Jumlong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] interaction contrasts

2010-10-15 Thread David Winsemius
Should you need to do it again, you may want to look at the relevel  
function. I suppose that would meet the definition of some versions of  
"on the fly" but once I have a model, rerunning with a different  
factor leveling is generally pretty painless.
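
With the toy example quoted below, that would be e.g.:

dat$fac1 <- relevel(dat$fac1, ref = "B")  # "B" becomes the reference level
mod <- lm(y ~ fac1*fac2, data = dat)      # refit with the new coding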


--
David.


On Oct 15, 2010, at 9:09 AM, Kay Cichini wrote:



..by some (extensive) trial and error reordering the contrast matrix  
and the

reference level
i figured it out myself -
for anyone who might find this helpful searching for a similar  
contrast in

the future:
this should be the right one:

c2<-rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
 "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
 "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
 "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
"fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
 "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
 "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1),
"fac2-effect, B*C"=c(0,0,0,0,0,-1,1,0),
"fac2-effect, B*D"=c(0,0,0,0,0,-1,0,1),
"fac2-effect, C*D"=c(0,0,0,0,0,0,-1,1))

summary(glht(mod,c2))


Kay Cichini wrote:


hello,

i was shortly asking the list for help with some interaction  
contrasts

(see below) for which
i had to change the reference level of the model "on the fly" (i  
read a

post that this is possible in
multcomp).

if someone has a clue how this is coded in multcomp; glht() -  
please point

me there.

yours,
kay


Kay Cichini wrote:


hello list,

i'd very much appreciate help with setting up the
contrast for a 2-factorial crossed design.

here is a toy example:

library(multcomp)

dat<-data.frame(fac1=gl(4,8,labels=LETTERS[1:4]),
   fac2=rep(c("I","II"),16),y=rnorm(32,1,1))

mod<-lm(y~fac1*fac2,data=dat)

## the contrasts i'm interested in:

c1<-rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
 "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
 "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
 "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
"fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
 "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
 "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1))

summary(glht(mod,c1))

## now i want to add the remaining combinations
## "fac2, B*C"
## "fac2, B*D"
## "fac2, C*D"
## to the simultaneous tests to see whether the effects
## of fac2 within the levels of fac1 differ between
## each combination of the levels of fac1, or not ??

thanks for any advice!

yours,
kay









-

Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck


--
View this message in context: 
http://r.789695.n4.nabble.com/interaction-contrasts-tp2993845p2996987.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nonlinear Regression Parameter Shared Across Multiple DataSets

2010-10-15 Thread Keith Jewell
Hi,

I've had to do something like that before. It seems to be a "feature" of nls 
(in R, but not as I recall in Splus) that it accepts a list with vector 
components as 'start' values, but flattens the result values to a single 
vector.

I can't spend much time explaining, but here's a fragment of code that might 
get you started:
---
# Fit the nls, correct coef names lost by nls
val <- nls(formula=formulaIn, data=DataList, start=tCoefs, control=control)
CoefList <- list()                            # initialise CoefList
for(aName in names(tCoefs)) {                 # for each vector of coefficients
  tvec <- get(aName, envir=val$m$getEnv())    # get it from the nls environment
  names(tvec) <- names(tCoefs[[aName]])       # correct its names
  assign(aName, tvec, envir=val$m$getEnv())   # return it to nls
  CoefList[[aName]] <- tvec                   # store in CoefList
}

As I recall,
tCoefs is a list of starting values that can have vector components
CoefList ends up as a similar structure and named list of result values

hth

Keith J

"Jared Blashka"  wrote in message 
news:aanlktimprowm-mne9n_bu8ty=jylitq4ewhpph1hy...@mail.gmail.com...
> Looking at the source for nlrob, it looks like it saves the coefficients
> from the results of running an nls and then passes those coefficients back
> into the next nls request. The issue that it's running into is that nls
> returns the coefficients as upper, LOGEC501, LOGEC502, and LOGEC503, 
> rather
> than just upper and a vector named LOGEC50. Does anyone know a way to
> restructure the formula/start parameter so that coef returns a vector
> instead of each element individually? Right now, I've had to 'hack' nlrob 
> so
> it recombines similarly named elements into a vector, but was wondering if
> there was a way to accomplish the end goal without those measures.
>
> Thanks,
> Jared
>
> On Wed, Oct 13, 2010 at 3:14 PM, Jared Blashka 
> wrote:
>
>> As an addendum to my question, I'm attempting to apply the solution to 
>> the
>> robust non-linear regression function nlrob from the robustbase package, 
>> and
>> it doesn't work in that situation. I'm getting
>>
>> allRobustFit <- nlrob(Y ~ (upper)/(1+10^(X-LOGEC50[dset])), data=all
>> ,start=list(upper=max(all$Y),LOGEC50=c(-8.5,-8.5,-8.5)))
>> Error in nls(formula, data = data, start = start, algorithm = algorithm,
>>  :
>>   parameters without starting value in 'data': LOGEC50
>>
>> I'm guessing this is because the nlrob function doesn't know what to do
>> with a vector for a start value. Am I correct and is there another method 
>> of
>> using nlrob in the same way?
>>
>> Thanks,
>> Jared
>>
>> On Tue, Oct 12, 2010 at 8:58 AM, Jared Blashka 
>> wrote:
>>
>>> Thanks so much! It works great.
>>>
>>> I had thought the way to do it relied on combining the data sets, but I
>>> couldn't figure out how to alter the formula to work with the 
>>> combination.
>>>
>>> Jared
>>>
>>>
>>> On Tue, Oct 12, 2010 at 7:07 AM, Keith Jewell 
>>> wrote:
>>>

 "Jared Blashka"  wrote in message
 news:aanlktinffmudugqnkudvr=fmf0wrrtsbjxjexuki_...@mail.gmail.com...
 > I'm working with 3 different data sets and applying this non-linear
 > regression formula to each of them.
 >
 > nls(Y ~ (upper)/(1+10^(X-LOGEC50)), data=std_no_outliers,
 > start=list(upper=max(std_no_outliers$Y),LOGEC50=-8.5))
 >
 > Previously, all of the regressions were calculated in Prism, but I'd
 like
 > to
 > be able to automate the calculation process in a script, which is why
 I'm
 > trying to move to R. The issue I'm running into is that previously, 
 > in
 > Prism, I was able to calculate a shared value for a constraint so 
 > that
 all
 > three data sets shared the same value, but have other constraints
 > calculated
 > separately. So Prism would figure out what single value for the
 constraint
 > in question would work best across all three data sets. For my 
 > formula,
 > each
 > data set needs it's own LOGEC50 value, but the upper value should be
 the
 > same across the 3 sets. Is there a way to do this within R, or with a
 > package I'm not aware of, or will I need to write my own nls function
 to
 > work with multiple data sets, because I've got no idea where to start
 with
 > that.
 >
 > Thanks,
 > Jared
 >
 > [[alternative HTML version deleted]]
 >
 An approach which works for me (code below to illustrate principle, not
 tried...)

 1) combine all three "data sets" into one dataframe with a column (e.g.
 dset) indicating data set (1, 2 or 3)

 2) express your formula with upper as single valued and LOGEC50 as a
 vector
 indexed by dset e.g.
 Y ~ upper/(1+10^(X-LOGEC50[dset]))

 3) in the start list, make LOGEC50 a vector e.g. using -8.5 as start 
 for
 all
 three LOGEC50 values
>>
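
A sketch of those three steps, assuming data frames d1, d2, d3 each
with columns X and Y (names invented here):

all <- rbind(data.frame(d1, dset = 1),
             data.frame(d2, dset = 2),
             data.frame(d3, dset = 3))
allFit <- nls(Y ~ upper/(1 + 10^(X - LOGEC50[dset])), data = all,
              start = list(upper = max(all$Y), LOGEC50 = c(-8.5, -8.5, -8.5)))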

Re: [R] specify data frame by name

2010-10-15 Thread Greg Snow
Also look at the get function, it may be a bit more straightforward (and safer 
if there is any risk of someone specifying 'rm(ls())' as a data frame name).
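
A sketch of the same function using get():

testfunc <- function(data = "dfm")
{
  dat <- get(data, envir = parent.frame())  # fetch the object by name
  sum(dat$x)
}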

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
> project.org] On Behalf Of darckeen
> Sent: Thursday, October 14, 2010 11:53 PM
> To: r-help@r-project.org
> Subject: Re: [R] specify data frame by name
> 
> 
> nvm, i figured it out.
> 
> 
> dfm <- data.frame(x=1:10)
> 
> testfunc <- function(data="dfm")
> {
>   dat <- eval.parent(as.symbol(data))
>   sum(dat$x)
> }
> 
> print(testfunc())
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/specify-
> data-frame-by-name-tp2996534p2996541.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] nominal response model

2010-10-15 Thread Mauricio Romero
Is there a way to estimate a nominal response model?

 

To be more specific let's say I want to calibrate:

\pi_{v}(\theta_j)=\frac{e^{\xi_{v}+\lambda_{v}\theta_j}}{\sum_{h=1}^m
e^{\xi_{h}+\lambda_{h}\theta_j}}

 

Where $\theta_j$ is the dependent variable and I need to estimate
$\xi_{h}$ and $\lambda_{h}$ for $h \in \{1,\ldots,m\}$.
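
If $\theta_j$ is observed, this is a standard multinomial logit and can
be fit, for instance, with nnet::multinom (a sketch; 'resp', 'theta'
and 'd' are invented names):

library(nnet)
fit <- multinom(resp ~ theta, data = d)
coef(fit)  # per-category intercepts (xi) and slopes (lambda), relative to category 1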

 

Thank you,

 

 

Mauricio Romero 

Quantil S.A.S.

Cel: 3112231150

www.quantil.com.co

 

"It is from the earth that we must find our substance; it is on the earth
that we must find solutions to the problems that promise to destroy all life
here"

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with an unbalanced split plot

2010-10-15 Thread Eugenio Larios
Hi Dennis,

The first thing I did with my data was to explore it with 6 graphs
(wet-high, med, and solo-; dry-high, med, and solo-) and gave me very
interesting patterns: seed size in wet treatments is either negatively
correlated (high and medium densities) or flat (solo). But dry treatments
are all positively correlated! There is a very interesting switch there.

I also figured out why I can't do three-way interactions. I explored the
structure of my data with str(mydata) and it shows that water treatment has
three levels when it should have just two. Then I went back to the Excel
sheet, tried to sort the data by water treatment, and discovered a single
data point from the wet treatment sticking out by itself. That is why R
reads three levels, and since it is only one point, there cannot be any stats
of course.

thanks
E

On Thu, Oct 14, 2010 at 9:27 PM, Dennis Murphy  wrote:

> Hi:
>
> On Thu, Oct 14, 2010 at 7:50 PM, Eugenio Larios <
> elari...@email.arizona.edu> wrote:
>
>> Hi Dennis,
>>
>> thank you very much for your help, I really appreciate it.
>>
>> I forgot to say about the imbalance, yes. I only explained the original
>> set up, sorry. Let me explain.
>>
>> It is because in the process of the experiment which lasted 3 months I
>> lost individuals within the plots and I actually ended up losing 2 whole
>> plots (one dry and one wet) and some other individuals in other plots.
>>
>
> That still leaves you balanced at the plot level :)  Fortunately, you have
> enough replication. If you have missing subplots within the remaining plots,
> that would be another source of imbalance at the subplot level, but you
> should have enough subplots to be able to estimate all of the interactions
> unless an entire treatment in one set of plots was missing.
>
> It's worth graphing your data to anticipate which effects/interactions
> should be significant; graphs involving the spatial configuration of the
> plots and subplots would also be worthwhile.
>
>>
>> My study system has this special feature that allows me to track parental
>> seed sizes in plants germinated in the field, a persistent ring that stays
>> attached to the root even when the plant has germinated, so some of the
>> plants I lost did not have this ring anymore. It happens sometimes but most
>> of the time they have it. Also, some plants disappeared probably due to
>> predation, etc That made my experiment imbalanced.
>>
>
> That's common. No big deal.
>
>>
>> Do you think that will change the analysis? Also, do you think I should
>> use least squares ANOVA  (perhaps type III due to the imbalance?) instead of
>> LMM? What about the random effects that my blocking has created?
>>
>
> Actually, with unbalanced data it's to your advantage to use lme() over
> ANOVA. Just don't place too much importance on the p-values of tests; even
> the degrees of freedom are debatable. With unbalanced data, it's hard to
> predict what the sampling distribution of a given statistic will actually
> be, so the p-values aren't as trustworthy.
>
> You mentioned that you couldn't fit a three-way interaction; given your
> data configuration, that shouldn't happen.
>
> (1) Get two-way tables of water * density, one for the counts and one for
> the averages, something like
>
> with(mydata, table(water, density))
> aggregate(log(fitness) ~ water + density, data = mydata, FUN = mean, na.rm
> = TRUE)
>
> In the first table, unless you have very low frequencies in some category,
> your data 'density' should be enough to estimate all the main effects and
> interactions of interest. The second table is to check that you don't have
> NaNs or missing cells, etc.
>
>>
>> I am new to R-help website so I wrote you this message to your email but I
>> would like to post it on the R website, do you know how?
>>
>
> Wag answer: I hope so, since I managed to view and respond to your message
> :)
>
> More seriously, in gmail, the window that opens to produce replies has an
> option 'Reply to all'. I don't know if your e-mail client at UofA has that
> feature, but if not, you could always cc R-help and put the e-mail address
> in by hand if necessary. Most mailers are smart enough to auto-complete an
> address as you type in the name, so you could see if that applies on your
> system.
>
> I keep a separate account for R-help because of the traffic volume - if you
> intend to subscribe to the list, you might want to do the same. It's not
> unusual for 75-100 e-mails a weekday to enter your inbox...
>
>>
>> Thanks again!
>>
>> Eugenio
>>
>>
>> On Thu, Oct 14, 2010 at 5:34 PM, Dennis Murphy  wrote:
>>
>>> Hi:
>>>
>>> On Thu, Oct 14, 2010 at 3:58 PM, Eugenio Larios <
>>> elari...@email.arizona.edu> wrote:
>>>
 Hi Everyone,

 I am trying to analyze a split plot experiment in the field that was
 arranged like this:
 I am trying to measure the fitness consequences of seed size.

 Factors (X):
 *Seed size*: a continuous variable, normally distributed.
 *Water*: C

Re: [R] creating 'all' sum contrasts

2010-10-15 Thread Michael Hopkins

On 15 Oct 2010, at 13:55, Berwin A Turlach wrote:

> G'day Michael,
> 

Hi Berwin

Thanks for the reply

> On Fri, 15 Oct 2010 12:09:07 +0100
> Michael Hopkins  wrote:
> 
>> OK, my last question didn't get any replies so I am going to try and
>> ask a different way.
>> 
>> When I generate contrasts with contr.sum() for a 3 level categorical
>> variable I get the 2 orthogonal contrasts:
>> 
>>> contr.sum( c(1,2,3) )
>>   [,1] [,2]
>> 1    1    0
>> 2    0    1
>> 3   -1   -1
> 
> These two contrasts are *not* orthogonal.
> 
I'm surprised.  Can you please tell me how you calculated that?
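
(For the archive, the check is one line:

cs <- contr.sum(3)
crossprod(cs)   # t(cs) %*% cs has off-diagonal 1, so the columns are not orthogonal

since the columns (1,0,-1) and (0,1,-1) have inner product 1, not 0.)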

>> This provides the contrasts <1-3> and <2-3> as expected.  But I also
>> want it to create <1-2> (i.e. <1-3> - <2-3>).  So in general I want
>> all possible orthogonal contrasts - think of it as the contrasts for
>> all pairwise comparisons between the levels.
> 
> You have to decide what you want.  The contrasts for all pairwise
> comparisons between the levels, or all possible orthogonal contrasts?
> 

Well the pairwise contrasts are the most important as I am looking for evidence 
of whether they are zero (i.e. no difference between levels) or not.  But see 
my above comment about orthogonality.

> The latter is actually not well defined.  For a factor with p levels,
> there would be p-1 orthogonal contrasts, which are only identifiable up to
> rotation, hence infinitely many such sets. But there are p(p-1) ordered
> pairwise comparisons. So unless p=2, you have to decide what you want.
> 
Well of course the pairwise comparisons are bi-directional so in fact only 
p(p-1)/2 are of interest to me.

>> Are there are any options for contrast() or other functions/libraries
>> that will allow me to do this automatically?
> 
> Look at package multcomp, in particular functions glht and mcp, these
> might help.
> 
Thanks I will have a look.  

But I want to be able to do this transparently "within" lm() using regsubsets() 
etc as I am collecting large quantities of summary stats from all possible 
models to use with a model choice criterion based upon true Bayesian model 
probabilities.

> Cheers,
> 
>   Berwin
> 
> == Full address 
> Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
> School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
> The University of Western Australia   FAX : +61 (8) 6488 1028
> 35 Stirling Highway   
> Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
> Australiahttp://www.maths.uwa.edu.au/~berwin




Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Joshua Wiley
> Sent: Friday, October 15, 2010 12:23 AM
> To: Gregor
> Cc: r-help@r-project.org
> Subject: Re: [R] fast rowCumsums wanted for calculating the cdf
> 
> Hi,
> 
> You might look at Reduce().  It seems faster.  I converted the matrix
> to a list in an incredibly sloppy way (which you should not emulate)
> because I cannot think of  the simple way.
> 
> > probs <- t(matrix(rep(1:1000), nrow=10)) # matrix with 
> row-wise probabilites
> > F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
> > F[,1] <- probs[,1,drop=TRUE];
> > add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
> >
> >
> > system.time(F.slow <- t(apply(probs, 1, cumsum)))
>user  system elapsed
>  36.758   0.416  42.464
> >
> > system.time(for (cc in 2:ncol(F)) {
> +  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
> + })
>user  system elapsed
>   0.980   0.196   1.328
> >
> > system.time(add(list(probs[,1], probs[,2], probs[,3], 
> probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], 
> probs[,9], probs[,10])))
>user  system elapsed
>   0.420   0.072   0.539

One way to avoid the typing of
   list(probs[,1], probs[,2], ...)
is to use split(probs, col(probs)).  However, split() is
surprisingly slow in this case.

Another approach is to use a matrix multiply, by a ncol(probs)
by ncol(probs) matrix with 0's in lower triangle and 1's
elsewhere.  Here are 4 approaches I've seen mentioned,
made into functions that output the matrix requested:

f1 <- function(x) t(apply(x, 1, cumsum))
f2 <- function(x) {
    if (ncol(x) > 1)
        for (i in 2:ncol(x)) x[, i] <- x[, i] + x[, i - 1]
    x
}
f3 <- function(x)
    matrix(unlist(Reduce(`+`, split(x, col(x)), accumulate = TRUE)),
           ncol = ncol(x))
f4 <- function(x) x %*% outer(seq_len(ncol(x)), seq_len(ncol(x)), "<=")

I tested it as follows:
  > probs <- matrix(runif(1e7), ncol=10, nrow=1e6)
  > system.time(v1 <- f1(probs))
     user  system elapsed
    19.45    0.25   16.78
  > system.time(v2 <- f2(probs))
     user  system elapsed
     0.68    0.03    0.79
  > system.time(v3 <- f3(probs))
     user  system elapsed
    38.25    0.24   34.47
  > system.time(v4 <- f4(probs))
     user  system elapsed
     0.70    0.05    0.56
  > identical(v1,v2) && identical(v1,v3) && identical(v1,v4)
  [1] TRUE

If you have a fancy optimized/multithreaded BLAS linked
to your version of R then you may find that %*% is
much faster (I'm using the precompiled R for Windows).
As ncol(x) gets bigger the %*% approach probably loses
its advantage.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> >
> 
> .539 seconds for 10 vectors each with 1 million elements does not seem
> bad.  For 3 each, on my system it took .014 seconds, which for
> 200,000 iterations, I estimated as:
> 
> > (200000*.014)/60/60
> [1] 0.778
> 
> (and this is on a stone age system with a single core processor and
> only 756MB of RAM)
> 
> It should not be difficult to get the list back to a matrix.
> 
> Cheers,
> 
> Josh
> 
> 
> 
> On Thu, Oct 14, 2010 at 11:51 PM, Gregor  wrote:
> > Dear all,
> >
> > Maybe the "easiest" solution: Is there anything that speaks 
> against generalizing
> > cumsum from base to cope with matrices (as is done in matlab)? E.g.:
> >
> > "cumsum(Matrix, 1)"
> > equivalent to
> > "apply(Matrix, 1, cumsum)"
> >
> > The main advantage could be optimized code if the Matrix is 
> extreme nonsquare
> > (e.g. 100,000x10), but the summation is done over the short 
> side (in this case 10).
> > apply would practically yield a loop over 100,000 elements, 
> and vectorization w.r.t.
> > the long side (loop over 10 elements) provides considerable 
> efficiency gains.
> >
> > Many regards,
> > Gregor
> >
> >
> >
> >
> > On Tue, 12 Oct 2010 10:24:53 +0200
> > Gregor  wrote:
> >
> >> Dear all,
> >>
> >> I am struggling with a (currently) cost-intensive problem: 
> calculating the
> >> (non-normalized) cumulative distribution function, given 
> the (non-normalized)
> >> probabilities. something like:
> >>
> >> probs <- t(matrix(rep(1:100),nrow=10)) # matrix with 
> row-wise probabilites
> >> F <- t(apply(probs, 1, cumsum)) #SLOOOW!
> >>
> >> One (already faster, but for sure not ideal) solution - 
> thanks to Henrik Bengtsson:
> >>
> >> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
> >> F[,1] <- probs[,1,drop=TRUE];
> >> for (cc in 2:ncol(F)) {
> >>   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
> >> }
> >>
> >> In my case, probs is a (30,000 x 10) matrix, and i need to 
> iterate this step around
> >> 200,000 times, so speed is crucial. I currently can make 
> sure to have no NAs, but
> >> in order to extend matrixStats, this could be a nontrivial issue.
> >>
> >> Any ideas for speeding up this - probably routine - task?
> >>
> >> Thanks in advance,
> >> Gregor
> >>

[R] Multi-line graph?

2010-10-15 Thread barnhillec

Hi,

I am relatively new to R but not to graphing, which I used to do in Excel
and a few other environments on the job. I'm going back to school for a PhD
and am teaching myself R beforehand. So I hope this question is not
unacceptably ignorant but I have perused every entry level document I can
find and so far I'm out of luck.

I'm trying to graph some simple music psychology data. Columns are musical
intervals, rows are the initials of the subjects. Numbers are in beats per
minute (this is the value at which they hear the melodic interval split into
two streams). So here's my table:

   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180

I want a multi-line graph where the intervals are on the X axis, and the y
axis is the beats per minute, and each subject has a different line. In
Excel this would be no problem but I am having trouble in R.

The only way I can figure out how to plot this in R is if the columns or
rows are taken as variables. But the variable is beats per minute. Any
suggestions?

I appreciate the help. -Eric
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997402.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Multi-line graph?

2010-10-15 Thread Dieter Menne


barnhillec wrote:
> 
> I'm trying to graph some simple music psychology data. Columns are musical
> intervals, rows are the initials of the subjects. Numbers are in beats per
> minute (this is the value at which they hear the melodic interval split
> into two streams). So here's my table:
> 
>    Tenth Fifth Third
> GG   112   152   168
> EC   100   120   140
> SQ   160   184    NA
> SK   120   100   180
> 
> I want a multi-line graph where the intervals are on the X axis, and the y
> axis is the beats per minute, and each subject has a different line. 
> 
> 
The most difficult part of it (at least that's what my students think) is
getting the data into the right format. If you have the chance to start
with the data in the "long" format, use it. What you need is:

init interval beats
GG   Tenth    112

In this case, reformatting almost works with the default version of the melt
function in package reshape.

It's good that you supplied a data example, but in general it is better if
you could provide it in a copy-and-paste format as shown below. I needed
more time to reformat the data than to write the rest.


Dieter


library(lattice)
library(reshape)
dt = data.frame(
   init  = c("GG","EC", "SQ","SK"),
   Tenth = c(112,100,160,120),
   Fifth = c(152,120,184,100),
   Third = c(168,140,NA,180))

# The data should look like this (melt calls the measurement "value"):
# init interval value
# GG   Tenth    112
#dtlong = melt(dt) # almost does it, but the column name "variable" is ugly
dtlong = melt(dt, id.vars = "init", variable_name = "interval") # better
dtlong

xyplot(value ~ interval, groups = init, data = dtlong, type = "l",
       auto.key = list(lines = TRUE))





-- 
View this message in context: 
http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997435.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Multi-line graph?

2010-10-15 Thread William Revelle

?matplot

e.g.,

copy your data to the clipboard then

library(psych)
my.data <- read.clipboard()

 my.data

   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184NA
SK   120   100   180

matplot(t(my.data),type="b")

Bill



At 10:27 AM -0700 10/15/10, barnhillec wrote:

Hi,

I am relatively new to R but not to graphing, which I used to do in Excel
and a few other environments on the job. I'm going back to school for a PhD
and am teaching myself R beforehand. So I hope this question is not
unacceptably ignorant but I have perused every entry level document I can
find and so far I'm out of luck.

I'm trying to graph some simple music psychology data. Columns are musical
intervals, rows are the initials of the subjects. Numbers are in beats per
minute (this is the value at which they hear the melodic interval split into
two streams). So here's my table:

   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180

I want a multi-line graph where the intervals are on the X axis, and the y
axis is the beats per minute, and each subject has a different line. In
Excel this would be no problem but I am having trouble in R.

The only way I can figure out how to plot this in R is if the columns or
rows are taken as variables. But the variable is beats per minute. Any
suggestions?

I appreciate the help. -Eric
--
View this message in context: 
http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997402.html

Sent from the R help mailing list archive at Nabble.com.





[R] Problem with merging two zoo objects

2010-10-15 Thread Megh Dal
Dear all, I have the following 2 zoo objects. However, when I try to merge those 2 
objects into one, nothing comes out as intended. Please see below the objects 
as well as the merged object:


> dat11
  V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7
> dat22
  V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7
> merge(dat11, dat22)
                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
Warning messages:
1: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length
2: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length

If somebody could point out where I went wrong, it would be really great.

Thanks



[R] Overlaying two png?

2010-10-15 Thread steven mosher
I have a program that creates a PNG file using RgoogleMaps with an extent
(lonmin, lonmax, latmin, latmax).
I also have a contour plot of the same location, same extent, same sized
(height/width) PNG file.

I'm looking for a way to make the contour semi-transparent and overlay it on
the google map (hybrid map).

Since I have 7000 of these to do, an automated process is desired (grin).

Any pointers in the right direction ?
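
One possible direction, sketched under assumptions: the map is read back in
with the png package and drawn with base graphics rasterImage(); the contour
data are a lon/lat grid with a value matrix z. All file names and values below
are hypothetical stand-ins.

# Toy stand-ins; in practice these come from the map request:
lonmin <- -122.5; lonmax <- -122.3; latmin <- 37.7; latmax <- 37.9
lon <- seq(lonmin, lonmax, length.out = 50)
lat <- seq(latmin, latmax, length.out = 50)
z   <- outer(lon, lat, function(a, b) sin(20 * a) + cos(20 * b))

library(png)                       # rasterImage() needs R >= 2.11
map <- readPNG("map.png")          # hypothetical file produced by RgoogleMaps

png("overlay.png", width = 640, height = 640)
par(mar = c(0, 0, 0, 0))
plot(NA, xlim = c(lonmin, lonmax), ylim = c(latmin, latmax),
     xaxs = "i", yaxs = "i", axes = FALSE, xlab = "", ylab = "")
rasterImage(map, lonmin, latmin, lonmax, latmax)           # draw the base map
contour(lon, lat, z, add = TRUE, col = rgb(1, 0, 0, 0.5))  # semi-transparent overlay
dev.off()

For the 7000 files, the same block can be wrapped in a loop over file names.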




Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Dieter Menne


Megh wrote:
> 
> Dear all, I have following 2 zoo objects. However when I try to merge
> those 2 objects into one, nothing is coming as intended. Please see below
> the objects as well as the merged object:
> 
> 
>> merge(dat11, dat22)
> V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22
> V4.dat22 V5.dat22
> 2010-10-15 12:09:12   NA   NA   NA   NA   NA   NA 
>  
> NA   NA
> 
Since the simulated example works, it must have to do with your data. Try
str(dat11) and str(dat22); maybe something strange has crept in.

Dieter


library(zoo)
x.date <- as.Date(paste(2003, 02, c(1, 3, 7, 9, 14), sep = "-"))
x <- zoo(matrix(1:10, ncol = 2), x.date)
x
str(x)

y.date <- as.Date(paste(2006, 02, c(1, 3, 7, 9, 14), sep = "-"))
y <- zoo(matrix(11:20, ncol = 2), y.date)
y
str(y)

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-merging-two-zoo-objects-tp2997472p2997494.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Achim Zeileis



On Fri, 15 Oct 2010, Megh Dal wrote:


Dear all, I have following 2 zoo objects. However when I try to merge those 2 
objects into one, nothing is coming as intended. Please see below the objects 
as well as the merged object:



dat11

 V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7

dat22

 V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7

merge(dat11, dat22)

                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
Warning messages:
1: In MATCH(x, x) == seq_len(length(x)) :
 longer object length is not a multiple of shorter object length
2: In MATCH(x, x) == seq_len(length(x)) :
 longer object length is not a multiple of shorter object length

If somebody could point out where I went wrong, it would be really great.


merge() does cbind() (among some more general computations), I guess you 
want rbind().


Try rbind(dat11, dat22).

hth,
Z


Thanks






Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 2:20 PM, Megh Dal  wrote:
> Dear all, I have following 2 zoo objects. However when I try to merge those 2 
> objects into one, nothing is coming as intended. Please see below the objects 
> as well as the merged object:
>
>
>> dat11
>                      V2   V3   V4   V5
> 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
> 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
> 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
> 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
> 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
> 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
> 2010-10-15 13:49:51 73.7 73.7 73.7 73.7
>> dat22
>                      V2   V3   V4   V5
> 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
> 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
> 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
> 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
> 2010-10-15 12:41:03 73.7 73.7 73.7 73.7
>> merge(dat11, dat22)
>                     V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
> 2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
> Warning messages:
> 1: In MATCH(x, x) == seq_len(length(x)) :
>  longer object length is not a multiple of shorter object length
> 2: In MATCH(x, x) == seq_len(length(x)) :
>  longer object length is not a multiple of shorter object length
>
> If somebody points me whether I went wrong, it would be really great.
>

If I try it then it works properly so there is likely something wrong
with your dat11 and dat22 objects.  If you provide the problem
reproducibly one might be able to say more.

> Lines1 <- "Date Time V2   V3   V4   V5
+ 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
+ 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
+ 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:49:51 73.7 73.7 73.7 73.7"
>
> Lines2 <- "Date Time V2   V3   V4   V5
+ 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
+ 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
+ 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:41:03 73.7 73.7 73.7 73.7"
>
> library(zoo)
> dat1 <- read.zoo(textConnection(Lines1), header = TRUE,
+ index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
Warning messages:
1: closing unused connection 8 (Lines2)
2: closing unused connection 7 (Lines1)
3: closing unused connection 5 (Lines2)
4: closing unused connection 4 (Lines1)
5: closing unused connection 3 (Lines2)
> dat2 <- read.zoo(textConnection(Lines2), header = TRUE,
+ index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
> merge(dat1, dat2)
                    V2.dat1 V3.dat1 V4.dat1 V5.dat1 V2.dat2 V3.dat2 V4.dat2 V5.dat2
2010-10-15 12:09:12      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:09:33      NA      NA      NA      NA    73.9    73.9    73.9    73.9
2010-10-15 12:20:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:30:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:41:03      NA      NA      NA      NA    73.7    73.7    73.7    73.7
2010-10-15 13:43:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:44:15    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:45:51    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:46:21    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:47:27    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:47:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:49:51    73.7    73.7    73.7    73.7      NA      NA      NA      NA

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Megh Dal
Hi Gabor, please see the attached files, which are in text format. I opened 
them in Excel and then used the clipboard to load them into R. Still really 
unclear what to do.

Also, can you please elaborate on the term "index = list(1, 2), FUN = 
function(d, t) as.POSIXct(paste(d, t))" from your previous mail? The help 
says: "If FUN is specified then read.zoo calls FUN with the index as the 
first argument". I really could not connect your syntax with the help.

--- On Sat, 10/16/10, Gabor Grothendieck  wrote:

> From: Gabor Grothendieck 
> Subject: Re: [R] Problem with merging two zoo objects
> To: "Megh Dal" 
> Cc: r-h...@stat.math.ethz.ch
> Date: Saturday, October 16, 2010, 12:11 AM
> On Fri, Oct 15, 2010 at 2:20 PM, Megh
> Dal 
> wrote:
> > Dear all, I have following 2 zoo objects. However when
> I try to merge those 2 objects into one, nothing is coming
> as intended. Please see below the objects as well as the
> merged object:
> >
> >
> >> dat11
> >                      V2   V3   V4   V5
> > 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
> > 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
> > 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
> > 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
> > 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
> > 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
> > 2010-10-15 13:49:51 73.7 73.7 73.7 73.7
> >> dat22
> >                      V2   V3   V4   V5
> > 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
> > 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
> > 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
> > 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
> > 2010-10-15 12:41:03 73.7 73.7 73.7 73.7
> >> merge(dat11, dat22)
> >                     V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
> > 2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
> > 2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
> > Warning messages:
> > 1: In MATCH(x, x) == seq_len(length(x)) :
> >  longer object length is not a multiple of shorter
> object length
> > 2: In MATCH(x, x) == seq_len(length(x)) :
> >  longer object length is not a multiple of shorter
> object length
> >
> > If somebody points me whether I went wrong, it would
> be really great.
> >
> 
> If I try it then it works properly so there is likely
> something wrong
> with your dat11 and dat22 objects.  If you provide the
> problem
> reproducibly one might be able to say more.
> 
> > Lines1 <- "Date Time
> V2   V3   V4   V5
> + 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
> + 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
> + 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
> + 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
> + 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
> + 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
> + 2010-10-15 13:49:51 73.7 73.7 73.7 73.7"
> >
> > Lines2 <- "Date Time
> V2   V3   V4   V5
> + 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
> + 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
> + 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
> + 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
> + 2010-10-15 12:41:03 73.7 73.7 73.7 73.7"
> >
> > library(zoo)
> > dat1 <- read.zoo(textConnection(Lines1), header =
> TRUE,
> + index = list(1, 2), FUN = function(d, t)
> as.POSIXct(paste(d, t)))
> Warning messages:
> 1: closing unused connection 8 (Lines2)
> 2: closing unused connection 7 (Lines1)
> 3: closing unused connection 5 (Lines2)
> 4: closing unused connection 4 (Lines1)
> 5: closing unused connection 3 (Lines2)
> > dat2 <- read.zoo(textConnection(Lines2), header =
> TRUE,
> + index = list(1, 2), FUN = function(d, t)
> as.POSIXct(paste(d, t)))
> > merge(dat1, dat2)
>                     V2.dat1 V3.dat1 V4.dat1 V5.dat1 V2.dat2 V3.dat2 V4.dat2 V5.dat2
> 2010-10-15 12:09:12      NA      NA      NA      NA    74.0    74.0    74.0    74.0
> 2010-10-15 12:09:33      NA      NA      NA      NA    73.9    73.9    73.9    73.9
> 2010-10-15 12:20:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
> 2010-10-15 12:30:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
> 2010-10-15 12:41:03      NA      NA      NA      NA    73.7    73.7    73.7    73.7
> 2010-10-15 13:43:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
> 2010-10-15 13:44:15    73.8    73.8    73.8    73.8      N

Re: [R] using apply function and storing output

2010-10-15 Thread Dennis Murphy
Hi:

You need to give a function for rollapply() to apply :)

Here's my toy example:

d <- as.data.frame(matrix(rpois(30, 5), nrow = 10))
library(zoo)
d1 <- zoo(d)   # uses row numbers as index

# rolling means of 3 in each subseries (columns)
> rollmean(d1, 3)
V1   V2   V3
2 3.00 5.33 4.33
3 3.33 4.33 4.00
4 2.67 3.67 3.33
5 4.33 3.67 3.33
6 5.00 5.00 5.00
7 4.67 4.67 4.00
8 5.67 3.67 4.00
9 5.00 3.00 2.67

# create new function
> cv <- function(x) sd(x)/mean(x)

# use new function in rollapply():
> rollapply(d1, 3, FUN = cv)
         V1        V2        V3
2 0.3333333 0.1082532 0.5329387
3 0.1732051 0.4803845 0.6614378
4 0.2165064 0.5677271 0.4582576
5 0.7418193 0.5677271 0.4582576
6 0.6000000 0.3464102 0.4000000
7 0.7525467 0.4948717 0.6614378
8 0.8882158 0.5677271 0.6614378
9 1.0583005 0.3333333 0.2165064

This is a simplistic example, but it should get you started.

HTH,
Dennis
On Fri, Oct 15, 2010 at 2:00 AM, David A.  wrote:

>  Hi Dennis and Dimitris,
>
> thanks for your answers. I am trying the rollmean() function and also the
> rollapply() function because I also want to calculate CV for the values.
> For this I created a co.var() function. I am having problems using them.
>
> >co.var<-function(x)(
> +sd(x)/mean(x)
> +)
>
>
> > dim(mydata)
> [1] 1710  244
>
> >xxx<-rollmean(mydata,3,by=3)
>
> works fine and creates a vector which I will transform into a matrix. I
> still have to find out how to store the output in my previously created
> 570x244 0's matrix in an ordered way.
>
> but, since the examples in the help page say it's the same, I tried
>
> > xxx<-rollapply(mydata,3,mean,by=3)
> Error in UseMethod("rollapply") :
>   no applicable method for 'rollapply' applied to an object of class
> "c('matrix', 'integer', 'numeric')"
>
> and, with my created function...
>
> > xxx<-rollapply(ord_raw_filt.df,3,FUN=co.var,by=3)
> Error in UseMethod("rollapply") :
>   no applicable method for 'rollapply' applied to an object of class
> "c('matrix', 'integer', 'numeric')"
>
> Can you help me with the error?
>
> Dave
>
> --
> Date: Fri, 15 Oct 2010 00:45:08 -0700
> Subject: Re: [R] using apply function and storing output
> From: djmu...@gmail.com
> To: dasol...@hotmail.com
> CC: r-help@r-project.org
>
>
> Hi:
>
> Look into the rollmean() function in package zoo.
>
> HTH,
> Dennis
>
> On Fri, Oct 15, 2010 at 12:34 AM, David A.  wrote:
>
>
> Hi list,
>
> I have a 1710x244 matrix of numerical values and I would like to calculate
> the mean of every group of three consecutive values per column to obtain a
> new matrix of 570x244.  I could get it done using a for loop but how can I
> do that using apply functions?
> In addition to this, do I have to initizalize a 570x244 matrix with 0's to
> store the calculated values or can the output matrix be generated while
> calculating the mean values?
>
> Cheers,
>
> Dave
>
>
>
>
>




[R] scaling on normlized data set

2010-10-15 Thread hosnaw

Hi,

I am using R and tried to normalize the data within each sample group using
RMA. When I imported all the normalized expression data as a single text file
and made a boxplot, it showed a discrepancy among the sample groups.  I tried
to scale them or re-normalize them again so that they can be used for further
analysis. Unfortunately, I did not manage it using the affyPLM package. Would
you please help me out with this problem.

Thanks in advance.

Best regards,
H. Nawaz
-- 
View this message in context: 
http://r.789695.n4.nabble.com/scaling-on-normlized-data-set-tp2996771p2996771.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] bwplot change whiskers position to percentile 5 and P95

2010-10-15 Thread Christophe Bouffioux
Hi David,
More info

Thanks a lot
Christophe

##
library(Hmisc)
library(lattice)
library(fields)
library(gregmisc)
library(quantreg)

> str(sasdata03_a)
'data.frame':   109971 obs. of  6 variables:
 $ jaar  : Factor w/ 3 levels "2006","2007",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Cat_F : Factor w/ 6 levels "a","d","g","h",..: 1 1 1 1 1 1 1 1 1 1
...
 $ montant_oa: num  8087 1960 1573 11502 3862 ...
 $ nombre_cas: int  519 140 111 780 262 125 79 334 97 503 ...
 $ montant_i : num  221 221 221 221 221 ...
 $ nombre_i  : int  12 12 12 12 12 12 12 12 12 12 ...
> aggregate.table(sasdata03_a$nombre_cas, sasdata03_a$jaar,
sasdata03_a$Cat_F, nobs)
a   d  g  h  i   k
2006 7568 1911 5351 7430 7377 7217
2007 7499 1914 5378 7334 7275 7138
2008 7524 1953 5488 7292 7192 7130
> d2008 <- sasdata03_a[sasdata03_a$Cat_F=="d" & sasdata03_a$jaar==2008, ]
> quantile(d2008$nombre_cas, probs = c(0.95))
   95%
3630.2

##########################
#                        #
#   graph on real data   #
#                        #
##########################

bwplot(jaar ~ nombre_cas | Cat_F, xlab="", ylab="",
   data=sasdata03_a  , xlim=c(-100,3800), layout=c(2,3), X =
sasdata03_a$nombre_i,
   pch = "|",
   stats=boxplot2.stats,
   main="Fig. 1: P95 d 2008=3630",

   par.settings = list(
   plot.symbol = list(alpha = 1, col = "transparent",cex = 1,pch = 20)),
   panel = function(x, y, ..., X, subscripts){
  panel.grid(v = -1, h = 0)
  panel.bwplot(x, y, ..., subscripts = subscripts)
  X <- X[subscripts]
  X <- tapply(X, y, unique)
  Y <- tapply(y, y, unique)
  tg <- table(y)
  panel.points(X, Y, cex=3, pch ="|" , col = "red")
  panel.text((3700*0.8), (Y-0.15), labels = paste("N=", tg))

  })


#####  Simulated data 1  #####


ex <- data.frame(v1 = log(abs(rt(180, 3)) + 1),
 v2 = rep(c("2007", "2006", "2005"), 60),
 z  = rep(c("a", "b", "c", "d", "e", "f"), e = 30))

ex2 <- data.frame(v1b = log(abs(rt(18, 3)) + 1),
 v2 = rep(c("2007", "2006", "2005"), 6),
 z  = rep(c("a", "b", "c", "d", "e", "f"), e = 3))
ex3 <- merge(ex, ex2, by=c("v2","z"))

##############################
#                            #
#   graph on simulated data  #
#                            #
##############################

bwplot(as.factor(v2) ~ v1 | as.factor(z), data = ex3, layout=c(3,2), X =
ex3$v1b,
  pch = "|", main="Boxplot.stats",
  stats=boxplot2.stats,
  par.settings = list(
  plot.symbol = list(alpha = 1, col = "transparent",cex = 1,pch = 20)),
 panel = function(x, y, ..., X, subscripts){
   panel.grid(v = -1, h = 0)
   panel.bwplot(x, y, ..., subscripts = subscripts)
X <- X[subscripts]
xmax =max(x)
X <- tapply(X, y, unique)
Y <- tapply(y, y, unique)
tg <- table(y)
panel.points(X, Y, cex=3, pch ="|" , col = "red")
panel.text((xmax-xmax*0.1), (Y-0.15), labels = paste("N=", tg))
 })
##



On Thu, Oct 14, 2010 at 5:22 PM, David Winsemius wrote:

>
> On Oct 14, 2010, at 11:07 AM, Christophe Bouffioux wrote:
>
> Hi,
>>
>> I have tried your proposition, and it works properly on the simulated
>> data, but not on my real data, and I do not see any explanation. This is
>> weird; I have no more ideas for exploring the problem.
>>
>
> You should:
> a) provide the code that you used to call boxplot
> b) offer str(sasdata03_a) ,  since summary will often obscure
> class-related problems
> c) consider running:
> sasdata03_a$Cat_F <- factor(sasdata03_a$Cat_F) # to get rid of the empty
> factor level
>
>
>
>>
>> So here I give some information on my data; nothing special actually.
>>
>> Christophe
>>
>> > summary(sasdata03_a)
>>    jaar            Cat_F         montant_oa          nombre_cas        montant_i       
>>  2006:36854   a      :22591   Min.   :  -112.8   Min.   :   -5.0   Min.   :   33.22  
>>  2007:36538   h      :22056   1st Qu.:  1465.5   1st Qu.:  104.0   1st Qu.:   37.80  
>>  2008:36579   i      :21844   Median :  4251.5   Median :  307.0   Median :   50.00  
>>               k      :21485   Mean   : 13400.0   Mean   :  557.5   Mean   : 1172.17  
>>               g      :16217   3rd Qu.: 13648.5   3rd Qu.:  748.0   3rd Qu.:  458.67  
>>               d      : 5778   Max.   : 534655.6  Max.   :13492.0   Max.   :17306.73  
>>               (Other):    0                                                          
>>     nombre_i     
>>  Min.   :  1.00  
>>  1st Qu.:  1.00  
>>  Median :  5.00  
>>  Mean   : 44.87  
>>  3rd Qu.: 29.00  
>>  Max.   :689.00  
>>
>> > is.data.frame(sasdata03_a)
>> [1] TRUE
>>
>> The code of the function:
>>
>> boxplot2.stats <-  function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE)
>> {
>>if (coef < 0)
>>stop("'coef' must not be negative")
>>nna <- !is.na(x)
>>n <- su
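
The definition is cut off here; a plausible completion, matching the signature
above and the thread's goal of whiskers at the 5th and 95th percentiles (an
assumption, not necessarily the original code):

boxplot2.stats <- function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE)
{
    if (coef < 0)
        stop("'coef' must not be negative")
    nna <- !is.na(x)
    n <- sum(nna)
    # whiskers at P5/P95 instead of the usual 1.5 * IQR fences
    stats <- as.numeric(quantile(x, c(0.05, 0.25, 0.5, 0.75, 0.95), na.rm = TRUE))
    iqr <- diff(stats[c(2, 4)])
    conf <- if (do.conf) stats[3] + c(-1.58, 1.58) * iqr/sqrt(n) else numeric()
    out <- if (do.out) x[nna & (x < stats[1] | x > stats[5])] else numeric()
    list(stats = stats, n = n, conf = conf, out = out)
}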

Re: [R] RJava help

2010-10-15 Thread lord12

public class my_convolve
{
    public static void main(String[] args)
    {
    }

    public static void convolve()
    {
        System.out.println("Hello");
    }
}


library(rJava) 
.jinit(classpath="C:/Documents and Settings/GV/workspace/Test/bin",
parameters="-Xmx512m")
.jcall("my_convolve", method="convolve")

Error in .jcall("my_convolve", method = "convolve") : 
  method convolve with signature ()V not found



-- 
View this message in context: 
http://r.789695.n4.nabble.com/RJava-help-tp2995886p2996914.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Anh Nguyen
I tried that too; it doesn't work because of the way I wrote the code.
Listing y as free, or not giving it a limit, makes the scale go from -0.5 to
0.5, which is useless. This is what my code looks like now (it's S-Plus
code, btw) -- I'll try reading up on lattice in R to see if I can come up
with something better.

par(mfrow = c(2,2))
unique.id <- unique(d1$ID)
unique.id <- sort(unique.id)
for (j in unique.id) {
    temp.id <- d1[d1$ID == j, ]
    unique.dose <- unique(temp.id$DOSE)
    plot(0, 0, type = "n",
         xlab = "Time (hr)", ylab = "Concentration (ug/L)",
         xlim = c(0, 32), ylim = c(0, 200),
         main = paste("ID", j))
    for (i in unique.dose) {
        temp.subanddose <- temp.id[temp.id$DOSE == 5, ]
        points(temp.subanddose$TAD, temp.subanddose$DV, col = 1, pch = 1)
        points(temp.subanddose$TAD, temp.subanddose$IPRE, type = "l", col = 1)
    }
    for (i in unique.dose) {
        temp.subanddose <- temp.id[temp.id$DOSE == 7, ]
        points(temp.subanddose$TAD, temp.subanddose$DV, col = 2, pch = 2)
        points(temp.subanddose$TAD, temp.subanddose$IPRE, type = "l", col = 2)
    }
    for (i in unique.dose) {
        temp.subanddose <- temp.id[temp.id$DOSE == 10, ]
        points(temp.subanddose$TAD, temp.subanddose$DV, col = 3, pch = 3)
        points(temp.subanddose$TAD, temp.subanddose$IPRE, type = "l", col = 3)
    }
    key(text = list(c("5 mg", "7 mg", "10 mg")),
        lines = list(type = rep("l", 3), col = 1:3),
        border = TRUE, corner = c(1, 1))
}



On Fri, Oct 15, 2010 at 1:31 AM, David Winsemius wrote:

>
> On Oct 15, 2010, at 3:46 AM, Anh Nguyen wrote:
>
>  Hello Dennis,
>>
>> That's a very good suggestion. I've attached a template here as a .png
>> file,
>> I hope you can view it. This is what I've managed to achieve in S-Plus (we
>> use S-Plus at work but I also use R because there's some very good R
>> packages for PK data that I want to take advantage of that is not
>> available
>> in S-Plus). The only problem with this is, unfortunately, I cannot figure
>> out how make the scale non-uniform and I hope to fix that.
>>
>
> That would be easy if your efforts, which I have not yet seen, were in
> lattice. If that were the case then adding this would solve your problem:
>
> scales = list(y = list(relation = "free"))
>
> --
> David
>
>> My data looks
>> like this:
>>
>> ID Dose Time Conc Pred ...
>> 1   5    0     0    0
>> 1   5    0.5   6    8
>> 1   5    1    16   20
>> ...
>> 1   7    0     0    0
>> 1   7    0.5  10   12
>> 1   7    1    20   19
>> ...
>> 1  10    3    60   55
>> ...
>> 2   5   12     4    2
>> ...
>> etc
>>
>>
>> I don't care if it's ggplot or something else as long as it looks like how
>> I
>> envisioned.
>>
>>
>>
>>
>> On Fri, Oct 15, 2010 at 12:22 AM, Dennis Murphy 
>> wrote:
>>
>>  I don't recall that you submitted a reproducible example to use as a
>>> template for assistance. Ista was kind enough to offer a potential
>>> solution,
>>> but it was an abstraction based on the limited information provided in
>>> your
>>> previous mail. If you need help, please provide an example data set that
>>> illustrates the problems you're encountering and what you hope to achieve
>>> -
>>> your chances of a successful resolution will be much higher when you do.
>>> BTW, there's a dedicated newsgroup for ggplot2:
>>> look for the mailing list link at  http://had.co.nz/ggplot2/
>>>
>>> HTH,
>>> Dennis
>>>
>>>
>>> On Thu, Oct 14, 2010 at 10:02 PM, Anh Nguyen 
>>> wrote:
>>>
 I found a few problems with this method:

 - There is only one line for predicted dose at 5 mg.
 - The different doses are 5, 7, and 10 mg but somehow there is a legend for
   5,6,7,8,9,10.
 - Is there a way to make the line smooth?
 - The plots are also getting a little crowded and I was wondering if there
   is a way to split it into 2 or more pages?

 Thanks for your help.

 On Thu, Oct 14, 2010 at 8:09 PM, Ista Zahn >>>
> wrote:
>

  Hi,
> Assuming the data is in a data.frame named "D", something like
>
> library(ggplot2) # May need install.packages("ggplot2") first
> ggplot(D, aes(x=Time, y=Concentration, color=Dose) +
> geom_point() +
> geom_line(aes(y = PredictedConcentration, group=1)) +
> facet_wrap(~ID, scales="free", ncol=3)
>
> should do it.
>
> -Ista
> On Thu, Oct 14, 2010 at 10:25 PM, thaliagoo 
>
 wrote:

>
>> Hello-- I have a data for small population who took 1 drug at 3
>>
> different

> doses. I have the actual drug concentrations as well as predicted
>> concentrations by my model. This is what I'm looking for:
>>
>> - Time vs Concentration by ID (individual plots), with each subject
>> occupying 1 plot -- there is to be 9 plots per page (3x3)
>> - Observed drug concentration is made up of points, and predic

Re: [R] split data with missing data condition

2010-10-15 Thread Henrique Dallazuanna
Try this:

split(as.data.frame(DF), is.na(DF[, "x"]))
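
For example, with a toy version of the data:

DF <- cbind(x = c(59.74889, 38.77629, NA, 32.35268),
            y = c(3.1317081, 1.7102589, 2.2312962, 1.3889621))
s <- split(as.data.frame(DF), is.na(DF[, "x"]))
s[["FALSE"]]   # rows where x is observed
s[["TRUE"]]    # rows where x is missing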


On Fri, Oct 15, 2010 at 9:45 AM, Jumlong Vongprasert  wrote:

> Dear all
> I have data like this:
>  x  y
>  [1,] 59.74889  3.1317081
>  [2,] 38.77629  1.7102589
>  [3,]   NA  2.2312962
>  [4,] 32.35268  1.3889621
>  [5,] 74.01394  1.5361227
>  [6,] 34.82584  1.1665412
>  [7,] 42.72262  2.7870875
>  [8,] 70.54999  3.3917257
>  [9,] 59.37573  2.6763249
>  [10,] 68.87422  1.9697770
>  [11,] 19.00898  2.0584415
>  [12,] 60.27915  2.5365194
>  [13,] 50.76850  2.3943836
>  [14,]   NA  2.2862790
>  [15,] 39.01229  1.7924957
>
> and I want to split the data into two sets: a data set of non-missing rows
> and a data set of missing rows.
> How can I do this?
> Many Thanks.
> Jumlong
>
>
> --
> Jumlong Vongprasert
> Institute of Research and Development
> Ubon Ratchathani Rajabhat University
> Ubon Ratchathani
> THAILAND
> 34000
>
>
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O




Re: [R] using apply function and storing output

2010-10-15 Thread David A.

Thanks Dennis,

I don't think it was a problem of not feeding a function to rollapply(), 
because I was using mean() and my co.var() function in the FUN argument. 
The key part seems to be the transformation that zoo() does to the matrix. If I 
apply the same transformation to my original matrix, the rollapply() function now 
works fine.
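
For the archives, a minimal sketch of that fix on stand-in data (co.var as
defined earlier in the thread):

library(zoo)
co.var <- function(x) sd(x)/mean(x)

mydata <- matrix(rpois(1710 * 244, 5), nrow = 1710)  # stand-in for the real matrix
z <- zoo(mydata)                              # wrap the plain matrix as a zoo object
xxx <- rollapply(z, 3, FUN = co.var, by = 3)  # now dispatches to zoo's method
dim(coredata(xxx))                            # 570 244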

Thanks again

David

Date: Fri, 15 Oct 2010 03:00:27 -0700
Subject: Re: [R] using apply function and storing output
From: djmu...@gmail.com
To: dasol...@hotmail.com
CC: r-help@r-project.org

Hi:

You need to give a function for rollapply() to apply :)

Here's my toy example:

d <- as.data.frame(matrix(rpois(30, 5), nrow = 10))
library(zoo)
d1 <- zoo(d)   # uses row numbers as index


# rolling means of 3 in each subseries (columns)
> rollmean(d1, 3)
V1   V2   V3
2 3.00 5.33 4.33
3 3.33 4.33 4.00
4 2.67 3.67 3.33
5 4.33 3.67 3.33

6 5.00 5.00 5.00
7 4.67 4.67 4.00
8 5.67 3.67 4.00
9 5.00 3.00 2.67

# create new function
> cv <- function(x) sd(x)/mean(x)

# use new function in rollapply():

> rollapply(d1, 3, FUN = cv)
         V1        V2        V3
2 0.3333333 0.1082532 0.5329387
3 0.1732051 0.4803845 0.6614378
4 0.2165064 0.5677271 0.4582576
5 0.7418193 0.5677271 0.4582576
6 0.6000000 0.3464102 0.4000000
7 0.7525467 0.4948717 0.6614378
8 0.8882158 0.5677271 0.6614378
9 1.0583005 0.3333333 0.2165064

This is a simplistic example, but it should get you started.

HTH,
Dennis
On Fri, Oct 15, 2010 at 2:00 AM, David A.  wrote:






Hi Dennis and Dimitris,

thanks for your answers. I am trying the rollmean() function and also the 
rollapply() function because I also want to calculate CV for the values.
For this I created a co.var() function. I am having problems using them.


>co.var<-function(x)(
+sd(x)/mean(x)
+)


> dim(mydata)
[1] 1710  244

>xxx<-rollmean(mydata,3,by=3) 

works fine and creates a vector which I will transform into a matrix. I still 
have to find out how to store the output in my previously created 570x244 0's 
matrix in an ordered way.


but, since the examples in the help page say it's the same, I tried

> xxx<-rollapply(mydata,3,mean,by=3)
Error in UseMethod("rollapply") : 
  no applicable method for 'rollapply' applied to an object of class 
"c('matrix', 'integer', 'numeric')"


and, with my created function...

> xxx<-rollapply(ord_raw_filt.df,3,FUN=co.var,by=3)
Error in UseMethod("rollapply") : 
  no applicable method for 'rollapply' applied to an object of class 
"c('matrix', 'integer', 'numeric')"


Can you help me with the error?

Dave

Date: Fri, 15 Oct 2010 00:45:08 -0700
Subject: Re: [R] using apply function and storing output
From: djmu...@gmail.com

To: dasol...@hotmail.com
CC: r-help@r-project.org

Hi:


Look into the rollmean() function in package zoo.

HTH,
Dennis

On Fri, Oct 15, 2010 at 12:34 AM, David A.  wrote:




Hi list,



I have a 1710x244 matrix of numerical values and I would like to calculate the 
mean of every group of three consecutive values per column to obtain a new 
matrix of 570x244.  I could get it done using a for loop but how can I do that 
using apply functions?



In addition to this, do I have to initizalize a 570x244 matrix with 0's to 
store the calculated values or can the output matrix be generated while 
calculating the mean values?



Cheers,



Dave





[R] Problem using BRugs

2010-10-15 Thread Sally Luo
Hi R users,

I am trying to call openbugs from R.  And I got the following error message:

~
model is syntactically correct
expected the collection operator c error pos 8 (error on line 1)
variable ww is not defined in model or in data set
[1] "C:\\DOCUME~1\\maomao\\LOCALS~1\\Temp\\RtmpqJk9R3/inits1.txt"
Initializing chain 1: model must be compiled before initial values loaded
model must be initialized before updating
model must be initialized before DIC can be monitored
Error in samplesSet(parametersToSave) :
  model must be initialized before monitors used
~~

I did define variable "ww" in my data and model (they are listed below).  I
am not sure if this is due to some errors in my code (please see below) or
because openbugs cannot handle the model I am using.  In my model, y[i] also
depends on all other y[j]s.  Could you help me figure out the problem and
hopefully get the code to work?

Many thanks for your help.   --- Maomao

~~
data<-list("y","cap2","pol2","cap1","pol1","g","wo","wd","ww","mu","tau")

inits<-function() {list(beta=beta0, rho_o=rho_o_0, rho_d=rho_d_0,
rho_w=rho_w_0)}

parameters<-c("beta", "rho_o", "rho_d", "rho_w")

probit.sim<-BRugsFit(data,inits,parameters,modelFile="spatial.openbugs.txt",numChains=1,nIter=2000)



# my model

model {

for (i in 1:676) {

y[i] ~ dbern(p[i])

wwy[i]<- inprod(ww[i, 1:676] , y[])

woy[i]<- inprod(wo[i, 1:676] , y[])

wdy[i]<- inprod(wd[i, 1:676] , y[])

probit(p[i])<- rho_o * woy[i] + rho_d * wdy[i] + rho_w * wwy[i] + beta[1] +
beta[2] * cap2[i] + beta[3] * pol2[i] + beta[4] * cap1[i] + beta[5] *
pol1[i] + beta[6] * g[i]+ e[i]

}

# Priors

for (j in 1:6) {

beta[1:6] ~ dmnorm(mu[1:6], tau[1:6, 1:6])

   }

rho_o ~ dunif(-1,1)

rho_d ~ dunif(-1,1)

rho_w ~ dunif(-1,1)

for (i in 1:676) {

e[i] ~ dnorm(0, 1)

}

}
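
One thing worth checking, independent of the "ww" message (an observation,
not a confirmed fix): the multivariate normal prior above is wrapped in a
loop over j, which defines beta[1:6] six times. BUGS requires each node to be
defined exactly once, so the prior should be stated once, without the loop:

beta[1:6] ~ dmnorm(mu[1:6], tau[1:6, 1:6])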




Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Anh Nguyen
Thank you for the very helpful tips. Just one last question:
- In the lattice method, how can I plot TIME vs OBS concentration and TIME vs
PRED concentration in one graph (per individual)? You said "in lattice you
would replace 'smooth' by 'l' in the type = argument of xyplot()" -- that just
means the OBS conc is now drawn as a line instead of points and the PRED conc
is not plotted, right?
--> I tried to superimpose 2 lattices but that won't work. I can keep the
x-scale the same but not the y-scale. Ex: max pred conc = 60 and max obs
conc = 80 and thus for this case, there would be 2 different y-scales
superimposed on one another.
- the ggplot2 method works, just the weird 5,6,7,8,9,10 mg legends (there
are only 5,7,10 mg doses) but it's nothing that photoshop can't take care
of.

This is very cool, I think I understand it a lot better now. It was a lot
easier than what I was doing before, that's for sure. Thanks!
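
For the archives, one lattice idiom that overlays observed points and
predicted lines in the same panel (a sketch reusing the column names from the
S-Plus code earlier in the thread; untested against the real data):

library(lattice)
xyplot(DV + IPRE ~ TAD | factor(ID), data = d1,
       type = c("p", "l"), distribute.type = TRUE,  # DV as points, IPRE as lines
       scales = list(y = list(relation = "free")),
       auto.key = list(text = c("observed", "predicted"),
                       points = TRUE, lines = TRUE))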

On Fri, Oct 15, 2010 at 2:32 AM, Dennis Murphy  wrote:

> Hi:
>
> To get the plots precisely as you have given them in your png file, you're
> most likely going to have to use base graphics, especially if you want a
> separate legend in each panel. Packages ggplot2 and lattice have more
> structured ways of constructing such graphs, so you give up a bit of freedom
> in the details of plot construction to get nicer default configurations.
>
> Perhaps you've written a panel function in S-PLUS (?) to produce each graph
> in your png file - if so, you could simply add a couple of lines to
> determine the max y-value and from that the limits of the y scale. It
> shouldn't be at all difficult to make such modifications. Since R base
> graphics is very similar to base graphics in S-PLUS, you could possibly get
> away with popping your S-PLUS code directly into R and see how far that
> takes you.
>
> Lattice is based on Trellis graphics, but the syntax in lattice has changed
> a fair bit vis a vis Trellis as the package has developed. ggplot2 is a more
> recent graphics system in R predicated on the grammar of graphics exposited
> by Leland Wilkinson.
>
> For my example, I've modified the Theophylline data set in package nlme,
> described in Pinheiro & Bates (2000), Mixed Effects Models in S and S-PLUS,
> Springer. The original data set has eleven unique doses - I combined them
> into three intervals and converted them to factor with cut(). I also created
> four groups of Subjects and put them into a variable ID. The output data
> frame is called theo. I didn't fit the nlme models to the subjects -
> instead, I was lazy and used loess smoothing instead. The code to generate
> the data frame is given below; this is what we mean by a 'reproducible
> example', something that can be copied and pasted into an R session.
>
> # Use modified version of Theophylline data in package nlme
>
> library(nlme)
> theo <- Theoph
> theo <- subset(theo, Dose > 3.5)
> theo$dose <- cut(theo$Dose, breaks = c(3, 4.5, 5, 6), labels = c('4.25',
> '4.8', '5.5'))
> theo <- as.data.frame(theo)
> theo$ID <- with(theo, ifelse(Subject %in% c(6, 7, 12), 1,
>   ifelse(Subject %in% c(2, 8, 10), 2,
>   ifelse(Subject %in% c(4, 11, 5), 3, 4) )))
> # ID is used for faceting, dose for color and shape
>
> # lattice version:
>
> xyplot(conc ~ Time | factor(ID), data = theo, col.line = 1:3,
>   pch = 1:3, col = 1:3, groups = dose, type = c('p', 'smooth'),
>   scales = list(y = list(relation = 'free')),
>   auto.key = list(corner = c(0.93, 0.4), lines = TRUE, points =
> TRUE,
>   text = levels(theo$dose)) )
>
> # ggplot2 version:
> # scales = 'free_y' allows independent y scales per panel
> g <- ggplot(theo, aes(x = Time, y = conc, shape = dose, colour = dose,
>group = Subject))
> g + geom_point() + geom_smooth(method = 'loess', se = FALSE) +
> facet_wrap( ~ ID, ncol = 2, scales = 'free_y') +
> opts(legend.position = c(0.9, 0.4))
>
> This is meant to give you some indication how the two graphics systems work
> - it's a foundation, not an end product. There are a couple of ways I could
> have adjusted the y-scales in the lattice graphs (either directly in the
> scales = ... part or to use a prepanel function for loess), but since you're
> not likely to use loess in your plots, it's not an important consideration.
>
> Both ggplot2 and lattice place the legend outside the plot area by default;
> I've illustrated a couple of ways to pull it into the plot region FYI.
>
> One other thing: if your data set contains fitted values from your PK
> models for each subject * dose combination, the loess smoothing is
> unnecessary. In ggplot2, you would use geom_line(aes(y = pred), ...) in
> place of geom_smooth(), and in lattice you would replace 'smooth' by 'l' in
> the type = argument of xyplot().
>
> HTH,
> Dennis
>
>
>
> On Fri, Oct 15, 2010 at 12:46 AM, Anh Nguyen  wrote:
>
>> Hello Dennis,
>>
>> That's a very good suggestion. I've attached a

[R] feed cut() output into goodness-of-fit tests

2010-10-15 Thread Andrei Zorine
Hello,
My question is: assuming I have cut()'ed my sample and look at the
table() of it, how can I compute probabilities for the bins? Do I have
to parse the table's names() to fetch the bin endpoints to pass them to
the p[distr-name] functions? I really don't want to input arguments to the
distribution functions by hand (nor in a copy-and-paste way).

> x.fr <- table(cut(x,10))
> x.fr

(0.0617,0.549]   (0.549,1.04]    (1.04,1.52]    (1.52,2.01]     (2.01,2.5] 
            16             28             26             18              6 
    (2.5,2.99]    (2.99,3.48]    (3.48,3.96]    (3.96,4.45]    (4.45,4.94] 
             3              2              0              0              1 

> names(x.fr)
 [1] "(0.0617,0.549]" "(0.549,1.04]"   "(1.04,1.52]""(1.52,2.01]"
 [5] "(2.01,2.5]" "(2.5,2.99]" "(2.99,3.48]""(3.48,3.96]"
 [9] "(3.96,4.45]""(4.45,4.94]"


--

Andrei Zorine



Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Megh

I have compared "dat11" and "x" using the str() function, but did not find a
drastic difference:


> str(dat11)
‘zoo’ series from 2010-10-15 13:43:54 to 2010-10-15 13:49:51
  Data: num [1:7, 1:4] 73.8 73.8 73.8 73.8 73.8 73.8 73.7 73.8 73.8 73.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:7] "7" "6" "5" "4" ...
  ..$ : chr [1:4] "V2" "V3" "V4" "V5"
  Index:  POSIXlt[1:7], format: "2010-10-15 13:43:54" "2010-10-15 13:44:15"
"2010-10-15 13:45:51" "2010-10-15 13:46:21" "2010-10-15 13:47:27"
"2010-10-15 13:47:54" ...
> str(x)
‘zoo’ series from 2003-02-01 to 2003-02-14
  Data: int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
  Index: Class 'Date'  num [1:5] 12084 12086 12090 12092 12097

Thanks,
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-merging-two-zoo-objects-tp2997472p2997510.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Achim Zeileis

On Fri, 15 Oct 2010, Megh Dal wrote:

Hi Gabor, please see the attached files, which are in text format. I opened 
them in Excel and then used the clipboard to load them into R. Still 
really unclear what to do.


I've read both files using read.zoo():

R> z1 <- read.zoo("dat1.txt", sep = ",", header = TRUE,
+format = "%m/%d/%Y %H:%M:%S", tz = "")
Warning message:
In zoo(rval3, ix) :
  some methods for "zoo" objects do not work if the index entries in 
'order.by' are not unique

R> z2 <- read.zoo("dat2.txt", sep = ",", header = TRUE,
+format = "%m/%d/%Y %H:%M:%S", tz = "")


Then, merge() does not work because some time indexes are not unique (and 
it would be unclear how these should be matched):


R> merge(z1, z2)
Error in merge.zoo(z1, z2) :
  series cannot be merged with non-unique index entries in a series

However, you can remove the duplicates, e.g., by computing averages:

R> z1a <- aggregate(z1, time(z1), mean)

Then

R> merge(z1a, z2)

works.

Also, can you please elaborate on the term "index = list(1, 2), FUN = 
function(d, t) as.POSIXct(paste(d, t))" from your previous mail? The help 
says: "If FUN is specified then read.zoo calls FUN with the 
index as the first argument". I really could not connect your syntax 
with the help.


The way Gabor read the data, the index was in two separate columns 
(columns 1 and 2). Hence, he specified

  index = list(1, 2)
and then provided a function that would return a POSIXct object when 
called with two arguments

  FUN(column1, column2)

hth,
Z


--- On Sat, 10/16/10, Gabor Grothendieck  wrote:


From: Gabor Grothendieck 
Subject: Re: [R] Problem with merging two zoo objects
To: "Megh Dal" 
Cc: r-h...@stat.math.ethz.ch
Date: Saturday, October 16, 2010, 12:11 AM
On Fri, Oct 15, 2010 at 2:20 PM, Megh
Dal 
wrote:
> Dear all, I have following 2 zoo objects. However when
I try to merge those 2 objects into one, nothing is coming
as intended. Please see below the objects as well as the
merged object:
>
>
>> dat11
>                      V2   V3   V4   V5
> 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
> 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
> 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
> 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
> 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
> 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
> 2010-10-15 13:49:51 73.7 73.7 73.7 73.7
>> dat22
>                      V2   V3   V4   V5
> 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
> 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
> 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
> 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
> 2010-10-15 12:41:03 73.7 73.7 73.7 73.7
>> merge(dat11, dat22)
>                     V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
> 2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
> 2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
> Warning messages:
> 1: In MATCH(x, x) == seq_len(length(x)) :
>  longer object length is not a multiple of shorter
object length
> 2: In MATCH(x, x) == seq_len(length(x)) :
>  longer object length is not a multiple of shorter
object length
>
> If somebody points me whether I went wrong, it would
be really great.
>

If I try it then it works properly so there is likely
something wrong
with your dat11 and dat22 objects.  If you provide the
problem
reproducibly one might be able to say more.

> Lines1 <- "Date Time
V2   V3   V4   V5
+ 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
+ 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
+ 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:49:51 73.7 73.7 73.7 73.7"
>
> Lines2 <- "Date Time
V2   V3   V4   V5
+ 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
+ 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
+ 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:41:03 73.7 73.7 73.7 73.7"
>
> library(zoo)
> dat1 <- read.zoo(textConnection(Lines1), header =
TRUE,
+ index = list(1, 2), FUN = function(d, t)
as.POSIXct(paste(d, t)))
Warning messages:
1: closing unused connection 8 (Lines2)
2: closing unused connection 7 (Lines1)
3: closing unused connection 5 (Lines2)
4: closing unused connection 4 (Lines1)
5: closing unused connection 3 (Lin

[R] Beginner question on bar plot

2010-10-15 Thread steven mosher
I've read a number of examples on doing a multiple bar plot, but can't seem
to grasp how they work or how to get my data into the proper form.

I have two variables holding the same factor levels.

The variables were created using a cut command. The following simulates that:

A <- 1:100
B <- 1:100
A[30:60] <- 43
Acut <- cut(A, breaks = c(0, 10, 45, 120), labels = c("low", "med", "high"))
Bcut <- cut(B, breaks = c(0, 10, 45, 120), labels = c("low", "med", "high"))

What I want to do is create a barplot with 3 groups of side-by-side bars:

group 1 = "low":  the two bars would be the count for Acut and the count for Bcut
group 2 = "med":  the two bars again would be the counts for this factor level
in Acut and Bcut
group 3 = "high": like the above two.
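
A sketch of one way to get exactly that, continuing from Acut and Bcut above:

counts <- rbind(A = table(Acut), B = table(Bcut))   # 2 x 3 matrix of counts
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "level", ylab = "count")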



