[R] logistic regression in an incomplete dataset

2010-04-05 Thread Desmond D Campbell
Dear all,

I want to do a logistic regression.
So far I've only found out how to do that in a dataset of complete cases.
I'd like to do logistic regression via max likelihood, using all the study
cases (complete and incomplete). Can you help?

I'm using glm() with family=binomial(logit).
If any covariate in a study case is missing then the study case is
dropped, i.e. it is doing a complete case analysis.
As a lot of study cases are being dropped, I'd rather it did maximum
likelihood using all the study cases.
I tried setting glm()'s na.action to NULL, but then it complained about
NA's present in the study cases.

regards
Desmond

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Biplot for PCA using labdsv package

2010-04-05 Thread Michael Denslow
Hi Dilys,

On Fri, Apr 2, 2010 at 3:08 AM, Dilys Vela  wrote:
> Hi everyone,
>
> I am doing PCA with labdsv package. I was trying to create a biplot graphs
> in order to observe arrows related to my variables. However when I run the
> script for this graph, the console just keep saying:
>
> Error in nrow(y) : element 1 is empty;
>   the part of the args list of 'dim' being evaluated was:
>   (x)
>
> could please someone tell me what this means? what i am doing wrong? I will
> really appreciate any suggestions and help.
>
It seems like you are mixing the pca function from package labdsv with
the biplot function from package stats. Is that correct? pca from
labdsv is a wrapper for prcomp but has a different output. For example,
using the Bryce data from labdsv:

# pca in labdsv
> str(pca(bryceveg))
List of 4
 $ scores  : num [1:160, 1:160] 0.304 -0.107 0.325 0.301 3.051 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:160] "50001" "50002" "50003" "50004" ...
  .. ..$ : chr [1:160] "PC1" "PC2" "PC3" "PC4" ...
 $ loadings: num [1:169, 1:160] -0.00638 0.03989 0.85517 -0.05806 -0.03787 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:169] "junost" "ameuta" "arcpat" "arttri" ...
  .. ..$ : chr [1:160] "PC1" "PC2" "PC3" "PC4" ...
 $ sdev: num [1:160] 1.799 1.09 0.793 0.647 0.557 ...
 $ totdev  : num 9.24
 - attr(*, "class")= chr "pca"

# prcomp from stats
> str(prcomp(bryceveg))
List of 5
 $ sdev: num [1:160] 1.799 1.09 0.793 0.647 0.557 ...
 $ rotation: num [1:169, 1:160] -0.00638 0.03989 0.85517 -0.05806 -0.03787 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:169] "junost" "ameuta" "arcpat" "arttri" ...
  .. ..$ : chr [1:160] "PC1" "PC2" "PC3" "PC4" ...
 $ center  : Named num [1:169] 0.0125 0.0813 1.1231 0.1531 0.1 ...
  ..- attr(*, "names")= chr [1:169] "junost" "ameuta" "arcpat" "arttri" ...
 $ scale   : logi FALSE
 $ x   : num [1:160, 1:160] 0.304 -0.107 0.325 0.301 3.051 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:160] "50001" "50002" "50003" "50004" ...
  .. ..$ : chr [1:160] "PC1" "PC2" "PC3" "PC4" ...
 - attr(*, "class")= chr "prcomp"

biplot (or biplot.prcomp) is expecting something like the second example.
I am not sure if there is an equivalent function in labdsv, so you may
wish to use prcomp directly. Either way, if you provide the code you
used, it will be easier to sort out.
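A minimal sketch (using a built-in dataset standing in for bryceveg) of fitting with prcomp directly, so that stats::biplot gets the object layout it expects:

```r
# prcomp output carries the $x (scores) and $rotation (loadings)
# components that stats::biplot looks for; labdsv::pca's
# $scores/$loadings names do not match, hence the error
pc <- prcomp(USArrests, scale. = TRUE)
biplot(pc)    # observations plus variable arrows
names(pc)     # the component layout biplot.prcomp expects
```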

Hope this helps,
Michael

-- 
Michael Denslow

I.W. Carpenter Jr. Herbarium [BOON]
Department of Biology
Appalachian State University
Boone, North Carolina U.S.A.
-- AND --
Communications Manager
Southeast Regional Network of Expertise and Collections
sernec.org

36.214177, -81.681480 +/- 3103 meters



[R] Co-occurrence

2010-04-05 Thread mailing-list
Hello everyone!

I have searched through different R sites; however, I did not find anything 
useful for my problem.

Is there any co-occurrence program in R which could work with text strings?

Thanks for any help or hint,
best regards,
Georg



[R] Creating R packages, passing by reference and oo R.

2010-04-05 Thread Roger Gill
Dear All,

I would like some advice on creating R packages, passing by reference and oo R.

I have created a package that is neither elegant nor extensible and rather 
cumbersome (it works). I would like to rewrite the code to make the package 
distributable (should it be of interest) and easy to maintain.

The package is for Bayesian model determination via a reversible jump algorithm 
and has currently been written to minimise computational expense.

At every iteration of the MCMC I make a call to a likelihood function. Here, I 
pass by value a matrix containing the explanatory variables of the current 
model and this is quite costly if X is quite large. This matrix could change at 
every iteration (it's certainly proposed). Ideally I would pass by reference (a 
pointer in C, \@ in perl etc...) and allow the user to specify this function 
should they so desire. This is where I become a little stuck.

I have searched the lists, used Google and not found an accepted solution. How 
do I pass by reference? What are my options? Can you point me towards some suitable 
reading? Are there any good examples? How have others overcome this problem?

Thanks in advance,

best wishes

Roger






[R] GLM

2010-04-05 Thread Nuno Miguel Madeira Veiga

Hi,
I am working on GLM models. However, I am having some problems and would 
appreciate some guidance.
 
One of the explanatory variables ERECTANGLE  is not present in all the 
individual rows.
 
1 – when I delete the rows for which the variable ERECTANGLE  is missing I get 
the following results
Null deviance: 290.884  on 774  degrees of freedom
Residual deviance:  85.863  on 760  degrees of freedom
AIC: 11385
 
2 - when I run the model with all the rows, including those where ERECTANGLE 
is missing, I get the following results
Null deviance: 480.75  on 1232  degrees of freedom
Residual deviance: 140.30  on 1211  degrees of freedom
AIC: 18113
 
 
So I am curious about how the algorithm functions in terms of model adjustment
 
I look forward to your answer. Thanks,
Nuno  



[R] logistic regression in an incomplete dataset

2010-04-05 Thread Desmond Campbell

Dear all,

I want to do a logistic regression.
So far I've only found out how to do that in R, in a dataset of complete cases.
I'd like to do logistic regression via max likelihood, using all the study 
cases (complete and incomplete). Can you help?

I'm using glm() with family=binomial(logit).
If any covariate in a study case is missing then the study case is dropped, 
i.e. it is doing a complete-case analysis.
As a lot of study cases are being dropped, I'd rather it did maximum likelihood 
using all the study cases.
I tried setting glm()'s na.action to NULL, but then it complained about NA's 
present in the study cases.
I've about 1000 unmatched study cases and fewer than 10 covariates, so I could 
use unconditional ML estimation (as opposed to conditional ML estimation).
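One standard way to use the incomplete cases as well is multiple imputation; a minimal sketch with the CRAN package mice (assumed to be installed), shown here on simulated data rather than the real study:

```r
# multiple imputation + pooled logistic regression (mice package, CRAN);
# the data frame 'd' below is simulated purely for illustration
library(mice)
set.seed(1)
d <- data.frame(y  = rbinom(200, 1, 0.5),
                x1 = rnorm(200),
                x2 = rnorm(200))
d$x1[sample(200, 40)] <- NA                  # make 20% of x1 missing
imp  <- mice(d, m = 5, printFlag = FALSE)    # 5 imputed datasets
fits <- with(imp, glm(y ~ x1 + x2, family = binomial(logit)))
summary(pool(fits))                          # pooled coefficients and SEs
```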

regards
Desmond


--
Desmond Campbell
UCL Genetics Institute
d.campb...@ucl.ac.uk
Tel. ext. 020 31084006, int. 54006



Re: [R] Creating R packages, passing by reference and oo R.

2010-04-05 Thread Duncan Murdoch

On 05/04/2010 7:35 AM, Roger Gill wrote:

Dear All,

I would like some advice on creating R packages, passing by reference and oo R.

I have created a package that is neither elegant nor extensible and rather 
cumbersome (it works). I would like to rewrite the code to make the package 
distributable (should it be of interest) and easy to maintain.

The package is for Bayesian model determination via a reversible jump algorithm 
and has currently been written to minimise computational expense.

At every iteration of the MCMC I make a call to a likelihood function. Here, I 
pass by value a matrix containing the explanatory variables of the current 
model and this is quite costly if X is quite large. This matrix could change at 
every iteration (it's certainly proposed). Ideally I would pass by reference (a 
pointer in C, \@ in perl etc...) and allow the user to specify this function 
should they so desire. This is where I become a little stuck.

I have searched the lists, used Google and not found an accepted solution. How 
do I pass by reference? What are my options? Can you point me towards some suitable 
reading? Are there any good examples? How have others overcome this problem?



R discourages passing by reference; it encourages a functional style of 
programming.  You can do references by using an external pointer (so the 
contents of the object are managed entirely in your C code), or by 
putting the object into an environment, and passing the environment around.
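A minimal sketch of the environment approach (all names below are made up for illustration): the environment itself is passed by reference, so handing it to a function does not copy the matrix it contains:

```r
# keep the big matrix X inside an environment; functions receive the
# environment (a reference), not a copy of X
make_state <- function(n) {
  e <- new.env()
  e$X <- matrix(rnorm(n * n), n, n)
  e
}
loglik <- function(state) sum(dnorm(state$X[, 1], log = TRUE))  # read via reference
swap_col <- function(state, j, v) state$X[, j] <- v             # update in place

st <- make_state(100)
swap_col(st, 1, rep(0, 100))   # later calls see the proposed column
loglik(st)
```

Note that modifying X still duplicates the matrix internally; what the environment buys you is that passing it around, and reading from it, never does.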


Duncan Murdoch



Re: [R] Co-occurrence

2010-04-05 Thread Joris Meys
R contains a whole set of functions to work with text strings, of which
the following are definitely worth a look:
?substr
?strsplit
?grep
?gsub
...

Please provide us with an exact problem and some sample R code, e.g.
"I want to see if a certain word occurs in a vector":
a <- c("a","b","x","c")
b <- "x"
b.in.a <- ???   (this is solved by "b %in% a" )
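As a concrete co-occurrence sketch in base R only (toy documents, counting how often two terms appear in the same document):

```r
# toy "documents": each element is a vector of terms
docs <- list(c("apple", "banana"),
             c("apple", "cherry"),
             c("banana", "apple"))
# document-by-term incidence table
inc <- table(rep(seq_along(docs), lengths(docs)), unlist(docs))
# term-by-term co-occurrence counts: t(inc) %*% inc
cooc <- crossprod(inc)
cooc["apple", "banana"]   # 2: they share two documents
```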

Cheers
Joris
On Mon, Apr 5, 2010 at 1:19 PM,  wrote:

> Hello everyone!
>
> I have searched through different R sites; however, I did not find any
> useful site for my problem.
>
> Is there any co-occurrence program in R which could work with text strings?
>
> Thanks for any help or hint,
> best regards,
> Georg
>
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php






[R] ggplot2 geom_rect(): What am I missing here

2010-04-05 Thread Marshall Feldman
Thanks to David Winsemius, Peter Ehlers, and Paul Murrell who pointed 
out my careless error working with ggplot2's geom_rect(). Not to make 
excuses, but when you've done something successfully dozens of times and 
suddenly it doesn't work, you're more likely to look for careless errors 
on your part. When you've never done something before and are unsure that 
you understand the proper use of the tool, you're more likely to think 
you're missing something about the tool's proper use and to overlook 
your own careless errors.


This list is great! I posted my question, went off to do something else, 
and within a few hours had the answer to my problem.


Thanks again

Marsh Feldman



[R] A question about the Wilcoxon signed rank test

2010-04-05 Thread hix li
Hi guys,
 
I have two data sets of prices: endprice0, endprice1
 
I use the Wilcox test:
 
wilcox.test(endprice0, endprice1, paired = TRUE, alternative = "two.sided",  
conf.int = T, conf.level = 0.9)
 
The result is V = 1819, p-value = 0.8812. 
 
Then I calculated the z-value of the test: z-value = -2.661263. The 
corresponding p-value is p-value = 0.003892, which is different from the 
p-value computed by wilcox.test. I used the following steps to compute 
the z-value:
 
diff <- endprice0 - endprice1
diffNew <- diff[diff != 0]
diffNew.rank <- rank(abs(diffNew))
diffNew.rank.sign <- diffNew.rank * sign(diffNew)
ranks.pos <- sum(diffNew.rank.sign[diffNew.rank.sign > 0])   # 1819
ranks.neg <- -sum(diffNew.rank.sign[diffNew.rank.sign < 0])  # 1751

v <- ranks.neg
n <- 100
z <- (v - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)  # -2.661263


Which p-value should I take for the Wilcox test then? 
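One likely source of the discrepancy, as a hedged check (this ignores the tie correction wilcox.test also applies to the variance): after the zero differences are dropped, n should be the number of non-zero differences rather than 100. Since ranks.pos + ranks.neg = 1819 + 1751 = 3570 = n(n+1)/2, that gives n = 84:

```r
# recompute the normal approximation with n = number of non-zero
# differences (84), plus the 0.5 continuity correction wilcox.test uses
v <- 1751                                  # the smaller rank sum
n <- 84                                    # solves 1819 + 1751 = n*(n+1)/2
z <- (v - n * (n + 1) / 4 + 0.5) /         # v < mean, so +0.5 shrinks |z|
  sqrt(n * (n + 1) * (2 * n + 1) / 24)
p <- 2 * pnorm(-abs(z))
round(p, 4)                                # close to wilcox.test's 0.8812
```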
 
Hix
 
the data sets used in my test are:

endprice0 = c(136.3800, 134.8500, 350.7500, 18.8400, 0., 0.0600, 159.1900, 
242.5600, 0.0400, 289.9000, 0., 42.6100, 275.9500, 76.6200, 36.6400, 
0., 81.5900, 179.3600, 86.2200, 210.8000, 118.7200, 45.5800, 98.1900, 
137.0300, 47.7900, 123.7700, 23.2400, 0.0400, 130.2300, 0.0400, 0., 
130.3800, 150.7600, 0.5900, 277.3000, 166.0100, 0.0400, 71.9400, 80.1300, 
162.8800, 85.0500, 125.4400, 138.0600, 0.0600, 140.6300, 100.9700, 0., 
0.0400, 213.7300, 86.9200, 294.8200, 0.0400, 0., 239.2100, 0., 13.7700, 
95.5300, 0.0400, 146.7200, 0., 0.00, 121.57, 68.23, 5.31, 0.04, 96.31, 
206.02, 313.39, 92.34, 31.64, 118.71, 499.6, 0, 129.04, 106.88, 183.92, 50.42, 
0, 0.04, 0.04, 1.57, 355.56, 81.19, 327.17, 151.18, 0, 0, 125.03, 0, 0.04, 
132.01, 0, 0, 11.49, 23, 13.46, 326.64, 198.19, 114.22, 79.53)
 
endprice1 = c(138.9300, 131.9700, 300.4700, 0., 0., 0.2200, 159.6300, 
277.9100, 0., 328.9700, 0., 40.5100, 270.1000, 52.8000, 39.3800, 
0.0400, 79.7100, 110.5600, 41.1600, 224.6600, 123.8800, 53.2700, 96.1500, 
67.2800, 40.7300, 99.4900, 20.4900, 0.0400, 126.1000, 0., 1.3700, 140.6500, 
165.7200, 0., 314.4200, 207.7400, 0.0400, 76.9300, 75.8000, 184.9100, 
83.3700, 139.5300, 157.0500, 0., 147.5900, 105.2800, 0., 0., 
207.3000, 74.1100, 288.3900, 0.0400, 0., 213.7200, 0.0400, 14.8300, 
53.7000, 0.0400, 150.0800, 0., 0, 123.73, 68.01, 9.52, 0, 111.86, 249.69, 
354.18, 98, 31.3, 117.54, 455.32, 1.06, 127.92, 114.51, 173.85, 53.22, 0, 0, 0, 
0.31, 376.69, 69.43, 278.8, 147.11, 0.04, 0, 120.05, 0, 0.04, 132.97, 0, 0, 
9.98, 28.85, 13.77, 295.17, 191.54, 126.44, 84.83)


 





Re: [R] Creating R packages, passing by reference and oo R.

2010-04-05 Thread Gabor Grothendieck
Passing by value does not necessarily mean physical copying.  Check out this:

> x <- matrix(1:1000^2, 1000, 1000)
> gc()
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 114520  3.1 35  9.4   35  9.4
Vcells 577124  4.51901092 14.6  1577448 12.1
> f <- function(x) { y <- max(x); print(gc()); y }
> f(x)
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 114560  3.1 35  9.4   35  9.4
Vcells 577133  4.51901092 14.6  1577448 12.1
[1] 100
> R.version.string
[1] "R version 2.10.1 (2009-12-14)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"
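A companion sketch of the other half of the rule: it is modification inside the callee, not the call itself, that triggers duplication, and the caller's object stays unchanged:

```r
# reads need no copy; a write inside a function touches a local
# duplicate, so the caller's matrix is untouched
x <- matrix(1:4, 2, 2)
f_read  <- function(m) sum(m)                # no duplication needed
f_write <- function(m) { m[1, 1] <- 0L; m }  # m is duplicated on write
y <- f_write(x)
x[1, 1]   # still 1: the caller's x is unchanged
```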


On Mon, Apr 5, 2010 at 7:35 AM, Roger Gill  wrote:
> Dear All,
>
> I would like some advice on creating R packages, passing by reference and oo 
> R.
>
> I have created a package that is neither elegant nor extensible and rather 
> cumbersome (it works). I would like to rewrite the code to make the package 
> distributable (should it be of interest) and easy to maintain.
>
> The package is for Bayesian model determination via a reversible jump 
> algorithm and has currently been written to minimise computational expense.
>
> At every iteration of the MCMC I make a call to a likelihood function. Here, 
> I pass by value a matrix containing the explanatory variables of the current 
> model and this is quite costly if X is quite large. This matrix could change 
> at every iteration (it's certainly proposed). Ideally I would pass by 
> reference (a pointer in C, \@ in perl etc...) and allow the user to specify 
> this function should they so desire. This is where I become a little stuck.
>
> I have searched the lists, used Google and not found an accepted solution. How 
> do I pass by reference? What are my options? Can you point me towards some suitable 
> reading? Are there any good examples? How have others overcome this problem?
>
> Thanks in advance,
>
> best wishes
>
> Roger
>
>
>
>
>



[R] bootstrap confidence intervals, non iid

2010-04-05 Thread Kay Cichini

hello,

I need to calculate CIs for each of 4 groups within a dataset, to be able
to infer differences in the variable "similarity". The problem is that the
data within groups are dependent, as assigned by the blocking factor
"site". My guess was to use a block bootstrap, but the samples within these
blocks / sites are not of the same length. I was not able to find a method
to deal with this.

I'd really appreciate any advice!

thanks,
kay

my data:
##
similarity<-data.frame(list(structure(list(stage = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), site =
structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 6L, 6L, 
6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 
10L, 10L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 14L, 
14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 18L, 
18L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 21L, 
22L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L, 25L, 25L, 
25L, 25L, 26L, 26L, 26L, 26L, 27L, 27L, 27L, 27L, 28L, 28L, 28L, 
28L, 29L, 29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 32L, 32L, 32L, 
32L, 33L, 33L, 33L, 33L, 34L, 34L, 34L, 34L, 35L, 35L, 35L, 35L, 
36L, 36L, 36L, 36L, 37L, 37L, 38L, 38L, 38L, 38L, 39L, 39L, 39L
), .Label = c("A11", "A12", "A14", "A15", "A16", "A17", "A18", 
"A19", "A20", "A5", "A7", "A8", "B1", "B12", "B13", "B14", "B15", 
"B17", "B18", "B2", "B4", "B7", "B8", "B9", "C1", "C10", "C11", 
"C15", "C17", "C18", "C19", "C2", "C20", "C3", "C4", "C6", "D1", 
"D4", "D7"), class = "factor"), MH.Index = c(0.392156863, 0.602434077, 
0.576923077, 0.647482014, 0.989010989, 0.857142857, 1, 1, 1, 
0, 1, 0.378378378, 0.839087948, 0.252915554, 1, 0.22556391, 0.510366826, 
0.476190476, 0.555819477, 0.961538462, 0.7, 0.089285714, 
0.923076923, 0.571428571, 0, 0.923076923, 0.617647059, 0.599423631, 
0, 0.727272727, 0.998112812, 0, 0, 0, 1, 0.565656566, 0.75, 0.923076923, 
0.654545455, 0.14084507, 0.617647059, 0.315789474, 0.179347826, 
0.583468021, 0.165525114, 0.817438692, 0.41457, 0.49548886, 
0.556127703, 0.707431246, 0.506757551, 0.689655172, 0.241433511, 
0.379232506, 0.241935484, 0, 0.30848329, 0.530973451, 0.148148148, 
0, 0.976744186, 0.550218341, 0.542168675, 0.769230769, 0.153310105, 
0, 0, 0.380569406, 0.742174733, 0.2, 0.046925432, 0, 
0.068076328, 0.772727273, 0.830039526, 0.503458415, 0.863910822, 
0.39401263, 0.081818182, 0.368421053, 0.088607595, 0, 0.575499851, 
0.605657238, 0.714854232, 0.855881172, 0.815689401, 0.552207228, 
0.81708081, 0.583228133, 0.334466349, 0.259477365, 0.194711538, 
0.278916707, 0.636304805, 0.593715432, 0.661016949, 0.626865672, 
0.420219245, 0.453535143, 0.471243706, 0.462427746, 0.56980057, 
0.453821155, 0.052828527, 0.926829268, 0.51988266, 0.472200264, 
0.351219512, 0.290030211, 0.765258974, 0.564894108, 0.789699571, 
0.863378215, 0.525181559, 0.803061458, 0.260164645, 0.477265792, 
0.265889379, 0.317791411, 0.107623318, 0.279181709, 0.471953363, 
0.463724265, 0.241966696, 0.403647213, 0.693087992, 0.494259925, 
0.68904453, 0.39329147, 0.498161213, 0.376225983, 0.407001046, 
0.825016633, 0.718991658, 0.662995912)), .Names = c("stage", 
"site", "MH.Index"), class = "data.frame", row.names = c(NA, 
-136L))))
##
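One hedged possibility (not a canned method): a site-level (cluster) bootstrap that resamples whole sites with replacement, which tolerates unequal block sizes; site_boot_ci is a made-up helper name:

```r
# cluster bootstrap sketch: resample whole sites (blocks) with
# replacement, recompute the mean similarity, take percentile CIs
site_boot_ci <- function(df, R = 2000, probs = c(0.025, 0.975)) {
  sites <- unique(df$site)
  stat  <- replicate(R, {
    s <- sample(sites, length(sites), replace = TRUE)
    mean(unlist(lapply(s, function(id) df$MH.Index[df$site == id])))
  })
  quantile(stat, probs)
}
# e.g. one CI per stage:
# lapply(split(similarity, similarity$stage), site_boot_ci)
```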
-- 
View this message in context: 
http://n4.nabble.com/bootstrap-confidence-intervals-non-iid-tp1751619p1751619.html
Sent from the R help mailing list archive at Nabble.com.



[R] using difftime()

2010-04-05 Thread steve012

I'm new to R and have the following problem with difftime:

if I directly assign date/time strings in difftime I get the expected
result:

> a<-"2010-03-23 10:52:00"
> a
[1] "2010-03-23 10:52:00"
> b<-"2010-03-23 11:53:00"
> u2<-as.difftime(c(a,b), format ="%Y-%m-%d %H:%M:%S", units="mins")
> u2
Time differences in mins
[1] -18008 -17947
attr(,"tzone")


However, if I first assign the values of "a" and "b" from a data frame
within a loop I get a different result:

a<-u[i]
> a
[1] 2010-03-23 10:52:00
6838 Levels: 2010-03-18 16:54:00 2010-03-18 16:55:00 ... 2010-03-23 11:53:00
b<-u[i+1]
> b
[1] 2010-03-23 11:53:00
6838 Levels: 2010-03-18 16:54:00 2010-03-18 16:55:00 ... 2010-03-23 11:53:00
u2<-as.difftime(c(a,b), format ="%Y-%m-%d %H:%M:%S", units="mins")
> u2
Time differences in mins
[1] 6837 6838


So, how do I use difftime in the context of a loop? Thanks.
-- 
View this message in context: 
http://n4.nabble.com/using-difftime-tp1751607p1751607.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] GLM

2010-04-05 Thread David Winsemius


On Apr 5, 2010, at 5:32 AM, Nuno Miguel Madeira Veiga wrote:



Hi,
I am working on GLM models. However, I am having some problems and  
would appreciate some guidance


One of the explanatory variables ERECTANGLE  is not present in all  
the individual rows.


1 – when I delete the rows for which the variable ERECTANGLE  is  
missing I get the following results

   Null deviance: 290.884  on 774  degrees of freedom
Residual deviance:  85.863  on 760  degrees of freedom
AIC: 11385

2 - when I run the model with all the rows, including those where  
ERECTANGLE is missing, I get the following results

   Null deviance: 480.75  on 1232  degrees of freedom
Residual deviance: 140.30  on 1211  degrees of freedom
AIC: 18113


So I am curious about how the algorithm functions in terms of model  
adjustment


Which algorithm? You have not included any code.

What data? Most routines would simply exclude missing data if it is  
properly registered as NA. If you used a missing-data indicator of  
your own construction, then who knows what might be happening.
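As a small illustration of what glm is doing (simulated data standing in for the ERECTANGLE case): rows with NA in any model variable are silently dropped by the default na.action, so the two fits use different data and their deviances and AICs are not directly comparable:

```r
# glm with the default na.action (na.omit) silently drops incomplete rows
set.seed(1)
d <- data.frame(y = rbinom(40, 1, 0.5), x = rnorm(40))
d$x[1:10] <- NA                        # mimic the missing ERECTANGLE values
fit <- glm(y ~ x, data = d, family = binomial)
nobs(fit)                              # 30: the 10 incomplete rows were dropped
```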


--
David.


I will be waiting for your answer Thanks
Nuno



David Winsemius, MD
West Hartford, CT



Re: [R] sample size > 20K? Was: fitness of regression tree: how to measure???

2010-04-05 Thread Liaw, Andy
Just to follow up on Bert's and Frank's excellent comments.  I
continue to be amazed by people trying to interpret a single tree.
Besides the variability in the tree structure (try bootstrapping and see
how the trees change), it is difficult to make sense of splits more than
a few levels down (high order interaction).  

Also, even large sample size won't make up for poorly sampled data.
From Wikipedia's entry on the Kinsey Reports: "In 1948, the same year as
the original publication, a committee of the American Statistical
Association, including notable statisticians such as John Tukey,
condemned the sampling procedure. Tukey was perhaps the most vocal
critic, saying, "A random selection of three people would have been
better than a group of 300 chosen by Mr. Kinsey."

Andy

From: Frank E Harrell Jr
> 
> Good comments Bert.  Just 2 points to add: People rely a lot 
> on the tree 
> structure found by recursive partitioning, so the structure 
> needs to be 
> stable.  This requires a huge sample size.  Second, recursive 
> partitioning is not competitive with other methods in terms of 
> predictive discrimination unless the sample size is so large that the 
> tree doesn't need to be pruned upon cross-validation.
> 
> Frank
> 
> 
> Bert Gunter wrote:
> > Since Frank has made this somewhat cryptic remark (sample 
> size > 20K)
> > several times now, perhaps I can add a few words of (what I 
> hope is) further
> > clarification.
> > 
> > Despite any claims to the contrary, **all** statistical 
> (i.e. empirical)
> > modeling procedures are just data interpolators: that is, 
> all that they can
> > claim to do is produce reasonable predictions of what may 
> be expected within
> > the extent of the data. The quality of the model is judged 
> by the goodness
> > of fit/prediction over this extent. Ergo the standard 
> textbook caveats about
> > the dangers of extrapolation when using fitted models for 
> prediction. Note,
> > btw, the contrast to "mechanistic" models, which typically 
> **are** assessed
> > by how well they **extrapolate** beyond current data. For 
> example, Newton's
> > apple to the planets. They are often "validated" by their 
> ability to "work"
> > in circumstances (or scales) much different than those from 
> which they were
> > derived.
> > 
> > So statistical models are just fancy "prediction engines." 
> In particular,
> > there is no guarantee that they provide any meaningful assessment of
> > variable importance: how predictors causally relate to the response.
> > Obviously, empirical modeling can often be useful for this purpose,
> > especially in well-designed studies and experiments, but there's no
> > guarantee: it's an "accidental" byproduct of effective prediction.
> > 
> > This is particularly true for happenstance (un-designed) data and
> > non-parametric models like regression/classification trees. 
> Typically, there
> > are many alternative models (trees) that give essentially 
> the same quality
> > of prediction. You can see this empirically by removing a 
> modest random
> > subset of the data and re-fitting. You should not be 
> surprised to see the
> > fitted model -- the tree topology -- change quite 
> radically. HOWEVER, the
> > predictions of the models within the extent of the data 
> will be quite
> > similar to the original results. Frank's point is that 
> unless the data set
> > is quite large and the predictive relationships quite 
> strong -- which
> > usually implies parsimony -- this is exactly what one 
> should expect. Thus it
> > is critical not to over-interpret the particular model one 
> get, i.e. to
> > infer causality from the model (tree)structure.
> > 
> > Incidentally, there is nothing new or radical in this; 
> indeed, John Tukey,
> > Leo Breiman, George Box, and others wrote eloquently about 
> this decades ago.
> > And Breiman's random forest modeling procedure explicitly 
> abandoned efforts
> > to build simply interpretable models (from which one might 
> infer causality)
> > in favor of building better interpolators, although 
> assessment of "variable
> > importance" does try to recover some of that 
> interpretability (however, no
> > guarantees are given).
> > 
> > HTH. And contrary views welcome, as always.
> > 
> > Cheers to all,
> > 
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >  
> >  
> > -Original Message-
> > From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On
> > Behalf Of Frank E Harrell Jr
> > Sent: Thursday, April 01, 2010 5:02 AM
> > To: vibha patel
> > Cc: r-help@r-project.org
> > Subject: Re: [R] fitness of regression tree: how to measure???
> > 
> > vibha patel wrote:
> >> Hello,
> >>
> >> I'm using rpart function for creating regression trees.
> >> now how to measure the fitness of regression tree???
> >>
> >> thanks n Regards,
> >> Vibha
> > 
> > If the sample size is less than 20,000, assume that the tree is a 
> > somewhat arbitrary representation of the relations

Re: [R] using difftime()

2010-04-05 Thread jim holtman
The first thing to notice in your second example is that you have a "factor"
where you think you have strings.  You should try:

u2<-as.difftime(c(as.character(a),as.character(b)), format ="%Y-%m-%d
%H:%M:%S", units="mins")

and see if this gives you what you are expecting.  If you are reading this
in with 'read.table', you might want to specify those columns as characters,
or use 'as.is=TRUE' to prevent conversion to factors.  If you are not using
factors, then you probably want the character strings.
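For illustration, a minimal sketch of the factor pitfall with hypothetical timestamps (in the R of that era, read.table turned strings into factors by default, and arithmetic on factors silently used the integer level codes):

```r
# Timestamps read in as a factor (read.table's old default for strings)
u <- factor(c("2010-03-23 10:52:00", "2010-03-23 11:53:00"))

as.numeric(u)    # the underlying level codes (1, 2), not times

# Convert to character first, then parse, then take the difference:
a <- as.POSIXct(as.character(u[1]), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
b <- as.POSIXct(as.character(u[2]), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
difftime(b, a, units = "mins")   # Time difference of 61 mins
```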
On Mon, Apr 5, 2010 at 9:19 AM, steve012  wrote:

>
> I'm new to R and have the following problem with difftime:
>
> if I directly assign date/time strings in difftime I get the expected
> result:
>
> > a<-"2010-03-23 10:52:00"
> > a
> [1] "2010-03-23 10:52:00"
> > b<-"2010-03-23 11:53:00"
> > u2<-as.difftime(c(a,b), format ="%Y-%m-%d %H:%M:%S", units="mins")
> > u2
> Time differences in mins
> [1] -18008 -17947
> attr(,"tzone")
>
>
> However, if I first assign the values of "a" and "b" from a data frame
> within a loop I get a different result:
>
> a<-u[i]
> > a
> [1] 2010-03-23 10:52:00
> 6838 Levels: 2010-03-18 16:54:00 2010-03-18 16:55:00 ... 2010-03-23
> 11:53:00
> b<-u[i+1]
> > b
> [1] 2010-03-23 11:53:00
> 6838 Levels: 2010-03-18 16:54:00 2010-03-18 16:55:00 ... 2010-03-23
> 11:53:00
> u2<-as.difftime(c(a,b), format ="%Y-%m-%d %H:%M:%S", units="mins")
> > u2
> Time differences in mins
> [1] 6837 6838
>
>
> So, how do use difftime in the context of a loop? Thanks.
> --
> View this message in context:
> http://n4.nabble.com/using-difftime-tp1751607p1751607.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A questionb about the Wilcoxon signed rank test

2010-04-05 Thread David Winsemius


On Apr 5, 2010, at 8:06 AM, hix li wrote:


Hi guys,

I have two data sets of prices: endprice0, endprice1

I use the Wilcox test:

wilcox.test(endprice0, endprice1, paired = TRUE, alternative =  
"two.sided",  conf.int = T, conf.level = 0.9)


The result is with V = 1819, p-value = 0.8812.

Then I calculated the z-value of the test: z-value = -2.661263. The  
corresponding p-value is: p-value = 0.003892, which is different from the  
p-value computed in the Wilcox test. I am using the following steps to compute  
the z-value:


If you are trying to invent a new test then you should provide a  
theoretic justification. If you are doing this as a homework exercise,  
then consult with your instructor. If you are looking for alternative  
methods of looking at the data then either do a paired t.test or try:


 plot(density(endprice0))
lines(density(endprice1), col="red")

--
David.



diff = c(endprice0 - endprice1)
diffNew = diff[diff !=0]
diffNew.rank = rank(abs(diffNew))
diffNew.rank.sign <-  diffNew.rank  *  sign(diffNew)
ranks.pos <- sum(diffNew.rank.sign[diffNew.rank.sign >0]) = 1819
ranks.neg <- -sum(diffNew.rank.sign[diffNew.rank.sign<0]) = 1751

v = ranks.neg
n = 100
z= (v - n *(n+1)/4)/sqrt(n*(n+1)*(2*n+1)/24) = -2.661263


Which p-value should I take for the Wilcox test then?

Hix

the data sets used in my test are:

endprice0 = c(136.3800, 134.8500, 350.7500, 18.8400, 0., 0.0600,  
159.1900, 242.5600, 0.0400, 289.9000, 0., 42.6100, 275.9500,  
76.6200, 36.6400, 0., 81.5900, 179.3600, 86.2200, 210.8000,  
118.7200, 45.5800, 98.1900, 137.0300, 47.7900, 123.7700, 23.2400,  
0.0400, 130.2300, 0.0400, 0., 130.3800, 150.7600, 0.5900,  
277.3000, 166.0100, 0.0400, 71.9400, 80.1300, 162.8800, 85.0500,  
125.4400, 138.0600, 0.0600, 140.6300, 100.9700, 0., 0.0400,  
213.7300, 86.9200, 294.8200, 0.0400, 0., 239.2100, 0.,  
13.7700, 95.5300, 0.0400, 146.7200, 0., 0.00, 121.57, 68.23,  
5.31, 0.04, 96.31, 206.02, 313.39, 92.34, 31.64, 118.71, 499.6, 0,  
129.04, 106.88, 183.92, 50.42, 0, 0.04, 0.04, 1.57, 355.56, 81.19,  
327.17, 151.18, 0, 0, 125.03, 0, 0.04, 132.01, 0, 0, 11.49, 23,  
13.46, 326.64, 198.19, 114.22, 79.53)


endprice1 = c(138.9300, 131.9700, 300.4700, 0., 0., 0.2200,  
159.6300, 277.9100, 0., 328.9700, 0., 40.5100, 270.1000,  
52.8000, 39.3800, 0.0400, 79.7100, 110.5600, 41.1600, 224.6600,  
123.8800, 53.2700, 96.1500, 67.2800, 40.7300, 99.4900, 20.4900,  
0.0400, 126.1000, 0., 1.3700, 140.6500, 165.7200, 0.,  
314.4200, 207.7400, 0.0400, 76.9300, 75.8000, 184.9100, 83.3700,  
139.5300, 157.0500, 0., 147.5900, 105.2800, 0., 0.,  
207.3000, 74.1100, 288.3900, 0.0400, 0., 213.7200, 0.0400,  
14.8300, 53.7000, 0.0400, 150.0800, 0., 0, 123.73, 68.01, 9.52,  
0, 111.86, 249.69, 354.18, 98, 31.3, 117.54, 455.32, 1.06, 127.92,  
114.51, 173.85, 53.22, 0, 0, 0, 0.31, 376.69, 69.43, 278.8, 147.11,  
0.04, 0, 120.05, 0, 0.04, 132.97, 0, 0, 9.98, 28.85, 13.77, 295.17,  
191.54, 126.44, 84.83)






  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R 2.6 Support Question

2010-04-05 Thread Raadt, Timothy W.

Hello,

I have a question on the support of the R 2.6 software.  We are in the process 
of planning for a  hardware refresh and our new machines will be running 
Windows 7 and Internet Explorer 8.   My question is if the R 2.6 software would 
be supported on a system running Windows 7 and Internet  Explorer 8?  Also, are 
you aware of any customers having issues with Windows 7 and/or  Internet 
Explorer 8?

Thank you for your time and any information you can provide.

Tim Raadt
Systems Developer
Federated Insurance
121 East Park Square
Owatonna, MN 55060
507-444-7163





This e-mail and its attachments are intended only for the use of the 
addressee(s) and may contain privileged, confidential or proprietary 
information. If you are not the intended recipient, or the employee or agent 
responsible for delivering the message to the intended recipient, you are 
hereby notified that any dissemination, distribution, displaying, copying, or 
use of this information is strictly prohibited. If you have received this 
communication in error, please inform the sender immediately and delete and 
destroy any record of this message. Thank you.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating R packages, passing by reference and oo R.

2010-04-05 Thread Sharpie


Gabor Grothendieck wrote:
> 
> Passing by value does not necessarily mean physical copying.  Check out
> this:
> 
>> x <- matrix(1:1000^2, 1000, 1000)
>> gc()
>  used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 114520  3.1 35  9.4   35  9.4
> Vcells 577124  4.51901092 14.6  1577448 12.1
>> f <- function(x) { y <- max(x); print(gc()); y }
>> f(x)
>  used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 114560  3.1 35  9.4   35  9.4
> Vcells 577133  4.51901092 14.6  1577448 12.1
> [1] 100
>> R.version.string
> [1] "R version 2.10.1 (2009-12-14)"
>> win.version()
> [1] "Windows Vista (build 6002) Service Pack 2"
> 

Yes, but you might pay a price in memory if you start altering values in x
within the function:

> x <- matrix(1:1000^2, 1000, 1000)
> gc(TRUE)
Garbage collection 5 = 1+0+4 (level 2) ... 
3.2 Mbytes of cons cells used (34%)
4.5 Mbytes of vectors used (30%)
 used (Mb) gc trigger (Mb) max used (Mb)
Ncells 119554  3.2 35  9.4   35  9.4
Vcells 578542  4.51902416 14.6  1581475 12.1

> f <- function(x) { x[100,100]<-2; print(gc(TRUE)) }
> f(x)
Garbage collection 7 = 1+0+6 (level 2) ... 
3.2 Mbytes of cons cells used (34%)
12.1 Mbytes of vectors used (60%)
  used (Mb) gc trigger (Mb) max used (Mb)
Ncells  119634  3.2 35  9.4   35  9.4
Vcells 1578573 12.12629025 20.1  2078567 15.9

> R.version.string
[1] "R version 2.10.1 (2009-12-14)"
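Charlie's point can also be seen without gc(): the first write inside a function forces a private copy, so the caller's object is never altered (a minimal sketch; on builds with memory profiling enabled, `tracemem(x)` would additionally report the duplication):

```r
x <- matrix(1:4, 2, 2)

f_read  <- function(m) max(m)                 # read-only: shares x's memory
f_write <- function(m) { m[1, 1] <- 0L; m }   # first write copies the matrix

f_read(x)        # 4
y <- f_write(x)
x[1, 1]          # still 1: the caller's matrix is untouched
y[1, 1]          # 0, in the function's private copy
```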


-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://n4.nabble.com/Creating-R-packages-passing-by-reference-and-oo-R-tp1751525p1751682.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] skip for loop

2010-04-05 Thread Ravi S. Shankar
Hi R,

 

I am running a for loop in which I am doing a certain calculation. As an
outcome of the calculation I get an output, say "a". Now in my for loop "i"
needs to be initialized to "a". 

 

Based on the below example, if the output "a"=3 then the second iteration
needs to be skipped. Is there a way to do this? 

for(i in 1:5)

{

##Calculation##

a=3 ## outcome of calculation

}

 

Any help appreciated. Thanks in advance for the time!

 

Regards

Ravi

 

This e-mail may contain confidential and/or privileged i...{{dropped:13}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R 2.6 Support Question

2010-04-05 Thread Barry Rowlingson
On Mon, Apr 5, 2010 at 3:20 PM, Raadt, Timothy W.  wrote:
>
> Hello,
>
> I have a question on the support of the R 2.6 software.  We are in the 
> process of planning for a  hardware refresh and our new machines will be 
> running Windows 7 and Internet Explorer 8.   My question is if the R 2.6 
> software would be supported on a system running Windows 7 and Internet  
> Explorer 8?  Also, are you aware of any customers having issues with Windows 
> 7 and/or  Internet Explorer 8?
>
> Thank you for your time and any information you can provide.

 Do you understand how Open Source software works? There's a
difference between 'runs on' and 'supported on'. R 2.6 might 'run on'
Windows 7, but that doesn't mean if you have a problem that someone
will fix it. The same applies to any version of R on any operating
system really.

 If you want "support" for R (in the sense of something that props you
up when otherwise you'd fall over) then you need to pony up some cash
and find a company that does it. These do exist. R-help isn't it.

Also, if you find that R 2.6 doesn't 'run on' Windows 7, you could
offer to pay an individual to port it. Or even to port your code for R
2.6 onto the latest version. However, R 2.6 is very, very old now.

 Have you yet tried running R 2.6 on Windows 7? That would be the
first thing you could do. Not like it will cost you anything assuming
you have a Windows 7 box handy.

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Rprintf not updating

2010-04-05 Thread Erik Wright
Hello all,

I am using Rprintf in a C for loop (from .Call) to print a progress indicator 
showing the current percent complete. The loop makes a time-intensive call to 
another function.  I have noticed that Rprintf does not print to the 
R window until the entire loop has completed.  When it reaches the end of 
the loop it suddenly prints 0 percent to 100 percent in a split second.  For 
less intensive function calls it prints properly while looping.  My question is 
this:  is there any way to force Rprintf to print to the screen during a loop?

On a related note, how can I show a percent sign in the output?  For example, 
Rprintf("0%"); only prints a zero.

Thanks!,
Erik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R 2.6 Support Question

2010-04-05 Thread Sharpie


Raadt, Timothy W. wrote:
> 
> 
> Hello,
> 
> I have a question on the support of the R 2.6 software.  We are in the
> process of planning for a  hardware refresh and our new machines will be
> running Windows 7 and Internet Explorer 8.   My question is if the R 2.6
> software would be supported on a system running Windows 7 and Internet 
> Explorer 8?  Also, are you aware of any customers having issues with
> Windows 7 and/or  Internet Explorer 8?
> 
> Thank you for your time and any information you can provide.
> 
> Tim Raadt
> Systems Developer
> Federated Insurance
> 121 East Park Square
> Owatonna, MN 55060
> 507-444-7163
> 

Well, "support" given on this mailing list is a volunteer effort.  It is
important to understand that members of this list are in no way obligated to
answer any questions- they do it in their free time as a service to the R
community.  A key part of this process is for volunteers to be able to
quickly reproduce problems with the version of R they have installed. 
Generally, the most knowledgeable people on this list tend to run the newest
version of R available-- which is currently 2.10.1.

R 2.6 is nearly three years old, so it is unlikely that a significant number
of people on this list still use it.  Therefore, the answer to any questions
posed concerning 2.6 is likely to be "upgrade to the newest version and see
if the problem goes away".

If you want "real" technical support, then that will require purchasing from
a company that provides commercially supported versions of R.  Two that I
know of are R+ from XL Solutions and REvolution R from REvolution Computing.

-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://n4.nabble.com/R-2-6-Support-Question-tp1751666p1751705.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R 2.6 Support Question

2010-04-05 Thread Marc Schwartz
On Apr 5, 2010, at 9:20 AM, Raadt, Timothy W. wrote:

> 
> Hello,
> 
> I have a question on the support of the R 2.6 software.  We are in the 
> process of planning for a  hardware refresh and our new machines will be 
> running Windows 7 and Internet Explorer 8.   My question is if the R 2.6 
> software would be supported on a system running Windows 7 and Internet  
> Explorer 8?  Also, are you aware of any customers having issues with Windows 
> 7 and/or  Internet Explorer 8?
> 
> Thank you for your time and any information you can provide.

See the R Windows FAQ, which is generally updated to reflect current versions 
of R:

  
http://cran.r-project.org/bin/windows/base/rw-FAQ.html#Does-R-run-under-Windows-Vista_003f

That being said, R version 2.6.x is no longer supported from the perspective of 
bug fixes, patches and so forth. That version is now well over 2 years old, 
which is, in the R space-time continuum, ancient. The current release and the 
only actively supported version is 2.10.1.

If you want to get a sense of R's Software Development Life Cycle (SDLC), 
please read Section 6 in:

  http://www.r-project.org/doc/R-FDA.pdf

While the above document is targeted to regulated clinical trials, the SDLC is 
explained in detail there. In brief, there is only one version of R that is 
formally supported at any given time and that is the current release version. 

From the perspective of using the R e-mail lists for support, if you post a 
query and mention that you are using R version 2.6.x, the first thing that you 
will get back in reply is a rather direct series of messages telling you to 
update to the current release version of R before posting further inquiries.

You can also review the R Installation and Administration Manual:

  http://cran.r-project.org/doc/manuals/R-admin.html

which is probably the first place that you should look relative to R running on 
particular platforms, Windows or otherwise.

Finally, you can search the e-mail list archives using either the built-in 
function RSiteSearch(), or one of the online search engines such as rseek.org. 
You will find myriad discussions on R and Windows 7, etc.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression in an incomplete dataset

2010-04-05 Thread JoAnn Alvarez

Hello Desmond,

The only way to not drop cases with incomplete data would be some sort 
of imputation for the missing covariates.
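One standard route is multiple imputation; a minimal sketch assuming the CRAN package `mice` and a hypothetical data frame `mydata` with binary outcome `y` and partly missing covariates `x1`, `x2`:

```r
library(mice)   # assumes the CRAN package `mice` is installed

# `mydata` is a hypothetical data frame: binary outcome y, covariates
# x1 and x2, some of which are NA.
imp  <- mice(mydata, m = 5, printFlag = FALSE)        # 5 imputed data sets
fits <- with(imp, glm(y ~ x1 + x2, family = binomial(logit)))
summary(pool(fits))   # coefficients pooled across imputations (Rubin's rules)
```

Each of the five completed data sets is analyzed separately and the estimates are then combined, so no study case is dropped.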


JoAnn

Desmond Campbell wrote:

Dear all,

I want to do a logistic regression.
So far I've only found out how to do that in R, in a dataset of complete cases.
I'd like to do logistic regression via max likelihood, using all the study 
cases (complete and incomplete). Can you help?

I'm using glm() with family=binomial(logit).
If any covariate in a study case is missing then the study case is dropped, 
i.e. it is doing a complete cases analysis.
As a lot of study cases are being dropped, I'd rather it did maximum likelihood 
using all the study cases.
I tried setting glm()'s na.action to NULL, but then it complained about NA's 
present in the study cases.
I've about 1000 unmatched study cases and less than 10 covariates so could use 
unconditional ML estimation (as opposed to conditional ML estimation).

regards
Desmond


  



--
JoAnn Álvarez
Biostatistician
Department of Biostatistics
D-2220 Medical Center North
Vanderbilt University School of Medicine
1161 21st Ave. South 
Nashville, TN 37232-2158  


http://biostat.mc.vanderbilt.edu/JoAnnAlvarez

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] skip for loop

2010-04-05 Thread Tal Galili
Wouldn't using the following inside the loop:

if (a == 3) {
# do one thing
} else {
# do another thing
}

do the trick?
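Since the OP wants iteration i+1 skipped when iteration i produced a == 3, a flag plus `next` does it; a minimal sketch (the `a <- if (i == 1) ...` line is a stand-in for the real calculation):

```r
skip_next <- FALSE
ran <- integer(0)                 # record which iterations actually ran
for (i in 1:5) {
  if (skip_next) {
    skip_next <- FALSE
    next                          # skip this iteration entirely
  }
  a <- if (i == 1) 3 else 0       # stand-in for the real calculation
  if (a == 3) skip_next <- TRUE   # flag the following iteration for skipping
  ran <- c(ran, i)
}
ran   # 1 3 4 5: iteration 2 was skipped
```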



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Mon, Apr 5, 2010 at 5:46 PM, Ravi S. Shankar wrote:

> Hi R,
>
>
>
> I am running a for loop in which I am doing a certain calculation. As an
> outcome of calculation I get an out put say "a". Now in my for loop "I"
> needs to be initiated to "a".
>
>
>
> Based the below example if the output "a"=3 then the second iteration
> needs to be skipped. Is there a way to do this?
>
> for(i in 1:5)
>
> {
>
> ##Calculation##
>
> a=3 ## outcome of calculation
>
> }
>
>
>
> Any help appreciated. Thanks in advance for the time!
>
>
>
> Regards
>
> Ravi
>
>
>
> This e-mail may contain confidential and/or privileged i...{{dropped:13}}
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Deleting many list elements in one time

2010-04-05 Thread anna

Hi guys, here is a simple thing I want to do, but it doesn't work:
I have a vector, called index, of the indexes of the elements I want to delete,
so when I write myList[[index]] <- NULL to delete these elements here is
what I get:
Error in myList[[index]] <- NULL : 
  more elements supplied than there are to replace
Isn't it possible to delete multiple elements?


-
Anna Lippel
-- 
View this message in context: 
http://n4.nabble.com/Deleting-many-list-elements-in-one-time-tp1751715p1751715.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Deleting many list elements in one time

2010-04-05 Thread Duncan Murdoch

On 05/04/2010 11:17 AM, anna wrote:

Hi guys, here is a simple thing I want to do but it doesn't work:
I have a vector of the element indexes that I want to delete called index
so when I write myList[[index]] <- NULL to delete these elements here is
what I get:
Error in myList[[index]] <- NULL : 
  more elements supplied than there are to replace

Isn't it possible to delete multiple elements?


If your index is a list of numerical indices as opposed to names, then 
just do


myList <- myList[-index]

the same way you'd delete elements from any vector.

If index is a vector of names, you need to convert it to numbers, using 
something like


indexnums <- which( names(myList) %in% index )

and then use myList[-indexnums].
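Both cases in a minimal sketch (hypothetical four-element list):

```r
myList <- list(a = 1, b = 2, c = 3, d = 4)

# Numeric indices: a negative subscript drops those elements
index <- c(2, 4)
shorter <- myList[-index]               # keeps elements "a" and "c"

# Name-based deletion: translate the names to positions first
nms <- c("b", "d")
indexnums <- which(names(myList) %in% nms)
identical(myList[-indexnums], shorter)  # TRUE: same result either way
```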

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rprintf not updating

2010-04-05 Thread Sharpie


Erik Wright wrote:
> 
> Hello all,
> 
> I am using Rprintf in a C for loop (from .Call) to print a progress
> indicator showing the current percent complete. The loop I am doing is an
> time intensive call to another function.  I have noticed that Rprintf does
> not print to the R-window until the entire loop has been completed.  When
> it reaches the end of the loop it suddenly prints 0 percent to 100 percent
> in a split second.  For less intensive function calls it prints properly
> while looping.  My question is this:  is there any way to force Rprintf to
> print to the screen during a loop?
> 
> On a related note, how can I show a percent sign in the output?  For
> example, Rprintf("0%"); only prints a zero.
> 
> Thanks!,
> Erik
> 

Perhaps you could use a callback to R's built in progress bar functions to
perform this for you in a nicely formatted way.  Something of the form:

  pBar <- txtProgressBar( min = , max = , style = 3 )
  
  cResults <- .Call( 'some_c_routine', some, args, including, pBar )

  close( pBar )

Since you pass the progress bar to the C routine, you could perform a
callback to the R function setTxtProgressBar() to update it every iteration. 
I posted an example of how to perform callbacks from C to R in this thread:

 
http://n4.nabble.com/Writing-own-simulation-function-in-C-td1580190.html#a1580423

Hope this helps!

-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://n4.nabble.com/Rprintf-not-updating-tp1751703p1751725.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help transform R to C

2010-04-05 Thread pinusan

Dear R users,

I would like to transform the following "for loop" from R code to C code
because it takes a really long time to build the inc.freqy table.
Unfortunately, I do not have experience writing C code. 
Please give me some example or advice on transforming the R code to C.

I have attached the code and some result that I made in R.

Have a nice day.

Hong Su
 
# Data import
ct10_pt1_neartree<-read.table("~/Desktop/hongsu/clustered_pattern/ct10/ct10/ct10_58ft_pt1_neartree",
 
   header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
test<-round(ct10_pt1_neartree, digits=1)

# randomly select 2 data set for test
test<-as.matrix(test[sample(2,replace=T),])
test

# Selected Data
> test
   Rpt1 Rpt2 Rpt3 Rpt4 Rpt5 Rpt6 Rpt7 Rpt8 Rpt9 Rpt10 Rpt11 Rpt12 Rpt13
Trial1 19.3  9.1 20.7  3.0 21.4 14.5  8.2 10.5  7.4  11.0   6.9   9.5  15.4
Trial2 24.5 22.2  4.9  7.8 20.9 12.5 18.3  6.5  7.6   2.2   8.9  19.6  21.1
   Rpt14 Rpt15 Rpt16 Rpt17 Rpt18 Rpt19 Rpt20 Rpt21 Rpt22 Rpt23 Rpt24
Rpt25
Trial1  49.2  39.9  18.3  14.8  29.9  27.3  20.2  14.6   9.5  14.2   7.5 
16.0
Trial2  38.6  12.8   1.5  12.5   4.6  10.6  14.5  12.5   5.8  14.0  55.9  
4.8
   Rpt26 Rpt27 Rpt28 Rpt29 Rpt30 Rpt31 Rpt32
Trial1  19.7  12.9   5.3   3.1  11.7  19.0  21.2
Trial2   0.8  11.3   4.7  12.5   5.5   5.9   5.2

# N of columns in test
n.center.point<-ncol(test)

# Maximum number in test
max.dist<-max(test) 

# make distance 
unit.dist <- seq(0, round(max.dist, digit=1),0.1) # change file


# for loop (need to change C)
inc.freqy<-matrix(0,nrow(test), length(unit.dist))

for(i in 1:nrow(test)){
  for (j in 1:length(unit.dist)){
inc.freqy[i,j]<-length(test[i,][test[i,]<=unit.dist[j]])
}  
   }

inc.freqy[,1:30]

# partial result for inc.freqy
> inc.freqy[,1:30]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
[2,]    0    0    0    0    0    0    0    0    1     1     1     1     1     1
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     1     2     2     2     2     2     2     2     3     3     3     3
     [,27] [,28] [,29] [,30]
[1,]     0     0     0     0
[2,]     3     3     3     3
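Incidentally, the double loop in the message above can be vectorized in plain R, which may make the C translation unnecessary; a sketch with toy stand-in data (for each row, `findInterval` against the sorted values counts the entries at or below each cutoff):

```r
# Toy stand-in for the real near-tree distances
test <- rbind(Trial1 = c(19.3, 9.1, 20.7, 3.0),
              Trial2 = c(24.5, 22.2, 4.9, 7.8))
unit.dist <- seq(0, round(max(test), 1), 0.1)

# One row of counts per trial, no explicit loops:
# findInterval(d, sorted_x) == number of entries of x that are <= d
inc.freqy <- t(apply(test, 1, function(x) findInterval(unit.dist, sort(x))))
dim(inc.freqy)   # nrow(test) x length(unit.dist)
```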

-- 
View this message in context: 
http://n4.nabble.com/Help-transfrom-R-to-C-tp1751764p1751764.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subtract a specified number of days from current date

2010-04-05 Thread Ravi S. Shankar
Hi R,

 

I have a column with dates. I need to create a vector say from (current
date-90 days: current date) 

 

For example I need to subtract 90 days from say Sys.Date()-92

 

If Sys.Date()-92 ==  "Sunday", Sys.Date()-92+1

if Sys.Date()-92 ==  "Saturday", Sys.Date()-92+2

 

i.e. if subtracting gives me a weekend, I need the next work day.

 

I used the below. 

ifelse(weekdays(seq(seq(tf[[i]][j,1],by="-1
day",length.out=90)[90])=="Saturday",match(seq(tf[[i]][j,1],by="-1
day",length.out=90)[90]+2,tf[[i]][,1]),ifelse(weekdays(seq(tf[[i]][j,1],
by="-1 day",length.out=90)[90])=="Sunday",match(seq(tf[[i]][j,1],by="-1
day",length.out=90)[90]+1,tf[[i]][,1]),match(seq(tf[[i]][j, 1],by="-1
day",length.out=90)[90],tf[[i]][,1])))

I would be grateful if anybody can help me with a more elegant/efficient
approach.
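One more compact sketch of just the weekend adjustment, leaving the surrounding table lookup aside (`as.POSIXlt()$wday` is locale-independent, unlike `weekdays()`):

```r
# Shift a date that falls on a weekend forward to the next Monday
next_workday <- function(d) {
  wd <- as.POSIXlt(d)$wday            # 0 = Sunday, ..., 6 = Saturday
  d + ifelse(wd == 6, 2, ifelse(wd == 0, 1, 0))
}

next_workday(Sys.Date() - 92)           # never lands on a weekend
next_workday(as.Date("2010-04-03"))     # Saturday -> Monday 2010-04-05
```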

Thank you in advance for your time!

Ravi 

This e-mail may contain confidential and/or privileged i...{{dropped:13}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] find the "next non-NA" value within each row of a data-frame

2010-04-05 Thread Anna Stevenson
#I wish to find the "next non-NA" value within each row of a data-frame.
#e.g. I have a data frame mydata. Rows 1, 2 & 3 have some NA values.

mydata <- data.frame(matrix(seq(20*6), 20, 6))
mydata[1,3:5] <-  NA
mydata[2,2:3] <-  NA
mydata[2,5] <-  NA
mydata[3,6] <-  NA
mydata[1:3,]

#this loop accomplishes the task; I am trying to learn a "better" way

for(i in (ncol(mydata)-1):1 ){
mydata[,i] <- ifelse(is.na(mydata[,i])==TRUE,mydata[,i+1],mydata[,i])
}

mydata[1:3,]
#Thank you. I appreciate the help.
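For comparison, the same right-to-left sweep packaged as a reusable base-R function, shown on a toy matrix (a sketch; applying `zoo::na.locf(..., fromLast = TRUE)` per row is the common package alternative, if installing `zoo` is an option):

```r
# Replace each NA with the nearest non-NA value to its right, row-wise;
# trailing NAs (nothing to their right) stay NA.
fill_next <- function(m) {
  for (j in (ncol(m) - 1):1) {
    nas <- is.na(m[, j])
    m[nas, j] <- m[nas, j + 1]
  }
  m
}

m <- rbind(c(1, NA, NA, 4),
           c(NA, 2, NA, NA))
fill_next(m)
# row 1 becomes 1 4 4 4; row 2 becomes 2 2 NA NA
```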



  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rprintf not updating

2010-04-05 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Erik Wright
> Sent: Monday, April 05, 2010 7:59 AM
> To: r-help@r-project.org
> Subject: [R] Rprintf not updating
> 
> Hello all,
> 
> I am using Rprintf in a C for loop (from .Call) to print a 
> progress indicator showing the current percent complete. The 
> loop I am doing is an time intensive call to another 
> function.  I have noticed that Rprintf does not print to the 
> R-window until the entire loop has been completed.  When it 
> reaches the end of the loop it suddenly prints 0 percent to 
> 100 percent in a split second.  For less intensive function 
> calls it prints properly while looping.  My question is this: 
>  is there any way to force Rprintf to print to the screen 
> during a loop?

If you are using the Windows GUI for R hit control-W or
toggle the Misc\OutputBuffering menu item.

> On a related note, how can I show a percent sign in the 
> output?  For example, Rprintf("0%"); only prints a zero.

Use %%.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Thanks!,
> Erik
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A questionb about the Wilcoxon signed rank test

2010-04-05 Thread Thomas Lumley



The problem is that your data contains ties, which mess up the nice theory and 
result in different people using different approximations.

I don't know where your z-statistic formula comes from, but you can find the 
one R uses by looking at the source code in stats:::wilcox.test.default.

To see that R's z-statistic approximation is better than yours, try breaking 
the ties randomly and using exact=TRUE.

wilcox.test(endprice0+rnorm(length(endprice0),s=1e-10),endprice1,paired=TRUE,exact=TRUE)
You will find that the p-values agree fairly well with R's 0.88.
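For reference, a sketch of the tie-corrected normal approximation on a small hypothetical vector of paired differences (continuity correction omitted for brevity; the tie term in the variance is what the hand calculation earlier in the thread leaves out):

```r
d <- c(1, -2, 2, 2, 3, -3, 5)     # hypothetical paired differences, no zeros
r <- rank(abs(d))
V <- sum(r[d > 0])                # the "V" statistic wilcox.test reports
n <- length(d)
nties <- table(r)                 # sizes of the tied rank groups
mu    <- n * (n + 1) / 4
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(nties^3 - nties) / 48)
z <- (V - mu) / sigma
2 * pnorm(-abs(z))                # two-sided p-value under the approximation
```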

-thomas


On Mon, 5 Apr 2010, hix li wrote:


Hi guys,

I have two data sets of prices: endprice0, endprice1

I use the Wilcox test:

wilcox.test(endprice0, endprice1, paired = TRUE, alternative = "two.sided", 
conf.int = T, conf.level = 0.9)

The result is with V = 1819, p-value = 0.8812.

Then I calculated the z-value of the test: z-value = -2.661263. The 
corresponding p-value is: p-value = 0.003892, which is different from the 
p-value computed in the Wilcox test. I am using the following steps to compute 
the z-value:

diff = c(endprice0 - endprice1)
diffNew = diff[diff !=0]
diffNew.rank = rank(abs(diffNew))
diffNew.rank.sign <- diffNew.rank * sign(diffNew)
ranks.pos <- sum(diffNew.rank.sign[diffNew.rank.sign >0]) = 1819
ranks.neg <- -sum(diffNew.rank.sign[diffNew.rank.sign<0]) = 1751

v = ranks.neg
n = 100
z= (v - n *(n+1)/4)/sqrt(n*(n+1)*(2*n+1)/24) = -2.661263


Which p-value should I take for the Wilcox test then?

Hix

the data sets used in my test are:

endprice0 = c(136.3800, 134.8500, 350.7500, 18.8400, 0., 0.0600, 159.1900, 
242.5600, 0.0400, 289.9000, 0., 42.6100, 275.9500, 76.6200, 36.6400, 
0., 81.5900, 179.3600, 86.2200, 210.8000, 118.7200, 45.5800, 98.1900, 
137.0300, 47.7900, 123.7700, 23.2400, 0.0400, 130.2300, 0.0400, 0., 
130.3800, 150.7600, 0.5900, 277.3000, 166.0100, 0.0400, 71.9400, 80.1300, 
162.8800, 85.0500, 125.4400, 138.0600, 0.0600, 140.6300, 100.9700, 0., 
0.0400, 213.7300, 86.9200, 294.8200, 0.0400, 0., 239.2100, 0., 13.7700, 
95.5300, 0.0400, 146.7200, 0., 0.00, 121.57, 68.23, 5.31, 0.04, 96.31, 
206.02, 313.39, 92.34, 31.64, 118.71, 499.6, 0, 129.04, 106.88, 183.92, 50.42, 
0, 0.04, 0.04, 1.57, 355.56, 81.19, 327.17, 151.18, 0, 0, 125.03, 0, 0.04, 
132.01, 0, 0, 11.49, 23, 13.46, 326.64, 198.19, 114.22, 79.53)
endprice1 = c(138.9300, 131.9700, 300.4700, 0., 0., 0.2200, 159.6300, 
277.9100, 0., 328.9700, 0., 40.5100, 270.1000, 52.8000, 39.3800, 
0.0400, 79.7100, 110.5600, 41.1600, 224.6600, 123.8800, 53.2700, 96.1500, 
67.2800, 40.7300, 99.4900, 20.4900, 0.0400, 126.1000, 0., 1.3700, 140.6500, 
165.7200, 0., 314.4200, 207.7400, 0.0400, 76.9300, 75.8000, 184.9100, 
83.3700, 139.5300, 157.0500, 0., 147.5900, 105.2800, 0., 0., 
207.3000, 74.1100, 288.3900, 0.0400, 0., 213.7200, 0.0400, 14.8300, 
53.7000, 0.0400, 150.0800, 0., 0, 123.73, 68.01, 9.52, 0, 111.86, 249.69, 
354.18, 98, 31.3, 117.54, 455.32, 1.06, 127.92, 114.51, 173.85, 53.22, 0, 0, 0, 
0.31, 376.69, 69.43, 278.8, 147.11, 0.04, 0, 120.05, 0, 0.04, 132.97, 0, 0, 
9.98, 28.85, 13.77, 295.17, 191.54, 126.44, 84.83)








Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] Deleting many list elements in one time

2010-04-05 Thread William Dunlap
> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Duncan Murdoch
> Sent: Monday, April 05, 2010 8:24 AM
> To: anna
> Cc: r-help@r-project.org
> Subject: Re: [R] Deleting many list elements in one time
> 
> On 05/04/2010 11:17 AM, anna wrote:
> > Hi guys, here is a simple thing I want to do but it doesn't work:
> > I have a vector of the element indexes that I want to 
> delete called index
> > so when I write myList[[index]] <- NULL to delete these 
> elements here is
> > what I get:
> > Error in myList[[index]] <- NULL : 
> >   more elements supplied than there are to replace
> > Isn't it possible to delete multiple elements?
> 
> If your index is a list of numerical indices as opposed to 
> names, then 
> just do
> 
> myList <- myList[-index]
> 
> the same way you'd delete elements from any vector.
> 
> If index is a vector of names, you need to convert it to 
> numbers, using 
> something like
> 
> indexnums <- which( names(myList) %in% index )
> 
> and then use myList[-indexnums].

If none of names(myList) are in the index vector then
indexnums will be integer(0) and myList[-integer(0)]
will be the same as myList[integer(0)] which has length
zero.  Hence you should check for length(indexnums)>0
before doing the subscripting.

Logical subscripting avoids that trap:
isNameInIndex <- names(myList) %in% index
myList[!isNameInIndex]
(Read the '[' as 'such that').
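A quick self-contained demo of the trap (list and names invented for illustration):

```r
# None of the names we want to drop are actually present:
myList <- list(a = 1, b = 2, c = 3)
index  <- c("x", "y")

indexnums <- which(names(myList) %in% index)   # integer(0)
length(myList[-indexnums])   # 0 -- the whole list is dropped!

# Logical subscripting returns the list unchanged instead:
length(myList[!(names(myList) %in% index)])    # 3
```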

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Duncan Murdoch
> 



Re: [R] Help transfrom R to C

2010-04-05 Thread Thomas Lumley

On Mon, 5 Apr 2010, pinusan wrote:



Dear R users,

I would like to transform the following "for loop" from R code to C code
because it takes a really long time to build the inc.freqy table.
Unfortunately, I do not have experience writing C code.
Please give me an example or advice on translating the R code to C.


I think you can just use

rowSums( test, unit.dist, "<=")

instead of your for() loop.
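As quoted, that call doesn't quite match rowSums()'s actual signature (rowSums(x, na.rm, dims)), so part of the line may have been lost in transit. A vectorized equivalent of the poster's double loop, sketched on a small stand-in for the real data, could be:

```r
# Two-row stand-in for the poster's distance matrix (values made up)
test <- matrix(c(19.3,  9.1, 20.7,  3.0, 21.4,
                 24.5, 22.2,  4.9,  7.8, 20.9),
               nrow = 2, byrow = TRUE)
unit.dist <- seq(0, round(max(test), digits = 1), 0.1)

# inc.freqy[i, j] = how many entries of row i are <= unit.dist[j]
inc.freqy <- sapply(unit.dist, function(u) rowSums(test <= u))
dim(inc.freqy)   # nrow(test) x length(unit.dist)
```

This stays in R but removes both explicit loops, which is usually enough of a speedup to make a C translation unnecessary.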

   -thomas




I have attached the code and some result that I made in R.

Have a nice day.

Hong Su

# Data import
ct10_pt1_neartree<-read.table("~/Desktop/hongsu/clustered_pattern/ct10/ct10/ct10_58ft_pt1_neartree",
  header=TRUE, sep=",", na.strings="NA", dec=".", strip.white=TRUE)
test<-round(ct10_pt1_neartree, digits=1)

# randomly select 2 data set for test
test<-as.matrix(test[sample(2,replace=T),])
test

# Selected Data

test

  Rpt1 Rpt2 Rpt3 Rpt4 Rpt5 Rpt6 Rpt7 Rpt8 Rpt9 Rpt10 Rpt11 Rpt12 Rpt13
Trial1 19.3  9.1 20.7  3.0 21.4 14.5  8.2 10.5  7.4  11.0   6.9   9.5  15.4
Trial2 24.5 22.2  4.9  7.8 20.9 12.5 18.3  6.5  7.6   2.2   8.9  19.6  21.1
       Rpt14 Rpt15 Rpt16 Rpt17 Rpt18 Rpt19 Rpt20 Rpt21 Rpt22 Rpt23 Rpt24 Rpt25
Trial1  49.2  39.9  18.3  14.8  29.9  27.3  20.2  14.6   9.5  14.2   7.5  16.0
Trial2  38.6  12.8   1.5  12.5   4.6  10.6  14.5  12.5   5.8  14.0  55.9   4.8
       Rpt26 Rpt27 Rpt28 Rpt29 Rpt30 Rpt31 Rpt32
Trial1  19.7  12.9   5.3   3.1  11.7  19.0  21.2
Trial2   0.8  11.3   4.7  12.5   5.5   5.9   5.2

# N of columns in test
n.center.point<-ncol(test)

# Maximum number in test
max.dist<-max(test)

# make distance
unit.dist <- seq(0, round(max.dist, digit=1),0.1) # change file


# for loop (need to change C)
inc.freqy<-matrix(0,nrow(test), length(unit.dist))

for(i in 1:nrow(test)){
 for (j in 1:length(unit.dist)){
   inc.freqy[i,j]<-length(test[i,][test[i,]<=unit.dist[j]])
   }
  }

inc.freqy[,1:30]

# partial result for inc.freqy

inc.freqy[,1:30]

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    0    0    0    0    0    0    0    0    0     0     0     0     0     0
[2,]    0    0    0    0    0    0    0    0    1     1     1     1     1     1
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,]     0     0     0     0     0     0     0     0     0     0     0     0
[2,]     1     2     2     2     2     2     2     2     3     3     3     3
     [,27] [,28] [,29] [,30]
[1,]     0     0     0     0
[2,]     3     3     3     3

--
View this message in context: 
http://n4.nabble.com/Help-transfrom-R-to-C-tp1751764p1751764.html
Sent from the R help mailing list archive at Nabble.com.




Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] A questionb about the Wilcoxon signed rank test

2010-04-05 Thread Peter Ehlers

Since this may be homework, I'll confine myself to a hint (which
may or may not be the problem; I haven't checked):

The formula you use for z is strongly dependent on the value of 'n'.

 -Peter Ehlers

On 2010-04-05 6:06, hix li wrote:

Hi guys,

I have two data sets of prices: endprice0, endprice1

I use the Wilcox test:

wilcox.test(endprice0, endprice1, paired = TRUE, alternative = "two.sided",  
conf.int = T, conf.level = 0.9)

The result is with V = 1819, p-value = 0.8812.

Then I calculated the z-value of the test: z-value = -2.661263. The 
corresponding p-value is: p-value = 0.003892, which is different from the 
p-value computed in the Wilcox test, I am using the following steps to compute 
the z-value:

diff = c(endprice0 - endprice1)
diffNew = diff[diff !=0]
diffNew.rank = rank(abs(diffNew))
diffNew.rank.sign<-  diffNew.rank  *  sign(diffNew)
ranks.pos<- sum(diffNew.rank.sign[diffNew.rank.sign>0]) = 1819
ranks.neg<- -sum(diffNew.rank.sign[diffNew.rank.sign<0]) = 1751

v = ranks.neg
n = 100
z= (v - n *(n+1)/4)/sqrt(n*(n+1)*(2*n+1)/24) = -2.661263


Which p-value should I take for the Wilcox test then?

Hix

the data sets used in my test are:

endprice0 = c(136.3800, 134.8500, 350.7500, 18.8400, 0., 0.0600, 159.1900, 
242.5600, 0.0400, 289.9000, 0., 42.6100, 275.9500, 76.6200, 36.6400, 
0., 81.5900, 179.3600, 86.2200, 210.8000, 118.7200, 45.5800, 98.1900, 
137.0300, 47.7900, 123.7700, 23.2400, 0.0400, 130.2300, 0.0400, 0., 
130.3800, 150.7600, 0.5900, 277.3000, 166.0100, 0.0400, 71.9400, 80.1300, 
162.8800, 85.0500, 125.4400, 138.0600, 0.0600, 140.6300, 100.9700, 0., 
0.0400, 213.7300, 86.9200, 294.8200, 0.0400, 0., 239.2100, 0., 13.7700, 
95.5300, 0.0400, 146.7200, 0., 0.00, 121.57, 68.23, 5.31, 0.04, 96.31, 
206.02, 313.39, 92.34, 31.64, 118.71, 499.6, 0, 129.04, 106.88, 183.92, 50.42, 
0, 0.04, 0.04, 1.57, 355.56, 81.19, 327.17, 151.18, 0, 0, 125.03, 0, 0.04, 
132.01, 0, 0, 11.49, 23, 13.46, 326.64, 198.19, 114.22, 79.53)

endprice1 = c(138.9300, 131.9700, 300.4700, 0., 0., 0.2200, 159.6300, 
277.9100, 0., 328.9700, 0., 40.5100, 270.1000, 52.8000, 39.3800, 
0.0400, 79.7100, 110.5600, 41.1600, 224.6600, 123.8800, 53.2700, 96.1500, 
67.2800, 40.7300, 99.4900, 20.4900, 0.0400, 126.1000, 0., 1.3700, 140.6500, 
165.7200, 0., 314.4200, 207.7400, 0.0400, 76.9300, 75.8000, 184.9100, 
83.3700, 139.5300, 157.0500, 0., 147.5900, 105.2800, 0., 0., 
207.3000, 74.1100, 288.3900, 0.0400, 0., 213.7200, 0.0400, 14.8300, 
53.7000, 0.0400, 150.0800, 0., 0, 123.73, 68.01, 9.52, 0, 111.86, 249.69, 
354.18, 98, 31.3, 117.54, 455.32, 1.06, 127.92, 114.51, 173.85, 53.22, 0, 0, 0, 
0.31, 376.69, 69.43, 278.8, 147.11, 0.04, 0, 120.05, 0, 0.04, 132.97, 0, 0, 
9.98, 28.85, 13.77, 295.17, 191.54, 126.44, 84.83)











--
Peter Ehlers
University of Calgary



[R] Adding a prefix to all values in a col in a data.frame

2010-04-05 Thread Abhishek Pratap
Hi All

I am looking for a way to prefix a constant value to all the rows in column
in a data frame.

Eg.

V1
2
3
4
5

I want to make it like this

V1
number2
number3
number4
number5

Thanks!
-Abhi




Re: [R] Adding a prefix to all values in a col in a data.frame

2010-04-05 Thread Henrique Dallazuanna
Try this:

DF <- transform(DF, V1 = sprintf('number%d', V1))
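A quick self-contained check of this idiom, with the data frame rebuilt from the question:

```r
# Rebuild the example data frame from the original post
DF <- data.frame(V1 = 2:5)

# Prefix every row of V1 with the constant string "number"
DF <- transform(DF, V1 = sprintf('number%d', V1))
DF$V1
```

sprintf() is vectorized over V1, so the prefix is attached to every row at once; paste("number", DF$V1, sep = "") is an equivalent spelling.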


On Mon, Apr 5, 2010 at 2:38 PM, Abhishek Pratap  wrote:
> Hi All
>
> I am looking for a way to prefix a constant value to all the rows in column
> in a data frame.
>
> Eg.
>
> V1
> 2
> 3
> 4
> 5
>
> I want to make it like this
>
> V1
> number2
> number3
> number4
> number5
>
> Thanks!
> -Abhi
>
>
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O



Re: [R] Adding a prefix to all values in a col in a data.frame

2010-04-05 Thread Abhishek Pratap
Thanks Henrique. It works fine.

Cheers!
-Abhi

On Mon, Apr 5, 2010 at 1:41 PM, Henrique Dallazuanna wrote:

> Try this:
>
> DF <- transform(DF, V1 = sprintf('number%d', V1))
>
>
> On Mon, Apr 5, 2010 at 2:38 PM, Abhishek Pratap 
> wrote:
> > Hi All
> >
> > I am looking for a way to prefix a constant value to all the rows in
> column
> > in a data frame.
> >
> > Eg.
> >
> > V1
> > 2
> > 3
> > 4
> > 5
> >
> > I want to make it like this
> >
> > V1
> > number2
> > number3
> > number4
> > number5
> >
> > Thanks!
> > -Abhi
> >
> >
> >
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>




[R] NMDS Ordination Graphics Problem

2010-04-05 Thread Trey

Dr. Stevens,

Hi, my name is Trey Scott, and I'm a grad student of Brian McCarthy's.  He
referred me to you because of your expertise in handling complex R problems. 
We were hoping you could help us solve a nagging problem that is prohibiting
me from producing graphical output.

Here is a simple mock-up of the matrix I'm using

        a    b    c    d    e    f
1i      1    4    7    9    2    5
2i     12   17    6    2    3    7
3i      2    5    8    1    3    2
1c      0    2    4    7    2    1
2c      0    1    4    6    9   10
3c     13   15   19   10    8    9

Where:  1i-3i are "infested" sites, and 1c-3c are "control sites".  A-F are
species found at each site.  I have several of these ordinations to perform
on different variables (BA, density, RIV, cover, etc..., all in different
matrices).  I'm running NMDS (metaMDS) ordinations on each matrix, and
producing ordination graphs for each cloud of points.  The problem I have is
that I cannot devise a way to split the cloud of points into infested and
control so that I can deduce any significant groupings.  A simple difference
in symbols/color (Ex. gray triangles for infested, black circles for
control) would do.  Also, I understand the use of pch/col/cex, I just need
to apply them to the "split".

So:

+   How would I split these out in R after I run the metaMDS in vegan?
+   What code would be necessary to bring this about?

McCarthy and I are at the end of our proverbial rope on this; nothing has
worked.
-- 
View this message in context: 
http://n4.nabble.com/NMDS-Ordination-Graphics-Problem-tp1751845p1751845.html
Sent from the R help mailing list archive at Nabble.com.



[R] use of random and nested factors in lme

2010-04-05 Thread Joris Meys
Dear all,

I've read numerous posts about the random and nested factors in lme,
comparison to proc Mixed in SAS, and so on, but I'm still a bit confused by
the notations. More specifically, say we have a model with a fixed effect F,
a random effect R and another one N which is nested in R.

Say the model is described by Y~F
Can anyone clarify the difference between :
random = ~1|R:N
random = ~1|R/N
random = ~R:N
random = ~R/N
random = ~R|N
random = ~1|R+N

or direct me to an overview regarding notation of these formulas in lme
(package nlme)? The help files weren't exactly clear to me on this subject.

What confuses me most, is the use of the intercept in the random factor.
Does this mean the intercept is seen as random, has a random component or is
it just notation? In different mails from this list I found different
explanations.

Thank you in advance.
Cheers
Joris

-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php




[R] Using pch with the RGL library

2010-04-05 Thread Iuri Gavronski
Hi,

I am trying to compare two 3D plots. For that, I am trying to use the
"pch" parameter in the "points3d" function, but it is not working. Is
it implemented? Any suggestion?

Here goes a reproducible code. I wanted the second plot having
different symbols for the points.

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
   matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2),
   matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y", "z")
k1<-kmeans(x,3)
k2<-kmeans(x,3,algorithm = "Lloyd")
library(rgl)
plot3d(x,col=k1$cluster)
points3d(x+.02,pch=8,col=k2$cluster)



Re: [R] NMDS Ordination Graphics Problem

2010-04-05 Thread stephen sefick
This is the easiest way I have found to do something similar to what you want.

#output of dput() easy way to share data with the list
x <- (structure(list(a = c(1L, 12L, 2L, 0L, 0L, 13L), b = c(4L, 17L,
5L, 2L, 1L, 15L), c = c(7L, 6L, 8L, 4L, 4L, 19L), d = c(9L, 2L,
1L, 7L, 6L, 10L), e = c(2L, 3L, 3L, 2L, 9L, 8L), f = c(5L, 7L,
2L, 1L, 10L, 9L)), .Names = c("a", "b", "c", "d", "e", "f"), class =
"data.frame", row.names = c("1i",
"2i", "3i", "1c", "2c", "3c")))

library(vegan)
library(ggplot2)

p <- metaMDS(x)

y <- do.call(rbind,strsplit(as.character(rownames(x)), split=""))
z <- data.frame(sample = y[,1], status = y[,2], p$points[,1:2])
qplot(X1,X2, data=z, shape=status)




On Mon, Apr 5, 2010 at 12:58 PM, Trey  wrote:
>
> Dr. Stevens,
>
> Hi, my name is Trey Scott, and I'm a grad student of Brian McCarthy's.  He
> referred me to you because of your expertise in handling complex R problems.
> We were hoping you could help us solve a nagging problem that is prohibiting
> me from producing graphical output.
>
> Here is a simple mock-up of the matrix I'm using
>
>        a     b     c     d     e     f
> 1i      1     4     7     9     2     5
> 2i      12   17    6     2     3     7
> 3i       2    5     8     1     3     2
> 1c      0    2     4     7     2     1
> 2c      0    1     4     6     9     10
> 3c      13   15   19   10    8     9
>
> Where:  1i-3i are "infested" sites, and 1c-3c are "control sites".  A-F are
> species found at each site.  I have several of these ordinations to perform
> on different variables (BA, density, RIV, cover, etc..., all in different
> matrices).  I'm running NMDS (metaMDS) ordinations on each matrices, and
> producing ordination graphs for each cloud of points.  The problem I have is
> that I cannot devise a way to split the cloud of points into infested and
> control so that I can deduce any significant groupings.  A simple difference
> in symbols/color (Ex. gray triangles for infested, black circles for
> control) would do.  Also, I understand the use of pch/col/cex, I just need
> to apply them to the "split".
>
> So:
>
> +   How would I split these out in R after I run the metaMDS in vegan?
> +   What code would be necessary to bring this about?
>
> McCarthy and I are at the end of our proverbial rope on this; nothing has
> worked.
> --
> View this message in context: 
> http://n4.nabble.com/NMDS-Ordination-Graphics-Problem-tp1751845p1751845.html
> Sent from the R help mailing list archive at Nabble.com.
>



-- 
Stephen Sefick

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods.  We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis



[R] Pearson correlation matrix heatmap

2010-04-05 Thread Bill Hyman
Hi all,

Does any one know how to make Pearson correlation matrix heatmap in R? The 
heatmap is a square with highly correlated elements clustered together. And the 
heatmap matrix is symmetric with respect to the diagonal line. Many thanks for 
your help!

Bill
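One common base-R recipe is to feed the correlation matrix to heatmap() (the data and variable names below are made up for illustration):

```r
# heatmap() reorders rows and columns by hierarchical clustering, so
# highly correlated variables end up next to each other; symm = TRUE
# keeps the plot symmetric about the diagonal.
set.seed(42)
X <- matrix(rnorm(100 * 6), ncol = 6,
            dimnames = list(NULL, paste("v", 1:6, sep = "")))
C <- cor(X, method = "pearson")   # 6 x 6 symmetric correlation matrix
heatmap(C, symm = TRUE, margins = c(6, 6))
```

Contributed packages such as gplots (heatmap.2) offer fancier color keys if base heatmap() is not enough.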



[R] Data manipulation problem

2010-04-05 Thread moleps
Dear R'ers.

I've got a dataset with age and year of diagnosis. In order to age-standardize 
the incidence I need to transform the data into a matrix with age-groups 
(divided in 5 or 10 years) along one axis and year divided into 5 years along 
the other axis. Each cell should contain the number of cases for that age group 
and for that period. 

I.e.
My data format now is
ID-age (to one decimal)-year(yearly data).

What I'd like is 


age 1960-1965 1966-1970 etc...
0-5 3 8 10 15
6-10 2 5 8 13
etc..


Any good ideas?

Regards,
M



Re: [R] Data manipulation problem

2010-04-05 Thread Erik Iverson

?cut to create categories
?table to make the table
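A hedged sketch of that two-step recipe on simulated data (column names and break points invented; the poster's real data are not shown):

```r
# Simulated ID/age/year data standing in for the poster's dataset
set.seed(1)
d <- data.frame(age  = runif(200, 0, 90),
                year = sample(1950:2009, 200, replace = TRUE))

# cut() bins each case; both factors have one value per case,
# so table() can cross-tabulate them
age5 <- cut(d$age,  breaks = seq(0, 90, by = 5),    include.lowest = TRUE)
yr5  <- cut(d$year, breaks = seq(1950, 2010, by = 5), include.lowest = TRUE)
counts <- table(age5, yr5)   # age groups in rows, 5-year periods in columns
```

The two cut() calls must be applied to vectors of the same length (one entry per case), otherwise table() will complain that the arguments differ in length.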

moleps wrote:

Dear R'ers.

I've got a dataset with age and year of diagnosis. In order to age-standardize the incidence I need to transform the data into a matrix with age-groups (divided in 5 or 10 years) along one axis and year divided into 5 years along the other axis. Each cell should contain the number of cases for that age group and for that period. 


I.e.
My data format now is
ID-age (to one decimal)-year(yearly data).

What I'd like is 



age 1960-1965 1966-1970 etc...
0-5 3 8 10 15
6-10 2 5 8 13
etc..


Any good ideas?

Regards,
M





Re: [R] Data manipulation problem

2010-04-05 Thread Bert Gunter
You have tempted, and being weak, I yield to temptation:

"Any good ideas?"

Yes. Don't do this.

(what you probably really want to do is fit a model with age as a factor,
which can be done statistically e.g. by logistic regression; or graphically
using conditioning plots, e.g. via trellis graphics (the lattice package).
This avoids the arbitrariness and discontinuities of binning by age range.)

Bert Gunter
Genentech Nonclinical Biostatistics
 
 -Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of moleps
Sent: Monday, April 05, 2010 11:46 AM
To: r-help@r-project.org
Subject: [R] Data manipulation problem

Dear R'ers.

I've got a dataset with age and year of diagnosis. In order to
age-standardize the incidence I need to transform the data into a matrix
with age-groups (divided in 5 or 10 years) along one axis and year divided
into 5 years along the other axis. Each cell should contain the number of
cases for that age group and for that period. 

I.e.
My data format now is
ID-age (to one decimal)-year(yearly data).

What I'd like is 


age 1960-1965 1966-1970 etc...
0-5 3 8 10 15
6-10 2 5 8 13
etc..


Any good ideas?

Regards,
M




[R] Matrix elements are vectors

2010-04-05 Thread Trafim Vanishek
Dear all,

My question how is it possible to define a matrix A with 10 rows 1 column,
so that its elements are vectors of undefined length.

I need to have a possibility later to add elements like A[1,1] <-
c(A[1,1],3,4,5)

Thanks a lot for the help!




Re: [R] Using pch with the RGL library

2010-04-05 Thread Duncan Murdoch

On 05/04/2010 2:27 PM, Iuri Gavronski wrote:

Hi,

I am trying to compare two 3D plots. For that, I am trying to use the
"pch" parameter in the "points3d" function, but it is not working. Is
it implemented? Any suggestion?
  


Try reading the help page for points3d.  There is no "pch" mentioned.  
You can plot different characters using text3d, but the symbols that 
plot() draws for numerical pch values simply don't exist in rgl.


Duncan Murdoch

Here goes a reproducible code. I wanted the second plot having
different symbols for the points.

x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
   matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2),
   matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y", "z")
k1<-kmeans(x,3)
k2<-kmeans(x,3,algorithm = "Lloyd")
library(rgl)
plot3d(x,col=k1$cluster)
points3d(x+.02,pch=8,col=k2$cluster)






Re: [R] Data manipulation problem

2010-04-05 Thread moleps
I already did try the regression modeling approach. However, the epidemiologist
(referee) turns out to be quite fond of comparing the incidence rates to
different standard populations, hence the need for this laborious approach.
And trying the "cutting" approach I ended up with:

> table(age5)
age5
   (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]
      35       34       33       47       51      109      157      231
 (40,45]  (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]
     362      511      745      926     1002      866      547      247
 (80,85] (85,100]
      82       18
> table(yr5)
yr5
(1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
          3           5           5           5           5           5
(1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
          5           5           5           5           5           3
> table(yr5, age5)
Error in table(yr5, age5) : all arguments must have the same length

Sincerely,
M





On 5. apr. 2010, at 20.59, Bert Gunter wrote:

> You have tempted, and being weak, I yield to temptation:
> 
> "Any good ideas?"
> 
> Yes. Don't do this.
> 
> (what you probably really want to do is fit a model with age as a factor,
> which can be done statistically e.g. by logistic regression; or graphically
> using conditioning plots, e.g. via trellis graphics (the lattice package).
> This avoids the arbitrariness and discontinuities of binning by age range.)
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of moleps
> Sent: Monday, April 05, 2010 11:46 AM
> To: r-help@r-project.org
> Subject: [R] Data manipulation problem
> 
> Dear R'ers.
> 
> I've got a dataset with age and year of diagnosis. In order to
> age-standardize the incidence I need to transform the data into a matrix
> with age-groups (divided in 5 or 10 years) along one axis and year divided
> into 5 years along the other axis. Each cell should contain the number of
> cases for that age group and for that period. 
> 
> I.e.
> My data format now is
> ID-age (to one decimal)-year(yearly data).
> 
> What I'd like is 
> 
> 
> age 1960-1965 1966-1970 etc...
> 0-5 3 8 10 15
> 6-10 2 5 8 13
> etc..
> 
> 
> Any good ideas?
> 
> Regards,
> M
> 



Re: [R] Matrix elements are vectors

2010-04-05 Thread Bert Gunter
Use lists:

?list (but probably assumes you know what a list is already)

Relevant sections of "An Introduction to R." (Considerable time and effort
have been spent writing this to ease the entry of new users into R. Have you
devoted any time or effort to reading it? )

Bert Gunter
Genentech Nonclinical Biostatistics
 
 

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Trafim Vanishek
Sent: Monday, April 05, 2010 12:21 PM
To: r-help@r-project.org
Subject: [R] Matrix elements are vectors

Dear all,

My question how is it possible to define a matrix A with 10 rows 1 column,
so that its elements are vectors of undefined length.

I need to have a possibility later to add elements like A[1,1] <-
c(A[1,1],3,4,5)

Thanks a lot for the help!





Re: [R] Matrix elements are vectors

2010-04-05 Thread Daniel Malter
I may be mistaken, but I don't think that's possible, or even should be
possible. A matrix is m x n, where m and n are fixed integers; you cannot
have a matrix whose rows vary in length. If you want to do this, you have
to use a list instead (I believe).

As a poor workaround you could fill an m x n matrix with NAs or 0s (where n
is the anticipated maximum length among all rows) and then fill the rows
with the values. However, this is dirty as you will not be able to
distinguish true NAs or 0s at the end of a row from the ones you have used
to fill the dummy matrix in the first place; so the list is really the way
to go.

x1=c(1,2,3)
x2=c(4,5,6)

y1=c(0,1,2,3)
y2=c(4,5,6,7)

myList=list(x1,y1)
myList

myList[[1]]=append(myList[[1]],x2)
myList[[2]]=append(myList[[2]],y2)
myList

HTH,
Daniel



-
cuncta stricte discussurus
-

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Trafim Vanishek
Sent: Monday, April 05, 2010 3:21 PM
To: r-help@r-project.org
Subject: [R] Matrix elements are vectors

Dear all,

My question is how to define a matrix A with 10 rows and 1 column,
so that its elements are vectors of undefined length.

I need to have a possibility later to add elements like A[1,1] <-
c(A[1,1],3,4,5)

Thanks a lot for the help!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Matrix elements are vectors

2010-04-05 Thread Duncan Murdoch

On 05/04/2010 3:20 PM, Trafim Vanishek wrote:

Dear all,

My question is how to define a matrix A with 10 rows and 1 column,
so that its elements are vectors of undefined length.

I need to have a possibility later to add elements like A[1,1] <-
c(A[1,1],3,4,5)

Thanks a lot for the help!

You create the matrix from a list:

A <- matrix( as.list(1:10), 10, 1)

But now you need to be careful:  This won't work

A[1,1] <- c(A[1,1],3,4,5)

you need


A[[1,1]] <- c(A[[1,1]],3,4,5)

Duncan Murdoch
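
The pattern above can be checked interactively (a quick sketch; note that
[[ ]] indexes the underlying list element, while [ ] would return a one-cell
sublist):

A <- matrix(as.list(1:10), 10, 1)   # 10 x 1 matrix whose cells are list elements
A[[1, 1]] <- c(A[[1, 1]], 3, 4, 5)  # [[ ]] replaces the element itself
A[[1, 1]]                           # now the vector 1 3 4 5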

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data manipulation problem

2010-04-05 Thread Erik Iverson
I don't know what your data are like, since you haven't given a 
reproducible example. I was imagining something like:


## generate fake data
age <- sample(20:90, 100, replace = TRUE)
year <- sample(1950:2000, 100, replace = TRUE)

##look at big table
table(age, year)

## categorize data
## see include.lowest and right arguments to cut
age.factor <- cut(age, breaks = seq(20, 90, by = 10),
  include.lowest = TRUE)

year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
   include.lowest = TRUE)

table(age.factor, year.factor)

moleps wrote:
I already did try the regression modeling approach. However, the epidemiologist (referee) turns out to be quite fond of comparing the incidence rates to different standard populations, hence the need for this laborious approach.
And trying the "cutting" approach I ended up with:



table (age5)

age5
   (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
      35       34       33       47       51      109      157      231      362
 (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
     511      745      926     1002      866      547      247       82       18

table (yr5)

yr5
(1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
          3           5           5           5           5           5
(1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
          5           5           5           5           5           3

table (yr5,age5)

Error in table(yr5, age5) : all arguments must have the same length

Sincerely,
M





On 5. apr. 2010, at 20.59, Bert Gunter wrote:


You have tempted, and being weak, I yield to temptation:

"Any good ideas?"

Yes. Don't do this.

(what you probably really want to do is fit a model with age as a factor,
which can be done statistically e.g. by logistic regression; or graphically
using conditioning plots, e.g. via trellis graphics (the lattice package).
This avoids the arbitrariness and discontinuities of binning by age range.)

Bert Gunter
Genentech Nonclinical Biostatistics

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of moleps
Sent: Monday, April 05, 2010 11:46 AM
To: r-help@r-project.org
Subject: [R] Data manipulation problem

Dear R´ers.

I´ve got a dataset with age and year of diagnosis. In order to
age-standardize the incidence I need to transform the data into a matrix
with age-groups (divided in 5 or 10 years) along one axis and year divided
into 5 years along the other axis. Each cell should contain the number of
cases for that age group and for that period. 


I.e.
My data format now is
ID-age (to one decimal)-year(yearly data).

What I´d like is 



age 1960-1965 1966-1970 etc...
0-5 3 8 10 15
6-10 2 5 8 13
etc..


Any good ideas?

Regards,
M

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





Re: [R] skip for loop

2010-04-05 Thread Etienne Bellemare Racine
you can use break :
for(i in 1:5) {
 #e.g.
 a <- sample(1:10, 1)
 # important part :
 if(a==3) break
}
a

or while :
a <- 0
while(a != 3){
 # an operation that change a :
a <- sample(1:10, 1)
}

Etienne
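
A side note: if the goal is to skip a single iteration rather than leave the
loop entirely, R also provides next (a minimal sketch):

for (i in 1:5) {
  a <- sample(1:10, 1)   # outcome of some calculation
  if (a == 3) next       # skip the rest of this iteration only
  print(i)
}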

Le 2010-04-05 10:46, Ravi S. Shankar a écrit :
> Hi R,
>
>
>
> I am running a for loop in which I am doing a certain calculation. As an
> outcome of calculation I get an out put say "a". Now in my for loop "I"
> needs to be initiated to "a".
>
>
>
> Based the below example if the output "a"=3 then the second iteration
> needs to be skipped. Is there a way to do this?
>
> for(i in 1:5)
>
> {
>
> ##Calculation##
>
> a=3 ## outcome of calculation
>
> }
>
>
>
> Any help appreciated. Thanks in advance for the time!
>
>
>
> Regards
>
> Ravi
>
>
>
> This e-mail may contain confidential and/or privileged i...{{dropped:13}}
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data manipulation problem

2010-04-05 Thread moleps

Thx Erik,
I have no idea what went wrong with the other code snippet, but this one 
works. Appreciate it.

qta <- table(cut(age, breaks = seq(0, 100, by = 10), include.lowest = TRUE),
             cut(year, breaks = seq(1950, 2010, by = 5), include.lowest = TRUE))

M


On 5. apr. 2010, at 21.45, Erik Iverson wrote:

> I don't know what your data are like, since you haven't given a reproducible 
> example. I was imagining something like:
> 
> ## generate fake data
> age <- sample(20:90, 100, replace = TRUE)
> year <- sample(1950:2000, 100, replace = TRUE)
> 
> ##look at big table
> table(age, year)
> 
> ## categorize data
> ## see include.lowest and right arguments to cut
> age.factor <- cut(age, breaks = seq(20, 90, by = 10),
>  include.lowest = TRUE)
> 
> year.factor <- cut(year, breaks = seq(1950, 2000, by = 10),
>   include.lowest = TRUE)
> 
> table(age.factor, year.factor)
> 
> moleps wrote:
>> I already did try the regression modeling approach. However, the
>> epidemiologist (referee) turns out to be quite fond of comparing the
>> incidence rates to different standard populations, hence the need for this
>> laborious approach. And trying the "cutting" approach I ended up with:
>>> table (age5)
>> age5
>>    (0,5]   (5,10]  (10,15]  (15,20]  (20,25]  (25,30]  (30,35]  (35,40]  (40,45]
>>       35       34       33       47       51      109      157      231      362
>>  (45,50]  (50,55]  (55,60]  (60,65]  (65,70]  (70,75]  (75,80]  (80,85] (85,100]
>>      511      745      926     1002      866      547      247       82       18
>>> table (yr5)
>> yr5
>> (1950,1955] (1955,1960] (1960,1965] (1965,1970] (1970,1975] (1975,1980]
>>           3           5           5           5           5           5
>> (1980,1985] (1985,1990] (1990,1995] (1995,2000] (2000,2005] (2005,2009]
>>           5           5           5           5           5           3
>>> table (yr5,age5)
>> Error in table(yr5, age5) : all arguments must have the same length
>> Sincerely,
>> M
>> On 5. apr. 2010, at 20.59, Bert Gunter wrote:
>>> You have tempted, and being weak, I yield to temptation:
>>> 
>>> "Any good ideas?"
>>> 
>>> Yes. Don't do this.
>>> 
>>> (what you probably really want to do is fit a model with age as a factor,
>>> which can be done statistically e.g. by logistic regression; or graphically
>>> using conditioning plots, e.g. via trellis graphics (the lattice package).
>>> This avoids the arbitrariness and discontinuities of binning by age range.)
>>> 
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>> 
>>> -Original Message-
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
>>> Behalf Of moleps
>>> Sent: Monday, April 05, 2010 11:46 AM
>>> To: r-help@r-project.org
>>> Subject: [R] Data manipulation problem
>>> 
>>> Dear R´ers.
>>> 
>>> I´ve got a dataset with age and year of diagnosis. In order to
>>> age-standardize the incidence I need to transform the data into a matrix
>>> with age-groups (divided in 5 or 10 years) along one axis and year divided
>>> into 5 years along the other axis. Each cell should contain the number of
>>> cases for that age group and for that period. 
>>> I.e.
>>> My data format now is
>>> ID-age (to one decimal)-year(yearly data).
>>> 
>>> What I´d like is 
>>> 
>>> age 1960-1965 1966-1970 etc...
>>> 0-5 3 8 10 15
>>> 6-10 2 5 8 13
>>> etc..
>>> 
>>> 
>>> Any good ideas?
>>> 
>>> Regards,
>>> M
>>> 
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

This is not meant to be critical of R, but is intended as
a possible source for improvements to R.
SAS needs the competition.


I am reasonably knowledgeable about

R
SAS-(all products including IML)

SAS and R run on

Windows(all flavors)
UNIX(all flavors)
Apple OSs

Does R run natively (no emulation) on any of these?
We have quite a few users on these systems:

VAX-VMS
Z-OS (mainframe)
MVS
VM/CMS(IBM)


SAS had the notion in the 80's of a logon to SAS and
connections to multiple operating systems simultaneously,
making all operating systems look like one.
SAS has native low level functions like dcreate(create directory), fopen,
fread..
that can be used on all operating systems eliminating
operating specific commands, like dir(windows), ls(unix) and ispf(3.4 on
mainframes).

SAS provides one IDE to multiple operating systems simultaneously.

For instance:

Run under windows(slightly simplified)

libname unx work server=unix;  /* SAS work directory on UNIX - can be a
virtual storage system like EMC */

data "c:\tmp\class.sas7bdat"; /* create sas dataset class in windows */
  set unx.class; /* dataset class is in the UNIX work directory - not
mounted in windows */
run;

or

libname xls "c:\temp\test.xls"; /* does not have to exist */

data xls.class;  /* create excel file under windows */
  set unx.class; /* remote unix system - file system not mounted in windows
*/
run;

You can mix mainframe, windows and unix.

Other functions I use all the time when coding SAS.

1. Highlight a block of code and hit F1 and the code is run
   interactively under windows.

2. Highlight and hit F2 and the code is run
   batch under windows.

3. Highlight and hit F3 and the code is run
   interactively in unix.

4. Highlight and hit F4 and the code is run
   batch in unix.

5. Highlight "c:\tmp\class.sas7bdat" in the editor and
   hit F5 and a proc contents appears in the output window

6. Highlight "c:\tmp\class.sas7bdat" in the editor and
   hit F6 and 40 obs appear in the output window

7. Highlight "c:\tmp\class.sas7bdat", type freq name*sex, and
   a frequency analysis appears in the output window.

I think the real power of SAS is the ability to run
all of SAS from command line without affecting the IDE.
I have about 50 commands that interact with the
editor and multiple operating systems to do useful work.


As far as I am concerned SAS-IML is dead.
R is much more powerful.
The key for SAS is a 'drop down to R from SAS'.
The current implementation in IML is absurd, because
you do not need IML.
SAS is resisting this because of its investment in IML.
Dataframes and SAS datasets have a lot in common.
-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752043.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rprintf not updating

2010-04-05 Thread Erik Wright
Hi Charlie,

I like your idea of updating an R progress bar from C, but I don't at all 
understand how to call txtProgressBar from C.  I have looked at Writing R 
Extensions and it is equally confusing.  Any help would be appreciated.

Thanks!,
Erik


On Apr 5, 2010, at 10:29 AM, Sharpie wrote:

> 
> 
> Erik Wright wrote:
>> 
>> Hello all,
>> 
>> I am using Rprintf in a C for loop (from .Call) to print a progress
>> indicator showing the current percent complete. The loop I am doing is a
>> time-intensive call to another function.  I have noticed that Rprintf does
>> not print to the R-window until the entire loop has been completed.  When
>> it reaches the end of the loop it suddenly prints 0 percent to 100 percent
>> in a split second.  For less intensive function calls it prints properly
>> while looping.  My question is this:  is there any way to force Rprintf to
>> print to the screen during a loop?
>> 
>> On a related note, how can I show a percent sign in the output?  For
>> example, Rprintf("0%"); only prints a zero.
>> 
>> Thanks!,
>> Erik
>> 
> 
> Perhaps you could use a callback to R's built in progress bar functions to
> perform this for you in a nicely formatted way.  Something of the form:
> 
>  pBar <- txtProgressBar( min = , max = , type = 3 )
> 
>  cResults <- .Call( 'some_c_routine', some, args, including, pBar )
> 
>  close( pBar )
> 
> Since you pass the progress bar to the C routine, you could perform a
> callback to the R function setTxtProgressBar() to update it every iteration. 
> I posted an example of how to perform callbacks from C to R in this thread:
> 
> 
> http://n4.nabble.com/Writing-own-simulation-function-in-C-td1580190.html#a1580423
> 
> Hope this helps!
> 
> -Charlie
> 
> -
> Charlie Sharpsteen
> Undergraduate-- Environmental Resources Engineering
> Humboldt State University
> -- 
> View this message in context: 
> http://n4.nabble.com/Rprintf-not-updating-tp1751703p1751725.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression in an incomplete dataset

2010-04-05 Thread Emmanuel Charpentier
Dear Desmond,

a somewhat analogous question has been posed recently (about 2 weeks
ago) on the sig-mixed-model list, and I tried (in two posts) to give
some elements of information (and some bibliographic pointers). To
summarize tersely:

- a model of "information missingness" (i.e. *why* are some data
missing?) is necessary to choose the right measures to take. Two
special cases (Missing At Random and Missing Completely At Random) allow
for (semi-)automated compensation. See the literature for further details.

- complete-case analysis may give seriously weakened and *biased*
results. Pairwise-complete-case analysis is usually *worse*.

- simple imputation leads to underestimated variances and might also
give biased results.

- multiple imputation is currently thought of as a good way to alleviate
missing data if you have a missingness model (or can honestly bet on
MCAR or MAR), and if you properly combine the results of your
imputations.

- A few missing data packages exist in R to handle this case. My personal
selection at this point would be mice, mi, Amelia, and possibly mitools,
but none of them is fully satisfying (in particular, accounting for a
random effect needs special handling all the way in all packages...).

- An interesting alternative is to write a full probability model (in
BUGS for example) and use Bayesian estimation; in this framework,
missing data are "naturally" modeled in the model used for analysis.
However, this might entail *large* amounts of work, be difficult, and not always
succeed (numerical difficulties). Furthermore, the results of a Bayesian
analysis might not be what you seek...

HTH,

Emmanuel Charpentier
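
As an illustration of the multiple-imputation route, a minimal sketch with
the mice package (the variable names y, x1, x2 are hypothetical stand-ins
for the poster's own outcome and covariates):

library(mice)                                    # multiple imputation by chained equations
imp  <- mice(mydata, m = 5, printFlag = FALSE)   # build 5 imputed data sets
fits <- with(imp, glm(y ~ x1 + x2, family = binomial(logit)))
summary(pool(fits))                              # combine estimates by Rubin's rules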

Le lundi 05 avril 2010 à 11:34 +0100, Desmond Campbell a écrit :
> Dear all,
> 
> I want to do a logistic regression.
> So far I've only found out how to do that in R, in a dataset of complete 
> cases.
> I'd like to do logistic regression via max likelihood, using all the study 
> cases (complete and incomplete). Can you help?
> 
> I'm using glm() with family=binomial(logit).
> If any covariate in a study case is missing then the study case is dropped, 
> i.e. it is doing a complete cases analysis.
> As a lot of study cases are being dropped, I'd rather it did maximum 
> likelihood using all the study cases.
> I tried setting glm()'s na.action to NULL, but then it complained about NA's 
> present in the study cases.
> I've about 1000 unmatched study cases and less than 10 covariates so could 
> use unconditional ML estimation (as opposed to conditional ML estimation).
> 
> regards
> Desmond
> 
> 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] changing column names in a dataframe

2010-04-05 Thread jda

Hi folks,

I have imported data from an Excel spreadsheet.  Columns in that spreadsheet
are named "name", "x", and "y", and several sets of those columns appear in
the worksheet.  For example:

name  x y name   x y
test1 1 3 test2  4 4
test1 2 2 test2  5 5
test1 3 1 test2  6 6

When I import these data into R, into a dataframe, I end up with something
like this:

  name  x y name1  x1 y1
1 test1 1 3 test2  4  4
2 test1 2 2 test2  5  5
3 test1 3 1 test2  6  6

I -cannot- change the excel file, and must work with the data and labels as
they appear in R.  I would like to end up with a dataframe that looks more
like this:

  x-test1  y-test1  x-test2  y-test2
1       1        3        4        4
2       2        2        5        5
3       3        1        6        6

I believe this involves renaming the dataframe's columns using information
found within other columns of the dataframe.  It subsequently involves
deleting some of the (now-redundant) columns.  If anyone could offer some
helpful hints, I would be very appreciative!  Thank you in advance,
regardless of reply, for taking the time to read through my plea.

-jda
-- 
View this message in context: 
http://n4.nabble.com/changing-column-names-in-a-dataframe-tp1752034p1752034.html
Sent from the R help mailing list archive at Nabble.com.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] find the "next non-NA" value within each row of a data-frame

2010-04-05 Thread Peter Ehlers

If I understand correctly what you want (according to your loop),
you could use the na.locf function in pkg:zoo.

 library(zoo)
 mat <- t(apply(mydata, 1, na.locf, fromLast=TRUE, na.rm=FALSE))
 dat <- as.data.frame(mat) ## since apply returns a matrix

 -Peter Ehlers

On 2010-04-05 10:52, Anna Stevenson wrote:

#I wish to find the "next non-NA" value within each row of a data-frame.
#e.g. I have a data frame mydata. Rows 1, 2 & 3 have some NA values.

mydata<- data.frame(matrix(seq(20*6), 20, 6))
mydata[1,3:5]<-  NA
mydata[2,2:3]<-  NA
mydata[2,5]<-  NA
mydata[3,6]<-  NA
mydata[1:3,]

#this loop accomplishes the task; I am trying to learn a "better" way

for(i in (ncol(mydata)-1):1 ){
mydata[,i]<- ifelse(is.na(mydata[,i])==TRUE,mydata[,i+1],mydata[,i])
}

mydata[1:3,]
#Thank you. I appreciate the help.




[[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Peter Ehlers
University of Calgary

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] a warning message from "heatmap.2"

2010-04-05 Thread Tao Shi

Hi List,

I want to show the heatmap of a correlation matrix using "heatmap.2"; however, I
always get this warning message (see below) and the column dendrogram is not
shown.  It's not really a big deal, but I'm curious how to suppress it and still
have R show what I want (i.e. a symmetrical heatmap with dendrograms for
both rows and columns).  I'm using gplots 2.7.4 and R 2.10.1 on Win XP.

Thanks!


> heatmap.2(cor(matrix(rnorm(100),10,10)), symm=T, symbreaks=T, trace="none", 
> density.info="none")
Warning message:
In heatmap.2(cor(matrix(rnorm(100), 10, 10)), symm = T, symbreaks = T,  :
  Discrepancy: Colv is FALSE, while dendrogram is `row'. Omitting column 
dendogram.



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] contradictory output between ncv.test() and gvlma()

2010-04-05 Thread Anthony Lopez
Can anyone tell me why the ncv.test output and the gvlma output would be
contradictory on the question of heteroscedasticity?  Below, the ncv.test
output reveals a problem with heteroscedasticity, but the gvlma output says
that the assumptions are acceptable.  How is this reconciled?

> ncv.test(defmodA)
Non-constant Variance Score Test
Variance formula: ~ fitted.values
Chisquare = 7.360374    Df = 1     p = 0.00666769

> gvlma(defmodA)

Call:
lm(formula = DefPunWmn1 ~ DefPersBenef, data = Data)

Coefficients:
 (Intercept)  DefPersBenef
      1.2579        0.1572


ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
Level of Significance =  0.05

Call:
 gvlma(x = defmodA)

                     Value   p-value                   Decision
Global Stat        37.3746 1.508e-07 Assumptions NOT satisfied!
Skewness           32.8916 9.744e-09 Assumptions NOT satisfied!
Kurtosis            2.6248 1.052e-01    Assumptions acceptable.
Link Function       0.3684 5.439e-01    Assumptions acceptable.
Heteroscedasticity  1.4899 2.222e-01    Assumptions acceptable.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] find the "next non-NA" value within each row of a data-frame

2010-04-05 Thread Gabor Grothendieck
Here is a slight simplification to the first line based on the fact
that na.locf works column by column:

mat <- t(na.locf(t(mydata), fromLast = TRUE, na.rm = FALSE))


On Mon, Apr 5, 2010 at 1:46 PM, Peter Ehlers  wrote:
> If I understand correctly what you want (according to your loop),
> you could use the na.locf function in pkg:zoo.
>
>  library(zoo)
>  mat <- t(apply(mydata, 1, na.locf, fromLast=TRUE, na.rm=FALSE))
>  dat <- as.data.frame(mat) ## since apply returns a matrix
>
>  -Peter Ehlers
>
> On 2010-04-05 10:52, Anna Stevenson wrote:
>>
>> #I wish to find the "next non-NA" value within each row of a data-frame.
>> #e.g. I have a data frame mydata. Rows 1, 2 & 3 have some NA values.
>>
>> mydata<- data.frame(matrix(seq(20*6), 20, 6))
>> mydata[1,3:5]<-  NA
>> mydata[2,2:3]<-  NA
>> mydata[2,5]<-  NA
>> mydata[3,6]<-  NA
>> mydata[1:3,]
>>
>> #this loop accomplishes the task; I am trying to learn a "better" way
>>
>> for(i in (ncol(mydata)-1):1 ){
>> mydata[,i]<- ifelse(is.na(mydata[,i])==TRUE,mydata[,i+1],mydata[,i])
>> }
>>
>> mydata[1:3,]
>> #Thank you. I appreciate the help.
>>
>>
>>
>>
>>        [[alternative HTML version deleted]]
>>
>>
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Ehlers
> University of Calgary
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] new to R, analysis of latency data

2010-04-05 Thread jeff d

Hi,

  I'd like to move from Excel to R because our datasets are so large. Here's
what my data looks like:

Transaction Rate   Run#   Transaction Type   Location   Latency in Seconds
      10            1         Order             A               0
      10            1         Order             B               3
      10            1         Order             C               1
      10            1         Order             D               2
      10            1         ACK               A               0
      10            1         ACK               B               5
      10            1         ACK               C               2
      10            1         ACK               D               2
      10            1         FILL              A               0
      10            1         FILL              B               2
      10            1         FILL              C               3
      10            1         FILL              D               2

- we have about 1000 runs per transaction rate (run# 1..1000)
- we have 50 transaction rates (transaction rate 10..500 incrementing by 10)

We'd like to be able to create a graph where:
- Y axis = 95th percentile latency of transaction type data (order, ack, fill)
- X axis = transaction rate

I've read the basic doc, created some simple plots, could someone get me
going in the right direction?

tia,
jd
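
One hedged starting point (the column names df$rate, df$type, and df$latency
are assumed stand-ins for however you name them after import):

## 95th-percentile latency per transaction rate and transaction type
q95 <- aggregate(latency ~ rate + type, data = df,
                 FUN = quantile, probs = 0.95)

## one curve per transaction type
plot(latency ~ rate, data = subset(q95, type == "Order"), type = "l")
lines(latency ~ rate, data = subset(q95, type == "ACK"),  col = 2)
lines(latency ~ rate, data = subset(q95, type == "FILL"), col = 3)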


  
-- 
View this message in context: 
http://n4.nabble.com/new-to-R-analysis-of-latency-data-tp1752096p1752096.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Permutation of Matrix

2010-04-05 Thread Ayush Raman
Hi all,

How can I have a permuted matrix where the second row is a permutation
of the first row; the third is a permutation of the second row; the fourth is
a permutation of the third row; and so on?

Thanks.

-- 
Regards,
Ayush Raman
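
One way to build such a matrix, assuming each row should be a random
reshuffling of the row above it (a minimal sketch):

set.seed(1)                      # for reproducibility
m <- matrix(NA, 5, 10)
m[1, ] <- 1:10                   # starting row
for (i in 2:nrow(m))
  m[i, ] <- sample(m[i - 1, ])   # permute the previous row
m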

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Thomas Lumley

On Mon, 5 Apr 2010, Roger DeAngelis(xlr82sas) wrote:


SAS and R run on

Windows(all flavors)
UNIX(all flavors)
Apple OSs


I would expect that for more obscure Unices it would be difficult to get SAS, 
but basically, yes.


Does R run on natively (no emulation)?
We have quite a few users on these systems

VAX-VMS
Z-OS (mainframe)
MVS
VM/CMS(IBM)


I don't know that anyone has tried. The basic requirements for R are ANSI C89 
and some of the more common parts of C99, Fortran 77, the standard IEEE 
floating point formats, and quite a lot of POSIX.

For IBM's z/OS at least, I would have thought Linux virtualization would be the 
most straightforward approach to using R.

 -thomas

Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rprintf not updating

2010-04-05 Thread Sharpie


Erik Wright wrote:
> 
> Hi Charlie,
> 
> I like your idea of updating an R progress bar from C, but I don't at all
> understand how to call txtProgressBar from C.  I have looked at Writing R
> Extensions and it is equally confusing.  Any help would be appreciated.
> 
> Thanks!,
> Erik
> 

Hi Erik,

Did you look at the link I put in my last post?

http://n4.nabble.com/Writing-own-simulation-function-in-C-td1580190.html#a1580423

I linked that post because it gives a step-by-step example for performing a
callback to the runif function from C.  Just  change the execution
environment from stats to utils and alter the function arguments
appropriately and it should work for txtProgressBar and setTxtProgressBar.

-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://n4.nabble.com/Rprintf-not-updating-tp1751703p1752102.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Kevin Middleton
> SAS and R run on
> 
> Windows(all flavors)
> UNIX(all flavors)
> Apple OSs

According to SAS (http://support.sas.com/kb/33/140.html and 
http://support.sas.com/kb/22/960.html), SAS will not run on OS X past 10.4. OS 
X 10.5 was released in late 2007, so I don't think it's really fair to say that 
SAS runs on Apple OSs.


-
Kevin M. Middleton
Department of Biology
California State University San Bernardino
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] changing column names in a dataframe

2010-04-05 Thread Ista Zahn
If the columns are in order you can just past in the new names:

Dat <- read.table(textConnection("name  x y name1  x1 y1
test1 1 3 test2  4  4
test1 2 2 test2  5  5
test1 3 1 test2  6  6"), header=TRUE)
closeAllConnections()

x.vars <- grep("x", names(Dat))
y.vars <- grep("y", names(Dat))
names.vars <- grep("name", names(Dat))

names(Dat)[x.vars]  <- paste("x-test", 1:length(x.vars), sep="")
names(Dat)[y.vars]  <- paste("y-test", 1:length(y.vars), sep="")

Dat <- Dat[, -(names.vars)]


-Ista

On Mon, Apr 5, 2010 at 9:09 PM, jda  wrote:

>
> Hi folks,
>
> I have imported data from an Excel spreadsheet.  Columns in that
> spreadsheet
> are named "name", "x", and "y", and several sets of those columns appear in
> the worksheet.  For example:
>
> name  x y name   x y
> test1 1 3 test2  4 4
> test1 2 2 test2  5 5
> test1 3 1 test2  6 6
>
> When I import these data into R, into a dataframe, I end up with something
> like this:
>
>  name  x y name1  x1 y1
> 1 test1 1 3 test2  4  4
> 2 test1 2 2 test2  5  5
> 3 test1 3 1 test2  6  6
>
> I -cannot- change the excel file, and must work with the data and labels as
> they appear in R.  I would like to end up with a dataframe that looks more
> like this:
>
>   x-test1  y-test1  x-test2  y-test2
> 1       1        3        4        4
> 2       2        2        5        5
> 3       3        1        6        6
>
> I believe this involves renaming the dataframe's columns using information
> found within other columns of the dataframe.  It subsequently involves
> deleting some of the (now-redundant) columns.  If anyone could offer some
> helpful hints, I would be very appreciative!  Thank you in advance,
> regardless of reply, for taking the time to read through my plea.
>
> -jda
> --
> View this message in context:
> http://n4.nabble.com/changing-column-names-in-a-dataframe-tp1752034p1752034.html
> Sent from the R help mailing list archive at Nabble.com.
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org




Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

  You are correct, SAS no longer supports OS X under SAS-Proper. I use the
term SAS-Proper for base SAS with SAS-Connect. It does appear that some
improper SAS products are supported under Mac OS?


SAS releases JMP® 8 for Mac, Linux
Users of all major desktop operating systems can now explore and visualize
data through intuitive drag-and-drop interaction

CARY, NC  (Apr. 28, 2009)  –  SAS, the leader in business analytics, today
begins shipping JMP 8 statistical discovery software for Macintosh and
Linux. SAS also starts shipping JMP 64-Bit Edition, Version 8, for both of
these operating systems. These releases extend JMP’s groundbreaking new way
to explore and visualize data to more users. 

I get very confused with the myriad of SAS products.
-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752139.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] logistic regression in an incomplete dataset

2010-04-05 Thread Desmond D Campbell
Dear JoAnn,

Thank you very much for your reply.
If that is the case I am surprised.
I would have thought ML could incorporate study cases with some missingness
in them.
Furthermore I believe ML estimates should generally be more robust than
complete case based estimates.
For unbiased estimates I think
  ML requires the data is MAR,
  complete case requires the data is MCAR
Maybe it is more difficult to make the ML estimate on incomplete data than
I imagine. My knowledge is patchy.
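As a hedged illustration of the behaviour under discussion (fabricated data, not Desmond's): glm()'s default na.action = na.omit silently performs exactly the complete-case analysis he describes:

```r
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(x))
x[sample(200, 50)] <- NA                      # 50 cases with a missing covariate
fit <- glm(y ~ x, family = binomial(logit))   # na.omit is the default na.action
stopifnot(nobs(fit) == 150)                   # the 50 incomplete cases were dropped
```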

Thanks again.

regards
Desmond


> Hello Desmond,
>
> The only way to not drop cases with incomplete data would be some sort
> of imputation for the missing covariates.
>
> JoAnn
>
> Desmond Campbell wrote:
>> Dear all,
>>
>> I want to do a logistic regression.
>> So far I've only found out how to do that in R, in a dataset of complete
>> cases.
>> I'd like to do logistic regression via max likelihood, using all the
>> study cases (complete and incomplete). Can you help?
>>
>> I'm using glm() with family=binomial(logit).
>> If any covariate in a study case is missing then the study case is
>> dropped, i.e. it is doing a complete-case analysis.
>> As a lot of study cases are being dropped, I'd rather it did maximum
>> likelihood using all the study cases.
>> I tried setting glm()'s na.action to NULL, but then it complained about
>> NA's present in the study cases.
>> I've about 1000 unmatched study cases and less than 10 covariates so
>> could use unconditional ML estimation (as opposed to conditional ML
>> estimation).
>>
>> regards
>> Desmond
>>
>>
>>
>
>
> --
> JoAnn Álvarez
> Biostatistician
> Department of Biostatistics
> D-2220 Medical Center North
> Vanderbilt University School of Medicine
> 1161 21st Ave. South
> Nashville, TN 37232-2158
>
> http://biostat.mc.vanderbilt.edu/JoAnnAlvarez
>
>



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Barry Rowlingson
On Mon, Apr 5, 2010 at 9:13 PM, Roger DeAngelis(xlr82sas)
 wrote:
>
> Hi,
>
> This is not meant to be critical of R, but is intended as
> a possible source for improvements to R.
> SAS needs the competition.
>
>
> I am reasonably knowledgeable about
>
> R
> SAS-(all products including IML)

> SAS has native low level functions like dcreate(create directory), fopen,
> fread..
> that can be used on all operating systems eliminating
> operating specific commands, like dir(windows), ls(unix) and ispf(3.4 on
> mainframes).

 ...so your reasonable knowledge of R doesn't extend to
help.search("file") and then typing "?files" then? :)
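For reference, the portable file functions Barry is alluding to (see ?files and ?list.files) cover the same ground as SAS's dcreate/fopen; a small sketch using a throwaway temp directory:

```r
d <- file.path(tempdir(), "demo-dir")
dir.create(d)                            # OS-independent directory creation
writeLines("hello", file.path(d, "x.txt"))
stopifnot("x.txt" %in% list.files(d))    # OS-independent directory listing
unlink(d, recursive = TRUE)              # OS-independent removal
```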

> SAS provides one IDE to multiple operating systems simultaneously.

 Seems to be two things going on here - file system access and SAS execution:

> data "c:\tmp\class.sas7bdat"; /* create sas dataset class in windows */
>  set unx.class; /* dataset class is in the UNIX work directory - not
> mounted in windows */
> run;

 - if this is some kind of abstraction of a network file system then
it's best done at the OS level. On my Linux box, for example, I can
use smbfs to connect to my Windows network file system. Then I can
read, write, edit those files with anything on my Linux box. Having an
application punch through a network to a custom server using some
unknown (possibly insecure) protocol for a specific purpose seems a
bit pointless when you can do it at the OS level using a secure
protocol (smbfs secure? Ummm transport it over ssh of course...)


> libname xls "c:\temp\test.xls"; /* does not have to exist */
>
> data xls.class;  /* create excel file under windows */
>  set unx.class; /* remote unix system - file system not mounted in windows
> */
> run;

 - Not sure I understand what these do. Is 'unx' a special word here?
Does the .class mean something? What is 'set' setting?

> Other functions I use all the time when coding SAS.
>
> 1. Highlight a block of code and hit F1 and the code is run
>   interactivel under windows.

> 3. Highlight and hit F3 and the code is run
>   interactively in unix.

Okay, what's going on here? You have a Windows box (presumably in
front of you) and a Unix box somewhere on the network. And hitting F1
runs it on the Windows box and hitting F3 magically runs it on the
Unix box? I'm guessing this isn't SAS straight-out-of-the-box, someone
has set this all up carefully (for example, how do you authenticate to
the Unix box?).

 This is actually a nice paradigm. Users are developing code on their
desktops with N=10 and then can launch it on the mainframe with
N=1 with a single button press.

Again, you could implement this at the operating system level with
ssh. I can do:

 ssh fnordbox R CMD BATCH analysis.R analysis.Rout

and it would run the job on my machine fnordbox. Obviously it would
need access to the .R and any data files but thanks to the miracle of
shared network file systems it can do that. It's not one-button
execution, but it's likely that any one-button execution you have is
hiding a lot of setup behind it - such as authorisation and
authentication to the remote system.

 Another way of doing this would be to use Rserve, a general R
execution server.

 Of course things in R, like many open-source projects, get done
either by people scratching an itch they have themselves, or by people
paid to scratch other people's itches. I'm not sure duplicating some
of SAS's enterprisey functionality is likely to itch enough for people
to do it, when the cheaper option of learning the R (or Unixy) way of
doing it is much more flexible...

Barry

-- 
blog: http://geospaced.blogspot.com/
web: http://www.maths.lancs.ac.uk/~rowlings
web: http://www.rowlingson.com/
twitter: http://twitter.com/geospacedman
pics: http://www.flickr.com/photos/spacedman



Re: [R] logistic regression in an incomplete dataset

2010-04-05 Thread Desmond D Campbell
Dear Emmanuel,

Thank you.

Yes I broadly agree with what you say.
I think ML is a better strategy than complete case, because I think its
estimates will be more robust than complete case.
For unbiased estimates I think
  ML requires the data is MAR,
  complete case requires the data is MCAR

Anyway I would have thought ML could be done without resorting to Multiple
Imputation, but I'm at the edge of my knowledge here.

Thanks once again,

regards
Desmond


From: Emmanuel Charpentier  bacbuc.dyndns.org>
Subject: Re: logistic regression in an incomplete dataset
Newsgroups: gmane.comp.lang.r.general
Date: 2010-04-05 19:58:20 GMT (2 hours and 10 minutes ago)

Dear Desmond,

a somewhat analogous question has been posed recently (about 2 weeks
ago) on the sig-mixed-model list, and I tried (in two posts) to give
some elements of information (and some bibliographic pointers). To
summarize tersely :

- a model of "information missingness" (i.e. *why* are some data
missing?) is necessary to choose the right measures to take. Two
special cases (Missing At Random and Missing Completely At Random) allow
for (semi-)automated compensation. See literature for further details.

- complete-case analysis may give seriously weakened and *biased*
results. Pairwise-complete-case analysis is usually *worse*.

- simple imputation leads to underestimated variances and might also
give biased results.

- multiple imputation is currently thought of as a good way to alleviate
missing data if you have a missingness model (or can honestly bet on
MCAR or MAR), and if you properly combine the results of your
imputations.

- A few missing data packages exist in R to handle this case. My personal
selection at this point would be mice, mi, Amelia, and possibly mitools,
but none of them is fully satisfying (in particular, accounting for a
random effect needs special handling all the way in all packages...).

- An interesting alternative is to write a full probability model (in
BUGS for example) and use Bayesian estimation ; in this framework,
missing data are "naturally" modeled in the model used for analysis.
However, this might entail *large* work, be difficult and not always
succeed (numerical difficulties). Furthermore, the results of a Bayesian
analysis might not be what you seek...

HTH,

Emmanuel Charpentier

Le lundi 05 avril 2010 à 11:34 +0100, Desmond Campbell a écrit :
> Dear all,
>
> I want to do a logistic regression.
> So far I've only found out how to do that in R, in a dataset of complete
cases.
> I'd like to do logistic regression via max likelihood, using all the
study cases (complete and
incomplete). Can you help?
>
> I'm using glm() with family=binomial(logit).
> If any covariate in a study case is missing then the study case is
dropped, i.e. it is doing a complete-case analysis.
> As a lot of study cases are being dropped, I'd rather it did maximum
likelihood using all the study cases.
> I tried setting glm()'s na.action to NULL, but then it complained about
NA's present in the study cases.
> I've about 1000 unmatched study cases and less than 10 covariates so
could use unconditional ML
estimation (as opposed to conditional ML estimation).
>
> regards
> Desmond
>
>
> --
> Desmond Campbell
> UCL Genetics Institute
> d.campb...@ucl.ac.uk
> Tel. ext. 020 31084006, int. 54006
>
>



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Stuart Luppescu
On Mon, 2010-04-05 at 23:12 +0100, Barry Rowlingson wrote:
> > 3. Highlight and hit F3 and the code is run
> >   interactively in unix.
> 
> Okay, what's going on here? You have a Windows box (presumably in
> front of you) and a Unix box somewhere on the network. And hitting F1
> runs it on the Windows box and hitting F4 magically runs it on the
> Unix box? I'm guessing this isn't SAS straight-out-of-the-box, someone
> has set this all up carefully (for example, how do you authenticate to
> the Unix box?).

This is probably the facility provided by ess under emacs. You can get
the same thing with R in ess by highlighting the code and pressing C-c
C-r (or C-c C-b to run the whole program).
-- 
Stuart Luppescu 
University of Chicago



Re: [R] Permutation of Matrix

2010-04-05 Thread David Winsemius


On Apr 5, 2010, at 4:58 PM, Ayush Raman wrote:


Hi all,

How can I have a permuted matrix where the second row is the permutation
over the first row; the third is the permutation of the second row; the
fourth is the permutation of the third row, and so on?


Wouldn't any of those permutations just be a permutation of the first  
row? Perhaps something like:



M <- matrix(first_row, length(first_row), length(first_row))
M[1, ] <- first_row
M[2:length(first_row), ] <- t(sapply(2:length(first_row), function(x) {
  sample(first_row, length(first_row), replace = FALSE)
}))



--

David Winsemius, MD
West Hartford, CT



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

  I am using SSH, however I do have to set up a SAS Spawner on the remote
host and use SAS remote library services. I also have to have listeners on
both client and host?

  I am not a systems guy, so I do not know exactly how SAS makes the remote
libraries available to windows.

  It is a little trickier to get UNIX to recognize my windows 'c' drive.

  It is quite powerful to take the idiosyncrasies of operating systems out
of the picture; I think Perl is a little closer to doing this than R. The stat
function in Perl appears to be able to operate on Windows and Unix
filesystems, however separately. Many SAS objects like datasets are exactly
the same between Windows (32/64) and corresponding Unix systems. No
big-endian/little-endian stuff to worry about.

Here is my program  
  
%put &sysscp;/* Show OS */
%let pth=%sysfunc(pathname(unx)); /* SHOW UNIX PATH */
%put pth=&pth;   
data "c:\temp\class.sas7bdat";   /* WINDOWS */
  set unx.class; 
run;   

Here is the log showing I am running under windows but input is unix

28 %put &sysscp;
SYMBOLGEN:  Macro variable SYSSCP resolves to WIN
WIN
29 %let pth=%sysfunc(pathname(unx));
30 %put pth=&pth;
SYMBOLGEN:  Macro variable PTH resolves to
/workspace/SAS_work90424DF5_zeus_unix
pth=/workspace/SAS_work90424DF5_global_unix_server
31 data "c:\temp\class.sas7bdat";
32   set unx.class;
33 run;
  
NOTE: There were 19 observations read from the data set UNX.CLASS.
NOTE: The data set c:\temp\class.sas7bdat has 19 observations and 5
variables.
NOTE: DATA statement used (Total process time):
  real time   0.04 seconds
  cpu time0.01 seconds

I can even concatenate windows and unix SAS datasets.



-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752185.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

  Here is the SAS command macro that reads what I highlight in my editor and
prints 40 observations from the highlighted dataset after hitting the
function key F4

F4 stores the highlighted text in the clipboard then executes the command
macro, the rsubmit executes the code on the unix zeus server. (Slight
modification on the rsubmit for clarity only)

Function key setting

F4 store;note;notesubmit '%uxp';   /* 40  obs from highlighted
dataset   */   

Command macro

%macro uxp; 
FILENAME clp clipbrd ;  
DATA _NULL_;
  INFILE clp;   
  INPUT;
  put _infile_; 
  call symputx('argx',_infile_);
RUN;
dm "out;clear;";
%put argx=&argx.;   
%syslput argx=&argx;
rsubmit zeus;
footnote;   
title "Up to 40 obs from &argx";
options nocenter;   
proc print data=&argx( Obs= 40 ) width=min; 
format _all_;   
run;
endrsubmit zeus; 
dm "out;top";   
%mend uxp;  

-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752189.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] new to R, analysis of latency data

2010-04-05 Thread Dennis Murphy
Hi:

I'll use some fake data to show you how to get the plots. To get the data
from Excel into
R, there are several ways to do it: converting the Excel file into csv and
using read.csv() in
R is one method and the XLSReadWrite package is another. Here's a link from
the R Wiki:
http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=read%20excel

It sounds like what you want is a plot of the 95th percentiles of the
latency by
transaction rate for each transaction type. Here are a couple of ways to do
that using
some fake data.

df <- data.frame(rate = rep(seq(10, 50, by = 10), each = 12),
 type = rep(rep(c('Order', 'ACK', 'FILL'), each = 4), 5),
 latency = rpois(60, 3))
library(ggplot2)
library(lattice)

# get 95th percentiles
# Several ways to do this; I'll use ddply() from the plyr package
# (which loads with ggplot2)
latq95 <- ddply(df, .(rate, type), summarise, lq95 = quantile(latency, 0.95))

# ggplot2 way to get the graph:
p <- ggplot(latq95, aes(x = rate, y = lq95, shape = type))
p + geom_point(size = 2) + geom_line() +
  xlab('Transaction rate') + ylab('95th percentile')

# lattice method of getting the graph:
xyplot(lq95 ~ rate, data = latq95, groups = type, type = c('p', 'l'),
   xlab = 'Transaction rate', ylab = '95th percentile',
   auto.key = list(text = levels(latq95$type), space = 'right',
  points = FALSE, lines = TRUE))

Hope this is what you were after...

Dennis


On Mon, Apr 5, 2010 at 1:55 PM, jeff d wrote:

>
> Hi,
>
>  I'd like to move from excel to R because our dataset are so large. Here's
> what my data looks like:
>
> Transaction Rate   Run#   Transaction Type   Location   Latency in Seconds
> 10                 1      Order              A          0
> 10                 1      Order              B          3
> 10                 1      Order              C          1
> 10                 1      Order              D          2
> 10                 1      ACK                A          0
> 10                 1      ACK                B          5
> 10                 1      ACK                C          2
> 10                 1      ACK                D          2
> 10                 1      FILL               A          0
> 10                 1      FILL               B          2
> 10                 1      FILL               C          3
> 10                 1      FILL               D          2
>
> - we have about 1000 runs per transaction rate (run# 1..1000)
> - we have 50 transaction rates (transaction rate 10..500 incrementing by
> 10)
>
> We'd like to be able to create a graph where:
> - Y axis = 95 pecentile latency of transaction type data (order, ack, fill)
> - X axis = transaction rate
>
> I've read the basic doc, created some simple plots, could someone get me
> going in the right direction?
>
> tia,
> jd
>
>
>
> --
> View this message in context:
> http://n4.nabble.com/new-to-R-analysis-of-latency-data-tp1752096p1752096.html
> Sent from the R help mailing list archive at Nabble.com.
>
>




Re: [R] Permutation of Matrix

2010-04-05 Thread Marc Schwartz
On Apr 5, 2010, at 5:25 PM, David Winsemius wrote:

> 
> On Apr 5, 2010, at 4:58 PM, Ayush Raman wrote:
> 
>> Hi all,
>> 
>> How can I have a permuted matrix where the second row is the permutation
>> over the first row; third is the permutation of the second row; fourth is
>> the permutation of the third row and so on?
> 
> Wouldn't any of those permutations just be a permutation of the first row? 
> Perhaps something like:
> 
> 
> M <- matrix(first_row, length(first_row), length(first_row))
> M[1, ] <- first_row
> M[2:length(first_row), ] <- t(sapply(2:length(first_row), function(x) {
>   sample(first_row, length(first_row), replace = FALSE)
> }))


How about something like this:

# set first row entries
row1 <- 1:6

# Total desired rows, including above
totalrows <- 5


set.seed(1)

> rbind(row1, t(replicate(totalrows - 1, sample(row1))), deparse.level = 0)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    2    6    3    4    1    5
[3,]    6    4    3    1    5    2
[4,]    5    2    4    6    3    1
[5,]    3    4    5    1    2    6

?

HTH,

Marc Schwartz



[R] predict.lm

2010-04-05 Thread Luis Felipe Parra
Hello, I am trying to use predict.lm, but I am having trouble getting
out-of-sample predictions. I am getting the same output from the following
three commands:

predict(ModeloLineal,predictors[721:768,])
predict(ModeloLineal,predictors[1:768,])
predict(ModeloLineal)

where ModeloLineal is the output from ModeloLineal <- lm(dataTS[,6] ~
predictors[1:720,]), so I would like to use the first 720 observations of
predictors to build my model and the other 48 to make an out-of-sample
forecast. Do you know how I can do this?
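One likely diagnosis (a sketch on made-up data, since the original objects are not shown): because the formula hard-codes the matrix subset predictors[1:720,], predict() has no variable names to rebind and ignores newdata. Fitting from a data frame with named columns avoids this:

```r
set.seed(1)
d <- data.frame(y = rnorm(768), x1 = rnorm(768), x2 = rnorm(768))
fit <- lm(y ~ x1 + x2, data = d[1:720, ])    # train on the first 720 rows
out <- predict(fit, newdata = d[721:768, ])  # genuine out-of-sample predictions
stopifnot(length(out) == 48)
```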

Thank you

Felipe Parra

[[alternative HTML version deleted]]



[R] Agnes in Cluster Package and index.G1 in the clusterSim package questions

2010-04-05 Thread Pancho Aguirre
Dear R Users:
 
I am new to R and I am trying to do a cluster analysis on a single
continuous variable using agnes [Agglomerative Nesting (Hierarchical
Clustering)] in the package ‘cluster’. I was able to apply this clustering
method to my data:

ward1 <- agnes(balances, diss = FALSE, metric = "euclidean",
               stand = TRUE, method = "ward",
               keep.diss = TRUE, keep.data = TRUE)


Now I am wondering if someone can point me in the right direction
with the following: 

1. How do I determine the number of clusters? I would like to apply the
Calinski and Harabasz (1974) stopping rule recommended in the Monte Carlo
simulation by Milligan and Cooper (1985).

2. How do I obtain the clustering rules? (I.e. how do I assign my
observations to the cluster from my final solution?)


I was able to find the package ‘clusterSim’, which calculates the
Calinski-Harabasz pseudo F-statistic, but I am having some difficulty
supplying the cl argument (a vector of integers indicating the cluster to
which each object is allocated) in the call:

index.G1(x, cl, d = NULL, centrotypes = "centroids")
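A hedged sketch of the missing step (using the ruspini data shipped with cluster, and the lower-case agnes, which is the actual function name): cutree() on the hierarchical tree yields the integer vector that the cl argument of index.G1 expects:

```r
library(cluster)
ag <- agnes(ruspini, metric = "euclidean", stand = TRUE, method = "ward")
cl <- cutree(as.hclust(ag), k = 3)   # allocate each observation to one of 3 clusters
stopifnot(length(cl) == nrow(ruspini), all(cl %in% 1:3))
# after library(clusterSim), cl can then be passed as index.G1(ruspini, cl)
```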
 
Thanks in advance for any help,
 
Sincerely,
 
Pancho Aguirre


  



Re: [R] Rprintf not updating

2010-04-05 Thread Erik Wright
Hi Charlie,

Thanks, I got it working by looking at your myRunIfConcise function.

SEXP changePercent(SEXP pBar)
{
    SEXP utilsPackage, percentComplete;
    PROTECT(utilsPackage = eval(lang2(install("getNamespace"),
        ScalarString(mkChar("utils"))), R_GlobalEnv));
    /* allocate the value as an R integer instead of writing through
       an uninitialized int pointer */
    PROTECT(percentComplete = ScalarInteger(10)); /* this value increments */
    eval(lang3(install("setTxtProgressBar"), pBar, percentComplete),
        utilsPackage);
    UNPROTECT(2);
    return R_NilValue;
}

Now I am wondering why the pBar doesn't show initially if the percent complete 
is 0%?

Thanks again,
Erik



On Apr 5, 2010, at 4:00 PM, Sharpie wrote:

> 
> 
> Erik Wright wrote:
>> 
>> Hi Charlie,
>> 
>> I like your idea of updating an R progress bar from C, but I don't at all
>> understand how to call txtProgressBar from C.  I have looked at Writing R
>> Extensions and it is equally confusing.  Any help would be appreciated.
>> 
>> Thanks!,
>> Erik
>> 
> 
> Hi Erik,
> 
> Did you look at the link I put in my last post?
> 
> http://n4.nabble.com/Writing-own-simulation-function-in-C-td1580190.html#a1580423
> 
> I linked that post because it gives a step-by-step example for performing a
> callback to the runif function from C.  Just  change the execution
> environment from stats to utils and alter the function arguments
> appropriately and it should work for txtProgressBar and setTxtProgressBar.
> 
> -Charlie
> 
> -
> Charlie Sharpsteen
> Undergraduate-- Environmental Resources Engineering
> Humboldt State University
> -- 
> View this message in context: 
> http://n4.nabble.com/Rprintf-not-updating-tp1751703p1752102.html
> Sent from the R help mailing list archive at Nabble.com.
> 



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

  One other point.

  The connection I have with multiple servers is persistent; the Windows SAS
executable is in constant contact with all the SAS server executables.

  Also I can submit a job where unix code is interspersed with windows code.

  I do execute R and Perl from SAS using pipes. I am experimenting with SAS
Java objects for tighter integration of SAS with Perl and R.

  As a side note, in the pharmaceutical industry we can often develop under
Windows, but production code must run under Unix and the logs must be
absolutely clean.

 One drawback is SAS's handling of interrupts; I sometimes have difficulty
interrupting a server job from Windows.


-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752233.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] logistic regression in an incomplete dataset

2010-04-05 Thread Thomas Lumley

On Mon, 5 Apr 2010, Desmond D Campbell wrote:


Dear Emmanuel,

Thank you.

Yes I broadly agree with what you say.
I think ML is a better strategy than complete case, because I think its
estimates will be more robust than complete case.
For unbiased estimates I think
 ML requires the data is MAR,
 complete case requires the data is MCAR

Anyway I would have thought ML could be done without resorting to Multiple
Imputation, but I'm at the edge of my knowledge here.


This is an illustration of why Rubin's hierarchy, while useful, doesn't 
displace actual thinking about the problem.

The maximum-likelihood problem for which the MAR assumption is sufficient 
involves specifying the joint likelihood for the outcome and all predictor 
variables, which is basically the same problem as multiple imputation.  
Multiple imputation averages the estimate over the distribution of the unknown 
values; maximum likelihood integrates out the unknown values, but for 
reasonably large sample sizes the estimates will be equivalent (by asymptotic 
linearity of the estimator).  Standard error calculation is probably easier 
with multiple imputation.


Also, it is certainly not true that a complete-case regression analysis 
requires MCAR.  For example, if the missingness is independent of Y given X, 
the complete-case distribution will have the same mean of Y given X  as the 
population and so will have the same best-fitting regression.   This is a 
stronger assumption than you need for multiple imputation, but not a lot 
stronger.
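A quick simulation sketch of that last point (fabricated data, deliberately loose tolerance): when missingness depends on X only, the complete-case logistic slope stays essentially unbiased even though the data are not MCAR:

```r
set.seed(7)
n <- 5000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))   # true slope = 1
x[runif(n) < plogis(x)] <- NA        # missingness driven by x alone, not by y
fit <- glm(y ~ x, family = binomial) # complete-case fit by default
stopifnot(abs(coef(fit)[["x"]] - 1) < 0.5)  # loose sanity check on the slope
```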

-thomas



Thanks once again,

regards
Desmond


From: Emmanuel Charpentier  bacbuc.dyndns.org>
Subject: Re: logistic regression in an incomplete dataset
Newsgroups: gmane.comp.lang.r.general
Date: 2010-04-05 19:58:20 GMT (2 hours and 10 minutes ago)

Dear Desmond,

a somewhat analogous question has been posed recently (about 2 weeks
ago) on the sig-mixed-model list, and I tried (in two posts) to give
some elements of information (and some bibliographic pointers). To
summarize tersely :

- a model of "information missingness" (i.e. *why* are some data
missing?) is necessary to choose the right measures to take. Two
special cases (Missing At Random and Missing Completely At Random) allow
for (semi-)automated compensation. See literature for further details.

- complete-case analysis may give seriously weakened and *biased*
results. Pairwise-complete-case analysis is usually *worse*.

- simple imputation leads to underestimated variances and might also
give biased results.

- multiple imputation is currently thought of as a good way to alleviate
missing data if you have a missingness model (or can honestly bet on
MCAR or MAR), and if you properly combine the results of your
imputations.

- A few missing-data packages exist in R to handle this case. My personal
selection at this point would be mice, mi, Amelia, and possibly mitools,
but none of them is fully satisfying (in particular, accounting for a
random effect needs special handling all the way through in all packages...).
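As a minimal sketch of the multiple-imputation route in R (assuming the mice package; dat, y, x1 and x2 are placeholder names, not from the thread):

```r
library(mice)
# impute m completed datasets, fit the logistic model in each,
# then pool estimates and standard errors with Rubin's rules
imp  <- mice(dat, m = 20, printFlag = FALSE)
fits <- with(imp, glm(y ~ x1 + x2, family = binomial))
summary(pool(fits))
```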

- An interesting alternative is to write a full probability model (in
BUGS for example) and use Bayesian estimation; in this framework,
missing data are "naturally" modeled in the model used for analysis.
However, this might entail *large* work, be difficult and not always
succeed (numerical difficulties). Furthermore, the results of a Bayesian
analysis might not be what you seek...

HTH,

Emmanuel Charpentier

On Monday 05 April 2010 at 11:34 +0100, Desmond Campbell wrote:

Dear all,

I want to do a logistic regression.
So far I've only found out how to do that in R, in a dataset of complete
cases.
I'd like to do logistic regression via max likelihood, using all the
study cases (complete and incomplete). Can you help?

I'm using glm() with family=binomial(logit).
If any covariate in a study case is missing then the study case is
dropped, i.e. it is doing a complete-case analysis.
As a lot of study cases are being dropped, I'd rather it did maximum
likelihood using all the study cases.
I tried setting glm()'s na.action to NULL, but then it complained about
NA's present in the study cases.

I've about 1000 unmatched study cases and less than 10 covariates, so I
could use unconditional ML estimation (as opposed to conditional ML
estimation).


regards
Desmond


--
Desmond Campbell
UCL Genetics Institute
d.campb...@ucl.ac.uk
Tel. ext. 020 31084006, int. 54006




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle

[R] strange behavior of matrix

2010-04-05 Thread William Revelle

Dear R list,

I have discovered a seemingly peculiar feature when using a matrix to 
index itself (yes, this is strange code, which I have now modified to 
be more reasonable).


 #this makes sense
 s <- matrix(1:3,nrow=1)
 s[s]#all three elements are shown

 #but when I try
 s <- matrix(1:2,nrow=1)
 s[1] #fine, the first element is shown
 s[2] #fine, the second element is shown
 s[s] #just the second element is shown  -- this is peculiar


 #But doing it by columns works for both cases
 s <- matrix(1:3,ncol=1)
 s[s]#all three elements are shown


 #and when I try the same problem down a column
 s <- matrix(1:2,ncol=1)
 s[1] #fine

 s[2] #fine

 s[s] #this shows both elements  as would be expected

 #clearly since I have just one dimension, it would have been better to
 s <- 1:2
 s[s]   #which works as one would expect.

Or, using the array  function we get the same problem.


 s <- array(1:2,dim=c(1,2))
 s[s]

[1] 2

 s <- array(1:2,dim=c(2,1))
 s[s]

[1] 1 2



 sessionInfo()

R version 2.11.0 Under development (unstable) (2010-03-24 r51389)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] psych_1.0-87




I think this is unexpected behavior.

Best wishes,

Bill


--
William Revelle http://personality-project.org/revelle.html
Professor   http://personality-project.org
Department of Psychology http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
Use R for psychology   http://personality-project.org/r
It is 6 minutes to midnight http://www.thebulletin.org



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

  One final point about persistent simultaneous environments.

  In Windows I sit in my development directory (PWD) and simultaneously my
Unix session sits in my production directory (PWD). This simplifies
versioning, promotion to production and batch execution.

  My Unix session is persistent, so my work directory, macro variables and
configuration settings persist from one interactive submission to the next.
I can easily change things like debugging options for just one block of
code. If I want to run some code in the 'clean production environment' I
just submit it to unix batch.

 I think all the suggestions so far have been 'batch' execution on the
server.

Regards

 
-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752255.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Daniel Nordlund
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Roger DeAngelis(xlr82sas)
> Sent: Monday, April 05, 2010 3:35 PM
> To: r-help@r-project.org
> Subject: Re: [R] SAS and R on multiple operating systems
> 
> 
> Hi,
> 
>   I am using SSH, however I do have to set up a SAS Spawner on the remote
> host and use SAS remote library services. I also have to have listeners on
> both client and host?
> 
>   I am not a systems guy, so I do not know exactly how SAS makes the
> remote
> libraries available to windows.
> 
>   It is a little trickier to get UNIX to recognize my windows 'c' drive.
> 
>   It is quite powerful to take the idiosyncracies of operating systems out
> of the picture, I think perl is little closer to doing this than R. The
> stat
> function in perl appears to be able to operate on windows and unix
> filesystems, however separately. Many SAS objects like datasets are
> exactly
> the same between windows(32/64) and corresponding unix systems. No big
> endian little endian stuff to worry about.

Well, this is not exactly correct. SAS datasets on Windows 32-bit and 64-bit 
systems do use a different file format, and the format is different from 
Unix/Linux systems.  SAS will automagically use the cross-environment data 
access (CEDA) engine, where possible, for reading SAS datasets created in 
different environments.  So what you describe below can be done, but only 
because SAS handles the file conversion relatively transparently.  And while 
the CEDA engine allows one to read datasets from other environments, there can 
be a performance hit when doing so.

> 
> Here is my program
> 
> %put &sysscp;/* Show OS */
> %let pth=%sysfunc(pathname(unx)); /* SHOW UNIX PATH */
> %put pth=&pth;
> data "c:\temp\class.sas7bdat";   /* WINDOWS */
>   set unx.class;
> run;
> 
> Here is the log showing I am running under windows but input is unix
> 
> 28 %put &sysscp;
> SYMBOLGEN:  Macro variable SYSSCP resolves to WIN
> WIN
> 29 %let pth=%sysfunc(pathname(unx));
> 30 %put pth=&pth;
> SYMBOLGEN:  Macro variable PTH resolves to
> /workspace/SAS_work90424DF5_zeus_unix
> pth=/workspace/SAS_work90424DF5_global_unix_server
> 31 data "c:\temp\class.sas7bdat";
> 32   set unx.class;
> 33 run;
> 
> NOTE: There were 19 observations read from the data set UNX.CLASS.
> NOTE: The data set c:\temp\class.sas7bdat has 19 observations and 5
> variables.
> NOTE: DATA statement used (Total process time):
>   real time   0.04 seconds
>   cpu time0.01 seconds
> 
> I can even concatenate windows and unix SAS datasets.
> 
> 

Hope this helpful,

Dan

Daniel Nordlund
Bothell, WA USA
 



Re: [R] strange behavior of matrix

2010-04-05 Thread Phil Spector

William -
   An interesting feature of matrix indexing in R is that
if you provide a two-column matrix as a subscript, you are
referring to the elements whose indices are in the rows
of the matrix.  This is extremely handy for converting
tables to matrices:



m = cbind(c(1,1,1,2,2,2,3,3,3),c(1,2,3,1,2,3,1,2,3),c(10,19,8,14,12,6,17,9,2))
m

      [,1] [,2] [,3]
 [1,]    1    1   10
 [2,]    1    2   19
 [3,]    1    3    8
 [4,]    2    1   14
 [5,]    2    2   12
 [6,]    2    3    6
 [7,]    3    1   17
 [8,]    3    2    9
 [9,]    3    3    2

newmat = matrix(0,3,3)
newmat[m[,1:2]] = m[,3]
newmat

     [,1] [,2] [,3]
[1,]   10   19    8
[2,]   14   12    6
[3,]   17    9    2

It's also handy for extracting a vector with just the elements you want:


newmat[cbind(c(1,2,3),c(2,3,1))]

[1] 19  6 17  # 1,2 2,3 3,1 elements

So it's a bit surprising when you index a matrix with a 2 column 
matrix, but it is a documented fact.
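Applied to the original example: s <- matrix(1:2, nrow = 1) is itself a one-row, two-column matrix, so s[s] is matrix indexing with the single (row, col) pair (1, 2):

```r
s <- matrix(1:2, nrow = 1)
s[s]             # 2 -- the element at row 1, column 2
s[as.vector(s)]  # 1 2 -- coerce to a plain vector to get element indexing
```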


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu




On Mon, 5 Apr 2010, William Revelle wrote:


Dear R list,

I have discovered a seemingly peculiar feature when using a matrix to index 
itself (yes, this is strange code, which I have now modified to be more 
reasonable).


#this makes sense
s <- matrix(1:3,nrow=1)
s[s]#all three elements are shown

#but when I try
s <- matrix(1:2,nrow=1)
s[1] #fine, the first element is shown
s[2] #fine, the second element is shown
s[s] #just the second element is shown  -- this is peculiar


#But doing it by columns works for both cases
s <- matrix(1:3,ncol=1)
s[s]#all three elements are shown


#and when I try the same problem down a column
s <- matrix(1:2,ncol=1)
s[1] #fine

s[2] #fine

s[s] #this shows both elements  as would be expected

#clearly since I have just one dimension, it would have been better to
s <- 1:2
s[s]   #which works as one would expect.

Or, using the array  function we get the same problem.


 s <- array(1:2,dim=c(1,2))
 s[s]

[1] 2

 s <- array(1:2,dim=c(2,1))
 s[s]

[1] 1 2



 sessionInfo()

R version 2.11.0 Under development (unstable) (2010-03-24 r51389)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] psych_1.0-87




I think this is unexpected behavior.

Best wishes,

Bill


--
William Revelle http://personality-project.org/revelle.html
Professor   http://personality-project.org
Department of Psychology http://www.wcas.northwestern.edu/psych/
Northwestern University http://www.northwestern.edu/
Use R for psychology   http://personality-project.org/r
It is 6 minutes to midnight http://www.thebulletin.org






Re: [R] Rprintf not updating

2010-04-05 Thread Sharpie


Erik Wright wrote:
> 
> Hi Charlie,
> 
> Thanks, I got it working by looking at your myRunIfConcise function.
> 
> SEXP changePercent(SEXP pBar)
> {
>   SEXP utilsPackage, percentComplete;
>   PROTECT(utilsPackage = eval(lang2(install("getNamespace"),
> ScalarString(mkChar("utils"))), R_GlobalEnv));
>   PROTECT(percentComplete = ScalarInteger(10)); // this value increments
>   eval(lang4(install("setTxtProgressBar"), pBar, percentComplete,
> R_NilValue), utilsPackage);
>   UNPROTECT(2);
>   return R_NilValue;
> }
> 
> Now I am wondering why the pBar doesn't show initially if the percent
> complete is 0%?
> 
> Thanks again,
> Erik
> 

I did a quick check on the R side of things and txtProgressBars do not
display until after the first call to setTxtProgressBar unless the initial
parameter is set to something other than 0 (possible bug?).  Try executing a
call immediately after creation to set the value to 0.
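On the R side, the suggested workaround looks like this (a sketch):

```r
pb <- txtProgressBar(min = 0, max = 100, style = 3)
setTxtProgressBar(pb, 0)      # force the bar to draw immediately at 0%
for (i in 1:100) {
  Sys.sleep(0.01)             # stand-in for real work
  setTxtProgressBar(pb, i)
}
close(pb)
```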

-Charlie

-
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University
-- 
View this message in context: 
http://n4.nabble.com/Rprintf-not-updating-tp1751703p1752289.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

 You are absolutely correct about 32/64, and it appears to be a severe
penalty. But I think 32 to 32 (win/unix) does not incur the penalty. There
are even more issues between mainframe and unix/windows. The 32-to-64 case is
a big hit when querying data dictionaries that have a mixture of 32- and
64-bit SAS objects.

As a side note,  I could reload the R workspace between batch submissions to
maintain consistent environments.

I do support quite a few statisticians and I try to get them to think more
in terms of R especially those that use IML, but unlike Frank I feel there
are several areas where SAS is the more logical solution. I think SAS(9.22)
is a little ahead of R in SVG graphics(the future) and graphic editors and
unlike Frank I don't think R and LaTex is more powerful than ods/tagsets and
SAS reporting/graph procedures. Imbeding graphic objects inside proc report
is powerful, see

http://homepage.mac.com/magdelina/.Public/met_may_103.rtf

And I know SAS  allows some merging of cells in proc report so arbitrary
objects, like graphs can be merged into proc report.

The pharma industry externals tend to be MS-Word and Excel, although final
FDA documents are usually pdf. So whatever I do, I have to make my final
product look good in Word. This can be a real challenge because of Word's
faulty import engines. I would have preferred postscript, where R would be a
real contender.

I also remember that SAS won a recent graphics award, and I think it was
competing against R.
I don't have the link, and I hope I am correct.

see for some SAS graphics

http://robslink.com/SAS/Home.htm

The real strength of R is in the flexibilty of its statistical functions.
Sometimes SAS makes the wrong or illogical decisions with its canned
routines.






 
-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752294.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] use of random and nested factors in lme

2010-04-05 Thread Kingsford Jones
On Mon, Apr 5, 2010 at 12:21 PM, Joris Meys  wrote:
> Dear all,
>
> I've read numerous posts about the random and nested factors in lme,
> comparison to proc Mixed in SAS, and so on, but I'm still a bit confused by
> the notations. More specifically, say we have a model with a fixed effect F,
> a random effect R and another one N which is nested in R.
>
> Say the model is described by Y~F
> Can anyone clarify the difference between :
> random = ~1|R:N

Here you ask to estimate the variance in random intercepts between
levels created by the interaction of the R and N factors (note
anything on the RHS of the "|" must be a factor (or be convertible to one),
since this specifies the groups between which the random effects vary)

> random = ~1|R/N

The RHS can be read as "R and N %in% R".  So this is as above, but
additionally estimate the variance in random intercepts between the
levels of R.

> random = ~R:N
> random = ~R/N

In the above formulas you need to specify which effects you're
interested in; as written, they only specify grouping.

> random = ~R|N

Because the intercept is implicit unless explicitly removed (with +0
or -1), this requests estimation of the variance in random intercepts
between levels of N, as well as either variance in random slopes
between levels of N if R is numeric, or variation in random effects
associated with levels of the R factor between levels of N if R is a
factor (resulting in as many variance components being estimated as
there are levels of R).  Note that as with factor-level effects in the
fixed portion of the model, what is being estimated will depend on the
contrast coding for the R factor.  Also, with factor random effects
specifications it is often sensible to remove the intercept (e.g., ~0
+ R|N) to estimate random variation associated with a specific level
rather than variation in, e.g., difference from a baseline level.  The
interactions between contrast coding, mean structure and in/exclusion
of intercepts is too much to go into here, but hopefully this gives
the gist of concepts.
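In lme() calls, the workable specifications discussed above would look like this (dat, Y, F, R and N are placeholders):

```r
library(nlme)
# random intercept for each R:N combination only
m1 <- lme(Y ~ F, random = ~ 1 | R:N, data = dat)
# random intercepts at the R level and the N-within-R level
m2 <- lme(Y ~ F, random = ~ 1 | R/N, data = dat)
# intercept removed: a variance component for each level of factor R,
# varying between levels of N
m3 <- lme(Y ~ F, random = ~ 0 + R | N, data = dat)
```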

> random = ~1|R+N
>
> or direct me to an overview regarding notation of these formulas in lme
> (package nlme)? The help files weren't exactly clear to me on this subject.

IMO Pinheiro and Bates' companion book to nlme is a prerequisite for
efficient use of their software.

hoping this helps,

Kingsford Jones

>
> What confuses me most, is the use of the intercept in the random factor.
> Does this mean the intercept is seen as random, has a random component or is
> it just notation? In different mails from this list I found different
> explanations.
>
> Thank you in advance.
> Cheers
> Joris
>
> --
> Joris Meys
> Statistical Consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Applied mathematics, biometrics and process control
>
> Coupure Links 653
> B-9000 Gent
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>
>



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Gabor Grothendieck
On Mon, Apr 5, 2010 at 8:34 PM, Roger DeAngelis(xlr82sas)
 wrote:
>
> Hi,
>
>  You are absolutely correct about 32/64 and it appears to be a severe
> penalty. But I think 32 to 32(win/unix) does not incur the penalty. There
> are even more issues between mainframe and unix/windows. The 32 to 64 is a
> big hit when querying data dictionaries that have a mixture of 32/64 bit SAS
> objects.
>
> As a side note,  I could reload the R workspace between batch submissions to
> maintain consistent environments.
>
> I do support quite a few statisticians and I try to get them to think more
> in terms of R especially those that use IML, but unlike Frank I feel there
> are several areas where SAS is the more logical solution. I think SAS(9.22)
> is a little ahead of R in SVG graphics(the future) and graphic editors and
> unlike Frank I don't think R and LaTex is more powerful than ods/tagsets and
> SAS reporting/graph procedures. Imbeding graphic objects inside proc report
> is powerful, see
>
> http://homepage.mac.com/magdelina/.Public/met_may_103.rtf

This seems quite comparable:

library(meta)
example(forest)



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Douglas Bates
On Mon, Apr 5, 2010 at 3:13 PM, Roger DeAngelis(xlr82sas)
 wrote:
>
> Hi,
>
> This is not meant to be critical of R, but is intended as
> a possible source for improvements to R.
> SAS needs the competition.
>
>
> I am reasonably knowledgeable about
>
> R
> SAS-(all products including IML)
>
> SAS and R run on
>
> Windows(all flavors)
> UNIX(all flavors)
> Apple OSs
>
> Does R run on natively (no emulation)?
> We have quite a few users on these systems
>
> VAX-VMS
> Z-OS (mainframe)
> MVS
> VM/CMS(IBM)
...

Just out of curiosity, do you really know of anyone who is running a
VAX-VMS system at present?  My recollection of VAX minicomputers and
VaxStations is that they would be considerably less powerful, have
much less memory and possibly less disk space than a $250 netbook.
The first VAX at our university, a VAX-11/780, was allegedly a "1
MIPS" machine but it was pretty difficult to get it to do a million of
any instruction in one second.  I eventually succeeded because it had
a special instruction for decrement and branch on zero so if you
loaded a register with 10 and put it in a tight loop of that one
instruction, it was able to count from 10 down to zero within a
second.  When I bought the second Vax on the campus, a less powerful
Vax-11/750, I was considered extravagant because I equipped it with a
*second* megabyte of memory (at a cost of about $9000).

I recently bought an Intel Atom and Nvidia ION based "nettop" computer
for $330.  That computer has a dual-core 1.6 GHz processor, plus 2GB
of memory and 160 GB SATA hard drive.  I expect the electricity and
air conditioning bill for running a VAX would probably hit $300 after
a few months, and that isn't even counting the cost of buying it and
paying for (probably very old) operators to take care of it.  And for
that you would get a computer that is less than 1/100th as powerful as
the micro desktop.  Why would anyone do that?



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

  Again, a slight mistake on my part. One of the largest pharmaceutical
contract companies uses VMS. I erroneously added the VAX in front from
memory. I don't want to mention the company, but for anyone familiar with
contract pharma companies, one of these uses VMS (extensively, probably not
on a VAX): ICON, PPD, PRA, Quintiles or Covance.

Here is an excerpt from a recent email 

I WILL BE OUT OF OFFICE 3/24-3/31 AND RETURN ON 4/1.  
I WILL NOT HAVE ACCESS TO EMAILS AND VMS.

FOR USING THE REFLECTION "HOST - UNIX AND OPENVMS"

I also know the programmer uses SAS on VMS

Regards

-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752353.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Roger DeAngelis(xlr82sas)

Hi,

  About the forest plot.

   Some pharma companies demand that reports and graphics follow very
restrictive layouts.

   SAS allows users to use one template for graphs and tables.

   Margins have to be the same for all reports.
   Fonts, font sizes, line widths, boxing body, cell spacing, cell padding ...

   The output has to look perfect in MS-Word.

I am not an R graphics expert, but I tried to run the forest plot and
output the graphs to png and svg.
In both cases the plots were mangled. This was probably not fair, because it
looked like the SVG problem was just a font size issue and the png problem
was a width issue, but I know SAS tends to produce graphs that can be scaled
in post-processing.

I also imported the png into Word and it was right-clipped.

 devSVG(file = "c:\\temp\\Rplots.svg", width = 10, height = 8,  
 bg = "white", fg = "black", onefile=TRUE, xmlHeader=TRUE)  
 example(forest)
 dev.off()  

png("c:\\temp\\myplot.png", bg="transparent", width=700, height=500)
example(forest) 
dev.off()   

 Also I have seen 5,000 page listings in SAS.



-- 
View this message in context: 
http://n4.nabble.com/SAS-and-R-on-multiple-operating-systems-tp1752043p1752362.html
Sent from the R help mailing list archive at Nabble.com.



[R] Error with read.csv.sql on processing large file

2010-04-05 Thread Moiz Saifee
Hi,

I was trying to read & filter records from a large file using read.csv.sql.
I was successfully able to do that with a ~1 GB file. However I get the
following error with a >2 GB file which has exactly the same structure as
the first file.

Error in try({ :
  RS-DBI driver: (RS_sqlite_import: cannot open file  )
- I have checked the file permissions and the file exists in the directory.
- The error comes instantly, so I am assuming it's not due to memory.
- If I skip a significant number of rows (using the skip= argument) from
the >2 GB file, I don't get the error.

I can do a quick and dirty fix by splitting the file into two and
performing two separate reads, but maybe that's not so elegant. Can somebody
please help?
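Until the cause is found, the split need not be done by hand — the file can be read and filtered in chunks from an open connection. A sketch (the file name and filter function are placeholders, and the naive header parse assumes no quoted commas):

```r
read_filtered <- function(path, keep, chunk = 1e6) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # read the header line once (naive split: assumes no quoted commas)
  nms <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  out <- list()
  repeat {
    block <- tryCatch(
      read.csv(con, header = FALSE, col.names = nms, nrows = chunk),
      error = function(e) NULL)  # read.csv errors once the file is exhausted
    if (is.null(block) || nrow(block) == 0) break
    out[[length(out) + 1L]] <- block[keep(block), , drop = FALSE]
    if (nrow(block) < chunk) break
  }
  do.call(rbind, out)
}

# e.g. keep rows where a (hypothetical) column `value` exceeds 100:
# big <- read_filtered("bigfile.csv", function(d) d$value > 100)
```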




Re: [R] SAS and R on multiple operating systems

2010-04-05 Thread Peter Dalgaard
Douglas Bates wrote:
> The first VAX at our university, a VAX-11/780, was allegedly a "1
> MIPS" machine but it was pretty difficult to get it to do a million of
> any instruction in one second.  I eventually succeeded because it had
> a special instruction for decrement and branch on zero so if you
> loaded a register with 10 and put it in a tight loop of that one
> instruction, it was able to count from 10 down to zero within a
> second.  When I bought the second Vax on the campus, a less powerful
> Vax-11/750, I was considered extravagant because I equipped it with a
> *second* megabyte of memory (at a cost of about $9000).

You will probably remember the unfortunate name-collision with an
industrial-strength vacuum cleaner. The latter allegedly had a large
advertising campaign with the slogan "Nothing sucks like a VAX".

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


