Re: [R] How to test if there is a subvector in a longer vector

2012-09-28 Thread Berend Hasselman

On 28-09-2012, at 07:41, Atte Tenkanen  wrote:

> Sorry. I should have mentioned that the order of the components is important.
> 
> So c(1,4,6) is accepted as a subvector of c(2,1,1,4,6,3), but not of 
> c(2,1,1,6,4,3).
> 
> How to test this?

See this discussion for a variety of solutions.

http://r.789695.n4.nabble.com/matching-a-sequence-in-a-vector-td4389523.html#a4393453
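
For readers who cannot reach the linked thread, here is a minimal sketch of one possible approach. It assumes the subvector must appear as a contiguous run in the same order (the examples in the question are consistent with that reading but do not settle it) and that neither vector contains NA. The helper name is_subvector is made up for illustration.

```r
# Hypothetical helper: TRUE if `needle` occurs in `haystack` as a
# contiguous run with the same order of components.
is_subvector <- function(needle, haystack) {
  n <- length(needle)
  m <- length(haystack)
  if (n == 0L) return(TRUE)
  if (n > m) return(FALSE)
  starts <- seq_len(m - n + 1L)          # every possible start position
  any(vapply(starts,
             function(s) all(haystack[s:(s + n - 1L)] == needle),
             logical(1)))
}

is_subvector(c(1, 4, 6), c(2, 1, 1, 4, 6, 3))  # TRUE
is_subvector(c(1, 4, 6), c(2, 1, 1, 6, 4, 3))  # FALSE
```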

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running different Regressions using for loops

2012-09-28 Thread Krunal Nanavati
Hi Rui,

Excellent!!  This is what I was looking for. Thanks for the help.

So, now I have stored the results of the 10 regressions in "summ.list
<- lapply(lm.list2, summary)"

And now, once I enter "summ.list", it gives me the output for all
the 10 regressions...

I wanted to access a beta coefficient of one of the regressions, say
"Price2+Media1+Trend+Seasonality"...the result of which is stored in
"summ.list[2]"

I entered the below statement for accessing the Beta coefficient for
Price2...

> summ.list[2]$coefficients[2]
NULL

But this is giving me " NULL " as the output...

What I am looking for, is to access a beta value of a particular variable
from a particular regression output and use it for further analysis.

Can you please help me out with this? I greatly appreciate your
efforts.




Thanks & Regards,

Krunal Nanavati
9769-919198

-Original Message-
From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
Sent: 27 September 2012 21:55
To: Krunal Nanavati
Cc: David Winsemius; r-help@r-project.org
Subject: Re: [R] Running different Regressions using for loops

Hello,

Inline.
On 27-09-2012 13:52, Krunal Nanavati wrote:
> Hi,
>
> Thanks for all your help. I am stuck again, but with a new problem, on
> similar lines.
>
> I have taken the problem to the next step now...I have now added 2 "for"
> loops... 1 for the Price variable...and another for the Media variable
>
> I have taken 5 price variables...and 2 media variables with the "trend
> and seasonality" (appearing in all of them)...so in all there will be
> 10 regressions to run now
>
> Price 1, Media 1
>
> Price 1, Media 2
>
> Price 2, Media 1
>
> Price 2, Media 2
>
> ...and so on
>
> I have built up a code for it...
>
>
>
>
> tryout <- read.table("C:\\Users\\Krunal\\Desktop\\R tryout.csv",
>                      header = TRUE, sep = ",")
> cnames <- names(tryout)
> price <- cnames[grep("Price", cnames)]
> media <- cnames[grep("Media", cnames)]
> resp <- cnames[1]
> regr <- cnames[7:8]
> lm.list <- vector("list", 10)
> for(i in 1:5)
> {
>   regress <- paste(price[i], paste(regr, collapse = "+"), sep = "+")
>   for(j in 1:2) {
>     regress1 <- paste(media[j], regress, sep = "+")
>     fmla <- paste(resp, regress1, sep = "~")
>     lm.list[[i]] <- lm(as.formula(fmla), data = tryout)
>   }
> }
> summ.list <- lapply(lm.list, summary)
> summ.list
>
>
>
>
>
> But it is only running 5 regressions...only Media 1 along with the 5
> Price variables & Trend & Seasonality is regressed on Volume...giving
> only 5 outputs
>
> I feel there is something wrong with the "lm.list[[i]] <-
> lm(as.formula(fmla), data = tryout)" statement.

No, I don't think so. If it's giving you only 5 outputs the error is
probably in the fmla construction. Put print statements to see the results
of those paste() instructions.
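
One thing that may be worth checking in the quoted loop: the inner for(j in 1:2) assigns to lm.list[[i]] on both passes, so the second fit for a given i replaces the first, and only slots 1 to 5 are ever filled. The sketch below keeps the same nested loops but uses a running index. It builds only the formula strings so that it runs without the data; the column names are assumed from the thread.

```r
price <- paste0("Price", 1:5)   # assumed column names
media <- paste0("Media", 1:2)   # assumed column names

fmla.list <- vector("list", length(price) * length(media))
k <- 0
for (i in seq_along(price)) {
  for (j in seq_along(media)) {
    k <- k + 1                  # one slot per (price, media) pair
    rhs <- paste(media[j], price[i], "Trend", "Seasonality", sep = "+")
    fmla.list[[k]] <- paste("Volume", rhs, sep = "~")
  }
}
unlist(fmla.list)               # 10 distinct formula strings
```

With the data at hand, each string could be passed through as.formula() to lm() and stored at the same index k.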

Supposing your data.frame is now called tryout2,


price <- paste("Price", 1:5, sep = "")
media <- paste("Media", 1:2, sep = "")
pricemedia <- apply(expand.grid(price, media, stringsAsFactors = FALSE),
1, paste, collapse="+")

response <- "Volume"
trendseason <- "Trend+Seasonality"  # do this only once

lm.list2 <- list()
for(i in seq_along(pricemedia)){
 regr <- paste(pricemedia[i], trendseason, sep = "+")
 fmla <- paste(response, regr, sep = "~")
 lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2) }

The trick is to use ?expand.grid
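
To see what expand.grid contributes, here is a small sketch that runs without the data. It only builds the formulas; the variable names are the ones assumed in the code above.

```r
price <- paste0("Price", 1:5)
media <- paste0("Media", 1:2)

# One row per (price, media) combination; the first column varies fastest.
grid <- expand.grid(price = price, media = media, stringsAsFactors = FALSE)
nrow(grid)      # 10
head(grid, 3)

rhs <- paste(grid$price, grid$media, "Trend+Seasonality", sep = "+")
fmlas <- lapply(paste("Volume", rhs, sep = "~"), as.formula)
rhs[2]          # "Price2+Media1+Trend+Seasonality"
```

Because the first column varies fastest, the second model is the Price2/Media1 one, which matches the position mentioned elsewhere in this thread.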

Hope this helps,

Rui Barradas

>   I am not sure about its
> placement...whether it should be in loop 2 or in loop 1
>
> Can you please help me out??
>
>
>
>
>
>
>
>
>
>
> Thanks & Regards,
>
> Krunal Nanavati
> 9769-919198
>
> -Original Message-
> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
> Sent: 27 September 2012 16:22
> To: David Winsemius
> Cc: Krunal Nanavati; r-help@r-project.org
> Subject: Re: [R] Running different Regressions using for loops
>
> Hello,
>
> Just to add that you can also
>
> lapply(lm.list, coef)
>
> with a different output.
>
> Rui Barradas
> On 27-09-2012 09:24, David Winsemius wrote:
>> On Sep 26, 2012, at 10:31 PM, Krunal Nanavati wrote:
>>
>>> Dear Rui,
>>>
>>> Thanks for your time.
>>>
>>> I have a question though, when I run the 5 regression, whose outputs
>>> are stored in "lm.list[i]", I only get the coefficients for the
>>> Intercept, Price, Trend & Seasonality as below
>>>
>>>
>>>> lm.list[1]
>>> [[1]]
>>>
>>> Call:
>>>
>>> lm(formula = as.formula(fmla), data = tryout)
>>>
>>> Coefficients:
>>>
>>> (Intercept)   Price4   Trend  Seasonality
>>>
>>>  9923123 -260682664616   551392
>> summ.list <- lapply(lm.list, summary)
>> coef.list <- lapply(summ.list, coef)
>> coef.list
>>
>>> I am also looking out for t stats and p value and R squared.
>> For the r.squared
>>
>> rsq.vec <- sapply(summ.list, "$", "r.squared")
>> adj.rsq <- sapply(summ.list, "$", "adj.r.squared")
>>
>>> Do you know,
>>> how can I get all these statistics. Also, why is " as.formula " used
>>> in the lm function. It should work without that as well, right?
>> No.
>>> Can you pleas

[R] blank plot----how do I make symbols appear

2012-09-28 Thread Jessica da Silva
Hi,

I am trying to create a scatterplot, coding each point to one of 5
populations.  I was successful when I did this for one set of data, yet
when I try plotting other data a blank plot appears (although the axes are
labelled and I can fit the regression lines from each population).  I have
tried a variety of things to fix this but nothing seems to work.

I can plot the points if I do not specify that I want each population to
have a particular symbol. However, once I add the index [grip$Morph] to
my symbol parameter (e.g., pch=c(2,6,5,19,15)[grip$morph]), I lose all
the points.  As I mentioned above, I was able to create a plot
successfully using other data points from the same table (different
columns), so I know the data are fine.

Has anyone come across this before?


R-script used:

HAND<-AllMal[,c(2,4,5)]
na.omit(HAND)->HAND

write.csv(HAND, "grip.csv")

read.csv("grip.csv")->grip
grip
class(grip)
class(HAND)


grip$morph<-as.character(grip$Morph)

morph<- grip$morph
BML<-grip$BML
grip$MCF->MCF


reg1<-lm(BML~MCF,data=subset(grip,morph=="mel"));reg1
reg2<-lm(BML~MCF,data=subset(grip,morph=="tham"));reg2
reg3<-lm(BML~MCF,data=subset(grip,morph=="A"));reg3
reg4<-lm(BML~MCF,data=subset(grip,morph=="B"));reg4
reg5<-lm(BML~MCF,data=subset(grip,morph=="C"));reg5


plot(MCF,BML,pch=c(2,6,5,19,15)[grip$morph],xlab="Residual Metacarpal Length",
ylab="Residual Hand Strength (Broad Dowel)", main="Males")
abline(reg1,lty=1)
abline(reg2,lty=2)
abline(reg3,lty=3)
abline(reg4,lty=4)
abline(reg5,lty=6)

-- 
*Jessica da Silva*
PhD Candidate

Molecular Ecology & Evolution Program
Applied Biodiversity Research
Kirstenbosch Research Centre
South African National Biodiversity Institute

Postal address:
3 Sangster Road
Howick, KZN
3290

Home/Fax: +27 33 330 2230
Cell: +27 79 045 1781

Email: jessica.m.dasi...@gmail.com
  j.dasi...@sanbi.org.za

Website: http://jmdasilva.doodlekit.com/home/home



Re: [R] blank plot----how do I make symbols appear

2012-09-28 Thread Ken Knoblauch
Jessica da Silva writes:
> I am trying to create a scatterplot, coding each point to one of 5
> populations.  I was successful when I did this for one set of data, yet
> when I try plotting other data a blank plot appears (although the axes are
> labelled and I can fit the regression lines from each population).
>
> However, once I add the command [grip$Morph] to
> my symbol parameter (e.g., pch=c(2,6,5,19,15) [grip$morph] ), I lose all
> the points.  As I mentioned above, I was able to create a plot
> successfully using other data points from the same table (different
> columns), so I know the data are fine.
>

Try  

grip$morph<-unclass(grip$Morph)

instead.  Look at what 

as.character(factor(letters[1:3]))

gives you.
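
A small sketch of why the points vanish, using made-up labels in place of the real data. One caveat: since R 4.0.0, read.csv() no longer converts strings to factors by default, in which case unclass() on a character column changes nothing; going through factor() explicitly works in either case.

```r
morph <- c("mel", "tham", "A", "B", "C", "mel")   # made-up population labels
symbols <- c(2, 6, 5, 19, 15)

# A character index looks up element names; `symbols` has none, so every
# lookup is NA and plot() draws no points.
symbols[morph]

# Integer codes index by position. Fixing the level order fixes which
# symbol each population gets.
codes <- as.integer(factor(morph, levels = c("mel", "tham", "A", "B", "C")))
pch.vec <- symbols[codes]
pch.vec   # 2 6 5 19 15 2
```

pch.vec could then be passed as the pch argument of plot().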



Re: [R] Running different Regressions using for loops

2012-09-28 Thread Gerrit Eichner

Hello, Krunal,

try


summ.list[[2]]$coefficients[2]


Note the double square brackets (as summ.list is a list)!
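
A self-contained illustration of the difference, using the built-in mtcars data as a stand-in for the real models:

```r
fits <- list(lm(mpg ~ wt, data = mtcars),
             lm(mpg ~ wt + hp, data = mtcars))
summ <- lapply(fits, summary)

class(summ[2])                  # "list": still a list, of length one
is.null(summ[2]$coefficients)   # TRUE: that outer list has no such element
class(summ[[2]])                # "summary.lm": the element itself
summ[[2]]$coefficients[2]       # estimate for the first predictor (wt)
```

Single brackets return a sub-list, so `$coefficients` finds nothing there and yields NULL; double brackets return the summary object itself.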

Hth,

Gerrit




Re: [R] Running different Regressions using for loops

2012-09-28 Thread Rui Barradas

Hello,

To access list elements you need `[[`, like this:

summ.list[[2]]$coefficients

Or use the extractor function:

coef(summ.list[[2]])
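
For a summary object the extractor returns a matrix, so estimates, t statistics and p-values can be picked out by row and column name. A sketch on the built-in mtcars data (a stand-in for one of the ten models):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model
ctab <- coef(summary(fit))

colnames(ctab)            # "Estimate" "Std. Error" "t value" "Pr(>|t|)"
ctab["wt", "Estimate"]    # beta for wt
ctab["wt", "t value"]     # its t statistic
ctab["wt", "Pr(>|t|)"]    # its p-value
summary(fit)$r.squared    # R squared of the model
```

Indexing by name avoids having to remember the position of a variable in each formula.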

Rui Barradas

Re: [R] Drawing asymmetric error bars

2012-09-28 Thread Jim Lemon

On 09/27/2012 08:59 PM, Alexandra Howe wrote:

> Hello,
>
> I have data which I have arcsin transformed to analyse.
> I want to plot my data with error bars; however, as my data are
> back-transformed, my standard errors are uneven.
> Is there a simple way to draw these asymmetric error bars in R?


Hi Alexandra,
Have a look at the "dispersion" function in the plotrix package.
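
If an extra package is not wanted, asymmetric bars can also be drawn with arrows() from base graphics, which is a different technique from the plotrix function named above. The sketch below assumes an arcsine square-root transformation and uses made-up means and standard errors.

```r
# Made-up summaries on the arcsine square-root scale
m  <- c(0.35, 0.80, 1.20)   # group means (transformed scale)
se <- c(0.08, 0.10, 0.09)   # standard errors (transformed scale)

back <- function(z) sin(z)^2   # inverse of asin(sqrt(p))
est <- back(m)
lo  <- back(m - se)            # lower limits, back-transformed
hi  <- back(m + se)            # upper limits, back-transformed

x <- seq_along(est)
plot(x, est, ylim = c(0, 1), pch = 19, xaxt = "n",
     xlab = "Group", ylab = "Proportion")
axis(1, at = x, labels = c("A", "B", "C"))
# angle = 90 with code = 3 gives flat caps at both ends of each bar
arrows(x, lo, x, hi, angle = 90, code = 3, length = 0.05)
```

The limits are computed on the transformed scale and then back-transformed, which is why the bars come out uneven around the points.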

Jim



[R] Crosstable-like analysis (ks test) of dataframe

2012-09-28 Thread Johannes Radinger
Hi,

I have a dataframe with multiple (appr. 20) columns containing
vectors of different values (different distributions).
 Now I'd like to create a crosstable
where I compare the distribution of each vector (df-column) with
each other. For the comparison I want to use the ks.test().
The result should contain as row and column names the column names
of the input dataframe and the cells should be populated with
the p-value of the ks.test for each pairwise analysis.

My data.frame looks like:
df <- data.frame(X=rnorm(1000,2),Y=rnorm(1000,1),Z=rnorm(1000,2))

And the test for one single case is:
ks <- ks.test(df$X,df$Z)

where the p value is:
ks[2]

How can I automate this pairwise analysis?
Any suggestions? I guess that is a quite common analysis (probably with
other tests).

cheers,
Johannes



Re: [R] Crosstable-like analysis (ks test) of dataframe

2012-09-28 Thread Rui Barradas

Hello,

Try the following.


f <- function(x, y, ...,
              alternative = c("two.sided", "less", "greater"), exact = NULL){
    #w <- getOption("warn")
    #options(warn = -1)  # ignore warnings
    p <- ks.test(x, y, ..., alternative = alternative, exact = exact)$p.value
    #options(warn = w)
    p
}

n <- 1e1
dat <- data.frame(X=rnorm(n), Y=runif(n), Z=rchisq(n, df=3))

apply(dat, 2, function(x) apply(dat, 2, function(y) f(x, y)))
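
An equivalent sketch with explicit loops, which some may find easier to adapt to other tests. The helper name pairwise_ks is made up; the diagonal is left as NA because a column compared with itself is not informative.

```r
set.seed(1)
dat <- data.frame(X = rnorm(50), Y = runif(50), Z = rchisq(50, df = 3))

# Hypothetical helper: matrix of two-sided KS p-values for all column pairs.
pairwise_ks <- function(d) {
  k <- ncol(d)
  p <- matrix(NA_real_, k, k, dimnames = list(names(d), names(d)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i != j) p[i, j] <- ks.test(d[[i]], d[[j]])$p.value
    }
  }
  p
}

pmat <- pairwise_ks(dat)
round(pmat, 4)
```

With about 20 columns this runs many tests, so adjusting the p-values (for instance with p.adjust()) may be worth considering.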

Hope this helps,

Rui Barradas


Re: [R] Running different Regressions using for loops

2012-09-28 Thread Rui Barradas

Hello,

Try

names(lm.list2[[2]]$coefficient[2] )
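
The same idea on a stand-in model built from mtcars. Note that `$coefficient` only works through partial matching of the full component name `coefficients`; coef() avoids relying on that.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model

coef(fit)[2]           # named number: both the name and the value
names(coef(fit))[2]    # just the name, "wt"
unname(coef(fit)[2])   # just the value
```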

Rui Barradas
On 28-09-2012 11:29, Krunal Nanavati wrote:

Ok...this solves a part of my problem

When I type   " lm.list2[2] " ...I get the following output

[[1]]

Call:
lm(formula = as.formula(fmla), data = tryout2)

Coefficients:
(Intercept)   Price2   Media1  Distri1  Trend  Seasonality
13491232 -5759030 -15203437048628  445351




When I enter   " lm.list2[[2]]$coefficient[2] " it gives me the below
output

Price2
-5759030

And when I enter   " lm.list2[[2]]$coefficient[[2]] " ...I get the
number...which is   -5759030


I am looking for a way to get just the name "Price2"...is there a
statement for that?



Thanks & Regards,

Krunal Nanavati
9769-919198



Re: [R] changing outlier shapes of boxplots using lattice

2012-09-28 Thread Sarah Goslee
I would guess that if you find the bit that says pch="|" and change it to
pch=1 it will solve your question, and that reading ?par will tell you why.
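
If that does not do it, a possible alternative is sketched below. As far as I know, with the stock lattice panel function the pch argument of bwplot() sets the median symbol, while outliers take their symbol from the plot.symbol settings. The sketch assumes the default panel.bwplot rather than the HH panel function used in the quoted code, and the data are made up.

```r
library(lattice)

set.seed(42)
# Heavy-tailed made-up data so that some outliers appear
dat <- data.frame(y = rt(100, df = 2), g = factor(rep(1:5, 20)))

p <- bwplot(y ~ g, data = dat,
            pch = "|",                                        # median drawn as a line
            par.settings = list(plot.symbol = list(pch = 1),  # outliers as open circles
                                box.umbrella = list(lty = 1)))
print(p)
```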

Sarah

On Thursday, September 27, 2012, Elaine Kuo wrote:

> Hello
>
> This is Elaine.
>
> I am using package lattice to generate boxplots.
> Using Richard's code, the display was almost perfect except the outlier
> shape.
> Based on the following code, the outliers are vertical lines.
> However, I want the outliers to be empty circles.
> Please kindly help how to modify the code to change the outlier shapes.
> Thank you.
>
> code
> library(lattice)
>
> dataN <- data.frame(GE_distance=rnorm(260),
>                     Diet_B=factor(rep(1:13, each=20)))
>
> Diet.colors <- c("forestgreen", "darkgreen","chocolate1","darkorange2",
>                  "sienna2","red2","firebrick3","saddlebrown","coral4",
>                  "chocolate4","darkblue","navy","grey38")
>
> levels(dataN$Diet_B) <- Diet.colors
>
> bwplot(GE_distance ~ Diet_B, data=dataN,
>        xlab=list("Diet of Breeding Ground", cex = 1.4),
>        ylab = list(
>          "Distance between Centers of B and NB Range (1000 km)",
>          cex = 1.4),
>        panel=panel.bwplot.intermediate.hh,
>        col=Diet.colors,
>        pch=rep("|",13),
>        scales=list(x=list(rot=90)),
>        par.settings=list(box.umbrella=list(lty=1)))
>


-- 
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org



Re: [R] Running different Regressions using for loops

2012-09-28 Thread Rui Barradas
Ok, if I'm understanding it well, you want the mean value of Price1, ...,
Price5? I don't know if it makes any sense, the coefficients already are
mean values, but see if this is it.


price.coef <- sapply(lm.list, function(x) coef(x)[2])
mean(price.coef)
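
Another possible reading of the question is that the mean of the data column named by the coefficient is wanted. mean() of the name itself returns NA because the name is a character string; the name has to be used to look the column up. A sketch on mtcars, with stand-in models:

```r
fits <- list(lm(mpg ~ wt + hp, data = mtcars),
             lm(mpg ~ disp + hp, data = mtcars))

# Name of the second coefficient in each model
var.names <- sapply(fits, function(m) names(coef(m))[2])   # "wt" "disp"

# Use each name to pick the column from the data, then average it
col.means <- sapply(var.names, function(v) mean(mtcars[[v]]))
col.means
```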

Rui Barradas
On 28-09-2012 12:07, Krunal Nanavati wrote:

Hi,

Yes, the thing that you provided works fine...but probably I should have
asked for some other thing.

Here is what I am trying to do

I am trying to get the mean of the Price variable...so I am entering the
below function:

  mean(names(lm.list2[[2]]$coefficient[2] ))

but this gives me an error

[1] NA
Warning message:
In mean.default(names(lm.list2[[2]]$coefficient[2])) :
argument is not numeric or logical: returning NA

I thought getting the text from the list variable...would help me
generate the mean for that text...which is a variable in the data...say
Price 1, Media 2...and so on

Is this a proper approach...if it is...then something more needs to be
done with the function that you provided.

If not, is there a better way...to generate the mean of a particular
variable inside the " for loop " used earlier...given below:


lm.list2 <- list()
for(i in seq_along(pricemedia)){
   regr <- paste(pricemedia[i], trendseason, sep = "+")
   fmla <- paste(response, regr, sep = "~")
   lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2) }




Thanks & Regards,

Krunal Nanavati
9769-919198



Re: [R] What to use for ti in back-transforming summary statistics from F-T double square-root transformation in 'metafor'

2012-09-28 Thread Viechtbauer Wolfgang (STAT)
Dear Chunyan,

One possibility would be to use the harmonic mean of the person-time at risk 
values. You will have to do this manually though at the moment. Here is an 
example:

### let's just use the treatment group data from dat.warfarin
data(dat.warfarin)
dat <- escalc(xi=x1i, ti=t1i, measure="IRFT", data=dat.warfarin, append=TRUE)
dat

### check if back-transformation of individual IRFT values works
transf.iirft(dat$yi, ti=dat$t1i)
escalc(xi=x1i, ti=t1i, measure="IR", data=dat.warfarin)$yi

### random-effects models
res <- rma(yi, vi, data=dat)
res

### harmonic mean of the ti's
ti.hm <- 1/(mean(1/dat$t1i))

### back-transformation using the harmonic mean
transf.iirft(res$b, ti=ti.hm)
transf.iirft(res$ci.lb, ti=ti.hm)
transf.iirft(res$ci.ub, ti=ti.hm)
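
As a small illustration of the harmonic-mean step above, here is a base-R
sketch. The vector ti below holds made-up person-time values (it is not taken
from dat.warfarin), so the numbers only show how the harmonic mean compares
with the arithmetic mean.

```r
## Hypothetical person-time at risk values (not from dat.warfarin)
ti <- c(100, 200, 400)

## harmonic mean, same formula as ti.hm above
ti.hm <- 1 / mean(1 / ti)

## arithmetic mean, for comparison
ti.am <- mean(ti)

ti.hm
ti.am
```

The harmonic mean is pulled towards the smaller person-time values, so it is
never larger than the arithmetic mean.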

Best,
Wolfgang

--
Wolfgang Viechtbauer, Ph.D., Statistician
Department of Psychiatry and Psychology
School for Mental Health and Neuroscience
Faculty of Health, Medicine, and Life Sciences
Maastricht University, P.O. Box 616 (VIJV1)
6200 MD Maastricht, The Netherlands
+31 (43) 388-4170 | http://www.wvbauer.com

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Liu, Chunyan [chunyan@cchmc.org]
Sent: Thursday, September 27, 2012 10:48 PM
To: r-help@R-project.org
Subject: [R] What to use for ti in back-transforming summary statistics from 
F-T double square-root transformation in 'metafor'

Hi Dr. Viechtbauer,

I'm doing meta-analysis using your package 'metafor'. I used the 'IRFT' to 
transform the incident rate. But when I tried to back-transform the summary 
estimates from function rma, I don't know what's the appropriate ti to feed in 
function transf.iirft. I searched and found your post about using harmonic mean 
for ni  to back-transform the double arcsine  transformation. I'm hoping I can 
get your help on ti too.

Thanks.


Chunyan Liu

513-636-9763
Biostatistician II
Department of Biostatistics and Epidemiology
Cincinnati Children's Hospital Medical Center


[R] Anova and tukey-grouping

2012-09-28 Thread Landi
Hello,

I am really new to R and it's still a challenge to me.
Currently I'm working on my Master's Thesis. My supervisor works with SAS
and is not familiar with R at all.

I want to run an Anova, a tukey-test and as a result I want to have the
tukey-grouping ( something like A - AB - B)

I came across the HSD.test in the agricolae-package, but... unfortunately I
do not get an output (like here in the answer
http://stats.stackexchange.com/questions/31547/how-to-obtain-the-results-of-a-tukey-hsd-post-hoc-test-in-a-table-showing-groupe
)

I did it like this:

##   ANOVA
anova.typabunmit<-aov(ds.typabunmit$abun ~ ds.typabunmit$typ)
summary(anova.typabunmit)
summary.lm(anova.typabunmit)

## post HOC
tukey.typabunmit<-TukeyHSD(anova.typabunmit)
tukey.typabunmit

## HSD
HSD.test(anova.typabunmit, "abun", group=TRUE)



and the ONLY output is this:
Name:  abun 
 ds.typabunmit$typ 


I would be very pleased about some ideas!





--
View this message in context: 
http://r.789695.n4.nabble.com/Anova-and-tukey-grouping-tp4644485.html
Sent from the R help mailing list archive at Nabble.com.



[R] Is it possible to enter in a function wich is within a library ?

2012-09-28 Thread ikuzar
Hello, 

I'd like to know if it is possible to step into a function which is included
in a library.
I know how to debug a function which is in an R file (but not in a library).
But it is not the case when the function is included in a library. I want to
go step by step through this function in order to inspect objects' values.  I
tried debug(the_function) but the program does not stop at the_function (it
only shows the body of the function).

Thanks for your help.
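
One possible approach, sketched under assumptions: the debug flag can be set
directly on the function inside the package namespace. stats::sd below is only
a stand-in for the_function; for a function the package does not export,
pkg:::fun would be the analogous form. The sketch only sets and clears the
flag, it does not call the function.

```r
## Flag a function that lives in a package namespace for debugging.
## stats::sd is only a stand-in here.
debug(stats::sd)
flag.on <- isdebugged(stats::sd)     # TRUE once the flag is set

## In an interactive session, the next call such as sd(c(1, 2, 3))
## would open the browser: 'n' steps, 'c' continues, 'Q' quits.

undebug(stats::sd)                   # remove the flag again
flag.off <- isdebugged(stats::sd)    # FALSE after undebug()
```

debugonce() is a variant that clears the flag by itself after the first call.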





--
View this message in context: 
http://r.789695.n4.nabble.com/Is-it-possible-to-enter-in-a-function-wich-is-within-a-library-tp4644488.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Running different Regressions using for loops

2012-09-28 Thread Krunal Nanavati
Ok...this solves a part of my problem

When I type   " lm.list2[2] " ...I get the following output

[[1]]

Call:
lm(formula = as.formula(fmla), data = tryout2)

Coefficients:
(Intercept)   Price2   Media1  Distri1Trend
Seasonality
   13491232 -5759030-15203437048628
445351




When I enter   " lm.list2[[2]]$coefficient[2] " it gives me the below
output

Price2
-5759030

And when I enter   " lm.list2[[2]]$coefficient[[2]] " ...I get the
number...which is   -5759030


I am looking for a way to get just the name  " Price2 "... is there a
statement for that??



Thanks & Regards,

Krunal Nanavati
9769-919198


-Original Message-
From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
Sent: 28 September 2012 15:18
To: Krunal Nanavati
Cc: David Winsemius; r-help@r-project.org
Subject: Re: [R] Running different Regressions using for loops

Hello,

To access list elements you need `[[`, like this:

summ.list[[2]]$coefficients

Or use the extractor function:

coef(summ.list[[2]])
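
A minimal sketch of the difference between `[` and `[[`, using the built-in
mtcars data as a stand-in for tryout2 (the two models below are illustrative
only, not the regressions from the thread):

```r
## Two small regressions on mtcars (a stand-in for tryout2)
lm.list2 <- list(lm(mpg ~ wt, data = mtcars),
                 lm(mpg ~ wt + hp, data = mtcars))
summ.list <- lapply(lm.list2, summary)

## `[` returns a list of length one, which has no element "coefficients"
is.null(summ.list[2]$coefficients)

## `[[` returns the summary itself, so the coefficient table is reachable
coef.table <- summ.list[[2]]$coefficients
coef.table["wt", "Estimate"]

## the same estimate from the fitted model, and the name of the
## second coefficient
coef(lm.list2[[2]])[["wt"]]
names(coef(lm.list2[[2]]))[2]
```

Indexing the coefficient table by name rather than by position avoids
depending on the order of the terms in the formula.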

Rui Barradas
On 28-09-2012 07:23, Krunal Nanavati wrote:
> Hi Rui,
>
> Excellent!!  This is what I was looking for. Thanks for the help.
>
> So, now I have stored the result of the 10 regressions in
"summ.list
> <- lapply(lm.list2, summary)"
>
> And now once I enter" sum.list "it gives me the output for
all
> the 10 regressions...
>
> I wanted to access a beta coefficient of one of the regressionssay
> "Price2+Media1+Trend+Seasonality"...the result of which is stored in"
> sum.list[2] "
>
> I entered the below statement for accessing the Beta coefficient for
> Price2...
>
>> summ.list[2]$coefficients[2]
> NULL
>
> But this is giving me " NULL " as the output...
>
> What I am looking for, is to access a beta value of a particular
> variable from a particular regression output and use it for further
analysis.
>
> Can you please help me out with this. Greatly appreciate, you guys
> efforts.
>
>
>
>
> Thanks & Regards,
>
> Krunal Nanavati
> 9769-919198
>
> -Original Message-
> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
> Sent: 27 September 2012 21:55
> To: Krunal Nanavati
> Cc: David Winsemius; r-help@r-project.org
> Subject: Re: [R] Running different Regressions using for loops
>
> Hello,
>
> Inline.
> On 27-09-2012 13:52, Krunal Nanavati wrote:
>> Hi,
>>
>> Thanks for all your help. I am stuck again, but with a new problem,
>> on similar lines.
>>
>> I have taken the problem to the next step now...i have now added 2
"for"
>> loops... 1 for the Price variable...and another for the Media
>> variable
>>
>> I have taken 5 price variables...and 2 media variables with the
>> "trend and seasonality"(appearing in all of them)so in all there
>> will be
>> 10 regression to run now
>>
>> Price 1, Media 1
>>
>> Price 1, Media 2
>>
>> Price 2, Media 1'
>>
>> Price 2, Media 2
>>
>> ...and so on
>>
>> I have built up a code for it...
>>
>>
>>
>>
>>> tryout=read.table("C:\\Users\\Krunal\\Desktop\\R
>> tryout.csv",header=T,sep=",")
>>> cnames <- names(tryout)
>>> price <- cnames[grep("Price", cnames)] media <- cnames[grep("Media",
>>> cnames)] resp <- cnames[1] regr <- cnames[7:8] lm.list <-
>>> vector("list", 10) for(i in 1:5)
>> + {
>> + regress <- paste(price[i], paste(regr, collapse = "+"), sep = "+")
>> + for(j in 1:2) {
>> + regress1 <- paste(media[j],regress,sep="+") fmla <- paste(resp,
>> + regress1, sep = "~") lm.list[[i]] <- lm(as.formula(fmla), data =
>> + tryout) } }
>>> summ.list <- lapply(lm.list, summary) summ.list
>>
>>
>>
>>
>> But it is only running...5 regressions...only Media 1 along with the
>> 5 Price variables & Trend & Seasonality is regressed on
>> Volume...giving only
>> 5 outputs
>>
>> I feel there is something wrong with the" lm.list[[i]] <-
>> lm(as.formula(fmla), data = tryout)"   statement.
> No, I don't think so. If it's giving you only 5 outputs the error is
> probably in the fmla construction. Put print statements to see the
> results of those paste() instructions.
>
> Supposing your data.frame is now called tryout2,
>
>
> price <- paste("Price", 1:5, sep = "") media <- paste("Media", 1:2,
> sep = "") pricemedia <- apply(expand.grid(price, media,
> stringsAsFactors = FALSE), 1, paste, collapse="+")
>
> response <- "Volume"
> trendseason <- "Trend+Seasonality"  # do this only once
>
> lm.list2 <- list()
> for(i in seq_along(pricemedia)){
>   regr <- paste(pricemedia[i], trendseason, sep = "+")
>   fmla <- paste(response, regr, sep = "~")
>   lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2) }
>
> The trick is to use ?expand.grid
>
> Hope this helps,
>
> Rui Barradas
>
>>I am not sure about its
>> placement...whether it should be in loop 2 or in loop 1
>>
>> Can you please help me out??
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks & Regards,
>>
>> Krunal Nanavati
>> 9769-919198
>>
>> -Original Message-
>> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
>> Sent: 27 September 2012 16:22
>> To: David Winsemius
>> C

Re: [R] Running different Regressions using for loops

2012-09-28 Thread Krunal Nanavati
Hi,

Yes, the thing that you provided works fine... but I should probably have
asked for something else.

Here is what I am trying to do.

I am trying to get the mean of the Price variable, so I am entering the
below function:

 mean(names(lm.list2[[2]]$coefficient[2] ))

but this gives me NA and a warning

[1] NA
Warning message:
In mean.default(names(lm.list2[[2]]$coefficient[2])) :
argument is not numeric or logical: returning NA

I thought that getting the text from the list variable would help me
generate the mean for that text, which is a variable in the data... say
Price 1, Media 2, and so on.

Is this a proper approach? If it is, then something more needs to be
done with the function that you provided.

If not, is there a better way to generate the mean of a particular
variable inside the " for loop " used earlier, given below:

> lm.list2 <- list()
> for(i in seq_along(pricemedia)){
>   regr <- paste(pricemedia[i], trendseason, sep = "+")
>   fmla <- paste(response, regr, sep = "~")
>   lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2) }




Thanks & Regards,

Krunal Nanavati
9769-919198


-Original Message-
From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
Sent: 28 September 2012 16:02
To: Krunal Nanavati
Cc: David Winsemius; r-help@r-project.org
Subject: Re: [R] Running different Regressions using for loops

Hello,

Try

names(lm.list2[[2]]$coefficient[2] )

Rui Barradas

Re: [R] Running different Regressions using for loops

2012-09-28 Thread Krunal Nanavati
Ok...I am sorry for the misunderstanding

what I am trying to do is


>> lm.list2 <- list()
>> for(i in seq_along(pricemedia)){
>>regr <- paste(pricemedia[i], trendseason, sep = "+")
>>fmla <- paste(response, regr, sep = "~")
>>lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2) }


When I run this set of statements, the 1st regression to be run will
have Price 1 and Media 1 as X variables, and in the second loop it will
have Price 1 and Media 2.

So, what I was thinking is that I could generate, inside the for loop, the
mean for Price 1 and Media 1 during the 1st loop, and then the mean for
Price 1 and Media 2 during the second loop, and so on for all the 10
regressions.


Is the method that I was trying appropriate, or is there a better method?
I am sorry for the earlier explanation; I hope this one makes it
more understandable.
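
A minimal sketch of what is described above, under assumptions: tryout2 is
replaced by a made-up data frame with only two Price and two Media columns,
and the names combos and mean.list are introduced here for illustration. The
regressor names are kept as character strings, so the same strings can be used
to look up the data columns and average them.

```r
## Synthetic stand-in for tryout2; all values are made up
set.seed(1)
n <- 24
tryout2 <- data.frame(Volume = rnorm(n, 100, 10),
                      Price1 = runif(n), Price2 = runif(n),
                      Media1 = runif(n), Media2 = runif(n),
                      Trend = seq_len(n),
                      Seasonality = rep(1:4, length.out = n))

## one row per Price/Media combination
combos <- expand.grid(price = c("Price1", "Price2"),
                      media = c("Media1", "Media2"),
                      stringsAsFactors = FALSE)

lm.list2 <- vector("list", nrow(combos))
mean.list <- vector("list", nrow(combos))
for (i in seq_len(nrow(combos))) {
  xvars <- c(combos$price[i], combos$media[i])
  fmla <- reformulate(c(xvars, "Trend", "Seasonality"), response = "Volume")
  lm.list2[[i]] <- lm(fmla, data = tryout2)
  ## look the columns up by name and average them
  mean.list[[i]] <- colMeans(tryout2[xvars])
}
mean.list[[1]]
```

mean() fails on names(...) because that is a character string; the string has
to be used to pick the column out of the data frame first, as colMeans() does
here.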


Thanks for your time...and all the quick replies




Thanks & Regards,

Krunal Nanavati
9769-919198


-Original Message-
From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
Sent: 28 September 2012 16:49
To: Krunal Nanavati
Cc: David Winsemius; r-help@r-project.org
Subject: Re: [R] Running different Regressions using for loops

Ok, if I'm understanding it well, you want the mean value of Price1, ...,
Price5? I don't know if it makes any sense, the coefficients already are
mean values, but see if this is it.

price.coef <- sapply(lm.list, function(x) coef(x)[2])
mean(price.coef)

Rui Barradas

[R] RES: Generating an autocorrelated binary variable

2012-09-28 Thread André Gabriel
I think the package binarySimCLF can help.
See http://cran.r-project.org/web/packages/binarySimCLF/binarySimCLF.pdf.


André Gabriel.



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
behalf of Rolf Turner
Sent: Friday, 28 September 2012 00:02
To: Simon Zehnder
Cc: r help
Subject: Re: [R] Generating an autocorrelated binary variable


I have no idea what your code is doing, nor why you want correlated binary
variables.  Correlation makes little or no sense in the context of binary
random variables --- or more generally in the context of discrete random
variables.

Be that as it may, it is an easy calculation to show that if X and Y are
binary random variables both with success probability of 0.5 then cor(X,Y) =
0.2 if and only if Pr(X=1 | Y = 1) = 0.6.  So just generate X and Y using
that fact:

set.seed(42)
X <- numeric(1000)
Y <- numeric(1000)
for(i in 1:1000) {
    Y[i] <- rbinom(1,1,0.5)
    X[i] <- if(Y[i]==1) rbinom(1,1,0.6) else rbinom(1,1,0.4)
}

# Check:
cor(X,Y) # Get 0.2012336

Looks about right.  Note that the sample proportions are 0.484 and
0.485 for X and Y respectively.  These values do not differ significantly
from 0.5.
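
The "easy calculation" can be checked exactly, as in the sketch below. The
second half is an assumption about what the original poster is after: a single
binary sequence whose lag-1 autocorrelation is 0.2, built by keeping the
previous value with probability (1 + rho)/2 and flipping it otherwise. The
variable names are made up for this illustration.

```r
## Exact check: P(Y=1) = 0.5, P(X=1 | Y=1) = 0.6, P(X=1 | Y=0) = 0.4
p.y  <- 0.5
p.x1 <- 0.6
p.x0 <- 0.4
p.x  <- p.y * p.x1 + (1 - p.y) * p.x0      # marginal P(X=1)
cov.xy <- p.y * p.x1 - p.x * p.y            # E[XY] - E[X]E[Y]
rho <- cov.xy / sqrt(p.x * (1 - p.x) * p.y * (1 - p.y))
rho

## Autocorrelated variant (assumed goal): keep the previous value with
## probability (1 + rho)/2, flip it otherwise
set.seed(1)
n <- 1e5
flip <- rbinom(n - 1, 1, 1 - (1 + rho) / 2)
z <- (rbinom(1, 1, 0.5) + c(0, cumsum(flip))) %% 2
mean(z)
cor(z[-1], z[-n])
```

For a symmetric two-state chain that keeps its state with probability p, the
lag-1 autocorrelation is 2p - 1, which is why p = 0.6 is used for a target of
0.2.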

 cheers,

 Rolf Turner

On 28/09/12 08:26, Simon Zehnder wrote:
> Hi R-fellows,
>
> I am trying to simulate a multivariate correlated sample via the Gaussian
copula method. One variable is a binary variable, that should be
autocorrelated. The autocorrelation should be rho = 0.2. Furthermore, the
overall probability to get either outcome of the binary variable should be
0.5.
> Below you can see the R code (I use for simplicity a diagonal matrix in
rmvnorm even if it produces no correlated sample):
>
> "sampleCop" <- function(n = 1000, rho = 0.2) {
>   
>   require(splus2R)
>   mvrs <- rmvnorm(n + 1, mean = rep(0, 3), cov = diag(3))
>   pmvrs <- pnorm(mvrs, 0, 1)
>   var1 <- matrix(0, nrow = n + 1, ncol = 1)
>   var1[1] <- qbinom(pmvrs[1, 1], 1, 0.5)
>   if(var1[1] == 0) var1[nrow(mvrs)] <- -1
>   for(i in  1:(nrow(pmvrs) - 1)) {
>   if(pmvrs[i + 1, 1] <= rho) var1[i + 1] <- var1[i]
>   else var1[i + 1] <- var1[i] * (-1)
>   }
>   sample <- matrix(0, nrow = n, ncol = 4)
>   sample[, 1] <- var1[1:nrow(var1) - 1]
>   sample[, 2] <- var1[2:nrow(var1)]
>   sample[, 3] <- qnorm(pmvrs[1:nrow(var1) - 1, 2], 0, 1, 1, 0)
>   sample[, 4] <- qnorm(pmvrs[1:nrow(var1) - 1, 3], 0, 1, 1, 0)
>   
>   sample
>   
> }
>
> Now, the code is fine, everything compiles. But when I compute the
autocorrelation of the binary variable, it is not 0.2, but 0.6. Does anyone
know why this happens?



Re: [R] How to write R package

2012-09-28 Thread Duncan Murdoch

On 27/09/2012 5:15 PM, Dr. Alireza Zolfaghari wrote:

Hi List,
Would you please send me a good link to talk me through on how to write a R
package?



See the ?package.skeleton help page.  After you have run it, follow the 
instructions in the "Read-and-delete-me" file that it will create.


For full details, see the Writing R Extensions manual.

For modifying the package after you've finished the "Read-and-delete-me" 
instructions, just manually add *.R files where the rest of them are, 
and use the prompt() function to produce skeleton documentation.
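
A minimal sketch of those steps, assuming a throwaway package created in a
temporary directory. The package name demoPkg and the functions add_one and
times_two are made up for the illustration.

```r
## Hypothetical function to package; it lives in its own environment
env <- new.env()
env$add_one <- function(x) x + 1

## Create the skeleton in a temporary directory (name and path are made up)
pkg.name <- "demoPkg"
package.skeleton(name = pkg.name, list = "add_one",
                 environment = env, path = tempdir(), force = TRUE)
pkg.dir <- file.path(tempdir(), pkg.name)
list.files(pkg.dir)

## Later: add another function by hand and let prompt() draft its help page
times_two <- function(x) 2 * x
dump("times_two", file = file.path(pkg.dir, "R", "times_two.R"))
prompt(times_two, filename = file.path(pkg.dir, "man", "times_two.Rd"))
```

The generated .Rd files are only templates; they still need to be edited by
hand before the package will pass R CMD check.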


That's about it, but you can read more if you like in a tutorial I gave 
a few years ago at a UseR meeting in Dortmund:


http://www.statistik.uni-dortmund.de/useR-2008/slides/Murdoch.pdf

Duncan Murdoch



Re: [R] changing outlier shapes of boxplots using lattice

2012-09-28 Thread Richard M. Heiberger
Elaine,

For panel.bwplot you see that the central dot and the outlier dots are
controlled by the same pch argument.  I initially set the pch="|" to match
your first example with the horizontal indicator for the median.  I would be
inclined to use the default circle for the outliers and therefore also for
the median.
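
A sketch of one alternative, assuming lattice's stock panel.bwplot is used
instead of panel.bwplot.intermediate.hh from the original code. To my
understanding the stock panel takes the outlier symbol from the plot.symbol
settings, so the median mark and the outliers can differ; treat that as an
assumption to check against ?panel.bwplot. The data are random, as in the
original post.

```r
library(lattice)

set.seed(1)
dataN <- data.frame(GE_distance = rnorm(260),
                    Diet_B = factor(rep(1:13, each = 20)))

p <- bwplot(GE_distance ~ Diet_B, data = dataN,
            pch = "|",                                  # median drawn as a line
            par.settings = list(
              plot.symbol = list(pch = 1),              # outliers as open circles
              box.umbrella = list(lty = 1)))            # solid whiskers
print(p)
```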

Rich

On Fri, Sep 28, 2012 at 7:13 AM, Sarah Goslee wrote:

> I would guess that if you find the bit that says pch="|" and change it to
> pch=1 it will solve your question, and that reading ?par will tell you why.
>
> Sarah
>
> On Thursday, September 27, 2012, Elaine Kuo wrote:
>
> > Hello
> >
> > This is Elaine.
> >
> > I am using package lattice to generate boxplots.
> > Using Richard's code, the display was almost perfect except the outlier
> > shape.
> > Based on the following code, the outliers are vertical lines.
> > However, I want the outliers to be empty circles.
> > Please kindly help how to modify the code to change the outlier shapes.
> > Thank you.
> >
> > code
> > package (lattice)
> >
> > dataN <- data.frame(GE_distance=rnorm(260),
> >
> > Diet_B=factor(rep(1:13, each=20)))
> >
> > Diet.colors <- c("forestgreen", "darkgreen","chocolate1","darkorange2",
> >
> >  "sienna2","red2","firebrick3","saddlebrown","coral4",
> >
> >  "chocolate4","darkblue","navy","grey38")
> >
> > levels(dataN$Diet_B) <- Diet.colors
> >
> > bwplot(GE_distance ~ Diet_B, data=dataN,
> >
> >xlab=list("Diet of Breeding Ground", cex = 1.4),
> >
> >ylab = list(
> >
> >  "Distance between Centers of B and NB Range (1000 km)",
> >
> >  cex = 1.4),
> >
> >panel=panel.bwplot.intermediate.hh,
> >
> >col=Diet.colors,
> >
> >pch=rep("|",13),
> >
> >scales=list(x=list(rot=90)),
> >
> >par.settings=list(box.umbrella=list(lty=1)))
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org  mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Sarah Goslee
> http://www.stringpage.com
> http://www.sarahgoslee.com
> http://www.functionaldiversity.org
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Anova and tukey-grouping

2012-09-28 Thread arun
HI,

I guess there is a mistake in your code.  You should have used "typ" instead of 
"abun" as "abun" is the dependent variable.
summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
myresults    <-  TukeyHSD(fm1, "tension", ordered = TRUE)
library(agricolae)

HSD.test(fm1,"wool",group=TRUE)
#Study:
#HSD Test for breaks 
#Mean Square Error:  134.9578 
#wool,  means
#    breaks  std.err replication
#A 31.03704 3.050609  27
#B 25.25926 1.789963  27
#alpha: 0.05 ; Df Error: 50 
#Critical Value of Studentized Range: 2.840532 
#Honestly Significant Difference: 6.350628 
#Means with the same letter are not significantly different.
#Groups, Treatments and means
#a      A      31.037037037037 
#a      B      25.2592592592593 
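
A hedged sketch of the point about the factor name, again with warpbreaks.
When the model is fitted with a data argument, the term is simply called
"tension" (or "typ" in the original data), whereas a formula written as
ds.typabunmit$abun ~ ds.typabunmit$typ would presumably name the term
"ds.typabunmit$typ". The agricolae part only runs if the package is installed,
and the $groups component is what recent versions return as far as I know.

```r
## Fit with a data argument so the factor is simply called "tension"
fm <- aov(breaks ~ tension, data = warpbreaks)
names(fm$model)

## Base-R Tukey comparisons for that factor
tk <- TukeyHSD(fm, "tension")
tk$tension

## Optional: letter grouping via agricolae, if it is installed
if (requireNamespace("agricolae", quietly = TRUE)) {
  out <- agricolae::HSD.test(fm, "tension", group = TRUE)
  print(out$groups)
}
```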

 

A.K.





Re: [R] Crosstable-like analysis (ks test) of dataframe

2012-09-28 Thread Johannes Radinger
Thank you Rui!

that works as I want it... :)

/Johannes

On Fri, Sep 28, 2012 at 12:30 PM, Rui Barradas  wrote:
> Hello,
>
> Try the following.
>
>
> f <- function(x, y, ...,
> alternative = c("two.sided", "less", "greater"), exact = NULL){
> #w <- getOption("warn")
> #options(warn = -1)  # ignore warnings
> p <- ks.test(x, y, ..., alternative = alternative, exact =
> exact)$p.value
> #options(warn = w)
> p
> }
>
> n <- 1e1
> dat <- data.frame(X=rnorm(n), Y=runif(n), Z=rchisq(n, df=3))
>
> apply(dat, 2, function(x) apply(dat, 2, function(y) f(x, y)))
>
> Hope this helps,
>
> Rui Barradas
> On 28-09-2012 11:10, Johannes Radinger wrote:
>>
>> Hi,
>>
>> I have a dataframe with multiple (appr. 20) columns containing
>> vectors of different values (different distributions).
>>   Now I'd like to create a crosstable
>> where I compare the distribution of each vector (df-column) with
>> each other. For the comparison I want to use the ks.test().
>> The result should contain as row and column names the column names
>> of the input dataframe and the cells should be populated with
>> the p-value of the ks.test for each pairwise analysis.
>>
>> My data.frame looks like:
>> df <- data.frame(X=rnorm(1000,2),Y=rnorm(1000,1),Z=rnorm(1000,2))
>>
>> And the test for one single case is:
>> ks <- ks.test(df$X,df$Z)
>>
>> where the p value is:
>> ks[2]
>>
>> How can I create an automatized way of this pairwise analysis?
>> Any suggestions? I guess that is a quite common analysis (probably with
>> other tests).
>>
>> cheers,
>> Johannes
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



[R] Lattice bwplot(): Conditioning on one factor

2012-09-28 Thread Rich Shepard

  I'm not able to create the proper syntax to specify a lattice bwplot() for
only one of two conditioning factors.

  The syntax that produces a box plot of each of the two conditioning
factors is:

bwplot(quant ~ param | era, data=mg.d, main='Dissolved Magnesium', 
ylab='Concentration (mg/L)')

  What I've tried unsuccessfully are:

bwplot(quant ~ param | factor(era=='Pre-mining'), data=mg.d,
main='Magnesium', ylab='Concentration (mg/L))

bwplot(quant ~ param | era, data=mg.d, main='Magnesium', ylab='Concentration
(mg/L)', subset=era('Pre-mining'))

plus slight variations of the above. None work.

  Please point me to what I've missed in specifying only one of two
conditioning factors for the plot.
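
One possible way to do this, sketched with made-up data: the subset argument
of bwplot() takes a logical expression evaluated in data, so the level can be
selected with era == 'Pre-mining' and the conditioning term dropped from the
formula. The data frame below is a hypothetical stand-in for mg.d.

```r
library(lattice)

## made-up stand-in for mg.d
set.seed(1)
mg.d <- data.frame(quant = rlnorm(40),
                   param = factor("Mg"),
                   era = factor(rep(c("Pre-mining", "Mining"), each = 20)))

## only the rows with era == "Pre-mining" are plotted
p <- bwplot(quant ~ param, data = mg.d,
            subset = era == "Pre-mining",
            main = "Dissolved Magnesium", ylab = "Concentration (mg/L)")
print(p)
```

Keeping "| era" in the formula together with the subset would also work, but
the unused level may then need to be dropped to avoid an empty panel.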

Rich



Re: [R] Simple Question

2012-09-28 Thread Bhupendrasinh Thakre
Many thanks Dr. Winsemius , Kimmo and Pascal
All of them are working and really beautiful...

Best Regards,


Bhupendrasinh Thakre




On Fri, Sep 28, 2012 at 1:36 AM, David Winsemius wrote:

>
> On Sep 27, 2012, at 11:13 PM, Bhupendrasinh Thakre wrote:
>
> >
> > Hi Everyone,
> >
> > I am trying a very simple task to append the Timestamp with a variable
> name so something like
> > a_2012_09_27_00_12_30 <- rnorm(1,2,1).
>
> If you want to assign a value to a character-name you need to use ...
> `assign`. You cannot just stick a numeric value, which is what you get with
> Sys.time(), on the LHS of a "<-" and expect R to intuit what you intend.
>
> ?assign
> assign( "a_2012_09_27_00_12_30" ,  rnorm(1,2,1) )
> assign( as.character(unclass(Sys.time())) ,  rnorm(1,2,1) )
>
> (I would have thought you wanted to format that Sys.time() result:)
>
> > format(Sys.time(), "%Y_%m_%d_%H_%M_%S")
> [1] "2012_09_27_23_32_40"
>
> >  assign(format(Sys.time(), "%Y_%m_%d_%H_%M_%S"),  rnorm(1,2,1) )
> > grep("^2012", ls(), value=TRUE)
> [1] "2012_09_27_23_33_45"
>
>
> >
> > Tried some commands but it doesn't work out well. Hope someone has some
> answer on it.
> >
> > Session Info
> >
> > R version 2.15.1 (2012-06-22)
> > Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] stats graphics  grDevices utils datasets  methods   base
> >
> > other attached packages:
> > [1] chron_2.3-42    twitteR_0.99.19 rjson_0.2.9     RCurl_1.91-1
> >     bitops_1.0-4.1  tm_0.5-7.1      RMySQL_0.9-3    DBI_0.2-5
> >
> > loaded via a namespace (and not attached):
> > [1] slam_0.1-24  tools_2.15.1
> >
> > Statement I tried :
> >
> > b <- unclass(Sys.time())
> > b = 1348812597
> > c_b <- rnorm(1,2,1)
> >
> > Works perfect but doesn't show me c_1348812597.
> >
> > Best Regards,
> >
> >
> > Bhupendrasinh Thakre
> >   [[alternative HTML version deleted]]
>
> BT; Please learn to post in plain text. It's really very simple with gmail.
>
> --
>
> David Winsemius, MD
> Alameda, CA, USA
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread jens . oehlschlaegel

   Jonathan,
   ff has a utility function file.resize() which lets you set a new file size
   in bytes, given as a double.
   See ?file.resize
   Regards
   Jens Oehlschlägel
   Sent: Thursday, 27 September 2012 at 21:17
   From: "Jonathan Greenberg" 
   To: r-help , r-sig-...@r-project.org
   Subject: Re: [R-sig-hpc] Quickest way to make a large "empty" file on disk?
   Folks:
   Asked this question some time ago, and found what appeared (at first) to be
   the best solution, but I'm now finding a new problem. First off, it seemed
   like ff as Jens suggested worked:
   # outdata_ncells = the number of rows * number of columns * number of bands
   in an image:
   out<-ff(vmode="double",length=outdata_ncells,filename=filename)
   finalizer(out) <- close
   close(out)
   This was working fine until I attempted to set length to a VERY large
   number: outdata_ncells = 17711913600. This would create a file that is
   131.964GB. Big, but not obscenely so (and certainly not larger than the
   filesystem can handle). However, length appears to be restricted
   by .Machine$integer.max (I'm on a 64-bit windows box):
   > .Machine$integer.max
   [1] 2147483647
   Any suggestions on how to solve this problem for much larger file sizes?
   --j
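
The numbers in the message above can be checked with plain arithmetic, and a zero-filled file can be written in chunks with base R alone. The sketch below is not the ff approach discussed in this thread, and it writes real zero bytes rather than creating a sparse file. `write_zeros` and `chunk_bytes` are hypothetical names, and the demo writes only about 1 MiB to a temporary file.

```r
# Cell count from the message, held as a double since it exceeds integer range.
n_cells <- 17711913600
n_cells > .Machine$integer.max      # TRUE: too large for an integer length
bytes <- n_cells * 8                # 8 bytes per double
bytes / 2^30                        # size in GiB, about 131.96

# Hypothetical helper: write `total_bytes` zero bytes to `path` in chunks,
# so the full object is never held in memory.
write_zeros <- function(path, total_bytes, chunk_bytes = 2^20) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  remaining <- total_bytes
  while (remaining > 0) {
    n <- min(remaining, chunk_bytes)
    writeBin(raw(n), con)           # raw(n) is a vector of n zero bytes
    remaining <- remaining - n
  }
  invisible(path)
}

# Small demonstration on a temporary file.
tmp <- tempfile(fileext = ".bin")
write_zeros(tmp, total_bytes = 2^20 + 123, chunk_bytes = 2^18)
file.size(tmp)
```

Because `total_bytes` is a double, the helper is not limited by .Machine$integer.max, although writing every byte will be far slower than reserving space sparsely.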
   On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg wrote:
   > Thanks, all! I'll try these out. I'm trying to work up something that is
   > platform independent (if possible) for use with mmap. I'll do some tests
   > on these suggestions and see which works best. I'll try to report back in
   > a few days. Cheers!
   >
   > --j
   >
   >
   >
   > 2012/5/3 "Jens Oehlschlägel" 
   >
   >> Jonathan,
   >>
   >> On some filesystems (e.g. NTFS, see below) it is possible to create
   >> 'sparse' memory-mapped files, i.e. reserving the space without the cost
   >> of actually writing initial values.
   >> Package 'ff' does this automatically and also allows accessing the file
   >> in parallel. Check the example below and see how big file creation is
   >> immediate.
   >>
   >> Jens Oehlschlägel
   >>
   >>
   >> > library(ff)
   >> > library(snowfall)
   >> > ncpus <- 2
   >> > n <- 1e8
   >> > system.time(
   >> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
   >> + )
   >> user system elapsed
   >> 0.01 0.00 0.02
   >> > # check finalizer, with an explicit filename we should have a 'close'
   >> finalizer
   >> > finalizer(x)
   >> [1] "close"
   >> > # if not, set it to 'close' in order to not let slaves delete x on slave
   >> shutdown
   >> > finalizer(x) <- "close"
   >> > sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
   >> R Version: R version 2.15.0 (2012-03-30)
   >>
   >> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
   >> CPUs.
   >>
   >> > sfLibrary(ff)
   >> Library ff loaded.
   >> Library ff loaded in cluster.
   >>
   >> Warning message:
   >> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
   >> = TRUE, :
   >> 'keep.source' is deprecated and will be ignored
   >> > sfExport("x") # note: do not export the same ff multiple times
   >> > # explicitly opening avoids a gc problem
   >> > sfClusterEval(open(x, caching="mmeachflush")) # opening with
   >> 'mmeachflush' instead of 'mmnoflush' is a bit slower but prevents OS
   >> write storms when the file is larger than RAM
   >> [[1]]
   >> [1] TRUE
   >>
   >> [[2]]
   >> [1] TRUE
   >>
   >> > system.time(
   >> + sfLapply( chunk(x, length=ncpus), function(i){
   >> + x[i] <- runif(sum(i))
   >> + invisible()
   >> + })
   >> + )
   >> user system elapsed
   >> 0.00 0.00 30.78
   >> > system.time(
   >> + s <- sfLapply( chunk(x, length=ncpus), function(i) quantile(x[i],
   >> c(0.05, 0.95)) )
   >> + )
   >> user system elapsed
   >> 0.00 0.00 4.38
   >> > # for completeness
   >> > sfClusterEval(close(x))
   >> [[1]]
   >> [1] TRUE
   >>
   >> [[2]]
   >> [1] TRUE
   >>
   >> > csummary(s)
   >> 5% 95%
   >> Min. 0.04998 0.95
   >> 1st Qu. 0.04999 0.95
   >> Median 0.05001 0.95
   >> Mean 0.05001 0.95
   >> 3rd Qu. 0.05002 0.95
   >> Max. 0.05003 0.95
   >> > # stop slaves
   >> > sfStop()
   >>
   >> Stopping cluster
   >>
   >> > # with the close finalizer we are responsible for deleting the file
   >> explicitly (unless we want to keep it)
   >> > delete(x)
   >> [1] TRUE
   >> > # remove r-side metadata
   >> > rm(x)
   >> > # truly free memory
   >> > gc()
   >>
   >>
   >>
   >> *Sent:* Thursday, 03 May 2012 at 00:23
   >> *From:* "Jonathan Greenberg" 
   >> *To:* r-help , r-sig-...@r-project.org
   >> *Subject:* [R-sig-hpc] Quickest way to make a large "empty" file on
   >> disk?
   >> R-helpers:
   >>
   >> What would be the absolute fastest way to make a large "empty" file (e.g.
   >> filled with all zeroes) on disk, given a byte size and a given number
   >> number of empty values. I know I can use writeBin, but the "object" in
   >>  this case may be far too large to store in main

Re: [R] Anova and tukey-grouping

2012-09-28 Thread Landi
Hello !

Thanks for your advice. I tried it, but the output is the same:
> HSD.test(anova.typabunmit, "typ", group=TRUE)
Name:  typ 
 ds.typabunmit$typ 

I don't get the values...!?!?
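
Without a reproducible example it is hard to say why HSD.test() prints only the header here. As a cross-check that does not depend on agricolae, base R's TukeyHSD() can be run on an aov fit. The sketch below uses the built-in PlantGrowth data as a stand-in for ds.typabunmit. It is an alternative technique, not the HSD.test() call from this thread.

```r
# Stand-in data: PlantGrowth ships with R (plant weight by treatment group).
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey honest significant differences for all pairs of group levels.
tuk <- TukeyHSD(fit, "group")
print(tuk)

# The pairwise table is a matrix with columns diff, lwr, upr and "p adj".
tuk$group
```

It may also be worth assigning the HSD.test() result to a variable and inspecting it, and checking ?HSD.test in the installed agricolae version for an argument that controls printing. That is an assumption about the package, not something confirmed in this thread.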



--
View this message in context: 
http://r.789695.n4.nabble.com/Anova-and-tukey-grouping-tp4644485p4644513.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] List of Variables in Original Order

2012-09-28 Thread rkulp
AK: Thanks, that was very helpful. It led me to think of  the function 
names(base) which provided the vector of names in the correct order. I 
then used the same matrix formatting and everything worked out exactly 
as planned.
Dick
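
The difference between ls() and names() described above can be seen on a small example. This is a sketch only: the data frame and its column names are invented, and the two-row layout is one possible reading of the table that was wanted.

```r
# Hypothetical data frame whose column order is deliberately not alphabetical.
mydata.df <- data.frame(zeta  = c(1, 2, 3, 4, 5, 6),
                        alpha = c(2, 1, 4, 3, 6, 5),
                        mid   = c(6, 5, 4, 3, 2, 1),
                        beta  = c(1, 3, 2, 5, 4, 6))

ls(mydata.df)      # sorted alphabetically
names(mydata.df)   # original column order

# Correlate the first column with every other column, keeping column order.
others <- names(mydata.df)[-1]
cors <- sapply(mydata.df[others], function(v) cor(mydata.df[[1]], v))

# Variable names on the first row, formatted correlations on the second.
tab <- rbind(others, format(cors, digits = 4))
print(tab, quote = FALSE)
```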
On 9/28/2012 1:09 AM, arun kirshna [via R] wrote:
>
>
> HI,
> May be this helps you:
> set.seed(1)
>  mat1<-matrix(rnorm(60,5),nrow=5,ncol=12)
> colnames(mat1)<-paste0("Var",1:12)
> vec2<-format(c(1,cor(mat1[,1],mat1[,2:12])),digits=4)
> vec3<-colnames(mat1)
> arr2<-array(rbind(vec3,vec2),dim=c(2,3,4))
> res<-data.frame(do.call(rbind,lapply(1:dim(arr2)[3],function(i) 
> arr2[,,i])))
>  res
> #        X1       X2       X3
> #1     Var1     Var2     Var3
> #2      1.0  0.27890 -0.61497
> #3     Var4     Var5     Var6
> #4  0.24916 -0.76155  0.30853
> #5     Var7     Var8     Var9
> #6 -0.46413  0.79287  0.05191
> #7    Var10    Var11    Var12
> #8 -0.06940 -0.53251  0.06766
>
> A.K.
>
>
> - Original Message -
> From: rkulp <[hidden email] 
> >
> To: [hidden email] 
> Cc:
> Sent: Thursday, September 27, 2012 6:26 PM
> Subject: [R] List of Variables in Original Order
>
> I am trying to Sweave the output of calculating correlations between one
> variable and several others. I wanted to print a table where the
> odd-numbered rows contain the variable names and the even-numbered rows
> contain the correlations. So if VarA is correlated with all the 
> variables in
> mydata.df, then it would look like
>
> var1   var2   var3
> corr1  corr2  corr3
> var4   var5   var6
> corr4  corr5  corr6
> .
> .
> etc.
> I tried using a matrix for the correlations and another one for the 
> variable
> names. I built the correlation matrix using
> x = matrix(format(cor(mydata.df[,1],mydata.df[,c(2:79)]),digits=4),nc=3)
> and the variable names matrix using
> y = matrix(ls(mydata.df[c(2:79)]),nc=3).
> The problem is that the function ls returns the names in alphabetical
> order, not columnar order.
> How do I get the names in columnar order? Is there a better way to 
> display
> the correlation of a single variable with a large number of other 
> variables?
> If there is, how do I do it? I appreciate any help I can get. This is my
> first project in R so I don't know much about it yet.
>
>
>

--
View this message in context: 
http://r.789695.n4.nabble.com/List-of-Variables-in-Original-Order-tp4644436p4644516.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Running different Regressions using for loops

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 4:35 AM, Krunal Nanavati wrote:

> Ok...I am sorry for the misunderstanding
> 
> what I am trying to do is

Perhaps (and that is a really large 'perhaps'):

>>> lm.list2 <- list()
lm.means <- list()
>>> for(i in seq_along(pricemedia)){
>>>   regr <- paste(pricemedia[i], trendseason, sep = "+")
>>>   fmla <- paste(response, regr, sep = "~")
>>>   lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2)
   lm.means[[i]]  <- mean(lm.list2[[i]]$coefficients[c("Price1", "Media1")])
}
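
For readers without the original data, here is a self-contained sketch of the loop. The data frame is simulated, and the column names, the four `pricemedia` combinations and the response `Sales` are assumptions chosen to mirror the thread. It also shows the `[` versus `[[` difference behind the earlier NULL result, and it takes the mean of the data column named by a coefficient rather than of the name itself.

```r
# Simulated stand-in for tryout2 (all values are made up).
set.seed(1)
n <- 40
tryout2 <- data.frame(Price1 = runif(n, 5, 10), Price2 = runif(n, 5, 10),
                      Media1 = runif(n, 0, 100), Media2 = runif(n, 0, 100),
                      Trend = seq_len(n),
                      Seasonality = rep(1:4, length.out = n))
tryout2$Sales <- 100 - 3 * tryout2$Price1 - 2 * tryout2$Price2 +
  0.5 * tryout2$Media1 + 0.3 * tryout2$Media2 + rnorm(n)

pricemedia  <- c("Price1+Media1", "Price2+Media1",
                 "Price1+Media2", "Price2+Media2")
trendseason <- "Trend+Seasonality"
response    <- "Sales"

lm.list2 <- list()
price.means <- numeric(length(pricemedia))
for (i in seq_along(pricemedia)) {
  regr <- paste(pricemedia[i], trendseason, sep = "+")
  fmla <- paste(response, regr, sep = "~")
  lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2)
  # The second coefficient is the price term; use its name to pick the column.
  price.var <- names(coef(lm.list2[[i]]))[2]
  price.means[i] <- mean(tryout2[[price.var]])
}

summ.list <- lapply(lm.list2, summary)
summ.list[2]$coefficients      # NULL: `[` returns a list of length one
summ.list[[2]]$coefficients    # coefficient table of the second model
names(coef(lm.list2[[2]]))[2]  # "Price2"
```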



> 
> When I run...this set of statements...the 1st regression to be run, will
> have Price 1, Media 1...as X variables...and in the second loop it will
> have Price 1 & Media 2 
> 
> So, what I was thinking is...if I can generate inside the for loop...the
> mean for Price 1 and Media 1 during the 1st loop...and then mean for
> Price 1 and Media 2 during the second loop...and so on...for all the 10
> regressions
> 
> 
> Is the method that I was trying appropriate...or is there a better method
> there...I am sorry for the earlier explanation, I hope this one makes it
> more understandable

One generally wants one's methods to be determinate while allowing the results to 
be approximate.

Had you followed the posting guide and offered a reproducible example it would 
have been much more "understandable".


> 
> 
> Thanks for your time...and all the quick replies
> 
> 
> -Original Message-
> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
> Sent: 28 September 2012 16:49
> To: Krunal Nanavati
> Cc: David Winsemius; r-help@r-project.org
> Subject: Re: [R] Running different Regressions using for loops
> 
> Ok, if I'm understanding it well, you want the mean value of Price1,   ,
> Price5? I don't know if it makes any sense, the coefficients already are
> mean values, but see if this is it.
> 
> price.coef <- sapply(lm.list, function(x) coef(x)[2])
> mean(price.coef)
> 
> Rui Barradas
> On 28-09-2012 12:07, Krunal Nanavati wrote:
>> Hi,
>> 
>> Yes the thing that you provided...works fine...but probably I should
>> have asked for some other thing.
>> 
>> Here is what I am trying to do
>> 
>> I am trying to get the mean of Price variable...so I am entering the
>> below function:
>> 
>>  mean(names(lm.list2[[2]]$coefficient[2] ))
>> 
>> but this gives me an error
>> 
>>  [1] NA
>>  Warning message:
>>  In mean.default(names(lm.list2[[2]]$coefficient[2])) :
>>  argument is not numeric or logical: returning NA
>> 
>> I thought by getting the text from the list variable...will help me
>> generate the mean for that text...which is a variable in the
>> data...say Price 1, Media 2...and so on
>> 
>> Is this a proper approach...if it is...then something more needs to be
>> done with the function that you provided.
>> 
>> If not, is there a better way...to generate the mean of a particular
>> variable inside the " for loop " used earlier...given below:
>> 
>>> lm.list2 <- list()
>>> for(i in seq_along(pricemedia)){
>>>   regr <- paste(pricemedia[i], trendseason, sep = "+")
>>>   fmla <- paste(response, regr, sep = "~")
>>>   lm.list2[[i]] <- lm(as.formula(fmla), data = tryout2) }
>> 
>> 
>> 
>> Thanks & Regards,
>> 
>> Krunal Nanavati
>> 9769-919198
>> 
>> 
>> -Original Message-
>> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
>> Sent: 28 September 2012 16:02
>> To: Krunal Nanavati
>> Cc: David Winsemius; r-help@r-project.org
>> Subject: Re: [R] Running different Regressions using for loops
>> 
>> Hello,
>> 
>> Try
>> 
>> names(lm.list2[[2]]$coefficient[2] )
>> 
>> Rui Barradas
>> On 28-09-2012 11:29, Krunal Nanavati wrote:
>>> Ok...this solves a part of my problem
>>> 
>>> When I type   " lm.list2[2] " ...I get the following output
>>> 
>>> [[1]]
>>> 
>>> Call:
>>> lm(formula = as.formula(fmla), data = tryout2)
>>> 
>>> Coefficients:
>>> (Intercept)   Price2   Media1  Distri1    Trend
>>> Seasonality
>>> 13491232 -5759030-15203437048628
>>> 445351
>>> 
>>> 
>>> 
>>> 
>>> When I enter   " lm.list2[[2]]$coefficient[2] " it gives me the below
>>> output
>>> 
>>> Price2
>>> -5759030
>>> 
>>> And when I enter   " lm.list2[[2]]$coefficient[[2]] " ...I get the
>>> number...which is   -5759030
>>> 
>>> 
>>> I am looking out for a way to get just the  " Price2 "...is there a
>>> statement for that??
>>> 
>>> 
>>> 
>>> Thanks & Regards,
>>> 
>>> Krunal Nanavati
>>> 9769-919198
>>> 
>>> 
>>> -Original Message-
>>> From: Rui Barradas [mailto:ruipbarra...@sapo.pt]
>>> Sent: 28 September 2012 15:18
>>> To: Krunal Nanavati
>>> Cc: David Winsemius; r-help@r-project.org
>>> Subject: Re: [R] Running different Regressions using for loops
>>> 
>>> Hello,
>>> 
>>> To access list elements you need `[[`, like this:
>>> 
>>> summ.list[[2]]$coefficients
>>> 
>>> Or Use the extractor function,
>>> 
>>> coef(summ.list[[2]])
>>> 
>>> Rui Barradas
>>> Em 28-09-2012 07:23, Krunal Nanava

[R] max & summary contradict each other

2012-09-28 Thread Sam Steingold
why does summary report max 27600 and not 27603?

> x <- c(27603, 1)
> max(x)
[1] 27603
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    6902   13800   13800   20700   27600 

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://pmw.org.il
http://dhimmi.com http://iris.org.il http://mideasttruth.com
Vegetarians eat Vegetables, Humanitarians are scary.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] max & summary contradict each other

2012-09-28 Thread Duncan Murdoch

On 28/09/2012 12:14 PM, Sam Steingold wrote:

why does summary report max 27600 and not 27603?

> x <- c(27603, 1)
> max(x)
[1] 27603
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1    6902   13800   13800   20700   27600



Because you asked for 3 digit accuracy.  See ?summary.

Duncan Murdoch
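
The rounding can be reproduced with signif(). This sketch assumes the default options(digits = 7), under which summary() in the R 2.15 series rounded its results to max(3, getOption("digits") - 3) = 4 significant digits. Newer R versions may apply the rounding only when printing, so the printed summary can look different there.

```r
x <- c(27603, 1)
max(x)                     # 27603, the exact maximum

# Default number of significant digits used by summary() at the time,
# assuming getOption("digits") is at its default of 7.
d <- max(3, 7 - 3)

signif(max(x), d)          # 27600: the value shown in the summary
signif(max(x), 5)          # 27603: one more digit recovers the exact maximum
```

Passing a larger `digits` to summary(), as described in ?summary, has the same effect on the displayed values.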

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lattice bwplot(): Conditioning on one factor

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 7:49 AM, Rich Shepard wrote:

>  I'm not able to create the proper syntax to specify a lattice bwplot() for
> only one of two conditioning factors.

Wouldn't that involve specifying the 'subset' parameter (if bwplot accepts a 
subset argument) or using the 'subset' function to pass the desired rows to the 
data argument if it doesn't?

> 
>  The syntax that produces a box plot of each of the two conditioning
> factors is:
> 
> bwplot(quant ~ param | era, data=mg.d, main='Dissolved Magnesium', 
> ylab='Concentration (mg/L)')
> 
>  What I've tried unsuccessfully are:
> 
> bwplot(quant ~ param | factor(era=='Pre-mining'), data=mg.d,
> main='Magnesium', ylab='Concentration (mg/L))
> 
> bwplot(quant ~ param | era, data=mg.d, main='Magnesium', ylab='Concentration
> (mg/L)', subset=era('Pre-mining'))
> 
> plus slight variations of the above. None work.
> 
>  Please point me to what I've missed in specifying only one of two
> conditioning factors for the plot.
> 
> 
-- 

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple Question

2012-09-28 Thread Bhupendrasinh Thakre
Hi Everyone,

Sorry for coming back again with a new problem.
Editing question, session info and data so you don't have to scroll till
the end of page.

*Situation :*

I have a data frame named df. Now I want to append a timestamp to
the end of the *"name" of the "data frame", i.e. "df_system_time"*. The earlier
suggestions from Dr. Winsemius, Kimmo and Pascal worked well, I believe
because the function I used then returned a scalar.

*Data :*

dput(df)
structure(list(x = 1:10, y = 1:10), .Names = c("x", "y"),
row.names = c(NA, -10L), class = "data.frame")

*Session Info :*

R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices datasets  utils methods
[7] base

other attached packages:
[1] rcom_2.2-5 rscproxy_2.0-5

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2
 [4] ggplot2_0.9.2.1    grid_2.15.1        gtable_0.1.1
 [7] labeling_0.1       MASS_7.3-18        memoise_0.1
[10] munsell_0.3        plyr_1.7.1         proto_0.3-9.2
[13] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.2
[16] stringr_0.6.1      tools_2.15.1


It's kind of very easy in SQL but I love doing all the work in R so don't
want to leave for just changing the name.
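
The assign() approach from earlier in the thread should carry over to a data frame unchanged, since assign() binds any R object to a name. The following is a minimal sketch under that assumption. The name `new.name` and the list `snapshots` are introduced here for illustration only.

```r
# The data frame from the message, rebuilt so the example is self-contained.
df <- data.frame(x = 1:10, y = 1:10)

# Build a syntactically valid name: "df_" followed by the formatted time.
new.name <- paste0("df_", format(Sys.time(), "%Y_%m_%d_%H_%M_%S"))

# Bind the data frame to the new name in the current environment.
assign(new.name, df)

# Retrieve it again by name.
identical(get(new.name), df)

# Often simpler: keep timestamped copies as elements of a named list.
snapshots <- list()
snapshots[[new.name]] <- df
names(snapshots)
```

The list variant avoids get() and assign() altogether, which tends to be easier to loop over later.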

Best Regards,

Bhupendrasinh Thakre









On Fri, Sep 28, 2012 at 10:13 AM, Bhupendrasinh Thakre <
vickytha...@gmail.com> wrote:

> Many thanks Dr. Winsemius , Kimmo and Pascal
> All of them are working and really beautiful...
>
> Best Regards,
>
>
> Bhupendrasinh Thakre
>
> *Disclaimer :*
>
> The information contained in this communication is confidential and may be
> legally privileged. It is intended solely for the use of the individual or
> entity to whom it is adressed. If you are not the intended recipient you
> are hereby (a) notified that any disclosure, copying, distribution or
> taking any action with respect to the content of this information is
> strictly prohibited and may be unlawful, and (b) kindly requested to inform
> the sender immediately and destroy any copies.
>
>
>
> On Fri, Sep 28, 2012 at 1:36 AM, David Winsemius 
> wrote:
>
>>
>> On Sep 27, 2012, at 11:13 PM, Bhupendrasinh Thakre wrote:
>>
>> >
>> > Hi Everyone,
>> >
>> > I am trying a very simple task to append the Timestamp with a variable
>> name so something like
>> > a_2012_09_27_00_12_30 <- rnorm(1,2,1).
>>
>> If you want to assign a value to a character-name you need to use ...
>> `assign`. You cannot just stick a numeric value, which is what you get with
>> Sys.time(), on the LHS of a "<-" and expect R to intuit what you intend.
>>
>> ?assign
>> assign( "a_2012_09_27_00_12_30" ,  rnorm(1,2,1) )
>> assign( as.character(unclass(Sys.time())) ,  rnorm(1,2,1) )
>>
>> (I would have thought you wanted to format that Sys.time() result:)
>>
>> > format(Sys.time(), "%Y_%m_%d_%H_%M_%S")
>> [1] "2012_09_27_23_32_40"
>>
>> >  assign(format(Sys.time(), "%Y_%m_%d_%H_%M_%S"),  rnorm(1,2,1) )
>> > grep("^2012", ls(), value=TRUE)
>> [1] "2012_09_27_23_33_45"
>>
>>
>> >
>> > Tried some commands but it doesn't work out well. Hope someone has some
>> answer on it.
>> >
>> > Session Info
>> >
>> > R version 2.15.1 (2012-06-22)
>> > Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>> >
>> > locale:
>> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> >
>> > attached base packages:
>> > [1] stats graphics  grDevices utils datasets  methods   base
>> >
>> > other attached packages:
>> > [1] chron_2.3-42    twitteR_0.99.19 rjson_0.2.9     RCurl_1.91-1
>> >     bitops_1.0-4.1  tm_0.5-7.1      RMySQL_0.9-3    DBI_0.2-5
>> >
>> > loaded via a namespace (and not attached):
>> > [1] slam_0.1-24  tools_2.15.1
>> >
>> > Statement I tried :
>> >
>> > b <- unclass(Sys.time())
>> > b = 1348812597
>> > c_b <- rnorm(1,2,1)
>> >
>> > Works perfect but doesn't show me c_1348812597.
>> >
>> > Best Regards,
>> >
>> >
>> > Bhupendrasinh Thakre
>> >   [[alternative HTML version deleted]]
>>
>> BT; Please learn to post in plain text. It's really very simple with
>> gmail.
>>
>> --
>>
>> David Winsemius, MD
>> Alameda, CA, USA
>>
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https

Re: [R] Lattice bwplot(): Conditioning on one factor

2012-09-28 Thread Bert Gunter
A small reproducible example, as requested by the posting guide, would
have been very helpful here (if you provide one, use ?dput to provide
the data). You have also not told us what you mean by "unsuccessful,"
so we are left to guess what sort of problems you experienced.  "None
work" is completely useless to help diagnose the problem. This means
we waste time going back and forth trying to elucidate what you mean.
Please consider these things if/when you post in future.

In any case, my guess is that param is numeric and it should be a
factor, so, e.g.

 bwplot(quant ~ factor(param) | era, data=mg.d,
        main='Dissolved Magnesium', ylab='Concentration (mg/L)')

might be what you want. But of course, it may be completely wrong.

Cheers,
Bert





On Fri, Sep 28, 2012 at 9:25 AM, David Winsemius  wrote:
>
> On Sep 28, 2012, at 7:49 AM, Rich Shepard wrote:
>
>>  I'm not able to create the proper syntax to specify a lattice bwplot() for
>> only one of two conditioning factors.
>
> Wouldn't that involve specifying the 'subset' parameter (if bwplot accepts a 
> subset argument) or using the 'subset' function to pass the desired rows to 
> the data argument if it doesn't?
>
>>
>>  The syntax that produces a box plot of each of the two conditioning
>> factors is:
>>
>> bwplot(quant ~ param | era, data=mg.d, main='Dissolved Magnesium', 
>> ylab='Concentration (mg/L)')
>>
>>  What I've tried unsuccessfully are:
>>
>> bwplot(quant ~ param | factor(era=='Pre-mining'), data=mg.d,
>> main='Magnesium', ylab='Concentration (mg/L))
>>
>> bwplot(quant ~ param | era, data=mg.d, main='Magnesium', ylab='Concentration
>> (mg/L)', subset=era('Pre-mining'))
>>
>> plus slight variations of the above. None work.
>>
>>  Please point me to what I've missed in specifying only one of two
>> conditioning factors for the plot.
>>
>>
> --
>
> David Winsemius, MD
> Alameda, CA, USA
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] max & summary contradict each other

2012-09-28 Thread arun
Hi,
Try this:
summary(x, digits=5)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#     1.0  6901.5 13802.0 13802.0 20702.0 27603.0
A.K.




- Original Message -
From: Sam Steingold 
To: r-help@r-project.org
Cc: 
Sent: Friday, September 28, 2012 12:14 PM
Subject: [R] max & summary contradict each other

why does summary report max 27600 and not 27603?

> x <- c(27603, 1)
> max(x)
[1] 27603
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    6902   13800   13800   20700   27600 

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://pmw.org.il
http://dhimmi.com http://iris.org.il http://mideasttruth.com
Vegetarians eat Vegetables, Humanitarians are scary.
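
A short sketch of why the two results differ. It assumes, as the output in the question suggests, that summary() rounded each statistic to four significant digits by default in the R version used here; newer R releases may print the summary differently, so the rounding is shown directly with signif().

```r
x <- c(27603, 1)

# max() returns the exact value.
max(x)

# Rounding to 4 significant digits reproduces the figures in the question.
signif(max(x), 4)       # 27600
signif(median(x), 4)    # 13800

# Asking summary() for more digits, as suggested above, avoids the rounding.
summary(x, digits = 5)
```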



Re: [R] Lattice bwplot(): Conditioning on one factor

2012-09-28 Thread Rich Shepard

On Fri, 28 Sep 2012, David Winsemius wrote:


> Wouldn't that involve specifying the 'subset' parameter (if bwplot accepts
> a subset argument) or using the 'subset' function to pass the desired rows
> to the data argument if it doesn't?

David,

  That's what I tried:

>> bwplot(quant ~ param | era, data=mg.d, main='Magnesium', ylab='Concentration
>> (mg/L)', subset=era('Pre-mining'))


  Perhaps I didn't write it correctly.

Thanks,

Rich



Re: [R] Simple Question

2012-09-28 Thread Berend Hasselman

On 28-09-2012, at 18:40, Bhupendrasinh Thakre  wrote:

> Hi Everyone,
> 
> Sorry for coming back again with a new problem.
> Editing question, session info and data so you don't have to scroll till
> the end of page.
> 
> *Situation :*
> 
> I have a data frame and its name is df. Now I want to add a time stamp to
> the end of the *"name" of the "data frame", i.e. "df_system_time"*. Previously it
> was running great, thanks to Dr. Winsemius, Kimmo and Pascal, I
> believe because the function which I used was scalar.
> 
> *Data :*
> 
> dput(df)
> structure(list(x = 1:10, y = 1:10), .Names = c("x", "y"),
> row.names = c(NA,
> -10L), class = "data.frame")
> 

You have been given the answer.
It only needs a minor variation:

newname.df <- paste0("df_", format(Sys.time(), "%Y_%m_%d_%H_%M_%S") )
assign(newname.df,df)

and if you wish

rm(list=c('df','newname.df'))

Or install package memisc (found by doing findFn("rename") from package sos) 
and use function rename(); I have not tried this.

Berend
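
For reference, the steps above put together as a minimal sketch. It assumes the data frame from the question; the list `snapshots` at the end is a hypothetical alternative, not something proposed in the thread.

```r
df <- data.frame(x = 1:10, y = 1:10)

# Build the new name, e.g. "df_2012_09_28_19_00_00".
newname.df <- paste0("df_", format(Sys.time(), "%Y_%m_%d_%H_%M_%S"))

# Bind the data frame to that name, then drop the old binding.
assign(newname.df, df)
rm(df)

# The copy is reachable through get().
str(get(newname.df))

# Alternative: keep timestamped copies in one named list, which avoids
# creating a new top-level variable for every snapshot.
snapshots <- list()
snapshots[[newname.df]] <- get(newname.df)
names(snapshots)
```

The list variant tends to be easier to loop over later, since all copies live in a single object.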


Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread Jonathan Greenberg
Rui:

Quick follow-up -- it looks like seek does do what I want (I see Simon
suggested it some time ago) -- what do you mean by "trash your disk"?  What I'm
trying to accomplish is getting parallel, asynchronous writes to a large
binary image (just a binary file) working.  Each node writes to a different
sector of the file via mmap, "filling in" the values as the process runs,
but the file needs to be pre-created before I can mmap it.  Running a
writeBin with a bunch of 0s would mean I'd basically have to write the file
twice, but the seek/ff trick seems to be much faster.

Do I risk doing some damage to my filesystem if I use seek?  I see there is
a strongly worded warning in the help for ?seek:

"Use of seek on Windows is discouraged. We have found so many errors in the
Windows implementation of file positioning that users are advised to use it
only at their own risk, and asked not to waste the *R* developers' time
with bug reports on Windows' deficiencies." --> there's no detail here on
which errors people have experienced, so I'm not sure if doing something as
simple as just "creating" a file using seek falls under the "discouraging"
category.

As a note, we are trying to work this up on both Windows and *nix systems,
hence our wanting to have a single approach that works on both OSs.

--j


On Thu, Sep 27, 2012 at 3:49 PM, Rui Barradas  wrote:

>  Hello,
>
> If you really need to trash your disk, why not use seek()?
>
> > fl <- file("Test.txt", open = "wb")
> > seek(fl, where = 1024, origin = "start", rw = "write")
> [1] 0
> > writeChar(character(1), fl, nchars = 1, useBytes = TRUE)
> Warning message:
> In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) :
>   writeChar: more characters requested than are in the string - will
> zero-pad
> > close(fl)
>
>
> File "Test.txt" is now 1Kb in size.
>
> Hope this helps,
>
> Rui Barradas
> On 27-09-2012 20:17, Jonathan Greenberg wrote:
>
> Folks:
>
> Asked this question some time ago, and found what appeared (at first) to be
> the best solution, but I'm now finding a new problem.  First off, it seemed
> like ff as Jens suggested worked:
>
> # outdata_ncells = the number of rows * number of columns * number of bands
> in an image:
> out<-ff(vmode="double",length=outdata_ncells,filename=filename)
> finalizer(out) <- close
> close(out)
>
> This was working fine until I attempted to set length to a VERY large
> number: outdata_ncells = 17711913600.  This would create a file that is
> 131.964GB.  Big, but not obscenely so (and certainly not larger than the
> filesystem can handle).  However, length appears to be restricted
> by .Machine$integer.max (I'm on a 64-bit windows box):
>
>  .Machine$integer.max
>
>  [1] 2147483647
>
> Any suggestions on how to solve this problem for much larger file sizes?
>
> --j
>
>
> On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg  
> wrote:
>
>
>  Thanks, all!  I'll try these out.  I'm trying to work up something that is
> platform independent (if possible) for use with mmap.  I'll do some tests
> on these suggestions and see which works best. I'll try to report back in a
> few days.  Cheers!
>
> --j
>
>
>
> 2012/5/3 "Jens Oehlschlägel"  
> 
>
>  Jonathan,
>
> On some filesystems (e.g. NTFS, see below) it is possible to create
> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
> actually writing initial values.
> Package 'ff' does this automatically and also allows to access the file
> in parallel. Check the example below and see how big file creation is
> immediate.
>
> Jens Oehlschlägel
>
>
>
> library(ff)
> library(snowfall)
> ncpus <- 2
> n <- 1e8
> system.time(
> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
> + )
>    user  system elapsed
>    0.01    0.00    0.02
>
> # check finalizer, with an explicit filename we should have a 'close'
> # finalizer
> finalizer(x)
> [1] "close"
>
> # if not, set it to 'close' in order to not let slaves delete x on slave
> # shutdown
> finalizer(x) <- "close"
> sfInit(parallel=TRUE, cpus=ncpus, type="SOCK")
> R Version:  R version 2.15.0 (2012-03-30)
>
> snowfall 1.84 initialized (using snow 0.3-9): parallel execution on 2
> CPUs.
>
> sfLibrary(ff)
> Library ff loaded.
> Library ff loaded in cluster.
>
> Warning message:
> In library(package = "ff", character.only = TRUE, pos = 2, warn.conflicts
> = TRUE,  :
>   'keep.source' is deprecated and will be ignored
>
> sfExport("x") # note: do not export the same ff multiple times
> # explicitly opening avoids a gc problem
> sfClusterEval(open(x, caching="mmeachflush")) # opening with
> # 'mmeachflush' instead of 'mmnoflush' is a bit slower but prevents OS
> # write storms when the file is larger than RAM
> [[1]]
> [1] TRUE
>
> [[2]]
> [1] TRUE
>
> system.time(
> + sfLapply( chunk(x, length=ncpus), function(i){
> +   x[i] <- runif(sum(i))
> +   invisible()
> + })
> + )
>    user  system elapsed
>    0.00    0.00   30.78
>
>

Re: [R] Lattice bwplot(): Conditioning on one factor

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 9:56 AM, Rich Shepard wrote:

> On Fri, 28 Sep 2012, David Winsemius wrote:
> 
>> Wouldn't that involve specifying the 'subset' parameter (if bwplot accepts
>> a subset argument) or using the 'subset' function to pass the desired rows
>> to the data argument if it doesn't?
> 
> David,
> 
>  That's what I tried:
> 
>>> bwplot(quant ~ param | era, data=mg.d, main='Magnesium', ylab='Concentration
>>> (mg/L)', subset=era('Pre-mining'))

Sigh. If I were testing that strategy (which I did not try because you were too 
busy to have included a working example)  I would have written it:

bwplot(quant ~ param , data=mg.d, main='Magnesium', ylab='Concentration
(mg/L)', subset= era=='Pre-mining' )

That passes a logical vector which will "work" only if bwplot created a local 
environment where column names of the 'data' argument have been added to the 
local namespace. I do not know if that is true. I just looked at the bwplot help 
page and do not see a subset argument documented there.

The other suggestion which it seems you were also too busy to have tried was:

bwplot(quant ~ param ,  main='Magnesium', ylab='Concentration
(mg/L)', data = subset( mg.d,  era=='Pre-mining' ) )

Wrapping a column name around a factor level with parentheses (which R takes to 
mean there is a function named 'era' to be applied)  and expecting R to 
understand that you want a subset seems doomed to failure.

It makes no sense to me to condition on a factor that you know for certainty 
has only one level in the data being offered.
--

David Winsemius, MD
Alameda, CA, USA



Re: [R] Lattice bwplot(): Conditioning on one factor

2012-09-28 Thread Rich Shepard

On Fri, 28 Sep 2012, David Winsemius wrote:


> bwplot(quant ~ param , data=mg.d, main='Magnesium', ylab='Concentration
> (mg/L)', subset= era=='Pre-mining' )


David, Don:

  Thank you. I tried subset= and era== separately, not together.

  Now I know.

Much appreciated,

Rich



Re: [R] Lattice bwplot(): Conditioning on one factor

2012-09-28 Thread Bert Gunter
Yes. Now I understand what was wanted.

1. the subset argument is certainly documented on the Help page:

subset  

An expression that evaluates to a logical or integer indexing vector.
Like groups, it is evaluated in data. Only the resulting rows of data
are used for the plot. If subscripts is TRUE, the subscripts provided
to the panel function will be indices referring to the rows of data
prior to the subsetting. Whether levels of factors in the data frame
that are unused after the subsetting will be dropped depends on the
drop.unused.levels argument.

Had the OP read this carefully, he would have presumably recognized
the errors in his specification.

2. Here is a small reproducible example to show how it should be done
(probably unnecessary now):

> df <-expand.grid(a = letters[1:3],b=LETTERS[1:2])
> df <- df[rep(1:6,10),]
> df$y <- runif(60)
> bwplot(y~a|b, dat=df,subset = (b=="A"))
## The logical condition is parenthesized only for clarity

Cheers,
Bert
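
The two routes discussed in this thread can be compared side by side. The sketch below uses a made-up data frame `mg` standing in for the poster's mg.d (its columns and values are assumptions), and relies only on the documented `subset` argument quoted above and on base R's subset().

```r
library(lattice)

set.seed(42)
# Hypothetical stand-in for mg.d: one measurement column, two factors.
mg <- data.frame(
  quant = runif(60, 5, 50),
  param = factor(rep("Mg", 60)),
  era   = factor(rep(c("Pre-mining", "Mining", "Post-mining"), each = 20))
)

# Option 1: let bwplot() evaluate `subset` inside `data`.
p1 <- bwplot(quant ~ param, data = mg, subset = era == "Pre-mining",
             main = "Magnesium", ylab = "Concentration (mg/L)")

# Option 2: subset the data frame first, then plot.
p2 <- bwplot(quant ~ param, data = subset(mg, era == "Pre-mining"),
             main = "Magnesium", ylab = "Concentration (mg/L)")

# print(p1) or print(p2) draws the plot on the active graphics device.
```

Both calls plot the same 20 rows; the conditioning term `| era` is dropped because only one level remains after subsetting.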






-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] install.packages on windows

2012-09-28 Thread Uwe Ligges



On 28.09.2012 00:32, Duncan Murdoch wrote:
> On 12-09-27 2:53 PM, Anju R wrote:
>> Sometimes when I try to install certain packages I get a warning message.
>> For example, I tried to install the package "Imtest" on windows R version
>> 2.15.1 and got the following message:
>>
>> Warning message:
>> package ‘Imtest’ is not available (for R version 2.15.1)
>>
>> How can I install the above package? Why do I get the above Warning
>> message?
>
> It probably means exactly what it says, except that the information is
> about the mirror you are using.
>
> I would try another mirror.  If that doesn't solve it, then it probably
> means that the package is really not available for 2.15.1.
>
> You can look on the cran.r-project.org website for information about it,
> and probably download the source from there, but you will probably need
> to fix whatever is wrong with it before it will work.



Or in other words:

There is no such package "Imtest" on CRAN, perhaps you are looking for 
"lmtest"?


Uwe Ligges




Duncan Murdoch



Re: [R] Anova and tukey-grouping

2012-09-28 Thread arun
Hi,

As I mentioned earlier, these are just guess work until you provide a subset of 
your data with dput().  Also, please check the structure of the data with str().

A.K.  






- Original Message -
From: Landi 
To: r-help@r-project.org
Cc: 
Sent: Friday, September 28, 2012 10:35 AM
Subject: Re: [R] Anova and tukey-grouping

Hello !

Thanks for your advice. I tried it, but the output is the same:
> HSD.test(anova.typabunmit, "typ", group=TRUE)
Name:  typ 
ds.typabunmit$typ 

I don't get the values...!?!?



--
View this message in context: 
http://r.789695.n4.nabble.com/Anova-and-tukey-grouping-tp4644485p4644513.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to write R package

2012-09-28 Thread Uwe Ligges



On 28.09.2012 14:22, Duncan Murdoch wrote:
> On 27/09/2012 5:15 PM, Dr. Alireza Zolfaghari wrote:
>> Hi List,
>> Would you please send me a good link to talk me through on how to
>> write a R package?
>
> See the ?package.skeleton help page.  After you have run it, follow the
> instructions in the "Read-and-delete-me" file that it will create.
>
> For full details, see the Writing R Extensions manual.
>
> For modifying the package after you've finished the "Read-and-delete-me"
> instructions, just manually add *.R files where the rest of them are,
> and use the prompt() function to produce skeleton documentation.
>
> That's about it, but you can read more if you like in a tutorial I gave
> a few years ago at a UseR meeting in Dortmund:
>
> http://www.statistik.uni-dortmund.de/useR-2008/slides/Murdoch.pdf

... and there are others who gave talks or tutorials about it (including 
myself).

Nevertheless, I'd recommend to look into the manual "Writing R 
Extensions" which is updated with R and with the changes in the package 
related mechanisms --- while all our talks and tutorials won't get 
updated. Probably Duncan's is still correct, but I want to make this 
remark for the list's archives.


Best,
Uwe Ligges
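
As a starting point for the package.skeleton() route mentioned above, a minimal sketch follows. The function `add_one` and the package name "demoPkg" are made up for illustration, and the skeleton is written to a temporary directory so nothing in the working directory is touched.

```r
# A toy function to package, kept in its own environment.
pkg_env <- new.env()
assign("add_one", function(x) x + 1, envir = pkg_env)

# Create the skeleton under a temporary directory.
out_dir <- tempdir()
package.skeleton(name = "demoPkg", environment = pkg_env,
                 path = out_dir, force = TRUE)

# Inspect what was generated: DESCRIPTION, R/, man/ and Read-and-delete-me.
list.files(file.path(out_dir, "demoPkg"), recursive = TRUE)
```

From there, the "Read-and-delete-me" file and the Writing R Extensions manual describe the remaining steps.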







Duncan Murdoch



Re: [R] Simple Question

2012-09-28 Thread Bhupendrasinh Thakre
Thanks a ton Berend. That worked like a charm..
R comes with thousands of Sweet Surprises everyday


Bhupendrasinh Thakre






Re: [R] Simple Question

2012-09-28 Thread Bert Gunter
On Fri, Sep 28, 2012 at 11:15 AM, Bhupendrasinh Thakre
 wrote:
> Thanks a ton Berend. That worked like a charm..
> R comes with thousands of Sweet Surprises everyday

-- Not for those who read the docs. :-o

-- Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] How to test if there is a subvector in a longer vector

2012-09-28 Thread Atte Tenkanen
Thank you!
___
From: Berend Hasselman [b...@xs4all.nl]
Sent: 28 September 2012 10:47
To: Atte Tenkanen
Cc: R help
Subject: Re: [R] How to test if there is a subvector in a longer vector

On 28-09-2012, at 07:41, Atte Tenkanen  wrote:

> Sorry. I should have mentioned that the order of the components is important.
>
> So c(1,4,6) is accepted as a subvector of c(2,1,1,4,6,3), but not of 
> c(2,1,1,6,4,3).
>
> How to test this?

See this discussion for a variety of solutions.

http://r.789695.n4.nabble.com/matching-a-sequence-in-a-vector-td4389523.html#a4393453

Berend
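
One simple way to run the test described in the question is sketched below. It is not taken from the linked discussion; the helper name `has_subvector` is made up, and the sketch assumes the pattern must appear as a contiguous run in the same order.

```r
# TRUE if `pattern` occurs in `x` as a contiguous, ordered run.
has_subvector <- function(x, pattern) {
  n <- length(pattern)
  if (n == 0L) return(TRUE)
  if (n > length(x)) return(FALSE)
  # Compare the pattern against every window of length n in x.
  starts <- seq_len(length(x) - n + 1L)
  any(vapply(starts,
             function(i) all(x[i:(i + n - 1L)] == pattern),
             logical(1)))
}

has_subvector(c(2, 1, 1, 4, 6, 3), c(1, 4, 6))  # TRUE
has_subvector(c(2, 1, 1, 6, 4, 3), c(1, 4, 6))  # FALSE
```

For long vectors the linked thread may offer faster variants; this version favours readability.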



Re: [R] Arules - predict function issues - subscript out of bounds

2012-09-28 Thread alicechao
Hi Ankur, 

I am running into the exact same issue you have described above. Were you
able to find out why it didn't work on your data set and resolve it? If yes,
could you share? 

Much thanks & regards,
Alice 



--
View this message in context: 
http://r.789695.n4.nabble.com/Arules-predict-function-issues-subscript-out-of-bounds-tp4634422p4644546.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread Rui Barradas

Hello,

I've written a function to try to answer your request, but I've 
run into a problem. See the end of this message.

In the meantime, my replies are inline.
On 28-09-2012 17:44, Jonathan Greenberg wrote:

> Rui:
>
> Quick follow-up -- it looks like seek does do what I want (I see Simon
> suggested it some time ago) -- what do you mean by "trash your disk"?

Nothing special, just that sometimes there are good ways of doing so. 
mmap seems to be safe.

> What I'm
> trying to accomplish is getting parallel, asynchronous writes to a large
> binary image (just a binary file) working.  Each node writes to a different
> sector of the file via mmap, "filling in" the values as the process runs,
> but the file needs to be pre-created before I can mmap it.  Running a
> writeBin with a bunch of 0s would mean I'd basically have to write the file
> twice, but the seek/ff trick seems to be much faster.
>
> Do I risk doing some damage to my filesystem if I use seek?  I see there is
> a strongly worded warning in the help for ?seek:
>
> "Use of seek on Windows is discouraged. We have found so many errors in the
> Windows implementation of file positioning that users are advised to use it
> only at their own risk, and asked not to waste the *R* developers' time
> with bug reports on Windows' deficiencies." --> there's no detail here on
> which errors people have experienced, so I'm not sure if doing something as
> simple as just "creating" a file using seek falls under the "discouraging"
> category.

I'm not a great systems programmer, but 20+ years of using seek on 
Windows have shown nothing of the sort. In fact, I've just found a 
problem with ubuntu 12.04: where seek gives the expected result on 
Windows, it goes up to a certain point on ubuntu and then "stops 
seeking", or whatever is happening. I installed ubuntu very recently, so 
I really don't know the reason for the behavior that you can see in the 
example run below. But I do know that Windows 7 is causing no problem, 
as expected.

> As a note, we are trying to work this up on both Windows and *nix systems,
> hence our wanting to have a single approach that works on both OSs.
>
> --j


#
# Function: creates a file of ascii nulls using seek/writeBin. File size
# can be big.
#
createBig <- function(filename, size){
    if(size == 0) return(0)
    chunk <- .Machine$integer.max
    nchunks <- as.integer(size / chunk)
    rest <- size - as.double(nchunks)*as.double(chunk)
    fl <- file(filename, open = "wb")
    for(i in seq_len(nchunks)){
        seek(fl, where = chunk - 1, origin = "current", rw = "write")
        writeBin(raw(1), fl)
        # -- debug --
        print(seek(fl, where = NA))
    }
    if(rest > 0){
        seek(fl, where = rest - 1, origin = "current", rw = "write")
        writeBin(raw(1), fl)
    }
    close(fl)
}

As you can see from the debug prints, on Windows 7 everything works as 
planned, while on ubuntu 12.04 seek stops seeking when it reaches 17Gb. 
The increments in file size become 1 byte at a time, explained by the 
writeBin instruction. (The different, slightly larger, size is 
irrelevant; the code was run several times, all with the same result: at 
17179869176 bytes it no longer works.)


#
#
# System: Windows 7 / R 2.15.1

size <- 10*.Machine$integer.max + sample(.Machine$integer.max, 1)
size
[1] 22195364413

createBig("Test.txt", size)
[1] 2147483647
[1] 4294967294
[1] 6442450941
[1] 8589934588
[1] 10737418235
[1] 12884901882
[1] 15032385529
[1] 17179869176
[1] 19327352823
[1] 21474836470

file.info("Test.txt")$size
[1] 22195364413
file.info("Test.txt")$size %/% .Machine$integer.max
[1] 10
file.info("Test.txt")$size %% .Machine$integer.max
[1] 720527943

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] fortunes_1.5-0

#
#
# System: ubuntu 12.04 precise pangolin / R 2.15.1
size <- 10*.Machine$integer.max + sample(.Machine$integer.max, 1)
size
[1] 23091487381

createBig("Test.txt", size)
[1] 2147483647
[1] 4294967294
[1] 6442450941
[1] 8589934588
[1] 10737418235
[1] 12884901882
[1] 15032385529
[1] 17179869176
[1] 17179869177
[1] 17179869178

file.info("Test.txt")$size
[1] 17179869179
file.info("Test.txt")$size %/% .Machine$integer.max
[1] 8
file.info("Test.txt")$size %% .Machine$integer.max
[1] 3


sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=pt_PT.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=pt_PT.UTF-8LC_COLLATE=pt_PT.UTF-8
 [5] LC_MONETARY=pt_PT.UTF-8LC_MESSAGES=pt_PT.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C
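
For small sizes, the seek-then-write idea discussed in this thread can be tried safely with a sketch like the one below. The helper name `preallocate` is made up, and the sketch deliberately stays far below the multi-gigabyte range where the ubuntu run above misbehaved, so it says nothing about that problem.

```r
# Hypothetical helper: reserve `size` bytes by seeking to the last byte
# and writing a single zero byte there.
preallocate <- function(path, size) {
  stopifnot(size >= 1)
  con <- file(path, open = "wb")
  on.exit(close(con))
  seek(con, where = size - 1, origin = "start", rw = "write")
  writeBin(raw(1), con)
  invisible(path)
}

path <- tempfile(fileext = ".bin")
preallocate(path, 1024^2)      # ask for 1 MiB
file.info(path)$size           # size as reported by the file system
unlink(path)
```

Whether the file ends up sparse depends on the file system, so it is worth checking the reported size on each target platform before relying on this.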

Re: [R] changing outlier shapes of boxplots using lattice

2012-09-28 Thread ilai
On Fri, Sep 28, 2012 at 6:57 AM, Richard M. Heiberger wrote:

> Elaine,
>
> For panel.bwplot you see that the central dot and the outlier dots are
> controlled by
> the same pch argument.


??? I don't think so...

bwplot(rgamma(20,.1,1)~gl(2,10), pch=rep(17,2),
panel = lattice::panel.bwplot)

I think you mean panel.bwplot.intermediate.hh ?

BTW thank you for the useful HH package but in this case OP is using it
with no "at" argument, so why not

Diet.colors <- c("forestgreen", "darkgreen","chocolate1","darkorange2",
"sienna2","red2","firebrick3","saddlebrown","coral4","chocolate4","darkblue","navy","grey38")
 bwplot(rgamma(20*13,1,.1)~gl(13,20),
  fill = Diet.colors, pch = "|",
  par.settings = list(box.umbrella=list(lty=1)))

cheers



> I initially set the pch="|" to match your first
> example with the horizontal
> indicator for the median.  I would be inclined to use the default circle
> for the outliers and
> therefore also for the median.
>
> Rich
>
> On Fri, Sep 28, 2012 at 7:13 AM, Sarah Goslee wrote:
>
> > I would guess that if you find the bit that says pch="|" and change it to
> > pch=1 it will solve your question, and that reading ?par will tell you
> why.
> >
> > Sarah
> >
> > On Thursday, September 27, 2012, Elaine Kuo wrote:
> >
> > > Hello
> > >
> > > This is Elaine.
> > >
> > > I am using package lattice to generate boxplots.
> > > Using Richard's code, the display was almost perfect except the outlier
> > > shape.
> > > Based on the following code, the outliers are vertical lines.
> > > However, I want the outliers to be empty circles.
> > > Please kindly help how to modify the code to change the outlier shapes.
> > > Thank you.
> > >
> > > code
> > > library(lattice)
> > > library(HH)  # provides panel.bwplot.intermediate.hh
> > >
> > > dataN <- data.frame(GE_distance=rnorm(260),
> > >                     Diet_B=factor(rep(1:13, each=20)))
> > >
> > > Diet.colors <- c("forestgreen", "darkgreen","chocolate1","darkorange2",
> > >                  "sienna2","red2","firebrick3","saddlebrown","coral4",
> > >                  "chocolate4","darkblue","navy","grey38")
> > >
> > > levels(dataN$Diet_B) <- Diet.colors
> > >
> > > bwplot(GE_distance ~ Diet_B, data=dataN,
> > >        xlab=list("Diet of Breeding Ground", cex = 1.4),
> > >        ylab = list(
> > >          "Distance between Centers of B and NB Range (1000 km)",
> > >          cex = 1.4),
> > >        panel=panel.bwplot.intermediate.hh,
> > >        col=Diet.colors,
> > >        pch=rep("|",13),
> > >        scales=list(x=list(rot=90)),
> > >        par.settings=list(box.umbrella=list(lty=1)))
> > >
> > >
> >
> >
> > --
> > Sarah Goslee
> > http://www.stringpage.com
> > http://www.sarahgoslee.com
> > http://www.functionaldiversity.org
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] Better way of Grouping?

2012-09-28 Thread Charles Determan Jr
Hello R users,

This is more of a convenience question that I hope others might find useful
if there is a better answer.  I work with large datasets that requires
multiple parsing stages for different analysis.  For example, compare group
3 vs. group 4.  A more complicated comparison would be time B in group 3 of
group L with B in group 4 of group L.  I normally subset each group with
the following type of code.

data=read(...)

#L v D
L=data[LvD %in% c("L"),]
D=data[LvD %in% c("D"),]

#Groups 3 and 4 within L and D
group3L=L[group %in% c("3"),]
group4L=L[group %in% c("3"),]

group3D=D[group %in% c("3"),]
group4D=D[group %in% c("3"),]

#Times B, S45, FR2, FR8
you get the idea


Is there a more efficient way to subset groups?  Thanks for any insight.

Regards,
Charles



Re: [R] [R-sig-hpc] Quickest way to make a large "empty" file on disk?

2012-09-28 Thread Simon Urbanek

On Sep 28, 2012, at 12:44 PM, Jonathan Greenberg wrote:

> Rui:
> 
> Quick follow-up -- it looks like seek does do what I want (I see Simon
> suggested it some time ago) -- what do mean by "trash your disk"?  

I can't speak for Rui, but the difference between seeking and an explicit write is 
that the FS can optimize the former by not actually writing anything to disk 
(which is why it's so fast on some OS/FS combos). However, this means that 
the layout on the disk may not be sequential, depending on the write patterns of 
the actual data blocks, because the FS may keep a mask of unused blocks and 
not write them. But that is just an FS issue and thus varies vastly by OS and 
FS. For your use this probably doesn't matter, as you probably don't need to 
stream the resulting file at the end.


> What I'm
> trying to accomplish is getting parallel, asynchronous writes to a large
> binary image (just a binary file) working.  Each node writes to a different
> sector of the file via mmap, "filling in" the values as the process runs,
> but the file needs to be pre-created before I can mmap it.  Running a
> writeBin with a bunch of 0s would mean I'd basically have to write the file
> twice, but the seek/ff trick seems to be much faster.
> 
> Do I risk doing some damage to my filesystem if I use seek?  I see there is
> a strongly worded warning in the help for ?seek:
> 
> "Use of seek on Windows is discouraged. We have found so many errors in the
> Windows implementation of file positioning that users are advised to use it
> only at their own risk, and asked not to waste the *R* developers' time
> with bug reports on Windows' deficiencies." --> there's no detail here on
> which errors people have experienced, so I'm not sure if doing something as
> simple as just "creating" a file using seek falls under the "discouraging"
> category.
> 

Quick search in my mail shows issues that were related to what Windows reports 
as the seek location on text files when querying. AFAICS it did not affect the 
side-effect of seek which is what you're interested in.
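
A minimal sketch of the seek-then-write idea discussed above, using only base R
connections. The file name and the target size (n_bytes) are made up for
illustration. Whether the file ends up sparse on disk depends on the OS and
file system, as noted above.

```r
# Hypothetical target: a 1 MiB file created without writing every byte.
path    <- tempfile(fileext = ".bin")   # illustrative location only
n_bytes <- 1024 * 1024

con <- file(path, open = "wb")
# Move the write position to the last byte of the desired file ...
seek(con, where = n_bytes - 1, origin = "start", rw = "write")
# ... and write a single zero byte there, so the file has n_bytes bytes.
writeBin(as.raw(0), con)
close(con)

size_on_disk <- file.info(path)$size    # reported (logical) file size
```

The reported size is the logical size; the space actually allocated may be
smaller if the file system stores the file sparsely.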

Cheers,
Simon


> As a note, we are trying to work this up on both Windows and *nix systems,
> hence our wanting to have a single approach that works on both OSs.
> 
> --j
> 
> 
> On Thu, Sep 27, 2012 at 3:49 PM, Rui Barradas  wrote:
> 
>> Hello,
>> 
>> If you really need to trash your disk, why not use seek()?
>> 
>>> fl <- file("Test.txt", open = "wb")
>>> seek(fl, where = 1024, origin = "start", rw = "write")
>> [1] 0
>>> writeChar(character(1), fl, nchars = 1, useBytes = TRUE)
>> Warning message:
>> In writeChar(character(1), fl, nchars = 1, useBytes = TRUE) :
>>  writeChar: more characters requested than are in the string - will
>> zero-pad
>>> close(fl)
>> 
>> 
>> File "Test.txt" is now 1Kb in size.
>> 
>> Hope this helps,
>> 
>> Rui Barradas
>> Em 27-09-2012 20:17, Jonathan Greenberg escreveu:
>> 
>> Folks:
>> 
>> Asked this question some time ago, and found what appeared (at first) to be
>> the best solution, but I'm now finding a new problem.  First off, it seemed
>> like ff as Jens suggested worked:
>> 
>> # outdata_ncells = the number of rows * number of columns * number of bands
>> in an image:
>> out<-ff(vmode="double",length=outdata_ncells,filename=filename)
>> finalizer(out) <- close
>> close(out)
>> 
>> This was working fine until I attempted to set length to a VERY large
>> number: outdata_ncells = 17711913600.  This would create a file that is
>> 131.964GB.  Big, but not obscenely so (and certainly not larger than the
>> filesystem can handle).  However, length appears to be restricted
>> by .Machine$integer.max (I'm on a 64-bit windows box):
>> 
>> .Machine$integer.max
>> 
>> [1] 2147483647
>> 
>> Any suggestions on how to solve this problem for much larger file sizes?
>> 
>> --j
>> 
>> 
>> On Thu, May 3, 2012 at 10:44 AM, Jonathan Greenberg  
>> wrote:
>> 
>> 
>> Thanks, all!  I'll try these out.  I'm trying to work up something that is
>> platform independent (if possible) for use with mmap.  I'll do some tests
>> on these suggestions and see which works best. I'll try to report back in a
>> few days.  Cheers!
>> 
>> --j
>> 
>> 
>> 
>> 2012/5/3 "Jens Oehlschlägel"  
>> 
>> 
>> Jonathan,
>> 
>> On some filesystems (e.g. NTFS, see below) it is possible to create
>> 'sparse' memory-mapped files, i.e. reserving the space without the cost of
>> actually writing initial values.
>> Package 'ff' does this automatically and also allows to access the file
>> in parallel. Check the example below and see how big file creation is
>> immediate.
>> 
>> Jens Oehlschlägel
>> 
>> 
>> 
>> library(ff)
>> library(snowfall)
>> ncpus <- 2
>> n <- 1e8
>> system.time(
>> 
>> + x <- ff(vmode="double", length=n, filename="c:/Temp/x.ff")
>> + )
>>    user  system elapsed
>>    0.01    0.00    0.02
>> 
>> # check finalizer, with an explicit filename we should have a 'close'
>> 
>> finalizer
>> 
>> finalizer(x)
>> 
>> [1] "close"
>> 
>> # if not, set it

[R] Select Original and Duplicates

2012-09-28 Thread Adam Gabbert
I would like to select all the duplicate rows of a data frame including
the original.  Any help would be much appreciated.  This is where I'm at so
far. Thanks.

#Sample data frame:
df <- read.table(header=T, con <- textConnection('
 label value
 A 4
 B 3
 C 6
 B 3
 B 1
 A 2
 A 4
 A 4
'))
close(con)

# Duplicate entries
df[duplicated(df),]

# label value
# B 3
# A 4
# A 4

#I want to select all the rows that are duplicated including the original
#This is the output I want
# label value
# B 3
# B 3
# A 4
# A 4
# A 4



Re: [R] Select Original and Duplicates

2012-09-28 Thread Rui Barradas

Hello,

Try the following.


idx <- duplicated(df) | duplicated(df, fromLast = TRUE)
df[idx, ]

Note that they are returned in their original order in the df.
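
As a worked check of the two lines above on the sample data from the question:
duplicated() flags only the second and later copies, while fromLast = TRUE
flags every copy except the last, so the union covers all of them. The
text= argument to read.table is used here just to keep the sketch
self-contained.

```r
df <- read.table(header = TRUE, text = "
 label value
 A 4
 B 3
 C 6
 B 3
 B 1
 A 2
 A 4
 A 4
")

later_copies   <- duplicated(df)                    # rows 4, 7, 8
earlier_copies <- duplicated(df, fromLast = TRUE)   # rows 1, 2, 7
idx  <- later_copies | earlier_copies               # rows 1, 2, 4, 7, 8
dups <- df[idx, ]
```

If the rows should be grouped by value rather than kept in their original
order, dups[order(dups$label, dups$value), ] is one way to sort them.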

Hope this helps,

Rui Barradas

Em 28-09-2012 21:11, Adam Gabbert escreveu:

I would like to select a all the duplicate rows of a data frame including
the original.  Any help would be much appreciated.  This is where I'm at so
far. Thanks.

#Sample data frame:
df <- read.table(header=T, con <- textConnection('
  label value
  A 4
  B 3
  C 6
  B 3
  B 1
  A 2
  A 4
  A 4
'))
close(con)

# Duplicate entries
df[duplicated(df),]

# label value
# B 3
# A 4
# A 4

#I want to select all the rows that are duplicated including the original
#This is the output I want
# label value
# B 3
# B 3
# A 4
# A 4
# A 4



Re: [R] Better way of Grouping?

2012-09-28 Thread Jeff Newmiller
You have not specified the objective function you are trying to optimize with 
your term "efficient", or what you do with all of these subsets once you have 
them. 

For notational simplification and completeness of coverage (not necessarily 
computational speedup) you might want to look at "tapply" or ddply/dlply from 
the plyr package. If you build lists of subsets you can index into them 
according to grouping value. You can use expand.grid to build all permutations 
of grouping values to use as indexes into those lists of subsets.

To reiterate, you have not indicated what you want to do with these subsets, so 
there could be special-purpose functions that do what you want.  As always, 
reproducible code leads to reproducible answers. :)
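
A small sketch of the tapply and expand.grid suggestion, on made-up data. The
column names LvD and group follow the question; the value column and the
simulated numbers are assumptions for illustration only.

```r
set.seed(42)
dat <- data.frame(
  LvD   = rep(c("L", "D"), each = 8),
  group = rep(rep(c("3", "4"), each = 4), times = 2),
  value = rnorm(16)
)

# One summary per LvD x group cell, returned as a 2 x 2 matrix
# whose dimnames are the grouping values.
cell_means <- tapply(dat$value, list(dat$LvD, dat$group), mean)

# All combinations of the grouping values, usable as indexes.
combos <- expand.grid(LvD = c("L", "D"), group = c("3", "4"),
                      stringsAsFactors = FALSE)

# Look up one cell by its grouping values.
mean_L3 <- cell_means["L", "3"]
```

The same indexing works for each row of combos, for example
cell_means[combos$LvD[1], combos$group[1]].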
---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Charles Determan Jr  wrote:

>Hello R users,
>
>This is more of a convenience question that I hope others might find
>useful
>if there is a better answer.  I work with large datasets that requires
>multiple parsing stages for different analysis.  For example, compare
>group
>3 vs. group 4.  A more complicated comparison would be time B in group
>3 of
>group L with B in group 4 of group L.  I normally subset each group
>with
>the following type of code.
>
>data=read(...)
>
>#L v D
>L=data[LvD %in% c("L"),]
>D=data[LvD %in% c("D"),]
>
>#Groups 3 and 4 within L and D
>group3L=L[group %in% c("3"),]
>group4L=L[group %in% c("3"),]
>
>group3D=D[group %in% c("3"),]
>group4D=D[group %in% c("3"),]
>
>#Times B, S45, FR2, FR8
>you get the idea
>
>
>Is there a more efficient way to subset groups?  Thanks for any
>insight.
>
>Regards,
>Charles
>
>   [[alternative HTML version deleted]]
>


[R] Merging multiple columns into one column

2012-09-28 Thread Meredith Ballard LaBeau
Good Evening-
 I have a dataframe that has 10 columns that has a header and 7306 rows in
each column, I want to combine these columns into one. I utilized the stack
function but it only returned 3/4 of the data...my code is:
where nfcuy_bw is the dataframe with 7305 obs. and 10 variables
Once I apply this code I only receive a data frame with 58440 obs. of 2
variables, of which there should be 73,050 obs. of 2 variables, just
wondering what is happening here?

 View(nfcuy_bw)

attach(nfcuy_bw)

cuyahoga_nf<-data.frame(s5,s10,s25,s27,s33,s41,s51,his_c)

cuy_nf<-stack(cuyahoga_nf)

Thanks
Meredith

-- 
Doctoral Candidate
Department of Civil and Environmental Engineering
Michigan Technological University



Re: [R] Better way of Grouping?

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 11:59 AM, Charles Determan Jr wrote:

> Hello R users,
> 
> This is more of a convenience question that I hope others might find useful
> if there is a better answer.  I work with large datasets that requires
> multiple parsing stages for different analysis.  For example, compare group
> 3 vs. group 4.  A more complicated comparison would be time B in group 3 of
> group L with B in group 4 of group L.  I normally subset each group with
> the following type of code.
> 
> data=read(...)
> 
> #L v D
> L=data[LvD %in% c("L"),]
> D=data[LvD %in% c("D"),]
> 
> #Groups 3 and 4 within L and D
> group3L=L[group %in% c("3"),]
> group4L=L[group %in% c("3"),]

Assume you meant to have a "4" there
> 
> group3D=D[group %in% c("3"),]
> group4D=D[group %in% c("3"),]

Ditto. Only makes sense with a "4".



The usual way is to use:

lapply( split(data, interaction(data$LvD, data$group)) ,
 function(subdf) {} )

That way you do not end up littering your workspace with subsidiary subsets of 
your main data object.
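
A runnable sketch of the split/lapply pattern, on made-up data. The columns
LvD and group follow the question; the value column and the per-subset mean
are placeholders for whatever analysis is actually needed.

```r
dat <- data.frame(
  LvD   = rep(c("L", "D"), each = 8),
  group = rep(rep(c("3", "4"), each = 4), times = 2),
  value = 1:16
)

# One data frame per LvD x group combination, held in a single named list
# (names look like "L.3", "D.4").
subsets <- split(dat, interaction(dat$LvD, dat$group))

# Apply the same analysis to every subset; here simply the mean of 'value'.
res <- lapply(subsets, function(subdf) mean(subdf$value))

# A single subset or result is retrieved by name.
group4L      <- subsets[["L.4"]]
mean_group4L <- res[["L.4"]]
```

Comparisons such as group 3 versus group 4 within L then become lookups in one
list instead of separate objects in the workspace.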


> 
> #Times B, S45, FR2, FR8
> you get the idea
> 
> 
> Is there a more efficient way to subset groups?  Thanks for any insight.
> 
-- 

David Winsemius, MD
Alameda, CA, USA



Re: [R] Merging multiple columns into one column

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 2:51 PM, Meredith Ballard LaBeau wrote:

> Good Evening-
> I have a dataframe that has 10 columns that has a header and 7306 rows in
> each column, I want to combine these columns into one. I utilized the stack
> function but it only returned 3/4 of the data...my code is:
> where nfcuy_bw is the dataframe with 7305 obs. and 10 variables
> Once I apply this code I only receive a data frame with 58440 obs. of 2
> variables, of which there should be 73,050 obs. of 2 variables, just
> wondering what is happening here?
> 
> View(nfcuy_bw)
> 
> attach(nfcuy_bw)

Using 'attach' is a great way to produce confusing errors.

> 
> cuyahoga_nf<-data.frame(s5,s10,s25,s27,s33,s41,s51,his_c)
> 
> cuy_nf<-stack(cuyahoga_nf)

Unable to do much else in the absence of a dataset, much less a summary of 
these objects,  whose creation is your responsibility, not ours.

-- 

David Winsemius, MD
Alameda, CA, USA



[R] Heatmap Colors

2012-09-28 Thread Nick Fankhauser

Hello R-Users!

I'm using a heatmap to visualize a matrix of values between -1 and 3.
How can I set the colors so that white is zero, below zero is blue of 
increasing intensity towards -1 and above zero is red of increasing 
intensity towards 3?


I tried like this (using the marray and gplots packages from bioconductor):
mcol <- maPalette(low="blue", mid="white", high="red",k=100)
heatmap.2(my_matrix, col=mcol)

But white does not correspond to zero, because the value distribution is 
not symmetrical, so that zero is not in the middle.
Is it somehow possible to create a color palette with white centered at 
zero?


Nick Fankhauser



Re: [R] Merging multiple columns into one column

2012-09-28 Thread Bert Gunter
?unlist

(A data frame is a list, as ?data.frame explains. Also the Intro to R
tutorial, which should be read by everyone beginning with R).
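
A small sketch comparing unlist() and stack() on a toy data frame. The column
names are borrowed from the question, but the data are made up. Both functions
return one value per cell of the columns they are given, so the result has
(number of rows) x (number of columns passed in) values. As an observation only,
since the real data were not posted: 8 x 7305 = 58440 and 10 x 7305 = 73050,
and the posted data.frame() call lists eight columns.

```r
wide <- data.frame(s5 = 1:3, s10 = 4:6, his_c = 7:9)   # toy data

# unlist() concatenates the columns into one plain vector.
all_values <- unlist(wide, use.names = FALSE)

# stack() does the same but also keeps the source column in 'ind'.
long <- stack(wide)

n_long <- nrow(long)   # rows x columns of 'wide'
```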

-- Bert

On Fri, Sep 28, 2012 at 2:51 PM, Meredith Ballard LaBeau
 wrote:
> Good Evening-
>  I have a dataframe that has 10 columns that has a header and 7306 rows in
> each column, I want to combine these columns into one. I utilized the stack
> function but it only returned 3/4 of the data...my code is:
> where nfcuy_bw is the dataframe with 7305 obs. and 10 variables
> Once I apply this code I only receive a data frame with 58440 obs. of 2
> variables, of which there should be 73,050 obs. of 2 variables, just
> wondering what is happening here?
>
>  View(nfcuy_bw)
>
> attach(nfcuy_bw)
>
> cuyahoga_nf<-data.frame(s5,s10,s25,s27,s33,s41,s51,his_c)
>
> cuy_nf<-stack(cuyahoga_nf)
>
> Thanks
> Meredith
>
> --
> Doctoral Candidate
> Department of Civil and Environmental Engineering
> Michigan Technological University
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] changing outlier shapes of boxplots using lattice

2012-09-28 Thread Elaine Kuo
Hello Ilai,

Thank you for the response.
It did help a lot.

However, a beginner to lattice has three questions.

Q1

Please kindly explain what "in this case OP is using it with no "at"
argument" means.
Is it possible to display the median and the outliers with different pch?

Q2.
what is the relationship between package "HH" and graphic-drawing?
I checked ??HH and found little explanation on its function of
graphic-drawing.

Q3

Please kindly advise how to make outliers empty circle (pch=2) in this case
as the code below.
Thank you.

code

Diet.colors <-
c("forestgreen","darkgreen","chocolate1","darkorange2","sienna2",

"red2","firebrick3","saddlebrown","coral4","chocolate4","darkblue","navy","grey38")

levels(dataN$Diet_B) <- diet.code


bwplot(MS_midpoint_lat~Diet_B, data=dataN,
xlab=list("Diet of Breeding Ground", cex = 1.4),
ylab = list("Latitudinal Midpoint Breeding Ground ",cex = 1.4),
lwd=1.5,
cex.lab=1.4, cex.axis=1.2,
font.axis=2,
cex=1.5,
las=1,
panel=panel.bwplot.intermediate.hh,
bty="l",
col=Diet.colors,
pch=rep("l",13),
scales=list(x=list(rot=90)),
par.settings=list(plot.symbol = list(pch = 2, cex =
2),box.umbrella=list(lty=1)))
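
For reference, in the base graphics symbol numbering pch = 1 is the open circle
and pch = 2 is the open triangle. The sketch below uses the default lattice
panel (not panel.bwplot.intermediate.hh, whose handling of these settings is
not verified here) and made-up data. It assumes, as in lattice's panel.bwplot,
that the pch argument sets the median symbol while the outliers are drawn from
the plot.symbol settings.

```r
library(lattice)

set.seed(1)
toy <- data.frame(y = rgamma(260, shape = 1, rate = 0.1),   # made-up data
                  g = gl(13, 20))

p <- bwplot(y ~ g, data = toy,
            pch = "|",                                       # median as a line
            par.settings = list(
              plot.symbol  = list(pch = 1, cex = 1),         # outliers: open circles
              box.umbrella = list(lty = 1)))                 # solid whiskers

# print(p) draws the plot when this is run from a script.
```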

Elaine


On Sat, Sep 29, 2012 at 2:44 AM, ilai  wrote:

> On Fri, Sep 28, 2012 at 6:57 AM, Richard M. Heiberger wrote:
>
>> Elaine,
>>
>> For panel.bwplot you see that the central dot and the outlier dots are
>> controlled by
>> the same pch argument.
>
>
> ??? I don't think so...
>
> bwplot(rgamma(20,.1,1)~gl(2,10), pch=rep(17,2),
> panel = lattice::panel.bwplot)
>
> I think you mean panel.bwplot.intermidiate.hh ?
>
> BTW thank you for the useful HH package but in this case OP is using it
> with no "at" argument, so why not
>
> Diet.colors <- c("forestgreen", "darkgreen","chocolate1","darkorange2",
> "sienna2","red2","firebrick3","saddlebrown","coral4","chocolate4","darkblue","navy","grey38")
>  bwplot(rgamma(20*13,1,.1)~gl(13,20),
>   fill = Diet.colors, pch = "|",
>   par.settings = list(box.umbrella=list(lty=1)))
>
> cheers
>
>
>
> I initially set the pch="|" to match your first
>> example with the horizontal
>> indicator for the median.  I would be inclined to use the default circle
>> for the outliers and
>> therefore also for the median.
>>
>> Rich
>>
>> On Fri, Sep 28, 2012 at 7:13 AM, Sarah Goslee > >wrote:
>>
>> > I would guess that if you find the bit that says pch="|" and change it
>> to
>> > pch=1 it will solve your question, and that reading ?par will tell you
>> why.
>> >
>> > Sarah
>> >
>> > On Thursday, September 27, 2012, Elaine Kuo wrote:
>> >
>> > > Hello
>> > >
>> > > This is Elaine.
>> > >
>> > > I am using package lattice to generate boxplots.
>> > > Using Richard's code, the display was almost perfect except the
>> outlier
>> > > shape.
>> > > Based on the following code, the outliers are vertical lines.
>> > > However, I want the outliers to be empty circles.
>> > > Please kindly help how to modify the code to change the outlier
>> shapes.
>> > > Thank you.
>> > >
>> > > code
>> > > package (lattice)
>> > >
>> > > dataN <- data.frame(GE_distance=rnorm(260),
>> > >
>> > > Diet_B=factor(rep(1:13, each=20)))
>> > >
>> > > Diet.colors <- c("forestgreen",
>> "darkgreen","chocolate1","darkorange2",
>> > >
>> > >  "sienna2","red2","firebrick3","saddlebrown","coral4",
>> > >
>> > >  "chocolate4","darkblue","navy","grey38")
>> > >
>> > > levels(dataN$Diet_B) <- Diet.colors
>> > >
>> > > bwplot(GE_distance ~ Diet_B, data=dataN,
>> > >
>> > >xlab=list("Diet of Breeding Ground", cex = 1.4),
>> > >
>> > >ylab = list(
>> > >
>> > >  "Distance between Centers of B and NB Range (1000 km)",
>> > >
>> > >  cex = 1.4),
>> > >
>> > >panel=panel.bwplot.intermediate.hh,
>> > >
>> > >col=Diet.colors,
>> > >
>> > >pch=rep("|",13),
>> > >
>> > >scales=list(x=list(rot=90)),
>> > >
>> > >par.settings=list(box.umbrella=list(lty=1)))
>> > >
>> > > [[alternative HTML version deleted]]
>> > >
>> > > __
>> > > R-help@r-project.org  mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide
>> > > http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.
>> > >
>> >
>> >
>> > --
>> > Sarah Goslee
>> > http://www.stringpage.com
>> > http://www.sarahgoslee.com
>> > http://www.functionaldiversity.org
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>> [[alternative HTML version deleted]]
>>
>> __

Re: [R] Heatmap Colors

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 3:16 PM, Nick Fankhauser wrote:

> Hello R-Users!
> 
> I'm using a heatmap to visualize a matrix of values between -1 and 3.
> How can I set the colors so that white is zero, below zero is blue of 
> increasing intensity towards -1 and above zero is red of increasing intensity 
> towards red?
> 
> I tried like this (using the marray and gplots packages from bioconductor):
> mcol <- maPalette(low="blue", mid="white", high="red",k=100)
> heatmap.2(my_matrix, col=mcol)
> 
> But white does not correspond to zero, because the value distribution is not 
> symmetrical, so that zero is not in the middle.
> Is it somehow possible to create a color palette with white centered at zero?

The way you stated it at the beginning, I thought you should want the palette 
centered at 1 rather than 0:

test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:16)/16)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)

Perhaps I was being led astray by a somewhat similar question on StackOverflow.

-- 
David Winsemius, MD
Alameda, CA, USA



Re: [R] Heatmap Colors

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 4:52 PM, David Winsemius wrote:

> 
> On Sep 28, 2012, at 3:16 PM, Nick Fankhauser wrote:
> 
>> Hello R-Users!
>> 
>> I'm using a heatmap to visualize a matrix of values between -1 and 3.
>> How can I set the colors so that white is zero, below zero is blue of 
>> increasing intensity towards -1 and above zero is red of increasing 
>> intensity towards red?
>> 
>> I tried like this (using the marray and gplots packages from bioconductor):
>> mcol <- maPalette(low="blue", mid="white", high="red",k=100)
>> heatmap.2(my_matrix, col=mcol)
>> 
>> But white does not correspond to zero, because the value distribution is not 
>> symmetrical, so that zero is not in the middle.
>> Is it somehow possible to create a color palette with white centered at zero?
> 
> The way you stated it at the beginning, I thought you should want the palette 
> centered at 1 rather than 0:

Oops ... should have the number of breaks match the number of colors:

test <- seq(-1,3, len=20)
shift.BR <- colorRamp(c("blue","white", "red"), bias=2)((1:20)/20)
tpal <- rgb(shift.BR, maxColorValue=255)
barplot(test,col = tpal)
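
An alternative sketch for the original question (white exactly at zero on a
scale from -1 to 3), using explicit breaks rather than bias. The counts of 25
and 75 colors are arbitrary choices that give equal-width bins on both sides of
zero. Only base R is used here; passing the result on to heatmap.2 through its
col and breaks arguments is assumed to work as documented in gplots.

```r
n_neg <- 25   # colors for (-1, 0]
n_pos <- 75   # colors for (0, 3]

# Breaks must be one longer than the color vector: 101 breaks, 100 colors.
brks <- c(seq(-1, 0, length.out = n_neg + 1),
          seq(0, 3, length.out = n_pos + 1)[-1])

# Blue fading to white below zero, white deepening to red above zero.
cols <- c(colorRampPalette(c("blue", "white"))(n_neg),
          colorRampPalette(c("white", "red"))(n_pos))

# Possible use (my_matrix as in the question):
#   heatmap.2(my_matrix, col = cols, breaks = brks)
```

Because red covers three units and blue only one, a value of -1 is as saturated
as a value of 3 would be only if that is what is wanted; using 25 blue steps
taken from the pale end of a longer blue ramp would keep intensity proportional
to distance from zero instead.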
> 
-- 

David Winsemius, MD
Alameda, CA, USA



[R] Errors in if statement

2012-09-28 Thread JiangZhengyu

Hi guys,

I have a "geno" matrix with many rows (>1000) and columns (>30). I use the
following loop and condition statement (adapted from someone else's code), and
I always get the error below. I was wondering if anyone knows what the problem
is and how to fix it.

Thanks,
Zhengyu

### geno matrix
P1  P2  P3  P4 
1  2  2  3 2 
 2  2  2  1 1
1  2  1  2  NANA 2  3  4  5
###

for(i in 1:4) {
 cat(i,"")
 if(sum(geno[i,]!=2)>3 && sum(geno[i,]==1)>=1 && sum(geno[i,]==3)>=1){
   tmp = 1
   }
}

###
1 2 Error in if (sum(geno[i, ] != 2) > 3 && sum(geno[i, ] == 1) >= 1 && sum(geno[i,  : 
  missing value where TRUE/FALSE needed
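
A sketch of one likely cause, on made-up data (the matrix below is not the
poster's). When a row contains NA, sum() of a comparison returns NA, and if()
cannot decide on NA. Passing na.rm = TRUE to sum() is one way to count only the
non-missing entries; whether missing genotypes should be ignored like this is a
choice that depends on the analysis.

```r
geno <- matrix(c(2, 2,  3, 2, 2,
                 2, 2,  1, 1, 2,
                 1, NA, 3, 1, 3),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, paste0("P", 1:5)))   # toy data

# The comparison propagates NA, so the sum is NA for row 3.
row3_sum <- sum(geno[3, ] != 2)

flag <- logical(nrow(geno))
for (i in seq_len(nrow(geno))) {
  g <- geno[i, ]
  # Same condition as in the question, counting only non-missing values.
  flag[i] <- sum(g != 2, na.rm = TRUE) > 3 &&
             sum(g == 1, na.rm = TRUE) >= 1 &&
             sum(g == 3, na.rm = TRUE) >= 1
}
```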
  


Re: [R] Better way of Grouping?

2012-09-28 Thread arun
Hi,
You can also use grep() to subset:


LD<-paste0(rep(rep(c(3,4),each=4),2),c(rep("L",8),rep("D",8)))
set.seed(1)
dat1<-data.frame(LD=LD,value=sample(1:15,16,replace=TRUE))
dat2<-within(dat1,{LD<-as.character(LD)})
dat2[grepl(".*L",dat2$LD),] # subset all L values
dat2[grepl(".*D",dat2$LD),] # subset all D values
 dat2[grepl("3D",dat2$LD),]
dat2[grepl("4D",dat2$LD),]


A.K.




- Original Message -
From: Charles Determan Jr 
To: r-help@r-project.org
Cc: 
Sent: Friday, September 28, 2012 2:59 PM
Subject: [R] Better way of Grouping?

Hello R users,

This is more of a convenience question that I hope others might find useful
if there is a better answer.  I work with large datasets that requires
multiple parsing stages for different analysis.  For example, compare group
3 vs. group 4.  A more complicated comparison would be time B in group 3 of
group L with B in group 4 of group L.  I normally subset each group with
the following type of code.

data=read(...)

#L v D
L=data[LvD %in% c("L"),]
D=data[LvD %in% c("D"),]

#Groups 3 and 4 within L and D
group3L=L[group %in% c("3"),]
group4L=L[group %in% c("3"),]

group3D=D[group %in% c("3"),]
group4D=D[group %in% c("3"),]

#Times B, S45, FR2, FR8
you get the idea


Is there a more efficient way to subset groups?  Thanks for any insight.

Regards,
Charles

    [[alternative HTML version deleted]]



[R] Text mining? Text manipulation? Both? Predicting KRAS test results in cancer patients

2012-09-28 Thread Paul Miller
Happy Friday Everyone,
 
Hope Friday afternoon doesn't turn out to be a terrible time to post a 
question. I've been doing a little data mining of patient text medical records 
as of late. I started out trying to predict whether or not cancer patients had 
received KRAS mutation testing and did quite well with that. Now I'm trying to 
predict the results of KRAS testing (mutated vs. wild type). This is proving to 
be a little more difficult.
 
With the first classification task, I created counts of terms (e.g., "kras", 
"mutated") in the text medical records using the tm package and then used those 
counts to predict whether or not patients had had KRAS mutation testing. I 
tried a few different analyses here, but found that random forests worked the 
best.
 
Predicting the results of testing is harder though because of the way 
physicians and other healthcare professionals write about testing. For example, 
I'm finding phrases like "KRAS mutation returned wild-type". In this example, 
if we're counting, we get 1 instance of "kras", 1 instance of "mutated", and 
one instance of "wild". So you can see how it might be difficult to accurately 
predict the results of testing based on counts alone.
 
My question is how best to deal with this. Are there any R text mining packages 
or related software that would be particularly suited to my problem? I took a 
look at the CRAN Task View: Natural Language Processing and there were so many 
options I didn't really know where to start (and it's not even clear that an 
R-based solution will work best for my problem). Alternatively, is there any 
real chance one could simply write code that would be able to identify true 
references to the results of KRAS testing and then create counts only of what 
are likely to be true references?
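
As a rough illustration of the rule-based idea in the last sentence, here is a
base R sketch with made-up sentences. The function name classify_kras, the
patterns, and the rule that a wild-type mention overrides the word "mutation"
(since "KRAS mutation" can simply name the test) are all assumptions for
illustration, not a validated method.

```r
notes <- c("KRAS mutation returned wild-type.",            # made-up examples
           "KRAS testing showed a mutation in codon 12.",
           "Patient tolerated chemotherapy well.")

classify_kras <- function(x) {
  x <- tolower(x)
  has_kras <- grepl("kras", x, fixed = TRUE)
  wild     <- grepl("wild[- ]?type", x)
  mutated  <- grepl("mutat", x)

  out <- rep(NA_character_, length(x))      # NA: no KRAS mention at all
  out[has_kras]           <- "unclear"      # mentioned, no result found
  out[has_kras & mutated] <- "mutated"
  out[has_kras & wild]    <- "wild type"    # wild-type overrides "mutation"
  out
}

kras_result <- classify_kras(notes)
```

Counts built from these labels, rather than from raw term frequencies, could
then be fed to the same classifier as before.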
 
I'd greatly appreciate it if someone could point me in the right direction.
 
Thanks,
 
Paul 
 
 



Re: [R] Select Original and Duplicates

2012-09-28 Thread Adam Gabbert
That works. Thank you!

On Fri, Sep 28, 2012 at 4:22 PM, Rui Barradas  wrote:

> Hello,
>
> Try the following.
>
>
> idx <- duplicated(df) | duplicated(df, fromLast = TRUE)
> df[idx, ]
>
> Note that they are returned in their original order in the df.
>
> Hope this helps,
>
> Rui Barradas
>
> Em 28-09-2012 21:11, Adam Gabbert escreveu:
>
>> I would like to select a all the duplicate rows of a data frame including
>> the original.  Any help would be much appreciated.  This is where I'm at
>> so
>> far. Thanks.
>>
>> #Sample data frame:
>> df <- read.table(header=T, con <- textConnection('
>>   label value
>>   A 4
>>   B 3
>>   C 6
>>   B 3
>>   B 1
>>   A 2
>>   A 4
>>   A 4
>> '))
>> close(con)
>>
>> # Duplicate entries
>> df[duplicated(df),]
>>
>> # label value
>> # B 3
>> # A 4
>> # A 4
>>
>> #I want to select all the rows that are duplicated including the original
>> #This is the output I want
>> # label value
>> # B 3
>> # B 3
>> # A 4
>> # A 4
>> # A 4
>>
>>
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Select Original and Duplicates

2012-09-28 Thread arun
Hi,

You can also try:
idx<-data.frame(t(sapply(df,function(x) !is.na(match(x,x[duplicated(x)])))))
df1<-df[sapply(idx,function(x) all(x==TRUE)),]
df1
#  label value
#1 A 4
#2 B 3
#4 B 3
#7 A 4
#8 A 4

A.K.
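
For comparison, here is a self-contained sketch of the row-wise test from Rui's reply, run on the sample data from the original post. The extra row in df2 is hypothetical (it is not in the posted data); it is there only to illustrate that checking each column separately, as the sapply() variant above does, can flag a row whose values recur column by column even though the row as a whole occurs only once. The names df2, row_dup and col_dup are made up for this illustration.

```r
# Sample data from the original post
df <- read.table(header = TRUE, text = "
label value
A 4
B 3
C 6
B 3
B 1
A 2
A 4
A 4
")

# Row-wise test: TRUE if an identical row occurs earlier or later
idx <- duplicated(df) | duplicated(df, fromLast = TRUE)
df[idx, ]   # rows 1, 2, 4, 7, 8, in their original order

# Hypothetical extra row: (A, 3) occurs only once as a row,
# but "A" and 3 are each repeated within their own column.
df2 <- rbind(df, data.frame(label = "A", value = 3))
row_dup <- duplicated(df2) | duplicated(df2, fromLast = TRUE)
col_dup <- sapply(df2, function(x) x %in% x[duplicated(x)])  # one column per variable
row_dup[9]         # FALSE: not a duplicated row
all(col_dup[9, ])  # TRUE: a column-wise test would still select it
```

On the posted data both approaches return the same five rows, so the difference only matters when values can repeat within a column without the whole row repeating.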

- Original Message -
From: Rui Barradas 
To: Adam Gabbert 
Cc: r-help@r-project.org
Sent: Friday, September 28, 2012 4:22 PM
Subject: Re: [R] Select Original and Duplicates

Hello,

Try the following.


idx <- duplicated(df) | duplicated(df, fromLast = TRUE)
df[idx, ]

Note that they are returned in their original order in the df.

Hope this helps,

Rui Barradas

Em 28-09-2012 21:11, Adam Gabbert escreveu:
> I would like to select all the duplicate rows of a data frame including
> the original.  Any help would be much appreciated.  This is where I'm at so
> far. Thanks.
>
> #Sample data frame:
> df <- read.table(header=T, con <- textConnection('
>   label value
>       A     4
>       B     3
>       C     6
>       B     3
>       B     1
>       A     2
>       A     4
>       A     4
> '))
> close(con)
>
> # Duplicate entries
> df[duplicated(df),]
>
> # label value
> #     B     3
> #     A     4
> #     A     4
>
> #I want to select all the rows that are duplicated including the original
> #This is the output I want
> # label value
> #     B     3
> #     B     3
> #     A     4
> #     A     4
> #     A     4


[R] Converting array to matrix

2012-09-28 Thread farnoosh sheikhi
Hi,

I have a 3d array as below, and I want to convert it to a matrix with p = 50 rows 
and n = 20 columns holding the coverage values.
The code before the array is:

library(binom)
# Loading required package: lattice
pi.seq <- seq(from = 0.01, to = 0.5, by = 0.01)
no.seq <- seq(from = 5, to = 100, by = 5)
cp.all <- binom.coverage(p = pi.seq, n = no.seq, conf.level = 0.95,
                         method = "exact")


I basically want to plot this probability with filled.contour.
Many thanks.

method    p   n  coverage
1     exact 0.01   5 0.9990199
2     exact 0.01  10 0.9957338
3     exact 0.01  15 0.9903702
4     exact 0.01  20 0.9831407
5     exact 0.01  25 0.9980493
6     exact 0.01  30 0.9966823
7     exact 0.01  35 0.9948463
8     exact 0.01  40 0.9925026
9     exact 0.01  45 0.9896219
10    exact 0.01  50 0.9861827
11    exact 0.01  55 0.9821712
12    exact 0.01  60 0.9775798
13    exact 0.01  65 0.9958308
14    exact 0.01  70 0.9945711
15    exact 0.01  75 0.9930800
16    exact 0.01  80 0.9913408
17    exact 0.01  85 0.9893386
18    exact 0.01  90 0.9870598
19    exact 0.01  95 0.9844924
20    exact 0.01 100 0.9816260
21    exact 0.02   5 0.9961576
22    exact 0.02  10 0.9838224
23    exact 0.02  15 0.9969606
24    exact 0.02  20 0.9929313
25    exact 0.02  25 0.9867566
26    exact 0.02  30 0.9782822
27    exact 0.02  35 0.9948918
28    exact 0.02  40 0.9917591
29    exact 0.02  45 0.9875780
30    exact 0.02  50 0.9822419
31    exact 0.02  55 0.9756698
32    exact 0.02  60 0.9929754
33    exact 0.02  65 0.9902072
34    exact 0.02  70 0.9867702
35    exact 0.02  75 0.9826010
36    exact 0.02  80 0.9776446
37    exact 0.02  85 0.9927058
38    exact 0.02  90 0.9904482
39    exact 0.02  95 0.9877327
40    exact 0.02 100 0.9845164
41    exact 0.03   5 0.9915279
42    exact 0.03  10 0.9972351
43    exact 0.03  15 0.9906286
44    exact 0.03  20 0.9789916
45    exact 0.03  25 0.9938142
46    exact 0.03  30 0.9880954
47    exact 0.03  35 0.9797802
48    exact 0.03  40 0.9933299
49    exact 0.03  45 0.9890462
50    exact 0.03  50 0.9831894
51    exact 0.03  55 0.9755598
52    exact 0.03  60 0.9908560
53    exact 0.03  65 0.9866943
54    exact 0.03  70 0.9813629
55    exact 0.03  75 0.9926775
56    exact 0.03  80 0.9896911
57    exact 0.03  85 0.9859049
58    exact 0.03  90 0.9812172
59    exact 0.03  95 0.9755343
60    exact 0.03 100 0.9893762
61    exact 0.04   5 0.9852420
62    exact 0.04  10 0.9937863
63    exact 0.04  15 0.9797082
64    exact 0.04  20 0.9925871
65    exact 0.04  25 0.9834784
66    exact 0.04  30 0.9936800
67    exact 0.04  35 0.9877867
68    exact 0.04  40 0.9789777
69    exact 0.04  45 0.9912599
70    exact 0.04  50 0.9855896
71    exact 0.04  55 0.9777638
72    exact 0.04  60 0.9901122
73    exact 0.04  65 0.9849824
74    exact 0.04  70 0.9781965
75    exact 0.04  75 0.9897956
76    exact 0.04  80 0.9852643
77    exact 0.04  85 0.9794261
78    exact 0.04  90 0.9899813
79    exact 0.04  95 0.9653302
80    exact 0.04 100 0.9641378
81    exact 0.05   5 0.9774075
82    exact 0.05  10 0.9884964
83    exact 0.05  15 0.9945327
84    exact 0.05  20 0.9840985
85    exact 0.05  25 0.9928351
86    exact 0.05  30 0.9843645
87    exact 0.05  35 0.9927483
88    exact 0.05  40 0.9861231
89    exact 0.05  45 0.9761385
90    exact 0.05  50 0.9882136
91    exact 0.05  55 0.9806825
92    exact 0.05  60 0.9902109
93    exact 0.05  65 0.9844774
94    exact 0.05  70 0.9766393
95    exact 0.05  75 0.9662306
96    exact 0.05  80 0.9650815
97    exact 0.05  85 0.9772934
98    exact 0.05  90 0.9755923
99    exact 0.05  95 0.9718140
100   exact 0.05 100 0.9826071
101   exact 0.06   5 0.9980297
102   exact 0.06  10 0.9811622
103   exact 0.06  15 0.9896401
104   exact 0.06  20 0.9943659
105   exact 0.06  25 0.9849507
106   exact 0.06  30 0.9920548
107   exact 0.06  35 0.9831689
108   exact 0.06  40 0.9909419
109   exact 0.06  45 0.9829932
110   exact 0.06  50 0.9906217
111   exact 0.06  55 0.9836566
112   exact 0.06  60 0.9663670
113   exact 0.06  65 0.9668145
114   exact 0.06  70 0.9630279
115   exact 0.06  75 0.9763348
116   exact 0.06  80 0.9716289
117   exact 0.06  85 0.9820840
118   exact 0.06  90 0.9772655
119   exact 0.06  95 0.9687703
120   exact 0.06 100 0.9680765
121   exact 0.07   5 0.9969201
122   exact 0.07  10 0.9964239
123   exact 0.07  15 0.9824673
124   exact 0.07  20 0.9892932
125   exact 0.07  25 0.9934691
126   exact 0.07  30 0.9837683
127   exact 0.07  35 0.9902956
128   exact 0.07  40 0.9801496
129   exact 0.07  45 0.9879752
130   exact 0.07  50 0.9779901
131   exact 0.07  55 0.9679391
132   exact 0.07  60 0.9640110
133   exact 0.07  65 0.9765091
134   exact 0.07  70 0.9702320
135   exact 0.07  75 0.9806132
136   exact 0.07  80 0.9553953
137   exact 0.07  85 0.9692733
138   exact 0.07  90 0.9656231
139   exact 0.07  95 0.9765780
140   exact 0.07 100 0.9715796
141   exact 0.08   5 0.9954747
142   exact 0.08  10 0.9941987
143   exact 0.08  15 0.9950303
144   exact 0.08  20 0.9816556
145   exact 0.08  25 0.9877073
146   exact

Re: [R] Converting array to matrix

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 3:59 PM, farnoosh sheikhi wrote:

> Hi,
> 
> I have a 3d array as below, and I want to convert it to a matrix with 
> p = 50 rows and n = 20 columns holding the coverage values.
> The code before the array is:

?matrix

# n varies fastest in the listing, so fill by row to get p in rows, n in columns
mat <- matrix(datfrm$coverage, 50, 20, byrow = TRUE)
filled.contour(mat)  # untested
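
A slightly fuller sketch of the reshape, under stated assumptions. The data frame here is a stand-in built with expand.grid() so that it has the same row order as the posted listing (20 consecutive rows per value of p, with n running from 5 to 100 inside each block); its coverage values are placeholders, not binom.coverage() output. Because matrix() fills column by column by default, byrow = TRUE is what puts p in the rows and n in the columns for data in that order.

```r
# Stand-in for the posted data frame: p varies slowest, n fastest.
p.seq <- seq(from = 0.01, to = 0.5, by = 0.01)
n.seq <- seq(from = 5, to = 100, by = 5)
datfrm <- expand.grid(n = n.seq, p = p.seq)[, c("p", "n")]
datfrm$coverage <- datfrm$p + datfrm$n / 1000   # placeholder values only

# Each run of 20 consecutive values belongs to one p, so fill row by row.
mat <- matrix(datfrm$coverage,
              nrow = length(p.seq), ncol = length(n.seq),
              byrow = TRUE)

# Supplying x and y puts the real p and n values on the axes.
filled.contour(x = p.seq, y = n.seq, z = mat, xlab = "p", ylab = "n")
```

With the real object, datfrm would be replaced by whatever holds the p, n and coverage columns; the row order should be checked first, since byrow = TRUE is only right when n varies fastest.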


-- 
David

Re: [R] Errors in if statement

2012-09-28 Thread David Winsemius

On Sep 28, 2012, at 1:16 PM, JiangZhengyu wrote:

> 
> Hi guys, I have a "geno" matrix with many rows (>1000) and columns (>30). I use 
> the following loop and condition statement (adapted from someone else's code). 
> I always get the error below.  I was wondering if anyone knows what the 
> problem is & how to fix it.  

Boy, it surely looks like missing values are the problem. Have you read:

?sum

-- 
David.
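
To make the pointer to ?sum concrete: a comparison such as geno[i,] != 2 yields NA wherever the row has a missing value, sum() of that is NA unless na.rm = TRUE is given, and if() stops with "missing value where TRUE/FALSE needed" when its condition evaluates to NA. Below is a minimal sketch, assuming that NA entries should simply be ignored when counting. The matrix values and the flag vector are made up for illustration; they are not the poster's data.

```r
# Hypothetical genotype matrix (illustrative values only)
geno <- matrix(c( 1,  3,  1,  3,  1,
                  2,  2,  2,  1,  1,
                  1,  3, NA,  3,  1,
                 NA, NA, NA, NA, NA),
               nrow = 4, byrow = TRUE)

flag <- logical(nrow(geno))
for (i in seq_len(nrow(geno))) {
  row <- geno[i, ]
  # na.rm = TRUE makes each sum() return a count even if the row contains NA
  flag[i] <- sum(row != 2, na.rm = TRUE) > 3 &&
             sum(row == 1, na.rm = TRUE) >= 1 &&
             sum(row == 3, na.rm = TRUE) >= 1
}
flag
```

Whether dropping NAs is the right treatment depends on what the counts are meant to represent; an alternative would be to skip rows with anyNA(row) before testing.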

> Thanks,
> Zhengyu
>
> ### geno matrix
> P1  P2  P3  P4 
> 1  2  2  3 2 
> 2  2  2  1 1
> 1  2  1  2  NANA 2  3  4  5
> ###
> for(i in 1:4) {
> cat(i,"")
> if(sum(geno[i,]!=2)>3 && sum(geno[i,]==1)>=1 && sum(geno[i,]==3)>=1){
>   tmp = 1
>   }
> }
> ###
> 1 2 Error in if (sum(geno[i, ] != 2) > 3 && sum(geno[i, ] == 1) >= 1 && sum(geno[i,  : 
>   missing value where TRUE/FALSE needed

David Winsemius, MD
Alameda, CA, USA
