Re: [R] subsetting without losing the attributes

2012-07-18 Thread Rui Barradas

Hello,

Another quote from the help pages that could be usefull is, from ?subset,

Warning

This is a convenience function intended for use interactively. For 
programming it is better to use the standard subsetting functions like 
[, and in particular the non-standard evaluation of argument subset can 
have unanticipated consequences.



So if the op wants a behavior that can be seen with `[`, use the 
recommended extractor function.


Hope this helps,

Rui Barradas

Em 18-07-2012 01:03, Peter Ehlers escreveu:

Inline

On 2012-07-17 04:16, MK wrote:

Hi,

How do I use the subset function without losing the attributes?


You don't. At least not without modifying subset.data.frame().

The key sentence on the ?Extract help page is this:

  "Subsetting (except by an empty index) will drop all
   attributes except names, dim and dimnames."

(This should probably say "... by an empty _first_ index ...")

If you really want a subset function to do what you want then
it's an easy modification; just replace the last line of
subset.data.frame with

  if(isTRUE(r))
  x[, vars, drop = drop]
  else
  x[r, vars, drop = drop]

But this still won't let you pick subsets of rows without
losing the attributes. So I see no advantage over [,].

I would just save the attributes and reassign them.

Peter Ehlers



For example,

test.df <- data.frame(a = rnorm(100), b = runif(100), c = rexp(100))
attr(test.df[, 1], "units") <- c("cm")
attr(test.df[, 2], "units") <- c("kg")
attr(test.df[, 3], "units") <- c("ha")


## We want this behavior
str(test.df[, "a", drop = FALSE])

## But not the behavior of subset
str(subset(test.df, select = a, drop = FALSE))
## Whic is equivalent to
str(test.df[TRUE, "a", drop = FALSE])

Cheers,
M

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] PCA

2012-07-18 Thread Guforu
Hi everybody!

I do at this time PCA. Everything was going good, I have got good results.
Now I want make a plot. You can see it on the next image:
http://r.789695.n4.nabble.com/file/n4636840/Forum.jpg 

For this plot I use the simple biplot function.

Now, it is not exactly what I want. I prefer to get the time series like
coordinates and PCA like the variables (exactly turn off in the previous
plot). Should look like this:

http://r.789695.n4.nabble.com/file/n4636840/F2.jpg 

you can see the different, The PCA in my plot are the coordinates and in the
foreign plot, they are the variables and build the ellipse (which is also
orthogonal).
Please, can you tell me some function in R for this operation?

--
View this message in context: http://r.789695.n4.nabble.com/PCA-tp4636840.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NTH PERCENTILE COULMNWIESE

2012-07-18 Thread Rantony
Hi,

Here i have an matrix
mydat <- 
ABCXYZ
- --
12  6
6 50
90 100
55   85
100 25

i need to find the " NTH percentile "  [result should be in column-wise].
here i have a code for NTH percenile,

For eg:- if i need 20th-percentile then,
quantile(ncol(mydat),.2)  - here i getting output for complete matrix ,But i
need this for all the columnswise. like this,

for nth percentile
  ABC XYZ
  -----
20%   1024[here, given percentile value is not exact result, its
just for output format]

-Its urgent !

- Thanks 
Antony.


--
View this message in context: 
http://r.789695.n4.nabble.com/NTH-PERCENTILE-COULMNWIESE-tp4636839.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "duplicate couples (time-id)" Problem

2012-07-18 Thread Saint
Thank you!

Problem solved! There where two identical observations.

--
View this message in context: 
http://r.789695.n4.nabble.com/duplicate-couples-time-id-Problem-tp4636316p4636841.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] complexity of operations in R

2012-07-18 Thread Patrick Burns

It looks to me like the following
should do what you want:

f2 <- function(dotot) array(FALSE, c(dotot, 1))

What am I missing?

Pat

On 17/07/2012 21:58, Johan Henriksson wrote:

thanks for the link! I should read it through. that said, I didn't find
any good general solution to the problem so here I post some attempts
for general input. maybe someone knows how to speed this up. both my
solutions are theoretically O(n) for creating a list of n elements. The
function to improve is O(n^2) which should suck tremendously - but the
slow execution of R probably blows up the constant factor of the smarter
solutions.

Array doubling comes close in speed for large lists but it would be
great if it could be comparable for smaller lists. One hidden cost I see
directly is that allocating a list in R is O(n), not O(1) (or close),
since it always fills it with values. Is there a way around this? I
guess by using C, one could just malloc() and leave the content
undefined - but is there no better way?

thanks,
/Johan



# the function we wish to improve

f<-function(dotot){
   v<-matrix(ncol=1,nrow=0)
   for(i in 1:dotot){
 v<-rbind(v,FALSE)
   }
   return(v)
}

##
# first attempt: linked lists

emptylist <- NA

addtolist <- function(x,prev){
   return(list(x,prev))
}

g<-function(dotot){
   v<-emptylist
   for(i in 1:dotot){
 v<-addtolist(FALSE,v)
   }
   return(v)
}


# second attempt: array doubling

emptyexpandlist<-list(nelem=0,l=matrix(ncol=1,nrow=0))

addexpandlist<-function(x,prev){
   if(nrow(prev$l)==prev$nelem){
 nextsize<-max(nrow(prev$l),1)
 prev$l<-rbind(prev$l,matrix(ncol=1,nrow=nextsize))
   }
   prev$nelem<-prev$nelem+1
   prev$l[prev$nelem]<-x
   return(prev)
}

compressexpandlist<-function(prev){
   return(as.vector(prev$l[1:prev$nelem]))
}

h<-function(dotot){
   v<-emptyexpandlist
   for(i in 1:dotot){
 v<-addexpandlist(FALSE,v)
   }
   return(compressexpandlist(v))
}

#

dotot=10
system.time(f(dotot))
#system.time(g(dotot))
system.time(h(dotot))








On Tue, Jul 17, 2012 at 8:42 PM, Patrick Burns mailto:pbu...@pburns.seanet.com>> wrote:

Johan,

If you don't know 'The R Inferno', it might
help a little.  Circle 2 has an example of
how to efficiently (relatively speaking) grow
an object if you don't know the final length.

http://www.burns-stat.com/__pages/Tutor/R_inferno.pdf


If you gave a simple example of how your code
looks now and what you want it to do, then you
might get some ideas of how to improve it.


Pat


On 17/07/2012 12:47, Johan Henriksson wrote:

Hello!
I am optimizing my code in R and for this I need to know a bit
more about
the internals. It would help tremendously if someone could link
me to a
page with O()-complexities of all the operations.

In this particular case, I need something like a linked list
with O(1)
insertLast/First ability. I can't preallocate a vector since I
do not know
the final size of the list ahead of time.

The classic array-doubling trick would give me O(1) amortized
time but it's
a bit messy. The other alternative I see would be to recursively
store
lists (elem, (elem, (elem, (..., which I take also would
work? But I'd
rather go for a standard R solution if there is one!

cheers,
/Johan


--
Patrick Burns
pbu...@pburns.seanet.com 
twitter: @portfolioprobe
http://www.portfolioprobe.com/__blog

http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')





--
--
---
Johan Henriksson, PhD
Karolinska Institutet
Ecobima AB - Custom solutions for life sciences
http://www.ecobima.se  http://mahogny.areta.org
http://www.endrov.net




--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] OT: Where's the new Tukey?

2012-07-18 Thread Liviu Andronic
On Wed, Jul 18, 2012 at 12:30 AM, Kjetil Halvorsen
 wrote:
> Venables & Ripley:
> "Modern Applied Statistics with S (fourth Edition)"
>
[..]
>
> On Sat, Jul 14, 2012 at 4:01 PM, Larry White  wrote:
>> I'm looking for a single book that provides a deep, yet readable
>> introduction to applied data analysis for general readers.
>
In my experience MASS doesn't apply to general readers. More to experts.


> I'm looking for coverage on things like understanding randomness, "natural
>> experiments", confounding, causality and correlation, data cleaning and
>> transforms, lagging, residuals, exploratory graphics, curve fitting,
>> descriptive stats Preferably with examples/case studies that illustrate
>> the art and craft of data analysis. No proofs or heavy math.
>
I'm no expert, but I'm very happy with what I'm reading in
'Statistics' by Freedman et al. (2007) [1]. This book concerns itself
with providing the reader with a clear, intuitive and accessible
understanding of the fundamentals of statistics. Its hallmark (for
better or worse) is the thorough avoidance of formulas or
incomprehensible math jargon. (It still contains a lot of proper
jargon, but it doesn't assume, as many books do, that the user
perfectly understands all the mathematical and statistical terms.) The
book requires, essentially, no prerequisites from the reader. As far
as I go, very good.
[1] https://en.wikipedia.org/wiki/David_A._Freedman_(statistician)

For a more rigorous, mathematic and advanced approach I like Applied
Regression Analysis and GLM by Fox (2008). Whereas the first book is
concerned with intuition, this book is focused on application (in the
context of regression analysis). To freely quote the author, the text
is as accessible as possible without the material being watered down
unduly. The prerequisites for reading the book are higher. As far as
I'm concerned, the material is clearly exposed and the author tackles
head-on a lot of thorny issues that other books leave untouched. I
guess this qualifies as "deep, yet readable introduction".

This last book can be perfectly complemented with An R Companion to
Applier Regression by Fox and Weisberg (2011). This is to teach people
R in the context of regression analysis.

Regards
Liviu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] duplicate data between two data frames according to row names

2012-07-18 Thread jeff6868
Hi everybody.

I'll first explain my problem and what I'm trying to do. 
Admit this example:
I'm working on 5 different weather stations.
I have first in one file 3 of these 5 weather stations, containing their
data. Here's an example of this file:

DF1 <- data.frame(station=c("ST001","ST004","ST005"),data=c(5,2,8))

And my two other stations in this other data.frame:

DF2 <- data.frame(station=c("ST002","ST003"),data=c(3,7))

I would like to add geographical coordinates of these weather stations
inside these two data.frames, according to the number of the weather
station.

All of my geographical coordinates for each of the 5 weather stations are
inside another data frame:

DF3 <-
data.frame(station=c("ST001","ST002","ST003","ST004","ST005"),lat=c(40,41,42,43,44),lon=c(1,2,3,4,5))

My question is: how can I put automatically these geographical coordinates
inside my first 2 data frames, according to the number of the weather
station?

For this example, the first two data frames DF1 and DF2 should become:

DF1 <-
data.frame(station=c("ST001","ST004","ST005"),lat=c(40,43,44),lon=c(1,4,5),data=c(5,2,8))
and
DF2 <-
data.frame(station=c("ST002","ST003"),lat=c(41,42),lon=c(2,3),data=c(3,7))

I need to automatize this method because my real dataset contains 70 weather
stations, and each file contains other (or same sometimes) stations , but
each station can be found in the list of the coordinates file (DF3).

Is there any way or any function able to do this kind of thing?

Thank you very much!




--
View this message in context: 
http://r.789695.n4.nabble.com/duplicate-data-between-two-data-frames-according-to-row-names-tp4636845.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression Identity

2012-07-18 Thread peter dalgaard

On Jul 18, 2012, at 05:11 , darnold wrote:

> Hi,
> 
> I see a lot of folks verify the regression identity SST = SSE + SSR
> numerically, but I cannot seem to find a proof. I wonder if any folks on
> this list could guide me to a mathematical proof of this fact.
> 

Wrong list, isn't it? 

http://stats.stackexchange.com/ is -> _that_ way...

Anyways: Any math stats book should have it somewhere inside. There are two 
basic approaches, depending on what level of abstraction one expects from 
students.

First principles: Write out  SST=sum((y-yhat)+(yhat-ybar))^2 and use the normal 
equations to show that the sum of product terms is zero. This is a bit tedious, 
but straightforward in principle.

Linear algebra: The least squares fitted values are the orthogonal projection 
onto a subspace of R^N (N=number of observations). Hence the vector of 
residuals is orthogonal to the vector (yhat - ybar) and the N-dimensional 
version of the Pythagorean theorem is

||yhat - ybar||^2 + ||y - yhat||^2 == ||y - ybar||^2

since the three vectors involved form a right-angled triangle. 
(http://en.wikipedia.org/wiki/Pythagorean_theorem, scroll down to "Inner 
product spaces".) 


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problem with function

2012-07-18 Thread Rui Barradas

Hello,

I believe that the answer to your final question is yes but you are 
making a really big confusion of what you want, of how you are 
explaining it and of the way R works.


First, I have written a function to read the data from the post, without 
creating a disk file. In order to be complete, I'll repost everything:




f <- function(){
x <- read.csv2(text="
Id;fc;zat;dat;hat
PG;1,3;1;2;3
HU;2,5;0;2;5
JZ;1,8;1;0;4
", header=TRUE)
x
}

# your function, one line changed.

read.data <- function(file,  ...){

#  mydata <- read.csv2 (file, header=TRUE, as.is=TRUE) # read everything
  mydata <- f()

  fc <- mydata[,2]  # extract FC column
  rownames(mydata) <- mydata[,1]# same for gene names to be used
# as rownames
  mydata <- mydata[,-(1:2)] # drop those two columns
  attr(mydata, 'fc') <- fc  # attach FC back as attribute
  return(mydata)# and return
  # return fc or return attr(mydata, 'fc') does not work  (???)
}

# further code is like this:

newdata <- read.data("testfile")
head(newdata)   # works but not
fc  # or (why should it?)
attr(newdata, 'fc') # it does not work if I include return(fc) or (???)
# attr... in my function neither

attributes(newdata) # This is new, see ALL attributes.


I've also ncluded some (???) in the comments. See below inline.

Em 17-07-2012 22:13, Hermann Norpois escreveu:

Dear list,

I have a problem with defining a function (see below) to read my testfile
(see testfile). My function only returns mydata I wish to work with
attr(mydata, 'fc') as well (for labelling a plot). Principally it works if
I do not insist on this function but it would be much easer if it is
possible to return mydata AND attr(mydata, 'fc') by using a function.

1) testfile:

Id;fc;zat;dat;hat
PG;1,3;1;2;3
HU;2,5;0;2;5
JZ;1,8;1;0;4


2) my function:

read.data <- function(file,  ...){

   mydata <- read.csv2 (file, header=TRUE, as.is=TRUE) # read everything

   fc <- mydata[,2]  # extract FC column
   rownames(mydata) <- mydata[,1]# same for gene names to be used as
rownames
   mydata <- mydata[,-(1:2)] # drop those two columns
   attr(mydata, 'fc') <- fc   # attach FC back as attribute
   return(mydata)# and return
   # return fc or return attr(mydata, 'fc') does not work
}


What doesn't work? Why should it? If you return fc or attr() you lose 
mydata, In R what goes on inside a function stays there unless it s 
returned.

Your return(mydata) in the previous line does what you want!!!
Meaning, it reutrns mydata with it's attributes, including fc.



further code is like this:

newdata <- read.data("testfile")
head (newdata) #works but not
fc # or
attr(mydata, 'fc') # it does not work if I include return fc or attr... in
my function neither



Of course it doesn't work, mydata does NOT exist here, it only exists 
inside the function read.data(). The function returns the VALUE OF 
mydata, not the variable itself. R works in a call/return by value basis.


Then you assign the return value to newdata. So what you want is 
attr(newdata, 'fc') like my code asks for, and it does work.




Then I can get access to newdata and on but not on fc or attr(mydata, 'fc').

Is there a possibility to use function and having access to attr AND
newdata?


Yes.

But no longer to mydata. Its value yes, it's now in newdata. You are 
obviuosly mistaking one for the other. Do not worry. Everything is 
working like it should. You are correctly reading the data in, assigning 
row names, assigning an attribute called fc with some decimal values in 
it and correctly returning what you need be returned.


Everything is allright. Read my code and run it.

Hope this helps,

Rui Barradas



Thanks
Sincerely
Hermann Norpois

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NTH PERCENTILE COULMNWIESE

2012-07-18 Thread Rui Barradas

Hello,

Try the following.


colQuant <- function(x, probs= seq(0, 1, 0.25), na.rm = FALSE,
 names = TRUE, type = 7, ...){
	apply(x, 2, quantile, probs = probs, na.rm = na.rm, names = names, type 
= type, ...)

}

mat <- matrix(rnorm(12), 4)
colQuant(mat, 0.20)
colQuant(mat, c(0.20, 0.5))


Hope this helps,

Rui Barradas

Em 18-07-2012 08:06, Rantony escreveu:

Hi,

Here i have an matrix
mydat <-
ABCXYZ
- --
12  6
6 50
90 100
55   85
100 25

i need to find the " NTH percentile "  [result should be in column-wise].
here i have a code for NTH percenile,

For eg:- if i need 20th-percentile then,
quantile(ncol(mydat),.2)  - here i getting output for complete matrix ,But i
need this for all the columnswise. like this,

for nth percentile
   ABC XYZ
   -----
20%   1024[here, given percentile value is not exact result, its
just for output format]

-Its urgent !

- Thanks
Antony.


--
View this message in context: 
http://r.789695.n4.nabble.com/NTH-PERCENTILE-COULMNWIESE-tp4636839.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] complexity of operations in R

2012-07-18 Thread Rui Barradas

Hello,

Em 18-07-2012 09:06, Patrick Burns escreveu:

It looks to me like the following
should do what you want:

f2 <- function(dotot) array(FALSE, c(dotot, 1))

What am I missing?


That matrix is even faster?


f2 <- function(dotot) array(FALSE, c(dotot, 1))
f3 <- function(dotot) matrix(FALSE, dotot, 1)

t2 <- system.time(replicate(1e4, f2(1000)))
t3 <- system.time(replicate(1e4, f3(1000)))
rbind(t2, t3, t2/t3)

Rui Barradas


Pat

On 17/07/2012 21:58, Johan Henriksson wrote:

thanks for the link! I should read it through. that said, I didn't find
any good general solution to the problem so here I post some attempts
for general input. maybe someone knows how to speed this up. both my
solutions are theoretically O(n) for creating a list of n elements. The
function to improve is O(n^2) which should suck tremendously - but the
slow execution of R probably blows up the constant factor of the smarter
solutions.

Array doubling comes close in speed for large lists but it would be
great if it could be comparable for smaller lists. One hidden cost I see
directly is that allocating a list in R is O(n), not O(1) (or close),
since it always fills it with values. Is there a way around this? I
guess by using C, one could just malloc() and leave the content
undefined - but is there no better way?

thanks,
/Johan



# the function we wish to improve

f<-function(dotot){
   v<-matrix(ncol=1,nrow=0)
   for(i in 1:dotot){
 v<-rbind(v,FALSE)
   }
   return(v)
}

##
# first attempt: linked lists

emptylist <- NA

addtolist <- function(x,prev){
   return(list(x,prev))
}

g<-function(dotot){
   v<-emptylist
   for(i in 1:dotot){
 v<-addtolist(FALSE,v)
   }
   return(v)
}


# second attempt: array doubling

emptyexpandlist<-list(nelem=0,l=matrix(ncol=1,nrow=0))

addexpandlist<-function(x,prev){
   if(nrow(prev$l)==prev$nelem){
 nextsize<-max(nrow(prev$l),1)
 prev$l<-rbind(prev$l,matrix(ncol=1,nrow=nextsize))
   }
   prev$nelem<-prev$nelem+1
   prev$l[prev$nelem]<-x
   return(prev)
}

compressexpandlist<-function(prev){
   return(as.vector(prev$l[1:prev$nelem]))
}

h<-function(dotot){
   v<-emptyexpandlist
   for(i in 1:dotot){
 v<-addexpandlist(FALSE,v)
   }
   return(compressexpandlist(v))
}

#

dotot=10
system.time(f(dotot))
#system.time(g(dotot))
system.time(h(dotot))








On Tue, Jul 17, 2012 at 8:42 PM, Patrick Burns mailto:pbu...@pburns.seanet.com>> wrote:

Johan,

If you don't know 'The R Inferno', it might
help a little.  Circle 2 has an example of
how to efficiently (relatively speaking) grow
an object if you don't know the final length.

http://www.burns-stat.com/__pages/Tutor/R_inferno.pdf


If you gave a simple example of how your code
looks now and what you want it to do, then you
might get some ideas of how to improve it.


Pat


On 17/07/2012 12:47, Johan Henriksson wrote:

Hello!
I am optimizing my code in R and for this I need to know a bit
more about
the internals. It would help tremendously if someone could link
me to a
page with O()-complexities of all the operations.

In this particular case, I need something like a linked list
with O(1)
insertLast/First ability. I can't preallocate a vector since I
do not know
the final size of the list ahead of time.

The classic array-doubling trick would give me O(1) amortized
time but it's
a bit messy. The other alternative I see would be to recursively
store
lists (elem, (elem, (elem, (..., which I take also would
work? But I'd
rather go for a standard R solution if there is one!

cheers,
/Johan


--
Patrick Burns
pbu...@pburns.seanet.com 
twitter: @portfolioprobe
http://www.portfolioprobe.com/__blog

http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')





--
--
---
Johan Henriksson, PhD
Karolinska Institutet
Ecobima AB - Custom solutions for life sciences
http://www.ecobima.se  http://mahogny.areta.org
http://www.endrov.net






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] TRUE/FALSE

2012-07-18 Thread Jim Lemon

On 07/17/2012 11:55 PM, cm wrote:

Any reason I'd get an error like this?

Error in if (round(pos[o + 1]) == (pos[o + 1] - 0.4)) { :
   missing value where TRUE/FALSE needed

but when i do it individually, out of the for loop,

(round(pos[o+1])==(pos[o+1]-.4) )

   65
TRUE

Elementary, my dear cm. pos is a vector of length n. From your 
description of the problem, I suspect that you are trying to run a loop 
that looks something like this:


for(o in 1:n)) {
 if(round(pos[o+1]) == (pos[o+1]-0.4))
  print(o)
}

The error occurs on the nth loop. The additional clue that you have 
unwittingly provided in the form of the number 65 that shouldn't be 
there brings me to the conclusion that n=65.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] duplicate data between two data frames according to row names

2012-07-18 Thread Eik Vettorazzi
Hi Jeff,
looks like a job for ?rbind and ?merge

merge(rbind(DF1,DF2),DF3)


hth

Am 18.07.2012 10:21, schrieb jeff6868:
> Hi everybody.
> 
> I'll first explain my problem and what I'm trying to do. 
> Admit this example:
> I'm working on 5 different weather stations.
> I have first in one file 3 of these 5 weather stations, containing their
> data. Here's an example of this file:
> 
> DF1 <- data.frame(station=c("ST001","ST004","ST005"),data=c(5,2,8))
> 
> And my two other stations in this other data.frame:
> 
> DF2 <- data.frame(station=c("ST002","ST003"),data=c(3,7))
> 
> I would like to add geographical coordinates of these weather stations
> inside these two data.frames, according to the number of the weather
> station.
> 
> All of my geographical coordinates for each of the 5 weather stations are
> inside another data frame:
> 
> DF3 <-
> data.frame(station=c("ST001","ST002","ST003","ST004","ST005"),lat=c(40,41,42,43,44),lon=c(1,2,3,4,5))
> 
> My question is: how can I put automatically these geographical coordinates
> inside my first 2 data frames, according to the number of the weather
> station?
> 
> For this example, the first two data frames DF1 and DF2 should become:
> 
> DF1 <-
> data.frame(station=c("ST001","ST004","ST005"),lat=c(40,43,44),lon=c(1,4,5),data=c(5,2,8))
> and
> DF2 <-
> data.frame(station=c("ST002","ST003"),lat=c(41,42),lon=c(2,3),data=c(3,7))
> 
> I need to automatize this method because my real dataset contains 70 weather
> stations, and each file contains other (or same sometimes) stations , but
> each station can be found in the list of the coordinates file (DF3).
> 
> Is there any way or any function able to do this kind of thing?
> 
> Thank you very much!
> 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/duplicate-data-between-two-data-frames-according-to-row-names-tp4636845.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-- 
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

--
Pflichtangaben gemäß Gesetz über elektronische Handelsregister und 
Genossenschaftsregister sowie das Unternehmensregister (EHUG):

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; 
Gerichtsstand: Hamburg

Vorstandsmitglieder: Prof. Dr. Guido Sauter (Vertreter des Vorsitzenden), Dr. 
Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] contour

2012-07-18 Thread Akhil dua
Hello Everyone

I have the data long format and I want to draw the contour plot with it

x1 x2 x3
0   12
0   21
0   35
1   14
1   22
1   33


when I am using contour(x1,x2,x3,col=heat.colors) or fill.contour
its giving me an error that increasing x and y expected


So please tell me what is the right function to draw contour when the data
is not ordered and you cant order it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] complexity of operations in R

2012-07-18 Thread Prof Brian Ripley

On 18/07/2012 09:49, Rui Barradas wrote:

Hello,

Em 18-07-2012 09:06, Patrick Burns escreveu:

It looks to me like the following
should do what you want:

f2 <- function(dotot) array(FALSE, c(dotot, 1))

What am I missing?


That matrix is even faster?


Depends on your unstated version of R  Not so in current R (>= 
2.15.1 patched)



f2 <- function(dotot) array(FALSE, c(dotot, 1))
f3 <- function(dotot) matrix(FALSE, dotot, 1)


Using an integer constant for an integer helps if you do this often 
enough (1L, 1000L)




t2 <- system.time(replicate(1e4, f2(1000)))
t3 <- system.time(replicate(1e4, f3(1000)))
rbind(t2, t3, t2/t3)

Rui Barradas


Pat

On 17/07/2012 21:58, Johan Henriksson wrote:

thanks for the link! I should read it through. that said, I didn't find
any good general solution to the problem so here I post some attempts
for general input. maybe someone knows how to speed this up. both my
solutions are theoretically O(n) for creating a list of n elements. The
function to improve is O(n^2) which should suck tremendously - but the
slow execution of R probably blows up the constant factor of the smarter
solutions.

Array doubling comes close in speed for large lists but it would be
great if it could be comparable for smaller lists. One hidden cost I see
directly is that allocating a list in R is O(n), not O(1) (or close),
since it always fills it with values. Is there a way around this? I
guess by using C, one could just malloc() and leave the content
undefined - but is there no better way?

thanks,
/Johan



# the function we wish to improve

f<-function(dotot){
   v<-matrix(ncol=1,nrow=0)
   for(i in 1:dotot){
 v<-rbind(v,FALSE)
   }
   return(v)
}

##
# first attempt: linked lists

emptylist <- NA

addtolist <- function(x,prev){
   return(list(x,prev))
}

g<-function(dotot){
   v<-emptylist
   for(i in 1:dotot){
 v<-addtolist(FALSE,v)
   }
   return(v)
}


# second attempt: array doubling

emptyexpandlist<-list(nelem=0,l=matrix(ncol=1,nrow=0))

addexpandlist<-function(x,prev){
   if(nrow(prev$l)==prev$nelem){
 nextsize<-max(nrow(prev$l),1)
 prev$l<-rbind(prev$l,matrix(ncol=1,nrow=nextsize))
   }
   prev$nelem<-prev$nelem+1
   prev$l[prev$nelem]<-x
   return(prev)
}

compressexpandlist<-function(prev){
   return(as.vector(prev$l[1:prev$nelem]))
}

h<-function(dotot){
   v<-emptyexpandlist
   for(i in 1:dotot){
 v<-addexpandlist(FALSE,v)
   }
   return(compressexpandlist(v))
}

#

dotot=10
system.time(f(dotot))
#system.time(g(dotot))
system.time(h(dotot))








On Tue, Jul 17, 2012 at 8:42 PM, Patrick Burns mailto:pbu...@pburns.seanet.com>> wrote:

Johan,

If you don't know 'The R Inferno', it might
help a little.  Circle 2 has an example of
how to efficiently (relatively speaking) grow
an object if you don't know the final length.

http://www.burns-stat.com/__pages/Tutor/R_inferno.pdf


If you gave a simple example of how your code
looks now and what you want it to do, then you
might get some ideas of how to improve it.


Pat


On 17/07/2012 12:47, Johan Henriksson wrote:

Hello!
I am optimizing my code in R and for this I need to know a bit
more about
the internals. It would help tremendously if someone could link
me to a
page with O()-complexities of all the operations.

In this particular case, I need something like a linked list
with O(1)
insertLast/First ability. I can't preallocate a vector since I
do not know
the final size of the list ahead of time.

The classic array-doubling trick would give me O(1) amortized
time but it's
a bit messy. The other alternative I see would be to recursively
store
lists (elem, (elem, (elem, (..., which I take also would
work? But I'd
rather go for a standard R solution if there is one!

cheers,
/Johan


--
Patrick Burns
pbu...@pburns.seanet.com 
twitter: @portfolioprobe
http://www.portfolioprobe.com/__blog

http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')





--
--
---
Johan Henriksson, PhD
Karolinska Institutet
Ecobima AB - Custom solutions for life sciences
http://www.ecobima.se  http://mahogny.areta.org
http://www.endrov.net






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible c

Re: [R] Imposing more than one condition to if

2012-07-18 Thread Rui Barradas

hELLO,

There was no nedd to change the names of the variables inside the 
fucntion. What was going on s that in this new file column 'dtime' 
doesn't exist. Since column 'clock' exists in all files, I've changed 
the function once again, making it more general.


Note that there is an argument 'colTime' with a default value. In the 
function's use below I call it with and without that argument.



f <- function(x, colTime="clock"){
zrle <- rle(x$lig == 0)
if(zrle$values[1]){
idusk <- sum(zrle$lengths[1:2]) + 1
idawn <- zrle$lengths[1] + 1
x$dusk <- x[ idusk , colTime ]
x$dawn <- x[ idawn , colTime ]
}else{
idusk <- zrle$lengths[1] + 1
x$dusk <- x[ idusk , colTime ]
x$dawn <- NA
}
x
}

str(d)
#d$date <- as.Date(d$date, format="%d/%m%/y")

#library(chron)
#tm <- times(as.character(d$clock))
#d$clock <- tm

# See what will happen. This call uses the default 'colTime'
bb <- by(d, d$date, f)
for(i in seq_along(bb)) print(head(bb[[i]], 1))

# call and rbind. This call uses explicit arg 'colTime'
do.call(rbind, by(d, d$date, f, colTime="clock"))

# Alternatively, it could be, because 'bb' is already created,
do.call(rbind, bb)


In the code above, I use an optional conversion to date and time 
classes; as.Date is part of base R, but class times needs package chron. 
It's a good idea to convert these variables, you can later use, say, 
arithmetics on dates and times, such as differences.


Hope this helps,

Rui Barradas

Em 17-07-2012 19:53, Santiago Guallar escreveu:

Thank for your time, Rui.
Now, I get this error message:
Error en rbind(deparse.level, ...) :
numbers of columns of arguments do not match
Apparently, some columns have missing values and rbind doesn't work. I
tried:
require(plyr)
do.call(rbind.fill, by(z, z$date, f))
Then the code runs through but dusk the variable dusk is missing and
dawn is filled with NA.
Just in case the problem simply lies in a name, this is your code after
I changed the object names (basically 'x' and 'd' by 'z') to adapt them
to the names of my dataset:
f <- function(z){
zrle <- rle(z$lig == 0)
if(zrle$values[1]){
idusk <- sum(zrle$lengths[1:2]) + 1
idawn <- zrle$lengths[1] + 1
z$dusk <- z$dtime[ idusk ]
z$dawn <- z$dtime[ idawn ]
}else{
idusk <- zrle$lengths[1] + 1
z$dusk <- z$dtime[ idusk ]
z$dawn <- NA
}
z
}

do.call(rbind, by(z, z$date, f))
Again, I attached a dput() with the object z which contains my dataset.
Santi

*From:* Rui Barradas 
*To:* Santiago Guallar 
*Cc:* r-help@r-project.org
*Sent:* Tuesday, July 17, 2012 11:52 AM
*Subject:* Re: [R] Imposing more than one condition to if


Hello,

My code couldn't find the right input columns because your real
data has
different names, it could only find the example dataset's names.

And there's another problem, my code would give correct answers
with a
limted number of possible inputs and fail with real data.

Corrected:


f <- function(x){
 zrle <- rle(x$lig == 0)
 if(zrle$values[1]){
 idusk <- sum(zrle$lengths[1:2]) + 1
 idawn <- zrle$lengths[1] + 1
 x$dusk <- x$dtime[ idusk ]
 x$dawn <- x$dtime[ idawn ]
 }else{
 idusk <- zrle$lengths[1] + 1
 x$dusk <- x$dtime[ idusk ]
 x$dawn <- NA
 }
 x
}

do.call(rbind, by(d, d$date, f))


One more thing, you are reading your dataset into a data.frame
forgetting that character strings become factors. Try str(d) to
see it.
('d' is the data.frame.) You could/should coerce the date/time
values to
appropriate classes, something like


d$time <- as.character(d$time)
d$time <- as.POSIXct(d$time, format="%d/%m/%y %H:%M:%S")
d$date <- as.character(d$date)
d$date <- as.Date(d$date, format="%d/%m/%y")


Hope this helps,

Rui Barradas

Em 17-07-2012 07:14, Santiago Guallar escreveu:
 > Thank you Rui,
 >
 > When applied to my original data, your code goes through
although it
 > does not produce the correct results: for dusk gives the
first time
 > value of next day, for dawn it gives NA. It seems that the
function f
 > doesn't find the right input columns.
 > A, Ilso had to push up the memory size.
 > Attached a file (containing just 3000 of the original c.
45000 rows)
 > after dput().
 >
 > Santi
 >
 >
 >

 >*From:* Rui Barradas mailto:ruipbarra...@sapo.pt>>
 >*To:* Santiago Guallar mailto:sgual...@yahoo.com>>
 >*Cc:* r-help@r-project.org 
 >*Sen

Re: [R] Problem creation tensor

2012-07-18 Thread Petr Savicky
On Tue, Jul 17, 2012 at 12:31:38PM +0200, Peppe Ricci wrote:
> Hi guys,
> 
> I need some help to analyzing my data.
> I start to describe my data: I have 21 matrices, every matrix on the
> rows has users and on columns has items, in my case films.
> Element of index (i, j) represent the rating expressed by user i about item j.
> I have a matrix for each of professions.
> An example of a this type of matrix is:
> 
> item 1item 2item 3item4
>   id user 11  ?  ?   5
>   id user 2?  3  3   ?
>   id user 32  ?  3   2
>   id user 4?  ?  ?   4
>   ...
> So user 1 don't like item 1 but he likes so much item 4, for item 2
> and 3 he hasn't expressed a rating, etc.
> I need to construct a tensor with n users, m items and 21 occupations.
> After I have construct the tensor I want apply Parafac.
> I read data from a CSV file and build each matrix for each occupation.
> 
> Didier Leibovici (author of PTAk package) suggested to me:
> 
> ok that's bit clearer you have 21 matrices ( 1 for each occupations)
> of users rating their preferences (from 1 to 5 but without rating all
> of them: missing values) of  m items.
> but I suppose the users are not the same across the 21 occupations
> (one has only one occupation  if you're talking about
> working/living occupation)
> so you can't create a tensor n users x m items x 21 occupations
> but you can build the contingencies of preferences m items x 21
> occupations x 5 ratings
> 
> One way to build your tensor m x 21 x 5 is:
> M1 is the first occupation (users x m) ...
> UserItem <-rbind(M1,M2, ...M21)
> 
> m=1682
> 
> for (j in 1:m){
> UserItem[,j] =factor(UserItem[,j],levels=1:5)
> }
> occ=factor(c(rep(1,dim(M1)[1]),rep(2,dim(M2)[1]),
> ...,rep(21,dim(M21)[1])),levels=1:21)
> 
> Z <- array(rep(0,m*21*5),c(m,21,5),
> list(paste("item",1:m,sep=""),paste("Occ",1:21,sep=""),c("pr1","pr2","pr3","pr4","pr5")))
> for ( i in 1:m){
>   as.matrix(table(occ, UserItem[,2]))
>   Z[i,,]=table(occ, UserItem[,i])
> }
> 
> Z.CAND <- CANPARA(Z,dim=7)
> 
> I have implemented this code but I have one error in correspondance of:
> 
>   for ( i in 1:m){
> Z[i,,]=table(occ,UserItem[,i])
>   }
> 
> and error is:
> 
> Error in
> Z[i,,]=table(occ,UserItem[,i])
> the number of elements to be replaced is not a multiple of the length
> of substitution

Hi.

The problem in this code is that the command

  UserItem <- rbind(M1, M2, ..., M21)

produces a matrix and not a data.frame. Due to this, the commands

UserItem[, j] <- factor(UserItem[, j], levels=1:5)

do not convert the columns to factors, but the columns remain numeric.
Due to this, the table created as

  table(occ, UserItem[, i])

may not have the full size, since the columns correspond only to
preferences, which do occur in UserItem[, i], and not to all possible
preferences.

Changing 

  UserItem <- rbind(M1, M2, ..., M21)

to

  UserItem <- data.frame(rbind(M1, M2, ..., M21))

can resolve the problem, since then the columns will be coerced to factors,
whose list of levels is complete, even if some level is not used.

For better clarity, consider the definition of the array in an equivalent
form

  Z <- array(0, dim=c(m, 21, 5),
  dimnames=list(paste("item", 1:m, sep=""), paste("Occ", 1:21, sep=""),
  c("pr1", "pr2", "pr3", "pr4", "pr5")))

which contains the names of the used arguments of the function array().

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem creation tensor

2012-07-18 Thread Petr Savicky
On Wed, Jul 18, 2012 at 11:38:59AM +0200, Petr Savicky wrote:
> On Tue, Jul 17, 2012 at 12:31:38PM +0200, Peppe Ricci wrote:
> > Hi guys,
> > 
> > I need some help to analyzing my data.
> > I start to describe my data: I have 21 matrices, every matrix on the
> > rows has users and on columns has items, in my case films.
> > Element of index (i, j) represent the rating expressed by user i about item 
> > j.
> > I have a matrix for each of professions.
> > An example of a this type of matrix is:
> > 
> > item 1item 2item 3item4
> >   id user 11  ?  ?   5
> >   id user 2?  3  3   ?
> >   id user 32  ?  3   2
> >   id user 4?  ?  ?   4
> >   ...
> > So user 1 don't like item 1 but he likes so much item 4, for item 2
> > and 3 he hasn't expressed a rating, etc.
> > I need to construct a tensor with n users, m items and 21 occupations.
> > After I have construct the tensor I want apply Parafac.
> > I read data from a CSV file and build each matrix for each occupation.
> > 
[...]
> > I have implemented this code but I have one error in correspondance of:
> > 
> >   for ( i in 1:m){
> > Z[i,,]=table(occ,UserItem[,i])
> >   }
> > 
> > and error is:
> > 
> > Error in
> > Z[i,,]=table(occ,UserItem[,i])
> > the number of elements to be replaced is not a multiple of the length
> > of substitution
> 
> Hi.
> 
> The problem in this code is that the command
> 
>   UserItem <- rbind(M1, M2, ..., M21)
> 
> produces a matrix and not a data.frame. Due to this, the commands

Hi.

Let me include a few more comments. The function rbind() preserves the
types "matrix" and "data.frame". So, if Mi are indeed matrices, then
the above applies.

> UserItem[, j] <- factor(UserItem[, j], levels=1:5)
> 
> do not convert the columns to factors, but the columns remain numeric.
> Due to this, the table created as
> 
>   table(occ, UserItem[, i])
> 
> may not have the full size, since the columns correspond only to
> preferences, which do occur in UserItem[, i], and not to all possible
> preferences.

This can be demonstrated by the following example.

  occ <- 1:3
  pref <- c(1, 3, 4)
  table(occ, pref)

 pref
  occ 1 3 4
1 1 0 0
2 0 1 0
3 0 0 1

and

  table(occ, pref=factor(pref, levels=1:5))

 pref
  occ 1 2 3 4 5
1 1 0 0 0 0
2 0 0 1 0 0
3 0 0 0 1 0

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 'symbols' not plotting correct circle radii

2012-07-18 Thread Stuart Leask
Hi there. 

I have been plotting some circles using 'symbols', with radii representing my 
data, but the radii looked incorrect. 

It seems to happen with a single circle too:

Symbols ( 0, 0, circles = 40, xlim = c(-40, 40), ylim= c(-40, 40))

If I put a ruler up to my monitor (technology!)  to compare the radius with the 
axes, the circle isn't radius 40; it is closer to 15... 

I suspect there is a simple answer.


Stuart
This message and any attachment are intended solely for the addressee and may 
contain confidential information. If you have received this message in error, 
please send it back to me, and immediately delete it.   Please do not use, copy 
or disclose the information contained in this message or in any attachment.  
Any views or opinions expressed by the author of this email do not necessarily 
reflect the views of the University of Nottingham.

This message has been checked for viruses but the contents of an attachment
may still contain software viruses which could damage your computer system:
you are advised to perform your own checks. Email communications with the
University of Nottingham may be monitored as permitted by UK legislation.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] contour

2012-07-18 Thread Rui Barradas

Hello,

Follow the code below and see what's the right way:


d <- read.table(text="
x1 x2 x3
0   12
0   21
0   35
1   14
1   22
1   33
", header=TRUE)

x3 <- matrix(d[, "x3"], ncol=2)
levels <- sort(unique(x3))
contour(0:1, 1:3, t(x3), levels=levels)


Hope this helps,

Rui Barradas

Em 18-07-2012 10:23, Akhil dua escreveu:

Hello Everyone

I have the data long format and I want to draw the contour plot with it

x1 x2 x3
0   12
0   21
0   35
1   14
1   22
1   33


when I am using contour(x1,x2,x3,col=heat.colors) or fill.contour
its giving me an error that increasing x and y expected


So please tell me what is the right function to draw contour when the data
is not ordered and you cant order it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] duplicate data between two data frames according to row names

2012-07-18 Thread jeff6868
"merge" is enough for me, thanks!
I was thinking about a loop, or a function like "grep", or maybe another
function.
I'll have to think easier next time!
Thanks again! 

--
View this message in context: 
http://r.789695.n4.nabble.com/duplicate-data-between-two-data-frames-according-to-row-names-tp4636845p4636859.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem creation tensor

2012-07-18 Thread GMailPegasus

what is the problem?
gr

-Messaggio originale- 
From: Kjetil Halvorsen

Sent: Wednesday, July 18, 2012 1:01 AM
To: Peppe Ricci
Cc: r-help@r-project.org
Subject: Re: [R] Problem creation tensor

Amusing that someone named RICCI is asking about tensors 
(sorry!)

Kjetil

On Tue, Jul 17, 2012 at 6:31 AM, Peppe Ricci  wrote:

Hi guys,

I need some help to analyzing my data.
I start to describe my data: I have 21 matrices, every matrix on the
rows has users and on columns has items, in my case films.
Element of index (i, j) represent the rating expressed by user i about 
item j.

I have a matrix for each of professions.
An example of a this type of matrix is:

item 1item 2item 3item4
  id user 11  ?  ?   5
  id user 2?  3  3   ?
  id user 32  ?  3   2
  id user 4?  ?  ?   4
  ...
So user 1 don't like item 1 but he likes so much item 4, for item 2
and 3 he hasn't expressed a rating, etc.
I need to construct a tensor with n users, m items and 21 occupations.
After I have construct the tensor I want apply Parafac.
I read data from a CSV file and build each matrix for each occupation.

Didier Leibovici (author of PTAk package) suggested to me:

ok that's bit clearer you have 21 matrices ( 1 for each occupations)
of users rating their preferences (from 1 to 5 but without rating all
of them: missing values) of  m items.
but I suppose the users are not the same across the 21 occupations
(one has only one occupation  if you're talking about
working/living occupation)
so you can't create a tensor n users x m items x 21 occupations
but you can build the contingencies of preferences m items x 21
occupations x 5 ratings

One way to build your tensor m x 21 x 5 is:
M1 is the first occupation (users x m) ...
UserItem <-rbind(M1,M2, ...M21)

m=1682

for (j in 1:m){
UserItem[,j] =factor(UserItem[,j],levels=1:5)
}
occ=factor(c(rep(1,dim(M1)[1]),rep(2,dim(M2)[1]),
...,rep(21,dim(M21)[1])),levels=1:21)

Z <- array(rep(0,m*21*5),c(m,21,5),
list(paste("item",1:m,sep=""),paste("Occ",1:21,sep=""),c("pr1","pr2","pr3","pr4","pr5")))
for ( i in 1:m){
  as.matrix(table(occ, UserItem[,2]))
  Z[i,,]=table(occ, UserItem[,i])
}

Z.CAND <- CANPARA(Z,dim=7)

I have implemented this code but I have one error in correspondance of:

  for ( i in 1:m){
Z[i,,]=table(occ,UserItem[,i])
  }

and error is:

Error in
Z[i,,]=table(occ,UserItem[,i])
the number of elements to be replaced is not a multiple of the length
of substitution

Can anyone help me to understand this code and how I can resolve the 
error?

Thanks.
Best regards.
Giuseppe

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'symbols' not plotting correct circle radii

2012-07-18 Thread Jim Lemon

On 07/18/2012 08:02 PM, Stuart Leask wrote:

Hi there.

I have been plotting some circles using 'symbols', with radii representing my 
data, but the radii looked incorrect.

It seems to happen with a single circle too:

Symbols ( 0, 0, circles = 40, xlim = c(-40, 40), ylim= c(-40, 40))

If I put a ruler up to my monitor (technology!)  to compare the radius with the 
axes, the circle isn't radius 40; it is closer to 15...

I suspect there is a simple answer.


Hi Stuart,
The answer does not appear to be simple. While the documentation for 
symbols says that the "inches" argument controls the size of the 
circles, it appears to do so with reference to the user units of the 
plot, not to that quaint unit descending from the twelfth part of 
someone's foot. So, whatever numbers you pass as "circles" seem to be 
scaled to a maximum dimension of whatever "inches" is in user units, at 
least on the x11 graphics device.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to have original (name) order after melt and cast command

2012-07-18 Thread Vincy Pyne
Dear R helpers,

I have a data.frame as given below -

dat1 = data.frame(date = 
as.Date(c("3/30/12","3/29/12","3/28/12","3/27/12","3/26/12",
"3/23/12","3/22/12","3/21/12","3/20/12", 
"3/30/12","3/29/12","3/28/12","3/27/12",
"3/26/12","3/23/12","3/22/12","3/21/12","3/20/12", 
"3/30/12","3/29/12","3/28/12",
"3/27/12","3/26/12","3/23/12","3/22/12","3/21/12","3/20/12"), 
format="%m/%d/%y"),

name = as.character(c("xyz","xyz","xyz","xyz","xyz","xyz","xyz","xyz", 
"xyz","abc", "abc","abc","abc","abc","abc", "abc","abc","abc","lmn","lmn", 
"lmn","lmn",  "lmn","lmn", "lmn","lmn","lmn")),

rate = c(c(0.065550707, 0.001825007, 0.054441969, 0.020810572, 0.073430586, 
0.037299722, 0.099807733, 0.042072817, 0.099487289, 5.550737022, 4.877620777,  
5.462477493, 4.972518082, 5.01495407, 5.820459609, 5.403881954, 5.009506516, 
4.807763909, 10.11885434,10.1856975,10.04976806,10.15428632, 10.20399335, 
10.22966704,10.20967742,10.22927793,10.02439192)))

> dat1
 date  name rate
1  2012-03-30  xyz  0.065550707
2  2012-03-29  xyz  0.001825007
3  2012-03-28  xyz  0.054441969
4  2012-03-27  xyz  0.020810572
5  2012-03-26  xyz  0.073430586
6  2012-03-23  xyz  0.037299722
7  2012-03-22  xyz  0.099807733
8  2012-03-21  xyz  0.042072817
9  2012-03-20  xyz  0.099487289
10 2012-03-30  abc  5.550737022
11 2012-03-29  abc  4.877620777
12 2012-03-28  abc  5.462477493
13 2012-03-27  abc  4.972518082
14 2012-03-26  abc  5.014954070
15 2012-03-23  abc  5.820459609
16 2012-03-22  abc  5.403881954
17 2012-03-21  abc  5.009506516
18 2012-03-20  abc  4.807763909
19 2012-03-30  lmn 10.118854340
20 2012-03-29  lmn 10.185697500
21 2012-03-28  lmn 10.049768060
22 2012-03-27  lmn 10.154286320
23 2012-03-26  lmn 10.203993350
24 2012-03-23  lmn 10.229667040
25 2012-03-22  lmn 10.209677420
26 2012-03-21  lmn 10.229277930
27 2012-03-20  lmn 10.024391920


attach(dat1)

library(plyr)
library(reshape)


in.melt <- melt(dat1, measure = 'rate')
(df = cast(in.melt, date ~ name))

df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]


> df_sorted
    date abc lmn xyz
9 2012-03-30    5.550737 10.11885 0.065550707
8 2012-03-29    4.877621 10.18570 0.001825007
7 2012-03-28    5.462477 10.04977 0.054441969
6 2012-03-27    4.972518 10.15429 0.020810572
5 2012-03-26    5.014954 10.20399 0.073430586
4 2012-03-23    5.820460 10.22967 0.037299722
3 2012-03-22    5.403882 10.20968 0.099807733
2 2012-03-21    5.009507 10.22928 0.042072817
1 2012-03-20    4.807764 10.02439 0.099487289


My Problem :-

The original data.frame has the order name as "xyz", "abc" and "lmn". However, 
after melt and cast command, the order in the "df_sorted" has changed to "abc", 
"lmn" and " xyz". How do I maintain the original order in "df_sorted" i.e. I 
need 

    date   xyz     abc   lmn    

9 2012-03-30   0.065550707   5.550737   10.11885 

8 2012-03-29   0.001825007   4.877621   10.18570 

7 2012-03-28   0.054441969   5.462477   10.04977 

6 2012-03-27   0.020810572   4.972518   10.15429 

5 2012-03-26   0.073430586   5.014954   10.20399 

4 2012-03-23   0.037299722   5.820460   10.22967 

3 2012-03-22   0.099807733   5.403882   10.20968 

2 2012-03-21   0.042072817   5.009507   10.22928 

1 2012-03-20   0.099487289   4.807764   10.02439 


Kindly guide

Thanking in advance

Vincy 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] contour

2012-07-18 Thread Eik Vettorazzi
Hi Akhil,
in addition to Rui's post, here is a solution using lattice graphics.

tmp<-read.table(textConnection("x1 x2 x3
0   12
0   21
0   35
1   14
1   22
1   33"),header=T)

library(lattice)
contourplot(x3~x1*x2,tmp)

hth.

Am 18.07.2012 11:23, schrieb Akhil dua:
> Hello Everyone
> 
> I have the data long format and I want to draw the contour plot with it
> 
> x1 x2 x3
> 0   12
> 0   21
> 0   35
> 1   14
> 1   22
> 1   33
> 
> 
> when I am using contour(x1,x2,x3,col=heat.colors) or fill.contour
> its giving me an error that increasing x and y expected
> 
> 
> So please tell me what is the right function to draw contour when the data
> is not ordered and you cant order it.
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

--
Pflichtangaben gemäß Gesetz über elektronische Handelsregister und 
Genossenschaftsregister sowie das Unternehmensregister (EHUG):

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; 
Gerichtsstand: Hamburg

Vorstandsmitglieder: Prof. Dr. Guido Sauter (Vertreter des Vorsitzenden), Dr. 
Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] contour

2012-07-18 Thread Rui Barradas

Hello,

Now I don't understand. Inline

Em 18-07-2012 11:56, Akhil dua escreveu:

why are you writing ncol=2 ?

I have levels=100 for x1 and the x1 is my z matrix



In your contour instruction x3 is the z matrix, not x1. And your dataset 
shows a 2x3 grid, hence ncol=2, for (x1 times x2) 0:1x1:3.


Also, see Eik's post, lattic graphics clearly are more intuitive.

Rui Barradas




On Wed, Jul 18, 2012 at 3:46 PM, Rui Barradas mailto:ruipbarra...@sapo.pt>> wrote:

Hello,

Follow the code below and see what's the right way:


d <- read.table(text="

x1 x2 x3
0   12
0   21
0   35
1   14
1   22
1   33
", header=TRUE)

x3 <- matrix(d[, "x3"], ncol=2)
levels <- sort(unique(x3))
contour(0:1, 1:3, t(x3), levels=levels)


Hope this helps,

Rui Barradas

Em 18-07-2012 10:23, Akhil dua escreveu:

Hello Everyone

I have the data long format and I want to draw the contour plot
with it

x1 x2 x3
0   12
0   21
0   35
1   14
1   22
1   33


when I am using contour(x1,x2,x3,col=heat.__colors) or fill.contour
its giving me an error that increasing x and y expected


So please tell me what is the right function to draw contour
when the data
is not ordered and you cant order it.

 [[alternative HTML version deleted]]


R-help@r-project.org  mailing list
https://stat.ethz.ch/mailman/__listinfo/r-help

PLEASE do read the posting guide
http://www.R-project.org/__posting-guide.html

and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mean of matched data

2012-07-18 Thread robgriffin247
Hi
I think/hope there will be a simple solution to this but google-ing has
provided no answers (probably not using the right words)

I have a long data frame of >2 000 000 rows, and 6 columns. Across this
there are 24 000 combinations of gene in a column (n=12000) and gender in a
column (n=2... obviously). I want to create 2 new columns in the data frame
that on each row gives, in one column the mean value (of gene expression, in
the column called "value") for that row's gene&gender combination, and in
the other column the standard deviation for the gene&gender combination.

Any suggestions?

Rob

Example of the top of the data frame:

genevariablevalue   gender  linerep
1   CG1 X208.F1.30456   4.758010Female  208 1
2   CG1 X365.F2.30478   4.915395Female  365 2
3   CG1 X799.F2.30509   4.641636Female  799 2
4   CG1 X306.M2.32650   4.550676Male306 2
5   CG1 X712.M2.30830   4.633811Male712 2
6   CG1 X732.M2.30504   4.857564Male732 2
7   CG1 X707.F1.31120   5.104165Female  707 1
8   CG1 X514.F2.30493   4.730814Female  514 2

--
View this message in context: 
http://r.789695.n4.nabble.com/Mean-of-matched-data-tp4636856.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] check whether connection can be opened

2012-07-18 Thread Berry Boessenkool


Hi all,

I'm working on a function that reads online data that is only available to 
certain IPs. It then writes a subset of the data into a file.
So whenever I'm logged in elsewhere or am not connected to the internet, I get 
an error, and the function is terminated.
I want it to rather print a message into the file.

Is there any way to test whether a connection can be opened?
Analogous to isOpen something like canOpen...
Or would I want to look into the error handling options?

By the way, the data is the internet volume used so far, which I want to track 
regularly (on every R-Start via Rprofile).
My brother (a computer schientist in progress) did not know a solution, so I 
set out to try this with R - which I got to work fine (except the above 
mentioned). 
"R beats java", I told him, to which he responded nothing ;-)
Isn't it amazing, what one can do with R?

Thanks ahead,
Berry Boessenkool

  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to have original (name) order after melt and cast command

2012-07-18 Thread Vincy Pyne
Dear Mr Rui Barradas,

Thanks a lot for your wonderful suggestion. It worked and will help me 
immensely in future too. Really heartfelt thanks once again.

Vincy

--- On Wed, 7/18/12, Rui Barradas  wrote:

From: Rui Barradas 
Subject: Re: [R] How to have original (name) order after melt and cast command
To: "Vincy Pyne" 
Cc: r-help@r-project.org
Received: Wednesday, July 18, 2012, 11:18 AM

Hello,

Try the following.

# This is your code
df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]

# This is my code
nams <- as.character(unique(dat1$name))
nums <- sapply(nams, function(nm) which(names(df_sorted) %in% nm))
df_sorted[, sort(nums)] <- df_sorted[, nams]
names(df_sorted)[sort(nums)] <- nams
df_sorted


Hope this helps,

Rui Barradas

Em 18-07-2012 11:52, Vincy Pyne escreveu:
> Dear R helpers,
>
> I have a data.frame as given below -
>
> dat1 = data.frame(date = 
> as.Date(c("3/30/12","3/29/12","3/28/12","3/27/12","3/26/12",
> "3/23/12","3/22/12","3/21/12","3/20/12", 
> "3/30/12","3/29/12","3/28/12","3/27/12",
> "3/26/12","3/23/12","3/22/12","3/21/12","3/20/12", 
> "3/30/12","3/29/12","3/28/12",
> "3/27/12","3/26/12","3/23/12","3/22/12","3/21/12","3/20/12"), 
> format="%m/%d/%y"),
>
> name = as.character(c("xyz","xyz","xyz","xyz","xyz","xyz","xyz","xyz", 
> "xyz","abc", "abc","abc","abc","abc","abc", "abc","abc","abc","lmn","lmn", 
> "lmn","lmn",  "lmn","lmn", "lmn","lmn","lmn")),
>
> rate = c(c(0.065550707, 0.001825007, 0.054441969, 0.020810572, 0.073430586, 
> 0.037299722, 0.099807733, 0.042072817, 0.099487289, 5.550737022, 
> 4.877620777,  5.462477493, 4.972518082, 5.01495407, 5.820459609, 5.403881954, 
> 5.009506516,
> 4.807763909, 10.11885434,10.1856975,10.04976806,10.15428632, 10.20399335, 
> 10.22966704,10.20967742,10.22927793,10.02439192)))
>
>> dat1
>           date      name         rate
> 1  2012-03-30  xyz  0.065550707
> 2  2012-03-29  xyz  0.001825007
> 3  2012-03-28  xyz  0.054441969
> 4  2012-03-27  xyz  0.020810572
> 5  2012-03-26  xyz  0.073430586
> 6  2012-03-23  xyz  0.037299722
> 7  2012-03-22  xyz  0.099807733
> 8  2012-03-21  xyz  0.042072817
> 9  2012-03-20  xyz  0.099487289
> 10 2012-03-30  abc  5.550737022
> 11 2012-03-29  abc  4.877620777
> 12 2012-03-28  abc  5.462477493
> 13 2012-03-27  abc  4.972518082
> 14 2012-03-26  abc  5.014954070
> 15 2012-03-23  abc  5.820459609
> 16 2012-03-22  abc  5.403881954
> 17 2012-03-21  abc  5.009506516
> 18 2012-03-20  abc  4.807763909
> 19 2012-03-30  lmn 10.118854340
> 20 2012-03-29  lmn 10.185697500
> 21 2012-03-28  lmn 10.049768060
> 22 2012-03-27  lmn 10.154286320
> 23 2012-03-26  lmn 10.203993350
> 24 2012-03-23  lmn 10.229667040
> 25 2012-03-22  lmn 10.209677420
> 26 2012-03-21  lmn 10.229277930
> 27 2012-03-20  lmn 10.024391920
>
>
> attach(dat1)
>
> library(plyr)
> library(reshape)
>
>
> in.melt <- melt(dat1, measure = 'rate')
> (df = cast(in.melt, date ~ name))
>
> df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]
>
>
>> df_sorted
>          date         abc         lmn         xyz
> 9 2012-03-30    5.550737 10.11885 0.065550707
> 8 2012-03-29    4.877621 10.18570 0.001825007
> 7 2012-03-28    5.462477 10.04977 0.054441969
> 6 2012-03-27    4.972518 10.15429 0.020810572
> 5 2012-03-26    5.014954 10.20399 0.073430586
> 4 2012-03-23    5.820460 10.22967 0.037299722
> 3 2012-03-22    5.403882 10.20968 0.099807733
> 2 2012-03-21    5.009507 10.22928 0.042072817
> 1 2012-03-20    4.807764 10.02439 0.099487289
>
>
> My Problem :-
>
> The original data.frame has the order name as "xyz", "abc" and "lmn". 
> However, after melt and cast command, the order in the "df_sorted" has 
> changed to "abc", "lmn" and " xyz". How do I maintain the original order in 
> "df_sorted" i.e. I need
>
>          date       xyz                 abc           lmn
>
> 9 2012-03-30   0.065550707   5.550737   10.11885
>
> 8 2012-03-29   0.001825007   4.877621   10.18570
>
> 7 2012-03-28   0.054441969   5.462477   10.04977
>
> 6 2012-03-27   0.020810572   4.972518   10.15429
>
> 5 2012-03-26   0.073430586   5.014954   10.20399
>
> 4 2012-03-23   0.037299722   5.820460   10.22967
>
> 3 2012-03-22   0.099807733   5.403882   10.20968
>
> 2 2012-03-21   0.042072817   5.009507   10.22928
>
> 1 2012-03-20   0.099487289   4.807764   10.02439
>
>
> Kindly guide
>
> Thanking in advance
>
> Vincy
>
>
>     [[alternative HTML version deleted]]
>
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible co

Re: [R] How to have original (name) order after melt and cast command

2012-07-18 Thread Rui Barradas

Hello,

Try the following.

# This is your code
df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]

# This is my code
nams <- as.character(unique(dat1$name))
nums <- sapply(nams, function(nm) which(names(df_sorted) %in% nm))
df_sorted[, sort(nums)] <- df_sorted[, nams]
names(df_sorted)[sort(nums)] <- nams
df_sorted


Hope this helps,

Rui Barradas

Em 18-07-2012 11:52, Vincy Pyne escreveu:

Dear R helpers,

I have a data.frame as given below -

dat1 = data.frame(date = 
as.Date(c("3/30/12","3/29/12","3/28/12","3/27/12","3/26/12",
"3/23/12","3/22/12","3/21/12","3/20/12", 
"3/30/12","3/29/12","3/28/12","3/27/12",
"3/26/12","3/23/12","3/22/12","3/21/12","3/20/12", 
"3/30/12","3/29/12","3/28/12",
"3/27/12","3/26/12","3/23/12","3/22/12","3/21/12","3/20/12"), 
format="%m/%d/%y"),

name = as.character(c("xyz","xyz","xyz","xyz","xyz","xyz","xyz","xyz", "xyz","abc", "abc","abc","abc","abc","abc", "abc","abc","abc","lmn","lmn", 
"lmn","lmn",  "lmn","lmn", "lmn","lmn","lmn")),

rate = c(c(0.065550707, 0.001825007, 0.054441969, 0.020810572, 0.073430586, 
0.037299722, 0.099807733, 0.042072817, 0.099487289, 5.550737022, 4.877620777,  
5.462477493, 4.972518082, 5.01495407, 5.820459609, 5.403881954, 5.009506516,
4.807763909, 10.11885434,10.1856975,10.04976806,10.15428632, 10.20399335, 
10.22966704,10.20967742,10.22927793,10.02439192)))


dat1

  date  name rate
1  2012-03-30  xyz  0.065550707
2  2012-03-29  xyz  0.001825007
3  2012-03-28  xyz  0.054441969
4  2012-03-27  xyz  0.020810572
5  2012-03-26  xyz  0.073430586
6  2012-03-23  xyz  0.037299722
7  2012-03-22  xyz  0.099807733
8  2012-03-21  xyz  0.042072817
9  2012-03-20  xyz  0.099487289
10 2012-03-30  abc  5.550737022
11 2012-03-29  abc  4.877620777
12 2012-03-28  abc  5.462477493
13 2012-03-27  abc  4.972518082
14 2012-03-26  abc  5.014954070
15 2012-03-23  abc  5.820459609
16 2012-03-22  abc  5.403881954
17 2012-03-21  abc  5.009506516
18 2012-03-20  abc  4.807763909
19 2012-03-30  lmn 10.118854340
20 2012-03-29  lmn 10.185697500
21 2012-03-28  lmn 10.049768060
22 2012-03-27  lmn 10.154286320
23 2012-03-26  lmn 10.203993350
24 2012-03-23  lmn 10.229667040
25 2012-03-22  lmn 10.209677420
26 2012-03-21  lmn 10.229277930
27 2012-03-20  lmn 10.024391920


attach(dat1)

library(plyr)
library(reshape)


in.melt <- melt(dat1, measure = 'rate')
(df = cast(in.melt, date ~ name))

df_sorted = df[order(as.Date(df$date, "%m/%d/%Y"), decreasing = TRUE),]



df_sorted

 date abc lmn xyz
9 2012-03-305.550737 10.11885 0.065550707
8 2012-03-294.877621 10.18570 0.001825007
7 2012-03-285.462477 10.04977 0.054441969
6 2012-03-274.972518 10.15429 0.020810572
5 2012-03-265.014954 10.20399 0.073430586
4 2012-03-235.820460 10.22967 0.037299722
3 2012-03-225.403882 10.20968 0.099807733
2 2012-03-215.009507 10.22928 0.042072817
1 2012-03-204.807764 10.02439 0.099487289


My Problem :-

The original data.frame has the order name as "xyz", "abc" and "lmn". However, after melt and cast command, the order in the 
"df_sorted" has changed to "abc", "lmn" and " xyz". How do I maintain the original order in "df_sorted" i.e. 
I need

 date   xyz abc   lmn

9 2012-03-30   0.065550707   5.550737   10.11885

8 2012-03-29   0.001825007   4.877621   10.18570

7 2012-03-28   0.054441969   5.462477   10.04977

6 2012-03-27   0.020810572   4.972518   10.15429

5 2012-03-26   0.073430586   5.014954   10.20399

4 2012-03-23   0.037299722   5.820460   10.22967

3 2012-03-22   0.099807733   5.403882   10.20968

2 2012-03-21   0.042072817   5.009507   10.22928

1 2012-03-20   0.099487289   4.807764   10.02439


Kindly guide

Thanking in advance

Vincy


[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mean of matched data

2012-07-18 Thread Rui Barradas

Helo,

All problems should be easy.


d <- read.table(text="
gene variable value gender line rep
1 CG1 X208.F1.30456 4.758010 Female 208 1
2 CG1 X365.F2.30478 4.915395 Female 365 2
3 CG1 X799.F2.30509 4.641636 Female 799 2
4 CG1 X306.M2.32650 4.550676 Male 306 2
5 CG1 X712.M2.30830 4.633811 Male 712 2
6 CG1 X732.M2.30504 4.857564 Male 732 2
7 CG1 X707.F1.31120 5.104165 Female 707 1
8 CG1 X514.F2.30493 4.730814 Female 514 2
", header=TRUE)

# See what we have
str(d)

# or put function(x) ...etc... in the aggregate
f <- function(x) c(mean=mean(x), sd=sd(x))
aggregate(value ~ gene + gender, data = d, f)


Hope this helps,

Rui Barradas
Em 18-07-2012 10:54, robgriffin247 escreveu:

Hi
I think/hope there will be a simple solution to this but google-ing has
provided no answers (probably not using the right words)

I have a long data frame of >2 000 000 rows, and 6 columns. Across this
there are 24 000 combinations of gene in a column (n=12000) and gender in a
column (n=2... obviously). I want to create 2 new columns in the data frame
that on each row gives, in one column the mean value (of gene expression, in
the column called "value") for that row's gene&gender combination, and in
the other column the standard deviation for the gene&gender combination.

Any suggestions?

Rob

Example of the top of the data frame:

genevariablevalue   gender  linerep
1   CG1 X208.F1.30456   4.758010Female  208 1
2   CG1 X365.F2.30478   4.915395Female  365 2
3   CG1 X799.F2.30509   4.641636Female  799 2
4   CG1 X306.M2.32650   4.550676Male306 2
5   CG1 X712.M2.30830   4.633811Male712 2
6   CG1 X732.M2.30504   4.857564Male732 2
7   CG1 X707.F1.31120   5.104165Female  707 1
8   CG1 X514.F2.30493   4.730814Female  514 2

--
View this message in context: 
http://r.789695.n4.nabble.com/Mean-of-matched-data-tp4636856.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'symbols' not plotting correct circle radii

2012-07-18 Thread Sarah Goslee
On Wed, Jul 18, 2012 at 6:42 AM, Jim Lemon  wrote:
> On 07/18/2012 08:02 PM, Stuart Leask wrote:
>>
>> Hi there.
>>
>> I have been plotting some circles using 'symbols', with radii representing
>> my data, but the radii looked incorrect.
>>
>> It seems to happen with a single circle too:
>>
>> Symbols ( 0, 0, circles = 40, xlim = c(-40, 40), ylim= c(-40, 40))
>>
>> If I put a ruler up to my monitor (technology!)  to compare the radius
>> with the axes, the circle isn't radius 40; it is closer to 15...
>>
>> I suspect there is a simple answer.
>>
> Hi Stuart,
> The answer does not appear to be simple. While the documentation for symbols
> says that the "inches" argument controls the size of the circles, it appears
> to do so with reference to the user units of the plot, not to that quaint
> unit descending from the twelfth part of someone's foot. So, whatever
> numbers you pass as "circles" seem to be scaled to a maximum dimension of
> whatever "inches" is in user units, at least on the x11 graphics device.

>From ?symbols:

If inches is FALSE, the units are taken to be those of the appropriate axes.

Is that not what's wanted here?

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'symbols' not plotting correct circle radii

2012-07-18 Thread Jim Lemon

On 07/18/2012 09:42 PM, Sarah Goslee wrote:

On Wed, Jul 18, 2012 at 6:42 AM, Jim Lemon  wrote:

On 07/18/2012 08:02 PM, Stuart Leask wrote:


Hi there.

I have been plotting some circles using 'symbols', with radii representing
my data, but the radii looked incorrect.

It seems to happen with a single circle too:

Symbols ( 0, 0, circles = 40, xlim = c(-40, 40), ylim= c(-40, 40))

If I put a ruler up to my monitor (technology!)  to compare the radius
with the axes, the circle isn't radius 40; it is closer to 15...

I suspect there is a simple answer.


Hi Stuart,
The answer does not appear to be simple. While the documentation for symbols
says that the "inches" argument controls the size of the circles, it appears
to do so with reference to the user units of the plot, not to that quaint
unit descending from the twelfth part of someone's foot. So, whatever
numbers you pass as "circles" seem to be scaled to a maximum dimension of
whatever "inches" is in user units, at least on the x11 graphics device.



From ?symbols:


If inches is FALSE, the units are taken to be those of the appropriate axes.

Is that not what's wanted here?

Sarah

My interpretation was that Stuart wanted circles with radii that were 
the stated value of the "circles" argument in user units. As the 
"inches" argument is TRUE (1) by default, it only seems to serve as a 
scaling factor for the values of the "circles" argument. So, the 
combination of "circles=40" and "inches=1" yields a circle with a radius 
of 1 user unit. When I reran the example with "inches=0.5", I got 
circles with a maximum radius of 0.5 user units. I haven't tested this 
on any device other than x11.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] order or sort doesn´t work properly

2012-07-18 Thread Jean V Adams
# MCT ordered by the second column of MCT, from smallest to largest values
MCT[order(MCT[, 2]), ]

Jean


Trying To learn again  wrote on 07/17/2012 
04:42:24 PM:

> Hi all,
> 
> I want to order a series that is included on the second column in 
MCT.csv.
> 
> I do but R doesn´t order, could be because is a csv?
> 
> I have prove
> 
> MCT<-read.csv("MCT.csv")
> 
> a<-order(MCTor[,2],2,decreasing = FALSE)
> 
> 
> a<-order(MCTor[,2],1,decreasing = FALSE)
> 
> or the same with sort but didn´t worked.
> 
> It is suposed that a will have the ordered on ascending or descending
> direction, doesn´it?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'symbols' not plotting correct circle radii

2012-07-18 Thread Jim Lemon

On 07/18/2012 10:11 PM, Jim Lemon wrote:

On 07/18/2012 09:42 PM, Sarah Goslee wrote:

On Wed, Jul 18, 2012 at 6:42 AM, Jim Lemon wrote:

On 07/18/2012 08:02 PM, Stuart Leask wrote:


Hi there.

I have been plotting some circles using 'symbols', with radii
representing
my data, but the radii looked incorrect.

It seems to happen with a single circle too:

Symbols ( 0, 0, circles = 40, xlim = c(-40, 40), ylim= c(-40, 40))

If I put a ruler up to my monitor (technology!) to compare the radius
with the axes, the circle isn't radius 40; it is closer to 15...

I suspect there is a simple answer.


Hi Stuart,
The answer does not appear to be simple. While the documentation for
symbols
says that the "inches" argument controls the size of the circles, it
appears
to do so with reference to the user units of the plot, not to that
quaint
unit descending from the twelfth part of someone's foot. So, whatever
numbers you pass as "circles" seem to be scaled to a maximum
dimension of
whatever "inches" is in user units, at least on the x11 graphics device.



From ?symbols:


If inches is FALSE, the units are taken to be those of the appropriate
axes.

Is that not what's wanted here?

Sarah


My interpretation was that Stuart wanted circles with radii that were
the stated value of the "circles" argument in user units. As the
"inches" argument is TRUE (1) by default, it only seems to serve as a
scaling factor for the values of the "circles" argument. So, the
combination of "circles=40" and "inches=1" yields a circle with a radius
of 1 user unit. When I reran the example with "inches=0.5", I got
circles with a maximum radius of 0.5 user units. I haven't tested this
on any device other than x11.

Jim


Oops, I just realized that I left out saying that you were correct.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variable labels

2012-07-18 Thread Nutter, Benjamin
I have my own function for doing this that is similar to the one presented 
below.  Others may have other ideas that work better.  As a general rule, I 
would caution against writing out just the label without the variable name.  
The only reason I see to separate the labels and names is if you are generating 
a report (and using write.table wouldn't be my first choice for doing that).  
However, the function below at least allows you to decide when you call it.  If 
you want to exclude the names, just set names=FALSE.


#*** possibly buggy.  I didn't do much testing on this function.
write.table.with.labels <- function(data, file, names=TRUE, labels=TRUE, 
append=FALSE, ...){
  if (names) write.table(t(names(data)), file, col.names=FALSE, ...)
  if (labels) write.table(t(label(data)), file, col.names=FALSE, 
append=any(c(names, append)), ...)
  write.table(data, file, col.names=FALSE, append=TRUE, any(c(names, labels, 
append)), ...)
}

label(mtcars$mpg) <- "Miles per gallon"

write.table.with.labels(mtcars, "test.csv", sep=",")



  Benjamin Nutter |  Biostatistician     |  Quantitative Health Sciences
  Cleveland Clinic    |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 
445-1365


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Francois Maurice
Sent: Tuesday, July 17, 2012 4:36 PM
To: r-help@r-project.org
Subject: [R] Variable labels



Hi,
 
I'm using write.table() to export dataframe to Excel. All works fine except 
that I want to export the variable labels instead of variable names.

 I can see the labels in the R consol using attr(), but I just don't know how 
to use the labels instead of the names.

Thanks,

François Maurice
[[alternative HTML version deleted]]


===


 Please consider the environment before printing this e-mail

Cleveland Clinic is ranked one of the top hospitals in America by U.S.News & 
World Report (2010).  
Visit us online at http://www.clevelandclinic.org for a complete listing of our 
services, staff and locations.


Confidentiality Note:  This message is intended for use ...{{dropped:18}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Skip file

2012-07-18 Thread bunnylover23
What i have is a for loop to name files. I want it to skip over the file if it 
doesn't exist. Is there a way to do that?

s=seq(from = chron("03/15/2012"), to = chron("06/15/2012"))
day=format(as.Date(s), "%Y%m%d")
for (k in 1:length(day)){
B1=read.csv(paste("S:/file_", day[k], ".csv", sep=""))
Date=strptime(B1[,1], format="%m/%d/%Y %I:%M:%S %p")
print(B1) }

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] contour lines with number labels in lattice?

2012-07-18 Thread Martin Ivanov
 Dear R users,

I am new to lattice but am trying to get to grips with it. 
I need to add contour lines for the topography to a plot I created with spplot. 
The topography is available as a
SpatialPixelsDataFrame object Z_sfc. If I plot it as spatial lines, after 
converting to a SpatialLines object:
imageZ <- as.image.SpatialGridDataFrame(x=Z_sfc);
cLZ <- contourLines(x=imageZ$x, y=imageZ$y, z=imageZ$z, levels = 
pretty(range(imageZ$z, na.rm=TRUE), 5));
sldfZ_sfc <- ContourLines2SLDF(cL=cLZ, proj4string=CRS(proj4string(Z_sfc)))
topography <- list("sp.lines", sldfZ_sfc, first=FALSE, col="black");

and then using the sp.layout=topgraphy argument to spplot, I lose the number 
labels on the lines. 
Is it possible in lattice to print the height of the isolines just as it is 
printed with contour?

I tried with contour, it does work with SpatialPixelsDataFrame, but 
unfortunately contour does not seem work with lattice. And the lattice 
equivalent contourplot does not recognise SpatialPixelsDataFrame objects. Is it 
possible somehow to add a contour plot of Z_sfc to my plot?


Many thanks in advance.

Martin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Building a web risk calculator based on Cox, PH--definitive method for calculating probability?

2012-07-18 Thread Terry Therneau

Here is an example of how to do it.
> library(survival)
> vfit <- coxph(Surv(time, status) ~ celltype + trt, data=veteran)

> userinput <- data.frame(celltype="smallcell", trt = 1)
> usercurve <- survfit(vfit, newdata=userinput)  #the entire predicted 
survival curve

> user2 <- summary(usercurve, time= 2*365.25)# 2 year time point
> user2$surv
[1] 0.0007438084

Some comments:
 1. The summary function for survival curves was written so that people 
could print out shorter summaries, but it works nicely for picking off a 
particular time point.  Since the curve is a step function and there 
isn't likely to be a step at exactly "x" years, this is a bit more work 
to do yourself.  You might want to include the confidence limits in your 
web report as well.


 2. Put the whole formula into your coxph call.  I have never ever 
understood why people use the construct

 tempvar <- with(data, Surv(time, status))
 coxph(tempvar ~ age + sex + 
It leaves you with harder to read code, poorer documentation (the 
printout from coxph no longer shows the actual response variable), leads 
to hard-to-diagnose failures for certain uses of predict, ... the list 
goes on.  I have not yet thought of a single good reason for doing it 
other than "because you can".


 3. Make the user data the same as the original.  In the veteran cancer 
data set "trt" is a numeric 0/1 variable, you had it as a factor in the 
new data set.


 4. Your should get your keyboard fixed -- it appears that the spacebar 
is disabled when writing code  :-)


 5. If you plot the survival curve for the veterans cancer data set it 
only reaches to about 2 1/2 years, so the summary for 5 years will 
return NULL.


Terry Therneau

On 07/18/2012 05:00 AM, r-help-requ...@r-project.org wrote:

I am a medical student and as a capstone for my summer research project I am
going to create a simple online web "calculator" for users to input their
relevant data, and a probability of relapse within 5 years will be computed
and returned based on the Cox PH model I have developed.

The issue I'm having is finding a definitive method/function to feed the
user's "newdata" and return the probability of relapse within 5 years.  I
have googled this and the answers seems to be inconsistent; I have variously
seen people recommend survest(), survfit(), and predict.coxph().  Terry had
a useful answer
http://r.789695.n4.nabble.com/how-to-calculate-predicted-probability-of-Cox-model-td4515265.html
here  but I didn't quite understand what he meant in his last sentence.

Here is some code for you to quickly illustrate what you suggest.

library(rms)
library(survival)
library(Hmisc)
data(veteran)
dd=datadist(veteran)
options(datadist='dd')
options(digits=4)
obj=with(veteran,Surv(time,status))
vetcoxph=coxph(obj~celltype+trt,data=veteran)#I will fit models from
both the survival and rms packages so you can
#use what you like
vetcph=cph(obj~celltype+trt,data=veteran,surv=TRUE,time.inc=5*365,x=T,y=T)
#let's say the user inputted that their cell type was smallcell and their
treatment was "1".
userinput=data.frame(celltype='smallcell',trt=factor(1))

I really appreciate your recommendations

Best,
Jahan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Power analysis for Cox regression with a time-varying covariate

2012-07-18 Thread Terry Therneau
Marc gave the referencer for Schoenfeld's article.  It's actually quite 
simple.


Sample size for a Cox model has two parts:
 1. Easy part: how many deaths to I need

  d = (za + zb)^2 / [var(x) * coef^2]

  za = cutoff for your alpah, usually 1.96 (.05 two-sided)
  zb = cutoff for power, often 0.84 = qnorm(.8) = 80% power
  var(x) = variance of the covariate you are testing.  For a yes/no 
variable like treatment this would be p(1-p) where p = fraction on the 
first arm
  coef = the target coefficient in your Cox model.  For an 
"increase in survival of 50%" we need exp(coef)=1.5 or coef=.405


All leading to the value I've memorized by now of (1.96 + 0.84)^2 /(.25* 
.405^2) = 191 deaths for a balanced two arm study to detect a 50% 
increase in survival.


 2. Hard part: How many patients will I need to recruit, over what 
interval of time, and with how much total follow-up to achieve this 
number of events?
   I never use the canned procedures for sample size because this 
second part is so study specific.  And frankly, it's always a 
guesstimate.  Death rates for a condidtion will usually drop by 1/3 as 
soon as you start enrolling subjects.


Terry T.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R code for to check outliers

2012-07-18 Thread Sajeeka Nanayakkara




 What is the R code to check whether data series have outliers or not?

Thanks,

Sajeeka Nanayakkara
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read mignight as 24:00 and not as 0:00

2012-07-18 Thread Sandy Adriaenssens
Ok, thank you Dan! 
I was already afraid that I would get this answer.  I will solve it by
defining date/time both as a date/time object and as a character object ( in
another column)

Regards,
Sandy

--
View this message in context: 
http://r.789695.n4.nabble.com/read-mignight-as-24-00-and-not-as-0-00-tp4636423p4636884.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] loop searching the id corresponding to the given index (timestamp)

2012-07-18 Thread Yolande Tra
Hello,

I have the following loop for two data sets: diveData_2008 and
diveData_2009. It uses two other data: diveCond_all and fishTable. The
problem is at the point to identify the dive_id for the given index (index
is timestamp). It keeps on saying
for the1st loop
Error in fishReport$dive_id[i] <- dive_id : replacement has length zero
for the 2nd loop
Error in fishReport$dive_id[i + j] <- dive_id :
  replacement has length zero

That is where the only problem resides.
Here is the code and below are the first 15 lines of data

N <- dim(diveData_2009)[1] + dim(diveData_2008)[1]
fishReport <- data.frame(matrix(0, nrow=N, ncol=11))
names(fishReport) <- c("dive_id", "site", "section", "level", "transect",
"depth", "species", "count", "size")
for(i in 1:dim(diveData_2008)[1]){
# Which dive is this observation from
thisIndex <- as.character(index(diveData_2008[i,]))
dive_id <-
diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
cat(dive_id, thisIndex, "\n")
# Store the pertinent data in the fish report data.frame
fishReport$dive_id[i] <- dive_id
fishReport$site[i] <-
diveData_2008[i,"site"]
fishReport$level[i]  <-
diveData_2008[i,"level"]
fishReport$section[i] <- 0
fishReport$transect[i]   <-
as.numeric(diveData_2008[i,"TRANSECT"])
fishReport$depth[i]<-
as.numeric(diveData_2008[i,"depth"])
fishReport$species[i] <-
fishTable$name_2009[fishTable$name_2008==as.character(diveData_2008[i,"species"])
][1]
fishReport$count[i]<-
as.numeric(diveData_2008[i,"count"])
fishReport$size[i]<-
as.numeric(diveData_2008[i,"size"])
  fishReport$visibility[i]  <-
as.numeric(diveData_2008[i,"VIS_M"])
  fishReport$swell[i]   <-as.numeric(diveData_2008[i,"swell_URSKI"])
}
j<-i
## The 2009 dives
for(i in i+1:dim(diveData_2009)[1]){
thisIndex <- as.character(index(diveData_2009[i,]))
dive_id <-
diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
cat(dive_id, thisIndex, "\n")
fishReport$dive_id[i+j] <- dive_id
fishReport$site[i+j] <- "Hopkins"
fishReport$level[i+j]  <-
diveData_2009[i,"level"]
fishReport$transect[i+j]   <-
as.numeric(diveData_2009[i,"TRANSECT"])
fishReport$depth[i+j]<-
as.numeric(diveData_2009[i,"depth"])
fishReport$species[i+j] <-
diveData_2009[i,"species"]
fishReport$count[i+j]<-
as.numeric(diveData_2008[i,"count"])
fishReport$size[i+j]<-
as.numeric(diveData_2009[i,"size"])
  fishReport$visibility[i+j]  <-
as.numeric(diveData_2008[i,"VIS_M"])
}


diveData_2008 (first 15 lines)
structure(c(" 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1",
" 2", " 2", " 2", " 2", " 2", " 2", "8/6/2008", "8/6/2008", "8/6/2008",
"8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
"8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
"8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49",
"8:49", "10:03", "10:03", "10:03", "10:03", "10:03", "10:03",
"S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
"O. pictus", "C. nicholsii", "C. nicholsii", "D. vacca", "No organisms",
"S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
"S. chrysomelas", "O. pictus", "  15", "   1", "   1", "   1",
"   1", "   1", "   1", "   1", "   1", "   1", "   1", "   1",
"   1", "   1", "   1", " 6", "23", "27", "29", " 8", " 8", " 9",
"18", NA, " 6", "23", "28", "35", "32", " 8", "Hopkins", "Hopkins",
"Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
"Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
"Hopkins", "15", "15", "15", "15", "15", "15", "15", "15", "15",
"10", "10", "10", "10", "10", "10", "B", "B", "B", "B", "B",
"B", "B", "B", "M", "B", "B", "B", "B", "B", "B", " 1", " 1",
" 1", " 1", " 1", " 1", " 1", " 1", " 1", " 2", " 2", " 2", " 2",
" 2", " 2", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5",
" 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5",
"13.9", "13.9", "13.9", "13.9", "13.9", "13.9", "13.9", "13.9",
"13.9", "13.9", "13.9", "13.9", "13.9", "13.9", "13.9", "1.0686708",
"1.0686708", "1.0686708", "1.0686708", "1.0686708", "1.0686708",
"1.0686708", "1.0686708", "1.0686708", "1.950", "1.950",
"1.950", "1.950", "1.950", "1.950"), .indexCLASS =
c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "",
class = c("xts",
"z

[R] fitting several lme sistematically

2012-07-18 Thread Berta Ibáñez

Dear R-list, 
 
I have a data set (in the following example called "a") which have: 

one "subject indicator" variable (called "id")
three dependent variables (varD, varE, var F)
three independent variables (varA, varB, varC)
 
I want to fit 9 lme models, one per posible combination (DA, DB, DC, EA, EB, 
EC, FA, FB, FC).
In stead of writting the 9 lme models, I want to do it sistematically (the 
example is a simplification of what I really have). Here you have the comands 
for the first model: 
 
library(nlme)
set.seed(50)
a<-data.frame(array(c(rep(1:10,10), rnorm(600)), c(100,7)))
names(a)<-c("id", "varA", "varB", "varC", "varD", "varE", "varF")
lme(varD ~ varA , random= ~1|id,  data=a, na.action="na.exclude")
 
I supossed that a simple sintaxis going through the variables of dataset "a" 
could cope with it: 
 
for(i in 2:4){
for(j in 5:7){
lme(a[,j] ~ a[,i] , random= ~1|id,  data=a, na.action="na.exclude")
}}
 
but it does not, and the use of eval, as.symbol and so on does not help. 
 
for(i in 2:4){
for(j in 5:7){
lme(eval(as.symbol(names(a)[j])) ~ eval(as.symbol(names(a)[i]))  , random= 
~1|id,  data=a, na.action="na.exclude")
}}
 
Any help??? Thanks a lot in advance!
 
 
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plotting trajectories of ordinal variables over time by cheating

2012-07-18 Thread Eiko Fried
Hello.

I have an ordered dependent variable (scale 0 - 3), 5 measurement points. I
am afraid I have strong ceiling effects in my data, and would like to plot
the data (trajectories).

However, you know that plotting ordered variables isn't really feasible.

My question: would you think it appropriate to create a new variable with
let's say 200 categories out of my 1000 subjects, and averaging 5 subjects
into each group?
That way I would still get 200 trajectories and would have pseudo-metric
variables. It would just be for the visualization of the data.

If that's possible, how would I tell R to do this? (I know how to plot the
trajectories, but not how to recode 1000 subjects into 200 new categories)

Thanks
Eiko

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Skip file

2012-07-18 Thread Duncan Murdoch

On 18/07/2012 8:27 AM, bunnylove...@optonline.net wrote:

What i have is a for loop to name files. I want it to skip over the file if it 
doesn't exist. Is there a way to do that?

s=seq(from = chron("03/15/2012"), to = chron("06/15/2012"))
day=format(as.Date(s), "%Y%m%d")
for (k in 1:length(day)){
B1=read.csv(paste("S:/file_", day[k], ".csv", sep=""))
Date=strptime(B1[,1], format="%m/%d/%Y %I:%M:%S %p")
print(B1) }


Use file.exists() to test for the existence of a file.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for to check outliers

2012-07-18 Thread Bert Gunter
checkforoutliers <- function(series)NULL

Cheers,
Bert

*Explanation: There is no such thing as a statistical outlier -- or,
rather,"outlier" is a fraudulent statistical concept, defined arbitrarily
and without scientific legitimacy. The typical unstated purpose of such
identification is to remove contaminating or irrelevant data, but such a
judgment can only be made by a subject matter expert with knowledge of the
context and, usually, the specific cause for the unusual data. Do not be
misled by the large body of statistical literature on this topic into
believing that statistical analysis alone can provide objective criteria to
do this. That is a path to scientific purgatory.

For the record:
1. I am a statistician
2. Lots of highly knowledgeable, smart statisticians will condemn what I
have just said as stupid ranting.

The perils of a mailing list.

-- Bert

On Wed, Jul 18, 2012 at 6:27 AM, Sajeeka Nanayakkara wrote:

>
>
>
>
>  What is the R code to check whether data series have outliers or not?
>
> Thanks,
>
> Sajeeka Nanayakkara
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] check whether connection can be opened

2012-07-18 Thread Michael Weylandt
I think you'll need to roll your own using tryCatch() around open(). 

Michael

On Jul 18, 2012, at 5:09 AM, Berry Boessenkool  
wrote:

> 
> 
> Hi all,
> 
> I'm working on a function that reads online data that is only available to 
> certain IPs. It then writes a subset of the data into a file.
> So whenever I'm logged in elsewhere or am not connected to the internet, I 
> get an error, and the function is terminated.
> I want it to rather print a message into the file.
> 
> Is there any way to test whether a connection can be opened?
> Analogous to isOpen something like canOpen...
> Or would I want to look into the error handling options?
> 
> By the way, the data is the internet volume used so far, which I want to 
> track regularly (on every R-Start via Rprofile).
> My brother (a computer schientist in progress) did not know a solution, so I 
> set out to try this with R - which I got to work fine (except the above 
> mentioned). 
> "R beats java", I told him, to which he responded nothing ;-)
> Isn't it amazing, what one can do with R?
> 
> Thanks ahead,
> Berry Boessenkool
> 
> 
>[[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] functions of vectors : loop or vectorization

2012-07-18 Thread Uwe Ligges

On 14.07.2012 00:41, Julien Salanie wrote:

I have a read a lot about the benefits of vectorization in R. I have a
program that takes "almost forever" to run. A good way to see if I have
learned something ... My problem can be summarized like this : I have a
nonlinear function of several variables that I want to optimize over one
letting the other describe a family of curves. In short, I wan't to optimize
f(x,a,b) for several values of a and b.

It is easily done with a loop. Here's an example :

a = 1:5;
b = 1:5;
myfunction = function(x){y*x-(x+z)^2};
myresults = array(dim=c(length(a),length(b)));
for(y in a){ for(z in b) { myresults[y,z] =
optimize(myfunction,c(-10,10),maximum=TRUE)$maximum }};
myresults;

[,1] [,2] [,3] [,4] [,5]
[1,] -0.5 -1.5 -2.5 -3.5 -4.5
[2,]  0.0 -1.0 -2.0 -3.0 -4.0
[3,]  0.5 -0.5 -1.5 -2.5 -3.5
[4,]  1.0  0.0 -1.0 -2.0 -3.0
[5,]  1.5  0.5 -0.5 -1.5 -2.5

Of course, my real life problem is a bit more complicated and runs in days
...

I didn't find a straightforward way to do this using the apply family. I did
a small script that works. Here it is :

c = 1:5;
d = 1:5;
myfunction2 =
function(c,d){optimize(function(x){c*x-(x+d)^2},c(-10,10),maximum=TRUE)$maximum};
v.myfunction2 = Vectorize(myfunction2, c("c","d"));
outer(c, d, v.myfunction2);

all.equal(myresults,outer(c, d, v.myfunction2));
[1] TRUE

I was quite happy with my trick of separating and wrapping the functions
until I increased the size of the two input vectors and checked for the
processing time. I made no gain. In that case :


time.elapsed; time.elapsed2;

Time difference of 0.0816 secs
Time difference of 0.0792 secs

When I changed the size of the vectors and added a logarithm here and there
to complicate a bit, it doesn't change the problem. The two methods perform
identically. Am I missing something ? Is there a better way to vectorize the
problem to gain time ? How is it that my loop performs as well as "outer" ?
Thanks in advance for your help. All the best, Julien


From what I understand you can't without loop. Actually, the 
Vectorize() call generates a function that calls mapply() which is some 
kind of a loop. This may be a case where the transition of inner loops 
to C may be benefitial.


Best,
Uwe Ligges







--
View this message in context: 
http://r.789695.n4.nabble.com/functions-of-vectors-loop-or-vectorization-tp4636494.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installing packages from RProfile.site file

2012-07-18 Thread Uwe Ligges



On 17.07.2012 21:34, abhisarihan wrote:

I am trying to install custom packages upon starting R. A lot of the code
that is written by us right now is available for editing to the users. To
try and protect the code, I am packaging the production level code and
having the users install it on their machine during start up.

However, when I try to install packages in RProfile.site file, the program
goes into a loop and R is constantly launched over and over.



Yes: install.packages() starts a separate R instances to install the 
package which reads the Rprofile.site file and starts a separate R 
instances to install the package which reads the Rprofile.site file and 
starts a separate R instances to install the package which reads the 
Rprofile.site file 


Best,
Uwe Ligges





 I noticed that

a lock file for the package is created along with the package in the library
folder within R.

Here is the code I have added to the site file:

if(length(grep("customPackage", installed.packages()[,1]))==0) {
install.packages("customPackage", repos=NULL, type="source")
}

When I try to run this code after starting R (without changing the site
file), it installs the package perfectly fine and moves on. However, when I
try to do it through the RProfile file, that's when it creates the problems.

Any help on this matter would be greatly appreciated!

--
View this message in context: 
http://r.789695.n4.nabble.com/Installing-packages-from-RProfile-site-file-tp4636794.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Installing packages from RProfile.site file

2012-07-18 Thread G See
abhisarihan,

Please don't crosspost!
http://stackoverflow.com/questions/11530800/installing-packages-from-rprofile-site-file

Thanks,
Garrett

On Tue, Jul 17, 2012 at 2:34 PM, abhisarihan  wrote:
> I am trying to install custom packages upon starting R. A lot of the code
> that is written by us right now is available for editing to the users. To
> try and protect the code, I am packaging the production level code and
> having the users install it on their machine during start up.
>
> However, when I try to install packages in RProfile.site file, the program
> goes into a loop and R is constantly launched over and over. I noticed that
> a lock file for the package is created along with the package in the library
> folder within R.
>
> Here is the code I have added to the site file:
>
> if(length(grep("customPackage", installed.packages()[,1]))==0) {
> install.packages("customPackage", repos=NULL, type="source")
> }
>
> When I try to run this code after starting R (without changing the site
> file), it installs the package perfectly fine and moves on. However, when I
> try to do it through the RProfile file, that's when it creates the problems.
>
> Any help on this matter would be greatly appreciated!
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Installing-packages-from-RProfile-site-file-tp4636794.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] keep parameter in step function

2012-07-18 Thread Mati Bielawski
I would like to know how to use the keep parameter in the step function. Or
the way to always keep one of my Xs, and the other Xs to be keep or not by
step forward method.
Thank you all
Mati Bielawski

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] keep parameter in step function

2012-07-18 Thread Uwe Ligges



On 18.07.2012 18:03, Mati Bielawski wrote:

I would like to know how to use the keep parameter in the step function. Or
the way to always keep one of my Xs, and the other Xs to be keep or not by
step forward method.


See ?step and in particular the "lower" part of its scope argument:

step(, scope = list(lower = ~ XtoKeep, upper = ~ X1 + X2 + etc))

Best,
Uwe Ligges





Thank you all
Mati Bielawski

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for to check outliers

2012-07-18 Thread R. Michael Weylandt
To further what Bert says:

You would almost certainly prefer to use robust statistics than
"outlier detection".

I believe Greg Snow's TeachingDemos package has a data set "outliers"
suggesting some of the perils of doing things the outlier-removal way.

Best,
Michael

On Wed, Jul 18, 2012 at 9:14 AM, Bert Gunter  wrote:
> checkforoutliers <- function(series)NULL
>
> Cheers,
> Bert
>
> *Explanation: There is no such thing as a statistical outlier -- or,
> rather,"outlier" is a fraudulent statistical concept, defined arbitrarily
> and without scientific legitimacy. The typical unstated purpose of such
> identification is to remove contaminating or irrelevant data, but such a
> judgment can only be made by a subject matter expert with knowledge of the
> context and, usually, the specific cause for the unusual data. Do not be
> misled by the large body of statistical literature on this topic into
> believing that statistical analysis alone can provide objective criteria to
> do this. That is a path to scientific purgatory.
>
> For the record:
> 1. I am a statistician
> 2. Lots of highly knowledgeable, smart statisticians will condemn what I
> have just said as stupid ranting.
>
> The perils of a mailing list.
>
> -- Bert
>
> On Wed, Jul 18, 2012 at 6:27 AM, Sajeeka Nanayakkara 
> wrote:
>
>>
>>
>>
>>
>>  What is the R code to check whether data series have outliers or not?
>>
>> Thanks,
>>
>> Sajeeka Nanayakkara
>> [[alternative HTML version deleted]]
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for to check outliers

2012-07-18 Thread S Ellison
 

> >>  What is the R code to check whether data series have 
>>> outliers or not?

In case noone else has pointed you there, you could try the 'outliers' package. 
That contains some of the 'standard' methods of outlier testing for univariate 
data.

What you do with them when you find them is a rather more complicated and, as 
you have already seen, controversial question.

S Ellison

***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for to check outliers

2012-07-18 Thread Duncan Murdoch

On 18/07/2012 10:14 AM, Bert Gunter wrote:

checkforoutliers <- function(series)NULL

Cheers,
Bert

*Explanation: There is no such thing as a statistical outlier -- or,
rather,"outlier" is a fraudulent statistical concept, defined arbitrarily
and without scientific legitimacy. The typical unstated purpose of such
identification is to remove contaminating or irrelevant data, but such a
judgment can only be made by a subject matter expert with knowledge of the
context and, usually, the specific cause for the unusual data. Do not be
misled by the large body of statistical literature on this topic into
believing that statistical analysis alone can provide objective criteria to
do this. That is a path to scientific purgatory.

For the record:
1. I am a statistician
2. Lots of highly knowledgeable, smart statisticians will condemn what I
have just said as stupid ranting.

The perils of a mailing list.


I think you are assuming that Sajeeka will handle the outliers 
incorrectly.   It happens often enough, but I don't think it's polite to 
make that assumption.


My answer to the question would have been to ask the question, "how do 
you define outliers?"  Certainly it's possible to define outliers in the 
context of a model, and their presence is an indication of problems with 
the model.  The correct response might be to weaken the assumptions of 
your model and use a robust procedure as Michael suggested (which might 
mean throwing away the outliers), or it might be to change the model in 
some other way.  Your advice to consult a subject matter expert is good, 
but in my experience, they often put more faith in their models than 
they should, so as a statistician, I think you should point out 
discrepancies like outliers.  Which means it's good to have a function 
to detect them.


Duncan Murdoch



-- Bert

On Wed, Jul 18, 2012 at 6:27 AM, Sajeeka Nanayakkara wrote:

>
>
>
>
>  What is the R code to check whether data series have outliers or not?
>
> Thanks,
>
> Sajeeka Nanayakkara
> [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for to check outliers

2012-07-18 Thread Martin Maechler
> Bert Gunter 
> on Wed, 18 Jul 2012 07:14:31 -0700 writes:

> checkforoutliers <- function(series) NULL 

  > Cheers, Bert

> *Explanation: There is no such thing as a statistical
> outlier -- or, rather,"outlier" is a fraudulent
> statistical concept, defined arbitrarily and without
> scientific legitimacy. The typical unstated purpose of
> such identification is to remove contaminating or
> irrelevant data, but such a judgment can only be made by a
> subject matter expert with knowledge of the context and,
> usually, the specific cause for the unusual data. Do not
> be misled by the large body of statistical literature on
> this topic into believing that statistical analysis alone
> can provide objective criteria to do this. That is a path
> to scientific purgatory.

> For the record: 1. I am a statistician 
> 2. Lots of highly knowledgeable, smart statisticians will condemn what I
> have just said as stupid ranting.

I entirely agree with you that  outlier-removing
procedures are mostly misused, and dangerous because of that
misuse {and hence should typically NOT be taught, or not the way
I have seen them taught (on occasions, not here at ETH!)...}

and I even more fervently agree with Michael Weylandt's 
recommendation to use robust statistics rather than outlier
detection --- at least in those cases where "robust statistics"
is *not* ill-re-defined  as  {outlier detection}+{classical stats}.

However, I don't think 'outlier' to be a fraudulent concept.
Rather I think outliers can be pretty well defined along the
line of "outlier WITH RESPECT TO A MODEL" 
 (and 'model' means 'statistical model', i.e., with some
 randomness built in) :

Outlier wrt model M := 
  an observation which is highly
  improbable to be observed under model M

(and "highly improbable" of course is somewhat vague, but that's
 not a problem per se.)
A version of the above is 

 Outlier := an observation that has unduely large influence on
 the estimators/inference performed

where 'estimator / inference'  imply a model of course.

So I think outlier is a useful concept for those who think about
*models* (rather than just data sets), and I agree that without
an implicit or explicit model, "outlier" is not well defined.

> The perils of a mailing list.
> -- Bert

:-)

Martin



> On Wed, Jul 18, 2012 at 6:27 AM, Sajeeka Nanayakkara .. wrote:

>> 
>> What is the R code to check whether data series have
>> outliers or not?
>> 
>> Thanks,
>> 
>> Sajeeka Nanayakkara


> -- 
> Bert Gunter Genentech Nonclinical Biostatistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting several lme sistematically

2012-07-18 Thread Jean V Adams
I'm not sure why, but lme() doesn't seem to like the variables to be 
referenced as part of a list using [ or $.
Here's an easy workaround ...

ids <- a$id
for(i in 2:4){
for(j in 5:7){
y <- a[, j]
x <- a[, i]
lme(y ~ x , random= ~1|ids, na.action="na.exclude")
}}

Jean


Berta Ibáñez  wrote on 07/18/2012 08:53:51 AM:

> Dear R-list, 
> 
> I have a data set (in the following example called "a") which have: 
> 
> one "subject indicator" variable (called "id")
> three dependent variables (varD, varE, var F)
> three independent variables (varA, varB, varC)
> 
> I want to fit 9 lme models, one per posible combination (DA, DB, DC,
> EA, EB, EC, FA, FB, FC).
> In stead of writting the 9 lme models, I want to do it 
> sistematically (the example is a simplification of what I really 
> have). Here you have the comands for the first model: 
> 
> library(nlme)
> set.seed(50)
> a<-data.frame(array(c(rep(1:10,10), rnorm(600)), c(100,7)))
> names(a)<-c("id", "varA", "varB", "varC", "varD", "varE", "varF")
> lme(varD ~ varA , random= ~1|id,  data=a, na.action="na.exclude")
> 
> I supossed that a simple sintaxis going through the variables of 
> dataset "a" could cope with it: 
> 
> for(i in 2:4){
> for(j in 5:7){
> lme(a[,j] ~ a[,i] , random= ~1|id,  data=a, na.action="na.exclude")
> }}
> 
> but it does not, and the use of eval, as.symbol and so on does not help. 

> 
> for(i in 2:4){
> for(j in 5:7){
> lme(eval(as.symbol(names(a)[j])) ~ eval(as.symbol(names(a)[i]))  , 
> random= ~1|id,  data=a, na.action="na.exclude")
> }}
> 
> Any help??? Thanks a lot in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weighted mean by week

2012-07-18 Thread Dimitri Liakhovitski
David, thanks a lot!
I tried x[-1] myself but forgot to delete 'group' from the keyby
statement - this explains why it did not work for me.
This is amazing - just 2 lines instead of my many-many.
Great learning!
Dimitri


On Tue, Jul 17, 2012 at 10:49 PM, David Freedman  wrote:
> Honestly, I wasn't sure what you wanted to do with 'group'.  Here it is with
> the 'group' variable deleted
>
> library(data.table)
> dt=data.table(x[,-1])
> dt[,lapply(.SD, function(x)weighted.mean(x,myweight)), keyby='myweek']
>
>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/weighted-mean-by-week-tp4636814p4636828.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tweaking forest plot (metafor package)

2012-07-18 Thread Michael Dewey

At 00:10 18/07/2012, eeste...@ualg.pt wrote:

Dear All,

I'm having trouble tweaking a forest plot made using the R
meta-analysis package metafor. I did the analysis based upon the
correlation coeff from studies and plotted the corresponding forest
plot easily


q2<-rma(yi,vi,mods=cbind(grupo),data=qim)
q2
forest (q2,transf=transf.ztor,digits=3, ... ,alim=c(0,1),refline=.5)
text(-1.55,42,"Espécie [refª]",pos=4,font=2)


Not sure why you use cbind(grupo) but it will not change much.



I'd like to subdivide the "table" by the moderator 'grupo' - i.e.
create the necessary spacings in the list of studies to accommodate
the moderator statistics (cf.
http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=metafor:addpoly.default).

My main issue is that I cannot really understand how the function
(forest()) really uses space; specifically, how to use the parameters
'xlim', 'ylim' and 'rows' in the function? how can I change the scale
from 0-1 to 0.5-1?


An example of what went wrong would help but I 
think you will find that the rows argument tells 
forest where to put the results of the individual 
trials so if you give it non-consecutive integers 
you will get gaps in the plot which you can fill 
with addpoly. So if rows does not contain 4 you 
can use addpoly to put something in row 4. As far 
as xlim and ylim are concerned my advice would be 
to try a whole range of values and see what 
happens as this is one of those cases where it is 
easier to experiment than to explain. If all else 
fails either (a) look at the code (b) wait for 
Wolfgang to ell you and me what really happens.



Hopefully I've made myself (relatively) clear but... Thanks.

Eduardo Esteves




Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop searching the id corresponding to the given index (timestamp)

2012-07-18 Thread Jean V Adams
What is the function index() that you use in this line of code?
thisIndex <- as.character(index(diveData_2008[i,]))

Is it from some package?  Or a function you wrote yourself?
I'm trying to run the code you submitted, but I don't have a function 
called index().

Jean


Yolande Tra  wrote on 07/18/2012 08:47:04 AM:

> Hello,
> 
> I have the following loop for two data sets: diveData_2008 and
> diveData_2009. It uses two other data: diveCond_all and fishTable. The
> problem is at the point to identify the dive_id for the given index 
(index
> is timestamp). It keeps on saying
> for the1st loop
> Error in fishReport$dive_id[i] <- dive_id : replacement has length zero
> for the 2nd loop
> Error in fishReport$dive_id[i + j] <- dive_id :
>   replacement has length zero
> 
> That is where the only problem resides.
> Here is the code and below are the first 15 lines of data
> 
> N <- dim(diveData_2009)[1] + dim(diveData_2008)[1]
> fishReport <- data.frame(matrix(0, nrow=N, ncol=11))
> names(fishReport) <- c("dive_id", "site", "section", "level", 
"transect",
> "depth", "species", "count", "size")
> for(i in 1:dim(diveData_2008)[1]){
> # Which dive is this observation from
> thisIndex <- as.character(index(diveData_2008[i,]))
> dive_id <-
> diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
> cat(dive_id, thisIndex, "\n")
> # Store the pertinent data in the fish report data.frame
> fishReport$dive_id[i] <- dive_id
> fishReport$site[i] <-
> diveData_2008[i,"site"]
> fishReport$level[i]  <-
> diveData_2008[i,"level"]
> fishReport$section[i] <- 0
> fishReport$transect[i]   <-
> as.numeric(diveData_2008[i,"TRANSECT"])
> fishReport$depth[i]<-
> as.numeric(diveData_2008[i,"depth"])
> fishReport$species[i] <-
> fishTable$name_2009[fishTable$name_2008==as.character(diveData_2008
> [i,"species"])
> ][1]
> fishReport$count[i]<-
> as.numeric(diveData_2008[i,"count"])
> fishReport$size[i]<-
> as.numeric(diveData_2008[i,"size"])
>   fishReport$visibility[i]  <-
> as.numeric(diveData_2008[i,"VIS_M"])
>   fishReport$swell[i]   <-as.numeric(diveData_2008[i,"swell_URSKI"])
> }
> j<-i
> ## The 2009 dives
> for(i in i+1:dim(diveData_2009)[1]){
> thisIndex <- as.character(index(diveData_2009[i,]))
> dive_id <-
> diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
> cat(dive_id, thisIndex, "\n")
> fishReport$dive_id[i+j] <- dive_id
> fishReport$site[i+j] <- 
"Hopkins"
> fishReport$level[i+j]  <-
> diveData_2009[i,"level"]
> fishReport$transect[i+j]   <-
> as.numeric(diveData_2009[i,"TRANSECT"])
> fishReport$depth[i+j] <-
> as.numeric(diveData_2009[i,"depth"])
> fishReport$species[i+j] <-
> diveData_2009[i,"species"]
> fishReport$count[i+j]<-
> as.numeric(diveData_2008[i,"count"])
> fishReport$size[i+j]<-
> as.numeric(diveData_2009[i,"size"])
>   fishReport$visibility[i+j]  <-
> as.numeric(diveData_2008[i,"VIS_M"])
> }
> 
> 
> diveData_2008 (first 15 lines)
> structure(c(" 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1",
> " 2", " 2", " 2", " 2", " 2", " 2", "8/6/2008", "8/6/2008", "8/6/2008",
> "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
> "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
> "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49",
> "8:49", "10:03", "10:03", "10:03", "10:03", "10:03", "10:03",
> "S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
> "O. pictus", "C. nicholsii", "C. nicholsii", "D. vacca", "No organisms",
> "S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
> "S. chrysomelas", "O. pictus", "  15", "   1", "   1", "   1",
> "   1", "   1", "   1", "   1", "   1", "   1", "   1", "   1",
> "   1", "   1", "   1", " 6", "23", "27", "29", " 8", " 8", " 9",
> "18", NA, " 6", "23", "28", "35", "32", " 8", "Hopkins", "Hopkins",
> "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
> "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
> "Hopkins", "15", "15", "15", "15", "15", "15", "15", "15", "15",
> "10", "10", "10", "10", "10", "10", "B", "B", "B", "B", "B",
> "B", "B", "B", "M", "B", "B", "B", "B", "B", "B", " 1", " 1",
> " 1", " 1", " 1", " 1", " 1", " 1", " 1", " 2", " 2", " 2", " 2",
> " 2", " 2", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5",
> " 3.5", " 3.5", "

Re: [R] loop searching the id corresponding to the given index (timestamp)

2012-07-18 Thread Yolande Tra
it is a function from library(xts)

On Wed, Jul 18, 2012 at 1:15 PM, Jean V Adams  wrote:

> What is the function index() that you use in this line of code?
> thisIndex <- as.character(index(diveData_2008[i,]))
>
> Is it from some package?  Or a function you wrote yourself?
> I'm trying to run the code you submitted, but I don't have a function
> called index().
>
> Jean
>
>
> Yolande Tra  wrote on 07/18/2012 08:47:04 AM:
>
>
> > Hello,
> >
> > I have the following loop for two data sets: diveData_2008 and
> > diveData_2009. It uses two other data: diveCond_all and fishTable. The
> > problem is at the point to identify the dive_id for the given index
> (index
> > is timestamp). It keeps on saying
> > for the1st loop
> > Error in fishReport$dive_id[i] <- dive_id : replacement has length zero
> > for the 2nd loop
> > Error in fishReport$dive_id[i + j] <- dive_id :
> >   replacement has length zero
> >
> > That is where the only problem resides.
> > Here is the code and below are the first 15 lines of data
> >
> > N <- dim(diveData_2009)[1] + dim(diveData_2008)[1]
> > fishReport <- data.frame(matrix(0, nrow=N, ncol=11))
> > names(fishReport) <- c("dive_id", "site", "section", "level", "transect",
> > "depth", "species", "count", "size")
> > for(i in 1:dim(diveData_2008)[1]){
> > # Which dive is this observation from
> > thisIndex <- as.character(index(diveData_2008[i,]))
> > dive_id <-
> > diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
> > cat(dive_id, thisIndex, "\n")
> > # Store the pertinent data in the fish report data.frame
> > fishReport$dive_id[i] <- dive_id
> > fishReport$site[i] <-
> > diveData_2008[i,"site"]
> > fishReport$level[i]  <-
> > diveData_2008[i,"level"]
> > fishReport$section[i] <- 0
> > fishReport$transect[i]   <-
> > as.numeric(diveData_2008[i,"TRANSECT"])
> > fishReport$depth[i]<-
> > as.numeric(diveData_2008[i,"depth"])
> > fishReport$species[i] <-
> > fishTable$name_2009[fishTable$name_2008==as.character(diveData_2008
> > [i,"species"])
> > ][1]
> > fishReport$count[i]<-
> > as.numeric(diveData_2008[i,"count"])
> > fishReport$size[i]<-
> > as.numeric(diveData_2008[i,"size"])
> >   fishReport$visibility[i]  <-
> > as.numeric(diveData_2008[i,"VIS_M"])
> >   fishReport$swell[i]   <-as.numeric(diveData_2008[i,"swell_URSKI"])
> > }
> > j<-i
> > ## The 2009 dives
> > for(i in i+1:dim(diveData_2009)[1]){
> > thisIndex <- as.character(index(diveData_2009[i,]))
> > dive_id <-
> > diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
> > cat(dive_id, thisIndex, "\n")
> > fishReport$dive_id[i+j] <- dive_id
> > fishReport$site[i+j] <- "Hopkins"
> > fishReport$level[i+j]  <-
> > diveData_2009[i,"level"]
> > fishReport$transect[i+j]   <-
> > as.numeric(diveData_2009[i,"TRANSECT"])
> > fishReport$depth[i+j]
>  <-
> > as.numeric(diveData_2009[i,"depth"])
> > fishReport$species[i+j] <-
> > diveData_2009[i,"species"]
> > fishReport$count[i+j]<-
> > as.numeric(diveData_2008[i,"count"])
> > fishReport$size[i+j]<-
> > as.numeric(diveData_2009[i,"size"])
> >   fishReport$visibility[i+j]  <-
> > as.numeric(diveData_2008[i,"VIS_M"])
> > }
> >
> >
> > diveData_2008 (first 15 lines)
> > structure(c(" 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1",
> > " 2", " 2", " 2", " 2", " 2", " 2", "8/6/2008", "8/6/2008", "8/6/2008",
> > "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
> > "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
> > "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49",
> > "8:49", "10:03", "10:03", "10:03", "10:03", "10:03", "10:03",
> > "S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
> > "O. pictus", "C. nicholsii", "C. nicholsii", "D. vacca", "No organisms",
> > "S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
> > "S. chrysomelas", "O. pictus", "  15", "   1", "   1", "   1",
> > "   1", "   1", "   1", "   1", "   1", "   1", "   1", "   1",
> > "   1", "   1", "   1", " 6", "23", "27", "29", " 8", " 8", " 9",
> > "18", NA, " 6", "23", "28", "35", "32", " 8", "Hopkins", "Hopkins",
> > "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
> > "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
> > "Hopkins", "15", "15", "15", "15", "15", "15", "15", "15", 

Re: [R] loop searching the id corresponding to the given index (timestamp)

2012-07-18 Thread Rui Barradas

Hello,

Try the following.

for(i in 1:dim(diveData_2008)[1]){
# Which dive is this observation from
thisIndex <- as.character(index(diveData_2008[i,]))
thisIndex <- as.POSIXct(thisIndex)
dive_id   <- diveCond_all$dive_id[diveCond_all$timestamp == thisIndex]
#
[... etc ...]

And the same in the 2009 loop.

I think this is it.

Good luck,

Rui Barradas

Em 18-07-2012 14:47, Yolande Tra escreveu:

Hello,

I have the following loop for two data sets: diveData_2008 and
diveData_2009. It uses two other data: diveCond_all and fishTable. The
problem is at the point to identify the dive_id for the given index (index
is timestamp). It keeps on saying
for the1st loop
Error in fishReport$dive_id[i] <- dive_id : replacement has length zero
for the 2nd loop
Error in fishReport$dive_id[i + j] <- dive_id :
   replacement has length zero

That is where the only problem resides.
Here is the code and below are the first 15 lines of data

N <- dim(diveData_2009)[1] + dim(diveData_2008)[1]
fishReport <- data.frame(matrix(0, nrow=N, ncol=11))
names(fishReport) <- c("dive_id", "site", "section", "level", "transect",
"depth", "species", "count", "size")
for(i in 1:dim(diveData_2008)[1]){
 # Which dive is this observation from
 thisIndex <- as.character(index(diveData_2008[i,]))
 dive_id <-
diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
 cat(dive_id, thisIndex, "\n")
 # Store the pertinent data in the fish report data.frame
 fishReport$dive_id[i] <- dive_id
 fishReport$site[i] <-
diveData_2008[i,"site"]
 fishReport$level[i]  <-
diveData_2008[i,"level"]
 fishReport$section[i] <- 0
 fishReport$transect[i]   <-
as.numeric(diveData_2008[i,"TRANSECT"])
 fishReport$depth[i]<-
as.numeric(diveData_2008[i,"depth"])
 fishReport$species[i] <-
fishTable$name_2009[fishTable$name_2008==as.character(diveData_2008[i,"species"])
][1]
 fishReport$count[i]<-
as.numeric(diveData_2008[i,"count"])
 fishReport$size[i]<-
as.numeric(diveData_2008[i,"size"])
   fishReport$visibility[i]  <-
as.numeric(diveData_2008[i,"VIS_M"])
   fishReport$swell[i]   <-as.numeric(diveData_2008[i,"swell_URSKI"])
}
j<-i
## The 2009 dives
for(i in i+1:dim(diveData_2009)[1]){
 thisIndex <- as.character(index(diveData_2009[i,]))
 dive_id <-
diveCond_all$dive_id[diveCond_all$timestamp==thisIndex]
 cat(dive_id, thisIndex, "\n")
 fishReport$dive_id[i+j] <- dive_id
 fishReport$site[i+j] <- "Hopkins"
 fishReport$level[i+j]  <-
diveData_2009[i,"level"]
 fishReport$transect[i+j]   <-
as.numeric(diveData_2009[i,"TRANSECT"])
 fishReport$depth[i+j]<-
as.numeric(diveData_2009[i,"depth"])
 fishReport$species[i+j] <-
diveData_2009[i,"species"]
 fishReport$count[i+j]<-
as.numeric(diveData_2008[i,"count"])
 fishReport$size[i+j]<-
as.numeric(diveData_2009[i,"size"])
   fishReport$visibility[i+j]  <-
as.numeric(diveData_2008[i,"VIS_M"])
 }


diveData_2008 (first 15 lines)
structure(c(" 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1",
" 2", " 2", " 2", " 2", " 2", " 2", "8/6/2008", "8/6/2008", "8/6/2008",
"8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
"8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
"8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49",
"8:49", "10:03", "10:03", "10:03", "10:03", "10:03", "10:03",
"S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
"O. pictus", "C. nicholsii", "C. nicholsii", "D. vacca", "No organisms",
"S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
"S. chrysomelas", "O. pictus", "  15", "   1", "   1", "   1",
"   1", "   1", "   1", "   1", "   1", "   1", "   1", "   1",
"   1", "   1", "   1", " 6", "23", "27", "29", " 8", " 8", " 9",
"18", NA, " 6", "23", "28", "35", "32", " 8", "Hopkins", "Hopkins",
"Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
"Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
"Hopkins", "15", "15", "15", "15", "15", "15", "15", "15", "15",
"10", "10", "10", "10", "10", "10", "B", "B", "B", "B", "B",
"B", "B", "B", "M", "B", "B", "B", "B", "B", "B", " 1", " 1",
" 1", " 1", " 1", " 1", " 1", " 1", " 1", " 2", " 2", " 2", " 2",
" 2", " 2", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5", " 3.5",
" 3.5", " 3.5", " 3.5", " 3

Re: [R] R code for to check outliers

2012-07-18 Thread Nordlund, Dan (DSHS/RDA)
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Sajeeka Nanayakkara
> Sent: Wednesday, July 18, 2012 6:28 AM
> To: r-help@r-project.org
> Subject: [R] R code for to check outliers
> 
> 
> 
> 
> 
>  What is the R code to check whether data series have outliers or not?
> 
> Thanks,
> 
> Sajeeka Nanayakkara
>   [[alternative HTML version deleted]]

Sajeeka,

You have been given lots of good information and appropriate warnings.  Let me 
add another caveat to think about in the context of outliers/unusual values.  A 
value may only be unusual in a multivariate context.  If we have a dataset with 
human heights in it, a value of 73 inches would not be unusual.  If we then 
learned that this particular individual was female, it would be somewhat 
unusual but certainly within the realm of possibility.  If we then learn that 
the individual is 3 years old, it would be highly unusual.

So, you can see why people on the list are somewhat unwilling to say here is 
THE function "to check whether data series have outliers or not."

Now having said that, can you define what YOU mean by "outlier" and why you are 
concerned about finding them.  Someone may be able to offer advice that will 
help you achieve your goal.

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotting points to a map

2012-07-18 Thread Gary Dong
Dear R users,

I have a city map in shape file (polygon). I also have some points that I
hope to plot them to the city map. The only information I have about those
points are their relative longitude and latitude to the city center by
miles. Is there a way that R can help me to do this? Thanks.

Gary

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop searching the id corresponding to the given index (timestamp)

2012-07-18 Thread Yolande Tra
Thank you Rui. Very helpful. Yolande.

On Wed, Jul 18, 2012 at 1:37 PM, Rui Barradas  wrote:

> Hello,
>
> Try the following.
>
>
> for(i in 1:dim(diveData_2008)[1]){
> # Which dive is this observation from
> thisIndex <- as.character(index(diveData_**2008[i,]))
> thisIndex <- as.POSIXct(thisIndex)
> dive_id   <- diveCond_all$dive_id[diveCond_**all$timestamp ==
> thisIndex]
> #
> [... etc ...]
>
> And the same in the 2009 loop.
>
> I think this is it.
>
> Good luck,
>
> Rui Barradas
>
> Em 18-07-2012 14:47, Yolande Tra escreveu:
>
>>  Hello,
>>
>> I have the following loop for two data sets: diveData_2008 and
>> diveData_2009. It uses two other data: diveCond_all and fishTable. The
>> problem is at the point to identify the dive_id for the given index (index
>> is timestamp). It keeps on saying
>> for the1st loop
>> Error in fishReport$dive_id[i] <- dive_id : replacement has length zero
>> for the 2nd loop
>> Error in fishReport$dive_id[i + j] <- dive_id :
>>replacement has length zero
>>
>> That is where the only problem resides.
>> Here is the code and below are the first 15 lines of data
>>
>> N <- dim(diveData_2009)[1] + dim(diveData_2008)[1]
>> fishReport <- data.frame(matrix(0, nrow=N, ncol=11))
>> names(fishReport) <- c("dive_id", "site", "section", "level", "transect",
>> "depth", "species", "count", "size")
>> for(i in 1:dim(diveData_2008)[1]){
>>  # Which dive is this observation from
>>  thisIndex <- as.character(index(diveData_**2008[i,]))
>>  dive_id <-
>> diveCond_all$dive_id[diveCond_**all$timestamp==thisIndex]
>>  cat(dive_id, thisIndex, "\n")
>>  # Store the pertinent data in the fish report data.frame
>>  fishReport$dive_id[i] <- dive_id
>>  fishReport$site[i] <-
>> diveData_2008[i,"site"]
>>  fishReport$level[i]  <-
>> diveData_2008[i,"level"]
>>  fishReport$section[i] <- 0
>>  fishReport$transect[i]   <-
>> as.numeric(diveData_2008[i,"**TRANSECT"])
>>  fishReport$depth[i]<-
>> as.numeric(diveData_2008[i,"**depth"])
>>  fishReport$species[i] <-
>> fishTable$name_2009[fishTable$**name_2008==as.character(**
>> diveData_2008[i,"species"])
>> ][1]
>>  fishReport$count[i]<-
>> as.numeric(diveData_2008[i,"**count"])
>>  fishReport$size[i]<-
>> as.numeric(diveData_2008[i,"**size"])
>>fishReport$visibility[i]  <-
>> as.numeric(diveData_2008[i,"**VIS_M"])
>>fishReport$swell[i]   <-as.numeric(diveData_2008[i,"**
>> swell_URSKI"])
>> }
>> j<-i
>> ## The 2009 dives
>> for(i in i+1:dim(diveData_2009)[1]){
>>  thisIndex <- as.character(index(diveData_**2009[i,]))
>>  dive_id <-
>> diveCond_all$dive_id[diveCond_**all$timestamp==thisIndex]
>>  cat(dive_id, thisIndex, "\n")
>>  fishReport$dive_id[i+j] <- dive_id
>>  fishReport$site[i+j] <- "Hopkins"
>>  fishReport$level[i+j]  <-
>> diveData_2009[i,"level"]
>>  fishReport$transect[i+j]   <-
>> as.numeric(diveData_2009[i,"**TRANSECT"])
>>  fishReport$depth[i+j]
>>  <-
>> as.numeric(diveData_2009[i,"**depth"])
>>  fishReport$species[i+j] <-
>> diveData_2009[i,"species"]
>>  fishReport$count[i+j]<-
>> as.numeric(diveData_2008[i,"**count"])
>>  fishReport$size[i+j]<-
>> as.numeric(diveData_2009[i,"**size"])
>>fishReport$visibility[i+j]  <-
>> as.numeric(diveData_2008[i,"**VIS_M"])
>>  }
>>
>>
>> diveData_2008 (first 15 lines)
>> structure(c(" 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1", " 1",
>> " 2", " 2", " 2", " 2", " 2", " 2", "8/6/2008", "8/6/2008", "8/6/2008",
>> "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
>> "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008", "8/6/2008",
>> "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49", "8:49",
>> "8:49", "10:03", "10:03", "10:03", "10:03", "10:03", "10:03",
>> "S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
>> "O. pictus", "C. nicholsii", "C. nicholsii", "D. vacca", "No organisms",
>> "S. OYT", "S. atrovirens", "S. atrovirens", "S. atrovirens",
>> "S. chrysomelas", "O. pictus", "  15", "   1", "   1", "   1",
>> "   1", "   1", "   1", "   1", "   1", "   1", "   1", "   1",
>> "   1", "   1", "   1", " 6", "23", "27", "29", " 8", " 8", " 9",
>> "18", NA, " 6", "23", "28", "35", "32", " 8", "Hopkins", "Hopkins",
>> "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins", "Hopkins",
>> "Hop

Re: [R] Power analysis for Cox regression with a time-varying covariate

2012-07-18 Thread Paul Miller
Hi Terry, Greg, and Marc,
 
Thanks for your advice about this. I think I have a pretty good starting point 
now for the analysis.
 
Appreciate your help.
 
Paul

--- On Wed, 7/18/12, Terry Therneau  wrote:


From: Terry Therneau 
Subject: Re: [R] Power analysis for Cox regression with a time-varying covariate
To: "Marc Schwartz" , "Greg Snow" <538...@gmail.com>, 
r-help@r-project.org, "Paul Miller" 
Received: Wednesday, July 18, 2012, 8:24 AM


Marc gave the referencer for Schoenfeld's article.  It's actually quite simple.

Sample size for a Cox model has two parts:
1. Easy part: how many deaths to I need

      d = (za + zb)^2 / [var(x) * coef^2]

      za = cutoff for your alpah, usually 1.96 (.05 two-sided)
      zb = cutoff for power, often 0.84 = qnorm(.8) = 80% power
      var(x) = variance of the covariate you are testing.  For a yes/no 
variable like treatment this would be p(1-p) where p = fraction on the first arm
      coef = the target coefficient in your Cox model.  For an "increase in 
survival of 50%" we need exp(coef)=1.5 or coef=.405

All leading to the value I've memorized by now of (1.96 + 0.84)^2 /(.25* 
.405^2) = 191 deaths for a balanced two arm study to detect a 50% increase in 
survival.

2. Hard part: How many patients will I need to recruit, over what interval of 
time, and with how much total follow-up to achieve this number of events?
   I never use the canned procedures for sample size because this second part 
is so study specific.  And frankly, it's always a guesstimate.  Death rates for 
a condidtion will usually drop by 1/3 as soon as you start enrolling subjects.

Terry T.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How does "rlm" in R decide its "w" weights for each IRLS iteration?

2012-07-18 Thread Michael
Hi all,

I am also confused about the manual:

   a.  The input arguments:

wt.method are the weights case weights (giving the relative importance of
case, so a weight of 2 means there are two of these) or the inverse of the
variances, so a weight of two means this error is half as variable?

w (optional) initial down-weighting for each case.

init (optional) initial values for the coefficients OR a method to find
initial values OR the result of a fit with a coef component. Known methods
are "ls" (the default) for an initial least-squares fit using weights
w*weights, and "lts" for an unweighted least-trimmed squares fit with 200
samples.



b. The returned values:

w the weights used in the IWLS process

wresid a working residual, weighted for "inv.var" weights only.



How to use these input arguments?

Anybody please shed some light?

Also, is my understanding below correct?

The input argument "w" is used for the initial values of the rlm IRLS
weighting and the output value "w" is the converged "w".

The "weights" input argument is actually what I want to apply - if I have
my own set of weights.
And the real/actual weights are the product of "weights"(I supplied) and
the converged output "w" (an output).


If my understanding above is correct, how does "rlm" decide its "w" for
each IRLS iteration then?

Any pointers/tutorials/notes to the calculation of these "w"'s in each IRLS
iteration?

Thanks to all.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Imposing more than one condition to if

2012-07-18 Thread Rui Barradas

Hello,

You're right. I had thought of this, and I believe there's a day when it 
happens to have a zero in the middle of the day. R doesn't allow 
conditions like the one you've written but it does allow multiple 
conditions, combined using the logical connectives, 'not' - '!', 'or' - 
'|' and 'and' - '&'. Let's see:


Day <- 05:00:00 <= x$clock & x$clock <= 17:00:00
Night <- !(05:00:00 <= x$clock & x$clock <= 17:00:00) # equal to:
Night <- x$clock < 05:00:00 | x$clock > 17:00:00

So, you had the first condition reversed and maybe !Day is more readable ;)

(Note: to check if a variable is in an interval, say (a, b), I find it 
better to write a < x & x < b with the interval's end points as 
condition extremes, like Day above. Then if negation is needed half the 
work is already done.)


Anyway, it should be easy to put the compound condition in the function f().

Rui Barradas

Em 18-07-2012 19:02, Santiago Guallar escreveu:

Nice! It works. Thank you, Rui
There's something that takes me back to the original question, though.
The code takes the first value that meets the critical condition (light
==0); however, this value can appear more than once a day because these
animals nest in dark caves, and once they are in the logger sensor
registers a 0. So, a second condition limiting the time at which the
critical condition can be accepted to calculate dusk should be
implemented; something like (although R doesn't allow to impose the
double condition):
if (05:00:00 17:00:00) {zrle <- rle(x$lig == 0)}
Santi

*From:* Rui Barradas 
*To:* Santiago Guallar 
*Cc:* r-help@r-project.org
*Sent:* Wednesday, July 18, 2012 11:37 AM
*Subject:* Re: [R] Imposing more than one condition to if

hELLO,

There was no nedd to change the names of the variables inside the
fucntion. What was going on s that in this new file column 'dtime'
doesn't exist. Since column 'clock' exists in all files, I've changed
the function once again, making it more general.

Note that there is an argument 'colTime' with a default value. In the
function's use below I call it with and without that argument.


f <- function(x, colTime="clock"){
 zrle <- rle(x$lig == 0)
 if(zrle$values[1]){
 idusk <- sum(zrle$lengths[1:2]) + 1
 idawn <- zrle$lengths[1] + 1
 x$dusk <- x[ idusk , colTime ]
 x$dawn <- x[ idawn , colTime ]
 }else{
 idusk <- zrle$lengths[1] + 1
 x$dusk <- x[ idusk , colTime ]
 x$dawn <- NA
 }
 x
}

str(d)
#d$date <- as.Date(d$date, format="%d/%m%/y")

#library(chron)
#tm <- times(as.character(d$clock))
#d$clock <- tm

# See what will happen. This call uses the default 'colTime'
bb <- by(d, d$date, f)
for(i in seq_along(bb))print(head(bb[[i]], 1))

# call and rbind. This call uses explicit arg 'colTime'
do.call(rbind, by(d, d$date, f, colTime="clock"))

# Alternatively, it could be, because 'bb' is already created,
do.call(rbind, bb)


In the code above, I use an optional conversion to date and time
classes; as.Date is part of base R, but class times needs package
chron.
It's a good idea to convert these variables, you can later use, say,
arithmetics on dates and times, such as differences.

Hope this helps,

Rui Barradas

Em 17-07-2012 19:53, Santiago Guallar escreveu:
 > Thank for your time, Rui.
 > Now, I get this error message:
 > Error en rbind(deparse.level, ...) :
 > numbers of columns of arguments do not match
 > Apparently, some columns have missing values and rbind doesn't
work. I
 > tried:
 > require(plyr)
 > do.call(rbind.fill, by(z, z$date, f))
 > Then the code runs through but dusk the variable dusk is missing and
 > dawn is filled with NA.
 > Just in case the problem simply lies in a name, this is your code
after
 > I changed the object names (basically 'x' and 'd' by 'z') to
adapt them
 > to the names of my dataset:
 > f <- function(z){
 > zrle <- rle(z$lig == 0)
 > if(zrle$values[1]){
 > idusk <- sum(zrle$lengths[1:2]) + 1
 > idawn <- zrle$lengths[1] + 1
 > z$dusk <- z$dtime[ idusk ]
 > z$dawn <- z$dtime[ idawn ]
 > }else{
 > idusk <- zrle$lengths[1] + 1
 > z$dusk <- z$dtime[ idusk ]
 > z$dawn <- NA
 > }
 > z
 > }
 >
 > do.call(rbind, by(z, z$date, f))
 > Again, I attached a dput() with the object z which contains my
dataset.
 > Santi
 >
 >*From:* Rui Barradas mailto:ruipbarra...@sapo.pt>>
 >*To:* Santiago Guallar mailto:sgual...@yahoo.com>>
 >*Cc:* r-help@r-project.org 
 >*Sent:* Tuesday, July 17, 2012 11:52 AM
 >*Subject:* Re: [R] Imposing more than one condition to if
 >
 >
 >Hello,
 >
 >My code couldn't find the right in

Re: [R] plotting points to a map

2012-07-18 Thread Sarah Goslee
Hi,

On Wed, Jul 18, 2012 at 2:11 PM, Gary Dong  wrote:
> Dear R users,
>
> I have a city map in shape file (polygon). I also have some points that I
> hope to plot them to the city map.

What projection is the shape file in? Have you successfully imported
it into R yet?

This page has a nice overview on dealing with shapefiles in R.

http://www.nceas.ucsb.edu/scicomp/usecases/ReadWriteESRIShapeFiles


> The only information I have about those
> points are their relative longitude and latitude to the city center by
> miles. Is there a way that R can help me to do this? Thanks.

What does "relative longitude and latitude in miles" mean? That makes
no sense to me. Do you mean distance and direction? (Which wouldn't be
latitude and longitude.)

If the latter, do you know the coordinates of the reference point?

I think we need more information to be able to help. You might also be
better served by asking on the r-sig-geo list instead of the main
R-help list.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotting points to a map

2012-07-18 Thread Gary Dong
Thanks, Sara.

For your first question:

Geographic Coordinate System: GCS_North_American_1983_HARN

Projected Coordinate System:
NAD_1983_HARN_StatePlane_Oregon_North_FIPS_3601_Feet_Intl

I have imported the shp file in R by read.shaplefiles().

For your second question:
1) by relative long and lat, I mean distances to the city center from
south-north and east-west directions.
2) I know the coordinate of the reference point.

Thanks.

Best
Gary


On Wed, Jul 18, 2012 at 11:30 AM, Sarah Goslee wrote:

> Hi,
>
> On Wed, Jul 18, 2012 at 2:11 PM, Gary Dong  wrote:
> > Dear R users,
> >
> > I have a city map in shape file (polygon). I also have some points that I
> > hope to plot them to the city map.
>
> What projection is the shape file in? Have you successfully imported
> it into R yet?
>
> This page has a nice overview on dealing with shapefiles in R.
>
> http://www.nceas.ucsb.edu/scicomp/usecases/ReadWriteESRIShapeFiles
>
>
> > The only information I have about those
> > points are their relative longitude and latitude to the city center by
> > miles. Is there a way that R can help me to do this? Thanks.
>
> What does "relative longitude and latitude in miles" mean? That makes
> no sense to me. Do you mean distance and direction? (Which wouldn't be
> latitude and longitude.)
>
> If the latter, do you know the coordinates of the reference point?
>
> I think we need more information to be able to help. You might also be
> better served by asking on the r-sig-geo list instead of the main
> R-help list.
>
> Sarah
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for to check outliers

2012-07-18 Thread Rui Barradas

Hello,

Inline

Em 18-07-2012 18:44, Nordlund, Dan (DSHS/RDA) escreveu:

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
project.org] On Behalf Of Sajeeka Nanayakkara
Sent: Wednesday, July 18, 2012 6:28 AM
To: r-help@r-project.org
Subject: [R] R code for to check outliers





  What is the R code to check whether data series have outliers or not?

Thanks,

Sajeeka Nanayakkara
[[alternative HTML version deleted]]


Sajeeka,

You have been given lots of good information and appropriate warnings.  Let me 
add another caveat to think about in the context of outliers/unusual values.  A 
value may only be unusual in a multivariate context.  If we have a dataset with 
human heights in it, a value of 73 inches would not be unusual.  If we then 
learned that this particular individual was female, it would be somewhat 
unusual but certainly within the realm of possibility.


Uma Thurman!


 If we then learn that the individual is 3 years old, it would be highly 
unusual.

So, you can see why people on the list are somewhat unwilling to say here is THE function 
"to check whether data series have outliers or not."

Now having said that, can you define what YOU mean by "outlier" and why you are 
concerned about finding them.  Someone may be able to offer advice that will help you 
achieve your goal.


Agreeing with what has being said and not wanting to misdirect no one, 
there's a function in the graphics package that gives outliers, boxplot. 
They are computed based on boxplot.stats so see


?boxplot.stats

in particular the parameter coef and it's default value.
See also the return values from both these functions.

Rui Barradas



Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trapping option settings

2012-07-18 Thread David A Vavra
Jim,

Thanks. 

It wasn't sure if merely overriding the options function by placing one in
the global environment would guarantee it would be the one actually called.
In any case, I didn't know how to identify the caller. This is quite helpful
and looks promising. I'll give it a try.

DAV




-Original Message-
From: jim holtman [mailto:jholt...@gmail.com] 
Sent: Monday, July 16, 2012 9:59 PM
To: David A Vavra
Cc: r-help@r-project.org
Subject: Re: [R] Trapping option settings

Here is one way by redefining 'options' so you can check for 'width'
and then call the 'options' in 'base':


> options <-  # define 'options' in Global
+ function(...)
+ {
+ args <- list(...)  # get arguments
+ if ('width' %in% names(args)){  # see if 'width' is in them
+ .caller <- sys.calls()  # get where called from
+ if (length(.caller) == 1)  # called from command line
+ .caller <- "Rgui"
+ else .caller <- as.character(.caller[[length(.caller) - 1]])[1]
+ cat("width being changed:", args[['width']], "Called from",
.caller, '\n')
+ }
+ base::options(...)  # call the real options
+ }
>
>
> options(width = 123) # test at command line
width being changed: 123 Called from Rgui
> f.myfunc <- function() options(width = 456)  # within a function
> f.myfunc()
width being changed: 456 Called from f.myfunc
>




On Mon, Jul 16, 2012 at 9:01 PM, David A Vavra  wrote:
> Something has been changing the setting the width option to 1 and not
> resetting it. It does this intermittently. What I would like to do is trap
> changing the setting so I can determine where the change is occurring and
> hopefully fix it.
>
> Is there any easy way to do this?
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trapping option settings

2012-07-18 Thread William Dunlap
Try using trace(), as in
  trace(options, quote(print(as.list(sys.calls()

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of David A Vavra
> Sent: Wednesday, July 18, 2012 1:00 PM
> To: 'jim holtman'
> Cc: r-help@r-project.org
> Subject: Re: [R] Trapping option settings
> 
> Jim,
> 
> Thanks.
> 
> It wasn't sure if merely overriding the options function by placing one in
> the global environment would guarantee it would be the one actually called.
> In any case, I didn't know how to identify the caller. This is quite helpful
> and looks promising. I'll give it a try.
> 
> DAV
> 
> 
> 
> 
> -Original Message-
> From: jim holtman [mailto:jholt...@gmail.com]
> Sent: Monday, July 16, 2012 9:59 PM
> To: David A Vavra
> Cc: r-help@r-project.org
> Subject: Re: [R] Trapping option settings
> 
> Here is one way by redefining 'options' so you can check for 'width'
> and then call the 'options' in 'base':
> 
> 
> > options <-  # define 'options' in Global
> + function(...)
> + {
> + args <- list(...)  # get arguments
> + if ('width' %in% names(args)){  # see if 'width' is in them
> + .caller <- sys.calls()  # get where called from
> + if (length(.caller) == 1)  # called from command line
> + .caller <- "Rgui"
> + else .caller <- as.character(.caller[[length(.caller) - 1]])[1]
> + cat("width being changed:", args[['width']], "Called from",
> .caller, '\n')
> + }
> + base::options(...)  # call the real options
> + }
> >
> >
> > options(width = 123) # test at command line
> width being changed: 123 Called from Rgui
> > f.myfunc <- function() options(width = 456)  # within a function
> > f.myfunc()
> width being changed: 456 Called from f.myfunc
> >
> 
> 
> 
> 
> On Mon, Jul 16, 2012 at 9:01 PM, David A Vavra  wrote:
> > Something has been changing the setting the width option to 1 and not
> > resetting it. It does this intermittently. What I would like to do is trap
> > changing the setting so I can determine where the change is occurring and
> > hopefully fix it.
> >
> > Is there any easy way to do this?
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> --
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mean of matched data

2012-07-18 Thread Peter Ehlers

On 2012-07-18 04:27, Rui Barradas wrote:

Helo,

All problems should be easy.


d <- read.table(text="
gene variable value gender line rep
1 CG1 X208.F1.30456 4.758010 Female 208 1
2 CG1 X365.F2.30478 4.915395 Female 365 2
3 CG1 X799.F2.30509 4.641636 Female 799 2
4 CG1 X306.M2.32650 4.550676 Male 306 2
5 CG1 X712.M2.30830 4.633811 Male 712 2
6 CG1 X732.M2.30504 4.857564 Male 732 2
7 CG1 X707.F1.31120 5.104165 Female 707 1
8 CG1 X514.F2.30493 4.730814 Female 514 2
", header=TRUE)

# See what we have
str(d)

# or put function(x) ...etc... in the aggregate
f <- function(x) c(mean=mean(x), sd=sd(x))
aggregate(value ~ gene + gender, data = d, f)


Hope this helps,

Rui Barradas


I read the request a bit differently; we can use ave()
to generate the requested new variables:

d1 <- transform(d,
   MN = ave(value, gene, gender),
   SD = ave(value, gene, gender, FUN = sd))

Or use within() instead of transform().

Peter Ehlers


Em 18-07-2012 10:54, robgriffin247 escreveu:

Hi
I think/hope there will be a simple solution to this but google-ing has
provided no answers (probably not using the right words)

I have a long data frame of >2 000 000 rows, and 6 columns. Across this
there are 24 000 combinations of gene in a column (n=12000) and gender in a
column (n=2... obviously). I want to create 2 new columns in the data frame
that on each row gives, in one column the mean value (of gene expression, in
the column called "value") for that row's gene&gender combination, and in
the other column the standard deviation for the gene&gender combination.

Any suggestions?

Rob

Example of the top of the data frame:

genevariablevalue   gender  linerep
1   CG1 X208.F1.30456   4.758010Female  208 1
2   CG1 X365.F2.30478   4.915395Female  365 2
3   CG1 X799.F2.30509   4.641636Female  799 2
4   CG1 X306.M2.32650   4.550676Male306 2
5   CG1 X712.M2.30830   4.633811Male712 2
6   CG1 X732.M2.30504   4.857564Male732 2
7   CG1 X707.F1.31120   5.104165Female  707 1
8   CG1 X514.F2.30493   4.730814Female  514 2

--
View this message in context: 
http://r.789695.n4.nabble.com/Mean-of-matched-data-tp4636856.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] alternate tick labels and tick marks with lattice xyplot

2012-07-18 Thread John Kane
Oh, I clearly misunderstood what you were doing there.  I don't know anything 
about Manhattan plots but a quick google for "manhattan plot r package" turns 
up a number of items so the type of plot you want may already exist.  

Sorry to not be of more help.

John Kane
Kingston ON Canada


> -Original Message-
> From: lmpr...@gmail.com
> Sent: Wed, 18 Jul 2012 14:02:49 -0400
> To: jrkrid...@inbox.com
> Subject: Re: [R] alternate tick labels and tick marks with lattice xyplot
> 
> Yes,
> 
> I would be interested in both the ggplot2 and lattice ways of doing
> this. Unfortunately, I am not interested in creating a panel for each
> chromosome. Actually, I would like to create a Manhattan plot using
> xyplot. Thus I would need to alternate tick marks and tick labels.
> 
> Thanks!
> 
> 
> 
> 
> On Mon, Jul 16, 2012 at 12:11 PM, John Kane  wrote:
>> 
>> I have not seen any response yet so I thought I would reply.
>> 
>> No idea of how to do this in lattice but an approximation of it can be
>> done in ggplot2. I am trying to learn ggplot2 and it was a handy
>> exercise.  I still have not figured out how to get the extra line on the
>> x-axis, hence the lines in the graph body instead
>> 
>> Example:
>> ##++###
>> 
>> library(ggplot2)
>> data  <-  structure(list(Chromosome = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
>> 3), BasePair = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), Pvalue =
>> c(0.819178665755317,
>> 0.462827404495329, 0.44360001408495, 0.871020796708763,
>> 0.404167180880904,
>> 0.115009917411953, 0.51048326632008, 0.292681957129389,
>> 0.839718346949667,
>> 0.586992112919688, 0.29609977430664, 0.873988818377256), indice = 1:12,
>> group = c("Group 1", "Group 2", "Group 1", "Group 1", "Group 1",
>> "Group 1", "Group 2", "Group 1", "Group 2", "Group 1", "Group 2",
>> "Group 2")), .Names = c("Chromosome", "BasePair", "Pvalue",
>> "indice", "group"), row.names = c(NA, -12L), class = "data.frame")
>> 
>> library(ggplot2)
>> 
>> p  <-  ggplot(data, aes(indice, -log10(Pvalue))) + geom_line()  +
>>   opts(legend.position = "none") +
>>   scale_y_continuous(expression(paste(-log[10], "p-value"))) +
>>   scale_x_continuous("Chromosome", breaks=c(2.5, 6.5 ,10.5),
>> labels=c("1", "2","3")) +
>>   geom_segment(aes(x = 4, y = 0.01, xend = 9, yend = 0.01,
>> colour = group))  +
>>   opts(title = "Results")  + facet_grid(. ~ group)
>> p
>> 
>> ##===##
>> John Kane
>> Kingston ON Canada
>> 
>> 
>>> -Original Message-
>>> From: lmpr...@gmail.com
>>> Sent: Fri, 13 Jul 2012 15:33:43 -0400
>>> To: r-help@r-project.org
>>> Subject: [R] alternate tick labels and tick marks with lattice xyplot
>>> 
>>> Hi,
>>> 
>>> I would like to use xyplot to create a figure. Unfortunately, I cannot
>>> find
>>> documentation in xyplot to specify alternating the x-axis tick labels
>>> with
>>> the x-axis tick marks. I can do this with the regular R plot function
>>> as
>>> follows.
>>> 
>>> 
>>> #A small version of my data looks like this
>>> data<-data.frame(matrix(ncol=3,nrow=12))
>>> data[,1]<-rep(c(1,2,3),c(4,4,4))
>>> data[,2]<-rep(c(1,2,3,4),3)
>>> data[,3]<-runif(12,0,1)
>>> names(data)<-c("Chromosome", "BasePair", "Pvalue")
>>> #using R's plot function, I would place the the chromosome label
>>> between
>>> the
>>> #tick marks as follows:
>>> v1<-c(4,8)
>>> v2<-c(2,6,10)
>>> data$indice<-seq(1:12)
>>> plot(data$indice, -log10(data$Pvalue), type="l", xaxt="n",
>>> main="Result",
>>>  xlab="Chromosome", ylab=expression(paste(-log[10]," p-value")))
>>> axis(1, v1,labels=FALSE )
>>> axis(1, v2, seq(1:3), tick=FALSE, cex.axis=.6)
>>> 
>>> Can this be done with lattice xyplot?
>>> 
>>> 
>>> --
>>> Leah Preus
>>> Biostatistician
>>> Roswell Park Cancer Institute
>>> 
>>>   [[alternative HTML version deleted]]
>>> 
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at
>> http://www.inbox.com/smileys
>> Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™
>> and most webmails
>> 
>> 
> 
> 
> 
> --
> Leah Preus
> Biostatistician
> Roswell Park Cancer Institute


GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at 
http://www.inbox.com/smileys
Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and most 
webmails

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and 

[R] information

2012-07-18 Thread Karan Anand
hi,
   i am new to using  R, so if u can  help me in merging my csv file having
different sheets .if u can help me with the commands for it.


regards
karan

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] double for cycle

2012-07-18 Thread cesare orsini
Hi nice people,

i am trying to made a double "for" cycle, i wish that for cycle on k
activates t times  the for cycle on s 

the first cycle is this:

s<-rep(1,10)
s
 [1] 1 1 1 1 1 1 1 1 1 1

> for (n in 2:10) {
+ s[n]=1+s[n]+rnorm(1,mean=0,sd=1)
+ }
> s
 [1] 1 4.75 4.86 4.05 4.09 4.56 4.63 4.65 4.12 4.01

now i wish another cycle that activates t times the before cycle gives me a
matrix with t columns and 10 rows

like for example if t=3

1  1   1
4.75  4.56 4.13
4.86  4.12 4.58
4.05  4.17 4.78
4.09  4.44 4.15
4.56  4.80 4.15
4.63  4.56 4.11
4.65  4.22 4.25
4.12  4.55 4.56
4.01  4.25 4.36

how can i do?
thank you for avalaibility!


--
View this message in context: 
http://r.789695.n4.nabble.com/double-for-cycle-tp4636916.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] conditional increase by increment

2012-07-18 Thread penguins
I am trying to assign increasing trip numbers to a binary variable ("land";
1=home and 0=away) where a string of 1's shouldn't increment the trip_no
more than once. 

For example; based on 
land<-c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,0)
the "trip_no" sequence produced should be   1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3

This is as far as I can get but Im stumped. In addition I need it to work on
data where the land variable can start on "0" or "1" for trip_no=1. Any help
would be hugely appreciated:

land<-c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,0)
trip_no <- rep(0, length(land))
gg<-cbind(land,trip_no)

increment <- function(x){
  eval.parent(substitute(x <- x + 1))
}

for(i in length(gg)){
  if(gg$land[[i]]==1) {
  gg$trip_no<-increment(trip_no[i])
}
 }

--
View this message in context: 
http://r.789695.n4.nabble.com/conditional-increase-by-increment-tp4636910.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with correlation matrices, thresholding

2012-07-18 Thread Drinniol
Thank you!  I had to make some modifications since my data is between a data
set and a subset not one data set to itself, but I was able to use
essentially your method to get it working.  

I did have some trouble with the matrix command on the last line - it kept
returning lots of NAs.  So I just did it this way:

data.frame(colnames(x)[i[,2]], rownames(x)[i[,1]], value = x[i]) 

@Bert
I'll definitely check out the Bioconductor stuff - there's a lot of it!

Thanks again for the help.  

--
View this message in context: 
http://r.789695.n4.nabble.com/Help-with-correlation-matrices-thresholding-tp4636697p4636893.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Subsetting problem data

2012-07-18 Thread Lib Gray
Hello, I need to subset my data to only look at the parts that have "holes"
in it. I already have a formula to get rid of inconsistencies, but now I
need to look only at the problem data to reconfigure it. In my data set
where there are multiple "cycles" per "patient," and I want to highlight
the patients who have a variable was not measured every cycle.

Here's a similar example of the data:

Patient, Cycle, Variable1, Variable 2
A, 1, 4, 5
A, 2, 3, 3
A, 3, 4, NA
B, 1, 6, 6
B, 2, NA, 6
C, 1, 6, 5
C, 3, 2, 2

So in this case, I would want Patient A and Patient B, but not Patient C.

Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] convert deldir$delsgs to a X3D IndexedTriangleSet

2012-07-18 Thread Erdal Karaca
Anyone knows how to convert a deldir$delsgs to a X3D IndexedTriangleSet?
Are there already any functions/packages?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] The best solver for non-smooth functions?

2012-07-18 Thread Cren
# Hi all,

# consider the following code (please, run it:
# it's fully working and requires just few minutes
# to finish):

require(CreditMetrics)
require(clusterGeneration)
install.packages("Rdonlp2", repos= c("http://R-Forge.R-project.org";,
getOption("repos")))
install.packages("Rsolnp2", repos= c("http://R-Forge.R-project.org";,
getOption("repos")))
require(Rdonlp2)
require(Rsolnp)
require(Rsolnp2)

N <- 3
n <- 10
r <- 0.0025
ead <- rep(1/3,3)  
rc <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D")
lgd <- 0.99
rating <- c("BB", "BB", "BBB")  
firmnames <- c("firm 1", "firm 2", "firm 3")
alpha <- 0.99

# One year empirical migration matrix from Standard & Poor's website

rc <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC", "D")
M <- matrix(c(90.81,  8.33,  0.68,  0.06,  0.08,  0.02,  0.01,   0.01,
  0.70, 90.65,  7.79,  0.64,  0.06,  0.13,  0.02,   0.01,
  0.09,  2.27, 91.05,  5.52,  0.74,  0.26,  0.01,   0.06,
  0.02,  0.33,  5.95, 85.93,  5.30,  1.17,  1.12,   0.18,
  0.03,  0.14,  0.67,  7.73, 80.53,  8.84,  1.00,   1.06,
  0.01,  0.11,  0.24,  0.43,  6.48, 83.46,  4.07,   5.20,
  0.21, 0,  0.22,  1.30,  2.38, 11.24, 64.86,  19.79,
  0, 0, 0, 0, 0, 0, 0, 100
)/100, 8, 8, dimnames = list(rc, rc), byrow = TRUE)

# Correlation matrix

rho <- rcorrmatrix(N) ; dimnames(rho) = list(firmnames, firmnames)

# Credit Value at Risk

cm.CVaR(M, lgd, ead, N, n, r, rho, alpha, rating)

# Risk neutral yield rates

Y <- cm.cs(M, lgd)
y <- c(Y[match(rating[1],rc)], Y[match(rating[2],rc)],
Y[match(rating[3],rc)]) ; y

# The function to be minimized

sharpe <- function(w) {
  - (t(w) %*% y) / cm.CVaR(M, lgd, ead, N, n, r, rho, alpha, rating)
}

# The linear constraints

constr <- function(w) {
  sum(w)
}

# Results' matrix (it's empty by now)

Results <- matrix(NA, nrow = 3, ncol = 4)
rownames(Results) <- list('donlp2', 'solnp', 'solnp2')
colnames(Results) <- list('w_1', 'w_2', 'w_3', 'Sharpe')

# See the differences between different solvers

rho
Results[1,1:3] <- round(donlp2(fn = sharpe, par = rep(1/N,N), par.lower =
rep(0,N), par.upper = rep(1,N), A = t(rep(1,N)), lin.lower = 1, lin.upper =
1)$par, 2)
Results[2,1:3] <- round(solnp(pars = rep(1/N,N), fun = sharpe, eqfun =
constr, eqB = 1, LB = rep(0,N), UB = rep(1,N))$pars, 2)
Results[3,1:3] <- round(solnp2(par = rep(1/N,N), fun = sharpe, eqfun =
constr, eqB = 1, LB = rep(0,N), UB = rep(1,N))$pars, 2)
for(i in 1:3) {
  Results[i,4] <- abs(sharpe(Results[i,1:3]))  
}
Results

# In fact the "sharpe" function I previously defined
# is not smooth because of the cm.CVaR function.
# If you change correlation matrix, ratings or yields
# you see how different solvers produce different
# parameters estimation.

# Then the main issue is: how may I know which is the
# best solver at all to deal with non-smooth functions
# such as this one?

--
View this message in context: 
http://r.789695.n4.nabble.com/The-best-solver-for-non-smooth-functions-tp4636934.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] alternate tick labels and tick marks with lattice xyplot

2012-07-18 Thread Leah Marian
Yes,

I would be interested in both the ggplot2 and lattice ways of doing
this. Unfortunately, I am not interested in creating a panel for each
chromosome. Actually, I would like to create a Manhattan plot using
xyplot. Thus I would need to alternate tick marks and tick labels.

Thanks!




On Mon, Jul 16, 2012 at 12:11 PM, John Kane  wrote:
>
> I have not seen any response yet so I thought I would reply.
>
> No idea of how to do this in lattice but an approximation of it can be done 
> in ggplot2. I am trying to learn ggplot2 and it was a handy exercise.  I 
> still have not figured out how to get the extra line on the x-axis, hence the 
> lines in the graph body instead
>
> Example:
> ##++###
>
> library(ggplot2)
> data  <-  structure(list(Chromosome = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
> 3), BasePair = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), Pvalue = 
> c(0.819178665755317,
> 0.462827404495329, 0.44360001408495, 0.871020796708763, 0.404167180880904,
> 0.115009917411953, 0.51048326632008, 0.292681957129389, 0.839718346949667,
> 0.586992112919688, 0.29609977430664, 0.873988818377256), indice = 1:12,
> group = c("Group 1", "Group 2", "Group 1", "Group 1", "Group 1",
> "Group 1", "Group 2", "Group 1", "Group 2", "Group 1", "Group 2",
> "Group 2")), .Names = c("Chromosome", "BasePair", "Pvalue",
> "indice", "group"), row.names = c(NA, -12L), class = "data.frame")
>
> library(ggplot2)
>
> p  <-  ggplot(data, aes(indice, -log10(Pvalue))) + geom_line()  +
>   opts(legend.position = "none") +
>   scale_y_continuous(expression(paste(-log[10], "p-value"))) +
>   scale_x_continuous("Chromosome", breaks=c(2.5, 6.5 ,10.5), 
> labels=c("1", "2","3")) +
>   geom_segment(aes(x = 4, y = 0.01, xend = 9, yend = 0.01, colour = 
> group))  +
>   opts(title = "Results")  + facet_grid(. ~ group)
> p
>
> ##===##
> John Kane
> Kingston ON Canada
>
>
> > -Original Message-
> > From: lmpr...@gmail.com
> > Sent: Fri, 13 Jul 2012 15:33:43 -0400
> > To: r-help@r-project.org
> > Subject: [R] alternate tick labels and tick marks with lattice xyplot
> >
> > Hi,
> >
> > I would like to use xyplot to create a figure. Unfortunately, I cannot
> > find
> > documentation in xyplot to specify alternating the x-axis tick labels
> > with
> > the x-axis tick marks. I can do this with the regular R plot function as
> > follows.
> >
> >
> > #A small version of my data looks like this
> > data<-data.frame(matrix(ncol=3,nrow=12))
> > data[,1]<-rep(c(1,2,3),c(4,4,4))
> > data[,2]<-rep(c(1,2,3,4),3)
> > data[,3]<-runif(12,0,1)
> > names(data)<-c("Chromosome", "BasePair", "Pvalue")
> > #using R's plot function, I would place the the chromosome label between
> > the
> > #tick marks as follows:
> > v1<-c(4,8)
> > v2<-c(2,6,10)
> > data$indice<-seq(1:12)
> > plot(data$indice, -log10(data$Pvalue), type="l", xaxt="n", main="Result",
> >  xlab="Chromosome", ylab=expression(paste(-log[10]," p-value")))
> > axis(1, v1,labels=FALSE )
> > axis(1, v2, seq(1:3), tick=FALSE, cex.axis=.6)
> >
> > Can this be done with lattice xyplot?
> >
> >
> > --
> > Leah Preus
> > Biostatistician
> > Roswell Park Cancer Institute
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> 
> GET FREE SMILEYS FOR YOUR IM & EMAIL - Learn more at 
> http://www.inbox.com/smileys
> Works with AIM®, MSN® Messenger, Yahoo!® Messenger, ICQ®, Google Talk™ and 
> most webmails
>
>



--
Leah Preus
Biostatistician
Roswell Park Cancer Institute

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] The best solver for non-smooth functions?

2012-07-18 Thread Cren
# Whoops! I have just seen there's a little mistake
# in the 'sharpe' function, because I had to use
# 'w' array instead of 'ead' in the cm.CVaR function!
# This does not change the main features of my,
# but you should be aware of it

---

# The function to be minimized

sharpe <- function(w) {
  - (t(w) %*% y) / cm.CVaR(M, lgd, ead, N, n, r, rho, alpha, rating)
} 

# This becomes...

sharpe <- function(w) {
  - (t(w) %*% y) / cm.CVaR(M, lgd, w, N, n, r, rho, alpha, rating)
} 

# ...substituting 'ead' with 'w'.

--
View this message in context: 
http://r.789695.n4.nabble.com/The-best-solver-for-non-smooth-functions-tp4636934p4636936.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mean of matched data

2012-07-18 Thread robgriffin247
Thanks,
in a way this has worked... with a slight modification to this:
  
   
narrow3<-aggregate(narrow2$value~narrow2$gene+narrow2$gender,data=narrow2,mean)
   
narrow4<-aggregate(narrow2$value~narrow2$gene+narrow2$gender,data=narrow2,sd)

which gives a table of the 24000 gene&gender means (narrow3) and the
standard deviations (narrow4)

which I then merge in to one df using

narrow5<-merge(narrow3,narrow4,by=c("narrow2$gene","narrow2$gender"))
colnames(narrow5)<-c("gene","gender","mean","sd")


Is there a way I can lift the mean and std.dev. values from data frame
narrow5 and paste them to the original narrow2 df? In effect... R would read
what gene and gender each row of narrow2 has & then paste in the according
mean value in to a new column. then do the same for a new sd column. each
mean/sd value would occur in the new column 80 times (there are 80
occurrences of each gene&gender combination).

rob

--
View this message in context: 
http://r.789695.n4.nabble.com/Mean-of-matched-data-tp4636856p4636871.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NTH PERCENTILE COULMNWIESE

2012-07-18 Thread arun
Hi
Try this:

mydat <-read.table(text="
ABC    XYZ
12  6
6    50
90    100
55  85
100    25
",sep="",header=TRUE)
apply(mydat,2,quantile,probs=0.2)
 ABC  XYZ 
10.8 21.2 
A.K.




- Original Message -
From: Rantony 
To: r-help@r-project.org
Cc: 
Sent: Wednesday, July 18, 2012 3:06 AM
Subject: [R] NTH PERCENTILE COULMNWIESE

Hi,

Here i have an matrix
mydat <- 
ABC    XYZ
-     --
12          6
6             50
90         100
55           85
100         25

i need to find the " NTH percentile "  [result should be in column-wise].
here i have a code for NTH percenile,

For eg:- if i need 20th-percentile then,
quantile(ncol(mydat),.2)  - here i getting output for complete matrix ,But i
need this for all the columnswise. like this,

for nth percentile
          ABC     XYZ
          --    ---
20%   10        24    [here, given percentile value is not exact result, its
just for output format]

-Its urgent !

- Thanks 
Antony.


--
View this message in context: 
http://r.789695.n4.nabble.com/NTH-PERCENTILE-COULMNWIESE-tp4636839.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] duplicate data between two data frames according to row names

2012-07-18 Thread arun


Hi,

You could use "merge", "join" etc.
merge(DF3,DF2)
 # station lat lon data
#1   ST002  41   2    3
#2   ST003  42   3    7
library(plyr)
 join(DF3,DF2,type="inner")
Joining by: station
#  station lat lon data
#1   ST002  41   2    3
#2   ST003  42   3    7

#or
join(DF3,DF2,type="right")


Hope this helps
A.K.



- Original Message -
From: jeff6868 
To: r-help@r-project.org
Cc: 
Sent: Wednesday, July 18, 2012 4:21 AM
Subject: [R] duplicate data between two data frames according to row names

Hi everybody.

I'll first explain my problem and what I'm trying to do. 
Admit this example:
I'm working on 5 different weather stations.
I have first in one file 3 of these 5 weather stations, containing their
data. Here's an example of this file:

DF1 <- data.frame(station=c("ST001","ST004","ST005"),data=c(5,2,8))

And my two other stations in this other data.frame:

DF2 <- data.frame(station=c("ST002","ST003"),data=c(3,7))

I would like to add geographical coordinates of these weather stations
inside these two data.frames, according to the number of the weather
station.

All of my geographical coordinates for each of the 5 weather stations are
inside another data frame:

DF3 <-
data.frame(station=c("ST001","ST002","ST003","ST004","ST005"),lat=c(40,41,42,43,44),lon=c(1,2,3,4,5))

My question is: how can I put automatically these geographical coordinates
inside my first 2 data frames, according to the number of the weather
station?

For this example, the first two data frames DF1 and DF2 should become:

DF1 <-
data.frame(station=c("ST001","ST004","ST005"),lat=c(40,43,44),lon=c(1,4,5),data=c(5,2,8))
and
DF2 <-
data.frame(station=c("ST002","ST003"),lat=c(41,42),lon=c(2,3),data=c(3,7))

I need to automatize this method because my real dataset contains 70 weather
stations, and each file contains other (or same sometimes) stations , but
each station can be found in the list of the coordinates file (DF3).

Is there any way or any function able to do this kind of thing?

Thank you very much!




--
View this message in context: 
http://r.789695.n4.nabble.com/duplicate-data-between-two-data-frames-according-to-row-names-tp4636845.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] System Invocation Error - Cspade

2012-07-18 Thread sdoberman
I am trying to run the cspade function in the arulesSequences package in R.
After I successfully read in my transactions using read_baskets, I try to
execute the cspade function against the transactions object I read in. 

However, when I execute the command, I obtain an error: system invocation
failed. Specifically, here is the output. 

preprocessing ... 1 partition(s), 1.2 MB [0.23s] 
mining transactions ...Error in cspade(table, parameter = list(support =
0.1), control = list  (verbose = TRUE)) :  
system invocation failed 

 The presence of "mining transactions" indicates that the following function
call in the cspade code is failing. 

if (system2(file.path(exe, "spade"), args = c("-i", file,  
   "-s", parameter@support, opt, "-e", nop, "-o"), stdout = out))  
stop("system invocation failed").  

 For reference, I can successfully generate sequences using the example zaki
dataset.

Does anyone have any idea why that command might be failing?

Thanks,

Stewart


--
View this message in context: 
http://r.789695.n4.nabble.com/System-Invocation-Error-Cspade-tp4636899.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mean of matched data

2012-07-18 Thread robgriffin247
got it... another merge did the trick

narrow6<-merge(narrow2,narrow5,by=c("gene","gender"))

Thanks for the help Rui

--
View this message in context: 
http://r.789695.n4.nabble.com/Mean-of-matched-data-tp4636856p4636877.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Variable labels

2012-07-18 Thread Francois Maurice
Thanks a lot, it works fine with me !!


François Maurice

De : "Nutter, Benjamin" 
À : Francois Maurice ; r-help@r-project.org 
Envoyé le : mercredi 18 juillet 2012 8h21
Objet : RE: [R] Variable labels

I have my own function for doing this that is similar to the one presented 
below.  Others may have other ideas that work better.  As a general rule, I 
would caution against writing out just the label without the variable name.  
The only reason I see to separate the labels and names is if you are generating 
a report (and using write.table wouldn't be my first choice for doing that).  
However, the function below at least allows you to decide when you call it.  If 
you want to exclude the names, just set names=FALSE.


#*** possibly buggy.  I didn't do much testing on this function.
write.table.with.labels <- function(data, file, names=TRUE, labels=TRUE, 
append=FALSE, ...){
  if (names) write.table(t(names(data)), file, col.names=FALSE, ...)
  if (labels) write.table(t(label(data)), file, col.names=FALSE, 
append=any(c(names, append)), ...)
  write.table(data, file, col.names=FALSE, append=TRUE, any(c(names, labels, 
append)), ...)
}

label(mtcars$mpg) <- "Miles per gallon"

write.table.with.labels(mtcars, "test.csv", sep=",")



  Benjamin Nutter |  Biostatistician    |  Quantitative Health Sciences
  Cleveland Clinic   |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 
445-1365


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Francois Maurice
Sent: Tuesday, July 17, 2012 4:36 PM
To: r-help@r-project.org
Subject: [R] Variable labels



Hi,
 
I'm using write.table() to export dataframe to Excel. All works fine except 
that I want to export the variable labels instead of variable names.

 I can see the labels in the R consol using attr(), but I just don't know how 
to use the labels instead of the names.

Thanks,

François Maurice
    [[alternative HTML version deleted]]


===


Please consider the environment before printing this e-mail

Cleveland Clinic is ranked one of the top hospitals in America by U.S.News & 
World Report (2010).  
Visit us online at http://www.clevelandclinic.org/ for a complete listing of 
our services, staff and locations.


Confidentiality Note:  This message is intended for us...{{dropped:14}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] taylor expansions with real vectors

2012-07-18 Thread bilelsan
Dear list, 

I have a big deal concerning the development of a Taylor expansion. 

require(Matrix) 
e1 <- as.vector(1:5) 
e2 <- as.vector(6:10) 

in order to obtain all the combinations between these two vectors following
a Taylor expansion (or more simply through a Maclaurin series) for real
numbers. 
We have f(x) = f(0) + f'(0)(x-0) + f''(0)(x-0)^2/2! + … + f^(k)(0)(x-0)^k/k! 
with 
f(x) = e1 + e2 for Taylor expansion (r = 1) 
+ 1/2!*e1^2 + 1/2!*e2^2 + 1/2!*e1*e2 for Taylor expansion (r = 2)
excluding e2*e1 
+ 1/3!*e1^3 + 1/3!*e1^2*e2 + 1/3!*e2^2*e1 + 1/3!*e2^3 for Taylor
expansion (r = 3) excluding e2*e1^2 and e1*e2^2 
   ... 
I already write the number of possible combinations as : 
x <- as.vector(0) 
for (r in 1:r){x[r] <- 2*(sum(choose(2*q+r-1,r))-sum(choose(q+r-1,r)))}# q:
number of lag of e1 and e2; r: order of taylor expansion 
nstar   <- sum(x) # N* number of total combinations 
  
How to write f(x) in a general framework? 
Quid of this framework when e1 and e2 are completed with their lags if q >
1? 
Your help or advice would be greatly appreciated 

Bilel

--
View this message in context: 
http://r.789695.n4.nabble.com/RE-taylor-expansions-with-real-vectors-tp4636886.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R code for to check outliers

2012-07-18 Thread arun
HI,

Check this link:
http://stackoverflow.com/questions/1444306/how-to-use-outlier-tests-in-r-code

Hope it would be helpful.
A.K.



- Original Message -
From: Sajeeka Nanayakkara 
To: "r-help@r-project.org" 
Cc: 
Sent: Wednesday, July 18, 2012 9:27 AM
Subject: [R] R code for to check outliers





 What is the R code to check whether data series have outliers or not?

Thanks,

Sajeeka Nanayakkara
    [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Imposing more than one condition to if

2012-07-18 Thread Santiago Guallar
Nice! It works. Thank you, Rui
 
There's something that takes me back to the original question, though. The code 
takes the first value that meets the critical condition (light ==0); however, 
this value can appear more than once a day because these animals nest in dark 
caves, and once they are in the logger sensor registers a 0. So, a second 
condition limiting the time at which the critical condition can be accepted to 
calculate dusk should be implemented; something like (although R doesn't allow 
to impose the double condition):
 
if (05:00:00 17:00:00) {zrle <- rle(x$lig == 0)}
 
 
Santi

From: Rui Barradas 
>To: Santiago Guallar  
>Cc: r-help@r-project.org 
>Sent: Wednesday, July 18, 2012 11:37 AM
>Subject: Re: [R] Imposing more than one condition to if
>
>hELLO,
>
>There was no nedd to change the names of the variables inside the 
>fucntion. What was going on s that in this new file column 'dtime' 
>doesn't exist. Since column 'clock' exists in all files, I've changed 
>the function once again, making it more general.
>
>Note that there is an argument 'colTime' with a default value. In the 
>function's use below I call it with and without that argument.
>
>
>f <- function(x, colTime="clock"){
>    zrle <- rle(x$lig == 0)
>    if(zrle$values[1]){
>        idusk <- sum(zrle$lengths[1:2]) + 1
>        idawn <- zrle$lengths[1] + 1
>        x$dusk <- x[ idusk , colTime ]
>        x$dawn <- x[ idawn , colTime ]
>    }else{
>        idusk <- zrle$lengths[1] + 1
>        x$dusk <- x[ idusk , colTime ]
>        x$dawn <- NA
>    }
>    x
>}
>
>str(d)
>#d$date <- as.Date(d$date, format="%d/%m%/y")
>
>#library(chron)
>#tm <- times(as.character(d$clock))
>#d$clock <- tm
>
># See what will happen. This call uses the default 'colTime'
>bb <- by(d, d$date, f)
>for(i in seq_along(bb))    print(head(bb[[i]], 1))
>
># call and rbind. This call uses explicit arg 'colTime'
>do.call(rbind, by(d, d$date, f, colTime="clock"))
>
># Alternatively, it could be, because 'bb' is already created,
>do.call(rbind, bb)
>
>
>In the code above, I use an optional conversion to date and time 
>classes; as.Date is part of base R, but class times needs package chron. 
>It's a good idea to convert these variables, you can later use, say, 
>arithmetics on dates and times, such as differences.
>
>Hope this helps,
>
>Rui Barradas
>
>Em 17-07-2012 19:53, Santiago Guallar escreveu:
>> Thank for your time, Rui.
>> Now, I get this error message:
>> Error en rbind(deparse.level, ...) :
>> numbers of columns of arguments do not match
>> Apparently, some columns have missing values and rbind doesn't work. I
>> tried:
>> require(plyr)
>> do.call(rbind.fill, by(z, z$date, f))
>> Then the code runs through but dusk the variable dusk is missing and
>> dawn is filled with NA.
>> Just in case the problem simply lies in a name, this is your code after
>> I changed the object names (basically 'x' and 'd' by 'z') to adapt them
>> to the names of my dataset:
>> f <- function(z){
>> zrle <- rle(z$lig == 0)
>> if(zrle$values[1]){
>> idusk <- sum(zrle$lengths[1:2]) + 1
>> idawn <- zrle$lengths[1] + 1
>> z$dusk <- z$dtime[ idusk ]
>> z$dawn <- z$dtime[ idawn ]
>> }else{
>> idusk <- zrle$lengths[1] + 1
>> z$dusk <- z$dtime[ idusk ]
>> z$dawn <- NA
>> }
>> z
>> }
>>
>> do.call(rbind, by(z, z$date, f))
>> Again, I attached a dput() with the object z which contains my dataset.
>> Santi
>>
>>    *From:* Rui Barradas 
>>    *To:* Santiago Guallar 
>>    *Cc:* r-help@r-project.org
>>    *Sent:* Tuesday, July 17, 2012 11:52 AM
>>    *Subject:* Re: [R] Imposing more than one condition to if
>>
>>
>>        Hello,
>>
>>        My code couldn't find the right input columns because your real
>>        data has
>>        different names, it could only find the example dataset's names.
>>
>>        And there's another problem, my code would give correct answers
>>        with a
>>        limted number of possible inputs and fail with real data.
>>
>>        Corrected:
>>
>>
>>        f <- function(x){
>>              zrle <- rle(x$lig == 0)
>>              if(zrle$values[1]){
>>                  idusk <- sum(zrle$lengths[1:2]) + 1
>>                  idawn <- zrle$lengths[1] + 1
>>                  x$dusk <- x$dtime[ idusk ]
>>                  x$dawn <- x$dtime[ idawn ]
>>              }else{
>>                  idusk <- zrle$lengths[1] + 1
>>                  x$dusk <- x$dtime[ idusk ]
>>                  x$dawn <- NA
>>              }
>>              x
>>        }
>>
>>        do.call(rbind, by(d, d$date, f))
>>
>>
>>        One more thing, you are reading your dataset into a data.frame
>>        forgetting that character strings become factors. Try str(d) to
>>        see it.
>>        ('d' is the data.frame.) You could/should coerce the date/time
>>        values to
>>        appropriate classes, something like
>>
>>
>>        d$time <- as.character(d$time)
>>        d$time <- as.POSIXct(d$time, format="%d/%m/%y %H:%M:%S")
>>        d$date <- as.character(d$date)
>>   

Re: [R] plotting points to a map

2012-07-18 Thread Barry Rowlingson
On Wed, Jul 18, 2012 at 7:41 PM, Gary Dong  wrote:

> For your second question:
> 1) by relative long and lat, I mean distances to the city center from
> south-north and east-west directions.
> 2) I know the coordinate of the reference point.

 The problem here is that going 10 miles north along a line of
longitude then going 5 miles east along a line of latitude doesn't get
you to the same point as first going 5 miles east along a line of
latitude and then going 10 miles north on the line of longitude you
reach. If you don't believe me, try doing it 12 miles due south of the
north pole.

 To be precise, if someone tells you X miles north and Y miles east of
point Z, then they have to specify the projected coordinate system
they are talking about - this should be an approximation to a square
grid valid to a certain precision.

 Now maybe the projected coord system you mentioned is the right one
for your points. Maybe your city map is already in that projection.
When you plot the city shapefile, are the axis values in lat-long
degrees or could they be feet?

 require(rgdal)
 city = readOGR(".","city") # should read city.shp in the current directory
 bbox(city)

 if that is in feet, then city.shp is already projected, and all you
need to do is add the north and east offsets converted to feet to the
projected coordinate of your origin point...

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting several lme sistematically

2012-07-18 Thread arun
Hi Jean,

 Is there something missing in the function?


ids <- a$id
for(i in 2:4){
for(j in 5:7){
    y <- a[, j]
    x <- a[, i]
    model<-lme(y ~ x , random= ~1|ids, na.action="na.exclude")
    }}

 summary(model)
Linear mixed-effects model fit by REML
 Data: NULL 
   AIC  BIC    logLik
  281.1838 291.5236 -136.5919

Random effects:
 Formula: ~1 | ids
    (Intercept)  Residual
StdDev:   0.1109054 0.9251637

Fixed effects: y ~ x 
  Value  Std.Error DF    t-value p-value
(Intercept)  0.03931479 0.09909825 89  0.3967254  0.6925
x   -0.11826047 0.09731719 89 -1.2152063  0.2275
 Correlation: 
  (Intr)
x 0.056 

Standardized Within-Group Residuals:
   Min Q1    Med Q3    Max 
-2.0882452 -0.7718563  0.1156507  0.6119178  1.7986478 

Number of Observations: 100
Number of Groups: 10 

A.K.



- Original Message -
From: Jean V Adams 
To: Berta Ibáñez 
Cc: Lista de R 
Sent: Wednesday, July 18, 2012 1:02 PM
Subject: Re: [R] fitting several lme sistematically

I'm not sure why, but lme() doesn't seem to like the variables to be 
referenced as part of a list using [ or $.
Here's an easy workaround ...

ids <- a$id
for(i in 2:4){
for(j in 5:7){
        y <- a[, j]
        x <- a[, i]
        lme(y ~ x , random= ~1|ids, na.action="na.exclude")
        }}

Jean


Berta Ibáñez  wrote on 07/18/2012 08:53:51 AM:

> Dear R-list, 
> 
> I have a data set (in the following example called "a") which have: 
> 
> one "subject indicator" variable (called "id")
> three dependent variables (varD, varE, var F)
> three independent variables (varA, varB, varC)
> 
> I want to fit 9 lme models, one per posible combination (DA, DB, DC,
> EA, EB, EC, FA, FB, FC).
> In stead of writting the 9 lme models, I want to do it 
> sistematically (the example is a simplification of what I really 
> have). Here you have the comands for the first model: 
> 
> library(nlme)
> set.seed(50)
> a<-data.frame(array(c(rep(1:10,10), rnorm(600)), c(100,7)))
> names(a)<-c("id", "varA", "varB", "varC", "varD", "varE", "varF")
> lme(varD ~ varA , random= ~1|id,  data=a, na.action="na.exclude")
> 
> I supossed that a simple sintaxis going through the variables of 
> dataset "a" could cope with it: 
> 
> for(i in 2:4){
> for(j in 5:7){
> lme(a[,j] ~ a[,i] , random= ~1|id,  data=a, na.action="na.exclude")
> }}
> 
> but it does not, and the use of eval, as.symbol and so on does not help. 

> 
> for(i in 2:4){
> for(j in 5:7){
> lme(eval(as.symbol(names(a)[j])) ~ eval(as.symbol(names(a)[i]))  , 
> random= ~1|id,  data=a, na.action="na.exclude")
> }}
> 
> Any help??? Thanks a lot in advance!

    [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Entering Data Files

2012-07-18 Thread darnold
Hi,

Entering data from a given file is the hardest part of learning R for me.
For example, I have this datafile from Moore's text:

LiveAge Count
Parents Age19   324
Another Age19   37
OwnPlaceAge19   116
Group   Age19   58
Other   Age19   5
Parents Age20   378
Another Age20   47
OwnPlaceAge20   279
Group   Age20   60
Other   Age20   2
Parents Age21   337
Another Age21   40
OwnPlaceAge21   372
Group   Age21   49
Other   Age21   3
Parents Age22   318
Another Age22   38
OwnPlaceAge22   487
Group   Age22   25
Other   Age22   9

I can manually enter the numbers and create my table and barplot.

x <- matrix(c(324, 378, 337, 318, 37, 47, 40, 38, 116, 279, 372, 487, 58,
60, 49, 25, 5, 2, 3, 9), c(5,4),byrow=T)
rownames(x)=c("Parents","Another","OwnPlace","Group","Other")
colnames(x)=c("Age19","Age20","Age21","Age22")
x <- as.table(x)

barplot(x,beside=T)

But it would be nice to be able to read the data from the file, arrange it
using R, then plot, if that is possible. Any suggestions?

Thanks

David



--
View this message in context: 
http://r.789695.n4.nabble.com/Entering-Data-Files-tp4636943.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trapping option settings

2012-07-18 Thread David A Vavra
Thanks. Also helpful. 

DAV


-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com] 
Sent: Wednesday, July 18, 2012 4:20 PM
To: David A Vavra; 'jim holtman'
Cc: r-help@r-project.org
Subject: RE: [R] Trapping option settings

Try using trace(), as in
  trace(options, quote(print(as.list(sys.calls()

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On
> Behalf Of David A Vavra
> Sent: Wednesday, July 18, 2012 1:00 PM
> To: 'jim holtman'
> Cc: r-help@r-project.org
> Subject: Re: [R] Trapping option settings
> 
> Jim,
> 
> Thanks.
> 
> It wasn't sure if merely overriding the options function by placing one in
> the global environment would guarantee it would be the one actually
called.
> In any case, I didn't know how to identify the caller. This is quite
helpful
> and looks promising. I'll give it a try.
> 
> DAV
> 
> 
> 
> 
> -Original Message-
> From: jim holtman [mailto:jholt...@gmail.com]
> Sent: Monday, July 16, 2012 9:59 PM
> To: David A Vavra
> Cc: r-help@r-project.org
> Subject: Re: [R] Trapping option settings
> 
> Here is one way by redefining 'options' so you can check for 'width'
> and then call the 'options' in 'base':
> 
> 
> > options <-  # define 'options' in Global
> + function(...)
> + {
> + args <- list(...)  # get arguments
> + if ('width' %in% names(args)){  # see if 'width' is in them
> + .caller <- sys.calls()  # get where called from
> + if (length(.caller) == 1)  # called from command line
> + .caller <- "Rgui"
> + else .caller <- as.character(.caller[[length(.caller) - 1]])[1]
> + cat("width being changed:", args[['width']], "Called from",
> .caller, '\n')
> + }
> + base::options(...)  # call the real options
> + }
> >
> >
> > options(width = 123) # test at command line
> width being changed: 123 Called from Rgui
> > f.myfunc <- function() options(width = 456)  # within a function
> > f.myfunc()
> width being changed: 456 Called from f.myfunc
> >
> 
> 
> 
> 
> On Mon, Jul 16, 2012 at 9:01 PM, David A Vavra 
wrote:
> > Something has been changing the setting the width option to 1 and
not
> > resetting it. It does this intermittently. What I would like to do is
trap
> > changing the setting so I can determine where the change is occurring
and
> > hopefully fix it.
> >
> > Is there any easy way to do this?
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> --
> Jim Holtman
> Data Munger Guru
> 
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] conditional increase by increment

2012-07-18 Thread William Dunlap
> land<-c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,0)
> expect <- c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
> cumsum(c(TRUE, diff(land)==1))
 [1] 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
> all.equal(.Last.value, expect)
[1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of penguins
> Sent: Wednesday, July 18, 2012 10:13 AM
> To: r-help@r-project.org
> Subject: [R] conditional increase by increment
> 
> I am trying to assign increasing trip numbers to a binary variable ("land";
> 1=home and 0=away) where a string of 1's shouldn't increment the trip_no
> more than once.
> 
> For example; based on
> land<-c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,0)
> the "trip_no" sequence produced should be   1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3
> 
> This is as far as I can get but Im stumped. In addition I need it to work on
> data where the land variable can start on "0" or "1" for trip_no=1. Any help
> would be hugely appreciated:
> 
> land<-c(0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,0)
> trip_no <- rep(0, length(land))
> gg<-cbind(land,trip_no)
> 
> increment <- function(x){
>   eval.parent(substitute(x <- x + 1))
> }
> 
> for(i in length(gg)){
>   if(gg$land[[i]]==1) {
>   gg$trip_no<-increment(trip_no[i])
> }
>  }
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/conditional-increase-
> by-increment-tp4636910.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting several lme sistematically

2012-07-18 Thread Jean V Adams
You have only saved the last run of the for() loop (when i is 4 and j is 
7) to the object called model.  If you want to save all of the lme fits, 
you need to set give some dimensionality to the object model.  For 
example, you could define model as a list, then store the lme fits as 
separate elements of the list.

ids <- a$id
model <- vector("list", 9)
count <- 0
for(i in 2:4){
for(j in 5:7){
count <- count + 1
y <- a[, j]
x <- a[, i]
model[[count]] <- lme(y ~ x , random= ~1|ids, 
na.action="na.exclude")
}}

Now you can summarize any of the fits, or get predicted values, or 
whatever ...

summary(model[[5]])
predict(model[[7]])

Jean


arun  wrote on 07/18/2012 12:34:31 PM:

> Hi Jean,
> 
>  Is there something missing in the function?
> 
> 
> ids <- a$id
> for(i in 2:4){
> for(j in 5:7){
> y <- a[, j]
> x <- a[, i]
> model<-lme(y ~ x , random= ~1|ids, na.action="na.exclude")
> }}
> 
>  summary(model)
> Linear mixed-effects model fit by REML
>  Data: NULL 
>AIC  BIClogLik
>   281.1838 291.5236 -136.5919
> 
> Random effects:
>  Formula: ~1 | ids
> (Intercept)  Residual
> StdDev:   0.1109054 0.9251637
> 
> Fixed effects: y ~ x 
>   Value  Std.Error DFt-value p-value
> (Intercept)  0.03931479 0.09909825 89  0.3967254  0.6925
> x   -0.11826047 0.09731719 89 -1.2152063  0.2275
>  Correlation: 
>   (Intr)
> x 0.056 
> 
> Standardized Within-Group Residuals:
>Min Q1Med Q3Max 
> -2.0882452 -0.7718563  0.1156507  0.6119178  1.7986478 
> 
> Number of Observations: 100
> Number of Groups: 10 
> 
> A.K.
> 
> 
> 
> - Original Message -
> From: Jean V Adams 
> To: Berta Ibáñez 
> Cc: Lista de R 
> Sent: Wednesday, July 18, 2012 1:02 PM
> Subject: Re: [R] fitting several lme sistematically
> 
> I'm not sure why, but lme() doesn't seem to like the variables to be 
> referenced as part of a list using [ or $.
> Here's an easy workaround ...
> 
> ids <- a$id
> for(i in 2:4){
> for(j in 5:7){
> y <- a[, j]
> x <- a[, i]
> lme(y ~ x , random= ~1|ids, na.action="na.exclude")
> }}
> 
> Jean
> 
> 
> Berta Ibáñez  wrote on 07/18/2012 08:53:51 AM:
> 
> > Dear R-list, 
> > 
> > I have a data set (in the following example called "a") which have: 
> > 
> > one "subject indicator" variable (called "id")
> > three dependent variables (varD, varE, var F)
> > three independent variables (varA, varB, varC)
> > 
> > I want to fit 9 lme models, one per posible combination (DA, DB, DC,
> > EA, EB, EC, FA, FB, FC).
> > In stead of writting the 9 lme models, I want to do it 
> > sistematically (the example is a simplification of what I really 
> > have). Here you have the comands for the first model: 
> > 
> > library(nlme)
> > set.seed(50)
> > a<-data.frame(array(c(rep(1:10,10), rnorm(600)), c(100,7)))
> > names(a)<-c("id", "varA", "varB", "varC", "varD", "varE", "varF")
> > lme(varD ~ varA , random= ~1|id,  data=a, na.action="na.exclude")
> > 
> > I supossed that a simple sintaxis going through the variables of 
> > dataset "a" could cope with it: 
> > 
> > for(i in 2:4){
> > for(j in 5:7){
> > lme(a[,j] ~ a[,i] , random= ~1|id,  data=a, na.action="na.exclude")
> > }}
> > 
> > but it does not, and the use of eval, as.symbol and so on does not 
help. 
> 
> > 
> > for(i in 2:4){
> > for(j in 5:7){
> > lme(eval(as.symbol(names(a)[j])) ~ eval(as.symbol(names(a)[i]))  , 
> > random= ~1|id,  data=a, na.action="na.exclude")
> > }}
> > 
> > Any help??? Thanks a lot in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   >