Re: [R] symmetric matrix multiplication

2011-10-23 Thread Daniel Nordlund


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of statfan
> Sent: Saturday, October 22, 2011 10:45 PM
> To: r-help@r-project.org
> Subject: [R] symmetric matrix multiplication
> 
> I have a symmetric matrix B (17x17) and a square matrix A (17x17).  If I do
> the following matrix multiplication I SHOULD get a symmetric matrix;
> however, I don't.  The computation required is:
> 
> C = t(A)%*%B%*%A
> 
> Here are some checks for symmetry:
> > (max(abs(B - t(B))))
> [1] 0
> > C = t(A)%*%B%*%A
> > (max(abs(C - t(C))))
> [1] 3.552714e-15
> 
> Any help on the matter would be very much appreciated.
> 
> 

Welcome to the world of floating-point calculation on finite precision 
computers.  You need to read R FAQ 7.31.  Your maximum difference is for all 
intents and purposes equal to zero.  
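
As a small illustration of the point about finite precision, the sketch below compares a computed product with its transpose using a tolerance instead of exact equality. The matrices are random stand-ins (the poster's A and B were not given), so the size of the discrepancy will differ from the one reported above.

```r
set.seed(1)

# Stand-in data: a random square A and an exactly symmetric B.
A <- matrix(rnorm(17 * 17), 17, 17)
M <- matrix(rnorm(17 * 17), 17, 17)
B <- (M + t(M)) / 2            # exactly symmetric by construction

C <- t(A) %*% B %*% A

# The exact check may report a tiny nonzero value (rounding error).
exact_diff <- max(abs(C - t(C)))
print(exact_diff)

# Tolerance-based checks treat such differences as zero.
print(isSymmetric(C))          # uses all.equal() internally
print(all.equal(C, t(C)))
```

isSymmetric() and all.equal() both compare up to a numerical tolerance, which is usually the appropriate test for results of floating-point arithmetic.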

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] symmetric matrix multiplication

2011-10-23 Thread Ted Harding
On 23-Oct-11 07:00:07, Daniel Nordlund wrote:
>> -Original Message-
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org]
>> On Behalf Of statfan
>> Sent: Saturday, October 22, 2011 10:45 PM
>> To: r-help@r-project.org
>> Subject: [R] symmetric matrix multiplication
>> 
>> I have a symmetric matrix B (17x17) and a square matrix A (17x17).
>> If I do the following matrix multiplication I SHOULD get a symmetric
>> matrix; however, I don't.  The computation required is:
>> 
>> C = t(A)%*%B%*%A
>> 
>> Here are some checks for symmetry:
>> > (max(abs(B - t(B))))
>> [1] 0
>> > C = t(A)%*%B%*%A
>> > (max(abs(C - t(C))))
>> [1] 3.552714e-15
>> 
>> Any help on the matter would be very much appreciated.
> 
> Welcome to the world of floating-point calculation on
> finite precision computers. You need to read R FAQ 7.31.
> Your maximum difference is for all intents and purposes
> equal to zero.
> 
> Hope this is helpful,
> Dan
> 
> Daniel Nordlund
> Bothell, WA USA

In addition to Dan's comment, let me point out that you
can convert your very nearly symmetric matrix C to an
exactly (even by R's finite-precision standards) symmetric
matrix by using (C + t(C))/2. The result will differ from
the original matrix C by similar "for all intents and purposes
zero" amounts. Here is an example, using 4x4 matrices:

##[1]: The symmetric matrix B:
B <- matrix( c(
  1.1, 1.2, 1.3, 1.4,
  1.2, 2.2, 2.3, 2.4,
  1.3, 2.3, 3.3, 3.4,
  1.4, 2.4, 3.4, 4.4), byrow=TRUE, nrow=4 )

##[2]: The non-symmetric matrix B:
A <- matrix( c(
  1.1, 1.2, 1.3, 1.4,
  2.1, 2.2, 2.3, 2.4,
  3.1, 3.2, 3.3, 3.4,
  4.1, 4.2, 4.3, 4.4), byrow=TRUE, nrow=4 )

##[3]: An allegedly symmetric matrix C1 (constructed
## like your C):
C1 <- t(A)%*%B%*%A

##[4]: But it isn't exactly symmetric:
max(abs(C1 - t(C1)))
# [1] 5.684342e-14

##[5]: So construct an exactly symmetric version:
C2 <- (C1 + t(C1))/2

##[6]: Check that it is exactly symmetric:
max(abs(C2 - t(C2)))
# [1] 0

##[7]: And check how close it is to the original C1:
max(abs(C2 - C1))
# [1] 5.684342e-14

Hoping this helps!
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 23-Oct-11   Time: 08:43:18
-- XFMail --



[R] FW: Re: symmetric matrix multiplication

2011-10-23 Thread Ted Harding
Just to avoid possible confusion, let me correct a typo
(at step [2] in the example below). Apologies!

-FW: -

Date: Sun, 23 Oct 2011 08:43:27 +0100 (BST)
Sender: r-help-boun...@r-project.org
From: (Ted Harding) 
To: r-help@r-project.org
Subject: Re: [R] symmetric matrix multiplication

On 23-Oct-11 07:00:07, Daniel Nordlund wrote:
>> -Original Message-
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org]
>> On Behalf Of statfan
>> Sent: Saturday, October 22, 2011 10:45 PM
>> To: r-help@r-project.org
>> Subject: [R] symmetric matrix multiplication
>> 
>> I have a symmetric matrix B (17x17) and a square matrix A (17x17).
>> If I do the following matrix multiplication I SHOULD get a symmetric
>> matrix; however, I don't.  The computation required is:
>> 
>> C = t(A)%*%B%*%A
>> 
>> Here are some checks for symmetry:
>> > (max(abs(B - t(B))))
>> [1] 0
>> > C = t(A)%*%B%*%A
>> > (max(abs(C - t(C))))
>> [1] 3.552714e-15
>> 
>> Any help on the matter would be very much appreciated.
> 
> Welcome to the world of floating-point calculation on
> finite precision computers. You need to read R FAQ 7.31.
> Your maximum difference is for all intents and purposes
> equal to zero.
> 
> Hope this is helpful,
> Dan
> 
> Daniel Nordlund
> Bothell, WA USA

In addition to Dan's comment, let me point out that you
can convert your very nearly symmetric matrix C to an
exactly (even by R's finite-precision standards) symmetric
matrix by using (C + t(C))/2. The result will differ from
the original matrix C by similar "for all intents and purposes
zero" amounts. Here is an example, using 4x4 matrices:

##[1]: The symmetric matrix B:
B <- matrix( c(
  1.1, 1.2, 1.3, 1.4,
  1.2, 2.2, 2.3, 2.4,
  1.3, 2.3, 3.3, 3.4,
  1.4, 2.4, 3.4, 4.4), byrow=TRUE, nrow=4 )

##[2]: The non-symmetric matrix B:  [OOPS! Typo!!]
##[2]: The non-symmetric matrix A:

A <- matrix( c(
  1.1, 1.2, 1.3, 1.4,
  2.1, 2.2, 2.3, 2.4,
  3.1, 3.2, 3.3, 3.4,
  4.1, 4.2, 4.3, 4.4), byrow=TRUE, nrow=4 )

##[3]: An allegedly symmetric matrix C1 (constructed
## like your C):
C1 <- t(A)%*%B%*%A

##[4]: But it isn't exactly symmetric:
max(abs(C1 - t(C1)))
# [1] 5.684342e-14

##[5]: So construct an exactly symmetric version:
C2 <- (C1 + t(C1))/2

##[6]: Check that it is exactly symmetric:
max(abs(C2 - t(C2)))
# [1] 0

##[7]: And check how close it is to the original C1:
max(abs(C2 - C1))
# [1] 5.684342e-14

Hoping this helps!
Ted.


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 23-Oct-11   Time: 08:43:18
-- XFMail --


--End of forwarded message-


E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 23-Oct-11   Time: 08:52:38
-- XFMail --



Re: [R] Data frame manipulation by eliminating rows containing extreme values

2011-10-23 Thread aajit75
Hi David,

Thanks for the reply,


f <- function(x) {
  quantile(x, c(0.25, 0.75), na.rm = TRUE) -
    matrix(IQR(x, na.rm = TRUE) * c(1.5), nrow = 1) %*% c(-1, 1)
}

Here the parameter 1.5 is set in the above function only as an example; it
could be larger, maybe 3.0, after analyzing the actual data. The expectation
is to find cut-offs on both sides (higher and lower values) for each variable,
as in a box plot, and then to eliminate observations based on those cut-offs.

For the second point, I am extremely sorry. It was a typo: in
xyz <- lapply(data1, f) it should be data2.

n <- 100 
x1 <- runif(n) 
x2 <- runif(n) 
x3 <- x1 + x2 + runif(n)/10 
x4 <- x1 + x2 + x3 + runif(n)/10 
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE)) 
x6 <- 1*(x5=='a' | x5=='c') 
data1 <- cbind(x1,x2,x3,x4,x5,x6) 
data2 <- data.frame(data1) 
xyz <- lapply(data2, f) 
str (xyz)

Now it is a list of six only:
List of 6
 $ x1: num [1, 1:2] 0.7797 0.0613
 $ x2: num [1, 1:2] 0.9533 0.0194
 $ x3: num [1, 1:2] 1.438 0.532
 $ x4: num [1, 1:2] 2.85 1.03
 $ x5: num [1, 1:2] 4 0
 $ x6: num [1, 1:2] 1.5 -0.5

The third point you mentioned is the problem still to be resolved. For now I
am overwriting data2, applying these cut-offs one variable at a time. Is there
a more efficient way to do this?

 data2 <- subset (data2, x1<=xyz$x1[,1] &  x1>=xyz$x1[,2]) 
 data2 <- subset (data2, x2<=xyz$x2[,1] &  x2>=xyz$x2[,2]) 
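
One possible way to avoid a separate subset() call per variable is sketched below. It is only an illustration under stated assumptions: it uses the conventional box-plot fences (Q1 - k*IQR and Q3 + k*IQR), which are not necessarily the same numbers that f returns, and the helper name iqr_fences, the data frame dat, and the planted value 9999 are all made up for the example.

```r
set.seed(42)
n <- 100
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$x3 <- dat$x1 + dat$x2 + runif(n) / 10
dat$x1[5] <- 9999              # hypothetical "very high number" placeholder

# Lower and upper fence for one numeric vector (hypothetical helper).
iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c(lower = q[1] - k * (q[2] - q[1]), upper = q[2] + k * (q[2] - q[1]))
}

# For every numeric column, flag the values that lie inside the fences.
num_cols <- names(dat)[sapply(dat, is.numeric)]
inside <- sapply(dat[num_cols], function(x) {
  f <- iqr_fences(x)
  is.na(x) | (x >= f["lower"] & x <= f["upper"])
})

# Keep only rows that are inside the fences for all numeric columns.
keep <- rowSums(inside) == ncol(inside)
trimmed <- dat[keep, , drop = FALSE]
print(nrow(trimmed))
```

The multiplier k plays the role of the 1.5 in f and can be raised (for example to 3) in the same way. Missing values are kept here; whether that is appropriate depends on the data.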

On the last point you mentioned, I agree that removing "extreme values" is
a serious distortion of the data.  But in my data the values of some
observations are set to a very high number like say . Also, this is
not consistent across all variables in the data. So I can set a value higher
than 1.5 in the function, get cut-offs for each variable, and remove such
observations. As rm.outlier removes only one value, I am using the above
function.

Thanks for the help in advance.

Regards,
-Ajit




--
View this message in context: 
http://r.789695.n4.nabble.com/Data-frame-manipulation-by-eliminating-rows-containing-extreme-values-tp3927941p3929927.html
Sent from the R help mailing list archive at Nabble.com.



[R] A problem with chol() function

2011-10-23 Thread Ron Michael
I think I am missing something with the chol() function. Here is my calculation:
 
> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1
> eigen(mat)
$values
[1] 1 1 1 1 1
$vectors
     [,1]          [,2] [,3] [,4] [,5]
[1,]    1 -1.000000e+00    0    0    0
[2,]    0  7.401487e-17    0    0    0
[3,]    0  0.000000e+00    1    0    0
[4,]    0  0.000000e+00    0    1    0
[5,]    0  0.000000e+00    0    0    1
> chol(mat)
Error in chol.default(mat) : 
  the leading minor of order 2 is not positive definite

As per the eigenvalues my matrix is PD (as all eigenvalues are positive). 
Then why can I still not get the Cholesky factor of my matrix? Can somebody 
point out what I am missing?
 
Thanks and regards,



Re: [R] R for loop stops after 4 iterations

2011-10-23 Thread Philip Robinson
That's fantastic, thank you very much, the qnorm option is interesting, I
will have to play around with it.

Many thanks again
Philip

-Original Message-
From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] 
Sent: Sunday, 23 October 2011 10:28 AM
To: Philip Robinson
Cc: r-help@r-project.org
Subject: Re: [R] R for loop stops after 4 iterations

There's a seeming inconsistency in this question -- namely,  you provide an
example of a data frame with 4 columns but say it is 27x3
-- but I think your question comes from a misunderstanding of what
length(e) calculates. For a data frame it gives the number of columns back.
Hence if you have a 27x4 data frame (which you appear to) iterations will
only fill the first four elements of output.

You'd probably rather use NROW(e). As an aside, for these sort of loops,
seq_along() is usually a very good choice, but it doesn't work here because
of the length() thing.

On another note, why don't you just do the calculation analytically and save
yourself some trouble?


# Something like
with(e, qnorm(0.42, V2, V3)*100)
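
To make the length() versus NROW() point concrete, here is a small sketch using only the five rows shown in the original post (the full 27-row data frame is not available). It also computes the upper-tail probability analytically. Note that this uses pnorm() with lower.tail = FALSE, on the assumption that the quantity wanted is the proportion above 0.42; qnorm() returns quantiles rather than probabilities.

```r
# The five rows quoted in the question.
e <- data.frame(
  V1 = c(1673, 167, 99, 116, 95),
  V2 = c(0.36, 0.36, 0.37, 0.38, 0.41),
  V3 = c(0.08, 0.08, 0.06, 0.07, 0.08),
  V4 = c("Smith", "Allen", "Allen", "Allen", "Allen")
)

length(e)   # number of columns of a data frame
nrow(e)     # number of rows

# Loop over rows, not columns.
set.seed(123)
output <- rep(NA_real_, nrow(e))
for (i in seq_len(nrow(e))) {
  x <- rnorm(n = e$V1[i], mean = e$V2[i], sd = e$V3[i])
  output[i] <- mean(x > 0.42) * 100
}
print(output)

# Analytic percentage above 0.42 for each row, no simulation needed.
analytic <- with(e, pnorm(0.42, mean = V2, sd = V3, lower.tail = FALSE) * 100)
print(analytic)
```

The simulated percentages fluctuate around the analytic ones, more so for rows with a small V1.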


Michael


On Sat, Oct 22, 2011 at 7:33 PM, Philip Robinson
 wrote:
> I have a data frame called e, dim is 27,3, the first 5 lines look like
> this:
>
>      V1   V2   V3    V4
> 1  1673 0.36 0.08 Smith
> 2   167 0.36 0.08 Allen
> 3    99 0.37 0.06 Allen
> 4   116 0.38 0.07 Allen
> 5    95 0.41 0.08 Allen
>
> I am trying to calculate the proportion/percentage of V1 which would 
> have values >0.42 if V2 was the mean of a normal distribution with V1 
> people and a standard deviation of V3. The loop works but only for 
> 4 iterations then stops, I can't understand why, the code and the 
> output are below
>
> output <- rep(NA, 27)
> for (i in 1:length(e))
> {
> x <- rnorm(n=e[i,1], mean=e[i,2], sd=e[i,3])
> n <- e[i,1]
> v <- x>0.42
> q <-(sum(v)/n)*100
> output[i] <- q
> }
>
> > output
>  [1] 22.23551 27.54491 25.25253 19.82759       NA       NA       NA       NA       NA
> [10]       NA       NA       NA       NA       NA       NA       NA       NA       NA
> [19]       NA       NA       NA       NA       NA       NA       NA       NA       NA


Re: [R] A problem with chol() function

2011-10-23 Thread Prof Brian Ripley

On Sun, 23 Oct 2011, Ron Michael wrote:


> I think I am missing something with the chol() function. Here is my
> calculation:
>
> > mat
>      [,1] [,2] [,3] [,4] [,5]
> [1,]    1    3    0    0    0
> [2,]    0    1    0    0    0
> [3,]    0    0    1    0    0
> [4,]    0    0    0    1    0
> [5,]    0    0    0    0    1
> > eigen(mat)
> $values
> [1] 1 1 1 1 1
> $vectors
>      [,1]          [,2] [,3] [,4] [,5]
> [1,]    1 -1.000000e+00    0    0    0
> [2,]    0  7.401487e-17    0    0    0
> [3,]    0  0.000000e+00    1    0    0
> [4,]    0  0.000000e+00    0    1    0
> [5,]    0  0.000000e+00    0    0    1
> > chol(mat)
> Error in chol.default(mat) :
>   the leading minor of order 2 is not positive definite
>
> As per the eigenvalues my matrix is PD (as all eigenvalues are
> positive). Then why can I still not get the Cholesky factor of my
> matrix? Can somebody point out what I am missing?   Thanks and
> regards,


Reading the help page:

     Compute the Choleski factorization of a real symmetric
                                                  ^^^^^^^^^
     positive-definite square matrix.



     Note that only the upper triangular part of ‘x’ is used, so that
                        ^^^^^^^^^^^^^^^^^^^^^

A <- diag(5)
A[1,2] <- A[2,1] <- 3
eigen(A)$values
[1]  4  1  1  1 -2
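
The following sketch spells out the same point step by step. It rebuilds the poster's matrix, shows that it is not symmetric, and then inspects the symmetric matrix that chol() effectively works with when it reads only the upper triangle. The variable names are chosen for the example only.

```r
# The matrix from the question: identity plus a 3 in position [1, 2].
mat <- diag(5)
mat[1, 2] <- 3

# chol() requires a symmetric matrix; this one is not.
print(isSymmetric(mat))

# Only the upper triangle is used, so chol() effectively sees this matrix:
A <- mat
A[2, 1] <- A[1, 2]
ev <- eigen(A, symmetric = TRUE)$values
print(ev)                      # one eigenvalue is negative, so A is not PD

# Capture the error instead of stopping the script.
res <- tryCatch(chol(mat), error = function(e) conditionMessage(e))
print(res)
```

Checking isSymmetric() before calling chol() is a cheap way to catch this situation, since the eigenvalues of the non-symmetric input say nothing about the matrix that is actually factorized.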



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595


[R] how to plot a distribution of mean and standard deviation

2011-10-23 Thread gj
Hi,
I have the following data about courses (504) in a university, two
attributes about the proportion of resources used (#resources_used /
#resources_available), namely the average and the standard deviation.
Thus I have:
[1] n=504 rows
[2] 1 id column and 2 attributes

Here's a sample of the data:

courseid,average,std
12741,1,0
17161,1,0
12514,1,0
12316,0.866692648178,0.26090261464799325
2467,0.8623188442510107,0.24920700355307424
3047,0.85,0.2314550249431379
1747,0.8481481481481481,0.23078446747051584
2487,0.8383838455333854,0.20429589057565342
13869,0.8181818181818182,0.2522624895547565
1706,0.8158730235364702,0.19332287915878024
2041,0.8095238095238095,0.24880667576405963
1864,0.8080808141014793,0.17456052968726046
2106,0.78437623024,0.2475808839379094

.

My question is how can I sensibly visualise this data.

In this context, it does not make sense to find the population mean
or population std. However, what would make sense is showing the cdf of
the mean. So, I'm thinking of doing this using ecdf(). But what about the
standard deviation? How can I visualise the standard deviation
as well as the mean? Would that make sense on just one plot?

Any idea?

Thanks
Gawesh



Re: [R] issue loading doBy library

2011-10-23 Thread Giovanni Azua
Hi Josh,

Thank you for your feedback. After a lot of trial and error the problem is 
finally solved.

To solve this problem, I tried in this order:

1) uninstalling the two packages "Matrix" and "lme4" and reinstalling them.
2) uninstalling doBy and reinstalling it with and without 1)
3) upgrading to the latest R version and re-doing 1) 2)

And finally 4) wiped all R traces from my Mac OS X 10.7.2 and re-installed the 
latest version. The latest Matrix version seems to have changed in the 
meantime, so I was lucky and now it works.

It seems to me that the whole concept of dependency analysis for installing 
packages in R is broken ... it seems that packages depend only on the package 
name and not on specific versions, which is wrong: in this case, chances 
are that a user will stay in this "live lock" and never find "happy 
together" versions of Matrix and lme4 ... but well, you statisticians know 
better about chances :P
 
Thank you again.
Best regards,
Giovanni

On Oct 23, 2011, at 3:12 AM, Joshua Wiley wrote:

> Hi Giovanni,
> 
> This is a dependency issue between lme4 and Matrix.  There is
> substantial discussion of this on the R sig mixed models list.  A
> simple update may fix the problem, or you may need to be a little bit
> more precise about getting version of Matrix and lme4 that work with
> each other.
> 
> HTH,
> 
> Josh



Re: [R] issue loading doBy library

2011-10-23 Thread Gabor Grothendieck
On Sun, Oct 23, 2011 at 8:10 AM, Giovanni Azua  wrote:
> Hi Josh,
>
> Thank you for your feedback, after lot of trial and error the problem is 
> finally solved.
>
> To solve this problem, I tried in this order:
>
> 1) uninstalling the two packages "Matrix" and "lme4" and reinstalling them.
> 2) uninstalling doBy and reinstalling it with and without 1)
> 3) upgrading to the latest R version and re-doing 1) 2)
>
> And finally 4) wiped all R traces from my Mac OS X 10.7.2 and re-installed 
> the latest version. The latest Matrix version seems to have changed in the 
> meanwhile so I was lucky and now it works.
>
> It seems to me that the whole concept of dependency analysis for installing 
> packages in R is broken ... it seems that packages depend only on the 
> package name and not on specific versions, which is wrong: in this case, 
> chances are that a user will stay in this "live lock" and never find 
> "happy together" versions of Matrix and lme4 ... but well, you statisticians 
> know better about chances :P
>

Version information can be incorporated. For example, note the imports
line of the zoo DESCRIPTION file where it specifies a specific version
or later of lattice:

https://r-forge.r-project.org/scm/viewvc.php/pkg/zoo/DESCRIPTION?view=markup&root=zoo
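
As a rough illustration of version-aware checks from within R itself, the sketch below compares an installed package version against a required minimum. The helper meets_minimum and the version numbers are made up for the example; "stats" is used only because it ships with every R installation.

```r
# In a DESCRIPTION file a minimum version is written in parentheses, e.g.
#   Imports: lattice (>= 0.18-1)
# (the version number here is hypothetical, not taken from zoo).

installed <- packageVersion("stats")
required  <- package_version("2.0.0")   # hypothetical minimum
print(installed >= required)

# Hypothetical helper: is 'pkg' installed with at least 'min_version'?
meets_minimum <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    packageVersion(pkg) >= package_version(min_version)
}

print(meets_minimum("stats", "2.0.0"))
print(meets_minimum("stats", "999.0.0"))
```

A check like this can help diagnose which of two interdependent packages is out of date before reinstalling anything.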

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] how to plot a distribution of mean and standard deviation

2011-10-23 Thread R. Michael Weylandt
It seems like the relevant plot would depend on what you are trying to
investigate, but usually a scatterplot would work well for bivariate
data with no other assumptions needed. I usually find ecdf() plots
rather hard to interpret without playing around with the data
elsewhere first and I'm not sure they make an enormous amount of sense
for bivariate data in your case since they reorder inputs.
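
A minimal sketch of both options, using the first few rows of the sample data from the question (not the full 504 courses), might look like this. The axis labels and titles are made up for the example.

```r
# First rows of the sample data posted in the question.
courses <- read.csv(text = "courseid,average,std
12741,1,0
17161,1,0
12514,1,0
12316,0.866692648178,0.26090261464799325
2467,0.8623188442510107,0.24920700355307424
3047,0.85,0.2314550249431379
1747,0.8481481481481481,0.23078446747051584
2487,0.8383838455333854,0.20429589057565342")

# Scatterplot: one point per course, mean against standard deviation.
plot(courses$average, courses$std,
     xlab = "Mean proportion of resources used",
     ylab = "Standard deviation",
     main = "Per-course mean vs. standard deviation")

# Empirical CDF of the per-course means, as considered in the question.
Fn <- ecdf(courses$average)
plot(Fn, xlab = "Mean proportion of resources used",
     main = "ECDF of per-course means")
```

The scatterplot keeps each course's mean and standard deviation paired, whereas the ECDF shows only one variable at a time.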

Michael

On Sun, Oct 23, 2011 at 6:51 AM, gj  wrote:
> Hi,
> I have the following data about courses (504) in a university, two
> attributes about the proportion of resources used (#resources_used /
> #resources_available), namely the average and the standard deviation.
> Thus I have:
> [1] n=504 rows
> [2] 1 id column and 2 attributes
>
> Here's a sample of the data:
>
> courseid,average,std
> 12741,1,0
> 17161,1,0
> 12514,1,0
> 12316,0.866692648178,0.26090261464799325
> 2467,0.8623188442510107,0.24920700355307424
> 3047,0.85,0.2314550249431379
> 1747,0.8481481481481481,0.23078446747051584
> 2487,0.8383838455333854,0.20429589057565342
> 13869,0.8181818181818182,0.2522624895547565
> 1706,0.8158730235364702,0.19332287915878024
> 2041,0.8095238095238095,0.24880667576405963
> 1864,0.8080808141014793,0.17456052968726046
> 2106,0.78437623024,0.2475808839379094
> 
> .
>
> My question is how can I sensibly visualise this data.
>
> In this context, it does not make sense to find the population mean
> or population std. However, what would make sense is showing the cdf of
> the mean. So, I'm thinking of doing this using ecdf(). But what about the
> standard deviation? How can I visualise the standard deviation
> as well as the mean? Would that make sense on just one plot?
>
> Any idea?
>
> Thanks
> Gawesh
>


Re: [R] how to delete rows by a list of rownames

2011-10-23 Thread B77S
Here is one way:

df1 <- data.frame(c(1:20), c(21:40), c(31:50))
list1 <- c(3, 6, 20)
df2 <- df1[-list1,]





hanansela wrote:
> 
> Hello
> I have a list of row names that needs to be deleted from a data frame. How
> do i do that? 
> one of the columns in the data frame contains the row names as numbers. I
> can also select by this column (will it be easier?). 
> Thank you
> 


--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-delete-rows-by-a-list-of-rownames-tp3930206p3930273.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to delete rows by a list of rownames

2011-10-23 Thread R. Michael Weylandt
I think that only works because the rows are ordered and have no
names: try something more like this:

df1 <- data.frame(1:20, 21:40, 31:50)
rownames(df1) <- sample(letters, 20)

toDrop <- sample(rownames(df1), 5)

df1[ !(rownames(df1) %in% toDrop), ]

or alternatively

toKeep <- sample(rownames(df1), 5)

df1[rownames(df1) %in% toKeep, ]
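
Since the original question mentions that one column holds the row names as numbers, here is a sketch covering that case as well. The column name id and the values in toDrop are invented for the example.

```r
# Example data: 'id' holds the same values that are used as row names.
df1 <- data.frame(id = 101:120, a = 1:20, b = 21:40)
rownames(df1) <- df1$id

toDrop <- c(103, 106, 120)

# Drop by row name (row names are character, so convert before matching).
by_name <- df1[!(rownames(df1) %in% as.character(toDrop)), ]

# Drop by the id column instead; no conversion needed.
by_col <- df1[!(df1$id %in% toDrop), ]

print(identical(by_name, by_col))
print(nrow(by_col))
```

Both forms select by value rather than by position, so they do not depend on the order of the rows. Negative indexing such as df1[-toDrop, ] would instead be interpreted as positions.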

Michael

On Sun, Oct 23, 2011 at 9:30 AM, B77S  wrote:
> here is one way
>
> df1 <- data.frame(c(1:20), c(21:40), c(31:50))
> list1 <- c(3, 6, 20)
> df2 <- df1[-list1,]
>
>
>
>
>
> hanansela wrote:
>>
>> Hello
>> I have a list of row names that needs to be deleted from a data frame. How
>> do i do that?
>> one of the columns in the data frame contains the row names as numbers. I
>> can also select by this column (will it be easier?).
>> Thank you
>>
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/how-to-delete-rows-by-a-list-of-rownames-tp3930206p3930273.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] R-help Digest, Vol 104, Issue 23

2011-10-23 Thread mihalicza . peter
I will be out of the office from 19 till 21 October with no access to my emails.

In urgent cases please contact Ms. Edit Kárpáti (karpati.e...@gyemszi.hu).

With regards,
Peter Mihalicza



Re: [R] interpreting bootstrap corrected slope [rms package]

2011-10-23 Thread apeer
Dr. Harrell,

Thanks for your response.  The predictor variables I initially included in
the model were based on the x mean plots and whether they exhibited
ordinality and whether they appeared to meet the CR assumptions.  Only 7 of
16 potential variables fit that designation and those are the variables I
initially included.  I then used backward variable selection, which selected
3 significant terms.  Does that seem reasonable?  

Also, are you saying that if the exceedance probabilities for the middle Y
category have a wide range, then keeping the model as is would be fine for
future predictions?

Thanks for your time,
Adam

--
View this message in context: 
http://r.789695.n4.nabble.com/interpreting-bootstrap-corrected-slope-rms-package-tp3928314p3930088.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to delete rows by a list of rownames

2011-10-23 Thread hanansela
Hello
I have a list of row names that need to be deleted from a data frame. How
do I do that? 
One of the columns in the data frame contains the row names as numbers. I can
also select by this column (will that be easier?). 
Thank you  

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-delete-rows-by-a-list-of-rownames-tp3930206p3930206.html
Sent from the R help mailing list archive at Nabble.com.



[R] How to create a new variable based on parts of another character variable.

2011-10-23 Thread Philipp Fischer
Hello,
I am just starting with R and I am having a (most probably) stupid problem with 
creating a new variable in a data.frame based on a part of another character 
variable.

I have a data frame like this one:


A   B   C
AWI-test1   1   i
AWI-test5   2   r
AWI-tes75   56  z
UFT-2   5   I
UFT56   f   t
UFT356  9j  t
etc. etc.   89  t


I now want to check whether the string AWI is present in the variable A and, 
if so, create a variable D containing "Arctic". However, if the string UFT 
occurs in the variable A, then the variable D shall be "Boreal" etc. etc.

The resulting data.frame should look like this:
A   B   C   D
AWI-test1   1   i   Arctic  
AWI-test5   2   r   Arctic
AWI-tes75   56  z   Arctic
UFT-2   5   I   Boreal
UFT56   f   t   Boreal
UFT356  9j  t   Boreal
etc. etc.   89  t


I know how to do this when I match the entire string in A, i.e. when A is 
exactly "AWI-test1" and I then set the variable D to "Arctic", but not how 
to look only for a substring in A.
It would be great if somebody could help.
Thanks
Philipp





Re: [R] How to create a new variable based on parts of another character variable.

2011-10-23 Thread jim holtman
Use regular expressions

?grepl
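
A minimal sketch of the grepl() approach, using the values of column A from the question (the last row, "other", is a made-up non-matching entry), could look like this:

```r
df <- data.frame(
  A = c("AWI-test1", "AWI-test5", "AWI-tes75", "UFT-2", "UFT56", "UFT356", "other"),
  stringsAsFactors = FALSE
)

# Start with missing values, then fill in D wherever the substring occurs.
df$D <- NA_character_
df$D[grepl("AWI", df$A, fixed = TRUE)] <- "Arctic"
df$D[grepl("UFT", df$A, fixed = TRUE)] <- "Boreal"

print(df)
```

grepl() returns TRUE for every element that contains the pattern anywhere in the string. With fixed = TRUE the pattern is treated as a literal string rather than a regular expression, and rows that match neither pattern keep NA in D.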

On Sunday, October 23, 2011, Philipp Fischer  wrote:
> Hello,
> I am just starting with R and I am having a (most probably) stupid problem
> with creating a new variable in a data.frame based on a part of another
> character variable.
>
> I have a data frame like this one:
>
>
> A   B   C
> AWI-test1   1   i
> AWI-test5   2   r
> AWI-tes75   56  z
> UFT-2   5   I
> UFT56   f   t
> UFT356  9j  t
> etc. etc.   89  t
>
>
> I now want to check whether the string AWI is present in the variable A and,
> if so, create a variable D containing "Arctic". However, if the string UFT
> occurs in the variable A, then the variable D shall be "Boreal" etc. etc.
>
> The resulting data.frame should look like this:
> A   B   C   D
> AWI-test1   1   i   Arctic
> AWI-test5   2   r   Arctic
> AWI-tes75   56  z   Arctic
> UFT-2   5   I   Boreal
> UFT56   f   t   Boreal
> UFT356  9j  t   Boreal
> etc. etc.   89  t
>
>
> I know how to do this when I match the entire string in A, i.e. when A is
> exactly "AWI-test1" and I then set the variable D to "Arctic", but not how
> to look only for a substring in A.
> It would be great if somebody could help.
> Thanks
> Philipp

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] A problem with chol() function

2011-10-23 Thread Bert Gunter
Perhaps to clarify Prof. Ripley's remarks below, the part that you missed
was "symmetric," which your matrix obviously is not.

-- Bert

2011/10/23 Prof Brian Ripley 

> On Sun, 23 Oct 2011, Ron Michael wrote:
>
>> I think I am missing something with the chol() function. Here is my
>> calculation:
>>
>> > mat
>>      [,1] [,2] [,3] [,4] [,5]
>> [1,]    1    3    0    0    0
>> [2,]    0    1    0    0    0
>> [3,]    0    0    1    0    0
>> [4,]    0    0    0    1    0
>> [5,]    0    0    0    0    1
>> > eigen(mat)
>> $values
>> [1] 1 1 1 1 1
>> $vectors
>>      [,1]          [,2] [,3] [,4] [,5]
>> [1,]    1 -1.000000e+00    0    0    0
>> [2,]    0  7.401487e-17    0    0    0
>> [3,]    0  0.000000e+00    1    0    0
>> [4,]    0  0.000000e+00    0    1    0
>> [5,]    0  0.000000e+00    0    0    1
>> > chol(mat)
>> Error in chol.default(mat) :
>>   the leading minor of order 2 is not positive definite
>>
>> As per the eigen values my matrix is PD (as all eigen values are
>> positive). Then why still I can not get Cholesky factor of my matrix? Can
>> somebody point mw where I am missing?   Thanks and regards,
>>
>
> Reading the help page:
>
>      Compute the Choleski factorization of a real symmetric
>                                                   ^^^^^^^^^
>      positive-definite square matrix.
>
> 
>
>      Note that only the upper triangular part of ‘x’ is used, so that
>                         ^^^^^^^^^^^^^^^^^^^^^
>
> A <- diag(5)
> A[1,2] <- A[2,1] <- 3
> eigen(A)$values
> [1]  4  1  1  1 -2
>
>
>
> --
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  
> http://www.stats.ox.ac.uk/~**ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595
>
>


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] unfold list (variable number of columns) into a data frame

2011-10-23 Thread Giovanni Azua
Hello,

I used R a lot one year ago and now I am a bit rusty :)

I have my raw data, which corresponds to lists of runtimes per minute (minutes 
"1", "2", "3", in two database modes, "sharding" and "replication", and two 
workload types, "query" and "refresh"), stored as a list of character vectors 
that looks like this:

> str(data)
List of 122
 $ : chr [1:163] "1" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
 $ : chr [1:313] "1" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "1" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "1" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 $ : chr [1:163] "2" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
 $ : chr [1:313] "2" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "2" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "2" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 $ : chr [1:163] "3" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
 $ : chr [1:313] "3" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
 $ : chr [1:57] "3" "replication" "query" "2891" "2907" "2922" "2937" ...
 $ : chr [1:278] "3" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
 
I would like to transform the list above into a data frame where this structure 
is unfolded in the following way:

'data.frame': N obs. of  4 variables:
 $ time : int  1 1 1 1 1 1 1 1 1 1 1 ...
 $ partitioning_mode : chr "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" ...
 $ workload : chr "query" "query" "query" "query" "query" "query" "query" "refresh" "refresh" "refresh" "refresh" ...
 $ runtime : num  607 85 52 79 77 67 98 2932 2870 2877 2868 ...

So instead of having an associative array (variable number of columns) it 
should become a simple list where the group or factors are repeated for every 
occurrence of the  specific runtime. Basically my ultimate goal is to get a 
data frame structure that is "summarizeBy"-friendly and "ggplot2-friendly" i.e. 
using this data frame format.

Help greatly appreciated!

TIA,
Best regards,
Giovanni


Re: [R] interpreting bootstrap corrected slope [rms package]

2011-10-23 Thread Frank Harrell
That's not reasonable for 2 reasons.  First, selecting variables based on
apparent assumption satisfaction is an unexplored technique.  Second, you
failed to account for variable selection during resampling validation.  You
will need to give the model all CANDIDATE variables and use the bw=TRUE
option for validate() and calibrate() to get the right answer.  You'll have
to specify the stopping rule too.

If there is a wide range of predicted probabilities then an Emax of 0.05 is
less stressful.  But the Emax is meaningless if you didn't repeat all
modeling steps that used Y for each resampling iteration.

Frank

apeer wrote:
> 
> Dr. Harrell,
> 
> Thanks for your response.  The predictor variables I initially included in
> the model were based on the x mean plots and whether they exhibited
> ordinality and whether they appeared to meet the CR assumptions.  Only 7
> of 16 potential variables fit that designation and those are the variables
> I initially included.  I then used backward variable selection, which
> selected 3 significant terms.  Does that seem reasonable?  
> 
> Also, are you saying that if the exceedence probabilites for the middle Y
> category have a wide range then keeping the model as is would be fine for
> future predictions?
> 
> Thanks for your time,
> Adam
> 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/interpreting-bootstrap-corrected-slope-rms-package-tp3928314p3930552.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to plot a distribution of mean and standard deviation

2011-10-23 Thread Ben Bolker
R. Michael Weylandt  gmail.com> writes:

> It seems like the relevant plot would depend on what you are trying to
> investigate, but usually a scatterplot would well work for bivariate
> data with no other assumptions needed. I usually find ecdf() plots
> rather hard to interpret without playing around with the data
> elsewhere first and I'm not sure they make an enormous amount of sense
> for bivariate data in your case since they reorder inputs.
> 
> Michael

 [snip]
> On Sun, Oct 23, 2011 at 6:51 AM, gj  gmail.com> wrote:
> > Hi,
> > I have the following data about courses (504) in a university, two
> > attributes about the proportion of resources used (#resources_used /
> > #resources_available), namely the average and the standard deviation.
> > Thus I have:
> > [1] n=504 rows
> > [2] 1 id column and 2 attributes
> >
> > Here's a sample of the data:

 [snip]


  You could make a "caterpillar plot" as follows:

X <- read.csv("coursetmp.dat")
library(ggplot2)
X <- transform(X,courseid=reorder(courseid,average))
ggplot(X,aes(x=courseid,y=average,
   ymin=average-2*std,ymax=average+2*std))+geom_point()+
  geom_linerange()+coord_flip()

  (Here the x and y axes are flipped because it's easier to plot & read
the course ID labels that way)

  Of course, the answer to "how should I visualize these data?" always
depends on what you want to find out ...



Re: [R] Segfault and bad output with fOptions::rnorm.sobol

2011-10-23 Thread Robert McDonald
>
>  I think your question is answered by
>
> http://cran.r-project.org/web/packages/fOptions/ChangeLog
>
> 2010-04-23  chalabi
>
>* ChangeLog, DESCRIPTION: updated DESCR and ChangeLog
>* src/085A-LowDiscrepancy.f: fixed sobol RVS on 64 bit platform
>* ChangeLog, DESCRIPTION: updated DESC and ChangeLog
>
>  The middle item seems to address your problem exactly.
>  That fix is 18 months old, so updating might be a good idea ...
>
>  Ben Bolker


Ben, thanks very much for the pointer but the bug must not really be fixed.
I encounter the problem when using version 2140.79, which is the version at
http://cran.r-project.org/web/packages/fOptions/ and which has a date of
2011-06-08.

Oddly I have a machine with (what I thought was) an up-to-date version of
13.2 which includes fOptions version 2110.78, dated 2010-04-27. This is the
one install on which I have *not* encountered the problem. Very strange!

Bob



[R] Exponential fit of form y=exp(a*x) and not of form y=l*exp(a*x)

2011-10-23 Thread Henri Mone
Dear R Users, Beginners and Experts,

I want to fit to my data an exponential function with following functional form:
y=exp(a*x)

I used the function "nls" but this gives me exponential fits with
following functional form:
y=l*exp(a*x)

With "l" being a scaling factor. What do I need to change in my R code?

t.dataFitModel <- nls(t.dataForFitY ~ exp(a * t.dataForFitX),
                      data = t.dataForFit, start = list(a = 0.01242922),
                      trace = TRUE, algorithm = "plinear")



Thanks in advance and a nice weekend,
Henri



[R] Mlogit dummy problem

2011-10-23 Thread Ville Iiskola
Hi

I have tried to estimate race winning probabilities with mlogit in R. The races 
have different numbers of contestants, and mlogit has a bug so that it does not 
work in that situation. So I tried to add dummy contestants so that every race 
has the same number of contestants. 
The problem is that the dummies affect the estimates.

In the attached Excel file I have an example of the situation. The first sheet 
has no dummies, and the second sheet has the same information with dummies 
added. When I run mlogit as follows

library(RODBC)

library(mlogit)

library(foreign)

z<-odbcConnectExcel("D:\\Testi2.xls")

y<-sqlFetch(z,"eidummyjä")

Mallidata=mlogit.data(y,choice="Voittaja",shape="long",id.var="Päiväjalähtö",alt.var="Kilpailunumero")

summary(mlogit(Voittaja  ~ Onkokaikkikengätpois + Onkoosakengistäjalassa+ 
OurChoicedummy +MvaiN+OvaiR+Log-1 , data=Mallidata))



Coefficients :
                       Estimate Std. Error t-value Pr(>|t|)
Onkokaikkikengätpois   -16.7722 25041.3517 -0.0007   0.9995
Onkoosakengistäjalassa  38.9587  8371.6220  0.0047   0.9963
OurChoicedummy          39.5846 26049.3329  0.0015   0.9988
MvaiN                  -41.4435 10733.2019 -0.0039   0.9969
OvaiR                   58.0948 11534.2609  0.0050   0.9960
Log                     -6.5563  4870.5160 -0.0013   0.9989

Log-Likelihood: -1.5546e-07



And the run with the dummies gives

Coefficients :
                       Estimate Std. Error t-value Pr(>|t|)
Onkokaikkikengätpois   -15.4053 31857.0476 -0.0005   0.9996
Onkoosakengistäjalassa  40.0112 17493.0896  0.0023   0.9982
OurChoicedummy          39.3606 34019.6023  0.0012   0.9991
MvaiN                  -40.4852 13794.8400 -0.0029   0.9977
OvaiR                   57.0165 14268.6747  0.0040   0.9968
Log                     -5.5429  9380.3919 -0.0006   0.9995

Log-Likelihood: -1.7162e-07

So the likelihoods differ, and so do the estimates. 

How should I add the dummies, or is there some other way of doing this?

Ville 








Re: [R] interpreting bootstrap corrected slope [rms package]

2011-10-23 Thread apeer
I guess I must be misunderstanding the point of checking the ordinality
assumptions prior to fitting a model.  Are you saying that a response
variable that does not behave in an ordinal fashion can still be included in
the initial and final model?

--
View this message in context: 
http://r.789695.n4.nabble.com/interpreting-bootstrap-corrected-slope-rms-package-tp3928314p3930644.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] unfold list (variable number of columns) into a data frame

2011-10-23 Thread Dennis Murphy
Hi:

Here's one approach:

# Function to process a list component into a data frame
ff <- function(x) {
 data.frame(time = x[1], partitioning_mode = x[2], workload = x[3],
runtime = as.numeric(x[4:length(x)]) )
   }

# Apply it to each element of the list:
do.call(rbind, lapply(data, ff))

or equivalently, using the plyr package,

library('plyr')
ldply(data, ff)

# Example:
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
"67", "98"),
  c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
  c("1", "replication", "query", "2891", "2907", "2922", "2937"))
do.call(rbind, lapply(L, ff))
   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937
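
If the column types in the target structure matter (integer time, character labels rather than factors), a variant along the following lines might be closer to what was asked for. This is a sketch, not the only way to do it: the name unfold_one is made up here, and stringsAsFactors = FALSE is set explicitly because its default has differed between R versions.

```r
# Hypothetical variant of ff(): one list element -> one data frame,
# with the column types requested in the original post
unfold_one <- function(x) {
  data.frame(time = as.integer(x[1]),
             partitioning_mode = x[2],
             workload = x[3],
             runtime = as.numeric(x[-(1:3)]),  # everything after the 3 labels
             stringsAsFactors = FALSE)
}

# Same small example list as above
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
            "67", "98"),
          c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
          c("1", "replication", "query", "2891", "2907", "2922", "2937"))

df <- do.call(rbind, lapply(L, unfold_one))
str(df)
```

The three labels are recycled by data.frame() to the length of the runtime vector, which is what produces the repeated group columns.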

HTH,
Dennis

On Sun, Oct 23, 2011 at 8:38 AM, Giovanni Azua  wrote:
> Hello,
>
> I used R a lot one year ago and now I am a bit rusty :)
>
> I have my raw data which correspond to the list of runtimes per minute 
> (minute "1" "2" "3" in two database modes "sharding" and "query" and two 
> workload types "query" and "refresh") and as a list of char arrays that looks 
> like this:
>
>> str(data)
> List of 122
>  $ : chr [1:163] "1" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  
> ...
>  $ : chr [1:313] "1" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
>  $ : chr [1:57] "1" "replication" "query" "2891" "2907" "2922" "2937" ...
>  $ : chr [1:278] "1" "replication refresh "79" "79" "89" "79" "89" "79" "79" 
> "79" ...
>  $ : chr [1:163] "2" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  
> ...
>  $ : chr [1:313] "2" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
>  $ : chr [1:57] "2" "replication" "query" "2891" "2907" "2922" "2937" ...
>  $ : chr [1:278] "2" "replication refresh "79" "79" "89" "79" "89" "79" "79" 
> "79" ...
>  $ : chr [1:163] "3" "sharding" "query" "607" "85" "52" "79" "77" "67" "98"  
> ...
>  $ : chr [1:313] "3" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
>  $ : chr [1:57] "3" "replication" "query" "2891" "2907" "2922" "2937" ...
>  $ : chr [1:278] "3" "replication refresh "79" "79" "89" "79" "89" "79" "79" 
> "79" ...
>
> I would like to transform the one above into a data frame where this 
> structure in unfolded in the following way:
>
> 'data.frame': N obs. of  3 variables:
>  $ time : int  1 1 1 1 1 1 1 1 1 1 1 ...
>  $ partitioning_mode : chr "sharding" "sharding" "sharding" "sharding" 
> "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" ...
>  $ workload : chr "query" "query" "query" "query" "query" "query" "query" 
> "refresh" "refresh" "refresh" "refresh" ...
>  $ runtime : num  607 85 52 79 77 67 98 2932 2870 2877 2868...
>
> So instead of having an associative array (variable number of columns) it 
> should become a simple list where the group or factors are repeated for every 
> occurrence of the  specific runtime. Basically my ultimate goal is to get a 
> data frame structure that is "summarizeBy"-friendly and "ggplot2-friendly" 
> i.e. using this data frame format.
>
> Help greatly appreciated!
>
> TIA,
> Best regards,
> Giovanni
>



Re: [R] Exponential fit of form y=exp(a*x) and not of form y=l*exp(a*x)

2011-10-23 Thread Ben Bolker
Henri Mone  gmail.com> writes:

> I want to fit to my data an exponential function with following 
> functional form:
> y=exp(a*x)
> 
> I used the function "nls" but this gives me exponential fits with
> following functional form:
> y=l*exp(a*x)
> 
> With "l" being an scaling factor. What do I need to change in my R code?
> 
> t.dataFitModel=nls(t.dataForFitY ~exp(a*t.dataForFitX),
> data=t.dataForFit, start=list(a = 0.01242922), trace=TRUE, algorithm =
> "plinear") 

  Use an algorithm other than "plinear", I think (admittedly this
is not at all clear from ?nls -- you would really have to go to the
references to find out).
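
As far as I understand it, algorithm = "plinear" treats the right-hand side as the nonlinear part of a partially linear model and estimates a linear coefficient for it, which would explain the extra scaling factor. A minimal sketch with simulated data (the data and the names x, y, fit_default and fit_plinear are made up for illustration):

```r
set.seed(1)
x <- seq(0, 10, length.out = 50)
# Simulated response following y = exp(0.3 * x) with 1% multiplicative noise
y <- exp(0.3 * x) * (1 + rnorm(50, sd = 0.01))

# Default (Gauss-Newton) algorithm: fits exactly the formula as written
fit_default <- nls(y ~ exp(a * x), start = list(a = 0.2))

# "plinear": same formula, but a linear multiplier is estimated as well
fit_plinear <- nls(y ~ exp(a * x), start = list(a = 0.2),
                   algorithm = "plinear")

names(coef(fit_default))  # only the parameter from the formula
names(coef(fit_plinear))  # expect an extra coefficient starting with ".lin"
```

So dropping algorithm = "plinear" from the original call should give a fit of the form y = exp(a*x).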



Re: [R] Imposing Feller condition using project constraint in spg

2011-10-23 Thread Kristian Lind
Hi Ravi,

Thank you for your reply and please excuse my late response.

Plugging w2' = k/w1' from (A) into (B) yields

(C) f(w1') = (w1-w1')^2 + (w2-k/w1')^2

The derivative with respect to w1' is

(D) df(w1')/dw1' = -2(w1-w1') + 2(w2-k/w1')*k/(w1')^2

For this to be a minimum, the first-order condition df(w1')/dw1' = 0 and the
second-order condition d^2f(w1')/d(w1')^2 > 0 must both be satisfied.

I follow you on substituting the constraint into the distance from the
iterate to (w1', w2') and then minimizing this distance, but I'm not quite
sure how to turn this into a projection function. Any suggestions?
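
One possible way to turn (C) and (D) into code is sketched below. It rests on assumptions that are not in the thread: the feasible set is taken to be w1 * w2 >= k with w1, w2 > 0 and k > 0, and the function name project_hyperbola is made up. Multiplying (D) = 0 by (w1')^3 / 2 gives the quartic u^4 - w1*u^3 + k*w2*u - k^2 = 0 in u = w1', whose positive real roots are the candidate points; the one with the smallest f is kept, which takes care of the second-order condition.

```r
# Hypothetical helper: nearest point on w1' * w2' = k to the iterate
# w = c(w1, w2). Assumes k > 0 and a feasible set w1 * w2 >= k, w1, w2 > 0.
project_hyperbola <- function(w, k) {
  # Already feasible: nothing to do
  if (w[1] > 0 && w[2] > 0 && w[1] * w[2] >= k) return(w)

  # Squared distance (C) as a function of u = w1'
  f <- function(u) (w[1] - u)^2 + (w[2] - k / u)^2

  # First-order condition (D) = 0, multiplied by u^3 / 2:
  #   u^4 - w1 * u^3 + k * w2 * u - k^2 = 0
  # polyroot() takes coefficients in increasing order of power
  roots <- polyroot(c(-k^2, k * w[2], 0, -w[1], 1))

  # Keep the real, positive candidates and pick the closest one
  real_pos <- Re(roots)[abs(Im(roots)) < 1e-8 & Re(roots) > 0]
  u <- real_pos[which.min(f(real_pos))]
  c(u, k / u)
}

# Infeasible iterate (0.5 * 0.5 < 1) projected onto w1' * w2' = 1
p <- project_hyperbola(c(0.5, 0.5), k = 1)
p
```

A function of this shape could then be adapted to whatever interface spg expects for its project argument (see ?spg); which elements of the full parameter vector play the roles of w1 and w2 is left open here.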

Regards,

Kristian

2011/9/8 Ravi Varadhan 

>  Hi Kristian,
>
> The idea behind projection is that you take an iterate that violates the
> constraints and project it onto a point such that it is the nearest point
> that satisfies the constraints.
>
> Suppose you have an iterate (w1, w4) that does not satisfy the constraint
> that w1 * w4 != (1 + eps)/2.  Our goal is to find a (w1’, w2’), given (w1,
> w2), such that
>
> (A)   w1’ * w2’ = (1+eps)/2 = k
>
> (B)   (w1-w1’)^2 + (w2-w2’)^2 is minimum.
>
> This is quite easy to solve.  We know (w1, w2).  You plug in w2’ = k/w1’
> from (A) into (B) and minimize the function of w1’.  This is a simple
> calculus exercise, and I will leave this as a homework problem for you to
> solve!
>
> Best,
>
> Ravi.
>
> ---
>
> Ravi Varadhan, Ph.D.
> Assistant Professor,
> Division of Geriatric Medicine and Gerontology School of Medicine Johns
> Hopkins University
>
> Ph. (410) 502-2619
> email: rvarad...@jhmi.edu
>



Re: [R] setMethod "[" - extract by names within Slot

2011-10-23 Thread Martin Morgan

On 10/22/2011 08:03 AM, Omphalodes Verna wrote:

Thanks Martin.

Here is my ''updated'' code.

setClass("myClass", representation(ID.r = "numeric", ID.c = "character", DAT = 
"matrix"))

to.myClass<- function(ID.r, ID.c, DAT) {
 out<- new("myClass", ID.r = ID.r, ID.c = ID.c, DAT = DAT)
 return(out)
   }

setMethod("[", "myClass", function(x, i, j, drop = TRUE) {
 x@ID.r<- x@ID.r[i]
 x@ID.c<- x@ID.c[j]
 out.0<- x@DAT[i,j]
 out.1<- to.myClass(x@ID.r, x@ID.c, as.matrix(out.0))
 return(out.1)
   })

setMethod("[", c("myClass", "ANY", "character"),
 function(x, i, j, ..., drop = TRUE) {
 if(missing(i)) {x@ID.r<- x@ID.r} else {x@ID.r<- x@ID.r[i]}
 j<- which(j == x@ID.c)
 x@ID.c<- x@ID.c[j]
 out.0<- x@DAT[i, j]
 out.1<- to.myClass(x@ID.r, x@ID.c, as.matrix(out.0))
 return(out.1)
   })

a<- to.myClass(seq(1,25), c("A","A","B","B"), matrix(rnorm(100), nrow = 25))
a

a[1:20, ] #works
a[, 1:3] #works
a[1:10, 1:3] #works
a[, "A"] #works
a[5:20, "B"] #works

It works, but Is it normal to write two codes for setMethod???


Hi --

I defined the class as

setClass("A",
 representation=representation(
   rid="integer",
   cid="character",
   elt="matrix"))

A common pattern is that the methods provide a 'facade' that make 
different user inputs conform to a particular signature, and then all 
invoke a common function where the complicated work is done; sometimes 
the function is one of the methods. Here's where I'd do the work


setMethod("[", c("A", "numeric", "character"),
    function(x, i, j, ..., drop=TRUE)
{
    cidx <- match(j, x@cid)
    if (any(is.na(cidx)))
        stop("invalid 'j'")
    initialize(x, rid=x@rid[i], cid=x@cid[cidx],
               elt=x@elt[i, cidx, drop=FALSE])
})

This uses 'initialize' as a copy constructor. Any complicated code for 
subsetting would be added to this method, and only in one place.


For "[" a minimal facade needs to handle the cases where i, j, or both 
are missing -- these are the facade, doing some minimal work to make it 
possible to invoke the underlying work-horse



setMethod("[", c("A", "missing", "character"),
    function(x, i, j, ..., drop=TRUE)
{
    x[x@rid, j, ..., drop=drop]
})

setMethod("[", c("A", "numeric", "missing"),
    function(x, i, j, ..., drop=TRUE)
{
    x[i, x@cid, ..., drop=drop]
})

setMethod("[", c("A", "missing", "missing"),
    function(x, i, j, ..., drop=TRUE)
{
    x[x@rid, x@cid, ..., drop=drop]
})

You also want to use a numeric (actually, integer) value for the second 
argument. This requires two more facade methods, distinguishing between 
a 'missing' first argument and an 'ANY' first argument


setMethod("[", c("A", "ANY", "numeric"),
    function(x, i, j, ..., drop=TRUE)
{
    x[i, x@cid[j], ..., drop=drop]
})

setMethod("[", c("A", "missing", "numeric"),
    function(x, i, j, ..., drop=TRUE)
{
    x[, x@cid[j], ..., drop=drop]
})

Here are some tests

a <- new("A", rid=1:5, cid=c("A", "B", "C"), elt=matrix(1:15, nrow=5))
a[1:2, "A"]

a[1:2,]
a[,"A"]
a[,]

a[1:2, 1:2]
a[,1:2]

The use of 'numeric' is a little loose, allowing a[1.1,] for instance, 
but a stricter 'integer' is probably too inconvenient for the user. 
'rid' is a bit weird -- is the user supposed to index it (x@rid[i], 
x@elt[i,]) or match it (ridx = match(i, x@rid); x@rid[ridx], x@elt[ridx,])?
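
The index-versus-match question also matters for the column ids, at least when ids repeat as in the original c("A","A","B","B") example: match() returns only the first position of each id, whereas which() with %in% returns every position. A small base-R illustration (the objects cid, rid and elt here are stand-alone vectors made up for the example, not slots of the class):

```r
cid <- c("A", "A", "B", "B")           # column ids with repeats
rid <- c(10L, 20L, 30L)                # row ids that are not 1, 2, 3
elt <- matrix(1:12, nrow = 3, dimnames = list(NULL, cid))

# Columns: first match only versus all matches
first_only <- match("A", cid)
all_hits   <- which(cid %in% "A")

# Rows: using the ids as positions fails once ids differ from positions
by_position <- rid[c(20, 30)]          # out of range, gives NA
ridx        <- match(c(20, 30), rid)   # positions of those ids
rows_by_id  <- elt[ridx, all_hits, drop = FALSE]
```

Which behaviour is wanted is a design decision for the class; the sketch only shows that the two are not interchangeable.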


Martin


Nice weekend, OV



From: Martin Morgan

Cc: "r-help@r-project.org"
Sent: Saturday, October 22, 2011 3:50 PM
Subject: Re: [R] setMethod "[" - extract by names within Slot

On 10/22/2011 02:11 AM, Omphalodes Verna wrote:

Hi R-helper!

I have problem with setMethods for "[". Here is example :

setClass("myClass", representation(ID.r = "numeric", ID.c = "character", DAT = 
"matrix"))

to.myClass<- function(ID.r, ID.c, DAT) {
   out<- new("myClass", ID.r = ID.r, ID.c = ID.c, DAT = DAT)
   return(out)
 }

setMethod("[", "myClass", function(x, i, j, drop) {
   x@ID.r<- x@ID.r[i]
   x@ID.c<- x@ID.c[j]
   out.0<- x@DAT[i,j]
   out.1<- to.myClass(x@ID.r, x@ID.c, as.matrix(out.0))
   return(out.1)
 })

a<- to.myClass(seq(1,25), c("A","A","B","B"), matrix(rnorm(100), nrow = 25))
a


a[1:20, ] #works
a[, 1:3] #works
a[1:10, 1:3] #works

a[, "A"] #not works


thinking about your code, this is the same as


ID.c = c("A","A","B","B")
j = "A"
ID.c[j]

[1] NA




What is solution to write "[" methods for extraction by names of Slot "ID.c"


Maybe (untested)

setMethod("[", c("myClass", "ANY", "character"),
function(x, i, j, ..., drop=TRUE) {
j = match(j, x@ID.c)
x[i, j, ..., drop=TRUE]
   })



Thanks all. OV


Re: [R] unfold list (variable number of columns) into a data frame

2011-10-23 Thread Giovanni Azua
Hi Dennis,

Thank you very nice :)

Best regards,
Giovanni

On Oct 23, 2011, at 6:55 PM, Dennis Murphy wrote:

> Hi:
> 
> Here's one approach:
> 
> # Function to process a list component into a data frame
> ff <- function(x) {
> data.frame(time = x[1], partitioning_mode = x[2], workload = x[3],
>runtime = as.numeric(x[4:length(x)]) )
>   }
> 
> # Apply it to each element of the list:
> do.call(rbind, lapply(data, ff))
> 
> or equivalently, using the plyr package,
> 
> library('plyr')
> ldply(data, ff)
> 
> # Example:
> L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
> "67", "98"),
>  c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
>  c("1", "replication", "query", "2891", "2907", "2922", "2937"))
> do.call(rbind, lapply(L, ff))
>    time partitioning_mode workload runtime
> 1     1          sharding    query     607
> 2     1          sharding    query      85
> 3     1          sharding    query      52
> 4     1          sharding    query      79
> 5     1          sharding    query      77
> 6     1          sharding    query      67
> 7     1          sharding    query      98
> 8     1          sharding  refresh    2932
> 9     1          sharding  refresh    2870
> 10    1          sharding  refresh    2877
> 11    1          sharding  refresh    2868
> 12    1       replication    query    2891
> 13    1       replication    query    2907
> 14    1       replication    query    2922
> 15    1       replication    query    2937
> 
> HTH,
> Dennis



[R] summarizing a data frame i.e. count -> group by

2011-10-23 Thread Giovanni Azua
Hello,

One problem at a time :)

I have a data frame df that looks like this:

   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937

and if I could use SQL ... omg! I really wish I could! I would do exactly this:

insert into throughput
  select time, partitioning_mode, count(*)
  from data.frame 
  group by time, partitioning_mode

My attempted R versions are wrong and produce very cryptic error messages:

> throughput <- aggregate(x=df[,c("time", "partitioning_mode")], 
> by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) : 
  incorrect number of dimensions

> throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) : 
  incorrect number of dimensions

>throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), FUN=count)
I can't comprehend what comes out of this one ... :(

and I thought C++ template errors were the most cryptic ;P

Many many thanks in advance,
Best regards,
Giovanni


Re: [R] summarizing a data frame i.e. count -> group by

2011-10-23 Thread David Winsemius


On Oct 23, 2011, at 1:29 PM, Giovanni Azua wrote:


Hello,

This is one problem at the time :)

I have a data frame df that looks like this:

> df <- read.table(textConnection(" time partitioning_mode workload runtime
+ 1     1          sharding    query     607
+ 2     1          sharding    query      85
+ 3     1          sharding    query      52
+ 4     1          sharding    query      79
+ 5     1          sharding    query      77
+ 6     1          sharding    query      67
+ 7     1          sharding    query      98
+ 8     1          sharding  refresh    2932
+ 9     1          sharding  refresh    2870
+ 10    1          sharding  refresh    2877
+ 11    1          sharding  refresh    2868
+ 12    1       replication    query    2891
+ 13    1       replication    query    2907
+ 14    1       replication    query    2922
+ 15    1       replication    query    2937"))
>
> df$throughput <- ave(df$time, list(df$time, df$partitioning_mode),
+                      FUN=length)

> df
   time partitioning_mode workload runtime throughput
1     1          sharding    query     607         11
2     1          sharding    query      85         11
3     1          sharding    query      52         11
4     1          sharding    query      79         11
5     1          sharding    query      77         11
6     1          sharding    query      67         11
7     1          sharding    query      98         11
8     1          sharding  refresh    2932         11
9     1          sharding  refresh    2870         11
10    1          sharding  refresh    2877         11
11    1          sharding  refresh    2868         11
12    1       replication    query    2891          4
13    1       replication    query    2907          4
14    1       replication    query    2922          4
15    1       replication    query    2937          4




and if I could use SQL ... omg! I really wish I could! I would do  
exactly this:


You can, of course, use package sqldf, which would undoubtedly be good  
practice for me, but this seemed like a typical situation for using  
'ave'. You do need to use the FUN= construction in 'ave' because that  
argument appears after the triple dots in the argument list.
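
If one row per group is wanted, as in the SQL version, rather than a count repeated on every row, aggregate() with length as the counting function may be closer. A sketch, assuming a data frame shaped like the one above (rebuilt here so the example is self-contained; the column name count is my choice):

```r
# Rebuild the example data so the snippet runs on its own
df <- data.frame(
  time = rep(1L, 15),
  partitioning_mode = rep(c("sharding", "replication"), times = c(11, 4)),
  workload = rep(c("query", "refresh", "query"), times = c(7, 4, 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98,
              2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937),
  stringsAsFactors = FALSE
)

# select time, partitioning_mode, count(*) ... group by time, partitioning_mode
# length() plays the role of count(*): it counts the runtimes in each group
throughput <- aggregate(runtime ~ time + partitioning_mode,
                        data = df, FUN = length)
names(throughput)[names(throughput) == "runtime"] <- "count"
throughput
```

The formula interface keeps the grouping columns under their own names, which avoids the Group.1/Group.2 columns that the by= form produces.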




insert into throughput
 select time, partitioning_mode, count(*)
 from data.frame
 group by time, partitioning_mode

My attempted R versions are wrong and produce very cryptic error  
message:


throughput <- aggregate(x=df[,c("time", "partitioning_mode")],  
by=list(df$time,df$partitioning_mode), count)

Error in `[.default`(df2, u_id, , drop = FALSE) :
 incorrect number of dimensions

throughput <- aggregate(x=df, by=list(df$time,df 
$partitioning_mode), count)

Error in `[.default`(df2, u_id, , drop = FALSE) :
 incorrect number of dimensions

throughput <- tapply(X=df$time, INDEX=list(df$time,df 
$partitioning), FUN=count)

I cant comprehend what comes out from this one ... :(

and I thought C++ template errors were the most cryptic ;P

Many many thanks in advance,
Best regards,
Giovanni


David Winsemius, MD
West Hartford, CT



[R] Summary stats in table

2011-10-23 Thread Duncan Murdoch

Suppose I have data like this:

A <- sample(letters[1:3], 1000, replace=TRUE)
B <- sample(LETTERS[1:2], 1000, replace=TRUE)
x <- rnorm(1000)

I can get a table of means via

tapply(x, list(A, B), mean)

and I can add the marginal means to this using cbind/rbind:

main <- tapply(x, list(A,B), mean)
Amargin <- tapply(x, list(A), mean)
Bmargin <- tapply(x, list(B), mean)

rbind(cbind(main, all=Amargin),all=c(Bmargin, mean(x)))

But this is tedious.  Has some package got some code that makes this easier?

Duncan Murdoch



Re: [R] plotting with a symbol on every nth point

2011-10-23 Thread Carl Witthoft


If you just want the same symbol at each point, you could use Weylandt's 
approach, tho' personally I think it's tidier to create a new vector


x10 <- x[seq(1,length(x),by=10)]  and plot that.

If you would like a different symbol at each point, then take a look at 
?text.
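
Spelled out for the example series from the original question, the subset approach might look like the sketch below. One detail to watch is that the subset has to be drawn at its original index positions, not at 1, 2, 3, ...; the name idx is just for illustration.

```r
x <- seq(-100, 1000, 25)

# Positions of every 10th point, starting with the first
idx <- seq(1, length(x), by = 10)

# Full series as a line, then symbols only at the selected positions
plot(x, type = "l")
points(idx, x[idx], pch = 5)
```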






From: R. Michael Weylandt 
Date: Fri, 21 Oct 2011 17:21:34 -0400
Try something like this:

plot(x,type="o", pch = c(5,rep(NA,9)))

for, e.g., every 10th point.

Michael Weylandt

On Fri, Oct 21, 2011 at 5:18 PM, zugi young  
wrote:

> Hi,
>
> I would like to produce a plot with a symbol on every nth point in a 
time

> series data, like the one in the following:
>
> http://www.phon.ucl.ac.uk/home/yi/ProsodyPro/EnglishFocus.png
>
> x <- seq(-100,1000,25)
> plot(x,type="l")
>
> Could someone help me out with the above example?

--

Sent from my Cray XK6
"Pendeo-navem mei anguillae plena est."



[R] how to save an R object to a remote computer

2011-10-23 Thread Molly Davies
Hello,

I am running R remotely on my university's network from my laptop (Macbook Pro, 
running leopard, in case this is useful). I have a strict limit on how much 
disk space I can take up on my network account at school, which is insufficient 
for the size of some of the objects I need to create. Is there any way to use 
save() and write.table() in R to export directly to a remote machine (in this 
case that would be my laptop, which has plenty of room)? I need to save vectors 
of lists of lists (output from mclapply). So far, my search has led me to 
various database utilities. I suppose I could try to make that work, but I've 
no experience with databases and am unsure if that is the best way for me to go.
Any advice (including search terms I might not have thought of yet) would be 
much appreciated.

Thanks very much,

Molly


Re: [R] Exponential fit of form y=exp(a*x) and not of form y=l*exp(a*x)

2011-10-23 Thread Carl Witthoft

You misused nls().  Observe:


x<- 1:101
y2 <- 5*exp(x/20) + runif(101)/100 # nls will NOT converge for perfect data.

nls(y2 ~ exp(A*x), start=list(A=.1))

Nonlinear regression model
  model:  y2 ~ exp(A * x)
   data:  parent.frame()
  A
0.06709
 residual sum-of-squares: 136703

Number of iterations to convergence: 7
Achieved convergence tolerance: 2.694e-06



Which is a lousy fit.  Compare with

nls(y2~B*exp(A*x), start=list(A=.1,B=.3))

Nonlinear regression model
  model:  y2 ~ B * exp(A * x)
   data:  parent.frame()
A B
0.050 5.001
 residual sum-of-squares: 0.001398

Number of iterations to convergence: 13
Achieved convergence tolerance: 5.073e-08


So either form works, but only one will give you a result that fits your 
original data.







Henri Mone  gmail.com> writes:

> I want to fit to my data an exponential function with following
> functional form:
> y=exp(a*x)
>
> I used the function "nls" but this gives me exponential fits with
> following functional form:
> y=l*exp(a*x)
>
> With "l" being an scaling factor. What do I need to change in my R code?
>
> t.dataFitModel=nls(t.dataForFitY ~exp(a*t.dataForFitX),
> data=t.dataForFit, start=list(a = 0.01242922), trace=TRUE, algorithm =
> "plinear")
  Use an algorithm other than "plinear", I think (admittedly this is 
not at all clear from ?nls -- you would really have to go to the 
references to find out).
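
A small sketch of what the remark about "plinear" seems to be pointing at, on simulated data (the data, seed and starting value are assumptions made for illustration, not Henri's). With the default algorithm the model is fitted exactly as written; with algorithm = "plinear" nls multiplies the right-hand side by a linear coefficient of its own, reported as .lin, which would explain the extra scaling factor.

```r
set.seed(42)                                  # reproducible illustrative data
x <- 1:101
y <- 5 * exp(x / 20) + rnorm(101, sd = 0.5)   # true scale factor is 5, not 1

# Default (Gauss-Newton) algorithm: fits exactly y = exp(a*x), no scale factor.
fit_default <- nls(y ~ exp(a * x), start = list(a = 0.06))

# "plinear": the right-hand side is multiplied by a linear coefficient
# that nls estimates itself and reports under the name '.lin'.
fit_plinear <- nls(y ~ exp(a * x), start = list(a = 0.06),
                   algorithm = "plinear")

coef(fit_default)   # one coefficient: a
coef(fit_plinear)   # two coefficients: a and .lin
```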



--

Sent from my Cray XK6
"Pendeo-navem mei anguillae plena est."



Re: [R] how to save an R object to a remote computer

2011-10-23 Thread Barry Rowlingson
On Sun, Oct 23, 2011 at 7:46 PM, Molly Davies  wrote:
> Hello,
>
> I am running R remotely on my university's network from my laptop (Macbook 
> Pro, running leopard, in case this is useful). I have a strict limit on how 
> much disk space I can take up on my network account at school, which is 
> insufficient for the size of some of the objects I need to create. Is there 
> any way to use save() and write.table() in R to export directly to a remote 
> machine (in this case that would be my laptop, which has plenty of room)? I 
> need to save vectors of lists of lists (output from mclapply). So far, my 
> search has led me to various database utilities. I suppose I could try to 
> make that work, but I've no experience with databases and am unsure if that 
> is the best way for me to go.
> Any advice (including search terms I might not have thought of yet) would be 
> much appreciated.

 The easiest way would be to do this at the operating system level -
not using R. Some kind of shared file system between the computers.
How are you running R remotely? What OS is the remote machine? Are you
connecting to a Windows machine via RDP (Remote Desktop) or a
Linux box with SSH or something else?

 With RDP it's possible to tell the remote desktop program to mount
local drives (such as hard disks or USB drives) as extra drives on the
remote Windows box (so you see extra K:, J: etc. drives). Then you'd
just get R to write to K:\something\ and it's going straight on your
laptop.

 For a remote Linux machine you might be able to use SSHFS to create a
connection to your laptop from the server.

 However these things will probably rely on things being installed on
the server and possibly friendly network technicians. Sometimes the
real easiest solution involves smiling sweetly to the people who
control the disk space, and if that doesn't work get your supervisor
to try it...
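
 If no shared file system can be arranged, one R-level possibility (a rough
sketch only, and Unix-specific) is to serialize through a pipe() connection
so the data never touches the server's disk. The helper name
save_via_command and the ssh target in the comment are hypothetical, and
the ssh variant assumes the laptop accepts incoming SSH logins, which may
well not be the case.

```r
# Sketch only: 'save_via_command' and the ssh target below are hypothetical.
save_via_command <- function(object, command) {
  con <- pipe(command, open = "wb")   # binary write pipe to a shell command
  on.exit(close(con))                 # close (and flush) the pipe on exit
  saveRDS(object, con)                # serialize straight into the pipe
  invisible(NULL)
}

# Hypothetical use, assuming the laptop accepts ssh logins as 'me@my-laptop':
# save_via_command(results, "ssh me@my-laptop 'cat > results.rds'")

# Local demonstration that needs no network: 'cat' writes the stream to a file.
demo_file <- tempfile(fileext = ".rds")
results <- list(list(a = 1, b = "x"), list(a = 2, b = "y"))
save_via_command(results, paste("cat >", shQuote(demo_file)))
restored <- readRDS(demo_file)
```

 The object is written uncompressed this way; readRDS() reads it back
either way.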

I'll just check you're not one of our students...

Barry



Re: [R] Exponential fit of form y=exp(a*x) and not of form y=l*e

2011-10-23 Thread Ted Harding
On 23-Oct-11 19:03:05, Carl Witthoft wrote:
> You misused nls().  Observe:
> 
> x<- 1:101
> y2 <- 5*exp(x/20) + runif(101)/100 # nls will NOT converge for perfect
> data.
> 
> nls(y2 ~ exp(A*x), start=list(A=.1))
> 
> Nonlinear regression model
>model:  y2 ~ exp(A * x)
> data:  parent.frame()
>A
> 0.06709
>   residual sum-of-squares: 136703
> 
> Number of iterations to convergence: 7
> Achieved convergence tolerance: 2.694e-06
> 
> Which is a lousy fit.  Compare with
> 
> nls(y2~B*exp(A*x), start=list(A=.1,B=.3))
> 
> Nonlinear regression model
>model:  y2 ~ B * exp(A * x)
> data:  parent.frame()
>  A B
> 0.050 5.001
>   residual sum-of-squares: 0.001398
> 
> Number of iterations to convergence: 13
> Achieved convergence tolerance: 5.073e-08
> 
> So either form works, but only one will give you a result
> that fits your original data.
> 
> 
> Henri Mone  gmail.com> writes:
> 
>  > I want to fit to my data an exponential function with following
>  > functional form:
>  > y=exp(a*x)
>  >
>  > I used the function "nls" but this gives me exponential fits with
>  > following functional form:
>  > y=l*exp(a*x)
>  >
>  > With "l" being an scaling factor. What do I need to change in my R
> code?
>  >
>  > t.dataFitModel=nls(t.dataForFitY ~exp(a*t.dataForFitX),
>  > data=t.dataForFit, start=list(a = 0.01242922), trace=TRUE, algorithm
> =
>  > "plinear")
>Use an algorithm other than "plinear", I think (admittedly this is 
> not at all clear from ?nls -- you would really have to go to the 
> references to find out).
> -- 
> Sent from my Cray XK6
> "Pendeo-navem mei anguillae plena est."

Of course fitting y2 ~ 1.0*exp(A*x) to data generated by

  y2 <- 5*exp(x/20) + runif(101)/100

will result in a bad fit! Henri's original query stated
that he wanted to fit y ~ exp(A*x), and I presume he had
a reason for not including a multiplicative constant as in
your y2 ~ B*exp(A*x). It may well be that he knows, for
some reason, that, in  B*exp(A*x), B must be 1, though
he was certainly not explicit about this !

Generating the data with 1.0*exp(x/20) and then using nls
in the form nls(y2 ~ exp(A*x), ...) works perfectly:

  x  <- 1:101
  y2 <- 1.0*exp(x/20) + runif(101)/100

  nls(y2 ~ exp(A*x), start=list(A=.1))

  # Nonlinear regression model
  #   model:  y2 ~ exp(A * x) 
  #    data:  parent.frame()
  #    A
  # 0.05
  #  residual sum-of-squares: 0.002608
  #
  # Number of iterations to convergence: 9 
  # Achieved convergence tolerance: 5.962e-08

So I would not say he was "misusing nls", since we do not have
information about his data. Throwing up a counter-example
where the data are deliberately generated so as to be impossible
to fit with the formula he wants to use (for whatever reason)
is not a good argument!

Hoping this helps,
Ted.



E-Mail: (Ted Harding) 
Fax-to-email: +44 (0)870 094 0861
Date: 23-Oct-11   Time: 21:22:53
-- XFMail --



Re: [R] how to save an R object to a remote computer

2011-10-23 Thread Molly Davies
Thank you for the suggestions! I'm actually running simulations in R over two 
separate networks (and thus will need to smile sweetly (and authentically, of 
course!) twice). In one, I am ssh-ing from my Mac laptop into one Mac machine 
with 32 cpu. In the other, I am ssh-ing into a cluster and am executing my 
simulation scripts via qsub. 

I will take a look at SSHFS!

-Molly


On Oct 23, 2011, at 12:03 PM, Barry wrote:

> On Sun, Oct 23, 2011 at 7:46 PM, Molly wrote:
>> Hello,
>> 
>> I am running R remotely on my university's network from my laptop (Macbook 
>> Pro, running leopard, in case this is useful). I have a strict limit on how 
>> much disk space I can take up on my network account at school, which is 
>> insufficient for the size of some of the objects I need to create. Is there 
>> any way to use save() and write.table() in R to export directly to a remote 
>> machine (in this case that would be my laptop, which has plenty of room)? I 
>> need to save vectors of lists of lists (output from mclapply). So far, my 
>> search has led me to various database utilities. I suppose I could try to 
>> make that work, but I've no experience with databases and am unsure if that 
>> is the best way for me to go.
>> Any advice (including search terms I might not have thought of yet) would be 
>> much appreciated.
> 
> The easiest way would be to do this at the operating system level -
> not using R. Some kind of shared file system between the computers.
> How are you running R remotely? What OS is the remote machine? Are you
> using connecting to a Windows machine via RDP (Remote Desktop) or a
> Linux box with SSH or something else?
> 
> With RDP its possible to tell the remote desktop program to mount
> local drives (such as hard disks or USB drives) as extra drives on the
> remote windows box (so you see extra K:,J: etc drives). Then you'd
> just get R to write to K:\something\ and its going straight on your
> laptop.
> 
> For a remote Linux machine you might be able to use SSHFS to create a
> connection to your laptop from the server.
> 
> However these things will probably rely on things being installed on
> the server and possibly friendly network technicians. Sometimes the
> real easiest solution involves smiling sweetly to the people who
> control the disk space, and if that doesn't work get your supervisor
> to try it...
> 
> I'll just check you're not one of our students...
> 
> Barry



Re: [R] symmetric matrix multiplication

2011-10-23 Thread statfan
Thank you Dan and Ted for these helpful comments. I will implement the
simple force-symmetry code you suggested and make sure I familiarize myself
with this floating-point calculation problem so I can recognize such issues
in the future.
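
For the archive, a minimal sketch of the kind of check and fix discussed in this thread. The matrices are random stand-ins (not the original A and B), and averaging with the transpose is one common way to force symmetry; it is assumed here to be along the lines of what was suggested.

```r
set.seed(1)                                   # random stand-ins for A and B
A <- matrix(rnorm(17 * 17), 17, 17)
B <- crossprod(matrix(rnorm(17 * 17), 17))    # symmetric 17x17 matrix

C <- t(A) %*% B %*% A
asym <- max(abs(C - t(C)))    # tiny, but not necessarily exactly zero

C_sym <- (C + t(C)) / 2       # average with the transpose to force symmetry
isSymmetric(C_sym)            # symmetry check with a numerical tolerance
```

Comparing with a tolerance (isSymmetric, all.equal) rather than testing for exact equality is the usual habit for floating-point results.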

--
View this message in context: 
http://r.789695.n4.nabble.com/symmetric-matrix-multiplication-tp3929739p3930832.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] summarizing a data frame i.e. count -> group by

2011-10-23 Thread Tyler Rinker

This could be done with aggregate but I am unfamiliar with it so I'll give what 
I think you want from your message using the library 'reshape' that you'll have 
to download.  If your problem is large the data.table library would be much 
faster.
 
You haven't really said what you'd like to get from the output so I'm going by 
what your code looks like you want. Base R has no count function for this; the 
function you want is length (you may want sum but it does not appear that way).  
Also giving the list a bit of what you'd expect for an output is often helpful.
 
Here is the code (one of these three options is what you want, I think):
 
library(reshape)
throughput1 <- cast(df, time~partitioning_mode, value="runtime",  length)
throughput2 <- cast(df, partitioning_mode~time, value="runtime",  length)
throughput3 <- cast(df, partitioning_mode + workload~time, value="runtime", 
length)
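
For comparison, a base-R sketch of the same count-by-group with aggregate(), using the formula interface. The data frame is rebuilt by hand from the 15 rows in the question, and the column name count is just an illustrative choice.

```r
# Rebuild the 15 example rows from the question.
df <- data.frame(
  time = 1,
  partitioning_mode = rep(c("sharding", "replication"), times = c(11, 4)),
  workload = rep(c("query", "refresh", "query"), times = c(7, 4, 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98, 2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937)
)

# Roughly "select time, partitioning_mode, count(*) ... group by ...":
# length() counts the runtime values in each group.
throughput <- aggregate(runtime ~ time + partitioning_mode,
                        data = df, FUN = length)
names(throughput)[3] <- "count"   # the aggregated column holds the counts
throughput
```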


> From: brave...@gmail.com
> Date: Sun, 23 Oct 2011 19:29:40 +0200
> To: r-help@r-project.org
> Subject: [R] summarizing a data frame i.e. count -> group by
>
> Hello,
>
> This is one problem at the time :)
>
> I have a data frame df that looks like this:
>
> time partitioning_mode workload runtime
> 1 1 sharding query 607
> 2 1 sharding query 85
> 3 1 sharding query 52
> 4 1 sharding query 79
> 5 1 sharding query 77
> 6 1 sharding query 67
> 7 1 sharding query 98
> 8 1 sharding refresh 2932
> 9 1 sharding refresh 2870
> 10 1 sharding refresh 2877
> 11 1 sharding refresh 2868
> 12 1 replication query 2891
> 13 1 replication query 2907
> 14 1 replication query 2922
> 15 1 replication query 2937
>
> and if I could use SQL ... omg! I really wish I could! I would do exactly 
> this:
>
> insert into throughput
> select time, partitioning_mode, count(*)
> from data.frame
> group by time, partitioning_mode
>
> My attempted R versions are wrong and produce very cryptic error message:
>
> > throughput <- aggregate(x=df[,c("time", "partitioning_mode")], 
> > by=list(df$time,df$partitioning_mode), count)
> Error in `[.default`(df2, u_id, , drop = FALSE) :
> incorrect number of dimensions
>
> > throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
> Error in `[.default`(df2, u_id, , drop = FALSE) :
> incorrect number of dimensions
>
> >throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), 
> >FUN=count)
> I cant comprehend what comes out from this one ... :(
>
> and I thought C++ template errors were the most cryptic ;P
>
> Many many thanks in advance,
> Best regards,
> Giovanni
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   


Re: [R] Summary stats in table

2011-10-23 Thread Tyler Rinker

I had to set it up as a data frame and then it worked beautifully with the 
reshape package.  
 
DF<-data.frame(A,B,x)
library(reshape)

cast(DF, A ~ B, fun.aggregate=mean, 
 margins=c("grand_row", "grand_col"))
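
If an extra package is not wanted, the tapply/cbind/rbind steps from the question can be wrapped once in a small helper. This is only a sketch; the name tapply_margins is made up for illustration.

```r
# Sketch only: 'tapply_margins' is a hypothetical helper name.
tapply_margins <- function(x, a, b, FUN = mean) {
  main <- tapply(x, list(a, b), FUN)              # cell summaries
  rbind(cbind(main, all = tapply(x, a, FUN)),     # add row margins
        all = c(tapply(x, b, FUN), FUN(x)))       # add column margins + grand value
}

set.seed(123)                                     # example data as in the question
A <- sample(letters[1:3], 1000, replace = TRUE)
B <- sample(LETTERS[1:2], 1000, replace = TRUE)
x <- rnorm(1000)

res <- tapply_margins(x, A, B)
res
```

The margins here are computed from the raw data, so they are true marginal means rather than means of the cell means.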
 
Cheers
Tyler

> Date: Sun, 23 Oct 2011 14:39:08 -0400
> From: murdoch.dun...@gmail.com
> To: R-help@r-project.org
> Subject: [R] Summary stats in table
>
> Suppose I have data like this:
>
> A <- sample(letters[1:3], 1000, replace=TRUE)
> B <- sample(LETTERS[1:2], 1000, replace=TRUE)
> x <- rnorm(1000)
>
> I can get a table of means via
>
> tapply(x, list(A, B), mean)
>
> and I can add the marginal means to this using cbind/rbind:
>
> main <- tapply(x, list(A,B), mean)
> Amargin <- tapply(x, list(A), mean)
> Bmargin <- tapply(x, list(B), mean)
>
> rbind(cbind(main, all=Amargin),all=c(Bmargin, mean(x)))
>
> But this is tedious. Has some package got some code that makes this easier?
>
> Duncan Murdoch
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   


Re: [R] how to delete rows by a list of rownames

2011-10-23 Thread hanansela
Thank you, Michael
This is what i need. It works fine

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-delete-rows-by-a-list-of-rownames-tp3930206p3931200.html
Sent from the R help mailing list archive at Nabble.com.



[R] Creating 2 week intervals (lubridate)

2011-10-23 Thread sparklegirl100

Hello,
I have a list of dates which I am going to use for a time series analysis. I 
want to break these dates up into 2 week intervals and count the number of 
times a date appears in each interval.
For example, from Nov. 19, 2000 to Dec 2, 2000, with the data listed below, I 
want to return

Start_date  Count
2000/11/19  4

Date: 2000/11/20 2000/11/21 2000/11/19 2000/11/29

My first approach was to create a vector of the two week time periods (like 
below) and then match my dates with the intervals. But this did not work.
DataInterval<-rep(NA, 316)
TwoWeekint<-function(date) { for (i in 1:316)new.day<-i*14
DataInterval[i]<-as.interval(new_period(days=new.day), ymd("2000-11-17"))
  }
Thanks!
RK
[[alternative HTML version deleted]]



[R] cex multiplier not exact

2011-10-23 Thread Ali Tofigh
Hi,

When I plot text and use cex to change the text size, I notice that the cex
multiplier is not exact. It looks as if the real size of text can take only
certain discrete values. Is there a workaround to get text to follow the cex
value more closely, or at least to be able to figure out what the real cex value
will be?

Here is an example that illustrates the problem:

cex <- seq(0.5, 1, 0.01)
x <- 0.05
y <- (1:length(cex))/length(cex)
labels <- paste("XX", cex)
plot.new()
text(x=x, y=y, labels=labels, pos=4, cex=cex)

/Ali



[R] code review: is it too much to ask?

2011-10-23 Thread Giovanni Azua
Hello all,

I really appreciate how helpful the people in this list are. Would it be too 
much to ask to send a small script to have it peer-reviewed? to make sure I am 
not making blatant mistakes? The script takes an experiment.dat as input and 
generates system Throughput using ggplot2. It works now ... [sigh] but I have 
this nasty feeling that I might be doing something wrong :). Changing "samples" 
i.e. number of samples per group produces arbitrarily different results, I 
basically increased it (until 9) until there were no strongly deterministic 
periodicities. This is not a full-fledge experiment but just a preliminary 
report that will show I have implemented a healthy system. Proper experimental 
analysis comes after varying factors according to the 2^k*r experimental design 
etc 

Some key points I would like to find out:
- aggregation is not breaking the natural order of the measurements i.e. if 
there are 20 runtimes taken in that order, and I make groups of 10 measurements 
(to compute statistics on them) the first group must contain the first 10 
runtimes and the second group must contain the second 10 runtimes. I am not 
sure if the choice of aggregation etc is respecting this.
- I am not sure if it is best to do the binning by filling the bins by time 
intervals or by number of observations.
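
To make the first point concrete, here is a rough sketch of binning by number of observations that keeps the original order. The runtimes are obviously synthetic, and the names group, group_means and group_members are only illustrative.

```r
runtime <- seq(10, 200, by = 10)   # 20 synthetic runtimes, in measurement order
samples <- 10                      # observations per group

# Integer division of the 0-based position gives consecutive group ids:
# positions 1..10 -> group 1, positions 11..20 -> group 2, and so on.
group <- (seq_along(runtime) - 1) %/% samples + 1

group_means <- tapply(runtime, group, mean)   # one statistic per group
group_members <- split(runtime, group)        # inspect which runtimes went where
```

Because the group id is derived from the position alone, the first group always holds the first 10 measurements, whatever their values are.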

Your help will be greatly appreciated!

I have the data too and the plots look very nice but it is a 4mb file.

TIA
Best regards,
Giovanni

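# =============================================================================
# Advanced Systems Lab
# Milestone 1
# Author: Giovanni Azua
# Date: 22 October 2011
# =============================================================================

rm(list=ls())                                        # clear workspace

library(boot)                                        # use boot library
library(ggplot2)                                     # use ggplot2 library
library(doBy)                                        # use doBy library

# =============================================================================
# ETL Step
# =============================================================================

data_file <- file("/Users/bravegag/code/asl11/trunk/report/experiment.dat")
df <- read.table(data_file)                          # reads the data as data frame
class(df)                                            # show the class ('data.frame')
names(df)                                            # data is prepared correctly in Python
str(df)
head(df)

names(df)[names(df)=="V1"] <- "Time"                 # change column names
names(df)[names(df)=="V2"] <- "Partitioning"
names(df)[names(df)=="V3"] <- "Workload"
names(df)[names(df)=="V4"] <- "Runtime"
str(df)
head(df)

# =============================================================================
# Define utility functions
# =============================================================================

se <- function(x) sqrt(var(x)/length(x))             # standard error of the mean
sst <- function(x) sum((x-mean(x))^2)                # total sum of squares about the mean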
## COPIED FROM
## http://wiki.stdout.org/rcookbook/Graphs/Plotting%20means%20and%20error%20bars%20%28ggplot2%29
## *****************************************************************************
## Summarizes data.
## Gives count, mean, standard deviation, standard error of the mean, and
## confidence interval (default 95%).
## If there are within-subject variables, calculate adjusted values using
## method from Morey (2008).
##   data: a data frame.
##   measurevar: the name of a column that contains the variable to be summarized
##   groupvars: a vector containing names of columns that contain grouping variables
##   na.rm: a boolean that indicates whether to ignore NA's
##   conf.interval: the percent range of the confidence interval (default is 95%)
summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95) {
    require(doBy)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # Collapse the data
    formula <- as.formula(paste(measurevar, paste(groupvars, collapse=" + "),
                                sep=" ~ "))
    datac <- summaryBy(formula, data=data, FUN=c(length2,mean,sd), na.rm=na.rm)

    # Rename columns
    names(datac)[ names(datac) == paste(measurevar, ".mean", sep="") ] <- measurevar
    names(datac)[ names(datac) == paste(measurevar, ".sd", sep="") ] <- "sd"
    names(datac)[ names(datac) == paste(measurevar, ".length2", sep="") ] <- "N"

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

# Confidence interval multiplier for

Re: [R] interpreting bootstrap corrected slope [rms package]

2011-10-23 Thread Frank Harrell
You also did unaccounted for stepwise selection.  Regarding the proportional
odds assumption, if you assessed it correctly, something that is not
operating proportionally would have to be associated with the outcome for at
least one cutoff of Y, so you could say that you are doing reverse screening
that will need to be accounted for in resampling.
Frank

apeer wrote:
> 
> I guess I must be misunderstanding the point of checking the ordinality
> assumptions prior to fitting a model.  Are you saying that a response
> variable that does not behave in an ordinal fashion can still be included
> in the initial and final model?
> 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/interpreting-bootstrap-corrected-slope-rms-package-tp3928314p3931493.html
Sent from the R help mailing list archive at Nabble.com.



[R] diptest

2011-10-23 Thread Matilde
I am a very new user of R and I need some suggestions on how to perform a
diptest.
I downloaded the package diptest. Following the instructions given in the
file I attach, I performed a diptest in R on the dataset statfaculty.

However, I have not managed to do it with my own dataset, which consists of a
single column of numbers.
Moreover, different procedures are indicated in the file I attach and I do
not understand how I can choose between them.
I also think the p-value should be interpreted according to the number of
data points, but again I do not find any clear indication about it!

Any help is very welcome
M.

--
View this message in context: 
http://r.789695.n4.nabble.com/diptest-tp3931400p3931400.html
Sent from the R help mailing list archive at Nabble.com.



[R] manova/tukey test

2011-10-23 Thread Molly MacLeod
Hello,

I am trying to do a manova test in r, and have used the "manova" function to
test differences between two dependent variables. The results were
significant for the whole model, but the sources I've read say that in order
to do a post-hoc multiple comparison, I have to do separate aovs for each
dependent variable, then call the TukeyHSD function.

I have used the TukeyHSD with success in the past, but now when I try it,
the only output I get says "height." I want to find out whether the
dependent variable, "sum_enemies", is different across plant species
"early_plant", and which plant species are significantly different from each
other.

Below is the code i used to call the TukeyHSD:


early_anova=aov(formula=early_data$sum_enemies~early_plant, data=early_data)


early_tukey=TukeyHSD(early_anova,"early_plant", ordered=TRUE)



 When I ask R to plot the results for my tukey test (which I've named
"early_tukey"), it says "Error in vcov.default(early_tukey) :
  object does not have variance-covariance matrix."

I'm not sure why I'm getting this error. I was able to print the var-covar
matrix for the aov, but not for the tukey. Whenever I've used TukeyHSD in
the past, I had no problems.
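
For reference, the usual aov/TukeyHSD/plot sequence looks roughly like the sketch below. It uses the built-in warpbreaks data rather than early_data, so it cannot show what is going wrong with the real data; it only gives a known-good pattern to compare against. Two details it assumes: the grouping variable is a factor (TukeyHSD ignores non-factors), and the formula uses bare column names together with data=.

```r
# Built-in data set; 'tension' is already a factor with three levels.
fit <- aov(breaks ~ tension, data = warpbreaks)

# Pairwise comparisons between the levels of 'tension'.
tk <- TukeyHSD(fit, "tension", ordered = TRUE)
print(tk)

plot(tk)   # dispatches to the plot method for class "TukeyHSD"
```

Checking class(early_tukey) and is.factor(early_data$early_plant) against this pattern may help narrow things down.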

Is there an apparent error in the code I've written above? Or can anyone
suggest another place I can look?

Thanks,

Molly

[[alternative HTML version deleted]]



Re: [R] Creating 2 week intervals (lubridate)

2011-10-23 Thread David Winsemius


On Oct 23, 2011, at 4:18 PM,  > wrote:




Hello,
I have a list of dates in which I am going use for a time series  
analysis. I want to break these dates up into 2 week intervals and  
count the number of times a date appears in this interval.
For example from Nov. 19, 2000 to Dec 2 ,2000 with the data listed  
below I want to return

Start_date  Count2000/11/19 4
Date: 2000/11/20 2000/11/21  2000/11/19 2000/11/29
My first approach was toa vector of the two week time periods (like  
below) and then match my dates with the intervals. But this did not  
work.

DataInterval<-rep(NA, 316)
TwoWeekint<-function(date) { for (i in 1:316)new.day<- 
i*14DataInterval[i]<-as.interval(new_period(days=new.day),  
ymd("2000-11-17"))

 }
Thanks!


(I must note I was very much tempted to ignore this post because of  
the mail address.)


See if this helps:

findInterval(as.Date("2000-11-01") + 0:60,
             seq(from = as.Date("2000-11-20"), to = as.Date("2000-12-31"),
                 by = "2 week"))
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2
[40] 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
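
Taking that one step further, a rough sketch of turning the interval index into the Start_date/Count table asked for. The four dates are the ones from the question; the start date of 2000-11-19 and the number of intervals are assumptions made for the example.

```r
dates <- as.Date(c("2000-11-20", "2000-11-21", "2000-11-19", "2000-11-29"))

# Interval start dates: every 2 weeks from the assumed start of 2000-11-19.
breaks <- seq(from = as.Date("2000-11-19"), by = "2 weeks", length.out = 3)

# Index of the interval each date falls into (1 = first interval).
bin <- findInterval(as.numeric(dates), as.numeric(breaks))

# Count per interval, keeping intervals that contain no dates.
counts <- table(factor(bin, levels = seq_along(breaks)))
result <- data.frame(Start_date = breaks, Count = as.vector(counts))
result
```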



RK  
[[alternative HTML version deleted]]


Please post in plain text.




--

David Winsemius, MD
West Hartford, CT



Re: [R] summarizing a data frame i.e. count -> group by

2011-10-23 Thread jim holtman
Another package to consider, especially if your dataframe is large, is
'data.table':

> tp <- read.table(textConnection(" time partitioning_mode workload runtime
+ 1     1          sharding    query     607
+ 2     1          sharding    query      85
+ 3     1          sharding    query      52
+ 4     1          sharding    query      79
+ 5     1          sharding    query      77
+ 6     1          sharding    query      67
+ 7     1          sharding    query      98
+ 8     1          sharding  refresh    2932
+ 9     1          sharding  refresh    2870
+ 10    1          sharding  refresh    2877
+ 11    1          sharding  refresh    2868
+ 12    1       replication    query    2891
+ 13    1       replication    query    2907
+ 14    1       replication    query    2922
+ 15    1       replication    query    2937"), as.is = TRUE, header = TRUE)
> closeAllConnections()
>
> require(data.table)
Loading required package: data.table
data.table 1.7.1  For help type: help("data.table")
> tp <- data.table(tp)
> tp[
+ , list(workload = workload
+ , runtime = runtime
+ , thruput = length(runtime)
+ )
+ , by = list(time, partitioning_mode)
+ ]
      time partitioning_mode workload runtime thruput
 [1,]    1          sharding    query     607      11
 [2,]    1          sharding    query      85      11
 [3,]    1          sharding    query      52      11
 [4,]    1          sharding    query      79      11
 [5,]    1          sharding    query      77      11
 [6,]    1          sharding    query      67      11
 [7,]    1          sharding    query      98      11
 [8,]    1          sharding  refresh    2932      11
 [9,]    1          sharding  refresh    2870      11
[10,]    1          sharding  refresh    2877      11
[11,]    1          sharding  refresh    2868      11
[12,]    1       replication    query    2891       4
[13,]    1       replication    query    2907       4
[14,]    1       replication    query    2922       4
[15,]    1       replication    query    2937       4


On Sun, Oct 23, 2011 at 1:29 PM, Giovanni Azua  wrote:
> Hello,
>
> This is one problem at the time :)
>
> I have a data frame df that looks like this:
>
>  time partitioning_mode workload runtime
> 1     1          sharding    query     607
> 2     1          sharding    query      85
> 3     1          sharding    query      52
> 4     1          sharding    query      79
> 5     1          sharding    query      77
> 6     1          sharding    query      67
> 7     1          sharding    query      98
> 8     1          sharding  refresh    2932
> 9     1          sharding  refresh    2870
> 10    1          sharding  refresh    2877
> 11    1          sharding  refresh    2868
> 12    1       replication    query    2891
> 13    1       replication    query    2907
> 14    1       replication    query    2922
> 15    1       replication    query    2937
>
> and if I could use SQL ... omg! I really wish I could! I would do exactly 
> this:
>
> insert into throughput
>  select time, partitioning_mode, count(*)
>  from data.frame
>  group by time, partitioning_mode
>
> My attempted R versions are wrong and produce very cryptic error message:
>
>> throughput <- aggregate(x=df[,c("time", "partitioning_mode")], 
>> by=list(df$time,df$partitioning_mode), count)
> Error in `[.default`(df2, u_id, , drop = FALSE) :
>  incorrect number of dimensions
>
>> throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
> Error in `[.default`(df2, u_id, , drop = FALSE) :
>  incorrect number of dimensions
>
>>throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), 
>>FUN=count)
> I cant comprehend what comes out from this one ... :(
>
> and I thought C++ template errors were the most cryptic ;P
>
> Many many thanks in advance,
> Best regards,
> Giovanni
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] interpreting bootstrap corrected slope [rms package]

2011-10-23 Thread apeer
Does your point about proportionality also hold for ordinality?  In other
words, if I have several X variables that do not behave in an ordinal
fashion with Y, should I still include them in the full model?  My
understanding or perhaps misunderstanding of the ordinality assumption was
that all X variables included in the model should behave in an ordinal
fashion with Y.  Is that not the case?

--
View this message in context: 
http://r.789695.n4.nabble.com/interpreting-bootstrap-corrected-slope-rms-package-tp3928314p3931594.html
Sent from the R help mailing list archive at Nabble.com.



[R] RGtk2 problems

2011-10-23 Thread Aref Nammari
Hello,

I hope this is the right place to ask for help with a problem I am
having with RGtk2 installation with R on Windows XP.
I am running R 2.11.1 and have installed the package RGtk2 from CRAN.
I also have GTK 2.10.11 installed as well as GTK2-runtime 2.22.0. I
have added the environment variable GTK_PATH and set its value to the
root location where GTK is installed. When I try to run RGtk2 in R by
typing library(RGtk2) a popup dialog appears with the following error
message:

The procedure entry point gdk_app_launch_context_get_type could not be
located in the dynamic link library libgdk-win32-2.0-0.dll

In the R window I get :

Error in inDL(x, as.logical(local), as.logical(now), ...) :
  unable to load shared library 'C:/PROGRA~1/R/R-211~1.1/library/RGtk2/
libs/RGtk2.dll':
  LoadLibrary failure:  The specified procedure could not be found.

Failed to load RGtk2 dynamic library, attempting to install it.
Error : .onLoad failed in loadNamespace() for 'RGtk2', details:
  call: install_all()
  error: This platform is not yet supported by the automatic
installer. Please install GTK+ manually, if necessary. See:
http://www.gtk.org
Error: package/namespace load failed for 'RGtk2'

Any help in figuring out what could be the problem is greatly
appreciated.

Cheers,

[[alternative HTML version deleted]]



Re: [R] interpreting bootstrap corrected slope [rms package]

2011-10-23 Thread David Winsemius


On Oct 23, 2011, at 7:37 PM, apeer wrote:

> Does your point about proportionality also hold for ordinality?  In other
> words, if I have several X variables that do not behave in an ordinal
> fashion with Y, should I still include them in the full model?  My
> understanding or perhaps misunderstanding of the ordinality assumption was
> that all X variables included in the model should behave in an ordinal
> fashion with Y.  Is that not the case?


Why should non-monotonic relationships be discarded? Are you implying  
they are impossible from a scientific perspective?


--

David Winsemius, MD
West Hartford, CT



Re: [R] Segfault and bad output with fOptions::rnorm.sobol

2011-10-23 Thread Robert McDonald
On Sun, Oct 23, 2011 at 11:48 AM, Robert McDonald wrote:

>  I think your question is answered by
>>
>> http://cran.r-project.org/web/packages/fOptions/ChangeLog
>>
>> 2010-04-23  chalabi
>>
>>* ChangeLog, DESCRIPTION: updated DESCR and ChangeLog
>>* src/085A-LowDiscrepancy.f: fixed sobol RVS on 64 bit platform
>>* ChangeLog, DESCRIPTION: updated DESC and ChangeLog
>>
>>  The middle item seems to address your problem exactly.
>>  That fix is 18 months old, so updating might be a good idea ...
>>
>>  Ben Bolker
>
>
> Ben, thanks very much for the pointer but the bug must not really be fixed.
> I encounter the problem when using version 2140.79, which is the version at
> http://cran.r-project.org/web/packages/fOptions/ and which has a date of
> 2011-06-08.
>
> Oddly I have a machine with (what I thought was) an up-to-date version of
> 13.2 which includes fOptions version 2110.78, dated 2010-04-27. This is the
> one install on which I have *not* encountered the problem. Very strange!
>
> Bob
>

Correction. The bug does occur with version 2110.78, but less frequently.




Re: [R] summarizing a data frame i.e. count -> group by

2011-10-23 Thread Dennis Murphy
And the plyr version of this would be (using DF as the data frame name)

## transform method, mapping length(runtime) to all observations
## similar to David's results:
library('plyr')
ddply(DF, .(time, partitioning_mode), transform, n = length(runtime))
# or equivalently, the newer and somewhat faster
ddply(DF, .(time, partitioning_mode), mutate, n = length(runtime))

# If you just want the counts, then use

ddply(DF, .(time, partitioning_mode), summarise, n = length(runtime))

##-
# Just for fun, here's the equivalent SQL call using sqldf():

library('sqldf')
sqldf('select time, partitioning_mode, count(*) from DF group by time, partitioning_mode')

# which you can distribute over multiple lines for readability, e.g.

sqldf('select time, partitioning_mode, count(*) as n
  from DF
  group by time, partitioning_mode')

# Result:
  time partitioning_mode  n
1    1       replication  4
2    1          sharding 11

##-
# To do the same type of summary in data.table (to follow up on Jim
# Holtman's post), here's one way:

library(data.table)
dt <- data.table(DF, key = c('time', 'partitioning_mode'))
dt[, list(n = length(runtime)), by = key(dt)]
     time partitioning_mode  n
[1,]    1       replication  4
[2,]    1          sharding 11
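
For completeness, a base R sketch of the same count, with no extra packages.
It is only an illustration, not something posted in this thread: it assumes
the data frame is called DF and rebuilds it from the values in the original
post. length() plays the role of count(*); the function 'count' used in the
original attempts is not a base R summary function.

```r
# Rebuild the example data from the original post (15 rows, all with time == 1).
DF <- data.frame(
  time = rep(1, 15),
  partitioning_mode = c(rep("sharding", 11), rep("replication", 4)),
  workload = c(rep("query", 7), rep("refresh", 4), rep("query", 4)),
  runtime = c(607, 85, 52, 79, 77, 67, 98,
              2932, 2870, 2877, 2868,
              2891, 2907, 2922, 2937)
)

# count(*) ... group by: apply length() to runtime within each group.
throughput <- aggregate(runtime ~ time + partitioning_mode,
                        data = DF, FUN = length)
names(throughput)[3] <- "n"   # the third column holds the group sizes
print(throughput)

# The same counts via a contingency table turned back into a data frame.
counts <- as.data.frame(table(time = DF$time,
                              partitioning_mode = DF$partitioning_mode),
                        responseName = "n")
print(counts)
```

Both calls should give 4 rows for replication and 11 for sharding, matching
the results shown above.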


###--
HTH,
Dennis


On Sun, Oct 23, 2011 at 10:29 AM, Giovanni Azua  wrote:
> Hello,
>
> This is one problem at a time :)
>
> I have a data frame df that looks like this:
>
>  time partitioning_mode workload runtime
> 1     1          sharding    query     607
> 2     1          sharding    query      85
> 3     1          sharding    query      52
> 4     1          sharding    query      79
> 5     1          sharding    query      77
> 6     1          sharding    query      67
> 7     1          sharding    query      98
> 8     1          sharding  refresh    2932
> 9     1          sharding  refresh    2870
> 10    1          sharding  refresh    2877
> 11    1          sharding  refresh    2868
> 12    1       replication    query    2891
> 13    1       replication    query    2907
> 14    1       replication    query    2922
> 15    1       replication    query    2937
>
> and if I could use SQL ... omg! I really wish I could! I would do exactly 
> this:
>
> insert into throughput
>  select time, partitioning_mode, count(*)
>  from data.frame
>  group by time, partitioning_mode
>
> My attempted R versions are wrong and produce very cryptic error messages:
>
>> throughput <- aggregate(x=df[,c("time", "partitioning_mode")], 
>> by=list(df$time,df$partitioning_mode), count)
> Error in `[.default`(df2, u_id, , drop = FALSE) :
>  incorrect number of dimensions
>
>> throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
> Error in `[.default`(df2, u_id, , drop = FALSE) :
>  incorrect number of dimensions
>
>>throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), 
>>FUN=count)
> I can't comprehend what comes out of this one ... :(
>
> and I thought C++ template errors were the most cryptic ;P
>
> Many many thanks in advance,
> Best regards,
> Giovanni



[R] Offtopic sourceforge problems?

2011-10-23 Thread Erin Hodgess
Dear R People:

I've been trying to get  R Portable from sourceforge.net all day today
and there is a problem accessing the site.

Has anyone else run into that, please?

Thank you!
Sincerely,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] RGtk2 problems

2011-10-23 Thread Prof Brian Ripley
Please update your R (and probably your RGtk2: you did not tell us its 
version), as the posting guide asked you to do before posting.


On Sun, 23 Oct 2011, Aref Nammari wrote:


> Hello,
>
> I hope this is the right place to ask for help with a problem I am
> having with RGtk2 installation with R on Windows XP.
> I am running R 2.11.1 and have installed the package RGtk2 from CRAN.


As a binary package, I guess, but please tell us (it matters).


> I also have GTK 2.10.11 installed as well as GTK2-runtime 2.22.0. I
> have added the environment variable GTK_PATH and set its value to the
> root location where GTK is installed.


But you need the Gtk+ bin directory in your PATH.  Environment 
variable GTK_PATH is only needed when RGtk2 is installed from the 
sources.
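
A minimal sketch of checking and extending PATH from inside R before loading
the package. The directory "C:/GTK/bin" is a hypothetical location used only
for illustration; substitute wherever the Gtk+ bin directory actually lives.
The change lasts only for the current R session, and the comparison is a
plain string match, so differently spelled paths (backslashes, short names)
would not be recognised as the same directory.

```r
# Hypothetical Gtk+ bin directory; adjust to the real install location.
gtk_bin <- "C:/GTK/bin"

# Split the current PATH into its entries using the platform's separator
# (";" on Windows, ":" elsewhere).
sep <- .Platform$path.sep
path_entries <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]

# Prepend the Gtk+ bin directory if it is not already listed.
if (!(gtk_bin %in% path_entries)) {
  Sys.setenv(PATH = paste(gtk_bin, Sys.getenv("PATH"), sep = sep))
}

# library(RGtk2)  # load the package only after PATH has been adjusted
```

For a permanent change, PATH would instead be edited in the Windows
environment variable settings.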


Which Gtk+ you need in your path depends on the version of RGtk2 you 
have and how you installed it.  For current binary versions, see


http://cran.r-project.org/bin/windows/contrib/2.13/@ReadMe


> When I try to run RGtk2 in R by
> typing library(RGtk2) a popup dialog appears with the following error
> message:
>
> The procedure entry point gdk_app_launch_context_get_type could not be
> located in the dynamic link library libgdk-win32-2.0-0.dll
>
> In the R window I get :
>
> Error in inDL(x, as.logical(local), as.logical(now), ...) :
>  unable to load shared library 'C:/PROGRA~1/R/R-211~1.1/library/RGtk2/
> libs/RGtk2.dll':
>  LoadLibrary failure:  The specified procedure could not be found.
>
> Failed to load RGtk2 dynamic library, attempting to install it.
> Error : .onLoad failed in loadNamespace() for 'RGtk2', details:
>  call: install_all()
>  error: This platform is not yet supported by the automatic
> installer. Please install GTK+ manually, if necessary. See:
> http://www.gtk.org
> Error: package/namespace load failed for 'RGtk2'
>
> Any help in figuring out what could be the problem is greatly
> appreciated.
>
> Cheers,
>
> [[alternative HTML version deleted]]


Please do as the posting guide asked of you and not send HTML.






--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Date column in downloaded date

2011-10-23 Thread ajc
Hi All:

If I download Yahoo data with getSymbols() in R, a date column comes along
with the downloaded data. The date column has no header, so I cannot access
it separately.
What is the way to remove the date column?

I want to draw an xy scatter plot of the downloaded prices (say AAPL
vs NASDAQ), but I think the date column is causing a problem and the plot
function is not working.

Please advise on this as I am very new to R.

Thanks.
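
A minimal sketch of one way to look at this, not taken from a reply in the
thread. getSymbols() (quantmod) returns xts objects, in which the dates are
the index of the object rather than a data column. The prices below are
made-up numbers standing in for downloaded data, and the object names aapl
and ndx are assumptions for illustration.

```r
library(xts)  # getSymbols() from quantmod returns objects of this class

# Made-up closing prices standing in for downloaded data (not real quotes).
dates <- as.Date("2011-10-17") + 0:4
aapl <- xts(c(420, 422, 398, 395, 393), order.by = dates)
ndx  <- xts(c(2350, 2340, 2315, 2310, 2335), order.by = dates)

# The dates are the index of the object, not one of its columns.
index(aapl)      # the dates
coredata(aapl)   # the prices as a plain matrix, with the dates dropped

# Keep only the dates present in both series, then strip the index and
# plot one price series against the other.
both <- merge(aapl, ndx, join = "inner")
prices <- coredata(both)
x <- prices[, 2]   # second merged series (ndx)
y <- prices[, 1]   # first merged series (aapl)
plot(x, y, xlab = "NASDAQ", ylab = "AAPL")
```

The merge step matters when the two downloads do not cover exactly the same
trading days; without it the x and y vectors could differ in length.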
 

--
View this message in context: 
http://r.789695.n4.nabble.com/Date-column-in-downloaded-date-tp3932125p3932125.html
Sent from the R help mailing list archive at Nabble.com.



[R] zoo arithmetics

2011-10-23 Thread Hugo Mildenberger
Dear list members,

what is the reason that one obviously can't do arithmetic operations on
zoo members with different index positions?


   > require(zoo)
   > z <- zoo(c(1,1,1),order.by=c(1,2,3))
   > z
   1 2 3
   1 1 1
   > z[1]   + z[1]
   1
   2
   > z[1:2] + z[1:2]
   1 2
   2 2
   > z[1] + z[2]
   Data:
   numeric(0)

   Index:
   numeric(0)
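
A short sketch of what is going on, offered as an illustration rather than a
reply from the thread. Arithmetic on zoo objects first aligns the operands on
their index and keeps only the index values they share; z[1] and z[2] share
none, so the result is empty. Two possible workarounds follow, assuming the
goal is to add values that sit at different index positions.

```r
library(zoo)

z <- zoo(c(1, 1, 1), order.by = c(1, 2, 3))

# z[1] has index 1 and z[2] has index 2: no common index, empty result.
length(z[1] + z[2])

# Option 1: drop the index and add the underlying values directly.
total <- coredata(z[1]) + coredata(z[2])

# Option 2: shift the series so that neighbouring values share an index,
# then add. With k = 1 the value at index 2 is moved onto index 1, etc.
pairwise <- z + lag(z, k = 1)
```

Option 1 returns a plain number, while option 2 keeps a zoo object whose
index covers only the positions where both operands have a value.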



[R] How to delete rows using conditions on all columns

2011-10-23 Thread Aher
n <- 10
P1 <- runif(n)
P2 <- runif(n)
P3 <- P1 + P2 + runif(n)/100
P4 <- P1 + P2 + P3 + runif(n)/100
mydata <- data.frame(cbind(P1,P2,P3,P4))
mydata[1,1] <- 8
mydata[3,1] <- -5
mydata[2,3] <- -6
mydata[7,3] <- 7

f=function(z){quantile(z, c(0.01, 0.99)) }

temp1 <- lapply(mydata, f)
temp1
$P1
       1%       99% 
-4.542391  7.354209 

$P2
        1%        99% 
0.03452814 0.61029804 

$P3
       1%       99% 
-5.423229  6.498828 

$P4
       1%       99% 
0.7825967 2.8454615 

I want to remove rows based on the per-column limits stored in the list
temp1. Any row containing a value below the 1% quantile or above the 99%
quantile of its column needs to be removed, for each of the variables.
How can this be achieved?

Thanks for the help in advance.
Regards,
-Aher
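
One possible approach, sketched here as an illustration and not taken from a
reply in the thread: build a logical vector per column that is TRUE where the
value lies inside that column's 1%-99% range, combine the vectors with a
logical AND, and keep the rows that pass for every column. The helper name
in_range and the fixed seed are assumptions added for the example.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible
n <- 10
P1 <- runif(n)
P2 <- runif(n)
P3 <- P1 + P2 + runif(n) / 100
P4 <- P1 + P2 + P3 + runif(n) / 100
mydata <- data.frame(P1, P2, P3, P4)
mydata[1, 1] <- 8
mydata[3, 1] <- -5
mydata[2, 3] <- -6
mydata[7, 3] <- 7

# TRUE where a value lies within its own column's 1%-99% quantile range.
in_range <- function(z) {
  q <- quantile(z, c(0.01, 0.99))
  z >= q[1] & z <= q[2]
}

# Keep a row only if every one of its columns passes the check.
keep <- Reduce(`&`, lapply(mydata, in_range))
cleaned <- mydata[keep, ]
```

With only 10 rows, the smallest and largest value of every column fall
outside the 1%-99% range, so this drops at least the rows holding the
planted outliers (rows 1, 2, 3 and 7) and possibly a few more.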


--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-delete-rows-using-conditions-on-all-columns-tp3932027p3932027.html
Sent from the R help mailing list archive at Nabble.com.



[R] Problem with calling an user defined R function from Java

2011-10-23 Thread Surajit
Dear All,

I am facing a problem calling a user-defined R function from Java
through JRI. The user-defined R function does a loess normalization on
microarray data (found in the limma package of Bioconductor), and the last
2 lines of the R code are:

MA <- normalizeWithinArrays(RG, method="loess")
return(MA$A) # where MA$A is the column A of MA.

Now while calling it in Java, I simply use the following code:
String data;
data="C:/Project_WRAIR/US09493743_251527910706_1_1 T10-105_5day_24hrs";
//where data is the input.
re.eval("source('C:/Project_WRAIR/Normalisation.r')");
System.out.println(re);
REXP rn = re.eval("my.Normal(data)");
System.out.println(rn);
double[] rnd = rn.asDoubleArray();
  for(int i=0; i

http://r.789695.n4.nabble.com/Problem-with-calling-an-user-defined-R-function-from-Java-tp3932071p3932071.html
Sent from the R help mailing list archive at Nabble.com.



[R] How to get intecerpt standard error in PLS

2011-10-23 Thread arunkumar1111
Hi

How do we get the intercept's standard error? I'm using the pls package.
I got the coefficients but am not able to get the standard error.






--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-get-intecerpt-standard-error-in-PLS-tp3932104p3932104.html
Sent from the R help mailing list archive at Nabble.com.
