date:20121208

Re: [R] file.link fails on NTFS

2012-12-08 Thread Rui Barradas


Hello,

Confirmed. It seems to be a Windows-specific bug; it works on Ubuntu 12.04 / R
2.15.2. I'll post to R-devel.


> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base



Rui Barradas
On 08-12-2012 06:07, Oliver Soong wrote:

Windows 7 64-bit, R 2.15.2 i386.  Working directory is on an NTFS drive.


writeLines("", "file.txt")
file.link("file.txt", "link.txt")

Warning in file.link("file.txt", "link.txt") :
   cannot link 'link.txt' to 'link.txt', reason 'The system cannot find the file specified'

No link is created.  The "'link.txt' to 'link.txt'" in the message is suspicious.
Does this happen to anybody else?  I didn't find anything in my searches.

Oliver
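When reproducing this, it helps to check the return value of file.link() rather than only the warning. A minimal sketch, run in tempdir() (an assumption made here so the working directory is left alone); on an affected Windows setup one would expect both values to be FALSE, while on Ubuntu, per Rui's report, the link is created.

```r
# Run the reported calls in a scratch directory, then restore the working directory
old <- setwd(tempdir())
unlink("link.txt")                        # file.link() will not overwrite an existing file
writeLines("", "file.txt")
ok <- file.link("file.txt", "link.txt")   # TRUE on success, FALSE (plus a warning) on failure
print(c(returned = ok, exists = file.exists("link.txt")))
setwd(old)
```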

__
R-help@r-project.org  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Marius Hofert

Dear expeRts,

I have two matrices A and B. They have the same number of columns but possibly 
different number of rows. I would like to compare each row of A with each row 
of B and check whether all entries in a row of A are less than or equal to all 
entries in a row of B. Here is a minimal working example:

A <- rbind(matrix(1:4, ncol=2, byrow=TRUE), c(6, 2)) # (3, 2) matrix
B <- matrix(1:10, ncol=2) # (5, 2) matrix
( ind <- apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b))) ) # (3, 5) = (nrow(A), nrow(B)) matrix

The question is: How can this be implemented more efficiently in R, that is, in 
a faster way?

Thanks & cheers,

Marius


[R] Sampling from a Population

2012-12-08 Thread Lorenzo Isella


Dear All,
I hope this is not too off topic, but I am sure it has to be a one-liner  
in R.
Suppose you have a population of size N and that you take a random sample  
of n_s individuals out of this population.

This population includes a subgroup of n_i individuals.
For any individual in n_i, what is the probability of being included in  
the sample n_s?

Many thanks.

Lorenzo


Re: [R] Sampling from a Population

2012-12-08 Thread R. Michael Weylandt

Hi Lorenzo,

This has the feel of a homework problem, but I will suggest to you
that this is "sampling without replacement" and there exist easy
mathematical formulas (no need to resort to R) to calculate your
desired probability.

Michael
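Following Michael's hint, a minimal sketch of the standard formulas, assuming simple random sampling without replacement (the default of sample()) and using made-up values for N, n_s and n_i: every individual, subgroup member or not, is included with probability n_s/N, and the number of subgroup members that end up in the sample is hypergeometric.

```r
# Hypothetical numbers for illustration
N   <- 1000   # population size
n_s <- 100    # sample size
n_i <- 30     # size of the subgroup

# P(a given individual is drawn) = choose(N-1, n_s-1) / choose(N, n_s) = n_s / N
p_one <- choose(N - 1, n_s - 1) / choose(N, n_s)
p_one

# Number of subgroup members in the sample ~ hypergeometric(n_i, N - n_i, n_s);
# its mean is n_i * n_s / N
expected <- sum(0:n_i * dhyper(0:n_i, n_i, N - n_i, n_s))
expected

# Monte Carlo check: how often does individual 1 end up in the sample?
set.seed(1)
mc <- mean(replicate(10000, 1 %in% sample(N, n_s)))
mc
```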

On Sat, Dec 8, 2012 at 11:54 AM, Lorenzo Isella
 wrote:
> Dear All,
> I hope this is not too off topic, but I am sure it has to be a one-liner in
> R.
> Suppose you have a population of size N and that you take a random sample of
> n_s individuals out of this population.
> This population includes a subgroup of n_i individuals.
> For any individual in n_i, what is the probability of being included in the
> sample n_s?
> Many thanks.
>
> Lorenzo


Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Thomas Stewart

One option is to consider a Kronecker-type expansion.  See code below.
-tgs

perhaps <- function(A, B) {
    nA <- nrow(A)
    nB <- nrow(B)
    ## row (i-1)*nB + j of both expanded matrices pairs row i of A with row j of B
    C <- kronecker(matrix(1, nrow=nA, ncol=1), B) >=
         kronecker(A, matrix(1, nrow=nB, ncol=1))
    ## a pair (i, j) qualifies when all ncol(A) comparisons hold
    matrix(rowSums(C) == ncol(A), nA, nB, byrow=TRUE)
}

Marius <- function(A, B) apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b)))

N <- 1000
M <- 5
P <- 5000
A <- matrix(runif(N*M, 1, 1000), nrow=N, ncol=M)
B <- matrix(runif(P*M, 1, 1000), nrow=P, ncol=M)

system.time(perhaps(A, B))
system.time(Marius(A, B))


On Sat, Dec 8, 2012 at 6:28 AM, Marius Hofert wrote:

> Dear expeRts,
>
> I have two matrices A and B. They have the same number of columns but
> possibly different number of rows. I would like to compare each row of A
> with each row of B and check whether all entries in a row of A are less
> than or equal to all entries in a row of B. Here is a minimal working
> example:
>
> A <- rbind(matrix(1:4, ncol=2, byrow=TRUE), c(6, 2)) # (3, 2) matrix
> B <- matrix(1:10, ncol=2) # (5, 2) matrix
> ( ind <- apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b))) ) # (3, 5) = (nrow(A), nrow(B)) matrix
>
> The question is: How can this be implemented more efficiently in R, that
> is, in a faster way?
>
> Thanks & cheers,
>
> Marius
>

Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Hofert Jan Marius

Nice idea, Thomas, thanks. I could decrease the run time a bit further by
building the required matrices by hand.

Any other ideas?

Marius <- function(A, B) apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b)))

perhaps <- function(A, B){
    nA <- nrow(A)
    nB <- nrow(B)
    C <- kronecker(matrix(1, nrow=nA, ncol=1), B) >= kronecker(A, matrix(1, nrow=nB, ncol=1))
    matrix(rowSums(C) == ncol(A), nA, nB, byrow=TRUE)
}

Marius.2.0 <- function(A, B){
    nA <- nrow(A)
    nB <- nrow(B)
    ## same expansion as perhaps(), built with rbind()/rep() instead of kronecker()
    C <- do.call(rbind, rep(list(B), nA)) >= matrix(rep(A, each=nB), ncol=ncol(B))
    matrix(rowSums(C) == ncol(A), nA, nB, byrow=TRUE)
}

M <- 5
N <- 1000
P <- 5000
A <- matrix(runif(N*M, 1, 1000), nrow=N, ncol=M)
B <- matrix(runif(P*M, 1, 1000), nrow=P, ncol=M)

system.time(Marius(A, B))[[3]] # ~ 18s
system.time(foo <- perhaps(A, B))[[3]] # ~ 1.4s
system.time(bar <- Marius.2.0(A, B))[[3]] # ~ 1s
stopifnot(all.equal(foo, bar))





From: tgstew...@gmail.com [tgstew...@gmail.com] on behalf of Thomas Stewart 
[tgs.public.m...@gmail.com]
Sent: Saturday, December 08, 2012 3:46 PM
To: Hofert Jan Marius
Cc: mailman, r-help
Subject: Re: [R] How to efficiently compare each row in a matrix with each row 
in another matrix?


[R] KMP String search

2012-12-08 Thread email

Hi:

Is there any Package in R which implements the KMP String search algorithm ?

Thanks
John
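Whether a CRAN package provides KMP is not established in this thread. For ordinary substring search, base R's regexpr() and gregexpr() with fixed = TRUE are usually enough. If KMP itself is wanted, for example to report overlapping matches (gregexpr() reports non-overlapping ones), here is a plain-R sketch; the function names kmp_table and kmp_search are made up, and an interpreted loop like this will be slower than the built-in search.

```r
# Failure table: fail[i] = length of the longest proper prefix of p[1..i]
# that is also a suffix of p[1..i]
kmp_table <- function(p) {
  m <- length(p)
  fail <- integer(m)
  k <- 0L
  if (m >= 2L) for (i in 2:m) {
    while (k > 0L && p[k + 1L] != p[i]) k <- fail[k]
    if (p[k + 1L] == p[i]) k <- k + 1L
    fail[i] <- k
  }
  fail
}

# Start positions (1-based) of all matches of pattern in text, overlaps included
kmp_search <- function(text, pattern) {
  txt <- strsplit(text, "")[[1]]
  pat <- strsplit(pattern, "")[[1]]
  m <- length(pat)
  if (m == 0L) return(integer(0))
  fail <- kmp_table(pat)
  hits <- integer(0)
  k <- 0L                                   # pattern characters matched so far
  for (i in seq_along(txt)) {
    # on a mismatch, fall back in the pattern; the text is never re-read
    while (k > 0L && pat[k + 1L] != txt[i]) k <- fail[k]
    if (pat[k + 1L] == txt[i]) k <- k + 1L
    if (k == m) {                           # full match ending at position i
      hits <- c(hits, i - m + 1L)
      k <- fail[k]
    }
  }
  hits
}

kmp_search("abababca", "abab")   # returns c(1, 3)
```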


Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread arun

Hi,

Maybe this:
N <- 1000
M <- 5
P <- 5000
set.seed(15)
A <- matrix(runif(N*M,1,1000),nrow=N,ncol=M)
set.seed(425)
B <- matrix(runif(P*M,1,1000),nrow=P,ncol=M)

Marius.3.0 <- function(A, B) {
    ## one result column per row x of B; x is recycled down each column of t(A)
    do.call(cbind, lapply(split(B, row(B)), function(x) colSums(x >= t(A)) == ncol(A)))
}
system.time(Marius.3.0(A,B))
#   user  system elapsed
#  0.524   0.000   0.523

system.time(Marius.2.0(A,B))
#   user  system elapsed
#  0.972   0.236   1.212

system.time(perhaps(A,B))
#   user  system elapsed
#  1.232   0.244   1.482

system.time(Marius(A,B))
#   user  system elapsed
# 19.266   0.000  19.298

With the toy example:
A <- rbind(matrix(1:4, ncol=2, byrow=TRUE), c(6, 2)) # (3, 2) matrix
B <- matrix(1:10, ncol=2) # (5, 2) matrix
ind <- apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b)))
ind
#       [,1]  [,2]  [,3]  [,4]  [,5]
# [1,]  TRUE  TRUE  TRUE  TRUE  TRUE
# [2,] FALSE FALSE  TRUE  TRUE  TRUE
# [3,] FALSE FALSE FALSE FALSE FALSE
Marius.3.0(A,B)
#          1     2     3     4     5
# [1,]  TRUE  TRUE  TRUE  TRUE  TRUE
# [2,] FALSE FALSE  TRUE  TRUE  TRUE
# [3,] FALSE FALSE FALSE FALSE FALSE

str(ind)
# logi [1:3, 1:5] TRUE FALSE FALSE TRUE FALSE FALSE ...
str(Marius.3.0(A,B))
# logi [1:3, 1:5] TRUE FALSE FALSE TRUE FALSE FALSE ...
# - attr(*, "dimnames")=List of 2
#  ..$ : NULL
#  ..$ : chr [1:5] "1" "2" "3" "4" ...
A.K.







Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread arun



Hi,

Just to add:
N <- 1000
M <- 5
P <- 5000
set.seed(15)
A <- matrix(runif(N*M,1,1000),nrow=N,ncol=M)
set.seed(425)
B <- matrix(runif(P*M,1,1000),nrow=P,ncol=M)

Marius.3.0 <- function(A, B) {
    do.call(cbind, lapply(split(B, row(B)), function(x) colSums(x >= t(A)) == ncol(A)))
}
Marius.2.0 <- function(A, B){
    nA <- nrow(A)
    nB <- nrow(B)
    C <- do.call(rbind, rep(list(B), nA)) >= matrix(rep(A, each=nB), ncol=ncol(B))
    matrix(rowSums(C) == ncol(A), nA, nB, byrow=TRUE)
}

system.time(z3.0 <- Marius.3.0(A,B))
#   user  system elapsed
#  0.524   0.020   0.548
system.time(z2.0 <- Marius.2.0(A,B))
#   user  system elapsed
#  0.968   0.216   1.189
system.time(z1 <- perhaps(A,B))
#   user  system elapsed
#  1.264   0.204   1.473

attr(z3.0,"dim") <- dim(z2.0)  # resetting dim also drops the "1", "2", ... column names
identical(z3.0,z2.0)
#[1] TRUE
identical(z1,z3.0)
#[1] TRUE

A.K.




[R] read.table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)


Hi List,

I have spent more than 30 minutes, but failed to read in this file using the 
read.table() function. I could not figure out how to fix the following error.

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :   
line 1 did not have 6 elements

Any help would be appreciated.

Thanks,

Pradip Muhuri


### below is the  reproducible example
xd1 <-  "raceage   percent  sepercent  flag_var
 Mexican 12-17  5.7926   0.64195  any
Puerto Rican 12-17  5.1975   0.24929  any
   Cuban 12-17  3.7977   1.00487  any
C-S American 12-17  4.3665   0.55329  any
   Dominican 12-17  1.8149   0.46677  any
 Spanish (Spain) 12-17  6.1971   0.98386  any
  Multi Hisp Eth 12-17  6.7006   1.12464  any
NH White 12-17  4.8442   0.08660  any
NH Black 12-17  3.6943   0.16045  any
NH AM-AK 12-17  9.6325   1.06100  any
   NH HI-OPI 12-17  3.9189   1.08047  any
NH Asian 12-17  1.9115   0.28432  any
  NH Multiracial 12-17  6.4255   0.51434  any
  Mexican 18-25  8.9284   0.73022  any
 Puerto Rican 18-25  6.1364   0.28394  any
Cuban 18-25  8.6782   1.45543  any
 C-S American 18-25  5.9360   0.59899  any
Dominican 18-25  7.7642   1.64553  any
  Spanish (Spain) 18-25  9.2632   1.15652  any
   Multi Hisp Eth 18-25 11.3566   1.79282  any
 NH White 18-25  8.6484   0.11866  any
 NH Black 18-25  7.5972   0.24926  any
 NH AM-AK 18-25 13.5041   1.57275  any
NH HI-OPI 18-25  8.0227   1.41348  any
 NH Asian 18-25  3.2701   0.32414  any
   NH Multiracial 18-25 10.6489   0.85105  any
  Mexican   26+  3.2110   0.51683  any
 Puerto Rican   26+  1.6273   0.15033  any
Cuban   26+  1.4419   0.44118  any
 C-S American   26+  1.0187   0.26594  any
Dominican   26+  0.9554   0.50275  any
  Spanish (Spain)   26+  2.5976   0.86230  any
   Multi Hisp Eth   26+  1.1345   0.66375  any
 NH White   26+  1.5510   0.04156  any
 NH Black   26+  2.8763   0.15133  any
 NH AM-AK   26+  3.9674   0.76611  any
NH HI-OPI   26+  1.2919   0.66205  any
 NH Asian   26+  0.7207   0.13870  any
   NH Multiracial   26+  3.0668   0.52334  any
  Mexican 12-17  4.3152   0.53235  mrj
 Puerto Rican 12-17  3.7237   0.20969  mrj
Cuban 12-17  2.0616   0.67248  mrj
 C-S American 12-17  3.3282   0.47392  mrj
Dominican 12-17  1.3797   0.40435  mrj
  Spanish (Spain) 12-17  5.1810   0.93979  mrj
   Multi Hisp Eth 12-17  4.8915   0.94816  mrj
 NH White 12-17  3.6190   0.07379  mrj
 NH Black 12-17  2.8196   0.14042  mrj
 NH AM-AK 12-17  6.5091   0.85124  mrj
NH HI-OPI 12-17  3.6267   1.06724  mrj
 NH Asian 12-17  1.3162   0.23575  mrj
   NH Multiracial 12-17  5.0657   0.49614  mrj
  Mexican 18-25  7.3802   0.67992  mrj
 Puerto Rican 18-25  4.3260   0.24191  mrj
Cuban 18-25  6.1433   1.19242  mrj
 C-S American 18-25  3.9166   0.51272  mrj
Dominican 18-25  5.8000   1.24097  mrj
  Spanish (Spain) 18-25  6.8646   1.01387  mrj
   Multi Hisp Eth 18-25 10.1134   1.75013  mrj
 NH White 18-25  5.8656   0.10100  mrj
 NH Black 18-25  6.6869   0.23643  mrj
 NH AM-AK 18-25 11.2989   1.51687  mrj
NH HI-OPI 18-25  5.6302   1.14561  mrj
 NH Asian 18-25  2.3418   0.28309  mrj
   NH Multiracial 18-25  8.2696   0.77139  mrj
  Mexican   26+  1.1658   0.33967  mrj
 Puerto Rican   26+  0.6757   0.09329  mrj
Cuban   26+  0.6653   0.31239  mrj
 C-S American   26+  0.3177   0.17604  mrj
Dominican   26+  0.5616   0.39780  mrj
  Spanish (Spain)   26+  1.8078   0.82590  mrj
   Multi Hisp Eth   26+  0.8468   0.63529  mrj
 NH White   26+  0.6915   0.02791  mrj
 NH Black   26+  1.5675   0.12031  mrj
 NH AM-AK   26+  1.7273   0.37673  mrj
NH HI-OPI   26+  0.0356   0.03535  mrj
 NH Asian   26+  0.2687   0.07564  mrj
   NH Multiracial   26+  1.3419   0.30074  mrj
  Mexican 12-17  1.2074   0.36082  anl
 Puerto Rican 12-17  1.0772   0.11547  anl
Cuban 12-17  1.2569   0.67109  anl
 C-S American 12-17  0.6213   0.22726  anl
Dominican 12-17  0.1412   0.08552  anl
  Spanish (Spain) 12-17  0.9625   0.25453  anl
   Multi Hisp Eth 12-17  1.2863   0.43909  anl
 NH White 12-17  1.1490   0.04289  anl
 NH Black 12-17  0.5932   0.06220  anl
 NH AM-AK 12-17  1.9117   0.50122  anl
NH HI-OPI 12-17  0.3833   0.20240  anl
 NH Asian 12-17  0.4782   0.1

Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Hofert Jan Marius

The idea is good, but you don't need to create a list of the rows of B first;
apply() does the job:

Marius.4.0 <- function(A, B)
apply(B, 1, function(x) colSums(x>=t(A))==ncol(A))

That was actually a bit faster than your version. 

This is the fastest version so far. I compared it with C code called via .C: C 
was 15% faster.

Cheers,

Marius
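One further variant, not benchmarked in the thread and offered only as a sketch: since the matrices have few columns (M = 5) but many rows, one can loop over the columns instead of the rows, building one nrow(A) x nrow(B) logical matrix per column with outer() and combining them with &. The function name colwise is made up here.

```r
# Column-wise variant: loop over the (few) columns instead of the (many) rows
colwise <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  res <- matrix(TRUE, nrow(A), nrow(B))
  for (k in seq_len(ncol(A)))
    res <- res & outer(A[, k], B[, k], "<=")   # res[i, j] requires A[i, k] <= B[j, k]
  res
}

# Toy example from the original question
A <- rbind(matrix(1:4, ncol = 2, byrow = TRUE), c(6, 2))
B <- matrix(1:10, ncol = 2)
ind <- apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b)))
identical(colwise(A, B), ind)   # TRUE
```

It avoids the nrow(A)*nrow(B)-by-ncol(A) expansions used by perhaps() and Marius.2.0(), so it should also need less memory; whether it is faster on a given machine would have to be timed.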



From: arun [smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 7:43 PM
To: Hofert  Jan Marius
Cc: Thomas Stewart; mailman, r-help
Subject: Re: [R] How to efficiently compare each row in a matrix with each row 
in another matrix?


Re: [R] read.table()

2012-12-08 Thread Prof Brian Ripley


On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:


Hi List,

I have spent more than 30 minutes, but failed to read in this file using the 
read.table() function. I could not figure out how to fix the following error.


Well, we have a whole manual on this (R Data Import/Export), mentioned on
?read.table (see 'See Also').  Have you read it?  fortunes::fortune(14) applies.


The issue is what the separator is.  You have specified whitespace, and 
that is not correct.  The original might have had tabs (see ?read.delim) 
but as pasted into this email only a human can disentangle this file.
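To illustrate the separator problem: with whitespace as the separator, a label such as "Puerto Rican" becomes two fields. If the original file really is lost and only the pasted text remains, one workaround is a minimal sketch that assumes (as holds for the rows posted above) that the last four fields on every line (age group, percent, standard error, flag) never contain spaces, and matches them from the right with a regular expression instead of calling read.table(). The three sample rows are copied from the posted data; splitting the header's "raceage" into race and age is an assumption.

```r
# Header plus three rows copied from the posted data (leading blanks vary)
txt_lines <- c("raceage   percent  sepercent  flag_var",
               " Mexican 12-17  5.7926   0.64195  any",
               "Puerto Rican 12-17  5.1975   0.24929  any",
               "  Spanish (Spain)   26+  2.5976   0.86230  any")

# Drop the header and strip leading/trailing blanks
rows <- gsub("^\\s+|\\s+$", "", txt_lines[-1])

# Match the four space-free fields from the right; whatever precedes them
# (possibly containing spaces) is taken as the race label
pat <- "^(.*\\S)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)$"
dat <- data.frame(race      = sub(pat, "\\1", rows),
                  age       = sub(pat, "\\2", rows),
                  percent   = as.numeric(sub(pat, "\\3", rows)),
                  sepercent = as.numeric(sub(pat, "\\4", rows)),
                  flag_var  = sub(pat, "\\5", rows),
                  stringsAsFactors = FALSE)
dat
```

If the original file turns out to be tab-delimited, read.delim() reads it directly, as Prof Ripley suggests.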




[R] print and cat not working with parallelised functions?

2012-12-08 Thread Martin Ivanov

Dear R Community,

I am running R version 2.15.2 with package parallel version 2.15.2.
The problem is that cat and print do not produce any output. Also, assigning
objects to the .GlobalEnv does not work. This makes it difficult for me to
debug code. The problem can be seen in the following minimal working example:

library(parallel)
fun2 <- function(x) {
b <<- x; # try to export the object to the workspace
print(x); # try to print x
}

fun1 <- function(d) {
 cl <- makeCluster(getOption("cl.cores", detectCores()));
 parSapply(cl=cl, X=seq_along(d), FUN=fun2);
 stopCluster(cl=cl);
}
fun3 <- function(d) sapply(X=seq_along(d), FUN=fun2); 
fun1(d=1:5)
a <- fun3(d=1:5)

print only works when fun3 is called, that is, when there is no parallelisation.
The same is true for the <<- assignment.
I am almost sure this is not a bug but a feature, so I would like to ask for an
explanation and for some ideas on how to debug code when neither printing nor
exporting objects to the workspace works.

Any suggestions will be appreciated.

Best regards,

Martin Ivanov


[R] Oracle Approximating Shrinkage in R?

2012-12-08 Thread Matt Considine


Hi,
Can anyone point me to an implementation in R of the oracle 
approximating shrinkage technique for covariance matrices?  Rseek, 
Google, etc. aren't turning anything up for me.


Thanks in advance,
Matt Considine


Re: [R] print and cat not working with parallelised functions?

2012-12-08 Thread Uwe Ligges




On 08.12.2012 21:04, Martin Ivanov wrote:

  Dear R Community,

I am running R version 2.15.2 with package parallel version 2.15.2.
The problem is that cat and print do not produce any output. Also assigning 
objects to the .GlobalEnv does not work. This makes it difficult for me to 
debug code. This can be seen from the
following minimal working example:




print: you are printing in the R client (the worker process), not on the master.
assign: you are assigning to the .GlobalEnv of the client, not to the one
of the master.


Best,
Uwe Ligges
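Following Uwe's explanation, print() and <<- run inside separate worker processes, so their output and assignments stay there. A minimal sketch of two common workarounds, assuming the default PSOCK cluster created by makeCluster(): start the cluster with outfile = "" so worker output is not discarded (it goes to the terminal that started R, so it may not be visible in a GUI console), and send whatever you want to inspect back as part of the return value. ncores = 2 is an arbitrary choice for illustration.

```r
library(parallel)

# Runs on a worker: output goes to the worker's stdout, so also return
# diagnostics as part of the result instead of relying on print() or <<-
fun2 <- function(x) {
  cat("worker", Sys.getpid(), "got", x, "\n")   # reaches the terminal only with outfile = ""
  list(value = x, pid = Sys.getpid())
}

fun1 <- function(d, ncores = 2) {
  # outfile = "" stops the workers' output from being sent to the null device
  cl <- makeCluster(ncores, outfile = "")
  on.exit(stopCluster(cl))            # shut the workers down even if an error occurs
  parLapply(cl, seq_along(d), fun2)   # the result is returned to the caller
}

res <- fun1(1:5)
sapply(res, `[[`, "value")   # returns 1:5
```

Note that the fun1() in the original post returns the value of stopCluster(), so the parSapply() result is discarded; the on.exit() form above avoids that.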





Re: [R] imputation in mice

2012-12-08 Thread David L Carlson

What do 

> str(data)
> summary(data)

show you? The str() function will show you what kind of variables you have,
and the summary() command will indicate the range of the values and whether
there are missing data.

You seem to be overwriting your original data frame "data" (really a bad
name to use since data() is a function in R) after the imputation. Your code
does not show us where "data" comes from originally. The "weight" variable
also seems to exist in something called "lbdata." The error message suggests
that what is in "data" when you try to compute your propensity scores is not
what you think it is.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
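The glm() error quoted below is consistent with row-wise deletion: by default glm() drops rows with missing covariates, so fitted() returns fewer values than nrow(data). A minimal sketch on simulated data (the data frame d, its columns and the sizes are made up for illustration) showing how to list the columns that still contain NAs and how na.action = na.exclude pads fitted() back to full length. This only addresses the length mismatch, not why mice left a variable unimputed.

```r
set.seed(1)
# Hypothetical stand-in for the poster's data: one covariate still has NAs
d <- data.frame(assignment = rbinom(200, 1, 0.4),
                age        = rnorm(200, 40, 10),
                totalexp   = rnorm(200, 10, 3))
d$totalexp[sample(200, 15)] <- NA

# Which columns still contain missing values, and how many?
colSums(is.na(d))

# With the usual default na.action (na.omit) the 15 incomplete rows are dropped,
# so fitted() is shorter than nrow(d) and cannot be assigned back as a column
fit_omit <- glm(assignment ~ age + totalexp, family = binomial, data = d)
length(fitted(fit_omit))   # 185

# na.exclude drops the same rows for fitting but pads fitted() with NA
fit_excl <- glm(assignment ~ age + totalexp, family = binomial, data = d,
                na.action = na.exclude)
d$propensityscores <- fitted(fit_excl)
```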


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Elizabeth Fuller Bettini
> Sent: Friday, December 07, 2012 10:55 PM
> To: r-help@r-project.org
> Subject: [R] imputation in mice
> 
> Hello!  If I understand this listserve correctly, I can email this
> address
> to get help when I am struggling with code.  If this is inaccurate,
> please
> let me know, and I will unsubscribe.
> I have been struggling with the same error message for a while, and I
> can't
> seem to get past it.
> Here is the issue:
> I am using a data set that uses -1:-9 to indicate various kinds of
> missing
> data.  I changed all of these to NA, regardless of the cause of the
> missing
> data. I am trying to do propensity score matching with this data, but
> it
> will not calculate the propensity scores, regardless of which method I
> have
> tried. I have tried the following methods:
> 1. Optimal propensity score matching, using the MatchIt library:
> m.out<-matchit(assignment~totalexp + yrschool+new+cert+age+STratio +
> percminority+urbanicity+povproblem+numthreats+numbattack+weight, data =
> data, distance="logit", method = "optimal", ratio = 1)
> 2. Nearest neighbor propensity score matching, using the MatchIt library:
> mout<-matchit(assignment~totalexp +
> yrschool+new+cert+age+STratio+percminority+urbanicity+povproblem+numthreats+numbattack,
> distance = "logit", replace = T, data = data, method = "nearest",
> m.order="largest", caliper = 0.10)
> 3. Just calculating the propensity scores using the glm function:
> ps.model = glm(assignment~totalexp +
> yrschool+new+cert+age+STratio+percminority+urbanicity+povproblem+numthreats+numbattack,
> family = "binomial", data = data)
> data$propensityscores = fitted(ps.model)
> 
> In each case, I have tried running the code after having performed zero
> imputations, 1 imputation, and 5 imputations.  A colleague looked at my
> code and assured me that I was doing the imputations correctly. However,
> even after performing the imputation, one of the continuous variables
> still has NAs.  This is the code that I am using for 5 imputations:
> library(mice)
> #Remove weights
> data$weight<-NULL
> #perform the imputation
> imputed.data = mice(data,  m = 5, diagnostics = F)
> #reinsert the weights
> imputed.data.final=complete(imputed.data)
> imputed.data.final$weight=lbdata$weight
> #rename the imputed dataset "data"
> data = imputed.data.final
> 
> When I perform optimal propensity score matching or nearest neighbor
> matching (regardless of how many imputations I perform), I get the
> following error:
> Error in matchit(assignment ~ totalexp + yrschool + new + cert + age +  : 
>   Missing values exist in the data
> I tried running these with just two of the categorical covariates, but I
> still got this error, even though there is no missing data for those
> variables.
> 
> When I use the glm function to get the propensity scores, I get this
> error, indicating that, for some reason, it is reducing the number of rows
> in my data set, which makes me think that it is doing list-wise deletion:
> Error in `$<-.data.frame`(`*tmp*`, "propensityscores", value = c(0.116801691392172,  : 
>   replacement has 15934 rows, data has 16844
> However, this method works if I remove the covariate that has missing
> data.
> 
> 
> So, I guess my question is, how do I get the code to impute the
> variable that it is not imputing?  Or, do I just need to chuck this
> variable?  And, if I just need to chuck this variable, how do I get the
> optimal propensity score method to work?  Currently it doesn't work even
> when I chuck this variable.
> 
> Thank you for any help or advice!
> Liz

Re: [R] read.table()

2012-12-08 Thread David L Carlson

If you look at the first few lines, you can see the problem. Your category
"race" has labels that contain spaces and you've told read.table() to
separate the variables using whitespace (including spaces), so read.table()
sees six variables in this line, but only five variable names in the first
line:

Puerto Rican 12-17  5.1975   0.24929  any

It assigns "Puerto" to race, "Rican" to age, etc. If your data come from a
spreadsheet, it is possible that the separator is actually a tab (sep="\t"),
but that it has been replaced with spaces in the version you sent us. With a
text editor, you can line up the columns and then use read.fwf() instead of
read.table(), but you will have to ensure that the columns line up and that
you insert a delimiter (e.g. a tab) between the field names on the first
line.
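A minimal sketch of the read.fwf() route, assuming the race labels have been right-aligned in a 16-character column as in the posted data. The widths below are worked out for these three sample rows only and are an assumption for the full file. Supplying col.names sidesteps the need for a delimited header line, and strip.white = TRUE (passed on to read.table()) removes the padding.

```r
# Three sample rows, aligned into fixed-width columns of 16, 6, 8, 10 and 5 characters
lines <- c(
  "         Mexican 12-17  5.7926   0.64195  any",
  "    Puerto Rican 12-17  5.1975   0.24929  any",
  " Spanish (Spain) 12-17  6.1971   0.98386  any"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Cut each line at fixed positions; spaces inside "Puerto Rican" survive
agerace <- read.fwf(tmp, widths = c(16, 6, 8, 10, 5),
                    col.names = c("race", "age", "percent", "sepercent", "flag_var"),
                    strip.white = TRUE, stringsAsFactors = FALSE)
str(agerace)
```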

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
> Sent: Saturday, December 08, 2012 1:11 PM
> To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
> Subject: [R] read.table()
> 
> 
> Hi List,
> 
> I have spent more than 30 minutes, but failed to read in this file
> using the read.table() function. I could not figure out how to fix the
> following error.
> 
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
> na.strings,  :   line 1 did not have 6 elements
> 
> Any help would be appreciated.
> 
> Thanks,
> 
> Pradip Muhuri
> 
> 
> ### below is the  reproducible example
> xd1 <-  "raceage   percent  sepercent  flag_var
>  Mexican 12-17  5.7926   0.64195  any
> Puerto Rican 12-17  5.1975   0.24929  any
>Cuban 12-17  3.7977   1.00487  any
> C-S American 12-17  4.3665   0.55329  any
>Dominican 12-17  1.8149   0.46677  any
>  Spanish (Spain) 12-17  6.1971   0.98386  any
>   Multi Hisp Eth 12-17  6.7006   1.12464  any
> NH White 12-17  4.8442   0.08660  any
> NH Black 12-17  3.6943   0.16045  any
> NH AM-AK 12-17  9.6325   1.06100  any
>NH HI-OPI 12-17  3.9189   1.08047  any
> NH Asian 12-17  1.9115   0.28432  any
>   NH Multiracial 12-17  6.4255   0.51434  any
>   Mexican 18-25  8.9284   0.73022  any
>  Puerto Rican 18-25  6.1364   0.28394  any
> Cuban 18-25  8.6782   1.45543  any
>  C-S American 18-25  5.9360   0.59899  any
> Dominican 18-25  7.7642   1.64553  any
>   Spanish (Spain) 18-25  9.2632   1.15652  any
>Multi Hisp Eth 18-25 11.3566   1.79282  any
>  NH White 18-25  8.6484   0.11866  any
>  NH Black 18-25  7.5972   0.24926  any
>  NH AM-AK 18-25 13.5041   1.57275  any
> NH HI-OPI 18-25  8.0227   1.41348  any
>  NH Asian 18-25  3.2701   0.32414  any
>NH Multiracial 18-25 10.6489   0.85105  any
>   Mexican   26+  3.2110   0.51683  any
>  Puerto Rican   26+  1.6273   0.15033  any
> Cuban   26+  1.4419   0.44118  any
>  C-S American   26+  1.0187   0.26594  any
> Dominican   26+  0.9554   0.50275  any
>   Spanish (Spain)   26+  2.5976   0.86230  any
>Multi Hisp Eth   26+  1.1345   0.66375  any
>  NH White   26+  1.5510   0.04156  any
>  NH Black   26+  2.8763   0.15133  any
>  NH AM-AK   26+  3.9674   0.76611  any
> NH HI-OPI   26+  1.2919   0.66205  any
>  NH Asian   26+  0.7207   0.13870  any
>NH Multiracial   26+  3.0668   0.52334  any
>   Mexican 12-17  4.3152   0.53235  mrj
>  Puerto Rican 12-17  3.7237   0.20969  mrj
> Cuban 12-17  2.0616   0.67248  mrj
>  C-S American 12-17  3.3282   0.47392  mrj
> Dominican 12-17  1.3797   0.40435  mrj
>   Spanish (Spain) 12-17  5.1810   0.93979  mrj
>Multi Hisp Eth 12-17  4.8915   0.94816  mrj
>  NH White 12-17  3.6190   0.07379  mrj
>  NH Black 12-17  2.8196   0.14042  mrj
>  NH AM-AK 12-17  6.5091   0.85124  mrj
> NH HI-OPI 12-17  3.6267   1.06724  mrj
>  NH Asian 12-17  1.3162   0.23575  mrj
>NH Multiracial 12-17  5.0657   0.49614  mrj
>   Mexican 18-25  7.3802   0.67992  mrj
>  Puerto Rican 18-25  4.3260   0.24191  mrj
> Cuban 18-25  6.1433   1.19242  mrj
>  C-S American 18-25  3.9166   0.51272  mrj
> Dominican 18-25  5.8000   1.24097  mrj
>   Spanish (Spain) 18-25  6.8646   1.01387  mrj
>Multi Hisp Eth 18-25 10.1134   1.75013  mrj
>  NH White 18-25  5.8656   0.10100  mrj
>  NH Black 18-25  6.6869   0.23643  mrj
>  NH AM-AK 18-25 11.2989   1.51687  mrj
>

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Prof Ripley,

Your hint is helpful, and I see considerable improvements in the results.

The only issue is that the column names do not seem to be correct.  I did not 
understand part of your comment, which says "fortunes::fortune(14) applies", 
although I read about the double-colon operator (ns-dblcolon {base}).

Could you please give me a little more of a hint to resolve the issue?

Thanks and regards,

# new result 
> agerace <- read.delim(textConnection(xd1), sep="\t",  header=TRUE, as.is=TRUE)
> names(agerace)
[1] "raceage...percent..sepercent..flag_var"
> head(agerace)
         raceage...percent..sepercent..flag_var
1          Mexican 12-17  5.7926   0.64195      any
2     Puerto Rican 12-17  5.1975   0.24929      any
3            Cuban 12-17  3.7977   1.00487      any
4     C-S American 12-17  4.3665   0.55329      any
5        Dominican 12-17  1.8149   0.46677      any
6  Spanish (Spain) 12-17  6.1971   0.98386      any
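On the double-colon question: pkg::fun calls fun from the namespace of package pkg without attaching the package, so fortunes::fortune(14) simply means "call fortune(14) from the fortunes package". A small illustration, assuming only base R; fortunes is a contributed package that may not be installed, hence the guard:

```r
# stats::median() calls median() from the stats namespace explicitly
x <- c(3, 1, 2)
print(stats::median(x))

# fortunes::fortune(14) works the same way, but only if the
# 'fortunes' package is installed
if (requireNamespace("fortunes", quietly = TRUE)) {
  print(fortunes::fortune(14))
}
```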


Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
 
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
 
The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Prof Brian Ripley
Sent: Saturday, December 08, 2012 2:29 PM
To: r-help@r-project.org
Subject: Re: [R] read.table()

On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>
> Hi List,
>
> I have spent more than 30 minutes, but failed to read in this file using the 
> read.table() function. I could not figure out how to fix the following error.

Well, we have a whole manual on this, mentioned on ?read.table (see See 
Also).  Have you read it?  fortunes::fortune(14) applies.

The issue is what the separator is.  You have specified whitespace, and 
that is not correct.  The original might have had tabs (see ?read.delim) 
but as pasted into this email only a human can disentangle this file.
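To illustrate the point about the separator, a small sketch using the first two data rows of the post. The tab-separated version of the text is an assumption about what the original file looked like. When the fields really are tab-separated, read.delim() finds five columns; once the tabs have been turned into spaces, the same call sees a single column, much like the one-column result reported later in the thread.

```r
# Assumed tab-separated original: the first two data rows of the post
xd_tab <- "race\tage\tpercent\tsepercent\tflag_var
Mexican\t12-17\t5.7926\t0.64195\tany
Puerto Rican\t12-17\t5.1975\t0.24929\tany"

good <- read.delim(text = xd_tab, stringsAsFactors = FALSE)
str(good)   # 2 obs. of 5 variables

# The same text after the tabs have been replaced by spaces
xd_space <- gsub("\t", "  ", xd_tab)
bad <- read.delim(text = xd_space)
names(bad)  # a single column with a mangled name
```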

> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>   line 1 did not have 6 elements
>
> Any help would be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
>
> ### below is the  reproducible example
> xd1 <-  "raceage   percent  sepercent  flag_var
>   Mexican 12-17  5.7926   0.64195  any
>  Puerto Rican 12-17  5.1975   0.24929  any
> Cuban 12-17  3.7977   1.00487  any
>  C-S American 12-17  4.3665   0.55329  any
> Dominican 12-17  1.8149   0.46677  any
>   Spanish (Spain) 12-17  6.1971   0.98386  any
>Multi Hisp Eth 12-17  6.7006   1.12464  any
>  NH White 12-17  4.8442   0.08660  any
>  NH Black 12-17  3.6943   0.16045  any
>  NH AM-AK 12-17  9.6325   1.06100  any
> NH HI-OPI 12-17  3.9189   1.08047  any
>  NH Asian 12-17  1.9115   0.28432  any
>NH Multiracial 12-17  6.4255   0.51434  any
>Mexican 18-25  8.9284   0.73022  any
>   Puerto Rican 18-25  6.1364   0.28394  any
>  Cuban 18-25  8.6782   1.45543  any
>   C-S American 18-25  5.9360   0.59899  any
>  Dominican 18-25  7.7642   1.64553  any
>Spanish (Spain) 18-25  9.2632   1.15652  any
> Multi Hisp Eth 18-25 11.3566   1.79282  any
>   NH White 18-25  8.6484   0.11866  any
>   NH Black 18-25  7.5972   0.24926  any
>   NH AM-AK 18-25 13.5041   1.57275  any
>  NH HI-OPI 18-25  8.0227   1.41348  any
>   NH Asian 18-25  3.2701   0.32414  any
> NH Multiracial 18-25 10.6489   0.85105  any
>Mexican   26+  3.2110   0.51683  any
>   Puerto Rican   26+  1.6273   0.15033  any
>  Cuban   26+  1.4419   0.44118  any
>   C-S American   26+  1.0187   0.26594  any
>  Dominican   26+  0.9554   0.50275  any
>Spanish (Spain)   26+  2.5976   0.86230  any
> Multi Hisp Eth   26+  1.1345   0.66375  any
>   NH White   26+  1.5510   0.04156  any
>   NH Black   26+  2.8763   0.15133  any
>   NH AM-AK   26+  3.9674   0.76611  any
>  NH HI-OPI   26+  1.2919   0.66205  any
>   NH Asian   26+  0.7207   0.13870  any
> NH Multiracial   26+  3.0668   0.52334  any
>Mexican 12-17  4.3152   0.53235  mrj
>   Puerto Rican 12-17  3.7237   0.20969  mrj
>  Cuban 12-17  2.0616   0.67248  mrj
>   C-S American 12-17  3.3282   0.47392  mrj
>  Dominican 12-17  1.3797   0.40435  mrj
>Spanish (Spain) 12-17  5.1810   0.93979  mrj
> Multi Hisp Eth

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear Arun,

The issue is that the column names are incorrect.  I will also look into the 
comment by Prof Ripley.

Thanks for your continued support and help.

Pradip

> str(read.delim(textConnection(xd1),header=TRUE,sep="\t"))
'data.frame':   195 obs. of  1 variable:
 $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "    Cuban   26+  0.6653   0.31239  mrj",..: 27 148 13 140 108 193 169 100 85 67 ...
> names(agerace)
[1] "raceage...percent..sepercent..flag_var"
> head(agerace)
         raceage...percent..sepercent..flag_var
1          Mexican 12-17  5.7926   0.64195      any
2     Puerto Rican 12-17  5.1975   0.24929      any
3            Cuban 12-17  3.7977   1.00487      any
4     C-S American 12-17  4.3665   0.55329      any
5        Dominican 12-17  1.8149   0.46677      any
6  Spanish (Spain) 12-17  6.1971   0.98386      any

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov

-Original Message-
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 5:13 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David L Carlson; R help
Subject: Re: [R] read. table()

Hi,

You can check the str()
I assume it will be like this:
 str(read.delim(textConnection(Lines),header=TRUE,sep="\t"))
#'data.frame':195 obs. of  1 variable:
# $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "C-S American 12-17  0.2399   0.15804  coc",..: 50 170 20 5 35 185 65 155 110 80 ...

A.K.


Re: [R] KMP String search

2012-12-08 Thread Rui Barradas


Hello,

As far as I know, the answer is no, there isn't.

Hope this helps,

Rui Barradas
Em 08-12-2012 17:44, email escreveu:

Hi:

Is there any package in R that implements the KMP string search algorithm?

Thanks
John
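If an R-level implementation is enough, here is a minimal sketch of the Knuth-Morris-Pratt algorithm. It is our own illustration, not code from any package, and the names kmp_table() and kmp_search() are hypothetical. It is written for clarity rather than speed; an interpreted R loop will usually be slower than base R's regexpr()/gregexpr() with fixed = TRUE.

```r
# Failure table: fail[i] is the length of the longest proper prefix of
# pat[1:i] that is also a suffix of pat[1:i]
kmp_table <- function(pat) {
  m <- length(pat)
  fail <- integer(m)
  k <- 0L
  if (m >= 2L) {
    for (i in 2:m) {
      while (k > 0L && pat[k + 1L] != pat[i]) k <- fail[k]
      if (pat[k + 1L] == pat[i]) k <- k + 1L
      fail[i] <- k
    }
  }
  fail
}

# Return the 1-based start positions of every (possibly overlapping)
# occurrence of 'pattern' in 'text'
kmp_search <- function(text, pattern) {
  txt <- strsplit(text, "")[[1]]
  pat <- strsplit(pattern, "")[[1]]
  m <- length(pat)
  hits <- integer(0)
  if (m == 0L) return(hits)
  fail <- kmp_table(pat)
  k <- 0L  # number of pattern characters matched so far
  for (i in seq_along(txt)) {
    # on a mismatch, fall back within the pattern instead of re-scanning the text
    while (k > 0L && pat[k + 1L] != txt[i]) k <- fail[k]
    if (pat[k + 1L] == txt[i]) k <- k + 1L
    if (k == m) {
      hits <- c(hits, i - m + 1L)
      k <- fail[k]
    }
  }
  hits
}

kmp_search("abababcab", "abab")
```

Unlike gregexpr(), which reports non-overlapping matches, this version returns overlapping ones: positions 1 and 3 for "abab" in "abababcab".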


Re: [R] cannot read iso639 table

2012-12-08 Thread Prof Brian Ripley


For the record, in R-devel you can do


f <- read.table(url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
                    encoding = "UTF-8-BOM"), quote="", sep="|", stringsAsFactors=FALSE)

f[1,]

   V1 V2 V3   V4   V5
1 aaraa Afar afar

charToRaw(f[1,1])

[1] 61 61 72

Whether this works with "UTF-8" depends on the implementation of iconv: 
strangely Microsoft remove BOMs in UTF-16 but not in UTF-8 (although 
almost the only people to put them there in UTF-8 are Microsoft's 
applications).
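To try the "UTF-8-BOM" approach without downloading anything, a sketch using a small hypothetical local file that mimics the first two records (pipe-separated, preceded by a BOM). It passes the encoding through read.table()'s fileEncoding argument, which opens the file with that encoding; this assumes an R version that accepts "UTF-8-BOM" (new in R-devel at the time of this thread).

```r
# Hypothetical two-record file in the ISO 639-2 layout, preceded by a UTF-8 BOM
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xef, 0xbb, 0xbf)), con)   # the 3-byte BOM
writeBin(charToRaw("aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\n"), con)
close(con)

# "UTF-8-BOM" tells R to drop the BOM while reading
f <- read.table(tmp, quote = "", sep = "|", stringsAsFactors = FALSE,
                fileEncoding = "UTF-8-BOM")
print(f[1, ])
print(charToRaw(f[1, 1]))  # no ef bb bf prefix
```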




On 13/09/2012 21:43, peter dalgaard wrote:

Pragmatically, one can zap the BOM from the output with

language.ISO.table[1,1] <- substring(language.ISO.table[1,1],2)

and be gone with it.

It would be nicer to zap the BOM before read.table, though. It does work for me 
with the below (notice that the BOM is a single character if you don't use 
useBytes=).


get.language.ISO.table <- function () {
  socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
                open="r", encoding="utf-8")
  readChar(socket, nchars=1)   # discard the BOM (a single character here)
  data <- read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
                     col.names = c("a3bibliographic","a3terminologic",
                                   "a2","english","french"), quote="")
  close(socket)
  data
}


On Sep 13, 2012, at 22:26 , William Dunlap wrote:


It would be helpful if you showed your commands and printed
outputs, copied directly from your R session, from the beginning
to the end.  I put the call to sessionInfo() in my message because
it is probably relevant.  It is nice to completely include the original
email when responding to it so others can see the whole story in
one place.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: Sam Steingold [mailto:sam.steing...@gmail.com] On Behalf Of Sam Steingold
Sent: Thursday, September 13, 2012 1:18 PM
To: William Dunlap
Cc: peter dalgaard; r-help@r-project.org
Subject: Re: [R] cannot read iso639 table


* William Dunlap  [2012-09-13 19:50:21 +]:

On Windows with R-2.15.1 in a 1252 locale, I had to read (and toss) out
the initial 3 bytes (the byte-order mark?) to make things work:


socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")

readChar(socket, nchars=3, useBytes=TRUE)

  [1] "ï»¿"


confirmed - first 3 bytes are "\357\273\277"


d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
dim(d)

  [1] 485   5

head(d)

 V1 V2 V3 V4  V5
  1 aaraa   Afarafar
  2 abkab  Abkhazian abkhaze
  3 ace Achineseaceh
  4 achAcoli   acoli
  5 ada  Adangme adangme
  6 ady   Adyghe; Adygei  adyghé


alas, this is all I get:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'

  a3bibliographic a3terminologic a2english  french
1 aar NA aa   Afarafar
2 abk NA ab  Abkhazian abkhaze
3 ace NA  Achineseaceh
4 ach NA Acoli   acoli
5 ada NA   Adangme adangme
6 ady NAAdyghe; Adygei   adygh

note that the first non-ASCII character terminates the input.

so, I still cannot read the data from the URL.

I can read the file though - with quote="" (thanks Peter!) -
except that the first record is "\357\273\277aar".


--
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com
http://mideasttruth.com http://iris.org.il http://jihadwatch.org
The only thing worse than X Windows: (X Windows) - X





--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [R] read. table()

2012-12-08 Thread David L Carlson

Arun's solution works, but you lose the spaces in the race field. These
commands will preserve them. We need to make sure that your file has two or
more spaces between each pair of fields. The first gsub() command strips
leading spaces. The second inserts an extra space before the digit 1 (the
fields separated by only a single space are all followed by a value starting
with 1, such as 12-17 or 18-25). Then we convert runs of two or more spaces
to a comma. Finally, you can use read.table().

Starting with your vector xd1 from your first posting:
> raw2 <- readLines(con=textConnection(xd1))
> raw2 <- gsub("^ +", "", raw2)
> raw2 <- gsub(" 1", "  1", raw2)
> raw3 <- gsub("  +", ",", raw2)
> agerace <- read.table(text=raw3, header=TRUE, sep=",", as.is=TRUE)
> str(agerace)
'data.frame':   195 obs. of  5 variables:
 $ race : chr  "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
 $ age  : chr  "12-17" "12-17" "12-17" "12-17" ...
 $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
 $ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
 $ flag_var : chr  "any" "any" "any" "any" ...
>

 -Original Message-
> From: arun [mailto:smartpink...@yahoo.com]
> Sent: Saturday, December 08, 2012 5:11 PM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: R help; David L Carlson
> Subject: Re: [R] read. table()
> 
> HI Pradip,
> 
> Try this:
> source("Muhuri.txt")
> #Muhuri.txt
> Lines<-  "race    age   percent  sepercent  flag_var
>  Mexican 12-17  5.7926   0.64195  any--
> ---
> 
> "
> Lines1<-readLines(textConnection(Lines))
> 
> Col1new<-gsub(" ","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]+\\-.*","\\1",Lines1[-1])))
> Col2<-gsub("\\s+\\D+([[:digit:]]+\\+.*)","\\1",gsub("\\s+\\D+([[:digit:]]+\\-.*)","\\1",Lines1[-1]))
> dat1<-data.frame(Col1new,read.table(text=Col2,stringsAsFactors=FALSE,sep=""),stringsAsFactors=FALSE)
> 
> heading<-unlist(strsplit(Lines1[1]," "))
> colnames(dat1)<-heading[heading!=""]
>  head(dat1,6)
> #    race   age percent sepercent flag_var
> #1    Mexican 12-17  5.7926   0.64195  any
> #2    PuertoRican 12-17  5.1975   0.24929  any
> #3  Cuban 12-17  3.7977   1.00487  any
> #4    C-SAmerican 12-17  4.3665   0.55329  any
> #5  Dominican 12-17  1.8149   0.46677  any
> #6 Spanish(Spain) 12-17  6.1971   0.98386  any
> 
> 
> 
>  str(dat1)
> 'data.frame':    195 obs. of  5 variables:
>  $ race : chr  "Mexican" "PuertoRican" "Cuban" "C-SAmerican" ...
>  $ age  : chr  "12-17" "12-17" "12-17" "12-17" ...
>  $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
>  $ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
>  $ flag_var : chr  "any" "any" "any" "any" ...
> 
> A.K.

Re: [R] read.table()

2012-12-08 Thread Tanja Vukov

Hi!
I think you have problem with "flag_var". I suggest to put just
"flagvar". Do not use "_" in variable names. I would suggest not to
use both "_" or "-" anywhere in data file. I am just a beginner with R
but think that is the problem...
Cheers!
Tanja.

On Sat, Dec 8, 2012 at 8:29 PM, Prof Brian Ripley  wrote:
> On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>>
>>
>> Hi List,
>>
>> I have spent more than 30 minutes, but failed to read in this file using
>> the read.table() function. I could not figure out how to fix the following
>> error.
>
>
> Well, we have a whole manual on this, mentioned on ?read.table (see See
> Also).  Have you read it?  fortunes::fortune(14) applies.
>
> The issue is what the separator is.  You have specified whitespace, and that
> is not correct.  The original might have had tabs (see ?read.delim) but as
> pasted into this email only a human can disentangle this file.
>
>
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
>> :   line 1 did not have 6 elements
>>
>> Any help would be appreciated.
>>
>> Thanks,
>>
>> Pradip Muhuri
>>
>>
>> ### below is the  reproducible example
>> xd1 <-  "raceage   percent  sepercent  flag_var
>>   Mexican 12-17  5.7926   0.64195  any
>>  Puerto Rican 12-17  5.1975   0.24929  any
>> Cuban 12-17  3.7977   1.00487  any
>>  C-S American 12-17  4.3665   0.55329  any
>> Dominican 12-17  1.8149 0.46677  any
>>   Spanish (Spain) 12-17  6.1971   0.98386  any
>>Multi Hisp Eth 12-17  6.7006   1.12464  any
>>  NH White 12-17  4.8442   0.08660  any
>>  NH Black 12-17  3.6943   0.16045  any
>>  NH AM-AK 12-17  9.6325   1.06100  any
>> NH HI-OPI 12-17  3.9189   1.08047  any
>>  NH Asian 12-17  1.9115   0.28432  any
>>NH Multiracial 12-17  6.4255   0.51434  any
>>Mexican 18-25  8.9284   0.73022  any
>>   Puerto Rican 18-25  6.1364   0.28394  any
>>  Cuban 18-25  8.6782   1.45543  any
>>   C-S American 18-25  5.9360   0.59899  any
>>  Dominican 18-25  7.7642   1.64553  any
>>Spanish (Spain) 18-25  9.2632   1.15652  any
>> Multi Hisp Eth 18-25 11.3566   1.79282  any
>>   NH White 18-25  8.6484   0.11866  any
>>   NH Black 18-25  7.5972   0.24926  any
>>   NH AM-AK 18-25 13.5041   1.57275  any
>>  NH HI-OPI 18-25  8.0227   1.41348  any
>>   NH Asian 18-25  3.2701   0.32414  any
>> NH Multiracial 18-25 10.6489   0.85105  any
>>Mexican   26+  3.2110   0.51683  any
>>   Puerto Rican   26+  1.6273   0.15033  any
>>  Cuban   26+  1.4419   0.44118  any
>>   C-S American   26+  1.0187   0.26594  any
>>  Dominican   26+  0.9554   0.50275  any
>>Spanish (Spain)   26+  2.5976   0.86230  any
>> Multi Hisp Eth   26+  1.1345   0.66375  any
>>   NH White   26+  1.5510   0.04156  any
>>   NH Black   26+  2.8763   0.15133  any
>>   NH AM-AK   26+  3.9674   0.76611  any
>>  NH HI-OPI   26+  1.2919   0.66205  any
>>   NH Asian   26+  0.7207   0.13870  any
>> NH Multiracial   26+  3.0668   0.52334  any
>>Mexican 12-17  4.3152   0.53235  mrj
>>   Puerto Rican 12-17  3.7237   0.20969  mrj
>>  Cuban 12-17  2.0616   0.67248  mrj
>>   C-S American 12-17  3.3282   0.47392  mrj
>>  Dominican 12-17  1.3797   0.40435  mrj
>>Spanish (Spain) 12-17  5.1810   0.93979  mrj
>> Multi Hisp Eth 12-17  4.8915   0.94816  mrj
>>   NH White 12-17  3.6190   0.07379  mrj
>>   NH Black 12-17  2.8196   0.14042  mrj
>>   NH AM-AK 12-17  6.5091   0.85124  mrj
>>  NH HI-OPI 12-17  3.6267   1.06724  mrj
>>   NH Asian 12-17  1.3162 0.23575  mrj
>> NH Multiracial 12-17  5.0657   0.49614  mrj
>>Mexican 18-25  7.3802   0.67992  mrj
>>   Puerto Rican 18-25  4.3260   0.24191  mrj
>>  Cuban 18-25  6.1433   1.19242  mrj
>>   C-S American 18-25  3.9166   0.51272  mrj
>>  Dominican 18-25  5.8000   1.24097  mrj
>>Spanish (Spain) 18-25  6.8646   1.01387  mrj
>> Multi Hisp Eth 18-25 10.1134   1.75013  mrj
>>   NH White 18-25  5.8656   0.10100  mrj
>>   NH Black 18-25  6.6869   0.23643  mrj
>>   NH AM-AK 18-25 11.2989   1.51687  mrj
>>  NH HI-OPI 18-25  5.6302   1.14561  mrj
>>   NH Asian 18-25  2.3418   0.28309  mrj
>> NH Multiracial 18-25  8.2696   0.77139  mrj
>>Mexican   26+  1.1658   0.33967  mrj
>>   Puerto Rican   26+  0.6757   0.09329  mrj
>>  Cuban   26+  0.6653   0.31239  mrj
>>   C-S American   26+  0.3177   0.17604  mrj
>

[R] Mean-Centering Question

2012-12-08 Thread Ray DiGiacomo, Jr.

Hello,

I'm trying to create a custom function that "mean-centers" data and can be
applied across many columns.

Here is an example dataset, which is similar to my dataset:

Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2

I want to mean-center the "Units" and "AveragePrice" Columns.

So, I created this function:

specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

If I use only "one" column in the first argument of the "by" function,
everything is fine.  For example, the following code works fine:

by(data[c("Units")],
data["Location"],
specialFunction)

But the following code will "not" work, because I have "two" columns in the
first argument...

by(data[c("Units", "AveragePrice")],
data["Location"],
specialFunction)

Does anyone have any ideas as to what I am doing wrong?

Please note that I'm trying to get the following results (for the "Los
Angeles" group):

Los Angeles "Units" variable (Mean-Centered)
0.213682659
-0.005370907
-0.208311751

Los Angeles "AveragePrice" variable (Mean-Centered)
0.071790268
-0.072872965
0.001082696
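One likely cause: in log(x) - colMeans(log(x)), the vector of two column means is recycled down the rows of the data frame, so the Units and AveragePrice means get paired with the wrong cells. With one column there is only one mean, so the problem does not show. A minimal sketch using scale() with scale = FALSE, which subtracts each column's own mean (sweep(log(x), 2, colMeans(log(x))) would be an equivalent choice); the function name centerLog is hypothetical:

```r
# The example data from the post
dat <- read.csv(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", stringsAsFactors = FALSE)

# Log-transform, then subtract each column's mean within the group
centerLog <- function(x) scale(log(x), center = TRUE, scale = FALSE)

res <- by(dat[c("Units", "AveragePrice")], dat["Location"], centerLog)
res[["Los Angeles"]]
```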

Best Regards,

Ray DiGiacomo, Jr.
Healthcare Predictive Analytics Specialist
President, Lion Data Systems LLC
President, The Orange County R User Group
Board Member, TDWI
r...@liondatasystems.com
(m) 408-425-7851
San Juan Capistrano, California USA
twitter.com/liondatasystems
linkedin.com/in/raydigiacomojr
youtube.com/user/liondatasystems/videos


[R] Dbscan Clustering Feature Question

2012-12-08 Thread anthony kasza

Hello list. My apologies if this topic has been discussed before on the
list but I was unable to find it. I'm working on a way to cluster PCAP
files according to the events recorded within them. I've decided to use
Bro-IDS for feature extraction. I am looking at dbscan in the fpc
package to accomplish my goal.

Is it possible to feed a data frame to dbscan with more than two columns
and have dbscan cluster on more than two features?

-AK
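As far as I know, fpc's dbscan() accepts a numeric data matrix or data frame with any number of columns, because DBSCAN only needs pairwise distances between rows. The base-R sketch below is not fpc's implementation; `dbscan_sketch`, the toy data, and the `eps`/`minPts` values are all hypothetical. It illustrates that once dist() is computed over every column, the clustering step is the same with four features as with two. Features on very different scales are usually standardised first, for example with scale().

```r
# Hypothetical base-R DBSCAN sketch (not fpc's code): Euclidean distance
# is taken over every column, so any number of features can be used.
dbscan_sketch <- function(x, eps, minPts) {
  d <- as.matrix(dist(as.matrix(x)))  # pairwise distances using all columns
  n <- nrow(d)
  cluster <- integer(n)               # 0 = noise / not yet assigned
  visited <- logical(n)
  cid <- 0L
  for (i in seq_len(n)) {
    if (visited[i]) next
    visited[i] <- TRUE
    nb <- which(d[i, ] <= eps)        # neighbourhood includes i itself
    if (length(nb) < minPts) next     # not a core point (may join a cluster later)
    cid <- cid + 1L
    cluster[i] <- cid
    seeds <- nb
    k <- 1L
    while (k <= length(seeds)) {      # grow the cluster through core points
      j <- seeds[k]
      if (!visited[j]) {
        visited[j] <- TRUE
        nbj <- which(d[j, ] <= eps)
        if (length(nbj) >= minPts) seeds <- c(seeds, setdiff(nbj, seeds))
      }
      if (cluster[j] == 0L) cluster[j] <- cid
      k <- k + 1L
    }
  }
  cluster
}

# Toy data: two well-separated groups, each described by four features
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 4),
           matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 4))
colnames(x) <- paste0("f", 1:4)
cl <- dbscan_sketch(x, eps = 1, minPts = 3)
table(cl)
```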


[R] Why my lapply doesn't work with FUN=as.Date

2012-12-08 Thread CHEN, Cheng

Hi, guys

I don't understand why I can apply as.Date to a single item in the list:
> as.Date(alldays[4])
[1] "29-03-20"

but when I try to sapply as.Date over all the items, I get a sequence of
negative numbers:

> sapply(alldays[1:4], FUN=as.Date)
03-04-2012 02-04-2012 30-03-2012 29-03-2012
   -718323    -718688    -708492    -708857

does anyone know what's wrong here?

I am very confused!

Thanks a lot for your time on such a freezing weekend!


Re: [R] read. table()

2012-12-08 Thread arun



Hi,

You can check the str()
I assume it will be like this:
 str(read.delim(textConnection(Lines),header=TRUE,sep="\t"))
#'data.frame':    195 obs. of  1 variable:
# $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "    C-S American 12-17  0.2399   0.15804  coc",..: 50 170 20 5 35 185 65 155 110 80 ...

A.K.
 



- Original Message -
From: "Muhuri, Pradip (SAMHSA/CBHSQ)" 
To: 'Prof Brian Ripley' ; "r-help@r-project.org" 

Cc: 
Sent: Saturday, December 8, 2012 5:05 PM
Subject: Re: [R] read. table()

Dear Prof Ripley,

Your hint is helpful, and I see considerable improvements in the results.

The only issue is that the column names do not seem to be correct.  I did not 
understand part of your comment, which says "fortunes::fortune(14) applies" 
although I read about the double colon operator- ns-dblcolon {base}.

Could you please provide a little more hint for me to resolve the issue?

Thanks and regards,

# new result 
> agerace <- read.delim(textConnection(xd1), sep="\t",  header=TRUE, as.is=TRUE)
> names(agerace)
[1] "raceage...percent..sepercent..flag_var"
> head(agerace)
         raceage...percent..sepercent..flag_var
1          Mexican 12-17  5.7926   0.64195      any
2     Puerto Rican 12-17  5.1975   0.24929      any
3            Cuban 12-17  3.7977   1.00487      any
4     C-S American 12-17  4.3665   0.55329      any
5        Dominican 12-17  1.8149   0.46677      any
6  Spanish (Spain) 12-17  6.1971   0.98386      any


Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
 
Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov
 
The Center for Behavioral Health Statistics and Quality your feedback.  Please 
click on the following link to complete a brief customer survey:   
http://cbhsqsurvey.samhsa.gov


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Prof Brian Ripley
Sent: Saturday, December 08, 2012 2:29 PM
To: r-help@r-project.org
Subject: Re: [R] read.table()

On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>
> Hi List,
>
> I have spent more than 30 minutes, but failed to read in this file using the 
> read.table() function. I could not figure out how to fix the following error.

Well, we have a whole manual on this, mentioned on ?read.table (see See 
Also)  Have you read it?  fortunes::fortune(14) applies.

The issue is what the separator is.  You have specified whitespace, and 
that is not correct.  The original might have had tabs (see ?read.delim) 
but as pasted into this email only a human can disentangle this file.
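Ripley's point about the separator can be checked with count.fields(), which uses the same whitespace splitting as read.table() with the default sep = "". A minimal sketch, assuming a shortened copy of the posted data: every run of spaces counts as a separator, so a race value that contains a space adds an extra field.

```r
# Shortened copy of the posted data; some race values contain a space
xd1 <- "race    age   percent  sepercent  flag_var
          Mexican 12-17  5.7926   0.64195      any
     Puerto Rican 12-17  5.1975   0.24929      any
  Spanish (Spain) 12-17  6.1971   0.98386      any"

con <- textConnection(xd1)
nf <- count.fields(con)  # default sep = "": any run of whitespace separates fields
close(con)
nf  # header and "Mexican" rows have 5 fields; two-word race values give 6
```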

> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  
> :   line 1 did not have 6 elements
>
> Any help would be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
>
> ### below is the  reproducible example
> xd1 <-  "race    age   percent  sepercent  flag_var
>           Mexican 12-17  5.7926   0.64195      any
>      Puerto Rican 12-17  5.1975   0.24929      any
>             Cuban 12-17  3.7977   1.00487      any
>      C-S American 12-17  4.3665   0.55329      any
>         Dominican 12-17  1.8149   0.46677      any
>   Spanish (Spain) 12-17  6.1971   0.98386      any
>    Multi Hisp Eth 12-17  6.7006   1.12464      any
>          NH White 12-17  4.8442   0.08660      any
>          NH Black 12-17  3.6943   0.16045      any
>          NH AM-AK 12-17  9.6325   1.06100      any
>         NH HI-OPI 12-17  3.9189   1.08047      any
>          NH Asian 12-17  1.9115   0.28432      any
>    NH Multiracial 12-17  6.4255   0.51434      any
>            Mexican 18-25  8.9284   0.73022      any
>       Puerto Rican 18-25  6.1364   0.28394      any
>              Cuban 18-25  8.6782   1.45543      any
>       C-S American 18-25  5.9360   0.59899      any
>          Dominican 18-25  7.7642   1.64553      any
>    Spanish (Spain) 18-25  9.2632   1.15652      any
>     Multi Hisp Eth 18-25 11.3566   1.79282      any
>           NH White 18-25  8.6484   0.11866      any
>           NH Black 18-25  7.5972   0.24926      any
>           NH AM-AK 18-25 13.5041   1.57275      any
>          NH HI-OPI 18-25  8.0227   1.41348      any
>           NH Asian 18-25  3.2701   0.32414      any
>     NH Multiracial 18-25 10.6489   0.85105      any
>            Mexican   26+  3.2110   0.51683      any
>       Puerto Rican   26+  1.6273   0.15033      any
>              Cuban   26+  1.4419   0.44118      any
>       C-S American   26+  1.0187   0.26594      any
>          Dominican   26+  0.9554   0.50275      any
>    Spanish (Spain)   26+  2.5976   0.86230      any
>     Multi Hisp Eth   26+  1.1345   0.66375      any
>           NH White   26+  1.5510   0.04156      any
>           NH Black   26+  2.8763   0.15133      any
>           NH AM-AK

Re: [R] read. table()

2012-12-08 Thread arun

Hi Pradip,

Try this:
source("Muhuri.txt")
#Muhuri.txt
Lines<-  "race    age   percent  sepercent  flag_var
 Mexican 12-17  5.7926   0.64195  
any-

"
Lines1<-readLines(textConnection(Lines))

Col1new<-gsub(" ","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]+\\-.*","\\1",Lines1[-1])))
Col2<-gsub("\\s+\\D+([[:digit:]]+\\+.*)","\\1",gsub("\\s+\\D+([[:digit:]]+\\-.*)","\\1",Lines1[-1]))
dat1<-data.frame(Col1new,read.table(text=Col2,stringsAsFactors=FALSE,sep=""),stringsAsFactors=FALSE)

heading<-unlist(strsplit(Lines1[1]," "))
colnames(dat1)<-heading[heading!=""]
 head(dat1,6)
#            race   age percent sepercent flag_var
#1        Mexican 12-17  5.7926   0.64195      any
#2    PuertoRican 12-17  5.1975   0.24929      any
#3          Cuban 12-17  3.7977   1.00487      any
#4    C-SAmerican 12-17  4.3665   0.55329      any
#5      Dominican 12-17  1.8149   0.46677      any
#6 Spanish(Spain) 12-17  6.1971   0.98386      any



 str(dat1)
'data.frame':    195 obs. of  5 variables:
 $ race : chr  "Mexican" "PuertoRican" "Cuban" "C-SAmerican" ...
 $ age  : chr  "12-17" "12-17" "12-17" "12-17" ...
 $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
 $ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
 $ flag_var : chr  "any" "any" "any" "any" ...

A.K.



- Original Message -
From: "Muhuri, Pradip (SAMHSA/CBHSQ)" 
To: 'arun' 
Cc: David L Carlson ; R help 
Sent: Saturday, December 8, 2012 5:20 PM
Subject: RE: [R] read. table()

Dear Arun,

The issue is that the column names are incorrect.  I will also look into the 
comment by Prof Ripley.

Thanks for your continued support and help.

Pradip

> str(read.delim(textConnection(xd1),header=TRUE,sep="\t"))
'data.frame':   195 obs. of  1 variable:
$ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "            Cuban   26+  0.6653   0.31239      mrj",..: 27 148 13 140 108 193 169 100 85 67 ...
> names(agerace)
[1] "raceage...percent..sepercent..flag_var"
> head(agerace)
         raceage...percent..sepercent..flag_var
1          Mexican 12-17  5.7926   0.64195      any
2     Puerto Rican 12-17  5.1975   0.24929      any
3            Cuban 12-17  3.7977   1.00487      any
4     C-S American 12-17  4.3665   0.55329      any
5        Dominican 12-17  1.8149   0.46677      any
6  Spanish (Spain) 12-17  6.1971   0.98386      any



Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Dear David and Arun,

Thank you very much for your time and efforts and for resolving the issue.
From this exchange, I have learned something new about reading the data files
into R.

Regards,

Pradip



-Original Message-
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 8:45 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: dcarl...@tamu.edu; R help
Subject: Re: [R] read. table()

Hi,

David's method is much better than mine.
Regarding the spaces in the race field, this should preserve them if you wish 
to try my method.
source("Muhuri.txt")
Lines1<-readLines(textConnection(Lines))

Col1new<-gsub(" +$","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]+\\-.*","\\1",Lines1[-1])))
#changed

Col2<-gsub("\\s+\\D+([[:digit:]]+\\+.*)","\\1",gsub("\\s+\\D+([[:digit:]]+\\-.*)","\\1",Lines1[-1]))
dat1<-data.frame(Col1new,read.table(text=Col2,stringsAsFactors=FALSE,sep=""),stringsAsFactors=FALSE)
heading<-unlist(strsplit(Lines1[1]," "))
colnames(dat1)<-heading[heading!=""]


head(dat1)
#             race   age percent sepercent flag_var
#1         Mexican 12-17  5.7926   0.64195      any
#2    Puerto Rican 12-17  5.1975   0.24929      any
#3           Cuban 12-17  3.7977   1.00487      any
#4    C-S American 12-17  4.3665   0.55329      any
#5       Dominican 12-17  1.8149   0.46677      any
#6 Spanish (Spain) 12-17  6.1971   0.98386      any
 str(dat1)
#'data.frame':195 obs. of  5 variables:
# $ race : chr  "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
# $ age  : chr  "12-17" "12-17" "12-17" "12-17" ...
# $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
# $ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
# $ flag_var : chr  "any" "any" "any" "any" ...


A.K.



- Original Message -
From: David L Carlson 
To: 'arun' ; "'Muhuri, Pradip (SAMHSA/CBHSQ)'" 

Cc: 'R help' 
Sent: Saturday, December 8, 2012 8:06 PM
Subject: RE: [R] read. table()

Arun's solution works but you lose your spaces in the race field. These
commands will preserve them. We need to make sure that your file has two or
more spaces between each field. The first gsub() command strips leading
space. The second inserts a space before the digit 1 (that is where all the
fields separated by a single space are). Then we convert two or more spaces
to a comma. Finally you can use read.table().

Starting with your vector xd1 from your first posting:
> raw2 <- readLines(con=textConnection(xd1))
> raw2 <- gsub("^ +", "", raw2)
> raw2 <- gsub(" 1", "  1", raw2)
> raw3 <- gsub("  +", ",", raw2)
> agerace <- read.table(text=raw3, header=TRUE, sep=",", as.is=TRUE)
> str(agerace)
'data.frame':   195 obs. of  5 variables:
$ race : chr  "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
$ age  : chr  "12-17" "12-17" "12-17" "12-17" ...
$ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
$ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
$ flag_var : chr  "any" "any" "any" "any" ...
>
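A different technique from Carlson's, swapped in here for comparison (it is not from the thread): split each line on whitespace, take the last four tokens as age, percent, sepercent and flag_var, and paste the remaining leading tokens back together as race. This is a sketch that assumes, as in the posted data, that only the race field can contain spaces; the shortened xd1 is illustrative.

```r
# Shortened copy of the posted data (the full xd1 has 195 data rows)
xd1 <- "race    age   percent  sepercent  flag_var
          Mexican 12-17  5.7926   0.64195      any
     Puerto Rican 12-17  5.1975   0.24929      any
  Spanish (Spain) 12-17  6.1971   0.98386      any
     C-S American   26+  1.0187   0.26594      any"

lines <- strsplit(xd1, "\n", fixed = TRUE)[[1]]
hdr   <- strsplit(trimws(lines[1]), "\\s+")[[1]]

# The last four tokens are single-word fields; everything before them is race
rows <- lapply(lines[-1], function(s) {
  tok <- strsplit(trimws(s), "\\s+")[[1]]
  n <- length(tok)
  c(paste(tok[seq_len(n - 4)], collapse = " "), tok[(n - 3):n])
})

agerace <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(agerace) <- hdr
agerace[] <- lapply(agerace, type.convert, as.is = TRUE)  # numeric text becomes numeric
str(agerace)
```

Unlike the " 1" substitution, this does not depend on which digit the age group starts with, but it would break if any field other than race could contain a space.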


Re: [R] read. table()

2012-12-08 Thread arun

Hi,

David's method is much better than mine.
Regarding the spaces in the race field, this should preserve them if you wish 
to try my method.
source("Muhuri.txt")
Lines1<-readLines(textConnection(Lines))

 Col1new<-gsub(" +$","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]+\\-.*","\\1",Lines1[-1])))
 #changed

 Col2<-gsub("\\s+\\D+([[:digit:]]+\\+.*)","\\1",gsub("\\s+\\D+([[:digit:]]+\\-.*)","\\1",Lines1[-1]))
 dat1<-data.frame(Col1new,read.table(text=Col2,stringsAsFactors=FALSE,sep=""),stringsAsFactors=FALSE)
 heading<-unlist(strsplit(Lines1[1]," "))
 colnames(dat1)<-heading[heading!=""]


head(dat1)
#             race   age percent sepercent flag_var
#1         Mexican 12-17  5.7926   0.64195      any
#2    Puerto Rican 12-17  5.1975   0.24929      any
#3           Cuban 12-17  3.7977   1.00487      any
#4    C-S American 12-17  4.3665   0.55329      any
#5       Dominican 12-17  1.8149   0.46677      any
#6 Spanish (Spain) 12-17  6.1971   0.98386      any
 str(dat1)
#'data.frame':    195 obs. of  5 variables:
# $ race : chr  "Mexican" "Puerto Rican" "Cuban" "C-S American" ...
# $ age  : chr  "12-17" "12-17" "12-17" "12-17" ...
# $ percent  : num  5.79 5.2 3.8 4.37 1.81 ...
# $ sepercent: num  0.642 0.249 1.005 0.553 0.467 ...
# $ flag_var : chr  "any" "any" "any" "any" ...


A.K.




Re: [R] Why my lapply doesn't work with FUN=as.Date

2012-12-08 Thread David Winsemius

On Dec 8, 2012, at 1:34 PM, CHEN, Cheng wrote:

Hi, guys

I don't understand why I can apply as.Date to a single item in the list:

as.Date(alldays[4])

[1] "29-03-20"

but when I try to lapply as.Date to all the items, i got a sequence of neg
numbers:

sapply(alldays[1:4], FUN=as.Date)

03-04-2012 02-04-2012 30-03-2012 29-03-2012
   -718323    -718688    -708492    -708857

does anyone know what's wrong here?

Problem #1:
`sapply` will coerce to matrix or vector and remove the Date class.

Problem #2:
You are not supplying a format to as.Date and your dates are not in
the default formats.

> sapply(dts, as.Date)
03-04-2012 02-04-2012 30-03-2012 29-03-2012
   -718323    -718688    -708492    -708857

> sapply(dts, as.Date, format="%d-%m-%Y")
03-04-2012 02-04-2012 30-03-2012 29-03-2012
     15433      15432      15429      15428

> lapply(dts, as.Date, format="%d-%m-%Y")
[[1]]
[1] "2012-04-03"

[[2]]
[1] "2012-04-02"

[[3]]
[1] "2012-03-30"

[[4]]
[1] "2012-03-29"

i am very confused!

Thanks a lot for your time in such a freezing weekend!

Problem #3:
You need to move to the California Coast.
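Putting Problems #1 and #2 together: as.Date() is vectorised, so once the format is supplied no apply function is needed at all. A minimal sketch, assuming alldays is a character vector holding the four dates shown in the question (the poster's actual object was not included):

```r
# Stand-in for the poster's alldays: day-month-year strings
alldays <- c("03-04-2012", "02-04-2012", "30-03-2012", "29-03-2012")

# as.Date() works on the whole vector; the format matches dd-mm-yyyy
d <- as.Date(alldays, format = "%d-%m-%Y")
d
class(d)

# sapply() simplifies with unlist(), which drops the "Date" class and
# leaves the underlying day counts since 1970-01-01
s <- sapply(alldays, as.Date, format = "%d-%m-%Y")
class(s)
```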
--

.

David Winsemius, MD
Alameda, CA, USA


Re: [R] Mean-Centering Question

2012-12-08 Thread Elizabeth Fuller Bettini

please remove me from this list.

On Sat, Dec 8, 2012 at 6:54 PM, Ray DiGiacomo, Jr.  wrote:

> R-help@r-project.org


Re: [R] Mean-Centering Question

2012-12-08 Thread David Winsemius

On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:

Hello,

I'm trying to create a custom function that "mean-centers" data and can be
applied across many columns.

Here is an example dataset, which is similar to my dataset:

dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")

I want to mean-center the "Units" and "AveragePrice" Columns.

So, I created this function:

specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

I needed to modify this to avoid errors relating to how colMeans is  
expecting its arguments:

specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }

aggregate(dat[3:4], dat[1], FUN=specialFunction2)

     Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899

If I use only "one" column in the first argument of the "by" function,
everything is fine.  For example the following code will work fine:

by(data[c("Units")],
data["Location"],
specialFunction)

But the following code will "not" work, because I have "two" columns in the
first argument...

by(data[c("Units", "AveragePrice")],
data["Location"],
specialFunction)

OK. So then I tried this with your function and was surprised to see  
that it also works:

> by(dat[c("Units", "AveragePrice")],
+ dat["Location"],
+ specialFunction)
Location: Los Angeles
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827
--
Location: New York
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357
--
Location: Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819

Does anyone have any ideas as to what I am doing wrong?

I guess I don't. Cannot reproduce and my other methods worked as
well. This also works with your version and with mine but I get the
deprecation message for `mean.data.frame` from mine:

> lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
$`Los Angeles`
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827

$`New York`
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357

$Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819

Please note that I'm trying to get the following results (for the "Los
Angeles" group):

Los Angeles "Units" variable (Mean-Centered)
0.213682659
-0.005370907
-0.208311751

Los Angeles "AveragePrice" variable (Mean-Centered)
0.071790268
-0.072872965
0.001082696
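The middle row of each group in the by() and lapply() output above (for example 2.27351 and -2.3517586 for Los Angeles) is the symptom. colMeans() returns one mean per column, and subtracting that length-2 vector from a 3 x 2 data frame recycles it element by element down the columns, so row 2 of Units has the AveragePrice mean subtracted and row 2 of AveragePrice has the Units mean subtracted. With a single column the vector has length 1, which is why the one-column call looked fine. A minimal sketch of a column-wise version, with centerLog as a hypothetical name; sweep() subtracts each column's own mean (scale(log(x), scale = FALSE) is another option):

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Original version: the length-2 vector of column means is recycled
# element by element down the columns, so some cells get the wrong mean
specialFunction <- function(x) log(x) - colMeans(log(x), na.rm = TRUE)

# Hypothetical fix: sweep() over MARGIN = 2 subtracts each column's own mean
centerLog <- function(x) {
  lx <- log(as.matrix(x))
  sweep(lx, 2, colMeans(lx, na.rm = TRUE))
}

vars <- c("Units", "AveragePrice")
bad  <- lapply(split(dat[vars], dat$Location), specialFunction)
good <- lapply(split(dat[vars], dat$Location), centerLog)
good[["Los Angeles"]]
```

The `good` list reproduces the Los Angeles values asked for in the original post; `bad` keeps the recycled values for comparison.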

--

David Winsemius, MD
Alameda, CA, USA


Re: [R] read. table()

2012-12-08 Thread David Winsemius



On Dec 8, 2012, at 2:20 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:


Dear Arun,

The issue is that the column names are incorrect.


You have been given misinformation in this regard. Your column names
were valid and not the source of your problems. The underscore causes
no problems with names. Prof Ripley identified your problem. At some
point your data was tab separated (as might happen when cut-pasting
from Excel) but by the time it hit our mail-clients the tabs had been
expanded to spaces and we were unable to read with sep="\t". But
you should have been able to do so.
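A minimal sketch of David's point, assuming the original file used real tab characters (the three data rows below were rebuilt by hand with "\t"): with genuine tabs, read.delim() splits the fields correctly, keeps the spaces inside race, and flag_var comes through unchanged as a valid column name.

```r
# A few rows rebuilt with genuine tabs, as an Excel copy-paste would produce
txt <- paste("race\tage\tpercent\tsepercent\tflag_var",
             "Mexican\t12-17\t5.7926\t0.64195\tany",
             "Puerto Rican\t12-17\t5.1975\t0.24929\tany",
             "Spanish (Spain)\t12-17\t6.1971\t0.98386\tany",
             sep = "\n")

agerace <- read.delim(text = txt, stringsAsFactors = FALSE)  # sep = "\t" is the default
names(agerace)
agerace$race
```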


--
David


 I will also look into the comment by Prof Ripley.

Thanks for your continued support and help.

Pradip


str(read.delim(textConnection(xd1),header=TRUE,sep="\t"))

'data.frame':   195 obs. of  1 variable:
$ raceage...percent..sepercent..flag_var: Factor w/ 195 levels  
"Cuban   26+  0.6653   0.31239  mrj",..: 27 148 13  
140 108 193 169 100 85 67 ...

names(agerace)

[1] "raceage...percent..sepercent..flag_var"

head(agerace)

raceage...percent..sepercent..flag_var
1  Mexican 12-17  5.7926   0.64195  any
2 Puerto Rican 12-17  5.1975   0.24929  any
3Cuban 12-17  3.7977   1.00487  any
4 C-S American 12-17  4.3665   0.55329  any
5Dominican 12-17  1.8149   0.46677  any
6  Spanish (Spain) 12-17  6.1971   0.98386  any

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your  
feedback.  Please click on the following link to complete a brief  
customer survey:   http://cbhsqsurvey.samhsa.gov



-Original Message-
From: arun [mailto:smartpink...@yahoo.com]
Sent: Saturday, December 08, 2012 5:13 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: David L Carlson; R help
Subject: Re: [R] read. table()



Hi,

You can check the str()
I assume it will be like this:
str(read.delim(textConnection(Lines),header=TRUE,sep="\t"))
#'data.frame':195 obs. of  1 variable:
# $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels  
"C-S American 12-17  0.2399   0.15804  coc",..: 50 170 20 5  
35 185 65 155 110 80 ...


A.K.




- Original Message -
From: "Muhuri, Pradip (SAMHSA/CBHSQ)" 
To: 'Prof Brian Ripley' ; "r-help@r- 
project.org" 

Cc:
Sent: Saturday, December 8, 2012 5:05 PM
Subject: Re: [R] read. table()

Dear Prof Ripley,

Your hint is helpful, and I see considerable improvements in the  
results.


The only issue is that the column names do not seem to be correct.   
I did not understand part of your comment, which says  
"fortunes::fortune(14) applies" although I read about the double  
colon operator- ns-dblcolon {base}.


Could you please provide a little more hint for me to resolve the  
issue?


Thanks and regards,

# new result 
agerace <- read.delim(textConnection(xd1), sep="\t",  header=TRUE,  
as.is=TRUE)

names(agerace)

[1] "raceage...percent..sepercent..flag_var"

head(agerace)

raceage...percent..sepercent..flag_var
1  Mexican 12-17  5.7926   0.64195  any
2 Puerto Rican 12-17  5.1975   0.24929  any
3Cuban 12-17  3.7977   1.00487  any
4 C-S American 12-17  4.3665   0.55329  any
5Dominican 12-17  1.8149   0.46677  any
6  Spanish (Spain) 12-17  6.1971   0.98386  any


Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: pradip.muh...@samhsa.hhs.gov

The Center for Behavioral Health Statistics and Quality your  
feedback.  Please click on the following link to complete a brief  
customer survey:   http://cbhsqsurvey.samhsa.gov



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Prof Brian Ripley

Sent: Saturday, December 08, 2012 2:29 PM
To: r-help@r-project.org
Subject: Re: [R] read.table()

On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:


Hi List,

I have spent more than 30 minutes but failed to read in this file using the read.table() function. I could not figure out how to fix the following error.


Well, we have a whole manual on this, mentioned on ?read.table (see 'See Also'). Have you read it? fortunes::fortune(14) applies.

The issue is what the separator is. You have specified whitespace, and that is not correct. The original might have had tabs (see ?read.delim), but as pasted into this email only a human can disentangle this file.
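One way to investigate the separator is base R's count.fields(), which reports how many fields each line splits into for a given sep; a constant count across lines suggests a workable separator. A minimal sketch with two hypothetical tab-separated lines (not the poster's actual data); the helper name fields_per_line is illustrative only:

```r
# Count fields per line of a character vector for a given separator
fields_per_line <- function(lines, sep = "") {
  con <- textConnection(lines)
  on.exit(close(con))
  count.fields(con, sep = sep)
}

# Hypothetical tab-separated lines; the first data field contains spaces
txt <- c("raceage\tpercent\tsepercent\tflag_var",
         "Puerto Rican 12-17\t5.1975\t0.24929\tany")

fields_per_line(txt, sep = "\t")  # tabs: both lines give 4 fields
fields_per_line(txt)              # whitespace: the data line splits into 6
```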

Error in scan(file, what, nmax, sep, dec, qu

Re: [R] Mean-Centering Question

2012-12-08 Thread David Winsemius



On Dec 8, 2012, at 7:06 PM, Elizabeth Fuller Bettini wrote:


please remove me from this list.


You subscribed and only you know the password that allows you to control the subscription options. Please use the links at the bottom of every posting to Rhelp.




On Sat, Dec 8, 2012 at 6:54 PM, Ray DiGiacomo, Jr. wrote:



R-help@r-project.org




David Winsemius, MD
Alameda, CA, USA


[R] (Re-posted as Plain Text ) Modelling a skew-normal distribution using glm/ mgcv

2012-12-08 Thread Saptarshi Guha

Hello,
[ Sorry, I sent the last email as HTML, this time it's in plain text ]

Suppose my variable S (the time for something to start) follows a skew-normal
distribution [1]. Can glm and mgcv handle this type of distribution for the
dependent variable?

Regards
Saptarshi
[1] http://azzalini.stat.unipd.it/SN/
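For reference, the skew-normal density described in [1], with location ξ, scale ω > 0 and shape α, is f(x) = (2/ω) φ(z) Φ(αz) with z = (x − ξ)/ω, where φ and Φ are the standard normal density and distribution function. A minimal base-R sketch; the function name dskewnorm is hypothetical and not taken from any package:

```r
# Skew-normal density: location xi, scale omega > 0, shape alpha
dskewnorm <- function(x, xi = 0, omega = 1, alpha = 0) {
  z <- (x - xi) / omega
  2 / omega * dnorm(z) * pnorm(alpha * z)
}

# alpha = 0 recovers the normal density, and the density integrates to 1
x <- seq(-3, 3, by = 0.5)
all.equal(dskewnorm(x), dnorm(x))
integrate(dskewnorm, -Inf, Inf, xi = 1, omega = 2, alpha = 4)$value
```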


Re: [R] Why my lapply doesn't work with FUN=as.Date

2012-12-08 Thread Rolf Turner


On 09/12/12 10:34, CHEN, Cheng wrote:

Hi, guys

I don't understand why I can apply as.Date to a single item in the list:

as.Date(alldays[4])

[1] "29-03-20"

but when I try to lapply as.Date to all the items, I get a sequence of negative
numbers:


sapply(alldays[1:4], FUN=as.Date)

03-04-2012 02-04-2012 30-03-2012 29-03-2012
   -718323    -718688    -708492    -708857

Does anyone know what's wrong here?

I am very confused!

Thanks a lot for your time on such a freezing weekend!


It's actually a very fine warm sunny weekend here in the
Good Part of the World! :-)

(a) I don't understand the phenomenon you describe either.

(b) However, it seems to me there is no need to use sapply() at
all; just do:

as.Date(alldays)

and you get results as expected.  But see below.

(c) You are getting a toadally wrong answer from as.Date(alldays[4]),
if I am reading the output correctly.  My understanding is that alldays[4]
is "29-03-2012" and so you *want* to get 2012-03-29 from as.Date
(and *NOT* what you got!).

You need to do as.Date(alldays, format="%d-%m-%Y").  Is it not so?
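A sketch of what is probably happening, assuming alldays is the character vector whose values appear as names in the sapply() output. Without a format, as.Date() tries "%Y-%m-%d", so "29-03-2012" is read as year 29, month 3, day 20, which matches the "29-03-20" printed above. sapply() then simplifies the list of Date results with unlist(), which drops the "Date" class and leaves the number of days since 1970-01-01; that number is negative for dates in the first century.

```r
# Assumed contents of alldays, taken from the names in the sapply() output
alldays <- c("03-04-2012", "02-04-2012", "30-03-2012", "29-03-2012")

# Without a format, "29-03-2012" is parsed as year 29, month 03, day 20
bad <- as.Date(alldays[4])
as.POSIXlt(bad)$year + 1900

# sapply() simplifies the Date results to bare day counts since 1970-01-01
sapply(alldays, as.Date)

# Vectorised call with the day-month-year format keeps the Date class
good <- as.Date(alldays, format = "%d-%m-%Y")
good
```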

cheers,

Rolf Turner


Re: [R] Mean-Centering Question

2012-12-08 Thread Ray DiGiacomo, Jr.

Hi David and Arun,

Thanks for looking into this.  I think I have found a solution.

The "by" function runs without errors, but both values in the second row
of the "Los Angeles" output are incorrect. These incorrect values are shown
below in red (marked with asterisks in this plain-text version).

I think my original custom function was causing the incorrect values because
it subtracted a length-2 vector of column means from a 3-row data frame, so
R "recycled" the means down the rows instead of matching each mean to its
own column.

Using the "sweep" function fixes the problem.  This is what I did to fix
things:

# here is my "new" custom function
newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }

# this gives the correct values
by(PullData[c("Units","AveragePrice")],
PullData[c("StoreLocation")],
newFunction)

- Ray
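The recycling explanation can be checked on the Los Angeles rows alone (values copied from the example data). In this sketch, subtracting a length-2 vector from a 3-by-2 data frame recycles the vector down the rows, column by column, so row 2 of Units gets the AveragePrice mean and row 2 of AveragePrice gets the Units mean. sweep() subtracts each mean from its own column; scale() with scale = FALSE is another base-R way to centre columns, shown for comparison.

```r
# Los Angeles rows from the example data
la <- data.frame(Units = c(61, 49, 40), AveragePrice = c(5.42, 4.69, 5.05))
m <- colMeans(log(la))

# Original approach: m is recycled as m[1], m[2], m[1], m[2], ... down the rows
wrong <- log(la) - m

# sweep() subtracts m[j] from column j
right <- sweep(log(la), 2, m, "-")

# scale() without rescaling centres each column the same way (returns a matrix)
alt <- scale(log(la), center = TRUE, scale = FALSE)

print(wrong)
print(right)
```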





On Sat, Dec 8, 2012 at 7:12 PM, David Winsemius wrote:

>
> On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:
>
>  Hello,
>>
>> I'm trying to create a custom function that "mean-centers" data and can be
>> applied across many columns.
>>
>> Here is an example dataset, which is similar to my dataset:
>>
>>
> dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
>
> Los Angeles,5/1/11,61,5.42
> Los Angeles,5/8/11,49,4.69
> Los Angeles,5/15/11,40,5.05
> New York,5/1/11,259,6.4
> New York,5/8/11,187,5.3
> New York,5/15/11,177,5.7
> Paris,5/1/11,672,6.26
> Paris,5/8/11,514,5.3
> Paris,5/15/11,455,5.2", header=TRUE, sep=",")
>
>
>> I want to mean-center the "Units" and "AveragePrice" Columns.
>>
>> So, I created this function:
>>
>> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
>>
>
> I needed to modify this to avoid errors relating to how colMeans is
> expecting its arguments:
>
> specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
>
> aggregate(dat[3:4], dat[1], FUN=specialFunction2)
>
>      Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
> 1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
> 2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
> 3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
>
>
>
>> If I use only "one" column in the first argument of the "by" function,
>> everything is fine.  For example the following code will work fine:
>>
>> by(data[c("Units")],
>> data["Location"],
>> specialFunction)
>>
>> But the following code will "not" work, because I have "two" columns in
>> the
>> first argument...
>>
>> by(data[c("Units", "AveragePrice")],
>> data["Location"],
>> specialFunction)
>>
>
> OK. So then I tried this with your function and was surprised to see that
> it also works:
>
> > by(dat[c("Units", "AveragePrice")],
> + dat["Location"],
> + specialFunction)
> Location: Los Angeles
>      Units AveragePrice
> 1  0.21368    0.0717903
> 2 *2.27351   -2.3517586*
> 3 -0.20831    0.0010827
> ------------------------------------------------------------
> Location: New York
>      Units AveragePrice
> 4  0.23547     0.101474
> 5  3.47628    -3.653655
> 6 -0.14521    -0.014357
> ------------------------------------------------------------
> Location: Paris
>      Units AveragePrice
> 7  0.21933      0.11733
> 8  4.52537     -4.62322
> 9 -0.17063     -0.06819
>
>
>
>> Does anyone have any ideas as to what I am doing wrong?
>>
>
> I guess I don't. Cannot reproduce, and my other methods worked as well. This
> also works with your version and with mine, but I get the deprecation
> message for `mean.data.frame` from mine:
>
> > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
> $`Los Angeles`
>      Units AveragePrice
> 1  0.21368    0.0717903
> 2  2.27351   -2.3517586
> 3 -0.20831    0.0010827
>
> $`New York`
>      Units AveragePrice
> 4  0.23547     0.101474
> 5  3.47628    -3.653655
> 6 -0.14521    -0.014357
>
> $Paris
>      Units AveragePrice
> 7  0.21933      0.11733
> 8  4.52537     -4.62322
> 9 -0.17063     -0.06819
>
>
>
>> Please note that I'm trying to get the following results (for the "Los
>> Angeles" group):
>>
>> Los Angeles "Units" variable (Mean-Centered)
>> 0.213682659
>> -0.005370907
>> -0.208311751
>>
>> Los Angeles "AveragePrice" variable (Mean-Centered)
>> 0.071790268
>> -0.072872965
>> 0.001082696
>>
>
> --
>
> David Winsemius, MD
> Alameda, CA, USA
>
>


Re: [R] Mean-Centering Question

2012-12-08 Thread arun

Hi,

It works for me also:
by(dat1[c("Units","AveragePrice")], dat1[,1], specialFunction)
#dat1[, 1]: Los Angeles
#       Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696
#------------------------------------------------------------
#or

by(cbind(Units=dat1[,3], AveragePrice=dat1[,4]), dat1[,1], specialFunction)
#INDICES: Los Angeles
#       Units AveragePrice
#1  0.2136827  0.071790268
#2  2.2735148 -2.351758623
#3 -0.2083118  0.001082696


A.K.






- Original Message -
From: "Ray DiGiacomo, Jr." 
To: R Help 
Cc: 
Sent: Saturday, December 8, 2012 6:54 PM
Subject: [R] Mean-Centering Question

Hello,

I'm trying to create a custom function that "mean-centers" data and can be
applied across many columns.

Here is an example dataset, which is similar to my dataset:

Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2

I want to mean-center the "Units" and "AveragePrice" Columns.

So, I created this function:

specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

If I use only "one" column in the first argument of the "by" function,
everything is fine.  For example the following code will work fine:

by(data[c("Units")],
data["Location"],
specialFunction)

But the following code will "not" work, because I have "two" columns in the
first argument...

by(data[c("Units", "AveragePrice")],
data["Location"],
specialFunction)

Does anyone have any ideas as to what I am doing wrong?

Please note that I'm trying to get the following results (for the "Los
Angeles" group):

Los Angeles "Units" variable (Mean-Centered)
0.213682659
-0.005370907
-0.208311751

Los Angeles "AveragePrice" variable (Mean-Centered)
0.071790268
-0.072872965
0.001082696
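As an aside, per-group centring that keeps the original row order can also be sketched with base R's ave(), which applies a function within each group and returns a vector aligned with the input. A minimal sketch using the example data above; the new column names Units_c and AveragePrice_c and the helper centre_log are illustrative only:

```r
dat <- read.csv(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2")

# Subtract the group mean of log(value) within each Location
centre_log <- function(v) log(v) - mean(log(v))
dat$Units_c <- ave(dat$Units, dat$Location, FUN = centre_log)
dat$AveragePrice_c <- ave(dat$AveragePrice, dat$Location, FUN = centre_log)

dat[dat$Location == "Los Angeles", c("Units_c", "AveragePrice_c")]
```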

Best Regards,

Ray DiGiacomo, Jr.
Healthcare Predictive Analytics Specialist
President, Lion Data Systems LLC
President, The Orange County R User Group
Board Member, TDWI
r...@liondatasystems.com
(m) 408-425-7851
San Juan Capistrano, California USA
twitter.com/liondatasystems
linkedin.com/in/raydigiacomojr
youtube.com/user/liondatasystems/videos
