Re: [R] file.link fails on NTFS

2012-12-08 Thread Rui Barradas
Hello, Checks. It seems like a Windows specific bug, it works on Ubuntu 12.04/R 2.15.2. I'll post to R-devel. > sessionInfo() R version 2.15.2 (2012-10-26) Platform: i386-w64-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252 [3] LC_MONETAR

[R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Marius Hofert
Dear expeRts, I have two matrices A and B. They have the same number of columns but possibly different number of rows. I would like to compare each row of A with each row of B and check whether all entries in a row of A are less than or equal to all entries in a row of B. Here is a minimal work

[R] Sampling from a Population

2012-12-08 Thread Lorenzo Isella
Dear All, I hope this is not too off topic, but I am sure it has to be a one-liner in R. Suppose you have a population of size N and that you take a random sample of n_s individuals out of this population. This population includes a subgroup of n_i individuals. For any individual in n_i, what

Re: [R] Sampling from a Population

2012-12-08 Thread R. Michael Weylandt
Hi Lorenzo, This has the feel of a homework problem, but I will suggest to you that this is "sampling without replacement" and there exist easy mathematical formulas (no need to resort to R) to calculate your desired probability. Michael On Sat, Dec 8, 2012 at 11:54 AM, Lorenzo Isella wrote: >
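
The "easy mathematical formulas" for sampling without replacement can be checked numerically in R. The question above is truncated, so this is only a sketch under the assumption that it concerns whether, or how many of, the n_i subgroup members land in the sample. The values of N, n_s and n_i are illustrative and do not come from the thread.

```r
# Illustrative values only; none of these numbers come from the thread
N   <- 100   # population size
n_s <- 10    # sample size, drawn without replacement
n_i <- 20    # size of the subgroup

# Probability that one given individual ends up in the sample: n_s / N
p_single <- 1 - dhyper(0, m = 1, n = N - 1, k = n_s)

# Hypergeometric distribution of the number of subgroup members sampled
hits   <- 0:min(n_s, n_i)
p_hits <- dhyper(hits, m = n_i, n = N - n_i, k = n_s)

# Probability that at least one subgroup member is sampled
p_any <- 1 - p_hits[1]

# Expected number of subgroup members in the sample: n_s * n_i / N
expected <- sum(hits * p_hits)
```

Here `dhyper()` from base R's stats package plays the role of the closed-form formula; no simulation is needed.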

Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Thomas Stewart
One option is to consider a Kronecker-type expansion. See code below. -tgs perhaps <- function(A,B){ nA <- nrow(A) nB <- nrow(B) C <- kronecker(matrix(1,nrow=nA,ncol=1),B) >= kronecker(A,matrix(1,nrow=nB,ncol=1)) matrix(rowSums(C) == ncol(A), nA, nB, byrow=TRUE) } Marius <- function(A,B) apply

Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Hofert Jan Marius
Nice idea, Thomas, thanks. I could further decrease run time a bit, by building the required matrices by hand. Any other ideas? Marius <- function(A, B) apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b))) perhaps <- function(A, B){ nA <- nrow(A) nB <- nrow(B) C <- kroneck

[R] KMP String search

2012-12-08 Thread email
Hi: Is there any Package in R which implements the KMP String search algorithm ? Thanks John

Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread arun
Hi, Maybe this: N <- 1000 M <- 5 P <- 5000 set.seed(15) A <- matrix(runif(N,1,1000),nrow=N,ncol=M) set.seed(425) B <- matrix(runif(M,1,1000),nrow=P,ncol=M) Marius.3.0<-function(A,B){do.call(cbind,lapply(split(B,row(B)),function(x) colSums(x>=t(A))==ncol(A)))} system.time(Marius.3.0(A,B)) # u

Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread arun
Hi, Just to add: N <- 1000 M <- 5 P <- 5000 set.seed(15) A <- matrix(runif(N,1,1000),nrow=N,ncol=M) set.seed(425) B <- matrix(runif(M,1,1000),nrow=P,ncol=M) Marius.3.0<-function(A,B){do.call(cbind,lapply(split(B,row(B)),function(x) colSums(x>=t(A))==ncol(A)))} Marius.2.0 <- function(A, B){    

[R] read.table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Hi List, I have spent more than 30 minutes, but failed to read in this file using the read.table() function. I could not figure out how to fix the following error. Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 1 did not have 6 elements Any help would be

Re: [R] How to efficiently compare each row in a matrix with each row in another matrix?

2012-12-08 Thread Hofert Jan Marius
The idea is good, but you don't need to create a list of the rows of B first; apply does the job: Marius.4.0 <- function(A, B) apply(B, 1, function(x) colSums(x>=t(A))==ncol(A)) That was actually a bit faster than your version. This is the fastest version so far. I compared it with C code
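
To make the comparison in this thread concrete, the sketch below runs the naive double-apply version and the vectorised Marius.4.0 version (both taken from the posts above) on small matrices. The matrices A and B here are illustrative and are not the poster's data.

```r
# Naive version from the thread: entry [i, j] is TRUE when every entry of
# row i of A is <= the corresponding entry of row j of B.
Marius <- function(A, B)
  apply(B, 1, function(b) apply(A, 1, function(a) all(a <= b)))

# Vectorised version from the thread: t(A) holds the rows of A as columns,
# so x >= t(A) compares one row of B against every row of A at once.
Marius.4.0 <- function(A, B)
  apply(B, 1, function(x) colSums(x >= t(A)) == ncol(A))

# Illustrative data (3 rows in A, 4 rows in B, 2 columns each)
A <- matrix(c(1, 2,
              3, 4,
              5, 0), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 4,
              1, 1,
              9, 9,
              5, 2), ncol = 2, byrow = TRUE)

res_naive <- Marius(A, B)
res_fast  <- Marius.4.0(A, B)
```

Both calls return an nrow(A) by nrow(B) logical matrix. The vectorised form relies on R recycling the row of B down each column of t(A), which is why A is transposed first.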

Re: [R] read.table()

2012-12-08 Thread Prof Brian Ripley
On 08/12/2012 19:10, Muhuri, Pradip (SAMHSA/CBHSQ) wrote: Hi List, I have spent more than 30 minutes, but failed to read in this file using the read.table() function. I could not figure out how to fix the following error. Well, we have a whole manual on this, mentioned on ?read.table (see Se

[R] print and cat not working with parallelised functions?

2012-12-08 Thread Martin Ivanov
Dear R Community, I am running R version 2.15.2 with package parallel version 2.15.2. The problem is that cat and print do not produce any output. Also assigning objects to the .GlobalEnv does not work. This makes it difficult for me to debug code. This can be seen from the following minimal wo

[R] Oracle Approximating Shrinkage in R?

2012-12-08 Thread Matt Considine
Hi, Can anyone point me to an implementation in R of the oracle approximating shrinkage technique for covariance matrices? Rseek, Google, etc. aren't turning anything up for me. Thanks in advance, Matt Considine

Re: [R] print and cat not working with parallelised functions?

2012-12-08 Thread Uwe Ligges
On 08.12.2012 21:04, Martin Ivanov wrote: Dear R Community, I am running R version 2.15.2 with package parallel version 2.15.2. The problem is that cat and print do not produce any output. Also assigning objects to the .GlobalEnv does not work. This makes it difficult for me to debug code.

Re: [R] imputation in mice

2012-12-08 Thread David L Carlson
What do > str(data) > summary(data) show you? The str() function will show you what kind of variables you have and the summary() command will indicate the range of the values and if there are missing data. You seem to be overwriting your original data frame "data" (really a bad name to use sin

Re: [R] read.table()

2012-12-08 Thread David L Carlson
If you look at the first few lines, you can see the problem. Your category "race" has labels that contain spaces and you've told read.table() to separate the variables using whitespace (including spaces) so read.table() sees six variables in this line, but only five variable names in the first lin

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Dear Prof Ripley, Your hint is helpful, and I see considerable improvements in the results. The only issue is that the column names do not seem to be correct. I did not understand part of your comment, which says "fortunes::fortune(14) applies" although I read about the double colon operator-

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Dear Arun, The issue is that the column names are incorrect. I will also look into the comment by Prof Ripley. Thanks for your continued support and help. Pradip > str(read.delim(textConnection(xd1),header=TRUE,sep="\t")) 'data.frame': 195 obs. of 1 variable: $ raceage...percent..sepe

Re: [R] KMP String search

2012-12-08 Thread Rui Barradas
Hello, As far as I know, the answer is no, there isn't. Hope this helps, Rui Barradas On 08-12-2012 17:44, email wrote: Hi: Is there any Package in R which implements the KMP String search algorithm ? Thanks John
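
Since no package is pointed to in this thread, the algorithm itself can be sketched directly in base R. This is a minimal, unoptimised illustration of Knuth-Morris-Pratt matching; the function name kmp_search is hypothetical and does not come from any package.

```r
# Hypothetical helper: returns the 1-based start positions of every
# (possibly overlapping) occurrence of `pattern` in `text`.
kmp_search <- function(pattern, text) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(text, "")[[1]]
  m <- length(p)
  n <- length(s)
  if (m == 0L || n < m) return(integer(0))

  # Failure table: fail[i] is the length of the longest proper prefix of
  # p[1:i] that is also a suffix of p[1:i].
  fail <- integer(m)
  k <- 0L
  if (m >= 2L) {
    for (i in 2:m) {
      while (k > 0L && p[i] != p[k + 1L]) k <- fail[k]
      if (p[i] == p[k + 1L]) k <- k + 1L
      fail[i] <- k
    }
  }

  # Scan the text once; k is the number of pattern characters matched so far.
  hits <- integer(0)
  k <- 0L
  for (i in seq_len(n)) {
    while (k > 0L && s[i] != p[k + 1L]) k <- fail[k]
    if (s[i] == p[k + 1L]) k <- k + 1L
    if (k == m) {
      hits <- c(hits, i - m + 1L)
      k <- fail[k]   # fall back so overlapping matches are found
    }
  }
  hits
}

found <- kmp_search("aba", "ababa")
```

For everyday use, base R's `regexpr()` and `gregexpr()` with `fixed = TRUE` already perform literal substring search; the sketch above is only meant to show the KMP failure-table idea.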

Re: [R] cannot read iso639 table

2012-12-08 Thread Prof Brian Ripley
For the record, in R-devel you can do f <- read.table(url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt", encoding = "UTF-8-BOM"), quote="", sep="|", stringsAsFactors=FALSE) f[1,] V1 V2 V3 V4 V5 1 aar aa Afar afar charToRaw(f[1,1]) [1] 61 61 72 Whether this works wi

Re: [R] read. table()

2012-12-08 Thread David L Carlson
Arun's solution works but you lose your spaces in the race field. These commands will preserve them. We need to make sure that your file has two or more spaces between each field. The first gsub() command strips leading space. The second inserts a space before the digit 1 (that is where all the fie
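
The idea of keeping the spaces inside the race labels can be sketched as follows. This is not David's code, which is cut off above; it assumes that fields are separated by two or more spaces while a label contains at most single spaces. The sample lines are hypothetical and only loosely modelled on the values quoted in this thread.

```r
# Hypothetical lines: fields are separated by two or more spaces, and the
# race label may itself contain a single space.
txt <- c("race            age    percent  sepercent  flag_var",
         "Mexican         12-17  5.7926   0.64195    any",
         "  C-S American  12-17  0.2399   0.15804    coc")

# Strip leading and trailing blanks, then turn every run of 2+ spaces
# into a tab so that single spaces inside labels survive.
clean <- gsub(" {2,}", "\t", gsub("^ +| +$", "", txt))

dat <- read.delim(text = clean, stringsAsFactors = FALSE)
```

After this, `dat$race` holds "C-S American" as a single value, and read.delim() sees five tab-separated fields on every line, matching the five column names.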

Re: [R] read.table()

2012-12-08 Thread Tanja Vukov
Hi! I think you have a problem with "flag_var". I suggest using just "flagvar". Do not use "_" in variable names. I would suggest not using either "_" or "-" anywhere in the data file. I am just a beginner with R but think that is the problem... Cheers! Tanja. On Sat, Dec 8, 2012 at 8:29 PM, Prof Brian

[R] Mean-Centering Question

2012-12-08 Thread Ray DiGiacomo, Jr.
Hello, I'm trying to create a custom function that "mean-centers" data and can be applied across many columns. Here is an example dataset, which is similar to my dataset: *Location,TimePeriod,Units,AveragePrice* Los Angeles,5/1/11,61,5.42 Los Angeles,5/8/11,49,4.69 Los Angeles,5/15/11,40,5.05 Ne

[R] Dbscan Clustering Feature Question

2012-12-08 Thread anthony kasza
Hello list. My apologies if this topic has been discussed before on the list but I was unable to find it. I'm working on a way to cluster PCAP files according to the events recorded within them. I've decided to use Bro-IDS for feature extraction. I am looking at dbscan within the FPC library to acc

[R] Why my lapply doesn't work with FUN=as.Date

2012-12-08 Thread CHEN, Cheng
Hi, guys I don't understand why I can apply as.Date to a single item in the list: > as.Date(alldays[4]) [1] "29-03-20" but when I try to lapply as.Date to all the items, I got a sequence of negative numbers: > sapply(alldays[1:4], FUN=as.Date) 03-04-2012 02-04-2012 30-03-2012 29-03-2012 -718323

Re: [R] read. table()

2012-12-08 Thread arun
Hi, You can check the str() I assume it will be like this:  str(read.delim(textConnection(Lines),header=TRUE,sep="\t")) #'data.frame':    195 obs. of  1 variable: # $ raceage...percent..sepercent..flag_var: Factor w/ 195 levels "    C-S American 12-17  0.2399   0.15804  coc",..: 50 170

Re: [R] read. table()

2012-12-08 Thread arun
HI Pradip, Try this: source("Muhuri.txt") #Muhuri.txt Lines<-  "race    age   percent  sepercent  flag_var Mexican 12-17  5.7926   0.64195  any- " Lines1<-readLines(te

Re: [R] read. table()

2012-12-08 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Dear David and Arun, Thank you very much for your time and efforts and for resolving the issue. From this exchange, I have learned something new about reading the data files into R. Regards, Pradip Pradip K. Muhuri, PhD Statistician Substance Abuse & Mental Health Services Administration Th

Re: [R] read. table()

2012-12-08 Thread arun
Hi, David's method is much better than mine. Regarding the spaces in the race field, this should preserve them if you wish to try my method. source("Muhuri.txt") Lines1<-readLines(textConnection(Lines))  Col1new<-gsub(" +$","",gsub("\\s+(\\D+)[[:digit:]]+\\+.*","\\1",gsub("\\s+(\\D+)[[:digit:]]

Re: [R] Why my lapply doesn't work with FUN=as.Date

2012-12-08 Thread David Winsemius
On Dec 8, 2012, at 1:34 PM, CHEN, Cheng wrote: Hi, guys I don't understand why I can apply as.Date to a single item in the list: as.Date(alldays[4]) [1] "29-03-20" but when I try to lapply as.Date to all the items, i got a sequence of neg numbers: sapply(alldays[1:4], FUN=as.Date) 0

Re: [R] Mean-Centering Question

2012-12-08 Thread Elizabeth Fuller Bettini
please remove me from this list. On Sat, Dec 8, 2012 at 6:54 PM, Ray DiGiacomo, Jr. wrote:

Re: [R] Mean-Centering Question

2012-12-08 Thread David Winsemius
On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote: Hello, I'm trying to create a custom function that "mean-centers" data and can be applied across many columns. Here is an example dataset, which is similar to my dataset: dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
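
One way to mean-center several columns within each location is sketched below with base R's ave(). This is not the poster's custom function, which is not shown in full here. The three Los Angeles rows come from the question; the "Other City" rows are hypothetical, added only so that there is a second group.

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
Other City,5/1/11,30,6.00
Other City,5/8/11,50,7.00",
                  header = TRUE, sep = ",", stringsAsFactors = FALSE)

# Subtract the mean from every value
center <- function(x) x - mean(x)

# Apply the centering to each chosen column, separately within each Location
cols <- c("Units", "AveragePrice")
centered <- dat
centered[cols] <- lapply(dat[cols],
                         function(x) ave(x, dat$Location, FUN = center))
```

ave() returns a vector of the same length and order as its input, so the centered values can be assigned straight back into the data frame, which by() does not do without extra reshaping.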

Re: [R] read. table()

2012-12-08 Thread David Winsemius
On Dec 8, 2012, at 2:20 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote: Dear Arun, The issue is that the column names are incorrect. You have been given misinformation in this regard. Your column names were valid and not the source of your problems. The underscore causes no problems with names.

Re: [R] Mean-Centering Question

2012-12-08 Thread David Winsemius
On Dec 8, 2012, at 7:06 PM, Elizabeth Fuller Bettini wrote: please remove me from this list. You subscribed and only you know the password that allows you to control the subscription options. Please use the links at the bottom of every posting to Rhelp. On Sat, Dec 8, 2012 at 6:54 PM,

[R] (Re-posted as Plain Text ) Modelling a skew-normal distribution using glm/ mgcv

2012-12-08 Thread Saptarshi Guha
Hello, [ Sorry, I sent the last email as HTML, this time it's in plain text ] Suppose my variable,S, (time for something to start) is a skew-normal distribution [1]. Can glm and mgcv handle this type of distribution for the dependent variable? Regards Saptarshi [1] http://azzalini.stat.unipd.it/S

Re: [R] Why my lapply doesn't work with FUN=as.Date

2012-12-08 Thread Rolf Turner
On 09/12/12 10:34, CHEN, Cheng wrote: Hi, guys I don't understand why I can apply as.Date to a single item in the list: as.Date(alldays[4]) [1] "29-03-20" but when I try to lapply as.Date to all the items, i got a sequence of neg numbers: sapply(alldays[1:4], FUN=as.Date) 03-04-2012 02-04-
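
Two separate effects appear to be at work in this question, and both can be reproduced with the four date strings shown in the post. The sketch assumes the strings are in day-month-year order, which is how they read.

```r
alldays <- c("03-04-2012", "02-04-2012", "30-03-2012", "29-03-2012")

# Without a format, as.Date() tries "%Y-%m-%d" first, so "29" is taken
# as the year and the result is a date in year 29.
wrong <- as.Date(alldays[4])

# sapply() simplifies its result to a plain numeric vector, which drops
# the "Date" class and leaves day counts relative to 1970-01-01.
nums <- sapply(alldays, as.Date, format = "%d-%m-%Y")

# as.Date() is vectorised: pass the whole vector with an explicit format.
dates <- as.Date(alldays, format = "%d-%m-%Y")
```

Because as.Date() already works on whole vectors, no lapply() or sapply() is needed; if a loop is unavoidable, keeping the result as a list with lapply() preserves the Date class of each element.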

Re: [R] Mean-Centering Question

2012-12-08 Thread Ray DiGiacomo, Jr.
Hi David and Arun, Thanks for looking into this. I think I have found a solution. The "by" function will run ok without errors but the values returned in the second row of the "Los Angeles" output are both incorrect. These incorrect values are shown below in red. I think my original custom fun

Re: [R] Mean-Centering Question

2012-12-08 Thread arun
Hi, It works for me also:  by(dat1[c("Units","AveragePrice")],dat1[,1],specialFunction) #dat1[, 1]: Los Angeles  # Units AveragePrice #1  0.2136827  0.071790268 #2  2.2735148 -2.351758623 #3 -0.2083118  0.001082696 -- #or  by(cbind(Units=dat1[,3],A