Re: [R] How to import and create time series data frames in an efficient way?

2019-11-14 Thread Bert Gunter
Ha! -- A bug! "Corrected" version inline below: Bert Gunter On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter wrote: > Brute force approach, possibly inefficient: > > 1. You have a vector of file names. Sort them in the appropriate (time) > order. These names are also the component names of all the da

Re: [R] How to import and create time series data frames in an efficient way?

2019-11-14 Thread Bert Gunter
Brute force approach, possibly inefficient: 1. You have a vector of file names. Sort them in the appropriate (time) order. These names are also the component names of all the data frames in your list that you read in, call it yourlist. 2. Create a vector of all the unique ticker names, perhaps by
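A minimal sketch of the two steps described above, assuming yourlist already holds one data frame per daily file; the column names ticker and date are assumptions, not taken from the thread:

yourlist <- yourlist[order(names(yourlist))]   # 1. sort the daily data frames by date (file name)
all_days <- do.call(rbind, yourlist)           # stack every day into one data frame
tickers <- unique(all_days$ticker)             # 2. all unique ticker codes
per_ticker <- lapply(tickers, function(tk) {   # one time-ordered data frame per ticker
  d <- all_days[all_days$ticker == tk, ]
  d[order(d$date), ]
})
names(per_ticker) <- tickers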

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Peter Langfelder
I suspect that you want to identify which variables are highly correlated, and then keep only "representative" variables, i.e., remove redundant ones. This is a bit of a risky procedure, but I have sometimes done it myself to simplify large sets of highly related variables. If your
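One ready-made shortcut for this kind of pruning (not necessarily the approach Peter has in mind) is caret::findCorrelation(), which scans a correlation matrix and suggests columns to drop so that no remaining pair exceeds the cutoff; a small sketch with simulated data:

library(caret)                                  # provides findCorrelation()
set.seed(1)
x <- matrix(rnorm(200), ncol = 10)              # toy data: 20 observations, 10 variables
rho <- cor(x)
drop_idx <- findCorrelation(rho, cutoff = 0.8)  # column indices suggested for removal
x_reduced <- if (length(drop_idx)) x[, -drop_idx, drop = FALSE] else x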

Re: [R] How to import and create time series data frames in an efficient way?

2019-11-14 Thread Nhan La
Hi Bert, I've attempted to find the answer and actually been able to import the individual data sets into a list of data frames. But I'm not sure how to go ahead with the next step. I'm not necessarily asking for a final answer. Perhaps if you (I mean others as well) would like a constructive coa

Re: [R] How to import and create time series data frames in an efficient way?

2019-11-14 Thread Bert Gunter
So you've made no attempt at all to do this for yourself?! That suggests to me that you need to spend time with some R tutorials. Also, please post in plain text on this plain text list. HTML can get mangled, as it may have here. -- Bert "The trouble with having an open mind is that people keep

[R] How to import and create time series data frames in an efficient way?

2019-11-14 Thread Nhan La
I have many separate data files in csv format for a lot of daily stock prices. Over a few years there are hundreds of those data files, whose names are the dates of data record. In each file there are variables of ticker (or stock trading code), date, open price, high price, low price, close price
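A small sketch of the first step most of the replies assume, reading every daily file into a named list; the folder name "daily_prices" is a placeholder:

files <- list.files("daily_prices", pattern = "\\.csv$", full.names = TRUE)
yourlist <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(yourlist) <- tools::file_path_sans_ext(basename(files))  # file names are the record dates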

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
Hi Jim, This: colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<3] was the master take! Thank you so much!!! On Thu, Nov 14, 2019 at 3:39 PM Jim Lemon wrote: > > I thought you were going to trick us. What I think you are asking now > is how to get the variable names in the columns that have at mos

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Jim Lemon
I thought you were going to trick us. What I think you are asking now is how to get the variable names in the columns that have at most one _absolute_ value greater than 0.8. OK: # I'm not going to try to recreate your correlation matrix calc.jim<-matrix(runif(100,min=-1,max=1),nrow=10) for(i in 1
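The preview cuts Jim's example off at the for() loop, so here is a self-contained reconstruction of the same idea rather than his exact code; the loop is replaced by a vectorised way of making the matrix symmetric:

set.seed(42)
calc.jim <- matrix(runif(100, min = -1, max = 1), nrow = 10)
calc.jim[lower.tri(calc.jim)] <- t(calc.jim)[lower.tri(calc.jim)]  # mirror the upper triangle
diag(calc.jim) <- 1                                                # unit self-correlation
dimnames(calc.jim) <- list(paste0("v", 1:10), paste0("v", 1:10))
# keep names with at most one off-diagonal |r| > 0.8
# (the threshold is < 3 because the diagonal always contributes one count)
colnames(calc.jim)[colSums(abs(calc.jim) > 0.8) < 3]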

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Jim Lemon
Hi Ana, Rather than addressing the question of why you want to do this, let's make the question easier to answer: calc.rho<-matrix(c(0.903,0.268,0.327,0.327,0.327,0.582, 0.928,0.276,0.336,0.336,0.336,0.598, 0.975,0.309,0.371,0.371,0.371,0.638, 0.975,0.309,0.371,0.371,0.371,0.638, 0.975,0.309,0
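The matrix values are truncated in the preview, so the sketch below uses stand-in numbers; the which(..., arr.ind = TRUE) step at the end is an addition that lists the offending pairs, not part of Jim's message:

calc.rho <- matrix(runif(36, 0.2, 1), nrow = 6)                  # stand-in for Ana's values
dimnames(calc.rho) <- list(paste0("row_rs", 1:6), paste0("col_rs", 1:6))
high <- which(calc.rho > 0.8, arr.ind = TRUE)                    # positions above the threshold
data.frame(row_var = rownames(calc.rho)[high[, "row"]],
           col_var = colnames(calc.rho)[high[, "col"]],
           value   = calc.rho[high])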

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
What would be the approach to remove a variable that has at least 2 correlation coefficients >0.8? This is the whole output of head(): > head(calc.rho) rs56192520 rs3764410 rs145984817 rs1807401 rs1807402 rs35350506 rs56192520 1.000 0.976 0.927 0.927 0.927

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Abby Spurdle
That's assuming your data was returned by head().

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Abby Spurdle
> I basically want to remove all entries for pairs which have value in > between them (correlation calculated not in R, but it is correlation, > r2) > so for example I would not keep: rs883504 because it has r2>0.8 for > all those rs... I'm still not sure what "remove all entries" means? In your e

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Abby Spurdle
Sorry, but I don't understand your question. When I first looked at this, I thought it was a correlation (or covariance) matrix. e.g. > cor (quakes) > cov (quakes) However, your row and column variables are different, implying two different data sets. Also, some of the (correlation?) coefficien

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
I don't understand. I have to keep only pairs of variables with correlation less than 0.8 in order to proceed with some calculations On Thu, Nov 14, 2019 at 2:09 PM Bert Gunter wrote: > > Obvious advice: > > DON'T DO THIS! > > Bert Gunter > > "The trouble with having an open mind is that people k

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Bert Gunter
Obvious advice: DON'T DO THIS! Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Thu, Nov 14, 2019 at 10:50 AM Ana Marija wrote: > Hello, > > I have a data fra

[R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
Hello, I have a data frame like this (a matrix): head(calc.rho) rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995 rs56192520 0.903 0.268 0.327 0.327 0.327 0.582 rs3764410 0.928 0.276 0.336 0.336 0.336 0.598 rs145984817 0.

Re: [R] Problem related to multibyte string in CSV file

2019-11-14 Thread Ivan Krylov
On Thu, 14 Nov 2019 09:34:30 -0800 Dennis Fisher wrote: > Warning message: > In readLines(FILE, n = 1) : line 1 appears to contain an > embedded nul <...> > print(STRING) > [1] "\xff\xfet” Most probably, this means that the FILE is UCS-2LE-encoded (or maybe UTF-16).
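If that diagnosis is right, the usual fix is to declare the encoding on a connection so that readLines() converts the text; a sketch, where FILE is the path used earlier and the exact encoding name ("UTF-16LE", "UTF-16" or "UCS-2LE") depends on the platform's iconv:

con <- file(FILE, encoding = "UTF-16LE")
STRING <- readLines(con, n = 1)   # first line, converted to the session encoding
close(con)
# a leading byte-order mark (the \xff\xfe seen above) may still need to be stripped
STRING <- sub("^\ufeff", "", STRING)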

Re: [R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Sebastien Bihorel via R-help
Thanks Bill and Jeff. strip.white did not change the outcomes. However, your inputs led me to compare the raw content of the files (ie, outside of an IDE) and I found differences in how the apparent -99 were stored. In the big file, some -99 are stored as floats rather than integers and thus inclu
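Given that finding, one fix (a sketch, not the poster's actual code) is to list the float spellings of the missing-value code in na.strings, or to compare numerically after reading:

s <- "A,B,C\n0,0,0\n1,-99,-99.0\n2,-99.00,-99\n"   # toy input mixing integer and float -99
d1 <- read.csv(text = s, na.strings = c("-99", "-99.0", "-99.00"))
d2 <- read.csv(text = s)                            # or read the numbers as-is ...
d2[d2 == -99] <- NA                                 # ... and blank out -99 afterwards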

Re: [R] Ask for help on the preprocessing of GEO microarray data with Oligo

2019-11-14 Thread Bert Gunter
My recommendation is: Post on the BioConductor site, not here. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Thu, Nov 14, 2019 at 9:22 AM chziy429 wrote: >

[R] Problem related to multibyte string in CSV file

2019-11-14 Thread Dennis Fisher
R 3.6.1 OS X Colleagues, I read the first line of a CSV file using the readLines command; the only option was n=1 (I am interested in only the first line of the file) STRING <- readLines(FILE, n=1) to which R responded: Warning message: In readLines(FILE, n = 1) : line

[R] Ask for help on the preprocessing of GEO microarray data with Oligo

2019-11-14 Thread chziy429
Dear Sir, I have downloaded the raw CEL data included in "GSE41418" from GEO and tried to process the raw microarray data according to the following R scripts: affydata <- ReadAffy(cdfname = "mouse4302mmentrezgcdf") eset <- oligo::rma(affydata) The raw data can be read by ReadAffy but fai
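Bert's advice to ask on Bioconductor stands; still, without seeing the error a likely cause is mixing the two packages, since affy::ReadAffy() returns an AffyBatch that oligo::rma() does not accept. A hedged sketch of two self-consistent routes (the CEL-file directory is a placeholder):

# Route 1: stay within affy, keeping the custom BrainArray CDF
library(affy)
affydata <- ReadAffy(celfile.path = ".", cdfname = "mouse4302mmentrezgcdf")
eset <- affy::rma(affydata)

# Route 2: stay within oligo (uses oligo's standard platform annotation, not the custom CDF)
library(oligo)
raw <- read.celfiles(list.celfiles(".", full.names = TRUE))
eset2 <- oligo::rma(raw)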

Re: [R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread William Dunlap via R-help
read.table (and friends) also have the strip.white argument: > s <- "A,B,C\n0,0,0\n1,-99,-99\n2,-99 ,-99\n3, -99, -99\n" > read.csv(text=s, header=TRUE, na.strings="-99", strip.white=TRUE) A B C 1 0 0 0 2 1 NA NA 3 2 NA NA 4 3 NA NA > read.csv(text=s, header=TRUE, na.strings="-99", strip.whi
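The preview cuts off before the contrasting call; a runnable reconstruction of the comparison, with the second call's outcome described in comments rather than copied from the original post:

s <- "A,B,C\n0,0,0\n1,-99,-99\n2,-99 ,-99\n3, -99, -99\n"
read.csv(text = s, header = TRUE, na.strings = "-99", strip.white = TRUE)
# with strip.white = TRUE the space-padded fields are trimmed first, so every -99 becomes NA
read.csv(text = s, header = TRUE, na.strings = "-99", strip.white = FALSE)
# without stripping, "-99 " and " -99" are not matched by na.strings and stay in the data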

Re: [R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Jeff Newmiller
Consider the following sample: # s <- "A,B,C 0,0,0 1,-99,-99 2,-99 ,-99 3, -99, -99 " dta_notok <- read.csv( text = s , header=TRUE , na.strings = c( "-99", "" ) ) dta_ok <- read.csv( text = s , header=TRUE

Re: [R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Sebastien Bihorel via R-help
The data file is a csv file. Some text variables contain spaces. Regarding "Check for extraneous spaces": are there specific locations that would be more critical than others?

Re: [R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Jeff Newmiller
Check for extraneous spaces. You may need more variations of the na.strings. On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help wrote: >Hi, > >I have this generic function to read ASCII data files. It is >essentially a wrapper around the read.table function. My function is >used i

[R] Can file size affect how na.strings operates in a read.table call?

2019-11-14 Thread Sebastien Bihorel via R-help
Hi, I have this generic function to read ASCII data files. It is essentially a wrapper around the read.table function. My function is used in a large variety of situations and has no a priori knowledge about the data file it is asked to read. Nothing is known about file size, variable types, va
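For readers skimming the archive, a toy illustration of the kind of wrapper being described; the function name and defaults are invented for the example, and the real function is not shown in the preview:

read_ascii <- function(path, na.codes = c("-99", ".", ""), ...) {
  # thin, data-agnostic wrapper: nothing is assumed about the file's variables
  read.table(path, header = TRUE, sep = ",", na.strings = na.codes,
             strip.white = TRUE, stringsAsFactors = FALSE, ...)
}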