Ha! -- A bug! "Corrected" version inline below:
Bert Gunter
On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter wrote:
> Brute force approach, possibly inefficient:
>
> 1. You have a vector of file names. Sort them in the appropriate (time)
> order. These names are also the component names of all the data frames in
> your list that you read in, call it yourlist.
Brute force approach, possibly inefficient:
1. You have a vector of file names. Sort them in the appropriate (time)
order. These names are also the component names of all the data frames in
your list that you read in, call it yourlist.
2. Create a vector of all the unique ticker names, perhaps by
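The entry above is cut off in the archive; a minimal sketch of steps 1-2 as far as they are visible (the ".csv" file pattern and the column name "ticker" are assumptions about the poster's data):

files <- sort(list.files(pattern = "\\.csv$")) # date-named files sort into time order
yourlist <- lapply(files, read.csv)
names(yourlist) <- files
# step 2 (assumed continuation): collect the unique ticker codes across all files
tickers <- unique(unlist(lapply(yourlist, function(d) d$ticker)))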
I suspect that you want to identify which variables are highly
correlated, and then keep only "representative" variables, i.e.,
remove the redundant ones. This is a somewhat risky procedure, but I
have done such things myself at times to simplify large sets of
highly related variables. If your
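For the "keep representatives" step, one packaged option is caret::findCorrelation; a hedged sketch (the caret package is not mentioned in the thread, and the toy data is an assumption):

library(caret)
set.seed(42)
X <- matrix(rnorm(200), ncol = 4)
X <- cbind(X, X[, 1] + rnorm(50, sd = 0.05)) # add a nearly redundant column
drop <- findCorrelation(cor(X), cutoff = 0.8) # columns suggested for removal
X_reduced <- X[, -drop]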
Hi Bert,
I've attempted to find the answer and have actually been able to import the
individual data sets into a list of data frames.
But I'm not sure how to go ahead with the next step. I'm not necessarily
asking for a final answer. Perhaps if you (I mean others as well) would
like a constructive coa
So you've made no attempt at all to do this for yourself?!
That suggests to me that you need to spend time with some R tutorials.
Also, please post in plain text on this plain text list. HTML can get
mangled, as it may have here.
-- Bert
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
I have many separate data files in csv format for a lot of daily stock
prices. Over a few years there are hundreds of those data files, whose
names are the dates of the data records.
In each file the variables are ticker (or stock trading code), date,
open price, high price, low price, close price
Hi Jim,
This:
colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<3]
was the master take!
Thank you so much!!!
On Thu, Nov 14, 2019 at 3:39 PM Jim Lemon wrote:
>
> I thought you were going to trick us. What I think you are asking now
> is how to get the variable names in the columns that have at most one
> _absolute_ value greater than 0.8.
I thought you were going to trick us. What I think you are asking now
is how to get the variable names in the columns that have at most one
_absolute_ value greater than 0.8. OK:
# I'm not going to try to recreate your correlation matrix
calc.jim<-matrix(runif(100,min=-1,max=1),nrow=10)
for(i in 1:10) calc.jim[i,i]<-1 # assumed completion of the truncated loop: unit diagonal
colnames(calc.jim)<-paste0("rs",1:10) # assumed: names so the colnames() selection works
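To see why the "< 3" cutoff works in the one-liner Ana quotes above: each diagonal entry of a correlation matrix is 1 and therefore already counts as one value above 0.8, so "< 3" keeps the columns with at most one off-diagonal |r| > 0.8. A toy check (values assumed, not the poster's data):

m <- matrix(c(1.0, 0.9, 0.9,
              0.9, 1.0, 0.1,
              0.9, 0.1, 1.0), nrow = 3,
            dimnames = list(NULL, c("a", "b", "c")))
colnames(m)[colSums(abs(m) > 0.8) < 3] # "b" "c": column "a" has two large off-diagonal values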
Hi Ana,
Rather than addressing the question of why you want to do this, let's
make the question easier to answer:
calc.rho<-matrix(c(0.903,0.268,0.327,0.327,0.327,0.582,
0.928,0.276,0.336,0.336,0.336,0.598,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0.371,0.371,0.371,0.638,
0.975,0.309,0
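The snippet above is cut off in the archive; as a hedged toy continuation (not Jim's original code), one way to list the variable pairs with |r| > 0.8 from such a matrix:

rho <- matrix(c(1.0, 0.9, 0.3,
                0.9, 1.0, 0.2,
                0.3, 0.2, 1.0), nrow = 3,
              dimnames = list(c("x","y","z"), c("x","y","z")))
idx <- which(abs(rho) > 0.8 & upper.tri(rho), arr.ind = TRUE) # one entry per pair
data.frame(var1 = rownames(rho)[idx[, 1]], var2 = colnames(rho)[idx[, 2]], r = rho[idx])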
What would be the approach to remove variables that have at least 2
correlation coefficients >0.8?
This is the whole output of head():
> head(calc.rho)
rs56192520 rs3764410 rs145984817 rs1807401 rs1807402 rs35350506
rs56192520 1.000 0.976 0.927 0.927 0.927
That's assuming your data was returned by head().
> I basically want to remove all entries for pairs which have a value in
> between them (correlation calculated not in R, but it is a correlation,
> r2)
> so for example I would not keep: rs883504 because it has r2>0.8 for
> all those rs...
I'm still not sure what "remove all entries" means.
In your e
Sorry, but I don't understand your question.
When I first looked at this, I thought it was a correlation (or
covariance) matrix.
e.g.
> cor (quakes)
> cov (quakes)
However, your row and column variables are different, implying two
different data sets.
Also, some of the (correlation?) coefficien
I don't understand. I have to keep only pairs of variables with a
correlation less than 0.8 in order to proceed with some calculations.
On Thu, Nov 14, 2019 at 2:09 PM Bert Gunter wrote:
>
> Obvious advice:
>
> DON'T DO THIS!
>
> Bert Gunter
>
> "The trouble with having an open mind is that people k
Obvious advice:
DON'T DO THIS!
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Thu, Nov 14, 2019 at 10:50 AM Ana Marija wrote:
> Hello,
>
> I have a data fra
Hello,
I have a data frame like this (a matrix):
head(calc.rho)
rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
rs56192520 0.903 0.268 0.327 0.327 0.327 0.582
rs3764410 0.928 0.276 0.336 0.336 0.336 0.598
rs145984817 0.
On Thu, 14 Nov 2019 09:34:30 -0800
Dennis Fisher wrote:
> Warning message:
> In readLines(FILE, n = 1) : line 1 appears to contain an
> embedded nul
<...>
> print(STRING)
> [1] "\xff\xfet"
Most probably, this means that the FILE is UCS-2LE-encoded (or maybe
UTF-16).
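If that diagnosis is right, one fix (the file name is an assumption) is to declare the encoding on the connection rather than reading raw bytes:

con <- file("myfile.csv", encoding = "UTF-16LE") # assumed file name and encoding
STRING <- readLines(con, n = 1)
close(con)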
Thanks Bill and Jeff
strip.white did not change the outcomes.
However, your inputs led me to compare the raw content of the files (i.e.,
outside of an IDE), and I found a difference in how the apparent -99 values
were stored. In the big file, some -99 are stored as floats rather than
integers and thus inclu
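A hedged illustration of the float-vs-integer issue described above: a field written as "-99.0" does not match na.strings = "-99", so both spellings need to be listed (toy data, assumed):

s <- "A,B\n1,-99\n2,-99.0\n"
read.csv(text = s, na.strings = c("-99", "-99.0")) # both rows of B become NA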
My recommendation is:
Post on the Bioconductor site, not here.
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Thu, Nov 14, 2019 at 9:22 AM chziy429 wrote:
>
R 3.6.1
OS X
Colleagues,
I read the first line of a CSV file using the readLines command; the only
option was n=1 (I am interested in only the first line of the file)
STRING <- readLines(FILE, n=1)
to which R responded:
Warning message:
In readLines(FILE, n = 1) : line 1 appears to contain an embedded nul
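One way to diagnose this without triggering the warning is to look at the first bytes directly (file name assumed); an ff fe prefix is the UTF-16LE/UCS-2LE byte-order mark:

readBin("myfile.csv", what = "raw", n = 4) # e.g. ff fe 74 00: UTF-16LE text starting with "t"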
Dear Sir
I have downloaded the raw CEL data included in "GSE41418" from GEO and tried to
process the raw microarray data according to the following R scripts:
affydata <- ReadAffy(cdfname = "mouse4302mmentrezgcdf")
eset <- oligo::rma(affydata)
The raw data can be read by ReadAffy but fai
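A hedged guess at the failure, beyond Bert's pointer to the Bioconductor site: ReadAffy() returns an affy AffyBatch, while oligo::rma() expects oligo's own classes, so the matching normalization for this object would be affy's rma():

library(affy)
affydata <- ReadAffy(cdfname = "mouse4302mmentrezgcdf")
eset <- affy::rma(affydata) # affy::rma() dispatches on AffyBatch; oligo::rma() does not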
read.table (and friends) also have the strip.white argument:
> s <- "A,B,C\n0,0,0\n1,-99,-99\n2,-99 ,-99\n3, -99, -99\n"
> read.csv(text=s, header=TRUE, na.strings="-99", strip.white=TRUE)
A B C
1 0 0 0
2 1 NA NA
3 2 NA NA
4 3 NA NA
> read.csv(text=s, header=TRUE, na.strings="-99", strip.white=FALSE)
Consider the following sample:
#
s <- "A,B,C
0,0,0
1,-99,-99
2,-99 ,-99
3, -99, -99
"
dta_notok <- read.csv( text = s
, header=TRUE
, na.strings = c( "-99", "" )
)
dta_ok <- read.csv( text = s
, header=TRUE
, na.strings = c( "-99", "-99 ", " -99", "" ) # assumed completion: padded variants
)
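A quick way to compare the two reads (assuming the completion above): count the NAs per column; only the version that lists the padded variants converts every -99 field.

sapply(dta_notok, function(x) sum(is.na(x)))
sapply(dta_ok, function(x) sum(is.na(x)))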
The data file is a csv file. Some text variables contain spaces.
"Check for extraneous spaces"
Are there specific locations that would be more critical than others?
From: Jeff Newmiller
Sent: Thursday, November 14, 2019 10:52
To: Sebastien Bihorel ; Sebastien Bi
Check for extraneous spaces. You may need more variations of the na.strings.
On November 14, 2019 7:40:42 AM PST, Sebastien Bihorel via R-help wrote:
>Hi,
>
>I have this generic function to read ASCII data files. It is
>essentially a wrapper around the read.table function. My function is
>used i
Hi,
I have this generic function to read ASCII data files. It is essentially a
wrapper around the read.table function. My function is used in a large variety
of situations and has no a priori knowledge about the data file it is asked to
read. Nothing is known about file size, variable types, va
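A minimal sketch of such a wrapper (not the poster's actual function; the argument defaults are assumptions):

read_generic <- function(path, na = c("NA", "-99", ""), ...) {
  # defensive defaults for unknown files: strip padding, keep strings as character
  read.table(path, header = TRUE, sep = ",", na.strings = na,
             strip.white = TRUE, stringsAsFactors = FALSE, ...)
}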