Dear Erik and Wacek,
Please stop working on my problem. I deleted the second column and the
problem is gone. I don't know why, but apparently the second column somehow
interfered with the third, so that the third column was read as 'factor'
rather than 'numeric'.
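
A column is read as a factor (as character in R 4.0 and later) when at least one of its entries fails to parse as a number. The following is a minimal sketch of how such entries could be located and the rest converted. It uses a made-up stand-in for the third column; the value "n/a" is hypothetical and not taken from the actual file.

```r
# Toy stand-in for a column that was read as a factor (values are made up).
x <- factor(c("-1.6", "0.18", "n/a", "1.63"))

# as.numeric(x) would return the internal level codes, not the values,
# so convert to character first.
chr <- as.character(x)
num <- suppressWarnings(as.numeric(chr))

# Entries that were not NA as text but failed to parse are the culprits.
bad <- which(is.na(num) & !is.na(chr))
chr[bad]
```

Inspecting chr[bad] on the real column should show which entries are responsible, which in turn hints at whether fields were shifted while the file was read.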
I can restore the 2nd column, which holds the gene symbols, later, so I won't
worry about it for now. I just don't want you to spend your precious time on
this.
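
One possible explanation, which nothing in this thread confirms: by default read.table treats ' and " as quote characters and # as a comment character, so a gene symbol containing one of them can swallow or truncate fields, and fill = TRUE then pads the damaged rows silently. The sketch below shows a read that switches both behaviours off so the symbol column can stay in the file. The data, the symbol 5'NT and the tab separator are all assumptions made for illustration.

```r
# Hypothetical tab-separated data; the second symbol contains an apostrophe.
lines <- c("Probe_ID\tGene_Symbol\tS1",
           "p1\tABC1\t-1.6",
           "p2\t5'NT\t0.18",
           "p3\tXYZ2\t1.63")

# quote = "" and comment.char = "" make read.table take ' " and # literally.
good <- read.table(text = lines, header = TRUE, sep = "\t",
                   quote = "", comment.char = "", fill = TRUE,
                   stringsAsFactors = FALSE)

str(good)
is.numeric(good$S1)
```

On the real file, count.fields() with the same sep, quote and comment.char settings can be used to check that every line has the expected number of fields before fill = TRUE hides any mismatch.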
Thanks much,
Allen
On Thu, Jun 12, 2008 at 8:01 PM, ss <[EMAIL PROTECTED]> wrote:
> Thanks, Erik. I will try your code soon.
>
> I did this first:
>
> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> +     row.names = NULL, header = TRUE, fill = TRUE)
> > class(data[[3]])
> [1] "factor"
> > is.numeric(data[[3]])
> [1] FALSE
> >
>
> So it is not numeric but 'factor' instead.
> Can I convert this column to numeric?
>
> Allen
>
>
> On Thu, Jun 12, 2008 at 7:48 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:
>
>>
>>
>> ss wrote:
>>
>>> It is:
>>>
>>> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> +     row.names = NULL, header = TRUE, fill = TRUE)
>>> > class(data[3])
>>> [1] "data.frame"
>>> >
>>>
>>>
>> Oops, should have said class(data[[3]]) and
>> is.numeric(data[[3]])
>>
>> See ?Extract
>>
>>
>>> And if I try to use as.matrix(read.table()), I got:
>>>
>>> > data <- as.matrix(read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> +     row.names = NULL, header = TRUE, fill = TRUE))
>>> > data[1:4,1:4]
>>>      Probe_ID       Gene_Symbol M16012391010920 M16012391010525
>>> [1,] "A_23_P105862" "13CDNA73"  "-1.6"          " 0.16"
>>> [2,] "A_23_P76435"  "15E1.2"    "0.18"          " 0.59"
>>> [3,] "A_24_P402115" "15E1.2"    "1.63"          "-0.62"
>>> [4,] "A_32_P227764" "15E1.2"    "-0.76"         "-0.42"
>>> You see they are surrounded by "".
>>>
>>> I don't see the quotes if I just use read.table:
>>>
>>>
>> That is because matrices (objects of class 'matrix') are of homogeneous
>> type. as.matrix() therefore converts everything to character (including
>> the numbers), which you certainly do NOT want.
>>
>> You want a data.frame. I will provide an example of what I think you are
>> after.
>>
>> Try the following commands and see how they compare to your situation;
>> they work for me.
>>
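>> test <- data.frame(x = factor(rep(c("A", "B"), each = 13)), y = rnorm(26),
>>                    z = rnorm(26))
>>
>> test
>>
>> class(test)            # "data.frame"
>>
>> is.numeric(test[[2]])  # TRUE
>>
>> is.numeric(test[[3]])  # TRUE
>>
>> rowMeans(test)         # error: column x is a factor, not numeric
>>
>> rowMeans(test[2:3])    # works: only the numeric columns are used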
>>
>>> > data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> +     row.names = NULL, header = TRUE, fill = TRUE)
>>> > data[1:4,1:4]
>>>       Probe_ID Gene_Symbol M16012391010920 M16012391010525
>>> 1 A_23_P105862    13CDNA73            -1.6            0.16
>>> 2  A_23_P76435      15E1.2            0.18            0.59
>>> 3 A_24_P402115      15E1.2            1.63           -0.62
>>> 4 A_32_P227764      15E1.2           -0.76           -0.42
>>>
>>>
>>> Thanks,
>>> Allen
>>>
>>>
>>>
>>> On Thu, Jun 12, 2008 at 7:34 PM, Erik Iverson <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>
>>> ss wrote:
>>>
>>> Hi Wacek,
>>>
>>> Yes, data is a data frame, not a matrix.
>>>
>>> is.numeric(data[3])
>>>
>>> [1] FALSE
>>>
>>>
>>> What is class(data[3])?
>>>
>>>
>>> But I looked at column 3 and it looks okay. There are a few NAs, and I
>>> did not find anything strange.
>>>
>>> Any suggestions?
>>>
>>> Thanks,
>>> Allen
>>>
>>>
>>>
>>> On Thu, Jun 12, 2008 at 7:01 PM, Wacek Kusnierczyk <[EMAIL PROTECTED]> wrote:
>>>
>>> ss wrote:
>>>
>>> Thank you very much, Wacek! It works very well.
>>> But there is a minor problem. I did the following:
>>>
>>> data <- read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>>> +     row.names = NULL, header = TRUE, fill = TRUE)
>>>
>>> looks like you have a data frame, not a matrix
>>>
>>>
>>> dim(data)
>>>
>>> [1] 23963 85
>>>
>>> data[1:4,1:4]
>>>
>>>       Probe_ID Gene_Symbol M16012391010920 M16012391010525
>>> 1 A_23_P105862    13CDNA73            -1.6            0.16
>>> 2  A_23_P76435      15E1.2            0.18            0.59
>>> 3 A_24_P402115      15E1.2            1.63           -0.62
>>> 4 A_32_P227764      15E1.2           -0.76           -0.42
>>>
>>> data1<-data[sapply(data, is.numeric)]
>>> dim(data1)
>>>
>>> [1] 23963 82
>>>
>>> data1[1:4,1:4]
>>>
>>>   M16012391010525 M16012391010843 M16012391010531 M16012391010921
>>> 1            0.16           -0.23           -1.40            0.90
>>> 2            0.59            0.28           -0.30            0.08
>>> 3           -0.62           -0.62           -0.22           -0.18
>>> 4           -0.42            0.01            0.28           -0.79
>>>
>>> You will notice that, after using 'data[sapply(data, is.numeric)]' to
>>> get data1, the first sample in data, called 'M16012391010920', is
>>> missing from data1.
>>>
>>> Any further suggestions?
>>>
>>> surely there must be an entry in column 3 that makes it non-numeric.
>>> what does is.numeric(data[3]) say? (NAs should not make a column
>>> non-numeric, unless there are only NAs there, which is not the case
>>> here.) check your data for non-numeric entries in column 3, there can
>>> be a typo.
>>>
>>> vQ
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> [email protected] mailing list
>>>
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>