Dear useRs,

I have a small, basic problem that I am hoping to get some help with. I have
a data frame, testSeq_df, with 1 row and 500 columns. Each column holds a
single character (a, c, g or t). I want every column to be a factor with
4 levels (a, c, g, t). When I try the following:

for(i in 1:500){
    if (length(levels(testSeq_df[,i]))==1)
    levels(testSeq_df[,i]) <- c(a="a",g="g",c="c",t="t")}

it replaces every value in the sequence with "a", so all columns become
"a". How do I fix this so that the columns retain their original values
but still have 4 levels, i.e. a, c, g and t?

Thanks a lot for your help,

Vishal
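
A minimal sketch of one way to get this behaviour, assuming each column is a one-level factor as described above. The data frame toy_df and the names bases, relabelled and fixed_df are hypothetical stand-ins, not objects from the original script. Assigning to levels() relabels the existing levels by position, which is why the single existing level turns into "a"; rebuilding each column with factor(x, levels = ...) keeps the values and only widens the level set.

```r
# Hypothetical stand-in for testSeq_df: 1 row, a handful of columns,
# each column a factor with a single level.
bases  <- c("a", "c", "g", "t")
toy_df <- data.frame(V1 = "g", V2 = "t", V3 = "a", V4 = "c",
                     stringsAsFactors = TRUE)

# What goes wrong: `levels<-` renames levels by position, so the one
# existing level ("g") is renamed to the first new label.
relabelled <- toy_df$V1
levels(relabelled) <- bases
as.character(relabelled)   # "a", not "g"

# One possible fix: rebuild every column as a factor with the full level
# set. The values are matched by name, so they are preserved.
fixed_df   <- toy_df
fixed_df[] <- lapply(toy_df, function(col) {
  factor(as.character(col), levels = bases)
})

str(fixed_df)   # every column should now report 4 levels
```

Using fixed_df[] <- lapply(...) keeps the object a data frame with its original column names; the same pattern should apply to all 500 columns without an explicit loop.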

On Sat, Dec 26, 2009 at 10:39 AM, Vishal Thapar <vishaltha...@gmail.com> wrote:

> Hi David,
>
> Thank you so much for the pointer. I get it now. I did try
> str(testSeq_df), and since it showed more than 2 levels for each column, I
> believed that it was fine. I get the point clearly now. Thanks again for all
> your help. I really appreciate it.
>
> Sincerely,
>
> vishal
>
>
> On Sat, Dec 26, 2009 at 8:26 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>>
>> On Dec 26, 2009, at 3:53 AM, Vishal Thapar wrote:
>>
>>  Hi All,
>>>
>>> Thank you for your replies so far. I was hoping I could get some more
>>> input from you on this issue. It seems to me that I have hit a dead end here
>>> and would really appreciate some feedback. I have followed all the
>>> suggestions you have mentioned but I am still stuck. Earlier I
>>> thought that it was a "factor" issue but now even that is not the error.
>>> Here is the script and the error. Thanks for your help. I have attached the
>>> sample test file as well as the training file in case you would like to run
>>> it locally.
>>> ---------------------------------
>>> library(seqinr)
>>> library("kernlab")
>>>
>>
>>
>> > str(mars500_1_df)
>> 'data.frame':   256 obs. of  501 variables:
>> All of which are factors with 4 levels
>>
>>
>>  testSeq_fa=read.fasta("temp1.fasta")
>>> testSeq_seq=t(getSequence(testSeq_fa))
>>> testSeq_df=as.data.frame(testSeq_seq,stringsAsFactors=FALSE)
>>> testSeq_df = cbind(Class="-",testSeq_df)
>>> testSeq_df = data.frame(lapply(testSeq_df,factor))
>>>
>> > str(testSeq_df)
>> 'data.frame':   20 obs. of  501 variables:
>>
>> $ V9   : Factor w/ 3 levels "a","c","t": 2 1 2 1 3 2 3 2 3 1 ...
>> $ V26  : Factor w/ 3 levels "a","g","t": 2 1 1 1 1 3 1 3 1 3 ...
>> ...and about 10 more...
>>
>> So I think you were closer but not quite there yet.
>> > for(i in 11:501){if (length(levels(testSeq_df[,i])) == 3)
>>                levels(testSeq_df[,i])<- c(a="a",g="g",c="c",t="t")}
>>
>>
>> > predict(mars500_1,testSeq_df)
>>  [1] - - - - + - + - - - - + + + - - - - - +
>> Levels: - +
>>
>> YES, it WAS (and still is) a "factor issue". You were shown how to look at
>> objects with str. Why have you not adopted the practice?
>>
>> --
>>
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>>
>
>
> --
> Vishal Thapar, Ph.D.
> Post Doctoral Researcher
> Cold Spring Harbor Lab
> Williams Bldg
>
> 1 Bungtown Road
> Cold Spring Harbor, NY - 11724
>
>




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
