Dear useRs,

I have a small, basic problem that I am hoping to get some help with. I have a data frame, testSeq_df, with 1 row and 500 columns. Each column holds a single base (a, c, g or t). I want every column to be a factor with all 4 levels (a, c, g, t). When I try the following:
for (i in 1:500) {
  if (length(levels(testSeq_df[, i])) == 1)
    levels(testSeq_df[, i]) <- c(a = "a", g = "g", c = "c", t = "t")
}

it replaces every value in the sequence with "a", so all columns become "a". How do I fix this so that the columns retain their original values but still have all 4 levels, i.e. a, c, g, t?

Thanks a lot for your help,

Vishal

On Sat, Dec 26, 2009 at 10:39 AM, Vishal Thapar <vishaltha...@gmail.com> wrote:
> Hi David,
>
> Thank you so much for the pointer. I get it now. I did try
> str(testSeq_df), and since it showed more than 2 levels for each column, I
> believed that it was fine. I see the point clearly now. Thanks again for all
> your help. I really appreciate it.
>
> Sincerely,
>
> vishal
>
> On Sat, Dec 26, 2009 at 8:26 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Dec 26, 2009, at 3:53 AM, Vishal Thapar wrote:
>>
>>> Hi All,
>>>
>>> Thank you for your replies so far. I was hoping I could get some more
>>> input from you on this issue. It seems to me that I have hit a dead end here
>>> and would really appreciate some feedback. I have followed all the
>>> suggestions you have mentioned, but it is still stuck. Earlier I
>>> thought that it was a "factor" issue, but now even that is not the error.
>>> Here is the script and the error. Thanks for your help. I have attached the
>>> sample test file as well as the training file in case you would like to run
>>> it locally.
>>> ---------------------------------
>>> library(seqinr)
>>> library("kernlab")
>>
>> > str(mars500_1_df)
>> 'data.frame': 256 obs. of 501 variables:
>> All of which are factors with 4 levels
>>
>>> testSeq_fa=read.fasta("temp1.fasta")
>>> testSeq_seq=t(getSequence(testSeq_fa))
>>> testSeq_df=as.data.frame(testSeq_seq,stringsAsFactors=FALSE)
>>> testSeq_df = cbind(Class="-",testSeq_df)
>>> testSeq_df = data.frame(lapply(testSeq_df,factor))
>>
>> > str(testSeq_df)
>> 'data.frame': 20 obs. of 501 variables:
>>
>> $ V9 : Factor w/ 3 levels "a","c","t": 2 1 2 1 3 2 3 2 3 1 ...
>> $ V26 : Factor w/ 3 levels "a","g","t": 2 1 1 1 1 3 1 3 1 3 ...
>> ...and about 10 more...
>>
>> So I think you were closer but not quite there yet.
>>
>> > for(i in 11:501){if (length(levels(testSeq_df[,i])) == 3)
>> levels(testSeq_df[,i])<- c(a="a",g="g",c="c",c="t")}
>>
>> > predict(mars500_1,testSeq_df)
>> [1] - - - - + - + - - - - + + + - - - - - +
>> Levels: - +
>>
>> YES, it WAS (and still is) a "factor issue". You were shown how to look at
>> objects with str. Why have you not adopted the practice?
>>
>> --
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>
> --
> Vishal Thapar, Ph.D.
> Post Doctoral Researcher
> Cold Spring Harbor Lab
> Williams Bldg
> 1 Bungtown Road
> Cold Spring Harbor, NY - 11724
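For anyone finding this thread later: the behaviour in the first question follows from how levels<- works on a factor. Assigning a character vector renames the existing levels by position (the names in c(a = "a", ...) are ignored), appends any extra entries as unused levels, and keeps each value pointing at the same position. A one-level column therefore ends up with every value relabelled as the first new level, "a". The sketch below uses small made-up vectors rather than the real sequence data; it shows this effect and the usual fix of rebuilding the factor with factor(..., levels = ...).

```r
# Made-up one-level column, e.g. a position where only "c" occurs
x <- factor(c("c", "c", "c"))

# levels<- relabels by position: old level 1 ("c") becomes "a", and
# "c", "g", "t" are appended as unused levels, so every value reads "a"
y <- x
levels(y) <- c("a", "c", "g", "t")
print(as.character(y))  # "a" "a" "a"

# The same positional rule applies to a three-level column: relabelling
# a, c, t as a, g, c, t turns every "c" into "g" and every "t" into "c"
w <- factor(c("a", "c", "t"))
levels(w) <- c("a", "g", "c", "t")
print(as.character(w))  # "a" "g" "c"

# Rebuilding the factor from its character values with an explicit level
# set keeps the data intact and adds the missing levels
z <- factor(as.character(x), levels = c("a", "c", "g", "t"))
print(as.character(z))  # "c" "c" "c"
print(levels(z))        # "a" "c" "g" "t"
```

Because factor() matches values to levels by name rather than by position, it does not matter which levels a column already has.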
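Applied to a whole data frame, the same idea becomes a single lapply() over the sequence columns. This is a sketch with a tiny hypothetical stand-in for testSeq_df (a Class column plus three base columns). It assumes, as in the script quoted above, that the only non-base column is named "Class"; that column is left alone because "-" is not one of the four bases and would otherwise become NA.

```r
# Hypothetical stand-in for testSeq_df: a Class column plus base columns
testSeq_df <- data.frame(Class = "-", V1 = "c", V2 = "g", V3 = "a",
                         stringsAsFactors = FALSE)

bases <- c("a", "c", "g", "t")

# Every column except the (assumed) "Class" column holds bases
seq_cols <- setdiff(names(testSeq_df), "Class")

# Rebuild each sequence column as a factor over the full alphabet;
# as.character() makes this work whether the column is character or factor
testSeq_df[seq_cols] <- lapply(testSeq_df[seq_cols], function(col)
  factor(as.character(col), levels = bases))

str(testSeq_df)
```

If the training data frame is at hand, passing levels = levels(mars500_1_df[[nm]]) for each column nm instead of the fixed bases vector may be the safer choice, since it reuses exactly the level order the model was trained on.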