On Tue, Jul 05, 2011 at 04:53:32PM +0200, albert coster wrote:

I'm taking this back to the list so others can follow up.

> Yes, the file is consists of one string (sequence) per line.
>
> The files format is following:
>
> Sequence
> NNNNNNNNNNATTAAAGGGC

OK - in that case (and as you want a vector anyway) you can use

scan('seq.txt', what=character())

> > seqfile<-read.table("seq.txt")
> Warning message:
> In read.table("seq.txt") :
>   incomplete final line found by readTableHeader on 'seq.txt'

OK - that means you don't have a newline ('\n') at the end of your
sequence file and read.table is warning you about that.

> > str(seqfile)
> 'data.frame': 2 obs. of 1 variable:
>  $ V1: Factor w/ 2 levels "NNNNNNNNNNATTAAAGGGC",..: 2 1

This indicates that there are at least two lines in the file (so you
got two levels in the factor). So I would guess there is an empty line
before your sequence or you really have the word 'Sequence' on line 1.

For sequence data it probably does not make much sense to let R
convert to factor, and a character column would be preferred. This can
be accomplished by using one of the options 'as.is', 'stringsAsFactors'
or 'colClasses'.

If you use scan you'll need to get rid of the extra line first. If you
stick with read.table you can specify the first line as your header
line using the header=TRUE option. You can then address the column by
its name, 'Sequence'. Example:

> dat <- read.table('seq.txt', as.is=TRUE, header=TRUE)
> dat$Sequence
[1] "NNNNNNNNNNATTAAAGGGC"
> dat[, 'Sequence']
[1] "NNNNNNNNNNATTAAAGGGC"
> str(dat)
'data.frame': 1 obs. of 1 variable:
 $ Sequence: chr "NNNNNNNNNNATTAAAGGGC"

cu
	Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/
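
P.S. For anyone who wants to try both routes without the original file, here is a minimal sketch. It assumes the file holds the word 'Sequence' on line 1, then one sequence per line, with no newline at the end. The temporary file stands in for seq.txt, and the names tmp, dat and seqs are made up for this illustration. It also uses scan's skip argument as one possible way to get past the header line instead of editing the file.

```r
# Stand-in for seq.txt (assumed layout: header line, then one sequence
# per line, and no newline at the end of the file).
tmp <- tempfile(fileext = ".txt")
cat("Sequence\nNNNNNNNNNNATTAAAGGGC", file = tmp)

# Route 1: read.table, treating line 1 as the header and forcing the
# column to character instead of factor. Any "incomplete final line"
# warning is harmless here, so it is suppressed.
dat <- suppressWarnings(
  read.table(tmp, header = TRUE, colClasses = "character")
)
str(dat)

# Route 2: scan into a plain character vector, skipping the header line.
seqs <- scan(tmp, what = character(), skip = 1, quiet = TRUE)

# Both routes should yield the same character vector.
stopifnot(identical(dat$Sequence, seqs))

unlink(tmp)
```

If all you need is a vector of sequences, the scan route avoids building a data frame at all; read.table is the more convenient choice once the file grows additional columns.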