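Check out the arguments for read.table, especially 'quote'. By default both " and ' are treated as quoting characters, which is why everything between your two apostrophes ends up in one field. You probably want quote='' to suppress the special meaning of quote characters. You might also need comment.char in the future (its default, '#', starts a comment).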
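A minimal sketch of that suggestion, assuming a small sample modelled on three rows from the message quoted below. The temporary file is a hypothetical stand-in for testFile.txt, and stringsAsFactors = FALSE is only there to keep the text columns as plain character vectors.

```r
# Hypothetical sample: three rows copied from the example input,
# written to a temporary file that stands in for testFile.txt.
rows <- c("3499\t9031\t424823\tCOP'B2\t118094989\tXP_422637.2",
          "3499\t7955\t114454\tcopb2\t50080158\tNP_001001940.1",
          "3499\t4896\t2540050\tsec'27\t19113604\tNP_596811.1")
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# quote = "" turns off quote handling, so apostrophes are ordinary characters.
# comment.char = "" does the same for '#', in case it ever appears in the data.
testDat <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                      stringsAsFactors = FALSE)
unlink(tmp)

dim(testDat)   # rows and columns actually read
testDat$V4     # the column that contained the apostrophes
```

With quoting disabled, each input line should come back as its own row with six fields, and the apostrophes should survive in V4 instead of being stripped.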

On Tue, Feb 15, 2011 at 12:21 PM, Robert M. Flight <rfligh...@gmail.com> wrote:
> Say I have a tab-delimited table I want to read into R. What should I
> expect to happen if some of the entries contain the character " ' "? I
> thought it would read the file fine, but that is not what happens.
> Instead, all the values in between two " ' "s get read into one field,
> and things are just seriously messed up. Is this a bug, and besides
> removing the offending characters, is there a fix?
>
> Example Input file:
>
> testFile.txt:
> 3499 9031 424823 COP'B2 118094989 XP_422637.2
> 3499 7955 114454 copb2 50080158 NP_001001940.1
> 3499 7227 45757 betaCop 24584107 NP_524836.2
> 3499 7165 1278426 AgaP_AGAP004798 158297839 XP_318012.4
> 3499 6239 177779 F38E11.5 17540286 NP_501671.1
> 3499 4896 2540050 sec'27 19113604 NP_596811.1
> 3499 4932 852740 SEC27 6321301 NP_011378.1
> 3499 28985 2897447 KLLA0B01958g 50303353 XP_451618.1
> 3499 33169 4621659 AGOS_AFL118W 45198403 NP_985432.1
> 3499 148305 2682116 MGG_10504 145615762 XP_366285.2
> 3499 5141 2709504 NCU07319.1 32414251 XP_327605.1
> 3499 3702 820842 AT3G15980 30683862 NP_850592.1
> 3499 3702 841666 AT1G52360 15218215 NP_175645.1
> 3499 3702 844339 AT1G79990 30699476 NP_178116.2
> 3499 4530 4340097 Os06g0143900 115466360 NP_001056779.1
>
> testDat <- read.table('testFile.txt',sep='\t')
> testDat
>
>       V1     V2      V3
> 1   3499   9031  424823
> 2   3499   4932  852740
> 3   3499  28985 2897447
> 4   3499  33169 4621659
> 5   3499 148305 2682116
> 6   3499   5141 2709504
> 7   3499   3702  820842
> 8   3499   3702  841666
> 9   3499   3702  844339
> 10  3499   4530 4340097
>
>     V4
> 1   COPB2\t118094989\tXP_422637.2\n3499\t7955\t114454\tcopb2\t50080158\tNP_001001940.1\n3499\t7227\t45757\tbetaCop\t24584107\tNP_524836.2\n3499\t7165\t1278426\tAgaP_AGAP004798\t158297839\tXP_318012.4\n3499\t6239\t177779\tF38E11.5\t17540286\tNP_501671.1\n3499\t4896\t2540050\tsec27
> 2   SEC27
> 3   KLLA0B01958g
> 4   AGOS_AFL118W
> 5   MGG_10504
> 6   NCU07319.1
> 7   AT3G15980
> 8   AT1G52360
> 9   AT1G79990
> 10  Os06g0143900
>
>            V5             V6
> 1    19113604    NP_596811.1
> 2     6321301    NP_011378.1
> 3    50303353    XP_451618.1
> 4    45198403    NP_985432.1
> 5   145615762    XP_366285.2
> 6    32414251    XP_327605.1
> 7    30683862    NP_850592.1
> 8    15218215    NP_175645.1
> 9    30699476    NP_178116.2
> 10  115466360 NP_001056779.1
>
> I would appreciate any feedback.
>
> Thanks,
>
> -Robert
>
> > sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.12.1
>
>
> Robert M. Flight, Ph.D.
> University of Louisville Bioinformatics Laboratory
> University of Louisville
> Louisville, KY
>
> PH 502-852-1809 (HSC)
> PH 502-852-0467 (Belknap)
> EM robert.fli...@louisville.edu
> EM rfligh...@gmail.com
>
> Williams and Holland's Law:
> If enough data is collected, anything may be proven by
> statistical methods.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?