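Hi Ace,
You can read the file first to find the largest number of fields on any line:

# return the largest number of sep-delimited fields on any line of file
max_fields<-function(file,sep=" ") {
 rlines<-readLines(file)
 return(max(unlist(lapply(sapply(rlines,strsplit,sep),length))))
}
# the file name must be quoted
nmax<-max_fields("test.txt","\t")
# use nmax instead of a hard-coded number of columns
testdf<-read.table("test.txt",header=FALSE,fill=TRUE,sep="\t",
 col.names=paste("V",1:nmax,sep=""),stringsAsFactors=FALSE)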
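Base R's utils package also provides count.fields(), which returns the number of fields on each line of a file, so the widest line can be found without splitting the lines by hand. A minimal sketch, assuming a tab-separated file with no quoted fields or comment characters; read_ragged() is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: read a ragged, tab-separated file into a data frame
# with as many columns as its widest line.
read_ragged <- function(file, sep = "\t") {
  # count.fields() (utils) gives the number of fields on each line;
  # quote = "" and comment.char = "" disable quote and comment handling
  nfields <- count.fields(file, sep = sep, quote = "", comment.char = "")
  ncol <- max(nfields, na.rm = TRUE)
  # A col.names vector of length ncol fixes the column count, and
  # fill = TRUE pads shorter lines with empty strings
  read.table(file, header = FALSE, sep = sep, fill = TRUE,
             quote = "", comment.char = "",
             col.names = paste0("V", seq_len(ncol)),
             stringsAsFactors = FALSE)
}

# Usage: three records from the thread written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  paste("0610007P14Rik%%%", "Tcf19", "Gtf2i", sep = "\t"),
  paste("1100001G20Rik%%%", "Nmi", sep = "\t"),
  paste("1810011O10Rik%%%", "Ppp1r13b", "Bpnt1", "Cdkn2c", "Foxc1",
        "Sox10", "Smarca2", sep = "\t")
), tmp)
test <- read_ragged(tmp)
dim(test)  # 3 rows, 7 columns
```

The same quote and comment.char settings are passed to both calls so that the counts from count.fields() agree with how read.table() actually splits the lines; a file with quoted fields would need those arguments set to match it.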
Jim

On Wed, Aug 30, 2017 at 2:22 AM, Fix Ace <ace...@rocketmail.com> wrote:
> Thank you very much! It looks like I have to know the length of each record
> ahead of time.
>
> Ace
>
> On Monday, August 28, 2017 12:56 AM, Jim Lemon <drjimle...@gmail.com> wrote:
>
> Hi Ace,
> With tabs as separators:
>
> testdf<-read.table("test.txt",header=FALSE,fill=TRUE,sep="\t",
> col.names=paste("V",1:19,sep=""),stringsAsFactors=FALSE)
>
> Also note that I got the number of columns wrong the first time.
>
> Jim
>
> On Mon, Aug 28, 2017 at 12:56 PM, Fix Ace <ace...@rocketmail.com> wrote:
>> Hi, Jim,
>>
>> Thank you very much for pointing out the format issue. Here is the
>> original text:
>>
>> ===
>> I have a text file (test.txt) with different numbers of columns:
>>
>> 0610007P14Rik%%% Tcf19 Gtf2i
>> 0610010O12Rik%%% Ivns1abp Etv6
>> 1100001G20Rik%%% Nmi
>> 1500015O10Rik%%% Foxi1 Ascl3 Sirt3
>> 1700003E16Rik%%% Ascl2 Ifnar2
>> 1700028J19Rik%%% Musk Nfe2l3
>> 1810011O10Rik%%% Ppp1r13b Bpnt1 Cdkn2c Foxc1 Sox10 Smarca2
>> 1810019D21Rik%%% Asb8
>> 1810037I17Rik%%% Zfp612
>> 1810055G02Rik%%% Nkx2-3 Maged1 Runx1 Ugp2 Elk4 Spdef Tcf19 Isl2 Gtf2i Ctnnbl1 Tcea3 Ank2 Zfp612 Creb3l1 Nupr1 3632451O06Rik Creb3l4 Lass6
>>
>> I would like to read it into R using
>>
>> > test=read.csv("test.txt",sep="\t",header=FALSE)
>>
>> However, when I check the R object "test", I found that all the rows have
>> 5 columns:
>>
>> > test
>>    V1 V2 V3 V4 V5
>> 1 0610007P14Rik%%% Tcf19 Gtf2i
>> 2 0610010O12Rik%%% Ivns1abp Etv6
>> 3 1100001G20Rik%%% Nmi
>> 4 1500015O10Rik%%% Foxi1 Ascl3 Sirt3
>> 5 1700003E16Rik%%% Ascl2 Ifnar2
>> 6 1700028J19Rik%%% Musk Nfe2l3
>> 7 1810011O10Rik%%% Ppp1r13b Bpnt1 Cdkn2c Foxc1
>> 8 Sox10 Smarca2
>> 9 1810019D21Rik%%% Asb8
>> 10 1810037I17Rik%%% Zfp612
>> 11 1810055G02Rik%%% Nkx2-3 Maged1 Runx1 Ugp2
>> 12 Elk4 Spdef Tcf19 Isl2 Gtf2i
>> 13 Ctnnbl1 Tcea3 Ank2 Zfp612 Creb3l1
>> 14 Nupr1 3632451O06Rik Creb3l4 Lass6
>>
>> Basically it breaks some rows into more than one row.
>> For example, row 7 in the original record becomes two rows. It looks like
>> "test" always has 5 columns.
>>
>> How does this happen? How should I fix it so that each record becomes one
>> row in the R object?
>>
>> ==
>>
>> Please let me know if it is readable now. Thank you very much for your
>> time!
>>
>> Kind regards,
>>
>> Ace
>>
>> On Sunday, August 27, 2017 7:25 PM, Jim Lemon <drjimle...@gmail.com>
>> wrote:
>>
>> Hi Ace,
>> As your example seems to have spaces as separators,
>>
>> testdf<-read.table("test.txt",header=FALSE,fill=TRUE,
>> col.names=paste("V",1:14,sep=""),stringsAsFactors=FALSE)
>>
>> By specifying the number of columns with "col.names" and using
>> "fill=TRUE" you can get a data frame with zero-length strings where
>> values are missing in the input file.
>>
>> Jim
>>
>> On Mon, Aug 28, 2017 at 6:25 AM, Fix Ace via R-help
>> <r-help@r-project.org> wrote:
>>> Dear R community,
>>> I have a text file (test.txt) with different numbers of columns:
>>> 0610007P14Rik%%% Tcf19 Gtf2i 0610010O12Rik%%% Ivns1abp Etv6
>>> 1100001G20Rik%%% Nmi 1500015O10Rik%%% Foxi1 Ascl3 Sirt3 1700003E16Rik%%%
>>> Ascl2 Ifnar2 1700028J19Rik%%% Musk Nfe2l3 1810011O10Rik%%% Ppp1r13b Bpnt1
>>> Cdkn2c Foxc1 Sox10 Smarca2 1810019D21Rik%%% Asb8 1810037I17Rik%%% Zfp612
>>> 1810055G02Rik%%% Nkx2-3 Maged1 Runx1 Ugp2 Elk4 Spdef Tcf19 Isl2 Gtf2i
>>> Ctnnbl1 Tcea3 Ank2 Zfp612 Creb3l1 Nupr1 3632451O06Rik Creb3l4 Lass6
>>> I would like to read it into R using
>>> > test=read.csv("test.txt",sep="\t",header=FALSE)
>>> However, when I check the R object "test", I found that all the rows have
>>> 5 columns:
>>>> test V1 V2 V3 V4 V51
>>>> 0610007P14Rik%%% Tcf19 Gtf2i 2 0610010O12Rik%%%
>>>> Ivns1abp Etv6 3 1100001G20Rik%%% Nmi
>>>> 4 1500015O10Rik%%% Foxi1 Ascl3 Sirt3 5
>>>> 1700003E16Rik%%%
>>>> Ascl2 Ifnar2 6 1700028J19Rik%%% Musk Nfe2l3
>>>> 7 1810011O10Rik%%% Ppp1r13b Bpnt1 Cdkn2c Foxc18 Sox10
>>>> Smarca2 9 1810019D21Rik%%% Asb8
>>>> 10 1810037I17Rik%%% Zfp612 11
>>>> 1810055G02Rik%%%
>>>> Nkx2-3 Maged1 Runx1 Ugp212 Elk4 Spdef Tcf19
>>>> Isl2
>>>> Gtf2i13 Ctnnbl1 Tcea3 Ank2 Zfp612 Creb3l114
>>>> Nupr1 3632451O06Rik Creb3l4 Lass6
>>> Basically it breaks some rows into more than one row. For example, row 7
>>> in the original record becomes two rows. It looks like "test" always has
>>> 5 columns.
>>> How does this happen? How should I fix it so that each record becomes one
>>> row in the R object?
>>> Thank you very much!
>>> Ace
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
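As for why the rows wrap: read.table(), which read.csv() calls, decides the number of columns by looking at the first five lines of input (or at col.names, if that is longer), and read.csv() defaults to fill = TRUE. A later line with more fields than that is therefore continued onto extra rows instead of raising an error, which matches the output above, where the 7-field record became rows 7 and 8 and the 19-field record became rows 11 to 14. The sketch below reproduces this with made-up placeholder data (the single letters are not from the thread) and shows that a long enough col.names avoids it:

```r
# Made-up placeholder data: the first five lines have at most 3 fields,
# the sixth line has 7.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "d\te", "f", "g\th\ti", "j\tk",
             "l\tm\tn\to\tp\tq\tr"), tmp)

# The column count comes from the first five lines (3), and read.csv()
# defaults to fill = TRUE, so the 7-field line wraps onto three rows.
wrapped <- read.csv(tmp, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
dim(wrapped)  # 8 rows, 3 columns

# A col.names vector at least as long as the widest line avoids the wrap.
fixed <- read.csv(tmp, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
                  col.names = paste0("V", 1:7))
dim(fixed)  # 6 rows, 7 columns
```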