The file is 20 MB and has 2 million rows. I understand that it contains two different formats: 6 columns and 7 columns. How do I read chunks into different files using scan, modifying the skip and nlines parameters?
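For reference, here is a minimal sketch of how the count.fields / scan idea might look. It assumes the file has no blank lines and that every line has either 6 or 7 comma-separated fields. The sample data, the output file names and the variables runs, starts and cons are made up for illustration; they are not from the thread.

```r
# Hypothetical sample file standing in for the real 2-million-row file
fileName <- tempfile(fileext = ".txt")
writeLines(c(
  "a,Boston,1968,13,0,1",
  "b,Chicago,1967,44,0,1",
  "c,Raleigh, North Carol,1968,25,0,1",   # extra comma, so 7 fields
  "d,Providence,1969,48,0,1"
), fileName)

# Number of comma-separated fields on each line (quotes and '#' treated as data)
nFields <- count.fields(fileName, sep = ",", quote = "", comment.char = "")

# Consecutive lines with the same field count form one chunk
runs   <- rle(nFields)
starts <- cumsum(c(0, head(runs$lengths, -1)))  # lines to skip before each chunk

# One output file per format (hypothetical names), kept open while writing
out6 <- tempfile(fileext = "_6cols.txt")
out7 <- tempfile(fileext = "_7cols.txt")
cons <- list("6" = file(out6, "w"), "7" = file(out7, "w"))

for (k in seq_along(runs$lengths)) {
  # Read the chunk as whole lines: skip what precedes it, read only its lines
  chunk <- scan(fileName, what = "", sep = "\n", quote = "",
                skip = starts[k], nlines = runs$lengths[k], quiet = TRUE)
  writeLines(chunk, cons[[as.character(runs$values[k])]])
}
for (con in cons) close(con)

# Each output file now has a single format and can be read normally
six   <- read.table(out6, sep = ",", quote = "", comment.char = "")
seven <- read.table(out7, sep = ",", quote = "", comment.char = "")
```

Note that every scan call reopens the file and skips from the top, so this gets slow if the two formats alternate often. Since 20 MB fits in memory, reading everything once with readLines and then using split(input, nFields) would probably be quicker in that case.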
On Mon, Mar 19, 2012 at 3:59 PM, Petr PIKAL <petr.pi...@precheza.cz> wrote:
>
> I would follow Jims suggestion,
> nFields <- count.fields(fileName, sep = ',')
> count fields and read chunks to different files by using scan with
> modifying skip and nlines parameters. However if there is only few lines
> which differ it would be better to correct those few lines manually in
> some suitable editor.
>
> Elaborating omnipotent function for reading any kind of
> corrupted/nonstandard files seems to me suited only if you expect to read
> such files many times.
>
> Regards
> Petr
>
>>
>> On Sat, Mar 17, 2012 at 4:54 AM, jim holtman <jholt...@gmail.com> wrote:
>> > Here is a solution that looks for the line with 7 elements and inserts
>> > the quotes:
>> >
>> >> fileName <- '/temp/text.txt'
>> >> input <- readLines(fileName)
>> >> # count the fields to find 7
>> >> nFields <- count.fields(fileName, sep = ',')
>> >> # now fix the data
>> >> for (i in which(nFields == 7)){
>> > +     # split on comma
>> > +     z <- strsplit(input[i], ',')[[1]]
>> > +     input[i] <- paste(z[1], z[2]
>> > +         , paste('"', z[3], ',', z[4], '"', sep = '') # put on quotes
>> > +         , z[5], z[6], z[7], sep = ','
>> > +         )
>> > + }
>> >>
>> >> # now read in the data
>> >> result <- read.table(textConnection(input), sep = ',')
>> >>
>> >> result
>> > V1 V2 V3 V4 V5 V6
>> > 1 1968 21 0
>> > 2 Boston 1968 13 0
>> > 3 Boston 1968 18 0
>> > 4 Chicago 1967 44 0
>> > 5 Providence 1968 17 0
>> > 6 Providence 1969 48 0
>> > 7 Binky 1968 24 0
>> > 8 Chicago 1968 23 0
>> > 9 Dally 1968 7 0
>> > 10 Raleigh, North Carol 1968 25 0
>> > 11 Addy ABC-Dogs Stars-W8.1 Providence 1968 38 0
>> > 12 DEF_REQPRF/ Dartmouth 1967 31 1
>> > 13 PL 1967 38 1
>> > 14 XY PopatLal 1967 5 1
>> > 15 XY PopatLal 1967 6 8
>> > 16 XY PopatLal 1967 7 7
>> > 17 XY PopatLal 1967 9 1
>> > 18 XY PopatLal 1967 10 1
>> > 19 XY PopatLal 1967 13 1
>> > 20 XY PopatLal Boston 1967 6 1
>> > 21 XY PopatLal Boston 1967 7 11
>> > 22 XY PopatLal Boston 1967 9 2
>> > 23 XY PopatLal Boston 1967 10 3
>> > 24 XY PopatLal Boston 1967 7 2
>> >>
>> >
>> >
>> > On Fri, Mar 16, 2012 at 2:17 PM, Ashish Agarwal
>> > <ashish.agarw...@gmail.com> wrote:
>> >> I have a file that is 5000 records and to edit that file is not easy.
>> >> Is there any way to line 10 differently to account for changes in the
>> >> third field?
>> >>
>> >> On Fri, Mar 16, 2012 at 11:35 PM, Peter Ehlers <ehl...@ucalgary.ca> wrote:
>> >>> On 2012-03-16 10:48, Ashish Agarwal wrote:
>> >>>>
>> >>>> Line 10 has City and State that too separated by comma. For line 10
>> >>>> how can I read differently as compared to the other lines?
>> >>>
>> >>>
>> >>> Edit the file and put quotes around the city-state combination:
>> >>> "Raleigh, North Carol"
>> >>>
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>> >
>> > --
>> > Jim Holtman
>> > Data Munger Guru
>> >
>> > What is the problem that you are trying to solve?
>> > Tell me what you want to do, not how you want to do it.