If you can reliably find the start of each document (say that line always starts with "Document"), then:

yourfile[grep("^Document [0-9]+", yourfile) + 4]

should give you the line 4 lines below each line where "Document" occurs, where yourfile is the character vector returned by readLines(). No loop needed :)

On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjk...@gmail.com> wrote:
> Hi Josh,
> Sorry for the insufficient introduction. This might work, but I'm not sure.
> The file that I have includes up to 100 documents (Document 1, Document 2,
> Document 3....Document 100) with the newspaper name following 4 lines below
> each Document number.
> I'm using readLines to get the text file into R and then trying to use grep
> to get the newspaper name for each record. But your idea of indexing the text
> object read into R with the line number where the newspaper name is found is
> a good one. I'll just have to come up with a loop to tell R to get the 4th,
> 8th, 12th, 16th line, etc.
> I'll see if I can get that to work.
> Simon
> On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
>
>> Dear Simon,
>>
>> Maybe I don't understand properly... if you are doing this in R, can't
>> you just pick the line you want?
>>
>> Josh
>>
>> ## print your data to clipboard
>> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file = "clipboard")
>> ## read data in, and only select the 4th line to pass to grep()
>> grep("pattern", x = readLines("clipboard")[4])
>>
>>
>> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>> Dear colleagues,
>>> I have a series of newspaper articles in a text file, downloaded from a
>>> text file. They look as follows:
>>>
>>> Document 1 of 100
>>> \n
>>> \n
>>> \n
>>> Newspaper Name
>>> \n
>>> \n
>>> Day Date
>>>
>>> I have a series of grep scripts that can extract the date and convert it to
>>> a date object, but I can't figure out how to grep the newspaper name.
>>> There is no field ID attached to those lines. The best I can come up with
>>> would be to have the program grep the four lines following each line matching
>>> the pattern "Document [0-9]".
>>> There is an argument to grep in Unix that can do this
>>> (grep -A4 'pattern' infile > outfile), but I don't know if there is
>>> an equivalent argument in R.
>>>
>>> Any thoughts?
>>> Yours, Simon Kiss
>>> *********************************
>>> Simon J. Kiss, PhD
>>> Assistant Professor, Wilfrid Laurier University
>>> 73 George Street
>>> Brantford, Ontario, Canada
>>> N3T 2C9
>>> Cell: +1 905 746 7606
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> https://joshuawiley.com/
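Base R's grep() has no counterpart to the -A flag of Unix grep, but the same effect comes from arithmetic on the match indices, which is the idea in the reply at the top of this thread. The sketch below is a minimal illustration under stated assumptions, not code from the thread: grep_after() and line_below() are hypothetical helper names, and the sample lines are made-up stand-ins for the output of readLines() on the real file. It assumes each record starts with a line like "Document 1 of 100" and that the newspaper name sits a fixed number of lines below it (4 in Simon's description).

```r
# Sketch only: hypothetical helpers, not part of base R or the original thread.
# Assumption: every record starts with "Document <n> of <N>" and the newspaper
# name is a fixed number of lines below that header line.

# Hypothetical sample standing in for readLines("yourfile.txt").
lines <- c("Document 1 of 100", "", "", "", "Newspaper A", "", "", "Day Date",
           "Document 2 of 100", "", "", "", "Newspaper B", "", "", "Day Date")

# Emulate Unix `grep -A after`: indices of each matching line plus the
# `after` lines that follow it, clipped at the end of the vector.
grep_after <- function(pattern, x, after = 4) {
  hits <- grep(pattern, x)
  idx <- lapply(hits, function(h) h:min(h + after, length(x)))
  sort(unique(as.integer(unlist(idx))))
}

# Return the line exactly `offset` lines below each match.
# Indexing past the end of `x` yields NA rather than an error.
line_below <- function(pattern, x, offset = 4) {
  x[grep(pattern, x) + offset]
}

header <- "^Document [0-9]+"
papers <- line_below(header, lines)   # c("Newspaper A", "Newspaper B")
context <- lines[grep_after(header, lines, after = 4)]
```

Unlike GNU grep, grep_after() does not emit "--" separators between non-adjacent groups; it only returns line indices, so lines[grep_after(...)] gives each matching line plus its trailing context. If the real file puts the name 3 lines down, as in the cat() mock-up earlier in the thread, pass offset = 3 to line_below().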