On Jul 11, 2011, at 12:00, Bert Gunter <gunter.ber...@gene.com> wrote:
> Simon:
>
> Basic basic stuff (not grep -- the stuff thereafter). Please read the
> docs, especially the tutorial, An Intro to R.
>
> ... and Josh's solution can be shortened to (as he knows):
>
> index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)
>

Really? Won't the 2 and 4 get recycled, so that alternating elements
returned from grep will have 2 or 4 added instead of 2 *and* 4?

My understanding is that Simon has a single file with, for example,

Document 1 on line 1
Document 2 on line 301

etc. And he wants both the 2nd and 4th lines after each document, so
lines 3, 5, 303, 305, but just doing + c(2,4) would only give 3, 305.

Josh

> -- Bert
>
> On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:
>> Try this (untested as I'm on my iPhone now):
>>
>> index <- grep("Document+.", yourfile, value = FALSE)
>> index <- c(index + 2, index + 4)
>>
>> You just need to make sure you avoid recycling, e.g.,
>>
>> 1:10 + c(2, 4) # not what you want
>>
>> If you want a sufficient number of lines that manually writing index +
>> becomes cumbersome, you could use something like:
>>
>> as.vector(sapply(c(2, 4), "+", e2 = index))
>>
>> HTH,
>>
>> Josh
>>
>> On Jul 11, 2011, at 11:09, Simon Kiss <sjk...@gmail.com> wrote:
>>
>>> Josh, that's amazing. Is there any way to have it grab two different lines
>>> after the grep, say the second and the fourth line? There's some other
>>> information in the text file I'd like to grab. I could do two separate
>>> commands, but I'd like to know if this could be done in one command...
>>> Simon Kiss
>>> On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:
>>>
>>>> If you know you can find the start of the document (say that line
>>>> always starts with Document...), then:
>>>>
>>>> grep("Document+.", yourfile, value = FALSE) + 4
>>>>
>>>> should give you the line number 4 lines after each line where
>>>> Document occurred. No loop needed :)
>>>>
>>>> On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>>>> Hi Josh,
>>>>> Sorry for the insufficient introduction. This might work, but I'm not
>>>>> sure.
>>>>> The file that I have includes up to 100 documents (Document 1, Document
>>>>> 2, Document 3....Document 100) with the newspaper name following 4 lines
>>>>> below each Document number.
>>>>> I'm using readLines to get the text file into R and then trying to use
>>>>> grep to get the newspaper name for each record. But your idea of indexing
>>>>> the text object read into R with the line number where the newspaper name
>>>>> is found is a good one. I'll just have to come up with a loop to tell R
>>>>> to get the 4th, 8th, 12th, 16th line, etc.
>>>>> I'll see if I can get that to work.
>>>>> Simon
>>>>> On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
>>>>>
>>>>>> Dear Simon,
>>>>>>
>>>>>> Maybe I don't understand properly....if you are doing this in R, can't
>>>>>> you just pick the line you want?
>>>>>>
>>>>>> Josh
>>>>>>
>>>>>> ## print your data to clipboard
>>>>>> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
>>>>>> "clipboard")
>>>>>> ## read data in, and only select the 4th line to pass to grep()
>>>>>> grep("pattern", x = readLines("clipboard")[4])
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>>>>>> Dear colleagues,
>>>>>>> I have a series of newspaper articles in a text file, downloaded from a
>>>>>>> text file. They look as follows:
>>>>>>>
>>>>>>> Document 1 of 100
>>>>>>> \n
>>>>>>> \n
>>>>>>> \n
>>>>>>> Newspaper Name
>>>>>>> \n
>>>>>>> \n
>>>>>>> Day Date
>>>>>>>
>>>>>>> I have a series of grep scripts that can extract the date and convert
>>>>>>> it to a date object, but I can't figure out how to grep the newspaper
>>>>>>> name. There is no field ID attached to those lines. The best I can
>>>>>>> come up with would be to have the program grep the four lines following
>>>>>>> a line matching the pattern "Document [0-9]". There is an argument to
>>>>>>> grep in unix that can do this ...grep -A4 'pattern' infile>outfile, but
>>>>>>> I don't know if there is an equivalent argument in R.
>>>>>>>
>>>>>>> Any thoughts?
>>>>>>> Yours, Simon Kiss
>>>>>>> *********************************
>>>>>>> Simon J. Kiss, PhD
>>>>>>> Assistant Professor, Wilfrid Laurier University
>>>>>>> 73 George Street
>>>>>>> Brantford, Ontario, Canada
>>>>>>> N3T 2C9
>>>>>>> Cell: +1 905 746 7606
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Joshua Wiley
>>>>>> Ph.D. Student, Health Psychology
>>>>>> University of California, Los Angeles
>>>>>> https://joshuawiley.com/
>
> --
> "Men by nature long to get on to the ultimate truths, and will often
> be impatient with elementary studies or fight shy of them. If it were
> possible to reach the ultimate truths without the elementary studies
> usually prefixed to them, these would not be preparatory studies but
> superfluous diversions."
>
> -- Maimonides (1135-1204)
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
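
To make the recycling point in this thread concrete, here is a minimal sketch. It assumes a made-up ten-line `yourfile` with two records (the sample lines and the name `offsets` are illustrative, not from Simon's data) and uses `outer()` as one possible alternative to the `sapply()` idiom quoted in the thread.

```r
# Hypothetical file contents: two records, each a "Document" header plus four lines
yourfile <- c("Document 1 of 100", "", "Section A", "", "Newspaper One",
              "Document 2 of 100", "", "Section B", "", "Newspaper Two")

index   <- grep("^Document [0-9]", yourfile)  # header positions: 1 6
offsets <- c(2, 4)                            # want the 2nd and 4th line after each header

# Recycling pairs the vectors element by element: 1 + 2 and 6 + 4
recycled <- index + offsets                   # 3 10

# outer() adds every offset to every index; sort() restores file order
wanted <- sort(as.vector(outer(index, offsets, "+")))  # 3 5 8 10

print(yourfile[wanted])
# "Section A" "Newspaper One" "Section B" "Newspaper Two"
```

With only two headers and two offsets, the recycled version silently returns two positions instead of four, with no warning, which is why the explicit `c(index + 2, index + 4)` or an `outer()`/`sapply()` form is the safer choice here.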