Re: [R] grep lines before or after pattern matched?

Joshua Wiley Mon, 11 Jul 2011 12:35:13 -0700

On Jul 11, 2011, at 12:00, Bert Gunter <gunter.ber...@gene.com> wrote:


> Simon:
> 
> Basic, basic stuff (not grep -- the stuff thereafter). Please read the
> docs, especially the tutorial, An Introduction to R.
> 
> ... and Josh's solution can be shortened to (as he knows):
> 
> index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)
> 

Really?  Won't the 2 and 4 get recycled, so that the elements returned from 
grep alternately have 2 or 4 added, instead of each getting 2 *and* 4?
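
A minimal sketch of that recycling, assuming four hypothetical match 
positions in place of a real grep() result (the name hits and its values are 
made up for illustration):

```r
# Hypothetical match positions, standing in for the result of grep()
hits <- c(1, 301, 601, 901)

# The shorter vector c(2, 4) is recycled element by element:
# 1 + 2, 301 + 4, 601 + 2, 901 + 4
recycled <- hits + c(2, 4)
print(recycled)
# 3 305 603 905
```

Each match gets only one of the two offsets, so half of the wanted lines 
are silently missing.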

My understanding is that Simon has a single file with, for example, Document 1 
on line 1, Document 2 on line 301, etc. He wants both the 2nd and 4th lines 
after each document header, i.e. lines 3, 5, 303 and 305, but just doing 
+ c(2,4) would give only 3 and 305.
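
As a rough sketch of one way to get both offsets for every match, here is a 
toy example. The 600-line character vector is an assumed stand-in for Simon's 
file, and the anchored pattern "^Document [0-9]" and the use of outer() are 
illustrative choices rather than code posted earlier in the thread:

```r
# Toy stand-in for the real file: two "documents" of 300 lines each,
# so the headers fall on lines 1 and 301 (an assumed layout)
yourfile <- paste("line", 1:600)
yourfile[c(1, 301)] <- c("Document 1 of 2", "Document 2 of 2")

# Line numbers of the document headers
index <- grep("^Document [0-9]", yourfile)

# outer() adds every offset to every match, so nothing is recycled;
# as.vector() flattens the matrix and sort() restores file order
wanted <- sort(as.vector(outer(index, c(2, 4), "+")))
print(wanted)
# 3 5 303 305

# Pull the corresponding lines out of the text
print(yourfile[wanted])
```

With more offsets, only the second argument to outer() needs to change, 
e.g. c(2, 4, 7).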

Josh

> -- Bert
> 
> On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:
>> Try this (untested as I'm on my iPhone now):
>> 
>> index <- grep("Document+.", yourfile, value = FALSE)
>> index <- c(index + 2, index + 4)
>> 
>> You just need to make sure you avoid recycling, e.g.,
>> 
>> 1:10 + c(2, 4) # not what you want
>> 
>> If you want a sufficient number of lines that manually writing index + 
>> becomes cumbersome, you could use something like:
>> 
>> as.vector(sapply(c(2, 4), "+", e2 = index))
>> 
>> HTH,
>> 
>> Josh
>> 
>> On Jul 11, 2011, at 11:09, Simon Kiss <sjk...@gmail.com> wrote:
>> 
>>> Josh, that's amazing. Is there any way to have it grab two different lines 
>>> after the grep, say the second and the fourth line? There's some other 
>>> information in the text file I'd like to grab.  I could do two separate 
>>> commands, but I'd like to know if this could be done in one command...
>>> Simon Kiss
>>> On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:
>>> 
>>>> If you know you can find the start of the document (say that line
>>>> always starts with Document...), then:
>>>> 
>>>> grep("Document+.", yourfile, value = FALSE) + 4
>>>> 
>>>> should give you 4 lines after each line where Document occurred.  No
>>>> loop needed :)
>>>> 
>>>> On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>>>> Hi Josh,
>>>>> Sorry for the insufficient introduction. This might work, but I'm not 
>>>>> sure.
>>>>> The file that I have includes up to 100 documents (Document 1, Document 
>>>>> 2, Document 3....Document 100) with the newspaper name following 4 lines 
>>>>> below each Document number.
>>>>> I'm using readLines to get the text file into R and then trying to use 
>>>>> grep to get the newspaper name for each record. But your idea of indexing 
>>>>> the text object read into R with the line number where the newspaper name 
>>>>> is found is a good one.  I'll just have to come up with a loop to tell R 
>>>>> to get the 4th, 8th, 12th, 16th line, etc.
>>>>> I'll see if I can get that to work.
>>>>> Simon
>>>>> On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
>>>>> 
>>>>>> Dear Simon,
>>>>>> 
>>>>>> Maybe I don't understand properly....if you are doing this in R, can't
>>>>>> you just pick the line you want?
>>>>>> 
>>>>>> Josh
>>>>>> 
>>>>>> ## print your data to clipboard
>>>>>> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
>>>>>> "clipboard")
>>>>>> ## read data in, and only select the 4th line to pass to grep()
>>>>>> grep("pattern", x = readLines("clipboard")[4])
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>>>>>> Dear colleagues,
>>>>>>> I have a series of newspaper articles in a text file, downloaded from a 
>>>>>>> text file.  They look as follows:
>>>>>>> 
>>>>>>> Document 1 of 100
>>>>>>> \n
>>>>>>> \n
>>>>>>> \n
>>>>>>> Newspaper Name
>>>>>>> \n
>>>>>>> \n
>>>>>>> Day Date
>>>>>>> 
>>>>>>> I have a series of grep scripts that can extract the date and convert 
>>>>>>> it to a date object, but I can't figure out how to grep the newspaper 
>>>>>>> name.  There is no field ID attached to those lines. The best I can 
>>>>>>> come up with would be to have the program grep the four lines following 
>>>>>>> a line matching the pattern "Document [0-9]".  There is an argument to 
>>>>>>> grep in unix that can do this (grep -A4 'pattern' infile > outfile), but 
>>>>>>> I don't know if there is an equivalent argument in R.
>>>>>>> 
>>>>>>> Any thoughts?
>>>>>>> Yours, Simon Kiss
>>>>>>> *********************************
>>>>>>> Simon J. Kiss, PhD
>>>>>>> Assistant Professor, Wilfrid Laurier University
>>>>>>> 73 George Street
>>>>>>> Brantford, Ontario, Canada
>>>>>>> N3T 2C9
>>>>>>> Cell: +1 905 746 7606
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide 
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Joshua Wiley
>>>>>> Ph.D. Student, Health Psychology
>>>>>> University of California, Los Angeles
>>>>>> https://joshuawiley.com/
>>>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> "Men by nature long to get on to the ultimate truths, and will often
> be impatient with elementary studies or fight shy of them. If it were
> possible to reach the ultimate truths without the elementary studies
> usually prefixed to them, these would not be preparatory studies but
> superfluous diversions."
> 
> -- Maimonides (1135-1204)
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics

