Re: Substitute pattern over multiple lines

John Cordes Wed, 23 Dec 2020 18:08:03 -0800

On Wed, Dec 23, 2020 at 9:31 PM Tim Chase <[email protected]> wrote:


> On 2020-12-23 20:39, John Cordes wrote:
> >> I'd start with this ugly monstrosity:
> >>
> >> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> >> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> >> 'g'), '\n', '', 'g')."<\/div>\n"
> >
> >  I will attempt to deconstruct your 'monstrosity' somewhat later,
>
> Tweaking it so that it only does NOTE items, not generic
> continuations:
>
> :%s/^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> 'g'), '\n', '', 'g')."<\/div>\n"
>
> Breaking it down so hopefully you can swap parts as you see fit:
>
> :%s/^2 NOTE \zs     On every line starting with "2 NOTE "
>                     start our replacement here (\zs)
> \(                  start capturing the note
>                     this will be submatch(1) later
> .*                  everything else on that line
> \n                  and the newline
> \%(                 a non-capturing group for another line that
> \%(\D               starts with either a non-digit
> \|                  or
> 3 CONC              a literal "3 CONC "
> \)                  (end of this OR of things marking a continuation)
> .*\n                followed by the rest of the line
> \)                  (end of this continuation-line)
> \+                  we can have 1 or more continuation lines
> \)                  end the capturing
> /                   replace it with
> \=                  the result of evaluating this expression
> '<div class="xxx">' the literal opening tag
> .                   and then the results of
> substitute(         remove all the newlines from the results of
>  substitute(        removing from
>   submatch(1),      the whole set of continuation stuff
>   '\n3 CONC ',      the literal newline-followed-by-"3 CONC "
>   '',               and replace them with nothing
>   'g'               everywhere
>   ),                and in that "\n3 CONC "-less text, replace
>  '\n',              newlines with
>  '',                nothing
>  'g')               everywhere
> .                   and then tack on
> "<\/div>\n"         the literal closing </div> followed by a newline
>
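For anyone who wants to check the logic outside Vim, the breakdown above can be sketched in Python. This is only a rough rendering, not part of the original command: the names `NOTE_RE` and `wrap_notes` and the shortened sample text are made up for illustration, and Python's `re` syntax differs from Vim's (there is no `\zs`, so the "2 NOTE " prefix is captured and put back, and `[^\d\n]` stands in for Vim's `\D`, which never matches a newline).

```python
import re

# Assumed Python equivalent of the Vim pattern; names are hypothetical.
NOTE_RE = re.compile(
    r"^(2 NOTE )"                           # text before \zs, kept unchanged
    r"(.*\n(?:(?:[^\d\n]|3 CONC ).*\n)+)",  # submatch(1): note line plus 1+ continuation lines
    re.MULTILINE,
)


def wrap_notes(text):
    """Join each multi-line NOTE into one line wrapped in a <div>."""
    def repl(m):
        # Same two nested substitute() steps: drop "\n3 CONC ", then bare newlines.
        body = m.group(2).replace("\n3 CONC ", "").replace("\n", "")
        return m.group(1) + '<div class="xxx">' + body + "</div>\n"
    return NOTE_RE.sub(repl, text)


# Shortened, made-up sample in the same shape as the GEDCOM chunk in this thread.
sample = (
    "1 EVEN\n"
    "2 TYPE tngnote\n"
    "2 _SDATE 1802\n"
    "2 NOTE born while th\n"
    "3 CONC e family was in London. The id\n"
    "3 CONC entification is tentative.\n"
    "1 EVEN\n"
)
print(wrap_notes(sample))
```

A "2 NOTE" line with no continuation line after it is left alone, because the `+` requires at least one continuation.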
> >  It's a bit more complicated than I first explained. Two aspects:
> > a) I *do* need to search on the "2 NOTE" lines, since there are
> > various other chunks of lines with the CONC lines; and
> > b) Sometimes the line "2 TYPE tngnote" has a line between it and
> > the "2 NOTE". The intervening line can look like this
> >
> > 2 DATE 18 AUG 1776
> >  or this
> > 2 _SDATE 1802
>
> Given the substitution command above, it should only touch "2 NOTE"
> lines with subsequent "3 CONC" lines.  It does *every* "2 NOTE" so if
> you need to limit them to just those that immediately follow "2 TYPE
> tngnote" (assuming there aren't any "2 TYPE tngnote" that *don't*
> have a NOTE immediately following them), you can tweak that command,
> changing that initial "%" to
>
> :g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs…
>
> This looks for all the "2 TYPE tngnote" lines, searches forward
> (skipping over any DATE/_SDATE lines or other intervening stuff) for
> the "2 NOTE " line following it, and then only performs the
> substitution on those particular lines.
>
> >  So the lines to change could look like this:
> >
> > ===================
> > 1 EVEN
> > 2 TYPE tngnote
> > 2 _SDATE 1802
> > 2 NOTE The surname of John's wife is not positively established.
> > However, it is certain that her given name is Elizabeth; evidence
> > for this comes first from the baptismal records for Rebecca and
> > Eliza Catherine; these children were born while th
> > 3 CONC e family was in London so the records are available in the
> > London Metropolitan Archives (the other two children were born in
> > Sheffield). Henry's baptismal record in Sheffield also has his
> > parents being John (a skinner) and Elizabeth. The id
> > 3 CONC entification of John's wife specifically with  Elizabeth
> > Coxsey is somewhat tentative, however.
> > 1 EVEN
> > ===================
> >
> >  This search pattern
> > /^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
> >
> >  works to find all 3 possibilities: no DATE line, an _SDATE line
> > or a DATE line.
> >
> >  I thought I would be able to combine that with your pattern like
> > so:
> >
> > :%s/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
> > \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> > class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> > 'g'), '\n', '', 'g')."<\/div>\n"
> >
> >  but that is not working.
>
> I suspect that the problem snuck in by using \(…\) in your added
> conditions which captured that as submatch(1).  So you can either
> make it non-capturing by adding that "%" before the open-paren:
>
>   \%(\_^2 .*DATE.*\)
>
> or change the "submatch(1)" to "submatch(2)"
>
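The group-numbering point can be seen in a small Python sketch. Again this is an assumed translation of the Vim patterns rather than anything from the original commands: `HEAD`, `TAIL`, `capturing`, `non_capturing` and the short `chunk` text are hypothetical names and data, and `(?:...)` plays the role of Vim's `\%(...\)`.

```python
import re

# Pieces shared by both variants (Python syntax, assumed equivalent of the Vim pattern).
HEAD = r"^2 TYPE tngnote.*\n*"
TAIL = r"*\n^2 NOTE (.*\n(?:(?:[^\d\n]|3 CONC ).*\n)+)"

# Variant 1: the optional DATE line is a capturing group, so it becomes group 1.
capturing = re.compile(HEAD + r"(^2 .*DATE.*)" + TAIL, re.MULTILINE)
# Variant 2: the DATE line is non-capturing, so the note stays group 1.
non_capturing = re.compile(HEAD + r"(?:^2 .*DATE.*)" + TAIL, re.MULTILINE)

# Made-up chunk in the same shape as the example in this thread.
chunk = (
    "1 EVEN\n"
    "2 TYPE tngnote\n"
    "2 DATE 18 AUG 1776\n"
    "2 NOTE some te\n"
    "3 CONC xt\n"
    "1 EVEN\n"
)

m1 = capturing.search(chunk)
m2 = non_capturing.search(chunk)
print(repr(m1.group(1)))  # the DATE line, not the note
print(repr(m1.group(2)))  # the note has moved to group 2
print(repr(m2.group(1)))  # with the non-capturing group, the note is group 1 again
```

Either fix works; the non-capturing form has the advantage that the replacement expression does not need to change.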
> > Here's an example of one small chunk of
> > lines which were transformed by that command:
> >
> > 1 EVEN
> > 2 TYPE tngnote
> > 2 DATE 18 AUG 1776
> > 2 NOTE <div class="xxx">2 DATE 18 AUG 1776</div>
> > 1 EVEN
>
> Note that the content here is what you captured in the first group.
> :-)
>
> Hope this helps get you on the right path,
>
> -tim
>
>
 This is amazing looking, Tim -- thanks so much! There is a lot for a
nearly 80-year old to unpack here -- it's going to take me a while. :)
  It looks as though you have covered all the bases I want to deal with.

 Thank you again,
 John
