On Fri, May 11, 2018 at 6:16 PM, Daniel Frey <djqf...@gmail.com> wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
>
Use awk or perl: when a line matches the pattern ^\s*$, print a blank line. Otherwise, apply the normal pattern.

Cheers,
     R0b0t1
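For what it's worth, here is a minimal sketch of the awk approach, not a tested solution for the full 2500-line file. It assumes one date per line and that the first run of four digits on a line is the year, which holds for the formats mentioned (including YYYY-MM-DD and MM-DD-YYYY). The function name extract_year is made up for illustration. Instead of testing for ^\s*$ explicitly, it prints an empty line whenever no year is found, which covers blank lines as well as any line without a four-digit number.

```shell
# extract_year: hypothetical helper name; reads dates on stdin, one per line.
extract_year() {
  awk '{
    # match() sets RSTART and RLENGTH when the line contains four consecutive digits.
    # The digit class is spelled out four times rather than written as {4},
    # since not every awk supports interval expressions.
    if (match($0, /[0-9][0-9][0-9][0-9]/))
      print substr($0, RSTART, RLENGTH)
    else
      print ""   # blank or year-less line: emit an empty row so the output stays aligned
  }'
}

# Example: the empty second line of input stays an empty line in the output.
printf 'December 2, 1994\n\nMarch 1995\n' | extract_year
```

Against the real data this would be run as something like `extract_year < dates.txt`, where dates.txt is a placeholder file name. Because every input line produces exactly one output line, the result can be pasted next to the other columns without shifting.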