On Fri, May 11, 2018 at 6:16 PM, Daniel Frey <djqf...@gmail.com> wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
>

Use awk or perl. When a line matches the pattern ^\s*$, print a blank
line; otherwise, apply your usual year pattern.
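
A minimal sketch of that idea in POSIX awk, offered as one possible
approach rather than a tested solution for the full 2500-line file. The
function name extract_year and the file name dates.txt are hypothetical.
It prints the first run of four digits on each line and an empty line
when there is none, so blank input lines (and any line without a year)
stay aligned with the input. It assumes every year is four digits and
that no longer digit run comes before it on the line.

```shell
# Hypothetical helper: one output line per input line.
# Prints the first 4-digit run found, or an empty line if there is none.
extract_year() {
    awk '{
        # match() sets RSTART and RLENGTH when the pattern is found.
        # The digit class is spelled out four times instead of using
        # {4}, since some older awk builds lack interval expressions.
        if (match($0, /[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)
        else
            print ""
    }' "$@"
}
```

Usage would look like "extract_year dates.txt > years.txt", or the
function can read from a pipe. Because the match is not anchored, the
YYYY-MM-DD and MM-DD-YYYY forms should be handled the same way.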

Cheers,
     R0b0t1
