On Thu, Aug 30, 2012 at 04:37:19PM -0700, daniel jimenez wrote:
>    Hello all,
>    I need some help fixing the format of some pretty strangely compressed
>    data files. An example would be like this:
> 
>    2883
>    452
>    0  7
>    1  6
>    2
>    4
>    6
>    10  7
> 
>    Parsing rules:
>    The first two lines should be ignored.
>    The first column is the 'index', the second column being the 'counter'.
>    If there is no second number (ex. index=2), then the second number should
>    be set to '1'.
>    If the index skips (ex. from index=2 to index=4), then the indexes
>    which were skipped should be set to '0'.
>    Max index is 1024.
>    That is it. I'd like to be guided to an app (scripting language? awk? sed?
>    I haven't used those so I really don't know where to start) that can help
>    me do that effectively.
>    The command, with the script possibly as the argument, is to be included
>    in a bash script right before a fortran program is executed as the fortran
>    program expects the file to be uncompressed and it doesn't seem intuitive
>    to do it from fortran. Although it would be nice for a guru to let me know
>    how to handle it from within...
>    In the end, any solution would be a great help.
>    Thanks.
>    --
>    Daniel Jimenez

Hi Daniel,

Here's my awk solution:


BEGIN {
  last_index = -1                       # No index seen yet
}
NR > 2 && NF > 0 {                      # Ignore lines 1 & 2 and any blank lines
  if (NF < 2){                          # If there is no second field...
    counter=1                           # Set variable counter to one
  } else {
    counter=$2                          # Otherwise set counter to 2nd field
  }
  for (i=last_index+1; i<$1; i++){      # For each index skipped since the last line...
    arr[i]=0                            # Add it to the array with a zero value
  }
  arr[$1]=counter                       # Add index to array with value counter
  last_index=$1                         # Remember this index for the next line
}
END {
  for (j=0; j<=last_index; j++){
    print j, arr[j]                     # Print all indices and their values
  }
}
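
Since you mentioned calling this from a bash script right before the
fortran program, here is a rough sketch of how that could look. It is a
variant, not a drop-in copy of the script above: it assumes the fortran
program wants a line for every index up to the maximum (1024 by your
description), so it pads the tail with zeros as well. The function name
expand_counts and its optional second argument are my own inventions for
illustration.

```shell
#!/bin/bash
# expand_counts FILE [MAX]: hypothetical helper, not part of the original script.
# Prints "index counter" for every index from 0 to MAX (assumed default: 1024).
expand_counts() {
  awk -v max="${2:-1024}" '
    NR > 2 && NF > 0 {              # skip the two header lines and blank lines
      arr[$1] = (NF < 2) ? 1 : $2   # a missing counter means 1
    }
    END {
      for (j = 0; j <= max; j++)
        print j, arr[j] + 0         # indices never seen print as 0
    }
  ' "$1"
}
```

In the bash script you would then call something like
"expand_counts compressed.dat > expanded.dat" (file names made up) on the
line before the fortran program runs. Relying on "arr[j] + 0" instead of
filling the gaps while reading keeps the main block short, at the cost of
no longer tracking the last index seen.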

