On Monday 23 February 2009, 00:31, Mark Knecht wrote:

> Yeah, that's probably almost usable as it is. I tried it with n=3 and
> n=10. Worked both times just fine. The initial issue might be (as with
> Willie's sed code) that the first line wasn't quite right and required
> some hand editing. I'd prefer not to have to hand edit anything as the
> files are large and that step will be slow. I can work on that.

But then could you paste an example of such a line, so we can see it? The 
first line was not special in the sample you posted...

> As per the message to Willie it would be nice to be able to drop
> columns out but technically I suppose it's not really required. All of
> this is going into another program which must at some level understand
> what the columns are. If I have extra dates and don't use them that's
> probably workable.

Anyway, it's not difficult to add that feature:

BEGIN { FS=OFS=","}
{
  r=$NF;NF--
  for(i=1;i<n;i++){
    s[i]=s[i+1]
    dt[i]=dt[i+1]
    if((NR>=n)&&(i==1))printf "%s%s",dt[1],OFS
    if(NR>=n)printf "%s%s",s[i],OFS
  }
  sep=dt[n]="";for(i=1;i<=dropcol;i++){dt[n]=dt[n] sep $i;sep=OFS}
  sub("^([^,]*,){"dropcol"}","")
  s[n]=$0
  if(NR>=n)printf "%s,%s\n", s[n],r
}

There is a new variable "dropcol" which holds the number of leading 
columns to drop. Also, for the above to work you must pass the 
--re-interval command-line switch to awk, since older gawk does not 
enable interval expressions like {2} by default, e.g.

awk --re-interval -v n=4 -v dropcol=2 -f program.awk datafile.csv
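
For instance, with a made-up three-line input like this (column 1 a 
date, column 4 the value kept at the end), run with n=2 and dropcol=1:

2009-02-20,10,20,100
2009-02-21,11,21,101
2009-02-22,12,22,102

the program prints

2009-02-20,10,20,11,21,101
2009-02-21,11,21,12,22,102

that is, the date of the oldest row in each window, the data columns of 
both rows, and the last field of the newest row.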

> The down side is the output file is 10x larger than the input file -
> roughly - and my current input files are 40-60MB so the output files
> will be 600MB. Not huge but if they grew too much more I might get
> beyond what a single file can be on ext3, right? Isn't that 2GB or so?

That is strange; the output should be bigger, but not by that factor, 
unless n itself is around 10: each input line ends up in up to n output 
windows, so the output grows roughly n-fold. If you don't mind, again 
could you paste a sample input file (maybe just a few lines, to get an 
idea)? As for ext3, the file size limit is well above 2GB (around 2TiB 
with a 4KiB block size), so that should not be a problem.
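
As a quick sanity check (just a sketch; n=10 and the file name are 
placeholders), you can estimate the expected output size from the input:

awk -v n=10 '{bytes+=length($0)+1} END{printf "roughly %.0f MB expected\n", bytes*n/1e6}' datafile.csv

If the real output is much larger than that, something else is going on.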
