On Tuesday 24 February 2009, 03:26, Mark Knecht wrote:

> If I drop columns - and I do need to - then something like how cut
> works would be good, but it needs to repeat across all the rows being
> used. For instance, if I'm dropping columns 6 & 12 from a 20 column
> wide data set, then I'm dropping 6 & 12 from all N lines. This is
> where using cut after the line is built is difficult as I'm forced to
> figure out a list like 6,12,26,32,46,52, etc. Easy to make a mistake
> doing that. If I could say something like "Drop 6 & 12 from all rows,
> and 1 & 2 from all rows higher than the first that make up this new
> line" then that would be great. That's a lot to ask though.
>
> D1,T1,A1,B1,C1,D1,
> D2,T2,A2,B2,C2,D2,
> D3,T3,A3,B3,C3,D3,
> D4,T4,A4,B4,C4,D4,
> D5,T5,A5,B5,C5,D5,
>
> In the data above if I drop column A, then I drop it for all rows.
> (For instance, A contains 0 and isn't necessary, etc.) Assuming 3
> wide I'd get
>
> D1,T1,B1,C1,D1,B2,C2,D2,B3,C3,D3
> D2,T2,B2,C2,D2,B3,C3,D3,B4,C4,D4
> D3,T3,B3,C3,D3,B4,C4,D4,B5,C5,D5
>
> Making that completely flexible - where I can drop 4 or 5 random
> columns - is probably a bit too much work. On the other hand maybe
> sending it to cut as part of the whole process, line by line or
> something, is more reasonable? I don't know.

The current "dropcol" variable drops fields from the beginning of the
line. Dropping arbitrary columns can be done too, but it requires an
array to hold the numbers of the columns to drop.

So, in my understanding, this is what we want to accomplish so far.
Given an input of the form

D1,T1,a1,b1,c1,d1,...,R1
D2,T2,a2,b2,c2,d2,...,R2
D3,T3,a3,b3,c3,d3,...,R3
D4,T4,a4,b4,c4,d4,...,R4
D5,T5,a5,b5,c5,d5,...,R5

(the ... means that an arbitrary number of columns can follow)

you want to group lines n at a time, keeping the D and T columns from
the first line of each group and the R column from the last line of
the group. So, for example, with n=3 we would have:

D1,T1,a1,b1,c1,d1,...a2,b2,c2,d2,...a3,b3,c3,d3,...R3
D2,T2,a2,b2,c2,d2,...a3,b3,c3,d3,...a4,b4,c4,d4,...R4
D3,T3,a3,b3,c3,d3,...a4,b4,c4,d4,...a5,b5,c5,d5,...R5

(and you're right, that produces an output that is roughly n times the
size of the original file)

Now, in addition to that, you also want to drop an arbitrary number of
columns from the a,b,c... group. So, for example, if you want to drop
columns 2 and 3 (b and c in the example), you'd end up with something
like

D1,T1,a1,d1,...a2,d2,...a3,d3,...R3
D2,T2,a2,d2,...a3,d3,...a4,d4,...R4
D3,T3,a3,d3,...a4,d4,...a5,d5,...R5

Please confirm that my understanding is correct, so I can come up with
some code to do that.

> I found a web site to study awk so I'm starting to see more or less
> how your example works when I have the code in front of me. Creating
> the code out of thin air might be a bit of a stretch for me at this
> point though.

I suggest you start from

http://www.gnu.org/software/gawk/manual/gawk.html

It is very complete, but gradual, so you can have an easy start and
move on to the complexities later.
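
In the meantime, to make the description above concrete, here is a
rough sketch of how the grouping and the column dropping could be done
in awk. It is only an illustration built on my assumptions, not the
final code: the function name group_rows and the variables n and drop
are made up for this example, it does not use the existing "dropcol"
variable, it assumes every line is exactly D,T,<middle columns>,R with
no trailing comma, and the numbers in the drop list count from the
first column after T (so 2 and 3 mean b and c).

```shell
# Hypothetical helper: group_rows N [DROPLIST] < input
#   N        = number of consecutive lines to merge into one output line
#   DROPLIST = comma-separated positions to drop, counted within the
#              middle group (1 = first column after T); may be empty
group_rows() {
    awk -F, -v n="$1" -v drop="${2:-}" '
    BEGIN {
        OFS = ","
        # Store the column numbers to drop as keys of the "skip" array.
        m = split(drop, tmp, ",")
        for (i = 1; i <= m; i++)
            skip[tmp[i]] = 1
    }
    {
        # Collect the middle columns (fields 3 .. NF-1) that are kept.
        mid = ""
        for (i = 3; i < NF; i++)
            if (!((i - 2) in skip))
                mid = mid (mid == "" ? "" : OFS) $i

        head[NR] = $1 OFS $2    # D and T of this line
        body[NR] = mid          # filtered middle columns of this line

        # Once n lines are available, emit one sliding-window group:
        # D,T from the first line, the middle columns of all n lines,
        # and R (the last field) from the current line.
        if (NR >= n) {
            first = NR - n + 1
            out = head[first]
            for (j = first; j <= NR; j++)
                if (body[j] != "")
                    out = out OFS body[j]
            print out OFS $NF
            # The oldest line is no longer needed; free it.
            delete head[first]
            delete body[first]
        }
    }'
}
```

With the five-line sample above reduced to columns a to d (that is,
D1,T1,a1,b1,c1,d1,R1 and so on), running "group_rows 3 2,3 < file"
should print

D1,T1,a1,d1,a2,d2,a3,d3,R3
D2,T2,a2,d2,a3,d3,a4,d4,R4
D3,T3,a3,d3,a4,d4,a5,d5,R5

Only the last n lines are kept in memory, so the size of the input
file should not matter.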