On Nov 4, 11:45 am, Tyler <[EMAIL PROTECTED]> wrote:
> Hello All:
>
> I hope this is the right place to ask, but I am trying to come up with
> a way to parse each line of a file. Unfortunately, the file is neither
> comma, nor tab, nor space delimited. Rather, the character locations
> imply what field it is.
>
> For example:
>
> The first ten characters would be the record number, the next
> character is the client type, the next ten characters are a volume,
> and the next three are order type, and the last character would be an
> optional type depending on the order type.
>
> The lines are somewhat more complicated, but they work like that, and
> not all have to be populated, in that they may contain spaces. For
> example, the order number may be 2345, and it is space padded at the
> beginning of the line, and others might be zero padded in the front.
> Imagine I have a line:
>
> ______2345H0000300000_NC_
>
> where the underscores indicate a space. I then want to map this to:
>
> 2345,H,0000300000,NC,
>
> In other words, I want to preserve ALL of the fields, but map to
> something that awk could easily cut up afterwards, or open in a CSV
> editor. I am unsure how to place the commas based on character
> location.
>
> Any ideas?

Here's a general solution for fixed size records:

>>> def slicer(*sizes):
...     slices = len(sizes) * [None]
...     start = 0
...     for i,size in enumerate(sizes):
...         stop = start+size
...         slices[i] = slice(start,stop)
...         start = stop
...     return lambda string: [string[s].strip() for s in slices]
...
>>> order_slicer = slicer(10,1,10,4)
>>> order_slicer('______2345H0000300000_NC_'.replace('_',' '))
['2345', 'H', '0000300000', 'NC']

HTH,
George
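
Note that the slicer above is called with widths 10, 1, 10, 4, so the order type and the optional trailing character come back as a single field. If the layout really is 10, 1, 10, 3, 1 as described in the question, a variant along these lines should give the comma-separated line that was asked for, including the empty last field. This is only a sketch: the names make_slicer and to_csv_line are made up for illustration, and it assumes no field ever contains a comma (otherwise the csv module would be the safer way to write the output).

```python
from itertools import accumulate


def make_slicer(*widths):
    """Return a function that cuts a line into fields of the given widths."""
    stops = list(accumulate(widths))      # running end offsets: 10, 11, 21, ...
    starts = [0] + stops[:-1]             # each field starts where the previous one ends
    slices = [slice(a, b) for a, b in zip(starts, stops)]

    def split(line):
        # strip() drops the space padding; zero padding is left untouched
        return [line[s].strip() for s in slices]

    return split


def to_csv_line(line, split):
    """Join the fields with commas, keeping empty fields in place."""
    return ",".join(split(line))


# Widths taken from the question: record number, client type, volume,
# order type, optional type (assumed layout).
order_split = make_slicer(10, 1, 10, 3, 1)

sample = "______2345H0000300000_NC_".replace("_", " ")
print(to_csv_line(sample, order_split))   # 2345,H,0000300000,NC,
```

To convert a whole file, the same to_csv_line call can be applied to each line after removing the trailing newline with rstrip("\n"), so that trailing spaces belonging to the last field are not lost before slicing.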