[EMAIL PROTECTED] wrote:
> Hello,
>
> I am looking for python code useful to process
> tables that are in ASCII text. The code must
> determine where are the columns (fields).
> Concerned tables for my application are various,
> but their columns are not very complicated
> to locate for a human, because even
> when ignoring the semantic of words,
> our eyes see vertical alignments
>
> Here is a sample table (must be viewed
> with fixed-width font to see alignments):
> =================================
>
> 44544 ipod apple black 102
> GFGFHHF-12 unknown thing bizar brick mortar tbc
> 45fjk do not know + is less biac
> disk seagate 250GB 130
> 5G_gff tbd tbd
> gjgh88hgg media record a and b 12
> hjj foo bar hop zip
> hg uy oi hj uuu ii a qqq ccc v ZZZ Ughj
> qdsd zert nope nope
>
> =================================
>
> I want the python code that builds a representation
> of this table (for exemple a list of lists, where each list
> represents a table line, each element of the list
> being a field value).
>
> Any hints?
> thanks
>
As promised. I call this the "cast a shadow" algorithm for table discovery.
This is about as obfuscated as I could make it. It will be up to you to
explain it to your teacher ;-)

Assuming the lines are all equal width (padded right with space) e.g.:

    def rpadd(lines):
        """
        Pass in the lines as a list of lines.
        """
        lines = [line.rstrip() for line in lines]
        maxlen = max([len(line) for line in lines])
        return [line + ' ' * (maxlen - len(line)) for line in lines]

In which case, you can:

    binary = [[((s==' ' and 2) or 1) for s in line] for line in lines]
    shadow = [1 in c for c in zip(*binary)]
    isit = False
    indices = []
    for i,v in enumerate(shadow):
        if v is not isit:
            indices.append(i)
            isit = not isit
    indices.append(i+1)
    indices = [t for t in zip(indices[::2],indices[1::2])]
    columns = [[line[t[0]:t[1]].strip() for line in lines] for t in indices]

In case you want rows:

    rows = zip(*columns)

James
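
P.S. For anyone who wants to try the shadow idea end to end, here is a minimal sketch of the same technique written for Python 3. It is a restatement under my own assumptions, not the exact code above: the function name split_columns and the three sample lines are made up for illustration, and the sample was aligned by hand. A character position counts as part of a column when at least one line has a non-space there; a run of positions that are blank in every line is a gap between columns.

```python
def split_columns(lines):
    """Split fixed-width text lines into fields using the shadow idea."""
    # Pad every line to the same width so positions can be compared.
    stripped = [line.rstrip() for line in lines]
    width = max(len(line) for line in stripped)
    padded = [line.ljust(width) for line in stripped]

    # shadow[i] is True when at least one line has a non-space at position i.
    shadow = [any(ch != ' ' for ch in chars) for chars in zip(*padded)]

    # Collect (start, end) spans of consecutive True positions.
    spans = []
    start = None
    for i, lit in enumerate(shadow):
        if lit and start is None:
            start = i
        elif not lit and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, width))

    # Slice every line at every span: one list of fields per table row.
    return [[line[a:b].strip() for a, b in spans] for line in padded]


# Hypothetical sample data, aligned by hand for this illustration.
sample = [
    "44544   ipod      black",
    "disk    seagate",
    "hg      uy        blue",
]
rows = split_columns(sample)
for row in rows:
    print(row)
```

The result is a list of rows, each a list of field strings, with an empty string where a row has nothing in a column. Note the limit of the approach: a field with an internal space stays in one piece only if some other line has a character at that position; if every line is blank there, the position is treated as a column gap.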