> Thank you both for your replies. Unfortunately it is a pre-existing
> file format imposed by an external system that I can't
> change. Thank you for the code snippet.

Hi Richard,

Even though it is a pre-existing format, it is very close to valid YAML. Writing your own whitespace-aware parser can be a bit of a pain, but since YAML already does this for you, I would argue the cleanest solution is to build on that functionality rather than roll your own parser or resort to hard-to-maintain regex voodoo.

Here is my solution. As a bonus, it directly constructs a custom object hierarchy (obviously you would want to expand on this, but the essentials are there).

One caveat: at the moment, the conversion to YAML relies on the apparent convention that instances never directly contain other instances, and lists never directly contain lists. This means all instances are list entries and get a '-' prepended, and this just works. If this is not a general rule, you'd have to keep track of an enclosing scope stack and emit dashes based on that.

Anyway, the idea is there, and I believe it to be one worth looking at.

<code>
import yaml

class A(yaml.YAMLObject):
    yaml_tag = u'!A'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'A' + str(self.__dict__)

class B(yaml.YAMLObject):
    yaml_tag = u'!B'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'B' + str(self.__dict__)

class C(yaml.YAMLObject):
    yaml_tag = u'!C'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'C' + str(self.__dict__)

class TestArray(yaml.YAMLObject):
    yaml_tag = u'!TestArray'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'TestArray' + str(self.__dict__)

class myList(yaml.YAMLObject):
    yaml_tag = u'!myList'
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
    def __repr__(self):
        return 'myList' + str(self.__dict__)


# nesting is expressed by indentation (one space per level assumed here)
data = \
"""
An instance of TestArray
 a=a
 b=b
 c=c
 List of 2 A elements:
  Instance of A element
   a=1
   b=2
   c=3
  Instance of A element
   d=1
   e=2
   f=3
 List of 1 B elements
  Instance of B element
   a=1
   b=2
   c=3
   List of 2 C elements
    Instance of C element
     a=1
     b=2
     c=3
    Instance of C element
     a=1
     b=2
     c=3
An instance of TestArray
 a=1
 b=2
 c=3
""".strip()

# remove trailing whitespace and the seemingly erroneous colon in line 5;
# indent every line by two so there is room for the '-' of top-level instances
lines = ['  ' + line.rstrip().rstrip(':') for line in data.split('\n')]

def transform(lines):
    """transform text line by line"""
    for line in lines:
        # regular mapping lines
        if line.find('=') > 0:
            yield line.replace('=', ': ')
        # instance lines
        p = line.find('nstance of')
        if p > 0:
            s = p + 11
            e = line[s:].find(' ')
            if e == -1:
                e = len(line[s:])
            tag = line[s:s+e]
            whitespace = line.partition(line.lstrip())[0]
            yield whitespace[:-2] + ' -' + ' !' + tag
        # list lines
        # note: every list gets the same key, so sibling lists in one
        # instance produce duplicate 'myList' keys
        p = line.find('List of')
        if p > 0:
            whitespace = line.partition(line.lstrip())[0]
            yield whitespace[:-2] + '  ' + 'myList:'

##transformed = (transform(lines))
##for i, t in enumerate(transformed):
##    print('{:>3}{}'.format(i, t))

transformed = '\n'.join(transform(lines))
print(transformed)

# recent PyYAML versions require an explicit Loader
res = yaml.load(transformed, Loader=yaml.Loader)
print(res)
print(yaml.dump(res))
</code>

--
http://mail.python.org/mailman/listinfo/python-list
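
A minimal sketch of the scope-stack variant mentioned in the caveat above, in case instances can sit directly inside instances or lists directly inside lists. Everything here is an assumption for illustration and not part of the posted solution: the function name `emit_with_scopes`, the placeholder key `item` for an instance that is not a list entry, the choice to treat the top level as a list, and the output indentation of two spaces per nesting depth. It only emits text and does not call PyYAML.

```python
def emit_with_scopes(lines):
    """Yield YAML-style lines, adding a dash only for entries whose
    enclosing scope is a list (hypothetical helper, see assumptions)."""
    stack = []  # (input indent, kind) of the enclosing scopes, innermost last
    for raw in lines:
        line = raw.rstrip().rstrip(':')
        text = line.strip()
        if not text:
            continue
        indent = len(line) - len(line.lstrip())
        # leave every scope that does not enclose the current line
        while stack and stack[-1][0] >= indent:
            stack.pop()
        # assumption: the top level behaves like a list of instances
        parent = stack[-1][1] if stack else 'list'
        pad = '  ' * len(stack)
        if 'nstance of ' in text:
            tag = text.split('nstance of ', 1)[1].split()[0]
            if parent == 'list':
                yield pad + '- !' + tag
            else:
                # 'item' is a made-up key; the real format would have to
                # say how such a nested instance should be named
                yield pad + 'item: !' + tag
            stack.append((indent, 'instance'))
        elif text.startswith('List of'):
            # a list inside a list becomes a bare sequence entry
            yield pad + ('-' if parent == 'list' else 'myList:')
            stack.append((indent, 'list'))
        elif '=' in text:
            key, value = text.split('=', 1)
            yield pad + key + ': ' + value


sample = """\
An instance of TestArray
 a=1
 Instance of A element
  b=2
 List of 1 B elements
  Instance of B element
   c=3
"""

for out in emit_with_scopes(sample.splitlines()):
    print(out)
```

The stack is keyed on the indentation of the input, so a line closes every scope that is indented at least as far as itself; whether a dash is emitted then depends only on the kind of the innermost remaining scope.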