On Jun 5, 8:50 pm, Eelco <hoogendoorn.ee...@gmail.com> wrote:
> > thank you both for your replies. Unfortunately it is a pre-existing
> > file format imposed by an external system that I can't
> > change. Thank you for the code snippet.
>
> Hi Richard,
>
> Even though it is a pre-existing format, it is very close to valid
> YAML.
>
> Writing your own whitespace-aware parser can be a bit of a pain, but
> since YAML does this for you, I would argue the cleanest solution is
> to build on that functionality rather than roll your own parser or
> resort to hard-to-maintain regex voodoo.
>
> Here is my solution. As a bonus, it directly constructs a custom
> object hierarchy (obviously you would want to expand on this, but the
> essentials are there). One caveat: at the moment, the conversion to
> YAML relies on the apparent convention that instances never directly
> contain other instances, and lists never directly contain lists. This
> means all instances are list entries and get a '-' prepended, and this
> just works. If this is not a general rule, you'd have to keep track of
> a stack of enclosing scopes and emit dashes based on that. Anyway, the
> idea is there, and I believe it to be one worth looking at.
>
> <code>
> import yaml
>
> class A(yaml.YAMLObject):
>     yaml_tag = u'!A'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'A' + str(self.__dict__)
>
> class B(yaml.YAMLObject):
>     yaml_tag = u'!B'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'B' + str(self.__dict__)
>
> class C(yaml.YAMLObject):
>     yaml_tag = u'!C'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'C' + str(self.__dict__)
>
> class TestArray(yaml.YAMLObject):
>     yaml_tag = u'!TestArray'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'TestArray' + str(self.__dict__)
>
> class myList(yaml.YAMLObject):
>     yaml_tag = u'!myList'
>     def __init__(self, **kwargs):
>         self.__dict__.update(kwargs)
>     def __repr__(self):
>         return 'myList' + str(self.__dict__)
>
> data = \
> """
> An instance of TestArray
>  a=a
>  b=b
>  c=c
>  List of 2 A elements:
>   Instance of A element
>    a=1
>    b=2
>    c=3
>   Instance of A element
>    d=1
>    e=2
>    f=3
>  List of 1 B elements
>   Instance of B element
>    a=1
>    b=2
>    c=3
>    List of 2 C elements
>     Instance of C element
>      a=1
>      b=2
>      c=3
>     Instance of C element
>      a=1
>      b=2
>      c=3
> An instance of TestArray
>  a=1
>  b=2
>  c=3
> """.strip()
>
> # indent every line by two spaces; remove trailing whitespace and the
> # seemingly erroneous colon in line 5
> lines = ['  ' + line.rstrip().rstrip(':') for line in data.split('\n')]
>
> def transform(lines):
>     """transform text line by line"""
>     for line in lines:
>         # regular mapping lines
>         if line.find('=') > 0:
>             yield line.replace('=', ': ')
>         # instance lines: become tagged sequence entries
>         p = line.find('nstance of')
>         if p > 0:
>             s = p + 11
>             e = line[s:].find(' ')
>             if e == -1: e = len(line[s:])
>             tag = line[s:s+e]
>             whitespace = line.partition(line.lstrip())[0]
>             yield whitespace[:-2] + ' -' + ' !' + tag
>         # list lines: every list is emitted under the same 'myList' key,
>         # so an instance holding several lists keeps only the last one
>         p = line.find('List of')
>         if p > 0:
>             whitespace = line.partition(line.lstrip())[0]
>             yield whitespace[:-2] + '  ' + 'myList:'
>
> ##transformed = (transform(lines))
> ##for i, t in enumerate(transformed):
> ##    print('{:>3}{}'.format(i, t))
>
> transformed = '\n'.join(transform(lines))
> print(transformed)
>
> # the YAMLObject subclasses above are registered with yaml.Loader
> res = yaml.load(transformed, Loader=yaml.Loader)
> print(res)
> print(yaml.dump(res))
> </code>
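
The "enclosing scope stack" mentioned in the caveat above can also be sketched in plain Python, with no third-party dependency. This is only a minimal sketch, assuming nesting is expressed purely by indentation depth; the parse function and the 'type' and 'list_of_<tag>' keys are hypothetical names chosen for illustration and do not come from the original post.

```python
def parse(text):
    """Parse the indented format into nested dicts and lists.

    Hypothetical helper: instances become dicts (with a 'type' key),
    lists become Python lists stored under 'list_of_<tag>'.
    """
    root = []                 # top-level instances
    stack = [(-1, root)]      # (indent, container) pairs, innermost scope last
    for raw in text.splitlines():
        line = raw.rstrip().rstrip(':')   # drop trailing whitespace and stray colon
        body = line.strip()
        if not body:
            continue
        indent = len(line) - len(line.lstrip())
        # leave every scope that is not shallower than the current line
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if '=' in body:
            # key=value line: belongs to the enclosing instance (a dict)
            key, _, value = body.partition('=')
            parent[key] = value
        elif 'nstance of' in body:
            # instance line: appended to the enclosing list
            tag = body.split('nstance of', 1)[1].split()[0]
            node = {'type': tag}
            parent.append(node)
            stack.append((indent, node))
        elif body.startswith('List of'):
            # list line, e.g. "List of 2 A elements": stored on the enclosing instance
            tag = body.split()[3]
            node = []
            parent['list_of_' + tag] = node
            stack.append((indent, node))
    return root


sample = """\
An instance of TestArray
 a=a
 List of 2 A elements:
  Instance of A element
   a=1
  Instance of A element
   d=1
 List of 1 B elements
  Instance of B element
   b=2
"""
result = parse(sample)
print(result)
```

Values are kept as strings here, and the sketch relies on the same convention as the YAML version: instances only appear inside lists (or at top level), and lists only appear inside instances.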

Hi Eelco,

Many thanks for the reply and solution; it definitely looks like a
clean way to go about it. However, I don't think installing third-party
libraries like yaml on the server is on the cards at the moment.

--
http://mail.python.org/mailman/listinfo/python-list