On Fri, Jan 15, 2010 at 2:17 PM, Noufal Ibrahim <nou...@gmail.com> wrote:
> On Fri, Jan 15, 2010 at 1:04 PM, Dhananjay Nene
> <dhananjay.n...@gmail.com> wrote:
>
> > This seems to be the output of PHP's print_r. If you have the
> > flexibility, try to have the PHP code output the data in a
> > language-neutral format (e.g. JSON, YAML, XML) and then parse it in
> > Python using the appropriate parser. If not, you may have to write a
> > custom parser. I did google to find out whether one existed, but
> > couldn't easily locate one.
>
> There is http://www.php.net/manual/en/book.json.php for PHP, and
> Python 2.6 onwards has json as part of the stdlib.
>
> If you don't have access to the webserver, you might be able to use the
> PHP interpreter on your own machine to parse this into something more
> language neutral.

If you take a look at your data, it is surprisingly close to what a
nested Python dictionary looks like, except that instead of ':' to
separate key from value it uses '=>', which is what Perl and PHP use.

So the following solution takes advantage of this fact and converts
your data to a Python dictionary. Here is the complete solution.

    def scrub(data):
        # First remove the [code] and [/code] tags
        data = data.replace('[code]', '').replace('[/code]', '')
        # Replace '=>' with ':'
        data = data.replace('=>', ':')
        # count and trans appear in the data as bare names, not
        # strings, so eval would fail with a NameError. Hence we
        # bind them to strings of the same name.
        count, trans = 'count', 'trans'
        # Now wrap the data in { and } to make it a dict literal
        data = '{' + data + '}'
        print(data)
        # Eval it to a dictionary
        mydict = eval(data)
        print(mydict)

    if __name__ == "__main__":
        scrub(open('data.txt').read())

And it neatly prints the following (wrapped here for readability):

    {'a': {'count': 1164,
           'trans': {'kaoi': 0.053943079999999997,
                     'kaa': 0.03726579,
                     'haai': 0.067746970000000004,
                     'kaisai': 0.088346750000000002,
                     'kae': 0.034464500000000002,
                     'kai': 0.049819820000000001,
                     'eka': 0.14900490999999999,
                     '\\(none\\)': 0.044000850000000001}},
     'confident': {'count': 4,
                   'trans': {'mailatae': 0.028564269999999999,
                             'ashahvasahta': 0.74918567999999996,
                             'anaa': 0.015785520000000001,
                             'jaitanae': 0.01227762,
                             'pahraaram\\.nbha': 0.069907289999999997,
                             'utanai': 0.01929341,
                             'atahmavaishahvaasa': 0.090954649999999998,
                             'uthaanae': 0.01403157}},
     'consumers': {'count': 4,
                   'trans': {'sauda\\\xef\xbf\xbd\\\xef\xbf\xbd\\\xef\xbf\xbddha': 0.11875471,
                             'upabhaokahtaa': 0.75144361999999998,
                             'upabhaokahtaaom\\.n': 0.12980166000000001}}}

Now use the data as a Python dictionary.

It is a hack that takes advantage of the nature of the data, but it is
far faster than the other approaches posted here.

--Anand

> --
> ~noufal
> http://nibrahim.net.in
> _______________________________________________
> BangPypers mailing list
> BangPypers@python.org
> http://mail.python.org/mailman/listinfo/bangpypers
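
Because eval() runs any Python expression it finds in data.txt, a more defensive variant of the scrub() function above may be worth having when the file does not come from a trusted source. The sketch below is not from the thread. It assumes that count and trans are the only bare names in the data, quotes them with a regular expression, and parses the result with ast.literal_eval (in the stdlib since Python 2.6), which accepts only literals. The name scrub_safely and the sample string are made up for illustration.

```python
import ast
import re


def scrub_safely(data):
    """Turn the '=>'-style text into a dict without calling eval()."""
    # Remove the [code] and [/code] tags, as scrub() does.
    data = data.replace('[code]', '').replace('[/code]', '')
    # Replace '=>' with ':' so the text reads like a dict literal.
    data = data.replace('=>', ':')
    # Assumption: count and trans are the only unquoted keys.
    # Quote them wherever they stand directly before a ':'.
    data = re.sub(r"\b(count|trans)\b(?=\s*:)", r"'\1'", data)
    # literal_eval accepts only literals (dicts, strings, numbers, ...)
    # and raises ValueError on anything else, e.g. a function call.
    return ast.literal_eval('{' + data + '}')


if __name__ == "__main__":
    # Hypothetical sample in the same shape as the data discussed above.
    sample = "[code]'a' => {count => 2, trans => {'kaa' => 0.5}}[/code]"
    result = scrub_safely(sample)
    print(result['a']['count'])  # prints 2
```

Calling scrub_safely(open('data.txt').read()) would stand in for the scrub() call. The trade-off is that any other bare name in the data makes literal_eval raise ValueError instead of being looked up as a variable.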