John Machin wrote:
> On 5/06/2006 10:38 AM, Bruno Desthuilliers wrote:
>
>> SuperHik wrote:
>>
>>> hi all,
>>> (snip)
>>> I have an old(er) script with the
>>> following task - takes a string I copy-pasted and which always has the
>>> same format:
>>> (snip)
>>>
>> def to_dict(items):
>>     items = items.replace('\t', '\n').split('\n')
>
> In case there are leading/trailing spaces on the keys:

There aren't. Test passes.

(snip)

> Fantastic -- at least for the OP's carefully copied-and-pasted input.

That was the spec, and my code passes the test.

> Meanwhile back in the real world,

The "real world" is mostly defined by the customer's test set (is that the
correct translation for "jeu d'essai"?). Code passes the test. Period.

> there might be problems with multiple
> tabs used for 'prettiness' instead of 1 tab, non-integer values, etc etc.

Which means that the spec and the customer's test set are wrong. Not my
responsibility. Anyway, I refuse to change anything in the parsing
algorithm before having another test set.

> In that case a loop approach that validated as it went and was able to
> report the position and contents of any invalid input might be better.

One doesn't know what *will* be better without actual facts. You can be
right (and, from my experience, you probably are !-), *but* you can be
wrong as well. Until you have a correct spec and a test data set on which
the code fails, writing any other code is a waste of time. Better to work
on other parts of the system, and come back to this if and when the need
arises.

<ot>
Kind of reminds me of a former employer that paid me two full months to
work on a very hairy data migration script (the original data set was so
f...ed up and incoherent that even a human parser could barely make any
sense of it), before discovering that none of the users of the old system
was interested in migrating that part of the data. Talk about a waste of
time and money...
</ot>

Now FWIW, there's actually something else bugging me about this code: it
loads the whole data set into memory. That's OK for a few lines, but
obviously wrong if one is to parse huge files. *That* would be the first
thing I would change - it takes a couple of minutes to do, so no real
waste of time, but it obviously implies rethinking the API, which is
better done now than after client code has been written (something like
the lazy sketch at the end of this post is what I have in mind).

My 2 cents....
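PS: to make the above concrete for anyone joining the thread late, the
approach under discussion boils down to something like this - a rough
sketch only, since part of the original to_dict was snipped above and the
exact details (value conversion etc.) may differ:

def to_dict(items):
    # Flatten "key<TAB>value" records into a single list:
    # [k1, v1, k2, v2, ...]
    items = items.replace('\t', '\n').split('\n')
    # Pair the keys back up with their values. If the values also need
    # converting (int() etc.), that's where "non-integer values" would
    # start to hurt.
    return dict(zip(items[::2], items[1::2]))

# e.g. to_dict("spam\t1\neggs\t2") -> {'spam': '1', 'eggs': '2'}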
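And the lazy rework alluded to above would be along these lines - again a
sketch only: the name iter_records is made up, and it assumes one
key<TAB>value record per line, which may not exactly match the OP's paste:

def iter_records(lines):
    # Yield (key, value) pairs one input line at a time. `lines` can be
    # any iterable of strings - an open file, a list, a generator - so a
    # huge input never has to sit in memory all at once.
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        key, value = line.split('\t', 1)
        yield key, value

# Client code that really wants a dict can still build one:
#     records = dict(iter_records(open('dump.txt')))
# but it no longer *has* to.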