[EMAIL PROTECTED] wrote:
> On Jul 11, 9:49 pm, James Stroud <[EMAIL PROTECTED]> wrote:
>>The "flat is better than nested" philosophy suggests that clean should
>>be module level and you should initialize a MyString like such:
>>
>>   m = MyString(clean(s))
>>
>>Where clean is
>>
>>   def clean(astr):
>>     return astr.replace('m', 'f')
>>
>>Although it appears compulsory to call clean each time you instantiate
>>MyString, note that you do it anyway when you check in your __init__.
>>Here, you are explicit.
>
> Initially, I tried simply calling a clean function on a regular
> string, without any of this messy subclassing. However, I would end
> up accidentally cleaning it more than once, and transforming the
> string was just very messy.

It's not clear what you mean here. A code snippet might help. In theory,
you can encapsulate any amount of cleaning inside a single function, so
it shouldn't be messy. You need only to return the result.

def fix_whitespace(astr):
   import string
   astr = ''.join(c if c not in string.whitespace else '-' for c in astr)
   return astr.strip()

def fix_m(astr):
   return astr.replace('m', 'f')

def clean_up(astr):
   return fix_m(fix_whitespace(astr))

In theory, if you didn't want a custom string to actually change its own
value, this could be semantically equivalent to:

new_str = astr.fix_whitespace().fix_m()

The latter might be a little more readable than the former.

> I thought that it would be much easier to
> just clean the string once, and then add methods that would give me
> the various transformations that I wanted from the cleaned string.

If you intended these transformations to be new instances of MyString,
then this would probably not be abuse of the str built-in type.

> Using __new__ seems to be the solution I was looking for.

>>Additionally, it has been suggested that you use __new__. E.g.:
>>
>>py> class MyString(str):
>>...   def __new__(cls, astr):
>>...     astr = astr.replace('m', 'f')
>>...     return super(MyString, cls).__new__(cls, astr)
>>...
>>py> MyString('mail')
>>'fail'
>>
>>But this is an abuse of the str class if you intend to populate your
>>subclasses with self-modifying methods such as your clean method. In
>>this case, you might consider composition, wherein you access an
>>instance of str as an attribute of class instances. The python standard
>>library makes this easy with the UserString class and the ability to add
>>custom methods to its subclasses:

> What constitutes an abuse of the str class?

Changing its value as if it were mutable. This might be ok (though not
recommended) during instantiation, but you wouldn't want something like
this:

py> class MyString(str):
...   [etc.]
...
py> s = MyString('mail man')
py> s
'fail fan'
py> # kind-of ok up till now, but...
py> s.fix_whitespace()
py> s
'fail-fan'
py> # abusive to str

Probably better, if subclassing str, would be something more explicit:

py> s = MyString('mail man')
py> s
'mail man'
py> s = s.fix_ms()
py> s
'fail fan'
py> s = s.fix_whitespace()
py> s
'fail-fan'
py> s = MyString('mail man')
py> s
'mail man'
py> s.clean_up()
'fail-fan'

In this way, users would not need to understand the implementation of
MyString (i.e. that it gets cleaned by default), and its behavior more
intuitively resembles the built-in str class--except that MyString has
added functionality.

> Is there some performance
> decrement that results from subclassing str like this? (Unfortunately
> my implementation seems to have a pretty large memory footprint, 400mb
> for about 400,000 files.) Or do you just mean from a philosophical
> standpoint?

Philosophical, from the standpoint of desiring intuitively usable,
reusable, and maintainable code.

> I guess I don't understand what benefits come from using
> UserString instead of just str.

Probably not many if you think of MyString as I suggest above. But if
you want it to be magic, as you described originally, then you might
think about UserString.
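To make the contrast concrete, here is a minimal sketch of both styles.
It is an illustration only, not code from earlier in the thread. The
method names fix_ms, fix_whitespace, and clean_up follow the session
above; MagicString and its clean method are hypothetical names for the
UserString variant, and the import assumes Python 3, where UserString
lives in the collections module.

```python
import string
from collections import UserString  # assumes Python 3; older releases had a UserString module


class MyString(str):
    """str subclass whose methods return new MyString instances."""

    def fix_ms(self):
        # str.replace returns a plain str, so wrap the result again.
        return MyString(self.replace('m', 'f'))

    def fix_whitespace(self):
        # Replace every whitespace character with a hyphen.
        return MyString(''.join('-' if c in string.whitespace else c
                                for c in self))

    def clean_up(self):
        # Chain the explicit steps; the original value is left untouched.
        return self.fix_whitespace().fix_ms()


class MagicString(UserString):
    """Hypothetical composition variant: the wrapped str lives in .data."""

    def clean(self):
        # Rebinding .data changes what this object holds without
        # pretending that a real str is mutable.
        self.data = self.data.replace('m', 'f')


s = MyString('mail man')
cleaned = s.clean_up()      # new object; s keeps its original value

magic = MagicString('mail man')
magic.clean()               # changes the wrapped value in place
```

With the str subclass, every transformation is visible at the call site
because you have to keep the return value. With the UserString variant,
the object changes its own contents, which is only reasonable because it
wraps a str rather than being one.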
James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/