On Jul 11, 9:49 pm, James Stroud <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > I'd like to implement a subclass of string that works like this:
> >
> > >>> m = MyString('mail')
> > >>> m == 'fail'
> > True
> > >>> m == 'mail'
> > False
> > >>> m in ['fail', 'hail']
> > True
> >
> > My best attempt for something like this is:
> >
> > class MyString(str):
> >     def __init__(self, seq):
> >         if self == self.clean(seq): pass
> >         else: self = MyString(self.clean(seq))
> >
> >     def clean(self, seq):
> >         seq = seq.replace("m", "f")
> >
> > but this doesn't work. Nothing gets changed.
> >
> > I understand that I could just remove the clean function from the
> > class and call it every time, but I use this class in several
> > locations, and I think it would be much safer to have it do the
> > cleaning itself.
>
> The "flat is better than nested" philosophy suggests that clean should
> be module level and you should initialize a MyString like so:
>
> m = MyString(clean(s))
>
> Where clean is
>
> def clean(astr):
>     return astr.replace('m', 'f')
>
> Although it appears compulsory to call clean each time you instantiate
> MyString, note that you do it anyway when you check in your __init__.
> Here, you are explicit. Such an approach also eliminates the obligation
> to clean the string under conditions where you know it will already be
> clean--such as deserialization.
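Two points above can be checked directly: assigning to `self` inside `__init__` cannot change a `str` instance, and a module-level `clean` keeps the cleaning step explicit. The following is a minimal Python 3 sketch; `BrokenString` is a hypothetical name reproducing the original mistake in simplified form (here `clean` does return its result, so only the `self` rebinding is at fault).

```python
def clean(astr):
    """Module-level cleaner, as suggested above: replace 'm' with 'f'."""
    return astr.replace('m', 'f')


class BrokenString(str):
    """Hypothetical class repeating the original mistake."""

    def __init__(self, seq):
        # str is immutable and its value was already fixed by str.__new__
        # before __init__ runs. Assigning to `self` only rebinds a local
        # name inside this method; the object handed to the caller is
        # untouched.
        self = clean(seq)


class MyString(str):
    """Plain subclass; the caller cleans explicitly before constructing."""


broken = BrokenString('mail')
m = MyString(clean('mail'))
print(broken)  # mail
print(m)       # fail
```

Because the value of a `str` is decided at construction, the only places the cleaning can happen are before the call (the explicit approach) or inside `__new__`.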
Initially, I tried simply calling a clean function on a regular string, without any of this messy subclassing. However, I would end up accidentally cleaning it more than once, and transforming the string was just very messy. I thought that it would be much easier to clean the string once, and then add methods that would give me the various transformations that I wanted from the cleaned string. Using __new__ seems to be the solution I was looking for.

> Also, you don't return anything from clean above, so you assign None to
> self here:
>
> self = MyString(self.clean(seq))
>
> Additionally, it has been suggested that you use __new__. E.g.:
>
> py> class MyString(str):
> ...     def __new__(cls, astr):
> ...         astr = astr.replace('m', 'f')
> ...         return super(MyString, cls).__new__(cls, astr)
> ...
> py> MyString('mail')
> 'fail'
>
> But this is an abuse of the str class if you intend to populate your
> subclasses with self-modifying methods such as your clean method. In
> this case, you might consider composition, wherein you access an
> instance of str as an attribute of class instances. The python standard
> library makes this easy with the UserString class and the ability to add
> custom methods to its subclasses:

What constitutes an abuse of the str class? Is there some performance penalty that results from subclassing str like this? (Unfortunately my implementation seems to have a pretty large memory footprint, 400 MB for about 400,000 files.) Or do you just mean from a philosophical standpoint? I guess I don't understand what benefits come from using UserString instead of just str.

Thanks for the help,
Chris

> py> from UserString import UserString as UserString
> py> class MyClass(UserString):
> ...     def __init__(self, astr):
> ...         self.data = self.clean(astr)
> ...     def clean(self, astr):
> ...         return astr.replace('m', 'f')
> ...
> py> MyClass('mail')
> 'fail'
> py> type(_)
> <type 'instance'>
>
> This class is much slower than str, but you can always access an
> instance's data attribute directly if you want fast read-only behavior.
>
> py> astr = MyClass('mail').data
> py> astr
> 'fail'
>
> But now you are back to a built-in type, which is actually the
> point--not everything needs to be in a class. This isn't Java.
>
> James
>
> --
> James Stroud
> UCLA-DOE Institute for Genomics and Proteomics
> Box 951570
> Los Angeles, CA 90095
>
> http://www.jamesstroud.com/

--
http://mail.python.org/mailman/listinfo/python-list
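For comparison, here is a Python 3 sketch of the two approaches discussed in this thread: cleaning once in `__new__` on a `str` subclass, and composition via `UserString`. The `UserString` module used above exists only in Python 2; in Python 3 the class lives in `collections`. The names `CleanStr` and `CleanUserString` are hypothetical. The empty `__slots__` is an addition not mentioned in the thread: it keeps instances of the `str` subclass from carrying a per-instance `__dict__`, which may be worth checking when hundreds of thousands of instances are alive at once.

```python
from collections import UserString


def clean(astr):
    """Replace every 'm' with 'f' (the transformation from the thread)."""
    return astr.replace('m', 'f')


class CleanStr(str):
    """str subclass that cleans its value once, at construction time."""

    # Assumption: an empty __slots__ suppresses the per-instance __dict__
    # that a str subclass would otherwise get.
    __slots__ = ()

    def __new__(cls, astr):
        # str is immutable, so the value must be set here, not in __init__.
        return super().__new__(cls, clean(astr))


class CleanUserString(UserString):
    """Composition: the cleaned str is stored in the .data attribute."""

    def __init__(self, astr):
        super().__init__(clean(astr))


m = CleanStr('mail')
print(m == 'fail', m == 'mail', m in ['fail', 'hail'])  # True False True

u = CleanUserString('mail')
print(u == 'fail', u.data)  # True fail
```

One practical difference: inherited `str` methods such as `m.upper()` or `m.replace('f', 'm')` return plain `str` objects, so derived values silently drop the subclass. `UserString` methods instead rebuild their results through `self.__class__`, so every derived value passes through `clean` again; for example, `u.replace('f', 'm')` still compares equal to `'fail'`.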