On Jul 11, 9:49 pm, James Stroud <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > I'd like to implement a subclass of string that works like this:
>
> m = MyString('mail')
> m == 'fail'
>
> > True
>
> m == 'mail'
>
> > False
>
> m in ['fail', 'hail']
>
> > True
>
> > My best attempt for something like this is:
>
> > class MyString(str):
> >     def __init__(self, seq):
> >         if self == self.clean(seq): pass
> >         else: self = MyString(self.clean(seq))
>
> >     def clean(self, seq):
> >         seq = seq.replace("m", "f")
>
> > but this doesn't work. Nothing gets changed.
>
> > I understand that I could just remove the clean function from the
> > class and call it every time, but I use this class in several
> > locations, and I think it would be much safer to have it do the
> > cleaning itself.
>
> The "flat is better than nested" philosophy suggests that clean should
> be module level and you should initialize a MyString like such:
>
>     m = MyString(clean(s))
>
> Where clean is
>
>     def clean(astr):
>         return astr.replace('m', 'f')
>
> Although it appears compulsory to call clean each time you instantiate
> MyString, note that you do it anyway when you check in your __init__.
> Here, you are explicit. Such an approach also eliminates the obligation
> to clean the string under conditions where you know it will already be
> clean--such as deserialization.
Initially, I tried simply calling a clean function on a regular
string, without any of this messy subclassing. However, I would end
up accidentally cleaning it more than once, and transforming the
string was just very messy. I thought that it would be much easier to
just clean the string once, and then add methods that would give me
the various transformations that I wanted from the cleaned string.
Using __new__ seems to be the solution I was looking for.
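For the record, here is a rough sketch of what I mean by "clean once, then add methods", written for Python 3. The isinstance check and the shouted method are illustrative assumptions of mine, not something from James's reply:

```python
class MyString(str):
    """A str whose value is cleaned once, at construction time."""

    def __new__(cls, value):
        # An instance that is already a MyString passes through untouched,
        # so wrapping it again cannot re-run the cleaning step.
        if isinstance(value, cls):
            return value
        return super().__new__(cls, cls._clean(value))

    @staticmethod
    def _clean(value):
        # Placeholder rule from this thread: replace every 'm' with 'f'.
        return value.replace("m", "f")

    def shouted(self):
        # Hypothetical transformation method; returns a plain str.
        return self.upper()


m = MyString("mail")
```

With this, m compares equal to 'fail', and MyString(m) hands back the same object instead of cleaning it a second time.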
>
> Also, you don't return anything from clean above, so self.clean(seq)
> evaluates to None here:
>
> self = MyString(self.clean(seq))
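A small sketch of the two problems, assuming Python 3. The names Broken and clean_without_return are made up for illustration. A str's value is fixed by __new__ before __init__ runs, and assigning to self inside a method only rebinds a local name:

```python
def clean_without_return(seq):
    # The result is bound to the local name and then discarded,
    # so the function implicitly returns None.
    seq = seq.replace("m", "f")


class Broken(str):
    def __init__(self, seq):
        # Rebinding the local name `self` does not change the object
        # that the caller receives.
        self = seq.replace("m", "f")


b = Broken("mail")
result = clean_without_return("mail")
```

Here b is still 'mail' and result is None, which is why the cleaning has to happen in __new__ (or before the constructor is called) rather than in __init__.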
>
> Additionally, it has been suggested that you use __new__. E.g.:
>
> py> class MyString(str):
> ...     def __new__(cls, astr):
> ...         astr = astr.replace('m', 'f')
> ...         return super(MyString, cls).__new__(cls, astr)
> ...
> py> MyString('mail')
> 'fail'
>
> But this is an abuse of the str class if you intend to populate your
> subclasses with self-modifying methods such as your clean method. In
> this case, you might consider composition, wherein you access an
> instance of str as an attribute of class instances. The Python standard
> library makes this easy with the UserString class and the ability to add
> custom methods to its subclasses:
What constitutes an abuse of the str class? Is there some performance
penalty that results from subclassing str like this? (Unfortunately,
my implementation seems to have a pretty large memory footprint: 400 MB
for about 400,000 files.) Or do you just mean from a philosophical
standpoint? I guess I don't understand what benefits come from using
UserString instead of just str.
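One thing that may be worth checking on the memory side, offered as a guess and not a diagnosis of the 400 MB figure: instances of an ordinary str subclass can carry a per-instance __dict__, and declaring an empty __slots__ removes it. A minimal sketch, assuming Python 3, with the made-up class names Plain and Slim:

```python
class Plain(str):
    """Ordinary str subclass: each instance can hold a __dict__."""


class Slim(str):
    """Same behaviour as str, but with no per-instance __dict__."""
    __slots__ = ()


p, s = Plain("fail"), Slim("fail")
p.note = "allowed"  # works, because Plain instances have a __dict__

try:
    s.note = "rejected"
    slim_accepts_attrs = True
except AttributeError:
    # Slim instances have nowhere to store arbitrary attributes.
    slim_accepts_attrs = False
```

Whether this makes a measurable difference for 400,000 objects would have to be measured; the sketch only shows that the two classes differ in what each instance carries.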
Thanks for the help,
Chris
>
> py> from UserString import UserString
> py> class MyClass(UserString):
> ...     def __init__(self, astr):
> ...         self.data = self.clean(astr)
> ...     def clean(self, astr):
> ...         return astr.replace('m', 'f')
> ...
> py> MyClass('mail')
> 'fail'
> py> type(_)
>
>
> This class is much slower than str, but you can always access an
> instance's data attribute directly if you want fast read-only behavior.
>
> py> astr = MyClass('mail').data
> py> astr
> 'fail'
>
> But now you are back to a built-in type, which is actually the
> point--not everything needs to be in a class. This isn't Java.
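For anyone reading this on Python 3, where UserString lives in the collections module, the composition approach above might look roughly like this. It is a sketch under that assumption, and making clean a staticmethod is my own choice:

```python
from collections import UserString  # Python 3 location of the class


class MyClass(UserString):
    def __init__(self, astr):
        # UserString keeps the wrapped plain str in self.data.
        super().__init__(self.clean(astr))

    @staticmethod
    def clean(astr):
        # Placeholder rule from this thread: replace every 'm' with 'f'.
        return astr.replace("m", "f")


wrapped = MyClass("mail")
plain = wrapped.data  # the underlying built-in str
```

Note that wrapped compares equal to 'fail' but is not itself a str instance; only wrapped.data is.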
>
> James
>
> --
> James Stroud
> UCLA-DOE Institute for Genomics and Proteomics
> Box 951570
> Los Angeles, CA 90095
>
> http://www.jamesstroud.com/
--
http://mail.python.org/mailman/listinfo/python-list