vegetax wrote:
> Steven Bethard wrote:
>
> > vegetax wrote:
> >> How can i make my custom class an element of a set?
> >>
> >> class Cfile:
> >>   def __init__(s,path): s.path = path
> >>
> >>   def __eq__(s,other):
> >>     print 'inside equals'
> >>     return not os.popen('cmp %s %s' % (s.path,other.path)).read()
> >>
> >>   def __hashcode__(s): return s.path.__hashcode__()
> >>
> >> the idea is that it accepts file paths and construct a set of unique
> >> files (the command "cmp" compares files byte by byte.),the files can
> >> have different paths but the same content
> >>
> >> but the method __eq__ is never called
[snip]
> I just tried and it wont be called =(, so how can i generate a hash code for
> the CFile class? note that the comparitions(__eq__) are done based on the
> contents of a file using the command 'cmp', i guess thats not posible but
> thanks.

Let me suggest that, if your idea is to get a set of files all with
unique contents, comparing each file byte-by-byte against every file
already in the set is going to be absurdly inefficient. Instead, I
recommend comparing md5 (or sha) digests. The idea is that you read in
each file once, calculate an md5 digest, and compare the digests
instead of the file contents.

. import md5
.
. class Cfile:
.     def __init__(self, path):
.         self.path = path
.         # Hash the file's bytes once, up front; binary mode avoids
.         # newline translation.
.         self.md5 = md5.new(open(path, 'rb').read()).digest()
.     def __eq__(self, other):
.         return self.md5 == other.md5
.     def __hash__(self):
.         return hash(self.md5)

This is kind of hackish (not to mention untested). You would probably
do better to mmap the file (see the mmap module) rather than read it.

And, in case you're wondering: yes, it is theoretically possible for
different files to have the same md5. However, the chances are
microscopic. (Incidentally, the SCons build system uses MD5 to decide
if a file has been modified.)

-- 
CARL BANKS
-- 
http://mail.python.org/mailman/listinfo/python-list
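
For current Python versions, where the md5 module has been replaced by hashlib, the digest idea above might look something like the sketch below. It is an illustration under assumptions, not the code from this thread: the chunk_size parameter and the unique_files helper are hypothetical names, and it reads the file in fixed-size chunks rather than using mmap.

```python
import hashlib


class Cfile:
    """A file identified by the MD5 digest of its contents, not by its path."""

    def __init__(self, path, chunk_size=65536):  # chunk_size is an assumed default
        self.path = path
        digest = hashlib.md5()
        # Read in fixed-size chunks so a large file is never held in memory whole.
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        self.md5 = digest.digest()

    def __eq__(self, other):
        if not isinstance(other, Cfile):
            return NotImplemented
        return self.md5 == other.md5

    def __hash__(self):
        # Objects that compare equal must hash equal, so hash the digest too.
        return hash(self.md5)


def unique_files(paths):
    """Hypothetical helper: one Cfile per distinct file content among paths."""
    return set(Cfile(p) for p in paths)
```

A set only calls __eq__ on objects whose hashes collide, which is why both methods are keyed on the same digest. Two paths with identical contents then collapse into a single set element.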