Re: [python-uk] [pyconuk] Minimalistic software transactional memory
Mike

I don't think that you can rely on the threadsafety of these functions.
Even if they are threadsafe in CPython (which I doubt that 'set' is), the
locking in Jython is more fine grained and would likely catch you out.

I would suggest that you should routinely wrap shared datamodels like these
in thread locks to be certain about things.

I would also suggest that a small change to the Value class would make it
possible for client code to subclass it, which might make it more flexible.

Here is my suggestion, bear in mind that I have not tested the thread
locking code beyond making sure that it runs :-)

Regards

Richard

    #!/usr/bin/python

    from threading import Lock, RLock

    class Locker(object):
        def __init__(self, lock):
            self._lock = lock

        def __call__(self):
            lock = self._lock

            def wrap(f):
                def newFunction(*args, **kw):
                    lock.acquire()
                    try:
                        return f(*args, **kw)
                    finally:
                        lock.release()
                return newFunction
            return wrap

    write_lock = Locker(Lock())
    read_lock = Locker(RLock())

    class ConcurrentUpdate(Exception):
        pass

    class Value(object):
        def __init__(self, version, value, store, key, *tup, **kw):
            self.version = version
            self.value = value
            self.store = store
            self.key = key
            self.tup = tup
            self.kw = kw

        def __repr__(self):
            return "Value" + repr((self.version, self.value))

        def set(self, value):
            self.value = value

        def commit(self):
            self.store.set(self.key, self)

        def clone(self):
            return self.__class__(self.version, self.value, self.store,
                                  self.key, *self.tup, **self.kw)

    class Store(object):
        def __init__(self, value_class=Value):
            self.store = {}
            self.value_class = value_class

        @read_lock()
        def get(self, key):
            return self.store[key].clone()

        @write_lock()
        def set(self, key, value):
            if not (self.store[key].version > value.version):
                self.store[key] = self.value_class(value.version+1,
                                                   value.value, self, key)
                value.version = value.version+1
            else:
                raise ConcurrentUpdate

        @read_lock()
        def using(self, key):
            try:
                return self.get(key)
            except KeyError:
                self.store[key] = self.value_class(0, None, self, key)
                return self.get(key)

        def dump(self):
            for k in self.store:
                print k, ":", self.store[k]

    S = Store()
    greeting = S.using("hello")
    print repr(greeting.value)
    greeting.set("Hello World")
    greeting.commit()
    print greeting
    S.dump()
    # --
    par = S.using("hello")
    par.set("Woo")
    par.commit()
    # --
    print greeting
    S.dump()
    # --
    greeting.set("Woo")
    try:
        greeting.commit()
    except ConcurrentUpdate:
        print "Received ConcurrentUpdate exception"
    print repr(greeting), repr(greeting.value)
    S.dump()

On Saturday 08 December 2007, Michael Sparks wrote:
> I've just posted this message on comp.lang.python fishing for comments, but
> I'll post it here in case there's any feedback this way :-)
>
> (apologies for any dupes people get - I don't *think* python-uk & pyconuk are
> strict subsets)
>
> I'm interested in writing a simple, minimalistic, non persistent (at this
> stage) software transactional memory (STM) module. The idea being it should
> be possible to write such a beast in a way that can be made threadsafe fairly
> easily.
>
> For those who don't know, STM is a really fancy way of saying variables
> with version control (as far as I can tell :-) designed to enable threadsafe
> shared data.
>
> I'm starting with the caveat here that the following code is almost
> certainly not threadsafe (not put any real thought into that as yet),
> and I'm interested in any feedback on the following:
>
> * Does the API look simple enough?
> * Are there any glaring mistakes in the code? (It's always harder to see
>   your own bugs)
> * What key areas appear least threadsafe, and any general suggestions
>   around that.
>
> If I get no feedback I hope this is of interest. Since these things get
> archived, if you're reading this a month, 6 months, a year or more from
> now, I'll still be interested in feedback...
>
> OK, API.
>
> First of all we need to initialise the store:
>
>     S = Store()
>
> We then want to get a value from the store such that we can use the value,
> and do stuff with it:
>
>     greeting = S.using("hello")
>
> Access the value:
>
>     print repr(greeting.value)
>
> Update the value:
>
>     greeting.set("Hello World")
>
> Commit the value back to the store:
>
>     greeting.commit()
>
> If you have concurrent updates of the same value, the following exception
> gets thrown:
>
>     ConcurrentUpdate
>
> cf:
>
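
For anyone reading the API walkthrough above in the archive, here is a
minimal sketch of the checkout/commit cycle, written in Python 3 syntax and
with no locking at all (so it is not threadsafe). The Store and Value classes
are pared down from the code in this thread, and the deepcopy in clone follows
Michael's later version. update_with_retry is a hypothetical helper that is
not part of any posted code; it only illustrates one way client code might
react to ConcurrentUpdate.

```python
import copy


class ConcurrentUpdate(Exception):
    """Raised when a commit is based on a stale version."""


class Value:
    def __init__(self, version, value, store, key):
        self.version = version
        self.value = value
        self.store = store
        self.key = key

    def set(self, value):
        self.value = value

    def commit(self):
        self.store.set(self.key, self)

    def clone(self):
        # Hand out a copy so callers never mutate the stored object directly.
        return Value(self.version, copy.deepcopy(self.value), self.store, self.key)


class Store:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store[key].clone()

    def set(self, key, value):
        # Reject the commit if the store has moved on since the checkout.
        if self.store[key].version > value.version:
            raise ConcurrentUpdate(key)
        self.store[key] = Value(value.version + 1, copy.deepcopy(value.value), self, key)
        value.version += 1

    def using(self, key):
        if key not in self.store:
            self.store[key] = Value(0, None, self, key)
        return self.get(key)


def update_with_retry(store, key, func, attempts=3):
    """Hypothetical helper: re-checkout and re-apply func on ConcurrentUpdate."""
    for _ in range(attempts):
        var = store.using(key)
        var.set(func(var.value))
        try:
            var.commit()
            return var
        except ConcurrentUpdate:
            continue
    raise ConcurrentUpdate(key)


S = Store()
greeting = S.using("hello")
greeting.set("Hello World")
greeting.commit()            # store is now at version 1

par = S.using("hello")       # second checkout, also at version 1
par.set("Woo")
par.commit()                 # store moves to version 2

greeting.set("Stale")
try:
    greeting.commit()        # greeting is still at version 1, so this fails
    stale_rejected = False
except ConcurrentUpdate:
    stale_rejected = True

result = update_with_retry(S, "hello", lambda old: old + "!")
```

The stale commit is refused because the store's version is ahead of the
checkout's version; the retry helper simply takes a fresh checkout and tries
again, which is the usual optimistic pattern.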
Re: [python-uk] [pyconuk] Minimalistic software transactional memory
Hi Richard,

On Tuesday 11 December 2007 13:36, Richard Taylor wrote:
> I don't think that you can rely on the threadsafety of these functions.
> Even if they are threadsafe in CPython (which I doubt that 'set' is), the
> locking in Jython is more fine grained and would likely catch you out.

It's perhaps worth noting in the version of the code I posted, in this
thread, where it said...

"""What key areas appear least threadsafe, and any general suggestions
around that."""

...I knew that set and using were not threadsafe, but wondered about other
parts. I perhaps should've been more explicit on that point. (I wanted to
simply post some ideas which showed the core logic without locking. Perhaps
a mistake :)

Anyhow, the current version is here:

https://kamaelia.svn.sourceforge.net/svnroot/kamaelia/branches/private_MPS_Scratch/Bindings/STM/Axon/STM.py

In that version, "set" now looks like this:

    def set(self, key, value):
        success = False
        if self.lock.acquire(0):
            try:
                if not (self.store[key].version > value.version):
                    self.store[key] = Value(value.version+1,
                                            copy.deepcopy(value.value),
                                            self, key)
                    value.version = value.version+1
                    success = True
            finally:
                self.lock.release()
        else:
            raise BusyRetry

        if not success:
            raise ConcurrentUpdate

and "using" has changed to "usevar" (using now relates to a collection):

    def usevar(self, key):
        try:
            return self.get(key)
        except KeyError:
            if self.lock.acquire(0):
                try:
                    self.store[key] = Value(0, None, self, key)
                finally:
                    self.lock.release()
            else:
                raise BusyRetry

            return self.get(key)

Since mutations of the store rely on acquiring the lock on the store, that
should be safe(r). User code doesn't have to worry about locks however -
which is of course the point of the code :-)

The reason for specifically using the acquire(0) call rather than the
acquire() call is that I want it to fail hard if the lock can't be acquired.
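
To make the acquire(0) behaviour concrete, here is a small sketch in Python 3
using only the standard threading module. guarded_update and the plain dict
are hypothetical stand-ins for the store's set method, not the code in the
repository linked above; BusyRetry is defined locally on the assumption that
it is an ordinary exception class.

```python
import threading


class BusyRetry(Exception):
    """Raised when the lock is already held and the caller should retry."""


lock = threading.Lock()


def guarded_update(store, key, value):
    # A non-blocking acquire returns at once: True if we now hold the lock,
    # False if somebody else does. acquire(0) in the thread is the same call.
    if not lock.acquire(blocking=False):
        raise BusyRetry(key)
    try:
        store[key] = value
    finally:
        lock.release()


data = {}
guarded_update(data, "hello", "Hello World")   # lock is free, so this succeeds

lock.acquire()                                 # simulate another holder
try:
    try:
        guarded_update(data, "hello", "Woo")
        busy = False
    except BusyRetry:
        busy = True
finally:
    lock.release()
```

With a blocking acquire() the second call would wait (here, forever, since the
same thread holds a non-reentrant Lock); the non-blocking form turns contention
into an exception the caller can see and retry on.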
I know it'd be nicer to have a finer grained lock here, but I'm personally
primarily going to be using this for rare operations rather than common
operations.

These locks above are of course in relation to write locking. I'll think
about the read locking you've suggested. Your locking looks incorrect on
using, since using both reads and writes the store (retrieve value & if not
present create & initialise). I also think the independent locks are a
misnomer, but they're useful for thinking about it.

> I would suggest that you should routinely wrap shared datamodels like these
> in thread locks to be certain about things.

Indeed. It makes the code look worse, so for this example I was really after
suggestions (like yours :-) of "OK, where does this break badly" as well as
"does the logic look sane?".

> I would also suggest that a small change to the Value class would make it
> possible for client code to subclass it, which might make it more flexible.

I'm not convinced by the changes to Value - it's there for storing arbitrary
values, rather than extending Value itself. It's probably worth noting
that .clone has changed in my version to this:

    def clone(self):
        return Value(self.version, copy.deepcopy(self.value),
                     self.store, self.key)

Which includes deepcopy on the value stored by Value. I'm beginning to think
that Value should be called "Variable" to make this clearer...

> Here is my suggestion, bear in mind that I have not tested the thread
> locking code beyond making sure that it runs :-)

The feedback is much appreciated - it's making me think more about the read
locking aspect. I suspect the GIL in CPython *may* make the reads safe, but
the lack of a GIL in jython & ironpython probably renders the reads in
using & get unsafe.
(and I would like this to be safe in jython & ironpython)

It's interesting though, after having developed large amounts of code based
on no-shared-data & read-only/write-only pipes with data handoff and not
having had any major concurrency issues (despite mixing threads and non
threads), switching to a shared data model instantly causes problems. The
difference is really stark. One is simple, natural and easy and the other is
subtle & problematic. I'm not shocked, but find it amusing :-)

Many thanks for the feedback!

Michael.
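
For contrast with the shared store, here is a rough sketch of the
no-shared-data style mentioned above, in Python 3. It uses the standard
library's queue.Queue as a stand-in for a one-way pipe; this is an assumption
made for illustration and is not Kamaelia's or Axon's actual API. The
producer and consumer functions and the None sentinel are likewise made up
for the example.

```python
import queue
import threading


def producer(outbox):
    for word in ["Hello", "World"]:
        outbox.put(word)    # hand the item off; the producer never touches it again
    outbox.put(None)        # sentinel: nothing more to send


def consumer(inbox, results):
    while True:
        item = inbox.get()
        if item is None:
            break
        results.append(item.upper())   # only the consumer writes to results


pipe = queue.Queue()
results = []
threads = [
    threading.Thread(target=producer, args=(pipe,)),
    threading.Thread(target=consumer, args=(pipe, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each piece of data has exactly one owner at a time, so there is no version
check, no retry and no explicit lock in user code; the queue does the
synchronisation internally.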