I read the article. It's still based on a CAS, which I don't necessarily consider simpler than PLO. The article states, "Each hazard pointer can be written only by its associated thread, but can be read by all threads." This is exactly the provision I state in my example. It's a key element of any serialization algorithm that allows concurrent lock-free operations. Hazard pointers seem like an elegant general solution. I'll have to explore them, but I've not had issues with any of the solutions I've written using other methods. I write a solution appropriate to the usage of the serialized resource. The key is a rigid protocol that always has to be followed.
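To make the quoted rule concrete, here is a minimal sketch in C11 atomics of "written only by its associated thread, but read by all threads." It is not taken from the article and it is not PLO; the names hazard, hp_protect, hp_clear, hp_can_reclaim and the fixed MAX_THREADS are all made up for illustration, and the caller is assumed to pass its own slot number.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 4   /* hypothetical fixed number of threads */

/* One hazard slot per thread. Slot i is written only by thread i,
   but any thread may read it. */
static _Atomic(void *) hazard[MAX_THREADS];

/* Publish the pointer currently held in *src in the caller's own slot.
   Re-read *src afterwards and retry if it changed, so the published
   value is known to have been current after it became visible. */
static inline void *hp_protect(int self, _Atomic(void *) *src)
{
    void *p;
    do {
        p = atomic_load(src);
        atomic_store(&hazard[self], p);
    } while (p != atomic_load(src));
    return p;
}

/* The owning thread withdraws its claim. */
static inline void hp_clear(int self)
{
    atomic_store(&hazard[self], NULL);
}

/* Any thread may scan all slots. An element that has already been
   unlinked may be freed only if no slot still publishes it. */
static inline bool hp_can_reclaim(void *p)
{
    for (int i = 0; i < MAX_THREADS; i++)
        if (atomic_load(&hazard[i]) == p)
            return false;
    return true;
}
```

The point of the sketch is only the protocol: because each slot has a single writer, no compare-and-swap is needed on the slots themselves, and a thread that wants to free an element just reads everyone else's slots first.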
In my e-mail last night I meant transactional execution facility, not transactional event facility. I've thoroughly read the POM on this facility but have not to date had access to a processor to experiment with it. From my reading, the transactional execution facility (TM) appears to be a way to implement almost all requirements for concurrent read/write access to a serialized resource. Essentially, from my understanding of the POM, it's a CAS on as many operands as necessary within a hardware-defined limit.

In my very complex application, there are a couple of scenarios where I cannot use PLO to serialize access. I have no problem using multi-step PLO operations for serialization as long as the integrity of the resource is guaranteed after each step. For example, a delete that consists of a PLO to remove from the active chain and a separate PLO to add the removed element to a free chain. However, some operations are too complex for a PLO CAS and triple store, even if the operands are organized in storage such that you can modify 128 bits at a time. In these cases, I use a gate word and a spin lock. When available, the gate is 0. When in use, the gate contains identification information for the gate owner. I very rarely have to use these, and if I had a TM like the transactional execution facility, I would replace this spin lock with that facility. Normally, there are only a handful of instructions within the gate, so this has never caused me any problems.

In a sense, all methods (LL/CS, TM, etc.) are spin locks: if they don't succeed, try, try again. In all the methods that I know of, the hardware must perform a memory serialization function. I use PLO instead of CS not only because of the increased functionality, such as modifying noncontiguous areas and being able to modify up to four 128-bit areas, but also because I believe the PLO lock word is an advantage. As noted, every hardware serialization method that I know of requires a memory serialization function during the LL/CS or TM.
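For anyone who wants to see the gate word idea spelled out, here is a rough sketch in C11 atomics rather than assembler. It only illustrates the convention described above (0 when available, owner identification when in use); the names gate_t, gate_try_acquire, gate_acquire and gate_release are hypothetical, and a plain compare-and-swap stands in for whatever instruction a real implementation would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GATE_FREE 0u    /* gate word is 0 when available */

/* Gate word: 0 means available, any other value identifies the owner. */
typedef struct {
    _Atomic uint32_t word;
} gate_t;

/* One attempt to swap 0 -> owner_id. True if the caller now owns the gate. */
static inline bool gate_try_acquire(gate_t *g, uint32_t owner_id)
{
    uint32_t expected = GATE_FREE;
    assert(owner_id != GATE_FREE);
    return atomic_compare_exchange_strong_explicit(
        &g->word, &expected, owner_id,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the gate is obtained: if it doesn't succeed, try again. */
static inline void gate_acquire(gate_t *g, uint32_t owner_id)
{
    while (!gate_try_acquire(g, owner_id)) {
        /* Wait with plain loads so a waiter does not keep issuing swaps. */
        while (atomic_load_explicit(&g->word, memory_order_relaxed) != GATE_FREE)
            ;
    }
}

/* Only the current owner may reopen the gate. */
static inline void gate_release(gate_t *g, uint32_t owner_id)
{
    assert(atomic_load_explicit(&g->word, memory_order_relaxed) == owner_id);
    atomic_store_explicit(&g->word, GATE_FREE, memory_order_release);
}
```

Keeping the owner's identification in the word, instead of a simple 1, costs nothing extra and lets a dump or a recovery routine see who held the gate.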
These memory serialization functions can be expensive. CS is not granular, and the serialization proceeds without regard to accesses by other CPUs, meaning the overhead occurs whether the function succeeds or fails. The PLO lock word "gates" access by all processors using the same lock word, thus reducing the total number of serializations performed by "stopping" a processor performing a PLO using the same lock word. This requires careful selection of lock words and use of the same lock word by all processes that read/write to the same resource.

This advantage can be negated in a queue that has a substantial percentage of write operations compared to read operations, because write operations will necessarily result in a PLO failure. I believe this is the disadvantage referred to in your first paper on TM by Paul McKenney. I suspect that IBM is using TM to replace a lock and that in most cases the lock was used to serialize storage alterations. In this case, CPU would increase but so would throughput. I believe this is a classic example of trading the "less expensive" CPU resource for the "more expensive" throughput.

Kenneth

-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf Of David Crayford
Sent: Sunday, November 10, 2013 8:56 PM
To: [email protected]
Subject: Re: Serialization without Enque

On 11/11/2013 10:36 AM, Kenneth Wilkerson wrote:
> I read the article. This article is about transactional event facility
> introduced in z/EC-12 and not PLO which is an LL/CS. I wish I had access to a
> z/EC-12 with the transactional event facility to play with it and compare it
> to PLO. The transactional event facility is much more comprehensive and not
> as granular as a PLO. In PLO, the hardware locking occurs according to the
> lock word.

Transaction Memory sounds exciting but it's complex. IBM should put a layer of abstraction on top with simple semantics.

> I've done a lot of testing with PLO. It can increase CPU, particularly in a
> situation where updates are a much higher percentage of the operations. But in
> all applications that I've tested, its CPU overhead is offset by higher
> throughput. In a traditional locking method, tasks end up serializing to the
> lock.

There are lots of lock-free algorithms out there that do the job just using a simple CAS. RCU, hazard pointers to name but a few. Hazard pointers are interesting in how they deal with ownership http://www.research.ibm.com/people/m/michael/podc-2002.pdf (IBM patent warning!).

PLO is a fine instruction. It makes it easy to implement lock-free multi-producer multi-consumer stacks/queues. I'm interested in how one would use PLO to implement a fair reader/writer lock. I've seen some interesting bakery style ticketing algorithms. They're basically spinlocks on steroids.

>
> Kenneth
>
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]]
> On Behalf Of David Crayford
> Sent: Sunday, November 10, 2013 6:50 PM
> To: [email protected]
> Subject: Re: Serialization without Enque
>
> On 11/11/2013 5:19 AM, Mark Zelden wrote:
>> On Sat, 9 Nov 2013 19:47:35 GMT, [email protected] <[email protected]> wrote:
>>
>>> I have been reading and following this thread since PLO is not an
>>> instruction I use every day.
>>> It would be nice if someone would actually post some working code using a
>>> PLO instruction, to illustrate how one would add an element to a queue and
>>> remove an element from a queue.
>>>
>>> Paul D'Angelo
>> I've not been paying that close of attention, but I'm more curious
>> about what people did for these situations prior to PLO.
> They used smart algorithms using the atomic instructions they had, like RCU
> http://en.wikipedia.org/wiki/Read-copy-update. It's interesting that I have
> never seen any use of the PLO instruction in the zLinux kernel code.
>
> Paul McKenney, IBM's expert on these things, wrote a good article that
> suggests that Hardware Transaction Memory may not be the panacea we all
> expect it to be, and in some cases may actually increase CPU
> http://paulmck.livejournal.com/31285.html.
>
>
>> Mark
>> --
>> Mark Zelden - Zelden Consulting Services - z/OS, OS/390 and MVS
>> mailto:[email protected] ITIL v3 Foundation Certified Mark's MVS
>> Utilities: http://www.mzelden.com/mvsutil.html
>> Systems Programming expert at
>> http://search390.techtarget.com/ateExperts/
>> ----------------------------------------------------------------------
>> For IBM-MAIN subscribe / signoff / archive access instructions,
>> send email to [email protected] with the message: INFO
>> IBM-MAIN
