Hi Sudeep,
Interesting. Just curious if this is r0p0/p1 A53? If so, is the errata 819472 enabled?
Sorry for bringing this up after such a long delay, but I've been assured that the A53 involved is newer than r0p1. I've also confirmed this problem on multiple internal platforms, and I'm pretty sure that it occurs on any big.LITTLE (b.L) system out there today. We also found the same problematic lock design in the kernel's workqueue code, where it causes the same livelock. It's very rare and requires a perfect set of circumstances.
If it would help, I can provide a unit test, provided you folks would be generous enough to run it on the latest Juno or another b.L platform that's also upstream.
Thanks, Vikram