Hi Paul,

I was asking Sasha about [1] because other folks at Oracle have also run into similar RCU stalls with the v4.1 kernel under different workloads. A similar issue was reported to me with RDS as well, and after looking at [1], [2], [3] and [4], I thought of reaching out to see whether you can help us understand this issue better.

I have also included the RCU-specific config used in these tests [5]. The issue is very hard to reproduce, but one data point is that it reproduces on systems with a larger number of CPUs (64+). The same workload on systems with fewer than 64 CPUs doesn't show the issue. Someone also told me that using the SLAB allocator instead of SLUB makes a difference, but I haven't verified that for RDS.

Let me know your thoughts. Thanks in advance!

Regards,
Santosh

[1] https://lkml.org/lkml/2014/12/14/304
[2] log 1: http://pastebin.uk.oracle.com/iUr9qE
[3] log 2: http://pastebin.uk.oracle.com/Oe3cr5
[4] log 3: http://pastebin.uk.oracle.com/bMYLkD
[5] rcu config: http://pastebin.uk.oracle.com/e7NXTW