On Thu, 20 Apr 2023, Michael Schmitz wrote:

> Can you try and fault in as many of these stack pages as possible, ahead
> of filling the stack? (Depending on how much RAM you have ...). Maybe we
> would need to lock those pages into memory? Just to show that with no
> page faults (but still signals) there is no corruption?
>

OK.

> > Any signal frames or exception frames have been completely overwritten
> > because the recursion continued after the corruption took place. So
> > there's not much to see in the core dump.
>
> We'd need a way to stop recursion once the first corruption has taken
> place. If the 'safe' recursion depth of 10131 is constant, the dump
> taken at that point should look similar to what you saw in dash
> (assuming it is the page fault and subsequent signal return that causes
> the corruption).
>

It turns out that the recursion depth can be set a lot lower than the
200000 that I chose in that test program. (I used that value because it
kept the stack size just below the default 8192 kB limit.)

At depth = 2500, a failure is around 95% certain. At depth = 2048 I can
still get an intermittent failure. This only required 21 stack page
faults and one fork.

I suspect that the location of the corruption is somewhat random, and
the larger the stack happens to be when the signal comes in, the better
the odds of detection.
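For reference, a minimal sketch of the pre-faulting experiment suggested above, under stated assumptions: the 256 KiB pre-fault size, the canary-checking recurse() and the run_experiment() wrapper are hypothetical and are not the test program from this thread, whose signals and fork() are not reproduced here. It writes one byte in every page of a large stack array so those pages are mapped before the recursion starts, then calls mlockall(MCL_CURRENT) to keep them resident. If mlockall() fails (e.g. under a small RLIMIT_MEMLOCK), it carries on unlocked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed pre-fault size (hypothetical): generously more than a
 * depth-2048 recursion of small frames should need. */
#define PREFAULT_BYTES (256 * 1024)

/* Write one byte in every page of a large local array so the kernel maps
 * those stack pages now rather than on demand during the recursion. */
static __attribute__((noinline)) void prefault_stack(void)
{
    volatile char buf[PREFAULT_BYTES];
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);
    for (size_t i = 0; i < sizeof buf; i += (size_t)page)
        buf[i] = 0;
}

/* Lock all currently mapped pages, including the pre-faulted stack.
 * MCL_CURRENT only: with MCL_FUTURE, a small RLIMIT_MEMLOCK could make
 * later allocations fail. Returns 0 on success, -1 on failure. */
static int lock_pages(void)
{
    if (mlockall(MCL_CURRENT) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}

/* Hypothetical stand-in for the test program's recursion: each frame keeps
 * a canary on the stack and re-checks it after the deeper calls return.
 * Returns the number of frames whose canary changed. */
static __attribute__((noinline)) int recurse(int depth)
{
    volatile int canary = depth;
    int bad = depth > 0 ? recurse(depth - 1) : 0;

    if (canary != depth)
        bad++;
    return bad;
}

/* Hypothetical driver: pre-fault the stack, try to lock it, then recurse
 * through the region that prefault_stack() already mapped. */
static int run_experiment(int depth)
{
    prefault_stack();
    if (lock_pages() != 0)
        fprintf(stderr, "continuing without locked pages\n");
    return recurse(depth);
}
```

On a healthy system run_experiment(2048) should return 0. Both prefault_stack() and recurse() are called from run_experiment(), so the recursion reuses the stack region that was just faulted in; in the real test, signal delivery would have to be added back to show "no page faults, but still signals".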