Hi!

----- Original Message -----
> From: "Benjamin Berg" <benja...@sipsolutions.net>
> First, it doesn't seem like my patch actually works, so please do not
> merge it. It actually appears that tree RCU and tiny RCU (which are
> selected depending on the preemption setting) are behaving differently.
>
> So now I am wondering if I can come up with a hack that works for both.

Ok!

> On Fri, 2024-09-13 at 13:47 +0200, Richard Weinberger wrote:
>> ----- Original Message -----
>> > From: "Benjamin Berg" <benja...@sipsolutions.net>
>> > > While I acknowledge that time-travel itself is a beautiful hack, I'd
>> > > like to keep the hacks to keep it working minimal.
>> > > So, the problem here is that RCU callbacks never run and just pile up?
>> >
>> > Yes. A simple example of this is doing a "find /". This will allocate a
>> > lot of inode information which is only freed at a later point.
>> >
>> > > I wonder why such a situation does not happen in a nohz_full setup on
>> > > regular systems.
>> >
>> > Had to search for a bit. But, I think the boot CPU will still have a
>> > tick even on a NOHZ_FULL setup. See the nohz_full= boot parameter.
>> >
>> > It does look like the RCU code might try to force scheduling (tiny RCU)
>> > or wake up a worker (tree RCU) in these situations. But neither of
>> > these attempts is going to fix the situation as there will be no call
>> > to rcu_sched_clock_irq with time-travel.
>>
>> Agreed. I think having a housekeeping CPU (thread) will not work in
>> time-travel mode.
>> Kicking RCU whenever a syscall is executed is okay, the question is,
>> are there other scenarios where RCU work can pile up and no syscall is
>> run for a long time? Maybe we need to kick it at other places (page fault
>> handler?) too.
>
> Hmm, that is a good question. I assume that implies major faults for
> mapped files (or anonymous memory from swap) happening. I suppose, that
> can trigger just about anything in the kernel and could also create
> load on the RCU. Not sure how problematic that is, in our case it was
> python importing a large amount of files and bringing the system to its
> knees in the process.

I also had workloads like heavy network processing without userspace
interaction in mind.

> Anyway, I'll need to reconsider the hack a bit, maybe we can find a
> better solution.

We can also add RCU folks into the loop. But I guess they first need a
good introduction to what time-traveling is. :-D

Thanks,
//richard