Hi, I am looking at a FUSE scalability issue that I hit recently on my multi-socket server machines: http://permalink.gmane.org/gmane.comp.file-systems.fuse.devel/13490
The short description is that we have several thousand threads accessing a FUSE filesystem. FUSE spawns several threads (usually 4-8) that read requests from the kernel, process them, and write the responses back to the kernel. The problem is that the fuse kernel module keeps all active requests in a single global list. Producers and consumers run on many different CPUs, so taking the lock that protects this list is a very expensive operation. I have a test case that shows ~35% of system time is spent in _raw_spin_lock when accessing this global list.

I want to solve this scalability problem. One idea is to stop using the spinlock here: it is the 'fair spin_lock' that has scalability problems (http://pdos.csail.mit.edu/papers/linux:lock.pdf). Maybe lockless data structures can help?

Another idea is to avoid global data structures altogether, but I have a few questions about this. Let's say we want to use per-CPU lists. The problem is that producers and consumers are not evenly distributed across CPUs: some CPUs might have too many producers, while others might have no consumers at all. So we need some kind of migration of requests from hot CPUs to cold ones. What is the best way to achieve this? Are there any examples of how to do it?

Any other ideas?
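To make the per-CPU idea with migration concrete, here is a minimal user-space sketch, not the fuse module's actual code. It assumes the global request list is split into one queue per CPU, each with its own small lock; producers push onto their local queue, and a consumer drains its local queue first and "steals" from other queues only when its own is empty, which is one form of hot-to-cold migration. All names (NR_SHARDS, SHARD_CAP, enqueue, dequeue, queues_init) are hypothetical, requests are modeled as plain ints, and C11 atomic_flag stands in for a kernel spinlock_t / per-CPU variable.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes: a kernel version would use nr_cpu_ids and a list_head. */
#define NR_SHARDS 4
#define SHARD_CAP 256  /* power of two, so unsigned index wraparound stays consistent */

struct shard {
    atomic_flag lock;       /* per-shard lock: contention stays local to one queue */
    int items[SHARD_CAP];   /* ring buffer of request ids (hypothetical payload) */
    size_t head, tail;      /* head: next slot to pop, tail: next free slot */
};

static struct shard shards[NR_SHARDS];

static void shard_lock(struct shard *s)
{
    /* Simple test-and-set spin; acquire pairs with the release in unlock. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void shard_unlock(struct shard *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

void queues_init(void)
{
    for (int i = 0; i < NR_SHARDS; i++) {
        atomic_flag_clear(&shards[i].lock);
        shards[i].head = shards[i].tail = 0;
    }
}

/* Producer: always enqueue on the submitting CPU's own shard.
 * Returns false if that shard is full. */
bool enqueue(int cpu, int req)
{
    struct shard *s = &shards[cpu % NR_SHARDS];
    bool ok = false;

    shard_lock(s);
    if (s->tail - s->head < SHARD_CAP) {
        s->items[s->tail % SHARD_CAP] = req;
        s->tail++;
        ok = true;
    }
    shard_unlock(s);
    return ok;
}

/* Pop the oldest request from one shard, if any. */
static bool try_pop(struct shard *s, int *req)
{
    bool ok = false;

    shard_lock(s);
    if (s->head != s->tail) {
        *req = s->items[s->head % SHARD_CAP];
        s->head++;
        ok = true;
    }
    shard_unlock(s);
    return ok;
}

/* Consumer: try the local shard first, then scan the other shards in order
 * and steal from the first non-empty one (migration from hot to cold CPU). */
bool dequeue(int cpu, int *req)
{
    int home = cpu % NR_SHARDS;

    for (int i = 0; i < NR_SHARDS; i++)
        if (try_pop(&shards[(home + i) % NR_SHARDS], req))
            return true;
    return false;
}
```

For example, if all producers sit on CPU 0 and the only consumer runs on CPU 3, `dequeue(3, &r)` finds shard 3 empty and drains shard 0 in FIFO order. The trade-offs in this sketch: the common path touches only the local lock, remote locks are taken only when a consumer runs dry, and global FIFO ordering across CPUs is no longer guaranteed. A real implementation would probably also want to steal in batches and to check a remote queue for emptiness before locking it, to reduce cache-line bouncing between sockets.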