Hello Ingo, Peter. I am implementing non-LRU page migration and preparing v4 to resend: https://lkml.org/lkml/2016/3/30/56
Although the design has changed since v3, the issue I am going to describe is still the same, so I think it is not hard to understand the problem from v3 even though I haven't sent v4 yet. :)

My problem is in the zsmalloc part for supporting page migration. zsmalloc stores several compressed pages in a single page; let's call a compressed page an 'object'. If we are lucky, we can store 113 objects in a page (because the minimum slot size is 36 bytes).

If a page has internal fragmentation, zsmalloc tries to migrate an object from page A to page B. We call this 'object migration'. To prevent user access during object migration, we use a spinlock in the atomic path to save memory space. It works at object granularity, so users can still access the other objects in the page. (Strictly speaking, it is not a spin_lock but a home-grown, weird bit spinlock using test_and_set_bit in a while loop. I know it is a buggy mess, so I will replace it with a regular bit_spin_lock, but the issue is still there.) During object migration the lock is nested twice: one for the source object, the other for the destination object.

Let's get back to the issue. This time it is not object migration but page migration, and the steps are as follows. We try to migrate page A to page B, where B is a newly allocated page, so it is empty:

1. freeze every object in page A
     for each object in the page
          bit_spin_lock(object)
2. memcpy(B, A, PAGE_SIZE);
3. unfreeze every object in page A
     for each object in the page
          bit_spin_unlock(object)
4. put_page(A);

The logic is rather straightforward, I guess. :) (A simplified C sketch of this path is appended at the end of this mail.)

Here, the problem is that, unlike object migration, page migration needs to prevent access to all objects in the page at once before step 2. So, if we are lucky, we can increase preempt_count by 113 on every CPU, so preempt_count_add() easily emits the spinlock count overflow warning via DEBUG_LOCKS_WARN_ON when we have multiple CPUs (my machine has 12 CPUs).

I think there are several choices to fix it, but I am not sure which is best, so I want to hear your opinion:

1. increase the preempt_count size?
2. support bit_spin_lock_no_preempt()/bit_spin_unlock_no_preempt()?
3. redesign the zsmalloc page migration locking granularity?

I want to avoid 3 if possible because such a design would make the code very complicated and, I guess, could hurt scalability and performance. I also guess 8 bits for PREEMPT_BITS is too small considering the number of CPUs in recent systems? I hope I am not the only one to have seen this issue until now. :)

Thanks.
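P.S. Here is a simplified sketch of the freeze/unfreeze path above, just to make the locking pattern concrete. It is not the actual zsmalloc code; HANDLE_LOCK_BIT and the handles[] array are illustrative stand-ins for the per-object lock bit and the per-object handles in my patch:

#include <linux/bit_spinlock.h>
#include <linux/mm.h>
#include <linux/string.h>

/* illustrative stand-in for the per-object lock bit */
#define HANDLE_LOCK_BIT	0

static void migrate_zspage_sketch(struct page *src, struct page *dst,
				  unsigned long *handles, int nr_objs)
{
	int i;

	/*
	 * Step 1: freeze every object in the source page.  Every
	 * bit_spin_lock() call does a preempt_disable(), so preempt_count
	 * grows by one per locked object -- up to 113 for the 36-byte
	 * size class.
	 */
	for (i = 0; i < nr_objs; i++)
		bit_spin_lock(HANDLE_LOCK_BIT, &handles[i]);

	/* Step 2: copy the whole page while all objects are frozen. */
	memcpy(page_address(dst), page_address(src), PAGE_SIZE);

	/* Step 3: unfreeze every object, dropping preempt_count again. */
	for (i = 0; i < nr_objs; i++)
		bit_spin_unlock(HANDLE_LOCK_BIT, &handles[i]);

	/* Step 4: release the source page. */
	put_page(src);
}

The point is simply that step 1 performs nr_objs nested preempt_disable()s before any of them is paired with a preempt_enable() in step 3, and that nesting depth is what pushes preempt_count toward the DEBUG_LOCKS_WARN_ON threshold in preempt_count_add().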