On 6/30/2025 9:30 PM, Mikulas Patocka wrote:
On Tue, 24 Jun 2025, Dongsheng Yang wrote:
Hi Mikulas,
This is V1 for dm-pcache, please take a look.
Code:
https://github.com/DataTravelGuide/linux tags/pcache_v1
Changelogs from RFC-V2:
- use crc32c to replace crc32
- only retry pcache_req when the cache is full: add pcache_req to defer_list
and wait for cache invalidation to happen.
- new format for the pcache table, which is easier to extend with
new parameters later.
- remove __packed.
- use spin_lock_irq in req_complete_fn to replace
spin_lock_irqsave.
- fix bug in backing_dev_bio_end with spin_lock_irqsave.
- queue_work() inside spinlock.
- introduce inline_bvecs in backing_dev_req.
- use kmalloc_array for bvecs allocation.
- calculate ->off with dm_target_offset() before using it.
Hi
The out-of-memory handling still doesn't seem right.
If the GFP_NOWAIT allocation doesn't succeed (which may happen anytime,
for example it happens when the machine is receiving network packets
faster than the swapper is able to swap out data), create_cache_miss_req
returns NULL, the caller changes it to -ENOMEM, cache_read returns
-ENOMEM, -ENOMEM is propagated up to end_req and end_req will set the
status to BLK_STS_RESOURCE. So, it may randomly fail I/Os with an error.
Properly, you should use mempools. The mempool allocation will wait until
some other process frees data into the mempool.
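To illustrate the waiting behaviour described above, here is a minimal userspace sketch. It is not the kernel mempool API and not pcache code: struct pool, pool_init(), pool_alloc() and pool_free() are hypothetical names, and a pthread mutex plus condition variable stand in for the kernel's locking and wait queue. It also simplifies things by serving every request from the reserve, whereas the kernel mempool tries the underlying allocator first. The point is only that the allocation sleeps until an element is returned instead of failing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for a kernel mempool (not the kernel API):
 * a fixed reserve of preallocated elements. pool_alloc() never returns NULL;
 * it sleeps until pool_free() hands an element back.
 */
struct pool {
	pthread_mutex_t lock;
	pthread_cond_t freed;
	void **elems;		/* stack of currently free elements */
	int nr_free;
	int nr_total;
};

static int pool_init(struct pool *p, int nr, size_t size)
{
	int i;

	p->elems = calloc(nr, sizeof(*p->elems));
	if (!p->elems)
		return -1;
	for (i = 0; i < nr; i++) {
		p->elems[i] = malloc(size);
		if (!p->elems[i]) {
			/* Undo the partial setup; init is allowed to fail. */
			while (i--)
				free(p->elems[i]);
			free(p->elems);
			return -1;
		}
	}
	p->nr_free = p->nr_total = nr;
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->freed, NULL);
	return 0;
}

static void *pool_alloc(struct pool *p)
{
	void *e;

	pthread_mutex_lock(&p->lock);
	/* Wait (sleep) until some other thread frees an element. */
	while (p->nr_free == 0)
		pthread_cond_wait(&p->freed, &p->lock);
	e = p->elems[--p->nr_free];
	pthread_mutex_unlock(&p->lock);
	return e;
}

static void pool_free(struct pool *p, void *e)
{
	pthread_mutex_lock(&p->lock);
	assert(p->nr_free < p->nr_total);
	p->elems[p->nr_free++] = e;
	pthread_cond_signal(&p->freed);	/* wake one waiter, if any */
	pthread_mutex_unlock(&p->lock);
}
```

Forward progress depends on every allocated element eventually being freed, so the reserve should be sized for the number of objects one request can hold at a time.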
If you need to allocate memory inside a spinlock, you can't do it reliably
(because you can't sleep inside a spinlock and non-sleeping memory
allocation may fail anytime). So, in this case, you should drop the
spinlock, allocate the memory from a mempool with GFP_NOIO and jump back
to grab the spinlock - and now you are holding the allocated object, so you
can use it while you hold the spinlock.
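The drop-the-lock, allocate, retake-the-lock sequence can be sketched in userspace as follows. This is an illustration under assumptions, not the pcache code: struct slot_table, blocking_alloc() and get_or_create_slot() are hypothetical names, a pthread mutex stands in for the spinlock, and blocking_alloc() stands in for a mempool allocation with GFP_NOIO (it may wait but does not return NULL). Note the recheck after the lock is retaken, since the state may have changed while it was dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

struct slot_table {
	pthread_mutex_t lock;	/* stand-in for the spinlock */
	int *slot;		/* NULL until populated */
};

/* Hypothetical stand-in for a sleeping allocation: retries until it works. */
static int *blocking_alloc(void)
{
	int *obj;

	while (!(obj = calloc(1, sizeof(*obj))))
		sched_yield();
	return obj;
}

static int *get_or_create_slot(struct slot_table *t)
{
	int *new_obj = NULL;
	int *ret;

	pthread_mutex_lock(&t->lock);
retry:
	if (!t->slot) {
		if (!new_obj) {
			/*
			 * Sleeping is not allowed under the lock: drop it,
			 * allocate, retake it, and recheck the condition.
			 */
			pthread_mutex_unlock(&t->lock);
			new_obj = blocking_alloc();
			pthread_mutex_lock(&t->lock);
			goto retry;
		}
		/* We hold the lock and already own the object. */
		t->slot = new_obj;
		new_obj = NULL;
	}
	ret = t->slot;
	pthread_mutex_unlock(&t->lock);

	/* Another thread populated the slot while the lock was dropped. */
	free(new_obj);
	assert(ret != NULL);
	return ret;
}
```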
Hi Mikulas,
Thanx for your suggestion. I will cook a GFP_NOIO version of the
memory allocation for the pcache data path.
Another comment:
set_bit/clear_bit use atomic instructions which are slow. As you already
hold a spinlock when calling them, you don't need the atomicity, so you
can replace them with __set_bit and __clear_bit.
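As a userspace sketch of the same idea (hypothetical names, not the pcache code): the helpers below are plain read-modify-write operations like the kernel's __set_bit and __clear_bit, and they are safe only because every access to the bitmap happens under the same lock. struct seg_map, SEG_MAX and the seg_*() functions are made up for illustration, and a pthread mutex stands in for the spinlock.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SEG_MAX		(2 * BITS_PER_LONG)

struct seg_map {
	pthread_mutex_t lock;	/* stand-in for the spinlock */
	unsigned long bits[2];	/* protected by lock */
};

/* Non-atomic helpers: the caller must hold the lock. */
static inline void nonatomic_set_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static inline void nonatomic_clear_bit(unsigned int nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

static void seg_mark_used(struct seg_map *m, unsigned int nr)
{
	assert(nr < SEG_MAX);
	pthread_mutex_lock(&m->lock);
	nonatomic_set_bit(nr, m->bits);	/* no atomic instruction needed */
	pthread_mutex_unlock(&m->lock);
}

static void seg_mark_free(struct seg_map *m, unsigned int nr)
{
	assert(nr < SEG_MAX);
	pthread_mutex_lock(&m->lock);
	nonatomic_clear_bit(nr, m->bits);
	pthread_mutex_unlock(&m->lock);
}

static bool seg_is_used(struct seg_map *m, unsigned int nr)
{
	bool used;

	assert(nr < SEG_MAX);
	pthread_mutex_lock(&m->lock);
	used = (m->bits[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
	pthread_mutex_unlock(&m->lock);
	return used;
}
```

If any code path modifies the same bitmap word without taking the lock, the non-atomic variants can lose updates, so the replacement is valid only where all writers hold the lock.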
Good idea.
Thanx
Dongsheng
Mikulas