https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571
Bug ID: 209571
Summary: ZFS and NVMe performing poorly. TRIM requests stall I/O activity
Product: Base System
Version: 10.3-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Many People
Priority: ---
Component: kern
Assignee: freebsd-bugs@FreeBSD.org
Reporter: bor...@sarenet.es

Created attachment 170388
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=170388&action=edit
throughput graphs for two bonnie++ runs

On a test system with 10 Intel P3500 NVMe drives I have found that TRIM activity can cause a severe I/O stall. After running several bonnie++ tests, the ZFS file system was almost unusable for 15 minutes (yes, FIFTEEN!).

HOW TO REPRODUCE:

- Create a ZFS pool; in this case, a raidz2 pool with the 10 NVMe drives.
- Create a dataset without compression (we want to test actual I/O performance).
- Run bonnie++. Since bonnie++ quickly saturates a single CPU core and therefore cannot generate enough bandwidth for this setup, I run four bonnie++ processes concurrently. To demonstrate the issue, each bonnie++ performs two runs (a consolidated command sketch is included further below):

  ( bonnie++ -s 512g -x 2 -f ) &   # four times

Graphs are included. They were made with devilator (an Orca-compatible data collector) pulling data from devstat(9). The graphs show just one disk out of the 10 (the other 9 are identical, as expected).

The first run of four bonnie++ processes completes without flaws. On graph 1 (TwoBonniesTput) we have the first bonnie++ from the start of the graph to around 08:30 (the green line is the "Intelligent reading" phase), and a second bonnie++ starting right after it. Bonnie++ performs several tests: first a write test (blue line showing around 230 MB/s, from the start to 07:40), then a read/write test (from 07:40 to 08:15 on the graphs, showing read/write/delete activity), and finally a read test (green line showing 250 MB/s from roughly 08:15 to 08:30).

After bonnie++ ends, the files it created are deleted. In this particular test, the four concurrent bonnie++ processes created four files of 512 GB each, 2 TB in total.

After the first run, the disks show TRIM activity going on at a rate of around 200 MB/s. That seems quite slow: a test I did at home on an OCZ Vertex4 SSD (albeit a single drive, not a pool) peaked at 2 GB/s. But I understand that the ada driver coalesces TRIM requests, while the nvd driver does not.

The trouble is: the second bonnie++ run starts right after the first one, and THERE IS ALMOST NO WRITE ACTIVITY FOR 15 MINUTES. Write activity is simply frozen and does not pick up until about 08:45, stalling again, although for a shorter time, around 08:50.

On exhibit 2 (TwoBonniesTimes) it can be seen that the write latency during the stall is zero, which means (unless I am wrong) that no write commands are actually reaching the disks.

During the stalls the ZFS subsystem was unresponsive. Even a simple "zfs list" was painfully slow, taking up to several minutes to complete.

EXPECTED BEHAVIOR:

I understand that heavy TRIM activity must have an impact, but in this case it causes complete starvation of the rest of the ZFS I/O activity, which is clearly wrong. This behavior could cause a severe problem, for example, when destroying a large snapshot. In this case, the system is deleting 2 TB of data.

ATTEMPTS TO MITIGATE IT:

The first thing I tried was to reduce the priority of the TRIM operations in the ZFS I/O scheduler:

  vfs.zfs.vdev.trim_max_pending=100
  vfs.zfs.vdev.trim_max_active=1
  vfs.zfs.vdev.async_write_min_active=8

with no visible effect.
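For reference, the whole procedure boils down to something like the following sketch. The pool name, dataset name and device numbers are placeholders I am assuming, not the real ones, and I am assuming the scheduler knobs are runtime-writable sysctls on 10.3 (otherwise they would go into /boot/loader.conf):

  # Assumed names: pool "tank", dataset "tank/bench", devices nvd0..nvd9.
  zpool create tank raidz2 nvd0 nvd1 nvd2 nvd3 nvd4 nvd5 nvd6 nvd7 nvd8 nvd9
  zfs create -o compression=off tank/bench
  cd /tank/bench

  # Four concurrent bonnie++ processes, two runs each, as described above.
  ( bonnie++ -s 512g -x 2 -f ) &
  ( bonnie++ -s 512g -x 2 -f ) &
  ( bonnie++ -s 512g -x 2 -f ) &
  ( bonnie++ -s 512g -x 2 -f ) &

  # Scheduler tunables I tried (no visible effect).
  sysctl vfs.zfs.vdev.trim_max_pending=100
  sysctl vfs.zfs.vdev.trim_max_active=1
  sysctl vfs.zfs.vdev.async_write_min_active=8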
After reading the article describing the ZFS I/O scheduler, I suspected that the TRIM activity might be activating the write throttle, so I simply disabled it:

  vfs.zfs.delay_scale=0

That did not help either: the writing processes still got stuck, but on dp->dp_s rather than on dmu_tx_delay.

There are two problems here. First, the nvd driver does not seem to coalesce TRIM requests. Second, ZFS issues a flood of TRIM requests assuming that the lower layer will coalesce them; I don't think it is a good idea for ZFS to make that assumption blindly. I also think that some throttling mechanism should be applied to TRIM requests.
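For what it's worth, the stall can also be watched directly from the shell; the attached graphs were produced from devstat(9) via devilator, and the commands below are just an alternative, assumed way to observe the same thing:

  # Per-provider read/write/delete statistics; -d adds BIO_DELETE (TRIM)
  # counters, -f filters on the nvd devices.
  gstat -d -f nvd

  # Kernel stacks of the (stalled) bonnie++ writers, to see what they are
  # actually sleeping on during the stall.
  procstat -kk $(pgrep bonnie)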