[Apologies to the list, this has expanded past ZFS, if someone complains, we can move the thread to another illumos dev list]
On May 28, 2012, at 2:18 PM, Lionel Cons wrote:

> On 28 May 2012 22:10, Richard Elling <richard.ell...@gmail.com> wrote:
>> The only recommendation which will lead to results is to use a
>> different OS or filesystem. Your choices are
>> - FreeBSD with ZFS
>> - Linux with BTRFS
>> - Solaris with QFS
>> - Solaris with UFS
>> - Solaris with NFSv4, use ZFS on independent fileserver machines
>>
>> There's a rather mythical rewrite of the Solaris virtual memory
>> subsystem called VM2 in progress but it will still take a long time
>> until this will become available for customers and there are no real
>> data yet whether this will help with mmap performance. It won't be
>> available for OpenSolaris successors like Illumos either
>> (likely never, at least the Illumos leadership doesn't see the need
>> for this and instead recommends to rewrite the applications to not use
>> mmap).
>>
>> This is a mischaracterization of the statements given. The illumos team
>> says they will not implement Oracle's VM2 for valid, legal reasons.
>> That does not mean that mmap performance improvements for ZFS
>> cannot be implemented via other methods.
>
> I'd like to hear what the other methods should be. The lack of mmap
> performance is only a symptom of a more severe disease. Just doing
> piecework and altering the VFS API to integrate ZFS/ARC/VM with each
> other doesn't fix the underlying problems.
>
> I've assigned two of my staff, one familiar with the FreeBSD VM and
> one familiar with the Linux VM, to look at the current VM subsystem
> and their preliminary reports point to disaster. If Illumos does not
> initiate a VM rewrite project of its own which will make the VM aware
> of NUMA, power management and other issues then I predict nothing less
> than the downfall of Illumos within a couple of years because the
> performance impact is dramatic and makes the Illumos kernel no longer
> competitive.
> Despite these findings, of which Sun was aware for a long time, and
> the number of ex-Sun employees working on Illumos, I miss the
> commitment to launch such a project. That's why I said "likely never",
> unless of course someone slams Garrett's head with sufficient force on
> a wooden table to make him see the reality.
>
> The reality is:
> - The modern x86 server platforms are now all NUMA or NUMA-like. Lack
> of NUMA support leads to bad performance

SPARC has been NUMA since 1997 and Solaris changed the scheduler long ago.

> - They all use some kind of serialized link between CPU nodes, let it
> be Hypertransport or Quickpath, with power management. If power
> management is active and has reduced the number of active links
> between nodes and the OS doesn't manage this correctly you'll get bad
> performance. Illumos' VM isn't even remotely aware of this fact
> - Based on simulator testing we see that in a simulated environment
> with 8 sockets almost 40% of kernel memory accesses are _REMOTE_
> accesses, i.e. they are not local to the node accessing them
> Those are all preliminary results, I expect that the remainder of the
> analysis will take another 4-5 weeks until we present the findings to
> the Illumos community. But I can say already it will be a faceslap for
> those who think that Illumos doesn't need a better VM system.

Nobody said illumos doesn't need a better VM system. The statement was that
illumos is not going to reverse-engineer Oracle's VM2.

>> The primary concern for mmap files is that the RAM footprint is doubled.
>
> It's not only that RAM is doubled, the data are copied between both
> ARC and page cache multiple times. You can say memory and the in
> memory copy operation are cheap, but this and the lack of NUMA
> awareness is a real performance killer.

Anybody who has worked on a SPARC system for the past 15 years is well aware
of NUMAness.
We've been living in a NUMA world for a very long time, a world where the
processors were slow and far memory latency is much, much worse than we see
in the x86 world.

I look forward to seeing the results of your analysis and experiments.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss