On 5/3/2011 10:34 PM, John wrote:
-----Original Message-----
From: Ted Mittelstaedt <t...@mittelstaedt.us>
Sent: May 4, 2011 12:48 AM
To: freebsd-emulation@freebsd.org
Subject: Re: virtualbox I/O 3 times slower than KVM?
On 5/3/2011 11:25 AM, John wrote:
-----Original Message-----
From: Ted Mittelstaedt <t...@mittelstaedt.us>
Sent: May 3, 2011 12:02 AM
To: Adam Vande More <amvandem...@gmail.com>
Cc: freebsd-emulation@freebsd.org
Subject: Re: virtualbox I/O 3 times slower than KVM?
On 5/2/2011 7:39 PM, Adam Vande More wrote:
On Mon, May 2, 2011 at 4:30 PM, Ted Mittelstaedt <t...@mittelstaedt.us> wrote:
that's sync within the VM. Where is the bottleneck taking
place? If the bottleneck is hypervisor to host, then the
guest-to-VM write may put all of its data into a memory buffer
in the hypervisor, which then writes it to the filesystem more
slowly. In that case, killing the guest without killing
the VM manager will allow the buffer to finish emptying,
since the hypervisor isn't actually being shut down.
No, the bottleneck is the emulated hardware inside the VM
process container. This is easy to observe: just start an
I/O-bound process in the VM and watch top host-side. Also, the
hypervisor uses the native host I/O driver, so there's no reason
for it to be slow. Since it's the emulated NIC which is the
bottleneck, there is nothing left to issue the write. Further
empirical evidence for this can be seen by watching gstat
on a VM running with md- or ZVOL-backed storage. I already
use ZVOLs for this, so it was pretty easy to confirm that no
I/O occurs when the VM is paused or shut down.
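For anyone who wants to reproduce that check, a rough sketch
(the pool and volume names here are placeholders, not my actual
setup):

  # back a VM disk with a ZVOL, then watch real I/O from the host side
  zfs create -V 20g tank/vmdisk
  gstat -f zvol

If gstat shows no activity on the ZVOL while the guest claims it
is writing, the data is still sitting in a cache somewhere above
the volume.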
Is his app ever going to face the extremely bad scenario,
though?
The point is that it should be relatively easy to induce the
patterns you expect to see in production. If you can't, I would
consider that a problem. Testing out theories (performance-based
or otherwise) on a production system is not a good way
to keep the continued faith of your clients when the
production system is a mission-critical one. Maybe throwing
more hardware at a problem is the first line of defense for
some companies; unfortunately I don't work for them. Are
they hiring? ;) I understand the logic of such an approach
and have even argued for it occasionally. Unfortunately,
payroll is already in the budget; extra hardware is not, even
if it would be a net savings.
Most if not all sites I've ever been in that run Windows
servers behave in this manner. At most of these sites, SOP is
to "prove" that the existing hardware is inadequate by loading
whatever Windows software management wants loaded, then
letting the users on the network scream about it. Then money
magically frees itself up where there wasn't any before, since
of course management will never blame the OS for the slowness,
only the hardware.
Understand I'm not advocating this, just making an
observation.
Understand that I'm not against testing, but I've seen people
get so engrossed in constructing test suites that
they end up wasting a lot of money. I would have to
ask: how much time did the OP who started this thread spend
building two systems, a Linux and a BSD one? How much time
has he spent trying to get the BSD system to "work as well as
the Linux one"? Wouldn't it have been cheaper for him to
skip all that and just put the Linux system into
production?
Ted
Thanks a lot for everyone's insights and suggestions. The CentOS
on the KVM is a production server, so I took some time to
prepare another CentOS guest on that KVM host and ran the test as
Ted suggested before (for comparison, right now the test FreeBSD
is the only guest on the VirtualBox).

What I did was cat the 330MB binary file (an XP service pack from
Microsoft) 20 times into a single 6.6GB file, running "date"
before and after, and immediately after the second date finished,
force a power shutdown. There are two observations:
1. The time to complete copying into this 6.6GB file was 72s,
44s, and 79s in three runs, presumably varying because there is
another production VM on the same host. The average is 65s, so
it's about 100MB/s.

2. After the immediate power-down, I did find the resulting file
was less than 6.6GB. So indeed the VM claimed completion of the
copy before it actually finished.
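For reference, the test amounts to something like this (the file
names are placeholders for what I actually used):

  # concatenate the 330MB file 20 times into one 6.6GB file,
  # with timestamps around the copy
  date
  i=0
  while [ $i -lt 20 ]; do cat xpsp.exe >> bigfile; i=$((i+1)); done
  date
  # ...then force the power off as soon as the second date prints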
For clarity, what you're saying is that the CentOS guest OS
claimed the copy had completed before it actually did, correct?
This is consistent with async-mounted filesystems, which I
believe are the default under CentOS. Your guest is
async-mounting its own filesystem inside the VM. So when the
copy completes and you get back to the shell prompt on the
guest, a memory buffer in the guest OS is still copying the
last bits of the file to the disk.
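One way to take the guest's write-behind out of the measurement,
as a sketch: run sync(8) before the second date, so the guest
flushes its own buffers before you stop the clock:

  date
  i=0
  while [ $i -lt 20 ]; do cat xpsp.exe >> bigfile; i=$((i+1)); done
  sync     # force the guest OS to flush its dirty buffers
  date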
I then did the same thing on the VirtualBox. Since I didn't want
the premature completion seen above, I made sure "Use Host I/O
Cache" was unchecked for the VM storage.
That setting isn't going to change how the guest async-mounts its
filesystems. All it does is keep the hypervisor from using the
caching that the host OS provides to it.
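(For the record, the same knob can be flipped from the command
line; the VM and controller names below are placeholders:)

  VBoxManage storagectl "testbsd" --name "SATA Controller" --hostiocache off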
1. The time to complete copying into this 6.6GB file was 119s
and 92s; the average is 105s, so the speed is about 62MB/s.

2. After an immediate "Reset" of the machine, it couldn't boot.
Both times it asked me to fsck that partition (GPT, 2.2T). After
finally getting it up, I found the file was also less than 6.6GB
both times.
I would imagine this would happen.
So it looks like VirtualBox also suffers from a caching problem?
Or did I do something wrong?
There isn't a "caching problem" As we have said on this forum the
speed that the actual write is happening is the same under the
FreeBSD guest and the CentOS guest. The only difference is the
FreeBSD guest is sync-mounting it's filesystem within the virtual
machine and the CentOS guest is async-mounting it's filesystem
within the virtual machine.
Async mount is always faster for writes because what is actually
going on is that the write goes to a memory buffer, and the OS
completes the write "behind the scenes". In many cases, when the
data in a file is rapidly changing, a given write may never go to
disk at all: if the OS sees successive writes to the same part of
the file, it will simply make the writes to the memory buffer and
get around to updating the disk when it feels like it.
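You can see the effect directly on an async-mounted filesystem;
something like this (the size is arbitrary):

  # the dd returns almost immediately on an async mount...
  time dd if=/dev/zero of=/tmp/junk bs=1m count=1024
  # ...and the real disk time shows up when the buffers are flushed
  time sync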
I didn't spend extra time optimizing either the Linux or the
FreeBSD; they are both stock production installs of CentOS and
FreeBSD. I just want to have a production-quality system without
too much customization work.
Also, most of the servers will be mail servers and web servers,
with some database use. Granted, copying a 6.6GB file is
atypical for these servers, but I just wanted to get an idea of
what the server is capable of. I don't know of a benchmarking
tool that models my usage pattern and is readily available on
both CentOS and FreeBSD.
What it really sounds like to me is that you're just not
understanding the difference in how the filesystems are mounted.
For starters, you have your host OS, which the hypervisor is
running on. You have a large file on that host which comprises
the VM, either FreeBSD or CentOS. When the FreeBSD or CentOS
guest is making its writes, it is making them into that large
file. If the host has the filesystem holding that file
sync-mounted, that will slow the hypervisor's access to the file.
And then you have the guest OSes, which themselves have their own
memory buffers and mount chunks of that file as their filesystems.
They can mount these chunks sync or async. If they mount them
async, then access to those chunks is faster as well.
There is a tradeoff here. If you sync-mount a filesystem, then if
the operating system halts or crashes there is usually little to
no filesystem damage, but access to the disk will be slowest. If
you async-mount a filesystem, then if the operating system
crashes you will have a lot of garbage and file corruption, but
access will be fastest.
A very common configuration for a mailserver is, when you're
partitioning the filesystem, to create the usual /, swap, /usr,
/tmp, & /var - then create an additional /home and "mail". Then
you either mount "mail" on /var/mail, or you mount it on /mail and
softlink /var/mail to /mail. Then you set up /tmp, /home, and
/mail or /var/mail as async mounts and everything else as sync
mounts, and softlink /var/spool to /tmp.
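A hypothetical /etc/fstab along those lines (the device names are
made up for illustration):

  # Device      Mountpoint  FStype  Options    Dump  Pass#
  /dev/ad0s1a   /           ufs     rw         1     1
  /dev/ad0s1b   none        swap    sw         0     0
  /dev/ad0s1d   /usr        ufs     rw         2     2
  /dev/ad0s1e   /var        ufs     rw         2     2
  /dev/ad0s1f   /tmp        ufs     rw,async   2     2
  /dev/ad0s1g   /home       ufs     rw,async   2     2
  /dev/ad0s1h   /mail       ufs     rw,async   2     2

followed by the two softlinks:

  ln -s /mail /var/mail
  ln -s /tmp /var/spool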
That way, if the mailserver reboots or crashes, the program
files are generally not affected even if the e-mail is scotched,
yet you get the fastest possible disk performance. If a partition
is so far gone that it cannot even be repaired by fsck, you can
just newfs it and start over. It also makes it a lot easier to
create a dump/restore backup scheme.
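A rough sketch of such a scheme (the paths and device name are
placeholders):

  # level-0 dump of the mail partition to a file, using a live snapshot
  dump -0auL -f /backup/mail.dump /mail
  # and to rebuild a scotched partition from scratch:
  newfs /dev/ad0s1h && mount /mail
  cd /mail && restore -rf /backup/mail.dump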
With CentOS/Linux it's a bit different, because that OS by
default mounts the entire disk on / and creates subdirectories
for everything. That is one of the (many) reasons I don't ever
use Linux for mailservers; you do not have the same kind of
fine-grained control. But you can create multiple partitions on
CentOS, too.
Also, the fact is that the FreeBSD filesystem and OS have been
heavily optimized, and if the mailserver isn't that busy you
don't need to bother async-mounting any of its partitions,
because the system will simply spawn more processes. You've got
to think of it this way: with a mailserver, say sync mounting
causes each piece of e-mail to spend 15ms in disk access, and say
async mounting cuts that to 5ms. If the mailserver normally runs
about 10 simultaneous sendmail instances under async mounting,
then it will run 30 instances under sync mounting at the same
throughput - and with each instance only taking 100MB of RAM, you
can toss a couple of extra GB of RAM in the server and forget
about it.
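Spelling out that arithmetic:

  async: 10 instances x (1 msg / 5ms)  = 2000 msgs/sec
  sync:  30 instances x (1 msg / 15ms) = 2000 msgs/sec
  extra RAM: (30 - 10) instances x 100MB = 2GB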
Ted
Hi Ted,
Thanks for taking the time to explain this. I'm sorry I didn't
pay attention to these (a)sync mount options before. Are you
talking about the options in /etc/fstab?
Yes.
I just checked: I didn't give any option there (other than 'sw'
for swap) for any of the disk partitions, on both the FreeBSD
VirtualBox host and the guest. And the mount manpage says that
by default that means sync mounting.
Yes. The absence of the keyword "async" means it's sync mounted.
Does that mean my FreeBSD guest is already sync mounted?
It means your guest has mounted its filesystems sync.
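You can confirm what each system is actually doing from a shell
on the guest; for instance:

  mount -p             # prints the current mounts in fstab format
  mount | grep async   # shows any filesystems mounted async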
So what is happening is that the FreeBSD guest OS makes its
writes sync, and is told by the hypervisor (VirtualBox) that the
write has completed, when in reality all that has happened is
that the guest OS has completed a write to the virtual
filesystem. When you reset the system, the write is still in
flight from the hypervisor to the host filesystem, and that is
mounted async.
Here is what I mean:
  FreeBSD guest
    sync-mounted on
  virtual filesystem controlled by the hypervisor, in memory
    (the memory buffers of the hypervisor)
    async-mounted on
  host OS
    hardware cache, mounted on
  physical disk
The FreeBSD guest's disk I/O is virtual.
Then why did it also prematurely declare completion of the
write, and fail to boot?
(At the same time, I did confirm that on CentOS the default mount
option includes "async".)
There are, in this scenario, at least 5 layers of disk caching
going on:

1. FreeBSD caching to its virtual filesystem.

2. The virtual filesystem the hypervisor provides is probably
also cached in the RAM that the host OS gives to the hypervisor,
in a hypervisor I/O cache.

3. The virtual filesystem of the hypervisor is mounted on the
disk, so the host OS is running a cache.

4. All of that sits on the hardware cache of the RAID controller.

5. Finally, each individual disk of the array has its own
internal cache.
The better hardware RAID cards have battery-backed caches for this reason.
All of this is why it's not easy to get the REAL disk throughput
of a system: there is so much caching. The tools written
to do this have to do fancy stuff like generating data in
many files and reading and writing them back and forth, in
order to fill up all of the caches so that the systems and disks
are unable to cache everything and have to do the actual writes;
and the tools have to create and delete many of these files so
the systems don't get smart and just manipulate the files in
a memory disk cache somewhere.
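The crudest version of the same idea, as a sketch: write more
data than any of the cache layers can hold, and force it out
before stopping the clock (the size here is a placeholder; pick
something larger than the host's RAM):

  time sh -c 'dd if=/dev/zero of=bigfile bs=1m count=16384 && sync'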
Ted
_______________________________________________
freebsd-emulation@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-emulation
To unsubscribe, send any mail to "freebsd-emulation-unsubscr...@freebsd.org"