Hi,

I think you are hit by two different problems at the same time. The second
problem might be the same one we also experience, namely that Windows VMs show
very strange performance characteristics with libvirt, the vd driver and RBD.
With copy operations on very large files (>2GB) we see a sharp drop in
bandwidth after ca. 1 to 1.5GB down to a measly 25MB/s, for as yet unknown
reasons. We cannot reproduce this behaviour with Linux VMs, so chances are
rather high that this is a Windows problem and not a ceph problem.
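
One way to narrow this down is to benchmark the same RBD image from a Linux
client outside the VM and compare that with what the guest sees. A minimal
sketch with rbd bench, assuming a throwaway test image (pool and image names
are placeholders):

  # create a temporary test image
  rbd create --size 20G rbd-hdd/benchtest

  # sequential large writes, similar to a big file copy
  rbd bench --io-type write --io-size 4M --io-threads 1 --io-total 10G \
      --io-pattern seq rbd-hdd/benchtest

  # small random writes, closer to an OS workload
  rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G \
      --io-pattern rand rbd-hdd/benchtest

  # clean up
  rbd rm rbd-hdd/benchtest

If the numbers from a Linux client stay flat while the copy inside the Windows
guest collapses after 1 to 1.5GB, the problem is most likely in the guest or
the virtualization layer rather than in ceph itself.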

The first problem, however, has to do with how ceph uses disks. Bare spinning
disks have very poor performance characteristics, and a lot of development
since their invention has gone into smart controllers (internal and external)
with volatile and persistent caches, and into OS file buffers, all of which
attempt to translate typical user workloads into something that works
reasonably well with spinning drives. The main ideas are to re-order and merge
I/O, cache hot data and absorb I/O bursts for constant write-back. The SANs
you are used to are almost certainly high-end products with all the magic
money can currently buy.

Ceph forcefully bypasses all of this logic, and a rule of thumb I'm following
is that with ceph and current hardware, current-generation drives will provide
the previous generation's performance. With NVMes you can achieve SSD
performance, with SSDs you get good spinning SAS drive performance, and with
SAS drives you get, well, floppy or zip drive performance. I'm afraid that's
what you are seeing with 15 VMs saturating the available aggregate performance
of the spindles.

If you want to stick with spindles as a data store, what you need is a fast,
reliable persistent cache. Reliable here means that the firmware is free of
bugs with respect to power outages, which is quite a requirement in itself.
Some expensive disk controllers claim to have that and offer a persistent NVMe
cache; how much you want to trust the firmware is a different story.
Alternatively, you could consider a few TB of NVMe drives for a ceph cache
pool. People report that they are happy with that. As long as the cache pool
can hold all hot data plus write bursts, I would also expect this to work
fine.
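
For reference, a cache tier in front of an existing HDD pool is set up roughly
as below. This is only a sketch: the pool names and the 2TB target are
placeholders, and you should check the cache tiering documentation and its
caveats for your release before relying on it:

  # attach an NVMe pool as a writeback cache in front of the HDD pool
  ceph osd tier add rbd-hdd rbd-nvme-cache
  ceph osd tier cache-mode rbd-nvme-cache writeback
  ceph osd tier set-overlay rbd-hdd rbd-nvme-cache

  # basic sizing/eviction parameters (values are examples only)
  ceph osd pool set rbd-nvme-cache hit_set_type bloom
  ceph osd pool set rbd-nvme-cache target_max_bytes 2000000000000
  ceph osd pool set rbd-nvme-cache cache_target_dirty_ratio 0.4
  ceph osd pool set rbd-nvme-cache cache_target_full_ratio 0.8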

Instead of caching we decided to go for a split. We use datacenter-grade,
low-cost SSDs for a small all-flash pool for OS RBD disks and a large HDD-only
pool for data storage. This works quite well, since the major annoying
simultaneous I/O workload of Windows VMs happens on the OS disks. For ordinary
data access, an EC HDD pool is perfectly fine and we provision machines with a
second large data disk on HDD. Our users are quite happy with that model.
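
If you want to build a similar split, a minimal sketch using device classes
could look as follows. The pool names, PG counts and the EC profile (k=4, m=2)
are placeholders for illustration, not a sizing recommendation:

  # replicated SSD pool for OS disks (also holds RBD metadata)
  ceph osd crush rule create-replicated rbd-ssd-rule default host ssd
  ceph osd pool create rbd-os 128 128 replicated rbd-ssd-rule
  rbd pool init rbd-os

  # erasure-coded HDD pool for bulk data
  ceph osd erasure-code-profile set ec-hdd k=4 m=2 \
      crush-failure-domain=host crush-device-class=hdd
  ceph osd pool create rbd-data 256 256 erasure ec-hdd
  ceph osd pool set rbd-data allow_ec_overwrites true

  # RBD images keep metadata in the replicated pool, data goes to the EC pool
  rbd create rbd-os/vm01-datadisk --size 500G --data-pool rbd-data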

In any case, we are still stuck with the strange performance drop with Windows
machines that you also seem to observe and are still looking for help with
that. If you manage to figure out what is going on, I would like to hear about
it. So far, we haven't found a clue.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: jchar...@provectio.fr <jchar...@provectio.fr>
Sent: 11 June 2020 12:38:32
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Poor Windows performance on ceph RBD.

Hello,

we are using the same environment, OpenNebula + Ceph.
Our ceph cluster is composed of 5 ceph OSD hosts with SSDs plus 10krpm and
7.2krpm spinning drives, on a 10Gb/s fiber network.
Each spinning OSD has its DB and WAL devices on SSD.

Nearly all our Windows VM RBD images are in a 10krpm pool with erasure coding.
For the moment we host about 15 VMs (RDS and Exchange).

What we are noticing:
   - VMs are far from responding as well as on our old 10k SAN (less than 30%)
   - average RBD latency oscillates between 50ms and 250ms, with some peaks
that can reach a second
   - some tests (CrystalDiskMark) from inside the VM can show performance up
to 700MB/s on read and 170MB/s on write, but a single file copy barely reaches
150MB/s and stays at a poor 25MB/s most of the time
   - tests on 4K random I/O show up to 4k IOPS read and 2k IOPS write, but
seen from the RBD point of view, the image can barely go over 500 IOPS
(read+write)

Since we have to migrate our VMs from the old SAN to Ceph, I am really
worried: there are more than 150 VMs on it, and our Ceph seems to have a hard
time coping with 15 VMs.

I can't find accurate data and relevant calculation templates that would let
me evaluate what I can expect.
All the documents I've read (and I read a lot ;) ) only report empirical
findings such as "it's better" or "it's worse".
There are a lot of parameters we can tweak, like block size, striping, stripe
size, stripe count, ... but those are poorly documented, especially the
relations between them.

I would be more than happy to work with some people who are in the same
situation to try to find solutions and methods that can help us be confident
in our designs, and to break the "build the cluster, tweak it, and maybe it
will be fine for you" approach. I feel that each of us (as I read in forums
and mailing lists) is a bit lonesome. Google is a real friend, but I feel he
has reached his limits ;)

Maybe my call will reach some volunteers.

Best regards
JC Passard
CTO Provectio
France
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io