On 06/28/17 21:57, Gregory Farnum wrote:
>
>
> On Wed, Jun 28, 2017 at 9:17 AM Peter Maloney
> <peter.malo...@brockmann-consult.de> wrote:
>
>     On 06/28/17 16:52, keynes_...@wistron.com wrote:
>>     [...] The way we back up VMs is to create a snapshot with Ceph
>>     commands (rbd snapshot) and then download it (rbd export).
>>
>>      
>>
>>     We found very high disk read / write latency while creating /
>>     deleting snapshots; it can go higher than 10000 ms.
>>
>>      
>>
>>     Even outside of backup jobs, we often see latency of more than
>>     4000 ms.
>>
>>      
>>
>>     Users are starting to complain.
>>
>>     Could you please advise us on how to start troubleshooting?
>>
>>      
>>
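For anyone following along, the backup flow being described is roughly
this (pool, image, and snapshot names here are just placeholders):

    rbd snap create rbd/vm-disk@backup-2017-06-28
    rbd export rbd/vm-disk@backup-2017-06-28 /backup/vm-disk.img
    rbd snap rm rbd/vm-disk@backup-2017-06-28
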
>     For creating snaps and keeping them, this was marked wontfix
>     http://tracker.ceph.com/issues/10823
>
>     For deleting, see the recent "Snapshot removed, cluster thrashed"
>     thread for some config to try.
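That thread is mostly about throttling snap trimming; the knobs usually
suggested are along these lines (the values are only examples to start
from, not recommendations):

    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'
    ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 1'
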
>
>
> Given he says he's seeing 4 second IOs even without snapshot
> involvement, I think Keynes must be seeing something else in his cluster.

If you have few enough OSDs and slow enough journals that things only
seem OK without snaps, then with snaps it can be much worse than 4 s IOs
if you have any sync-heavy clients, like Ganglia.

It took me months of testing many things before I figured out that it
was exclusive-lock causing VMs to hang. Also, some people in the
freenode IRC ##proxmox channel with cheap home setups running Ceph
complain about such things often.
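
If you want to rule out exclusive-lock, the feature can be disabled per
image, roughly like this (pool/image names are placeholders; fast-diff
and object-map depend on exclusive-lock, so they have to go first):

    rbd info rbd/vm-100-disk-1
    rbd feature disable rbd/vm-100-disk-1 fast-diff
    rbd feature disable rbd/vm-100-disk-1 object-map
    rbd feature disable rbd/vm-100-disk-1 exclusive-lock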

>
>
>     
> https://storageswiss.com/2016/04/01/snapshot-101-copy-on-write-vs-redirect-on-write/
>>     Consider a *copy-on-write* system, which /copies/ any blocks
>>     before they are overwritten with new information (i.e. it copies
>>     on writes). In other words, if a block in a protected entity is
>>     to be modified, the system will copy that block to a separate
>>     snapshot area before it is overwritten with the new information.
>>     This approach requires three I/O operations for each write: one
>>     read and two writes. [...] This decision process for each block
>>     also comes with some computational overhead.
>
>>     A *redirect-on-write* system uses pointers to represent all
>>     protected entities. If a block needs modification, the storage
>>     system merely /redirects/ the pointer for that block to another
>>     block and writes the data there. [...] There is zero
>>     computational overhead of reading a snapshot in a
>>     redirect-on-write system.
>
>>     The redirect-on-write system uses 1/3 the number of I/O
>>     operations when modifying a protected block, and it uses no extra
>>     computational overhead reading a snapshot. Copy-on-write systems
>>     can therefore have a big impact on the performance of the
>>     protected entity. The more snapshots are created and the longer
>>     they are stored, the greater the impact to performance on the
>>     protected entity.
>
>
> I wouldn't consider that a very realistic depiction of the tradeoffs
> involved in different snapshotting strategies[1], but BlueStore uses
> "redirect-on-write" under the formulation presented in those quotes.
> RBD clones of protected images will remain copy-on-write forever, I
> imagine.
> -Greg
It was simply the first link I found that I could quote, and I didn't
find it too bad... it's just that it describes it as if all
implementations were the same.
>
> [1]: There's no reason to expect a copy-on-write system will first
> copy the original data and then overwrite it with the new data when it
> can simply inject the new data along the way. *Some* systems will copy
> the "old" block into a new location and then overwrite in the existing
> location (it helps prevent fragmentation), but many don't. And a
> "redirect-on-write" system needs to persist all those block metadata
> pointers, which may be much cheaper or much, much more expensive than
> just duplicating the blocks.

After a snap is unprotected, will the clones be redirect-on-write? Or
after the image is flattened (like dd if=/dev/zero to the whole disk)?
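By flattened I also mean it in the rbd sense, e.g. roughly (pool/image
names here are just placeholders):

    rbd flatten rbd/vm-clone            # copy remaining parent data into the clone
    rbd snap unprotect rbd/base-image@template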

Are there other cases where you get a copy-on-write behavior?

Glad to hear BlueStore has something better. Is that available and the
default behavior on Kraken (which I tested, but where it didn't seem to
be fixed, although all storage backends were less prone to blocked
requests on Kraken)?

If it were a true redirect-on-write system, I would expect that when you
take a snap there is just the overhead of organizing some metadata, and
that after that any write simply goes to a new place as normal, without
requiring the old data to be copied, ideally none of it, even for
partially written objects. I don't think I saw that behavior in my
Kraken tests, although the performance was better (due to no blocked
requests; the peak IOPS was basically the same, and I didn't measure
total I/O or anything more reliable... I just looked at the performance
effects and blocking).
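
To give an idea of the kind of test I mean (image names are placeholders,
and the exact rbd bench options differ a bit between releases):

    rbd bench-write rbd/test-img --io-size 4096 --io-pattern rand
    rbd snap create rbd/test-img@snap1
    rbd bench-write rbd/test-img --io-size 4096 --io-pattern rand

and then compare the second run (the first writes after the snap, where
copy-on-write kicks in) with the first.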
