Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-18 Thread Nick Fisk
-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Balzer Sent: 18 November 2014 01:11 To: ceph-users Subject: Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier Hello, On Mon, 17 Nov 2014 17:45:54 +0100 Laurent GUERBY wrote: > Hi, > > Just a

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-17 Thread Laurent GUERBY
On Tuesday 18 November 2014 at 10:11 +0900, Christian Balzer wrote: > Hello, > > On Mon, 17 Nov 2014 17:45:54 +0100 Laurent GUERBY wrote: > > > Hi, > > > > Just a follow-up on this issue, we're probably hitting: > > > > http://tracker.ceph.com/issues/9285 > > > I wonder how much pressure was

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-17 Thread Christian Balzer
Hello, On Mon, 17 Nov 2014 17:45:54 +0100 Laurent GUERBY wrote: > Hi, > > Just a follow-up on this issue, we're probably hitting: > > http://tracker.ceph.com/issues/9285 > > We had the issue a few weeks ago with replicated SSD pool in front of > rotational pool and turned off cache tiering.

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-17 Thread Erik Logtenberg
I think I might be running into the same issue. I'm using Giant though. A lot of slow writes. My thoughts went to: the OSDs get too much work to do (commodity hardware), so I'll have to do some performance tuning to limit parallelism a bit. And indeed, limiting the number of threads for different

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-17 Thread Laurent GUERBY
Hi, Just a follow-up on this issue, we're probably hitting: http://tracker.ceph.com/issues/9285 We had the issue a few weeks ago with replicated SSD pool in front of rotational pool and turned off cache tiering. Yesterday we made a new test and activating cache tiering on a single erasure pool

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Gregory Farnum
On Sat, Nov 8, 2014 at 3:24 PM, Loic Dachary wrote: > > > On 09/11/2014 00:03, Gregory Farnum wrote: >> It's all about the disk accesses. What's the slow part when you dump >> historic and in-progress ops? > > This is what I see on g1 (6% iowait) Yeah, you're going to need to do some data collat

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Loic Dachary
On 09/11/2014 00:03, Gregory Farnum wrote: > It's all about the disk accesses. What's the slow part when you dump historic > and in-progress ops? This is what I see on g1 (6% iowait) root@g1:~# ceph daemon osd.0 dump_ops_in_flight { "num_ops": 0, "ops": []} root@g1:~# ceph daemon osd.0 dump
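
The admin-socket dumps discussed in this exchange are JSON, so ranking ops by how long they took can be scripted. What follows is only a minimal sketch: it assumes the dump keeps its entries under an `ops` key (or `Ops` on some releases) and that each entry carries `description` and `duration` fields. Those names are assumptions to check against the output of your own OSD, and the sample data is made up for illustration.

```python
import json


def slowest_ops(dump, top=5):
    """Return (duration_seconds, description) pairs, slowest first."""
    # Assumption: entries live under "ops" or "Ops", depending on the release.
    entries = dump.get("ops") or dump.get("Ops") or []
    ranked = []
    for op in entries:
        # Assumption: "duration" is in seconds, as a number or numeric string.
        seconds = float(op.get("duration", 0.0))
        ranked.append((seconds, op.get("description", "")))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked[:top]


# Hypothetical sample, not taken from the thread.
SAMPLE = json.loads("""
{"ops": [
  {"description": "op-a", "duration": "0.25"},
  {"description": "op-b", "duration": 3.5},
  {"description": "op-c", "duration": 1.0}
]}
""")

if __name__ == "__main__":
    for seconds, description in slowest_ops(SAMPLE):
        print(f"{seconds:8.3f}s  {description}")
```

In practice the dictionary would come from `json.loads` applied to the saved output of `ceph daemon osd.0 dump_historic_ops`, collected from each OSD in turn.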

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Gregory Farnum
It's all about the disk accesses. What's the slow part when you dump historic and in-progress ops? On Sat, Nov 8, 2014 at 2:30 PM Loic Dachary wrote: > Hi Greg, > > On 08/11/2014 20:19, Gregory Farnum wrote:> When acting as a cache pool it > needs to go do a lookup on the base pool for every obje

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Loic Dachary
Hi Greg, On 08/11/2014 20:19, Gregory Farnum wrote: > When acting as a cache pool it needs to go do a lookup on the base pool for every object it hasn't encountered before. I assume that's why it's slower. > (The penalty should not be nearly as high as you're seeing here, but based on > the low

Re: [ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Gregory Farnum
When acting as a cache pool it needs to go do a lookup on the base pool for every object it hasn't encountered before. I assume that's why it's slower. (The penalty should not be nearly as high as you're seeing here, but based on the low numbers I imagine you're running everything on an overloaded

[ceph-users] Troubleshooting an erasure coded pool with a cache tier

2014-11-08 Thread Loic Dachary
Hi, This is a first attempt; it is entirely possible that the solution is simple or RTFM ;-) Here is the problem observed: rados --pool ec4p1 bench 120 write # the erasure coded pool Total time run: 147.207804 Total writes made: 458 Write size: 4194304 Bandwidth (MB/sec
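
When comparing runs like the one above, the summary fields can be pulled out of the bench output and turned into an average rate. This is a rough sketch under stated assumptions: the helper name `average_write_rate` is hypothetical, the field labels are the ones visible in the post, and the function returns plain bytes per second rather than trying to reproduce the tool's own Bandwidth line or its unit convention.

```python
import re


def average_write_rate(summary):
    """Return average bytes per second from a rados bench write summary."""

    def field(label, cast):
        # Works whether the fields sit on one line or on separate lines.
        match = re.search(re.escape(label) + r":\s*([0-9.]+)", summary)
        if match is None:
            raise ValueError(f"missing field: {label}")
        return cast(match.group(1))

    seconds = field("Total time run", float)
    writes = field("Total writes made", int)
    size = field("Write size", int)  # bytes per write
    return writes * size / seconds


# Fields as quoted in the post.
SUMMARY = "Total time run: 147.207804 Total writes made: 458 Write size: 4194304"

if __name__ == "__main__":
    rate = average_write_rate(SUMMARY)
    print(f"{rate:.0f} bytes/s")
```

Running the same function over the output of the cache-tier run gives a like-for-like figure for the two pools.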