Hi Eric,
Our team has developed a QoS feature for Ceph using the dmclock library from
the community. We treat an RBD image as the dmclock client instead of a pool.
We tested our code and the results are confusing.
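Conceptually, the per-image mapping looks like this (a minimal Python sketch
only; the real dmclock library is C++, and the names and numbers below are
made up for illustration):

# Sketch: one dmclock client per RBD image instead of per pool.
# ClientInfo mirrors dmclock's reservation/weight/limit triple.
from collections import namedtuple

ClientInfo = namedtuple('ClientInfo', ['reservation', 'weight', 'limit'])

qos_clients = {}

def client_for_image(image_id):
    # register each RBD image as its own dmclock client on first use
    if image_id not in qos_clients:
        qos_clients[image_id] = ClientInfo(reservation=100.0, weight=1.0,
                                           limit=500.0)
    return qos_clients[image_id]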
Testing environment: a single server with 16 cores, 32 GB of RAM, and 8 non-system
disks, each runs
Hello,
On Fri, 2 Jun 2017 14:30:56 +0800 jiajia zhong wrote:
> christian, thanks for your reply.
>
> 2017-06-02 11:39 GMT+08:00 Christian Balzer :
>
> > On Fri, 2 Jun 2017 10:30:46 +0800 jiajia zhong wrote:
> >
> > > hi guys:
> > >
> > > Our ceph cluster is working with a cache tier.
> > If
Hello gurus:
My name is Will. I have just started studying Ceph and have a lot of
interest in it. We are using Ceph 0.94.10, and I am trying to tune the
performance of Ceph to satisfy our requirements. We are using it as an
object store now. I have tried some different
configurations. But I sti
Thank you for your guidance :), it makes sense.
2017-06-02 16:17 GMT+08:00 Christian Balzer :
>
> Hello,
>
> On Fri, 2 Jun 2017 14:30:56 +0800 jiajia zhong wrote:
>
> > christian, thanks for your reply.
> >
> > 2017-06-02 11:39 GMT+08:00 Christian Balzer :
> >
> > > On Fri, 2 Jun 2017 10:30:46 +0
Hi David,
If I understand correctly, your suggestion is the following:
If we have, for instance, 12 servers grouped into 3 racks (4 per rack), then you
would build a CRUSH map saying that you have 6 (virtual) racks, with 2 servers in
each of them, right?
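So something like this with the ceph CLI, if I understand (bucket and host
names here are made up):

$ ceph osd crush add-bucket vrack1 rack
$ ceph osd crush add-bucket vrack2 rack
$ ceph osd crush move vrack1 root=default
$ ceph osd crush move vrack2 root=default
$ ceph osd crush move server01 rack=vrack1
$ ceph osd crush move server02 rack=vrack1
# ...and so on, until each physical rack of 4 is split into two virtual racks of 2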
In this case if we are setting the failure
On 06/01/17 17:12, koukou73gr wrote:
> Hello list,
>
> Today I had to create a new image for a VM. This was the first time,
> since our cluster was updated from Hammer to Jewel. So far I was just
> copying an existing golden image and resizing it as appropriate. But this
> time I used rbd create.
>
On 06/02/17 11:59, Peter Maloney wrote:
> On 06/01/17 17:12, koukou73gr wrote:
>> Hello list,
>>
>> Today I had to create a new image for a VM. This was the first time,
>> since our cluster was updated from Hammer to Jewel. So far I was just
>> copying an existing golden image and resizing it as app
Thanks for the reply.
Easy?
Sure, it happens reliably every time I boot the guest with
exclusive-lock on :)
I'll need some walkthrough on the gcore part though!
-K.
On 2017-06-02 12:59, Peter Maloney wrote:
> On 06/01/17 17:12, koukou73gr wrote:
>> Hello list,
>>
>> Today I had to create a new
On 06/02/17 12:06, koukou73gr wrote:
> Thanks for the reply.
>
> Easy?
> Sure, it happens reliably every time I boot the guest with
> exclusive-lock on :)
If it's that easy, also try with only exclusive-lock, and neither object-map
nor fast-diff. And then also with one or the other of those.
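For reference, toggling those per image is (pool/image names are placeholders):

$ rbd feature disable <pool>/<image> object-map fast-diff
$ rbd feature enable <pool>/<image> object-map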
>
> I'll need
On 2017-06-02 13:01, Peter Maloney wrote:
>> Is it easy for you to reproduce it? I had the same problem, and the same
>> solution. But it isn't easy to reproduce... Jason Dillaman asked me for
>> a gcore dump of a hung process but I wasn't able to get one. Can you do
>> that, and when you reply, C
On 2017-06-02 13:22, Peter Maloney wrote:
> On 06/02/17 12:06, koukou73gr wrote:
>> Thanks for the reply.
>>
>> Easy?
>> Sure, it happens reliably every time I boot the guest with
>> exclusive-lock on :)
> If it's that easy, also try with only exclusive-lock, and neither object-map
> nor fast-diff. And
On 06/02/17 12:25, koukou73gr wrote:
> On 2017-06-02 13:01, Peter Maloney wrote:
>>> Is it easy for you to reproduce it? I had the same problem, and the same
>>> solution. But it isn't easy to reproduce... Jason Dillaman asked me for
>>> a gcore dump of a hung process but I wasn't able to get one.
Hi Will,
Few people have tried rocksdb as the k/v store for filestore, since we
never really started supporting it for production use (we ended up
deciding to move on to BlueStore). I suspect it will be faster than
leveldb, but I don't think anyone has actually tested filestore+rocksdb
to any
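If you want to experiment regardless, the filestore omap backend is selectable
in ceph.conf (I believe from Jewel onwards), e.g.:

[osd]
filestore omap backend = rocksdb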
I'm thinking you have erasure coding in cephfs and only use cache tiering
because you have to, correct? What is your use case for repeated file
accesses? How much data is written into cephfs at a time?
For me, my files are infrequently accessed after they are written or read
from the EC back-end po
You wouldn't be able to guarantee that the cluster will not use 2 servers
from the same rack. The problem with 3 failure domains, however, is that if you
lose a full failure domain, Ceph can do nothing to maintain 3 copies of your
data. It leaves you in a position where you need to rush to the datacenter
Hello,
I am playing around with ceph (ceph version 10.2.7
(50e863e0f4bc8f4b9e31156de690d765af245185)) on Debian Jessie and I built a test
setup:
$ ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.01497 root default
-2 0.00499 host af-staging-ceph01
You only have 3 OSDs, hence with one down you only have 2 left for replication
of 3 objects.
There is no spare OSD to place the 3rd object on; if you were to add a 4th node,
the issue would be resolved.
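You can confirm the settings in play with, e.g. (pool name as an example;
output abbreviated):

$ ceph osd pool get cephfs_data size
size: 3
$ ceph osd pool get cephfs_data min_size
min_size: 2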
,Ashley
On 2 Jun 2017, at 10:31 PM, Oleg Obleukhov wrote:
Hello,
I am pl
I think it's because af-staging-ceph02's data can only be moved to
af-staging-ceph01/3, which already have the data.
There is no acceptable place to create the third replica of the data.
Etienne
From: ceph-users on behalf of Oleg
Obleukhov
Sent: Friday, June 2,
Hi,
On 06/02/2017 04:15 PM, Oleg Obleukhov wrote:
Hello,
I am playing around with ceph (ceph version 10.2.7
(50e863e0f4bc8f4b9e31156de690d765af245185)) on Debian Jessie and I
built a test setup:
$ ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.014
What you're saying is that if we only have 3 failure domains, then Ceph can do
nothing to maintain 3 copies in case an entire failure domain is lost; that is
correct.
BUT if you're losing 2 replicas out of 3 of your data, and your min_size is set
to 2 (the recommended minimum), then you have an e
Also, your min_size is set to 2. What this means is that you need at least
2 copies of your data up to be able to access it. You do not want to have
min_size of 1. If you had min_size of 1 and only 1 copy of your
data receiving writes, and then that copy goes down as well... What is to
s
I agree that running with min_size of 1 is worse than running with only 3
failure domains. Even if it's just for a short time and you're monitoring
it closely... it takes mere seconds before you could have corrupt data with
min_size of 1 (depending on your use case). That right there is the key.
Wh
Thanks to everyone,
problem is solved by:
ceph osd pool set cephfs_metadata size 2
ceph osd pool set cephfs_data size 2
Best, Oleg.
> On 2 Jun 2017, at 16:15, Oleg Obleukhov wrote:
>
> Hello,
> I am playing around with ceph (ceph version 10.2.7
> (50e863e0f4bc8f4b9e31156de690d765af245185)) on D
That's good for testing at small scale. For production I would revisit
using size 3. Glad you got it working.
On Fri, Jun 2, 2017 at 11:02 AM Oleg Obleukhov wrote:
> Thanks to everyone,
> problem is solved by:
> ceph osd pool set cephfs_metadata size 2
> ceph osd pool set cephfs_data size
But what would be best? If we have 3 servers, how many OSDs?
Thanks!
> On 2 Jun 2017, at 17:09, David Turner wrote:
>
> That's good for testing at small scale. For production I would revisit
> using size 3. Glad you got it working.
>
> On Fri, Jun 2, 2017 at 11:02 AM Oleg Obleukhov
I got a chance to run this by Josh and he had a good thought. Just to
make sure that it's not IO backing up on the device, it probably makes
sense to repeat the test and watch what the queue depth and service
times look like. I like using collectl for it:
"collectl -sD -oT"
The queue depth
I'm seeing this again on two OSDs after adding another 20 disks to my
cluster. Is there some way I can maybe determine which snapshots the
recovery process is looking for? Or maybe find and remove the objects
it's trying to recover, since there's apparently a problem with them?
Thanks!
-Steve
On 0
Coming back to this, with Jason's insight it was quickly revealed that
my problem was in reality a cephx authentication permissions issue.
Specifically, exclusive-lock requires a cephx user with class-write
access to the pool where the image resides. This wasn't clear in the
documentation and the
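For anyone hitting the same thing, caps along these lines should satisfy the
class-write requirement (client name and pool are examples, not my exact setup):

$ ceph auth caps client.libvirt \
    mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'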
All very true and worth considering, but I feel compelled to mention the
strategy of setting mon_osd_down_out_subtree_limit carefully to prevent
automatic rebalancing.
*If* the loss of a failure domain is temporary, i.e. something you can fix
fairly quickly, it can be preferable to not start tha
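In ceph.conf that is, for example (pick the bucket type to match the failure
domain you want to protect; with 'host' an entire host going down is not
automatically marked out):

[mon]
mon osd down out subtree limit = host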
Hi Graham.
We are on Kraken and have the same problem with "lifecycle". Various (other)
tools like s3cmd or CyberDuck do show the applied "expiration" settings, but
objects seem never to be purged.
If you come across any new findings or hints, PLEASE share / let me know.
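For reference, setting and inspecting the rule with s3cmd goes along these
lines for us (bucket name is an example; the rule shows up, but objects never
seem to expire):

$ s3cmd expire s3://mybucket --expiry-days=1
$ s3cmd getlifecycle s3://mybucket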
Thanks a lot!
Anton
Ges
Have you opened a ceph tracker issue, so that we don't lose track of
the problem?
Thanks,
Yehuda
On Fri, Jun 2, 2017 at 3:05 PM, wrote:
> Hi Graham.
>
> We are on Kraken and have the same problem with "lifecycle". Various (other)
> tools like s3cmd or CyberDuck do show the applied "expiration"
Well, what's "best" really depends on your needs and use-case. The general
advice which has been floated several times now is to have at least N+2
entities of your failure domain in your cluster.
So for example if you run with size=3 then you should have at least 5 OSDs
if your failure domain is OS
David,
2017-06-02 21:41 GMT+08:00 David Turner :
> I'm thinking you have erasure coding in cephfs and only use cache tiering
> because you have to, correct? What is your use case for repeated file
> accesses? How much data is written into cephfs at a time?
>
these days, up to tens of millions of tiny
Hi,
I found two bugs when testing out the Lua object class. I am running Ceph
11.2.0. Can anybody take a look at them?
Zheyuan
Bug 1: I cannot get the returned output in the first script. "data" is always
empty.
import rados, json
cluster = rados.Rados(conffile='')
cluster.connect()
ioctx
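# (Continuation sketch, not my exact script; pool/object/handler names are
# placeholders, and I believe Ioctx.execute is the relevant binding call.)
ioctx = cluster.open_ioctx('mypool')   # placeholder pool name
lua_script = '...'                     # the Lua handler, elided here
cmd = json.dumps({'script': lua_script, 'handler': 'my_handler', 'input': ''})
ret, data = ioctx.execute('test_obj', 'lua', 'eval_json', cmd.encode())
print(ret, data)                       # "data" always comes back empty (bug 1)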