Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread nokia ceph
Hi David Turner, This is our ceph config under the mon section. We have EC 4+1, the failure domain set to host, and osd_min_down_reporters set to 4 (OSDs from 4 different hosts). [mon] mon_compact_on_start = True mon_osd_down_out_interval = 86400 mon_osd_down_out_subtree_limit = host mon_osd_min_down_reporters
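A minimal sketch of the [mon] block being described (the quote above is truncated; the last option is an assumption based on "OSDs from 4 different hosts" and is exactly what a later reply in this thread asks about):

    [mon]
    mon_compact_on_start = true
    mon_osd_down_out_interval = 86400
    mon_osd_down_out_subtree_limit = host
    # require failure reports from 4 OSDs before marking a peer down
    mon_osd_min_down_reporters = 4
    # assumption: this is what would make the 4 reports come from 4 distinct hosts
    mon_osd_reporter_subtree_level = host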

[ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Horace
I've set up the rule according to the doc, but some of the PGs are still being assigned to the same host. http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/ rule ssd-primary { ruleset 5 type replicated min_size 5 max_size
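For reference, the ssd-primary example on that documentation page looks roughly like this (reconstructed from the docs, not Horace's actual crushmap):

    rule ssd-primary {
        ruleset 5
        type replicated
        min_size 5
        max_size 10
        step take ssd
        step chooseleaf firstn 1 type host
        step emit
        step take platter
        step chooseleaf firstn -1 type host
        step emit
    }

The first take/emit picks the SSD primary and the second picks the remaining replicas from the platter root; nothing in the second pass remembers which host the first pass used, which is what the rest of this thread turns on.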

Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Horace
To add to the info, I have a slightly modified rule that takes advantage of the new storage class. rule ssd-hybrid { id 2 type replicated min_size 1 max_size 10 step take default class ssd step chooseleaf firstn 1 type host step emit step
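The quote is cut off after the second "step"; a plausible completion, assuming the rule follows the same pattern as the documented ssd-primary rule but with device classes, would be:

    rule ssd-hybrid {
        id 2
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 1 type host
        step emit
        step take default class hdd
        step chooseleaf firstn -1 type host
        step emit
    }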

Re: [ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group

2018-05-23 Thread Siegfried Höllrigl
Hi! We have now deleted all snapshots of the pool in question. With "ceph pg dump" we can see that pg 5.9b has a SNAPTRIMQ_LEN of 27826. All other PGs have 0. It looks like this value does not decrease. LAST_SCRUB and LAST_DEEP_SCRUB are both from 2018-04-24, almost 1 month ago. OSD still
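One low-tech way to watch whether that queue is actually shrinking, assuming the column layout of ceph pg dump on this release (check the header line first):

    # the header line shows which column is SNAPTRIMQ_LEN
    ceph pg dump pgs 2>/dev/null | head -n 1
    # re-run this every few minutes and compare the snaptrim queue column
    ceph pg dump pgs 2>/dev/null | grep -w 5.9b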

[ceph-users] IO500 Call for Submissions for ISC 2018

2018-05-23 Thread John Bent
IO500 Call for Submission Deadline: 23 June 2018 AoE The IO500 is now accepting and encouraging submissions for the upcoming IO500 list revealed at ISC 2018 in Frankfurt, Germany. The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with

Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread David Turner
How many disks in each node? 68? If yes, then change it to 69. Also, running with EC 4+1 is bad for the same reason as running with size=2 min_size=1, which has been mentioned and discussed multiple times on the ML. On Wed, May 23, 2018, 3:39 AM nokia ceph wrote: > Hi David Turner, > > This is our
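If I read the suggestion right, the point is to make the reporter threshold larger than the number of OSDs on any single host, so that one node with a dead cluster network can never get its peers marked down on its own; a sketch, assuming 68 OSDs per node:

    [mon]
    # disks per node + 1, so reports from a single host can never reach the threshold
    mon_osd_min_down_reporters = 69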

Re: [ceph-users] Several questions on the radosgw-openstack integration

2018-05-23 Thread Massimo Sgaravatto
For #1 I guess this is a known issue (http://tracker.ceph.com/issues/20570) On Tue, May 22, 2018 at 1:03 PM, Massimo Sgaravatto < massimo.sgarava...@gmail.com> wrote: > I have several questions on the radosgw - OpenStack integration. > > I was more or less able to set it (using a Luminous ceph cl

[ceph-users] ceph_vms performance

2018-05-23 Thread Thomas Bennett
Hi, I'm testing out ceph_vms vs a cephfs mount with a cifs export. I currently have 3 active ceph mds servers to maximise throughput, and when I have configured a cephfs mount with a cifs export I'm getting reasonable benchmark results. However, when I tried some benchmarking with the ceph_vms

Re: [ceph-users] Luminous: resilience - private interface down , no read/write

2018-05-23 Thread nokia ceph
Yes, it is 68 disks. Will this mon_osd_reporter_subtree_level = host have any impact on mon_osd_min_down_reporters? And regarding min_size: yes, there were many suggestions for us to move to 2, but due to storage efficiency concerns we still stay with 1 and are trying to convince customers to g
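As I understand that option, the two settings do interact: with the subtree level set to host, the reporter count is taken over distinct host subtrees rather than individual OSDs, so something like the following would require reports from two different hosts (values illustrative, not a recommendation for this cluster):

    [mon]
    # group failure reporters by their host subtree
    mon_osd_reporter_subtree_level = host
    # with the line above, this now means "reports from 2 distinct hosts"
    mon_osd_min_down_reporters = 2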

Re: [ceph-users] Several questions on the radosgw-openstack integration

2018-05-23 Thread Massimo Sgaravatto
For #2, I think I found the answer myself. The admin can simply generate the S3 keys for the user, e.g.: radosgw-admin key create --key-type=s3 --gen-access-key --gen-secret --uid="a22db12575694c9e9f8650dde73ef565\$a22db12575694c9e9f8650dde73ef565" --rgw-realm=cloudtest and then the user can access
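Spelled out as a runnable sketch (the uid and realm are from the quote above; the s3cmd invocation and hostname are only an illustration of how the generated keys would then be used):

    # generate S3 credentials for the Keystone-backed user; single quotes keep the literal '$' in the uid
    radosgw-admin key create --key-type=s3 --gen-access-key --gen-secret \
        --uid='a22db12575694c9e9f8650dde73ef565$a22db12575694c9e9f8650dde73ef565' \
        --rgw-realm=cloudtest

    # the user can then point any S3 client at the gateway, e.g. s3cmd
    s3cmd --access_key=<generated-access-key> --secret_key=<generated-secret-key> \
        --host=rgw.example.com --host-bucket='%(bucket)s.rgw.example.com' ls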

[ceph-users] HDFS with CEPH, only single RGW works with the hdfs

2018-05-23 Thread 한승진
Hello Cephers, Our team is currently trying to replace HDFS with Ceph object storage. However, there is a big problem: the "*hdfs dfs -put*" operation is very slow. I suspect the RGW sessions with the Hadoop system, because only one RGW node works with Hadoop even though we have 4 RGWs. There see

Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Paul Emmerich
You can't mix HDDs and SSDs in a server if you want to use such a rule. The new selection step after "emit" can't know what server was selected previously. Paul 2018-05-23 11:02 GMT+02:00 Horace : > Add to the info, I have a slightly modified rule to take advantage of the > new storage class. >
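One way to see the behaviour Paul describes without touching live data is to replay the rule against the compiled crushmap with crushtool (rule id 2 matches the ssd-hybrid rule quoted earlier; adjust the rule id and replica count to your setup):

    # grab the current crushmap and show which OSDs the rule would pick for sample inputs
    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-mappings | head

If a host carries both an SSD and an HDD OSD, mappings where two of the chosen OSDs share a host should show up here.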

[ceph-users] open vstorage

2018-05-23 Thread Brady Deetz
http://www.openvstorage.com https://www.openvstorage.org I came across this the other day and am curious if anybody has run it in front of their Ceph cluster. I'm looking at it for a clean-ish Ceph integration with VMWare.

[ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Alfredo Deza
Now that Mimic is fully branched out from master, ceph-disk is going to be removed from master so that it is no longer available for the N release (pull request to follow). ceph-disk should be considered "frozen" and deprecated for Mimic, in favor of ceph-volume. This means that if you are relying

Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Willem Jan Withagen
On 23-5-2018 17:12, Alfredo Deza wrote: > Now that Mimic is fully branched out from master, ceph-disk is going > to be removed from master so that it is no longer available for the N > release (pull request to follow) > Willem, we don't have a way of directly supporting FreeBSD, I've > suggested t

Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-05-23 Thread Sean Sullivan
Thanks Yan! I did this for the bug ticket and missed these replies. I hope I did it correctly. Here are the pastes of the dumps: https://pastebin.com/kw4bZVZT -- primary https://pastebin.com/sYZQx0ER -- secondary They are not that long; here is the output of one: 1. Thread 17 "mds_rank_progr

Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Vasu Kulkarni
Alfredo, do we have a link to migration docs from a ceph-disk deployment to ceph-volume? The current docs, as I see them, lack migration scenarios; maybe there is another link? http://docs.ceph.com/docs/master/ceph-volume/simple/#ceph-volume-simple If it doesn't exist, can we document how a) ceph-disk wit
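For the plain (non-dmcrypt) case, the page linked above boils down to taking over existing ceph-disk OSDs with the "simple" subcommand; a sketch, with the OSD path as an example only:

    # record the metadata of a running ceph-disk OSD into /etc/ceph/osd/<id>-<fsid>.json
    ceph-volume simple scan /var/lib/ceph/osd/ceph-0

    # from then on let ceph-volume (not the old udev/ceph-disk machinery) start the OSDs
    ceph-volume simple activate --all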

Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Alfredo Deza
On Wed, May 23, 2018 at 12:12 PM, Vasu Kulkarni wrote: > Alfredo, > > Do we have the migration docs link from ceph-disk deployment to > ceph-volume? the current docs as i see lacks scenario migration, maybe > there is another link ? > http://docs.ceph.com/docs/master/ceph-volume/simple/#ceph-volum

Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Alfredo Deza
On Wed, May 23, 2018 at 11:47 AM, Willem Jan Withagen wrote: > On 23-5-2018 17:12, Alfredo Deza wrote: >> Now that Mimic is fully branched out from master, ceph-disk is going >> to be removed from master so that it is no longer available for the N >> release (pull request to follow) > >> Willem, w

Re: [ceph-users] ceph-disk is getting removed from master

2018-05-23 Thread Vasu Kulkarni
On Wed, May 23, 2018 at 10:03 AM, Alfredo Deza wrote: > On Wed, May 23, 2018 at 12:12 PM, Vasu Kulkarni wrote: >> Alfredo, >> >> Do we have the migration docs link from ceph-disk deployment to >> ceph-volume? the current docs as i see lacks scenario migration, maybe >> there is another link ? >>

[ceph-users] MDS_DAMAGE: 1 MDSs report damaged metadata

2018-05-23 Thread Marc-Antoine Desrochers
Dear Ceph Experts, I have recently deleted a very big directory on my cephfs and a few minutes later my dashboard started yelling: Overall status: HEALTH_ERR MDS_DAMAGE: 1 MDSs report damaged metadata. So I immediately logged in to my ceph admin node and ran a ceph -s: cluster: id: 472
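The usual first step with MDS_DAMAGE is to ask the MDS what it actually recorded; a sketch, assuming rank 0 (replace with your MDS rank or name):

    # list the damage table entries, each with an id, type and affected path/inode
    ceph tell mds.0 damage ls

    # once an entry has been dealt with (or confirmed harmless) it can be cleared by id
    ceph tell mds.0 damage rm <damage_id>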

Re: [ceph-users] Too many objects per pg than average: deadlock situation

2018-05-23 Thread Mike A
Hello > On 21 May 2018, at 2:05, Sage Weil wrote: > > On Sun, 20 May 2018, Mike A wrote: >> Hello! >> >> In our cluster, we see a deadlock situation. >> This is a standard cluster for an OpenStack without a RadosGW, we have a >> standard block access pools and one for metrics from a gnocchi
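For context, the "many more objects per pg than average" warning compares a pool's objects-per-PG against the cluster average; the knob and the usual remedies look like this (the pool name and pg count are placeholders, not from the thread):

    # factor over the cluster average at which the warning fires (default 10)
    ceph daemon mon.$(hostname -s) config get mon_pg_warn_max_object_skew

    # typical fix: give the object-heavy pool more PGs (pick numbers suited to your cluster)
    ceph osd pool set <metrics-pool> pg_num 256
    ceph osd pool set <metrics-pool> pgp_num 256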

Re: [ceph-users] Too many objects per pg than average: deadlock situation

2018-05-23 Thread Sage Weil
On Wed, 23 May 2018, Mike A wrote: > Hello > > > On 21 May 2018, at 2:05, Sage Weil wrote: > > > > On Sun, 20 May 2018, Mike A wrote: > >> Hello! > >> > >> In our cluster, we see a deadlock situation. > >> This is a standard cluster for an OpenStack without a RadosGW, we have a > >> standard

[ceph-users] Flush very, very slow

2018-05-23 Thread Philip Poten
Hi, the flush from the overlay cache for my EC-based cephfs is very, very slow, as are all operations on the cephfs. The flush accelerates when the mds is stopped. I think this is due to a large number of files that were deleted all at once, but I'm not sure how to verify that. Are there any counters
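If the suspicion is a backlog of deletions, one place to look is the MDS purge-queue perf counters (the section and counter names are from Luminous and may differ between releases):

    # dump the MDS perf counters and look at the purge_queue section (items queued / executing)
    ceph daemon mds.<name> perf dump | python -m json.tool | grep -A 12 '"purge_queue"'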

[ceph-users] Ceph replication factor of 2

2018-05-23 Thread Anthony Verevkin
This week at the OpenStack Summit Vancouver I can hear people entertaining the idea of running Ceph with a replication factor of 2. Karl Vietmeier of Intel suggested that we use 2x replication because Bluestore comes with checksums. https://www.openstack.org/summit/vancouver-2018/summit-schedule/ev
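For reference, the settings being argued about are per-pool; the list's usual recommendation versus the 2x proposal looks like this (pool name is a placeholder):

    # common recommendation on this list: three copies, writes need two of them
    ceph osd pool set <pool> size 3
    ceph osd pool set <pool> min_size 2

    # the proposal under discussion: two copies, relying on Bluestore checksums to spot the bad one
    ceph osd pool set <pool> size 2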

Re: [ceph-users] Ceph replication factor of 2

2018-05-23 Thread Jack
Hi, About Bluestore: sure, there are checksums, but are they fully used? Rumor has it that on a replicated pool, during recovery, they are not. > My thoughts on the subject are that even though checksums do allow to find > which replica is corrupt without having to figure which 2 out of 3 copies a

Re: [ceph-users] 12.2.4 Both Ceph MDS nodes crashed. Please help.

2018-05-23 Thread Yan, Zheng
On Thu, May 24, 2018 at 12:00 AM, Sean Sullivan wrote: > Thanks Yan! I did this for the bug ticket and missed these replies. I hope I > did it correctly. Here are the pastes of the dumps: > > https://pastebin.com/kw4bZVZT -- primary > https://pastebin.com/sYZQx0ER -- secondary > > > they are not t

Re: [ceph-users] SSD-primary crush rule doesn't work as intended

2018-05-23 Thread Horace
Oh, so it's not working as intended even though the ssd-primary rule is officially listed in the Ceph documentation. Should I file a feature request or a bug for it? Regards, Horace Ng From: "Paul Emmerich" To: "horace" Cc: "ceph-users" Sent: Wednesday, May 23, 2018 8:37:07 PM Subject: Re: [c

Re: [ceph-users] Ceph replication factor of 2

2018-05-23 Thread Janne Johansson
On Thu, 24 May 2018 at 00:20, Jack wrote: > Hi, > > I have to say, this is a common yet worthless argument > If I have 3000 OSDs, using 2 or 3 replicas will not change much: the > probability of losing 2 devices is still "high" > On the other hand, if I have a small cluster, less than a hundred OSDs

Re: [ceph-users] Ceph replication factor of 2

2018-05-23 Thread Daniel Baumann
Hi, I couldn't agree more, but just to re-emphasize what others already said: the point of replica 3 is not to have extra safety against (human|software|server) failures, but to have enough data around to allow rebalancing the cluster when disks fail. After a certain number of disks in a cluster