Re: [ceph-users] RBD Mirror Proxy Support?

2019-01-14 Thread Wido den Hollander


On 1/11/19 8:08 PM, Kenneth Van Alstyne wrote:
> Hello all (and maybe this would be better suited for the ceph devel
> mailing list):
> I’d like to use RBD mirroring between two sites (to each other), but I
> have the following limitations:
> - The clusters use the same name (“ceph”)
> - The clusters share IP address space on a private, non-routed storage
> network
> 

The IP space can be fixed rather easily. Ceph does not require the OSDs to
keep the same IPs; they can be renumbered very simply.

MONs take a bit more work, but they can also be renumbered without any downtime.
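
For reference, the monitor re-IP usually comes down to extracting the monmap,
editing it, and injecting it back, one monitor at a time so quorum is kept.
A rough sketch (the mon name and address below are only illustrative):

  ceph mon getmap -o /tmp/monmap
  monmaptool --print /tmp/monmap
  monmaptool --rm mon-a /tmp/monmap
  monmaptool --add mon-a 192.168.10.11:6789 /tmp/monmap
  # stop mon-a, inject the edited map, update ceph.conf, then restart it
  ceph-mon -i mon-a --inject-monmap /tmp/monmap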

Wido

> There are management servers on each side that can talk to the
> respective storage networks, but the storage networks cannot talk
> directly to each other.  I recall reading, some years back, of possibly
> adding support for an RBD mirror proxy, which would potentially solve my
> issues.  Has anything been done in this regard?  If not, is my best bet
> perhaps a tertiary cluster that both can reach and do one-way
> replication to?
> 
> Thanks,
> 
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
> Service-Disabled Veteran-Owned Business
> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
> c: 228-547-8045 f: 571-266-3106
> www.knightpoint.com  
> DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
> GSA Schedule 70 SDVOSB: GS-35F-0646S
> GSA MOBIS Schedule: GS-10F-0404Y
> ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3
> 
> Notice: This e-mail message, including any attachments, is for the sole
> use of the intended recipient(s) and may contain confidential and
> privileged information. Any unauthorized review, copy, use, disclosure,
> or distribution is STRICTLY prohibited. If you are not the intended
> recipient, please contact the sender by reply e-mail and destroy all
> copies of the original message.
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Clarification of communication between mon and osd

2019-01-14 Thread Eugen Block

Hello list,

I noticed my last post was displayed as a reply to a different thread,
so I'm re-sending my question; please excuse the noise.


There are two config options for mon/osd interaction that I don't fully
understand. Maybe one of you could clarify them for me.



mon osd report timeout
- The grace period in seconds before declaring unresponsive Ceph OSD  
Daemons down. Default 900



mon osd down out interval
- The number of seconds Ceph waits before marking a Ceph OSD Daemon  
down and out if it doesn’t respond. Default 600
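
For reference, the values in effect can be checked on a running monitor
through the admin socket (the mon ID below is just an example):

  ceph daemon mon.ceph-mon-01 config get mon_osd_report_timeout
  ceph daemon mon.ceph-mon-01 config get mon_osd_down_out_interval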


I've seen the mon_osd_down_out_interval being hit plenty of times,
e.g. if I manually take down an OSD, it will be marked out after 10
minutes. But I can't quite remember seeing the 900 seconds timeout
happen. When exactly will the mon_osd_report_timeout kick in? Does  
this mean that if for some reason one OSD is unresponsive the MON will  
mark it down after 15 minutes, then wait another 10 minutes until it  
is marked out so the recovery can start?


I'd appreciate any insight!

Regards,
Eugen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clarification of communication between mon and osd

2019-01-14 Thread Paul Emmerich
Yes, your understanding is correct. But the main mechanism by which
OSDs are reported as down is that other OSDs report them as down with
a much stricter timeout (20 seconds? 30 seconds? something like that).

It's quite rare to hit the "mon osd report timeout" (the usual
scenario here is a network partition).
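
For reference, the peer-reporting behaviour described above is governed by a
couple of options; a sketch of the relevant knobs with their Luminous-era
defaults, worth double-checking against your release:

  [osd]
      osd heartbeat grace = 20          # seconds a peer waits before reporting an OSD down
  [mon]
      mon osd min down reporters = 2    # distinct reporters required before the mon marks it down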

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 14, 2019 at 10:17 AM Eugen Block  wrote:
>
> Hello list,
>
> I noticed my last post was displayed as a reply to a different thread,
> so I re-send my question, please excuse the noise.
>
> There are two config options of mon/osd interaction that I don't fully
> understand. Maybe one of you could clarify it for me.
>
> > mon osd report timeout
> > - The grace period in seconds before declaring unresponsive Ceph OSD
> > Daemons down. Default 900
>
> > mon osd down out interval
> > - The number of seconds Ceph waits before marking a Ceph OSD Daemon
> > down and out if it doesn’t respond. Default 600
>
> I've seen the mon_osd_down_out_interval being hit plenty of times,
> e.g. if I manually take down an OSD, it will be marked out after 10
> minutes. But I can't quite remember seeing the 900 seconds timeout
> happen. When exactly will the mon_osd_report_timeout kick in? Does
> this mean that if for some reason one OSD is unresponsive the MON will
> mark it down after 15 minutes, then wait another 10 minutes until it
> is marked out so the recovery can start?
>
> I'd appreciate any insight!
>
> Regards,
> Eugen
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Clarification of communication between mon and osd

2019-01-14 Thread Eugen Block

Thanks for the reply, Paul.


Yes, your understanding is correct. But the main mechanism by which
OSDs are reported as down is that other OSDs report them as down with
a much stricter timeout (20 seconds? 30 seconds? something like that).


Yes, the osd_heartbeat_grace of 20 seconds has occurred from time to
time in setups with network configuration issues.



It's quite rare to hit the "mon osd report timeout" (the usual
scenario here is a network partition)


Thanks for the confirmation.

Eugen

Zitat von Paul Emmerich :


Yes, your understanding is correct. But the main mechanism by which
OSDs are reported as down is that other OSDs report them as down with
a much stricter timeout (20 seconds? 30 seconds? something like that).

It's quite rare to hit the "mon osd report timeout" (the usual
scenario here is a network partition)

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 14, 2019 at 10:17 AM Eugen Block  wrote:


Hello list,

I noticed my last post was displayed as a reply to a different thread,
so I re-send my question, please excuse the noise.

There are two config options of mon/osd interaction that I don't fully
understand. Maybe one of you could clarify it for me.

> mon osd report timeout
> - The grace period in seconds before declaring unresponsive Ceph OSD
> Daemons down. Default 900

> mon osd down out interval
> - The number of seconds Ceph waits before marking a Ceph OSD Daemon
> down and out if it doesn’t respond. Default 600

I've seen the mon_osd_down_out_interval being hit plenty of times,
e.g. if I manually take down an OSD, it will be marked out after 10
minutes. But I can't quite remember seeing the 900 seconds timeout
happen. When exactly will the mon_osd_report_timeout kick in? Does
this mean that if for some reason one OSD is unresponsive the MON will
mark it down after 15 minutes, then wait another 10 minutes until it
is marked out so the recovery can start?

I'd appreciate any insight!

Regards,
Eugen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore device’s device selector for Samsung NVMe

2019-01-14 Thread Yanko Davila


Hello

My name is Yanko Davila. I'm new to Ceph, so please pardon my ignorance.
I have a question about Bluestore and SPDK.


I'm currently running ceph version:

ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous 
(stable)


on Debian:

Linux  4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 
(2018-08-21) x86_64 GNU/Linux


Distributor ID:    Debian
Description:    Debian GNU/Linux 9.5 (stretch)
Release:    9.5
Codename:    stretch

I'm trying to add an NVMe OSD using Bluestore, but I'm struggling to find
the device selector for that NVMe. So far I've been able to compile
spdk and successfully run the setup.sh script. I can also successfully run
the identify example, which leads me to think that spdk is working as
expected.


When I read the online manual ( 
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#spdk-usage 
) it gives an example for an Intel PCIe SSD:




For example, users can find the device selector of an Intel PCIe SSD with:

$ lspci -mm -n -D -d 8086:0953



When I try the same command, adjusting for my Samsung SSD, it returns
nothing (the output is just blank). Here is what I tried:


$ lspci -mm -n -D -d 144d:a801
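
As a cross-check (an illustrative sketch, not from the Ceph docs), all NVMe
controllers can be listed by PCI class instead of a vendor:device pair, which
helps verify the ID lspci actually reports for the Samsung drive:

  $ lspci -mm -n -D -d ::0108            # 0108 = NVMe controller class (needs a reasonably recent pciutils)
  $ lspci -nn | grep -i 'non-volatile'   # or grep the human-readable listing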


Assuming that I gave you enough information: can anyone spot what I'm
doing wrong? Does spdk only work on Intel SSDs? Any comment is highly
appreciated. Thank you for your time.


Yanko.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-14 Thread David C
Hi All

I've been playing around with the nfs-ganesha 2.7 exporting a cephfs
filesystem, it seems to be working pretty well so far. A few questions:

1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a libcephfs
client,..." [1]. For arguments sake, if I have ten top level dirs in my
Cephfs namespace, is there any value in creating a separate export for each
directory? Will that potentially give me better performance than a single
export of the entire namespace?

2) Tuning: are there any recommended parameters to tune? So far I've found
I had to increase client_oc_size which seemed quite conservative.

Thanks
David

[1] http://docs.ceph.com/docs/mimic/cephfs/nfs/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Massimo Sgaravatto
I have a ceph luminous cluster running on CentOS7 nodes.
This cluster has 50 OSDs, all with the same size and all with the same
weight.

Since I noticed quite an "unfair" usage of the OSD nodes (some used
at 30%, some used at 70%), I tried to activate the balancer.

But the balancer doesn't start, I guess because of this problem:

[root@ceph-mon-01 ~]# ceph osd crush weight-set create-compat
Error EPERM: crush map contains one or more bucket(s) that are not straw2


So I issued the command to convert from straw to straw2 (all the clients
are running luminous):


[root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
Error EINVAL: new crush map requires client version hammer but
require_min_compat_client is firefly
[root@ceph-mon-01 ~]# ceph osd set-require-min-compat-client jewel
set require_min_compat_client to jewel
[root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
[root@ceph-mon-01 ~]#


After having issued the command, the cluster went into WARNING state because
~12% of the objects were misplaced.

Is this normal?
I read somewhere that the migration from straw to straw2 should trigger a
data migration only if the OSDs have different sizes, which is not my case.


The cluster is still recovering, but what worries me is that the data seem
to be moving to the most used OSDs and the MAX_AVAIL value is decreasing
quite quickly.

I hope that the recovery can finish without causing problems: then I will
immediately activate the balancer.
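
For reference, on Luminous that activation should amount to roughly the
following (a sketch using the crush-compat mode, which matches the weight-set
command above):

  ceph mgr module enable balancer
  ceph balancer mode crush-compat
  ceph balancer on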

But, if some OSDs are getting too full, is it safe to decrease their
weights while the cluster is still recovering?

Thanks a lot for your help
Of course I can provide other info, if needed


Cheers, Massimo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Dan van der Ster
On Mon, Jan 14, 2019 at 3:06 PM Massimo Sgaravatto
 wrote:
>
> I have a ceph luminous cluster running on CentOS7 nodes.
> This cluster has 50 OSDs, all with the same size and all with the same weight.
>
> Since I noticed that there was a quite "unfair" usage of OSD nodes (some used 
> at 30 %, some used at 70 %) I tried to activate the balancer.
>
> But the balancer doesn't start I guess because of this problem:
>
> [root@ceph-mon-01 ~]# ceph osd crush weight-set create-compat
> Error EPERM: crush map contains one or more bucket(s) that are not straw2
>
>
> So I issued the command to convert from straw to straw2 (all the clients are 
> running luminous):
>
>
> [root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
> Error EINVAL: new crush map requires client version hammer but 
> require_min_compat_client is firefly
> [root@ceph-mon-01 ~]# ceph osd set-require-min-compat-client jewel
> set require_min_compat_client to jewel
> [root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
> [root@ceph-mon-01 ~]#
>
>
> After having issued the command, the cluster went in WARNING state because ~ 
> 12 % objects were misplaced.
>
> Is this normal ?
> I read somewhere that the migration from straw to straw2 should trigger a 
> data migration only if the OSDs have different sizes, which is not my case.

The relevant sizes to compare are the crush buckets across which you
are replicating.
Are you replicating host-wise or rack-wise?
Do you have hosts/racks with different crush weights (e.g. a different
crush size)?
Maybe share your `ceph osd tree`.

Cheers, dan



>
>
> The cluster is still recovering, but what is worrying me is that it looks 
> like that data are being moved to the most used OSDs and the MAX_AVAIL value 
> is decreasing quite quickly.
>
> I hope that the recovery can finish without causing problems: then I will 
> immediately activate the balancer.
>
> But, if some OSDs are getting too full, is it safe to decrease their weights  
> while the cluster is still being recovered ?
>
> Thanks a lot for your help
> Of course I can provide other info, if needed
>
>
> Cheers, Massimo
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Massimo Sgaravatto
Thanks for the prompt reply

Indeed I have different racks with different weights.
Below the "ceph osd tree" output:

[root@ceph-mon-01 ~]# ceph osd tree
ID CLASS WEIGHT    TYPE NAME                 STATUS REWEIGHT PRI-AFF
-1       272.80426 root default
-7       109.12170     rack Rack11-PianoAlto
-8        54.56085         host ceph-osd-04
30   hdd   5.45609             osd.30            up  1.0 1.0
31   hdd   5.45609             osd.31            up  1.0 1.0
32   hdd   5.45609             osd.32            up  1.0 1.0
33   hdd   5.45609             osd.33            up  1.0 1.0
34   hdd   5.45609             osd.34            up  1.0 1.0
35   hdd   5.45609             osd.35            up  1.0 1.0
36   hdd   5.45609             osd.36            up  1.0 1.0
37   hdd   5.45609             osd.37            up  1.0 1.0
38   hdd   5.45609             osd.38            up  1.0 1.0
39   hdd   5.45609             osd.39            up  1.0 1.0
-9        54.56085         host ceph-osd-05
40   hdd   5.45609             osd.40            up  1.0 1.0
41   hdd   5.45609             osd.41            up  1.0 1.0
42   hdd   5.45609             osd.42            up  1.0 1.0
43   hdd   5.45609             osd.43            up  1.0 1.0
44   hdd   5.45609             osd.44            up  1.0 1.0
45   hdd   5.45609             osd.45            up  1.0 1.0
46   hdd   5.45609             osd.46            up  1.0 1.0
47   hdd   5.45609             osd.47            up  1.0 1.0
48   hdd   5.45609             osd.48            up  1.0 1.0
49   hdd   5.45609             osd.49            up  1.0 1.0
-6       109.12170     rack Rack15-PianoAlto
-3        54.56085         host ceph-osd-02
10   hdd   5.45609             osd.10            up  1.0 1.0
11   hdd   5.45609             osd.11            up  1.0 1.0
12   hdd   5.45609             osd.12            up  1.0 1.0
13   hdd   5.45609             osd.13            up  1.0 1.0
14   hdd   5.45609             osd.14            up  1.0 1.0
15   hdd   5.45609             osd.15            up  1.0 1.0
16   hdd   5.45609             osd.16            up  1.0 1.0
17   hdd   5.45609             osd.17            up  1.0 1.0
18   hdd   5.45609             osd.18            up  1.0 1.0
19   hdd   5.45609             osd.19            up  1.0 1.0
-4        54.56085         host ceph-osd-03
20   hdd   5.45609             osd.20            up  1.0 1.0
21   hdd   5.45609             osd.21            up  1.0 1.0
22   hdd   5.45609             osd.22            up  1.0 1.0
23   hdd   5.45609             osd.23            up  1.0 1.0
24   hdd   5.45609             osd.24            up  1.0 1.0
25   hdd   5.45609             osd.25            up  1.0 1.0
26   hdd   5.45609             osd.26            up  1.0 1.0
27   hdd   5.45609             osd.27            up  1.0 1.0
28   hdd   5.45609             osd.28            up  1.0 1.0
29   hdd   5.45609             osd.29            up  1.0 1.0
-5        54.56085     rack Rack17-PianoAlto
-2        54.56085         host ceph-osd-01
 0   hdd   5.45609             osd.0             up  1.0 1.0
 1   hdd   5.45609             osd.1             up  1.0 1.0
 2   hdd   5.45609             osd.2             up  1.0 1.0
 3   hdd   5.45609             osd.3             up  1.0 1.0
 4   hdd   5.45609             osd.4             up  1.0 1.0
 5   hdd   5.45609             osd.5             up  1.0 1.0
 6   hdd   5.45609             osd.6             up  1.0 1.0
 7   hdd   5.45609             osd.7             up  1.0 1.0
 8   hdd   5.45609             osd.8             up  1.0 1.0
 9   hdd   5.45609             osd.9             up  1.0 1.0
[root@ceph-mon-01 ~]#

On Mon, Jan 14, 2019 at 3:13 PM Dan van der Ster  wrote:

> On Mon, Jan 14, 2019 at 3:06 PM Massimo Sgaravatto
>  wrote:
> >
> > I have a ceph luminous cluster running on CentOS7 nodes.
> > This cluster has 50 OSDs, all with the same size and all with the same
> weight.
> >
> > Since I noticed that there was a quite "unfair" usage of OSD nodes (some
> used at 30 %, some used at 70 %) I tried to activate the balancer.
> >
> > But the balancer doesn't start I guess because of this problem:
> >
> > [root@ceph-mon-01 ~]# ceph osd crush weight-set create-compat
> > Error EPERM: crush map contains one or more bucket(s) that are not straw2
> >
> >
> > So I issued the command to convert from straw to straw2 (all the clients
> are running luminous):
> >
> >
> > [root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
> > Error EINVAL: new crush map requires client version hammer but
> require_min_compa

Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Wido den Hollander


On 1/14/19 3:18 PM, Massimo Sgaravatto wrote:
> Thanks for the prompt reply
> 
> Indeed I have different racks with different weights. 
> Below the ceph osd tree" output
> 

Can you also show the output of 'ceph osd df' ?

The number of PGs might be on the low side, which also causes this imbalance.

If you do not have enough PGs, CRUSH can't distribute the data properly either.
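
A quick way to check is to compare the per-pool pg_num against the usual rule
of thumb of roughly (number of OSDs * 100) / replica size, rounded to a power
of two, for example:

  ceph osd pool ls detail | grep pg_num
  # e.g. 50 OSDs * 100 / 3 replicas ~ 1667 -> around 2048 PGs in total across pools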

Wido

> [root@ceph-mon-01 ~]# ceph osd tree
> ID CLASS WEIGHT    TYPE NAME                 STATUS REWEIGHT PRI-AFF 
> -1       272.80426 root default                                      
> -7       109.12170     rack Rack11-PianoAlto                         
> -8        54.56085         host ceph-osd-04                          
> 30   hdd   5.45609             osd.30            up  1.0 1.0 
> 31   hdd   5.45609             osd.31            up  1.0 1.0 
> 32   hdd   5.45609             osd.32            up  1.0 1.0 
> 33   hdd   5.45609             osd.33            up  1.0 1.0 
> 34   hdd   5.45609             osd.34            up  1.0 1.0 
> 35   hdd   5.45609             osd.35            up  1.0 1.0 
> 36   hdd   5.45609             osd.36            up  1.0 1.0 
> 37   hdd   5.45609             osd.37            up  1.0 1.0 
> 38   hdd   5.45609             osd.38            up  1.0 1.0 
> 39   hdd   5.45609             osd.39            up  1.0 1.0 
> -9        54.56085         host ceph-osd-05                          
> 40   hdd   5.45609             osd.40            up  1.0 1.0 
> 41   hdd   5.45609             osd.41            up  1.0 1.0 
> 42   hdd   5.45609             osd.42            up  1.0 1.0 
> 43   hdd   5.45609             osd.43            up  1.0 1.0 
> 44   hdd   5.45609             osd.44            up  1.0 1.0 
> 45   hdd   5.45609             osd.45            up  1.0 1.0 
> 46   hdd   5.45609             osd.46            up  1.0 1.0 
> 47   hdd   5.45609             osd.47            up  1.0 1.0 
> 48   hdd   5.45609             osd.48            up  1.0 1.0 
> 49   hdd   5.45609             osd.49            up  1.0 1.0 
> -6       109.12170     rack Rack15-PianoAlto                         
> -3        54.56085         host ceph-osd-02                          
> 10   hdd   5.45609             osd.10            up  1.0 1.0 
> 11   hdd   5.45609             osd.11            up  1.0 1.0 
> 12   hdd   5.45609             osd.12            up  1.0 1.0 
> 13   hdd   5.45609             osd.13            up  1.0 1.0 
> 14   hdd   5.45609             osd.14            up  1.0 1.0 
> 15   hdd   5.45609             osd.15            up  1.0 1.0 
> 16   hdd   5.45609             osd.16            up  1.0 1.0 
> 17   hdd   5.45609             osd.17            up  1.0 1.0 
> 18   hdd   5.45609             osd.18            up  1.0 1.0 
> 19   hdd   5.45609             osd.19            up  1.0 1.0 
> -4        54.56085         host ceph-osd-03                          
> 20   hdd   5.45609             osd.20            up  1.0 1.0 
> 21   hdd   5.45609             osd.21            up  1.0 1.0 
> 22   hdd   5.45609             osd.22            up  1.0 1.0 
> 23   hdd   5.45609             osd.23            up  1.0 1.0 
> 24   hdd   5.45609             osd.24            up  1.0 1.0 
> 25   hdd   5.45609             osd.25            up  1.0 1.0 
> 26   hdd   5.45609             osd.26            up  1.0 1.0 
> 27   hdd   5.45609             osd.27            up  1.0 1.0 
> 28   hdd   5.45609             osd.28            up  1.0 1.0 
> 29   hdd   5.45609             osd.29            up  1.0 1.0 
> -5        54.56085     rack Rack17-PianoAlto                         
> -2        54.56085         host ceph-osd-01                          
>  0   hdd   5.45609             osd.0             up  1.0 1.0 
>  1   hdd   5.45609             osd.1             up  1.0 1.0 
>  2   hdd   5.45609             osd.2             up  1.0 1.0 
>  3   hdd   5.45609             osd.3             up  1.0 1.0 
>  4   hdd   5.45609             osd.4             up  1.0 1.0 
>  5   hdd   5.45609             osd.5             up  1.0 1.0 
>  6   hdd   5.45609             osd.6             up  1.0 1.0 
>  7   hdd   5.45609             osd.7             up  1.0 1.0 
>  8   hdd   5.45609             osd.8             up  1.0 1.0 
>  9   hdd   5.45609             osd.9             up  1.0 1.0 
> [root@ceph-mon-01 ~]#
> 
> On Mon, Jan 14, 2019 at 3:13 PM Dan van der Ster  > wrote:
> 
> On Mon, Jan 14, 2019 at 3:06 PM Massimo Sgaravatto
> mailto:massimo.sgarava...@gmail.com>>
> wrote:
> >
> > I have a cep

Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Dan van der Ster
On Mon, Jan 14, 2019 at 3:18 PM Massimo Sgaravatto
 wrote:
>
> Thanks for the prompt reply
>
> Indeed I have different racks with different weights.

Are you sure you're replicating across racks? You have only 3 racks,
one of which is half the size of the other two -- if yes, then your
cluster will be full once that rack is full.

-- dan


> Below the ceph osd tree" output
>
> [root@ceph-mon-01 ~]# ceph osd tree
> ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
> -1   272.80426 root default
> -7   109.12170 rack Rack11-PianoAlto
> -854.56085 host ceph-osd-04
> 30   hdd   5.45609 osd.30up  1.0 1.0
> 31   hdd   5.45609 osd.31up  1.0 1.0
> 32   hdd   5.45609 osd.32up  1.0 1.0
> 33   hdd   5.45609 osd.33up  1.0 1.0
> 34   hdd   5.45609 osd.34up  1.0 1.0
> 35   hdd   5.45609 osd.35up  1.0 1.0
> 36   hdd   5.45609 osd.36up  1.0 1.0
> 37   hdd   5.45609 osd.37up  1.0 1.0
> 38   hdd   5.45609 osd.38up  1.0 1.0
> 39   hdd   5.45609 osd.39up  1.0 1.0
> -954.56085 host ceph-osd-05
> 40   hdd   5.45609 osd.40up  1.0 1.0
> 41   hdd   5.45609 osd.41up  1.0 1.0
> 42   hdd   5.45609 osd.42up  1.0 1.0
> 43   hdd   5.45609 osd.43up  1.0 1.0
> 44   hdd   5.45609 osd.44up  1.0 1.0
> 45   hdd   5.45609 osd.45up  1.0 1.0
> 46   hdd   5.45609 osd.46up  1.0 1.0
> 47   hdd   5.45609 osd.47up  1.0 1.0
> 48   hdd   5.45609 osd.48up  1.0 1.0
> 49   hdd   5.45609 osd.49up  1.0 1.0
> -6   109.12170 rack Rack15-PianoAlto
> -354.56085 host ceph-osd-02
> 10   hdd   5.45609 osd.10up  1.0 1.0
> 11   hdd   5.45609 osd.11up  1.0 1.0
> 12   hdd   5.45609 osd.12up  1.0 1.0
> 13   hdd   5.45609 osd.13up  1.0 1.0
> 14   hdd   5.45609 osd.14up  1.0 1.0
> 15   hdd   5.45609 osd.15up  1.0 1.0
> 16   hdd   5.45609 osd.16up  1.0 1.0
> 17   hdd   5.45609 osd.17up  1.0 1.0
> 18   hdd   5.45609 osd.18up  1.0 1.0
> 19   hdd   5.45609 osd.19up  1.0 1.0
> -454.56085 host ceph-osd-03
> 20   hdd   5.45609 osd.20up  1.0 1.0
> 21   hdd   5.45609 osd.21up  1.0 1.0
> 22   hdd   5.45609 osd.22up  1.0 1.0
> 23   hdd   5.45609 osd.23up  1.0 1.0
> 24   hdd   5.45609 osd.24up  1.0 1.0
> 25   hdd   5.45609 osd.25up  1.0 1.0
> 26   hdd   5.45609 osd.26up  1.0 1.0
> 27   hdd   5.45609 osd.27up  1.0 1.0
> 28   hdd   5.45609 osd.28up  1.0 1.0
> 29   hdd   5.45609 osd.29up  1.0 1.0
> -554.56085 rack Rack17-PianoAlto
> -254.56085 host ceph-osd-01
>  0   hdd   5.45609 osd.0 up  1.0 1.0
>  1   hdd   5.45609 osd.1 up  1.0 1.0
>  2   hdd   5.45609 osd.2 up  1.0 1.0
>  3   hdd   5.45609 osd.3 up  1.0 1.0
>  4   hdd   5.45609 osd.4 up  1.0 1.0
>  5   hdd   5.45609 osd.5 up  1.0 1.0
>  6   hdd   5.45609 osd.6 up  1.0 1.0
>  7   hdd   5.45609 osd.7 up  1.0 1.0
>  8   hdd   5.45609 osd.8 up  1.0 1.0
>  9   hdd   5.45609 osd.9 up  1.0 1.0
> [root@ceph-mon-01 ~]#
>
> On Mon, Jan 14, 2019 at 3:13 PM Dan van der Ster  wrote:
>>
>> On Mon, Jan 14, 2019 at 3:06 PM Massimo Sgaravatto
>>  wrote:
>> >
>> > I have a ceph luminous cluster running on CentOS7 nodes.
>> > This cluster has 50 OSDs, all with the same size and all with the same 
>> > weight.
>> >
>> > Since I noticed that there was a quite "unfair" usage of OSD nodes (some 
>> > used at 30 %, some used at 70 %) I tried to activate the balancer.
>> >
>> > But the balancer doesn't start I guess because of this problem:
>> >
>> > [root@c

Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Massimo Sgaravatto
This [*] is the output of "ceph osd df".

Thanks a lot !
Massimo


[*]
[root@ceph-mon-01 ~]# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL %USE  VAR  PGS
30   hdd 5.45609  1.0 5587G 1875G 3711G 33.57 0.65 140
31   hdd 5.45609  1.0 5587G 3951G 1635G 70.72 1.38 144
32   hdd 5.45609  1.0 5587G 3426G 2160G 61.33 1.19 127
33   hdd 5.45609  1.0 5587G 3548G 2038G 63.51 1.24 167
34   hdd 5.45609  1.0 5587G 1847G 3739G 33.06 0.64 121
35   hdd 5.45609  1.0 5587G 2496G 3090G 44.68 0.87 161
36   hdd 5.45609  1.0 5587G 3038G 2548G 54.38 1.06 153
37   hdd 5.45609  1.0 5587G 2834G 2752G 50.73 0.99 122
38   hdd 5.45609  1.0 5587G 2781G 2805G 49.79 0.97 124
39   hdd 5.45609  1.0 5587G 3362G 2224G 60.18 1.17 141
40   hdd 5.45609  1.0 5587G 2738G 2848G 49.02 0.95 139
41   hdd 5.45609  1.0 5587G 2924G 2662G 52.35 1.02 129
42   hdd 5.45609  1.0 5587G 2195G 3391G 39.29 0.77 116
43   hdd 5.45609  1.0 5587G 2654G 2932G 47.51 0.93 132
44   hdd 5.45609  1.0 5587G 3180G 2406G 56.93 1.11 125
45   hdd 5.45609  1.0 5587G 2727G 2859G 48.82 0.95 152
46   hdd 5.45609  1.0 5587G 2844G 2742G 50.91 0.99 153
47   hdd 5.45609  1.0 5587G 2611G 2975G 46.74 0.91 127
48   hdd 5.45609  1.0 5587G 3575G 2011G 63.99 1.25 139
49   hdd 5.45609  1.0 5587G 1876G 3710G 33.59 0.65 121
10   hdd 5.45609  1.0 5587G 2884G 2702G 51.64 1.01 128
11   hdd 5.45609  1.0 5587G 3401G 2185G 60.89 1.19 130
12   hdd 5.45609  1.0 5587G 4023G 1563G 72.01 1.40 153
13   hdd 5.45609  1.0 5587G 1303G 4283G 23.34 0.45 131
14   hdd 5.45609  1.0 5587G 2792G 2794G 49.97 0.97 135
15   hdd 5.45609  1.0 5587G 1765G 3821G 31.61 0.62 123
16   hdd 5.45609  1.0 5587G 3958G 1628G 70.86 1.38 152
17   hdd 5.45609  1.0 5587G 4362G 1224G 78.09 1.52 139
18   hdd 5.45609  1.0 5587G 2766G 2820G 49.51 0.96 144
19   hdd 5.45609  1.0 5587G 3427G 2159G 61.34 1.19 131
20   hdd 5.45609  1.0 5587G 3226G 2360G 57.75 1.12 162
21   hdd 5.45609  1.0 5587G 2247G 3339G 40.22 0.78 146
22   hdd 5.45609  1.0 5587G 2128G 3458G 38.10 0.74 124
23   hdd 5.45609  1.0 5587G 2749G 2837G 49.21 0.96 133
24   hdd 5.45609  1.0 5587G 3979G 1607G 71.24 1.39 148
25   hdd 5.45609  1.0 5587G 2179G 3407G 39.02 0.76 121
26   hdd 5.45609  1.0 5587G 3860G 1726G 69.09 1.35 151
27   hdd 5.45609  1.0 5587G 2161G 3425G 38.68 0.75 137
28   hdd 5.45609  1.0 5587G 3898G 1688G 69.78 1.36 141
29   hdd 5.45609  1.0 5587G 2355G 3231G 42.15 0.82 121
 0   hdd 5.45609  1.0 5587G 3294G 2292G 58.97 1.15 127
 1   hdd 5.45609  1.0 5587G 2515G 3071G 45.02 0.88 132
 2   hdd 5.45609  1.0 5587G 3300G 2286G 59.07 1.15 144
 3   hdd 5.45609  1.0 5587G 2943G 2643G 52.68 1.03 151
 4   hdd 5.45609  1.0 5587G 2641G 2945G 47.29 0.92 114
 5   hdd 5.45609  1.0 5587G 2786G 2801G 49.87 0.97 131
 6   hdd 5.45609  1.0 5587G 2564G 3022G 45.90 0.89 121
 7   hdd 5.45609  1.0 5587G 1923G 3663G 34.43 0.67 143
 8   hdd 5.45609  1.0 5587G 2625G 2961G 46.99 0.91 130
 9   hdd 5.45609  1.0 5587G 2921G 2665G 52.30 1.02 140
TOTAL  272T  140T  132T 51.36
MIN/MAX VAR: 0.45/1.52  STDDEV: 12.09


On Mon, Jan 14, 2019 at 3:22 PM Wido den Hollander  wrote:

>
>
> On 1/14/19 3:18 PM, Massimo Sgaravatto wrote:
> > Thanks for the prompt reply
> >
> > Indeed I have different racks with different weights.
> > Below the ceph osd tree" output
> >
>
> Can you also show the output of 'ceph osd df' ?
>
> The amount of PGs might be on the low side which also causes this
> imbalance.
>
> If you do not have enough PGs CRUSH can't properly distribute either.
>
> Wido
>
> > [root@ceph-mon-01 ~]# ceph osd tree
> > ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
> > -1   272.80426 root default
> > -7   109.12170 rack Rack11-PianoAlto
> > -854.56085 host ceph-osd-04
> > 30   hdd   5.45609 osd.30up  1.0 1.0
> > 31   hdd   5.45609 osd.31up  1.0 1.0
> > 32   hdd   5.45609 osd.32up  1.0 1.0
> > 33   hdd   5.45609 osd.33up  1.0 1.0
> > 34   hdd   5.45609 osd.34up  1.0 1.0
> > 35   hdd   5.45609 osd.35up  1.0 1.0
> > 36   hdd   5.45609 osd.36up  1.0 1.0
> > 37   hdd   5.45609 osd.37up  1.0 1.0
> > 38   hdd   5.45609 osd.38up  1.0 1.0
> > 39   hdd   5.45609 osd.39up  1.0 1.0
> > -954.56085 host ceph-osd-05
> > 40   hdd   5.45609 osd.40up  1.0 1.0
> > 41   hdd   5.45609 osd.41up  1.0 1.0
> > 42   hdd   5.45609 osd.42up  1.0 1.0
> > 43   hdd   5.45609 osd.43up  1.0 1.0
> > 44   hdd  

Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Massimo Sgaravatto
Hi Dan

At the moment I indeed have only 5 OSD nodes in 3 racks.
The crush map is attached.
Are you suggesting to replicate only between nodes and not between racks
(given the very limited resources)?
Thanks, Massimo

On Mon, Jan 14, 2019 at 3:29 PM Dan van der Ster  wrote:

> On Mon, Jan 14, 2019 at 3:18 PM Massimo Sgaravatto
>  wrote:
> >
> > Thanks for the prompt reply
> >
> > Indeed I have different racks with different weights.
>
> Are you sure you're replicating across racks? You have only 3 racks,
> one of which is half the size of the other two -- if yes, then your
> cluster will be full once that rack is full.
>
> -- dan
>
>
> > Below the ceph osd tree" output
> >
> > [root@ceph-mon-01 ~]# ceph osd tree
> > ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
> > -1   272.80426 root default
> > -7   109.12170 rack Rack11-PianoAlto
> > -854.56085 host ceph-osd-04
> > 30   hdd   5.45609 osd.30up  1.0 1.0
> > 31   hdd   5.45609 osd.31up  1.0 1.0
> > 32   hdd   5.45609 osd.32up  1.0 1.0
> > 33   hdd   5.45609 osd.33up  1.0 1.0
> > 34   hdd   5.45609 osd.34up  1.0 1.0
> > 35   hdd   5.45609 osd.35up  1.0 1.0
> > 36   hdd   5.45609 osd.36up  1.0 1.0
> > 37   hdd   5.45609 osd.37up  1.0 1.0
> > 38   hdd   5.45609 osd.38up  1.0 1.0
> > 39   hdd   5.45609 osd.39up  1.0 1.0
> > -954.56085 host ceph-osd-05
> > 40   hdd   5.45609 osd.40up  1.0 1.0
> > 41   hdd   5.45609 osd.41up  1.0 1.0
> > 42   hdd   5.45609 osd.42up  1.0 1.0
> > 43   hdd   5.45609 osd.43up  1.0 1.0
> > 44   hdd   5.45609 osd.44up  1.0 1.0
> > 45   hdd   5.45609 osd.45up  1.0 1.0
> > 46   hdd   5.45609 osd.46up  1.0 1.0
> > 47   hdd   5.45609 osd.47up  1.0 1.0
> > 48   hdd   5.45609 osd.48up  1.0 1.0
> > 49   hdd   5.45609 osd.49up  1.0 1.0
> > -6   109.12170 rack Rack15-PianoAlto
> > -354.56085 host ceph-osd-02
> > 10   hdd   5.45609 osd.10up  1.0 1.0
> > 11   hdd   5.45609 osd.11up  1.0 1.0
> > 12   hdd   5.45609 osd.12up  1.0 1.0
> > 13   hdd   5.45609 osd.13up  1.0 1.0
> > 14   hdd   5.45609 osd.14up  1.0 1.0
> > 15   hdd   5.45609 osd.15up  1.0 1.0
> > 16   hdd   5.45609 osd.16up  1.0 1.0
> > 17   hdd   5.45609 osd.17up  1.0 1.0
> > 18   hdd   5.45609 osd.18up  1.0 1.0
> > 19   hdd   5.45609 osd.19up  1.0 1.0
> > -454.56085 host ceph-osd-03
> > 20   hdd   5.45609 osd.20up  1.0 1.0
> > 21   hdd   5.45609 osd.21up  1.0 1.0
> > 22   hdd   5.45609 osd.22up  1.0 1.0
> > 23   hdd   5.45609 osd.23up  1.0 1.0
> > 24   hdd   5.45609 osd.24up  1.0 1.0
> > 25   hdd   5.45609 osd.25up  1.0 1.0
> > 26   hdd   5.45609 osd.26up  1.0 1.0
> > 27   hdd   5.45609 osd.27up  1.0 1.0
> > 28   hdd   5.45609 osd.28up  1.0 1.0
> > 29   hdd   5.45609 osd.29up  1.0 1.0
> > -554.56085 rack Rack17-PianoAlto
> > -254.56085 host ceph-osd-01
> >  0   hdd   5.45609 osd.0 up  1.0 1.0
> >  1   hdd   5.45609 osd.1 up  1.0 1.0
> >  2   hdd   5.45609 osd.2 up  1.0 1.0
> >  3   hdd   5.45609 osd.3 up  1.0 1.0
> >  4   hdd   5.45609 osd.4 up  1.0 1.0
> >  5   hdd   5.45609 osd.5 up  1.0 1.0
> >  6   hdd   5.45609 osd.6 up  1.0 1.0
> >  7   hdd   5.45609 osd.7 up  1.0 1.0
> >  8   hdd   5.45609 osd.8 up  1.0 1.0
> >  9   hdd   5.45609 osd.9 up  1.0 1.0
> > [root@ceph-mon-01 ~]#
> >
> > On Mon, Jan 14, 2019 at 3:13 PM Dan van der Ster 
> wrote:
> >>
> >> On Mon, Jan 14, 2019 at 3:06 PM

Re: [ceph-users] Problems after migrating to straw2 (to enable the balancer)

2019-01-14 Thread Dan van der Ster
Your crush rule is ok:

step chooseleaf firstn 0 type host

You are replicating host-wise, not rack-wise.

This is what I would suggest for your cluster, but keep in mind that a
whole-rack outage will leave some PGs incomplete.
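
For comparison, replicating rack-wise would only need the chooseleaf type
changed; a sketch of what such a rule would look like in the decompiled crush
map (the rule name and id are illustrative):

  rule replicated_racks {
      id 1
      type replicated
      min_size 1
      max_size 10
      step take default
      step chooseleaf firstn 0 type rack
      step emit
  }

With only 3 racks (one of them half-sized) and size 3, though, that would
quickly fill the smallest rack, as noted earlier in the thread.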

Regarding the straw2 change causing 12% data movement -- in this case
it is a bit more than I would have expected.

-- dan



On Mon, Jan 14, 2019 at 3:40 PM Massimo Sgaravatto
 wrote:
>
> Hi Dan
>
> I have indeed at the moment only 5 OSD nodes on 3 racks.
> The crush-map is attached.
> Are you suggesting to replicate only between nodes and not between racks 
> (since the very few resources) ?
> Thanks, Massimo
>
> On Mon, Jan 14, 2019 at 3:29 PM Dan van der Ster  wrote:
>>
>> On Mon, Jan 14, 2019 at 3:18 PM Massimo Sgaravatto
>>  wrote:
>> >
>> > Thanks for the prompt reply
>> >
>> > Indeed I have different racks with different weights.
>>
>> Are you sure you're replicating across racks? You have only 3 racks,
>> one of which is half the size of the other two -- if yes, then your
>> cluster will be full once that rack is full.
>>
>> -- dan
>>
>>
>> > Below the ceph osd tree" output
>> >
>> > [root@ceph-mon-01 ~]# ceph osd tree
>> > ID CLASS WEIGHTTYPE NAME STATUS REWEIGHT PRI-AFF
>> > -1   272.80426 root default
>> > -7   109.12170 rack Rack11-PianoAlto
>> > -854.56085 host ceph-osd-04
>> > 30   hdd   5.45609 osd.30up  1.0 1.0
>> > 31   hdd   5.45609 osd.31up  1.0 1.0
>> > 32   hdd   5.45609 osd.32up  1.0 1.0
>> > 33   hdd   5.45609 osd.33up  1.0 1.0
>> > 34   hdd   5.45609 osd.34up  1.0 1.0
>> > 35   hdd   5.45609 osd.35up  1.0 1.0
>> > 36   hdd   5.45609 osd.36up  1.0 1.0
>> > 37   hdd   5.45609 osd.37up  1.0 1.0
>> > 38   hdd   5.45609 osd.38up  1.0 1.0
>> > 39   hdd   5.45609 osd.39up  1.0 1.0
>> > -954.56085 host ceph-osd-05
>> > 40   hdd   5.45609 osd.40up  1.0 1.0
>> > 41   hdd   5.45609 osd.41up  1.0 1.0
>> > 42   hdd   5.45609 osd.42up  1.0 1.0
>> > 43   hdd   5.45609 osd.43up  1.0 1.0
>> > 44   hdd   5.45609 osd.44up  1.0 1.0
>> > 45   hdd   5.45609 osd.45up  1.0 1.0
>> > 46   hdd   5.45609 osd.46up  1.0 1.0
>> > 47   hdd   5.45609 osd.47up  1.0 1.0
>> > 48   hdd   5.45609 osd.48up  1.0 1.0
>> > 49   hdd   5.45609 osd.49up  1.0 1.0
>> > -6   109.12170 rack Rack15-PianoAlto
>> > -354.56085 host ceph-osd-02
>> > 10   hdd   5.45609 osd.10up  1.0 1.0
>> > 11   hdd   5.45609 osd.11up  1.0 1.0
>> > 12   hdd   5.45609 osd.12up  1.0 1.0
>> > 13   hdd   5.45609 osd.13up  1.0 1.0
>> > 14   hdd   5.45609 osd.14up  1.0 1.0
>> > 15   hdd   5.45609 osd.15up  1.0 1.0
>> > 16   hdd   5.45609 osd.16up  1.0 1.0
>> > 17   hdd   5.45609 osd.17up  1.0 1.0
>> > 18   hdd   5.45609 osd.18up  1.0 1.0
>> > 19   hdd   5.45609 osd.19up  1.0 1.0
>> > -454.56085 host ceph-osd-03
>> > 20   hdd   5.45609 osd.20up  1.0 1.0
>> > 21   hdd   5.45609 osd.21up  1.0 1.0
>> > 22   hdd   5.45609 osd.22up  1.0 1.0
>> > 23   hdd   5.45609 osd.23up  1.0 1.0
>> > 24   hdd   5.45609 osd.24up  1.0 1.0
>> > 25   hdd   5.45609 osd.25up  1.0 1.0
>> > 26   hdd   5.45609 osd.26up  1.0 1.0
>> > 27   hdd   5.45609 osd.27up  1.0 1.0
>> > 28   hdd   5.45609 osd.28up  1.0 1.0
>> > 29   hdd   5.45609 osd.29up  1.0 1.0
>> > -554.56085 rack Rack17-PianoAlto
>> > -254.56085 host ceph-osd-01
>> >  0   hdd   5.45609 osd.0 up  1.0 1.0
>> >  1   hdd   5.45609 osd.1 up  1.0 1.0
>> >  2   hdd   5.45609 osd.2 up  1.0 1.0
>> >  3   hdd   5.45609 osd.3 up  1.0 1.0
>> >  4   hdd   5.45609 osd.4 up  1.000

Re: [ceph-users] RBD Mirror Proxy Support?

2019-01-14 Thread Kenneth Van Alstyne
Thanks for the reply Jason — I was actually thinking of emailing you directly, 
but thought it may be beneficial to keep the conversation to the list so that 
everyone can see the thread.   Can you think of a reason why one-way RBD 
mirroring would not work to a shared tertiary cluster?  I need to build out a 
test lab to see how that would work for us.

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information. Any unauthorized review, copy, use, disclosure, or distribution is 
STRICTLY prohibited. If you are not the intended recipient, please contact the 
sender by reply e-mail and destroy all copies of the original message.

On Jan 12, 2019, at 4:01 PM, Jason Dillaman 
mailto:jdill...@redhat.com>> wrote:

On Fri, Jan 11, 2019 at 2:09 PM Kenneth Van Alstyne
mailto:kvanalst...@knightpoint.com>> wrote:

Hello all (and maybe this would be better suited for the ceph devel mailing 
list):
I’d like to use RBD mirroring between two sites (to each other), but I have the 
following limitations:
- The clusters use the same name (“ceph”)

That's actually not an issue. The "ceph" name is used to locate
configuration files for RBD mirroring (a la
/etc/ceph/<cluster>.conf and
/etc/ceph/<cluster>.client.<user>.keyring). You just need to map
that cluster config file name to the remote cluster name in the RBD
mirroring configuration. Additionally, starting with Nautilus, the
configuration details for connecting to a remote cluster can now be
stored in the monitor (via the rbd CLI and dashboard), so there won't
be any need to fiddle with configuration files for remote clusters
anymore.
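
As a rough sketch of what that mapping looks like in practice (the cluster and
client names here are purely illustrative), the remote cluster is just a
second conf/keyring pair on the rbd-mirror host, plus a peer entry on the pool:

  /etc/ceph/site-b.conf                        # mon addresses of the remote cluster
  /etc/ceph/site-b.client.rbd-mirror.keyring   # key for the remote mirroring user
  rbd mirror pool enable rbd pool
  rbd mirror pool peer add rbd client.rbd-mirror@site-b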

- The clusters share IP address space on a private, non-routed storage network

Unfortunately, that is an issue since the rbd-mirror daemon needs to
be able to connect to both clusters. If the two clusters are at least
on different subnets and your management servers can talk to each
side, you might be able to run the rbd-mirror daemon there.


There are management servers on each side that can talk to the respective 
storage networks, but the storage networks cannot talk directly to each other.  
I recall reading, some years back, of possibly adding support for an RBD mirror 
proxy, which would potentially solve my issues.  Has anything been done in this 
regard?

No, I haven't really seen much demand for such support so it's never
bubbled up as a priority yet.

If not, is my best bet perhaps a tertiary clusters that both can reach and do 
one-way replication to?

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information. Any unauthorized review, copy, use, disclosure, or distribution is 
STRICTLY prohibited. If you are not the intended recipient, please contact the 
sender by reply e-mail and destroy all copies of the original message.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-14 Thread Daniel Gryniewicz

Hi.  Welcome to the community.

On 01/14/2019 07:56 AM, David C wrote:

Hi All

I've been playing around with the nfs-ganesha 2.7 exporting a cephfs 
filesystem, it seems to be working pretty well so far. A few questions:


1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a 
libcephfs client,..." [1]. For arguments sake, if I have ten top level 
dirs in my Cephfs namespace, is there any value in creating a separate 
export for each directory? Will that potentially give me better 
performance than a single export of the entire namespace?


I don't believe there are any advantages from the Ceph side.  From the 
Ganesha side, you configure permissions, client ACLs, squashing, and so 
on on a per-export basis, so you'll need different exports if you need 
different settings for each top level directory.  If they can all use 
the same settings, one export is probably better.
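
For illustration, a minimal per-directory FSAL_CEPH export block might look
like the following (the Export_ID, paths and cephx user are assumptions, not a
recommendation):

  EXPORT {
      Export_ID = 1;
      Path = "/projects";           # directory inside the CephFS namespace
      Pseudo = "/projects";         # where clients see it in the NFS pseudo-fs
      Access_Type = RW;
      Squash = No_Root_Squash;
      FSAL {
          Name = CEPH;
          User_Id = "ganesha";      # cephx client used by this export
      }
  }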




2) Tuning: are there any recommended parameters to tune? So far I've 
found I had to increase client_oc_size which seemed quite conservative.


Ganesha is just a standard libcephfs client, so any tuning you'd make on 
any other cephfs client also applies to Ganesha.  I'm not aware of 
anything in particular, but I've never deployed it for anything other 
than testing.


Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-14 Thread Paul Emmerich
We've found that more aggressive prefetching in the Ceph client can
help with some poorly behaving legacy applications (don't know the
option off the top of my head but it's documented).
It can also be useful to disable logging (even the in-memory logs) if
you do a lot of IOPS (that's debug client and debug ms mostly).
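
A sketch of what that could look like in the [client] section of the Ganesha
host's ceph.conf; the readahead values are assumptions to illustrate the
knobs, not recommendations:

  [client]
      client readahead min = 4194304        # more aggressive prefetching (4 MiB)
      client readahead max bytes = 0        # 0 = no byte cap, bounded by periods
      client readahead max periods = 8
      debug client = 0/0                    # silence in-memory client logs
      debug ms = 0/0                        # silence messenger logs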

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 14, 2019 at 4:11 PM Daniel Gryniewicz  wrote:
>
> Hi.  Welcome to the community.
>
> On 01/14/2019 07:56 AM, David C wrote:
> > Hi All
> >
> > I've been playing around with the nfs-ganesha 2.7 exporting a cephfs
> > filesystem, it seems to be working pretty well so far. A few questions:
> >
> > 1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a
> > libcephfs client,..." [1]. For arguments sake, if I have ten top level
> > dirs in my Cephfs namespace, is there any value in creating a separate
> > export for each directory? Will that potentially give me better
> > performance than a single export of the entire namespace?
>
> I don't believe there are any advantages from the Ceph side.  From the
> Ganesha side, you configure permissions, client ACLs, squashing, and so
> on on a per-export basis, so you'll need different exports if you need
> different settings for each top level directory.  If they can all use
> the same settings, one export is probably better.
>
> >
> > 2) Tuning: are there any recommended parameters to tune? So far I've
> > found I had to increase client_oc_size which seemed quite conservative.
>
> Ganesha is just a standard libcephfs client, so any tuning you'd make on
> any other cephfs client also applies to Ganesha.  I'm not aware of
> anything in particular, but I've never deployed it for anything other
> than testing.
>
> Daniel
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror Proxy Support?

2019-01-14 Thread Jason Dillaman
On Mon, Jan 14, 2019 at 10:10 AM Kenneth Van Alstyne
 wrote:
>
> Thanks for the reply Jason — I was actually thinking of emailing you 
> directly, but thought it may be beneficial to keep the conversation to the 
> list so that everyone can see the thread.   Can you think of a reason why 
> one-way RBD mirroring would not work to a shared tertiary cluster?  I need to 
> build out a test lab to see how that would work for us.
>

I guess I don't understand what the tertiary cluster is doing? If the
goal is to replicate from cluster A -> cluster B -> cluster C, that is
not currently supported since (by design choice) we don't currently
re-write the RBD image journal entries from the source cluster to the
destination cluster but instead just directly apply the journal
entries to the destination image (to save IOPS).

> Thanks,
>
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
> Service-Disabled Veteran-Owned Business
> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
> c: 228-547-8045 f: 571-266-3106
> www.knightpoint.com
> DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
> GSA Schedule 70 SDVOSB: GS-35F-0646S
> GSA MOBIS Schedule: GS-10F-0404Y
> ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3
>
> Notice: This e-mail message, including any attachments, is for the sole use 
> of the intended recipient(s) and may contain confidential and privileged 
> information. Any unauthorized review, copy, use, disclosure, or distribution 
> is STRICTLY prohibited. If you are not the intended recipient, please contact 
> the sender by reply e-mail and destroy all copies of the original message.
>
> On Jan 12, 2019, at 4:01 PM, Jason Dillaman  wrote:
>
> On Fri, Jan 11, 2019 at 2:09 PM Kenneth Van Alstyne
>  wrote:
>
>
> Hello all (and maybe this would be better suited for the ceph devel mailing 
> list):
> I’d like to use RBD mirroring between two sites (to each other), but I have 
> the following limitations:
> - The clusters use the same name (“ceph”)
>
>
> That's actually not an issue. The "ceph" name is used to locate
> configuration files for RBD mirroring (a la
> /etc/ceph/.conf and
> /etc/ceph/.client..keyring). You just need to map
> that cluster config file name to the remote cluster name in the RBD
> mirroring configuration. Additionally, starting with Nautilus, the
> configuration details for connecting to a remote cluster can now be
> stored in the monitor (via the rbd CLI and dashbaord), so there won't
> be any need to fiddle with configuration files for remote clusters
> anymore.
>
> - The clusters share IP address space on a private, non-routed storage network
>
>
> Unfortunately, that is an issue since the rbd-mirror daemon needs to
> be able to connect to both clusters. If the two clusters are at least
> on different subnets and your management servers can talk to each
> side, you might be able to run the rbd-mirror daemon there.
>
>
> There are management servers on each side that can talk to the respective 
> storage networks, but the storage networks cannot talk directly to each 
> other.  I recall reading, some years back, of possibly adding support for an 
> RBD mirror proxy, which would potentially solve my issues.  Has anything been 
> done in this regard?
>
>
> No, I haven't really seen much demand for such support so it's never
> bubbled up as a priority yet.
>
> If not, is my best bet perhaps a tertiary clusters that both can reach and do 
> one-way replication to?
>
> Thanks,
>
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
> Service-Disabled Veteran-Owned Business
> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
> c: 228-547-8045 f: 571-266-3106
> www.knightpoint.com
> DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
> GSA Schedule 70 SDVOSB: GS-35F-0646S
> GSA MOBIS Schedule: GS-10F-0404Y
> ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3
>
> Notice: This e-mail message, including any attachments, is for the sole use 
> of the intended recipient(s) and may contain confidential and privileged 
> information. Any unauthorized review, copy, use, disclosure, or distribution 
> is STRICTLY prohibited. If you are not the intended recipient, please contact 
> the sender by reply e-mail and destroy all copies of the original message.
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Jason
>
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Mirror Proxy Support?

2019-01-14 Thread Kenneth Van Alstyne
In this case, I’m imagining Clusters A/B both having write access to a third 
“Cluster C”.  So A/B -> C rather than A -> C -> B / B -> C -> A / A -> B-> C.  
I admit, in the event that I need to replicate back to either primary cluster, 
there may be challenges.

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information. Any unauthorized review, copy, use, disclosure, or distribution is 
STRICTLY prohibited. If you are not the intended recipient, please contact the 
sender by reply e-mail and destroy all copies of the original message.

On Jan 14, 2019, at 9:50 AM, Jason Dillaman 
mailto:jdill...@redhat.com>> wrote:

On Mon, Jan 14, 2019 at 10:10 AM Kenneth Van Alstyne
mailto:kvanalst...@knightpoint.com>> wrote:

Thanks for the reply Jason — I was actually thinking of emailing you directly, 
but thought it may be beneficial to keep the conversation to the list so that 
everyone can see the thread.   Can you think of a reason why one-way RBD 
mirroring would not work to a shared tertiary cluster?  I need to build out a 
test lab to see how that would work for us.


I guess I don't understand what the tertiary cluster is doing? If the
goal is to replicate from cluster A -> cluster B -> cluster C, that is
not currently supported since (by design choice) we don't currently
re-write the RBD image journal entries from the source cluster to the
destination cluster but instead just directly apply the journal
entries to the destination image (to save IOPS).

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information. Any unauthorized review, copy, use, disclosure, or distribution is 
STRICTLY prohibited. If you are not the intended recipient, please contact the 
sender by reply e-mail and destroy all copies of the original message.

On Jan 12, 2019, at 4:01 PM, Jason Dillaman  wrote:

On Fri, Jan 11, 2019 at 2:09 PM Kenneth Van Alstyne
 wrote:


Hello all (and maybe this would be better suited for the ceph devel mailing 
list):
I’d like to use RBD mirroring between two sites (to each other), but I have the 
following limitations:
- The clusters use the same name (“ceph”)


That's actually not an issue. The "ceph" name is used to locate
configuration files for RBD mirroring (a la
/etc/ceph/.conf and
/etc/ceph/.client..keyring). You just need to map
that cluster config file name to the remote cluster name in the RBD
mirroring configuration. Additionally, starting with Nautilus, the
configuration details for connecting to a remote cluster can now be
stored in the monitor (via the rbd CLI and dashbaord), so there won't
be any need to fiddle with configuration files for remote clusters
anymore.

- The clusters share IP address space on a private, non-routed storage network


Unfortunately, that is an issue since the rbd-mirror daemon needs to
be able to connect to both clusters. If the two clusters are at least
on different subnets and your management servers can talk to each
side, you might be able to run the rbd-mirror daemon there.


There are management servers on each side that can talk to the respective 
storage networks, but the storage networks cannot talk directly to each other.  
I recall reading, some years back, of possibly adding support for an RBD mirror 
proxy, which would potentially solve my issues.  Has anything been done in this 
regard?


No, I haven't really seen much demand for such support so it's never
bubbled up as a priority yet.

If not, is my best bet perhaps a tertiary clusters that both can reach and do 
one-way replication to?

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use 

Re: [ceph-users] RBD Mirror Proxy Support?

2019-01-14 Thread Jason Dillaman
On Mon, Jan 14, 2019 at 11:09 AM Kenneth Van Alstyne
 wrote:
>
> In this case, I’m imagining Clusters A/B both having write access to a third 
> “Cluster C”.  So A/B -> C rather than A -> C -> B / B -> C -> A / A -> B-> C. 
>  I admit, in the event that I need to replicate back to either primary 
> cluster, there may be challenges.

While this is possible, in addition to the failback question, you
would also need to use unique pool names in clusters A and B since on
cluster C you are currently prevented from adding more than a single
peer per pool.

> Thanks,
>
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
> Service-Disabled Veteran-Owned Business
> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
> c: 228-547-8045 f: 571-266-3106
> www.knightpoint.com
> DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
> GSA Schedule 70 SDVOSB: GS-35F-0646S
> GSA MOBIS Schedule: GS-10F-0404Y
> ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3
>
> Notice: This e-mail message, including any attachments, is for the sole use 
> of the intended recipient(s) and may contain confidential and privileged 
> information. Any unauthorized review, copy, use, disclosure, or distribution 
> is STRICTLY prohibited. If you are not the intended recipient, please contact 
> the sender by reply e-mail and destroy all copies of the original message.
>
> On Jan 14, 2019, at 9:50 AM, Jason Dillaman  wrote:
>
> On Mon, Jan 14, 2019 at 10:10 AM Kenneth Van Alstyne
>  wrote:
>
>
> Thanks for the reply Jason — I was actually thinking of emailing you 
> directly, but thought it may be beneficial to keep the conversation to the 
> list so that everyone can see the thread.   Can you think of a reason why 
> one-way RBD mirroring would not work to a shared tertiary cluster?  I need to 
> build out a test lab to see how that would work for us.
>
>
> I guess I don't understand what the tertiary cluster is doing? If the
> goal is to replicate from cluster A -> cluster B -> cluster C, that is
> not currently supported since (by design choice) we don't currently
> re-write the RBD image journal entries from the source cluster to the
> destination cluster but instead just directly apply the journal
> entries to the destination image (to save IOPS).
>
> Thanks,
>
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
> Service-Disabled Veteran-Owned Business
> 1775 Wiehle Avenue Suite 101 | Reston, VA 20190
> c: 228-547-8045 f: 571-266-3106
> www.knightpoint.com
> DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
> GSA Schedule 70 SDVOSB: GS-35F-0646S
> GSA MOBIS Schedule: GS-10F-0404Y
> ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3
>
> Notice: This e-mail message, including any attachments, is for the sole use 
> of the intended recipient(s) and may contain confidential and privileged 
> information. Any unauthorized review, copy, use, disclosure, or distribution 
> is STRICTLY prohibited. If you are not the intended recipient, please contact 
> the sender by reply e-mail and destroy all copies of the original message.
>
> On Jan 12, 2019, at 4:01 PM, Jason Dillaman  wrote:
>
> On Fri, Jan 11, 2019 at 2:09 PM Kenneth Van Alstyne
>  wrote:
>
>
> Hello all (and maybe this would be better suited for the ceph devel mailing 
> list):
> I’d like to use RBD mirroring between two sites (to each other), but I have 
> the following limitations:
> - The clusters use the same name (“ceph”)
>
>
> That's actually not an issue. The "ceph" name is used to locate
> configuration files for RBD mirroring (a la
> /etc/ceph/<cluster>.conf and
> /etc/ceph/<cluster>.client.<id>.keyring). You just need to map
> that cluster config file name to the remote cluster name in the RBD
> mirroring configuration. Additionally, starting with Nautilus, the
> configuration details for connecting to a remote cluster can now be
> stored in the monitor (via the rbd CLI and dashboard), so there won't
> be any need to fiddle with configuration files for remote clusters
> anymore.
>
> - The clusters share IP address space on a private, non-routed storage network
>
>
> Unfortunately, that is an issue since the rbd-mirror daemon needs to
> be able to connect to both clusters. If the two clusters are at least
> on different subnets and your management servers can talk to each
> side, you might be able to run the rbd-mirror daemon there.
>
>
> There are management servers on each side that can talk to the respective 
> storage networks, but the storage networks cannot talk directly to each 
> other.  I recall reading, some years back, of possibly adding support for an 
> RBD mirror proxy, which would potentially solve my issues.  Has anything been 
> done in this regard?
>
>
> No, I haven't really seen much demand for such support so it's never
> bubbled up as a priority yet.
>
> If not, is my best bet perhaps a tertiary cluster that both can reach and do
> one-way replication to?
>
> Thanks,
>
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
> Service-

Re: [ceph-users] RBD Mirror Proxy Support?

2019-01-14 Thread Kenneth Van Alstyne
D’oh!  I was hoping that the destination pools could be unique names, 
regardless of the source pool name.

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information. Any unauthorized review, copy, use, disclosure, or distribution is 
STRICTLY prohibited. If you are not the intended recipient, please contact the 
sender by reply e-mail and destroy all copies of the original message.

On Jan 14, 2019, at 11:07 AM, Jason Dillaman <jdill...@redhat.com> wrote:

On Mon, Jan 14, 2019 at 11:09 AM Kenneth Van Alstyne <kvanalst...@knightpoint.com> wrote:

In this case, I’m imagining Clusters A/B both having write access to a third 
“Cluster C”.  So A/B -> C rather than A -> C -> B / B -> C -> A / A -> B -> C.
I admit, in the event that I need to replicate back to either primary cluster, 
there may be challenges.

While this is possible, in addition to the failback question, you
would also need to use unique pool names in clusters A and B since on
cluster C you are currently prevented from adding more than a single
peer per pool.

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information. Any unauthorized review, copy, use, disclosure, or distribution is 
STRICTLY prohibited. If you are not the intended recipient, please contact the 
sender by reply e-mail and destroy all copies of the original message.

On Jan 14, 2019, at 9:50 AM, Jason Dillaman  wrote:

On Mon, Jan 14, 2019 at 10:10 AM Kenneth Van Alstyne
 wrote:


Thanks for the reply Jason — I was actually thinking of emailing you directly, 
but thought it may be beneficial to keep the conversation to the list so that 
everyone can see the thread.   Can you think of a reason why one-way RBD 
mirroring would not work to a shared tertiary cluster?  I need to build out a 
test lab to see how that would work for us.


I guess I don't understand what the tertiary cluster is doing? If the
goal is to replicate from cluster A -> cluster B -> cluster C, that is
not currently supported since (by design choice) we don't currently
re-write the RBD image journal entries from the source cluster to the
destination cluster but instead just directly apply the journal
entries to the destination image (to save IOPS).

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 9001 / ISO 2 / ISO 27001 / CMMI Level 3

Notice: This e-mail message, including any attachments, is for the sole use of 
the intended recipient(s) and may contain confidential and privileged 
information. Any unauthorized review, copy, use, disclosure, or distribution is 
STRICTLY prohibited. If you are not the intended recipient, please contact the 
sender by reply e-mail and destroy all copies of the original message.

On Jan 12, 2019, at 4:01 PM, Jason Dillaman  wrote:

On Fri, Jan 11, 2019 at 2:09 PM Kenneth Van Alstyne
 wrote:


Hello all (and maybe this would be better suited for the ceph devel mailing 
list):
I’d like to use RBD mirroring between two sites (to each other), but I have the 
following limitations:
- The clusters use the same name (“ceph”)


That's actually not an issue. The "ceph" name is used to locate
configuration files for RBD mirroring (a la
/etc/ceph/<cluster>.conf and
/etc/ceph/<cluster>.client.<id>.keyring). You just need to map
that cluster config file name to the remote cluster name in the RBD
mirroring configuration. Additionally, starting with Nautilus, the
configuration details for connecting to a remote cluster can now be
stored in the monitor (via the rbd CLI and dashboard), so there won't
be any need to fiddle with configuration files for remote clusters
anymore.

- The clusters share IP address space on a private, non-routed storage network


Unfortunately, that is an issue since the rbd

[ceph-users] Bionic Upgrade 12.2.10

2019-01-14 Thread Scottix
Hey,
I am having some issues upgrading to 12.2.10 on my 18.04 server. It is
saying 12.2.8 is the latest.
I am not sure why it is not going to 12.2.10; the rest of my
cluster is already on 12.2.10 except this one machine.

$ cat /etc/apt/sources.list.d/ceph.list
deb https://download.ceph.com/debian-luminous/ bionic main

$ apt update
Hit:1 http://us.archive.ubuntu.com/ubuntu bionic InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:3 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:4 http://us.archive.ubuntu.com/ubuntu bionic-backports InRelease
Hit:5 https://download.ceph.com/debian-luminous bionic InRelease

$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 12.2.8-0ubuntu0.18.04.1
  Candidate: 12.2.8-0ubuntu0.18.04.1
  Version table:
 *** 12.2.8-0ubuntu0.18.04.1 500
500 http://us.archive.ubuntu.com/ubuntu bionic-updates/main
amd64 Packages
100 /var/lib/dpkg/status
 12.2.4-0ubuntu1 500
500 http://us.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

Any help on what could be the issue.

Thanks,
Scott

-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bionic Upgrade 12.2.10

2019-01-14 Thread Reed Dier
This is because Luminous is not being built for Bionic for whatever reason.
There are some other mailing list entries detailing this.

Right now you have ceph installed from the Ubuntu bionic-updates repo, which 
has 12.2.8, but does not get regular release updates.

This (sticking with the packages from bionic-updates) is what I ended up having
to do for my ceph nodes that were upgraded from Xenial to Bionic, as well as new
ceph nodes that were installed straight to Bionic, due to the repo issues. Even
if you try to use the xenial packages, you will run into issues with libcurl4
and libcurl3, I imagine.

Reed
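
For reference, a rough sketch of how to see which repository wins and keep the
packages where they are until the repositories line up (the package list is
illustrative, adjust to what is installed):

apt-cache policy ceph-osd    # candidate version and the repo providing it
apt-cache madison ceph-osd   # every version/repo combination on offer
apt-mark hold ceph-common ceph-base ceph-osd ceph-mon ceph-mds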

> On Jan 14, 2019, at 12:21 PM, Scottix  wrote:
> 
> https://download.ceph.com/debian-luminous/ 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Offsite replication scenario

2019-01-14 Thread Gregory Farnum
On Fri, Jan 11, 2019 at 10:07 PM Brian Topping 
wrote:

> Hi all,
>
> I have a simple two-node Ceph cluster that I’m comfortable with the care
> and feeding of. Both nodes are in a single rack and captured in the
> attached dump, it has two nodes, only one mon, all pools size 2. Due to
> physical limitations, the primary location can’t move past two nodes at the
> present time. As far as hardware, those two nodes are 18-core Xeon with
> 128GB RAM and connected with 10GbE.
>
> My next goal is to add an offsite replica and would like to validate the
> plan I have in mind. For it’s part, the offsite replica can be considered
> read-only except for the occasional snapshot in order to run backups to
> tape. The offsite location is connected with a reliable and secured
> ~350Kbps WAN link.
>

Unfortunately this is just not going to work. All writes to a Ceph OSD are
replicated synchronously to every replica, all reads are served from the
primary OSD for any given piece of data, and unless you do some hackery on
your CRUSH map each of your 3 OSD nodes is going to be a primary for about
1/3 of the total data.

If you want to move your data off-site asynchronously, there are various
options for doing that in RBD (either periodic snapshots and export-diff,
or by maintaining a journal and streaming it out) and RGW (with the
multi-site stuff). But you're not going to be successful trying to stretch
a Ceph cluster over that link.
-Greg
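
For the snapshot-based RBD option mentioned above, a minimal sketch over ssh
(pool/image/snapshot names and the remote host are placeholders):

# one-time full copy, then matching base snapshots on both sides
rbd snap create rbd/vm01@base
rbd export rbd/vm01@base - | ssh backup-site rbd import - rbd/vm01
ssh backup-site rbd snap create rbd/vm01@base
# periodic incrementals, sized to what the ~350Kbps link can absorb
rbd snap create rbd/vm01@2019-01-14
rbd export-diff --from-snap base rbd/vm01@2019-01-14 - | ssh backup-site rbd import-diff - rbd/vm01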


>
> The following presuppositions bear challenge:
>
> * There is only a single mon at the present time, which could be expanded
> to three with the offsite location. Two mons at the primary location is
> obviously a lower MTBF than one, but  with a third one on the other side of
> the WAN, I could create resiliency against *either* a WAN failure or a
> single node maintenance event.
> * Because there are two mons at the primary location and one at the
> offsite, the degradation mode for a WAN loss (most likely scenario due to
> facility support) leaves the primary nodes maintaining the quorum, which is
> desirable.
> * It’s clear that a WAN failure and a mon failure at the primary location
> will halt cluster access.
> * The CRUSH maps will be managed to reflect the topology change.
>
> If that’s a good capture so far, I’m comfortable with it. What I don’t
> understand is what to expect in actual use:
>
> * Is the link speed asymmetry between the two primary nodes and the
> offsite node going to create significant risk or unexpected behaviors?
> * Will the performance of the two primary nodes be limited to the speed
> that the offsite mon can participate? Or will the primary mons correctly
> calculate they have quorum and keep moving forward under normal operation?
> * In the case of an extended WAN outage (and presuming full uptime on
> primary site mons), would return to full cluster health be simply a matter
> of time? Are there any limits on how long the WAN could be down if the
> other two maintain quorum?
>
> I hope I’m asking the right questions here. Any feedback appreciated,
> including blogs and RTFM pointers.
>
>
> Thanks for a great product!! I’m really excited for this next frontier!
>
> Brian
>
> > [root@gw01 ~]# ceph -s
> >  cluster:
> >id: 
> >health: HEALTH_OK
> >
> >  services:
> >mon: 1 daemons, quorum gw01
> >mgr: gw01(active)
> >mds: cephfs-1/1/1 up  {0=gw01=up:active}
> >osd: 8 osds: 8 up, 8 in
> >
> >  data:
> >pools:   3 pools, 380 pgs
> >objects: 172.9 k objects, 11 GiB
> >usage:   30 GiB used, 5.8 TiB / 5.8 TiB avail
> >pgs: 380 active+clean
> >
> >  io:
> >client:   612 KiB/s wr, 0 op/s rd, 50 op/s wr
> >
> > [root@gw01 ~]# ceph df
> > GLOBAL:
> >SIZEAVAIL   RAW USED %RAW USED
> >5.8 TiB 5.8 TiB   30 GiB  0.51
> > POOLS:
> >NAMEID USED%USED MAX AVAIL
>  OBJECTS
> >cephfs_metadata 2  264 MiB 0   2.7 TiB
> 1085
> >cephfs_data 3  8.3 GiB  0.29   2.7 TiB
> 171283
> >rbd 4  2.0 GiB  0.07   2.7 TiB
>  542
> > [root@gw01 ~]# ceph osd tree
> > ID CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF
> > -1   5.82153 root default
> > -3   2.91077 host gw01
> > 0   ssd 0.72769 osd.0 up  1.0 1.0
> > 2   ssd 0.72769 osd.2 up  1.0 1.0
> > 4   ssd 0.72769 osd.4 up  1.0 1.0
> > 6   ssd 0.72769 osd.6 up  1.0 1.0
> > -5   2.91077 host gw02
> > 1   ssd 0.72769 osd.1 up  1.0 1.0
> > 3   ssd 0.72769 osd.3 up  1.0 1.0
> > 5   ssd 0.72769 osd.5 up  1.0 1.0
> > 7   ssd 0.72769 osd.7 up  1.0 1.0
> > [root@gw01 ~]# ceph osd df
> > ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE VAR  PGS
> > 0   ssd 0.72769  1.0 745 GiB 4.9 GiB 740 GiB 0.66 1.29 115
> > 2   ssd 0.72769  1.0 745 GiB 3.1 G

Re: [ceph-users] Bionic Upgrade 12.2.10

2019-01-14 Thread Scottix
Wow OK.
I wish there was some official stance on this.

Now I've got to remove those OSDs, downgrade to 16.04 and re-add them;
this is going to take a while.

--Scott
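
For what it's worth, a minimal sketch of draining and recreating one OSD at a
time (the OSD id and device are placeholders):

ceph osd out 12
ceph -s                                    # wait for backfill / HEALTH_OK
systemctl stop ceph-osd@12
ceph osd purge 12 --yes-i-really-mean-it   # removes it from crush, auth and the osd map
# after the OS reinstall:
ceph-volume lvm create --data /dev/sdX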

On Mon, Jan 14, 2019 at 10:53 AM Reed Dier  wrote:
>
> This is because Luminous is not being built for Bionic for whatever reason.
> There are some other mailing list entries detailing this.
>
> Right now you have ceph installed from the Ubuntu bionic-updates repo, which 
> has 12.2.8, but does not get regular release updates.
>
> This is what I ended up having to do for my ceph nodes that were upgraded 
> from Xenial to Bionic, as well as new ceph nodes that installed straight to 
> Bionic, due to the repo issues. Even if you try to use the xenial packages, 
> you will run into issues with libcurl4 and libcurl3 I imagine.
>
> Reed
>
> On Jan 14, 2019, at 12:21 PM, Scottix  wrote:
>
> https://download.ceph.com/debian-luminous/
>
>


-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] slow requests and high i/o / read rate on bluestore osds after upgrade 12.2.8 -> 12.2.10

2019-01-14 Thread Stefan Priebe - Profihost AG
Hi,

while trying to upgrade a cluster from 12.2.8 to 12.2.10 i'm experience
issues with bluestore osds - so i canceled the upgrade and all bluestore
osds are stopped now.

After starting a bluestore osd i'm seeing a lot of slow requests caused
by very high read rates.


Device:         rrqm/s   wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              45,00   187,00  767,00   39,00 482040,00  8660,00  1217,62    58,16   74,60   73,85   89,23   1,24 100,00

it reads permanently with 500MB/s from the disk and can't service client
requests. Overall client read rate is at 10.9MiB/s rd

I can't reproduce this with 12.2.8. Is this a known bug / regression?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and high i/o / read rate on bluestore osds after upgrade 12.2.8 -> 12.2.10

2019-01-14 Thread Paul Emmerich
What's the output of "ceph daemon osd.<id> status" on one of the OSDs
while it's starting?

Is the OSD crashing and being restarted all the time? Anything weird
in the log files? Was there recovery or backfill during the upgrade?

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Jan 14, 2019 at 8:35 PM Stefan Priebe - Profihost AG
 wrote:
>
> Hi,
>
> while trying to upgrade a cluster from 12.2.8 to 12.2.10 i'm experience
> issues with bluestore osds - so i canceled the upgrade and all bluestore
> osds are stopped now.
>
> After starting a bluestore osd i'm seeing a lot of slow requests caused
> by very high read rates.
>
>
> Device:         rrqm/s   wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda              45,00   187,00  767,00   39,00 482040,00  8660,00  1217,62    58,16   74,60   73,85   89,23   1,24 100,00
>
> it reads permanently with 500MB/s from the disk and can't service client
> requests. Overall client read rate is at 10.9MiB/s rd
>
> I can't reproduce this with 12.2.8. Is this a known bug / regression?
>
> Greets,
> Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore SPDK OSD

2019-01-14 Thread Yanko Davila
Hello 

 I was able to find the device selector. Now I have an issue understanding the
steps to activate the OSD. Once I set up SPDK the device disappears from lsblk
as expected, so the ceph manual is not very helpful after SPDK is enabled. Is
there any manual that walks you through the steps to add an SPDK NVMe to ceph?
Thanks again for your time.

Yanko.
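
I'm not aware of a walkthrough beyond the BlueStore config reference; as a
rough, untested sketch of my understanding: once SPDK has claimed the NVMe you
point the OSD at it by serial number instead of a block device, and the OSD
store has to be created by hand, since ceph-disk/ceph-volume expect a kernel
block device. The serial number and OSD id below are placeholders:

# ceph.conf on the OSD host:
[osd.7]
        bluestore_block_path = spdk:55cd2e404bd73932   # NVMe serial with "spdk:" prefix

# then create the OSD manually (data dir, keyring, auth as per the manual deployment docs):
ceph osd new $(uuidgen) 7
ceph-osd -i 7 --mkfs --mkkey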

smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow requests and high i/o / read rate on bluestore osds after upgrade 12.2.8 -> 12.2.10

2019-01-14 Thread Stefan Priebe - Profihost AG
Hi Paul,

On 14.01.19 at 21:39, Paul Emmerich wrote:
> What's the output of "ceph daemon osd.<id> status" on one of the OSDs
> while it's starting?


{
"cluster_fsid": "b338193d-39e0-40e9-baba-4965ef3868a3",
"osd_fsid": "d95d0e3b-7441-4ab0-869c-fe0551d3bd52",
"whoami": 2,
"state": "active",
"oldest_map": 1313325,
"newest_map": 1314026,
"num_pgs": 212
}


> Is the OSD crashing and being restarted all the time?

no it's just running

> Anything weird in the log files

no

> Was there recovery or backfill during the upgrade?

No - it was clean and healthy.

Greets,
Stefan
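
In case it helps narrow this down, a quick sketch of what could be pulled from
the admin socket while the slow requests are happening (the OSD id is just an
example):

ceph daemon osd.2 dump_ops_in_flight                  # where current ops are stuck
ceph daemon osd.2 dump_historic_ops                   # per-stage timings of recent slow ops
ceph daemon osd.2 perf dump > /tmp/osd.2.perf.json    # bluestore/rocksdb counters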
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Offsite replication scenario

2019-01-14 Thread Brian Topping
Ah! Makes perfect sense now. Thanks!! 

Sent from my iPhone

> On Jan 14, 2019, at 12:30, Gregory Farnum  wrote:
> 
>> On Fri, Jan 11, 2019 at 10:07 PM Brian Topping  
>> wrote:
>> Hi all,
>> 
>> I have a simple two-node Ceph cluster that I’m comfortable with the care and 
>> feeding of. Both nodes are in a single rack and captured in the attached 
>> dump, it has two nodes, only one mon, all pools size 2. Due to physical 
>> limitations, the primary location can’t move past two nodes at the present 
>> time. As far as hardware, those two nodes are 18-core Xeon with 128GB RAM 
>> and connected with 10GbE. 
>> 
>> My next goal is to add an offsite replica and would like to validate the 
>> plan I have in mind. For it’s part, the offsite replica can be considered 
>> read-only except for the occasional snapshot in order to run backups to 
>> tape. The offsite location is connected with a reliable and secured ~350Kbps 
>> WAN link. 
> 
> Unfortunately this is just not going to work. All writes to a Ceph OSD are 
> replicated synchronously to every replica, all reads are served from the 
> primary OSD for any given piece of data, and unless you do some hackery on 
> your CRUSH map each of your 3 OSD nodes is going to be a primary for about 
> 1/3 of the total data.
> 
> If you want to move your data off-site asynchronously, there are various 
> options for doing that in RBD (either periodic snapshots and export-diff, or 
> by maintaining a journal and streaming it out) and RGW (with the multi-site 
> stuff). But you're not going to be successful trying to stretch a Ceph 
> cluster over that link.
> -Greg
>  
>> 
>> The following presuppositions bear challenge:
>> 
>> * There is only a single mon at the present time, which could be expanded to 
>> three with the offsite location. Two mons at the primary location is 
>> obviously a lower MTBF than one, but  with a third one on the other side of 
>> the WAN, I could create resiliency against *either* a WAN failure or a 
>> single node maintenance event. 
>> * Because there are two mons at the primary location and one at the offsite, 
>> the degradation mode for a WAN loss (most likely scenario due to facility 
>> support) leaves the primary nodes maintaining the quorum, which is 
>> desirable. 
>> * It’s clear that a WAN failure and a mon failure at the primary location 
>> will halt cluster access.
>> * The CRUSH maps will be managed to reflect the topology change.
>> 
>> If that’s a good capture so far, I’m comfortable with it. What I don’t 
>> understand is what to expect in actual use:
>> 
>> * Is the link speed asymmetry between the two primary nodes and the offsite 
>> node going to create significant risk or unexpected behaviors?
>> * Will the performance of the two primary nodes be limited to the speed that 
>> the offsite mon can participate? Or will the primary mons correctly 
>> calculate they have quorum and keep moving forward under normal operation?
>> * In the case of an extended WAN outage (and presuming full uptime on 
>> primary site mons), would return to full cluster health be simply a matter 
>> of time? Are there any limits on how long the WAN could be down if the other 
>> two maintain quorum?
>> 
>> I hope I’m asking the right questions here. Any feedback appreciated, 
>> including blogs and RTFM pointers.
>> 
>> 
>> Thanks for a great product!! I’m really excited for this next frontier!
>> 
>> Brian
>> 
>> > [root@gw01 ~]# ceph -s
>> >  cluster:
>> >id: 
>> >health: HEALTH_OK
>> > 
>> >  services:
>> >mon: 1 daemons, quorum gw01
>> >mgr: gw01(active)
>> >mds: cephfs-1/1/1 up  {0=gw01=up:active}
>> >osd: 8 osds: 8 up, 8 in
>> > 
>> >  data:
>> >pools:   3 pools, 380 pgs
>> >objects: 172.9 k objects, 11 GiB
>> >usage:   30 GiB used, 5.8 TiB / 5.8 TiB avail
>> >pgs: 380 active+clean
>> > 
>> >  io:
>> >client:   612 KiB/s wr, 0 op/s rd, 50 op/s wr
>> > 
>> > [root@gw01 ~]# ceph df
>> > GLOBAL:
>> >SIZEAVAIL   RAW USED %RAW USED 
>> >5.8 TiB 5.8 TiB   30 GiB  0.51 
>> > POOLS:
>> >NAMEID USED%USED MAX AVAIL OBJECTS 
>> >cephfs_metadata 2  264 MiB 0   2.7 TiB1085 
>> >cephfs_data 3  8.3 GiB  0.29   2.7 TiB  171283 
>> >rbd 4  2.0 GiB  0.07   2.7 TiB 542 
>> > [root@gw01 ~]# ceph osd tree
>> > ID CLASS WEIGHT  TYPE NAME STATUS REWEIGHT PRI-AFF 
>> > -1   5.82153 root default  
>> > -3   2.91077 host gw01 
>> > 0   ssd 0.72769 osd.0 up  1.0 1.0 
>> > 2   ssd 0.72769 osd.2 up  1.0 1.0 
>> > 4   ssd 0.72769 osd.4 up  1.0 1.0 
>> > 6   ssd 0.72769 osd.6 up  1.0 1.0 
>> > -5   2.91077 host gw02 
>> > 1   ssd 0.72769 osd.1 up  1.0

Re: [ceph-users] slow requests and high i/o / read rate on bluestore osds after upgrade 12.2.8 -> 12.2.10

2019-01-14 Thread Mark Nelson

Hi Stefan,


Any idea if the reads are constant or bursty?  One cause of heavy reads 
is when rocksdb is compacting and has to read SST files from disk.  It's 
also possible you could see heavy read traffic during writes if data has 
to be read from SST files rather than cache. It's possible this could be 
related to the osd_memory_autotune feature.  It will try to keep OSD 
memory usage within a certain footprint (4GB by default) which 
supersedes the bluestore cache size (it automatically sets the cache 
size based on the osd_memory_target).



To see what's happening during compaction, you can run this script 
against one of your bluestore OSD logs:


https://github.com/ceph/cbt/blob/master/tools/ceph_rocksdb_log_parser.py
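
For example (log path and OSD id are placeholders; osd_memory_target only if
your build carries the autotuner backport):

python ceph_rocksdb_log_parser.py /var/log/ceph/ceph-osd.2.log   # compaction summary
ceph daemon osd.2 config get osd_memory_target
ceph daemon osd.2 dump_mempools                                  # where the OSD memory goes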


Mark

On 1/14/19 1:35 PM, Stefan Priebe - Profihost AG wrote:

Hi,

while trying to upgrade a cluster from 12.2.8 to 12.2.10 i'm experience
issues with bluestore osds - so i canceled the upgrade and all bluestore
osds are stopped now.

After starting a bluestore osd i'm seeing a lot of slow requests caused
by very high read rates.


Device:         rrqm/s   wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              45,00   187,00  767,00   39,00 482040,00  8660,00  1217,62    58,16   74,60   73,85   89,23   1,24 100,00

it reads permanently with 500MB/s from the disk and can't service client
requests. Overall client read rate is at 10.9MiB/s rd

I can't reproduce this with 12.2.8. Is this a known bug / regression?

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Segfaults on 12.2.9 and 12.2.8

2019-01-14 Thread Glen Baars
Hello Ceph users,

I am chasing an issue that is affecting one of our clusters across various 
Nodes / OSDs. The cluster is around 150 OSDs / 9 nodes and running Ceph 12.2.8 
and 12.2.9.

We have three ceph clusters around this size; two are CentOS 7.5 based and one
is Ubuntu 16.04. This segfault only occurs on our Ubuntu cluster. It is
happening across different hardware configs / drive types (brand, SSD / HDD,
etc.).

Does anyone have any ideas to try and track this down?

Below is a snip from the OSD log
---

   -11> 2019-01-14 06:11:03.317623 7fec6e596700  5 write_log_and_missing with: 
dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 
109398'1041726, trimmed: , trimmed_dups: , clear_divergent_priors: 0
   -10> 2019-01-14 06:11:03.317779 7fec6e596700  5 write_log_and_missing with: 
dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 
109398'1041727, trimmed: , trimmed_dups: , clear_divergent_priors: 0
-9> 2019-01-14 06:11:03.317944 7fec79dad700  1 -- 10.4.36.38:6808/547667 
--> 10.4.36.36:6804/5411 -- osd_repop_reply(osd.131.0:32020224 1.917 
e109398/105174 ondisk, result = 0) v2 -- 0x5556f3abe780 con 0
-8> 2019-01-14 06:11:03.318221 7fec79dad700  1 -- 10.4.36.38:6808/547667 
--> 10.4.36.36:6804/5411 -- osd_repop_reply(osd.131.0:32020225 1.917 
e109398/105174 ondisk, result = 0) v2 -- 0x55569f765700 con 0
-7> 2019-01-14 06:11:03.318293 7fec79dad700  1 -- 10.4.36.38:6808/547667 
--> 10.4.36.36:6804/5411 -- osd_repop_reply(osd.131.0:32020226 1.917 
e109398/105174 ondisk, result = 0) v2 -- 0x5556a709a300 con 0
-6> 2019-01-14 06:11:03.318342 7fec87db3700  5 -- 10.4.36.38:6808/547667 >> 
10.4.36.36:6804/5411 conn(0x5556a0a64000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=26010 cs=1 l=0). rx osd.131 
seq 354167 0x5557575ece00 os
d_repop(osd.131.0:32020228 1.917 e109398/105174) v2
-5> 2019-01-14 06:11:03.318369 7fec87db3700  1 -- 10.4.36.38:6808/547667 
<== osd.131 10.4.36.36:6804/5411 354167  osd_repop(osd.131.0:32020228 1.917 
e109398/105174) v2  1060+0+532 (3817260254 0 922584117) 0x5557575ece00
con 0x5556a0a64000
-4> 2019-01-14 06:11:03.318587 7fec6e596700  5 write_log_and_missing with: 
dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 
109398'1041728, trimmed: , trimmed_dups: , clear_divergent_priors: 0
-3> 2019-01-14 06:11:03.322142 7fec87db3700  5 -- 10.4.36.38:6808/547667 >> 
10.4.36.31:6834/14584 conn(0x5556a0ca2800 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=20551 cs=1 l=0). rx osd.31 
seq 1114772 0x55571b85ee00 o
sd_repop(client.229590.0:782182967 1.3ca e109398/105174) v2
-2> 2019-01-14 06:11:03.322165 7fec87db3700  1 -- 10.4.36.38:6808/547667 
<== osd.31 10.4.36.31:6834/14584 1114772  
osd_repop(client.229590.0:782182967 1.3ca e109398/105174) v2  1048+0+5897 
(3029422256 0 514289860) 0x555
71b85ee00 con 0x5556a0ca2800
-1> 2019-01-14 06:11:03.322308 7fec6f598700  5 write_log_and_missing with: 
dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615, writeout_from: 
109398'21605410, trimmed: , trimmed_dups: , clear_divergent_priors: 0
 0> 2019-01-14 06:11:03.326962 7fec6e596700 -1 *** Caught signal 
(Segmentation fault) **
in thread 7fec6e596700 thread_name:tp_osd_tp

ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)
1: (()+0xa985a4) [0x55565edc45a4]
2: (()+0x11390) [0x7fec8af2a390]
3: (operator new[](unsigned long)+0xc6) [0x7fec8bd1cff6]
4: (BlueStore::Collection::open_shared_blob(unsigned long, 
boost::intrusive_ptr)+0x3ca) [0x55565ec6875a]
5: 
(BlueStore::ExtentMap::decode_spanning_blobs(ceph::buffer::ptr::iterator&)+0x385)
 [0x55565ec702b5]
6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x6c9) 
[0x55565ec80de9]
7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
ObjectStore::Transaction*)+0x918) [0x55565ecb7b88]
8: (BlueStore::queue_transactions(ObjectStore::Sequencer*, 
std::vector 
>&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x52e) 
[0x55565ecb9f2e]
9: (PrimaryLogPG::queue_transactions(std::vector >&, 
boost::intrusive_ptr)+0x66) [0x55565e9d96a6]
10: (ReplicatedBackend::do_repop(boost::intrusive_ptr)+0xc44) 
[0x55565eb04b24]
11: (ReplicatedBackend::_handle_message(boost::intrusive_ptr)+0x294) 
[0x55565eb0dd04]
12: (PGBackend::handle_message(boost::intrusive_ptr)+0x50) 
[0x55565ea17f00]
13: (PrimaryLogPG::do_request(boost::intrusive_ptr&, 
ThreadPool::TPHandle&)+0x543) [0x55565e97aec3]
14: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, 
ThreadPool::TPHandle&)+0x3a9) [0x55565e7eb999]
15: (PGQueueable::RunVis::operator()(boost::intrusive_ptr 
const&)+0x57) [0x55565ea9d577]
16: (OSD::ShardedOpWQ::_process(unsigned int, 
ceph::heartbeat_handle_d*)+0x1047) [0x55565e819db7]
17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) 
[0x55565ee0c1a4]
18: (ShardedThreadPool:
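
Not a fix, but a hedged sketch of steps that might help narrow it down (the OSD
id and paths are placeholders):

# raise logging on an affected OSD, revert once a crash has been captured:
ceph tell osd.45 injectargs '--debug-bluestore 10 --debug-osd 10'
# with the OSD stopped, sanity-check its on-disk metadata:
systemctl stop ceph-osd@45
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-45
# see how widespread the signature is across the logs:
zgrep -l 'Caught signal (Segmentation fault)' /var/log/ceph/ceph-osd.*.log*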

[ceph-users] about python 36

2019-01-14 Thread Will Zhao
Can I use a python34 package in a python36 environment? If not, what
should I do to use a python34 package with python36?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds0: Metadata damage detected

2019-01-14 Thread Sergei Shvarts
Hello ceph users!

A couple of days ago I got a ceph health error - mds0: Metadata damage
detected.
Overall ceph cluster is fine: all pgs are clean, all osds are up and in, no
big problems.
Looks like there is not much information regarding this class of issues, so
I'm writing this message and hope somebody can help me.

here is the damage itself
ceph tell mds.0 damage ls
2019-01-15 07:47:04.651317 7f48c9813700  0 client.312845186 ms_handle_reset
on 192.168.0.5:6801/1186631878
2019-01-15 07:47:04.656991 7f48ca014700  0 client.312845189 ms_handle_reset
on 192.168.0.5:6801/1186631878
[{"damage_type":"dir_frag","id":3472877204,"ino":1100954978087,"frag":"*","path":"\/public\/video\/3h\/3hG6X7\/screen-msmall"}]

Best regards,
Sergei Shvarts
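
A minimal sketch of the usual next step for dir_frag damage, assuming the path
reported by damage ls and an MDS reachable over its admin socket (the MDS name
is a placeholder; take backups first):

ceph daemon mds.node01 scrub_path /public/video/3h/3hG6X7/screen-msmall recursive repair
# once the fragment has been repaired, clear the damage table entry by id:
ceph tell mds.0 damage rm 3472877204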
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com