[ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Wladimir Mutel

Dear all,

I am experimenting with Ceph setup. I set up a single node
(Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA HDDs,
Ubuntu 18.04 Bionic, Ceph packages from
http://download.ceph.com/debian-luminous/dists/xenial/
and iscsi parts built manually per
http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/)
Also I changed 'chooseleaf ... host' to 'chooseleaf ... osd'
in the CRUSH map to run with a single host.
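For reference, that kind of CRUSH change follows the usual
decompile/edit/recompile cycle; a minimal sketch (file names are just
examples):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # in crushmap.txt, change the replicated rule's
  #   step chooseleaf firstn 0 type host
  # to
  #   step chooseleaf firstn 0 type osd
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new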

I have both its Ethernets connected to the same LAN,
with different IPs in the same subnet
(like, 192.168.200.230/24 and 192.168.200.231/24)
mon_host in ceph.conf is set to 192.168.200.230,
and ceph daemons (mgr, mon, osd) are listening to this IP.

What I would like to finally achieve is to provide multipath
iSCSI access through both of these Ethernets to Ceph RBDs,
but apparently gwcli does not allow me to add a second
gateway to the same target. It goes like this :

/iscsi-target> create iqn.2018-06.host.test:test
ok
/iscsi-target> cd iqn.2018-06.host.test:test/gateways
/iscsi-target...test/gateways> create p10s 192.168.200.230 skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ok
/iscsi-target...test/gateways> create p10s2 192.168.200.231 skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
Failed : Gateway creation failed, gateway(s) 
unavailable:192.168.200.231(UNKNOWN state)


host names are defined in /etc/hosts as follows :

192.168.200.230 p10s
192.168.200.231 p10s2

	so I suppose that something does not listen on 192.168.200.231, but I 
don't have an idea what that thing is or how to make it listen there. 
Or how to achieve this goal (utilization of both Ethernets for iSCSI) in 
a different way. Should I aggregate the Ethernets into a 'bond' interface with 
a single IP ? Should I build and use the 'lrbd' tool instead of 'gwcli' ? Is 
it acceptable that I run kernel 4.15, not 4.16+ ?

What other directions could you give me on this task ?
Thanks in advance for your replies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-01 Thread Marc Roos

I actually tried to search the ML before bringing up this topic, because 
I do not get the logic of choosing this direction.

- Bluestore was created to cut out some fs overhead, 
- everywhere 10Gb is recommended because of its better latency. (I even 
posted here something to make ceph perform better with 1Gb eth, 
disregarded because it would add complexity; fine, I can understand that)

And then, because of some start-up/automation issues, let's add the lvm 
tier? Introducing a layer that is constantly there and adds some 
overhead (maybe not that much) for every read and write operation? 

I see ceph-disk as a tool to prepare the osd and then do the rest 
myself, without ceph-deploy or ansible, because I trust what I see 
myself type more than what someone else scripted. I don't have any startup 
problems.

Do assume I am not an expert in any field. But it is understandable that 
having something (lvm) between you and the disk, rather than nothing, 
should carry a performance penalty. 
I know you can hack around nicely with disks and lvm, but those pros 
fall into the same category as the suggestions people make about 
putting disks in raid.

Let alone the risk that you are taking if there turns out to be a 
significant performance penalty:
https://www.researchgate.net/publication/284897601_LVM_in_the_Linux_environment_Performance_examination
https://hrcak.srce.hr/index.php?show=clanak&id_clanak_jezik=216661



-Original Message-
From: David Turner [mailto:drakonst...@gmail.com] 
Sent: donderdag 31 mei 2018 23:48
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
and lvm? (and just not stick with direct disk access)

Your question assumes that ceph-disk was a good piece of software.  It 
had a bug list a mile long and nobody working on it.  A common example 
was how simple it was to mess up any part of the dozens of components 
that allowed an OSD to autostart on boot.  One of the biggest problems 
was when ceph-disk was doing its thing and an OSD would take longer 
than 3 minutes to start, and ceph-disk would give up on it.

That is a little bit about why a new solution was sought after and why 
ceph-disk is being removed entirely.  LVM was a choice made to implement 
something other than partitions and udev magic while still incorporating 
the information needed from all of that into a better solution.  
There has been a lot of talk about this on the ML.

On Thu, May 31, 2018 at 5:23 PM Marc Roos  
wrote:



What is the reasoning behind switching to lvm? Does it make sense to go 
through (yet) another layer to access the disk? Why creating this 
dependency and added complexity? It is fine as it is, or not?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread John Hearns
Errr   is this very wise ?

I have both its Ethernets connected to the same LAN,
with different IPs in the same subnet
(like, 192.168.200.230/24 and 192.168.200.231/24)


In my experience, setting up two interfaces on the same subnet means that
your system doesn't know which one to route traffic through...






On 1 June 2018 at 09:01, Wladimir Mutel  wrote:

> Dear all,
>
> I am experimenting with Ceph setup. I set up a single node
> (Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA HDDs,
> Ubuntu 18.04 Bionic, Ceph packages from
> http://download.ceph.com/debian-luminous/dists/xenial/
> and iscsi parts built manually per
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/)
> Also i changed 'chooseleaf ... host' into 'chooseleaf ... osd'
> in the CRUSH map to run with single host.
>
> I have both its Ethernets connected to the same LAN,
> with different IPs in the same subnet
> (like, 192.168.200.230/24 and 192.168.200.231/24)
> mon_host in ceph.conf is set to 192.168.200.230,
> and ceph daemons (mgr, mon, osd) are listening to this IP.
>
> What I would like to finally achieve, is to provide multipath
> iSCSI access through both these Ethernets to Ceph RBDs,
> and apparently, gwcli does not allow me to add a second
> gateway to the same target. It is going like this :
>
> /iscsi-target> create iqn.2018-06.host.test:test
> ok
> /iscsi-target> cd iqn.2018-06.host.test:test/gateways
> /iscsi-target...test/gateways> create p10s 192.168.200.230 skipchecks=true
> OS version/package checks have been bypassed
> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
> ok
> /iscsi-target...test/gateways> create p10s2 192.168.200.231 skipchecks=true
> OS version/package checks have been bypassed
> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
> Failed : Gateway creation failed, gateway(s) 
> unavailable:192.168.200.231(UNKNOWN
> state)
>
> host names are defined in /etc/hosts as follows :
>
> 192.168.200.230 p10s
> 192.168.200.231 p10s2
>
> so I suppose that something does not listen on 192.168.200.231,
> but I don't have an idea what is that thing and how to make it listen
> there. Or how to achieve this goal (utilization of both Ethernets for
> iSCSI) in different way. Shoud I aggregate Ethernets into a 'bond'
> interface with single IP ? Should I build and use 'lrbd' tool instead of
> 'gwcli' ? Is it acceptable that I run kernel 4.15, not 4.16+ ?
> What other directions could you give me on this task ?
> Thanks in advance for your replies.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Marc Roos
 

Indeed, you have to add routes and rules to the routing table. Or just bond 
them.
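
If you do keep two NICs in one subnet, a minimal source-routing sketch
(the interface name enp2s0 and table number 100 below are only examples):

  # give the second NIC its own routing table so replies leave via the
  # interface they arrived on
  ip route add 192.168.200.0/24 dev enp2s0 src 192.168.200.231 table 100
  ip rule add from 192.168.200.231 lookup 100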


-Original Message-
From: John Hearns [mailto:hear...@googlemail.com] 
Sent: vrijdag 1 juni 2018 10:00
To: ceph-users
Subject: Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - 
how to ?

Errr   is this very wise ?

I have both its Ethernets connected to the same LAN,
with different IPs in the same subnet
(like, 192.168.200.230/24 and 192.168.200.231/24)
 

In my experience setting up to interfaces on the same subnet means that 
your ssystem doesnt know which one to route traffic through...







On 1 June 2018 at 09:01, Wladimir Mutel  wrote:


Dear all,

I am experimenting with Ceph setup. I set up a single node
(Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA 
HDDs,
Ubuntu 18.04 Bionic, Ceph packages from
http://download.ceph.com/debian-luminous/dists/xenial/
and iscsi parts built manually per
http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ )
Also i changed 'chooseleaf ... host' into 'chooseleaf ... osd'
in the CRUSH map to run with single host.

I have both its Ethernets connected to the same LAN,
with different IPs in the same subnet
(like, 192.168.200.230/24 and 192.168.200.231/24)
mon_host in ceph.conf is set to 192.168.200.230,
and ceph daemons (mgr, mon, osd) are listening to this IP.

What I would like to finally achieve, is to provide 
multipath
iSCSI access through both these Ethernets to Ceph RBDs,
and apparently, gwcli does not allow me to add a second
gateway to the same target. It is going like this :

/iscsi-target> create iqn.2018-06.host.test:test
ok
/iscsi-target> cd iqn.2018-06.host.test:test/gateways
/iscsi-target...test/gateways> create p10s 192.168.200.230 
skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ok
/iscsi-target...test/gateways> create p10s2 192.168.200.231 
skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
Failed : Gateway creation failed, gateway(s) 
unavailable:192.168.200.231(UNKNOWN state)

host names are defined in /etc/hosts as follows :

192.168.200.230 p10s
192.168.200.231 p10s2

so I suppose that something does not listen on 
192.168.200.231, but I don't have an idea what is that thing and how to 
make it listen there. Or how to achieve this goal (utilization of both 
Ethernets for iSCSI) in different way. Shoud I aggregate Ethernets into 
a 'bond' interface with single IP ? Should I build and use 'lrbd' tool 
instead of 'gwcli' ? Is it acceptable that I run kernel 4.15, not 4.16+ 
?
What other directions could you give me on this task ?
Thanks in advance for your replies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Panayiotis Gotsis

Hello

Bonding and iSCSI are not a best-practice architecture; multipath is.
However, I can attest to problems with multipathd and Debian.

In any case, what you should try to do and check is:

1) Use two VLANs, one for each Ethernet port, with different IP
address space. Your initiators on the hosts will then be able to
discover two iSCSI targets.
2) Ensure that ping between the host interfaces and the iSCSI
targets works, and that the iSCSI target daemon is
up (by using netstat, for example) on each of the two
IP addresses/Ethernet interfaces.
3) Check the multipath configuration (a quick sketch of these checks
follows below).
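
A rough sketch of checks 2) and 3) above (3260 is the standard iSCSI
target port; adjust the addresses to your setup):

  # 2) reachability and a listener on each portal address
  ping -c 3 192.168.200.231
  ss -tlnp | grep 3260        # or: netstat -tlnp | grep 3260

  # 3) multipath view from the initiator side
  multipath -ll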

On 18-06-01 05:08 +0200, Marc Roos wrote:



Indeed, you have to add routes and rules to routing table. Just bond
them.


-Original Message-
From: John Hearns [mailto:hear...@googlemail.com]
Sent: vrijdag 1 juni 2018 10:00
To: ceph-users
Subject: Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters -
how to ?

Errr   is this very wise ?

I have both its Ethernets connected to the same LAN,
   with different IPs in the same subnet
   (like, 192.168.200.230/24 and 192.168.200.231/24)


In my experience setting up to interfaces on the same subnet means that
your ssystem doesnt know which one to route traffic through...







On 1 June 2018 at 09:01, Wladimir Mutel  wrote:


Dear all,

I am experimenting with Ceph setup. I set up a single node
(Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA
HDDs,
Ubuntu 18.04 Bionic, Ceph packages from
http://download.ceph.com/debian-luminous/dists/xenial/

and iscsi parts built manually per
http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ )
Also i changed 'chooseleaf ... host' into 'chooseleaf ... osd'
in the CRUSH map to run with single host.

I have both its Ethernets connected to the same LAN,
with different IPs in the same subnet
(like, 192.168.200.230/24 and 192.168.200.231/24)
mon_host in ceph.conf is set to 192.168.200.230,
and ceph daemons (mgr, mon, osd) are listening to this IP.

What I would like to finally achieve, is to provide
multipath
iSCSI access through both these Ethernets to Ceph RBDs,
and apparently, gwcli does not allow me to add a second
gateway to the same target. It is going like this :

/iscsi-target> create iqn.2018-06.host.test:test
ok
/iscsi-target> cd iqn.2018-06.host.test:test/gateways
/iscsi-target...test/gateways> create p10s 192.168.200.230
skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
ok
/iscsi-target...test/gateways> create p10s2 192.168.200.231
skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
Failed : Gateway creation failed, gateway(s)
unavailable:192.168.200.231(UNKNOWN state)

host names are defined in /etc/hosts as follows :

192.168.200.230 p10s
192.168.200.231 p10s2

so I suppose that something does not listen on
192.168.200.231, but I don't have an idea what is that thing and how to
make it listen there. Or how to achieve this goal (utilization of both
Ethernets for iSCSI) in different way. Shoud I aggregate Ethernets into
a 'bond' interface with single IP ? Should I build and use 'lrbd' tool
instead of 'gwcli' ? Is it acceptable that I run kernel 4.15, not 4.16+
?
What other directions could you give me on this task ?
Thanks in advance for your replies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
--
Panayiotis Gotsis
Systems & Services Engineer
Network Operations Center
GRNET - Networking Research and Education
7, Kifisias Av., 115 23, Athens
t: +30 210 7471091 | f: +30 210 7474490

Follow us: www.grnet.gr
Twitter: @grnet_gr |Facebook: @grnet.gr
LinkedIn: grnet |YouTube: GRNET EDET
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread shrey chauhan
Hi,

I keep getting inconsistent placement groups, and every time it's the
whiteouts.


cluster [ERR] 9.f repair stat mismatch, got 1563/1563 objects, 0/0 clones,
1551/1551 dirty, 78/78 omap, 0/0 pinned, 12/12 hit_set_archive, 0/-9
whiteouts, 28802382/28802382 bytes, 16107/16107 hit_set_archive bytes.

I tried looking it up and I hardly found anything on this.

What are these whiteouts, and when do they start causing these inconsistencies?
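
(For reference, the scrub details can be inspected and the PG repaired
with something like the commands below; list-inconsistent-obj may come
back empty for a pure stat mismatch like this one:)

  rados list-inconsistent-obj 9.f --format=json-pretty
  ceph pg repair 9.f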

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread John Hearns
It is worth asking: why do you want to have two interfaces?
If you have 1Gbps interfaces and this is a bandwidth requirement, then
10Gbps cards and switches are very cheap these days.

On 1 June 2018 at 10:37, Panayiotis Gotsis  wrote:

> Hello
>
> Bonding and iscsi are not a best practice architecture. Multipath is,
> however I can attest to problems with the multipathd and debian.
>
> In any case, what you should try to do and check is:
>
> 1) Use two vlans, one for each ethernet port, with different ip
> address space. Your initiators on the hosts will then be able to
> discover two iscsi targets.
> 2) You should ensure that ping between host interfaces and iscsi
> targets is working. You should ensure that the iscsi target daemon is
> up (through the use of netstat for example) for each one of the two
> ip addresses/ethernet interfaces
> 3) Check multipath configuration
>
>
> On 18-06-01 05:08 +0200, Marc Roos wrote:
>
>>
>>
>> Indeed, you have to add routes and rules to routing table. Just bond
>> them.
>>
>>
>> -Original Message-
>> From: John Hearns [mailto:hear...@googlemail.com]
>> Sent: vrijdag 1 juni 2018 10:00
>> To: ceph-users
>> Subject: Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters -
>> how to ?
>>
>> Errr   is this very wise ?
>>
>> I have both its Ethernets connected to the same LAN,
>>with different IPs in the same subnet
>>(like, 192.168.200.230/24 and 192.168.200.231/24)
>>
>>
>> In my experience setting up to interfaces on the same subnet means that
>> your ssystem doesnt know which one to route traffic through...
>>
>>
>>
>>
>>
>>
>>
>> On 1 June 2018 at 09:01, Wladimir Mutel  wrote:
>>
>>
>> Dear all,
>>
>> I am experimenting with Ceph setup. I set up a single node
>> (Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA
>> HDDs,
>> Ubuntu 18.04 Bionic, Ceph packages from
>> http://download.ceph.com/debian-luminous/dists/xenial/
>> 
>> and iscsi parts built manually per
>> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ )
>> Also i changed 'chooseleaf ... host' into 'chooseleaf ...
>> osd'
>> in the CRUSH map to run with single host.
>>
>> I have both its Ethernets connected to the same LAN,
>> with different IPs in the same subnet
>> (like, 192.168.200.230/24 and 192.168.200.231/24)
>> mon_host in ceph.conf is set to 192.168.200.230,
>> and ceph daemons (mgr, mon, osd) are listening to this IP.
>>
>> What I would like to finally achieve, is to provide
>> multipath
>> iSCSI access through both these Ethernets to Ceph RBDs,
>> and apparently, gwcli does not allow me to add a second
>> gateway to the same target. It is going like this :
>>
>> /iscsi-target> create iqn.2018-06.host.test:test
>> ok
>> /iscsi-target> cd iqn.2018-06.host.test:test/gateways
>> /iscsi-target...test/gateways> create p10s 192.168.200.230
>> skipchecks=true
>> OS version/package checks have been bypassed
>> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
>> ok
>> /iscsi-target...test/gateways> create p10s2 192.168.200.231
>> skipchecks=true
>> OS version/package checks have been bypassed
>> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
>> Failed : Gateway creation failed, gateway(s)
>> unavailable:192.168.200.231(UNKNOWN state)
>>
>> host names are defined in /etc/hosts as follows :
>>
>> 192.168.200.230 p10s
>> 192.168.200.231 p10s2
>>
>> so I suppose that something does not listen on
>> 192.168.200.231, but I don't have an idea what is that thing and how to
>> make it listen there. Or how to achieve this goal (utilization of both
>> Ethernets for iSCSI) in different way. Shoud I aggregate Ethernets into
>> a 'bond' interface with single IP ? Should I build and use 'lrbd' tool
>> instead of 'gwcli' ? Is it acceptable that I run kernel 4.15, not 4.16+
>> ?
>> What other directions could you give me on this task ?
>> Thanks in advance for your replies.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>>
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> --
> --
> Panayiotis Gotsis
> Systems & Services Engineer
> Network Operations Center
> GRNET 

[ceph-users] ceph with rdma

2018-06-01 Thread Muneendra Kumar M
Hi ,

I have created a Ceph cluster on CentOS 7 servers; it is working fine
with TCP and I am able to run all benchmarks.

Now I want to check the same with RDMA, and I followed the link below to
deploy it:

https://community.mellanox.com/docs/DOC-2721

After following the above document, I restart all cluster processes
on the monitor node using: sudo systemctl start ceph-mon.target

The monitors are not coming up. Any help here will be great.

If I manually run the command below instead of using systemctl, the server
reboots:

sudo /usr/bin/ceph-mon --cluster ceph --id clx-ssp-056 --setuser ceph
--setgroup ceph

Below is my ceph.conf configuration file; I am using Soft-RoCE as my RDMA
device.



[global]
fsid = 74cc4723-7ab9-4cc3-b8c8-182e138da955
mon_initial_members = TestNVMe2
mon_host = 10.38.32.245
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.38.32.0/24
osd pool default size = 2
osd_max_object_name_len = 256
osd_max_object_namespace_len = 64
ms_type = async+rdma
ms_cluster_type = async+rdma
ms_async_rdma_device_name = rxe0
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid = ::::::0a26:20f5
ms_async_rdma_port_num = 1
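
(A quick sanity check of the Soft-RoCE device before starting the daemons,
assuming the rdma-core/libibverbs utilities are installed:)

  rxe_cfg status            # rxe0 should be listed and bound to the right NIC
  ibv_devinfo -d rxe0 -v    # port state should be PORT_ACTIVE; GIDs are listed here
  # ms_async_rdma_local_gid must match one of the GIDs reported for rxe0 port 1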



Regards,

Muneendra.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread Brad Hubbard
On Fri, Jun 1, 2018 at 6:41 PM, shrey chauhan
 wrote:
> Hi,
>
> I keep getting inconsistent placement groups and every time its the
> whiteout.
>
>
> cluster [ERR] 9.f repair stat mismatch, got 1563/1563 objects, 0/0 clones,
> 1551/1551 dirty, 78/78 omap, 0/0 pinned, 12/12 hit_set_archive, 0/-9
> whiteouts, 28802382/28802382 bytes, 16107/16107 hit_set_archive bytes.
>
> I tried looking it up and I hardly found anything on this.
>
> What are these whiteouts and when do they start causing inconsistent issues?

Seems to be related to cache tiering. Is pool 9 a cache pool, or does
it have a relation to a cache pool (or did it have one in the past)?
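
You can check with something like:

  ceph osd pool ls detail     # cache pools show cache_mode and tier_of
  ceph osd dump | grep tier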

>
> Thanks
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-01 Thread Alfredo Deza
On Thu, May 31, 2018 at 10:33 PM, Marc Roos  wrote:
>
> I actually tried to search the ML before bringing up this topic. Because
> I do not get the logic choosing this direction.
>
> - Bluestore is created to cut out some fs overhead,
> - everywhere 10Gb is recommended because of better latency. (I even
> posted here something to make ceph better performing with 1Gb eth,
> disregarded because it would add complexity, fine, I can understand)
>
> And then because of some start-up/automation issues, lets add the lvm
> tier? Introducing a layer that is constantly there and adds some
> overhead (maybe not that much) for every read and write operation?
>
> Is see ceph-disk as a tool to prepare the osd and the do the rest
> myself. Without ceph-deploy or ansible, because I trust more what I see
> I type than someone else scripted. I don’t have any startup problems.

You can certainly do that with ceph-volume. You can create the OSD
manually, and then add the information about your OSD (drives,
locations, fsid, uuids, etc.) in /etc/ceph/osd/

This is how we are able to take over ceph-disk-deployed OSDs.

See: http://docs.ceph.com/docs/master/ceph-volume/simple/scan/#scan
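
The short version looks something like this (the device name is only an
example):

  # capture the metadata of an existing OSD into /etc/ceph/osd/{id}-{fsid}.json
  ceph-volume simple scan /dev/sdb1

  # wire it up to systemd so it starts at boot without udev/ceph-disk
  ceph-volume simple activate --all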

>
> Do assume I am not an expert in any field. But it is understandable that
> having nothing between the disk access and something (lvm) should have a
> performance penalty.
> I know you can hack around nicely with disks and lvm, but those pro's
> fall into the same category of questions people are suggesting related
> to putting disks in raid.
>
> Let alone the risk that your are taking when there is going to be a
> significant performance penalty:
> https://www.researchgate.net/publication/284897601_LVM_in_the_Linux_environment_Performance_examination
> https://hrcak.srce.hr/index.php?show=clanak&id_clanak_jezik=216661
>
>
>
> -Original Message-
> From: David Turner [mailto:drakonst...@gmail.com]
> Sent: donderdag 31 mei 2018 23:48
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume
> and lvm? (and just not stick with direct disk access)
>
> Your question assumes that ceph-disk was a good piece of software.  It
> had a bug list a mile long and nobody working on it.  A common example
> was how simple it was to mess up any part of the dozens of components
> that allowed an OSD to autostart on boot.  One of the biggest problems
> was when ceph-disk was doing it's thing and an OSD would take longer
> than 3 minutes to start and ceph-disk would give up on it.
>
> That is a little bit about why a new solution was sought after and why
> ceph-disk is being removed entirely.  LVM was a choice made to implement
> something other than partitions and udev magic while still incorporating
> the information still needed from all of that in a better solution.
> There has been a lot of talk about this on the ML.
>
> On Thu, May 31, 2018 at 5:23 PM Marc Roos 
> wrote:
>
>
>
> What is the reasoning behind switching to lvm? Does it make sense
> to go
> through (yet) another layer to access the disk? Why creating this
> dependency and added complexity? It is fine as it is, or not?
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread Brad Hubbard
-- Forwarded message --
From: Brad Hubbard 
Date: Fri, Jun 1, 2018 at 9:24 PM
Subject: Re: [ceph-users] inconsistent pgs :- stat mismatch in whiteouts
To: shrey chauhan 
Cc: ceph-users 


Too late for me today.

If you send your reply to the list someone else may provide an answer
more expansive than the following.

An object flagged as a "whiteout" logically does not exist. I believe
this is a way for a cache tier to cache the fact that an object does
not exist, so it does not have to forward IO requests for that object
to the backing pool; it can just immediately notify the client that
the object does not exist.

Not sure why the scrub is uncovering mismatching values for whiteouts,
but that's what is happening.

On Fri, Jun 1, 2018 at 8:58 PM, shrey chauhan
 wrote:
> yes it is pool.
>
>
> moreover what are these whiteouts? and when does this mismatch happen?
>
>
>
> On Fri, Jun 1, 2018 at 3:51 PM, Brad Hubbard  wrote:
>>
>> On Fri, Jun 1, 2018 at 6:41 PM, shrey chauhan
>>  wrote:
>> > Hi,
>> >
>> > I keep getting inconsistent placement groups and every time its the
>> > whiteout.
>> >
>> >
>> > cluster [ERR] 9.f repair stat mismatch, got 1563/1563 objects, 0/0
>> > clones,
>> > 1551/1551 dirty, 78/78 omap, 0/0 pinned, 12/12 hit_set_archive, 0/-9
>> > whiteouts, 28802382/28802382 bytes, 16107/16107 hit_set_archive bytes.
>> >
>> > I tried looking it up and I hardly found anything on this.
>> >
>> > What are these whiteouts and when do they start causing inconsistent
>> > issues?
>>
>> Seems to be related to cache tiering. Is pool 9 a cache pool or does
>> it have a relation to a cache pool (or has it in the past)?
>>
>> >
>> > Thanks
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Cheers,
>> Brad
>
>



--
Cheers,
Brad


-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread shrey chauhan
Yes, it is a cache pool. Moreover, what are these whiteouts, and when does
this mismatch occur?

Thanks

On Fri, Jun 1, 2018 at 3:51 PM, Brad Hubbard  wrote:

> On Fri, Jun 1, 2018 at 6:41 PM, shrey chauhan
>  wrote:
> > Hi,
> >
> > I keep getting inconsistent placement groups and every time its the
> > whiteout.
> >
> >
> > cluster [ERR] 9.f repair stat mismatch, got 1563/1563 objects, 0/0
> clones,
> > 1551/1551 dirty, 78/78 omap, 0/0 pinned, 12/12 hit_set_archive, 0/-9
> > whiteouts, 28802382/28802382 bytes, 16107/16107 hit_set_archive bytes.
> >
> > I tried looking it up and I hardly found anything on this.
> >
> > What are these whiteouts and when do they start causing inconsistent
> issues?
>
> Seems to be related to cache tiering. Is pool 9 a cache pool or does
> it have a relation to a cache pool (or has it in the past)?
>
> >
> > Thanks
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
>
> --
> Cheers,
> Brad
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Sudden increase in "objects misplaced"

2018-06-01 Thread Jake Grimmett
Hi Greg,

Firstly, many thanks for your advice.

I'm perplexed as to why the crush map is upset; the host names look the
same, each node has a fixed IP on a single bond0 interface.

Perhaps the problems were an artefact of having "nodown" set?

As you suggested, I've unset "osd nodown" and am letting the cluster
rebalance. It looks like it's moving in the right direction, so
hopefully the problem will resolve...

osd: 454 osds: 453 up, 453 in; 287 remapped pgs

data:
pools:   3 pools, 8224 pgs
objects: 485M objects, 1402 TB
usage:   1481 TB used, 1788 TB / 3270 TB avail
pgs: 3145/5089441557 objects degraded (0.000%)
 19379209/5089441557 objects misplaced (0.381%)
 7870 active+clean
 238  active+remapped+backfilling
 66   active+recovery_wait+degraded
 49   active+remapped+backfill_wait
 1active+clean+snaptrim

  io:
client:   101 MB/s wr, 0 op/s rd, 28
recovery: 2806 MB/s, 975 objects/s

again, many thanks,

Jake

On 31/05/18 21:52, Gregory Farnum wrote:
> On Thu, May 31, 2018 at 5:07 AM Jake Grimmett  > wrote:
> 
> Dear All,
> 
> I recently upgraded our Ceph cluster from 12.2.4 to 12.2.5
> & simultaneously upgraded the OS from Scientific Linux 7.4 to 7.5
> 
> After reboot, 0.7% objects were misplaced and many pgs degraded.
> 
> the cluster had no client connections, so I speeded up recovery with:
> 
> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
> 
> cluster then rebalances at >6000 MB/s, but the number of misplaced
> objects started shooting up...
> 
> 
> Clearly something happened here. I'd probably try to understand that first.
> (Perhaps your host names changed and it swapped the CRUSH mappings?)
>  
> 
>  
> 
> 
> In case something very nasty was going on, I set osd nodown, and
> rebooted the cluster.
> 
> 
> This is probably not great. If you set nodown you're limiting the
> ability of the cluster to heal itself. Without understanding *why* it's
> trying to heal to begin with, you are in bad shape. Plus you may have
> OSD daemons dead and missing PGs that you just don't know about, because
> there's nobody around to report that they're dead. (Though you *may* be
> okay since the manager should notice if PG states aren't being reported
> and mark them stale.)
>  
> 
> 
> 21st May, Post reboot health status;
> 
>    pgs:     10003755/5184299696 objects degraded (0.193%)
>             282514666/5184299696 objects misplaced (5.449%)
>  recovery: 1901 MB/s, 657 objects/s
> 
> The cluster continued to mend, slowly this time (default
> osd-max-backfills)
> 
> 28th May
> nodown flag(s) set;
> 24820486/5352446983 objects misplaced (0.464%)
> Degraded data redundancy: 816609/5352446983 objects degraded (0.015%),
> 179 pgs degraded, 6 pgs undersized
> 
> 30th May
> nodown flag(s) set;
> 3571105/5667392354 objects misplaced (0.063%);
> Degraded data redundancy: 40/5667392354 objects degraded (0.000%),
> 1 pg degraded
> 
> All good, so I thought, but this morning (31st May):
> 
> nodown flag(s) set;
> 41264874/5190843723 objects misplaced (0.795%)
> Degraded data redundancy: 11795/5190843723 objects degraded (0.000%),
> 226 pgs degraded
> 
> Of course I'm perplexed as to what might have caused this...
> 
> Looking at /var/log/ceph.log-20180531.gz
> 
> there is a sudden jump in objects misplaced at 22:55:28
> 
> 
> 2018-05-30 22:55:18.154529 mon.ceph2 mon.0 10.1.0.80:6789/0 71418 :
> cluster [WRN] Health check update: 2666818/5085379079 objects misplaced
> (0.052%) (OBJECT_MISPLACED)
> 2018-05-30 22:55:20.096386 mon.ceph2 mon.0 10.1.0.80:6789/0 72319 :
> cluster [WRN] Health check failed: Reduced data availability: 34 pgs
> peering (PG_AVAILABILITY)
> 2018-05-30 22:55:22.197206 mon.ceph2 mon.0 10.1.0.80:6789/0 72333 :
> cluster [WRN] Health check failed: Degraded data redundancy:
> 1123/5079163159 objects degraded (0.000%), 21 pgs degraded (PG_DEGRADED)
> 2018-05-30 22:55:23.155873 mon.ceph2 mon.0 10.1.0.80:6789/0 72335 :
> cluster [WRN] Health check update: 2666363/5079163159 objects misplaced
> (0.052%) (OBJECT_MISPLACED)
> 2018-05-30 22:55:25.450185 mon.ceph2 mon.0 10.1.0.80:6789/0 72336 :
> cluster [WRN] Health check update: Reduced data availability: 2 pgs
> inactive, 38 pgs peering (PG_AVAILABILITY)
> 2018-05-30 22:55:27.521142 mon.ceph2 mon.0 10.1.0.80:6789/0 72337 :
> cluster [WRN] Health check update: Degraded data redundancy:
> 13808/5085377819 objects degraded (0.000%), 270 pgs degraded (PG_DEGRADED)
> 2018-

Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Wladimir Mutel
	Well, OK, I moved the second address into a different subnet 
(192.168.201.231/24) and also reflected that in the 'hosts' file.


But that did not help much :

/iscsi-target...test/gateways> create p10s2 192.168.201.231 skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
Failed : Gateway creation failed, gateway(s) 
unavailable:192.168.201.231(UNKNOWN state)


/disks> create pool=replicated image=win2016-3gb size=2861589M
Failed : at least 2 gateways must exist before disk operations are permitted

I see this mentioned in Ceph-iSCSI-CLI GitHub issues
https://github.com/ceph/ceph-iscsi-cli/issues/54 and
https://github.com/ceph/ceph-iscsi-cli/issues/59
but apparently without a solution

So, would anybody propose an idea
on how I can start using iSCSI over Ceph on the cheap,
with the single P10S host I have in my hands right now?

An additional host and 10GbE hardware would require additional
funding, which would be possible only at some point in the future.

Thanks in advance for your responses

Wladimir Mutel wrote:


 I have both its Ethernets connected to the same LAN,
 with different IPs in the same subnet
 (like, 192.168.200.230/24 and 192.168.200.231/24)



192.168.200.230 p10s
192.168.200.231 p10s2


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Jason Dillaman
Are your firewall ports open for rbd-target-api? Is the process
running on the other host? If you run "gwcli -d" and try to add the
second gateway, what messages do you see?
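
Something along these lines on each gateway host should narrow it down
(5000 is the usual api_port from /etc/ceph/iscsi-gateway.cfg):

  systemctl status rbd-target-gw rbd-target-api
  ss -tlnp | grep 5000           # is rbd-target-api listening?
  nc -vz 192.168.201.231 5000    # is the API port reachable from the peer?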

On Fri, Jun 1, 2018 at 8:15 AM, Wladimir Mutel  wrote:
> Well, ok, I moved second address into different subnet
> (192.168.201.231/24) and also reflected that in 'hosts' file
>
> But that did not help much :
>
> /iscsi-target...test/gateways> create p10s2 192.168.201.231 skipchecks=true
> OS version/package checks have been bypassed
> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
> Failed : Gateway creation failed, gateway(s)
> unavailable:192.168.201.231(UNKNOWN state)
>
> /disks> create pool=replicated image=win2016-3gb size=2861589M
> Failed : at least 2 gateways must exist before disk operations are permitted
>
> I see this mentioned in Ceph-iSCSI-CLI GitHub issues
> https://github.com/ceph/ceph-iscsi-cli/issues/54 and
> https://github.com/ceph/ceph-iscsi-cli/issues/59
> but apparently without a solution
>
> So, would anybody propose an idea
> on how can I start using iSCSI over Ceph acheap?
> With the single P10S host I have in my hands right now?
>
> Additional host and 10GBE hardware would require additional
> funding, which would possible only in some future.
>
> Thanks in advance for your responses
>
> Wladimir Mutel wrote:
>
>>  I have both its Ethernets connected to the same LAN,
>>  with different IPs in the same subnet
>>  (like, 192.168.200.230/24 and 192.168.200.231/24)
>
>
>> 192.168.200.230 p10s
>> 192.168.200.231 p10s2
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: v13.2.0 Mimic is out

2018-06-01 Thread ceph
FYI


De: "Abhishek"  À: "ceph-devel"
, "ceph-users" ,
ceph-maintain...@ceph.com, ceph-annou...@ceph.com Envoyé: Vendredi 1
Juin 2018 14:11:00 Objet: v13.2.0 Mimic is out
We're glad to announce the first stable release of Mimic, the next long
term release series. There have been major changes since Luminous and
please read the upgrade notes carefully.
We'd also like to highlight that we've had contributions from over 282
contributors, for Mimic, and would like to thank everyone for the
continued support. The next major release of Ceph will be called Nautilus.
For the detailed changelog, please refer to the release blog at
https://ceph.com/releases/v13-2-0-mimic-released/
Major Changes from Luminous
---------------------------

- *Dashboard*:
  * The (read-only) Ceph manager dashboard introduced in Ceph Luminous has
    been replaced with a new implementation inspired by and derived from the
    openATTIC[1] Ceph management tool, providing a drop-in replacement
    offering a number of additional management features.

- *RADOS*:
  * Config options can now be centrally stored and managed by the monitor.
  * The monitor daemon uses significantly less disk space when undergoing
    recovery or rebalancing operations.
  * An *async recovery* feature reduces the tail latency of requests when
    the OSDs are recovering from a recent failure.
  * OSD preemption of scrub by conflicting requests reduces tail latency.

- *RGW*:
  * RGW can now replicate a zone (or a subset of buckets) to an external
    cloud storage service like S3.
  * RGW now supports the S3 multi-factor authentication API on versioned
    buckets.
  * The Beast frontend is no longer experimental and is considered stable
    and ready for use.

- *CephFS*:
  * Snapshots are now stable when combined with multiple MDS daemons.

- *RBD*:
  * Image clones no longer require explicit *protect* and *unprotect* steps.
  * Images can be deep-copied (including any clone linkage to a parent image
    and associated snapshots) to new pools or with altered data layouts.
Upgrading from Luminous
-----------------------

Notes
~~~~~

* We recommend you avoid creating any RADOS pools while the upgrade is in
  process.
* You can monitor the progress of your upgrade at each stage with the
  `ceph versions` command, which will tell you what ceph version(s) are
  running for each type of daemon.

Instructions
~~~~~~~~~~~~

#. Make sure your cluster is stable and healthy (no down or recovering
   OSDs). (Optional, but recommended.)

#. Set the `noout` flag for the duration of the upgrade. (Optional, but
   recommended.)::

     # ceph osd set noout

#. Upgrade monitors by installing the new packages and restarting the
   monitor daemons.::

     # systemctl restart ceph-mon.target

   Verify the monitor upgrade is complete once all monitors are up by
   looking for the `mimic` feature string in the mon map. For example::

     # ceph mon feature ls

   should include `mimic` under persistent features::

     on current monmap (epoch NNN)
        persistent: [kraken,luminous,mimic]
        required: [kraken,luminous,mimic]

#. Upgrade `ceph-mgr` daemons by installing the new packages and
   restarting with::

     # systemctl restart ceph-mgr.target

   Verify the ceph-mgr daemons are running by checking `ceph -s`::

     # ceph -s
     ...
       services:
         mon: 3 daemons, quorum foo,bar,baz
         mgr: foo(active), standbys: bar, baz
     ...

#. Upgrade all OSDs by installing the new packages and restarting the
   ceph-osd daemons on all hosts::

     # systemctl restart ceph-osd.target

   You can monitor the progress of the OSD upgrades with the new `ceph
   versions` or `ceph osd versions` command::

     # ceph osd versions
     {
        "ceph version 12.2.5 (...) luminous (stable)": 12,
        "ceph version 13.2.0 (...) mimic (stable)": 22,
     }

#. Upgrade all CephFS MDS daemons. For each CephFS file system,

   #. Reduce the number of ranks to 1. (Make note of the original number of
      MDS daemons first if you plan to restore it later.)::

        # ceph status
        # ceph fs set <fs_name> max_mds 1

   #. Wait for the cluster to deactivate any non-zero ranks by periodically
      checking the status::

        # ceph status

   #. Take all standby MDS daemons offline on the appropriate hosts with::

        # systemctl stop ceph-mds@<daemon_name>

   #. Confirm that only one MDS is online and is rank 0 for your FS::

        # ceph status

   #. Upgrade the last remaining MDS daemon by installing the new packages
      and restarting the daemon::

        # systemctl restart ceph-mds.target

   #. Restart all standby MDS daemons that were taken offline::

        # systemctl start ceph-mds.target

   #. Restore the original value of `max_mds` for the volume::

        # ceph fs set <fs_name> max_mds <original_max_mds>

#. Upgrade all radosgw daemons by upgrading packages and restarting daemons
   on all hosts::

     # systemctl restart radosgw.target

#. Complete the upgrade by disallowing pre-mimic OSDs and enabling all new
   Mimic-only functionality::

     # ceph osd require-osd-release mimic

#. If you set `noout` at the beginning, be sure to clear it with::

     # ceph osd unset noout

#. Verify the cluster is healthy with `ceph health`.

Upgrading from pre-Luminous releases (like Jewel)
-------------------------------------------------

You *must* first upgra

[ceph-users] Migrating (slowly) from spinning rust to ssd

2018-06-01 Thread Jonathan Proulx
Hi All,

I looking at starting to move my deployed ceph cluster to SSD.

As a first step, my thought is to get a large enough set of SSD
expansion that I can set the CRUSH map to ensure one copy of every
(important) PG is on SSD, and use primary affinity to ensure that copy
is primary.

I know this won't help with writes, but most of my pain is reads, since
workloads are generally not cache friendly, and write workloads, while
larger, are fairly asynchronous, so WAL and DB on SSD along with some
write-back caching on the libvirt side (most of my load is VMs) makes
writes *seem* fast enough for now.

I have a few questions before writing a check that size.

Is this completely insane?

Are there any hidden surprises I may not have considered?

Will I really need to mess with the CRUSH map to get this to happen?  I
expect so, but if primary affinity settings along with the current "rack"
level leaves are good enough to be sure each of the 3 replicas is in a
different rack and at least one of those is on an SSD OSD, I'd rather
not touch CRUSH (bonus points if anyone has a worked example).
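
For concreteness, the kind of hybrid rule I have in mind (an untested
sketch, assuming device classes are set and SSDs exist in every rack)
would be roughly:

  rule hybrid_ssd_primary {
      id 10
      type replicated
      min_size 1
      max_size 10
      step take default class ssd
      step chooseleaf firstn 1 type rack
      step emit
      step take default class hdd
      step chooseleaf firstn -1 type rack
      step emit
  }

With a rule like that the SSD copy is chosen first, so it becomes primary
by default and primary affinity may not even need tweaking; note the two
passes don't coordinate, so a rack could end up holding both the SSD copy
and an HDD copy.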

Thanks,
-Jon

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD recommendation

2018-06-01 Thread Simon Ironside
Thanks for the input, both. I've gone ahead with the SM863As. I've no 
input on the Microns I'm afraid. The specs look good to me, I just can't 
get them easily.


Sean, I didn't know you'd lost 10 in all. I do have 4x 480GB S4600s I've 
been using as Filestore journals in production for a couple of months now 
(purchased before I saw the S4600 thread) without issue. IIRC you were 
using the same 2TB S4600s as OSDs, like the OP of the S4600 thread - I'm 
keeping my fingers crossed that if I was going to have the problem you 
experienced, I would've had it by now . . .


Thanks again,
Simon

On 31/05/18 19:12, Sean Redmond wrote:

I know the s4600 thread well as I had over 10 of those drives fail 
before I took them all out of production.


Intel did say a firmware fix was on the way but I could not wait and 
opted for SM863A and never looked back...


I will be sticking with SM863A for now on futher orders.

On Thu, 31 May 2018, 15:33 Fulvio Galeazzi, > wrote:


      I am also about to buy some new hardware and for SATA ~400GB I
was
considering Micron 5200 MAX, rated at 5 DWPD, for journaling/FSmetadata.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why the change from ceph-disk to ceph-volume and lvm? (and just not stick with direct disk access)

2018-06-01 Thread Marc Roos
 
Yes, it is indeed difficult to find a good balance between asking 
multiple things in one email and risking that not all are answered, or 
putting them as individual questions. 


-Original Message-
From: David Turner [mailto:drakonst...@gmail.com] 
Sent: donderdag 31 mei 2018 23:50
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Why the change from ceph-disk to ceph-volume 
and lvm? (and just not stick with direct disk access)

You are also making this entire conversation INCREDIBLY difficult to 
follow by creating so many new email threads instead of sticking with 
one.

On Thu, May 31, 2018 at 5:48 PM David Turner  
wrote:


Your question assumes that ceph-disk was a good piece of software.  
It had a bug list a mile long and nobody working on it.  A common 
example was how simple it was to mess up any part of the dozens of 
components that allowed an OSD to autostart on boot.  One of the biggest 
problems was when ceph-disk was doing it's thing and an OSD would take 
longer than 3 minutes to start and ceph-disk would give up on it.

That is a little bit about why a new solution was sought after and 
why ceph-disk is being removed entirely.  LVM was a choice made to 
implement something other than partitions and udev magic while still 
incorporating the information still needed from all of that in a better 
solution.  There has been a lot of talk about this on the ML.

On Thu, May 31, 2018 at 5:23 PM Marc Roos 
 wrote:



What is the reasoning behind switching to lvm? Does it make sense to go 
through (yet) another layer to access the disk? Why creating this 
dependency and added complexity? It is fine as it is, or not?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph EC profile, how are you using?

2018-06-01 Thread Vasu Kulkarni
Thanks to those who have added their config. I request anyone on the list
using an EC profile in production to add their high-level config, which
will be helpful for tests.

Thanks

On Wed, May 30, 2018 at 12:16 PM, Vasu Kulkarni  wrote:
> Hello Ceph Users,
>
> I would like to know how folks are using EC profile in the production
> environment, what kind of EC configurations are you using (10+4, 5+3 ?
> ) with other configuration options, If you can reply to this thread or
> update in the shared excel sheet below that will help design better
> tests that are run on nightly basis.
>
> https://docs.google.com/spreadsheets/d/1B7WLM3_6nV_DMf18POI7cWLWx6_vQJABVC2-bbglNEM/edit?usp=sharing
>
> Thanks
> Vasu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Wladimir Mutel

Ok, I looked into the Python sources of ceph-iscsi-{cli,config} and
found that per-host configuration sections use the short host name
(returned by the this_host() function) as their primary key.
So I can't trick gwcli with an alternative host name like p10s2,
which I put into /etc/hosts to denote my second IP,
as this_host() calls gethostname() and the rest of the code
disregards alternative host names entirely.
I added 192.168.201.231 into trusted_ip_list,
but after 'create p10s2 192.168.201.231 skipchecks=true'
I got KeyError 'p10s2' in gwcli/gateway.py line 571

Fortunately, I found a way to edit the Ceph iSCSI configuration
as a text file (rados --pool rbd get gateway.conf gateway.conf).
I added the needed IP to the appropriate JSON lists
(."gateways"."ip_list" and ."gateways"."p10s"."gateway_ip_list"),
put the file back into RADOS and restarted rbd-target-gw
in the hope that everything would go well.

Unfortunately, I found (by running 'targetcli ls')
that it now creates 2 TPGs with a single IP portal in each of them.
Also, it disables the 1st TPG but enables the 2nd one, like this :

  o- iscsi  [Targets: 1]
  | o- iqn.2018-06.domain.p10s:p10s [TPGs: 2]
  |   o- tpg1   [disabled]
  |   | o- portals  [Portals: 1]
  |   |   o- 192.168.200.230:3260   [OK]
  |   o- tpg2   [no-gen-acls, no-auth]
  | o- portals  [Portals: 1]
  |   o- 192.168.201.231:3260   [OK]

And still, when I do '/disks create ...' in gwcli, it says
that it wants 2 existing gateways. Probably this is related
to the created 2-TPG structure, and I should look for more ways
to 'improve' that json config so that rbd-target-gw loads it
as I need on a single host.
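
For reference, the edit cycle I used was roughly:

  rados --pool rbd get gateway.conf gateway.conf
  # edit the JSON: add the second IP to "gateways"."ip_list" and
  #   "gateways"."p10s"."gateway_ip_list"
  rados --pool rbd put gateway.conf gateway.conf
  systemctl restart rbd-target-gw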


Wladimir Mutel wrote:
 Well, ok, I moved second address into different subnet 
(192.168.201.231/24) and also reflected that in 'hosts' file


 But that did not help much :

/iscsi-target...test/gateways> create p10s2 192.168.201.231 skipchecks=true
OS version/package checks have been bypassed
Adding gateway, sync'ing 0 disk(s) and 0 client(s)
Failed : Gateway creation failed, gateway(s) 
unavailable:192.168.201.231(UNKNOWN state)


/disks> create pool=replicated image=win2016-3gb size=2861589M
Failed : at least 2 gateways must exist before disk operations are 
permitted


 I see this mentioned in Ceph-iSCSI-CLI GitHub issues
https://github.com/ceph/ceph-iscsi-cli/issues/54 and
https://github.com/ceph/ceph-iscsi-cli/issues/59
 but apparently without a solution

 So, would anybody propose an idea
 on how can I start using iSCSI over Ceph acheap?
 With the single P10S host I have in my hands right now?

 Additional host and 10GBE hardware would require additional
 funding, which would possible only in some future.

 Thanks in advance for your responses

Wladimir Mutel wrote:


 I have both its Ethernets connected to the same LAN,
 with different IPs in the same subnet
 (like, 192.168.200.230/24 and 192.168.200.231/24)



192.168.200.230 p10s
192.168.200.231 p10s2


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: v13.2.0 Mimic is out

2018-06-01 Thread Alexandre DERUMIER
CephFS snapshot is now stable and enabled by default on new filesystems 


:) 






Alexandre Derumier 
Ingénieur système et stockage 

Manager Infrastructure 


Fixe : +33 3 59 82 20 10 



125 Avenue de la république 
59110 La Madeleine 
[ https://twitter.com/OdisoHosting ] [ https://twitter.com/mindbaz ] [ 
https://www.linkedin.com/company/odiso ] [ 
https://www.viadeo.com/fr/company/odiso ] [ 
https://www.facebook.com/monsiteestlent ] 

[ https://www.monsiteestlent.com/ | MonSiteEstLent.com ] - Blog dédié à la 
webperformance et la gestion de pics de trafic 






De: "ceph"  
À: "ceph-users"  
Envoyé: Vendredi 1 Juin 2018 14:48:13 
Objet: [ceph-users] Fwd: v13.2.0 Mimic is out 

FYI 


De: "Abhishek"  À: "ceph-devel" 
, "ceph-users" , 
ceph-maintain...@ceph.com, ceph-annou...@ceph.com Envoyé: Vendredi 1 
Juin 2018 14:11:00 Objet: v13.2.0 Mimic is out 
We're glad to announce the first stable release of Mimic, the next long 
term release series. There have been major changes since Luminous and 
please read the upgrade notes carefully. 
We'd also like to highlight that we've had contributions from over 282 
contributors, for Mimic, and would like to thank everyone for the 
continued support. The next major release of Ceph will be called Nautilus. 
For the detailed changelog, please refer to the release blog at 
https://ceph.com/releases/v13-2-0-mimic-released/ 
Major Changes from Luminous --- 
- *Dashboard*: 
* The (read-only) Ceph manager dashboard introduced in Ceph Luminous has 
been replaced with a new implementation inspired by and derived from the 
openATTIC[1] Ceph management tool, providing a drop-in replacement 
offering a number of additional management features 
- *RADOS*: 
* Config options can now be centrally stored and managed by the monitor. 
* The monitor daemon uses significantly less disk space when undergoing 
recovery or rebalancing operations. * An *async recovery* feature 
reduces the tail latency of requests when the OSDs are recovering from a 
recent failure. * OSD preemption of scrub by conflicting requests 
reduces tail latency. 
- *RGW*: 
* RGW can now replicate a zone (or a subset of buckets) to an external 
cloud storage service like S3. * RGW now supports the S3 multi-factor 
authentication api on versioned buckets. * The Beast frontend is no long 
expermiental and is considered stable and ready for use. 
- *CephFS*: 
* Snapshots are now stable when combined with multiple MDS daemons. 
- *RBD*: 
* Image clones no longer require explicit *protect* and *unprotect* 
steps. * Images can be deep-copied (including any clone linkage to a 
parent image and associated snapshots) to new pools or with altered data 
layouts. 
Upgrading from Luminous --- 
Notes ~ 
* We recommend you avoid creating any RADOS pools while the upgrade is 
in process. 
* You can monitor the progress of your upgrade at each stage with the 
`ceph versions` command, which will tell you what ceph version(s) are 
running for each type of daemon. 
Instructions  
#. Make sure your cluster is stable and healthy (no down or recoverying 
OSDs). (Optional, but recommended.) 
#. Set the `noout` flag for the duration of the upgrade. (Optional, but 
recommended.):: 
# ceph osd set noout 
#. Upgrade monitors by installing the new packages and restarting the 
monitor daemons.:: 
# systemctl restart ceph-mon.target 
Verify the monitor upgrade is complete once all monitors are up by 
looking for the `mimic` feature string in the mon map. For example:: 
# ceph mon feature ls 
should include `mimic` under persistent features:: 
on current monmap (epoch NNN) persistent: [kraken,luminous,mimic] 
required: [kraken,luminous,mimic] 
#. Upgrade `ceph-mgr` daemons by installing the new packages and 
restarting with:: 
# systemctl restart ceph-mgr.target 
Verify the ceph-mgr daemons are running by checking `ceph -s`:: 
# ceph -s 
... services: mon: 3 daemons, quorum foo,bar,baz mgr: foo(active), 
standbys: bar, baz ... 
#. Upgrade all OSDs by installing the new packages and restarting the 
ceph-osd daemons on all hosts:: 
# systemctl restart ceph-osd.target 
You can monitor the progress of the OSD upgrades with the new `ceph 
versions` or `ceph osd versions` command:: 
# ceph osd versions { "ceph version 12.2.5 (...) luminous (stable)": 12, 
"ceph version 13.2.0 (...) mimic (stable)": 22, } 
#. Upgrade all CephFS MDS daemons. For each CephFS file system, 
#. Reduce the number of ranks to 1. (Make note of the original number of 
MDS daemons first if you plan to restore it later.):: 
# ceph status # ceph fs set  max_mds 1 
#. Wait for the cluster to deactivate any non-zero ranks by periodically 
checking the status:: 
# ceph status 
#. Take all standby MDS daemons offline on the appropriate hosts with:: 
# systemctl stop ceph-mds@ 
#. Confirm that only one MDS is online and is rank 0 for your FS:: 
# ceph status 
#. Upgrade the last remaining MDS daemon by installing the

[ceph-users] Problems while sending email to Ceph mailings

2018-06-01 Thread Leonardo Vaz
Hi,

Some of our community members reported problems while sending email to
mailings hosted by Ceph project.

We reported the problem to the hosting company; it's happening because of
some changes made to DNS records some time ago, and they're working to
fix it.

The issue affects all mailing lists (except ceph-devel, which is hosted
by vger.kernel.org); a workaround is to update the mail address
from @ceph.com to @lists.ceph.com.

The email addresses are also documented on our website:

   https://ceph.com/irc/#mailing-lists

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Developer Monthly - June 2018

2018-06-01 Thread Leonardo Vaz
Hey Cephers,

This is just a friendly reminder that the next Ceph Developer Monthly
meeting is coming up:

 http://wiki.ceph.com/Planning

If you have work that you're doing that is feature work, significant
backports, or anything you would like to discuss with the core team,
please add it to the following page:

 http://wiki.ceph.com/CDM_06-JUN-2018

This edition happens on NA/EMEA friendly hours (12:30 EST) and we
will use the following Bluejeans URL for the video conference:

 https://redhat.bluejeans.com/376400604

The meeting details are also available on Ceph Community Calendar:

 
https://calendar.google.com/calendar/b/1?cid=OXRzOWM3bHQ3dTF2aWMyaWp2dnFxbGZwbzBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

If you have questions or comments, please let us know.

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Jason Dillaman
The (1) TPG per gateway is expected behavior since that is how ALUA
active/passive is configured.

On Fri, Jun 1, 2018 at 1:20 PM, Wladimir Mutel  wrote:
> Ok, I looked into Python sources of ceph-iscsi-{cli,config} and
> found that per-host configuration sections use short host name
> (returned by this_host() function) as their primary key.
> So I can't trick gwcli with alternative host name like p10s2
> which I put into /etc/hosts to denote my second IP,
> as this_host() calls gethostname() and further code
> disregards alternative host names at all.
> I added 192.168.201.231 into trusted_ip_list,
> but after 'create p10s2 192.168.201.231 skipchecks=true'
> I got KeyError 'p10s2' in gwcli/gateway.py line 571
>
> Fortunately, I found a way to edit Ceph iSCSI configuration
> as a text file (rados --pool rbd get gateway.conf gateway.conf)
> I added needed IP to the appropriate json lists
> (."gateways"."ip_list" and."gateways"."p10s"."gateway_ip_list"),
> put the file back into RADOS and restarted rbd-target-gw
> in the hope everything will go well
>
> Unfortunately, I found (by running 'targetcli ls')
> that now it creates 2 TPGs with single IP portal in each of them
> Also, it disables 1st TPG but enables 2nd one, like this :
>
>   o- iscsi  [Targets: 1]
>   | o- iqn.2018-06.domain.p10s:p10s [TPGs: 2]
>   |   o- tpg1   [disabled]
>   |   | o- portals  [Portals: 1]
>   |   |   o- 192.168.200.230:3260   [OK]
>   |   o- tpg2   [no-gen-acls, no-auth]
>   | o- portals  [Portals: 1]
>   |   o- 192.168.201.231:3260   [OK]
>
> And still, when I do '/disks create ...' in gwcli, it says
> that it wants 2 existing gateways. Probably this is related
> to the created 2-TPG structure and I should look for more ways
> to 'improve' that json config so that rbd-target-gw loads it
> as I need on single host.
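
A rough sketch of the get/edit/put mechanics described above (jq is just one
way of editing the JSON; the key names are the ones quoted above, and as
noted this produced a 2-TPG config rather than a working second gateway, so
it only illustrates the round-trip, not a fix):

   rados --pool rbd get gateway.conf gateway.conf
   # append the second IP to the lists mentioned above
   jq '.gateways.ip_list += ["192.168.201.231"]
       | .gateways.p10s.gateway_ip_list += ["192.168.201.231"]' \
      gateway.conf > gateway.conf.new
   rados --pool rbd put gateway.conf gateway.conf.new
   systemctl restart rbd-target-gw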
>
>
>
> Wladimir Mutel wrote:
>>
>>  Well, ok, I moved second address into different subnet
>> (192.168.201.231/24) and also reflected that in 'hosts' file
>>
>>  But that did not help much :
>>
>> /iscsi-target...test/gateways> create p10s2 192.168.201.231
>> skipchecks=true
>> OS version/package checks have been bypassed
>> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
>> Failed : Gateway creation failed, gateway(s)
>> unavailable:192.168.201.231(UNKNOWN state)
>>
>> /disks> create pool=replicated image=win2016-3gb size=2861589M
>> Failed : at least 2 gateways must exist before disk operations are
>> permitted
>>
>>  I see this mentioned in Ceph-iSCSI-CLI GitHub issues
>> https://github.com/ceph/ceph-iscsi-cli/issues/54 and
>> https://github.com/ceph/ceph-iscsi-cli/issues/59
>>  but apparently without a solution
>>
>>  So, would anybody propose an idea
>>  on how I can start using iSCSI over Ceph on the cheap,
>>  with the single P10S host I have in my hands right now?
>>
>>  An additional host and 10GbE hardware would require additional
>>  funding, which would be possible only at some point in the future.
>>
>>  Thanks in advance for your responses
>>
>> Wladimir Mutel wrote:
>>
>>>  I have both its Ethernets connected to the same LAN,
>>>  with different IPs in the same subnet
>>>  (like, 192.168.200.230/24 and 192.168.200.231/24)
>>
>>
>>> 192.168.200.230 p10s
>>> 192.168.200.231 p10s2
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating (slowly) from spinning rust to ssd

2018-06-01 Thread Paul Emmerich
You can't have a server with both SSDs and HDDs in this setup because you
can't write a crush rule that is able to pick n distinct servers when also
specifying different device classes.
A crush rule for this looks like this:

step take default class=ssd
step choose firstn 1 type host
emit
step take default class=hdd
step choose firstn -1 type host
emit

(No primary affinity needed)
It can pick the same server twice because it needs to start over to change
the device class.
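
For anyone who wants to try it: the usual way to install a rule like that is
to decompile the CRUSH map, add the rule, recompile and inject it. A rough
sketch, where the rule name "ssd-primary" and the pool name "rbd" are only
examples:

   ceph osd getcrushmap -o crush.bin
   crushtool -d crush.bin -o crush.txt
   # edit crush.txt and add a rule block containing the steps above
   crushtool -c crush.txt -o crush.new
   ceph osd setcrushmap -i crush.new
   ceph osd pool set rbd crush_rule ssd-primary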

But yes, it does work very well for read-heavy workloads. But such a setup
suffers more than usual from server outages: one SSD host fails, and lots
of reads go to the HDD hosts, overloading them...


Paul

2018-06-01 16:34 GMT+02:00 Jonathan Proulx :

> Hi All,
>
> I'm looking at starting to move my deployed ceph cluster to SSD.
>
> As a first step my thought is to get a large enough set of SSD
> expansion that I can set the crush map to ensure 1 copy of every
> (important) PG is on SSD and use primary affinity to ensure that copy
> is primary.
>
> I know this won't help with writes, but most of my pain is reads, since
> workloads are generally not cache friendly, and write workloads, while
> larger, are fairly asynchronous, so WAL and DB on SSD along with some
> write-back caching on the libvirt side (most of my load is VMs) makes
> writes *seem* fast enough for now.
>
> I have a few question before writing a check that size.
>
> Is this completely insane?
>
> Are there any hidden surprises I may not have considered?
>
> Will I really need to mess with the crush map to get this to happen?  I
> expect so, but if primary affinity settings along with the current "rack"
> level leaves are good enough to be sure each of 3 replicas is in a
> different rack and at least one of those is on an SSD OSD, I'd rather
> not touch crush (bonus points if anyone has a worked example).
>
> Thanks,
> -Jon
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI to a Ceph node with 2 network adapters - how to ?

2018-06-01 Thread Mike Christie
On 06/01/2018 02:01 AM, Wladimir Mutel wrote:
> Dear all,
> 
> I am experimenting with Ceph setup. I set up a single node
> (Asus P10S-M WS, Xeon E3-1235 v5, 64 GB RAM, 8x3TB SATA HDDs,
> Ubuntu 18.04 Bionic, Ceph packages from
> http://download.ceph.com/debian-luminous/dists/xenial/
> and iscsi parts built manually per
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/)
> Also i changed 'chooseleaf ... host' into 'chooseleaf ... osd'
> in the CRUSH map to run with single host.
> 
> I have both its Ethernets connected to the same LAN,
> with different IPs in the same subnet
> (like, 192.168.200.230/24 and 192.168.200.231/24)
> mon_host in ceph.conf is set to 192.168.200.230,
> and ceph daemons (mgr, mon, osd) are listening to this IP.
> 
> What I would like to finally achieve, is to provide multipath
> iSCSI access through both these Ethernets to Ceph RBDs,
> and apparently, gwcli does not allow me to add a second
> gateway to the same target. It is going like this :
> 
> /iscsi-target> create iqn.2018-06.host.test:test
> ok
> /iscsi-target> cd iqn.2018-06.host.test:test/gateways
> /iscsi-target...test/gateways> create p10s 192.168.200.230 skipchecks=true
> OS version/package checks have been bypassed
> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
> ok
> /iscsi-target...test/gateways> create p10s2 192.168.200.231 skipchecks=true
> OS version/package checks have been bypassed
> Adding gateway, sync'ing 0 disk(s) and 0 client(s)
> Failed : Gateway creation failed, gateway(s)
> unavailable:192.168.200.231(UNKNOWN state)
> 
> host names are defined in /etc/hosts as follows :
> 
> 192.168.200.230 p10s
> 192.168.200.231 p10s2
> 
> so I suppose that something does not listen on 192.168.200.231, but
> I don't have an idea what is that thing and how to make it listen there.
> Or how to achieve this goal (utilization of both Ethernets for iSCSI) in
> different way. Shoud I aggregate Ethernets into a 'bond' interface with

There are multiple issues here:

1. LIO does not really support multiple IPs on the same subnet on the
same system out of the box. The network routing will kick in and
sometimes if the initiator sent something to .230, the target would
respond from .231 and I think for operations like logins it will not go
as planned in the iscsi target layer as the code that manages
connections gets thrown off. On the initiator side it works when
using ifaces because we use SO_BINDTODEVICE to tell the net layer to use
the specific netdev, but there is no code like that in the target. So on
the target, I think it just depends on the routing table setup and you
have to modify that. I think there might be a bug though.
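
If someone really does want two addresses in one subnet on the target, the
routing-table change above usually means source-based policy routing; an
untested sketch, where the interface name and table number are made up and,
given the possible bug, this may still not be enough:

   # route replies sourced from the second address out of the second NIC
   ip route add 192.168.200.0/24 dev enp3s0 src 192.168.200.231 table 101
   ip rule add from 192.168.200.231/32 table 101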

In general I think a different subnet is easiest and best for most cases.

2. Ceph-iscsi does not support multiple IPs on the same gw right now,
because you can hit the issue where a WRITE is sent down path1, that
gets stuck, then the initiator fails over to path2 and sends the STPG
there. That will go down a different path and so the WRITE in path 1 is
not flushed like we need. Because both paths are accessing the same rbd
client, the rbd locking/blacklisting would not kick in like it does when
this is done on different gws.

So for both you would/could just use network level bonding.
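
For example, a quick iproute2 sketch of an active-backup bond over the two
ports (the interface names are placeholders, and a netplan or ifupdown
config would be the persistent way to set this up):

   ip link add bond0 type bond mode active-backup
   ip link set eno1 down; ip link set eno1 master bond0
   ip link set eno2 down; ip link set eno2 master bond0
   ip link set bond0 up
   ip addr add 192.168.200.230/24 dev bond0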

> single IP ? Should I build and use 'lrbd' tool instead of 'gwcli' ? Is

Or you can use lrbd but for that make sure you are using the SUSE kernel
as they have the special timeout code.

> it acceptable that I run kernel 4.15, not 4.16+ ?
> What other directions could you give me on this task ?
> Thanks in advance for your replies.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Data recovery after loosing all monitors

2018-06-01 Thread Bryan Henderson
>Luckily; it's not. I don't remember if the MDS maps contain entirely
>ephemeral data, but on the scale of cephfs recovery scenarios that's just
>about the easiest one. Somebody would have to walk through it; you probably
>need to look up the table states and mds counts from the RADOS store and
>generate a new (epoch 1 or 2) mdsmap which contains those settings ready to
>go. Or maybe you just need to "create" a new cephfs on the prior pools and
>set it up with the correct number of MDSes.
>
>At the moment the mostly-documented recovery procedure probably involves
>recovering the journals, flushing everything out, and resetting the server
>state to a single MDS, and if you lose all your monitors there's a good
>chance you need to be going through recovery anyway, so...*shrug*

The idea of just creating a new filesystem from old metadata and data pools
intrigued me, so I looked into it further, including reading some code.

It appears that there's nothing in the MDS map that can't be regenerated, and
while it's probably easy for a Ceph developer to do that, there aren't tools
available that can.

'fs new' comes close, but according to

  http://docs.ceph.com/docs/master/cephfs/disaster-recovery/

it causes a new empty root directory to be created, so you lose access to all
your files (and leak all the storage space they occupy).

The same document mentions 'fs reset', which also comes close and keeps the
existing root directory, but it requires, perhaps gratuitously, that a
filesystem already exist in the MDS map, albeit maybe corrupted, before it
regenerates it.

I'm tempted to modify Ceph to try to add a 'fs recreate' that does what 'fs
reset' does, but without expecting anything to be there already.  Maybe that's
all it takes along with 'ceph-objectstore-tool --op update-mon-db' to recover
from a lost cluster map.

-- 
Bryan Henderson   San Jose, California
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Data recovery after loosing all monitors

2018-06-01 Thread Yan, Zheng
Bryan Henderson  wrote on Saturday, June 2, 2018 at 10:23:

> >Luckily; it's not. I don't remember if the MDS maps contain entirely
> >ephemeral data, but on the scale of cephfs recovery scenarios that's just
> >about the easiest one. Somebody would have to walk through it; you
> probably
> >need to look up the table states and mds counts from the RADOS store and
> >generate a new (epoch 1 or 2) mdsmap which contains those settings ready
> to
> >go. Or maybe you just need to "create" a new cephfs on the prior pools and
> >set it up with the correct number of MDSes.
> >
> >At the moment the mostly-documented recovery procedure probably involves
> >recovering the journals, flushing everything out, and resetting the server
> >state to a single MDS, and if you lose all your monitors there's a good
> >chance you need to be going through recovery anyway, so...*shrug*
>
> The idea of just creating a new filesystem from old metadata and data pools
> intrigued me, so I looked into it further, including reading some code.
>
> It appears that there's nothing in the MDS map that can't be regenerated,
> and
> while it's probably easy for a Ceph developer to do that, there aren't
> tools
> available that can.
>
> 'fs new' comes close, but according to
>
>   http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>
> it causes a new empty root directory to be created, so you lose access to
> all
> your files (and leak all the storage space they occupy)
>

Kill all MDS daemons first, create a new fs with the old pools, then run
'fs reset' before starting any MDS.
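
Spelled out, that sequence would look roughly like the sketch below; the
pool and filesystem names are made up, the --force is assumed to be needed
because 'fs new' normally refuses pools that already contain objects, and
this is the kind of thing to rehearse on a test cluster and check against
the disaster-recovery docs before touching real data:

   systemctl stop ceph-mds.target                          # on every MDS host
   ceph fs new cephfs cephfs_metadata cephfs_data --force  # reuse the old pools
   ceph fs reset cephfs --yes-i-really-mean-it             # keep the existing root
   systemctl start ceph-mds.target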



> The same document mentions 'fs reset', which also comes close and keeps the
> existing root directory, but it requires, perhaps gratuitously, that a
> filesystem already exist in the MDS map, albeit maybe corrupted, before it
> regenerates it.
>
> I'm tempted to modify Ceph to try to add a 'fs recreate' that does what 'fs
> reset' does, but without expecting anything to be there already.  Maybe
> that's
> all it takes along with 'ceph-objectstore-tool --op update-mon-db' to
> recover
> from a lost cluster map.
>
> --
> Bryan Henderson   San Jose, California
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com