[ceph-users] Re: Out of Memory after Upgrading to Nautilus

2021-05-06 Thread Christoph Adomeit
It looks like I have solved the issue.

I tried:
ceph.conf
[osd]
 osd_memory_target = 1073741824

systemctl restart ceph-osd.target

When I run
ceph config get osd.40 osd_memory_target it returns:
4294967296

so this did not work.

Next I tried:
ceph tell osd.* injectargs '--osd_memory_target 1073741824'

and ceph returns:
ceph config get osd.40 osd_memory_target
4294967296

So this also did not work in 14.2.20.

Next I tried:

ceph config set osd/class:hdd osd_memory_target 1073741824

and that finally worked.

I also slowly increased the memory and so far I use:

ceph config set osd/class:hdd osd_memory_target 2147483648 

for now.
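For reference, a quick way to double-check which value actually took effect
(a sketch; osd.40 is just an example OSD on an hdd device):

ceph config dump | grep osd_memory_target         # shows the osd/class:hdd entry in the mon config db
ceph config show osd.40 | grep osd_memory_target  # shows the value the running daemon is actually using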

Thanks
  Christoph
On Wed, May 05, 2021 at 04:30:17PM +0200, Christoph Adomeit wrote:
> I manage a historical cluster of several Ceph nodes, each with 128 GB RAM and 
> 36 OSDs of 8 TB each.
> 
> The cluster is just for archive purposes and performance is not so important.
> 
> The cluster was running fine for long time using ceph luminous.
> 
> Last week I updated it to Debian 10 and Ceph Nautilus.
> 
> Now I can see that the memory usage of each osd grows slowly to 4 GB each and 
> once the system has
> no memory left it will oom-kill processes
> 
> I have already configured osd_memory_target = 1073741824 .
> This helps for some hours but then memory usage will grow from 1 GB to 4 GB 
> per OSD.
> 
> Any ideas what I can do to further limit osd memory usage ?
> 
> It would be good to keep the hardware running some more time without 
> upgrading RAM on all
> OSD machines.
> 
> Any Ideas ?
> 
> Thanks
>   Christoph
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
Hi, I am trying to make a new crush rule (Nautilus) in order to set the
correct failure domain to host:

   "rule_id": 2,
"rule_name": "nxtcloudAFhost",
"ruleset": 2,
"type": 3,
"min_size": 3,
"max_size": 7,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "choose_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"

And I have changed the pool to this new crush rule:

# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost

But suddenly the CephFS is unavailable:

# ceph status
  cluster:
id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
health: HEALTH_WARN
11 clients failing to respond to capability release
2 MDSs report slow metadata IOs
1 MDSs report slow requests


And clients failing to respond:

HEALTH_WARN 11 clients failing to respond to capability release; 2 MDSs
report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability release
mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to respond to


I reversed the change, returning to the original crush rule, and all
is OK. My question is whether it's possible to change the crush rule of
an EC pool on the fly.


Thanks
El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:
> Thanks, I will test it.
> 
> El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:
>> Create a new crush rule with the correct failure domain, test it
>> properly and assign it to the pool(s).
>>
> 

-- 
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Eugen Block
Yes it is possible, but you should validate it with crushtool before  
injecting it to make sure the PGs land where they belong.


crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-bad-mappings

If you don't get bad mappings and the 'show-mappings' confirms the PG  
distribution by host you can inject it. But be aware of a lot of data  
movement, that could explain the (temporarily) unavailable PGs. But to  
make your cluster resilient against host failure you'll have to go  
through that at some point.



https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/
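For reference, the end-to-end workflow could look roughly like this (a sketch;
the file names are placeholders):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt, e.g. change the rule's failure domain to host
crushtool -c crushmap.txt -o crushmap-new.bin
crushtool -i crushmap-new.bin --test --rule 2 --num-rep 7 --show-bad-mappings
ceph osd setcrushmap -i crushmap-new.bin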


Zitat von Andres Rojas Guerrero :


Hi, I try to make a new crush rule (Nautilus) in order take the new
correct_failure_domain to hosts:

   "rule_id": 2,
"rule_name": "nxtcloudAFhost",
"ruleset": 2,
"type": 3,
"min_size": 3,
"max_size": 7,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "choose_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"

And I have changed the pool to this new crush rule:

# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost

But suddenly the cephfs it's unavailable:

# ceph status
  cluster:
id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
health: HEALTH_WARN
11 clients failing to respond to capability release
2 MDSs report slow metadata IOs
1 MDSs report slow requests


And clients failing to respond:

HEALTH_WARN 11 clients failing to respond to capability release; 2 MDSs
report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability release
mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to respond to


I reversed the change, returning to the original crush rule, and all
it's Ok. My question if it's possible to change on fly the crush rule of
a EC pool.


Thanks
El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:

Thanks, I will test it.

El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:

Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).





--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
Ok, thank you very much for the answer.

El 6/5/21 a las 13:47, Eugen Block escribió:
> Yes it is possible, but you should validate it with crushtool before
> injecting it to make sure the PGs land where they belong.
> 
> crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
> crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-bad-mappings
> 
> If you don't get bad mappings and the 'show-mappings' confirms the PG
> distribution by host you can inject it. But be aware of a lot of data
> movement, that could explain the (temporarily) unavailable PGs. But to
> make your cluster resilient against host failure you'll have to go
> through that at some point.
> 
> 
> https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/
> 
> 
> Zitat von Andres Rojas Guerrero :
> 
>> Hi, I try to make a new crush rule (Nautilus) in order take the new
>> correct_failure_domain to hosts:
>>
>>    "rule_id": 2,
>>     "rule_name": "nxtcloudAFhost",
>>     "ruleset": 2,
>>     "type": 3,
>>     "min_size": 3,
>>     "max_size": 7,
>>     "steps": [
>>     {
>>     "op": "set_chooseleaf_tries",
>>     "num": 5
>>     },
>>     {
>>     "op": "set_choose_tries",
>>     "num": 100
>>     },
>>     {
>>     "op": "take",
>>     "item": -1,
>>     "item_name": "default"
>>     },
>>     {
>>     "op": "choose_indep",
>>     "num": 0,
>>     "type": "host"
>>     },
>>     {
>>     "op": "emit"
>>
>> And I have changed the pool to this new crush rule:
>>
>> # ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost
>>
>> But suddenly the cephfs it's unavailable:
>>
>> # ceph status
>>   cluster:
>>     id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
>>     health: HEALTH_WARN
>>     11 clients failing to respond to capability release
>>     2 MDSs report slow metadata IOs
>>     1 MDSs report slow requests
>>
>>
>> And clients failing to respond:
>>
>> HEALTH_WARN 11 clients failing to respond to capability release; 2 MDSs
>> report slow metadata IOs; 1 MDSs report slow requests
>> MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability
>> release
>>     mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
>> capability release client_id: 1524269
>>     mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to respond to
>>
>>
>> I reversed the change, returning to the original crush rule, and all
>> it's Ok. My question if it's possible to change on fly the crush rule of
>> a EC pool.
>>
>>
>> Thanks
>> El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:
>>> Thanks, I will test it.
>>>
>>> El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:
 Create a new crush rule with the correct failure domain, test it
 properly and assign it to the pool(s).

>>>
>>
>> -- 
>> ***
>> Andrés Rojas Guerrero
>> Unidad Sistemas Linux
>> Area Arquitectura Tecnológica
>> Secretaría General Adjunta de Informática
>> Consejo Superior de Investigaciones Científicas (CSIC)
>> Pinar 19
>> 28006 - Madrid
>> Tel: +34 915680059 -- Ext. 990059
>> email: a.ro...@csic.es
>> ID comunicate.csic.es: @50852720l:matrix.csic.es
>> ***
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
I get this error when trying to show the mappings with crushtool:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f7f7a0ccb40 thread_name:crushtool




El 6/5/21 a las 13:47, Eugen Block escribió:
> Yes it is possible, but you should validate it with crushtool before
> injecting it to make sure the PGs land where they belong.
> 
> crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
> crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-bad-mappings
> 
> If you don't get bad mappings and the 'show-mappings' confirms the PG
> distribution by host you can inject it. But be aware of a lot of data
> movement, that could explain the (temporarily) unavailable PGs. But to
> make your cluster resilient against host failure you'll have to go
> through that at some point.
> 
> 
> https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/
> 
> 
> Zitat von Andres Rojas Guerrero :
> 
>> Hi, I try to make a new crush rule (Nautilus) in order take the new
>> correct_failure_domain to hosts:
>>
>>    "rule_id": 2,
>>     "rule_name": "nxtcloudAFhost",
>>     "ruleset": 2,
>>     "type": 3,
>>     "min_size": 3,
>>     "max_size": 7,
>>     "steps": [
>>     {
>>     "op": "set_chooseleaf_tries",
>>     "num": 5
>>     },
>>     {
>>     "op": "set_choose_tries",
>>     "num": 100
>>     },
>>     {
>>     "op": "take",
>>     "item": -1,
>>     "item_name": "default"
>>     },
>>     {
>>     "op": "choose_indep",
>>     "num": 0,
>>     "type": "host"
>>     },
>>     {
>>     "op": "emit"
>>
>> And I have changed the pool to this new crush rule:
>>
>> # ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost
>>
>> But suddenly the cephfs it's unavailable:
>>
>> # ceph status
>>   cluster:
>>     id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
>>     health: HEALTH_WARN
>>     11 clients failing to respond to capability release
>>     2 MDSs report slow metadata IOs
>>     1 MDSs report slow requests
>>
>>
>> And clients failing to respond:
>>
>> HEALTH_WARN 11 clients failing to respond to capability release; 2 MDSs
>> report slow metadata IOs; 1 MDSs report slow requests
>> MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability
>> release
>>     mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
>> capability release client_id: 1524269
>>     mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to respond to
>>
>>
>> I reversed the change, returning to the original crush rule, and all
>> it's Ok. My question if it's possible to change on fly the crush rule of
>> a EC pool.
>>
>>
>> Thanks
>> El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:
>>> Thanks, I will test it.
>>>
>>> El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:
 Create a new crush rule with the correct failure domain, test it
 properly and assign it to the pool(s).

>>>
>>
>> -- 
>> ***
>> Andrés Rojas Guerrero
>> Unidad Sistemas Linux
>> Area Arquitectura Tecnológica
>> Secretaría General Adjunta de Informática
>> Consejo Superior de Investigaciones Científicas (CSIC)
>> Pinar 19
>> 28006 - Madrid
>> Tel: +34 915680059 -- Ext. 990059
>> email: a.ro...@csic.es
>> ID comunicate.csic.es: @50852720l:matrix.csic.es
>> ***
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Eugen Block
Interesting, I haven't had that yet with crushtool. Your ceph version  
is Nautilus, right? And you did decompile the binary crushmap with  
crushtool, correct? I don't know how to reproduce that.


Zitat von Andres Rojas Guerrero :


I have this error when try to show mappings with crushtool:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f7f7a0ccb40 thread_name:crushtool




El 6/5/21 a las 13:47, Eugen Block escribió:

Yes it is possible, but you should validate it with crushtool before
injecting it to make sure the PGs land where they belong.

crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-bad-mappings

If you don't get bad mappings and the 'show-mappings' confirms the PG
distribution by host you can inject it. But be aware of a lot of data
movement, that could explain the (temporarily) unavailable PGs. But to
make your cluster resilient against host failure you'll have to go
through that at some point.


https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/


Zitat von Andres Rojas Guerrero :


Hi, I try to make a new crush rule (Nautilus) in order take the new
correct_failure_domain to hosts:

   "rule_id": 2,
    "rule_name": "nxtcloudAFhost",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 7,
    "steps": [
    {
    "op": "set_chooseleaf_tries",
    "num": 5
    },
    {
    "op": "set_choose_tries",
    "num": 100
    },
    {
    "op": "take",
    "item": -1,
    "item_name": "default"
    },
    {
    "op": "choose_indep",
    "num": 0,
    "type": "host"
    },
    {
    "op": "emit"

And I have changed the pool to this new crush rule:

# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost

But suddenly the cephfs it's unavailable:

# ceph status
  cluster:
    id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
    11 clients failing to respond to capability release
    2 MDSs report slow metadata IOs
    1 MDSs report slow requests


And clients failing to respond:

HEALTH_WARN 11 clients failing to respond to capability release; 2 MDSs
report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability
release
    mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
    mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to respond to


I reversed the change, returning to the original crush rule, and all
it's Ok. My question if it's possible to change on fly the crush rule of
a EC pool.


Thanks
El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:

Thanks, I will test it.

El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:

Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).





--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph stretch mode enabling

2021-05-06 Thread Felix O
Hello,

I'm trying to deploy my test ceph cluster and enable stretch mode (
https://docs.ceph.com/en/latest/rados/operations/stretch-mode/). My problem
is enabling the stretch mode.

$ ceph mon enable_stretch_mode ceph-node-05 stretch_rule datacenter
Error EINVAL: Could not find location entry for datacenter on monitor
ceph-node-05

ceph-node-05 is the tiebreaker monitor.

I tried to create the third datacenter and put the tiebreaker there but got
the following error:

root@ceph-node-01:/home/clouduser# ceph mon enable_stretch_mode
ceph-node-05 stretch_rule datacenter
Error EINVAL: there are 3datacenter's in the cluster but stretch mode
currently only works with 2!

An additional info:

Setup method: cephadm (https://docs.ceph.com/en/latest/cephadm/install/)

# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
 -1         0.03998  root default
-11         0.01999      datacenter site1
 -5         0.00999          host ceph-node-01
  0    hdd  0.00999              osd.0              up   1.0  1.0
 -3         0.00999          host ceph-node-02
  1    hdd  0.00999              osd.1              up   1.0  1.0
-12         0.01999      datacenter site2
 -9         0.00999          host ceph-node-03
  3    hdd  0.00999              osd.3              up   1.0  1.0
 -7         0.00999          host ceph-node-04
  2    hdd  0.00999              osd.2              up   1.0  1.0

stretch_rule is added to the crush

# ceph mon set_location ceph-node-01 datacenter=site1
# ceph mon set_location ceph-node-02 datacenter=site1
# ceph mon set_location ceph-node-03 datacenter=site2
# ceph mon set_location ceph-node-04 datacenter=site2

# ceph versions
{
"mon": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 5
},
"mgr": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 2
},
"osd": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 4
},
"mds": {},
"overall": {
"ceph version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10)
pacific (stable)": 11
}
}

Thank you for your support.

--
Best regards,
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Frank Schilder
Hi all,

I lost 2 OSDs deployed on a single Kingston SSD in a rather strange way and am 
wondering if anyone has made similar observations or is aware of a firmware bug 
with these disks.

Disk model: KINGSTON SEDC500M3840G (it ought to be a DC grade model with super 
capacitors)
Smartctl does not report any drive errors.
Performance per TB is as expected, OSDs are "ceph-volume lvm batch" bluestore 
deployed, everything collocated.

Short version: I disable volatile write cache on all OSD disks, but the 
Kingston disks seem to behave as if this cache is *not* disabled. Smartctl and 
hdparm report wcache=off though. The OSD loss looks like what unflushed write 
cache during power loss would result in. I'm afraid now that our cluster might 
be vulnerable to power loss.
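For reference, the kind of checks behind that statement look roughly like this
(a sketch; /dev/sdX is a placeholder for the Kingston device):

smartctl -g wcache /dev/sdX           # report the volatile write cache state
hdparm -W /dev/sdX                    # same via hdparm (SATA)
hdparm -W 0 /dev/sdX                  # disable the volatile write cache
cat /sys/block/sdX/queue/write_cache  # what the kernel believes: 'write back' or 'write through'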

Long version:

Our disks are on Dell HBA330 Mini controllers and are in state "non-raid". The 
controller itself has no cache and is HBA-mode only.

Log entry:

The iDRAC log shows that the disk was removed from a drive group:

---
PDR5 Disk 6 in Backplane 2 of Integrated Storage Controller 1 is removed.
Detailed Description: A physical disk has been removed from the disk group. 
This alert can also be caused by loose or defective cables or by problems with 
the enclosure.
---

The iDRAC did not report the disk as failed and neither as "removed from drive 
bay". I reseated the disk and it came back as healthy. I assume it was a 
problem with connectivity to the back-plane (chassis). If I now try to start up 
the OSDs on this disk, I get the error:

starting osd.581 at - osd_data /var/lib/ceph/osd/ceph-581 
/var/lib/ceph/osd/ceph-581/journal
starting osd.580 at - osd_data /var/lib/ceph/osd/ceph-580 
/var/lib/ceph/osd/ceph-580/journal
2021-05-06 09:23:47.160 7fead5a1fb80 -1 bluefs mount failed to replay log: (5) 
Input/output error
2021-05-06 09:23:47.160 7fead5a1fb80 -1 bluestore(/var/lib/ceph/osd/ceph-581) 
_open_db failed bluefs mount: (5) Input/output error
2021-05-06 09:23:47.630 7fead5a1fb80 -1 osd.581 0 OSD:init: unable to mount 
object store
2021-05-06 09:23:47.630 7fead5a1fb80 -1  ** ERROR: osd init failed: (5) 
Input/output error

I have removed disks of active OSDs before without any bluestore corruption 
happening. While it is very well possible that this particular "disconnect" 
event may lead to a broken OSD, there is also another observation where the 
Kingston disks stick out compared with other SSD OSDs, which make me suspicious 
of this being a disk cache firmware problem:

The I/O indicator LED lights up with significantly lower frequency than for all 
other SSD types on the same pool even though we have 2 instead of 1 OSD 
deployed on the Kingstons (the other disks are 2TB Micron Pro). While this 
could be due to a wiring difference I'm starting to suspect that this might be 
an indication of volatile caching.

Does anyone using Kingston DC-M-SSDs have similar or contradicting experience?
How did these disks handle power outages?
Any recommendations?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW Beast SSL version

2021-05-06 Thread Glen Baars
Hello Ceph,

Can you set the SSL min version? Such as TLS1.2?

Glen
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero

Yes, my ceph version is Nautilus:

# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) nautilus 
(stable)


First dump the crush map:

# ceph osd getcrushmap -o crush_map

Then, decompile the crush map:

# crushtool -d crush_map -o crush_map_d


Now, edit the crush rule and compile:

# crushtool -c crush_map_d -o crush_map_new


And finally, test the mappings:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f2d717acb40 thread_name:crushtool
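A couple of sanity checks on the recompiled map that might help narrow this
down (both are standard crushtool options; crush_map_d is the decompiled
original from above):

crushtool -i crush_map_new --tree              # does the new binary map decode and show the expected buckets?
crushtool -d crush_map_new -o crush_map_new_d  # decompile the recompiled map again
diff crush_map_d crush_map_new_d               # confirm only the intended rule change survived the round trip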


El 6/5/21 a las 14:13, Eugen Block escribió:
Interesting, I haven't had that yet with crushtool. Your ceph version is 
Nautilus, right? And you did decompile the binary crushmap with 
crushtool, correct? I don't know how to reproduce that.


Zitat von Andres Rojas Guerrero :


I have this error when try to show mappings with crushtool:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f7f7a0ccb40 thread_name:crushtool




El 6/5/21 a las 13:47, Eugen Block escribió:

Yes it is possible, but you should validate it with crushtool before
injecting it to make sure the PGs land where they belong.

crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 
--show-bad-mappings


If you don't get bad mappings and the 'show-mappings' confirms the PG
distribution by host you can inject it. But be aware of a lot of data
movement, that could explain the (temporarily) unavailable PGs. But to
make your cluster resilient against host failure you'll have to go
through that at some point.


https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/


Zitat von Andres Rojas Guerrero :


Hi, I try to make a new crush rule (Nautilus) in order take the new
correct_failure_domain to hosts:

   "rule_id": 2,
    "rule_name": "nxtcloudAFhost",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 7,
    "steps": [
    {
    "op": "set_chooseleaf_tries",
    "num": 5
    },
    {
    "op": "set_choose_tries",
    "num": 100
    },
    {
    "op": "take",
    "item": -1,
    "item_name": "default"
    },
    {
    "op": "choose_indep",
    "num": 0,
    "type": "host"
    },
    {
    "op": "emit"

And I have changed the pool to this new crush rule:

# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost

But suddenly the cephfs it's unavailable:

# ceph status
  cluster:
    id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
    11 clients failing to respond to capability release
    2 MDSs report slow metadata IOs
    1 MDSs report slow requests


And clients failing to respond:

HEALTH_WARN 11 clients failing to respond to capability release; 2 MDSs
report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability
release
    mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
    mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to 
respond to



I reversed the change, returning to the original crush rule, and all
it's Ok. My question if it's possible to change on fly the crush 
rule of

a EC pool.


Thanks
El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:

Thanks, I will test it.

El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:

Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).





--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
__

[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Andrew Walker-Brown
Hi Frank,

I’m running the same SSDs (approx. 20) in Dell servers on HBA330’s.  Haven’t 
had any issues and have suffered at least one power outage.  Just checking the 
wcache setting and it shows as enabled.

Running Octopus 15.1.9 and docker containers.  Originally part of a Proxmox 
cluster but now standalone Ceph.

Cheers,

A

Sent from Mail for Windows 10

From: Frank Schilder
Sent: 06 May 2021 10:11
To: ceph-users@ceph.io
Subject: [ceph-users] OSD lost: firmware bug in Kingston SSDs?

Hi all,

I lost 2 OSDs deployed on a single Kingston SSD in a rather strange way and am 
wondering if anyone has made similar observations or is aware of a firmware bug 
with these disks.

Disk model: KINGSTON SEDC500M3840G (it ought to be a DC grade model with super 
capacitors)
Smartctl does not report any drive errors.
Performance per TB is as expected, OSDs are "ceph-volume lvm batch" bluestore 
deployed, everything collocated.

Short version: I disable volatile write cache on all OSD disks, but the 
Kingston disks seem to behave as if this cache is *not* disabled. Smartctl and 
hdparm report wcache=off though. The OSD loss looks like what unflushed write 
cache during power loss would result in. I'm afraid now that our cluster might 
be vulnerable to power loss.

Long version:

Our disks are on Dell HBA330 Mini controllers and are in state "non-raid". The 
controller itself has no cache and is HBA-mode only.

Log entry:

The iDRAC log shows that the disk was removed from a drive group:

---
PDR5 Disk 6 in Backplane 2 of Integrated Storage Controller 1 is removed.
Detailed Description: A physical disk has been removed from the disk group. 
This alert can also be caused by loose or defective cables or by problems with 
the enclosure.
---

The iDRAC did not report the disk as failed and neither as "removed from drive 
bay". I reseated the disk and it came back as healthy. I assume it was a 
problem with connectivity to the back-plane (chassis). If I now try to start up 
the OSDs on this disk, I get the error:

starting osd.581 at - osd_data /var/lib/ceph/osd/ceph-581 
/var/lib/ceph/osd/ceph-581/journal
starting osd.580 at - osd_data /var/lib/ceph/osd/ceph-580 
/var/lib/ceph/osd/ceph-580/journal
2021-05-06 09:23:47.160 7fead5a1fb80 -1 bluefs mount failed to replay log: (5) 
Input/output error
2021-05-06 09:23:47.160 7fead5a1fb80 -1 bluestore(/var/lib/ceph/osd/ceph-581) 
_open_db failed bluefs mount: (5) Input/output error
2021-05-06 09:23:47.630 7fead5a1fb80 -1 osd.581 0 OSD:init: unable to mount 
object store
2021-05-06 09:23:47.630 7fead5a1fb80 -1  ** ERROR: osd init failed: (5) 
Input/output error

I have removed disks of active OSDs before without any bluestore corruption 
happening. While it is very well possible that this particular "disconnect" 
event may lead to a broken OSD, there is also another observation where the 
Kingston disks stick out compared with other SSD OSDs, which make me suspicious 
of this being a disk cache firmware problem:

The I/O indicator LED lights up with significantly lower frequency than for all 
other SSD types on the same pool even though we have 2 instead of 1 OSD 
deployed on the Kingstons (the other disks are 2TB Micron Pro). While this 
could be due to a wiring difference I'm starting to suspect that this might be 
an indication of volatile caching.

Does anyone using Kingston DC-M-SSDs have similar or contradicting experience?
How did these disks handle power outages?
Any recommendations?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to find out why osd crashed with cephadm/podman containers?

2021-05-06 Thread David Caro
On 05/06 14:03, mabi wrote:
> Hello,
> 
> I have a small 6 nodes Octopus 15.2.11 cluster installed on bare metal with 
> cephadm and I added a second OSD to one of my 3 OSD nodes. I started then 
> copying data to my ceph fs mounted with kernel mount but then both OSDs on 
> that specific nodes crashed.
> 
> To this topic I have the following questions:
> 
> 1) How can I find out why the two OSD crashed? because everything is in 
> podman containers I don't know where are the logs to find out the reason why 
> this happened. From the OS itself everything looks ok, there was no out of 
> memory error.

There should be some logs under /var/log/ceph/<fsid>/osd.<id>/ on
the host/hosts that were running the osds.
I sometimes find myself disabling the '--rm' flag for the pod in the
'unit.run' script under
/var/lib/ceph/<fsid>/osd.<id>/unit.run to make podman persist the container
and be able to do a 'podman logs' on it.
Though that's probably sensible only when debugging.
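A couple of other ways to get at the same logs without touching unit.run
(a sketch; <fsid> and <id> are placeholders for your cluster fsid and osd id):

journalctl -u ceph-<fsid>@osd.<id>.service --since "1 hour ago"
cephadm logs --name osd.<id>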

> 
> 2) I would assume the two OSD container would restart on their own but this 
> is not the case it looks like. How can I restart manually these 2 OSD 
> containers on that node? I believe this should be a "cephadm orch" command?

I think 'ceph orch daemon redeploy' might do it? What is the output of 'ceph 
orch ls' and 'ceph orch ps'?
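If the daemons only need a kick rather than a full redeploy, something like
this might be enough (the osd id is a placeholder):

ceph orch daemon restart osd.<id>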
> 
> The health of the cluster right now is:
> 
> CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
> PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded 
> (33.333%), 65 pgs degraded, 65 pgs undersized
> 
> Thank your for your hints.
> 
> Best regards,
> Mabi
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation 
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."


signature.asc
Description: PGP signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Out of Memory after Upgrading to Nautilus

2021-05-06 Thread Didier GAZEN

Hi Christoph,

I am currently using Nautilus on a ceph cluster with osd_memory_target 
defined in ceph.conf on each node.


By running :

ceph config get osd.40 osd_memory_target

you get the default value for the parameter osd_memory_target 
(4294967296 for nautilus)


If you change the ceph.conf file and restart the osd service as you 
said, it is REALLY working, but you must check it with the command:


ceph config show osd.40

that will output several lines and the one you are interested in:

NAME VALUE SOURCE OVERRIDES IGNORES
...
osd_memory_target 1073741824 file
...

indicating the value you have specified in the ceph.conf file.
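An alternative (a sketch, in case you prefer the centralized config over
ceph.conf) is to store the value in the mon config database, which is what
'ceph config get' reads:

ceph config set osd osd_memory_target 1073741824
ceph config get osd.40 osd_memory_target   # should now return 1073741824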

You can try again...

Didier

On 5/6/21 10:32 AM, Christoph Adomeit wrote:

It looks that I have solved the issue.

I tried:
ceph.conf
[osd]
  osd_memory_target = 1073741824

systemctl restart ceph-osd.target

when i run
ceph config get osd.40 osd_memory_target it returns:
4294967296

so this did not work.

Next I tried:
ceph tell osd.* injectargs '--osd_memory_target 1073741824'

and ceph returns:
ceph config get osd.40 osd_memory_target
4294967296

So this also dir not work in 14.2.20

Next I tried:

ceph config set osd/class:hdd osd_memory_target 1073741824

and that finally worked.

I also slowly increased the memory and so far I use:

ceph config set osd/class:hdd osd_memory_target 2147483648

for now.

Thanks
   Christoph
On Wed, May 05, 2021 at 04:30:17PM +0200, Christoph Adomeit wrote:

I manage a historical cluster of several Ceph nodes, each with 128 GB RAM and 36 
OSDs of 8 TB each.

The cluster is just for archive purposes and performance is not so important.

The cluster was running fine for long time using ceph luminous.

Last week I updated it to Debian 10 and Ceph Nautilus.

Now I can see that the memory usage of each osd grows slowly to 4 GB each and 
once the system has
no memory left it will oom-kill processes

I have already configured osd_memory_target = 1073741824 .
This helps for some hours but then memory usage will grow from 1 GB to 4 GB per 
OSD.

Any ideas what I can do to further limit osd memory usage ?

It would be good to keep the hardware running some more time without upgrading 
RAM on all
OSD machines.

Any Ideas ?

Thanks
   Christoph
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Write Ops on CephFS Increasing exponentially

2021-05-06 Thread Kyle Dean
Hi, hoping someone could help me get to the bottom of this particular issue I'm 
having.

I have ceph octopus installed using ceph-ansible.

Currently, I have 3 MDS servers running, and one client connected to the active 
MDS. I'm currently storing a very large encrypted container on the CephFS file 
system, 8TB worth, and I'm writing data into it from the client host.

Recently I have noticed a severe impact on performance, and the time taken to 
process files within the container has increased from 1 minute to 11 minutes.

In the Ceph dashboard, when I take a look at the performance tab on the file 
system page, the Write Ops are increasing exponentially over time.

At the end of April, around the 22nd, I had 49 Write Ops on the performance page 
for the MDS daemons. This is now at 266467 Write Ops and increasing.

Also, the client requests have gone from 14 to 67 to 117 and are now at 283.

Would someone be able to help me make sense of why the performance has 
decreased and what is going on with the client requests and write operations?
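Some counters that might help whoever looks at this (a sketch; <name> is a
placeholder for the active MDS, and the daemon commands have to run on the MDS
host):

ceph fs status
ceph daemon mds.<name> perf dump mds_server   # request/reply counters over time
ceph daemon mds.<name> session ls             # per-client caps and request counts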

Kind regards,

kyle
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] orch upgrade mgr starts too slow and is terminated?

2021-05-06 Thread Kai Börnert

Hi all,

Upon updating to 16.2.2 via cephadm, the upgrade gets stuck on the first mgr.


Looking into this via docker logs, I see that it is still loading modules when 
it is apparently terminated and restarted in a loop.


When pausing the update, the mgr succeeds in starting with the new version; 
however, when resuming the update, it seems to try to update it again even 
though it already has the new version, leading to the exact same loop.


Is there some setting or workaround to increase the time before it is 
attempted to be redeployed, or can this behavior be caused by something 
else?


Greetings,

Kai
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: orch upgrade mgr starts too slow and is terminated?

2021-05-06 Thread Sage Weil
Hi!

I hit the same issue.  This was a bug in 16.2.0 that wasn't completely
fixed, but I think we have it this time.  Kicking off a 16.2.3 build
now to resolve the problem.

(Basically, sometimes docker calls the image docker.io/ceph/ceph:foo
and sometimes it's ceph/ceph:foo, and our attempt to normalize missed
one case.)

sage

On Thu, May 6, 2021 at 9:59 AM Kai Börnert  wrote:
>
> Hi all,
>
> upon updating to 16.2.2 via cephadm  the upgrade is being stuck on the
> first mgr
>
> Looking into this via docker logs I see that it is still loading modules
> when it is apparently terminated and restarted in a loop
>
> When pausing the update, the mgr succeeds to start with the new version,
> however when resuming the update, it seems to try to update it again
> even tho it already has the new version, leading to the exact same loop.
>
> Is there some setting or workaround to increase the time before it is
> attempted to be redeployed, or can this behavior be caused by something
> else?
>
> Greetings,
>
> Kai
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Upgrade problem with cephadm

2021-05-06 Thread fcid

Hello ceph community,

I'm trying to upgrade a Pacific (v16.2.0) cluster to the latest version, 
but the upgrade process seems to be stuck. The mgr log (debug level) 
does not show any significant message regarding the upgrade, other than 
when it is started/paused/resumed/stopped.


2021-05-06T14:29:59.294725+ mgr.hostc.riclju (mgr.3935983) 35645 : 
cephadm [INF] Upgrade: Started with target docker.io/ceph/ceph:v16.2.2
2021-05-06T14:49:55.710023+ mgr.hostc.riclju (mgr.3935983) 36285 : 
cephadm [INF] Paused
2021-05-06T14:50:24.444742+ mgr.hostc.riclju (mgr.3935983) 36302 : 
cephadm [INF] Resumed
2021-05-06T14:51:36.888269+ mgr.hostc.riclju (mgr.3935983) 36349 : 
cephadm [INF] Upgrade: Paused upgrade to docker.io/ceph/ceph:v16.2.2
2021-05-06T14:51:50.411779+ mgr.hostc.riclju (mgr.3935983) 36357 : 
cephadm [INF] Upgrade: Resumed upgrade to docker.io/ceph/ceph:v16.2.2
2021-05-06T14:52:01.660682+ mgr.hostc.riclju (mgr.3935983) 36365 : 
cephadm [INF] Upgrade: Stopped


It may be worth mentioning that last week I had trouble trying to deploy 
RGWs. It was not possible to deploy the RGWs using this command:


ceph orch apply rgw orbyta --realm=realma --zone=zonea --placement="2"

So the following were used

ceph orch daemon add rgw zonea --placement hostb
ceph orch daemon add rgw zonea --placement hosta

After those commands were issued, the orchestrator would still not deploy 
the RGWs until the current MGR failed over to a standby MGR. 
After that, the RGWs were deployed.


Another problem I have is the refresh behaviour of the orchestrator. The 
last time the daemons listed in ceph orch ps were refreshed is the last 
time a MGR was set to failed, and issuing ceph orch ps --refresh does 
not seem to update the list.


It looks like all those symptoms are related somehow, but I don't know 
how to dig further into the internals of the orchestrator to get more 
information.
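So far the only generic debugging hooks I am aware of are these (a sketch,
mostly from the cephadm troubleshooting docs):

ceph orch upgrade status                                    # current target and state
ceph config set mgr mgr/cephadm/log_to_cluster_level debug  # verbose cephadm module logging
ceph -W cephadm --watch-debug                               # follow the cephadm module log
ceph mgr fail                                               # fail over to a standby mgr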


I greatly appreciate if you can point me in the right direction.

Thank you, kind regards.

--
AltaVoz
Fernando Cid
Ingeniero de Operaciones
www.altavoz.net
Viña del Mar: 2 Poniente 355 of 53 | +56 32 276 8060
Santiago: Antonio Bellet 292 of 701 | +56 2 2585 4264



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: orch upgrade mgr starts too slow and is terminated?

2021-05-06 Thread Robert Sander
Am 06.05.21 um 17:18 schrieb Sage Weil:

> I hit the same issue.  This was a bug in 16.2.0 that wasn't completely
> fixed, but I think we have it this time.  Kicking of a 16.2.3 build
> now to resolve the problem.

Great. I also hit that today. Thanks for fixing it quickly.

Regards
-- 
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD lost: firmware bug in Kingston SSDs?

2021-05-06 Thread Frank Schilder
Hi Andrew,

thanks, that is reassuring. To be sure, I plan to do a few power-out tests with 
this server. I never had any issues with that so far; it's the first time I see a 
corrupted OSD.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Andrew Walker-Brown 
Sent: 06 May 2021 15:23:30
To: Frank Schilder; ceph-users@ceph.io
Subject: RE: OSD lost: firmware bug in Kingston SSDs?

Hi Frank,

I’m running the same SSDs (approx. 20) in Dell servers on HBA330’s.  Haven’t 
had any issues and have suffered at least one power outage.  Just checking the 
wcache setting and it shows as enabled.

Running Octopus 15.1.9 and docker containers.  Originally part of a Proxmox 
cluster but now standalone Ceph.

Cheers,

A

Sent from Mail for Windows 10

From: Frank Schilder
Sent: 06 May 2021 10:11
To: ceph-users@ceph.io
Subject: [ceph-users] OSD lost: firmware bug in Kingston SSDs?

Hi all,

I lost 2 OSDs deployed on a single Kingston SSD in a rather strange way and am 
wondering if anyone has made similar observations or is aware of a firmware bug 
with these disks.

Disk model: KINGSTON SEDC500M3840G (it ought to be a DC grade model with super 
capacitors)
Smartctl does not report any drive errors.
Performance per TB is as expected, OSDs are "ceph-volume lvm batch" bluestore 
deployed, everything collocated.

Short version: I disable volatile write cache on all OSD disks, but the 
Kingston disks seem to behave as if this cache is *not* disabled. Smartctl and 
hdparm report wcache=off though. The OSD loss looks like what unflushed write 
cache during power loss would result in. I'm afraid now that our cluster might 
be vulnerable to power loss.

Long version:

Our disks are on Dell HBA330 Mini controllers and are in state "non-raid". The 
controller itself has no cache and is HBA-mode only.

Log entry:

The iDRAC log shows that the disk was removed from a drive group:

---
PDR5 Disk 6 in Backplane 2 of Integrated Storage Controller 1 is removed.
Detailed Description: A physical disk has been removed from the disk group. 
This alert can also be caused by loose or defective cables or by problems with 
the enclosure.
---

The iDRAC did not report the disk as failed and neither as "removed from drive 
bay". I reseated the disk and it came back as healthy. I assume it was a 
problem with connectivity to the back-plane (chassis). If I now try to start up 
the OSDs on this disk, I get the error:

starting osd.581 at - osd_data /var/lib/ceph/osd/ceph-581 
/var/lib/ceph/osd/ceph-581/journal
starting osd.580 at - osd_data /var/lib/ceph/osd/ceph-580 
/var/lib/ceph/osd/ceph-580/journal
2021-05-06 09:23:47.160 7fead5a1fb80 -1 bluefs mount failed to replay log: (5) 
Input/output error
2021-05-06 09:23:47.160 7fead5a1fb80 -1 bluestore(/var/lib/ceph/osd/ceph-581) 
_open_db failed bluefs mount: (5) Input/output error
2021-05-06 09:23:47.630 7fead5a1fb80 -1 osd.581 0 OSD:init: unable to mount 
object store
2021-05-06 09:23:47.630 7fead5a1fb80 -1  ** ERROR: osd init failed: (5) 
Input/output error

I have removed disks of active OSDs before without any bluestore corruption 
happening. While it is very well possible that this particular "disconnect" 
event may lead to a broken OSD, there is also another observation where the 
Kingston disks stick out compared with other SSD OSDs, which make me suspicious 
of this being a disk cache firmware problem:

The I/O indicator LED lights up with significantly lower frequency than for all 
other SSD types on the same pool even though we have 2 instead of 1 OSD 
deployed on the Kingstons (the other disks are 2TB Micron Pro). While this 
could be due to a wiring difference I'm starting to suspect that this might be 
an indication of volatile caching.

Does anyone using Kingston DC-M-SSDs have similar or contradicting experience?
How did these disks handle power outages?
Any recommendations?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to find out why osd crashed with cephadm/podman containers?

2021-05-06 Thread mabi
Hello,

I have a small 6-node Octopus 15.2.11 cluster installed on bare metal with 
cephadm, and I added a second OSD to one of my 3 OSD nodes. I then started 
copying data to my CephFS (kernel mount), but then both OSDs on that 
specific node crashed.

To this topic I have the following questions:

1) How can I find out why the two OSDs crashed? Because everything is in podman 
containers, I don't know where the logs are to find out the reason why this 
happened. From the OS itself everything looks OK; there was no out-of-memory 
error.

2) I would assume the two OSD containers would restart on their own, but it 
looks like this is not the case. How can I manually restart these 2 OSD 
containers on that node? I believe this should be a "cephadm orch" command?

The health of the cluster right now is:

CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded 
(33.333%), 65 pgs degraded, 65 pgs undersized

Thank you for your hints.

Best regards,
Mabi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v16.2.3 Pacific released

2021-05-06 Thread David Galloway
This is the third backport release in the Pacific series.  We recommend
all users update to this release.

Notable Changes
---

* This release fixes a cephadm upgrade bug that caused some systems to
get stuck in a loop restarting the first mgr daemon.
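For cephadm-managed clusters, moving to this release is typically started with
(check the upgrade documentation for your environment first):

  ceph orch upgrade start --ceph-version 16.2.3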

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-16.2.3.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 381b476cb3900f9a92eb95d03b4850b953cfd79a
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to find out why osd crashed with cephadm/podman containers?

2021-05-06 Thread mabi
Thank you very much for the hint regarding the log files, I wasn't aware that 
it still saves the logs on the host although everything is running in 
containers nowadays.

So there was nothing in the log files, but I could finally find out that the 
host (a RasPi4) could not cope with 2 external USB SSD disks connected to it, 
probably due to not enough power, so the disks disappeared and the OSDs went 
away with them. After a restart of the host the disks were back, as well as the 
OSD containers. So I have now removed that second OSD and will keep only a 
single OSD per server.

For reference here is the relevant part of the kernel log I saw:

[Thu May  6 15:24:34 2021] blk_update_request: I/O error, dev sda, sector 
40063143 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Thu May  6 15:24:34 2021] usb 1-1-port4: over-current change #1

and of course it did that for both sda and sdb.


‐‐‐ Original Message ‐‐‐
On Thursday, May 6, 2021 4:17 PM, David Caro  wrote:

> On 05/06 14:03, mabi wrote:
>
> > Hello,
> > I have a small 6 nodes Octopus 15.2.11 cluster installed on bare metal with 
> > cephadm and I added a second OSD to one of my 3 OSD nodes. I started then 
> > copying data to my ceph fs mounted with kernel mount but then both OSDs on 
> > that specific nodes crashed.
> > To this topic I have the following questions:
> >
> > 1.  How can I find out why the two OSD crashed? because everything is in 
> > podman containers I don't know where are the logs to find out the reason 
> > why this happened. From the OS itself everything looks ok, there was no out 
> > of memory error.
>
> There should be some logs under /var/log/ceph//osd./ on 
> the host/hosts that were running the osds.
> I found myself sometimes though disabling the '--rm' flag for the pod in the 
> 'unit.run' script under
> /va/lib/ceph//osd./unit.run to make podman persist the 
> container and be able to do a 'podman logs' on it.
> Though that's probably sensible only when debugging.
>
> > 2.  I would assume the two OSD container would restart on their own but 
> > this is not the case it looks like. How can I restart manually these 2 OSD 
> > containers on that node? I believe this should be a "cephadm orch" command?
>
> I think 'ceph orch daemon redeploy' might do it? What is the output of 'ceph 
> orch ls' and 'ceph orch ps'?
>
> > The health of the cluster right now is:
> >
> > CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
> > PG_DEGRADED: Degraded data redundancy: 132518/397554 objects degraded 
> > (33.333%), 65 pgs degraded, 65 pgs undersized
> >
> >
> > Thank your for your hints.
> > Best regards,
> > Mabi
> >
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
>
> David Caro
> SRE - Cloud Services
> Wikimedia Foundation https://wikimediafoundation.org/
> PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
>
> "Imagine a world in which every single human being can freely share in the
> sum of all knowledge. That's our commitment."
>
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Clyso GmbH - Ceph Foundation Member

Hi Andres,

Does the command work with the original rule/crushmap?


___
Clyso GmbH - Ceph Foundation Member
supp...@clyso.com
https://www.clyso.com

Am 06.05.2021 um 15:21 schrieb Andres Rojas Guerrero:

Yes, my ceph version is Nautilus:

# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) 
nautilus (stable)


First dump the crush map:

# ceph osd getcrushmap -o crush_map

Then, decompile the crush map:

# crushtool -d crush_map -o crush_map_d


Now, edit the crush rule and compile:

# crushtool -c crush_map_d -o crush_map_new


An finally test the mappings:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f2d717acb40 thread_name:crushtool


El 6/5/21 a las 14:13, Eugen Block escribió:
Interesting, I haven't had that yet with crushtool. Your ceph version 
is Nautilus, right? And you did decompile the binary crushmap with 
crushtool, correct? I don't know how to reproduce that.


Zitat von Andres Rojas Guerrero :


I have this error when try to show mappings with crushtool:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 
--show-mappings

CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f7f7a0ccb40 thread_name:crushtool




El 6/5/21 a las 13:47, Eugen Block escribió:

Yes it is possible, but you should validate it with crushtool before
injecting it to make sure the PGs land where they belong.

crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 
--show-bad-mappings


If you don't get bad mappings and the 'show-mappings' confirms the PG
distribution by host you can inject it. But be aware of a lot of data
movement, that could explain the (temporarily) unavailable PGs. But to
make your cluster resilient against host failure you'll have to go
through that at some point.


https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/


Zitat von Andres Rojas Guerrero :


Hi, I try to make a new crush rule (Nautilus) in order take the new
correct_failure_domain to hosts:

   "rule_id": 2,
    "rule_name": "nxtcloudAFhost",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 7,
    "steps": [
    {
    "op": "set_chooseleaf_tries",
    "num": 5
    },
    {
    "op": "set_choose_tries",
    "num": 100
    },
    {
    "op": "take",
    "item": -1,
    "item_name": "default"
    },
    {
    "op": "choose_indep",
    "num": 0,
    "type": "host"
    },
    {
    "op": "emit"

And I have changed the pool to this new crush rule:

# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost

But suddenly the cephfs it's unavailable:

# ceph status
  cluster:
    id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
    11 clients failing to respond to capability release
    2 MDSs report slow metadata IOs
    1 MDSs report slow requests


And clients failing to respond:

HEALTH_WARN 11 clients failing to respond to capability release; 2 
MDSs

report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability
release
    mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
    mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to 
respond to



I reversed the change, returning to the original crush rule, and all
it's Ok. My question if it's possible to change on fly the crush 
rule of

a EC pool.


Thanks
El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:

Thanks, I will test it.

El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:

Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).





--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Conse

[ceph-users] Slow performance and many slow ops

2021-05-06 Thread codignotto
Hello, I have 6 hosts with 12 SSD disks in each host, for a total of 72 OSDs.
I am using Ceph Octopus in its latest version; the deployment was done with
cephadm and containers, following the docs. We are having some performance
problems with the cluster: I mount it on a Proxmox cluster, and on Windows VMs
the disks sit at 100% utilization just from opening a browser, while switching
to another storage (NFS, for example) makes everything go back to normal. I
now have the Ceph cluster mounted with only 1 VM on it and we still see the
slowness and slow ops. The network speed between the hosts in the cluster is
25 Gb (tested with iperf), and between Ceph and Proxmox it is also 25 Gb per
host. Has anyone run into this before?


Many Tks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow performance and many slow ops

2021-05-06 Thread Mario Giammarco
We need more details, but are you using krbd? iothread? and so on?
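
For a start, the output of something like the following would already help
(the pool name "testpool" is just a placeholder, and the rados bench objects
should be cleaned up afterwards):

# ceph -s
# ceph health detail | grep -i slow
# ceph osd perf                                  # per-OSD commit/apply latency
# rados bench -p testpool 30 write --no-cleanup  # raw cluster write throughput
# rados bench -p testpool 30 rand                # read test on the same objects
# rados -p testpool cleanup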

Il giorno gio 6 mag 2021 alle ore 22:38 codignotto 
ha scritto:

> Hello, I have 6 hosts with 12 SSD disks on each host for a total of 72 OSD,
> I am using CEPH Octopos in its latest version, the deployment was done
> using ceph admin and containers according to the dosing, we are having some
> problems with performance of the cluster, I mount it on a proxmox cluster
> and on windows VMs I have the problem of the disks being 100% occupied with
> a simple browser opening, when I switch to another NFS storage for example
> everything goes back to normal, I have the CEPH cluster now mounted and
> with only 1 VM inside it, and we have the problem of slowness and slow ops,
> the network speed between the hosts in the cluster is 25Gb tested with
> iperf, between ceph and proxmox is 25Gb per host, someone already passed
> that?
>
>
> Many Tks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow performance and many slow ops

2021-05-06 Thread codignotto
Hello Mario, yes, I am using KRBD in Proxmox. Ceph is a separate cluster and
Proxmox is another cluster; I connect Ceph to Proxmox using RBD, and in the
storage and disk configuration I select KRBD, IO thread and SSD.
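
For reference, this is roughly how it is set on my side (the storage name and
VM id are placeholders, and the exact option names are from memory):

# pvesm set ceph-rbd --krbd 1
# qm set 100 --scsihw virtio-scsi-single
# qm set 100 --scsi0 ceph-rbd:vm-100-disk-0,iothread=1,ssd=1,discard=on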


Em qui., 6 de mai. de 2021 às 18:15, Mario Giammarco 
escreveu:

> We need more details, but are you using krbd? iothread? and so on?
>
> Il giorno gio 6 mag 2021 alle ore 22:38 codignotto 
> ha scritto:
>
>> Hello, I have 6 hosts with 12 SSD disks on each host for a total of 72
>> OSD,
>> I am using CEPH Octopos in its latest version, the deployment was done
>> using ceph admin and containers according to the dosing, we are having
>> some
>> problems with performance of the cluster, I mount it on a proxmox cluster
>> and on windows VMs I have the problem of the disks being 100% occupied
>> with
>> a simple browser opening, when I switch to another NFS storage for example
>> everything goes back to normal, I have the CEPH cluster now mounted and
>> with only 1 VM inside it, and we have the problem of slowness and slow
>> ops,
>> the network speed between the hosts in the cluster is 25Gb tested with
>> iperf, between ceph and proxmox is 25Gb per host, someone already passed
>> that?
>>
>>
>> Many Tks
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Stuck OSD service specification - can't remove

2021-05-06 Thread David Orman
Has anybody run into a 'stuck' OSD service specification? I've tried
to delete it, but it's stuck in 'deleting' state, and has been for
quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3:

NAME          PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd.osd_spec         504/525  12m             label:osd
root@ceph01:/# ceph orch rm osd.osd_spec
Removed service osd.osd_spec

From the active monitor:

debug 2021-05-06T23:14:48.909+ 7f17d310b700  0
log_channel(cephadm) log [INF] : Remove service osd.osd_spec

Yet in ls, it's still there, same as above. --export on it:

root@ceph01:/# ceph orch ls osd.osd_spec --export
service_type: osd
service_id: osd_spec
service_name: osd.osd_spec
placement: {}
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore

We've tried --force, as well, with no luck.

To be clear, the --export even prior to delete looks nothing like the
actual service specification we're using, even after I re-apply it, so
something seems 'bugged'. Here's the OSD specification we're applying:

service_type: osd
service_id: osd_spec
placement:
  label: "osd"
data_devices:
  rotational: 1
db_devices:
  rotational: 0
db_slots: 12

I would appreciate any insight into how to clear this up (without
removing the actual OSDs, we're just wanting to apply the updated
service specification - we used to use host placement rules and are
switching to label-based).
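
For completeness, this is roughly how we try to re-apply the intended spec
from a file (the file name is arbitrary), including a dry run first:

# ceph orch apply -i osd_spec.yaml --dry-run
# ceph orch apply -i osd_spec.yaml
# ceph orch ls osd --export   # check what the orchestrator actually stored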

Thanks,
David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nautilus - not unmapping

2021-05-06 Thread Joe Comeau
 
Nautilus cluster is not unmapping
 
ceph 14.2.16
 
ceph report |grep "osdmap_.*_committed"
report 1175349142
"osdmap_first_committed": 285562,
"osdmap_last_committed": 304247,
we've set osd_map_cache_size = 2
but it is slowly growing to that difference as well
 
OSD map first committed is not changing for some strange reason
 
Cluster has been around and upgraded since either firefly or jewel
 
I have seen a few others with this problem but no solution to it.
Any suggestions?
 
 
Thanks Joe
 
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus - not unmapping

2021-05-06 Thread Matthias Grandl
Hi Joe,

are all PGs active+clean? If not, you will only get osdmap pruning, which
will try to keep only every 10th osdmap.
https://docs.ceph.com/en/latest/dev/mon-osdmap-prune/
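
A quick way to check is something like this (just a sketch):

# ceph pg stat
# ceph pg dump pgs_brief 2>/dev/null | grep -v 'active+clean' | head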

If you have remapped PGs and need to urgently get rid of osdmaps, you can
try the upmap-remapped script to get to a pseudo clean state.

https://github.com/HeinleinSupport/cern-ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
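
As far as I remember, the script only prints the 'ceph osd pg-upmap-items'
commands, so you can review them before piping them to a shell. Rough usage,
assuming your clients already allow upmap:

# ceph osd set-require-min-compat-client luminous
# ./upmap-remapped.py          # review the output first
# ./upmap-remapped.py | sh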


Matthias Grandl
Head of UX

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io

On Fri, May 7, 2021, 02:16 Joe Comeau  wrote:

>
> Nautilus cluster is not unmapping
>
> ceph 14.2.16
>
> ceph report |grep "osdmap_.*_committed"
> report 1175349142
> "osdmap_first_committed": 285562,
> "osdmap_last_committed": 304247,
> we've set osd_map_cache_size = 2
> but its is slowly growing to that difference as well
>
> OSD map first committed is not changing for some strange reason
>
> Cluster has been around and upgraded since either firefly or jewel
>
> I have seen a few other with this problem to no solution to it
> Any suggestions ?
>
>
> Thanks Joe
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero

No, it doesn't work with an unedited crush map file either.
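
Maybe I will try to test the mappings with osdmaptool instead of crushtool,
something like this (the pool id 2 is just a placeholder):

# ceph osd getmap -o osdmap.bin
# osdmaptool osdmap.bin --import-crush crush_map_new
# osdmaptool osdmap.bin --test-map-pgs-dump --pool 2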



El 6/5/21 a las 18:28, Clyso GmbH - Ceph Foundation Member escribió:

Hi Andres,

does the command work with the original rule/crushmap?


___
Clyso GmbH - Ceph Foundation Member
supp...@clyso.com
https://www.clyso.com

Am 06.05.2021 um 15:21 schrieb Andres Rojas Guerrero:

Yes, my ceph version is Nautilus:

# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) 
nautilus (stable)


First dump the crush map:

# ceph osd getcrushmap -o crush_map

Then, decompile the crush map:

# crushtool -d crush_map -o crush_map_d


Now, edit the crush rule and compile:

# crushtool -c crush_map_d -o crush_map_new


And finally, test the mappings:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f2d717acb40 thread_name:crushtool


El 6/5/21 a las 14:13, Eugen Block escribió:
Interesting, I haven't had that yet with crushtool. Your ceph version 
is Nautilus, right? And you did decompile the binary crushmap with 
crushtool, correct? I don't know how to reproduce that.


Zitat von Andres Rojas Guerrero :


I get this error when I try to show the mappings with crushtool:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 
--show-mappings

CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f7f7a0ccb40 thread_name:crushtool




El 6/5/21 a las 13:47, Eugen Block escribió:

Yes it is possible, but you should validate it with crushtool before
injecting it to make sure the PGs land where they belong.

crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 
--show-bad-mappings


If you don't get bad mappings and the 'show-mappings' confirms the PG
distribution by host you can inject it. But be aware of a lot of data
movement, that could explain the (temporarily) unavailable PGs. But to
make your cluster resilient against host failure you'll have to go
through that at some point.


https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/


Zitat von Andres Rojas Guerrero :


Hi, I try to make a new crush rule (Nautilus) in order take the new
correct_failure_domain to hosts:

   "rule_id": 2,
    "rule_name": "nxtcloudAFhost",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 7,
    "steps": [
    {
    "op": "set_chooseleaf_tries",
    "num": 5
    },
    {
    "op": "set_choose_tries",
    "num": 100
    },
    {
    "op": "take",
    "item": -1,
    "item_name": "default"
    },
    {
    "op": "choose_indep",
    "num": 0,
    "type": "host"
    },
    {
    "op": "emit"

And I have changed the pool to this new crush rule:

# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost

But suddenly the cephfs it's unavailable:

# ceph status
  cluster:
    id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
    11 clients failing to respond to capability release
    2 MDSs report slow metadata IOs
    1 MDSs report slow requests


And clients failing to respond:

HEALTH_WARN 11 clients failing to respond to capability release; 2 
MDSs

report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability
release
    mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
    mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to 
respond to



I reversed the change, returning to the original crush rule, and all
it's Ok. My question if it's possible to change on fly the crush 
rule of

a EC pool.


Thanks
El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:

Thanks, I will test it.

El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:

Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).





--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
*

[ceph-users] Re: Ceph cluster not recover after OSD down

2021-05-06 Thread Andres Rojas Guerrero
But on another cluster with version 14.2.16 it is working ... so it seems to
be a problem with version 14.2.6?


El 6/5/21 a las 18:28, Clyso GmbH - Ceph Foundation Member escribió:

Hi Andres,

does the command work with the original rule/crushmap?


___
Clyso GmbH - Ceph Foundation Member
supp...@clyso.com
https://www.clyso.com

Am 06.05.2021 um 15:21 schrieb Andres Rojas Guerrero:

Yes, my ceph version is Nautilus:

# ceph -v
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9) 
nautilus (stable)


First dump the crush map:

# ceph osd getcrushmap -o crush_map

Then, decompile the crush map:

# crushtool -d crush_map -o crush_map_d


Now, edit the crush rule and compile:

# crushtool -c crush_map_d -o crush_map_new


And finally, test the mappings:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 --show-mappings
CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f2d717acb40 thread_name:crushtool


El 6/5/21 a las 14:13, Eugen Block escribió:
Interesting, I haven't had that yet with crushtool. Your ceph version 
is Nautilus, right? And you did decompile the binary crushmap with 
crushtool, correct? I don't know how to reproduce that.


Zitat von Andres Rojas Guerrero :


I get this error when I try to show the mappings with crushtool:

# crushtool -i crush_map_new --test --rule 2 --num-rep 7 
--show-mappings

CRUSH rule 2 x 0 [-5,-45,-49,-47,-43,-41,-29]
*** Caught signal (Segmentation fault) **
 in thread 7f7f7a0ccb40 thread_name:crushtool




El 6/5/21 a las 13:47, Eugen Block escribió:

Yes it is possible, but you should validate it with crushtool before
injecting it to make sure the PGs land where they belong.

crushtool -i crushmap.bin --test --rule 2 --num-rep 7 --show-mappings
crushtool -i crushmap.bin --test --rule 2 --num-rep 7 
--show-bad-mappings


If you don't get bad mappings and the 'show-mappings' confirms the PG
distribution by host you can inject it. But be aware of a lot of data
movement, that could explain the (temporarily) unavailable PGs. But to
make your cluster resilient against host failure you'll have to go
through that at some point.


https://docs.ceph.com/en/latest/rados/operations/crush-map-edits/


Zitat von Andres Rojas Guerrero :


Hi, I try to make a new crush rule (Nautilus) in order take the new
correct_failure_domain to hosts:

   "rule_id": 2,
    "rule_name": "nxtcloudAFhost",
    "ruleset": 2,
    "type": 3,
    "min_size": 3,
    "max_size": 7,
    "steps": [
    {
    "op": "set_chooseleaf_tries",
    "num": 5
    },
    {
    "op": "set_choose_tries",
    "num": 100
    },
    {
    "op": "take",
    "item": -1,
    "item_name": "default"
    },
    {
    "op": "choose_indep",
    "num": 0,
    "type": "host"
    },
    {
    "op": "emit"

And I have changed the pool to this new crush rule:

# ceph osd pool set nxtcloudAF crush_rule nxtcloudAFhost

But suddenly the cephfs it's unavailable:

# ceph status
  cluster:
    id: c74da5b8-3d1b-483e-8b3a-739134db6cf8
    health: HEALTH_WARN
    11 clients failing to respond to capability release
    2 MDSs report slow metadata IOs
    1 MDSs report slow requests


And clients failing to respond:

HEALTH_WARN 11 clients failing to respond to capability release; 2 
MDSs

report slow metadata IOs; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 11 clients failing to respond to capability
release
    mdsceph2mon03(mds.1): Client nxtcl3: failing to respond to
capability release client_id: 1524269
    mdsceph2mon01(mds.0): Client nxtcl5:nxtclproAF failing to 
respond to



I reversed the change, returning to the original crush rule, and all
it's Ok. My question if it's possible to change on fly the crush 
rule of

a EC pool.


Thanks
El 5/5/21 a las 18:14, Andres Rojas Guerrero escribió:

Thanks, I will test it.

El 5/5/21 a las 16:37, Joachim Kraftmayer escribió:

Create a new crush rule with the correct failure domain, test it
properly and assign it to the pool(s).





--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io