Re: [ceph-users] cephfs causing high load on vm, taking down 15 min later another cephfs vm

2019-05-23 Thread Frank Schilder
Hi Marc,

if you can exclude network problems, you can ignore this message.

The only time we observed something that might be similar to your problem was
when a network connection was overloaded. Potential causes include:

- broadcast storm
- the "too much cache memory" issues 
https://www.suse.com/support/kb/doc/?id=7010287
- a network or I/O intensive scheduled task that runs at the same time on many 
machines
- a shared up-link between clients and ceph storage with insufficient peak 
capacity
- a bad link in a trunk

In our case, we observed two different network-related breakdowns:

- broadcast storms, probably caused by a misbehaving router, and
- a bad link in a trunk. The trunk was a switch stacking connection and failed
due to a half-broken SFP transceiver. This was really bad and hard to find,
because the hardware error was not detected by the internal health checks (the
transceiver showed up as good). The symptom was that packets just disappeared
randomly, and the larger they were, the more likely they were to be lost.
However, no packet losses were reported on the server NICs, because the packets
got lost within the switch stack. Everything looked healthy. It just didn't work.

If a network connection becomes too congested, latency might get high enough 
for ceph or ceph clients to trigger time-outs. Also, connection attempts might 
repeatedly time out and fail in short succession. We also saw that OSD 
heartbeats did not arrive in time.

Ceph tends to react faster than other services to network issues, so you might 
not see ssh problems etc. while still having a network problem.

If your ceph cluster was healthy during the event (100% cpu load on an OSD is 
not necessarily unhealthy), this could indicate that it is not ceph related.

Some things worth checking (a couple of example commands follow the list):

- are there any health warnings or errors in the ceph.log
- are slow ops/requests reported
- do you have any network load/health monitoring in place (netdata is really 
good for this)
- are you collecting client/guest I/O stats with the hypervisor, do they peak 
during the incident
- are there high-network-load scheduled tasks on your machines (host or VM) or 
somewhere else affecting relevant network traffic (backups etc?)
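
A minimal sketch of the first two checks, assuming the default log location
/var/log/ceph/ceph.log (adjust paths and patterns to your setup):

```
# current health, including slow requests / blocked ops
ceph health detail

# past warnings, errors and slow requests around the time of the incident
grep -iE 'WRN|ERR|slow request' /var/log/ceph/ceph.log
```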

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: ceph-users  on behalf of Marc Roos 

Sent: 20 May 2019 12:41:43
To: ceph-users
Subject: [ceph-users] cephfs causing high load on vm, taking down 15 min later 
another cephfs vm

I got my first problem with cephfs in a production environment. Is it
possible from these logfiles to deduce what happened?

svr1 is connected to the ceph client network via a switch.
The svr2 VM is colocated on the c01 node.
c01 has OSDs and mon.a colocated.

svr1 was the first to report errors, at 03:38:44. None of the ceph nodes
reported any error messages about a network connection problem, and there
is nothing in dmesg on c01.

[@c01 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[@c01 ~]# uname -a
Linux c01 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux
[@c01 ~]# ceph versions
{
    "mon": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 32
    },
    "mds": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 2
    },
    "rgw": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 2
    },
    "overall": {
        "ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 42
    }
}




[0] svr1 messages
May 20 03:36:01 svr1 systemd: Started Session 308978 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308979 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308979 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308980 of user root.
May 20 03:36:01 svr1 systemd: Started Session 308980 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308981 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308981 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308982 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308982 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308983 of user root.
May 20 03:38:01 svr1 systemd: Started Session 308983 of user root.
May 20 03:38:44 svr1 kernel: libceph: osd0 192.168.x.111:6814 io error
May 20 03:38:44 svr1 kernel: libceph: osd0 192.168.x.111:6814 io error
May 20 03:38:45 svr1 kernel: last message repeated 5 times
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 io error
May 20 03:38:45 svr1 kernel: libceph: mon0 192.168.x.111:6789 session
lost, hunting for new mon
May 20 

Re: [ceph-users] RGW metadata pool migration

2019-05-23 Thread Janne Johansson
Den ons 22 maj 2019 kl 17:43 skrev Nikhil Mitra (nikmitra) <
nikmi...@cisco.com>:

> Hi All,
>
> What are the metadata pools in an RGW deployment that need to sit on the
> fastest medium to better the client experience from an access standpoint ?
>
> Also is there an easy way to migrate these pools in a PROD scenario with
> minimal to no-outage if possible ?
>

We have lots of non-data pools on SSD and the data (and log) pools on
HDD.
It's a simple matter of making a crush rule for SSD and telling the
pools you want to move to use that rule; they will then migrate over by
themselves.
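
A rough sketch of the two steps (the rule and pool names here are just
examples; check `ceph osd pool ls` for the actual RGW pool names in your
zone):

```
# a replicated rule that only picks OSDs with device class ssd
ceph osd crush rule create-replicated rgw-meta-ssd default host ssd

# point the metadata-type pools at the new rule; data migrates by itself
ceph osd pool set default.rgw.meta crush_rule rgw-meta-ssd
ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
```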

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW metadata pool migration

2019-05-23 Thread Konstantin Shalygin

What are the metadata pools in an RGW deployment that need to sit on the 
fastest medium to better the client experience from an access standpoint ?
Also is there an easy way to migrate these pools in a PROD scenario with 
minimal to no-outage if possible ?


Just change the crush rule to place the default.rgw.buckets.index pool on
your fastest drives.




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush rule for "ssd first" but without knowing how much

2019-05-23 Thread Dan van der Ster
Did I understand correctly: you have a crush tree with both ssd and
hdd devices, and you want to direct PGs to the ssds, until they reach
some fullness threshold, and only then start directing PGs to the
hdds?

I can't think of a crush rule alone that achieves that. But something you
could do is add all the ssds & hdds to the crush tree, set the hdd
crush weights to 0.0, then start increasing those weights manually
once the ssds reach 80% full or whatever.
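
Something like this, as a sketch (the osd id and weights are placeholders):

```
# keep the hdd OSD in the tree but give it no data for now
ceph osd crush reweight osd.12 0.0

# later, once the ssds fill up, start pulling the hdd in gradually
ceph osd crush reweight osd.12 1.0
```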

-- dan

On Thu, May 23, 2019 at 10:29 AM Florent B  wrote:
>
> Hi everyone,
>
> I would like to create a crush rule saying to store as much data as
> possible on ssd class OSDs first (then hdd), but without specifying how
> many OSDs in the rule (I don't know in advance how many there will be).
>
> Is it possible? All examples I have seen on the web always specify the
> number of OSDs to select.
>
> Thank you.
>
> Florent
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does 
not change anything. Hence, the rados report is empty. Is there a way to 
stop the recovery wait to start the deep-scrub and get the output? I 
guess the recovery_wait might be caused by missing objects. Do we need 
to delete them first to get the recovery going?


Kevin

On 22.05.19 6:03 PM, Robert LeBlanc wrote:
On Wed, May 22, 2019 at 4:31 AM Kevin Flöh > wrote:


Hi,

thank you, it worked. The PGs are not incomplete anymore. Still we
have
another problem, there are 7 PGs inconsistent and a ceph pg repair is
not doing anything. I just get "instructing pg 1.5dd on osd.24 to
repair" and nothing happens. Does somebody know how we can get the
PGs
to repair?

Regards,

Kevin


Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out 
why they are inconsistent. Do these steps and then we can figure out 
how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix some 
of them)
2. Print out the inconsistent report for each inconsistent PG. `rados 
list-inconsistent-obj  --format=json-pretty`
3. You will want to look at the error messages and see if all the 
shards have the same data.
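
For steps 1 and 2, using the example pg from above (1.5dd), the commands
look roughly like this:

```
ceph pg deep-scrub 1.5dd
rados list-inconsistent-obj 1.5dd --format=json-pretty
```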


Robert LeBlanc
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Dan van der Ster
What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.

On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:
>
> Hi,
>
> we have set the PGs to recover and now they are stuck in 
> active+recovery_wait+degraded and instructing them to deep-scrub does not 
> change anything. Hence, the rados report is empty. Is there a way to stop the 
> recovery wait to start the deep-scrub and get the output? I guess the 
> recovery_wait might be caused by missing objects. Do we need to delete them 
> first to get the recovery going?
>
> Kevin
>
> On 22.05.19 6:03 PM, Robert LeBlanc wrote:
>
> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:
>>
>> Hi,
>>
>> thank you, it worked. The PGs are not incomplete anymore. Still we have
>> another problem, there are 7 PGs inconsistent and a ceph pg repair is
>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
>> repair" and nothing happens. Does somebody know how we can get the PGs
>> to repair?
>>
>> Regards,
>>
>> Kevin
>
>
> Kevin,
>
> I just fixed an inconsistent PG yesterday. You will need to figure out why 
> they are inconsistent. Do these steps and then we can figure out how to 
> proceed.
> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of 
> them)
> 2. Print out the inconsistent report for each inconsistent PG. `rados 
> list-inconsistent-obj  --format=json-pretty`
> 3. You will want to look at the error messages and see if all the shards have 
> the same data.
>
> Robert LeBlanc
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Update mimic to nautilus documentation error

2019-05-23 Thread Andres Rojas Guerrero
Hi all, I have followed the Ceph documentation in order to update from
Mimic to Nautilus:


https://ceph.com/releases/v14-2-0-nautilus-released/

The process went well, but I have seen that two links with important
information don't work:

"v2 network protocol"
"Updating ceph.conf and mon_host"

https://docs.ceph.com/docs/nautilus/rados/configuration/msgr2/#msgr2-ceph-conf

and the information about the telemetry module doesn't exist:

https://ceph.com/mgr/telemetry/#telemetry

It would be nice if they could be corrected.



-- 
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Marc Roos


I have been following this thread for a while, and thought I need to
have a "major ceph disaster" alert on the monitoring ;)
http://www.f1-outsourcing.eu/files/ceph-disaster.mp4




-Original Message-
From: Kevin Flöh [mailto:kevin.fl...@kit.edu] 
Sent: donderdag 23 mei 2019 10:51
To: Robert LeBlanc
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Major ceph disaster

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does 
not change anything. Hence, the rados report is empty. Is there a way to 
stop the recovery wait to start the deep-scrub and get the output? I 
guess the recovery_wait might be caused by missing objects. Do we need 
to delete them first to get the recovery going?


Kevin


On 22.05.19 6:03 PM, Robert LeBlanc wrote:


On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  
wrote:


Hi,

thank you, it worked. The PGs are not incomplete anymore. 
Still we have 
another problem, there are 7 PGs inconsistent and a ceph pg 
repair is 
not doing anything. I just get "instructing pg 1.5dd on osd.24 
to 
repair" and nothing happens. Does somebody know how we can get 
the PGs 
to repair?

Regards,

Kevin



Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure 
out why they are inconsistent. Do these steps and then we can figure out 
how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix 
some of them)
2. Print out the inconsistent report for each inconsistent PG. 
`rados list-inconsistent-obj  --format=json-pretty`
3. You will want to look at the error messages and see if all the 
shards have the same data.

Robert LeBlanc
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh

This is the current status of ceph:


  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_ERR
    9/125481144 objects unfound (0.000%)
    Degraded data redundancy: 9/497011417 objects degraded 
(0.000%), 7 pgs degraded
    9 stuck requests are blocked > 4096 sec. Implicated osds 
1,11,21,32,43,50,65


  services:
    mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
    mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
    mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3 
up:standby

    osd: 96 osds: 96 up, 96 in

  data:
    pools:   2 pools, 4096 pgs
    objects: 125.48M objects, 259TiB
    usage:   370TiB used, 154TiB / 524TiB avail
    pgs: 9/497011417 objects degraded (0.000%)
 9/125481144 objects unfound (0.000%)
 4078 active+clean
 11   active+clean+scrubbing+deep
 7    active+recovery_wait+degraded

  io:
    client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr

On 23.05.19 10:54 AM, Dan van der Ster wrote:

What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.

On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does not 
change anything. Hence, the rados report is empty. Is there a way to stop the 
recovery wait to start the deep-scrub and get the output? I guess the 
recovery_wait might be caused by missing objects. Do we need to delete them 
first to get the recovery going?

Kevin

On 22.05.19 6:03 PM, Robert LeBlanc wrote:

On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:

Hi,

thank you, it worked. The PGs are not incomplete anymore. Still we have
another problem, there are 7 PGs inconsistent and a ceph pg repair is
not doing anything. I just get "instructing pg 1.5dd on osd.24 to
repair" and nothing happens. Does somebody know how we can get the PGs
to repair?

Regards,

Kevin


Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out why they 
are inconsistent. Do these steps and then we can figure out how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them)
2. Print out the inconsistent report for each inconsistent PG. `rados 
list-inconsistent-obj  --format=json-pretty`
3. You will want to look at the error messages and see if all the shards have 
the same data.

Robert LeBlanc


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Dan van der Ster
I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
their degraded PGs.

Open a window with `watch ceph -s`, then in another window slowly do

ceph osd down 1
# then wait a minute or so for that osd.1 to re-peer fully.
ceph osd down 11
...

Continue that for each of the osds with stuck requests, or until there
are no more recovery_wait/degraded PGs.

After each `ceph osd down...`, you should expect to see several PGs
re-peer, and then ideally the slow requests will disappear and the
degraded PGs will become active+clean.
If anything else happens, you should stop and let us know.


-- dan

On Thu, May 23, 2019 at 10:59 AM Kevin Flöh  wrote:
>
> This is the current status of ceph:
>
>
>cluster:
>  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
>  health: HEALTH_ERR
>  9/125481144 objects unfound (0.000%)
>  Degraded data redundancy: 9/497011417 objects degraded
> (0.000%), 7 pgs degraded
>  9 stuck requests are blocked > 4096 sec. Implicated osds
> 1,11,21,32,43,50,65
>
>services:
>  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>  mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
> up:standby
>  osd: 96 osds: 96 up, 96 in
>
>data:
>  pools:   2 pools, 4096 pgs
>  objects: 125.48M objects, 259TiB
>  usage:   370TiB used, 154TiB / 524TiB avail
>  pgs: 9/497011417 objects degraded (0.000%)
>   9/125481144 objects unfound (0.000%)
>   4078 active+clean
>   11   active+clean+scrubbing+deep
>   7active+recovery_wait+degraded
>
>io:
>  client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
>
> On 23.05.19 10:54 AM, Dan van der Ster wrote:
> > What's the full ceph status?
> > Normally recovery_wait just means that the relevant osd's are busy
> > recovering/backfilling another PG.
> >
> > On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:
> >> Hi,
> >>
> >> we have set the PGs to recover and now they are stuck in 
> >> active+recovery_wait+degraded and instructing them to deep-scrub does not 
> >> change anything. Hence, the rados report is empty. Is there a way to stop 
> >> the recovery wait to start the deep-scrub and get the output? I guess the 
> >> recovery_wait might be caused by missing objects. Do we need to delete 
> >> them first to get the recovery going?
> >>
> >> Kevin
> >>
> >> On 22.05.19 6:03 PM, Robert LeBlanc wrote:
> >>
> >> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:
> >>> Hi,
> >>>
> >>> thank you, it worked. The PGs are not incomplete anymore. Still we have
> >>> another problem, there are 7 PGs inconsistent and a ceph pg repair is
> >>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
> >>> repair" and nothing happens. Does somebody know how we can get the PGs
> >>> to repair?
> >>>
> >>> Regards,
> >>>
> >>> Kevin
> >>
> >> Kevin,
> >>
> >> I just fixed an inconsistent PG yesterday. You will need to figure out why 
> >> they are inconsistent. Do these steps and then we can figure out how to 
> >> proceed.
> >> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of 
> >> them)
> >> 2. Print out the inconsistent report for each inconsistent PG. `rados 
> >> list-inconsistent-obj  --format=json-pretty`
> >> 3. You will want to look at the error messages and see if all the shards 
> >> have the same data.
> >>
> >> Robert LeBlanc
> >>
> >>
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph dovecot

2019-05-23 Thread Marc Roos


Sorry for not waiting until it is published on the ceph website, but has 
anyone attended this talk? Is it production ready? 

https://cephalocon2019.sched.com/event/M7j8
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph dovecot

2019-05-23 Thread Wido den Hollander



On 5/23/19 12:02 PM, Marc Roos wrote:
> 
> Sorry for not waiting until it is published on the ceph website, but has 
> anyone attended this talk? Is it production ready? 
> 

Danny from Deutsche Telekom can answer this better, but no, it's not
production ready.

It seems it's more challenging to get it working, especially at the scale
of Telekom (millions of mailboxes).

Wido

> https://cephalocon2019.sched.com/event/M7j8
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph dovecot

2019-05-23 Thread Kai Wagner
Hi Marc,

let me add Danny so he's aware of your request.

Kai

On 23.05.19 12:13, Wido den Hollander wrote:
>
> On 5/23/19 12:02 PM, Marc Roos wrote:
>> Sorry for not waiting until it is published on the ceph website, but has 
>> anyone attended this talk? Is it production ready? 
>>
> Danny from Deutsche Telekom can answer this better, but no, it's not
> production ready.
>
> It seems it's more challenging to get it working, especially at the scale
> of Telekom (millions of mailboxes).
>
> Wido
>
>> https://cephalocon2019.sched.com/event/M7j8
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
-- 
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Update mimic to nautilus documentation error

2019-05-23 Thread Andres Rojas Guerrero
I have found that it's better to follow these links from the
documentation rather than from the Ceph blog:

http://docs.ceph.com/docs/nautilus/releases/nautilus/


Here the links are working.





On 23/5/19 10:56, Andres Rojas Guerrero wrote:
> Hi all, I have followed the Ceph documentation in order to update from
> Mimic to Nautilus:
> 
> 
> https://ceph.com/releases/v14-2-0-nautilus-released/
> 
> The process went well, but I have seen that two links with important
> information don't work:
> 
> "v2 network protocol"
> "Updating ceph.conf and mon_host"
> 
> https://docs.ceph.com/docs/nautilus/rados/configuration/msgr2/#msgr2-ceph-conf
> 
> and the information about the telemetry module doesn't exist:
> 
> https://ceph.com/mgr/telemetry/#telemetry
> 
> It would be nice if they could be corrected.
> 
> 
> 

-- 
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
thank you for this idea, it has improved the situation. Nevertheless, 
there are still 2 PGs in recovery_wait. ceph -s gives me:


  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_WARN
    3/125481112 objects unfound (0.000%)
    Degraded data redundancy: 3/497011315 objects degraded 
(0.000%), 2 pgs degraded


  services:
    mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
    mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
    mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3 
up:standby

    osd: 96 osds: 96 up, 96 in

  data:
    pools:   2 pools, 4096 pgs
    objects: 125.48M objects, 259TiB
    usage:   370TiB used, 154TiB / 524TiB avail
    pgs: 3/497011315 objects degraded (0.000%)
 3/125481112 objects unfound (0.000%)
 4083 active+clean
 10   active+clean+scrubbing+deep
 2    active+recovery_wait+degraded
 1    active+clean+scrubbing

  io:
    client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr


and ceph health detail:

HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data 
redundancy: 3/497011315 objects degraded (0.000%), 2 p

gs degraded
OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
    pg 1.24c has 1 unfound objects
    pg 1.779 has 2 unfound objects
PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded 
(0.000%), 2 pgs degraded
    pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 
unfound
    pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2 
unfound



also the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph 
osd down for all OSDs of the degraded PGs. Do you have any further 
suggestions on how to proceed?


On 23.05.19 11:08 AM, Dan van der Ster wrote:

I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
their degraded PGs.

Open a window with `watch ceph -s`, then in another window slowly do

 ceph osd down 1
 # then wait a minute or so for that osd.1 to re-peer fully.
 ceph osd down 11
 ...

Continue that for each of the osds with stuck requests, or until there
are no more recovery_wait/degraded PGs.

After each `ceph osd down...`, you should expect to see several PGs
re-peer, and then ideally the slow requests will disappear and the
degraded PGs will become active+clean.
If anything else happens, you should stop and let us know.


-- dan

On Thu, May 23, 2019 at 10:59 AM Kevin Flöh  wrote:

This is the current status of ceph:


cluster:
  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
  health: HEALTH_ERR
  9/125481144 objects unfound (0.000%)
  Degraded data redundancy: 9/497011417 objects degraded
(0.000%), 7 pgs degraded
  9 stuck requests are blocked > 4096 sec. Implicated osds
1,11,21,32,43,50,65

services:
  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
  mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
up:standby
  osd: 96 osds: 96 up, 96 in

data:
  pools:   2 pools, 4096 pgs
  objects: 125.48M objects, 259TiB
  usage:   370TiB used, 154TiB / 524TiB avail
  pgs: 9/497011417 objects degraded (0.000%)
   9/125481144 objects unfound (0.000%)
   4078 active+clean
   11   active+clean+scrubbing+deep
   7active+recovery_wait+degraded

io:
  client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr

On 23.05.19 10:54 AM, Dan van der Ster wrote:

What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.

On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does not 
change anything. Hence, the rados report is empty. Is there a way to stop the 
recovery wait to start the deep-scrub and get the output? I guess the 
recovery_wait might be caused by missing objects. Do we need to delete them 
first to get the recovery going?

Kevin

On 22.05.19 6:03 PM, Robert LeBlanc wrote:

On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:

Hi,

thank you, it worked. The PGs are not incomplete anymore. Still we have
another problem, there are 7 PGs inconsistent and a ceph pg repair is
not doing anything. I just get "instructing pg 1.5dd on osd.24 to
repair" and nothing happens. Does somebody know how we can get the PGs
to repair?

Regards,

Kevin

Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out why they 
are inconsistent. Do these steps and then we can figure out how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them)
2. Print out the inconsistent report for each inconsistent PG. `rados 
list-inconsistent-obj  --format=json-pretty`
3

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Alexandre Marangone
The PGs will stay active+recovery_wait+degraded until you solve the unfound
objects issue.
You can follow this doc to look at which objects are unfound [1] and, if
there is no other recourse, mark them lost.

[1]
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unfound-objects
.
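
For reference, the commands from that doc look roughly like this for the PGs
in your health detail (whether you can revert or have to delete depends on
what copies still exist, so read the doc first):

```
ceph pg 1.24c list_unfound
ceph pg 1.24c mark_unfound_lost revert   # or: mark_unfound_lost delete
```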

On Thu, May 23, 2019 at 5:47 AM Kevin Flöh  wrote:

> thank you for this idea, it has improved the situation. Nevertheless,
> there are still 2 PGs in recovery_wait. ceph -s gives me:
>
>cluster:
>  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
>  health: HEALTH_WARN
>  3/125481112 objects unfound (0.000%)
>  Degraded data redundancy: 3/497011315 objects degraded
> (0.000%), 2 pgs degraded
>
>services:
>  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>  mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
> up:standby
>  osd: 96 osds: 96 up, 96 in
>
>data:
>  pools:   2 pools, 4096 pgs
>  objects: 125.48M objects, 259TiB
>  usage:   370TiB used, 154TiB / 524TiB avail
>  pgs: 3/497011315 objects degraded (0.000%)
>   3/125481112 objects unfound (0.000%)
>   4083 active+clean
>   10   active+clean+scrubbing+deep
>   2active+recovery_wait+degraded
>   1active+clean+scrubbing
>
>io:
>  client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr
>
>
> and ceph health detail:
>
> HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data
> redundancy: 3/497011315 objects degraded (0.000%), 2 p
> gs degraded
> OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
>  pg 1.24c has 1 unfound objects
>  pg 1.779 has 2 unfound objects
> PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded
> (0.000%), 2 pgs degraded
>  pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1
> unfound
>  pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2
> unfound
>
>
> also the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph
> osd down for all OSDs of the degraded PGs. Do you have any further
> suggestions on how to proceed?
>
> On 23.05.19 11:08 AM, Dan van der Ster wrote:
> > I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
> > their degraded PGs.
> >
> > Open a window with `watch ceph -s`, then in another window slowly do
> >
> >  ceph osd down 1
> >  # then wait a minute or so for that osd.1 to re-peer fully.
> >  ceph osd down 11
> >  ...
> >
> > Continue that for each of the osds with stuck requests, or until there
> > are no more recovery_wait/degraded PGs.
> >
> > After each `ceph osd down...`, you should expect to see several PGs
> > re-peer, and then ideally the slow requests will disappear and the
> > degraded PGs will become active+clean.
> > If anything else happens, you should stop and let us know.
> >
> >
> > -- dan
> >
> > On Thu, May 23, 2019 at 10:59 AM Kevin Flöh  wrote:
> >> This is the current status of ceph:
> >>
> >>
> >> cluster:
> >>   id: 23e72372-0d44-4cad-b24f-3641b14b86f4
> >>   health: HEALTH_ERR
> >>   9/125481144 objects unfound (0.000%)
> >>   Degraded data redundancy: 9/497011417 objects degraded
> >> (0.000%), 7 pgs degraded
> >>   9 stuck requests are blocked > 4096 sec. Implicated osds
> >> 1,11,21,32,43,50,65
> >>
> >> services:
> >>   mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
> >>   mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
> >>   mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
> >> up:standby
> >>   osd: 96 osds: 96 up, 96 in
> >>
> >> data:
> >>   pools:   2 pools, 4096 pgs
> >>   objects: 125.48M objects, 259TiB
> >>   usage:   370TiB used, 154TiB / 524TiB avail
> >>   pgs: 9/497011417 objects degraded (0.000%)
> >>9/125481144 objects unfound (0.000%)
> >>4078 active+clean
> >>11   active+clean+scrubbing+deep
> >>7active+recovery_wait+degraded
> >>
> >> io:
> >>   client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
> >>
> >> On 23.05.19 10:54 AM, Dan van der Ster wrote:
> >>> What's the full ceph status?
> >>> Normally recovery_wait just means that the relevant osd's are busy
> >>> recovering/backfilling another PG.
> >>>
> >>> On Thu, May 23, 2019 at 10:53 AM Kevin Flöh 
> wrote:
>  Hi,
> 
>  we have set the PGs to recover and now they are stuck in
> active+recovery_wait+degraded and instructing them to deep-scrub does not
> change anything. Hence, the rados report is empty. Is there a way to stop
> the recovery wait to start the deep-scrub and get the output? I guess the
> recovery_wait might be caused by missing objects. Do we need to delete them
> first to get the recovery going?
> 
>  Kevin
> 
>  On 

[ceph-users] large omap object in usage_log_pool

2019-05-23 Thread shubjero
Hi there,

We have an old cluster that was built on Giant that we have maintained and
upgraded over time and are now running Mimic 13.2.5. The other day we
received a HEALTH_WARN about 1 large omap object in the pool '.usage' which
is our usage_log_pool defined in our radosgw zone.

I am trying to understand the purpose of the usage_log_pool and whether or
not we have appropriate settings (shards, replicas, etc) in place.

We were able to identify the 1 large omap object as 'usage.22' in the
.usage pool. This particular "bucket" had over 2 million "omapkeys"

```
for i in `rados -p .usage ls`; do echo $i; rados -p .usage listomapkeys $i | wc -l; done
```
-snip-
usage.13
20
usage.22
2023790
usage.25
14
-snip-

These keys all seem to be metadata/pointers of valid data from our
OpenStack's object storage where we hold about 1PB of unique data.

To resolve the HEALTH_WARN we changed the
'osd_deep_scrub_large_omap_object_key_threshold' from '200' to
'250' using 'ceph config set osd ...' on our Mons.

I'd like to know the importance of this pool as I also noticed that this
pool's replication is only set to 2, instead of 3 like all our other pools
with the exception of .users.email (also 2). If important, I'd like to set
the replication to 3 and curious to know if there would be any negative
impact to the cluster. The .usage pool says 0 bytes used in 'ceph df' but
it contains 30 objects for which there are many omapkeys.
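
For reference, the change I have in mind would just be the following (a
sketch, not yet applied):

```
ceph osd pool set .usage size 3
ceph osd pool set .usage min_size 2
```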

I am also wondering about bucket index max shards for which we have '8' set
in the config.
```"rgw_override_bucket_index_max_shards": "8",```. Should this be
increased?

Thanks in advance for any responses, I have found this mailing list to be
an excellent source of information!

Jared Baker
Ontario Institute for Cancer Research
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [events] Ceph Day Netherlands July 2nd - CFP ends June 3rd

2019-05-23 Thread Mike Perez
Hi everyone,

We will be having Ceph Day Netherlands July 2nd!

https://ceph.com/cephdays/netherlands-2019/

The CFP will be ending June 3rd, so there is still time to get your
Ceph related content in front of the Ceph community ranging from all
levels of expertise:

https://zfrmz.com/E3ouYm0NiPF1b3NLBjJk

If your company is interested in sponsoring the event, we would be
delighted to have you. Please contact me directly for further
information.

Hosted by the Ceph community (and our friends) in select cities around
the world, Ceph Days are full-day events dedicated to fostering our
vibrant community.

In addition to Ceph experts, community members, and vendors, you’ll
hear from production users of Ceph who’ll share what they’ve learned
from their deployments.

Each Ceph Day ends with a Q&A session and cocktail reception. Join us!

--
Mike Perez (thingee)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and multiple RDMA NICs

2019-05-23 Thread Lazuardi Nasution
Hi David and Justinas,

I'm interested in this old thread. Has it been solved? Would you mind
sharing the solution, and a reference regarding David's statement about
some threads on the ML about RDMA?

Best regards,


> Date: Fri, 02 Mar 2018 06:12:18 +
> From: David Turner 
> To: Justinas LINGYS 
> Cc: "ceph-users@lists.ceph.com" 
> Subject: Re: [ceph-users] Ceph and multiple RDMA NICs
> Message-ID:
> <
> can-gepjtyuhyur0qnbrae7zby7tylrnt9h94-nonb4wssao...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> The only communication on the private network for ceph is between the OSDs
> for replication, Erasure coding, backfilling, and recovery. Everything else
> is on the public network. Including communication with clients, mons, MDS,
> rgw and, literally everything else.
>
> I haven't used RDMA, but from the question of ceph public network vs
> private network, that is what they do. You can decide if you want to have 2
> different subnets for them. There have been some threads on the ML about
> RDMA and getting it working.
>
> On Fri, Mar 2, 2018, 12:53 AM Justinas LINGYS 
> wrote:
>
> > Hi David,
> >
> > Thank you for your reply. As I understand your experience with multiple
> > subnets
> > suggests sticking to a single device. However, I have a powerful RDMA NIC
> > (100Gbps) with two ports and I have seen recommendations from Mellanox to
> > separate the
> > two networks. Also, I am planning on having quite a lot of traffic on my
> > private network since it's for a research project which uses machine
> > learning and it stores a lot of data in a Ceph cluster. Considering my
> > case, I assume it is worth the pain separating the two networks to get
> best
> > out the advanced NIC.
> >
> > Justin
> >
> > 
> > From: David Turner 
> > Sent: Thursday, March 1, 2018 9:57:50 PM
> > To: Justinas LINGYS
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Ceph and multiple RDMA NICs
> >
> > There has been some chatter on the ML questioning the need to separate
> out
> > the public and private subnets for Ceph. The trend seems to be in
> > simplifying your configuration which for some is not specifying multiple
> > subnets here.  I haven't heard of anyone complaining about network
> problems
> > with putting private and public on the same subnets, but I have seen a
> lot
> > of people with networking problems by splitting them up.
> >
> > Personally I use vlans for the 2 on the same interface at home and I have
> > 4 port 10Gb nics at the office, so we split that up as well, but even
> there
> > we might be better suited with bonding all 4 together and using a vlan to
> > split traffic.  I wouldn't merge them together since we have graphing on
> > our storage nodes for public and private networks.
> >
> > But the take-away is that if it's too hard to split your public and
> > private subnets... don't.  I doubt you would notice any difference if you
> > were to get it working vs just not doing it.
> >
> > On Thu, Mar 1, 2018 at 3:24 AM Justinas LINGYS  > > wrote:
> > Hi all,
> >
> > I am running a small Ceph cluster (1 MON and 3 OSDs), and it works fine.
> > However, I have a doubt about the two networks (public and cluster) that
> > an OSD uses.
> > There is a reference from Mellanox (
> > https://community.mellanox.com/docs/DOC-2721) how to configure
> > 'ceph.conf'. However, after reading the source code (luminous-stable), I
> > get a feeling that we cannot run Ceph with two NICs/Ports as we only have
> > one 'ms_async_rdma_local_gid' per OSD, and it seems that the source code
> > only uses one option (NIC). I would like to ask how I could communicate
> > with the public network via one RDMA NIC and communicate  with the
> cluster
> > network via another RDMA NIC (apply RoCEV2 to both NICs). Since gids are
> > unique within a machine, how can I use two different gids in 'ceph.conf'?
> >
> > Justin
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephfs free space vs ceph df free space disparity

2019-05-23 Thread Robert Ruge
Ceph newbie question.

I have a disparity between the free space that my cephfs file system is showing 
and what ceph df is showing.
As you can see below, my cephfs file system says there is 9.5TB free, however
ceph df says 186TB is available, which with replication size 3 should equate to
62TB of usable free space.
I guess the basic question is how can I get cephfs to see and use all of the
available space?
I recently changed the number of pgs on the cephfs_data pool from 2048 to 4096,
and this gave me another 8TB, so do I keep increasing the number of pgs or is
there something else that I am missing? I have only been running ceph for ~6
months, so I'm relatively new to it all, and not being able to use all of the
space is just plain bugging me.

# df -h /ceph
FilesystemSize  Used Avail Use% Mounted on
X,y,z:/  107T   97T  9.5T  92% /ceph
# ceph df
GLOBAL:
SIZEAVAIL   RAW USED %RAW USED
495 TiB 186 TiB  310 TiB 62.51
POOLS:
NAMEID USED%USED MAX AVAIL OBJECTS
cephfs_data 1   97 TiB 91.06   9.5 TiB 156401395
cephfs_metadata 2  385 MiB 0   9.5 TiB530590
# ceph osd pool ls detail
pool 1 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash 
rjenkins pg_num 4096 pgp_num 4096 last_change 33914 lfor 0/29945 flags 
hashpspool,nearfull,selfmanaged_snaps stripe_width 0 application cephfs
removed_snaps [2~2]
pool 2 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash 
rjenkins pg_num 256 pgp_num 256 last_change 33914 lfor 0/30369 flags 
hashpspool,nearfull stripe_width 0 application cephfs
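
I suspect the free space cephfs reports is the cephfs_data pool's MAX AVAIL,
which is limited by the fullest OSD (note the nearfull flag on both pools
above) rather than by the raw total, so the per-OSD spread is probably worth
checking; a sketch:

```
# per-OSD utilisation; a few nearly full OSDs will cap the pool's MAX AVAIL
ceph osd df
```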


Regards
Robert Ruge


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] large omap object in usage_log_pool

2019-05-23 Thread Konstantin Shalygin

in the config.
```"rgw_override_bucket_index_max_shards": "8",```. Should this be
increased?


Should be decreased to default `0`, I think.

Modern Ceph releases resolve large omaps automatically via bucket 
dynamic resharding:


```

{
    "option": {
    "name": "rgw_dynamic_resharding",
    "type": "bool",
    "level": "basic",
    "desc": "Enable dynamic resharding",
    "long_desc": "If true, RGW will dynamicall increase the number 
of shards in buckets that have a high number of objects per shard.",

    "default": true,
    "daemon_default": "",
    "tags": [],
    "services": [
    "rgw"
    ],
    "see_also": [
    "rgw_max_objs_per_shard"
    ],
    "min": "",
    "max": ""
    }
}
```

```

{
    "option": {
    "name": "rgw_max_objs_per_shard",
    "type": "int64_t",
    "level": "basic",
    "desc": "Max objects per shard for dynamic resharding",
    "long_desc": "This is the max number of objects per bucket 
index shard that RGW will allow with dynamic resharding. RGW will 
trigger an automatic reshard operation on the bucket if it exceeds this 
number.",

    "default": 10,
    "daemon_default": "",
    "tags": [],
    "services": [
    "rgw"
    ],
    "see_also": [
    "rgw_dynamic_resharding"
    ],
    "min": "",
    "max": ""
    }
}
```


So when your bucket reaches another 100k objects per shard, rgw will reshard
it automatically.


Some old buckets may not be sharded, like your ancient ones from Giant. You 
can check their fill status like this: `radosgw-admin bucket limit check | jq 
'.[]'`. If some buckets are not resharded you can shard them by hand via 
`radosgw-admin reshard add ...`. Also, there may be some stale reshard 
instances (fixed around 12.2.11); you can check with `radosgw-admin 
reshard stale-instances list` and then remove them via `radosgw-admin reshard 
stale-instances rm`.
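
A manual reshard might look roughly like this (the bucket name and shard
count are placeholders; pick a shard count that keeps you well under ~100k
objects per shard):

```
radosgw-admin reshard add --bucket=my-bucket --num-shards=16
radosgw-admin reshard list
radosgw-admin reshard process
```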




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com