[ceph-users] Rebooting one node immediately blocks IO via RGW

2021-10-25 Thread Troels Hansen
I have a strange issue...
It's a 3-node cluster, deployed on Ubuntu in containers, running version
15.2.4, docker.io/ceph/ceph:v15

It's only running RGW, and everything seems fine and everything works.
No errors, and the cluster is healthy.

As soon as one node is restarted, all IO is blocked, apparently because of
slow ops, but I see no reason for it.

It's running as simple as possible, with a replica count of 3.

The second the OSDs on the halted node disappear I see slow ops, but it's
blocking everything, and there is no IO to the cluster.

The slow requests are spread across all of the remaining OSDs.

2021-10-20T05:07:02.554282+0200 mon.prodceph-mon1 [WRN] Health check
failed: 0 slow ops, oldest one blocked for 30 sec, osd.4 has slow ops
(SLOW_OPS)
2021-10-20T05:07:04.652756+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.585995+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:05.629622+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.629660+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:05.629690+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:06.555735+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:06.677696+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:06.677732+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:06.677750+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:07.553717+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:07.643135+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:07.643159+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:07.643175+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:08.368877+0200 mon.prodceph-mon1 [WRN] Health check
update: 0 slow ops, oldest one blocked for 35 sec, osd.4 has slow ops
(SLOW_OPS)
2021-10-20T05:07:08.570167+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:08.570200+0

[ceph-users] Re: Rebooting one node immediately blocks IO via RGW

2021-10-25 Thread Eugen Block

Hi,

what's the pool's min_size?

ceph osd pool ls detail
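
For reference, a sketch of what to look for (pool name and values below are
only an example): if min_size equals size on the RGW pools, losing one of the
three hosts makes those PGs inactive and blocks IO.

ceph osd pool ls detail
# example output line (illustrative):
#   pool 7 'default.rgw.buckets.data' replicated size 3 min_size 2 ...
# the usual setting for a size=3 pool is min_size=2; it can be changed with:
ceph osd pool set default.rgw.buckets.data min_size 2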


Quoting Troels Hansen:


I have a strange issue..
Its a 3 node cluster, deployed on Ubuntu, on containers, running version
15.2.4, docker.io/ceph/ceph:v15

Its only running RGW, and everything seems fine, and everyting works.
No errors and the cluster is healthy.

As soon as one node is restarted all IO is blocked, apparently because of
slow ops, but I see no reason for it.

Its running as simple as possible, with a replica count of 3.

The second the OSD's on the halted node dissapears I see slow ops, but its
blocking everything, and there is no IO to the cluster.

The slow requests are spread accross all of the remaining OSD's.

2021-10-20T05:07:02.554282+0200 mon.prodceph-mon1 [WRN] Health check
failed: 0 slow ops, oldest one blocked for 30 sec, osd.4 has slow ops
(SLOW_OPS)
2021-10-20T05:07:04.652756+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.585995+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:05.629622+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.629660+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:05.629690+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:06.555735+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:06.677696+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:06.677732+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:06.677750+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:07.553717+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:07.643135+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:07.643159+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:07.643175+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:08.368877+0200 mon.prodceph-mon1 [WRN] Health check
update: 0 slow ops, oldest one blocked for 35 sec, osd.4 has slow ops
(SLOW_OPS)
2021-10-20T05:07:08.570167+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084)

[ceph-users] Re: v15.2.15 Octopus released

2021-10-25 Thread Stefan Kooman

On 10/20/21 21:57, David Galloway wrote:

We're happy to announce the 15th backport release in the Octopus series.
We recommend users to update to this release. 


...


Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-15.2.15.tar.gz
* Containers at https://quay.io/repository/ceph/ceph


No containers tagged with version 15.2.15 as of yet. How long does it 
normally take to build containers after a new version gets released?
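
For what it's worth, one way to check which tags are already published,
assuming skopeo is available:

skopeo list-tags docker://quay.io/ceph/ceph    # look for a v15.2.15 tag in the output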


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread Jonas Jelten
Hi Dan,

basically it's this: when you have a server that is so big that CRUSH can't
utilize it the same way as the other, smaller servers because of the placement
constraints, the balancer doesn't balance data on the smaller servers any more,
because it just "sees" the big one as too empty.

To my understanding the mgr-balancer balances hierarchically, on each CRUSH
level.
It moves PGs between buckets on the same level (i.e. from a too-full rack to a
too-empty rack, from a too-full server to a too-empty server, and inside a
server from one OSD to another),
so when there's e.g. an always-too-empty server, that kind of defeats the
algorithm and it doesn't migrate PGs even when the CRUSH constraints would
allow it.
So it won't move PGs from small server 1 (with OSDs at ~90% full) to
small server 2 (with OSDs at ~60%), due to server 3 with OSDs at 30%.
We have servers with 12 TB drives and some with 1 TB drives, and various drive
counts, so this situation emerged...
Since I saw how it could be balanced, but wasn't, I wrote the tool.

I also think that the mgr-balancer approach is good, but the hierarchical
movements are hard to adjust.
But yes, I see my balancer as complementary to the mgr-balancer, and for some
time I used both (since the mgr-balancer is happy with my movements and just
leaves them in place) and it worked well.

-- Jonas

On 20/10/2021 21.41, Dan van der Ster wrote:
> Hi,
> 
> I don't quite understand your "huge server" scenario, other than a basic 
> understanding that the balancer cannot do magic in some impossible cases.
> 
> But anyway, I wonder if this sort of higher order balancing could/should be 
> added as a "part two" to the mgr balancer. The existing code does a quite 
> good job in many (dare I say most?) cases. E.g. it even balances empty 
> clusters perfectly.
> But after it cannot find a further optimization, maybe a heuristic like yours 
> can further refine the placement...
> 
>  Dan
> 
> 
> On Wed, 20 Oct 2021, 20:52 Jonas Jelten wrote:
> 
> Hi Dan,
> 
> I'm not kidding, these were real-world observations, hence my motivation 
> to create this balancer :)
> First I tried "fixing" the mgr balancer, but after understanding the 
> exact algorithm there I thought of a completely different approach.
> 
> For us the main reason things got out of balance was this (from the 
> README):
> > To make things worse, if there's a huge server in the cluster which is 
> so big, CRUSH can't place data often enough on it to fill it to the same 
> level as any other server, the balancer will fail moving PGs across servers 
> that actually would have space.
> > This happens since it sees only this server's OSDs as "underfull", but 
> each PG has one shard on that server already, so no data can be moved on it.
> 
> But all the aspects in that section play together, and I don't think it's 
> easily improvable in mgr-balancer while keeping the same base algorithm.
> 
> Cheers
>   -- Jonas
> 
> On 20/10/2021 19.55, Dan van der Ster wrote:
> > Hi Jonas,
> >
> > From your readme:
> >
> > "the best possible solution is some OSDs having an offset of 1 PG to 
> the ideal count. As a PG-distribution-optimization is done per pool, without 
> checking other pool's distribution at all, some devices will be the +1 more 
> often than others. At worst one OSD is the +1 for each pool in the cluster."
> >
> > That's an interesting observation/flaw which hadn't occurred to me 
> before. I think we don't ever see it in practice in our clusters because we 
> do not have multiple large pools on the same osds.
> >
> > How large are the variances in your real clusters? I hope the example 
> in your readme isn't from real life??
> >
> > Cheers, Dan
> 
> 
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
Good day everybody,

I just came across very strange behavior. I have two buckets where s3cmd
hangs when I try to show current multipart uploads.

When I use --debug I see that it loops over the same response.
What I tried in order to fix it on one bucket:
* radosgw-admin bucket check --bucket=BUCKETNAME
* radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME

The check command now reports an empty array [], but I still can't show the
multiparts. I can interact with the bucket normally (list/put/get
objects).

The debug output always shows the same data and
DEBUG: Listing continues after 'FILENAME'

Has someone already come across this error?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] failing dkim

2021-10-25 Thread mj

Hi,

This is not about ceph, but about this ceph-users mailinglist.

We have recently started using DKIM/DMARC/SPF etc, and since then we 
notice that the emails from this ceph-users mailinglist come with either a

- failing DKIM signature
or
- no DKIM signature
at all.

Many of the other mailinglists I am subscribed to (like postfix, samba, 
sogo) generally pass the DKIM verification.


Does this say something about how this particular ceph-users mailing list
is set up, or is there something we can do about it?


Sorry for being off-topic, please reply privately if this is not allowed
and/or appreciated here.


MJ
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Casey Bodley
hi Boris, this sounds a lot like
https://tracker.ceph.com/issues/49206, which says "When deleting a
bucket with an incomplete multipart upload that has about 2000 parts
uploaded, we noticed an infinite loop, which stopped s3cmd from
deleting the bucket forever."

i'm afraid this fix was merged after nautilus went end-of-life, so
you'd need to upgrade to octopus for it

On Mon, Oct 25, 2021 at 9:52 AM Boris Behrens  wrote:
>
> Good day everybody,
>
> I just came across very strange behavior. I have two buckets where s3cmd
> hangs when I try to show current multipart uploads.
>
> When I use --debug I see that it loops over the same response.
> What I tried to fix it on one bucket:
> * radosgw-admin bucket check --bucket=BUCKETNAME
> * radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME
>
> The check command now reports an empty array [], but I still can't show the
> multiparts. I can interact very normal with the bucket (list/put/get
> objects).
>
> The debug output shows always the same data and
> DEBUG: Listing continues after 'FILENAME'
>
> Did someone already came across this error?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3cmd does not show multiparts in nautilus RGW on specific bucket (--debug shows loop)

2021-10-25 Thread Boris Behrens
Hi Casey,

thanks a lot for that hint. That sounds a lot like this is the problem.
Is there a way to show incomplete multipart uploads via radosgw-admin?

So I would be able to cancel it.

Upgrading to octopus might take a TON of time, as we have 1.1 PiB on 160
rotational OSDs. :)
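
In case it helps others: a sketch of a client-side workaround I'm considering,
using awscli instead of s3cmd (endpoint and names are placeholders). Listing
may of course hit the same server-side loop, but aborting a known upload by
key and upload-id does not depend on the listing:

aws --endpoint-url https://rgw.example.com s3api list-multipart-uploads --bucket BUCKETNAME
aws --endpoint-url https://rgw.example.com s3api abort-multipart-upload \
    --bucket BUCKETNAME --key OBJECTKEY --upload-id UPLOADID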

On Mon, 25 Oct 2021 at 16:19, Casey Bodley wrote:

> hi Boris, this sounds a lot like
> https://tracker.ceph.com/issues/49206, which says "When deleting a
> bucket with an incomplete multipart upload that has about 2000 parts
> uploaded, we noticed an infinite loop, which stopped s3cmd from
> deleting the bucket forever."
>
> i'm afraid this fix was merged after nautilus went end-of-life, so
> you'd need to upgrade to octopus for it
>
> On Mon, Oct 25, 2021 at 9:52 AM Boris Behrens  wrote:
> >
> > Good day everybody,
> >
> > I just came across very strange behavior. I have two buckets where s3cmd
> > hangs when I try to show current multipart uploads.
> >
> > When I use --debug I see that it loops over the same response.
> > What I tried to fix it on one bucket:
> > * radosgw-admin bucket check --bucket=BUCKETNAME
> > * radosgw-admin bucket check --check-objects --fix --bucket=BUCKETNAME
> >
> > The check command now reports an empty array [], but I still can't show
> the
> > multiparts. I can interact very normal with the bucket (list/put/get
> > objects).
> >
> > The debug output shows always the same data and
> > DEBUG: Listing continues after 'FILENAME'
> >
> > Did someone already came across this error?
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>

-- 
The "UTF-8 problems" self-help group will meet, as an exception, in the
large hall this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread E Taka
Hi Jonas,
I'm impressed, Thanks!

I have a question about the usage: do I have to turn off the automatic
balancing feature (ceph balancer off)? Do the upmap balancer and your
customizations get in each other's way, or can I run your script from time
to time?

Thanks
Erich


On Mon, 25 Oct 2021 at 14:50, Jonas Jelten wrote:

> Hi Dan,
>
> basically it's this: when you have a server that is so big, crush can't
> utilize it the same way as the other smaller servers because of the
> placement constraints,
> the balancer doesn't balance data on the smaller servers any more, because
> it just "sees" the big one to be too empty.
>
> To my understanding the mgr-balancer balances hierarchically, on each
> crush level.
> It moves pgs between buckets on the same level (i.e. from too-full-rack to
> too-empty-rack, from too-full-server to too-empty server, inside a server
> from osd to another osd),
> so when there's e.g. an always-too-empty server, it kinda defeats the
> algorithm and doesn't migrate PGs even when the crush constraints would
> allow it.
> So it won't move PGs from small-server 1 (with osds at ~90% full) to
> small-server 2 (with osds at ~60%), due to server 3 with osds at 30%.
> We have servers with 12T drives and some with 1T drives, and various drive
> counts, so that this situation emerged...
> Since I saw how it could be balanced, but wasn't, I wrote the tool.
>
> I also think that the mgr-balancer approach is good, but the hierarchical
> movements are hard to adjust I think.
> But yes, I see my balancer complementary to the mgr-balancer, and for some
> time I used both (since mgr-balance is happy about my movements and just
> leaves them) and it worked well.
>
> -- Jonas
>
> On 20/10/2021 21.41, Dan van der Ster wrote:
> > Hi,
> >
> > I don't quite understand your "huge server" scenario, other than a basic
> understanding that the balancer cannot do magic in some impossible cases.
> >
> > But anyway, I wonder if this sort of higher order balancing could/should
> be added as a "part two" to the mgr balancer. The existing code does a
> quite good job in many (dare I say most?) cases. E.g. it even balances
> empty clusters perfectly.
> > But after it cannot find a further optimization, maybe a heuristic like
> yours can further refine the placement...
> >
> >  Dan
> >
> >
> > On Wed, 20 Oct 2021, 20:52 Jonas Jelten <jel...@in.tum.de> wrote:
> >
> > Hi Dan,
> >
> > I'm not kidding, these were real-world observations, hence my
> motivation to create this balancer :)
> > First I tried "fixing" the mgr balancer, but after understanding the
> exact algorithm there I thought of a completely different approach.
> >
> > For us the main reason things got out of balance was this (from the
> README):
> > > To make things worse, if there's a huge server in the cluster
> which is so big, CRUSH can't place data often enough on it to fill it to
> the same level as any other server, the balancer will fail moving PGs
> across servers that actually would have space.
> > > This happens since it sees only this server's OSDs as "underfull",
> but each PG has one shard on that server already, so no data can be moved
> on it.
> >
> > But all the aspects in that section play together, and I don't think
> it's easily improvable in mgr-balancer while keeping the same base
> algorithm.
> >
> > Cheers
> >   -- Jonas
> >
> > On 20/10/2021 19.55, Dan van der Ster wrote:
> > > Hi Jonas,
> > >
> > > From your readme:
> > >
> > > "the best possible solution is some OSDs having an offset of 1 PG
> to the ideal count. As a PG-distribution-optimization is done per pool,
> without checking other pool's distribution at all, some devices will be the
> +1 more often than others. At worst one OSD is the +1 for each pool in the
> cluster."
> > >
> > > That's an interesting observation/flaw which hadn't occurred to me
> before. I think we don't ever see it in practice in our clusters because we
> do not have multiple large pools on the same osds.
> > >
> > > How large are the variances in your real clusters? I hope the
> example in your readme isn't from real life??
> > >
> > > Cheers, Dan
> >
> >
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> >
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Edward R Huyer
Continuing my containerized Ceph adventures

I'm trying to set up SAML2 auth for the dashboard (specifically pointing at the
institute's Shibboleth service).  The service requires the use of x509
certificates.  Following the instructions in the documentation ( 
https://docs.ceph.com/en/latest/mgr/dashboard/#dashboard-sso-support ) leads to 
an error about the certificate file not existing.

Some digging suggests that's because the daemon is looking in the container's
filesystem rather than the physical host's filesystem.  That makes some sense,
but it is annoying.

So my question is:  How do I get the cert and key file into the container's 
filesystem in a persistent way?  cephadm enter --name "mgr.hostname" results in 
a "no such container" error.  cephadm shell --name "mgr.hostname" works, but 
changes don't persist.

Any suggestions about this problem specifically, authing the dashboard against 
Shibboleth, or SAML2 in general?
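
For reference, a hedged sketch of the workaround I'm considering. It assumes
cephadm's usual bind mount of the mgr data directory (host path
/var/lib/ceph/<fsid>/mgr.<hostname> appearing inside the container as
/var/lib/ceph/mgr/ceph-<hostname>); fsid, hostname and file names below are
placeholders, not verified values:

# on the host that runs the active mgr
cp dashboard-sp.crt dashboard-sp.key /var/lib/ceph/<fsid>/mgr.<hostname>/
# inside the mgr container the same files should then show up under
# /var/lib/ceph/mgr/ceph-<hostname>/, and that in-container path is what I
# would pass as the sp_x_509_cert / sp_private_key arguments of
# 'ceph dashboard sso setup saml2 ...'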

-
Edward Huyer
Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Yury Kirsanov
Hi Edward,
You need to set configuration like this, assuming that certificate and key
are on your local disk:

ceph mgr module disable dashboard
ceph dashboard set-ssl-certificate -i <certificate>.crt
ceph dashboard set-ssl-certificate-key -i <key>.key
ceph config-key set mgr/cephadm/grafana_crt -i <certificate>.crt
ceph config-key set mgr/cephadm/grafana_key -i <key>.key
ceph orch reconfig grafana
ceph mgr module enable dashboard

Hope this helps!

Regards,
Yury.

On Tue, Oct 26, 2021 at 2:45 AM Edward R Huyer  wrote:

> Continuing my containerized Ceph adventures
>
> I'm trying to set up SAML2 auth for the dashboard (specifically pointing
> at the institute Shibboleth service).  The service requires the use of the
> x509 certificates.  Following the instructions in the documentation (
> https://docs.ceph.com/en/latest/mgr/dashboard/#dashboard-sso-support )
> leads to an error about the certificate file not existing.
>
> Some digging suggests that's because the daemon is looking in the
> container's filesystem rather than the physical host's filesystem.  That
> makes some sense, but it annoying.
>
> So my question is:  How do I get the cert and key file into the
> container's filesystem in a persistent way?  cephadm enter --name
> "mgr.hostname" results in a "no such container" error.  cephadm shell
> --name "mgr.hostname" works, but changes don't persist.
>
> Any suggestions about this problem specifically, authing the dashboard
> against Shibboleth, or SAML2 in general?
>
> -
> Edward Huyer
> Golisano College of Computing and Information Sciences
> Rochester Institute of Technology
> Golisano 70-2373
> 152 Lomb Memorial Drive
> Rochester, NY 14623
> 585-475-6651
> erh...@rit.edu
>
> Obligatory Legalese:
> The information transmitted, including attachments, is intended only for
> the person(s) or entity to which it is addressed and may contain
> confidential and/or privileged material. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon
> this information by persons or entities other than the intended recipient
> is prohibited. If you received this in error, please contact the sender and
> destroy any copies of this information.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Edward R Huyer
I don’t think that’s correct?  I already have a certificate set up for HTTPS, 
and it doesn’t show up in the SAML2 configuration.  Maybe I’m mistaken, but I 
think the SAML2 cert is separate from the regular HTTPS cert?

From: Yury Kirsanov 
Sent: Monday, October 25, 2021 11:52 AM
To: Edward R Huyer 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Doing SAML2 Auth With Containerized mgrs



Hi Edward,
You need to set configuration like this, assuming that certificate and key are 
on your local disk:

ceph mgr module disable dashboard
ceph dashboard set-ssl-certificate -i .crt
ceph dashboard set-ssl-certificate-key -i .key
ceph config-key set mgr/cephadm/grafana_crt -i .crt
ceph config-key set mgr/cephadm/grafana_key -i .key
ceph orch reconfig grafana
ceph mgr module enable dashboard

Hope this helps!

Regards,
Yury.

On Tue, Oct 26, 2021 at 2:45 AM Edward R Huyer <erh...@rit.edu> wrote:
Continuing my containerized Ceph adventures

I'm trying to set up SAML2 auth for the dashboard (specifically pointing at the 
institute Shibboleth service).  The service requires the use of the x509 
certificates.  Following the instructions in the documentation ( 
https://docs.ceph.com/en/latest/mgr/dashboard/#dashboard-sso-support ) leads to 
an error about the certificate file not existing.

Some digging suggests that's because the daemon is looking in the container's 
filesystem rather than the physical host's filesystem.  That makes some sense, 
but it annoying.

So my question is:  How do I get the cert and key file into the container's 
filesystem in a persistent way?  cephadm enter --name "mgr.hostname" results in 
a "no such container" error.  cephadm shell --name "mgr.hostname" works, but 
changes don't persist.

Any suggestions about this problem specifically, authing the dashboard against 
Shibboleth, or SAML2 in general?

-
Edward Huyer
Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Yury Kirsanov
Hi Edward,
Yes, you probably are right, I thought about dashboard SSL certificate, not
the SAML2, sorry for that.

Regards,
Yury.

On Tue, Oct 26, 2021 at 3:10 AM Edward R Huyer  wrote:

> I don’t think that’s correct?  I already have a certificate set up for
> HTTPS, and it doesn’t show up in the SAML2 configuration.  Maybe I’m
> mistaken, but I think the SAML2 cert is separate from the regular HTTPS
> cert?
>
>
>
> *From:* Yury Kirsanov 
> *Sent:* Monday, October 25, 2021 11:52 AM
> *To:* Edward R Huyer 
> *Cc:* ceph-users@ceph.io
> *Subject:* Re: [ceph-users] Doing SAML2 Auth With Containerized mgrs
>
>
>
>
> Hi Edward,
>
> You need to set configuration like this, assuming that certificate and key
> are on your local disk:
>
> ceph mgr module disable dashboard
> ceph dashboard set-ssl-certificate -i .crt
> ceph dashboard set-ssl-certificate-key -i .key
> ceph config-key set mgr/cephadm/grafana_crt -i .crt
> ceph config-key set mgr/cephadm/grafana_key -i .key
> ceph orch reconfig grafana
> ceph mgr module enable dashboard
>
> Hope this helps!
>
> Regards,
> Yury.
>
>
>
> On Tue, Oct 26, 2021 at 2:45 AM Edward R Huyer  wrote:
>
> Continuing my containerized Ceph adventures
>
> I'm trying to set up SAML2 auth for the dashboard (specifically pointing
> at the institute Shibboleth service).  The service requires the use of the
> x509 certificates.  Following the instructions in the documentation (
> https://docs.ceph.com/en/latest/mgr/dashboard/#dashboard-sso-support )
> leads to an error about the certificate file not existing.
>
> Some digging suggests that's because the daemon is looking in the
> container's filesystem rather than the physical host's filesystem.  That
> makes some sense, but it annoying.
>
> So my question is:  How do I get the cert and key file into the
> container's filesystem in a persistent way?  cephadm enter --name
> "mgr.hostname" results in a "no such container" error.  cephadm shell
> --name "mgr.hostname" works, but changes don't persist.
>
> Any suggestions about this problem specifically, authing the dashboard
> against Shibboleth, or SAML2 in general?
>
> -
> Edward Huyer
> Golisano College of Computing and Information Sciences
> Rochester Institute of Technology
> Golisano 70-2373
> 152 Lomb Memorial Drive
> Rochester, NY 14623
> 585-475-6651
> erh...@rit.edu
>
> Obligatory Legalese:
> The information transmitted, including attachments, is intended only for
> the person(s) or entity to which it is addressed and may contain
> confidential and/or privileged material. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon
> this information by persons or entities other than the intended recipient
> is prohibited. If you received this in error, please contact the sender and
> destroy any copies of this information.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

2021-10-25 Thread Edward R Huyer
No worries.  It's a pretty specific problem, and the documentation could be 
better.

-Original Message-
From: Yury Kirsanov  
Sent: Monday, October 25, 2021 12:17 PM
To: Edward R Huyer 
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Doing SAML2 Auth With Containerized mgrs

Hi Edward,
Yes, you probably are right, I thought about dashboard SSL certificate, not the 
SAML2, sorry for that.

Regards,
Yury.

On Tue, Oct 26, 2021 at 3:10 AM Edward R Huyer  wrote:

> I don’t think that’s correct?  I already have a certificate set up for 
> HTTPS, and it doesn’t show up in the SAML2 configuration.  Maybe I’m 
> mistaken, but I think the SAML2 cert is separate from the regular 
> HTTPS cert?
>
>
>
> *From:* Yury Kirsanov 
> *Sent:* Monday, October 25, 2021 11:52 AM
> *To:* Edward R Huyer 
> *Cc:* ceph-users@ceph.io
> *Subject:* Re: [ceph-users] Doing SAML2 Auth With Containerized mgrs
>
>
>
>
> Hi Edward,
>
> You need to set configuration like this, assuming that certificate and 
> key are on your local disk:
>
> ceph mgr module disable dashboard
> ceph dashboard set-ssl-certificate -i <certificate>.crt
> ceph dashboard set-ssl-certificate-key -i <key>.key
> ceph config-key set mgr/cephadm/grafana_crt -i <certificate>.crt
> ceph config-key set mgr/cephadm/grafana_key -i <key>.key
> ceph orch reconfig grafana
> ceph mgr module enable dashboard
>
> Hope this helps!
>
> Regards,
> Yury.
>
>
>
> On Tue, Oct 26, 2021 at 2:45 AM Edward R Huyer  wrote:
>
> Continuing my containerized Ceph adventures
>
> I'm trying to set up SAML2 auth for the dashboard (specifically 
> pointing at the institute Shibboleth service).  The service requires 
> the use of the
> x509 certificates.  Following the instructions in the documentation ( 
> https://docs.ceph.com/en/latest/mgr/dashboard/#dashboard-sso-support ) 
> leads to an error about the certificate file not existing.
>
> Some digging suggests that's because the daemon is looking in the 
> container's filesystem rather than the physical host's filesystem.  
> That makes some sense, but it annoying.
>
> So my question is:  How do I get the cert and key file into the 
> container's filesystem in a persistent way?  cephadm enter --name 
> "mgr.hostname" results in a "no such container" error.  cephadm shell 
> --name "mgr.hostname" works, but changes don't persist.
>
> Any suggestions about this problem specifically, authing the dashboard 
> against Shibboleth, or SAML2 in general?
>
> -
> Edward Huyer
> Golisano College of Computing and Information Sciences Rochester 
> Institute of Technology Golisano 70-2373
> 152 Lomb Memorial Drive
> Rochester, NY 14623
> 585-475-6651
> erh...@rit.edu
>
> Obligatory Legalese:
> The information transmitted, including attachments, is intended only 
> for the person(s) or entity to which it is addressed and may contain 
> confidential and/or privileged material. Any review, retransmission, 
> dissemination or other use of, or taking of any action in reliance 
> upon this information by persons or entities other than the intended 
> recipient is prohibited. If you received this in error, please contact 
> the sender and destroy any copies of this information.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
> email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-10-25 Thread DHilsbos
MJ;

Assuming that you have a replicated pool with 3 replicas and min_size = 2, I 
would think stopping 2 OSD daemons, or 2 OSD containers would guarantee 
HEALTH_ERR.  Similarly, if you have a replicated pool with 2 replicas, still 
with min_size = 2, stopping 1 OSD should do the trick.
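
A minimal sketch for a CI job (OSD ids are illustrative, and this assumes a
throwaway test cluster with size=3 / min_size=2 pools):

systemctl stop ceph-osd@1 ceph-osd@2     # or stop the two OSD containers
ceph health detail                       # watch which HEALTH_ checks fire
systemctl start ceph-osd@1 ceph-osd@2    # revert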

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: mj [mailto:li...@merit.unu.edu] 
Sent: Saturday, October 23, 2021 4:06 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: How to make HEALTH_ERR quickly and pain-free



Op 21-01-2021 om 11:57 schreef George Shuklin:
> I have hell of the question: how to make HEALTH_ERR status for a cluster 
> without consequences?
> 
> I'm working on CI tests and I need to check if our reaction to 
> HEALTH_ERR is good. For this I need to take an empty cluster with an 
> empty pool and do something. Preferably quick and reversible.
> 
> For HEALTH_WARN the best thing I found is to change pool size to 1, it 
> raises "1 pool(s) have no replicas configured" warning almost instantly 
> and it can be reverted very quickly for empty pool.

To get HEALTH_WARN we always simply set something like noout, but we 
also wonder if there's a nice way to set HEALTH_ERR, for the same purpose.

Anyone..?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: failing dkim

2021-10-25 Thread DHilsbos
MJ;

A lot of mailing lists "rewrite" the origin address to one that matches the 
mailing list server.
Here's an example from the Samba mailing list: "samba 
; on behalf of; Rowland Penny via samba 
".

This mailing list relays the email, without modifying the sender, or the 
envelope address.  For this email, you see a @performair.com email coming from 
a ceph.io (RedHat?) server.

I don't know if that's the cause, but that's a significant difference between 
this and other mailing lists.

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: mj [mailto:li...@merit.unu.edu] 
Sent: Monday, October 25, 2021 7:10 AM
To: ceph-users
Subject: [ceph-users] failing dkim

Hi,

This is not about ceph, but about this ceph-users mailinglist.

We have recently started using DKIM/DMARC/SPF etc, and since then we 
notice that the emails from this ceph-users mailinglist come with either a
- failing DKIM signature
or
- no DKIM signature
at all.

Many of the other mailinglists I am subscribed to (like postfix, samba, 
sogo) generally pass the DKIM verification.

Does this say something about how this particular ceph-users mailinglist 
is setup, or is there something we can do about it..?

Sorry for being off-topic, please reply privately if this is not allowed 
and or appreciated here.

MJ
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Rebooting one node immediately blocks IO via RGW

2021-10-25 Thread DHilsbos
Troels;

This sounds like a failure domain issue.  If I remember correctly, Ceph 
defaults to a failure domain of disk (osd), while you need a failure domain of 
host.

Could you do a ceph -s while one of the hosts is offline?  You're looking for 
the HEALTH_ flag, and any errors other than slow ops.

Also, what major version of Ceph are you running?
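
A quick way to check this (pool and rule names in the output will differ):

ceph osd pool ls detail     # note size, min_size, and crush_rule for the RGW pools
ceph osd crush rule dump    # look at the chooseleaf step: type "host" vs. type "osd"
ceph osd tree               # confirm the OSDs are grouped under host buckets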

Thank you,

Dominic L. Hilsbos, MBA
Vice President - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com


-Original Message-
From: Troels Hansen [mailto:t...@miracle.dk] 
Sent: Monday, October 25, 2021 12:55 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Rebooting one node immediately blocks IO via RGW

I have a strange issue..
Its a 3 node cluster, deployed on Ubuntu, on containers, running version
15.2.4, docker.io/ceph/ceph:v15

Its only running RGW, and everything seems fine, and everyting works.
No errors and the cluster is healthy.

As soon as one node is restarted all IO is blocked, apparently because of
slow ops, but I see no reason for it.

Its running as simple as possible, with a replica count of 3.

The second the OSD's on the halted node dissapears I see slow ops, but its
blocking everything, and there is no IO to the cluster.

The slow requests are spread accross all of the remaining OSD's.

2021-10-20T05:07:02.554282+0200 mon.prodceph-mon1 [WRN] Health check
failed: 0 slow ops, oldest one blocked for 30 sec, osd.4 has slow ops
(SLOW_OPS)
2021-10-20T05:07:04.652756+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.585995+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:05.629622+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:05.629660+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:05.629690+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:06.555735+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:06.677696+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:06.677732+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:06.677750+0200 osd.13 [WRN] slow request
osd_op(client.305099.0:3244269 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94522369776384] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.402403+ currently delayed
2021-10-20T05:07:07.553717+0200 osd.25 [WRN] slow request
osd_op(client.394158.0:62776921 7.1f3
7:cfb51b5f:::5a288701-a65a-45c0-97c7-edfb38f2f487.124110.147864_b19283e9-c7bd-448e-952d-2f172467fa5c:head
[getxattrs,stat,read 0~4194304] snapc 0=[] ondisk+read+known_if_redirected
e18084) initiated 2021-10-20T03:06:35.106815+ currently delayed
2021-10-20T05:07:07.643135+0200 osd.13 [WRN] slow request
osd_op(client.394115.0:2994408 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94796974922496] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:34.010528+ currently delayed
2021-10-20T05:07:07.643159+0200 osd.13 [WRN] slow request
osd_op(client.394158.0:62776924 4.d 4:b4812045:::notify.4:head [watch ping
cookie 94141521019648] snapc 0=[] ondisk+write+known_if_redirected e18084)
initiated 2021-10-20T03:06:35.165999+ currently delayed
2021-10-20T05:07:07.643175+0200 osd.13 [WRN] slow request
osd_op(client.3050

[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread Jonas Jelten
Hi Erich!

Yes, in most cases the mgr-balancer will happily accept jj-balancer movements
and neither reverts nor worsens its optimizations.
It just generates new upmap items or removes existing ones, just like the
mgr-balancer (which has to be in upmap mode, of course).
So the intended usage is that you can run the script from time to time, yes,
especially when your cluster changes, PGs are moved due to failures, ...
A continuous mode is yet to be implemented :)

You can best see how things are going (I think) with Prometheus/Grafana, and
with `show --osds --sort-utilization`.
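
For example (a sketch; the script name is assumed from the project repository,
so adjust to however you invoke the tool):

ceph balancer status                                      # check the mgr balancer mode/state first
./placementoptimizer.py show --osds --sort-utilization    # the per-OSD utilization view mentioned above
ceph osd df tree                                          # built-in view for comparison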

I hope it's useful for you, maybe you can report back your experience!

-- Jonas

On 25/10/2021 17.23, E Taka wrote:
> Hi Jonas, 
> I'm impressed, Thanks!
> 
> I have a question about the usage: do I have to turn off the automatic 
> balancing feature (ceph balancer off)? Do the upmap balancer and your 
> customizations get in each other's way, or can I run your script from time to 
> time?
> 
> Thanks
> Erich

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: jj's "improved" ceph balancer

2021-10-25 Thread Jonas Jelten
Hi Josh,

yes, there are many factors to optimize... which makes it kinda hard to achieve
an optimal solution.

I think we have to consider all these things, in ascending priority:

* 1: Minimize distance to CRUSH (prefer fewest upmaps, and remove upmap items 
if balance is better)
* 2: Relocation of PGs in remapped state (since they are not fully moved yet, 
hence 'easier' to relocate)
* 3: Per-Pool PG distribution, respecting OSD device size -> ideal_pg_count =
osd_size * (pg_num / sum(possible_osd_sizes)); see the worked example below
* 4: Primary/EC-N distribution (all osds have equal primary/EC-N counts, for 
workload balancing, not respecting device size (for hdd at least?), else this 
is just 3)
* 5: Capacity balancing (all osds equally full)
* 6: And of course CRUSH constraints

Beautiful optimization problem, which could be fed into a solver :)
My approach currently optimizes for 3, 5, 6, iteratively...
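
A tiny worked example for point 3 (the numbers are made up): a 12 TiB OSD in a
cluster whose candidate OSDs sum to 300 TiB, and a pool with pg_num=1024:

python3 -c 'osd_size=12.0; sum_osd_sizes=300.0; pg_num=1024; print(osd_size * (pg_num / sum_osd_sizes))'
# -> 40.96, i.e. that OSD should ideally hold about 41 PGs of this pool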

> My only comment about what you did is that it should somehow work pool by 
> pool and manage the +-1 globally.

I think this is already implemented!
Since in each iteration I pick the "fullest" device first, it has to have more
pools (or data) than other OSDs (e.g. through +1), and we try to migrate a PG
off it.
And we only migrate a particular PG of a pool from such a source OSD if it has
>ideal_amount_for_pool (float, hence we allow moving +1s or worse).
Same for a destination OSD: it's only selected if it has fewer PGs of that pool
than the ideal amount.

> Hi Jonas,
> 
> I want to clarify a bit my thoughts (it may be long) regarding balancing in 
> general.
> 
> 1 - Balancing the capacity correctly is of top priority, this is because we 
> all know that the system is as full as the fullest device and as a storage 
> system we can't allow large capacity which is wasted and can't be used. This 
> is a top functional requirement.
> 2 - Workload balancing is a performance requirement, and an important one, 
> but we should not optimize workload on behalf of capacity so the challenge is 
> how to do both simultaneously. (hint: it is not always possible, but when 
> this is not possible the system performs less than the aggregated performance 
> of the devices)
> 
> Assumption 1: Per pool the workload on a PG is linear with the capacity, 
> which means either all PGs have the same workload (#PGs is a power of 2) or 
> some PGs has exactly twice the load as the others. From now on I will assume 
> the number of PGs is a power of 2, since the adjustments to the other case 
> are pretty simple. 
> 
> Conclusion 1: Balancing capacity based on all the PGs is the system may cause 
> workload imbalance - balancing capacity should be done on a pool by pool 
> basis. (assume 2 pools H(ot) and C(old) with exactly the same settings (#PGs, 
> capacity and protection scheme). If you balance per PG capacity only you can 
> have a device with all the PGs from C pool and a device with all the PGs from 
> the H pool -
> This will cause the second device to be fully loaded while the first device 
> is idle). 
> 
> On the other hand, your point about the +-1 PGs when working on a pool by 
> pool basis is correct and should be fixed (when working on pool by pool basis)
> 
> When all the devices are identical, the other thing we need to do for 
> balancing the workload is balancing the primaries (on a pool by pool basis) - 
> this means that when the capacity is balanced (every OSD has the same number 
> of PGs per pool) every OSD has also the same number of primaries (+-1) per 
> pool. This is mainly important for replicated pools, for EC pools it is 
> important (but less
> critical) when working without "fast read" mode, and does not have any effect 
> with EC pools with "fast read" mode enabled. (For EC pools we need to balance 
> the N OSDs from N+K and not only the primaries - think about replica-3 as a 
> special case of EC with 1+2)
> 
> Now what happens when the devices are not identical - 
> In case of mixing technologies (SSD and HDD) - (this is not recommended, but 
> you can see some use cases for this in my SDC presentation 
> ) - without 
> going into deep details the easiest solution is make all the faster (I mean 
> much faster such as HDD/SSD or SSD/PM) devices always primaries and all the 
> slow devices never primaries
> (assuming you always keep at least one copy on a fast device). More on this 
> in the presentation.  
> 
> The last case is when there are relatively minor performance differences 
> between the devices (HDD with different RPM rate, or devices with the same 
> technology and not the same size, but not a huge difference - I believe that 
> when on device has X times the capacity as others when X > replica-count, we 
> can't balance any more, but I need to complete my calculations). In these 
> cases, assuming we know
> something about the workload (R/W ratio) we can balance workload by giving 
> more primaries to the faster or smaller devices relative to the slower or 
> lar

[ceph-users] Re: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true

2021-10-25 Thread mgrzybowski

Hi Igor

In ceph.conf:

[osd]
debug bluestore = 10/30

systemctl  start ceph-osd@2


~# ls -alh /var/log/ceph/ceph-osd.2.log
-rw-r--r-- 1 ceph ceph 416M paź 25 21:08 /var/log/ceph/ceph-osd.2.log

cat /var/log/ceph/ceph-osd.2.log | gzip > ceph-osd.2.log.gz

Full compressed log on gdrive: 
https://drive.google.com/file/d/1Z3Yh92RQBgE1IGRsLdJrgM8qvLHXotBw/view?usp=sharing

~# tail -n 700  /var/log/ceph/ceph-osd.2.log
  -540> 2021-10-25T20:52:24.141+0200 7f0c27906f00 10 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values 25.a_head oid 
#25:5000head# = 0
  -539> 2021-10-25T20:52:24.141+0200 7f0c27906f00 20 _unpin0x55655e742800   
#25:5000head# unpinned
  -538> 2021-10-25T20:52:24.141+0200 7f0c27906f00 15 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values 25.a_head oid 
#25:5000head#
  -537> 2021-10-25T20:52:24.141+0200 7f0c27906f00 30 
bluestore.OnodeSpace(0x556567002320 in 0x55655e742800) lookup
  -536> 2021-10-25T20:52:24.141+0200 7f0c27906f00 30 
bluestore.OnodeSpace(0x556567002320 in 0x55655e742800) lookup 
#25:5000head# hit 0x556564efea00 1 1 0
  -535> 2021-10-25T20:52:24.141+0200 7f0c27906f00 20 _pin0x55655e742800   
#25:5000head# pinned
  -534> 2021-10-25T20:52:24.141+0200 7f0c27906f00 20 
bluestore.onode(0x556564efea00).flush flush done
  -533> 2021-10-25T20:52:24.141+0200 7f0c27906f00 30 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values  got 0x04F830DB'._epoch' 
-> _epoch
  -532> 2021-10-25T20:52:24.141+0200 7f0c27906f00 30 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values  got 
0x04F830DB'._infover' -> _infover
  -531> 2021-10-25T20:52:24.141+0200 7f0c27906f00 10 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values 25.a_head oid 
#25:5000head# = 0
  -530> 2021-10-25T20:52:24.141+0200 7f0c27906f00 20 _unpin0x55655e742800   
#25:5000head# unpinned
  -529> 2021-10-25T20:52:24.189+0200 7f0c1adc4700 20 
bluestore.MempoolThread(0x55655d98eb90) _resize_shards cache_size: 733714456 
kv_alloc: 318767104 kv_used: 61399496 kv_onode_alloc: 42949672 kv_onode_used: -22 
meta_alloc: 293601280 meta_used: 26058 data_alloc: 100663296 data_used: 40960
  -528> 2021-10-25T20:52:24.189+0200 7f0c1adc4700 30 
bluestore.MempoolThread(0x55655d98eb90) _resize_shards max_shard_onodes: 2816 
max_shard_buffer: 3145728
  -527> 2021-10-25T20:52:24.237+0200 7f0c27906f00  5 osd.2 pg_epoch: 23167 
pg[25.a(unlocked)] enter Initial
  -526> 2021-10-25T20:52:24.237+0200 7f0c165bb700 20 
bluestore(/var/lib/ceph/osd/ceph-2) _txc_apply_kv onode 0x556564efe500 had 1
  -525> 2021-10-25T20:52:24.237+0200 7f0c27906f00 15 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values 25.a_head oid 
#25:5000head#
  -524> 2021-10-25T20:52:24.237+0200 7f0c165bb700 10 HybridAllocator allocate 
want 0x1 unit 0x1 max_alloc_size 0x1 hint 0x0
  -523> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore.OnodeSpace(0x556567002320 in 0x55655e742800) lookup
  -522> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore.OnodeSpace(0x556567002320 in 0x55655e742800) lookup 
#25:5000head# hit 0x556564efea00 1 1 0
  -521> 2021-10-25T20:52:24.237+0200 7f0c27906f00 20 _pin0x55655e742800   
#25:5000head# pinned
  -520> 2021-10-25T20:52:24.237+0200 7f0c27906f00 20 
bluestore.onode(0x556564efea00).flush flush done
  -519> 2021-10-25T20:52:24.237+0200 7f0c165bb700 20 AvlAllocator _allocate 
first fit=5111808 size=65536
  -518> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values  got 
0x04F830DB'._biginfo' -> _biginfo
  -517> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values  got 0x04F830DB'._info' 
-> _info
  -516> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values  got 
0x04F830DB'._infover' -> _infover
  -515> 2021-10-25T20:52:24.237+0200 7f0c27906f00 10 
bluestore(/var/lib/ceph/osd/ceph-2) omap_get_values 25.a_head oid 
#25:5000head# = 0
  -514> 2021-10-25T20:52:24.237+0200 7f0c27906f00 20 _unpin0x55655e742800   
#25:5000head# unpinned
  -513> 2021-10-25T20:52:24.237+0200 7f0c27906f00 10 
bluestore(/var/lib/ceph/osd/ceph-2) stat 25.a_head #25:5000head#
  -512> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore.OnodeSpace(0x556567002320 in 0x55655e742800) lookup
  -511> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore.OnodeSpace(0x556567002320 in 0x55655e742800) lookup 
#25:5000head# hit 0x556564efea00 1 1 0
  -510> 2021-10-25T20:52:24.237+0200 7f0c27906f00 20 _pin0x55655e742800   
#25:5000head# pinned
  -509> 2021-10-25T20:52:24.237+0200 7f0c27906f00 20 _unpin0x55655e742800   
#25:5000head# unpinned
  -508> 2021-10-25T20:52:24.237+0200 7f0c27906f00 10 
bluestore(/var/lib/ceph/osd/ceph-2) get_omap_iterator 25.a_head 
#25:5000head#
  -507> 2021-10-25T20:52:24.237+0200 7f0c27906f00 30 
bluestore.OnodeSpace(0x556567002320 in 0x55655e74

[ceph-users] Re: RGW/multisite sync traffic rps

2021-10-25 Thread Stefan Schueffler
Hi Istvan,

we don't have that many users constantly uploading or deleting objects. In our 
environment there are very few PUTs per second (around 2 - 10). This is exactly 
what makes me wonder about the huge number of sync requests - there should 
hardly be any changes to synchronize at all.

Stefan


On 22.10.2021, at 19:50, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:

I see the same issue (45k GET requests constantly as admin). My guess is that 
the primary site writes its changes to the datalog and the secondary sites keep 
pulling these logs as they change.
Do you have users who are constantly uploading or deleting?
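
(If that is what is happening, the datalog backlog on the primary can be 
inspected directly; a generic sketch, not specific to this cluster:)

radosgw-admin datalog status      # per-shard markers on the source zone
radosgw-admin sync status         # overall metadata/data sync state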

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2021. Oct 22., at 10:46, Stefan Schueffler <s.schueff...@softgarden.de> wrote:


Hi,

I have a question on RGW/multisite. The sync traffic generates a lot of 
requests per second (around 1500), which seems high, especially compared to the 
actual volume of user/client requests.

We have a rather simple multisite-setup with
- two ceph clusters (16.2.6), 1 realm, 1 zonegroup, and one zone on each side, 
one of them is the master zone.
- latency between those clusters around 0.3 ms
- each cluster has 3 RGW/beast daemons running.
- a handful of buckets (around 20), and a check script which creates one bucket 
per second (and deletes it after validating the successful bucket creation).
- one of the buckets has a few million (smaller) objects, the others are (more 
or less) empty.
- from the client side, there are just a few requests per second (mostly PUT 
objects into the one larger bucket), writing a few kilobytes per second.
- roughly 5 GB in total disk size consumed currently, with the idea to increase 
the total consumption to a few TB over time.

Both clusters are in sync (after the initial full sync, they now do incremental 
sync). Although they do sync the new objects from cluster A (the master, to which 
the clients connect) to B, we see a lot of „internal“ sync requests in our
monitoring: each rgw daemon does about 500 requests per second to a rgw daemon 
on cluster A, especially to "/admin/log?…", which leads to a total of 1500 
requests per second just for the sync, and this results in almost 60% cpu usage 
for the rgw/beast processes.

When stopping and restarting the rgw instances on cluster B, they first catch up 
with the delta, and as soon as they finish, they go back to polling 
"/admin/log…" in this endless loop.

Is this amount of internal, sync-related requests normal and expected?

Thanks for any ideas how to debug / introspect this.
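
(For anyone wanting to dig into this, the obvious starting points would be 
something like the following, run against cluster B; the zone and bucket names 
are placeholders, and the last line assumes the rgw_sync debug subsystem is 
available in 16.2.x:)

radosgw-admin sync status
radosgw-admin data sync status --source-zone=<master-zone>
radosgw-admin bucket sync status --bucket=<busy-bucket>
# temporarily raise sync logging on one rgw daemon:
ceph config set client.rgw.<instance> debug_rgw_sync 20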

Best
Stefan

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm does not find podman objects for osds

2021-10-25 Thread Magnus Harlander
Hi,

after converting my 2 node cluster to cephadm I'm in lots of trouble.

- containerized mds are not available in the cluster. I must
  run mds from systemd to make my fs available.

- osd podman objects are not found after a reboot of one node. I
  don't want to test it on the second node, because if it loses
  the podman config as well, my cluster is dead!

Can anybody help me?

Best regards Magnus

Output from cephadm.log:

cephadm ['--image', 'docker.io/ceph/ceph:v15', '--no-container-init', 'ls']
2021-10-25 22:47:01,929 DEBUG container_init=False
2021-10-25 22:47:01,929 DEBUG Running command: systemctl is-enabled
ceph-mds@s1
2021-10-25 22:47:01,935 DEBUG systemctl: stdout enabled
2021-10-25 22:47:01,935 DEBUG Running command: systemctl is-active
ceph-mds@s1
2021-10-25 22:47:01,940 DEBUG systemctl: stdout active
2021-10-25 22:47:01,940 DEBUG Running command: ceph -v
2021-10-25 22:47:02,009 DEBUG ceph: stdout ceph version 15.2.13
(c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
2021-10-25 22:47:02,016 DEBUG Running command: systemctl is-enabled
ceph-osd@1
2021-10-25 22:47:02,024 DEBUG systemctl: stdout disabled
2021-10-25 22:47:02,024 DEBUG Running command: systemctl is-active
ceph-osd@1
2021-10-25 22:47:02,031 DEBUG systemctl: stdout inactive
2021-10-25 22:47:02,031 DEBUG Running command: systemctl is-enabled
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@mon.s1
2021-10-25 22:47:02,036 DEBUG systemctl: stdout enabled
2021-10-25 22:47:02,036 DEBUG Running command: systemctl is-active
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@mon.s1
2021-10-25 22:47:02,041 DEBUG systemctl: stdout active
2021-10-25 22:47:02,041 DEBUG Running command: /bin/podman --version
2021-10-25 22:47:02,068 DEBUG /bin/podman: stdout podman version 3.2.3
2021-10-25 22:47:02,069 DEBUG Running command: /bin/podman inspect
--format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index
.Config.Labels "io.ceph.version"}}
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-mon.s1
2021-10-25 22:47:02,111 DEBUG /bin/podman: stdout
1fd6debed26923212e3b3b263e88505d2b70fe024b7a1c01105299bb746d7c48,docker.io/ceph/ceph:v15,2cf504fded3980c76b59a354fca8f301941f86e369215a08752874d1ddb69b73,2021-10-25
22:45:39.315992691 +0200 CEST,
2021-10-25 22:47:02,203 DEBUG Running command: /bin/podman exec
1fd6debed26923212e3b3b263e88505d2b70fe024b7a1c01105299bb746d7c48 ceph -v
2021-10-25 22:47:02,356 DEBUG /bin/podman: stdout ceph version 15.2.13
(c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
2021-10-25 22:47:02,434 DEBUG Running command: systemctl is-enabled
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@mgr.s1
2021-10-25 22:47:02,440 DEBUG systemctl: stdout enabled
2021-10-25 22:47:02,440 DEBUG Running command: systemctl is-active
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@mgr.s1
2021-10-25 22:47:02,445 DEBUG systemctl: stdout active
2021-10-25 22:47:02,445 DEBUG Running command: /bin/podman --version
2021-10-25 22:47:02,468 DEBUG /bin/podman: stdout podman version 3.2.3
2021-10-25 22:47:02,470 DEBUG Running command: /bin/podman inspect
--format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index
.Config.Labels "io.ceph.version"}}
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-mgr.s1
2021-10-25 22:47:02,513 DEBUG /bin/podman: stdout
2b08ddb6182e14985939e50a00e2306a77c3068a65ed122dab8bd5604c91af65,docker.io/ceph/ceph:v15,2cf504fded3980c76b59a354fca8f301941f86e369215a08752874d1ddb69b73,2021-10-25
22:45:39.535973449 +0200 CEST,
2021-10-25 22:47:02,601 DEBUG Running command: systemctl is-enabled
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@osd.0
2021-10-25 22:47:02,607 DEBUG systemctl: stdout enabled
2021-10-25 22:47:02,607 DEBUG Running command: systemctl is-active
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@osd.0
2021-10-25 22:47:02,612 DEBUG systemctl: stdout activating
2021-10-25 22:47:02,612 DEBUG Running command: /bin/podman --version
2021-10-25 22:47:02,636 DEBUG /bin/podman: stdout podman version 3.2.3
2021-10-25 22:47:02,637 DEBUG Running command: /bin/podman inspect
--format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index
.Config.Labels "io.ceph.version"}}
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.0
2021-10-25 22:47:02,709 DEBUG /bin/podman: stderr Error: error
inspecting object: no such object:
"ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.0"
2021-10-25 22:47:02,712 DEBUG Running command: systemctl is-enabled
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@osd.2
2021-10-25 22:47:02,718 DEBUG systemctl: stdout enabled
2021-10-25 22:47:02,718 DEBUG Running command: systemctl is-active
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524@osd.2
2021-10-25 22:47:02,723 DEBUG systemctl: stdout activating
2021-10-25 22:47:02,723 DEBUG Running command: /bin/podman --version
2021-10-25 22:47:02,747 DEBUG /bin/podman: stdout podman version 3.2.3
2021-10-25 22:47:02,748 DEBUG Running command: /bin/podman inspect
--format {{.Id}},{{.Config.Image}},{{.Image}},{{.Created}},{{index
.Config.Labels "io.ceph.version"}}
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.2

[ceph-users] Re: MDS not becoming active after migrating to cephadm

2021-10-25 Thread Magnus Harlander
Hi,

I just migrated to cephadm on my 2 node octopus cluster.
I have the same problem: the MDS daemons started in containers are
not visible to ceph. I had to run the old systemd mds to keep the
fs available.

some outputs:


[root@s0 ~]# ceph health detail
HEALTH_WARN 2 stray daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_DAEMON: 2 stray daemon(s) not managed by cephadm
    stray daemon mds.s0 on host s0.harlan.de not managed by cephadm
    stray daemon mds.s1 on host s1.harlan.de not managed by cephadm


[root@s0 ~]# ceph -s
  cluster:
    id: 86bbd6c5-ae96-4c78-8a5e-50623f0ae524
    health: HEALTH_WARN
    2 stray daemon(s) not managed by cephadm

  services:
    mon: 3 daemons, quorum s0,s1,r1 (age 2h)
    mgr: s1(active, since 2h), standbys: s0
    mds: fs:1 {0=s0=up:active} 1 up:standby
    osd: 10 osds: 10 up (since 2h), 10 in (since 11h)

  data:
    pools:   6 pools, 289 pgs
    objects: 1.85M objects, 1.7 TiB
    usage:   3.6 TiB used, 13 TiB / 16 TiB avail
    pgs: 289 active+clean

  io:
    client:   85 B/s rd, 855 KiB/s wr, 0 op/s rd, 110 op/s wr



root@r1:/tmp# ceph fs ls
name: fs, metadata pool: cfs_md, data pools: [cfs ]


root@r1:/tmp# ceph orch ps --daemon-type mds
NAME  HOST  STATUS    REFRESHED  AGE  VERSION 
IMAGE NAME   IMAGE ID  CONTAINER ID  
mds.fs.s0.khuhto  s0.harlan.de  running (2h)  3m ago 2h   15.2.13 
docker.io/ceph/ceph:v15  2cf504fded39  0a65ce57d168  
mds.fs.s1.ajxyaf  s1.harlan.de  running (2h)  3m ago 2h   15.2.13 
docker.io/ceph/ceph:v15  2cf504fded39  407bd3bdb334 

Both working MDS daemons are running from systemctl, not cephadm. When I stop
them, the fs is not available any more.
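
(For anyone following along: whether the containerised MDS ever register with 
the mons at all should be visible in the MDS map, e.g.:)

ceph fs status fs      # which daemons hold rank 0, which are standby
ceph mds stat          # compact view of the MDS map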


[root@s0 qemu]# systemctl status ceph-mds@s0
● ceph-mds@s0.service - Ceph metadata server daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mds@.service; enabled; vendor 
preset: disabled)
   Active: active (running) since Mon 2021-10-25 18:31:11 CEST; 2h 34min ago
 Main PID: 326528 (ceph-mds)
Tasks: 23
   Memory: 1.1G
   CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@s0.service
   └─326528 /usr/bin/ceph-mds -f --cluster ceph --id s0 --setuser ceph 
--setgroup ceph

Okt 25 18:31:11 s0.harlan.de systemd[1]: Started Ceph metadata server daemon.
Okt 25 18:31:11 s0.harlan.de ceph-mds[326528]: starting mds.s0 at


[root@s1 ceph]# systemctl status ceph-mds@s1
● ceph-mds@s1.service - Ceph metadata server daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mds@.service; enabled; vendor 
preset: disabled)
   Active: active (running) since Mon 2021-10-25 20:58:17 CEST; 6min ago
 Main PID: 266482 (ceph-mds)
Tasks: 15
   Memory: 15.2M
   CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@s1.service
   └─266482 /usr/bin/ceph-mds -f --cluster ceph --id s1 --setuser ceph 
--setgroup ceph

Oct 25 20:58:17 s1.harlan.de systemd[1]: Started Ceph metadata server daemon.
Oct 25 20:58:17 s1.harlan.de ceph-mds[266482]: starting mds.s1 at


[root@s0 qemu]# podman ps
CONTAINER ID  IMAGECOMMAND   CREATED  
STATUS  PORTS   NAMES
4a66b2a1b9d1  docker.io/ceph/ceph:v15  -n mon.s0 -f --se...  3 hours ago  Up 3 
hours ago  ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-mon.s0
4319d986bbfc  docker.io/ceph/ceph:v15  -n mgr.s0 -f --se...  3 hours ago  Up 3 
hours ago  ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-mgr.s0
58bd2d0b1f3d  docker.io/ceph/ceph:v15  -n osd.1 -f --set...  3 hours ago  Up 3 
hours ago  ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.1
14f80276cb4a  docker.io/ceph/ceph:v15  -n osd.3 -f --set...  3 hours ago  Up 3 
hours ago  ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.3
37f51999a723  docker.io/ceph/ceph:v15  -n osd.4 -f --set...  3 hours ago  Up 3 
hours ago  ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.4
fda7ef3bd7ea  docker.io/ceph/ceph:v15  -n osd.5 -f --set...  3 hours ago  Up 3 
hours ago  ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.5
d390a53b3d29  docker.io/ceph/ceph:v15  -n osd.9 -f --set...  3 hours ago  Up 3 
hours ago  ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-osd.9
0a65ce57d168  docker.io/ceph/ceph:v15  -n mds.fs.s0.khuh...  3 hours ago  Up 3 
hours ago  
ceph-86bbd6c5-ae96-4c78-8a5e-50623f0ae524-mds.fs.s0.khuhto


[root@s1 ceph]# podman ps
CONTAINER ID  

[ceph-users] Re: Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true

2021-10-25 Thread Igor Fedotov

Hi Beard,

curious if that cluster had been created by a pre-Nautilus release, e.g. 
Luminous or Kraken?



Thanks,

Igor

On 10/22/2021 3:53 PM, Beard Lionel wrote:

Hi,

I had exactly the same behaviour:
- upgrade from nautilus to pacific
- same warning message
- set config option
- restart osd. I first restarted one osd and it was fine, so I decided to 
restart all osds of the same host, and about half of the osds can't start 
anymore with the same error as you.

We didn't find any workaround, apart from deleting and recreating the failed osds ☹
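
(For reference, such a destroy-and-recreate cycle looks roughly like this; the 
OSD id and device path are placeholders, and the exact ceph-volume steps may 
differ with cephadm:)

ceph osd out 12
systemctl stop ceph-osd@12
ceph osd destroy 12 --yes-i-really-mean-it
ceph-volume lvm zap --destroy --osd-id 12
ceph-volume lvm create --osd-id 12 --data /dev/sdX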

For MDS, which was also crashing, I had to follow the recovery procedure to 
recover my data: 
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery

Cordialement, Regards,
Lionel BEARD
CLS - IT & Operations

-Original Message-
From: Marek Grzybowski, on behalf of mgrzybowski
Sent: Wednesday, 20 October 2021 23:56  To: ceph-users@ceph.io  Subject: [ceph-users] 
Upgrade to 16.2.6 and osd+mds crash after bluestore_fsck_quick_fix_on_mount true


Hi
Recently I did perform upgrades on single node cephfs server i have.

# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ecpoolk3m1osd ecpoolk5m1osd ecpoolk4m2osd ]

~# ceph osd pool ls detail
pool 20 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 10674 lfor 0/0/5088 flags hashpspool stripe_width 0 application cephfs
pool 21 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 10674 lfor 0/0/5179 flags hashpspool stripe_width 0 application cephfs
pool 22 'ecpoolk3m1osd' erasure profile myprofilek3m1osd size 4 min_size 3 crush_rule 3 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode warn last_change 10674 lfor 0/0/1442 flags hashpspool,ec_overwrites stripe_width 12288 compression_algorithm zstd compression_mode aggressive application cephfs
pool 23 'ecpoolk5m1osd' erasure profile myprofilek5m1osd size 6 min_size 5 crush_rule 5 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 12517 lfor 0/0/7892 flags hashpspool,ec_overwrites stripe_width 20480 compression_algorithm zstd compression_mode aggressive application cephfs
pool 24 'ecpoolk4m2osd' erasure profile myprofilek4m2osd size 6 min_size 5 crush_rule 6 object_hash rjenkins pg_num 64 pgp_num 64 autoscale_mode warn last_change 10674 flags hashpspool,ec_overwrites stripe_width 16384 compression_algorithm zstd compression_mode aggressive application cephfs
pool 25 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 11033 lfor 0/0/10991 flags hashpspool stripe_width 0 pg_num_min 1 application mgr_devicehealth


I started this upgrade from ubuntu 16.04 and luminous (there were upgrades in 
the past, and some osd's may have been created back on Kraken):
- first I upgraded ceph to Nautilus; all seemed to go well and, according to 
the docs, there were no warnings in status
- then I did "do-release-upgrade" to ubuntu 18.04 (ceph packages were not 
touched by that upgrade)
- then I did "do-release-upgrade" to ubuntu 20.04 (this upgrade bumped the ceph 
packages to 15.2.1-0ubuntu1; before each do-release-upgrade I removed 
/etc/ceph/ceph.conf, so at least the mon daemon was down, and the osd's should 
not start since the volumes are encrypted)
- next I upgraded the ceph packages to 16.2.6-1focal and started the daemons.

All seemed to work well; the only thing left was the warning:

10 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats

I found on the list that it is recommend to set:

ceph config set osd bluestore_fsck_quick_fix_on_mount true

and rolling restart OSDs. After the first restart+fsck I got a crash on the OSD 
(and on the MDS too):

  -1> 2021-10-14T22:02:45.877+0200 7f7f080a4f00 -1 
/build/ceph-16.2.6/src/osd/PG.cc: In function 'static int 
PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*)' thread 7f7f080a4f00 time 
2021-10-14T22:02:45.878154+0200
/build/ceph-16.2.6/src/osd/PG.cc: 1009: FAILED ceph_assert(values.size() == 2)
   ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific 
(stable)
   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x152) [0x55e29cd0ce61]
   2: /usr/bin/ceph-osd(+0xac6069) [0x55e29cd0d069]
   3: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*)+0xa17) 
[0x55e29ce97057]
   4: (OSD::load_pgs()+0x6b4) [0x55e29ce07ec4]
   5: (OSD::init()+0x2b4e) [0x55e29ce14a6e]
   6: main()
   7: __libc_start_main()
   8: _start()


The same happened on the next restart+fsck of an osd:

  -1> 2021-10-17T22:47:49.291+0200 7f98877bff00 -1 
/build/ceph-16.2.6/src/osd/PG.cc: In function 'static int 
PG::peek_map_epoch(Objec

[ceph-users] 16.2.6 OSD down, out but container running....

2021-10-25 Thread Marco Pizzolo
Hello Everyone,

I'm seeing an issue where the podman container is running, but the osd is
being reported as down and out. Restarting the service doesn't help, and
neither does rebooting the host.

What am I missing?

Thanks,
Marco
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-25 Thread 胡 玮文
Could you post the logs of the problematic OSDs? E.g.:

cephadm logs --name osd.0
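
(Or, equivalently, straight from journald on the host; cephadm units are named 
after the cluster fsid:)

journalctl -u ceph-$(ceph fsid)@osd.0.service --no-pager -n 200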

From: Marco Pizzolo
Sent: 26 October 2021 7:15
To: ceph-users
Subject: [ceph-users] 16.2.6 OSD down, out but container running

Hello Everyone,

I'm seeing an issue where the podman container is running, but the osd is
being reported as down and out. Restarting the service doesn't help, and
neither does rebooting the host.

What am I missing?

Thanks,
Marco
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.6 OSD down, out but container running....

2021-10-25 Thread Stefan Kooman

On 10/26/21 01:14, Marco Pizzolo wrote:

Hello Everyone,

I'm seeing an issue where the podman container is running, but the osd is
being reported as down and out. Restarting the service doesn't help, and
neither does rebooting the host.

What am I missing?


Can you try:

ceph osd in $osd.id
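
(where $osd.id is the numeric id of the affected OSD, e.g.:)

ceph osd tree | grep down     # find the id that is down/out
ceph osd in 7                 # example id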

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io