[ceph-users] One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Hi Guys

I am busy removing an OSD from my rook-ceph cluster. I did 'ceph osd out
osd.7'  and the re-balancing process started. Now it has stalled with one
pg on "active+undersized+degraded". I have done this before and it has
worked fine.

# ceph health detail
HEALTH_WARN Degraded data redundancy: 15/94659 objects degraded (0.016%), 1
pg degraded, 1 pg undersized
[WRN] PG_DEGRADED: Degraded data redundancy: 15/94659 objects degraded
(0.016%), 1 pg degraded, 1 pg undersized
pg 3.1f is stuck undersized for 2h, current state
active+undersized+degraded, last acting [0,2]

# ceph pg dump_stuck
PG_STAT  STATE                       UP     UP_PRIMARY  ACTING  ACTING_PRIMARY
3.1f     active+undersized+degraded  [0,2]  0           [0,2]   0

I have lots of OSDs on different nodes:

# ceph osd tree
ID   CLASS  WEIGHT    TYPE NAME                        STATUS  REWEIGHT  PRI-AFF
 -1         13.77573  root default
 -5         13.77573      region FSN1
-22          0.73419          zone FSN1-DC13
-21                0              host node5-redacted-com
-27          0.73419              host node7-redacted-com
  1    ssd   0.36710                  osd.1                up       1.0      1.0
  5    ssd   0.36710                  osd.5                up       1.0      1.0
-10          6.20297          zone FSN1-DC14
 -9          6.20297              host node3-redacted-com
  2    ssd   3.10149                  osd.2                up       1.0      1.0
  4    ssd   3.10149                  osd.4                up       1.0      1.0
-18          3.19919          zone FSN1-DC15
-17          3.19919              host node4-redacted-com
  7    ssd   3.19919                  osd.7              down         0      1.0
 -4          2.90518          zone FSN1-DC16
 -3          2.90518              host node1-redacted-com
  0    ssd   1.45259                  osd.0                up       1.0      1.0
  3    ssd   1.45259                  osd.3                up       1.0      1.0
-14          0.73419          zone FSN1-DC18
-13                0              host node2-redacted-com
-25          0.73419              host node6-redacted-com
 10    ssd   0.36710                  osd.10               up       1.0      1.0
 11    ssd   0.36710                  osd.11               up       1.0      1.0
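
For reference, when only two OSDs end up in the acting set of what looks like a size-3 pool, it can help to check what CRUSH is actually allowed to pick. A minimal, hedged set of inspection commands; the CRUSH rule name below is an assumption, substitute your own:

# ceph osd pool ls detail                    # confirm the pool's size/min_size and which crush rule it uses
# ceph pg map 3.1f                           # show the up/acting sets for the stuck PG
# ceph osd crush rule dump replicated_rule   # check the failure domain (zone vs host) the rule enforces
# ceph osd df tree                           # spot full or near-full candidates CRUSH might be skipping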

Any ideas on how to fix this?

Thanks
David
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Sure. Tx.

# ceph pg 3.1f query
{
"snap_trimq": "[]",
"snap_trimq_len": 0,
"state": "active+undersized+degraded",
"epoch": 2477,
"up": [
0,
2
],
"acting": [
0,
2
],
"acting_recovery_backfill": [
"0",
"2"
],
"info": {
"pgid": "3.1f",
"last_update": "2475'68370",
"last_complete": "2475'68370",
"log_tail": "2197'61385",
"last_user_version": 68361,
"last_backfill": "MAX",
"purged_snaps": [],
"history": {
"epoch_created": 35,
"epoch_pool_created": 35,
"last_epoch_started": 2371,
"last_interval_started": 2364,
"last_epoch_clean": 2150,
"last_interval_clean": 2149,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 2364,
"same_interval_since": 2364,
"same_primary_since": 2149,
"last_scrub": "2228'68323",
"last_scrub_stamp": "2021-11-18T02:13:56.969240+",
"last_deep_scrub": "2216'64918",
"last_deep_scrub_stamp": "2021-11-16T20:37:17.620680+",
"last_clean_scrub_stamp": "2021-11-18T02:13:56.969240+",
"prior_readable_until_ub": 0
},
"stats": {
"version": "2475'68370",
"reported_seq": "239508",
"reported_epoch": "2477",
"state": "active+undersized+degraded",
"last_fresh": "2021-11-18T09:52:22.117896+",
"last_change": "2021-11-18T07:13:25.620031+",
"last_active": "2021-11-18T09:52:22.117896+",
"last_peered": "2021-11-18T09:52:22.117896+",
"last_clean": "2021-11-18T07:11:26.622323+",
"last_became_active": "2021-11-18T07:13:25.620031+",
"last_became_peered": "2021-11-18T07:13:25.620031+",
"last_unstale": "2021-11-18T09:52:22.117896+",
"last_undegraded": "2021-11-18T07:13:25.618970+",
"last_fullsized": "2021-11-18T07:13:25.618860+",
"mapping_epoch": 2364,
"log_start": "2197'61385",
"ondisk_log_start": "2197'61385",
"created": 35,
"last_epoch_clean": 2150,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "2228'68323",
"last_scrub_stamp": "2021-11-18T02:13:56.969240+",
"last_deep_scrub": "2216'64918",
"last_deep_scrub_stamp": "2021-11-16T20:37:17.620680+",
"last_clean_scrub_stamp": "2021-11-18T02:13:56.969240+",
"log_size": 6985,
"ondisk_log_size": 6985,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 12583465,
"num_objects": 15,
"num_object_clones": 0,
"num_object_copies": 45,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 15,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 15,
"num_whiteouts": 0,
"num_read": 167793,
"num_read_kb": 438718,
"num_write": 68025,
"num_write_kb": 373419,
"num_scrub_errors": 0,
"num_shallow_scrub_errors": 0,
"num_deep_scrub_errors": 0,
"num_objects_recovered": 48,
"num_bytes_recovered": 21150091,
"num_keys_recovered": 47,
"num_objects_omap": 11,
"num_objects_hit_set_archive": 0,
"num_bytes_hit_set_archive": 0,
"num_flush": 0,
"num_flush_kb": 0,
"num_evict": 0,
"num_evict_kb": 0,
"num_promote": 0,
"num_flush_mode_high": 0,
"num_flush_mode_low": 0,
"num_evict_mode_some": 0,
"num_evict_mode_full": 0,
"num_objects_pinned": 0,
"num_legacy_snapsets": 0,
"num_large_omap_objects": 0,
"num_objects_manifest": 0,
"num_omap_bytes": 7536,
"num_omap_keys": 16,
"num_objects_repaired": 0
},
"up": [
0,
2
],
"acting": [
0,
2
],
"avail_no_missing": [
"0",
"2"

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
I just grepped all the OSD pod logs for error and warn and nothing comes up:

# k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk  | grep -i warn
etc

I am assuming that would bring back something if any of them were unhappy.



On Thu, Nov 18, 2021 at 1:26 PM Stefan Kooman  wrote:

> On 11/18/21 11:45, David Tinker wrote:
>
> 
>
> >  "recovery_state": [
> >  {
> >  "name": "Started/Primary/Active",
> >  "enter_time": "2021-11-18T07:13:25.618950+",
> >  "might_have_unfound": [],
> >  "recovery_progress": {
> >  "backfill_targets": [],
> >  "waiting_on_backfill": [],
> >  "last_backfill_started": "MIN",
> >  "backfill_info": {
> >  "begin": "MIN",
> >  "end": "MIN",
> >  "objects": []
> >  },
> >  "peer_backfill_info": [],
> >  "backfills_in_flight": [],
> >  "recovering": [],
> >  "pg_backend": {
> >  "pull_from_peer": [],
> >  "pushing": []
> >  }
> >  }
> >  },
> >  {
> >  "name": "Started",
> >  "enter_time": "2021-11-18T07:13:25.618794+"
> >  },
> >  {
> >  "scrubber.epoch_start": "2149",
> >  "scrubber.active": false,
> >  "scrubber.state": "INACTIVE",
> >  "scrubber.start": "MIN",
> >  "scrubber.end": "MIN",
> >  "scrubber.max_end": "MIN",
> >  "scrubber.subset_last_update": "0'0",
> >  "scrubber.deep": false,
> >  "scrubber.waiting_on_whom": []
> >  }
> >  ],
> >  "agent_state": {}
>
> Nothing unusual in the recovery_state. I expected a reason why Ceph
> could not make progress.
>
> Is there logged something in osd.0 that might give a hint what is going
> on here?
>
> Gr. Stefan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Peter Lieven

On 17.11.21 at 20:14, Marc wrote:
>> a good choice. It lacks RBD encryption and read leases. But for us
>> upgrading from N to O or P is currently not
>>
> what about just using osd encryption with N?



That would be Data at Rest encryption only. The keys for the OSDs are stored on 
the mons. Data is transferred unencrypted over the wire.

RBD encryption takes place in the client.
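
For reference, a hedged sketch of the two options being contrasted here; the device and image names are assumptions:

# ceph-volume lvm create --data /dev/sdb --dmcrypt         # data-at-rest encryption, set up when the OSD is created
# rbd encryption format rbd/myimage luks2 passphrase.bin   # client-side RBD encryption (Pacific and later)

With the first option only the on-disk data is encrypted; traffic between client and OSD stays plaintext unless msgr2 secure mode is enabled, which matches the point above.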


Peter


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread Dan van der Ster
Hi,

We sometimes have similar stuck client recall warnings.
To debug you can try:

(1) ceph health detail
 that will show you the client ids which are generating the
warning. (e.g. 1234)
(2) ceph tell mds.* client ls id=1234
 this will show lots of client statistics for the session. Notably
you need to look at the num_caps and the various recall metrics, and
see how they are changing over time.

From experience this can be one of two or three things:
1. the client is very quickly iterating through files and the mds
isn't recalling them fast enough -- that shouldn't be the case if
you're running the default caps recall tuning on 16.2.6.
2. the client has some files open and cannot release the caps it holds
for some reason. We've seen some apps behave like this, and also I've
noticed that if users mount cephfs on another cephfs, or use bind
mounts, in some cases the caps of the "lower" cephfs will simply never
be released. I've found that issuing an `ls -lR` in the working dir of
the other client sometimes provokes the caps to be released. Or, if
they really will never release, you can try `echo 2 >
/proc/sys/vm/drop_caches` or umount / mount on the client.

If the cause isn't obvious to you, you can share some client session
stats and we can try to help here.
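
For step (2), a hedged example of watching a suspect session over time; the client id 1234 is just the placeholder used above:

# ceph health detail | grep -i recall       # find the client ids behind the warning
# ceph tell mds.* client ls id=1234         # dump session stats (num_caps, recall counters)
# watch -n 60 "ceph tell mds.* client ls id=1234 | grep -E 'num_caps|recall'"   # rough way to see whether the numbers move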

Best Regards,

dan





On Thu, Nov 18, 2021 at 6:36 AM 胡 玮文  wrote:
>
> Hi all,
>
> We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster, it 
> seems harmless, but we cannot get HEALTH_OK, which is annoying.
>
> The clients that are reported failing to respond to cache pressure are 
> constantly changing, and most of the time we got 1-5 such clients out of ~20. 
> All of the clients are kernel clients, running HWE kernel 5.11 of Ubuntu 
> 20.04. The load is pretty low.
>
> We are reading datasets that consist of millions of small files from cephfs, 
> so we have tuned some config for performance. Some configs from "ceph config 
> dump" that might be relevant:
>
> WHO     LEVEL     OPTION                   VALUE
> mds     basic     mds_cache_memory_limit   51539607552
> mds     advanced  mds_max_caps_per_client  8388608
> client  basic     client_cache_size        32768
>
> We also manually pinned almost every directory to either rank 0 or rank 1.
>
> Any thoughts about what causes the warning, or how can we get rid of it?
>
> Thanks,
> Weiwen Hu
>
>
> # ceph -s
>   cluster:
> id: e88d509a-f6fc-11ea-b25d-a0423f3ac864
> health: HEALTH_WARN
> 4 clients failing to respond to cache pressure
>
>   services:
> mon: 5 daemons, quorum gpu024,gpu006,gpu023,gpu013,gpu018 (age 7d)
> mgr: gpu014.kwbqcf(active, since 2w), standbys: gpu024.bapbcz
> mds: 2/2 daemons up, 2 hot standby
> osd: 45 osds: 45 up (since 2h), 45 in (since 5d)
> rgw: 2 daemons active (2 hosts, 1 zones)
>
>   data:
> volumes: 1/1 healthy
> pools:   16 pools, 1713 pgs
> objects: 265.84M objects, 55 TiB
> usage:   115 TiB used, 93 TiB / 208 TiB avail
> pgs: 1711 active+clean
>      2    active+clean+scrubbing+deep
>
>   io:
> client:   55 MiB/s rd, 5.2 MiB/s wr, 513 op/s rd, 14 op/s wr
>
>
> # ceph fs status
> cephfs - 23 clients
> ==
> RANK  STATE           MDS                   ACTIVITY       DNS    INOS   DIRS   CAPS
>  0    active          cephfs.gpu018.ovxvoz  Reqs:  241 /s  17.3M  17.3M  41.3k  5054k
>  1    active          cephfs.gpu023.aetiph  Reqs:    1 /s  13.1M  12.1M  864k   586k
> 1-s   standby-replay  cephfs.gpu006.ddpekw  Evts:    2 /s  2517k  2393k  216k   0
> 0-s   standby-replay  cephfs.gpu024.rpfbnh  Evts:   17 /s  9587k  9587k  214k   0
>           POOL            TYPE      USED   AVAIL
>   cephfs.cephfs.meta      metadata  126G   350G
>   cephfs.cephfs.data      data      102T   25.9T
>  cephfs.cephfs.data_ssd   data      0      525G
> cephfs.cephfs.data_mixed  data      9.81T  350G
>  cephfs.cephfs.data_ec    data      166G   41.4T
> MDS version: ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) 
> pacific (stable)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Maged Mokhtar

Hello Cephers,

I too am for LTS releases, or for some kind of middle ground like a longer
release cycle and/or having even-numbered releases designated for
production like before. We all use LTS releases for the base OS when
running Ceph, yet in reality we depend much more on the Ceph code than on
the base OS.


Another thing we hear our users want, after stability, is performance.
It ultimately determines the cost of the storage solution. I think this
should be high on the priority list. I know there has been a lot of
effort with Crimson development for a while, but in my opinion, if Ceph
were run by a purely commercial company, getting this out the door as
quickly as possible would take priority.


We may have different opinions on priorities, but one thing is for sure:
Ceph is the best storage solution hands down, so kudos to all involved.


/Maged


On 18/11/2021 09:51, Janne Johansson wrote:

Den ons 17 nov. 2021 kl 18:41 skrev Dave Hall :

The real point here:  From what I'm reading in this mailing list it appears
that most non-developers are currently afraid to risk an upgrade to Octopus
or Pacific.  If this is an accurate perception then THIS IS THE ONLY
PROBLEM.

You might also consider that Luminous had a bad streak somewhere in
the middle, so if people
are cautious about .0 / .1 releases, wait until .5-.9 and still get
burnt, that feeling gets stuck in your mind.

Kraken was experimental, so Hammer and Jewel clusters waited for L to
settle, then they got all kinds of
weird bugs in the middle of that release cycle anyway.

Jumping to a newish Mimic might not have felt like the best option to
escape Lum bugs.
Half of the problem of running into bugs like the ones in Lum is that
you often need to be able to move back out of them, before moving
forward again.

There is no guarantee that the developed-in-parallel M point release
.0/.1/.2 will have
corrective code that fixes the newly introduced errors in Lum, so
holding out for the
next Lum point will sometimes feel safer.

If you wonder why people wait for Oct to have ten or so releases
before upgrading to it,
meaning they are stuck in something that is unsupported by the time
Oct has "proven itself",
this would be one of the reasons. For new clusters, I would not mind
starting with as
late a release as possible.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
If I ignore the dire warnings about losing data and do:
ceph osd purge 7

will I lose data? There are still 2 copies of everything, right?

I need to remove the node with the OSD from the k8s cluster, reinstall it
and have it re-join the cluster. This will bring in some new OSDs and maybe
Ceph will use them to sort out the stuck PG?

Is there a way to trigger Ceph to try to find another OSD for the stuck pg?
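
Before purging, Ceph can tell you whether that would be safe; a short, hedged check (the command output is the authority here):

# ceph osd safe-to-destroy osd.7            # reports whether every PG that used osd.7 has enough copies elsewhere
# ceph osd ok-to-stop osd.7                 # checks whether PGs would become unavailable
# ceph osd purge 7 --yes-i-really-mean-it   # only once safe-to-destroy is happy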

On Thu, Nov 18, 2021 at 2:20 PM David Tinker  wrote:

> I just grepped all the OSD pod logs for error and warn and nothing comes
> up:
>
> # k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk  | grep -i warn
> etc
>
> I am assuming that would bring back something if any of them were unhappy.
>
>
>
> On Thu, Nov 18, 2021 at 1:26 PM Stefan Kooman  wrote:
>
>> On 11/18/21 11:45, David Tinker wrote:
>>
>> 
>>
>> >  "recovery_state": [
>> >  {
>> >  "name": "Started/Primary/Active",
>> >  "enter_time": "2021-11-18T07:13:25.618950+",
>> >  "might_have_unfound": [],
>> >  "recovery_progress": {
>> >  "backfill_targets": [],
>> >  "waiting_on_backfill": [],
>> >  "last_backfill_started": "MIN",
>> >  "backfill_info": {
>> >  "begin": "MIN",
>> >  "end": "MIN",
>> >  "objects": []
>> >  },
>> >  "peer_backfill_info": [],
>> >  "backfills_in_flight": [],
>> >  "recovering": [],
>> >  "pg_backend": {
>> >  "pull_from_peer": [],
>> >  "pushing": []
>> >  }
>> >  }
>> >  },
>> >  {
>> >  "name": "Started",
>> >  "enter_time": "2021-11-18T07:13:25.618794+"
>> >  },
>> >  {
>> >  "scrubber.epoch_start": "2149",
>> >  "scrubber.active": false,
>> >  "scrubber.state": "INACTIVE",
>> >  "scrubber.start": "MIN",
>> >  "scrubber.end": "MIN",
>> >  "scrubber.max_end": "MIN",
>> >  "scrubber.subset_last_update": "0'0",
>> >  "scrubber.deep": false,
>> >  "scrubber.waiting_on_whom": []
>> >  }
>> >  ],
>> >  "agent_state": {}
>>
>> Nothing unusual in the recovery_state. I expected a reason why Ceph
>> could not make progress.
>>
>> Is there logged something in osd.0 that might give a hint what is going
>> on here?
>>
>> Gr. Stefan
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Tx.

# ceph version
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
(stable)



On Thu, Nov 18, 2021 at 3:28 PM Stefan Kooman  wrote:

> On 11/18/21 13:20, David Tinker wrote:
> > I just grepped all the OSD pod logs for error and warn and nothing comes
> up:
> >
> > # k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk  | grep -i warn
> > etc
> >
> > I am assuming that would bring back something if any of them were
> unhappy.
>
> Your issue looks similar to another thread last week (thread pg
> inactive+remapped).
>
> What Ceph version are you running?
>
> I don't know if enabling debugging on osd.7 would reveal something.
>
> Maybe recovery can be triggered by moving the primary to another OSD with
> pg upmap. Check your failure domain to see what OSD would be suitable.
>
> Gr. Stefan
>
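
For reference, the upmap approach mentioned above would look roughly like this; the target OSD id is purely illustrative and should respect whatever failure domain the pool's CRUSH rule uses:

# ceph osd pg-upmap-items 3.1f 0 3        # map the copy currently on osd.0 to osd.3 instead (illustrative)
# ceph osd dump | grep pg_upmap_items     # confirm the exception was accepted
# ceph osd rm-pg-upmap-items 3.1f         # drop the exception once the PG is active+clean again
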
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Sasha Litvak
Perhaps I missed something, but does the survey conclude that users don't
value reliability improvements at all? This would explain why the developer
team wants to concentrate on performance and ease of management.

On Thu, Nov 18, 2021, 07:23 Stefan Kooman  wrote:

> On 11/18/21 14:09, Maged Mokhtar wrote:
> > Hello Cephers,
> >
> > i too am for LTS releases or for some kind of middle ground like longer
> > release cycle and/or have even numbered releases designated for
> > production like before. We all use LTS releases for the base OS when
> > running Ceph, yet in reality we depend much more on the Ceph code than
> > the base OS.
> >
> > Another thing we hear our users want, after stability, is performance.
> > it ultimately determines the cost of the storage solution. I think this
> > should be high on the priority list. I know there has been a lot of
> > effort with Crimson development for a while, but from my opinion if Ceph
> > was run by a purely commercial company, getting this out the door as
> > quickly as possible would take priority.
>
> That is in line with the results from the last Ceph User Survey (2021):
>
>
>
> https://ceph.io/en/news/blog/2021/2021-ceph-user-survey-results/#based-on-weighted-category-prioritization
>
> So there is a dedicated group of people involved in the "next gen" OSD
> storage sub system, which is a big endeavor. In the mean time there are
> several developers improving the current implementation incrementally.
> Zac is doing a great job improving the documentation. Cephadm team is
> working on improving management. As as I have read correctly they will
> have access to a large cluster to improve ... the next thing on the prio
> list: scalability, in this case scalability of the management system.
>
> If there is a separate "quality" team for the No. 1 priority:
> Reliability? I don't know. Maybe that is just implicit in the project,
> to make things reliable by default? That might be an interesting thing
> to ask in the upcoming user+dev meeting ...
>
> Gr. Stefan
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Daniel Tönnißen
The weighted category prioritization clearly identifies reliability as the top 
priority.


Daniel

> On 18.11.2021 at 15:32, Sasha Litvak wrote:
> 
> Perhaps I missed something,  but does the survey concludes that users don't
> value reliability improvements at all?  This would explain why developers
> team wants to concentrate on performance and ease of management.
> 
> On Thu, Nov 18, 2021, 07:23 Stefan Kooman  wrote:
> 
>> On 11/18/21 14:09, Maged Mokhtar wrote:
>>> Hello Cephers,
>>> 
>>> i too am for LTS releases or for some kind of middle ground like longer
>>> release cycle and/or have even numbered releases designated for
>>> production like before. We all use LTS releases for the base OS when
>>> running Ceph, yet in reality we depend much more on the Ceph code than
>>> the base OS.
>>> 
>>> Another thing we hear our users want, after stability, is performance.
>>> it ultimately determines the cost of the storage solution. I think this
>>> should be high on the priority list. I know there has been a lot of
>>> effort with Crimson development for a while, but from my opinion if Ceph
>>> was run by a purely commercial company, getting this out the door as
>>> quickly as possible would take priority.
>> 
>> That is in line with the results from the last Ceph User Survey (2021):
>> 
>> 
>> 
>> https://ceph.io/en/news/blog/2021/2021-ceph-user-survey-results/#based-on-weighted-category-prioritization
>> 
>> So there is a dedicated group of people involved in the "next gen" OSD
>> storage sub system, which is a big endeavor. In the mean time there are
>> several developers improving the current implementation incrementally.
>> Zac is doing a great job improving the documentation. Cephadm team is
>> working on improving management. As as I have read correctly they will
>> have access to a large cluster to improve ... the next thing on the prio
>> list: scalability, in this case scalability of the management system.
>> 
>> If there is a separate "quality" team for the No. 1 priority:
>> Reliability? I don't know. Maybe that is just implicit in the project,
>> to make things reliable by default? That might be an interesting thing
>> to ask in the upcoming user+dev meeting ...
>> 
>> Gr. Stefan
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm / ceph orch : indefinite hang adding hosts to new cluster

2021-11-18 Thread Lincoln Bryant
Hi all,

Just to close the loop on this one - we ultimately found that there was an MTU 
misconfiguration between the hosts that was causing Ceph and other things to 
fail in strange ways. After fixing the MTU, cephadm etc immediately started 
working.
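
For anyone debugging the same symptom, a hedged way to spot an MTU mismatch between hosts; the interface name is an assumption, and the address is taken from this thread:

# ip link show dev eth0 | grep mtu        # compare the configured MTU on every host
# ping -M do -s 8972 192.168.7.12         # jumbo-frame path test (8972 + 28 bytes of headers); drops indicate a mismatch
# ping -M do -s 1472 192.168.7.12         # same test for a standard 1500-byte MTU path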

Cheers,
Lincoln

From: Lincoln Bryant 
Sent: Wednesday, November 17, 2021 9:18 AM
To: Eugen Block ; ceph-users@ceph.io 
Subject: [ceph-users] Re: cephadm / ceph orch : indefinite hang adding hosts to 
new cluster

Hi,

Yes, the hosts have internet access and other Ceph commands work successfully. 
Every host we have tried has worked for bootstrap, but adding another node to 
the cluster isn't working. We've also tried adding intentionally bad hosts and 
get expected failures (missing SSH key, etc).

Here's some check-host output for our mons:

[root@kvm-mon03 ~]# cephadm check-host
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
[root@kvm-mon02 ~]#  cephadm check-host
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
[root@kvm-mon01 ~]# cephadm check-host
podman (/usr/bin/podman) version 3.4.1 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK

Tailing the logs, journalctl simply reports:

 sshd[8135]: Accepted publickey for root from 192.168.7.13 port 41378 ssh2: RSA 
SHA256:JZxTh1Su9A+cqx14cIxzbP2W0vRHwgNcGQioLPCMFtk
 systemd-logind[2155]: New session 17 of user root.
 systemd[1]: Started Session 17 of user root.
 sshd[8135]: pam_unix(sshd:session): session opened for user root by (uid=0)

Very strange...

Maybe a manual installation will reveal issues?

--Lincoln

From: Eugen Block 
Sent: Wednesday, November 17, 2021 2:27 AM
To: ceph-users@ceph.io 
Subject: [ceph-users] Re: cephadm / ceph orch : indefinite hang adding hosts to 
new cluster

Hi,

> This is the only logging output we see:
> # cephadm shell -- ceph orch host add kvm-mon02 192.168.7.12
> Inferring fsid 826b9b36-4729-11ec-99f0-c81f66d05a38
> Using recent ceph image
> quay.io/ceph/ceph@sha256:a2c23b6942f7fbc1e15d8cfacd6655a681fe0e44f288e4a158db22030b8d58e3
>
> This command hangs indefinitely until killed via podman kill​.

the first thought coming to mind is, do the hosts have internet access
to download the container images? But if the bootstrap worked the
answer would be yes. And IIUC you tried different hosts for bootstrap
and all of them worked? Just for the record, a manual 'podman pull
...' on the second host works, too?

What does a 'cephadm check-host' on the second host report?
The syslog on the second node should usually reveal errors, have you
checked 'journalctl -f' during the attempt to add it?


Zitat von Lincoln Bryant :

> Greetings list,
>
> We have a new Ceph cluster we are trying to deploy on EL8 (CentOS
> Stream) using cephadm (+podman), targeting Pacific.
>
> We are successfully able to bootstrap the first host, but attempting
> to add any additional hosts hangs indefinitely. We have confirmed
> that we are able to SSH from the first host to subsequent hosts
> using the key generated by Ceph.
>
> This is the only logging output we see:
> # cephadm shell -- ceph orch host add kvm-mon02 192.168.7.12
> Inferring fsid 826b9b36-4729-11ec-99f0-c81f66d05a38
> Using recent ceph image
> quay.io/ceph/ceph@sha256:a2c23b6942f7fbc1e15d8cfacd6655a681fe0e44f288e4a158db22030b8d58e3
>
> This command hangs indefinitely until killed via podman kill​.
>
> Inspecting the host we're trying to add, we see that Ceph has
> launched a python process:
> root3604  0.0  0.0 164128  6316 ?S16:36   0:00
> |   \_ sshd: root@notty
> root3605  0.0  0.0  31976  8752 ?Ss   16:36   0:00
> |   \_ python3 -c import sys;exec(eval(sys.stdin.readline()))
>
> Inside of the mgr container, we see 2 SSH connections:
> ceph 186  0.0  0.0  44076  6676 ?S22:31   0:00
> \_ ssh -C -F /tmp/cephadm-conf-s0b8c90d -i
> /tmp/cephadm-identity-8ku7ib6b root@192.168.7.13 python3 -c "import
> sys;exec(eval(sys.stdin.readline()))"
> ceph 211  0.0  0.0  44076  6716 ?S22:36   0:00
> \_ ssh -C -F /tmp/cephadm-conf-s0b8c90d -i
> /tmp/cephadm-identity-8ku7ib6b root@192.168.7.12 python3 -c "import
> sys;exec(eval(sys.stdin.readline()))"
>
> where 192.168.7.13 is the IP of the first host in the cluster (which
> has successfully bootstrapped and is running mgr, mon, and so on),
> and 192.168.7.12 is the host we are trying, unsuccessfully, to add.
>
> The mgr logs show no particularly interesting except for:
> debug 2021-11-16T22:39:03.570+ 7fb6e4914700  0 [progress WARNING
> root] complete: ev de058df7-b54a-4429-933a-99abe7796715 does not exist
> debug 2021-11-16T22:39:03.571+ 7fb6e4914700  0 [progress WARNING
> root] complete: ev 61fe4998-4ef4-4640-8a13

[ceph-users] Re: One pg stuck in active+undersized+degraded after OSD down

2021-11-18 Thread David Tinker
Would it be worth setting the OSD I removed back to "in" (or whatever the
opposite of "out") is and seeing if things recovered?

On Thu, Nov 18, 2021 at 3:44 PM David Tinker  wrote:

> Tx. # ceph version
> ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
> (stable)
>
>
>
> On Thu, Nov 18, 2021 at 3:28 PM Stefan Kooman  wrote:
>
>> On 11/18/21 13:20, David Tinker wrote:
>> > I just grepped all the OSD pod logs for error and warn and nothing
>> comes up:
>> >
>> > # k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk  | grep -i warn
>> > etc
>> >
>> > I am assuming that would bring back something if any of them were
>> unhappy.
>>
>> Your issue looks similar to another thread last week (thread pg
>> inactive+remapped).
>>
>> What Ceph version are you running?
>>
>> I don't know if enabling debugging on osd.7 would reveal something
>>
>> Maybe recovery can be trigger by moving the primary to another OSD with
>> pg upmap. Check your failure domain to see what OSD would be suitable.
>>
>> Gr. Stefan
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> On 17.11.21 at 20:14, Marc wrote:
> >> a good choice. It lacks RBD encryption and read leases. But for us
> >> upgrading from N to O or P is currently not
> >>
> > what about just using osd encryption with N?
> 
> 
> That would be Data at Rest encryption only. The keys for the OSDs are
> stored on the mons. Data is transferred unencrypted over the wire.
> 
> RBD encryption takes place in the client.
> 

Very very nice security policy you have!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> 
> docker itself is not the problem, 

I would even argue the opposite. If the docker daemon crashes it takes down all the
containers. Sorry, but these days that is really not necessary given the other
alternatives.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> We also use containers for ceph and love it. If for some reason we
> couldn't run ceph this way any longer, we would probably migrate
> everything to a different solution. We are absolutely committed to
> containerization.

I wonder if you are really using containers. Are you not just using cephadm?
If you were really using containers you would have selected your OC already, and
would be pissed about how the current containers are being developed and about
having to use a 2nd system.




 
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> 
> Please remember, free software comes still with a price. You can not
> expect someone to work on your individual problem while being cheap on
> your highly critical data. If your data has value, then you should
> invest in ensuring data safety. There are companies out, paying Ceph
> developers and fixing bugs, so your problem will be gone as soon as you
> A) contribute code yourself or B) pay someone to contribute code.

Oh please, this again. As if, because someone does something for free, one cannot
criticize them, hold them accountable, etc. Let's be honest: if Ceph were not open
source it would not be where it is today, and it would not have the market
share it currently has.
Users choose a storage solution based on its availability in the future,
because you do not easily switch to a different one. So you cannot blame
users for asking about what has been 'promised' in the past.
I would even advocate for some sort of addition to GPL(?)-style licenses where there
is a commitment to warn users years upfront or to guarantee support for a product,
so that it is not possible to abruptly stop support and switch to a proprietary
solution after enough market share has been acquired.
 
> Don't get me wrong, every dev here should have the focus in providing
> rock solid work and I believe they do, but in the end it's software, and
> software never will be free of bugs. 

Nobody is questioning this. 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Marc
> 
> If you're building a ceph cluster, the state of a single node shouldn't
> matter. Docker crashing should not be a show stopper.
> 

You remind me of a senior software engineer at Red Hat who told me it was
not that big of a deal that ceph.conf got deleted and the root fs was mounted via
a bind mount, because I should run his driver in a container namespace.

 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] A middle ground between containers and 'lts distros'?

2021-11-18 Thread Harry G. Coin
I sense the concern about ceph distributions via containers generally
has to do with what you might call a feeling of 'opaqueness'. The
feeling is amplified because most folks who choose open source solutions
prize being able to promptly address the particular concerns affecting
them without having to wait for 'the next (might as well be opaque)
release'.


An easy way forward might be for the ceph devs to document an approved
set of steps that builds on the current ability to 'ssh in' to a
container and make on-the-fly changes. If there were a cephadm command
to 'save the current state of a container' in a format that adds a
'.lcl.1', '.lcl.2' suffix, the command-line process could then be smoothed
so that the cephadm upgrade process can use those saved local images, with
the local changes, as targets, so as to automate pushing the changes out to the rest.


??
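
A rough sketch of how that kind of workflow could be approximated today with commands that already exist; the container, image, and registry names are all assumptions:

# podman ps --format '{{.Names}}'                                         # find the name of the locally modified container
# podman commit <container-name> registry.local:5000/ceph:16.2.6.lcl.1    # snapshot it as a local image
# podman push registry.local:5000/ceph:16.2.6.lcl.1                       # publish it where every host can pull it
# ceph orch upgrade start --image registry.local:5000/ceph:16.2.6.lcl.1   # roll the saved image out cluster-wide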

Harry Coin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: erasure coded pool PG stuck inconsistent on ceph Pacific 15.2.13

2021-11-18 Thread Wesley Dillingham
That response is typically indicative of a pg whose OSD set has changed
since it was last scrubbed (typically because of a disk failing).

Are you sure it's actually getting scrubbed when you issue the scrub? For
example you can issue: "ceph pg  query"  and look for
"last_deep_scrub_stamp" which will tell you when it was last deep scrubbed.

Further, in sufficiently recent versions of Ceph (introduced in
14.2.something iirc) setting the flag "nodeep-scrub" will cause all
in-flight deep-scrubs to stop immediately. You may have a scheduling issue
where your deep-scrubs or repairs aren't getting scheduled.

Set the nodeep-scrub flag ("ceph osd set nodeep-scrub") and wait for all
current deep-scrubs to complete, then try to manually re-issue the deep
scrub ("ceph pg deep-scrub "). At that point your scrub should start
near-immediately and "rados
list-inconsistent-obj 6.180 --format=json-pretty" should return
something of value.
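
Putting those steps together, a hedged command sequence (the pg id 6.180 comes from the report below):

# ceph osd set nodeep-scrub                          # stop scheduling new deep-scrubs
# ceph pg dump pgs_brief | grep -c scrubbing+deep    # rough check that in-flight deep-scrubs have drained
# ceph pg deep-scrub 6.180                           # should now start almost immediately
# rados list-inconsistent-obj 6.180 --format=json-pretty
# ceph osd unset nodeep-scrub                        # re-enable normal scrubbing afterwards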

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Nov 18, 2021 at 2:38 PM J-P Methot 
wrote:

> Hi,
>
> We currently have a PG stuck in an inconsistent state on an erasure
> coded pool. The pool's K and M values are 33 and 3.  The command rados
> list-inconsistent-obj 6.180 --format=json-pretty results in the
> following error:
>
> No scrub information available for pg 6.180 error 2: (2) No such file or
> directory
>
> Forcing a deep scrub of the pg does not fix this. Doing a ceph pg repair
> 6.180 doesn't seem to do anything. Is there a known bug explaining this
> behavior? I am attaching information regarding the PG in question.
>
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread Sage Weil
Okay, good news: on the osd start side, I identified the bug (and easily
reproduced locally).  The tracker and fix are:

 https://tracker.ceph.com/issues/53326
 https://github.com/ceph/ceph/pull/44015

These will take a while to work through QA and get backported.

Also, to reiterate what I said on the call earlier today about the osd
stopping issues:
 - A key piece of the original problem you were seeing was because
require_osd_release wasn't up to date, which meant that the the dead_epoch
metadata wasn't encoded in the OSDMap and we would basically *always* go
into the read lease wait when an OSD stopped.
 - Now that that is fixed, it appears as though setting both
osd_fast_shutdown *and* osd_fast_shutdown_notify_mon is the winning
combination.

I would be curious to hear if adjusting the icmp throttle kernel setting
makes things behave better when osd_fast_shutdown_notify_mon=false (the
default), but this is more out of curiosity--I think we've concluded that
we should set this option to true by default.
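
For anyone wanting to apply the interim conclusion above, a hedged sketch; osd.122 is just the example OSD from this thread:

# ceph osd require-osd-release octopus                   # make sure require_osd_release matches the running release
# ceph config set osd osd_fast_shutdown true
# ceph config set osd osd_fast_shutdown_notify_mon true
# ceph config get osd.122 osd_fast_shutdown_notify_mon   # verify an individual daemon picked it up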

If I'm missing anything, please let me know!

Thanks for your patience in tracking this down.  It's always a bit tricky
when there are multiple contributing factors (in this case, at least 3).

sage



On Tue, Nov 16, 2021 at 9:42 AM Sage Weil  wrote:

> On Tue, Nov 16, 2021 at 8:30 AM Manuel Lausch 
> wrote:
>
>> Hi Sage,
>>
>> its still the same cluster we talked about. I only upgraded it from
>> 16.2.5 to 16.2.6.
>>
>> I enabled fast shutdown again and did some tests with debug
>> logging enabled.
>> osd_fast_shutdowntrue
>> osd_fast_shutdown_notify_mon false
>>
>> The logs are here:
>> ceph-post-file: 59325568-719c-4ec9-b7ab-945244fcf8ae
>>
>>
>> I took 3 tests.
>>
>> First I stopped OSD 122 again at 14:22:40 and started it again at
>> 14:23:40.
>> stopping worked now without issue. But on starting I got 3 Slow
>> ops.
>>
>> Then at 14:25:00 I stopped all osds (systemctl stop ceph-osd.target) on
>> the host "csdeveubs-u02c01b01". Surprisingly there were no slow op as
>> well. But still on startup at 14:26:00
>>
>> On 14:28:00 I stopped again all OSDs on host csdeveubs-u02c01b05. This
>> time I got some slow ops while stopping too.
>>
>>
>> So far as I understand, ceph skips the read lease time if a OSD is
>> "dead" but not if it is only down. This is because we do not know for
>> sure if a down OSD is realy gone and cannot answer reads anymore. right?
>>
>
> Exactly.
>
>
>> If a OSD annouces its shutdown to the mon the cluster marks it as
>> down. Can we not assume the deadness in this case as well?
>> Maybe this would help me in the stopping casse.
>>
>
> It could, but that's not how the shutdown process currently works. It
> requests that the mon mark it down, but continues servicing IO until it is
> actually marked down.
>
>
>> The starting case will still be an issue.
>
>
> Yes.  I suspect the root cause(s) there are a bit more complicated--I'll
> take a look at the logs today.
>
> Thanks!
> sage
>
>
>
>>
>>
>>
>> Thanks a lot
>> Manuel
>>
>>
>>
>> On Mon, 15 Nov 2021 17:32:24 -0600
>> Sage Weil  wrote:
>>
>> > Okay, I traced one slow op through the logs, and the problem was that
>> > the PG was laggy.  That happened because of the osd.122 that you
>> > stopped, which was marked down in the OSDMap but *not* dead.  It
>> > looks like that happened because the OSD took the 'clean shutdown'
>> > path instead of the fast stop.
>> >
>> > Have you tried enabling osd_fast_shutdown = true *after* you fixed the
>> > require_osd_release to octopus?   It would have led to slow requests
>> > when you tested before because the new dead_epoch field in the OSDMap
>> > that the read leases rely on was not being encoded, making peering
>> > wait for the read lease to time out even though the stopped osd
>> > really died.
>> >
>> > I'm not entirely sure if this is the same cluster as the earlier
>> > one.. but given the logs you sent, my suggestion is to enable
>> > osd_fast_shutdown and try again.  If you still get slow requests, can
>> > you capture the logs again?
>> >
>> > Thanks!
>> > sage
>> >
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] November Ceph Science Virtual User Group

2021-11-18 Thread Kevin Hrpcek

Hey all,

We will be having a Ceph science/research/big cluster call on Wednesday 
November 24th. If anyone wants to discuss something specific they can 
add it to the pad linked below. If you have questions or comments you 
can contact me.


This is an informal open call of community members mostly from 
hpc/htc/research environments where we discuss whatever is on our minds 
regarding ceph. Updates, outages, features, maintenance, etc...there is 
no set presenter but I do attempt to keep the conversation lively.


https://pad.ceph.com/p/Ceph_Science_User_Group_20211124 



We try to keep it to an hour or less.

Ceph calendar event details:

November, 2021
15:00 UTC
4pm Central European
9am Central US

Description: Main pad for discussions: 
https://pad.ceph.com/p/Ceph_Science_User_Group_Index

Meetings will be recorded and posted to the Ceph Youtube channel.
To join the meeting on a computer or mobile phone: 
https://bluejeans.com/908675367?src=calendarLink

To join from a Red Hat Deskphone or Softphone, dial: 84336.
Connecting directly from a room system?
    1.) Dial: 199.48.152.152 or bjn.vc
    2.) Enter Meeting ID: 908675367
Just want to dial in on your phone?
    1.) Dial one of the following numbers: 408-915-6466 (US)
    See all numbers: https://www.redhat.com/en/conference-numbers
    2.) Enter Meeting ID: 908675367
    3.) Press #
Want to test your video connection? https://bluejeans.com/111


Kevin

--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread huxia...@horebdata.cn
May I ask which versions are affected by this bug, and which versions are
going to receive backports?

best regards,

samuel



huxia...@horebdata.cn
 
From: Sage Weil
Date: 2021-11-18 22:02
To: Manuel Lausch; ceph-users
Subject: [ceph-users] Re: OSD spend too much time on "waiting for readable" -> 
slow ops -> laggy pg -> rgw stop -> worst case osd restart
Okay, good news: on the osd start side, I identified the bug (and easily
reproduced locally).  The tracker and fix are:
 
https://tracker.ceph.com/issues/53326
https://github.com/ceph/ceph/pull/44015
 
These will take a while to work through QA and get backported.
 
Also, to reiterate what I said on the call earlier today about the osd
stopping issues:
- A key piece of the original problem you were seeing was because
require_osd_release wasn't up to date, which meant that the the dead_epoch
metadata wasn't encoded in the OSDMap and we would basically *always* go
into the read lease wait when an OSD stopped.
- Now that that is fixed, it appears as though setting both
osd_fast_shutdown *and* osd_fast_shutdown_notify_mon is the winning
combination.
 
I would be curious to hear if adjusting the icmp throttle kernel setting
makes things behave better when osd_fast_shutdown_notify_mon=false (the
default), but this is more out of curiosity--I think we've concluded that
we should set this option to true by default.
 
If I'm missing anything, please let me know!
 
Thanks for your patience in tracking this down.  It's always a bit tricky
when there are multiple contributing factors (in this case, at least 3).
 
sage
 
 
 
On Tue, Nov 16, 2021 at 9:42 AM Sage Weil  wrote:
 
> On Tue, Nov 16, 2021 at 8:30 AM Manuel Lausch 
> wrote:
>
>> Hi Sage,
>>
>> its still the same cluster we talked about. I only upgraded it from
>> 16.2.5 to 16.2.6.
>>
>> I enabled fast shutdown again and did some tests with debug
>> logging enabled.
>> osd_fast_shutdowntrue
>> osd_fast_shutdown_notify_mon false
>>
>> The logs are here:
>> ceph-post-file: 59325568-719c-4ec9-b7ab-945244fcf8ae
>>
>>
>> I took 3 tests.
>>
>> First I stopped OSD 122 again at 14:22:40 and started it again at
>> 14:23:40.
>> stopping worked now without issue. But on starting I got 3 Slow
>> ops.
>>
>> Then at 14:25:00 I stopped all osds (systemctl stop ceph-osd.target) on
>> the host "csdeveubs-u02c01b01". Surprisingly there were no slow op as
>> well. But still on startup at 14:26:00
>>
>> On 14:28:00 I stopped again all OSDs on host csdeveubs-u02c01b05. This
>> time I got some slow ops while stopping too.
>>
>>
>> So far as I understand, ceph skips the read lease time if a OSD is
>> "dead" but not if it is only down. This is because we do not know for
>> sure if a down OSD is realy gone and cannot answer reads anymore. right?
>>
>
> Exactly.
>
>
>> If a OSD annouces its shutdown to the mon the cluster marks it as
>> down. Can we not assume the deadness in this case as well?
>> Maybe this would help me in the stopping casse.
>>
>
> It could, but that's not how the shutdown process currently works. It
> requests that the mon mark it down, but continues servicing IO until it is
> actually marked down.
>
>
>> The starting case will still be an issue.
>
>
> Yes.  I suspect the root cause(s) there are a bit more complicated--I'll
> take a look at the logs today.
>
> Thanks!
> sage
>
>
>
>>
>>
>>
>> Thanks a lot
>> Manuel
>>
>>
>>
>> On Mon, 15 Nov 2021 17:32:24 -0600
>> Sage Weil  wrote:
>>
>> > Okay, I traced one slow op through the logs, and the problem was that
>> > the PG was laggy.  That happened because of the osd.122 that you
>> > stopped, which was marked down in the OSDMap but *not* dead.  It
>> > looks like that happened because the OSD took the 'clean shutdown'
>> > path instead of the fast stop.
>> >
>> > Have you tried enabling osd_fast_shutdown = true *after* you fixed the
>> > require_osd_release to octopus?   It would have led to slow requests
> >> when you tested before because the new dead_epoch field in the OSDMap
>> > that the read leases rely on was not being encoded, making peering
>> > wait for the read lease to time out even though the stopped osd
>> > really died.
>> >
>> > I'm not entirely sure if this is the same cluster as the earlier
>> > one.. but given the logs you sent, my suggestion is to enable
>> > osd_fast_shutdown and try again.  If you still get slow requests, can
>> > you capture the logs again?
>> >
>> > Thanks!
>> > sage
>> >
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread Sage Weil
It looks like the bug has been there since the read leases were introduced,
which I believe was octopus (15.2.z)

s

On Thu, Nov 18, 2021 at 3:55 PM huxia...@horebdata.cn 
wrote:

> May i ask, which versions are affected by this bug? and which versions are
> going to receive backports?
>
> best regards,
>
> samuel
>
> --
> huxia...@horebdata.cn
>
>
> *From:* Sage Weil 
> *Date:* 2021-11-18 22:02
> *To:* Manuel Lausch ; ceph-users
> 
> *Subject:* [ceph-users] Re: OSD spend too much time on "waiting for
> readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart
> Okay, good news: on the osd start side, I identified the bug (and easily
> reproduced locally).  The tracker and fix are:
>
> https://tracker.ceph.com/issues/53326
> https://github.com/ceph/ceph/pull/44015
>
> These will take a while to work through QA and get backported.
>
> Also, to reiterate what I said on the call earlier today about the osd
> stopping issues:
> - A key piece of the original problem you were seeing was because
> require_osd_release wasn't up to date, which meant that the the dead_epoch
> metadata wasn't encoded in the OSDMap and we would basically *always* go
> into the read lease wait when an OSD stopped.
> - Now that that is fixed, it appears as though setting both
> osd_fast_shutdown *and* osd_fast_shutdown_notify_mon is the winning
> combination.
>
> I would be curious to hear if adjusting the icmp throttle kernel setting
> makes things behave better when osd_fast_shutdown_notify_mon=false (the
> default), but this is more out of curiosity--I think we've concluded that
> we should set this option to true by default.
>
> If I'm missing anything, please let me know!
>
> Thanks for your patience in tracking this down.  It's always a bit tricky
> when there are multiple contributing factors (in this case, at least 3).
>
> sage
>
>
>
> On Tue, Nov 16, 2021 at 9:42 AM Sage Weil  wrote:
>
> > On Tue, Nov 16, 2021 at 8:30 AM Manuel Lausch 
> > wrote:
> >
> >> Hi Sage,
> >>
> >> its still the same cluster we talked about. I only upgraded it from
> >> 16.2.5 to 16.2.6.
> >>
> >> I enabled fast shutdown again and did some tests with debug
> >> logging enabled.
> >> osd_fast_shutdowntrue
> >> osd_fast_shutdown_notify_mon false
> >>
> >> The logs are here:
> >> ceph-post-file: 59325568-719c-4ec9-b7ab-945244fcf8ae
> >>
> >>
> >> I took 3 tests.
> >>
> >> First I stopped OSD 122 again at 14:22:40 and started it again at
> >> 14:23:40.
> >> stopping worked now without issue. But on starting I got 3 Slow
> >> ops.
> >>
> >> Then at 14:25:00 I stopped all osds (systemctl stop ceph-osd.target) on
> >> the host "csdeveubs-u02c01b01". Surprisingly there were no slow op as
> >> well. But still on startup at 14:26:00
> >>
> >> On 14:28:00 I stopped again all OSDs on host csdeveubs-u02c01b05. This
> >> time I got some slow ops while stopping too.
> >>
> >>
> >> So far as I understand, ceph skips the read lease time if a OSD is
> >> "dead" but not if it is only down. This is because we do not know for
> >> sure if a down OSD is realy gone and cannot answer reads anymore. right?
> >>
> >
> > Exactly.
> >
> >
> >> If a OSD annouces its shutdown to the mon the cluster marks it as
> >> down. Can we not assume the deadness in this case as well?
> >> Maybe this would help me in the stopping casse.
> >>
> >
> > It could, but that's not how the shutdown process currently works. It
> > requests that the mon mark it down, but continues servicing IO until it
> is
> > actually marked down.
> >
> >
> >> The starting case will still be an issue.
> >
> >
> > Yes.  I suspect the root cause(s) there are a bit more complicated--I'll
> > take a look at the logs today.
> >
> > Thanks!
> > sage
> >
> >
> >
> >>
> >>
> >>
> >> Thanks a lot
> >> Manuel
> >>
> >>
> >>
> >> On Mon, 15 Nov 2021 17:32:24 -0600
> >> Sage Weil  wrote:
> >>
> >> > Okay, I traced one slow op through the logs, and the problem was that
> >> > the PG was laggy.  That happened because of the osd.122 that you
> >> > stopped, which was marked down in the OSDMap but *not* dead.  It
> >> > looks like that happened because the OSD took the 'clean shutdown'
> >> > path instead of the fast stop.
> >> >
> >> > Have you tried enabling osd_fast_shutdown = true *after* you fixed the
> >> > require_osd_release to octopus?   It would have led to slow requests
> >> > when you tested before because the new dead_epoch field in the OSDMap
> >> > that the read leases rely on was not being encoded, making peering
> >> > wait for the read lease to time out even though the stopped osd
> >> > really died.
> >> >
> >> > I'm not entirely sure if this is the same cluster as the earlier
> >> > one.. but given the logs you sent, my suggestion is to enable
> >> > osd_fast_shutdown and try again.  If you still get slow requests, can
> >> > you capture the logs again?
> >> >
> >> > Thanks!
> >> > sage
> >> >
> >>
> >>
> ___
> ceph-use

[ceph-users] Dashboard's website hangs during loading, no errors

2021-11-18 Thread Zach Heise (SSCC)

  

Hello!

Our test cluster is a few months old, was initially set up from scratch with
Pacific and has now had two separate small patches, 16.2.5 and then a couple
weeks ago 16.2.6, applied to it. The issue I'm describing has been present
since the beginning.

We have an active and standby mgr daemon, and the dashboard module is
installed with SSL turned on. Self-signed certificates only, not trusted by
browsers, but I always just click "okay" through Chrome and Firefox's
warnings about that.

I have noticed that every 2-3 days, in the morning when I start work, our
ceph dashboard page does not respond in the browser. It works fine throughout
the day, but it seems like after a certain unknown number of hours without
anyone accessing it (I'm the only one using the dashboard now since it's just
a test) something must be going wrong with the dashboard module, or mgr
daemon, because when I try to load (or refresh when it's already loaded) the
ceph dashboard site, the browser just does the "throbber" - no content on the
page ever appears, no errors or anything. None of the buttons on the page
load - nor time out and show a 404 - for example, Block\Images or
Cluster\Hosts in the left sidebar will load, but show empty. And the throbber
never stops.

Confirmed that this happens in all browsers too.

I can easily fix it with ceph mgr module disable dashboard and then waiting
10 seconds, then ceph mgr module enable dashboard - this makes it start
working again, until the next time I go a few days without using the
dashboard, at which point I need to do the same process again.

Any ideas as to what could be causing this? I have already turned on debug
mode. When I'm in this hanging state, I check the cephadm logs with cephadm
logs --name mgr.ceph01.fblojp -- -f but there's nothing obvious (to my
untrained eyes at least). When the dashboard is functional, I can see my own
navigation around the dashboard in the logs so I know that logging is working:

Nov 01 15:46:32 ceph01.domain conmon[5814]: debug
2021-11-01T20:46:32.601+ 7f7cbb42e700  0 [dashboard INFO
request] [10.130.50.252:52267] [GET] [200] [0.013s] [admin]
[1.0K] /api/summary

I already confirmed that the same thing happens regardless of whether I'm
using default ports of http://ceph01.domain:8080 or
https://ceph01.domain:8443 (although as mentioned I usually use self-signed
SSL).

At this moment the dashboard is currently in this hanging state so I am happy
to try to get logs.

Thanks,
-Zach

  

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard's website hangs during loading, no errors

2021-11-18 Thread Kai Börnert

Hi,

do you have more nodes than deployed mgrs, and use cephadm?

If so, it might be that the node you are connecting to no longer has an
instance of the mgr running, and you are only getting some leftovers from the
browser cache?


At least this was happening in my test cluster, but I was always able to
find a node with the mgr running by just trying through them.
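
A hedged way to check that directly instead of trying hosts one by one; the mgr name is taken from the log command quoted below:

# ceph mgr stat                 # shows which mgr daemon is currently active
# ceph mgr services             # prints the dashboard URL served by the active mgr
# ceph mgr fail ceph01.fblojp   # optional: fail over to a standby if the active mgr looks wedged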


Greetings,

Kai

On 11/19/21 00:03, Zach Heise (SSCC) wrote:


Hello!

Our test cluster is a few months old, was initially set up from 
scratch with Pacific and has now had two separate small patches 16.2.5 
and then a couple weeks ago, 16.2.6 applied to it. The issue I?m 
describing has been present since the beginning.


We have an active and standby mgr daemon, and the dashboard module is 
installed with SSL turned on. Self signed certificates only, not 
trusted by browsers, but I always just click ?okay? through Chrome and 
Firefox?s warnings about that.


I have noticed that every 2-3 days, in the morning when I start work, 
our ceph dashboard page does not respond in the browser. It works fine 
throughout the day, but it seems like after a certain unknown hours 
without anyone accessing it (I?m the only one using the dashboard now 
since it?s just a test) something must be going wrong with the 
dashboard module, or mgr daemon, because when I try to load (or 
refresh when it's already loaded) the ceph dashboard site, the browser 
just does the ?throbber ? ? no 
content on the page ever appears, no errors or anything. None of the 
buttons on the page load ? nor time out and show a 404 ? for example, 
Block\Images or Cluster\Hosts in the left sidebar will load, but show 
empty. And the throbber never stops.


Confirmed that this happens in all browsers too.

I can easily fix it with ceph mgr module disable dashboard and then 
waiting 10 seconds, then ceph mgr module enable dashboard - this makes 
it start working again, until the next time I go a few days without 
using the dashboard, at which point I need to do the same process again.


Any ideas as to what could be causing this? I have already turned on 
debug mode. When I'm in this hanging state, I check the cephadm logs 
with cephadm logs --name mgr.ceph01.fblojp -- -f but there's nothing 
obvious (to my untrained eyes at least). When the dashboard is 
functional, I can see my own navigation around the dashboard in the 
logs so I know that logging is working:


Nov 01 15:46:32 ceph01.domain conmon[5814]: debug 
2021-11-01T20:46:32.601+ 7f7cbb42e700  0 [dashboard INFO request] 
[10.130.50.252:52267] [GET] [200] [0.013s] [admin] [1.0K] /api/summary


I already confirmed that the same thing happens regardless of whether 
I'm using default ports of http://ceph01.domain:8080 or 
https://ceph01.domain:8443 (although as mentioned I usually use 
self-signed SSL).


At this moment the dashboard is currently in this hanging state so I 
am happy to try to get logs.


Thanks,

-Zach


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: This week: Ceph User + Dev Monthly Meetup

2021-11-18 Thread Neha Ojha
Hello,

I don't think the meeting was recorded but there are detailed notes in
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes. The next meeting
is scheduled for December 16; feel free to add your discussion
topic to the agenda.

Thanks,
Neha

On Thu, Nov 18, 2021 at 11:04 AM Szabo, Istvan (Agoda)
 wrote:
>
> Hi,
>
> I’d set my calendar to the wrong time zone … so I couldn’t discuss my topic :(
> Is there any recording of the meeting?
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> On 2021. Nov 15., at 18:35, Neha Ojha  wrote:
>
>
> Hi everyone,
>
> This event is happening on November 18, 2021, 15:00-16:00 UTC - this
> is an hour later than what I had sent in my earlier email (I hadn't
> accounted for the daylight savings change, sorry!); the calendar invite
> reflects the same.
>
> Thanks,
> Neha
>
> On Thu, Oct 28, 2021 at 11:53 AM Neha Ojha  wrote:
>
>
> Hi everyone,
>
>
> We are kicking off a new monthly meeting for Ceph users to directly
>
> interact with Ceph Developers. The high-level aim of this meeting is
>
> to provide users with a forum to:
>
>
> - share their experience running Ceph clusters
>
> - provide feedback on Ceph versions they are using
>
> - ask questions and raise concerns on any matters related to Ceph
>
> - provide documentation feedback and suggest improvements
>
>
> Note that this is not a meeting to discuss design ideas or feature
>
> improvements, we'll continue to use existing CDMs [0] for such
>
> discussions.
>
>
> The meeting details have been added to the community calendar [1]. The
>
> first meeting will be held on November 18, 2021, 14:00-15:00 UTC and
>
> the agenda is here:
>
> https://pad.ceph.com/p/ceph-user-dev-monthly-minutes
>
>
> Hope to see you there!
>
>
> Thanks,
>
> Neha
>
>
> [0] https://tracker.ceph.com/projects/ceph/wiki/Planning
>
> [1] 
> https://calendar.google.com/calendar/u/0/embed?src=9ts9c7lt7u1vic2ijvvqqlf...@group.calendar.google.com
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread Patrick Donnelly
On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文  wrote:
>
> Hi all,
>
> We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster, it 
> seems harmless, but we cannot get HEALTH_OK, which is annoying.
>
> The clients that are reported failing to respond to cache pressure are 
> constantly changing, and most of the time we got 1-5 such clients out of ~20. 
> All of the clients are kernel clients, running HWE kernel 5.11 of Ubuntu 
> 20.04. The load is pretty low.
>
> We are reading datasets that consist of millions of small files from cephfs, 
> so we have tuned some config for performance. Some configs from "ceph config 
> dump" that might be relevant:
>
> WHO   LEVEL OPTION   VALUE
>   mds basic mds_cache_memory_limit   51539607552
>   mds advanced  mds_max_caps_per_client  8388608

This is pretty high. It may or may not cause problems in the future for you.

>   client  basic client_cache_size32768

Won't affect kernel clients.

> We also manually pinned almost every directory to either rank 0 or rank 1.
>
> Any thoughts about what causes the warning, or how can we get rid of it?

This reminds me of https://tracker.ceph.com/issues/46830

Suggest monitoring the client session information from the MDS as Dan
suggested. You can also try increasing mds_min_caps_working_set to see
if that helps.
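
For example, something along these lines (a sketch - substitute your own MDS
daemon name, and treat the value as illustrative only):

# inspect per-client session counters (num_caps, recall_caps, release_caps, ...)
ceph tell mds.<your-mds> client ls

# raise mds_min_caps_working_set as suggested above (20000 is only an example)
ceph config set mds mds_min_caps_working_set 20000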



-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] bluestore_quick_fix_on_mount

2021-11-18 Thread Lindsay Mathieson

How does one read/set that from the command line?


Thanks,


Lindsay

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Tony Liu
Instead of complaining, taking some time to learn more about containers would help.

Tony

From: Marc 
Sent: November 18, 2021 10:50 AM
To: Pickett, Neale T; Hans van den Bogert; ceph-users@ceph.io
Subject: [ceph-users] Re: [EXTERNAL] Re: Why you might want packages not 
containers for Ceph deployments

> We also use containers for ceph and love it. If for some reason we
> couldn't run ceph this way any longer, we would probably migrate
> everything to a different solution. We are absolutely committed to
> containerization.

I wonder if you are really using containers. Are you not just using cephadm? 
If you were really using containers you would have selected your OC already, and 
would be pissed about how the current containers are being developed and about 
having to use a 2nd system.






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Christian Wuerdig
I think Marc uses containers - but they've chosen Apache Mesos as the
orchestrator, and cephadm doesn't work with that.
Currently, essentially two ceph container orchestrators exist: rook, which
is a ceph orchestrator for kubernetes, and cephadm, which is an orchestrator
expecting docker or podman.
Admittedly I don't fully understand the nuanced differences between rook
(which can be added as a module to the ceph orchestrator cli) and cephadm
(no idea how this is related to the ceph orch cli) - they kinda seem to do
the same thing but slightly differently (or not?).
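
At least at the CLI level, both appear to be selectable backends for the same
ceph orch front end - a sketch (the rook commands assume a Rook/Kubernetes
cluster; cephadm-deployed clusters use "cephadm" as the backend instead):

# enable the rook orchestrator module and point the orchestrator CLI at it
ceph mgr module enable rook
ceph orch set backend rook

# confirm which backend is currently driving `ceph orch`
ceph orch status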

On Fri, 19 Nov 2021 at 16:51, Tony Liu  wrote:

> Instead of complaining, taking some time to learn more about containers would
> help.
>
> Tony
> 
> From: Marc 
> Sent: November 18, 2021 10:50 AM
> To: Pickett, Neale T; Hans van den Bogert; ceph-users@ceph.io
> Subject: [ceph-users] Re: [EXTERNAL] Re: Why you might want packages not
> containers for Ceph deployments
>
> > We also use containers for ceph and love it. If for some reason we
> > couldn't run ceph this way any longer, we would probably migrate
> > everything to a different solution. We are absolutely committed to
> > containerization.
>
> I wonder if you are really using containers. Are you not just using
> cephadm? If you were really using containers you would have selected your OC
> already, and would be pissed about how the current containers are being
> developed and about having to use a 2nd system.
>
>
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread 胡 玮文
Thanks Dan,

I chose one of the stuck clients to investigate. As shown below, it currently 
holds ~269700 caps, which is pretty high for no obvious reason. I cannot 
understand most of the output, and failed to find any documentation about it.

# ceph tell mds.cephfs.gpu018.ovxvoz client ls id=7915658
[
{
"id": 7915658,
"entity": {
"name": {
"type": "client",
"num": 7915658
},
"addr": {
"type": "v1",
"addr": "202.38.247.227:0",
"nonce": 3019311016
}
},
"state": "open",
"num_leases": 0,
"num_caps": 269695,
"request_load_avg": 184,
"uptime": 1340483.111458218,
"requests_in_flight": 0,
"num_completed_requests": 0,
"num_completed_flushes": 1,
"reconnecting": false,
"recall_caps": {
"value": 1625220.0378812221,
"halflife": 60
},
"release_caps": {
"value": 69.432671270941171,
"halflife": 60
},
"recall_caps_throttle": {
"value": 63255.667075845187,
"halflife": 1.5
},
"recall_caps_throttle2o": {
"value": 26064.679002183591,
"halflife": 0.5
},
"session_cache_liveness": {
"value": 259.9718480278375,
"halflife": 300
},
"cap_acquisition": {
"value": 0,
"halflife": 10
},
"delegated_inos": [... 7 items removed ],
"inst": "client.7915658 v1:202.38.247.227:0/3019311016",
"completed_requests": [],
"prealloc_inos": [ ... 9 items removed ],
"client_metadata": {
"client_features": {
"feature_bits": "0x7bff"
},
"metric_spec": {
"metric_flags": {
"feature_bits": "0x001f"
}
},
"entity_id": "smil",
"hostname": "gpu027",
"kernel_version": "5.11.0-37-generic",
"root": "/"
}
}
]

I suspect that some files are in use so that their caps cannot be released. 
However, "sudo lsof +f -- /mnt/cephfs | wc -l" just shows about 9k open files, 
well below "num_caps".

I also looked at 
/sys/kernel/debug/ceph/e88d509a-f6fc-11ea-b25d-a0423f3ac864.client7915658/caps 
on the client. The number of lines in it matches the "num_caps" reported by 
the MDS. This file also tells me which caps are not released. I investigated some 
of them, but could not see anything special. One example is attached here.

# ceph tell mds.cephfs.gpu018.ovxvoz dump inode 0x100068b9d24
{
"path": "/dataset/coco2017/train2017/00342643.jpg",
"ino": 1099621440804,
"rdev": 0,
"ctime": "2021-04-23T09:49:54.433652+",
"btime": "2021-04-23T09:49:54.425652+",
"mode": 33204,
"uid": 85969,
"gid": 85969,
"nlink": 1,
"dir_layout": {
"dir_hash": 0,
"unused1": 0,
"unused2": 0,
"unused3": 0
},
"layout": {
"stripe_unit": 4194304,
"stripe_count": 1,
"object_size": 4194304,
"pool_id": 5,
"pool_ns": ""
},
"old_pools": [],
"size": 147974,
"truncate_seq": 1,
"truncate_size": 18446744073709551615,
"truncate_from": 0,
"truncate_pending": 0,
"mtime": "2021-04-23T09:49:54.433652+",
"atime": "2021-04-23T09:49:54.425652+",
"time_warp_seq": 0,
"change_attr": 1,
"export_pin": -1,
"export_ephemeral_random_pin": 0,
"export_ephemeral_distributed_pin": false,
"client_ranges": [],
"dirstat": {
"version": 0,
"mtime": "0.00",
"num_files": 0,
"num_subdirs": 0,
"change_attr": 0
},
"rstat": {
"version": 0,
"rbytes": 147974,
"rfiles": 1,
"rsubdirs": 0,
"rsnaps": 0,
"rctime": "2021-04-23T09:49:54.433652+"
},
"accounted_rstat": {
"version": 0,
"rbytes": 147974,
"rfiles": 1,
"rsubdirs": 0,
"rsnaps": 0,
"rctime": "2021-04-23T09:49:54.433652+"
},
"version": 182894,
"file_data_version": 0,
"xattr_version": 1,
"backtrace_version": 177717,
"stray_prior_path": "",
"max_size_ever": 0,
"quota": {
"max_bytes": 0,
"max_files": 0
},
"last_scrub_stamp": "0.00",
"last_scrub_version": 0,
"symlink": "",
"xattrs": [],
"dirfragtree": {
"splits": []
},
"old_inodes": [],
"oldest_snap": 18446744073709551614,
"damage_flags": 0,
"is_auth": true,
"auth_state": {
"replicas": {}
},
"replica_state": {
"authority": [
0,
-2
],
"replica_nonce": 0
},
"auth_pins": 0,
"is_frozen": false,
"is_freezing": false,
 

[ceph-users] Re: Annoying MDS_CLIENT_RECALL Warning

2021-11-18 Thread 胡 玮文
Hi Patrick,

One of the stuck clients has num_caps at around 269700, well above the 
number of files opened on the client (about 9k). See my reply to Dan for 
details. So I don't think this warning is simply caused by 
"mds_min_caps_working_set" being set too low.

> -Original Message-
> From: Patrick Donnelly 
> Sent: November 19, 2021 9:37
> To: 胡 玮文 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Annoying MDS_CLIENT_RECALL Warning
> 
> On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文  wrote:
> >
> > Hi all,
> >
> > We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster, it
> > seems harmless, but we cannot get HEALTH_OK, which is annoying.
> >
> > The clients that are reported failing to respond to cache pressure are
> > constantly changing, and most of the time we got 1-5 such clients out of ~20.
> > All of the clients are kernel clients, running HWE kernel 5.11 of Ubuntu 20.04.
> > The load is pretty low.
> >
> > We are reading datasets that consist of millions of small files from
> > cephfs, so we have tuned some config for performance. Some configs from
> > "ceph config dump" that might be relevant:
> >
> > WHO   LEVEL OPTION   VALUE
> >   mds basic mds_cache_memory_limit   51539607552
> >   mds advanced  mds_max_caps_per_client  8388608
> 
> This is pretty high. It may or may not cause problems in the future for you.

We sometimes need to iterate over datasets containing several million 
files, and we have 512 GB of memory on the client. So we set this to a very 
high value to fully utilize our memory as page cache and accelerate IO.
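
Concretely, that was applied with something like the following (a sketch; the
value is the one shown in the config dump quoted earlier):

# let each client hold up to ~8.4 million caps so large datasets stay cached
ceph config set mds mds_max_caps_per_client 8388608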

> 
> >   client  basic client_cache_size32768
> 
> Won't affect kernel clients.
> 
> > We also manually pinned almost every directory to either rank 0 or rank 1.
> >
> > Any thoughts about what causes the warning, or how can we get rid of it?
> 
> This reminds me of https://tracker.ceph.com/issues/46830
> 
> Suggest monitoring the client session information from the MDS as Dan
> suggested. You can also try increasing mds_min_caps_working_set to see if that
> helps.
> 
> 
> 
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-18 Thread Anthony D'Atri



> In this context, I find it quite disturbing that nobody is willing even to 
> discuss an increase of the release cycle from say 2 to 4 years. What is so 
> important about pumping out one version after the other that real issues 
> caused by this speed are ignored?

One factor I think is that we’ve seen on multiple occasions that Ceph can have 
specific toolchain dependencies.  This is an observation, not a criticism.  
Which complicates the competing demands of working on new OS releases — and of 
working on the OS releases that companies actually run in production.  

> I think it would make a lot more sense if the observations and discoveries 
> made with production clusters - even or in particular after a long run time 
> on the battle field - were incorporated going much longer back than 4 years. 
> For a system like ceph 4 years is nothing.
> 
> But - and here I come back to my main point - this would require a very 
> scarce resource: time. This is what it all really is about. A slower release 
> cadence would provide time to look into long-term issues and hard challenges 
> with, for example, cache algorithms.

There has for a few years been visible advocacy in the software world for CI/CD 
over “waterfall” releases, with the idea that infrequent releases with large 
deltas of changes and new functionality are seen as more prone to bugs and 
regressions than frequent but very incremental releases, as often as monthly, 
weekly, or even daily.

Backports are a double-edged sword.  I’ve been there myself, where I yearned 
for something to be backported (e.g. `test-reweight-by-utilization` when I was 
using RHCS). But if *everything* is backported, does the prior release more or 
less _become_ the newer release?  What’s the “right” middle ground?  We might 
have as many thoughts there as we have subscribers.

Here’s an idea for discussion:

Might we mull over the idea of switching to a more incremental release cadence, 
a la Slack or iTerm2?  Would that help obviate the current situation where we 
sort of have three release trains going at any time?

I am not advocating for or against this idea, but it would be an interesting 
discussion.

— aad

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io