[ceph-users] CephFS keyrings for K8s

2022-01-20 Thread Michal Strnad

Hi,

We are using CephFS in our Kubernetes clusters and are now trying to
optimize the permissions/caps in our keyrings. Every guide we found
contains something like: create the file system by specifying the
desired settings for the metadata pool, data pool and an admin keyring
with access to the entire file system... Is there a better way, where we
don't need the admin key but only a restricted key? What are you using
in your environments?

Multiple file systems are not an option for us.

Thanks for your help

Regards,
Michal Strnad



[ceph-users] Re: CephFS keyrings for K8s

2022-01-20 Thread Michal Strnad

Addendum: we are using Nautilus on the Ceph side.

Michal Strnad






[ceph-users] Re: CephFS keyrings for K8s

2022-01-20 Thread Burkhard Linke

Hi,

On 1/20/22 9:26 AM, Michal Strnad wrote:

Hi,

We are using CephFS in our Kubernetes clusters and are now trying
to optimize the permissions/caps in our keyrings. Every guide we found
contains something like: create the file system by specifying the
desired settings for the metadata pool, data pool and an admin keyring
with access to the entire file system... Is there a better way, where
we don't need the admin key but only a restricted key? What are you
using in your environments?


The 'ceph fs authorize' CLI command can generate keys suitable for your
use case. You can restrict the access scope to subdirectories etc.



See https://docs.ceph.com/en/pacific/cephfs/client-auth/  (or the pages 
for your current release).
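
For example, a minimal sketch (file system, client and path names below
are just placeholders, not anything from your cluster):

  ceph fs authorize cephfs client.k8s-app /volumes/app rw

This prints a keyring for client.k8s-app whose MDS cap is limited to
read/write under /volumes/app and whose OSD cap is limited to the file
system's data pool, so no admin key is needed on the Kubernetes side.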



We use the CSI cephfs plugin in our main k8s cluster, and it is working 
fine with those keys.



Regards,

Burkhard Linke




[ceph-users] Ceph Dashboard: The Object Gateway Service is not configured

2022-01-20 Thread Kuhring, Mathias
Dear all,

Recently, our dashboard has not been able to connect to our RGW anymore:

Error connecting to Object Gateway: RGW REST API failed request with 
status code 404 
(b'{"Code":"NoSuchKey","BucketName":"admin","RequestId":"tx0f84ffa8b34579fa'
 
b'a-0061e93872-4bc673c-ext-default-primary","HostId":"4bc673c-ext-default-prim' 
b'ary-ext-default"}')

We haven't been able to figure out how to fix this yet. I don't think there
ever was a bucket "admin" to begin with, so we are unsure why the dashboard
is looking for it now. We created one, but that didn't change anything.

We would appreciate any hints where we could start looking.

Best,
Mathias

---
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail:  mathias.kuhr...@bih-charite.de
Mobile: +49 172 3475576



[ceph-users] Disk Failure Prediction cloud module?

2022-01-20 Thread Jake Grimmett

Dear All,

Is the cloud option for the diskprediction module deprecated in Pacific?

https://docs.ceph.com/en/pacific/mgr/diskprediction/

If so, is ProphetStor still contributing data to the local module, or
is this being updated by someone using data from Backblaze?


Do people find this module useful?

many thanks

Jake

--
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.



[ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

2022-01-20 Thread Jay Sullivan
I just verified the following in my other 16.2.6 cluster:
(S, per_pool_omap)
  32|2|
0001

I set noout, stopped the OSD service, ran the "ceph-kvstore-tool bluestore-kv 
 get S per_pool_omap" command, and started the OSD back up.
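
For anyone following along, the exact sequence was roughly the following
(a sketch assuming OSD 0 with the default data path; containerized
deployments will need the equivalent run inside the OSD container):

  ceph osd set noout
  systemctl stop ceph-osd@0
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 get S per_pool_omap
  systemctl start ceph-osd@0
  ceph osd unset noout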

Looking back through my change history, I had manually set 
bluestore_fsck_quick_fix_on_mount to "true" right before my upgrade from 
Nautilus 14.2.20 to Pacific 16.2.4. Per the "holy war" (lol) referenced below 
and elsewhere, I figured I'd just get the OMAP conversion out of the way while 
I was already in a maintenance window:
https://tracker.ceph.com/issues/45265 

My OSDs are not crashing on startup. I would have seen something by now, right? 
How was I spared? This specific cluster is mostly RBD (79M objects for 295TB 
data) with a little RGW (65k objects for 200GB data) and a very small 
non-production CephFS volume (1GB).

~Jay

-Original Message-
From: Jay Sullivan 
Sent: Wednesday, January 19, 2022 12:36 PM
To: 'Igor Fedotov' ; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

Hello, Igor!

I just ran "ceph config set osd bluestore_warn_on_no_per_pg_omap true". No 
warnings yet. Would I need to restart any services for that to take effect?

When I upgraded from Nautilus to Pacific 16.2.4 in July and then 16.2.4 to 
16.2.6 in mid-October I upgraded the OSD packages, restarted the OSD services, 
waited for HEALTH_OK, and then rebooted the OSD hosts one at a time. I've had a 
handful of drive failures since installing 16.2.6 so brand new OSDs would have 
started up with bluestore_fsck_quick_fix_on_mount set to "true" (not sure if 
that counts). But to answer your question, all of my OSDs have restarted at 
least once since upgrading to 16.2.6 in the form of a full host reboot. I have 
not seen any unplanned OSD restarts.

I just checked one of my 16.2.6 clusters and I see the following (I think this 
particular cluster was "born" on Nautilus). I'll check my other 16.2.6 cluster 
in a bit as it's currently backfilling from a drive failure earlier this 
morning.
(S, per_pool_omap)
  32|2|
0001

Would the per-pg OMAP task have run during my upgrade from Nautilus/Octopus to 
Pacific 16.2.4? (I think YES.) Was the bug introduced specifically in 16.2.6 or 
was it present in older versions, too? Am I possibly spared because I upgraded 
to 16.2.4 first before the OMAP bug was introduced? Again, I have not seen any 
unplanned OSD restarts.

Thanks, Igor!!

~Jay

-Original Message-
From: Igor Fedotov 
Sent: Wednesday, January 19, 2022 8:24 AM
To: Jay Sullivan ; ceph-users@ceph.io
Subject: [ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

Hey Jay,

first of all I'd like to mention that there were two OMAP naming scheme 
modifications since Nautilus:

1) per-pool OMAPs

2) per-pg OMAPs

Both are applied during BlueStore repair/quick-fix. So maybe you performed
the first one but not the second.

You might want to set bluestore_warn_on_no_per_pg_omap to true and inspect ceph 
health alerts to learn if per-pg format is disabled in your cluster.

Alternatively you might want to inspect db manually against an offline OSD with 
ceph-kvstore-tool: ceph-kvstore-tool bluestore-kv  get S 
per_pool_omap:

(S, per_pool_omap)
  32    |2|
0001

Having ASCII '2' there means per-pg format is already applied. I can't explain 
why you're observing no issues if that's the case though...


As for your other questions - you can stay on 16.2.6 as long as you don't run 
BlueStore repair/quick-fix - i.e. the relevant setting is false and nobody runs 
the relevant ceph-bluestore-tool commands manually.


And you mentioned bluestore_fsck_quick_fix_on_mount was set to true until 
now - curious: did you have any OSD restarts with that setting set to true?


Thanks,

Igor

On 1/19/2022 4:28 AM, Jay Sullivan wrote:
> https://tracker.ceph.com/issues/53062
>
> Can someone help me understand the scope of the OMAP key bug linked above? 
> I’ve been using 16.2.6 for three months and I don’t _think_ I’ve seen any 
> related problems.
>
> I upgraded my Nautilus (then 14.2.21) clusters to Pacific (16.2.4) in 
> mid-June. One of my clusters was born in Jewel and has made all of the 
> even-numbered releases to Pacific. I skipped over 16.2.5 and upgraded to 
> 16.2.6 in mid-October. It looks like the aforementioned OMAP bug was 
> discovered shortly after, on/around October 20th. My clusters had 
> bluestore_fsck_quick_fix_on_mount set to true until about 10 minutes ago. I 
> _think_ all of my OSDs did the OMAP conversion when I upgraded from Nautilus 
> to Pacific back in June (I remember it taking several minutes per spinning 
> OSD).
>
> Questions for my sanity:
>
>*   Do I need to upgrade to 16.2.7 ASAP? Or can I wait until my next 
> regular maintenance window?
>*   What is the risk of staying on 16.2.6 if I have 
> bluestore_fsck_quick_fix_on_mount set to false?
>*   If I don’t have OSDs crashing, how would I know if I was impacted by 
> the bug?

[ceph-users] Re: Ceph User + Dev Monthly January Meetup

2022-01-20 Thread Dan van der Ster
Reminder -- starting in a few minutes.

Agenda here (still pretty light!)

https://pad.ceph.com/p/ceph-user-dev-monthly-minutes

-- dan

On Thu, Jan 13, 2022 at 7:31 PM Neha Ojha  wrote:
>
> Hi everyone,
>
> This month's Ceph User + Dev Monthly meetup is next Thursday, January
> 20, 2022, 15:00-16:00 UTC. This time we would like to hear what users
> have to say about four themes of Ceph: Quality, Usability, Performance
> and Ecosystem. Any kind of feedback is welcome! Please feel free to
> add more topics to the agenda https://pad.ceph.com/p/ceph-roles-draft.
>
> Hope to see you there!
>
> Thanks,
> Neha
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io


[ceph-users] Re: dashboard fails with error code 500 on a particular file system

2022-01-20 Thread E Taka
Hello Ernesto,

I found the reason. One of the users had set a directory's permissions without
the +x bit (drw---). After running 'chmod 700', everything was OK again.
The MDS log did not help, but with the API call 'ls_dir?path=…' I was able
to iterate down to the directory with the wrong permissions.

IMHO this is not an urgent problem, but a user should not be able to
crash the admin's management interface.
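
In case it is useful to anyone else, a quick way to find such directories
(a sketch, assuming the file system is mounted at /mnt/cephfs) would be:

  # list directories that are missing the owner execute bit
  find /mnt/cephfs -type d ! -perm -u+x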

Thanks for your patience
Erich


Am Mi., 19. Jan. 2022 um 18:00 Uhr schrieb Ernesto Puerta <
epuer...@redhat.com>:

> Given the error returned from libcephfs is "cephfs.OSError: opendir
> failed: Permission denied [Errno 13]", it could be that the mgr doesn't
> have rights (ceph auth) to access the filesystem. Could you check the mds
> logs for any trace when the Dashboard error appears?
>
> Kind Regards,
> Ernesto
>
>
> On Wed, Jan 19, 2022 at 4:40 PM E Taka <0eta...@gmail.com> wrote:
>
>> Hello Ernesto,
>>
>> the commands worked without any problems, with Ubuntus 20.04 Ceph packages
>> and inside "cephadm shell". I tried all 55k directories of the filesystem.
>>
>> Best,
>> Erich
>>
>> Am Mo., 17. Jan. 2022 um 21:10 Uhr schrieb Ernesto Puerta <
>> epuer...@redhat.com>:
>>
>> > Hi E Taka,
>> >
>> > There's already a report of that issue in 16.2.5 (
>> > https://tracker.ceph.com/issues/51611), stating that it didn't happen
>> in
>> > 16.2.3 (so a regression then), but we couldn't reproduce it so far.
>> >
>> > I just tried creating a regular fresh cephfs filesystem (1 MDS), a
>> > directory inside it (via cephfs-shell) and I could access the directory
>> > from the dashboard with no issues. Is there anything specific on that
>> > deployment? The dashboard basically uses Python libcephfs
>> >  for
>> accessing
>> > Cephfs, so could you plz try the same and validate whether it works?
>> >
>> > >>> import cephfs
>> > >>> fs = cephfs.LibCephFS()
>> > >>> fs.conf_read_file('/etc/ceph/ceph.conf')
>> > >>> fs.mount(b'/', b'a')
>> > >>> fs.opendir('/test')
>> >
>> > # NO ERROR
>> >
>> >
>> > Kind Regards,
>> > Ernesto
>> >
>> >
>> > On Sun, Jan 16, 2022 at 11:26 AM E Taka <0eta...@gmail.com> wrote:
>> >
>> >> Dashboard → Filesystems → (filesystem name) → Directories
>> >>
>> >> fails on a particular file system with error "500 - Internal Server
>> >> Error".
>> >>
>> >> The log shows:
>> >>
>> >>  Jan 16 11:22:18 ceph00 bash[96786]:   File
>> >> "/usr/share/ceph/mgr/dashboard/services/cephfs.py", line 57, in opendir
>> >>  Jan 16 11:22:18 ceph00 bash[96786]: d = self.cfs.opendir(dirpath)
>> >>  Jan 16 11:22:18 ceph00 bash[96786]:   File "cephfs.pyx", line 942, in
>> >> cephfs.LibCephFS.opendir
>> >>  Jan 16 11:22:18 ceph00 bash[96786]: cephfs.OSError: opendir failed:
>> >> Permission denied [Errno 13]
>> >>  Jan 16 11:22:18 ceph00 bash[96786]: [:::10.149.249.237:47814]
>> [GET]
>> >> [500] [0.246s] [admin] [513.0B] /ui-api/cephfs/3/ls_dir
>> >>  Jan 16 11:22:18 ceph00 bash[96786]: [b'{"status": "500 Internal Server
>> >> Error", "detail": "The server encountered an unexpected condition which
>> >> prevented it from fulfilling the request.", "request_id":
>> >> "76727248-cf64-4b85-8630-8131e33832f8"}
>> >>
>> >> Do you have an idea what went wrong hore and how can I solve this
>> issue?
>> >>
>> >> Thanks!
>> >> Erich
>> >> ___
>> >> ceph-users mailing list -- ceph-users@ceph.io
>> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>


[ceph-users] Re: Disk Failure Prediction cloud module?

2022-01-20 Thread Yaarit Hatuka
Hi Jake,

diskprediction_cloud module is no longer available in Pacific.
There are efforts to enhance the diskprediction module, using our
anonymized device telemetry data, which is aimed at building a dynamic,
large, diverse, free and open data set to help data scientists create
accurate failure prediction models.

See more details:
https://ceph.io/en/users/telemetry/device-telemetry/
https://docs.ceph.com/en/latest/mgr/telemetry/

Please join these efforts by opting-in to telemetry with:
`ceph telemetry on`
or with the dashboard's wizard.
If for some reason you cannot or do not wish to opt in, please share the
reason with us.
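
If you want to see exactly what would be reported before opting in, you
can preview the report first (a sketch; the device preview command may
depend on your release):

  ceph telemetry show          # preview the basic report
  ceph telemetry show-device   # preview the device (SMART/health) channel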

Thanks,
Yaarit


On Thu, Jan 20, 2022 at 6:39 AM Jake Grimmett  wrote:

> Dear All,
>
> Is the cloud option for the diskprediction module deprecated in Pacific?
>
> https://docs.ceph.com/en/pacific/mgr/diskprediction/
>
> If so, are prophetstor still contributing data to the local module, or
> is this being updated by someone using data from Backblaze?
>
> Do people find this module useful?
>
> many thanks
>
> Jake
>
> --
> Dr Jake Grimmett
> Head Of Scientific Computing
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue,
> Cambridge CB2 0QH, UK.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>


[ceph-users] Re: Help Removing Failed Cephadm Daemon(s) - MDS Deployment Issue

2022-01-20 Thread Adam King
Hi Michael,

To clarify a bit further: "ceph orch rm" works for removing services and
"ceph orch daemon rm" works for removing daemons. In the command you ran

[ceph: root@osd16 /]# ceph orch rm "mds.cephmon03.local osd16.local
osd17.local osd18.local.onl26.drymjr"

the name you've given is the name of a daemon (note the ".drymjr" at the
end). To remove the service you must run "ceph orch rm" with the name of
the service as it appears in "ceph orch ls" output (whereas what you see
in "ceph orch ps" output are daemon names). Otherwise, it won't be able
to find the service and therefore can't delete it. Furthermore, as long
as the service is still present, cephadm will keep attempting to place
mds daemons until the placement matches the service spec (which for this
service is just count:2, so it will try to make sure there are 2 mds
daemons placed), and it will therefore replace the daemons you removed
with the "ceph orch daemon rm . . ." command. There is a way to make
cephadm not do this for a specific service by setting the unmanaged
field to True (see
https://docs.ceph.com/en/latest/cephadm/services/#ceph.deployment.service_spec.ServiceSpec.unmanaged),
which would allow you to remove the daemons without them being replaced,
but it won't get rid of the service itself, so I'd recommend continuing
to try the "ceph orch rm" command. If the service name you give matches
what is shown in the "ceph orch ls" output, it should remove the service
properly.

- Adam

On Thu, Jan 20, 2022 at 12:09 PM Poat, Michael  wrote:

> Hello Adam et al.,
>
>
>
> Thank you for the reply and suggestions. I will work on deploying the
> additional services using .yaml and the instructions you suggested. First I
> need to get rid of these two stuck daemons. The ‘ ceph orch rm’ command
> fails to remove the service. If I use ‘ceph orch daemon rm’ the service
> gets removed but reappears shortly later on another host. Also adding
> –force doesn’t change the outcome.
>
> Initial state, notice the one daemon is running on cephmon03 & the other
> on osd26:
>
> [ceph: root@osd16 /]# ceph orch ps | grep error
>
> mds.cephmon03.local osd16.local osd17.local osd18.local.cephmon03.talhqb
> cephmon03.local  error  3m ago 36m  
> docker.io/ceph/ceph:v15
>
> mds.cephmon03.local osd16.local osd17.local osd18.local.osd26.drymjr
> osd26.local  error  3m ago 46m  
> docker.io/ceph/ceph:v15
>
>
>
> Trying Adam’s suggestion:
>
> [ceph: root@osd16 /]# ceph orch rm "mds.cephmon03.local osd16.local
> osd17.local osd18.local.onl26.drymjr"
>
> Failed to remove service.  osd18.local.osd26.drymjr> was not found.
>
>
>
> Service gets removed:
>
> [ceph: root@osd16 /]# ceph orch daemon rm "mds.cephmon03.local
> osd16.local osd17.local osd18.local.osd26.drymjr"
>
> Removed mds.cephmon03.local osd16.local osd17.local
> osd18.local.osd26.drymjr from host 'osd26.local'
>
>
>
> Ceph will stay in this state with only 1 failed daemon for a while, if I
> remove the 2nd one they both come back:
>
> [ceph: root@osd16 /]# ceph orch ps | grep error
>
> mds.cephmon03.local osd16.local osd17.local osd18.local.cephmon03.talhqb
> cephmon03.local  error  4m ago 37m  
> docker.io/ceph/ceph:v15
>
>
>
> Removing the 2nd daemon:
>
> [ceph: root@osd16 /]# ceph orch daemon rm "mds.cephmon03.local
> osd16.local osd17.local osd18.local.cephmon03.talhqb"
>
> Removed mds.cephmon03.local osd16.local osd17.local
> osd18.local.cephmon03.talhqb from host 'cephmon03.local'
>
>
>
> After a few minutes….notice one daemon on cephmon03 & the other osd30.
> This seems random
>
> [ceph: root@osd16 /]# ceph orch ps | grep error
>
> mds.cephmon03.local osd16.local osd17.local osd18.local.cephmon03.pwtvcw
> cephmon03.local  error  53s ago12m  
> docker.io/ceph/ceph:v15
>
> mds.cephmon03.local osd16.local osd17.local osd18.local.osd30.ythzbh
> osd30.local  error  40s ago15m  
> docker.io/ceph/ceph:v15
>
>
>
> Any further suggestions are helpful.
>
>
>
> Thanks,
> -Michael
>
>
>
> *From:* Adam King 
> *Sent:* Wednesday, January 19, 2022 3:37 PM
> *To:* Poat, Michael 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] Help Removing Failed Cephadm Daemon(s) - MDS
> Deployment Issue
>
>
>
> Hello Michael,
>
>
>
> If you're trying to remove all the mds daemons in this mds
> "cephmon03.local osd16.local osd17.local osd18.local" I think the command
> would be "ceph orch rm "mds.cephmon03.local osd16.local osd17.local
> osd18.local"" (note the quotes around that mds.cepmon . . .
> since cephadm thinks this is the service id rather than the placement as I
> think you intended. I'd check "ceph orch apply mds -h" to see the order it
> takes the args). Also, as for getting the mds or any other daemon where you
> want them, I'd recommend taking a look at
> https://docs.ceph.com/en/pacific/cephadm/services/#service-specification
> 

[ceph-users] Re: cephfs: [ERR] loaded dup inode

2022-01-20 Thread Patrick Donnelly
Hi Frank,

On Tue, Jan 18, 2022 at 4:54 AM Frank Schilder  wrote:
>
> Hi Dan and Patrick,
>
> this problem seems to develop into a nightmare. I executed a find on the file 
> system and had some initial success. The number of stray files dropped by 
> about 8%. Unfortunately, this is about it. I'm running a find now also on 
> snap dirs, but I don't have much hope. There must be a way to find out what 
> is accumulating in the stray buckets. As I wrote in another reply to this 
> thread, I can't dump the trees:
>
> > I seem to have a problem. I cannot dump the mds tree:
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0/stray0'
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mds0' 0
> > root inode is not in cache
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 dump tree '~mdsdir' 0
> > root inode is not in cache
> >
> > [root@ceph-08 ~]# ceph daemon mds.ceph-08 get subtrees | grep path
> > "path": "",
> > "path": "~mds0",
> >
>
> However, this information is somewhere in rados objects and it should be 
> possible to figure something out similar to
>
> # rados getxattr --pool=con-fs2-meta1  parent | ceph-dencoder type 
> inode_backtrace_t import - decode dump_json
> # rados listomapkeys --pool=con-fs2-meta1 
>
> What OBJ_IDs am I looking for? How and where can I start to traverse the 
> structure? Version is mimic latest stable.

You mentioned you have snapshots? If you've deleted the directories
that have been snapshotted then they stick around in the stray
directory until the snapshot is deleted. There's no way to force
purging until the snapshot is also deleted. For this reason, the stray
directory size can grow without bound. You need to either upgrade to
Pacific where the stray directory will be fragmented or remove the
snapshots.
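
If it helps, a rough way to watch this (a sketch; the mds name is taken
from your earlier commands and the path is a placeholder): the stray
count shows up in the MDS perf counters, and snapshots are removed via
the .snap directory of the directory where they were taken:

  ceph daemon mds.ceph-08 perf dump mds_cache | grep strays
  rmdir /path/in/cephfs/.snap/<snapshot-name>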

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D



[ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

2022-01-20 Thread Igor Fedotov

Hello Jay!

Just refreshed my memory - the bug was introduced in 16.2.6 by 
https://github.com/ceph/ceph/pull/42956


So it was safe to apply quick-fix in 16.2.4, which explains why you're 
fine now.


And OSDs deployed by Pacific wouldn't suffer from it at all, as they've 
got the new omap format from the beginning and don't need the upgrade.
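
If you want to double-check that nothing will trigger the conversion
while you stay on 16.2.6, a quick sanity check (a sketch):

  ceph config get osd bluestore_fsck_quick_fix_on_mount   # should report false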



Thanks,

Igor


On 1/19/2022 8:35 PM, Jay Sullivan wrote:

Hello, Igor!

I just ran "ceph config set osd bluestore_warn_on_no_per_pg_omap true". No 
warnings yet. Would I need to restart any services for that to take effect?

When I upgraded from Nautilus to Pacific 16.2.4 in July and then 16.2.4 to 16.2.6 in 
mid-October I upgraded the OSD packages, restarted the OSD services, waited for 
HEALTH_OK, and then rebooted the OSD hosts one at a time. I've had a handful of drive 
failures since installing 16.2.6 so brand new OSDs would have started up with 
bluestore_fsck_quick_fix_on_mount set to "true" (not sure if that counts). But 
to answer your question, all of my OSDs have restarted at least once since upgrading to 
16.2.6 in the form of a full host reboot. I have not seen any unplanned OSD restarts.

I just checked one of my 16.2.6 clusters and I see the following (I think this particular 
cluster was "born" on Nautilus). I'll check my other 16.2.6 cluster in a bit as 
it's currently backfilling from a drive failure earlier this morning.
(S, per_pool_omap)
  32|2|
0001

Would the per-pg OMAP task have run during my upgrade from Nautilus/Octopus to 
Pacific 16.2.4? (I think YES.) Was the bug introduced specifically in 16.2.6 or 
was it present in older versions, too? Am I possibly spared because I upgraded 
to 16.2.4 first before the OMAP bug was introduced? Again, I have not seen any 
unplanned OSD restarts.

Thanks, Igor!!

~Jay

-Original Message-
From: Igor Fedotov 
Sent: Wednesday, January 19, 2022 8:24 AM
To: Jay Sullivan ; ceph-users@ceph.io
Subject: [ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

Hey Jay,

first of all I'd like to mention that there were two OMAP naming scheme 
modifications since Nautilus:

1) per-pool OMAPs

2) per-pg OMAPs

Both are applied during BlueStore repair/quick-fix. So may be you performed the 
first one but not the second.

You might want to set bluestore_warn_on_no_per_pg_omap to true and inspect ceph 
health alerts to learn if per-pg format is disabled in your cluster.

Alternatively you might want to inspect db manually against an offline OSD with 
ceph-kvstore-tool: ceph-kvstore-tool bluestore-kv  get S 
per_pool_omap:

(S, per_pool_omap)
  32    |2|
0001

Having ASCII '2' there means per-pg format is already applied. I can't explain 
why you're observing no issues if that's the case though...


As for your other questions - you can stay on 16.2.6 as far as you don't
run BlueStore repair/quick-fix - i.e. the relevant setting is false and
nobody runs relevant ceph-bluestore-tool commands  manually.


And you mentioned bluestore_fsck_quick_fix_on_mount was set to true for
until now - curious if you had any OSD restarts with that setting set to
true?


Thanks,

Igor

On 1/19/2022 4:28 AM, Jay Sullivan wrote:

https://tracker.ceph.com/issues/53062

Can someone help me understand the scope of the OMAP key bug linked above? I’ve 
been using 16.2.6 for three months and I don’t _think_ I’ve seen any related 
problems.

I upgraded my Nautilus (then 14.2.21) clusters to Pacific (16.2.4) in mid-June. 
One of my clusters was born in Jewel and has made all of the even-numbered 
releases to Pacific. I skipped over 16.2.5 and upgraded to 16.2.6 in 
mid-October. It looks like the aforementioned OMAP bug was discovered shortly 
after, on/around October 20th. My clusters had 
bluestore_fsck_quick_fix_on_mount set to true until about 10 minutes ago. I 
_think_ all of my OSDs did the OMAP conversion when I upgraded from Nautilus to 
Pacific back in June (I remember it taking several minutes per spinning OSD).

Questions for my sanity:

*   Do I need to upgrade to 16.2.7 ASAP? Or can I wait until my next 
regular maintenance window?
*   What is the risk of staying on 16.2.6 if I have 
bluestore_fsck_quick_fix_on_mount set to false?
*   If I don’t have OSDs crashing, how would I know if I was impacted by 
the bug?

Thanks! ❤

~Jay

--
Jay Sullivan
Rochester Institute of Technology
jay.sulli...@rit.edu



--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx


[ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

2022-01-20 Thread Igor Fedotov
So, in other words, it is only unsafe to apply quick-fix/repair while 
running 16.2.6. You're safe if you applied it before, or if the OSDs were 
newly deployed with v16 (even 16.2.6).


Igor

On 1/20/2022 5:22 PM, Jay Sullivan wrote:

I just verified the following in my other 16.2.6 cluster:
(S, per_pool_omap)
  32|2|
0001

I set noout, stopped the OSD service, ran the "ceph-kvstore-tool bluestore-kv 
 get S per_pool_omap" command, and started the OSD back up.

Looking back through my change history, I had manually set bluestore_fsck_quick_fix_on_mount to 
"true" right before my upgrade from Nautilus 14.2.20 to Pacific 16.2.4. Per the 
"holy war" (lol) referenced below and elsewhere, I figured I'd just get the OMAP 
conversion out of the way while I was already in a maintenance window:
https://tracker.ceph.com/issues/45265

My OSDs are not crashing on startup. I would have seen something by now, right? 
How was I spared? This specific cluster is mostly RBD (79M objects for 295TB 
data) with a little RGW (65k objects for 200GB data) and a very small 
non-production CephFS volume (1GB).

~Jay

-Original Message-
From: Jay Sullivan
Sent: Wednesday, January 19, 2022 12:36 PM
To: 'Igor Fedotov' ; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

Hello, Igor!

I just ran "ceph config set osd bluestore_warn_on_no_per_pg_omap true". No 
warnings yet. Would I need to restart any services for that to take effect?

When I upgraded from Nautilus to Pacific 16.2.4 in July and then 16.2.4 to 16.2.6 in 
mid-October I upgraded the OSD packages, restarted the OSD services, waited for 
HEALTH_OK, and then rebooted the OSD hosts one at a time. I've had a handful of drive 
failures since installing 16.2.6 so brand new OSDs would have started up with 
bluestore_fsck_quick_fix_on_mount set to "true" (not sure if that counts). But 
to answer your question, all of my OSDs have restarted at least once since upgrading to 
16.2.6 in the form of a full host reboot. I have not seen any unplanned OSD restarts.

I just checked one of my 16.2.6 clusters and I see the following (I think this particular 
cluster was "born" on Nautilus). I'll check my other 16.2.6 cluster in a bit as 
it's currently backfilling from a drive failure earlier this morning.
(S, per_pool_omap)
  32|2|
0001

Would the per-pg OMAP task have run during my upgrade from Nautilus/Octopus to 
Pacific 16.2.4? (I think YES.) Was the bug introduced specifically in 16.2.6 or 
was it present in older versions, too? Am I possibly spared because I upgraded 
to 16.2.4 first before the OMAP bug was introduced? Again, I have not seen any 
unplanned OSD restarts.

Thanks, Igor!!

~Jay

-Original Message-
From: Igor Fedotov 
Sent: Wednesday, January 19, 2022 8:24 AM
To: Jay Sullivan ; ceph-users@ceph.io
Subject: [ceph-users] Re: Scope of Pacific 16.2.6 OMAP Keys Bug?

Hey Jay,

first of all I'd like to mention that there were two OMAP naming scheme 
modifications since Nautilus:

1) per-pool OMAPs

2) per-pg OMAPs

Both are applied during BlueStore repair/quick-fix. So may be you performed the 
first one but not the second.

You might want to set bluestore_warn_on_no_per_pg_omap to true and inspect ceph 
health alerts to learn if per-pg format is disabled in your cluster.

Alternatively you might want to inspect db manually against an offline OSD with 
ceph-kvstore-tool: ceph-kvstore-tool bluestore-kv  get S 
per_pool_omap:

(S, per_pool_omap)
  32    |2|
0001

Having ASCII '2' there means per-pg format is already applied. I can't explain 
why you're observing no issues if that's the case though...


As for your other questions - you can stay on 16.2.6 as far as you don't run 
BlueStore repair/quick-fix - i.e. the relevant setting is false and nobody runs 
relevant ceph-bluestore-tool commands  manually.


And you mentioned bluestore_fsck_quick_fix_on_mount was set to true for until 
now - curious if you had any OSD restarts with that setting set to true?


Thanks,

Igor

On 1/19/2022 4:28 AM, Jay Sullivan wrote:

https://tracker.ceph.com/issues/53062

Can someone help me understand the scope of the OMAP key bug linked above? I’ve 
been using 16.2.6 for three months and I don’t _think_ I’ve seen any related 
problems.

I upgraded my Nautilus (then 14.2.21) clusters to Pacific (16.2.4) in mid-June. 
One of my clusters was born in Jewel and has made all of the even-numbered 
releases to Pacific. I skipped over 16.2.5 and upgraded to 16.2.6 in 
mid-October. It looks like the aforementioned OMAP bug was discovered shortly 
after, on/around October 20th. My clusters had 
bluestore_fsck_quick_fix_on_mount set to true until about 10 minutes ago. I 
_think_ all of my OSDs did the OMAP conversion when I upgraded from Nautilus to 
Pacific back in June (I remember it taking several minutes per spinning OSD).

[ceph-users] MDS Journal Replay Issues / Ceph Disaster Recovery Advice/Questions

2022-01-20 Thread Alex Jackson
 Hello Ceph Users,

I wanted to hopefully get some advice or at least get some questions
answered about the Ceph Disaster Recovery Process detailed in the docs. The
questions I have are as follows:

- Do all the steps need to be performed or can I check the status of the
MDS after each until it recovers?

- What does the journal truncate do? The name suggests it truncates part
of the journal, but from the warnings it sounds like it might cause some
unexpected data to be deleted, or delete the journal entirely.

- Where would I use the data stored from recover_dentries to rebuild the
metadata?

- What sorts of information would an "expert" need to perform a successful
disaster recovery?


Other than questions, I was hoping to get some advice on my situation and
whether I even need disaster recovery.

I recently had a power blip reset the ceph servers and they came back
barking about the CephFS MDSs being unable to start. The status listed
UP:replay. Upon further investigation, there seemed to be issues with the
journal and the MDS log had some errors in replaying.

The somewhat abridged log can be found here (abridged because it spits out
the same stuff): https://pastebin.com/FkypNkSZ

The main errors lines in my mind, though, are:

Jan 19 13:28:26 nxpmn01 ceph-mds[313765]: -3> 2022-01-19T13:28:26.091-0500
7f80a0ba7700 -1 log_channel(cluster) log [ERR] : journal replay inotablev
mismatch 2 -> 2417

Jan 19 13:28:26 nxpmn01 ceph-mds[313765]: -2> 2022-01-19T13:28:26.091-0500
7f80a0ba7700 -1 log_channel(cluster) log [ERR] : EMetaBlob.replay
sessionmap v 1160787 - 1 > table 0


Everything I've found online says I might need a journal truncate. I was
hoping it wouldn't come to that, though, as I'm not an "expert" as
mentioned in the Disaster Recovery docs.
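
For reference, the sequence I keep seeing recommended before any
truncation (a sketch based on the docs, assuming my single rank 0 on the
file system named 'cephfs'):

  # back up and inspect the journal before touching anything
  cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
  cephfs-journal-tool --rank=cephfs:0 journal inspect
  cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary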

Relevant info about my Ceph setup:

- 3 servers running Proxmox 6.4-13 and Ceph 15.2.10

- ceph -s returns:

  cluster:
    id:     642c8584-f642-4043-a43d-a984bbf75603
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            99 daemons have recently crashed

  services:
    mon: 3 daemons, quorum nxpmn01,nxpmn02,nxpmn03 (age 5d)
    mgr: nxpmn02(active, since 9d), standbys: nxpmn03, nxpmn01
    mds: cephfs:1/1 {0=nxpmn01=up:replay(laggy or crashed)}
    osd: 18 osds: 18 up (since 5d), 18 in (since 3w)

  data:
    pools:   5 pools, 209 pgs
    objects: 4.25M objects, 16 TiB
    usage:   23 TiB used, 28 TiB / 51 TiB avail
    pgs:     209 active+clean

- OSDs are UP and IN

- To my knowledge CephFS has only 1 rank (rank 0?)


Thanks


[ceph-users] ceph-mon is low on available space

2022-01-20 Thread Michel Niyoyita
Dear Team ,

I have a warning on my cluster, which I deployed using Ansible on Ubuntu
20.04 with the Pacific Ceph version. It says:

root@ceph-mon1:~# ceph health detail
HEALTH_WARN mon ceph-mon1 is low on available space
[WRN] MON_DISK_LOW: mon ceph-mon1 is low on available space
mon.ceph-mon1 has 28% avail

Can anyone help me solve this? Which command should I run to solve this
problem? I tried 'ceph tell mon.ceph-mon1 compact', but without the
desired result.

Kindly advise

Michel