[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li

Hi Dejan,

This is a known issue; please see https://tracker.ceph.com/issues/61009.

For the workaround please see https://tracker.ceph.com/issues/61009#note-26.
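The gist of it is to stop the kernel clients from using async dirops, i.e. mount them with wsync instead of the default nowsync. A rough sketch (monitor addresses, user, secret file and mount point are placeholders):

# remount each kernel client with synchronous directory operations
umount /mnt/cephfs
mount -t ceph mon1,mon2,mon3:/ /mnt/cephfs -o name=client1,secretfile=/etc/ceph/client1.secret,wsync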

Thanks

- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:

Hello,

We have a CephFS file system with two active MDS. Currently rank 1 is repeatedly crashing with 
FAILED ceph_assert(p->first <= start) in the md_log_replay thread. Is there any way 
to work around this and get back to an accessible file system, or should we start with 
disaster recovery?
It seems similar to https://tracker.ceph.com/issues/61009
Crash info:

{
 "assert_condition": "p->first <= start",
 "assert_file": 
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h",
 "assert_func": "void interval_set::erase(T, T, std::function) 
[with T = inodeno_t; C = std::map]",
 "assert_line": 568,
 "assert_msg": 
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h:
 In function 'void interval_set::erase(T, T, std::function) [with T = inodeno_t; C = 
std::map]' thread 7fcdaaf8a640 time 
2024-05-08T00:26:22.049974+0200\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.2/rpm/el9/BUILD/ceph-18.2.2/src/include/interval_set.h:
 568: FAILED ceph_assert(p->first <= start)\n",
 "assert_thread_name": "md_log_replay",
 "backtrace": [
 "/lib64/libc.so.6(+0x54db0) [0x7fcdb7a54db0]",
 "/lib64/libc.so.6(+0xa154c) [0x7fcdb7aa154c]",
 "raise()",
 "abort()",
 "(ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x188) [0x7fcdb83610ff]",
 "/usr/lib64/ceph/libceph-common.so.2(+0x161263) [0x7fcdb8361263]",
 "/usr/bin/ceph-mds(+0x1f3b0e) [0x55a5904a9b0e]",
 "/usr/bin/ceph-mds(+0x1f3b55) [0x55a5904a9b55]",
 "(EMetaBlob::replay(MDSRank*, LogSegment*, int, MDPeerUpdate*)+0x4b9d) 
[0x55a5906e1c8d]",
 "(EUpdate::replay(MDSRank*)+0x5d) [0x55a5906eacbd]",
 "(MDLog::_replay_thread()+0x7a1) [0x55a590694af1]",
 "/usr/bin/ceph-mds(+0x1460f1) [0x55a5903fc0f1]",
 "/lib64/libc.so.6(+0x9f802) [0x7fcdb7a9f802]",
 "/lib64/libc.so.6(+0x3f450) [0x7fcdb7a3f450]"
 ],
 "ceph_version": "18.2.2",
 "crash_id": 
"2024-05-07T22:26:22.050652Z_8be89ffb-bb87-4832-9339-57f8bd29f766",
 "entity_name": "mds.spod19",
 "os_id": "almalinux",
 "os_name": "AlmaLinux",
 "os_version": "9.3 (Shamrock Pampas Cat)",
 "os_version_id": "9.3",
 "process_name": "ceph-mds",
 "stack_sig": 
"3d0a2ca9b3c7678bf69efc20fff42b588c63f8be1832e1e0c28c99bafc082c15",
 "timestamp": "2024-05-07T22:26:22.050652Z",
 "utsname_hostname": "spod19.ijs.si",
 "utsname_machine": "x86_64",
 "utsname_release": "5.14.0-362.8.1.el9_3.x86_64",
 "utsname_sysname": "Linux",
 "utsname_version": "#1 SMP PREEMPT_DYNAMIC Tue Nov 7 14:54:22 EST 2023"
}


Cheers,
Dejan


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Dejan Lesjak

Hi Xiubo,

On 8. 05. 24 09:53, Xiubo Li wrote:

Hi Dejan,

This is a known issue and please see https://tracker.ceph.com/issues/61009.

For the workaround please see 
https://tracker.ceph.com/issues/61009#note-26.


Thank you for the links. Unfortunately I'm not sure I understand the 
workaround: the clients should be mounted without nowsync; however, the 
clients never get to the point of mounting, because the MDS is not available 
yet as it is still doing replay.
Rebooting the clients does not seem to help either, as they still show up in the 
client list (from "ceph tell mds.1 client ls").


Thanks,
Dejan


Thanks

- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:

Hello,

We have a CephFS file system with two active MDS. Currently rank 1 is repeatedly 
crashing with FAILED ceph_assert(p->first <= start) in the md_log_replay 
thread. Is there any way to work around this and get back to an accessible file 
system, or should we start with disaster recovery?

It seems similar to https://tracker.ceph.com/issues/61009


Cheers,
Dejan




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem with take-over-existing-cluster.yml playbook

2024-05-08 Thread Eugen Block

Hi,

I'm not familiar with ceph-ansible, and I'm not sure if I understand it 
correctly, but according to [1] it tries to derive the monitor addresses from 
the public network range. Can you verify that the mon sections in your 
/etc/ansible/hosts are correct?


ansible.builtin.set_fact:
  _monitor_addresses: "{{ _monitor_addresses | default({}) | combine({item: hostvars[item]['ansible_facts']['all_ipv4_addresses'] | ips_in_ranges(hostvars[item]['public_network'].split(',')) | first}) }}"


[1]  
https://github.com/ceph/ceph-ansible/blob/878cce5b4847a9a112f9d07c0fd651aa15f1e58b/roles/ceph-facts/tasks/set_monitor_address.yml
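
An untested sketch of what I would expect to be needed for that lookup to work, i.e. public_network defined for the mon hosts (variable names as in ceph-ansible's group_vars samples; adjust the network and interface to your environment):

# group_vars/all.yml
public_network: "192.168.122.0/24"
monitor_interface: eth0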


Zitat von vladimir franciz blando :


I know that only a few are using this script, but I'm trying my luck here
in case someone has the same issue as mine.

But first, who has successfully used this script, and what version did you
use? I'm using this guide on my test environment:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/installation_guide_for_red_hat_enterprise_linux/importing-an-existing-ceph-cluster-to-ansible

Error encountered
---
TASK [Generate ceph configuration file] *****************************************
fatal: [vladceph-1]: FAILED! =>
  msg: '''_monitor_addresses'' is undefined. ''_monitor_addresses'' is
undefined'
fatal: [vladceph-3]: FAILED! =>
  msg: '''_monitor_addresses'' is undefined. ''_monitor_addresses'' is
undefined'
fatal: [vladceph-2]: FAILED! =>
  msg: '''_monitor_addresses'' is undefined. ''_monitor_addresses'' is
undefined'
---



Regards,
Vlad Blando 



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to define a read-only sub-user?

2024-05-08 Thread Matthew Darwin

Hi,

I'm new to bucket policies. I'm trying to create a sub-user that has 
only read-only access to all the buckets of the main user. With the 
policy below, I can't create or delete files, but I can still 
create buckets using "rclone mkdir". Any idea what I'm doing wrong?


I'm using ceph quincy.

radosgw-admin subuser create --uid=main_user --subuser=eosn_read 
--access=read
radosgw-admin key create --subuser=main_user:sub_user --key-type=s3 
--access-key  --secret-key 

s3cmd setpolicy policy.txt  s3://somebucket

{
  "Version": "2012-10-17",
  "Statement": [
    {
  "Effect": "Allow",
  "Principal": {
    "AWS": [
  "arn:aws:iam:::user/main_user:sub_user"
    ]
  },
  "Action": [
    "s3:ListBucket",
    "s3:ListAllMyBuckets",
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:GetObjectTagging",
    "s3:GetObjectRetention",
    "s3:GetObjectLegalHold"
  ],
  "Resource": "arn:aws:s3:::*"
    }
  ]
}
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph dashboard reef 18.2.2 radosgw

2024-05-08 Thread Christopher Durham
Hello,
I am using 18.2.2 on Rocky 8 Linux.

I am getting HTTP error 500 when trying to hit the ceph dashboard on reef 18.2.2 
and looking at any of the radosgw pages.
I tracked this down to /usr/share/ceph/mgr/dashboard/controllers/rgw.py.
It appears to parse the metadata for a given radosgw server improperly. In my 
various rgw ceph.conf entries, I have:
rgw frontends = beast ssl_endpoint=0.0.0.0 
ssl_certificate=/path/to/pem_with_cert_and_key
but rgw.py pulls the metadata for each server and looks for 'port=' 
in it. When it doesn't find it (based on line 147 in 
rgw.py), the ceph-mgr logs throw an exception which the manager proper catches 
and returns a 500.
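If the dashboard only needs an explicit port to parse, I assume the fix on my side would be to add one to the endpoint, roughly like this (untested; 443 is just an example):

rgw frontends = beast ssl_endpoint=0.0.0.0:443 ssl_certificate=/path/to/pem_with_cert_and_key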
Would changing my frontends definition like that work? Is this a known issue? I have had this 
frontends definition for a while, from before my reef upgrade. Thanks
-Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph User + Community Meeting and Survey [May 23]

2024-05-08 Thread Noah Lehman
Hi Ceph users and devs,

We've announced the next community meeting and survey on Ceph social media
and would appreciate help promoting. The next meeting will be Thursday, May
23rd at 10PM EDT. Details can be found here.

It has also been posted on Ceph Twitter/X and LinkedIn.


*Language:*

Our next Ceph User + Developer Monthly Meeting is coming May 23! The goal
of these meetings is to elicit feedback from the users, companies, and
organizations who use #Ceph in their production environments.


Take the survey and share your voice: https://t.co/xGAN5FlGDu


Thanks!

Noah
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph User + Community Meeting and Survey [May 23]

2024-05-08 Thread Laura Flores
Thanks Noah!

We've already gotten a few responses. Let's keep it up!

Take the survey here:
https://docs.google.com/forms/d/e/1FAIpQLSet7HyqfREYCSYZxA1ggvBchDN7GZh1av4WG86MLbVK1gyhaw/viewform?usp=sf_link

- Laura

On Wed, May 8, 2024 at 2:16 PM Noah Lehman 
wrote:

> Hi Ceph users and devs,
>
> We've announced the next community meeting and survey on Ceph social media
> and would appreciate help promoting. The next meeting will be Thursday, May
> 23rd at 10PM EDT. Details can be found here.
>
> It has also been posted on Ceph Twitter/X and LinkedIn.
>
>
> *Language:*
>
> Our next Ceph User + Developer Monthly Meeting is coming May 23! The goal
> of these meetings is to elicit feedback from the users, companies, and
> organizations who use #Ceph in their production environments.
>
>
> Take the survey and share your voice: https://t.co/xGAN5FlGDu
>
>
> Thanks!
>
> Noah
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Guidance on using large RBD volumes - NTFS

2024-05-08 Thread Robert W. Eckert
Hi - I managed to improve my throughput somewhat by recreating the RBD image 
with a larger object size (I chose 16 MB, not from any science but on a gut 
feel) and changing the stripe count to 4. This seemed to roughly double the 
performance, from an average of 8-12 MB/s write to 16-24 MB/s.
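For reference, the recreate was roughly the following (the stripe unit is from memory, so treat this as a sketch):

rbd create rbd/winshare --size 8T --object-size 16M --stripe-unit 4M --stripe-count 4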
I also changed the pool to use the PG autoscaler, and it has spent the past day or so 
reducing the number of PGs to what appears to be a target of 64. I am now 
seeing higher write speeds, from 40 MB/s to as high as 100 MB/s.

The average response time shown in Windows Task Manager appears to be 'off' - I 
have seen it jump from 500 ms to over 300 seconds and back to 3 seconds within 
a few refreshes.

Windows resource monitor is showing a more consistent response time on multiple 
parallel writes of about 1.5-3 seconds per write.



-Original Message-
From: Robert W. Eckert  
Sent: Tuesday, May 7, 2024 8:36 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Guidance on using large RBD volumes - NTFS

Hi - at home I have been running CephFS for a few years with reasonably good 
performance; however, exposing CephFS via SMB has been hit and miss. So I thought 
I could carve out space for an RBD device to share from a Windows machine.


My setup:

Ceph 18.2.2 deployed using cephadm

4 servers running RHEL 9 on AMD 5600G CPUs
64 GB each
10 GbE NICs
4x 4 TB HDD
1x 2 TB NVMe for DB/WAL
rbd pool is set to PG autoscale - it's currently at 256 PGs

I have tested the NIC connections between the servers and my PC, and each 
point-to-point link works well at 10 GbE speeds.

Now the problem

I created an 8 TB RBD using

rbd create winshare --size 8T --pool rbd
rbd map winshare

I prepped the drive and formatted it, and it appears cleanly as an 8 TB 
drive.

When I ran fio on the drive/volume, speeds were good, around 150-200 MB/s.

Then I started trying to populate the drive from a few different sources, and 
performance took a nose dive. Write speeds are about 6-10 MB/s, and Windows 
Task Manager shows an average response time anywhere from 500 ms to 30 seconds - 
mostly around 4 seconds.


I don't see any obvious bottlenecks - CPU on the servers is about 5-10%, 
memory is fine, and the network is showing under 1 Gb/s on all servers.


I am wondering if I need to use different parameters when creating the volume, 
or is there a practical limit to the volume size that I exceeded?

Thanks,

Rob

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to handle incomplete data after rbd import-diff failure?

2024-05-08 Thread Satoru Takeuchi
On Thu, May 2, 2024 at 7:42 Satoru Takeuchi wrote:
>
> Hi Maged,
>
> On Thu, May 2, 2024 at 5:34 Maged Mokhtar wrote:
>>
>>
>> On 01/05/2024 16:12, Satoru Takeuchi wrote:
>> > I confirmed that incomplete data is left on `rbd import-diff` failure.
>> > I guess that this data is the part of snapshot. Could someone answer
>> > me the following questions?
>> >
>> > Q1. Is it safe to use the RBD image (e.g. client I/O and snapshot
>> > management) even though incomplete data exists?
>> > Q2. Is there any way to clean up the incomplete data?
>> >
>> > I read the following document and understand that this problem will be
>> > resolved after running `rbd import-diff` again.
>> >
>> > https://ceph.io/en/news/blog/2013/incremental-snapshots-with-rbd/
>> >> Since overwriting the same data is idempotent, it’s safe to have an 
>> >> import-diff interrupted in the middle.
>> > However, it's difficult if I can't access the exported backup data
>> > anymore. For instance, I'm afraid of the following scenario.
>> >
>> > 1. Send the backup data from one DC (DC0) to another DC (DC1) periodically.
>> > 2. The backup data is created in DC0 and is sent directly to DC1
>> > without persist backup data as a file.
>> > 3. Major power outage happens in DC0 and it's impossible to
>> > re-generate the backup data for  a long time.
>> >
>> > I simulated this problem as follows:
>> >
>> > 1. Create an RBD image.
>> > 2. Write some data to this image.
>> > 3. Create a snapshot S0.
>> > 4. Write another data to this image.
>> > 5. Create a snapshot S1.
>> > 6. Create a backup data consists of the difference between S0 and S1
>> > by running rbd export-diff.
>> > 7. Delete the last byte of the backup data, which is 'e' and means the
>> > end of the backup data, to inject import-diff failure.
>> > 8. Delete S1.
>> > 9. Run rbd import-diff to apply the broken backup data created in the step 
>> > 7.
>> >
>> > Then step9 failed and S1 was not created. However, the number of RADOS
>> > objects and the storage usage has increased.
>> >
>> > before
>> > ```
>> > $ rados -p replicapool df
>> > POOL_NAME  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY
>> > UNFOUND  DEGRADED  RD_OPS  RD  WR_OPS  WR  USED COMPR  UNDER
>> > COMPR
>> > replicapool  11 MiB   24   9  24   0
>> >   0 03609  53 MiB 279  41 MiB 0 B  0 B
>> >
>> > total_objects24
>> > total_used   39 MiB
>> > total_avail  32 GiB
>> > total_space  32 GiB
>> > ```
>> >
>> > after:
>> > ```
>> > $ rados -p replicapool df
>> > POOL_NAME  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY
>> > UNFOUND  DEGRADED  RD_OPS  RD  WR_OPS  WR  USED COMPR  UNDER
>> > COMPR
>> > replicapool  12 MiB   25   9  25   0
>> >   0 03531  53 MiB 278  41 MiB 0 B  0 B
>> >
>> > total_objects25
>> > total_used   40 MiB
>> > total_avail  32 GiB
>> > total_space  32 GiB
>> > ```
>> >
>> > The incomplete data seem to increase if rbd import-diff fails again
>> > and again. The following output was get after the above-mentioned
>> > step9 100 times.
>> >
>> > ```
>> > $ rados -p replicapool df
>> > POOL_NAME  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY
>> > UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS   WR  USED COMPR  UNDER
>> > COMPR
>> > replicapool  12 MiB   25   9  25   0
>> >   0 07925  104 MiB1308  164 MiB 0 B  0
>> > B
>> >
>> > total_objects25
>> > total_used   58 MiB
>> > total_avail  32 GiB
>> > total_space  32 GiB
>> > ```
>> >
>> > Thanks,
>> > Satoru
>>
>> the image is not in a consistent state so should not be used as is. if
>> you no longer have access to the source image or its exported data, you
>> should be able to use the rbd snap rollback command to rollback the
>> destination image to its last  known good snapshot, the destination
>> snapshots get created from the import-diff command with names matching
>> source snapshots.
>
>
> Thank you for the reply. I succeeded in rolling back the rbd image to S0 and 
> `total_objects` went back to the previous value (24).
>
> On the other hand, `total_used` didn't return to the original value. Repeating 
> the following steps resulted in continuous growth of `total_used`.
>
> 1. Import the broken diff (it fails).
> 2. Rollback to S0.
>
> I guess it's a resource leak.
>
> Could you tell me whether I can clean up these remaining garbage data?

I verified the behavior of rollback after an rbd import-diff failure, and the
garbage data seems to disappear. I opened a new issue to find out whether the
garbage data disappears in all cases.

https://tracker.ceph.com/issues/65873
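
For the record, the simulation and rollback were roughly the following (pool and image names simplified, data writes elided):

rbd create replicapool/test --size 1G
# ...write some data...
rbd snap create replicapool/test@S0
# ...write more data...
rbd snap create replicapool/test@S1
rbd export-diff --from-snap S0 replicapool/test@S1 diff.bin
truncate -s -1 diff.bin                      # drop the trailing 'e' record to break the diff
rbd snap rm replicapool/test@S1
rbd import-diff diff.bin replicapool/test    # fails as expected
rbd snap rollback replicapool/test@S0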

Thanks again, Maged, for answering my question.

Best,
Satoru
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li


On 5/8/24 17:36, Dejan Lesjak wrote:

Hi Xiubo,

On 8. 05. 24 09:53, Xiubo Li wrote:

Hi Dejan,

This is a known issue and please see 
https://tracker.ceph.com/issues/61009.


For the workaround please see 
https://tracker.ceph.com/issues/61009#note-26.


Thank you for the links. Unfortunately I'm not sure I understand the 
workaround: the clients should be mounted without nowsync; however, 
the clients never get to the point of mounting, because the MDS is not available 
yet as it is still doing replay.
Rebooting the clients does not seem to help either, as they still show up in the 
client list (from "ceph tell mds.1 client ls").



Hi Dejan,

We are discussing the same issue in the Slack thread 
https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1715189877518529.


Thanks

- Xiubo



Thanks,
Dejan


Thanks

- Xiubo

On 5/8/24 06:49, Dejan Lesjak wrote:

Hello,

We have a CephFS file system with two active MDS. Currently rank 1 is repeatedly 
crashing with FAILED ceph_assert(p->first <= start) in the md_log_replay 
thread. Is there any way to work around this and get back to an accessible 
file system, or should we start with disaster recovery?

It seems similar to https://tracker.ceph.com/issues/61009


Cheers,
Dejan






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph dashboard reef 18.2.2 radosgw

2024-05-08 Thread Nizamudeen A
Hello Christopher,

Could you please paste the logs and exceptions to this thread as well?

Regards,
Nizam

On Wed, May 8, 2024 at 11:21 PM Christopher Durham 
wrote:

> Hello,
> I am using 18.2.2 on Rocky 8 Linux.
>
> I am getting HTTP error 500 when trying to hit the ceph dashboard on reef
> 18.2.2 and looking at any of the radosgw pages.
> I tracked this down to /usr/share/ceph/mgr/dashboard/controllers/rgw.py.
> It appears to parse the metadata for a given radosgw server improperly. In
> my various rgw ceph.conf entries, I have:
> rgw frontends = beast ssl_endpoint=0.0.0.0
> ssl_certificate=/path/to/pem_with_cert_and_key
> but rgw.py pulls the metadata for each server and looks for
> 'port=' in it. When it doesn't find it (based on
> line 147 in rgw.py), the ceph-mgr logs throw an exception which the manager
> proper catches and returns a 500.
> Would changing my frontends definition work? Is this known? I have had the
> frontends definition for a while, from before my reef upgrade. Thanks
> -Chris
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Forcing Posix Permissions On New CephFS Files

2024-05-08 Thread duluxoz

Hi All,

I've gone and gotten myself into a "can't see the forest for the trees" 
state, so I'm hoping someone can take pity on me and answer a really dumb Q.


So I've got a CephFS system happily bubbling along and a bunch of 
(Linux) workstations connected to a number of common shares/folders. To 
take a single one of these folders as an example ("music"), the 
sub-folders and files of that share all belong to root:music with 
permissions of 2770 (folders) and 0660 (files). The "music" folder is 
then mounted (as per the Ceph docs for mount.ceph) via each 
workstation's fstab file - all good, all working, everyone's happy.
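
For reference, each workstation's fstab entry looks roughly like this (monitor names, user and paths are illustrative):

mon1,mon2,mon3:/music  /mnt/music  ceph  name=musicuser,secretfile=/etc/ceph/musicuser.secret,_netdev  0 0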


What I'm trying to achieve is that when a new piece of music (a file) is 
uploaded to the Ceph cluster, the file inherits the music share's default 
ownership (root:music) and permissions (0660). What is happening at the 
moment is that I'm getting permissions of 644 (and 755 for new folders).


I've been looking for a way to do what I want but, as I said, I've gone 
and gotten myself thoroughly mixed-up.


Could someone please point me in the right direction on how to achieve 
what I'm after - thanks


Cheers

Dulux-Oz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io