[ceph-users] Re: Status of IPv4 / IPv6 dual stack?

2024-04-23 Thread Marc
> I have removed dual-stack-mode-related information from the documentation
> on the assumption that dual-stack mode was planned but never fully
> implemented.
> 
> See https://tracker.ceph.com/issues/65631.
> 
> See https://github.com/ceph/ceph/pull/57051.
> 
> Hat-tip to Dan van der Ster, who bumped this thread for me.

"I will remove references to dual-stack mode in the documentation because i"

I would prefer it to state explicitly that dual-stack is not supported (and 
maybe why). By default I would assume such a thing is supported; I would not 
even suspect it could be an issue.
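
For reference, a minimal sketch of how a single-stack (IPv6-only) setup is 
usually pinned down via the messenger bind options; the option names are the 
standard ms_bind_* settings, and the network prefixes are made-up example values:

[global]
# bind only to IPv6 and disable IPv4 entirely (single-stack)
ms_bind_ipv4 = false
ms_bind_ipv6 = true
# example IPv6 prefixes for the public and cluster networks
public_network = fd00:1::/64
cluster_network = fd00:2::/64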


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stuck in replay?

2024-04-23 Thread Lars Köppel
Hi Erich,

great that you recovered from this.
It sounds like you had the same problem I had a few months ago.
mds crashes after up:replay state - ceph-users - lists.ceph.io


Kind regards,
Lars


[image: ariadne.ai Logo] Lars Köppel
Developer
Email: lars.koep...@ariadne.ai
Phone: +49 6221 5993580 <+4962215993580>
ariadne.ai (Germany) GmbH
Häusserstraße 3, 69115 Heidelberg
Amtsgericht Mannheim, HRB 744040
Geschäftsführer: Dr. Fabian Svara
https://ariadne.ai


On Mon, Apr 22, 2024 at 11:31 PM Sake Ceph  wrote:

> 100 GB of Ram! Damn that's a lot for a filesystem in my opinion, or am I
> wrong?
>
> Kind regards,
> Sake
>
> > Op 22-04-2024 21:50 CEST schreef Erich Weiler :
> >
> >
> > I was able to start another MDS daemon on another node that had 512GB
> > RAM, and then the active MDS eventually migrated there, and went through
> > the replay (which consumed about 100 GB of RAM), and then things
> > recovered.  Phew.  I guess I need significantly more RAM in my MDS
> > servers...  I had no idea the MDS daemon could require that much RAM.
> >
> > -erich
> >
> > On 4/22/24 11:41 AM, Erich Weiler wrote:
> > > possibly but it would be pretty time consuming and difficult...
> > >
> > > Is it maybe a RAM issue since my MDS RAM is filling up?  Should maybe
> I
> > > bring up another MDS on another server with huge amount of RAM and
> move
> > > the MDS there in hopes it will have enough RAM to complete the replay?
> > >
> > > On 4/22/24 11:37 AM, Sake Ceph wrote:
> > >> Just a question: is it possible to block or disable all clients? Just
> > >> to prevent load on the system.
> > >>
> > >> Kind regards,
> > >> Sake
> > >>> Op 22-04-2024 20:33 CEST schreef Erich Weiler :
> > >>>
> > >>> I also see this from 'ceph health detail':
> > >>>
> > >>> # ceph health detail
> > >>> HEALTH_WARN 1 filesystem is degraded; 1 MDSs report oversized cache;
> 1
> > >>> MDSs behind on trimming
> > >>> [WRN] FS_DEGRADED: 1 filesystem is degraded
> > >>>   fs slugfs is degraded
> > >>> [WRN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
> > >>>   mds.slugfs.pr-md-01.xdtppo(mds.0): MDS cache is too large
> > >>> (19GB/8GB); 0 inodes in use by clients, 0 stray files
> > >>> [WRN] MDS_TRIM: 1 MDSs behind on trimming
> > >>>   mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming
> (127084/250)
> > >>> max_segments: 250, num_segments: 127084
> > >>>
> > >>> MDS cache too large?  The mds process is taking up 22GB right now and
> > >>> starting to swap my server, so maybe it somehow is too large
> > >>>
> > >>> On 4/22/24 11:17 AM, Erich Weiler wrote:
> >  Hi All,
> > 
> >  We have a somewhat serious situation where we have a cephfs
> filesystem
> >  (18.2.1), and 2 active MDSs (one standby).  Then I tried to restart
> one of
> >  the active daemons to unstick a bunch of blocked requests, and the
> >  standby went into 'replay' for a very long time, then RAM on that
> MDS
> >  server filled up, and it just stayed there for a while then
> eventually
> >  appeared to give up and switched to the standby, but the cycle
> started
> >  again.  So I restarted that MDS, and now I'm in a situation where I
> see
> >  this:
> > 
> >  # ceph fs status
> >  slugfs - 29 clients
> >  ==
> >  RANK   STATEMDSACTIVITY   DNSINOS
> >  DIRS   CAPS
> > 0 replay  slugfs.pr-md-01.xdtppo3958k  57.1k
> >  12.2k 0
> > 1resolve  slugfs.pr-md-02.sbblqq   0  3
> >  1  0
> >   POOL   TYPE USED  AVAIL
> > cephfs_metadatametadata   997G  2948G
> >  cephfs_md_and_datadata   0   87.6T
> >   cephfs_datadata 773T   175T
> > STANDBY MDS
> >  slugfs.pr-md-03.mclckv
> >  MDS version: ceph version 18.2.1
> >  (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
> > 
> >  It just stays there indefinitely.  All my clients are hung.  I tried
> >  restarting all MDS daemons and they just went back to this state
> after
> >  coming back up.
> > 
> >  Is there any way I can somehow escape this state of indefinite
> >  replay/resolve?
> > 
> >  Thanks so much!  I'm kinda nervous since none of my clients have
> >  filesystem access at the moment...
> > 
> >  cheers,
> >  erich
> > >>> ___
> > >>> ceph-users mailing list -- ceph-users@ceph.io
> > >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-23 Thread Frédéric Nass
Hello,

My turn ;-)

Ceph is strongly consistent. Either you read/write objects/blocks/files with 
assured strong consistency OR you don't. The worst thing you can expect from 
Ceph, as long as it has been properly designed, configured and operated, is a 
temporary loss of access to the data.

There are now a few companies in the world with deep knowledge of Ceph that 
design, deploy and operate Ceph clusters in the best way for their customers 
and contribute to the leadership and development of Ceph at the highest level. 
Some of them even offer their own downstream version of Ceph, ensuring 
customers run the most up-to-date, stable and best-performing version of Ceph.

In the long term, it is more worthwhile to invest in software and in reliable, 
responsive support that is attentive to customers and capable of pushing 
developments that improve Ceph and match customers' needs, than to buy 
overpriced hardware with limited functionality and lifespan from vendors who 
do not always pay attention to how customers use their products.

Regards,
Frédéric.


- On 17 Apr 24, at 17:06, sebcio t sebci...@o2.pl wrote:

> Hi,
> I have a problem answering this question:
> Why CEPH is better than other storage solutions?
> 
> I know the high-level talking points about
> - scalability,
> - flexibility,
> - distributed,
> - cost-Effectiveness
> 
> What convinces me (though it could also be held against it) is that Ceph as a
> product has everything I need, namely:
> block storage (RBD),
> file storage (CephFS),
> object storage (S3, Swift)
> and "plugins" to run NFS, NVMe over Fabric, NFS on object storage.
> 
> Also many other features which are usually sold as an option (mirroring, geo
> replication, etc.) in paid solutions.
> I have a problem writing it down piece by piece.
> I want to convince my managers that we are going in a good direction.
> 
> Why not something from robin.io or purestorage, netapp, dell/EMC? Or, from
> open source, longhorn or openEBS?
> 
> If you have ideas, please share them.
> 
> Thanks,
> S.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror failed to query services: (13) Permission denied

2024-04-23 Thread Eugen Block
I'm not entirely sure if I ever tried it with the rbd-mirror user  
instead of admin user, but I see the same error message on 17.2.7. I  
assume that it's not expected, I think a tracker issue makes sense.


Thanks,
Eugen

Quoting Stefan Kooman :


Hi,

We are testing rbd-mirroring. There seems to be a permission error  
with the rbd-mirror user. Using this user to query the mirror pool  
status gives:


failed to query services: (13) Permission denied

And results in the following output:

health: UNKNOWN
daemon health: UNKNOWN
image health: OK
images: 3 total
2 replaying
1 stopped

So, this command: rbd --id rbd-mirror mirror pool status rbd

So basically the health and daemon health cannot be obtained due to  
permission errors, but status about images can.


When the command is run with admin permissions the health and daemon  
health are returned without issue.


I tested this on Reef 18.2.2.

Is this expected behavior? If not, I will create a tracker ticket for it.

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] s3 bucket policy subusers - access denied

2024-04-23 Thread sinan

I want to achieve the following:

- Create a user
- Create 2 subusers
- Create 2 buckets
- Apply a policy for each bucket
- A subuser should only have access to its own bucket


Problem:
Getting a 403 AccessDenied with subuser credentials when uploading 
files.



I did the following:

radosgw-admin user create --uid=foo-user --display_name="Foo Test User"
radosgw-admin subuser create --uid=foo-user --gen-access-key 
--gen-secret --key-type=s3 --subuser=foo-user-subuser
radosgw-admin subuser create --uid=foo-user --gen-access-key 
--gen-secret --key-type=s3 --subuser=foo-user-subuser2


Resulting in:
{
    "user_id": "foo-user",
    "display_name": "Foo Test User",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [
        {
            "id": "foo-user:foo-user-subuser",
            "permissions": ""
        },
        {
            "id": "foo-user:foo-user-subuser2",
            "permissions": ""
        }
    ],
    "keys": [
        {
            "user": "foo-user:foo-user-subuser",
            "access_key": "",
            "secret_key": ""
        },
        {
            "user": "foo-user:foo-user-subuser2",
            "access_key": "",
            "secret_key": ""
        },
        {
            "user": "foo-user",
            "access_key": "",
            "secret_key": ""
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}


Using the credentials of the main account (user: foo-user) creating 
buckets and setting policies:

s3cmd mb s3://foo-bucket
s3cmd mb s3://foo-bucket2
s3cmd setpolicy foo-test-subuser-policy s3://foo-bucket
s3cmd setpolicy foo-test-subuser2-policy s3://foo-bucket2

Resulting in (I am showing just foo-bucket, but the same goes for 
foo-bucket2):

# s3cmd info s3://foo-bucket
s3://foo-bucket/ (bucket):
   Payer: BucketOwner
   Ownership: none
   Versioning:none
   Expiration rule: none
   Block Public Access: none
   Policy:{
  "Version": "2012-10-17",
  "Statement": [
{
  "Effect": "Allow",
  "Principal": {
"AWS": [
  "arn:aws:iam:::user/foo-user:foo-user-subuser"
]
  },
  "Action": [
"s3:AbortMultipartUpload",
"s3:DeleteObject",
"s3:GetObject",
"s3:ListBucketMultipartUploads",
"s3:ListBucket",
"s3:ListMultipartUploadParts",
"s3:PutObject"
  ],
  "Resource": [
"arn:aws:s3:::foo-bucket"
  ]
}
  ]
}

   CORS:  none
   ACL:   Foo Test User: FULL_CONTROL


When I try to upload files (using the subuser foo-user-subuser 
credentials) it doesn't work:

# s3cmd ls
2024-04-23 06:59  s3://foo-bucket
2024-04-23 10:05  s3://foo-bucket2

# s3cmd put ~/Documents/file_2.txt s3://foo-bucket
upload: '/home/foo/Documents/file_2.txt' -> 's3://foo-bucket/file_2.txt' 
 [1 of 1]

 10 of 10   100% in    0s    18.96 B/s  done
ERROR: S3 error: 403 (AccessDenied)


What is wrong with my policy? I thought I did exactly the same thing 
earlier and it worked, but I am in doubt now.
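
For comparison, here is a hedged sketch of a policy that also lists the 
object-level ARN, since s3:PutObject, s3:GetObject and s3:DeleteObject act on 
objects ("arn:aws:s3:::foo-bucket/*") rather than on the bucket resource 
itself; this is untested here and shown only to illustrate the difference:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam:::user/foo-user:foo-user-subuser"]
      },
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucket",
        "s3:ListMultipartUploadParts",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::foo-bucket",
        "arn:aws:s3:::foo-bucket/*"
      ]
    }
  ]
}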


Thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stuck in replay?

2024-04-23 Thread David Yang
Hi Erich,
When MDS cache usage is very high, recovery is very slow.
So I use this command to drop the MDS cache:
ceph tell mds.* cache drop 600
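
If the oversized cache itself is what slows the replay down, the MDS cache 
memory target can also be raised temporarily; a sketch, assuming the default 
mds_cache_memory_limit is still in effect and with a purely illustrative value 
(bytes, here about 32 GiB):

ceph config set mds mds_cache_memory_limit 34359738368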

Lars Köppel  wrote on Tue, Apr 23, 2024 at 16:36:
>
> Hi Erich,
>
> great that you recovered from this.
> It sounds like you had the same problem I had a few months ago.
> mds crashes after up:replay state - ceph-users - lists.ceph.io
> 
>
> Kind regards,
> Lars
>
>
> [image: ariadne.ai Logo] Lars Köppel
> Developer
> Email: lars.koep...@ariadne.ai
> Phone: +49 6221 5993580 <+4962215993580>
> ariadne.ai (Germany) GmbH
> Häusserstraße 3, 69115 Heidelberg
> Amtsgericht Mannheim, HRB 744040
> Geschäftsführer: Dr. Fabian Svara
> https://ariadne.ai
>
>
> On Mon, Apr 22, 2024 at 11:31 PM Sake Ceph  wrote:
>
> > 100 GB of Ram! Damn that's a lot for a filesystem in my opinion, or am I
> > wrong?
> >
> > Kind regards,
> > Sake
> >
> > > Op 22-04-2024 21:50 CEST schreef Erich Weiler :
> > >
> > >
> > > I was able to start another MDS daemon on another node that had 512GB
> > > RAM, and then the active MDS eventually migrated there, and went through
> > > the replay (which consumed about 100 GB of RAM), and then things
> > > recovered.  Phew.  I guess I need significantly more RAM in my MDS
> > > servers...  I had no idea the MDS daemon could require that much RAM.
> > >
> > > -erich
> > >
> > > On 4/22/24 11:41 AM, Erich Weiler wrote:
> > > > possibly but it would be pretty time consuming and difficult...
> > > >
> > > > Is it maybe a RAM issue since my MDS RAM is filling up?  Should maybe
> > I
> > > > bring up another MDS on another server with huge amount of RAM and
> > move
> > > > the MDS there in hopes it will have enough RAM to complete the replay?
> > > >
> > > > On 4/22/24 11:37 AM, Sake Ceph wrote:
> > > >> Just a question: is it possible to block or disable all clients? Just
> > > >> to prevent load on the system.
> > > >>
> > > >> Kind regards,
> > > >> Sake
> > > >>> Op 22-04-2024 20:33 CEST schreef Erich Weiler :
> > > >>>
> > > >>> I also see this from 'ceph health detail':
> > > >>>
> > > >>> # ceph health detail
> > > >>> HEALTH_WARN 1 filesystem is degraded; 1 MDSs report oversized cache;
> > 1
> > > >>> MDSs behind on trimming
> > > >>> [WRN] FS_DEGRADED: 1 filesystem is degraded
> > > >>>   fs slugfs is degraded
> > > >>> [WRN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
> > > >>>   mds.slugfs.pr-md-01.xdtppo(mds.0): MDS cache is too large
> > > >>> (19GB/8GB); 0 inodes in use by clients, 0 stray files
> > > >>> [WRN] MDS_TRIM: 1 MDSs behind on trimming
> > > >>>   mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming
> > (127084/250)
> > > >>> max_segments: 250, num_segments: 127084
> > > >>>
> > > >>> MDS cache too large?  The mds process is taking up 22GB right now and
> > > >>> starting to swap my server, so maybe it somehow is too large
> > > >>>
> > > >>> On 4/22/24 11:17 AM, Erich Weiler wrote:
> > >  Hi All,
> > > 
> > >  We have a somewhat serious situation where we have a cephfs
> > filesystem
> > >  (18.2.1), and 2 active MDSs (one standby).  Then I tried to restart
> > one of
> > >  the active daemons to unstick a bunch of blocked requests, and the
> > >  standby went into 'replay' for a very long time, then RAM on that
> > MDS
> > >  server filled up, and it just stayed there for a while then
> > eventually
> > >  appeared to give up and switched to the standby, but the cycle
> > started
> > >  again.  So I restarted that MDS, and now I'm in a situation where I
> > see
> > >  this:
> > > 
> > >  # ceph fs status
> > >  slugfs - 29 clients
> > >  ==
> > >  RANK   STATEMDSACTIVITY   DNSINOS
> > >  DIRS   CAPS
> > > 0 replay  slugfs.pr-md-01.xdtppo3958k  57.1k
> > >  12.2k 0
> > > 1resolve  slugfs.pr-md-02.sbblqq   0  3
> > >  1  0
> > >   POOL   TYPE USED  AVAIL
> > > cephfs_metadatametadata   997G  2948G
> > >  cephfs_md_and_datadata   0   87.6T
> > >   cephfs_datadata 773T   175T
> > > STANDBY MDS
> > >  slugfs.pr-md-03.mclckv
> > >  MDS version: ceph version 18.2.1
> > >  (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
> > > 
> > >  It just stays there indefinitely.  All my clients are hung.  I tried
> > >  restarting all MDS daemons and they just went back to this state
> > after
> > >  coming back up.
> > > 
> > >  Is there any way I can somehow escape this state of indefinite
> > >  replay/resolve?
> > > 
> > >  Thanks so much!  I'm kinda nervous since none of my clients have
> > >  filesystem access at the moment...
> > > 
> > >  cheers,
> > >  erich
> > > >>> 

[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-23 Thread Janne Johansson
On Tue, 23 Apr 2024 at 11:32, Frédéric Nass wrote:
> Ceph is strongly consistent. Either you read/write objects/blocs/files with 
> an insured strong consistency OR you don't. Worst thing you can expect from 
> Ceph, as long as it's been properly designed, configured and operated is a 
> temporary loss of access to the data.

This is often more important than you think. All centralized storage
systems will have to face some kind of latency when sending data over
the network, when splitting the data into replicas or erasure coding
shards, while waiting until all copies/shards have actually finished
being written (perhaps via journals) to the final destination, and then
lastly for the write to be acknowledged back to the writing client. If
some vendor says that "because of our special code, this part takes
zero time", they are basically telling you that they are lying about
the status of the write in order to finish more quickly, because this
wins them contracts or wins competitions.

It will not win you any smiles when there is an incident and data that
was ACKed to be on disk suddenly isn't, because some write cache lost
power at the same time as the storage box and now some database has
half-written transactions in it. Ceph is by no means the fastest
possible way to store data on a network, but it is very good while
still retaining the strong consistency mentioned by Frédéric above,
allowing for many clients to do many IOs in parallel against the
cluster.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-23 Thread Brett Niver
Well said!
Brett


On Tue, Apr 23, 2024 at 7:05 AM Janne Johansson  wrote:

> On Tue, 23 Apr 2024 at 11:32, Frédéric Nass wrote:
> > Ceph is strongly consistent. Either you read/write objects/blocs/files
> with an insured strong consistency OR you don't. Worst thing you can expect
> from Ceph, as long as it's been properly designed, configured and operated
> is a temporary loss of access to the data.
>
> This is often more important than you think. All centralized storage
> systems will have to face some kind of latency when sending data over
> the network, when splitting the data into replicas or erasure coding
> shards, when waiting for all copies/shards are actually finished
> written (perhaps via journals) to the final destination and then
> lastly for the write to be acknowledged back to the writing client. If
> some vendor says that "because of our special code, this part takes
> zero time", they are basically telling you that they are lying about
> the status of the write in order to finish more quickly, because this
> wins them contracts or wins competitions.
>
> It will not win you any smiles when there is an incident and data that
> was ACKed to be on disk suddenly isn't because some write cache lost
> power at the same time as the storage box and now some database have
> half-written transactions in it. Ceph is by no means the fastest
> possible way to store data on a network, but it is very good while
> still retaining the strong consistencies mentioned by Frederic above
> allowing for many clients to do many IOs in parallel against the
> cluster.
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: stretched cluster new pool and second pool with nvme

2024-04-23 Thread Eugen Block

Hi,


what's the right way to add another pool?
create a pool with 4/2 and use the rule for the stretched mode, finished?
the existing pools were automatically set to 4/2 after "ceph mon  
enable_stretch_mode".


if that is what you require, then yes, it's as easy as that. Although  
I haven't played too much with the stretch mode yet, you'll probably  
want to check the PG distribution after you created the pool. Maybe  
start with only a few PGs (4 or 8) and then inspect the distribution:


ceph pg ls-by-pool 

Then verify if the PGs are properly distributed across both DCs.
Of course, there's also the crushtool [1] to test such changes before  
actually applying them:


- Get current crushmap (ceph osd getcrushmap -o crushmap.bin)
- Decompile it (crushtool -d crushmap.bin -o crushmap.txt)
- Add new crush rule
- Compile it (crushtool -c crushmap.txt -o crushmap.new)
- Test it (crushtool -i crushmap.new --test --rule   
--num-rep 4 --show-mappings)


If you're satisfied with the mappings (also check if there are bad  
mappings with --show-bad-mappings) you can apply the new crushmap  
(ceph osd setcrushmap -i crushmap.new). Be careful when injecting a  
new crushmap!!!
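
Putting the first question together, a minimal sketch of creating such a pool; 
the pool name, the PG count and the rule name (stretch_rule) are assumptions 
to adapt to your setup:

ceph osd pool create mypool 8 8 replicated stretch_rule
ceph osd pool set mypool size 4
ceph osd pool set mypool min_size 2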



I don't know how to set up a second crush rule for the nvme class.
I thought that I need to filter with 2 rules for the classes. Is  
that correct?


If your device class is called "nvme", you can create a rule like this:

host01:~ # ceph osd crush rule create-replicated nvme-rule default host nvme

The second parameter "default" is the "root" of your crush tree. The  
third parameter is your failure domain. You can also create rules in  
the dashboard, of course.
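
For reference, in the decompiled crushmap.txt such a rule would look roughly 
like the sketch below (the rule id is arbitrary here, Ceph assigns its own):

rule nvme-rule {
    id 2
    type replicated
    step take default class nvme
    step chooseleaf firstn 0 type host
    step emit
}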


Regards,
Eugen

[1] https://docs.ceph.com/en/latest/man/8/crushtool/

Zitat von "ronny.lippold" :


hi ... running against the wall, I need your help again.

our test stretched cluster is running fine.
now I have 2 questions.

what's the right way to add another pool?
create a pool with 4/2 and use the rule for the stretched mode, finished?
the existing pools were automatically set to 4/2 after "ceph mon  
enable_stretch_mode".


the second question, we want to use ssd and nvme together.
so, we need to have a second pool for class nvme.

I don't know how to set up a second crush rule for the nvme class.
I thought that I need to filter with 2 rules for the classes. Is  
that correct?



thanks for help,
ronny
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: stretched cluster new pool and second pool with nvme

2024-04-23 Thread Stefan Kooman

On 23-04-2024 14:40, Eugen Block wrote:

Hi,


whats the right way to add another pool?
create pool with 4/2 and use the rule for the stretched mode, finished?
the exsisting pools were automaticly set to 4/2 after "ceph mon 
enable_stretch_mode".


It should be that simple. However, it does not seem to work. I tried to 
do just that (use two separate pools, hdd and ssd in that case), but it 
would not work; see this tracker: https://tracker.ceph.com/issues/64817


If your experience is different please update the tracker ticket. If it 
indeed does not work, please also update the tracker ticket with a "+1".


Thanks,

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-23 Thread Frédéric Nass
Exactly, strong consistency is why we chose Ceph over other SDS solutions back 
in 2014 (and disabled any non-persistent cache along the IO path, like HDD disk 
cache).
A major power outage in our town a few years back (a few days before Christmas) 
and a UPS malfunction proved us right.

Another reason to adopt Ceph today is that a cluster you build today to match a 
specific workload (let's say capacity) will accommodate any future workloads 
(for example performance) you may have tomorrow, simply by adding the appropriate 
nodes to the cluster, whatever the hardware looks like in the decades to come.

Regards,
Frédéric.

- On 23 Apr 24, at 13:04, Janne Johansson icepic...@gmail.com wrote:

> On Tue, 23 Apr 2024 at 11:32, Frédéric Nass wrote:
>> Ceph is strongly consistent. Either you read/write objects/blocs/files with 
>> an
>> insured strong consistency OR you don't. Worst thing you can expect from 
>> Ceph,
>> as long as it's been properly designed, configured and operated is a 
>> temporary
>> loss of access to the data.
> 
> This is often more important than you think. All centralized storage
> systems will have to face some kind of latency when sending data over
> the network, when splitting the data into replicas or erasure coding
> shards, when waiting for all copies/shards are actually finished
> written (perhaps via journals) to the final destination and then
> lastly for the write to be acknowledged back to the writing client. If
> some vendor says that "because of our special code, this part takes
> zero time", they are basically telling you that they are lying about
> the status of the write in order to finish more quickly, because this
> wins them contracts or wins competitions.
> 
> It will not win you any smiles when there is an incident and data that
> was ACKed to be on disk suddenly isn't because some write cache lost
> power at the same time as the storage box and now some database have
> half-written transactions in it. Ceph is by no means the fastest
> possible way to store data on a network, but it is very good while
> still retaining the strong consistencies mentioned by Frederic above
> allowing for many clients to do many IOs in parallel against the
> cluster.
> 
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cache pressure?

2024-04-23 Thread Erich Weiler
So I'm trying to figure out ways to reduce the number of warnings I'm 
getting and I'm thinking about the one "client failing to respond to 
cache pressure".


Is there maybe a way to tell a client (or all clients) to reduce the 
amount of cache it uses or to release caches quickly?  Like, all the time?


I know the linux kernel (and maybe ceph) likes to cache everything for a 
while, and rightfully so, but I suspect in my use case it may be more 
efficient to more quickly purge the cache or to in general just cache 
way less overall...?


We have many thousands of threads all doing different things that are 
hitting our filesystem, so I suspect the caching isn't really doing me 
much good anyway due to the churn, and is probably causing more problems 
than it is helping...
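
For what it's worth, a sketch of the knobs that usually matter here, assuming 
ceph-fuse/libcephfs clients for the client_* option and with purely 
illustrative values, not recommendations:

# cap how many caps (cached inodes) a single client may hold (default is on the order of 1M)
ceph config set mds mds_max_caps_per_client 262144
# shrink the client-side metadata cache (inode count) for fuse/libcephfs clients
ceph config set client client_cache_size 8192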


-erich
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Status of IPv4 / IPv6 dual stack?

2024-04-23 Thread Anthony D'Atri
Sounds like an opportunity for you to submit an expansive code PR to implement 
it.

> On Apr 23, 2024, at 04:28, Marc  wrote:
> 
>> I have removed dual-stack-mode-related information from the documentation
>> on the assumption that dual-stack mode was planned but never fully
>> implemented.
>> 
>> See https://tracker.ceph.com/issues/65631.
>> 
>> See https://github.com/ceph/ceph/pull/57051.
>> 
>> Hat-tip to Dan van der Ster, who bumped this thread for me.
> 
> "I will remove references to dual-stack mode in the documentation because i"
> 
> I prefer if it would state that dual stack is not supported (and maybe why). 
> By default I would assume such a thing is supported. I would not even suspect 
> it could be an issue.
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror failed to query services: (13) Permission denied

2024-04-23 Thread Ilya Dryomov
On Mon, Apr 22, 2024 at 7:45 PM Stefan Kooman  wrote:
>
> Hi,
>
> We are testing rbd-mirroring. There seems to be a permission error with
> the rbd-mirror user. Using this user to query the mirror pool status gives:
>
> failed to query services: (13) Permission denied
>
> And results in the following output:
>
> health: UNKNOWN
> daemon health: UNKNOWN
> image health: OK
> images: 3 total
>  2 replaying
>  1 stopped
>
> So, this command: rbd --id rbd-mirror mirror pool status rbd

Hi Stefan,

What is the output of "ceph auth get client.rbd-mirror"?

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] which grafana version to use with 17.2.x ceph version

2024-04-23 Thread Osama Elswah
Hi,


On quay.io I can find a lot of grafana versions for ceph 
(https://quay.io/repository/ceph/grafana?tab=tags). How can I find out which 
version should be used when I upgrade my cluster to 17.2.x? Can I simply take 
the latest grafana version? Or is there a specific grafana version I need to use?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: which grafana version to use with 17.2.x ceph version

2024-04-23 Thread Adam King
FWIW, cephadm uses `quay.io/ceph/ceph-grafana:9.4.7` as the default grafana
image in the quincy branch
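
If you want to check or pin the exact image cephadm will deploy, a sketch 
using the cephadm config key (assuming a cephadm-managed cluster; the tag is 
only an example):

ceph config get mgr mgr/cephadm/container_image_grafana
ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:9.4.7
ceph orch redeploy grafana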

On Tue, Apr 23, 2024 at 11:59 AM Osama Elswah 
wrote:

> Hi,
>
>
> in quay.io I can find a lot of grafana versions for ceph (
> https://quay.io/repository/ceph/grafana?tab=tags) how can I find out
> which version should be used when I upgrade my cluster to 17.2.x ? Can I
> simply take the latest grafana version? Or is there a specfic grafana
> version I need to use?
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-23 Thread Maged Mokhtar



On 19/04/2024 11:02, Niklaus Hofer wrote:

Dear all

We have an HDD ceph cluster that could do with some more IOPS. One 
solution we are considering is installing NVMe SSDs into the storage 
nodes and using them as WAL- and/or DB devices for the Bluestore OSDs.


However, we have some questions about this and are looking for some 
guidance and advice.


The first one is about the expected benefits. Before we undergo the 
efforts involved in the transition, we are wondering if it is even 
worth it. How much of a performance boost one can expect when adding 
NVMe SSDs for WAL-devices to an HDD cluster? Plus, how much faster 
than that does it get with the DB also being on SSD. Are there 
rule-of-thumb numbers for that? Or maybe someone has done benchmarks in 
the past?


The second question is of more practical nature. Are there any 
best-practices on how to implement this? I was thinking we won't do 
one SSD per HDD - surely an NVMe SSD is plenty fast to handle the 
traffic from multiple OSDs. But what is a good ratio? Do I have one 
NVMe SSD per 4 HDDs? Per 6 or even 8? Also, how should I chop-up the 
SSD, using partitions or using LVM? Last but not least, if I have one 
SSD handle WAL and DB for multiple OSDs, losing that SSD means losing 
multiple OSDs. How do people deal with this risk? Is it generally 
deemed acceptable or is this something people tend to mitigate and if 
so how? Do I run multiple SSDs in RAID?


I do realize that for some of these, there might not be the one 
perfect answer that fits all use cases. I am looking for best 
practices and in general just trying to avoid any obvious mistakes.


Any advice is much appreciated.

Sincerely

Niklaus Hofer


Hi Niklaus,

I would recommend always having external WAL/DB on flash when using HDDs.
The impact depends on workload, but roughly you should see 2x better 
performance for mixed workloads. The impact will be higher if you have an 
iops-intensive load.
A client write operation will require a metadata read (if not cached) + 
the data write op to the HDD + a metadata write + a pg log write. HDDs are 
terrible with iops (100 to 200 iops), so moving the non-data ops to a 
faster device makes a lot of sense.
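
A rough back-of-the-envelope sketch using those numbers (assuming ~150 iops 
per HDD and nothing cached):

  all four ops on the HDD:      150 / 4 ops per client write  ~  37 writes/s per HDD
  only the data op on the HDD:  150 / 1 op per client write   ~ 150 writes/s per HDD

so the ceiling for small writes per disk moves by roughly 3-4x, and mixed 
read/write workloads typically land around the 2x figure above.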


There are also metadata iops involved during other operations like 
rocksdb compaction, object/snap deletions, scrubbing...that will benefit 
from moving those to a fast iops device. I have seen cases where 
scrubbing alone can load the HDDs.


Typically you will always put wal+db, and not just the wal, on the external device.
Using just the wal will improve write latency but not iops; this could make sense if 
your load is bursty with a small queue depth, like having a small number 
of client write operations compared to the total number of OSDs. But in the 
vast majority of cases this is not so, and practically/economically it is a 
no-brainer to use both wal+db.


For the nvme:HDD ratio, yes, you can go for 1:10, or if you have extra slots 
you can use 1:5 with smaller-capacity/cheaper nvmes; this will reduce 
the impact of nvme failures.
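
With cephadm, that ratio is usually expressed declaratively in an OSD service 
spec rather than carved out by hand; a minimal sketch, where the service id, 
the placement label and the limit are made-up illustrative values:

service_type: osd
service_id: hdd-with-nvme-db
placement:
  label: osd
spec:
  data_devices:
    rotational: 1      # HDDs hold the data
  db_devices:
    rotational: 0      # flash devices hold DB+WAL
    limit: 2

Applied with "ceph orch apply -i osd-spec.yaml", ceph-volume should then carve 
the db_devices into LVs automatically.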


/Maged
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-23 Thread Anthony D'Atri



> On Apr 23, 2024, at 12:24, Maged Mokhtar  wrote:
> 
> For nvme:HDD ratio, yes you can go for 1:10, or if you have extra slots you 
> can use 1:5 using smaller capacity/cheaper nvmes, this will reduce the impact 
> of nvme failures.

On occasion I've seen a suggestion to mirror the fast devices and use LVM 
slices for each OSD.  This might result in increased wear.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] List of bridges irc/slack/discord

2024-04-23 Thread Alvaro Soto
(Last update) -
https://github.com/orgs/opensource-latinamerica/discussions/3

~~~
Adding a few unofficial/unregistered Ceph IRC channels (cephadm, crimson)

IRC -> slack.oss.lat

OFTC: starlingx -> slack: starlingx
OFTC: openstack-latinamerica -> slack: stack-latinamerica
OFTC: openstack-freezer -> slack: stack-freezer
OFTC: ceph -> slack: ceph
OFTC: sepia -> slack: sepia
OFTC: cephfs -> slack: cephfs
OFTC: ceph-dashboard -> slack: ceph-dashboard
OFTC: ceph-devel -> slack: ceph-devel
OFTC: cephadm -> slack: cephadm
OFTC: crimson -> slack: crimson


IRC -> ceph-storage.slack.com

OFTC: ceph -> slack: ceph
OFTC: sepia -> slack: sepia
OFTC: cephfs -> slack: cephfs
OFTC: ceph-dashboard -> slack: ceph-dashboard
OFTC: ceph-devel -> slack: ceph-devel
OFTC: cephadm -> slack: cephadm
OFTC: crimson -> slack: crimson


IRC -> disc...@ceph.io

OFTC: ceph -> disc...@ceph.io: ceph
OFTC: sepia -> disc...@ceph.io: sepia
OFTC: cephfs -> disc...@ceph.io: cephfs
OFTC: ceph-dashboard -> disc...@ceph.io: ceph-dashboard
OFTC: ceph-devel -> disc...@ceph.io: ceph-devel
OFTC: cephadm -> disc...@ceph.io: cephadm
OFTC: crimson -> disc...@ceph.io: crimson


Ceph invite URL: https://discord.gg/vacj9cZSmm
~~~
Cheers!

-- 

Alvaro Soto

*Note: My work hours may not be your work hours. Please do not feel the
need to respond during a time that is not convenient for you.*
--
Great people talk about ideas,
ordinary people talk about things,
small people talk... about other people.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror failed to query services: (13) Permission denied

2024-04-23 Thread Stefan Kooman

On 23-04-2024 17:44, Ilya Dryomov wrote:

On Mon, Apr 22, 2024 at 7:45 PM Stefan Kooman  wrote:


Hi,

We are testing rbd-mirroring. There seems to be a permission error with
the rbd-mirror user. Using this user to query the mirror pool status gives:

failed to query services: (13) Permission denied

And results in the following output:

health: UNKNOWN
daemon health: UNKNOWN
image health: OK
images: 3 total
  2 replaying
  1 stopped

So, this command: rbd --id rbd-mirror mirror pool status rbd


Hi Stefan,

What is the output of "ceph auth get client.rbd-mirror"?


[client.rbd-mirror]
key = REDACTED
caps mon = "profile rbd-mirror"
caps osd = "profile rbd"

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Orchestrator not automating services / OSD issue

2024-04-23 Thread Michael Baer


Hi,

This problem started with trying to add a new storage server into a
quincy v17.2.6 ceph cluster. Whatever I did, I could not add the drives
on the new host as OSDs: via dashboard, via cephadm shell, by setting
osd unmanaged to false.

But what I started realizing is that the orchestrator will also no longer
automatically manage services. I.e. if a service is set to be managed by
labels, removing and adding labels to different hosts for that service
has no effect. Same if I set a service to be managed via hostnames. Same
if I try to drain a host (the services/podman containers just keep
running). Although, I am able to add/rm services via 'cephadm shell ceph
orch daemon add/rm'. But Ceph will not manage them automatically using
labels/hostnames.

This apparently includes OSD daemons. I can not create an OSD on the new
host either automatically or manually, but I'm hoping the services/OSD
issues are related and not two separate issues.

I haven't been able to find any obvious errors in /var/log/ceph,
/var/log/syslog, logs , etc. I have been able to get 'slow
ops' errors on monitors by trying to add OSDs manually (and having to
restart the monitor). I've also gotten cephadm shell to hang. And had to
restart managers. I'm not an expert and it could be something obvious,
but I haven't been able to figure out a solution. If anyone has any
suggestions, I would greatly appreciate them.

Thanks,
Mike

-- 
Michael Baer
c...@mikesoffice.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Latest Doco Out Of Date?

2024-04-23 Thread duluxoz

Hi Zac,

Any movement on this? We really need to come up with an answer/solution 
- thanks


Dulux-Oz

On 19/04/2024 18:03, duluxoz wrote:


Cool!

Thanks for that  :-)

On 19/04/2024 18:01, Zac Dover wrote:
I think I understand, after more thought. The second command is 
expected to work after the first.


I will ask the cephfs team when they wake up.

Zac Dover
Upstream Docs
Ceph Foundation


On Fri, Apr 19, 2024 at 17:51, duluxoz wrote:

Hi All,

In reference to this page from the Ceph documentation:
https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of
that page it says that you can run the following commands:

~~~
ceph fs authorize a client.x /dir1 rw
ceph fs authorize a client.x /dir2 rw
~~~

This will allow `client.x` to access both `dir1` and `dir2`.

So, having a use case where we need to do this, we are, HOWEVER, getting
the following error on running the 2nd command on a Reef 18.2.2 cluster:

`Error EINVAL: client.x already has fs capabilities that differ from
those supplied. To generate a new auth key for client.x, first remove
client.x from configuration files, execute 'ceph auth rm client.x', then
execute this command again.`

Something we're doing wrong, or is the doco "out of date" (mind you,
that's from the "latest" version of the doco, and the "reef" version),
or is something else going on?
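
For reference, and only as an untested sketch rather than a confirmed answer, 
the authorize command is also supposed to accept several path/permission pairs 
in a single invocation, which would avoid the second call entirely:

ceph fs authorize a client.x /dir1 rw /dir2 rw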

Thanks in advance for the help

Cheers

Dulux-Oz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Latest Doco Out Of Date?

2024-04-23 Thread Zac Dover
It's in my list of ongoing initiatives. I'll stay up late tonight and ask Venky 
directly what's going on in this instance.

Sometime later today, I'll create an issue tracking bug and I'll send it to you 
for review. Make sure that I haven't misrepresented this issue.

Zac

On Wednesday, April 24th, 2024 at 2:10 PM, duluxoz  wrote:

> Hi Zac,
>
> Any movement on this? We really need to come up with an answer/solution - 
> thanks
>
> Dulux-Oz
>
> On 19/04/2024 18:03, duluxoz wrote:
>
>> Cool!
>>
>> Thanks for that :-)
>>
>> On 19/04/2024 18:01, Zac Dover wrote:
>>
>>> I think I understand, after more thought. The second command is expected to 
>>> work after the first.
>>>
>>> I will ask the cephfs team when they wake up.
>>>
>>> Zac Dover
>>> Upstream Docs
>>> Ceph Foundation
>>>
>>> On Fri, Apr 19, 2024 at 17:51, duluxoz dulu...@gmail.com wrote:
>>>
 Hi All,

 In reference to this page from the Ceph documentation:
 https://docs.ceph.com/en/latest/cephfs/client-auth/, down the bottom of
 that page it says that you can run the following commands:

 ~~~
 ceph fs authorize a client.x /dir1 rw
 ceph fs authorize a client.x /dir2 rw
 ~~~

 This will allow `client.x` to access both `dir1` and `dir2`.

 So, having a use case where we need to do this, we are, HOWEVER, getting
 the following error on running the 2nd command on a Reef 18.2.2 cluster:

 `Error EINVAL: client.x already has fs capabilities that differ from
 those supplied. To generate a new auth key for client.x, first remove
 client.x from configuration files, execute 'ceph auth rm client.x', then
 execute this command again.`

 Something we're doing wrong, or is the doco "out of date" (mind you,
 that's from the "latest" version of the doco, and the "reef" version),
 or is something else going on?

 Thanks in advance for the help

 Cheers

 Dulux-Oz

 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Orchestrator not automating services / OSD issue

2024-04-23 Thread Frédéric Nass
Hello Michael,

You can try this:

1/ check that the host shows up on ceph orch ls with the right label 'osds'
2/ check that the host is OK with ceph cephadm check-host . It should 
look like:
 (None) ok
podman (/usr/bin/podman) version 4.6.1 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Hostname "" matches what is expected.
Host looks OK
3/ double check your service_type 'osd' with ceph orch ls --service-type osd 
--export
It should show the correct placement and spec (drive sizes, etc.)
4/ enable debugging with ceph config set mgr mgr/cephadm/log_to_cluster_level 
debug
5/ open a terminal and observe ceph -W cephadm --watch-debug
6/ ceph mgr fail
7/ ceph orch device ls --hostname= --wide --refresh (should show 
local block devices as Available and trigger the creation of the OSDs)

If your service_type 'osd' is correct, the orchestrator should deploy OSDs on 
the node.
If it does not then look for the reason why in ceph -W cephadm --watch-debug 
output.

Regards,
Frédéric.

- On 24 Apr 24, at 3:22, Michael Baer c...@mikesoffice.com wrote:

> Hi,
> 
> This problem started with trying to add a new storage server into a
> quincy v17.2.6 ceph cluster. Whatever I did, I could not add the drives
> on the new host as OSDs: via dashboard, via cephadm shell, by setting
> osd unmanaged to false.
> 
> But what I started realizing is that orchestrator will also no longer
> automatically manage services. I.e. if a service is set to manage by
> labels, removing and adding labels to different hosts for that service
> has no affect. Same if I set a service to be manage via hostnames. Same
> if I try to drain a host (the services/podman containers just keep
> running). Although, I am able to add/rm services via 'cephadm shell ceph
> orch daemon add/rm'. But Ceph will not manage automatically using
> labels/hostnames.
> 
> This apparently includes OSD daemons. I can not create and on the new
> host either automatically or manually, but I'm hoping the services/OSD
> issues are related and not two issues.
> 
> I haven't been able to find any obvious errors in /var/log/ceph,
> /var/log/syslog, logs , etc. I have been able to get 'slow
> ops' errors on monitors by trying to add OSDs manually (and having to
> restart the monitor). I've also gotten cephadm shell to hang. And had to
> restart managers. I'm not an expert and it could be something obvious,
> but I haven't been able to figure out a solution. If anyone has any
> suggestions, I would greatly appreciate them.
> 
> Thanks,
> Mike
> 
> --
> Michael Baer
> c...@mikesoffice.com
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io