> personal desktop, but on servers where I keep data I’m doing it.
> > but what Canonical did in this case is… this is an LTS version :/
> >
> >
> > BR,
> > Sebastian
> >
> >
> >> On 13 Jun 2024, at 19:47, David C. wrote:
> >>
> >
In addition to Robert's recommendations, remember to respect the update
order (mgr->mon->(crash->)osd->mds->...).
Before everything was containerized, it was not recommended to have
different services on the same machine.
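As an illustration only (the image tag is an example), on a cephadm cluster the
orchestrator enforces that daemon order for you:
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.2
ceph orch upgrade status
On a package-based cluster you would restart the daemon targets host by host in
that same order (ceph-mgr.target, ceph-mon.target, ceph-crash, ceph-osd.target,
ceph-mds.target, ...).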
On Thu, 13 Jun 2024 at 19:37, Robert Sander
wrote:
> On 13.06.24 18:
Hi Pablo,
Could you tell us a little more about how that happened?
Do you have a min_size >= 2 (or E/C equivalent) ?
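For example, to check and (if needed) raise it on a replicated pool (pool name
is a placeholder):
ceph osd pool get mypool min_size
ceph osd pool set mypool min_size 2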
Regards,
*David CASIER*
On Mon, 17 Jun 2024 at 16:26, c
ools min_size.
>> >
>> > If it is an EC setup, it might be quite a bit more painful, depending
>> on what happened to the dead OSDs and whether they are at all recoverable.
>> >
>> >
>> > Matthi
>> Matthias Grandl
>>>> Head Storage Engineer
>>>> matthias.gra...@croit.io
>>>>
>>>> > On 17. Jun 2024, at 16:56, Matthias Grandl
>>>> wrote:
>>&
In Pablo's unfortunate incident, it was because of a SAN incident, so it's
possible that Replica 3 didn't save him.
In this scenario, the architecture is more the origin of the incident than
the number of replicas.
It seems to me that replica 3 has been the default since Firefly => making
replica 2,
Hi,
This type of incident is often resolved by setting the public_network
option to the "global" scope, in the configuration:
ceph config set global public_network a:b:c:d::/64
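You can then verify it is picked up (a quick check):
ceph config get mon public_network
ceph config dump | grep public_network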
On Fri, 21 Jun 2024 at 03:36, Eugen Block wrote:
> Hi,
>
> this is only a theory, not a proven answer or anything.
Hi Albert,
I think it's related to your network change.
Can you send me the output of "ceph report" ?
On Tue, 16 Jul 2024 at 14:34, Albert Shih wrote:
> Hi everyone
>
> My Ceph cluster currently runs 18.2.2 and ceph -s says everything is OK
>
> root@cthulhu1:/var/lib/ceph/crash# ceph -s
>
h wrote:
> On 16/07/2024 at 15:04:05+0200, David C. wrote
> Hi,
>
> >
> > I think it's related to your network change.
>
> I thought about it, but in that case why does the old (and pre-upgrade) server
> work?
>
> > Can you send me the output of "
k.
> >
> > However, strangely, the osd and mds did not activate msgr v2 (msgr v2 was
> > activated on mon).
> >
> > It is possible to bypass by adding the "ms_mode=legacy" option but you
> need
> > to find out why msgr v2 is not activated
> >
&
Hi,
It would seem that the order in which the mon addresses are declared (v2 then
v1, not the other way around) is important.
Albert restarted all services after this modification and everything is
back to normal.
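As an illustration (addresses are placeholders), a mon_host line in ceph.conf
declared v2-first looks like:
mon_host = [v2:192.168.1.11:3300/0,v1:192.168.1.11:6789/0] [v2:192.168.1.12:3300/0,v1:192.168.1.12:6789/0]
and the current monmap can be checked with:
ceph mon dump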
On Wed, 17 Jul 2024 at 09:40, David C. wrote:
> Hi Frédéric,
>
> The
Hi Albert,
perhaps a conflict with the udev rules of locally installed packages.
Try uninstalling ceph-*
On Thu, 18 Jul 2024 at 09:57, Albert Shih wrote:
> Hi everyone.
>
> After my upgrade from 17.2.7 to 18.2.2 I noticed that each time I restart I
> get a permissions issue on
>
> /var/lib
On Thu, 18 Jul 2024 at 10:34, Albert Shih wrote:
> On 18/07/2024 at 10:27:09+0200, David C. wrote
> Hi,
>
> >
> > perhaps a conflict with the udev rules of locally installed packages.
> >
> > Try uninstalling ceph-*
>
> Sorry...not sure I
h wrote
> > On 18/07/2024 at 10:56:33+0200, David C. wrote
> >
> Hi,
>
> >
> > > Your ceph processes are in containers.
> >
> > Yes I know but in my install process I just install
> >
> > ceph-common
> > ceph-base
> >
> >
Thanks Christian,
I see the fix is in the postinst, so presumably a reboot shouldn't put
"nobody" back, right?
On Thu, 18 Jul 2024 at 11:44, Christian Rohmann <
christian.rohm...@inovex.de> wrote:
> On 18.07.24 9:56 AM, Albert Shih wrote:
> >Error scraping /var/lib/ceph/crash: [Errno 13
ig
> so it's never updated with the new mon addresses. This change is to
> have us recreate the OSD config when we redeploy or reconfig an OSD
> so it gets the new mon addresses."
>
> You mentioned a network change. Maybe the orch failed to update
> /var/lib/ceph/$(ceph fsi
Hi All
My main CephFS data pool on a Luminous 12.2.10 cluster hit capacity
overnight; metadata is on a separate pool which didn't hit capacity, but the
filesystem stopped working, which I'd expect. I increased the osd full-ratio
to give me some breathing room to get some data deleted once the filesy
s it doing anything?
> Is it using lots of CPU/RAM? If you increase debug_mds do you see some
> progress?
>
> -- dan
>
>
> On Thu, Oct 22, 2020 at 2:01 PM David C wrote:
> >
> > Hi All
> >
> > My main CephFS data pool on a Luminous 12.2.10 clust
>
> -- dan
>
> -- dan
>
> On Thu, Oct 22, 2020 at 3:35 PM David C wrote:
> >
> > Dan, many thanks for the response.
> >
> > I was going down the route of looking at mds_beacon_grace but I now
> > realise when I start my MDS,
> manifested on a multi-mds cluster, so I am not sure if it is the root
> cause here https://tracker.ceph.com/issues/45090 )
> I don't know enough about the changelog diffs to suggest upgrading
> right now in the middle of this outage.
>
>
> -- dan
>
> On Thu, Oct
_______
> From: Dan van der Ster
> Sent: 22 October 2020 18:11:57
> To: David C
> Cc: ceph-devel; ceph-users
> Subject: [ceph-users] Re: Urgent help needed please - MDS offline
>
> I assume you aren't able to quickly double the RAM on thi
On Thu, Oct 22, 2020 at 6:09 PM Dan van der Ster wrote:
>
>
>
> On Thu, 22 Oct 2020, 19:03 David C, wrote:
>>
>> Thanks, guys
>>
>> I can't add more RAM right now or have access to a server that does,
>> I'd fear it wouldn't be enough
Success!
I remembered I had a server I'd taken out of the cluster to
investigate some issues, which had some good-quality 800GB Intel DC
SSDs. I dedicated an entire drive to swap, tuned up min_free_kbytes,
added an MDS to that server and let it run. It took 3-4 hours but
eventually came back online. I
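For anyone hitting the same thing, a rough sketch of those steps (device name
and value are examples only; size the swap and threshold to your box):
mkswap /dev/sdX
swapon /dev/sdX
sysctl vm.min_free_kbytes=4194304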
Someone correct me if I'm saying something stupid, but from what I see in
the code, there is a check each time to make sure rctime doesn't go backwards,
which seems logical to me because otherwise you would have to go through
all the children to determine the correct ctime.
I don't have the impression t
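For context, rctime is exposed as a recursive xattr on directories, e.g. (the
mount path is just an example):
getfattr -n ceph.dir.rctime /mnt/cephfs/some/dir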
. 20 Oct 2023 at 13:08, David C. wrote:
> Someone correct me if I'm saying something stupid but from what I see in
> the code, there is a check each time to make sure rctime doesn't go back.
> Which seems logical to me because otherwise you would have to go through
&g
Hi Michel,
(I'm just discovering the existence of this module, so it's possible I'm
making mistakes)
The rgw module is new and only seems to be there to configure multisite.
It is present on the v17.2.6 branch but I don't see it in the container for
this version.
In any case, if you're not usin
On Tue, 24 Oct 2023 at 18:11, David C. wrote:
> Hi Michel,
>
> (I'm just discovering the existence of this module, so it's possible I'm
> making mistakes)
>
> The rgw module is new and only seems to be there to configure multisite.
>
> It is present on the v17.2.6
Hi Hubert,
It's error 125 (ECANCELED), and there may be many reasons for it.
I see high latency (144 sec); is the object big?
Any network problems?
Regards,
*David CASIER*
___
Hi Mohamed,
I understand there's only one operational monitor left, is that right?
If so, you need to reprovision the other monitors with an empty store so that
they synchronize with the only remaining monitor.
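With cephadm, a hedged sketch of what that can look like once the surviving
mon has quorum on its own (host names and IPs are placeholders; proceed one
monitor at a time):
ceph orch daemon rm mon.host2 --force
ceph orch daemon add mon host2:10.0.0.2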
Regards,
*David CASIER*
_
Hi,
I've just checked with the team and the situation is much more serious than
it seems: the lost disks contained the MON AND OSD databases (5 servers
down out of 8, replica 3).
It seems that the team fell victim to a bad batch of Samsung 980 Pros (I'm
not a big fan of this "Pro" range, but th
Hi Dominique,
The consistency of the data should not be at risk with such a problem.
But on the other hand, it's better to solve the network problem.
Perhaps look at the state of bond0 :
cat /proc/net/bonding/bond0
As well as the usual network checks
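By the usual checks I mean things along these lines (interface names are
examples):
ip -s link show bond0
ethtool eno1
dmesg -T | grep -i -e bond -e 'link is'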
__
Hi,
It seems to me that before removing buckets from the crushmap, it is
necessary to do the migration first.
I think you should restore the initial crushmap by adding the default root
next to it and only then do the migration.
There should be some backfill (probably a lot).
__
11:50, David C. wrote:
> Hi,
>
> It seems to me that before removing buckets from the crushmap, it is
> necessary to do the migration first.
> I think you should restore the initial crushmap by adding the default root
> next to it and only then do the migration.
> There should
So the next step is to place the pools on the right rule:
ceph osd pool set db-pool crush_rule fc-r02-ssd
On Wed, 8 Nov 2023 at 12:04, Denny Fuchs wrote:
> hi,
>
> I've forget to write the command, I've used:
>
> =
> ceph osd crush move fc-r02-ceph-osd-01 root=default
> ceph osd crush
Without a (RAID/JBOD) controller?
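If there is no controller in between, the cache is usually toggled on the drive
itself; a sketch of the common commands (device is a placeholder, and as you
noticed not every drive honours them):
hdparm -W 0 /dev/sdX    # SATA: disable the volatile write cache
sdparm --clear WCE /dev/sdX    # SAS/SCSI: clear the Write Cache Enable bit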
On Wed, 8 Nov 2023 at 18:36, Peter wrote:
> Hi All,
>
> I note that HDD cluster commit delay improves after I turn off the HDD cache.
> However, I also note that not all HDDs are able to turn off the cache.
> In particular, I found that two HDDs with the same model number, on
Hi Albert,
What would be the number of replicas (in total and on each row) and their
distribution on the tree ?
On Wed, 8 Nov 2023 at 18:45, Albert Shih wrote:
> Hi everyone,
>
> I'm a total newbie with Ceph, so sorry if I'm asking a stupid question.
>
> I'm trying to understand how the
ossible on this architecture.
Regards,
*David CASIER*
On Thu, 9 Nov 2023 at 08:48, Albert Shih wrote:
> On 08/11/2023 at 19:29:19+0100, David C. wrote
> Hi David
Hi Daniel,
it's perfectly normal for a PG to freeze when the primary OSD is not stable.
It can sometimes happen that the disk fails but doesn't immediately return
I/O errors (which would crash the OSD).
When the OSD is stopped, there's a 5-minute delay before it goes down in
the crushmap.
Le ve
Hi Jean Marc,
maybe look at the "rgw_enable_apis" parameter and check whether the values you
have correspond to the default (an RGW restart is needed after changing it):
https://docs.ceph.com/en/quincy/radosgw/config-ref/#confval-rgw_enable_apis
ceph config get client.rgw rgw_enable_apis
_
rbd create testpool/test3 --size=100M
rbd snap limit set testpool/test3 --limit 3
On Wed, 15 Nov 2023 at 17:58, Wesley Dillingham
wrote:
> looking into how to limit snapshots at the ceph level for RBD snapshots.
> Ideally ceph would enforce an arbitrary number of snapshots allowable per
> rb
t for each rbd?
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Wed, Nov 15, 2023 at 1:14 PM David C. wrote:
>
>> rbd create testpool/test3 --size=100M
>> rbd snap limit set
Hi Albert,
5 MONs instead of 3 will allow you to limit the impact if you break a MON
(for example, with a full file system).
5 MDSs instead of 3 makes sense if the workload can be distributed
over several trees in your file system. Sometimes it can also make sense to
have several FSs in ord
On Fri, 17 Nov 2023 at 11:22, Jean-Marc FONTANA
wrote:
> Hello, everyone,
>
> There's no cephadm.log in /var/log/ceph.
>
> To get something else, we tried what David C. proposed (thanks to him !!)
> and found:
>
> nov. 17 10:53:54 svtcephmonv3 ceph-mgr[727]:
Hi,
You can use the cephadm account (instead of root) to control machines with
the orchestrator.
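For example (assuming the cephadm package created that user and you distribute
the orchestrator's SSH key to it on every host):
ceph cephadm set-user cephadm
ceph cephadm get-pub-key > ~/ceph.pub
ssh-copy-id -f -i ~/ceph.pub cephadm@<host>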
On Fri, 17 Nov 2023 at 13:30, Luis Domingues
wrote:
> Hi,
>
> I noticed when installing the cephadm rpm package, to bootstrap a cluster
> for example, that a user cephadm was created. But I do n
figure out how to enable cephadm's access to the
> machines.
>
> Anyway, thanks for your reply.
>
> Luis Domingues
> Proton AG
>
>
> On Friday, 17 November 2023 at 13:55, David C.
> wrote:
>
>
> > Hi,
> >
> > You can use the cephadm account (i
Hello Albert,
5 vs 3 MON => you won't notice any difference
5 vs 3 MGR => by default, only 1 will be active
On Sat, 18 Nov 2023 at 09:28, Albert Shih wrote:
> On 17/11/2023 at 11:23:49+0100, David C. wrote
>
> Hi,
>
> >
> > 5 instead of 3 mon will a
Hi Guiseppe,
Could you have clients that heavily load the MDS with concurrent access
to the same trees?
Perhaps also look at the stability of all your clients (even if there are
many) [dmesg -T, ...]
How are your 4 active MDSs configured (pinning?)
Probably nothing to do but normal for 2
Hi,
It looks like a trim/discard problem.
I would try my luck by activating discard on one disk, to validate.
I have no feedback on the reliability of the bdev_*_discard parameters.
Maybe dig a little deeper into the subject or if anyone has any feedback...
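For reference, the knobs I mean (the option names as they appear in the OSD
config dump in this thread; test on a single OSD first):
ceph config set osd bdev_enable_discard true
ceph config set osd bdev_async_discard true
ceph config show osd.0 | grep discard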
___
*
On Mon, 4 Dec 2023 at 06:01, Szabo, Istvan (Agoda)
wrote:
> With the nodes that have some free space in that namespace we don't have the
> issue, only with this one, which is weird.
> --
> *From:* Anthony D'Atri
> *Sent:* Friday, December 1, 2023
v_async_discard": "false",
> "bdev_enable_discard": "false",
>
>
>
> Istvan Szabo
> Staff Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> -------
Hi Matthew,
To make a simplistic comparison, it is generally not recommended to use RAID 5
with large disks (>1 TB) due to the probability (low but not zero) of
losing another disk during the rebuild.
So imagine losing a host full of disks.
Additionally, min_size=1 means you can no longer maintain yo
Hi,
To return to my comparison with SANs, on a SAN you have spare disks to
repair a failed disk.
On Ceph, you therefore need at least one more host (k+m+1).
If we take into consideration the formalities/delivery times for a new
server, k+m+2 is not a luxury (depending on the growth of your volume).
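To make it concrete, a hedged example of an EC profile with host failure
domain (k=4, m=2 chosen arbitrarily), which by that reasoning wants at least
7-8 hosts:
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create mypool-ec erasure ec42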
in the point of view of
> global storage capacity.
>
> Patrick
>
> On 05/12/2023 at 12:19, David C. wrote:
>
> Hi,
>
> To return to my comparison with SANs, on a SAN you have spare disks to
> repair a failed disk.
>
> On Ceph, you therefore need at least o
Hi Mohamed,
Changing weights is no longer good practice.
The balancer is supposed to do the job.
The number of PGs per OSD is really tight on your infrastructure.
Can you share the output of the ceph osd tree command?
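To check/enable the balancer (upmap mode assumed here):
ceph balancer status
ceph balancer mode upmap
ceph balancer on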
Regards,
*David CASIER*
Hi Sake,
I would start by decrementing max_mds by 1:
ceph fs set atlassian-prod max_mds 2
Does mds.1 no longer restart?
Any logs?
On Thu, 21 Dec 2023 at 08:11, Sake Ceph wrote:
> Starting a new thread, forgot subject in the previous.
> So our FS down. Got the following error, what can I do?
Hello Nicolas,
I don't know if it's an update issue.
If this is not a problem for you, you can consider redeploying
grafana/prometheus.
It is also possible to inject your own certificates :
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#example
https://github.com/ceph/ceph/blob/m
function to create this certificate inside the Key
> store but how ... that's the point :-(
>
> Regards.
>
>
>
> On Tue, 23 Jan 2024 at 15:52, David C. wrote:
>
>> Hello Nicolas,
>>
>> I don't know if it's an update issue.
>>
orError: unknown daemon type
> node-exporter
>
> Tried to remove & recreate service : it's the same ... how to stop the
> rotation now :-/
>
>
>
> On Tue, 23 Jan 2024 at 17:18, David C. wrote:
>
Hi Albert,
In this scenario, it is more consistent to work with subvolumes.
Regarding security, you can use namespaces to isolate access at the OSD
level.
What Robert emphasizes is that creating pools dynamically is not without
effect on the number of PGs and (therefore) on the architecture (PG
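A hedged sketch of the subvolume approach mentioned above (fs/group/subvolume
names are placeholders):
ceph fs subvolumegroup create cephfs apps
ceph fs subvolume create cephfs app1 --group_name apps --namespace-isolated
ceph fs subvolume getpath cephfs app1 --group_name apps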
Hi,
The client calculates the location (PG) of an object from its name and the
crushmap.
This is what makes it possible to parallelize the flows directly from the
client.
The client also has the map of the PGs which are relocated to other OSDs
(upmap, temp, etc.)
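You can see that computation from the command line (pool and object names are
examples):
ceph osd map mypool myobject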
_
Albert,
Never used EC for (root) data pool.
On Thu, 25 Jan 2024 at 12:08, Albert Shih wrote:
> On 25/01/2024 at 08:42:19+, Eugen Block wrote
> > Hi,
> >
> > it's really as easy as it sounds (fresh test cluster on 18.2.1 without
> any
> > pools yet):
> >
> > ceph:~ # ceph fs volume creat
> override.
>
> ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data --force
> new fs with metadata pool 6 and data pool 8
>
> CC'ing Zac here to hopefully clear that up.
>
> Zitat von "David C." :
>
> > Albert,
> > Never used EC for (root) data pool
; then this should definitely be in the docs as a warning for EC pools
> in cephfs!
>
> Zitat von "David C." :
>
> > In case the root is EC, it is likely that it is not possible to apply the
> > disaster recovery procedure,
20e87354373b0fac
>
> This example shows that it's impossible to get any metrics in an IPv6-only
> network (discovery is impossible), and it's visible at install, so is there no
> test for IPv6-only environments before release?
>
> Now I'm seriously
Hello Albert,
this should return the sockets used on the cluster network:
ceph report | jq '.osdmap.osds[] | .cluster_addrs.addrvec[] | .addr'
Regards,
*David CASIER*
Hi,
The problem seems to come from the clients (reconnect).
Test by disabling metrics on all clients:
echo Y > /sys/module/ceph/parameters/disable_send_metrics
Regards,
*David CASIER*
look at ALL cephfs kernel clients (no effect on RGW)
On Fri, 23 Feb 2024 at 16:38, wrote:
> And we don't have a parameters folder
>
> cd /sys/module/ceph/
> [root@cephgw01 ceph]# ls
> coresize holders initsize initstate notes refcnt rhelversion
> sections srcversion taint uevent
>
> My
Is it possible for you to stop/unmount the CephFS clients?
If so, do that and restart the MDS.
It should restart.
Have the clients restart one by one and check that the MDS does not crash
(by monitoring the logs)
Regards,
*David C
If rebalancing tasks have been launched, it's not a big deal, but I don't
think that's the priority.
The priority is to get the MDS back on its feet.
I haven't seen an answer to this question: can you stop/unmount the CephFS
clients or not?
There are other solutions but as you are not comfortable I a
Hello,
Does each rack work on different trees, or is everything parallelized?
Are the metadata pools distributed over racks 1, 2, 4, 5?
If they are distributed, even if the MDS being addressed is on the same switch
as the client, that MDS will still consult/write (NVMe) OSDs in the other ra
I came across an enterprise NVMe used for BlueFS DB whose performance
dropped sharply a few months after delivery (I won't mention the brand
here but it was not among these 3: Intel, Samsung, Micron).
It is clear that enabling bdev_enable_discard impacted performance, but
this option also saved
user IO. Keep an eye on your
> discards being sent to devices and the discard latency, as well (via
> node_exporter, for example).
>
> Matt
>
>
> On 2024-03-02 06:18, David C. wrote:
> > I came across an enterprise NVMe used for BlueFS DB whose performance
> > dr
Hello everybody,
I'm encountering strange behavior on an infrastructure (it's pre-production,
but it's very ugly). After a "drain" on a monitor (and a manager), the MGRs all
crash on startup:
Mar 07 17:06:47 pprod-mon1 ceph-mgr[564045]: mgr ms_dispatch2 standby
mgrmap(e 1310) v1
Mar 07 17:06:47 pprod-mo
I took the wrong line =>
https://github.com/ceph/ceph/blob/v17.2.6/src/mon/MonClient.cc#L822
On Thu, 7 Mar 2024 at 18:21, David C. wrote:
>
> Hello everybody,
>
> I'm encountering strange behavior on an infrastructure (it's
> pre-production but it's very
"name": "pprod-mon3",
"weight": 10,
"name": "pprod-osd2",
"weight": 0,
"name": "pprod-osd1",
"weight": 0,
"name": "pprod-osd
> mon weight myself, do you know how that happened?
>
> Zitat von "David C." :
>
> Ok, got it :
>>
>> [root@pprod-admin:/var/lib/ceph/]# ceph mon dump -f json-pretty
>> |egrep "name|weigh"
>> dumped monmap epoch 14
>>
Hi Daniel,
Changing pg_num when some OSDs are almost full is not a good strategy (it's
even dangerous).
What is causing this backfilling? Loss of an OSD? The balancer? Something else?
What is the fill level of the least busy OSD (sort -nrk17)?
Is the balancer activated (upmap)?
Once the situation stabilizes, it becomes i
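What I mean by that sort (adjust the column index to wherever %USE sits in the
output of ceph osd df on your version):
ceph osd df | sort -nrk17 | head    # most-full OSDs first
ceph osd df | sort -nk17 | head     # least-full OSDs first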
Hi,
Do slow ops impact data integrity => No
Can I generally ignore it => No :)
This means that some client transactions are blocked for 120 sec (that's a
lot).
This could be a lock on the client side (CephFS, essentially), an incident
on the infrastructure side (a disk about to fail, network inst
My understanding is BeeGFS doesn't offer data redundancy by default,
you have to configure mirroring. You've not said how your Ceph cluster
is configured but my guess is you have the recommended 3x replication
- I wouldn't be surprised if BeeGFS was much faster than Ceph in this
case. I'd be intere
Hi All
I'm planning to upgrade a Luminous 12.2.10 cluster to Pacific 16.2.10; the
cluster is primarily used for CephFS, with a mix of Filestore and Bluestore
OSDs, mons/osds collocated, running on CentOS 7 nodes
My proposed upgrade path is: Upgrade to Nautilus 14.2.22 -> Upgrade to
EL8 on the nodes (probabl
o 8 without a reinstall. Rocky has a
> similar path I think.
>
> - you will need to move those Filestore OSDs to BlueStore before hitting
> Pacific, might even be part of the Nautilus upgrade. This takes some time
> if I remember correctly.
>
> - You may need to upgrade monito
>
> I don't think this is necessary. It _is_ necessary to convert all
> leveldb to rocksdb before upgrading to Pacific, on both mons and any
> filestore OSDs.
Thanks, Josh, I guess that explains why some people had issues with
Filestore OSDs post Pacific upgrade
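For anyone following along, a quick (hedged) way to spot what still needs
converting:
ceph osd metadata | jq -r '.[] | select(.osd_objectstore=="filestore") | .id'
cat /var/lib/ceph/mon/ceph-$(hostname -s)/kv_backend    # should say rocksdb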
On Tue, Dec 6, 2022 at 4:07 PM Jo
Hi Michel,
the pool already appears to be in automatic autoscale ("autoscale_mode on").
If you're worried (if, for example, the platform is having trouble handling
a large data shift) then you can set the parameter to warn (like the
rjenkis pool).
If not, as Hervé says, the transition to 2048
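For the warn option mentioned above, something like (pool name is a
placeholder):
ceph osd pool set mypool pg_autoscale_mode warn
ceph osd pool autoscale-status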
Hi Albert,
(an open question, without judgment)
What is the purpose of importing users recurrently?
It seems to me that import is the complement of export, for restores.
Isn't creating in Ceph and exporting (possibly in JSON format) enough?
On Tue, 3 Dec 2024 at 13:29, Albert Shih wrote:
> On
Hi,
In this case, the tool that adds the account should perform a caps check
(for security reasons) and probably use get-or-create/caps (not import)
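A minimal sketch of what I mean (client name, caps and pool are examples only):
ceph auth get-or-create client.app mon 'allow r' osd 'allow rw pool=app-data' -o /etc/ceph/ceph.client.app.keyring
ceph auth caps client.app mon 'allow r' osd 'allow rw pool=app-data'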
On Wed, 4 Dec 2024 at 10:42, Albert Shih wrote:
> On 03/12/2024 at 18:27:57+0100, David C. wrote
> Hi,
>
> >
> > (o