On Tue, Mar 23, 2021 at 6:13 AM duluxoz wrote:
>
> Hi All,
>
> I've got a new issue (hopefully this one will be the last).
>
> I have a working Ceph (Octopus) cluster with a replicated pool
> (my-pool), an erasure-coded pool (my-pool-data), and an image (my-image)
> created - all *seems* to be wor
Hi All,
I've got a new issue (hopefully this one will be the last).
I have a working Ceph (Octopus) cluster with a replicated pool
(my-pool), an erasure-coded pool (my-pool-data), and an image (my-image)
created - all *seems* to be working correctly. I also have the correct
Keyring specified
Hi,
Does anyone know how to find out which client holds the lock on a file in CephFS?
I've hit a deadlock problem where a client is blocked waiting to acquire the lock, but I
don't know which client is holding it.
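For anyone hitting the same thing, one way to narrow down the holder is via the MDS
admin socket. This is only a sketch, assuming a reasonably recent release, with
mds.<name> as a placeholder:
ceph daemon mds.<name> dump_blocked_ops   # blocked requests include the client id
ceph daemon mds.<name> ops                # all in-flight requests
ceph daemon mds.<name> session ls         # map the client id to an address/mount
The session listing shows the client address, which usually identifies the machine
holding the caps/lock.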
Hi Dan:
Aha - I think the first commit is probably it - before that commit, the
fact that lo is highest in the interfaces enumeration didn't matter for us
[since it would always be skipped].
This is actually almost certainly also associated with that other site with
a similar problem (OSDs drop o
There are two commits between 14.2.16 and 14.2.18 related to loopback
network. Perhaps one of these is responsible for your issue [1].
I'd try playing with the options like cluster/public bind addr and
cluster/public bind interface until you can convince the osd to bind to the
correct listening IP
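For reference, these are the kinds of settings to experiment with in ceph.conf. A
sketch only, with placeholder networks/addresses; the exact option names are worth
double-checking with `ceph config help` for your release:
[osd]
    public network = 10.0.0.0/24    # network the OSD should advertise on
    cluster network = 10.0.1.0/24
    # or pin an explicit address per host:
    public addr = 10.0.0.21
    cluster addr = 10.0.1.21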
I don't think we explicitly set any ms settings in the OSD host ceph.conf
[all the OSDs' ceph.conf files are identical across the entire cluster].
ip a gives:
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 1
Which `ms` settings do you have in the OSD host's ceph.conf or the ceph
config dump?
And how does `ip a` look on one of these hosts where the osd is registering
itself as 127.0.0.1?
You might as well set nodown again now. This will make ops pile up, but
that's the least of your concerns at the m
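Concretely, something along these lines (a sketch; the grep patterns are just examples):
ceph config dump | grep -E 'ms_|_addr|_network'    # centralized config
grep -Ei 'ms |addr|network' /etc/ceph/ceph.conf    # per-host config
ceph osd set nodown                                # stop the mons marking osds down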
Hm, yes it does [and I was wondering why loopbacks were showing up suddenly
in the logs]. This wasn't happening with 14.2.16 so what's changed about
how we specify stuff?
This might correlate with the other person on the IRC list who has problems
with 14.2.18 and their OSDs deciding they don't wor
What's with the OSDs having loopback addresses? E.g. v2:
127.0.0.1:6881/17664667,v1:127.0.0.1:6882/17664667
Does `ceph osd dump` show those same loopback addresses for each OSD?
This sounds familiar... I'm trying to find the recent ticket.
.. dan
On Mon, Mar 22, 2021, 6:07 PM Sam Skipsey wrot
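For anyone following along, a quick check for loopback registrations looks roughly like
this (a sketch; the osd id is a placeholder):
ceph osd dump | grep 127.0.0.1
ceph osd metadata <id> | grep -E 'front_addr|back_addr|hostname'
If the metadata also shows loopback front/back addresses, the OSD itself bound to lo,
rather than the mons recording it wrongly.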
hi Dan:
So, unsetting nodown results in... almost all of the OSDs being marked
down. (231 down out of 328).
Checking the actual OSD services, most of them were actually up and active
on the nodes, even when the mons had marked them down.
(On a few nodes, the down services corresponded to OSDs that
There will be a DocuBetter meeting on Thursday, 25 Mar 2021 at 0100 UTC.
We will discuss the Google Season of Docs proposal (the Comprehensive
Contribution Guide), the rewriting of the cephadm documentation and the new
section of the Teuthology Guide.
DocuBetter Meeting -- APAC
25 Mar 2021
0100 UT
Hi everyone!
I'm excited to announce two talks we have on the schedule for March 2021:
Persistent Bucket Notifications By Yuval Lifshitz
https://ceph.io/ceph-tech-talks/
The stream starts on March 25th at 17:00 UTC / 18:00 CET / 1:00 PM
EST / 10:00 AM PST
Persistent bucket notifications are go
Hi,
I would unset nodown (which hides osd failures) and norecover (which blocks PGs
from recovering degraded objects), then start the osds.
As soon as you have some osd logs reporting some failures, then share those...
- Dan
On Mon, Mar 22, 2021 at 3:49 PM Sam Skipsey wrote:
>
> So, we started the
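In other words, roughly (a sketch; the systemd unit names assume a plain package-based
install, not containers):
ceph osd unset nodown
ceph osd unset norecover
# then, per OSD host, bring daemons back a few at a time:
systemctl start ceph-osd@<id>
# or everything on that host at once:
systemctl start ceph-osd.target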
Hello,
following up on my mail from 2020 [0], it seems that OSDs sometimes have
"multiple classes" assigned:
[15:47:15] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush
rm-device-class osd.4
done removing class of osd(s): 4
[15:47:17] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd cru
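For anyone else seeing this, resetting the class usually goes along these lines (a
sketch; 'hdd' is just an example class):
ceph osd crush rm-device-class osd.4
ceph osd crush set-device-class hdd osd.4
ceph osd crush class ls-osd hdd    # verify the osd shows up under one class only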
So, we started the mons and mgr up again, and here's the relevant logs,
including also ceph versions. We've also turned off all of the firewalls on
all of the nodes so we know that there can't be network issues [and,
indeed, all of our management of the OSDs happens via logins from the
service node
Hi everyone,
We are approaching the April 2nd deadline in two weeks, so we should
start proposing the next meeting to go over the survey results.
Anybody in the community is welcome to join the Ceph Working Groups.
Please add your name to:
https://ceph.io/user-survey/
I have started a doodle:
https
I tried a cache tier in write-back mode in my cluster, but because my SSD
drive is a consumer (home-use) model, it cannot satisfy the IOPS requirements. Now I
want to disable write-back mode. I found the official documentation, but the doc was
outdated:
https://docs.ceph.com/en/latest/rados/operations/cache-tiering/?highlight=cache%20
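For the record, the rough procedure for dismantling a writeback tier is below; this is a
sketch with 'hot-pool'/'cold-pool' as placeholder names, so double-check it against the
docs for your release first:
# stop new writes landing in the cache, while reads still proxy through it
ceph osd tier cache-mode hot-pool proxy
# flush and evict everything still sitting in the cache pool
rados -p hot-pool cache-flush-evict-all
# once the cache pool is empty, detach it
ceph osd tier remove-overlay cold-pool
ceph osd tier remove cold-pool hot-pool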
Hey Rich!
Appreciate the info. This did work successfully! Just wanted to share my
experience in case others run into a similar situation:
First step, I disabled the tcmu-runner process on all 3 of our previous
iSCSI gateway nodes. Then from our MONs, I confirmed there were no
current locks
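For anyone searching later, leftover locks on the backing image can be checked with rbd
directly (a sketch; the pool/image names are placeholders):
rbd lock ls rbd-pool/iscsi-image
# a stale lock from an old gateway can be removed with
rbd lock rm rbd-pool/iscsi-image <lock-id> <locker>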
Thank you~
I will try to upgrade the cluster too. Seems like this is the only way for now.
😭
I will let you know once I complete testing. :)
Have a good day
Szabo, Istvan (Agoda) wrote on Monday, 22 March 2021 at 3:38 PM:
> Yeah, doesn't work. Last week they fixed my problem ticket which caused
> the crashes, and due
Some news: since the ceph pg inactive listing reported that 0 objects
are in this pg, I've marked it complete on the primary osd, and now it is unfound. So
I'm stuck again 😕
[WRN] OBJECT_UNFOUND: 4/58369044 objects unfound (0.000%)
pg 44.1aa has 4 unfound objects
[ERR] PG_DAMAGED: Possib
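For the archives, the usual commands at this point look like the following (a sketch;
mark_unfound_lost discards data, so it is a last resort only):
ceph pg 44.1aa list_unfound    # which objects are unfound, and which osds might hold them
ceph pg 44.1aa query           # check 'might_have_unfound' for osds worth bringing back
# last resort, when no copy can be recovered:
ceph pg 44.1aa mark_unfound_lost revert
# or, if no earlier version exists either:
ceph pg 44.1aa mark_unfound_lost delete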
Hi.
Unfortunately, there isn't a good guide for sizing Ganesha. It's pretty
lightweight, and the machines it needs are generally smaller than
what Ceph needs, so you probably won't have much of a problem.
Ganesha's scaling comes down to two factors, depending on the workload involved:
the CPU us
Hi Dan:
Thanks for the reply - at present, our mons and mgrs are off [because of
the unsustainable nature of the filesystem usage]. We'll try putting them
on again for long enough to get "ceph status" out of them, but because the
mgr was unable to actually talk to anything, and reply at that point
Hi Sam,
The daemons restart (for *some* releases) because of this:
https://tracker.ceph.com/issues/21672
In short, if the selinux module changes, and if you have selinux
enabled, then midway through yum update, there will be a systemctl
restart ceph.target issued.
For the rest -- I think you shou
Forgot to say, this is an Octopus 15.2.9 cluster, and there isn't any
force_create_pg option, which a couple of threads suggest using to make it work.
https://tracker.ceph.com/issues/10411
https://www.oreilly.com/library/view/mastering-proxmox-/9781788397605/42d80c67-10aa-4cf2-8812-e38c861cdc5d.xhtml
[https://www.or
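For what it's worth, in Octopus the old pg-level force_create_pg seems to have become an
osd subcommand; a sketch (it recreates the PG empty, so any data still in it is gone):
ceph osd force-create-pg 44.1aa --yes-i-really-mean-it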
Yeah, it doesn't work. Last week they fixed my problem ticket for the bug which caused
the crashes, and the crashes had stopped the replication. I'll give it another try this
week after the update; if the daemon doesn't crash, maybe it will work, because whenever
no crash happened, the data was synced. Fingers cro
Hi,
What can I do with this pg to make it work?
We lost osds 61 and 122 and no longer have them, but we still have 32, 33, and 70. I've
exported the pg chunks from them, but they are very small, and when I imported one back
into another osd, that osd never started again, so I had to remove the chunk
(44.1aas2, 44.1aas3
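For anyone attempting the same, PG shard export/import is typically done with
ceph-objectstore-tool roughly as below (a sketch; the paths and osd ids are placeholders,
and the OSD daemons must be stopped first):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-32 \
    --pgid 44.1aas2 --op export --file /tmp/44.1aas2.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<dest> \
    --op import --file /tmp/44.1aas2.export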
Hi everyone:
I posted to the list on Friday morning (UK time), but apparently my email
is still in moderation (I have an email from the list bot telling me that
it's held for moderation but no updates).
Since this is a bit urgent - we have ~3PB of storage offline - I'm posting
again.
To save ret