[ceph-users] Re: Fix PGs states

2020-11-02 Thread Eugen Block
There's nothing wrong with EC pools or multiple datacenters, you just need the right configuration to cover the specific requirements ;-) Quoting "Ing. Luis Felipe Domínguez Vega": Yes, thanks to all, the decision was: remove all and start from 0 and not use EC pools, use only Replicat
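As a rough illustration of the configuration Eugen refers to, an erasure-code profile can pin the failure domain to datacenter buckets, assuming the CRUSH map already defines them; profile and pool names below are made up:

ceph osd erasure-code-profile set dc-ec k=4 m=2 crush-failure-domain=datacenter
ceph osd pool create ecpool 64 64 erasure dc-ec
ceph osd erasure-code-profile get dc-ec    # verify k/m and the failure domain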

[ceph-users] Re: Seriously degraded performance after update to Octopus

2020-11-02 Thread Marc Roos
I have been advocating for a long time for publishing testing data from a basic test cluster against different Ceph releases: just a basic Ceph cluster that covers most configs and runs the same tests, so you can compare just Ceph performance. That would mean a lot for smaller companies that do

[ceph-users] Re: Restart Error: osd.47 already exists in network host

2020-11-02 Thread Eugen Block
Hi, are you sure it's the right container ID you're using for the restart? I noticed that 'cephadm ls' shows older containers after a daemon had to be recreated (a MGR in my case). Maybe you're trying to restart a daemon that was already removed? Regards, Eugen Quoting Ml Ml: Hello L

[ceph-users] Re: [EXTERNAL] Re: 14.2.12 breaks mon_host pointing to Round Robin DNS entry

2020-11-02 Thread Wido den Hollander
On 31/10/2020 11:16, Sasha Litvak wrote: Hello everyone, Assuming that the backport has been merged for a few days now, is there a chance that 14.2.13 will be released? On the dev list it was posted that .13 will be released this week. Wido On Fri, Oct 23, 2020, 6:03 AM Van Alstyne, Kennet

[ceph-users] how to rbd export image from group snap?

2020-11-02 Thread Timo Weingärtner
Hi, we're using rbd for VM disk images and want to make consistent backups of groups of them. I know I can create a group and make consistent snapshots of all of them: # rbd --version ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable) # rbd create test_foo --size 1
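For reference, the group snapshot workflow Timo describes looks roughly like this (pool, group and image names are placeholders); how to export a single image from such a group snapshot is the open question of the thread:

rbd group create rbd/vmgroup
rbd group image add rbd/vmgroup rbd/test_foo
rbd group image add rbd/vmgroup rbd/test_bar
rbd group snap create rbd/vmgroup@backup-2020-11-02
rbd group snap ls rbd/vmgroup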

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Frank Schilder
Hi Sagara, I'm not sure if my hypothesis can be correct. Ceph sends an acknowledge of a write only after all copies are on disk. In other words, if PGs end up on different versions after a power outage, one always needs to roll back. Since you have two healthy OSDs in the PG and the PG is activ

[ceph-users] Seriously degraded performance after update to Octopus

2020-11-02 Thread Martin Rasmus Lundquist Hansen
Two weeks ago we updated our Ceph cluster from Nautilus (14.2.0) to Octopus (15.2.5), an update that was long overdue. We used the Ansible playbooks to perform a rolling update and except for a few minor problems with the Ansible code, the update went well. The Ansible playbooks were also used

[ceph-users] Restart Error: osd.47 already exists in network host

2020-11-02 Thread Ml Ml
Hello List, sometimes some OSDs get taken out for some reason (I am still looking for the reason, and I guess it's due to some overload); however, when I try to restart them I get: Nov 02 08:05:26 ceph05 bash[9811]: Error: No such container: ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df-osd.47 Nov 02 0

[ceph-users] Re: read latency

2020-11-02 Thread Vladimir Prokofev
With sequential reads you get "read ahead" mechanics attached, which helps a lot. So let's say you do 4KB sequential reads with fio. By default, Ubuntu, for example, has a 128KB read-ahead size. That means when you request that 4KB of data, the driver will actually request 128KB. When your IO is served, and you r
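A rough way to see this effect is to compare the device's read-ahead setting with the fio block size; the device name and values below are only examples:

cat /sys/block/rbd0/queue/read_ahead_kb           # typically 128 (KB)
fio --name=seqread --rw=read --bs=4k --size=1G --ioengine=libaio --filename=/mnt/test/file
# buffered sequential 4KB reads are mostly served from the page cache,
# because the kernel prefetches up to read_ahead_kb per request
echo 4096 > /sys/block/rbd0/queue/read_ahead_kb   # experiment with a larger read-ahead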

[ceph-users] Intel SSD firmware guys contacts, if any

2020-11-02 Thread vitalif
Hi! I have an interesting question regarding SSDs and I'll try to ask about it here. During my testing of Ceph & Vitastor & Linstor on servers equipped with Intel D3-4510 SSDs I discovered a very funny problem with these SSDs: They don't like overwrites of the same sector. That is, if you over
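A rough way to reproduce the pattern described here is to rewrite the same small region of a scratch drive with synchronous direct writes; this is a sketch only, with a placeholder device path, and it destroys data on that device:

fio --name=same-sector --filename=/dev/nvme0n1 --direct=1 --fsync=1 --rw=write --bs=4k --size=4k --loops=100000
# --size=4k with --loops makes fio rewrite the same 4 KB region over and over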

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Sagara Wijetunga
Hi Frank > I'm not sure if my hypothesis can be correct. Ceph sends an acknowledge of a > write only after all copies are on disk. In other words, if PGs end up on > different versions after a power outage, one always needs to roll back. Since > you have two healthy OSDs in the PG and the PG i

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Frank Schilder
Hi Sagara, looks like you have one on a new and 2 on an old version. Can you add the information on which OSD each version resides? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Sagara Wijetunga Sent: 02 Nov

[ceph-users] RGW seems to not clean up after some requests

2020-11-02 Thread Denis Krienbühl
Hi everyone We have faced some RGW outages recently, with the RGW returning HTTP 503. First for a few, then for most, then all requests - in the course of 1-2 hours. This seems to have started since we updated from 15.2.4 to 15.2.5. The line that accompanies these outages in the log is the

[ceph-users] Re: RGW seems to not clean up after some requests

2020-11-02 Thread Abhishek Lekshmanan
Denis Krienbühl writes: > Hi everyone > > We have faced some RGW outages recently, with the RGW returning HTTP 503. > First for a few, then for most, then all requests - in the course of 1-2 > hours. This seems to have started since we have updated from 15.2.4 to 15.2.5. > > The line that accom

[ceph-users] Re: Fix PGs states

2020-11-02 Thread Ing. Luis Felipe Domínguez Vega
Of course, yes, hehe. The thing is that my housing provider has problems with the dark fiber that connects the DCs, so I prefer to use only one DC and replicated PGs. On 2020-11-02 03:13, Eugen Block wrote: There's nothing wrong with EC pools or multiple datacenters, you just need the right

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Sagara Wijetunga
Hi Frank > looks like you have one on a new and 2 on an old version. Can you add the information on which OSD each version resides? The "ceph pg 3.b query" shows the following: "peer_info": [ { "peer": "1", "pgid": "3.b", "last_update": "4825

[ceph-users] Re: Restart Error: osd.47 already exists in network host

2020-11-02 Thread Ml Ml
Hello Eugen, cephadm ls for OSD.41:
{
    "style": "cephadm:v1",
    "name": "osd.41",
    "fsid": "5436dd5d-83d4-4dc8-a93b-60ab5db145df",
    "systemd_unit": "ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.41",
    "enabled": true,
    "state": "error",
    "contain
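Since cephadm-managed daemons run under systemd units named after the cluster fsid, one way to cross-check Eugen's suspicion is to restart via the unit reported by 'cephadm ls' and compare with the containers that are actually running; a sketch based on the output above:

systemctl status ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47
systemctl restart ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@osd.47
podman ps --filter name=osd.47    # or 'docker ps', depending on the container runtime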

[ceph-users] Re: RGW seems to not clean up after some requests

2020-11-02 Thread Denis Krienbühl
Hi Abhishek > On 2 Nov 2020, at 14:54, Abhishek Lekshmanan wrote: > There isn't much in terms of code changes in the scheduler from v15.2.4->5. Does the perf dump (`ceph daemon perf dump`) on the RGW socket show any throttle counts? I know, I was wondering if this somehow might have an infl
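For reference, the perf dump is read from the RGW admin socket; a sketch, with the socket path depending on the deployment:

ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok perf dump | grep -A 5 throttle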

[ceph-users] Re: cephfs cannot write

2020-11-02 Thread Eugen Block
Hi, I'm not sure if the ceph-volume error is related to the "operation not permitted" error. Have you checked the auth settings for your cephfs client? Or did you mount it as the admin user? Quoting Patrick: Hi all, My ceph cluster is HEALTH_OK, but I cannot write on cephfs. OS: Ubuntu
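Checking the client caps Eugen mentions could look like this; the client and filesystem names are placeholders:

ceph auth get client.cephfs-user                      # inspect the mds/osd caps
ceph fs authorize <fsname> client.cephfs-user / rw    # (re)create a client with rw caps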

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Sagara Wijetunga
Hi Frank > Please note, there is no peer 0 in "ceph pg 3.b query". Also no word osd. I checked other PGs with "active+clean"; there is a "peer": "0". But "ceph pg <pgid> query" always shows only two peers, sometimes peers 0 and 1, or 1 and 2, or 0 and 2, etc. Regards Sagara

[ceph-users] Re: Seriously degraded performance after update to Octopus

2020-11-02 Thread Vladimir Prokofev
Just shooting in the dark here, but you may be affected by a similar issue I had a while back; it was discussed here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZOPBOY6XQOYOV6CQMY27XM37OC6DKWZ7/ In short, they changed the setting bluefs_buffered_io to false in the recent Nautilu
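If that turns out to be the cause, checking and flipping the option is straightforward; a sketch only, and OSDs may need a restart for the change to take effect:

ceph config get osd bluefs_buffered_io
ceph config set osd bluefs_buffered_io true
systemctl restart ceph-osd.target    # per OSD host, one host at a time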

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Frank Schilder
Hi Sagara, the primary OSD is probably not listed as a peer. Can you post the complete output of - ceph pg 3.b query - ceph pg dump - ceph osd df tree in a pastebin? = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Sagara Wij

[ceph-users] v14.2.13 Nautilus released

2020-11-02 Thread Abhishek Lekshmanan
This is the 13th backport release in the Nautilus series. This release fixes a regression introduced in v14.2.12 and includes a few ceph-volume & RGW fixes. We recommend users update to this release. Notable Changes --- * Fixed a regression that caused breakage in clusters that referred
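A minimal upgrade sketch for a package-based (non-cephadm) cluster, following the usual mon -> mgr -> osd order; adapt to your distro and deployment tooling:

ceph versions                 # confirm what is currently running
ceph osd set noout            # avoid rebalancing while OSDs restart
# on each node: upgrade the ceph packages, then restart daemons in order
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target
ceph osd unset noout
ceph versions                 # verify everything reports 14.2.13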

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Sagara Wijetunga
Hi Frank > the primary OSD is probably not listed as a peer. Can you post the complete > output of > - ceph pg 3.b query > - ceph pg dump > - ceph osd df tree > in a pastebin? Yes, the primary OSD is 0. I have attached the above as .txt files. Please let me know if you still cannot read them. Re

[ceph-users] Inconsistent Space Usage reporting

2020-11-02 Thread Vikas Rana
Hi Friends, We have some inconsistent storage space usage reporting. We used only 46TB with a single copy, but the space used on the pool is close to 128TB. Any idea where the extra space is utilized and how to reclaim it? Ceph version: 12.2.11 with XFS OSDs. We are planning to upgrade
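A few commands that usually help narrow down where the space went; pool and image names are placeholders:

ceph df detail            # per-pool usage vs. object counts
rados df                  # space and objects per pool
ceph osd df               # per-OSD utilisation and variance
rbd du <pool>/<image>     # provisioned vs. actually used, if the pool holds RBD images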

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Frank Schilder
Hmm, I'm getting a bit confused. Could you also send the output of "ceph osd pool ls detail". Did you look at the disk/controller cache settings? I think you should start a deep-scrub with "ceph pg deep-scrub 3.b" and record the output of "ceph -w | grep '3\.b'" (note the single quotes). The e
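The commands Frank suggests, plus one that lists what the scrub actually found, as a sketch:

ceph pg deep-scrub 3.b
ceph -w | grep '3\.b'                                   # watch for scrub errors on that PG
rados list-inconsistent-obj 3.b --format=json-pretty    # details of the inconsistent objects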

[ceph-users] Re: pgs stuck backfill_toofull

2020-11-02 Thread Joachim Kraftmayer
Stefan, I agree with you. In Jewel the recovery process is not really throttled by default. With Luminous and later you benefit from dynamic resharding and the handling of too-big OMAPs. Regards, Joachim ___ Clyso GmbH On 29.10.2020 at 21:30, Stefan Kooma

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Sagara Wijetunga
> Hmm, I'm getting a bit confused. Could you also send the output of "ceph osd > pool ls detail". File ceph-osd-pool-ls-detail.txt attached. > Did you look at the disk/controller cache settings? I don't have disk controllers on Ceph machines. The hard disk is directly attached to the mother

[ceph-users] Re: read latency

2020-11-02 Thread Tony Liu
Thanks Vladimir for the clarification! Tony > -Original Message- > From: Vladimir Prokofev > Sent: Monday, November 2, 2020 3:46 AM > Cc: ceph-users > Subject: [ceph-users] Re: read latency > > With sequential read you get "read ahead" mechanics attached which helps > a lot. > So let's

[ceph-users] Re: How to recover from active+clean+inconsistent+failed_repair?

2020-11-02 Thread Frank Schilder
> But there can be an on-chip disk controller on the motherboard, I'm not sure. There is always some kind of controller. Could be on-board. Usually, the cache settings are accessible when booting into the BIOS set-up. > If your worry is fsync persistence No, what I worry about is volatile write

[ceph-users] Re: Monitor persistently out-of-quorum

2020-11-02 Thread Ki Wong
Folks, We’ve finally found the issue: MTU mismatch on the switch-side. So, my colleague noticed “tracepath” from the other monitors to the problematic one does not return and I tracked it down to an MTU mismatch (jumbo vs not) on the switch end. After fixing the mismatch all is back to normal. It
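For anyone hitting the same symptom, a quick way to spot an MTU mismatch between two hosts; the interface and address are placeholders:

ip link show dev eth0                  # check the configured MTU on each side
tracepath <problem-mon-ip>             # reports the path MTU and stalls on a mismatch
ping -M do -s 8972 <problem-mon-ip>    # jumbo-frame test: 8972 bytes payload + 28 bytes headers = 9000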

[ceph-users] Ceph flash deployment

2020-11-02 Thread Seena Fallah
Hi all, Is this guide still valid for a BlueStore deployment with Nautilus or Octopus? https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments Thanks.