[ceph-users] Re: pacific installation at ubuntu 20.04
I think I found the reason: the cephadm script uses the Ubuntu repo instead of the Ceph repo, so I get the older version 15 ...

root@node1:~# ./cephadm -v add-repo --release pacific
Could not locate podman: podman not found
Installing repo GPG key from https://download.ceph.com/keys/release.asc...
Installing repo file at /etc/apt/sources.list.d/ceph.list...

root@node1:~# ./cephadm -v install
Could not locate podman: podman not found
Installing packages ['cephadm']...
Running command: apt-get install -y cephadm
apt-get: Reading package lists...
apt-get: Building dependency tree...
apt-get: Reading state information...
apt-get: Recommended packages:
apt-get:   docker.io
apt-get: The following NEW packages will be installed:
apt-get:   cephadm
apt-get: 0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
apt-get: Need to get 57.8 kB of archives.
apt-get: After this operation, 303 kB of additional disk space will be used.
apt-get: Get:1 http://de.archive.ubuntu.com/ubuntu focal-updates/universe amd64 cephadm amd64 15.2.11-0ubuntu0.20.04.2 [57.8 kB]
apt-get: Fetched 57.8 kB in 0s (282 kB/s)
apt-get: Selecting previously unselected package cephadm.
apt-get: (Reading database ... 71696 files and directories currently installed.)
apt-get: Preparing to unpack .../cephadm_15.2.11-0ubuntu0.20.04.2_amd64.deb ...
apt-get: Unpacking cephadm (15.2.11-0ubuntu0.20.04.2) ...
apt-get: Setting up cephadm (15.2.11-0ubuntu0.20.04.2) ...
apt-get: Adding system user cephadm ... done

Any idea how to fix that?

Am 2021-06-23 16:50, schrieb Jana Markwort:

Hi all,

I'm a new Ceph user and am trying to install my first cluster. I try to install Pacific, but as a result I get Octopus. What's wrong here?

I've done:
# curl --silent --remote-name --location https://github.com/ceph/ceph/raw/pacific/src/cephadm/cephadm
# chmod +x cephadm
# ./cephadm add-repo --release pacific
# ./cephadm install
# cephadm install ceph-common

# ceph -v
ceph version 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)

# cat /etc/apt/sources.list.d/ceph.list
deb https://download.ceph.com/debian-pacific/ focal main

??

Kind regards,
Jana
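(As a quick diagnostic sketch, not part of the original mail: on Ubuntu 20.04 you can check which repository apt will actually pull cephadm from, and whether the Ceph repo is being considered at all, with something like the following.)

# apt-cache policy cephadm
# grep -r ceph /etc/apt/sources.list /etc/apt/sources.list.d/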
[ceph-users] Re: pacific installation at ubuntu 20.04
Hi,

On 24.06.21 09:34, Jana Markwort wrote:
>
> I think, I found the reason.
> the cephadm-script uses the ubuntu repo instead the ceph repo.
> so I get the older version 15 ...
>
> root@node1:~# ./cephadm -v add-repo --release pacific
> Could not locate podman: podman not found
> Installing repo GPG key from https://download.ceph.com/keys/release.asc...
> Installing repo file at /etc/apt/sources.list.d/ceph.list...
>
> root@node1:~# ./cephadm -v install
> Could not locate podman: podman not found

I think there is an "apt update" missing between these two steps. The first creates /etc/apt/sources.list.d/ceph.list and the second installs packages, but the repo list was never updated.

Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
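(In other words, the expected sequence would presumably be the following; an untested sketch.)

# ./cephadm add-repo --release pacific
# apt update
# ./cephadm install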
[ceph-users] Missing objects in pg
Dear List,

since my update yesterday from 14.2.18 to 14.2.20 I have an unhealthy cluster. As far as I remember, it appeared after rebooting the second server. There are 7 missing objects from PGs of a cache pool (pool 3). This pool has now been changed from writeback to proxy and I'm not able to flush all objects.

root@scvirt06:/home/urzadmin/ceph_issue# ceph -s
  cluster:
    id:     5349724e-fa96-4fd6-8e44-8da2a39253f7
    health: HEALTH_ERR
            7/15893342 objects unfound (0.000%)
            Possible data damage: 7 pgs recovery_unfound
            Degraded data redundancy: 21/47680026 objects degraded (0.000%), 7 pgs degraded, 7 pgs undersized
            client is using insecure global_id reclaim
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum scvirt03,scvirt06,scvirt01 (age 19h)
    mgr: scvirt04(active, since 21m), standbys: scvirt03, scvirt02
    mds: scfs:1 {0=scvirt04=up:active} 1 up:standby-replay 1 up:standby
    osd: 54 osds: 54 up (since 17m), 54 in (since 10w); 7 remapped pgs

  task status:
    scrub status:
        mds.scvirt03: idle

  data:
    pools:   5 pools, 704 pgs
    objects: 15.89M objects, 49 TiB
    usage:   139 TiB used, 145 TiB / 285 TiB avail
    pgs:     21/47680026 objects degraded (0.000%)
             7/15893342 objects unfound (0.000%)
             694 active+clean
             7   active+recovery_unfound+undersized+degraded+remapped
             3   active+clean+scrubbing+deep

  io:
    client:  3.7 MiB/s rd, 6.6 MiB/s wr, 40 op/s rd, 31 op/s wr

My cluster:
scvirt01 - mon,osds
scvirt02 - mgr,osds
scvirt03 - mon,mgr,mds,osds
scvirt04 - mgr,mds,osds
scvirt05 - osds
scvirt06 - mon,mds,osds

Log of osd.49:

root@scvirt03:/home/urzadmin# tail -f /var/log/ceph/ceph-osd.49.log
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.64 GB write, 0.01 MB/s write, 0.54 GB read, 0.01 MB/s read, 6.5 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
** File Read Latency Histogram By Level [default] **
2021-06-24 08:53:08.865 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:08.865 7f88a505f700 -1 log_channel(cluster) log [ERR] : 3.1e has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88ab86c700 -1 log_channel(cluster) log [ERR] : 3.9 has 1 objects unfound and apparently lost
2021-06-24 08:53:40.570 7f88a9067700 -1 log_channel(cluster) log [ERR] : 3.1e has 1 objects unfound and apparently lost
2021-06-24 08:54:45.042 7f88b487e700  4 rocksdb: [db/db_impl.cc:777] --- DUMPING STATS ---
2021-06-24 08:54:45.042 7f88b487e700  4 rocksdb: [db/db_impl.cc:778] ** DB Stats **
Uptime(secs): 85202.3 total, 600.0 interval
Cumulative writes: 1148K writes, 8640K keys, 1148K commit groups, 1.0 writes per commit group, ingest: 1.24 GB, 0.01 MB/s
Cumulative WAL: 1148K writes, 546K syncs, 2.10 writes per sync, written: 1.24 GB, 0.01 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 369 writes, 1758 keys, 369 commit groups, 1.0 writes per commit group, ingest: 0.41 MB, 0.00 MB/s
Interval WAL: 369 writes, 155 syncs, 2.37 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

** Compaction Stats [default] **
Level  Files  Size       Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
L0     3/0    104.40 MB  0.8    0.0       0.0     0.0       0.2        0.2       0.0        1.0    0.0       67.8      2.89       2.70               6          0.482     0      0
L1     2/0    131.98 MB  0.5    0.2       0.1     0.1       0.2        0.1       0.0        1.8    149.9     120.9     1.53       1.41               1          1.527     2293K  140K
L2     16/0   871.57 MB  0.3    0.3       0.1     0.3       0.3        -0.0      0.0        5.2    158.1     132.3     2.05       1.93               1          2.052     3997K  1089K
Sum    21/0   1.08 GB    0.0    0.5       0.2     0.4       0.6        0.2       0.0        3.3    85.5      100.8     6.47       6.03               8          0.809     6290K  1229K
Int    0/0    0.00 KB    0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.00       0.00               0          0.000     0      0

If I run "ceph pg repair 3.1e" it doesn't change anything, and I do not understand why these PGs are undersized. All OSDs are up.

ceph.conf:
[g
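(A diagnostic sketch using the PG ids from the output above; command names as in the Nautilus docs, untested here: "ceph health detail" lists the affected PGs, "pg query" shows the acting set and recovery state, which should explain the undersized flag, and "list_unfound" lists the unfound objects.)

# ceph health detail
# ceph pg 3.1e query
# ceph pg 3.1e list_unfound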
[ceph-users] Re: pacific installation at ubuntu 20.04
ok, the problem is the GPG key:

root@node1:~# ./cephadm -v add-repo --release pacific
Could not locate podman: podman not found
Installing repo GPG key from https://download.ceph.com/keys/release.asc...
Installing repo file at /etc/apt/sources.list.d/ceph.list...
...
W: https://download.ceph.com/debian-pacific/dists/focal/InRelease: The key(s) in the keyring /etc/apt/trusted.gpg.d/ceph.release.gpg are ignored as the file has an unsupported filetype.

workaround:
# rm /etc/apt/trusted.gpg.d/ceph.release.gpg
# wget https://download.ceph.com/keys/release.asc
# apt-key add release.asc
# apt update

After that, cephadm installs from https://download.ceph.com/debian-pacific

root@node1:~# ceph -v
ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable)

Am 2021-06-24 13:08, schrieb Robert Sander:

Hi,

On 24.06.21 09:34, Jana Markwort wrote:

I think, I found the reason.
the cephadm-script uses the ubuntu repo instead the ceph repo.
so I get the older version 15 ...

root@node1:~# ./cephadm -v add-repo --release pacific
Could not locate podman: podman not found
Installing repo GPG key from https://download.ceph.com/keys/release.asc...
Installing repo file at /etc/apt/sources.list.d/ceph.list...

root@node1:~# ./cephadm -v install
Could not locate podman: podman not found

I think there is an "apt update" missing between these two steps. The first creates /etc/apt/sources.list.d/ceph.list and the second installs packages, but the repo list was never updated.

Regards
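(An alternative sketch that avoids apt-key, assuming the root cause is that the key was written ASCII-armored where apt expects a binary keyring; untested.)

# rm /etc/apt/trusted.gpg.d/ceph.release.gpg
# wget https://download.ceph.com/keys/release.asc
# gpg --dearmor < release.asc > /etc/apt/trusted.gpg.d/ceph.release.gpg
# apt update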
[ceph-users] Start a service on a specified node
If I understand the documentation for the placements in "ceph orch apply" correctly, I can place the daemons by number or on specific hosts. But what I want is: "Start 3 mgr services, and one of them should be started on node ceph01."

How can I achieve this?

Thanks!
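(For illustration, the two placement styles being referred to look roughly like this; the hostnames other than ceph01 are hypothetical, and this is only a sketch of the syntax, not a verified answer to how to combine a count with a pinned host.)

# ceph orch apply mgr --placement="3"
# ceph orch apply mgr --placement="ceph01 node2 node3"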
[ceph-users] Re: RGW topic created in wrong (default) tenant
On Wed, Jun 23, 2021 at 6:39 PM Daniel Iwan wrote:
>> this looks like a bug, the topic should be created in the right tenant.
>> please submit a tracker for that.
>
> Thank you for confirming.
> Created here https://tracker.ceph.com/issues/51331
>
> thanks
>
>> yes. topics are owned by the tenant. previously, they were owned by the
>> user but since the same topic could be used among different buckets and
>> different users, this was causing issues (was fixed here:
>> https://github.com/ceph/ceph/pull/38136)
>> (documentation also mentioned that in the intro paragraph of the doc:
>> https://docs.ceph.com/en/latest/radosgw/notifications/)
>
> I think it's this section
> ```
> A user can create different topics. A topic entity is defined by its name
> and is per tenant. A user can only associate its topics (via notification
> configuration) with buckets it owns.
> ```
>
>> no permissions are needed to create a topic. however, note that without
>> proper permissions on the bucket, you cannot create a notification that
>> associates this topic with the bucket.
>
> Yes, I thought it would be similar to AWS, possibly not implemented/needed so far:
> https://docs.aws.amazon.com/sns/latest/dg/sns-using-identity-based-policies.html
> https://docs.aws.amazon.com/config/latest/developerguide/sns-topic-policy.html
>
> ```
> { "Statement": [{ "Effect": "Allow", "Action": ["sns:CreateTopic",
> "sns:ListTopics", "sns:SetTopicAttributes", "sns:DeleteTopic"], "Resource":
> "*" }] }
> ```
>
> Not having that sns:CreateTopic / sns:DeleteTopic leaves room for abuse.
> A user could potentially create many topics, delete all topics from
> tenant(s) maliciously or by accident (bugs) etc.

"tenant" in the RGW is somewhat equivalent to an "account" in AWS. however, "tenant" does not have all the security aspects that an "account" has. adding that would be much wider in scope than the creation/deletion of topics.

> On a deletion note, if I understand correctly, deletion of the topic
> without deletion of all notifications first creates the situation where
> notifications can no longer be deleted due to the topic missing.
> The only option is to re-create the topic and delete notifications first.

according to what I tested, this is not the case. deletion of a topic only prevents the creation of new notifications with that topic. it does not affect the deletion of notifications with that topic, nor the actual sending of these notifications.

note that we also added a cascade delete process to delete all notifications of a bucket when a bucket is deleted. (it should be in pacific: https://github.com/ceph/ceph/pull/38351)

> Btw I enjoyed your FOSDEM presentation
> https://fosdem.org/2021/schedule/event/sds_ceph_rgw_serverless/

thank you!

> Any timeframe for native SQS coming to Ceph?

no actual timelines... but it should probably land in the main branch later this year :-)

> Regards
> Daniel
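(For readers following the tenant/topic discussion above, a sketch of how topics can be inspected and removed from the admin side with radosgw-admin; the tenant and topic names are hypothetical, and the exact flags should be checked against your radosgw-admin version.)

# radosgw-admin topic list --tenant=mytenant
# radosgw-admin topic get --topic=mytopic --tenant=mytenant
# radosgw-admin topic rm --topic=mytopic --tenant=mytenant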
[ceph-users] Re: Ceph Month June Schedule Now Available
Hi everyone,

Today is the final day for Ceph Month! Here's today's schedule:

9:00 ET / 15:00 CEST Evaluating CephFS Performance vs. Cost on High-Density Commodity Disk Servers [Dan van der Ster]
9:30 ET / 15:30 CEST Ceph Market Development Working Group BoF
10:10 ET / 16:10 CEST Ceph Community Ambassador BoF

Meeting link: https://bluejeans.com/908675367
Full schedule: https://pad.ceph.com/p/ceph-month-june-2021

On Tue, Jun 22, 2021 at 5:50 AM Mike Perez wrote:
>
> Hi everyone,
>
> Join us in ten minutes for week 4 of Ceph Month!
>
> 9:00 ET / 15:00 CEST cephadm [Sebastian Wagner]
> 9:30 ET / 15:30 CEST CephFS + fscrypt: filename and content encryption
> 10:00 ET / 16:00 CEST Crimson Update [Samuel Just]
>
> Meeting link: https://bluejeans.com/908675367
> Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
>
> On Fri, Jun 18, 2021 at 5:50 AM Mike Perez wrote:
> >
> > Hi everyone,
> >
> > Join us in ten minutes for more Ceph Month presentations!
> >
> > 9:00 ET / 15:00 CEST Optimizing Ceph on Arm64 [Richael Zhuang]
> > 9:30 ET / 15:30 CEST Improving Cosbench for Ceph Benchmarking [Danny Abukalam]
> >
> > Meeting link: https://bluejeans.com/908675367
> > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> >
> > On Wed, Jun 16, 2021 at 6:25 AM Mike Perez wrote:
> > >
> > > Hi everyone,
> > >
> > > Here's today's schedule for Ceph Month:
> > >
> > > 9:00 ET / 15:00 CEST Project Aquarium - An easy-to-use storage appliance wrapped around Ceph [Joao Eduardo Luis]
> > > 9:30 ET / 15:30 CEST [lightning] Qemu: librbd vs krbd performance [Wout van Heeswijk]
> > > 9:40 ET / 15:40 CEST [lightning] Evaluation of RBD replication options @CERN [Arthur Outhenin-Chalandre]
> > >
> > > Meeting link: https://bluejeans.com/908675367
> > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > >
> > > On Tue, Jun 15, 2021 at 5:52 AM Mike Perez wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > Here's today's schedule for Ceph Month:
> > > >
> > > > 9:00 ET / 15:00 CEST Dashboard Update [Ernesto]
> > > > 9:30 ET / 15:30 CEST [lightning] RBD latency with QD=1 bs=4k [Wido den Hollander]
> > > > 9:40 ET / 15:40 CEST [lightning] From Open Source to Open Ended in Ceph with Lua [Yuval Lifshitz]
> > > >
> > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > Meeting link: https://bluejeans.com/908675367
> > > >
> > > > On Mon, Jun 14, 2021 at 6:50 AM Mike Perez wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > In ten minutes, Ceph Month continues with the following schedule today:
> > > > >
> > > > > 10:00 ET / 16:00 CEST RBD update [Ilya Dryomov]
> > > > > 10:30 ET / 16:30 CEST 5 more ways to break your ceph cluster [Wout van Heeswijk]
> > > > >
> > > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > > Meeting link: https://bluejeans.com/908675367
> > > > >
> > > > > On Fri, Jun 11, 2021 at 6:50 AM Mike Perez wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > In ten minutes, join us for the next Ceph Month presentation on Intel QLC SSD: Cost-Effective Ceph Deployments by Anthony D'Atri
> > > > > >
> > > > > > https://bluejeans.com/908675367
> > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > >
> > > > > > On Fri, Jun 11, 2021 at 5:50 AM Mike Perez wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > In ten minutes, join us for the next Ceph Month presentation on Performance Optimization for All Flash-based on aarch64 by Chunsong Feng
> > > > > > >
> > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > > https://bluejeans.com/908675367
> > > > > > >
> > > > > > > On Thu, Jun 10, 2021 at 6:00 AM Mike Perez wrote:
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > We're about to start Ceph Month 2021 with Casey Bodley giving a RGW update!
> > > > > > > >
> > > > > > > > Afterward we'll have two BoF discussions on:
> > > > > > > >
> > > > > > > > 9:30 ET / 15:30 CEST [BoF] Ceph in Research & Scientific Computing [Kevin Hrpcek]
> > > > > > > >
> > > > > > > > 10:10 ET / 16:10 CEST [BoF] The go-ceph get together [John Mulligan]
> > > > > > > >
> > > > > > > > Join us now on the stream:
> > > > > > > >
> > > > > > > > https://bluejeans.com/908675367
> > > > > > > >
> > > > > > > > On Tue, Jun 1, 2021 at 6:50 AM Mike Perez wrote:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > In ten minutes, join us for the start of the Ceph Month June event!
> > > > > > > > > The schedule and meeting link can be found on this etherpad:
> > > > > > > > >
> > > > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > > > >
> > > > > > > > > On Tue, May 25, 2021 at 11
[ceph-users] Re: Ceph Month June Schedule Now Available
Hi Marc,

We can look into that for future events. For this event, we recommended people subscribe to the Ceph Community Calendar, which does display the times in your local time.

https://calendar.google.com/calendar/embed?src=9ts9c7lt7u1vic2ijvvqqlfpo0%40group.calendar.google.com

On Tue, Jun 22, 2021 at 5:57 AM Marc wrote:
>
> Maybe it is nice to send this as a calendar invite? So it nicely shows up at correct local time of everyone?
>
> > -----Original Message-----
> > From: Mike Perez
> > Sent: Tuesday, 22 June 2021 14:50
> > To: ceph-users
> > Subject: [ceph-users] Re: Ceph Month June Schedule Now Available
> >
> > Hi everyone,
> >
> > Join us in ten minutes for week 4 of Ceph Month!
> >
> > 9:00 ET / 15:00 CEST cephadm [Sebastian Wagner]
> > 9:30 ET / 15:30 CEST CephFS + fscrypt: filename and content encryption
> > 10:00 ET / 16:00 CEST Crimson Update [Samuel Just]
> >
> > Meeting link: https://bluejeans.com/908675367
> > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> >
> > On Fri, Jun 18, 2021 at 5:50 AM Mike Perez wrote:
> > >
> > > Hi everyone,
> > >
> > > Join us in ten minutes for more Ceph Month presentations!
> > >
> > > 9:00 ET / 15:00 CEST Optimizing Ceph on Arm64 [Richael Zhuang]
> > > 9:30 ET / 15:30 CEST Improving Cosbench for Ceph Benchmarking [Danny Abukalam]
> > >
> > > Meeting link: https://bluejeans.com/908675367
> > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > >
> > > On Wed, Jun 16, 2021 at 6:25 AM Mike Perez wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > Here's today's schedule for Ceph Month:
> > > >
> > > > 9:00 ET / 15:00 CEST Project Aquarium - An easy-to-use storage appliance wrapped around Ceph [Joao Eduardo Luis]
> > > > 9:30 ET / 15:30 CEST [lightning] Qemu: librbd vs krbd performance [Wout van Heeswijk]
> > > > 9:40 ET / 15:40 CEST [lightning] Evaluation of RBD replication options @CERN [Arthur Outhenin-Chalandre]
> > > >
> > > > Meeting link: https://bluejeans.com/908675367
> > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > >
> > > > On Tue, Jun 15, 2021 at 5:52 AM Mike Perez wrote:
> > > > >
> > > > > Hi everyone,
> > > > >
> > > > > Here's today's schedule for Ceph Month:
> > > > >
> > > > > 9:00 ET / 15:00 CEST Dashboard Update [Ernesto]
> > > > > 9:30 ET / 15:30 CEST [lightning] RBD latency with QD=1 bs=4k [Wido den Hollander]
> > > > > 9:40 ET / 15:40 CEST [lightning] From Open Source to Open Ended in Ceph with Lua [Yuval Lifshitz]
> > > > >
> > > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > > Meeting link: https://bluejeans.com/908675367
> > > > >
> > > > > On Mon, Jun 14, 2021 at 6:50 AM Mike Perez wrote:
> > > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > In ten minutes, Ceph Month continues with the following schedule today:
> > > > > >
> > > > > > 10:00 ET / 16:00 CEST RBD update [Ilya Dryomov]
> > > > > > 10:30 ET / 16:30 CEST 5 more ways to break your ceph cluster [Wout van Heeswijk]
> > > > > >
> > > > > > Full schedule: https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > Meeting link: https://bluejeans.com/908675367
> > > > > >
> > > > > > On Fri, Jun 11, 2021 at 6:50 AM Mike Perez wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > In ten minutes, join us for the next Ceph Month presentation on Intel QLC SSD: Cost-Effective Ceph Deployments by Anthony D'Atri
> > > > > > >
> > > > > > > https://bluejeans.com/908675367
> > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > >
> > > > > > > On Fri, Jun 11, 2021 at 5:50 AM Mike Perez wrote:
> > > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > In ten minutes, join us for the next Ceph Month presentation on Performance Optimization for All Flash-based on aarch64 by Chunsong Feng
> > > > > > > >
> > > > > > > > https://pad.ceph.com/p/ceph-month-june-2021
> > > > > > > > https://bluejeans.com/908675367
> > > > > > > >
> > > > > > > > On Thu, Jun 10, 2021 at 6:00 AM Mike Perez wrote:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > We're about to start Ceph Month 2021 with Casey Bodley giving a RGW update!
> > > > > > > > >
> > > > > > > > > Afterward we'll have two BoF discussions on:
> > > > > > > > >
> > > > > > > > > 9:30 ET / 15:30 CEST [BoF] Ceph in Research & Scientific Computing [Kevin Hrpcek]
> > > > > > > > >
> > > > > > > > > 10:10 ET / 16:10 CEST [BoF] The go-ceph get together [John Mulligan]
> > > > > > > > >
> > > > > > > > > Join us now on the stream:
> > > > > > > > >
> > > > > > > > > https://bluejeans.com/908675367
> > > > > > > > >
> > > > > > > > > On Tue, Jun 1, 2021 at 6:50 AM Mike Perez wrote:
> > > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > In
[ceph-users] Re: ceph fs mv does copy, not move
Dear Patrick,

thanks for letting me know. Could you please consider making this a ceph client mount option, for example '-o fast_move', that enables a code path that enforces an mv to be a proper atomic mv, with the risk that in some corner cases the target quota is overrun? With this option enabled, a move should either be a move or fail outright with "out of disk quota" (no partial move, no cp+rm at all). The fail should only occur if it is absolutely obvious that the target quota will be exceeded. Any corner cases are the responsibility of the operator. Application crashes due to incorrect error handling are acceptable.

Reasoning:

From a user's/operator's side, the preferred functionality is that in cases where a definite quota overrun can securely be detected in advance, the move should actually fail with "out of disk quota" instead of resorting to cp+rm, potentially leading to partial moves and a total mess for users/operators to clean up. In any other case, the quota should simply be ignored and the move should be a complete atomic move, with the risk of exceeding the target quota and IO stalling. A temporary stall or fail of IO until the operator increases the quota again is, in my opinion and use case, highly preferable over the alternative of cp+rm. A quota or a crashed job is fast to fix, a partial move is not.

Some background: We use ceph fs as an HPC home file system and as a back-end store. Being able to move data quickly across the entire file system is essential, because users re-factor their directory structure containing huge amounts of data quite often for various reasons. On our system, we set file system quotas mainly for psychological reasons. We run a cron job that adjusts the quotas every day to show between 20% and 30% free capacity on the mount points. The psychological side here is to give users an incentive to clean up temporary data. It is not intended to limit usage seriously, only to limit what can be done in between cron job runs as a safe-guard. The pool quotas set the real hard limits.

I'm in the process of migrating 100+TB right now and am really happy that I still have a client where I can do an O(1) move. It would be a disaster if I now had to use rsync or similar, which would take weeks.

Please, in such situations where developers seem to have to make a definite choice, consider the possibility of offering operators a choice of the alternative that suits their use case best. Adding further options seems far better than limiting functionality in a way that becomes a terrible burden in certain, if not many, use cases.

In ceph fs there have been many such decisions that allow for different answers from a user/operator perspective. For example, I would prefer if I could get rid of the attempted higher POSIX compliance level of ceph fs compared with Lustre, just disable all the client caps and cache-coherence management, and turn it into an awesome scale-out parallel file system. The attempt at POSIX-compliant handling of simultaneous writes to files offers nothing to us, but costs hugely in performance and forces users to move away from perfectly reasonable HPC work flows. Also, that it takes a TTL to expire before changes on one client become visible on another (unless direct_io is used for all IO) is perfectly acceptable for us given the potential performance gain due to simpler client-MDS communication.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Patrick Donnelly
Sent: 24 June 2021 05:29:45
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] ceph fs mv does copy, not move

Hello Frank,

On Tue, Jun 22, 2021 at 2:16 AM Frank Schilder wrote:
>
> Dear all,
>
> some time ago I reported that the kernel client resorts to a copy instead of a move when moving a file across quota domains. I was told that the fuse client does not have this problem. If enough space is available, a move should be a move, not a copy.
>
> Today, I tried to move a large file across quota domains testing both the kernel and the fuse client. Both still resort to a copy even though this issue was addressed quite a while ago
> (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/44AEIHNEGKV4VGCARRTARGFZ264CR4T7/#XY7ZCE3KWHI4QSUNZHDWL3QZQFOHXRQW).
> The versions I'm using are (CentOS 7)
>
> # yum list installed | grep ceph-fuse
> ceph-fuse.x86_64  2:13.2.10-0.el7  @ceph
>
> # uname -r
> 3.10.0-1160.31.1.el7.x86_64
>
> Any suggestions how to get this to work? I have to move directories containing 100+ TB.

ceph-fuse reverted this behavior in: https://tracker.ceph.com/issues/48203

The kernel had a patch around that time too. In summary, it was not possible to accurately account for the quota usage prior to doing the rename. Rather than allow a quota to potentially be massively overru
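(For context on the quota handling discussed above: CephFS directory quotas are set and read via extended attributes, so a cron job of the kind described might look roughly like the following; the mount path and value are hypothetical.)

# getfattr -n ceph.quota.max_bytes /mnt/cephfs/home/group1
# setfattr -n ceph.quota.max_bytes -v 20000000000000 /mnt/cephfs/home/group1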
[ceph-users] iscsi, gwcli, and vmware version
I notice on https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/ that it lists a requirement of "VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."

Could anyone enlighten me as to why this specific limit is in place? Officially knowing something like "you have to use v6.5 or later, because X happens" would be very helpful to me when doing a writeup for potential deployment plans.

--
Philip Brown | Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310 | Fax 714.918.1325
pbr...@medata.com | www.medata.com
[ceph-users] query about product use of rbd mirror for DR
Dear Ceph Folks,

Does anyone have real experience of using RBD mirroring for disaster recovery over 1000 miles away? I am planning to use the Ceph RBD mirroring feature for DR, but have no real experience. Could anyone share good or bad experiences here?

I am thinking of using iSCSI over an rbd-nbd map, with rbd mirror to a remote site using a dedicated link of 200Mb/s. The Ceph version will be Luminous 12.2.13.

Any sharing, suggestions and comments are highly appreciated.

best regards,

samuel

huxia...@horebdata.cn
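(For context, journal-based mirroring of the kind being considered is enabled roughly as follows on Luminous; the pool and image names are hypothetical and this is only a sketch, not a tested DR recipe. An rbd-mirror daemon also has to run at the receiving site.)

# rbd feature enable mypool/myimage exclusive-lock,journaling
# rbd mirror pool enable mypool image
# rbd mirror pool peer add mypool client.remote@remote-cluster
# rbd mirror image enable mypool/myimage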
[ceph-users] Re: iscsi, gwcli, and vmware version
Hi Philip,

Part of it will be down to VMFS supporting features for iSCSI, and then that is chained to specific ESXi and VM levels.

Andrew Ferris
Network & System Management
UBC Centre for Heart & Lung Innovation
St. Paul's Hospital, Vancouver
http://www.hli.ubc.ca

>>> Philip Brown 6/24/2021 12:56 PM >>>
I notice on https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/ that it lists a requirement of "VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."

Could anyone enlighten me as to why this specific limit is in place? Officially knowing something like "you have to use v6.5 or later, because X happens" would be very helpful to me when doing a writeup for potential deployment plans.

--
Philip Brown | Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310 | Fax 714.918.1325
pbr...@medata.com | www.medata.com
[ceph-users] Re: iscsi, gwcli, and vmware version
I would appreciate it if anyone could call out the specific features involved here. "Upgrade because it's better" doesn't usually fly in cost-justification writeups.

----- Original Message -----
From: "Andrew Ferris"
To: "ceph-users", "Philip Brown"
Sent: Thursday, June 24, 2021 1:13:02 PM
Subject: Re: [ceph-users] iscsi, gwcli, and vmware version

Hi Philip,

Part of it will be down to VMFS supporting features for iSCSI, and then that is chained to specific ESXi and VM levels.

Andrew Ferris
Network & System Management
UBC Centre for Heart & Lung Innovation
St. Paul's Hospital, Vancouver
http://www.hli.ubc.ca

>>> Philip Brown 6/24/2021 12:56 PM >>>
I notice on https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/ that it lists a requirement of "VMware ESX 6.5 or later using Virtual Machine compatibility 6.5 with VMFS 6."

Could anyone enlighten me as to why this specific limit is in place? Officially knowing something like "you have to use v6.5 or later, because X happens" would be very helpful to me when doing a writeup for potential deployment plans.

--
Philip Brown | Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250
Irvine CA 92606
Office 714.918.1310 | Fax 714.918.1325
pbr...@medata.com | www.medata.com
[ceph-users] Re: Why you might want packages not containers for Ceph deployments
On Sat, Jun 19, 2021 at 3:43 PM Nico Schottelius wrote:
> Good evening,
>
> as an operator running Ceph clusters based on Debian and later Devuan for years, and recently testing ceph in rook, I would like to chime in on some of the topics mentioned here with a short review:
>
> Devuan/OS package:
>
> - Over all the years changing from Debian to Devuan, changing the Devuan versions, dist-upgrading - we did not encounter a single issue on the OS basis. The only real problems were when ceph version incompatibilities between the major versions happened. However this will not change with containers.
>
>   I do see the lack of proper packages for Alpine Linux, which would be an amazing lean target for running ceph.
>
>   The biggest problem I see is that ceph/cephadm is the longer the more relying on systemd and that actually locks out folks.

I want to reiterate that while cephadm's requirements are systemd+lvm+python3+containers, the orchestration framework does not have any of these limitations, and is designed to allow you to plug in other options.

> [...]
>
> Thus my suggestion for the ceph team is to focus on 2 out of the three variants:
>
> - Keep providing a native, even manual deployment mode. Let people get an understanding of ceph, develop even their own tooling around it. This enables distros, SMEs, Open Source communities, hackers, developers. Low entrance barrier, easy access, low degree of automation.
>
> - For those who are into containers, advise them how to embrace k8s. How to use k8s on bare metal. Is it potentially even smarter to run ceph on IPv6-only clusters? What does the architecture look like with k8s? How does rook do autodetection, what metrics can the kube-prometheus grafana help with? etc. etc. The whole shebang that you'll need to develop and create over time anyway.

Cephadm is intended to be the primary non-k8s option, since it seems pretty clear that there is a significant (huge?) portion of the user community that is not interested in adding kubernetes underneath their storage (take all of the "containers add complexity" arguments and * 100). We used containers because, in our view, it simplified the developer AND user experience.

But neither rook nor cephadm preclude deploying Ceph the traditional way. The newer capabilities in the dashboard to manage the deployment of Ceph rely on the orchestrator API, so a traditional deployment today cannot make use of these new features, but nothing is preventing a non-container-based orchestrator implementation.

sage
[ceph-users] Re: Why you might want packages not containers for Ceph deployments
On Sun, Jun 20, 2021 at 9:51 AM Marc wrote:
> Remarks about your cephadm approach/design:
>
> 1. I am not interested in learning podman, rook or kubernetes. I am using mesos, which is also on my osd nodes to use the extra available memory and cores. Furthermore your cephadm OC is limited to only ceph nodes, while my mesos OC is spread across a larger cluster and has rules for when, and when not, to run tasks on the osd nodes. You incorrectly assume that rgw, grafana, prometheus, haproxy are going to be run on your ceph OC.

rgw, grafana, prom, haproxy, etc are all optional components. The monitoring stack is deployed by default but is trivially disabled via a flag to the bootstrap command. We are well aware that not everyone wants these, but we cannot ignore the vast majority of users that wants things to Just Work without figuring out how to properly deploy and manage all of these extraneous integrated components.

> 2. Nico pointed out that you do not have alpine linux container images. I did not even know you were using container images. So how big are these? Where are these stored? And why are these not as small as they can be? Such an osd container image should be 20MB or so at most. I would even expect a statically built binary container image, why even a tiny OS?
>
> 4. Ok, found the container images[2] (I think). Sorry, but this has 'nothing' to do with container thinking. I expected to find container images for osd, mds, rgw separately and smaller. This looks more like an OS deployment.

Early on the team building the container images opted for a single image that includes all of the daemons for simplicity. We could build stripped down images for each daemon type, but that's an investment in developer time and complexity and we haven't heard any complaints about the container size. (Usually a few hundred MB on a large scale storage server isn't a problem.)

> 3. Why is systemd still being talked about in this cephadm? Your orchestrator should handle restarts, namespaces and failed tasks, no? There should be no need to have a systemd dependency; at least I have not seen any container images relying on this.

Something needs to start the ceph daemon containers when the system reboots. We integrated with systemd since all major distros adopted it. Cephadm could be extended to support other init systems with pretty minimal effort... we aren't doing anything fancy with systemd.

> 5. I have been writing this previously on the mailing list here. Is each rgw still requiring its own dedicated client id? Is it still true that, if you want to spawn 3 rgw instances, they need to authorize like client.rgw1, client.rgw2 and client.rgw3?
> This does not allow for auto scaling. The idea of using an OC is that you launch a task, and that you can scale this task automatically when necessary, so you would get multiple instances of rgw1. If this is still an issue with rgw, mds and mgr etc., why even bother doing something with an OC and containers?

The orchestrator automates the creation and cleanup of credentials for each rgw instance. (It also trivially scales them up/down, ala k8s.) If you have an autoscaler, you just need to tell cephadm how many you want and it will add/remove daemons. If you are using cephadm's ingress (haproxy) capability, the LB configuration will be adjusted for you. If you are using an external LB, you can query cephadm for a description of the current daemons and their endpoints and feed that info into your own ingress solution.

> 6. As I wrote before, I do not want my rgw or haproxy running in an OC that has the ability to give tasks the SYSADMIN capability. So that would mean I have to run my osd daemons/containers separately.

Only the OSD containers get extra caps to deal with the storage hardware.

> 7. If you are not setting cpu and memory limits on your cephadm containers, then again there is an argument why even use containers.

Memory limits are partially implemented; we haven't gotten to CPU limits yet. It's on the list!

> 8. I still see lots of comments on the mailing list about accessing logs. I have all my containers log to a remote syslog server; if you still have ceph daemons that can not do this (correctly), what is the point of even going to containers?

By default, we log to stderr and your logs are in journald or whatever alternative your container runtime has set up. You can trivially flip a switch and you get traditional file-based logs with a logrotate.d config, primarily to satisfy users (like me!) who aren't comfortable with the newfangled log management style.

> 9. I am updating my small cluster something like this:
>
> ssh root@c01 "ceph osd set noout ; ceph osd set noscrub ; ceph osd set nodeep-scrub"
> ssh root@c01 "ceph tell osd.* injectargs '--osd_max_scrubs=0'"
>
> ssh root@c01 "yum update 'ceph-*' -y"
> ...
>
> ssh root@c01 "service ceph-mon@a restart"
> ...
>
> ssh root@c01 "s
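(Regarding the "flip a switch" for traditional file-based logs mentioned under point 8 above, the switch is presumably the logging config options, roughly as below; an untested sketch.)

# ceph config set global log_to_file true
# ceph config set global mon_cluster_log_to_file true
# ceph config set global log_to_stderr false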
[ceph-users] Re: Why you might want packages not containers for Ceph deployments
On Tue, Jun 22, 2021 at 11:58 AM Martin Verges wrote:
>
> > There is no "should be", there is no one answer to that, other than 42.
> > Containers have been there before Docker, but Docker made them popular,
> > exactly for the same reason as why Ceph wants to use them: ship a known
> > good version (CI tests) of the software with all dependencies, that can be
> > run "as is" on any supported platform.
>
> So ship it tested for container software XXX and run it on YYY. How will
> that benefit me as a user? There are differences when running a docker
> container, lxc, nspawn, podman, kubernetes and whatever. So you trade error
> A for error B. There are even problems with containers if you don't use
> version X from docker. That's what the past told us, why should it be
> better in the future with even more container environments. Have you tried
> running rancher on debian in the past? It breaks apart due to iptables or
> other stuff.

Rook is based on kubernetes, and cephadm on podman or docker. These are well-defined runtimes. Yes, some have bugs, but our experience so far has been a big improvement over the complexity of managing package dependencies across even just a handful of distros. (Podman has been the only real culprit here, tbh, but I give them a partial pass as the tool is relatively new.)
[ceph-users] Re: Why you might want packages not containers for Ceph deployments
On Tue, Jun 22, 2021 at 1:25 PM Stefan Kooman wrote:
> On 6/21/21 6:19 PM, Nico Schottelius wrote:
> > And while we are at claiming "on a lot more platforms", you are at the
> > same time EXCLUDING a lot of platforms by saying "Linux based
> > container" (remember Ceph on FreeBSD? [0]).
>
> Indeed, and that is a more fundamental question: how easy it is to make
> Ceph a first-class citizen on non linux platforms. Was that ever a
> (design) goal? But then again, if you would be able to port docker
> natively to say OpenBSD, you should be able to run Ceph on it as well.

Thank you for bringing this up. This is in fact a key reason why the orchestration abstraction works the way it does--to allow other runtime environments to be supported (FreeBSD! sysvinit/Devuan/whatever for systemd haters!) while ALSO allowing an integrated, user-friendly experience in which users' workflows for adding/removing hosts, replacing failed OSDs, and managing services (MDSs, RGWs, load balancers, etc) can be consistent across all platforms.

For 10+ years we basically said "out of scope" to these pesky deployment details and left this job to Puppet, Chef, Ansible, ceph-deploy, rook, etc., but the result of that strategy was pretty clear: ceph was hard to use and the user experience was dismal when compared to an integrated product from any half-decent enterprise storage company, or products like Martin's that capitalize on core ceph's bad UX.

The question isn't whether we support other environments, but how. As I mentioned in one of my first messages, we can either (1) generalize cephadm to work in other environments (break the current systemd+container requirement), or (2) add another orchestrator backend that supports a new environment. I don't have any well-formed opinion here. There is a lot of pretty generic "orchestration" logic in cephadm right now that isn't related to systemd or containers that could either be pulled out of cephadm into the mgr/orchestrator layer or a library. Or an independent, fresh orch backend implementation could opt for a very different approach or set of opinions.

Either way, my assumption has been that these other environments would probably not be docker|podman-based. In the case of FreeBSD we'd probably want to use jails or whatever. But anything is possible.

s
[ceph-users] PG inconsistent+failed_repair
Hello.

Today we've experienced a complete Ceph cluster outage - total loss of power in the whole infrastructure. 6 OSD nodes and 3 monitors went down at the same time. Ceph 14.2.10.

This resulted in unfound objects, which were "reverted" in a hurry with
ceph pg mark_unfound_lost revert
In retrospect that was probably a mistake, as the "have" part stated 0'0.

But then deep-scrubs started and they found inconsistent PGs. We tried repairing them, but they just switched to failed_repair.

Here's a log example:
2021-06-25 00:08:07.693645 osd.0 [ERR] 3.c shard 6 3:3163e703:::rbd_data.be08c566ef438d.2445:head : missing
2021-06-25 00:08:07.693710 osd.0 [ERR] repair 3.c 3:3163e2ee:::rbd_data.efa86358d15f4a.004b:6ab1 : is an unexpected clone
2021-06-25 00:11:55.128951 osd.0 [ERR] 3.c repair 1 missing, 0 inconsistent objects
2021-06-25 00:11:55.128969 osd.0 [ERR] 3.c repair 2 errors, 1 fixed

I tried manually deleting conflicting objects from secondary OSDs with ceph-objectstore-tool like this:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c rbd_data.efa86358d15f4a.004b:6ab1 remove
It removes the object, but without any positive impact. Pretty sure I don't understand the concept.

So currently I have the following thoughts:
- is there any doc on the object placement specifics and what all of those numbers in their names mean? I've seen objects with a similar prefix/mid but different suffix and I have no idea what that means;
- I'm actually not sure what the production impact is at this point, because everything seems to work so far. So I'm wondering whether it's possible to kill replicas on the secondary OSDs with ceph-objectstore-tool and just let Ceph create a replica from the primary PG?

I have 8 scrub errors and 4 inconsistent+failed_repair PGs, and I'm afraid that further deep scrubs will reveal more errors.

Any thoughts appreciated.
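(Some context on the names, plus a diagnostic sketch. An object like rbd_data.be08c566ef438d.2445:head is an RBD data object, "rbd_data.<image id>.<object index>", and the part after the last colon is the snapshot id: "head" for the live object, a hex snapid such as 6ab1 for a clone. With the OSD stopped, the copies a given OSD holds for the PG can be listed with ceph-objectstore-tool; the OSD id and PG below are taken from the log above, the rest is an untested sketch.)

# systemctl stop ceph-osd@22
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c --op list | grep rbd_data.efa86358d15f4a
# systemctl start ceph-osd@22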
[ceph-users] Re: ceph fs mv does copy, not move
On 6/24/21 5:34 PM, Frank Schilder wrote: Please, in such situations where developers seem to have to make a definite choice, consider the possibility of offering operators to choose the alternative that suits their use case best. Adding further options seems far better than limiting functionality in a way that becomes a terrible burden in certain, if not many use cases. Yeah, I agree. In ceph fs there have been many such decisions that allow for different answers from a user/operator perspective. For example, I would prefer if I could get rid of the attempted higher POSIX compliance level of ceph fs compared with Lustre, just disable all the client-caps and cache-coherence management and turn it into an awesome scale-out parallel file system. The attempt of POSIX compliant handling of simultaneous writes to files offers nothing to us, but costs huge in performance and forces users to move away from perfectly reasonable HPC work flows. Also, that it takes a TTL to expire before changes on one client become visible on another (unless direct_io is used for all IO) is perfectly acceptable for us given the potential performance gain due to simpler client-MDS communication. Isn't that where LazyIO is for? See https://docs.ceph.com/en/latest/cephfs/lazyio/ Gr. Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: PG inconsistent+failed_repair
Followup. This is what's written in the logs when I try to fix one PG:

ceph pg repair 3.60

primary osd log:
2021-06-25 01:07:32.146 7fc006339700 -1 log_channel(cluster) log [ERR] : repair 3.53 3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 : is an unexpected clone
2021-06-25 01:07:32.146 7fc006339700 -1 osd.6 pg_epoch: 210926 pg[3.53( v 210926'64271902 (210920'64268839,210926'64271902] local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f 210883/210883/5620 210811/210882/210882) [6,22,12] r=0 lpr=210882 luod=210926'64271899 crt=210926'64271902 lcod 210926'64271898 mlcod 210926'64271898 active+clean+scrubbing+deep+inconsistent+repair] _scan_snaps no clone_snaps for 3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

secondary osd 1:
2021-06-25 01:07:31.934 7f9eae8fa700 -1 osd.22 pg_epoch: 210926 pg[3.53( v 210926'64271899 (210920'64268839,210926'64271899] local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f 210883/210883/5620 210811/210882/210882) [6,22,12] r=1 lpr=210882 luod=0'0 lua=210881'64265352 crt=210926'64271899 lcod 210926'64271898 active+inconsistent mbc={}] _scan_snaps no clone_snaps for 3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

secondary osd 2:
2021-06-25 01:07:30.828 7f94d6e61700 -1 osd.12 pg_epoch: 210926 pg[3.53( v 210926'64271899 (210920'64268839,210926'64271899] local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f 210883/210883/5620 210811/210882/210882) [6,22,12] r=2 lpr=210882 luod=0'0 lua=210881'64265352 crt=210926'64271899 lcod 210926'64271898 active+inconsistent mbc={}] _scan_snaps no clone_snaps for 3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

And nothing happens, it's still in a failed_repair state.

Fri, 25 Jun 2021 at 00:36, Vladimir Prokofev wrote:

> Hello.
>
> Today we've experienced a complete CEPH cluster outage - total loss of power in the whole infrastructure.
> 6 osd nodes and 3 monitors went down at the same time. CEPH 14.2.10
>
> This resulted in unfound objects, which were "reverted" in a hurry with
> ceph pg mark_unfound_lost revert
> In retrospect that was probably a mistake as the "have" part stated 0'0.
>
> But then deep-scrubs started and they found inconsistent PGs. We tried repairing them, but they just switched to failed_repair.
>
> Here's a log example:
> 2021-06-25 00:08:07.693645 osd.0 [ERR] 3.c shard 6 3:3163e703:::rbd_data.be08c566ef438d.2445:head : missing
> 2021-06-25 00:08:07.693710 osd.0 [ERR] repair 3.c 3:3163e2ee:::rbd_data.efa86358d15f4a.004b:6ab1 : is an unexpected clone
> 2021-06-25 00:11:55.128951 osd.0 [ERR] 3.c repair 1 missing, 0 inconsistent objects
> 2021-06-25 00:11:55.128969 osd.0 [ERR] 3.c repair 2 errors, 1 fixed
>
> I tried manually deleting conflicting objects from secondary osds with ceph-objectstore-tool like this
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c rbd_data.efa86358d15f4a.004b:6ab1 remove
> it removes it but without any positive impact. Pretty sure I don't understand the concept.
>
> So currently I have the following thoughts:
> - is there any doc on the object placement specifics and what all of those numbers in their name mean? I've seen objects with similar prefix/mid but different suffix and I have no idea what does it mean;
> - I'm actually not sure what the production impact is at that point because everything seems to work so far. So I'm thinking if it's possible to kill replicas on secondary OSDs with ceph-objectstore-tool and just let CEPH create a replica from primary PG?
>
> I have 8 scrub errors and 4 inconsistent+failed_repair PGs, and I'm afraid that further deep scrubs will reveal more errors.
> Any thoughts appreciated.
[ceph-users] Re: ceph fs mv does copy, not move
Hi Stefan,

> Isn't that where LazyIO is for? See ...

Yes, it is, to some extent. However, there are many large HPC applications that will not start using exotic libraries for IO. A parallel file system offers everything that is needed with standard OS library calls. This is better solved on the FS side than the client side.

We put the link to lazy IO in our cluster documentation over a year ago, but I cannot imagine any of our users starting to invest in porting massive applications even though we have ceph. So far, nobody did. It's also that HPC uses MPI, which comes with IO libraries users don't have influence on. I don't see this becoming a relevant alternative to a parallel file system any time soon. Sorry.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Stefan Kooman
Sent: 24 June 2021 20:01:16
To: Frank Schilder; Patrick Donnelly
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs mv does copy, not move

On 6/24/21 5:34 PM, Frank Schilder wrote:
> Please, in such situations where developers seem to have to make a definite choice, consider the possibility of offering operators to choose the alternative that suits their use case best. Adding further options seems far better than limiting functionality in a way that becomes a terrible burden in certain, if not many use cases.

Yeah, I agree.

> In ceph fs there have been many such decisions that allow for different answers from a user/operator perspective. For example, I would prefer if I could get rid of the attempted higher POSIX compliance level of ceph fs compared with Lustre, just disable all the client-caps and cache-coherence management and turn it into an awesome scale-out parallel file system. The attempt of POSIX compliant handling of simultaneous writes to files offers nothing to us, but costs huge in performance and forces users to move away from perfectly reasonable HPC work flows. Also, that it takes a TTL to expire before changes on one client become visible on another (unless direct_io is used for all IO) is perfectly acceptable for us given the potential performance gain due to simpler client-MDS communication.

Isn't that where LazyIO is for? See https://docs.ceph.com/en/latest/cephfs/lazyio/

Gr. Stefan
[ceph-users] Re: Why you might want packages not containers for Ceph deployments
I've actually had rook-ceph not proceed with something that I would have continued on with. Turns out I was wrong and it was right: its checking was more thorough than mine. Thought that was pretty cool. It eventually cleared itself and finished up.

For a large ceph cluster, the orchestration is very nice.

Thanks,
Kevin

From: Sage Weil
Sent: Thursday, June 24, 2021 1:46 PM
To: Marc
Cc: Anthony D'Atri; Nico Schottelius; Matthew Vernon; ceph-users@ceph.io
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph deployments

On Sun, Jun 20, 2021 at 9:51 AM Marc wrote:
> Remarks about your cephadm approach/design:
>
> 1. I am not interested in learning podman, rook or kubernetes. I am using mesos which is also on my osd nodes to use the extra available memory and cores. Furthermore your cephadm OC is limited to only ceph nodes. While my mesos OC is spread across a larger cluster and has rules when, and when not to run tasks on the osd nodes. You incorrectly assume that rgw, grafana, prometheus, haproxy are going to be ran on your ceph OC.

rgw, grafana, prom, haproxy, etc are all optional components. The monitoring stack is deployed by default but is trivially disabled via a flag to the bootstrap command. We are well aware that not everyone wants these, but we cannot ignore the vast majority of users that wants things to Just Work without figuring out how to properly deploy and manage all of these extraneous integrated components.

> 2. Nico pointed out that you do not have alpine linux container images. I did not even know you were using container images. So how big are these? Where are these stored. And why are these not as small as they can be? Such an osd container image should be 20MB or so at most. I would even expect statically build binary container image, why even a tiny os?
> 4. Ok found the container images[2] (I think). Sorry but this has 'nothing' to do with container thinking. I expected to find container images for osd, mds, rgw separately and smaller. This looks more like an OS deployment.

Early on the team building the container images opted for a single image that includes all of the daemons for simplicity. We could build stripped down images for each daemon type, but that's an investment in developer time and complexity and we haven't heard any complaints about the container size. (Usually a few hundred MB on a large scale storage server isn't a problem.)

> 3. Why is in this cephadm still being talked about systemd? Your orchestrator should handle restarts, namespaces and failed tasks not? There should be no need to have a systemd dependency, at least I have not seen any container images relying on this.

Something needs to start the ceph daemon containers when the system reboots. We integrated with systemd since all major distros adopted it. Cephadm could be extended to support other init systems with pretty minimal effort... we aren't doing anything fancy with systemd.

> 5. I have been writing this previously on the mailing list here. Is each rgw still requiring its own dedicated client id? Is it still true, that if you want to spawn 3 rgw instances, they need to authorize like client.rgw1, client.rgw2 and client.rgw3?
> This does not allow for auto scaling. The idea of using an OC is that you launch a task, and that you can scale this task automatically when necessary. So you would get multiple instances of rgw1. If this is still an issue with rgw, mds and mgr etc., why even bother doing something with an OC and containers?

The orchestrator automates the creation and cleanup of credentials for each rgw instance. (It also trivially scales them up/down, ala k8s.) If you have an autoscaler, you just need to tell cephadm how many you want and it will add/remove daemons. If you are using cephadm's ingress (haproxy) capability, the LB configuration will be adjusted for you. If you are using an external LB, you can query cephadm for a description of the current daemons and their endpoints and feed that info into your own ingress solution.

> 6. As I wrote before I do not want my rgw or haproxy running in an OC that has the ability to give tasks the SYSADMIN capability. So that would mean I have to run my osd daemons/containers separately.

Only the OSD containers get extra caps to deal with the storage hardware.

> 7. If you are not setting cpu and memory limits on your cephadm containers, then again there is an argument why even use containers.

Memory limits are partially implemented; we haven't gotten to CPU limits yet. It's on the list!

> 8. I still see lots of comments on the mailing list about accessing logs. I have all my containers log to a remote syslog server, if you still have your ceph daemons that can not do this (correctly). What point is it even going to
[ceph-users] Re: Why you might want packages not containers for Ceph deployments
I bumped into this recently:
https://samuel.karp.dev/blog/2021/05/running-freebsd-jails-with-containerd-1-5/
:)

Kevin

From: Sage Weil
Sent: Thursday, June 24, 2021 2:06 PM
To: Stefan Kooman
Cc: Nico Schottelius; Kai Börnert; Marc; ceph-users
Subject: [ceph-users] Re: Why you might want packages not containers for Ceph deployments

On Tue, Jun 22, 2021 at 1:25 PM Stefan Kooman wrote:
> On 6/21/21 6:19 PM, Nico Schottelius wrote:
> > And while we are at claiming "on a lot more platforms", you are at the same time EXCLUDING a lot of platforms by saying "Linux based container" (remember Ceph on FreeBSD? [0]).
>
> Indeed, and that is a more fundamental question: how easy it is to make Ceph a first-class citizen on non linux platforms. Was that ever a (design) goal? But then again, if you would be able to port docker natively to say OpenBSD, you should be able to run Ceph on it as well.

Thank you for bringing this up. This is in fact a key reason why the orchestration abstraction works the way it does--to allow other runtime environments to be supported (FreeBSD! sysvinit/Devuan/whatever for systemd haters!) while ALSO allowing an integrated, user-friendly experience in which users' workflows for adding/removing hosts, replacing failed OSDs, and managing services (MDSs, RGWs, load balancers, etc) can be consistent across all platforms.

For 10+ years we basically said "out of scope" to these pesky deployment details and left this job to Puppet, Chef, Ansible, ceph-deploy, rook, etc., but the result of that strategy was pretty clear: ceph was hard to use and the user experience was dismal when compared to an integrated product from any half-decent enterprise storage company, or products like Martin's that capitalize on core ceph's bad UX.

The question isn't whether we support other environments, but how. As I mentioned in one of my first messages, we can either (1) generalize cephadm to work in other environments (break the current systemd+container requirement), or (2) add another orchestrator backend that supports a new environment. I don't have any well-formed opinion here. There is a lot of pretty generic "orchestration" logic in cephadm right now that isn't related to systemd or containers that could either be pulled out of cephadm into the mgr/orchestrator layer or a library. Or an independent, fresh orch backend implementation could opt for a very different approach or set of opinions.

Either way, my assumption has been that these other environments would probably not be docker|podman-based. In the case of FreeBSD we'd probably want to use jails or whatever. But anything is possible.

s